KBO Model Day 3: KBO Home Field Advantage Update and Starting Pitchers

Note: I’ve fixed an issue with the scraper that didn’t include double headers. You can see the updated KBO home field advantage here.

Yesterday I took a crack at finding the KBO’s home field advantage as an input into my KBO prediction model. I had hoped to find a unique value for the league’s home field advantage to bake into my model. Unfortunately, after grabbing results from that last 4 years, it turns out that the leagues home field advantages was the same as the MLB number I had used a placeholder. For some reason my scraper would die in the 2015 season.

I did some debugging and realized that there was just a single game that had a bad data layer that the parser could deal with. After hard coding it to skip that game, I was able to compile records for that last 10 seasons.

SeasonHome Team WinsVisiting Team WinsTotal GamesHome Field Advantage
201933126259311.64%
20183873227099.17%
20174023487507.20%
20164013607615.39%
20153923587504.53%
20142662535192.50%
20132992965950.50%
20122852775621.42%
20112672385055.74%
2010258262520-0.77%
Total3288297662644.98%

Whoa! So while over the last four seasons, the KBO home field advantage has been around 8%, over the last decade it’s less than 5%! And at the beginning of that span, it was negligible! Graphically, here’s each year’s HFA, and the cumulative HFA since the beginning of the sample:

KBO Home Field Advantage, 2010-2019

KBO Home Field Advantage, 2010-2019

The home field advantage has been steadily climbing. While I could probably go further back, I’m not sure it would be useful for my model, which is what I really care about here. And given the shape here, I’ve decided to weigh the last 3 years at 3x, previous 3 at 2x and first 4 at 1x. My new HFA constant is 6.67%.

For today’s games, I factored that new HFA in. The model had 2 games with about a 10% disagreement with the market, and 3 games with less than a 3% disagreement. I opted to play the top 2, but looked at the starting pitchers first. I did this manually as I still don’t have a bottoms up way to factor in individual players. My “eye-ball” test said I should stick with the home dog Samsung Lions over the Kia Tigers. However, I liked the pitcher better for the LG Twins, so I went against the model and bet against the NC Dinos.

The results: 1-1. Would have been 2-0 if I would have stuck with the NC Dinos, and 4-1 if I would have played all 5 of the model’s recommendations.

Since the games happen at 5:30, I’ve been checking them out bleary-eyed around 6 AM when my 6 year-old burst through the door. I’m starting to get a sense that runs happen in bunches in the second half of the game. Perhaps starting pitchers are less important than I thought. Perhaps starters are less significant in KBO than in the MLB? There are lots of theories like this to develop and test.

Since I have the data saved in JSON files thanks to my scraper, I was be able to whip up a quick analysis of runs by inning over time, re-using the code I wrote to determine the home field advantage, as well as the length of an average start.

KBO Runs Scored by Inning, 2010-2019

KBO Runs Scored by Inning

Percent of KBO Runs Scored by Inning, 2010-2019

Percent of KBO Runs Scored by Inning

Over the last 5 years, starters have averaged just a shade over 5 innings per game in the KBO. Over that same period, the first 5 innings have accounted for an average of 58.5% of scoring. Seems valuing starting pitchers will still be an important input to the model.

Luckily, I just saw this morning that Fangraphs now has player level ZiPS projections. I’ll dig into that later today as well.

Previously:

Next: