KBO Model Day 3: KBO Home Field Advantage Update and Starting Pitchers

Note: I’ve fixed a scraper bug that caused doubleheaders to be left out. You can see the updated KBO home field advantage here.

Yesterday I took a crack at finding the KBO’s home field advantage as an input into my KBO prediction model. I had hoped to find a unique value for the league’s home field advantage to bake into my model. Unfortunately, after grabbing results from the last 4 years, it turned out that the league’s home field advantage was the same as the MLB number I had used as a placeholder. For some reason, my scraper would die when it reached the 2015 season.

I did some debugging and realized that there was just a single game with a bad data layer that the parser couldn’t deal with. After hard-coding the scraper to skip that game, I was able to compile records for the last 10 seasons.

Season | Home Team Wins | Visiting Team Wins | Total Games | Home Field Advantage

Whoa! So while over the last four seasons, the KBO home field advantage has been around 8%, over the last decade it’s less than 5%! And at the beginning of that span, it was negligible! Graphically, here’s each year’s HFA, and the cumulative HFA since the beginning of the sample:

KBO Home Field Advantage, 2010-2019

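For reference, here is a minimal sketch of how the yearly and cumulative numbers could be computed from season records. It assumes HFA is defined as the home win share minus the visiting win share among decided games, which may not be exactly the definition used for the chart, and it ignores ties. The function names and the sample records are placeholders, not actual KBO results.

```python
def hfa(home_wins, away_wins):
    """Home win share minus visiting win share (assumed definition, ties ignored)."""
    decided = home_wins + away_wins
    return (home_wins - away_wins) / decided


def season_and_cumulative_hfa(records):
    """records: list of (season, home_wins, away_wins), oldest season first.

    Returns a list of (season, HFA for that season, HFA since start of sample).
    """
    rows = []
    cum_home = 0
    cum_away = 0
    for season, home, away in records:
        cum_home += home
        cum_away += away
        rows.append((season, hfa(home, away), hfa(cum_home, cum_away)))
    return rows


# Placeholder win totals for illustration only, not real KBO data.
example_records = [(2010, 260, 260), (2011, 270, 250)]
for season, yearly, cumulative in season_and_cumulative_hfa(example_records):
    print(season, f"{yearly:.1%}", f"{cumulative:.1%}")
```

The cumulative figure pools wins across seasons rather than averaging the yearly percentages, so seasons with more games count for more.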

The home field advantage has been steadily climbing. While I could probably go further back, I’m not sure it would be useful for my model, which is what I really care about here. Given the shape of the curve, I’ve decided to weight the last 3 years at 3x, the previous 3 at 2x, and the first 4 at 1x. My new HFA constant is 6.67%.
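The weighting scheme is just a weighted average. Here is a rough sketch, assuming the weights are applied to each season’s HFA percentage (rather than to raw win totals) and that the list holds 10 seasons, oldest first. The function name and the input values are made up for illustration, so the result will not match the 6.67% above.

```python
def weighted_hfa(hfa_by_season):
    """Weighted average of 10 yearly HFA values, oldest season first.

    Weights: first 4 seasons at 1x, next 3 at 2x, last 3 at 3x.
    """
    weights = [1] * 4 + [2] * 3 + [3] * 3
    if len(hfa_by_season) != len(weights):
        raise ValueError("expected exactly 10 seasons")
    weighted_sum = sum(w * h for w, h in zip(weights, hfa_by_season))
    return weighted_sum / sum(weights)


# Placeholder yearly HFA values, not the real ones from the table.
placeholder_hfa = [0.00] * 4 + [0.05] * 3 + [0.10] * 3
print(f"{weighted_hfa(placeholder_hfa):.2%}")
```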

For today’s games, I factored that new HFA in. The model had 2 games with about a 10% disagreement with the market, and 3 games with less than a 3% disagreement. I opted to play the top 2, but looked at the starting pitchers first. I did this manually, as I still don’t have a bottom-up way to factor in individual players. My “eyeball” test said I should stick with the home dog Samsung Lions over the Kia Tigers. However, I liked the pitcher better for the LG Twins, so I went against the model and bet against the NC Dinos.
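To make “disagreement with the market” concrete, here is one way it could be measured. This is only a sketch under my own assumptions: the odds are American moneylines, the bookmaker margin is stripped by normalizing both sides to sum to 1, and the disagreement is the model probability minus that no-vig market probability. The function names and the example odds are hypothetical.

```python
def implied_prob(moneyline):
    """Raw implied probability from an American moneyline (margin included)."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def no_vig_probs(home_line, away_line):
    """Normalize both sides so the two probabilities sum to 1."""
    home = implied_prob(home_line)
    away = implied_prob(away_line)
    total = home + away
    return home / total, away / total


def disagreement(model_home_prob, home_line, away_line):
    """Model's home win probability minus the market's no-vig home probability."""
    market_home, _ = no_vig_probs(home_line, away_line)
    return model_home_prob - market_home


# Hypothetical example: home underdog at +120, road favorite at -140.
print(f"{disagreement(0.54, 120, -140):+.1%}")
```

A positive number means the model likes the home team more than the market does.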

The results: 1-1. It would have been 2-0 if I had stuck with the NC Dinos, and 4-1 if I had played all 5 of the model’s recommendations.

Since the games happen at 5:30, I’ve been checking them out bleary-eyed around 6 AM when my 6-year-old bursts through the door. I’m starting to get a sense that runs happen in bunches in the second half of the game. Perhaps starting pitchers are less important than I thought, or at least less significant in the KBO than in MLB? There are lots of theories like this to develop and test.

Since I have the data saved in JSON files thanks to my scraper, I was able to whip up a quick analysis of runs by inning over time, as well as the length of an average start, re-using the code I wrote to determine the home field advantage.

KBO Runs Scored by Inning, 2010-2019


Percent of KBO Runs Scored by Inning, 2010-2019


Over the last 5 years, starters have averaged just a shade over 5 innings per game in the KBO. Over that same period, the first 5 innings have accounted for an average of 58.5% of scoring. So it seems that valuing starting pitchers will still be an important input to the model.

Luckily, I just saw this morning that FanGraphs now has player-level ZiPS projections. I’ll dig into that later today as well.
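A runs-by-inning tally like the one behind those charts could look roughly like this. The JSON layout here is an assumption for illustration: each game is a dict with `away_innings` and `home_innings` lists of runs per inning, which is not necessarily how the scraper stores its files. The function names and the sample game are also made up.

```python
from collections import Counter


def runs_by_inning(games):
    """Total runs scored in each inning (1-based) across all games, both teams."""
    totals = Counter()
    for game in games:
        for side in ("away_innings", "home_innings"):
            for inning, runs in enumerate(game[side], start=1):
                totals[inning] += runs
    return totals


def share_through(totals, last_inning):
    """Fraction of all runs scored in innings 1 through last_inning."""
    all_runs = sum(totals.values())
    early_runs = sum(r for inning, r in totals.items() if inning <= last_inning)
    return early_runs / all_runs


# One made-up game; the home team leads after 8.5 innings, so no bottom of the 9th.
sample_games = [
    {
        "away_innings": [0, 1, 0, 0, 2, 0, 0, 3, 0],
        "home_innings": [1, 0, 0, 0, 0, 0, 4, 2],
    },
]
totals = runs_by_inning(sample_games)
print(f"{share_through(totals, 5):.1%} of runs scored in the first 5 innings")
```

Extra innings need no special handling, since each list is simply as long as the game was.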