Note: I’ve fixed an issue that caused the scraper to skip doubleheaders. You can see the updated KBO home field advantage here.
Yesterday I took a crack at finding the KBO’s home field advantage as an input into my KBO prediction model. I had hoped to find a unique value for the league’s home field advantage to bake into my model. Unfortunately, after grabbing results from the last 4 years, it turned out that the league’s home field advantage was the same as the MLB number I had used as a placeholder. I couldn’t go back any further because, for some reason, my scraper would die in the 2015 season.
I did some debugging and realized that there was just a single game with a bad data layer that the parser couldn’t deal with. After hard-coding it to skip that game, I was able to compile records for the last 10 seasons.
|Season||Home Team Wins||Visiting Team Wins||Total Games||Home Field Advantage|
Whoa! So while over the last four seasons, the KBO home field advantage has been around 8%, over the last decade it’s less than 5%! And at the beginning of that span, it was negligible! Graphically, here’s each year’s HFA, and the cumulative HFA since the beginning of the sample:
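For anyone following along, here is a rough sketch of how the per-season and cumulative numbers could be computed from the win counts. It assumes HFA means the home win share minus the visiting win share, which is a guess at the definition and not something the figures above confirm. The function names and the sample records are made up for illustration.

```python
def hfa(home_wins, away_wins, total_games):
    # Assumed definition: home win share minus visiting win share.
    return (home_wins - away_wins) / total_games


def cumulative_hfa(seasons):
    """Running HFA over pooled games.

    seasons: list of (home_wins, away_wins, total_games), oldest first.
    """
    running = []
    home = away = total = 0
    for home_wins, away_wins, total_games in seasons:
        home += home_wins
        away += away_wins
        total += total_games
        running.append(hfa(home, away, total))
    return running


# Made-up placeholder records, NOT real KBO results.
seasons = [(50, 50, 100), (55, 45, 100), (60, 40, 100)]
per_season = [hfa(*s) for s in seasons]
running = cumulative_hfa(seasons)
```

Pooling the games before dividing means a season with more games counts for more in the cumulative line, which is one reasonable choice among several.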
The home field advantage has been steadily climbing. While I could probably go further back, I’m not sure it would be useful for my model, which is what I really care about here. Given the upward trend, I’ve decided to weight the last 3 years at 3x, the previous 3 at 2x and the first 4 at 1x. My new HFA constant is 6.67%.
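The weighting scheme is simple enough to sketch in a few lines. This version assumes the weights are applied to each season’s HFA value directly; weighting the underlying win counts instead would give a slightly different number. The function name is hypothetical.

```python
def weighted_hfa(season_hfas):
    """Weighted average of ten per-season HFA values, oldest first.

    Weights: first 4 seasons at 1x, next 3 at 2x, last 3 at 3x.
    """
    if len(season_hfas) != 10:
        raise ValueError("expected exactly 10 seasons")
    weights = [1] * 4 + [2] * 3 + [3] * 3
    weighted_sum = sum(w * v for w, v in zip(weights, season_hfas))
    return weighted_sum / sum(weights)
```

The weights sum to 19, so the three most recent seasons carry 9/19 of the total, a bit under half.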
For today’s games, I factored that new HFA in. The model had 2 games with about a 10% disagreement with the market, and 3 games with less than a 3% disagreement. I opted to play the top 2, but looked at the starting pitchers first. I did this manually as I still don’t have a bottom-up way to factor in individual players. My “eyeball” test said I should stick with the home dog Samsung Lions over the Kia Tigers. However, I liked the pitcher better for the LG Twins, so I went against the model and bet against the NC Dinos.
The results: 1-1. It would have been 2-0 if I had stuck with the NC Dinos, and 4-1 if I had played all 5 of the model’s recommendations.
Since the games happen at 5:30, I’ve been checking them out bleary-eyed around 6 AM when my 6-year-old bursts through the door. I’m starting to get a sense that runs happen in bunches in the second half of the game. Perhaps starting pitchers are less important than I thought. Perhaps starters are less significant in the KBO than in MLB? There are lots of theories like this to develop and test.
Since I have the data saved in JSON files thanks to my scraper, I was able to whip up a quick analysis of runs by inning over time, as well as the length of an average start, re-using the code I wrote to determine the home field advantage.
Over the last 5 years, starters have averaged just a shade over 5 innings per game in the KBO. Over that same period, the first 5 innings have accounted for an average of 58.5% of scoring. Seems valuing starting pitchers will still be an important input to the model.
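A share-of-scoring number like the one above can be computed along these lines. This is only a sketch: the JSON layout (one file per game, with `home_innings` and `away_innings` lists of runs per inning) is a hypothetical schema, not necessarily what my scraper writes, and the function names are made up.

```python
import json
from pathlib import Path


def load_games(directory):
    # Assumes one JSON file per game in the directory (hypothetical layout).
    return [json.loads(p.read_text()) for p in sorted(Path(directory).glob("*.json"))]


def early_run_share(games, through_inning=5):
    """Fraction of all runs scored in innings 1..through_inning."""
    early = total = 0
    for game in games:
        # "home_innings" / "away_innings" are assumed field names.
        for side in ("home_innings", "away_innings"):
            runs = game[side]
            early += sum(runs[:through_inning])
            total += sum(runs)
    return early / total if total else 0.0


# Made-up line score for illustration, not a real game.
sample = [{
    "home_innings": [1, 0, 0, 0, 2, 0, 0, 1, 0],
    "away_innings": [0, 0, 1, 0, 0, 0, 3, 0, 0],
}]
share = early_run_share(sample)
```

Slicing with `runs[:through_inning]` also handles extra-inning games, since anything past the cutoff simply lands in the late-game total.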
Luckily, I just saw this morning that FanGraphs now has player-level ZiPS projections. I’ll dig into those later today as well.