Yesterday was opening day for the KBO, Korea’s top professional baseball league. I didn’t follow it, but I was excited to see the first semblance of live sports returning.
To celebrate, and atone for missing opening day, I’ve decided to make a model similar to my 2019 MLB model, only applying what I have learned since.
There is a ton more I can and should write about last year’s MLB model, but here’s the gist:
- I based the work on Joe Peta’s Trading Bases.
- As a baseline, I used full season win projections.
- For each matchup, I’d use the announced starting pitchers and lineups to adjust the baseline winning percentages. For example, if a team was expected to win 82 games (roughly a .500 record, or 50% of its games) but was starting its ace, whose WAR was significantly higher than the rest of the staff’s, the team might be treated as a 98-win team (.605, or about 60% of its games) for that game. The same kind of adjustment was applied for every position player.
- Historically, home teams in MLB have won 54% of their games. This has been an extremely stable fact for over 100 years. This equates to an 8-percentage-point home field advantage (54% - 46% = 8%). So I’d add 8% win percentage to the home team.
- To calculate the final win expectancy for both teams, I’d divide each team’s adjusted win percentage by the sum of the two teams’ percentages.
- I’d compare those percentages to the implied probability from the betting market moneylines. Where my win expectancy for a team exceeded the market’s implied odds, I’d bet on that team. The higher the discrepancy, the larger the wager.
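To make the arithmetic in the steps above concrete, here is a minimal Python sketch. It is not the actual model code: the function names are made up for illustration, the per-player WAR adjustments are left out, and adding the 8% straight onto the home team’s percentage before normalizing is just one reading of the home field step. The moneyline conversion is the standard American odds formula and does not strip out the bookmaker’s margin.

```python
HOME_FIELD_BONUS = 0.08  # the 8% home field figure from the list above


def wins_to_pct(wins: float, games: int = 162) -> float:
    """Turn a full-season win projection into a win percentage.

    games defaults to the MLB season length; pass a different value
    for a league with a different schedule.
    """
    return wins / games


def win_expectancy(home_pct: float, away_pct: float,
                   home_bonus: float = HOME_FIELD_BONUS) -> tuple[float, float]:
    """Add the home field bonus, then normalize so the two sides sum to 1."""
    home_adj = home_pct + home_bonus
    total = home_adj + away_pct
    return home_adj / total, away_pct / total


def implied_prob(moneyline: int) -> float:
    """Convert an American moneyline to an implied win probability (margin included)."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win moneyline.
    return 100 / (moneyline + 100)


def edge(model_prob: float, moneyline: int) -> float:
    """Positive when the model likes a team more than the market does."""
    return model_prob - implied_prob(moneyline)
```

For two evenly matched .500 teams, `win_expectancy(0.5, 0.5)` gives the home side about 53.7%, which lines up with the 54% historical home win rate. A bet would only be considered when `edge(...)` comes out positive for one side.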
I started all this work before spring training, and tested the model and automations throughout it so that I was ready for opening day.
For the KBO, I was already a day late. So I started as simply as I could for today’s slate.
KBOMitchster, Day 1
Not only am I late to the party, I’m data poor. There is a lot of really rich, interesting MLB data thanks to sabermetric sites like Baseball Prospectus and FanGraphs. I’m still trying to wrap my head around what data is available for the KBO. A lot of resources are, unsurprisingly, in Korean. Which I don’t read.
Luckily, FanGraphs published ZiPS-based projected standings, including total win projections. So the first cut of the model starts with that as a wins baseline. I don’t have granular WAR or similar player values yet, so for now I’m rolling with just the baseline wins.
For home field advantage, some quick googling did not turn up a published home field advantage figure for the KBO, so I rolled with the 8% MLB number. For today’s games (which happen at 5:30am ET), the simplified model spit out picks on all 5 games. All 5 were heavy dogs, 3 home and 2 away, with “edges” of 3-10%.
The results: 1-4. Not great. Small sample size and all, but I’m pausing until I can refine things a bit. I’ve found a scraper for historical KBO results, so I’m going to try to work out what the real home field advantage in the KBO is. Perhaps I’m overestimating the impact of home field in the KBO, especially given that all the stadiums are empty right now.
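Once the scraper is producing results, the home field check could be as simple as the sketch below. This is a rough outline under assumptions: the input shape (a list of `(home_score, away_score)` pairs) and the function names are hypothetical and do not reflect what the scraper actually returns. Tied games are dropped from the denominator, which is a choice you may want to make differently.

```python
def home_win_rate(games: list[tuple[int, int]]) -> float:
    """Share of decided games won by the home team.

    games is a list of (home_score, away_score) pairs; ties are ignored.
    """
    home_wins = sum(1 for home, away in games if home > away)
    away_wins = sum(1 for home, away in games if away > home)
    decided = home_wins + away_wins
    if decided == 0:
        raise ValueError("no decided games to measure")
    return home_wins / decided


def home_field_edge(games: list[tuple[int, int]]) -> float:
    """Home win rate minus away win rate, the same way 54% - 46% = 8% was derived."""
    rate = home_win_rate(games)
    return rate - (1 - rate)
```

If the number that comes back is well under 0.08, that would support the suspicion that the borrowed MLB figure is too generous for the KBO.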