Monday, March 19, 2007
MLB Odds
John Beamer checks in with his forecast for team wins, using the player forecasts of THT. The first sanity check is…
Figure out the win distribution for the league. Historically, the observed standard deviation of team winning percentage is a bit over .07 wins per game. This implies a true rate of about .06 wins per game. That is, obs^2 = true^2 + random^2, where obs = .07 and random = sqrt(.5*.5/162) = .039.
So, .06 x 162 = 9.72. We expect 1 SD = 9.72 wins. The forecast shows 1 SD = 6.4. In fact, if you were to just create a completely random league with each team having the exact same talent level of players, you would expect 1 SD = 162 * random = 6.4.
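For anyone who wants to check the arithmetic, here is a minimal sketch of the variance split described above. The function names are mine, and it assumes every game is an independent coin flip for a true .500 team. Without rounding the true SD up to .06, the talent spread comes out closer to 9.4 wins than 9.72, which doesn't change the argument.

```python
import math

GAMES = 162

def random_sd(games=GAMES):
    """SD of winning percentage from luck alone for a true .500 team."""
    return math.sqrt(0.5 * 0.5 / games)

def true_sd(observed_sd, games=GAMES):
    """Back out the talent SD from obs^2 = true^2 + random^2."""
    return math.sqrt(observed_sd ** 2 - random_sd(games) ** 2)

luck = random_sd()        # about .039 per game
talent = true_sd(0.07)    # about .058 per game, i.e. roughly .06

print(f"random SD: {luck:.4f} per game, {luck * GAMES:.1f} wins")
print(f"true SD:   {talent:.4f} per game, {talent * GAMES:.1f} wins")
```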
Something is terribly wrong here, and I think it’s because John says:
adjust the distribution a little so it agrees with the expected binomial distribution (it is debatable whether this is necessary but I unilaterally decided it was)
That was a terrible choice.
In 2006, the observed SD was .062, so if you want to argue that the talent level is more tightly distributed than historically, that’s ok with me. But, you can’t possibly assume all teams have the same talent levels. At the very least, make your estimate so that 1 SD = 8 wins.
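The 8-win floor falls out of the same obs^2 = true^2 + random^2 split, just fed the 2006 figure. A quick sketch (the helper name is hypothetical, and it again assumes binomial luck for a .500 team over 162 games):

```python
import math

GAMES = 162
RANDOM_VAR = 0.5 * 0.5 / GAMES  # luck variance for a true .500 team

def talent_sd_in_wins(observed_sd):
    """Talent SD, in wins per season, implied by an observed SD of win%."""
    return math.sqrt(observed_sd ** 2 - RANDOM_VAR) * GAMES

# 2006 observed SD versus the historical figure
for observed in (0.062, 0.070):
    wins = talent_sd_in_wins(observed)
    print(f"observed SD {observed:.3f} -> talent SD of about {wins:.1f} wins")
```

Even with the tighter 2006 spread, the implied talent SD is just under 8 wins, well above the forecast's 6.4.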
I do look forward to this exciting nugget at the end:
Over the course of the season we’ll continue to follow how the odds of each team winning its division change. Also in the next month or two we’ll tackle a few related topics. One is our approach to calculating the odds of winning; two, is how our projected standings compare to prediction markets; and three is the accuracy of these projections.
Actually, what John did was worse than I thought, and how I explained it missed half the argument.
Let me restart. If every team has exactly the same talent level, you *must* predict every team at 81-81 (1 SD = 0), even though you *know* that they will end up, after 162 games, with 1 SD = .039.
If you assume that the true distribution is 1 SD = .039 (it would be pure coincidence for it to match the binomial), then you’d be correct in making that your forecast. That is, if the true SD is .039, then the observed SD after 162 games will be sqrt(.039^2 + .039^2) * 162 = 8.9.
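Both cases can be checked with a rough simulation. This is only a sketch under simplifying assumptions that are mine, not John's: each team's true win probability is drawn from a normal distribution around .500, and each team plays 162 independent games at that fixed probability rather than a real schedule against the other teams.

```python
import random
import statistics

GAMES = 162

def simulate_sd_wins(talent_sd, teams=30, seasons=200, seed=0):
    """SD of season win totals, pooled over simulated team-seasons."""
    rng = random.Random(seed)
    wins = []
    for _ in range(teams * seasons):
        # Assumption: talent is normally distributed around .500
        p = rng.gauss(0.5, talent_sd)
        # Assumption: 162 independent games at a fixed win probability
        wins.append(sum(rng.random() < p for _ in range(GAMES)))
    return statistics.pstdev(wins)

equal_talent = simulate_sd_wins(0.0)       # every team is a true .500 team
binomial_talent = simulate_sd_wins(0.039)  # talent SD happens to equal luck SD

print(f"equal talent:     1 SD = {equal_talent:.1f} wins")
print(f"talent SD = .039: 1 SD = {binomial_talent:.1f} wins")
```

The equal-talent league should land near 6.4 wins of observed spread even though the correct forecast for every team is 81-81, and the second league should land near 8.9.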
So, choosing the binomial as the expected distribution was just a very convenient shortcut that simply had no place here.
As MGL pointed out in the other thread I linked to, you also have other parameters, beyond the talent level, that feed into that .062, namely changes in talent level as the season goes on, parks, etc. I doubt any of that really has much impact.
In any case, let the player forecasts determine the distribution level, and don't try to match the binomial.