Tuesday, September 23, 2008
Saberists predict better than Insiders
Thanks to Vegas Watch, we see that PECOTA and Neyer were off by 10 games, Vegas was off by 11, and Olney/Phillips were off by 13. Lovely.
Now, here’s the fun part. Ready? If you had forecast every single team to finish 81-81, the RMSE would have been 10.6. Here are the 12 smartest and most experienced guys that Vegas Watch decided to track, with the perfectly competitive balanced forecast listed alongside them:
9.6 PECOTA
10.2 Neyer
10.5 Law
10.6 Perfectly Competitive Balanced (all teams predicted at 81-81)
10.8 Vegas
11.1 Passan
11.3 Sheehan
11.4 Brown
11.7 Kurkjian
12.1 Stark
12.1 Henson
12.4 Phillips
13.0 Olney
Says it all, doesn’t it?
We know that the “insiders” are terrible at pretty much anything resembling analysis. So that is a dead issue.
The reason that 81-81 for every team does so “well” might just be that RMSE is a really bad way of evaluating these numbers. By going with 81-81, you obviously guarantee that you won’t be off by more than a certain amount. If you predict 93 wins for a team, there is a certain chance that they will win only 70 and you will be off by 23 wins! A 23-win miss is almost never going to happen if you pick 81, because the team would have to win 104 or lose 104.
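To make the 81-81 baseline concrete, here is a minimal sketch of the RMSE calculation. The six win totals are made up for illustration (they are not 2008 results) and are chosen to average exactly 81; the function and variable names are hypothetical too. It shows two things: the worst miss of a flat 81 forecast is capped by how far the most extreme team finishes from 81, and when wins average 81 the flat forecast's RMSE is simply the standard deviation of the actual win totals.

```python
import math
import statistics

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of win totals."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical six-team league (illustrative numbers only); wins average 81.
actual_wins = [97, 89, 84, 78, 75, 63]

# The "perfectly competitive balanced" forecast: every team at 81 wins.
flat_forecast = [81] * len(actual_wins)

flat_rmse = rmse(flat_forecast, actual_wins)

# The largest single miss is just the distance of the most extreme team from 81.
worst_miss = max(abs(p - a) for p, a in zip(flat_forecast, actual_wins))

# With a mean of 81, the flat forecast's RMSE equals the population
# standard deviation of the actual win totals.
spread_of_wins = statistics.pstdev(actual_wins)

print(f"flat 81-81 RMSE:       {flat_rmse:.2f}")
print(f"std dev of actual wins: {spread_of_wins:.2f}")
print(f"worst single miss:      {worst_miss}")
```

Read that way, a 10.6 for the all-81 forecast says roughly that team win totals were spread about 10.6 wins around 81, and a forecaster scoring worse than that added more error than information.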
How about just evaluating everyone based on whether they picked a team to be under or over .500? How does 81-81 do against everyone else? Or against the Vegas line? There are a lot of better ways of evaluating these projections than RMSE. RMSE might work if you are comparing one person’s picks to another’s. Even then, I am not sure. One of the problems is that one really bad result is going to skew someone’s results badly. Say I nail just about every team in terms of how good or bad they are, but I pick one team to win 130 games and they “only” win 91. I probably have a bad RMSE, but I pretty much nailed everyone. In fact, if you knew that you were going to be evaluated using RMSE, it might behoove you to keep all your numbers near 81. A good team gets 85 and a bad one 77. How would each person do if we changed their picks using that criterion? I bet that just about everyone does better than 81-81!
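The three ideas in this comment can be tried out in a few lines. What follows is a rough sketch, not anyone's actual method: the helper names are hypothetical, the six-team league is made up, and the 130-versus-91 scenario is taken literally with the other 29 teams predicted exactly. It compares RMSE with mean absolute error for the one big miss, pulls a set of bold picks halfway toward 81, and counts how many teams each forecast put on the correct side of .500.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error of the picks."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mean_abs_error(predicted, actual):
    """Average miss in wins, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def shrink_toward_81(predicted, factor):
    """Pull each pick toward 81; factor=1 keeps the pick, factor=0 gives a flat 81."""
    return [81 + factor * (p - 81) for p in predicted]

def correct_sides_of_500(predicted, actual):
    """Count teams where pick and result are on the same side of 81 wins.
    A pick of exactly 81 takes no side, so it never counts as correct."""
    return sum(1 for p, a in zip(predicted, actual) if (p - 81) * (a - 81) > 0)

# Scenario 1: 29 teams predicted exactly, one team picked at 130 that wins 91.
# The 81s are placeholders; only the size of each miss matters here.
actual_30 = [81] * 29 + [91]
picks_30 = [81] * 29 + [130]
one_miss_rmse = rmse(picks_30, actual_30)           # 39 / sqrt(30)
one_miss_mae = mean_abs_error(picks_30, actual_30)  # 39 / 30

# Scenario 2: hypothetical six-team league with an overconfident forecaster.
actual_wins = [97, 89, 84, 78, 75, 63]
bold_picks = [105, 95, 88, 74, 66, 58]
hedged_picks = shrink_toward_81(bold_picks, 0.5)
flat_picks = [81] * len(actual_wins)

print(f"one 39-win miss: RMSE={one_miss_rmse:.2f}, MAE={one_miss_mae:.2f}")
for label, picks in [("bold", bold_picks), ("hedged", hedged_picks), ("flat 81", flat_picks)]:
    sides = correct_sides_of_500(picks, actual_wins)
    print(f"{label:8s} RMSE={rmse(picks, actual_wins):5.2f}  right side of .500: {sides}/{len(picks)}")
```

In this made-up league the hedged picks beat the bold ones on RMSE even though both get every team on the right side of .500, while the flat 81 forecast gets credit for none. Whether hedging helps on real picks depends on how overconfident those picks were, so this shows the mechanism rather than settling the bet.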