Thursday, October 04, 2007
Forecast Evaluations
Thankfully, there’s always somebody one step ahead of me, which saves me a lot of time. This time, it’s Nate Silver evaluating various forecasts, including Marcel. He gives you a few different ways to evaluate things (which is good), but doesn’t give you the best one. Let me explain:
He runs through the standard ones, and it’s good that he does: correlation, average error, RMSE. That’s exactly how I would have started. But I’m going to tell you why each of those, as applied, is wrong.
***
Correlation: the standard equation is y=mx+b. That’s what we learned in high school. You have your sample points (x, or the forecast), and you have a slope (m) and intercept (b) to best-fit it to the actual result (y, or OPS). Let me make this clear: in what we are trying to do here, evaluate the forecasts, m MUST be 1. It must! If you were to chop in half every single forecast that Marcel made, do you know what would happen to its correlation? Nothing! The above equation would compensate by doubling the slope.
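To see the "chop every forecast in half" point in numbers, here is a minimal sketch. The OPS values are made up purely for illustration, and `pearson` and `fit_line` are hypothetical helper names, not anything from Nate's or Marcel's code.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def fit_line(xs, ys):
    """Least-squares slope (m) and intercept (b) for y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    m = cov / vx
    return m, my - m * mx

# Made-up forecast and actual OPS for five players (illustration only).
forecast = [0.700, 0.750, 0.800, 0.850, 0.900]
actual = [0.720, 0.735, 0.810, 0.830, 0.915]
halved = [f / 2 for f in forecast]  # every forecast chopped in half

r_full, r_half = pearson(forecast, actual), pearson(halved, actual)
m_full, b_full = fit_line(forecast, actual)
m_half, b_half = fit_line(halved, actual)

print(r_full, r_half)  # identical correlations
print(m_full, m_half)  # the slope doubles to compensate
```

The halved forecasts are terrible forecasts, yet the correlation cannot tell them apart from the originals, because the fitted slope absorbs the difference.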
With m firmly set to 1, all that is left is the intercept. And that is simply a league corrector. HOWEVER, it must be set PRIOR to seeing the sample data. Once again, imagine that the players that make up the sample pool are NOT representative of the entire forecasted pool of players (and they are not). What happens? You are applying a league corrector to force the mean of your limited sample of forecasts to the mean of the same players’ actual totals. What you should do is get the entire set of player forecasts and infer the league OPS that the forecast presumed. And it is THAT value that makes up the intercept (b).
That is, after all is said and done, you have m=1, and b=lgOPS minus forecasterLgOPS. Once you have that, all that is left is to compare the actual OPS to the forecast OPS + forecasted “b”. At that point, you have two choices: the average of the absolute differences, or the RMSE (square root of the mean of the squared differences). I prefer the former, because salary and fantasy dollars are linearly correlated to production.
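A minimal sketch of that procedure, assuming the league OPS and the forecaster's implied league OPS are already known. All numbers and function names here are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def league_corrector(league_ops, forecaster_league_ops):
    """The intercept b: actual league OPS minus the league OPS the forecast presumed."""
    return league_ops - forecaster_league_ops

def mean_abs_error(actual, adjusted):
    """Average of the absolute differences."""
    return sum(abs(a - f) for a, f in zip(actual, adjusted)) / len(actual)

def rmse(actual, adjusted):
    """Square root of the mean of the squared differences."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, adjusted)) / len(actual))

# Hypothetical values: b comes from the full forecast pool, not from the sample.
league_ops = 0.760
forecaster_league_ops = 0.750
b = league_corrector(league_ops, forecaster_league_ops)

forecast = [0.700, 0.800, 0.900]
actual = [0.730, 0.790, 0.910]

# Slope is fixed at 1, so the adjusted forecast is simply forecast + b.
adjusted = [f + b for f in forecast]

mae = mean_abs_error(actual, adjusted)
rmse_val = rmse(actual, adjusted)
print(b, mae, rmse_val)
```

Note that `b` is set before looking at `actual`, which is the whole point: the sample never gets to re-center the forecast.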
***
Average error: this includes a benefit to the forecasting system that guesses the right league OPS (or runs per game). Irrelevant. All salary and Fantasy dollars are essentially indexed to the league average. All you really care about is not OPS but OPS above league average. It’s the same darn thing. So, I’m glad that Nate did it, and I’m glad he pointed it out. But that benefit must be completely discarded.
RMSE: again, it’s the exact same issue as average error.
***
In short, RMSE and Average error ARE the correct ways to do it. But, only after applying the correction noted above. That said, the correlation would likely come out with a slope (m) of 1 for each system. And it would probably come out with the same “b” for each system, whether it used the sample of players, or the population of players.
When I finally get around to doing my evaluations (of some of those, plus BIS, and the Community Forecasts), you’ll see all this.
***
Finally, great job about throwing all the systems into the pot, and seeing which ones are “most unique”. Ideally, Marcel should have been a “0” here, since it has no uniqueness whatsoever. Every system should be building on it. That it doesn’t means that some systems are not learning.
However, with so many similar forecasting systems, a couple that are very similar could wipe each other out. For example, say that THT and Marcel are both very unique. If you had just one of them, the uniqueness coefficient would stand out. But because both are there, one, or both, gets knocked down in the correlation. What would be better (and longer) is doing head-to-head, and seeing which one explains the actual OPS more. In this case, you have to be careful about the slope and intercept, as explained above.
Finally, and this should be very easy for Nate: what is the correlation of ALL the forecasting systems against OPS?
Double-finally: how about correlation if you only look at last year’s stats? This was a very bad year for forecasting systems, as I explained in MGL’s earlier thread today. I’m thinking you might get a number right around .59 with your lower threshold.
Ouch, Rally reminded me of something that you have to do as well, and this will remove the need for a minimum playing time: the absolute error, weighted by PA.
So
abs(actual minus adjustedForecast) * actualPA
sum all that for all players, and divide by the sum of actualPA.
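As a minimal sketch of that calculation (the player records and field names are made up for illustration):

```python
def pa_weighted_abs_error(players):
    """Sum of abs(actual - adjustedForecast) * actualPA, divided by the sum of actualPA."""
    weighted = sum(
        abs(p["actual_ops"] - p["adjusted_forecast"]) * p["actual_pa"]
        for p in players
    )
    total_pa = sum(p["actual_pa"] for p in players)
    return weighted / total_pa

# Hypothetical players: a regular and a part-timer with a big miss.
players = [
    {"actual_ops": 0.800, "adjusted_forecast": 0.780, "actual_pa": 600},
    {"actual_ops": 0.650, "adjusted_forecast": 0.750, "actual_pa": 50},
]

error = pa_weighted_abs_error(players)
print(error)  # (0.020 * 600 + 0.100 * 50) / 650
```

The part-timer's 100-point miss counts for only 50 PA out of 650, which is why no minimum playing time cutoff is needed.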
For guys that don’t have a forecast, you have two ways to do it. One, give them an OPS of league average (that’s what Marcel does).
A second way is to give them a low OPS, say 100 points below league average. This kills something like Marcel, which it should!
***
Another thing, this doesn’t help for Fantasy forecasts, where playing time IS part of the forecast. Therefore, what you actually want to do is:
1. (actual OPS minus league OPS) * actualPA
2. (forecast OPS minus forecast League OPS) * forecastPA
3. abs(1 minus 2)
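The three steps above can be sketched as follows. The function and parameter names are hypothetical, and the numbers are invented for the example.

```python
def production_error(actual_ops, league_ops, actual_pa,
                     forecast_ops, forecast_league_ops, forecast_pa):
    """Absolute gap between actual and forecast production above league average."""
    # Step 1: actual production above average, scaled by actual playing time.
    actual_value = (actual_ops - league_ops) * actual_pa
    # Step 2: forecast production above the forecast's own league average,
    # scaled by forecast playing time.
    forecast_value = (forecast_ops - forecast_league_ops) * forecast_pa
    # Step 3: absolute difference between the two.
    return abs(actual_value - forecast_value)

# Hypothetical player: hit better and played more than forecast.
err = production_error(
    actual_ops=0.850, league_ops=0.760, actual_pa=600,
    forecast_ops=0.820, forecast_league_ops=0.750, forecast_pa=550,
)
print(err)  # |0.090 * 600 - 0.070 * 550|
```

Here a forecast gets dinged for missing on playing time as well as on OPS, since both feed into the same number.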
For guys who are missing, I think it’s more appropriate to do what Marcel does (give a standard 200 PA), and make the forecasted OPS 100 points under the league OPS.
I would also force the actual OPS at 100 points below the league average. Why do this? Because what we care about is the Fantasy dollars or salaries. And, those have floors. So, ideally, what we are trying to do is combine forecasted OPS and forecasted PA into dollar figures.
If it makes it easier for people to follow/understand that we should convert the OPS and PA figures into dollars first (which have an obvious floor of zero), then do that.