THE BOOK--Playing The Percentages In Baseball

Thursday, October 04, 2007

Forecast Evaluations

By Tangotiger, 11:37 AM

Thankfully, there’s always somebody one step ahead of me, which saves me a lot of time.  This time, it’s Nate Silver evaluating various forecasts, including Marcel.  He gives you a few different ways to evaluate things (which is good), but doesn’t give you the best one.  Let me explain:


He runs through the standard ones, and it’s good that he does: correlation, average error, RMSE.  That’s exactly how I would have started.  But I’m going to tell you why each of those, as applied, is wrong.

***

Correlation: the standard equation is y=mx+b.  That’s what we learned in high school.  You have your sample points (x, or the forecast), and you have a slope (m) and intercept (b) to best-fit it to the actual result (y, or OPS).  Let me make this clear: in what we are trying to do here, evaluate the forecasts, m MUST be 1.  It must!  If you were to chop in half every single forecast that Marcel made, do you know what would happen to its correlation?  Nothing!  The above equation would compensate by doubling the slope. 
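To see the scaling point concretely, here is a minimal sketch in Python. The five forecast and actual OPS values are made up purely for illustration (they are not Marcel's numbers), and `pearson` and `best_fit_slope` are hypothetical helper names.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def best_fit_slope(xs, ys):
    """Least-squares slope m in y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    return cov / vx

# Illustrative numbers only, not real forecasts.
forecast = [0.700, 0.750, 0.800, 0.850, 0.900]
actual = [0.720, 0.730, 0.810, 0.820, 0.930]

# Chop every forecast in half.
halved = [f / 2 for f in forecast]

r_original = pearson(forecast, actual)
r_halved = pearson(halved, actual)
m_original = best_fit_slope(forecast, actual)
m_halved = best_fit_slope(halved, actual)

print(r_original, r_halved)   # the two correlations agree
print(m_original, m_halved)   # the fitted slope doubles to compensate
```

The halved forecasts are badly wrong for every player, yet the correlation cannot tell the two sets apart. That is why the slope has to be pinned at 1 before you judge anything.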

With m firmly set to 1, all that is left is the intercept.  And that is simply a league corrector.  HOWEVER, it must be set PRIOR to seeing the sample data.  Once again, imagine that the players that make up the sample pool are NOT representative of the entire forecasted pool of players (and they are not).  What happens if you fit the intercept on the sample?  You are applying a league corrector that forces the mean of your limited sample’s forecasts to the mean of those same players’ actual totals.  What you should do is get the entire set of player forecasts and infer the league OPS that the forecast presumed.  And it is THAT value that makes up the intercept (b).

That is, after all is said and done, you have m=1 and b = lgOPS minus forecasterLgOPS.  Once you have that, all that is left is to compare the actual OPS to the forecast OPS plus “b”.  For that comparison, you have two choices: the average absolute difference, and the RMSE (square root of the mean of the squared differences).  I prefer the former, because salary and fantasy dollars are linearly related to production.
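A rough sketch of that procedure in Python follows. The function names and all the numbers are hypothetical. It also assumes the league OPS a forecaster presumed can be inferred as a simple unweighted mean of its full forecast pool; a playing-time-weighted mean would probably be closer to the mark.

```python
import math

def league_offset(actual_league_ops, full_pool_forecasts):
    """b = lgOPS minus forecasterLgOPS, set from the FULL forecast pool.

    Assumption: the forecaster's implied league OPS is the unweighted
    mean of every forecast it published, not just the sampled players.
    """
    forecaster_lg_ops = sum(full_pool_forecasts) / len(full_pool_forecasts)
    return actual_league_ops - forecaster_lg_ops

def evaluate(sample_forecasts, sample_actuals, b):
    """Compare actual OPS to forecast OPS + b, with the slope fixed at 1."""
    errors = [a - (f + b) for f, a in zip(sample_forecasts, sample_actuals)]
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_abs_error, rmse

# Illustrative numbers only.
full_pool = [0.700, 0.740, 0.760, 0.800, 0.850]  # every forecast the system made
actual_league_ops = 0.760                        # known league OPS

# The intercept is fixed before looking at the evaluation sample.
b = league_offset(actual_league_ops, full_pool)

# The evaluation sample: only the players who met the playing-time cutoff.
sample_forecasts = [0.800, 0.850]
sample_actuals = [0.780, 0.860]

mean_abs_error, rmse = evaluate(sample_forecasts, sample_actuals, b)
print(b, mean_abs_error, rmse)
```

The design point is the order of operations: `b` comes from the whole forecast pool and the known league level, so the sample never gets a chance to re-center the forecasts on its own mean.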

***

Average error: this rewards the forecasting system that guesses the right league OPS (or runs per game).  Irrelevant.  All salary and Fantasy dollars are essentially indexed to the league average.  All you really care about is not OPS but OPS above league average.  It’s the same darn thing.  So, I’m glad that Nate did it, and I’m glad he pointed it out.  But the uncorrected version must be completely discarded.

RMSE: again, it’s the exact same issue as average error.

***

In short, RMSE and Average error ARE the correct ways to do it.  But, only after applying the correction noted above.  That said, the correlation would likely come out with a slope (m) of 1 for each system.  And it would probably come out with the same “b” for each system, whether it used the sample of players, or the population of players.

When I finally get around to doing my evaluations (of some of those, plus BIS, and the Community Forecasts), you’ll see all this.

***

Finally, great job throwing all the systems into the pot, and seeing which ones are “most unique”.  Ideally, Marcel should have been a “0” here, since it has no uniqueness whatsoever.  Every system should be building on it.  That it isn’t a “0” means that some systems are not learning.

However, with so many similar forecasting systems, a couple that are very similar could wipe each other out.  For example, say that THT and Marcel are both very unique.  If you had just one of them, its uniqueness coefficient would stand out.  But because both are there, one, or both, gets knocked down in the correlation.  What would be better (and longer) is doing head-to-head comparisons, and seeing which one explains the actual OPS more.  In this case, you have to be careful about the slope and intercept, as explained above.

Finally, and this should be very easy for Nate: what is the correlation of ALL the forecasting systems against OPS? 

Double-finally: how about the correlation if you only look at last year’s stats?  This was a very bad year for forecasting systems, as I explained in MGL’s earlier thread today.  I’m thinking you might get a number right around .59 with your lower threshold.

(60) Comments • 2008/02/25 • SabermetricsForecasting