Friday, July 23, 2010
How to evaluate a forecasting system, using David Price as our test case
I have the pre-season ordered draft list from Marcel The Monkey Forecasting System (The Marcels) that has David Price ranked as #143. I also have twenty other pro forecasting systems (The Pros) that have him ranked as high as #153 and as low as #325. The consensus rank was #216. So, Marcel has him the highest.
David Price, as of the All-Star break, has 77 Fantasy Points, which ranks him #17 in MLB. These are the sixteen players ahead of him:
105 P Jimenez, Ubaldo
103 P Wainwright, Adam
99 OF Crawford, Carl
96 P Johnson, Josh
96 OF Hamilton, Josh
95 1B Cabrera, Miguel
87 P Halladay, Roy
86 P Lester, Jon
84 OF Rios, Alex
83 1B Guerrero, Vladimir
82 1B Pujols, Albert
82 IF Wright, David
82 1B Votto, Joey
82 OF Gonzalez, Carlos
80 IF Cano, Robinson
80 P Latos, Mat
When it comes to evaluating the forecasting systems, how should we evaluate their forecasts of David Price? Let me give you some choices, none of which you would be wrong to choose:
CHOICE 1.
Only Marcel gets credit for its evaluation of David Price. In the real world of Fantasy Baseball (or MLB for that matter), David Price can only go to one team. And, since Marcel likes him the most out of the Pros, then Marcel would be the one that would get him on the team.
CHOICE 2.
Give Marcel most of the credit, but give some credit to the other systems. Indeed, when I run 1000 simulated drafts of these 22 forecasting systems, Marcel ends up owning him 847 times. Another Pro ends up owning him 143 times. A third Pro owns him 6 times, and a fourth Pro owns him 4 times. The other 18 Pros never owned David Price. One of the Pros ranked him #171 and never got him. He may as well have ranked him #325, because the end result would have been the same.
You may be thinking: Marcel ranks him #143 (say that’s 16 Fantasy$) and someone else ranks him #171 (say that’s 14$), so, it’s “not fair” that Marcel gets a substantial credit. Well, that may be so, but because Marcel ranked him a bit higher, Marcel is the one who will get him most of the time.
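To make the mechanism concrete, here is a minimal Python sketch of a simulated snake draft. It is not the simulation behind the numbers above. The three systems (A, B, C), their six-player rankings, and the function names are all made up for illustration, and it assumes each system simply takes its highest-ranked player still available.

```python
import random
from collections import Counter

# Hypothetical toy rankings (best first). These are NOT the real forecasts.
RANKINGS = {
    "A": ["p1", "price", "p2", "p3", "p4", "p5"],  # likes Price the most
    "B": ["p1", "p2", "price", "p3", "p4", "p5"],
    "C": ["p1", "p2", "p3", "price", "p4", "p5"],  # likes Price the least
}

def snake_draft(rankings, order, rounds):
    """Run one snake draft and return a {player: owning system} mapping."""
    taken = set()
    owner = {}
    for rnd in range(rounds):
        # Odd-numbered rounds (0-based) reverse the draft order.
        picks = order if rnd % 2 == 0 else list(reversed(order))
        for system in picks:
            # Assumption: a system always takes its top remaining player.
            choice = next(p for p in rankings[system] if p not in taken)
            taken.add(choice)
            owner[choice] = system
    return owner

def ownership_counts(rankings, target, n_sims=1000, rounds=2, seed=0):
    """Count how often each system ends up owning the target player."""
    rng = random.Random(seed)
    counts = Counter()
    systems = list(rankings)
    for _ in range(n_sims):
        order = systems[:]
        rng.shuffle(order)  # random draft position each simulation
        owner = snake_draft(rankings, order, rounds)
        if target in owner:
            counts[owner[target]] += 1
    return counts

counts = ownership_counts(RANKINGS, "price")
print(counts)
```

With these toy rankings, working through the six possible draft orders by hand gives Price to A in four of them and to B and C in one each. The system that likes him most gets him most of the time, but not always, which is the same pattern as the 847-of-1000 result.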
CHOICE 3.
Give credit more proportionally. If David Price has turned in, say, a 28$ performance, and Marcel evaluated him at 16$, then Marcel was off by 12$. The other forecasters were off by 13$ to, say, 20$. So, that’s the “error” they get. This would basically be the traditional way of measuring the value of a forecasting system.
But, it diminishes what Marcel really got out of the deal. After all, he was placed in a league with the same 21 Pros, and he did get him 847 out of 1000 times. Why would Marcel get an error value of 12$, while someone who got him only 143 times gets an error value of 13$?
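The arithmetic in Choice 3 can be sketched in a few lines of Python. The 28$ actual value and Marcel's 16$ come from the example above. The other two forecasters and their dollar values are hypothetical, picked only to reproduce the 13$ and 20$ errors mentioned.

```python
def dollar_error(forecast_value, actual_value):
    """Absolute miss, in Fantasy dollars, between forecast and outcome."""
    return abs(actual_value - forecast_value)

ACTUAL_VALUE = 28  # the "say, 28$" performance from the example

# Marcel's 16$ is from the text; "Pro X" and "Pro Y" are made-up entries.
forecasts = {"Marcel": 16, "Pro X": 15, "Pro Y": 8}

errors = {name: dollar_error(value, ACTUAL_VALUE)
          for name, value in forecasts.items()}
print(errors)
```

Note how close the first two errors are. That is exactly the objection raised above: a 1$ gap in error hides a very large gap in who actually ends up with the player.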
CHOICE 4.
What if Marcel was not part of the league? After all, the Pros are not competing against all the other Pros in one league. Indeed, we never know what kind of competition we are going to be faced with. So, what if we just do a head-to-head competition of two Pros against each other?
If I pit Marcel in 21 two-man leagues, one against each Pro, where Marcel drafts first, and then another 21 leagues where Marcel drafts second, Marcel ends up with David Price in 42 of its 42 leagues. We can then look at another Pro (Pro ID #115), and we see that he wins Price in 40 of his 42 leagues (that is, in every league that Marcel is not in). What this does is give a proportional value that is more indicative of how strongly each forecaster wanted Price. Here is how many times each of the 22 Pros gets Price (leaders and trailers only):
picks  rank#  Pro ID
   42    143  217 (Marcel)
   40    153  115
   37    174  102
   37    171  127
  ...
    6    279  120
    4    291  111
    2    309  125
    0    325  108
So, Marcel gets 42 shares of Price’s 77 points, Pro ID #115 gets 40 shares, and the forecaster who put Price at #325 gets no shares of Price.
It is similar in concept to the proportional valuing of choice #3 above, except it is stretched out more.
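Here is one way the share idea could be turned into numbers, as a rough Python sketch. The pick counts, the 42 leagues per system, and Price's 77 points are from the table above. The conversion itself (credit scales linearly with the fraction of leagues won) is my assumption for illustration, not a settled formula, and `share_credit` is a hypothetical helper name.

```python
PRICE_POINTS = 77        # Price's Fantasy Points at the All-Star break
LEAGUES_PER_SYSTEM = 42  # 21 opponents, drafting first and second

# Pick counts from the table (leaders and trailers only), keyed by Pro ID.
picks = {
    "217 (Marcel)": 42,
    "115": 40,
    "102": 37,
    "127": 37,
    "120": 6,
    "111": 4,
    "125": 2,
    "108": 0,
}

def share_credit(picks_won, points, leagues=LEAGUES_PER_SYSTEM):
    """Assumed rule: credit = points * (leagues won / leagues played)."""
    return points * picks_won / leagues

credit = {pro_id: share_credit(n, PRICE_POINTS) for pro_id, n in picks.items()}
for pro_id, value in credit.items():
    print(f"{pro_id:>14}: {value:5.1f}")
```

Under that assumption Marcel is credited with all 77 points, Pro ID #115 with a bit over 73, and the forecaster who never gets Price with zero.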
CHOICE 5.
Give no value to any of the systems for David Price. 16$ for David Price seems pretty low. That’s not to say it’s incorrect. That the most extreme of the Pros put him as high as 16$ is important. But, is that how much Price would actually go for in a real league? What if we put one of the Pros (say Marcel) in a league with 21 regular Joes? Is it conceivable that every one of those Joes will have Price at 15$ or less?
I looked at the 21 Fangraphs reader submissions that had the most Rays selections. Of those, seven had Price ranked higher than Marcel did. The overall average of all the Fangraphs readers was pretty much a match for the overall average of all the Forecasters. Put simply, while the Pros had Price in the 10-15$ range, the Joes had him in the 5-20$ range. Put any one of the Pros in a league with 21 Joes, and that Pro never gets Price. Indeed, running simulations confirms this. Think of the winner’s curse, where out of 21 somewhat rational bidders, at least one of them will win out against any one of the 21 machines it is pitted against, each of which is beholden to its rigid rankings.
Conclusion
Is it that important that Marcel ranked Price higher than all the other Pros? Well, that’s really the question. How important could it be, if one-third of its real-life competition (the Joes) put him higher? The only time Marcel can own David Price is if he’s in a league of other like-minded folks, one that has regression as its driver and has no optimistic participants to bid against.
So, the floor is yours. Answer these questions:
a) What exactly is it that we are trying to do?
b) Why is it that you are trying to do that?
c) How do you go about doing that? (Choose one of the above five choices, or create your own sixth one.)
What I do NOT want you to do is come in having already decided what you want, and then work backwards as to why your process is the one that works best. That’s what politicians do. In this blog, we’re in the process of presenting objectives, and then finding solutions to meet those objectives.


I would use choice 3 or 4, or some similar variation (like correlation of pro’s ranking and actual season ranking). That is, rate the systems based on all their choices, not just those players they mainly draft/buy.
Why? Because random results will have too much influence on your ratings if you don’t do this. In any given season, a handful of big overperformers (Price, Johnson) and underperformers (mainly high draft choices who get hurt) will have an outsize impact on the results. The pros that picked/avoided these players may have known a little bit more about them, but not THIS much more. If you had dozens of seasons to rate these pros, it would all wash out eventually—but you don’t.
What we really want to know is the underlying talent of the pros and their systems. By looking at ALL their rankings, you effectively give yourself a bigger sample size of each pro’s “talent.” But with choices 1 or 2, you are essentially using only a very small sample.
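As a rough illustration of the "correlation of pro's ranking and actual season ranking" idea, here is a small Python sketch of Spearman's rank correlation. It assumes both inputs are complete rankings from 1 to n over the same players with no ties, which is the case where the simple formula rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)) applies. The four players and their ranks are invented for the example.

```python
def spearman_rho(forecast_rank, actual_rank):
    """Spearman rank correlation for two tie-free rankings of the same players.

    Both arguments map player -> rank (1 = best). Assumes ranks are 1..n
    with no ties; with ties, a tie-aware method is needed instead.
    """
    players = list(forecast_rank)
    n = len(players)
    d2 = sum((forecast_rank[p] - actual_rank[p]) ** 2 for p in players)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical toy data: one pro's pre-season ranks and the season's ranks.
forecast = {"a": 1, "b": 2, "c": 3, "d": 4}
actual = {"a": 2, "b": 1, "c": 3, "d": 4}

rho = spearman_rho(forecast, actual)
print(round(rho, 3))
```

Because every player contributes to the sum, one big overperformer moves the score only a little, which is the sample-size point made above.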