Monday, June 08, 2009
David Gassko versus THT Forecasting system
In an article written before the season started, David presented 29 hitters and pitchers who he thought would over- or under-perform their THT projections. This is interesting since I think that David plays a big part in these forecasts. As with most forecasting systems, I think that a computer essentially spits out each player’s projection and I don’t think that a human being, including David, tweaks them before they are released. I could be wrong, and David can correct me if I am.
Anyway, I either told David or made a mental note that I was going to check his “intuition” at season’s end. I am always skeptical of claims along the lines of “I can beat a good forecast system just by looking at the forecasts.” They are similar to the “players who I think will break out or fizzle” articles, although it depends on who is authoring these articles or proclamations. David is a very smart guy and a very good sabermetrician - he should be working for a team someday, if he wants to of course. He has already done some work for some teams.
On the other hand, maybe it is not so hard for a good analyst or even a good scout or insider to take a list of projections and pick out the ones that are not very good. Being able to do that is very different from coming up with good projections of your own. In other words, a scout might easily be able to accurately identify 10 or 20 bad projections but he would likely get killed by a good projection system if he had to project every player. I wish we had done some kind of survey where we had analysts look at a good projection system like THT, Pecota, Chone, Oliver, ZIPS, etc., and try to identify the top 10 or 20 projections that they thought were “wrong.” Because most projection systems are mechanical, that may not be all that hard to do. Of course, I am saying that after I have already checked on David’s picks. It would also probably depend on whether the projection system just spits out the numbers from a computer, as I think THT (and most of the others) does, or whether someone already goes through and tweaks them before they are released. For example, if David had tweaked the THT projections, presumably he would not have any players to choose as being over- or under-rated.
Anyway, here is the good news and the bad news for David:
The good news is that out of the 14 hitters he had on his list (I did not include Alex Gordon as he only has a few PA), he was correct on 10 of them and wrong on 4 of them. For each player, he said whether they would likely over- or under-perform their THT projection.
For pitchers, he went 8-3.
So overall, he was 18-7, which is quite impressive.
Or is it?
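One rough way to put an 18-7 record in context is to ask how often pure luck would do at least that well. The sketch below assumes each pick is an independent coin flip (a 50% chance of landing on the right side of the THT projection), which is my simplifying assumption and not something the article claims. The function name `prob_at_least` is just for illustration.

```python
from math import comb

def prob_at_least(hits: int, picks: int, p: float = 0.5) -> float:
    """Chance of getting at least `hits` of `picks` calls right
    when each call is independently right with probability `p`."""
    return sum(
        comb(picks, k) * p**k * (1 - p) ** (picks - k)
        for k in range(hits, picks + 1)
    )

# 18 correct out of 25 picks, against a coin-flip baseline (assumed).
p_value = prob_at_least(18, 25)
print(f"P(at least 18 of 25 by luck) = {p_value:.3f}")
```

Under those assumptions the chance comes out to roughly 2%, so the record is unlikely to be pure luck. That says nothing about whether the picks were hard to make in the first place, which is the real question here.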
He says in the article that he didn’t look at any other projection system before he chose these players. I have no reason not to believe him.
On the other hand, you have to wonder whether David is so smart or perhaps the THT system is so bad, or at least obviously bad on some of the players, that it was easy for David to come up with 25 players that it was likely to get wrong. Keep in mind that I think that the THT projections are very good, and they have probably fared well in the various “projection evaluations.” But, as I said, for various reasons, it is probably fairly easy and commonplace for even a good projection system to get some players obviously wrong, if that projection system is “automated.” If the projection system includes tweaks by knowledgeable human beings, then by definition, it is not so easy to get any players obviously wrong (if it is obvious, they would be corrected by those human beings, right?). Again, I think that THT is an “automated” projection system with little if any “human tweaking.”
Anyway, to see if David was really smart or the THT projections on his 25 players were just “bad,” I compared the THT projections to my projections for those 25 players. I have an independent projection system which is basically a “Marcel” with a lot of normalization (park, age, and opponent adjustments). There is nothing special about my projection system whatsoever. As I said, it is just a Marcel which is more finely tuned than a basic Marcel. In fact, the strength of my projections is my park adjustments, but for most projections you don’t need any park adjustments, unless a player changes teams, which only happens 10-15% of the time or so (just a guess). In fact, in David’s list, almost none of the players changed teams from 08 to 09, which makes it even easier to project them.
So here is what I did:
If David thought that a player would outperform or under-perform his THT projection and so did my projection, I called that a “cheat” for David. If his subjective evaluation and my projection disagreed with one another (for example, he thought a player would do better than THT’s projection and I thought he would do worse), then I called that a “non-cheat.” To be clear, I don’t think that David cheated in any way by looking at anyone else’s projection, and he didn’t have access to my projections anyway, as they are not publicly available (for no particular reason, BTW).
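The rule above can be sketched in a few lines. This is only an illustration of the labeling logic as I have described it: the function names are made up, the numbers in the usage lines are hypothetical and do not belong to any real player, and a tie between my projection and THT's is arbitrarily counted as "under."

```python
def side_of_tht(value: float, tht: float, higher_is_better: bool) -> str:
    """Return 'over' if `value` is better than the THT projection, else 'under'.
    For OPS higher is better; for ERA lower is better."""
    better = value > tht if higher_is_better else value < tht
    return "over" if better else "under"

def classify_pick(david_call: str, tht: float, mine: float,
                  higher_is_better: bool = True) -> str:
    """'cheat' when my projection sits on the same side of THT as David's call,
    'non-cheat' when the two disagree."""
    same_side = side_of_tht(mine, tht, higher_is_better) == david_call
    return "cheat" if same_side else "non-cheat"

# Hypothetical hitter: THT says .800 OPS, I say .830, David says "over".
print(classify_pick("over", tht=0.800, mine=0.830))
# Hypothetical pitcher: THT says 4.00 ERA, I say 4.20, David says "over".
print(classify_pick("over", tht=4.00, mine=4.20, higher_is_better=False))
```

The first call is labeled a cheat because both David and my projection are more optimistic than THT. The second is a non-cheat because a higher ERA from me means I am more pessimistic than THT while David is more optimistic.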
So here is the tally:
Hitters
Cheats: 10
Non-cheats: 4
Of the 4 players on which David and I differed (relative to the THT projections, of course), David was right on 3 of them (Howard, Ichiro, and Cano), and I was right on one of them, Miggie Cabrera.
Pitchers
Cheats: 9
Non-cheats: 2
Of the non-cheats, David was right on one of them, Greinke, and I was right on one of them, Volquez.
So while I still think that David is a really smart guy, I am going to conclude that it is probably not that hard to identify bad projections from any system. Of the 25 players that David identified as “bad” projections from THT, I agreed with him on 19, which suggests that those were truly bad projections on THT’s part and not necessarily great insight on David’s part.
On the other hand, being able to look at THT’s projections, without looking at any other projection system, and come up with 25 or so players that THT got “obviously” wrong may be an impressive feat after all.
If someone has some spare time, maybe they can do the same thing with a few other projection systems like Chone or Marcel. If they mostly agree with David, as mine did, then we can probably just take any projection system and assume that if the other projection systems all disagree with a player’s projection, then that projection is likely to be wrong.
Tango has a boatload of projections. He can probably come up with a list of 20 or so projections for each system that the other systems disagree with. I would guess that we would find that a majority of those players had “bad” projections in that one system.
If I remember, I will revisit David’s picks at the end of the season, as obviously we will have larger sample sizes to work with.


Mickey,
You’re stealing my fire! I just updated the spreadsheet I’ve been using to keep track, and was patting myself on the back. After all, through Saturday’s games, my results were as follows:
Hitters
Out-perform: .779 OPS (Proj.), .826 (Act.), +.047 Diff
Under-perform: .927, .822, -.105
In other words, the hitters I thought would out-perform their projections indeed have, by 47 points of OPS, and those I thought would under-perform have done the same, by a whopping 105 points. Indeed, though the group I thought would under-perform was projected to be 150 points better than the group I thought would over-perform, they’ve been exactly equal more than two months into the season. What about the pitchers?
Pitchers
Out-perform: 4.29 ERA, 3.43, -0.85
Under-perform: 3.69, 4.98, +1.30
The difference here is even greater. The group I thought would be better than their projections has been better to the tune of 85 points of ERA and the group I thought would be worse has under-performed by a whopping 1.3 earned runs per nine. In fact, though the “under-perform” group was initially projected to be 0.6 runs better than the “over-perform group,” they’ve instead been 1.55 runs worse!
In all, this implies a pretty big win for me. But, I agree, it’s possible that I cherry-picked the worst THT projections. Even if our system is very good (and since I designed it, I have to believe it is!) it could be that it misses on certain fairly easy to identify players. So what I did yesterday is add a control group, namely the CHONE projections. Since I did not look at the CHONE projections in picking which players I thought THT had missed, they can be used as an independent control; technically, they shouldn’t show any bias. So how does CHONE compare to my picks?
Hitters
Over-perform: .797 (Proj.), .826 (Act.), +.029 (Diff)
Under-perform: .890, .822, -.068
CHONE’s projections for each group were a little more conservative than THT’s (as we would expect, since I chose the THT projections I thought looked the worst - if I had done this experiment with CHONE and used THT as the control, we would see the same thing). Still, CHONE projected more than a 90 point difference between the two groups, when in actuality, two months into the season, there is none. Indeed, the “over-perform” group is in fact out-performing their projection by around 30 points and the “under-perform” group is in fact under-performing their projection by almost 70 points. What about the pitchers?
Pitchers
Over-perform: 3.94, 3.43, -0.51
Under-perform: 3.77, 4.98, +1.21
Again, the CHONE projections are a bit closer together (though I should note that they actually saw the under-perform group about the same as THT did), seeing around a 0.2 run gap between the two groups. Instead, the gap has been 1.55 runs - but in the other direction! CHONE still has the “over-perform” group projected half-a-run too high, and the “under-perform” group 1.2 runs too low!
The CHONE results are a huge win for me (with the caveat that a lot can change over the next four months). They suggest that either I have some magical ability to spot breakouts and collapses, or (more likely) that all computer-based systems, even the best ones, can be improved with human input. That is, the best projection is not one spit out by the computer, but one that is then modified by subjective opinion, or at least my subjective opinion.
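The group comparisons above come down to simple subtraction, and a short sketch makes it easy to redo them for THT and CHONE side by side. The projected and actual figures are copied from the numbers quoted above; the dictionary layout and the function name `summarize` are just illustrative choices. Because the inputs are already rounded, a result can differ by 0.01 from a quoted difference (for example 0.86 versus 0.85), presumably because the quoted figures came from unrounded data.

```python
# (projected, actual) for each group, copied from the figures quoted above.
GROUPS = {
    ("THT", "hitters OPS"): {"over": (0.779, 0.826), "under": (0.927, 0.822)},
    ("THT", "pitchers ERA"): {"over": (4.29, 3.43), "under": (3.69, 4.98)},
    ("CHONE", "hitters OPS"): {"over": (0.797, 0.826), "under": (0.890, 0.822)},
    ("CHONE", "pitchers ERA"): {"over": (3.94, 3.43), "under": (3.77, 4.98)},
}

def summarize(groups):
    """Actual-minus-projected for each group, plus the gap between the
    'under' and 'over' groups as projected and as actually observed."""
    rows = {}
    for key, g in groups.items():
        over_proj, over_act = g["over"]
        under_proj, under_act = g["under"]
        rows[key] = {
            "over_diff": round(over_act - over_proj, 3),
            "under_diff": round(under_act - under_proj, 3),
            "projected_gap": round(under_proj - over_proj, 3),
            "actual_gap": round(under_act - over_act, 3),
        }
    return rows

for (system, stat), row in summarize(GROUPS).items():
    print(system, stat, row)
```

Note that the sign convention depends on the stat: a positive difference is good news for OPS and bad news for ERA.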
That was exactly the hypothesis of my article and it would be a HUGE result to learn that this is the case. Again, we won’t know for sure until I publish my final results in October, but so far it’s looking good.