Monday, November 21, 2011
BPro readers’ votes: #1 not sabermetrics, #2 sabermetrics
Following are the fWAR (FanGraphs), rWAR (Rally via Baseball-Reference), and WARP (Baseball Prospectus, though I would call it pWAR) for the top 16 starting pitchers in the AL IBA voting, along with their IBA points:
fWAR rWAR WARP IBA Pitcher
7.0 8.6 6.7 5980 Verlander
7.1 6.9 4.4 2929 CC
5.6 6.6 5.5 2726 Weaver
4.9 6.1 3.8 1421 Shields
6.4 4.0 4.5 587 Haren
5.9 5.0 4.2 405 Wilson
5.5 4.7 2.8 202 Felix
4.3 6.2 3.6 158 Beckett
4.7 3.7 4.0 148 Price
5.6 5.7 3.9 76 Fister
2.9 5.9 2.5 69 Romero
3.5 5.0 2.7 40 Gio
3.7 4.8 2.4 32 Lester
1.4 4.2 1.6 31 Hellickson
4.9 4.1 3.9 27 Masterson
4.7 3.7 3.3 23 McCarthy
I then standardized those numbers so that they can be compared directly. I forced the mean of each of the three WAR systems to 4.65 for those pitchers. And I used a logarithmic function to convert the IBA points into a “presumed” WAR: 0.636 times ln(IBA points), plus 1.26. I get this:
fWAR rWAR WARP ln(IBA) Pitcher
6.8 7.9 7.6 6.8 Verlander
6.9 6.2 5.3 6.3 CC
5.4 5.9 6.4 6.3 Weaver
4.7 5.4 4.7 5.9 Shields
6.2 3.3 5.4 5.3 Haren
5.7 4.3 5.1 5.1 Wilson
5.3 4.0 3.7 4.6 Felix
4.1 5.5 4.5 4.5 Beckett
4.5 3.0 4.9 4.4 Price
5.4 5.0 4.8 4.0 Fister
2.7 5.2 3.4 4.0 Romero
3.3 4.3 3.6 3.6 Gio
3.5 4.1 3.3 3.5 Lester
1.2 3.5 2.5 3.4 Hellickson
4.7 3.4 4.8 3.4 Masterson
4.5 3.0 4.2 3.3 McCarthy
In all four cases, the mean is 4.65 WAR. The standard deviation for each of the first three was kept as-is (i.e., I did not modify the slope), while the standard deviation of the presumed WAR for IBA was forced to 1.14. To get that 1.14, I took the simple average of the three WAR systems for each pitcher, and then took the standard deviation of those 16 figures.
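The standardization can be roughly reproduced from the first table. The sketch below is only a check under assumptions that are not spelled out in the post: that each WAR system was shifted by a constant to hit the common mean, and that the 1.14 is a population (not sample) standard deviation. The variable names are made up for illustration.

```python
import math
from statistics import mean, pstdev

# (fWAR, rWAR, WARP, IBA points), copied from the first table in the post
raw = {
    "Verlander":  (7.0, 8.6, 6.7, 5980),
    "CC":         (7.1, 6.9, 4.4, 2929),
    "Weaver":     (5.6, 6.6, 5.5, 2726),
    "Shields":    (4.9, 6.1, 3.8, 1421),
    "Haren":      (6.4, 4.0, 4.5, 587),
    "Wilson":     (5.9, 5.0, 4.2, 405),
    "Felix":      (5.5, 4.7, 2.8, 202),
    "Beckett":    (4.3, 6.2, 3.6, 158),
    "Price":      (4.7, 3.7, 4.0, 148),
    "Fister":     (5.6, 5.7, 3.9, 76),
    "Romero":     (2.9, 5.9, 2.5, 69),
    "Gio":        (3.5, 5.0, 2.7, 40),
    "Lester":     (3.7, 4.8, 2.4, 32),
    "Hellickson": (1.4, 4.2, 1.6, 31),
    "Masterson":  (4.9, 4.1, 3.9, 27),
    "McCarthy":   (4.7, 3.7, 3.3, 23),
}

# Simple average of the three WAR systems for each pitcher
avg_war = [mean(row[:3]) for row in raw.values()]
target_mean = mean(avg_war)   # should land near the 4.65 quoted in the post
target_sd = pstdev(avg_war)   # assumption: population SD, near the quoted 1.14

# Assumption: shift each WAR system by a constant so its mean equals
# target_mean; the spread of each system is left untouched.
offsets = [
    target_mean - mean([row[i] for row in raw.values()]) for i in range(3)
]
standardized = {
    name: tuple(row[i] + offsets[i] for i in range(3))
    for name, row in raw.items()
}

# Linear transform of ln(IBA points) that has the target mean and SD
log_points = [math.log(row[3]) for row in raw.values()]
slope = target_sd / pstdev(log_points)
intercept = target_mean - slope * mean(log_points)
presumed_war = {
    name: slope * math.log(row[3]) + intercept for name, row in raw.items()
}

print(f"slope={slope:.3f} intercept={intercept:.2f}")
for name in raw:
    cells = " ".join(f"{v:.1f}" for v in standardized[name])
    print(f"{cells} {presumed_war[name]:.1f} {name}")
```

If those assumptions hold, the slope and intercept should come out close to the .636 and 1.26 used above, which is where those two constants would come from.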
Anyway, now that everything is standardized, the presumed WAR according to the IBA voters could be calculated as follows:
= 0.38 * rWAR
+ 0.26 * WARP
+ 0.25 * fWAR
+ 0.54
r=0.87
This means that rWAR is the voters’ preferred metric among the three, by about 1.5 to 1 over each of the other two.
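For anyone who wants to refit the equation, here is a minimal sketch of an ordinary least squares fit on the standardized table. It is not necessarily how the coefficients in the post were produced: it works from the rounded one-decimal values, so the results may differ a little from the figures quoted, and the helper names (`solve`, `pearson`) are hypothetical.

```python
from statistics import mean

# (fWAR, rWAR, WARP, presumed WAR from IBA), copied from the standardized table
rows = [
    (6.8, 7.9, 7.6, 6.8),  # Verlander
    (6.9, 6.2, 5.3, 6.3),  # CC
    (5.4, 5.9, 6.4, 6.3),  # Weaver
    (4.7, 5.4, 4.7, 5.9),  # Shields
    (6.2, 3.3, 5.4, 5.3),  # Haren
    (5.7, 4.3, 5.1, 5.1),  # Wilson
    (5.3, 4.0, 3.7, 4.6),  # Felix
    (4.1, 5.5, 4.5, 4.5),  # Beckett
    (4.5, 3.0, 4.9, 4.4),  # Price
    (5.4, 5.0, 4.8, 4.0),  # Fister
    (2.7, 5.2, 3.4, 4.0),  # Romero
    (3.3, 4.3, 3.6, 3.6),  # Gio
    (3.5, 4.1, 3.3, 3.5),  # Lester
    (1.2, 3.5, 2.5, 3.4),  # Hellickson
    (4.7, 3.4, 4.8, 3.4),  # Masterson
    (4.5, 3.0, 4.2, 3.3),  # McCarthy
]


def solve(a, b):
    """Solve the square system a.x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Design matrix: an intercept column plus the three WAR columns
X = [(1.0, f, r, w) for f, r, w, _ in rows]
y = [row[3] for row in rows]

# Normal equations: (X'X) beta = X'y
xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(4)] for j in range(4)]
xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(4)]
coefs = solve(xtx, xty)
intercept, b_fwar, b_rwar, b_warp = coefs

# Multiple r: correlation between fitted and actual presumed WAR
fitted = [sum(c * v for c, v in zip(coefs, xi)) for xi in X]
r = pearson(fitted, y)

print(f"fWAR={b_fwar:.2f} rWAR={b_rwar:.2f} WARP={b_warp:.2f} "
      f"intercept={intercept:.2f} r={r:.2f}")
```

The 1.5 to 1 reading comes from the ratio of the rWAR coefficient to each of the other two (0.38 / 0.26 and 0.38 / 0.25 in the equation above).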
The biggest divergence of opinion can be seen with Shields and Fister. fWAR had Fister comfortably ahead, WARP had him barely ahead, while rWAR had Shields somewhat ahead. Pretty much, they’d be considered even according to the three WAR systems: each came in at an average WAR of about 5.0. But the IBA results gave Shields a presumed 5.9 WAR and Fister a presumed 4.0 WAR.
Clearly, the IBA voters did not buy into the sabermetrics story here.
We can also see that with Masterson and Hellickson. According to fWAR, Masterson was 3.5 wins ahead, while WARP had him 2.3 wins ahead. rWAR had Hellickson barely ahead. IBA had them virtually tied.
System after system, you see the IBA voters reject their results. For fWAR followers, Fister should have had a higher showing, while Shields and Weaver should have been weaker. For rWAR followers, Beckett should have had a higher showing, while Haren, Price, and Wilson should have been weaker. For WARP followers… hmmm… the exact opposite of rWAR.
***
My bet (though untested at this point) is that we’d probably get a stronger r than 0.87 if we simply relied on the traditional stats and did not focus on the saber stats. IBA voters simply like to interpret stats their way, and not the WAR way. Shields’ complete games, for example, probably vault him.


How did you generate the coefficients above?
If you used a multiple simultaneous regression, then you can’t simply compare the coefficients to each other, because you have massive collinearity among the three variables. That’s a big stats no-no.
What you need to do to compare the different WARs is model comparison.
Also, I’m not sure how to go from r=.87 to saying they didn’t vote in accordance with advanced metrics. You can pick off anecdotal comparisons, but that’s still just anecdotal info, the plural of which is still just anecdotes. r=.87 is pretty damn high.
Finally (!), I think you’ll want to be using non-parametric stats, since the IBA votes are ranks and not normally distributed.
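A quick way to look at both the collinearity point and the non-parametric point is sketched below, using the raw table from the post. This is only one possible check: Spearman's rank correlation is swapped in here as the non-parametric statistic (the comment does not name one), and the helper names are hypothetical.

```python
import math
from statistics import mean

# Columns copied from the first table in the post, in pitcher order
fwar = [7.0, 7.1, 5.6, 4.9, 6.4, 5.9, 5.5, 4.3, 4.7, 5.6, 2.9, 3.5, 3.7, 1.4, 4.9, 4.7]
rwar = [8.6, 6.9, 6.6, 6.1, 4.0, 5.0, 4.7, 6.2, 3.7, 5.7, 5.9, 5.0, 4.8, 4.2, 4.1, 3.7]
warp = [6.7, 4.4, 5.5, 3.8, 4.5, 4.2, 2.8, 3.6, 4.0, 3.9, 2.5, 2.7, 2.4, 1.6, 3.9, 3.3]
iba = [5980, 2929, 2726, 1421, 587, 405, 202, 158, 148, 76, 69, 40, 32, 31, 27, 23]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Collinearity check: how strongly the three WAR systems track each other
systems = {"fWAR": fwar, "rWAR": rwar, "WARP": warp}
names = list(systems)
pairwise = {
    (a, b): pearson(systems[a], systems[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for (a, b), value in pairwise.items():
    print(f"Pearson {a} vs {b}: {value:.2f}")

# Rank-based agreement between the IBA vote and each WAR system
rho = {name: spearman(iba, values) for name, values in systems.items()}
for name, value in rho.items():
    print(f"Spearman IBA vs {name}: {value:.2f}")

# Rank statistics ignore monotone transforms, so the log conversion
# of the IBA points makes no difference here
rho_log = spearman([math.log(v) for v in iba], fwar)
```

One side effect of the rank-based approach is that the choice of a logarithmic conversion for the IBA points no longer matters, since any increasing transform leaves the ranks unchanged.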