THE BOOK--Playing The Percentages In Baseball

Monday, November 21, 2011

BPro readers votes: #1 not sabermetrics, #2 sabermetrics

By Tangotiger, 02:22 PM

Following are the rWAR (Rally via Baseball-Reference), fWAR (Fangraphs), and WARP (though I would call it pWAR, Baseball Prospectus) for the top 16 starting pitchers in the AL IBA, including IBA points:

fWAR      rWAR      WARP     IBA     Pitcher
 7.0      8.6      6.7     5980    Verlander
 7.1      6.9      4.4     2929    CC
 5.6      6.6      5.5     2726    Weaver
 4.9      6.1      3.8     1421    Shields
 6.4      4.0      4.5      587    Haren
 5.9      5.0      4.2      405    Wilson
 5.5      4.7      2.8      202    Felix
 4.3      6.2      3.6      158    Beckett
 4.7      3.7      4.0      148    Price
 5.6      5.7      3.9       76    Fister
 2.9      5.9      2.5       69    Romero
 3.5      5.0      2.7       40    Gio
 3.7      4.8      2.4       32    Lester
 1.4      4.2      1.6       31    Hellickson
 4.9      4.1      3.9       27    Masterson
 4.7      3.7      3.3       23    McCarthy

I then standardized those numbers so that they can be compared.  I forced the mean of each of the three WAR systems to 4.65 for those pitchers.  And I used a logarithmic function to convert the IBA points into a “presumed” WAR: .636 times LN(IBA points) + 1.26.  I get this:

fWAR      rWAR      WARP     ln(IBA)     Pitcher 
 6.8      7.9      7.6      6.8     Verlander
 6.9      6.2      5.3      6.3     CC
 5.4      5.9      6.4      6.3     Weaver
 4.7      5.4      4.7      5.9     Shields
 6.2      3.3      5.4      5.3     Haren
 5.7      4.3      5.1      5.1     Wilson
 5.3      4.0      3.7      4.6     Felix
 4.1      5.5      4.5      4.5     Beckett
 4.5      3.0      4.9      4.4     Price
 5.4      5.0      4.8      4.0     Fister
 2.7      5.2      3.4      4.0     Romero
 3.3      4.3      3.6      3.6     Gio
 3.5      4.1      3.3      3.5     Lester
 1.2      3.5      2.5      3.4     Hellickson
 4.7      3.4      4.8      3.4     Masterson
 4.5      3.0      4.2      3.3     McCarthy

In all 4 cases, the mean is 4.65 WAR.  The standard deviations for the first three were kept as-is (i.e., I did not modify the slope), while the standard deviation for the presumed WAR for IBA was forced to 1.14.  To get that 1.14, I took the simple average of the three WAR systems for each pitcher, and then took the standard deviation of those 16 figures.
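The standardization described above can be sketched in a few lines of Python. This is only a reconstruction under stated assumptions: the function names are made up for illustration, each WAR column is shifted by a constant so its mean is 4.65, and the population standard deviation (not the sample one) is assumed when matching the 1.14 target. Under those assumptions the slope and intercept should land close to the .636 and 1.26 quoted above.

```python
import math
import statistics

# (fWAR, rWAR, WARP, IBA points), copied from the first table
DATA = {
    "Verlander": (7.0, 8.6, 6.7, 5980), "CC": (7.1, 6.9, 4.4, 2929),
    "Weaver": (5.6, 6.6, 5.5, 2726), "Shields": (4.9, 6.1, 3.8, 1421),
    "Haren": (6.4, 4.0, 4.5, 587), "Wilson": (5.9, 5.0, 4.2, 405),
    "Felix": (5.5, 4.7, 2.8, 202), "Beckett": (4.3, 6.2, 3.6, 158),
    "Price": (4.7, 3.7, 4.0, 148), "Fister": (5.6, 5.7, 3.9, 76),
    "Romero": (2.9, 5.9, 2.5, 69), "Gio": (3.5, 5.0, 2.7, 40),
    "Lester": (3.7, 4.8, 2.4, 32), "Hellickson": (1.4, 4.2, 1.6, 31),
    "Masterson": (4.9, 4.1, 3.9, 27), "McCarthy": (4.7, 3.7, 3.3, 23),
}

TARGET_MEAN = 4.65  # mean stated in the post
TARGET_SD = 1.14    # standard deviation stated in the post


def shift_to_mean(values, target=TARGET_MEAN):
    """Add a constant so the mean equals target; the spread is untouched."""
    offset = target - statistics.fmean(values)
    return [v + offset for v in values]


def log_rescale(points, target_mean=TARGET_MEAN, target_sd=TARGET_SD):
    """Fit slope * ln(points) + intercept to a target mean and SD.

    Assumption: population standard deviation (statistics.pstdev).
    """
    logs = [math.log(p) for p in points]
    slope = target_sd / statistics.pstdev(logs)
    intercept = target_mean - slope * statistics.fmean(logs)
    return slope, intercept, [slope * x + intercept for x in logs]


rows = list(DATA.values())
fwar_std = shift_to_mean([r[0] for r in rows])
rwar_std = shift_to_mean([r[1] for r in rows])
warp_std = shift_to_mean([r[2] for r in rows])
slope, intercept, presumed = log_rescale([r[3] for r in rows])

# SD of each pitcher's simple average of the three WAR systems
avg_sd = statistics.pstdev([statistics.fmean(r[:3]) for r in rows])

print(f"slope={slope:.3f} intercept={intercept:.2f} avg_sd={avg_sd:.2f}")
for name, f, r, w, p in zip(DATA, fwar_std, rwar_std, warp_std, presumed):
    print(f"{f:5.1f} {r:5.1f} {w:5.1f} {p:5.1f}  {name}")
```

Shifting by a constant leaves each system's standard deviation as-is, which is why only the IBA column needs a slope.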

Anyway, now that everything is standardized, the presumed WAR according to the IBA voters could be calculated as follows:
= 0.38 * rWAR
+ 0.26 * WARP
+ 0.25 * fWAR
+ 0.54

r=0.87

This means that rWAR is the preferred metric among the three, by about a 1.5 to 1 margin over each of the other two.
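For anyone who wants to try fitting weights like these from the published table, here is a minimal sketch. It assumes an ordinary least-squares fit of the presumed WAR on the three standardized WAR columns, which the post does not actually state, so the numbers it prints may not match the coefficients or the r above. The helper names are hypothetical, and the input is the rounded standardized table.

```python
import math

# (fWAR, rWAR, WARP, presumed WAR from IBA), from the standardized table
ROWS = [
    (6.8, 7.9, 7.6, 6.8), (6.9, 6.2, 5.3, 6.3), (5.4, 5.9, 6.4, 6.3),
    (4.7, 5.4, 4.7, 5.9), (6.2, 3.3, 5.4, 5.3), (5.7, 4.3, 5.1, 5.1),
    (5.3, 4.0, 3.7, 4.6), (4.1, 5.5, 4.5, 4.5), (4.5, 3.0, 4.9, 4.4),
    (5.4, 5.0, 4.8, 4.0), (2.7, 5.2, 3.4, 4.0), (3.3, 4.3, 3.6, 3.6),
    (3.5, 4.1, 3.3, 3.5), (1.2, 3.5, 2.5, 3.4), (4.7, 3.4, 4.8, 3.4),
    (4.5, 3.0, 4.2, 3.3),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # partial pivoting: bring up the row with the largest entry
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pearson(u, v):
    """Plain Pearson correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def ols(rows):
    """Least-squares fit: presumed ~ b0 + b1*fWAR + b2*rWAR + b3*WARP."""
    X = [[1.0, f, r, w] for f, r, w, _ in rows]
    y = [row[3] for row in rows]
    k = len(X[0])
    # normal equations: (X'X) beta = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    return beta, fitted, pearson(y, fitted)


beta, fitted, r = ols(ROWS)
print("intercept, fWAR, rWAR, WARP:", [round(b, 2) for b in beta])
print("r =", round(r, 2))
```

Because the three WAR columns are correlated with each other, the individual weights from a fit like this can move around a lot with small changes in the data, so the ratio between them is best read loosely.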

The most divergent opinions can be seen with Shields and Fister.  fWAR had Fister comfortably ahead, WARP had him barely ahead, while rWAR had Shields somewhat ahead.  Pretty much, they’d be considered even according to the three WAR systems: each came in at an average WAR of about 5.0.  But the IBA results gave Shields a presumed 5.9 WAR and Fister a presumed 4.0 WAR.

Clearly, the IBA voters did not buy into the sabermetrics story here.

We can also see that with Masterson and Hellickson.  According to fWAR, Masterson was 3.5 wins ahead, while WARP had him 2.3 wins ahead.  rWAR had Hellickson barely ahead.  IBA had them virtually tied.

System after system, you see the IBA voters reject their results.  For fWAR followers, Fister should have had a higher showing, while Shields and Weaver should have been weaker.  For rWAR followers, Beckett should have had a higher showing, while Haren, Price, and Wilson should have been weaker.  For WARP followers… hmmm… the exact opposite of rWAR.

***

My bet (though untested at this point) is that we’d probably be able to get a stronger r than 0.87 if we simply relied on the traditional stats and did not focus on the saber stats.  IBA voters simply like to interpret stats their way, and not the WAR way.  Shields’ CG total, for example, probably vaults him.


#1    mettle      (see all posts) 2011/11/21 (Mon) @ 17:48

How did you generate the coefficients above?
If you used a multiple simultaneous regression, then you can’t simply compare the coefficients to each other because you have massive collinearity among the three variables. That’s a big stats no-no.
What you need to do to compare the different WARs is model comparison.

Also, I’m not sure how to go from r=.87 to saying they didn’t vote in accordance with advanced metrics. You can pick off anecdotal comparisons, but that’s still just anecdotal info, the plural of which is still just anecdotes. r=.87 is pretty damn high.

Finally (!), I think you’ll want to be using non-parametric stats since the IBA votes are ranks and not normal.


#2    Tangotiger      (see all posts) 2011/11/21 (Mon) @ 17:52

r=.87 seems pretty low to me, given the context of interpreting all known information, and the votes are based just about on all known information.

As for what I should have done: I published the data!  Feel free to do better.  I never understand the complaints of what I should have done, and then, those people don’t do it. The universe of data I used is there!


#3    mettle      (see all posts) 2011/11/21 (Mon) @ 18:10

Publish not just data, but method. That was the first part of the question.


#4    mettle      (see all posts) 2011/11/21 (Mon) @ 19:08

When I combine the WAR measures with wins, ERA, IP and Ks in a multiple simultaneous regression, the three WAR measures are significant (fWAR and rWAR at p<.001), while among the traditional stats only wins is significant, and only at p=.04.

I get r=.89 when I use the WAR measures in a regression and r=.85 with the traditional measures, which is obviously less (despite using 4 factors).
Combining everything, r=.92

Using a step-wise regression, which is basically a way of throwing everything in a pot and algorithmically figuring out which numbers are important yields:
IBA ~ .27 * rWAR + .39 * WARP + .41 * win
with r = .93. Note the presence of 2 WAR measures and one trad measure.

So, there’s certainly no evidence to suggest trad numbers are better, and I think there’s some evidence that SABR measures are more predictive.

Finally, the parametric point isn’t one that shows up as a comparison—it just is.


#5    Tangotiger      (see all posts) 2011/11/21 (Mon) @ 19:52

I wouldn’t necessarily just mix rate and counting stats like you are, and I would also include CG, because that’s a huge determinant for Shields especially.  That said, thanks for doing the work you did.  I definitely don’t want to do any more work myself.

As for methods: I showed the data and I described the methods.  Everything I did was reproducible.


#6    mettle      (see all posts) 2011/11/21 (Mon) @ 20:18

Given our different results, though, I’m still unclear on the method you used to calculate those coefficients and get your r. Did you do a simultaneous regression? A step-wise regression? I assumed you used parametric stats but you didn’t say. Is the r you’re reporting adjusted or unadjusted?

Why wouldn’t you mix those two types of stats? A lot of WAR work operates on the assumption they are all normal and a regression automatically normalizes them for you, so what’s the issue?

If anything CG is way too low and oddly distributed to be used on that point. Perhaps adding it in would be a useful exercise for someone else…

