Thursday, June 18, 2009
ERA, FIP, xFIP, tRA
Colin checks in to see how well they correlate.
Colin: if you have the r for that table, I’d love to see it.
I was thinking the same thing as Graham: it would have been nice if tRA* had been included. $10 says tRA* beats xFIP.
I was actually surprised by the result. I expected FIP to be the best estimator because it implicitly includes home park effects for HR, while tRA and xFIP don’t.
Yes, #2, and I would bet tRA and tRA* do even better when park effects are used on ERA.
I did something like this earlier. Looking at all pitchers with 300 TBF in consecutive years since 1970, I found the correlation of several ERA estimators in year N with the pitcher’s ERA in year N+1:
.397 DIPS ERA
.389 XERA
.387 FIP
.380 Base Runs ERA
.378 component ERA
.361 ERA
Of course, I couldn’t use xFIP/tRA without batted ball data.
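For anyone who wants to repeat this kind of check, here is a minimal sketch of the pairing-and-correlation step. The data layout (a dict keyed by pitcher and year), the function names, and the sample rows are all made up for illustration; only the 300 TBF cutoff and the "estimator in year N vs. ERA in year N+1" setup come from the comment above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def consecutive_pairs(seasons, metric, min_tbf=300):
    """Return (metric in year N, ERA in year N+1) for qualifying pitchers.

    `seasons` is a hypothetical layout: {(pitcher, year): {"tbf", "era", ...}}.
    Both seasons must reach `min_tbf` batters faced.
    """
    pairs = []
    for (pitcher, year), row in seasons.items():
        nxt = seasons.get((pitcher, year + 1))
        if nxt and row["tbf"] >= min_tbf and nxt["tbf"] >= min_tbf:
            pairs.append((row[metric], nxt["era"]))
    return pairs


# Made-up rows, not real pitchers.
seasons = {
    ("A", 2001): {"tbf": 700, "era": 3.50, "fip": 3.80},
    ("A", 2002): {"tbf": 650, "era": 3.90, "fip": 3.70},
    ("A", 2003): {"tbf": 200, "era": 5.00, "fip": 4.50},  # under 300 TBF
    ("B", 2001): {"tbf": 400, "era": 4.80, "fip": 4.40},
    ("B", 2002): {"tbf": 500, "era": 4.30, "fip": 4.60},
    ("C", 2001): {"tbf": 350, "era": 4.10, "fip": 4.00},
    ("C", 2002): {"tbf": 380, "era": 4.00, "fip": 4.20},
}

pairs = consecutive_pairs(seasons, "fip")
r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
```

Swapping `"fip"` for `"era"` gives the ERA-to-next-year-ERA baseline that the other estimators are being compared against.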
Zach, can you run a regression of all those metrics against ERA in year x+1, and post the regression equation?
I was curious how something as simple as K/9 stands up. I’ve seen at least one DIPs detractor claim that K/9 is just as accurate as DIPs.
You can also try this:
kwERA = 5.4 - 12*(K-BB-HBP)/PA
You can play around with that 5.4 to align it to league ERA.
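A minimal sketch of that formula in code. The function names, the way the constant is re-aligned, and all sample numbers are assumptions for illustration; the alignment shown is just one way to "play around" with the 5.4, by picking the constant that makes league kwERA equal league ERA.

```python
def kwera(k, bb, hbp, pa, constant=5.4):
    """kwERA = constant - 12 * (K - BB - HBP) / PA."""
    if pa <= 0:
        raise ValueError("pa must be positive")
    return constant - 12 * (k - bb - hbp) / pa


def aligned_constant(league_era, lg_k, lg_bb, lg_hbp, lg_pa):
    """Constant that makes kwERA on league totals equal league ERA.

    This is an assumed alignment method, not something spelled out above.
    """
    return league_era + 12 * (lg_k - lg_bb - lg_hbp) / lg_pa


# Made-up pitcher line: 200 K, 50 BB, 5 HBP in 800 PA.
example = kwera(k=200, bb=50, hbp=5, pa=800)

# Made-up league totals, used only to show the re-alignment.
c = aligned_constant(4.30, lg_k=30000, lg_bb=15000, lg_hbp=1500, lg_pa=180000)
league_check = kwera(30000, 15000, 1500, 180000, constant=c)
```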
#6/Gary: K/9 had a -0.18 r with year N+1 ERA.
...
#5/Tango: The regression equation is:
ERA_next = .11*ERA + .20*ERC + .80*XERA + (-1.06)*BsR_ERA + .02*FIP + .51*DIPS_ERA + 1.58.
The r-squared is 0.181 (an r of .426).
...
#7/Tango: kwERA has an r of 0.262 (less than ERA itself) with ERA_next. If I include it in the regression equation, the r2 increases slightly to .189:
ERA_next = .07*ERA + .61*ERC + 1.34*XERA + (-2.21)*BsR_ERA + 1.19*FIP + (-0.21)*DIPS_ERA + (-.43)*kwERA + 3.06.
Is there a reason why DIPS ERA has a negative coefficient with ERA_next if it has a higher correlation (which is positive) with ERA_next than any of the other estimators?
Collinearity. All of your ERA estimators should have a substantial correlation with each other, which makes the regression nearly meaningless. I imagine something like factor analysis is more appropriate.
Colin is correct. In a regression, you would prefer to have your independent variables actually be independent.
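A quick sketch of why this happens, using the standard two-predictor formula for standardized regression coefficients. The .397 and .387 are the DIPS and FIP correlations with next-year ERA from #4; the .98 correlation between DIPS and FIP is an assumed value, chosen only because the two are described as nearly identical.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized OLS coefficients for y regressed on two predictors.

    r_y1, r_y2: correlation of each predictor with y.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors are perfectly collinear; no unique solution")
    denom = 1 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    return b1, b2


# r with ERA_next from #4; the 0.98 between the two is an assumption.
b_dips, b_fip = standardized_betas(r_y1=0.397, r_y2=0.387, r_12=0.98)

# With uncorrelated predictors, each beta is just its own r.
b1_indep, b2_indep = standardized_betas(0.397, 0.387, 0.0)
```

Under that assumption, the slightly weaker of the two near-duplicates ends up with a small negative coefficient even though its own correlation with next-year ERA is positive. The sign says little about the metric itself; it mostly reflects how the regression splits credit between overlapping inputs.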
What you might find interesting is to take your metric of choice (presumably XERA) and correlate it to each metric, one at a time. This will tell you how independent it is from the others. Produce the regression equation.
#6/Gary: K/9 had a -0.18 r with year N+1 ERA.
IOW, the more strikeouts, the lower the ERA the next year, but the relationship isn’t that strong; half as strong as ERA year to year. That’s how I read this, but I wanted to make sure.
At what level does r become significant?
#10/Tango: Here are the r’s between each estimator:
kwERA is the most independent of all of the metrics. DIPS and FIP are nearly identical, but DIPS’s correlation with ERA_next is much higher than FIP’s. Any two of BsR/XERA/ERC cannot be in the same regression, as the r’s between all of them are over .9; of the three, XERA has the highest r with ERA_next.
Using DIPS, XERA, ERA, and kwERA, here’s the resulting regression equation:
ERA_next = .35*DIPS + .12*XERA + .09*ERA + .08*kwERA + 1.36. (r^2 of .174, an r of .417)
The equation takes out any collinearity and backs up the correlations in #4.
***
Tangotiger writes:
For some reason, I can’t post here, probably because of all the acronyms in here. I have to edit this post in order to make my comments.
***
dips is .397 and fip is .387. You must mean “barely higher”.
***
There is still some level of collinearity, considering that strikeouts and walks are used in a few of those terms (they are just masked).
Remember that dips-is-fip, and kwera-is-fip without homeruns.
Sweet, someone’s finally done one of these studies including tRA. About where I expected - a slight improvement on FIP and beaten out by xFIP. Although I think it’s a little unfair that FIP got to use its regressed(ish) version and tRA didn’t…