THE BOOK--Playing The Percentages In Baseball


Wednesday, September 27, 2006

Quick ERAs

By Tangotiger, 09:26 AM

I have two versions of component-based ERA. 


The first was based on Voros’ DIPS, and I call it FIP:
ERA = (13*HR + 3*BB - 2*SO)/IP + 3.20
You can include HBP and exclude IBB if you like.  The 3.20 is a constant and should be altered year-to-year.
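
As a minimal sketch of the formula above (the function name and the sample line are illustrative, not from the post), FIP could be computed like this. Innings are assumed to be a true decimal, so 200 1/3 innings is 200.333, not the box-score shorthand 200.1.

```python
def fip(hr, bb, so, ip, constant=3.20):
    """FIP as written in the post: (13*HR + 3*BB - 2*SO)/IP + constant.

    The 3.20 default is the post's constant; it should be re-set each year
    so that league FIP matches league ERA.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * bb - 2 * so) / ip + constant


# Hypothetical season line, not a real pitcher.
print(round(fip(hr=20, bb=50, so=180, ip=200.0), 2))
```

To include HBP and exclude IBB, pass bb + hbp - ibb as the bb argument.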

The second one came about while talking with Guy at Fanhome, and I call it szERA:
ERA = 5.40 - 12 * (SO - BB)/BFP
Again, decide how you want to handle HBP and IBB.  And again, 5.40 should be altered to ensure it all adds up at the league level.
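
The same kind of sketch for szERA, again with an illustrative function name and a made-up pitching line. Note that the denominator here is batters faced, not innings.

```python
def sz_era(so, bb, bfp, constant=5.40):
    """szERA as written in the post: constant - 12 * (SO - BB) / BFP.

    The 5.40 default is the post's constant and should be recalibrated
    so the formula adds up at the league level.
    """
    if bfp <= 0:
        raise ValueError("bfp must be positive")
    return constant - 12 * (so - bb) / bfp


# Hypothetical line: 180 strikeouts, 50 walks, 820 batters faced.
print(round(sz_era(so=180, bb=50, bfp=820), 2))
```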

Nate Silver shows his Quick ERA as:
ERA = (2.69+K%*(-3.4)+BB%*3.88+GB%*(-0.66))^2
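
A sketch of that equation in code. It assumes the three rates are fractions (0.17, not 17), which is what the size of the coefficients suggests; the post does not say so explicitly. The sample rates are invented.

```python
def qera(k_pct, bb_pct, gb_pct):
    """Quick ERA as quoted in the post, with rates given as fractions."""
    inner = 2.69 - 3.4 * k_pct + 3.88 * bb_pct - 0.66 * gb_pct
    return inner ** 2


# Hypothetical pitcher: 17% strikeouts, 8.5% walks, 45% ground balls.
print(round(qera(k_pct=0.17, bb_pct=0.085, gb_pct=0.45), 2))
```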

He correctly notes that run scoring is not linear, which is why he introduced the exponent of 2 (though 1.5 might work out better).  However, the GB% in the equation has a different denominator than the K and BB percentages (BIP as opposed to PA).  Instead, I would do “GB minus FB” per PA.  The run value between these two is around .10 runs, which makes it about one-sixth the run value of a walk minus K.  Therefore, my szERA could be modified as:
ERA = x.xx - 12 * (SO - BB)/BFP - 2 * (GB - FB)/BFP

That x.xx would need to be set accordingly.  I haven’t tested it, but I’m pretty sure this would work.  (Maybe that “2” should be a “4”.  Not sure.)
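
One way to set that x.xx, sketched under the assumption that the constant is chosen so the formula returns the league ERA when fed league totals. The weight of 2 is the post's one-sixth of 12; the league totals below are invented for illustration.

```python
def sz_era_gb(so, bb, gb, fb, bfp, constant, gb_weight=2.0):
    """Modified szERA from the post, with a (GB - FB) per batter term."""
    return constant - 12 * (so - bb) / bfp - gb_weight * (gb - fb) / bfp


def calibrate_constant(league_era, so, bb, gb, fb, bfp, gb_weight=2.0):
    """Solve for the constant that makes the formula equal league ERA
    on league-wide totals (an assumed calibration method)."""
    return league_era + 12 * (so - bb) / bfp + gb_weight * (gb - fb) / bfp


# Hypothetical league totals, for illustration only.
league = dict(so=1000, bb=500, gb=2000, fb=1800, bfp=6000)
x = calibrate_constant(4.50, **league)
print(round(x, 2), round(sz_era_gb(constant=x, **league), 2))
```

Passing gb_weight=4.0 to both functions tries the alternative weight mentioned above.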

#1    David Cameron      2006/09/27 (Wed) @ 11:04

I still want to see one of these component ERAs incorporate runners left on base.  It’s clearly a result of skill, and it has a pretty massive impact on run prevention.  Just from my quick-and-dirty look at THT’s LOB% and ERA, about 3% difference in LOB% makes almost a full point difference in ERA. 

As long as we have Component ERAs that don’t account for the differences in abilities to strand runners, we’re going to overrate bad pitchers and underrate good pitchers.


#2    tangotiger      2006/09/27 (Wed) @ 11:55

The poster boy for the runners on base situation is likely Tom Glavine.  Strand rates, ERA, FIP for selected pitchers, ordered by FIP:

Pedro 76.2%, 2.79, 2.91
RJ 75.0%, 3.22, 3.30
Clemens 74.7%, 3.10, 3.31
Schilling 74.5%, 3.44, 3.34
Maddux 72.8%, 3.07, 3.42
Smoltz 73.6%, 3.28, 3.44
Brown 72.2%, 3.28, 3.52
Mussina 73.0%, 3.64, 3.71
Glavine 74.0%, 3.47, 4.07

STAR AVERAGE 74.0%, 3.25, 3.45

The FIP is about .20 runs higher than the ERA on average, likely showing a bias among star pitchers. 

There are two outliers.  The first is Glavine, with a 4.07 FIP, but only a 3.47 ERA.  His LOB% is 74.0%, which matches the star group.  His FIP is 0.62 higher than the star group, but his ERA is only 0.22 higher.  So, I don’t think his LOB rate is what explains it here.
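
The star averages and the Glavine gaps can be reproduced from the list above with simple unweighted means (an assumption about how the averages were taken, though it matches the posted figures):

```python
# (strand rate %, ERA, FIP) copied from the list above.
stars = {
    "Pedro": (76.2, 2.79, 2.91),
    "RJ": (75.0, 3.22, 3.30),
    "Clemens": (74.7, 3.10, 3.31),
    "Schilling": (74.5, 3.44, 3.34),
    "Maddux": (72.8, 3.07, 3.42),
    "Smoltz": (73.6, 3.28, 3.44),
    "Brown": (72.2, 3.28, 3.52),
    "Mussina": (73.0, 3.64, 3.71),
    "Glavine": (74.0, 3.47, 4.07),
}

n = len(stars)
avg_lob = sum(v[0] for v in stars.values()) / n
avg_era = sum(v[1] for v in stars.values()) / n
avg_fip = sum(v[2] for v in stars.values()) / n

# Glavine relative to the group, as discussed above.
_, glavine_era, glavine_fip = stars["Glavine"]
fip_gap = glavine_fip - avg_fip
era_gap = glavine_era - avg_era

print(round(avg_lob, 1), round(avg_era, 2), round(avg_fip, 2))
print(round(avg_fip - avg_era, 2), round(fip_gap, 2), round(era_gap, 2))
```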

More interesting is Schilling and Maddux.  In their case, Maddux was slightly worse than Schilling in FIP, but much better in ERA.  But Maddux’s strand rate was much lower.

A .01 change in strand rate, if given 10 runners on base per 9 innings, means an extra .10 runs (.01 x 10) scored, as opposed to stranded.  I don’t think the strand rate explains as much.
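
That back-of-the-envelope rule, written out. The 10 runners per 9 innings is the assumption stated above; the function name is illustrative.

```python
def runs_per_nine_change(strand_rate_drop, runners_per_nine=10.0):
    """Extra runs per 9 innings when the strand rate falls by the given
    amount (0.01 means one percentage point), holding baserunners fixed."""
    return strand_rate_drop * runners_per_nine


print(round(runs_per_nine_change(0.01), 2))  # a one-point drop in strand rate
print(round(runs_per_nine_change(0.03), 2))  # a three-point drop
```

Under this arithmetic a three-point difference in strand rate is worth roughly .30 runs per 9 innings rather than a full run, which is why the strand rate alone looks too small to explain the gaps.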

Glavine has pitched much better with men on (or at least much differently), and Maddux has pitched much worse with men on.  It’s interesting, then, that Maddux, who pitches as poorly (for him) as he does with men on and doesn’t leave as many men on as the other star pitchers, still has an ERA much lower than we would have expected at first glance.

Here are some random current heavyweights:
Oswalt 76.7%, 3.07, 3.33
Peavy 76.5%, 3.51, 3.80
Santana 75.9%, 3.20, 3.34
Halladay 71.3%, 3.62, 3.76


#3    MGL      2006/09/27 (Wed) @ 15:42

David and Tango are talking about 2 different things, I think.  David is talking about “ability to strand runners” and Tango is talking about pitching from the stretch versus pitching from the windup.  They are very similar but not exactly the same.  We (Andy) found in our research for the book that there is a very small skill rate for windup versus stretch, but not enough to make much of any difference.

So, David, I don’t know what you are talking about, especially about overrating bad pitchers and underrating good pitchers.  In general, that is going to be slightly the case when using a non-"baseruns" type ERC formula (because the values of each event are not constant for pitchers as they are for batters - they change based on the pitcher’s rates of those components), but again, not enough to make much of a difference.

David, can you explain what you mean?  As far as I know, other than the windup/stretch slight skill differential, I have never seen any evidence that there is a skill to “stranding runners” (and I am not even sure what you mean by that), over and above their normal expected pitching rates whether there are runners on or runners not on (and of course, ALL pitchers’ rates get modified with runners on base, depending on the runner/out configuration).


#4    MGL      2006/09/27 (Wed) @ 15:55

Tango, with your list, I am not sure you are not just seeing a bunch of noise, as well as the fact that you have selectively sampled your pitchers, even though they all have had long careers (and thus are much less subject to noise/fluctuation/luck/sample error).  Still, they are stars because they have SOMEWHAT been lucky their whole careers and thus we would expect their ERA’s to be less than their FIP.

On the other hand, we would also expect their ERA’s to be less than their FIP (or DIPS) for precisely the reason why using FIP or DIPS to figure ERC is so bad for pitchers with a lot of data.  As I always said, DIPS or FIP is simply a poor man’s regression (regressing 100% toward the mean).  And that is wrong, but it works just fine for small samples.  For large samples, by no means do we want to regress a pitcher’s BABIP (or whatever you want to call it) 100% toward the mean.  If you do, you will definitely underrate the good pitchers and overrate the bad pitchers, which I should have said in my last post.
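
To make the "poor man’s regression" point concrete, here is a sketch of partial regression toward the mean using the common n / (n + k) shrinkage weight. This is a generic form, not a method given in the thread, and the value of k below is a hypothetical placeholder, not an estimate of the right amount of regression for BABIP. FIP and DIPS behave like a weight of zero on the observed rate no matter how large the sample is.

```python
def regress_to_mean(observed, league_mean, n, k):
    """Shrink an observed rate toward the league mean.

    n is the sample size (for example, balls in play) and k is the sample
    size at which observed and league rates get equal weight. k here is an
    assumption supplied by the caller.
    """
    if n < 0 or k <= 0:
        raise ValueError("n must be non-negative and k must be positive")
    weight = n / (n + k)
    return league_mean + weight * (observed - league_mean)


LEAGUE_BABIP = 0.300  # assumed league rate, for illustration
K = 3000              # hypothetical placeholder, not a fitted value

short_career = regress_to_mean(0.280, LEAGUE_BABIP, n=500, k=K)
long_career = regress_to_mean(0.280, LEAGUE_BABIP, n=5000, k=K)
print(round(short_career, 3), round(long_career, 3))
```

The same observed .280 stays close to the league rate over a small sample and moves well below it over a long career, which is the distinction FIP cannot make.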

If you make the same list with bad pitchers with a long career, you will probably see their FIP be a lot lower than their ERA, although there is definitely selective sampling for bad pitchers with long careers (they tend to have “phony” low ERA’s for whatever reasons), so that might not be the case.

I don’t think your (Tango’s) list has anything to do with strand rate, pitching with runners on base or not, etc.  I think that is all noise.  There may be a few exceptions though, which is contrary to my argument, I guess (of it being all noise).  I think it is possible that Maddux is such a smart pitcher that his ERA will consistently outperform his FIP.  If that is the case, there must be others (he can’t be the ONLY smart one, even if he is one of the smartest), and there must be dumb pitchers as well, who do not adjust their approach enough to account for inning/score/runners/outs.  We certainly see that when pitchers throw 2-2 curve balls with large leads late in a game and end up walking a batter (which drives me crazy) or, even stupider, walk the leadoff batter in the 9th inning when ahead by more than a run.  A pitcher should be able to do that less than 2% of the time I would think if he is pitching appropriately (which is more fastballs and more pitches in the middle of the zone, and basically throwing fastballs right down the middle when you get behind in the count anytime a walk is the same thing, or almost the same thing, as a hit or a home run).


#5    David Cameron      2006/09/27 (Wed) @ 16:45

I’ll try to explain better. 

In the linked article, Nate presented QERA as a way to evaluate pitchers based on BB%, K%, and GB%, which is very similar to xFIP, THT’s takeoff of Tango’s FIP, which creates a component ERA to represent pitcher skill.  Essentially, the idea behind FIP, xFIP, and QERA are the same: evaluate pitchers based on the “three true outcomes”, and assume that everything else that influences run prevention is equal. 

I’d say the flaw in that assumption is that all pitchers will strand about the same amount of their baserunners.  Using THT’s LOB% as the marker of runner stranding, we can see that the league average is basically 70%, with the spread being essentially 60% to 80% for starting pitchers.  FIP, xFIP, and QERA all regress LOB% back to 70% for all pitchers since they are assuming a league average strand rate by the omission of that information from the formula. 

That’d be okay if LOB% was totally random, but it’s not.  As Tango’s data shows above, the career LOB% for the “star pitchers” was between 73-76%, significantly above the league average 70% marker.  And for “crap pitchers”, you’d see the exact opposite; their LOB% would be 65-68%, most likely. 

So a star pitcher will strand about 10% more runners than a crappy pitcher will.  Over the course of a season, that’s going to add up. 

Johan Santana has put 213 men on base this year (H+BB+HBP-HR), and of those, 55 have scored (R-HR).  Basically, 26% of the guys who reach base safely without homering cross the plate against Johan.  If we were to assume (like FIP, xFIP, and QERA do) that the league average of 30% of his baserunners were going to score, then he’d have allowed 64 of those 213 men to cross the plate, not 55.  That’s 9 runs that these BB-K-HR/GB formulas ignore and don’t give him “credit” for, even though they clearly should, because his real LOB% isn’t league average.
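
The Santana arithmetic, spelled out as a quick check (variable names are illustrative; the 30% league score rate is the figure assumed in the comment):

```python
runners = 213             # H + BB + HBP - HR, from the comment
scored = 55               # R - HR, from the comment
league_score_rate = 0.30  # 1 minus the 70% league LOB% cited earlier

score_rate = scored / runners
expected_at_league = league_score_rate * runners
runs_saved = round(expected_at_league) - scored

print(round(score_rate, 3), round(expected_at_league, 1), runs_saved)
```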

The culprit here isn’t so much pitching from the stretch or the windup; it’s distributions of hits, I think.  Crappy Pitcher X (we’ll call him Joel Pineiro) is going to put a lot of balls in play and walk a lot of guys, so when he gives up a leadoff single, it’s pretty likely that he’s going to give up a couple more hits before he gets the third out of the inning, and that leadoff baserunner is going to cross the plate. 

When Johan Santana gives up a leadoff single, though, his inherent goodness makes it unlikely that he’ll give up several more baserunners that same inning.  It’s not that he’s going to pitch better than he normally does; it’s that, because he’s already so good, he’s inherently more likely to get outs with that man standing on base and leave him there without crossing the plate.

I’m not talking about guys like Glavine, who pitch better with runners on base than they do with no one on.  I’m talking about the league as a whole; because the distribution of baserunners isn’t even between good and bad pitchers, neither is the likelihood that a baserunner will score.

By ignoring strand rate in these component ERA formulas, we’re saying (by omission) that a baserunner against Johan Santana is just as likely to score as he is against Joel Pineiro.  And that’s just not true. 

Thus, we get things like the spread of xFIP being 3.00 to 5.50 for major league starters, while the spread of real world ERA is more like 2.50 to 6.00. 

Basically, what I’m trying to say (and feel I’m not doing a very good job of) is that FIP/xFIP/QERA tell us that bad pitchers are a little bit better than they really are, and that good pitchers are a little bit worse than they really are.  The difference isn’t massive, and the tools are still useful, but it’s still a flaw, in my opinion. 

Hopefully this made sense.  If it didn’t, I’ll take another stab at it.


#6    David Cameron      2006/09/27 (Wed) @ 16:54

I should have included this in the last comment.

Stranding runners certainly isn’t all skill, or even mostly skill.  It’s mostly noise with a bit of skill mixed in.  In case you haven’t seen it, here’s an article Studes did last year where he correlated xFIP and LOB% at .20.  Weak, but real. 

By regressing LOB% 100% to the mean, we’re ignoring the 20% or so that’s actually pitcher influenced.

Using the formula Studes provides for predicting LOB%, it looks like the “true talent spread” is probably something like 67-75%, with good pitchers stranding about 3/4 of their runners and bad pitchers stranding about 2/3 of theirs. 

My hope was that we could somehow account for the differences in real true talent LOB% to improve FIP/xFIP/QERA.  The low walk, low strikeout groundball pitcher is going to leave a greater percentage of runners on base than the high walk, low strikeout, flyball pitcher, and that’s going to affect his ERA beyond what BB%, K%, and GB% or HR% will tell us.


#7    David Gassko      2006/09/27 (Wed) @ 17:29

David,

This is what Mitchel was talking about when he wrote,

“In general, that is going to be slightly the case when using a non-"baseruns" type ERC formula (because the values of each event are not constant for pitchers as they are for batters - they change based on the pitcher’s rates of those components), but again, not enough to make much of a difference.”

You can simply plug whatever you want into BaseRuns to account for that. I do that, for example, with DIPS 3.0.


#8    Guy      2006/09/27 (Wed) @ 17:57

David:
The problem with what you’re saying is that the elements captured in FIP play an important role in determining whether baserunners score.  A very good way to prevent them from scoring is to strike out subsequent batters, which is one reason that Pedro, RJ, Clemens and Schilling excel.  Another excellent way to prevent runners from scoring is NOT to allow following hitters to hit HRs—also captured in FIP.  Even BBs can advance runners and increase the chance of scoring.  So while FIP does not capture everything, it is clearly incorrect to say it “assumes” a lg avg strand rate. 

What FIP is mainly missing is BABIP, which for any single year is basically fine, but over a career (as MGL says) will shortchange many good pitchers.

* * *

Tango, I draw the opposite conclusion from your Glavine analysis.  The fact that his strand rate matches the “star average” while his FIP is .60 higher suggests that his skill at preventing runners from scoring is high relative to some of his other skills.  More specifically, I’m guessing it’s rare for a pitcher with his K rate to post a strand rate that high.


#9          2006/09/27 (Wed) @ 18:03

Aren’t better pitchers going to have a higher LOB% simply because they’re better pitchers?  Someone more likely to strike out batters and less likely to walk them with the bases empty will also generally do the same with runners on base.  More runners will be stranded because the batters behind them will be less likely to get a hit—just as they’d be less likely to get a hit with the bases empty. 

A component-based ERA system isn’t ignoring LOB%—it’s predicting it.


#10    David Cameron      2006/09/27 (Wed) @ 18:04

Certainly, BaseRuns is better, no doubt. 

But BaseRuns is also far more complex, and selling it to the masses is a daunting task.  The beauty of FIP is its relative simplicity and ability to be understood by people who have no interest in learning about modeling that way.

So, my original point/question at the beginning of the thread still stands; I’d love to see FIP/xFIP/QERA incorporate the runner stranding issue to reflect reality a little closer. 

The whole idea behind FIP is that it’s fielding-independent pitching.  But LOB% isn’t fielding, and it’s at least partly a skill, so if it can be included, I think it should be.

I’m not smart enough to know how to include it.  I’m just hopeful that if I bug one of you guys enough, you’ll be able to.


#11    MGL      2006/09/27 (Wed) @ 18:09

I still don’t know what you mean, David.  Of course good pitchers will strand more runners as they give up fewer walks, home runs, and have more K’s (balls in play will tend to score runners I assume, even with the DP’s).  FIP and all those other similar metrics do NOT assume the same strand rates, at least I don’t think, since they use a pitcher’s walks and HR’s, which influences strand rate, do they not?

The “flaw” in those metrics, as I explained, is two-fold, and has nothing to do with strand rates per se.  One, all component ERA formulas, by definition, even the ones that use a pitcher’s actual single, double, and triple rates, treat each component as if it has a constant value (e.g., .47 for the s, .78 for the d).  That is fine for batters, but not for pitchers, since a pitcher’s single value, for example, depends on his singles, d,t,hr,bb,so rates, etc.  IOW, the values of a pitcher’s components are not independent of their rates.  So technically, you have to use a baseruns type formula, even if you are assuming a constant BABIP (s,d,t rates) for all pitchers.  However, in my experience, it does not make that much difference even at the extremes (pitchers like Pedro or Lima).  Secondly, all of those formulas make a false assumption which is fine for small samples (actually better than using a pitcher’s actual BABIP) but not fine for large samples.  So for small samples (say 1-3 years of full time pitching), FIP will better represent a pitcher’s true ERA talent and future ERA, but for large samples (say, 5-10 years), it will underrate pitchers with true low BABIP (usually the better pitchers, but not necessarily) and overrate the pitchers with a true high BABIP.
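
The first point can be illustrated with one commonly cited simple version of BaseRuns (A*B/(B+C) + D). The thread’s participants may well use different B coefficients, so treat this as a sketch. The baseline line borrows the per-9-innings figures that appear later in this thread; using 27 outs for C, and scaling every hit type and walks by 0.8 for the "star", are assumptions made here for illustration.

```python
def base_runs(singles, doubles, triples, hr, bb, outs):
    """A commonly cited simple BaseRuns: A * B / (B + C) + D."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    a = hits + bb - hr                                             # baserunners
    b = (1.4 * total_bases - 0.6 * hits - 3 * hr + 0.1 * bb) * 1.02  # advancement
    c = outs                                                       # outs (AB - H, approximated)
    d = hr                                                         # automatic runs
    return a * b / (b + c) + d


def marginal_hr_value(line, step=0.01):
    """Finite-difference run value of one extra home run for this pitcher."""
    bumped = dict(line, hr=line["hr"] + step)
    return (base_runs(**bumped) - base_runs(**line)) / step


baseline = dict(singles=6.9, doubles=1.9, triples=0.2, hr=1.1, bb=3.3, outs=27)
# Hypothetical star: 20% fewer of every hit type and 20% fewer walks.
star = {key: (value if key == "outs" else 0.8 * value)
        for key, value in baseline.items()}

print(round(marginal_hr_value(baseline), 2), round(marginal_hr_value(star), 2))
```

The home run is worth less against the pitcher who allows fewer baserunners, which is the non-constant component value described above; a linear formula like FIP charges both pitchers the same amount.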

With these two things (the baseruns thing and the regression thing), I am not saying anything that most of the readers and participants of this blog don’t already know.  I still don’t know what you are talking about, David, with all due respect, with regard to “strand rates.”

Correlation between FIP and LOB?  Of course they are correlated.  The better the pitcher you are, even considering walks, HR, and K only, the more you will strand runners.  Duh!

Regressing LOB% to the mean?  As I said, where does FIP or DIPS do that?  It regresses BABIP (s,d,t rates) 100% to the mean (which is close enough to reality for small samples if you want to get at talent or predict future performance), but that by no means implies that LOB% is regressed 100%.


#12    tangotiger      2006/09/27 (Wed) @ 18:15

David, ok I follow you now.

Essentially, you believe that the LOB% rate is being forced as being “equal” for all pitchers, in the FIP equation.  However, that’s not necessarily true.  The equation itself is calibrated so that the better FIP pitchers should in fact have a higher LOB rate, and the worse FIP pitchers would have a lower implied LOB rate. 

That is, if I were to try to infer the LOB rate based on the FIP, I may in fact get the LOB rate at 73% for the star pitchers, and 67% for the crappy ones.

However, I’m skeptical with what I just said, since I expected the star pitchers’ FIP and ERA to match, and they don’t.  And therefore, you may be correct.  It’s possible that the issue is that the FIP equation is linear, and therefore, I may be getting a fixed LOB% rate as you are figuring.

I will run some tests tomorrow, and will post the results.  In essence, I will work backwards to determine the strand rate (or score rate), as I did here:
http://www.tangotiger.net/rc3.html


#13    David Cameron      2006/09/27 (Wed) @ 18:19

“The problem with what you’re saying is that the elements captured in FIP play an important role in determining whether baserunners score.  A very good way to prevent them from scoring is to strike out subsequent batters, which is one reason that Pedro, RJ, Clemens and Schilling excel.  Another excellent way to prevent runners from scoring is NOT to allow following hitters to hit HRs—also captured in FIP.  Even BBs can advance runners and increase the chance of scoring.  So while FIP does not capture everything, it is clearly incorrect to say it “assumes” a lg avg strand rate.”

Right, I understand that, as well as mylon’s subsequent comment.  I’m clearly not communicating my point very well.  Let’s use Johan Santana as an example.

Last three years, his ERAs are 2.61, 2.87, and 2.77. 

Last three years, his xFIPs are 3.28, 3.35, and 3.34. 

I’m using xFIP because it’s a close comparison to Nate’s QERA statistic linked above, and my original comment that set this off was a response to the usefulness of Nate’s statistic judging pitchers on BB/K/GB rates. 

Our options are either that Santana has been lucky three years in a row, or that xFIP is shorting him credit for some repeatable skill that he has.  In my opinion, that skill is the stranding runner skill - I could be wrong, but that’s what it appears to be to me, anyways.  xFIP, by its nature, understands perfectly how many runners he’s putting on base, and his park adjusted HR/FB rates aren’t significantly different than league average, so what’s left?

Less runs are scoring off Johan Santana than a BB-K-GB/HR model would predict.  This is true of a lot of good pitchers, and the inverse is true of a lot of bad pitchers.  And it’s consistently true.

So, I could be totally off base here, but it seems to me that these BB-K-GB/HR models aren’t catching all of the run prevention value of these repeatable skills.  If guys with good BB-K-GB/HR numbers are routinely allowing less runs than our model predicts, and guys with bad BB-K-GB/HR rates are allowing more runs than the model predicts, then there’s something wrong with the model, right?


#14    tangotiger      2006/09/27 (Wed) @ 18:31

David, we probably cross-posted.  You may be totally right, and it may simply be the linear function of the model.  I’ll report back tomorrow.


#15    Guy      2006/09/27 (Wed) @ 19:24

"Our options are either that Santana has been lucky three years in a row, or that xFIP is shorting him credit for some repeatable skill that he has.”

Yes, it’s “shorting” him for having BABIPs of .259, .276, and .281 (and his SLGBIP is probably below avg as well, not sure).  A high strand rate is one likely consequence of that, but so is a low OBP allowed. 

In addition, I would think that FIP somewhat overstates runs allowed for the very best pitchers because of the interaction effect MGL mentions (and Tango is checking out).  Clearly, a HR allowed by a low-BB/hi-K pitcher does less damage on average.  And that would be further compounded in the case of a low-BABIP pitcher.


#16    MGL      2006/09/27 (Wed) @ 19:58

Yes to what Guy said.  It has NOTHING to do with strand rates (I don’t think)!  Of course the ERA of the best pitchers will generally be lower than their FIP.  Their BABIP is generally lower than the average pitcher and what FIP uses.  And the interactive effect.  But I already said that 3 times I think.


#17    Nate Silver      2006/09/27 (Wed) @ 22:24

David,

I also remain a little bit confused about your objection.  The QERA formula is derived from a very simple, undergrad-level regression analysis.  In other words, everything is derived empirically: the goal is simply to make a “best guess” at what a pitcher’s ERA is based solely on his K%, BB% and GB%.  Coincidentally, this also works fairly well as a predictive model when you don’t have something more sophisticated at hand.

Some of the benefit of having a superior K% (etc) is that you should strand more runners, but since that is reflected in the ERAs of pitchers in the historical data record, it should also be reflected in the coefficients chosen in the QERA formula.  It’s not like we’re trying to build some organic model of run scoring ("a strikeout is inherently worth XX"), in which case you’d need to be much more careful with your modeling.

If the results in the xFIP formula are off, it’s probably for the reason that Tango suggests: because it’s trying to apply a linear formula to something (run scoring) which isn’t terribly linear.  I don’t think we should have the same problem with QERA, but if we do, it’s probably because I made some silly mechanical mistake while crunching the numbers that should be easily fixable. ;)

Also, MGL’s contributions throughout this thread are spot-on.  There are a couple of ways that pitchers can exhibit elements of skill that will create systematic differences between their ERAs and their PERAs/xFIPs/QERAs.  One of these things is the tendency to pitch better/worse out of the stretch, and the other is smart situational pitching (for example, there is much more incentive to challenge a hitter with the bases empty than there is with say a runner on second, and smart pitchers like Glavine, Buehrle and Maddux make exactly this adjustment).  I do try and capture this stuff in PECOTA but it’s definitely hard to pin down.


#18    studes      2006/09/28 (Thu) @ 05:28

Just a few comments:

- I think these shouldn’t be called “Component ERAs”.  To me, component ERAs include all the various batting elements like hits, doubles and home runs.  These are more like “DIPS ERAs.” I think the label is important because they do two different things.

- Nate’s equation is neat and better than xFIP because it includes the groundball rate—which gives you more information than just normalizing the home run rate—and has that exponent thingie.

- I ran some quick equations for pitchers with 100 or more Innings Pitched this year (sample size=126) and found a couple of interesting things.  FIP has an R squared of .55 vs. ERA, but adding LOB% to the equation raises it to .89.  Adding DER raises it a bit more to .94.  The R squared for just FIP and DER vs. ERA is .74.

I was surprised to see that LOB% improved the fit that much, more than DER.  What’s more, the R squared between FIP and LOB% was only .10 (vs. the .20 I found last year).  So, when it comes to predicting this year’s ERA, LOB% is an important variable to include.  In fact, by itself, LOB% predicted ERA as well as FIP did.

Of course, that’s very different from what Nate and Tango are doing, which is assessing a pitcher’s “true talent” and likely future performance.  I do tend to think that Nate’s exponent should take care of that issue, except for the Tom Glavine-like pitchers, guys who are adept at pitching to the situation.  And they will always be exceptions to any model.


#19    Guy      2006/09/28 (Thu) @ 07:32

"Nate’s equation is neat and better than xFIP because it includes the groundball rate—which gives you more information than just normalizing the home run rate....”

Studes, is that true?  I would think that knowing the actual HR rate in a given year gives you a lot more info than the GB%.  Maybe it gives you a better prediction for following year, but I’d even wonder about that.....


#20    tangotiger      2006/09/28 (Thu) @ 07:33

It’s completely expected that adding the strand rate (which is 1 minus score rate) should increase the ERA.  It is in essence doing: Runs = Runs.

***

The first test I performed was to create a pitching line where BsR and FIP would match.  So, 10.1 hits, with 1.9 2B, 0.2 3B, 1.1 HR, 3.3 BB, and 6.2 K fit the bill (in 9 IP).  This has a BABIP of .303, BA of .273, OBP of .333, and score Rate of .314.  Runs scored was 5.00.

Now that we’ve calibrated FIP and BsR, I created a star pitcher.  I reduced his hits and walks by 20%, and bumped his Ks to 9.7.  The BABIP dropped to .294, OBP to .285, score Rate to .268.  Runs Scored was 3.55, whether using BaseRuns or FIP.

First thing we notice, and it should be expected, is that the score Rate is proportional to the OBP rate.  The OBP dropped down by 14.5% and the score Rate dropped down by 14.5%.  The other thing to notice is that BABIP also had to drop (from .303 to .294) in order for FIP and BsR to match.

So, FIP is at least not forcing the strand rate to be static.

For me to undervalue the star pitcher like we found with the empirical data by .20 runs per game, I have to drop the BABIP rate to .284.  The impact here is that the scoreRate will now be .291 (instead of .268), but still below the baseline of .314.

FIP expects the Star pitchers to have a great K rate, and similarly it expects poor pitchers to have a miserable K rate.

What may be happening, with real pitchers, is that their BABIP skill is much stronger than FIP is allowing.  For example, in my illustration, FIP works if the BABIP drops slightly from the .303 baseline to .294.  So, it wants a built-in expectation that a good pitcher will have a better than average BABIP.  However, FIP’s built-in construction forces the BABIP to drop down to .284.  In essence, rather than adhering to DIPS, it moves farther away from it.

I’m going to have to think about it some more, and run more testing.  It’s possible that I haven’t calibrated FIP well enough, and that maybe instead of IP in the denominator, I should be using PA.


#21    tangotiger      2006/09/28 (Thu) @ 07:39

"What may be happening, with real pitchers, is that their BABIP skill is much stronger than FIP is allowing.  “

should read as “much weaker”.


#22    studes      2006/09/28 (Thu) @ 08:11

“Studes, is that true?  I would think that knowing the actual HR rate in a given year gives you a lot more info than the GB%.  Maybe it gives you a better prediction for following year, but I’d even wonder about that....”

Could be, Guy.  My gut would be that it would be better for the next year, but I don’t know.


#23    studes      2006/09/28 (Thu) @ 08:22

“It’s completely expected that adding the strand rate (which is 1 minus score rate) should increase the ERA.  It is in essence doing: Runs = Runs.”

Not quite right.  Get on base * score rate = Runs (ignoring home runs).  So said a little differently, FIP seems to include a lot of “reaching base” information (K’s and BB’s)—more than I would have thought.  DER, which is essentially reaching base on batted balls, doesn’t add a lot to the equation once you’ve got the FIP components and Strand/Score rate.  I do find that interesting and unexpected, and it helps me put the different components in better perspective.


#24    tangotiger      2006/09/28 (Thu) @ 09:10

Runs = baserunners x scoreRate + HR

If you have baserunners (H+BB-HR) and you have HR, and you have scoreRate, then you’ve got runs scored.

In your case, you are using all these pieces of information, but not in this way.  Using the actual score rate (which is runs minus HR divided by hits + walks - HR) shouldn’t be done!  You are using actual runs allowed in the denominator as part of your “independent” x-variable to estimate y.


#25    tangotiger      2006/09/28 (Thu) @ 09:11

Numerator, obviously.
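
The circularity is easy to see in code. Because the actual score rate is built from actual runs, feeding it back through the identity returns the runs you started with for any pitching line at all. The function names and sample lines below are illustrative.

```python
def score_rate(runs, hits, walks, hr):
    """Actual score rate: non-HR runs divided by non-HR baserunners."""
    return (runs - hr) / (hits + walks - hr)


def runs_from_score_rate(hits, walks, hr, rate):
    """The identity from the comment above: baserunners x scoreRate + HR."""
    return (hits + walks - hr) * rate + hr


# Hypothetical season lines: (runs, hits, walks, hr).
lines = [(80, 190, 55, 22), (110, 230, 80, 30), (60, 150, 40, 12)]
for runs, hits, walks, hr in lines:
    rate = score_rate(runs, hits, walks, hr)
    rebuilt = runs_from_score_rate(hits, walks, hr, rate)
    print(runs, round(rebuilt, 6))
```

A regression that includes the actual strand or score rate alongside the on-base components is therefore close to predicting runs from runs, so a very high R squared is no surprise.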


#26    tangotiger      2006/09/28 (Thu) @ 10:20

Testing FIP

Ok, for my next test, I looked at the totals of all pitchers from 1994-2005.

For this version of FIP, I included HBP and excluded IBB.  And my FIP equation does not have the “constant” to bring it in line with ERA.

The league runs allowed per 9 innings was 4.88 runs.  The FIP was 1.30.  That gap, 3.58 runs, represents the constant fudge factor to turn FIP into an ERA-like number. 

Our question is: Is FIP biased in some way?  If it wasn’t, we expect that fudge factor to be around 3.58 for any subset of that population.  If it was biased, we should see that in the subset.  Let’s see.
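
A sketch of how that fudge factor could be computed for any group of pitchers. It assumes the group’s totals are pooled before dividing (the post does not say whether totals were pooled or per-pitcher values averaged), and the two pitcher lines are invented. HBP is included and IBB excluded, as in this version of FIP.

```python
def raw_fip(totals):
    """FIP without the constant, counting HBP as walks and dropping IBB."""
    walks = totals["bb"] - totals["ibb"] + totals["hbp"]
    return (13 * totals["hr"] + 3 * walks - 2 * totals["so"]) / totals["ip"]


def pool(pitchers):
    """Sum the counting stats over a group of pitchers."""
    keys = ("r", "hr", "bb", "ibb", "hbp", "so", "ip")
    return {key: sum(p[key] for p in pitchers) for key in keys}


def fudge_factor(pitchers):
    """Runs allowed per 9 innings minus raw FIP, on pooled totals."""
    totals = pool(pitchers)
    ra9 = 9 * totals["r"] / totals["ip"]
    return ra9 - raw_fip(totals)


# Two hypothetical pitchers, for illustration only.
group = [
    dict(r=90, hr=20, bb=50, ibb=5, hbp=5, so=180, ip=200.0),
    dict(r=110, hr=30, bb=80, ibb=0, hbp=10, so=100, ip=200.0),
]
print(round(fudge_factor(group), 3))
```

Running fudge_factor on different subsets (low-walk pitchers, high-K pitchers, and so on) and comparing the results with the full-population value is the bias test described in the paragraphs that follow.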

My first test was to only look at pitchers with at least 500 career PA in that time period.  This represents 93% of all PA and 833 pitchers.  The fudge factor was 3.55.

Next, of these 833 pitchers, look at all those pitchers with at most .068 walks + hit batters per batter faced.  That gives me 90 pitchers who were great at limiting walks.  Their fudge factor was 3.54.  On the flip side, I looked for pitchers with at least .123 walks per batter, giving me 88 pitchers.  Their fudge factor was 3.59.

So, we can safely say that FIP “works” with respect to walks.
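The subset test can be sketched as follows. This assumes each subset's fudge factor comes from pooled totals (summing runs, innings, and FIP components before dividing), which the comment does not spell out; the `PitcherLine` type and the two sample pitchers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PitcherLine:
    name: str
    bfp: int
    ip: float
    runs: int
    hr: int
    bb: int
    ibb: int
    hbp: int
    so: int

def pooled_fudge(lines):
    """RA/9 minus raw FIP, both computed from the subset's pooled totals."""
    ip = sum(p.ip for p in lines)
    runs = sum(p.runs for p in lines)
    fip_numerator = sum(
        13 * p.hr + 3 * (p.bb - p.ibb + p.hbp) - 2 * p.so for p in lines
    )
    return 9 * runs / ip - fip_numerator / ip

def walk_rate(p):
    """Walks plus hit batters per batter faced."""
    return (p.bb + p.hbp) / p.bfp

# Hypothetical career lines, for illustration only.
sample = [
    PitcherLine("control", 1000, 250.0, 100, 20, 50, 5, 5, 200),
    PitcherLine("wild", 1000, 220.0, 130, 30, 130, 5, 5, 120),
]

# Same cutoffs as in the comment above.
low_walk = [p for p in sample if walk_rate(p) <= 0.068]
high_walk = [p for p in sample if walk_rate(p) >= 0.123]
print(pooled_fudge(low_walk), pooled_fudge(high_walk))
```

If FIP were unbiased with respect to walks, the two printed values would land close to each other on real data; the toy lines here exist only to exercise the code.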

Next, we repeat for HR.  The HR-happy pitchers (.037 or higher) had a fudge factor of 3.53, and the HR-limiting pitchers (.020 or less) had a fudge factor of 3.51.  Again, no HR bias.

Next, the Ks.  Pitchers with at least .22 Ks per batter had a fudge factor of only 3.39.  And those with at most .121 were at 3.66.  So, a clear bias.  But, maybe this bias is starter/reliever-based?

I did a further breakdown, this time focusing only on “usual starters”, which I defined as at least 75% of their games as a starter.  The total number of starters in my sample (min 500 PA) is 260.  Their fudge factor is 3.51.  Now, I’ll break those guys down by their K rates.  Of those with at least .20 K per batter (31 of them), the fudge factor is 3.47.  With those at most .12 K per batter (28 of them), the fudge factor is 3.63.  So, we still have a bias issue, but not as bad, and it seems to be focused on the low K pitchers.

Looking at relievers (maximum 10% of games as starters, which gives me 314 pitchers), their fudge factor is 3.46.  Breaking them by their K rates, 33 pitchers with at least .24 K per batter, the fudge factor is 3.38.  With at most .14 K per batter, 35 pitchers, the fudge factor is 3.47.  So, no big bias.

What about using FIP itself (which is the combination of all these)?  Of our top 89 FIP pitchers, their fudge factor was 3.45.  Of the bottom 87 FIP pitchers, their fudge factor was 3.69.  Making sure this is not a starter/relief issue: a 3.50 fudge factor for the top starters, and 3.63 for the bottom starters. 

For the bottom relievers, a 3.18 fudge factor, and 3.47 for the top relievers.  The fudge factor is reversed!

Finally, their BABIP.  Going back to all 833 pitchers, those with a BABIP of at least .315 have a fudge factor of 4.19.  Those with at most .273 have a fudge factor of 3.00.  Now, we expected a bias, since that’s the point of FIP. 

I’ll have to test whether FIP would work better with PA instead of IP in the denominator.  Nonetheless, FIP seems to hold up fairly well, except for really bad relievers, who look worse with FIP than they should.

Therefore, FIP “works”.

***

By the way, the BABIP of the top relievers in FIP was: .287.  The bottom relievers in FIP had .286.  For starters, that breakdown was .293 for the top, and .295 for the bottom.  So, the BABIP is not tied-in to a pitcher’s overall K,BB,HR skill.


#27    David Cameron      (see all posts) 2006/09/28 (Thu) @ 11:05

Thanks for doing all this work, Tom.  One quick question:

The league runs allowed per 9 innings was 4.88 runs.  The FIP was 1.30.  That gap, 3.58 runs, represents the constant fudge factor to turn FIP into an ERA-like number.

This statement makes it look like the fudge factor is developed off of RA, but then added to FIP to try to make it match ERA.  That would be a problem, no? Shouldn’t we be using league ERA to develop the fudge factor, or in turn, comparing FIP to RA, not ERA?


#28    tangotiger      (see all posts) 2006/09/28 (Thu) @ 11:11

You can work it either way.  The ERA was 4.48, the RA was 4.88, and the FIP was 1.30.

So, if you do FIP to ERA first, you would add +3.18 to FIP.  Then, to go from ERA to RA, you multiply by 1.09.

RA = 1.09*(FIP+3.18) = 3.47 + 1.09*FIP

*** OR ***

RA = 3.58 + FIP

I’m not sure which one is more accurate.  I’ll probably run a regression, and see which works out best.
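A small sketch comparing the two conversions, using only the league figures quoted above (ERA 4.48, RA 4.88, raw FIP 1.30). The function names are made up, and the scale factor is the unrounded ratio 4.88/4.48 rather than the rounded 1.09.

```python
import math

LEAGUE_ERA = 4.48
LEAGUE_RA = 4.88
LEAGUE_RAW_FIP = 1.30

def ra_scaled(raw_fip):
    """Shift raw FIP to the ERA scale, then scale ERA up to RA."""
    era_like = raw_fip + (LEAGUE_ERA - LEAGUE_RAW_FIP)
    return era_like * (LEAGUE_RA / LEAGUE_ERA)

def ra_additive(raw_fip):
    """Add a single constant to go straight from raw FIP to RA."""
    return raw_fip + (LEAGUE_RA - LEAGUE_RAW_FIP)

# Compare a strong, league-average, and weak raw FIP.
for raw in (0.30, 1.30, 2.30):
    print(raw, round(ra_scaled(raw), 2), round(ra_additive(raw), 2))
```

Both versions agree at the league-average FIP by construction. They diverge at the extremes: the scaled version is lower for good pitchers and higher for bad ones, so the choice only matters away from average.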


#29    tangotiger      (see all posts) 2006/09/28 (Thu) @ 11:37

Running a regression on those pitchers with at least 2000 BFP, the best-fit equations are:

ERA = FIP + 3.13
RA = (FIP + 3.29) * 1.055
RA = (ERA + 0.16) * 1.055

So, I can’t just do a straight FIP + 3.50 to get to RA, nor can I do a (FIP+3.20) * 1.09.  It’s somewhere in-between.

Same issue for converting ERA to RA. 
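As a quick consistency check, the three best-fit equations can be written out directly (the function names are illustrative). The third follows from the first two, and the fitted slope of RA on FIP falls between the purely additive version (slope 1.00) and the purely scaled version (slope 1.09):

```python
import math

def era_from_fip(fip):
    return fip + 3.13

def ra_from_fip(fip):
    return (fip + 3.29) * 1.055

def ra_from_era(era):
    return (era + 0.16) * 1.055

# Chaining FIP -> ERA -> RA should match the direct FIP -> RA fit.
chained = ra_from_era(era_from_fip(1.0))
direct = ra_from_fip(1.0)

# Slope and intercept of the direct fit once the product is expanded.
slope = ra_from_fip(1.0) - ra_from_fip(0.0)
intercept = ra_from_fip(0.0)
print(chained, direct, slope, intercept)
```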

***

I also tested FIP using PA instead of IP in the denominator, and focusing specifically on K-pitchers.  Little change.

***

By the way, the fudge should really be done at the league level, and I wasn’t doing that.

***

Among pitchers with at least 5000 PA, the guys most hurt by FIP, relative to their ERA are: Glavine, Zito, Williams, Reed, Moyer, Hudson.  That is, FIP overpenalizes them. 

On the flip-side, the guys that FIP doesn’t kill enough: Rusch, Burkett, Hayne, Thompson, Reynolds, Lieber, Fassero, Nagy.


#30    studes      (see all posts) 2006/09/28 (Thu) @ 12:02

You are using actual runs allowed in the denominator as part of your “independent” x-variable to estimate y.

I get that.  Thanks.

So, does all this mean that FIP still works???


#31    tangotiger      (see all posts) 2006/09/28 (Thu) @ 12:11

Yes, pretty much.  Treat any difference of less than .1 as insignificant.  As well, make sure to set the fudge factor based on year and league. 

Among the top 9 in runs allowed per 9 innings (happens to be the 9 guys I selected at the beginning), here are their fudge factors:
playerID    fudgeFIP
glavito02   2.91
maddugr01   3.27
clemero02   3.36
martipe02   3.38
brownke01   3.48
mussimi01   3.49
smoltjo01   3.51
schilcu01   3.53
johnsra05   3.55

If I instead take the top 9 in FIP, Glavine drops out and Pettitte comes in.  His fudge is 3.65.

Here are the fudge factors for the 10 worst FIP pitchers (min 5000 PA):
playerID    fudgeFIP
helliri01   3.11
hentgpa01   3.13
rueteki01   3.20
anderbr02   3.29
wakefti01   3.34
suppaje01   3.40
ponsosi01   3.60
mulhote01   3.72
limajo01    3.76
oliveda02   3.76

The FIP is not biased based on quality of pitcher.

