THE BOOK--Playing The Percentages In Baseball

Tuesday, December 29, 2009

Mike Silva Chronicles - Part 4: FIP

By Tangotiger, 05:20 PM

FIP is the one advanced metric that intrigues me. My issue is that it favors players with a large number of strikeouts. A good pitcher doesn’t always have to strike people out, but could still make quality pitches that generate outs.


The formula for FIP is:
(13*HR + 3*[BB+HB-IB] - 2*SO) / IP + 3.2

That “3.2” is just to align it to ERA.  Otherwise, the point is that if you can keep your HR down, your BB and hit batters down, and your K up, then you will have a low FIP.
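As a minimal sketch, the formula above translates directly into a few lines of Python. The function name `fip`, the `constant` parameter, and the sample stat line are hypothetical, chosen only for illustration; the weights and the 3.2 constant are the ones quoted above. It assumes innings pitched are given as a true decimal (220 and one-third innings would be 220.333, not the box-score shorthand 220.1).

```python
def fip(hr, bb, hb, ib, so, ip, constant=3.2):
    """Fielding Independent Pitching, per the formula quoted in the post.

    hr: home runs allowed, bb: walks, hb: hit batters,
    ib: intentional walks, so: strikeouts, ip: innings pitched (decimal).
    """
    return (13 * hr + 3 * (bb + hb - ib) - 2 * so) / ip + constant


# Hypothetical stat line, not a real pitcher season.
sample_fip = fip(hr=20, bb=50, hb=5, ib=3, so=200, ip=220.0)
print(round(sample_fip, 2))
```

Fewer home runs, walks and hit batters, or more strikeouts, all push the result down, which is the point made above.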

This makes it seem that all “contact plays” are the design of luck, which I think you would agree is not the case. 

This is true, that not all contact plays are the same.  But, do we have an issue with OBP?  What does OBP say: all walks, singles, and HR are worth “1”.  Does that make any sense?  No, of course not.  But, OBP is just one component to offense.  There’s also moving runners over, as well as baserunning.

FIP is the same thing: it is only concerned with one component to pitching.  And that component is the one that does not involve his fielders.  The interesting thing with FIP is that the players with the best FIP are also the ERA leaders.  For example, here’s the top 10 FIP of pitchers born since 1962, minimum 10,000 batters faced:
FIP
2.99 Martinez Pedro
3.28 Clemens Roger
3.33 Johnson Randy
3.33 Schilling Curt
3.34 Maddux Greg
3.35 Smoltz John
3.48 Brown Kevin
3.55 Saberhagen Bret
3.60 Gooden Dwight
3.68 Mussina Mike

And here are the top 10 in ERA, under the same criteria:
ERA
2.93 Martinez Pedro
3.12 Clemens Roger
3.16 Maddux Greg
3.28 Brown Kevin
3.29 Johnson Randy
3.33 Smoltz John
3.34 Saberhagen Bret
3.46 Schilling Curt
3.46 Cone David
3.51 Gooden Dwight

Look at the names.  Pedro is #1 in either case, and Clemens is #2 in both.  Even Maddux doesn’t move much.  Schilling and RJ are on both lists, as are Smoltz, Kevin Brown, Bret Saberhagen and Dwight Gooden.  In fact, there is only one player different in the top 10: Mussina is #10 in FIP (he was 12th in ERA), and David Cone is #9 in ERA (he was 11th in FIP).
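The overlap between the two lists can be checked mechanically. The sketch below simply re-keys the two tables above by surname (the variable names are illustrative) and compares them as sets; where two pitchers are tied, they are kept in the order the tables list them.

```python
# Surnames in the order given in the two top-10 tables above.
fip_top10 = ["Martinez", "Clemens", "Johnson", "Schilling", "Maddux",
             "Smoltz", "Brown", "Saberhagen", "Gooden", "Mussina"]
era_top10 = ["Martinez", "Clemens", "Maddux", "Brown", "Johnson",
             "Smoltz", "Saberhagen", "Schilling", "Cone", "Gooden"]

on_both = set(fip_top10) & set(era_top10)
only_fip = set(fip_top10) - set(era_top10)
only_era = set(era_top10) - set(fip_top10)

# Positive shift = ranks lower (worse) on the ERA list than on the FIP list.
rank_shift = {name: era_top10.index(name) - fip_top10.index(name)
              for name in fip_top10 if name in on_both}

print(len(on_both), sorted(only_fip), sorted(only_era))
print(rank_shift)
```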

This is the revelation of DIPS and FIP: even though we completely ignored the number of hits a pitcher allowed (COMPLETELY), we were still able to get a top 10 list using only HR, BB, and SO that pretty much matched their ERA.  That’s astounding, isn’t it?

So, the purpose of FIP is not to dismiss hits allowed and other non-HR contact plays, but simply to break it up into its own component (fielding-independent pitching, FIP).  And, the fantastic byproduct of doing that is that even if you dismiss hits allowed and other non-HR contact plays, you STILL get a very similar answer.

There are exceptions of course, as with everything.  Tom Glavine’s career FIP is about 0.50 runs worse than his career ERA.  This signals that Glavine does something extra: either he can sequence his events better (leaves a lot of runners on base, for example), or he has better control on his balls in play.  And Javy Vazquez’s ERA is about 0.30 runs worse than his FIP, which signals something different: that perhaps he gives up a lot of doubles, or doesn’t sequence his events well, etc.

Overall, two-thirds of pitchers will have their FIP and ERA be within 0.20 runs of each other, and almost all will be within 0.40 runs of each other. 

That’s the power of FIP: that it’s designed to tell you one specific thing, and it tells you a second, perhaps even more important, thing.  This is perhaps the most important sabermetric finding in the last twenty years.  And we all have Voros McCracken to thank.

#1    Dave P.      (see all posts) 2009/12/29 (Tue) @ 17:43

Great answer.


#2    Matthew Cornwell      (see all posts) 2009/12/29 (Tue) @ 17:54

Glavine is such an interesting case, because he just so happens to be very good at just about everything FIP leaves out. Stranding runners/impressive stretch and windup splits?  Check.  BABIP reduction compared to teammates?  Check. High GB rates which drive up GBDP and lower XBH? Check. Hold runners well and limit HB and WP?  Check. Defend your position well? Check. Hit well? Check. 

Of course Glavine was no FIP disaster either.  A 3.90 or so FIP is still better than that of the second-tier class of pitchers of his era: Moyer, Wells, Rogers, Pettitte, etc. And none of them had anywhere near the collection of secondary run producing skills mentioned above. His walk rate was above average and his HR/9 rate was all-time good.  Those pesky K’s kept him from being a top 15 or so all-time pitcher.

Bottom line: Glavine’s case is unique and should be looked at as such. There would need to be a lot more guys like him to prove that FIP doesn’t work too well.


#3    Nick Steiner      (see all posts) 2009/12/29 (Tue) @ 18:06

Tango’s whole point was that FIP doesn’t need to “work” because it is what it is.  It measures, to a very accurate degree, how many earned runs a pitcher would give up per 9 if you gave him no credit for timing or hits allowed.  The problem is when some people think that FIP (or a similar measure like tRA, xFIP, etc.) is the final word. 

There are certainly players who outperform their FIPs year in and year out.  I’m looking at Jarrod Washburn, Tom Glavine, Mark Buehrle....  With those pitchers it’s somewhat foolish to use FIP either to judge their seasons in retrospect (for awards) or to project them.


#4    Matthew Cornwell      (see all posts) 2009/12/29 (Tue) @ 18:14

Agreed, Nick.

I just wish there were advanced statistics out there which did accurately pinpoint the contributions of guys like Glavine.  Seems to me that for long career guys like Glavine, a metric which only takes out defensive quality would be best, such as rWAR.  I guess for short career or single season pitchers, there is too small of a sample size to determine much about the pitchers like Glavine.


#5          (see all posts) 2009/12/29 (Tue) @ 18:50

While I like FIP and it is important for what it does, I prefer tRA as it goes a step farther, weighting the outcome of each batted ball type (Fly ball, popup, ground ball, line drive). 

Like FIP uses linear weights for HR, BB, and K, tRA also uses linear weights and gives values for the results of each batted ball.
http://www.lookoutlanding.com/2008/6/23/557089/the-big-tra-post

There is also tRA+ which regresses the Home Run rate towards the pitcher’s career average to be a better predictor of future performance.

Obviously there are still issues, as you have to correctly identify the batted ball type, etc.
I do expect the results of PitchF/X to give us something better than these stats in the coming years.


#6    Michael      (see all posts) 2009/12/29 (Tue) @ 19:16

@Alex/#5: Technically, you mean tRA*, not tRA+. tRA+ is like ERA+.

Tango’s explanation is excellent. The fact that FIP/ERA are very close for pitchers shows that the average pitcher does indeed have little effect on hit types.


#7    philly      (see all posts) 2009/12/29 (Tue) @ 19:53

Overall, two-thirds of pitchers will have their FIP and ERA be within 0.20 runs of each other, and almost all will be within 0.40 runs of each other.

Is that with the same 10,000 batters faced?

If you’re going to use that as a summary it would be nice to be clear on the minimum and perhaps also address smaller samples as it’s the one year FIP based declarations often make people crazy.


#8    Matthew Cornwell      (see all posts) 2009/12/29 (Tue) @ 20:12

I have a lot of questions regarding tRA*

1. How do the batted ball weights affect a guy like Greg Maddux?  Most GB pitchers would give up a higher number of runs per batted ball since BABIP is higher for GB, traditionally.  However, Maddux was an extreme GBer and a major BABIP reducer.  Likewise, what if a flyball pitcher routinely gives up a lot more hits on FB than average?

I know that tRA* regresses K rates, etc. to league averages, but that seems unfair to pitchers who have proven to have the ability to repeat great performances in each area. 

Also, are the linear weights used in tRA* similar to baseRuns in which each pitcher’s own run environment is calculated or are the weights only compared to a league average baseline?

This is why I like Rally’s WAR for long careers, it takes out defense support and gives credit to the pitcher for everything else. Even though it looks like it suffers from the same GB/FB linear weights assumptions that tRA* does.  Given enough BF, rWAR should give a pretty good idea of a pitcher’s ability, since it includes all of those things I mentioned in post #2 .  Of course FIP, etc. is still best for pitchers who only have a season or two under their belts.


#9    Tangotiger      (see all posts) 2009/12/30 (Wed) @ 00:06

Philly: yes, same pitchers.

The point is that the larger the samples, the more the sequencing and the BABIP converge toward the league mean.

This obviously doesn’t happen with BB/PA, K/PA or HR/PA.


#10    Jud      (see all posts) 2009/12/30 (Wed) @ 03:55

The FIP vs. ERA “top 10” table exposes a problem with FIP. The average FIP number of the “top 10” is 0.11 higher than the average ERA. As presumably the average FIP for all pitchers in baseball is the same as the average ERA, this means that FIP tends to minimize the distinction between pitchers. In other words, a FIP to run converter would need to be non-linear.
This is yet another demonstration that FIP is leaving some value on the table.
I don’t think that I’m being too original here, Colin mentioned this in THT some time ago.
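The gap in the averages can be reproduced from the two top-10 tables in the article. This is only a sketch using the rounded values as printed there; the variable names are illustrative.

```python
from statistics import mean

# Values copied from the two top-10 tables in the article.
fip_values = [2.99, 3.28, 3.33, 3.33, 3.34, 3.35, 3.48, 3.55, 3.60, 3.68]
era_values = [2.93, 3.12, 3.16, 3.28, 3.29, 3.33, 3.34, 3.46, 3.46, 3.51]

mean_fip = mean(fip_values)
mean_era = mean(era_values)
gap = mean_fip - mean_era  # positive: the top-10 FIPs sit closer to the league mean

print(f"mean FIP {mean_fip:.3f}, mean ERA {mean_era:.3f}, gap {gap:.3f}")
```

With the printed values the gap comes out at roughly a tenth of a run. Note that the two lists are not quite the same ten pitchers (Mussina appears only on the FIP list and Cone only on the ERA list).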


#11          (see all posts) 2009/12/30 (Wed) @ 10:59

What do you mean “leaving some value on the table”?

Doesn’t every stat leave some value on the table?


#12    Jud      (see all posts) 2009/12/30 (Wed) @ 11:57

I mean that part of the pitcher’s value is not encompassed by FIP. While we have the positive aspect of removing the effect that fielders have on pitching stats, the Fangraphs version of FIP does not account for some of the value that a pitcher provides.


#13    oldjacket      (see all posts) 2009/12/30 (Wed) @ 12:36

Jud, how is that a problem? An average FIP over a pitcher’s career will still lead to an average ERA, they’re just different averages.


#14    Jud      (see all posts) 2009/12/30 (Wed) @ 12:53

The problem is that while an average pitcher would exhibit the same average FIP and ERA, an excellent pitcher would have a FIP that is closer to the mean than his ERA. So a linear conversion between FIP and runs is not accurate. Of course you could argue that the 0.11 difference in the mean value of the “top 10” is not significant, but I don’t think that is the case.
There was a more thorough write-up on this topic some time ago in THT or BBTF.


#15    Tangotiger      (see all posts) 2009/12/30 (Wed) @ 12:56

It’s not supposed to be “accurate”, just “accurate enough”.  The purpose of FIP is to put those components on an ERA-scale.  If you want to argue that a BaseRuns-type of FIP is better, yes, that’s true.  But, at that point, the pure ease of FIP is lost in the complexity of BsR-based FIP.

FIP is powerful for its sheer simplicity and the incredibly high correlation to ERA.


#16    Colin Wyers      (see all posts) 2009/12/30 (Wed) @ 13:09

FIP is what FIP is. Can we make a better stat? Sure. I think a lot of us have done so at some point.

To go off on a tangent - there are several different kinds of sabermetric stats out there. Some of them are your hammer, your screwdriver and your wrench - consider them your “man-portable” sabermetrics tools. FIP. OPS. Basic RC. Runs Produced. Easy to remember, easy to figure, can be figured using basic box-score stats. If you’re ever stuck with nothing but a copy of that morning’s USA Today and need to do some sabermetric analysis (yeah, “need” may not be the right word there) you have them ready.

You can do a lot of things to improve FIP, depending on your definition of “improve” and the question you are trying to answer. But then FIP isn’t a screwdriver anymore, it’s a power drill with a screwdriver attachment on it. And sometimes that’s what you want - most of us have absurdly powerful computers sitting right in front of us. Why not have them do the math?

But you’re always going to need a screwdriver around. There’s always going to be a place for FIP.


#17    Tangotiger      (see all posts) 2009/12/30 (Wed) @ 13:29

Right, agreed.

OPS for example is fine, if all you have is OBP and SLG handy.  If you actually have all the component numbers, then it’s beyond silly to create OPS out of them.  This is why I am so bothered by Forman’s OPS+.  He’s the only one that creates it, he has an 8-step, very complex process, to calculate it, a process that only he himself goes through.  And yet, he insists on OPS+.

If all you have are HR, BB, SO, and IP, I can’t think of a better way to put them together than FIP.  If you are going to ask me to come up with the “right” ERA, then, yeah, I’ll plug it into the computer, go through a BaseRuns machination, and get you the better ERA.  Most people are happy with FIP because it’s simple, they can do it themselves, it’s open, and it gets you 95% of the way there.

There’s no way Greinke and Bannister quote FIP otherwise.


#18    Tangotiger      (see all posts) 2009/12/30 (Wed) @ 13:31

By the way, I just realized: is FIP even on BPro or B-R.com or Baseball Cube?  Is Fangraphs (and Hardball Times when they had it) the only one that carries it?


#19    Colin Wyers      (see all posts) 2009/12/30 (Wed) @ 13:49

I think First Inning has it for minor leaguers. (They also use wOBA.)

I would’ve sworn that BPro had Voros’ original DIPS ERA somewhere on the site but I’ve looked everywhere and couldn’t find it. Clay has Defense-Adjusted ERA for the DTs, but as far as I can tell that’s more similar to what Rally does for his WAR than FIP.


#20    Tangotiger      (see all posts) 2009/12/30 (Wed) @ 13:55

I sent this to “Ask Bill”, in response to one of his readers talking about FIP being “incomplete”, which pretty much summarizes what I said here.

When I describe Fielding Independent Pitching (FIP), I do so by saying it’s one component, like OBP and SLG are each a component of a hitter.  FIP just happens to be a much larger component of pitching than either (one of) OBP or SLG is of hitting.  FIP doesn’t care about everything about pitching, just like SLG doesn’t care about walks, or OBP doesn’t care about the extra bases on hits.


#21    OaklandA's      (see all posts) 2009/12/30 (Wed) @ 19:31

Good article, but I think it is missing one key point.  The article explains how FIP correlates very well with ERA for most pitchers, and that you can evaluate pitchers well by only looking at K, BB, and HR.  But here is what is missing - why should we evaluate pitchers by FIP instead of actual ERA (or park-adjusted ERA)?

That is the part that many first-time readers of FIP have trouble with.


#22    Nick Steiner      (see all posts) 2009/12/30 (Wed) @ 21:02

Minor league splits also has FIP.


#23    Tangotiger      (see all posts) 2009/12/30 (Wed) @ 21:20

"But here is what is missing - why should we evaluate pitchers by FIP instead of actual ERA”

No one said either/or.


#24    Nick Steiner      (see all posts) 2009/12/30 (Wed) @ 21:25

ERA is probably better for retrospective, because it includes timing of events and some BABIP skill.  FIP is definitely better for predicting future performance.


#25    Matthew Cornwell      (see all posts) 2009/12/31 (Thu) @ 01:39

Which “leaves out” more?

ERA - defensive support, leveraging/quality of batters faced, park factors, bullpen support, the pitcher’s responsibility regarding unearned runs

FIP - event timing/sit. splits/LOB%, etc., what BABIP skill does exist, pitcher defense, DP inducing, HBP, WP, leveraging/quality of batters faced, park factors, XBH prevention, pick-offs, controlling running game

Most pitchers can’t prevent enough runs by controlling the running game or limiting doubles or defending their position well in any given season to make a huge difference in their FIP or ERA.  That is why FIP works so well at a seasonal level - it leaves in what pitchers control the most, and as a fair trade-off for most pitchers (the Glavines being examples), takes out what is least impactful and controllable.  However, over 15-20 seasons, those secondary run prevention tools add up to be tons of runs for many pitchers.

Take RA+ - if you could just adjust for defensive support you should get pretty close to “true” RA+ for long career guys.  BABIP and HR/FB have had enough PAs to stabilize, leveraging and quality of batters faced is not a huge factor for modern starters, park is considered already, and bullpen support tends to be a smallish factor for most pitchers over long careers.  Outside of defensive support, what else would dramatically skew a long-tenured pitcher’s “real” RA+ level?

I guess my point is, given a very long career, some defensive-adjusted RA+ would be better than FIP or ERA.  And then use FIP for future performance and evaluating pitchers with only a handful of seasons under their belt.  FIP definitely is very useful.  Like many have said, it does what it is intended to do.


#26    OaklandA's      (see all posts) 2009/12/31 (Thu) @ 03:16

[23] Because the article does not explain why anyone should care about FIP as an evaluation tool.  If all you knew about FIP was this article, it looks like ERA is superior, and FIP comes close but doesn’t quite get there. 

Maybe some discussion about predictive value would help, or a brief summary of why Fangraphs uses FIP rather than ERA in WAR calculations.  I know there are explanations of this on the FanGraphs site, but I thought the intent of this article was to be a summary of the value of FIP.


#27    Tangotiger      (see all posts) 2009/12/31 (Thu) @ 11:42

Right, the point was to describe FIP the way you would describe OBP or SLG.  You can take it a step further, but that’s outside the scope of this particular Q&A.


#28    Voros McCracken      (see all posts) 2010/01/01 (Fri) @ 18:29

If you use a base runs type formulation like here:

http://www.vorosmccracken.com/?page_id=14

You do a lot to take care of the problem with the narrowed scale.

Back in the days of runs created, the big problem with the stat for individual hitters was that Frank Thomas doesn’t play on a team of Frank Thomases, so a stat scaling itself like that is scaled wrong. Hence why linear weights type models have been preferred.

However that’s mostly not the case for a starting pitcher, so functions that don’t work in a linear fashion (like runs created or better yet base runs) become more applicable.

This change mostly just affects the width of the scale, but it can move a few guys up a few places if they have an unusual skill set (lots of home runs with few baserunners or lots of baserunners with few home runs).


#29          (see all posts) 2010/01/04 (Mon) @ 11:20

I could be wrong here but I think that FIP can also be good at showing luck levels for smaller sample sizes.  When looked at over an entire career the sample is large enough that luck can even out and you have a leaderboard that is pretty much the same as ERA.

When looking at a single season or even just a few starts there is a larger variance and one can tell if the pitcher is pitching well within the things he can control even if the ERA says otherwise.


#30          (see all posts) 2010/01/06 (Wed) @ 13:01

I’m doing a bit of work on the Orioles pitching staff for 2010 and was looking at FIP for 2009.  Orioles starters (combined) had a 5.38 FIP and 5.37 ERA.  O’s relievers, however, posted a 4.49 FIP and a 4.83 ERA.  Does FIP scale to the team level?  What about for relievers?  A 550 IP sample seems fairly large, so is this a situation where the O’s relievers were particularly bad at one “thing”?


#31    Tangotiger      (see all posts) 2010/01/06 (Wed) @ 14:08

Dude, do it for all the teams, not just 1.  Or, even better, why not let Fangraphs do it for you:

http://www.fangraphs.com/teams.aspx?pos=all&stats=rel&lg=all&type=1&season=2009&month=0

Look at the “E-F” column.  Yes, there is a bit of reliever bias.  We already knew that though, just not the extent of it.
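As a rough sketch of what that column holds, assuming “E-F” means ERA minus FIP, here is the calculation for the Orioles figures quoted in #30. The dictionary layout and names are illustrative only.

```python
# 2009 Orioles figures as quoted in comment #30.
orioles_2009 = {
    "starters": {"era": 5.37, "fip": 5.38},
    "relievers": {"era": 4.83, "fip": 4.49},
}

# E-F: positive means the ERA was worse than the FIP suggests.
e_minus_f = {role: round(stats["era"] - stats["fip"], 2)
             for role, stats in orioles_2009.items()}

for role, diff in e_minus_f.items():
    print(f"{role}: E-F = {diff:+.2f}")
```

One team is a single data point; the same calculation across all teams is what shows whether relievers as a group run an ERA above their FIP.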


#32          (see all posts) 2010/01/06 (Wed) @ 14:48

Thanks, Tango.  Fangraphs seems to have all my questions answered way before I come up with them.


#33    Alex Krolewski      (see all posts) 2010/01/09 (Sat) @ 14:44

Matthew/25:

In fact, a “defense-adjusted RA+” already exists: PZR, which takes a pitcher’s runs allowed, and removes defensive support by subtracting the UZR compiled behind him.  You can find discussion about PZR on this blog here: http://www.insidethebook.com/ee/index.php/site/comments/pzr/.

In addition, I also calculated PZR RA and PZR win values here at Beyond the Boxscore: http://www.beyondtheboxscore.com/2009/11/7/1098740/pzr-based-win-values-2001-2006
I didn’t calculate the UZR part of PZR myself; instead I used data that MGL gave me.  All I did was calculate the pitcher’s expected RA, and then I used that number to calculate the pitcher’s WAR.


#34    Matthew Cornwell      (see all posts) 2010/01/09 (Sat) @ 20:36

Alex - I have looked at that and love it for long career pitchers.  The consistent, long-career FIP out-performers do well with PZR. Doesn’t seem to be much different from rWAR, however, besides using UZR instead of TZ.


#35    Alex Krolewski      (see all posts) 2010/01/09 (Sat) @ 23:09

Right--over a long career, PZR should be almost exactly the same as rWAR.  However, Rally subtracts a pro-rated fraction of the team’s defensive rating rather than subtract the actual defensive rating behind the pitcher.  This leads to some small differences on a seasonal scale, but they tend to cancel out over a pitcher’s career.

