THE BOOK--Playing The Percentages In Baseball

Wednesday, August 05, 2009

The best of this week at BPro

By Tangotiger, 10:27 AM

In an effort to be more balanced and highlight the “straight arrows” (© Ted Tevan) at BPro, please feel free to use this thread to link to and discuss any of the good recent work at BPro (preferably from a saberist angle).


#1    JayM      (see all posts) 2009/08/05 (Wed) @ 11:19

Not sure if this counts as this week, but here’s an article by Clay from last Monday analyzing the effects of the trades in terms of Rany’s playoff odds generator:

http://www.baseballprospectus.com/article.php?articleid=9327


#2    Colin Wyers      (see all posts) 2009/08/05 (Wed) @ 11:27

That article uses the updated in-season PECOTAs, which are frankly wrong.


#3    Tangotiger      (see all posts) 2009/08/05 (Wed) @ 11:32

Nate had a very good article a few years ago that showed the change in playoff odds, and what constitutes the sweet spot.  Just last week, I posted a similarly themed article.

I love these articles, because they bridge a lot of what we do as saberists into something actionable and easy to understand for the mainstream.


#4    Tangotiger      (see all posts) 2009/08/05 (Wed) @ 11:40

If it uses the in-season PECOTAs which we discussed last week, then yes, those can be (at best) classified as questionable, and realistically as wrong.

Let me propose the best way to test something in-season: the forecast of the final total made at day N-x and the one made at day N should be the same.  The more they differ, the less reasonable the forecast.

Let me expand on that, and I’ll use a team forecast to make life easier for me.  Let’s say that today is April 1, and there are 162 games to go for the Expos.  And I say they will win 89 games.

After game #1, which they win, my forecast is for them to win 89 more games, plus the one they won, for a total of 90.

After game #2, which they win, my forecast is for them to win 88 more games, plus the two, for a total of 90.

After game #3, which they win, my forecast is now for 91.

After game #4, which they lose, my forecast remains at 91.

And so on and so on.

If by the end of the season they end up winning 98 games, then I was off by 9 before game 1, 8 after game 1, 8 after game 2, 7 after game 3, 7 after game 4, and so on.
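The bookkeeping in this example can be sketched in a few lines of Python. The forecast totals are the ones from the Expos example (89 before game 1, then 90, 90, 91, 91 after games 1 through 4) and 98 is the hypothetical final win total; the function name is made up for illustration.

```python
def forecast_errors(forecast_totals, final_wins):
    """Absolute miss of each dated forecast against the eventual win total."""
    return [abs(final_wins - total) for total in forecast_totals]

# Forecast of the season win total before game 1,
# then after games 1, 2, 3 and 4.
totals = [89, 90, 90, 91, 91]
errors = forecast_errors(totals, final_wins=98)

for label, err in zip(["pre-season", "after G1", "after G2", "after G3", "after G4"], errors):
    print(f"{label}: off by {err}")
```

A forecasting system that swings less from day to day, for the same final accuracy, would show a flatter list of errors here.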

I further propose that the day-to-day Marcels be used as the standard benchmark, and then you can run all the forecasting systems against the Marcels to see how well they do.


#5    Colin Wyers      (see all posts) 2009/08/05 (Wed) @ 12:07

Tom, I tested the in-season updating methodology against the original, preseason PECOTAs at 100, 200, 300 and 400 PAs, from 2006-2008. In all cases, RMSE was lower for the preseason PECOTAs.


#6    Tangotiger      (see all posts) 2009/08/05 (Wed) @ 12:39

Colin: it would be wonderful if you could write that up in a blog or article.

What you are saying is that it’s better to use the preseason PECOTA than the in-season PECOTA (which is made up of the preseason PECOTA and current-season data).

Clearly, that should not happen, and can only happen if the current-season data is being overweighted.


#7    Colin Wyers      (see all posts) 2009/08/05 (Wed) @ 13:51

I swear, you can’t make this up. From an article today on FIP:

http://baseballprospectus.com/article.php?articleid=9342#commentMessage

Imagine if the entire baseball blogosphere started using the original Runs Created formula—the one Bill James developed circa Off The Wall—as our primary way of valuing a player’s offensive contribution. Forget run environments, linear weights, league adjustments, and all of the other things we’ve learned over the past thirty years; instead, for the sake of efficiency, we went back to (H + BB) * (TB) / (PA). Maybe it’s not perfect, but hell, it’s easy, and it’s not like Willie Bloomquist is going to come out better than Adam Dunn.

What, you mean like VORP?


#8    Tangotiger      (see all posts) 2009/08/05 (Wed) @ 14:03

Would someone be kind enough to quote the parts where FIP is mentioned, so that I can confirm or refute those statements?

One of the commenters said:
“Great work, Shawn! I read somewhere recently that FIP was flawed, but no explanation was given. Now I know why.”

And I’d like to see why he says that.

Is OBP “flawed” for counting a walk and HR the same?  Didn’t think so.


#9    JayM      (see all posts) 2009/08/05 (Wed) @ 16:23

"If you follow these things, you know where I’m going with this: HR/9—or HR/PA, depending on how you want to look at it—was one of Voros McCracken’s original Three True Outcomes way back in 1999, and it’s been treated as such ever since. But that didn’t make sense then, and it doesn’t make sense now—a pitcher’s home run-rate isn’t nearly as stable from year to year as his strikeout and walk rates, a fact that Voros himself noted in his early articles. Logically enough, when it’s used as a major component of a defense-independent pitching stat, it makes that metric less stable as well.

This isn’t new ground. There’s been plenty of research done on the subject, and the explanation is pretty clear: a pitcher’s HR/FB rate correlates about as well as his BABIP from year to year, especially after you adjust for park effects. Over the course of several years, there will be statistically significant differences between pitchers. But that simply doesn’t manifest itself every single year, and if you’re trying to evaluate a pitcher on a season-by-season basis, or in the middle of a season, it’s probably better to just leave it out.”

“the year-to-year r-squared coefficients for QuikERA and xFIP center around 0.45, whereas FIP (which uses HR, K, BB, and HBP) comes in at 0.25, and ERA and RA are around 0.10. (These numbers are based on single season data 2004-2008. A different dataset might give you slightly different results, but should always lead to the same conclusion.)”

“Yet despite the obvious logic of using GB% or a normalized HR/FB number as the third true outcome for single-season stats, the uptake in the blogosphere has been oddly slow. FIP has largely become the de facto rate stat for pitchers, and while it’s useful over the long haul, it leans far too heavily on HR/9 to be used for shorter time spans.”


#10    Zach      (see all posts) 2009/08/05 (Wed) @ 16:33

"Logically enough, when it’s used as a major component of a defense-independent pitching stat, it makes that metric less stable as well.”

That’s weird, because home runs ARE defense-independent, right? Right.

Also, who cares if QERA is more stable than FIP? QERA uses SO, BB, and GB, whereas FIP uses SO, BB, and HR--and obviously, ground balls are more stable than HR.

The important thing is whether QERA predicts future ERA better or worse than FIP or xFIP or tRA (did he bring up xFIP at all?), like Colin tested at THT a few weeks ago.


#11    Zach      (see all posts) 2009/08/05 (Wed) @ 16:39

Something I forgot to bring up in #10: Does it make any difference to the r^2’s that QERA’s constant is a fixed 2.69, and FIP’s constant changes each year based on the league average ERA and unadjusted FIP?

Shawn tested his r^2’s from 2004-2008. Here are the league FIP constants in that time…

2004 3.14
2005 3.10
2006 3.24
2007 3.33
2008 3.22
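For anyone wondering how a yearly constant like the ones listed here is derived, a minimal sketch follows. It assumes the commonly published form of FIP, (13*HR + 3*(BB+HBP) - 2*K)/IP plus a constant, with the constant chosen so that league FIP equals league ERA. The league totals in the usage lines are made-up round numbers, not real data, and the function names are hypothetical.

```python
def raw_fip(hr, bb, hbp, k, ip):
    """FIP before the league constant is added."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip

def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Constant that makes league-wide FIP equal league ERA for that year."""
    return lg_era - raw_fip(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)

def fip(hr, bb, hbp, k, ip, constant):
    return raw_fip(hr, bb, hbp, k, ip) + constant

# Made-up league totals, for illustration only.
league = dict(lg_hr=5000, lg_bb=16000, lg_hbp=1700, lg_k=32000, lg_ip=43000)
c = fip_constant(lg_era=4.30, **league)
print(f"constant = {c:.2f}")
```

Because the constant floats with the league while QuikERA's 2.69 is fixed, pooling several seasons mixes a small league-level shift into one metric but not the other.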


#12    Nick      (see all posts) 2009/08/05 (Wed) @ 16:48

The important thing is whether QERA predicts future ERA better or worse than FIP or xFIP or tRA (did he bring up xFIP at all?), like Colin tested at THT a few weeks ago.

I disagree with that.  An ERA predictor is not the thing we should strive for in these metrics.  The point of FIP/tRA/xFIP is to measure skill, and one way to do that is to test repeatability.


#13    Colin Wyers      (see all posts) 2009/08/05 (Wed) @ 17:05

Nick, some things are very repeatable but don’t carry a lot of value information with them at all. We don’t really care if something predicts itself well unless we already know that something measures value well.

And now let me express myself very carefully, so that there is no confusion here: RA is a very good measure of the value of populations of pitchers - we are very confident that if we take a group of guys that all had a 3.00 RA in a season and compared them to a group of guys who had a 5.00 RA in a season (controlling for IP, of course), as a group the 3 RA pitchers are better than the 5 RA pitchers.

What we are interested in is how closely, based upon a set of observations, we can estimate a pitcher’s talent level. (Sort of. There is a very strong argument for using FIP/tRA to measure past performance, even though they don’t predict future performance as well as xFIP, because from a value perspective we are only interested in isolating a pitcher’s contributions from those of his teammates, not from other factors like “luck” or random variation around an estimated true talent level or whatever you want to call it.)


#14    JayM      (see all posts) 2009/08/05 (Wed) @ 19:07

He does talk about xFIP, and highlights it as a good metric of what he’s trying to accomplish. But the question called for discussion of FIP, and I don’t know what the rules are for quoting too much of a pay site.

In any event, the basic gist of his argument is that FIP uses HR rates, which aren’t nearly as repeatable as groundball rates, so it isn’t a good predictor of future success.

For me, that’s what I care about with a metric like that. If I simply wanted to describe what has happened in the past, I’d just look at the RA/9 in that player’s starts. With a stat like FIP, I want to know if that is a fluky thing or if I can value that player highly going forward.


#15    Tangotiger      (see all posts) 2009/08/05 (Wed) @ 19:17

You can quote what you want as long as it’s fair use or for educational purposes.  If you are not sure, email me what you want, and I’ll quote it.

***

FIP is what it is.  It has no “flaw”.  If someone uses it in a way it’s not intended to be used, that’s not a FIP flaw, but a user flaw.

***

Colin is right that you want to correlate the metric to Runs per 9IP (you do NOT want to correlate to itself in the future, or to ERA).

I also don’t believe these numbers in the article:

the year-to-year r-squared coefficients for QuikERA and xFIP center around 0.45, whereas FIP (which uses HR, K, BB, and HBP) comes in at 0.25, and ERA and RA are around 0.10.

Does he provide a reference?

***

As for QuikERA, I already gave my analysis of the metric the day it came out.  It hasn’t changed.  The FLAW in it is that GB% is used (i.e., GB per BIP).  If, for example, you have a high TTO pitcher (Pedro, RJ, Armando Benitez) and a low TTO guy (Radke, Rueter), and they had the same GB%, they’d get the same impact from that part of the equation. 

But, that makes no sense.  What if, for example, you had someone with 95% TTO.  Then, what does it matter if he has 80% GB or 10% GB?  Only 5% of his PA are BIP anyway.

I already noted how I handle it, and that is by looking at GB MINUS FB per PA and applying a factor to that.

***

I also have a non-HR version of FIP, which I call kwERA, and it’s 5.4 - 12*(K-BB)/PA.  This was inspired by GuyM (who has lots of great ideas by the way).
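As a quick illustration of the kwERA formula, here is a small Python sketch. The function name and the sample pitching line are made up; only the formula itself comes from the comment.

```python
def kw_era(k, bb, pa):
    """kwERA as given above: 5.4 - 12 * (K - BB) / PA."""
    return 5.4 - 12 * (k - bb) / pa

# Hypothetical pitcher: 1000 batters faced, 200 strikeouts, 100 walks.
print(round(kw_era(k=200, bb=100, pa=1000), 2))
```

A pitcher whose strikeouts equal his walks lands exactly on the 5.4 baseline, and every extra net strikeout per 100 batters faced moves the estimate down by 0.12.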


#16    Nick      (see all posts) 2009/08/05 (Wed) @ 19:51

Yeah, I doubt that xFIP and QERA have a ~.70 year to year correlation.  I’d like to see the numbers on that too.

Also, Tango, that kwERA formula is crazy.  I’ve been thinking about it for 10 minutes and I still can’t see why it would work.


#17    Matthew Cornwell      (see all posts) 2009/08/05 (Wed) @ 21:24

Since many pitchers with long careers do show ability to suppress BABIP (compared to teammates) and HR/FB, and since FIP and FIPx do not directly consider the long-term accumulating effects of controlling the running game, wild pitches, balks, GB% rate (which affects XBH rates and GIDP), situational split (stretch/windup) data that may be a significant “skill” in some cases (larger percentage of walks with 1B open, etc.), and any “real” LOB% results that aren’t already captured by the “TTO”, how many BFs does a pitcher need before FIP or FIPx are no longer effective ways to evaluate their careers?  I have no concerns or questions about using FIP for evaluating young pitchers with limited data or predicting general performance for a vast majority of pitchers without track records.  But is it reliable for pitchers who have had more than enough BFs to almost regress completely in the partial-pitcher, partial-luck, run prevention tools?

I have often wondered, for pitchers with 10,000 or more PAs, if ERA+ is not closer to “true talent” than FIP.  ERA+ leaves out team defense and quality of batters faced (not a big issue for post-WW2 pitchers) and has shaky park effects. FIP leaves out everything I mentioned above.  Which one leaves out more? Many people use FIP as a career evaluation method for guys with tons of BFs and I am not sure if this is a very good use of the stat.  I think Tom is right: a problem with the user, not the stat.

In conclusion, I want to know your opinions on whether the following statement is correct:

FIP is a great stat to predict future performance or get a good idea of the skill level of pitchers without a lot of history, but not so great at evaluating pitchers with very long careers?


#18    Tangotiger      (see all posts) 2009/08/05 (Wed) @ 22:44

If I’m going to predict future performance, and I know the pitcher is pitching on the same team as last year, then why would I want to ignore his or the team’s DER?

Answer: you don’t.


#19          (see all posts) 2009/08/06 (Thu) @ 02:54

Tango, for QuikERA, if GB% were changed from a percentage of balls in play to a percentage of outs recorded, how would QuikERA stack up as a metric in your eyes?


#20    Tangotiger      (see all posts) 2009/08/06 (Thu) @ 07:13

You’d have to do GB minus FB per out or per IP.

Otherwise, you’d get the same problem:

7 GB, 3 FB, 7 IP
7 GB, 7 FB, 7 IP

Which guy gave up more HR?

Indeed, you might simply do FB per IP, which gives you xFIP.


#21    Matt Swartz      (see all posts) 2009/08/06 (Thu) @ 09:29

Tango, I agree that xFIP and QuikERA are going to be more predictive, but I’m curious about the flaw in using GB/BIP.  I’m missing something in the middle.  Since QuikERA is a quadratic, I thought that took into account that high TTO pitchers will give up fewer BIP?  The QuikERA formula is:

(2.69-.66*GB%-3.4*K%+3.88*BB%)^2

so the derivative with respect to GB% is:

(-.66)*(2)*(2.69-.66*GB%-3.4*K%+3.88*BB%)

The derivative is negative, but for higher K% it is less negative, since there are fewer balls in play.  For BB%, it has a larger impact since HR come with more men on base.  That seems intuitive to me, but I hadn’t thought much about the GB% issue.


#22    Tangotiger      (see all posts) 2009/08/06 (Thu) @ 10:14

Matt, let’s take an extreme example.  Now, when I take an extreme example, it is ONLY to show the flaw in the design, or at least a limitation in how much you can use it.  It’s not necessarily to say that it’s useless.  After all, FIP is mostly-linear, and we know that’s not how run creation works.  FIP, after all, was designed as a quick way to derive DIPS, without the machinations that Voros needs to go through to reconstruct the pitching line.

Anyway, let’s start with something basic: GB per BIP of .30, K per PA of .20 and BB per PA of .10.  Right away, you have to ask: “Why the heck am I allowed to add numbers with different denominators?”  That’s the OPS issue, right?  But, let’s leave that aside for the moment.  The quick ERA comes in at 4.84.

Now, let’s change the K per PA to .54 and the BB per PA to .40.  Leave the GB per BIP as constant.  The quick ERA is now 4.88.  This is our baseline.  Change the GB per BIP to .99.  Quick ERA is now 3.07.  Change the GB per BIP to .01.  Quick ERA is now 5.76.

Explain to me how, if all you have is 6% of your PA that end up with a BIP, it matters so much what the GB rate is?  Indeed, regardless of whether 1% or 99% of your PA are balls in play, the GB rate per BIP will affect your quick ERA equally!

It is, of course, impossible.  Quick ERA suffers from the same denominator issue that plagues OPS.
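The four quick ERA figures in this example can be reproduced with a short Python sketch. The formula is the one quoted in comment #21; the function name and case labels are made up for illustration.

```python
def quick_era(gb_per_bip, k_per_pa, bb_per_pa):
    """QuikERA as quoted in this thread: GB is per ball in play, K and BB per PA."""
    return (2.69 - 0.66 * gb_per_bip - 3.4 * k_per_pa + 3.88 * bb_per_pa) ** 2

cases = [
    ("ordinary line",         0.30, 0.20, 0.10),
    ("94% K+BB, GB/BIP .30",  0.30, 0.54, 0.40),
    ("94% K+BB, GB/BIP .99",  0.99, 0.54, 0.40),
    ("94% K+BB, GB/BIP .01",  0.01, 0.54, 0.40),
]
for label, gb, k, bb in cases:
    print(f"{label}: {quick_era(gb, k, bb):.2f}")
# Prints 4.84, 4.88, 3.07 and 5.76, matching the figures in the comment.
```

The last two rows differ by about 2.7 runs even though only 6% of plate appearances reach a ball in play, which is the denominator problem in a nutshell.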

Now, the correction to this is so darn simple, and I’ve pointed it out numerous times: use the same denominator as K and BB, and that is per PA.

You can’t just use GB per PA, since now you are lumping in FB and LD together.  All you have to do is use GB minus FB per PA.

The weight of that will be about one-fourth the weight you place on K or BB.  I’ve also shown the proof for that in the past.

Here for example are the linear weights values for various events:
-.28 runs: strikeouts
-.12 runs: ground balls
-.12 runs: flyballs, excluding HR
+.32 runs: walks, line drives
+1.40 runs: flyballs, HR only

Now you can collapse the two flyball lines by taking 90% of one and 10% of the other to get:
+.03 runs: flyballs

As you can see, the difference between GB and FB is .15 runs, which is about one-fourth the difference between K and BB.
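The arithmetic behind the "one-fourth" claim can be checked directly. This sketch uses only the run values listed in this comment and the 90/10 split between non-HR and HR flyballs; the variable names are made up.

```python
# Linear weights run values from the list above.
RUNS = {
    "K": -0.28,
    "GB": -0.12,
    "FB_non_HR": -0.12,
    "BB": 0.32,
    "FB_HR": 1.40,
}
HR_SHARE_OF_FB = 0.10  # 10% of flyballs are HR, per the comment

# Collapse the two flyball lines into one blended value.
fb = (1 - HR_SHARE_OF_FB) * RUNS["FB_non_HR"] + HR_SHARE_OF_FB * RUNS["FB_HR"]

gb_fb_gap = fb - RUNS["GB"]        # how much worse a FB is than a GB
k_bb_gap = RUNS["BB"] - RUNS["K"]  # how much worse a BB is than a K

print(f"flyball value: {fb:+.2f}")
print(f"GB/FB gap: {gb_fb_gap:.2f}, K/BB gap: {k_bb_gap:.2f}, ratio: {gb_fb_gap / k_bb_gap:.2f}")
```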

Now, the K,BB version of “FIP” would be:
5.4 - 12*(K-BB)/PA

If you want to introduce GB and FB, you simply end up with:
x - 12*(K-BB)/PA - 3*(GB-FB)/PA

Set “x” so that it matches the league-year in question.

This is the batted-ball version of FIP, which really, will probably be pretty close to xFIP.
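A minimal sketch of that batted-ball formula, including one way to set "x", is below. It assumes "matching the league-year" means choosing x so that the league totals reproduce the league's actual rate (ERA or RA, whichever scale you want); that reading, the function names, and the league totals are assumptions for illustration, not real data.

```python
def bb_fip(k, bb, gb, fb, pa, x):
    """Batted-ball FIP as given above: x - 12*(K-BB)/PA - 3*(GB-FB)/PA."""
    return x - 12 * (k - bb) / pa - 3 * (gb - fb) / pa

def league_x(lg_target, lg_k, lg_bb, lg_gb, lg_fb, lg_pa):
    """Solve for x so that the league as a whole comes out at lg_target."""
    return lg_target + 12 * (lg_k - lg_bb) / lg_pa + 3 * (lg_gb - lg_fb) / lg_pa

# Made-up league totals.
x = league_x(lg_target=4.50, lg_k=32000, lg_bb=16000, lg_gb=58000, lg_fb=48000, lg_pa=186000)

# Hypothetical pitcher: 800 PA, 180 K, 60 BB, 260 GB, 190 FB.
print(round(bb_fip(k=180, bb=60, gb=260, fb=190, pa=800, x=x), 2))
```

With GB equal to FB the last term drops out and the formula reduces to the K/BB-only version with a different constant.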


#23    Matt Swartz      (see all posts) 2009/08/06 (Thu) @ 19:54

I see the criticism about BIP vs. PA as the denominator.  The GB-FB thing makes less sense as described.  Given the lack of autocorrelation with pitcher line drive rate for 2003-2008 data, I don’t really think pitchers affect line drive rate.  Does the GB-FB thing assume that line drive rate is a skill?

It seems like developing new coefficient for GB/BIP in place of GB/PA would be better than xFIP because it would take into account that non-linear nature to run scoring, the interaction between GB and BB and between GB and K, and wouldn’t suffer from the bias you mention above?


#24    Tangotiger      (see all posts) 2009/08/07 (Fri) @ 14:45

Does the GB-FB thing assume that line drive rate is a skill?

No, it presumes that the LD rate is irrelevant.

For example, say you have 100 contacted pitches: 40 result in GB, 30 in FB and 30 in LD.

Say you have someone else with 50 GB, 40 FB, 10 LD.

All that is happening is that the excess LD are distributed equally (not proportionately) to GB and FB.

You could, if you want to make it more complicated, distribute it proportionately.  But, when you have such an elegant solution already, why complicate it?
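The two ways of handing out the excess line drives can be compared with a short sketch. It uses the 40/30/30 and 50/40/10 pitchers from this comment and moves the first pitcher's line drives down to the second pitcher's level; the function names are made up.

```python
def reallocate_equal(gb, fb, ld, target_ld):
    """Split the excess line drives evenly between GB and FB."""
    excess = ld - target_ld
    return gb + excess / 2, fb + excess / 2

def reallocate_proportional(gb, fb, ld, target_ld):
    """Split the excess line drives in proportion to existing GB and FB."""
    excess = ld - target_ld
    gb_share = gb / (gb + fb)
    return gb + excess * gb_share, fb + excess * (1 - gb_share)

# Pitcher A: 40 GB, 30 FB, 30 LD.  Pitcher B has only 10 LD.
eq_gb, eq_fb = reallocate_equal(40, 30, 30, target_ld=10)
pr_gb, pr_fb = reallocate_proportional(40, 30, 30, target_ld=10)

print(f"equal:        GB={eq_gb:.1f} FB={eq_fb:.1f} GB-FB={eq_gb - eq_fb:.1f}")
print(f"proportional: GB={pr_gb:.1f} FB={pr_fb:.1f} GB-FB={pr_gb - pr_fb:.1f}")
```

Under the equal split, pitcher A lands exactly on pitcher B's 50 GB and 40 FB, so GB minus FB is 10 for both and the line drive count never has to enter the formula.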

It seems like developing new coefficient for GB/BIP in place of GB/PA would be better than xFIP

Well, you will still be stuck with the same issue I have shown about the denominators.  It’s not the coefficient, it’s the denominator that is the issue.

non-linear nature to run scoring

When Voros does his DIPS, he uses a non-linear version (I think).  It’s long and complicated.  I use FIP and the correlation between DIPS and FIP is r=.99.

So, non-linear may be better, but is it really worth it?

I mean, FIP is referenced 99% of the time and the full DIPS equation is referenced 1% of the time.  Why do you think that is?

And my version of the batted ball FIP is almost as simple as FIP.  It’s a natural extension to it.

That said, try to come up with a non-linear version, run a correlation against my batted ball version of FIP (I guess I’ll call it bbFIP), and tell me how well they link up.


#25    Colin Wyers      (see all posts) 2009/08/07 (Fri) @ 16:08

What the world doesn’t need (as far as I can tell) is another nonlinear run estimator. We have BaseRuns. And that’s the best place to start.

What I would do in this case is use the batted ball components to build a pitcher’s estimated defense-independent batting line allowed and run that through BaseRuns.


#26    Tangotiger      (see all posts) 2009/08/07 (Fri) @ 16:22

Colin/25: that’s basically what Voros does (but he doesn’t use BaseRuns).  Given what we are after, we really don’t care if the equation comes out to an ERA of 3.24 or 3.37 or 3.11.  They are “close enough” for our purposes.

Hopefully, with Matt and Eric on board over there, they can look at this issue and make the improvements to quick ERA needed, or simply scrap it in favor of bbFIP.


#27    Patriot      (see all posts) 2009/08/07 (Fri) @ 17:57

Which incarnation of Voros’ formula are we talking about here?  DIPS 2.0 used linear weights of some sort(*).  More recently on his blog he did use BsR, albeit with some interesting coefficients in the A and C factors (but not the batted ball components) (**).

In the spreadsheets I post every year at the end of the season, I figure a BsR standard DIPS (i.e. just assuming that $H = Lg$H).  I was doing FIP for a couple of years, but figured I may as well use something different since lots of sources have FIP.  Despite that, I agree that it’s overkill unless you are going to use the number in a WAR calculation or some other application where the difference between 3.37 and 3.11 might make a substantial difference. 

(*) http://www.baseballthinkfactory.org/mccracken/dipsexpl.html

(**) click my name


#28    Colin Wyers      (see all posts) 2009/08/07 (Fri) @ 23:05

Using Retrosheet data from 2003-2008, here’s the rates per BB type:

BB    1B       2B       3B       HR      Outs
F     0.0571   0.0815   0.0121   0.1169  0.7365
G     0.2167   0.0187   0.0010   0.0000  0.8013
L     0.5160   0.1768   0.0149   0.0241  0.2935
P     0.0165   0.0042   0.0002   0.0000  0.9754
A     0.2120   0.0646   0.0066   0.0371  0.7023

Where F is flyball, G is groundball, L is line drive, P is popup and A is all batted balls, which we’ll abbreviate CON (for contact).

Let’s suppose that we want a prospective, not retrospective, measure of talent - that is to say, something like xFIP. This is not going to be the “right” or “most right” way of doing it - I mean, there’s a tension between doing it right and doing it simple, and I’m going to lean toward simple, although a bit less towards simple than xFIP (but much more so than, say, PECOTA).

First, figure the league average for LD/CON.

Take total batted balls and subtract line drives. (Also, subtract unknown batted balls.) We’ll call that U, for no reason other than I need a letter to call it. Then figure a pitcher’s G/U, F/U and P/U. Then multiply CON by league LD/CON (save that), subtract that from CON, and we get adjusted U, or AdjU. Then take G/U etc. times AdjU. That gives us an estimate of a pitcher’s “true” batted-ball rates.

Then take our adjusted CON totals and multiply out by the rates per BB type, and combine with BB and K rates to get a pitcher’s batting line “allowed.” From there it’s really trivial to use BsR to give us runs allowed. Add our estimated outs on contact to strikeouts and divide by 3 to get adjusted IP, and it should be easy to come up with estimated RA.
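The steps above, up to the batting line and adjusted IP, can be sketched in Python as follows. The per-type rates are copied from the table in this comment. The league LD/CON value, the sample pitcher, and all function names are made up for illustration, unknown batted balls are assumed to be dropped before CON is computed, and the final run-estimator step (BsR) is left out.

```python
EVENTS = ("1B", "2B", "3B", "HR", "Outs")

# Per-batted-ball rates from the 2003-2008 table above.
RATES = {
    "F": (0.0571, 0.0815, 0.0121, 0.1169, 0.7365),
    "G": (0.2167, 0.0187, 0.0010, 0.0000, 0.8013),
    "L": (0.5160, 0.1768, 0.0149, 0.0241, 0.2935),
    "P": (0.0165, 0.0042, 0.0002, 0.0000, 0.9754),
}

def adjusted_batted_balls(g, f, p, l, lg_ld_per_con):
    """Replace the pitcher's line drives with the league rate, keeping his G/F/P mix."""
    con = g + f + p + l            # contact, unknown batted balls already removed
    u = con - l                    # non-line-drive contact
    adj_l = con * lg_ld_per_con    # line drives at the league rate
    adj_u = con - adj_l            # what is left for G, F and P
    return {"G": g / u * adj_u, "F": f / u * adj_u, "P": p / u * adj_u, "L": adj_l}

def batting_line_allowed(adj, k):
    """Expected hits by type and outs on contact, plus adjusted IP."""
    line = {event: 0.0 for event in EVENTS}
    for bb_type, count in adj.items():
        for event, rate in zip(EVENTS, RATES[bb_type]):
            line[event] += count * rate
    line["IP"] = (line["Outs"] + k) / 3
    return line

# Hypothetical pitcher and a hypothetical league LD/CON of 0.19.
adj = adjusted_batted_balls(g=250, f=180, p=40, l=100, lg_ld_per_con=0.19)
line = batting_line_allowed(adj, k=150)
print({name: round(value, 1) for name, value in line.items()})
```

From here the line, together with BB, would be fed into whichever run estimator is preferred to get estimated RA.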


#29    Tangotiger      (see all posts) 2009/08/08 (Sat) @ 00:52

You also need DPs.

And I’ll bet if you do all that (which is more or less correct), you will get what I got in post 22.


#30    Colin Wyers      (see all posts) 2009/08/08 (Sat) @ 01:26

I’ve seen BsR with both initial and final baserunners; I don’t know off the top of my head which is “better.” DP outs are already included in the outs column. So you should be able to go ahead with that as is (although if I were to sit down and implement it myself I probably would go and do the DPs).

Of course one of us could test this. I’d rather not, because it sounds like work, but will probably end up doing so at some point because I already have the pitching split-halves with xFIP figured.


#31    Tangotiger      (see all posts) 2009/08/08 (Sat) @ 08:37

Since you are splitting up GB from FB, then you need to have a different run value for outs.


#32    Tangotiger      (see all posts) 2009/08/08 (Sat) @ 09:06

Patriot/27 was marked for moderation and is now open.

Hint to the regulars: if you want to put a link, don’t put the “http://” in front.  That’s the trigger.


#33    Tangotiger      (see all posts) 2009/08/08 (Sat) @ 09:07

Patriot, I stand corrected.  That was the page I was thinking of, but I combined that with Voros’ new shortcut using BaseRuns.

