THE BOOK--Playing The Percentages In Baseball

Monday, February 08, 2010

SIERA

By Tangotiger, 04:15 PM

Matt and Eric introduce their metric.  A correction:

Nate Silver invented QERA back in 2006 for Baseball Prospectus to adjust for a few issues with FIP and xFIP, and while he referred to the stat as a toy, it represented a big step upward in the methodology of estimators.... QERA has another problem of its own, in that GB% is really GB/Ball in Play (or, GB/BIP), while BB% and K% are measured per batters faced (SO/PA and BB/PA) ...Further, while QERA picks up some of the interaction between walk, strikeout, and ground-ball rates, it does not necessarily weight them correctly

Good for Matt and Eric for noting the two problems with QERA.  I dispute the big/upward claim however.  Also, while FIP has a glossary page, “tRA” is unlinked.  A BPro reader will have no idea what it is.

Anyway, here it is:
SIERA = 6.262 – 18.055*(SO/PA) + 11.292*(BB/PA) – 1.721*((GB-FB-PU)/PA) +10.169*((SO/PA)^2) – 7.069*(((GB-FB-PU)/PA)^2) + 9.561*(SO/PA)*((GB-FB-PU)/PA) – 4.027*(BB/PA)*((GB-FB-PU)/PA)

I’ll need a few hours to test this to see why it works, when it breaks down, and how much of a gain we’re getting over FIP, xFIP and tRA.


#1    Tangotiger      (see all posts) 2010/02/08 (Mon) @ 17:47

Ok, I have a problem with the GB term.  I just did a little test where I started with my baseline, and then I turned 1 GB into 1 FB (based on 1000 PA).  My runs total went up by +.10 runs, pretty much what I expected.

Good so far.

If I turned 10 GB into 10 FB, I ended up with +.90 runs.  Good so far.

I turned 20 GB into 20 FB and I got +1.45 runs.  Not enough, but no panic yet.

I turned 30 GB into FB and I got +1.73 runs, which is not keeping pace.

I turned 40 GB into FB and I got +1.70 runs… a downturn, which is really wrong.  Because those FB are becoming HR and it should always go up.

Indeed, I can get back to my baseline by turning 68 GB into 68 FB.

And if, let’s say, I turn 200 GB into 200 FB, I end up getting 40 fewer runs.

Why does this happen?  It’s because of that exponent.

Whether this term is negative or positive, squaring it gives the same sign either way:
((GB-FB-PU)/PA)

And so, what the equation says is that the more extreme you get, the more you are going to go in a negative direction.

Indeed, the idea to multiply something by using the differentials is not a good idea.  The term I just quoted is a relative term, and when you use a relative term, you better be darn careful when you start multiplying with it, as it happens here.

Sorry guys, but you’re going to have problems here in some instances.


#2    Matt Swartz      (see all posts) 2010/02/08 (Mon) @ 17:52

Oops-- we forgot to point out that we fixed up the formula so that ((GB-FB-PU)/PA)^2 was negative when (GB-FB-PU)/PA was negative.  So it’s really sign(GB-FB-PU)*((GB-FB-PU)/PA)^2.
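
For anyone who wants to poke at the formula directly, here is a minimal Python sketch of the equation as printed in the post, with the sign adjustment described in this comment as an option. The function names, argument names, and the 1000-PA example line are illustrative assumptions, not anything taken from the BPro article.

```python
def siera(so, bb, gb, fb, pu, pa, signed_square=True):
    """SIERA as printed in the post; signed_square applies the sign fix."""
    k = so / pa
    w = bb / pa
    g = (gb - fb - pu) / pa            # net ground-ball rate per PA
    g_sq = g * g
    if signed_square and g < 0:
        g_sq = -g_sq                   # sign(g) * g^2
    return (6.262 - 18.055 * k + 11.292 * w - 1.721 * g
            + 10.169 * k * k - 7.069 * g_sq
            + 9.561 * k * g - 4.027 * w * g)


def shift_gb_to_fb(line, n):
    """Return a copy of the stat line with n ground balls turned into fly balls."""
    out = dict(line)
    out["gb"] -= n
    out["fb"] += n
    return out


# Hypothetical 1000-PA stat line, made up for illustration only.
base = dict(so=180, bb=80, gb=320, fb=220, pu=30, pa=1000)

for n in (0, 10, 50, 200):
    line = shift_gb_to_fb(base, n)
    plain = siera(**line, signed_square=False)
    signed = siera(**line)
    print(n, round(plain, 3), round(signed, 3))
```

For this made-up line, the plain square eventually turns around and drops below the starting value as more GB become FB, while the signed version keeps rising. The exact figures depend on the baseline chosen, so they will not match the run totals quoted in the thread.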


#3    Tangotiger      (see all posts) 2010/02/08 (Mon) @ 18:06

Well, that makes a difference!  Ok, so those numbers become:
+10 = +0.88
+20 = +1.45
+30 = +1.82
+40 = +2.44
+50 = +3.36
+100 = +12.46
+200 = +53.31

In this case, turning 200 GB into 200 FB adds 53 runs, which turns the baseline 4.30 ERA into 6.17.

For comparison’s sake, and turning some of the FB into HR, I get a FIP of 5.46 and a BaseRuns of 4.84.

I think it looks like your multiplier still has the chance to get out of control here, though admittedly this is an extreme case.

A pure zero GB pitcher, with all the remaining batted balls as LD or FB becomes in SIERA a 9.16 ERA pitcher, compared to 6.13 in FIP and 5.13 in BsR.

Going the other way, and having zero FB and the rest as GB or LD, and I end up with a negative SIERA ERA (-0.47), 2.74 in FIP and 3.60 in BsR.

So, I still think you are doing something a bit extreme.

Again, I don’t intend for the equation to necessarily work at the extreme.  That’s not the point here.  The point is to show at what point something will break down.

If it breaks down at below-Pedro levels, that’s fine.  If it breaks down at Halladay levels, that’s not fine.


#4    Fargo      (see all posts) 2010/02/08 (Mon) @ 18:10

Should point out that this article isn’t behind the BPro pay wall—anybody can read it. I think that’s a good precedent for any new metric that they develop and incorporate into their reports.


#5    Matt Swartz      (see all posts) 2010/02/08 (Mon) @ 18:16

Thursday’s article is going to discuss the RMSE breakdowns for various types of pitchers and it generally does particularly well for very good pitchers.  All of the estimators are going to struggle to spit out an ERA of zero-- if the K% is 100%, you get a negative FIP and xFIP, and you would get a negative QERA if it didn’t turn around and go positive again because of the squared term.  SIERA does particularly well for pitchers with really good K/BB and really good GB%, so it should hold out at the extremes (it does really well for Johan Santana, about equally well for Roy Halladay...Pedro’s great years were mostly before batted ball numbers so it’s tough to say there).

We have some results about line drives coming that might be interesting too and might change how you approach the hypothetical scenarios, but I’ll avoid jumping in front of the other articles for now.


#6          (see all posts) 2010/02/08 (Mon) @ 18:34

Well, at the very least we finally have an advanced metric with a snappy acronym. So thank god for that…


#7    Michael      (see all posts) 2010/02/08 (Mon) @ 19:04

I’m sure Matt and Eric will clarify in further posts, but since Tango brought up BaseRuns, what would be the advantage of SIERA over a BaseRuns model that assumes average TB values for batted ball types? My guess is that, at least with SIERA, you can use batted ball data directly from your source of choice.


#8    Tangotiger      (see all posts) 2010/02/08 (Mon) @ 19:25

Right, I would think that trying to incorporate it into a BaseRuns model would be the ideal.

Basically, Eric and Matt are implicitly converting GB and FB into outs and HR and singles, etc.  And then trying to convert into runs.  It’s a two-step process that they’re trying to do in one step.


#9    Matt Swartz      (see all posts) 2010/02/08 (Mon) @ 19:32

Well, SIERA is going to get at situational pitching both intentionally (sinkers with men on first) and statistically (more runners on first when you strike out fewer hitters and walk more).  I don’t know much about team level run estimation things like Base Runs, but it would depend if they account for the fact that ground balls have a different run value when runners are on first or not.  There’s also the fact that the regression as a method picks up differences between pitchers in their BABIP skills as they correlate with K and BB rates.


#10          (see all posts) 2010/02/08 (Mon) @ 19:34

Everyone understands the benefit of ground balls. They can lead to double plays and can’t be HRs.

But is there a relationship between being a ground ball pitcher and being successful? Do GB numbers correlate highly with FIP, xFIP, or tRA (if we assume those to be strong metrics that predict future performance)?

I can understand (but may not completely agree yet) that Eric and Matt think getting ground balls is a skill. But I’m not sure if that skill really leads to enhanced production.


#11    Matt Swartz      (see all posts) 2010/02/08 (Mon) @ 19:46

JDSussman/10:

Of course ground balls prevent runs.  That is true according to all of these metrics.  They represent batted balls that aren’t home runs and are less likely to be extra-base hits.  We found them to be very significant according to our regression, so that I guess proves it more so.


#12          (see all posts) 2010/02/08 (Mon) @ 20:08

JD,

I think it’s pretty clear that ground balls are a good thing (the other two options are liners and fly balls; both bad), but as a side note, you really couldn’t correlate ground balls with tRA to see what they’re worth, seeing as tRA in and of itself values ground balls pretty highly.


#13    Tangotiger      (see all posts) 2010/02/08 (Mon) @ 22:54

If you remove HR, then the run value of a GB is pretty much identical to the run value of a FB.  This is why something like xFIP works so well.  xFIP says give me 10% or 11% of the FB, count them as HR and ignore the rest.  And it can do that because FB=GB, after HR are removed.

Now popups = K and LD = walks.  If LD are pretty much the same for all pitchers, that’s a good reason to ignore them.

Same deal for popups.  If they are the same for pitchers, you can ignore them.

So, we’re back to xFIP.

It depends therefore as to what is actually happening, and we’ll see what Matt and Eric will give us.  Of course if this stays behind the paywall, then this discussion is moot, as basically you are limiting the view of this to the 10K subscribers at BPro, 90% of whom won’t care about this.  This would be a horrible outcome, and will prevent any kind of acceptance or discussion on the metric.

***

Matt: BaseRuns works on base/out outcomes to tell you the run value of events.  So, if you convert GB into singles and outs, then, yeah, it’ll tell you what you want.  Not necessarily as great as we’d want though.

For that, you’d want a simulator or Markov chain.  I do have this:
http://www.tangotiger.net/markov.html

But, this simple Markov excludes outs on base, and so, GIDP is not part of the set.


#14    Tangotiger      (see all posts) 2010/02/08 (Mon) @ 23:08

MattEric also said this:

“Allows for the fact that adding strikeouts is more useful when you don’t strike out many guys to begin with, since more runners get stranded.”

If we use our trusty Markov calculator, we can see the run value of the K, based on the frequency of K.

If you go to my site, and set K=0, runs per game is 5.04.  If you set it to K=1, RPG = 5.02.  Here’s the chart:

K R
0 5.038
1 5.019
2 5.000
3 4.981
4 4.962
5 4.943
6 4.924
7 4.905
8 4.886
9 4.867
10 4.848
11 4.829
12 4.810
13 4.791
...
27 4.529

As you can see, a perfect drop of .019 runs each time a regular out is turned into a K.  From 0 K to 27 K, you get a drop of .509 runs, or .01885 runs per K.  It’s linear and there’s no “diminishing returns”.
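
The linearity claim can be checked with a few lines of Python. This is only an arithmetic check on the chart values quoted in this comment; the variable and function names are made up for the sketch, and the rows hidden behind the "..." in the chart are left out rather than guessed.

```python
# Runs per game by number of K, copied from the chart in this comment.
runs_per_game = {
    0: 5.038, 1: 5.019, 2: 5.000, 3: 4.981, 4: 4.962, 5: 4.943,
    6: 4.924, 7: 4.905, 8: 4.886, 9: 4.867, 10: 4.848, 11: 4.829,
    12: 4.810, 13: 4.791, 27: 4.529,
}


def slope(k_lo, k_hi):
    """Change in runs per game for each additional K between two chart rows."""
    return (runs_per_game[k_hi] - runs_per_game[k_lo]) / (k_hi - k_lo)


ks = sorted(runs_per_game)
step_slopes = [slope(a, b) for a, b in zip(ks, ks[1:])]
overall = slope(0, 27)

print([round(s, 3) for s in step_slopes])
print(round(overall, 5))
```

Every step between listed rows rounds to the same per-K drop, which is what a straight line looks like in a table.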

Therefore, I have to conclude that MattEric’s statement here is false, or at least highly questionable.

Guys, can you back up your theory?


#15    Matt Swartz      (see all posts) 2010/02/08 (Mon) @ 23:23

If I understand you right, you’re holding regular outs constant.  It’s the same problem with FIP.  Turn a batted ball into a K and you should get diminishing returns. 

Put it this way: when is a strikeout more useful?  When there are runners on base, especially in scoring position (and especially even then when you aren’t very good at generating ground balls).  What are the WHIPs of pitchers with K/PA at different levels?  It’s not random.

The issue is that you’re holding outs constant.  Pitchers rarely throw complete games so the number of outs they record is a variable just like the number of runs, and the number of PA should be the denominator.  Outs are worth -0.3 runs or so according to linear weights, right?  Something like that, anyway.  That 0.019 is about strikeouts versus other outs.


#16    Brian Cartwright      (see all posts) 2010/02/09 (Tue) @ 00:29

I agree with Matt. One less strikeout means one more PA that’s not a K, and that can be a BB, a HP or a BIP.


#17    dkappelman      (see all posts) 2010/02/09 (Tue) @ 01:06

Here’s another way to look at it.

I looked at the average value of a strikeout based on a pitcher’s season level floor(K/9) since 2002.  You get something that looks like this:

4: -.289
5: -.288
6: -.289
7: -.291
8: -.290
9: -.288
10: -.290

It’s true the value of strikeouts increases with men on base, but there doesn’t seem to be much of a difference based on how many batters the pitcher actually strikes out.

I’m not trying to say that every single pitcher is going to have exactly the same value of a strikeout in reality, but given a large enough sample, seems like that might be the case.


#18    Matt Swartz      (see all posts) 2010/02/09 (Tue) @ 01:15

How is that better than running a regression that explicitly controls for other factors?  The regression results will be in Wednesday’s article and the positive quadratic term on strikeouts is statistically significant.  Not only that, on the vast majority of subsets of data on which we ran the regression, it came out significant.  The effect is definitely there, whether it’s the value of strikeouts with men on base or not.  My best guess is that it is primarily coming from that, and the reason that it’s not showing up in Dave A’s table above is that K-rate has some correlation with BB-rate and GB-rate.


#19    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 01:22

All I did was exactly what I said I did: compare the run value of a K to the run value of other outs.  I held outs constant, I held PA constant.  It’s simply the relative value of the K to the other out.

Matt’s statement however said that a pitcher that has fewer K would have a higher marginal run value for the next K.  If I understood him correctly.

If that’s the case, then that’s wrong.

Said another way: the run value of the K is not dependent on the number of base runners to an extent more (or less) than the run value of a non-K out is dependent on the number of base runners.  The run value of the K is as tied to the number of baserunners as the run value of the nonK out is.

Just on this particular point, are we agreed?


#20    Matt Swartz      (see all posts) 2010/02/09 (Tue) @ 01:38

How exactly did you hold PA constant?  What is the BABIP for the pitcher who allows 4.5 runs per game and strikes out 27 batters per game?  It’s 1.000.  That makes no sense.  You held number of baserunners constant, I suppose.  Pitchers who strike out more hitters, with equivalent walk and ground ball rates, are going to have lower ERAs.

As to the value of a K versus a groundout or flyout or popout or lineout, it obviously depends on the situation.  Fly outs with runners on third and one out are worse for pitchers than strikeouts with runners on third and one out, which are worse than ground ball outs with runners on first and one out, which are often double plays.  The benefit of using regression is that we can pick up any correlations between the skill-variables (throwing pitches that hitters miss at, aiming for the strike zone, and throwing with a trajectory that generates ground balls) and actually spit out what ERA should be for a pitcher with that collection of skills rather than saying what the effect of each skill is in isolation like we are doing here with guesses about strikeout rates holding things constant that aren’t.  The skills interact, and the returns to skills are nonlinear, per the evidence from the regression and the RMSE tests.


#21    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 08:35

As to the value of a K versus a groundout or flyout or popout or lineout, it obviously depends on the situation.  Fly outs with runners on third and one out are worse for pitchers than strikeouts with runners on third and one out, which are worse than ground ball outs with runners on first and one out, which are often double plays. 

I believe I have said that about, I dunno, more than once in this blog.  Maybe at least 100.  This is the entire point of WPA/LI.  Yes, this I know.

The question being asked is: how much of a difference is there, and how much does it change based on your statement that the fewer Ks you have the more run impact the next K has.  This is the only thing I am evaluating here.  I have shown that the runs scored is quite predictable and in a linear fashion.

Now, if you are saying something else, that you are not holding PA and baserunners and outs constant (which is what I was presupposing your statement implied), but that you are holding BABIP constant and holding GB per BIP constant, then, yes of course this statement is true:

Pitchers who strike out more hitters, with equivalent walk and ground ball rates, are going to have lower ERAs.

This is after all FIP!

***

So, perhaps you can clarify exactly what you mean by this:

“Allows for the fact that adding strikeouts is more useful when you don’t strike out many guys to begin with, since more runners get stranded.”

I’d like to evaluate, verify, confirm / refute this, but I seem to be missing something.


#22    Matt Swartz      (see all posts) 2010/02/09 (Tue) @ 09:29

I am holding BB/PA and (GB-FB-PU)/PA constant, per the regression.

Outs are more valuable when you have runners on base.  Holding BB/PA and (GB-FB-PU)/PA constant, there are fewer runners on base at any given time as the K/PA increases.  Thus, as K/PA goes up, expected runners on base goes down for a given BB/PA and (GB-FB-PU)/PA.

Look at FIP, for example.  Start with 0 BB/9, 0 HR/9, 6 K/9.  The FIP is 1.87.  Increase the K/9 to 9 and the FIP is 1.2; increase the K/9 to 12 and the FIP is 0.53; now increase the K/9 to 15 and FIP is (-0.13).  That obviously doesn’t make sense, but the point is that there is diminishing returns to K/9, which get in the way.
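
The four FIP values above can be reproduced with a short Python sketch. It assumes the usual FIP form, (13*HR + 3*BB - 2*K)/IP plus a constant, and a constant of 3.2, which is inferred from the quoted numbers rather than stated anywhere in the thread. The function name is made up for illustration.

```python
FIP_CONSTANT = 3.2  # assumption: the value implied by the numbers quoted above


def fip_from_per9(hr9, bb9, k9, constant=FIP_CONSTANT):
    """FIP from per-nine-inning rates: (13*HR + 3*BB - 2*K)/IP + constant."""
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant


values = [fip_from_per9(0, 0, k9) for k9 in (6, 9, 12, 15)]
steps = [b - a for a, b in zip(values, values[1:])]

print([round(v, 2) for v in values])
print([round(s, 3) for s in steps])
```

Under these assumptions each step of three more K/9 moves FIP by the same amount, about two thirds of a run, so whatever diminishing returns exist would have to come from somewhere other than the FIP arithmetic itself.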

Another way: suppose that of all non-K plate appearances, the remaining PA consist of walks, doubles, and deep flyouts, alternating and each happening 1/3 of the time.  Start with 0% K/PA. 

BB, 2B, SF, BB, 2B, SF, BB, 2B, FO. 

That was a five run inning.  The next inning looks the same.  And the same in the 3rd, etc.  That’s good for 45 runs per game. 

Now increase the K/PA to 50%, so that now the batting events go like this:

1st inning: BB, K, 2B, K, FO, that’s 1 run

2nd inning: K, BB, K, 2B, K, that’s 1 run

3rd inning: FO, K, BB, K, that’s 0 runs

4th inning: 2B, K, FO, K, that’s 0 runs

and then the 5th inning is like the 1st, and so on.  So you get 0.5 runs per inning instead of 5.0 runs per inning.  I don’t need to tell you that increase the K/PA to 100%, and you get 0 runs per inning.  So the increase in K/PA from 0 to 50% lowered the ERA from 45.00 to 4.50 and the increase in K/PA from 50% to 100% lowered the ERA from 4.50 to 0.00.

What about 25% K/PA.  Then the innings would look like this:

1st inning: BB, 2B, SF, K, BB, 2B, FO (3 runs)
2nd inning: K, BB, 2B, SF, K (1 run)

and the odd and even numbered innings look like the 1st and 2nd.  So the pitcher has an ERA of 18.00.

For 75% K/PA, if you alternate the 3 K’s with 1 of BB, 2B, and FO consecutively, you’ll get 0 R every inning.

So we get:

0% K/PA: 45.00 ERA
25% K/PA: 18.00 ERA
50% K/PA: 4.50 ERA
75% K/PA: 0.00 ERA
100% K/PA: 0.00 ERA

This is diminishing returns.  Again, outs are more valuable with more runners on base.  The more of your PA are guaranteed outs, the fewer runners are on base so the fewer runs are prevented.
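
The toy innings above can be replayed with a small Python simulator. This is a sketch under assumed base-running rules inferred from the run counts in the example: a double scores every runner, a walk only forces runners along, and a flyout that is not the third out moves each runner up one base. The function name and the block patterns used to space out the strikeouts are likewise illustrative choices.

```python
from itertools import cycle


def toy_era(non_k_per_block, k_per_block, innings=36):
    """Runs per nine innings for a repeating block of non-K events then Ks.

    Non-K events cycle through BB, 2B, FO in that order, as in the example.
    """
    contact = cycle(["BB", "2B", "FO"])

    def events():
        while True:
            for _ in range(non_k_per_block):
                yield next(contact)
            for _ in range(k_per_block):
                yield "K"

    stream = events()
    runs = 0
    for _ in range(innings):
        outs = 0
        first = second = third = False
        while outs < 3:
            ev = next(stream)
            if ev == "K":
                outs += 1
            elif ev == "BB":
                # Runners advance only when forced.
                if first and second and third:
                    runs += 1
                if first and second:
                    third = True
                if first:
                    second = True
                first = True
            elif ev == "2B":
                # Assumed: every runner scores, batter stops at second.
                runs += int(first) + int(second) + int(third)
                first, second, third = False, True, False
            else:  # "FO"
                outs += 1
                if outs < 3:
                    # Assumed: deep flyout advances each runner one base.
                    runs += int(third)
                    first, second, third = False, first, second
    return 9 * runs / innings


patterns = {0: (1, 0), 25: (3, 1), 50: (1, 1), 75: (1, 3), 100: (0, 1)}
for pct, (non_k, k) in patterns.items():
    print(pct, toy_era(non_k, k))
```

With these rules the simulator returns the same five ERA figures listed in the table above. Changing the event order inside a block or the base-running assumptions will change the results, which is part of why the example is only a toy.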


#23    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 10:19

For ease of discussion, I will intersperse my comments:

I am holding BB/PA and (GB-FB-PU)/PA constant, per the regression.

Outs are more valuable when you have runners on base. 

No question about that.  For people who need a reference:
http://www.tangotiger.net/rc2.html

We see that the typical out has a -.10 run value to account for the loss in run potential of runners on base.  Naturally, the fewer the runners, the closer this value is to zero.  The more the runners, the more extreme the run value.

This is also evident here:
http://www.tangotiger.net/RE9902event.html

We see the run value of the out changing based on the number of outs and the runners on base.

Holding BB/PA and (GB-FB-PU)/PA constant, there are fewer runners on base at any given time as the K/PA increases.  Thus, as K/PA goes up, expected runners on base goes down for a given BB/PA and (GB-FB-PU)/PA.

You are increasing K per PA, meaning you are decreasing contactedPA per PA (meaning BIP + HR).  In this case, you are converting some K for some contacted PA.  I’m not sure why you would necessarily hold walks constant in this case, but that’s not important.  You are asking: what happens when I convert a K into a contactedPA (or vice versa).  Ok, I’m with you so far.

Look at FIP, for example.  Start with 0 BB/9, 0 HR/9, 6 K/9.  The FIP is 1.87.  Increase the K/9 to 9 and the FIP is 1.2; increase the K/9 to 12 and the FIP is 0.53; now increase the K/9 to 15 and FIP is (-0.13).  That obviously doesn’t make sense, but the point is that there is diminishing returns to K/9, which get in the way.

FIP is not set up to handle any kind of extreme cases.  It takes a non-linear function and makes it linear.  Let’s leave this example aside, because I don’t know that it really means what we want it to mean.

Another way: suppose that of all non-K plate appearances, the remaining PA consist of walks, doubles, and deep flyouts, alternating and each happening 1/3 of the time.  Start with 0% K/PA.

BB, 2B, SF, BB, 2B, SF, BB, 2B, FO.

That was a five run inning.  The next inning looks the same.  And the same in the 3rd, etc.  That’s good for 45 runs per game.

Now increase the K/PA to 50%, so that now the batting events go like this:

1st inning: BB, K, 2B, K, FO, that’s 1 run

2nd inning: K, BB, K, 2B, K, that’s 1 run

3rd inning: FO, K, BB, K, that’s 0 runs

4th inning: 2B, K, FO, K, that’s 0 runs

and then the 5th inning is like the 1st, and so on.  So you get 0.5 runs per inning instead of 5.0 runs per inning.  I don’t need to tell you that increase the K/PA to 100%, and you get 0 runs per inning.  So the increase in K/PA from 0 to 50% lowered the ERA from 45.00 to 4.50 and the increase in K/PA from 50% to 100% lowered the ERA from 4.50 to 0.00.

What about 25% K/PA.  Then the innings would look like this:

1st inning: BB, 2B, SF, K, BB, 2B, FO (3 runs)
2nd inning: K, BB, 2B, SF, K (1 run)

and the odd and even numbered innings look like the 1st and 2nd.  So the pitcher has an ERA of 18.00.

For 75% K/PA, if you alternate the 3 K’s with 1 of BB, 2B, and FO consecutively, you’ll get 0 R every inning.

So we get:

0% K/PA: 45.00 ERA
25% K/PA: 18.00 ERA
50% K/PA: 4.50 ERA
75% K/PA: 0.00 ERA
100% K/PA: 0.00 ERA

This is diminishing returns. 

Alright, now this is extremely easy to test, and a perfect example of what my Markov calculator can be used for.  I will test your scenario here.

However, in this case, you are not holding BB constant, right?  What you are doing is saying: I have 50 PA, of which 5 are walks and 45 are contactedPA.  Now, let’s add 50 more PA, all of which are strikeouts.  So, we didn’t convert contactedPA into K, which is how you started off your post.  Right?

In any case, this illustration is clear, and I will report back as to what happens.

Again, outs are more valuable with more runners on base.  The more of your PA are guaranteed outs, the fewer runners are on base so the fewer runs are prevented.

Yes, outs are more valuable with more runners on base.  The point is whether there is any kind of diminishing returns for the strikeouts in particular, no?  Or, are you simply saying for ANY out, K or not, then this is true.

Ok, I think this is what you are saying right?  That it’s not K in particular but ANY out.  But, in your SIERA equation, the ONLY explicit out is the K out.

Is this the issue then?  My misunderstanding that you were specifically targeting the K, as opposed to simply making a general point about the out?

So, yes, that is perfectly true if that’s the case.  This is made clear through BaseRuns, and noted in a chart here:
http://www.tangotiger.net/customlwts.html

The run value of the out is HIGHLY dynamic, and it’s entirely dependent on the run environment.


#24    Matt Swartz      (see all posts) 2010/02/09 (Tue) @ 10:48

Yes, that’s what I mean.  Thanks.

I guess I wasn’t really holding BB/PA and (GB-FB-PU)/PA constant in my example there, but the point seems to have been agreed upon anyway.  The only explicit out in the model is the strikeout so that’s where the diminishing returns come from.  I suspect this is the main reason that the coefficient for (K/PA)^2 is positive.  However, there could be other reasons that Eric and I did not think of, so we let the coefficient vary with the data rather than marrying it to BaseRuns or anything like that.


#25    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 10:52

Ok, let’s do the test from Matt’s illustration: what happens as we add more K to an existing profile of non-K PA.

Let’s start, as we always do, with the baseline of 4.905 runs per game:
http://www.tangotiger.net/markov.html

Now, let’s add 1 K (and 1 PA) to the profile (and realizing that we are actually adding 1/38th of a 100% K profile, and 37/38th of the existing baseline profile):
4.60 runs per game

At the bottom of the page, we see what the run impact is: -.285 runs for a non-K out and -.302 for a K out.

If we add 5 K (and 5 PA):
3.63 runs per game
-.226 runs for a non-K out and -.240 runs for a K out

In both cases, the run value of the K out is 6% higher than the run value of a non-K out.

If we add 10 K and 10 PA:
2.80 runs per game
-.175 for a nonK out and -.185 for a K out

Again, 6% more.

If I add 27 K and 27 PA (meaning making half the PA as K):
1.43 runs per game
-.088 runs for nonK out and -.092 for K out

In this case, down to 4.5%.

So, if Matt is talking about K outs relative to all outs, then the run value of the K is pretty constant in that regard.

If Matt was making a point of any out (and he highlighted just the K because in SIERA that is the only explicit mention of the out), then we really have no disagreement on this particular issue.
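
The percentages in this comment are simple ratios of the two out values, and a quick Python check reproduces them. The numbers are copied from the Markov results quoted above; the names are made up for the sketch.

```python
# (Ks added, run value of a non-K out, run value of a K out), from this comment.
markov_out_values = [
    (1, -0.285, -0.302),
    (5, -0.226, -0.240),
    (10, -0.175, -0.185),
    (27, -0.088, -0.092),
]


def k_premium_pct(non_k_value, k_value):
    """How much larger the K out's run value is than the non-K out's, in percent."""
    return (k_value / non_k_value - 1) * 100


premiums = {added: round(k_premium_pct(nk, k), 1)
            for added, nk, k in markov_out_values}
print(premiums)
```

The premium sits near 6% for the first three rows and dips to about 4.5% in the last, even though the absolute value of an out falls by more than two thirds over the same range.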


#26    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 10:53

Ok, Matt/24 and Tango/25 cross-posted, so we’re good.


#27          (see all posts) 2010/02/09 (Tue) @ 13:45

So let me summarize:

Tango doesn’t like it because it doesn’t have his stuff in it.

Matt likes it because he says its better, but only a little.

The rest of us don’t care and the two articles so far are competition for Ambien. We have to wait three articles of term paper level excitement to get some backtesting or even an example.

This is why BPro is dying. No one said to Matt “my god, this sucks.”


#28    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 14:41

LJ, as it pertains to me, I never said I didn’t like it.  Indeed, I can’t even form an opinion unless I have evidence.

The one thing I said I did not like was the claim about QERA being a big step up.  That was an unnecessary line by MattEric, especially since it’s untrue, and it was definitely unsubstantiated.

Generally speaking, I am opposed to complex constructions because it’s almost impossible to figure out what’s going on.  Now, SIERA may be an exception to my general rule (the way BaseRuns is), but I need evidence for that.  I haven’t seen it yet.  For example, Voros’ BaseRuns-DIPS construction is almost certainly better than FIP.  But, the gain just isn’t there to make it worthwhile to go from FIP to BaseRuns-DIPS.

Right now, I’m in the process of evidence-gathering and claim-testing. 

The worst that someone could have said to MattEric is “my god, this is so freaking complex… does it, you know, work?” In that respect, it’s like Win Shares, in that: how can you tell when it works and when it doesn’t?  I don’t think anyone would have been justified to say worse than that.


#29          (see all posts) 2010/02/09 (Tue) @ 15:08

LJ, if you’re not interested in it, you don’t have to read it. But there’s no reason to be rude. There are some of us for whom this is interesting. There is a lot that can be learned from seeing various approaches to solving a problem and the dissecting of those approaches.


#30    Tangotiger      (see all posts) 2010/02/09 (Tue) @ 15:15

Part 2, also free for all (was that something you guys pushed for?  much appreciated):

http://www.baseballprospectus.com/article.php?articleid=10032

I will have to correct this as a feature not a bug:

FIP attempts to correct for BABIP luck but fails to correct for the luck inherent in HR/FB, perpetually over- or underrating certain types of pitchers in the process.

The SOLE existence of FIP is to take the things that the fielders are NOT responsible for and to remove sequencing of events.  That’s it.  If a pitcher gives up 0 HR or he gives up HR in 20% of his flyballs, I do NOT want to say that this pitcher should be marked as 10% of his FB for HR.

So, you cannot say that FIP “fails” to correct for the luck portion of HR/FB, when FIP WANTS to KEEP the luck portion of HR/FB.  Just as we want to keep the luck-portion of BABIP for Jeter or Adam Dunn or whoever. Does SLG fail to correct Todd Helton’s use of Coors? No, because SLG represents what it purports to represent.

FIP is like OBP or SLG in this regard in that it focuses on a subset of ACTUAL performance and keeps that ACTUAL performance (albeit on some understandable scale).  It’s a feature, not a bug, that the walk and HR have a “1” in OBP and null and 4 in SLG.

So, rather than this being a failure, it’s actually a success!

***

Silver also made another improvement by looking at walk and strikeout rates per plate appearance, instead of per nine innings.

I sound like a broken record, but the coefficient for the K is “2” precisely because I used IP in the denominator.  Had I used PA, the coefficient would be something different, precisely for the reasons being cited by MattEric.

Again, it’s a feature that the coefficient is “2” for K and the denominator is IP, rather than making the coefficient “2.3” or “1.7” or whatever it would be, and have PA instead of IP.

***

The rest of the article expands on the idea behind the SIERA framework, and the multiplying of the various components, or squaring them, etc.  Again, I can’t say if it works until I test it or see the results.


#31    Eric Seidman      (see all posts) 2010/02/09 (Tue) @ 15:40

I’d love to be more active here but being an accountant during tax season with 4 MBA classes at night precludes much free time for, well, anything. But I will say that tomorrow’s article looks at how we got the formula and the data used, Thursday is testing day, and Friday discusses examples of pitchers to exemplify how the stat works.



#33    tangotiger      (see all posts) 2010/02/10 (Wed) @ 20:12

Part 3:

http://www.baseballprospectus.com/article.php?articleid=10037

The conclusion: the effect of walks on ERA is linear

Little tidbit: The run value of a walk is roughly equal to 90-99% of the OBP.  And this is the case from OBP of .001 to .999.

Not exactly true in all cases of course.  Just a nice little thing to know, generally.

It seems particularly inaccurate that FIP puts a coefficient of 13 on HR/IP for all pitchers, regardless of their walking exploits. Solo shots do something different to ERA than grand slams.

Actually, the run value of the HR is pretty stable at around 1.40 runs regardless of run environment.  So, it is not particularly inaccurate at all. 

There’s a simple mathematical principle that proves this, which I’ve described in the past.  I’ll see if I can find it so that I don’t have to retype it.


#34    tangotiger      (see all posts) 2010/02/10 (Wed) @ 20:19

Proof of the run value of the HR being stable at 1.40:

http://www.insidethebook.com/ee/index.php/site/comments/woba_year_by_year_calculations/#47

The run value of the HR is the getting on plus the moving over.

The getting on is exactly 1, so that leaves us with calculating the moving over.

The average runner on base has a .30 chance of scoring.  So, if he scores, that means that the HR adds +.70 runs.  There are roughly .60 runners on base.  So, .7*.6= .42.  That’s the moving over value of the HR.  1+.42 = 1.42 runs.

Now, what if it’s a great pitcher that’s on the mound?  Well, in those cases, only 25% of the runners score.  This means that the HR adds +.75 runs per runner on base.  However, because they are great pitchers, they have fewer runners on base.  Let’s say they have 0.5 runners per PA.  .75*.5 = .375.  That’s the moving over value.  The HR therefore is worth 1.375 runs.

Go the other way, and you have a terrible pitcher.  Those guys will allow 35% of runners to score, meaning the HR adds .65 runs per runner.  Because they are terrible, they allow a lot of runners on base.  Let’s say that’s .65 runners per PA.  .65*.65 = .42.  The HR is worth 1.42 runs.

Go to the extreme: a pathetic pitcher will allow 50% of runners to score, meaning each HR adds .50 runs per runner.  And, he has say 1 runner on base per PA.  1*.5=.5, and the HR is worth 1.50 runs.

Or the opposite, where an unbelievably great pitcher will allow 15% of runners to score (HR worth +.85 runs per runner) and only have say .40 runners on base.  .85*.4 = .34, and so a HR is worth 1.34 runs.

As you can see, the run value of the HR doesn’t move much.  And in the league settings we are talking about, it barely changes.  And so, it’s easier to keep the HR run value fixed at 1.40 and move on.
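The arithmetic above can be checked with a short sketch. The function name `hr_run_value` and the scenario labels are made up for illustration; the inputs are the scoring rates and runners-on figures quoted in this comment.

```python
def hr_run_value(score_prob, runners_on):
    """Getting on (exactly 1) plus moving over.

    score_prob : chance that a runner already on base would have scored anyway
    runners_on : average number of runners on base per PA
    """
    moving_over = runners_on * (1.0 - score_prob)
    return 1.0 + moving_over


# (score_prob, runners_on) pairs taken from the comment above
scenarios = {
    "average pitcher": (0.30, 0.60),
    "great pitcher": (0.25, 0.50),
    "terrible pitcher": (0.35, 0.65),
    "pathetic pitcher": (0.50, 1.00),
    "unbelievably great pitcher": (0.15, 0.40),
}

hr_values = {name: hr_run_value(p, r) for name, (p, r) in scenarios.items()}

for name, value in hr_values.items():
    print(f"{name:28s} HR run value = {value:.3f}")
```

Across all five scenarios the value stays between roughly 1.34 and 1.50, which is the point of the comment: the two inputs move in opposite directions and largely cancel.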

And not only do I have proof, but the empirical data also supports it:

http://www.insidethebook.com/ee/index.php/site/article/linear_weights_by_run_environment/

When the run environment is .419 runs per inning to .581 runs per inning, the run value of the HR barely moves off the 1.40 level.


#35    Matt Swartz      (see all posts) 2010/02/10 (Wed) @ 20:31

The run value of a home run depends how many people are on base.  Solo home runs are worth less.  If some pitchers are much more likely to surrender home runs with people on base or with bases empty, FIP will misvalue them.  On Friday, we’ll discuss Johan Santana, and you can get back to me then about this criticism.  The thing about “run environments” is that they do not cater to the specific skills of the pitcher.  This is why pitchers’ skills need to be contextualized-- their “run environment” is very different than other run environments with the same number of average runs.  It’s the reason why I don’t think that linear weights at least in this way is anywhere near the best way to estimate ERAs and why SIERA tests so much better on every dataset.  You’ll see about the testing tomorrow.

This changes severely with K and BB numbers.  Johan Santana is a solo home run machine.  The run values of his home runs are not the same as those of other pitchers.  We give more detail into Santana on Friday’s article, but Johan Santana hasn’t just been lucky for years on end.  He gives up a lot of solo home runs.

I see you posted again as I’m typing but you said the run environment of .419 to .581 which means 3.7 to 5.2 runs per game.  There are a lot of pitchers out of this range already.

Your write-up also assumes a constant fraction of runners on each base as near as I can tell.  Pitchers who give up fewer walks don’t have as many runners on first.  These differences are all small but apparently they add up.

And again, this does not get into situational pitching which is something that pitchers obviously try to do and while it may not show up well on the individual level, it may show up among pitchers with common skill sets.  The danger of building an ERA estimator from the ground up is that you won’t catch this, as near as I can tell.

Tomorrow we’ll show you the tests.  It’s really just a small subset of the tests we run, but SIERA consistently comes out ahead.  Friday we’ll show you pitchers where SIERA does best and where other metrics miss.


#36    tangotiger      (see all posts) 2010/02/10 (Wed) @ 20:58

The run value of a home run depends how many people are on base.  Solo home runs are worth less.

Agreed, as noted here:

http://www.tangotiger.net/RE9902event.html

So, that makes it perfectly clear that the run value of the HR changes based on the base/out state.  Indeed, I spent several pages in The Book discussing the run value of the HR by base/out state.

We are in agreement here.

***

It’s the reason why I don’t think that linear weights at least in this way is anywhere near the best way to estimate ERAs

Agreed.  BaseRuns is what we want.  BaseRuns is the one model that properly values each outcome (H, HR, BB, etc) within its own environment.  Are we agreed here?

***

Johan Santana is a solo home run machine.  The run values of his home runs are not the same as those of other pitchers.  We give more detail into Santana on Friday’s article, but Johan Santana hasn’t just been lucky for years on end.  He gives up a lot of solo home runs.

It can very well be that Santana is smart like that.  But, FIP doesn’t care about that.  Remember what FIP says:

1. it doesn’t care about non-HR batted balls
2. it doesn’t care about how events are sequenced

This is the “feature” of FIP.  It just does what it does.  If Tom Glavine has a particular skill in sequencing, or Santana, then FIP will not capture that.  Because it won’t know and can’t know.

***

There are a lot of pitchers out of this range already.

I’ll be happy to run my RE model on all pitchers and report back the run values of all pitchers by event. 

***

Also, please note that FIP does NOT try to predict the future.  It’s simply not its raison d’etre.  If you are going to put it in a competition like that, please be nice enough to put that in.

All FIP does is express a pitcher’s actual outcomes (HR, BB, HB, SO) in an ERA form under the two conditions I noted above.  If others want to use FIP for something else, well, that’s tangential.


#37    Matt Swartz      (see all posts) 2010/02/10 (Wed) @ 21:10

Ultimately, I think regression is the best technique here and you don’t, and linear weights or Base Runs is never going to pick up all these factors of which I have thought of a few off the top of my head as we discuss this. 

We have good reason to prefer this technique.  SIERA beats xFIP and QERA handily at RMSE versus same-year park-adjusted ERA, and beats xFIP, QERA, FIP, and tRA handily when looking at RMSE versus next-year ERA. 

If you don’t want FIP to be a model of persistent skills, that’s fine, but it is what most people use it for.  If I want a record of what happened, I use ERA or RA.  If I want to record skills, I model what is and isn’t a skill and look for a way to pick up this information.  Regression seems to work well.  If you want to use a Base Runs formula, maybe it will predict it better, but we didn’t have a formula that doesn’t exist yet to my knowledge available to us when we ran these tests.


#38    tangotiger      (see all posts) 2010/02/10 (Wed) @ 21:11

I just realized my Markov calculator gives us the run values of each event for any run environment.

http://www.tangotiger.net/markov.html

Let’s see what it says.

For the baseline one, it has a value of 1.49.  Note that it’s higher than we are used to because of the conditions of the Markov: no outs on base.  That’s unimportant here, because we are only interested to see how the run value changes.

Change “AB” to 47.  Runs per game is 2.90.  HR value is 1.43.

Change “AB” to 32.  Runs per game is 6.87.  HR value is 1.52.

Exactly as theory would have predicted.

How about a Pedro-like level.  Change “AB” to 57.  Runs per game is 1.95.  HR value is 1.38.

How about a Coors-on-steroids environment?  Change “AB” to 27.  Runs per game is 10.38.  HR is 1.53.

Indeed, at some point the HR value is going to DIMINISH as the OBP increases.  It has to, because as OBP approaches 1.000, the run value of every event approaches 1.000.

This is why the HR value doesn’t move very much.  It’s capped at the bottom by exactly 1.000, and capped at the top by something close to 1.500.


#39    tangotiger      (see all posts) 2010/02/10 (Wed) @ 21:17

If you don’t want FIP to be a model of persistent skills, that’s fine, but it is what most people use it for.  If I want a record of what happened, I use ERA or RA.

I agree that many (most?) people use FIP as you are saying.  But, FIP’s equation is so simple that there should be no debate as to what it is actually doing.  It’s self-evident, just like OBP is self-evident that a walk=HR.  FIP ignores sequencing and ignores batted balls, just like OBP ignores the extra bases from the HR.

And yes, ERA or RA (RA actually) is a record of what happened… of the pitcher AND his fielders.  Most people however use ERA to represent only the pitcher.  Again, it’s self-evident what RA does, but most people don’t use it like that.  It’s a record of runs allowed by a team, while a pitcher is on the mound.  It’s not a record of the number of runs a pitcher allowed.

In any case, I’m off on a tangent here.

I’m still not done testing your equation, so we’ll see how well it holds up.


#40    Zach      (see all posts) 2010/02/10 (Wed) @ 22:05

Matt, I’m no mathematician, but wouldn’t running a regression on any data give you the most accurate results? Isn’t it just like a regression on hitting that puts the double at 0.6 runs yet is more accurate than standard linear weights?


#41    Matt Swartz      (see all posts) 2010/02/10 (Wed) @ 22:09

Zach/40:
We ran the regressions on one data set and tested them on others, eliminating the issue you describe.  It also has a lower RMSE when looking at next-year ERA, which we did not use as a dependent variable in the regression.  I’m all for modeling before regression, but here regression does a good job.


#42    tangotiger      (see all posts) 2010/02/10 (Wed) @ 22:11

I think Matt said he set the regression equation on 2003-08 data, and then tested it on the 2009 data.

So that’s good.  Generally speaking, I agree with Zach that we still need to have logical underpinnings, not just best-fit.  The way the equation is set up, it’s hard to tell what it’s actually doing, so that’s why it’s going to take me a while to figure it out.


#43    Sky      (see all posts) 2010/02/10 (Wed) @ 22:53

It sounds to me (warning!) that SIERA is attempting to pick up on contextual/situational effects, like allowing home runs with the bases empty vs. with runners on (more than you’d expect by overall frequencies).  This seems like (warning!) both a feature and fault of regression analysis.  It picks up on lots of little things beyond what we’re expecting and can explain.  But we have to make sure it’s not finding “false positives”.

Anyway, if you want to tackle some of the things SIERA is getting at indirectly, you could include things like strand rate, or HR rates by runners on or not, or other situational measure.  You don’t want to use one year’s raw rates, and would probably do a weighted average of multiple seasons with some “regression”.

At that point you’re doing a full blown component-by-component projection, but if your goal is the most accurate prediction of future performance, isn’t that exactly what you should be doing?

Am I missing something of what SIERA’s trying to do?  Is there any goal of simplicity (only using certain metrics?) Or is it purely “let’s be the best at predicting future ERA”?


#44    Matt Swartz      (see all posts) 2010/02/10 (Wed) @ 23:13

I don’t really see simplicity the same way that you do.  I think that doing components misses nonlinear effects, correlation between different things, but running a regression is less presumptuous about the way things work.  I don’t think that projecting ERA optimally is a bad goal, as long as we are trying to measure the effects of certain skills rather than retroactively predicting things.  Sometimes you need to let the data talk, and I think that’s what we did.

As to false positives, I think that we checked this by looking at different years and different subsets of pitchers to make sure we did things correctly.  The term that is small and therefore statistically insignificant at this stage is the BB*GB term but it’s tough to show whether it works with a couple thousand pitchers so we left it in although it didn’t change much when we left it out in terms of the actual SIERA values.

If I had to say what SIERA is trying to do, it is asking the question of how a set of skills is used to generate ERA as a set rather than as individual components.  This is because of different base/out schemes that pitchers may be prone to find themselves in, different abilities in situational pitching that are correlated with skills, different abilities to prevent hits and extra-base hits on balls in play that are correlated with skills, and a million other things.  Trying to model each of these component-wise is bound to miss something.  I was floored when the ((GB-FB-PU)/PA)^2 coefficient was very negative and the (SO/PA)^2 coefficient was very positive.  But it’s true every time.  I have some guesses as to why that I’ve documented but Eric and I certainly didn’t go in trying to model that.  Eric and I asked why Nate did QERA in this way, because (a+b*BB%-c*K%-d*GB%)^2 forces a coefficient that is positive on K%^2 and GB%^2.  We checked and the opposite was really true with GB%.  That’s interesting especially because it’s really significant on every subset of data we checked.  Doing regression allowed the data to speak in this case.  Components would have missed that.
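To spell out the point about the QERA form, here is the expansion of the squared expression as written in the comment above (the symbols a, b, c, d are the ones used there; nothing else is assumed):

```latex
\begin{aligned}
(a + b\,\mathrm{BB\%} - c\,\mathrm{K\%} - d\,\mathrm{GB\%})^2
  ={}& a^2 + b^2\,\mathrm{BB\%}^2 + c^2\,\mathrm{K\%}^2 + d^2\,\mathrm{GB\%}^2 \\
     & + 2ab\,\mathrm{BB\%} - 2ac\,\mathrm{K\%} - 2ad\,\mathrm{GB\%} \\
     & - 2bc\,\mathrm{BB\%}\,\mathrm{K\%} - 2bd\,\mathrm{BB\%}\,\mathrm{GB\%}
       + 2cd\,\mathrm{K\%}\,\mathrm{GB\%}
\end{aligned}
```

The coefficients on K%^2 and GB%^2 are c^2 and d^2, which cannot be negative whatever values the fit picks for c and d. A regression that estimates each squared term freely has no such constraint, which is how a negative coefficient on the ground-ball squared term can show up.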


#45    Tangotiger      (see all posts) 2010/02/10 (Wed) @ 23:30

Components would have missed that.

Would it have?  Let’s follow along with how Voros created DIPS: for the things he wasn’t interested in (batted balls in play), he forced in a league average, and then simply figured out the adjusted hits, adjusted innings, etc.

Now, we could do the same thing here, but instead of being uninterested in batted balls, we are very interested in them, at least to separate them into GB, FB, LD, Pops.  And from each, we can generate 1B, 2b, 3b, hr, DP (though DP is dependent on runners on base) based on league average for GB, FB, LD, Pops.  And once you have the adjusted line, you simply plug it into the well-tested BaseRuns.

However, SIERA approaches it differently.  It tries to do in 1 step what it should do in 2 steps.  It combines things because the regression tells it to combine things.  It’s not interested in whether it combines it in the right way, just that it best-fits it, and it’s tested to the out-of-sample data.

Now, by doing that, it does have the chance to pick things up, like maybe pitchers with lots of K or few BB will get lots of FB (and hence HR) with bases empty, etc.  But, we can’t really tell in the equation.  It’s a complicated equation.

Indeed, why not also let the HR into the equation?  If the HR/FB has no predictive value (which I know to be not true), why not include it in there to improve the fit?  Indeed, why not include all the fielder-dependent components as well?  That is, throw it all in?  And if you do that, why not also throw in the sequencing, meaning why not use GB with men on base, and FB with men on base, and LD with bases empty, etc.  Use it all.

Basically, you can go from the absolute bare minimum (FIP) to the absolute maximum (kitchen sink), and you will definitely improve the fit to current-year data and for following-year data.  (It’d be pretty hard to add data and reduce fit.)

It goes back to what it is that the metric intends to do.  Matt and Eric are intentionally, I presume, handcuffing themselves by not looking at situational performance, and not looking at outcomes on batted balls. This is not a bug, but a feature.  And instead of doing it in 2 steps, they want to do it in 1 step.  Again, feature, not bug.

This is the same thing for FIP.  Each metric is carefully designed to do what it is the creator wants them to do.  There is no “wrong” in FIP or in SIERA in terms of intent.  There may be “wrong” in terms of coefficients (not in FIP, and TBD in SIERA).

So, Sky is correct to ask what it is that SIERA is trying to do, and we should evaluate it on those terms.

Otherwise, the same things that Matt says are wrong with FIP I can say about SIERA.  FIP is just about the best we can do if we limit ourselves to the 4 terms noted and limit ourselves to a linear equation.  It’s as good as it gets.

Is SIERA the best that we can do based on the self-imposed limitations?  To be determined.


#46    Brian Cartwright      (see all posts) 2010/02/11 (Thu) @ 00:00

The question I would be asking is “How many runs will this pitcher allow, given what we know about him? (with an average defense in a neutral park)”. So let’s keep the input to the things the pitcher has the most control over - start with BB & SO. FIP uses HR, but HR/FB is not consistent, so use FB instead, and then use the GB/FB/PU mix to stand in for BABIP, as the pitcher does control those to a large extent.

Do I have the question right? Am I missing anything?

I did my own ERA estimator based on wOBA allowed. I projected pitchers as a mirror of the process for batters, but then the question was “How many runs allowed” because the pitcher creates his own run environment and it becomes non-linear. Tomorrow, when I see how MattEric’s test is modeled, I will test mine in the same way.


#47    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 00:45

"FIP uses HR, but HR/FB is not consistent, so use FB instead”

Right, but the correlation is not zero.  There is a skill component to allowing HR that goes beyond the number of FB allowed.  SIERA is explicitly ignoring that (which is fine, since it’s not interested in number of HR allowed).

But, to say that you can ignore HR and not improve the best-fit would be wrong.  Including HR will help.  As it would if you included BABIP.  They would just be heavily regressed.  Just not 100% regressed.


#48    Matt Swartz      (see all posts) 2010/02/11 (Thu) @ 00:51

The ICC of HR/FB net of team is like 0.07 in our data.  BABIP is a little higher (I forget what).  So there are small effects.  But those effects are correlated with DIPS skills, something regression was helpful with when Bradbury did it in 2004 at THT and is helpful in 2010 with SIERA.


#49    Depot      (see all posts) 2010/02/11 (Thu) @ 03:10

Matt,

I really like the idea and I’m always pro-regression, but I agree with Tango that you might need to define the point of this a bit better.  You seem to be adding new terms but still handicapping yourself.  If your final test is going to be predicting next year’s ERA, then why not use age or multiple years of ERA or changes in the defense?


#50    Matt Swartz      (see all posts) 2010/02/11 (Thu) @ 03:14

We’re not projecting the future.  We’re measuring how sets of skills lead to ERA.  It’s “Skill-Interactive Earned Run Average.” So the way to measure whether something is a skill is to see how persistent it is.  Thus, next-year ERA seems the best way to do it.  We also checked it against metrics that used the same level of information, QERA and xFIP, and it won handily at RMSE vs. same-year ERA.


#51    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 11:04

We also checked it against metrics that used the same level of information, QERA and xFIP, and it won handily at RMSE vs. same-year ERA.

This of course is the right way to test against the competition: “given the same parameters, what’s the best way to combine those parameters?”

***

Matt, side note: backshegoes.com is blocked at the office, so I can only respond tonight. (I was going to respond this morning, but then my kid came and said “do you want to play with me”, and, that’s all she wrote.)

One quick thing: if it was up to MGL, he’d have sold The Book for no profit, or had it available for all to download for free as a PDF.  He has no monetary incentive to sell The Book.  His only goal was to make the knowledge available for anyone with no barrier, just like all bloggers out there.  But, he’s got partners, and it’s not up to him.


#52    KJOK      (see all posts) 2010/02/11 (Thu) @ 13:33

There seemed to be some doubt about this in the BP comments, but the way park factor is described as being applied to SIERA is definitely incorrect.  B-R uses the Total Baseball formula, and as the person who calculated the park factors for the last published Total Baseball, I can say with certainty that those factors should NOT be ‘halved’, as they already include an adjustment for road games (and for not facing your own pitching staff).


#53    dkappelman      (see all posts) 2010/02/11 (Thu) @ 14:15

So, about the RMSE vs other metrics, I think you have to be careful because the actual implementation of the metrics you tested against is not quite an apples to apples comparison to what you’re testing, given the datasets used, park factors, and even the various calculations of the other metrics.

I’m just speaking from more of a practical standpoint, that when you say SIERA beats these other stats “handily” in RMSE, it may or may not ring true for the version of xFIP or tERA that shows up on FanGraphs.  I won’t speak for the regressed tRA on Statcorner, because I’m not the expert there, but that could possibly be the case there.

Now I realize that this is a completely different discussion from the framework of SIERA, but how the metrics are implemented in reality makes a difference and if I were to implement your current version of SIERA using a different dataset, I would surely get different results.

I’m just not so sure that an optimized SIERA vs standard xFIP, FIP, and (I don’t really know what standard tRA is) on retrosheet data means that SIERA is “handily” better than the others.


#54    Matt Swartz      (see all posts) 2010/02/11 (Thu) @ 14:24

Dave, I understand what you’re saying.  The only thing is that we weren’t optimizing SIERA using next-year ERA, but instead were optimizing SIERA using same-year ERA handcuffing ourselves to certain variables.  So in reality, it shouldn’t be too big of a difference to run the tests differently, and I can assure you that we were very careful to be as unbiased as humanly possible when running these tests.  Obviously, there’s gray area in how to run them, but the difference between SIERA and other estimators versus other estimators and regular ERA seems really big and probably enough to stand multiple tests. 

I don’t know if using different park factors would matter right now, because as it stands, there is basically zero correlation between each of the skill statistics and park factors we used.  I think in the long run, teams may be better able to match ground ball pitchers with high-homer parks, but they don’t really yet, giving us a much happier dataset to work with.


#55    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 14:40

Part 4:

http://www.baseballprospectus.com/article.php?articleid=10042

To be blunt, our goal was to beat everyone at predicting park-adjusted ERA in the following season

Isn’t this a problem?  Suppose for example that no pitcher changes team between 2008 and 2009.  And suppose that Jimenez has a 3.50 FIP and ERA with half his games at Coors in 2008, and he has a 3.50 FIP and ERA with half his games at Coors in 2009. 

But, if you make park-adjusted ERA the test number for 2009 (say 3.20), then it’s going to look like FIP did badly.  But, it didn’t!

So, I disagree that the test case is park-adjusted ERA.  It should simply be ERA.

***

and beat everyone but FIP and tRA in terms of same-year predictive value. Though it may sound counterintuitive to openly seek a third-place finish in something like this, the rationale is that both FIP and tRA treat HR/FB as skill rather than luck

FIP does NOT treat HR or HR/FB as a skill.  FIP treats HR as an observation, and is uninterested in the level of luck associated with it.

FIP is like OBP or SLG in that it does not care how much luck is in any observed event.

***

Stat    YR-Same  YR-Next
SIERA   0.957    1.162
tRA     0.755    1.222
FIP     0.773    1.224

Those are the RMSE.  So, in terms of explaining current-year ERA, FIP is off by 0.77 runs per game, and SIERA by 0.96 runs per game.  And MattEric’s explanation is correct: FIP uses HR and SIERA doesn’t.

As for next-year ERA, SIERA is off by 1.16 runs per game and FIP is 1.22.  If there is a story here it’s not that SIERA did so well, but that FIP did so well!  SIERA is an improvement in this test, but it’s a small step forward.
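For anyone who wants to reproduce numbers like these, RMSE is just the square root of the average squared miss. A minimal sketch follows; the function name and the four-pitcher sample are hypothetical and are not taken from the SIERA dataset.

```python
import math


def rmse(estimates, actuals):
    """Root mean squared error between an ERA estimator and actual ERA."""
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(e - a) ** 2 for e, a in zip(estimates, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical illustration only: estimator values in year N, ERA in year N+1
estimator_values = [3.50, 4.20, 3.90, 4.80]
next_year_era = [3.10, 4.90, 3.90, 5.60]

sample_rmse = rmse(estimator_values, next_year_era)
print(f"RMSE = {sample_rmse:.3f}")
```

Because the errors are squared before averaging, a few big misses move RMSE more than many small ones, which is worth keeping in mind when two metrics differ by only a few hundredths.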

Not to mention that I did say the test would be against actual ERA, not park adjusted.

If MattEric insist on the park-adjusted, then how about this test: only do the testing on pitchers that did NOT change teams.  This way, you don’t have to park-adjust anything, and the test is apples-to-apples.  What if the reason that SIERA does well is because it park-adjusts and FIP doesn’t?

MattEric: can you test that?

***

Also, can you add kwERA?

5.40 - 12*(K-BB)/PA

BB should be BB-IBB+HB, but use whatever all the other metrics are using.  And set “5.40” so that it floats by league/year.

Basically, this would be the baseline that all the other metrics should try to beat.  The Marcel-FIP, so-to-speak.
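A minimal sketch of kwERA as written above. The function names are made up, and the way the constant is floated (choose it so the league's kwERA equals the league ERA) is my assumption about what "floats by league/year" would mean in practice, not something specified here.

```python
def kw_era(so, bb, pa, constant=5.40, ibb=0, hbp=0):
    """kwERA = constant - 12 * (K - BB) / PA, with BB taken as BB - IBB + HB."""
    walks = bb - ibb + hbp
    return constant - 12.0 * (so - walks) / pa


def floating_constant(league_era, lg_so, lg_bb, lg_pa, lg_ibb=0, lg_hbp=0):
    """Assumed approach: pick the constant so league kwERA matches league ERA."""
    lg_walks = lg_bb - lg_ibb + lg_hbp
    return league_era + 12.0 * (lg_so - lg_walks) / lg_pa


# Hypothetical pitcher line: 200 SO, 50 BB in 800 PA
example = kw_era(so=200, bb=50, pa=800)
print(f"kwERA = {example:.2f}")
```

With the default 5.40 constant, the hypothetical line above works out to 5.40 - 12*(150/800) = 3.15.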


#56    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 14:46

To elaborate on a commenter at BPro (Juris):

to prefer SIERA to FIP—or do they each beat the other as often as not

In a head-to-head, how often does SIERA beat FIP by, say, more than 0.20 runs, and how often does FIP beat SIERA by more than 0.20 runs?  And how often do they tie?

Only for the 2009 out-of-sample data, and preferably, without the park-adjustment of 2009 ERA (of same-team pitchers).

If let’s say we have 200 pitchers that qualify, is SIERA better 40 times, FIP better 30 times, and they pretty much tie the other 130 times?  That gives SIERA a .525 win%.  (In this illustration)

I think if you present it like this, it’s more real.  RMSE of 1.25 and 1.28 doesn’t really mean much to most of us.
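Here is a rough sketch of that head-to-head count. I am reading "beats by more than 0.20 runs" as "its absolute miss is smaller by more than 0.20", which is one interpretation; the function names and the three-pitcher sample are hypothetical.

```python
def head_to_head(est_a, est_b, actual, margin=0.20):
    """Count pitchers where A is closer, B is closer, or they roughly tie."""
    a_wins = b_wins = ties = 0
    for a, b, y in zip(est_a, est_b, actual):
        # Positive edge means estimator A missed by less than estimator B
        edge = abs(b - y) - abs(a - y)
        if edge > margin:
            a_wins += 1
        elif edge < -margin:
            b_wins += 1
        else:
            ties += 1
    return a_wins, b_wins, ties


def win_pct(wins, losses, ties):
    """Winning percentage with ties counted as half a win."""
    return (wins + 0.5 * ties) / (wins + losses + ties)


# The illustration from the comment: 40 wins, 30 losses, 130 ties
illustration = win_pct(40, 30, 130)
print(f"win% = {illustration:.3f}")

# Hypothetical three-pitcher sample
sample = head_to_head([3.1, 4.5, 5.0], [3.5, 4.0, 5.1], [3.0, 4.0, 5.0])
print(sample)
```

The 40/30/130 split reproduces the .525 figure quoted above when ties count as half a win.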


#57    Colin Wyers      (see all posts) 2010/02/11 (Thu) @ 14:56

I can speak to this, and Matt or Eric can step in to correct me if I’m wrong.

I wrote a set of SQL scripts to figure out tRA for Matt and Eric. These used single-season run expectancy tables that I generated. Anyone is free to review the code I used for those run expectancy tables:

http://basql.wikidot.com/house-linear-weights

I would argue that those are at least as good as anything out there. But I could be biased. If I’m wrong, please let me know.

From there, I calculated tRA as so:

http://basql.wikidot.com/tra

That’s all based upon this:

http://www.statcorner.com/tRAabout.html

I did not apply park adjustments, nor did I apply any of the regressed versions. Matt and Eric can speak to how they handled park adjustments. I don’t feel like I know enough about the calculation of tRA* to do a fair comparison.

But to the best of my abilities, the version of tRA used in Matt and Eric’s testing was “optimized” to the available data. If anyone knows of a reason that my abilities have fallen short here, please let me know.


#58    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 14:57

Isn’t the other story how xFIP (which just replaced the HR term with a league constant rate per FB) is simply not that different from FIP?  Indeed, FIP was better than xFIP as much as SIERA was better than FIP.

So, doesn’t that mean that HR, not leagueRate per FB, is the better metric here?


#59    Sky      (see all posts) 2010/02/11 (Thu) @ 15:00

"I don’t know if using different park factors would matter right now, because as it stands, there is basically zero correlation between each of the skill statistics and park factors we used.”

Matt, by “skill statistics” do you mean GB/SO/BB or the terms in your regression equation?

If it’s the first one, haven’t you said that you’re trying to pick up on things beyond GB/SO/BB via the regression (like BABIP or HR/FB)?  So using better park factors might buy you more accuracy because the parks have more of an effect on those metrics than the three base metrics?

[Are you using runs park factors or component park factors?]


#60    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 15:00

By the way, this thread reminds me A LOT of the old BaseballBoards.com, where MGL, Voros, DavidSmyth, Patriot and I and others were making our bones.  Anyone from around back then can appreciate that.


#61    Colin Wyers      (see all posts) 2010/02/11 (Thu) @ 15:00

I guess that depends on what you mean by “better.”

But it’s an odd result, and certainly not what I would have expected - when I did my split-half testing of FIP, xFIP and tRA, xFIP was the downright winner. I don’t know why that wouldn’t be the case here.


#62    Sky      (see all posts) 2010/02/11 (Thu) @ 15:04

Sparked by the “might as well compare to PECOTA” comment regarding tRA* and SIERA*, well, that’d be pretty interesting, comparing to PECOTA or other projection systems.


#63    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 15:07

Guys, I hope you guys will release your data, so the rest of us can reproduce and perform our own validation.


#64    Matt Swartz      (see all posts) 2010/02/11 (Thu) @ 16:25

Rather than run a million tests ourselves (especially because I don’t have the computer skills to really run all these separately), the SIERA numbers will be available on the Statistics Sortables page really soon, and then people can run their own tests. 

I see park-adjusting SIERA as a feature rather than a bug actually, so I don’t see the point in removing the guys who changed teams or whatever, or using ERA.  But you can run your own tests.  We’ve spelled out our methodology and you can reproduce it as need be.  We think this was the best way of doing things, but people can run their own tests.

In case the thing about the park factors is true, we checked that just now and it seems to work out fine with the coefficients changing little, almost all the SIERA results staying within .05 and the ERA estimator leaderboard looking about the same.  This is because fortunately none of our regressors correlated at all with the park factors so the coefficients were unbiased.  We’ll update the formula in that case.

As far as the xFIP issue, I was surprised too.  Probably the most likely case is that one of the biases in FIP is correlated with HR/FB in such a way that it moves in the right direction.  That’s just a guess, though.  I’m not really sure why.


#65    Patriot      (see all posts) 2010/02/11 (Thu) @ 16:27

I should probably know this already, but does anyone have a breakdown on the percentage of each hit type (FB, PU, LD, etc.) that results in each outcome (S, D, T, HR, error, etc.)?  Given that breakdown, anyone could cook up a “BsR xFIP”.


#66    Matt Swartz      (see all posts) 2010/02/11 (Thu) @ 16:32

Patriot/65:

I don’t but it seems silly because a major result that people are missing here is that LD/Batted Ball net of team (and therefore park and defense) has an ICC for 03-09 of:

0.007

It’s just not something that is a pitcher skill.  Line drives will have a very high run value and the pitcher will be credited or blamed with the hitter’s ability to square up the ball.


#67    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 16:44

Matt: can you provide the raw data that you have, so that we all are working off the same dataset?  You know, whatever it is you have:

playerid,year,PA,IP,GB,FB,LD,Pop,SO,BB,IBB,HBP,HR,ERA,adjERA,FIP,xFIP,SIERA, etc,


#68    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 16:53

LD/Batted Ball net of team (and therefore park and defense) has an ICC for 03-09 of:

0.007

Is that r or r-squared?  If r-squared, that’s around r=.084.
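The conversion is just a square root, shown here as a quick check (the variable names are mine):

```python
import math

icc = 0.007

# If the reported figure were an r-squared, the implied r would be its square root
r_if_reported_as_r_squared = math.sqrt(icc)

print(f"implied r = {r_if_reported_as_r_squared:.3f}")
```

So a reported value of 0.007 means very different things depending on which one it is: r = .007 if it is already a correlation, or about r = .084 if it is an r-squared.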

MGL’s DIPS Revisited article:
http://www.baseballthinkfactory.org/files/primate_studies/discussion/lichtman_2004-02-29_0/

(By the way, Patriot, you can get your answers there too)

There he shows that the correlation (r) of the frequency of IF line drives is .049 and of the OF line drives is .009.  So, yes, matches what Matt is saying.

He also shows the correlation of the rate of hits per line drive.  The r was a negative for IF line drives and a huge .365 for OF line drives.

So, while the frequency of line drives does not matter as Matt is correctly suggesting, there is still an out-making skill at line drives themselves.

Groundball frequency (GB per BIP) is a huge r = .74 meaning that you can identify who is a groundball pitcher pretty easily.  Outs per GB is a low r=.062 meaning that you can’t tell who is effective at being a GB pitcher.

Basically, the opposite issue of the line drive.


#69    Colin Wyers      (see all posts) 2010/02/11 (Thu) @ 17:06

We’ve had this discussion before; event rates per batted ball type are in the comments:

http://www.insidethebook.com/ee/index.php/site/comments/the_best_of_this_week_at_bpro/


#70    Patriot      (see all posts) 2010/02/11 (Thu) @ 17:08

My point that it wouldn’t be hard to do a BsR version stands regardless of how one wants to treat the line drive.


#71    Matt Swartz      (see all posts) 2010/02/11 (Thu) @ 17:27

The LD ICC of .007 was an “r” and not “r^2”.

I don’t have a whole big dataset available with everything in it, but it should be pretty reproducible.  We did everything kind of piecemeal and so we both have a bunch of different spreadsheets on our computers.  It should be easy to redo.  The batted ball numbers were from Retrosheet and the park factors are from the Lahman database, and the xFIP was just reproducing what Studes told us to do.  So that should be enough to make your own data.  I just don’t have it in an easy-to-post form.


#72          (see all posts) 2010/02/11 (Thu) @ 18:02

david gassko’s lips is similar to tra but starts by giving league average ld rate to each pitcher. from there he assigns the average bsr to each batted ball component. this seems to be what colin was writing about back in august on the other thread. today, tango also seems to be suggesting gassko wouldn’t be wrong to assign an individual run value to those league average ld’s. 

i don’t know why lips isn’t used more, but it probably has to do with the fact that i can’t even find the statistic listed anywhere.

there is an explanation of the method here: http://www.hardballtimes.com/main/fantasy/article/explaining-lips/

i have interest in a metric that predicts next year era’s better than fip even if it is only marginally better. i think tango is correct that we won’t be throwing fip out any time soon and i’m not sure siera will beat fip when all is said and done. siera looked impressive on their tests today but the lack of fip park factors is significant.

i’m also interested in how lips could do against fip, tra, siera etc. matt’s done a great job of promptly answering all these questions but it looks like this future testing is largely on the readers.


#73    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 18:03

Fair enough Matt.

***

Here’s another test of SIERA:

http://www.insidethebook.com/ee/index.php/site/article/what_does_siera_think_of_walks/

I hope you guys appreciate what I am doing, and don’t think I’m trying to pick on you or something.  Frankly, if I’m being annoying, tell me to stop, and I won’t bother.  (Basically, this is your one chance to stop me.) I quite enjoy doing this.


#74    Matt Swartz      (see all posts) 2010/02/11 (Thu) @ 18:11

Feel free to run tests. I’m sure that every metric has some flaws but it’s testing very well, and clearly is picking up on some real effects. Keep in mind that we might need to make a slight change to the formula based on the Lahman issue, it looks like, so you might want to wait a bit, but it shouldn’t change things much.


#75    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 18:14

Matt, in light of my finding in post 72, I’d like to see how real-life pitchers respond, maybe breaking them up into quintiles by walks per PA.  According to my test, SIERA will over-forecast the low-walkers and under-forecast the high-walkers.

If in fact this does NOT happen (SIERA shows no bias here), then this may be an interesting finding in that low/hi-walkers shows a tendency that cancels out this bias.
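
A minimal sketch of the quintile test proposed above, assuming each pitcher is available as a (BB/PA, estimator value, actual ERA) tuple. The function name and the row layout are hypothetical, not anything Matt or Eric published. A positive number means the estimator is too high for that group.

```python
from statistics import mean

def bias_by_quintile(rows):
    """rows: list of (bb_per_pa, estimate, actual_era) tuples, at least 5 of them.

    Returns five mean errors (estimate - actual), lowest-walk group first.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # sort pitchers by walk rate
    n = len(ordered)
    result = []
    for q in range(5):
        # Integer arithmetic splits the sorted list into five near-equal chunks.
        chunk = ordered[q * n // 5:(q + 1) * n // 5]
        result.append(mean(est - act for _, est, act in chunk))
    return result
```

If SIERA has the walk bias described above, the five numbers should trend steadily from one sign to the other across the quintiles; if they hover around zero, the bias is being cancelled by something else.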


#76    bucko      (see all posts) 2010/02/11 (Thu) @ 21:16

Matt, perhaps this has been mentioned elsewhere, but you really ought to be estimating cluster-adjusted standard errors.  Otherwise, your estimates will be inefficient.


#77          (see all posts) 2010/02/11 (Thu) @ 22:23

Matt, I’m trying to play around with SIERA on my own. Would you know why excel is giving me a hard time with the formula if I’ve copied it straight from your BPRO text?


#78    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 22:27

NLBB15: your post keeps getting blocked.  I’ll work on it tomorrow.  I’ve got it, and it’s not lost.


#79    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 22:32

JD, this is what I use from Column B to column Q:

PA BB SO BatBall GrnB LinB FlyB rBB rSO rBatBall rGrnB rLinB rFlyB GB-FB GBFBsign SIERA

And this is what is in Q2:
=6.262 - 18.055*J2 + 11.292*I2 - 1.721*O2 + 10.169*(J2^2) - 7.069*O2^2*P2 + 9.561*J2*O2 - 4.027*I2*O2
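
For anyone not working in Excel, here is a rough Python equivalent of that cell, using the coefficients posted in this thread. It assumes all three inputs are per-PA rates and that the GB-FB column is the net ground-ball rate per PA; the function and argument names are made up for illustration. Multiplying the net rate by its own absolute value does the same job as the GBFBsign column.

```python
def siera(bb_rate, so_rate, net_gb_rate):
    """SIERA from per-PA rates, with the coefficients posted in this thread.

    net_gb_rate is assumed to be (ground balls - fly balls) / PA.
    """
    # x * abs(x) is the "signed square": it stays negative for fly-ball pitchers.
    signed_square = net_gb_rate * abs(net_gb_rate)
    return (6.262
            - 18.055 * so_rate
            + 11.292 * bb_rate
            - 1.721 * net_gb_rate
            + 10.169 * so_rate ** 2
            - 7.069 * signed_square
            + 9.561 * so_rate * net_gb_rate
            - 4.027 * bb_rate * net_gb_rate)
```

Called as siera(0.08, 0.18, 0.10), for example, it evaluates a pitcher with 8% walks, 18% strikeouts and a net ground-ball rate of +10%.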


#80    Tangotiger      (see all posts) 2010/02/11 (Thu) @ 23:06

jinaz
(28427)

I might have missed it, but were the years in which you tested the equation the same years that you used to generate the regression coefficients? I’m sure you’re aware of this, but doing so will make your equation look exceptionally good...but that won’t necessarily carry over into a new dataset.
-j

Justin is correct that the in-sample test would look much better than the out-of-sample test.

And, Matt shows us the data:

Next-year ERA for

Metric   03-04  04-05  05-06  06-07  07-08  08-09
SIERA    1.107  1.141  1.179  1.186  1.107  1.248
QERA     1.237  1.237  1.219  1.277  1.206  1.316
xFIP     1.284  1.403  1.211  1.404  1.287  1.311
FIP      1.120  1.230  1.298  1.236  1.170  1.283
tRA      1.162  1.202  1.273  1.216  1.171  1.307
ERA_pk   1.391  1.388  1.488  1.429  1.390  1.493

Notice how the RMSE for SIERA over the first five (in-sample) periods ranges from 1.11 to 1.19, while the out-of-sample period is 1.25.

For FIP, over the same periods it’s 1.12 to 1.30, with the out-of-sample period at 1.28.

For FIP however, all the data is out-of-sample, since FIP was not calibrated with those years.  That’s why the last period fits right into the other periods.  The same applies for xFIP.

The results for QERA and tRA make it look like they may have been calibrated for the 2003-08 data like SIERA, but I don’t know.
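
A quick sketch of that check in Python: for each metric, does the 08-09 RMSE fall inside the range of the first five periods? The numbers are copied from the table above. Treating the first five periods as in-sample follows the description here and is not something confirmed for every metric.

```python
# RMSE of next-year ERA by season pair, 03-04 through 08-09, from the table above.
rmse = {
    "SIERA":  [1.107, 1.141, 1.179, 1.186, 1.107, 1.248],
    "QERA":   [1.237, 1.237, 1.219, 1.277, 1.206, 1.316],
    "xFIP":   [1.284, 1.403, 1.211, 1.404, 1.287, 1.311],
    "FIP":    [1.120, 1.230, 1.298, 1.236, 1.170, 1.283],
    "tRA":    [1.162, 1.202, 1.273, 1.216, 1.171, 1.307],
    "ERA_pk": [1.391, 1.388, 1.488, 1.429, 1.390, 1.493],
}

def last_within_earlier_range(values):
    """True if the final value lies within the min-max of the earlier ones."""
    earlier, last = values[:-1], values[-1]
    return min(earlier) <= last <= max(earlier)

for name, values in rmse.items():
    lo, hi = min(values[:-1]), max(values[:-1])
    print(f"{name:7s} first five {lo:.3f}-{hi:.3f}  08-09 {values[-1]:.3f}  "
          f"inside range: {last_within_earlier_range(values)}")
```

A metric whose last period sits above its earlier range is at least consistent with having been fitted to those earlier years, though one period is thin evidence either way.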


#81    Depot      (see all posts) 2010/02/11 (Thu) @ 23:17

bucko,

Clustering has nothing to do with efficient estimates.  What’s your concern?  What would you cluster on anyway?


#82    bucko      (see all posts) 2010/02/12 (Fri) @ 10:15

Presuming the same pitcher appears in the data more than once, I’d cluster on pitcher.


#83    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 11:30

bucko, I have no idea what you are talking about.  Can you be nice enough to post several sentences to explain yourself, with illustrations.  Pretend we are in high school, but are keen on learning.  That’s your audience.


#84    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 13:07

NLBB/72 was marked for moderation and is now finally open.

I apologize for the lowercasing, but that’s what I had to do.


#85    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 14:12

Part 5:

http://www.baseballprospectus.com/article.php?articleid=10045

I’ll comment after reading.


#86    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 14:30

FIP and xFIP multiply home runs and expected home runs, respectively, by 13 and divide by innings pitched. That treats the effect of blasts as constant for all pitchers, but the damage for someone like Pineiro is largely counteracted by his lack of ducks on the pond.

Empirical evidence says otherwise:
http://www.insidethebook.com/ee/index.php/site/article/linear_weights_by_run_environment/

RunsMin  RunsMax  n   RperI  1B     2B     3B     HR     NIBB
0.000    0.439    15  0.419  0.436  0.720  0.986  1.402  0.279

That is, the run value of a HR is 1.40, for the 15 league-seasons (leagues, not pitchers) that allowed an average of 0.419 runs per inning (3.77 per game, pretty close to Pineiro).

And that 1.40 is pretty constant.  Maybe at sub-3 runs per game it drops a little bit.  But, certainly not at the 3.7 level.

This is the kind of adjustment that SIERA makes, as it allows for a quadratic term on ground balls. Both QERA and xFIP are clearly high for the reasons above, but FIP is nearly as close as SIERA with Pineiro and is on the low side, for the reason that Pineiro only allowed 6.5 percent home runs per fly ball, below the league average.

I looked at the top 20 (career of 2002-09, min 1500 PA) in GB per PA.  Here are their SIERA and ERA and FIP:
4.22 SIERA
4.16 SIERA-adjusted
4.17 ERA
4.14 FIP

Almost a tie.

I had to normalize SIERA because it’s .06 too high compared to my sample.

How about top 20 in fewest BB/PA?
4.04 SIERA
3.98 SIERA-adjusted
3.95 ERA
3.93 FIP

Almost a tie.

How about a combination of high GB and low BB?  There were 9 pitchers who were at least one SD from the mean in both:
4.18 SIERA
4.12 SIERA-adjusted
3.82 ERA
3.94 FIP

FIP better.
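
The three comparisons above boil down to simple arithmetic, sketched here in Python. The group averages and the .06 offset are the ones quoted in this post; the labels and function name are just for illustration.

```python
# Group averages quoted above: (label, SIERA, ERA, FIP).
groups = [
    ("top 20 GB/PA", 4.22, 4.17, 4.14),
    ("top 20 fewest BB/PA", 4.04, 3.95, 3.93),
    ("high GB and low BB", 4.18, 3.82, 3.94),
]
OFFSET = 0.06  # SIERA ran .06 too high against this sample

def misses(siera, era, fip, offset=OFFSET):
    """Return (adjusted-SIERA miss, FIP miss) against ERA, in runs."""
    adjusted = siera - offset
    return round(abs(adjusted - era), 2), round(abs(fip - era), 2)

for label, s, e, f in groups:
    print(label, misses(s, e, f))
```

The gap between the two estimators is only a couple of hundredths of a run in the first two groups, and widens in the nine-pitcher group, which is a small sample.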

I gotta go back to work.  I’ll be back.


#87    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 14:53

About Johan Santana:

The real issue is that FIP and xFIP are too bearish on home runs, neglecting to realize that his prowess when it comes to whiffing and walking mitigate the results even if he lacks the ground-balling tendencies as other star pitchers

The real issue is that Santana has a career BABIP of .263 with men on base and .295 with bases empty.

The real issue is that Santana gives up 0.59 HR per 20 bases empty PA, and 0.44 per 20 men on base PA.

While the average run value of a HR is 1.40 regardless of run environment, it will be much different if you throw a disproportionate number of those HR, given that run environment, with bases empty.

So for Santana, his personal LWTS HR becomes something like 1.25 runs per HR.  (Had he given up as many HR with bases empty as with men on base, he’d be at 1.40… I know, hard thing to understand.)

Santana is not a good test case for anything.  I don’t think that SIERA hitting so well on Santana is a good indicator.


#88    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 14:58

To say it another way regarding Santana: his ERA is severely biased because the way he performs with men on base is so different from with bases empty.  We expect to see a disparity between an estimator and ERA, if that estimator DOES NOT look at splits by men on base.

If an estimator does match his ERA, then that’s not a good sign.


#89    Matt Swartz      (see all posts) 2010/02/12 (Fri) @ 15:42

I would not do career SIERA’s because skills can change too much over several years.  The goal of FIP is to estimate the effect of DIPS stats on ERA.  The goal of SIERA is to estimate the effects of DIPS skills on ERA.  So SIERA is trying to pick up unobservables.

The “how does SIERA do on subsets of data?” question is obviously a tricky one.  We know that (1) SIERA did better overall, (2) It did better on the subsets of data that we tested in Thursday’s article.  Obviously there are sister points.  For instance, we found that it did better overall, overall for high-GB pitchers, but particularly well for high-GB/high-BB pitchers.  Tango found it did worse for high-GB/low-BB pitchers.  That’s not all that surprising.

BABIP is correlated negatively with strikeout ability.  That’s something Bradbury pointed out in like 2004.  Thus, SIERA picks this up pretty well and gives more credit to strikeouts than the direct effect of an extra out.  That’s a good feature. 

We did still overestimate Santana’s ERA and a good part of that is Santana’s men on/bases empty strengths.  But I would suspect that there ARE splits skills that are correlated with DIPS skills, in general, if not in the Santana case.

The major difference I think is that SIERA and FIP are really coming from different methodologies.  SIERA is trying to work backwards from ERA because it tries to be agnostic about correlations.  FIP tries to build upwards from components.  Both are useful techniques, and will have their strengths and weaknesses.  I think that learning all we can about baseball entails picking up the effect of unobservables AND picking up on the direct effect of individual events.  So it’s good to have a statistic like FIP that asks how one strikeout affects expected run scoring.  It’s also useful to see if strikeout pitchers have other skills or have other ways of affecting run scoring.  Perhaps they strike out cleanup hitters a little more often, and so maybe their strikeouts are higher-leverage strikeouts coming with more men on base?  I’m not sure.  I don’t think it’s possible to test all these things.  Will doing a regression miss some things?  Absolutely.  Will doing linear weights miss some things?  Absolutely.  Will they miss different things?  Absolutely.  So let’s continue to do both.  Suppose I told you only that a pitcher had a FIP of 4.00 and a SIERA of 3.50, and then asked you to guess whether his ERA was above or below 4.00.  I hope you would guess below.  If I then asked you to guess whether his ERA was above or below 3.50, I would hope you would guess above.

I’m going to be out of town this weekend, but I might check in at some point later.  I hope people will think to look up a pitcher’s SIERA in the process of evaluating him from now on, though.  If you continue to look only at FIP when both are just a click away, I think that’s silly.


#90    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 15:52

Ditto all of Matt/89, with an intriguing question mark on this:

But I would suspect that there ARE splits skills that are correlated with DIPS skills, in general, if not in the Santana case.

Warrants research.

***

Having SIERA (or something like it), side-by-side with FIP (or something like it), would be ideal, precisely for the reason that each does its own thing, with some overlap, and each brings something unique to the table.

***

By the way Matt, after I sent you my test results, I noticed that I really need to calibrate SIERA on an annual basis, like with FIP, in order to get it perfect.  FYI for those who think they can just use the equation as-is.


#91    Brian Cartwright      (see all posts) 2010/02/12 (Fri) @ 17:31

Matt, can you post the RMSE formula you used?  When I test your data, I am getting results much lower than you posted in the article.  I want to be able to reproduce what you have before going further with tests.

I’m using
SQRT( SUM( (SIERA-ERA)^2 * IP ) / SUM(IP) )
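
A minimal Python sketch of an IP-weighted RMSE, with the square root taken over the whole weighted mean of squared errors. Leaving the weights out gives the plain unweighted RMSE, which is a handy cross-check when two people get different numbers from the same data. The function name and argument layout are assumptions for illustration.

```python
import math

def rmse(estimates, actuals, weights=None):
    """Root mean squared error of estimates against actuals.

    weights (e.g. innings pitched) are optional; None means every
    pitcher-season counts equally.
    """
    if weights is None:
        weights = [1.0] * len(estimates)
    # Weighted sum of squared errors, then divide by total weight, then root.
    squared = sum(w * (e - a) ** 2
                  for e, a, w in zip(estimates, actuals, weights))
    return math.sqrt(squared / sum(weights))
```

Weighting by IP lets the high-inning starters dominate, and they tend to have smaller misses, so the weighted figure will usually come out lower than the unweighted one.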


#92    Brian Cartwright      (see all posts) 2010/02/12 (Fri) @ 17:46

and to make sure I have the formula right.

I used ABS to get the one square term to be + or -

6.262
+11.292*(BB/PA)
-18.055*(SO/PA)
+10.169*((SO/PA)*(SO/PA))
-1.721*((GB-FB-PU)/PA)
-7.069*(((GB-FB-PU)/PA)*ABS((GB-FB-PU)/PA))
-4.027*(BB/PA)*((GB-FB-PU)/PA)
+9.561*(SO/PA)*((GB-FB-PU)/PA)
AS SIERA,


#93    Depot      (see all posts) 2010/02/12 (Fri) @ 19:45

bucko/82,

Right, so you should adjust the standard errors for clustering by pitcher.  That’s a good point.  But the estimates are unaffected by this adjustment.  The only nuance here is that Matt/Eric (sort of) used stepwise regression so clustering could affect which variables they include/exclude.  Hmm...that might be a fair point if any of the coefficients were only borderline significant.

(Tango...basically, the way they (presumably) calculated standard errors assumed that each observation was independent.  This is definitely not true if you include the same pitcher more than once in your sample.  In reality, you have fewer independent observations than the regression “thinks.” So you need to increase your standard errors.)
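
To make the clustering point concrete, here is a small pure-Python sketch for a one-variable regression. It computes the slope, a standard error that treats every row as independent, and a standard error that adds up each pitcher's errors before squaring. This is a simplified illustration, not the regression Matt and Eric ran: it uses a single regressor and leaves out the finite-sample corrections that statistical packages typically apply.

```python
import math
from collections import defaultdict

def slope_standard_errors(x, y, groups):
    """Fit y = a + b*x by least squares.

    Returns (b, se_independent, se_clustered); groups holds one cluster
    label (e.g. a pitcher id) per row.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xt = [xi - xbar for xi in x]                      # demeaned regressor
    sxx = sum(v * v for v in xt)
    b = sum(v * (yi - ybar) for v, yi in zip(xt, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Robust variance assuming each row is its own independent observation.
    var_ind = sum((v * e) ** 2 for v, e in zip(xt, resid)) / sxx ** 2

    # Clustered variance: sum x*residual within each pitcher, then square.
    score = defaultdict(float)
    for v, e, g in zip(xt, resid, groups):
        score[g] += v * e
    var_clu = sum(s * s for s in score.values()) / sxx ** 2

    return b, math.sqrt(var_ind), math.sqrt(var_clu)
```

The slope b is the same either way, which is the point that the estimates are unaffected. Only the uncertainty changes: if every pitcher's row is simply entered twice, the independent standard error shrinks as though the sample had doubled, while the clustered one does not.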


#94    Matt Swartz      (see all posts) 2010/02/13 (Sat) @ 01:28

Brian, the SIERA formula was updated for fixed park factors (the articles have it updated, as does the glossary).  Also, we weren’t weighting the RMSE by IP.

To be honest, I’m not sure about the point suggesting clustering the standard errors or how to do that.  I think it’s best not to use hard-and-fast rules for significance in a regression with only seven years of data.  Significant effects might not show up as significant in this case, so you do have to use some intuition.  If there were a way to refine the estimation of the coefficients of these terms, I’d certainly be curious how to do it.  But I think that we get pretty significant coefficients, except on the BB*GB term, which seems tied closely enough to BB*SO that it’s tough to take it out (especially since it barely moves the actual SIERAs by more than +/- 0.10), and because no coefficient of normal size (which should probably not be further from 0 than -10) could be significant with seven years of sample size.


#95    Depot      (see all posts) 2010/02/13 (Sat) @ 04:42

Clustering could make a big difference in significance but, at the same time, I believe that intuition is probably more important.  Still, I always feel uncomfortable with stepwise regression and would actually prefer if you just kept all the terms in, but I can see why you might not want to.  (If you’re using Stata, clustering is especially easy to implement.)

I’m also not sure that just squaring terms and interacting them is the best way of accounting for non-linearities.  It’s still a very parametric way of doing it.  Why not just create “buckets” of players based on some of the stats?


#96    mulkowsky      (see all posts) 2010/02/13 (Sat) @ 09:48

I’m still wrestling with the finding that FIP is a better Y+1 ERA predictor than xFIP, which is also at odds with this study.  http://www.hardballtimes.com/main/article/how-well-can-we-predict-era/

Thoughts?  Anyone replicate Matt/Eric’s results?


#97    Eric Seidman      (see all posts) 2010/02/13 (Sat) @ 10:59

I’m re-checking everything pertaining to xFIP this weekend as Ockham’s Razor suggests the simplest solution often wins out—and that would be that I simply and embarrassingly coded something incorrectly. If I did, I’d rather fix it now than let it fester. I’ll be back Sun night/Mon morning and will revisit as soon as I step foot in my house.


#98    Tangotiger      (see all posts) 2010/02/23 (Tue) @ 14:29

Eric confirmed that there was a coding error with xFIP:

http://www.baseballprospectus.com/unfiltered/?p=1516

xFIP and SIERA are essentially equals.

***

And as we discussed in the other thread, bbFIP is just as good, if not better, than either of them.

Eric: if you can, please add bbFIP and szERA to your future tests.
http://www.insidethebook.com/ee/index.php/site/comments/tangos_lab_batted_ball_fip/

Also, since your dataset and my dataset differ, it would probably be best that you best-fit bbFIP and szERA to your in-sample dataset.  And you should do the same for all the other metrics.

Otherwise, what you have is SIERA being best-fitted to the in-sample data, while the other metrics were best-fitted to some other dataset.

***

This is bbFIP:

ERA = 11*bigs + 3*smalls + constant

bigs = [(BB+LD) - (SO+iFB)] / PA
smalls = (oFB - GB) / PA
constant = whatever you need to align to league
BB is walks minus IBB plus hit batters.

So, you run a regression to get the best fit for 11, 3 and constant.

***

This is szERA:
ERA = 11*(BB-SO)/PA + constant
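
Both formulas are simple enough to sketch in a few lines of Python. The 11 and 3 are the starting weights given above and are meant to be re-fitted by regression on your own data; the constant is left to the caller because it is whatever aligns the result to the league. The function and argument names are hypothetical, and applying the same walk definition (BB minus IBB plus HBP) to szERA is an assumption.

```python
def bbfip(pa, bb, ibb, hbp, so, ld, ifb, ofb, gb, constant,
          big_weight=11.0, small_weight=3.0):
    """bbFIP: big_weight*bigs + small_weight*smalls + constant."""
    walks = bb - ibb + hbp                     # walks minus IBB plus hit batters
    bigs = ((walks + ld) - (so + ifb)) / pa    # high-impact events per PA
    smalls = (ofb - gb) / pa                   # outfield flies minus grounders
    return big_weight * bigs + small_weight * smalls + constant

def szera(pa, bb, ibb, hbp, so, constant, weight=11.0):
    """szERA: weight*(walks - strikeouts)/PA + constant.

    Uses the same walk definition as bbfip, which is an assumption here.
    """
    walks = bb - ibb + hbp
    return weight * (walks - so) / pa + constant
```

To best-fit to a particular dataset, regress ERA on bigs and smalls (or on the walk-minus-strikeout rate for szERA) and pass the fitted weights and intercept in place of the defaults.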

