THE BOOK--Playing The Percentages In Baseball

Wednesday, March 17, 2010

How much year-to-year variability do we find in ERA for a star pitcher?

By Tangotiger, 02:52 PM

This is what I did. I started with the list of the 12 best starting pitchers born between 1962 (Clemens) and 1971 (Pedro). I took out Smoltz because of his relief stint in between his starting stints, leaving me with 11.

For each of those pitchers, I figured out his first quality year, which I simply defined as allowing runs at no more than 80% of the league average while facing at least 500 batters. I then found his last quality year and set his “last season” as the year after it.
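The selection rule above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used for this post: the `Season` record, its field names, and the helper functions are hypothetical, and each pitcher's career is assumed to be a list of per-season rows carrying a runs-allowed ratio (1.00 = league average) and batters faced.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Season:
    # Hypothetical per-season row; field names are illustrative only.
    year: int
    age: int
    ra_ratio: float      # runs allowed relative to league average (0.73 = 73%)
    batters_faced: int


def is_quality(s: Season, max_ratio: float = 0.80, min_bf: int = 500) -> bool:
    """Quality season: runs allowed at most 80% of league average, 500+ batters faced."""
    return s.ra_ratio <= max_ratio and s.batters_faced >= min_bf


def peak_window(career: List[Season]) -> Optional[Tuple[int, int, int, int]]:
    """Return (minYear, minAge, maxYear, maxAge) for one pitcher.

    minYear is the first quality season; maxYear is the season after the
    last quality season (the "plus 1" year). Returns None if nothing qualifies.
    """
    quality = [s for s in career if is_quality(s)]
    if not quality:
        return None
    first = min(quality, key=lambda s: s.year)
    last = max(quality, key=lambda s: s.year)
    return first.year, first.age, last.year + 1, last.age + 1


def seasons_in_window(career: List[Season], window: Tuple[int, int, int, int]) -> List[Season]:
    """Every season from minYear through maxYear inclusive, quality or not."""
    min_year, _, max_year, _ = window
    return [s for s in career if min_year <= s.year <= max_year]


# Made-up career, for illustration only.
career = [
    Season(1989, 22, 0.95, 600),
    Season(1990, 23, 0.75, 800),   # first quality season
    Season(1991, 24, 0.85, 800),   # not quality: ratio too high
    Season(1992, 25, 0.70, 450),   # not quality: too few batters
    Season(1993, 26, 0.78, 700),   # last quality season
    Season(1994, 27, 1.10, 600),   # the "plus 1" season
]
window = peak_window(career)
print(window)                                  # (1990, 23, 1994, 27)
print(len(seasons_in_window(career, window)))  # 5
```

Note that the window keeps the non-quality seasons that fall between the first and the “plus 1” year, which is how a season like Cone's 135% ends up in the sample.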

I get this chart:


(minYear = first quality season; maxYear = the season after the last quality season; ages are for those seasons)

minYear  minAge  maxYear  maxAge  playerID
1990     23      1997     30      appieke01
1995     30      2003     38      brownke01
1986     24      2005     43      clemero02
1988     25      1999     36      coneda01
1991     25      2002     36      glavito02
1993     30      2004     41      johnsra05
1992     26      2002     36      maddugr01
1994     23      2005     34      martipe02
1992     24      2003     35      mussimi01
1985     21      1994     30      saberbr01
1992     26      2004     38      schilcu01

There were 141 pitching seasons in that span, ranging from Pedro’s 2000 season at 34% of league average to David Cone’s 2000 season at 135%. Basically, what I have are 141 seasons where we had reason to believe that our pitchers were at their peak. (I added the “plus 1” year after the last great year because, entering that year, we still thought he was great.)

The average of these great seasons from these great starters was 73% of league average. So, in a league where the average ERA is 4.30, these eleven guys put up an average ERA of around 3.14 over those 141 seasons.
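The conversion from percent-of-league to ERA is a single multiplication. A minimal sketch, using the post's illustrative 4.30 league ERA; the function name is hypothetical.

```python
LEAGUE_ERA = 4.30  # illustrative league-average ERA used in the post


def era_from_ratio(ratio: float, league_era: float = LEAGUE_ERA) -> float:
    """Convert a percent-of-league figure (0.73 means 73%) into ERA for that league."""
    return ratio * league_era


print(round(era_from_ratio(0.73), 2))  # 3.14, the group's average season
print(round(era_from_ratio(0.16), 2))  # 0.69, one standard deviation expressed in runs
```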

What was the RANGE of their posted ERAs? With a mean of 73% of league average, one standard deviation was 16% of league average (roughly 0.69 runs). The 10th percentile was 54% and the 90th percentile was 96%; the 50th percentile was the same as the mean. So, 80% of the time, even if you KNOW you’ve got yourself a 73% pitcher, he’s going to post an ERA somewhere between 54% and 96% of league average, for whatever reason (good luck, bad luck, temporary injuries, temporary loss of talent, etc.). That is a range of 42% (96 minus 54) of the league-average ERA, or roughly 1.80 runs per 9 IP! That is, 80% of the time, a pitcher will post an ERA within about a run of his talent level.
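The summary statistics above can be sketched with Python's standard library, assuming the seasons are available as a list of percent-of-league ratios (the 141 real seasons aren't reproduced here, so the example uses made-up numbers). The helper names are hypothetical, and the percentile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the post doesn't say which one it used.

```python
import statistics


def summarize(ratios):
    """Mean, SD, and 10th/50th/90th percentiles of percent-of-league ratios.

    Percentiles use statistics.quantiles with method="inclusive" (linear
    interpolation between data points); other methods give slightly different cuts.
    """
    deciles = statistics.quantiles(ratios, n=10, method="inclusive")
    return {
        "mean": statistics.mean(ratios),
        "sd": statistics.stdev(ratios),  # sample standard deviation
        "p10": deciles[0],
        "p50": statistics.median(ratios),
        "p90": deciles[-1],
    }


def band_in_runs(low_ratio, high_ratio, league_era=4.30):
    """Width of a percent-of-league band, expressed in runs per 9 IP."""
    return (high_ratio - low_ratio) * league_era


# Made-up ratios, for illustration only (not the real seasons).
print(summarize([0.50, 0.60, 0.70, 0.80, 0.90]))
print(round(band_in_runs(0.54, 0.96), 1))  # 1.8, the post's 10th-to-90th band
```

With the real 141 ratios, `summarize` would return the 73% / 16% / 54% / 96% figures quoted above, give or take the percentile method.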

This is for pitchers who we “knew” were great, and we only started the “great” clock once they actually performed at a great level.

So, what should we expect of Tim Lincecum and his forecasted 73% RA according to Chone (a match to our group of 11 stars)? What is the chance he will post an ERA of 2.50 or less (58% of league average)? That happened 20 times, or 14% of the time. At the other end, a season at 92% of league average or worse also happened 20 times with our star pitchers. That’s an ERA of 3.96 or higher. And if Lincecum is actually a bit better than Chone thinks (Marcel says 67%), well, that 3.96 is going to come down a bit.
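Counting the tails is a matter of thresholding the list of seasons. A minimal sketch, again assuming the seasons are available as percent-of-league ratios; `tail_shares` is a hypothetical helper whose default cutoffs are the post's 58% and 92%. The last three lines just reproduce the post's arithmetic.

```python
def tail_shares(ratios, good_cut=0.58, bad_cut=0.92):
    """Share of seasons at or better than good_cut, and at or worse than bad_cut.

    Lower ratios are better (fewer runs allowed relative to league average).
    """
    n = len(ratios)
    good = sum(1 for r in ratios if r <= good_cut)
    bad = sum(1 for r in ratios if r >= bad_cut)
    return good / n, bad / n


LEAGUE_ERA = 4.30
print(round(2.50 / LEAGUE_ERA, 2))  # 0.58: a 2.50 ERA as a share of league average
print(round(0.92 * LEAGUE_ERA, 2))  # 3.96: the ERA at 92% of league average
print(round(20 / 141, 2))           # 0.14: 20 of the 141 star seasons
```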

And, what did the Fans say in a recent poll on my site?  3.88.

Give yourselves a fantastic pat on the bat boys. You figured out in the blink of an eye what took me thirty minutes to code.

#1    statzombie    2010/03/17 (Wed) @ 15:57

This, and much of the debate around Strasburg over the last day, seem to presuppose that the errors are both symmetric and have constant variance. I’m not sure either of these is valid.

With regard to symmetry, I am not sure there is a sound argument here, but it seems possible that younger pitchers especially will have a non-symmetric error distribution. If Strasburg is projected to have a 3.50 ERA, I am not sure it makes sense that a 2.50 and a 4.50 have equal probability, nor a 1.50 and a 5.50. Of course, the quantiles you give suggest a roughly symmetric distribution. However, those pitchers are cherry-picked as the very best, and only their best seasons are included. My guess is that once you include would-be greats who didn’t make it, it will be very skewed.

More importantly, though, can we assume constant variance? At the very least, can we get a plot of error vs. time? We need to be careful about including those “plus 1” years, as their variance will be different (by definition, those years are not great, so they will necessarily inflate the estimated variance of the great years).


#2    Tangotiger    2010/03/17 (Wed) @ 16:04

“This” does NOT presuppose anything. I am using empirical results.


#3    Tangotiger    2010/03/17 (Wed) @ 16:09

For example, here are the empirical results I showed (percentiles ranked by performance, so the 90th percentile is the best seasons):

90th: 54%
50th: 73% (also the mean)
10th: 96%

So the gap from the median is 19 points on one side and 23 points on the other: there is a slight asymmetry.


#4    statzombie    2010/03/17 (Wed) @ 16:38

Right, asymmetry among past cherry-picked results. What if you include the guys who were supposed to be good but weren’t? I am talking about this in the context of projecting Strasburg. Perhaps I replied to the wrong thread; sorry about that.

And yes, you presuppose constant variance. You cannot give me an empirical estimate of the standard deviation without presupposing this, unless of course you modify it in some way to be more robust.


#5    statzombie    2010/03/17 (Wed) @ 16:41

Please let me rescind that previous comment; it makes no sense.

Empirical results obviously presuppose nothing. Your choice of empirical methods does.


#6    Rally    2010/03/17 (Wed) @ 17:09

“What if you include the guys who were supposed to be good but weren’t? I am talking about this in the context of projecting Strasburg.”

What Tango’s doing here is showing the upside.  If Strasburg is the equal of these great pitchers in their primes, what range of results can we expect?

There certainly is a non-zero chance of downside: Strasburg becoming one of those might-have-beens, like Brien Taylor.


#7    2010/03/17 (Wed) @ 18:06

I just love it when 100 people get together and make a nice, near-normal distribution.


#8    Tangotiger    2010/03/17 (Wed) @ 18:20

Right, Rally gets what I’m trying to do here.

It’s one thing to say that a 21-year-old Roger Clemens might be the best of all time or might be a has-been. It’s another thing to say “I KNOW” that the 21-year-old Roger Clemens is one of the best ever.

Strasburg is being talked about as “I KNOW”. So I am selecting, after the fact, the best pitchers of a generation: guys who, once they broke through, were studs for a long time. And I stop them the year after their last great year.

To me, this pretty much mimics what we are after here.

And, given all that, we find a HUGE variance around performance numbers.  Even though we are REALLY REALLY REALLY certain that our group of studs here has a mean somewhere around the 73% range, we still observe, year-to-year, quite a range of performances.

We all follow baseball.  We know.  We expect some big swings.  Now it’s quantified.  We can speak in an informed manner.

At least more informed.


#9    Mike Fast    2010/03/17 (Wed) @ 18:38

“Strasburg is being talked about as ‘I KNOW’.”

Are you saying that’s happening here at your blog, or in the baseball community in general?

And if it’s here, what comments are you referring to?  Are you referring only to Oliver’s projection? 

I’ve spoken a lot in defense of why I think it might be hard to project Strasburg accurately and why an (otherwise?) functioning projection system might reasonably come out with a surprising projection for him.  That does not mean that I think he is likely to put up a 2.86 ERA in the majors this year.  I said 3.50-4.50 was a reasonable range, and the voters in the poll seem to be not far off that.


#10    Tangotiger    2010/03/17 (Wed) @ 19:23

To the extent that any pitcher is an “I KNOW” talent, I’m showing the very wide range that you should expect even for pitchers at that level.


#11    J. Cross    2010/03/17 (Wed) @ 23:32

Distribution of actual ERA - mERA* (normalized to have the same mean as actual ERA) for 2009 (min 100 IP, avg 171 IP):


Percentile    Actual ERA minus mERA
90%            1.42
75%            0.58
50%           -0.05
25%           -0.64
10%           -1.22

Std Dev: 0.92
N: 130
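The comparison in the table above can be sketched as follows. This is a minimal sketch, assuming you already have each pitcher's actual ERA and an estimator value (mERA itself isn't computed here), and assuming “normalized to have the same mean” means shifting the estimator by a constant; a multiplicative rescaling would be another reasonable reading. `residual_summary` is a hypothetical name.

```python
import statistics


def residual_summary(actual, estimate):
    """Distribution of actual ERA minus an estimator such as mERA.

    The estimator is first shifted by a constant so both series share the
    same mean (an assumed reading of "normalized"); residuals then average zero.
    """
    shift = statistics.mean(actual) - statistics.mean(estimate)
    resid = [a - (e + shift) for a, e in zip(actual, estimate)]
    pct = statistics.quantiles(resid, n=100, method="inclusive")  # 1st..99th percentiles
    return {
        "percentiles": {p: pct[p - 1] for p in (10, 25, 50, 75, 90)},
        "sd": statistics.stdev(resid),
        "n": len(resid),
    }


# Made-up ERAs for three pitchers, for illustration only.
print(residual_summary([3.0, 4.0, 5.0], [3.0, 4.0, 2.0]))
```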


#12    statzombie    2010/03/18 (Thu) @ 00:58

Thanks for that data. Certainly looks symmetric to me.

I still have some qualms about whether we can actually assume symmetric error distributions for forecasting, but it should not have much impact. In particular, OLS is still the best linear unbiased estimator in that situation, since the Gauss-Markov result doesn’t require symmetric errors. It’s the constant variance, which Gauss-Markov does require, that worries me a little more.

