Wednesday, March 17, 2010
How much year-to-year variability do we find in ERA for a star pitcher?
This is what I did. I started with the list of the 12 best starting pitchers born between 1962 (Clemens) and 1971 (Pedro). I took out Smoltz because of his relief stint in-between his starting stints, leaving me with 11.
For each of those pitchers, I figured out his first quality year, which I simply defined as a season in which he allowed runs at a rate of at most 80% of the league average while facing at least 500 batters. I then looked for his last quality year, and set his “last season” as one greater than his last quality year.
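For anyone who wants to replicate the selection rule, here is a minimal sketch in Python. It is not the code used for this post: the `Season` record, its field names, and the sample career are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Season:
    year: int
    ra_ratio: float      # runs allowed relative to league average (0.80 = 80%)
    batters_faced: int


def is_quality(s: Season) -> bool:
    # Quality year: at most 80% of league-average runs, at least 500 batters faced.
    return s.ra_ratio <= 0.80 and s.batters_faced >= 500


def peak_window(seasons: List[Season]) -> Optional[Tuple[int, int]]:
    """Return (first quality year, last quality year + 1), or None if no quality year."""
    quality_years = [s.year for s in seasons if is_quality(s)]
    if not quality_years:
        return None
    # The "plus 1" keeps the season we entered still believing he was great.
    return min(quality_years), max(quality_years) + 1


# Hypothetical career, invented for this example only.
career = [
    Season(2001, 0.95, 600),
    Season(2002, 0.78, 650),
    Season(2003, 0.70, 300),   # good ratio, but too few batters faced
    Season(2004, 0.75, 800),
    Season(2005, 1.10, 700),
]
print(peak_window(career))  # (2002, 2005)
```

Note that the second value returned here is the “last season” (last quality year plus one), not the last quality year itself.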
I get this chart:
minYear minAge maxYear maxAge playerID
1990 23 1997 30 appieke01
1995 30 2003 38 brownke01
1986 24 2005 43 clemero02
1988 25 1999 36 coneda01
1991 25 2002 36 glavito02
1993 30 2004 41 johnsra05
1992 26 2002 36 maddugr01
1994 23 2005 34 martipe02
1992 24 2003 35 mussimi01
1985 21 1994 30 saberbr01
1992 26 2004 38 schilcu01
There were 141 pitching seasons in there, ranging from Pedro’s 2000 season (34% of league average) to David Cone’s 2000 season (135%). Basically what I have are 141 seasons where we had reason to believe that our pitchers were at their peak. (I did the “plus 1” on the last great year because entering that year, we still thought he was great.)
The average of these great seasons from these great starters was 73% of the league average. So, in a league where the average ERA is 4.30, these eleven guys put up, on average, for those 141 seasons, an ERA of around 3.14.
What was the RANGE of their posted ERA? With a mean of 73% of league average, one standard deviation was 16% (roughly 0.69 runs). The 10th percentile was 54% and the 90th percentile was 96%. The 50th percentile was the same as the mean. So, 80% of the time, even if you KNOW you’ve got yourself a 73% pitcher, he’s going to post an ERA that is 54% to 96% of league average for whatever reason (good luck, bad luck, temporary injuries, temporary loss of talent, etc). That is a range of 42% (96 minus 54) of the league average ERA or roughly 1.80 runs per 9 IP! That is, 80% of the time, a pitcher will post an ERA within roughly 1 run of his talent level.
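The arithmetic behind those run values is easy to check. Here is a small Python sketch that converts the percentages into runs for a 4.30 league. The last part is my own addition and an assumption, not something from the data: it asks what the 10th and 90th percentiles would be if the seasons were normally distributed with that mean and standard deviation.

```python
from statistics import NormalDist

LEAGUE_ERA = 4.30  # league-average ERA used in the post


def to_runs(pct_of_league: float) -> float:
    # Convert a percentage of league average into runs per 9 IP.
    return pct_of_league / 100 * LEAGUE_ERA


mean_pct, sd_pct = 73, 16
p10_pct, p90_pct = 54, 96

print(round(to_runs(mean_pct), 2))           # 3.14
print(round(to_runs(sd_pct), 2))             # 0.69
print(round(to_runs(p90_pct - p10_pct), 2))  # 1.81

# Assumption for comparison only: a normal distribution with the same mean and SD.
approx = NormalDist(mu=mean_pct, sigma=sd_pct)
print(round(approx.inv_cdf(0.10), 1), round(approx.inv_cdf(0.90), 1))  # 52.5 93.5
```

Under that normal assumption the 10th to 90th percentile band would be about 52.5% to 93.5%, which is close to the observed 54% to 96%.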
This is for pitchers who we “knew” were great, and we only started the “great” clock once they actually performed at a great level.
So, what should we expect of Tim Lincecum and his forecasted 73% RA according to Chone (a match to our group of 11 stars)? What is the chance he will post an ERA of 2.50 or less (58% of league average)? That happened 20 times, or 14% of the time. And an ERA of 92% of the league average or worse also happened 20 times with our star pitchers. That’s an ERA of 3.96. And if Lincecum is actually a bit better than Chone thinks (Marcel says 67%), well, that 3.96 is going to come down a bit.
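Those tail frequencies are just counts over the 141 seasons. Here is a minimal Python sketch of the calculation. The helper names and the `sample` list are hypothetical (the list is made up, not the actual 141 seasons); only the 20, 141, 2.50, 92% and 4.30 figures come from the post.

```python
LEAGUE_ERA = 4.30  # league-average ERA used in the post


def share_at_or_below(ratios, threshold):
    # Fraction of seasons at or better than (at or below) the threshold.
    return sum(r <= threshold for r in ratios) / len(ratios)


def share_at_or_above(ratios, threshold):
    # Fraction of seasons at or worse than (at or above) the threshold.
    return sum(r >= threshold for r in ratios) / len(ratios)


# Figures from the post.
print(round(2.50 / LEAGUE_ERA, 2))   # 0.58, i.e. 58% of league average
print(round(20 / 141, 3))            # 0.142, i.e. about 14% of seasons
print(round(0.92 * LEAGUE_ERA, 2))   # 3.96

# Made-up ratios, only to show how the helpers are used.
sample = [0.50, 0.58, 0.65, 0.73, 0.80, 0.92, 1.05]
print(share_at_or_below(sample, 0.58))
print(share_at_or_above(sample, 0.92))
```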
And, what did the Fans say in a recent poll on my site? 3.88.
Give yourselves a fantastic pat on the back, boys. You figured out in the blink of an eye what took me thirty minutes to code.


This and much of the debate around Strasburg over the last day seem to presuppose that the errors are both symmetric and have constant variance. I’m not sure either of these is valid.
With regard to symmetry, I am not sure there is a sound argument here, but it seems possible that younger pitchers especially will have a non-symmetric error distribution. If Strasburg is projected to have a 3.50 ERA, I am not sure it makes sense that a 2.50 and a 4.50 have equal probability. Nor a 1.50 and a 5.50. Of course, the quantiles you give suggest a roughly symmetric distribution. However, those pitchers are cherry-picked as the very best, and only their best seasons are included. My guess is that once you include would-be greats who didn’t make it, the distribution will be very skewed.
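One way to put a number on the symmetry question is the sample skewness of the posted ERAs. This is only a sketch under my own assumptions: the `skewness` helper is a plain moment-based estimate, and both lists are invented numbers, not real pitcher seasons.

```python
from statistics import fmean, pstdev


def skewness(values):
    # Moment-based skewness: mean of cubed standardized deviations.
    # Zero means symmetric; positive means a longer tail toward high ERAs.
    m = fmean(values)
    sd = pstdev(values)
    return fmean([((v - m) / sd) ** 3 for v in values])


# Invented ERAs around a 3.50 projection.
symmetric = [2.5, 3.0, 3.5, 4.0, 4.5]
right_tailed = [2.5, 3.0, 3.5, 4.0, 6.5]   # one blow-up season

print(skewness(symmetric))
print(skewness(right_tailed))
```

A value near zero would support the symmetric view, and a clearly positive value would suggest bad seasons stray further from the projection than good ones.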
More importantly, though, can we assume constant variance? At the very least, can we get a plot of error vs time? We need to be careful about including those “plus 1” years, as the variance will be different (by definition, these years are not great, and will necessarily increase the estimated variance of the great years).
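To make the request concrete, here is a rough Python sketch of the table I have in mind: the spread of the ratio at each age, with the option of leaving out the “plus 1” years. Everything here is hypothetical, including the tuple layout, the function name, and the sample numbers.

```python
from collections import defaultdict
from statistics import pstdev


def spread_by_age(seasons, include_plus_one=True):
    """seasons: iterable of (age, ra_ratio, is_plus_one) tuples.

    Returns {age: standard deviation of ra_ratio} for ages with 2+ seasons.
    """
    by_age = defaultdict(list)
    for age, ratio, is_plus_one in seasons:
        if is_plus_one and not include_plus_one:
            continue
        by_age[age].append(ratio)
    return {age: pstdev(vals) for age, vals in sorted(by_age.items()) if len(vals) >= 2}


# Made-up seasons: (age, ratio to league average, is this a "plus 1" year?)
seasons = [
    (25, 0.70, False), (25, 0.76, False),
    (35, 0.60, False), (35, 0.90, False), (35, 1.20, True),
]

print(spread_by_age(seasons, include_plus_one=False))
print(spread_by_age(seasons, include_plus_one=True))
```

If the spread is roughly flat across ages once the “plus 1” years are dropped, constant variance is a reasonable assumption; if it grows with age, it is not.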