Monday, November 28, 2011
Forecasting 300 wins
I have several posts on Bill James’ site suggesting that giving Cliff Lee a 20% chance at 300 wins is quite optimistic.
They’re scattered throughout the comments of this article (which is free to read, as are all the comments, including Bill’s). It’s a decent read.
For posterity, all my posts are being reproduced below.
tangotiger
CC and Halladay are just a bit over 3 years apart, so, as an estimate, CC has roughly 1.8 more years left in MLB than Halladay. Note: use real ages, not ages rounded to an integer.
In their last 3 years, they had virtually the same number of average (weighted) wins, close to 19.5.
So, yeah, going forward CC should get an estimated 19.5 x 1.8 = 35 more wins than Doc. If he’s only 12 wins behind, his chance should definitely be well above Doc’s.
2:17 PM Nov 16th
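The arithmetic in the comment above can be sketched in a few lines of Python. The 0.6 conversion from an age gap to a gap in remaining career is an assumption inferred from the numbers in this thread (3 years of age becomes 1.8 years of career here, and a 0.8-year gap becomes about 0.5 later on); it is not quoted from Bill's system, and the function names are illustrative.

```python
# ASSUMPTION: each year of age is worth about 0.6 years of remaining career,
# inferred from the thread's numbers (3.0 -> 1.8 years, 0.8 -> ~0.5 years).
YEARS_LEFT_PER_YEAR_OF_AGE = 0.6


def remaining_years_gap(age_gap_years: float) -> float:
    """Extra years of career the younger pitcher is expected to have."""
    return age_gap_years * YEARS_LEFT_PER_YEAR_OF_AGE


def extra_wins(age_gap_years: float, wins_per_year: float) -> float:
    """Extra future wins credited to the younger of two equally good pitchers."""
    return remaining_years_gap(age_gap_years) * wins_per_year


if __name__ == "__main__":
    gap = remaining_years_gap(3.0)   # CC vs. Halladay, about 3 years apart
    wins = extra_wins(3.0, 19.5)     # both at about 19.5 weighted wins a year
    print(f"{gap:.1f} extra years, {wins:.0f} extra wins")
```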
***
tangotiger
What about Cliff Lee? He’s got about 5 years left at 15 wins a year, or 75 wins. He’s 181 wins short. So, 75/181 - 0.5 = less than zero.
I mean, we can look for comps like Lee, and I doubt we’re going to find even one in four who reached 300.
Therefore, I have to conclude two things:
1. The “pace” formula has changed from wins to something that also includes ERA (i.e., it derives wins from ERA, in addition to using actual wins). So, since Halladay is a better pitcher than CC, we expect Halladay to win more; Lee is better than his wins say, etc.
2. There’s still an error in the formula. It still seems unbelievable that Verlander is 4.5 years younger than Lee, is only 12 wins behind him, and has barely a better chance at 300 wins than Lee.
9:06 AM Nov 17th
***
tangotiger
Cliff Lee, through his age 32 season, has 119 wins. Among all pitchers through their age 32 season, he ranks #207 in wins.
Ideally, we’d look for more comps, like guys who were great in their last 3-4 seasons.
Scanning the list of pitchers real quick, you get names like non-300 winners Luis Tiant, Orel Hershiser, Curt Schilling, Kevin Brown, Ron Guidry, Tommy John, David Cone, Dennis Martinez... and Randy Johnson. My guess is that if you spent more time on it, and were more methodical, you’d find that the chance that a great pitcher, after pitching great for the previous 3-4 seasons, gets at least 181 wins after his age 32 season is probably closer to 10% (at best).
Just a guess.
9:16 AM Nov 17th
***
tangotiger
At age 29-32, Cliff Lee had an ERA+ (i.e., ERA adjusted relative to league and park) of 146, which means he gave up earned runs at a rate of 1/1.46 = 68.5% of the league average. He also had 900 IP (and 125 starts).
Of pitchers born 1918 (Feller) through 1971 (Pedro), I looked at the best pitchers via ERA+, with at least 700 IP and 90 starts.
There were 14 pitchers who met the above thresholds, and had an ERA+ of 132 or better. The average ERA+ of those 14 pitchers was 145, and IP was 920.
Two of those pitchers had at least 181 wins from their age 33 season onward: Randy Johnson and Warren Spahn. That’s 14%.
These are the 12 pitchers who did not get 181 wins, but were just as good as Cliff Lee in their age 29 to age 32 seasons:
Greg Maddux
Pedro Martinez
Tom Glavine
Bob Gibson
John Smoltz
Jim Palmer
Kevin Brown
David Cone
Roger Clemens
Curt Schilling
Tom Seaver
Mike Mussina
If I look at the next group of 14 best pitchers, none of them got 181+ wins after age 32. If we look at the 14 best pitchers after that, one (Niekro) got 181+ wins.
The only other pitcher who managed 181+ wins after age 32 was Jamie Moyer.
Therefore, if I were to set the odds for Cliff Lee, it would probably be close to 10%.
9:40 AM Nov 17th
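A small sketch of the two conversions used above: ERA+ to a fraction of league-average earned runs (ERA+ is 100 times league ERA divided by the pitcher's park-adjusted ERA, so the fraction is 100/ERA+), and a comp group's success count to a rate. The function names are illustrative, not from the thread.

```python
def fraction_of_league_runs(era_plus: float) -> float:
    """ERA+ = 100 * league ERA / pitcher ERA, so pitcher ERA / league ERA = 100 / ERA+."""
    return 100.0 / era_plus


def comp_success_rate(successes: int, comps: int) -> float:
    """Share of comparable pitchers who reached the target (181+ wins after age 32)."""
    return successes / comps


if __name__ == "__main__":
    print(f"Lee, ages 29-32: {fraction_of_league_runs(146):.1%} of league-average runs")
    print(f"Top 14 comps: {comp_success_rate(2, 14):.0%} reached 181+")  # Johnson, Spahn
    print(f"Top 42 comps: {comp_success_rate(3, 42):.0%} reached 181+")  # plus Niekro
```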
***
tangotiger
I wrote a simulator to try to answer this question.
First, I estimated the chance that a high-quality pitcher (at age 30-33) is still pitching at each age, starting at age 34. I started at 100% for age 34, and then dropped it by almost 8 percentage points each year.
This gave me an average career of almost 7 seasons (with a range from 1 to 13).
Then, given that he’s pitching, how many wins does he get? I used a uniform distribution with a mean of 13 wins and a range from 4 to 22 wins, and randomly drew each season’s wins from it.
It’s fairly crude, but it works pretty nicely.
Anyway, the average number of rest-of-career wins using this method is 90, which matches the empirical average for our gang of 13 above.
The chance of getting at least 181 wins is 3.5% according to this construction of the simulator.
Remember that of these top 14 pitchers, two actually did end up with at least 181 wins (14%), and three out of the top 42 did (7%).
So, it would seem that either the empirical sample is too small for us to draw a conclusion, or my simulator is too crude to be useful.
***
Regardless, even if I adjust the simulator to make an RJ-type career more likely, I doubt I’d be able to move that 3.5% any higher than 5-10%.
Considering that the original favorite toy had Lee at “under” 0 percent, I have to seriously question the results that Dewan is posting here. They’re just not supportable.
1:37 PM Nov 18th
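Here is a rough Python re-creation of the simulator described above, under assumptions the post does not pin down: "still pitching" is read as an unconditional probability that starts at 100% at age 34 and drops 8 points a year (so careers run 1 to 13 seasons), and each active season's wins are a whole number drawn uniformly from 4 to 22. It is a sketch for experimenting, not the simulator that produced the 3.5% figure.

```python
import random

# ASSUMPTIONS (not specified exactly in the post):
#   * chance of still pitching in career season k (k = 0 is age 34) is 1 - 0.08k;
#   * each active season's wins are a uniform whole number from 4 to 22 (mean 13).
DROP_PER_YEAR = 0.08
MIN_WINS, MAX_WINS = 4, 22


def simulate_career(rng: random.Random) -> tuple[int, int]:
    """Return (seasons pitched, wins) from age 34 to the end of one simulated career."""
    # One uniform draw compared against the falling survival curve gives
    # P(career lasts at least k + 1 seasons) = 1 - 0.08k, i.e. 1 to 13 seasons.
    u = rng.random()
    seasons = 0
    while u < 1.0 - DROP_PER_YEAR * seasons:
        seasons += 1
    wins = sum(rng.randint(MIN_WINS, MAX_WINS) for _ in range(seasons))
    return seasons, wins


def run(trials: int = 100_000, target: int = 181, seed: int = 2011) -> dict[str, float]:
    """Average seasons, average wins, and share of careers with at least `target` wins."""
    rng = random.Random(seed)
    total_seasons = total_wins = hits = 0
    for _ in range(trials):
        seasons, wins = simulate_career(rng)
        total_seasons += seasons
        total_wins += wins
        hits += wins >= target
    return {
        "mean_seasons": total_seasons / trials,
        "mean_wins": total_wins / trials,
        "p_target": hits / trials,
    }


if __name__ == "__main__":
    for name, value in run().items():
        print(f"{name}: {value:.3f}")
```

With these choices the averages land near 6.8 seasons and 88 wins, in line with the post's "almost 7" and 90. The 181+ tail comes out in the low single digits, somewhat below 3.5%, which shows how sensitive the tail is to details such as integer versus continuous wins.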
***
tangotiger
I made some changes to the sim to try to string more good seasons together. This keeps the overall average at 90 wins for rest-of-career, but it widens the extremes. The best I could do is get it to 6% chance of at least 181 wins.
However, in so doing, I also increase the chance of 125+ rest-of-career wins, and it ends up much higher than the empirical.
My best guess is that the chance that Cliff Lee gets at least 181 more wins (and finishes with at least 300) is under 5%. His chance at 250 wins is 20%-25%: 20% is what I get from the empirical data, and 25% is what I get from the simulator.
The favorite toy sets the odds for Cliff Lee at 7% for 250 wins.
2:52 PM Nov 18th
***
tangotiger
One last thing. As I said, for a pitcher of Cliff Lee’s caliber and age, the empirical data is showing an average of 90 wins for rest-of-career.
Bill’s favorite toy is suggesting 75 wins for his rest of career. That is too low.
If we use the 90 wins as the better estimate, and we put that in the favorite toy, we get this as Cliff Lee’s chance at 250 wins:
90 / (250-119) - 0.5 = 19%
Given that the empirical is 20%, I call that a bullseye.
So, really, Bill’s method requires a bit of tweaking for “years left”. It should have been 6 years at 15 wins left for Cliff Lee, rather than 5.
2:55 PM Nov 18th
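Both Cliff Lee numbers in this thread come from the same Favorite Toy form: chance = projected remaining wins / wins still needed - 0.5. A minimal Python sketch of just that step; the projection (years left times wins per year) is supplied by hand, and the function name is illustrative.

```python
def favorite_toy_chance(years_left: float, wins_per_year: float,
                        career_wins: int, goal: int) -> float:
    """Raw Favorite Toy-style chance; a negative value means essentially no chance."""
    projected = years_left * wins_per_year
    needed = goal - career_wins
    return projected / needed - 0.5


if __name__ == "__main__":
    # Inputs as described in the thread: 5 years at 15 wins, 119 career wins, goal 300.
    print(f"{favorite_toy_chance(5, 15, 119, 300):+.1%}")
    # The suggested tweak: 6 years at 15 wins (90 wins), goal 250.
    print(f"{favorite_toy_chance(6, 15, 119, 250):+.1%}")
```

The first call comes out negative (the "less than zero" above); the second is about 19%, matching the 90 / (250-119) - 0.5 calculation.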
***
tangotiger
“As a result, age is typically a much smaller factor in projecting future pitching performance. “
If that is an open challenge, please specify exactly what you are saying. Because I have no doubt that I will be able to disprove what I think you are saying. So, specify exactly the hypothesis please.
12:25 AM Nov 22nd
***
tangotiger
The only way it makes sense that someone who is a bit more than three years older (please, don’t round to an integer first and then subtract… that’s just bad math), and with just a 12 win lead can have the same chance as the other guy to reach 300 wins is if he’s much better than the other guy.
Let’s use the Favorite Toy and work our way backwards. To have a 50% chance at winning 300 if you have 188 wins means that you are forecasted for an average of 112 wins.
To have a 50% chance at winning 300 if you have 176 wins means that you are forecasted for an average of 124 wins.
Let’s give Halladay 7 remaining years, and CC 9 remaining years. That means Doc is going to average 112/7 = 16 wins a year and CC will average 124/9 = 13.8.
I suppose it’s possible that Doc is really much better than CC (notwithstanding NYY v PHI offense support). But, it sure seems hard to believe that Doc can average 16 wins a year for 7 years.
I showed that the best pitchers in history averaged 13 wins a year for 7 years. You can push it to 14, I doubt you can go to 15, and 16 is really really pushing it unreasonably.
Anyway, without the benefit of seeing the system, as well as the testing of the system, it’s hard to say more.
8:49 AM Nov 22nd
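Working backwards through the same formula: chance = projected / needed - 0.5 equals 50% exactly when the projection equals the wins still needed, and more generally projected = needed × (chance + 0.5). A sketch of that inversion and the per-year rates, using the 7 and 9 remaining years assumed in the comment above (the function name is illustrative):

```python
def projected_wins_for_chance(career_wins: int, chance: float, goal: int = 300) -> float:
    """Invert chance = projected / needed - 0.5 to get the implied projection."""
    needed = goal - career_wins
    return needed * (chance + 0.5)


if __name__ == "__main__":
    # (pitcher, career wins, assumed remaining years) as in the comment above
    for name, wins, years in [("Halladay", 188, 7), ("Sabathia", 176, 9)]:
        projected = projected_wins_for_chance(wins, 0.5)
        print(f"{name}: {projected:.0f} more wins, {projected / years:.1f} a year")
```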
***
tangotiger
b-r.com uses the Bill James “seasonal age”.
My point is that the actual difference in age between CC and Doc is a bit over three years.
Bad math says to first convert each person’s age into an integer (30 and 34 in this case), and then subtract the two integers.
Correct math says to use the actual date of birth, and subtract the two ages, and report the actual difference, which is 3.2 years apart in this case.
Given that the entirety of the discussion rests on the gap in their ages in return for the 12-win headstart for Doc, then it makes a big difference if we treat the difference in their age as 4 or as 3.2.
2:34 PM Nov 22nd
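The difference between the two approaches is easy to see in code. This sketch uses Python's `datetime`; the birth dates (Halladay, May 14, 1977; Sabathia, July 21, 1980) are supplied here rather than taken from the thread, and the June 30 cutoff for "seasonal age" is an assumption about the convention Baseball-Reference follows.

```python
from datetime import date

DAYS_PER_YEAR = 365.2425  # mean Gregorian year


def exact_age_gap(older_birth: date, younger_birth: date) -> float:
    """Good math: subtract the actual birth dates."""
    return (younger_birth - older_birth).days / DAYS_PER_YEAR


def seasonal_age(birth: date, season: int) -> int:
    """Integer age on June 30 of the season (assumed seasonal-age convention)."""
    cutoff = date(season, 6, 30)
    before_birthday = (cutoff.month, cutoff.day) < (birth.month, birth.day)
    return season - birth.year - before_birthday


if __name__ == "__main__":
    halladay, sabathia = date(1977, 5, 14), date(1980, 7, 21)
    bad = seasonal_age(halladay, 2011) - seasonal_age(sabathia, 2011)  # 34 - 30
    good = exact_age_gap(halladay, sabathia)
    print(f"rounded first: {bad} years; actual: {good:.1f} years")
```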
***
Has the 300-Game Winner Become Extinct?
By John Dewan
November 16, 2011
It seems like every time a pitcher reaches the magical mark of 300 wins, many fans and baseball people wonder aloud: “Is this the last time we’ll see someone reach 300 wins?” That was a popular sentiment after Greg Maddux reached the mark in 2004, then Tom Glavine (2007), and most recently Randy Johnson (2009).
At the end of the 2011 season the closest active pitcher to 300 wins was Tim Wakefield, Boston’s 45-year-old knuckleballer. Wakefield notched career win number 200 on September 13. Of course, the seemingly immortal Jamie Moyer has 267 career wins and is attempting to come back from Tommy John surgery, but Moyer turns 49 in four days (November 18). It seems unlikely that either of these two veterans will reach 300 wins. Is the 300-game winner an extinct breed?
Not at all.
Each year, in the Bill James Handbook, Bill lists the players he thinks are the most likely to reach 300 wins based on a formula he devised to measure a pitcher’s chances for this sacred milestone. The key to the formula is the pitcher’s momentum (wins in recent seasons) matched up with his win total thus far in his career.
Here are the top-five 300 win candidates heading into 2012:
Player | 2011 Age | Career Wins | Chance at 300 Wins
Roy Halladay | 34 | 188 | 49%
CC Sabathia | 30 | 176 | 48%
Justin Verlander | 28 | 107 | 31%
Cliff Lee | 32 | 119 | 24%
Dan Haren | 30 | 107 | 19%
Roy Halladay and CC Sabathia each have around a 50-50 shot at winning 300 games. Justin Verlander only had a 10% chance at 300 wins entering the 2011 season, but after a 24-win season, his chances skyrocket to 31%. The chance that one of these five gets 300 wins in his career is about 90%.
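The article does not say how it arrives at "about 90%". One simple reading, assuming the five pitchers' outcomes are independent (an assumption; the text does not state it), is one minus the chance that all five fall short:

```python
from math import prod

# Chances from the table above (Halladay, Sabathia, Verlander, Lee, Haren).
CHANCES = [0.49, 0.48, 0.31, 0.24, 0.19]


def chance_at_least_one(chances: list[float]) -> float:
    """P(at least one succeeds), assuming independent outcomes."""
    return 1.0 - prod(1.0 - p for p in chances)


if __name__ == "__main__":
    print(f"{chance_at_least_one(CHANCES):.0%}")
```

That gives roughly 89%, consistent with the article's figure.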
For the complete list of 300 win candidates, check out the Bill James Handbook 2012 in stores and available at ACTASports.com now.
COMMENTS (40 Comments, most recent shown first)
tangotiger
Interestingly, if you use my suggestion, and look at ALL pitchers to figure out the maximum boundary, you’ll get much higher totals (thanks to Niekro, Moyer, et al). Now the max will be more like 220 wins at age 34.
Anyway, if you do use the min/max uniform technique, then being 181 wins away means you have a 20% chance of getting to 300 wins.
That matches what Bill and John have basically been saying.
Of course, if you make the min/max as 22/220, then the average is 121 wins. And we know the average can’t really be 121 wins (in between Maddux and Glavine, and we can presume they are not pitchers who have an “average” career path after age 33). So, using that technique means you can’t use a uniform distribution.
The reality is that if the average is closer to 90-100 wins than to 121 wins, then even if you keep the top end at 220 wins, you’ll have to rein in enough players toward the center. You have no choice but to bring it down below 20%.
So, 10% seems to be the plausible maximum expectation if you want to build a model. And closer to 5% for a more reasonable model.
3 minutes ago
tangotiger
If we treat the chance of getting 20 to 200 wins as equal (a pitcher is just as likely to get 20 wins from age 34-onward as he is to get 35 wins or 78 wins or 126 wins, etc), then figuring out the chance that Cliff Lee can get 181 wins becomes straightforward.
You set your boundary wins (25 and 200 in this case), and see that 181 wins or higher occurs twenty times (200-181+1) out of 176 (200-25+1), or 11%.
So perhaps an easy way to do this, to at least have some sort of sniff test, is to set the “maximum boundary” wins for each age. We see in this case, RJ and Spahn were both close to 200, so that’s an easy one. Do this for every age, and we can come up with a plausible maximum number.
We can set the minimum boundary range to 10% of that figure.
Finally, if we can presume a uniform distribution, then we can come up with an initial sniff-test number for any (really good) pitcher at any age.
2 hours ago
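The min/max "sniff test" described in these comments reduces to counting whole numbers. A sketch with the boundaries passed in by hand (the function names are illustrative):

```python
def uniform_tail(min_wins: int, max_wins: int, target: int) -> float:
    """P(rest-of-career wins >= target) if every whole number from min to max is equally likely."""
    if target > max_wins:
        return 0.0
    if target <= min_wins:
        return 1.0
    return (max_wins - target + 1) / (max_wins - min_wins + 1)


def uniform_mean(min_wins: int, max_wins: int) -> float:
    """Average rest-of-career wins implied by the same uniform assumption."""
    return (min_wins + max_wins) / 2


if __name__ == "__main__":
    # Boundaries of 25 and 200 wins, from the top-20 comps.
    print(f"{uniform_tail(25, 200, 181):.0%}")
    # Max of 220 from all pitchers, min at 10% of that.
    print(f"{uniform_tail(22, 220, 181):.0%}, mean {uniform_mean(22, 220):.0f}")
```

The second case reproduces both the 20% figure and the implausibly high 121-win average that the later comment objects to.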
tangotiger
1. I calculated each pitcher’s age as season minus birth year.
2. I took the top 20 pitchers in RA per 9 innings, relative to league, at age 30-33, for pitchers born between 1921 (Spahn) and 1971 (Pedro).
3. I then calculated number of wins from age 34 to end of career. These are the results:
Wins playerID
199 johnsra05
197 spahnwa01
172 clemero02
134 maddugr01
118 glavito02
117 schilcu01
108 seaveto01
106 mussimi01
104 gibsobo01
85 leiteal01
78 fordwh01
72 brownke01
58 coneda01
53 palmeji01
45 stewada01
37 martipe02
35 keyji01
29 scottmi03
25 rogerst01
22 tudorjo01
Given the relatively small number of data points, that actually follows a straight line very closely. Indeed, the correlation of those points with a simple straight line is r=0.97.
So, yes, it does not follow a normal distribution, but rather a uniform distribution.
Giving Cliff Lee a 20% chance of getting 181 wins seems on the very optimistic side.
If you try to simulate Cliff Lee’s rest of career, you’ll be hard-pressed to get him at least 181 wins 10% of the time. My sims give him 5%.
It is possible that a pitcher can remain unusually healthy in an atypical fashion that allows him to get into a boom-or-bust mode (RJ, Spahn, and Clemens being a huge jump from the rest of the gang). That, however, would have to be demonstrated first.
6 hours ago
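The straight-line check above can be reproduced from the 20 listed win totals by correlating each total with its rank. A sketch using a plain Pearson correlation; the sign comes out negative only because the list runs from most to fewest wins.

```python
from statistics import mean

# Wins from age 34 onward for the 20 pitchers listed above, most to fewest.
WINS = [199, 197, 172, 134, 118, 117, 108, 106, 104, 85,
        78, 72, 58, 53, 45, 37, 35, 29, 25, 22]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    ranks = list(range(1, len(WINS) + 1))
    r = pearson(ranks, WINS)
    share = sum(w >= 181 for w in WINS) / len(WINS)
    print(f"|r| = {abs(r):.2f}, mean = {mean(WINS):.1f}, share with 181+ = {share:.0%}")
```

The mean of about 90 wins is the same rest-of-career average used elsewhere in the thread.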
bjames
The method does not anticipate a normal distribution curve for “career remaining wins”, and I’ll assure you that you would not FIND a normal distribution curve for career remaining wins. A normal distribution curve would be an inappropriate tool to use for this problem.
6:10 PM Nov 27th
bjames
The Favorite Toy would not work for making estimates of this nature, and the system that IS used for these estimates is as different from that as could be. I have Cliff Lee at 20%. . .you want him at 24%? Is that right? There is obviously no way to make distinctions that are that fine, given that we’re dealing with the intersection of two data sets--1) Pitchers who win 300 games, and 2) Pitchers who are similar to Cliff Lee.
6:07 PM Nov 27th
glkanter
The more I look at this table, the more troublesome it becomes.
Lee and Verlander are 4 years apart and 12 wins apart, same as CC and Halladay. But Lee’s and Verlander’s chances at 300 wins are not nearly as close to each other as CC’s and Halladay’s are.
I can see only 3 components to this calculation:
Lifetime wins to date
Remaining years
Wins per future season
CC is:
30 years old
Signed to the Yankees for the next 6 years
Has only a spring training groin injury DL trip in his whole career
Wins 19 or 20 games each season
He is not behind Halladay in any of these categories.
4:14 PM Nov 25th
tangotiger
Right, it would be a median of 124 wins. But, at that point, we are expecting a fairly normal distribution, so I said average, which is the same thing. But, yes, median is the correct term.
12:14 AM Nov 24th
glkanter
To go from 176 wins to 300 or more wins requires winning at least 124 more games.
10:58 AM Nov 23rd
jedlovec3
I agree with Tangotiger about being careful when rounding ages.
To nitpick one thing: “To have a 50% chance at winning 300 if you have 176 wins means that you are forecasted for an average of 124 wins. “
Technically, a 50 percent chance at winning 300 games means you are forecasted for a MEDIAN of 124 wins, no? The difference is usually irrelevant, since most distributions are close enough to the normal distribution and the median is close to the average. But, sometimes it does make a difference.
As far as the validity of the system and age adjustments, I’ll leave that to Bill.
9:05 AM Nov 23rd
glkanter
I think you covered it pretty well.
I see the percentages in that table as so unreasonable that 6 wins is immaterial.
I also see the need for precision when digging deeper.
4:20 PM Nov 22nd
tangotiger
The favorite toy would treat the 0.8 difference in birth years as close to 0.5 service years remaining.
If you have an estimate of 15 wins, then that’s 7.5 wins that affects the total. If you are short some 120 wins, that affects the probability by 6 percentage points.
So, that’s part of an explanation as to the actual magnitude. If your point is that it’s six, schmix, then fine.
My point is the difference between bad math and good math. If your point is that it’s math, schmath, then fine.
3:45 PM Nov 22nd
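The sensitivity estimate in the comment above follows from the Favorite Toy form, chance = projected / needed - 0.5: shifting the projection by some number of wins moves the chance by that amount divided by the wins still needed. A quick check in Python (the function name is illustrative):

```python
def chance_shift(years_shift: float, wins_per_year: float, wins_needed: float) -> float:
    """Change in the raw chance when the projection moves by years_shift * wins_per_year."""
    return years_shift * wins_per_year / wins_needed


if __name__ == "__main__":
    # About 0.5 service years at 15 wins a year, with roughly 120 wins still needed.
    print(f"{chance_shift(0.5, 15, 120):.2%}")
```

That is roughly the six percentage points mentioned above.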