Thursday, March 11, 2010
Forecasting Home Runs in 2009
From 2006 to 2008, the MLB home run leaders were Ryan Howard (58), Alex Rodriguez (54), and Ryan Howard (48). It seems safe to say that the league leader in home runs in 2009 should have been somewhere close to 50. But who could we have guessed in March of 2009?
There were 31 players who hit at least 35 home runs at least once during those three years:
n35 HR3 HR162 Player
1 87 30.9 Bay, Jason
1 101 36.1 Beltran, Carlos
1 108 38.2 Berkman, Lance
1 71 43.0 Braun, Ryan
1 97 33.3 Cabrera, Miguel
2 100 36.6 Delgado, Carlos
3 120 42.7 Dunn, Adam
1 106 40.8 Dye, Jermaine
1 112 38.8 Fielder, Prince
1 83 40.2 Giambi, Jason
1 85 34.5 Glaus, Troy
1 90 30.7 Gonzalez, Adrian
1 71 34.2 Hafner, Travis
1 64 28.7 Hall, Bill
1 95 33.2 Holliday, Matt
3 153 52.2 Howard, Ryan
1 70 31.3 Jones, Andruw
1 88 34.4 Konerko, Paul
1 97 36.3 Lee, Carlos
1 51 37.3 Ludwick, Ryan
2 112 42.5 Ortiz, David
1 78 43.5 Pena, Carlos
2 118 42.3 Pujols, Albert
1 50 34.2 Quentin, Carlos
1 91 34.2 Ramirez, Aramis
2 92 36.2 Ramirez, Manny
3 124 43.9 Rodriguez, Alex
1 108 40.9 Soriano, Alfonso
1 81 29.5 Swisher, Nick
1 73 34.7 Thomas, Frank
2 111 44.5 Thome, Jim
To read this chart: Ryan Howard hit at least 35 HR three times (n35), for a total of 153 home runs from 2006-2008 (HR3). His average HR rate per 700 plate appearances, the equivalent of a full 162-game season, was 52.2 (HR162).
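As a rough illustration of how the HR162 column relates to the others, here is a minimal Python sketch. The function names are hypothetical, and the plate appearance figure is backed out from the table rather than taken from any official source.

```python
def hr_per_700(hr: int, pa: float) -> float:
    """Home runs scaled to 700 plate appearances (roughly one full season)."""
    return hr / pa * 700


def implied_pa(hr3: int, hr162: float) -> float:
    """Back out the plate appearances implied by an HR total and an HR162 rate."""
    return hr3 / hr162 * 700


# Ryan Howard's row from the chart: 153 HR over three years, 52.2 HR per 700 PA.
howard_pa = implied_pa(153, 52.2)
print(f"Implied plate appearances, 2006-2008: {howard_pa:.0f}")
print(f"Rate recomputed from that figure: {hr_per_700(153, howard_pa):.1f}")
```

The implied total works out to a little over 2,000 plate appearances, which is about what three nearly full seasons would give.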
It’s safe to say that if we had to guess the home run leader for 2009, it would be one of these 31 players. Just for the sake of illustration, let’s assign some semi-intelligent odds to each player winning the MLB HR title in 2009, as follows:
n35 HR3 HR162 Odds Player
3 153 52.2 10% Howard, Ryan
1 112 38.8 6% Fielder, Prince
3 120 42.7 6% Dunn, Adam
3 124 43.9 6% Rodriguez, Alex
1 71 43.0 6% Braun, Ryan
2 118 42.3 6% Pujols, Albert
1 78 43.5 4% Pena, Carlos
2 111 44.5 4% Thome, Jim
1 106 40.8 4% Dye, Jermaine
1 97 33.3 4% Cabrera, Miguel
2 100 36.6 4% Delgado, Carlos
1 108 38.2 4% Berkman, Lance
1 101 36.1 4% Beltran, Carlos
2 92 36.2 2% Ramirez, Manny
1 108 40.9 2% Soriano, Alfonso
1 90 30.7 2% Gonzalez, Adrian
1 51 37.3 2% Ludwick, Ryan
2 112 42.5 2% Ortiz, David
1 83 40.2 1% Giambi, Jason
1 95 33.2 1% Holliday, Matt
1 91 34.2 1% Ramirez, Aramis
1 85 34.5 1% Glaus, Troy
1 87 30.9 1% Bay, Jason
1 97 36.3 1% Lee, Carlos
1 88 34.4 1% Konerko, Paul
1 50 34.2 1% Quentin, Carlos
1 81 29.5 1% Swisher, Nick
1 64 28.7 1% Hall, Bill
1 73 34.7 1% Thomas, Frank
1 71 34.2 1% Hafner, Travis
1 70 31.3 1% Jones, Andruw
9% Someone else
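A quick way to sanity-check odds like these is to add them up. The sketch below groups the players from the table above by their assigned odds; the grouping and variable names are just one convenient layout, not anything from the original system.

```python
# Odds (in percent) mapped to the players given those odds in the table.
odds = {
    10: ["Howard"],
    6: ["Fielder", "Dunn", "Rodriguez", "Braun", "Pujols"],
    4: ["Pena", "Thome", "Dye", "Cabrera", "Delgado", "Berkman", "Beltran"],
    2: ["M. Ramirez", "Soriano", "Gonzalez", "Ludwick", "Ortiz"],
    1: ["Giambi", "Holliday", "A. Ramirez", "Glaus", "Bay", "Lee", "Konerko",
        "Quentin", "Swisher", "Hall", "Thomas", "Hafner", "Jones"],
}
someone_else = 9  # the catch-all row at the bottom of the table

n_players = sum(len(players) for players in odds.values())
named_total = sum(pct * len(players) for pct, players in odds.items())
total = named_total + someone_else

print(f"{n_players} named players account for {named_total}%")
print(f"Including 'someone else': {total}%")
```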
The total obviously has to come out to 100%. If we look at Pujols, we gave him odds of 6%, which is the next-highest figure after Ryan Howard’s 10%. If the top-end expectation for Pujols is roughly 50 HR, then his average HR expectation will obviously be less than 50 HR. Let’s give Pujols this kind of HR expectation, again, purely for the sake of illustration:
50+ 6%
45-49 9%
40-44 12%
35-39 15%
30-34 18%
25-29 12%
20-24 10%
15-19 8%
10-14 5%
5-9 3%
0-4 2%
That seems like a reasonable kind of range. It includes the chance of an injury or a bad year (for him). And it includes the chance of him winning the HR crown. The average of the above is about 31 HR. So, when you look at a forecast for the number of HR for Pujols, and you see “31”, that number actually means “I have no idea how many HR he will hit, other than it will be centered around 31, give or take 20 or 30 HR”. And that’s pretty much the best we can do.
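To see where the 31 comes from, here is a minimal Python sketch that averages the illustrative distribution above. It assumes each range is represented by its midpoint, and that the open-ended “50+” range is represented by 52; both choices are assumptions made for this example, not something stated in the table.

```python
# (representative HR value, probability) for each range in the table.
# Midpoints are an assumption; 52 stands in for the open-ended "50+" range.
bins = [
    (52, 0.06), (47, 0.09), (42, 0.12), (37, 0.15), (32, 0.18), (27, 0.12),
    (22, 0.10), (17, 0.08), (12, 0.05), (7, 0.03), (2, 0.02),
]

total_prob = sum(p for _, p in bins)
mean_hr = sum(hr * p for hr, p in bins)

# Spread around the mean, as a standard deviation.
variance = sum(p * (hr - mean_hr) ** 2 for hr, p in bins)
sd_hr = variance ** 0.5

print(f"Probabilities sum to {total_prob:.2f}")
print(f"Mean: {mean_hr:.1f} HR, standard deviation: {sd_hr:.1f} HR")
```

Under these assumptions the mean lands a bit above 31 and the standard deviation is around 12 HR, so “give or take 20 or 30” is roughly a two-standard-deviation range. Note also that the single most likely range, 30-34, carries only 18% of the probability.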
Can we prove that? A simple forecasting system I developed is called Marcel The Monkey Forecasting System, or The Marcels, for short. It’s named after the monkey from the TV show Friends. I also like the name Marcel for the hockey great Marcel Dionne so even if the name looks dated, you can think of Dionne instead. Anyway, Marcel listed 13 players as having a forecasted mean of 28 or more home runs for the 2009 season. Here are those hitters:
40 Howard, Ryan
32 Rodriguez, Alex
32 Fielder, Prince
32 Dunn, Adam
32 Braun, Ryan
31 Pujols, Albert
31 Pena, Carlos
30 Thome, Jim
29 Dye, Jermaine
28 Delgado, Carlos
28 Cabrera, Miguel
28 Berkman, Lance
28 Beltran, Carlos
Now, remember what I said, and this is important: we are NOT forecasting Pujols to hit 31 HR in 2009. We forecasted him to hit 31 HR give or take 20 or 30 HR. You apply that same kind of thinking to each of the above players. And, we are NOT forecasting Ryan Howard to lead the league with 40 HR. We ARE forecasting SOMEONE to hit around 50 HR. And these guys are among our best bets. With the top end of each of these hitters close to 50 HR, obviously the average will be much lower.
How many HR did these players hit in 2009?
47 Pujols, Albert
46 Fielder, Prince
45 Howard, Ryan
39 Pena, Carlos
38 Dunn, Adam
34 Cabrera, Miguel
32 Braun, Ryan
30 Rodriguez, Alex
27 Dye, Jermaine
25 Berkman, Lance
23 Thome, Jim
10 Beltran, Carlos
4 Delgado, Carlos
As you can see, it runs the gamut from Delgado’s 4 to Pujols’ league-leading 47. These 13 hitters were forecasted to hit a combined 401 HR in 2009. And how many HR did they actually hit in 2009? 400. That’s right, Marcel nailed it.
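The comparison can be reproduced from the two lists above. The following Python sketch totals both and also computes the average miss per player; the mean absolute error is an addition for illustration and is not a figure quoted from the Marcel system itself.

```python
# Marcel's 2009 forecasts and the actual 2009 totals, copied from the lists above.
forecast = {
    "Howard": 40, "Rodriguez": 32, "Fielder": 32, "Dunn": 32, "Braun": 32,
    "Pujols": 31, "Pena": 31, "Thome": 30, "Dye": 29, "Delgado": 28,
    "Cabrera": 28, "Berkman": 28, "Beltran": 28,
}
actual = {
    "Pujols": 47, "Fielder": 46, "Howard": 45, "Pena": 39, "Dunn": 38,
    "Cabrera": 34, "Braun": 32, "Rodriguez": 30, "Dye": 27, "Berkman": 25,
    "Thome": 23, "Beltran": 10, "Delgado": 4,
}

total_forecast = sum(forecast.values())
total_actual = sum(actual.values())

# Per-player error: positive means the player beat his forecast.
errors = {name: actual[name] - forecast[name] for name in forecast}
mae = sum(abs(e) for e in errors.values()) / len(errors)

print(f"Forecast total: {total_forecast}, actual total: {total_actual}")
print(f"Mean absolute error per player: {mae:.1f} HR")
for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {err:+d}")
```

The group total is off by a single home run, while individual players miss by anywhere from 0 (Braun) to 24 (Delgado). That gap between group accuracy and individual accuracy is the point of reading each forecast as the center of a wide range.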
So, the forecasting systems work… if you know how to properly interpret what it is they are trying to tell you.


For me, your illustrative distribution for Pujols tells a lot that most non-stats folks won’t get (and I see this in many more places than just baseball)… I think people, in general, don’t understand that when you “project” Pujols to hit 31 HR that you really only think he has ~20% chance of being within a couple HR of that actual number… If someone knows how to convey the distribution behind the projection in a way that won’t make non-stats people’s eyes glaze over, I’d be interested in hearing about it…