Tuesday, February 17, 2009
Battle of the HR Steroid Theory
In this corner we have Nate Silver from a few years ago (though the charts have been inexplicably removed):
As it happens, not only has the increase in the standard deviation failed to keep a proportionate pace with the increase in home run rates, but it has actually decelerated. That is, while offensive output has increased substantially, the playing field has become comparatively more level. Last season, for example, about 19.3 home runs were hit per 650 plate appearances in the National League, with a standard deviation of 11.9. Compare that to 1970, when just 15.6 home runs were hit per 650 PA--about a 20 percent decrease from contemporary levels--but the standard deviation was actually a bit higher, at 12.3. This is far from a perfect experiment. But at the very least, it is highly problematic for the Steroid Gap Theory.
And in that corner is Dan Rosenheck:
This is exactly what happened between 1993 and 2004. Using the standard deviation, a common measure of how tightly a set of numbers is bunched together, performances by both hitters and pitchers were more spread out during that time than in any 12-year period since World War II. Although some of the difference was caused by adding new teams, expansion was much more rapid in the 1960s than the 1990s, and standard deviations were still lower back then. None of this means that steroids are necessarily the cause of the separation.
Two opposite readings of the same data. I will say that aggregating over a 12-year window, as Dan does, is problematic if the window was chosen for a specific reason, since a hand-picked period can make a spread look more extreme than it is. I’d rather go with a year-by-year approach.


They are not necessarily disagreeing. Dan is pointing out that the standard deviation in player performance increased from 1993-2004. Nate is arguing that, although the standard deviation did increase, the increase was actually LESS than one would have predicted from the historical relationship between home runs and the SD of home runs.
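To make the "less than proportionate" point concrete, here is a rough sketch using only the two National League figures quoted in Nate's passage. It assumes that "keeping pace" means the SD scales in proportion to the mean, which is my reading of the quote rather than anything either author spells out; the variable names are just for illustration.

```python
# Figures quoted in Nate Silver's passage (home runs per 650 PA, National League).
seasons = {
    "1970": {"mean_hr": 15.6, "sd_hr": 12.3},
    "recent": {"mean_hr": 19.3, "sd_hr": 11.9},
}


def coefficient_of_variation(mean, sd):
    """Spread relative to the mean: SD divided by mean."""
    return sd / mean


cv = {
    label: coefficient_of_variation(s["mean_hr"], s["sd_hr"])
    for label, s in seasons.items()
}

# ASSUMPTION: if spread simply scaled with the home run rate, the 1970 SD
# would grow by the same factor as the mean did.
proportional_sd = (
    seasons["1970"]["sd_hr"] * seasons["recent"]["mean_hr"] / seasons["1970"]["mean_hr"]
)

for label, value in cv.items():
    print(f"{label}: SD/mean = {value:.2f}")
print(f"SD implied by proportional scaling: {proportional_sd:.1f}")
print(f"SD actually quoted: {seasons['recent']['sd_hr']}")
```

On those two seasons alone, the quoted SD sits well under the proportionally scaled one, which is the sense in which the field looks "comparatively more level." Two data points are obviously not a historical relationship, so treat this as arithmetic on the quote, not as a test of either theory.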
In his WARP file, Dan includes a spreadsheet with standard deviations throughout baseball history, as well as the time series of a regression estimate of standard deviations. (Unfortunately, I don’t know exactly what factors he included in his regression; I know it includes run scoring, years since expansion, etc.) Looking at that data, Dan’s estimator actually does a better job (by RMSE) of predicting standard deviations from 1993-2004 than it does for the rest of baseball history. So it seems like most of the increase in standard deviations is explainable in terms of other observable factors.
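For anyone who wants to repeat the RMSE comparison, a minimal sketch of the calculation follows. It assumes the spreadsheet can be reduced to rows of (year, actual SD, predicted SD); that layout, the function names, and the sample rows are all hypothetical. The sample numbers are made-up placeholders, not values from Dan's file.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_by_period(rows, start=1993, end=2004):
    """Return (RMSE inside [start, end], RMSE for all other years).

    rows: iterable of (year, actual_sd, predicted_sd) tuples.
    """
    inside_actual, inside_pred = [], []
    outside_actual, outside_pred = [], []
    for year, actual_sd, predicted_sd in rows:
        if start <= year <= end:
            inside_actual.append(actual_sd)
            inside_pred.append(predicted_sd)
        else:
            outside_actual.append(actual_sd)
            outside_pred.append(predicted_sd)
    return rmse(inside_actual, inside_pred), rmse(outside_actual, outside_pred)


# PLACEHOLDER rows for illustration only; replace with the spreadsheet's values.
placeholder_rows = [
    (1990, 10.0, 11.0),
    (1991, 10.0, 9.0),
    (1995, 12.0, 12.5),
    (2000, 13.0, 12.5),
]

inside, outside = rmse_by_period(placeholder_rows)
print(f"RMSE 1993-2004: {inside:.2f}")
print(f"RMSE other years: {outside:.2f}")
```

A smaller RMSE inside the window than outside it is what "the estimator does a better job from 1993-2004" means here. Note that the two periods have very different numbers of seasons, so the comparison is suggestive rather than a formal test.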