THE BOOK--Playing The Percentages In Baseball

Tuesday, February 17, 2009

Battle of the HR Steroid Theory

By Tangotiger, 04:16 PM

In this corner we have Nate Silver from a few years ago (though the charts have been inexplicably removed):

As it happens, not only has the increase in the standard deviation failed to keep a proportionate pace with the increase in home run rates, but it has actually decelerated. That is, while offensive output has increased substantially, the playing field has become comparatively more level. Last season, for example, about 19.3 home runs were hit per 650 plate appearances in the National League, with a standard deviation of 11.9. Compare that to 1970, when just 15.6 home runs were hit per 650 PA--about a 20 percent decrease from contemporary levels--but the standard deviation was actually a bit higher, at 12.3. This is far from a perfect experiment. But at the very least, it is highly problematic for the Steroid Gap Theory.

And in that corner is Dan Rosenheck:

This is exactly what happened between 1993 and 2004. Using the standard deviation, a common measure of how tightly a set of numbers is bunched together, performances by both hitters and pitchers were more spread out during that time than in any 12-year period since World War II. Although some of the difference was caused by adding new teams, expansion was much more rapid in the 1960s than the 1990s, and standard deviations were still lower back then. None of this means that steroids are necessarily the cause of the separation.

Two opposite viewpoints of the same data.  I will say that aggregating as Dan does is problematic, if the time period was chosen for a specific reason.  I’d rather go with a year-by-year approach.
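Nate's comparison is easier to follow when the standard deviation is expressed relative to the mean (the coefficient of variation). The sketch below uses only the four figures quoted in his passage; the variable names are illustrative, and it is a reading of the quote, not Nate's actual calculation.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a fraction of the mean."""
    return sd / mean

# Figures quoted in Nate's passage (HR per 650 PA, National League).
recent = {"mean": 19.3, "sd": 11.9}   # "last season" in the quote
y1970 = {"mean": 15.6, "sd": 12.3}

cv_recent = coefficient_of_variation(recent["mean"], recent["sd"])
cv_1970 = coefficient_of_variation(y1970["mean"], y1970["sd"])

# How far the 1970 HR rate sits below the recent one.
pct_drop = 1 - y1970["mean"] / recent["mean"]

print(f"CV recent: {cv_recent:.3f}")
print(f"CV 1970:   {cv_1970:.3f}")
print(f"1970 HR rate below recent level by {pct_drop:.1%}")
```

On these numbers the relative spread falls from roughly 0.79 in 1970 to roughly 0.62, which is the sense in which Nate calls the playing field "comparatively more level."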


#1          2009/02/17 (Tue) @ 17:30

They are not necessarily disagreeing.  Dan is pointing out that the standard deviation in player performance increased from 1993-2004, while Nate is arguing that, while the standard deviation did increase, that increase was actually LESS than one would have predicted from the historical relationship between home runs and SD of home runs.

In his WARP file, Dan includes a spreadsheet with standard deviations throughout baseball history, as well as the time series of a regression estimate of standard deviations (unfortunately, I don’t know what factors he included in his regression; I know it includes run scoring, years since expansion, etc.).  Looking at that data, Dan’s estimator actually does a better job (by RMSE) of predicting standard deviations from 1993-2004 than it does for the rest of baseball history.  So it seems like most of the increase in standard deviations is explainable in terms of other observable factors.


#2    Matt Mitchell      2009/02/17 (Tue) @ 19:45

It’s not exactly apples-to-apples between the two articles. Nate is talking about HR rates, while Dan seems to be talking about R/G (though he doesn’t explicitly say what he means by “run scoring levels”). Nevertheless, I’m a little skeptical of Dan’s findings, as he doesn’t seem to account for the ballpark boom that created more homer-friendly yards in many places (Cincy, anyone?).

As for the graphics in Nate’s article, they were still there for me, so it may just be a filter on whatever computer you were using, Tango.


#3    Blackadder      2009/02/17 (Tue) @ 20:21

Dan is talking about the standard deviation of his version of Wins Above Average, which he uses to compute his version of WARP.  The numbers are park-adjusted, so Coors should not be a problem, except insofar as Coors raises overall run scoring, and overall run scoring is positively correlated with standard deviations in player performance.


#4    Guy      2009/02/17 (Tue) @ 20:29

This nice piece by Dan Fox, http://www.baseballprospectus.com/article.php?articleid=5813, shows that the CV for SLG did spike up a bit in the 1990s.  However, I’m not sure that we should expect that an increase in HR rates should result in the same spread of talent.  Tango likes to say that Coors Field doesn’t impact Juan Pierre and Dante Bichette the same.  Isn’t the same true of a juiced ball (or any other non-PED explanation one favors)?  Some players are strong enough to see their long fly outs and doubles become HRs, while others are not.  Over time, the game adapts: players with no power find it harder to stay in MLB, and maybe defense gets less priority.  But I would think a big change in the game would, at least for a while, increase the talent spread.  I’d imagine you would see the same thing if you looked at the NBA after the 3-point shot was introduced.


#5    Dan Rosenheck      2009/02/18 (Wed) @ 10:27

Yeah, Nate and I are not looking at the same stat at all.  I am studying batting and pitching wins above average; Nate is studying HR per 650 PA.  Barry Bonds, for example, is off the chart in my metric from 2002-2004, posting marks of 14.2, 11.4, and 13.0 batting wins above average per full season played in those years, while his HR/650 PA are in the 50’s due to all the walks, which is high but not historic.

I definitely calculate standard deviations for each league-season independently rather than a multi-year aggregate.

Wins above average are park-adjusted, so I certainly do account for any “ballpark boom.”

Tango, is there anywhere I can post my data on your site?


#6    Tangotiger      2009/02/18 (Wed) @ 10:36

I can give you an FTP id/pwd if you like.  Send me an email, and I’ll set you up.  This goes for anyone else who wants space.  I get unlimited space/bandwidth, so there is no cost issue.

tom~tangotiger~net


#7    Tangotiger      2009/02/18 (Wed) @ 10:36

Let me know what password you want; otherwise, I’ll give you an ugly-looking one.


#8    Dan Rosenheck      2009/02/18 (Wed) @ 10:45

To be extra-clear as well, I am studying batting wins above *positional* average rather than overall league average.  For my purposes, I am not interested in changes in between-position standard deviation (whether 1B outhit SS by 1, 2, 3, 4, or 5 runs a year), since that has no impact on actual player value as long as teams have to put one player at every position.

As for the question of my regression estimation of standard deviations and the seeming lack of unexplained variance/high residuals that Blackadder raises, I think we have to be careful.  The 1993-2004 years are, of course, included in the dataset that I use to derive my regression equation, so a big reason why I come up with such strong correlations between stdevs on the one hand and run scoring and expansion on the other is that when run scoring was high and there were two expansions in the 1990’s, there were also very high standard deviations.  Perhaps a better test would be to fit a regression model just on 1893-1992, and see how that equation’s projections compare to the 1993-2004 results.

I am only an amateur stats guy, so it would be great if someone on this board better trained than me could do some tests and draw some conclusions from my data.
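The holdout test Dan proposes (fit on 1893-1992, then check the projections against 1993-2004) could be sketched roughly as follows. Everything here is an assumption made for illustration: the data are synthetic placeholders generated by `random`, not Dan's spreadsheet, and the model uses a single made-up predictor (`runs`) where his regression uses several.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# SYNTHETIC placeholder data: one row per season, 1893-2004.
random.seed(0)
years = list(range(1893, 2005))
runs = [random.gauss(4.5, 0.5) for _ in years]              # hypothetical run-scoring level
sds = [1.0 + 0.3 * r + random.gauss(0, 0.1) for r in runs]  # hypothetical SD of performance

train = [i for i, y in enumerate(years) if y <= 1992]    # seasons used to fit
holdout = [i for i, y in enumerate(years) if y >= 1993]  # seasons to be projected

# Fit on the training seasons only, so 1993-2004 cannot influence the equation.
slope, intercept = fit_line([runs[i] for i in train], [sds[i] for i in train])

def predict(i):
    return intercept + slope * runs[i]

in_sample_rmse = rmse([sds[i] for i in train], [predict(i) for i in train])
holdout_rmse = rmse([sds[i] for i in holdout], [predict(i) for i in holdout])

print(f"in-sample RMSE (1893-1992): {in_sample_rmse:.3f}")
print(f"holdout RMSE (1993-2004):   {holdout_rmse:.3f}")
```

With real data, a holdout RMSE well above the in-sample RMSE would suggest that the 1993-2004 spread is not explained by the pre-1993 relationship; a similar RMSE would support Blackadder's reading.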


#9    Guy      2009/02/18 (Wed) @ 12:12

Hmmm, not sure I buy the within-position approach.  Seems like a few great players at weak positions could have a big impact on your SD.  Since 1920, 3 of the top 4 OPS+ SSs (you can guess their names), and 5 of top 10, played mainly in the post-1992 period.  Also four of the top 10 2Bmen.  If you move Nomar, Kent, and ARod to 3B, and shift Jeter and Biggio to CF, would your SD spike largely disappear?  If so, I think it would be hard to be confident this reflects PEDs.  (If you move these guys, then you presumably replace your worst-hitting 3Bmen and CFs with weaker-hitting SSs and 2Bmen, but I doubt that offsets the impact of these great hitters).


#10    Tangotiger      2009/02/18 (Wed) @ 12:24

I’m with Guy here.  The change in spread can simply represent the philosophy in how teams decide to spread their talent around.

Indeed, you can have the exact same players with the exact same stats in back-to-back years, and depending on which positions those players played, you can have very different standard deviations the way Dan calculates it.
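A toy example of that point, assuming the within-position SD is simply the standard deviation of each player's value minus the mean at his position (Dan's actual calculation may differ). The four players and their values are hypothetical.

```python
from statistics import mean, pstdev

def within_position_sd(players):
    """SD of each player's value relative to the mean at his own position."""
    by_pos = {}
    for pos, value in players:
        by_pos.setdefault(pos, []).append(value)
    pos_mean = {pos: mean(vals) for pos, vals in by_pos.items()}
    return pstdev([value - pos_mean[pos] for pos, value in players])

# Hypothetical batting runs above league average; identical in both years.
values = [40, 30, -10, -20]

year1 = list(zip(["1B", "1B", "SS", "SS"], values))  # both big bats play 1B
year2 = list(zip(["1B", "SS", "1B", "SS"], values))  # one big bat moves to SS

print("overall SD (both years):", round(pstdev(values), 1))
print("within-position SD, year 1:", within_position_sd(year1))
print("within-position SD, year 2:", within_position_sd(year2))
```

The overall spread is about 25.5 in both years, but the within-position SD is 5 in year 1 and 25 in year 2, purely because of who played where.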


#11    Colin Wyers      2009/02/18 (Wed) @ 14:25

Even assuming (for the sake of argument) that Dan’s figuring the SDs correctly with the proper data, is there a reason that we can’t explain the change with non-steroids environmental changes? (Like, oh, a juiced ball?) That to me was the big red flag I got reading the article.


#12    Guy      2009/02/18 (Wed) @ 15:02

Colin:  I believe Dan adjusts for run environment, so an increase in scoring won’t necessarily increase his SDs.  Although, as I suggested in comment #4, I can see how a league-wide change (such as a livelier ball) might increase the SDs.


#13    Colin Wyers      2009/02/18 (Wed) @ 15:16

A livelier ball would (I think) intuitively lead to a higher SD in offensive talent - FB hitters would stand to benefit more than the rest of the league.


#14    Guy      2009/02/18 (Wed) @ 15:21

Maybe.  On the other hand, a juiced ball probably increases hits on all BIP.  BABIP made a huge jump from 1992 to 1994, roughly .280 to .300.  Although HRs have gotten all the attention, that 20-point increase probably increased scoring by about .5 R/G (if my back-of-envelope estimate is right).  So GB hitters could have gained as well.


#15          2009/02/19 (Thu) @ 00:03

Even pitchers saw a spike in their BABIP while hitting from 1992-1994.

Throw in livelier balls, smaller ballparks, expansion, proportionally larger pitching staffs, good hit/ok glove guys at 2b & ss, along with some guys on steroids.


#16    Colin Wyers      2009/02/19 (Thu) @ 00:37

Looking at 1988-1999 (probably the most detailed part of the Retrosheet era), here’s what I get, going from ‘88 to ‘92 and ‘93 to ‘99:

             '88-'92   '93-'99
BABIP         .280      .296
GB_BABIP      .214      .234
AB_BABIP      .324      .336

So, roughly 12 points in AB_BABIP, compared to 20 points in GB_BABIP. Considering that this analysis excludes home runs, and that an AB BIP is probably more valuable in terms of extra-base hits, I think it’s safe to say that air ball hitters benefited more from the change in environment than ground ball hitters.
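As a rough check on those figures, the sketch below computes the point changes and backs out the ground-ball share of balls in play that the three rates imply. It assumes every ball in play is either a ground ball or an air ball, so that overall BABIP is a weighted average of the two. The inputs are the rounded numbers above, so the shares are only approximate, and the `air` key is used in place of "AB" to avoid confusion with at-bats.

```python
# BABIP figures from the comment above; "air" = air balls (not at-bats).
babip = {
    "1988-92": {"all": 0.280, "gb": 0.214, "air": 0.324},
    "1993-99": {"all": 0.296, "gb": 0.234, "air": 0.336},
}

def implied_gb_share(overall, gb, air):
    """Solve overall = w*gb + (1 - w)*air for w, the ground-ball share of BIP."""
    return (air - overall) / (air - gb)

# Change between the two periods, in points of BABIP (thousandths).
point_change = {
    key: round((babip["1993-99"][key] - babip["1988-92"][key]) * 1000)
    for key in ("all", "gb", "air")
}

# Ground-ball share implied by each period's three rates.
gb_share = {
    period: implied_gb_share(r["all"], r["gb"], r["air"])
    for period, r in babip.items()
}

print("point changes:", point_change)
for period, share in gb_share.items():
    print(f"implied GB share of BIP, {period}: {share:.1%}")
```

Under that assumption the implied ground-ball share is about 40% in the first period and about 39% in the second, which gives a weight for comparing the 20-point and 12-point gains.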


#17    Guy      2009/02/19 (Thu) @ 10:37

Colin:
I suspect you’re right, but I’m not sure these numbers demonstrate that.  You must have included LDs in the AB category. But I don’t think FB hitters necessarily hit more LDs than GB hitters, so those should really be excluded.  We really want to compare GBs to FBs.  Also, hitters overall hit more GBs than FBs, so the increase in BA on GBs has proportionately more impact (i.e. a GB hitter hits many more GBs than FBs, while a FB hitter still hits a lot of GBs). 

On the other hand, to really answer Dan’s question you would want to factor in HRs as well, and I’m sure that does greatly increase the benefit to FB hitters.

To further complicate matters, we can’t just take GB/FB tendencies as a given.  As balls started to fly faster and farther, I assume some hitters changed their approach.  And teams promoted more FB hitters to the majors.  This would tend to reduce SDs (assuming they did expand initially).  The question is how fast that happened.

(BTW, if you haven’t done so you should probably include ROEs as hits, which will help the GB hitters a bit.)


#18    Dan Rosenheck      2009/02/19 (Thu) @ 14:36

I have uploaded two files to Tango’s site (tangotiger.net/rosenheck).  The first, Rosenheck WARP.zip, is just the archive with my position player WARP data from 1893-2005.  It includes a glossary.  To hear more about how the numbers are calculated, check out the interminable thread at http://www.baseballthinkfactory.org/files/hall_of_merit/discussion/dan_rosenhecks_warp_data.

The second file is just my standard deviation data for 1893-2005 with fielding and baserunning removed, along with all the variables that I thought might correlate to it.  If anyone can get a regression equation that works better than the one I’m using, please let me know.


#19    Tangotiger      2009/02/19 (Thu) @ 15:29

In Colin/16, “AB” is “air ball”.  Clearly, using “AB” to mean something other than “at bat” is more than confusing. 

***

Handy link for those who don’t like to cut/paste:
http://www.tangotiger.net/rosenheck/


#20    Colin Wyers      2009/02/19 (Thu) @ 15:36

I run into the same problem when I want to talk about batted balls - the solution I’ve come up with is to use “on contact” instead.

I’ve taken Guy’s suggestions and rolled them up into a blog post, complete with graphs:

http://statspeak.net/2009/02/bringing-home-the-bacon.html


#21    Tangotiger      2009/02/19 (Thu) @ 15:38

I call those “contacted balls”.

“Air balls” is fine, but use “Air” or “AirB” instead of “AB” as the shortform.


#22    Guy      2009/02/19 (Thu) @ 16:42

Nice writeup by Colin.  I’ll be interested to hear what Dan R thinks about this data.  My only (tiny) disagreement is with the statement that “there was a league-wide change in offense that seems to favor power hitters.” Seems more accurate to say it favored FB hitters, over GB hitters.  The biggest beneficiaries might have been guys who hit a lot of FBs but with only mediocre power (at least that’s what the results in the park factor thread seem to indicate about parks that inflate HRs).  In any case, we’d need to check to see if power hitters really gained proportionately more.

I think he’s probably right that pitchers made more effort to keep the ball on the ground after 1993 (and teams selected pitchers with that skill), though it may be hard to prove that.  Mostly, the value of strikeouts increased, and that has increasingly been an essential survival skill for pitchers.

Apropos of nothing, that 1987 power spike really stands out.  I suspect MLB experimented with a new ball, and maybe stepped back after seeing the result.

