Thursday, April 05, 2007
Would Babe Ruth Be Average Today?
Six or seven years ago, I dared ask the question on Baseball Boards. I laid out the process and came up with the answer: yes. The discussion that followed was illuminating and exasperating. The least appealing of the comments, rather than criticizing the process, criticized the conclusion! The problem, which no one pointed out at the time, was regression toward the mean, the single most important concept to understand if you are going to analyze sample data. I didn’t know about the concept back then. Once you handle that, the answer changes dramatically: no. Ruth would still be RUTH, but not so RUTHIAN.
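For readers who have not met regression toward the mean, here is a minimal sketch of the usual shrinkage form: blend the observed rate with the league rate, weighting the observed rate by its sample size. The function name, the example numbers, and especially the constant `k` are hypothetical placeholders; the post does not say how the regression was actually applied.

```python
def regress_to_mean(observed, league_mean, pa, k):
    """Shrink an observed rate toward the league mean.

    `k` is the number of league-average plate appearances added to the
    sample. It is a hypothetical tuning constant here, not a value
    taken from the post.
    """
    return (observed * pa + league_mean * k) / (pa + k)


# Illustrative numbers only: a .400 rate observed over 600 PA,
# a .340 league mean, and an assumed k of 200.
estimate = regress_to_mean(0.400, 0.340, pa=600, k=200)
print(round(estimate, 3))  # 0.385
```

The smaller the sample, the more the estimate is pulled toward the league mean; with `pa=0` it is exactly the league mean.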
The process was similar to what Dick Cramer did, as explained in The Hidden Game of Baseball, but I handled the age adjustment (he didn’t, it seemed). His results can be dismissed. I intended to finally write the follow-up for THT two years ago, but ended up shelving it. I’ve always intended to finish it up.
BP’s Between the Numbers looked at the issue, but the execution was lacking. The drawing of the adjusted line really didn’t make much sense. It almost seemed like the author realized there was a problem but couldn’t put his finger on it. That work, too, I would dismiss. Bill James’ timeline is also an effort to just put something in. Now we’ve got David Gassko handling the problem. If we look at his chart, we see that a player around 1950 would have his wOBA of .400 drop to .340 today. That’s a drop of 31 runs per 600 PA. This seems almost as preposterous as Cramer’s findings, even though Gassko took care of the regression toward the mean issue. (My guess is that David didn’t handle the age adjustment.) But there’s a part two coming up, so we’ll see what he did. IIRC, my work suggests about a 10-run change per 600 PA over that time period, and virtually flat in the last 30 years or so. I really ought to dust that off. The process is semi-reasonable, and the results pass the sniff test.
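The 31-run figure can be checked with a quick sketch of the standard wOBA-to-runs conversion: divide the wOBA gap by the wOBA scale and multiply by plate appearances. The scale of 1.15 is an assumption (it is the commonly used approximate value, and it reproduces the number quoted above); the function names are illustrative only.

```python
# Assumed wOBA scale; the post does not state the value it used.
WOBA_SCALE = 1.15


def woba_gap_to_runs(woba_then, woba_now, pa=600, scale=WOBA_SCALE):
    """Convert a drop in wOBA into runs over `pa` plate appearances."""
    return (woba_then - woba_now) / scale * pa


def runs_to_woba_gap(runs, pa=600, scale=WOBA_SCALE):
    """Convert a run difference over `pa` plate appearances into wOBA points."""
    return runs * scale / pa


# A .400 wOBA falling to .340, as read off Gassko's chart.
print(round(woba_gap_to_runs(0.400, 0.340)))  # 31

# The roughly 10-run change per 600 PA, expressed as a wOBA gap.
print(round(runs_to_woba_gap(10), 3))  # 0.019
```

Under the same assumed scale, a 10-run change per 600 PA works out to about 19 points of wOBA, so the .400 hitter of 1950 would land near .381 rather than .340.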
I remember that old Baseball Boards discussion, and what we saw at the time still roughly holds in Gassko’s graph - when the slope is so steady over time (barring blips like WW2), it’s got to make you think that there’s a systematic year-to-year error over the entire timespan (aging, not enough regression, whatever).
I agree that the theory that quality has improved almost continuously over time seems reasonable, but I’m not sure it passes the sniff test that the increase would be so constant over time. Wouldn’t we expect to see more dramatic increases during integration or during the period when the Latin presence in baseball really took off?