Thursday, June 26, 2008
One of many ways that not regressing toward the mean can get you in trouble…
Here is a snippet from a Baseball Prospectus (BP) article by Geoff Young about Adrian Gonzalez, the Padres’ slugging first sacker (I sound like a real baseball writer!):
So I decided to check out his age 25 stats (from 5/8/07 to 5/7/08) and see just how much he’d built on his success from the previous year. Using the same format from my earlier article, and with the help of David Pinto’s Day-by-Day Database, here’s what I found:
Adrian Gonzalez, Age 24-25
Age  AB   BA    OBP   SLG   ISO   XB/H  AB/HR
24   598  .316  .376  .543  .227  .376  18.69
25   650  .282  .344  .498  .216  .432  22.41
Uh-oh. That wasn’t supposed to happen. I had it all figured out: Gonzalez was going to exhibit a slow but steady increase in skills, and the numbers would support what my eyes had led me to believe.
Unfortunately, reality had other ideas.
So, Young thinks that Gonzalez did not progress as a 24 year old should, given that his numbers (say, OPS) went down from .919 to .842, a significant decline. But wait…
Did he really not progress as we would expect an average player from age 24 to 25? Actually, he did! Let’s forget for a moment that his OPS before age 24 was in the .600’s (which is quite relevant, BTW). At age 24, he had 598 AB’s according to Young’s data. Let’s say that OPS gets regressed around 50% after that many AB’s (I forgot what the actual regression equation is). So, his true talent at that point is the average of .919 and .750 (a league average non-pitcher in the NL - of course, Gonzalez is a big left-handed hitting first baseman, so his population mean is probably quite a bit higher than that, but let’s pretend for a moment that the .750 is correct), or .835. At age 25, he hit .842, which is a progression of 7 points!
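The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `regress_to_mean` is made up for this sketch, and the 50% regression fraction and the .750 population mean are the rough guesses used in this post, not a real regression equation.

```python
def regress_to_mean(observed, population_mean, regression=0.5):
    """Estimate true talent by pulling an observed rate toward the population mean.

    regression=0.5 is the rough guess used in the post, not a fitted value.
    """
    return (1 - regression) * observed + regression * population_mean


# Gonzalez at age 24: observed OPS .919, assumed population mean .750
talent_24 = regress_to_mean(0.919, 0.750)
observed_25 = 0.842

# A positive change means the age-25 OPS sits above the age-24 talent estimate
change = observed_25 - talent_24
print(f"estimated age-24 talent: {talent_24:.4f}")
print(f"age-25 OPS minus that estimate: {change:+.4f}")
```

Unrounded, the talent estimate is .8345, so the gain comes out at 7 to 8 points depending on where you round.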
Writers and “analysts” make that mistake all the time. It is sort of a contradiction actually. A player’s numbers get worse, yet he progresses. Seems wrong, but it is correct.
The same thing is true for a player who has a bad season. Let’s say that we have a 23 year old, six-foot-one, left-handed hitting third baseman, like Alex Gordon. He posts a .725 OPS in his first year (which he did in ’07), not too good for a star prospect with those characteristics. The next year, at age 24, he is hitting .755 (which is where he is now). Well, he has progressed, like he is supposed to, from age 23 to 24, right? Wrong!
Let’s say that a 23 year old left-handed hitting star prospect of that size hits .790 on average (I don’t know if that is true, but let’s say that it is). His true talent as a 23 year old is the average of .725 and .790, or about .758, assuming, again, a 50% regression.
So his .755 this year is actually WORSE than his true talent last year. He did not progress!
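The Gordon case can be checked the same way. In this sketch the helper name `regress_to_mean` is invented for illustration, the .725 first-year OPS is the figure quoted above, and the .790 population mean and 50% regression are the same guesses as in the text.

```python
def regress_to_mean(observed, population_mean, regression=0.5):
    """Pull an observed rate toward the population mean (50% is a guess)."""
    return (1 - regression) * observed + regression * population_mean


observed_23 = 0.725   # Gordon's first-year OPS
observed_24 = 0.755   # his OPS so far at age 24
assumed_mean = 0.790  # assumed mean for comparable prospects, not a measured value

talent_23 = regress_to_mean(observed_23, assumed_mean)

# The raw comparison looks like progress; the regressed comparison does not
raw_change = observed_24 - observed_23
change_vs_talent = observed_24 - talent_23
progressed = change_vs_talent > 0

print(f"raw change in OPS: {raw_change:+.4f}")
print(f"change vs. estimated age-23 talent: {change_vs_talent:+.4f}")
print(f"progressed: {progressed}")
```

The two comparisons have opposite signs, which is the whole point: the raw numbers went up, but not relative to the regressed estimate.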
This is an important concept and one that is hard to wrap one’s head around.
I may have screwed up the proper amount of regression and some of the numbers, like the appropriate means to regress towards in the above two examples, but it does not matter. The point remains the same.
When you have kids, are you going to replace the growth chart on the wall with a regressed version that compares them to other children of similar ages?
Or, to stretch out the example, if your 6 year old goes from 4’0 to 4’1, are you going to explain to him that he didn’t actually get taller, because the growth chart says he should now be 4’3? Or would your kid look at you with a confused look when you tried to explain that 4’1 was actually not taller than 4’0?
Maybe I’m just hanging out with the wrong people, but to me, progress is defined at the specific individual level. I understand what you’re saying, and from a projection standpoint, I don’t disagree, but I’m not sure that anyone I know defines progress as you do here.