THE BOOK--Playing The Percentages In Baseball


Friday, June 27, 2008

One of many ways that not regressing toward the mean can get you in trouble…


Here is a snippet from a BP article by Geoff Young about Adrian Gonzalez, the Padres slugging first sacker (I sound like a real baseball writer!):

So I decided to check out his age 25 stats (from 5/8/07 to 5/7/08) and see just how much he’d built on his success from the previous year. Using the same format from my earlier article, and with the help of David Pinto’s Day-by-Day Database, here’s what I found:
Adrian Gonzalez, Age 24-25

Age   AB    BA    OBP   SLG   ISO   XB/H  AB/HR
24    598   .316  .376  .543  .227  .376  18.69
25    650   .282  .344  .498  .216  .432  22.41

Uh-oh. That wasn’t supposed to happen. I had it all figured out: Gonzalez was going to exhibit a slow but steady increase in skills, and the numbers would support what my eyes had led me to believe.

Unfortunately, reality had other ideas.

So, Young thinks that Gonzalez did not progress as a 24 year old should, given that his numbers (say, OPS) went down from .919 to .842, a significant decline.  But wait…


Did he really fail to progress the way we would expect an average player to from age 24 to 25?  Actually, he progressed just fine!  Let’s forget for a moment that his OPS before age 24 was in the .600’s (which is quite relevant, BTW).  At age 24, he had 598 AB’s according to Young’s data.  Let’s say that OPS gets regressed around 50% toward the mean after that many AB’s (I forgot what the actual regression equation is).  So, his estimated true talent at that point is the average of .919 and .750, or .835.  (The .750 is a league average non-pitcher in the NL.  Of course, Gonzalez is a big left-handed hitting first baseman, so his population mean is probably quite a bit higher than that, but let’s pretend for a moment that the .750 is correct.)  At age 25, he hit .842, which is a progression of 7 points over that estimate!
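For anyone who wants to check the arithmetic, here is a minimal Python sketch of it.  It only encodes the assumptions already made above (a flat 50% regression and a .750 population mean); the function name `regress_to_mean` is made up for illustration and the 50% figure is a guess, not a fitted regression equation.

```python
def regress_to_mean(observed, population_mean, regression=0.5):
    """Estimate true talent by pulling an observed rate toward the population mean.

    `regression` is the fraction of the gap to the mean that gets removed.
    The 0.5 default is the assumed value from the text, not a fitted one.
    """
    return observed + regression * (population_mean - observed)


# Gonzalez at age 24: observed OPS .919, assumed population mean .750
age24_talent = regress_to_mean(0.919, 0.750)

# Age 25: observed OPS .842, compared against the regressed age-24 estimate
age25_observed = 0.842
change_in_points = (age25_observed - age24_talent) * 1000

print(f"estimated age-24 true talent: {age24_talent:.4f}")
print(f"age-25 OPS vs. that estimate: {change_in_points:+.1f} points")
```

Unrounded, the estimate is .8345 and the gain is 7.5 points; rounding the estimate to .835 first gives the 7 points quoted above.  Either way the sign is positive: the raw OPS fell, but it sits above the regressed estimate.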

Writers and “analysts” make that mistake all the time.  It looks like a contradiction: a player’s numbers get worse, yet he has progressed.  It seems wrong, but it is correct.

The same thing is true for a player who has a bad season.  Let’s say that we have a 23 year old, 6-foot-1, left-handed hitting third baseman, like Alex Gordon.  He puts up a .725 OPS in his first year (which he did in ’07), not too good for a star prospect with those characteristics.  The next year, at age 24, he is at .755 (which he is now).  Well, he has progressed, like he is supposed to, from age 23 to 24, right?  Wrong!

Let’s say that a 23 year old left-handed hitting star prospect of that size hits .790 on the average (I don’t know if that is true, but let’s say that it is).  His true talent as a 23 year old is the average of .725 and .790, or about .758, assuming, again, a 50% regression.

So his .755 this year is actually WORSE than his true talent last year.  He did not progress!
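The Gordon case can be checked the same way.  This is again just a sketch: it takes the .725 first-year OPS given above, the hypothetical .790 mean for that type of prospect, and the assumed 50% regression, and the helper names are invented for illustration.

```python
def regress_to_mean(observed, population_mean, regression=0.5):
    """Pull an observed rate toward the population mean by `regression` (assumed 0.5)."""
    return observed + regression * (population_mean - observed)


def points_vs_prior_talent(prev_observed, population_mean, next_observed, regression=0.5):
    """Next season's rate minus the regressed estimate of last season's talent, in OPS points."""
    prior_talent = regress_to_mean(prev_observed, population_mean, regression)
    return (next_observed - prior_talent) * 1000


# Gordon: .725 at age 23, hypothetical .790 population mean, .755 at age 24
gordon_talent_23 = regress_to_mean(0.725, 0.790)
gordon_change = points_vs_prior_talent(0.725, 0.790, 0.755)

print(f"estimated age-23 true talent: {gordon_talent_23:.4f}")
print(f"age-24 OPS vs. that estimate: {gordon_change:+.1f} points")
```

The raw OPS went up 30 points, but under these assumptions the change against the regressed estimate comes out slightly negative.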

This is an important concept and one that is hard to wrap one’s head around.

I may have screwed up the proper amount of regression and some of the numbers, like the appropriate means to regress towards in the above two examples, but it does not matter.  The point remains the same.
