THE BOOK--Playing The Percentages In Baseball


Thursday, September 23, 2010

Simple example of regression toward the mean

By Tangotiger, 10:46 AM

Hawerchuk takes the scoring records of the top 20 scorers for the last 9 (qualifying) seasons:

......... GP     G     A     Pts
Year 0     79     34     49     83

This is the important part: because he selected the players on the very metric he is studying, the sample has a selection bias.  This group of players will not perform the same in the following season (T+1, shown as Year 1).  And indeed, they don’t:

......... GP     G     A     Pts
Year 0     79     34     49     83
Year 1     71     27     40     67

The direction is unsurprising, though the magnitude might be.  Could we have guessed what T+1 would be?  Why yes: it should be the same as T-1 (shown as Year -1):

......... GP     G     A     Pts
Year -1     73     29     40     69
Year 0     79     34     49     83
Year 1     71     27     40     67
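To compare the three seasons on equal footing, it helps to turn the totals into a rate. The sketch below is my own addition, not Hawerchuk's calculation: it assumes each row is the group's per-player average and simply divides points by games played. The names `seasons` and `points_per_game` are made up for the illustration.

```python
# Season lines copied from the table above: (GP, G, A, Pts).
# Assumption: each row is the per-player average for the selected group.
seasons = {
    "Year -1": (73, 29, 40, 69),
    "Year 0": (79, 34, 49, 83),
    "Year 1": (71, 27, 40, 67),
}


def points_per_game(line):
    """Convert one season line into points per game played."""
    games, _goals, _assists, points = line
    return points / games


rates = {label: points_per_game(line) for label, line in seasons.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.3f} points per game")
```

On a per-game basis, Year -1 and Year 1 come out within about 0.002 points per game of each other, while Year 0 sits roughly 0.1 higher.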

Once you have your selection bias, in order to figure out what you actually have, you need to look at the out-of-sample data.  I have no doubt at all that you can repeat this in baseball (top 20 in OBP for the last 10 years, top 20 in SLG, top 20 in HR/PA, top 20 in ERA, top 20 in SB/time on base), with the top 20 in your high school math competition, or with the top 20 in stock performance over the last 12 months, doing so for the previous 10 seasons (though in this case we’ll need to do it relative to the index *).  In each case you will find this: the rates in T+1 will match those in T-1.

To the extent that they don’t, either you have a sample of really young or really old players, or just bad luck.  We’ve been beating this drum for a long time.  In order to believe this, you need to roll up your sleeves and prove me right (or wrong).  It’s time to take out your fishing rods.
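If you don't have real data handy, a toy simulation is one way to start fishing. This is a minimal sketch under assumptions that are mine, not the post's: every player has a fixed true talent, each season is that talent plus independent random noise, and there is no aging. The function names and all the numbers (500 players, talent and noise spreads of 10, a league average of 60) are made up for the illustration.

```python
import random
from statistics import mean


def simulate_once(rng, n_players=500, top_n=20, talent_sd=10.0, noise_sd=10.0):
    """Return the selected group's average in seasons T-1, T and T+1."""
    players = []
    for _ in range(n_players):
        # Assumption: talent is fixed across all three seasons (no aging).
        talent = rng.gauss(60.0, talent_sd)
        # Each season is talent plus independent luck.
        players.append([talent + rng.gauss(0.0, noise_sd) for _ in range(3)])

    # Select the top scorers on season T (index 1): this is the selection bias.
    top = sorted(players, key=lambda s: s[1], reverse=True)[:top_n]
    return tuple(mean(p[i] for p in top) for i in range(3))


def simulate(trials=100, seed=2010):
    """Average the three group means over many simulated leagues."""
    rng = random.Random(seed)
    results = [simulate_once(rng) for _ in range(trials)]
    return tuple(mean(r[i] for r in results) for i in range(3))


if __name__ == "__main__":
    prev_avg, selected_avg, next_avg = simulate()
    print(f"T-1: {prev_avg:.1f}   T: {selected_avg:.1f}   T+1: {next_avg:.1f}")
```

Under these assumptions the selected season should stand well above both neighbours, and T-1 and T+1 should land close to each other. Adding an aging term to the talent would be one way to see the "really young or really old" exception show up.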

(*) The reason that it won’t necessarily work in stocks is that companies may genuinely change.  Not to mention that a stock price is not something intrinsic but perceived.  The “performance” of the stock is really not a true performance.  Players don’t change that much, and their performance is their performance.  Anyway, I’m sure there are some stock guys reading this who are looking to do some work for an hour, learn something, and teach their colleagues something.  Go for it.
