THE BOOK--Playing The Percentages In Baseball


Saturday, February 12, 2011

BPro Reader mail of the day: regression toward the mean, part 2

By Tangotiger, 10:56 AM

tbwhite sent me a file of batting average (why batting average?  I don’t know... it bothered me to no end to see those numbers... if you want to make me happy, send me a file of OBP next time, please).  It is a very well thought out file containing 4434 records with, among others, these fields:
A. batting average in year N
B. career BA through year N (would have preferred N-1)
C. league mean in year N
D. a field if BA in year N was above career mean
E. a field if BA in year N was below career mean

I then ran various correlations to see which pieces of data help us the most.

Unsurprisingly, the one that did the best was the one that used the first three, at r=.55.  This is that equation:
0.22*A + 0.58*B + 0.21*C

This means 21% regression toward the league mean.  The t-value for each coefficient is above 10, making each one highly statistically significant.

Now, what if we ignore the league mean?  Well, you get a very strong r=.53 using just his current year and past career:
0.25*A + 0.74*B

Using past career is better than using the league mean.  Here is the equation using just the single year and the league mean, at r=.48:
0.56*A + 0.46*C

That’s a very strong regression toward the mean.
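
To see how the three equations above treat the same player, here is a minimal sketch that plugs one stat line into each of them. The coefficients are copied from the post; the player (a .300 season, a .280 career, a .265 league) is hypothetical and not taken from tbwhite's file, and `EQUATIONS` and `apply_equation` are illustrative names. The post does not name the quantity the correlations were run against, so the output is simply "whatever the equation estimates."

```python
# Coefficients copied from the post. Keys are the post's field letters:
#   A = BA in year N, B = career BA through year N, C = league mean in year N.
EQUATIONS = {
    "A+B+C (r=.55)": {"A": 0.22, "B": 0.58, "C": 0.21},
    "A+B   (r=.53)": {"A": 0.25, "B": 0.74},
    "A+C   (r=.48)": {"A": 0.56, "C": 0.46},
}


def apply_equation(weights, inputs):
    """Return the weighted sum of the fields that the equation uses."""
    return sum(weight * inputs[field] for field, weight in weights.items())


# Hypothetical player (an assumption for illustration, not real data).
player = {"A": 0.300, "B": 0.280, "C": 0.265}

for label, weights in EQUATIONS.items():
    print(f"{label}: {apply_equation(weights, player):.3f}")
```

For this made-up player, the equation with all three inputs lands at about .284, close to the .282 from the year-plus-career version, while the version with no career data stays up near .290 because it has only the league mean to pull the .300 season down.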

Now, here’s an interesting one at r=.50
0.99*A - .017*D + .016*E

So, if his BA in year N was above his career mean, then we drop his batting average by 17 points.  If his BA in year N was below his career mean, then we increase his batting average by 16 points.  That’s just regression toward his past career.
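
The flag equation can be sketched the same way. This assumes fields D and E are 0/1 indicators (the post only calls them "a field"), and `adjust_with_flags` is an illustrative name; the two players are hypothetical.

```python
def adjust_with_flags(ba_year_n, career_ba):
    """Apply 0.99*A - .017*D + .016*E from the post.

    D and E are assumed to be 0/1 indicators: D is set when the year-N BA
    is above the career mean, E when it is below.
    """
    above = 1 if ba_year_n > career_ba else 0  # field D (assumed 0/1)
    below = 1 if ba_year_n < career_ba else 0  # field E (assumed 0/1)
    return 0.99 * ba_year_n - 0.017 * above + 0.016 * below


# Hypothetical players: one coming off a good year, one off a bad year.
print(f"{adjust_with_flags(0.300, 0.280):.4f}")  # .300 year, .280 career
print(f"{adjust_with_flags(0.250, 0.280):.4f}")  # .250 year, .280 career
```

The .300 hitter with a .280 career is pulled down to about .280, and the .250 hitter with the same career is pushed up to about .2635. The adjustment is the same size no matter how far the season was from the career mean, which is one reason the equations that use the career number itself do better.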

Anyway, you need it all.  You need the player’s most recent performance.  You need his career performance.  And you also need the league mean.  They are all required.

(13) Comments • 2011/02/14 • Sabermetrics • Statistical_Theory