Saturday, February 12, 2011
BPro Reader mail of the day: regression toward the mean, part 2
tbwhite sent me a file of batting averages (why batting average? I don’t know... it bothered me to no end to see those numbers. If you want to make me happy, send me a file of OBP next time, please). It is a very well thought out file containing 4434 records with, among others, these fields:
A. batting average in year N
B. career BA through year N (would have preferred N-1)
C. league mean in year N
D. a field if BA in year N was above career mean
E. a field if BA in year N was below career mean
I then decided to run various correlations to see which pieces of data help us the most.
Unsurprisingly, the one that did the best was the one that used the first three, at r=.55. This is that equation:
0.22*A + 0.58*B + 0.21*C
This means 21% regression toward the league mean. The t-value for each coefficient is above 10, making each one highly statistically significant.
Now, what if we ignore the league mean? Well, you get a very strong r=.53 using just his current year and past career:
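To make the weights concrete, here is a minimal sketch that applies the equation above to one player. The function name and the sample player are hypothetical (they are not from tbwhite’s file), and since the post does not name the quantity being predicted, the sketch simply calls the result a projection.

```python
# Coefficients quoted in the post for the three-input equation.
W_YEAR, W_CAREER, W_LEAGUE = 0.22, 0.58, 0.21


def project_ba(year_ba, career_ba, league_ba):
    """Weighted blend of year-N BA (A), career BA (B), and league mean (C)."""
    return W_YEAR * year_ba + W_CAREER * career_ba + W_LEAGUE * league_ba


# Hypothetical player: .300 in year N, .280 career, .260 league mean.
example = project_ba(0.300, 0.280, 0.260)
print(f"{example:.3f}")  # prints 0.283
```

Note that the three weights add up to 1.01 rather than exactly 1, so the league-mean share is roughly, not exactly, 21%.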
0.25*A + 0.74*B
Using past career is better than using the league mean. This is the equation using the single year N and the league mean, at r=.48:
0.56*A + 0.46*C
That’s a very strong regression toward the mean.
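As a rough illustration of how the two two-input equations differ, the sketch below runs both on the same made-up player. The function names and the sample numbers are assumptions for illustration only; the coefficients are the ones quoted above.

```python
def career_blend(year_ba, career_ba):
    """Year-N BA plus career BA (the r=.53 equation)."""
    return 0.25 * year_ba + 0.74 * career_ba


def league_blend(year_ba, league_ba):
    """Year-N BA plus league mean (the r=.48 equation)."""
    return 0.56 * year_ba + 0.46 * league_ba


# Hypothetical player: .300 in year N, .280 career, .260 league mean.
with_career = career_blend(0.300, 0.280)
with_league = league_blend(0.300, 0.260)
print(f"{with_career:.3f} {with_league:.3f}")  # prints 0.282 0.288
```

Without the career number, the single year has to carry more than half the weight, and the league mean picks up the rest.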
Now, here’s an interesting one at r=.50
0.99*A - 0.017*D + 0.016*E
So, if his BA in year N was above his career mean, then we drop his batting average by 17 points. If his BA in year N was below his career mean, then we increase his batting average by 16 points. That’s just regression toward his past career.
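Here is a small sketch of that flag-based equation. It assumes D and E are 0/1 indicators (the post only describes them as fields for "above" and "below" career mean), and the function name and sample players are hypothetical.

```python
def flag_adjusted_ba(year_ba, career_ba):
    """Apply 0.99*A - 0.017*D + 0.016*E, with D and E assumed to be 0/1 flags."""
    above = 1 if year_ba > career_ba else 0  # D: year N above career mean
    below = 1 if year_ba < career_ba else 0  # E: year N below career mean
    return 0.99 * year_ba - 0.017 * above + 0.016 * below


# Hypothetical .280 career hitter after a .300 year and after a .260 year.
hot = flag_adjusted_ba(0.300, 0.280)
cold = flag_adjusted_ba(0.260, 0.280)
print(f"{hot:.3f} {cold:.3f}")  # prints 0.280 0.273
```

In both cases the result is pulled back toward the career number, even though the career number itself never enters the equation except through the flags.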
Anyway, you need it all. You need the player’s most recent performance. You need his career performance. And you also need the league mean. They are all required.

