Tuesday, July 22, 2008
Intraclass correlation
Pizza talks about it all the time, and has a blog post on it. But darned if I know what it’s actually doing. When I do my thing, I figure the z-score of the stat for each player (the number of standard deviations, SD, from the mean), and then calculate the SD of the z-scores. The correlation is r = 1 - 1/SDzScore^2. So, if the SD of all the z-scores is 1.41, then r = .50. If it’s 2.0, then r = .75. I think this is what Pizza also does, and so I guess I’m doing an intraclass correlation without even knowing it. Regardless, what I do seems sound. I like what Guy said in the comments in response to my comment:
# tangotiger says:
May 23rd, 2007 at 2:46 pm
In order to increase the r, you can increase the variance of your population (i.e., introduce bad pitchers). All of a sudden, your r goes from .18 to .25, without anything else changing.
K has a high correlation because of the huge spread in K rates to begin with.
There’s no such possibility in MLB for BABIP, since 75% of the PA end with a BIP. Jeff Weaver’s .500 BABIP this year, even if true/real, couldn’t exist long enough for us to detect, since you’ll never be allowed to pitch. A guy can K at half the league rate, if he can walk guys at half the league rate as well.
All the correlation shows is if you can see the signal in the noise, and does not tell you how real the signal is.
# Guy says:
May 23rd, 2007 at 7:48 pm
“All the correlation shows is if you can see the signal in the noise, and does not tell you how real the signal is.”
I agree with Tango, but would say it slightly differently. Correlation tells us the ratio of signal to noise, but doesn’t tell us how significant the signal is to baseball outcomes. For one thing, for BABIP the noise is greater than the other stats: SD for 750 BIP is about .017, while SD for K/PA on 1,000 PA is .011. More importantly, a proportionate change in BABIP has much more impact on runs allowed: a .270 pitcher will be much more successful than a .300 pitcher, but a similar 10% difference in K-rate is no big deal (the BABIP difference equals .7 runs/game, vs. .2 R/G for the K difference).
Do this thought experiment: suppose that the ICC for HBP/9 was 1.0. All signal, no noise. Would that make it more of a “real skill” than K-rate or BB-rate? We still wouldn’t care, because it has a trivial impact on RA. Skills matter to the extent they help you win games. All that matters is the amount of variation in true talent, in terms of the impact on RA. If you look at it that way, you’ll find that the true talent variations in BABIP, while appearing small, are nearly as consequential as differences in the other 3 skills. For example, Clay Davenport looked at AAA pitchers who made the majors vs. those who didn’t, and the BABIP difference between the two groups was roughly comparable to the other 3 metrics in terms of RA.
We should stop talking about correlation as telling us how “real” a skill is. The amount of noise is completely irrelevant, except in the sense that it makes it harder for us to figure out who has the skill. What matters is the size of the signal, translated into runs.
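For anyone who wants to try the z-score calculation from the top of this post, here is a rough Python sketch. It rests on assumptions the post does not spell out: each player's z-score is measured against the binomial SD expected from random variation alone at his number of trials, the mean is the pooled league rate, and the function names and sample numbers are all made up for illustration. The .300 BABIP and .15 K/PA rates used to reproduce Guy's noise figures are likewise assumed, not taken from the comments.

```python
import math

def binomial_sd(rate, trials):
    """SD of an observed rate from random variation alone (binomial noise)."""
    return math.sqrt(rate * (1.0 - rate) / trials)

def population_sd(values):
    """Plain population standard deviation (divides by N)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def z_scores(successes, trials):
    """One z-score per player: distance from the league rate in noise SDs."""
    league_rate = sum(successes) / sum(trials)
    return [
        (s / n - league_rate) / binomial_sd(league_rate, n)
        for s, n in zip(successes, trials)
    ]

def correlation_from_z(sd_z):
    """The formula from the post: r = 1 - 1/SDzScore^2."""
    return 1.0 - 1.0 / sd_z ** 2

def correlation_from_spread(true_sd, noise_sd):
    """Same idea written as signal variance over total variance."""
    return true_sd ** 2 / (true_sd ** 2 + noise_sd ** 2)

# The two cases worked in the post.
r_141 = correlation_from_z(math.sqrt(2))  # SD of z-scores about 1.41 -> .50
r_200 = correlation_from_z(2.0)           # SD of z-scores 2.0 -> .75

# Hypothetical two-player league: 30 and 20 successes in 100 trials each.
toy_z = z_scores([30, 20], [100, 100])
toy_r = correlation_from_z(population_sd(toy_z))  # about .25

# Guy's noise figures, under the assumed league rates.
babip_noise = binomial_sd(0.300, 750)  # about .017
k_noise = binomial_sd(0.15, 1000)      # about .011

# Same noise, wider (hypothetical) spread in true talent: r goes up.
r_narrow = correlation_from_spread(0.008, babip_noise)
r_wide = correlation_from_spread(0.010, babip_noise)
```

The last two lines are the point made in the comments: nothing about the noise changed, only the spread of talent in the population, and r moved anyway. Writing r as signal variance over total variance is algebraically the same as 1 - 1/SDzScore^2 when the z-scores are measured in noise SDs.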