THE BOOK--Playing The Percentages In Baseball

Tuesday, July 22, 2008

Intraclass correlation

By Tangotiger, 10:56 AM

Pizza talks about it all the time, and has a blog post on it.  But, darned if I know what it’s actually doing.  When I do my thing, I figure the z-score for the stat for each player (number of standard deviations, SD, from the mean), and then calculate the SD of the z-scores.  The correlation is r = 1 - 1/SDzScore^2.  So, if the SD of all the z-scores is 1.41, then r = .50.  If it’s 2.0, then r = .75.  I think this is what Pizza also does, and so, I guess I’m doing an intraclass correlation without even knowing it.  Regardless, what I do seems sound.  I like what Guy said in the comments in response to my comment:


# tangotiger says:

May 23rd, 2007 at 2:46 pm

In order to increase the r, you can increase the variance of your population (i.e., introduce bad pitchers). All of a sudden, your r goes from .18 to .25, without anything else changing.

K has a high correlation because of the huge spread in K rates to begin with.

There’s no such possibility in MLB for BABIP, since 75% of the PA end with a BIP. Jeff Weaver’s .500 BABIP this year, even if true/real, couldn’t exist long enough for us to detect, since you’ll never be allowed to pitch. A guy can K at half the league rate, if he can walk guys at half the league rate as well.

All the correlation shows is if you can see the signal in the noise, and does not tell you how real the signal is.

# Guy says:

May 23rd, 2007 at 7:48 pm

“All the correlation shows is if you can see the signal in the noise, and does not tell you how real the signal is.”

I agree with Tango, but would say it slightly differently. Correlation tells us the ratio of signal to noise, but doesn’t tell us how significant the signal is to baseball outcomes. For one thing, for BABIP the noise is greater than the other stats: SD for 750 BIP is about .017, while SD for K/PA on 1,000 PA is .011. More importantly, a proportionate change in BABIP has much more impact on runs allowed: a .270 pitcher will be much more successful than a .300 pitcher, but a similar 10% difference in K-rate is no big deal (the BABIP difference equals .7 runs/game, vs. .2 R/G for the K difference).
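The noise figures quoted here can be checked with the binomial standard deviation, sqrt(p(1-p)/n). The sketch below is only a rough check: the league rates of .300 for BABIP and .150 for K/PA are assumptions chosen for illustration, not values given in the comment.

```python
import math

def binomial_sd(rate, trials):
    """SD of an observed rate that comes from binomial luck alone."""
    return math.sqrt(rate * (1.0 - rate) / trials)

# Assumed league rates (not from the comment): BABIP .300, K/PA .150.
babip_sd = binomial_sd(0.300, 750)   # 750 balls in play
k_sd = binomial_sd(0.150, 1000)      # 1,000 plate appearances

print(round(babip_sd, 3), round(k_sd, 3))
```

Under those assumed rates the two values round to .017 and .011, in line with the numbers in the comment.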

Do this thought experiment: suppose that the ICC for HBP/9 was 1.0. All signal, no noise. Would that make it more of “real skill” than K-rate or BB-rate? We still wouldn’t care, because it has a trivial impact on RA. Skills matter to the extent they help you win games. All that matters is the amount of variation in true talent, in terms of the impact on RA. If you look at it that way, you’ll find that the true talent variations in BABIP, while appearing small, are nearly as consequential as differences in the other 3 skills. For example, Clay Davenport looked at AAA pitchers who made the majors vs. those who didn’t, and the BABIP difference between the two groups was roughly comparable to the other 3 metrics in terms of RA.

We should stop talking about correlation as telling us how “real” a skill is. The amount of noise is completely irrelevant, except in the sense that it makes it harder for us to figure out who has the skill. What matters is the size of the signal, translated into runs.
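For reference, the z-score method from the post can be sketched as follows. This is one reading of it, not a confirmed implementation: it assumes the SD in each z-score is the binomial SD expected from luck alone at that player's sample size, and the function names are hypothetical.

```python
import math

def r_from_sd_z(sd_z):
    """r = 1 - 1/SD_z^2, the formula given in the post."""
    return 1.0 - 1.0 / sd_z ** 2

def reliability_from_z(rates, trials):
    """Estimate r from each player's observed rate and number of trials.

    Assumption: each z-score uses the binomial SD at the league rate
    for that player's number of trials.
    """
    league = sum(r * n for r, n in zip(rates, trials)) / sum(trials)
    zs = [(r - league) / math.sqrt(league * (1.0 - league) / n)
          for r, n in zip(rates, trials)]
    mean_z = sum(zs) / len(zs)
    var_z = sum((z - mean_z) ** 2 for z in zs) / len(zs)  # SD_z squared
    return 1.0 - 1.0 / var_z

print(r_from_sd_z(2.0))  # SD of z-scores of 2.0
```

If the z-scores have an SD below 1 (less spread than luck alone predicts), the formula returns a negative number, which in practice would be read as no detectable signal.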

#1    Rally      2008/07/22 (Tue) @ 11:54

I think Pizza uses intraclass correlation to get an r or r^2 for more than two columns.

For example, instead of comparing babip for 2004 and 2005 and then for 2005 to 2006, he gets one result for 2004 to 2005 to 2006...and add more years if you like.

I don’t think it can be done in Excel, but I think he’s got more advanced statistical tools than that.

I’m sure Mr. Cutter will let me know if I’ve got that wrong.


#2    Eli      2008/07/22 (Tue) @ 12:12

I think what it does is something like the following:

Say you have one season’s worth of player stats, divided by game. You could find the correlation between each player’s first half stats and his second half stats (in some particular metric). You could also find the correlation between each player’s odd-numbered game stats and his even-numbered game stats. These are two ways of dividing the 162 games in half. But obviously there are a lot of other ways you could group them into two 81-game subsets. If you tried every possible division and found the correlation for each one, and then averaged all those correlations, I think you’d get something like the intraclass correlation. Actually, you’d really get Cronbach’s alpha, according to this:

(click name)

But the following suggests that Cronbach’s alpha is equivalent to the “stepped-up consistency version” of intraclass correlation, whatever that is:

http://en.wikipedia.org/wiki/Cronbach’s_alpha#Cronbach.27s_alpha_and_the_intra-class_correlation


#3    Tangotiger      2008/07/22 (Tue) @ 13:20

I really don’t think that’s what he’s doing, but I’m very confused so I don’t know.

It’s possible that what he’s doing is taking a random set of half the PAs in one season and doing a correlation to the other half, and then repeating for another random set, such that he gets a virtual infinite set of PA to correlate to the other half. Maybe that’s why he said not to add too many years.

Like I said, I’m super confused and invest a lot of time in what Pizza writes.  (Then again, I’m sure others might feel the same way about what I do!)
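The repeated random-split idea floated in the last two comments can be sketched like this. It is a toy simulation, not anyone's actual procedure: the player talents, the 400 PA per player, and all function names are made up for illustration, and the Spearman-Brown step-up is the standard formula for projecting a half-sample correlation to the full sample.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def average_split_half_r(outcomes_by_player, n_splits, rng):
    """Average the half-vs-half correlation over many random splits.

    outcomes_by_player: one list of 0/1 outcomes (one per PA) per player.
    """
    total = 0.0
    for _ in range(n_splits):
        first, second = [], []
        for outcomes in outcomes_by_player:
            shuffled = outcomes[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            first.append(sum(shuffled[:half]) / half)
            second.append(sum(shuffled[half:2 * half]) / half)
        total += pearson(first, second)
    return total / n_splits

def spearman_brown(r_half):
    """Step a half-sample correlation up to the full sample size."""
    return 2.0 * r_half / (1.0 + r_half)

# Simulated data: 60 hypothetical players, 400 PA each.
rng = random.Random(2008)
players = []
for _ in range(60):
    talent = min(max(rng.gauss(0.30, 0.05), 0.05), 0.95)
    players.append([1 if rng.random() < talent else 0 for _ in range(400)])

avg_r = average_split_half_r(players, 50, rng)
print(round(avg_r, 2), round(spearman_brown(avg_r), 2))
```

Averaging over many random splits removes the dependence on any one way of halving the sample, which is the appeal of the approach over a single first-half/second-half comparison.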


#4    MGL      2008/07/22 (Tue) @ 15:00

“It’s possible that what he’s doing is taking a random set of half the PAs in one season and doing a correlation to the other half, and then repeating for another random set,”

Without reading his description or poking around on the net, that sounds like a reasonable way of doing it.  More than reasonable.


#5    Pizza Cutter      2008/07/22 (Tue) @ 23:57

Tom, check your spam filter… I think that my epic novel on the subject of ICC may have been stopped for further questioning.


#6    Tangotiger      2008/07/23 (Wed) @ 01:26

Pizza, nothing there.  I hope that you made a copy of your treatise before posting here.


#7    Pizza Cutter      2008/07/23 (Wed) @ 13:13

The short version is “it’s kinda like a year-to-year correlation, only it uses more than two years.” In theory, it could expand to an infinite number of years.

ICC is part of a technique known as hierarchical linear modeling (HLM, acronym soup!).  HLM was originally designed for educational psychologists and analysts to look at what factors influence performance on standardized tests.  Scores will certainly measure something about the kid’s innate talent.  However, suppose the kid has a good (or awful) teacher.  You’d expect to see some increase (or decrease) in performance across all 30 kids in that classroom.  We might also see that all the kids at this particular school tend to do well, or that all the kids in this district do.  In political-speak, it’s a good way of assigning blame and credit for test scores.

ICC is just the lowest level of HLM.  It’s a measure of how much variance is explainable by individual variation in talent.  The specific ICC stat that I use is called AR(1) rho.  It’s designed to handle longitudinal data/multiple measures (like multiple years of data).

What happens is that you put in 3 or 4 or 1200 years’ worth of data.  The computer then creates a covariance matrix around these data.  For AR(1) specifically, the matrix incorporates an auto-regressive component.  The idea is that if I’ve hit 18 HR two years ago, and 20 last year, then a good bet on me, mathematically, is 22, knowing absolutely nothing else.  (I’m sure the actual algorithm is more sophisticated, but my mathematical knowledge ends there.) The idea though is that consistency isn’t based on hitting 20-20-20-20 HR a season, but staying true to a trend line.

What results is a number that looks and reads like a correlation, but is more than just your simple year-to-year.  It can be interpreted in much the way that a y-t-y can be (if you take the R-squared, you can say things like X% of the variance is consistent from year to year.)

The advantage of ICC is that it’s more stable than year to year, because it has more data available.  In the comments section of what you link to, I ran the y-t-y for different combinations of years for BABIP.  They ranged from .095 to .281.  Actually, had Voros done his study a year later, he might have seen .281, seen that it was in the ballpark of HR rate, and Sabermetric history might have been different.
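As a point of reference for the AR(1) structure mentioned above: in the textbook first-order autoregressive covariance structure, two measurements k periods apart are assumed to correlate at rho^k, so adjacent seasons correlate at rho, seasons two apart at rho^2, and so on. The sketch below only builds that implied correlation matrix for a given rho. It does not estimate rho from data, and it may not match what SPSS does internally; the function name is hypothetical.

```python
def ar1_correlation_matrix(rho, n_periods):
    """Correlation matrix implied by AR(1): entry (i, j) is rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_periods)]
            for i in range(n_periods)]

# Four seasons with an assumed rho of 0.5 (illustrative value only).
matrix = ar1_correlation_matrix(0.5, 4)
for row in matrix:
    print(row)
```

The single parameter rho is what lets one number summarize any number of seasons: the whole matrix is determined by it, however many years go in.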


#8    Tangotiger      2008/07/23 (Wed) @ 13:22

What level of data do you need?  For example, could you simply have 1 seasonal line per player, one season only?  Or, do you need to have at least 2 lines per player?  For example, if you look only at 2007 data, do you need at least to have some set of 300 PA for your player and a second set of 300 PA?  And, would more granular information, say one record per PA per player be better?


#9    Pizza Cutter      2008/07/23 (Wed) @ 15:42

In SPSS, you need one line per player per amount of time that you want to measure.  Since the player-season is the most common unit of analysis, I use that.  So, I have a line for David Wright 2005, David Wright 2006, David Wright 2007.  If you wanted to do Wright first half then Wright second half, those would be two separate lines, but equally valid.

You can get an ICC for one time point only, although it would not be AR(1).  You’d probably have to use a diagonal covariance matrix structure (I only sorta know what that means too...)


#10    wcw      2008/07/24 (Thu) @ 02:25

I really haven’t been paying much attention, but I can say one thing more-or-less objectively: spss is the devil.  Fire it and use R, or if you somehow are memory-constrained, Sas, or if you are both memory- and pocketbook-constrained (though this seems impossible, since you can afford spss), buggy-but-basically-functional-now pspp.

Please.  Ugh, spss.  It’s a network-effect catastrophe that rivals mysql.  I don’t know which I hate more—though I lean to spss.  And that speaks volumes for anyone who knows how terrible mysql is.


#11    Pizza Cutter      2008/07/24 (Thu) @ 10:21

I’ve never used R, but SAS is a programmer’s program, and I am not a programmer by any stretch of the imagination.  (In other words, SAS confuses me to no end.) And for what it’s worth, I can’t afford SPSS.  I’m a poor grad student.  I got a copy in exchange for doing some beta testing for them.


#12    wcw      2008/07/24 (Thu) @ 12:54

Aha.  Well, if you need menus and you’re a poor student, try R with R Commander.  And while spss’s scripting syntax probably is slightly more-parseable than sas’s, R code is to my mind much more readable and intuitive than either one.

R does have a learning curve, even with the gui helper library, but it’s a hundred times better than spss unless you need the few things spss does well: metadata, working on disk and not in memory, and pretty crosstabs.

