Monday, May 18, 2009
Another nail in the Hot Hand coffin
The Baseball Analysts has now become my 1A to the 1 that is Hardball Times. This time, it’s Sky who looks at how hitters perform after they’ve been on a 30-game hitting streak:
The “during streak” line is actually “performance from game 31 until his last hitless game” (as best as I can figure).


Since my response has nothing to do with the article, I thought I’d move it here.
MGL, null hypothesis significance testing isn’t as ridiculous as you make it out to be. Of course you can’t be a slave to a p-value, but don’t mock the baby with the bathwater.
A couple points:
(1) NHST didn’t develop in sociology or reflexology. It was developed in the late 19th and early 20th centuries in genetics, by people trying to extend the then-novel ideas in On the Origin of Species.
(2) The whole point of statistics is to take a large set of data and extract a lower-dimensional, meaningful summary. The myriad of observed data contain information, but we can’t see the patterns in all the noise. A p-value is the ultimate summary, taking all the data you observed and abstracting out a single number. By using *any* statistics, you are oversimplifying the problem, because oversimplifications are useful.
(3) There’s a reason why you’d never get a paper published claiming an interesting finding when there is a 25% chance that noise alone would produce a result that strong (and a much larger chance that the effect size is minimal). If you *really* care about getting the answer right, evidence that weak is utterly meaningless. You can’t confidently act on it one way or the other. These conclusions are really only useful to people (like sabermetricians) who have no practical use for the information they seek. No baseball team is going to change its behavior, with millions of $$ on the line, because of a p-value of 0.25.
I think this is part of the reason why much of the baseball establishment sees sabermetrics as relatively useless; studies like this are fun for the researcher, and fun for readers, but they are little more than entertainment. If we really care about increasing our understanding and guiding behavior, studies with p-values of 0.25 provide little to no information.
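To make the p-value point concrete, here is a rough sketch of a one-sided exact binomial test. Everything in it is an assumption for illustration: the helper name `binom_tail`, the .300 baseline average, the 100 at-bats, and the hit totals are all made up and have nothing to do with Sky's data. It only shows how much (or how little) a hitter has to beat his baseline before the p-value gets small.

```python
from math import comb


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    This is the one-sided p-value: the chance of seeing k or more
    successes in n trials if the true success rate is really p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, chosen only for illustration (not from the study).
n_at_bats = 100
baseline_avg = 0.300

for hits in (33, 35, 38, 40):
    p_value = binom_tail(hits, n_at_bats, baseline_avg)
    print(f"{hits} hits in {n_at_bats} AB: one-sided p = {p_value:.3f}")
```

Note what the number is: the probability of data at least this extreme if nothing is going on, not the probability that the finding is wrong. A p-value near 0.25 just says a no-effect hitter would look this good about a quarter of the time.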