Thursday, March 03, 2011
Tiring fastballs
First off, I love that Jeremy did this:
but only using the fastest 25% of pitches by every pitcher
I’ve talked about this before. And it’s the same argument I have about “average” HR distances. If you hit a lot of just-enough HR, this is going to bring down your average. So, even if you hit a lot of moon shots, by virtue of also having enough power to turn warning-track flyballs into just-enough HR, you bring down your overall average.
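To put made-up numbers on it (the distances and counts here are purely illustrative, not anyone's actual HR log), compare two hitters with the same ten moon shots, where the second one also muscles ten more balls just over the wall:

```python
from statistics import mean

# Hypothetical distances in feet, invented for illustration only.
moon_shots = [430] * 10      # both hitters hit these
just_enough = [350] * 10     # only the second hitter gets these out

hitter_a = moon_shots                  # 10 HR
hitter_b = moon_shots + just_enough    # 20 HR, same moon shots plus extras

avg_a = mean(hitter_a)
avg_b = mean(hitter_b)
```

Here `avg_a` is 430 and `avg_b` is 390. The second hitter has at least as much power and twice the HR, but the lower "average" distance.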
A similar thing happens with fastballs. We don’t know exactly which pitches are fastballs. Well, for a lot of them, it’s easy. But then there are those on the margins, and those on the margins are biased: they are pitches that are a couple of mph slower than the standard fastball. So, in order to bypass the borderline pitches, we instead take every pitcher’s fastest 25% of pitches. There are exceptions (Wakefield, Dickey), but those would be easy to spot. Otherwise, since every other pitcher throws some 30% to 90% fastballs, taking each pitcher’s fastest 25% of pitches and calling them his “top speed” pitches gives us an unbiased measure of how fast a pitcher throws. It may not be his “fastball”, but it is, by definition, his “top speed”.
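Here is a minimal sketch of that "top speed" measure. The function name `top_speed`, the `(pitcher, mph)` input layout, and the sample speeds are all my own assumptions for illustration; this is not Jeremy's code. Note that pitch type never enters into it, which is the whole point.

```python
from collections import defaultdict
from statistics import mean

def top_speed(pitches, fraction=0.25):
    """Average speed of each pitcher's fastest `fraction` of pitches.

    pitches: iterable of (pitcher, mph) pairs. Pitch type is ignored,
    so borderline fastball classifications cannot bias the result.
    """
    by_pitcher = defaultdict(list)
    for pitcher, mph in pitches:
        by_pitcher[pitcher].append(mph)

    result = {}
    for pitcher, speeds in by_pitcher.items():
        speeds.sort(reverse=True)                  # fastest first
        keep = max(1, int(len(speeds) * fraction)) # always keep at least one
        result[pitcher] = mean(speeds[:keep])
    return result

# Hypothetical sample: pitcher A has two borderline pitches (92, 91)
# that may or may not be "fastballs". They never make the top 25%.
sample = [
    ("A", 95.0), ("A", 94.0), ("A", 92.0), ("A", 91.0),
    ("A", 86.0), ("A", 84.0), ("A", 80.0), ("A", 78.0),
    ("B", 90.0), ("B", 89.0), ("B", 82.0), ("B", 75.0),
]
speeds = top_speed(sample)
```

With this sample, `speeds` comes out to 94.5 for A and 90.0 for B. Whether you call A's 92 and 91 mph pitches fastballs or not changes nothing, whereas an "average fastball" number would move depending on how you classified them.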
Anyway, so then Jeremy ups the ante and:
Then, I took the 25% fastest pitches for every pitcher at every pitch count
It’s not clear if Jeremy started with the 25% subset, and then further took 25% of that. I think that’s what he did. I’m not sure that that’s correct. The unbiased measure would be to take the top 25% of all his pitches, at every pitch count. I’m not totally sure, but I’m pretty sure. Anyway, so then he collects it league-wide:
This is a problem, in that after 100 pitches thrown, the sample is going to contain a disproportionate number of good pitchers. So, while Felix and Verlander make up a smaller portion of the early pitches, by the time pitches 101+ are collected, they make up a larger share. The solution here is to make sure each pitcher is equally weighted.
Of course the problem is that some pitchers won’t be represented at all (because they didn’t pitch there). And so now we have survivor bias.
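A rough sketch of what equal weighting looks like, and why it doesn't save you from survivor bias. Everything here is assumed for illustration: the pitcher names, the two pitch-count buckets, the speeds, and the helper `league_by_bucket`.

```python
from collections import defaultdict
from statistics import mean

def league_by_bucket(top_speeds):
    """Equal-weight league average per pitch-count bucket.

    top_speeds: {(pitcher, bucket): mph}, one top-speed value per pitcher
    per bucket. Returns {bucket: (average mph, number of pitchers)}.
    """
    per_bucket = defaultdict(list)
    for (pitcher, bucket), mph in top_speeds.items():
        per_bucket[bucket].append(mph)  # one entry per pitcher = equal weight
    return {b: (mean(v), len(v)) for b, v in sorted(per_bucket.items())}

# Made-up numbers: bucket 0 = early pitches, bucket 1 = pitches 101+.
top_speeds = {
    ("Ace", 0): 95.0, ("Ace", 1): 94.0,
    ("Scrub", 0): 89.0,   # never gets past 100 pitches
}
league = league_by_bucket(top_speeds)
```

In this toy example the league line goes from 92.0 (two pitchers) to 94.0 (one pitcher), so it looks like pitchers speed up late, even though the only pitcher who appears in both buckets lost 1 mph. Returning the pitcher count alongside the average at least makes the shrinking sample visible.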
The standard solution would be to do a delta pair comparison, and then “chain” the results (similar to what I did with aging charts, most recently with the NBA free throw).
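As a sketch of the delta-and-chain idea: for each pair of adjacent pitch-count buckets, look only at pitchers who appear in both, average their changes, and then accumulate those average changes from a zero baseline. The unweighted average of deltas is my simplifying assumption here (you could weight each pair by pitches thrown), and the names and speeds are invented.

```python
from collections import defaultdict
from statistics import mean

def chained_deltas(top_speeds):
    """Chain average within-pitcher changes across adjacent buckets.

    top_speeds: {(pitcher, bucket): mph}.
    Returns {bucket: cumulative mph change relative to the first bucket}.
    """
    by_pitcher = defaultdict(dict)
    for (pitcher, bucket), mph in top_speeds.items():
        by_pitcher[pitcher][bucket] = mph

    buckets = sorted({bucket for _, bucket in top_speeds})
    if not buckets:
        return {}

    chain = {buckets[0]: 0.0}
    for prev, nxt in zip(buckets, buckets[1:]):
        # Only pitchers present in BOTH buckets contribute a delta.
        deltas = [s[nxt] - s[prev] for s in by_pitcher.values()
                  if prev in s and nxt in s]
        if not deltas:
            break  # nobody spans this pair, so the chain stops here
        chain[nxt] = chain[prev] + mean(deltas)
    return chain

# Hypothetical top speeds per pitcher per bucket.
top_speeds = {
    ("Ace", 0): 95.0, ("Ace", 1): 94.0,
    ("Scrub", 0): 89.0,
    ("Horse", 0): 92.0, ("Horse", 1): 91.5, ("Horse", 2): 90.5,
}
chain = chained_deltas(top_speeds)
```

With these numbers the chain is 0.0, -0.75, -1.75 for buckets 0, 1, 2. Scrub never contributes a delta because he has no second bucket, so his absence later on can't make the league look faster.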
Anyway, I love the whole thing, mostly because of the simplicity of the approach. Anyone here can recreate what Jeremy did, and add their own improvements to his process, to get a stronger conclusion.

