Saturday, April 23, 2011
Why has no one done this research?
Pitch f/x guys, this is directed toward you… Unless I missed something…
In The Book, we show that batter/pitcher matchups (at least in less than enormous samples of data) have virtually no predictive value. We even show that batter results against certain classes of pitchers have little or no predictive value.
However, we seem to reserve the notion that certain types of batters may perform “better or worse than expected” against certain types of pitchers, beyond what we looked at in The Book. For example, we often give managers credit for “knowing” that a high ball hitter might do well against a high ball pitcher or a good curve ball hitter may do well against a pitcher who throws lots of curves. Etc.
So how about one of our pitch f/x guys doing this:
Establish two groups of pitchers - those that throw a lot of high pitches and those that throw a lot of low pitches. You don’t need to put all pitchers into one or the other group - just the extreme ones - maybe the top and bottom 10%. Then do the same for batters, only with 4 groups - those that perform well against the high pitch and those that don’t, and those that perform well against the low pitch and those that don’t.
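To make the grouping step concrete, here is a rough sketch in Python. The function name, the input dictionary (player name mapped to the share of pitches located high), and the 10% default are all assumptions for illustration; how "high" is defined from the pitch f/x data is left open.

```python
def extreme_groups(rate_by_player, fraction=0.10):
    """Return (bottom, top) sets of players ranked by a rate.

    rate_by_player is a hypothetical dict, e.g. pitcher -> share of
    pitches thrown high. Only the extremes are kept; everyone in the
    middle is ignored.
    """
    # Rank players from lowest rate to highest.
    ranked = sorted(rate_by_player, key=rate_by_player.get)
    # Size of each extreme group, at least one player.
    n = max(1, round(len(ranked) * fraction))
    return set(ranked[:n]), set(ranked[-n:])
```

The same function could be run on batters twice, once with their performance against high pitches and once against low pitches, to get the four batter groups.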
Now look at expected versus actual performance of each group of batters against both groups of pitchers. Expected would be their overall wOBA matched up (using odds ratio) with the overall wOBA of the pitcher, adjusted for handedness using platoon ratios.
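For anyone who wants the expected number spelled out, a minimal sketch of the odds ratio calculation follows. It assumes wOBA is treated like a rate between 0 and 1, that a league-average wOBA is the baseline, and that the batter and pitcher inputs have already been adjusted for handedness; the function names are made up for this example.

```python
def odds(rate):
    """Convert a rate between 0 and 1 into odds."""
    return rate / (1.0 - rate)


def expected_woba(batter_woba, pitcher_woba, league_woba):
    """Odds ratio estimate of a batter/pitcher matchup.

    Assumption: inputs are overall wOBA figures, already adjusted for
    handedness with whatever platoon ratios you prefer.
    """
    matchup_odds = odds(batter_woba) * odds(pitcher_woba) / odds(league_woba)
    # Convert the matchup odds back into a rate.
    return matchup_odds / (1.0 + matchup_odds)
```

A sanity check on the design: when the pitcher is exactly league average, the expected value is just the batter's own wOBA, and vice versa.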
Of course, you have to make sure that the data used for the groupings and the data used for the expected and actual performance are different.
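One simple way to keep the two data sets apart is to form the groups from some seasons and measure expected versus actual on the others. The sketch below assumes each plate appearance is a dict with hypothetical keys ("season", "batter_group", "pitcher_group", "actual", "expected"), where "actual" is the wOBA value of the event and "expected" is the matchup estimate; none of these names come from a real data feed.

```python
from collections import defaultdict


def split_by_season(plate_appearances, grouping_seasons):
    """Separate PAs used to form groups from PAs used to evaluate them."""
    grouping, evaluation = [], []
    for pa in plate_appearances:
        if pa["season"] in grouping_seasons:
            grouping.append(pa)
        else:
            evaluation.append(pa)
    return grouping, evaluation


def actual_minus_expected(evaluation_pas):
    """Average (actual - expected) wOBA for each batter/pitcher group pair."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # actual sum, expected sum, PA count
    for pa in evaluation_pas:
        entry = totals[(pa["batter_group"], pa["pitcher_group"])]
        entry[0] += pa["actual"]
        entry[1] += pa["expected"]
        entry[2] += 1
    return {key: (a - e) / n for key, (a, e, n) in totals.items()}
```

If there is anything to the idea, pairs like a high ball hitter against a high ball pitcher should show a difference that is not explained by sample noise.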
Do the same thing for different locations (say outside and inside pitches) and for different types of pitches.
Wouldn’t that pretty much settle the long-standing debate over whether a manager, scout, or anyone can recognize and utilize to their advantage a favorable or unfavorable matchup?


I’ve had similar thoughts about this. It seems like techniques from the machine learning/pattern recognition community could be useful here. If you identify a meaningful parameter space, perhaps using some of the parameters suggested here, you could run a clustering algorithm on the pitchers within that space. Ideally, there’d be “clumps” of pitchers that could be identified as belonging to a particular group (e.g. soft-tossing LHers who work low and away, RH fireballers, etc.). My guess is that the real world won’t clump so easily, but it would be very interesting to see what happens.
I anticipate that identifying the right parameter space would be the most challenging aspect of doing something comprehensive like that. The idea of trying inside/outside or high/low (one variable at a time) seems like a good starting point.
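As a rough illustration of the clustering idea, here is a bare-bones k-means in plain Python. Everything about it is an assumption for the sake of the example: each pitcher is a tuple of already standardized features (say, velocity and average pitch height), the number of clusters is picked by hand, and the starting centroids use a simple farthest-first rule rather than the random starts a real library would use.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Farthest-first start: begin with the first point, then keep adding
    # the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

With real pitchers the interesting question is whether the resulting clusters are tight and well separated or just arbitrary slices of a continuous cloud, which is the "won't clump so easily" worry.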