Thursday, March 03, 2011
Is BIS data on swings completely unreliable?
Mike makes a strong case:

When I first saw the Crawford map from BIS, I dismissed the scale as being unlikely. I mean, NO ONE strikes out in such a clear pattern. Look again at the scale that BIS provides: it’s a counting scale. It’s saying how often a pitch was thrown there (in this case, the 104 pitches that resulted in a strikeout). You will never get 104 of anything to look as smooth as what BIS is saying.
Mike provides the actual pitch-by-pitch location, and that’s what you expect: some sort of scatter plot, where if you strain yourself, you might see some pattern. And if you don’t want to strain yourself, you get R (and Mike) to do it for you to get some sort of pattern:
So, the only conclusion I can come to is that BIS is doing the following:
1. They are smoothing out the strikeouts
2. They are smoothing it out to such an extent that they get a very smooth pattern (no splotches)
3. And then their scale tries to show this pattern as a counting number (smoothed out rates times 104)
Except of course if you smooth it out that much, there’s no way you can get a cluster that represents some 40 or so strikeouts all in the low-and-away corner. By my count, there were 27. But there are also a lot of strikeouts low and low-and-in. If you smooth things out (or average out), those pitches plus some pitches medium and away might come together to form the cluster BIS sees.
In other words, this is what you might think is happening: if you were to throw 40 pitches all in the low-and-away corner, you’ll get 27 in that general area, plus another 5 or 7 or so low and center, and another 5 or 7 as away and middle. So when you smooth out, all 40 come together low-and-away.
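To make the smooth-then-scale idea concrete, here is a minimal sketch in Python. It is not BIS’s method, which I don’t know. It assumes a Gaussian kernel with an arbitrary bandwidth, a 10 x 10 grid, and 104 made-up pitch locations (not Crawford’s actual pitches). The function names are mine. The point is only that raw counts are whole numbers with plenty of empty cells, while the smoothed-and-rescaled version puts a fractional “count” everywhere.

```python
import math
import random

def raw_counts(points, grid_w, grid_h):
    """Count how many points fall in each 1x1 grid cell."""
    grid = [[0] * grid_w for _ in range(grid_h)]
    for px, py in points:
        gx = min(int(px), grid_w - 1)
        gy = min(int(py), grid_h - 1)
        grid[gy][gx] += 1
    return grid

def smooth_counts(points, grid_w, grid_h, bandwidth):
    """Gaussian-kernel smoothing onto the grid, rescaled so the
    cells sum to the number of points (smoothed rate times N)."""
    grid = [[0.0] * grid_w for _ in range(grid_h)]
    for px, py in points:
        for gy in range(grid_h):
            for gx in range(grid_w):
                # squared distance from the point to the cell center
                d2 = (gx + 0.5 - px) ** 2 + (gy + 0.5 - py) ** 2
                grid[gy][gx] += math.exp(-d2 / (2 * bandwidth ** 2))
    total = sum(sum(row) for row in grid)
    scale = len(points) / total
    return [[v * scale for v in row] for row in grid]

random.seed(0)
N = 104  # number of strikeout pitches in the Crawford chart
# Hypothetical locations, for illustration only
points = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(N)]

raw = raw_counts(points, 10, 10)
smoothed = smooth_counts(points, 10, 10, bandwidth=1.5)  # bandwidth is an assumption

empty_raw = sum(1 for row in raw for v in row if v == 0)
smoothed_total = sum(sum(row) for row in smoothed)
smoothed_min = min(min(row) for row in smoothed)

print("empty cells in raw counts:", empty_raw)
print("smoothed total:", round(smoothed_total, 6))
print("smallest smoothed cell:", smoothed_min)
```

The wider the bandwidth, the more each pitch bleeds into neighboring cells, which is how pitches from adjacent areas can end up contributing to a single cluster.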
BIS did note the following:
For instance, 67% of Crawford’s strikeouts came on pitches out of the zone
And, depending on where you draw the imaginary lines of the strike zone, it’s about correct. But the smoothing process simply wiped out the spread of the strikeouts, and it makes it look like they were concentrated in a handful of spots, and completely absent (as in zero according to the scale) in most others.
At the very least, Mike has uncovered that the BIS output is not representing what actually happened, but a smoothed-out version of what may have happened.
UPDATE: Looking at the BIS chart again, perhaps those numbers are not counting numbers, but percentages. But then I look at Pena’s chart, and neither makes much sense. Half the image is basically a “7”. I don’t see how it can be either a percentage or a count. The total number of pixels is 178 x 150 = 26,700. It looks like about one-third of those pixels are in the “7”. So, about 9,000 get “7”… what, percent? 9,000 x 7 percent is 63,000 percent, or 630 times the 100 percent available.
So, each number cannot apply to a single pixel; it would have to apply to a block of some 600 or more pixels. Either that, or the entire idea of putting a number on the scale just makes no real sense. After all, how can you have 10% of strikeouts in such a tightly controlled area?
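The pixel arithmetic can be checked with a few lines of Python. This is only a sketch. It assumes the image is 178 x 150 pixels, that exactly one-third of the pixels carry the “7” label (my eyeball estimate, rounded to 9,000 in the text), and that the label means 7 percent per pixel. The variable names are mine.

```python
width, height = 178, 150
total_pixels = width * height            # pixels in the whole image

# Assumption: one-third of the image is in the "7" band
pixels_at_7 = total_pixels // 3

# If every one of those pixels really stood for 7 percent of strikeouts
implied_total_percent = pixels_at_7 * 7

# How many pixels must share one "7" for that band alone to stay within 100 percent
pixels_per_cell = implied_total_percent / 100

print("total pixels:", total_pixels)
print("pixels in the 7 band:", pixels_at_7)
print("implied total percent:", implied_total_percent)
print("pixels per cell needed:", pixels_per_cell)
```

Under those assumptions the “7” band alone adds up to several hundred times 100 percent, so the number cannot be a per-pixel percentage.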

