THE BOOK--Playing The Percentages In Baseball


Thursday, March 03, 2011

Is BIS data on swings completely unreliable?

By Tangotiger, 10:24 AM

Mike makes a strong case:

When I first saw the Crawford map from BIS, I dismissed the scale as implausible.  I mean, NO ONE strikes out in such a clear pattern.  Look again at the scale that BIS provides: it’s a counting scale.  It’s saying how often a pitch was thrown there (in this case, among the 104 pitches that resulted in a strikeout).  You will never get 104 of anything to look as smooth as what BIS is showing. 

Mike provides the actual pitch-by-pitch locations, and they look like what you’d expect: some sort of scatter plot where, if you strain yourself, you might see some pattern.  And if you don’t want to strain yourself, you get R (and Mike) to do it for you to get some sort of pattern:

So, the only conclusion I can come to is that BIS is doing the following:
1. They are smoothing out the strikeouts
2. They are smoothing it out to such an extent that they get a very smooth pattern (no splotches)
3. And then their scale tries to show this pattern as a counting number (smoothed-out rates times 104)
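Those three steps can be sketched in a few lines. This is a minimal sketch under stated assumptions: the Gaussian kernel, the function name `smoothed_count_map`, the grid size, the bandwidth, and the pitch locations are all made up for illustration; nothing here is known to be what BIS actually does.

```python
import math
import random


def smoothed_count_map(points, nx, ny, x_range, y_range, bandwidth):
    """Hypothetical version of steps 1-3 above (not BIS's actual method).

    Each pitch is spread over an nx-by-ny grid with a Gaussian kernel
    (step 1); a large bandwidth yields a smooth, splotch-free map (step 2);
    the map is then normalized to rates summing to 1 and multiplied by the
    number of pitches, so the scale reads like a count (step 3).
    """
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / nx, (y1 - y0) / ny
    grid = [[0.0] * nx for _ in range(ny)]
    for row in range(ny):
        cy = y0 + (row + 0.5) * dy  # height of this cell's center
        for col in range(nx):
            cx = x0 + (col + 0.5) * dx  # horizontal position of the center
            grid[row][col] = sum(
                math.exp(-((px - cx) ** 2 + (py - cy) ** 2) / (2 * bandwidth**2))
                for px, py in points
            )
    total = sum(map(sum, grid))
    n = len(points)
    # Rates (cell / total) times the pitch count: the whole map adds up to n.
    return [[n * v / total for v in cells] for cells in grid]


if __name__ == "__main__":
    random.seed(1)
    # Made-up strikeout locations in feet (horizontal, height); NOT real data.
    pitches = [(random.gauss(0.8, 0.6), random.gauss(1.5, 0.6)) for _ in range(104)]
    smooth = smoothed_count_map(pitches, 8, 8, (-2.0, 2.0), (0.5, 4.5), bandwidth=0.7)
    for cells in reversed(smooth):  # print the top of the grid first
        print(" ".join(f"{v:4.1f}" for v in cells))
    print("total:", round(sum(map(sum, smooth)), 6))
```

Every cell ends up as a blend of many nearby pitches, and the grid always adds up to the 104 pitches, which is how a map can look both very smooth and count-like at the same time. A wider bandwidth gives a smoother map at the cost of blurring where the pitches really were.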

Except, of course, if you smooth it out that much, there’s no way you can get a cluster that represents some 40 or so strikeouts all in the low-and-away corner.  By my count, there were 27.  But there are also a lot of strikeouts low and low-and-in.  If you smooth things out (or average them out), those pitches, plus some pitches medium and away, might come together to form the cluster BIS sees.

In other words, this is what you might think is happening: if you were to throw 40 pitches all in the low-and-away corner, you’ll get 27 in that general area, plus another 5 to 7 or so low and center, and another 5 to 7 away and middle.  So when you smooth, all 40 come together low-and-away.
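To see how averaging can manufacture a corner cluster, here is a toy 3x3 grid using the illustrative numbers above: 27 in the corner and 7 for each of the "5 to 7" neighbors. The smoother is a hypothetical moving-window sum chosen only because it is easy to follow; it is not BIS's method, and every cell other than the three discussed is simply set to 0.

```python
def cross_window_sum(grid):
    """Replace each cell by its own count plus the counts directly above,
    below, left, and right of it (a crude moving-window smoother)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = grid[r][c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:  # stay on the grid
                    total += grid[rr][cc]
            out[r][c] = total
    return out


# Rows: high, middle, low. Columns: inside, center, away (hypothetical layout).
# Only the three cells from the example are filled in; all others are 0.
raw = [
    [0, 0, 0],
    [0, 0, 7],   # away and middle
    [0, 7, 27],  # low and center; low-and-away corner
]
smoothed = cross_window_sum(raw)
print("low-and-away, raw:", raw[2][2], "smoothed:", smoothed[2][2])
# low-and-away, raw: 27 smoothed: 41
print("grid total, raw:", sum(map(sum, raw)), "smoothed:", sum(map(sum, smoothed)))
# grid total, raw: 41 smoothed: 137
```

The corner now reads 41 even though only 27 strikeouts were actually there, and the grid as a whole reads 137 from 41 real strikeouts, because each pitch is counted once for its own cell and again for every neighbor.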

BIS did note the following:

For instance, 67% of Crawford’s strikeouts came on pitches out of the zone.

And, depending on where you draw the imaginary lines of the strike zone, that’s about correct.  But the smoothing process simply wiped out the spread of the strikeouts, making it look like they were concentrated in a handful of spots and completely absent (as in zero, according to the scale) in most others.

At the very least, Mike has uncovered that the BIS output does not represent what actually happened, but a smoothed-out version of what may have happened.

UPDATE: Looking at the BIS chart again, perhaps those numbers are not counts but percentages.  But then I look at Pena’s chart, and neither reading makes much sense.  Much of the image is basically a “7”.  I don’t see how that can be either a percentage or a count.  The image is 178x150 = 26,700 pixels, and it looks like about one-third of those pixels are in the “7”.  So about 9,000 pixels get “7”… what percent?  9,000 x 7 percent is 63,000 percent.

So what looks like one pixel would really have to be treated as some 630 pixels.  Either that, or putting a number on the scale just makes no real sense.  After all, how can you have 10% of strikeouts in such a tightly controlled area?
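The pixel arithmetic behind the UPDATE can be checked directly. This sketch assumes, as the eyeball estimate does, that exactly one-third of the 178x150 image shows a 7 and that the 7 means 7 percent; `percent_map_check` is a hypothetical helper, and exact fractions are used so nothing is lost to rounding (which is why it gives 8,900 pixels rather than the rounded 9,000).

```python
from fractions import Fraction


def percent_map_check(width, height, share_of_pixels, value_percent):
    """Sanity-check a heat map whose cells are supposed to be percentages.

    If every pixel carried its own percentage, all pixels together should
    add up to at most 100. Returns the pixel count, the number of pixels in
    the region, that region's summed percentage, and how many pixels would
    have to share a single value for the region to total exactly 100.
    """
    pixels = width * height
    region = pixels * share_of_pixels
    region_total = region * value_percent
    pixels_per_value = region_total / 100
    return pixels, region, region_total, pixels_per_value


pixels, region, total, per_value = percent_map_check(178, 150, Fraction(1, 3), 7)
print(pixels, region, total, per_value)  # 26700 8900 62300 623
```

Either way the conclusion is the same: summed pixel by pixel, the “7” region alone accounts for hundreds of times 100 percent, so the scale cannot be a per-pixel percentage.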

(29) Comments • 2011/03/04 • Sabermetrics, Ball_Tracking