Thursday, September 20, 2007
Data recorders fit square pegs into round holes
David’s fine article will serve as the impetus for my diatribe:
A “line drive” is not a “line drive”, as if it were some discrete play, separate from GB or FB. There is a continuum, with non-static demarcation points. Maybe we can agree on 70% of line drives being line drives. But a LD that just lands outside the infield dirt could easily be considered a “GB”, and a LD off the wall could easily be a “FB”. And two balls that land 200 feet apart should hardly be lumped into the same category, as they would be with “LD”. Overall, obviously, this effect is muted, but I would suspect that if you looked at the average distance of line drives by Juan Pierre and Albert Pujols, it would make a huge difference. There are three parameters that we care about:
1. distance and time of ball from bat to first object (ground, base, player, wall, etc)
2. distance and time of ball from first object to last object
3. distance and time of ball from bat through the infield
Ideally, number 2 would be “first to second”, “second to third”, .... “nth-1 to nth”. But I’d be happy with the above list. Those pretty much describe what I want (the third one is so that I know whether a ball a player caught on the fly might, had he missed it, have landed 100 feet behind him… asking for height of ball is probably asking for too much).
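To make the list concrete, here is a minimal sketch of what one recorded batted ball could look like if a recorder captured only those three distance-and-time pairs. The names `Segment` and `BattedBall`, the field names, the units (feet and seconds), and the sample numbers are all my own assumptions for illustration, not anything the data outfits actually use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One leg of the ball's path: how far it went and how long it took.

    Hypothetical structure; units assumed to be feet and seconds.
    """
    distance_ft: float
    time_s: float

    def __post_init__(self) -> None:
        if self.distance_ft < 0 or self.time_s <= 0:
            raise ValueError("distance must be >= 0 and time must be > 0")

    def avg_speed_fps(self) -> float:
        """Average speed over the leg, in feet per second."""
        return self.distance_ft / self.time_s


@dataclass(frozen=True)
class BattedBall:
    """The three parameters from the list above, and nothing else."""
    bat_to_first_object: Segment             # 1. bat to ground, base, player, wall...
    first_to_last_object: Optional[Segment]  # 2. None if the first object was also the last
    bat_through_infield: Segment             # 3. bat through the infield


# Made-up numbers, purely to show the shape of a record.
ball = BattedBall(
    bat_to_first_object=Segment(distance_ft=300.0, time_s=4.0),
    first_to_last_object=Segment(distance_ft=40.0, time_s=2.0),
    bat_through_infield=Segment(distance_ft=150.0, time_s=1.5),
)
```

Note that there is no “LD”, “GB”, or “FB” field anywhere in the record. Anything like that would be derived later from the distances and times.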
Instead, what we get is no time parameters at all. Hang time, people. Stopwatches. http://www.HitTrackerOnline.com has no problems with that. All we can do is infer time based on whether the ball was hard hit or not, and whether it was noted as a line drive or fly ball. We can’t tell the kind of groundball. There is just so much inference going on, and it’s ripe for bias in the data.
Distance and time. That’s all a data recorder needs to record. A data recorder should not analyze the data, and try to fit the square peg into a round hole. Let data analysts like me scr-w that up for them. It’s shocking to me how little change in data recording the major outfits (STATS, BIS, MLB.com) have made with respect to batted balls. And they’ve all been told.
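As a rough sketch of that division of labor: if the recorder stores only distance and time, the labeling becomes an analyst-side function whose cutoffs can be moved at will. The function name `label_batted_ball` and both default cutoffs below are hypothetical values I made up for illustration. They are not anyone's actual definitions, and a real analysis would likely use more than two numbers.

```python
def label_batted_ball(
    distance_ft: float,
    time_s: float,
    gb_max_distance_ft: float = 120.0,  # hypothetical cutoff, not a real standard
    ld_max_time_s: float = 2.5,         # hypothetical cutoff, not a real standard
) -> str:
    """Label a ball from recorded distance and time to the first object.

    The labels are an analyst's choice layered on top of the raw data;
    changing a cutoff relabels the ball without touching the recording.
    """
    if distance_ft <= gb_max_distance_ft:
        return "GB"
    if time_s <= ld_max_time_s:
        return "LD"
    return "FB"


# The same recorded ball (250 feet, 2.8 seconds) under two sets of cutoffs.
recorded = (250.0, 2.8)
label_a = label_batted_ball(*recorded)                     # default cutoffs
label_b = label_batted_ball(*recorded, ld_max_time_s=3.0)  # a looser line drive cutoff
print(label_a, label_b)
```

The two analysts disagree on the label, but they are working from the same recorded facts, which is the point: the disagreement lives in the analysis, not in the data.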