Thursday, April 14, 2011
Batted ball data: good or bad?
Here is a quote from this article entitled “Batted Balls and Home Runs” by Studes on BP. To be fair, the article is not about trashing the integrity of batted ball data.
I’ve played with batted ball statistics for a while now, just about as long as the Hardball Times has been around. Batted ball stats, as compiled by Baseball Info Solutions, are just plain cool. Knowing how often batters hit line drives, or how often pitchers force infield flies, adds a new understanding to the game and results in metrics like xFIP and xBABIP and xWHIP—the “x stats”—as well as advanced fielding stats.
But lately, batted ball stats have been taking some hits (get it?). Colin Wyers (formerly of THT and now researching everything at Baseball Prospectus), has identified several reasons why data recorders in different parks might interpret the angle of a batted ball differently. Colin has only gotten more skeptical over time; I think he recently referred to line drives as “lie drives” somewhere (Twitter?).
It’s not that we don’t generally know what a line drive is. It’s that the definition between different batted ball types is gray, and when you start looking at small samples of batted ball stats—for individual batters, say, or pitchers—there may be some significant differences in how specific balls are classified. You may not know what you think you know.
Many people, including the authors of “Bad Hops” (the infamous Hirsch Brothers), have lambasted advanced metrics that use detailed batted ball data, citing the bad quality of the data as one of the reasons why these metrics are no good. Of course describing the data as being of “bad quality” is not very helpful, since the quality of the data is a continuum. As well, it is usually small minds who reduce things to an either/or, black/white, or bad/good dichotomy in order to prove or support a thesis. Things are rarely that simple.
In any case, this article gives us some good data (which should not surprise you) which supports and evinces something I have been trumpeting for a long time:
We can start out with a defensive metric that simply gives credit for an outfielder catching a ball hit in the air or not, whether those air balls are in a particular zone assigned to that particular fielding position or whether we use the parameters of the batted ball location (again, not perfect data) to determine how often a fielder should have and does catch a particular ball or set of balls (along with any other parameters we may choose to use, like the perceived speed of ball, etc.).
That type of system would be better than a simple range factor, and would also (presumably) be better than a system which doesn’t have batted ball data but tries to infer it from more traditional data like the handedness and G/F ratio of the pitcher on the mound (and other things), like TotalZone or DRA.
Now, if we wanted to do even better than that (using location of air balls), we can try and break down those air balls into categories which approximate and reflect how long they are in the air, and thus how difficult they may be to catch, given a certain location (and the presumptive starting location of the fielder).
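To make the bookkeeping concrete, here is a minimal sketch in Python of that kind of system. Everything in it is hypothetical: the zone name, the two fielders "A" and "B", and the six plays are made up for illustration, and the baseline is simply the catch rate of all fielders in the sample (the fielder being rated included), which a real metric would handle more carefully.

```python
from collections import defaultdict

# Hypothetical play log: (zone, air-ball category, fielder, caught?)
PLAYS = [
    ("LCF gap", "line drive", "A", True),
    ("LCF gap", "line drive", "B", False),
    ("LCF gap", "fly ball", "A", True),
    ("LCF gap", "fly ball", "B", True),
    ("LCF gap", "line drive", "A", False),
    ("LCF gap", "line drive", "B", False),
]


def expected_catch_rates(plays):
    """Catch rate for each (zone, category) bucket across all fielders."""
    seen = defaultdict(int)
    made = defaultdict(int)
    for zone, category, _fielder, caught in plays:
        seen[(zone, category)] += 1
        made[(zone, category)] += int(caught)
    return {bucket: made[bucket] / seen[bucket] for bucket in seen}


def plays_above_expected(plays):
    """Credit each fielder with (1 if caught else 0) minus the bucket's catch rate."""
    rates = expected_catch_rates(plays)
    credit = defaultdict(float)
    for zone, category, fielder, caught in plays:
        credit[fielder] += int(caught) - rates[(zone, category)]
    return dict(credit)


credit = plays_above_expected(PLAYS)
```

In this toy sample the line drives are caught one time in four and the fly balls every time, so fielder A, who caught one of the line drives, comes out ahead of fielder B. Lumping both categories into one bucket would change the baseline each catch is measured against, which is the whole point of trying to split air balls by hang time.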
Now here is the important thing: It does not matter how you break down those air balls or how much integrity the resultant data contains, as long as what you categorize as a short time in the air is indeed shorter than what you characterize as a medium time in the air, which in turn ends up being in the air for a shorter time than what you categorize as the air ball with the longest hang time. IOW, even if there are all kinds of mistakes, even horrible ones, what you end up with is ALWAYS going to be better than not doing any categorization (of hang time) at all!
For example, let’s say that I decide to split up all the air balls into 3 categories, which is typical of some of the batted ball systems - pop fly, fly ball, and line drive (BIS uses more categories, adding fliner fly and fliner line drive). And let’s say that I am horrible at doing the categorization - I am half blind, I pay little attention while watching the game, and I can’t read my own handwriting after the session is over (and I am “strung out” half the time while “stringing”), such that 95% of my categorization is completely random (what I call a fly ball is equally likely to be any type of air ball), and only 5% is reasonably accurate. You may think that this is a total travesty and anyone relying on a system that uses such data is out of their mind. And you would be completely wrong. A system that uses such data will be BETTER than a system that treats all air balls equally. In fact, data from this irresponsible stringer might look something like this, if we had actually timed each batted ball in the air:
Type of air ball    Average time in the air, adjusted for distance
Fly balls           3.5 seconds
Pop flies           3.6 seconds
Line drives         3.4 seconds
Now, that is not data that is going to be real helpful, and if we had perfect categorization, or even a much better stringer, we might see something like:
Fly balls           3 seconds
Pop flies           4 seconds
Line drives         2 seconds
But, with our horrible stringer and data, we are still BETTER off than not using any sub-categories at all and treating all air balls alike. This is a very important point when it comes to rebutting the critics of advanced metrics when they start attacking the metric via the integrity of the data.
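For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the horrible stringer. The assumptions are mine and purely illustrative: the three true types are equally common, every ball of a type has exactly the "perfect" hang time from the second table (2, 3, and 4 seconds), and the stringer gets 5% of labels right on purpose while picking the other 95% uniformly at random. Nothing is simulated; the expected values are computed exactly.

```python
# Illustrative assumptions, not real data: "perfect stringer" hang times from
# the second table above, with the three true types assumed equally common.
TRUE_HANG = {"line drive": 2.0, "fly ball": 3.0, "pop fly": 4.0}
SHARE = {t: 1 / 3 for t in TRUE_HANG}
ACCURACY = 0.05  # 5% of labels are right on purpose; the rest are pure guesses


def p_label_given_true(label, true_type, accuracy=ACCURACY):
    """Chance the stringer writes `label` when the ball was really `true_type`."""
    random_part = (1 - accuracy) / len(TRUE_HANG)
    return random_part + (accuracy if label == true_type else 0.0)


def joint(label, true_type, accuracy=ACCURACY):
    """Probability that a ball is `true_type` AND gets recorded as `label`."""
    return SHARE[true_type] * p_label_given_true(label, true_type, accuracy)


def mean_hang_by_label(accuracy=ACCURACY):
    """Average true hang time of the balls that end up under each label."""
    means = {}
    for label in TRUE_HANG:
        weight = sum(joint(label, t, accuracy) for t in TRUE_HANG)
        total = sum(joint(label, t, accuracy) * TRUE_HANG[t] for t in TRUE_HANG)
        means[label] = total / weight
    return means


def mse_ignoring_labels():
    """Squared error when every air ball is assigned the overall mean hang time."""
    overall = sum(SHARE[t] * TRUE_HANG[t] for t in TRUE_HANG)
    return sum(SHARE[t] * (TRUE_HANG[t] - overall) ** 2 for t in TRUE_HANG)


def mse_using_labels(accuracy=ACCURACY):
    """Squared error when each ball is assigned the mean hang time of its label."""
    means = mean_hang_by_label(accuracy)
    return sum(
        joint(label, t, accuracy) * (TRUE_HANG[t] - means[label]) ** 2
        for label in TRUE_HANG
        for t in TRUE_HANG
    )
```

Under these assumptions the balls labeled line drive, fly ball, and pop fly average about 2.95, 3.00, and 3.05 seconds. The gaps are tiny, but the order is right, and the squared error from guessing hang time by label (about 0.6650) is lower than the error from treating all air balls alike (about 0.6667). Set the accuracy to zero and the two errors are equal, so in this setup the labels never make things worse.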
By the way, here is what Studes found with respect to home runs as classified by BIS, where the height and hang time are known almost exactly through Greg R’s hit tracker web site:
Type            Total   Apex (ft)   Distance (ft)   Apex/Distance ratio
Fliner liner       83          57             374                  0.15
Fliner fly       1328          72             392                  0.18
Fly              3151          95             400                  0.24
Grand total      4562          87             397                  0.22
The other thing is, if you look at that chart, or even if the numbers were different but still in the same sequence, can you tell whether the data is good, fair, bad, excellent, or otherwise? No. I suppose if you could get the real distance, angle, and hang time data on every ball (and not just the HR), you could establish a baseline for “perfect data,” but even with that, or absent that, calling the data “bad” because it is far from perfect, and assailing the metric because of that, is not helpful and is in fact wrong, for the reasons mentioned above.


On Football Outsiders, Aaron Schatz likes to say that “the best is the enemy of the better,” which is basically the same idea that you are espousing here.