THE BOOK--Playing The Percentages In Baseball

Thursday, September 20, 2007

Data recorders fit square pegs into round holes

By Tangotiger, 09:44 AM

David’s fine article will serve as the impetus for my diatribe:


A “line drive” is not a “line drive”, as if it were some discrete play, separate from a GB or FB.  There is a continuum, with non-static demarcation points.  Maybe we can agree on 70% of line drives being line drives.  But a LD that just lands outside the infield dirt could easily be considered a “GB”, and a LD off the wall could easily be a “FB”.  And two balls that land 200 feet apart should hardly be lumped into the same category, as they would be with a “LD”.  Obviously, overall, this effect is muted, but I would suspect that if you looked at the average distance of line drives by Juan Pierre and Albert Pujols, it would make a huge difference.

There are three parameters that we care about:
1. distance and time of ball from bat to first object (ground, base, player, wall, etc.)
2. distance and time of ball from first object to last object
3. distance and time of ball from bat through the infield

Ideally, that number 2 would be “first to second”, “second to third”, .... “(n-1)th to nth”.  But I’d be happy with the above list.  Those pretty much describe what I want (the third one is so that I know whether a ball that a player caught on the fly might, had he missed it, have landed 100 feet behind him; asking for the height of the ball is probably asking for too much).
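To make the list concrete, here is a minimal sketch of what one such record might look like.  The class and field names (Segment, BattedBall, distance_ft, time_s) are my own placeholders for illustration, not anything the data outfits actually use, and the sample numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One leg of the ball's path: how far it went and how long it took."""
    distance_ft: float
    time_s: float

    def avg_speed_fps(self) -> float:
        # Average speed over this leg, in feet per second.
        return self.distance_ft / self.time_s


@dataclass
class BattedBall:
    """Hypothetical record holding only the three parameters in the list."""
    bat_to_first_object: Segment                  # parameter 1
    first_to_last_object: Optional[Segment]       # parameter 2 (None if the ball stopped at the first object)
    bat_through_infield: Optional[Segment]        # parameter 3 (None if the ball never left the infield)


# Made-up example: a ball that first touches something 250 feet away after 2.5 seconds.
ball = BattedBall(
    bat_to_first_object=Segment(distance_ft=250.0, time_s=2.5),
    first_to_last_object=Segment(distance_ft=40.0, time_s=1.0),
    bat_through_infield=Segment(distance_ft=150.0, time_s=1.5),
)
speed = ball.bat_to_first_object.avg_speed_fps()
```

Note that nothing in the record says “LD”, “GB” or “FB”.  The record only holds what was measured.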

Instead, what we get is no time parameters at all.  Hang time, people.  Stopwatches.  http://www.HitTrackerOnline.com has no problems with that.  All we can do is infer time based on whether the ball was hard hit or not, and whether it was noted as a line drive or fly ball.  We can’t tell the kind of groundball.  There is just so much inference going on, and it’s ripe for bias in the data.

Distance and time.  That’s all a data recorder needs to record.  A data recorder should not analyze the data, and try to fit the square peg into a round hole.  Let data analysts like me scr-w that up for them.  It’s shocking to me how little the major outfits (STATS, BIS, MLB.com) have changed their data recording with respect to batted balls.  And they’ve all been told.
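As a rough illustration of leaving the labeling to the analyst, the sketch below assigns GB/LD/FB from raw distance and hang time.  The function name and both cutoff values are hypothetical, picked only for the example; the point is that the same recorded ball gets a different label once a cutoff moves.

```python
def label_batted_ball(distance_ft, hang_time_s,
                      gb_max_distance_ft=130.0, ld_max_hang_s=2.5):
    """Label a batted ball from raw measurements.

    Both cutoffs are placeholder assumptions, not established values.
    """
    # A ball that first touches down close to the plate is called a grounder.
    if distance_ft <= gb_max_distance_ft:
        return "GB"
    # Otherwise a short hang time makes it a line drive, a long one a fly ball.
    if hang_time_s <= ld_max_hang_s:
        return "LD"
    return "FB"


# Made-up ball: lands 140 feet out after 1.0 seconds in the air.
raw = (140.0, 1.0)
default_label = label_batted_ball(*raw)
shifted_label = label_batted_ball(*raw, gb_max_distance_ft=150.0)
```

With the default cutoffs this ball comes out as a “LD”; move the grounder cutoff out by 20 feet and it becomes a “GB”.  If the recorder had stored only the label, that choice would be baked in.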
