THE BOOK--Playing The Percentages In Baseball

Tuesday, November 27, 2007

PITCHf/x data quality

By Tangotiger, 01:37 PM

Mike Fast reports on more data quality issues:

...Chamberlain throws Lind five straight sliders to strike him out and end the inning. Unfortunately, however, the pitch locations recorded by PITCHf/x for these pitches were mistakenly attached to the wrong pitches in the Gameday XML data.... Then the order of the other pitches is out of whack, too. The pitch labeled #1 should be #5, #2 should be #1, #4 should be #2, and #5 should be #3.

I don’t intend my notation of this example in any way to disparage the incredible work that MLBAM and Sportvision have done in creating this data set and making it available to us. For free, no less. It’s an incredibly valuable resource, and some errors are to be expected during a season in which the system was being evaluated and debugged.

I just don’t know how prevalent these kinds of errors are and when they might call into question some of my conclusions. I do know that Eric Van spotted a similar error in Josh Beckett’s data from Game 1 of the division series, as detailed in this thread at Sons of Sam Horn, post #88. The PITCHf/x data in question for that game has since been removed from the data set altogether. 
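For anyone who wants to repair an at-bat like the one Mike describes, here is a minimal sketch of moving recorded locations onto the pitches they belong to. The function name `relabel_locations` and the placeholder location strings are hypothetical, not part of the Gameday data or any PITCHf/x tool. The mapping holds only the four corrections stated in the quote. No correction is given there for the pitch labeled #3, so the sketch reports it as unresolved rather than guessing.

```python
# Corrections as stated in the quote: recorded label -> pitch the location belongs to.
# No correction is stated for the pitch labeled #3, so it is deliberately absent.
STATED_CORRECTIONS = {1: 5, 2: 1, 4: 2, 5: 3}


def relabel_locations(recorded, corrections):
    """Move each recorded location to its corrected pitch number.

    recorded: dict mapping recorded pitch number -> location (any value).
    Returns (fixed, unresolved), where fixed maps corrected pitch number ->
    location and unresolved lists recorded labels with no stated correction.
    """
    fixed = {}
    unresolved = []
    for label in sorted(recorded):
        if label not in corrections:
            unresolved.append(label)
            continue
        target = corrections[label]
        if target in fixed:
            raise ValueError(f"two locations assigned to pitch #{target}")
        fixed[target] = recorded[label]
    return fixed, unresolved


# Placeholder locations (hypothetical, not real PITCHf/x values).
recorded = {n: f"location recorded as #{n}" for n in range(1, 6)}
fixed, unresolved = relabel_locations(recorded, STATED_CORRECTIONS)

for pitch in sorted(fixed):
    print(f"pitch #{pitch}: {fixed[pitch]}")
print("no stated correction for recorded labels:", unresolved)
```

Raising on a duplicate target is a design choice: if two locations end up on the same pitch, the correction list itself is suspect, and it is better to stop than to silently overwrite one of them.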

As long as a substantial percentage of the data is correct in the aggregate, that should satisfy most research needs. It's just something to keep in mind when looking at the data on an individual, pitch-by-pitch basis. And Mike is right that it's not every day that we get the luxury of seeing beta results in a production environment for the general public to consume.
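To make the aggregate-versus-individual point concrete, here is a small sketch with made-up numbers. The locations and the shuffle order are hypothetical, not real PITCHf/x values. It shows that when locations are attached to the wrong pitches within the same group, the group average is unchanged even though every individual record is wrong.

```python
from statistics import mean

# Hypothetical horizontal locations (feet) for five pitches; made-up numbers.
true_px = {1: -0.8, 2: 0.3, 3: 0.9, 4: -0.2, 5: 0.6}

# Simulate locations attached to the wrong pitches. The shuffle order is an
# arbitrary illustration, not the actual error from the Chamberlain at-bat.
wrong_order = [2, 3, 4, 5, 1]
recorded_px = {
    pitch: true_px[source]
    for pitch, source in zip(sorted(true_px), wrong_order)
}

aggregate_true = mean(true_px.values())
aggregate_recorded = mean(recorded_px.values())
pitches_wrong = [p for p in true_px if true_px[p] != recorded_px[p]]

print("aggregate unchanged:", aggregate_true == aggregate_recorded)
print("individual pitches with the wrong location:", pitches_wrong)
```

This only covers errors that shuffle locations inside the group being averaged. If a location lands on a pitch of a different type, or on another pitcher's pitch, the aggregates for those groups do shift.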


(3) Comments • 2008/06/04 • Sabermetrics • Ball_Tracking