THE BOOK--Playing The Percentages In Baseball

Thursday, January 04, 2007

The Next Leap In Data Recording and Distribution

I asked MLB.com about getting stopwatches for their scorers.  Here is what the Director of Stats had to say:


You’re absolutely right that we want to do more to clock the movement of the various objects on the field—not just the ball, but also the batter (as runner, too), up to three baserunners, nine fielders, even the bat itself during the swing—but I don’t think stopwatches are the way to do it. Very simply, that’s a flawed method of data capture, since you’re relying on human reaction times to create careful measurements of events that happen in milliseconds… far too imprecise.

We are working on a program for this season that will finally allow us to measure the speed, trajectory and location (at home plate) of all pitches in all ballparks, utilizing a series of high-speed cameras and proprietary tracking software. This will enable us for the first time to gather consistent, comparable data for all pitches in all games in all ballparks. Systems that are in place right now are too heavily dependent on human observation and subject to the variances in “slow guns” vs. “fast guns” from one ballpark to the next. In other words, a 92-mph fastball in one ballpark might be measured as 95 mph in another. So who’s right?

You may have seen the first limited rollout of this system during the 2006 postseason with our “Enhanced Gameday” feature, which displayed a graphical pitch trajectory and speed data for most games. There was some rancor over the fact that we clocked Joel Zumaya throwing 103 mph, but in reality, it doesn’t matter if it was 99 mph or 103 mph or anywhere in between. What’s more important is that if we had him throwing 103 in Detroit, we’d measure the same pitch at 103 in St. Louis or New York or anywhere else… as you well know, consistency of data and comparability of data sets is just as important (maybe more so even?) as sample size. The program we’re building will provide all of the above.

Ultimately we’re focusing our efforts on extending this system to record the relative movements of the other moving parts on the field, which I hope will someday allow us to generate much more precise and useful data on fielding, baserunning, pitching mechanics, etc. This may be another year or two from reality, or maybe even more, but our goal is to implement a system that will go far beyond the current limitations of what can be recorded. The pitch system launches this year, though… Hardware has already been installed in about half the ballparks around the league and we’re pushing our hardest to roll out in all 30 by opening day.
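The "slow guns" vs. "fast guns" problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MLB.com's method: the function name, the pitcher and park labels, and the second pitcher's readings are hypothetical, and the approach (compare each reading to that pitcher's own average, then average the differences by park) assumes every pitcher is clocked about equally often in each park and throws equally hard everywhere.

```python
from collections import defaultdict
from statistics import mean


def estimate_park_offsets(readings):
    """Estimate how fast or slow each park's gun reads.

    readings: list of (pitcher, park, mph) tuples (hypothetical layout).
    Returns {park: offset_mph}, where the offset is the average gap
    between a reading and that pitcher's overall mean reading.
    """
    # Step 1: each pitcher's mean reading across all parks.
    by_pitcher = defaultdict(list)
    for pitcher, _park, mph in readings:
        by_pitcher[pitcher].append(mph)
    pitcher_mean = {p: mean(v) for p, v in by_pitcher.items()}

    # Step 2: average the per-reading residuals within each park.
    residuals = defaultdict(list)
    for pitcher, park, mph in readings:
        residuals[park].append(mph - pitcher_mean[pitcher])
    return {park: mean(r) for park, r in residuals.items()}


if __name__ == "__main__":
    # Made-up readings: the same fastball shows 92 in one park, 95 in another.
    sample = [
        ("Pitcher A", "Slow Gun Park", 92.0),
        ("Pitcher A", "Fast Gun Park", 95.0),
        ("Pitcher B", "Slow Gun Park", 88.0),
        ("Pitcher B", "Fast Gun Park", 91.0),
    ]
    for park, offset in estimate_park_offsets(sample).items():
        print(f"{park}: {offset:+.1f} mph")
```

The offsets are only relative to each other; nothing here says which gun is "right", which is the same question the email raises.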

I then followed up with more questions, and his answers are interspersed.

I’m with you on everything you said, and agree with it wholeheartedly. If I wanted the ultimate, I’d do as you say.

Three issues:

1 - Will this data be available to the general analyst?

Yes, definitely, but not for free. We want to share this data but it’s going to be very cost-intensive to gather so we are looking at creating some sort of premium product or subscription service to help support it… I’m sure it won’t be prohibitively expensive at all, but we can’t afford to give it away.

2 - How will this data be parsed for the general analyst?  It’s one thing to record the data, it’s another to distribute it. I imagine this will be gigs of data?

I’m quite certain we have no idea yet HOW it will be provided. (grin) Only that it will be.

3 - Until this comes about, I’d rather the scorer not tell me “hard hit” “soft hit” “line drive”. Just tell me, more or less, hang time. I agree it is imprecise, but, without question, more data is better. Even if it’s only 80% reliable, as long as the error is random, any analyst can live with that.

I parse through NHL.com data, and there are tons of errors in there, as you can imagine. Try tracking 10 skaters who get off the ice every 45 seconds, in place of 10 other skaters. All of a sudden, one team has 7 players on the ice, when only 6 are permitted. Human recording is tough, agreed. BUT, what NHL.com offers, even with the unreliability, is a huge boon.

Trajectories, and hard/soft modifiers, are part of our standard scoring conventions so they are not going away, and I’m equally sure that we won’t be pursuing anything with stopwatches. The pitch data project has been six years in the making (no kidding) so we’ll take our time and do it right. Certainly sooner is better than later, and something is better than nothing, but there’s no inclination among anyone I’ve spoken to internally to rush to market with an inferior or flawed product.
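The claim in question 3, that an analyst can live with imprecise hang times as long as the error is random, can be illustrated with a small Python simulation. This is a sketch under assumptions, not a model of real scorers: the 3-to-6-second range of hang times, the 0.25-second stopwatch error, and the function name are all made up for the example. It compares zero-mean random error, which washes out of an average as the sample grows, with a constant bias, which does not.

```python
import random
import statistics


def simulate_mean_error(n_plays, noise_sd=0.25, bias=0.0, seed=0):
    """Return (true_mean, measured_mean) of hang time over n_plays fly balls.

    noise_sd: standard deviation of the random stopwatch error, in seconds
              (assumed value, for illustration only).
    bias:     constant error added to every reading, e.g. a scorer who
              always stops the watch late.
    """
    rng = random.Random(seed)
    # Hypothetical true hang times, uniform between 3 and 6 seconds.
    true_times = [rng.uniform(3.0, 6.0) for _ in range(n_plays)]
    # Each stopwatch reading = truth + constant bias + random error.
    measured = [t + bias + rng.gauss(0.0, noise_sd) for t in true_times]
    return statistics.mean(true_times), statistics.mean(measured)


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        true_mean, noisy_mean = simulate_mean_error(n)
        _, biased_mean = simulate_mean_error(n, bias=0.3)
        print(
            f"n={n:>6}: random-error gap {noisy_mean - true_mean:+.3f}s, "
            f"biased gap {biased_mean - true_mean:+.3f}s"
        )
```

With more plays the random-error gap should drift toward zero while the biased gap stays near the bias. That is the sense in which random error is tolerable, and it is also why the email puts so much weight on consistency from park to park.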

***
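The NHL.com problem mentioned in question 3, a team suddenly showing 7 players on the ice, is the kind of recording error a short script can flag. The Python sketch below assumes a hypothetical record layout of (team, player, start second, end second) for each shift; it is not NHL.com's actual format. It sweeps through shift starts and ends in time order and reports every moment a team's count climbs above the limit of 6 quoted above.

```python
from collections import defaultdict

MAX_ON_ICE = 6  # limit quoted in the post (goalie included)


def find_overcounts(shifts, max_on_ice=MAX_ON_ICE):
    """Flag moments when a team has too many players recorded on the ice.

    shifts: iterable of (team, player, start_sec, end_sec) tuples
            (hypothetical layout, for illustration only).
    Returns a list of (team, time_sec, count), one entry for each shift
    start that pushes the team's count above max_on_ice.
    """
    events = defaultdict(list)
    for team, _player, start, end in shifts:
        events[team].append((start, 1))   # player steps on
        events[team].append((end, -1))    # player steps off

    problems = []
    for team, team_events in events.items():
        # At equal times, process exits (-1) before entries (+1) so a
        # clean line change is not reported as an overcount.
        team_events.sort(key=lambda e: (e[0], e[1]))
        on_ice = 0
        for time_sec, delta in team_events:
            on_ice += delta
            if delta == 1 and on_ice > max_on_ice:
                problems.append((team, time_sec, on_ice))
    return problems


if __name__ == "__main__":
    # Six players on for 0-45s, plus a seventh recorded from 30s to 60s.
    sample = [("Home", f"P{i}", 0, 45) for i in range(6)]
    sample.append(("Home", "P6", 30, 60))
    for team, time_sec, count in find_overcounts(sample):
        print(f"{team}: {count} on ice at {time_sec}s")
```

A check like this only finds the error; deciding which shift was recorded wrong still takes judgment or another data source.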

My hope has always been that it’s better to give the data to one thousand analysts rather than a handful.  It seems that MLB.com is working toward that goal.

(24) Comments • 2007/05/04 • SabermetricsBall_TrackingData