Thursday, January 04, 2007
The Next Leap In Data Recording and Distribution
I asked MLB.com about getting stopwatches for their scorers. Here is what the Director of Stats had to say:
You’re absolutely right that we want to do more to clock the movement of the various objects on the field—not just the ball, but also the batter (as runner, too), up to three baserunners, nine fielders, even the bat itself during the swing—but I don’t think stopwatches are the way to do it. Very simply, that’s a flawed method of data capture, since you’re relying on human reaction times to create careful measurements of events that happen in milliseconds… far too imprecise.
We are working on a program for this season that will finally allow us to measure the speed, trajectory and location (at home plate) of all pitches in all ballparks, utilizing a series of high-speed cameras and proprietary tracking software. This will enable us for the first time to gather consistent, comparable data for all pitches in all games in all ballparks. Systems that are in place right now are too heavily dependent on human observation and subject to the variances in “slow guns” vs. “fast guns” from one ballpark to the next. In other words, a 92-mph fastball in one ballpark might be measured as 95 mph in another. So who’s right?
You may have seen the first limited rollout of this system during the 2006 postseason with our “Enhanced Gameday” feature, which displayed a graphical pitch trajectory and speed data for most games. There was some rancor over the fact that we clocked Joel Zumaya throwing 103 mph, but in reality, it doesn’t matter if it was 99 mph or 103 mph or anywhere in between. What’s more important is that if we had him throwing 103 in Detroit, we’d measure the same pitch at 103 in St. Louis or New York or anywhere else… as you well know, consistency of data and comparability of data sets is just as important (maybe more so even?) as sample size. The program we’re building will provide all of the above.
Ultimately we’re focusing our efforts on extending this system to record the relative movements of the other moving parts on the field, which I hope will someday allow us to generate much more precise and useful data on fielding, baserunning, pitching mechanics, etc. This may be another year or two from reality, or maybe even more, but our goal is to implement a system that will go far beyond the current limitations of what can be recorded. The pitch system launches this year, though… Hardware has already been installed in about half the ballparks around the league and we’re pushing our hardest to roll out in all 30 by opening day.
I then followed up with more questions, and his answers are interspersed.
I’m with you on everything you said, and agree with it wholeheartedly. If I wanted the ultimate, I’d do as you say.
Three issues:
1 - Will this data be available to the general analyst?
Yes, definitely, but not for free. We want to share this data but it’s going to be very cost-intensive to gather so we are looking at creating some sort of premium product or subscription service to help support it… I’m sure it won’t be prohibitively expensive at all, but we can’t afford to give it away.
2 - How will this data be parsed for the general analyst? It’s one thing to record the data, it’s another to distribute it. I imagine this will be gigs of data?
I’m quite certain we have no idea yet HOW it will be provided.
Only that it will be.
3 - Until this comes about, I’d rather the scorer not tell me “hard hit” “soft hit” “line drive”. Just tell me, more or less, hang time. I agree it is imprecise, but, without question, more data is better. Even if it’s only 80% reliable, as long as the error is random, any analyst can live with that.
I parse through NHL.com data, and there are tons of errors in there, as you can imagine. Try tracking 10 skaters who get off the ice every 45 seconds, in place of 10 other skaters. All of a sudden, one team has 7 players on the ice, when only 6 are permitted. Human recording is tough, agreed. BUT, what NHL.com offers, even with the unreliability, is a huge boon.
Trajectories, and hard/soft modifiers, are part of our standard scoring conventions so they are not going away, and I’m equally sure that we won’t be pursuing anything with stopwatches. The pitch data project has been six years in the making (no kidding) so we’ll take our time and do it right. Certainly sooner is better than later, and something is better than nothing, but there’s no inclination among anyone I’ve spoken to internally to rush to market with an inferior or flawed product.
***
My hope has always been that it’s better to give the data to one thousand analysts rather than a handful. It seems that MLB.com is working toward that goal.
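To make the random-versus-systematic error point from issue 3 concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the hang-time range, the 0.15-second reaction-time noise, and the 0.30-second "slow gun"-style offset are made-up numbers, not anything measured by MLB.com or anyone else. It only shows that random stopwatch error tends to average out over many plays, while a consistent offset at one park does not.

```python
import random
import statistics


def simulate_stopwatch(true_times, noise_sd=0.15, bias=0.0, seed=0):
    """Return hypothetical stopwatch readings for a list of true hang times.

    noise_sd: assumed random human reaction-time error, in seconds.
    bias:     assumed constant offset (a scorer or park that always reads
              long or short), in seconds.
    """
    rng = random.Random(seed)
    return [t + bias + rng.gauss(0.0, noise_sd) for t in true_times]


def mean_error(measured, true_times):
    """Average signed difference between measured and true times."""
    return statistics.fmean(m - t for m, t in zip(measured, true_times))


# Hypothetical "true" hang times for 5000 batted balls, 1 to 6 seconds.
rng = random.Random(42)
true_hang = [rng.uniform(1.0, 6.0) for _ in range(5000)]

# Scorer A: random error only. Scorer B: same random error plus a fixed offset.
random_only = simulate_stopwatch(true_hang, noise_sd=0.15, bias=0.0, seed=1)
biased = simulate_stopwatch(true_hang, noise_sd=0.15, bias=0.30, seed=1)

print("mean error, random noise only:", round(mean_error(random_only, true_hang), 3))
print("mean error, with fixed offset:", round(mean_error(biased, true_hang), 3))
```

Under these assumptions, the first average lands close to zero and the second stays close to the offset. That is the analyst's side of the argument (random error washes out with sample size) and the MLB.com side (inconsistency between parks does not) in one small example.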
The same video system and software that MLB will be using to measure pitch speed and movement should be able to measure batted ball speed and vector. Did you ask your contact about their plans to provide that information as well?