THE BOOK--Playing The Percentages In Baseball

Thursday, January 04, 2007

The Next Leap In Data Recording and Distribution

By Tangotiger, 11:27 PM

I asked MLB.com about getting stopwatches for their scorers.  Here is what the Director of Stats had to say:


You’re absolutely right that we want to do more to clock the movement of the various objects on the field—not just the ball, but also the batter (as runner, too), up to three baserunners, nine fielders, even the bat itself during the swing—but I don’t think stopwatches are the way to do it. Very simply, that’s a flawed method of data capture, since you’re relying on human reaction times to create careful measurements of events that happen in milliseconds… far too imprecise.

We are working on a program for this season that will finally allow us to measure the speed, trajectory and location (at home plate) of all pitches in all ballparks, utilizing a series of high-speed cameras and proprietary tracking software. This will enable us for the first time to gather consistent, comparable data for all pitches in all games in all ballparks. Systems that are in place right now are too heavily dependent on human observation and subject to the variances in “slow guns” vs. “fast guns” from one ballpark to the next. In other words, a 92-mph fastball in one ballpark might be measured as 95 mph in another. So who’s right?

You may have seen the first limited rollout of this system during the 2006 postseason with our “Enhanced Gameday” feature, which displayed a graphical pitch trajectory and speed data for most games. There was some rancor over the fact that we clocked Joel Zumaya throwing 103 mph, but in reality, it doesn’t matter if it was 99 mph or 103 mph or anywhere in between. What’s more important is that if we had him throwing 103 in Detroit, we’d measure the same pitch at 103 in St. Louis or New York or anywhere else… as you well know, consistency of data and comparability of data sets is just as important (maybe more so even?) as sample size. The program we’re building will provide all of the above.

Ultimately we’re focusing our efforts on extending this system to record the relative movements of the other moving parts on the field, which I hope will someday allow us to generate much more precise and useful data on fielding, baserunning, pitching mechanics, etc. This may be another year or two from reality, or maybe even more, but our goal is to implement a system that will go far beyond the current limitations of what can be recorded. The pitch system launches this year, though… Hardware has already been installed in about half the ballparks around the league and we’re pushing our hardest to roll out in all 30 by opening day.

I then followed up with more questions, and his answers are interspersed.

I’m with you on everything you said, and agree with it wholeheartedly. If I wanted the ultimate, I’d do as you say.

Three issues:

1 - Will this data be available to the general analyst?

Yes, definitely, but not for free. We want to share this data but it’s going to be very cost-intensive to gather so we are looking at creating some sort of premium product or subscription service to help support it… I’m sure it won’t be prohibitively expensive at all, but we can’t afford to give it away.

2 - How will this data be parsed for the general analyst?  It’s one thing to record the data, it’s another to distribute it. I imagine this will be gigs of data?

I’m quite certain we have no idea yet HOW it will be provided. (grin) Only that it will be.

3 - Until this comes about, I’d rather the scorer not tell me “hard hit” “soft hit” “line drive”. Just tell me, more or less, hang time. I agree it is imprecise, but, without question, more data is better. Even if it’s only 80% reliable, as long as the error is random, any analyst can live with that.

I parse through NHL.com data, and there are tons of errors in there, as you can imagine. Try tracking 10 skaters who get off the ice every 45 seconds, in place of 10 other skaters. All of a sudden, one team has 7 players on the ice, when only 6 are permitted. Human recording is tough, agreed. BUT, what NHL.com offers, even with the unreliability, is a huge boon.

Trajectories, and hard/soft modifiers, are part of our standard scoring conventions so they are not going away, and I’m equally sure that we won’t be pursuing anything with stopwatches. The pitch data project has been six years in the making (no kidding) so we’ll take our time and do it right. Certainly sooner is better than later, and something is better than nothing, but there’s no inclination among anyone I’ve spoken to internally to rush to market with an inferior or flawed product.

***

My hope has always been that it’s better to give the data to one thousand analysts rather than a handful.  It seems that MLB.com is working toward that goal.

#1    Peter Jensen      (see all posts) 2007/01/05 (Fri) @ 00:23

The same video system and software that MLB will be using to measure pitch speed and movement should be able to measure batted ball speed and vector.  Did you ask your contact about their plans to provide that information as well?


#2    John Beamer      (see all posts) 2007/01/05 (Fri) @ 02:58

Wow. That’s certainly quite exciting. I must say I was fairly impressed by some of the Enhanced Gameday stuff that we saw in the postseason, however, there are still a couple of question marks over data.

Those who have studied baseball physics have calculated that the ball will slow 7-8 mph from the point of release to the point of crossing the plate.  The mlb software showed a slowing down of 10mph in many cases—this may be right, but a 25% discrepancy is worrying.

I’m not sure whether the software will measure batted ball speed. Cricket has had a similar technology for a few years now (called Hawkeye) that measures the trajectory of a bowled ball. However, because the special cameras are confined to focusing on the batter, once the ball is out of a pre-defined zone it can’t be tracked. To track everything (accurately) on the field would require a lot of technology.

Anyway, everything the mlb Director of Stats said is very encouraging. Let’s hope it happens sooner rather than later!


#3    tangotiger      (see all posts) 2007/01/05 (Fri) @ 08:54

Peter, my takeaway is that whatever data they record, they will provide.


#4    Peter Jensen      (see all posts) 2007/01/05 (Fri) @ 10:56

Beamer - The technology doesn’t have to track the ball very far off the bat to calculate batted ball speed.  Less than the distance to the pitcher should do it and the cameras need to be able to cover that area to calculate the pitch speed.  After it leaves the bat the ball will slow down at a constant rate due to air resistance until it hits something.


#5    Joe Arthur      (see all posts) 2007/01/06 (Sat) @ 09:47

Peter

I am not a physicist, but the drag force from air resistance is said to vary as the square of the velocity of the ball, so the ball will not exactly slow down at a constant rate. This site is not available at the moment, but should discuss it. Adair’s Physics of Baseball has a graph on p.8 of drag force v. velocity.
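
To put a rough number on the drag point above, here is a minimal sketch. It assumes a drag deceleration of the form a = -k * v^2 and ignores gravity, in which case speed decays exponentially with distance travelled: v(x) = v0 * exp(-k * x). The drag coefficient, air density, and ball mass and radius below are illustrative assumptions, not values taken from MLB's system or from Adair.

```python
import math

FT_TO_M = 0.3048  # feet to metres


def speed_after_distance(v0_mph, distance_ft, drag_coeff=0.35,
                         air_density=1.2, ball_mass=0.145, ball_radius=0.0366):
    """Speed (mph) after travelling distance_ft under quadratic drag only.

    From dv/dt = -k * v**2 it follows that dv/dx = -k * v,
    so v(x) = v0 * exp(-k * x). All physical constants are assumed values
    in SI units (kg/m^3, kg, m).
    """
    area = math.pi * ball_radius ** 2                        # cross-section, m^2
    k = 0.5 * air_density * drag_coeff * area / ball_mass    # per metre
    return v0_mph * math.exp(-k * distance_ft * FT_TO_M)


if __name__ == "__main__":
    for release in (80, 95):
        at_plate = speed_after_distance(release, 55)
        print(f"{release} mph at release -> {at_plate:.1f} mph after 55 ft")
```

Under this model the loss in mph scales with the release speed, so the figure you get depends on both the assumed drag coefficient and how hard the pitch was thrown.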

I thought I remembered reading that the system used by enhanced Gameday is taking two “snapshots” of the ball’s flight - near the pitcher’s release point and near the plate, so they aren’t tracking the flight of the ball continuously between pitcher and batter. Sounds like John also understands it this way. I’d assume they grab that final velocity on the pitch pre-collision. To do it consistently, I’d think they have to measure it slightly before the possibility of collision (maybe a foot in front of the plate?). It looks like they don’t need to track the ball over any significant distance to measure velocity, but it must also be at least somewhat more challenging to capture the post-collision velocity.  There are a lot more trajectories for the batted ball than the pitched ball and therefore a lot more places to start looking for the post-collision ball. I’d like to see batted ball velocity and vector too, but the Director of Statistics doesn’t list it explicitly as something they’ll eventually try to do, and I wonder if it’s because it’s particularly difficult to do.


#6    tangotiger      (see all posts) 2007/01/06 (Sat) @ 11:50

Here’s the email response I received from my contact:

To the post by Joe Arthur at 8:47 a.m. on January 6, the system we’re installing actually uses three high-speed cameras which capture approximately 60 images per second, so we’re hoping to record about 30 data points for each pitch. (The cameras work together to locate the pitch so it takes all three to record one image.) We track the pitch from about 55 feet away from home plate—roughly the pitcher’s release point—to the front edge of home plate, so this allows us to determine the pitch’s trajectory, etc., with great accuracy along the entire path.
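
As a rough check on the sample count in the email, here is a small sketch that assumes the pitch covers the tracked distance at a constant average speed (it does not quite, given drag). The function name and its defaults are illustrative only.

```python
def frames_captured(avg_speed_mph, tracked_ft=55.0, fps=60.0):
    """Approximate number of frames recorded while the ball covers tracked_ft.

    Assumes a constant average speed over the tracked distance.
    """
    feet_per_second = avg_speed_mph * 5280 / 3600
    flight_seconds = tracked_ft / feet_per_second
    return flight_seconds * fps


if __name__ == "__main__":
    for mph in (75, 90, 100):
        print(f"{mph} mph average -> {frames_captured(mph):.1f} frames")
```

At a 90 mph average this gives about 25 frames, so "about 30 data points" corresponds to a somewhat slower average speed over the 55 feet.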

As an aside, what’s been amazing to me about this program is what we’ve learned from the data we captured last season. That is, we found out that what we thought we understood about pitch movement has been, for lack of a better word, wrong. Think about how most fans observe pitches: on TV, through the center field camera. However, think about the challenges of accurately judging the pitch this way: you’re trying to follow a 4-inch wide ball from a distance of 400 or more feet, scaled down onto a 27-inch TV screen or 17-inch computer monitor, or whatever your viewing screen might be. And don’t forget that the camera is offset from center by an unknown amount that varies in each ballpark. This creates massive scaling errors in the human mind… for instance, we discovered that in many cases, a pitch that looks like it just missed the black may actually have been 8 to 10 inches outside.

I think this is going to be even harder for people to grasp than the notion of the pitch’s “actual speed”. To John Beamer’s post at 1:58 a.m. on January 5, we recorded a difference of 7 to MPH between the release speed and home plate speed, in most cases. This was true whether or not it was Joel Zumaya pitching or Chad Bradford.

Anyway, this is exciting stuff for me too, because I’m eager to see what will be discovered when we are able to get this data into the hands of researchers much smarter than myself. I appreciate everyone’s patience, but rest assured, it’s coming…

Again, more indications that the data will not require a filing of the Freedom of Information Act.  MLBAM looks like it can be a model for many internet companies.


#7    Peter Jensen      (see all posts) 2007/01/06 (Sat) @ 12:34

Joe - Of course you are absolutely right about the effect of air resistance.  I misspoke.  I said constant rate when what I meant to say was a predictable rate.  Doesn’t change my assessment of whether the system should be able to track hit speed.  Tango’s contact described the system as I had heard it described before.  A three camera video system should have the hit ball remain in its field of view for the 2 or 3 frames necessary to calculate hit ball speed even though the trajectories vary more than the pitched balls.  At 60 frames per second that’s only about 8 to 10 feet from where it is hit by the bat.
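
The "two or three frames" idea can be sketched as a distance-over-time calculation. This assumes the system yields one 3-D ball position per frame, in feet, at a fixed frame rate; the function and its input format are hypothetical, not MLBAM's actual output.

```python
import math


def speed_from_frames(positions_ft, fps=60.0):
    """Average speed in mph from consecutive 3-D positions, one per frame.

    positions_ft: sequence of (x, y, z) tuples in feet, in frame order.
    Needs at least two positions; more frames average out position noise.
    """
    if len(positions_ft) < 2:
        raise ValueError("need at least two positions")
    path_ft = sum(math.dist(a, b) for a, b in zip(positions_ft, positions_ft[1:]))
    elapsed_s = (len(positions_ft) - 1) / fps
    return path_ft / elapsed_s * 3600 / 5280


if __name__ == "__main__":
    # Hypothetical positions for a ball moving 2.5 ft per frame along x.
    track = [(0.0, 0.0, 3.0), (2.5, 0.0, 3.0), (5.0, 0.0, 3.0)]
    print(f"{speed_from_frames(track):.1f} mph")
```

With only two or three frames, any error in the measured positions feeds directly into the speed estimate, which is one reason extra frames help when they are available.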


#8    Joe Arthur      (see all posts) 2007/01/06 (Sat) @ 13:28

Tango,

thanks for following up.

Not being an engineer either, (to Peter’s point) I still wonder how easily MLBAM could use the same 3 cameras to track the batted ball; the cameras would have to be able to stop and reverse direction smoothly and quickly, and track over a larger range of speeds than a pitch. And the cameras might have to be able to pan upward to track balls with a high launch angle. A ball leaving the bat just over 100 mph initially would travel about 2 1/2 feet per frame; as Peter suggests, to get a clean distance over time calculation, you need at least 2 full frames after initial contact to guarantee the time interval, which could put the ball in the vicinity of 8 feet almost straight up from contact. And of course the ball can be beaten into the ground too; you can’t accurately measure the speed of those balls with this time resolution.

One way or another, I think tracking the batted ball is a more challenging problem, and practically speaking it might require more cameras.


#9    Peter Jensen      (see all posts) 2007/01/06 (Sat) @ 14:26

Joe - My understanding is that the cameras are fixed in position and fixed focus, so there is no direction to change or panning involved.  They are synchronized so that the frames occur at the same time in each camera, and the calculations involve triangulation of the ball’s exact position using the ball’s relation to fixed targets in the cameras’ field of view, accomplished wholly through software.  I believe that the cameras are far enough away and their field of view is large enough that few if any balls would be out of frame in 8 to 10 feet. But even if some were lost, wouldn’t you want the information on the rest?  The cameras are presumably running all the time, so it would just be a matter of having the software calculate another data point after the ball leaves the bat.


#10    Peter Jensen      (see all posts) 2007/01/06 (Sat) @ 14:37

You know, it’s possible that MLB doesn’t think that anybody would be that interested in batted ball speed and vector.  Since Tango is in communication with them it doesn’t hurt to ask whether it is technically feasible with the system they are installing.  It may be possible without any additional changes.  But it might also be that it could be made possible with a very simple change like moving the targets that they use for triangulation.  Now is the time to express our desires, before the system is completely installed and any changes would require additional time and money.  We won’t get what we need unless someone asks for it.


#11    John Beamer      (see all posts) 2007/01/06 (Sat) @ 15:44

Peter,

I think in principle you are right. However, my understanding of the system they use in other sports is that the cameras track ball position, nothing else. I have no doubt that we can get batted ball speed and vector.

To accurately know where the batted ball will land we also need to know spin, atmospheric conditions and probably a few other factors as well. A complete treatment requires cameras covering the whole field.


#12    tangotiger      (see all posts) 2007/01/06 (Sat) @ 16:38

I don’t care so much about batted ball speed, but rather hang time.  After all, that is really what we care about: where is a fielder and how long does he have to intersect with the ball’s flight (and does he need to jump, or how many bounces to get to him).

The only reason we care about thrown ball speed is because we can easily convert that to “time to plate”.  Even then, because of some of those nasty curveballs, I’d rather have the “time to plate” rather than thrown ball speed.  Even better would be: “feet from plate thrown ball is, when ball is 0.20 seconds away from crossing plate”.

This way, if a Zumaya fastball is 25 feet away, or a Wakefield knuckler is 40 feet away, that’s more interesting, since the batter will swing based on how much time he thinks the ball will take to cross the plate.
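
The "feet from plate at 0.20 seconds out" measure could be read straight off the tracked samples. A minimal sketch, assuming each pitch comes as a list of (time in seconds, distance from the plate in feet) samples whose last sample is at the plate; that format and the function name are assumptions for illustration.

```python
def feet_from_plate_at(track, seconds_before=0.20):
    """Distance from the plate (ft) at seconds_before the ball crosses it.

    track: list of (time_s, distance_from_plate_ft), time ascending,
    with the final sample taken as the moment the ball reaches the plate.
    Linearly interpolates between the two samples that bracket the target time.
    """
    target = track[-1][0] - seconds_before
    for (t0, d0), (t1, d1) in zip(track, track[1:]):
        if t0 <= target <= t1:
            frac = (target - t0) / (t1 - t0)
            return d0 + frac * (d1 - d0)
    raise ValueError("track does not reach back far enough")


if __name__ == "__main__":
    # Hypothetical constant-speed pitch: 132 ft/s (90 mph), sampled at 60 fps from 55 ft.
    track = [(i / 60, 55 - 2.2 * i) for i in range(26)]
    print(f"{feet_from_plate_at(track):.1f} ft from the plate")
```

Because it works from positions rather than a single speed reading, the same calculation applies to a curveball or knuckler whose speed changes differently along the way.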

BUT, record all data you can, and let the analysts sort it out.


#13    Peter Jensen      (see all posts) 2007/01/06 (Sat) @ 21:50

Beamer - We don’t need to calculate where a ball hit on a certain vector will land given spin and atmospheric conditions because for every ball hit in the park the commercial cameras do an excellent job of locating the exact location of where it DID land or where it was caught.  That information is already plotted in MLB’s hit ball locations shown on their plots. What we don’t know is how long it took to get there.  Although Tango is right that an exact measure of the hang time would be the best, that would require a more extensive system of cameras covering the whole field.  If we can get hit ball speed from the proposed system we can calculate hang time to a specific hit ball location to a very close approximation.  Although knowing where the fielders were located at the start of the play might be useful for some analyses, I can’t think of any at the moment.


#14    tangotiger      (see all posts) 2007/01/07 (Sun) @ 01:52

To your last sentence, the location of the fielders is paramount to understanding fielding.  Does Everett play so well because he’s positioned optimally (whether by himself or his coach), or does he just get to everything?  Is the reason Betancourt does not stand out in the stats that he’s not being positioned well?

Not every fielder is positioned the same way for the same hitter, so we need to know where they are.  Is the 3B hugging the line or not?  Are the OF playing straightaway or not?

Fielding is about a player’s tools and his positioning.  Ichiro has the tools obviously, and maybe he’s not positioned well enough.  I think it’s a lot easier to improve someone’s positioning than his tools.


#15    John Beamer      (see all posts) 2007/01/07 (Sun) @ 04:29

The panacea here is a fully computerized system with no need for human intervention. At the moment, (as far as I am aware) tracking landing position of batted balls requires human intervention.

What you really want is a bunch of cameras (or some giant pressure mat underneath the field!!) that records the position and movement of all fielders and also precisely where the ball lands. That would allow us to calculate hang time too.


#16    Peter Jensen      (see all posts) 2007/01/07 (Sun) @ 12:12

Beamer - While it might seem logically that more precision is always a better thing, it isn’t always necessary or desirable.  The present system of locating hit ball landing spots using human intervention is probably accurate to within +- 5 feet and the semi-computerized system used by BIS probably within +- 1 foot.  Computerized three dimensional triangulation like you propose could probably put it within a fraction of an inch.  But to what end?  To analyze the data effectively you have to group it together.  Dewan now groups his outfield data together to within 5 feet even though his raw data is supposedly more precise.  The present system is more than adequate.

Tango - The correct positioning of the fielder is dependent on more than the hitter.  A fielder should change his position depending on the count, the type of pitch (if he knows it), the base-out situation, the game score, and the atmospheric conditions, and probably some other things that I can’t think of.  Judging whether a player is out of position or not is always going to be in the province of scouting, not statistics.


#17    Peter Jensen      (see all posts) 2007/01/07 (Sun) @ 13:02

According to “Baseball Hacks” hack #29 MLB already has a file that tracks the location of hit balls to less than .1 foot, presumably using similar semi-computerized technology as BIS.  But have you ever seen that data except in the plots of hit balls?  No.  So I think that it is much more important to lobby for access, in a form that we can easily use for analysis, to the data that MLB already has and soon will have with its new system.  It’s great that Tango has the ear of the Director of Stats at MLB and has started lobbying for us.  If they give us hang time, great.  But if they don’t, batted ball speed is an acceptable substitute if they also give us hit location, because we can calculate hang time to the precision we need.  The same thing with “time to plate” and “pitch speed”.  We can have the ultimate data system this year if we can convince MLB to make it available to us. Which is very exciting.


#18    John Beamer      (see all posts) 2007/01/07 (Sun) @ 13:08

Peter - I don’t disagree. However, any process that reduces human error (and let’s face it, there will be human error) has to be good. Also it will be significantly cheaper (over time) and will hopefully make the data more available to the masses, which has got to be a good thing.

Perhaps in my last post I came across a bit too strongly on the ball location. What BIS does now is actually as good as we probably need. However, there are other benefits to a fully computerized system. Here are two:

1) There is still subjectivity on the definition of a flyball, liner and fliner. Given that batted ball stats are one of the most exciting areas of sabermetric research and there is still a data issue, anything we can do to reduce error is a good thing. I don’t know, but perhaps hang time and angle of batted ball (or location at landing) is all you need to do this.

2) I disagree with your assertion that fielding position is the province of scouting. If we know a fielder’s precise position, his speed and range, as well as his existing positions, we’ll be able to run a simulation to understand where optimal position is exactly. We’d also be able to differentiate by park and perhaps opposing batter. I would be very surprised if scouts, coaches and fielders had this right today.


#19    Joe Arthur      (see all posts) 2007/01/07 (Sun) @ 15:29

a couple of clarifications on mlb.com hit locations:
1) I think the accuracy is actually about plus or minus 3 feet, not a tenth of a foot. the x & y locations in their files are given with decimal point precision, but they must be conversions from a different underlying co-ordinate system; when studying the range of values, I saw that the increments between adjacent points were 1.00 or 1.01

2) the location of the hit ball (on hits) is where the fielder picked it up, NOT where it landed, counterintuitive though that decision may be.
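
The check described in point 1 can be sketched as follows: take the distinct coordinate values, sort them, and look at the smallest gap between neighbors. The sample values below are made up for illustration, and the tolerance parameter is an assumption; the real files' layout is not described in this thread.

```python
def effective_resolution(values, tol=0.05):
    """Smallest gap between distinct sorted coordinate values.

    Gaps at or below tol are treated as rounding noise and ignored.
    Returns None if there are no usable gaps.
    """
    distinct = sorted(set(values))
    gaps = [b - a for a, b in zip(distinct, distinct[1:]) if b - a > tol]
    return min(gaps) if gaps else None


if __name__ == "__main__":
    # Hypothetical x coordinates, printed to two decimals in the source file.
    xs = [100.40, 101.41, 102.41, 100.40, 104.42]
    print(f"effective resolution: about {effective_resolution(xs):.2f} grid units")
```

If the smallest gap is about one grid unit, the two decimal places carry no extra information, which is the false-precision concern raised here.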

I myself have found it difficult to identify hit location from mlb.com video really precisely, especially on balls hit to the outfield. Sometimes there aren’t good ‘landmarks’ in view, and there’s some distortion of perspective. What the Director said about the ambiguity of pitch location from ordinary video must also apply to hit location…


#20    Peter Jensen      (see all posts) 2007/01/07 (Sun) @ 16:26

Joe - Point 1 - I have not tried to use Hack #29.  I suspected some false precision when I saw they gave the x and y coordinates to the .01 of a grid point.  I am assuming from your post that even though they give locations to .01 of a grid point, the actual discrimination between points is 1 grid point, and that with 250 grid points to cover the whole field and the stands, a grid point is about 3 feet.  Is that correct?  Even 3 feet would be sufficiently accurate for fielding analysis.

Point 2 - An excellent point.  I think I knew this at one time but had forgotten it.  Makes the information for hits virtually useless for our purposes but the information for outs should still be useful as would the resulting vectors for ground ball hits.

My impression was that the video feed that they used to determine hit location was something other than the commercial video feed but I might be wrong about that.


#21    Joe Arthur      (see all posts) 2007/01/07 (Sun) @ 16:42

Peter

point 1 - yes that’s right, and I agree that 3 feet is accurate enough for current purposes.


#22    DanAgonistes      (see all posts) 2007/01/08 (Mon) @ 17:21

Sorry I didn’t get to your question on my blog but it looks like you got some great information. Having looked at the Enhanced Gameday data in the raw XML format, it is certainly the case that in order to use it we’ll need a simplified representation or tools to help with it. Hopefully, that’ll be the big value-add for the premium service.


#23    tangotiger      (see all posts) 2007/01/08 (Mon) @ 17:59

No worries Dan.  We’ve all got other things going on!  I’m hoping you and others there can (continue to) exert your influence for the rest of us.


#24    Tangotiger      (see all posts) 2007/05/04 (Fri) @ 15:03

Dan Fox gives us some more news:
http://baseballprospectus.com/unfiltered/?p=356

