THE BOOK--Playing The Percentages In Baseball


Thursday, April 28, 2011

Out rate by location and hang time

By Tangotiger, 05:01 PM

I love anything with hang time because it makes it quite clear how much it determines the out rate.  Dudek’s initial article way back in the first Hardball Times Annual, with a limited number of games, was all that we needed to know how powerful hang time is.  BIS is finally publishing snippets of it, which we see here courtesy of Mark Simon’s article at ESPN (with an appearance from Ben):

Base Hit Frequency: Balls Hit to Spot of Young’s Double

Hang Time (sec)    Plays    Hits    BABIP
3.5                29       27      0.931
4.0                17       15      0.882
4.5                21       12      0.571
5.0                27       2       0.074

Base Hit Frequency: Balls Hit to Spot of Lillibridge Catch

Hang Time (sec)    Plays    Hits    BABIP
2.0                21       21      1.000
2.5                61       58      0.951
3.0                49       41      0.837
3.5                32       15      0.469
4.0                19       3       0.158
4.5+               77       0       0.000


#1    Peter Jensen      (see all posts) 2011/04/28 (Thu) @ 18:51

All that information would be terrific if BIS were able to consistently and accurately place hits within a 10 foot square, but there is no evidence that they can and plenty of evidence that they can’t. The hit rates between some of the 1/2 second intervals show a pretty large variation as well.  A continuous probability function like the one Shane Jensen constructs would help if both the times and distances had the accuracy to merit it.  I asked Shane if his BIS data set included the hang times for 2010, but he said that it did not.  I didn’t get the impression that he knew BIS was collecting the hang time data.


#2    tangotiger      (see all posts) 2011/04/28 (Thu) @ 19:48

They don’t have to be consistent and accurate.  They just have to be unbiased.  Sample size saves us.


#3    Colin Wyers      (see all posts) 2011/04/28 (Thu) @ 20:35

“They don’t have to be consistent and accurate.  They just have to be unbiased.  Sample size saves us.”

There’s no evidence they’re THAT, either. And like Peter says, plenty that they aren’t.


#4    Tangotiger      (see all posts) 2011/04/28 (Thu) @ 20:44

I didn’t say that they are not.  I said that we don’t need accuracy.  We just need non-bias.  Whether BIS delivers or not I don’t know.  It doesn’t take away from my point.

Discussion should be mostly centered on bias.


#5    Greg Rybarczyk      (see all posts) 2011/04/28 (Thu) @ 21:13

For what it’s worth, I timed the two batted balls in question.  I’ve got some experience with that, and I’ve tested my timing ability before.

The Cano line drive time of flight was pretty much accurate, although their description of it (just under 2.5 seconds) isn’t exactly precise. 

The video for the Young ball, however, wasn’t ideal, as it didn’t show the bat hitting the ball, you had a zoom-in on the pitcher delivering the pitch.  I had to time it by synchronizing with the crack of the bat, and in doing that, I got 2.54 seconds about 6 times in a row (compared to their 2.4 seconds).

Unfortunately, the crack of the bat is frequently out of synch with the video (I know this from watching umpteen home runs where the crack comes noticeably early or late compared to the video of contact - I typically turn the sound off while timing homers), so there is likely some variation, and possibly bias, to deal with here in cases where audio has to be used.  But most of the time the video is enough, and I wouldn’t expect any timing bias by an experienced analyst.

Hopefully BIS knows to turn the sound off when they do flight times…


#6    Colin Wyers      (see all posts) 2011/04/28 (Thu) @ 21:59

Greg, you’re timing them from when the ball leaves the bat to when the ball reaches the fielder, right?


#7          (see all posts) 2011/04/28 (Thu) @ 22:36

“We just need non-bias.  Whether BIS delivers or not I don’t know.”

I thought we had already established that they did not deliver unbiased outfield air ball locations, based upon Figure 13 from the Shane Jensen paper.


#8    Greg Rybarczyk      (see all posts) 2011/04/28 (Thu) @ 23:05

Well, no, actually for the one that wasn’t caught, it was bat to ground.  For catches, yes.


#9    Colin Wyers      (see all posts) 2011/04/28 (Thu) @ 23:29

Okay, so lemme see if I get this straight.

We have data where:

* We strongly suspect the lateral position of the ball as recorded is affected by whether or not the ball is caught,

* We can at least suspect that the depth of the ball as recorded is affected in the same way as the lateral position, and

* The recorded hang time of the ball is shorter when the ball is caught for an out than when it isn’t.

I’m sorry, what is this supposed to tell us again?


#10    Guest      (see all posts) 2011/04/29 (Fri) @ 00:09

That you have an agenda and won’t stop hijacking potentially interesting threads until you’ve sufficiently annoyed everyone who reads this blog?


#11    Colin Wyers      (see all posts) 2011/04/29 (Fri) @ 00:36

Okay, how is the question of whether or not the data is true or false irrelevant to the topic at hand? How is it, in fact, not the FIRST question that needs to be addressed?


#12    Greg Rybarczyk      (see all posts) 2011/04/29 (Fri) @ 01:34

The fact is, there is not a single point in space and time at which a fly ball is catchable; a ball can be caught anywhere from about 9-10 feet in the air right down to ground level, and the hang time difference of those two extreme points can be 0.15 seconds for a medium fly ball, and more than 0.20 seconds for a line drive. 

To expand on this, I simulated a rather typical line drive, such as might be hit in the gap, sometimes caught and sometimes dropping in.  This hypothetical ball passes through a point 270 feet from home plate and +9 feet height after 3.10 seconds; the ball covers about 14 feet of horizontal distance as it descends that final 9 feet down to field level.  The projected landing time at field level is 3.32 seconds.  In other words, at the end of its flight path, this particular ball is moving through a curved path roughly 17 feet in length over a period of 0.22 seconds, during which time it is catchable at any point.

Given those parameters, what kind of confidence can we have in any fine use of this data if our positional uncertainty is measured in double-digits of feet, and our time measurement is troubled by inconsistent definitions? 
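The figures in the simulated line drive above can be cross-checked with a few lines of arithmetic. This is only a rough sketch: it treats the final segment as a straight line, so it slightly understates the curved path length quoted in the comment, and the variable names are illustrative.

```python
import math

# Figures taken from the simulated line drive described above.
height_drop_ft = 9.0    # from +9 ft down to field level
horizontal_ft = 14.0    # horizontal distance covered during that descent
t_at_9ft_s = 3.10       # time after contact when the ball is at +9 ft
t_at_ground_s = 3.32    # projected landing time at field level

# Window during which the ball is catchable somewhere along its path.
catch_window_s = t_at_ground_s - t_at_9ft_s

# Straight-line (chord) length of the final segment; the real curved
# path is a little longer, consistent with "roughly 17 feet".
chord_ft = math.hypot(horizontal_ft, height_drop_ft)

# Average speed of the ball over that final segment.
avg_speed_fps = chord_ft / catch_window_s

print(f"catch window: {catch_window_s:.2f} s")
print(f"chord length: {chord_ft:.1f} ft")
print(f"average speed: {avg_speed_fps:.0f} ft/s")
```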

I think we need to get some rigorous data definitions in place, and some robust technology to make the measurements precisely and accurately.  The technology is coming, but in my mind the data definitions are still to be decided.  In my mind, those data definitions are going to have to be not just 3-dimensional (distance, angle and hang time), but 4-dimensional (distance, angle, hang time and height).


#13    Colin Wyers      (see all posts) 2011/04/29 (Fri) @ 01:52

Right, and it’s entirely a one-directional effect—air balls that are caught will always have a shorter hang time than similar air balls that land for a hit. Increasing the sample size, therefore, won’t help.

Something between a tenth and two tenths of a second doesn’t sound like much when dealing with half second increments, but there’s enough borderline balls that you’ll have significant “spoilage” of the dataset (I don’t mean in terms of individual batted balls, but all the baseline comparisons as well) based on the effect. Marrying the data with location data that’s both inaccurate and biased will exacerbate the problems, not improve the situation.


#14    Colin Wyers      (see all posts) 2011/04/29 (Fri) @ 02:20

Of course, this does suggest a possible remedy, yes? We know the direction of the bias, and we can estimate the magnitude through a physical model of the batted ball - I used this spreadsheet made by Alan Nathan:

http://webusers.npl.illinois.edu/~a-nathan/pob/Full-3d-trajectory-7.xls

and came up with numbers similar to what Greg did.

I mean, for any one batted ball you’re pretty much hosed when it comes to adjusting hang time based on catch height, but when you’re aggregating the data you could probably come up with a workable scheme for at least correcting the fundamental model of how hang time affects catch rates. This gets easier if you have a decent estimate of other parameters, of course, like distance and launch angle.


#15    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 05:56

Great stuff in Greg/12.  I was always wondering about how much the distance and hang time change based on whether the fielder is there or not.  This is especially important with line drives where, if the fielder is not there to catch it, I can see the ball traveling an extra 20-30 feet.

In the other thread, I said that I count as a “low infield” batted ball (which is mostly a standard ground ball) based on where the ball would have landed had the fielder not been there, and maximum height reached (say 6 feet for illustrative purposes).  So if it hits within the dirt like that, it’s a low infield ball.

We of course don’t need to compartmentalize anything other than for ease of data processing.

***

Greg: if the sound/visual are not synched, why not just use the pitcher’s release, figure his initial speed, and then subtract 0.35 to 0.50 seconds?

If initial speed is 75mph, that’s 110 feet per second.  If it’s 100mph, that’s 147 feet per second.

Final speed is about 90% of that, so the average speed would be 95% of initial speed.

So, a 90mph initial speed would come in at 0.40 seconds.
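The pitch flight-time estimate above can be sketched as follows. The release-to-plate distance is an assumption, since the comment does not state one; a value near 50 feet roughly reproduces the 0.40-second figure for a 90 mph pitch, while a longer distance gives a somewhat longer time.

```python
MPH_TO_FPS = 5280 / 3600  # feet per second in one mile per hour

def pitch_flight_time(release_mph, distance_ft, avg_fraction=0.95):
    """Seconds from release to the plate.

    avg_fraction=0.95 follows the comment: final speed is about 90% of
    release speed, so the average is about 95%. distance_ft is assumed.
    """
    avg_fps = release_mph * MPH_TO_FPS * avg_fraction
    return distance_ft / avg_fps

# Assumed release-to-plate distances in feet (not given in the thread).
for distance_ft in (50, 55):
    for mph in (75, 90, 100):
        t = pitch_flight_time(mph, distance_ft)
        print(f"{mph} mph over {distance_ft} ft: {t:.2f} s")
```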


#16    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 07:53

Let’s face it.  We are not going to be able to improve our fielding metrics much until we have access to Field Fx or Trackman data.  From either of them we can determine the spin on the ball and calculate where a ball would have landed had it continued to the ground after a catch and what its hang time would have been to that position.  With Field Fx we will also know the starting position of the fielders, which was not even mentioned in the ESPN article as a factor in determining the “catchability” of the two balls in question.

It also seems fruitless to continue to argue about bias in fielding data.  Colin and others seem to be convinced that bias exists and that it is a major problem that severely degrades the utility of fielding metrics that use hit location and batted ball type data.  I think that it is probable that bias exists in the data, but that individual hit ball locations and batted ball types are biased to only a small degree.  More importantly, the bias that might exist for data on individual hit balls has even less effect on the actual run values calculated for players by fielding metrics that use the data.  The actual ordinal rankings of those fielders change in a very minor way and the most common effect of bias, if it exists, would be to compress the range of values for a particular fielding position, lowering the fielding runs plus value of the best fielders by around 4 runs per 150 games and increasing the fielding runs of the worst fielders by about the same amount.

But until we have the data from Trackman or Field Fx that will have to remain only my opinion, and I don’t expect Colin or anyone else to accept it any more than I can accept his opinion that the problems of bias are so great that we are better off constructing fielding metrics that don’t use hit location or batted ball types at all. 

“They don’t have to be consistent and accurate.  They just have to be unbiased.  Sample size saves us.”

This is a correct statement if you are discussing the more general question of designing a fielding metric using hang time.  It is incorrect if one is discussing the ESPN article where the “catchability” of a single ball in play is being evaluated and even an unbiased error in its position would put it into an entirely different bucket than those used by Ben for the article.


#17    joe arthur      (see all posts) 2011/04/29 (Fri) @ 07:54

#12-#15: good point by Greg, of course, but this was one of the innovations of David Pinto’s PMR, in which direction but not distance were used among his parameters. A related problem occurs with ground balls, where the pitcher or corner infielder might “cut off” a ball before it becomes fieldable at a greater depth by a middle infielder. [I know that David’s final versions of PMR gave this up for the outfield, as he switched one of his parameters from soft/medium/hard to very large distance buckets.] Instead of trying to adjust hangtimes or distances within a 3D model, David’s original approach is worth another look. A 4D model, as Greg suggests, could also be developed. Once you have a 4D model and bring the fielders’ movement into it, you’ll have some differential equations to solve…

For #5, for those who didn’t read the original article, the hang time from BIS is quoted as 4.4 seconds for Young’s ball (not 2.4). I got 4.5 [the Yankees’ YES feed does show the batter, while the Twins’ FSNorth feed focuses on the pitcher as Greg says...]. But competent hand timings of the same visual event can certainly vary by a tenth of a second. And quoting manual timings to the hundredth of a second is misleading. Human reaction time to visual stimuli is not that consistent (one s.d. probably equals .03 seconds), even if the timing is otherwise perfectly executed.


#18    Rally      (see all posts) 2011/04/29 (Fri) @ 09:13

"The actual ordinal rankings of those fielders change in a very minor way and the most common effect of bias, if it exists, would be to compress the range of values for a particular fielding position”

Hang time bias would actually work in the opposite direction as the other types we’ve seen.  The extra time it takes for the ball to pass the fielder’s glove and hit the ground makes flyballs look easier to catch than they would otherwise, and caught balls look a bit tougher.

But it really depends on the play.  There would be no measurable difference for a ball caught on a dive just before it hits the ground.  The most extreme difference would be on a play like the hardest hit ball I’ve ever seen:

Jose Canseco hits a line drive towards short against Baltimore.  My immediate reaction was that Ripken might have a chance to leap and catch it.  It’s over his head, and still rising.  All the way to the seats for a homerun.  That was over 20 years ago.  My faulty memory (and less than ideal vantage point) has probably exaggerated the specifics, it wouldn’t surprise me if video were found to show that it was 25-30 feet above Ripken’s head.

If it were true, probably a 2-3 second difference between catch and out.


#19    Rally      (see all posts) 2011/04/29 (Fri) @ 09:14

Last line should say “catch and hit”


#20    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 09:34

For the hang time, there is a sweet spot where you really really need to know to the tenth of a second.

If you look at the above charts, you see that for a 1 second range, the out rate plummets by 0.65 to 0.80 outs per play.

That means that each 0.1 seconds causes a 0.07 change in out expectancy.

That may seem small, but I’d like to remind everyone that the gap between the best and worst fielders at a position would be roughly 0.10 outs per play.

As long as that timing is not biased toward certain parks, fielders, or kinds of fielders, it’s not an issue once sample size is in play.
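The per-tenth-of-a-second figure can be checked directly against the two tables in the post. The sketch below finds the steepest drop in hit rate over any one-second span of hang time; the open-ended 4.5+ row of the Lillibridge table is left out because it has no single hang time, and the function name is illustrative.

```python
# Hit rates (hits / plays) by hang time, copied from the two tables in the post.
young = {3.5: 27 / 29, 4.0: 15 / 17, 4.5: 12 / 21, 5.0: 2 / 27}
lillibridge = {2.0: 21 / 21, 2.5: 58 / 61, 3.0: 41 / 49, 3.5: 15 / 32, 4.0: 3 / 19}

def steepest_one_second_drop(rates):
    """Largest fall in hit rate across any 1.0-second span of hang time."""
    times = sorted(rates)
    drops = [
        rates[t] - rates[u]
        for t in times
        for u in times
        if abs((u - t) - 1.0) < 1e-9
    ]
    return max(drops)

for name, rates in (("Young", young), ("Lillibridge", lillibridge)):
    drop = steepest_one_second_drop(rates)
    print(f"{name}: {drop:.3f} per second, {drop / 10:.3f} per 0.1 s")
```

Since the hit rate and the out rate sum to one, a drop in hit rate is the same size as the rise in out rate.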

***

To Peter’s point: yes, if we are going to talk about any single play, we’ve got a huge margin of error, not the least of which is the starting position of the fielder.

We can say the same thing about offense, where we credit someone with 1.4 runs on a HR, even if the bases are empty or bases are loaded.  That obviously makes zero sense, from an empirical standpoint.

We can say the same thing about pitching, where a pitcher will get a great FIP if he strikes out the side, but still allows (or is on the mound when it occurs that) several singles in the inning.

***

Generally speaking: If you take a step back, this was a fantastic article on ESPN.  It has to be short, because that’s an ESPN requirement.  It introduces the reader to some new concept (hang time), and it talks about outs per play.  And it links it to some real play, not some general concept of a play.

The ESPN reader was well-served, and this is an excellent way to bring that reader into our intimidating world.

The writer was looking to get on base, not hit a HR.

I have no qualms with this article, within these parameters.


#21    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 09:59

“My faulty memory (and less than ideal vantage point) has probably exaggerated the specifics, it wouldn’t surprise me if video were found to show that it was 25-30 feet above Ripken’s head.”

Exaggerated by quite a lot unfortunately as it is not physically possible for a ball that could be caught by the shortstop to clear the fence of any major league stadium on earth.  Perhaps when expansion teams begin play on the moon!

Your main point of hang time bias being in the opposite direction to other bias is certainly correct and important to point out.  But in reality the actual effect of hang time bias on fielding runs would be very little.  Greg has calculated nearly the maximum effect of additional distance for a line drive that could be caught by an outfielder: plus 14 feet.  Although there will be additional distance added to every outfield catch, the average added distance is much more likely to be around 4 or 5 feet.  That added distance would add a little to the difficulty of balls hit beyond the outfielder’s starting point, but would add nothing to balls hit in front of the outfielder.  The overall net effect of hang time bias negating other forms of bias would be extremely minor.


#22    MAH      (see all posts) 2011/04/29 (Fri) @ 10:18

Very good article introducing advanced metrics to a general audience.

Tango, regarding the value of unbiased batted ball data, MGL did not address my closing argument to the prior thread. 

The essence of the question is, since DRA using 2003-10 Retrosheet data can eliminate all known systematic effects on batted ball distribution, to provide an unbiased yet presumably noisy estimate of expected plays, how much do we gain by adding theoretically unbiased batted ball data if it still has noise?  How low must the batted ball data noise be to reduce the noise in the DRA estimate?

I am developing a way of actually measuring this, using Retrosheet’s batted ball data from 1989 through 1999, but won’t have time to complete the research for a couple of months.


#23    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 10:18

“I have no qualms with this article, within these parameters.”

Certainly, for that portion of the ESPN audience that has never heard of hang time, the article is a good introduction.

“That means that each 0.1 seconds causes a 0.07 change in out expectancy.”

But the charts are based on a 10 by 10 grid so there is an uncertainty of position of +- 7 feet from one corner of the grid to the other.  For a moderately fast outfielder traveling at 35 feet per second that is an uncertainty of +- .2 seconds.  The lack of granularity that is necessary because the hit location can’t be measured precisely pretty much makes the charts worthless.
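The grid-uncertainty arithmetic above can be sketched like this. It reads the "+- 7 feet" as the distance from the center of a 10 x 10 foot cell to one of its corners, and converts that to running time at an assumed fielder speed. The 35 ft/s figure is from the comment; 25 ft/s is the slower speed argued for elsewhere in the thread. The function names are illustrative.

```python
import math

def center_to_corner_ft(cell_ft):
    """Distance from the center of a square cell to one of its corners."""
    return cell_ft * math.sqrt(2) / 2

def timing_uncertainty_s(cell_ft, fielder_fps):
    """Running time equivalent of the center-to-corner distance."""
    return center_to_corner_ft(cell_ft) / fielder_fps

for fps in (35, 25):
    t = timing_uncertainty_s(10, fps)
    print(f"10 ft cell at {fps} ft/s: +/- {t:.2f} s")
```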


#24          (see all posts) 2011/04/29 (Fri) @ 10:35

“Exaggerated by quite a lot unfortunately as it is not physically possible for a ball that could be caught by the shortstop to clear the fence of any major league stadium on earth.  Perhaps when expansion teams begin play on the moon!”

Peter, I don’t think that’s correct.  It’s extremely unlikely, but it is physically possible.  Using Alan Nathan’s trajectory calculator, I input the following parameters:
z0 = 2 ft
v0 = 120 mph
theta (launch angle) = 3 degrees
backspin = 2000 rpm
temp = 60 deg F
wind = 20 mph tailwind

At 110 feet from the plate, I have the ball at a height of 9.25 feet.  That’s probably just over the glove of a shortstop that is playing in a bit.  That ball would just clear an 8-foot fence that was 366 feet from the plate.

Certainly, all these things are the edge of probability, so the likelihood of it happening in a game is extremely low.  Let the batter hit it at 5 degrees, and now it’s at 13 feet high at 110 feet out and clearing an 8-foot fence 397 feet away.  Leave it at 5 degrees but drop the tailwind to 5 mph, and now it’s only clearing the fence at 373 feet away.  Etc.


#25    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 10:49

MAH/22: I agree, that this needs to be established.

One way to get an indication is to simply run a correlation of UZR, DRA, outs per BIP against next year’s outs per BIP.

Now, ideally, you would do it ONLY for players who switched teams (or changed pitchers a lot).  Because otherwise, a systematic bias in a metric will carry over. 

We can think of an easy example like HR per batted ball for hitters at Coors.  If you adjust the HR allowed, and then run a correlation against next year’s HR / BIP, the unadjusted one may do better.

So, you have to be careful.  Nonetheless, you can still start with the above just to see what you’ve got.
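A minimal sketch of the check described above: correlate this year's metric with next year's outs per BIP, restricted to players who switched teams. The record layout, the field names, and the numbers in `records` are all made up for illustration; they are placeholders, not real player data.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def predictive_r(records, metric):
    """Correlation of `metric` with next year's outs per BIP, movers only."""
    movers = [r for r in records if r["switched_teams"]]
    return pearson_r(
        [r[metric] for r in movers],
        [r["next_outs_per_bip"] for r in movers],
    )

# Placeholder records (hypothetical values, NOT real data).
records = [
    {"switched_teams": True, "uzr": 5.0, "next_outs_per_bip": 0.72},
    {"switched_teams": True, "uzr": -3.0, "next_outs_per_bip": 0.68},
    {"switched_teams": True, "uzr": 1.0, "next_outs_per_bip": 0.70},
    {"switched_teams": False, "uzr": 9.0, "next_outs_per_bip": 0.60},
]
print(round(predictive_r(records, "uzr"), 3))
```

The same call with a different `metric` key (for example a DRA column, if one were present) would give the comparison between metrics.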

***

“The lack of granularity that is necessary because the hit location can’t be measured precisely pretty much makes the charts worthless.”

For a single play, maybe.  But, not given enough samples, because the extremes will cancel out.

***

“moderately fast outfielder traveling at 35 feet per second “

Moderately fast?  We established the peak running time for Usain Bolt at 0.92 seconds per 10m, which is almost 36 feet per second.  Grass, cleats, equipment, looking over your shoulder, and not being Usain Bolt probably puts that at 30 feet per second.

Also, you need the starting time, so we’re not looking at a runner’s peak time, but a runner’s first 3 seconds of running time.  So, while his peak time might be 90 feet in 3 seconds, his actual time to travel 90 feet would be, well, we kinda know that a fast runner would take that in 3.5 seconds or so, or say 25 feet per second.
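The unit conversions behind these speed figures are easy to verify. A small sketch, using only the split and distance numbers quoted in the comment (the helper name is illustrative):

```python
FT_PER_M = 3.28084  # feet in one metre

def fps_from_split(metres, seconds):
    """Average speed in feet per second over a timed split."""
    return metres / seconds * FT_PER_M

# Bolt's peak 10 m split quoted above: 0.92 seconds.
peak_fps = fps_from_split(10, 0.92)

# A fast runner covering 90 feet from a standing start in about 3.5 seconds.
from_start_fps = 90 / 3.5

print(f"peak split: {peak_fps:.1f} ft/s")
print(f"90 ft from a start: {from_start_fps:.1f} ft/s")
```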


#26    Rally      (see all posts) 2011/04/29 (Fri) @ 11:16

"Certainly, all these things are the edge of probability, so the likelihood of it happening in a game is extremely low.  Let the batter hit it at 5 degrees, and now it’s at 13 feet high at 110 feet out and clearing an 8-foot fence 397 feet away.  Leave it at 5 degrees but drop the tailwind to 5 mph, and now it’s only clearing the fence at 373 feet away.”

Thanks Mike.  It’s more possible than I thought.  I didn’t say Ripken had a play on it, merely that when it left the bat I thought he might have, but when the ball actually got to 110 feet out it was out of his reach.

Ripken was 6’4 and reportedly a great basketball player who could dunk well.  He could probably have gotten a glove on a ball up to 11 feet if timed properly.  Put Dwight Howard at short for a play and he very well might be able to block a potential homerun 13 feet over the shortstop.


#27    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 11:49

Mike - We could debate all day whether Greg’s estimates of speed off the bat are correct and it is actually possible to hit a batted ball at 120 MPH (he had 4 hit that hard in 2010).  Or whether it is physically possible to do so with a ball that is only 2 feet off the ground and have the ball have a backspin of 2000 RPM.  Or whether a shortstop ever played Canseco at 110 feet (about 4 feet beyond the baseline from 2nd to 3rd).  But I will take the easy way out and say thanks for correcting me and change my “not physically possible” to “infinitesimally probable”.

Tango - I was talking about the time that a fielder would take to cover the 14 feet from corner to corner of the ten foot square block where BenJ had placed the hit ball.  At that point the outfielder would be at full speed.  The Field Fx data had some outfielders reaching maximum speeds above 35 FPS, and since I did not want to exaggerate the error, I assumed that a 35 FPS outfielder might be closer to the median than the extreme.  Your computation of a 25 FPS speed would actually increase the error for covering the 7 feet from corner to center of the 10 foot grid block from my estimate of +-.2 seconds to almost +-.28.

“For a single play, maybe.  But, not given enough samples, because the extremes will cancel out.”

So when will the sample of balls be large enough so that you actually gain information from having hang time without increasing accuracy of hit location information?  BIS’s current accuracy of balls hit to the outfield probably does not exceed a SD of 15 feet.  That means that there is less than a 40 percent chance that a hit ball is located in the correct 10 x 10 foot grid.  And the 10 X 10 foot grid is still too large to be of much use.  When you can place balls within a 2 x 2 foot grid at 80% accuracy with a hang time of +- .03 seconds then you will be doing something useful.  Might even gain as much as +-3 runs a year in accuracy for the average fielder.  You decide when it is worth the effort.  I’ll stick with the current charts are useless.
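One way to put a number on the grid-placement claim is sketched below. It rests on assumptions the comment does not spell out: the location error is modeled as independent zero-mean normal errors along each axis with an SD of 15 feet each, and the ball is assumed to have truly landed at the center of its cell (the most favorable case). Under those assumptions the chance of landing in the correct 10 x 10 foot cell comes out well under the 40 percent bound.

```python
import math

def prob_within(half_width_ft, sd_ft):
    """P(|error| <= half_width) for a zero-mean normal error with SD sd_ft."""
    return math.erf(half_width_ft / (sd_ft * math.sqrt(2)))

def prob_correct_cell(cell_ft, sd_ft):
    """Chance both coordinates stay inside a square cell of side cell_ft,
    for a ball that truly landed at the cell center (assumed best case)."""
    return prob_within(cell_ft / 2, sd_ft) ** 2

print(f"10 ft cell, SD 15 ft: {prob_correct_cell(10, 15):.3f}")
print(f"20 ft cell, SD 15 ft: {prob_correct_cell(20, 15):.3f}")
```

If the 15 feet were instead meant as a combined two-dimensional error, the per-axis SD would be smaller and the probability higher, but the same functions apply.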


#28    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 12:03

“He could probably have gotten a glove on a ball up to 11 feet if timed properly.”

From standing still?  Jumping from the infield dirt? Reacting to a ball that would have been past him in .62 seconds?  With a 20 MPH wind in his face? :)


#29    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 12:05

Peter: I agree with your last two sentences.

As for this:
“The Field Fx data had some outfielders reaching maximum speeds above 35 FPS “

Usain’s top interval time is here:
http://www.insidethebook.com/ee/index.php/site/comments/ode_to_the_triple/#30

After running for 100 feet (30m), it took him 3.78 seconds.  His time in the 20m to 30m segment was 10.99m/sec.  That’s 36 feet per second.

His top speed (occurring between 50m and 80m) is 12.2m/sec (40 feet per second).

That’s for Usain, in ideal track conditions.

So, for your “some outfielders”, on the baseball field, I’d have to think it’s at best 30 feet per second.

If FIELDf/x is saying 35 feet per second for “some outfielders” (plural, and not even necessarily the fastest in baseball), doesn’t that seem… well, not right?


#30    Colin Wyers      (see all posts) 2011/04/29 (Fri) @ 12:09

“For a single play, maybe.  But, not given enough samples, because the extremes will cancel out.”

Tango, I think you’re really underrating how long it takes to get a significant sample size on these things.

If you cut the typical MLB outfield into 10x10 squares, you end up with over 800 divisible units. Divide that into an additional 4 to 6 units for hang time… let’s say five and hit the midpoint of the two example charts. 840 outfield grid spots times five hang time bins per grid location means… four-thousand and two-hundred total bins for outfield air balls. So at 63k air balls per year, you’re talking about an average of 15 batted balls per bin?
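The bin arithmetic above, written out. The counts are the ones quoted in the thread (63k outfield air balls per year, 840 grid cells, five hang-time bins); the second call uses the coarser 1,263-bin total proposed later in the discussion. The function name is illustrative.

```python
def balls_per_bin(air_balls, total_bins):
    """Average number of batted balls landing in each bin per season."""
    return air_balls / total_bins

fine_bins = 840 * 5  # 10 x 10 ft cells times five hang-time bins
print(fine_bins, balls_per_bin(63_000, fine_bins))

# Coarser scheme discussed later in the thread: 1,263 bins in total.
print(1_263, round(balls_per_bin(63_000, 1_263), 1))
```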

So you’re already dealing with some pretty big sampling problems. And we’ve already discussed the bias issue, although we can certainly discuss it more. But here’s the thing - with that small of a sample size per bin, is there seriously anything to be gained by having bins smaller than your margin of error on your measurements?


#31    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 12:26

There’s no reason to create bins as if they are all independent of each other.  You use a smoothing function.
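As one illustration of what a smoothing function could look like, the sketch below pools neighboring hang-time rows of the Lillibridge table with Gaussian kernel weights, so that the estimate at any hang time borrows from nearby rows instead of treating each row as independent. The choice of a Gaussian kernel and the 0.5-second bandwidth are assumptions made for this example; the comment does not say which smoother is intended, and a real version would smooth over location as well.

```python
import math

# (hang time in seconds, plays, hits) from the Lillibridge table in the post.
# The open-ended 4.5+ row is omitted because it has no single hang time.
ROWS = [(2.0, 21, 21), (2.5, 61, 58), (3.0, 49, 41), (3.5, 32, 15), (4.0, 19, 3)]

def smoothed_hit_rate(hang_time, rows=ROWS, bandwidth=0.5):
    """Kernel-weighted hits / plays at an arbitrary hang time.

    bandwidth (seconds) is an assumed tuning value, not from the thread.
    """
    weighted_hits = 0.0
    weighted_plays = 0.0
    for t, plays, hits in rows:
        weight = math.exp(-0.5 * ((hang_time - t) / bandwidth) ** 2)
        weighted_hits += weight * hits
        weighted_plays += weight * plays
    return weighted_hits / weighted_plays

# Estimates at hang times that fall between the table rows.
for t in (2.75, 3.25, 3.75):
    print(t, round(smoothed_hit_rate(t), 3))
```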


#32    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 12:51

Tango - You seem to be missing the point about the outfielder’s speed.  Field Fx doesn’t give information on outfielder’s speed, it only gives outfielder position every 1/15th of a second.  I chose to calculate speed over 2 intervals rather than a longer period of time knowing that I was not getting a true picture of a fielder’s sustained top speed but that I was getting an estimate of his maximum burst speed.  When I answered the question above I wanted to give an unexaggerated estimate of the uncertainty of position of the ball given the uncertainty of the hang time and the probable top speed of an outfielder for covering 14 feet, not 10 meters.

I don’t know why you are so interested in proving me wrong on this relatively unimportant point which I will readily concede to you, especially since it weakens your overall argument concerning the usefulness of the data.


#33    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 13:20

I’m only interested in having an interesting discussion, not to slam anyone or advance some agenda of mine.  Certainly not with any of the commenters here, who I have the utmost respect for. 

I’ll take greater care to make this clear.  At the same time, presume I’m not trying to be a hard-ss.

***

When I made my initial comment, I separated it from the rest of the comment. 

It’s a tangent.

I was only addressing that particular statement on its own, not within the scope of a larger point, or trying to make anyone’s position stronger or weaker.

You said this:
“moderately fast outfielder traveling at 35 feet per second”

And I’m only discussing it on its own merits.

I am not “so interested” in proving you wrong.  I showed how that claim just is not supportable.  But you keep coming back that it is supportable. 

***

Extending this tangent, I’m puzzled by this:
“The Field Fx data had some outfielders reaching maximum speeds above 35 FPS “

Which, as I’ve shown, seems very unlikely.

You also said something interesting:
“it only gives outfielder position every 1/15th of a second.”

So, this looks like what you may have done:  You see his location at one point, and then 1/15th of a second later, you see his location at another point.  Given that you said that it was 35 feet per second, that would mean the gap was 2.33 feet in the two points.  Given that 35 is really unlikely, and should have been 30 or less, the gap should have been 2 feet.

Therefore, there may be an error range of say 0.25 feet in determining the exact location of a player.

It could also be that FIELDf/x is measuring different parts of his body at each point, so that perhaps it’s locating more of his back in the first snapshot and more of the front in the second snapshot.  I can certainly see how FIELDf/x can misplace a fielder by 3 inches.

So, the larger lesson here is that if you want to track a player, you should not rely on two such close measurements.

I’m presuming therefore when we saw that Cairo presentation last year, and we saw his MPH constantly changing, there was some “massaging” of the data to make it seem like this measurement error was being handled.

Basically, it understands that it’s a human being running, it understands that he’s on some sort of acceleration or deceleration curve, it understands that the player is not trying to slow down and hurry up and slow down and hurry up.  And so, it uses all the data points at each 1/15th of a second to create some sort of running chart and make sure that the bounces up and down are mitigated.
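The effect of a small placement error on speed computed from closely spaced samples can be sketched as follows. The runner, the steady 30 ft/s pace, and the alternating 0.25 ft error are all hypothetical values chosen to illustrate the point; nothing here reflects how FIELDf/x actually processes its data.

```python
FRAME_S = 1 / 15  # position sampling interval quoted in the thread

def speed_fps(positions_ft, start, frames):
    """Average speed between sample `start` and the sample `frames` later."""
    distance = positions_ft[start + frames] - positions_ft[start]
    return distance / (frames * FRAME_S)

# Hypothetical runner at a steady 30 ft/s, i.e. 2 ft per frame.
true_positions = [2.0 * i for i in range(16)]

# Made-up placement error of 0.25 ft, alternating in sign each frame.
measured = [p + (0.25 if i % 2 else -0.25) for i, p in enumerate(true_positions)]

print("one frame, from sample 0:", round(speed_fps(measured, 0, 1), 1))
print("one frame, from sample 1:", round(speed_fps(measured, 1, 1), 1))
print("fifteen frames:", round(speed_fps(measured, 0, 15), 1))
```

Measuring over a longer window divides the same placement error by a longer time, which is why the fifteen-frame estimate sits close to the true 30 ft/s while the single-frame estimates swing well above and below it.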


#34    Colin Wyers      (see all posts) 2011/04/29 (Fri) @ 13:22

Tango, smoothing is a band-aid on a bullet hole. Let’s double the size of the squares and drop down to three time-increment bins - that gives us 1,263 bins instead of 4,200. That pushes you up to… 50 BIP per bin!

Smoothing will behave a little better than simply increasing bin size, but you still have sample size problems that can only be resolved with… larger sample sizes. That’s why MGL uses three-year baselines for UZR, right?

Except for multiyear baselines to be workable, the definitions need to remain consistent year to year. And that’s third on the list of things that BIS has not shown evidence they can do, and in fact there’s a sizable body of evidence that BIS can’t maintain consistency in their data definitions between seasons.

(Can you solve these problems without resorting to multi-year sample sizes and massive amounts of smoothing? Sure, but it requires a fundamental understanding of the relationship between the parameters and how balls are caught. THAT’S the real problem with bias.)


#35    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 13:40

Colin: But I also know something else.  I know that the fielders are human.  I know their range of speed.  So, your point here is exactly correct:

“Sure, but it requires a fundamental understanding of the relationship between the parameters and how balls are caught.”

This is simply a video-game problem.

The bin thing simply makes it easier for us to visualize as people, but it’s not necessary to do any binning whatsoever.  So, there’s no point to talk about samples per bin, as if each bin was independent of the bin next to it.
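One way to picture "no independent bins" is a kernel-weighted rate, where every observation contributes to the estimate at any hang time, weighted by how close it is. The sketch below is only an illustration: it reuses the half-second rows from the Lillibridge-spot table at the top of the post as stand-in observations, treats the "4.5+" row as 4.5, and picks a Gaussian kernel with a half-second bandwidth. All three of those choices are assumptions, not anything BIS or anyone in this thread has described.

```python
import math

# (hang time in seconds, plays, hits) from the Lillibridge-spot table above.
# The "4.5+" row is entered as 4.5, which is an assumption.
ROWS = [(2.0, 21, 21), (2.5, 61, 58), (3.0, 49, 41),
        (3.5, 32, 15), (4.0, 19, 3), (4.5, 77, 0)]

def smoothed_hit_rate(hang_time, rows=ROWS, bandwidth=0.5):
    """Kernel-weighted hit rate at `hang_time` (illustrative Gaussian kernel)."""
    weighted_hits = 0.0
    weighted_plays = 0.0
    for t, plays, hits in rows:
        # Nearby hang times get more weight; distant ones still count a little.
        w = math.exp(-0.5 * ((hang_time - t) / bandwidth) ** 2)
        weighted_hits += w * hits
        weighted_plays += w * plays
    return weighted_hits / weighted_plays

for t in (2.5, 3.25, 4.0):
    print(f"hang time {t:.2f}s: estimated hit rate {smoothed_hit_rate(t):.3f}")
```

The estimate at 3.25 seconds borrows from the 3.0 and 3.5 rows and, to a lesser degree, from everything else, so "samples per bin" stops being the relevant count.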


#36          (see all posts) 2011/04/29 (Fri) @ 13:56

“I’m presuming therefore when we saw that Cairo presentation last year, and we saw his MPH constantly changing, there was some “massaging” of the data to make it seem like this measurement error was being handled.”

Yes, that data was smoothed.


#37    Rally      (see all posts) 2011/04/29 (Fri) @ 14:10

“I don’t know why you are so interested in proving me wrong on this relatively unimportant point which I will readily concede to you”

I laughed pretty loud at this after the nitpicking of the details of how I remember a Canseco homer, one that I qualified early on as likely obscured by both time (20+ years ago) and my less-than-ideal vantage point (upper deck seats behind the home plate area).


#38          (see all posts) 2011/04/29 (Fri) @ 14:20

Also, I’ll chime in on the “how fast do baseball fielders run” discussion, not that I think Peter is really disputing Tango’s claims. 

My published work (the Cairo presentation and the FIELDf/x summit) has two fast baserunners topping out at around 20 mph (30 ft/s).  It wouldn’t surprise me if there is a baserunner somewhere that can go a few mph faster than this, but these were top quintile runners going all out in order to take another base.

I won’t offer up any other specifics since the other data has not been published, but I haven’t seen anything else in the data to suggest that the top speeds on these two plays are inconsistent with the range of top speeds for fast baseball players in general, whether on the bases or in the field.


#39    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 14:49

It all depends on how much smoothing you do to the fielder location data.  I will readily admit that the 35 FPS number that I gave was not smoothed enough, incorporates measurement error, and doesn’t represent actual speeds that MLB outfielders can sustain.  But whatever number you come up with for outfielder speed is predicated on the distance you are measuring the speed over.  Choosing too long an interval will result in a number that is lower than maximum speed, so if maximum speed is what you want you have to compromise.  Others presented information on fielder speed at the summit, but I don’t remember what speeds they measured and what distances they used for their measurements.  And if you are trying to say what the range is for all MLB outfielders we have no way of knowing since the Field Fx data that we were allowed to use only covered 5 teams.  But I would agree with Mike that the maximum speeds for times of .5 to 1 second in duration were closer to 30 FPS than either 25 or 35.


#40    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 14:53

Rally - I used the smiley face.


#41    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 15:05

Right, this is my frame of reference:

We established the peak running time for Usain Bolt at 0.92 seconds per 10m, which is almost 36 feet per second.  Grass, cleats, equipment, looking over your shoulder, and not being Usain Bolt probably puts that at 30 feet per second.

Also, you need the starting time, so we’re not looking at a runner’s peak time, but a runner’s first 3 seconds of running time.  So, while his peak time might be 90 feet in 3 seconds, his actual time to travel 90 feet would be, well, we kinda know that a fast runner would take that in 3.5 seconds or so, or say 25 feet per second.

Use 30 feet per second for top speed, and use 25 feet per second if you include startup time.
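The unit conversions behind those numbers can be checked with a short sketch. The function names are made up for illustration; the inputs (0.92 seconds per 10 meters, 90 feet in 3.5 seconds, and the 20 mph figure from post #38) are the ones quoted in this thread.

```python
FEET_PER_METER = 3.28084
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def speed_fps(distance_ft, seconds):
    """Average speed in feet per second over a distance and a time."""
    return distance_ft / seconds

def mph_to_fps(mph):
    """Convert miles per hour to feet per second."""
    return mph * FEET_PER_MILE / SECONDS_PER_HOUR

bolt_peak = speed_fps(10 * FEET_PER_METER, 0.92)  # peak 10 m split
with_startup = speed_fps(90, 3.5)                 # 90 feet from a standing start
baserunner = mph_to_fps(20)                       # figure quoted in post #38

print(f"Bolt peak split:      {bolt_peak:.1f} ft/s")
print(f"90 feet in 3.5 s:     {with_startup:.1f} ft/s")
print(f"20 mph in ft/s:       {baserunner:.1f} ft/s")
```

The last two come out a little under 26 and a little under 30 feet per second, which is where the round 25 and 30 come from.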

***

I think you are stretching here:

“Choosing too long an interval will result in a number that is lower than maximum speed, so if maximum speed is what you want you have to compromise.... But I would agree with Mike that the maximum speeds for times of .5 to 1 second in duration…”

You seem to be saying that you can legitimately choose a range of 1/15th of a second, and that if you can live with the measurement error, then the speed over that 1/15th will give you a true top speed.

How much difference can a runner’s top speed over a 1 second segment be from a 1/15th second segment?  If Usain Bolt’s top speed was 40 feet per second over a 1 second segment, it might be 41 feet per second over the fastest 1/15th second segment?  Something like that.  (Or if I had to put a good guess at it, I’d say 40.2 or 40.3 feet per second.)

So, I don’t think there’s any gain at the 1/15th second segment level.  And certainly not when faced with the measurement error we are seeing.

I’m glad Peter reported the 35 feet per second, because it gives us a good idea of the measurement error.  If say the measurement error is 0.25 feet, and if we want to be within 1 foot per second, then you would need to measure at least 4/15ths of a second.

Where did I get that?  Well, if the top speed is 30 feet per second, then that means it’s 2 feet per 1/15th of a second.  After 4/15ths of a second, the runner will have travelled 8 feet, but will have been measured to have travelled 8 feet +/- 0.25 feet.  So, 8.25 feet in 4/15ths of a second means he was measured at 30.94 feet per second.

Therefore, best bet to report numbers (and presuming the error range is 0.25 feet like I’m guessing), is to report based on no less than 4/15th of a second (if you don’t want to smooth).
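The arithmetic above generalizes to any error size and tolerance. Here is a minimal sketch, assuming (as in the example) that the error applies to the measured distance over the whole interval and that frames arrive at 15 per second. The function names are made up for illustration, and the 0.25 feet is still only a guess.

```python
import math

FRAME_RATE = 15  # frames per second

def speed_error_fps(distance_error_ft, frames):
    """Speed error in ft/s when the distance over `frames` frames is off by `distance_error_ft`."""
    return distance_error_ft * FRAME_RATE / frames

def min_frames(distance_error_ft, tolerance_fps):
    """Fewest frames needed so the speed error stays within `tolerance_fps`."""
    return math.ceil(distance_error_ft * FRAME_RATE / tolerance_fps)

frames_needed = min_frames(0.25, 1.0)
measured_speed = (8 + 0.25) * FRAME_RATE / 4  # 8.25 feet over 4 frames

print(f"frames needed: {frames_needed}")
print(f"speed error over 4 frames: {speed_error_fps(0.25, 4):.4f} ft/s")
print(f"measured speed: {measured_speed:.2f} ft/s")
```

Three frames would leave an error of 1.25 feet per second, which is why the cutoff lands at 4/15ths of a second.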


#42    Peter Jensen      (see all posts) 2011/04/29 (Fri) @ 15:39

Tango - Somehow we keep talking around each other.  I never said that I measured 35 FPS as a speed for any outfielder at any interval.  I never said that I used 1/15th second intervals.

“You seem to be saying that you can legitimately choose a range of 1/15th of a second, and that if you can live with the measurement error, then the speed over that 1/15th will give you a true top speed.”

The above statement is completely contrary to the point I was trying to make.  When I said “choosing a longer interval”, I didn’t mean longer than 1/15 of a second; I meant longer rather than shorter, i.e. 2 seconds rather than the .5 to 1 second interval that I mention in the last sentence.  When I did my measurements I used 2/15 of a second and I was still getting FPS’s of 60 feet occasionally, so I knew that that was too short of an interval to calculate a fielder’s actual top speed.  When I mentioned 35 FPS it came not from actual calculations but was a number that I thought might be within the range of a very fast outfielder for covering 14 feet, because I didn’t want to exaggerate the location error that I tried to explain in post #32.  It was probably still too high and I think Mike’s 30 FPS is actually closer to reality.  I can see how you have misinterpreted my past comments and I am sorry that I have not expressed myself well.

It would be wrong to use any comment that I have made to try and infer measurement error, as nothing I said was meant to be precise enough to make even a reasonable estimate of measurement error in Field Fx.


#43    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 16:08

Peter, thanks for the more elaborate explanation.  It is very fascinating.

This is intriguing:
“When I did my measurements I used 2/15 of a second and I was still getting FPS’s of 60 feet”

This would give us a measurement error of 4 feet!  That is, sticking with my example of a player’s true speed being 2 feet (*) per 1/15th of a second, that means he would have covered 4 feet in 2/15th of a second (and running at 30 feet per second).  If you got 60 feet per second, that would mean that the measurement error was 4 feet (i.e., measured to cover 8 feet instead of 4, in 2/15th seconds).

(*) I’m so thankful it’s a simple integer like this.

It would be pretty straightforward to figure out the measurement error in feet actually.  I presume this has been done, but has not been publicized.

And I presume the tough part in measuring is that the runner’s body has his arms and legs moving in all different directions, so you can get his back foot in one measurement and his front hand in another one, and then, boom, there’s your 4 feet of spread. 

This 4 feet measurement error would seem to be the maximum limit.  For example, it can go the other way, and you’d be 4 feet too short, and so instead of covering 4 feet in 2/15ths of a second, you’d be measured as covering 0 feet.
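Working backwards from a measured speed to the implied distance error is a one-line calculation. This is a minimal sketch under the same assumptions as the example (a true speed of 30 feet per second, 15 frames per second); the function name is made up for illustration.

```python
FRAME_RATE = 15  # frames per second

def implied_distance_error(measured_fps, true_fps, frames):
    """Distance error in feet implied by a measured speed over `frames` frames."""
    return (measured_fps - true_fps) * frames / FRAME_RATE

too_fast = implied_distance_error(60, 30, frames=2)  # measured 8 feet instead of 4
too_slow = implied_distance_error(0, 30, frames=2)   # measured 0 feet instead of 4

print(f"60 ft/s over 2 frames implies {too_fast:+.1f} feet of error")
print(f" 0 ft/s over 2 frames implies {too_slow:+.1f} feet of error")
```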


#44          (see all posts) 2011/04/29 (Fri) @ 16:28

“It would be pretty straightforward to figure out the measurement error in feet actually.  I presume this has been done, but has not been publicized.”

IIRC, the figure of one yard was discussed at the summit.  In my experience, most of the data is better than that, but not all.

One reason that was given was that while they attempt to track center of mass, it can be difficult to always detect the same parts of the body, depending on shadows, uniform colors, etc.


#45    Tangotiger      (see all posts) 2011/04/29 (Fri) @ 16:38

A 3 foot measurement error, if taking even a one second interval, means a 3 feet per second measurement error.  So, someone running at 30 feet per second will come in at 27 to 33 feet per second.

Hence the reason for the smoothing.

The other thing to test is if the measurement error is random or systematic.  That is, is the measurement error from frame-to-frame independent or not?  (Presumably not.) It’s possible that the error has sort of an inverse relationship so that if it’s +2 feet in one measurement it might be -1.5 feet in another. That’s a good kind of systematic bias (self-correcting).

Fun and games…

