

THE BOOK--Playing The Percentages In Baseball


Friday, June 24, 2011

Is there such a thing as a sure-hit?

By Tangotiger, 10:33 PM

Imagine the field is marked at minus 45 degrees (3B line) to plus 45 degrees (1B line), with 0 degrees at the 2B bag.  The typical SS would be set at minus 17 degrees for the typical play.

Now, suppose the SS, for this particular play, is positioned at minus 9 degrees (halfway between where he normally plays and the 2B bag).  The hitter, for this particular play, hits the ball at minus 22 degrees.  It goes for a hit, and the SS had no chance of getting to the ball, because he was positioned at minus 9 degrees.

But, had he been positioned at minus 17 degrees, he would have had a great chance at an out.

Let’s say most shortstops would have positioned themselves at minus 19 degrees for this particular batter.  They would have had a sure-out.

So, my questions:
1. Do we want to call this batted ball a sure-hit (because we presume where the fielder actually was positioned)?

2. Do we want to call this batted ball a probable out (because we presume where the fielder is typically positioned)?

3. Do we want to call this batted ball a sure-out (because we presume where the fielder should have been positioned)?

Big thanks to Greg for clarifying this important point.
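
The three framings can be made concrete with a minimal sketch. Only the angles come from the post; the out-probability curve is purely hypothetical (a linear falloff, with an assumed 3-degree "sure" radius and an assumed 13-degree maximum reach), chosen so that the three answers come out as described.

```python
def out_probability(ball_angle, fielder_angle, sure_radius=3.0, max_reach=13.0):
    """Hypothetical chance of an out, given the angular gap in degrees.

    sure_radius and max_reach are illustrative assumptions, not measured values.
    """
    gap = abs(ball_angle - fielder_angle)
    if gap <= sure_radius:
        return 1.0
    if gap >= max_reach:
        return 0.0
    return (max_reach - gap) / (max_reach - sure_radius)

BALL = -22.0  # spray angle of the batted ball in the post

# The three reference positions behind questions 1, 2 and 3.
positions = {
    "actual (-9)": -9.0,
    "typical (-17)": -17.0,
    "batter-specific (-19)": -19.0,
}
estimates = {label: out_probability(BALL, angle) for label, angle in positions.items()}

for label, p in estimates.items():
    print(f"{label}: {p:.2f}")
```

Under these assumed parameters the same batted ball is a sure hit, a probable out, or a sure out depending only on which fielder position is taken as the reference point.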


#1    Colin Wyers      (see all posts) 2011/06/25 (Sat) @ 10:25

Where’s the third baseman?


#2    Tangotiger      (see all posts) 2011/06/25 (Sat) @ 10:45

For the purposes of this discussion, he’s at his normal spot, and he had no chance at the ball.


#3          (see all posts) 2011/06/25 (Sat) @ 10:46

What about the shortstop’s typical first step before contact based upon pitch type and location?

If the shortstop was out of position because he read the pitch and/or batter’s swing wrong, then I’d still want to blame that on the fielder, and I’m not sure I’d want to lump that in with “positioning”.

But even apart from that, it depends on what you are trying to do with the data.


#4    Tangotiger      (see all posts) 2011/06/25 (Sat) @ 10:59

For the purposes of this discussion, Ozzie Smith would not have made the play, if he was positioned at -9 degrees.

This was a play where, based on the fielder’s positioning, as soon as the ball was struck, everyone on the field knew it was a hit.


#5          (see all posts) 2011/06/25 (Sat) @ 11:08

What question are you trying to answer with this data?  That will determine the correct approach.


#6    Tangotiger      (see all posts) 2011/06/25 (Sat) @ 11:38

I’m trying to establish what each individual person (you guys) has as an assumption in terms of seeing a batted ball.

So, of the three questions I have posed, which one do you agree with the most?

You see a ball go halfway between the SS and 3B for an easy hit.  But, had the SS been positioned where he “should” have been, it would have been an easy out.

Do we want to call that batted ball an easy hit, an easy out, or a probable out?


#7          (see all posts) 2011/06/25 (Sat) @ 11:50

Just caught up on the other thread.  Didn’t realize the nature of the discussion, since it was so far from the original topic. 

The direct answer to your question here:  it depends on at least two things:
1. Why the fielder was positioned where he was (situational, personal preference, hitter tendency)
2. (What Mike said) What you are trying to evaluate: the hitter, the pitcher, the fielder (including positioning as a skill), or the fielder (excluding positioning as a skill).  Either way, I don’t think you ever want to call it a sure out.

Also, from the other thread, I confess myself a little disappointed in you guys, who are typically very up-to-date on current research.  I usually avoid pointing out my own work, but in The Hardball Times 2011, I re-did the Dudek study, except I used the entire 2010 season, rather than just eight pitchers.  The timer data was collected the same way Dudek’s was. 

I’ll assume it was skipped over because it’s not available online (to my knowledge).  Well, here’s the chart:

Timer Group (s)   Balls In Air   Outs   Out Ratio   Dudek Out Ratio
1.5-2.0           4148           3      0.1%        0.8%
2.0-2.5           5187           184    3.5%        4.5%
2.5-3.0           5523           1676   30.3%       28.3%
3.0-3.5           5513           2818   51.1%       41.1%
3.5-4.0           5237           3383   64.6%       55.9%
4.0-4.5           5250           3940   75.0%       67.4%
4.5-5.0           5172           4490   86.8%       80.1%
5.0-5.5           5473           5153   94.2%       86.5%
5.5-6.0           4887           4742   97.0%       95.9%
6.0-6.5           3035           2953   97.3%       97.3%
6.5+              937            907    96.8%       96.4%

That being said, the chart doesn’t solve your “bimodal” question at all.  A 4.0 second fly ball might be a guaranteed hit in some locations and a guaranteed out in all others.  It could be 56 percent happened to be in the guaranteed out regions. 

However, later in the same article, I incorporated hit location to show a contour map of the full outfield, based on time and location.  I’ll leave you to pull out the book yourself (why don’t you have it handy?) and interpret on your own.  Of course, this doesn’t split positioning out from a fielder’s range.

Colin and Mike, wouldn’t a potential caught/not caught location bias actually make the gradient even more gradual?


#8    Nathaniel Dawson      (see all posts) 2011/06/25 (Sat) @ 12:37

Was he positioned where he was because of instructions from the dugout? Or because of his own initiative?

I don’t think that’s something we could ever know, and without that information, it becomes a pretty tough call how to assign the responsibility for that play.

I’d have to call it “B”, but that’s still problematic, in my mind.

I don’t see how “C” works at all, as you could take it to its extreme and say that, for any batted ball in play, a fielder should have positioned himself properly to make that play, and everything would end up as a sure out.


#9    Tangotiger      (see all posts) 2011/06/25 (Sat) @ 14:48

Now, what if the SS is still positioned at the -9 degrees spot, all other SS would position themselves at -19, and the ball was hit at -8 degrees?

Which of the three answers do you choose here?


#10    Guy      (see all posts) 2011/06/25 (Sat) @ 18:13

Ben makes an important point, which is that just looking at hang time—or any other single dimension—tells us very little about how bi-modal the distribution of airballs really is. The notion that only 17% of airballs have a 90%+ out probability seems implausible, even if you exclude IFs.  The mid-range balls (3-4 seconds) are not all mid-probability out balls, but rather a mix that includes many low- and high-probability balls.  A 4 second ball that travels 220 feet to LF is a sure out, a 4 second ball off the fence in left-center may be a 15% out ball. Ben:  do you have an estimate of the proportion of airballs that are 90% or more (assuming fielders in the average position)?

Once you consider hangtime, distance, hitter and pitcher handedness, base/out, and identity of individual hitter (likely to be a big factor), I’d guess at least 40-50% of airballs are in the 90% out category.


#11    TCQ      (see all posts) 2011/06/25 (Sat) @ 20:34

I think we can safely assume that fielder positioning will mostly be done pretty effectively, either by the fielder himself or the dugout. So the batter creates a sure hit by hitting against the scouting report, in that way.


#12    Tangotiger      (see all posts) 2011/06/25 (Sat) @ 22:00

Ben I think describes it well enough.  And if TCQ’s position is the basic assumption, then this illustration also describes the situation we see.  That by aggregating the data using minimal parameters, out rates at wide ranges get averaged out into middle points, thereby losing the extreme rates.

This is why we see out rates on GB at 40%-90%.  I think we’ve all seen enough baseball to think of a ton of ground balls at the 95% if not 99% level.

So, when we see out rates based just on spray angle of 90% somewhere, you know that that’s a collection of 70% - 100% out rates in there.

And so, human observation that will capture additional parameters becomes important, up until the point that we can capture electronically those parameters.

Therefore we’ll see a bunch of batted balls in the 90%-100% out levels.  And if we see a bunch there, we’re going to see a fair amount at the 0%-10% to balance out.  And then, in-between, we’ll see a spreading out at each level.

My expectation is that we’ll get bimodal.

Again, presume we make the position of the fielder as the starting point.
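
A minimal sketch of the averaging effect described in this comment. The per-ball out probabilities below are made up for illustration, and the three buckets are hypothetical stand-ins for coarse groupings such as spray angle alone.

```python
from collections import Counter

# Hypothetical per-ball out probabilities inside three coarse buckets.
buckets = {
    "bucket A": [0.99, 0.98, 0.95, 0.72, 0.05],
    "bucket B": [1.00, 0.97, 0.12, 0.05, 0.02],
    "bucket C": [0.99, 0.97, 0.95, 0.92, 0.72],
}

def band(p):
    """Label the 10-point band a probability falls in, e.g. 0.95 -> '90-100%'."""
    low = min(int(p * 10), 9) * 10
    return f"{low}-{low + 10}%"

# What a coarse model reports: one averaged out rate per bucket.
bucket_means = {name: sum(ps) / len(ps) for name, ps in buckets.items()}
per_bucket = Counter(band(m) for m in bucket_means.values())

# What the individual balls look like before averaging.
per_ball = Counter(band(p) for ps in buckets.values() for p in ps)

print("bucket averages:", dict(per_bucket))
print("individual balls:", dict(per_ball))
```

In this toy data 12 of the 15 balls sit in the 0-10% or 90-100% bands, yet two of the three bucket averages land in the middle of the range, and no bucket shows up in the 0-10% band at all.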


#13    Greg Rybarczyk      (see all posts) 2011/06/25 (Sat) @ 22:02

TCQ #11, I don’t see how you can make that assumption.  A few years back I carefully analyzed every batted ball hit by three players over a period of three seasons, one of whom was Andruw Jones.  I was able to identify two identically struck balls, medium speed grounders just to the left of second base.  Against the Cardinals, who were playing him only a couple feet swung towards 3B, it went through clean for a single to CF.  Against the Brewers, who were playing a strong shift, the ball went directly at the 2B Rickie Weeks, who only had to lean over, drop his glove and scoop it up.

Now, you can argue which positioning deserves to be called “effective”, and you can’t necessarily judge that off one ball, but my research tells me that the Brewers were doing the right thing, as Jones only hit two balls on the ground to the right side that would have required a second baseman in the traditional location; everything else could have been fielded by the pitcher or first baseman.


#14    Brian Cartwright      (see all posts) 2011/06/25 (Sat) @ 23:51

I’ll be the contrarian. I don’t make the binary choices of sure hit or sure out. I will assign odds on each ball as part of determining an expected value.

(Numbers for illustration only)
The ss is positioned at -9. The ball goes for a hit at -22. I credit the hit to the ss. He’s still at -9, the ball is at -8, it’s an out. Two opportunities, one hit.

After I’ve counted all the hits allowed and opps, I’ll calculate the expected hits for the time period. The ball at -22 is overall a 5% hit, while the ball at -8 is an 80% hit. 1 hit allowed, 0.85 hits expected, this ss is -0.15 plays in 2 opps.
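
The arithmetic in this example can be written out as a short sketch. The numbers are the illustrative ones from the comment; the variable names are assumptions made here.

```python
# Each opportunity: (ball angle, league-wide hit probability, 1 if a hit was allowed).
opportunities = [
    (-22, 0.05, 1),  # SS at -9, ball goes through for a hit
    (-8, 0.80, 0),   # SS at -9, ball hit almost at him, out
]

hits_allowed = sum(hit for _, _, hit in opportunities)
expected_hits = sum(p for _, p, _ in opportunities)

# Positive means fewer hits allowed than expected; negative means more.
plays_vs_average = expected_hits - hits_allowed

print(f"{hits_allowed} hit allowed, {expected_hits:.2f} expected, "
      f"{plays_vs_average:+.2f} plays in {len(opportunities)} opps")
```

Note that the fielder's actual position never enters the calculation: only the league-wide hit probability for each ball location does.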


#15    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 01:04

Let’s, for a moment, think as though we don’t care which fielder makes a play, just whether or not any fielder makes a play. (This does not describe every question we may care to answer, but it describes many more questions about balls in play than it doesn’t.)

In order to figure our expected outs, then, what we are looking for is, in essence, “how often does the defense as a unit record an out (or allow a hit/ROE) based on the outcome of the batter-pitcher matchup?”

So our baseline for comparison is, given the same batter-pitcher matchup and outcome, how would the average defense have performed? So there are some things that drive defensive positioning for the average team: handedness, are there runners that need to be held, etc. So your baseline is going to include SOME effects of positioning implicitly, even if you ignore it explicitly.

But above and beyond those effects, the positioning of the fielder to reflect either the particular skill of the fielder (positioning a less rangy shortstop further back to increase his range at the expense of giving him less time to make a play, for instance) or an ability to better predict the location of the batted ball (Greg’s Andruw Jones example is one such case) should not count towards our expected outs, at least in terms of figuring out how well the defense performed relative to average.


#16    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 01:44

Colin and Mike, wouldn’t a potential caught/not caught location bias actually make the gradient even more gradual?

Okay, let’s step through this. Let’s cut the field into 22 zones, a la the STATS diagram:

http://www.baseballthinkfactory.org/szymborski/zrgrid.jpg

Let us consider only ground balls, and let us presume that there are no classification issues on the GB/LD border (there are, but let’s take this one thing at a time). Now, let’s simplify our model and assume that the direction of BIP is totally random - whenever a ball is put in play, it has an even chance of landing in any one of the zones. This is of course not true at all, but in terms of this sort of thought experiment you lose very little in explanatory power in exchange for a sharp reduction in complexity.

Let’s say we have 1,100 BIP, or 50 per zone. And let’s assume that in the three zones flanking the typical fielder’s starting position, you have an expected out rate of .80, and in all other zones you have an expected out rate of .70. (Do I think this reflects reality? Not especially, but it gives me some numbers that are easy to play with and again, this is a thought experiment.) Now, this is what occurs when I move 10 outs from every “out of zone” area immediately adjacent to a fielder’s “zone,” and place those outs in a fielder’s zone:

http://www.editgrid.com/user/cwyers/biz_example

Look at, for instance, the zone boundary between E and F. In the “true” data, we see a difference in expected out rates between the two zones of .10. In the biased data, we see a difference of .19, nearly twice that.

Again, these are numbers I just made up for the sake of illustration. The reality is MUCH more complicated than this. But it should make it understandable WHY a hit/no hit bias in the location data of the sort we’ve discussed here in the past would make the hit/out probabilities on BIP look more bimodal than they are in actual fact.
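
A rough sketch of the thought experiment for a single zone boundary. The linked spreadsheet is not reproduced here, so this is only one reading of the setup: it assumes each moved out takes its ball in play with it, and that only one neighbouring zone feeds the fielder's zone. Under that reading the biased gap comes out near .21 rather than the .19 quoted above, so the spreadsheet presumably allocates the moved outs somewhat differently; the direction of the effect is the same.

```python
BIP_PER_ZONE = 50
MOVED_OUTS = 10  # outs recorded in the fielder's zone instead of the adjacent zone

def out_rate(outs, bip):
    return outs / bip

# "True" data: the fielder's zone at .80, the adjacent zone at .70.
in_zone_outs = 40   # .80 * 50
adjacent_outs = 35  # .70 * 50
true_diff = out_rate(in_zone_outs, BIP_PER_ZONE) - out_rate(adjacent_outs, BIP_PER_ZONE)

# Biased data: the moved outs (and their BIP) shift across the boundary.
biased_in = out_rate(in_zone_outs + MOVED_OUTS, BIP_PER_ZONE + MOVED_OUTS)         # 50 / 60
biased_adjacent = out_rate(adjacent_outs - MOVED_OUTS, BIP_PER_ZONE - MOVED_OUTS)  # 25 / 40
biased_diff = biased_in - biased_adjacent

print(f"true difference:   {true_diff:.3f}")
print(f"biased difference: {biased_diff:.3f}")
```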


#17    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 02:16

And before this gets pointed out - yes, the magnitude of the effect is potentially a function of zone size. I chose the STATS zone diagram because it is conveniently published on the Internet and the zones are consistently sized the same in terms of degrees, as opposed to the Project Scoresheet zones which are not. (The STATS zone sizes are, I believe, used in UZR regardless of the data source.)

In a metric like Fielding Bible Plus/Minus, you have many more “zone” boundaries (my understanding is that what BIS does is not, strictly speaking, a zone system but somewhere between a zone system and the sort of continuous distribution we see in something like SAFE). That makes the change in expected outs at the zone boundaries due to range bias less severe, but it means that many more balls pass over one (or more) zone boundary due to range bias.

In a metric like Peter’s BZM or Rally’s TZL, you have many fewer zone boundaries. So range bias would cause fewer balls to cross zone boundaries, but the change in expected outs due to zone boundaries should be more severe.

So changing the sizes of the zone may well have an effect on how much range bias affects expected outs. (I say may well because it’s possible that everything offsets and the zone size simply does not matter; note I say “possible” and not something such as “likely.”) It’s not obvious to me which is the “correct” zone size, however. (And that could well be a function of the data set - it’s possible that the Fielding Bible “zone size” is optimal for BIS data, and that the “big zone” metrics are correct for Gameday data. I doubt this - I think that, in broad strokes, the “correct” zone size will be the same for everyone. But that’s a total supposition on my part, and it’s not a strongly held one.)

Of course, there are a lot of confounds to be able to even test this - there’s an inherent tradeoff in zone sizes between sample size and predictive ability, and this is likely to be true even if there is no bias in the data whatsoever. But I think the example I posed is a useful illustration of the effects of range bias on expected outs - the size of zone may change the magnitude of the effect, but I don’t believe it would change the direction of the effect.


#18    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 02:46

As regards your particular THT Annual study, Ben:

* Yes, I’ve seen it.

* Yes, it being only in print and not online makes it harder to cite in discussions like this.

* Yes, I think your hang time data is likely to be better than what Dudek collected.

* But for the purposes of this discussion, I don’t think being better makes a difference, though. The two (as well as my Hit F/X study) are in broad agreement, so far as I can tell. I don’t think anyone’s analysis turned on the accuracy of any one data point in Dudek’s results; using your data instead of Dudek’s wouldn’t have changed the discussion, and as you note, the Dudek data was readily available.

* I think you make a valid point in the abstract about the impact of hit location, but given my concerns about range bias in stringer-collected hit location data I don’t think your particular study gives us much insight as to the interaction of a ball’s actual landing point with the hang time in figuring the out rate. As I demonstrated above, there’s potential for the hit location data to overstate the effects of hit location on batted ball out rates.


#19    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 09:05

Brian: the question on the table is if you include positioning as one of the parameters or not, when determining if you have a sure hit or a sure out.

So, if a ball is hit right at a fielder, but a fielder was terribly positioned, in your system, you are not going to call that a sure-out.  Indeed, you are going to call that a probable-hit.

It is important if we’re going to have a discussion of bimodal what the assumption is.  Hence the purpose of my questions.  With Brian’s assumptions, bimodal is impossible.


#20    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 09:14

So our baseline for comparison is, given the same batter-pitcher matchup and outcome, how would the average defense have performed?

That is YOUR assumption, not “our” baseline. And, it’s pretty much what Brian is saying.

And that’s fine.  On that basis, you are not going to have 50% of batted balls being sure or easy outs (90% to 100% expected out).

If we were to ask the typical baseball fan how many easy or sure outs there are, I’m guessing they’d say 50%.  So, THEIR assumption is that positioning is a given, and they’re not basing their expectations based on “average” fielding alignment.

Therefore, there are two right answers:
1. There is no hope to finding bimodal (assumption based on average fielding alignment, given various parameters).

2. You likely will find bimodal (assumption based on the particular fielding alignment for each of the 190,000 PA).


#21    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 10:21

If we were to ask the typical baseball fan how many easy or sure outs there are, I’m guessing they’d say 50%.  So, THEIR assumption is that positioning is a given, and they’re not basing their expectations based on “average” fielding alignment.

Okay, but the typical baseball fan can’t distinguish between positioning and everything else that occurs between when the pitcher begins their delivery and when the camera cuts to the fielder - so we’re lumping in things like first step, acceleration, etc. under the notion of “positioning.” If we do THAT, then yeah, we’re going to see a more bifurcated set of sure hits/outs. But in doing so we’re ignoring a lot of the things a fielder does in order to turn a ball in play into an out.


#22    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 10:27

And when I say:

the typical baseball fan can’t distinguish between positioning and everything else that occurs between when the pitcher begins their delivery and when the camera cuts to the fielder

Nobody watching on television can do it either, and neither can most people watching in person. Simply put, you can’t watch the batter-pitcher matchup and the fielder’s initial reaction to a BIP at the same time. On TV, you have no options - you watch whatever the producer decides you will watch. At the ballpark it’s possible to watch the fielders exclusively, but most people are going to follow the ball.


#23    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 10:30

Right, and I’m not disagreeing with that.

It’s a matter of if there’s a fixed environment that a viewer sees (190,000 PA), since a SS cannot play 20% of the time here, 30% of the time there, 10% of the time over here, and 40% of the time over there for any single given PA.  That viewer is going to see him at one spot for any given PA.

That’s why that fan, seeing a ball hit at -8 degrees with the SS at -9 degrees, even though he had no business being there, will call that a sure-out, while Brian would call it a probable-hit.

It’s a matter of assumptions.  So, we can only have a discussion of the bimodal if we understand the assumptions.


#24    Guy      (see all posts) 2011/06/26 (Sun) @ 11:00

Therefore, there are two right answers:
1. There is no hope to finding bimodal (assumption based on average fielding alignment, given various parameters).
2. You likely will find bimodal (assumption based on the particular fielding alignment for each of the 190,000 PA).

I don’t see these as the choices.  I think we want the reference point to be the average outcome for the hitter/pitcher matchup at hand, given the base/out situation and park.  We can’t ever measure that exactly, both because of sample size limits and the fact that the pitcher often plays with specific fielders.  But that’s what we want to approximate.  To the extent teams use a variety of positionings for a hitter (e.g. Jones), our baseline should be the average outcome across those positionings.  As Colin says, we want to know how often, on average, all defensive units record an out “based on the outcome of the batter-pitcher matchup.”

In practice, I think you want to control for hitter and pitcher handedness, score/out/base, park, the specific hitter (as best you can), and perhaps the pitcher’s Air/ground tendency.  Once you do all of that, I don’t see any reason to believe you won’t see a bimodal distribution.


#25    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 11:23

Guy, I can think of a lot of reasons to believe you won’t see a bimodal distribution. Look at Peter’s presentation here:

http://baseball.sportvision.com/summit/archive/2009

And tell me if you think that represents a bimodal set of fielding chances. And if we look at stringer-provided batted ball data, we still don’t see the stark bimodal outcomes that you’re predicting, even though we have good reason to suspect that they are biased in such a way to encourage that sort of outcome.

The only piece of data we have to support the bimodal hypothesis is Sky’s study that Tango referenced earlier, where he watched some arbitrary number of plays and estimated the hit/out probability based on four or five buckets. I just went and watched four or five highlights of grounders hit to short here:

http://mlb.mlb.com/search/media.jsp?mlbtax_key=defense

I used a stopwatch on one or two of them and took another and went frame-by-frame, and by my estimate roughly a third to half of the time a shortstop has to field any given grounder has transpired before the camera switches to him. Sky’s “sure hit” bin is going to include a vastly disproportionate number of times where the shortstop performed poorly over that period of time, and his “sure out” bin is going to include a vastly disproportionate number of times where the shortstop performed well over that period of time.


#26    Guy      (see all posts) 2011/06/26 (Sun) @ 11:24

I think you make a valid point in the abstract about the impact of hit location, but given my concerns about range bias in stringer-collected hit location data I don’t think your particular study gives us much insight as to the interaction of a ball’s actual landing point with the hang time in figuring the out rate. As I demonstrated above, there’s potential for the hit location data to overstate the effects of hit location on batted ball out rates.

That bias may or may not prove to be large.  But whether or not location has yet been measured well, it’s obvious that location makes a huge difference on outcomes.  To just ignore location and accept the idea that only 17% of airballs are 90% probability outs, as Mike Fast seemed to in the other thread, doesn’t make sense to me.  If forced to choose between using potentially-biased location data and relying on hangtime alone, I’d happily bet on Ben’s data to be closer to the truth.  Obviously, what we want ideally is unbiased data on both location and hangtime.  But until you have that, it’s not at all clear to me your best choice is “ignore location.”


#27    Guy      (see all posts) 2011/06/26 (Sun) @ 11:26

Colin/25:  My comments have all been about airballs.  I don’t have an opinion (yet) on how bimodal GBs are.  I’ll check out Peter’s presentation when I have the time.....


#28    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 11:39

But whether or not location has yet been measured well, it’s obvious that location makes a huge difference on outcomes.

I don’t think that’s at all obvious, especially given that “huge” is such a vague term.

(And again - I think the important question to resolve first is how the batted ball trajectory affects the overall probability of a hit versus out, not the probability for any particular fielder.)


#29    Guy      (see all posts) 2011/06/26 (Sun) @ 11:49

“Huge” is indeed imprecise.  But I do think it’s obviously true that the combined impact of hangtime and location is very substantial.  Disentangling the share of the outcome variance that comes from location alone is hard, if it’s possible at all.

One point on GBs:  if it is true that there are far fewer 90-95% out GBs than our observation seems to tell us, then I think it’s clearly a mistake for metrics like UZR and TZ to treat errors the way they do.  In both cases, an error is assumed to provide us information about how many “automatic” GBs that fielder had.  If positioning and range can make a 70% ball appear to be a 95% ball, and vice versa, then a fielder booting an apparently easy GB really tells us very little about his actual opportunities.  And my sense is that this is probably the case....


#30    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 11:49

Given that huge is such a subjective term, let me restate the above in a hopefully less subjective way:

I don’t think it’s obvious that location is more important than hang time in determining out probability for air balls. And in order for there to be a bimodal distribution of expected outs on air balls, location would have to be more important than hang time in determining out probability for air balls, given the continuous, unimodal distribution we see for hang time.


#31    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 11:50

One point on GBs:  if it is true that there are far fewer 90-95% out GBs than our observation seems to tell us, then I think it’s clearly a mistake for metrics like UZR and TZ to treat errors the way they do.  In both cases, an error is assumed to provide us information about how many “automatic” GBs that fielder had.  If positioning and range can make a 70% ball appear to be a 95% ball, and vice versa, then a fielder booting an apparently easy GB really tells us very little about his actual opportunities.  And my sense is that this is probably the case....

I would agree with all of this.


#32    Brian Cartwright      (see all posts) 2011/06/26 (Sun) @ 11:56

Tango:

I believe that positioning is a skill, one of several skills.

If there’s a location where, for the league as a whole, a ball is an out 95% of the time, but the fielder chooses to stand someplace else and then doesn’t get the ball that’s hit there, it should be counted against the fielder. On the other hand, if the fielder can consistently put himself in the optimal position, so that on average he makes more outs than expected, then he should get credit for being above average.

Colin:

I agree with much of what you’ve said in this thread. If we make our bins too small, they start getting lumpy and showing biases. Too large, they are not as predictive.

I broke outfield air balls into the 22 slices and it was easy to see that the distribution of outs bunched towards the fielder’s locations while the distribution of hits bunched away from the fielders. Seven outfield slices were the most I was comfortable with to get a smooth unbiased distribution.

On defense, I’ve only been using the horizontal slices to assist in assigning responsibility for hits when it’s not clear from the Gameday record. For grading infielders on their ground ball hits to the outfield allowed, in addition to the vector, I use park factors, bat handedness, if it’s a dp situation and if the runner is held at first (from base/out) to determine an expected value, the last three as proxies for standard positioning.


#33    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 12:24

Brian: I am not talking about whether positioning is a skill or not.

All I am asking is this: every other SS would stand at minus 19 degrees for a particular context (this batter, this pitcher, this park, this base/out, this count, this expected pitch type and location), but this SS stands at minus 9 degrees.

This particular ball is hit at minus 8 degrees (a line shot).  This SS barely moved.  It was an out.

Do you, Brian Cartwright, count that as a probable-hit or an automatic out?

I’m not talking about valuation.  I’m not talking about player skills.  I’m just saying, you happen to watch this game, you see this play, you see the SS looking completely out of position, and he makes an easy line out.

Guy: same question to you.


#34    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 12:34

And if you need to qualify your answer, do so.  But give me an answer.

Colin, Mike and anyone else: question applies to anyone wanting to answer.


#35    Peter Jensen      (see all posts) 2011/06/26 (Sun) @ 13:23

And in order for there to be a bimodal distribution of expected outs on air balls, location would have to be more important than hang time in determining out probability for air balls, given the continuous, unimodal distribution we see for hang time.

Colin - The bimodal distribution that Sky showed in his study was entirely due to the particular method that he used to graph the data. It is an artifact of choosing to use “Percent Outs” for the buckets on the X axis and “Percent of BIP” on the Y axis.  If you graph either Dudek’s data or Ben J.’s data in the same manner they end up being bimodal as well, not “continuous and unimodal” as you state above.


#36    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 13:32

It’s important to note Sky’s assumption: he’s saying what would happen had that particular situation repeated itself an infinite number of times (with no learning, naturally).  So, not only the same batter, pitcher, park, count, runners, etc, but the SAME FIELDERS.  It’s not an “average” fielder, but those particular fielders.  That’s his assumption.

Under those assumptions, you are going to get automatic hits and automatic outs.


#37    Peter Jensen      (see all posts) 2011/06/26 (Sun) @ 13:32

And I should add, so will any graph be bimodal that has buckets of “out percentage” on the X axis and “Distance from fielder”/hang time on the Y axis.
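
A minimal sketch of that re-graphing, using the 2010 counts from the table in comment #7 (the Dudek sample sizes are not given in this thread, so only the 2010 data is used). Each hang-time group is dropped into a 10-point "percent outs" band, and the share of all balls in each band is tallied. The band width and the text bar chart are choices made here for illustration.

```python
# (balls in air, outs) per hang-time group, from the table in comment #7.
groups = [
    (4148, 3), (5187, 184), (5523, 1676), (5513, 2818),
    (5237, 3383), (5250, 3940), (5172, 4490), (5473, 5153),
    (4887, 4742), (3035, 2953), (937, 907),
]

total = sum(balls for balls, _ in groups)

# Index 0 is the 0-10% outs band, index 9 is the 90-100% band.
share_by_out_band = [0.0] * 10
for balls, outs in groups:
    band = min(outs * 10 // balls, 9)
    share_by_out_band[band] += balls / total

for i, share in enumerate(share_by_out_band):
    label = f"{i * 10}-{i * 10 + 10}%"
    print(f"{label:>8} outs: {share:5.1%} of balls  {'#' * round(share * 50)}")
```

With these counts the two end bands hold the largest shares of balls, while each occupied middle band holds roughly a tenth. Part of that bunching comes from the half-second hang-time groups being coarse, so several groups fall into the same band at either end.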


#38    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 13:45

If you make “Expected outs” based on THOSE fielders, for that context, you’ll get bimodal.

If you make “expected outs” based on “average” fielders (in talent and positioning), for that context, you may get bimodal but you probably won’t.


#39          (see all posts) 2011/06/26 (Sun) @ 15:17

But whether or not location has yet been measured well, it’s obvious that location makes a huge difference on outcomes.  To just ignore location and accept the idea that only 17% of airballs are 90% probability outs, as Mike Fast seemed to in the other thread, doesn’t make sense to me.  If forced to choose between using potentially-biased location data and relying on hangtime alone, I’d happily bet on Ben’s data to be closer to the truth.  Obviously, what we want ideally is unbiased data on both location and hangtime.  But until you have that, it’s not at all clear to me your best choice is “ignore location.”

Whoa, whoa, whoa, wait a minute.  Tango said in the other thread that the “guaranteed” starting point for any discussion here should be a bimodal distribution and that Sky’s study was evidence for that.  I pushed back and said that Dudek’s study was far better, and that I did not see any evidence for a bimodal distribution, and that the evidence I had seen showed a much smoother distribution.  At no point did I claim that someone couldn’t improve on what Dudek had done or that it was the final and perfect word on the subject.  I was disagreeing that a bimodal distribution should be assumed, no questions asked.  If one wants to posit a distribution, I suggested one ought to investigate either hang time, as Robert Dudek (and Ben Jedlovec) did, or the HITf/x data, but that asserting it based upon watching balls on a TV feed was very inadequate.  Starting with assumptions about how the data should look is a recipe for bad conclusions.  That was my point.  If there is good evidence of bimodality, other than someone’s conception of how the game should behave, I am very open to it.


#40          (see all posts) 2011/06/26 (Sun) @ 15:23

Tango/34, as long as you ignore my question, I can’t answer yours.

I thought it was well accepted here that there was not one correct form for data or statistics but that it was critical to determine first what question you were trying to answer with the data.

So until you say what you want to do with the data, my answer will remain “It depends.”


#41          (see all posts) 2011/06/26 (Sun) @ 15:33

Guy, one of the important results of Dudek’s study was that hang time was far more predictive of out probability than location.  He used only 12 outfield zones, for whatever that is worth.  He did not use any combination of zones plus hang time, probably because sample size would have been too small, I’m guessing.

I don’t like going into the data with the assumption that it’s going to conform to what we think we’ve learned from the last decade or so of batted ball tracking.

One thing that impressed me from FIELDf/x data is how much it matters what fielders do within the first second or less that the ball is in the air.  Speed and routes and so forth matter, too, but the first half second or second that the fielder is in motion (which can start before the ball is hit) makes a huge difference.

I was also impressed/intrigued by Matt Thomas’s work on how positioning is done.

I believe there is a lot to be learned about these issues.  I don’t believe that a pre-existing assumption of lots of sure hits and sure outs helps us in any way to be aware of the subtleties of how fielding works.


#42          (see all posts) 2011/06/26 (Sun) @ 15:47

If you graph either Dudek’s data or Ben J.’s data in the same manner they end up being bimodal as well, not “continuous and unimodal” as you state above.

Peter, Dudek’s data has a slight trough in the middle if you graph it that way.  However, it has a lot more balls in the middle than Sky’s distribution did.  I, at least, am using “bimodal” here to mean not just that the buckets at high and low out probabilities have more occurrences than the buckets with middle out probabilities, but that the large majority of balls fall in buckets with out probabilities of <10% or >90%.  Dudek’s data doesn’t fit that criterion.  Ben’s data has about half and half.

Also (not necessarily directed to Peter), as Guy’s comments show, perhaps I did not make clear earlier that I do not believe it to be impossible that out probabilities are strongly bimodal.  It’s that I have not seen good evidence of that.  So people saying that this effect or that effect might cause things to be more bimodal than Dudek, etc.  Well, fine.  But we shouldn’t act on “might be”.  If there is evidence that it IS so, then we can act on that.  We have enough data available from HITf/x that we’re not restricted to speculation.


#43    Brian Cartwright      (see all posts) 2011/06/26 (Sun) @ 15:48

Tango said:
I’m just saying, you happen to watch this game, you see this play, you see the SS looking completely out of position, and he makes an easy line out.

The outcome of one play does not tell you whether the player made a correct decision. A runner trying to take third can be a bad play, one they should be scolded for, even if they are safe. If the SS can continue playing “out of position” but continue to make plays, then they may prove it to be the correct decision.

Tango said:
I’m not talking about valuation.  I’m not talking about player skills.

I’m not sure how I’m supposed to determine a player’s skill without doing a valuation. Dammit Jim, I’m a quant, not a scout.

Tango said:
All I am asking is this: every other SS would stand at minus 19 degrees for a particular context (this batter, this pitcher, this park, this base/out, this count, this expected pitch type and location), but this SS stands at minus 9 degrees. This particular ball is hit at minus 8 degrees (a line shot).  This SS barely moved.  It was an out.

I use two aggregates, actual plays made and expected plays made, and then find the difference.

The actual catch goes into the actual plays made bin. For the league as a whole, it’s a low probability play, so I’ll put 0.2 catches in the expected bin. That play is a plus for the fielder. But, as I said in the first point, if he stands at -9 long enough, when everyone else is at -19, I believe the odds are he will miss more plays than he makes, and in a large sample size will likely prove to be an unwise strategy, one that I will dock his defensive rating for.


#44    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 16:26

Mike, please don’t say I ignored you.  Tango/6 was in response to Mike/5.

You can say that I didn’t answer it well enough, but don’t go farther than that.

***

It’s fascinating to see Brian’s responses because they are so myopic (not saying that in any offensive way, just an observation).  I keep saying to forget about valuation, and forget about player skills. But every single line of his responses somehow has to do with valuation or player skills!

***

What is fascinating is how the prevailing responses throughout this thread keep the pitcher, batter, park, runner, count, and batted ball as constants, but simply presume “Average” fielders.  Rather than pondering a situation simply to discuss what you actually see in the here-and-now (i.e., ALSO keeping the fielders constant), it’s instead about “the average fielder”.

On that basis, you likely will not have bimodal.  How could you?  You’ll never get a scenario where a good number of batted balls would have had a 90%-100% chance of being a hit.  It’s going to be rare.

I look at baseball, and I see a large number of sure-outs or high-probability-outs.  Say at least 50%.

But most of the responses here won’t accept that, because they “float” the fielders, meaning they have different fielders at different positioning and asking how often would that play have been made by those combination of fielders and their likely positioning.

Like I said, no right or wrong answer here.  It’s a matter of what your assumption is.

If your assumption is as mine is, and as Sky’s implicitly was, then almost certainly, you get bimodal.  There’s no way around that.

If you reject that assumption for discussion purposes, then of course you have to reject the entire argument.


#45    Brian Cartwright      (see all posts) 2011/06/26 (Sun) @ 16:40

OK, I did misread a line, which contributed to my confusion

>I’m not talking about valuation.  I’m not talking about player skills.

I didn’t see the “not” in the player skills

I do agree that up to 90% of plays are either sure hits or outs, only 10% depend on the fielder.

I was trying to point out that I reject your initial premise, that’s just not how I model it, so that might also contribute to not really understanding what you’re asking. I was trying to explain that I just don’t do it that way.


#46    Tangotiger      (see all posts) 2011/06/26 (Sun) @ 16:43

Let’s say for example we follow my assumption (we look at the actual fielders and where they were actually positioned), and let’s say you have this as the chance of making an out, and the frequency at each out range:

5% 0.05
15% 0.05
25% 0.05
35% 0.05
45% 0.05
55% 0.05
65% 0.05
75% 0.05
85% 0.05
95% 0.55

The “5%” means “0% to 10%” and so on.

If you add that all up, you get 0.725 outs per BIP.  That’s too many outs!  In order to get fewer outs, you need to shift the frequency down so that the 0% to 10% out range has a higher frequency, and the rest in the middle has a lower frequency.  That’s bimodal.

On the other hand, if you have this:
5% 0.055
15% 0.055
25% 0.055
35% 0.055
45% 0.055
55% 0.055
65% 0.055
75% 0.055
85% 0.055
95% 0.505

You get 0.703 outs per BIP, which is the league average.  No bimodal.

Is the above possible?  Well, it all depends how many high-prob outs you get.

Again, and all based on the assumption noted.
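
The arithmetic behind those two tables can be checked with a short sketch. It assumes, as stated above, that each bucket is represented by its midpoint (5% stands for 0% to 10%, and so on); the names `MIDPOINTS` and `outs_per_bip` are made up for illustration.

```python
# Midpoint of each 10-point out-probability bucket: 0.05, 0.15, ..., 0.95.
MIDPOINTS = [0.05 + 0.10 * i for i in range(10)]

def outs_per_bip(frequencies):
    """Expected outs per ball in play: sum of (bucket midpoint * bucket frequency)."""
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("bucket frequencies must sum to 1")
    return sum(p * f for p, f in zip(MIDPOINTS, frequencies))

# The two frequency tables from the comment, lowest bucket first.
first_table = [0.05] * 9 + [0.55]
second_table = [0.055] * 9 + [0.505]

print(round(outs_per_bip(first_table), 4))   # 0.725
print(round(outs_per_bip(second_table), 4))  # 0.7025
```

The second table comes out at 0.7025, which is the roughly 0.703 league-average figure quoted above.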


#47          (see all posts) 2011/06/26 (Sun) @ 16:57

Mike, please don’t say I ignored you.  Tango/6 was in response to Mike/5.

You can say that I didn’t answer it well enough, but don’t go farther than that.

I didn’t realize your #6 was in response to my #5.  I thought you were just ignoring me and repeating your question.

If #6 is as specific as this question will get, I will not be able to answer anything other than “it depends.” I don’t have an assumption about how a batted ball should be labeled independent of doing something with that label, and I honestly don’t see how anyone else can, either.


#48    Peter Jensen      (see all posts) 2011/06/26 (Sun) @ 19:46

Peter, Dudek’s data has a slight trough in the middle if you graph it that way.

Mike - Dudek would have around 300 balls at less than 10% outs, over 300 at 90%-100% outs, over 300 at 80-90% outs, and a little more than 700 balls spread out over 10-80%.  How you get “a slight trough in the middle” is beyond me.  It may not fit your criterion for bimodal, but the majority of balls JUST ON HANG TIME would be less than 15% or greater than 85%.  If you looked at distance from player divided by hang time I have no doubt that it would easily meet your criterion for bimodal.


#49          (see all posts) 2011/06/26 (Sun) @ 20:18

If you looked at distance from player divided by hang time I have no doubt that it would easily meet your criterion for bimodal.

This is what I don’t find useful.  Speculating about what the data might look like doesn’t help us make any progress.  Actually seeing what the data looks like is helpful.

We know what the data actually looks like for hang time.  That’s why I used that as a stake in the ground.  Not because I believe there is something magical about only using hang time.  (It’s better than a lot of other measures, but not perfect.)

I also would not consider a ball with an 85% out probability as one that the quality/skill of the fielder would make no difference on.


#50    Colin Wyers      (see all posts) 2011/06/26 (Sun) @ 23:37

Okay, so I took the largest body of hit location data that I have, the Project Scoresheet data. This is the zone diagram:

http://www.retrosheet.org/location.htm

I took and figured the average out rate per season by zone and batted ball type (grounder, fliner, fly ball, popup), looking at all BIP (so excluding home runs). I then summed up the number of BIP in each group, given the out rate I observed in each bin. I am presenting this alongside the data collected by Sky:

Bin       SkyA    Scoresheet
00-04     17%      4%
05-20      7%      8%
21-79     11%     33%
80-95     18%     41%
96-100    48%     14%

I see nothing to support Peter’s claim that the bimodal distribution is “due to the particular method that he used to graph the data,” or that adding hit location data (as I did) would produce a similar distribution. What I see seems to be a right-skewed unimodal distribution.

Now. I make no claims that the Project Scoresheet distribution is true; I think everyone knows I have severe reservations with both hit location and trajectory data. But all of those objections apply at least as much to the sort of observation Sky was doing (and probably more so). And I see nothing here that should cause the kind of certainty Peter is asserting.
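
A rough sketch of the kind of tabulation described in this comment, for anyone who wants to try it on their own data. This is not Colin's actual code: the record layout `(zone, ball_type, is_out)` and the function name `bin_shares` are assumptions, and the per-season split is left out to keep it short.

```python
from collections import defaultdict

# Bin edges in percent, matching the table: 00-04, 05-20, 21-79, 80-95, 96-100.
BINS = [(0, 4), (5, 20), (21, 79), (80, 95), (96, 100)]

def bin_shares(plays):
    """plays: iterable of (zone, ball_type, is_out). Returns share of BIP per out-rate bin."""
    totals = defaultdict(lambda: [0, 0])  # (zone, ball_type) -> [outs, bip]
    for zone, ball_type, is_out in plays:
        totals[(zone, ball_type)][0] += int(is_out)
        totals[(zone, ball_type)][1] += 1
    n = sum(bip for _, bip in totals.values())
    shares = {b: 0.0 for b in BINS}
    for outs, bip in totals.values():
        pct = round(100 * outs / bip)  # observed out rate for this group, whole percent
        for lo, hi in BINS:
            if lo <= pct <= hi:
                shares[(lo, hi)] += bip / n  # every BIP in the group inherits the group rate
                break
    return shares

# Made-up example: ten fly balls to one zone (9 outs), two grounders to another (1 out).
sample = [("7", "fly", True)] * 9 + [("7", "fly", False)] \
       + [("56", "grounder", True), ("56", "grounder", False)]
print(bin_shares(sample))
```

Each ball in play is credited with the out rate of its whole zone/type group, so the coarser the zones, the more the extreme balls get pulled toward the middle bins.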


#51          (see all posts) 2011/06/27 (Mon) @ 00:57

I write in response to the original question and Tango/44.

Tango wrote:

I look at baseball, and I see a large number of sure-outs or high-probability-outs.  Say at least 50%.

But most of the responses here won’t accept that, because they “float” the fielders (. . .)

If you are looking at the analysis from the batter’s perspective, I would assert that all the variables (other than fielder position) are things the batter is aware of early enough to react to.  Therefore, one bins in the other variables and floats the fielder position.  The batted ball is a probable out.

If you are looking at the analysis from the fielder’s perspective, the fielder can react in advance to everything except the batted-ball location.  One bins in the other variables and floats the batted-ball location.  This particular batted ball is a sure hit for that positioning.

At some point, you can distinguish:
P(out|positioning) from
P(out|ball location) from
P(out|positioning;ball location).

Then you can distinguish quantitatively between fielders who fail to make plays because of poor positioning (not where the ball is usually hit), unlucky positioning (not where the ball was hit this time), or lack of skills beyond positioning (in the right place, but failed to make the play).  But I doubt we’re anywhere near that point.
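
To make the three conditional probabilities concrete, here is a minimal sketch on invented play records. The angles reuse the ones from the original post (SS at minus 9 or minus 19, ball at minus 22 or minus 8), but the outcomes are made up, and `out_rate_by` is a hypothetical helper, not anyone's real fielding model.

```python
from collections import defaultdict

def out_rate_by(plays, key):
    """Out rate grouped by key(play); each play is a dict with 'pos', 'loc', 'out'."""
    tally = defaultdict(lambda: [0, 0])  # group -> [outs, plays]
    for play in plays:
        tally[key(play)][0] += play["out"]
        tally[key(play)][1] += 1
    return {k: outs / n for k, (outs, n) in tally.items()}

# Invented records: fielder angle, batted-ball angle, 1 if an out was made.
plays = [
    {"pos": -9,  "loc": -22, "out": 0},
    {"pos": -9,  "loc": -8,  "out": 1},
    {"pos": -19, "loc": -22, "out": 1},
    {"pos": -19, "loc": -8,  "out": 0},
    {"pos": -19, "loc": -22, "out": 1},
]

p_out_given_pos = out_rate_by(plays, lambda p: p["pos"])               # P(out|positioning)
p_out_given_loc = out_rate_by(plays, lambda p: p["loc"])               # P(out|ball location)
p_out_given_both = out_rate_by(plays, lambda p: (p["pos"], p["loc"]))  # P(out|positioning;ball location)

print(p_out_given_pos, p_out_given_loc, p_out_given_both)
```

With real data the third table is the one that thins out fastest, which is presumably why we are nowhere near that point yet.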


#52          (see all posts) 2011/06/27 (Mon) @ 00:58

Colin wrote:

So changing the sizes of the zone may well have an effect on how much range bias affects expected outs. (. . .)It’s not obvious to me which is the “correct” zone size, however. (. . .) (T)here’s an inherent tradeoff in zone sizes between sample size and predictive ability (. . .)

The correct zone size is infinitesimal (the continuum limit).  Naturally, this can’t be done.  Next best would be to use a good binning scheme with a solid estimate of how much it differs from the continuum; this can’t be done either.

The practical solution is to re-run the analysis on a few different binning schemes and either take the best scheme or an average as your central value and use how much the answers differ from each other as your binning uncertainty (error bar from binning).
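
A minimal sketch of that procedure, under stated assumptions: plays are `(location, is_out)` pairs on a single location axis, the quantity of interest is the share of BIP landing in bins with an observed out rate of at least 90%, and the spread across schemes is summarized with a population standard deviation. All of those choices, and the function names, are illustrative only.

```python
import math
from statistics import mean, pstdev

def sure_out_share(plays, width, threshold=0.9):
    """Share of BIP falling in location bins whose observed out rate >= threshold."""
    bins = {}
    for location, is_out in plays:
        key = math.floor(location / width)  # which bin this ball falls in
        outs, n = bins.get(key, (0, 0))
        bins[key] = (outs + int(is_out), n + 1)
    sure = sum(n for outs, n in bins.values() if outs / n >= threshold)
    return sure / len(plays)

def binning_estimate(plays, widths):
    """Central value (mean) and binning uncertainty (spread) across bin widths."""
    results = [sure_out_share(plays, w) for w in widths]
    return mean(results), pstdev(results)

# Tiny made-up data set, just to show the answer moving with the bin width.
toy = [(0.5, True), (1.5, True), (2.5, False), (3.5, True)]
print([sure_out_share(toy, w) for w in (1, 2, 4)])  # [0.75, 0.5, 0.0]
print(binning_estimate(toy, (1, 2, 4)))
```

Even on four balls the answer swings from 75% to 0% depending on the bin width, which is the error bar the comment is talking about.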


#53    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 02:01

I look at baseball, and I see a large number of sure-outs or high-probability-outs.

Okay, sure. But you see this for the same reason the mark never wins at three-card monte or the stage magician seems to pull a rabbit out of the hat - because you are looking at something other than what’s really happening (in this case, you’re watching the follow-through of the batter’s swing and his first steps out of the box, rather than watching how the fielder reacts to the batted ball).  So I really don’t know how to answer your question here - we seem to disagree on the fundamental premise.


#54    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 08:31

Colin, right, we disagree fundamentally here.  You seem to be saying that when a play looks like an automatic 4-3 play, or an automatic 7 (fly out), which I can tell from the very unhurried movements of the players, you can’t accept that it is.  You are saying that things were happening that I wasn’t seeing (an extra step or two) that ALLOW those fielders to look unhurried, but only because they got a great first step.

Does that sound like a correct summary of your position?


#55    Guy      (see all posts) 2011/06/27 (Mon) @ 09:37

I look at baseball, and I see a large number of sure-outs or high-probability-outs.

Three suggestions for clarifying the discussion:

1. Separate the question of whether there are a lot of “sure outs” from whether there are a lot of “sure hits.” While there is some necessary relationship between the two, they are distinct questions—it’s possible (for example) that 50% of airballs are sure outs, but only 10% are sure hits.

2. Define “sure out” and “sure hit” (90% and 10%?)
And define how many we’d need to find of each to call the distribution “bimodal.”

3. Discuss airballs and GBs separately.  The distributions may be very different.


#56    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 10:04

Guy/55: I’d be happy to discuss it more clearly as you are noting, or with any other parameters others would like to suggest.

For #2, when I say high-probability out, I’m setting it at 90%-100% out.  And for a high-probability hit, I’m setting it at 0%-10% out.

If it were a perfectly uniform distribution, then we’d have 0.10 frequency for each of the 10 buckets.  And of course, this would imply that the overall average would be a 50% out rate.

Since we have data that tells us that the MLB player makes an out on 70% of his BIP, then we know it cannot possibly be uniform.  Indeed, it must be skewed toward the higher out side.

As a matter of fact, all the above is true.

Anything else is speculative of course.

And my speculation was that the high-prob out was a frequency of 0.60 (based just on my guts).  Sky’s observations had it at 48% in the “automatic” category (99% out rate), and 18% in the “some effort” category (90% out rate).

As we can see, my “guts” prior (for whatever that is worth) and Sky’s observations (for whatever that is worth) correspond to each other pretty well.

All this is based however on the assumption that we are looking at those specific fielders for that specific ball in play, and not some “average” fielder who may have positioned himself in some average distribution.

Does this help the discussion?


#57    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 10:09

As for the idea that Sky’s sample may be too small, that is also speculative.  After all, we just need n=2 for a pitcher’s fastball for the signal to exceed the noise in terms of his fastball speed.  We need 70 or 80 balls in play to know the pitcher’s GB tendencies.  We need 200 PA to know the hitter’s actual talent level.

How many BIP do we need to know how many high-probability outs there are in baseball?


#58    Guy      (see all posts) 2011/06/27 (Mon) @ 10:14

Tango:  I won’t speak for anyone else, but I don’t find it helpful to think about these proportions for a specific fielder, and even less if you are also assuming the fielder is positioned where he in fact was positioned on that play (which I think you are also assuming).  Obviously, you can answer any question you want. But how is that helpful for understanding defense, or anything else?  It seems like all you are doing is explaining/confirming that many BIP do indeed appear to be “automatic” plays.  But I don’t think anyone doubts this is true—the question is whether that appearance reflects reality or not.


#59    Guy      (see all posts) 2011/06/27 (Mon) @ 11:22

Colin/50:  Colin, any chance you could break that data down separately for GBs vs. Airballs?  And maybe include 0-10% and 90%-100% breaks (if we can all agree on those as definitions of “automatic” hits/outs)? 

My guess is that airballs are more bi-modal than GBs (as Rally argued in another thread).  And also that if you added hangtime data to the airballs, there would be a pretty high proportion of 90% out airballs (at least 50%). 

Even though I think I lean more toward the “bimodal” view than Colin/Mike, I don’t think it’s helpful to think of fielding talent as only revealed on 40 or 50 BIP for each fielder.  The fact that the spread of fielding talent is of that magnitude doesn’t mean that the spread came only from that number of BIP.  The difference between a .275 and a .325 hitter is about 25 hits, but that obviously doesn’t mean the difference in these hitters came only on 25 PAs.  Let’s say a type of LD to LF is a 90% out ball on average.  Even at that extreme, this might be a 0% out ball for Dunn and a 25% out ball for Crawford, so that BIP would still provide a meaningful opportunity for fielding skill to be leveraged. 

People are assuming that the variance among fielders on a BIP is much narrower as the overall out% becomes more extreme.  That’s obviously true at the outer limits, for 99% balls and for 1% balls.  But there may be a reasonable amount of variance even around 90% or 10%, and it must be the case that quite a few BIP fall between those values.


#60    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 11:38

It seems like all you are doing is explaining/confirming that many BIP do indeed appear to be “automatic” plays.  But I don’t think anyone doubts this is true—the question is whether that appearance reflects reality or not.

I don’t see how something that “appears” to be automatic is not in fact an automatic.  (Well, in some minority of cases, sure.  But, for the lion’s share, something that appears to be automatic is in fact automatic.)

But how is that helpful for understanding defense, or anything else? 

I’m not focused on that for this particular specific question.


#61          (see all posts) 2011/06/27 (Mon) @ 11:41

Even though I think I lean more toward the “bimodal” view than Colin/Mike, I don’t think it’s helpful to think of fielding talent as only revealed on 40 or 50 BIP for each fielder.  The fact that the spread of fielding talent is of that magnitude doesn’t mean that the spread came only from that number of BIP.  The difference between a .275 and a .325 hitter is about 25 hits, but that obviously doesn’t mean the difference in these hitters came only on 25 PAs.  Let’s say a type of LD to LF is a 90% out ball on average.  Even at that extreme, this might be a 0% out ball for Dunn and a 25% out ball for Crawford, so that BIP would still provide a meaningful opportunity for fielding skill to be leveraged.

People are assuming that the variance among fielders on a BIP is much narrower as the overall out% becomes more extreme.  That’s obviously true at the outer limits, for 99% balls and for 1% balls.  But there may be a reasonable amount of variance even around 90% or 10%, and it must be the case that quite a few BIP fall between those values.

Guy, I’m not sure we differ in our views on likelihood of bimodality based upon what you have said in your last few posts.

What you said here is exactly a point I have been trying to make.  I just didn’t explain it nearly as well as you did.  I was even thinking about this last night using the example of Carl Crawford and Adam Dunn on a 90% out-probability ball to LF, so it’s perfect that you used that example.


#62    Guy      (see all posts) 2011/06/27 (Mon) @ 11:59

Mike:  Poor Adam Dunn—we should all think of someone else to pick on!  And obviously, my example was a 90% hit BIP, not 90% out.

I don’t see how something that “appears” to be automatic is not in fact an automatic.

There are at least two ways.  One is that the fielder moved rapidly into position before we observed him, making the play look automatic when it might not have been for another fielder, or even for this same fielder at another time when his reactions were not as good.  Two is that the fielder was already in an atypical position when the ball was struck, making the play “automatic” given his position but not if you assume an average positioning.  (I understand that you don’t want to assume average positioning—I just don’t understand WHY you don’t want to.)


#63    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 12:06

In your first scenario, that’s not going to happen often.  Can’t we say that of all the plays that “appear” automatics, at least 90% actually are automatics?

As for your second scenario, I already said the assumption is we are looking at this fielder, not a distribution of all fielders (and their distribution of positioning, coupled with their distribution of reaction time).


#64          (see all posts) 2011/06/27 (Mon) @ 12:09

To add to Guy/62, a third way is that a better/worse fielder would have moved faster/slower even once we were observing the fielder, but that we couldn’t visualize in our heads how much difference that would make.  I wouldn’t think an untrained observer could do that, and it’s not clear to me how well a trained observer could.  We could also be affected by how the TV producer frames the shot (e.g., is the fielder out of the picture frame, on the edge of the picture, in the middle, is the neighboring fielder in the shot, and how close was he to the ball, etc.).  That may be two ways of saying the same thing--we’re basically talking about range bias here.


#65          (see all posts) 2011/06/27 (Mon) @ 12:14

In your first scenario, that’s not going to happen often.

What? Do you believe that all fielders have similar reaction times and first jumps on all batted balls?  I wouldn’t think so.

Or are you simply saying here that given a particular fielder’s starting position, reaction time, first jump, and closing speed, and a given batted ball’s trajectory and speed, that we can determine 99% of the time whether he would have caught it or not?  If that’s all you’re trying to show here, then sure, but I don’t see the point of that exercise.

As soon as you start letting those parameters vary, the “automatic” outs start going down, more so the more parameters you include in your fielding model.


#66    tangotiger      (see all posts) 2011/06/27 (Mon) @ 12:22

I’m holding fixed the identity of the fielder and his starting position.  That’s it. 

And I’m saying that OF those plays that “appear” automatic, 90% of those actually are automatics (given that fielder and his starting position).


#67          (see all posts) 2011/06/27 (Mon) @ 12:35

I’m holding fixed the identity of the fielder and his starting position.  That’s it.

Those are both pretty major things to hold fixed and would severely limit the applicability of anything you could learn from the exercise.

You’re basically asking here, I guess, how much a particular fielder’s reaction, acceleration, and speed vary from play to play.  I would guess that speed would vary little (unless it wasn’t needed to reach the ball, of course), acceleration some but not a whole lot, and reaction time could vary significantly based upon what kind of read the fielder got on the pitch and the ball off the bat.  I would guess that intra-fielder variation in reaction time for a variety of batted balls would be greater than inter-fielder variation in reaction time for a particular kind of batted ball.  But that’s a guess.  Actual data is probably the only way to resolve that, if in fact that question is at the heart of what you are asking about (of which I’m not sure).


#68    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 12:37

MLB fielding percentage is, what, 98%? And that’s for balls a guy gets close enough to field. (And that drops once you start including things like infield singles.) The idea of a 99% out probability seems to me to be essentially impossible.


#69    Rally      (see all posts) 2011/06/27 (Mon) @ 12:41

When a flyball is hit, and the camera pans to an outfielder standing and waiting for the ball to come down, that’s an automatic out.

Is it possible that before the camera switched he anticipated the play, sprinted 50 feet, and came to a stop or at least slowed down considerably?  That a lesser fielder would still be running full speed and the play’s outcome would be in doubt?

Possible but not worth worrying about.  Probably as likely as the hitter hitting the ball twice on the same swing.

These are the outfield equivalents of the infield fly - a type of ball so likely to be an out that they had to make up a special rule to keep infielders from turning easy double plays on them.  By Ben J’s data in THT these plays make up about 25% of flyballs.


#70          (see all posts) 2011/06/27 (Mon) @ 12:49

I assume Rally/69 was addressed to Colin/68?  I think we need to distinguish between air balls and ground balls.  I can agree with Colin/68 on ground balls. 

I would think that there are a number of air balls, whether infield flies or outfield cans of corn with 6+ second hang times where the out probability is very high.  I don’t know exactly what “very high” would mean but I would think in the neighborhood of 99%.  Outfield fielding percentage is in the neighborhood of 99%, isn’t it?

I don’t see, though, where Ben J’s data shows 25% of air balls in this category, if in fact Rally was addressing Colin/68 with that statement.  The highest out percentage in any bucket was 97.3%.


#71    Rally      (see all posts) 2011/06/27 (Mon) @ 12:49

"MLB fielding percentage is, what, 98%? And that’s for balls a guy gets close enough to field. (And that drops once you start including things like infield singles.) The idea of a 99% out probability seems to me to be essentially impossible.”

Team fielding% is not a good way to look at it.  That includes all the force putouts, both at 1B and sometimes the other bases, and catchers catching strike three.  Errors are also somewhat rare.  It seems every day there’s a play with an infielder messing up, the scorer calling it a hit, and the announcers going nuts over it.

For infielders I’d look at plays made/(PM+err+IFhits).  That would probably give you closer to 85-90% as the top range for a ground ball.  There will be a difference in groundballs to 3B/SS compared to 2B/1B.  Probably the ones to second are closest to automatic outs, because the 2B and 1B are good at what they do. On a groundball to first the pitcher covering the bag seems to make the outcome a little more uncertain.
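
That ratio is easy to write down. A small sketch follows; the function name and the counts in the example are hypothetical, not real MLB totals.

```python
def infield_conversion_rate(plays_made, errors, infield_hits):
    """Plays made / (plays made + errors + infield hits), per the formula above."""
    chances = plays_made + errors + infield_hits
    if chances == 0:
        raise ValueError("no chances recorded")
    return plays_made / chances

# Made-up counts for illustration only.
print(infield_conversion_rate(plays_made=85, errors=5, infield_hits=10))  # 0.85
```

Unlike team fielding percentage, the denominator here charges the infielder for infield hits as well as errors, and leaves out force putouts and strikeouts.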

What’s a 99% out probability?  Popups.  Infield variety and those to the outfield that don’t require the fielder to move much.


#72    Rally      (see all posts) 2011/06/27 (Mon) @ 12:54

"I don’t see, though, where Ben J’s data shows 25% of air balls in this category, if in fact Rally was addressing Colin/68 with that statement.  The highest out percentage in any bucket was 97.3%.”

That’s using hang time alone, where a 6 second can of corn is lumped with 6 second flyballs that go near the outfield wall.

Mike, #69 was addressing Tango 66 and the general idea of whether we can judge high probability outs.


#73          (see all posts) 2011/06/27 (Mon) @ 13:01

That’s using hang time alone, where a 6 second can of corn is lumped with 6 second flyballs that go near the outfield wall.

Right.  With something more than Ben’s data, you might be able to demonstrate a subset of outfield air balls that have higher out probability.  Presumably you would.  But Ben’s data doesn’t show that.

Maybe we are degenerating here over the definition of “automatic”.


#74    Bob G.      (see all posts) 2011/06/27 (Mon) @ 13:16

Hey you guys, I watched the webcasts of the 2009 and 2010 Sportvision summits. Colin/#25 links to presentations from the 2009 summit. Couldn’t look at the Jensen presentation as recommended ‘cause it was too big to download at the moment. Looked at the Using Photogrammetry to Track Fielders presentation. Charts 8 and 10 in it seem applicable here. In 2010, I think they tapped into a bigger sample size and showed how the chart 10 plot started looking more like a bell shape. Time was also factored in. Is your example grounder hard or soft, Tango? Of course, Sportvision STILL has not uploaded the 2010 presentations, even as they advertise for the 2011 summit. Thanks so much.


#75    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 13:20

Those are both pretty major things to hold fixed and would severely limit the applicability of anything you could learn from the exercise.

I’m not disagreeing about its applicability.  The question is valid on its own, as it’s fairly well- and narrowly-defined.

And the rest of Mike/67 is exactly on target.  And that’s why Sky was able to reach the conclusion he reached with his data.  And that’s why I offered the model I offered.

***

As for “automatic”, I instead offered “high probability out” of 90% to 100%.  Otherwise, Mike is right, that we’re going to degenerate over what “automatic” means.

If there are about 27 batted balls per game (19 outs), I’d think about 60% of those (16 batted balls) are high-probability outs.

If there are about 27 batted balls per game (8 hits), I’d think about 20% of those (5 batted balls) are high-probability hits.

That leaves us with 6 batted balls of the 27 where there is doubt as to whether it is a hit or an out.  That is, if you were to replay it, you might see it swing either way a non-trivial number of times.

Assumption: identity and position of fielders is fixed.
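The per-game tally above can be checked with a few lines. This is only a sketch of the arithmetic; the 60% and 20% shares are the guesses from this comment, not measured values, and the variable names are illustrative.

```python
BATTED_BALLS_PER_GAME = 27  # rough figure from the comment above

# Shares guessed in the comment; these are estimates, not measurements.
share_high_prob_out = 0.60
share_high_prob_hit = 0.20

high_prob_outs = round(BATTED_BALLS_PER_GAME * share_high_prob_out)  # 16.2 rounds to 16
high_prob_hits = round(BATTED_BALLS_PER_GAME * share_high_prob_hit)  # 5.4 rounds to 5
in_doubt = BATTED_BALLS_PER_GAME - high_prob_outs - high_prob_hits

print(high_prob_outs, high_prob_hits, in_doubt)
```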


#76    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 13:29

Bob: I downloaded the Thomas presentation.  Again, it is NOT applicable here.

I am NOT asking: given the angle, what’s the chance of getting an out.

I AM asking: given all the parameters of this batted ball, given the identity of all the players involved, what’s the chance of getting an out.

It should be obvious that the fewer parameters you have, then the more the low-out and high-out plays will get merged. 

This is why you get the up-and-down chart that Matt shows. How can all the plays at 60 degrees (SS normal play) be 100% out, and at 65 degrees be 100% out, but in-between be at 70% out?

You get that because there’s something else about that batted ball that the angle doesn’t explain well-enough.  That maybe the SS is positioned, for some of those plays, way to the other side.

***

I’m going to put this in all my posts from now on:
assumption that the identity of fielder and starting position is fixed.


#77    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 13:35

By the way, thanks Bob for looking that up.  The effort is definitely appreciated.


#78    Guy      (see all posts) 2011/06/27 (Mon) @ 14:21

With something more than Ben’s data, you might be able to demonstrate a subset of outfield air balls that have higher out probability.

I would think you could identify a decent number of automatic outs just using hangtime and distance together, both of which Colin looked at in his THT article.  For example, he reports that balls with 4.0 second hangtime have a .64 out probability.  But if you divide those by distance—say, 350+, 250-350, under 250—I bet you get very different out rates.  Colin:  did you look at the interaction of distance and time in that data?  In fact, calculating the horizontal velocity for balls—distance/time—might be a good objective way of distinguishing between what we call flyballs, line drives, and “fliners.”
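A minimal sketch of the distance/time idea, assuming distance is in feet and hang time in seconds. The three example distances are hypothetical, and no cutoffs between flyballs, line drives, and fliners are implied.

```python
def horizontal_velocity(distance_ft, hang_time_s):
    """Average horizontal speed of a batted ball, in feet per second."""
    if hang_time_s <= 0:
        raise ValueError("hang time must be positive")
    return distance_ft / hang_time_s

# Three hypothetical balls, all with a 4.0-second hang time.
for distance_ft in (200, 300, 380):
    speed = horizontal_velocity(distance_ft, 4.0)
    print(f"{distance_ft} ft in 4.0 s -> {speed:.1f} ft/s")
```

Balls that share a hang time but differ in distance separate cleanly on this one number, which is the point of splitting the 4.0-second bucket.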


#79    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 14:51

Looking at Project Scoresheet data, the out rate on popups is “only” 93%. If there’s a catch/no catch bias on the PU/FB boundary, the real out rate may be lower.

(Gameday popups have a higher out rate, but we know there’s a catch/no catch bias there, as the definition in the MLBAM specification is:

• A ball hit on a high, short arc that is caught on the fly by an infielder only
• Used only for balls fielded by infielders, NOT by outfielders
• Used only for outs, NOT for hits, unless an infield pop falls untouched among several fielders

And even then, the out rate is only 97%.)


#80    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 15:27

Colin, what is your best estimate at the frequency of batted balls that are expected to be out 90% to 100% of the time (given the identity of the fielder and his positioning for the ball in question to be fixed)?


#81    Guy      (see all posts) 2011/06/27 (Mon) @ 16:02

Colin:  I assume that popups include some flyballs that fall just beyond the range of infielders.  Is that right?  If you looked at popups that travel no more than 120 feet, you’d probably get 97-98% outs. 

The more interesting question, I think, is what happens to flyballs with 4.5+ seconds of hangtime that travel say 200-300 feet.  That’s the kind of routine flyball to the OF that many of us think happen with some frequency and which will be outs more than 90% of the time.  Are we wrong?


#82    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 16:38

Colin, what is your best estimate at the frequency of batted balls that are expected to be out 90% to 100% of the time (given the identity of the fielder and his positioning for the ball in question to be fixed)?

I don’t know. And before I sink extra effort into trying to figure out what my best estimate is, can I ask - what is knowing this in aid of?


#83    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 16:49

I spent minimal effort in my best estimate.  I don’t ask for anything more than that.  Your gut reaction as to what you think, or some back-of-the-envelope calculation.

I should say “current best estimate”, meaning, whatever you think at this very moment.  And I understand you’ll have a high uncertainty level.


#84    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 16:53

It would still help me to understand the question if you could explain why we care about the answer.


#85    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 17:02

I’d like to know how much variability there is in turning a batted ball into a hit or out (if we hold constant: the identity of all players, the position of all fielders, and the flight parameters of the ball itself).

So, I am asking you to hold constant what I just said, and tell me what you think is your current best guess as to how often a batted ball will get turned into an out 90% to 100% of the time, if we can replay that event an infinite number of times (with no learning naturally).

I hope that’s a clear enough question that you can give me some reasonable answer.


#86    Guy      (see all posts) 2011/06/27 (Mon) @ 17:04

And if I can add one question and a suggestion: 

Q:  Won’t the answer be exactly the same even if you remove your two assumptions? 

Suggestion:  separate questions for GBs and airballs.


#87    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 17:14

Guy: before we add a second question, let’s get an answer to the first question.

If Colin wants to provide two answers split by batted ball, then he can do so.  But I just need some answer.


#88    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 17:20

Q:  Won’t the answer be exactly the same even if you remove your two assumptions?

I was all set to explain why this wasn’t so, and then I realized that I was an idiot and yes, you’re absolutely right. So that simplifies things.

So… 20%?


#89    Guy      (see all posts) 2011/06/27 (Mon) @ 17:29

Tango:  fair enough.  But you’ve argued several times in these threads that the distribution will be much more bi-modal if we take the fielders and their positions as a given, much less bi-modal if we compare outcomes to average fielders.  I can’t see why this would be true; in fact, I’m pretty sure it’s not.  So I’m wondering what you’re getting at here.....


#90    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 17:42

As for the idea that Sky’s sample may be too small, that is also speculative.  After all, we just need n=2 for a pitcher’s fastball for the signal to exceed the noise in terms of his fastball speed.  We need 70 or 80 balls in play to know the pitcher’s GB tendencies.  We need 200 PA to know the hitter’s actual talent level.

How many BIP do we need to know how many high-probability outs there are in baseball?

The problem with Sky’s study isn’t the sample, it’s the methodology. The methodology is terrible.  The data he’s collected is perfectly useless, except as a way to quantify how incorrect such an approach is.


#91    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 17:59

If Colin wants to provide two answers split by batted ball, then he can do so.  But I just need some answer.

Now that I’ve provided an answer, can you explain why you need one?


#92    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 19:51

Colin,

I’m not looking at more than what I am asking. I’m just trying to get everyone on a common basis of discussion, so that we don’t talk in circles.

Basically, you have my question in Tango/85, which I hope is clear to everyone. 

And Mike/67 has many of the considerations that should impact your answer.  I’ll repeat it here because he described it well-enough:

You’re basically asking here, I guess, how much a particular fielder’s reaction, acceleration, and speed vary from play to play.  I would guess that speed would vary little (unless it wasn’t needed to reach the ball, of course), acceleration some but not a whole lot, and reaction time could vary significantly based upon what kind of read the fielder got on the pitch and the ball off the bat.  I would guess that intra-fielder variation in reaction time for a variety of batted balls would be greater than inter-fielder variation in reaction time for a particular kind of batted ball.  But that’s a guess.  Actual data is probably the only way to resolve that, if in fact that question is at the heart of what you are asking about (of which I’m not sure).

So, given the question and (some of) the considerations to impact your answer, Colin gave us 0.20 as the frequency of batted balls that would have a high-probability (90%-100%) of being out if repeated over and over again.

My guess was 0.60 of batted balls would have a high-probability of being out.

What this means from Colin’s perspective is that he’s got a lot more balls in the 70%-90% out range than I would.  He’d have to have quite a few, since we know the mean for the league is a 70% out rate.
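To see what each guess implies for the rest of the distribution, here is a rough sketch. It assumes only the 70% league mean and the two guessed shares; the function name is illustrative, and the 90%-100% bucket is tried at both ends of its range because its true average is unknown.

```python
LEAGUE_OUT_RATE = 0.70  # mean out rate cited in the comment

def implied_rest_average(high_share, high_bucket_avg):
    """Average out rate the remaining balls must have to hit the league mean."""
    return (LEAGUE_OUT_RATE - high_share * high_bucket_avg) / (1 - high_share)

for label, share in (("Colin", 0.20), ("Tango", 0.60)):
    # The 90%-100% bucket could average anywhere in that range; try both ends.
    low = implied_rest_average(share, 1.00)
    high = implied_rest_average(share, 0.90)
    print(f"{label}: remaining balls average between {low:.3f} and {high:.3f}")
```

Under the 0.20 guess, the other 80% of batted balls have to average roughly 63% to 65% outs. Under the 0.60 guess, the other 40% average roughly 25% to 40%.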


#93    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 19:54

Guy, can you explain:

I can’t see why this would be true; in fact, I’m pretty sure it’s not. 


#94    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 21:27

Also Guy, can you also answer my question in Tango/85?


#95    Guy      (see all posts) 2011/06/27 (Mon) @ 22:34

Tango:  I don’t see how the identity of the fielder and his positioning changes the answer.  Let’s say you think 30% of all BIP are sure outs (90%) on average, across all fielders.  That’s only on average of course, so when Jeter is on the field maybe that drops to 28%.  But when excellent SSs play, it rises to 32%, so overall it’s still 30%.  In some cases the fielders will be better positioned than average, but in other cases their positioning is worse than average—again, it’s a wash.  Why/how would the proportion rise if we consider specific fielders? 

As for the original question, I don’t have a very good sense of what proportion of GBs are sure outs.  On airballs, I would guess maybe 30-40% are sure outs (including popups, excluding HRs).


#96    Tangotiger      (see all posts) 2011/06/27 (Mon) @ 22:42

Guy,

Let’s say you have a batted ball that would result in an out if:
- great SS positioned anywhere at minus 22 to minus 14 degrees
- good SS if positioned anywhere at minus 20 to minus 16 degrees
- half-decent SS if positioned at minus 18 degrees

If you replay it over and over again, the great SS, positioned at minus 22 to minus 14 degrees will make the out at least 90% of the time.

If those SS were positioned one degree outside that range, the chance of making an out goes down to 80%.  If they were two degrees outside that range, it drops down to 50%, etc.

We’re not looking at the overall aggregate where you get a wash.  We’re looking at each individual batted ball, evaluating that batted ball, and summing it.


#97    Colin Wyers      (see all posts) 2011/06/27 (Mon) @ 22:57

We’re not looking at the overall aggregate where you get a wash.  We’re looking at each individual batted ball, evaluating that batted ball, and summing it.

I fail to see how those two things are different. It reads to me like talking about the difference between the league-average OBP and the weighted average of each player’s OBP.

Guy: In the Project Scoresheet data, 35% of airballs are 90%+ outs, while 22% of ground balls are, or 29% aggregate. That’s taking into account distance, vector and “hang time” as approximated by the four batted ball types.
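As a quick consistency check, the three percentages above imply a particular airball/ground ball mix. This sketch solves for it; the inputs are the rounded figures from the comment, so the result is only approximate.

```python
airball_rate = 0.35    # share of airballs that are 90%+ outs (from the comment)
grounder_rate = 0.22   # share of ground balls that are 90%+ outs
aggregate_rate = 0.29  # combined share

# Solve a * airball_rate + (1 - a) * grounder_rate = aggregate_rate for a.
airball_share = (aggregate_rate - grounder_rate) / (airball_rate - grounder_rate)
print(f"implied airball share of batted balls: {airball_share:.2f}")
```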


#98    Guy      (see all posts) 2011/06/27 (Mon) @ 23:14

Tango: Like Colin, I don’t follow the logic.  If you tell me how many great, good, and half-decent SSs there are, and the distribution of positionings each uses on this type of BIP, then I can tell you whether this BIP type will become an out 90% of the time.  It’s the same answer you will get. 

Colin:  Thanks.  The question remaining is how much higher are these figures once we take account of 1) more precise hangtime, 2) more precise location, 3) pitcher and hitter handedness, 4) base/out/score, and 5) hitter characteristics?  Going the other way, the ball type may have inflated the 35% and 225 due to result bias.  I would increase my guess on airballs to between 40% and 50%.


#99    Guy      (see all posts) 2011/06/27 (Mon) @ 23:16

98 should say “inflated the 35% and 22%”


#100    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 07:34

I can tell you whether this BIP type will become an out 90% of the time

You are aggregating across all fielders.  You can’t do that.

Let me expand here:

Let’s say you have a batted ball that would result in an out if:
- great SS positioned anywhere at minus 22 to minus 14 degrees
- good SS if positioned anywhere at minus 20 to minus 16 degrees
- half-decent SS if positioned at minus 18 degrees

And let’s say that all three SS are positioned at minus 20 degrees.  They all see exactly the same batted ball.

The first guy turns it into an out 99% of the time.  The second guy turns it into an out 90% of the time.  The third guy turns it into an out 50% of the time.

For this batted ball, the average out rate is 80% across all fielders.

But, I am saying the rate of shortstops that will turn this batted ball into an out at least 90% of the time is 0.67.  That is, 2 out of 3 SS see this ball as a high-probability out.

By aggregating as you are suggesting, then this batted ball looks like an 80% out batted ball.

***

Related question: suppose that Roy Halladay’s true talent level is such that the Phillies win .600 of their games with him on the mound.

Count as “1” any time the Phillies have a greater than 50% chance of winning with Roy Halladay on the mound.

What percentage of the games are the Phillies favored to win?  Is it 60%?  Or more than 60%?
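One way to explore the Halladay question is to write down per-game win probabilities that average .600 and count how many exceed 50%. The five values below are invented for illustration; the real spread of game-by-game probabilities is an assumption here.

```python
# Hypothetical win probabilities (percent) for five Halladay starts,
# invented so that they average exactly 60.
game_win_pcts = [45, 55, 60, 65, 75]

average_pct = sum(game_win_pcts) / len(game_win_pcts)
favored = sum(1 for p in game_win_pcts if p > 50)
favored_share = favored / len(game_win_pcts)

print(average_pct, favored_share)
```

With this made-up spread the team averages a .600 chance but is favored in four of five games. A different spread would give a different share, so the counting question and the averaging question are not the same question.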


#101    Guy      (see all posts) 2011/06/28 (Tue) @ 08:42

OK, some balls in the 80% bin will actually be 90% balls.  But some balls in the 92% bin will really be 85% balls.  Why does your rate have to be higher than the overall proportion of balls in 90% bins?


#102    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 09:06

The balls in the 80% bin will belong to 95% SS, 90% SS, 85% SS… 60% SS.

The balls in the 92% bin will belong to 99% SS, 95% SS… 70% SS.

I don’t see your point.


#103    Guy      (see all posts) 2011/06/28 (Tue) @ 09:27

But you’re assuming a wildly implausible distribution of fielding skill:  a large number of good fielders, and a small number of sub-Jeter fielders.  You’re essentially suggesting that the 70% overall out% is the result of having a large number of 80% fielders, plus a few 50-60% fielders.  But we have a huge amount of evidence that this isn’t true.  Alternatively, you must be assuming a very asymmetrical positioning distribution:  a lot of SSs positioned well, and a few SSs who wandered into short right field by mistake.  But do we have any evidence for such skewed positioning?

In reality, your 80% BIP will have something like these probabilities:  90%, 85%, 75%, 70%.  And if you use plausible distributions, then the proportion of 90% balls will be about the same whichever way you calculate it. 

Alternatively, I’m just missing something very obvious.  Anyone else want to weigh in?


#104    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 09:53

I’m not presuming anything.  I’m just pointing out it’s not at all the same thing.

It may look like the same thing if you’ve got tight fielding talent and tight positioning, but that’s besides the factual point I’m making.

I just need everyone to accept the various points being discussed, so that we are all talking about the same thing.


#105    Peter Jensen      (see all posts) 2011/06/28 (Tue) @ 09:53

The balls in the 80% bin will belong to 95% SS, 90% SS, 85% SS… 60% SS.

The balls in the 92% bin will belong to 99% SS, 95% SS… 70% SS.

Tango - I must be completely misunderstanding what your bins represent.  If a ball is in the 80% bin does that mean that it is turned into an out 80% of the time?  If a shortstop is a 95% shortstop does that mean that he turns ground balls in his zone into outs 95% of the time?  If these two statements are true, I think you have the quoted statement above exactly opposite of reality.  The balls in the 80% out bin will be turned into outs by only the higher rated shortstops and the balls in the 90% out bin will be turned into outs by almost every shortstop.


#106    Guy      (see all posts) 2011/06/28 (Tue) @ 10:00

but that’s besides the factual point I’m making

I do not think that word [factual] means what you think it means.  :>) I would say you are making a theoretical distinction, and one that happens not to be consistent with known facts. 

And you haven’t just argued about what theoretically could be true—you have argued in these threads that the distribution MUST be more bimodal if you take individual fielders and positioning into account.  Even if that’s theoretically possible, I can’t see why that has to be true, or is even likely to be true.


#107    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 10:14

Peter, I don’t think I described it well enough.

Let’s say we have these as our 5 shortstops in the league, in decreasing order of talent:
Ozzie
Elvis
Freddie
Michael
Derek

They’ve all been given the exact same batted ball to turn into an out.  And they get to repeat to field that batted ball 100 times.  We get this as their out rates for this particular batted ball:
92 outs Ozzie
89 Elvis
86 Freddie
83 Michael
80 Derek

The average number of outs for all our (5) SS in the league is 86 outs per 100 batted balls (for this particular batted ball).

Do we classify this batted ball as a “high-probability” out, if the threshold is 90%?  For one SS, it is.  For the other 4, it is not.  So, for this batted ball, we get 0.20 as the frequency.

Brian for example would say “no”.

Let’s say we have another batted ball, and for this batted ball, this is the result:
99 outs Ozzie
96 Elvis
93 Freddie
90 Michael
87 Derek

The average is 93 outs per 100 opps.  Is this a high-prob out batted ball?  For 4 shortstops it is, and for 1 it isn’t.  In this case, 0.80 is the frequency.

Brian for example would say “yes”.

It’s a question of what level of aggregation you are going to do before you answer the question.  And I’m saying don’t aggregate the fielders so that you are left with an “average fielder”.
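The two example balls above can be run through both ways of answering. This is a sketch using the out counts given in this comment; the function and variable names are illustrative.

```python
THRESHOLD = 90  # outs per 100 tries needed to call it "high-probability"

# Outs per 100 replays, taken from the two example batted balls above.
ball_a = {"Ozzie": 92, "Elvis": 89, "Freddie": 86, "Michael": 83, "Derek": 80}
ball_b = {"Ozzie": 99, "Elvis": 96, "Freddie": 93, "Michael": 90, "Derek": 87}

def summarize(outs_by_ss):
    """Return (average outs per 100, share of shortstops at or above threshold)."""
    values = list(outs_by_ss.values())
    average = sum(values) / len(values)
    share = sum(1 for v in values if v >= THRESHOLD) / len(values)
    return average, share

for name, ball in (("first ball", ball_a), ("second ball", ball_b)):
    average, share = summarize(ball)
    verdict = "yes" if average >= THRESHOLD else "no"
    print(f"{name}: average {average:.0f}, average-fielder verdict {verdict}, "
          f"per-shortstop frequency {share:.2f}")
```

Aggregating first gives a yes/no per ball (0 or 1), while keeping the fielders separate gives 0.20 and 0.80 for the same two balls.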

***

Setting aside WHY I want to do this, is it clear enough WHAT I am doing?

***

In my case, I’m not asking about this batted ball for all other SS.  I’m just asking for THIS SS and this particular positioning.


#108    Peter Jensen      (see all posts) 2011/06/28 (Tue) @ 10:35

Setting aside WHY I want to do this, is it clear enough WHAT I am doing?

In my case, I’m not asking about this batted ball for all other SS.  I’m just asking for THIS SS and this particular positioning.

Not really, but I have decided that I don’t really care, so don’t try and explain any further for my benefit.


#109    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 10:47

Peter, fair enough.

I suppose I’m doing this for the benefit of people who are challenging (disagreeing with) what I’m saying, and/or for the people out there who are just interested and are remaining silent.


#110    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 10:59

Another way to describe it, without holding me too strongly to the two-parameter limits of this image, is that you’d create this image for each individual SS:

And then I ask you how many data points are you going to find in his individualized green zone (treating the green-zone as high-probability outs).

And perhaps for Ozzie, you’ll find 65% such data points, and for Freddie you’ll find 60% and for Derek you’ll find 55% (or whatever it is).

What you DON’T do is just use the average red line.


#111    Guy      (see all posts) 2011/06/28 (Tue) @ 11:31

I think Colin/97’s data is interesting.  He has 35% of airballs as automatic outs.  While outcome bias in ball type classification could inflate that number, I don’t think that likely has much impact on this calculation (the biggest bias is likely calling hits linedrives, but few of these non-obvious ball types are likely to be 90% balls anyway).  If we accounted for all the other relevant factors (pitcher, hitter, base/out/score) and had more exact location, I have to think at least 40% of airballs meet our “automatic out” definition, and maybe closer to 50%.  That’s a lot higher than Dudek’s 17% based only on hangtime, as we’d expect.  What I wonder is how big is the 0-10% category?  What does the rest of the distribution look like?

On GBs, only 22% are automatic in Colin’s data.  That’s much lower than most of us “feel” is the reality, I think.  Of course, a lot of important factors are still missing:  pitcher, batter, velocity, base/out, and precise location.  How much higher do we think the 22% would be if we could account perfectly for all of that?  Maybe 50%?  Probably less.  While the GB distribution may be “bi-modal,” I think it’s also likely that the proportion of balls in the 10-90% range will be larger than most fans’ intuition would suggest.

This reminds me that MGL has said that virtually no GBs are rated in UZR at much more than 80-85% out probability.  That would be consistent with a more smooth distribution than usually assumed.  (But is also a strong argument IMO against treating errors separately, the way UZR does.)


#112    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 11:39

The “automatic” classification that Colin is referencing doesn’t hold the player constant. 

We’re going to be talking about two different things if we don’t resolve the issue.  At the very least, specify the assumption.


#113    Guy      (see all posts) 2011/06/28 (Tue) @ 12:04

We’re going to be talking about two different things if we don’t resolve the issue.  At the very least, specify the assumption.

OK:  my assumption is average fielders and average positioning, which I think is the default assumption in all defensive metrics (which isn’t to say it’s the “right” benchmark, but certainly a common one).  Also, I think most of us believe they are in fact the exact same thing (or close enough that the difference is insignificant), and I don’t see the point to debating that further.


#114    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 12:08

Also, I think most of us believe they are in fact the exact same thing (or close enough that the difference is insignificant), and I don’t see the point to debating that further.

Well, I’ve already described how they are not the same.  We don’t need to debate it further, but you also can’t presume that they are the same (in this blog anyway).

To do so means you are dismissing my argument, and then presuming that you don’t need to qualify your statement.

I’m going to respect that you have a valid view, and therefore when I describe the number of high-probability outs, I will state my assumption.

Just grant me the same respect, and then we don’t have to have a debate about it.


#115    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 12:12

So for Guy (and whoever else):

“my assumption is average fielders and average positioning” is a great qualification.

In my case:

“my assumption is this particular fielder and this particular positioning for this particular batted ball”

Sounds reasonable?  You can even short-hand it with “using the Brian Assumption” and “using the Tango Assumption”.


#116    Peter Jensen      (see all posts) 2011/06/28 (Tue) @ 12:15

Tango - I am not sure why you would create the graph you showed in #110 for each individual player.  Greg’s plan for that graph was to have the green and blue areas represent the automatic hits and automatic outs for all players at that position.  The datapoints for an individual player would then be plotted on the graph for an individual player at that position and only the points in the white area would be considered relevant for assessing the player’s range talent.  So a player like Ozzie might have 600 total datapoints, of which 80 would be in the white area, and of those 80 Ozzie might field 75 of them for outs.  Derek would also have 600 datapoints and also have 80 in the white zone, but of those 80 he might field only 15 of them for outs.  Basically, the datapoints in the green and blue areas are considered to be irrelevant to assessing a fielder’s skill because they are considered either hits or outs for all fielders at that position.

I am just not sure what you could do with individual green and blue areas tailored for a specific fielder.
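The white-zone comparison in this comment reduces to a simple rate. A minimal sketch, using the hypothetical counts given above for Ozzie and Derek (the function name is illustrative):

```python
def white_zone_rate(outs_made, white_zone_chances):
    """Share of white-zone chances converted into outs."""
    if white_zone_chances == 0:
        raise ValueError("no white-zone chances")
    return outs_made / white_zone_chances

TOTAL_CHANCES = 600      # datapoints per shortstop in the example
WHITE_ZONE_CHANCES = 80  # datapoints that fall in the white area

ozzie = white_zone_rate(75, WHITE_ZONE_CHANCES)
derek = white_zone_rate(15, WHITE_ZONE_CHANCES)
white_share = WHITE_ZONE_CHANCES / TOTAL_CHANCES

print(f"Ozzie {ozzie:.4f}, Derek {derek:.4f}, white-zone share {white_share:.3f}")
```

Only about 13% of each player's chances carry any information in this example, which is why the size of the white zone matters so much.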


#117    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 12:30

Again, I am not talking about valuation! 

Yes, for valuation purposes, for that specific chart (league-wide), yes, the green and blue zone is the automatics, and is nothing but noise.  And should be discarded for valuation purposes.

There is no disagreement here.

I am only talking about classifying a ball as an automatic or not, GIVEN the fielder and his positioning.  Ozzie Smith, given perfect positioning (i.e., advanced knowledge of where the ball was going), would have the entire chart as a green zone.  You would NOT discard all that data as being noise!  The high frequency of automatics means it’s a combination of: automatics for any fielder, and “automatics” for this particular fielder (based on his talent and positioning).

***

Man, I suck at explaining this concept…


#118    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 12:31

Forget the why… focus on the what.


#119    Peter Jensen      (see all posts) 2011/06/28 (Tue) @ 12:53

Also, the distance measure on the x axis is the distance for the player’s starting position for that particular hit ball to the position where he fields the ball or where the ball passes nearest him in the case of ground balls, or where the ball lands or is caught in the case of air balls.  This, of course, is information that we don’t have unless we have Field Fx.  Hang time on the Y axis, while not currently freely available at all, is at least possible to acquire in other ways.

What Greg and I have differed on is how many of a fielder’s chances will end up in the meaningful white zone of that graph.  I have also questioned whether other factors like direction of the route a fielder has to take to the ball, and spin on the ball will require separate graphs.  And of course there will still be some variation of how the balls are distributed within the white zone.  Some fielders may have a disproportionate number nearer the sure out zone during any given year, others more toward the sure hit zone.


#120    Peter Jensen      (see all posts) 2011/06/28 (Tue) @ 13:00

Tango - I can see the usefulness of Greg’s graph.  I can’t envision any usefulness for yours.  In either case the information that you want is dependent on Field Fx and it looks like it will never be available in a league wide database for any player playing before 2012 at the earliest.


#121    Guy      (see all posts) 2011/06/28 (Tue) @ 13:05

I’ll take one more crack at this.  Let’s imagine we build a regression model to predict the out% for a large sample of BIP.  (Methodology similar to Max’s catcher studies).  We begin with precise information about the actual BIP only:  location, velocity, hangtime, distance.  And we find that, say, 35% of BIP have an expected out% of 90%+. 

Then we add a series of variables that I think we all agree will “bimodalize” the distribution:
The hitter
The pitcher
Base/out
Inning/score.
So let’s say that now our model tells us that 45% of all BIP are automatic outs.  Are we all agreed that this general story makes sense thus far?

Now, Tango wants to add two more variables:
The fielders
The fielders’ locations
In his view, our model will now yield a still larger proportion of automatic outs, maybe 50% or more.  (If I’m mischaracterizing your position, Tango, I apologize.  This is what I hear you saying.) My expectation is that adding those 2 variables won’t change the number of predicted 90% balls much if at all.  When Ozzie is on the field, the proportion of automatic outs will be higher (and automatic hits smaller), but when Derek is on the field it will be lower.  Good positioning will be offset by bad positioning.  Our R^2 will presumably rise (we will get better at predicting the outcome on each ball), but we’ll still end up with the same 45% of BIP having a 90% out projection. 

And if I suspend disbelief, and assume the fielder and location will yield a different answer, what then?  I can’t imagine any data that will allow us to answer the Tango question—it seems to me that we can only hope to answer the “Brian” version of the question.
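The disagreement can be made concrete with a toy model. Everything in this sketch is an assumption: the seven ball out rates are invented, and each fielder is modeled as shifting every ball's out rate by a fixed number of points, which is a simplification and not a real fielding model.

```python
from fractions import Fraction

THRESHOLD = 90  # cutoff for a "90% ball", in percent

# Made-up league-average out rates (percent) for seven batted balls.
BALL_PCTS = [50, 60, 70, 84, 86, 88, 94]

def share_average_fielder(ball_pcts):
    """Share of balls at or above the cutoff for an average fielder."""
    hits = sum(1 for p in ball_pcts if p >= THRESHOLD)
    return Fraction(hits, len(ball_pcts))

def share_specific_fielder(ball_pcts, fielder_offsets):
    """Share of (ball, fielder) pairs at or above the cutoff.

    Each fielder shifts every ball's out rate by a fixed number of points;
    offsets are symmetric around zero, so the league average is unchanged.
    """
    pairs = [p + d for p in ball_pcts for d in fielder_offsets]
    hits = sum(1 for p in pairs if p >= THRESHOLD)
    return Fraction(hits, len(pairs))

baseline = share_average_fielder(BALL_PCTS)
tight = share_specific_fielder(BALL_PCTS, [-1, 0, 1])  # similar fielders
wide = share_specific_fielder(BALL_PCTS, [-6, 0, 6])   # very different fielders

print(baseline, tight, wide)
```

In this toy setup a tight spread of fielders reproduces the average-fielder share, while a wide spread gives a different one. Which direction it moves depends on how many balls sit just below versus just above the cutoff, so the sketch shows that the two questions can diverge without showing that they do at the MLB level.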


#122    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 13:08

Peter/119:

I qualified my post with:
“without holding me too strongly to the two-parameter limits of this image”

I was hoping that seeing it visually would help.  If it didn’t, then ignore that I posted that illustration.

***

Peter/120:

“I can’t envision any usefulness for yours. “

Again, I’m not asking you (or anyone) to see the “why”.  Just the “what”.


#123    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 13:33

Guy/121: a perfect summary. 

This is the only part where we disagree in terms of expectations:

My expectation is that adding those 2 variables won’t change the number of predicted 90% balls much if at all.  When Ozzie is on the field, the proportion of automatic outs will be higher (and automatic hits smaller), but when Derek is on the field it will be lower.  Good positioning will be offset by bad positioning.

Theoretically, and as you conceded I think, if the talent level and/or positioning skill is wide enough (unrealistic let’s say at the MLB level), you will get a different answer.  Your presumption is that at the MLB level, we’re splitting hairs.

That is an interesting theoretical discussion that I’d be glad to have.

I can’t imagine any data that will allow us to answer the Tango question

This was the intent of Sky’s article that’s been linked to in the past.  And I think we could crowd-source this. 

Yes, the Brian question is easier to answer, and will have more usefulness in terms of answering.  That doesn’t preclude the Sky question.

***

Great, I think we’ve reached a common point of understanding!  Thanks for your patience.  You always go the extra mile around here, and I hope others appreciate it as well.


#124          (see all posts) 2011/06/28 (Tue) @ 13:41

This was the intent of Sky’s article that’s been linked to in the past.  And I think we could crowd-source this.

The problem with Sky’s method is the inherent limitations of the observer and the vantage point.  How is crowd-sourcing a bad method going to improve the results?


#125    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 13:48

The methodology is not a problem… it’s a limitation.  Crowdsourcing helps.  We can either hold to our opinions and leave it at that, or we can discuss it further if you like.


#126    Colin Wyers      (see all posts) 2011/06/28 (Tue) @ 14:54

Crowdsourcing helps.

How? The fundamental problem is that you’re inferring initial position from the fielder’s position when the camera cuts to him.


#127    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 14:57

I meant crowdsourcing at the game.  I agree, it doesn’t help if we have multiple eyes using the same lens.


#128    Guy      (see all posts) 2011/06/28 (Tue) @ 15:23

I think crowdsourcing Sky’s method would be useful mainly as a way of documenting and measuring how much range bias there is.  Which could be useful, but obviously wasn’t what you had in mind. 

A better use of crowdsourcing, I think, would be having fans try to record the position of fielders BEFORE each ball is hit.  It would be useful to know how much fielder positioning really varies at each position, broken down by batter/pitcher handedness, base/out, etc.  And this could also provide some insight into how bimodal the out distribution really is:  the only way Sky’s distribution can be right (given what we think we know about the distribution of BIP) is if there is huge variation in defensive positioning.  If the fans don’t see that, then there’s a limit to how much bimodality there can really be.


#129    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 15:36

I think Matt Thomas recorded positioning.  We had some thread a long while back on it I think.


#130    Rally      (see all posts) 2011/06/28 (Tue) @ 15:43

Fielder positioning?  This could be done (and has, though I can’t remember the name) much better by one guy with a camera from a fixed position than from multiple people recording a rough estimate of position.

As to the question of bimodality:  I think we’re looking at the wrong part of the question.  The high probability hits are the interesting part, not the high probability outs.

There has to be a large number of high probability outs.  Even if with perfect data we found that the 95% outs are extremely rare, then that would just mean there must be a large proportion of outs somewhere between 70% and 95% - since the overall out rate is around 70%.

So bimodal or just right skewed?  Do we have a large proportion of high probability hits around the <5% out level, more than in the 10-20-30% ranges?  Or is the left portion of the graph more even?
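
To make the constraint concrete, here is a rough sketch of two shapes that both land near the 70% overall out rate. The bin shares are invented for illustration (they are not measured data), and `overall_out_rate` is just a hypothetical helper name.

```python
# Hypothetical out-probability distributions. The bin shares are invented
# for illustration only; they are not measured data.
def overall_out_rate(bins):
    """bins: list of (out_probability, share_of_balls_in_play)."""
    total_share = sum(share for _, share in bins)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(p * share for p, share in bins)

# Shape A: bimodal, a spike of near-sure hits and a spike of near-sure outs.
bimodal = [(0.05, 0.20), (0.50, 0.15), (0.85, 0.15), (0.98, 0.50)]

# Shape B: skewed, with the likely hits spread evenly over the low bins.
skewed = [(0.05, 0.06), (0.15, 0.06), (0.25, 0.06), (0.50, 0.12), (0.88, 0.70)]

for name, bins in [("bimodal", bimodal), ("skewed", skewed)]:
    print(f"{name}: overall out rate {overall_out_rate(bins):.3f}")
```

Both shapes come out at about 70%, so the overall rate alone can't tell them apart. The left side of the graph is what separates them.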


#131    Rally      (see all posts) 2011/06/28 (Tue) @ 15:44

Thanks Tango - Matt Thomas is the name I was searching my brain for.


#132          (see all posts) 2011/06/28 (Tue) @ 15:52

http://www.insidethebook.com/ee/index.php/site/comments/fielder_positioning_by_matt_thomas/

But the link to his presentation is broken.


#133    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 15:58

Feel free to move the positioning aspect of the discussion there.  I’ll see if I am allowed to upload his PDF file to my site.


#134    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 16:13

Matt has this data in his presentation:

Three noteworthy time regimes:
• 1.5 sec <= t <= 3.5 sec: n = 224, 58 of 224 (.259) were outs or errors
• 3.5 sec < t <= 5 sec: n = 217, 171 of 217 (.788) were outs or errors
• 5 sec < t <= 7.33 sec: n = 220, 216 of 220 (.982) were outs or errors

Using hang time alone, based on 660 or so airballs in play, about one-third were balls with 5+ seconds of hang time that were turned into outs.

Presumably, those 3.5 to 5.0 second airballs included a lot of “out of range” balls (i.e., in the gap), so we have a mix of easy-to-get and tough-to-get airballs (contributing to a .788 out rate).  If you include angle, you’d get a fair amount of easy-to-get balls.

So, pooling all the fielders, and separating by angle and hang time, we’d probably get at least 40% in the “easy to get” category, if not higher.
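
For anyone who wants to check the arithmetic, a small sketch using only the three counts quoted above. The variable names are mine, not Matt's.

```python
# Counts quoted above from Matt Thomas's presentation:
# (hang-time bucket, airballs in bucket, outs or errors in bucket).
buckets = [
    ("1.5-3.5 sec", 224, 58),
    ("3.5-5.0 sec", 217, 171),
    ("5.0-7.33 sec", 220, 216),
]

total = sum(n for _, n, _ in buckets)
for label, n, outs in buckets:
    print(f"{label}: out rate {outs / n:.3f}, share of airballs {n / total:.3f}")

# Long-hang-time outs as a share of ALL airballs (the "one-third" figure).
long_hang_outs_share = buckets[-1][2] / total
print(f"5+ sec outs as a share of all airballs: {long_hang_outs_share:.3f}")
```

The last number is 216 outs over the pooled 661 airballs, which is where the one-third comes from.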


#135    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 17:02

The high probability hits are the interesting part, not the high probability outs.

We’ve come full circle, because this is the title of the thread!

What parameters could you use where you’d get the out rate below 10% (i.e., expected hit rate of 90% to 100%)?

For groundballs, you can’t do it on angle alone.  Other than right up the middle, the hit rate peaks at about 60%.  What other parameters can you add to (i.e., make the bins smaller), so that we can disentangle some of the surer outs from this bucket, so we can see surer hits?  I think you’ll probably have to say that you can’t.  Not unless you include the starting fielder position.

For airballs?  Just hang time was not enough.  The Matt Thomas data gives you a 75% hit rate for 1.5 to 3.5 seconds of hang time.  Maybe we need to make the time buckets smaller?  Using Ben/7’s data, hangtime of under 2.5 seconds gives us a hit rate of greater than 95%, so that’s a good sign.  Except, this is less than 1% of all air balls.  Make the bucket any larger, and you get close to the 75% hit rate that Matt Thomas reports.

So, can you break up the hang time bucket further?  Well, you can include spray angle.  That might get the hit rate up close to 90%, maybe.

Otherwise, to get a high-probability hit, again, you are left with talking about the starting position of the fielder to be used as a parameter.


#136          (see all posts) 2011/06/28 (Tue) @ 17:20

Using Ben/7’s data, hangtime of under 2.5 seconds gives us a hit rate of greater than 95%, so that’s a good sign.  Except, this is less than 1% of all air balls.

Small point here, and perhaps it was just a typo, but it’s 19% of air balls in those two buckets in Ben’s data.


#137    Tangotiger      (see all posts) 2011/06/28 (Tue) @ 19:31

Mike, thanks.  I was counting the wrong column.  That certainly changes things.


#138    Guy      (see all posts) 2011/06/28 (Tue) @ 23:54

For groundballs, you can’t do it on angle alone.  Other than right up the middle, the hit rate peaks at about 60%.  What other parameters can you add to (i.e., make the bins smaller), so that we can disentangle some of the surer outs from this bucket, so we can see surer hits?  I think you’ll probably have to say that you can’t.

Well, velocity will matter a lot: very slow and very fast GBs will both be hits more often.  Hitter and pitcher handedness will help, as will base/out.  And the identity of the individual hitter would be very important I think, both in terms of his speed to first and where fielders will be positioned for that hitter.  What’s hard is you really want to measure all of these interactions, so your model is incredibly complex.  Having a runner on first, for example, might reduce the out% for a GB to right side, but increase the out% on GBs to the left side (by creating a force at 2B).  Having Pujols at the plate might reduce the out% in some locations (where infielders rarely position themselves against Pujols) but increase the out% on slow rollers (because he’s not very fast).  If you could build such a model, and had the necessary data, I’m sure you would end up with some automatic GB hits.  But probably fewer than our intuition tells us is true.


#139    Tangotiger      (see all posts) 2011/06/29 (Wed) @ 00:09

Guy, right, the more you can try to break it down, the fewer BIP you end up with.  (By the way, I already included batter hand in my breakdown… forgot to mention that.)

On the other hand, if you had starting position of fielder, you’ll be able to find a fair number of BIP in the high-probability hit category.  I don’t know if you have that as off-limits in your breakdown or not.

It’s almost as if you might be saying that you want to include everything, except anything related to the skill of the fielders.


#140    Tangotiger      (see all posts) 2011/06/29 (Wed) @ 00:27

And I understand why you are doing that.  You are asking: “What’s the chance of seeing a high-probability hit, if I don’t know who the fielder is, and I don’t know where he’s positioned.”

And by “I don’t know”, you are instead substituting the normal distribution of fielding talent and normal distribution of their individualized positioning skills, relative to the scenario being presented.

And that is the more interesting question.

But, what if you take a step the other way: “What’s the chance of seeing a high-probability hit, if I DO know who the fielder is, and I DO know where he’s positioned.”

Then you’ll get a bigger number, because finally you’ll have enough information to break out of the 80-85% hit range into the 90% hit range… but it will be less useful in terms of applying it to fielders.


#141    Tangotiger      (see all posts) 2011/06/29 (Wed) @ 00:34

If I can continue, basically, you have Ozzie Smith at SS, and he surveys all the parameters, and he figures, ok, Cal would be here, Davey would be there, and if the ball is hit this way, then that ball would get through 85% of the time.  If it was Cal only, it would get through 93% of the time, but if it was Davey, it would get through 81% of the time.  So, it’s not a surehit, because, for ALL SS, it’s 85% of the time it’s a hit.

And for whatever reason, we don’t want to call it “30% of SS, it’s a high-prob hit, and 70% of SS it’s a prob hit”.  We instead want to say “for the AVERAGE SS”.  Rather than for the distribution of SS.
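
A quick sketch of the difference between the two ways of describing that ball. Pairing the 30% share with the 93% rate and the 70% share with the 81% rate is my assumption for illustration; the 90% cutoff is the working threshold from this thread.

```python
# (share of shortstops, hit probability on this ball). The pairing of shares
# with rates is an assumption made for illustration.
shortstops = [(0.30, 0.93), (0.70, 0.81)]
THRESHOLD = 0.90  # working cutoff for a "high-probability hit"

# "For the AVERAGE SS": one pooled number for the ball.
average_hit_prob = sum(share * p for share, p in shortstops)

# "For the DISTRIBUTION of SS": how often the ball clears the cutoff.
share_high_prob = sum(share for share, p in shortstops if p >= THRESHOLD)

print(f"average hit probability: {average_hit_prob:.3f}")
print(f"share of SS for whom it is a high-prob hit: {share_high_prob:.2f}")
```

The pooled number sits at about 85% and never clears 90%, even though it clears 90% for 30% of the shortstops in this made-up split.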


#142    Guy      (see all posts) 2011/06/29 (Wed) @ 08:27

You are asking: “What’s the chance of seeing a high-probability hit, if I don’t know who the fielder is, and I don’t know where he’s positioned.” And by “I don’t know”, you are instead substituting the normal distribution of fielding talent and normal distribution of their individualized positioning skills, relative to the scenario being presented.

Right, that’s the question I find most interesting.  I’m not sure what I would do with the information you’re looking for.  Plus I don’t have enough confidence in the Sky methodology to think it could ever come close to giving us the answer.  (Although it would be interesting to know, in the aggregate, the distributional impact of good SSs.  Does Ozzie/Belanger have most of his impact in the middle range, turning 50% balls into 60%, while having much smaller impact on 15% and 85% GBs?  Or is the gain more similar across the distribution of BIP?)

But I’m coming around to your view that controlling for fielder and position will increase the size of the 90% bin.  The issue I think is the shape of the distribution you get without that information.  If there are lots of 75-90% BIP, but relatively few 90-99% BIP, then I think you are correct that incorporating fielder/position will boost the 90% bin—because many more 85% balls will become 92% balls than vice versa.  And that presumably is the case.  If we were instead asking what proportion of balls are over/under 70%, then I think adding player/position wouldn’t make a difference.
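
Here is a toy version of that argument, under loudly stated assumptions: the pooled bins are invented, and knowing the fielder/position is modeled crudely as moving each ball's hit probability up or down by a fixed `delta` with equal weight. `split_by_fielder` and `share_above` are hypothetical helper names.

```python
def share_above(bins, threshold):
    """Share of balls whose hit probability is at or above the threshold."""
    return sum(share for p, share in bins if p >= threshold)

def split_by_fielder(bins, delta):
    """Replace each bin with two half-weight bins at p - delta and p + delta,
    clamped to [0, 1]. This is a crude stand-in for knowing the fielder."""
    out = []
    for p, share in bins:
        out.append((max(0.0, p - delta), share / 2))
        out.append((min(1.0, p + delta), share / 2))
    return out

# Invented (hit_probability, share) bins: many 75-90% balls, few 90-99% balls.
pooled = [(0.30, 0.70), (0.80, 0.10), (0.85, 0.15), (0.95, 0.05)]

before = share_above(pooled, 0.90)
after = share_above(split_by_fielder(pooled, 0.07), 0.90)
print(f"90% bin without fielder info: {before:.3f}")
print(f"90% bin with fielder info:    {after:.3f}")
```

With these made-up numbers the 90% bin doubles, because more 85% balls move up past the line than 95% balls move down below it.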


#143    Tangotiger      (see all posts) 2011/06/29 (Wed) @ 09:52

First, re-thanks to Mike for correcting my math.  I can’t believe I did that.

Anyway, Ben’s data has 18.5% of airballs as “high probability hits” using ONLY hang time.  If we also included spray angle, perhaps some of those 2.5-3.5 seconds plays would also count as high-prob hits (i.e., they went into the gap).  So, we’re talking about, what, 20%-25% of airballs as high-probability hits, if we include other parameters?

And, I think we agree, that it’s going to be tough to find high-probability hits on the ground (barring the use of fielder position).  So, say we’ll get maybe 5% of groundballs being classified as high prob hits if we use as many parameters as we can?

There’s about 55% airballs and 45% groundballs, so we have:
20% - 25% times 0.55
plus
5% times 0.45
equals
about 15%

So, about 15% of all batted balls are “high probability hits”?

Sky had it at 17% using his observations.  I had it at 15% (10% for the sure-hit, and split in half the other 10% that I had on the threshold of 0.10 outs per play for an over/under).

Seems to me that 15% of batted balls that have at least a 90% chance of being a hit is a good working number.
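
The weighted sum above, written out as a small sketch. The inputs are the rough guesses from this comment, not measured rates, and `high_prob_share` is a hypothetical helper name.

```python
# Rough inputs from the comment above (guesses, not measured rates).
AIR_SHARE, GROUND_SHARE = 0.55, 0.45   # split of batted balls
GROUND_HIGH_PROB = 0.05                # share of groundballs that are high-prob hits

def high_prob_share(air_high_prob):
    """Share of all batted balls that are high-probability hits."""
    return air_high_prob * AIR_SHARE + GROUND_HIGH_PROB * GROUND_SHARE

# Airballs were guessed at 20% to 25% high-probability hits.
low, high = high_prob_share(0.20), high_prob_share(0.25)
print(f"high-probability hits: {low:.4f} to {high:.4f} of all batted balls")
```

That range runs from roughly 13% to 16%, which brackets the 15% working number.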

