THE BOOK--Playing The Percentages In Baseball

Monday, October 17, 2011

Not all fielding opportunities are created the same

By Tangotiger, 10:40 AM

I said this in another thread several weeks back, but I’ll repeat it here anyway.

Batting Opps

A batting opportunity gives you a certain expectation of getting on base, depending on the context.  That context is primarily the pitcher, but it also includes the park, the fielders, the base-out state, and other minor things.  So the context-driven chance of reaching base for the average batter will stretch from, say, .250 to .400, or, to put it in clearer numbers, 25% to 40%.  That range is actually fairly tight if we think of it as encompassing 99% of the data points.  Basically, one SD = 2% to 3%.  So the OBP context would be 33% +/- 1SD=2.5%, or something like that.

If you come to bat 625 times, then the SD of your average context goes from 2.5% for 1 PA down to 0.1% for 625 PA (2.5% divided by the square root of 625, which is 25).  Therefore, for a player coming to bat over a full season, the context for reaching base would be 33% +/- 1SD=0.1%.  This is why we can say that if Pujols comes to bat 625 times and Ryan Braun comes to bat 625 times, they've faced similar enough opportunities over a season to reach base.
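A minimal sketch of that square-root shrinkage, assuming each PA's context is an independent draw with a one-PA SD of 2.5% (the independence assumption is exactly what the next paragraph relaxes):

```python
import math

def context_sd(per_pa_sd: float, n: int) -> float:
    """SD of the average context over n PA, assuming each PA is an independent draw."""
    return per_pa_sd / math.sqrt(n)

per_pa = 0.025  # one-PA spread in the chance of reaching base (from the text)
for n in (1, 25, 100, 625):
    print(f"{n:>4} PA: context 33% +/- {context_sd(per_pa, n):.2%}")
```

At 625 PA this prints a spread of 0.10%, the figure quoted above.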

Of course, there are systematic biases.  You will see a starting pitcher 3 times in a game, not once.  You are stuck in your home park half the time, etc.  Those biases add up, and so the "strength of schedule" is probably more like 33% +/- 1SD = 0.3%.  Just a guess, but something like that.
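To see what that guess implies, here is a rough sketch assuming the random and systematic components are independent, so their variances add. Backing the systematic part out of the 0.3% guess shows it dominates the 0.1% random part once you have a full season of PA, because it does not shrink as PA accumulate:

```python
import math

def combined_sd(random_sd: float, systematic_sd: float) -> float:
    """Independent random and systematic components add in quadrature."""
    return math.hypot(random_sd, systematic_sd)

def implied_systematic_sd(total_sd: float, random_sd: float) -> float:
    """Back out the systematic component from a total and a known random component."""
    return math.sqrt(total_sd**2 - random_sd**2)

random_part = 0.025 / math.sqrt(625)  # 0.1%, from independent PAs
total_guess = 0.003                   # the "more like 0.3%" guess from the text
sys_part = implied_systematic_sd(total_guess, random_part)
print(f"implied systematic SD: {sys_part:.2%}")
```

This prints roughly 0.28%, so under this model nearly all of the season-level spread is systematic.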

Fielding Opps

A fielding opportunity is far different.  Any baseball fan knows that fielding chances range from gimmes (out rate of almost 100%) to the nearly impossible (out rate of 1%).  (It also depends on whether you want to credit positioning as a skill to the player or to the manager.)  So we end up with, say, one SD = 30%, with a mean out rate of 70%.

In addition, fielders get fewer opportunities than batters (strip out the BB, HB, HR, and SO, and the bunts, depending on how you want to handle them): more like 450, let's say.  So the 1SD = 30% for one ball in play becomes roughly 1SD = 1.5% for a whole season of balls in play for a given fielder (30% divided by the square root of 450, which is about 21).  Add in a bit of systematic bias, and now we have 1SD = 2%.  (As an illustration.)

That's the kind of distribution a fielder is going to face in terms of "strength of schedule" thinking.  One SD being 2% on 450 plays is 9 outs (or 7 runs)!  That's for one standard deviation.  Hence the reason we don't like having just a straight "range factor" (i.e., outs per ball in play).
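The arithmetic behind the fielder's numbers can be sketched as follows. The 30%, 450, and 2% figures come from the text; the runs-per-play value is an assumption, simply the ratio implied by "9 outs (or 7 runs)", not a measured linear weight:

```python
import math

PER_BALL_SD = 0.30     # spread in out chance for a single ball in play (from the text)
BALLS_IN_PLAY = 450    # rough season of chances for one fielder (from the text)
SEASON_SD = 0.02       # random spread plus "a bit of systematic bias" (from the text)
RUNS_PER_PLAY = 7 / 9  # ASSUMPTION: ratio implied by "9 outs (or 7 runs)"

random_only_sd = PER_BALL_SD / math.sqrt(BALLS_IN_PLAY)  # about 1.4%
outs_per_sd = SEASON_SD * BALLS_IN_PLAY                   # 9 outs
runs_per_sd = outs_per_sd * RUNS_PER_PLAY                 # about 7 runs

print(f"random-only season SD: {random_only_sd:.1%}")
print(f"one SD of opportunity quality: {outs_per_sd:.0f} outs, about {runs_per_sd:.0f} runs")
```

Compare this with the batter, whose season-level context spread is a fraction of a percent.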

Fielding Opps Classification

That’s why it’s critical to try to qualify each opportunity specifically.  You do that by adding more knowledge to each BIP: its vector, its hang time, etc.  You try to establish some sort of out rate for each fielder for each ball in play: that is, his opportunity to make an out.

This is the idea behind most of the advanced fielding metrics.  The idea is on the right path.  The question is how much tighter we can make that range in opportunity that I mentioned (one SD = 2%).  The better you can classify each batted ball, the more you know about the quality of the opportunity, and the tighter the range.
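Here is a bare-bones sketch of the general idea, not the method of any particular published system: assume each ball in play has already been assigned to a bucket (vector, hang time, etc.), estimate a league out rate per bucket, and compare a fielder's actual outs with the sum of those rates over his chances. The bucket names and toy data are hypothetical:

```python
from collections import defaultdict

def bucket_out_rates(league_plays):
    """league_plays: iterable of (bucket, was_out) pairs for all fielders."""
    outs = defaultdict(int)
    chances = defaultdict(int)
    for bucket, was_out in league_plays:
        chances[bucket] += 1
        outs[bucket] += int(was_out)
    return {b: outs[b] / chances[b] for b in chances}

def outs_above_expected(player_plays, rates):
    """Actual outs minus the summed league out rates of the player's chances."""
    expected = sum(rates[bucket] for bucket, _ in player_plays)
    actual = sum(int(was_out) for _, was_out in player_plays)
    return actual - expected, expected

# HYPOTHETICAL toy data: "routine" balls are outs 90% of the time, "gap" balls 20%.
league = ([("routine", True)] * 9 + [("routine", False)]
          + [("gap", True)] * 2 + [("gap", False)] * 8)
rates = bucket_out_rates(league)
player = [("routine", True)] * 3 + [("gap", True), ("gap", False)]
above, expected = outs_above_expected(player, rates)
print(f"expected {expected:.1f} outs, actual 4, {above:+.1f} above expected")
```

The finer and more accurate the buckets, the less of the opportunity spread is left unexplained; a biased bucket assignment, though, feeds straight into `expected`.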

You have to be careful that, in classifying each ball, you don't introduce a systematic bias, because that becomes a killer.  In this case, the larger the sample size, the worse off you are!  That's because a systematic bias persists across years instead of washing out.

Suppose that we don't have any extra classification, and we just accept that one SD = 2% for a season of batted ball data.  If you have 4 years of data, similarly unadjusted, then one SD = 1%.  This is why sample size is our friend here: instead of relying on classifying batted balls, you just increase your sample size.  If you have 16 years of data, one SD = 0.5%, and now we're happy.  This is why, for a career's worth of fielding stats, we don't need to worry about adjustments too much.  This is why you get Ozzie Smith as #1 in any fielding system: all those annual biases that come into play (who his pitchers were, where all those batted balls went) come out in the wash of a full career.  Of course, this doesn't help us in evaluating players in real time.
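The trade-off between the last two paragraphs can be sketched with a toy model. The unadjusted 2% per season is from the text; the "adjusted" system's tighter 1% per-season spread and its 0.8% bias (the player-to-player spread of an error that repeats every season) are hypothetical numbers chosen only to illustrate the shape of the comparison:

```python
import math

def total_error(season_sd: float, years: int, persistent_bias: float = 0.0) -> float:
    """Random spread shrinks with sqrt(years); a bias that repeats every year does not."""
    random_part = season_sd / math.sqrt(years)
    return math.hypot(random_part, persistent_bias)

for years in (1, 4, 16):
    unadjusted = total_error(0.02, years)                       # from the text
    biased = total_error(0.01, years, persistent_bias=0.008)    # HYPOTHETICAL system
    print(f"{years:>2} yrs  unadjusted {unadjusted:.2%}  adjusted-but-biased {biased:.2%}")
```

Under these made-up numbers the adjusted system wins for a single season, roughly ties at 4 years, and loses at 16, which is the "larger sample, worse off" effect.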

Alternatives

What are the alternatives if you don’t like the unadjusted huge spread in quality of opportunities, or the adjusted (but possibly biased) smaller spread in quality of opportunities? 

- Well, you can just throw your hands up in the air and "look at all of them".  Life is short, and you don't want to waste your time evaluating each one.

- You can "go to the eye test", though that certainly has its own inherent biases, not to mention the extremely small sample size (how many of the 130,000 batted balls did you watch?).

- You can crowdsource, though that has its own bias issues as well.

- You can be a politician, and argue that the method you have chosen is the best by pointing out its good parts and the bad parts of the methods you have rejected.  For some people, politics is a way of life.

The worst alternative is to laugh or mock.  You simply aren’t offering anything of value by being an a$$hole.  If you go this route, at least be funny.  But being a funny jerk still doesn’t mean that you are providing facts.  So, be funny, and then stand aside.  Friendly hint: chances are, you are not being funny.


#1    Brooks Robinson      2011/10/17 (Mon) @ 14:54

If I were in any sort of position of power with any MLB team, one of my spring training rituals would be to put prospects in the field at their respective positions, hit a slew of balls at ‘em, and do the f/x-type tracking of their performance (from ball off bat to completion of pseudo-putout). The idea would be to quantify fielding ability independent of game situation, to a reasonable extent. It wouldn’t take very many sessions to build a sizable database, on both individual players and in the aggregate. It would provide a useful basis for comparisons.


#2    weskelton      2011/10/17 (Mon) @ 15:44

In addition to classifying each batted ball by its typical out rate, one should also consider the impact of an out not being made.

As an example, think of a duck-snort fliner in front of a charging outfielder vs. a ball that would just barely clear the fence.  In the first case, the outfielder would have to make a shoestring catch.  In the second, he would have to jump and pull back the would-be homer.  The typical out rate, albeit very low, could be the same.

However, if the outfielder fails to make the former (and plays it on the hop), it will be a single with each runner advancing one base.  For the latter, the result of not making the play is a home run.
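A rough sketch of weighting each chance by what is at stake, not just by its out rate. The run values below are approximate linear weights relative to an average PA and are an assumption for illustration (they vary by season and context); the 5% out rate is hypothetical:

```python
# ASSUMPTION: approximate linear-weight run values relative to an average PA.
RUN_VALUE = {"out": -0.27, "single": 0.47, "home_run": 1.40}

def credit_vs_average(made_play: bool, out_rate: float, miss_result: str) -> float:
    """Runs credited relative to an average fielder facing the same ball.

    out_rate: chance an average fielder converts this ball into an out.
    miss_result: what the ball becomes if the out is not made.
    """
    stake = RUN_VALUE[miss_result] - RUN_VALUE["out"]
    return (1 - out_rate) * stake if made_play else -out_rate * stake

# Same (hypothetical) 5% out rate, different consequences of missing:
print(credit_vs_average(True, 0.05, "single"))    # shoestring catch
print(credit_vs_average(True, 0.05, "home_run"))  # home run robbed
```

With these weights the shoestring catch is worth about 0.70 runs and the robbed homer about 1.59, even though the out rates are identical.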


#3    Tangotiger      2011/10/17 (Mon) @ 15:49

Wes: agreed. 

While I definitely mean my point to be in terms of run value, not out value, it’s easier to have the discussion if we talk about the out rate (akin to a batter’s OBP rate).


#4    Peter Jensen      2011/10/17 (Mon) @ 15:52

Brooks - Just the kind of evaluation system that you describe above was suggested in the Glenn Shoenhals presentation at the 2010 Pitch Fx Summit: http://baseball.sportvision.com/summit/archive/2010.  Also discussed was the possibility of programming pitching machines to simulate hitting in a controlled way to different parts of the field to more quickly pinpoint the limits of a player’s range.  Both ideas make a lot of sense for evaluating young players especially.


#5    BirdWatcher      2011/10/17 (Mon) @ 16:54

Tango, can you clarify the standard deviation numbers used in the article?  It reads as though the standard deviation for OBP for a single plate appearance is 2.5% when in fact it is more like 100x that amount, so it's unclear to me what you're saying.  Similarly, the standard deviation after 625 plate appearances would give you a range around the mean of about 10 points, say .324 to .342, or about 2.5% on each side of the curve, so where does the 0.1% come from?  The numbers don't really change the point of your article, but I'd like to be sure everybody is on the same wavelength when using standard deviations and related percentages.  Thanks.


#6    Tangotiger      2011/10/17 (Mon) @ 17:04

Thanks for pointing out the confusion.

I'm talking about the CONTEXT.  If, for example, every batter faced Doc at Citizens Bank Park, in front of the same 8 fielders, at 25 Celsius, at night, then the standard deviation of the *context* would be zero.

But, the reality is that you have a spread in talent of opposition, of parks, of temperature.  Your “strength of schedule”.  That spread is going to be fairly tight.  That is, the context of Doc in good conditions for him would be an OBP of 25%.  The context of some crappy Royals pitcher in fairly not-good conditions would have an OBP of 40%.

So, that’s the context that a batter faces for every PA.

***

A fielder is not like that, because the ball is already in flight.  Whereas the ball is still in the pitcher's hand when we establish the context for a hitter's environment as being 25% to 40% in OBP range (1 SD = 2.5%), the context for a fielder is set once the ball is in flight, and so he's got a 0% to 100% chance of getting an out on any batted ball.
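To make the distinction in this exchange concrete: BirdWatcher's figures describe the spread in *outcomes* (binomial variation in results around a true rate), while the article's figures describe the spread in *context*. A small sketch, assuming a fixed true OBP of .330 and binomial outcomes:

```python
import math

OBP = 0.33                 # assumed true on-base rate
CONTEXT_SD_PER_PA = 0.025  # spread in the *chance* of reaching base (from the article)

def outcome_sd(p: float, n: int) -> float:
    """Binomial SD of an observed on-base rate over n PA with true rate p."""
    return math.sqrt(p * (1 - p) / n)

for n in (1, 625):
    print(f"{n:>3} PA: outcome SD {outcome_sd(OBP, n):.1%}, "
          f"context SD {CONTEXT_SD_PER_PA / math.sqrt(n):.1%}")
```

For one PA the outcome SD is about 47% against a context SD of 2.5%; over 625 PA it is about 1.9% against 0.1%.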


#7    Brian Cartwright      2011/10/17 (Mon) @ 18:28

Tango, excellent explanation.

If we were starting from scratch, I would want the fielder’s initial position, the ball’s position (then calculate distance and angle from fielder to ball) and the time for the ball to go from the bat to that position. Field f/x or Trackman should be able to give us that.

But we are working with the publicly available data, and with the descriptions and measurements chosen by others.

Tango correctly points out that trying to be too fine with the classifications introduces bias.  For example, using Gameday's xy coordinates to create a vector (horizontal angle) for balls in the air to the outfield, I found that with any more than seven vectors, hits in the gaps were recorded as being farther from the outfielders, while outs were recorded as being closer.  So given the limits and biases of the available data, left field line, left field, left-center gap, etc., is as detailed as we can go without getting noticeable biases in the data.


#8    Tangotiger      2011/10/17 (Mon) @ 19:45

Brian, yes, great way to describe the issue of “too fine” of a classification.

I think this is why Peter goes the route of the “Big Zone” metric, and why Retro zones might be better to use than the finer STATS/BIS zones.


#9    Tangotiger      2011/10/17 (Mon) @ 19:46

Interestingly, MGL used to take all the STATS/BIS zones and collapse them into the much larger Retro zones, until some people (maybe even me) told him to keep the data as-is from STATS/BIS.


#10    Brian Cartwright      2011/10/17 (Mon) @ 21:18

That's something that Colin had been pointing out, and I was able to verify it in the Gameday data.  For however many outfield vectors I created, I calculated the BABIP in each.  It should produce fairly smooth curves, with the lowest hit rates where the ball is closest to an outfielder and higher rates away from the outfielder.  But with more than 7 zones it starts getting bumpier, and not just from random variation: you can see new high and low points reflecting the classification bias.  Halfway between the outfielder and the center of the gap would have a lower hit rate than right at the outfielder, because catches were being clustered away from the center of the gap, towards the outfielder.
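A minimal sketch of that kind of check, not Brian's actual code: convert x/y hit coordinates to a horizontal spray angle, bin the fair-territory angles, and compute the hit rate per bin, then compare the curve at 7 bins against finer binnings. The home-plate origin below is an assumption (an approximation commonly used for Gameday coordinates, where y grows toward home plate); check it against your own data before relying on it:

```python
import math
from collections import defaultdict

HOME_X, HOME_Y = 125.42, 198.27  # ASSUMPTION: approximate home plate in Gameday x/y units

def spray_angle(x: float, y: float) -> float:
    """Degrees from straightaway center; negative means the left field side."""
    return math.degrees(math.atan2(x - HOME_X, HOME_Y - y))

def hit_rate_by_bin(balls, n_bins: int, lo: float = -45.0, hi: float = 45.0):
    """balls: iterable of (x, y, is_hit). Returns the hit rate per angular bin (None if empty)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    width = (hi - lo) / n_bins
    for x, y, is_hit in balls:
        angle = spray_angle(x, y)
        if not lo <= angle < hi:
            continue  # foul territory or outside the range being binned
        b = int((angle - lo) // width)
        totals[b] += 1
        hits[b] += int(is_hit)
    return [hits[b] / totals[b] if totals[b] else None for b in range(n_bins)]
```

Running `hit_rate_by_bin(balls, 7)` and then, say, `hit_rate_by_bin(balls, 15)` on the same balls lets you look for the persistent bumps described above rather than random noise.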


#11    aweb      2011/10/18 (Tue) @ 09:44

I'm always curious about the anomalies, which can show you where the current system goes wrong on an individual level.  This year, Carlos Lee had great OF ratings.  Everyone agrees that he is not a good outfielder: he fails the visual "looks bad out there" test, and previous years and various methods have him as being at best average, and typically bad.  So most people will agree that Carlos Lee, for whatever reason, got lucky this year.  There are two basic ways I see to get lucky.
1. He got a lot of plays that fell into the "anyone could make that" category, above the normal percentages by a significant amount.  Coaching could get some credit here, but OFers can go long stretches without any particularly difficult chances coming up.
2. He made an unusual number of low percentage plays.

There are also data issues I could see happening for a bad player: if balls consistently fall a long way from him, maybe the stringer thinks they were liners instead of flyballs (or similar).  Or the zone classes get messed up, for reasons already touched on here.  RZR had him very low, and his out-of-zone plays (as reported by FanGraphs) weren't particularly high on a per-inning basis.  Aside from a good arm rating, I'm not sure, taking a quick look, what exactly the systems think he was good at.

We know some hitters will fluke into a high average, and we can usually tell why (high BABIP on grounders and/or flyballs, high HR rate, low IF pop rate, etc). What I think needs to be stated more clearly is what exactly is happening when a player “flukes” his way to a high defensive rating by the current systems. Is there any way to tell for a particular player?


#12    Tangotiger      2011/10/18 (Tue) @ 10:05

David Pinto is one guy who does a fantastic job to highlight these issues.

Ideally, what we’d see from MGL (and Dewan and everyone else) is how much each adjustment contributes to the overall rating.

For example, you might have Carlos Pena as -12 runs unadjusted, and +1 run adjusted.  But in-between, adjustments are made for: location, trajectory, base-out, pitcher tendency, park, etc, etc, etc.  It would be lovely to see how each adjustment makes the changes.

Are the adjustments on the objective parameters or the subjective ones, and by how much?  Are those that are influenced more by stringer bias the ones that have the most adjustments?
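A sketch of the kind of step-by-step report being asked for, assuming the adjustments are additive and applied in a fixed order (real systems may not work that way, and when adjustments interact, the order matters). The split of the -12 to +1 example across adjustments is entirely made up:

```python
def adjustment_waterfall(unadjusted: float, adjustments):
    """adjustments: ordered list of (name, runs). Returns (step, running total) pairs."""
    total = unadjusted
    steps = [("unadjusted", total)]
    for name, runs in adjustments:
        total += runs
        steps.append((name, total))
    return steps

# HYPOTHETICAL split of the -12 -> +1 example; the individual values are invented.
example = [("location", 6.0), ("trajectory", 4.0), ("base-out", 1.0),
           ("pitcher tendency", 1.5), ("park", 0.5)]
for name, running in adjustment_waterfall(-12.0, example):
    print(f"{name:<17} {running:+.1f}")
```

Tagging each step as objective or stringer-dependent would then answer the second question directly.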


#13          2011/10/18 (Tue) @ 11:02

Is there a free source for fielding play-by-play data, i.e., who caught or should have caught a given hit?


#14    Tangotiger      2011/10/18 (Tue) @ 11:18

There’s Retrosheet and Gameday.

Retrosheet denotes who was the first fielder to touch a ball, and that’s the one I use.

The biggest problem there is that if a CF touches a groundball, you don't know whether it went through on the SS or 2B side.  Similarly, a LF who touches a GB may have gotten to a grounder in the hole or down the line.

At least, it reduces the opportunity space down to 2 infielders.

(Of course, the quality of that data has not been tested.  While the scorers are motivated to get the outs correct, and there are backup procedures to get those right, they are not necessarily equally motivated to make sure they got the hits-touches correct.  Who knows exactly how accurate those are.)

You can try to make some reasonable guesses as to what happened, and that’s what Dan Fox, Sean Smith, Colin Wyers, Brian Cartwright and/or Peter Jensen might be doing.
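As a simple illustration of how far the first-fielder information narrows things down, here is a hypothetical lookup, assuming you have already extracted the first fielder's position number (Retrosheet numbering: 3 = 1B, 4 = 2B, 5 = 3B, 6 = SS, 7 = LF, 8 = CF, 9 = RF) and a batted-ball type, with "G" used here for ground balls. The candidate pairs are an assumption based on the description above, not anyone's published method:

```python
# ASSUMPTION: a grounder first touched by an outfielder got past one of the two
# infielders in front of him; which one is unknown without location data.
GROUNDER_CANDIDATES = {
    7: (5, 6),  # LF: down the line past 3B, or through the hole past the SS
    8: (4, 6),  # CF: up the middle, 2B side or SS side
    9: (3, 4),  # RF: past 1B or past 2B
}

def candidate_fielders(first_fielder: int, batted_ball_type: str):
    """Fielders who might have been responsible for the ball (hypothetical helper)."""
    if batted_ball_type == "G" and first_fielder in GROUNDER_CANDIDATES:
        return GROUNDER_CANDIDATES[first_fielder]
    return (first_fielder,)
```

Splitting such a ball between the two candidates, or using other clues to pick one, is where the "reasonable guesses" come in.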


#15    Colin Wyers      2011/10/18 (Tue) @ 13:03

As far as who SHOULD have made the play, Josh, you should know much better than me!


#16    joshorenstein      2011/10/18 (Tue) @ 13:23

Ha, true.  Just looking for some pbp data to combine with my own dataset.


#17    bojan      2011/10/18 (Tue) @ 13:40

"Retrosheet denotes who was the first fielder to touch a ball, and that's the one I use."

Gameday does, too, although not in such a direct manner as Retrosheet.


#18    Charles O. Finley      2011/10/18 (Tue) @ 14:19

It’s a shame that the f/x era missed the artificial turf era by a good 10-15 years. When a number of parks had grass fields and a number of other parks had turf, you had a source of variation just begging for quantification by the quantitatively inclined. Not so much, these days, even though the tools are now here. Too bad that the eras really didn’t overlap.


#19    Rally      2011/10/18 (Tue) @ 14:54

"Gameday does, too, although not in such a direct manner as retrosheet.”

Not sure what you mean by that.  Retrosheet has been getting their data for recent years from the gameday files.


#20    Colin Wyers      2011/10/18 (Tue) @ 14:58

Parsing the Gameday files is nontrivial, at least compared to parsing Retrosheet event files, is what I think he meant, Rally.


#21          2011/10/19 (Wed) @ 08:03

Rally/19

Yes, I meant what Colin said in 20.
