Monday, October 17, 2011
Not all fielding opportunities are created the same
I said this in another thread several weeks back, but I’ll repeat it here anyway.
Batting Opps
A batting opportunity gives you a certain expectation of getting on base, dependent on the context. That context is primarily the pitcher, but it also includes the park, the fielders, the base-out state, and other minor things. So, the chance of reaching base for the average batter, depending on that context, will stretch from say .250 to .400, or, to put it in clearer numbers: 25% to 40%. That range is actually fairly tight, if we think of it as encompassing 99% of the data points. Basically, one SD = 2% to 3%. So, the OBP would be 33% +/- 1SD=2.5%, or something like that.
If you come to bat 625 times, then the SD goes from 2.5% for 1 PA down to 0.1% for 625 PA (2.5% divided by the square root of 625, which is 25). Therefore, over a season, a player’s chance-of-reaching-base context would be 33% +/- 1SD=0.1%. This is why we can say that if Pujols comes to bat 625 times and Ryan Braun comes to bat 625 times, they’ve faced similar enough opportunities to reach base over a season.
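To make that square-root shrinkage concrete, here is a minimal sketch in Python. It assumes each plate appearance's context is an independent draw, which is the simplification the numbers above rely on. The function name `sd_of_average` is made up for this illustration.

```python
import math

def sd_of_average(per_event_sd, n_events):
    """SD of the average context over n independent events (assumes independence)."""
    return per_event_sd / math.sqrt(n_events)

# Figures taken from the text: 2.5% per PA, 625 PA in a season.
season_sd = sd_of_average(0.025, 625)
print(f"Season-level SD of batting context: {season_sd:.2%}")
```

With 625 PA the divisor is exactly 25, so the per-PA spread of 2.5% collapses to 0.1%.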
Of course, there are systematic biases. You will see a starting pitcher 3 times in a game, not once. You are stuck in your home park half the time, etc. Those biases add up, and so the “strength of schedule” is probably more like 33% +/- 1SD = 0.3%. Just a guess, but something like that.
Fielding Opps
A fielding opportunity is far different. Any baseball fan will know that chances range from gimmes for a fielder (out rate = almost 100%) to balls that are nearly impossible to get (out rate = 1%). (It also depends on whether you want to treat positioning as a skill of the player or of the manager.) So, we end up with say one SD = 30%, with a mean of 70% out rate.
In addition, fielders get fewer opportunities than batters (strip out the BB, HB, HR, SO, and possibly bunts, depending on how you want to handle them). More like 450 let’s say. So, the 1SD = 30% for one ball in play becomes roughly 1SD = 1.5% for a whole season of balls in play for a given fielder (30% divided by the square root of 450). Add in a bit of systematic bias, and now we have 1SD = 2%. (As an illustration.)
That’s the kind of distribution a fielder is going to face in terms of “strength of schedule” thinking. One SD being 2% on 450 plays is 9 outs (or 7 runs)! That’s for one standard deviation. Hence, the reason we don’t like just having a straight “range factor” (i.e., outs per ball in play).
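The fielding arithmetic can be sketched the same way. The 30%, 450, and 2% figures come straight from the text. The 0.8 runs per out is my assumption, chosen only because it is what "9 outs (or 7 runs)" implies; it is not stated anywhere above.

```python
import math

PER_BALL_SD = 0.30           # spread in out rate for one ball in play (from the text)
BALLS_IN_PLAY = 450          # rough season total for one fielder (from the text)
SEASON_SD_WITH_BIAS = 0.02   # illustrative season SD after adding bias (from the text)
RUNS_PER_OUT = 0.8           # assumption: implied by "9 outs (or 7 runs)"

# Random part only, assuming independent balls in play.
random_season_sd = PER_BALL_SD / math.sqrt(BALLS_IN_PLAY)

# What one SD of opportunity quality is worth over a season.
outs_per_sd = SEASON_SD_WITH_BIAS * BALLS_IN_PLAY
runs_per_sd = outs_per_sd * RUNS_PER_OUT

print(f"Random season SD: {random_season_sd:.2%}")
print(f"One SD of opportunity: {outs_per_sd:.1f} outs, about {runs_per_sd:.1f} runs")
```

The random part alone comes out a bit above 1.4%, which is where the rounded 1.5% comes from before any bias is added.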
Fielding Opps Classification
That’s why it’s critical to try to qualify each opportunity specifically. You do that by adding more knowledge to each BIP: its vector, its hang time, etc. You try to establish some sort of out rate for each fielder for each ball in play: that is, his opportunity to make an out.
This is the idea behind most of the advanced fielding metrics. The idea is on the right path. The question is how much tighter we can make that range in opportunity that I mentioned (one SD = 2%). The better you can classify each batted ball, the more you know about the quality of the opportunity, and the tighter the range.
You have to be careful that, in classifying each ball, you don’t introduce a systematic bias, because that becomes a killer. In this case, the larger the sample size, the worse off you are! That’s because a systematic bias becomes persistent across years.
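Here is a rough sketch of why a persistent bias hurts more as the sample grows. It assumes the random part and the bias are independent, so their variances add, and that the bias is identical every year. The 1.5% and 1% inputs are placeholders for illustration, not measured values, and the function names are hypothetical.

```python
import math

def total_sd(random_season_sd, bias_sd, years):
    """Combine a shrinking random part with a bias that never averages out."""
    random_part = random_season_sd / math.sqrt(years)
    return math.sqrt(random_part ** 2 + bias_sd ** 2)

def bias_share(random_season_sd, bias_sd, years):
    """Fraction of the total variance that comes from the persistent bias."""
    return bias_sd ** 2 / total_sd(random_season_sd, bias_sd, years) ** 2

# Placeholder inputs: 1.5% random SD per season, 1% persistent bias.
for years in (1, 4, 16):
    sd = total_sd(0.015, 0.01, years)
    share = bias_share(0.015, 0.01, years)
    print(f"{years:2d} years: total SD {sd:.2%}, bias share {share:.0%}")
```

Under these assumptions the total SD can never fall below the bias itself, and the bias accounts for a growing share of whatever error remains as years are added.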
Suppose that we don’t know any extra classification, and we just accept that one SD = 2% for a season of batted ball data. If you have 4 years of data, similarly unadjusted, then one SD = 1%. This is why sample size is our friend here: instead of relying on classifying batted balls, you just increase your sample size. If you have 16 years of data, one SD = 0.5%, and now we’re happy. This is why for a career’s worth of fielding stats, we don’t need to worry about adjustments too much. This is why you get Ozzie Smith as #1 in any fielding system: all those annual biases that come into play (who’s his pitcher, where did all those batted balls go) come out in the wash of a full career. Of course, this doesn’t help us in evaluating in real-time.
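The multi-year numbers follow from the same square-root rule. This little sketch assumes each season's opportunity quality is an independent draw with the illustrative 2% SD, so it applies only to biases that vary from year to year, not to one that persists. The name `multi_year_sd` is just for illustration.

```python
import math

def multi_year_sd(season_sd, years):
    """SD of average opportunity quality over several independent seasons."""
    return season_sd / math.sqrt(years)

SEASON_SD = 0.02  # illustrative unadjusted season SD from the text

for years in (1, 4, 16):
    print(f"{years:2d} years: one SD = {multi_year_sd(SEASON_SD, years):.1%}")
```

Quadrupling the number of seasons halves the spread each time, which is how 2% becomes 1% and then 0.5%.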
Alternatives
What are the alternatives if you don’t like the unadjusted huge spread in quality of opportunities, or the adjusted (but possibly biased) smaller spread in quality of opportunities?
. Well, you can just throw your hands up in the air and “look at all of them”. Life is short, and you don’t want to waste your time evaluating each one.
. You can “go to the eye test”, though that certainly has its own inherent biases, not to mention the extremely small sample size (how many of the 130,000 batted balls did you watch?).
. You can crowdsource, though that has its own bias issues as well.
. You can be a politician, and point out why the method you have chosen is the best by pointing out the good parts of it, and pointing out the bad parts of the methods you have rejected. For some people, politics is a way of life.
The worst alternative is to laugh or mock. You simply aren’t offering anything of value by being an a$$hole. If you go this route, at least be funny. But being a funny jerk still doesn’t mean that you are providing facts. So, be funny, and then stand aside. Friendly hint: chances are, you are not being funny.


If I were in any sort of position of power with any MLB team, one of my spring training rituals would be to put prospects in the field at their respective positions, hit a slew of balls at ‘em, and do the f/x-type tracking of their performance (from ball off bat to completion of pseudo-putout). The idea would be to quantify fielding ability independent of game situation, to a reasonable extent. It wouldn’t take very many sessions to build a sizable database, on both individual players and in the aggregate. It would provide a useful basis for comparisons.