THE BOOK--Playing The Percentages In Baseball


Wednesday, July 28, 2010

Reducing bias in fielding metrics

By Tangotiger, 12:19 PM

Colin tries to stick to factual information only, inferring what could have happened from that alone, and he sets out his constraints:

So let’s take a different approach. Let’s try to design a fielding metric with no bias—or, at least, attempt to minimize the effect of bias. What we can do is:

1. Restrict ourselves to looking at only factual data—data we can validate objectively. That means no batted-ball data, no hit location data, etc.

2. For estimating the amount of plays an average player at that position would have made, ignore data about the outcomes of batted balls whenever possible.

3. Err on the side of caution when deciding whether or not to adjust—in other words, make as few adjustments as possible. We can allow the data to be expressive by getting the metric out of its way whenever we can.

As for the bolded part, he says this:

So this gives us, at the team level, outs on the ground versus outs in the air. And what we see is a strong negative relationship between ground plays and air plays, with a correlation of -0.77. So when a team makes a lot of ground-ball plays, the most likely explanation is that they saw a lot of ground balls.

So, let’s adjust for that. What we can do is look at how many plays a team made in total, compared to the average team, and then look at how many ground-ball plays a team made compared to how many air-ball plays they made. A team with superior ground-ball fielders will not only have more ground-ball plays but likely more plays made overall.

So for a team that’s above-average on making ground-ball plays but below-average in making total plays, we “shift” the responsibility toward the ground-ball plays (in other words, inflate the amount of ground-ball plays we think the team should have made, but deflate the amount of air-ball plays we think the team should have made), while keeping the total number of plays we think the team should have made constant.

This is, for lack of a better term, our “ground-ball rate” adjustment. It’s a bit of a misnomer, because we ignore any scorer data on the number of ground balls a defense saw. And it is possible that including that scorer data could improve the process here as well. But for now, let’s err on the side of excluding that data.

So, he’s introducing a bias here. He’s inferring something about the groundball frequency from the groundball outs made. Two things lead to more groundball outs: more groundballs and better infielders. It’s not clear how Colin is separating the two. He could pretty much split the difference and presume that if a team turns a lot of BIP into his estimate of ground outs, it’s a combination of the two. He could perhaps figure that out iteratively: say he first presumes that it’s all pitchers; then the infielders come out with a lot of ++, so he does a second pass and gets something different, then a third pass, and so on, until the numbers stabilize.
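One way to make “split the difference” concrete is sketched below. It is an editorial sketch, not Colin’s or Tango’s method. Treat the excess groundball outs as the sum of two independent, zero-mean components: pitching-staff groundball tendency and infield skill. If their variances are known (here they are hypothetical inputs), the expected share of each component is its variance over the total. Equal variances reproduce an even split. This replaces the iterative passes with a one-step calculation.

```python
from typing import Tuple


def split_excess(excess_gb_outs: float, var_staff: float, var_infield: float) -> Tuple[float, float]:
    """Apportion excess groundball outs between the pitching staff and the infield.

    Assumes (hypothetically) that the excess is the sum of two independent,
    zero-mean components with the given variances. Each component's expected
    share of the observed total is its variance divided by the combined variance.
    """
    total_var = var_staff + var_infield
    if total_var <= 0:
        raise ValueError("variances must sum to a positive number")
    staff_part = excess_gb_outs * (var_staff / total_var)
    infield_part = excess_gb_outs - staff_part
    return staff_part, infield_part


# Equal variances: exactly "split the difference".
print(split_excess(50, var_staff=1.0, var_infield=1.0))  # (25.0, 25.0)
# If staff tendencies varied three times as much as infield skill, the staff gets 75%.
print(split_excess(50, var_staff=3.0, var_infield=1.0))  # (37.5, 12.5)
```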

Anyway, he does a cool thing, and that’s to show the margin of error.  For his SS, the margin of error is 15 runs.  I presume this means 2 SD = 15 runs, so that 1 SD = 7.5 runs?

My preferred margin of error is 1 SD = 3 to 5 runs. The margin of error decreases the more data you give it; the basic rule is that it shrinks in proportion to the square root of the sample size. So, if you have 1 season with 1 SD = 7.5, then 2 seasons will give you 1 SD = 5.3 and 3 seasons will give you 1 SD = 4.3. And that pretty much goes with what I’ve been saying in the past: you need to look at 2 seasons for SS and 4 seasons for 1B (basically 3 seasons if you need one number).
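The square-root rule above can be checked with a short sketch. It assumes the error is purely random and independent from season to season. Under that assumption, pooling n seasons divides the one-season SD by sqrt(n). Persistent bias would not shrink this way.

```python
import math


def sd_over_seasons(one_season_sd: float, seasons: float) -> float:
    """Per-season SD after pooling `seasons` seasons of purely random error."""
    return one_season_sd / math.sqrt(seasons)


for n in (1, 2, 3, 6):
    print(n, round(sd_over_seasons(7.5, n), 1))
# 1 7.5
# 2 5.3
# 3 4.3
# 6 3.1
```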

Anyway, I like Colin’s overall presentation, and the constraint to stick to factual data. He gets numbers on a scale similar to what WOWY gives. For the same reason that I would not use single-year WOWY, I wouldn’t use a single year of this metric either… you just get way too many out-there results. But once you get to three years, you start to listen, and by six years or so, the extra subjective information that the other metrics use to lower the uncertainty will pretty much wash away as an advantage, leaving us with potential biases in those metrics and little bias in the factual metrics.

That is, at say 6+ years, it becomes a bias vs. uncertainty battle, which is the main point of Colin’s article. There’s plenty of room for everyone at the table.


#1    Mike      2010/07/28 (Wed) @ 12:59

It seems to me that the potential bias is in the quality of the outfielders or perhaps the size of the outfield, rather than in the quality of the infielders, if I understand Colin’s method correctly, which I may not.

He’s adjusting based on this assumption:

A team with superior ground-ball fielders will not only have more ground-ball plays but likely more plays made overall.

Which should be true, I think, unless the outfield biases the results.


#2    Colin Wyers      2010/07/28 (Wed) @ 13:25

I can go ahead and give some more details on how I’m handling that adjustment, and I do agree that it’s a potential source for bias. I’ll go ahead and blow past the parts I go over in detail in the article.

First I figure ground ball plays and air ball plays (without looking at any batted ball data, I should clarify - exactly how I split infield plays between GB and AiB I think is explained pretty clearly in the article).

So, then you have:

GB_PM_RT/lgGB_PM_RT + AiB_PM_RT/lgAiB_PM_RT = TM_PM_PLUS

So it’s basically figured like OPS+. I should note that I only adjust when one unit is above average and the other unit is below average:

If the infield is above-average, but the team is simply average, the assumption (and yes, there are probably cases where it doesn’t hold) is that the batted ball distribution is the cause, and so I figure an adjusted league GB and AiB play rates to force each unit to average.

If the team is above-average, though, I credit that to the unit that makes the most plays above average, and then do the shift.

So as Mike notes, a team with a good defensive infield but a below-average outfield (or a team with a park that suppresses outfield plays) is probably being misattributed here to some extent. The counterweight is that some portion of air ball plays are being made by infielders, so it’s not solely determined by outfielder proficiency.

It’s not perfect, and I’m certainly open to suggestions to make it better. (And yes, batted ball data may be one way to do so - it’s one thing I intend to look at once I have some other things, like outfielder ratings, attended to.)

And that margin of error should be one SD, not two.
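Here is a minimal sketch of the OPS+-style index from Colin’s comment and of the “only adjust when the units disagree” test. It assumes each play rate is plays made per ball in play; the comment does not say what the denominator is. The example rates are made up.

```python
def unit_index(team_rate: float, lg_rate: float) -> float:
    """One unit's play rate relative to league; 1.0 is league average."""
    if lg_rate <= 0:
        raise ValueError("league rate must be positive")
    return team_rate / lg_rate


def tm_pm_plus(gb_rate: float, lg_gb_rate: float, aib_rate: float, lg_aib_rate: float) -> float:
    """GB_PM_RT/lgGB_PM_RT + AiB_PM_RT/lgAiB_PM_RT; a league-average team scores 2.0."""
    return unit_index(gb_rate, lg_gb_rate) + unit_index(aib_rate, lg_aib_rate)


def units_disagree(gb_index: float, aib_index: float) -> bool:
    """True when one unit is above average and the other below (the only case adjusted)."""
    return (gb_index - 1.0) * (aib_index - 1.0) < 0


# Hypothetical team: ground-ball play rate 20% above league, air-ball rate 20% below.
gb_idx = unit_index(0.30, 0.25)
aib_idx = unit_index(0.40, 0.50)
print(round(gb_idx, 2), round(aib_idx, 2), round(gb_idx + aib_idx, 2), units_disagree(gb_idx, aib_idx))
# 1.2 0.8 2.0 True
```

A team like this one, average on the combined index but split between units, falls into the first case Colin describes: its infield edge is attributed to the batted-ball distribution rather than to the fielders.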


#3    Guy      2010/07/28 (Wed) @ 14:50

Nice work, Colin.  I didn’t completely follow how you are estimating your margin of error. You say:

“For example: In 2009, with a right-handed hitter batting, a shortstop will make a play on a ball in play roughly 12 percent of the time.... But the margin of error around our estimate of how often a shortstop will make any single play is about 30 percent.”

How did you determine the 30%?

*

In terms of improving your air/ground estimate, couldn’t you use aggregate batted ball data to do this? That is, there should be a predictable relationship between a team’s GB% and its number of air and ground outs (as you measure them). Including other objective data would help as well: HRs and triples (and perhaps the ratio of XBH:singles), as well as team DER. While we may not trust the GB% estimate for any particular team, or any particular SS, I can’t see why there should be any significant bias in your estimate of how each “objective” factor predicts the true GB%. (I wouldn’t make the same claim for the LD vs. OF distinction.)


#4    Colin Wyers      2010/07/28 (Wed) @ 15:08

The error on a per-play basis is figured as follows (expressed in pseudocode):

SQRT(AVERAGE(POWER(PlayMade - AvgPlayMade, 2)))

Where PlayMade is 1 if a play is made by the fielder in question, 0 if it isn’t. AvgPlayMade is the league average rate of a play being made, given the batter handedness and the team-level adjustment for GB/AiB outs.
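The corrected pseudocode translates directly to Python; the function name is illustrative. Suppose every chance carries the same league rate p and the fielder makes plays at exactly that rate. The result then reduces to sqrt(p(1 - p)). At the roughly 12 percent shortstop rate quoted in #3, that is about 0.32. This is in the neighborhood of the “about 30 percent” margin, though the thread does not confirm that the figure was reached this way.

```python
import math
from typing import Sequence


def per_play_error(play_made: Sequence[int], avg_play_made: Sequence[float]) -> float:
    """Root-mean-square gap between outcomes and league-average expectations.

    play_made[i] is 1 if the fielder made the play on chance i, else 0;
    avg_play_made[i] is the league-average probability of a play on that chance.
    Equivalent to SQRT(AVERAGE(POWER(PlayMade - AvgPlayMade, 2))).
    """
    if not play_made or len(play_made) != len(avg_play_made):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(made - p) ** 2 for made, p in zip(play_made, avg_play_made)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up sample: 100 chances at a constant 12% league rate, 12 plays made.
outcomes = [1] * 12 + [0] * 88
print(round(per_play_error(outcomes, [0.12] * 100), 3))  # 0.325
print(round(math.sqrt(0.12 * 0.88), 3))                  # 0.325, i.e. sqrt(p * (1 - p))
```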

Aggregated batted ball data may be useful in determining the team BIP split, yes. I want to finish off everything else before I examine it, for two reasons:

1) I need to use this for instances when I don’t have enough batted ball data, and so I still need to figure this stuff out regardless, and

2) Then we can look at how introducing the batted ball data changes the outputs, how it affects the margin of error calculations, etc. So we can then try to make a determination of whether or not the batted ball data is improving the system, and to what extent, using data rather than just supposition.

And this is still a work in progress - I don’t want to give the idea that I’m “done.” One of the reasons I published was to get input and hear from people about ways to improve the idea.


#5    Guy      2010/07/28 (Wed) @ 16:05

Colin, I wasn’t suggesting you actually use the batted ball data for any specific team. I’m suggesting you build a model that predicts a team’s GB% based on data that you have for all years (outs on ground, outs in air, HR, 1B, DER, etc.). Then use that model to make your air/ground adjustment for all teams across all seasons. (Presumably those relationships were the same in years before you have batted ball data.)

Does that make sense?
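Guy’s proposal amounts to a regression fitted on seasons that have batted-ball data and then applied to all seasons. To keep it short, the sketch below uses a single predictor (share of outs made on the ground) and ordinary least squares. The full model he describes would add HR, 1B, DER and so on as further predictors in a multiple regression. The data values are invented for illustration.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented team-seasons that have batted-ball data:
# x = share of outs made on the ground, y = scorer GB% (illustrative values only).
gb_out_share = [0.40, 0.50, 0.60]
scorer_gb_pct = [0.42, 0.47, 0.52]
slope, intercept = fit_line(gb_out_share, scorer_gb_pct)
print(round(slope, 3), round(intercept, 3))  # 0.5 0.22

# Apply the fitted equation to a season without batted-ball data.
print(round(intercept + slope * 0.55, 3))  # 0.495
```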


#6    Tangotiger      2010/07/28 (Wed) @ 16:25

Right, I agree with Guy.

It’s one thing to say that the batted ball data is biased in some form toward/against a specific pitcher or team.  But, if you use it to establish your equation to estimate this, that’s very useful.

For example, say you figure out that the number of GB/BIP = GBouts/BIPouts.  You go ahead and apply that to all teams, you figure out the fielder ratings, and then you find for teams with great infielders (Ozzie, Rolen, Beltre, Belanger, etc) this equation does not hold.  Well, this means there’s a bias with good fielders.  And so, you can’t just do GBouts/BIPouts.

Using the subjective data to establish the general equation would seem to me to be necessary to do.
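Tango’s example equation, GB/BIP = GBouts/BIPouts, only holds if grounders and air balls are turned into outs at the same rate. The sketch below uses illustrative out rates, not measured ones. It shows how a better-than-average infield pushes the naive estimate above the true groundball share. That is the kind of bias that checking the equation against batted-ball data would reveal.

```python
def naive_gb_share(true_gb_share: float, gb_out_rate: float, air_out_rate: float) -> float:
    """Value of GBouts/BIPouts given the true grounder share and the out rate for each type."""
    gb_outs = true_gb_share * gb_out_rate
    air_outs = (1.0 - true_gb_share) * air_out_rate
    return gb_outs / (gb_outs + air_outs)


# Illustrative rates only: equal conversion rates recover the true 45%...
print(round(naive_gb_share(0.45, 0.70, 0.70), 3))  # 0.45
# ...but a better infield (80% vs. 70%) makes the same staff look more groundball-heavy.
print(round(naive_gb_share(0.45, 0.80, 0.70), 3))  # 0.483
```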


#7    Guy      2010/07/28 (Wed) @ 17:11

This is probably just my confusion, but I’m still not following the margin of error estimate. We want to know the random variance in opportunities at a position (given the factors Colin controls for). That means we have to know the variance in team pitching staffs’ distribution of opportunities, plus incorporate random variance (but NOT take into account talent differences among fielders). Is that what Colin’s formula is doing?


#8    studes      2010/07/28 (Wed) @ 17:24

Guy, if I’m reading the article correctly, Colin is calculating the margin of error by calculating the percentage of estimated ground balls successfully fielded by all shortstops, adjusted for the handedness of the batter.


#9    Guy      2010/07/28 (Wed) @ 17:34

Studes: I think that’s right. But that only accounts for random variation. So that assumes every pitcher has the same true distribution (or at least, that the weighted average distribution of the pitchers a SS fields behind will always be the same). Take Furcal as an example: he has played behind pitchers in both Atl and LA who allow a lot of BIP to SS (according to Tango’s WOWY analysis). The chance of his opportunities occurring by random chance behind average pitchers is almost nonexistent, but behind his actual pitchers it is not surprising.


#10    Colin Wyers      2010/07/28 (Wed) @ 17:59

Well, no - we’re including two variables to figure out the batted ball distribution: “GB/FB” tendencies and batter handedness.

Let’s take Furcal. From 2000 to 2009, weighted by Furcal’s chances (that is to say, all BIP while Furcal was at shortstop), on average there were roughly .094 plays made by a shortstop per BIP. That gives us 3069 plays made, when multiplied by all of Furcal’s chances.

But that’s not what Furcal is being compared to. He’s being compared to 3267 plays made. In other words, we think that, due to the GB/AiB split and the handedness of the batters he’s faced, an average shortstop in his position would have made about 200 more plays than he would have with an ordinary distribution of batted balls.

So in the particular case of Furcal, at least given what you say about it (I don’t remember the exact figures), I don’t see a major dispute between WOWY and what I’m proposing.


#11    Colin Wyers      2010/07/28 (Wed) @ 18:11

As for the proposed changes to how to estimate the GB/AiB split, I don’t know if they offer much of an improvement, at least in the problem space we’re considering.

Let’s say that a team makes 50 more GB plays than average. And let’s say that on the whole, the team makes 25 more plays than average. (So by definition, that’s 25 fewer AiB plays than average.)

So do we assume that the team got 50 more GB? No. What we do is assign the plays above average to the infielders, so that now they’re 25 plays above average, and then refigure how many plays the average team would have made. So we’re still treating the infielders as above-average as a group.
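The arithmetic of this example can be sketched as follows. It covers only the reassignment of plays above average, not how Colin then refigures the league rates. The handling of a below-average team, which charges the whole deficit to the below-average unit, is an assumption mirroring the above-average case; the thread does not state it.

```python
from typing import Tuple


def shift_plays_above_average(gb_above: float, aib_above: float) -> Tuple[float, float]:
    """Reassign plays above average as (infield GB plays, air-ball plays).

    Only units on opposite sides of average are touched. The team's net plays
    above average go entirely to the unit on the same side as the team; an
    exactly average team leaves both units at zero.
    """
    if gb_above * aib_above >= 0:
        return gb_above, aib_above  # same side of average (or a unit exactly average)
    total = gb_above + aib_above
    if total == 0:
        return 0.0, 0.0
    gb_matches_team = (gb_above > 0) == (total > 0)
    return (total, 0.0) if gb_matches_team else (0.0, total)


print(shift_plays_above_average(50, -25))  # (25, 0.0)
```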

Yes, there is a potential problem, but it’s not simply when the infield is above average; it’s when the infield’s above-average play rate is offset by a below-average outfield.

Including data like XBH/H doesn’t help us here, does it? Because that still doesn’t tell us whether the extra air ball hits (which is presumably what we’re using extra base hits as a proxy for) are due to extra fly balls or due to poorer outfielders. So I don’t know if that solves the problem, or just adds an extra layer of complexity to it.

HR doesn’t have that bias, of course, but it’s still difficult to make work (I have looked at using HRs to adjust the team’s presumed AiB rate, and so far haven’t made it work, but that isn’t to say I won’t find a way to make it work.)


#12    Guy      2010/07/28 (Wed) @ 18:50

Colin/10: Fair point. Your GB/AiB adjustment will capture a lot of the variance among pitching staffs. You do get some idiosyncratic differences beyond the GB/FB tendency (e.g. Glavine pitching outside), but that probably isn’t a big deal for at least 95% of your fielders.

Colin/11: 
I don’t think the problem is necessarily limited to good OF/bad IF teams (and the reverse). Your team is +50 GB and -25 AiB, so you assume they are really +25 and 0. But how do you know that is in fact the true average for such teams? Won’t that depend on the respective variances of staff GB/FB tendency, IF skill, and OF skill? Couldn’t the average reality be +18/+7, or +30/-5? That’s what I think you want to figure out.

And I would think that XBH could help you figure it out.  Suppose you had two of your +50GB/-25AiB teams, but one was -23 1B / -2 XBH while the other was -2 1B / -23 XBH.  Wouldn’t you make different estimates of the GB/AiB distributions?  That said, maybe in reality the XBH/H ratios don’t vary enough to help. 

Too bad HRs aren’t more help.  I’d think you could at least add them to OF putouts as certain airballs.


#13    Guy      2010/07/28 (Wed) @ 19:13

BTW, I think you actually understate the gap between your metric (you need a name—how about “DRS?") and TZ.  TZ includes double plays turned.  If you look just at Rally’s range rating, this is the comparison for your leaders:
Colin / TZ range
Smith, Ozzie 322.1 / 214
Belanger, Mark 237.2 / 233
Sanchez, Rey 217.7 / 100
Russell, Bill 190.6 / 62
Valentin, Jose 177.4 / 32
Guillen, Ozzie 168.3 / 99
Templeton, Garry 150.3 / 30
Groat, Dick 144 / 42
Maxvill, Dal 139.3 / 47
Gagne, Greg 130.4 / 78

With the exception of Belanger, the difference varies between large and huge.
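Totaling the two columns of the table gives the aggregate ratio behind Guy’s later remark that, for these players, TZ is about .5 times Colin’s figure. The numbers are copied from the table. The per-player loop shows how much the ratio varies around that average.

```python
# Figures from Guy's table: (Colin's metric, TZ range) for each listed player.
leaders = {
    "Smith, Ozzie": (322.1, 214),
    "Belanger, Mark": (237.2, 233),
    "Sanchez, Rey": (217.7, 100),
    "Russell, Bill": (190.6, 62),
    "Valentin, Jose": (177.4, 32),
    "Guillen, Ozzie": (168.3, 99),
    "Templeton, Garry": (150.3, 30),
    "Groat, Dick": (144.0, 42),
    "Maxvill, Dal": (139.3, 47),
    "Gagne, Greg": (130.4, 78),
}

colin_total = sum(colin for colin, _ in leaders.values())
tz_total = sum(tz for _, tz in leaders.values())
print(round(colin_total, 1), tz_total, round(tz_total / colin_total, 2))  # 1877.3 937 0.5

# Per-player ratio of TZ range to Colin's figure.
for name, (colin, tz) in leaders.items():
    print(f"{name:18s} {tz / colin:.2f}")
```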


#14    Mike      2010/07/28 (Wed) @ 20:33

Guy/13, those numbers imply, if I’m looking at it correctly, a range bias of about 8 runs per season for that group of players, assuming we can attribute the difference to that cause.

your metric (you need a name—how about “DRS?")

How about Wyers Transparent Fielding?


#15    Mike      2010/07/28 (Wed) @ 20:43

It may be taboo to say this on this blog, but if I’m thinking it, surely others are, too.  If you have range bias of 8 runs per season, let’s just make a guess that park-scorer bias is of the same order, and we’ve previously determined that methodology (UZR vs. Plus/Minus) can make a difference of around 7 runs.  Then say that the random error is around 10 runs. 

If all those sources of error are independent, it adds up to about 17 runs of error (i.e., 1 stdev) on a season.

Over three seasons, it’s about 44 runs of total error, or almost 15 runs of error per season. With 6 seasons of data, you get it down to 14 runs of error per season.

That’s the problem with error due to persistent bias: it doesn’t let us bring our total error down very much by increasing the sample size.
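The estimate in this comment can be reproduced with a short sketch using the comment’s own per-season guesses: range bias 8 runs, park-scorer bias 8 runs, methodology 7 runs, and random error 10 runs. The three biases are treated as persistent, so they grow linearly with the number of seasons. The random error is treated as independent across seasons, so it grows with the square root of the number of seasons. All components are assumed independent of one another and combined in quadrature.

```python
import math
from typing import Sequence


def total_error(seasons: int, bias_sds: Sequence[float] = (8.0, 8.0, 7.0), random_sd: float = 10.0) -> float:
    """Total 1-SD error (runs) accumulated over `seasons` seasons.

    Persistent biases repeat every season, so they scale with `seasons`;
    random error scales with sqrt(seasons). Components are added in quadrature.
    """
    parts = [b * seasons for b in bias_sds] + [random_sd * math.sqrt(seasons)]
    return math.sqrt(sum(p * p for p in parts))


for n in (1, 3, 6):
    tot = total_error(n)
    print(n, round(tot, 1), round(tot / n, 1))
# 1 16.6 16.6
# 3 43.5 14.5
# 6 83.5 13.9
```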

I recognize that these are all guesses, some more educated than others, but aren’t we all interested in what sort of total error we might have in our fielding estimates?


#16    Mike      2010/07/28 (Wed) @ 20:54

Re “It may be taboo to say this on this blog”, that came out sounding very different than I intended when I “wrote” it in my head, so please scratch that sentence.


#17    Guy      2010/07/28 (Wed) @ 22:25

Mike/14:  I don’t think we can say the entire difference from TZ represents a “bias.” We’d expect highly-rated fielders to have had an above-average number of opportunities, even after controlling for GB/FB and pitcher handedness.  And the reverse for low-rated players.  So the question is whether TZ or other metrics are regressing too much.  It’s a great question, but I’m not sure how you’d determine that. 

You describe a bias of X runs per season, but I’d expect it to be proportional to a player’s skill.  If these players are typical (which they probably aren’t), then TZ = .5 * Colin.  Just intuitively, it seems unlikely the distribution of opportunities could be that different over an entire career, taking Colin’s controls into account.  But intuition ain’t proof, of course....


#18    Mike      2010/07/28 (Wed) @ 22:54

Guy/17, yes.  The bias does not affect all players the same.  I think I mentioned that elsewhere.  I’ve discussed this article at four different places today and sometimes I grow tired of hearing myself talk.  That may be hard to believe, but it’s true.

And I certainly don’t intend my estimates as exact.  I’m just trying to make a WAG because that interests me.  We don’t have enough information yet to make a real estimate.

That’s kinda what I meant with my “taboo” statement.  I said to scratch it because I realized it sounded like I was saying Tango or MGL discourage discussion, which wasn’t what I meant at all. 

I was saying that I was curious about what the total effect might look like, but since we don’t have good evidence to make the calculation carefully, nobody was making a guess (out loud, at least).  But I was definitely interested in making that guess, and I thought surely other people must be, too.  The “taboo” part was that summary conclusions without evidence are (rightfully) considered BS on this blog.  My “evidence” for my calculations is sketchy enough that I’d be skirting that line, for sure.


#19    dq      2010/07/29 (Thu) @ 00:15

Don’t extra-base hits help in that you penalize a team for giving them up? And whatever portion you attribute to the defense all goes to the outfield?

If a team gives up 10 more doubles than average wouldn’t you charge the outfield the value of those doubles?


#20    joe arthur      2010/07/29 (Thu) @ 09:29

I’m going to take this on a different track ...

Based on his single year chart for shortstops, it appears that Colin “throws away” infield air outs. Also I don’t see an explanation for the “plays the team should have made” part of the analysis.  If you are going to be fully agnostic about batted ball types, and you discard some of the balls fielded, then Colin can’t just be using unadjusted team or league DER to establish the “plays should have made” part.

Colin also says that he makes an adjustment for “ground ball rate” based on a comparison of team ground ball plays made to air ball plays made, “while keeping the total number of plays we think the team should have made constant.” How does keeping the total number of plays made constant work? If I regain some belief in batted ball types with a large sample size, 1987-2009 Retrosheet data show DER on ground balls of >76% (and 77% on bunts), but about 61% on “air balls” not including popups. In short, my expectation is that “team expected plays” should go up if ground ball rate goes up.

I think Colin’s on an interesting line of thought, but the expected plays made part of it needs to be more transparent, because minimalist assumptions about it are the whole point of what he’s trying to do ...


#21          2010/07/29 (Thu) @ 10:39

Joe/20, not to disagree with your point about wanting more details from Colin about the adjustments, but shouldn’t “air balls” include popups and home runs?


#22    Colin Wyers      2010/07/29 (Thu) @ 11:36

I’m not discarding popups, I’m simply not including them in ground ball plays. Popups (and liners caught by infielders, for that matter) are separated as best I can from infield plays on grounders. Those plays are then included in a team’s air ball plays made.

So when we talk about a team’s plays made on GB versus AiB, we need to include popups as well (which are almost all outs, and so that should significantly close the gap that you’re reporting on DER for batted ball types.)

The other thing to note about looking at DER instead of plays made - an infield error is going to count as a “play made” by DER, assuming you’re using 1-BABIP. So how you handle the error in DER is important (IF makes more errors, all else equal).


#23    Rally      2010/07/29 (Thu) @ 13:52

DER has generally not been 1-BABIP; it counts errors as plays not made. This is true for Bill James’ original calculations of it and what Sean Forman does on BBref. I certainly can’t speak for every version of DER though.


#24    Tangotiger      2010/07/29 (Thu) @ 14:01

It should go without saying that DER counts only BIP that led to actual outs.  If there’s any implementation out there that uses the strong form of DER (1-BABIP, where an error = out), please point it out to me.  I’ll do my best to put a stop to that nonsense.

Also, SF must be in the denominator of DER.
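Here is a minimal sketch of DER under the rules stated in #23 and #24. Errors count as plays not made, home runs are excluded, and sacrifice flies sit in the denominator (and count as outs). This is not necessarily the exact formula used by Bill James or Baseball-Reference. Sacrifice bunts are ignored for simplicity, and the team totals in the example are made up. The “strong form” (1 - BABIP, where an error counts as an out) is included only for contrast.

```python
def der(ab: int, h: int, hr: int, so: int, sf: int, roe: int) -> float:
    """Defensive efficiency: outs recorded on balls in play / balls in play.

    Balls in play = AB - SO - HR + SF, so sacrifice flies are in the denominator.
    Outs on balls in play = BIP - (H - HR) - ROE: non-HR hits and times reached
    on error are both treated as plays not made.
    """
    bip = ab - so - hr + sf
    outs_on_bip = bip - (h - hr) - roe
    return outs_on_bip / bip


def one_minus_babip(ab: int, h: int, hr: int, so: int, sf: int) -> float:
    """The 'strong form' criticized in #24: errors silently count as outs."""
    bip = ab - so - hr + sf
    return 1 - (h - hr) / bip


# Made-up team totals: 5500 AB, 1400 H, 150 HR, 1100 SO, 40 SF, 50 ROE.
print(round(der(5500, 1400, 150, 1100, 40, 50), 3))        # 0.697
print(round(one_minus_babip(5500, 1400, 150, 1100, 40), 3))  # 0.709
```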


#25    Chris Dial      2010/07/29 (Thu) @ 14:22

BTW, I think you actually understate the gap between your metric (you need a name—how about “DRS?")

High five!


#26    Tangotiger      2010/07/29 (Thu) @ 14:43

I think you guys should do what we do with WAR. Once you agree on the framework (WAR), you have different implementations of it: fWAR (Fangraphs), rWAR (Rally), tWAR (Tango). It’s clear that we are all talking about the same thing, and we follow the same basic framework. Given the way BPro has shifted its WARP calculation, I would likely include theirs as well now (pWAR really, but they’ve got the p on the suffix and uppercase).

So, if you have a certain framework, like the various Zone-based metrics, you make sure the Z is in there, and then others go off that.  Total Zone, Big Zone, Simple Zone, Ultimate Zone, etc.


#27    joe arthur      2010/07/29 (Thu) @ 17:47

Colin/22, your published chart in your article at BP for top shortstop seasons appears to exclude air outs; otherwise, wouldn’t plays made reconcile pretty well to all balls fielded for outs, regardless of “real” batted ball type? For example, the chart has Ozzie Guillen with 515 plays made in 1988, but he actually fielded 595 balls for outs: 36 line drives, 47 popups and 512 GB, according to Retrosheet’s hit typing that year.

I didn’t count errors in the DER I presented, though I did fail to subtract HR ...
Revising to include pop ups in air balls and leave out HR, DER for ground balls was .767 and for air balls was .695
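Joe’s point that expected plays should rise with the groundball rate follows from these two DER figures. The sketch assumes every ball in play is either a grounder or an air ball, and it uses a hypothetical 4000 balls in play. Under those assumptions, a five-point rise in groundball share adds 4000 × 0.05 × (0.767 - 0.695) = 14.4 expected outs.

```python
def expected_outs(bip: float, gb_share: float, der_gb: float = 0.767, der_air: float = 0.695) -> float:
    """Expected outs on balls in play, given the share that are grounders.

    Default DERs are the 1987-2009 Retrosheet figures quoted above
    (popups counted as air balls, home runs excluded).
    """
    return bip * (gb_share * der_gb + (1.0 - gb_share) * der_air)


bip = 4000  # hypothetical team-season count of balls in play
extra = expected_outs(bip, 0.50) - expected_outs(bip, 0.45)
print(round(extra, 1))  # 14.4
```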

