Wednesday, July 28, 2010
Reducing bias in fielding metrics
Colin tries to stick to factual information only, and to infer what could have happened from it. He sets out his constraints:
So let’s take a different approach. Let’s try to design a fielding metric with no bias—or, at least, attempt to minimize the effect of bias. What we can do is:
1. Restrict ourselves to looking at only factual data—data we can validate objectively. That means no batted-ball data, no hit location data, etc.
2. For estimating the amount of plays an average player at that position would have made, ignore data about the outcomes of batted balls whenever possible.
3. Err on the side of caution when deciding whether or not to adjust—in other words, make as few adjustments as possible. We can allow the data to be expressive by getting the metric out of its way whenever we can.
As for the bolded part, he says this:
So this gives us, at the team level, outs on the ground versus outs in the air. And what we see is a strong negative relationship between ground plays and air plays, with a correlation of -0.77. So when a team makes a lot of ground-ball plays, the most likely explanation is that they saw a lot of ground balls.
So, let’s adjust for that. What we can do is look at how many plays a team made in total, compared to the average team, and then look at how many ground-ball plays a team made compared to how many air-ball plays they made. A team with superior ground-ball fielders will not only have more ground-ball plays but likely more plays made overall.
So for a team that’s above-average on making ground-ball plays but below-average in making total plays, we “shift” the responsibility toward the ground-ball plays (in other words, inflate the amount of ground-ball plays we think the team should have made, but deflate the amount of air-ball plays we think the team should have made), while keeping the total number of plays we think the team should have made constant.
This is, for lack of a better term, our “ground-ball rate” adjustment. It’s a bit of a misnomer, because we ignore any scorer data on the number of ground balls a defense saw. And it is possible that including that scorer data could improve the process here as well. But for now, let’s err on the side of excluding that data.
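Colin doesn't give a formula for the shift in the passage above, so what follows is only a minimal sketch of the general idea, under my own assumptions: every team had the same number of opportunities, the expected total is held at the league-average level, and the expected ground/air split is moved part of the way toward the team's observed split. The function name `shifted_expectation` and the `weight` parameter are hypothetical, not Colin's.

```python
def shifted_expectation(team_ground, team_air, lg_ground, lg_air, weight=0.5):
    """Hypothetical sketch of a 'shift' adjustment (not Colin's actual formula).

    The expected total stays at the league-average level (assumes every team
    had the same number of opportunities); only the ground/air split moves
    part of the way toward the team's own observed split. `weight` is an
    illustrative parameter chosen for this example.
    """
    expected_total = lg_ground + lg_air
    lg_share = lg_ground / expected_total
    team_share = team_ground / (team_ground + team_air)
    # Move the expected ground-ball share toward what the team actually did.
    share = lg_share + weight * (team_share - lg_share)
    exp_ground = expected_total * share
    exp_air = expected_total - exp_ground  # total expected plays held constant
    return exp_ground, exp_air


# Made-up team: above average on ground-ball plays, below average overall.
exp_g, exp_a = shifted_expectation(2000, 1900, lg_ground=1900, lg_air=2100)
print(round(2000 - exp_g), round(1900 - exp_a))  # 24 -124
```

Against the unadjusted league baseline, this made-up team would be +100 on ground-ball plays and -200 on air-ball plays. The shift moves responsibility toward the ground-ball side, while the team's total stays at -100 plays either way.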
So, he’s introducing a bias here. He’s inferring something about groundball frequency from the groundball outs made. Two things lead to more groundball outs: more groundballs and better infielders. It’s not clear how Colin is separating the two. He could pretty much split the difference and presume that if a team turns a lot of balls in play (BIP) into his estimate of ground outs, it’s a combination of the two. He could perhaps figure that out iteratively: say he first presumes that it’s all the pitchers, and then he gets a lot of pluses for the infielders; so he does a second pass, which might give something different, then a third pass, and so on, until he gets to a point where it stabilizes.
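One way to do that kind of iterative split is backfitting a two-way additive model: groundout rate = overall mean + pitching-staff effect + infield effect, alternating passes until the estimates stop changing. This is my illustration, not something Colin describes. The labels `staff` and `infield`, the function `backfit`, and the toy rates are all hypothetical; the toy rates are built from known effects only to show that the passes recover them. It needs staffs and infields that appear in more than one combination, otherwise the two effects can't be separated.

```python
def backfit(obs, max_iter=500, tol=1e-12):
    """Fit rate ~ mu + staff_effect + infield_effect by alternating passes.

    obs: list of (staff, infield, rate) tuples (hypothetical labels).
    Returns (mu, staff_effects, infield_effects).
    """
    mu = sum(rate for _, _, rate in obs) / len(obs)
    staff = {s: 0.0 for s, _, _ in obs}
    infield = {f: 0.0 for _, f, _ in obs}

    for _ in range(max_iter):
        # Pass A: given the current infield credit, give the rest to the staffs.
        # On the first pass infield credit is zero: "presume it's all pitchers".
        new_staff = {}
        for s in staff:
            res = [r - mu - infield[f] for ss, f, r in obs if ss == s]
            new_staff[s] = sum(res) / len(res)
        # Pass B: given the updated staff effects, give the rest to the infields.
        new_infield = {}
        for f in infield:
            res = [r - mu - new_staff[s] for s, ff, r in obs if ff == f]
            new_infield[f] = sum(res) / len(res)
        change = max(
            max(abs(new_staff[s] - staff[s]) for s in staff),
            max(abs(new_infield[f] - infield[f]) for f in infield),
        )
        staff, infield = new_staff, new_infield
        if change < tol:  # estimates have stabilized
            break
    return mu, staff, infield


# Toy data generated from known (made-up) effects, not real teams.
true_staff = {"A": 0.02, "B": 0.0, "C": -0.02}
true_infield = {"X": 0.01, "Y": 0.0, "Z": -0.01}
pairs = [("A", "X"), ("A", "Y"), ("B", "Y"), ("B", "Z"), ("C", "Z"), ("C", "X")]
obs = [(s, f, 0.30 + true_staff[s] + true_infield[f]) for s, f in pairs]

mu, staff, infield = backfit(obs)
print(round(staff["A"] - staff["C"], 4))  # 0.04
```

Only the differences between staffs (and between infields) are pinned down; a constant can slide from one side to the other, which is why the example compares staff A to staff C rather than reading either number alone.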
Anyway, he does a cool thing, and that’s to show the margin of error. For his SS, the margin of error is 15 runs. I presume this means 2 SD = 15 runs, so that 1 SD = 7.5 runs?
I prefer a margin of error where 1 SD = 3 to 5 runs. The margin of error decreases the more data you give it; the basic rule is that it shrinks in proportion to the square root of the sample size. So, if 1 season gives you 1 SD = 7.5, then 2 seasons will give you 1 SD = 5.3 (7.5/√2) and 3 seasons will give you 1 SD = 4.3 (7.5/√3). And that pretty much goes with what I’ve been saying in the past: you need to look at 2 seasons for SS and 4 seasons for 1B (basically 3 seasons if you need one number).
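The square-root rule is easy to check. A minimal sketch, assuming each season has roughly equal, independent playing time and reading the 15-run margin of error as 2 SD; the function names are mine:

```python
import math


def sd_after(seasons, one_season_sd):
    """Random-error SD after pooling `seasons` seasons.

    Assumes roughly equal, independent playing time each season,
    so the SD shrinks with 1 / sqrt(seasons).
    """
    return one_season_sd / math.sqrt(seasons)


def seasons_needed(one_season_sd, target_sd):
    """Smallest whole number of seasons whose pooled SD is at or below target."""
    return math.ceil((one_season_sd / target_sd) ** 2)


one_season_sd = 15 / 2  # reading the 15-run margin of error as 2 SD
for n in (1, 2, 3, 4, 6):
    print(n, round(sd_after(n, one_season_sd), 1))

print(seasons_needed(one_season_sd, 5))  # 3
print(seasons_needed(one_season_sd, 3))  # 7
```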
Anyway, I like Colin’s overall presentation, and the constraint to stick to factual data. His numbers come out on a scale similar to what WOWY gives. For the same reason that I would not use single-year WOWY, I wouldn’t use a single year of this metric either: you just get way too many out-there results. But once you get to three years, you start to listen, and by six years or so, whatever advantage the other metrics get from their extra subjective information in lowering the uncertainty will pretty much wash away, leaving us with potential biases in those metrics and little bias in the factual metrics.
That is, at say 6+ years, it becomes a battle of bias vs. uncertainty, which is Colin’s main point in the article. There’s plenty of room for everyone at the table.
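The bias vs. uncertainty trade-off can be put in numbers with the usual decomposition: total squared error = bias² + random variance, where only the variance shrinks as you add years. In this sketch the factual metric uses the 7.5-run single-season SD from above with no bias; the other metric's 5-run SD and 2.5-run persistent bias are made-up numbers for illustration only, and the two error sources are assumed independent.

```python
def total_error_sq(years, one_year_sd, bias):
    """Mean squared error = bias^2 + variance, with variance shrinking as 1/years.

    Assumes the bias persists no matter how many years are pooled.
    """
    return bias ** 2 + one_year_sd ** 2 / years


def crossover_year(factual_sd, subjective_sd, subjective_bias, max_years=20):
    """First year count at which the unbiased-but-noisier metric has lower total error."""
    for years in range(1, max_years + 1):
        factual = total_error_sq(years, factual_sd, 0.0)
        subjective = total_error_sq(years, subjective_sd, subjective_bias)
        if factual < subjective:
            return years
    return None


# Hypothetical inputs: only the 7.5-run figure comes from the discussion above.
print(crossover_year(7.5, 5.0, 2.5))  # 6
```

With these assumed inputs, the two metrics tie at five years and the factual one pulls ahead at six; a smaller bias pushes the crossover later, a larger one pulls it earlier.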


It seems to me that the potential bias is in the quality of the outfielders or perhaps the size of the outfield, rather than in the quality of the infielders, if I understand Colin’s method correctly, which I may not.
He’s adjusting based on this assumption:
Which should be true, I think, unless the outfield biases the results.