Monday, May 23, 2011
New fielding runs for BPro
Steven quotes Colin:
This version of FRAA avoids the pitfall of subjectivity inherent in zone-based ratings. “In contrast to other popular metrics, FRAA does not use any stringer-recorded observational data,” Colin explains. “Serious discrepancies have been noted between data providers, and research has shown that in larger samples use of that sort of batted-ball data introduces severe distortions in the metrics that impede accuracy. Without evidence that the batted-ball data has redeeming value in the short term, it seems imprudent to use that sort of data in our evaluation of player defense.”
First, I think FRAA does use stringer data, since I believe Colin uses which fielder (is marked as the one who) picked up the base hit. Perhaps that was in a previous version, and Colin has since decided to go with 100% factual data (which is what my WOWY also does).
The rest of the statement is as much a philosophical point as anything. For example, with WOWY, I intentionally limit myself to factual data, not so much because I believe the subjective data is without value, but because I don’t want to include the uncertainty level of subjective data. In that respect, it’s a bit like FIP, which intentionally focuses on non-BIP events. Colin, however, goes one step further and argues that the default position should be to reject subjective data until it has been proven to add value.
Now, he’s asking about evidence, and it’s a good enough point. One way to find that is to run a correlation of UZR and FRAA (and DRA and PMR and whatever else) against next year’s (unadjusted) outs per BIP. That’ll tell us which stat does better.
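As a rough illustration of that test, here is a minimal sketch in Python. The function names (`pearson`, `rank_metrics`) and the sample numbers are made up for the example; real inputs would be each fielder's metric in year N lined up with his unadjusted outs per BIP in year N+1.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

def rank_metrics(metrics, next_year_outs_per_bip):
    """Return (metric name, r) pairs, highest correlation first."""
    scored = [(name, pearson(vals, next_year_outs_per_bip))
              for name, vals in metrics.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented numbers, for shape only: one entry per fielder.
metrics = {
    "UZR":  [8.0, -3.0, 1.5, -6.0],
    "FRAA": [5.0, 2.0, -1.0, -4.0],
}
next_year = [0.712, 0.695, 0.701, 0.688]
for name, r in rank_metrics(metrics, next_year):
    print(name, round(r, 3))
```

A higher r only says the metric tracks next year's unadjusted number more closely in that sample; it says nothing by itself about why.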
You do need to be careful here though. Let me take a clearer example, based on pitching, to make my analogy. Let’s say you have FIP and you have SIERA. FIP includes a pitcher’s HR rate as a “skill” component, while SIERA ignores it altogether. And then we run a correlation of FIP and SIERA against next year’s RA9 (runs allowed per 9 innings). Well, if none of the pitchers changes teams, and all your parks are either Petco or Coors, then a pitcher who gives up lots of HR at Coors will continue to give up lots of HR at Coors. And so, his FIP will correlate strongly with next year’s RA9. SIERA, on the other hand, won’t know that the pitcher is pitching at Coors.
So, if UZR does a great job of removing park and pitcher bias, but then you have a bunch of fielders who have the same pitchers and parks, then you are comparing a context-adjusted metric (UZR) against a non-context-adjusted metric (outs per BIP). The more you adjust, the less your metric correlates with next year’s stat! (To a point.)
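A toy calculation makes the trap concrete. Everything below is invented: six fielders, a "true skill" number, a park/pitcher context number, and no random noise at all. The "adjusted" metric is an idealized one that recovers skill perfectly; it stands in for no real stat.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Invented values, in thousandths of outs per BIP relative to average.
skill    = [10, -10, 5, -5, 0, 0]   # true fielding talent
old_park = [-8, 8, 8, -8, 6, -6]    # park/pitcher context in year 1
new_park = [4, 4, -6, -6, 2, 2]     # context after a team change, unrelated to old_park

adjusted = skill                                   # idealized context-adjusted metric
raw = [s + p for s, p in zip(skill, old_park)]     # unadjusted year-1 outs per BIP

# Next year's unadjusted outs per BIP under each scenario.
next_if_stay = [s + p for s, p in zip(skill, old_park)]
next_if_move = [s + p for s, p in zip(skill, new_park)]

results = {
    "stay": {"adjusted": pearson(adjusted, next_if_stay),
             "raw": pearson(raw, next_if_stay)},
    "move": {"adjusted": pearson(adjusted, next_if_move),
             "raw": pearson(raw, next_if_move)},
}
for scenario, rs in results.items():
    print(scenario, {k: round(v, 3) for k, v in rs.items()})
```

In this toy, the unadjusted number wins when everyone stays put, because the context it carries shows up again next year, and the adjusted number wins once everyone moves. The sizes of the gaps mean nothing; they depend entirely on the made-up inputs.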
One way around this is to only look at players that switch teams. This way, you increase the uncertainty, but at least you won’t have a bias.
That’s what I would recommend: look at, say, all SS from 1993-2009 who switched teams in the following year, and then run a correlation of their UZR and FRAA to the next year’s outs per BIP. Repeat this for 3B, 2B, OF, and let’s see what we get.
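Here is one way the study could be set up, as a sketch only. The record layout (`player`, `year`, `team`, `pos`, `outs_per_bip`, plus one key per metric) is hypothetical, and the sketch assumes one team per player-season, so mid-season trades would need their own handling.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def switcher_pairs(seasons, position, first_year=1993, last_year=2009):
    """Pair year N with year N+1 for players at `position` who changed teams."""
    by_key = {(s["player"], s["year"]): s for s in seasons if s["pos"] == position}
    pairs = []
    for (player, year), cur in by_key.items():
        if not (first_year <= year <= last_year):
            continue
        nxt = by_key.get((player, year + 1))
        if nxt is not None and nxt["team"] != cur["team"]:
            pairs.append((cur, nxt))
    return pairs

def correlate_switchers(seasons, position, metric, first_year=1993, last_year=2009):
    """Return (sample size, r) of year-N metric vs year-N+1 unadjusted outs per BIP."""
    pairs = switcher_pairs(seasons, position, first_year, last_year)
    xs = [cur[metric] for cur, _ in pairs]
    ys = [nxt["outs_per_bip"] for _, nxt in pairs]
    return len(pairs), pearson(xs, ys)
```

Calling `correlate_switchers(seasons, "SS", "uzr")` and then the same with `"fraa"` would give the two numbers to compare at shortstop; the sample size is returned alongside r because restricting to switchers shrinks the sample and widens the uncertainty.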
I think this would move the discussion forward from a theoretical objection to a practical one.

