Monday, May 21, 2007
The fielding system approach I’ve been preaching
http://stat.wharton.upenn.edu/~stjensen/research/safe.html
Whereas UZR and other systems use discrete zones, this approach uses a continuous function. It doesn’t look like they use as many parameters as MGL’s UZR (park, GB/FB tendency, base/out, etc.), but nonetheless they’ve got the basic framework down. I’m also not sure how they convert plays to runs.
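To make the discrete-versus-continuous distinction concrete, here is a rough Python sketch. It is not the linked system's actual model, which I haven't seen spelled out; the function names, the one-dimensional "distance from the fielder" input, and the logistic form with its `intercept` and `slope` parameters are all assumptions for illustration only.

```python
import math

def zone_out_rate(plays, zone_width):
    """Discrete approach: bucket balls by distance, then take outs / chances per bucket.

    plays is an iterable of (distance, caught) pairs (hypothetical input format).
    """
    totals = {}
    for distance, caught in plays:
        zone = int(distance // zone_width)
        outs, chances = totals.get(zone, (0, 0))
        totals[zone] = (outs + int(caught), chances + 1)
    return {zone: outs / chances for zone, (outs, chances) in totals.items()}

def continuous_out_prob(distance, intercept, slope):
    """Continuous approach: a smooth curve in distance (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(-(intercept - slope * distance)))
```

With zones, a ball 1 foot inside a zone boundary and a ball 1 foot outside it get different out rates, while two balls at opposite ends of the same zone get the same one. The smooth curve gives every ball its own probability, which is where the short-run accuracy gain would come from.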
(Hat tip: David A.)
The methodology is brilliant, of course. There are a few things they need to do to make the results more accurate, especially in the short run. Remember that the main point of making things more granular, as this methodology does compared to the “discrete function” ones, is to increase accuracy in the short run. They definitely need to separate the fly balls in play into at least line drives and non-line drives, if not line drives, fly balls and pop-ups (and fliners?), as all of these obviously have very different probability functions.
As well, they need to incorporate other parameters, as Tango mentions, like runners, outs, handedness of batters and/or pitchers, and certainly park effects. For example, the numbers for Manny cannot be taken seriously without a park adjustment.
Now, most of these additional parameters will even out in the long run, but some of them will not. Players who stay on the same team tend to play behind a pitching staff that remains fairly constant, so the handedness of pitchers/batters may be biased across a player’s 4-year career data. And obviously park effects don’t even out over the long run.
But, as Tango says, the basic methodology is brilliant and correct.
I would also like to see them estimate each fielder’s average positioning on the field as compared to the average fielder, which they should be able to do from the data. For example, does Jeter in fact cheat up the middle? Does Edmonds play shallow? Not that a fielder’s positioning should not be included in his “skill set,” but it would be nice to know it anyway.
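A crude way to sketch that positioning estimate in Python, purely as an illustration: bin the balls by angle, compute the out rate in each bin, and take the out-rate-weighted average angle as a stand-in for where the fielder's coverage is centered. The `(angle, caught)` input format, the function names, and the 5-degree bin width are my own assumptions, not anything taken from the linked research.

```python
def estimated_position(plays, bin_width=5.0):
    """Out-rate-weighted mean angle, a rough proxy for where a fielder sets up.

    plays is an iterable of (angle_degrees, caught) pairs (hypothetical format).
    Returns None if the fielder converted no balls at all.
    """
    bins = {}
    for angle, caught in plays:
        key = int(angle // bin_width)
        outs, chances = bins.get(key, (0, 0))
        bins[key] = (outs + int(caught), chances + 1)

    weighted = 0.0
    total = 0.0
    for key, (outs, chances) in bins.items():
        rate = outs / chances
        center = (key + 0.5) * bin_width  # midpoint of the angle bin
        weighted += rate * center
        total += rate
    return weighted / total if total > 0 else None

def position_shift(player_plays, league_plays, bin_width=5.0):
    """Player's estimated center minus the league's; the sign gives the direction."""
    return (estimated_position(player_plays, bin_width)
            - estimated_position(league_plays, bin_width))
```

Weighting by out rate rather than by raw catches keeps the estimate from simply following where balls happen to be hit. It still mixes positioning with range to each side, so a nonzero shift is a hint rather than proof that a fielder cheats one way.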