Wednesday, March 21, 2007
Creating a Fielding System
Joe Arthur reports interesting data on Manny Ramirez:
Extending what Misterdirt did with retrosheet and including 2006 I get 4 year totals by hit type of:
Fenway
F Manny 363/554 .655 (312 TB allowed; 1.63 per hit) all others 645/948 .680 (501 TB allowed, 1.65 per hit)
L Manny 42/334 .126 (404 TB allowed 1.38 per hit) all others 70/646 .108 (780 TB allowed 1.35 per hit)
P Manny 0/0 all others 1/1 1.000 (0 TB allowed)
Red Sox Away Games
F Manny 370/445 .831 (115 TB allowed 1.53 per hit), all others 774/899 .861 (194 TB allowed, 1.55 per hit)
L Manny 47/365 .129 (414 TB allowed 1.30 per hit), all others 106/694 .153 (747 TB allowed, 1.27 per hit)
P none
If you assumed that the specific difficulty of the batted balls evened out, that these other left fielders (Red Sox opponents and Red Sox backups for Manny) were collectively average, and that Retrosheet hit types were consistently judged and recorded without errors, then you could conclude from all this that Manny was -14 plays on flies at Fenway and +6 on line drives at Fenway, and -13 plays on flies in away games and -9 on line drives, in his actual opportunities. There is no sign that he is better at preventing extra-base hits at Fenway, and no sign that he is worse at preventing them away from Fenway. His total bases allowed per hit seem completely comparable in both.
Cumulatively over 4 years, he would be -8 plays at Fenway and -22 plays away. Accounting for the hit value of those missing plays, that would be about -10 runs and -23 runs. So that’s about -8 runs per year (actual) in an average of 1104 defensive innings. Casting that into the 150G = 1350 inning basis quoted for UZR, you’d get -10 runs a year (in terms of performance, not talent).
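Joe’s plays-above-average figures can be reproduced from the counts he quotes. Here is a minimal sketch, assuming that "plays" means Manny’s outs minus the outs the other left fielders’ rate would have produced in Manny’s own opportunities. The function and variable names are mine, not Joe’s, and the run totals are copied from his text rather than derived.

```python
# (outs made, opportunities) from the four-year totals quoted above.
# Keys are (venue, hit type); F = fly ball, L = line drive.
DATA = {
    ("Fenway", "F"): {"manny": (363, 554), "others": (645, 948)},
    ("Fenway", "L"): {"manny": (42, 334), "others": (70, 646)},
    ("Away", "F"): {"manny": (370, 445), "others": (774, 899)},
    ("Away", "L"): {"manny": (47, 365), "others": (106, 694)},
}


def plays_vs_baseline(manny, others):
    """Manny's outs minus the outs expected at the other LFs' rate,
    applied to Manny's own number of opportunities."""
    made, chances = manny
    base_made, base_chances = others
    baseline_rate = base_made / base_chances
    return made - baseline_rate * chances


results = {key: plays_vs_baseline(**row) for key, row in DATA.items()}
for (venue, hit_type), diff in results.items():
    print(f"{venue:6s} {hit_type}: {diff:+.1f} plays")

fenway_total = results[("Fenway", "F")] + results[("Fenway", "L")]
away_total = results[("Away", "F")] + results[("Away", "L")]
print(f"Fenway total: {fenway_total:+.1f}, away total: {away_total:+.1f}")

# Rescale the quoted run totals (-10 at Fenway, -23 away, over 4 years)
# from 1104 innings per year to the 1350-inning (150 G) basis.
runs_per_year = (-10 + -23) / 4
runs_per_150g = runs_per_year * 1350 / 1104
print(f"{runs_per_year:+.2f} runs/year, {runs_per_150g:+.1f} per 1350 innings")
```

Rounded, this gives the -14, +6, -13 and -9 quoted above, and the -8 and -22 totals.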
In short, Manny is about .025 to .030 outs per play worse than the average LF on fly balls (whether in Fenway or not), and his line-drive advantage at Fenway is offset by a similar line-drive deficit away from Fenway.
There are about 350 FB plays in LF per season, so Manny is about 10 outs per 162 GP worse than the average LF, which would be around 8 or 9 runs.
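The per-season arithmetic is short enough to spell out. This is a sketch only: the 350 plays and the .025 to .030 gap come from the text, while `RUNS_PER_OUT` is an assumption I back-solved from "10 outs is around 8 or 9 runs", not a figure stated anywhere above.

```python
FB_PLAYS_PER_SEASON = 350      # from the text: FB plays in LF per 162 GP
RATE_GAP_LOW = 0.025           # outs per play below average (low end)
RATE_GAP_HIGH = 0.030          # outs per play below average (high end)
RUNS_PER_OUT = 0.85            # ASSUMPTION: implied by "10 outs ~ 8 or 9 runs"

outs_low = FB_PLAYS_PER_SEASON * RATE_GAP_LOW
outs_high = FB_PLAYS_PER_SEASON * RATE_GAP_HIGH
runs_for_10_outs = 10 * RUNS_PER_OUT

print(f"outs below average per 162 GP: {outs_low:.1f} to {outs_high:.1f}")
print(f"runs for 10 missed outs: about {runs_for_10_outs:.1f}")
```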
***
In post #102, MGL in the same thread has Manny as being -20 runs, just on the road. If we focus only on road performance, Manny didn’t do well on LD. Joe’s data from above shows that he’s -.03 on FB and -.025 on LD. Work it out, and that puts Manny at -17 outs per 162 GP, or about -14 or -15 runs.
Pretty much, we see that Joe’s results and MGL’s results match.
What’s in question is Fenway. I do not agree with MGL’s method of park adjustment. In post 104 he says:
Presently, I use one single number to adjust ALL of a player’s stats in each park, regardless of the zone it was hit in. For example, in Fenway, I think the LF park adjustment is 81, meaning that LF’ers catch 81% of the total balls caught in all parks in the AL. So if Manny catches 60% in Zone A in Fenway, he gets credit for catching 60% divided by .81. Same in zone B, etc. That is not a great way to do it of course, but neither would having an adjustment factor for each of the dozens of zones in LF.
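To make the single-factor adjustment concrete, here is a small sketch of what MGL describes: one park factor divides the raw catch rate in every zone. The function name is hypothetical, and the zone B rate is invented for illustration. Only the .81 factor and the 60% zone A rate come from his comment.

```python
LF_FENWAY_FACTOR = 0.81  # from the quote: one number for all of LF at Fenway


def park_adjusted_rate(raw_rate, park_factor=LF_FENWAY_FACTOR):
    """Credit a fielder with his raw catch rate divided by the park factor."""
    return raw_rate / park_factor


# Zone A uses the 60% from the quote; zone B's 30% is a made-up example.
raw_rates = {"A": 0.60, "B": 0.30}
adjusted = {zone: park_adjusted_rate(rate) for zone, rate in raw_rates.items()}

for zone, rate in adjusted.items():
    print(f"zone {zone}: raw {raw_rates[zone]:.2f} -> credited {rate:.3f}")
```

Every zone is scaled by the same 1/.81, whatever its distance from where the fielder stands.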
Why am I not crazy about this? In Fenway, Manny (and his opponents) will be playing, say, 20 feet closer to the plate than usual. If the out rate in non-Fenway parks in that zone is .70, it’ll be, say, .90 in Fenway. But perhaps in the zone that is 40 feet to his left, the out rate will be much closer between Fenway and everywhere else. The adjustment can’t possibly be even across zones.
The way to do this is to start with the normal position for a LF at Fenway (against a LHH and RHH). You don’t have to create an adjustment “per zone”, but you would just create a function of distance (and direction) from that normal starting point.
So, the normal starting point at Fenway is “265 feet from home”, and the out rate is .85. The normal starting point at non-Fenway is “285 feet from home”, and the out rate is .90. (All numbers for illustration).
For every 10 feet forward, the out rate drops by .05. For every 10 feet backward, the out rate drops by .10. For every 10 feet to left or right, the out rate drops by .03.
It doesn’t have to be linear obviously. But, that’s how you get around the “per zone adjustment”. And, this has the double advantage of modeling reality. This is how fielders are positioned, knowing that moving up 10 feet means a tradeoff somewhere.
(Again, all numbers for illustration only.)
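Here is a rough sketch of that idea in code, using the same illustrative numbers. Two things are my assumptions rather than part of the description above: the depth and lateral penalties simply add together, and the result is clamped to the range 0 to 1. All names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Park:
    start_depth_ft: float   # normal LF starting distance from home
    start_out_rate: float   # out rate on a ball hit right at that spot


# Illustrative numbers from the text.
FENWAY = Park(265.0, 0.85)
OTHER = Park(285.0, 0.90)

DROP_PER_10FT_FORWARD = 0.05   # ball lands in front of the fielder
DROP_PER_10FT_BACK = 0.10      # ball lands behind the fielder
DROP_PER_10FT_LATERAL = 0.03   # ball lands to his left or right


def expected_out_rate(park, ball_depth_ft, lateral_ft):
    """Out rate as a function of distance from the park's starting point.

    ASSUMPTIONS: linear penalties that add together, clamped to [0, 1].
    """
    depth_offset = ball_depth_ft - park.start_depth_ft  # positive = behind him
    if depth_offset >= 0:
        depth_penalty = DROP_PER_10FT_BACK * depth_offset / 10
    else:
        depth_penalty = DROP_PER_10FT_FORWARD * -depth_offset / 10
    lateral_penalty = DROP_PER_10FT_LATERAL * abs(lateral_ft) / 10
    rate = park.start_out_rate - depth_penalty - lateral_penalty
    return max(0.0, min(1.0, rate))


# The same ball, 285 feet from home and straight at the fielder's line:
print(expected_out_rate(OTHER, 285, 0))    # at the normal spot elsewhere
print(expected_out_rate(FENWAY, 285, 0))   # 20 feet behind the Fenway spot
```

The park baseline for any spot on the field falls out of the function, so no table of per-zone factors is needed.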
I don’t disagree that my park adjustment methodology is poor for Fenway. However, in the long run, it should be just fine. It is not biased (assuming that everyone plays LF starting at around the same location), which is important.
Your method is one that I have toyed around with for UZR (for all the data, not just at quirky parks), but I have not deployed it yet (and probably never will). It is probably the best way to handle the problem of using such small zones (which I do now - I use STATS slices and distances rather than Retro zones). Essentially you are smoothing out the per zone baselines and interpolating from park to park.
In any case, in the same thread I used my basic methodology and crunched the data at Fenway only, essentially comparing Manny to everyone else at Fenway, similar to what Joe did, though I treated each zone separately as I usually do.
I forgot how the numbers came out, but they were considerably more conservative than what came out using my basic .81 park adjustment for LF at Fenway.
However, as I have said many times, no matter how you slice it, Manny is likely one of the worst fielders in baseball. Fenway probably mitigates his deficiencies. Given all the data and the different ways we have seen of parsing and analyzing it, I would now guess that Manny is around -15 (per 150) going into this year.
Even if Joe (or Misterdirt or whoever is doing the analysis) gets -10 per year over the last 4 years (with no weighting I assume), after age adjusting, you get around -15 going into this year.
Heck, one of the reasons why we regress sample data in the first place is to “account” for the mistakes we make in analyzing it. I may have gotten -25 over the last 4 years with a “flawed methodology” or something like that, but after I get done regressing it, it is more like -18 or -20, which is not that far off from -15.
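For readers who want to see what that regression amounts to numerically, here is a hedged sketch. It assumes simple linear regression toward a league-average value of 0 runs and backs out the weight implied by going from -25 to -18 or -20. The actual regression used for UZR is not described here, so treat the function and the weights as illustrations only.

```python
def regress_to_mean(observed, weight, mean=0.0):
    """Shrink an observed value toward the mean.

    weight is the share of the observed value that is kept (0 to 1).
    ASSUMPTION: the mean is league average, i.e. 0 runs.
    """
    return mean + weight * (observed - mean)


observed = -25.0
# Weights implied by the quoted regressed estimates of -18 and -20.
implied_weights = [estimate / observed for estimate in (-18.0, -20.0)]
print(implied_weights)

for weight in implied_weights:
    print(f"weight {weight:.2f}: {regress_to_mean(observed, weight):+.1f} runs")
```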