Thursday, March 12, 2009
Normalizing the Gameday hit location data
Peter rolls up his sleeves and more, and shows us how to normalize the data at each park in part 1.
In part 2, he gives us some juicier details, including specifics on Carlos Beltran:
[My model, BZM, Big Zone Metric] has Beltran saving 16.7 runs in 2008. ... UZR has Beltran saving seven runs in 2008. Interestingly, UZR shows 377 expected outs and 418 put outs. I have him at 385 expected outs with 414 plays made. The slightly fewer plays made is due to my ignoring line drive outs OOZ and also ignoring plays for which Gameday has no hit location data. How MGL interprets a +41 plays made over expected as only being worth 7 runs will have to be explained by him. ... Plus/Minus gives Beltran 404 expected outs and +14 plays made. This is adjusted to +24 in the “enhanced” model and 14 runs saved (not including runs saved with his arm).
So, Peter and Dewan are almost identical at 14 to 17 runs saved, while MGL (via bUZR) has him at only 7. Peter is saying there may be a disconnect in MGL’s expected outs, similar to what another reader reported on Brandon Phillips. What’s missing on Fangraphs is the “Plays made” that MGL considers, since MGL is not necessarily looking at “Putouts”, but at a subset of them. In sUZR, MGL has Beltran at +17 runs.
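To make the disconnect concrete, here is a rough sketch that divides each system's runs saved by its plays made over expected, using only the figures quoted above. The variable names are mine, and the UZR row assumes the Fangraphs putouts stand in for MGL's plays made, which (as noted) they may not.

```python
# Beltran, 2008, figures as quoted in the post.
# "plays_over" = plays made minus expected outs.
systems = {
    "BZM": {"plays_over": 414 - 385, "runs": 16.7},
    # Assumption: Fangraphs putouts are used as plays made here;
    # MGL may be counting only a subset of them.
    "UZR (Fangraphs)": {"plays_over": 418 - 377, "runs": 7.0},
    "Plus/Minus (enhanced)": {"plays_over": 24, "runs": 14.0},
}

def runs_per_play(entry):
    """Implied run value of one play made over expected."""
    return entry["runs"] / entry["plays_over"]

implied = {name: runs_per_play(e) for name, e in systems.items()}

for name, value in implied.items():
    print(f"{name}: {value:.2f} runs per play over expected")
```

BZM and the enhanced Plus/Minus both come out a bit under 0.6 runs per play, while the UZR row comes out under 0.2, which is why the +41 looks like it is not the number MGL is actually working from.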
Anyway, in WOWY, I have Beltran at +21 plays, which would be roughly 18 runs saved. I have him at 415 plays made (very close to Peter’s 414). Interestingly, the batters he faced historically did not like to hit to CF. If we only focused on the batters he saw, he was at +29 plays. But, using his pitchers, he was at +18. His parks show him as being +23, and if you look at the batted ball distribution he was at +15. The batted ball distribution “should” be in the middle of all that, since, ideally, the combination of the batters he faced and the pitchers he has and the parks he has all conspire to create a specific batted ball distribution. But, the RECORDED batted balls show him to be pretty good, just not as good as we’d have expected using the historical data of his batters, pitchers, and parks. I would not be surprised if there was less than ideal scoring in his games in 2008. The same pattern shows up in 2007. But, in 2006, the numbers are much tighter. Anyway…
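As a quick check on those four WOWY views, the sketch below compares the recorded batted ball figure against the three context-based ones. The dictionary keys are my own labels, and nothing here says how the overall +21 is actually derived; the simple average is shown only for comparison.

```python
from statistics import mean

# Beltran's 2008 WOWY plays above average, by comparison basis (from the post).
wowy_plays = {
    "batters": 29,
    "pitchers": 18,
    "parks": 23,
    "batted_balls": 15,
}

# The three context-based views, i.e. everything except recorded batted balls.
context_views = [wowy_plays[k] for k in ("batters", "pitchers", "parks")]

avg_all = mean(list(wowy_plays.values()))      # simple average of all four views
avg_context = mean(context_views)              # average of batters/pitchers/parks
gap = avg_context - wowy_plays["batted_balls"]  # how far batted balls sit below

# Is the batted ball figure inside the range of the other three?
in_the_middle = min(context_views) <= wowy_plays["batted_balls"] <= max(context_views)

print(f"all four: {avg_all:.2f}, context only: {avg_context:.2f}, gap: {gap:.2f}")
print("batted balls within context range:", in_the_middle)
```

The recorded batted ball number falls below all three of the other views rather than between them, which is the oddity being pointed at.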
Looking at Rally’s numbers: he’s at +11 runs. But, as we’ve noted in the past, Rally’s numbers are spread out less than UZR and mine. It’s likely that in terms of standard deviations, he’s the same as we are.
It seems to me that we are extremely close, even though we all have different systems. The outlier is bUZR, as sUZR, Dewan, Peter, me, and (adjusted) Rally are all right around +16 runs or so.
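For what it's worth, a back-of-the-envelope average of the run figures given above lands about where I said. This is only a sketch: Rally is left out because his adjusted number isn't stated here, and Dewan's 14 does not include the arm.

```python
from statistics import mean

# Runs saved for Beltran, 2008, as quoted in the post.
runs_saved = {
    "sUZR": 17,
    "Dewan": 14,    # enhanced Plus/Minus, arm not included
    "Peter": 16.7,  # BZM
    "WOWY": 18,
}
buzr = 7  # the outlier

consensus = mean(list(runs_saved.values()))
spread = max(runs_saved.values()) - min(runs_saved.values())

print(f"average of the four: {consensus:.1f} runs")
print(f"range among the four: {spread:.1f} runs")
print(f"bUZR sits {consensus - buzr:.1f} runs below that average")
```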

