Saturday, March 24, 2007
UZR for hitters
Protrade checks in with the UZR version for hitters:
http://www.protrade.com/content/DisplayArticle.html?sp=Sfd06ae48-d89d-11db-8683-5577a9d16e8f
http://mlb.mlb.com/news/article.jsp?ymd=20070322&content_id=1854282&vkey=news_mlb&fext=.jsp&c_id=mlb
The main issue is getting enough parameters to distinguish a Beltran hit to location x,y that was “hard hit” from a Neifi Perez hit to the same location x,y that was also “hard hit”. On top of which, you really need to know the fielding alignment. In short, it’s a good first try, but the parameters required need to be extended.
I agree with Tango. It is a good start, but much more work needs to be done. In general, I am not a big fan of “UZR for batters” for the reasons Tango mentions. A hard hit ball to section X,Y by Bonds is NOT the same as a hard hit ball by Neifi, even if it has the same STATS (or BIS or whoever) parameters. As well, because the defense plays differently (sometimes VERY differently) for each batter, the baseline “caught percentages” can be way off and sometimes meaningless.
And of course, you will find that most of the lucky players have high BA and most of the unlucky ones, a low BA. That is why they have a high or low BA in the first place - they likely got lucky or unlucky. That is why we regress all sample stats to the mean. IOW, we know that Mauer probably won’t hit .347 again without knowing his batting UZR.
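To make the regression step concrete, here is a minimal sketch of shrinking an observed BA toward the league mean. The league average of .265, the 500 at-bat "ballast" and the 500 at-bat sample are all placeholder assumptions for illustration, not estimated values.

```python
def regress_to_mean(observed_ba, at_bats, league_ba=0.265, ballast_ab=500):
    """Shrink an observed batting average toward the league mean.

    league_ba and ballast_ab are hypothetical placeholders: the ballast is
    the number of league-average at-bats we mix in, so a larger ballast
    means more regression for the same sample size.
    """
    weight = at_bats / (at_bats + ballast_ab)  # share of the estimate carried by the sample
    return weight * observed_ba + (1 - weight) * league_ba


# A .347 hitter over an assumed 500 at-bats
estimate = regress_to_mean(0.347, 500)
print(f"regressed estimate: {estimate:.3f}")
```

With these made-up inputs the sample and the league mean get equal weight, so the estimate lands halfway between .347 and .265. A good batting UZR would, in effect, let us move that weight up or down for individual hitters.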
However, the beauty of batting UZR, if it is done well, is that it lets us see which high BA guys got lucky and to what extent, and which ones perhaps did not, and then adjust the regressions accordingly. Same thing for the low BA guys.
As far as using pitcher UZR (or PZR), that is a lot better, as we expect a “hard hit” ball to be about the same no matter who the pitcher is, on the average, and we don’t expect a whole lot of differences in terms of fielder positioning among different pitchers, other than whether they are L/R.
I did not read the protrade article very closely, but I assume that their baselines at least are separated by whether the batter and pitcher are L/R (the handedness of the batter is the most important thing), as that is somewhat of a proxy for fielder positioning and, to some extent, the speed of the batted ball.
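As a rough illustration of what a batter "luck" number built on such baselines could look like, here is a sketch that compares actual hits with the hits expected from baseline caught percentages. The bucket key (zone, hardness, batter hand), the field names and the function itself are assumptions made for this example; this is not Protrade's actual method.

```python
def batting_luck(batted_balls, caught_pct):
    """Return actual hits minus expected hits for one batter.

    batted_balls: list of dicts with hypothetical keys
        'zone', 'hardness', 'batter_hand', 'is_hit'
    caught_pct: dict mapping (zone, hardness, batter_hand) to the league
        baseline fraction of such balls that are turned into outs
    A positive result means more hits than the baselines predict ("lucky").
    """
    expected_hits = 0.0
    actual_hits = 0
    for ball in batted_balls:
        key = (ball["zone"], ball["hardness"], ball["batter_hand"])
        expected_hits += 1.0 - caught_pct[key]  # chance this ball falls for a hit
        actual_hits += 1 if ball["is_hit"] else 0
    return actual_hits - expected_hits
```

Note that this inherits every weakness mentioned above: if the baseline for a bucket does not fit the batter (different positioning, different kind of "hard hit"), the luck number is off by the same amount.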
What I would like to see in the data, for example, would be the effect of UZR for batters and pitchers, holding a traditional stat constant. For example, take the 2005 data, look at all batters with a BA of around .300, break them into lucky and unlucky groups, and then look at each group’s BA the next year. How much more regression to the mean do we see in the lucky group? That tells us how much the UZR data is actually helping us.
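The test described here can be sketched in a few lines. The record layout (`ba_y1`, `ba_y2`, `luck`), the width of the BA band and the rule that positive luck means "lucky" are assumptions for illustration only.

```python
from statistics import mean


def regression_by_luck(players, ba_center=0.300, ba_halfwidth=0.010):
    """Hold year-1 BA roughly constant, then compare lucky vs. unlucky groups.

    players: list of dicts with hypothetical keys
        'ba_y1' (year-1 BA), 'ba_y2' (year-2 BA), 'luck' (batter UZR, + = lucky)
    Returns per-group sample size, mean BA in each year, and the year-over-year drop.
    """
    # Keep only batters whose year-1 BA is near the chosen value
    band = [p for p in players if abs(p["ba_y1"] - ba_center) <= ba_halfwidth]
    groups = {
        "lucky": [p for p in band if p["luck"] > 0],
        "unlucky": [p for p in band if p["luck"] <= 0],
    }
    summary = {}
    for label, group in groups.items():
        if not group:
            continue  # nothing to average for an empty group
        y1 = mean(p["ba_y1"] for p in group)
        y2 = mean(p["ba_y2"] for p in group)
        summary[label] = {"n": len(group), "ba_y1": y1, "ba_y2": y2, "drop": y1 - y2}
    return summary
```

If the luck measure carries real information, the "drop" for the lucky group should be clearly larger than for the unlucky group, even though both groups started from about the same BA.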
What you sometimes see with (bad) studies is something like the following:
We have 2 groups of players - one the lucky group and the other, the unlucky group (according to batter UZR). The lucky group lost 20 points in BA and the unlucky group gained 20 points the next year. Voila, we must have a great measure of luckiness! Not so fast. If the lucky group had an average BA of .310 the first year and the unlucky group, .230, then we can explain the 20 point gain and loss by “normal” regression to the mean alone. We don’t necessarily need the UZR data. But, if we find that the lucky and unlucky .310 hitters regressed significantly differently the next year, and ditto for the lucky and unlucky .230 hitters, then we are on to something. Or, if we simply see more regression than we expect when we divide the players into lucky and unlucky baskets, then we also are on to something.
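The arithmetic behind that objection is easy to check. In the sketch below, the league BA of .270 and the 50% year-to-year retention are assumed round numbers chosen for illustration, not measured values.

```python
def expected_next_ba(ba_y1, league_ba=0.270, retention=0.5):
    """Next-year BA expected from plain regression to the mean.

    league_ba and retention are hypothetical: retention is the fraction of
    the gap between a group's BA and the league BA that carries over.
    """
    return league_ba + retention * (ba_y1 - league_ba)


for label, ba in (("lucky", 0.310), ("unlucky", 0.230)):
    nxt = expected_next_ba(ba)
    change = (nxt - ba) * 1000  # change in BA points
    print(f"{label}: {ba:.3f} -> {nxt:.3f} ({change:+.0f} points)")
```

Under these assumptions the .310 group is expected to fall to .290 and the .230 group to rise to .250, so a 20 point loss and gain tells us nothing about luck by itself. Only regression beyond what this baseline predicts is evidence that the UZR split is adding information.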