Tuesday, March 31, 2009
HZR, UZR, PZR: the convergence of scouting and performance
Hey Tom, I’m a frequent reader of yours, and I had been wondering about a possible new stat derived from the batted-ball vectors that UZR uses. Could we use batted balls to build an offensive stat that takes out great defensive plays? Take the example of the medium-speed ball hit straight at the shortstop that is converted into an out 95% of the time (I don’t recall the exact numbers). Derek Jeter makes the play and gets a UZR credit of +0.05. Could we give the batter a score of 0.05 and then multiply by a run-valued linear weight? Let’s say 95% out and 5% single: he would get 5% of .47 (what I think is the run value of a single), and then we would average those across his batted balls. Wouldn’t that take luck out of the stat?
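The reader’s proposal can be sketched roughly as follows. This is a minimal illustration, not UZR’s actual code. It assumes each batted ball has already been assigned a league-wide out probability from its type, speed, and location, that the only non-out outcome is a single, and that an out is worth zero runs, as the question implies. The names `BattedBall`, `expected_run_value`, and `average_expected_value` are hypothetical, and .47 is the reader’s own estimate.

```python
from dataclasses import dataclass

# Assumptions taken from the question, not official linear weights.
SINGLE_RUN_VALUE = 0.47  # reader's estimate of the run value of a single
OUT_RUN_VALUE = 0.0      # assumption: the question credits nothing for an out


@dataclass
class BattedBall:
    # League-wide out rate for this batted-ball "vector" (type, speed, location).
    p_out: float


def expected_run_value(ball: BattedBall,
                       hit_value: float = SINGLE_RUN_VALUE,
                       out_value: float = OUT_RUN_VALUE) -> float:
    """Credit the batter with the probability-weighted value of the ball,
    regardless of whether the fielder actually made the play."""
    p_hit = 1.0 - ball.p_out
    return p_hit * hit_value + ball.p_out * out_value


def average_expected_value(balls: list[BattedBall]) -> float:
    """Average expected run value per batted ball."""
    if not balls:
        raise ValueError("need at least one batted ball")
    return sum(expected_run_value(b) for b in balls) / len(balls)


print(round(expected_run_value(BattedBall(p_out=0.95)), 4))
```

With the 95% example, the batter is credited about 0.0235 runs (5% of .47) whether or not Jeter makes the play.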
What is the model? The model is a set of humans, each with his own unique skills, each faced with a different set of circumstances at different frequencies, in a changing environment (weather, park, umpires, how they feel). Ideally, we’d put each human under all the possible combinations of parameters, and make him face each context one million times, all in the same day. That is the model.
What we have instead is 4 PA a day, each under a mix of known and unknown variables. That’s the performance data.
What we also have is how observers see those players, in-game or out. That’s the scouting data.
The idea is to create a model that is complex and comprehensive enough to make both performance and scouting data obsolete. We’re not there, and we’re not going to be there. But the plan is to work toward getting there as much as possible. PITCHf/x, HITf/x, and FIELDf/x (or similar systems) are the gold we’ve been looking for.
Imagine knowing exactly when Ryan Howard starts his swing when he faces Johan Santana, when he expects a fastball and gets a curve, or expects it inside and it goes outside. What will matter is not only the angle at which the ball comes off the bat on that particular pitch, but the angle at which it normally comes off under those conditions. We won’t necessarily care exactly where the ball ended up, or even what happened when ball and bat met. What we may end up caring about most is the exact millisecond prior to impact: given all the effort exerted by Santana and Howard, in the amount, direction, and timing of that effort, what should have happened?


The data that UZR and other defensive systems use can be used to tweak offensive evaluation, and the models that estimate true, long-run offensive talent, just as they are used to build those estimates on defense, but…
If you rely exclusively on such a model for offense, you will run into problems, because we don’t have granular enough data on batted balls.
Let me give a few examples which will explain what I mean:
Even though the data tells us the speed of each batted ball (slow, medium, or hard), not all “hard” ground balls are alike: a hard ground ball by Juan Pierre is not the same as a hard-hit ground ball by Ryan Howard. So, if we used the same methodology we use for defensive metrics to evaluate offense, we would likely be underrating Howard and overrating Pierre on ground balls. Howard will have more ground balls go through the IF because he hits them harder, but the “system” will assume that all ground balls of a particular speed by both players should be fielded at the same rate. The same thing is true of air balls.
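A toy calculation shows how pooling everything in one speed bucket biases both hitters. Every number here is hypothetical: the exit speeds and the `p_out_given_speed` curve are made up for illustration and are not real HITf/x or batted-ball data.

```python
from statistics import mean


def p_out_given_speed(mph: float) -> float:
    """Assumed chance an infielder converts a ground ball hit at `mph`.
    Harder balls get through more often; clamped to [0, 1]."""
    return max(0.0, min(1.0, 1.6 - 0.01 * mph))


def true_out_rate(speeds: list[float]) -> float:
    """Expected out rate given each ball's actual (hypothetical) speed."""
    return mean(p_out_given_speed(s) for s in speeds)


def bucket_out_rate(*hitters: list[float]) -> float:
    """What a bucket-based system assigns: one pooled rate for every 'hard' grounder."""
    pooled = [s for speeds in hitters for s in speeds]
    return true_out_rate(pooled)


# Hypothetical exit speeds (mph) for grounders that all land in the "hard" bucket.
pierre_hard = [90.0, 92.0, 94.0]
howard_hard = [100.0, 105.0, 110.0]

bucket = bucket_out_rate(pierre_hard, howard_hard)
for name, speeds in (("Pierre", pierre_hard), ("Howard", howard_hard)):
    print(f"{name}: true out rate {true_out_rate(speeds):.3f}, bucket assumes {bucket:.3f}")
```

Under these made-up inputs, Pierre’s grounders should be outs about 68% of the time and Howard’s about 55%, but the bucket charges both at the pooled 61.5%, which overrates Pierre and underrates Howard in exactly the direction described above.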
Another factor is player speed. Player speed matters for things like IF hits and whether outfield hits become singles, doubles, or triples. Again, if we use the same methodology we use for defense, the “system” will assume that all players have the same speed. But a ground ball by Pierre is NOT the same as a ground ball by Yadier Molina, because Pierre will beat many more of them out for hits. So that ground ball to Jeter that gets fielded 95% of the time is NOT really fielded 95% of the time for players of different speeds.
The same goes for fielder positioning. The actual results of batted balls that go into traditional offensive metrics (out, single, double, triple, reached on error) reflect where the fielders were positioned. A defensive-type system would not.
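One simple way a system could try to account for runner speed is to scale the league infield-hit rate on a given ball by a per-runner factor. This is a hypothetical sketch: `SPEED_FACTOR` and its values are made up, and a multiplicative adjustment is just one assumed modeling choice, not how UZR or any published system does it.

```python
# Hypothetical illustration: the speed factors are made up, not measured.
LEAGUE_P_OUT = 0.95      # the 95% ground ball to Jeter, for an average runner
SINGLE_RUN_VALUE = 0.47  # the reader's estimate for a single

# Assumed multipliers on the league-average chance of beating this ball out.
SPEED_FACTOR = {"Pierre": 2.0, "average": 1.0, "Molina": 0.4}


def runner_adjusted_p_out(league_p_out: float, speed_factor: float) -> float:
    """Scale the chance of beating the ball out by the runner's speed factor."""
    p_hit = min(1.0, (1.0 - league_p_out) * speed_factor)
    return 1.0 - p_hit


for runner, factor in SPEED_FACTOR.items():
    p_out = runner_adjusted_p_out(LEAGUE_P_OUT, factor)
    credit = (1.0 - p_out) * SINGLE_RUN_VALUE
    print(f"{runner}: out {p_out:.1%}, expected credit {credit:.4f} runs")
```

With these assumed factors, the same ball is an out about 90% of the time for Pierre and 98% for Molina, so a single league-wide 95% misstates the expected credit for both. In practice each factor would itself have to be estimated from limited data, which is part of why these adjustments can never be perfect.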
Etc.
Of course, all of these things could be accounted for, though not perfectly.
So, we are probably better off in the long run just using what actually happens for an offensive evaluation system. In the short run, we definitely could improve offensive evaluation systems by incorporating the types of things that we use in defensive systems like UZR. Lots of guys are already doing that (like the THT guys) in their offensive projection systems, for example by incorporating line drive rates and things like that…
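A minimal sketch of that short-run idea is to blend a hitter’s actual BABIP with a batted-ball-based expectation, trusting actual results more as the sample grows. This is not how THT or any specific projection system works. The rule of thumb (expected BABIP ≈ line-drive rate + .120) and the weighting constant `k` are assumptions chosen for illustration, and all function names are hypothetical.

```python
def expected_babip_from_ld(ld_rate: float, intercept: float = 0.120) -> float:
    """Assumed rule of thumb: expected BABIP is roughly the line-drive rate
    plus a constant. The intercept is a tunable assumption."""
    return ld_rate + intercept


def blended_babip(actual_babip: float, ld_rate: float,
                  balls_in_play: int, k: float = 800.0) -> float:
    """Weight actual results by n / (n + k) and the batted-ball estimate by
    the rest. `k` is a hypothetical constant: the larger it is, the more
    balls in play it takes before actual results dominate."""
    if k <= 0:
        raise ValueError("k must be positive")
    w = balls_in_play / (balls_in_play + k)
    return w * actual_babip + (1.0 - w) * expected_babip_from_ld(ld_rate)


print(round(blended_babip(actual_babip=0.340, ld_rate=0.18, balls_in_play=400), 3))
```

With 400 balls in play and `k = 800`, actual results get one-third of the weight, so a .340 BABIP with an 18% line-drive rate blends to about .313. As the sample grows, the estimate converges to what actually happened, which matches the long-run point above.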