Tuesday, November 08, 2011
Numbers don’t (necessarily) represent the performance of the player
Jeff asks an innocuous question:
As I understand it, the Rookie of the Year award is supposed to go to the league’s best rookie. Consensus seems to be that “best” is some combination of performance and playing time. This is why Brett Lawrie doesn’t show up at the top of many lists. But why should playing time be that important? Brett Lawrie came to the plate 171 times and hit .293/.373/.580. That is an outstanding performance. An outstanding performance over a limited sample, sure, but a more outstanding performance than any other AL rookie, as far as I can tell. Why shouldn’t he get more consideration for the award? It isn’t the AL’s most valuable rookie. It’s the AL’s best rookie. There’s room for interpretation. Man, there’s room for interpretation with everything.
It is an outstanding RESULT. It is an outstanding OUTCOME.
And those results, those outcomes, are LINKED to Brett Lawrie.
Because those outcomes are linked to Brett Lawrie, can we therefore INFER that Brett Lawrie (necessarily) had an outstanding performance?
Just today, I caught every single green light. I mean every single one. That was an outstanding outcome. And it was me, Tom, driving the car. Can you infer that I had an outstanding driving performance? Intuitively, we know that virtually all of that was luck. So, since we know most of it is luck, we simply treat all of it as luck and regress my “performance” 100%.
Brett Lawrie came to bat fewer than 200 times. And he was deeply involved in each one. It SEEMS like the outcomes linked to Lawrie are OWNED by Lawrie. But that’s not true! A large share of those outcomes are owned by Lawrie, but not all of them.
And here’s another weird part: the more outcomes he has, the larger the share of those outcomes that we can attribute directly to Lawrie. If Lawrie had 2 plate appearances, we would attribute very little of the OUTCOMES to him. We just wouldn’t know if he was being a good driver, or just happened to be in the driver’s seat at that moment in time. If he had 20 PA, we would attribute more of those outcomes to Lawrie. If he had 20,000 PA, we’d attribute 99% of each of those outcomes (including those first 2) to Lawrie.
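One common way to express that sliding share is a weight of PA / (PA + k), where k is a regression constant: the number of PA at which half the credit goes to the hitter and half to luck. The sketch below assumes that form, and k = 200 is a hypothetical value picked for illustration, not a measured constant for any particular stat. With it, 20,000 PA lands at about 99%, in line with the figure above.

```python
# Hypothetical regression constant (in PA): the sample size at which we would
# credit half of the outcomes to the hitter and half to luck.
# 200 is an illustrative assumption, not a measured value.
K = 200

def share_attributed(pa: int, k: int = K) -> float:
    """Fraction of observed outcomes credited to the player after `pa` PA."""
    return pa / (pa + k)

for pa in (2, 20, 171, 20_000):
    print(f"{pa:>6} PA -> {share_attributed(pa):.1%} attributed to the player")
```

The exact numbers move with k, but the shape does not: the credit starts near zero, climbs steadily, and only approaches 100% with a very large sample.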
This is a Bayes world.
Unless we can make a perfect and direct connection from Brett Lawrie to a particular outcome, we have no choice but to INFER, working back FROM the outcome to Lawrie, the extent to which Lawrie himself actually influenced that outcome. One way to make that inference is through regression toward the mean.
Remember: everything we see is an observation, and our job is to infer what caused that observation. The more observations we have of Lawrie, the better we can infer the cause of each single observation.
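As a rough sketch of that regression, blend the observed rate with a prior, weighting the observation by PA / (PA + k). Two assumptions here are mine, not from the post: a prior OBP of .320 as a stand-in for league average, and a hypothetical regression constant k = 200. Applied to Lawrie’s .373 OBP in 171 PA, it gives an estimate of about .344.

```python
# Illustrative assumptions, not measured values:
#   PRIOR_OBP is a stand-in for a league-average OBP,
#   K is a hypothetical regression constant (in PA).
PRIOR_OBP = 0.320
K = 200

def regressed_estimate(observed: float, pa: int,
                       prior: float = PRIOR_OBP, k: int = K) -> float:
    """Regress an observed rate toward the prior, weighting it by pa / (pa + k)."""
    weight = pa / (pa + k)
    return weight * observed + (1 - weight) * prior

# Lawrie's 2011 OBP over 171 PA, regressed toward the assumed prior.
print(f"{regressed_estimate(0.373, 171):.3f}")  # prints 0.344
```

The .373 is the observation. The .344 is a guess at how much of it Lawrie himself caused, given only 171 PA.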


To put it another way:
Brett Lawrie hits .293/.373/.580 in 171 PA in 2011
Willie McCovey hits .293/.368/.590 in 261 PA in 1962
But because McCovey had already accumulated 900 PA prior to that, we can attribute a larger share of those outcomes in 1962 to McCovey than we’d attribute to Lawrie’s outcomes in 2011.
But, if Lawrie continues to play for a few more years, and does so at a high-outcome level, then we can retroactively attribute more of Lawrie’s 2011 outcomes directly to Lawrie… even though 2011 is already in the books.
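The same arithmetic can show both the McCovey comparison and the retroactive effect. This sketch keeps the hypothetical k = 200 and .320 prior used above. It also adds two illustrative assumptions that are not data: McCovey’s earlier 900 PA were at about his 1962 level, and Lawrie adds 1,500 more PA at his 2011 OBP. “Share owned” here means the fraction of a season’s gap above the prior that the talent estimate explains.

```python
# Hypothetical regression constant and prior (a stand-in league-average OBP).
K = 200
PRIOR_OBP = 0.320

def talent_estimate(on_base: float, pa: int, k: int = K, prior: float = PRIOR_OBP) -> float:
    """Regressed OBP estimate using every PA seen so far."""
    return (on_base + k * prior) / (pa + k)

def share_owned(season_obp: float, estimate: float, prior: float = PRIOR_OBP) -> float:
    """Fraction of a season's gap above the prior that the talent estimate explains."""
    return (estimate - prior) / (season_obp - prior)

# Lawrie, judged on 2011 alone.
lawrie_obp, lawrie_pa = 0.373, 171
est_2011 = talent_estimate(lawrie_obp * lawrie_pa, lawrie_pa)

# McCovey 1962, ASSUMING (for illustration only) his prior 900 PA were at his 1962 level.
mccovey_obp, mccovey_pa = 0.368, 900 + 261
est_mccovey = talent_estimate(mccovey_obp * mccovey_pa, mccovey_pa)

# Hypothetical future: Lawrie adds 1,500 PA at the same OBP (not a projection).
later_pa = lawrie_pa + 1_500
est_later = talent_estimate(lawrie_obp * later_pa, later_pa)

print(f"Lawrie, 2011 only:       {share_owned(lawrie_obp, est_2011):.0%}")
print(f"McCovey, 1962 + prior:   {share_owned(mccovey_obp, est_mccovey):.0%}")
print(f"Lawrie, after 1,500 more: {share_owned(lawrie_obp, est_later):.0%}")
```

Nothing about the 2011 plate appearances changes between the first and last lines. Only what we know about Lawrie outside them does.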
How we see the outcomes, and their relationship to Lawrie, depends on what more we know about Lawrie outside those outcomes.