Thursday, July 19, 2007
Win Probability: Preallocation of wins
This comes up every now and then, and I’m not sure I’ve posted it on my blog before (though I have elsewhere). Should the win probability tables include the identities of the players, their batting spots, etc.? Or do you always presume each team is exactly equal, even if you have Jose Lima facing Albert Pujols?
There are two ways to look at this:
1. You are a gambler. From this perspective, you must know all this. If Pujols comes to bat, the win probability must be higher than if it were Neifi Perez. And, if Pujols makes an out, that must drop you down more than if Neifi Perez makes an out. I know it looks like you are “punishing” Pujols, but this simply represents your expectations. In fact, if you measure everything perfectly and your sample size is large enough, then the aggregate change in win probability for all Pujols PA will total exactly ZERO. The aggregate change in win probability for all Red Sox games will total exactly ZERO.
There’s still the issue of “punishment”. Well, that’s where preallocation of wins comes along. If you have Santana pitching, the Twins will have say a .600 chance of winning. And if he doesn’t pitch, they’ll have say a .450 chance of winning. Santana is preallocated +.150 wins at the start of the game. Given a large enough sample, and presuming your estimate of his talent level was accurate (.700 pitcher, throwing 7 IP per start), then the aggregate of his in-game changes in win probability will total exactly zero. What you are left with is Santana being worth +.150 wins x number of starts.
2. You want to reward performance. From this standpoint, you can’t presume that Santana is starting from a higher point than Jose Lima. If you do, then every out Santana gets will be worth less than Lima’s outs. Doesn’t make any sense. So, you start off with an environment where everyone is equal. And the change in win probability can then be allocated in real-time, with no preallocation of wins. In this case, the preallocation will be exactly zero, while the aggregate change in win probability is the impact of the player.
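Here is a rough sketch of the bookkeeping under the two views, using the Santana numbers above. The 33 starts, the function name, and the assumption that Santana performs exactly to the talent estimate are mine, purely for illustration.

```python
def total_wins(prealloc_per_start, starts, in_game_wpa):
    """Season credit = preallocated wins + aggregate in-game change."""
    return prealloc_per_start * starts + in_game_wpa

STARTS = 33  # hypothetical number of starts, not from the post

# View 1 (gambler): Santana is preallocated .600 - .450 = +.150 per start,
# and his in-game changes sum to zero if the talent estimate was accurate.
gambler_prealloc = 0.600 - 0.450
gambler_total = total_wins(gambler_prealloc, STARTS, in_game_wpa=0.0)

# View 2 (reward performance): nothing is preallocated, so the same
# impact has to show up in the in-game totals instead.
# Assumption: he performs exactly to the estimate used in view 1.
reward_total = total_wins(0.0, STARTS, in_game_wpa=gambler_prealloc * STARTS)

print(round(gambler_total, 3), round(reward_total, 3))
```

Either way the season total is the same; the only thing that changes is which bucket the wins land in.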
Maybe you want something in-between. Let’s say you want to consider the park. In this case, you preallocate +.040 wins to the team’s park, allowing you to start the home team at .540. This way, if the Twins win at home, rather than them getting +.500 wins, they get +.460 wins.
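The park arithmetic can be sketched like this. The constant and function names are made up for the example; the .040 figure is the one used above.

```python
PARK_PREALLOCATION = 0.040  # wins credited to the home park before first pitch

def in_game_credit(start_prob, won):
    """Wins left to hand out in-game, given the starting win probability."""
    final_prob = 1.0 if won else 0.0
    return final_prob - start_prob

# The home team starts at .540 instead of .500.
home_start = 0.500 + PARK_PREALLOCATION

home_win_credit = in_game_credit(home_start, won=True)    # about +.460
home_loss_credit = in_game_credit(home_start, won=False)  # about -.540

print(round(home_win_credit, 3), round(home_loss_credit, 3))
```

Note the flip side: a home loss costs the team .540 in-game rather than .500, since they started from a higher point.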
What if you want to consider a standard batting lineup? In this case, you preallocate say +.010 wins for the leadoff hitter, +.050 wins for the cleanup hitter, -.030 wins for the #8 hitter, or what have you. Then, Pujols would be compared to the typical cleanup hitter. If he performs for that game at exactly what the average cleanup hitter would do, he gets +.050 of preallocated wins, and +.000 of in-game wins.
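A minimal sketch of the lineup-slot split, using only the three illustrative slot values mentioned above. The other slots are left out because no values are given for them, and the function and variable names are hypothetical.

```python
# Illustrative per-game preallocations by batting slot (only the three
# slots quoted in the post; the rest would need their own values).
SLOT_PREALLOCATION = {1: 0.010, 4: 0.050, 8: -0.030}

def split_credit(slot, wins_vs_average_hitter):
    """Split a game's value into (preallocated, in-game) wins.

    wins_vs_average_hitter: the hitter's value for the game measured
    against an average hitter, i.e. with no preallocation at all.
    """
    prealloc = SLOT_PREALLOCATION[slot]
    in_game = wins_vs_average_hitter - prealloc
    return prealloc, in_game

# Pujols bats cleanup and performs exactly like the average cleanup hitter.
prealloc, in_game = split_credit(slot=4, wins_vs_average_hitter=0.050)
print(prealloc, in_game)
```

The two pieces always add back to the hitter's value against an average hitter, so the choice of baseline only moves wins between the buckets.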
So, you can really do anything you want with this. It’s all a matter of exactly what your question is, and then creating an implementation within the win probability framework.
Another illustration:
If you have Eckstein hitting in front of the pitcher, or hitting in front of Pujols, the value of his performance is greatly affected. A walk is far more valuable if you have someone who can leverage it.
Therefore, at its ultimate, WPA must account for all known variables. Exactly what a gambler would do, and exactly what a fan would do. The fan is aware of the identities of all players involved, and therefore, the win probability tables should reflect those parameters.
However, once you do that, the in-game WPA of Pujols and Eckstein, over a long period of time, will sum to exactly ZERO!
That is, if Eckstein gets on base in front of Pujols, Eckstein gets a huge plus on his walk. But he got the huge plus because of Pujols. Pujols is shortchanged on that, because to Eckstein, Pujols is part of the context.
Now, once Pujols is at bat, the opposing pitcher sees him, and he’s scared to death. If he gets him out, that’s a huge plus for the pitcher. However, this has to be a zero-sum situation. If Pujols only gets the standard minus, who ends up getting the rest of the minus?
Therefore, in order to resolve the Pujols on deck, and Pujols at bat situations, you must treat every player as being in the context. We are no longer comparing players to the “average player”. We are now comparing players to our expectation, given the context. And that means, essentially, comparing players to their own averages.
In the end, if you made a perfect guess as to each player’s average, and the expected batter/pitcher matchup that would result, then, over a long period of time, the sum of all in-game totals, for each and every player, will be exactly zero.
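To see why the in-game totals wash out, here is a toy one-PA model. Every number in it (the on-base rates and the win probabilities after each outcome) is hypothetical, and a real win probability table has far more than two outcomes.

```python
def expected_in_game_change(p_true, p_expected, wp_if_reach, wp_if_out):
    """Expected change in win probability for one plate appearance.

    p_true:     the batter's real chance of reaching base
    p_expected: the chance the win probability table assumes
    """
    # Win probability before the PA, as the table sees it.
    wp_before = p_expected * wp_if_reach + (1 - p_expected) * wp_if_out
    gain = wp_if_reach - wp_before   # credit for reaching base
    loss = wp_if_out - wp_before     # debit for making an out
    return p_true * gain + (1 - p_true) * loss

# Table knows the batter: bigger debit per out, and it all nets to zero.
own_baseline = expected_in_game_change(0.430, 0.430, 0.60, 0.50)

# Table assumes an average batter: the surplus shows up in-game instead.
avg_baseline = expected_in_game_change(0.430, 0.330, 0.60, 0.50)

print(round(own_baseline, 6), round(avg_baseline, 6))
```

Algebraically the expected change works out to (p_true - p_expected) x (wp_if_reach - wp_if_out), which is zero only when the table's expectation matches the player.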
All of the value comes in the preassignment of wins, exactly the way a fan and gambler would do it if they see Santana pitching, or knowing if Pujols is playing or not.
What this means then is that the value of WPA comes in looking at games in isolation, to see who performed better than their own expectations.
***
Two things:
1 - It is 1 million times harder to do it the right way than the quick way. You will, in effect, get just about the exact same answers anyway. So, unless you are a real-time gambler, don’t bother.
2 - The preassignment of wins must now account for how a reliever will be used (amount of leverage), and for clutch performance (to the extent that it exists). You can of course choose not to do this, and simply let the in-game numbers capture the differences. That is, you are not worried about getting the in-game numbers to sum to zero for each player, because, in the end, whatever wins you preassign, plus whatever wins result in-game, will add up to what they should be, no matter how you handle WPA.