THE BOOK--Playing The Percentages In Baseball

Thursday, July 19, 2007

Win Probability: Preallocation of wins

By Tangotiger, 10:44 AM

This comes up every now and then, and I’m not sure I’ve posted it in my blog before (though I have elsewhere).  Should the win probability tables include the identities of the players, their batting spots, etc?  Or, do you always presume each team is exactly equal, even if you have Jose Lima facing Albert Pujols?

There are two ways to look at this:


1. You are a gambler.  From this perspective, you must know all this.  If Pujols comes to bat, the win probability must be higher than if it was Neifi Perez.  And, if Pujols makes an out, that must drop you down more than if Neifi Perez makes an out.  I know it looks like you are “punishing” Pujols, but this simply represents your expectations.  In fact, if you measure everything perfectly and your sample size is large enough, then the aggregate change in win probability for all Pujols PA will total exactly ZERO.  The aggregate change in win probability for all Red Sox games will total exactly ZERO.

There’s still the issue of “punishment”.  Well, that’s where preallocation of wins comes along.  If you have Santana pitching, the Twins will have, say, a .600 chance of winning.  And if he doesn’t pitch, they’ll have, say, a .450 chance of winning.  Santana is preallocated +.150 wins at the start of the game.  Given enough sample, and presuming your estimate of his talent level was accurate (.700 pitcher, throwing 7IP per start), then the aggregate of his in-game changes in win probability will total exactly zero.  What you are left with is Santana being worth +.150 wins x number of starts.

2. You want to reward performance.  From this standpoint, you can’t presume that Santana is starting from a higher point than Jose Lima.  If you do, then every out Santana gets will be worth less than Lima’s outs.  Doesn’t make any sense.  So, you start off with an environment where everyone is equal.  And the change in win probability can then be allocated in real-time, with no preallocation of wins.  In this case, the preallocation will be exactly zero, while the aggregate change in win probability is the impact of the player.

Maybe you want something in-between.  Let’s say you want to consider the park.  In this case, you preallocate +.040 wins to the team’s park, allowing you to start the home team at .540.  This way, if the Twins win at home, rather than them getting +.500 wins, they get +.460 wins.

What if you want to consider a standard batting lineup?  In this case, you preallocate say +.010 wins for the leadoff hitter, +.050 wins for the cleanup hitter, -.030 wins for the #8 hitter, or what have you.  Then, Pujols would be compared to the typical cleanup hitter.  If he performs for that game at exactly what the average cleanup hitter would do, he gets +.050 of preallocated wins, and +.000 of in-game wins.

So, you can really do anything you want with this.  It’s all a matter of exactly what your question is, and then creating an implementation within the win probability framework.
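Here is a rough sketch of that bookkeeping, using the park and Santana numbers from above. The names `GameCredit` and `team_credit`, and the 34-start season, are made up for illustration. The only idea taken from the post is that total credit equals preallocated wins plus the in-game change in win probability.

```python
from dataclasses import dataclass


@dataclass
class GameCredit:
    preallocated: float  # wins assigned before the first pitch
    in_game: float       # change in win probability during the game

    @property
    def total(self) -> float:
        return self.preallocated + self.in_game


def team_credit(start_wp: float, won: bool, baseline: float = 0.500) -> GameCredit:
    """Split one game's credit into preallocated and in-game wins.

    start_wp is the win probability you assume at first pitch;
    baseline is the "everyone is equal" reference point.
    """
    final_wp = 1.0 if won else 0.0
    return GameCredit(preallocated=start_wp - baseline,
                      in_game=final_wp - start_wp)


# No preallocation: a win is worth +.500 in-game.
neutral = team_credit(0.500, won=True)

# Park preallocation: home team starts at .540, so a win is +.460 in-game.
park = team_credit(0.540, won=True)
print(round(park.preallocated, 3), round(park.in_game, 3))  # 0.04 0.46

# Santana: .600 with him, .450 without, so +.150 preallocated per start.
santana = team_credit(0.600, won=True, baseline=0.450)
HYPOTHETICAL_STARTS = 34  # assumption, not a figure from the post
season_prealloc = santana.preallocated * HYPOTHETICAL_STARTS
```

Whichever starting point you pick, `total` for a win against a given baseline is the same. The choice only moves credit between the two buckets.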



Another illustration:

If you have Eckstein hitting in front of the pitcher, or hitting in front of Pujols, the value of his performance is greatly affected. A walk is far more valuable if you have someone who can leverage it.

Therefore, at its ultimate, WPA must account for all known variables. Exactly what a gambler would do, and exactly what a fan would do. The fan is aware of the identities of all players involved, and therefore, the win probability tables should reflect those parameters.

However, once you do that, the in-game WPA of Pujols and Eckstein, over a long period of time, will sum to exactly ZERO!

That is, if Eckstein gets on base in front of Pujols, Eckstein gets a huge plus on his walk. But he got the huge plus because of Pujols. Pujols is shortchanged on that, because to Eckstein, Pujols is part of the context.

Now, once Pujols is at bat, the opposing pitcher sees him, and he’s scared to death. If he gets him out, that’s a huge plus for the pitcher. However, this has to be a zero-sum situation. If Pujols only gets the standard minus, who ends up getting the rest of the minus?

Therefore, in order to resolve the Pujols on deck, and Pujols at bat situations, you must treat every player as being in the context. We are no longer comparing players to the “average player”. We are now comparing players to our expectation, given the context. And that means, essentially, comparing players to their own averages.

In the end, if you made a perfect guess as to each player’s average, and the expected batter/pitcher matchup that would result, then, over a long period of time, the sum of all in-game totals, for each and every player, will be exactly zero.

All of the value comes in the preassignment of wins, exactly the way a fan and gambler would do it if they see Santana pitching, or knowing if Pujols is playing or not.

What this means then is that the value of WPA comes in looking at games in isolation, to see who performed better than their own expectations.
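The zero-sum claim can be checked with a toy calculation. This is a sketch under heavy assumptions: one plate appearance, two outcomes (reach base or make an out), and made-up numbers for the on-base rates and for the win probabilities after each outcome. The function name `expected_in_game_wpa` is hypothetical. It also holds the post-outcome win probabilities fixed, which ignores the on-deck effects discussed above.

```python
def expected_in_game_wpa(true_p: float, model_p: float,
                         wp_success: float, wp_failure: float) -> float:
    """Average in-game WPA credited to a batter for one PA.

    model_p is the on-base chance baked into the pre-PA win probability;
    true_p is how often the batter actually reaches base.
    """
    wp_before = model_p * wp_success + (1 - model_p) * wp_failure
    gain = wp_success - wp_before   # credit when he reaches base
    loss = wp_failure - wp_before   # debit when he makes an out
    return true_p * gain + (1 - true_p) * loss


# Illustrative numbers only (not from the post).
STAR_OBP, AVERAGE_OBP = 0.430, 0.330
WP_ON_BASE, WP_OUT = 0.600, 0.450

# Full preallocation: the tables already know the batter is a star.
full = expected_in_game_wpa(STAR_OBP, STAR_OBP, WP_ON_BASE, WP_OUT)
print(abs(full) < 1e-12)  # True

# No preallocation: the tables assume an average batter.
neutral = expected_in_game_wpa(STAR_OBP, AVERAGE_OBP, WP_ON_BASE, WP_OUT)
print(round(neutral, 4))  # 0.015
```

With the player-specific tables the out costs more (-.0645 here versus -.0495 with neutral tables), which is the “punishment”, and the expected in-game total is zero. With neutral tables, the player's value shows up in-game instead.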

***

Two things:
1 - It is 1 million times harder to do it the right way, than the quick way. You will, in effect, get just about the exact same answers anyway. So, unless you are a real-time gambler, don’t bother.

2 - The preassignment of wins must now decide how a reliever will be used (amount of leverage), and clutch performance (to the extent that it exists). You can of course choose not to do this, and simply let the in-game numbers capture the differences. That is, you are not worried about getting the in-game numbers to sum to zero for each player, because, in the end, whatever wins you preassign, and whatever wins result in-game, will add up to what they should be, no matter how you handle WPA.

#1  MGL  2007/07/19 (Thu) @ 14:32

I think I sprained my neck reading this! :)


#2  Tangotiger  2007/07/19 (Thu) @ 14:57

MGL, as you likely prefer the Vegas scenario, the one-line summary is to (a) create a model of all known variables such that (b) over a million games the impact of each player’s performance in terms of change in winning odds, aggregated, will be exactly zero.


#3  2007/07/19 (Thu) @ 15:22

Very interesting!  So down by 1 in the 9th, with Coco leading off, and then Ortiz and then Manny… there is some traditional (non-Vegas) WPA associated with Coco getting on base (let’s say .1 wins).  And let’s say, all things considered, Pedroia is available to pinch hit and is 20% more likely to get on base.  If Francona makes the switch, we can credit him with getting .1 wins with 20% better likelihood.  Or, .02 wins for Francona for making the switch.

HOWEVER, with Ortiz and Manny on deck, instead of two “average” historical major league hitters, Coco getting on base is actually worth .15 wins.  And thus, Pedroia getting on 20% more often makes it 20% more likely that the Sox get those .15 wins.  So Francona instead gets credit for .03 wins instead of .02 if we look at the Vegas scenario.

Does that make sense - or better yet, is that correct?  That seems like a non-zero sum way to evaluate people via WPA with the inclusion of all the variables.


#4  Tangotiger  2007/07/19 (Thu) @ 15:29

The Vegas odds would include the chance that Pedroia would have possibly come to pinch hit, much like you include the chance that Hoffman or Mo would possibly come in relief.

Everything is zero-sum.

I don’t think I understood whether you were doing the first scenario (full preallocation) or the second (no preallocation).


#5  Pizza Cutter  2007/07/19 (Thu) @ 17:51

A thought: Say the Cards are in Atlanta, and Pujols sees Jimmy Carter in the crowd.  He’s always wanted to meet Jimmy Carter and he’s decided to leave the game to go meet him.  (I don’t like saying, “Pujols gets hurt.” But that’s what I’m really talking about...) Clearly, Tony LaRussa will have to send someone not-quite-as-good out to play 1st and hit in Pujols’s spot and the Cards win prob will have to be adjusted downward.  To whom do we credit/debit the WPA?  The trainer?  Pujols?  Jimmy Carter?  God?


#6  tangotiger  2007/07/19 (Thu) @ 19:06

Are you assuming method 1 (with full preallocation)?  In that case, you debit Pujols, since he was preallocated a certain amount presuming a certain number of PA.

This would be like Santana coming out after 1 pitch.  He was preallocated a huge amount, and will be deallocated that same huge amount in-game.


#7  2007/07/19 (Thu) @ 21:55

Pizza cutter, I’d say Pujols, given it was Pujols’ conscious decision to leave (just as it was Francona’s hypothetical conscious decision to pinch-hit someone).  I’m not trying to advocate some fancy new amazing perfect system for evaluating stuff, just throwing around ideas.

Tango, maybe I’m talking about something where you’re examining the process of the decision-making, not the results.  In which case, if you make the “right” decision every time, you never lose.  That’s how I try to evaluate Theo’s moves - sometimes you get lucky (Ortiz), sometimes unlucky (Renteria), but since the process was correct (in my opinion) in both instances, he gains points for both deals.  I was just trying to put an in-game, objective spin on it.


#8  2007/07/20 (Fri) @ 15:40

Would a WPA method that treated batters differently be computationally tractable?

Standard WPA uses historical data to estimate a situational run-scoring matrix, and a situational win-probability matrix.  All WPA calculations depend on the same two matrices.  The system assumes (1) all plays are independent; and (2) all innings are independent.

When evaluating the WPA of an at-bat, if you want to consider the line-up surrounding the at-bat (e.g., Pujols is up next vs. Perez is up next), you would have to come up with a different situational run-probability matrix for each at-bat.

You’d also have the problem of innings affecting one another.  If a pick-off at first means Pujols bats first next inning, as opposed to last this inning, your prediction for runs next inning has changed.

Further, that dependence doesn’t stop next inning; an event in the first inning can affect the likelihood of Pujols batting again in the 10th.  And in order to calculate the value of the event in the 1st inning, you’d need to project out to the 10th.

So, unless I’m confused (which happens somewhat frequently), instead of being able to leverage the independence of events and innings, you would have to do a ton more math.  For instance, inning 1 can influence inning 2 in only 9 ways (by changing who leads off inning 2, assuming no pinch hitting, no subs, no injuries, etc.), but since that inning affects inning 3, etc., there are 9^8 (that is, 9^(# of innings - 1)) possible ways in which the events in inning 1 can affect the rest of the game.  And unless someone is very clever and comes up with a way to simplify the problem, one would need to evaluate all 43 million effects in order to determine the long-term effects of an event in the first inning.
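The count in comment #8 is easy to reproduce. This is just the arithmetic as stated there (9 possible leadoff spots for each inning after the first, no substitutions, a 9-inning game). The function name `leadoff_paths` is made up for illustration.

```python
LINEUP_SPOTS = 9
INNINGS = 9


def leadoff_paths(innings: int = INNINGS, spots: int = LINEUP_SPOTS) -> int:
    """Number of leadoff-hitter sequences for innings 2 through `innings`."""
    return spots ** (innings - 1)


print(leadoff_paths())    # 43046721
print(leadoff_paths(10))  # 387420489, if you project out to the 10th
```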


#9  tangotiger  2007/07/20 (Fri) @ 16:20

cdm, right on.  Unless you are a gambler, where such analysis would be required (full preallocation), that’s the reason I prefer as much non-preallocation as possible. 

The concept that the players are identical on both teams is extremely appealing from a programming standpoint and explanatory standpoint.

And in the end, given enough games, you will end up with exactly the same results for Pujols using either of the two methods.

Where you gain the most, by far, is the one-game analysis.  That aside, since we typically look at things at the seasonal aggregated level, who the heck has time to program the perfect preallocation model?

It’s good to know what the frameworks are.  But, time must lead this solution.


#10  2007/07/21 (Sat) @ 02:23

CDM

If you look at market-based systems, such as those at Tradesports, then they will give you the “gambling answer” to a large degree, as the markets update in real time.

You can read more about it in a Hardball Times column I wrote:

http://www.hardballtimes.com/main/article/prediction-markets-redux/

John


#11  2007/07/21 (Sat) @ 17:07

John,

Sure, as you mentioned in your column, those markets make pretty good predictions.  On an interesting aside: the variance between the predictions made by Vegas this year and actual game outcomes is only slightly greater than what you would expect by chance alone.  Over 1455 games this year, the variance one would expect by chance: 341.17.  Vegas variance: 343.32.  That’s crazy good.

But that’s not solving the problem. It’s using a whole bunch of gooey gray neural networks to estimate the solution without any real introspective understanding as to how the solution works. It’s a purely descriptive model, without any predictive aspect. It doesn’t tell me, for instance, how replacing Kason Gabbard with Clay Buchholz would affect the likelihood of the Red Sox holding onto their one-run lead going into the fifth.  For instance :) The “true” gambler’s WPA solution would tell me that, if it were tractable.

-cdm
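Comment #11 doesn't say how its two variance figures were computed. One plausible reading, and it is only an assumption, is that the chance figure is the sum of p(1-p) over all games and the Vegas figure is the sum of squared differences between the outcome (1 or 0) and the predicted probability. A sketch with four made-up games:

```python
def expected_variance(probs):
    """Sum of p*(1-p): the variance you'd expect if every p were exactly right."""
    return sum(p * (1 - p) for p in probs)


def observed_variance(probs, outcomes):
    """Sum of squared misses between outcome (1 = favored side won, 0 = lost) and p."""
    return sum((o - p) ** 2 for p, o in zip(probs, outcomes))


# Toy data, not the 1455 games from the comment.
toy_probs = [0.60, 0.55, 0.50, 0.45]
toy_outcomes = [1, 0, 1, 0]

by_chance = expected_variance(toy_probs)
actual = observed_variance(toy_probs, toy_outcomes)
print(round(by_chance, 4), round(actual, 4))
```

Under this reading, the closer the observed total sits to the chance total over a large sample, the less room there is for the predictions to be systematically off.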


#12  Edgar for Pres  2007/07/21 (Sat) @ 23:13

I really like that this is being discussed because I think it has a lot of potential.  I like using Fangraphs live WPA charts when watching games sometimes but when I’ve got #7-9 hitters coming up in the 9th and we’re down one run I’m pretty sure we are going to lose.  I can’t really help on implementation but the simpler it was the better (even at the cost of some accuracy I think).


#13  2007/07/22 (Sun) @ 11:56

While I agree with Tango that the non-preallocation makes more sense to track and follow as fans, we’re forgetting a very important subset of gamblers: managers.

If you’re a manager/strategist thinking about using WPA to evaluate strategy, you’d be nuts not to consider at least a little bit of pre-allocation.  Otherwise, a bunt/don’t-bunt decision would be evaluated exactly the same whether you had - ahem - Aaron Rowand or Albert Pujols on deck.  And that’s just nuts!


#14  tangotiger  2007/07/23 (Mon) @ 08:58

A manager is in the gambler class, for sure.  A fan evaluating in-game decisions is in the gambler class too.


#15  2007/07/23 (Mon) @ 10:42

So given that, do you believe that there is a hybrid of no pre-allocation/full pre-allocation that is necessary for using WPA to analyze strategic moves?


#16  Guy  2007/07/23 (Mon) @ 12:00

"If you have Eckstein hitting in front of the pitcher, or hitting in front of Pujols, the value of his performance is greatly affected. A walk is far more valuable if you have someone who can leverage it.”

“And in the end, given enough games, you will end up with exactly the same results for Pujols using either of the two methods.”

Tango:  I may be thinking about this incorrectly (often the case with WPA), but I’m not sure both of these statements can be true.  Traditional WPA is totally ignorant regarding the future—it “knows” only what has come before.  A leadoff walk—given a certain inning/score state—always has the same value. 

But in the “gambler” version, you seem to be taking account of varying probabilities in what happens next.  A walk by Eckstein has more value than a walk by the Cards’ #8 hitter, because Pujols is on the horizon creating a kind of “pre-leverage.” Won’t the two methods give you different assessments of Eckstein’s value?  And perhaps Pujols’ as well?


#17  Tangotiger  2007/07/23 (Mon) @ 13:58

It should give you roughly the same, depending on the amount of synergy.

To Frank Thomas: Tim Raines followed by Frank Thomas is better than a pitcher followed by Frank Thomas.

To Tim Raines: Raines followed by Thomas is better than Raines followed by a pitcher.

The Raines+Thomas combination would provide more than additive impact because of the synergy.

