THE BOOK--Playing The Percentages In Baseball

Monday, October 02, 2006

Misunderstanding Win Expectancy

By Tangotiger, 10:30 AM

Via studes’ blog, I ended up here, at Crawfish Boxes, who says:

Now, fairness compels me to state that the WPA contraption does not see a big difference between a man on second and a man on third with two outs there. And it says that the play cost us a little over 70 points of win probability.
But I say that’s bullshit.

I thought putting out a book that explains how it all works would stop me from explaining how it all works.  Let me explain how it all works.


In the Crawfish example, it’s the bottom of the 9th inning, tied, with a runner on 2b, and 1 out.  This version of a win expectancy table has the win probability at .703. 

Now, Crawfish says:

I don’t think you can overstate the importance of hitting the ball to the right side in that situation, and I don’t think you can understate the damage a K there would have done.

He’s dead wrong about the K. You can’t understate the damage of a K, with less than 2 outs, with a runner on THIRD. The reason is that the runner has a huge chance of scoring from 3B with less than two outs: 86% with no outs, and 66% with 1 out. That plummets to 26% with 2 outs. A strikeout in this situation is terrible! But the runner needs to be on third base.

With a runner on 2B and 1 out, he will (eventually) score 40% of the time.  With 2 outs, that’s 22%. 

Remember those two numbers. Using real-life data, a guy will score with 2 outs from 2B 22% of the time, and from 3B 26% of the time. It’s not a big deal. It’s why the old adage of “don’t make the 3rd out at third base” is real. The value of being on 3B with less than 2 outs is that a flyball scores you.
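To put the strikeout comparison in one place, here is a minimal sketch that uses only the scoring rates quoted above. The dictionary layout and the `strikeout_cost` helper are illustrative names, not part of any published win expectancy tool.

```python
# Runner scoring rates quoted in the post, keyed by (base, outs).
SCORE_RATE = {
    ("3B", 0): 0.86,
    ("3B", 1): 0.66,
    ("3B", 2): 0.26,
    ("2B", 1): 0.40,
    ("2B", 2): 0.22,
}

def strikeout_cost(base, outs):
    """Drop in the runner's chance of scoring when a K adds one out."""
    return SCORE_RATE[(base, outs)] - SCORE_RATE[(base, outs + 1)]

# A K with 1 out costs far more with the runner on third than on second.
cost_from_third = strikeout_cost("3B", 1)    # 0.66 - 0.26
cost_from_second = strikeout_cost("2B", 1)   # 0.40 - 0.22

# With 2 outs, third base is worth only a little more than second.
edge_of_third_two_outs = SCORE_RATE[("3B", 2)] - SCORE_RATE[("2B", 2)]
```

The K costs about 40 points of scoring chance from third but about 18 from second, and with 2 outs the edge of third over second is about 4 points.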

Now, why are the chances of winning from 2B, 1 out 70%? What do I consider? If you read the Crawfish entry, you’d think win expectancy considers little. In fact, win expectancy (as I implement it) considers EVERYTHING. Not everything. But, EVERYTHING. Anything that has ever happened on a baseball field from 1999-2002 is considered, and weighted by the frequency with which it occurred.

When you’ve got a guy on 2B and 1 out, I already said the chance that he’ll eventually score (and win the game in the bottom of the 9th) is 40%. That 40% includes any possible way you can think of to get that guy to 3B and home plate. So, 40% of the time, the game ends with a win for the Astros. The other 60% of the time, the game goes to extra innings, of which the Astros will win half the time. 40% + half of 60% equals..... 70%. Empirical data from Walk Off Balk, 1979-2004, bears this out: .703. This is reality. You have a 70% chance of winning with a guy on 2B and 1 out.

Now, what about 2 outs, and the guy either on 2B or 3B?  Again, go through the same logical process.  2B and 2 outs means the runner will eventually score 22% of the time.  22% + half of 78% = 61%.  3B and 2 outs means the runner will eventually score 26% of the time.  26% + half of 74% = 63%.
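The arithmetic for a tied game in the bottom of the 9th can be written as a one-line function. This is a minimal sketch: the function name and the `p_win_extras` parameter are illustrative, and the 50% chance of winning in extra innings is the assumption the post itself uses.

```python
def win_expectancy_tied_bottom_9th(p_runner_scores, p_win_extras=0.5):
    """Win chance = runner scores now, or game goes to extras and is won there."""
    return p_runner_scores + (1.0 - p_runner_scores) * p_win_extras

we_2b_1_out = win_expectancy_tied_bottom_9th(0.40)   # about 0.70
we_2b_2_outs = win_expectancy_tied_bottom_9th(0.22)  # about 0.61
we_3b_2_outs = win_expectancy_tied_bottom_9th(0.26)  # about 0.63

# The gap between third and second with 2 outs is about 2 points.
gap_with_two_outs = we_3b_2_outs - we_2b_2_outs
```

Because extra innings are treated as a coin flip, any difference in scoring chance is halved when it turns into win expectancy: a 4-point edge in scoring becomes a 2-point edge in winning.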

It’s a 2% difference. It’s not 10, it’s not 20. Walk Off Balk’s empirical data says it’s a 3.8% difference. But, the sample size is very low with the guy on 3B (just 521 times in 25 years!). One standard deviation is 2%, which shows how little weight the empirical difference can bear. Regardless, 2% is our best estimate.
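The "one standard deviation is 2%" figure is consistent with a plain binomial standard error on 521 games. The sketch below assumes that is the calculation intended, and plugs in the estimated .63 win rate for the 3B, 2 outs state; the function name is illustrative.

```python
import math

def binomial_std_error(p, n):
    """Standard error of an observed rate p measured over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

# Assumed inputs: win rate of about .63 from 3B with 2 outs, 521 observed games.
se_3b_two_outs = binomial_std_error(0.63, 521)
```

Under those assumptions the standard error comes out at a little over 0.02, so the gap between the empirical 3.8% and the estimated 2% is smaller than one standard deviation of the 3B sample alone.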

The win expectancy model is extremely simple to build. It needs the frequency of all possible events at every state, and it needs to know all the possible places the batters/runners end up following the event. It’s as simple as it sounds. Human intuition, however, tries to process things in a different way. Human intuition is superb if you have a model with a lot of holes that need plugging. Win expectancy is not such a model.
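To make the "event frequencies plus where the runners end up" idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the event frequencies are invented, the same frequencies are used in every state, and the advancement rules are deliberately crude (runners hold on outs and move exactly as many bases as the hit). It is not the table-driven model built from 1999-2002 data that the post describes.

```python
from functools import lru_cache

# Hypothetical per-plate-appearance event frequencies (invented, sum to 1).
EVENT_FREQ = {
    "out": 0.68, "walk": 0.09, "single": 0.15,
    "double": 0.05, "triple": 0.005, "homer": 0.025,
}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}

def advance(event, bases):
    """Return (new_bases, runs) under crude rules; bases = (on1, on2, on3)."""
    on1, on2, on3 = bases
    if event == "out":
        return bases, 0                       # runners hold on every out
    if event == "walk":
        runs = 1 if (on1 and on2 and on3) else 0
        # Only forced runners move up.
        return (True, on2 or on1, on3 or (on1 and on2)), runs
    n = HIT_BASES[event]
    runs, new = 0, [False, False, False]
    for base, occupied in enumerate(bases, start=1):
        if occupied:
            if base + n >= 4:
                runs += 1                     # runner crosses the plate
            else:
                new[base + n - 1] = True
    if n >= 4:
        runs += 1                             # batter scores on a homer
    else:
        new[n - 1] = True                     # batter takes his base
    return tuple(new), runs

@lru_cache(maxsize=None)
def p_score(bases, outs):
    """Chance that at least one run scores before the third out."""
    if outs == 3:
        return 0.0
    total = 0.0
    for event, freq in EVENT_FREQ.items():
        new_bases, runs = advance(event, bases)
        if runs > 0:
            total += freq                     # a run scored: game over in a tied 9th
        else:
            new_outs = outs + 1 if event == "out" else outs
            total += freq * p_score(new_bases, new_outs)
    return total

def win_expectancy(bases, outs, p_win_extras=0.5):
    """Tied, bottom of the 9th: score now, or win in extras."""
    p = p_score(bases, outs)
    return p + (1.0 - p) * p_win_extras

we_2b_2_outs = win_expectancy((False, True, False), 2)
we_3b_2_outs = win_expectancy((False, False, True), 2)
```

The numbers this produces depend entirely on the made-up inputs, so they should not be compared with the real tables. The point is the structure: once every state has event frequencies and a transition rule, the win expectancy of any state falls out mechanically.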

#1    MGL      2006/10/02 (Mon) @ 12:31

Can you post the above as a response on that website? Not that the guy is going to change his mind and admit the error of his ways. People have an incredible ability to dig their heels in even when they are wrong.


#2    Tangotiger      2006/10/03 (Tue) @ 11:22

The following was posted on SOSH, and I am reposting it here:

==================================

one of the problems being that players on good teams will have higher WPA’s, just by virtue of them having more wins

FALSE!

Noah is completely right that it’s players on .500 teams that will be inflated a smidge.

Well, its been pretty widely proven that the home team wins (off the top of my head) 53.5% of the time. So right off the bat, WPA has a major flaw.

That’s the AVERAGE home team.  In any case, if you want to provide the necessary charts, and perhaps provide different ones for different parks, be my guest.  In the end, it’s a minor issue that’ll come out in the wash, since every team plays the same number of games at home and on the road.

WPA, as a framework, allows you to provide whatever parameters you want.  If certain sites choose to have a different implementation, that’s not an issue with the WPA framework.  That’s an issue with that particular implementation.

The biggest issue with WPA, imo, is that it is almost 100% a retrospective stat, not a predictive one.

But, isn’t OBP, SLG, BA, ERA, and every single stat in existence a “retrospective” stat?  You are implying that WPA doesn’t have the byproduct of being used in a predictive fashion.  I don’t see why that is an issue, any more than ERA would be an issue.

A good (and perhaps necessary) next step for WPA would be to generate a separate set of value tables for each ballpark based on its run environment, thus park-adjusting the stat.

A person who says this has an excellent grasp of what WPA is doing and not doing. This quote is dead-on about the absolute necessity of getting the right run environment to generate the win probability tables. Resolving the run environment issue is 90% of the problem with any WPA construction. Otherwise, a pitcher and hitter can’t be compared fairly. Until this is addressed, anything else is nothing more than a tweak.

Adjusting for opposition quality would be a bitch but would be a huge step in measuring actual value. There’s a major problem in that the batter / pitcher matchup ceases to be symmetrical. That is, giving up a walk-off homer to Alex Gonzalez should reduce WPA a lot more than giving one up to Ortiz (or Hafner). Yet obviously Gonzalez and Ortiz should get the same credit for hitting the same walk-off HR off the same pitcher. Basically, for the pitcher, you want to recalculate the run environment based on the current and following batters, but not based on his own quality, and for hitters, you want to recalculate the run environment based on the opposition pitcher, but not based on the quality of the lineup they’re part of. Which of course destroys the zero-sum nature of the metric.

Another excellent quote, and this is 9% of the problem with WPA. In fact, what needs to happen is to look at it from the standpoint of a gambler. If Santana is pitching, he is “preassigned” WPA before the game even starts. In fact, all 50 players are preassigned WPA, based on their expectations for that game. By the end of the season, if all the preassigning was done “perfectly”, then every player’s in-game WPA will total to exactly zero, and you are left with the sum of his preassigned WPA. Which of course would make WPA itself useless if that were the case!

The larger power of WPA is to represent in-game what is happening.  Its utility on a seasonal basis is more for an historical account, and its use for analysis and prediction would be trumped by other metrics.

WPA is what it is.  Nothing more, nothing less.

***

As for Ortiz/Hafner, Ortiz was unreal in crucial situations, and Hafner, not so much (even with all those grand slams). There was a 3.3 win difference that is completely attributable to the timing of Ortiz’s performance versus that of Hafner. That is a huge difference. Whether the timing was by design is irrelevant. The Red Sox got 2 extra wins because of it, and the Indians had 1 extra loss because of it.


#3    Tangotiger      2006/10/04 (Wed) @ 07:36

A followup to my SOSH post:

=============================================

Tango, I’m not sure I understand what you’re getting at here. You could take the starting lineups and starting pitchers and run a large number of simulations (using player stats) to determine the expected WPA for each player. But where does that lead to?

This leads to EXACTLY where you were going with your example. If you have Eckstein hitting in front of the pitcher, or hitting in front of Pujols, the value of his performance is greatly affected. A walk is far more valuable if you have someone who can leverage it.

Therefore, at its ultimate, WPA must account for all known variables.  Exactly what a gambler would do, and exactly what a fan would do.  The fan is aware of the identities of all players involved, and therefore, the win probability tables should reflect those parameters.

However, once you do that, the in-game WPA of Pujols and Eckstein, over a long period of time, will sum to exactly ZERO!

That is, if Eckstein gets on base in front of Pujols, Eckstein gets a huge plus on his walk. But he got the huge plus because of Pujols. Pujols is shortchanged on that, because to Eckstein, Pujols is part of the context.

Now, once Pujols is at bat, the opposing pitcher sees him, and he’s scared to death.  If he gets him out, that’s a huge plus for the pitcher.  However, this has to be a zero-sum situation.  If Pujols only gets the standard minus, who ends up getting the rest of the minus?

Therefore, in order to resolve the Pujols on deck, and Pujols at bat situations, you must treat every player as being in the context.  We are no longer comparing players to the “average player”.  We are now comparing players to our expectation, given the context.  And that means, essentially, comparing players to their own averages.

In the end, if you made a perfect guess as to each player’s average, and the expected batter/pitcher matchup that would result, then, over a long period of time, the sum of all in-game totals, for each and every player, will be exactly zero.
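The zero-sum claim follows from crediting each outcome relative to the player’s own expectation. The sketch below shows it for a single plate appearance with two outcomes. The probability and the win values are hypothetical numbers chosen for illustration, and the function name is not from any existing tool.

```python
def expected_in_game_credit(p_success, win_value_success, win_value_failure):
    """Expected credit when each outcome is scored against the player's own baseline."""
    # Baseline: what we expect from THIS player in THIS context before the pitch.
    baseline = p_success * win_value_success + (1.0 - p_success) * win_value_failure
    credit_if_success = win_value_success - baseline
    credit_if_failure = win_value_failure - baseline
    return p_success * credit_if_success + (1.0 - p_success) * credit_if_failure

# Hypothetical: a great hitter (45% success) and a weak one (28% success)
# facing the same situation, where success is worth .90 and failure .55.
great_hitter = expected_in_game_credit(0.45, 0.90, 0.55)
weak_hitter = expected_in_game_credit(0.28, 0.90, 0.55)
```

Both expected credits come out to zero (up to floating-point rounding), whatever the player’s talent. If the baseline is right, the in-game numbers average out to nothing, and all of the value sits in the preassigned expectation.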

All of the value comes in the preassignment of wins, exactly the way a fan and gambler would do it if they see Santana pitching, or knowing if Pujols is playing or not.

What this means then is that the value of WPA comes in looking at games in isolation, to see who performed better than their own expectations.

***

Two things:
1 - It is 1 million times harder to do it the right way, than the quick way.  You will, in effect, get just about the exact same answers anyway.  So, unless you are a real-time gambler, don’t bother.

2 - The preassignment of wins must now decide how a reliever will be used (amount of leverage), and clutch performance (to the extent that it exists). You can of course choose not to do this, and simply let the in-game numbers capture the differences. That is, you are not worried about getting the in-game numbers to sum to zero for each player, because, in the end, whatever wins you preassign, and whatever wins result in-game, will add up to what they should be, no matter how you handle WPA.


#4    studes      2006/10/04 (Wed) @ 08:11

Just want to point out that users can generate different WPA tables for different parks in my spreadsheet.  Plus, there is an option to give the home team an advantage.

I have to say that I’m not a fan of adjusting WPA for the quality of the participants at any point in time.  I understand why some would like to do it, particularly gamblers and in-game managers, but the purer version of WPA also has value in that it assigns WPA to players based on what they do, not what they’re expected to do.  I personally have a preference for the cleaner approach.


#5    Tangotiger      2006/10/04 (Wed) @ 08:49

It all depends on what you are trying to do, of course. If the bases are loaded in the bottom of the 9th, down by 2, and Bonds is at bat, with Mayne on deck, a “clean” WPA table will give the same win probability whether Bonds or Mayne is at bat, and therefore, the same gain if they have the same result.

Strategy-wise, manager or fan, or betting-wise, gambler, that’s not the case.  It depends what you want to do.

The only reason I prefer the cleaner approach is strictly because of time. If I were retired and had oodles of time, I’d opt for the manager/fan/gambler approach.


#6    David Smyth      2006/11/01 (Wed) @ 07:37

I noticed this quote from Palmer in The Hidden Game, when discussing the Mills Bros system. The reason I have always missed it is that it’s not in the body of the chapter, but in the ‘notes’ section.

“The major flaw in the Mills brothers’ system is that the Player Win Average weights a few events very heavily, many others quite lightly, so that it effectively has a smaller sample and is therefore less accurate. A combination of overall and situational data would be better.”

It looks like Palmer didn’t really understand win probability at that time.


#7    tangotiger      2006/11/01 (Wed) @ 08:04

David,

I think Palmer understood win prob pretty well at the time (and now). In the Hidden Game, he talks about how ace relievers have twice the leverage of starters.

Without remembering the context of the above quote, his quote seems fine to me.  It sounds like all he is saying is that you can’t get a good enough read on a player’s talent level by looking at his PWA, since you are intentionally introducing noise that the player himself can’t control.

Can you expand on the context of his quote?


#8    David Smyth      2006/11/01 (Wed) @ 16:55

Well, my interpretation is that he is starting out by judging WP by the standards of regular Lwts. Then, when he says that a combo of regular info plus situational info should be preferred, it seems like he is implying that the base-out situation should perhaps be included (as in value-added), but not the inning-score context. He doesn’t seem to realize that it is inherent to WP that some PAs will correctly receive a very high weight.

It’s pretty clear to me, unless someone can convince me otherwise, that Palmer was using an inappropriate standard to evaluate WP.

