
THE BOOK--Playing The Percentages In Baseball


Friday, June 08, 2007

More about WPA

By Tangotiger, 11:51 AM

There’s an on-going Q&A thread over on a Cardinals forum. I will cut/paste my comments here, but they make more sense in context there:


The first thing to remember is why WPA exists. It exists to give you a snapshot of the game. That is, at this point in time,
- what are the chances of winning
- after this at bat, what are the chances of winning

You capture the difference, and associate it to the batter/runner/pitcher.
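As a minimal sketch of that bookkeeping (the function name and the win expectancies below are made up for illustration, not taken from any real win expectancy table):

```python
def wpa(we_before: float, we_after: float) -> float:
    """Win probability added for the batting team in one plate appearance.

    we_before / we_after are the batting team's chances of winning
    (0.0 to 1.0) before and after the plate appearance.
    """
    return we_after - we_before


# Hypothetical example: the batting team goes from a 35% to a 60% chance.
batter_credit = wpa(0.35, 0.60)   # credited to the batter
pitcher_credit = -batter_credit   # the pitcher gets the exact opposite
print(f"batter {batter_credit:+.2f}, pitcher {pitcher_credit:+.2f}")
```

The only design choice here is that the batter's gain and the pitcher's loss always sum to zero.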

Now, the next temptation is to aggregate this over a number of games, or a season. But, that’s not really why it exists. That is a byproduct.

Even so, let’s say we do want to aggregate it: does it tell us anything? If you look at this thread:
http://www.fangraphs.com/blogs/index.php/is-wpa-predictive-for-batters/

You will see that WPA is NOT as predictive as OBP, SLG, or OPS. But, it’s still fairly strong.

However, if you look at this post:
http://www.fangraphs.com/blogs/index.php/is-wpa-predictive-for-batters/#comment-10954

You will see that WPA divided by LI is as predictive as all the major ones.

***

In the long run, say several years, WPA will converge to the major numbers (OPS, RC, LWTS). So, it definitely means something.

WPA applies as well to pitchers as hitters.

As for “overrating” relievers, it does no such thing, any more than an average reliever having a better ERA than an average starter overrates relievers.

It simply is what it is. What you need to do is interpret it on that basis, that relievers need a different baseline, not only for WPA, but for ERA, K per IP, etc.

***

Let me ask you the question:
1. If Albert Pujols gets a few walk offs, where the Cards were down to their last out, and he pulled one out of his back pocket, turning a sure loss into a real win, is that impressive?

2. If Mariano Rivera comes in with the bases loaded, 0 outs, up by 1 in the 9th inning, and strikes out the side, is that impressive?

What WPA does is measure that. It takes the pulse of the game before the batter comes to bat, and then after, and says “here you go, this is what happened! This is why Scutaro gets a huge WPA gain and Mariano must get the exact opposite WPA loss”. If you look at it from both perspectives (hitter, pitcher) at the same time, you are forced to conclude that not all runs are created equal, without the benefit of hindsight. That the evaluation of the game, of how you feel, exists in real-time, and not in a “if I had known”. If you knew Mariano was going to give up that HR to Scutaro, why would you even bother going to the game?

In fact, if Vlad hits a grand slam against the Red Sox in the playoffs to send the game into extra innings, only for the Angels to subsequently lose the game, he may as well have struck out, since it’s the exact same result.

***

The base comparison is the sum of each WPA/LI (that is WPA divided by LI on a PA-by-PA basis). So, the question is: what does the sum(WPA) give you over the sum(WPA/LI)?

Let’s look at the best hitter in the league:
http://www.fangraphs.com/statss.aspx?playerid=1177&position=1B

In 2006, his WPA/LI is +6.1 wins. In 2005, his WPA/LI is also +6.1 wins. This measure is purely a reflection of his hitting skill with respect to the game situation, but without the extra leverage. So, a HR in the 9th inning or 1st will still be worth around +.12 wins. (It changes a little, but not enough to concern us at the moment.)

Similarly, you can use OPSwins as:
wins = PA * .025 * (1.7*OBP+SLG-1)
to get to the same results.
In 2006, Pujols would be:
634 * .025 * (1.7*.431+.671-1) = +6.4 wins
And in 2005:
700 * .025 * (1.7*.430+.609-1) = +6.0 wins

So, that’s what he is, a +6.0 to +6.4 wins hitter.
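The OPSwins arithmetic above can be checked with a few lines of Python. The function name is just a label for this sketch; the formula and the inputs are the ones quoted in the post.

```python
def ops_wins(pa: int, obp: float, slg: float) -> float:
    """Wins above average from the post's formula:
    wins = PA * .025 * (1.7*OBP + SLG - 1)
    """
    return pa * 0.025 * (1.7 * obp + slg - 1)


# Pujols, using the PA/OBP/SLG figures quoted above.
print(f"2006: {ops_wins(634, 0.431, 0.671):+.2f} wins")  # about +6.40
print(f"2005: {ops_wins(700, 0.430, 0.609):+.2f} wins")  # about +5.95, i.e. +6.0
```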

In 2006 however, he put on one of the best clutch hitting performances you will ever see. His WPA in 2006 was +9.6 wins, giving him a clutch impact (though not clutch skill) of +3.5 wins. It is a fantastically high number.
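Here is a rough sketch of how the two sums and the clutch impact relate on a PA-by-PA basis. The play list is invented for illustration, and the code assumes every LI is greater than zero.

```python
def season_totals(plays):
    """plays: list of (wpa, li) pairs, one per plate appearance.

    Returns (sum of WPA, sum of WPA/LI, clutch impact), where clutch
    impact is the leverage-inflated total minus the leverage-free total.
    Assumes every li > 0.
    """
    total_wpa = sum(w for w, li in plays)
    total_wpa_li = sum(w / li for w, li in plays)
    return total_wpa, total_wpa_li, total_wpa - total_wpa_li


# Hypothetical plays: a big hit at LI 3.0, a routine out at LI 0.5,
# and a single at LI 1.0.
plays = [(0.30, 3.0), (-0.02, 0.5), (0.05, 1.0)]
total_wpa, total_wpa_li, clutch = season_totals(plays)
print(f"WPA {total_wpa:+.2f}, WPA/LI {total_wpa_li:+.2f}, clutch {clutch:+.2f}")

# The season-level version of the same subtraction, with the 2006 totals
# quoted above:
print(f"2006 clutch impact: {9.6 - 6.1:+.1f} wins")
```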

If you sort by LI:
http://www.fangraphs.com/statsp.aspx?playerid=1177&position=1B&season=2006

You will see that in high-leverage situations, he was unbelievable. In the 10 most important at bats last year (other than the IBB), he made an out only TWO times! That is clutch.

However, in 2005, in the 10 most important at bats, he made an out EIGHT times. That’s not so clutch.

So, that’s what WPA captures. Pujols was lucky in 2006… not that he was lucky to hit HR, but lucky that he TIMED the HR when he did. What you have here is a great hitter, who was great in 2005 and great in 2006, and yet, somehow, was better able to time that greatness in 2006 when the Cards needed him the most.

While clutch hitting does exist, the number of PA you need to even determine that someone is a clutch hitter is huge, on the order of several thousand (a career’s worth). At that point, who cares, other than for retrospective thought.

You certainly wouldn’t base your decision on whether to bring in a certain player based on his “clutch stats”. You *could* bring someone in if you believe in something beyond the numbers. But, the numbers aren’t going to help you out.

***

Yes, it makes Vlad a star, in real-time.

But, in hindsight, he may as well have struck out. That’s why you need to decide:
1. do I care about what happens in real-time
2. do I care about what happened, in hindsight
3. do I care about what the past tells me about the future

Choose one, not all.

WPA *can* be used for all three of those things, but it is *not* the best one for each of those three things. WPA is perfect for real-time.

#1    MGL      2007/06/08 (Fri) @ 13:22

God, I hate WPA. On top of everything, now it confuses people.

It tells you the impact of the result of a PA on the WE of one team or the other.  Nothing more and nothing less.  That corresponds, more or less, to the “excitement” of the PA.  By definition, a PA with a low WPA was “ho-hum,” no big deal, like a 2 out walk with the bases empty in a 5 run game in the 9th inning. OTOH, a PA with a high WPA was most likely a “wow” event, or a great AB (for the pitcher or batter), like a 2 out HR in a 1 run game (batting team losing), or even a no out walk in a one run game in the 9th.

If that floats your boat, that’s fine with me.  What can it be used for?  I suppose for pats on the back or for some kind of MVP type award or some fairly interesting discussion over a beer or two (whose performance really impacted their team or had the potential to impact their team?).

The problem (and I think it is a big one, but that is a qualitative argument, not a quantitative one) with WPA as a “wow,” MVP, or “over a beer” metric is that it does not distinguish between whether the team won or lost or even whether a run scored or not.

If a player got a lot of those no out walks with his team down a run in the 9th, but a run never scored and his team never won, NO ONE is going to remember those walks or that player, and you would be laughed out of the bar if you tried to tell everyone how great and clutch this player was.  Some really sophisticated and deep thinking bar patrons might say that this player did indeed do a great job when his team needed him the most, but that his teammates let him down.

But let’s face it, in the real world people only get props for helping to cause success (wins, and important ones at that) and NOT POTENTIAL SUCCESS (WE).  Only nerdy stat-heads care about that.

In any case, of what other use is WPA?  None that I can think of.  While it will converge with context-neutral linear weights or ERC or ERA (or any metric without a bias) in the long run, it is NOT at all useful in making real-life decisions or predicting any future events, beyond what a context-neutral metric provides.

In fact, it is ALWAYS worse, in less than an infinite number of historical PA!

I take that back.  WPA probably captures what little clutch skill we think exists among MLB players, so that it will NOT quite converge with a context-neutral measure of performance.  In an infinite number of PA if any or every player had some unique clutch skill, no matter how small or large, it will be captured in the difference between their linear weights and what their linear weights would suggest in WPA (given their teammates, opponents, etc.).

For example, if player A and B both had the same lwts projection, then you go with the one who has the higher WPA just in case he has a slightly higher clutch skill, which we presume he does (no matter how small).

But if player A has a +5 lwts projection and player B has a +10, you go with player B pretty much no matter what their difference in WPA is for the current year or for their careers.

I suppose that there is a point at which if their context-neutral projections are close enough and their WPA difference is large enough and their PA history is large enough that you might go with the player who has the smaller projection but the larger WPA, but those situations are very rare.

Almost everything we do as sabermetricians helps us understand the game better from the standpoint of which players and teams have the better true talent and what strategies help teams to win. 

This ain’t one of them. It is kind of like an interesting (to some people) yet useless sidebar in sabermetrics.


#2    Tangotiger      2007/06/08 (Fri) @ 13:33

WPA is mostly useful for in-game analysis.  The aggregating of data has limited value (and really none if you have LWTS), unless, as MGL points out, it is carried over a long career.

However, an important measure, and one that is definitely better than LWTS is WPA/LI.

For example, we can accept that the way Ichiro bats with bases empty or runners on base is different, and the way Glavine pitches in those cases is different. 

So, we kind of like LWTS by the 24 base/out states, since the batter/pitcher are aware, for example, that the run value of a K changes wildly with a runner on 3B and less than 2 outs.  Given all that, we need to account for that.

Even more so is LWTS by game state. With the bases loaded, a walk or a HR is the same thing.  The exact same thing.  A batter knows it and the pitcher knows it.  They are both going to alter their approach to maximize their outcome.  If ARod hits a HR in this case, I don’t treat it as a “HR”.  I treat it as a certain win value change, but depress its leverage. 

That is, if the average “plus” value in a random situation is +.05 wins (+.03 for a walk, +.13 for a HR), then I would make this particular ARod HR also +.05 wins (+.05 for a walk, hit, or HR). 

You can also account for “moving runners over on outs”.

Constructing a Linear Weights chart by game state, but depressing its Leverage, is the true measure.

To put it simply, we want wOBA by Game State.

And, that is WPA divided by LI (WPA/LI).


#3    Tangotiger      2007/06/08 (Fri) @ 13:38

To make it clearer, in the situation of bases loaded, bottom 9th, tie game, two outs, the wOBA equation is this:

1.0*BB + 1.0*RBOE + 1.0*1B + 1.0*2B + 1.0*3B + 1.0*HR
---------------------------------------------------------
PA

That is of course very different from the standard wOBA.  But, that’s the actual impact of the events.  And this is the preferred, if you believe that the batter/pitchers change their approach based on the knowledge that a walk and HR are now equals.
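A small sketch of that game-state wOBA follows. The event counts are invented, and the helper name is just for this example; only the flat 1.0 weights come from the equation above.

```python
def weighted_oba(counts, weights, pa):
    """Generic wOBA-style rate: sum(weight * count) / PA."""
    return sum(weights[event] * counts.get(event, 0) for event in weights) / pa


# Bases loaded, bottom 9th, tie game, two outs: every way of reaching
# base ends the game, so every event carries the same weight.
WALKOFF_WEIGHTS = {e: 1.0 for e in ("BB", "RBOE", "1B", "2B", "3B", "HR")}

# Two hypothetical batters with 10 PA each in this exact situation.
batter_a = {"BB": 2, "HR": 1}  # two walks and a grand slam
batter_b = {"BB": 3}           # three walks

print(weighted_oba(batter_a, WALKOFF_WEIGHTS, 10))
print(weighted_oba(batter_b, WALKOFF_WEIGHTS, 10))
```

Both batters come out identical, which is the point: in this state the HR earns nothing extra over the walk.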

If you don’t believe that, if ARod is just as likely to hit a HR in this situation as in a bases EMPTY, tie game, two out, bottom of 9th (where the HR is far more valuable than a walk), then this method (WPA/LI) has no value-added.


#4    MGL      2007/06/10 (Sun) @ 00:12

I disagree that WPA/LI or wOBA or lwts by bases/outs state is an important measure.  I am sure that batters and pitchers change their approach with the game state.  However, is there any presumption that some batters or pitchers are significantly better than others at this?  If there is, then we may be able to tell from looking at these kinds of measures.  If not, then we have the same problem as with WPA - which is too much noise leading to inferences that are not true.  For example, if player A has a little better wOBA than player B but player B’s wOBA by game state is a little higher, which player would you want?  And if we think that some pitchers are better at altering their approach by game state than others, then we should be using regular old ERA rather than ERC or lwts (or wOBA) against (which we probably should in the very long run).

It all boils down to the same argument.  At what point (how many PA) do we want to use more granular (non-context-neutral) measures like WPA/LI or wOBA (or lwts) by game state in order to make future or present decisions or in order to better project a player’s value in wins?  That depends on how well the metric measures the skill we want to measure, what the spread of skill is, and the number of historical PA we are looking at.  For most things the spread of skill is so small that it becomes a mistake to use these “context-included” measures rather than the context-neutral ones.


#5    Anthony      2007/06/10 (Sun) @ 08:38

I think pitching with runners on is definitely a skill. Hitters still have to react to whatever the pitcher does, so there’s only so much they can change. But pitchers can and do change their whole approach. They change their motion (pitch out of the stretch v. windup), alter their delivery to the plate to throw off baserunners’ timing; some are much, much better at holding baserunners. I think looking at pitcher performance by base/out state makes a lot of sense; batters, not so much.

If you look at Whitey Ford’s DT card on BPro, his Delta Runs is negative in 15 of his 16 seasons. He allowed 134 fewer runs than expected based on his stat line for his career, which is .38 R/9. That would probably be significant over 3000+ innings, no?

Tom Glavine is another one: 15 of his last 18 seasons are negative for Delta Runs, and he’s -121 runs over that span (he was also +32 in his first three seasons, so -89 for his career).

Nolan Ryan had a +173 Delta Runs. He was terrible at holding runners. Go check out his stolen bases allowed each year on Baseball-Reference, then look at Whitey Ford. I think Ryan had 22 straight seasons(!) allowing more SB than Ford allowed in total from 1957 to 1967. That’s insane.

Since most starters pitch from the windup with nobody on and most relievers pitch from the stretch with nobody on...would looking at the way each set pitches with men on tell us if pitching from the stretch really makes a difference?

Along these same lines… I’d love to see Dan Fox expand his Gameday columns to look at fastballs by base/out state to see if 1) pitchers throw more fastballs with runners on first (to help the catcher throw out basestealers); and 2) pitchers throw fewer splitters (or really low pitches in general) with runners on third (to prevent run-scoring wild pitches). That’s another potentially meaningful way in which pitchers adapt to a situation.


#6    tangotiger      2007/06/10 (Sun) @ 12:40

As Andy showed in The Book, there is a definite “stretch skill”.

***

http://www.baseball-reference.com/pi/bsplit.cgi?lg=ML&team=TOT&year=2006

If you look at bases empty and man on 3B (3b only, 1b/3b, 2b/3b, bases loaded), here are how many HR and K per nonIBB walk:

BE: 0.40 HR, 2.33 K
3B: 0.26 HR, 1.94 K

And some players have a much bigger difference.  As well, when it comes to leadoff hitters especially, they certainly won’t have the same proportionate split of base/out situations.

So, at the very least, even if you don’t buy into the game-state differences, you have to look at the base-out-state differences.  You’d have to say: “given how often Ichiro had bases empty, bases loaded, man on 3b, less than 2 outs, etc, how would an average batter do?”.
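One way to sketch that “how would an average batter do?” baseline is to weight the league rate in each base-out state by the player's own PA in that state. The state labels, PA counts, and league rates below are all invented for illustration.

```python
def average_batter_baseline(player_pa_by_state, league_rate_by_state):
    """League-average rate re-weighted to one player's mix of base-out states.

    player_pa_by_state:   {state: PA the player had in that state}
    league_rate_by_state: {state: league-average rate (e.g. wOBA) in that state}
    """
    total_pa = sum(player_pa_by_state.values())
    weighted = sum(pa * league_rate_by_state[state]
                   for state, pa in player_pa_by_state.items())
    return weighted / total_pa


# Hypothetical leadoff hitter: mostly bases empty, few PA with a man on 3B.
player_pa = {"bases_empty": 500, "man_on_3b": 100}
league_rate = {"bases_empty": 0.320, "man_on_3b": 0.350}  # made-up rates

print(f"{average_batter_baseline(player_pa, league_rate):.3f}")
```

You would then compare the player's actual rate to this baseline rather than to the overall league average.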


#7    MGL      2007/06/10 (Sun) @ 19:24

Yes, Anthony, read the book that this blog is based on.

Of course another way to context-neutralize batting and pitching stats is to adjust for bases/outs just in case in the short run certain batters or pitchers had a non-typical set of bases/outs.  You have to be careful with pitchers though as a pitcher generally creates his own bases/outs states so you don’t necessarily want to “neutralize” that for a pitcher.

But we are talking about something else.  Anthony, yes we all concede that there is skill in pitching and batting in different situations.  The question is what is the spread of that skill within the population and therefore how much to regress these stats that are adjusted for game states.  If the spread of skill is minimal, as it is with clutch hitting, then the regression is a lot and we are MUCH better off not using the game state stats in order to infer that skill or to make decisions, just as we don’t want to use clutch stats (over their context-neutral stats) to make decisions about players and to infer clutch skill, unless we at least regress those stats the appropriate amount.

And yes, there will always be players at the extreme of ANY splits whether those splits have any skill (spread of skill I should say) associated with them or not.  So what?  I’m sure there are pitchers who have great stats on odd days but not on even days, etc.  First you have to determine the spread of skill then you have to take a pitcher’s splits and regress them.  You don’t look at the outliers and assume that they must be due to skill.  Even when it is intuitively obvious that skill is involved.  That is not the way it works.  We would have assumed that there was a great spread of skill in MLB as far as BABIP, but as it turns out there is not.  Also remember that skill is not equivalent to spread of skill. It is not whether we are trying to find out whether there is skill in something.  It is what the spread of that skill is.  If something has no spread of skill in MLB for whatever reason, then we have to ignore all differences in performances among players with respect to that skill.  A very important point to understand.
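To make the “spread of skill” point concrete, here is a generic shrinkage sketch. It is a standard textbook form of regression toward the mean, not necessarily the exact method used in The Book, and every number in it is hypothetical. It assumes the per-PA noise is greater than zero.

```python
def regress_to_mean(observed, population_mean, n_pa, spread_sd, noise_sd_per_pa):
    """Shrink an observed rate toward the population mean.

    spread_sd:       standard deviation of true talent across players
    noise_sd_per_pa: random standard deviation of a single PA (assumed > 0)
    The weight on the observed rate is talent variance divided by
    (talent variance + sampling variance over n_pa plate appearances).
    """
    if n_pa <= 0:
        return population_mean
    talent_var = spread_sd ** 2
    sampling_var = noise_sd_per_pa ** 2 / n_pa
    weight = talent_var / (talent_var + sampling_var)
    return population_mean + weight * (observed - population_mean)


# No spread of skill: the observed split is ignored, whatever it says.
print(regress_to_mean(0.400, 0.330, 600, spread_sd=0.0, noise_sd_per_pa=0.5))

# Some spread of skill: more PA means the observed rate gets more weight.
print(regress_to_mean(0.400, 0.330, 600, spread_sd=0.02, noise_sd_per_pa=0.5))
print(regress_to_mean(0.400, 0.330, 6000, spread_sd=0.02, noise_sd_per_pa=0.5))
```

With zero spread the estimate is the population mean no matter how extreme the split looks; with a small spread, it takes a lot of PA before the split counts for much.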


#8    Anthony      2007/06/10 (Sun) @ 22:29

Ah, yes, well...I blame Amazon for not having delivered my book by now.

I understand what you’re saying about spread of skill. My point was rather that I think this method is much more worthwhile for pitchers than for hitters. Maybe I’m wrong, but I kind of like WPA/LI for pitchers.


#9    MGL      2007/06/11 (Mon) @ 06:33

Could be, but again, before you decide how much you like a stat that is not context-neutral, at least as a predictive stat or one that implies something about true talent (rather than what just happened), you still have to determine the rate of regression vis-a-vis the spread of talent.

You can’t just assume that there is some significant spread of talent and then assume that the non-neutral stat has some predictive value.  And again, it is a matter of degree.  While Andy found in the book that there is some spread of pitching from the stretch skill, it is not enough to want to make much of any pitcher’s stretch/non-stretch splits.  Pretty much the same with almost anything but platoon splits for pitchers and LH batters.  Everything else requires a really large sample before you can infer much (from the “splits”).  IOW, I am not sure that I want to make anything of a pitcher who has a certain ERC or ERA+ but a much better or worse WPA/LI than the ERC or ERA+ would suggest.  I am talking “before the fact” (before we have a gigantic sample size) and not after the fact, as with pitchers like Glavine.

