THE BOOK--Playing The Percentages In Baseball


Tuesday, December 29, 2009

WAR v WARP3

By Tangotiger, 09:27 AM

Matt, at BPro, uses WARP3.  Everyone at Fangraphs uses WAR (Fangraphs’ WAR, which I’ll label fWAR).  Most others unaffiliated with these two sites use fWAR or Rally’s WAR (rWAR).  Some use WARP3, notably the Hall of Merit (who have also come to use rWAR).

Since WARP3 has the misfortune of not having UZR, PMR, or Dewan, when should WARP3 be used and not used?  My inclination has been to use WARP3 only (in part) on pre-Retro years, and not to use it at all in the Retro years.

Seeing that a smart guy like Matt, an outsider basically, who should have felt unconstrained to use whatever he wanted, chose to use WARP3, am I missing something?  Is there something that WARP3 offers that rWAR and fWAR are missing?


#1    Matt Swartz      (see all posts) 2009/12/29 (Tue) @ 10:08

I tested WARP3 and WAR, looked at the sum of BIS’ Total Runs Saved and VORP, and looked at QERA with similar replacement level suppositions as WAR.  The results were clear and similar each way, and I opted to go with the statistic that readers were familiar with and for which there was a link to the glossary for readers who were more interested in the question of free agent availability and rebuilding but may have been less familiar with the differences between WARP3 and WAR.  I have consistently used UZR and Total Runs Saved to evaluate defense in articles on BP, because I believe they are better.  I think WARP3 and WAR both have flaws.  WARP3 uses FRAA, and WAR regresses BABIP 0% for hitters and 100% for pitchers, but HR/FB is regressed 0% for pitchers.  I don’t like WARP3’s treatment of DIPS either.  I like each of these statistics, but seeing that each is flawed, I check multiple statistics.  I’ve used both WARP3 and WAR at StatSpeak and TheGoodPhight as well for the same reasons.  Neither statistic is computed how I would do it, but both have strengths that make them valuable tools.

Meanwhile, I rather found my conclusions interesting, especially in light of the discussion with respect to Sky Andrecheck’s $/WAR article.  I’m still wrestling with that concept, and I think concepts in this article could be expanded in light of Sky’s work.


#2    Phil D      (see all posts) 2009/12/29 (Tue) @ 10:52

WARP3 also takes into account league difficulty which I do not believe either of the WARs do. It accounts for differences between the AL and NL over one year (i.e. average AL player will be higher than the average NL player for this season). It also makes a fairly robust adjustment based on the assumption that the quality of play improves over time (i.e. the average 2009 MLB player will be higher than the average 1948 MLB player). Nonetheless, I agree with Tango that for Retrosheet years, WAR is superior. I do think the league difficulty issue, at least the AL v. NL piece of it, is one that should be addressed though.


#3    Rally      (see all posts) 2009/12/29 (Tue) @ 11:09

rWAR does account for the difference in leagues.  In recent years, an average fulltime AL player will be 22 runs above replacement, and the same stats in the NL will be 18 runs above replacement.


#4    Tangotiger      (see all posts) 2009/12/29 (Tue) @ 11:26

I think it’s important to distinguish rWAR from fWAR.

While Matt is correct regarding the pitching portion of WAR, he is referring specifically to fWAR.  rWAR I believe uses the pitching line, and then adjusts the fielders.

Excellent point on the league adjustments. I don’t know what fWAR does, but since Rally just noted he applies league adjustments similar to the way I do it, that’s another feather in rWAR’s cap.

To the extent that they all have their pluses and minuses, to me this one, “WARP3 uses FRAA,” is a huge hurdle.  FRAA makes zero use of play-by-play data.

Thanks Matt for highlighting the various points.  This was exactly what I was hoping to see.


#5    Colin Wyers      (see all posts) 2009/12/29 (Tue) @ 12:31

Matt, I really don’t like the way you’re using the word “regress” there. fWAR doesn’t “regress” pitcher BABIP. It simply credits it to fielders.

And in a retrospective metric, there is no reason for us to “regress” anything to do with hitters or a pitcher’s home run rate at all. At all. Those are real events that occurred, and we need to account for them - think of it as double-entry bookkeeping. Everything has to go somewhere. In the case of a home run, the pitcher is the ONLY player on the defense that home run can be “credited to.” Same for a ground-ball single - the hitter is the only player on the offense who can be credited with that event.

The specific issue of balls in play for pitchers is trickier, because there are nine people (conceivably) that can share responsibility for that. A DIPS-like metric is going to give the pitcher some “credit” for hits allowed based upon his BIP rate, and then the fielders will receive the “credit” for hits above/below the league average rate.

This does not reconcile as well as we might like, because UZR controls for BIP distribution - in other words you can split hits/runs on BIP into two parts:

1) Difference between hits allowed by an average team with average BIP distribution versus hits allowed by an average team with this BIP distribution

and

2) Difference between hits allowed by an average team versus this team given the same BIP distribution

UZR only reports the second component - the first component is simply “lost” using the fWAR method. (And of course you lose all sequencing information as well.)
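The two components above can be sketched numerically. This is only an illustration under stated assumptions: the league hit rates, the league BIP mix, and the example staff are made-up numbers, and `split_bip_hits` is a hypothetical helper, not how UZR or fWAR is actually computed.

```python
# Hypothetical league-average hit rates by batted-ball type (illustrative only).
LEAGUE_HIT_RATE = {"GB": 0.24, "FB": 0.14, "LD": 0.72}
# Hypothetical league-average share of balls in play by type.
LEAGUE_BIP_SHARE = {"GB": 0.45, "FB": 0.36, "LD": 0.19}


def expected_hits(bip_counts):
    """Hits an average defense would allow on this mix of balls in play."""
    return sum(LEAGUE_HIT_RATE[t] * n for t, n in bip_counts.items())


def split_bip_hits(bip_counts, actual_hits):
    """Return (distribution_component, fielding_component), both in hits.

    distribution_component: average team with this BIP mix versus an average
        team with the league-average mix (component 1).
    fielding_component: this team versus an average team on the same BIP mix
        (component 2, the part a UZR-style metric reports).
    """
    total = sum(bip_counts.values())
    average_mix = {t: share * total for t, share in LEAGUE_BIP_SHARE.items()}
    with_this_mix = expected_hits(bip_counts)
    distribution_component = with_this_mix - expected_hits(average_mix)
    fielding_component = actual_hits - with_this_mix
    return distribution_component, fielding_component


# A made-up ground-ball-heavy staff: 1000 balls in play, 290 hits allowed.
dist, field = split_bip_hits({"GB": 550, "FB": 300, "LD": 150}, 290)
total_vs_average = dist + field  # hits relative to an average team, average mix
```

With these made-up inputs the staff's batted-ball mix is worth roughly 13 fewer hits than average while the fielders allowed 8 more than expected. A system that reports only the second number would leave the first one unassigned, which is the "lost" component described above.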

Rally’s method, on the other hand, gives us sequencing and BIP distribution, but I don’t think it’s doing anything useful to separate the pitcher/fielder gap. Why? Rally is using prorated team-season defensive ratings to adjust each pitcher’s runs allowed. The problem is that not all pitchers receive the same defensive support, any more than they all receive the same offensive support.

By Rally’s method, a pitcher on a good defensive team who receives poor defensive support gets doubly-whacked - his RA is larger than his actual contribution merits, due to his poor defensive support. Then it’s further inflated relative to his actual performance, because it’s presumed he actually had good defensive support.
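A worked example may make the "doubly-whacked" case concrete. Every number here is hypothetical, and `defense_neutral_runs` is an assumed simplification of a prorated adjustment (add back the runs the fielders are believed to have saved), not Rally's actual code. It also assumes the pitcher-specific support is known, which is exactly what is disputed later in the thread.

```python
def defense_neutral_runs(runs_allowed, fielder_runs_saved):
    """Add back runs the fielders saved (or remove runs they cost)."""
    return runs_allowed + fielder_runs_saved


# All values below are hypothetical.
RUNS_ALLOWED = 90          # the pitcher's actual runs allowed
TEAM_RUNS_SAVED = 40       # team defense over the season (positive = good)
PITCHER_BIP_SHARE = 0.15   # pitcher's share of the team's balls in play
ACTUAL_SUPPORT = -5        # runs saved behind this pitcher (negative = poor)

# Prorated team rating: the support the pitcher is presumed to have had.
presumed_support = TEAM_RUNS_SAVED * PITCHER_BIP_SHARE

prorated_estimate = defense_neutral_runs(RUNS_ALLOWED, presumed_support)
pitcher_specific = defense_neutral_runs(RUNS_ALLOWED, ACTUAL_SUPPORT)

# How many runs the prorated method charges beyond the pitcher-specific view.
overcharge = prorated_estimate - pitcher_specific
```

Under these assumptions the pitcher is charged 96 defense-neutral runs instead of 85: five runs because the poor support is already in his RA, and six more because good support is presumed.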


#6    Rally      (see all posts) 2009/12/29 (Tue) @ 12:31

I thought Baseball Prospectus was implementing a PBP fielding system.  There was a bit on it in last year’s book. Did they scrap it?  Or if they are using it, where do they use it?


#7    Tangotiger      (see all posts) 2009/12/29 (Tue) @ 13:15

I agree with Colin regarding the “accounting”.  When you do something retrospective, you “assign” the events to the player(s) involved, regardless as to how much skill the players had.

Reaching on error, catcher’s interference, homeruns, walks from Rick Ankiel, whatever.  You assign the bases, outs, runs, wins, so that it all adds up.  And, if you don’t know who to give it to for whatever reason, you create a “bucket” for it (call it whatever you want, “timing” for example).

Now, Colin brings up the point that you “know” the Mariners 2009 had great fielders but you don’t know if (GBer) Felix Hernandez actually benefitted from those fielders any more or less than say (FBer) Jarrod Washburn.  And even if you had two FB pitchers, how do you know they both got the same benefit from Gutierrez, unless you actually add up the UZR at the PBP level (i.e., PZR).

I think it’s a fair enough point from Colin.  It’s also a fair enough thing for Rally to do to portion them out in some fashion, such that some adjustment is better than none.

This is no different than park factors.  We don’t “know” if Larry Walker and Todd Helton were actually helped by Coors (or if they were helped, maybe they were helped less than others).  We can estimate it on an annual basis, and we know more if we have career stats.  But, in the year 2000, we don’t know how much Coors helped them that year, so we assign generic value across the board.

If we’re going to ding rWAR for the generic fielding adjustments, we’re going to ding every metric out there for the generic park adjustments.

So, Colin is right to bring it up.  But, you can’t single rWAR out for this issue.


#8    Matt Swartz      (see all posts) 2009/12/29 (Tue) @ 13:17

Colin-- good point.  That’s not really what I wanted to say.  I wanted to say that Fangraphs does not assume the pitcher is responsible for BIP at all, while crediting batters for all of it via wOBA (unless it’s adjusted based on opposing defense and pitcher?) and fielders for most of it too via UZR, which kind of double counts the credit too.  I also don’t like using just one DIPS estimator for pitchers seeing as they all currently have flaws, especially if the pitcher’s entire value is based on the DIPS estimator.  I guess the issue is that I think most of HR/FB (excluding PU) is the responsibility of the batter.  I don’t know that the pitcher does much to determine whether the ball goes 300 or 400 feet on the fly, and that’s why park-adjusted HR/FB has similar persistence to park- and defense-adjusted BABIP.  I guess I would say I have an issue in that I separate persistence and control as two different things.  I blame the pitcher for tipping his pitches and giving up the ensuing double in the gap.  I don’t think it’s an indicator he’s going to tip his pitches the following year any more than anyone else.

The whole point is this-- look how complicated these metrics all are.  I don’t like using just one if I have the time and ability to check several.

Rally, I’m not sure if there is someone working on a PBP fielding system.  I know Dan Fox took SFR with him to the Rays but allowed usage of PADE.


#9    Tangotiger      (see all posts) 2009/12/29 (Tue) @ 13:34

Dan Fox = Pirates = SFR
James Click = Rays = PADE (team-level metric, not much different than what is done by others)


#10    Colin Wyers      (see all posts) 2009/12/29 (Tue) @ 13:34

Tom, I don’t think the comparison with park factors is apt here.

For instance, let’s look at Juan Pierre. He played three years in Colorado pre-humidor and put up a line of .308/.356/.371. That’s about a third of his career. His career line is .301/.348/.372. So Pierre at Coors played pretty much like Pierre everywhere else, despite the fact that Coors was one of the more extreme hitters parks ever.

And we have good, sensical baseball reasons to think that this isn’t just a sampling fluke - Pierre’s hitting style simply isn’t all that affected by what happens at Coors.

So given that information, what park factor should we use - in a retrospective value metric - for Juan Pierre’s performance at Coors? The Coors Field park factor.

Why is that? Even though the Coors Field park factor doesn’t affect Pierre all that much, it certainly affects his OPPONENTS a lot. And so even if Pierre is contributing runs to his team at the same rate as he would anywhere else, those runs are still contributing to fewer wins because of the run environment.

The fact that Gutierrez picks up a lot of fly balls for Washburn has zero impact on how many runs Hernandez gives up or how those runs translate to wins, however. And so in this case it’s correct for us to adjust each separately.


#11    Rally      (see all posts) 2009/12/29 (Tue) @ 14:13

I will not make the assumption that if two pitchers on the same team have BABIP of .330 and .270, that the fielders played like Ichiro behind one and stood like Adam Dunn for the others.

They might have played better defense behind one guy, but we don’t know it, and it’s likely that if they did the gap is much less than the observed 60 points.

If we can accept the argument MGL often makes that one full season is still too small a sample to make definitive statements about defense, how could we possibly make conclusions based on what they do in 150-200 innings behind a certain pitcher?

If I were to change anything, I’d use multiyear fielding data to adjust the defensive support for pitchers.  I’m not going to do this because of the time and effort that would be required, but that’s the direction I’d lean.

As to the Ricky Nolasco question, my position is he’s a talented pitcher who projects well for the future.  But he wasn’t worth much in 2009.


#12    Colin Wyers      (see all posts) 2009/12/29 (Tue) @ 14:16

Matt, you’re touching on a lot of topics that quite frankly for these purposes we shouldn’t care about. Yes, a lot of this comes up when we want to talk about what a pitcher will do tomorrow. But it really isn’t important when we look at what he did yesterday.

Once you come to the conclusion that in the aggregate your runs scored HAS to equal your runs allowed (that is, when looking at all 30 teams) it becomes a lot simpler to see what you have to do.

For a home run, it’s really quite simple - a home run was given up. Someone for both sides (offense and defense) gets credited for that home run. Nobody but the pitcher can be credited with that home run for the defense.

Now, is that home run his “fault”? Nobody really knows. But we do know that it happened. So we account for it. Now if we want to figure out how many home runs he’ll give up in the future, by all means we can drag out the batted ball data and regress things to the mean and all that fun stuff. But we don’t need a projection system to tell us what HAPPENED.

When you get to balls in play, the dynamic changes, because now there are multiple players with accountability. Once you start using a system - any system, be it FRAA, UZR or whatever else you want - to split defensive responsibility among fielders you need to (for accounting purposes) reconcile that with the credit you are giving to pitching.

So, let’s say a pitcher gives up a ground ball. Say that grounder becomes a hit 33% of the time (I’m just using numbers to illustrate here). So we “credit” the pitcher with a third of a hit, roughly. If a fielder makes a play on the ball, he gets credit for “preventing” a third of a hit. If no fielder makes a play on the ball, then the responsible fielder(s) get “credit” for allowing two-thirds of a hit.
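The grounder example can be written as a tiny double-entry ledger. This is a sketch of the bookkeeping idea only: `credit_ball_in_play` is a hypothetical helper, and the one-third hit probability is the illustrative number from the comment, not a measured rate.

```python
def credit_ball_in_play(hit_probability, became_hit):
    """Split one ball in play between pitcher and fielder(s), in hits.

    The pitcher is charged the average hit probability for this type of
    ball. The fielder(s) are charged the difference between what actually
    happened and that expectation, so the two entries always sum to the
    real outcome (1 hit or 0 hits).
    """
    pitcher_charge = hit_probability
    outcome = 1.0 if became_hit else 0.0
    fielder_charge = outcome - hit_probability
    return pitcher_charge, fielder_charge


P_HIT_GROUNDER = 1 / 3  # illustrative number from the comment

# Fielder makes the play: pitcher +1/3 hit, fielder -1/3 hit (a hit prevented).
play_made = credit_ball_in_play(P_HIT_GROUNDER, became_hit=False)

# Nobody makes the play: pitcher +1/3 hit, fielder(s) +2/3 hit allowed.
no_play = credit_ball_in_play(P_HIT_GROUNDER, became_hit=True)
```

Either way the pitcher's entry is the same; only the fielders' entry depends on the result, and the ledger reconciles to what actually happened on the play.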

So that’s where DIPS-like pitcher metrics come into play. FIP is smarter here than I think a lot of people give it credit for; a pitcher’s BIP rate is not explicitly considered but if you tear it apart and see how it works you’ll figure out that a pitcher’s BIP rate IS implicitly considered - a pitcher who allows more BIP will see that accounted for in his FIP ERA.

The problem with our DIPS estimators (in this case) isn’t really the DIPS aspects but the fact that they’re all component ERAs, and thus remove information like sequencing that is important in a retrospective evaluation.


#13    Matt Swartz      (see all posts) 2009/12/29 (Tue) @ 14:26

I don’t see why the pitcher needs to be credited for the run, per se.  We don’t credit anybody with the sequencing of hits, so if a pitcher pitches much worse with men on base than with bases empty, we see that as luck and don’t include that even though we don’t shift the blame to anybody else.  A decision needs to be made with respect to cutoffs.  If HR/FB shows the same persistence as pitcher BABIP (all adjusted for team/park), then treating them differently seems odd.  If you add up all of team’s WARP3 or WAR or whatever, you still don’t get their actual win total minus replacement level.  Sequencing luck is removed.  I’m advocating consistency with respect to which types of luck are removed, and it’s still a judgment call.


#14    Colin Wyers      (see all posts) 2009/12/29 (Tue) @ 14:30

As far as what play-by-play metrics BPro is using for defense:

* SFR is gone, sadly.
* There is a PBP metric, similar to SFR/TotalZone, that was developed for the ‘09 book. I do not know off the top of my head if that is used for any of the DT stats right now.

I am currently working on a defensive metric that I feel will be competitive with UZR/Fielding Bible/etc. for recent years. As that gets further along we’ll see how well y’all think I’ve done in that department.


#15          (see all posts) 2009/12/29 (Tue) @ 14:32

Colin is right about the park factors analogy.  We only care about the run scoring environment the park creates, not how it affects a player individually.  We also don’t knock Youkilis for using his park to his advantage.  We only care about these things if these players are going to be leaving the team.  So yeah, the analogy doesn’t really work.

On another subject, why do we need to “give credit to” everyone so that it adds up perfectly?  If we assume (this may not be a good assumption) that a pitcher with a high HR/FB rate has pitched against higher than average opponent performance, then shouldn’t that be adjusted?  A HR should be partly credited to the batter and partly to the pitcher.  If we think that HR/FB has more to do with the batter than the pitcher, wouldn’t it be more accurate to regress that, or even to give ALL the credit to the batter (xFIP instead of FIP)?  I know people like xFIP or tRA* for projecting players, but why can’t they be used when determining value?


#16    Rally      (see all posts) 2009/12/29 (Tue) @ 14:36

“The fact that Gutierrez picks up a lot of fly balls for Washburn has zero impact on how many runs Hernandez gives up or how those runs translate to wins, however.”

What I do is use the team total of defensive runs and prorate to the pitcher’s BIP allowed.  It would be better to have separate adjustment for infield and outfield defense and apply to FB/GB rates, but if you don’t take that extra step the effect is minimal.

Felix allowed 53% ground balls, Washburn 36%.  If the team had an exactly average defense but was composed of 3 Ichiros (Threechiro?) in the OF (+60) and 4 Yunis in the infield (-60), then the way the math works out Felix should be credited as having -3 runs defensive support and Washburn +3 runs.  And that’s using an artificial extreme example.  In the real world you’re talking a run here and there for pitchers.
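The extreme example can be reproduced approximately with a split infield/outfield proration. The ground-ball rates and the +60/-60 ratings come from the comment; the team ground-ball rate (44.5%) and each pitcher's share of team balls in play (15%) are assumptions chosen for illustration, and `defensive_support` is a hypothetical helper rather than Rally's actual calculation.

```python
def defensive_support(bip_share, gb_rate, team_gb_rate, infield_runs, outfield_runs):
    """Runs of defensive support behind one pitcher.

    Infield runs are spread evenly over the team's ground balls and
    outfield runs over all other balls in play; the pitcher receives
    each in proportion to how many of each type he allowed.
    """
    infield_part = gb_rate * infield_runs / team_gb_rate
    outfield_part = (1 - gb_rate) * outfield_runs / (1 - team_gb_rate)
    return bip_share * (infield_part + outfield_part)


# From the comment: three Ichiros in the outfield, four Yunis in the infield.
OUTFIELD_RUNS = 60
INFIELD_RUNS = -60
# Assumptions (not from the comment): team GB rate and each pitcher's BIP share.
TEAM_GB_RATE = 0.445
BIP_SHARE = 0.15

felix = defensive_support(BIP_SHARE, 0.53, TEAM_GB_RATE, INFIELD_RUNS, OUTFIELD_RUNS)
washburn = defensive_support(BIP_SHARE, 0.36, TEAM_GB_RATE, INFIELD_RUNS, OUTFIELD_RUNS)

# Flat proration of the team total (60 - 60 = 0) would give both pitchers zero.
flat = BIP_SHARE * (INFIELD_RUNS + OUTFIELD_RUNS)
```

Under these assumptions the split comes out to about -3 runs of support for Felix and about +3 for Washburn, in line with the figures quoted above, while the flat team-total proration gives both pitchers zero.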


#17    Rally      (see all posts) 2009/12/29 (Tue) @ 14:37

Colin, what’s the source?  MLB gameday data?


#18          (see all posts) 2009/12/29 (Tue) @ 14:38

Another way to look at what I’m asking -

We are looking at wins above replacement.  We can certainly estimate what a replacement level pitcher would do in the same shoes as a player, but we certainly don’t know.  When a player has a high HR/FB rate, wouldn’t that make it more likely that the replacement player would have a high HR/FB rate, thus increasing his value?


#19    Tangotiger      (see all posts) 2009/12/29 (Tue) @ 15:11

Even this statement is unambiguous:

“For a home run, it’s really quite simple - a home run was given up.”

But, many systems will count that as “1.4” runs allowed, when it could very well have been a solo shot or a grand slam.  So, again, more “adjustments” being made here in terms of accounting: you don’t have pbp data, and therefore, you err on the side of average.  Which is pretty much Rally’s point regarding the fielding behind pitchers.

***

Rally: good illustration. 

To the knuckleheads who question the use of extreme examples: what Rally did was highlight the extent of impact by using the extreme examples.  This is what I often do.  Not a way to validate something, but to highlight and expose the limits of something.  Nothing can be simpler or clearer than doing what Rally did.


#20    Colin Wyers      (see all posts) 2009/12/29 (Tue) @ 15:56

Rally, I’d say probably. I really can’t say any more because I really don’t know any more. There are still a lot of internal debates (as in, internal to myself) going on and I’m not sure how I want to resolve all of those issues yet.

I really don’t want to spend a lot of time discussing a metric that doesn’t exist yet; I just wanted to acknowledge that we’re aware of the issue Tango pointed out and that we’re taking steps to resolve it.

Obviously in the interim what is is what is, and so everyone can decide what to use accordingly based upon that.

=====================

“I will not make the assumption that if two pitchers on the same team have BABIP of .330 and .270, that the fielders played like Ichiro behind one and stood like Adam Dunn for the others.”

I don’t think that it’s absurd to acknowledge that sometimes that does happen, though, just as sometimes a team will hit like Dunn for one pitcher and hit like Ichiro for another.

I think we can acknowledge that given current data it’s impossible to say for sure what transpired with the fielders behind any given pitcher for X sample of innings where X. The question is if using team data as you do is a better representation than using just the data from that one pitcher. Using more data obviously gives us more information about that fielder’s real level of talent, but knowing what happened behind Washburn really doesn’t tell us what happened behind Gutierrez.

=====================

Sequencing is a real problem, yes, and it’s not one that we’ve come a long way in addressing here. I’m going to simply punt the issue for now, because I don’t have the time at the moment to write out my thoughts on it without making it sound like something from Donnie Darko. But I promise to come back to the issue at some point.


#21          (see all posts) 2009/12/29 (Tue) @ 16:15

What system does WARP3 use to measure defense?  Does it use any?


#22    Tangotiger      (see all posts) 2009/12/29 (Tue) @ 16:28

FRAA, which uses seasonal data, and applies generic adjustments based on LH/RH makeup of team’s pitchers.  No play-by-play data.

As I said, a HUGE hurdle for me.


#23    Colin Wyers      (see all posts) 2009/12/29 (Tue) @ 16:30

WARP3 uses Fielding Runs Above Average, a metric developed by Clay Davenport that uses official fielding stats (putouts, assists, etc.) without any play by play information. You can read more about it here:

http://baseballprospectus.com/article.php?articleid=73

(That article is open to nonsubscribers.)

Clay has also developed a version of FRAA that uses play-by-play data, similar in conception to Rally’s TotalZone system, or Fox’s SFR system. (Or for that matter, my SZR system.) You can read more about it in the BP ‘09 book. It’s been discussed previously on this site here:

http://www.insidethebook.com/ee/index.php/site/comments/intentionally_using_less_data/


#24          (see all posts) 2009/12/29 (Tue) @ 20:05

Sounds like they should switch it to use the version of FRAA which uses play by play. 
Using traditional fielding statistics isn’t going to get you anywhere.

Mariners were 1st in UZR and had the 7th most errors.
Rays were 2nd in UZR and had the 10th most errors.
The next two best defensive teams, Giants and Tigers, were middle of the pack in errors.


#25    Colin Wyers      (see all posts) 2009/12/29 (Tue) @ 20:29

Alex, if you’ll read the article linked you’ll see that it’s not just a warmed-over fielding percentage. It’s a step up from things like Palmer’s Defensive Runs and in the same ballpark as Defensive Win Shares or any other adjusted range factor.


#26    Alex Krolewski      (see all posts) 2009/12/30 (Wed) @ 00:49

Rally/11:

“If we can accept the argument MGL often makes that one full season is still too small a sample to make definitive statements about defense, how could we possibly make conclusions based on what they do in 150-200 innings behind a certain pitcher?”

If a fielder plays 162 games a year, that’s 1458 innings.  A pitcher with 200 innings is equivalent to 200*7 = 1400 defensive innings, because he has 7 fielders behind him.  Therefore using something like PZR to measure the defense behind a starting (not relief) pitcher is similar to using UZR to measure a player’s defense for one year.
As a result I don’t think we should be so quick to dismiss a PZR-like metric, since it should be more accurate for a single season (although it hardly matters over an entire career).


#27    Kincaid      (see all posts) 2009/12/30 (Wed) @ 02:24

I agree with Steven/15/18.  When we talk about retrospective value, it’s almost always about production relative to some baseline (i.e. average, replacement, etc).  If one pitcher gives up 50 runs in 150 innings, and another gives up 60 runs in 150 innings, we know who gave up more runs, but we don’t know who provided more value without knowing what the baseline for each pitcher was.  I agree that stats like xFIP and tRA* also tell us something about retrospective value because the adjustments they make, while they don’t necessarily tell us more about what the pitcher himself gave up, can tell us something about what a baseline pitcher would have given up.  Like Steven says, that also affects value, so I think that is something we should care about even in talking about retrospective value.


#28    Alex Krolewski      (see all posts) 2009/12/30 (Wed) @ 23:24

Should players be credited if their team overperforms or underperforms its Pythagorean W-L?  This “accountability” is a key part of Win Shares but isn’t part of any of these 3 systems (as far as I know).  Just as rWAR ensures that estimated runs equal actual runs, shouldn’t we also try to account for some of the variation between runs and wins?  Of course I realize the WS method is very crude since it gives equal credit to all players if a team overperforms its Pythagorean record.  However, can’t we give some credit to a team’s bullpen if it’s good in the clutch, or some credit to pitchers for being inconsistent?
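For reference, the gap in question can be sketched with the basic Pythagorean formula. The exponent of 2 is the classic form (other exponents are in common use), the team totals are made up, and `wins_vs_pythagorean` is a hypothetical helper name.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_vs_pythagorean(wins, games, runs_scored, runs_allowed):
    """Actual wins minus Pythagorean expected wins."""
    expected_wins = games * pythagorean_win_pct(runs_scored, runs_allowed)
    return wins - expected_wins


# Made-up team: 90 wins in 162 games, 750 runs scored, 700 runs allowed.
gap = wins_vs_pythagorean(wins=90, games=162, runs_scored=750, runs_allowed=700)
```

This made-up team finishes about 3.4 wins ahead of its Pythagorean record. That residual is the piece a runs-based system leaves unassigned, and the piece Win Shares spreads evenly across the roster.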


#29    Tangotiger      (see all posts) 2009/12/31 (Thu) @ 11:50

WPA definitely accounts for it, and WPA/LI to a large enough extent does as well.

