
THE BOOK--Playing The Percentages In Baseball


Tuesday, June 21, 2011

Confused about WAR

By Tangotiger, 07:52 AM

This fellow seems to have his heart in the right place.  However, he’s all over the place in terms of trying to get a grasp of WAR, what it means, why Fangraphs and B-R.com are different, and a host of other puzzling statements.

I’ll try to get to these in the morning.  I am thankful that he made his post, because I think there must be tons of people as confused as he is, and it gives me something to work with.

Plus, awesome blog name.

UPDATE:

I have successfully deleted this post twice already because of the amazing functions of undo and autosave.  Regardless, I hope to incite a forum on some of the sabermetrics that are becoming more ubiquitous as time passes. I have read Tom Tango’s book showing how wOBA is better than AVG, OBP, OPS, etc.

Kind of an odd takeaway from The Book.  But, that doesn’t seem to be the issue at hand, so let’s skip that.

Of course there are the constant stream of intermediaries that people use to calculate these statistics, but the one that I’m most hesitant of is WAR.  For those that don’t know, WAR (Wins Above Replacement) is an all-encompassing statistic that essentially determines how much a given player is worth.  This includes offensive and defensive analyses. 

Rather than say how much a given player “is worth”, let’s say “WAR is the number of wins that the player’s past performance has been credited with”.

I don’t like the idea of how “manufactured” the stat is because it’s essentially an average of an average of an average, etc.

I have no idea what “average of an average” means, nor the “etc” part.  Let’s throw this sentence out the window.  The blogger is trying to learn, but I think he’s reaching here for something.

And each statistic that is used in its calculation has limitations and assumptions, which aren’t usually discussed.

EVERY metric has limitations and assumptions, which aren’t usually discussed.  OBP values a walk and HR equally.  No one talks about this either.  SLG has HR at 4 and single at 1, and that’s not discussed.  Let’s not set a higher standard for WAR.

I see how it can describe how “valuable” a player was to his team last year, but can it really help when it comes to a player being traded or picked up?

Ah, excellent.  Now, we have something to talk about.  Can’t we say the same thing about OBP or ERA?  By definition, every performance metric measures past performance.  That’s what the stat is.  If you want to know about the future value of the player, we need to INTERPRET that metric, be it WAR or any other metric. 

First thing you have to figure out is: what is the metric actually trying to do.

Or can you simply add the WAR of each player on a team and predict the playoffs for the following year (and maybe the World Series teams)?  I don’t think it can stretch that far.

No, you can’t do that.

The data I have below (which I can’t format well for the life of me) are total WAR for each team last year.  Now, of course the better teams have better WARs since they were better.  The reasoning is a bit circular which I think makes it robust for past analyses but not as useful for the future.

Right, if you are stuck on an unadjusted metric, it’s hard for it to be useful for the future.  Same as any other metric.

Anyway, let’s look at them and see how well it did.  The first table is from Baseball-Reference.com, and the second is from Fangraphs.com (WAR is also calculated differently at different places, another reason I’m not too high on it).

How well they “did”?  Did at what?

As for different calculations: that’s why I call them rWAR and fWAR to show that they are in fact different calculations.  They are part of the WAR family.  Is it that hard to get past it?

I have them listed as batWAR, pitWAR, and Team WAR.  These are the sum of the WARs for each individual position player (batWAR), pitcher (pitWAR), and collective team (Team WAR), respectively.

I can’t seem to write below these, so I apologize for any scrolling that’s necessary.  If you look closely, there are some discrepancies.  First off, the Fangraphs.com values are higher in general than the Baseball-Reference.com ones.

fWAR is higher than rWAR because fWAR uses a lower replacement level.  There’s nothing wrong in either case.  Just a reasonably justifiable choice by both systems.

And Fangraphs had the Twins as the best team in baseball.  Baseball-Reference had them 5th.  Seems to be a decent drop.

Here is probably where the big difference rests: rWAR tries to account for all runs scored and allowed.  fWAR does not do that.  Basically, rWAR tries to apportion the luck to the players involved, while fWAR largely ignores the luck aspect.

It’s a choice.

Anyway, as a comparison sake, I would say that Baseball-Reference better encompassed the results of last year so I’ll talk about it mainly.  I just wanted to show the difference between the sites.

To the extent that luck is a result, and you need to see that luck somewhere somehow, then rWAR would be the better choice.  In this particular instance.

Something that first strikes me as interesting is that the Yankees had a better WAR than the Rays in both systems, but Tampa won the division.  That seems to be interesting.  I can see how WAR would fail when comparing teams that didn’t have much of an effect on the other, but to me, it seems odd that Tampa was not 1st in its division’s WAR from either site.

I don’t find that interesting at all, nor is it even a requirement of anything really.  The Rays scored 23.6% more runs than they allowed.  The Yankees were at 24.0%.

If rWAR or fWAR were more interested in capturing the luck of wins, then, sure, you’d have a case to make.  But, that’s not what they are about.

Something impressive from BR (Baseball-Reference) is that the 8 playoff teams were in the top 9 in WAR.  Only Boston (who was impressively 4th, meaning the AL East had 3 of the top 4 WAR teams last year) didn’t make the playoffs within the top 9 WAR teams.  So, this measure pretty well “predicted” the playoff teams.  FG (Fangraphs) didn’t do as well.

The use of “predict” here is very wrong.  When you “predict”, you are making an estimate of a future event.  In this case, rWAR is simply representing the runs scored and allowed by the team, and distributing them to the players.  Obviously, the teams that make the playoffs will be predisposed to be those teams that score a lot more runs than they allow.

This is another instance of the blogger wanting to learn, but he is stuck on something that he should get out of.

Something else that’s interesting is that of the 8 playoff teams, the Giants had the best pitching WAR according to BR.  Seems to coincide with the old belief that pitching is everything in the playoffs.

Again, he’s grasping.  n=1.

Actually, if you look closer, within each series from the playoffs, the team with the better pitching WAR won the series.  That makes me feel more comfortable about the statistic, but again, these calculation included the successful pitching of those teams so it’s circular.  However, it does seem promising.

No, you should forget about all this.  None of this is relevant in discussing WAR.  It’s fun trivia, but ultimately meaningless in validating WAR.

But I would like to have people talk about these context-neutral statistics.  WAR is normalized based on the replacement player of that year, so it’s supposed to be comparable across time and leagues.

Eh, sorta-kinda.  It compares players to that year’s baseline.  Whether that baseline player is identical across time and leagues is debatable. 

However, wouldn’t the context change if that player were to change teams?  They would play around different defenders which can take away plays from them or cause problems.  The new pitching staff could affect a players defense.  The ballpark obviously has an effect.  And you see different pitching more than likely changing your ability to hit to some degree.  Does this not seem to matter?

The exact same thing can be said of any metric.  Again, the blogger is grasping here, looking for chinks in an armor.

Also, WAR takes into account some form of fielding statistic, and all of the fielding statistics seem to be a bunch of magic. 

Granted, they “seem” like a bunch of magic.  But, they have a logical, rational basis.

I’m not saying I know a better way, but not much can be quantitative. Anyway, please respond with thoughts about these statistics and what you feel is successful and appropriate in many discussions.  I just feel a bit hesitant, but maybe someone can help ease my discomfort.

Take care.

That first sentence is the key: if you want to discard WAR, and you STILL want to have an opinion, then what do you do?  Well, you come up with your own flimsy, half-rational metric, without any internal consistency.  You’ll look at someone’s OBP and SLG, maybe his SB, look at his park, apply some visual observations of his fielding and how he looks at bat, see how his team did and say “Yeah, Ryan Howard is pretty good.” That’s really all you are going to do.  And the more you try to do, the more rigid you make your system, the more consistent you try to make your ideas, the more logic you apply, well… congratulations, because you are on the path to WAR.

It’s almost like you don’t want to go to WAR, and are trying to figure out how to do it your own way.  When your way is simply a circumventing of WAR.  And eventually, the more you do the work, the more you realize that, “yup, that WAR is what I’ve been doing all along”.

Really, it’s not like I just came in and said: “This is WAR and this is how it’ll work.” This was a long process to get to where we are.  And, if we have to change things, we will.  This is not some religion.  It’s a result.

And if you don’t want to use WAR, then use whatever else you want to use.  But when you are challenged on logic and rationality, then, please, be kind enough to explain yourself.  Don’t just say “this sucks” without offering an alternative.  That’s what politicians do.  Challenge the logic, and the rationale.  That we can talk about.


#1    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 10:34

I updated the post.


#2    Tom N.      (see all posts) 2011/06/21 (Tue) @ 10:47

Tango, I was wondering why it is that we use replacement-level as a basis of measuring value rather than average?

It seems to me like trying to determine the performance of a hypothetical “replacement” player adds potential subjectivity, whereas an average is 100% objective. I know the whole “zero-cost replacement” argument, but it just always struck me as so subjective, particularly in a field that strives to be as objective as possible.

So, why do we use WAR rather than WAA? Is there something about averages that makes using them inappropriate in this instance?


#3    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 10:52

Actually, you don’t have to use replacement level if you don’t want.  You can still use the core of WAR and just not do the last step.

This is WAR:

WAR
= wins above average (hitting)
+ wins above average (running)
+ wins above average (fielding)
+ wins above average (position)
+ wins above replacement (playing time)

Just don’t do that last step if you don’t care about playing time.

If all you want to know is:
“how many wins can we attribute to this player, GIVEN his playing time, compared to an average player”

Then you want wins above average.

If what you want to know is:
“how many wins can we attribute to this player, GIVEN his playing time, compared to a bench-level player”

Then you want wins above replacement.

BOTH are valid questions.  Pose your question, and then a proper course of action will present itself.
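
The breakdown above can be sketched in a few lines of Python.  This is only an illustration of the structure: the class name, the field names and the sample numbers are made up, and neither Fangraphs nor Baseball-Reference computes its components this simply.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    # The first four fields are wins relative to an average player.
    # playing_time is the replacement-level credit for playing time.
    hitting: float
    running: float
    fielding: float
    position: float
    playing_time: float


def wins_above_average(p: PlayerSeason) -> float:
    """Sum the four above-average components and skip the last step."""
    return p.hitting + p.running + p.fielding + p.position


def wins_above_replacement(p: PlayerSeason) -> float:
    """WAA plus the playing-time term, as in the breakdown above."""
    return wins_above_average(p) + p.playing_time


# Hypothetical numbers, for illustration only.
example = PlayerSeason(hitting=2.0, running=0.2, fielding=-0.5,
                       position=0.3, playing_time=2.0)
print(round(wins_above_average(example), 2))      # 2.0
print(round(wins_above_replacement(example), 2))  # 4.0
```

The only difference between the two functions is the playing-time term, which is the "last step" you can drop if your question is about average rather than bench-level players.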


#4    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 10:54

MGL for example doesn’t care about playing time at all (other than to use it for regression purposes).  He sets everyone to wins per 150 G or runs allowed per 9IP.

The reason is that playing time should be linked to talent, so, you are STILL going to have the same ordinal ranking anyway, whether you do it MGL’s way or otherwise.

talent x playing time is what you get paid for however


#5    Tom N.      (see all posts) 2011/06/21 (Tue) @ 12:02

Cool, thanks for the clarification!


#6          (see all posts) 2011/06/21 (Tue) @ 12:52

When we are looking at personnel moves, trades, drafts, free agents, etc., we need to use replacement level as the floor above which we measure value.  We would never want to be paying for expected below replacement performance.

The frequent threads here in the off-season show the utility and importance of this method of analysis.  Since below average but above replacement performance has positive value, the math involved in these moves would be much worse if we worked from WAA instead of WAR.


#7    Pierre      (see all posts) 2011/06/21 (Tue) @ 12:59

Please correct me if I’m wrong and it’s not so much an issue with the stat as how it’s used, but can’t the definition of “replacement level” affect how players are rank-ordered?  I.e. the lower the replacement level, the more PT/health/longevity is “rewarded”.  This bothers me when I see career WAR used as the basis of HOF discussions.  Mike Mussina v Pedro Martinez is a good example of what I’m talking about…


#8    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 13:49

Pierre/7: exactly what is the question you are seeking an answer to?  Then we can tell you what method to use.


#9    Pierre      (see all posts) 2011/06/21 (Tue) @ 14:09

well, right, and I’m thinking that if the question is “who had a better career?” that WAA makes more sense.  I.e. in the short run, the absent star will be replaced by Willie Bloomquist, but over time the PAs will go to Alberto Callaspo or Jed Lowrie or someone.  My point is that if you use WAA rather than WAR, you may well get a different answer.


#10    Arvin Hsu      (see all posts) 2011/06/21 (Tue) @ 14:11

Pierre is pointing out that the value at which the system sets replacement level will alter career value gained calculations disproportionately based on PT.  This therefore affects ordinal ranking of how we evaluate players.

==============================================
Let’s take the following season as an example:

Player A: 150 G, WAA/150G: 2.0
Player B: 50 G, WAA/150G: 7.5

Player A created 2.0 Wins above Avg.
Player B created 2.5 Wins above Avg.

Now what happens at various replacement levels:

Replacement level: -0.5 WAA/150G
--------------------------------
Player A: 2.5 WAR
Player B: 2.67 WAR

Replacement level: -1 WAA/150G
------------------------------
Player A: 3.0 WAR
Player B: 2.83 WAR

===========================================

Thus, our evaluation of which player created more WAR during the season varies dependent on what we decide to use as replacement level. 
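
The arithmetic in the example can be reproduced with a short Python sketch.  The function name and its signature are illustrative assumptions; the inputs are the Player A and Player B figures above.

```python
def war(games: int, waa_per_150: float, repl_per_150: float) -> float:
    """Wins above replacement for one season.

    waa_per_150:  the player's wins above average per 150 games.
    repl_per_150: replacement level, expressed as wins above average
                  per 150 games (a negative number).
    """
    return (waa_per_150 - repl_per_150) * games / 150


# (games, WAA per 150 G) for the two players in the example.
players = {"A": (150, 2.0), "B": (50, 7.5)}

for repl in (-0.5, -1.0):
    for name, (games, rate) in players.items():
        value = war(games, rate, repl)
        print(f"repl={repl:+.1f}  Player {name}: {value:.2f} WAR")
```

With these inputs the two players are tied when replacement level is -0.75 WAA/150G.  Any replacement level lower than that favors Player A, and any higher one favors Player B.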

Since this is one of the differences between fWAR and bWAR, it presumably doesn’t have an accepted standard across the sabr community. Doesn’t this bring into question the validity of using WAR to evaluate career value for questions such as HOF-worthiness (e.g. Mussina vs. Martinez)?


#11    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 14:15

You’ll get a different answer if you use different metrics.  No one is disputing this.  Nor is anyone suggesting that you should always use one metric all the time.

The point I’m making is that you use a hammer when you’ve got a nail, and you use a screwdriver when you have a screw.

Now, your question was: “Who had a better career?” You may think that wins above average (WAA) is the answer, but then, compare Neifi Perez’s WAA to anyone who played less than 100 games in MLB.


#12    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 14:21

Since this is one of the differences between fWAR and bWAR, it presumably doesn’t have an accepted standard across the sabr community. Doesn’t this bring into question the validity of using WAR to evaluate career value for questions such as HOF-worthiness (e.g. Mussina vs. Martinez)?

To the extent there is no common baseline, this would question the validity of using ANY metric.

The point is that you state your assumptions, and you answer the question under those constraints.

Why for example would “average” be any better?  Indeed, couldn’t someone say to only count those seasons where he was a positive WAA, and ignore anything negative, under the understanding that you can never “lose points” in terms of your march toward HOF.

Once Tiger Woods has 12 Majors (or whatever it is), does he get a lower “Majors above average” every year he gets no majors?

Again, everything has limitations and constraints.  That’s not a bad thing.  Just state those assumptions up front.


#13    Pierre      (see all posts) 2011/06/21 (Tue) @ 14:37

Tango #11-I agree, and my issue is with how the stat gets used, not the stat itself.

So how would you evaluate the Koufax v Niekro question?  I’m thinking WAA…

The Neifi Perez question kind of goes to my point-this kind of analysis is tricky, and where you set the baseline is important.


#14    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 15:02

Pierre, if you want to talk about Koufax / Niekro, then write out a complete sentence.  Preferably unambiguous.

As for Neifi: it’s not tricky, so much that the person asking the question is being ambiguous or incomplete.


#15    Pierre      (see all posts) 2011/06/21 (Tue) @ 15:02

The weird thing, and again please correct me if I’m wrong, is that the original purpose of WAR was to figure out the $ value of a player’s performance.  Yet all I see it used to do is rank players.


#16    Arvin      (see all posts) 2011/06/21 (Tue) @ 15:05

Tango/12: “To the extent there is no common baseline, this would question the validity of using ANY metric.”

I disagree here.  Within the sciences, there is often a dispute involving baselines or methodology when the two major baselines/methodologies in question lead to different conclusions for important questions.  Typically, the field gets split into halves, with many experts falling in one camp or the other.  However, sooner or later, one camp wins out (or a new paradigm wins over), and all the experts come to a consensus on one baseline/methodology.  At this point, the metric becomes accepted or standardized. 

Are we at this point for any of the advanced baseball metrics yet?  There is certainly dispute involving baselines/methodologies of many metrics, from value(WAR) to simple rate performance metrics(RA vs. FIP vs. xFIP vs. tERA, wOBA vs. wRC, UZR vs. FSR vs. TZ).  However, do these metrics lead to different conclusions for important questions?  I would argue that for 99% of cases, the top 2-3 accepted sabermetric offensive rate metrics and pitching performance metrics agree on conclusions.  For fielding metrics, otoh, the rate of agreement is much much worse.

I would posit that offensive rate metrics and pitching performance metrics fall into the latter category of metrics/baselines/methodologies which have a consensus formed around them, and arguably have become standardized or accepted.  Defensive metrics have not reached this threshold yet. 

Back to Pierre’s question: what about WAR?  I would have said yes, before this conversation, but now I’m not so sure.  I’d want to see if there are any major conclusions that differ as a result of the difference in baselines.

I just checked Mussina vs. Martinez, and Pedro edges out Mike according to both fWAR (4%) and bWAR(1%).  Both methodologies show that both players produced very similar WAR over the course of their careers, and that Pedro produced slightly more than Mike.  I would say that bWAR and fWAR agree on their assessment of career WAR and HOF-worthiness. 

So the question remains, are there any major questions where using fWAR vs. bWAR generate different conclusions?  If not, then WAR would be a statistic around which consensus has pretty much formed.


#17    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 15:07

My purpose for WAR was to account for a player’s impact that showed he had value the more he played regardless of how bad he was as a player (to a point).  A byproduct of that is the dollar translation.

Regardless, how a stat is more commonly used is irrelevant, as long as it’s used properly.


#18    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 15:10

It’s rWAR not bWAR.

And at a career level, I don’t think there’s any different conclusion being reached, by using rWAR or fWAR.

But if you use WAR or WAA, you will get different conclusions.


#19    Pierre      (see all posts) 2011/06/21 (Tue) @ 16:11

Odd.  I was sure bref WAR had Mussina higher than Pedro.  That’s why I picked that example.  I think they must have tweaked the formula sometime fairly recently. 

So, for Koufax v Niekro, would you use WAR or WAA?  My contention would be that WAR underrates Koufax pretty significantly (relative to Niekro or Sutton or somebody like that).

Is there somewhere I can go to read about the various replacement levels and the rationales behind them?  It’s interesting, and I’m guessing that Tango et al (Sean Smith?) spent a fair amount of time wrestling with the issue.


#20    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 16:16

So, for Koufax v Niekro, would you use WAR or WAA?

Pierre, if you want to talk about Koufax / Niekro, then write out a complete sentence.  Preferably unambiguous.


#21    Pierre      (see all posts) 2011/06/21 (Tue) @ 16:32

How do you think about career value of a brilliant player that had a short career, such as Sandy Koufax, relative to a good/very good player that had a long career, such as Phil Niekro or Rick Reuschel or somebody?  I think that WAR sells Sandy short.  In the short run, the replacement player is Willie Bloomquist or the pitching equivalent.  In the long run, it’s whoever they can develop or acquire (in the case of the ‘67 Dodgers it was Bill Singer). 

A similar question arises in-season.  If A-Rod stubs his toe, Ramiro Pena gets a few ABs.  But if A-Rod breaks a leg, it takes the Yanks about 2 seconds to free Wilson Betemit.  Thanks.


#22    Arvin      (see all posts) 2011/06/21 (Tue) @ 16:46

Average players don’t come free.  For a team like the Yanks, the resources required to sign or trade for an average player may be a small portion of their payroll, but for most mid-market teams, it’s not insignificant.  Average players cost $5-10mm/year in payroll or resources(prospects) to acquire.  Those are resources that could go elsewhere.

Thus, WAA may accurately depict the value of a player over average, but it doesn’t capture the resources spent on acquiring “average.” WAR does this.

Also, Koufax and Niekro aren’t that far apart on the excellence scale.

rWAR seasons > 5:
Koufax: 10.8, 10.8, 8.2, 7.8, 5.6
Niekro: 9.1, 8.5, 7.5, 6.8, 6.7, 6.6, 6.4, 6.2, 5.7, 5.1

Koufax had 4 seasons 1-2 wins better than Niekro.  Phil had 10+ seasons 2+ wins better than Koufax, with at least 6 of them in the excellent (>5 WAR) category.


#23    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 16:50

Pierre, we must be having some communications issue here, because you don’t have a clear question.  (If you are French, feel free to post in French.  If you are not, then you’ve got to be clearer for me.)

Bobby Orr played for 8 full seasons and was named best defenseman in the league 8 times.  His career was cut short due to bad knees.

Nicklas Lidstrom was a 6 time winner and Ray Bourque was a 5 time winner.  And they had a career twice as long.

So, what is your question?  Whose career would you rather have?  Well, I guess I’d rather have Lidstrom’s career.

Who would you select to win the Stanley Cup?  Well, I guess I’d rather have Orr’s career.

Who would I pay more for over the course of their careers?  Lidstrom maybe.

You can give me a dozen questions, and half will give you Orr as the answer and half will be Lidstrom.

I need a question that is clear.


#24    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 16:52

Thus, WAA may accurately depict the value of a player over average, but it doesn’t capture the resources spent on acquiring “average.” WAR does this.

I’m not sure who this is being addressed to. 

If it’s me, then I know this full well.

If it’s FYI, then ok.


#25    Arvin      (see all posts) 2011/06/21 (Tue) @ 17:11

24/Tango:  It was being addressed to 21/Pierre, and the argument that in the long run, WAA is the better way to calculate value since teams won’t use a replacement level player for more than a season.


#26    dave smyth      (see all posts) 2011/06/21 (Tue) @ 17:21

fWAR uses BABIP = lg avg, in effect, right? Am I the only one who thinks this is an exceedingly poor decision?  For long careers it probably makes little difference. But I recall a few weeks ago looking at their leaderboard for 2011, and M Garza was near the top. I was dumbfounded, because I had seen his games and he was...not getting good results, which was reflected in his high actual ERA.

So, what is fWAR for pitchers good for (couldn’t resist)?  Over a long career, if say G Maddux has a BABIP 10 or 15 pts better than lg avg., why use fWAR?  OTOH, if after 2 months Maddux has a BABIP of .350 in some season in the middle of his career and an ERA of 4.55, why use fWAR to place him among the leaders just because his FIP is good?

So, in a backwards looking sense (which WAR is intended to be, right), where is the place, in terms of IP or BFP, where fWAR for pitchers is better than an alternative treatment of BABIP? Tango always requests a clearly worded question to answer. Hope this was good enough.


#27    Pierre      (see all posts) 2011/06/21 (Tue) @ 17:46

One more try: If I’m comparing Lidstrom and Bourque, I happily total up my (imaginary)seasonal hockey WARs.  20 years, 1500 games apiece.  Bourque has 70 WARs, Lidstrom 60, hence I vote Bourque into the Hockey HOF before Lidstrom.

But say I’m comparing Bobby Orr to, say, Larry Murphy.  If I add up my WARs, it’s Murphy 50, Orr 40.  But I don’t think it’s appropriate to add up the WARs because I suspect that the Bruins were ultimately able to replace Orr with someone halfway decent.  If I somehow figure out how to calculate hockey WAA, I get Orr 30, Murphy 20.  This because Murphy was only ever a little bit above average (for sake of the example). So, do I use WAA on the theory that WAR unduly “penalizes” Orr for having a short career?  Or maybe something in between WAR and WAA?

In baseball, it’s Koufax v Moyer, Dick Allen v Tony Perez, etc.  Is the question not clear or is it that there’s not really a “right” way to think about this?


#28    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 17:58

So, do I use WAA on the theory that WAR unduly “penalizes” Orr for having a short career?  Or maybe something in between WAR and WAA?

If you are considering using something “between” WAR and WAA, then by definition anything below WAA *IS* WAR.  It just means you have a higher replacement baseline than someone else’s WAR baseline.  You’d have pWAR (Pierre’s WAR), with a baseline somewhat below WAA.

And where do you set this baseline exactly?  Exactly to whatever point that answers your question.


#29    Pierre      (see all posts) 2011/06/21 (Tue) @ 18:06

Basically the same question asked another way: should the replacement player be the guy who’s sitting on the end of your bench (e.g. Reggie Willits) or the guy who’s been clinging to a starting job that you can get for a box of popcorn and a PTBNL (e.g. Corey Patterson)?  I’m under the impression that the WAR systems use Willits and of the opinion it should be Patterson.  And it seems to me that the longer the period over which you have to replace somebody, the better the replacement player is likely to be. For the rest of 2011, I get Patterson, but next year I commit to Ryan Kalish or whoever.


#30    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 18:08

Dave/26: for the sake of illustration, let’s say you have 100 fielders on the field.  We can then argue that any batted ball in play has virtually nothing to do with the pitcher.

However, the “hits allowed” are always charged to the pitcher.  Even if he’s not responsible for them.

For the sake of illustration, let’s say that a pitcher always pitches the same way whether there’s runners on base or not.  And yet, we’ll get Cliff Lee on the mound during inopportune times (whether he was actually responsible or not for giving up all those hits with men on base), and get Doc on the mound during opportune times.

However, the “runs allowed” are always charged to the pitcher.  Even if he’s not responsible for (all of) them.

Now, do we NEED to charge the pitcher in this manner?  Do we NEED to charge the batter with an RBI?

Do we always have to choose between 100% charging (HR, BB, SO) and 0% charging (Reached on error)?  Are those the choices we must always make?

If so, then we’ll always have a point of disagreement.  BABIP (and W/L), at the seasonal level, is more non-pitcher than pitcher.  So, in the 100% / 0% choice, the best choice is 0%.

W/L (and BABIP) at the career level is more pitcher than non-pitcher.  So, in the 100% / 0% choice, the best choice is 100%.

Or, we can get past all that and charge the pitcher in graded levels, since some parts of his performance represent what the pitcher actually did more than others do.

So, I can accept fWAR and rWAR’s version as reasonable, dependent on the assumptions.


#31    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 18:11

Pierre: you have to decide what the “Zero” point means.

In WAR-speak, if you have 0 WAR, that means that you don’t get paid.

In WAA-speak, if you have 0 WAA, that means that you get paid like an average player would, REGARDLESS of the amount of playing time. 

If you want 0 WAA to mean you get paid like an average player would, DEPENDENT on the amount of playing time, then, congratulations, you just defined WAR instead.

So, what do you want 0 WAR to mean, and does it matter if it’s 50 PA, 500 PA, or 5,000 PA?


#32    Arvin      (see all posts) 2011/06/21 (Tue) @ 18:25

"… BABIP (and W/L), at the seasonal level, is more non-pitcher than pitcher.  So, in the 100% / 0% choice, the best choice is 0%.
...
Or, we can get past all that and give out graded levels to the pitcher, that some things his performance more represents what the pitcher actually did, than not.

So, I can accept fWAR and rWAR’s version as reasonable, dependent on the assumptions.”

err, doesn’t fWAR BABIP = lgAvg mean fWAR is choosing the “0%” choice above?  Aren’t you arguing above that the better choice would be ~20-30% for seasonal BABIP and ~70-80% for career BABIP?  If I’m not misreading your argument, it would say that fWAR could stand to improve by taking a TBF weighted avg. of pitcher BABIP by lgAvg BABIP.


#33    dave smyth      (see all posts) 2011/06/21 (Tue) @ 18:43

Tango #30, I got all that (and always have understood that)...but I still don’t like fWAR for pitchers at all....

Just my opinion.


#34    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 18:45

The better choice is 20% BABIP… AND 40% HR… AND 70% BB.... AND 80% SO.  But NO ONE does this.

fWAR has decided on a 0/100 approach.

rWAR has decided on a 100 approach, with adjustments.

I’m saying they are ALL reasonable and justifiable.


#35    Pierre      (see all posts) 2011/06/21 (Tue) @ 18:58

Tango- I definitely see the problem of using the average as the baseline, if that’s what you are getting at in #31.  But isn’t it arbitrary to use the major league minimum salary (fWAR uses the major league minimum, right?)?  Why not the average salary the Florida Marlins have to pay before they get in trouble with the league and players’ union (like $1m-$1.5m).  Now you’re fielding a team of Wilson Betemits and Nick Johnsons and winning 55-60 games instead of 42. Seems like it makes just as much sense, and might eliminate the “problem” of Pedro Martinez finishing 5th among pitchers in WAR in 2002.


#36    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 19:10

So, if the Marlins pay 30MM$ in salary and get 55 wins, that’s a fair baseline, correct?

And if they pay 90MM$ (league average) and get 81 wins (league average), that’s a true statement.

The difference is 60MM$ and 26 wins, or 2.3MM$ per win.

Compare to this:
12MM$ in salary and 48 wins and 90MM$/81 wins.

Difference?  78MM$ and 33 wins… which is 2.4MM$ per win.

SEE?  SAME THING!
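
The arithmetic above, as a small sketch. The payroll and win figures are the ones quoted in this comment; `dollars_per_win` is a hypothetical helper name.

```python
def dollars_per_win(base_payroll_mm: float, base_wins: float,
                    avg_payroll_mm: float = 90.0, avg_wins: float = 81.0) -> float:
    """Marginal cost (in MM$) of each win between a baseline team and an average team."""
    return (avg_payroll_mm - base_payroll_mm) / (avg_wins - base_wins)


marlins_baseline = dollars_per_win(30.0, 55.0)  # 60 MM$ over 26 wins
minimum_baseline = dollars_per_win(12.0, 48.0)  # 78 MM$ over 33 wins
print(f"{marlins_baseline:.1f} {minimum_baseline:.1f}")  # prints "2.3 2.4"
```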


#37    Arvin      (see all posts) 2011/06/21 (Tue) @ 19:16

"The better choice is 20% BABIP… AND 40% HR… AND 70% BB.... AND 80% SO.  But NO ONE does this.”

Why not?  Is it that much harder to do a weighted avg. by TBF?

Rather than using lgAvgBABIP or pitcherBABIP, why not use pitcherBABIP*(.8-1/(TBF^n))+lgAvgBABIP*(.2+1/TBF^n) where n is regressed? 

I’m sure there are better scales than 1+1/x^n.  I just pulled that one off the top of my head.

You’re advocating a straight weighting, without TBF as a scaling agent.  That makes it even easier to calculate.  As long as WAR is a complex opaque formula to most users, why not try and get it as accurate as possible?
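
For anyone who wants to play with it, here is the formula above written out exactly as posted. The exponent `n` is left as a free parameter, since it would have to be fitted; the value 0.5 and the sample rates below are placeholders, not estimates.

```python
def blended_babip(pitcher_babip: float, lg_babip: float, tbf: int, n: float) -> float:
    """pitcherBABIP*(.8 - 1/TBF^n) + lgAvgBABIP*(.2 + 1/TBF^n), as posted above."""
    shrink = 1.0 / tbf ** n
    w_pitcher = 0.8 - shrink  # goes negative when TBF is tiny, as the formula is written
    w_league = 0.2 + shrink   # the two weights always sum to 1
    return pitcher_babip * w_pitcher + lg_babip * w_league


# Placeholder inputs: a .250 BABIP pitcher in a .300 league, n = 0.5.
for tbf in (100, 800, 8000):
    print(tbf, round(blended_babip(0.250, 0.300, tbf, n=0.5), 4))
```

As TBF grows, the pitcher weight approaches .8 and the league weight approaches .2, which is the ceiling built into the formula as written.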


#38    Pierre      (see all posts) 2011/06/21 (Tue) @ 21:30

@36.  Yes, it’s the same (presumably) if you’re comparing guys who played the same number of years or the same number of games within a year.  But if you’re looking at 2002 Pedro Martinez, it’s not at all the same.  I look at the 2002 AL pitcher WAR list and think “what’s wrong with this stat? Oh, I get it. Pedro’s hypothetical replacement for the 5 starts he missed is getting his brains beat out.” I’m not sure there’s a right answer, and your opinion is as good as mine and probably better, but the replacement level selected will skew the player rankings in cases like this.


#39    Pierre      (see all posts) 2011/06/21 (Tue) @ 22:05

More on this.  Feel free to stop me.

2002
         BB     K    HR    ERA   rWAR
Pedro    40   239    13   2.26    5.7
Roy      62   168    10   2.93    6.9
Diff     22   -71    -3   6.30    1.2

40 more innings, 28 more ERs, 22 more BBs, 71 fewer Ks, 1+ more WARs.  Doesn’t really make sense.  Makes me think screwed-up calcs are the real issue and my qualm with the replacement level, while valid, may not make much difference.
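
The Diff row can be checked with a few lines. The inputs are the figures quoted in this comment (40 extra innings, 28 extra earned runs, and the table values); nothing else is assumed.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


pedro = {"BB": 40, "K": 239, "HR": 13, "rWAR": 5.7}
roy = {"BB": 62, "K": 168, "HR": 10, "rWAR": 6.9}

# Roy minus Pedro, column by column.
diff = {k: round(roy[k] - pedro[k], 1) for k in pedro}
print(diff)

# ERA over only the extra 40 innings and 28 earned runs.
marginal_era = era(28, 40)
print(round(marginal_era, 2))
```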


#40    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 22:05

Pierre: then you choose your own baseline level, and see how everyone looks.  I’ll guarantee you one thing: there’s going to be some guy some year where you are going to say “That guy is really skewed.”

You are always going to find someone who looks odd.

But the important thing is you have a logical rational process.  And then… you accept the results.

You can’t just keep dismissing results and changing the process until you get the results you want.  You’ll never get anywhere that way.

Bill James had a great line: a metric that never surprises is useless, a metric that always surprises is wrong.  You want a metric that confirms what you know 80% of the time and surprises you 20% of the time.

WAR with the baseline we’re using probably fits the bill.

***

“Why not?  Is it that much harder to do a weighted avg. by TBF?”

We get ENORMOUS pushback on WAR, even though it’s completely open source, and very straightforward.

What you are suggesting will simply lose 90% of the audience.

Personally, I prefer weighting by opportunities, but I’m in the minority here.


#41    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 22:12

Pierre: if you are going to quote rWAR, you can’t then limit yourself to BB,SO,HR.  BABIP plays a big part in it.  As does team fielding.

In your case, you need to look at fWAR.


#42    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 22:13

I looked, and I see that Pedro has 8.3 fWAR and Oswalt is 6.5 WAR.


#43    Arvin      (see all posts) 2011/06/21 (Tue) @ 22:21

"We get ENORMOUS pushback on WAR, even though it’s completely open source, and very straightforward.

What you are suggesting will simply lose 90% of the audience.”

Not if it rides on WAR’s coattails.  It could be xWAR or something like that.  For those who wish to use it, it would be better and more precise, and it would correct for the all-or-nothing demons you mention in this thread.  Those who don’t still have WAR.  Anyone using it can just explain that it’s on the exact same scale as WAR, just weighted slightly better.  Their audience will understand just fine.

Honestly, I think it would clarify a lot of the misunderstanding re: the difference between fWAR and rWAR.


#44    Pierre      (see all posts) 2011/06/21 (Tue) @ 22:25

Oh, OK.  So, I look at fWAR and get much happier.  Still, look at the rWAR #s and tell me if they could possibly be right.  If their calcs are screwed up, I feel like I should tell them.

Have you looked at how much the replacement baseline affects the rank ordering of players in cases like this?  Now I’m thinking it may not make a whole heck of a lot of difference…


#45    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 22:39

I wrote to Sean.  My guess is that the Red Sox underperformed that year, and it gets spread out to the players.  We’ll see.


#46    Tangotiger      (see all posts) 2011/06/21 (Tue) @ 22:43

Arvin: you may like this thread.  Read through the comments:

http://www.insidethebook.com/ee/index.php/site/comments/reader_mail_of_the_minute_apparent_vagaries_in_woba/


#47    Arvin      (see all posts) 2011/06/21 (Tue) @ 23:14

"I wrote to Sean.  My guess is that Redsox underperformed that year, and it gets spread out to the players.  We’ll see.”

What do you mean by that?

Thanks for the thread pointer.  Gonna post thoughts in that thread.


#48    Tangotiger      (see all posts) 2011/06/22 (Wed) @ 07:44

I forgot to mention:

The discussions / disagreements with WAR and WAA and whatever else are because of the “single dimension” problem: by only showing it as a single final number, we lose a dimension.

In response, I created this:

http://www.tangotiger.net/wonloss/index5.php?retroid=roges001

With TWO dimensions, the Individualized Won AND Loss numbers, the reader is now free to choose to combine them, or not, into one number, or not.

Prefer WAA?  Then (W-L)/2.  Prefer WAR with a .333 replacement level?  Then W-(W+L)*.333.  And so on.

YOU CHOOSE.
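
The two conversions above, as a short sketch. The 20-10 record is a made-up example, and `waa` and `war` are hypothetical helper names.

```python
def waa(wins: float, losses: float) -> float:
    """Wins above average from an individualized won-loss record: (W-L)/2."""
    return (wins - losses) / 2


def war(wins: float, losses: float, replacement_level: float = 0.333) -> float:
    """Wins above a baseline win percentage: W - (W+L)*replacement_level."""
    return wins - (wins + losses) * replacement_level


w, l = 20, 10  # made-up record, for illustration only
print(waa(w, l))                            # above average
print(round(war(w, l), 2))                  # above a .333 baseline
print(war(w, l, replacement_level=0.500))   # a .500 baseline reproduces WAA
```

Setting the baseline to .500 gives W-(W+L)/2, which is the same thing as (W-L)/2, so WAA is just one choice of baseline among many.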


#49          (see all posts) 2011/06/22 (Wed) @ 11:50

The problem is more in the way it is used - not just by a few people but by at least a substantial minority.

And I think even you underestimate some of the uncertainties - specifically with regard to defense, where (a) the various analytical fielding metrics often differ by a LOT, and (b) there seems to be (haven’t studied this systematically) more year to year variance in the numbers than for the most reliable hitting stats.

As for alternatives? I think the alternative is to be more modest about thinking we have a precise way to compare players’ overall performance on a one on one basis using a single metric. Any single metric.

WAR probably is the best metric for evaluating a player’s overall contribution. But we need to be more modest about our ability to make such comparisons with precision.


#50    Tangotiger      (see all posts) 2011/06/22 (Wed) @ 12:17

“And I think even you underestimate some of the uncertainties”

I don’t underestimate anything.  I’ve actually got pretty good estimates of everything.

“I think the alternative is to be more modest about thinking we have a precise way to compare players’ overall performance on a one on one basis using a single metric.”

That’s not an alternative, but a condition or constraint.  No one is talking about a “precise” way, least of all me.  Just reasonably accurate.

If you’ve followed my off-season threads, you see I grant a +/-0.5 win error range.  If I call someone 3.5 WAR, and the team signs a player implying 3.2 or 3.9 WAR, then I accept that as a reasonable assessment on their part.

So, I don’t think your view represents how it’s used (around here anyway).  I can’t help how it’s used elsewhere.
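
The +/-0.5 win range mentioned above amounts to a simple tolerance check. A minimal sketch, using the 3.5 / 3.2 / 3.9 figures from this comment; `within_error_range` is a hypothetical helper name, and treating the boundary as inclusive is an assumption.

```python
def within_error_range(my_war: float, implied_war: float, tolerance: float = 0.5) -> bool:
    """True when the implied WAR falls inside my estimate plus or minus the tolerance."""
    return abs(my_war - implied_war) <= tolerance


for implied in (3.2, 3.9, 4.5):
    print(implied, within_error_range(3.5, implied))
```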

