Tuesday, June 21, 2011
Confused about WAR
This fellow seems to have his heart in the right place. However, he’s all over the place in terms of trying to get a grasp of WAR, what it means, why Fangraphs and B-R.com are different, and a host of other puzzling statements.
I’ll try to get to these in the morning. I am thankful that he made his post, because I think there must be tons of people as confused as he is, and it gives me something to work with.
Plus, awesome blog name.
UPDATE:
I have successfully deleted this post twice already because of the amazing functions of undo and autosave. Regardless, I hope to incite a forum on some of the sabermetrics that are becoming more ubiquitous as time passes. I have read Tom Tango’s book showing how wOBA is better than AVG, OBP, OPS, etc.
Kind of an odd takeaway from The Book. But, that doesn’t seem to be the issue at hand, so let’s skip that.
Of course there are the constant stream of intermediaries that people use to calculate these statistics, but the one that I’m most hesitant of is WAR. For those that don’t know, WAR (Wins Above Replacement) is an all-encompassing statistic that essentially determines how much a given player is worth. This includes offensive and defensive analyses.
Rather than say how much a given player “is worth”, let’s say “WAR is the number of wins attributed to the player based on his past performance”.
I don’t like the idea of how “manufactured” the stat is because it’s essentially an average of an average of an average, etc.
I have no idea what “average of an average” means, nor the “etc” part. Let’s throw this sentence out the window. The blogger is trying to learn, but I think he’s reaching here for something.
And each statistic that is used in its calculation has limitations and assumptions, which aren’t usually discussed.
EVERY metric has limitations and assumptions, which aren’t usually discussed. OBP values a walk and HR equally. No one talks about this either. SLG has HR at 4 and single at 1, and that’s not discussed. Let’s not set a higher standard for WAR.
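To make those built-in assumptions concrete, here is a minimal sketch of the standard OBP and SLG formulas. The stat lines are made up for illustration and belong to no real player.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: every time on base counts once, whatever the event."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles, doubles, triples, hr, ab):
    """Slugging: total bases per at-bat, so a HR counts exactly 4x a single."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


# Hypothetical batters, 10 plate appearances each (invented numbers).
walker_obp = obp(h=0, bb=4, hbp=0, ab=6, sf=0)    # four walks, no hits
slugger_obp = obp(h=4, bb=0, hbp=0, ab=10, sf=0)  # four home runs, no walks

# Hypothetical batters, 10 at-bats each (invented numbers).
singles_slg = slg(singles=4, doubles=0, triples=0, hr=0, ab=10)
homer_slg = slg(singles=0, doubles=0, triples=0, hr=1, ab=10)

print(walker_obp, slugger_obp)  # identical OBP
print(singles_slg, homer_slg)   # identical SLG
```

OBP cannot tell four walks from four home runs, and SLG treats one home run as worth four singles. Those are assumptions too. They just rarely get discussed.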
I see how it can describe how “valuable” a player was to his team last year, but can it really help when it comes to a player being traded or picked up?
Ah, excellent. Now, we have something to talk about. Can’t we say the same thing about OBP or ERA? By definition, every performance metric measures past performance. That’s what the stat is. If you want to know about the future value of the player, we need to INTERPRET that metric, be it WAR or any other metric.
First thing you have to figure out is: what is the metric actually trying to do.
Or can you simply add the WAR of each player on a team and predict the playoffs for the following year (and maybe the World Series teams)? I don’t think it can stretch that far.
No, you can’t do that.
The data I have below (which I can’t format well for the life of me) are total WAR for each time last year. Now, of course the better teams have better WARs since they were better. The reasoning is a bit circular which I think makes it robust for past analyses but not as useful for the future.
Right, if you are stuck on an unadjusted metric, it’s hard for it to be useful for the future. Same as any other metric.
Anyway, let’s look at them and see how well it did. The first table is from Baseball-Reference.com, and the second is from Fangraphs.com (WAR is also calculated differently at different places, another reason I’m not too high on it).
How well they “did”? Did at what?
As for different calculations: that’s why I call them rWAR and fWAR to show that they are in fact different calculations. They are part of the WAR family. Is it that hard to get past it?
I have them listed as batWAR, pitWAR, and Team WAR. These are the sum of the WARs for each individual position player (batWAR), pitcher (pitWAR), and collective team (Team WAR), respectively.
I can’t seem to write below these, so I apologize for any scrolling that’s necessary. If you look closely, there are some discrepancies. First off, the Fangraphs.com values are higher in general than the Baseball-Reference.com ones.
fWAR is higher than rWAR because fWAR uses a lower replacement level. There’s nothing wrong in either case. Just a reasonably justifiable choice by both systems.
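As a rough sketch of why the baseline matters: take the same player, with the same runs above average, and only move the replacement level. The function name, the 10 runs per win, and the two gap values below are illustrative assumptions, not the actual parameters used by either Fangraphs or Baseball-Reference.

```python
def wins_above_replacement(runs_above_average, playing_time_share,
                           replacement_gap, runs_per_win=10.0):
    """Simplified WAR-style calculation (a sketch, not either site's formula).

    replacement_gap: runs per full season that a replacement-level player
        is assumed to sit BELOW average (hypothetical value).
    playing_time_share: fraction of a full season the player played.
    runs_per_win: rule-of-thumb conversion, assumed here to be 10.
    """
    runs_above_replacement = runs_above_average + replacement_gap * playing_time_share
    return runs_above_replacement / runs_per_win


# Same hypothetical player: +15 runs above average over a full season.
higher_baseline = wins_above_replacement(15, 1.0, replacement_gap=20)
lower_baseline = wins_above_replacement(15, 1.0, replacement_gap=25)

print(higher_baseline)  # 3.5
print(lower_baseline)   # 4.0
```

Nothing about the player changed between the two lines. The lower the replacement level, the more wins every player gets credited with above it.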
And Fangraphs had the Twins as the best team in baseball. Baseball-Reference had them 5th. Seems to be a decent drop.
Here is probably where the big difference rests: rWAR tries to account for all runs scored and allowed. fWAR does not do that. Basically, rWAR tries to apportion the luck to the players involved, while fWAR largely ignores the luck aspect.
It’s a choice.
Anyway, as a comparison sake, I would say that Baseball-Reference better encompassed the results of last year so I’ll talk about it mainly. I just wanted to show the difference between the sites.
To the extent that luck is a result, and you need to see that luck somewhere somehow, then rWAR would be the better choice. In this particular instance.
Something that first strikes me as interesting is that the Yankees had a better RAR than the Rays in both systems, but Tampa won the division. That seems to be interesting. I can see how WAR would fail when comparing teams that didn’t have much of an effect on the other, but to me, it seems odd the Tampa was not 1st in it’s division’s WAR from either site.
I don’t find that interesting at all, nor is it even a requirement of anything really. The Rays scored 23.6% more runs than they allowed. The Yankees were at 24.0%.
If rWAR or fWAR were more interested in capturing the luck of wins, then, sure, you’d have a case to make. But, that’s not what they are about.
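The run percentages quoted above can be reproduced in a few lines. The 2010 run totals in this sketch are not from the post: they are the season totals as I recall them, so treat them as an assumption and check them against the standings. The Pythagorean exponent of 2 is the classic simple form, also an assumption.

```python
def run_surplus_pct(runs_scored, runs_allowed):
    """Percentage by which runs scored exceed runs allowed."""
    return (runs_scored / runs_allowed - 1) * 100


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs alone (classic exponent of 2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# 2010 season totals (runs scored, runs allowed), recalled and unverified.
teams = {"Rays": (802, 649), "Yankees": (859, 693)}

surplus = {name: run_surplus_pct(rs, ra) for name, (rs, ra) in teams.items()}
expected = {name: pythagorean_win_pct(rs, ra) for name, (rs, ra) in teams.items()}

for name in teams:
    print(name, round(surplus[name], 1), round(expected[name], 3))
```

On runs alone, the Yankees come out a hair ahead of the Rays. A metric built from runs will show the same thing, regardless of who won the division.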
Something impressive from BR (Baseball-Reference) is that the 8 playoff teams were in the top 9 in WAR. Only Boston (who was impressively 4th, meaning the AL East had 3 of the top 4 WAR teams last year) didn’t make the playoffs within the top 9 WAR teams. So, this measure pretty well “predicted” the playoff teams. FG (Fangraphs) didn’t do as well.
The use of “predict” here is very wrong. When you “predict”, you are making an estimate of a future event. In this case, rWAR is simply representing the runs scored and allowed by the team, and distributing them to the players. Obviously, the teams that make the playoffs will be predisposed to be those teams that score a lot more runs than they allow.
This is another instance of the blogger wanting to learn, but being stuck on something that he should get out of.
Something else that’s interesting is that of the 8 playoff teams, the Giants had the best pitching WAR according to BR. Seems to coincide with the old belief that pitching is everything in the playoffs.
Again, he’s grasping. n=1.
Actually, if you look closer, within each series from the playoffs, the team with the better pitching WAR won the series. That makes me feel more comfortable about the statistic, but again, these calculation included the successful pitching of those teams so it’s circular. However, it does seem promising.
No, you should forget about all this. None of this is relevant in discussing WAR. It’s fun trivia, but ultimately meaningless in validating WAR.
But I would like to have people talk about these context-neutral statistics. WAR is normalized based on the replacement player of that year, so it’s supposed to comparable across time and leagues.
Eh, sorta-kinda. It compares players to that year’s baseline. Whether that baseline player is identical across time and leagues is debatable.
However, wouldn’t the context change if that player were to change teams? They would play around different defenders which can take away plays from them or cause problems. The new pitching staff could affect a players defense. The ballpark obviously has an effect. And you see different pitching more than likely changing your ability to hit to some degree. Does this not seem to matter?
The exact same thing can be said of any metric. Again, the blogger is grasping here, looking for chinks in the armor.
Also, WAR takes into account some form of fielding statistic, and all of the fielding statistics seem to be a bunch of magic.
Granted, they “seem” like a bunch of magic. But, they have a logical, rational basis.
I’m not saying I know a better way, but not much can be quantitative. Anyway, please respond with thoughts about these statistics and what you feel is successful and appropriate in many discussions. I just feel a bit hesitant, but maybe someone can help ease my discomfort.
Take care.
That first sentence is the key: if you want to discard WAR, and you STILL want to have an opinion, then what do you do? Well, you come up with your own flimsy, half-rational metric, without any internal consistency. You’ll look at someone’s OBP and SLG, maybe his SB, look at his park, apply some visual observations of his fielding and how he looks at bat, see how his team did and say “Yeah, Ryan Howard is pretty good.” That’s really all you are going to do. And the more you try to do, the more rigid you make your system, the more consistent you try to make your ideas, the more logic you apply, well… congratulations, because you are on the path to WAR.
It’s almost like you don’t want to go to WAR, and are trying to figure out how to do it your own way, when your way is simply a circumvention of WAR. And eventually, the more you do the work, the more you realize that, “yup, that WAR is what I’ve been doing all along”.
Really, it’s not like I just came in and said: “This is WAR and this is how it’ll work.” This was a long process to get to where we are. And, if we have to change things, we will. This is not some religion. It’s a result.
And if you don’t want to use WAR, then use whatever else you want to use. But when you are challenged on logic and rationality, then, please, be kind enough to explain yourself. Don’t just say “this sucks” without offering an alternative. That’s what politicians do. Challenge the logic, and the rationale. That we can talk about.


I updated the post.