THE BOOK--Playing The Percentages In Baseball

Tuesday, September 06, 2011

“WAR doesn’t work”

By Tangotiger, 11:00 AM

This site is blocked at the office, and Google Reader only shows a portion of it.  I can only presume that the author will use illogical or irrational reasoning (or has such a myopic view that he misses the entire point of WAR) to try to support his thesis, because that’s what ALWAYS happens when someone says WAR doesn’t work.  (I’m just applying Bayes here.) It’s the same kind of reasoning as when one says “OBP doesn’t work” and points to “BB=HR?  It doesn’t work!”

But, I’m willing to play along and post this thread blind, and be embarrassed if the author makes a cogent or insightful argument.  I’ll read the rest of the article at home tonight, unless someone wants to post it below, starting from after “There’s only one problem.  It doesn’t work.”


#1    dan    2011/09/06 (Tue) @ 11:21

It’s a whole lot to quote, so I’m not sure if it’s the best idea to do that here.

However, in general, the author is criticizing each component of WAR, not the concept itself. He doesn’t seem to have a problem with the framework, just the inputs.


#2    2011/09/06 (Tue) @ 11:27

Wow, over-generalize much, Tango?  Everyone who has ever criticized WAR is in the illogical and irrational camp?  Mark me down among the illogical and irrational group of sabermetricians, then, and I’ll be far prouder to be part of that one than the WAR-can-do-no-wrong camp.

The author has some good points, and others that probably aren’t so good.  One of the main points he makes is that UZR has problems large enough to affect WAR significantly.  I’ve raised that same objection with you here on this blog before.

Some of his other points might not stand up to additional scrutiny and research, but they are hardly presented in an illogical and irrational manner.


#3    alex    2011/09/06 (Tue) @ 11:38

@Mike/2

Either you willfully misread what Tango wrote, or you need to read it again. “(I’m just applying Bayes here.)” I assume, being a sabermetrician, you know what Bayes is.

Furthermore, as Tango has said a million times, WAR is a framework. UZR is an input that fangraphs chose to use. It’s illogical to say that a problem with UZR is a problem with WAR. It’s a problem with UZR.


#4    Tangotiger    2011/09/06 (Tue) @ 11:38

Mike: why do you say this:

“Everyone who has ever criticized WAR is in the illogical and irrational camp? “

I said this:
“because that’s what ALWAYS happens when someone says WAR doesn’t work.”

Why would you take the leap of me saying “WAR doesn’t work” and equating that to “criticized WAR”?

I’m all for criticizing WAR.  I have no problem with criticizing WAR.  If you want to say that WAR is limited because of its reliance on UZR, then fine.  It’s when someone CONCLUDES that “WAR doesn’t work” that the person will show himself to be illogical, irrational, or myopic.  (Which is another part of the statement that you removed.)

WAR is a framework, not an implementation.  fWAR, rWAR are implementations.  To not make the distinction between a framework and an implementation is ONE of many ways to be myopic.
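The framework/implementation distinction can be sketched in code. This is a minimal illustration, not how FanGraphs or Baseball-Reference actually compute WAR: the "framework" is a function that sums run components and converts runs to wins, and an "implementation" is whatever fielding source (or other input) gets plugged into it. The `PlayerSeason` fields, the two fielding sources, every number, and the `runs_per_win = 10.0` default (a common rule of thumb) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlayerSeason:
    """Hypothetical player-season; all components are in runs."""
    name: str
    batting_runs: float       # relative to average
    baserunning_runs: float   # relative to average
    positional_runs: float    # positional adjustment
    replacement_runs: float   # average-to-replacement gap for his playing time
    fielding_inputs: dict[str, float]  # fielding runs keyed by metric name (hypothetical)


# An implementation choice: any function returning fielding runs
# relative to the average fielder at the player's position.
FieldingSource = Callable[[PlayerSeason], float]


def war(p: PlayerSeason, fielding: FieldingSource, runs_per_win: float = 10.0) -> float:
    """The framework: add up run components, then convert runs to wins."""
    runs = (p.batting_runs + p.baserunning_runs + fielding(p)
            + p.positional_runs + p.replacement_runs)
    return runs / runs_per_win


def uzr_source(p: PlayerSeason) -> float:
    return p.fielding_inputs["UZR"]


def alternative_source(p: PlayerSeason) -> float:
    return p.fielding_inputs["other"]


player = PlayerSeason("Hypothetical SS", batting_runs=10.0, baserunning_runs=0.0,
                      positional_runs=7.5, replacement_runs=20.0,
                      fielding_inputs={"UZR": -10.0, "other": 0.0})
print(war(player, uzr_source))          # 2.75
print(war(player, alternative_source))  # 3.75
```

Swapping `uzr_source` for another fielding estimate changes the result without touching `war` itself, which is the sense in which an objection to the fielding input is an objection to that implementation rather than to the framework.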


#5    Tangotiger    2011/09/06 (Tue) @ 11:39

Alex/3: you cross-posted me, so, yes, you get me.

Dan: feel free to email me at tom~tangotiger~net


#6    David Pinto    2011/09/06 (Tue) @ 11:47

It’s a very well written, thoughtful critique of WAR.


#7    Tangotiger    2011/09/06 (Tue) @ 11:50

David: if that is the case, then the author must have made good arguments, but came to a terrible conclusion.


#8    Dan Turkenkopf    2011/09/06 (Tue) @ 11:56

David/6:

I disagree.  I don’t think the overall article was very well thought out at all.

A large portion of it had the feel of “This doesn’t jibe with what I think about the game, so it must be wrong.”

That said, there are some interesting points in there that need further study.  The correlation between UZR and FB% definitely needs more investigation. As does the concern around sample size for utility players.


#9    cloak    2011/09/06 (Tue) @ 11:56

Tango, the very next sentence after “It doesn’t work” is “At least, not yet.  Not in the fantastically straight-forward way we try to use it.”

Other than the fact that the author inserted a line just to get more viewers, it’s a cogent and well-written article, a few points of which can be debated rationally here.


#10    Andy    2011/09/06 (Tue) @ 11:57

Read it here Tango:

http://justpaste.it/h4j


#11    2011/09/06 (Tue) @ 12:06

“Mike: why do you say this:

‘Everyone who has ever criticized WAR is in the illogical and irrational camp?’

I said this:
‘because that’s what ALWAYS happens when someone says WAR doesn’t work.’

Why would you take the leap of me saying ‘WAR doesn’t work’ and equating that to ‘criticized WAR’?”

Maybe I misread you because I had read the article before I read your critique of it.  Given what the author wrote, your post here came across to me as lambasting anyone who dared critique WAR.

Maybe I should have given you some benefit of the doubt because you hadn’t read the article, but if so, that’s kind of ironic.


#12    Tom N.    2011/09/06 (Tue) @ 12:12

In general, it’s a pretty thoughtful article.

The anecdote he gives about Nyjer Morgan is a little misleading, however. The implication is that Nyjer Morgan is walking less because he’s batting in front of Braun and Fielder, so he’s getting more pitches to hit.

Just looking at the pitch data from fangraphs, Morgan is actually seeing the lowest percentage of strikes in his career, and he’s swinging at more pitches both inside and outside of the strike zone than he has at any point in his career. So this data suggests Morgan is actually getting FEWER pitches to hit, and he’s walking less because he’s swinging a lot more. Basically the opposite of the author’s suggestion.


#13    2011/09/06 (Tue) @ 12:15

“Just looking at the pitch data from fangraphs, Morgan is actually seeing the lowest percentage of strikes in his career, and he’s swinging at more pitches both inside and outside of the strike zone than he has at any point in his career.”

You can’t use that data to say that.  The BIS zone data has swings on the order of 5% from season to season.  That data is basically useless for season-to-season comparisons.


#14    2011/09/06 (Tue) @ 12:15

The author says “It [WAR] doesn’t work.” But he also says, “As yet, it [WAR] is probably as good a singular statistic as is widely available.”

I think what he is actually arguing is that WAR is dependent on the accuracy of its inputs, which I doubt that you or any other proponent of WAR would disagree with. But that’s not nearly as good a hook as saying “WAR doesn’t work”.


#15    Matt Bandi    2011/09/06 (Tue) @ 12:29

There are several misunderstandings of WAR in the article. For example: “According to WAR, in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.” He completely ignores the positional adjustment here, looking only at each player’s UZR.


#16    Tom N.    2011/09/06 (Tue) @ 12:30

“You can’t use that data to say that.  The BIS zone data has swings on the order of 5% from season to season.  That data is basically useless for season-to-season comparisons.”

I was not aware of that. Thanks for the clarification.


#17    2011/09/06 (Tue) @ 12:34

I think he made some good points that a lot of casual WAR users haven’t fully thought about.  A less sensational headline would have added credibility but gotten less traffic.  The point that WAR is often misused on the internet is the most important one for me, because I think stat sites might be well-served to make a bigger effort combating the many misuses.  There should be some kind of ambassador of WAR who keeps an eye out for misuse and politely explains to the author what they did wrong.  Then on FanGraphs every month you’d republish an accessible post called How Not To Use WAR with lots of examples.


#18    Tangotiger    2011/09/06 (Tue) @ 12:40

Andy: thanks, but that’s blocked too!  But, no worries, as someone emailed me the article.  I’ll go read it in a couple of minutes.

From the sounds of the comments above, it seems that the “it doesn’t work” was kind of a misleading way to put it, especially when it’s followed up by the next sentence (not available on Reader) that partially negates his own statement.  It’s like what Bill James was saying about “you won’t believe what happened” in the news shows, only for the story to end up entirely believable.

Anyway, I’ll be back…


#19    2011/09/06 (Tue) @ 12:43

#14

That’s without question what’s going on here.  Titling a piece “Is WAR the new RBI?” and placing the sentence “It doesn’t work” right above the fold is like titling something “Cute video of cats and dogs” on the internet.  It’s the SEO of the sabersphere. 

All he’s really doing is questioning the validity of the fielding data (nothing new, right?) and questioning whether context has more of an effect on WAR than we’d prefer it to. 

#15 may be right in that the author may be misunderstanding the positional adjustment, though.  Both Tulo’s and Lee’s raw fielding numbers are about +9 runs, but Lee gets crushed in the positional adjustment and Tulo doesn’t.


#20    Tangotiger    2011/09/06 (Tue) @ 13:02

Tim: I’ve done that quite a bit in the past with WPA (what it is… and isn’t).  I did that because there was a lot going on with WPA that wasn’t easily seen.

Maybe I need to add WAR to my “how to use” list.


#21    Tangotiger    2011/09/06 (Tue) @ 13:14

His portion on the fielding study was great.

***

His fielding discussion was not good:
“According to WAR, in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.”

No way.  The defensive value, as I’ve said countless times, is the fielding portion plus the positional adjustment.  That’s +15 for Tulo, and +1 for Lee. 

So, when he says this:
“As yet, WAR struggles to distinguish between the two.”

He completely misses the point of how WAR does great in putting a SS on the same scale as the other fielders.

The rest of his positional talk misses the entire point, so I won’t bother commenting on more examples.

***

“UZR results get weirder the smaller the sample gets.”

You don’t say.  That’s true of a lot of things.

***

I don’t think the author understands enough about WAR to criticize it as strongly as he has.

He should have far more questions than he should have answers.  His article reads like he understands WAR, that he knows where things are lacking, etc.  In fact, he should be asking questions and learning.

***

I’ve reached out to Buster Olney, because he comes to similar WAR conclusions.  He hasn’t responded (and I don’t think he will).  I’m not going to reach out to each author individually, to set them straight.

I’ve written tons about WAR and replacement level and positional adjustments.  The author(s) should research what they are writing about.  Or they can even email me.

***

This article has some good stuff and some terrible stuff.  And in no way should it come to any conclusions.


#22    Jon    2011/09/06 (Tue) @ 13:41

I’d be interested in discussing his “WAR Hates Sluggers” section more as he brings up an interesting point about Fielder vs. Bourjos and other middle of the diamond guys who WAR rates as more valuable.


#23    Tangotiger    2011/09/06 (Tue) @ 13:57

What is it that you see as interesting exactly? 

I see it more a question of the positional adjustment.

For example, this is just pure bullsh!t:
“It doesn’t like that they are fat and slow.”

I mean, he just made that up.

And then he goes on to this:
“We’ve struggled to understand and statistically represent the effect hitters have on one another.”

No, we haven’t struggled with that.

And this is just more crap:
“but no weight to the scarcity of pitcher-intimidating, strategy-altering cleanup hitters, which I see as a form of reverse discrimination.”

***

We can have a good discussion, but there is so much rambling in there, it’s very difficult to get a focus on his article.

I would rather he ask me questions, and I can give him answers, rather than he assert claims, and then I have to go and undo his mess.

Mike Silva, for example, did a lot of asserting a long while back, and what did he do?  He sent me emails with all his questions, and then we all learned.

This author should do the same thing, and we’d all be better off.


#24    MGL    2011/09/06 (Tue) @ 14:17

I’m posting on my phone, which is cumbersome, but the article is analytically horrible, especially the discussion about fielding and “sluggers.”

The author does not understand baseball analytics enough to be criticizing them.  Obviously not everything he says is misinformed but enough of it is to render the article a 2 on a scale of 1 to 10.  IMO that is.


#25    Jon    2011/09/06 (Tue) @ 14:30

Tango, you are of course right about the positional adjustment.  I was alluding to the theoretical discussion of who you would target as a free agent or if you were building a team.  I think you quoted Olney a few weeks ago discussing who he’d rather have between a slugging first baseman like Fielder, who does two things well (hit and hit for power), or better overall players like Bourjos or Zobrist.  It’s an interesting debate because Zobrist has put up slightly better WAR numbers (I think) over the last 3+ years, but the casual fan would most likely take the middle-of-the-order slugger.


#26    minesweeper    2011/09/06 (Tue) @ 14:36

"He should have far more questions than he should have answers.”

Great quote.


#27    Rally    2011/09/06 (Tue) @ 14:39

The writer just assumes, when discussing Kinsler and Miguel Cabrera, that of course Cabrera is more valuable.  He doesn’t even bother trying to demonstrate it in any way.


#28    Tangotiger    2011/09/06 (Tue) @ 14:57

Neyer does a great critique of my problem with the author:

http://mlb.sbnation.com/2011/9/6/2408060/limits-of-war-zobrist-analysis


#29    2011/09/06 (Tue) @ 15:12

Tango, I don’t know if you’ve seen this or not, but it looks like your original “Calculating WAR” post got hacked. Either that or it’s a nifty way for you to put advertising within the post.

http://www.insidethebook.com/ee/index.php/site/comments/how_to_calculate_war/


#30    Tangotiger    2011/09/06 (Tue) @ 15:48

That is paid advertising, thanks.


#31    studes    2011/09/06 (Tue) @ 16:31

Really, this article is a critique of UZR, with a few swipes at WAR.  Why didn’t he include BRef’s version of WAR instead?

I wish the author were more coherent about sluggers vs. non-sluggers.  This appears to be a part of WAR that many observers struggle with, and I’d like to see a more analytic rationale of why/why not.  There may be something there...or not.  I’m keeping an open mind about the issue.


#32    2011/09/06 (Tue) @ 19:12

I saw this more as a critique of the inputs to WAR than of the WAR model itself…

While the title is provocative and perhaps I’m being generous.... maybe the author is suggesting the following:

There are a lot of “mainstream” (for lack of a better word) media and fans throwing WAR #’s about without really understanding the limitations of the data that feeds the model.  I look primarily at some of the rocket scientists over at ESPN.  As another example, how many times do you now see claims that player X is having a better season than player Y when the WAR delta between the two might be 0.3-0.5?

Again perhaps I’m being too generous but maybe the author is simply suggesting that folks are throwing WAR #’s around without understanding the limitations and margin of error in those #’s (much like people have thrown about and misused RBI in the past).

BTW… the BSR input in fWAR also seems sketchy (esp 1-year samples).  I understand the concept and like the idea of trying to measure it, but there is a lot of context in baserunning (arm of the OF, # of outs, score) that matters, and one-year samples would not seem to be nearly enough to ‘even out’ these contextual variables.  Also, I don’t think the current BSR uses any sort of zone system, which means advancement is only based on the OF (or IF) who fields the ball and not where on the field the ball is (I could be wrong on this, though).


#33    Tangotiger    2011/09/06 (Tue) @ 19:22

I agree that the main objection seems to be UZR, with some objection to the positional adjustments.  But the only evidence provided is a sort of “well, that doesn’t make sense”.

As I said, he had great stuff on the fielding study, and everything else should be ignored as nothing but a stream of consciousness.


#34    Tangotiger    2011/09/07 (Wed) @ 11:07

I liked this response at Primer, linked from here:

http://crashburnalley.com/2011/09/07/war-back-in-the-news/

I think this is a problem that I see a lot not just in relatively unimportant venues like sports, but also in more important arenas (popular discussions of science, economics, etc.)

People correctly point out that we don’t have precise answers and that our best quantifications have error bars that are [larger] than the number of decimal places reported. That’s a valuable insight and worth discussing, but then people take it a step further and use that as an excuse to remain completely agnostic on things.

By denigrating the best efforts of others to quantify difficult questions and insisting that “I don’t need all that fancy stuff, just give me the basics and I’ll take my own guess since no one knows” they give themselves a feeling of smugness and superiority to those bookish nerds vainly searching for answers they can’t pin down, but they also throw away valuable information that the effort to quantify those things tells us and in most cases behave as though the uncertainty is much greater than it actually is.

Whoever you are, well done!


#35    2011/09/07 (Wed) @ 12:53

MGL: I always appreciate and respect what you have to say. This >> The author does not understand baseball analytics enough to be criticizing them. >> seems like a sweeping generalization of the author and article which needs more explanation. I know time is an issue. But still, very disappointing details are not given as to why you think this way. Maybe, perhaps, you’ve talked to the author privately.


#36    Tangotiger    2011/09/07 (Wed) @ 14:02

This came up in my feed (from the same site that published the main piece in this thread):

“WAR is a good, imperfect, idea”

http://www.itsaboutthemoney.net/archives/2011/09/07/war-is-a-good-imperfect-idea/

As always, I cannot read below the fold.  Since my attempt yesterday to forecast the level of illogic, irrationality, or myopicness (myopia?  that a word?  it is now!) to a statement of “WAR doesn’t work” by applying Bayes was met with disdain by some, I won’t do it this time. Clearly, in the Bayes world, I had completely ignored the possibility that the author would retract his claim below the fold after making it above the fold.  That’s a bad job of establishing a prior in my case. 

So, I’m giving myself a one-day detention on applying Bayes, to this article, and I’ll only provide (unconditional) commentary after reading all of it.


#37    SittingCurveball    2011/09/07 (Wed) @ 14:16

Other than the UZR point, which has been brought up by so many that you can’t credit this author with ingenuity, there isn’t much there.

He talks about how you can’t build a lineup without a slugger, ergo they are valuable. Does he not understand scarcity at all? The point is that you can find a slugger 1B so easily that few teams go without one.

He also uses WAR to make a “better on a per game basis” argument. Not only does that create small-sample and margin-of-error problems that this very author criticizes people for using, but it also fails to account for WAR’s credits for playing time.

I hate when articles are considered “well-argued” because they are long and gracefully written.


#38    2011/09/07 (Wed) @ 14:30

Tango, the subsequent piece you just linked mostly deals with Neyer’s point that “we” are not misusing WAR, in the sense that people on that side of the argument have yet to point to any specific references where WAR is treated as God’s Truth.  This is also being discussed in the other thread.

I don’t have a popular website whose audience is saber-friendly “fans”, so I don’t really get a lot of feedback on this front, but apparently these guys do.  If they’re trying to remind people about the dangers of overvaluing the digits in a certain player’s WAR column, then I can’t see anything wrong with that.  But abstractly complaining about misuse, without referencing anybody specific, hurts the framing of that argument because now “we” don’t know who “we’re” talking about.  If you’re saying that somebody is using WAR incorrectly, then say who it is.


#39    Tangotiger    2011/09/07 (Wed) @ 14:59

I created a new thread about the “slugger” issue.


#40    Tangotiger    2011/09/07 (Wed) @ 17:14

Damn my self-imposed Bayes banishment.

http://www.itsaboutthemoney.net/archives/2011/09/07/legitimacy/


#41    Tangotiger    2011/09/07 (Wed) @ 18:06

I read Brien’s article, and I enjoyed it.


#42    Dallas Bob    2011/09/07 (Wed) @ 20:24

I have a question about WAR and defense.  Does it in any way weight the position a player plays?  Is a good right fielder given the same defensive weight as a good shortstop?  If so, isn’t that a problem? Does being great in right field really matter as much over being average as it does for a shortstop?

Just asking.


#43    Tangotiger    2011/09/07 (Wed) @ 20:43

Dallas: yes, it’s handled via a positional adjustment.

For example, a SS gets +7.5 runs, and a RF gets -7.5 runs (per 162 games) just for playing their positions.

If you have a crappy SS (say he’s -10 runs relative to the average SS), and you have a good fielding RF (say he’s +5 runs), then guess what: their fielding value will be equivalent.  They are both -2.5 runs relative to an average fielder at a neutral position.

This is why I recommend that when we talk about fielding, we merge the fielding runs above position with the positional value, so that everyone is on the same playing field.

The following is an illustration that kind of puts the thing in action (1999-2003):

http://www.tangotiger.net/UZR9903TT.html
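The SS/RF arithmetic above can be written out as a small function. This is a sketch only: the two adjustment values come from the comment, while the function name and the linear proration of the adjustment by games played are assumptions for illustration.

```python
# Positional adjustments in runs per 162 games, as given in the comment above.
# Only SS and RF are listed there, so no other positions are filled in.
POSITIONAL_ADJUSTMENT_PER_162 = {"SS": 7.5, "RF": -7.5}


def defensive_value(fielding_runs: float, position: str, games: float = 162.0) -> float:
    """Fielding runs relative to the average player AT HIS POSITION, plus the
    positional adjustment, prorated linearly by games (proration is an assumption)."""
    adjustment = POSITIONAL_ADJUSTMENT_PER_162[position] * games / 162.0
    return fielding_runs + adjustment


print(defensive_value(-10.0, "SS"))  # -2.5: poor-fielding SS
print(defensive_value(5.0, "RF"))    # -2.5: good-fielding RF
```

Both players come out at -2.5 runs relative to an average fielder at a neutral position, which is the "same playing field" that merging fielding runs with the positional value is meant to provide.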


#44    MGL    2011/09/08 (Thu) @ 00:36

“MGL: I always appreciate and respect what you have to say. This >> The author does not understand baseball analytics enough to be criticizing them. >> seems like a sweeping generalization of the author and article which needs more explanation. I know time is an issue. But still, very disappointing details are not given as to why you think this way.”

OK, here goes. It’s not pretty!

“It frequently measures context as much as performance.”

Of all the stats that you can think of, sabermetric or not, the one that is designed to account for context is WAR.  It is essentially – not perfectly of course – context neutral.

“Especially when used to evaluate single seasons, it doesn’t sufficiently account for the inevitable variations in opportunity and environment.”

Any single-season stat is going to be fraught with small-sample size driven variability, random or otherwise.  But, basically, I have little idea what he is talking about with that sentence.

“While Simon isn’t looking at UZR specifically, he does point out that most defensive metrics do not account for positioning and that half a dozen plays can cause sizable shifts in the aggregate numbers when we’re dealing with less than a season’s worth of data.”

The whole point of a good defensive metric is NOT to account for positioning, so to speak.  If a player is out of position overall, either because he is poor at it or his coaches are poor at it, then it gets reflected in his UZR and should.  Of course there are instances where a fielder may be correct in his positioning but UZR does not accommodate it well.  But that is part of its imperfection.

Again, I have no idea what the second part of that sentence means other than yes, the whole world agrees with and realizes, ad nauseam, that less than a season’s worth of data for ANYTHING is going to produce anomalous results for any metric.

“I’m not the only one who’s noticed that UZR frequently yields suspicious results in small samples, at Fenway, and when several good outfielders are playing alongside one another.”

In small samples, UZR, or any metric that is not 100% descriptive, is going to yield suspicious looking results in lots of instances.  It so happens that UZR does a pretty good – not nearly perfect - job of park adjusting.  The concept of ball hogging and the like is WAY overrated in terms of being problematic.  This guy knows so little about UZR and defensive metrics that it is scandalous that he should even be allowed to criticize them.

“There is, however, significant evidence that pitching staffs with extreme batted ball tendencies can dramatically effect their outfielders UZR numbers.  (These extremes I defined at upward of 40% at the high end and below 33% at the low end.)

Average OF UZR for FB% > 40.0: 10.1
Average OF UZR for FB% < 33.0: -10.6”

I am going to question those results until and unless someone confirms them. In any case, it is true actually that pitchers with high fly ball rates induce easier-to-catch fly balls.  Same for ground balls.  UZR takes this into consideration actually.

But, it could be that infield pop flies are screwing up the numbers. UZR, of course, ignores them.  It could also be that teams with better OF defense cultivate fly ball pitchers or the pitchers themselves deliberately allow more fly balls.

There is bias in the data that helps to cause that as well.  Balls that are caught tend to be recorded as flies and those that are not tend to be recorded as line drives.  If the author is only counting fly balls, of course he will find that outfields that catch more air balls tend to be on teams that induce more fly balls (again, as opposed to line drives).  So there are lots of non-ominous reasons why there might be a correlation, but, more importantly, so what if there is?  There are many, many problems, biases, etc., with defensive metrics.  So, instead of reflecting actual defensive performance 90% or 95%, it might be 75%.  As long as we recognize that, we don’t have to get our panties in a bunch – and most people who actually know a little about defensive metrics – unlike this author – do recognize this.

“This is not to say that UZR is useless, just that is unreliable in single season increments and that unreliability is passed on to WAR, which we habitually use/misuse when discussing single seasons and partial seasons.”

Just a typically (for this article) inane statement.  UZR is not “unreliable” for single seasons. If it were, there would not be a typical y-t-y correlation of .5, which there is (unless of course, it was not necessarily accurate).  It is less reliable for one year than 2 years.  And 2 years is less reliable than 3 years.  Etc.  Oh, and that is true for every sample stat on the face of the earth.  Small-minded people, like this author, incorrectly think about sample statistics in terms of yes/no, black/white.
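The point that reliability rises gradually with sample size, rather than switching from "unreliable" to "reliable," can be illustrated with the Spearman-Brown formula. That is a standard psychometric result swapped in here for illustration; it is not necessarily how MGL or UZR quantify anything. The sketch assumes each season is an independent, equally informative sample of the same underlying talent, and it plugs in the .5 year-to-year correlation from the comment as the reliability of one season.

```python
def spearman_brown(r_one: float, k: float) -> float:
    """Projected reliability of k pooled seasons given one-season reliability r_one:
    r_k = k * r_one / (1 + (k - 1) * r_one)."""
    return k * r_one / (1 + (k - 1) * r_one)


for seasons in (1, 2, 3, 5):
    print(seasons, round(spearman_brown(0.5, seasons), 3))
# 1 0.5
# 2 0.667
# 3 0.75
# 5 0.833
```

Each extra season adds less than the one before, but no particular season count marks a cutoff between useless and trustworthy.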

“On one level, this seemed legit.  Zobrist appeared at every position on the diamond in ’09 and over the years has proven himself to be an above-average defender at second base and in right field.  Managers have long lauded the value of versatility and lavished praise on players like Zobrist, Mark DeRosa, and Placido Polanco, who play several key positions well and also swing decent sticks.  Zobrist’s looked like evidence of their wisdom.”

While there is some value in being versatile, that has nothing whatsoever to do with defensive metrics or WAR. Nothing.  Zobrist gets high defensive marks because he was overall above average at all his positions combined.

“According to WAR, in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.”

If you want, you can stop reading right here.  The author, beyond a shadow of a doubt, evinces his complete lack of understanding of defensive metrics, defensive value, sabermetrics, WAR, common sense, and anything else you can think of.  Why would you take seriously or even care about someone’s opinion of a subject that he knows little about?  Seriously.

First of all, what numbers is he referring to?  Tulo is 9.1 in UZR and Lee is 1.9.  And of course one is at SS and the other is at first base. There is something like 15-20 runs in defensive value separating an average first baseman and an average SS.  What the heck is this guy talking about with that sentence?

“There are two types of utilitymen, those who are given the job because they play many positions well and those who are given it because they play no position well.  As yet, WAR struggles to distinguish between the two.  It reads Houston’s inability to decide where Lee hurts them least as evidence of Lee’s versatility.  It suggests that Howie Kendrick‘s defense at second base has gone from average to exceptional since Mike Scioscia started giving him more starts in left field.
UZR results get weirder the smaller the sample gets.  The utility player may log a thousand innings in total, thus suggesting his UZR is somewhat more reliable, but what actually happens is that several hyper-unreliable samples of a few hundred innings or less are bundled together like toxic mortgages and rated AAA.”

Again, complete lack of understanding of defensive metrics.  They don’t give extra credit for versatility of course.  To some extent, I would guess that players who play lots of positions do not play any of them as well as they might if they played that position exclusively.

And as far as combining numbers from different positions goes, the reliability of the total sample should be exactly the same as that of the same sample size from one position!
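The claim about combining positions can be checked with a short variance calculation. The sketch assumes each inning adds independent noise with the same variance at every position, and that the player's true talent, once the positional adjustment is applied, is the same at each position; the per-inning noise value and the innings splits are hypothetical. Under those assumptions, an innings-weighted average of several small samples has exactly the same standard error as one sample of the same total size.

```python
import math


def se_single(per_inning_sd: float, innings: float) -> float:
    """Standard error of a per-inning fielding rate from one sample."""
    return per_inning_sd / math.sqrt(innings)


def se_pooled(per_inning_sd: float, chunks: list[float]) -> float:
    """Standard error of the innings-weighted average of several chunk rates.

    Chunk i's rate has variance sd**2 / n_i. Weighting by n_i gives
    Var = sum(n_i**2 * sd**2 / n_i) / N**2 = sd**2 / N, with N = sum(n_i).
    """
    total = sum(chunks)
    variance = sum(n**2 * (per_inning_sd**2 / n) for n in chunks) / total**2
    return math.sqrt(variance)


sd = 1.0  # hypothetical per-inning noise, arbitrary units
print(se_single(sd, 1000))             # one position, 1,000 innings
print(se_pooled(sd, [400, 350, 250]))  # three positions, same 1,000 innings
```

The two printed values agree up to floating-point rounding; the result rests entirely on the assumptions stated above.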

“However, one can’t help but notice that a cross-section of the most intimidating hitters in the game are treated with relative disdain by the metric.  It doesn’t like them because they play first base or left field (or DH), which aren’t scarcity positions.  It doesn’t like that they are fat and slow.”

Hate to sound like a broken record, but the guy has no idea how to value players. It isn’t that WAR “hates” slow sluggers who play 1B, DH, or corner OF.  It is (obviously) that a player’s value includes his defense and his baserunning.  If a player has little defensive value and doesn’t run the bases well, obviously he is not worth as much as a player who does, all other things being equal.  And things like walks and scratch singles, which are worth 20-30% of a HR, are not as sexy as the HR. But the value of an offensive event is what it is.  Any lwts-based stat, like WAR, is non-discriminatory. 

Characterizing it as liking or disliking one type of player is ignorant.

“While I understand that everybody would love to have Chase Utley or Troy Tulowitzki, a middle-of-the-order hitter who makes big contributions in the field and on the basepaths, as well as at the plate, the fact remains, building a lineup without a slugger (or two) is like building a mall with seven Sunglass Huts and no department stores.”

Well, we have a thread which explains why this is absolutely not true.  And even if it were, we could say the same thing about on-base, non-slugger type players: having a lineup full of sluggers but no on-base guys is like…


#45    MGL      (see all posts) 2011/09/08 (Thu) @ 00:39

“We’ve struggled to understand and statistically represent the effect hitters have on one another.  Would Nyjer Morgan be hitting .306 if he wasn’t batting directly in front of Ryan Braun and Prince Fielder?  (WAR suggests, by the way, that Morgan has been more valuable on a per game basis than Fielder.)… ”

Again, it has been shown many times, with actual evidence (as opposed to this guy making assertions with no evidence, and extremely poor logic), that protection is mostly a myth.

In any case, his arguments about sluggers and protection are straw men.  They have virtually nothing to do with WAR.

“But, WAR is not a debate-ending statistic, especially for single seasons.  Even WAR’s adherents, like Dave Cameron, generally admit the margin of error is at least 15%.  When we stubbornly suggest that 0.5 WAR means anything, we are grossly exaggerating the statistic’s accuracy, even according to its creators.  It remains true that any reasoned discussion of an individual’s contributions still requires analysis of the various components that go into WAR, as well as several that don’t, and, as such, subjectivity reigns.”

Perhaps the only thing that makes any sense in the article.

I hate when articles are considered “well-argued” because they are long and gracefully written.

Wonderful quote!  This article is a pig in a wedding dress…


#46    Aaron      (see all posts) 2011/09/08 (Thu) @ 10:23

Fangraphs lists Lee’s UZR as 9.2.


#47    Dallas Bob      (see all posts) 2011/09/08 (Thu) @ 10:26

Tangotiger,

Thanks for the explanation about the positional adjustment.  I guess I have more stupid questions concerning WAR and defense.  Does the positional adjustment truly account for the importance of position?  Might the following be true?  The negative effect on WAR for a bad-fielding right fielder should be small because the importance of defense is small in right field (the right fielder gets few chances, even average right fielders make the plays, great defensive ability is just overkill, etc.).  The negative effect on WAR for a bad-fielding shortstop should be large because the importance of defense at shortstop is great (a great number of chances, below-average players can’t make the plays, excess skill is not overkill at shortstop, etc.).

I am particularly disturbed by the WAR of a David Murphy.  I think he is sitting at 0.7, which I believe means he shouldn’t even be on the end of the bench.  I think this is due to defense, but he plays the corner outfield, where his defense just doesn’t seem to affect the outcome of games.  If he played shortstop I would be inclined to worry, but that is the point: he doesn’t play shortstop.

Also, Michael Young has been criticized for lack of defense.  If he were to play only DH, would his WAR skyrocket?

I am not trying to argue - just trying to learn more about the great game!


#48    Tangotiger      (see all posts) 2011/09/08 (Thu) @ 11:10

Dallas:

Let’s take an easy example.  You have a bad fielding 1B.  He’ll be about -10 runs relative to the average fielding 1B.  Not a whole lot, but, 1B don’t have many opps to do damage to begin with (certainly not compared to SS or CF).

The positional adjustment for 1B is -12.5 runs.  So a bad fielding 1B has a fielding+positional value of -10 -12.5 = -22.5 runs.

The DH “positional” value is ALSO -22.5 runs.

So, whether you have a bad fielding 1B, or you have a non-fielding DH, both will have an identical “defense” value.

***

When you see players move from CF to the corners, or vice versa, you see a shift of about 10 runs.  This is why the positional value of CF is +2.5 and the corner OF is -7.5.

If you have a below average fielder in CF, say a -5 runs (relative to other CF), he will be an above average fielder in the corner (+5, relative to other corner outfielders).  In BOTH cases, his “defense” value is -2.5 runs (+2.5 -5, or -7.5 +5).

This is the point of the positional value adjustment, that it puts players on the same scale.  (Or tries to anyway.)
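Here is the arithmetic from the two examples above as a small Python sketch. The function name is mine; the numbers are the ones in this comment.

```python
def defense_value(fielding_runs: float, positional_adjustment: float) -> float:
    """Fielding runs relative to the position average, plus the positional adjustment."""
    return fielding_runs + positional_adjustment


# Bad-fielding 1B (-10 vs. other 1B) vs. a DH who never takes the field.
print(defense_value(-10.0, -12.5))  # -22.5
print(defense_value(0.0, -22.5))    # -22.5

# The same fielder in CF (-5 vs. other CF) and in a corner (+5 vs. other corner OF).
print(defense_value(-5.0, 2.5))     # -2.5
print(defense_value(5.0, -7.5))     # -2.5
```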

The positional adjustment in the OF is solid as a rock. 

The positional adjustment in the IF (2B/SS/3B) is somewhat solid.

The positional adjustment BETWEEN OF and IF is a bit shakier, not least because all the left-handed throwers are in the OF, creating a glut there.

The positional adjustment BETWEEN those 6 positions and 1B is also a bit shaky (but not that much).

The catcher adjustment is a bit of a stab in the dark.

So, I end up with the following adjustments:
+12.5 C
+7.5 SS
+2.5 2B/3B/CF
-7.5 LF/RF
-12.5 1B
-22.5 DH

(DH get a +5 bonus on the hitting side because it’s hard to hit as a DH.  You can also even do a similar adjustment for catchers.  But, no need to get into that here right now.)
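Read as a lookup table, the chart also implies how much a player's position-relative fielding runs should change when he switches positions. For CF to the corners, that is the roughly 10-run shift mentioned above. Here is a minimal Python sketch, assuming the chart exactly as listed; any entry could move by a run or two. It covers the fielding side only, with no DH hitting bonus.

```python
# Positional adjustments in runs, as listed in the chart above.
POSITIONAL_ADJUSTMENT = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -22.5,
}


def expected_shift(from_pos: str, to_pos: str) -> float:
    """Runs a fielder's position-relative rating should rise when moving from_pos -> to_pos,
    if the adjustments really put both positions on the same scale."""
    return POSITIONAL_ADJUSTMENT[from_pos] - POSITIONAL_ADJUSTMENT[to_pos]


print(expected_shift("CF", "LF"))  # 10.0
print(expected_shift("SS", "2B"))  # 5.0
print(expected_shift("3B", "1B"))  # 15.0
```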

So, you look at that scale, and it kinda makes sense.  We know, for a fact, that SS must be higher than 2B/3B: you almost NEVER see a 2B/3B become a SS, but tons of 2B/3B are former SS at some point of their lives.

Same idea for CF and the corners.

You kinda plug the two sets together, the 1B is obviously below the corners, and we suspect the C are above (huge scarcity).  And voila, you get a quantified version of the fielding spectrum.

ANY of those numbers can go up or down by 1 or 2 runs.  If you want to say the 2B should be +4 and 3B should be +1, I’m not going to argue.  If you want to say it should be different by era, I’ll agree (but it won’t change that much, certainly not in my lifetime).

It’s a fairly basic system that crystallizes how you would naturally feel anyway.  It passes the sniff test.

So, absent other research someone wants to present, then based on both my research, and the sniff test, that chart is pretty solid.

To argue against it means that you are going to bring something to the table.  And, I could be wrong.  But, absent actual research, we’ll be arguing between my nose (+ my research) against your nose (and no research).

Why would your nose win?

(Royal “your”.)


#49    Tangotiger      (see all posts) 2011/09/08 (Thu) @ 11:21

Dallas: I created a separate thread for this issue, so please continue discussion there.  Thanks…


#50    pierre      (see all posts) 2011/09/08 (Thu) @ 11:23

There was a long discussion about all this maybe 3 months ago where Tango tolerated my attempts to poke holes.  Possibly worth looking at.


#51    Tangotiger      (see all posts) 2011/09/08 (Thu) @ 12:24

I don’t remember that at all.  However, there’s a “see all posts” next to Pierre’s name.

Clicking that, and let’s see if I can find what he’s talking about.

Probably this thread:
http://www.insidethebook.com/ee/index.php/site/comments/using_wowy_and_positional_switchers_to_determine_the_fielding_value_of_each/

I totally forgot I did that.


#52    Brian Cartwright      (see all posts) 2011/09/08 (Thu) @ 13:49

Aaron/46 - after I read the article I looked up Lee at FG and saw the 9.2 UZR as well. Oliver has him -10, -8, -7 the past three years in LF, but then +9.6 in 2011, 0 at 1B, almost exactly what UZR reports. But given the past performances, I’m sure we can expect another +12 or so from Tulo, but not from El Caballo.


#53    MGL      (see all posts) 2011/09/08 (Thu) @ 17:24

My bad. It was 1.9 for 1B and 7.4 for LF.  In any case, Tulo is 9.1 at SS.  My point still stands. If the author does not even know that a player who is 9.1 runs better than the average SS is a WAY better defender than a player who is 9.2 runs better than the average LF’er and 1B’man, does he have any business discussing defensive metrics?  I submit an unequivocal “No.”

Not that we expect Lee to be a “true” plus defender in LF.  Which is why I HATE using any non-regressed numbers to explain “what happened” when the metric (like UZR) is not a directly descriptive metric.

For those of you who don’t know what I mean (I have explained it many times on this blog), a directly descriptive metric is one that assigns a value to something that definitely happened, like lwts.  A player hits a single and we give it a value of .47 runs (above a league-average PA).  Now, it could be a line-drive single or a little duck snort into RF where the batter barely made contact.  That is why even a directly descriptive stat is not necessarily a good predictor, especially in small samples.  (In large samples those line drives and bleeders will mostly even out.)

UZR, though, is not directly descriptive.  At least the values it spits out are not.  It is inferentially, or indirectly, descriptive.  If a ball is hit and it is fielded or not, UZR makes note of that (which is the directly descriptive part), but when it assigns a value to the catch or non-catch it is no longer directly descriptive, only because we don’t know what actually happened to cause that ball to be caught or not.  We only know its approximate location, trajectory, and speed.

So an indirectly descriptive stat like UZR is not particularly great at predicting future defensive performance in small samples, and it is also not particularly great at describing, quantitatively, what happened in small samples.

For example, say we have one ball hit in the vicinity of the SS and he misses it.  UZR, using the stringer’s description of the batted ball, and its imperfect engine (which lumps lots of different kinds and locations of batted balls into the same bucket) might think that that particular kind of batted ball gets fielded 70% of the time and docks the fielder accordingly.

We KNOW that he should be docked something, because he didn’t catch the ball (therefore that part is perfectly and directly descriptive), but we don’t REALLY know how often an average fielder catches that kind of ball because we don’t REALLY know exactly what kind of ball it was and how catchable it really was.

Maybe, in reality it took a bad bounce (not recorded in the data) and instead of being caught 70% of the time, it would have been caught 20% of the time?
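Here is a simplified sketch of that zone-style credit scheme in Python, measured in plays rather than runs. Converting to runs would need a per-play run value that isn't given here. A catch earns 1 - p and a miss costs p, where p is the estimated chance an average fielder makes the play. The 70% and 20% figures are the ones from the example above.

```python
def plays_above_average(caught: bool, catch_prob: float) -> float:
    """Zone-style credit for one ball: +(1 - p) for a catch, -p for a miss,
    where p is the estimated chance an average fielder makes the play."""
    return (1.0 - catch_prob) if caught else -catch_prob


recorded = plays_above_average(caught=False, catch_prob=0.70)  # what the bucket assumes
actual = plays_above_average(caught=False, catch_prob=0.20)    # if it was really a 20% ball

# With the right p, an average fielder nets zero on this kind of ball over time.
average_fielder = 0.70 * plays_above_average(True, 0.70) + 0.30 * plays_above_average(False, 0.70)

print(round(recorded, 2), round(actual, 2), round(recorded - actual, 2))
```

The scheme itself is fine. The half-play error on that one ball comes entirely from misjudging p, and in small samples those errors don't get a chance to wash out.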

The defensive equivalent of a directly descriptive offensive metric would simply report whether the ball was caught or not and in what basic part of the field, like zone rating.

The offensive equivalent of UZR would be to assign a value to a single (and double, etc.) based on how hard it was hit, where on the field, etc. (like pitcher PZR).  That would be an indirectly descriptive stat, but you know what?  It would actually have much more predictive value just like UZR and similar metrics have more predictive value than stats like zone rating or range factor!

The only drawback to that kind of offensive and defensive stat is that if there is too much inaccuracy or bias in the data, it can become worse than the directly descriptive and more simple stat.

Anyway getting back to Carlos Lee, because we suspect that he is a bad defender in LF and he has had prior poor UZR numbers in LF, we suspect that his +7.4 is not only a poor predictor of future UZR, but is not a very good description of what actually happened (maybe he has played a little better than his true talent this year, but it is still not likely that he has played all that well in LF).

Which is why I HATE using un-regressed numbers for anything, especially indirectly descriptive stats. At least if a player plays well above his true talent level offensively, we can say (when using an un-regressed offensive stat), “Well, that is what he actually did. So what if he happened to get a lot of bloop hits, home runs just over the wall, walks on close pitches, or an inordinate percentage of ‘cookie’ pitches by chance alone? They all count!”

Not so with metrics like UZR which are not directly descriptive. An historically poor fielder with a positive UZR probably did not actually play well in the field and vice versa for an historically good fielder with a poor sample UZR…
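One common way to regress is to shrink the observed figure toward a prior, with weights n and k. Here is a Python sketch of that. Every number in it is hypothetical: the sample size n, the regression constant k (which in practice has to be estimated, e.g. from year-to-year reliability), the observed rate, and a prior of -8 runs per 150 games. That prior was chosen only to resemble the negative LF ratings Lee had in earlier years.

```python
def regress(observed: float, n: float, prior: float, k: float) -> float:
    """Weighted average of the observed figure (weight n) and the prior (weight k)."""
    return (n * observed + k * prior) / (n + k)


observed_rate = 10.0   # hypothetical: runs per 150 games in this year's sample
sample_size = 0.5      # hypothetical: sample size, in seasons
prior_rate = -8.0      # hypothetical prior, in the spirit of his earlier negative LF ratings
k = 1.5                # hypothetical regression constant, in seasons

print(regress(observed_rate, sample_size, prior_rate, k))  # -3.5
```

A bigger sample (larger n) pulls the estimate toward what was observed. A noisier metric (larger k) pulls it back toward the prior.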


#54    Brian Cartwright      (see all posts) 2011/09/09 (Fri) @ 00:37

Here are my numbers for Carlos Lee. For most years things are consistent, such as in 2008-10, where the ratio between obs and exp remains constant.

Problem is this year where the obs catch rate went up the same time the exp went down.

year pos team  fraa airb   obs   exp
2005   7  158   4.9  564 0.551 0.537
2006   7  158 -14.1  289 0.471 0.537
2006   7  140  -4.9  157 0.529 0.573
2007   7  117   3.6  461 0.514 0.506
2008   7  117  -8.6  276 0.493 0.533
2009   7  117  -8.0  380 0.458 0.509
2010   7  117  -8.8  294 0.476 0.516
2011   7  117   8.9  182 0.539 0.451
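The obs/exp ratio can be computed straight from the table. Here is a minimal Python sketch. It assumes obs and exp are the observed and expected catch rates on the airb balls, which is my reading of the column names.

```python
# (year, team id, air balls, observed catch rate, expected catch rate), from the table above
ROWS = [
    (2005, 158, 564, 0.551, 0.537),
    (2006, 158, 289, 0.471, 0.537),
    (2006, 140, 157, 0.529, 0.573),
    (2007, 117, 461, 0.514, 0.506),
    (2008, 117, 276, 0.493, 0.533),
    (2009, 117, 380, 0.458, 0.509),
    (2010, 117, 294, 0.476, 0.516),
    (2011, 117, 182, 0.539, 0.451),
]

ratios = {(year, team): obs / exp for year, team, _airb, obs, exp in ROWS}

for year, team, airb, obs, exp in ROWS:
    print(f"{year} {team} airb={airb:4d} obs/exp={obs / exp:.3f}")
```

The 2008-10 rows all land between about 0.90 and 0.93, while 2011 jumps to about 1.20.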

Looking deeper, in 2011 Lee is +3.8 runs on 26 LD by LHB, but he’s also +4.4 runs on 50 FB by LHB. On LD & FB by RHB and all ground ball hits, he’s about +1.

So almost all of his plus value this year has been on balls in the air by left handed batters. I might suspect a problem with LD vs FB classification, but it’s on both. But it’s only LHB, not RHB. Then again, it’s only 76 fly balls, so could be a small sample - but it bothers me that only 76 balls could swing his defense almost 20 runs after being very stable for three years running.
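To get a sense of how much pure chance can move 76 balls, here is a rough binomial sketch in Python. It assumes each ball is an independent catch/no-catch trial. The catch probabilities are hypothetical, and p = 0.5 gives the largest possible spread. Results are in plays; turning them into runs would need a per-play run value, which isn't given here.

```python
import math


def binomial_sd_plays(n: int, p: float) -> float:
    """Standard deviation, in plays, of the number of catches on n independent balls."""
    return math.sqrt(n * p * (1 - p))


n_balls = 76
for p in (0.5, 0.7, 0.8):  # hypothetical catch probabilities
    sd = binomial_sd_plays(n_balls, p)
    print(f"p={p}: 1 SD = {sd:.2f} plays, 2 SD = {2 * sd:.2f} plays")
```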

I will be looking into possibly not using LD & FB buckets, instead imputing a catch rate adjustment from the gb% of each pitcher the fielder played behind.


#55    Tangotiger      (see all posts) 2011/09/09 (Fri) @ 09:11

Brian, can you email the following columns for each year for Carlos Lee (2007-2011), other Astros LF, opposing Astros LF:

year
identifier (Lee, mateAstros, oppAstros)
actOuts (not rate)
BIP (batted balls sans HR)
GB%
IFFB%
OFFB%
LD%
expOuts (not rate)
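Asking for counts rather than rates presumably keeps the groups easy to combine, because counts add and rates don't. Here is a minimal Python sketch with hypothetical numbers. The record type and field names are mine, loosely following some of the columns above.

```python
from dataclasses import dataclass


@dataclass
class FieldingRow:
    """Hypothetical record following some of the requested columns."""
    year: int
    identifier: str   # "Lee", "mateAstros", or "oppAstros"
    act_outs: int     # actual outs, as a count
    bip: int          # batted balls in play, excluding HR
    exp_outs: float   # expected outs, as a count


def pooled_out_rate(rows: list[FieldingRow]) -> float:
    """Combine groups by summing counts first, then dividing."""
    return sum(r.act_outs for r in rows) / sum(r.bip for r in rows)


def outs_above_expected(rows: list[FieldingRow]) -> float:
    return sum(r.act_outs - r.exp_outs for r in rows)


rows = [
    FieldingRow(2010, "Lee", act_outs=100, bip=200, exp_outs=105.0),
    FieldingRow(2011, "Lee", act_outs=60, bip=100, exp_outs=52.0),
]

# Averaging the per-row rates ignores how many balls each row had.
naive = sum(r.act_outs / r.bip for r in rows) / len(rows)
print(round(naive, 4), round(pooled_out_rate(rows), 4), outs_above_expected(rows))
```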


#56          (see all posts) 2011/09/09 (Fri) @ 11:31

MGL: Thank you. That took a lot of time on your part and is much appreciated. See, your details are very good.  I’m not sure why, as a female (females being notoriously pegged as emotional), I’m always thinking in these threads: stick to the facts and away from the emotion.
But anyway, I still think Hippeaux has a better understanding than some readers perceived. I find it hard to believe anyone writing about sabermetric stats would not know that one-year samples are less reliable than two-year samples and so on (even I know this).  That being said, linear-weights-based stats are complicated (correct me if I’m wrong) because they combine context and raw data, the open-minded and the literal, individual people and averages, something no other stats can merge. It is a stat that has been groundbreaking and yet still raises a few (very few) questions.  The inherent nature of these stats will always leave room for both critique and application. Again, thanks. – Anna


#57    Tangotiger      (see all posts) 2011/09/09 (Fri) @ 12:04

The takeaway should always be this:

Is the way the data is being processed by WAR better or worse than what you would do yourself?

If you choose to discard WAR, then you have to create your own “smushing” system.  Is that personal smushing system more reliable, less uncertain?

***

Now, perhaps there is SOMETHING in WAR you can do better.  So, instead of discarding WAR, you keep most of it and replace, say, UZR with, say, the Fans Scouting Report (i.e., the eye test).  And, you say, that is more reliable and less uncertain.

And that’s fine, and acceptable.  And that’s why Fangraphs and Rally present the WAR data by its components: to make it easy for you, as a person, to make your smushing system more reliable and less uncertain than what Fangraphs and Rally show.
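Here is a minimal Python sketch of the "keep most of it, swap one component" idea. The component names and all run values are hypothetical. The total is left in runs above replacement rather than converted to wins.

```python
def runs_above_replacement(components: dict[str, float]) -> float:
    """Sum the run components; converting to wins is a separate step not shown here."""
    return sum(components.values())


# Hypothetical component values, in runs, as a site might publish them.
published = {
    "batting": 12.0,
    "baserunning": 1.0,
    "fielding_uzr": 7.4,
    "positional": -7.5,
    "replacement": 20.0,
}

# Keep everything else; swap only the fielding input for a hypothetical
# Fans Scouting Report-based estimate.
mine = {k: v for k, v in published.items() if k != "fielding_uzr"}
mine["fielding_fans"] = -3.0

print(round(runs_above_replacement(published), 1), round(runs_above_replacement(mine), 1))
```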

The framework of WAR is rock solid.  The components used in the various implementations can be discussed, but you have to bring an alternative.

“Slugger”, if that’s an alternative, can be included by you, but you can pretty much presume that most people will then discard it altogether.


#58    Brian Cartwright      (see all posts) 2011/09/09 (Fri) @ 12:14

Tango/55 - Yes I can do that, I have to gather up the data from different tables (one thing I must do soon is get one summary table).

Do you want the ballpark listed as well, and should that be for all Astros games within the time period?


#59    Tangotiger      (see all posts) 2011/09/09 (Fri) @ 13:01

For ballpark, just show:
park (MinuteMaid, elsewhere)

And yes, for all Astros games 2007-present.

Thanks…


#60    Rally      (see all posts) 2011/09/09 (Fri) @ 14:03

“‘Slugger’, if that’s an alternative, can be included by you, but you can pretty much presume that most people will then discard it altogether.”

I’ve seen this objection a few times and it really perplexes me, especially when some of the examples given are Miguel Cabrera (26 homers) vs. Ian Kinsler (28 homers) or Tulowitzki (30 HR) vs. Fielder or Howard (31 HR).

Anyone with any kind of notion that WAR is underrating sluggers: I would love to see an example, and a breakdown of exactly how and where your rating would differ.


#61    Brian Cartwright      (see all posts) 2011/09/11 (Sun) @ 23:58

Tango/59 - I sent you the data in an email

One thing I noticed is that there are many more balls coded as LD this year in Minute Maid. My LD park factor for that park is 0.85 over 2005-2011 (roughly .16 at home, .20 on the road), but looking at Lee, his Astros mates and the opposition in 2011, home LD rates are up to .20 or .21.

I’ve never really thought that LD park factors reflect varying amounts of true LDs between parks; I think they’re mainly a matter of perception, which Colin researched quite a bit, including in his article on the heights of press boxes.

So if this year LDs at Minute Maid are being classified at close to the MLB mean, rising from .16 previously to .20 now, then the extra .04 would previously have been recorded as FB. Therefore the expected out rate of an LD in 2011 would be higher than that of the LDs in 2005-2010. As I am assigning 2011 LDs the out rate observed over 2005-2011, I would be giving them too low an expected out rate, and thus giving undeserved credit to the fielder.

If this theory is correct, another reason to not use LD vs FB in the outfield.
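Here is a rough Python sketch of the bias described above. The LD shares (.16 before, .20 now) come from the comment. The out rates for true LDs and for the reclassified former FBs are hypothetical, and the knock-on effect on the remaining FB bucket is ignored.

```python
OLD_LD_SHARE = 0.16            # LD share at Minute Maid before 2011 (from the comment)
NEW_LD_SHARE = 0.20            # LD share in 2011 (from the comment)
LD_OUT_RATE = 0.30             # hypothetical out rate on balls that were always coded LD
RECLASSIFIED_OUT_RATE = 0.80   # hypothetical out rate on former FBs now coded LD


def true_ld_out_rate() -> float:
    """Out rate of the 2011 LD bucket, a mix of old-style LDs and reclassified FBs."""
    reclassified = NEW_LD_SHARE - OLD_LD_SHARE
    return (OLD_LD_SHARE * LD_OUT_RATE + reclassified * RECLASSIFIED_OUT_RATE) / NEW_LD_SHARE


# Credit a fielder picks up per LD-coded ball just by catching them at their true rate,
# when expected outs are computed with the historical LD out rate.
bias_per_ld = true_ld_out_rate() - LD_OUT_RATE
print(round(true_ld_out_rate(), 3), round(bias_per_ld, 3))
print(round(100 * bias_per_ld, 1), "unearned plays per 100 LD-coded balls")
```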


#62          (see all posts) 2011/09/12 (Mon) @ 13:09

It seems that combining the positional adjustment with fielding runs solves this guy’s problem with WAR ... in other words Lee and Tulo are no longer “equal fielders”.

I’ll admit that I routinely do not factor in Positional adjustments into fielding and I support the combining of both of them into the “defense” component of WAR.

I would prefer to have simply just a [1] defensive runs (UZR/Def + Positional) and [2] offensive runs (batting + baserunning), and I think brWAR does that.

