THE BOOK--Playing The Percentages In Baseball

Thursday, September 16, 2010

Wars of WARs

By Tangotiger, 09:05 AM

I’m going to link to Neyer’s piece mostly because I love his headline.  Rob also says:

But it’s probably not going to happen soon, and I guarantee none of those guys will listen to me. They’re subject to the same prejudices as the rest of us, prone to the same egos and interests. But they’re also more amenable to reason than most of us.

I just want to say that this has nothing, nothing at all, to do with ego.  But Rob is right that this is all about reason.

The WAR framework has developed over time.  It started at least as far back as Pete Palmer’s Hidden Game, where he started off with WAA (wins above average).  The basics were laid out: offense above average, defense above average, some sort of fielding position adjustment, some sort of relief adjustment.  After that came Bill James, who had offensive and defensive wins and losses, which he then compared against the “chance that this number of W/L was put up by a .400 player”.  And then Bill James again in 1987 comparing Clemens/Mattingly and Rice/Guidry, where the first fresh talk of replacement level came in (specifically noting that the replacement level for pitchers, in that article, was one run above league average). After that, there was (I’ve been told) BBBA, and also Woolner’s VORP followed by Clay’s WARP, and Bill James with Win Shares.  Those guys brought it up a notch.

I became involved at the old Fanhome boards (RIP), where we had many, many, many discussions of replacement level.  Then, about three years ago, I had what I wanted: the positional adjustment and the relief adjustment.  The positional adjustment was, I think, the last big piece needed to stabilize the WAR framework.  It was one of those things that had been bothering me for a long time, and with some help from the readers of this blog, I was able to crystallize it.  The relief adjustment was due in large part to Guy’s contribution, which, though minor in the grand scheme of things, was major as it related to the very small subset of pitchers out there.

And that’s where we are: the WAR framework.  The WAR framework is about offense compared to average, fielding compared to positional average, a positional adjustment without relying on the offense for that position, a playing time value (replacement level); on the pitching side, different baselines for starters and relievers (a concept first introduced by Woolner); a league adjustment.  Fangraphs and Baseball-Reference agree on this.  And, I suspect that Baseball Prospectus is trending toward this framework.  If we don’t have consensus, we are going to get it.  I’ve been dealing with the WAR framework and chatting with the readers of this blog for so long that I think that any of the major issues have been hashed out.
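As a rough illustration of how the position-player pieces listed above fit together, here is a minimal sketch in Python. The field names, the sample numbers, and the 10 runs-per-win conversion are assumptions made for illustration; this is not the formula used by Fangraphs or Baseball-Reference, and the pitching side (starter and reliever baselines) is left out.

```python
from dataclasses import dataclass

RUNS_PER_WIN = 10.0  # assumed rule-of-thumb conversion, not a site-specific value


@dataclass
class PositionPlayerSeason:
    """Hypothetical run values for one position-player season."""
    batting_runs: float       # offense compared to league average
    fielding_runs: float      # fielding compared to positional average
    positional_runs: float    # positional adjustment
    replacement_runs: float   # playing-time credit (average vs. replacement)
    league_runs: float = 0.0  # league adjustment


def war(season: PositionPlayerSeason, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Sum the framework's run components and convert runs to wins."""
    total_runs = (
        season.batting_runs
        + season.fielding_runs
        + season.positional_runs
        + season.replacement_runs
        + season.league_runs
    )
    return total_runs / runs_per_win


# Made-up season: 25 + 5 - 7.5 + 20 = 42.5 runs above replacement.
example = PositionPlayerSeason(
    batting_runs=25.0, fielding_runs=5.0, positional_runs=-7.5, replacement_runs=20.0
)
print(round(war(example), 2))
```

Two implementations can share this skeleton and still disagree, simply because they feed different numbers into `fielding_runs` or use a different replacement level.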

So, what’s the issue then?  Well, now that we have a framework, everyone wants their own implementation of that framework.  Not all houses are built the same, are they?  But they all have the same foundation, the same basic structure.  There are two major differences in the implementation (the houses) built by Fangraphs and Baseball-Reference over the WAR framework (blueprint) that has been developed and championed by me and other readers of this blog:
1. UZR v Total Zone: the fielding systems are different; they are different because one guy has access to more data than the other. 
2. DIPS v Runs Allowed: one takes a belief system that they are only going to rely on non-BIP for pitching, while the other takes the belief system that all runs are attributed to the pitcher, regardless of sequencing of events, with some generic team-based fielding adjustment

There are other minor issues, like how park factors are used, what kind of relief adjustment is made, what kind of AL v NL adjustment is made, how low or high to set the replacement level, among a few others.

But, they’ve both agreed on the WAR blueprint.  Now, the discussion is about the peripherals, about whether to use a two by eight or a two by ten, about whether to use solar panels, about whether to hardwire the smoke detectors or use wireless.

In no way will further discussions lead to invalidating one implementation or the other.  In no way should anyone choose a position based on ego.  The implementation I favor is whichever one I can back up with evidence, one that I can verify and stand behind.  I stand on the side of truth, or at least one of reason.  I have no agenda other than to expose holes and make sure there are no cracks in the foundation.  (e.g., If someone is going to use Runs Created instead of BaseRuns or Linear Weights, that’s a crack, and I will expose it.)

Until then, presume that all sides have something to add, and just take the midpoint of them all. 


#1    Rally      (see all posts) 2010/09/16 (Thu) @ 10:02

"1. UZR v Total Zone: the fielding systems are different; they are different because one guy has access to more data than the other.”

We could have access to the exact same data and they would be just as different.  Compare UZR to Pinto and Dewan.

We aren’t going to get a unified WAR unless we can prove the superiority of one defensive metric over the others.  The only thing I have to go on is that using Dewan’s led to better team defensive projections for the 2009 season.  Hardly conclusive.  Even if we are able to crown a king of the defensive systems, some of the sites showing WAR have access to that stat and some do not.  I don’t think not having the stat will make any of us decide to shut down.


#2    Rally      (see all posts) 2010/09/16 (Thu) @ 10:07

I do find Rob’s mention of ego a bit odd.  Among those who publish a WAR rating, I can’t recall anyone saying “Use mine, it’s better than all the others out there.”  In general, we encourage people to look at the others and, if so inclined, use the framework to create their own.

This idea of a war of WARs is external.  I think there are too many idiots out there who want us to fight it out, declare a winner, and publish the exact same number on every sabermetric site.  That way the idiots have a number they can use without having to think for themselves.


#3    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 10:24

If Clay hadn’t changed WARP, and if Colin wasn’t over there revamping everything, then there would be a fight.  Clay reasoned his way to the replacement level that the rest of us had bought into, and Colin is pushing Linear Weights, which again the rest of us accepted.  Once Colin puts his stamp on WARP, then we’ll likely have unification on the framework.

Where there IS a war is with the forecasting systems.  The bare basics are there (what Marcel does).  After that, there’s component-based forecasting, there’s minors and college forecasting, and there’s more age-specific forecasting, and possibly position-specific.  And similar players.  There’s a lot out there.

If one wants to criticize based on ego, I’d talk about that.  The lack of uncertainty ranges in the forecasts, the lack of oversight, those are big problems.


#4    Colin Wyers      (see all posts) 2010/09/16 (Thu) @ 10:47

Rally, can you tell me what your team defensive projection correlations would be if you dropped Seattle?


#5    birtelcom      (see all posts) 2010/09/16 (Thu) @ 11:31

I continue to prefer Total Zone to UZR not for accuracy reasons (which remain difficult to fully evaluate) but because Total Zone is capable of backwards historical application in a way UZR is not.  Because one of the great charms of baseball stats is their ability to give us broad historical context, even if a stat might be a little less accurate (not a lot, but a little), for me that would be offset by the advantages of being able to use that stat to evaluate not just contemporary players but historical performances as well.  Sure it’s good to be able to evaluate and compare the defensive value of Reyes, Jeter, Tulowitzki, Rollins in 2010.  And the accuracy of such comparisons is the most important factor for a general manager, or a rabid fan of a particular team, in 2010.  But as a general baseball fan (as well as a fan of a current particular team), I also want to know how those guys in 2010 compare on the same or similar scale to Ozzie or Aparicio or Pee Wee Reese.


#6    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 11:57

birtel: Imagine that you only had AB, HR, R and RBI pre-2002. And since 2002, you have everything.

Now, would you insist on using some combination of AB, HR, R, RBI for all players, or would you limit that to pre-2002, and use more information from 2002 onwards.

***

Suppose you only have PO, A, E, pre-1950, but you have “first fielded by” since 1950.  And you have “zone location” since 2002.  What are you proposing?

The correct answer: use as much reliable information as you have.


#7    Rally      (see all posts) 2010/09/16 (Thu) @ 12:52

"Rally, can you tell me what your team defensive projection correlations would be if you dropped Seattle?”

Why Seattle?

I could, but not sure when.  First of all I have to be home, and second of all I have to have enough free time at home to open the damn thing up, which did not happen last night.  Send me an email to remind me and I can do this eventually.


#8    Colin Wyers      (see all posts) 2010/09/16 (Thu) @ 13:03

Because dropping Seattle causes the single biggest change in the year to year correlations for DER from ‘08 to ‘09.

And thanks. I’ll drop you a line here in a bit.


#9    Matt K      (see all posts) 2010/09/16 (Thu) @ 13:03

Agreed on all points. I’ll add that while I understand why disagreement on details of the framework and/or components or WAR is disturbing to some, such reactions tend to overlook the advantage of the “different WARs” being out there. Disagreement, debate, etc. are what lead to progress. It’s a sign of vitality. It’s how progress happens in knowledge-accumulation in general, and sparks further research, discussion, debate, which leads to refinement of our models and tools, which leads to more discussion, debate, refinement, and so on.

We accept the multiplicity of “research programs” in natural science and social “science,” why can’t we accept it in sabermetrics?


#10    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 13:07

Some of “we” do.

And some of “we” don’t.


#11    B      (see all posts) 2010/09/16 (Thu) @ 13:54

I’m not sure why a few forward thinking mainstream writers (mostly Law and Neyer that I’ve seen) have gotten ahold of the idea we need one version of WAR.  It doesn’t make much sense to me.  There are different ways of looking at things that are equally valid - whether you want a context neutral linear weights approach, or you want to take context into account (like using WPA or something) isn’t a matter of right/wrong - it’s a matter of preference and what information you want.  Sometimes one works better, sometimes the other does.  For instance, in terms of whether you want context, if context is just random variation, taking it into account will do a better job of describing the past, but a worse job of predicting the future.  That doesn’t make it right or wrong to use context, it just depends on what information you’re looking for.

So I just find the idea that we need one WAR...strange.  We don’t.  But then again, I guess that’s what your post is saying, Tango.  ;)


#12    Rally      (see all posts) 2010/09/16 (Thu) @ 14:08

I’d kind of like to see a ‘perfect’ version of WAR - one that uses WPA for its components, but is able to use detailed PBP records to distribute credit for events between batter/runner on offense, and much tougher, pitcher/fielder on defense.  One example is a double scoring a runner from 1st base - use hit location data to determine how much credit for the extra base goes to the runner or the batter.

Do that and still account for position adjustments and the different replacement levels for starter and reliever and you’d have a WAR implementation that I would bow down to.


#13    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 14:26

One example is a double scoring a runner from 1st base - use hit location data to determine how much credit for the extra base goes to the runner or the batter.

Definitely.

I’ve been repeating this for years, but this is how I think: “Given this context, what would an average player have done?”

So, how would a runner at 1B respond to a ball hit 280 feet with 3 seconds of hang time, and two bounces to the outfielder, at +14 degrees from 2B?

Until then, I’d be happy to say that any player within 1 WAR is in a reasonable position to be argued for.  That’s not to say they are equals.  And the further you are from 1 WAR, the more justification your argument needs.  Jimmy Rollins IIRC was one such player where WAR didn’t have him at #1, but he was close enough that you can make a reasonable argument.

WAR is a great way to whittle the list down to about 10 or 20 players to talk about.
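The "within 1 WAR" rule of thumb above can be sketched as a simple filter. This is only an illustration: the function name, the players, and the WAR values are made up, and the 1.0 margin is the one suggested in this comment rather than a measured error bar.

```python
def shortlist(war_by_player: dict[str, float], margin: float = 1.0) -> list[str]:
    """Return the players within `margin` WAR of the leader, best first."""
    if not war_by_player:
        return []
    top = max(war_by_player.values())
    close = [(name, w) for name, w in war_by_player.items() if top - w <= margin]
    return [name for name, _ in sorted(close, key=lambda pair: pair[1], reverse=True)]


# Made-up numbers for hypothetical players.
candidates = {"Player A": 8.1, "Player B": 7.4, "Player C": 6.9, "Player D": 7.6}
print(shortlist(candidates))
```

Everyone on the returned list is a defensible pick; the further a player sits below the cutoff, the more justification the argument for him needs.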


#14    B      (see all posts) 2010/09/16 (Thu) @ 16:53

@Rally - is that WAR “perfect”, though?  Seems like it would be perfect at describing exactly how much an individual contributed to his team over a given time period, but what if you’re a GM looking to sign a player and figure out how much to pay him?  Would it be the best WAR framework for predicting future performance so you can properly value the player?


#15    birtelcom      (see all posts) 2010/09/16 (Thu) @ 17:48

tango #6: We have a lot of information going back to 1920, and even before, that can narrow the gaps in our knowledge of “opportunities” on defense, to supplement our knowledge of successful conversions of opportunities as expressed in A and PO.  We know team level BIP numbers, we know pitching staff handedness, we have a very good idea about fly ball and ground ball ratios—all from box score info that goes way back. We shouldn’t set up a straw man opposition between supposedly having just unrefined range factor for a hundred years and zone locations after 2002.  We can do a lot better than unrefined range factor with box scores and, as you point out, with play-by-play from 1950 on.

On the other side of the equation, zone location brings in a certain amount of subjectivity that box score numbers don’t.  I’m not disagreeing with you that more info is better, but I am saying that if there is data available only after 2002 which is only slightly more reliable than data available for much of baseball history before that, my own personal tendency will be to look most often at the numbers that can be applied broadly across eras, even if that might mean losing a small amount of accuracy with respect to the most recent players, while of course also recognizing that for purposes of comparing current players to one another, the current state-of-the-art is there to be used as the most accurate info.  No disagreement that more data is better, and different forms of data have their different roles to play.


#16    Toffer      (see all posts) 2010/09/16 (Thu) @ 19:36

@B

WAR is a descriptive stat. It is not a projection system. Most people want to know how a player actually played. GMs can certainly adjust their numbers to better predict how players will perform in the future but when it comes to WAR I’m sure the vast majority of fans prefer it to describe what actually happened and not what should have happened.


#17    Alexander      (see all posts) 2010/09/16 (Thu) @ 19:39

I would like to see Fangraphs or Baseball-Reference offer context WAR, context-neutral WAR, luck-free WAR, and projected WAR based on the offensive and defensive projection currently.

Another thing on my wish list is a page of the average value in a variety of stats, such as the average pitcher FIP, the average starter FIP, and the average reliever FIP, and the same thing for other stats. And maybe “replacement” level for each stat (yes I know it’s not really replacement because WAR contains many factors). (Sorry for being off-topic on that one.)


#18    tangotiger      (see all posts) 2010/09/16 (Thu) @ 19:58

Average pitcher FIP = average ERA, just like average wOBA = average OBP
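One way to see why this holds: in the commonly published form of FIP, an additive constant is chosen each season so that league FIP matches league ERA (wOBA is scaled to league OBP in the same spirit). A minimal sketch, assuming that common form with weights 13, 3 and 2, and using made-up league totals:

```python
def fip_core(hr: float, bb: float, hbp: float, k: float, ip: float) -> float:
    """FIP before the league constant: (13*HR + 3*(BB+HBP) - 2*K) / IP."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era: float, hr: float, bb: float, hbp: float, k: float, ip: float) -> float:
    """Constant that forces league FIP to equal league ERA."""
    return lg_era - fip_core(hr, bb, hbp, k, ip)


def fip(hr: float, bb: float, hbp: float, k: float, ip: float, constant: float) -> float:
    """FIP on the ERA scale."""
    return fip_core(hr, bb, hbp, k, ip) + constant


# Made-up league totals, for illustration only.
league = dict(hr=4600, bb=15800, hbp=1500, k=34000, ip=43300)
LG_ERA = 4.10

C = fip_constant(LG_ERA, **league)
league_fip = fip(constant=C, **league)
print(round(C, 3), round(league_fip, 2))
```

The same approach answers the starter/reliever question: apply `fip` with the league constant to the totals for starters only, then for relievers only.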


#19    Rally      (see all posts) 2010/09/16 (Thu) @ 21:53

Colin,

To answer your question on removing Seattle from the correlation, all the systems look worse but the ordinal ranking is not much different.  I get

Dewan .21
Pinto .20
TZone .12
UZR .09
ZR .03

All the systems predicted Seattle to have a good defense, in the +30 to +40 range, but they were even better than that.  That Seattle’s DER was completely different in 2008 than 2009 doesn’t throw any of these projections off, they all knew Gutierrez was playing center and Endy Chavez (among others) replaced Ibanez, and that Jack Wilson would replace Betancourt after half a year.


#20    Nick Steiner      (see all posts) 2010/09/16 (Thu) @ 22:13

The problem isn’t even that the WAR detractors are bothered by the different frameworks in this case - they’re bothered by an update of data.  Sean didn’t change any of his calculations, just added data to the park factors so that he could better gauge the correct park factor.  Honestly, that should be expected in, well, anything, so that’s not a valid reason to mistrust more.


#21    B      (see all posts) 2010/09/16 (Thu) @ 22:40

@Alexander - that’s kind of what I’ve been getting at - there are legitimate reasons to want different kinds of information for different uses.  That doesn’t make one better or worse than the other, they’re just different and have different uses, so why would we want to agree on just one?

@Toffer - why wouldn’t a GM want to use WAR as a framework, though?  There’s no reason we can’t have one WAR to be predictive, and one descriptive of the past.  Plus, maybe one person wants a “luck free” WAR, while another wants to know exactly what a player did to help his team win games - basically, look at Alexander’s list of things he’d like to see, why wouldn’t we want a WAR for all of those?  There’s a reason to look at them all…


#22    Alexander      (see all posts) 2010/09/16 (Thu) @ 23:06

@18
I also meant things like average wOBA for every position (what wOBA makes a player with average defense an average player). The average FIP this year according to Fangraphs is 4.10. But what is the average FIP or ERA for starters and relievers each? I’m sure you could figure it out with a little math, but I think a page of averages for a bunch of stats would help sabermetric novices and relatively “sabermetric” fans alike. And yes, I know Fangraphs displays the averages for many stats when you hit “Show Averages.”


#23    kds      (see all posts) 2010/09/16 (Thu) @ 23:28

I don’t think that one WAR should be a goal because there will always be different purposes and uses.  There is a spectrum between pure value, looking back and pure ability, looking forward.  I can understand a preference for either linear weights or WPA, at the two extremes.  For most situations I prefer RE24 and also like WPA/LI.


#24    Lee Panas      (see all posts) 2010/09/17 (Fri) @ 00:01

I like seeing different versions of WAR for the reasons stated.  Mainstream writers want to see a single version of WAR because that’s what their readers want.  Most fans do not understand the differences between WARs and just want a number they can refer to and interpret easily. When they see differences between WARs or they see a WAR getting updated in the middle of a season, they don’t understand it.  It looks sloppy to them.

I think all the different variations are good for those of us who are really into stats, but not good for making WAR friendly to the mainstream.  I don’t think it’s an easy problem to solve.


#25    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 00:31

Rally, that’s sort of what I expected to happen. It’s interesting - I don’t know that 2009 is a very representative year, and so I don’t know how much the result tells us.

Looking at 1993-2009, the year to year correlation for team DER is .40. But looking at just ‘08 to ‘09, it’s -0.22.
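For anyone who wants to repeat the "drop one team" check discussed in this thread, here is a minimal sketch of a year-to-year correlation with each team removed in turn. The team labels and DER values are invented to show how one team can swing the result; they are not the 2008 or 2009 numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def leave_one_out(prev, curr):
    """Correlation recomputed with each team dropped in turn."""
    teams = sorted(prev)
    results = {}
    for dropped in teams:
        kept = [t for t in teams if t != dropped]
        results[dropped] = pearson([prev[t] for t in kept], [curr[t] for t in kept])
    return results


# Hypothetical team DER in two consecutive seasons.
der_prev = {"AAA": 0.680, "BBB": 0.690, "CCC": 0.700, "DDD": 0.710, "EEE": 0.685}
der_curr = {"AAA": 0.682, "BBB": 0.692, "CCC": 0.702, "DDD": 0.712, "EEE": 0.720}

teams = sorted(der_prev)
full = pearson([der_prev[t] for t in teams], [der_curr[t] for t in teams])
loo = leave_one_out(der_prev, der_curr)
biggest = max(loo, key=lambda t: abs(loo[t] - full))
print(round(full, 2), biggest, round(loo[biggest], 2))
```

With only 30 teams in a real season, a single large mover can change the correlation a lot, which is one reason to be cautious about any one year.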


#26    MGL      (see all posts) 2010/09/17 (Fri) @ 01:09

I don’t understand how you can use, for example, UZR, to try and estimate team DER when UZR is park adjusted and DER is not. Are you park adjusting DER?  If not, then the non-park adjusted defensive stats will do a lot better than the park adjusted ones in terms of predicting team DER.


#27    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 02:00

Rally is, I believe, park adjusting DER. I can’t speak to the specific nature of the adjustments. (He’s also adjusting for observed batted ball scoring in those parks, IIRC.)

I was not park adjusting DER at all (it was a pretty lazy calculation, all told, based upon official pitching stats) - I was just illustrating that 2009 was an odd year for predicting team defense.


#28          (see all posts) 2010/09/17 (Fri) @ 07:55

Why don’t we simply let data determine the fate of each house? If the year-to-year correlations are better for one WAR than for another, shouldn’t that implementation be the preferred choice? Aren’t we simply trying to explain past performance while predicting future performance?


#29          (see all posts) 2010/09/17 (Fri) @ 08:02

I think the “joe lunchpail” crowd (is that too harsh? ..... maybe “unwashed masses” .... naw that’s a bit archaic ... how about “teeming throngs”???) tend to dismiss any statistic for which multiple values appear

You could talk all day long about DER, DIPS, UZR and WTF but you’d have the same chance of success explaining differential equations to a hamster

So it appears there are 2 paths forward here

1) The community agrees on a common WAR and now Freddy Fan can chug a beer while belching out “yeah but Jeter has a career WAR 45 wins better than Nomahhhhhh”

2) We continue with multiple versions of WAR and those that understand the differences are free to use it or not use it (at least until Mom needs to use the basement for laundry or her Mah Jong club)


#30    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 09:12

I think we are going to get to a point where a popular WAR implementation will become de rigueur.

I think we’re still feeling our way into which direction should ultimately prevail, not to mention we’re pretty early in the WAR implementation lifecycle.

The WAR framework that was popularized here is about three years old, and Rally / Fangraphs came out with their implementation, what, about a year and a half ago?

The major players would need to convene a summit to standardize some of the major things, like UZR v TZ, whether to include clutch (WPA or not, or by how much), how much DIPS, park factors, etc.

As it stands, there’s not much incentive to do this.  What’s going to change if we standardize WAR?


#31          (see all posts) 2010/09/17 (Fri) @ 09:54

As was mentioned earlier, part of the problem people have with WAR is how much it can differ depending on the defensive estimator used.  Is there any disadvantage to sticking UZR, TZ, DRS and RZR (and anything else I might be missing) on the same scale and aggregating them for a WAR fielding component?


#32    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 10:06

The only thing you can NOT do is merge fielding metrics where one is a subset of another.  For example, it makes no sense to have both UZR and ZR (not that you were proposing that one).  UZR is ZRplus.  It would be like mixing wOBA with OBP: wOBA is OBPplus, or SLGplus.

Now, what does TZ do differently from UZR that makes it a non-subset?  I’ll let Rally answer that one.

With DRS, it uses the same dataset as UZR, but is it a subset, or looks at things in different ways?  I’ll let Dewan answer that.

I can only speak for my metric (WOWY), and that one is decidedly different.  It takes the position that hit locations are unreliable, and therefore dismissed entirely.  It instead presumes that a pitcher has a “style” of batted ball distribution and that a batter has a style and that a park has a style, and that focusing on that is “good enough”.  Is it?  Not for one year, no.  But, after a certain number of years?  Yes, I would say.  I would say that the biases in recording data will be systematic in UZR’s case such that after 6+ years, those biases will remain.  But in WOWY, it’s more likely that all those things will wash out once you account for who the pitcher, batter, and park is.

There are many many other fielding metrics out there, each taking its own position on reliability of data.

In the end, once FIELDf/x takes shape, all this will be meaningless, since all these other metrics estimate what FIELDf/x will be recording.


#33          (see all posts) 2010/09/17 (Fri) @ 10:17

This isn’t directly related to the dissection of WAR you’re having here, but it’s been on my mind a while, and won’t go away, so I’m going to bring it up.

Why are people having such a hard time understanding that there are different WARs? We totally accept that people can buy the same goods with different currencies.

A person can use Dollars, Pounds, Euros, or Pesos to measure the value of something. No one has a problem with that. Find the exchange rate and use the printed paper you think is prettiest.

I’m not seeing why Player X can’t be measured with fWAR, brWAR, VORP, or WARP and that some exchange rate will get you to the currency you want.

While there are some changes in the relative position of players depending on the scale used, this is a feature, not a bug, and the kinks are still being worked out.


#34          (see all posts) 2010/09/17 (Fri) @ 10:30

Kevin/31, no one has found any accuracy advantage in doing that.  It seems to make some people feel better to do it, though.


#35          (see all posts) 2010/09/17 (Fri) @ 10:42

Why doesn’t it bother people just as much that BABIP has different definitions based on which site you go to?

Is the answer to that to just take the midpoint of BABIP from Fangraphs and Baseball-Reference until they can work it out?

This whole discussion is a bit surreal to me.


#36    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 10:49

I think the difference is that there’s nothing more to do with WAR than… look at it.

With BABIP, you take it, and you do stuff with it.

That’s why people don’t like the “single number”, because they feel obviated.


#37          (see all posts) 2010/09/17 (Fri) @ 11:00

"As it stands, there’s not much incentive to do this.  What’s going to change if we standardize WAR?”

You’ll spend 15 minutes less each day in discussions like these?

The biggest advantage to doing it is if you have compelling evidence that one approach is better than the other.  Otherwise, having two (or three or four) is better than one.


#38    Rally      (see all posts) 2010/09/17 (Fri) @ 11:04

ZR used by Chris Dial is the only system that still uses STATS data.  UZR, Dewan, PMR are all using BIS.  To me that alone is enough reason to pay attention to it.

TotalZone is using Gameday data.  That’s enough for me right there, but for 2005-2009 I’ve got MLB hit location data instead of the estimates I had to make do with before.  At this point I’ve got most of the same inputs UZR has, unless MGL is getting something new like hang time for flyballs in there.  To the extent we come up with different numbers it is not clear if one system gets more right than the other.

I don’t see any problem with combining multiple fielding measures for a consensus WAR.  It’s not practical for a website publisher to do that - we can only use the systems we own or work out an agreement to use.  But the data is easy to download, go right ahead at home and make up your own lists.

Field F/X only changes the equation if we have access to it.  From what I can tell, that is not going to happen any time soon.


#39    B      (see all posts) 2010/09/17 (Fri) @ 11:16

"Aren’t we simply trying to explain past performance while predicting future performance?”

Well Rob, one version of WAR might perform better at explaining past performance while a different version performs better at predicting future performance, so that’s kind of the point.  There are different things we might want to know for legitimate reasons, and we might need multiple versions to do that.


#40    Will23      (see all posts) 2010/09/17 (Fri) @ 12:39

I don’t like the idea of “averaging” different WARs because if one is better, then why mitigate that with a lesser version? Of course, determining which is better is the issue.

WAR’s burden is it isn’t contributing to analysis...it is doing it. If you are going to rely on it, you really can’t have qualifications. That’s why it would be nice if some “real” consensus could be reached, and perhaps that’s slowly where we are headed. I mention “real” because a contrived compromise wouldn’t be much help. In the meantime, it is hard to have confidence in competing versions, at least until one distinguishes itself.


#41    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 12:56

The only consensus you are going to get is: “Use fWAR when...” and “Use rWAR when...”.

Seriously, for MVP talk, I definitely want clutch (or WPA) in there in some form.  For contract talk (how I value players), I don’t want clutch there except for the smallest of portions.

It’s against the very idea of a WAR framework that there be only one implementation.  It’s like saying that if you have a screwdriver, it must be a Phillips.  Why not the flathead?  Or, for that matter, vodka and OJ?


#42          (see all posts) 2010/09/17 (Fri) @ 13:00

The only thing you can NOT do is merge fielding metrics where one is a subset of another.

If we’re merging them simply because it feels good, rather than on the basis of any evidence that it improves accuracy, why can’t we merge whatever fielding metrics we want?

This is not sabermetrics, it’s a fielding metric buffet: take however big a helping of each dish as you like.

You don’t like both ZR and UZR on your plate, but other people might prefer that combination of flavors.


#43    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 13:26

You don’t like both ZR and UZR on your plate, but other people might prefer that combination of flavors.

MGL could create UZR1, UZR2, UZR3, UZR4, and UZR5.  UZR1 would be basic ZR.  UZR2 would make the zones more granular.  UZR3 is UZR2 + park adjustments, and so on.

Or I could create Marcel1, Marcel2, Marcel3, each of which takes all the things from the previous one, and adds more.

Now, why in the world would you do (Marcel1+Marcel2+Marcel3)/3 ?

You either buy into Marcel1 because you don’t trust the adjustments in Marcel2, or you buy into Marcel2, and discard Marcel1.  To use both makes no sense.  It’s like saying: “I kinda buy into the Marcel2 adjustments, but not totally, so I’ll accept Marcel1 and Marcel2 at the same time.”


#44    Lee Panas      (see all posts) 2010/09/17 (Fri) @ 13:31

I think the advantage of averaging fielding metrics is to get a more conservative estimate.  The idea is that when there is a lot of uncertainty in a metric, it might be better to err on the side of caution.  For example, a player has the following numbers:

UZR +20
DRS +8
TZ +6
PMR +6

If UZR is your metric of choice, you would add 2.0 to the WAR. If +20 is an outlier that just doesn’t work for this particular player, you would arrive at a WAR estimate that is very inaccurate. If you take the average (10) instead, the effect of the outlier would be alleviated.  The disadvantage, of course, is that the +20 might actually be right and you’ll be underestimating the player’s defensive value by taking an average.

I think it has been shown that averaging will not improve the accuracy of projections.  However, it might decrease the number of times you are very wrong about a player. Has this been studied at all?  When you project into the future, do you get fewer really large deviations using an average rather than a single measure?
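
To make the arithmetic in this example concrete, here is a minimal sketch. It assumes 10 runs per win, which is only implied by the "+20 means add 2.0 to the WAR" statement above, and the variable names are made up for illustration.

```python
from statistics import mean

# Fielding runs above average for the example player in the comment above.
fielding_runs = {"UZR": 20, "DRS": 8, "TZ": 6, "PMR": 6}

# Assumption: 10 runs per win, implied by "+20 ... add 2.0 to the WAR".
RUNS_PER_WIN = 10


def wins_from_runs(runs):
    """Convert fielding runs to wins under the assumed runs-per-win rate."""
    return runs / RUNS_PER_WIN


# Using UZR alone versus the plain average of all four metrics.
uzr_only_wins = wins_from_runs(fielding_runs["UZR"])
average_runs = mean(fielding_runs.values())
average_wins = wins_from_runs(average_runs)

print(f"UZR only: {uzr_only_wins:.1f} wins")
print(f"Average:  {average_runs} runs, {average_wins:.1f} wins")
```

The gap between the two choices is a full win for this player, which is the size of the risk being traded off in either direction.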


#45    Mike      (see all posts) 2010/09/17 (Fri) @ 13:38

"To the extent we come up with different numbers it is not clear if one system gets more right than the other.”

The fact that it rates Ryan Zimmerman at replacement level defensively over his career tells me everything I need to know about the accuracy and worth(lessness) of TotalZone.

If it can mess up this bad on a player that every other metric (plus the subjective opinions of scouts, writers, fans and MLB coaches) says is at elite level defensively, then it can’t be trusted.


#46    Rally      (see all posts) 2010/09/17 (Fri) @ 13:47

Cherry picker.


#47    Mike      (see all posts) 2010/09/17 (Fri) @ 13:55

I was expecting a claim of “outlier”.  Close enough, I guess.

No defense, I suppose ?


#48          (see all posts) 2010/09/17 (Fri) @ 14:01

I’m curious about the usage of FIP in determining pitcher value. It seems to attempt to separate out things like luck and defense (by ignoring hits and focusing on indicators), but then doesn’t account for things like fluky home run totals. Additionally, hits do play a big part in runs, and just because we don’t understand them well enough (I mean beyond the superficial level of ld, fb, gb rates), it seems incorrect to ignore their effect on ERA. Namely, some guys just don’t give up a ton of hits (like CY and Cain) and some guys do (like Nolasco and Liriano).

You mentioned not using a combination of equations because either you believe in one system or another. But why not in this case when one system is useful but not entirely valid and another might possibly capture elements that are ignored by the first? Instead we use the dubious term of luck to explain shortcomings in the algorithm. Someone like Matt Cain will always be shortchanged until we understand why the algorithm can’t explain his success.


#49    Rally      (see all posts) 2010/09/17 (Fri) @ 14:19

Well Mike, I don’t know why your crack even deserves a defense.  You pick out one player whose rating you don’t agree with and that invalidates the whole system?  I’m sure I could find a similar case for UZR, or DRS, or any of them if I wanted.  Would a single case invalidate every system? 

I’m trying to judge the systems on a bit more than that.

You need to explain why you chose the term “replacement level” defense instead of “average defense”.  I think you are doing it for effect instead of any kind of attempt at a rational thought. 

Maybe it’s just an unfortunate result of Sean Forman’s recent split into offensive and defensive replacement level.  But keep in mind that it is the exact same thing as average, and if he were to put some number in there to make the Adam Dunns come out at zero on defense, and the average players who play every day at some positive number, he’d be following the mistaken path that Davenport did for so many years.

Finally, about Zimmerman: once I added the MLB hit location data, he does come out at +12 for the 2005-2009 period.  This has not been updated on BB-ref; I think it’s something Sean Forman wants to put in place for next year, but it can be found on Fangraphs.


#50    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 14:30

You don’t like both ZR and UZR on your plate, but other people might prefer that combination of flavors.

Sticking with the food analogy, choosing between ZR and UZR (presuming they are both based on the same datasource), is like choosing between a chocolate donut with and without sprinkles, and you decide to have half of each.

If they are based on different datasources, that would mean a chocolate donut at Tim Horton’s and a chocolate donut with sprinkles at Krispy Kreme.  In this case, you ARE justified to take half-and-half.

Namely, some guys just don’t give up a ton of hits (like CY and Cain) and some guys do (like Nolasco and Liriano).

If this were true, then we would not be so fond of DIPS and FIP.

I’m sure I could find a similar case for UZR, or DRS, or any of them if I wanted.  Would a single case invalidate every system? 

Right, you’ll always have someone.  I mean, bUZR and sUZR had Andruw Jones over a 7yr period at 112 run difference.  That doesn’t invalidate UZR, but it calls into question the data source.

If Zimmerman, who could just as easily be playing SS, rates low in TZ, this could very well be an issue with Gameday.

Rally: can you break it down home/away?

I have to say that there is a small, but not insignificant, data quality issue with Gameday.  But, that’s because Gameday was not looking for precision in tracking of hit locations.  Perhaps they are more serious about it this year, but that wasn’t the case at startup.


#51          (see all posts) 2010/09/17 (Fri) @ 14:53

The problem is in how people use WAR. Currently, WAR is at the center of the MVP, CYA, ROY races, etc.

So, fWAR says Cliff Lee is 1-2 in CYA, rWAR says he’s out of the top 10.

2 WAR is a pretty big gap. It’s big when it’s the diff between replacement and league average, but it gets even bigger (IMO) when the value increases, due to there being fewer high WAR players.

The WARs will generally agree, but at times they will not. If “people” do not intend for fans to use WAR in meaningful ways, to influence important decisions, then it’s no biggie ... everyone can have their own pet WAR and it’s no big deal because no one would use WAR for anything important.

But, as I said, WAR seems to be at the center of many “award discussions” or “who’s best”, and the like.

My preference, and I’m being repetitive, is to average them. One system of WAR removes all batted balls, the other accepts them. When you average them together, you are giving the pitcher *some* credit for not giving up hits and runs, which is (IMO) how it should be. It’s not a 0% or 100% situation. It’s not all luck or all defense. Until we know better, we should assume that pitchers do have some influence over hits and runs allowed.

The crazy thing to me is that while Cliff Lee was in his 5 game streak of giving up a ton of hits and runs, he was rising up the fWAR leaderboard (all the way to #1, currently #2) simply because the hits were not homers and he doesn’t walk anybody.

There are so many aspects involved and perhaps data required that we currently do not have access to, that we may never know what % is pitcher influence, what % is defense, and what % is luck/variation ... but common sense indicates to me that the pitcher should be given *some* credit for the stats, even if it’s not repeatable. Career high/low BABIPs count for batter WAR, why not pitcher WAR?


#52    Mike      (see all posts) 2010/09/17 (Fri) @ 15:05

Rally -

If it were just me disagreeing with it, then you’d have a point.  But it’s not, it’s basically everyone.  Some are just willing to overlook it or are too diplomatic to say anything about it.

Average defense = replacement level defense.  At least, that’s what Sean Forman said in his blog post explaining dWAR. But I’ll refer to it as average if you’d prefer.  The difference between an average level defender and an excellent one is significant.

UZR has Zimmerman at 59 career.  DRS ?  79.  TZ ? 2.  That’s a massive difference for less than 5 full defensive seasons.  ~12-15 runs per season.

Go ahead and try to find one.  Just one.  Just one player whose UZR or DRS disagrees that much with either UZR&TZ or DRS&TZ over that much time.

That is to say, from elite to average.  Or even from average to awful.  Or vice-versa.

Good luck.


#53    Lee Panas      (see all posts) 2010/09/17 (Fri) @ 15:08

Circle Change, I think it makes a lot of sense to average fWAR and rWAR for pitchers.  It might be a weighted average depending on whether you prefer runs allowed or FIP stats, but some kind of combo could work. I think it’s something that can be justified and might serve as a compromise in the FIP versus RA debate. 

I don’t know if I’d do the same thing for position players, although I might average some defensive numbers before adding to the offense.
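
A weighted average of the two pitcher WARs could be sketched as below. The function name, the weight parameter, and the 6.0 / 4.0 inputs are all hypothetical; nothing here says what the right weight is, which is exactly the open question in the FIP versus RA debate.

```python
def blended_pitcher_war(fip_war, ra_war, fip_weight=0.5):
    """Blend a FIP-based WAR and a runs-allowed-based WAR.

    fip_weight is the share given to the FIP-based number; the rest
    goes to the runs-allowed number. 0.5 is a plain average.
    """
    if not 0.0 <= fip_weight <= 1.0:
        raise ValueError("fip_weight must be between 0 and 1")
    return fip_weight * fip_war + (1.0 - fip_weight) * ra_war


# Hypothetical pitcher whose two WARs are 2 wins apart.
plain_average = blended_pitcher_war(6.0, 4.0)            # equal weights
ra_leaning = blended_pitcher_war(6.0, 4.0, fip_weight=0.25)

print(plain_average, ra_leaning)
```

With equal weights the pitcher lands at 5.0; leaning toward runs allowed (25% FIP) gives 4.5.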


#54    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 15:11

FWIW, Zimmerman is +61 career (through 2009) plays in WOWY, which is close to +50 runs.

I agree it is bothersome if he’s that low in TZ.  Then again, I’m sure there are a few players that WOWY shows as average, but everyone else shows as elite.  All this means is have a couple of fielding systems handy.


#55    Rally      (see all posts) 2010/09/17 (Fri) @ 15:49

These are only 3 years of data, since that is the biggest year range you can select on Fangraphs.  I found 3 whose ratings between UZR/DRS are very different:

JD Drew -4 or +23
Orlando Cabrera -29 or +6
Yunel Escobar +49 or +12

Per year difference is 9, 12, and 12.

Zimmerman is +12 by TZ, so the per year difference is 10 compared to UZR, 13 from DRS.

And UZR/DRS can’t even put the blame on using different data sources!
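
For anyone checking the per-year figures, a quick sketch of the arithmetic, using the three-year pairs quoted above (the dictionary and function names are just for illustration):

```python
# Three-year rating pairs as listed in the comment above.
three_year_ratings = {
    "JD Drew": (-4, 23),
    "Orlando Cabrera": (-29, 6),
    "Yunel Escobar": (49, 12),
}
YEARS = 3


def per_year_gap(rating_a, rating_b, years):
    """Absolute disagreement between two ratings, spread over the years covered."""
    return abs(rating_a - rating_b) / years


gaps = {
    name: per_year_gap(a, b, YEARS)
    for name, (a, b) in three_year_ratings.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.1f} runs per year")
```

Rounded to whole runs, these come out to the 9, 12, and 12 given above.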


#56    Mike      (see all posts) 2010/09/17 (Fri) @ 15:52

Tango -

WOWY seems in line with UZR and DRS, more or less.

It is indeed disturbing that he’s that low in TZ.  If I were an architect or supporter of TZ, I’d be even more disturbed that “my” system disagrees that much with, well, basically everyone else.  Even if it is a single player.  I’d want to know why.


#57          (see all posts) 2010/09/17 (Fri) @ 16:04

For pitcher WAR, why isn’t there a middle ground between DIPS and runs allowed?  Wouldn’t it be possible to quantify the value of pitch types (categorized by speed, movement, location, etc.) based on an “average” fielding team?


#58    Mike      (see all posts) 2010/09/17 (Fri) @ 16:04

Rally -

I’m seeing for the last 3 years of TZ:

Drew: +25

Cabrera: -27

Escobar +39

So you’d have a case there, albeit with only 3 years of data.  And a valid point concerning the data source.

When you have such disagreement between metrics over a player, I think one should see what the subjective opinions have to say on the matter.  Do you agree ?


#59    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 16:11

Rally: what I’ve always suggested to MGL is to have different versions of UZR (UZR1, UZR2, UZR3, UZR4), so that we can track to see at what point things get confusing.

With my WOWY, I keep things separate (WOWYbase, WOWYpitchers, WOWYbatters, WOWYparks, WOWYbattedballtype).  This way, if I see something weird, I can track it down.


#60    Rally      (see all posts) 2010/09/17 (Fri) @ 16:28

"When you have such disagreement between metrics over a player, I think one should see what the subjective opinions have to say on the matter.  Do you agree ?”

No argument there.  I’d go further, if TZ/UZR/DRS all say about the same thing, and the subjective rating is very different, I’d want to know that too.


#61    Rally      (see all posts) 2010/09/17 (Fri) @ 16:32

"I’d be even more disturbed that “my” system disagrees that much with, well, basically everyone else.  Even if it is a single player.  I’d want to know why.”

I’ve been there.  But you can’t dwell on that or you go crazy.  At some point I just had to accept that when multiple systems are rating hundreds of players at any given time, there will be some where the ratings are going to be off.


#62    Mike      (see all posts) 2010/09/17 (Fri) @ 16:36

"No argument there.  I’d go further, if TZ/UZR/DRS all say about the same thing, and the subjective rating is very different, I’d want to know that too”

Now we’re getting somewhere.

In the case of Zimmerman: UZR, DRS and the subjective opinions say one thing, while TZ says another.

Looking at all of this, how do you evaluate him defensively ?  Throw out the outlier ?


#63    mike      (see all posts) 2010/09/17 (Fri) @ 16:41

"I’ve been there.  But you can’t dwell on that or you go crazy.  At some point i just had to accept that when multiple systems are rating hundreds of players at any given time, there are will be some where the ratings are going to be off.”

In this case, we are talking about one of THE elite defenders in the game today.  At least, according to everyone else.

I understand looking at the big picture and not sweating the “small” stuff, but I’d make an exception in this particular case.  FanGraphs’ WAR has Zimmerman as the most valuable player in the NL to date so far this year, based in no small part on his defensive value.


#64    Rally      (see all posts) 2010/09/17 (Fri) @ 16:42

I’d average them.


#65    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 17:15

It is indeed disturbing that he’s that low in TZ.  If I were an architect or supporter of TZ, I’d be even more disturbed that “my” system disagrees that much with, well, basically everyone else.  Even if it is a single player.  I’d want to know why.

But we know why. This isn’t rocket surgery. You have your basic framework:

(Plays - Expected Plays) * Runs Per Plays

The first and third terms are relatively easy to derive with some level of agreement. The second term is being estimated by different people based on a largely unvalidated set of assumptions based upon different data sets subject to unverified measurement error. And for the most part the people with access to the raw data to provide some accounting of the measurement error have shown no interest or capacity to do the grunt work of showing what the measurement error is in the underlying data sets.

There’s no great mystery as to why the agreement between these defensive metrics is so low.
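
A minimal sketch of that framework shows how much the result depends on the second term. All the inputs are hypothetical: the play counts are invented, and the 0.8 runs per play is only an assumption, roughly the ratio implied by the "+61 plays, close to +50 runs" figure in #54.

```python
def fielding_runs(plays_made, expected_plays, runs_per_play):
    """(Plays - Expected Plays) * Runs Per Play."""
    return (plays_made - expected_plays) * runs_per_play


# Assumption: about 0.8 runs per play (see the lead-in above).
RUNS_PER_PLAY = 0.8

# Hypothetical fielder: both systems agree he made 400 plays,
# but they estimate different numbers of expected plays.
plays_made = 400
system_a = fielding_runs(plays_made, expected_plays=380, runs_per_play=RUNS_PER_PLAY)
system_b = fielding_runs(plays_made, expected_plays=398, runs_per_play=RUNS_PER_PLAY)

print(f"System A: {system_a:+.1f} runs")
print(f"System B: {system_b:+.1f} runs")
```

A disagreement of 18 expected plays out of roughly 400 is enough to move the same fielder from about +16 runs to about +2.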

“MGL could create UZR1, UZR2, UZR3, UZR4, and UZR5.  UZR1 would be basic ZR.  UZR2 would make the zones more granular.  UZR3 is UZR2 + park adjustments, and so on.

Or I could create Marcel1, Marcel2, Marcel3, each of which takes all the things from the previous one, and adds more.

Now, why in the world would you do (Marcel1+Marcel2+Marcel3)/3 ?”

Okay, but we can take the Marcels and compare them to reliably measured out-of-sample data and get a very precise measurement of which version of the Marcels is an improvement over the others. That’s not the same with UZR1, UZR2, UZR3, UZR4, whatever.

“Right, you’ll always have someone.  I mean, bUZR and sUZR had Andruw Jones over a 7yr period at 112 run difference.  That doesn’t invalidate UZR, but it calls into question the data source.”

Right, and we have no idea which data source is correct, or even if either of them is. If that level of inaccuracy of batted ball data doesn’t “invalidate” UZR, then what does?


#66    Tangotiger      (see all posts) 2010/09/17 (Fri) @ 17:21

It doesn’t invalidate UZR, but you can say it calls into question the possibility of doing anything with the data.  GIGO.


#67    Mike      (see all posts) 2010/09/17 (Fri) @ 17:27

"I’d average them.”

I’m sure you would.  Because it would give the outlier the semblance of credibility.

But in the face of such strong objective and subjective disagreement, I’d say that the outlier needs to be unceremoniously dropped on this one.


#68    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 17:28

It’s a two-stage process:

1) Is the batted ball data (hit location and trajectory) a reliable indicator of reality?

and

2) Is the methodology using that data correctly?

If 1 is no, then 2 has to be no, doesn’t it?

UZR cannot be valid if the data is bad - garbage in, garbage out. Even if the data wasn’t bad, though, UZR could still be invalid.

So I don’t get what you’re saying - are you saying that the principles behind UZR may be correct, it’s just that the data to utilize those principles may not be available? Maybe that’s true, but even so, if the batted ball data is unreliable, any output from UZR must be unreliable to at least the same extent, right?


#69    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 17:30

I’m sure you would.  Because it would give the outlier the semblance of credibility.

But in the face of such strong objective and subjective disagreement, I’d say that the outlier needs to be unceremoniously dropped on this one.

No, the “incorrect” values need to be dismissed - and you don’t know which values are incorrect. In that situation, dropping the most extreme outliers is pretty much as likely to end up reinforcing error as it is to remove it. (Same with averaging, actually.)


#70    Mike      (see all posts) 2010/09/17 (Fri) @ 17:45

"you don’t know which values are incorrect”

You’re right, I don’t.

But I can make a pretty damn good educated guess and sleep well tonight.

While you apparently play devil’s advocate and look for obscure reasons to show that the Heaven’s Gate group was right after all.


#71    Brian Cartwright      (see all posts) 2010/09/17 (Fri) @ 18:00

ditto to everything Colin just said.

TZ is not an outlier on Zimmerman. My Oliver Fielding Runs (OFR) at THT Forecasts has him at +1.1 FRAA in 2010 (last four years +4, -7, +2, +1).

I can’t really tell you TZ’s methodology, but I can say that OFR and TZ both use Gameday.

(Plays - Expected Plays) * Runs Per Plays

Which is exactly what I do. As Colin said, it’s the derivation of expected plays which will create the greatest discrepancy, and part of that is deciding which plays are evaluated.

For example, with outfielders I use every fly ball, and also count how many total bases on the hits. I also count total bases on ground ball hits to the outfielders. I don’t currently do outfield arms (bases advanced by existing runners) but will when I get access to the raw data.

I have four categories for infielders - infield hits and reached on error per ground ball retrieved, estimated total of ground ball hits to the outfield which that fielder was responsible for, double plays started and double plays pivoted.

That’s the data set I create expected plays for. TZ, UZR, PM, DRS etc all have their own preferences.


#72    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 18:04

But I can make a pretty damn good educated guess and sleep well tonight.

While you apparently play devil’s advocate and look for obscure reasons to show that the Heaven’s Gate group was right after all.

If you are able to make an educated guess that is more accurate than the metrics - then you don’t need the metrics. Just go about doing what you’re doing, then. You’ll be able to sleep well tonight, if that’s what matters to you.

What I am trying to do, on the other hand, is approach the problems with fielding metrics in a systemic way - one where the conclusions apply equally well to fielders one likes and fielders one dislikes. It’s tougher, sure, and I have to go to bed at night without your sense of certainty over whether or not Ryan Zimmerman is a good defensive third baseman or not. But I’m prepared to accept a little uncertainty here and there.


#73    Mike      (see all posts) 2010/09/17 (Fri) @ 18:23

"ditto to everything Colin just said”

TZ is not an outlier on Zimmerman. My Oliver Fielding Runs (OFR) at THT Forecasts has him at +1.1 FRAA in 2010 (last four years +4, -7, +2, +1).”

Good for you.  Now get out of your mom’s basement and actually watch a baseball game or two.  Then perhaps you might actually change your mind.


#74    Mike      (see all posts) 2010/09/17 (Fri) @ 18:29

"If you are able to make an educated guess that is more accurate than the metrics - then you don’t need the metrics”

My educated guess supplements the metrics.  I wouldn’t deign to propose that mine is the best educated guess on the planet.

But it’s pretty good.

And if you’re prepared to accept the fact that Ryan Zimmerman is not a good defensive third baseman, then methinks thou art a dumbass.


#75    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 18:41

My educated guess supplements the metrics.  I wouldn’t deign to propose that mine is the best educated guess on the planet.

But it’s pretty good.

And if you’re prepared to accept the fact that Ryan Zimmerman is not a good defensive third baseman, then methinks thou art a dumbass.

You know what really convinces people to accept your particular point of view, after previously holding other views?

I’m sorry, if you answered “insults,” that’s incorrect.

If you would spend a little less time on the bon mots and a little more time on the reading comprehension this might actually be a productive conversation. (And trust me - I’ve been through Marine Corps boot camp. You aren’t that good at the demeaning comments. Sorry.)

I never said what I was prepared to accept. Which - if you’re interested to know - is the truth. If you aren’t interested in that, that’s fine. But to get there, you have to start from basics and work your way up. The other way - to start with what you want to know and work your way there - is the road to confirmation bias and sloppy work and anything but the truth.


#76    Mike      (see all posts) 2010/09/17 (Fri) @ 18:57

"I’m sorry, if you answered “insults,” that’s incorrect.

If you would spend a little less time on the bon mots and a little more time on the reading comprehension this might actually be a productive conversation. (And trust me - I’ve been through Marine Corps boot camp. You aren’t that good at the demeaning comments. Sorry.)”

I didn’t answer “insults”.  And if you’ve truly been through Marine boot camp, then you should be fully aware that my mild chiding pales in comparison.

Did your “start from basics and work your way up” lead you to believe that Zimmerman is an average defensive third baseman ?


#77    tangotiger      (see all posts) 2010/09/17 (Fri) @ 19:10

No name calling please.  I can’t believe I’m saying this after telling a 12-yr relative to do that.

***

So I don’t get what you’re saying - are you saying that the principles behind UZR may be correct, it’s just that the data to utilize those principles may not be available? Maybe that’s true, but even so, if the batted ball data is unreliable, any output from UZR must be unreliable to at least the same extent, right?

Yes, exactly.  UZR as an engine is what I’m talking about.


#78    Mike      (see all posts) 2010/09/17 (Fri) @ 19:16

"No name calling please.”

Least common denominator reactions rarely advance the subject of discussion.

But occasionally, they are spot on target.


#79    Brian Cartwright      (see all posts) 2010/09/17 (Fri) @ 19:40

just some facts, only addition and division, no probability

I rate infielders on range, hands, dp starts and dp pivots.

for Hands
der = 1-(ifh+roe)/(ifh+roe+gbo)
For mlb 3b, bunt=f, 2005-2010, mean der is .862

from top 40 ranked by touches
top 5 (total for six year period)

.909  836 Chavez
.902  520 Punto
.896 1252 Lowell
.895 1777 Inge
.894 1343 Rolen

bottom 5
.829  721 Bautista
.823 1144 Encarnacion
.821  641 Gordon
.814  425 Cantu
.814  876 Reynolds

If I select top 100 touches, Freddy Sanchez is top at .923 of 363, worst Ryan Braun .770 of 187.

Zimmerman is 25 of 40, .855 (mean=.862) on 1394 touches. That’s slightly below average
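
The Hands formula above can be sketched as follows. Reading ifh, roe, and gbo as infield hits, reached on error, and ground ball outs is an assumption, as is treating "touches" as the denominator ifh+roe+gbo; the counts in the first call are hypothetical, while the .855, .862, and 1394 figures are the ones quoted above.

```python
def hands_der(ifh, roe, gbo):
    """der = 1 - (ifh + roe) / (ifh + roe + gbo)."""
    touches = ifh + roe + gbo
    if touches == 0:
        raise ValueError("no touches to evaluate")
    return 1 - (ifh + roe) / touches


def plays_vs_average(der, league_der, touches):
    """Plays made above (or below) what an average fielder converts on the same touches."""
    return (der - league_der) * touches


# Hypothetical counts, only to show the formula: 20 non-outs on 200 touches.
example_der = hands_der(ifh=10, roe=10, gbo=180)

# Figures quoted in the comment: Zimmerman .855 on 1394 touches, MLB 3B mean .862.
LEAGUE_DER_3B = 0.862
zimmerman_plays = plays_vs_average(0.855, LEAGUE_DER_3B, 1394)

print(f"Example DER: {example_der:.3f}")
print(f"Zimmerman vs average: {zimmerman_plays:+.1f} plays")
```

Under those assumptions the gap works out to roughly 10 plays below average over the six-year period, on the Hands component only.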


#80    tangotiger      (see all posts) 2010/09/17 (Fri) @ 19:44

Mike: my site, my rules.  I’m moderator.  If you don’t like it, please leave.


#81    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 19:57

Yes, exactly.  UZR as an engine is what I’m talking about.

Okay. So we have three primary batted ball data sources: MLB Advanced Media, Baseball Info Solutions and STATS, Inc.

Do any of them have data good enough to use that engine to evaluate individual fielding?


#82    MGL      (see all posts) 2010/09/17 (Fri) @ 19:58

I hate this dichotomy of “reliable” or “unreliable” (the PBP data).  That dichotomy does not exist.  I don’t know what the baseline is or should be, but you have well-run companies like BIS and STATS using fairly well-trained stringers who check and double check the data.  If the baseline is “the best you can do without using computers and cameras” I would guess that the data is quite “reliable.”


#83    Mike      (see all posts) 2010/09/17 (Fri) @ 20:00

"Zimmerman is 25 of 40, .855 (mean=.862) on 1394 touches. That’s slightly below average”

Well, you’ve nailed it.  Perhaps the Nationals should trade him for a PTBNL. [rolleyes]


#84    Mike      (see all posts) 2010/09/17 (Fri) @ 20:02

"Mike: my site, my rules.  I’m moderator.  If you don’t like it, please leave.”

Tongue-in-cheek, no offense intended.

I’m new here, I’ll ask for a little slack while I get the lay of the land.


#85    tangotiger      (see all posts) 2010/09/17 (Fri) @ 20:12

Colin: the question you are asking is the uncertainty level that the noise in the data contributes to the UZR of each player, and how much better is it using only factual data (like WOWY).

I don’t know the answer other than to say that it is better than WOWY at under 2 years of data, and probably not better at more than 6 years.


#86    Colin Wyers      (see all posts) 2010/09/17 (Fri) @ 20:14

Colin: the question you are asking is the uncertainty level that the noise in the data contributes to the UZR of each player, and how much better is it using only factual data (like WOWY).

I don’t know the answer other than to say that it is better than WOWY at under 2 years of data, and probably not better at more than 6 years.

Why two years? Why six years? Is this equally true for Gameday data from 2008 and BIS data from 2009, say?


#87    tangotiger      (see all posts) 2010/09/17 (Fri) @ 20:18

Good, you are open to adapt and that’s all I need to know.  This is a place that you, and the readers, feel you want to come back to.


#88          (see all posts) 2010/09/17 (Fri) @ 21:39

Perhaps a bit of a tangent (and perhaps already addressed in the comments), but still related to WAR.  The issue I have with WAR, regardless of variation, is that it incorporates single season defensive metrics, which carry greater uncertainty and potentially greater inaccuracy, relative to the actual skill level being measured, than offensive metrics do.  This matters especially when evaluating single seasons.

I understand the accuracy/uncertainty level are what they currently are with defensive stats, but by incorporating the defensive statistics into a single number it creates a perceived certainty that does not exist.  Perhaps this is just the nature of the beast, but it seems to me to result in a false sense of certainty.

I’d be interested in thoughts to this.
Thanks.


#89          (see all posts) 2010/09/19 (Sun) @ 11:17

Just reading tango’s description of the differences leaves me wondering how he could conclude that the blueprints are the same. The goal is certainly the same but the blueprints? Not so much.

As for the conclusion, reason is indeed the only way to approach any issue but taking the midpoint of any position and incorporating it is not necessarily all that reasonable. To continue the house/blueprint metaphor, if builders took the midpoint of differing house plans and built a house, it would be one of the ugliest ever built - kind of like WAR is right now.

Just saying.


#90          (see all posts) 2010/09/19 (Sun) @ 11:21

In 89 I responded to the basic points tangotiger made. Afterwards I read some comments and saw there is another Mike out there - for the record Mike 89 is a different Mike. Guess I need a more differentiating name :)


#91          (see all posts) 2010/09/19 (Sun) @ 15:43

Rally 12 said:

“Do that and still account for position adjustments and the different replacement levels for starter and reliever and you’d have a WAR implementation that I would bow down to.”

Are there any metrics that take into account the quality of competition?

Pitchers would be the obvious starting point; by chance alone they could easily get circumstances that would cause worse numbers in any given year.  I remember reading an article on an old-timey pitcher (from Cleveland I think) that showed he had pitched in tough circumstances over a stretch.  Then there was a rebuttal, by Colin Wyers iirc, that showed the same guy had seen favourable circumstances in later years.  Then there was a rebuttal to that which was a bit nasty and defensive.

It was all good reading, but like you, I was looking for something that took a macro view.


#92          (see all posts) 2010/09/20 (Mon) @ 06:49

Neyer could select the best WAR available and get it posted on ESPN. Then it would be the default because of how many readers it has. Perfect? No. But it would give a certain amount of validation to the stat while the rest of us iron out the details.

