
THE BOOK--Playing The Percentages In Baseball


Thursday, January 26, 2012

rWAR v fWAR?  No.  rWAR + fWAR.

By Tangotiger, 12:24 AM

Moving posts from another thread here.


#1    Tangotiger      (see all posts) 2012/01/25 (Wed) @ 10:37

Why the non-love for rWAR?  As far as I’m concerned, fWAR and rWAR both satisfy a need in the marketplace, and for my purposes, I treat them as 50/50 or 60/40.

Both follow the same framework I’ve championed here, and they have different assumptions as to the data.

Nothing wrong with either one.


#2    pierre      (see all posts) 2012/01/25 (Wed) @ 12:35

So, when I dug into it, the way they treated pitchers was quite different, and I very much preferred fWAR.  But I can’t even remember the ins and outs.  Point being, if someone who regularly visits this web site doesn’t really understand rWAR v fWAR, then that’s one WAR too many.


#3    Colin Wyers      (see all posts) 2012/01/25 (Wed) @ 12:43

So, when I dug into it, the way they treated pitchers was quite different, and I very much preferred fWAR.  But I can’t even remember the ins and outs.  Point being, if someone who regularly visits this web site doesn’t really understand rWAR v fWAR, then that’s one WAR too many.

If we are to only have one - and I don’t see why that should be the case, but let’s grant the premise - the decision should probably be left up to someone who DOES understand the difference between them.


#4    Tangotiger      (see all posts) 2012/01/25 (Wed) @ 13:01

I am definitely against the “1” position.  That’s because it depends what you are interested in.

The recent case I’ve been using was Doc v Lee (I think in 2010).  Lee was on the mound when a lot of bad things happened with runners on base.  Doc was on the mound when good things happened with runners on base.  And all those things related just to BABIP.

Now, if you decide that the pitcher is 100% responsible for BABIP with runners on base, then rWAR is your preferred route.  That’s because you believe that runs allowed is 100% owned by the pitcher.

(rWAR does some global adjustment for team fielding, so that “100%” is not really 100%.  It tries to adjust, but obviously, it’s not very specific to the actual events.  It’s like park factors in that case.)

If the pitcher is 0% responsible for BABIP with runners on base, then fWAR is your preferred route.  That’s because you believe that the pitcher owns only the FIP stuff, and you don’t even believe that we should distinguish between FIP with men on base or not.  (Indeed, fWAR even ignores things like SB, CS, PK, BK, WP, PB.)

So, going back to Doc v Lee: what is it you care about?  How they will perform in the future?  How they performed in the past?  How much of the outcomes do we want to actually link back to them personally?

You simply will come across occasions where fWAR is better, and you will come across occasions where rWAR is better.  And where you need to meet in the middle.  Or perhaps neither of the two is sufficient, and you need to create a personalized version of WAR that addresses your particular needs.

Any time you are forced to have “the 1”, you end up with something like pitcher W/L records.  Yes, it is clearly defined, and yes, it never changes.  But, is it really that useful?  Maybe it helps you 10% of the time at the seasonal level, and maybe it helps you 40% of the time at the career level.

Is that what we want from a WAR stat?  That it’s useful say 30% of the time at the seasonal and 80% of the time at the career level?  So, ok, we can come up with “the 1”.  As long as we understand that it won’t answer all of your questions. 

At least with things like OBP and FIP, it’s clear how limiting they are.  They are a subset of something.  They can’t possibly answer all of your questions.

Creating “the 1 WAR” will hide what is obvious in other metrics: that each is limited by its assumptions.


#5          (see all posts) 2012/01/25 (Wed) @ 14:02

From my simple perspective, I think rWAR (at least as far as this topic is concerned - pitchers) does a better job of describing what has happened while fWAR does a better job of projecting what is likely to happen in the future.

EG, while I understand BABIP and the implications, when thinking about something like an MVP award, I am not comfortable absolving a pitcher for all that may have happened with balls in play while he was on the mound.  However, when, say, thinking about my outlook for 2012 - I’d go with fWAR.


#6    Tangotiger      (see all posts) 2012/01/25 (Wed) @ 14:07

Right, fWAR does absolve him completely, but rWAR implicates him completely.

It’s not clear to me that we want all of the BABIP that happened while Lee was on the mound with runners on base to be absorbed completely by Cliff Lee in 2010.  Yes, he was on the mound.  Yes, he’s the biggest single culprit.  But he has 8 other defenders there too.  They do exist, and they did contribute.

Hence, rather than arguing about it, I just go with a 50/50 approach, and call it a day.  Once we have it all figured out, 50/50 will be a lot closer to the real answer than 100/0.
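
For anyone who wants to try the 50/50 (or 60/40) approach, here is a minimal sketch. The function name `blended_war` and the sample numbers are made up for illustration; nothing here comes from either site's actual code, and it ignores any difference in scale between the two metrics.

```python
def blended_war(rwar: float, fwar: float, rwar_weight: float = 0.5) -> float:
    """Weighted average of the two WAR figures.

    rwar_weight is the share given to rWAR; the rest goes to fWAR.
    """
    if not 0.0 <= rwar_weight <= 1.0:
        raise ValueError("rwar_weight must be between 0 and 1")
    return rwar_weight * rwar + (1.0 - rwar_weight) * fwar


# Hypothetical pitcher season (illustrative numbers, not real data).
rwar, fwar = 5.0, 7.0
print(blended_war(rwar, fwar))       # 50/50 blend
print(blended_war(rwar, fwar, 0.4))  # 40% rWAR, 60% fWAR
```

The weight is the whole point: pick it based on how much of the runs-allowed story you want pinned on the pitcher.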


#7    mettle      (see all posts) 2012/01/25 (Wed) @ 16:38

Can’t one calculate the degree to which pitchers control BABIP w/ RISP? Shouldn’t it not be a belief, but a quantifiable measure?

I mean, short term, yes, I understand that we’re a ways off from having this figured out. As another example, technically catcher framing has some role in strikes and therefore strikeouts and walks. So, technically pitchers should only be credited with, say, 99% of the Ks and BBs they get.

So, with split-half correlations and quantifying distributions and a precise quantification that a pitcher has, say, control over approximately .02 variance in BABIP on average, we should be able to calculate a correct WAR, including this factor for BABIP w/ RISP.


#8    Rally      (see all posts) 2012/01/25 (Wed) @ 17:20

One thing to consider is how far you want to go in making this an accounting system - things happen on the field and the credit has to wind up with some player.

rWAR does this better than fWAR.  Components will add up to actual park adjusted runs scored and allowed.  But it does not go as far as win shares - while rWAR reconciles to runs, it does not take into account extra wins and losses from beating your pythag.  fWAR will not reconcile to runs.  You could have a pitcher horrible with people on base, excellent FIP stats, and an average team defense behind him.  His pitching WAR + team UZR will not reconcile at all with the runs actually allowed.

I don’t think there is a single right answer to this.  Analysts will disagree on whether making it work as an accounting system, and to what extent to do this, is desirable.  And that’s why there will always be different WAR calculations.


#9    Tangotiger      (see all posts) 2012/01/25 (Wed) @ 17:52

mettle: the answer will be different if you have one game, one hundred games or one thousand games for a player.

Say you have a pitcher that allows 12 hits on 20 balls in play in one game.  Well, we have no idea how much the pitcher was responsible for that.  Regression would say that we’d credit him with 6.1 hits allowed.  If another pitcher gets a no-hitter on 20 balls in play, regression would credit him with 5.9 hits allowed.

Is this what you want?  I don’t know.  You may or you may not.

What happens to the other 6 or so missing hits?  Do the fielders get credit or debit for the 5.9 other hits to balance it out to 0 or 12?  I don’t know, you may or you may not.

Maybe we simply credit the whole 5.9 missing hits to “timing” and be done with it? I don’t know, you may or you may not.

Now, if you have a career of stats, if you have a pitcher that allowed 1200 hits on 2000 balls in play (not that such a pitcher exists), then it’s a whole new ball game.  Now, we definitely would NOT want to credit him with 610 hits allowed.

It all depends basically, which is why it makes no sense to think of things as “the 1”.  You are doing nothing short of fitting a square peg in a round hole.  Don’t do it, as you’ll end up causing problems for yourself.

Respect what the data allows you to do, and no more.


#10    Tangotiger      (see all posts) 2012/01/25 (Wed) @ 17:56

Rally: right, it depends how far you want to go.  Bill James wants to make sure it all adds up at the team wins level, so he’s going to force that in, distributing the “unaccounted-for” wins to the players in some proportion.  WPA doesn’t have this problem.

Rally does the same, except he forces the thing to add up at the runs level.  But, if he doesn’t rely on performance by the 24 base-out states, he also is going to end up fudging things.  RE24 doesn’t have this problem.

Well, WPA and RE24 don’t have those problems, because the implementations take a shortcut and avoid all discussions of the fielders.  It helps when you only look at it from the batter and pitcher viewpoint.  Indeed, the batter gets all the benefit or cost if a runner runs himself out of an inning, if only because we don’t know why the runner was put out.  (The PBP data doesn’t necessarily tell us that.)

There are shortcuts everywhere, so you have to be careful as to what everything is actually telling you.


#11    SG      (see all posts) 2012/01/25 (Wed) @ 20:19

Everyone focuses on the BABIP component of fWAR, but my biggest issue with it is that it ignores sequencing. 

Who’s more valuable, the pitcher who gives up HR/BB/BB/BB and then gets out of the inning or the pitcher who gives up BB/BB/BB/HR and then gets out of the inning?  According to fWAR, they’re equally valuable.
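
To make the sequencing point concrete, here is a small sketch that scores the two innings both ways. It assumes fWAR is driven by FIP-style counts (the usual 13*HR + 3*BB - 2*K numerator, with HBP left out), and the helper names `runs_allowed` and `fip_numerator` are invented for this example. The base-state logic only handles walks and home runs, and runners left on when the sequence ends are treated as stranded.

```python
def runs_allowed(events):
    """Count runs for a sequence of 'BB' and 'HR' events with no outs in between."""
    bases = [False, False, False]  # first, second, third
    runs = 0
    for event in events:
        if event == "HR":
            runs += 1 + sum(bases)  # batter plus everyone on base
            bases = [False, False, False]
        elif event == "BB":
            if all(bases):
                runs += 1           # bases loaded: the walk forces a run in
            elif bases[0] and bases[1]:
                bases[2] = True     # runners on first and second are forced up
            elif bases[0]:
                bases[1] = True     # runner on first is forced to second
            bases[0] = True
        else:
            raise ValueError(f"unsupported event: {event}")
    return runs


def fip_numerator(events, strikeouts=0):
    """Count-based FIP numerator; the order of events never enters into it."""
    return 13 * events.count("HR") + 3 * events.count("BB") - 2 * strikeouts


hr_first = ["HR", "BB", "BB", "BB"]
hr_last = ["BB", "BB", "BB", "HR"]
print(runs_allowed(hr_first), fip_numerator(hr_first))
print(runs_allowed(hr_last), fip_numerator(hr_last))
```

The count-based number is identical for both innings, while the runs on the board differ by three. Which of those you want to charge to the pitcher is exactly the question being argued here.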


#12    Tangotiger      (see all posts) 2012/01/25 (Wed) @ 20:25

SG: yes, and that could very well be right, depending on the question you are trying to answer.

Since there’s very little talent to sequencing to begin with, a metric that relies heavily on sequencing doesn’t tell us about the player, so much as it tells us about timing and circumstances.

I’ve been saying this for a decade now: if someone wins the lottery, do we want to count that as something that tells us about that person, or would you rather split the person’s income into: earned through talent, and earned through random variation?

Each person has his own question.


#13    SG      (see all posts) 2012/01/25 (Wed) @ 20:30

Since there’s very little talent to sequencing to begin with, a metric that relies heavily on sequencing doesn’t tell us about the player, so much as it tells us about timing and circumstances.

Sure, but what if we are more concerned with how a player’s performance helped his team and not his talent level?  I think of value in terms of how a player’s performance helped his team, and not necessarily about how good or bad he is. 

I have no problem with people who don’t think that way, it’s a preference thing for me.  But it’s why I prefer rWAR, especially for pitching.


#14    Tangotiger      (see all posts) 2012/01/25 (Wed) @ 21:07

Right.

So, for you, you care about sequencing, so fWAR is not for you.  It’s not an “issue” with it… it’s its raison d’etre.  It highlights and wallows in the fact that it ignores sequencing.  It’s a feature, not a bug.

From your perspective, fWAR has “issues”.  But, fWAR was not designed for you.

You may as well say that you have issues with two-seater convertibles, because you have a family of six, or you have issues with the Bee Gees because you can’t stand the beat.


#15    Myron      (see all posts) 2012/01/26 (Thu) @ 01:01

Padres shortstop Jason Bartlett’s 2011: .1 rWAR and 1.8 fWAR.

fWAR has his fielding at -1 runs, rWAR -4, so it isn’t all fielding.

I’ve wanted to look at this closer, but I’m wondering why such a discrepancy (obviously, it’s only one player) ... it appears fWAR sets the replacement level a little bit lower.

Also, rWAR has his hitting at -16 compared to fWAR at -12. Could be the offensive formulas used or perhaps how they handle park factors (Petco is an extreme one).

FWIW, Bartlett’s career: 18.8 fWAR, 13.3 rWAR.


#16    Colin Wyers      (see all posts) 2012/01/26 (Thu) @ 01:36

Everyone focuses on the BABIP component of fWAR, but my biggest issue with it is that it ignores sequencing.

Who’s more valuable, the pitcher who gives up HR/BB/BB/BB and then gets out of the inning or the pitcher who gives up BB/BB/BB/HR and then gets out of the inning?  According to fWAR, they’re equally valuable.

But that’s a very unrepresentative case, isn’t it? Typically sequencing also includes BIP as well, so you don’t just have the question of how you handle sequence but how you handle sequencing of a pitcher’s BABIP.


#17    mettle      (see all posts) 2012/01/26 (Thu) @ 01:54

9/ Right, small sample size is absolutely an issue for any one individual pitcher, but you can still estimate/calculate a true talent for BABIP w/ RISP for any one pitcher, regardless of sample size, via regressing to the mean of all pitchers.
I mean, you know this far better than I, but one should, in principle, be able to calculate a true talent BABIP for any pitcher and use that to calculate a real WAR that’s somewhere between actual performance and xFIP.
No?


#18    Dan Strittmatter      (see all posts) 2012/01/26 (Thu) @ 01:59

I tend to dramatically prefer fWAR to rWAR with hitters, simply because the continued use of OPS+ by rWAR utterly baffles me.  Park-adjusted proxies?  That’s a lot of work to turn a proxy into, well, still a proxy, and still overvaluing slugging (to Myron’s point, this is why rWAR doesn’t like Bartlett - he can’t hit for power, but he gets on base, and rWAR can’t distinguish between a point of OBP and a point of SLG; fWAR can acknowledge Bartlett’s OBP skills for their true value).

However, using solely fWAR puts you at the mercy of UZR’s whims, so my strategy tends to be to collect as many different defensive metrics as possible to get a general sense of defensive value, then adjust fWAR for the difference between the consensus of those and what UZR says.  Helps take some of the volatility out of the most volatile part of fWAR, leaves all the wRC+’y goodness in.

For pitchers (well, for everything, but for pitchers in particular), it all depends on context, even if you’re using both to project future performance.  The first example that comes to mind showing the all-importance of context to me is Joe Saunders and his latest contract with Arizona.  His strikeout allergies and BABIP-reliance have caused huge differences in fWAR and rWAR values throughout his career, with his fWAR values remaining relatively low and consistent, while his rWAR values have jumped all over the place.  2010/2011 was no exception - 2010/2011 rWAR & fWAR: -0.2/2.4 & 1.7/1.0.

When a team with an average defense looks at acquiring Saunders, they’d probably be more inclined to use fWAR - after all, they don’t have the defensive capabilities to try to suppress Saunders’ hit rates, so his pure peripheral metrics probably indicate his (relatively low) value to that club.  However, when Arizona looks at Saunders’ skillset, they see a pitcher who will get a ton of fly balls that their stud-alignment of Parra, Young, and Upton can flag down with relative ease.  Thus, Arizona can feasibly expect Saunders to keep up his low-BABIPing ways of 2011 and contribute value closer to his rWAR totals from 2011.  Why value him by fWAR when you already have the supporting cast around him to make him more valuable than fWAR would indicate?

Two teams, both looking at the future performance of the same pitcher, yet different WARs apply because of the contexts.  I typically prefer fWAR for pitchers, but, as Tango wisely noted, it’s not a traditional “v” scenario between them - it almost never is with statistical metrics - it’s a “+” scenario, but one in which there usually is one metric that should be deemed more accurate, depending on the context.  The value in an analyst isn’t in reading the rWAR and fWAR totals, it’s in figuring out which one is the best to use.


#19    Myron      (see all posts) 2012/01/26 (Thu) @ 02:28

Dan/18,

I need to brush up on the specifics, but rWAR uses batting runs for offense I believe, not OPS+.


#20          (see all posts) 2012/01/26 (Thu) @ 02:39

Really? I suppose I’ve always been under that impression, but that’s possibly because it’s technically the most all-encompassing metric featured in the top portion of the main B-R pages for position players.  It’s also the only metric listed on the “bat glossary” page they have.

http://www.baseball-reference.com/about/bat_glossary.shtml


#21    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 08:31

rWAR belongs to Rally, not Forman.  It uses Linear Weights, not OPS+.


#22    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 08:42

mettle: you still have the issue that you have a pitcher who allows 12 hits on 20 BIP.  What do you do with that?

***

fWAR does have a lower replacement level.  Roughly speaking, it’s about .333 for rWAR and .290 or so for fWAR.
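
A quick back-of-the-envelope on what that gap means over a season. This reads the two figures as the winning percentage of a replacement-level team over 162 games, which is an assumption about how the numbers are meant; the variable names are just for illustration.

```python
GAMES = 162

# Approximate replacement-level winning percentages quoted above.
rwar_repl = 0.333
fwar_repl = 0.290


def replacement_wins(win_pct, games=GAMES):
    """Wins a team playing at the given percentage would collect in a season."""
    return win_pct * games


gap = replacement_wins(rwar_repl) - replacement_wins(fwar_repl)
print(round(gap, 1))  # team-level gap in wins between the two baselines
```

That works out to roughly seven wins per team-season, spread across the whole roster, which is why fWAR totals tend to run a bit higher than rWAR totals for the same player.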


#23    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 09:11

I should say: “A pitcher who was on the mound for 12 hits on 20 BIP.”

For example, you have double plays and DP opps… do we ONLY look at the 2B, or do we also consider the SS and 3B (and even 1B)?

You have BIP… is it only about the pitcher, or is it also about the rest of the fielders?

Just because we have CHOSEN to assign the hits to the pitcher (just like we have CHOSEN to assign the wins to the pitcher) doesn’t mean that’s true.

Why not track hits on BIP by SS and by CF (even if they were not involved in the play at all)?  So, because we don’t know who was involved, we won’t track it at all?

But because the pitcher was always involved, we ONLY track it by pitcher?

It’s a lie.  It’s a tidy lie.  A tidy lie that allows you to make false conclusions, simply because you want to make SOME conclusion.

The reality is that we have an honest mess around here.  Treat it as a mess, and don’t clean it up by sweeping it under the rug.

I thought you guys were men.  Around my house, my socks can often be found on the floor.  All of a sudden, you guys want to clean the house?


#24    rempart      (see all posts) 2012/01/26 (Thu) @ 11:01

fWAR pitching only being available back to the mid 70s is a major downer for me.

Baseball Seamheads pitching WAR uses DIPS 2.0 in its calculation.


#25    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 11:14

rempart: hopefully, that’ll change in the coming months.  I’ll be talking with David about this, no doubt.


#26    kds      (see all posts) 2012/01/26 (Thu) @ 11:59

Tango #23, lol.  The problem is the guys treating this messy abode as if it were neat.  First you must acknowledge the mess.  Then you may be able to study it and see some order in the mess.  You may also be able to do some cleaning up. 

mettle, don’t we have strong evidence that the basic regression formula for a pitcher’s BABiP is (actual BABiP * BIP + 3700 * league average BABiP) / (BIP + 3700)?  But it takes at least 6 years for a full-time starter to have 3700 BIP.  So what % of PA have a RiSP?  I’ll WAG it at 25%.  I see no reason that the regression equation for BABiP w/RiSP should be much different than the general equation.  Which means that even a long career won’t have enough BIP w/RiSP to regress as little as 50%, and on the single season scale we are going to be regressing more than 98%.  And of course w/RiSP BACON and SLGCON become even more important.  I think the signal to noise ratio is going to be so low that there isn’t much to gain here.  Mo Rivera’s career BABiP w/RiSP would be regressed maybe 80% or more.
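
Here is the formula above as a sketch you can play with. The 3700 ballast comes from the formula as stated; the .300 league BABiP is an assumed round number, and the function names are made up. The estimated hits it produces depend on the ballast chosen, so they need not match the 6.1 and 5.9 figures quoted earlier in the thread.

```python
def regressed_babip(hits, bip, league_babip=0.300, ballast=3700):
    """(actual BABiP * BIP + ballast * league BABiP) / (BIP + ballast).

    actual BABiP * BIP is just hits on balls in play, so hits is passed directly.
    """
    return (hits + ballast * league_babip) / (bip + ballast)


def share_regressed(bip, ballast=3700):
    """Fraction of the estimate that comes from the league average."""
    return ballast / (bip + ballast)


# One bad game, one no-hitter, and a (nonexistent) career of 1200 hits on 2000 BIP.
for hits, bip in [(12, 20), (0, 20), (1200, 2000)]:
    estimate = regressed_babip(hits, bip)
    print(hits, bip, round(estimate, 3), round(estimate * bip, 1),
          round(share_regressed(bip), 3))
```

At 20 BIP nearly all of the estimate is league average; at 3700 BIP the split is exactly half and half, which is where the "6 years for a full-time starter" remark comes from.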


#27    aweb      (see all posts) 2012/01/26 (Thu) @ 12:08

Why isn’t it available further back? Is it establishing a replacement level over different eras, or deciding the relative worth of various outcomes (which also changes over time)?

The data certainly seems to exist, and it is a formula-centric approach.

I like the fWAR+rWAR approach within a single year, and rWAR increasingly over time (pitchers only), since fWAR will never give credit to unusual skills/flaws that may affect runs allowed (inability to control the running game, pitching from the stretch, etc).


#28          (see all posts) 2012/01/26 (Thu) @ 12:16

Welp, looks like I provide further merit to that annoying yet accurate truth about assumptions. It wasn’t totally arbitrary and contrived from nothing, but clearly was unfounded. Ugh.

Regardless, it seems there is a sense of agreement here. Both systems have their flaws when it comes to assigning the value fluctuations of BABIP to the pitcher or not to the pitcher, and both systems seem to know it, yet the systems seem to gravitate to the extremes of the situation simply because a) it’s convenient and simple to understand, b) someone had to, and c) it’s easier than trying to justify a formula or method of assigning responsibility that would probably ultimately rely on defensive data that is imprecise (or assume randomness).


#29    Colin Wyers      (see all posts) 2012/01/26 (Thu) @ 12:19

Why isn’t it available further back? Is it establishing a replacement level over different eras, or deciding the relative worth of various outcomes (which also changes over time)?

It seems to coincide with the availability of full play-by-play data. I’d guess it has to do with handling the starter/reliever split.


#30    Suicide Squeeze      (see all posts) 2012/01/26 (Thu) @ 13:42

#24-29:

I think at least part of the reason that fWAR isn’t available that far back is because we don’t know if the assumptions in FIP (por ejemplo, that BABIP is largely noise) hold for those eras.  I feel like I’ve seen that answer in a chat, but Google is failing me.


#31    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 14:08

The way Fangraphs evolved is that it used STATS and/or BIS for its play-by-play, and some other source for the seasonal-historical (maybe Lahman DB?).

BIS goes back to 2002, so that’s why you see a lot of the PBP splits only back to 2002 (I think).  UZR only goes back to 2002, etc.

The Retrosheet event files came along at some point, and they were stable from 1974-onwards, so that got converted (to whatever database format Fangraphs is using).  So, I think you see WPA and LI and stuff go back to 1974.

Now that Retrosheet has released more data in the past few years, that data needs to now get converted as well.

David’s a one-man show, and he obviously is giving his readers something more centric to current stats, incorporating PITCHf/x, etc.  Forman is also a one-man show, and he’s got his focus more on the historical PBP.  Woolner/Clay had a big lead time over Sean and David, so that’s why you’ll see stuff at BPro that you might not see elsewhere.  And now with Colin, he’s streamlined some new processes as well.

The really short answer is: time.


#32    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 14:10

As for historical BABIP: I looked at this 10 years ago, and I seem to remember that the year-to-year correlations were about the same.  Don’t quote me on that though.


#33    mettle      (see all posts) 2012/01/26 (Thu) @ 14:15

23/26

Suppose BABIP for all pitchers w/ RISP is .305 (let’s say there’s a real effect of using the slide step instead of a full wind up).

Furthermore, let’s suppose 10% of the variance in BABIP is accounted for by the pitcher and 30% of the variance in BABIP is accounted for by the defense (and 20% by the hitter, with, let’s say, 40% unexplained).

So, if a pitcher gives up 12 hits on 20 BIP w/ RISP, why can’t we just regress that to the mean while including a factor for the variance explained by the pitcher? In this instance, the pitcher’s true talent BABIPRISP may be .306 or whatever the calculation comes to, then we can calculate a true WAR based on that.

Since we’re using all pitchers, with all defenses, to regress to, that neutralizes the effects of defense in general.
Also, by using what we calculate with respect to explaining BABIP variance, we can quantify who is responsible, rather than using belief and say pitcher is 100% responsible or 0% responsible. The blame should be completely estimable to the best of the data we have.

(I don’t follow your socks point - this is honest inquiry here, not an attempt to criticize what you’ve done in any way, if you’re suggesting as much.)


#34    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 14:35

I see no reason to use rWAR for hitters. The problem with UZR is that the subjectively recorded hit location may be off by 10 feet or so. The problem with TZ is that the estimated hit location might put a ball in left field when it was in right field. TZ is not close to UZR for me. Everybody complains about UZR’s reliability, but I did a study where I found that UZR/150 at 1500 innings is as reliable as wOBA at 650 PAs. You definitely don’t need 3 years of data. fWAR is clearly better than rWAR for hitters. For pitchers it’s tricky. fWAR is ignoring like 60% of the PAs, but rWAR is measuring some of the defense. Look at pitcher run support numbers compared to teammates. Why would pitcher defense support be any different? PZR would be so much better than either of these (hint hint MGL), but I would probably use fWAR; at least it’s measuring what the pitcher did. rWAR is reliant on teammates, and don’t we hate wins, RBIs, and runs scored for that reason? I’m trying to create my own Gameday fielding metric so I can make a PZR-like stat, but it’s really frustrating because I can’t run Mike Fast’s parser script. It doesn’t work.


#35    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 14:39

Another thing I forgot, for baserunning: UBR measures advances and stuff like that. rWAR doesn’t. It uses available baserunning stats but no base advancing, which would be so easy to incorporate into rWAR.


#36    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 14:54

mettle: if you regress 12 for 20 and 0 for 20 down to 6.1 for 20 and 5.9 for 20, are people going to be happy with that?

Good luck explaining to someone that a pitcher who was involved in a no-hitter is going to be credited with allowing 5.9 hits.

***

But, let’s extend it further.  Let’s say that there is NO TALENT for pitchers once the ball is in play. (Yes, there is talent, but let’s presume there isn’t for this discussion.)

One pitcher allows 12 hits, another allows 8, another allows 5, etc.  But, we’ve established there is no talent.

I win a lottery for 564$, you win a lottery for 234$, and the other guy spent 1200$ on tickets and never won a thing.

In terms of “performance”, do we want to track all of that, or do we simply not want to associate any of those outcomes to each person?

Does it matter that the pitcher has fielders, and so, we want to attribute something to them?

In my lottery winnings, it was an office pool, but I’m the one that bought the tickets.

How is it you want to account for all that?


#37    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 15:00

rWAR and fWAR are pretty close for hitting.  Indeed, rWAR does more (GIDP, ROE).

rWAR and fWAR disagree on fielding, obviously.

I wouldn’t conclude that fWAR is necessarily better than rWAR, on the basis of UZR v TZ.


#38    Suicide Squeeze      (see all posts) 2012/01/26 (Thu) @ 15:05

#34:

“fWAR is ignoring like 60% of the PAs”

Remember that FIP doesn’t really ‘ignore’ batted balls, it just treats all of them as the same thing, much as it treats Ks and backwards Ks the same.


#39    Matthew Cornwell      (see all posts) 2012/01/26 (Thu) @ 15:25

When wanting to get a read on a younger player’s career so far or predict a future season of a new player, I always look at fWAR - probably for the first 4 years or so of a player’s career.  For players with long careers and large sample sizes of BABIP/sequencing events, I definitely prefer rWAR - probably from 8 years on or so.  For guys who have been in the league about 5-7 years, I look at both, similarly to what aweb is describing. I never average the two exactly, since they are on slightly different scales due to replacement level differences and rWAR including an AL vs. NL adjustment (which FG does not).  Over a 7 year career, these two factors could mean about a 4+ WAR difference between fWAR and rWAR (if it is during a period with a clearly superior league). 3-4 WAR over 7 years seems significant enough to me to at least consider it, but it is a preference, I guess. The difference in a single season isn’t big enough to worry about, of course.


#40    mettle      (see all posts) 2012/01/26 (Thu) @ 15:26

Thanks for the lottery analogy—that helps.
Since we do have WPA to tabulate (lottery) winnings, I guess I’d be inclined to have WAR zero that portion out.
Also, zeroing it out is an idealized situation since pitchers do have some control over BABIP, albeit a small amount. It seems logical to me that the pitcher should be credited with precisely the amount of a hit they have control over. Using your lottery point, since there is a tiny bit of strategy in lottery (e.g., you can maximize payoff by picking numbers > 31 since others tend to use birthdays), I’d give the lottery players $.01 of credit, or whatever that turned out to be. But again, yes, I see your point.

However, on your first point, darn straight I would allow for the pitcher to be responsible for 5.9 hits in my calculation for WAR. The system shouldn’t be idealized for explaining to Murray Chass, but for producing the most accurate evaluation.


#41    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 16:12

Tango, but doesn’t TZ estimate where the ball goes? And doesn’t rWAR not even use TZL, just the basic TZ? Why would you prefer an estimated hit location to a recorded hit location that could be maybe 10 feet off?


#42    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 16:14

I think something like tRA would be better for fWAR. Uses batted ball data for balls in play and uses HRs. Also a problem with FIP is that it uses innings pitched which is not defense independent.


#43    Matthew Cornwell      (see all posts) 2012/01/26 (Thu) @ 16:53

Then you run into the issue of batted ball data making large assumptions about pitcher profiles, like PZR and the like ignoring the fact that extreme ground-ballers and extreme change-up pitchers reduce BABIP on groundballs.  And there aren’t just a few exceptions either.

Given a large enough sample size, I would prefer looking at pitcher’s BABIP compared to mates or using team DER than using batted ball assumptions.  With an appropriate regression for luck, of course.


#44    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 17:07

BoSox: bias.

***

Matt/39: good.

***

Mettle/40: ok, if we’re going to regress BABIP, why not everything else?  A hitter gets 3 HR.  You have to regress that too.  He goes 5-5, all singles.  Guess what, he didn’t, under the regression scenario.

Indeed, at the game level, because the sample size is so small, you are going to regress basically 95 to 99% of the performance back to the player’s true talent level.  If you are going to do that, why not just regress 100%?  And if you do that, then you don’t care about individual games.


#45    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 17:17

Tango, my question is though, the bias on BIS hit location may be about 10 feet, a TZ estimate can think the ball goes to the third baseman when it goes to the left fielder.


#46    Anon      (see all posts) 2012/01/26 (Thu) @ 18:40

#45, 10 feet on balls in the infield, maybe, especially if it’s close to a base.

In the outfield, though? I highly doubt BIS is anywhere near 10 feet of accuracy (that’s pretty much 3 steps). There are so few points of comparison within the field of view of most of the broadcasts they’re getting their video from.

Although you use the word “bias” which isn’t exactly accuracy. But I’m not sure what the bias will be in this case.


#47    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 19:41

The bias is that balls are MARKED as being hit closer to certain fielders or certain types of fielders.

For example, when MGL ran UZR against STATS-marked data and BIS-marked data, I found a 100 run difference for Andruw Jones over a period of 6 or 7 years.  There were huge differences for a few centerfielders (Cameron, Beltran), and I think for Ichiro, something on the order of 50 runs over the same time period.

Now, why would this be?  It could be a stringer-bias, that the way BIS collects data is much different from the way STATS collects data.

***

Now, when it comes to one play or one game, or even one month, that stringer-bias will be better than what TZ or WOWY or what nFRAA does.  That’s because the random variation of those metrics will overwhelm the stringer-bias.

But, as you get more data, that stringer-bias will get exacerbated, and so as the sample size increases, the random noise decreases, but stringer-bias gets STRONGER.

So, a metric like UZR, say for 7 or 10+ years, might do worse, because that bias is simply sticking to a player.  So, Andruw Jones was either a league-average CF from 2002-2009, or he was the best fielder in the league, depending which source you use for UZR.
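
A toy model of that trade-off, with every number made up for illustration: treat the error in a fielder’s cumulative runs as a fixed per-season bias that adds up linearly plus random noise that adds up with the square root of the number of seasons.

```python
import math


def rmse_total(seasons, bias_per_season, noise_sd_per_season):
    """Root-mean-square error of a cumulative runs estimate.

    Bias accumulates linearly with seasons; independent noise
    accumulates with the square root of seasons.
    """
    bias = bias_per_season * seasons
    noise = noise_sd_per_season * math.sqrt(seasons)
    return math.sqrt(bias ** 2 + noise ** 2)


# Hypothetical metrics (all values are assumptions):
#   location-based: low noise (5 runs/season) but a 5 run/season stringer bias
#   inferred:       high noise (12 runs/season) but no bias
for seasons in (1, 2, 4, 6, 8, 10):
    located = rmse_total(seasons, bias_per_season=5.0, noise_sd_per_season=5.0)
    inferred = rmse_total(seasons, bias_per_season=0.0, noise_sd_per_season=12.0)
    print(seasons, round(located, 1), round(inferred, 1))
```

Under these assumed inputs the location-based metric has the smaller error early and the larger error late. Where the crossover falls depends entirely on the sizes chosen for the bias and the noise.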


#48    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 19:53

I think WOWY is by far the best of those 3 metrics. But why not use TZL over TZ then? And what if the pitching staff for some reason has some strange ability to make balls go to one side? Wouldn’t UZR be better in that case?


#49    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 19:56

Tango, I have a question. Would you prefer it if FanGraphs used UZR in season, so we could have in-season WAR, and then switched to ADR after the season? I’m not sure about this. Would the Fans Scouting Report be a useful tool to combine with UZR, DRS and TZL? Maybe a Gameday-based fielding metric would be good too, because then we would have 4 different data sources.


#50    Tangotiger      (see all posts) 2012/01/26 (Thu) @ 21:38

Again, it depends what you are interested in.

Fans much prefer un-regressed, and perhaps un-adjusted, stats to anything.

In my case, I’m in the minority: I care about true talent.

***

As for TZ and TZL: what’s the difference between the two?  I didn’t know Rally had two versions.

In any case, Total Zone does give you general direction.  It knows generally where all the outs went, and it knows kinda where all the hits went.  So, we can infer where the balls went, sorta.

In return for that lack of precision, we also are not saddled with that much bias.

UZR still rules for at least up to 2 years of data, and maybe as much as even 6 years of data.  Somewhere around there, at 6 years, UZR starts to lose its lustre.


#51    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 22:22

Rally explains TZL here (http://baseballprojection.com/special/tz_hitlocation.htm). It’s an improvement on TZ. It uses hit location for a couple of things, like infield singles. It actually changes the numbers a decent amount. FanGraphs has TZL under advanced defense and basic TZ under standard.

I think regressing is for when you want to find true talent, but for WAR, I don’t think you should.


#52    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 22:26

Oh, and ADR doesn’t mean regressed; it’s FanGraphs’ aggregate defensive ratings. They have them on the player card. It’s FSR, TZL, UZR, and DRS weighted 1-3-3-3 (FSR is the 1).
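
Reading those weights as 1 for FSR and 3 each for TZL, UZR and DRS, a weighted average would look something like the sketch below. Whether FanGraphs combines them exactly this way is an assumption, and the player ratings are invented.

```python
# Weights as described above: FSR counts once, the other three count triple.
WEIGHTS = {"FSR": 1, "TZL": 3, "UZR": 3, "DRS": 3}


def aggregate_rating(ratings):
    """Weighted average of whichever of the four ratings are supplied (in runs)."""
    total_weight = sum(WEIGHTS[name] for name in ratings)
    weighted_sum = sum(WEIGHTS[name] * value for name, value in ratings.items())
    return weighted_sum / total_weight


# Hypothetical fielder, runs above average by each system.
player = {"FSR": 10.0, "TZL": 4.0, "UZR": 6.0, "DRS": 8.0}
print(aggregate_rating(player))
```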


#53    BoSoxFan      (see all posts) 2012/01/26 (Thu) @ 23:15

I did a little research. I found these correlations for players switching teams mid-season since 2003. I had 77 players: UZR .53, RZR .74, DRS .37. I was very surprised that UZR was so much better than DRS. I expected RZR to win because of no batted ball data, but not by that much. Maybe this means that the error in batted ball data is more due to park and less due to scorer inaccuracies. This probably means it could be almost fixed with a quick batted-ball park factor. What’s up with DRS though?
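
For anyone who wants to repeat this kind of check, a minimal sketch of the calculation follows: pair each player’s rating with his old team against his rating with his new team and take the Pearson correlation. The five data points are invented placeholders, not the 77-player sample.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical ratings (runs per 150 games) before and after a mid-season move.
before_trade = [5.0, -3.0, 1.0, 8.0, -6.0]
after_trade = [3.0, -1.0, 2.0, 4.0, -5.0]

print(round(pearson(before_trade, after_trade), 2))
```

Running the same function once per metric on the same set of players gives directly comparable numbers.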


#54    BoSoxFan      (see all posts) 2012/01/27 (Fri) @ 15:10

I think Peter summed up nicely why I like fWAR better than rWAR in this thread: http://www.insidethebook.com/ee/index.php/site/comments/how_lucky_has_scott_rolen_been_with_his_opportunities_to_field/#47


#55    Tangotiger      (see all posts) 2012/01/27 (Fri) @ 16:16

That was an amazing thread.


#56          (see all posts) 2012/01/29 (Sun) @ 01:59

#53. Half season UZR correlation for a given player is 0.5 per a previous thread here.

I guess if you tested 77 other players (selected at random) and compared their 1st half and 2nd half performances in a given year and same team and found similar correlations, or not, it might be interesting.


#57    Sean Forman      (see all posts) 2012/02/07 (Tue) @ 16:06

“Another thing I forgot for baserunning, UBR measures advances and stuff like that. rWAR doesn’t. It uses available baserunning stats but no base advancing, that would be so easy to incorporate into rWAR.”

This is wrong.  It does include advances.

I clearly need to add a soup to nuts WAR breakdown on the site.


#58    Tangotiger      (see all posts) 2012/02/07 (Tue) @ 16:54

Sean/57 was marked for moderation and is now open.

