Thursday, January 26, 2012
rWAR v fWAR? No. rWAR + fWAR.
Moving posts from another thread here.
Michael does a good job of starting the reader from scratch. If you've been a bit confused by run expectancy and linear weights, this article may help you out.
It doesn't make much of a difference, but I recommend this page for your run expectancy needs:
http://www.tangotiger.net/re24.html
It's based on more data, its run environment isn't as extreme as the oft-cited 1999-2002 page, and it gives you extra charts too.
Clay starts with all the plate appearances, and removes hits, walks, and hit batters. Basically, he’s left with outs and some form of reaching base on error. He compares the actual change in run expectancy (RE24 on Fangraphs) to what it would have been if the plate appearance was a strikeout (i.e., one out, no runners advance). The leaders and trailers:
Josh Hamilton, Tex 20.1
Derek Jeter, NYY 16.2
Juan Pierre, CWS 15.7
Omar Infante, Fla 15.5
Carlos Lee, Hou 15.2
Hideki Matsui, Oak 14.9
Angel Pagan, NYM 14.8
Ichiro Suzuki, Sea 14.6
Elvis Andrus, Tex 14.5
Alcides Escobar, KC 14.4
...
Brian Bogusevic, Hou -1.3
Matt Dominguez, Fla -1.5
Brad Hawpe, SD -1.9
Brian McCann, Atl -1.9
Pedro Alvarez, Pit -1.9
Matt Wieters, Bal -2.0
Matt Holliday, StL -2.8
Ryan Adams, Bal -3.0
John Buck, Fla -3.3
David Ortiz, Bos -5.1
His inclusion of reaching base on error probably explains Jeter, Pierre, and Ichiro. My preference would have been to keep them separate, but it's not that big a deal.
You should also note that it's easier to get a plus than a minus. That's because, overall, a strikeout is more negative than a non-K out (especially if reaching base on error is included in this category). Typically, a K-out is about .01 to .02 runs worse than a non-K out (again, depending on how you handle the errors). Clay shows the team totals, and on average a team's non-K outs are worth 80 runs more than they would have been as strikeouts. That works out to 9 runs per 162 games per lineup spot (which is about .02 runs per out), or say about 7-8 runs for a typical regular player. (And you could have figured that out yourself, since the leaders/trailers endpoints are at +14.4 and -1.3, and the halfway point between them is 6.6 runs.) It might be better if Clay readjusted his numbers above to force the average player to zero.
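The arithmetic in that paragraph can be checked with a few lines of Python. The 80-run team figure and the +14.4 / -1.3 endpoints come from the post; dividing by 9 lineup spots and subtracting a single constant to re-centre are simplifying assumptions of this sketch (a proper re-centring would scale by each player's outs, which the lists above don't show).

```python
# Figures quoted in the post
team_gap_runs = 80.0   # average team: non-K outs vs. the same outs as strikeouts
lineup_spots = 9
leader_10th = 14.4     # Alcides Escobar, last name on the leader list
trailer_10th = -1.3    # Brian Bogusevic, first name on the trailer list

per_spot = team_gap_runs / lineup_spots       # about 8.9 runs per 162 games
midpoint = (leader_10th + trailer_10th) / 2   # about 6.6 runs


def recentre(value, baseline=midpoint):
    """Crude re-centring (assumption): subtract one constant so a typical regular sits near zero."""
    return value - baseline


print(round(per_spot, 1), round(midpoint, 2), round(recentre(20.1), 2))
```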
This is how I like to visualize the various components of wOBA. Here’s what to look for:
1. The green area under the 1.000 line is equal to the green area above the 1.000 line. This implies that the average value of the on-base events equals 1, which forces league wOBA to equal league OBP.
2. The gap in the red area gives you the value of each event above the 0.333 baseline (or whatever the league average OBP is). This gap is exactly proportional to the Linear Weights run values.
Because of these two points, you get wOBA values of roughly 0.7 for a walk, 0.9 for a single, 1.3 for a double/triple, and 2.0 for a HR. That's all there is to it.
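Here is a minimal Python sketch of the two points above, using a made-up league (the event counts and run values are hypothetical, not from any real season). Pinning the out at 0 and making each event's gap above OBP proportional to its linear weight is enough, once the linear weights sum to zero over the league, to force the on-base events to average exactly 1, so league wOBA equals league OBP. The inputs were picked so the weights land near the quoted 0.7 / 0.9 / 1.3 / 2.0.

```python
# Hypothetical league: on-base event counts and run values above average (made up).
counts = {"BB+HB": 550, "1B": 950, "2B+3B": 320, "HR": 180}
lw_above_avg = {"BB+HB": 0.30, "1B": 0.45, "2B+3B": 0.80, "HR": 1.40}
n_out = 4000

# Linear weights above average sum to zero over the league, which pins the out value.
lw_out = -sum(counts[e] * lw_above_avg[e] for e in counts) / n_out

on_base = sum(counts.values())
pa = n_out + on_base
obp = on_base / pa

# Point 2: (wOBA weight - OBP) is proportional to the run value, with the out mapped to 0.
scale = obp / -lw_out
woba_weight = {e: obp + scale * lw_above_avg[e] for e in counts}

# Point 1: the on-base weights average to 1, so league wOBA equals league OBP.
weighted_sum = sum(counts[e] * woba_weight[e] for e in counts)
avg_weight = weighted_sum / on_base
league_woba = weighted_sum / pa

print({e: round(w, 2) for e, w in woba_weight.items()}, round(avg_weight, 6))
```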
There are two great implications of using wOBA:
A. Even though wOBA IS Linear Weights, I have never ever ever had to explain what a “negative run value” is. It just doesn’t exist in wOBA. It’s hidden by having the baseline lowered as I have.
B. Because it is coupled with OBP (the same mean, and the same denominator of plate appearances), and OBP is perfectly suited for the binomial distribution, wOBA takes on similar characteristics. Not exactly, but close enough for our purposes. Whereas the binomial would say p*(1-p), in this case, we’d use p*(1.1-p).
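To make point B concrete, here is a small Python sketch comparing the binomial spread of an OBP with the p*(1.1-p) rule of thumb for wOBA. It assumes independent plate appearances, and the .330 hitter over 600 PA is hypothetical; the 1.1 is the approximation from the text, not an exact variance.

```python
import math


def obp_sd(p, pa):
    """Binomial standard deviation of an OBP over `pa` plate appearances."""
    return math.sqrt(p * (1 - p) / pa)


def woba_sd(p, pa):
    """Approximate spread of a wOBA, using p*(1.1-p) in place of p*(1-p)."""
    return math.sqrt(p * (1.1 - p) / pa)


# A hypothetical .330 hitter over 600 PA
print(round(obp_sd(0.330, 600), 4), round(woba_sd(0.330, 600), 4))
```

The wOBA spread comes out only slightly wider than the OBP spread, which is the sense in which wOBA "takes on similar characteristics".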
Hence, my love for wOBA.
I like debates where both sides make reasonable points.
Anyway, my opinion as to why Howard is (currently) seen as elite, but should be seen as average: he has had star to superstar seasons in the past, but he hasn't had those in the last two seasons. There's a huge difference between hitting 45 to 58 home runs in those star seasons and hitting 10 to 15 fewer home runs today. We're talking about 20 runs of value that have simply disappeared on power alone. That's 2 wins, and that turns a star player into an average player.
Even relying on the old-school stats, Ryan Howard had 136 to 149 RBIs from 2006-2009, and in these last two seasons, he's down at least 20 RBIs. His runs scored were 94 to 105 at his peak, and he's now down at least 10.
If you look at his wOPS (weighted OPS): from 2010-2011, among the 30 1B+DH with at least 810 PA, he’s #11. So, just on the hitting side among his peers, he’s barely above average. Add in his below average fielding and running, and you have yourself an average player.
He has been very clutch though. And I think that is what keeps his elite status in play (among his supporters, anyway).
The work I'm most proud of is the Markov calculator. Baseball is a really simple game: you get on base, you move over, until you make too many outs. It's one of the easiest things to program. You can't run backwards, you can't jump bases, you can't have 4 outs or 2 outs in an inning. It's very structured, very easy to program. Every single person who reads this site needs to try out the Markov calculator. Seriously.
(The only limitation of the calculator, and what makes baseball a bit tougher to code, is that runners can be put out on the bases. That would turn a very simple program into a fairly complex one. I've offered the simple one, for free, with the source code available to all.)
Anyway, using the default values, you hit calculate, and we see that that team will score 4.905 runs. Now, what happens if this team does not hit any HR? Well, click the back button, change the “1” to a “0”, and change the hits from 10 to 11.2. (This is because trading a HR, which has a wOBA value of 2.0, for 2.2 singles, which has a wOBA value of 0.90, is a fair trade. So, we lose a HR, but gain 1.2 hits.) You end up scoring 4.944 runs. That’s 0.039 more runs scored by NOT having a slugger.
Basically, a lot of the value of singles and walks is not realized when home runs are being hit. With no HR hit at all, each of those events takes on greater importance, as they feed off each other.
As an aside: Ty Cobb and players of his era should not be judged by standard linear weights. The run value of a single shoots way up when no HR are hit, by something like +.07 runs per single. But the value of an out also has more impact, by an extra -.04 runs per out. Cobb gets more hits and makes fewer outs, so his value goes up by more than standard linear weights would show.
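To show the idea behind a Markov run estimator, here is a minimal Python sketch over the 24 base-out states. It is not the calculator itself (whose source is on the site); the advancement rules are hypothetical simplifications: no runners put out on the bases, no double plays, walks force runners only, and on a hit every runner moves up as many bases as the batter. Only the ratio of events matters, so per-game counts are normalised to per-PA rates, and trading a HR for 2.2 singles is just a change of counts.

```python
def walk(bases):
    """Force-advance on a walk; `bases` is a bitmask (1 = first, 2 = second, 4 = third)."""
    if not bases & 1:
        return bases | 1, 0
    if not bases & 2:
        return bases | 3, 0
    if not bases & 4:
        return 7, 0
    return 7, 1  # bases loaded: a run is forced in


def hit(bases, n):
    """Batter reaches base n (4 = home run); each runner moves up n bases (assumption)."""
    new, runs = 0, 0
    for base in (1, 2, 3):
        if bases & (1 << (base - 1)):
            dest = base + n
            if dest >= 4:
                runs += 1
            else:
                new |= 1 << (dest - 1)
    if n == 4:
        runs += 1
    else:
        new |= 1 << (n - 1)
    return new, runs


HITS = (("1b", 1), ("2b", 2), ("3b", 3), ("hr", 4))


def runs_per_inning(p, iterations=2000):
    """Expected runs from (0 outs, bases empty), iterating over the base-out states.

    `p` maps "out", "bb", "1b", "2b", "3b", "hr" to per-PA probabilities summing to 1.
    """
    er = [[0.0] * 8 for _ in range(4)]  # er[3][*] stays 0: three outs ends the inning
    for _ in range(iterations):
        new = [[0.0] * 8 for _ in range(4)]
        for outs in range(3):
            for bases in range(8):
                total = p["out"] * er[outs + 1][bases]  # runners hold on an out
                b, r = walk(bases)
                total += p["bb"] * (r + er[outs][b])
                for key, n in HITS:
                    b, r = hit(bases, n)
                    total += p[key] * (r + er[outs][b])
                new[outs][bases] = total
        er = new
    return er[0][0]


def normalise(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


# Hypothetical per-game counts (not the calculator's defaults).
base = {"out": 27, "bb": 3, "1b": 6.5, "2b": 2, "3b": 0.5, "hr": 1}
no_hr = {**base, "hr": 0, "1b": base["1b"] + 2.2}  # trade the HR for 2.2 singles
for label, counts in (("with HR", base), ("HR traded for 2.2 singles", no_hr)):
    print(label, round(9 * runs_per_inning(normalise(counts)), 3))
```

Because these advancement rules are cruder than the real calculator's, the runs-per-game figures printed here should not be expected to match the 4.905 and 4.944 quoted above.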
Fangraphs has had it for a while. In the section "Background on WAR", there's a link to a 15-part article on how to calculate WAR. I also had a basic thread on the matter three years ago.
This post is in response to a Primer reader who said:
I again ask where I can find a detailed breakdown of how WAR components are calculated. I don’t see it on Sean’s site or on b-r. If the answer is that this information is not public, then that’s the answer. Is that the answer? Is this stuff black box?
You can also see all the leaders here, with the breakdowns (and David allows you to export it to Excel too). One thing I'd like to see from Fangraphs is an option to combine fielding+position. This way, you can better display the "best fielders" when you do a sort. I understand you may want the two separate. I'm just saying to have both (separate, and as a group of two).
a) WAR is wins above replacement.
b) WAR is a framework.
c) WAR presents the performance of a player into a single number.
d) WAR is limited to the data points it considers.
e) WAR is limited by the bias in the data.
f) WAR is not all-encompassing.
So, what does all that bullsh!t mean?
Someone at BPro was asking about the value of a called pitch. This is how I explain it quickly:
A crude way to think about the run value of a strike or ball is this way:
The run value of a walk is around +.30 runs, and the run value of a strikeout is around -.27 runs.
So, going from an 0-0 count to a 4-0 count means that each called ball is +.075 runs.
Going from 0-0 to 0-3 means that each called strike is -.09 runs.
That means that switching a called ball to a called strike is going from being at a +.075 run state to a -.09 run state, or around a .16 run swing.
So, getting that one call every game for 150 games means .16 runs x 150 calls = 24 runs.
This is just a quick crude way to try to frame the expectation.
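The same crude framing in Python, as a sketch. Like the text, it assumes each called ball carries an equal quarter of the walk's value and each called strike an equal third of the strikeout's, which ignores the real count-by-count differences.

```python
walk_rv = 0.30        # approximate run value of a walk (from the post)
strikeout_rv = -0.27  # approximate run value of a strikeout (from the post)

ball_rv = walk_rv / 4         # 0-0 to 4-0 in four called balls: +.075 each
strike_rv = strikeout_rv / 3  # 0-0 to 0-3 in three called strikes: -.09 each
swing = ball_rv - strike_rv   # turning one ball into a strike: about .165 runs

calls_per_season = 150        # one extra call per game, 150 games
season_runs = swing * calls_per_season
print(round(swing, 3), round(season_runs, 1))
```

The text rounds the swing down to .16, which gives 24 runs; the unrounded .165 gives closer to 25. At this level of crudeness, either is fine.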
I did a two-post followup to the poll at Fangraphs.
Part 1, Part 2.
Linear Weights for pitchers.
It requires using a conditional clause. This is what you say:
IF you intend to look at numbers, THEN this is one of the best ways you would look at them.
If they come back with:
But what about what’s not in the numbers? What about heart? How can you rely only on numbers?
Your response is:
I’m not disagreeing with you. All I said is IF.... IF you intend to look at numbers. Whether the numbers tell 100% of the story or 10% of the story, I’m not suggesting either way. The only thing I’m saying is IF… IF. You can decide for yourself how much weight the numbers should get, and how much weight non-numbers should get.
And if they say:
What about wins?
Your response is:
That’s a number.
Excellent little article by Schoenfield. Once you see that, then you have no choice but to move on to something better. You can't look at all the holes in RBIs and then… be content to use RBIs. Eventually, you make your way to this article by Ruane. And then finally, you get to Linear Weights and RE24.
As we've found out, FIP does not work in all run environments. The coefficients work for pitchers who give up around league-average runs, but the further a pitcher is from that, the more the coefficients need to change. If you wanted to do FIP the right way, that's what you'd have to do. However, the appeal of FIP is that we get a quick look by using nice constant coefficients. If we had to figure out new weights for each pitcher and each year, it would lose a great deal of that appeal.
wOBA was never intended for mainstream use. It was conceived for The Book. And wOBA, done right, would be FIP done right: proper coefficients for each run environment. So, some years, the coefficient for the HR is 1.90 and others it’s 2.10 and so on.
But the appeal of FIP is its non-changing coefficients, and we calibrate by using a constant (adding +3.20 or +3.00, etc., as the case warrants).
Indeed, when I use wOBA as a quick calculation, I use this:
wOBA
= ( 0.7 * (BB + HB)
+ 0.9 * (1B + ROE)
+ 1.3 * (2B + 3B)
+ 2.0 * (HR) ) / PA
If I need to align it to some league level, or if I want to make it cross-era useful, I’ll just apply some overall constant to line them up. That is, I use the same principle behind FIP: keep the coefficients, and apply an overall fudge factor.
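A minimal Python sketch of the quick formula, divided by plate appearances like OBP, plus an overall fudge factor. The multiplicative alignment is an assumption of this sketch; an additive constant, as FIP uses, would follow the same principle. The hitter and league figures are hypothetical.

```python
def quick_woba(pa, bb=0, hb=0, singles=0, roe=0, doubles=0, triples=0, hr=0):
    """Quick wOBA with fixed coefficients, per plate appearance."""
    numerator = (0.7 * (bb + hb)
                 + 0.9 * (singles + roe)
                 + 1.3 * (doubles + triples)
                 + 2.0 * hr)
    return numerator / pa


def align(woba, league_woba, target=0.330):
    """Rescale so the league average lands on `target` (assumed multiplicative fudge factor)."""
    return woba * target / league_woba


# Hypothetical hitter in a hypothetical .320 league
hitter = quick_woba(pa=650, bb=60, hb=5, singles=100, roe=5, doubles=30, triples=3, hr=25)
print(round(hitter, 3), round(align(hitter, league_woba=0.320), 3))
```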
My questions:
1. Do you prefer to see a non-changing wOBA formula, like FIP?
2. If so, do you prefer to align to that particular year, or do you want the league average to always be 0.330?
Good job by Dave. But I like his side note here:
We would not suggest that anyone look at 2011 WAR as a definitive ordered list of who the best players in the game are at this time – it’s not even trying to make that claim. It’s talking about past performance only, not what we expect going forward.
In fact, Buster’s criticism of WAR could be applied to any stat you want, traditional or advanced. If you interpret it literally, ERA currently says that Ryan Vogelsong is the best pitcher in the National League. That’s crazy, of course, but no one interprets single-season ERA that way. Single season batting average gives you Casey Kotchman as the third best hitter in baseball. It’s not just the advanced stats that produce results that “don’t pass the smell test”.
Bill James recently wrote an article called "Abe Lincoln Scores", where he focused on 4 scores (BB+HB, SO, HR, BIP). He set the score for BIP to "1", and floated the other numbers around that: SO was 0, HR was 4, BB+HB was 2. (HR is undervalued in his metric.)
At this point, you should be thinking two things:
a. wOBA
b. FIP
The wOBA equation is this:
0.0: SO, other outs
0.7: BB, HB
0.9: 1B, ROE
1.3: 2B, 3B
2.0: HR
What James did was to focus just on those FIP things. So, we can come up with a FIP equation based on wOBA fairly easily:
wOBAfip = (0*SO + 0.7*BB + 2.0*HR + something*BIP) / PA
So, this made me think. Whereas in the FIP equation, the “3.2” is a constant for all pitchers, the “something*BIP/PA” is specific for each pitcher. That is, his wOBA will be affected based on the percentage of his PA that are BIP. To take an extreme view, if 100% of his PA are BIP, his FIP will equal 3.20. The wOBA for such a pitcher will be 0.300.
Is a .300 wOBA actually a 3.20 ERA? Not exactly. I mean, it’s pretty close. A .300 wOBA is more like a 3.30 ERA. But, still, there’s a bit of bias to account for.
The other question is whether the 13, 3, -2 weights are correct. FIP complicates matters by having IP, not PA, as its denominator. Since we are trying to remove the fielders from the equation, the use of IP implicitly brings them back in.
***
The linear weights run values are these for the 4 scores:
Runs above average =
-.28 SO
-.03 BIP
+.32 BB
+1.40 HR
To convert to runs, we add +.12 runs (per PA). So, we get:
Total Runs =
-.16 SO
+.09 BIP
+.44 BB
+1.52 HR
This gives us runs scored per game.
Since FIP likes to keep the BIP “fixed”, then we remove .09 runs per PA from each event, and spin it off into its own. Now we have:
Total Runs =
-.25 * SO
+.00 * BIP
+.35 * BB
+1.43 * HR
+.09 * PA
Since there are 38.5 PA per game, we get:
-.25 * SO
+.00 * BIP
+.35 * BB
+1.43 * HR
+.09 * 38.5
Note: there is an average of 38.5 PA per game. Great pitchers face fewer batters, which is why we have a bias here.
With 9 IP per game, we get:
-.25*9 * SO/IP
+.00*9 * BIP/IP
+.35*9 * BB/IP
+1.43*9 * HR/IP
+.09 * 38.5
Which is:
(
-2.25 * SO
+3.15 * BB
+12.9 * HR
) / IP
+ 3.47
Since this is on a runs scale, and not an earned runs scale, we can multiply everything above by 0.923 to get the ERA scale:
(
-2.1 * SO
+2.9 * BB
+11.9 * HR
) / IP
+ 3.20
Hence, we see where the -2, +3, +13, and 3.20 figures in FIP come from. Based on this deconstruction, the HR value in FIP may be too high: I should be using 12, not 13.
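The deconstruction above, reproduced step by step in Python as a sketch, using only the numbers given in the text.

```python
# Run values above average per event (from the post)
raa = {"SO": -0.28, "BIP": -0.03, "BB": 0.32, "HR": 1.40}
to_runs = 0.12          # added per PA to move from runs above average to total runs
pa_per_game = 38.5
ip_per_game = 9
earned_share = 0.923    # runs scale to earned-runs scale

total_runs = {e: v + to_runs for e, v in raa.items()}  # SO -.16, BIP +.09, BB +.44, HR +1.52
bip = total_runs["BIP"]                                # spun off as a per-PA constant

# Coefficient relative to a ball in play, per 9 IP, on the earned-run scale
fip_coef = {e: (total_runs[e] - bip) * ip_per_game * earned_share
            for e in ("SO", "BB", "HR")}
fip_constant = bip * pa_per_game * earned_share

print({e: round(c, 1) for e, c in fip_coef.items()}, round(fip_constant, 2))
```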
***
However, remember that the run value of a HR is fairly static at +1.40 runs, while the run value of the walk moves with the run environment. As the run environment goes down, so does the run value of the walk. So, relatively speaking, the HR value compared to the walk increases as the run environment goes down, and decreases as the run environment goes up.
Indeed, if the run value of the walk is +.30, then the FIP component for HR becomes 13. If the run value of the walk is +.33, then the FIP component for the HR becomes 12.
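One way to see the walk sensitivity, as a Python sketch: hold the walk coefficient at 3 and scale the HR coefficient by the HR-to-walk run-value ratio. Holding BB at 3, and keeping the +.12 and .09 adjustments fixed while the walk value moves, are assumptions of this sketch; in a real run environment those adjustments would shift too.

```python
def fip_hr_coefficient(bb_raa, hr_raa=1.40, to_runs=0.12, bip_runs=0.09, bb_coef=3.0):
    """HR coefficient implied when the BB coefficient is held at `bb_coef` (assumption)."""
    bb = bb_raa + to_runs - bip_runs  # walk, relative to a ball in play
    hr = hr_raa + to_runs - bip_runs  # home run, relative to a ball in play
    return bb_coef * hr / bb


for bb_raa in (0.30, 0.32, 0.33):
    print(bb_raa, round(fip_hr_coefficient(bb_raa), 1))
```

A +.30 walk gives 13.0 and a +.33 walk gives about 11.9, in line with the 13 and 12 quoted in the text.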
***
We of course have another bias, and that is that runs are not linear when dealing with pitchers. But we've taken a decidedly linear approach. So, there are two things that conspire to bias a great pitcher's FIP too high:
1. We give him 38.5 batters per 9IP, when it should be a bit lower.
2. Each event has less impact the fewer runners there are on base.
However, one thing that shifts the balance is the use of IP, not PA, in the denominator. So, it kind of sets the balance back the other way.
***
What am I saying? I don’t know. Maybe change the “13” to “12” for HR? Maybe try to focus on PA and not IP? Maybe look at percentage of PA that are BIP? Maybe have a FIP equation that is better tuned to the run environment than simply floating the 3.2? I don’t know yet.
That’s why this is a lab thread.
I understand the reason for the existence of OPS. There’s the OBP pillar over here, there’s the SLG pillar over there, and we need something to keep the house from falling, so, let’s use the OBP + SLG pillars together.
But, why does OPS+ need to exist? Nobody actually calculates OPS+ themselves, by hand or even by computer. B-R.com calculates it for you, so you are basically taking it on faith. If you are going to take a metric on faith, why not take one that is not biased? That's why I support RC+ (though not the James version of Runs Created). OPS+ doesn't even mean anything. It just happens, by luck, to approximate RC+.
Bill James recently noted when asked about OPS+:
I don’t much like OPS. OPS is an approximation which has gained favor over better measurements because of its simplicity. A mathematical derivation based on a convenient approximation doesn’t strike me as a best option.
And he's right. OPS is a nice shortcut, which will ensure its survival. But adding the level of complexity required to get to OPS+ is not the best option. It may be an OK option, it may be a passable option. It may even be a half-decent option.
OPS+ is nowhere near the best option, and there’s no point in debating for it on that basis. The argument for OPS+ requires you to concede that you are not interested in the best. And if you want to argue for OPS+ the way you’d argue that you’re happy at your crappy job because it pays the bills, then so be it. It gets the job done.
I’ve been meaning to do this for a few years now.
In 2010, with Cliff Lee on the mound, his team allowed 84 runs to 842 batters. Tommy Hanson’s team gave up 86 runs to 845 batters. As you can see, a pretty solid match.
(Note by the way that I didn’t say Cliff Lee gave up 84 runs. The defense has 9 fielders on the field. While the pitcher may be the pivotal player in allowing runs, he’s not the only one. This is why we should always say “the team allowed with the pitcher on the mound X number of runs”. This is not only accurate, it keeps us from giving too much credit to the pitcher.)
Hanson’s slash line (BA / OBP / SLG) was: .239/.301/.347
Cliff Lee on the other hand: .240/.255/.363
That works out to an estimated wOBA of:
Cliff Lee
= .277
Tommy Hanson
= .300
How is it that Cliff Lee ended up with much better results than Hanson overall, but gave up a similar number of runs? While Hanson's slash line with runners on base and bases empty was consistent with the league, Cliff Lee was on the mound when bad things happened with men on base:
Cliff Lee
.214/.230/.333 Bases Empty
.288/.302/.420 Runners on Base
The entire difference is basically BABIP driven, but we’re not concerned about this for now.
So, the question is: can we come up with a BaseRuns equation that is dependent on the base-out situation, such that the total runs estimated will be the same for Hanson and Lee? I don’t know the answer to that question yet.
I do want to present a general wOBA equation for bases empty and runners on base. For bases empty, we have:
0.85: 1B, BB
1.10: 2B
1.50: 3B
2.25: HR
Obviously, a single and a walk are identical with bases empty. A shortcut that approximates the above, using only the slash line, would be:
wOBAe = (2 * OBPe + SLGe - BAe ) * .42
The little e denotes performance with bases empty.
A general equation for runners on base would be:
0.50: BB
0.95: 1B
1.40: 2B
1.60: 3B
1.75: HR
With runners on base, there’s simply little to distinguish the various extra base hits. So, a shortcut equation would be:
wOBAr = (3 * OBPr + 2 * SLGr + BAr) * .16
The little r denotes performance with runners on base.
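To see how closely the two slash-line shortcuts track the event weights listed above, here is a Python sketch of the per-event weights each shortcut implies. It treats BA, OBP, and SLG as if they shared one denominator, which is an approximation (BA and SLG use AB, OBP uses PA): every event adds 1 to OBP, hits add 1 to BA, and SLG adds total bases.

```python
# (counts toward BA?, total bases) for each event
EVENTS = {
    "BB": (0, 0),
    "1B": (1, 1),
    "2B": (1, 2),
    "3B": (1, 3),
    "HR": (1, 4),
}


def bases_empty_weight(ba, tb):
    return (2 * 1 + tb - ba) * 0.42      # (2 * OBP + SLG - BA) * .42


def runners_on_weight(ba, tb):
    return (3 * 1 + 2 * tb + ba) * 0.16  # (3 * OBP + 2 * SLG + BA) * .16


implied = {e: (bases_empty_weight(*v), runners_on_weight(*v)) for e, v in EVENTS.items()}
for event, (empty_w, runners_w) in implied.items():
    print(event, round(empty_w, 2), round(runners_w, 2))
```

The implied bases-empty weights are 0.84 / 0.84 / 1.26 / 1.68 / 2.10 and the runners-on weights are 0.48 / 0.96 / 1.28 / 1.60 / 1.92. Both shortcuts land close on walks and singles and drift somewhat on the extra-base hits, which is what you would expect from a quick slash-line approximation.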
Also note that the Leverage Index with runners on base is 1.4, while it's 0.7 with bases empty, and that the bases are empty 55% of the time. (Yes, I know that the better you are, the more often the bases are empty. These are quick shortcuts.)
So, to combine the above two equations into an overall wOBA, we get:
wOBA
= wOBAe * 0.7 * .55
+ wOBAr * 1.4 * .45
So, if we take Cliff Lee:
.214/.230/.333 Bases Empty
.288/.302/.420 Runners on Base
We can convert that as:
wOBA
= (2 * .230 + .333 - .214 ) * .42 * 0.7 * .55
+ (3 * .302 + 2 * .420 + .288) * .16 * 1.4 * .45
= .299
Tommy Hanson:
.233/.289/.349 Bases Empty
.249/.319/.343 Runners on Base
We can convert that similarly to:
wOBA
= .303
As you can see, a wOBA based on looking at performance by men on base and bases empty makes Cliff Lee and Tommy Hanson equivalent.
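The Lee and Hanson calculations as a Python sketch, using the shortcut equations, the 0.7 / 1.4 leverage values, and the 55% / 45% bases-empty split from the text. Slash lines are passed in BA / OBP / SLG order.

```python
def woba_empty(ba, obp, slg):
    return (2 * obp + slg - ba) * 0.42


def woba_runners(ba, obp, slg):
    return (3 * obp + 2 * slg + ba) * 0.16


def woba_split(empty, runners, li_empty=0.7, li_runners=1.4, share_empty=0.55):
    """Leverage- and frequency-weighted blend of the bases-empty and runners-on wOBAs."""
    return (woba_empty(*empty) * li_empty * share_empty
            + woba_runners(*runners) * li_runners * (1 - share_empty))


lee = woba_split(empty=(.214, .230, .333), runners=(.288, .302, .420))
hanson = woba_split(empty=(.233, .289, .349), runners=(.249, .319, .343))
print(round(lee, 3), round(hanson, 3))
```

This returns about .299 for Lee and .303 for Hanson, matching the figures above.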