Friday, January 26, 2007
Runs Produced
I wrote a piece on trying to make sense of Runs Produced:
http://www.tangotiger.net/runsproduced.html
Could the reason that Runs Produced with HR included does not reconcile with RC, for 100-RC guys who hit 35 HR or 5 HR, be that RC and LWTS are kinder to leadoff-type hitters? (I have nothing to back this up; I am unarmed.)
The thing with the various run estimators being used for individual players is that, while they are accurate at the team level, they do not account for batting order: a HR by a leadoff hitter is worth less than one by someone hitting 3rd or 4th, and a walk by a #4 hitter is worth less than one by a leadoff hitter. A slugger who does not walk a lot gets killed.
I understand the arguments for and against not subtracting the HR, but perhaps the answer lies somewhere in between, and that some part of the HR should be deducted but not all.
I have the LWTS by batting order in The Book. If someone wants to calculate it, they can.
***
If we take this data:
http://www.retrosheet.org/boxesetc/YS_A2006.htm
We find that the Runs Created of the 3rd hitter is 1 or 2 runs lower than his adjusted Runs Produced. The RC of the cleanup hitter is 6 runs lower than his adjusted Runs Produced.
So, the big boppers get around 4 Runs Produced higher than they should get, because of their batting order. This is pretty consistent with what we were getting in the article.
My problem is with what seems to be an assumption: that RP is supposed to be an alternative version of RC. That is what leads to attempts to modify R+RBI by subtracting out HR, or adding a term for outs, or whatever.
But it is just as reasonable (more so, IMO), to look at RP and RC as having different goals. RC is hypothetical expected runs. RP attempts to distribute credit for actual runs, in a retrospective viewpoint.
By that concept, the HR is treated properly in R+RBI. What is missing, of course, is the “middle advancement” credit which is a part of many non-HR runs. B James intuitively understood this, but (surprisingly for him) failed to articulate it very well.
So, IMO, if you want RC, then use RC. If you want what RP represents (for whatever reason), then use R+RBI, while accepting that around 15% (or whatever) of the credit for actual runs is missing. If you want to fix that, then get the PBP and work it out.
But don’t use R+RBI-HR, which is a poorly conceptualized and incomplete “hybrid” stat, sort of like “kissing your sister”…
I disagree with David, and possibly agree.
Disagreement viewpoint:
Run Creation is “getting on”, “moving over”, and “inning killer”. The moving over value of the HR and triple is exactly the same (around .40 runs). From an RBI perspective, the triple is .60 runs and the HR is 1.60 runs. From an RDI perspective, they are both .60 runs. For reasons already articulated, R+RDI, like Goals+Assists, fits the bill.
Agreement viewpoint:
If (R+RBI)/2 is the true measure, as opposed to R+RBI, then the run value of the HR comes in at 1.30, but the run value of the walk is now .15, the single is now .23, the run value of the double is .42, etc. All too low. The missing part, as David is describing it, is those walks and non-HR hits that move runners over, but don’t actually get a R or RBI out of it. To accept that however, means that there is .18 runs missing from the walk, .24 missing from the single, .35 missing from the double, and .4x missing from the triple. I don’t see how that makes any sense, especially when you look at the walk.
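The arithmetic in the agreement viewpoint can be checked quickly. This is only a sketch in Python using the halved values and the "missing" amounts quoted above; the names are made up for illustration, and the triple is left out because its missing amount is only given as .4x.

```python
# Run values under (R+RBI)/2, as quoted in the comment above.
halved = {"BB": 0.15, "1B": 0.23, "2B": 0.42}
# Credit said to be missing from each event under that measure.
missing = {"BB": 0.18, "1B": 0.24, "2B": 0.35}


def implied_full_value(event):
    """Halved value plus the missing part: the run value being aimed at."""
    return round(halved[event] + missing[event], 2)


def missing_share(event):
    """Fraction of the implied full value that earns no R or RBI."""
    return round(missing[event] / (halved[event] + missing[event]), 2)


for event in halved:
    print(event, implied_full_value(event), missing_share(event))
```

On these numbers, more than half of the walk's value would have to come from advancement that never shows up as an R or RBI, which is the part that is hard to accept.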
To me, the RDI captures, fairly well, the “moving over” part. It gives too much credit to the extra base hits, which is why the inning killer factor (outs * .14) is so high, rather than the proper outs * .10.
In fact, if we simply leave it as: R+RBI-HR-outs/10-adjustment, this makes it clearer. The question then is how to best handle the adjustment part. You can say knock out .07 for each double, and .20 for each triple and HR, and add .03 for each walk (or some such), so that it properly balances to LWTS.
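As a rough sketch, the R+RBI-HR-outs/10-adjustment form could be written as below. The adjustment weights are the "or some such" values floated above, not a calibrated set, and the stat line in the example is invented purely for illustration.

```python
def adjusted_rp(r, rbi, hr, outs, doubles, triples, walks):
    """R + RBI - HR - outs/10 - adjustment.

    The adjustment knocks out .07 per double and .20 per triple and HR,
    and adds back .03 per walk, as loosely suggested in the text.
    """
    adjustment = 0.07 * doubles + 0.20 * (triples + hr) - 0.03 * walks
    return r + rbi - hr - outs / 10 - adjustment


# Hypothetical season line, not a real player.
example = adjusted_rp(r=100, rbi=100, hr=30, outs=400,
                      doubles=30, triples=5, walks=60)
print(round(example, 1))
```

The weights would still need to be tuned so that the totals balance to LWTS.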
Nonetheless, R+RBI-HR is the basis I stick by.
I’ve long been with Tango on this one. Runs Produced is a good stat. In a better world, instead of BA HR RBI we’d have wOBA RP.
The main issue I have is with the assumption that a run scored and an RBI have equal value in the Runs Produced equation (R+RBI-HR). Intuitively I feel this is not the case, which probably means it is, but...
So I may just be exposing my ignorance here, but why not compare Runs Produced with Run Value relative to the out? If we are looking at Total Runs Produced not Net Runs produced, this seems more sensible.
Using your numbers (1.7 HR, .77 1B, 1.08 2B, 1.37 3B, 0.62 NIBB) from The Book (pg 28) and the MLB data from the Retrosheet 2006 link:
Bat Order   R+RBI   R+RBI-HR   R+RBI+HR        RV   RV vs R+RBI+HR
1st          2504       2294       2714   3067.46           13.02%
2nd          2600       2389       2811   2964.22            5.45%
3rd          2850       2477       3223   3118.21           -3.25%
4th          3038       2593       3483   3173.04           -8.90%
5th          2662       2299       3025   3000.59           -0.81%
6th          2354       2025       2683    2662.1           -0.78%
7th          2121       1844       2398   2550.76            6.37%
8th          2024       1837       2211   2357.73            6.64%
9th          1852       1701       2003   2145.16            7.10%
Totals      22005      19459      24551     25039            1.99%
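The percentage column in the table is not labeled in the original, but it appears to be the amount by which RV exceeds R+RBI+HR. A small Python check under that assumption, with the figures copied from the table (the function name is made up for illustration):

```python
# (batting order slot, R+RBI+HR, RV) copied from the table above.
rows = [
    ("1st", 2714, 3067.46),
    ("2nd", 2811, 2964.22),
    ("3rd", 3223, 3118.21),
    ("4th", 3483, 3173.04),
    ("5th", 3025, 3000.59),
    ("6th", 2683, 2662.1),
    ("7th", 2398, 2550.76),
    ("8th", 2211, 2357.73),
    ("9th", 2003, 2145.16),
    ("Totals", 24551, 25039),
]


def pct_diff(rp_plus_hr, rv):
    """Percent by which RV exceeds (or falls short of) R+RBI+HR."""
    return round((rv / rp_plus_hr - 1) * 100, 2)


for slot, rp_plus_hr, rv in rows:
    print(slot, pct_diff(rp_plus_hr, rv))
```

Read this way, R+RBI+HR overshoots RV only for the 3rd through 6th slots and undershoots it everywhere else, most of all for the leadoff hitter.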
Maybe Bill James was right and HR should be added and not subtracted, or some other factor should be applied to the RBI to give it more weighting relative to the run.
The added value of a HR relative to other hits or walks may be that it guarantees one or more runs every time it is hit, while other events do not have this certainty.
Tango, I don’t think you got my point. I am not disputing that R+RBI-HR-outs/10 (or some similar construction) best mirrors “runs created”. But RP is working from the top down, starting with actual runs. So, RP is not concerned with giving credit for a leadoff triple who ends up stranded. RP is not concerned with a batter who struck out, thus lessening the chance that an existing runner will score. RP is only interested in a “subset” of overall theoretical “run creation”: who produced the 4 base advances that directly led to each actual run. It’s similar to that stat I posted some years ago called “absolute wins produced”.
Paul:
Using the Google Docs file I posted in the article, and removing those entries with fewer than 10 players in them, I ran a regression of R, RBI, and HR against RC. This is what I get:
correlation: r=.999
RC = 1.45*R + 1.03*RBI - 1.30*HR - 88
The standard error for the HR is .17 runs, meaning we’re pretty sure it’s a big negative.
Keeping all the records (including those with only 1 occurrence), I get:
r=.993
RC = 1.34*R + 1.29*RBI - 1.68*HR - 93
I see nothing here to alter our best guess, which is somewhere around R+RBI-HR.
However, your point that R and RDI (i.e., RBI-HR) need not be equally weighted deserves further scrutiny.
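To see what the two fitted equations imply, here is a minimal Python sketch that applies them. The coefficients are the ones quoted above; the function names and the example line are made up for illustration.

```python
def rc_trimmed_fit(r, rbi, hr):
    """Fit on the entries with at least 10 players (r=.999)."""
    return 1.45 * r + 1.03 * rbi - 1.30 * hr - 88


def rc_all_records_fit(r, rbi, hr):
    """Fit on all records, including single occurrences (r=.993)."""
    return 1.34 * r + 1.29 * rbi - 1.68 * hr - 93


def solo_hr_increment(fit):
    """Change in estimated RC from one more solo HR (+1 R, +1 RBI, +1 HR)."""
    return fit(1, 1, 1) - fit(0, 0, 0)


# Hypothetical line, not taken from the Google Docs file.
line = dict(r=100, rbi=100, hr=30)
print(round(rc_trimmed_fit(**line), 1), round(rc_all_records_fit(**line), 1))
print(round(solo_hr_increment(rc_trimmed_fit), 2),
      round(solo_hr_increment(rc_all_records_fit), 2))
```

Under either fit, one more solo HR nets out to roughly one run, which is in line with counting it once, as R+RBI-HR does.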
***
David: agreed that RP is concerned with who was involved in (not “produced”) the 4 bases that led to the actual run. For a solo HR, the 4 bases go to the batter. For an RBI-generating single where the batter also scores, the single would get, say, 2 bases for advancing the runner, plus 4 bases for the batter himself scoring, for a total of 6 bases. (Same would apply for a double, triple, or HR.)
Divide the bases by 4 to get “runs”, and we see the HR will come in at around 1.50 runs. This in effect becomes the “second process” discussed by FC Lane last century.
Mike Humphreys quote: “Runs Produced is a good stat.”
No stat is inherently good or bad; it is only relatively good or bad at answering a specifically defined question. No one has defined the question that Runs Produced is trying to answer. Is it projecting a player’s future value or describing his past production? Is value being defined as a player’s “true talent” value or his ability to help his team win games? When trying to project a player’s value or estimate his “true talent” value, an analyst usually tries to neutralize the context in which that player created his accomplishments. Runs Produced makes no attempt at neutralizing context. There may be some question that Runs Produced is good at answering, but until that question is defined, the hypothesis that Runs Produced is good or bad at answering it can’t be tested.
Tango, if you get a chance, can you run your regression using only R and RBI terms?
r=.966
RC = 2.00 * R + .06 * RBI - 82
The standard error of each of the R and RBI terms is between .10 and .20, making the RBI term effectively meaningless.
If I just use Runs:
r=.966
RC = 2.05*R - 82
If I just use RBI:
r=.676
RC = 1.02*RBI + 3
***
I think the reason is rather clear. The HR gets an average of 1.6 RBI per HR hit, while a walk gets almost 0 RBI per walk. This bears no resemblance to actual production.
If you take 2 * R, you get a run value of .52 for the single and walk, .86 for a double, 1.22 for a triple, and 2.00 for a HR. While also not good, it’s a much better fit than RBI.
***
Peter: however you want to define Goals and Assists in the NHL, I’m saying Runs and Runners Driven In is the equivalent in MLB. G+A = Points, R+RDI=RP. Whatever “Points” means, “RP” means the equivalent.
I’m also not saying that RP is good or not good. My sole point is that R+RBI-HR makes a lot more sense than R+RBI.
For those who think that R+RBI divided by 2 would alleviate some of the problems:
Go back to the Google Docs in my article, and concentrate on two lines:
- the one with 56 players (avgWOBA of .273, avg HR of 13.1) and the one with 142 players (avgWOBA of .379, avg HR of 15.8)
These two average players have roughly the same number of HR per 600 PA, but are wildly different in their production (.273 v .379). Their RC are 44 and 100 respectively (a difference of 56 runs).
Their RP are 107 and 150, a difference of 43 runs. Their adjusted RP have a difference of 53 runs.
The difference in their R+RBI divided by 2 is: 23.
***
Try two other guys with the same number of HR:
.312 wOBA, 23.3 HR, 65 RC, 125 RP, 65 adjRP
.424 wOBA, 25.9 HR, 123 RC, 170 RP, 121 adjRP
As you can see, a difference of 58 RC, 45 RP, and 56 adjRP.
The difference in their R+RBI divided by 2 is 24 runs.
As you can see, dividing by 2 doesn’t work.
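The (R+RBI)/2 gaps can be reproduced from the figures quoted above. This Python sketch assumes that RP in these lines already has HR subtracted (RP = R+RBI-HR), which is the reading under which the quoted 23 and 24 come out; the function names are made up for illustration.

```python
def half_r_plus_rbi(rp, hr):
    """(R+RBI)/2, assuming rp = R + RBI - HR so that R + RBI = rp + hr."""
    return (rp + hr) / 2


def gap(low, high):
    """Difference in (R+RBI)/2 between two (RP, HR) lines, to the nearest run."""
    return round(half_r_plus_rbi(*high) - half_r_plus_rbi(*low))


# (RP, HR) for each line, as quoted above.
first_pair = gap((107, 13.1), (150, 15.8))   # .273 vs .379 wOBA, RC gap of 56
second_pair = gap((125, 23.3), (170, 25.9))  # .312 vs .424 wOBA, RC gap of 58
print(first_pair, second_pair)
```

Both gaps come out at less than half of the corresponding RC gap, while the adjusted RP gaps (53 and 56) land within a few runs of it.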
***
It’s not R+RBI, it’s not (R+RBI)/2. It’s R+RBI-HR, or better R+RBI-HR-outs/7.
In the ‘new’ RC, B James reconciles each player to the team seasonal total. But as we know, run creation really exists on the inning level. So, if each player’s RC were figured for each inning, and reconciled to the team’s actual runs scored in the inning, which would correlate better, Runs Produced or R+RBI?
I was trying to use Runs Produced (RP) in an accurate way to rank hitters. I came up with:
.92*RP/PA + .46*OBA = wOBA
Using Tango’s RP component values (.29, .47, .84, 1.21, 1.60), here are the +1 values from my formula, alongside Tango’s actual wOBA values:
+1 value, actual wOBA
BB, .73, .72
1B, .89, .90
2b, 1.23, 1.24
3B, 1.57, 1.56
HR, 1.93, 1.95
Cool.
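The +1 values in the table above can be reproduced with a few lines of Python. This sketch assumes each event adds its RP component value to RP and one time on base to OBA, and it ignores the extra PA in the denominator; that is a guess at how the posted numbers were derived, and the names are made up for illustration.

```python
# Tango's RP component values and actual wOBA values, as posted above.
rp_values = {"BB": 0.29, "1B": 0.47, "2B": 0.84, "3B": 1.21, "HR": 1.60}
posted_woba = {"BB": 0.72, "1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95}


def plus_one_value(event):
    """.92 * (RP added by the event) + .46 * (one time on base)."""
    return round(0.92 * rp_values[event] + 0.46, 2)


for event in rp_values:
    estimate = plus_one_value(event)
    print(event, estimate, round(abs(estimate - posted_woba[event]), 2))
```

Every event lands within .02 of the posted wOBA value.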
Taking career totals of all players born since 1931 (Mays, Mantle were born in 31) with at least 1000 PA (1746 players), I get an r=.97 for this equation:
wOBA = .52*RPI/PA + .66*OBP
That’s not to say that this is the best equation. And in fact what David has done is likely better, since he’s trying to construct something logically.
I also didn’t control for changing run environments. However, when I look at 1993-2007 careers, I get .53 and .65 as my coefficients.
I think I made a silly calculation error. I’ll be back later.
Yep. My advice is not to do this stuff in 5 minute breaks at work.
Doing the +1 method correctly, I get:
(RP/PA + OBA) *.61
Much closer to Tango’s regression. I think the simplicity of this version might make it preferable.
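For a side-by-side look, the simple version and the regression can be put next to each other. A minimal Python sketch, assuming "RPI/PA" in the regression means Runs Produced per PA and that OBA and OBP are the same quantity; the input rates are invented for illustration.

```python
def woba_simple(rp_per_pa, oba):
    """The even-weight version: (RP/PA + OBA) * .61."""
    return (rp_per_pa + oba) * 0.61


def woba_regression(rp_per_pa, obp):
    """The career regression: .52 * RP/PA + .66 * OBP."""
    return 0.52 * rp_per_pa + 0.66 * obp


# Hypothetical hitter: .250 RP per PA and a .340 on-base average.
simple = woba_simple(0.25, 0.340)
regression = woba_regression(0.25, 0.340)
print(round(simple, 3), round(regression, 3))
```

For a line like this the two estimates differ by only a few points of wOBA, which is the case for preferring the simpler weights.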
Agreed. I had them mostly even, so keeping them as even is fine with me.
I should have checked doing simply RPI/out instead. Might not even need OBP maybe.
But then the 2 components have different denominators. Since OBA is a classic stat, and uses PA, and we have wOBA as an accurate PA-based stat, it’s better to stick with RP/PA and weight accordingly.
And no, using RP/out will not stand on its own.
A couple points:
1) This stat even does well with the IBB, since it is an on-base event, while having fewer RP than a regular BB.
2) There should probably be a CS element, since RP includes the effect of SB. Personally, I’m fine without it, letting the absence of CS stand in for baserunning and avoiding GDP.