THE BOOK--Playing The Percentages In Baseball


Wednesday, February 07, 2007

Why does 1.7*OBP+SLG make sense?

By Tangotiger, 11:39 AM

This is a step-by-step explanation of why you should use some form of modified OPS rather than plain OPS.  If someone ever talks to you about OPS and how it should be weighted, send them here.


Start with a standard hitting line based on 600 PA:
AB: 540
H: 145
2B: 30
3B: 3
HR: 17
BB: 50
K: 100
HBP: 5
SF: 5

This produces a BA/OBP/SLG line of: .269/.333/.430, which is pretty typical these days.

Now, what happens if we add 1 hit and 1 AB to this batting line (i.e., a single)?  The OBP and SLG numbers go up by .0011 each.

And if we instead add 1 hit, 1 double, and 1 AB to this batting line (i.e., a double)?  The OBP and SLG numbers go up by .0011 and .0029, respectively.

A triple causes this: .0011, .0048
A HR causes this: .0011, .0066
A walk or hit batter causes this: .0011, .0000
An out causes: -.0006, -.0008
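The plus-one arithmetic above is easy to reproduce. Below is a minimal Python sketch. It assumes the standard definitions OBP = (H + BB + HBP) / (AB + BB + HBP + SF) and SLG = TB / AB. The helper names `slash_line` and `plus_one_deltas` are illustrative and do not come from the post.

```python
# Standard 600-PA batting line from the post.
BASE = {"AB": 540, "H": 145, "2B": 30, "3B": 3, "HR": 17,
        "BB": 50, "HBP": 5, "SF": 5}

# What one extra event adds to the line (every hit also adds an at-bat).
# "BB" stands in for a walk or a hit batter: both add 1 to OBP's numerator
# and denominator and leave SLG alone.
EVENTS = {
    "1B": {"AB": 1, "H": 1},
    "2B": {"AB": 1, "H": 1, "2B": 1},
    "3B": {"AB": 1, "H": 1, "3B": 1},
    "HR": {"AB": 1, "H": 1, "HR": 1},
    "BB": {"BB": 1},
    "out": {"AB": 1},
}


def slash_line(line):
    """Return (BA, OBP, SLG) for a batting line stored as a dict."""
    pa = line["AB"] + line["BB"] + line["HBP"] + line["SF"]
    obp = (line["H"] + line["BB"] + line["HBP"]) / pa
    # Total bases: each hit counts once, plus the extra bases of 2B/3B/HR.
    tb = line["H"] + line["2B"] + 2 * line["3B"] + 3 * line["HR"]
    return line["H"] / line["AB"], obp, tb / line["AB"]


def plus_one_deltas(base):
    """Change in (OBP, SLG) from adding one event of each type to `base`."""
    _, obp0, slg0 = slash_line(base)
    deltas = {}
    for event, extra in EVENTS.items():
        line = dict(base)
        for key, n in extra.items():
            line[key] += n
        _, obp1, slg1 = slash_line(line)
        deltas[event] = (obp1 - obp0, slg1 - slg0)
    return deltas


print("BA/OBP/SLG: %.3f/%.3f/%.3f" % slash_line(BASE))
for event, (d_obp, d_slg) in plus_one_deltas(BASE).items():
    print(f"{event}: {d_obp:+.4f}, {d_slg:+.4f}")
```

Rounded to four places, the printed deltas should match the list above. An out is modeled as one extra at-bat with no hit.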

Now, we know the run value of each event, courtesy of this:
http://www.tangotiger.net/RE9902event.html

And those numbers are:
Event LWTS
1B: 0.474
2B: 0.764
3B: 1.063
HR: 1.409
BB: 0.336
out: -0.302

Now, all we have to do is run a regression of the change in OBP and SLG numbers against the LWTS numbers.  And what do we get?  An r=.9993 (almost perfect), with this equation:
283 * OBP + 162 * SLG
which is the same thing as:
(1.75*OBP + SLG) * 162

And, if we apply this equation to this table:
Event OBP, SLG
1B: 0.0011, 0.0011
2B: 0.0011, 0.0029
3B: 0.0011, 0.0048
HR: 0.0011, 0.0066
BB: 0.0011, 0.0000
out: -0.0006, -0.0008

What do we get?
Event RegressedRunValue
1B: 0.485
2B: 0.785
3B: 1.084
HR: 1.384
BB: 0.314
out: -0.286

If we remove the triple from the analysis (because there are so few of them), we get this, with an r=.9995:
(1.73*OBP + SLG) * 163

And if we apply this to the OBP and SLG differentials, we get this:
Event Actual Regressed
1B: 0.474 0.485
2B: 0.764 0.786
3B: 1.063 1.087
HR: 1.409 1.389
BB: 0.336 0.313
out: -0.302 -0.286

As you can see, our equation of 1.73*OBP+SLG is fairly close to the actual LWTS run values. 
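The zero-intercept regression can be sketched in plain Python by solving the 2x2 normal equations directly. This is an illustrative reconstruction, not Tango's own script. It assumes the unrounded OBP/SLG deltas listed in comment #13 and the 1999-2002 LWTS values from the table above. `fit_through_origin` and `apply_equation` are hypothetical helper names.

The fitted coefficients should land close to the ones quoted above: about 283 and 162 with the triple, and about 1.73 and 163 without it. Small differences from rounding are expected.

```python
# Unrounded plus-one (OBP, SLG) deltas for the 600-PA line, from comment #13.
DELTAS = {
    "1B": (0.0011093, 0.0010543),
    "2B": (0.0011093, 0.0029027),
    "3B": (0.0011093, 0.0047511),
    "HR": (0.0011093, 0.0065996),
    "BB": (0.0011093, 0.0),
    "out": (-0.0005546, -0.0007941),
}
# 1999-2002 linear-weights run values from the post.
LWTS = {"1B": 0.474, "2B": 0.764, "3B": 1.063, "HR": 1.409, "BB": 0.336, "out": -0.302}


def fit_through_origin(events):
    """Least squares for LWTS = x1*dOBP + x2*dSLG with the intercept forced to zero."""
    saa = sum(DELTAS[e][0] ** 2 for e in events)
    sab = sum(DELTAS[e][0] * DELTAS[e][1] for e in events)
    sbb = sum(DELTAS[e][1] ** 2 for e in events)
    say = sum(DELTAS[e][0] * LWTS[e] for e in events)
    sby = sum(DELTAS[e][1] * LWTS[e] for e in events)
    det = saa * sbb - sab ** 2  # 2x2 normal equations, solved by Cramer's rule
    return (say * sbb - sab * sby) / det, (saa * sby - sab * say) / det


def apply_equation(obp_weight, scale):
    """Run value each event gets from (obp_weight*OBP + SLG) * scale."""
    return {e: (obp_weight * a + b) * scale for e, (a, b) in DELTAS.items()}


for label, events in (("all events", list(DELTAS)),
                      ("no triple", [e for e in DELTAS if e != "3B"])):
    x1, x2 = fit_through_origin(events)
    print(f"{label}: {x1:.1f}*OBP + {x2:.1f}*SLG = ({x1 / x2:.2f}*OBP + SLG) * {x2:.1f}")

for event, value in apply_equation(1.73, 163).items():
    print(f"{event}: actual {LWTS[event]:+.3f}, regressed {value:+.3f}")
```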

***

Now, if we take this equation:
(1.73*OBP + SLG) * 163

You may be wondering about that “163”.  Remember that the OBP and SLG differentials I presented were based on 600 PA.  Had I used 6000 PA, the multiplier would have been 1630.  In short, the multiplier is 0.27 * PA.  So, the real equation is this:
Runs: (1.73*OBP + SLG) * 0.27 * PA

To convert from runs to wins, divide by something around 10 to 11 (that’s your runs-to-wins converter).

Wins: (1.73*OBP + SLG) * 0.025 * PA
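To see why the multiplier is proportional to PA, scale the same batting line up tenfold. Each plus-one delta shrinks by roughly a factor of ten, so the multiplier must grow by the same factor for the single to keep its run value.

The sketch below is a minimal illustration. It assumes the 600-PA line above, the 1.73 and 0.27 coefficients, and a runs-to-wins divisor of 10.8. That divisor is an assumption picked from the "10 to 11" range, not a value from the post. `single_value` is a hypothetical helper.

```python
RUNS_PER_WIN = 10.8  # assumed runs-to-wins converter; the post says "around 10 to 11"


def single_value(scale, obp_weight=1.73, per_pa=0.27):
    """Marginal runs for one added single on the 600-PA line multiplied by `scale`."""
    ab, h, bb, hbp, sf, tb = (n * scale for n in (540, 145, 50, 5, 5, 232))
    pa = ab + bb + hbp + sf
    d_obp = (h + 1 + bb + hbp) / (pa + 1) - (h + bb + hbp) / pa
    d_slg = (tb + 1) / (ab + 1) - tb / ab
    # The deltas shrink roughly in proportion to 1/PA, so multiplying by PA
    # keeps the single's value about the same at any sample size.
    return (obp_weight * d_obp + d_slg) * per_pa * pa


for scale in (1, 10):
    runs = single_value(scale)
    print(f"{600 * scale} PA: single = {runs:.3f} runs = {runs / RUNS_PER_WIN:.4f} wins")
print(f"wins multiplier per PA: {0.27 / RUNS_PER_WIN:.3f}")
```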

To compare to league average, start with the league figure:
lgOPS: 1.73*lgOBP + lgSLG, which, as luck would have it, is very close to 1.0 (it’s actually 1.014).  For simplicity’s sake, if we instead use
1.69*OBP+SLG, the league figure comes out to exactly 1 for the 2006 season.

So, to compare to league average, you can have this version of OPS:

Wins above average = (1.69*OBP+SLG - 1) * .025 * PA

A hitter with a .430 OBP, a .670 SLG, and 600 PA would give us:
Wins above average
= (1.69*.430+.670 - 1) * .025 * 600
= +6.0
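The wins-above-average shortcut is simple enough to wrap in a function. This is a minimal sketch. `wins_above_average` is a hypothetical name, and the defaults are the 1.69 and .025 values from the post.

```python
def wins_above_average(obp, slg, pa, obp_weight=1.69, wins_per_pa=0.025):
    """Wins above average from the weighted-OPS shortcut in the post.

    obp_weight=1.69 is the value that makes obp_weight*lgOBP + lgSLG equal 1
    for 2006, so a league-average hitter comes out at zero.
    """
    return (obp_weight * obp + slg - 1) * wins_per_pa * pa


print(f"{wins_above_average(0.430, 0.670, 600):+.1f}")  # prints +6.0
```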

#1    Peter Jensen    2007/02/07 (Wed) @ 12:22

Great analysis, except I don’t understand why you would use RE values from 1999-2002 and then normalize to 2006 league data.  Wouldn’t it make more sense to use RE values for 2004-2006 and then normalize to 2004-2006 league data?


#2    Tangotiger    2007/02/07 (Wed) @ 12:48

Sure, if I had them!  I probably should have used the 99-02 league data, now that you mention it.  It’s probably around the same, though, since 99-02 was 5.0 RPG and 2006 was 4.9 RPG.

The important point is that the framework is established.  Now, anyone can use the more relevant data for their own implementation.


#3    Guy    2007/02/07 (Wed) @ 13:44

Tango:  Using your formula of R =(1.73*OBP + SLG) * 0.27 * PA, I get a typical team scoring about 10.5 R/G and a typical hitter creating 163 R per 600 PA.  I think you’ve got a problem here.


#4    Peter Jensen    2007/02/07 (Wed) @ 14:09

Linear Weights 2004-2006:

1B: .465
2B: .775
3B: 1.056
HR: 1.396
BB: .319
OUT: -.292
NonIntBB: .333


#5    tangotiger    2007/02/07 (Wed) @ 14:40

Guy, it’s a marginal formula, meaning that it measures the change in OBP and change in SLG against the change in RE (i.e., LWTS).

So, it’s not “Runs equals all that”; it’s the change in runs.


#6    tangotiger    2007/02/07 (Wed) @ 14:45

Peter, thanks!

A regression of your numbers (minus triple) against my stat line in the blog entry shows this:
r=.9997
(1.73*OBP+SLG) * 163

In short, the same equation.


#7    2007/02/07 (Wed) @ 14:52

Shouldn’t the regression weight the individual events by the proportion in which they actually occur? 

That would change the results a small but significant amount—mostly because of walks, I think.


#8    tangotiger    2007/02/07 (Wed) @ 14:59

Actually, it should be weighted by their variance among players.  If, for example, 95% of players fall between 100 and 110 singles per 600 PA, while 95% of players fall between 10 and 30 HR per 600 PA, it’s the HR that gets the heavier weight, not the singles.

In any case, you are definitely right.  I took the lazy way out.


#9    2007/02/07 (Wed) @ 15:05

Hmmm, let me think about that!  Never thought about the variance among players ...


#10    tangotiger    2007/02/07 (Wed) @ 15:06

Among players with at least 300 PA since 1995, here are the standard deviations, per 600 PA:

1B: 17.5
2B: 7.2
3B: 2.8
HR: 10.6
BB: 22.0
outs: 23.2

So, those are the weights that need to be applied in the regression.


#11    tangotiger    2007/02/07 (Wed) @ 15:22

Weighting the regression based on my post #10, and using the LWTS values of Peter in post #4, the best-fit gives me an r=.9996, with this equation:
1.77*OBP+SLG

***

This gives me an average error in the run value of .014.  For example, 1.77*OBP+SLG implies a run value of .486 for the single, as opposed to Peter’s LWTS value of .465, an error of .021.  I did this for all the events and weighted the absolute errors by my weights in post #10; that gives the average error of .014.

Using the equation from my original blog entry, the average error is also .014 runs.

If I use RMSE, I get .016 for both equations.

In short, we’re really in the tweaking stage, and the coefficient for OBP relative to SLG will be somewhere in the 1.7 to 1.8 range, depending on your dataset.
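A weighted version of the zero-intercept fit is sketched below. It is a reconstruction under stated assumptions, not Tango's actual calculation. The assumptions are:

- The run values are Peter's 2004-2006 LWTS from comment #4, using the non-intentional walk value (.333) for BB, as in the table in comment #13.
- Each event's squared residual is weighted by its standard deviation from comment #10. The thread does not spell out the exact weighting scheme, so this choice is an assumption.
- The reported error is the weighted mean absolute error.

Under these assumptions the fit should come out near the 1.77 and .014 quoted above. `weighted_fit` and `weighted_mean_abs_error` are hypothetical helper names.

```python
# Unrounded plus-one (OBP, SLG) deltas for the 600-PA line (comment #13).
DELTAS = {
    "1B": (0.0011093, 0.0010543),
    "2B": (0.0011093, 0.0029027),
    "3B": (0.0011093, 0.0047511),
    "HR": (0.0011093, 0.0065996),
    "BB": (0.0011093, 0.0),
    "out": (-0.0005546, -0.0007941),
}
# Peter's 2004-2006 LWTS (comment #4), with the non-intentional walk value for BB.
LWTS = {"1B": 0.465, "2B": 0.775, "3B": 1.056, "HR": 1.396, "BB": 0.333, "out": -0.292}
# Standard deviations per 600 PA from comment #10, used as regression weights.
WEIGHTS = {"1B": 17.5, "2B": 7.2, "3B": 2.8, "HR": 10.6, "BB": 22.0, "out": 23.2}


def weighted_fit(weights):
    """Weighted least squares for LWTS = x1*dOBP + x2*dSLG with no intercept."""
    def wsum(term):
        return sum(weights[e] * term(a, b, LWTS[e]) for e, (a, b) in DELTAS.items())

    saa = wsum(lambda a, b, y: a * a)
    sab = wsum(lambda a, b, y: a * b)
    sbb = wsum(lambda a, b, y: b * b)
    say = wsum(lambda a, b, y: a * y)
    sby = wsum(lambda a, b, y: b * y)
    det = saa * sbb - sab ** 2
    return (say * sbb - sab * sby) / det, (saa * sby - sab * say) / det


def weighted_mean_abs_error(x1, x2, weights):
    """Average |fitted - LWTS| across events, weighted the same way as the fit."""
    total = sum(weights[e] * abs(x1 * a + x2 * b - LWTS[e]) for e, (a, b) in DELTAS.items())
    return total / sum(weights.values())


x1, x2 = weighted_fit(WEIGHTS)
print(f"({x1 / x2:.2f}*OBP + SLG) * {x2:.0f}")
print(f"weighted mean absolute error: {weighted_mean_abs_error(x1, x2, WEIGHTS):.3f} runs")
```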


#12    Cyril Morong    2007/02/07 (Wed) @ 15:42

When you say “all we have to do is run a regression of the change in OBP and SLG numbers against the LWTS numbers” I am not sure what the independent variable is and what the dependent variables are. My guess is the following (what I would enter into Excel before running the regression)

LWTS-OBP Change-SLG Change
0.474 0.0011 0.0011
0.764 0.0011 0.0029
1.063 0.0011 0.0048
1.409 0.0011 0.0066
0.336 0.0011 0
-0.302 -0.0006 -0.008

If it is not those values, then what values did you use in the regression? You probably have different numbers in your regression, since my regression results are a lot different.


#13    tangotiger    2007/02/07 (Wed) @ 16:02

Maybe because of rounding.  Try this:
Event OBPdelta SLGdelta LWTSpeter
1B 0.0011093 0.0010543 0.465
2B 0.0011093 0.0029027 0.775
3B 0.0011093 0.0047511 1.056
HR 0.0011093 0.0065996 1.396
BB 0.0011093 0.0000000 0.333
out -0.0005546 -0.0007941 -0.292

Force the intercept to zero.  The equation you are solving for is:
LWTS = x1 * OBP + x2 * SLG

x2 will come in around 160-163 and x1 will be around 1.7something times x2.


#14    Cyril Morong    2007/02/07 (Wed) @ 16:10

Okay, thanks, I will try that


#15    Cyril Morong    2007/02/07 (Wed) @ 16:14

That worked


#16    2007/02/07 (Wed) @ 19:44

It does make sense for predicting team runs in today’s era.

Unfortunately, there are those who will use it to compare a leadoff hitter in 2006 with a cleanup hitter in 1968, or even in 2006, and I do not believe this to be accurate.

For player comparisons, it seems we would need a formula based on lineup position and era.


#17    tangotiger    2007/02/08 (Thu) @ 00:45

Tom Ruane published year-by-year league-by-league LWTS.  Do a search for his name on this site.

In The Book, I’ve published LWTS by batting order for 99-02. 

You can use these two as the basis to get what you want.

I definitely don’t condone any form of OPS.  I should have made that clear.  But, if you are going to use OBP and SLG, one should combine them in an intelligent fashion.


#18    tangotiger    2007/02/08 (Thu) @ 11:12

Silly me.  I didn’t even bother running a regression that included batting average.  Here’s what we get:
r=.9999 (highest yet)

(1.8*OBP + SLG - 0.2*BA) * 0.28 * PA

That’s right.  The batting average is a negative.
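Adding batting average amounts to a third column in the same zero-intercept regression. The sketch below is a reconstruction under stated assumptions, not the original calculation:

- The run values are Peter's 2004-2006 figures as listed in comment #13.
- The triple is dropped.
- The BA deltas, which the thread does not give, are computed from the 600-PA line: 146/541 - 145/540 for a hit, and 145/541 - 145/540 for an out.

With those inputs, the fit should come out close to (1.8*OBP + SLG - 0.2*BA) * 0.28 * PA, including the negative BA coefficient. `solve` and `fit_through_origin` are hypothetical helpers.

```python
# Unrounded plus-one deltas for the 600-PA line, as (dOBP, dSLG, dBA).
# OBP/SLG deltas are from comment #13; BA deltas are computed from the same line.
ROWS = {
    "1B": (0.0011093, 0.0010543, 0.0013521),
    "2B": (0.0011093, 0.0029027, 0.0013521),
    "HR": (0.0011093, 0.0065996, 0.0013521),
    "BB": (0.0011093, 0.0, 0.0),
    "out": (-0.0005546, -0.0007941, -0.0004963),
}
# Assumed run values: Peter's 2004-2006 LWTS as listed in comment #13 (no triple).
LWTS = {"1B": 0.465, "2B": 0.775, "HR": 1.396, "BB": 0.333, "out": -0.292}


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_through_origin(rows, y):
    """Zero-intercept least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


events = list(ROWS)
x_obp, x_slg, x_ba = fit_through_origin([ROWS[e] for e in events], [LWTS[e] for e in events])
print(f"({x_obp / x_slg:.2f}*OBP + SLG {x_ba / x_slg:+.2f}*BA) * {x_slg / 600:.2f} * PA")
```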

This was discussed in the second article here, as well as in the comments section (which I’ve given a direct link to):
http://www.tangotiger.net/archives/artOPS1.shtml
http://www.tangotiger.net/archives/artOPS2.shtml#1005

In short, the batting average being a plus or minus depends on the run environment.


#19    Trader Joe    2007/02/08 (Thu) @ 16:35

Isn’t this negative sign due not so much to something “contextual” as to collinearity among your predictor variables?  What are all the intercorrelations among BA, SLG, OBP, and LWTS?  Simple correlations would all be positive between each pair of variables, right? But when you have three highly collinear predictors in the same multiple regression equation, it’s not unusual at all for there to be a “sign flip” such as what you observe.


#20    tangotiger    2007/02/08 (Thu) @ 16:51

You are generally right.

But in this particular case, you should read the second link.  It’s context.


#21    Guy    2007/02/09 (Fri) @ 11:38

Tango, 2 questions/issues:

How do you reconcile 1.7*OBP with the accuracy of SLG*OBP (*.87*PA)?  If SLG*OBP works, that would imply a point of OBP is worth about 1.4X a point of SLG.

Also, your regression coefficient for OBP is determined entirely by the out and BB values, so it is very sensitive to those values.  Ruane’s data seem to suggest something closer to .31 for a BB and -.26 for an out, and those values would give you a smaller coefficient of 1.5-1.6 for OBP.  More fundamentally, I wonder if it’s right to think about the marginal value of an out as being -.29 (or -.26) in this context.  If your hitting line were a team, it would score about 75 runs in approximately 15 games.  If you add 27 outs to the line, would the team score 8 fewer runs?  That doesn’t seem right, but maybe I’m thinking about it wrong.


#22    Tangotiger    2007/02/09 (Fri) @ 15:00

SLG*OBP “works” under limited conditions.  I wouldn’t try to reconcile it against anything.  BaseRuns is the one to bring out here.

***

The run value of each component is completely dependent on the run environment.  This is what the Ruane data look like when grouped by run environment:
http://www.insidethebook.com/ee/index.php/site/comments/linear_weights_by_run_environment/

If you look at the second-to-last line, which is 5 runs per game, the run value of the walk (including IBB) is .324.  The run value of a hit batter is typically about 10-15% higher than that of a non-IBB walk.  With 10% of the HBP+BB being HBP, the overall average of BB+HBP is .33.

***

As for your question about the outs, simply trot out the markov calculator:
http://www.tangotiger.net/markov.html

If you multiply all those numbers by 10, you get 4.96 RPG, or 49.6 total runs in 10 games.

If you multiply those numbers by 10, and add 27 at bats, you get 4.24 RPG, or 42.4 total runs in 10 games.

That’s 7.2 fewer runs scored from adding 10% more outs to the batting line.

I’ll let you do the multiply by 15 example.


#23    Chris C.    2007/02/25 (Sun) @ 10:50

Alan Schwarz wrote an entire NY Times article about (1.8*OBP)+SLG.

http://www.nytimes.com/2007/02/25/sports/baseball/25score.html

It’s too bad he didn’t cite the ‘OPS Begone’ piece or posts like this one, but it’s nice to see the idea get some more attention anyway.


#24    john    2007/03/22 (Thu) @ 13:49

Is there a way to convert 1.8*OBP + SLG into runs scored?


#25    Tangotiger    2007/03/22 (Thu) @ 14:49

Try something like:

R = 12 * (1.8*OBP + SLG) - 7.3

So, a team with a .340/.420 line will score 5.1 runs per game.
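For completeness, here is the rough conversion above as code. Note that Tango himself does not recommend it. `team_runs_per_game` is a hypothetical name.

```python
def team_runs_per_game(obp, slg):
    """Rough team runs per game from 1.8*OBP + SLG (the formula in comment #25)."""
    return 12 * (1.8 * obp + slg) - 7.3


print(f"{team_runs_per_game(0.340, 0.420):.1f}")  # prints 5.1
```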

But, I definitely do not recommend it.  I don’t understand why you’d want to do it.

Why don’t you just stick to:
Wins above average = (1.69*OBP+SLG - 1) * .025 * PA


#26    Tangotiger    2007/09/19 (Wed) @ 15:02

OPS+ is roughly OBP*1.25+SLG all divided by the league average of that.

The impact of OBP and SLG on each of the events is detailed at the top of this post.

That makes the run impact of 1.25*OBP+SLG the following for each event:
+0.48 1B
+0.83 2B
+1.20 3B
+1.54 HR
+0.27 BB
-0.30 out

The single is fairly valued.  The extra base hits, especially the HR, are overvalued, and the BB is undervalued.

A straight OPS still keeps the single fairly valued, but the run value of the walk drops to +.24 runs and the run value of the HR rises to +1.65 runs.

OPS is a horrible stat, relatively speaking.  OPS+ is a slight improvement.  They “work” as long as you aren’t a power hitter with few walks, or a walk machine with no power.
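The event values for any "w*OBP + SLG" can be generated the same way: multiply each event's OBP delta by w, add its SLG delta, and scale.

The post does not say how the overall scale was chosen. This sketch assumes it is set so that the single keeps its LWTS value (.465, Peter's 2004-2006 number). Under that assumption, the straight-OPS figures above come out closely (walk about +.24, HR about +1.65). The 1.25 row lands near, but not exactly on, the values listed. `implied_values` is a hypothetical helper.

```python
# Unrounded plus-one (OBP, SLG) deltas for the 600-PA line (comment #13).
DELTAS = {
    "1B": (0.0011093, 0.0010543),
    "2B": (0.0011093, 0.0029027),
    "3B": (0.0011093, 0.0047511),
    "HR": (0.0011093, 0.0065996),
    "BB": (0.0011093, 0.0),
    "out": (-0.0005546, -0.0007941),
}
# Peter's 2004-2006 LWTS (comment #4, non-intentional walk value for BB).
LWTS = {"1B": 0.465, "2B": 0.775, "3B": 1.056, "HR": 1.396, "BB": 0.333, "out": -0.292}


def implied_values(obp_weight, anchor="1B"):
    """Run value each event gets from obp_weight*OBP + SLG.

    The overall scale is an assumption: it is chosen so that the `anchor`
    event keeps its LWTS value.
    """
    raw = {e: obp_weight * d_obp + d_slg for e, (d_obp, d_slg) in DELTAS.items()}
    scale = LWTS[anchor] / raw[anchor]
    return {e: v * scale for e, v in raw.items()}


for label, weight in (("OPS", 1.0), ("1.25*OBP+SLG", 1.25), ("1.8*OBP+SLG", 1.8)):
    values = implied_values(weight)
    print(f"{label:>13}: " + "  ".join(f"{e} {v:+.2f}" for e, v in values.items()))
print(f"{'LWTS':>13}: " + "  ".join(f"{e} {v:+.2f}" for e, v in LWTS.items()))
```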


#27    Tangotiger    2007/09/19 (Wed) @ 15:42

Total Average (TA), a stat that is one step below OPS, gives the run value of a single and a walk as +.40 runs, and a HR as +1.60 runs.

As with OPS, TA overvalues the HR significantly (though less so than OPS).  It gets the run value of the walk wrong by about the same amount (.07 runs, but in the other direction).  And it misses the boat on the single.
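The TA numbers can be checked with the same plus-one approach. This sketch makes three assumptions:

- It uses Thomas Boswell's usual definition of Total Average: (TB + BB + HBP + SB - CS) / (AB - H + CS + GIDP).
- It sets SB, CS, and GIDP to zero, because the 600-PA line doesn't include them.
- It picks the runs scale so that the single is worth +.40 runs.

The helper names are illustrative. A single and a walk both add exactly 1 to the numerator and nothing to the denominator, so TA must value them identically. By the same logic, a HR must be worth exactly four singles.

```python
def total_average(ab, h, tb, bb, hbp, sb=0, cs=0, gidp=0):
    """Total Average: (TB + BB + HBP + SB - CS) / (AB - H + CS + GIDP)."""
    return (tb + bb + hbp + sb - cs) / (ab - h + cs + gidp)


# The post's 600-PA line; SB, CS, and GIDP are not given, so they default to 0.
BASE = {"ab": 540, "h": 145, "tb": 232, "bb": 50, "hbp": 5}
EVENTS = {
    "1B": {"ab": 1, "h": 1, "tb": 1},
    "HR": {"ab": 1, "h": 1, "tb": 4},
    "BB": {"bb": 1},
    "out": {"ab": 1},
}


def ta_delta(extra):
    """Change in TA from adding one event to the base line."""
    line = dict(BASE)
    for key, n in extra.items():
        line[key] += n
    return total_average(**line) - total_average(**BASE)


deltas = {event: ta_delta(extra) for event, extra in EVENTS.items()}
# Assumed scale: chosen so that the single is worth +.40 runs.
scale = 0.40 / deltas["1B"]
for event, d in deltas.items():
    print(f"{event}: {d * scale:+.2f} runs")
```

This prints +0.40 for the single and the walk, +1.60 for the HR, and -0.29 for the out.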

OPS and OPS+ should die a quiet death.


#28    Tangotiger    2007/12/12 (Wed) @ 13:12

Bumping day today, it seems.


#29    Tangotiger    2008/06/11 (Wed) @ 12:36

My semi-annual bump, to inform everyone who missed it about OPS, so that it is never used again.


#30    Tangotiger    2008/06/11 (Wed) @ 15:24

Justin did some OPS correlations, among others:
http://jinaz-reds.blogspot.com/2008/06/why-do-i-keep-using-ops.html

Justin used 3 years of data.

And Ubiquitous did the extensive one right here at post #2:
http://www.baseball-fever.com/showthread.php?t=48531

Ubi used 40 years of data.  Just look at the very last row, and you will see that the best in that particular bunch is 1.8*OBP+SLG.


#31    2008/06/12 (Thu) @ 06:02

Given that:
Runs: (1.73*OBP + SLG) * 0.27 * PA
then (1.73*OBP + SLG) * 0.27 should be runs/PA?

I have added this to the batting query in my database.


#32    tangotiger    2008/06/12 (Thu) @ 07:10

It also requires a constant to turn it into a runs created figure.  Otherwise, you’ll get 10 runs per game.  So, your second line does not follow, unless you apply an adjustment figure.

Nonetheless, I abhor anything OPS-related, and would caution against any analytical use.  Use it only for quick-and-dirty work.


#33    Tangotiger    2008/09/29 (Mon) @ 09:36

Setting the out value to -.29 runs with the “standard” hitting line I was using, here is the plus1 method for wOBA:

0.49 1b
0.79 2b
1.07 3b
1.41 hr
0.34 bb
(0.29) out

Ideally, I should use the hitting line that I used to construct wOBA, but I’m just using what I always have been using.

For example, if I change the hitting line to:
AB H 2B 3B HR BB K HBP SF
540 150 30 3 17 55 100 5 5

BA OBP SLG
0.278 0.347 0.439

I get this (setting the out value to -.30 runs):
0.48 1b
0.78 2b
1.06 3b
1.40 hr
0.33 bb
(0.30) out

***

I don’t see why we would have expected anything different.  Peter raised the issue of the multiplier in the other thread, but that should be irrelevant to the relationship between the safe and out events.

The conversion of wOBA to runs is:
(wOBA-average)/1.15*PA

So, if you have 200 in the numerator and 600 in the denominator, then adding 1 to the denominator (an out) takes the wOBA from .333333 to .33278, a difference of .00055.  Divide by 1.15 and multiply by 600 (or 601), and we get -.29 runs.

If you instead add one HR (and one PA), you get 201.95 in the numerator and 601 in the denominator, for a wOBA of .3360, a difference of .0027.  Divide by 1.15 and multiply by 600, and we get 1.40 runs.
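The two wOBA conversions above can be reproduced directly. This is a minimal sketch. It assumes the 200/600 base, the 1.15 divisor, and a HR weight of 1.95 (implied by the 201.95 numerator). The names are illustrative.

```python
WOBA_SCALE = 1.15                 # divisor from the runs conversion above
BASE_NUM, BASE_DEN = 200.0, 600   # a .333 wOBA over 600 PA
HR_WEIGHT = 1.95                  # implied by the 201.95 numerator above


def added_runs(event_weight, pa=600):
    """Runs gained by adding one PA with the given wOBA weight (0 for an out)."""
    before = BASE_NUM / BASE_DEN
    after = (BASE_NUM + event_weight) / (BASE_DEN + 1)
    return (after - before) / WOBA_SCALE * pa


print(f"out: {added_runs(0.0):+.2f} runs")       # prints out: -0.29 runs
print(f"HR:  {added_runs(HR_WEIGHT):+.2f} runs")  # prints HR:  +1.40 runs
```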

