Thursday, December 17, 2009
Proof of the modified OPS
Let’s start, for the sake of argument, by saying that it’s better to have a player’s performance broken down into its components (hits, walks, HR, etc.) than to have it first aggregated into two metrics (OBP, SLG), which are then aggregated into one metric (OPS). We accept that OPS (1*OBP + 1*SLG) is a shortcut, a lazy way to get to what a component analysis would give us.
Let’s also, for the sake of argument, presume that the best metric for components is Linear Weights. It’s fairly straightforward. A single is worth .47 runs, a double is .77, a triple is 1.04, a HR is 1.40, and a walk or hit batter is .32 runs. The out (meaning at bats minus hits plus sac flies) is worth -.276 runs. And when you apply this to MLB 2009, you get a total of zero runs. That is, the positive events exactly offset the negative events.
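To make the arithmetic concrete, here is a minimal sketch of the calculation in Python. The run values are the ones quoted above; the function and argument names are my own, and the sample stat line is the first row of the 2009 leaderboard shown further down in this post.

```python
# Run values per event, as quoted in the post.
RUN_VALUES = {
    "single": 0.47,
    "double": 0.77,
    "triple": 1.04,
    "home_run": 1.40,
    "walk_or_hbp": 0.32,
    "out": -0.276,
}


def linear_weights(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Return Linear Weights runs for one batting line."""
    singles = h - doubles - triples - hr
    outs = ab - h + sf  # the post's definition of an out
    return (
        RUN_VALUES["single"] * singles
        + RUN_VALUES["double"] * doubles
        + RUN_VALUES["triple"] * triples
        + RUN_VALUES["home_run"] * hr
        + RUN_VALUES["walk_or_hbp"] * (bb + hbp)
        + RUN_VALUES["out"] * outs
    )


# AB, H, 2B, 3B, HR, BB, HBP, SF for the top row of the leaderboard.
top_line = (568, 186, 45, 1, 47, 115, 9, 8)
print(round(linear_weights(*top_line)))  # rounds to 77, matching the LWTS column
```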
Linear Weights, in some form or other, is supported by top baseball sites like http://www.Baseball-Reference.com, http://www.Retrosheet.org and http://www.Fangraphs.com . The ESPN Baseball Encyclopedia, edited by the inventor of Linear Weights, Pete Palmer, also supports it. Top known analysts, such as MGL, are huge proponents of it. Top little-known analysts, like Patriot, are believers. And, as perhaps the biggest champion of Linear Weights (even bigger than Pete I would say), I am completely on board with Linear Weights. When I think about baseball, Linear Weights guides me half the time. We’re believers.
If we apply the above Linear Weights equation to the batting stats of the 2009 season, we get these leaders:
LWTS AB H 2B 3B HR BB HBP SF
77 568 186 45 1 47 115 9 8
60 591 177 35 3 46 110 9 9
57 523 191 30 1 28 76 2 5
47 552 153 27 2 40 119 5 4
44 609 178 43 3 39 81 12 5
44 611 198 34 0 34 68 5 1
44 576 197 42 1 24 61 9 5
43 532 163 36 2 35 76 3 4
43 635 203 39 6 32 57 13 3
42 469 151 38 1 25 70 4 1
You will notice I did not list names. They are unimportant for the purposes of this discussion. You can guess that Pujols is the #1 guy though. And here are the bottom 10:
LWTS AB H 2B 3B HR BB HBP SF
-18 429 97 19 0 19 20 1 7
-18 391 93 22 0 8 20 4 4
-18 413 93 23 2 4 40 10 4
-19 460 115 19 1 5 39 1 5
-21 341 71 8 3 10 19 3 0
-21 334 67 20 1 8 27 0 4
-23 461 116 11 6 1 34 2 4
-24 470 115 20 6 6 21 0 6
-24 376 82 15 0 8 18 5 3
-28 404 97 11 2 1 18 2 2
There are no surprises here, nor should there be, since you stipulated to Linear Weights as the ideal metric to interpret these components.
Now, for some reason, the major sites don’t look at Linear Weights (though they will look at the NFL QB rating). Instead, OBP and SLG are the two metrics that have found favor with the public, and each one looks at only a subset of the above components. OBP looks at H, BB, HBP, while SLG looks at H, 2B, 3B, HR. TOGETHER, they look at all of the above components. Well, then, let’s add them up! And hence we have OPS, or “PRO” (as in production), as Pete Palmer first called it 25 years ago. Same metric, different name.
But, why? Why just add them together as 1*OBP + 1*SLG? Why the “1” for each? Why not something else? Why not run a regression analysis and find the best fit for the two coefficients? And what would we be correlating against? Why, the one metric that we all agree on: Linear Weights (LWTS). Since OPS is ostensibly a “rate” metric, we’ll correlate it to LWTS per plate appearance (PA).
There were 345 players in 2009 with at least 200 PA. Here, for example, are the top 10 and bottom 10 in LWTS per PA, along with their SLG and OBP:
LW/PA SLG OBP LWTS AB H 2B 3B HR BB HBP
0.110 0.658 0.443 77 568 186 45 1 47 115 9
0.094 0.587 0.444 57 523 191 30 1 28 76 2
0.083 0.602 0.412 60 591 177 35 3 46 110 9
0.077 0.567 0.414 42 469 151 38 1 25 70 4
0.071 0.548 0.413 42 491 150 36 1 27 77 16
0.070 0.579 0.393 43 532 163 36 2 35 76 3
0.070 0.531 0.418 30 352 102 24 2 19 71 7
0.069 0.551 0.407 47 552 153 27 2 40 119 5
0.068 0.543 0.410 44 576 197 42 1 24 61 9
0.065 0.543 0.405 39 501 149 28 7 27 91 2
...
LW/PA SLG OBP LWTS AB H 2B 3B HR BB HBP SF
(0.049) 0.308 0.288 -13 237 50 8 0 5 22 4 1
(0.058) 0.338 0.258 -21 334 67 20 1 8 27 0 4
(0.058) 0.337 0.256 -21 341 71 8 3 10 19 3 0
(0.059) 0.267 0.292 -13 202 48 6 0 0 14 2 1
(0.060) 0.322 0.261 -24 376 82 15 0 8 18 5 3
(0.061) 0.304 0.276 -18 283 68 7 4 1 13 1 0
(0.063) 0.271 0.277 -16 225 43 10 1 2 26 1 1
(0.065) 0.288 0.274 -14 198 44 7 0 2 15 0 2
(0.066) 0.285 0.275 -28 404 97 11 2 1 18 2 2
(0.067) 0.259 0.280 -17 228 46 7 3 0 22 3 1
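The rate columns in this table can be reproduced from the counting stats. The sketch below uses the standard definitions of OBP and SLG; taking PA as AB + BB + HBP + SF is my assumption (the post does not spell out PA, and sacrifice bunts are not listed), and the function name is hypothetical.

```python
def rate_stats(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Return (PA, OBP, SLG) for one batting line.

    Assumption: PA = AB + BB + HBP + SF, since those are the only
    plate-appearance components shown in the tables.
    """
    pa = ab + bb + hbp + sf
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    obp = (h + bb + hbp) / pa
    slg = total_bases / ab
    return pa, obp, slg


# Top row of the table, with SF = 8 taken from the earlier leaderboard.
pa, obp, slg = rate_stats(568, 186, 45, 1, 47, 115, 9, 8)
lw_per_pa = 77 / pa  # LWTS from the table divided by PA

# OBP is about .443, SLG about .658, and LW/PA about .110.
print(f"OBP={obp:.3f} SLG={slg:.3f} LW/PA={lw_per_pa:.3f}")
```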
What we REALLY care about is Linear Weights per PA, but what we ONLY have is SLG and OBP. So how can we convert SLG and OBP into LWTS per PA? Well, we perform a simple regression:
LWTSperPA = x * OBP + y * SLG + z
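For readers who want to try a fit of this form themselves, here is a rough sketch of an ordinary least-squares solve in plain Python. It is not necessarily how the numbers in this post were produced; the function name is hypothetical, and the six sample rows are just a handful of the top and bottom rows from the table above, so the coefficients it prints will differ from a fit on all 345 players.

```python
def fit_obp_slg(rows):
    """Least-squares fit of lw_per_pa = x*OBP + y*SLG + z.

    rows: iterable of (obp, slg, lw_per_pa) tuples.
    Returns the tuple (x, y, z).
    """
    # Build the normal equations (A^T A) beta = A^T b, columns [OBP, SLG, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for obp, slg, lw in rows:
        feats = (obp, slg, 1.0)
        for i in range(3):
            atb[i] += feats[i] * lw
            for j in range(3):
                ata[i][j] += feats[i] * feats[j]

    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    m = [ata[i] + [atb[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]

    # Back-substitution.
    beta = [0.0] * 3
    for i in (2, 1, 0):
        rest = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - rest) / m[i][i]
    return tuple(beta)


# (OBP, SLG, LW/PA) for three top rows and three bottom rows of the table.
sample = [
    (0.443, 0.658, 0.110),
    (0.444, 0.587, 0.094),
    (0.412, 0.602, 0.083),
    (0.258, 0.338, -0.058),
    (0.275, 0.285, -0.066),
    (0.280, 0.259, -0.067),
]
x, y, z = fit_obp_slg(sample)
print(f"LWTSperPA = {x:.3f} * OBP + {y:.3f} * SLG + {z:.3f}")
```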
And when we run a regression of the above data, we get a correlation (r) of .999. What does that mean? Well, if you have r=0, that means the variables (OBP, SLG) bear no relationship to the output (LWTSperPA). If r=1, then we have a perfect relationship. r=.999 is about as close as you could ever hope for. We have, therefore, found a relationship that links OBP and SLG to Linear Weights. So, what is that equation?
LWTSperPA = .459 * OBP + .269 * SLG - .265
We can rewrite this as:
LWTSperPA
= .265 * (.459/.265 * OBP + .269/.265 * SLG - .265/.265)
= .265 * (1.73 * OBP + 1.01 * SLG - 1)
Therefore, rather than OBP + SLG, what we really want is 1.73*OBP+SLG. This properly aligns OBP and SLG to Linear Weights.
And a nice byproduct of 1.73*OBP + SLG? Remember I said there were 345 players with at least 200 PA? Those guys averaged a .337 OBP and a .423 SLG. And 1.73*.337 plus .423 is just about 1.00 (1.006, to be exact). The average modified OPS is 1. That’s a nice scale, isn’t it? And it lines up to Linear Weights as well.
Furthermore, to convert to Linear Weights, simply take your modified OPS, subtract 1, and multiply by .265, and you get runs per PA.
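As a quick illustration, the modified OPS and its conversion to runs per PA can be written as two small functions. The 1.73 and .265 constants come straight from the equations above; the function names are my own.

```python
def modified_ops(obp, slg):
    """Modified OPS from the post: 1.73*OBP + SLG."""
    return 1.73 * obp + slg


def runs_per_pa(obp, slg):
    """Convert modified OPS to Linear Weights runs per PA."""
    return 0.265 * (modified_ops(obp, slg) - 1)


# Average of the 345 players with at least 200 PA: lands very close to 1.
print(round(modified_ops(0.337, 0.423), 3))

# Top row of the LW/PA table (SLG .658, OBP .443); the table lists 0.110.
print(round(runs_per_pa(0.443, 0.658), 3))
```

The second number comes out a couple of thousandths away from the table's value, since the conversion is a regression estimate built on rounded coefficients rather than an exact identity.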
***
And for you OPS+ lovers out there, OPS+ is roughly equal to 1.2*OBP + SLG. Well, now we know better, don’t we? Instead of OBP/lgOBP + SLG/lgSLG, you should do 1.2*OBP/lgOBP + .8*SLG/lgSLG. That sets things right where we want them for OPS+.