THE BOOK--Playing The Percentages In Baseball

Thursday, January 10, 2008

Linear Weights Ratio, Total Average, Estimated Runs Produced…

By Tangotiger, 11:49 AM

Studes checks in with a quick way to evaluate a seasonal batting line.  For you Thomas Boswell fans, it’s basically a weighted version of Total Average.  It also mimics my Linear Weights Ratio, but uses easier-to-remember weights (if you take my numbers and multiply by 3, you get close to Studes’ numbers).  And, for you Paul Johnson fans, Studes’ weights are the same as those in Estimated Runs Produced.

Basically, all of us offer some form of Linear Weights, in about as easy a form as we can.  You have bases on one side and outs on the other.


#1    david smyth      2008/01/10 (Thu) @ 18:47

The interesting thing to me is that, if you have enough data, counting the bases (and outs) is all you need. IOW, run scoring only appears to be non-linear because of lack of data. If you have all of the bases gained by a team, and all of the bases lost (from OOB or stranded at end of inning), find the difference, divide by 4, and you have runs scored.

So, it’s not really BaseRuns that has the correct equation, it’s Total Average. BsR simply does a better job of compensating for lack of full data.
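
A minimal sketch of the base-counting identity described above. The function name and the two toy innings are hypothetical illustrations rather than real play-by-play data, and the sketch assumes every base gained and lost has been recorded.

```python
def runs_from_bases(bases_gained, bases_lost):
    """Runs implied by complete base accounting: each run is exactly 4 net bases."""
    net = bases_gained - bases_lost
    if net % 4 != 0:
        # With full accounting, every runner either scores (4 bases)
        # or gives back all the bases he gained, so net is a multiple of 4.
        raise ValueError("incomplete accounting: net bases must be a multiple of 4")
    return net // 4

# Toy inning A: solo home run, then three outs with nobody on.
inning_a = runs_from_bases(4, 0)

# Toy inning B: single, double (runner goes first to third), three strikeouts.
# Gained: 1 (single) + 2 (double) + 2 (runner advance) = 5
# Lost:   3 (stranded on third) + 2 (stranded on second) = 5
inning_b = runs_from_bases(5, 5)

print(inning_a, inning_b)
```

The hard part, as the thread goes on to discuss, is not this arithmetic but deciding who gets credit for each base on the individual level.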


#2    david smyth      2008/01/10 (Thu) @ 19:09

I should mention that my fave batter stat of the moment is a simple variation of Tango’s wOBA:

(2.5*H + 2*BB + XB)/(AB + BB)

is the basic version. (XB of course is TB-H).

Nails an individual batter’s contribution to a team (since there is no self-interaction). Also, nails and delineates the 3 major categories of production (hits, walks, and extra bases).
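
For readers who want to try the basic version, here is a small sketch. The function name `smyth_rate` and the batting line are made up for illustration; only the formula itself comes from the comment above.

```python
def smyth_rate(h, bb, tb, ab):
    """Basic version: (2.5*H + 2*BB + XB) / (AB + BB), where XB = TB - H."""
    xb = tb - h  # extra bases
    return (2.5 * h + 2 * bb + xb) / (ab + bb)

# Hypothetical batting line: 150 H, 60 BB, 240 TB in 540 AB.
rate = smyth_rate(h=150, bb=60, tb=240, ab=540)
print(round(rate, 3))
```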


#3    studes      2008/01/11 (Fri) @ 08:31

That’s pretty interesting, David.  The “geometric” argument, to me, has always been something like this: If a team gets, say, five total bases in a game, they’re likely to score zero runs.  But if they get ten, they’re likely to score, maybe, two.  And if they get 15, they’re likely to score five.  (made-up numbers!) IOW, the natural grouping of events drives a geometric increase in scoring.  It’s always made sense to me.

However, James himself (who pushed the geometric point of view) seemed to back off that position in Win Shares, when he admitted that a straight line works within all “normal” major league cases, i.e., when he used the 52% minimum and formed a straight line from there.

Anyway, I’d be interested in understanding more about your thinking and research.


#4    Tangotiger      2008/01/11 (Fri) @ 10:27

Cool David.

This is what David did:
starts with
0.72 BB
0.90 1B
1.26 2B
1.62 3B
1.98 HR

All these numbers are very close to the wOBA coefficients.
The above is EXACTLY the same as:
0.72 BB
0.90 H
0.36 2B
0.72 3B
1.08 HR

Which is exactly the same as:
0.72 BB
0.90 H
0.36 (TB-H)

Dividing all those numbers by 0.36 gets you:
2 BB
2.5 H
1 (TB-H)

Or
2.0 BB
1.5 H
1.0 TB

Interestingly, the league average is close to 1.000, which is useful to know.
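
The chain of rewrites above can be checked numerically. This is only a sketch: the function names and the batting line are hypothetical, and the weights are the ones listed in the comment.

```python
import math

def by_event(bb, s1, d2, t3, hr):
    # Starting weights, one per event type.
    return 0.72 * bb + 0.90 * s1 + 1.26 * d2 + 1.62 * t3 + 1.98 * hr

def by_hits(bb, s1, d2, t3, hr):
    # 0.90 per hit, plus a bonus for each extra-base hit.
    h = s1 + d2 + t3 + hr
    return 0.72 * bb + 0.90 * h + 0.36 * d2 + 0.72 * t3 + 1.08 * hr

def by_total_bases(bb, s1, d2, t3, hr):
    # 2.0 BB + 1.5 H + 1.0 TB, scaled back up by 0.36.
    h = s1 + d2 + t3 + hr
    tb = s1 + 2 * d2 + 3 * t3 + 4 * hr
    return (2.0 * bb + 1.5 * h + 1.0 * tb) * 0.36

# Hypothetical line: 60 BB, 100 1B, 30 2B, 5 3B, 15 HR.
line = dict(bb=60, s1=100, d2=30, t3=5, hr=15)
values = [f(**line) for f in (by_event, by_hits, by_total_bases)]
print(values)
```

All three forms give the same total for any line, since each step only regroups the same weights.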


#5    david smyth      2008/01/11 (Fri) @ 18:13

----“Anyway, I’d be interested in understanding more about your thinking and research.”

Thanks, studes, but I wouldn’t call anything I do ‘research’ by current standards. I just play with numbers on a yellow pad with a calculator.

Anyway, to get a run you need 4 bases. Count the net bases, divide by 4, and you have runs, exactly. The only reason that these ‘geometric’ formulas were needed, and also the weighted linear formulas like XR, is lack of complete data. With PBP, the complete base data is available. So, why are we still messing around with things like 1b=.475? Because, while counting bases on the team level is straightforward, on the individual level the attribution of each base (primarily baserunner advancement) is a bit problematic. I’m not sure if you have seen it, but an analyst named (help me here) Bill Hubble? (or something like that) published a 13 (or so) article series on this, called Base Production in the 1990s. Hopefully his fine, ahead-of-its-time work is still available on the net somewhere. Another early analyst who was onto this was Travis Jackson in the mid-eighties in his 2 “The Last Word in baseball statistics” books.


#6    david smyth      2008/01/11 (Fri) @ 19:08

Oh yeah, it was Tuttle, not Hubble. I think he used overall avg rates to divide the credit for runner advancement between hitter and runner. But with the PBP, couldn’t this be refined so that if it’s a med. hard single to RF slice x by a lefty batter against a righty pitcher, and the RF has X arm rating, etc., the credit for going to 3rd is much more fairly divided between runner and batter, on an individual play basis?

Not that I really care for this level of micro-analysis.


#7    Tangotiger      2008/09/10 (Wed) @ 12:23

By the way, I like David’s equation.  You can also express it as:

Plus/Minus
= (TB+1.5*H+2*BB-PA) * .36

The term within the parens is close to zero for the league.  You could also include 0.5*SB-CS.
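
A rough sketch of the Plus/Minus form. Placing the optional 0.5*SB-CS term inside the parens is an assumption about where it belongs, and the function name and batting line are hypothetical.

```python
def plus_minus(tb, h, bb, pa, sb=0, cs=0):
    """(TB + 1.5*H + 2*BB - PA) * .36, in runs above or below average.

    The baserunning term 0.5*SB - CS is added inside the parens,
    which is an assumption; the comment does not say where it goes.
    """
    return (tb + 1.5 * h + 2 * bb - pa + 0.5 * sb - cs) * 0.36

# Hypothetical line: 240 TB, 150 H, 60 BB in 600 PA.
base = plus_minus(tb=240, h=150, bb=60, pa=600)
with_running = plus_minus(tb=240, h=150, bb=60, pa=600, sb=20, cs=5)
print(round(base, 1), round(with_running, 1))
```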

***

This is somewhat similar to EqR which if you work it out is something like:
Total Runs
= (2.5*TB+2.5*H+3.75*BB-PA)*.12

Which if you divide the inner terms by 2.5 and multiply the .12 by 2.5 you get:
Total Runs
= (TB+H+1.5*BB-PA*.4)*.3

In order to get it as Plus/Minus, you need to subtract .18 runs per out
-.18*Outs
= -.18*(PA-H-BB)
= .3*(-PA*.6+H*.6+BB*.6)

So, EqR as Plus/Minus
= (TB+H+1.5*BB-PA*.4)*.3 - .18*Outs
= (TB+H+1.5*BB-PA*.4)*.3 + .3*(-PA*.6+H*.6+BB*.6)
= (TB+1.6*H+2.1*BB-PA)*.3

Compare that to David’s equation:
= (TB+1.5*H+2*BB-PA) * .36

Pretty much g-dd-mn the same thing.
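
The algebra above can be sanity-checked with a few lines. This is a sketch with hypothetical function names and a made-up batting line; the coefficients are the approximate ones quoted in the comment, not the official EqR definition.

```python
import math

def eqr_total(tb, h, bb, pa):
    # Approximate EqR as total runs, per the comment.
    return (2.5 * tb + 2.5 * h + 3.75 * bb - pa) * 0.12

def eqr_plus_minus_stepwise(tb, h, bb, pa):
    # Subtract .18 runs per out, with Outs = PA - H - BB.
    outs = pa - h - bb
    return eqr_total(tb, h, bb, pa) - 0.18 * outs

def eqr_plus_minus_collapsed(tb, h, bb, pa):
    # The single-expression form the derivation ends with.
    return (tb + 1.6 * h + 2.1 * bb - pa) * 0.3

def smyth_plus_minus(tb, h, bb, pa):
    # David's equation in Plus/Minus form, for comparison.
    return (tb + 1.5 * h + 2 * bb - pa) * 0.36

line = dict(tb=240, h=150, bb=60, pa=600)  # hypothetical
stepwise = eqr_plus_minus_stepwise(**line)
collapsed = eqr_plus_minus_collapsed(**line)
print(stepwise, collapsed, smyth_plus_minus(**line))
```

The stepwise and collapsed EqR forms agree exactly. The two Plus/Minus equations differ only by the small gaps in their coefficients (1.6 vs 1.5 on H, 2.1 vs 2 on BB, .3 vs .36 overall).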

The issue with Clay’s equation is how he adjusts it for the run environment.  As I showed in another thread, it doesn’t work in extreme cases.  Otherwise, what Clay does with EqR is exactly wOBA.

I think his translation between EqR and EqA is unnecessarily complicated, since I’ve shown that you can get from wOBA to runs in a simple linear relationship.


#8    Tangotiger      2008/09/10 (Wed) @ 14:21

We can also try to compare EqA to OPS+.

The formula courtesy of Patriot’s blog:
EqR = (2*RAW/LgRAW - 1) * PA * Lg(R/PA)

where RAW = (H + TB + 1.5*(W + HB + SB) + SH + SF)/(AB + W + HB + SH + SF + SB + CS)

If we strip out all the extra parameters, you basically get:
RAW = SLG + 1.5*OBP

The denominator of course makes this a bit tricky, but it’s something like that.

OPS = SLG+OBP
but OPS+ uses something like this numerator:
OPSnum = SLG+1.25*OBP

So, RAW and OPSnum are very similar.

OPS+ = 2*OPSnum/lgOPSnum-1

EqR
= (2*RAW/LgRAW - 1) * PA * Lg(R/PA)
= (2*RAW/LgRAW - 1) * stuff

See how similar EqR and OPS+ are?
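
To make the parallel concrete, here is a sketch of both formulas as quoted in this comment. The function names and all input numbers are hypothetical, and `ops_plus_approx` is the comment's simplified form (scaled so that league average is 1), not the official OPS+ calculation.

```python
import math

def raw(h, tb, w, hb, sb, cs, sh, sf, ab):
    # RAW as quoted from Patriot's blog.
    num = h + tb + 1.5 * (w + hb + sb) + sh + sf
    den = ab + w + hb + sh + sf + sb + cs
    return num / den

def eqr(raw_value, lg_raw, pa, lg_runs_per_pa):
    # EqR = (2*RAW/LgRAW - 1) * PA * Lg(R/PA)
    return (2 * raw_value / lg_raw - 1) * pa * lg_runs_per_pa

def ops_plus_approx(slg, obp, lg_slg, lg_obp):
    # OPS+ ~ 2*OPSnum/lgOPSnum - 1, with OPSnum = SLG + 1.25*OBP
    return 2 * (slg + 1.25 * obp) / (lg_slg + 1.25 * lg_obp) - 1

# Hypothetical hitter who matches a hypothetical league exactly.
player_raw = raw(h=150, tb=240, w=60, hb=0, sb=0, cs=0, sh=0, sf=0, ab=540)
runs = eqr(player_raw, lg_raw=player_raw, pa=600, lg_runs_per_pa=0.12)
index = ops_plus_approx(0.420, 0.340, 0.420, 0.340)
print(player_raw, runs, index)
```

In both formulas the same `2*x/lg_x - 1` shape does the work: a league-average hitter comes out at 1, which EqR then multiplies by PA and league runs per PA.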

EqR, OPS, wOBA are all different variations of Linear Weights.  And the least complicated of the bunch is good ole Linear Weights.

