THE BOOK--Playing The Percentages In Baseball

Friday, October 03, 2008

Complete Linear Weights, 2008

By Tangotiger, 11:49 AM

Colin provides his data for easy access, along with his intro article.


#1    Peter Jensen      (see all posts) 2008/10/03 (Fri) @ 12:59

Colin - I used your linear weights and spreadsheet and tried to replicate your RAA values, and got numbers that were significantly larger than yours (63.8 instead of 42.7 for Pujols).  I am in the process of double checking, but you may want to as well, especially since it seems like Pujols should have closer to three times as many RAA as Manny instead of less than twice as many.

Which brings me to another point.  You didn’t bother to sum Manny’s and Teixeira’s production for both teams they played for (or that of others who played for more than one team), and that left them out of the top ten, where they clearly should be.

Also, you can’t just multiply a park factor times a negative value (like below average RAA values) because it alters the RAA value in the wrong direction.

Finally, the changes in rank order from RAA to ABS demonstrate the absurdity of Tango’s decision to add a fixed run value to all PAs.  This should be a simple transformation with no changes in rank order.  The only proper method is what I have been arguing for all along: adding a positive run value to only the batting events that don’t result in an out.


#2    Colin Wyers      (see all posts) 2008/10/03 (Fri) @ 13:44

Just to make sure we’re on the same page, are you using:

PA - H - K

to represent outs and

H - 2B - 3B - HR

to represent 1B?


#3    Patriot      (see all posts) 2008/10/03 (Fri) @ 13:53

Outs should be AB-H-K.

I think that’s probably the issue here--the abv avg figures don’t pass the smell test.  Sizemore +3.5?  Anybody -45?
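To see why the choice of outs term matters so much, here is a minimal sketch comparing the two definitions. The stat line and the out weight are made-up values for illustration, not Colin's actual weights or any real player's totals.

```python
# Hypothetical stat line, for illustration only (not a real player's totals).
line = {"PA": 650, "AB": 560, "H": 170, "K": 60}

# Assumed run value of a generic out; NOT the weight from Colin's spreadsheet.
OUT_WEIGHT = -0.28

def outs_from_pa(s):
    """Outs as PA - H - K: also counts BB, HBP, SF and SH as outs."""
    return s["PA"] - s["H"] - s["K"]

def outs_from_ab(s):
    """Outs as AB - H - K, the term Patriot suggests."""
    return s["AB"] - s["H"] - s["K"]

# Every PA that is not an AB gets charged as an out under the PA-based term.
extra_outs = outs_from_pa(line) - outs_from_ab(line)
runs_wrongly_charged = extra_outs * OUT_WEIGHT

print(extra_outs, round(runs_wrongly_charged, 1))
```

With these made-up numbers, the PA-based term charges 90 extra outs, so a hitter who walks a lot loses a large chunk of RAA that he should keep.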


#4    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 14:09

Peter is correct that the park factors are applied to the totals, not the marginals.
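A minimal sketch of the difference, assuming a multiplicative adjustment that divides by a park factor above 1.0 for a hitter's park. The factor of 1.10 is made up, and .12 runs per PA is the rough league average used elsewhere in this thread.

```python
LEAGUE_R_PER_PA = 0.12  # rough league average used in this thread
PARK_FACTOR = 1.10      # assumed hitter's-park factor, illustrative only

def adjust_marginal(raa, pf):
    """Problematic: scales the above-average figure directly."""
    return raa / pf

def adjust_total(raa, pa, pf, lg=LEAGUE_R_PER_PA):
    """Convert to total runs, park-adjust the total, then re-center on average."""
    total = raa + lg * pa
    return total / pf - lg * pa

pa = 600
for raa in (-20.0, 20.0):
    print(raa,
          round(adjust_marginal(raa, PARK_FACTOR), 1),
          round(adjust_total(raa, pa, PARK_FACTOR), 1))
```

Applied to the marginal figure, the adjustment moves a below-average hitter in a hitter's park closer to average, which is the wrong direction. Applied to the total, both hitters are moved down.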

***

Peter is incorrect in expecting the same rank ordering for RAA.  You expect the same rank ordering for RAA per PA.

A guy with +1 in 1 PA will rank way lower than someone with -1 in 600 PA, when looking at “total” runs created.
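The same example as a quick sketch, assuming the roughly .12 runs per PA league average used later in this thread to turn RAA into total runs created. The player labels are placeholders.

```python
R_PER_PA = 0.12  # rough league-average runs per PA (assumed here)

# (RAA, PA) for the two hypothetical players in the example above.
players = {"A": (1.0, 1), "B": (-1.0, 600)}

def runs_created(raa, pa):
    """Total runs: runs above average plus the league-average runs for those PA."""
    return raa + R_PER_PA * pa

by_raa = sorted(players, key=lambda n: players[n][0], reverse=True)
by_rc = sorted(players, key=lambda n: runs_created(*players[n]), reverse=True)

print(by_raa)  # ordering by RAA
print(by_rc)   # ordering by total runs created
```

The two orderings come out reversed, which is why rank order is only expected to hold on a per-PA basis.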


#5    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 14:10

I also agree that +42 for Pujols doesn’t pass the smell test, not in the slightest.


#6    Colin Wyers      (see all posts) 2008/10/03 (Fri) @ 14:16

Patriot’s right - ABs versus PAs. (Blog post and spreadsheet have been updated.) Now I get 79.4 runs for Pujols. Unsure of where the remaining discrepancy between Peter and me is. (The only thing I know of that he and I are doing differently now is that he’s using the weights rounded out to three places, whereas I’m using “unrounded” values.)


#7    Peter Jensen      (see all posts) 2008/10/03 (Fri) @ 14:27

Colin - I am using PA -(Hits + Walks + HBP + SO + DP) for Outs as SO and DPs have been given their own linear weights.

Tango - I was incorrect about the rank order having to stay the same.  However, I am not incorrect in criticizing adding a fixed value for every PA.  It makes absolutely no sense for two players who have identical PAs and identical Linear weights above average to also have identical linear weights above 0 if one player makes more outs than the other.  But by adding a fixed value per PA they will.  This is clearly wrong.


#8    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 14:27

Here is an illustration to show where Peter and I differ.

Player 1
0: runs above average
400: outs made
600: PA

Player 2
0: runs above average
450: outs made
600: PA

Obviously, the second player has a lot more home runs.

To get “runs created”, I simply add +.12 runs per PA, and I get both at 72 runs created.  So, I get both players as: league average players, who came to bat 600 times each, and generated 72 runs each.

Peter on the other hand is suggesting adding say .36 runs per non-out PA.  So, Player 1 gets an extra 72 runs to add to his zero to give him 72 runs created.  Player 2 gets an extra 54 runs created.

Am I representing you correctly?
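For reference, the two approaches described so far can be sketched as follows. The rates (.12 per PA and .36 per non-out PA) are the ones from this post. The function names are just labels for this sketch.

```python
# The two illustrative players: both are league average (0 runs above
# average) over 600 PA, but they make different numbers of outs.
players = {
    "Player 1": {"raa": 0.0, "outs": 400, "pa": 600},
    "Player 2": {"raa": 0.0, "outs": 450, "pa": 600},
}

def rc_fixed_per_pa(p, rate=0.12):
    """Add a fixed run value for every PA."""
    return p["raa"] + rate * p["pa"]

def rc_per_non_out(p, rate=0.36):
    """Add a run value only for PAs that do not result in an out."""
    return p["raa"] + rate * (p["pa"] - p["outs"])

for name, p in players.items():
    print(name, round(rc_fixed_per_pa(p), 1), round(rc_per_non_out(p), 1))
```

This reproduces the figures above: 72 and 72 under the per-PA method, 72 and 54 under the per-non-out method.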

He may also instead want to apply a multiplier to the positive run values.  In that case, we have this:
Player 1
0: runs above average
400: outs made
600: PA
120: positive runs
120: negative runs

Player 2
0: runs above average
450: outs made
600: PA
135: positive runs
135: negative runs

So, we add 0.6 runs created per positive run generated.  Player 1 gets an extra 72 runs, to add to 0, to get 72 runs created.  Player 2 gets an extra 81 runs, to add to 0, to get 81 runs created.

I’m not sure which way Peter is advocating.
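The multiplier version can be sketched the same way, using the 0.6 multiplier and the positive-run totals from the illustration above. The function name is a placeholder.

```python
# Positive runs generated by each illustrative player (both have RAA of 0).
positive_runs = {"Player 1": 120.0, "Player 2": 135.0}

def rc_positive_multiplier(pos_runs, raa=0.0, mult=0.6):
    """Add a fixed share of the positive run value to runs above average."""
    return raa + mult * pos_runs

for name, pos in positive_runs.items():
    print(name, round(rc_positive_multiplier(pos), 1))
```

Under this version the player who makes more outs ends up with more runs created (81 against 72), the opposite of the per-non-out result.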


#9    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 14:30

We cross-posted.  However, “This is clearly wrong.” I see it as “This is clearly right!”.

In any case, please tell me from my post 8 illustration which player gets more RC in your view.


#10    Colin Wyers      (see all posts) 2008/10/03 (Fri) @ 14:32

Peter,

The correct way to do it would be to either use AB - H - K (as Patriot said) or to use:

PA -(Hits + Walks + HBP + SO)

given the way I constructed the double play weight. It’s not explicitly a separate term, like the K term, because technically some Ks and CSs are double plays as well. I’m pretty sure I mentioned how I did that in one of my previous articles.


#11    Peter Jensen      (see all posts) 2008/10/03 (Fri) @ 14:33

I used PAs and subtracted BBs and HBP rather than using ABs because even though you didn’t include them in your spread sheet I thought you were treating SF and SH as outs. 

If you rounded to three digits rather than truncated it shouldn’t make much of a difference.


#12    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 14:36

Colin: “The correct way to do” it would be to use exactly the events you used to generate the equation.  As Peter noted, however you treated SH and SF to generate the LWTS, that’s how you should treat them for each player.


#13    Peter Jensen      (see all posts) 2008/10/03 (Fri) @ 14:38

Colin - The heading for DPs in your spreadsheet is GDP.  I interpreted that as Ground ball double plays.  Since you didn’t give values for other double plays in your spread sheet I assumed that you would be properly charging the extra lost value against the runner instead of the batter.


#14    Colin Wyers      (see all posts) 2008/10/03 (Fri) @ 14:50

All hitting data is taken directly from Baseball-Reference.com - I scraped it off the league pages yesterday.

SF and SH were ignored in generating the LWTS; presumably all were coded as “2” by Retrosheet. So then the correct term (or at least as close as one can come, given the data available):

PA - H - BB - IBB - HBP - K

That means coding certain plays (reach on error, fielder’s choice) as generic outs - I used all Retrosheet event codes to generate the weights. That shouldn’t be a problem on the whole - I adjusted the values of the events I ended up using to sum to zero on the dataset, to compensate for the missing data. But a player with high ROE totals will be (slightly) underrated.


#15    Peter Jensen      (see all posts) 2008/10/03 (Fri) @ 14:57

Colin - Then I don’t understand how you are treating GIDP.  Aren’t you using the value for DP that you have in your list of linear weights for them?  If you are then you have to subtract them from the generic outs as well.


#16    Colin Wyers      (see all posts) 2008/10/03 (Fri) @ 15:15

Peter, Retrosheet doesn’t have a separate event code for the double play. The vast majority of double plays are coded as 2, or “Generic Out.” Some are coded as strikeouts - some are even coded as singles and doubles.

The value of the DP is the value beyond that of an ordinary out. This is consistent with how formulas like Extrapolated Runs and Estimated Runs Produced handle the double play.


#17    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 15:30

Peter, in case you missed it, I replied in post 8…


#18    Peter Jensen      (see all posts) 2008/10/03 (Fri) @ 15:32

So in calculating RAA from the spreadsheet you added the value -.587 * 16 to Pujols’s other totals for his double plays?

Retrosheet does have a double play flag so you can create a double play event category or a grounded into double play event category (using Batted Ball Type) for calculating linear weights.  I thought that you had done that.  Similarly, you can create a separate ROE event.


#19    terpsfan101      (see all posts) 2008/10/03 (Fri) @ 16:04

To get Grounded Into Double Plays with Retrosheet data, set the DP_FLAG to “T” and set the EVENT_TX to *GDP*, asterisks included. Actually you don’t even need the DP_FLAG if you set the EVENT_TX to *GDP*.
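Outside a database query tool, the same filter might look something like this. It is only a sketch: it assumes the parsed events are already loaded as rows with EVENT_TX and DP_FLAG fields, and the three sample rows are made up for illustration.

```python
# Made-up sample rows; real data would come from parsed Retrosheet event files.
events = [
    {"EVENT_TX": "64(1)3/GDP/G6", "DP_FLAG": "T"},  # grounded into double play
    {"EVENT_TX": "K+CS2(26)/DP", "DP_FLAG": "T"},   # strikeout double play
    {"EVENT_TX": "S7/L7", "DP_FLAG": "F"},          # single, no double play
]

# Equivalent of the *GDP* wildcard: "GDP" appears anywhere in the event text.
gdp = [e for e in events if "GDP" in e["EVENT_TX"]]

# All double plays of any kind, using the flag alone.
all_dp = [e for e in events if e["DP_FLAG"] == "T"]

print(len(gdp), len(all_dp))
```

The strikeout double play is caught by the flag but not by the GDP text match, which is the distinction being discussed above.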


#20          (see all posts) 2008/10/03 (Fri) @ 16:28

This

>Also, you can’t just multiply a park factor times a negative value (like below average RAA values) because it alters the RAA value in the wrong direction.

still seems to be a problem


#21    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 16:37

One can also consider the possibility that the park impact should be additive to PA, not multiplicative to the runs created for reasons explained in the second half of the article here:

http://tangotiger.net/parks.html


#22    Colin Wyers      (see all posts) 2008/10/03 (Fri) @ 16:42

That’s how I derived my initial values for the DP term; then I subtracted the weighted average value of the underlying event from the DP term.

As for the park factor issue - I’m open to suggestions as to a fix for that. Would it be correct to adjust the inputs separately before applying the LWTS?


#23    terpsfan101      (see all posts) 2008/10/03 (Fri) @ 16:54

Since I already broke down the Retrosheet data by the official statistics, let me see if I can use Baseruns to generate LW and RC for 2008. I’ll use the empirical Linear Weights from the AL 1993-2007 and NL 1993-2007 to generate my Baseruns equations.


#24    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 17:07

If you make the park impact figures additive, then it won’t matter when you apply it.


#25    Colin Wyers      (see all posts) 2008/10/03 (Fri) @ 17:15

Let me see if I’m understanding this right. League average is roughly .12 runs per PA. So let’s say that league average at Coors Field is .16 runs per PA. (Which is a number I just made up.) So then the park factor would be -.04 per PA?


#26    terpsfan101      (see all posts) 2008/10/03 (Fri) @ 17:22

Actually it would be a pain to calculate LW for 2008 using Baseruns as I’m not at the computer that has all my Baseruns work on it.


#27    Peter Jensen      (see all posts) 2008/10/03 (Fri) @ 17:23

Thanks Tango.  I had missed your post #8 in the flurry of responses. It will take me a while to formulate my response.  For some reason I find this a difficult problem.


#28    tangotiger      (see all posts) 2008/10/03 (Fri) @ 17:56

Colin: right.  Unless you can show us that someone with .20 runs per PA is affected more than someone with .10 runs per PA.
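A minimal sketch of the additive version, using Colin's made-up Coors figure of .16 runs per PA against a league average of .12. The function name and the sample hitter are hypothetical.

```python
LEAGUE_R_PER_PA = 0.12
COORS_R_PER_PA = 0.16  # Colin's made-up figure, not a measured value

# Additive park adjustment per PA taken in that park.
park_adj_per_pa = LEAGUE_R_PER_PA - COORS_R_PER_PA

def park_adjust(runs, pa_in_park, adj=park_adj_per_pa):
    """Shift a run total by a fixed amount per PA taken in the park."""
    return runs + adj * pa_in_park

# Hypothetical hitter: +10 RAA in 300 PA, all taken at Coors.
raa, pa = 10.0, 300
adjusted_raa = park_adjust(raa, pa)
adjusted_rc = park_adjust(raa + LEAGUE_R_PER_PA * pa, pa)

# Adjusting RAA directly, or adjusting runs created and then re-centering,
# gives the same answer, so it does not matter when the adjustment is applied.
print(round(adjusted_raa, 2), round(adjusted_rc - LEAGUE_R_PER_PA * pa, 2))
```

Because the adjustment is a flat amount per PA, it has no sign problem with below-average hitters.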


#29    Colin Wyers      (see all posts) 2008/10/03 (Fri) @ 18:21

Easiest ways to do it that I can think of:

* Take the R/PA for the league and figure each park’s R/PA using the multiplicative park factors I used.

* Parse the following out of the PBP data for each park:

R/PA(HOME) - R/PA(ROAD)

Any thoughts on this?
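The second option might be sketched like this, assuming runs and PA have already been totaled from the play-by-play data for each team's home and road games. The team code and all totals below are invented for illustration.

```python
# Invented totals: (runs, PA) by both teams in a club's home and road games.
totals = {
    "COL": {"home": (900, 6300), "road": (700, 6100)},
}

def additive_park_effect(t):
    """R/PA in home games minus R/PA in road games."""
    home_runs, home_pa = t["home"]
    road_runs, road_pa = t["road"]
    return home_runs / home_pa - road_runs / road_pa

for team, t in totals.items():
    print(team, round(additive_park_effect(t), 3))
```

Whether to halve the difference, since a team takes only about half its PA at home, is a separate choice that this sketch leaves open.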

Also, I’m considering doing replacement level with this, and here’s what I’ve come up with so far, just doodling with pen and paper.

Tango’s replacement level is -2.25 wins above average, or -23.625 RAA, per 700 PA (you can get more specific by league). Still using .12 R/PA on average:

((700*.12)-23.625)/700 = .086 R/PA

Using the VORP baseline, 80% of league average, gives me .096 R/PA for replacement level. (I actually use different values for each league - they’re within .01 runs of each other, I believe, so .12 is close enough for explaining.)

Instead of adding .12 per PA to get Runs Above Zero, I should be able to add .086 (or .096) to get Runs Above Replacement, right?
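The two baselines above can be checked with a short sketch. It uses the .12 R/PA league average and the -23.625 RAA per 700 PA figure from this post (2.25 wins at an implied 10.5 runs per win). The function names are placeholders.

```python
LEAGUE_R_PER_PA = 0.12
PA = 700

def repl_from_raa(raa_per_700, lg=LEAGUE_R_PER_PA, pa=PA):
    """Replacement-level R/PA given replacement's RAA over a full season of PA."""
    return (pa * lg + raa_per_700) / pa

def repl_from_pct(pct, lg=LEAGUE_R_PER_PA):
    """Replacement-level R/PA as a fixed percentage of league average."""
    return pct * lg

print(round(repl_from_raa(-23.625), 3))  # wins-based baseline
print(round(repl_from_pct(0.80), 3))     # 80%-of-average baseline
```

These come out to .086 and .096 R/PA respectively, matching the pen-and-paper figures.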


#30    tangotiger      (see all posts) 2008/10/03 (Fri) @ 18:42

The 2.25 doesn’t necessarily translate to 23.625.  Depends on the run environment.

I use around 74-75% or so.  Patriot uses 73%, which is what Clay uses I think.  MGL might use 80%.  Woolner may say he uses 80%, but if you add up his VORP, he uses something close to 75%, since his VORP matches Clay’s RARP.

Otherwise, yes, what you said.


#31    tangotiger      (see all posts) 2008/10/03 (Fri) @ 19:57

As a rough guideline, we can use 75% for nonpitchers, 125% for starters and 105% for relievers.

For a 4.67 run per game environment (84 runs per 700 PA, 4.30 ERA), that sets the replacement levels as:
- 63 runs per 700 PA or .09 runs per PA
- 5.40 ERA for starter
- 4.50 ERA for reliever

The total runs above replacement per team:
nonpitcher: (84-63)*9= 189 runs
pitcher: (5.4*.65+4.5*.35 - 4.3)*162= 127 runs

Total RAR = 189+127 = 316

The nonpitcher/pitcher split is 60/40.
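The arithmetic above, as a sketch that can be rerun for other run environments. The reading of the 9 as nine lineup slots of about 700 PA each, and of the 162 as games (ERA being per nine innings), is an interpretation of the post, not something it states.

```python
LEAGUE_RUNS_PER_700_PA = 84.0
LEAGUE_ERA = 4.30

# Replacement levels from the rough guideline: 75% / 125% / 105%.
repl_runs_per_700 = 0.75 * LEAGUE_RUNS_PER_700_PA  # 63 runs
starter_era = 5.40   # 125% of 4.30 is 5.375, rounded as in the post
reliever_era = 4.50  # 105% of 4.30 is 4.515, rounded as in the post

# Nonpitchers: assumed nine lineup slots of about 700 PA each.
nonpitcher_rar = (LEAGUE_RUNS_PER_700_PA - repl_runs_per_700) * 9

# Pitchers: 65% of innings to starters, 35% to relievers, over 162 games.
blended_repl_era = starter_era * 0.65 + reliever_era * 0.35
pitcher_rar = (blended_repl_era - LEAGUE_ERA) * 162

total_rar = nonpitcher_rar + pitcher_rar
print(round(nonpitcher_rar), round(pitcher_rar), round(total_rar))
print(round(100 * nonpitcher_rar / total_rar))  # nonpitcher share, percent
```

This gives 189, 127 and 316 runs, with nonpitchers at about 60% of the total.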

