THE BOOK--Playing The Percentages In Baseball


Thursday, June 12, 2008

Complete Run Expectancy, Retrosheet Years

By Tangotiger, 07:43 PM

RE chart for the entirety of Retrosheet years.

To read the first record: with the bases empty and 0 outs, there have been almost 2 million PA with almost 1 million runs scoring from that point to the end of the inning.  The average (R/PA) is 0.494 runs, which is labeled as REOI (runs to end of inning).  REOI_0 is the percentage of times that there were no runs scored at all in that inning (72.7% in this case).

Excluded are all partial innings, and home halves of 9th and later innings.  I am only looking at the base/out state at the start of the PA.
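
As a rough illustration of how the chart columns are defined (not the SQL used for the report), here is a minimal Python sketch. It assumes each PA has already been reduced to a hypothetical tuple of (bases, outs, runs scored from that PA to the end of the inning); the function name and the sample records are made up.

```python
from collections import defaultdict

def run_expectancy(plate_appearances):
    """Return {(bases, outs): (PA, runs, REOI, REOI_0)} from per-PA records."""
    pa = defaultdict(int)
    runs = defaultdict(int)
    scoreless = defaultdict(int)
    for bases, outs, rest_of_inning_runs in plate_appearances:
        key = (bases, outs)
        pa[key] += 1
        runs[key] += rest_of_inning_runs
        if rest_of_inning_runs == 0:
            scoreless[key] += 1
    # REOI is runs per PA; REOI_0 is the share of PA followed by no runs.
    return {k: (pa[k], runs[k], runs[k] / pa[k], scoreless[k] / pa[k]) for k in pa}

# Made-up sample: four PA that start with the bases empty and 0 outs.
sample = [("___", 0, 0), ("___", 0, 0), ("___", 0, 2), ("___", 0, 0)]
chart = run_expectancy(sample)
print(chart[("___", 0)])
```

Splitting further (by inning, score, or lineup slot) would only mean adding that field to the grouping key.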

Here’s the SQL


#1          (see all posts) 2008/06/13 (Fri) @ 12:18

Tom,
Wondering if it has any possible theoretical value to break out this data by inning, given that run scoring is not equal across all innings due to lineup construction issues. The differences are, in many cases, rather slight, but there is enough fluctuation that certain base-out situations in particular innings might be affected enough to register with respect to strategy.


#2    Tangotiger      (see all posts) 2008/06/13 (Fri) @ 14:14

Yes, definitely right, especially in the 9th inning.  I was thinking of also doing the chart by:
- inning/score
- lineup slot

I would highly recommend joining the RetroSQL group, as well as following along on the wiki.  All our code is open.  So, a strong encouragement to join to create your own database, and do exactly what I’m doing.  For example, if you wanted inning, it would require exactly three lines of change, and each one would simply be to add this line:
,INN_CT
It’ll be as easy as that.

Anyway, back to the breakdown: any further split will really mean that the data will only be available as a csv file, otherwise it’ll become unmanageable.


#3    Tangotiger      (see all posts) 2008/06/13 (Fri) @ 22:18

Here’s the RE chart for the top of the 9th.  A score of “-4” is the batting team down by at least 4 runs.  “4” means the batting team is up by at least 4 runs.  Every other score is exactly at that score.

Look at RE with bases empty 0 outs to figure out the quality of the hitters/pitcher.  You can easily tell when the closer is in there.

I like to compare the chances of no runs scoring with man on 1b 0 outs and man on 2b 1 out.

http://tangotiger.net/retrosheet/reports/re_top9.htm


#4    Colin Wyers      (see all posts) 2008/06/13 (Fri) @ 22:27

Great stuff, Tango.

I’m working on putting together a set of linear weights from the RE chart I generated - I should say I did figure out the linear weights values, looking at the average change in RE (plus runs scored) for each event. I’d love to share it, but a massive storm came through last night and took out my phone line and thus my Internet at home.

That said, everything seemed to match up decently with Palmer’s original LWTS, except doubles; Palmer had .85 and I come up with roughly .75. I don’t know how to explain it yet.
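
For readers following along, the calculation described above (average change in RE, plus runs scored, for each event) can be sketched in Python roughly as follows. The RE values, the play tuples and the function name are all illustrative assumptions, not the actual code or data.

```python
from collections import defaultdict

# Illustrative RE values only; substitute the chart you generated.
RE = {("___", 0): 0.50, ("1__", 0): 0.87, ("___", 1): 0.26}

def linear_weights(plays, re_chart):
    """Average (RE after - RE before + runs scored) for each event type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, start, end, runs in plays:
        # end is None when the play ends the inning, where RE is zero.
        end_re = re_chart[end] if end is not None else 0.0
        totals[event] += end_re - re_chart[start] + runs
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}

# Hypothetical plays: (event, start state, end state, runs scored on the play).
plays = [
    ("1B", ("___", 0), ("1__", 0), 0),
    ("HR", ("___", 0), ("___", 0), 1),
    ("OUT", ("___", 0), ("___", 1), 0),
]
weights = linear_weights(plays, RE)
print(weights)
```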


#5    Tangotiger      (see all posts) 2008/06/13 (Fri) @ 22:39

Palmer is wrong.  I published my LWTS numbers a few months ago on my blog. Should be easy enough to find.  Do a search for Retrosheet.

***

http://tangotiger.net/retrosheet/reports/re_bo.htm

This is by batting order and whether the DH was in effect.  Obviously, the DH is only from the last 35 years.  So, don’t make the comparisons too closely.

Also, when the cleanup hitter is up, he’ll be followed by the 5,6,7 hitters.  So, if you are wondering why “his” numbers are so low, it’s because they are not “his” but his and the guys that follow him.  But, with 2 outs, they are mostly “his”.



#7    Colin Wyers      (see all posts) 2008/06/13 (Fri) @ 22:47

Looks about like what I have - I’m going off of memory, because I only have Internet at work until they send out the service technician tomorrow. I’ll try and get my code cleaned up by then to have some stuff to post to the group.

I grouped mine by league and year, considering doing it by run environment. I know you’ve done that before:

http://www.insidethebook.com/ee/index.php/site/comments/linear_weights_by_run_environment/

But I was thinking - would it be better to do that on a per-game basis? Instead of grouping individual seasons by RPG, group individual games by RPG. I guess you could even go down to the inning level if you wanted. Thoughts?


#8    tangotiger      (see all posts) 2008/06/14 (Sat) @ 09:15

Definitely you can go to the inning level or game level.  But, and this is critical, you CANNOT group by runs (the output).  The grouping MUST be done by the inputs (preferably something like BaseRuns).

Otherwise, you group all the shutout games, and what are you going to have?  An impossibility, since how can a .100 OBP and .150 SLG lead to zero runs?

Always always always select your samples without knowledge of the future.  This is as important as location location location is to buying a house.

Always always always.


#9    Colin Wyers      (see all posts) 2008/06/14 (Sat) @ 13:25

Makes sense. Not something I’d want to implement in pure SQL - BaseRuns uses a bit more math than I really like doing in SQL.

I’m rewriting all of my code to exclude partial innings (and trying to get some better performance out of it) but in the meanwhile, here’s the weights I’ve got for all years, broken down by league:

http://www.editgrid.com/user/cwyers/linear_weights_version_0.1


#10    tangotiger      (see all posts) 2008/06/14 (Sat) @ 20:51

Colin, are you in the RetroSQL group?  I’ve got most of what you need already coded there.


#11    Colin Wyers      (see all posts) 2008/06/14 (Sat) @ 20:56

Yeah. I used your suggestion on the START_BASES_CD thing. As for the partial innings code, I rewrote it to use the INN_END_FL:

CREATE TABLE events_inning_partial
AS
SELECT GAME_ID, INN_CT, BAT_HOME_ID, END_BASES_CD
FROM events
WHERE ( INN_END_FL = 'T' AND (OUTS_CT + EVENT_OUTS_CT) < 3 );

Should be a little more expedient. I’ll post my full code there in a bit.
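
To try the idea without a full Retrosheet database, here is a small sketch that runs the same filter against an in-memory SQLite table from Python. The three sample rows are made up, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        GAME_ID TEXT, INN_CT INTEGER, BAT_HOME_ID INTEGER, END_BASES_CD INTEGER,
        INN_END_FL TEXT, OUTS_CT INTEGER, EVENT_OUTS_CT INTEGER)
""")
rows = [
    ("G1", 1, 0, 0, "F", 2, 0),  # mid-inning play
    ("G1", 1, 0, 0, "T", 2, 1),  # third out: a complete half inning
    ("G1", 9, 1, 1, "T", 1, 0),  # inning ends with 1 out: a partial inning
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Same filter as above: the inning ended before three outs were recorded.
conn.execute("""
    CREATE TABLE events_inning_partial AS
    SELECT GAME_ID, INN_CT, BAT_HOME_ID, END_BASES_CD
    FROM events
    WHERE INN_END_FL = 'T' AND (OUTS_CT + EVENT_OUTS_CT) < 3
""")
partial = conn.execute(
    "SELECT GAME_ID, INN_CT, BAT_HOME_ID FROM events_inning_partial"
).fetchall()
print(partial)
```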


#12    tangotiger      (see all posts) 2008/06/15 (Sun) @ 00:31

Yup, that should work out better than what I have.  Good catch.


#13    tangotiger      (see all posts) 2008/06/15 (Sun) @ 11:12

Your way is better obviously.  Performance-wise, I was surprised by how small the gain was: 65 seconds of running time against 84 seconds for my more convoluted way.  It’s a good gain, but not as large as I thought it would be.


#14    Tangotiger      (see all posts) 2008/06/17 (Tue) @ 13:59

I should also say, going back to the “run environment”, it needs to be done by “expected run environment”.  So, the chance that you may or may not bunt, may or may not steal, plays into the value of things.

If for example you have a game that’s a 6-hitter, and it was done in the 1960s with lots of bunts and steals, or it was a 6-hitter in a game pitched by Jose Lima in 2002, those are two different things.  How different in terms of establishing RE and LWTS, I don’t know yet.


#15    Tangotiger      (see all posts) 2008/06/17 (Tue) @ 16:18

I alerted Palmer as to my findings, and he said he’ll look into it.

He had said that he got his new (higher) doubles value (back in 2003) by updating his original model (which was based on limited World Series games) with more data (presumably all the PBP data he’s collected). 

Seeing that Ruane and I independently used, basically, the maximum available data, and we got the same results, and that my Markov program also got similar results (Table 7 in The Book), I recommended to him, and highly suggest to you guys, that you use what I’ve found.

The gap in run values is pretty static at .30 runs between the single and double, and .16 runs between the single and walk.  The HR value is 1.40, and the gap between the double and triple is around .58 runs.  All of this, regardless of run environment.


#16    tangotiger      (see all posts) 2008/07/13 (Sun) @ 23:31

Palmer just confirmed that his .85 value is indeed wrong, and that he should have used .78.  To his credit, he looked into it, and admitted his error very quickly.


#17    terpsfan101      (see all posts) 2008/08/29 (Fri) @ 18:09

Tangotiger,

Can you recall the exact years that you used for the calculations of the RE Chart? I assume 1999 isn’t included because Retrosheet released the PBP data approximately 2 weeks after you published this. So I’m assuming you used 1956-1998 and 2000-2007 for both leagues. And then 1954 and 1955 for the NL.


#18    tangotiger      (see all posts) 2008/08/29 (Fri) @ 19:27

Right, except no 1955 NL.


#19    terpsfan101      (see all posts) 2008/08/29 (Fri) @ 21:49

Thanks for the quick reply. You did include the 1955 NL. Using the years I listed in post#17, there were 886218 runs. From the PBP files you counted 885304 runs. If I exclude 1955 NL, there were 880640 runs.


#20    tangotiger      (see all posts) 2008/08/30 (Sat) @ 01:10

I might therefore have also included 1911 and 1921.

My DB does not have 1955 (yet) and so, it definitely is not part of the dataset.


#21    terpsfan101      (see all posts) 2008/08/30 (Sat) @ 08:12

1921 NL was released after you compiled this. But 1911 NL and 1922 NL were available. If you didn’t include 1955, I’m thinking you didn’t include 1954. I am now assuming that you used these years:

1911 NL
1922 NL
1956-1998 AL,NL
2000-2007 AL,NL

There were 886716 runs scored during these seasons. From the PBP files you counted 885304.

Why did you use R/PA instead of R/O?


#22    terpsfan101      (see all posts) 2008/08/30 (Sat) @ 22:02

It looks like the R/O is derived from R/PA. I can’t believe I didn’t figure that out sooner.

By the way, I do think you included NL 1954 in your sample. That gives us 892294 Runs from the official statistics. So if you counted 885304 runs from your sample, 7000 were unaccounted for. This seems reasonable considering that the PBP is missing a few games for most seasons prior to 1973 and the fact that you excluded partial innings and home halves of the 9th and later innings.


#23    tangotiger      (see all posts) 2008/08/30 (Sat) @ 23:13

Right, my post 18 only removed 55NL, not 54NL.


#24    terpsfan101      (see all posts) 2008/08/31 (Sun) @ 10:08

Thanks Tom. I’ve removed the partial innings and home halves of the ninth and later innings. I’ll take a stab at calculating the Linear Weights later tonight.


#25    terpsfan101      (see all posts) 2008/09/02 (Tue) @ 18:48

Here is what the RE chart from 1954 to 2007 looks like. This was a pain to calculate as I have next to no knowledge about databases. I had to manually delete all the partial innings. I had to wait 10-15 minutes for Access to run each query. What probably took Tom less than an hour to calculate, took me about 10 hours.

BASES OUTS PA RUNS REOI
___ 0 1851084 917545 0.496
___ 1 1304712 344457 0.264
___ 2 1026666 102873 0.100
1__ 0 480346 419844 0.874
1__ 1 550587 286985 0.521
1__ 2 553980 124011 0.224
_2_ 0 100437 112933 1.124
_2_ 1 207757 143532 0.691
_2_ 2 256903 85305 0.332
12_ 0 108773 162988 1.498
12_ 1 198399 183383 0.924
12_ 2 254060 112931 0.445
__3 0 18189 24852 1.366
__3 1 69196 66000 0.954
__3 2 107739 40668 0.377
1_3 0 45438 79810 1.756
1_3 1 94295 110217 1.169
1_3 2 122049 61327 0.502
_23 0 20969 41846 1.996
_23 1 56832 79650 1.401
_23 2 59544 36092 0.606
123 0 27312 64074 2.346
123 1 69456 108989 1.569
123 2 84652 65224 0.770

There were 917545 runs scored in 1787300 innings for a runs per inning of .513


#26    terpsfan101      (see all posts) 2008/09/02 (Tue) @ 18:57

Perhaps somebody can tell me how to properly format a table. I don’t understand why it looks correct before you post it, but then it gets reformatted.


#27    terpsfan101      (see all posts) 2008/09/12 (Fri) @ 19:54

Tangotiger or Colin,

I was wondering if you could tell me how I could link my RE chart to the events in my PBP database in Microsoft Access. For instance, how would I link all the events with Start_Bases_CD = 0 and Outs_CT = 0, to the .496 REOI value I calculated for this base-out state.

Thanks


#28    Colin Wyers      (see all posts) 2008/09/12 (Fri) @ 20:15

terpsfan - Possibly; I’m barely fluent in Access. What version of Access are you using?


#29    terpsfan101      (see all posts) 2008/09/12 (Fri) @ 21:38

Access 2002.


#30    tangotiger      (see all posts) 2008/09/13 (Sat) @ 00:02

I’m not sure I follow the problem.  Can you amplify the issue?


#31    terpsfan101      (see all posts) 2008/09/13 (Sat) @ 00:41

I was just wondering how to connect my RE table to my PBP database. I think that I need to do this by setting up a relationship between the 2 tables. I created 2 tables, “RE Start” and “RE End”. I connected “RE Start” to the PBP database using the fields START_BASES_CD and OUTS_CT. Then I connected “RE End” to the PBP database using END_BASES_CD and END_OUTS_CT. I defined END_OUTS_CT as (OUTS_CT + EVENT_OUTS_CT). Let me see if this works.


#32    Colin Wyers      (see all posts) 2008/09/13 (Sat) @ 00:43

I’m pretty sure that the question is just how to perform an SQL join. (Somebody really should put out a “Databases for Sabermetricians” resource one of these days. I nominate Brian or Mat.)

terpsfan, I presume you have your RE data as a database table. I would go ahead and add a new column, and use that for the codes used in the START_BASES_CD column. Then, create a new Query and go into design view. Add the two tables. You should just be able to perform a join by dragging lines between the items you want to match.
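
Outside the Access design view, the same join can be written in SQL. Below is a minimal sketch using Python and an in-memory SQLite database. The table name `re`, the column `BASES_CD`, the `EVENT_ID` column and the sample rows are assumptions for illustration; the three REOI values are taken from the chart in post #25, and a bases code of 1 is assumed to mean a runner on first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE re (BASES_CD INTEGER, OUTS_CT INTEGER, REOI REAL);
    CREATE TABLE events (
        EVENT_ID INTEGER, START_BASES_CD INTEGER, OUTS_CT INTEGER,
        END_BASES_CD INTEGER, EVENT_OUTS_CT INTEGER, EVENT_RUNS_CT INTEGER);
    INSERT INTO re VALUES (0, 0, 0.496), (1, 0, 0.874), (0, 1, 0.264);
    INSERT INTO events VALUES (1, 0, 0, 1, 0, 0), (2, 0, 0, 0, 1, 0);
""")

# Join the RE table twice: once on the starting state, once on the ending state.
# The LEFT JOIN plus COALESCE gives an ending RE of 0 when the inning is over.
query = """
    SELECT e.EVENT_ID, s.REOI AS RE_START, COALESCE(f.REOI, 0.0) AS RE_END
    FROM events AS e
    JOIN re AS s
      ON s.BASES_CD = e.START_BASES_CD AND s.OUTS_CT = e.OUTS_CT
    LEFT JOIN re AS f
      ON f.BASES_CD = e.END_BASES_CD
     AND f.OUTS_CT = e.OUTS_CT + e.EVENT_OUTS_CT
    ORDER BY e.EVENT_ID
"""
joined = conn.execute(query).fetchall()
print(joined)
```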


#33    terpsfan101      (see all posts) 2008/09/13 (Sat) @ 00:47

Yes it worked! Now I just need to find an expression to calculate the R/O for plays where there was an out recorded. I’ll need help with this.


#34    Colin Wyers      (see all posts) 2008/09/13 (Sat) @ 01:22

Take the value for your starting RE (none on, none out) and divide by three. Then, start a new query. There should be a column in your database called something like EVENT_OUTS_CT. I’ll presume your events table is called “events.” Go into SQL mode, and type something like this:

SELECT events.Event_CD, AVG(events.Event_Outs_CT)
FROM events
GROUP BY events.Event_CD;

That’ll (if I’m correct on guessing your naming conventions) give you the outs per event. Take that number, and multiply it by your out value above. Then subtract that value from the appropriate LWTS value, and you should be set.

(Note: I’m simply instructing on how to do it; check the Preview to Run Estimation thread for discussion on whether or not this is the RIGHT way to do it.)
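
The arithmetic in those steps, written out literally as a Python sketch. The function name and the sample inputs are hypothetical, and whether the out adjustment is subtracted or added depends on which direction you are converting, so treat the sign here as an assumption that simply follows the wording above.

```python
def out_adjusted_weight(lwts_value, outs_per_event, re_empty_no_outs):
    """Apply the average out value to one event's LWTS value."""
    run_per_out = re_empty_no_outs / 3          # starting RE divided by three
    return lwts_value - outs_per_event * run_per_out

# Hypothetical inputs: an event worth 0.10 runs that averages exactly one out,
# with a bases-empty, no-outs RE of 0.496.
adjusted = out_adjusted_weight(lwts_value=0.10, outs_per_event=1.0,
                               re_empty_no_outs=0.496)
print(round(adjusted, 4))
```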


#35    terpsfan101      (see all posts) 2008/09/13 (Sat) @ 01:37

Colin,

Right, you can apply either the average R/O or the empirical R/O. I was more interested in applying the empirical R/O. For instance a Sacrifice Hit with no outs has an R/O of .232, according to the table in post #25. I think I can figure out how to implement this in my database. I’ll let you know how it goes. Thanks for your help.
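
One reading that reproduces the .232 figure is the drop in bases-empty REOI from 0 outs to 1 out in the post #25 table (0.496 to 0.264). The sketch below is an inference along those lines rather than a stated method, and the function name is made up.

```python
# Bases-empty REOI by outs, from the table in post #25.
REOI_EMPTY = {0: 0.496, 1: 0.264, 2: 0.100}

def empirical_run_per_out(outs):
    """Run cost of one more out with the bases empty, at the given out count."""
    after = REOI_EMPTY.get(outs + 1, 0.0)   # three outs: inning over, RE is 0
    return REOI_EMPTY[outs] - after

for outs in (0, 1, 2):
    print(outs, round(empirical_run_per_out(outs), 3))
```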


#36    terpsfan101      (see all posts) 2008/09/13 (Sat) @ 06:14

I figured out how to apply the R/O and was able to calculate both average and runs created Linear Weights. However, I noticed a problem with my RE table. To get the “Runs to the End of the Inning” for each base-out state, I took EVENT_RUNS_CT + FATE_RUNS_CT when PA_NEW = T. Is this correct? When I take the sum of EVENT_RUNS_CT for all base-out states, I get 885,198 runs. So why do I keep getting 917,545 Runs to the end of the inning for the Bases Empty No Out state?


#37    terpsfan101      (see all posts) 2008/09/13 (Sat) @ 06:47

The sum of my Runs Created LW is 885,200 runs. So it does look like the sum of EVENT_RUNS_CT (885,198 runs) gives you the total number of runs scored. Now I really am confused as to why the REOI for the Bases Empty No Out state is 30,000+ runs higher than the actual runs scored.


#38    Colin Wyers      (see all posts) 2008/09/13 (Sat) @ 11:19

I don’t know if this is the reason, but states do repeat themselves in the same inning, so you will end up double counting some runs. For example, let’s say you have an inning where 9 runs will score. First hitter hits a solo home run. So that’s an REOI of 9 runs. The next PA - same base-out state, 8 runs to score in the inning.  He hits a solo home run. Next batter, still the same base-out state, 7 fate runs in the inning. So only 9 runs score this inning, but already that’s 9+8+7 = 24 runs added to the 0 on, 0 out baseout state.
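
The same example as a small Python sketch. The fourth entry, which lumps the remaining 6 runs into some other base-out state, is a hypothetical filler so the inning totals 9 runs.

```python
from collections import defaultdict

def reoi_totals_by_state(inning):
    """Sum runs-to-end-of-inning over every PA, grouped by starting state."""
    remaining = sum(runs for _, runs in inning)
    totals = defaultdict(int)
    for state, runs_on_play in inning:
        totals[state] += remaining      # runs from this PA to the end of the inning
        remaining -= runs_on_play
    return dict(totals)

# (starting state, runs scored on the play): three solo home runs, then the rest.
inning = [("empty_0_out", 1), ("empty_0_out", 1), ("empty_0_out", 1), ("other", 6)]
totals = reoi_totals_by_state(inning)
print(totals["empty_0_out"], sum(runs for _, runs in inning))
```

Only 9 runs score, but the bases-empty, 0-out state is credited with 9 + 8 + 7 = 24.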


#39    terpsfan101      (see all posts) 2008/09/13 (Sat) @ 13:07

Colin,

Thanks for pointing out that certain base-out states repeat themselves in the same inning. So it does look like I calculated the RE chart correctly.


#40    terpsfan101      (see all posts) 2008/09/13 (Sat) @ 14:01

Colin,

What column did you use to count Plate Appearances? I just noticed that if you count the instances where PA_NEW = T, you end up double counting Plate Appearances in certain situations. For instance, if a runner was caught stealing for the 3rd out, that batter would be charged with a plate appearance to end the inning and then be charged with another one when he leads off the next inning.


#41    Colin Wyers      (see all posts) 2008/09/13 (Sat) @ 14:24

I’ve never bothered to compute plate appearances for an RE/LWTS table - events like SB/CS and WP/PB aren’t plate appearances, but you still want to include them. Since any events like you describe would be assigned to an event code for a non-PA event, it should work out fine.

I use BAT_EVENT_FL to derive PAs normally (but, again, not for RE charts). If you want to exclude partial PAs, use the PA_TRUNC_FL field.


#42    terpsfan101      (see all posts) 2008/09/13 (Sat) @ 16:39

Here are the LW from 1954-2007 grouped by Event Code and SH flag (True or False). I set the “SH flag” to false for all events “not including” the Sacrifice Hit so I could avoid double counting the Sac Hit. I also split the Fielder’s Choice event code into 2 categories: Reached on Fielder’s Choice and Fielder’s Choice Out. Eventually, I plan on refining these LW. For instance, about 3000 Caught Stealing appear in the Event Code for Strikeouts. I haven’t figured out the best way to split up these sort of things just yet. When I do, I’ll post the results on editgrid.com. Partial and home-half of the ninth and later innings were excluded. The Runs Per Game for the games in my play-by-play database (from which this sample is drawn) was 4.41 Runs/Game.

EVENT, LW, RC, Frequency

Single, 0.461, 0.465, 1234346
Double, 0.767, 0.770, 310566
Triple, 1.047, 1.048, 44542
Home Run, 1.402, 1.402, 177434
Walk, 0.308, 0.308, 598605
Intentional Walk, 0.167, 0.167, 56431
Hit by Pitch, 0.333, 0.333, 49236
Error, 0.481, 0.483, 86015
Reached on Fielder’s Choice, 0.676, 0.676, 6328
Interference, 0.371, 0.371, 897
Sacrifice Hit, -0.095, 0.105, 77631
Generic Out, -0.271, -0.099, 3852288
Strikeout, -0.276, -0.112, 1133900
Fielder’s Choice Out, -0.626, -0.446, 18176
Stolen Base, 0.200, 0.200, 113332
Caught Stealing, -0.437, -0.279, 48746
Pickoff, -0.236, -0.126, 24351
Balk, 0.270, 0.270, 9497
Passed Ball, 0.275, 0.275, 15060
Wild Pitch, 0.279, 0.279, 55378
Def Indifference, 0.131, 0.131, 1600
Other Advance, -0.445, -0.315, 2450

Runs Scored = 885198


#43    terpsfan101      (see all posts) 2008/09/15 (Mon) @ 17:10

First, let me apologize for hijacking this thread. This really isn’t the place for me to be seeking advice. A forum would be a more appropriate place. I am looking for advice on how to split up SB, CS, Pickoffs, WP, and PB that occur on events where a plate appearance occurs. For instance, K+SB, K+CS, BB+CS, etc.... I cannot find an easy way to do this. For example, say there is a runner on 2nd base who is thrown out trying to steal third on “ball four” with 2 outs. What would the appropriate LW value for the walk be, and likewise for the CS?


#44    Colin Wyers      (see all posts) 2008/09/15 (Mon) @ 17:27

terpsfan - I suppose it depends on what dataset you plan on applying the LWTS to.


#45    terpsfan101      (see all posts) 2008/09/15 (Mon) @ 17:55

I guess that the dataset I’d plan on applying the LWTS to would be a hybrid of the “official statistics” plus a few Retrosheet categories such as ROE, RFC, and, Pickoffs (not Pickoff CS). Perhaps PB, WP, and BK if I can determine how to give credit to the baserunners on the latter 3 categories. I was just unsure how to break up a Retrosheet event that falls into 2 of my categories.


#46    Colin Wyers      (see all posts) 2008/09/15 (Mon) @ 22:09

If you’re creating your own dataset, then just be consistent with it - whatever you do for the LWTS, do for the data you’re parsing out of the gamelogs. I’d just use whatever Chadwick produced for event codes, frankly, as that’s probably the easiest.


#47    terpsfan101      (see all posts) 2008/09/15 (Mon) @ 22:34

After taking a look at the data, I have to agree with you Colin. There may be some value in separating the non-PA events that occur during PA’s, but it would just require too much time and effort to be worthwhile. It would require that you look at most plays one-at-a-time.

Thanks.


#48    Colin Wyers      (see all posts) 2008/09/15 (Mon) @ 23:47

There are a lot of plays on base that simply aren’t recorded anywhere in the official stats. We can study those issues if we want, with the Retrosheet data. But it does add an extra level of complexity to things, and for simple offensive linear weights to be applied to hitters I don’t see the value of it. (Now, if you want to do baserunning linear weights, then you do want to account for ALL baserunning plays - extra bases and outs on base.)


#49    terpsfan101      (see all posts) 2008/09/19 (Fri) @ 02:55

Alright, I figured out how to split BB and SO plays where a SB or CS occurs. First, you figure the run expectancy of the BB or SO as if the SB or CS did not occur. Then you figure the run expectancy of the SB or CS by acting as if it occurred after the BB or SO.
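
A worked example of that two-step split, as a Python sketch. It uses a K+SB with a runner on first and 1 out, with REOI values from the table in post #25; the function names and the choice of play are illustrative assumptions.

```python
# REOI values from the post #25 table: (bases, outs) -> runs to end of inning.
RE = {("1__", 1): 0.521, ("1__", 2): 0.224, ("_2_", 2): 0.332}

def re_value(state):
    """RE for a state; any state with three outs is worth zero."""
    bases, outs = state
    return 0.0 if outs >= 3 else RE[state]

def split_play(start, after_pa_event, final, runs_on_pa=0, runs_on_running=0):
    """Value the BB or SO as if no SB/CS occurred, then value the SB/CS after it."""
    pa_value = re_value(after_pa_event) - re_value(start) + runs_on_pa
    running_value = re_value(final) - re_value(after_pa_event) + runs_on_running
    return pa_value, running_value

# Strikeout first (runner stays on first, now 2 outs), then the steal of second.
so_value, sb_value = split_play(("1__", 1), ("1__", 2), ("_2_", 2))
print(round(so_value, 3), round(sb_value, 3))
```

The two pieces add up to the RE change of the whole play, so nothing is lost or double counted by the split.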

There are quite a few ways you can calculate Linear Weights from Retrosheet data depending on how you group things. One of the first things I wanted to attempt was to calculate a standard set of Linear Weights that could be applied to the official Batting statistics. Obviously, this is not the ideal way to calculate Linear Weights because not every baseball event is classified by the Official Batting stats. However, most baseball fans only have access to the official statistics. So I will present them here. The first set of Linear Weights is from my Play by Play database which runs from 1954 to 2007. These are not reconciled to runs scored. They are about 20,000 runs short. However, I would use these to calculate Custom Linear Weights with BaseRuns. I would then reconcile the Custom Linear Weights to Runs Scored for the dataset I’m looking at. This way you do not introduce inaccuracies by reconciling the Linear Weights twice.

Unreconciled (PBP Database 1954-2007)

EVENT: LW, RC, N
1B: 0.4613, 0.4646, 1234346
2B: 0.7667, 0.7697, 310566
3B: 1.0467, 1.0479, 44542
HR: 1.4020, 1.4020, 177434
UIBB: 0.3079, 0.3079, 598605
IBB: 0.1665, 0.1665, 56431
HBP: 0.3330, 0.3330, 49236
SH: -0.0948, 0.1053, 77631
OOUT: -0.2360, -0.0735, 3751640
SO: -0.2746, -0.1115, 1133900
SF: -0.0316, 0.1524, 56914
GIDP: -0.7991, -0.4845, 154255
SB: 0.1907, 0.1909, 123612
CS: -0.4247, -0.2712, 62773
RUNS: 885198
EVENTS: 7831885
RUNS/G: 4.41

OOUT = AB – H – SO – GIDP

The run values of Reached on Error (ROE) and Reached on Fielder’s Choice (RFC) are implied in OOUT since they are a subset of AB-H-SO-GIDP. That is why you see the .025 in the “a” coefficient of the Baseruns formula below. ROE and RFC occurring on a Sacrifice Hit are not included in OOUT since the batter isn’t charged with an at-bat. ROE SF are excluded from this category as well.

The second set of Linear Weights are reconciled to Runs Scored. This means that the Linear Weights sum to zero, and the Runs Created figures sum to Runs Scored. I applied the LW from my PBP database to the League stats from 1954 to 2007 using the Baseball Databank database. The IBB data from 1954 was obtained from Retrosheet’s website.

Reconciled to Runs Scored (Baseball Databank Database 1954-2007)

EVENT: LW, RC, N
1B: 0.4639, 0.4667, 1294104
2B: 0.7692, 0.7717, 322795
3B: 1.0493, 1.0500, 46310
HR: 1.4041, 1.4041, 185378
UIBB: 0.3100, 0.3100, 627082
IBB: 0.1686, 0.1686, 64079
HBP: 0.3351, 0.3351, 51204
SH: -0.0922, 0.1074, 83681
OOUT: -0.2333, -0.0714, 3927312
SO: -0.2721, -0.1095, 1191429
SF: -0.0290, 0.1545, 58924
GIDP: -0.7964, -0.4825, 160291
SB: 0.1930, 0.1930, 127253
CS: -0.4221, -0.2692, 63773
RUNS:  921602
EVENTS: 8203615
GAMES: 209098
RUNS/G: 4.41

OOUT = AB – H – SO – GIDP

Here are two Baseruns equations for the above two sets of data. The A factor represents initial baserunners and the C factor represents all outs.

Baseruns Equation (Unreconciled)

A = 1B + 2B + 3B + UIBB + IBB + HBP + .08*SH + .025*OOUT

B = 0.753*1B + 2.112*2B + 3.352*3B + 1.791*HR + 0.054*UIBB + -0.576*IBB + 0.166*HBP + 0.750*SH + 0.049*OOUT + -0.077*SO + 1.099*SF + -1.319*GIDP + 0.851*SB + -0.789*CS

C = 0.92*SH + 0.975*OOUT + SO + SF + 2*GIDP + CS

D = HR

BsR = A * B / (B + C) + D

Baseruns Equation (Reconciled to Runs Scored)

A, C, and D are the same as above.

B = 0.766*1B + 2.114*2B + 3.344*3B + 1.786*HR + 0.074*UIBB + -0.551*IBB + 0.185*HBP + 0.753*SH + 0.057*OOUT + -0.069*SO + 1.098*SF + -1.302*GIDP + 0.853*SB + -0.775*CS

BsR = A * B / (B + C) + D
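
For anyone who wants to plug numbers in, here is a minimal Python sketch of the unreconciled equation above, fed with the event counts from the unreconciled table. The function and dictionary names are illustrative; the coefficients and counts are copied from this post.

```python
def base_runs(s, b_coef):
    """BsR = A * B / (B + C) + D, using the A, C and D factors listed above."""
    a = (s["1B"] + s["2B"] + s["3B"] + s["UIBB"] + s["IBB"] + s["HBP"]
         + 0.08 * s["SH"] + 0.025 * s["OOUT"])
    b = sum(coef * s[event] for event, coef in b_coef.items())
    c = (0.92 * s["SH"] + 0.975 * s["OOUT"] + s["SO"] + s["SF"]
         + 2 * s["GIDP"] + s["CS"])
    d = s["HR"]
    return a * b / (b + c) + d

# B coefficients from the unreconciled equation.
B_UNRECONCILED = {
    "1B": 0.753, "2B": 2.112, "3B": 3.352, "HR": 1.791, "UIBB": 0.054,
    "IBB": -0.576, "HBP": 0.166, "SH": 0.750, "OOUT": 0.049, "SO": -0.077,
    "SF": 1.099, "GIDP": -1.319, "SB": 0.851, "CS": -0.789,
}

# Event counts (N) from the unreconciled table, PBP database 1954-2007.
PBP_TOTALS = {
    "1B": 1234346, "2B": 310566, "3B": 44542, "HR": 177434, "UIBB": 598605,
    "IBB": 56431, "HBP": 49236, "SH": 77631, "OOUT": 3751640, "SO": 1133900,
    "SF": 56914, "GIDP": 154255, "SB": 123612, "CS": 62773,
}

estimate = base_runs(PBP_TOTALS, B_UNRECONCILED)
print(round(estimate))
```

Swapping in the reconciled B coefficients and the Baseball Databank counts only requires two more dictionaries, since A, C, and D are unchanged.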


#50    tangotiger      (see all posts) 2008/09/19 (Fri) @ 06:42

Good job.

There’s no reason to include SF for reasons stated too often.


#51    terpsfan101      (see all posts) 2008/09/19 (Fri) @ 14:15

I am against including the SF separately as well. One could make the argument that we shouldn’t be including any situational statistics (IBB, SH, SF, GIDP) in our run estimators.


#52    Tangotiger      (see all posts) 2008/09/19 (Fri) @ 14:33

You bet.

SF is an accounting method.  There is no difference at all between a GB that scores the runner from 3B and gets the batter out and a FB that does the same thing.

We don’t differentiate between a double that scores a runner and one that doesn’t.

By using SF, you are using the knowledge that a run scored, because the accounting told you. 

You can make a similar case with the GIDP (takes known effect of runner on base).

IBB takes a known state of 1B likely open and likely 1 or 2 outs.

SH takes a known state of man on base and less than 2 outs and batter likely making out.

The SF is the worst of the lot because of the arbitrariness of FB/GB, and the thing we are looking for (runs) is guaranteed by knowing the accounting of the event.


#53    Colin Wyers      (see all posts) 2008/09/19 (Fri) @ 17:51

I think the difference - at least from a run estimation point of view - is that the other events tell us components: bases, advancement and outs. Sacrifice flies is the only one that just cuts to the chase and tells us “hey, I scored a run!” I think there’s a clear difference between giving us inputs based on situational stats and giving us the output (runs).


#54    terpsfan101      (see all posts) 2008/09/21 (Sun) @ 08:36

These are the categories I have decided to use for Linear Weights. I am including the Sac Fly for historical purposes only. It will be a helpful category when I calculate Custom LWTS from 1908-1930 and 1939. Sac Flies were lumped together with Sac Hits in those seasons.

None of the categories overlap into another category. I don’t know how I managed to do this, but the sum of the RC weights reconciled to Runs Scored, even after splitting everything up. 

EVENT: LW, RC, N

1B: 0.461, 0.465, 1234346
2B: 0.767, 0.770, 310566
3B: 1.047, 1.048, 44542
HR: 1.402, 1.402, 177434
UIBB: 0.308, 0.308, 598605
IBB: 0.167, 0.167, 56431
HBP: 0.333, 0.333, 49236
INT: 0.371, 0.371, 897
ROE: 0.477, 0.479, 88749
RFC: 0.676, 0.676, 6328
SH: -0.095, 0.105, 77631
SF: -0.032, 0.152, 56914
OUT: -0.255, -0.088, 3656563
SO: -0.274, -0.112, 1133900
GIDP: -0.799, -0.485, 154254
SB: 0.191, 0.191, 123612
CS: -0.425, -0.272, 56056
PkO CS: -0.422, -0.268, 6717
PkO: -0.530, -0.364, 10114
PkO Error: 0.304, 0.304, 7745
BK: 0.270, 0.270, 9497
PB Non-PA: 0.275, 0.275, 15061
WP Non-PA: 0.279, 0.280, 55380
Def Ind: 0.131, 0.131, 1600
Other Advance: -0.445, -0.315, 2450
RUNS: 885198
RUNS/G: 4.41

OUT = AB – H – SO – GIDP – ROE – RFC


#55    terpsfan101      (see all posts) 2008/09/22 (Mon) @ 06:28

I don’t know what I was thinking including GIDP and SF in the set of LW I listed right above this post. If I was going to use a value-added LW or RC method, then it would be OK to measure the run impact of the GIDP and SF. My reasoning behind including Defensive Indifference and Other Advance was nothing other than to reconcile the linear weights without having to fudge them.


#56    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 06:12

I believe that the correct denominator for Runs to the End of the Inning (REOI) should be “Events”, not the PA_NEW Flag. I’ll have to confirm this later tonight. In the meantime, I’ll tell you how I arrived at this premise.

When I calculated my linear weights, they didn’t sum to zero (approximately 700 runs short). At first I thought the culprit was Foul Errors. After removing Foul Errors and calculating a new RE chart, I was now 720 runs short. So it looks like the denominator needs to increase, not decrease, for the LW to reconcile to zero. Now the only way to increase the denominator is to use “Events” as our denominator as opposed to the “PA_NEW” flag.

Sorry for the poor explanation. I wish that I could test this right now. It’s going to be bugging me for the rest of the day.


#57    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 11:35

I got a chance to test the results of using all Events as the denominator for REOI. It got me 100 runs closer to zero. So my LW now sum to -600 runs.


#58    Tangotiger      (see all posts) 2008/09/25 (Thu) @ 12:02

If you exclude all partial innings and all home half of the ninth, you should sum to exactly 0 if you use all events.

If you don’t sum to zero, it may be something as benign as a rounding issue.


#59    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 12:23

The new LW values I’m getting closely resemble Tango’s:

http://www.insidethebook.com/ee/index.php/site/comments/actual_wins_retrosheet_years/

Tangotiger has really got me confused here. It appears as though he used “Events” as the denominator for REOI in the link I just listed since I just replicated the work he did there and calculated LW within .001 runs of his results. If I use the methods he described in the beginning of this post, the LW vary in some cases by .01 to .02 runs compared to what he calculated in the “Actual Wins, Retrosheet Years” post.

Which method should I use?


#60    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 12:29

I’m 99.9% sure that all partial innings were removed. I manually removed them and then double checked by making sure that the sum of outs for each game was divisible by 3. I even removed partial innings where the game ends with “no outs.” I definitely removed all the home-half of the ninth and later innings.


#61    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 12:34

There were only about 200 games with partial innings after removing the home-half of the ninth and later innings. I doubt rounding is an issue. Access is storing 15 digits after the decimal point.


#62    Tangotiger      (see all posts) 2008/09/25 (Thu) @ 13:38

I’m pretty sure I did not use all events, and only used the events at the start of the PA.

The effect here is that if you include all events, then the LWTS won’t come out to zero.  In essence, by doing the start of the PA, the multiple events in a PA get rolled up into that one PA.  For example, a SB followed by a HR counts as one PA, not two events.

I’ve done it both ways, and I think I prefer the PA-version, not the event-version.
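
To make the PA-version versus event-version distinction concrete, here is a minimal sketch on a made-up three-row play-by-play. The rows and the state labels are hypothetical; only the field names PA_NEW_FL, EVENT_RUNS_CT and FATE_RUNS_CT come from this thread.

```python
from collections import defaultdict

# Hypothetical mini play-by-play. Each row is one event:
# (base-out state before the event, PA_NEW_FL, EVENT_RUNS_CT + FATE_RUNS_CT)
EVENTS = [
    ("1__ 0", True, 2),   # stolen base: first event of the PA
    ("_2_ 0", False, 2),  # home run: same PA, so PA_NEW_FL is false
    ("___ 0", True, 0),   # next batter, bases empty
]


def run_expectancy(rows, pa_start_only):
    """Average runs to end of inning per base-out state."""
    runs = defaultdict(int)
    count = defaultdict(int)
    for state, pa_new, reoi in rows:
        if pa_start_only and not pa_new:
            continue  # PA version: skip rows that do not start a PA
        runs[state] += reoi
        count[state] += 1
    return {state: runs[state] / count[state] for state in count}


pa_version = run_expectancy(EVENTS, pa_start_only=True)
event_version = run_expectancy(EVENTS, pa_start_only=False)
print(sorted(pa_version))     # states seen at the start of a PA
print(sorted(event_version))  # also includes the mid-PA "_2_ 0" state
```

In the PA version the runner-on-second state never appears, because the steal and the home run are rolled into the one PA that began with a runner on first.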


#63    Tangotiger      (see all posts) 2008/09/25 (Thu) @ 13:42

The chart in the main link shows exactly how many PA at that base/out state.  So, it should be easy enough to confirm what I did and did not use.

Actually, I just noticed I provided the exact SQL! The only thing we can be differing on is the years being used.

So, I’m not sure why you are manually removing anything at all.  Just set up your DB as I did.


#64    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 14:07

When you get some free time (which you probably don’t have a lot of), see what your un-rounded LW from “Actual Wins, Retrosheet Years” sum to.

Thanks!

BTW,

I don’t know how to use SQL, not even in Access. I used your SQL only as a guide. What I should be doing in a few easy steps ends up turning into a complex mess of tables and relationships. Further complicating matters is the 2GB file-size limitation.


#65    Tangotiger      (see all posts) 2008/09/25 (Thu) @ 14:49

terps: it’s probably in your interest to learn at this point.  It’s not that bad, especially since I’m giving you the step-by-step after your MySQL or other DB is installed.

I don’t have the unrounded numbers, but my rounded numbers do total +580 runs (from post 59 above).  If I change my out value from -.270 to -.270164 (just as an illustration), I get exactly zero.
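
The out-value illustration above can be sketched as a small calculation. The +580 residual and the -.270 out value are the figures quoted; the number of outs is not given anywhere in the thread, so it is back-solved here from the quoted shift of .000164 and should be read as an assumption.

```python
# Figures quoted in the post above.
residual_runs = 580.0     # what the rounded weights sum to
old_out_value = -0.270    # rounded run value of an out

# ASSUMPTION: the out count is not given in the thread. It is back-solved from
# the quoted shift of 0.000164 runs per out, so treat it as illustrative.
outs = round(residual_runs / 0.000164)


def rebalanced_out_value(out_value, residual, n_outs):
    """Shift the out value so the residual is spread evenly across all outs."""
    return out_value - residual / n_outs


new_out_value = rebalanced_out_value(old_out_value, residual_runs, outs)
new_residual = residual_runs + (new_out_value - old_out_value) * outs
print(f"{new_out_value:.6f}")
```

The point is only that a residual of a few hundred runs over millions of outs sits well inside the rounding of a three-decimal out value.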


#66    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 15:45

If you look at the RE table I listed in post #25, it essentially matches the one you listed in the “main link.” Because my PBP database includes 1999 and 2007 (5 Runs/Game), the results are a tad higher. For some events, I get slightly different LW using the method you defined in your SQL (PA_NEW_FL = T) than if I use “all events”, such as you did in “Actual Wins, Retrosheet Years.” For instance, using PA_NEW_FL for the RE chart I get a LW of 1.047 for the triple. Including every event in the RE Chart, I get 1.033, which is the exact same value you listed in “Actual Wins, Retrosheet Years.”

I’m pretty sure that if you were to calculate LW using either method, you would end up with a slightly negative number. I am not saying that this is your fault. Am I right in thinking that the fault lies in the way Run Expectancy charts are constructed? I just checked and Colin’s LW don’t sum to zero either. The strange thing is that using both the PA and Event approach, the absolute Linear Weights sum to runs scored.


#67          (see all posts) 2008/09/25 (Thu) @ 18:50

terpsfan, I use Access and I’m not sure how you’re doing all the calculations without using SQL - if you use queries, then you are using SQL, even if you are using a wizard and not coding it by hand.

You do mention having many tables, so maybe you aren’t using queries as much as you should. Whether in Access or MySQL, it’s the queries which give you the power and will probably simplify the process. For example, my MLE/Projection tool has season, league, team, player and batting as the only tables, then has a sequence of queries. I only have to click on the Projections query to see the final result.


#68    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 19:21

Brian,

I’m slowly getting the hang of Access. I’ve started to group some of my tables together. Instead of having 30 separate tables with separate events in them and linking them all to the Run Expectancy tables, I’ve now managed to cut it down to 15 tables. So I still have to run 15 queries to get Linear Weights. This is probably because I broke things into more categories than the Event Codes Retrosheet uses. Everything reconciles to the total number of outs and runs scored. Not the Linear Weights, of course, which keep summing to a slightly negative number.


#69    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 19:53

Tangotiger,

What is the sum of EVENT_RUNS_CT for your PBP database? I was wondering if it equals the 885304 runs that is listed in your RE chart for the “bases-empty no out” state.


#70    tangotiger      (see all posts) 2008/09/25 (Thu) @ 20:13

It would have to be that.


#71    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 20:17

Can you double check?


#72    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 20:29

The number of runs scored in my database is 885,198 runs. My RE chart lists 917,545 for the bases-empty, no outs state. Colin stated earlier in the thread that he believes this is due to the fact that certain base-out states can repeat themselves throughout an inning. See Colin’s explanation in post # 38.


#73    terpsfan101      (see all posts) 2008/09/25 (Thu) @ 20:55

Tango,

I get exactly 859,000 runs scored from the PBP data using the years you listed here:

http://tangotiger.net/retrosheet/database/common/utilities/Retro_Events.txt

You list 885,304 REOI

Our RE Charts are over-predicting runs-scored. In your case, 26000 runs. In my case 32000 runs.

Is this OK?


#74    terpsfan101      (see all posts) 2008/09/26 (Fri) @ 14:56

Actually Runs scored shouldn’t match the Runs total for the bases-empty no-out state. Since our definition of Plate Appearances differs from the official definition, then our LW shouldn’t sum to zero. Because we are over-estimating PA’s, we are over-estimating Runs. This is probably causing the LW to come out slightly negative. The most logical way to reconcile the RE chart would be to adjust (EVENT_RUNS_CT + FATE_RUNS_CT) in proportion to the frequency that each base-out state occurs, until the LW = 0. Then we divide by Plate Appearances (PA_NEW_FL). In most cases, we are talking about a .001 fudge for each Base-Out state. Perhaps .002 for the states that occur less frequently. Problem solved, I think.

Tom, in your Custom RE chart, the denominator for the “Frequency” column is “PA” from the bases-empty no-out state, right? Thanks for these by the way:

http://www.insidethebook.com/ee/index.php/site/comments/run_expectancy_by_run_environment/


#75    terpsfan101      (see all posts) 2008/09/26 (Fri) @ 20:20

The RE Fudge worked! The LW now sum to zero.


#76    terpsfan101      (see all posts) 2008/10/02 (Thu) @ 07:13

I finally calculated empirical LW for all of the Retrosheet years (1911 NL, 1921 NL, 1922 NL, 1953 NL, and 1954-2008 ML). I didn’t fudge the RE charts, like I had discussed in my previous 2 posts. The LW come pretty close to summing to zero. They sum to -228. When we’re talking about almost 1,000,000 runs, I can tolerate only being -228 runs off. I wonder if they would have summed to zero had I used the “Events approach” as opposed to the “start of the Plate Appearance” approach. Just splitting hairs here. Anyway, I just published them on Google Docs.

The first spreadsheet is for the Linear Weights:

http://spreadsheets.google.com/pub?key=pzy9IhjJPqas3SLH5qvVTYg&hl=en

The second spreadsheet is for the Run Expectancy Data:

http://spreadsheets.google.com/pub?key=pzy9IhjJPqasczX-d6q_eUA&hl=en


#77    terpsfan101      (see all posts) 2008/10/02 (Thu) @ 07:32

I know this is overkill, but here is how I initially broke things up. Some of this data was listed in the summary section of the first LW spreadsheet I posted:

http://spreadsheets.google.com/pub?key=pzy9IhjJPqauR12Iog9suPg


#78    terpsfan101      (see all posts) 2008/10/02 (Thu) @ 14:55

Consider these spreadsheets “Beta versions.” In this case, beta meaning that the presentation needs to be improved upon.


#79    terpsfan101      (see all posts) 2008/10/04 (Sat) @ 02:03

Grouping together the Linear Weights I calculated from 1993-2007 for the AL, I noticed that the Linear Weight value of Outs (AB-H-SO-ROE+SF) was more negative than the value of a Strikeout.

Outs: -0.308
SO: -0.306

This is happening because Double Plays are being counted under Outs.

Should I fix this, combine them? Or simply leave it alone?


#80    Colin Wyers      (see all posts) 2008/10/04 (Sat) @ 11:59

If you’re including double plays in your LWTS for event code 2, it should be higher than the strikeout. (Or could be, at least, depending on your dataset.) You don’t have to fix it because it isn’t broken - it is what it is, not like coming up with a doubles value higher than the triple or anything.

If you do want to do something about it, either combine the strikeout with the other outs or remove the double play from the outs and account for them separately.

Of course, all of this begs the question of what exactly you want linear weights for - it’s hard for me to say you should or shouldn’t do anything in an abstract sense. (Okay, well, I can tell you to ignore sacrifice flies in almost every case.)


#81    terpsfan101      (see all posts) 2008/10/05 (Sun) @ 02:05

I know MGL incorporates GIDP opportunities into his linear weights. I hadn’t been able to find an explanation about how he did this until I searched through Tango’s Fanhome Archives. Here is the Fanhome post where MGL describes how to normalize GIDP’s:

Posted September 17th, 2000 07:44 AM

...

“The other question is, “How much credit (responsibility) should we give a batter (or pitcher) for his DP’s?” It has been suggested that since DP’s are a situational stat, i.e., they depend upon the fact that there is a runner on 1st with less than 2 outs, we ought to hold a batter only partially responsible for his DP’s. I agree with this, however I don’t think that the solution is to partially weight DP’s (if we took only half of their value, we would still add a whole out (-.3) to every DP, since the value of a DP is around 2 generic outs). In fact, the problem with using a player’s DP’s and adding -.6 for every DP (a DP costs .6 runs more than a generic out) is that this would be almost the same thing as using a “value added” formula for all the other offensive events rather than a lwts formula. A value added formula uses play-by-play data for each player and assigns a value to each offensive event depending upon the actual increase or decrease in run expectancy from before the event to after the event, and then adds up all these values. It is similar to lwts, but it is not context neutral. It basically uses a different lwts value each time an offensive event occurs. It is more accurate in describing exactly how much value a player created for his team, but it is not as accurate as a standard lwts formula in predicting future performance or evaluating ability. So using a run value for a DP, like -.8 (the -.8 includes both outs), and adding this into a standard lwts formula is mixing apples with oranges (or bananas with kiwi).

I think the DP “stat” that is most analogous to the other constant (context neutral) lwts values is DP per opportunity. We can use a value of -.8 per “normalized DP” and add this to the other lwts values in order to evaluate a player. This requires a lot of extra work and a non-mainstream database. We have to take a player’s total # of DP’s and “normalize” them to adjust for DP’s per opportunity. Let’s say the average player in the league hits into 1 DP per 25 AB’s. Let’s also say that the average player hits into 1 DP per 2 DP opportunities. If player A had a DP per opportunity of .6 (20% greater than the league average of .5) we would “assign” him 1.2 DP per 25 AB’s. Since a DP is worth an extra -.6 runs or so, this player would have an extra -.12 runs per 25 AB (-.6 times .2), by virtue of the fact that he hits into 20% more DP’s PER OPPORTUNITY, no matter how many actual DP’s per AB he had. The assumption that a player is 100% responsible for his DP’s per opportunity is the same as the assumption that he is 100% responsible for his walks, hits, or K’s per AB. Using this method, or some version of it, allows us to combine apples with apples (or grapefruits with grapefruits).

That was a mouthful!”
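
The arithmetic in MGL’s example can be checked with a few lines. This is only a sketch of the worked numbers in the quote (1 DP per 25 AB, .5 and .6 DP per opportunity, -.6 extra runs per DP); the function and variable names are made up for illustration.

```python
def normalized_dp_rate(player_dp_per_opp, lg_dp_per_opp, lg_dp_per_ab):
    """DP per AB the player is assigned, given a league-average share of opportunities."""
    return lg_dp_per_ab * (player_dp_per_opp / lg_dp_per_opp)


lg_dp_per_ab = 1 / 25          # league: 1 DP per 25 AB
lg_dp_per_opp = 0.5            # league: 1 DP per 2 opportunities
player_dp_per_opp = 0.6        # player A
extra_run_cost_per_dp = -0.6   # a DP costs about .6 runs more than a generic out

player_dp_per_ab = normalized_dp_rate(player_dp_per_opp, lg_dp_per_opp, lg_dp_per_ab)
dp_per_25_ab = player_dp_per_ab * 25
extra_runs_per_25_ab = (dp_per_25_ab - 1.0) * extra_run_cost_per_dp
print(round(dp_per_25_ab, 2), round(extra_runs_per_25_ab, 2))
```

The player is charged for the 0.2 extra DP per 25 AB that his per-opportunity rate implies, regardless of how many opportunities he actually saw.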


#82    terpsfan101      (see all posts) 2008/10/10 (Fri) @ 02:58

The GIDP adjustment is confusing me. For instance, in 1991 AL, the GIDP rate (GIDP/Opportunities) was .11. Randy Milligan hit into 23 GIDP in 103 opportunities, so his GIDP rate was .22. The average player would have hit into 11.4 GIDP given Milligan’s GIDP opportunities. Milligan grounded into 11.6 more GIDP than he should have. Do I add 11.6 to his 23 GIDP?


#83    terpsfan101      (see all posts) 2008/10/10 (Fri) @ 03:19

I was figuring this out correctly. I was just struggling to find a way to express it as a formula. The formula is:

Normalized GIDP = (Player GIDP - (Lg GIDP Rate * Player GIDP Opps)) + Player GIDP

Using the Randy Milligan example above:

(23 - (.11 * 103)) + 23 = 34.6 Normalized GIDP


#84          (see all posts) 2008/10/10 (Fri) @ 04:22

No, that’s not right. He’s at twice the league rate, and after you normalize he’s at three times average.

His pct is what it is. Normalize by adjusting the number of opportunities.

According to BP stats page, in 1991 there were 160746 PA, 29642 DPopp and 3728 DPs.
DP/DPopp = .126
DPopp/PA = .184

Milligan had 571 PA, expDPopp = 571*.184 = 105.3
25 DP / 103 DPopp * 105.3 expDPopp = 25.6 expDP


#85    terpsfan101      (see all posts) 2008/10/10 (Fri) @ 06:31

So 24.4 would be Milligan’s normalized DP’s:

(25 DP - 25.6 expDP) + 25 DP = 24.4 Normalized DP


#86          (see all posts) 2008/10/10 (Fri) @ 14:43

No - I can’t understand your formula.

I am normalizing for opportunities. Depending on your teammates’ OBP and where you bat in the order, you might get more opportunities than “normal” for your number of PAs.

The only way to normalize the pct of DP/DPopp is to look at the pitcher, the fielders and the park to see if the defensive side, after the ball was put in play, was better or worse than normal at turning the DP. I think the best way to measure that is WOWY.


#87    terpsfan101      (see all posts) 2008/10/10 (Fri) @ 14:57

Then what would be the DP figure I’d use for Milligan?


#88    terpsfan101      (see all posts) 2008/10/10 (Fri) @ 15:41

Ok, I believe this is the correct formula:

(Lg DP Opp/PA * Player PA) * Player DP/DP Opp

Using your numbers for Milligan:

(.184 * 571) * .243 = 25.5 Normalized DP’s


#89          (see all posts) 2008/10/10 (Fri) @ 18:15

Yes, that looks right - how many DPs he would have been expected to hit into with a normal distribution of opportunities, but not testing for quality of the defensive players.

Milligan hit into DPs at twice the league rate, but actually had slightly fewer opportunities than would be expected.
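
The formula agreed on in posts #88 and #89 can be written as a short function. This is a sketch using the league totals and Milligan’s line quoted from the BP stats page in post #84; the function and argument names are made up for illustration.

```python
def normalized_dp(player_pa, player_dp, player_dp_opp, lg_dp_opp, lg_pa):
    """DPs expected if the player had a league-average number of opportunities per PA."""
    expected_opp = (lg_dp_opp / lg_pa) * player_pa
    return expected_opp * (player_dp / player_dp_opp)


# League totals and Milligan's line as quoted in post #84.
milligan = normalized_dp(player_pa=571, player_dp=25, player_dp_opp=103,
                         lg_dp_opp=29642, lg_pa=160746)
print(round(milligan, 1))
```

With unrounded inputs this lands on roughly the 25.6 from post #84; the 25.5 in post #88 comes from rounding the two rates to three decimals before multiplying.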


#90    terpsfan101      (see all posts) 2008/10/21 (Tue) @ 00:13

The formula for the Adjusted GIDP’s isn’t right in post #88. I’m short approximately 1000 Adjusted GIDP’s out of 160000 total GIDP’s from 1954-2007. I even adjusted the GIDP Opportunities for players with missing Retrosheet PA’s from 1954-1972. Maybe the error lies in those years.


#91    Colin Wyers      (see all posts) 2008/10/21 (Tue) @ 01:15

(.184 * PA) * (DP/Opp)

Okay, I presume that’s the formula you’re referring to. How are you calculating DP opportunities, is probably the first thing I’m curious about.

Also, if the .184 number is tuned only to the numbers you originally posted, you might have better luck deriving the numbers directly from your sample data. If you’re really worried about reconciling, avoid rounding until the very end.


#92    terpsfan101      (see all posts) 2008/10/21 (Tue) @ 03:34

For GIDP Opportunities, I counted the number of PA’s where there was a runner on 1st base and fewer than 2 outs.

The .184 number was just the example Brian used. He got this number from Baseball Prospectus.

Here is the formula I used:

[(Lg GIDP Opp / Lg PA) * Player PA] * (Player GIDP / Player GIDP Opp)

I think that I need to incorporate the League GIDP rate (Lg GIDP / Lg GIDP Opp) into the formula.

I really do not think I can figure this out on my own. I can post the data for the 1991 AL and Randy Milligan.

Lg GIDP = 1823
Lg GIDP Opp = 16462
Lg PA = 87305

Lg GIDP / Lg GIDP Opp = .111
Lg GIDP Opp / Lg PA = .189

Milligan GIDP = 23
Milligan GIDP Opp = 103
Milligan PA = 571

Milligan GIDP / Milligan GIDP Opp = 23/103 = .223
Milligan GIDP Opp / Milligan PA = 103/571 = .180


#93    terpsfan101      (see all posts) 2008/10/21 (Tue) @ 06:14

In case you haven’t figured it out, I tend to obsess over things until I find a solution to them. The shortfall in “normalized GIDP’s” is being caused by those players who hit into zero GIDP. Their GIDP opportunities are being adjusted, but their GIDP aren’t being adjusted because they haven’t hit into any double-plays. Mathematically, I don’t think there is anything you can do to solve this problem. Consider the following scenario with Enos Slaughter in 1954 with the Yankees:

Slaughter: 0 GIDP, 31.41 GIDP Opp, 154 PA
1954 AL: 980 GIDP, 9929.09 GIDP Opp, 47785 PA

(Note that the GIDP Opps are not integers. This is occurring because Retrosheet is missing 1048 PA for the AL in 1954. According to the PBP data, Slaughter is missing 2 PA and has 31 GIDP Opps. So his new GIDP Opp are (31/152)*154 = 31.41.)

Now, Slaughter’s Adjusted GIDP Opp are [(31.41/154)/(9929.09/47785)]*31.41 = 31.98.

Because Slaughter did not ground into any DP’s, he is not being penalized for these additional .57 GIDP opportunities (31.98 - 31.41).

For instance, say Slaughter hit into 3 GIDP, he would be penalized .05 GIDP:

(3/31.41)*31.98 = 3.05 Adj GIDP

I don’t know of any mathematical expression that would correct for this. Now I get to look forward to a day at work with no sleep!
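
The zero-GIDP problem is easy to see in code. The sketch below applies the opportunity adjustment from post #92 to the Slaughter figures given above; the function name is made up, and because it uses the league opportunity rate directly it yields about 32.0 adjusted opportunities, so its results differ slightly from the 31.98 and 3.05 quoted.

```python
def adjusted_gidp(gidp, gidp_opp, pa, lg_gidp_opp, lg_pa):
    """Scale a player's GIDP to a league-average number of opportunities per PA."""
    adj_opp = (lg_gidp_opp / lg_pa) * pa
    return (gidp / gidp_opp) * adj_opp


LG_OPP, LG_PA = 9929.09, 47785   # 1954 AL, from the post
OPP, PA = 31.41, 154             # Slaughter, from the post

actual = adjusted_gidp(0, OPP, PA, LG_OPP, LG_PA)        # his real line: 0 GIDP
hypothetical = adjusted_gidp(3, OPP, PA, LG_OPP, LG_PA)  # the what-if with 3 GIDP
print(actual, round(hypothetical, 2))
```

Any multiplier applied to zero stays zero, which is why the extra opportunities never show up in his adjusted GIDP.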


#94    terpsfan101      (see all posts) 2008/10/21 (Tue) @ 21:36

I’ll just use a multiplier for those players who grounded into at least 1 DP to compensate for the 0 GIDP players. If a player didn’t ground into any double plays, then you really can’t adjust his GIDP’s because he doesn’t have any to adjust. Even if you found a way to adjust, you’d end up with negative GIDP’s for players who had more GIDP opportunities per PA than the league average.


#95    terpsfan101      (see all posts) 2008/10/23 (Thu) @ 16:19

Here are a few things to keep in mind when adjusting statistics, whether that be applying park factors, reconciling individual totals to team or league totals, or adjusting the stats themselves, such as I did with GIDP.

If you have reconciled your Runs Created or Linear Weights on the league level, there is no guarantee that they will add up to the reconciled amount when you apply the RC or LW on the individual level, even if the league totals are the exact sum of the individual totals. They will come very close to adding up, usually within 5-10 runs of the league totals.

The same thing applies to Park Adjusted LW and RC. Previously, I was under the impression that if you forced the PF in each league to average to 1.00, then the Park Adjusted LW and RC would sum to the same amount as the Unadjusted LW and RC. This doesn’t happen. Again, the park adjusted figures will come very close to adding up to the original amounts, but they will not sum exactly to the original amounts.
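
A two-team toy league shows why dividing by a park factor does not preserve the total even when the factors average to 1.00. All numbers below are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical two-team league; the park factors average exactly 1.00.
teams = {
    "hitters_park": {"rc": 800.0, "pf": 1.10},
    "pitchers_park": {"rc": 700.0, "pf": 0.90},
}

unadjusted_total = sum(t["rc"] for t in teams.values())
adjusted_total = sum(t["rc"] / t["pf"] for t in teams.values())
mean_pf = sum(t["pf"] for t in teams.values()) / len(teams)

print(round(mean_pf, 2), unadjusted_total, round(adjusted_total, 1))
```

Dividing by 0.90 adds more runs than dividing by 1.10 takes away, so the adjusted total drifts above the unadjusted one.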


#96    Tangotiger      (see all posts) 2008/10/23 (Thu) @ 16:35

I prefer applying adjustments on a linear scale, not multiplicative.  Then you guarantee that it all adds up, if you can live with a few negative numbers.


#97    terpsfan101      (see all posts) 2008/10/23 (Thu) @ 16:52

Excuse me for being dumb here, but what do you mean by applying adjustments on a linear scale? Can you refer me to any examples?


#98    tangotiger      (see all posts) 2008/10/23 (Thu) @ 17:51

I meant additive, not multiplicative.

If Coors allows 4 HR per 100 PA and the average park is 3, then I would subtract 1 HR per 100 Coors PA.
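
As a rough sketch of the additive adjustment, using the 4 and 3 HR per 100 PA rates from the post. The two hitters and their PA totals are hypothetical, as is the function name.

```python
def additive_hr_adjustment(hr, pa_in_park, park_hr_rate, lg_hr_rate):
    """Subtract the park's surplus HR rate for every PA taken in that park."""
    return hr - (park_hr_rate - lg_hr_rate) * pa_in_park


COORS, LEAGUE = 0.04, 0.03   # 4 and 3 HR per 100 PA, from the post

# Hypothetical hitters, each with 600 PA at Coors.
slugger = additive_hr_adjustment(30, 600, COORS, LEAGUE)
slap_hitter = additive_hr_adjustment(2, 600, COORS, LEAGUE)
print(round(slugger, 1), round(slap_hitter, 1))
```

Every hitter gives back the same 6 HR per 600 PA, so the total removed is fixed by the PA count alone, at the cost of a negative figure for the low-power hitter.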


#99    tangotiger      (see all posts) 2008/10/23 (Thu) @ 17:52

Btw, I understand the limitation for a guy like Juan Pierre.  But, this gives you something quick, and it always adds up.

Ideally, you use the Odds Ratio method.
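
The thread names the Odds Ratio method without spelling it out. The sketch below is one common way to write it, offered as an assumption rather than as the exact calculation intended here: convert rates to odds, divide out the park’s odds relative to league, and convert back. The slap hitter’s rate is hypothetical.

```python
def odds(p):
    """Convert a rate to odds."""
    return p / (1.0 - p)


def odds_ratio_adjust(player_rate, park_rate, lg_rate):
    """Scale the player's odds by league odds over park odds, then convert back to a rate."""
    adj_odds = odds(player_rate) * odds(lg_rate) / odds(park_rate)
    return adj_odds / (1.0 + adj_odds)


COORS, LEAGUE = 0.04, 0.03                              # HR per PA, from post #98
average_hitter = odds_ratio_adjust(0.04, COORS, LEAGUE)
slap_hitter = odds_ratio_adjust(0.002, COORS, LEAGUE)   # hypothetical rate
print(round(average_hitter, 4), round(slap_hitter, 4))
```

A hitter at the park’s own rate comes back to the league rate, and a low-power hitter is scaled down but never pushed below zero.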


#100    terpsfan101      (see all posts) 2008/10/23 (Thu) @ 18:11

OK, I’ll give this method a try. Although, I didn’t use R/PA for my PF, I used R/O.

Say the R/O for Coors Field is .22 and the Lg avg R/O is .18. I would then subtract 4 Runs Created per 100 Outs.


#101    terpsfan101      (see all posts) 2008/10/24 (Fri) @ 01:09

Actually the additive method inflates the RC more than the multiplicative method does. By multiplicative we mean (RC / PF). And by PF, we mean the combination of Home and Road PF. Here are the cumulative Runs Created when you apply both methods to the Team Stats from 1954-2007:

Runs: 921602
RC Unadjusted: 921602

PF Additive: 922954 RC
PF Multiplicative: 922187 RC

Here are the results if I don’t reconcile the PF’s so that the avg PF in each league-season equals 1.00:

PF Additive, PF Un-Reconciled: 923172 RC
PF Multiplicative, PF Un-Reconciled: 922405 RC


#102    terpsfan101      (see all posts) 2008/10/24 (Fri) @ 07:00

Ignore the previous post. I caught a mistake in the way I calculated the R/O. I had created a category called Missing PA. For players with incomplete PBP data, the Missing PA category accounted for estimated Reached on Errors. Yes, it looks weird to assign a LW and RC value to this category, but it was the only way to be fair to players whose PBP data was incomplete. The number of estimated ROE’s accounted for under Missing PA’s needs to be subtracted from total Outs. I forgot to do this. I’ll fix the results later this evening and then see how the additive PF fares.


#103    terpsfan101      (see all posts) 2008/11/01 (Sat) @ 19:06

Here is the reason why the sum of individual LW and RC don’t reconcile to the league totals (before applying the PF). The stupid SF! Actually, it is the ROE SF. Because an ROE can occur on a SF, I’m not sure exactly how to account for this in my equation for Outs. The equation for Outs uses the official categories and ROE data:

AB - H - ROE Non-SH-SF - RFC Non-SH - GIDP - SO + (SF - ROE SF)

Is this correct?

For seasons where there were no ROE SF’s, the sum of the individual LW and RC reconcile to League totals. This is driving me nuts.


#104    terpsfan101      (see all posts) 2008/11/01 (Sat) @ 20:44

I should be able to figure this out when I take another look at it. I was just voicing my frustrations with how stupid of a category the Sac-Fly is. What makes even less sense is the ROE SF, which gets counted as an ROE, but not as an AB.


#105    terpsfan101      (see all posts) 2008/11/01 (Sat) @ 21:52

Found the problem. I was multiplying the RC and LW values for the ROE by the ROE number that did not include ROE SF’s.


#106    terpsfan101      (see all posts) 2008/12/03 (Wed) @ 10:20

I just finished calculating Linear Weights with the newest Retrosheet update. I will not get a chance to post them until later tonight. They will be downloadable this time. I figured out that Google Docs allows you to post spreadsheets in an exportable format. I will also clean up the presentation this time around.


#107    terpsfan101      (see all posts) 2008/12/04 (Thu) @ 00:42

Well, I didn’t get a chance to finish everything up tonight. I got the basic stuff posted. You can check my progress by clicking on the links in post #76.


#108    terpsfan101      (see all posts) 2008/12/04 (Thu) @ 22:48

Ok, everything is updated. If you want to download the spreadsheets, click on the links I posted in #76. Change /pub to /ccc in the URL field. Then, click File and select Export.


#109    terpsfan101      (see all posts) 2009/12/06 (Sun) @ 07:15

I recently updated my linear weights to include 2009, 1952, and 1953. I recalculated the linear weights for all years prior to 1974. The linear weights from 1974-2008 (and 1911, 1921-1922) are the same ones I calculated last year.

http://spreadsheets.google.com/pub?key=tZHUipRgbW7tvNOij0f3_qg&output=xls

I’m still updating the run expectancy spreadsheet. I’ll post it here when I’m finished.

