THE BOOK--Playing The Percentages In Baseball


Monday, November 22, 2010

Run Expectancy Matrix, 1950-2010

By Tangotiger, 11:50 AM

The run expectancy matrix, broken down into three eras: 1950-1968, 1969-1992, and 1993-2010.

Note that the 1950-1968 era is virtually identical to the 1969-1992 era in terms of runs scored per inning. The big difference is in the base/out states where one era scores more than the other. The 1969-1992 era scores more with 0 outs and: a runner on 1B; runners on 1B/3B; the bases loaded. The 1950-1968 era scores more with: 1 out and runners on 1B/2B or only on 3B; 2 outs and runners on 1B/3B or 2B/3B. This could have an impact here.

Two notes:
1. It will be interesting to see which events occur disproportionately in those states when comparing the two eras.
2. Retrosheet baserunner states are not completely reliable: in cases where they don’t have the intermediate baserunner states, Retrosheet marks the runners as having moved the minimum number of bases. So, if Retro is not sure whether the runner made it to 2B or 3B, it presumes 2B.


#1    Tangotiger      2010/11/22 (Mon) @ 12:11

If you subtract the bases-empty line from each of the other 7 lines, you get this for 1993-2010 (runs above the bases-empty state, by number of outs):

Runners        0 outs   1 out   2 outs
__  __  __     (baseline: bases empty)
1B  __  __      0.40    0.27    0.13
__  2B  __      0.63    0.43    0.24
1B  2B  __      1.01    0.67    0.36
__  __  3B      0.89    0.70    0.27
1B  __  3B      1.31    0.92    0.42
__  2B  3B      1.51    1.16    0.51
1B  2B  3B      1.85    1.34    0.70

That is, the .40 is the RE with a runner on 1B and 0 outs minus the RE with the bases empty and 0 outs: .941 - .544 = .397. And so on.

Now, look how nicely additive things get.  The runner on first base, 0 outs is worth .40 runs.  The runner on 2B, 0 outs is worth 0.63 runs.  Together they are worth 1.022 runs, compared to the 1B/2B line of 1.012 runs.

Pretty much, every base/out entry is additive in this manner, except bases loaded.
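The additivity claim can be checked directly against the rounded table values. Below is a minimal Python sketch (not Tango's code; the names `INCREMENTS`, `additive_prediction`, and `additivity_gaps` are hypothetical) that predicts each multi-runner state as the sum of its single-runner increments and reports actual minus predicted.

```python
# Runs above the bases-empty state for 1993-2010, copied from the table above.
# Keys: (runner on 1B, runner on 2B, runner on 3B); values: (0 outs, 1 out, 2 outs).
INCREMENTS = {
    (1, 0, 0): (0.40, 0.27, 0.13),
    (0, 1, 0): (0.63, 0.43, 0.24),
    (1, 1, 0): (1.01, 0.67, 0.36),
    (0, 0, 1): (0.89, 0.70, 0.27),
    (1, 0, 1): (1.31, 0.92, 0.42),
    (0, 1, 1): (1.51, 1.16, 0.51),
    (1, 1, 1): (1.85, 1.34, 0.70),
}
SINGLE_RUNNER = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def additive_prediction(bases, outs):
    """Sum the single-runner increments for every occupied base."""
    return sum(INCREMENTS[single][outs]
               for single, occupied in zip(SINGLE_RUNNER, bases) if occupied)


def additivity_gaps():
    """Actual minus additive prediction for each multi-runner base/out state."""
    return {(bases, outs): round(values[outs] - additive_prediction(bases, outs), 2)
            for bases, values in INCREMENTS.items() if sum(bases) >= 2
            for outs in range(3)}


for (bases, outs), gap in sorted(additivity_gaps().items()):
    print(bases, outs, f"{gap:+.2f}")
```

With these two-decimal values, most gaps are within a few hundredths of a run, and the largest ones (0.06 to 0.07 runs) fall in the bases-loaded row.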


#2    Tangotiger      2010/11/22 (Mon) @ 12:19

For those who missed it, you may be inspired to create your own RE matrix for an era with no play-by-play (PBP) data:

http://www.insidethebook.com/ee/index.php/site/comments/the_secret_recipes_of_the_run_expectancy_matrix/

I’d love for some more saberists to tackle this.  The RE matrix is the single most important tool in your toolbox.  This is your screwdriver.  So, roll up your sleeves, and do some work. 

Even if you don’t want to or can’t, please, print out the RE matrix chart, and carry it with you in your wallet.  It should appear opposite a picture of your wife and kids.  It’s that important.


#3    studes      2010/11/22 (Mon) @ 12:44

Excellent, Tango! I can kind of see why you broke things out into those three eras, but can you tell us more about your thinking? For instance, why not break the low-scoring ’60s into their own matrix?


#4    Tangotiger      2010/11/22 (Mon) @ 13:01

The 1993-present era was easy, because of the relatively clean break in terms of changing run environment. Plus expansion.

1969 was also a clean break from 1968, the Year of the Pitcher. Plus expansion.

I pretty much just wanted three eras for manageability, so the 1950s and ’60s got lumped together.

***

I could have instead broken it down by run environment, selecting seasons where runs per inning was under .425, .425-.475, .475-.525, or .525+. And I may do just that.

I did do something similar here:
http://www.insidethebook.com/ee/index.php/site/comments/run_expectancy_by_run_environment/
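Grouping seasons by run environment, as described above, comes down to bucketing on runs per inning. A minimal Python sketch, assuming each season arrives as a hypothetical (season_id, runs, innings) tuple. Putting a season that lands exactly on a cut point into the higher bucket is also an assumption, since the comment doesn't say.

```python
from bisect import bisect_right

# Runs-per-inning cut points from the comment above. A season exactly on a cut
# point goes to the higher bucket (an assumption; the comment doesn't say).
CUTS = [0.425, 0.475, 0.525]
LABELS = ["under .425", ".425-.475", ".475-.525", ".525+"]


def run_environment_bucket(runs, innings):
    """Label a season by its runs per inning."""
    return LABELS[bisect_right(CUTS, runs / innings)]


def group_seasons(seasons):
    """Group hypothetical (season_id, runs, innings) tuples by run environment."""
    groups = {label: [] for label in LABELS}
    for season_id, runs, innings in seasons:
        groups[run_environment_bucket(runs, innings)].append(season_id)
    return groups
```

Each bucket's seasons would then feed a separate RE matrix.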

***

I can also break it down by pitcher quality: group starting pitchers by wOBA (under .300, .300-.325, .325-.350, .350+), and take his team’s games.

***

There’s really no limit to the way you can slice/dice.

The only thing NOT to do is START with the ACTUAL number of runs scored in that game.


#5    Bukanier      2010/11/22 (Mon) @ 13:22

Thanks for the secret recipes. I’m currently working on an RE matrix / linear weights for my recreational adult baseball league.


#6    2010/11/22 (Mon) @ 13:37

Super cool.  Been wanting to see this for a while.

If you broke out the 1961-68 NL, do you think it would validate the small-ball strategies from that era?


#7    Tangotiger      2010/11/22 (Mon) @ 13:59

I don’t think we need to necessarily wait for me to do that, as you can just go here:

http://www.tangotiger.net/markov.html

(Though that has the limitation noted.)

Basically, the lower the run environment, the more smallball makes sense. That’s because the value of gaining a base stays pretty much constant: the run value of 2B relative to 1B is 0.17 runs.(*) The cost of an out, however, goes toward 0 the lower the run environment. So, you can see that when you are facing Mariano Rivera, you should be bunting and stealing like crazy: if you get a runner to 1B, you’re going to have to take risks to get him to 2B, because the batters hitting away are not going to help.

(*) It was 0.163 in the more recent era, 0.177 in the 1969-92 era, and 0.181 in the 1950-1968 “era”.  That’s for the base between 1B and 2B.

For 2B to 3B: 0.190 for present era, 0.184 for previous era, and 0.181 for the earlier one.

When you put the two together, 1B to 3B:
0.353 1993-2010
0.361 1969-1992
0.362 1950-1968

So that’s why it’s helpful to think of the gain of those bases, 1B to 2B, or 2B to 3B, as fairly constant and uncorrelated to the run environment.

The out though is entirely correlated to the run environment.
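A quick check of the era figures quoted above: adding the 1B-to-2B and 2B-to-3B values reproduces the 1B-to-3B totals (0.353, 0.361, 0.362), and the spread across eras is under a hundredth of a run. A small Python sketch using only the numbers from the comment; the variable names are illustrative.

```python
# Run value of each one-base advance, by era, as quoted in the comment above.
ONE_TO_TWO = {"1993-2010": 0.163, "1969-1992": 0.177, "1950-1968": 0.181}
TWO_TO_THREE = {"1993-2010": 0.190, "1969-1992": 0.184, "1950-1968": 0.181}

# 1B to 3B is the sum of the two single-base advances.
one_to_three = {era: round(ONE_TO_TWO[era] + TWO_TO_THREE[era], 3)
                for era in ONE_TO_TWO}
spread = max(one_to_three.values()) - min(one_to_three.values())

for era, value in one_to_three.items():
    print(era, f"{value:.3f}")
print(f"spread across eras: {spread:.3f}")
```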


#8    Tangotiger      2010/11/22 (Mon) @ 14:21

More evidence that the base/out state is not necessarily accurate in the older Retrosheet years.  Doing the subtraction of the bases empty state, we see that for the runner on 3B, 2 outs, we have:

0.273: 1993-present
0.278: 1969-1992
0.281: 1950-1968

Now, the chance of scoring from 3B and 2 outs and no one else on base rests entirely on batting average and reaching on error.  A walk is essentially a do-over (unless you get 3 straight walks). 

I’m not sure what the batting averages were in each of those eras, but I would say that they are either similar, or the present era is maybe ahead.

What happens instead is this: as I said, when Retrosheet doesn’t know whether a runner landed on 2B or 3B, it places him on 2B. There are lots of runners who should be marked as being on 3B and are not. And presumably, a disproportionate number of those cases are ones where the runner is left on base to end the inning.

So what you have is the 28% scoring rate for runners we know were on 3B, while some runners who were on 2B or 3B (we can’t tell which) did not score and are recorded as being on 2B.

That’s why you have to be careful not to treat the older Retro years as gospel, in terms of baserunner movements.


#9    Tangotiger      2010/11/22 (Mon) @ 14:26

Btw, passed balls and wild pitches also contribute to scoring from 3B, in case you were trying to figure out how there can be a .273 chance of scoring from 3B when the batting average is around .265.


#10    Rally      2010/11/22 (Mon) @ 14:36

“Even if you don’t want to or can’t, please, print out the RE matrix chart, and carry it with you in your wallet.  It should appear opposite a picture of your wife and kids.  It’s that important.”

I’m going to update that idea for 2011.  I actually do not have photos of my wife and kids in my wallet.  But I do on my cellphone.  So I just took a picture of the RE24 on my screen.  Actually came out pretty good.  Plus, I’ve got the site bookmarked for my mobile web browsing.


#11    Tangotiger      2010/11/22 (Mon) @ 14:42

I updated the file to also show the frequency of each base/out state.

Because OBP has been fairly stable, the frequency rates are also pretty stable.

Note that the runner on 3B / 2 outs is very low in the 1950-1968 period, and this supports my point regarding the gap in Retro data.


#12    John      2010/11/22 (Mon) @ 14:46

Is there any indication in the Retrosheet files that makes it clear when they are guessing where the baserunner was?


#13    seank      2010/11/22 (Mon) @ 15:07

How about putting the RE matrix on a card, like many insurance cards? Then you can use it wherever you have enough light to see it. And there are two sides, but I couldn’t decide what to put on the back.


#14    Tangotiger      2010/11/22 (Mon) @ 15:28

They aren’t “guessing”.  They are taking the minimalist approach of not putting someone at 3B if he never got there, preferring to leave him at 2B (a base he obviously must have occupied at some point).

And no, there are no indications.


#15    Tangotiger      2010/11/22 (Mon) @ 15:30

Sean, excellent idea!


#16    Colin Wyers      2010/11/22 (Mon) @ 15:47

Do these RE tables zero out correctly? Just eyeballing them, the start of inning values feel a little high.


#17    RMR      2010/11/22 (Mon) @ 15:55

It would be interesting to see a chart containing the differences between eras with a heat map layer on top of it.


#18    Thomas Aquinas      2010/11/22 (Mon) @ 16:06

The sine qua non of This Thing Of Ours. Thank you, most excellent sir of sirs.


#19    Tangotiger      2010/11/22 (Mon) @ 16:14

Colin, don’t forget I’m excluding all 9th and later innings.

From 1993-2009, the total number of runs per inning was .5381.  If you consider the rate to be .544 for innings 1-8, and 10% less for innings 9+, and you have say 12% of innings at 9+, that gives you this:
.544*.88 + .544*.9*.12 = .5375

So, it ballparks pretty well.
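The ballpark estimate above is a weighted average of early and late innings. A Python sketch of that arithmetic; the function and parameter names are hypothetical, and the inputs are the rough assumptions from the comment (innings 9+ score 10% less and make up about 12% of innings).

```python
def blended_runs_per_inning(rate_1_to_8, late_discount, late_share):
    """Average runs per inning when innings 9+ score `late_discount` less
    and make up `late_share` of all innings."""
    early = rate_1_to_8 * (1 - late_share)
    late = rate_1_to_8 * (1 - late_discount) * late_share
    return early + late


estimate = blended_runs_per_inning(0.544, late_discount=0.10, late_share=0.12)
print(f"{estimate:.4f}")  # prints 0.5375; the actual 1993-2009 figure was .5381
```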

The total number of runs per inning in 1969-1992 is .471, compared to .477 with my process.

Total runs in 1950-1968 is .472 compared to .476 with my process.

All looks pretty good to me…


#20    Colin Wyers      2010/11/22 (Mon) @ 16:34

I’m not necessarily talking about ninth and later; I exclude most of those as well. The question isn’t really whether or not the 0-0 values are too high, it’s whether or not they reconcile out on the whole. It’s been exceedingly difficult for me to get RE tables that sum to 0 properly; it probably is not a major difference in the grand scheme, but I was wondering how you handled that issue.


#21    Tangotiger      2010/11/22 (Mon) @ 16:56

One thing I do is look only at the base/out state at the start and end of PA.  So, I don’t consider SB and CS, etc as new base states.  One PA, one state.

This may be good or bad.  For example, at the start of the PA, the runner is on 1B.  Runner steals 2B.  Batter gets a 1B, scoring the runner.  It “looks” like he went 1B to home on the single.

You can do it other ways, such as using the base/out state at the last pitch. In this case, it would show the single moving the runner from 2B to home, but the frequency of the base/out states would ignore that the runner started at 1B.

You can do it like I used to do it, and treat the SB or CS as its own distinct event, but then we have multiple events for the same PA.

No right answer here.

***

Anyway, getting back to your point. If you multiply the RE for each of the 24 base/out states by the frequency of that state, you will get back the runs scored per inning. Theoretically.

And, this works out just about perfectly for 1969-1992 and 1950-1968.

However, for 1993-2010, multiplying the two matrices gives me back 0.533 runs, when it should be 0.544.

This just means there are a lot of non-random things happening, and that we can’t necessarily treat each base/out state as independent of the others.
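Colin’s “zero out” question can also be framed per plate appearance, a different check from the RE-times-frequency product described above. For complete innings, summing RE_end - RE_start + runs over every PA telescopes: each PA starts in the state the previous one ended in, and the three-out state has an RE of 0. So each inning contributes its runs minus RE(bases empty, 0 outs), and the total is zero exactly when that starting RE equals the sample’s runs per inning. A Python sketch under those assumptions; the state labels, RE values, and function names are hypothetical. Innings that end without three outs (walk-offs, truncated innings) would break the telescoping.

```python
from typing import Dict, List, Tuple

# One plate appearance: (state before, state after, runs scored on the play).
# State labels are arbitrary strings; "3 outs" must map to an RE of 0.
PA = Tuple[str, str, int]


def re24_sum(innings: List[List[PA]], re: Dict[str, float]) -> float:
    """Total RE24 (RE_end - RE_start + runs) over every plate appearance."""
    return sum(re[end] - re[start] + runs
               for inning in innings
               for start, end, runs in inning)


def telescoped_sum(innings: List[List[PA]], re: Dict[str, float],
                   start_state: str = "empty, 0 out") -> float:
    """Closed form for complete innings: total runs - innings * RE(start state)."""
    total_runs = sum(runs for inning in innings for _, _, runs in inning)
    return total_runs - len(innings) * re[start_state]


# Hypothetical RE values, chosen only to exercise the identity.
RE = {"empty, 0 out": 0.5, "1B, 0 out": 0.9, "1B, 1 out": 0.55,
      "empty, 1 out": 0.3, "empty, 2 out": 0.1, "3 outs": 0.0}

INNINGS = [
    [("empty, 0 out", "1B, 0 out", 0),     # single
     ("1B, 0 out", "1B, 1 out", 0),        # strikeout
     ("1B, 1 out", "empty, 1 out", 2),     # two-run homer
     ("empty, 1 out", "empty, 2 out", 0),
     ("empty, 2 out", "3 outs", 0)],
    [("empty, 0 out", "empty, 1 out", 0),
     ("empty, 1 out", "empty, 2 out", 0),
     ("empty, 2 out", "3 outs", 0)],
]

print(re24_sum(INNINGS, RE), telescoped_sum(INNINGS, RE))
```

Here the sample averages 1.0 run per inning, so a starting RE of 0.5 leaves a surplus of 1.0; setting `RE["empty, 0 out"]` to 1.0 would bring both sums to zero.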


#22    Tangotiger      2010/11/22 (Mon) @ 16:59

“No right answer here.”

Another right answer might be to look at the states on a pitch by pitch basis, but then one guy having an 8 pitch at bat would count differently from a one pitch at bat.


#23    The Wizard      2010/11/22 (Mon) @ 20:57

Thanks for your work, and thanks for sharing your work, Tango.


#24    Tangotiger      2010/12/09 (Thu) @ 21:16

I updated the file with a third table: the chance of scoring at least one run. It appears second in the file.

You will see for example that moving a runner from 2B to 3B, and losing an out, will increase your chance of scoring that inning.  That’s helpful in late innings of close games.

