THE BOOK--Playing The Percentages In Baseball

Monday, June 12, 2006

The Secret Recipes of the Run Expectancy Matrix

By Tangotiger, 01:16 PM

Trying to make some sense out of the Run Expectancy (RE) Matrix, I’ll show you how to Deconstruct and Reconstruct an RE matrix, without play-by-play data.


When you look at this page:
http://www.tangotiger.net/RE9902.html

does it mean anything to you?  On the surface, it should at least tell you that this represents the average number of runs that scored from a particular base/out state to the end of the inning, from 1999-2002.  This is an important chart to understand: it is the core chart behind Leverage Index, Win Probability, Linear Weights, and everything in between.  To better appreciate those concepts, you should spend a lot of time on the core chart.

The Book also has an entire chapter devoted to Run and Win Expectancy, and I recommend that the interested reader pick up a copy.  It also explains in detail how the numbers are derived.

But is there a shortcut, especially for years with limited data?  You bet!  And the shortcut I’m about to present is the absolute easiest shortcut with the absolute minimum amount of data.  What we are about to do is create an RE chart from the ground up, using basic logic.  Ready?

Runs Scored = number of baserunners times % of baserunners that score, plus home runs
which we’ll write as:
R = baserunners * ScoreRate + HR
or
R = br * sr + HR

This is a truism.  If you have 10 baserunners in a game, and 30% of them score, and you hit 1 homerun, how many runs scored?  Four.  10 x .30 + 1 = 4.  Got it?  Keep that aside for now.

Also remember the 3,2,1 rule.  With 0 outs, you have 3 times the chance of scoring that you have with 2 outs.  With 1 out, you have twice the chance of scoring that you have with 2 outs.  So, if you have, on average, a 30% chance of scoring, you probably have a 45% chance with 0 outs, 30% with 1 out, and 15% with 2 outs.  Keep that aside for now, too.
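Here is a minimal Python sketch of the 3,2,1 rule, just to make the scaling explicit.  The function name is made up for illustration; the only assumption is that the 1-out rate equals the overall average, as in the 45/30/15 example.

```python
def score_rate_by_outs(average_rate):
    """Split an average scoring chance across 0, 1 and 2 outs in a 3:2:1 ratio."""
    weights = (3, 2, 1)
    mean_weight = sum(weights) / len(weights)  # 2, so the 1-out rate equals the average
    return tuple(average_rate * w / mean_weight for w in weights)


# A 30% average chance becomes roughly 45%, 30%, 15%.
rates = score_rate_by_outs(0.30)
print([round(r, 3) for r in rates])
```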

Let’s take the 1994 data:
http://www.retrosheet.org/boxesetc/YS_1994.htm

We see that 15752 runs scored in 28586.1 innings, for an average of 0.551 runs per inning.  There were also 3306 homeruns, for an average of 0.116 HR per inning.  Let’s go back to our truism:
.551 = br * sr + .116
which means that:
br * sr = .551 - .116 = .435
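The same arithmetic as a short Python sketch.  The variable names are illustrative, and it assumes the ".1" in 28586.1 innings means one-third of an inning, which is the usual innings notation.

```python
# 1994 league totals quoted above.
runs = 15752
home_runs = 3306
innings = 28586 + 1 / 3  # assumption: ".1" means one-third of an inning

runs_per_inning = runs / innings      # about .551
hr_per_inning = home_runs / innings   # about .116

# Truism: R = br * sr + HR, so br * sr = R - HR (everything per inning).
br_times_sr = runs_per_inning - hr_per_inning  # about .435

print(round(runs_per_inning, 3), round(hr_per_inning, 3), round(br_times_sr, 3))
```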

That .551 goes into the “bases empty, 0 outs” cell. 

Now, how about with bases empty and 1 out?  That is, if an inning had only 2 outs, and if you only had the data above, how many runs would score per inning?

Well, if you have a 45% chance of scoring with 0 outs, 30% with 1 out, and 15% with 2 outs, then the average is 30%.  But, if you are coming to bat with bases empty and 1 out, then a baserunner will score either 30% of the time (with 1 out) or 15% of the time (with 2 outs), for an average of 22.5%.  As you can figure out, this figure for a two-out inning is 75% of the figure for a three-out inning (22.5% / 30%).

As well, with two-thirds of the inning left, you will only have two-thirds of the baserunners, and two-thirds of the homeruns.

R = (br * 2/3) * (sr * 3/4) + (.116 * 2/3)
R = (br * sr) * (1/2) + (.116 * 2/3)

Since (br * sr) = .435 for this particular year, we get:
R = .435 * (1/2) + (.116 * 2/3)
R = .295

That .295 goes into the “bases empty, 1 out” cell.

Finally, how about with 2 outs?  That is, if you are down to the last out for the inning, how many runs will you score with the bases empty?

R = (br * 1/3) * (sr * 1/2) + (.116 * 1/3)
R = (br * sr) * (1/6) + (.116 * 1/3)
R = .435 * (1/6) + (.116 * 1/3)
R = .111

So, the first line of our RE chart for 1994, for bases empty, reads:

.551, .295, .111
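The three cells can be produced by one small function.  This is a sketch of the steps above, not a general tool: the function name is made up, and the factors (2/3 and 1/3 of the inning left, score rate scaled by 3/4 and 1/2) are exactly the ones used in the text.

```python
def bases_empty_line(runs_per_inning, hr_per_inning):
    """Bases-empty run expectancy for 0, 1 and 2 outs from two league rates."""
    br_times_sr = runs_per_inning - hr_per_inning
    # (share of the inning left, score-rate factor relative to a full inning)
    factors = [(1.0, 1.0), (2 / 3, 3 / 4), (1 / 3, 1 / 2)]
    return [
        br_times_sr * inning_left * sr_factor + hr_per_inning * inning_left
        for inning_left, sr_factor in factors
    ]


line_1994 = bases_empty_line(0.551, 0.116)
print([round(x, 3) for x in line_1994])  # [0.551, 0.295, 0.111]
```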

The actual 1999-2002 data, for bases empty, for a slightly higher scoring environment, reads:

.555, .297, .117

Pretty cool, right?  The 1994 line was determined using exactly two pieces of information: runs per inning and HR per inning.  The 1999-2002 line was determined using 800,000 plays.  Imagine, we will be able to fill out the entire RE matrix, without even looking at the play-by-play data.  So far, we’ve got the first line down.  And this was really the hard part.  Once we can complete this, we will then be able to generate Linear Weights values for all events for any team or year in history.  Stay tuned!

***

Updated: June 13 - Chances of Scoring

Start thinking about the RE chart out loud, and take each number, one at a time.  The first is the .555.  That’s the average number of runs that score from the start of the inning.  Now, suppose I told you that a guy on 1B with no outs will score 39.8% of the time.  Since we know that, from the batter at the plate to the last batter of the inning, the team will score .555 runs, we need to add the .398 runs that the runner on 1B represents.  .555 + .398 = .953.  This number is the RE with a man on 1B and no outs, and we see this number in the chart.  If you subtract the “Empty” line from the “1B” line, you get roughly the chance that the runner on 1B will score with 0,1,2 outs.  Those numbers are, respectively, .398, .276, .134.  Pretty neat, right?

Do the same thing with the “2B” line.  The chance of scoring from 2B with 0,1,2 outs is: .634, .428, .227.  And from 3B it’s .927, .686, .270. 

Now, it doesn’t exactly work that way due to selective sampling, and sample size issues, but it’s a pretty good start.

It’s now time for a quiz.  What’s the RE with man on 1B and 2B and 0 outs?  We know the chance that the runner on 1B will score is .398, so that’s how much run expectancy (RE) he gets.  The runner on 2B is worth .634.  The batter at the plate, and all subsequent batters to the end of the inning, are worth .555.  Add them up: .398 plus .634 plus .555 gives you 1.587 runs.  Our chart tells us 1.573.  Pretty close!

Theoretically, you should be able to construct the entire RE chart for any era knowing only: the number of runs scored per inning (which we do know for any team or year in MLB history), and the chance that a runner will score from any base/out (which we don’t know).

(Note: The runners on base are not independent of each other, say with the bases loaded, so it’s not quite such a simple additive exercise.  But, it’s a great starting point.)
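The additive idea is easy to sketch in Python.  The numbers are the ones read off the 1999-2002 chart above; the dictionary layout and the function name are just for illustration, and the sketch ignores the dependence between runners mentioned in the note.

```python
# Bases-empty RE and per-runner scoring chances for 0, 1, 2 outs (1999-2002).
empty = [0.555, 0.297, 0.117]
score_from = {
    "1B": [0.398, 0.276, 0.134],
    "2B": [0.634, 0.428, 0.227],
    "3B": [0.927, 0.686, 0.270],
}


def additive_re(runners, outs):
    """Bases-empty RE plus the scoring chance of each runner on base."""
    return empty[outs] + sum(score_from[base][outs] for base in runners)


# Quiz answer: runners on 1B and 2B, 0 outs.  The chart says 1.573.
print(round(additive_re(["1B", "2B"], 0), 3))  # 1.587
```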

How can we figure out the chance of a runner scoring from any base/out?  Stay tuned!

***

Updated: June 13 - Scoring from 3B

Of the various base/out states, the easiest one to figure out the chance of scoring is the man on 3B, no one else on, and 2 outs.  What has to happen for this runner to score?  Well, the batter has to get a basehit, or reach base on error.  If he’s out, the inning is over.  If he walks or is hit by a pitch, it’s a “do-over”, unless you get three consecutive “do-over” plays.

In short, your chance of scoring from 3B = batting average + error average + “3 consecutive walks” average

The batting average is whatever it happens to be for the league (make sure to count SF as a regular out).  It was .267 in 1994.  The error average is typically around .015.  The do-over plays occur 10% of the time, which means 3 consecutive do-over plays is .001.  So, the chance of scoring from 3B with no one else on and two outs in 1994 is .283.

Using just the RE tables, we were expecting .270, which is close enough.
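As a quick sketch in Python (variable names are illustrative), the two-out figure is just a sum of three terms, with the "three consecutive do-overs" term being the 10% do-over rate cubed:

```python
batting_average = 0.267   # 1994 league, SF counted as a regular out
error_average = 0.015     # "typically around .015"
do_over_rate = 0.10       # walks and hit batters

three_do_overs = do_over_rate ** 3  # .001
score_from_3b_two_outs = batting_average + error_average + three_do_overs

print(round(score_from_3b_two_outs, 3))  # 0.283
```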

How about with one out?  This one gets trickier, since we need to know about sac flies, or ground outs that score the runner.  Let’s say we don’t know about those pieces of information.  The chance of this runner scoring on a hit, error, or “do-overs” is .283 + (1-.283)*.283 = .486.  However, sac flies and the like do give our runner a chance to score with 1 out.

The strikeout rate in 1994 was .178 SO per (AB+SF).  The batting average + error average is .267 + .015 = .282.  That leaves the groundball and flyball rate as 1 - .282 - .178 = .54

Let’s assume that 50% of the time, the runner from 3B will score on an out with less than two outs. 

Maybe it should be 40%.  It’s not too important right now, but it will be if you want to be serious about your RE charts.  We just need to come up with a reasonable number, and a bit of quick research will give us the result.  This becomes important in the older years, where parks were much different, and the aggressiveness of the runners was different.  Using the SF, if you have it, would definitely help.  A little rolling-up-the-sleeves work is warranted here.

So, we have chance of scoring on 1 out = .283 + .27 = .553
Chance that he will still be on 3B with 2 outs = 1 - .553 = .467 (strictly, 1 - .553 is .447; the .467 is carried through the figures that follow)
Chance that he will subsequently score: .467 * .283 = .132

Add it up: .553 + .132 = .685

Using just the RE tables, we were expecting .686.  Nice, right?

Finally, how about 0 outs.  Repeating the same process:
Chance of scoring on 0 outs: .553
Chance that he will be on 3B with 1 out: .467
Chance that he will subsequently score: .467 * .685 = .320

Add it up: .553 + .320 = .873

And the RE tables told us .927.  There is a gap here, but we are at the mercy of our sample size.  If you refer to Table 9 in The Book, you will see that our expectation should have been around .87.

All we have to do now is add the .873, .685, .283 figures to the “bases empty” line of .551, .295, .111.

Our “man on 3B” RE line reads:
1.424, .980, .394
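The chain from 2 outs back to 0 outs can be sketched as a small Python function.  The function and parameter names are made up, and the 50% score-on-an-out figure is the assumption from the text.  Note that the sketch computes 1 - .553 exactly (.447, the slip pointed out in comment #15), so it returns about .680 and .857 rather than the .685 and .873 used above.

```python
def third_base_chain(p_hit, p_score_on_out, p_two_outs):
    """Chance of scoring from 3B (no one else on) with 0, 1 and 2 outs.

    p_hit: runner scores on a hit, error or three do-overs
    p_score_on_out: runner scores on a batted out (fewer than 2 outs)
    p_two_outs: chance of scoring once there are 2 outs
    """
    p_now = p_hit + p_score_on_out   # scores during this plate appearance
    p_stay = 1 - p_now               # still on 3B with one more out
    one_out = p_now + p_stay * p_two_outs
    zero_outs = p_now + p_stay * one_out
    return zero_outs, one_out, p_two_outs


ball_in_play_out_rate = 1 - 0.282 - 0.178                      # .54
zero, one, two = third_base_chain(0.283, 0.5 * ball_in_play_out_rate, 0.283)
print(round(zero, 3), round(one, 3), round(two, 3))
```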

You just have to keep following the same process for all the other lines.  It’s going to start getting complex, though not complicated.

***

Updated: June 15 - Scoring from 1B, 2B

Rather than going through the complex process right now, let’s look at it from the other viewpoint that I brought up earlier.  Remember I started off with this truism:

R = br * sr + HR

And in 1994, on a per inning basis, I said that:

br * sr = .435

This means that if you had 1 baserunner (br) per inning, then you expect the average baserunner to score 43.5% (sr) of the time.  If you had 1.5 baserunners (br) per inning, then your expectation is 29% (sr).

br * sr can be broken down as:

br * sr
= br1 * sr1
+ br2 * sr2
+ br3 * sr3
= .435

Where the “1,2,3” signifies the base.  So, “br3 * sr3” is the number of initial baserunners at third base times the chance that the runner on third base will score.  We know what sr3 is equal to, it’s the average of:
.873, .685, .283
which is .614
As for br3, that’s pretty much the number of triples per inning, which in 1994 was .025.  (We should also count those baserunners that got on third base via error, but those are rather rare, and just get in the way at this point.)

So, br3 * sr3 = .015

This leaves us with this equation:
br1 * sr1
+ br2 * sr2
= .435 - .015
= .420

We’re getting close.  Stay with me.

From 1960 to 2004, using the RE charts provided by Tom Ruane at Retrosheet, the gap between the chances of scoring from second base (sr2) and scoring from first base (sr1) was .170.  And this is true regardless of the run environment.  The top 20 run environments of the 90 league-seasons in this time period averaged .552 runs per inning (5.0 runs per game), and the gap (sr2 - sr1) was .167.  The bottom 20 run environments averaged .425 runs per inning (3.8 runs per game) with a gap of .168.  Even if I limit it to the top and bottom 5 run environments (5.3 runs per game, gap of .162, and 3.6 runs per game, gap of .175), the gap barely moves.  The correlation between the 90 run environments and the 90 gaps is an r of .004, which again, is essentially zero.

Our equation is now:
br1 * sr1
+ br2 * (sr1 + .170)
= .420

Which is
sr1 * (br1 + br2)
+ .170 * br2
= .420

Conveniently, and just a little bit more dangerously, if we ignore runners reaching second base by error, the number of initial runners at second base is the doubles per inning rate of .200 (that’s br2).

Our equation is now:
sr1 * (br1 + br2)
= .420 - .170 * .200
= .386

Almost there.  How many baserunners were there in 1994?  Excluding triples and homeruns, br1 + br2 was around 39285 per 28586 innings, for an average of 1.374.  (Estimate based on hits minus 3B minus HR plus walks plus hit batters plus interference plus reached on error.)

sr1 * 1.374 = .386

This makes sr1 = .281
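Putting the whole derivation in one place, here is a Python sketch that solves for sr1.  All inputs are the figures quoted above; the variable names are illustrative.

```python
br_times_sr = 0.435                    # per inning, from R - HR
sr3 = (0.873 + 0.685 + 0.283) / 3      # average chance of scoring from 3B, .614
br3 = 0.025                            # triples per inning
br2 = 0.200                            # doubles per inning
gap = 0.170                            # sr2 - sr1, stable from 1960 to 2004
br1_plus_br2 = 39285 / 28586           # about 1.374 baserunners per inning

# br1*sr1 + br2*sr2 = br*sr - br3*sr3, with sr2 = sr1 + gap
remaining = br_times_sr - br3 * sr3                    # about .420
sr1 = (remaining - gap * br2) / br1_plus_br2
sr2 = sr1 + gap

print(round(sr1, 3), round(sr2, 3))  # 0.281 0.451
```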

That’s it.  The chance of scoring from first base, in 1994, is .281.  The chance of scoring from second base is .281 + .170 = .451.  Applying our 3,2,1 rule, our chances of scoring from 1B with 0,1,2 outs are:
.421, .281, .141

Making our “1b only” line:
.972, .576, .252

Our chances of scoring from 2B are:
.677, .451, .226

Making our “2b only” line:
1.228, .746, .337

The other lines follow a similar process, giving us this estimated RE chart for 1994:

EST_1994   0       1       2
Empty      0.551   0.295   0.111
1st        0.972   0.576   0.252
2nd        1.228   0.746   0.337
3rd        1.424   0.980   0.394
1st_2nd    1.649   1.027   0.478
1st_3rd    1.845   1.261   0.535
2nd_3rd    2.101   1.431   0.620
Loaded     2.522   1.712   0.761
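Every cell in the estimated chart above is consistent with simply adding each runner’s chance of scoring to the bases-empty value.  A minimal Python sketch of that assembly follows; the names are illustrative, and it uses the purely additive simplification, ignoring the dependence between runners.

```python
empty = [0.551, 0.295, 0.111]          # bases empty, 0/1/2 outs
score_from = {
    "1st": [0.421, 0.281, 0.141],
    "2nd": [0.677, 0.451, 0.226],
    "3rd": [0.873, 0.685, 0.283],
}
states = {
    "Empty": (),
    "1st": ("1st",),
    "2nd": ("2nd",),
    "3rd": ("3rd",),
    "1st_2nd": ("1st", "2nd"),
    "1st_3rd": ("1st", "3rd"),
    "2nd_3rd": ("2nd", "3rd"),
    "Loaded": ("1st", "2nd", "3rd"),
}


def re_row(runners):
    """RE for 0, 1, 2 outs: bases-empty value plus each runner's scoring chance."""
    return [
        round(empty[outs] + sum(score_from[base][outs] for base in runners), 3)
        for outs in range(3)
    ]


chart = {label: re_row(runners) for label, runners in states.items()}
for label, row in chart.items():
    print(f"{label:<10}" + "".join(f"{value:8.3f}" for value in row))
```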

And what did Ruane at Retrosheet calculate for us?

RE_1994    0       1       2
Empty      0.549   0.300   0.116
1st        0.936   0.565   0.258
2nd        1.172   0.728   0.367
3rd        1.464   0.999   0.419
1st_2nd    1.595   0.936   0.472
1st_3rd    1.767   1.175   0.572
2nd_3rd    2.045   1.470   0.694
Loaded     2.391   1.573   0.798

Now, it does look a little off.  A few things conspired against us.  First, we counted every reached-on-error as a runner on first base only.  Second, we did not account for runners who reach base on sacrifice hits.  If we had left on base (LOB) data instead, this process would have been a bit better.  After all, PA = R + LOB + Outs.  That is, every batter either scores a run, is left on base, or is put out.
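To see where the estimate is most off, a short Python sketch can compare the two charts cell by cell.  The values are copied from the two tables above; the variable names are illustrative.

```python
estimated = {
    "Empty":   (0.551, 0.295, 0.111),
    "1st":     (0.972, 0.576, 0.252),
    "2nd":     (1.228, 0.746, 0.337),
    "3rd":     (1.424, 0.980, 0.394),
    "1st_2nd": (1.649, 1.027, 0.478),
    "1st_3rd": (1.845, 1.261, 0.535),
    "2nd_3rd": (2.101, 1.431, 0.620),
    "Loaded":  (2.522, 1.712, 0.761),
}
retrosheet = {
    "Empty":   (0.549, 0.300, 0.116),
    "1st":     (0.936, 0.565, 0.258),
    "2nd":     (1.172, 0.728, 0.367),
    "3rd":     (1.464, 0.999, 0.419),
    "1st_2nd": (1.595, 0.936, 0.472),
    "1st_3rd": (1.767, 1.175, 0.572),
    "2nd_3rd": (2.045, 1.470, 0.694),
    "Loaded":  (2.391, 1.573, 0.798),
}

# Estimated minus actual, keyed by (base state, outs).
diffs = {
    (state, outs): estimated[state][outs] - retrosheet[state][outs]
    for state in estimated
    for outs in range(3)
}
worst = max(diffs, key=lambda key: abs(diffs[key]))
mean_abs = sum(abs(d) for d in diffs.values()) / len(diffs)

print("largest miss:", worst, round(diffs[worst], 3))
print("mean absolute difference:", round(mean_abs, 3))
```

The biggest misses are in the crowded states (bases loaded, and runners on first and second), which is where the purely additive assumption is weakest.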

You now have the basis to calculate the RE for any team or league.

More to come…

#1    John Beamer      (see all posts) 2006/06/13 (Tue) @ 09:57

Awesome stuff Tango - I can’t believe it is so close to the Markov approach ... I’ll probably give it a crack for 2005 data


#2    tangotiger      (see all posts) 2006/06/13 (Tue) @ 10:42

Article has been updated.


#3    tangotiger      (see all posts) 2006/06/15 (Thu) @ 11:57

Updated again…


#4    tangotiger      (see all posts) 2006/06/16 (Fri) @ 11:59

I recommend reading this thread, as some people are trying to create the RE matrix based on this thread:

http://www.baseball-fever.com/showthread.php?t=41206&page=4


#5    DM      (see all posts) 2006/06/23 (Fri) @ 10:38

Tango,

This is excellent stuff.  When I learned that the RE charts in The Book were based on Markov chains, I wondered how you did that. This helps answer that question.  Thanks very much.

Are you familiar with the DLSI model?  It’s mentioned in the book “Curve Ball”.  With it you can take OBP, 1B%, 2B%, 3B%, HR%, BB% and produce a run distribution that is pretty close to observed events.  I’ve been tinkering with it to produce run distributions for all base/out states, which can be used to develop RE charts for parks, and, more interestingly, lineups.

Thanks again,

DM


#6    tangotiger      (see all posts) 2006/06/23 (Fri) @ 11:33

DM, I read Curve Ball, but I don’t remember seeing that.  In any case, the authors are right, and if you follow my model to completion (repeating the steps for doubles and singles), you will get it.  I actually wrote a JavaScript program that does it, so I should post it here soon, and everyone will have access to the source code too.

You will have the problem that SB, CS, PK, BK will not be included.  As well, the program I wrote gives you the option to change the chances of moving the runner over an extra base on a single or double.

It’s pretty cool stuff, and I think once I release the source code, that others can pick up on it, and enhance it.


#7    tangotiger      (see all posts) 2006/06/23 (Fri) @ 11:57

I also recommend reading what http://www.stat.harvard.edu/People/Faculty/Carl_N._Morris presented.  (I suggest you click the last link, and then right-click the 2nd link, save-target-as to your hard-drive, and then open it with your text editor, and make sure you put “wrap”.)

I read that a while ago, but I seem to remember that he forced in something like a single always advances a runner 1 base, and a double 2 bases.  Anyway, what he does is 95% of what I do.


#8    DM      (see all posts) 2006/06/23 (Fri) @ 13:56

Thanks, Tango.

The DLSI model assumes runners on 2B score on singles, but all else advance one base on a 1B and 2 bases on a 2B.

Dan Fox has written some articles for Hardball Times earlier this year (he’s now with BBPro) with averages for baserunner advance (e.g. 46% of the time a runner on first gets to third on singles to RF).  I wonder if we could work those averages into the model somehow.  I’ve been using rough figures like that to allocate RE value among batters and runners on base hits with runners on.


#9    tangotiger      (see all posts) 2006/06/23 (Fri) @ 14:08

DM:

http://www.tangotiger.net/destmob.html

You would definitely use those numbers to figure out linear weights, and your Markov chain.


#10    tangotiger      (see all posts) 2006/06/26 (Mon) @ 08:48

I expect to release my simple Markov program (including source code) for all to use and expand under some open source licence.  Hopefully, I can get to this within a few weeks.


#11    tangotiger      (see all posts) 2006/06/26 (Mon) @ 08:53

I’m thinking I’ll include this licence:

http://creativecommons.org/licenses/by-nc-sa/2.5/


#12    Mike      (see all posts) 2006/07/04 (Tue) @ 10:39

Tango,

Great post!  When I too purchased the book I was wondering how you did it with Markov chains.

I’m looking forward to the Markov program; it should be a useful tool!

When you mention Markov chains in the book, you say that they are particularly helpful because you can change the input (in other words, you can put in Rivera, Papelbon or any other pitcher).  Other than finding out the “chance of scoring/winning” probabilities with these pitchers, what else can the Markov chains tell you in terms of analysis/evaluation when changing the input to a major league pitcher?  I know DePodesta used Markov chains to find values for the 24 base-out situations (which originates from a long time ago and is nothing new), but when you put Rivera in as your example, can the Markov chains tell you anything else that a GM would want to know?


#13    Mike      (see all posts) 2006/07/10 (Mon) @ 18:45

Not sure if you saw this post or not, tango.


#14    tangotiger      (see all posts) 2006/07/11 (Tue) @ 07:51

The RE matrix is at the heart of everything baseball.  From the RE matrix, you can generate a win expectancy (WE) matrix.  The WE matrix is required to analyze in-game strategies like bunting and stealing.  Against certain pitchers or environments, the RE matrix will tell you how much each event is worth, to the point that player A might be a better option than player B in certain environments, and vice-versa in others.


#15    Colin Wyers      (see all posts) 2008/11/11 (Tue) @ 04:02

Through the miracle of thread necromancy…

During the section on scoring from third with one out, you say:

1 - .553 = .467

Which is a typo; it should read .447.

Using (largely) this method (a few subtle tweaks here and there, like including WP/PB data) gave me this:

http://www.editgrid.com/user/cwyers/run_expectancy_all_years

I finally punted on working out a better estimate of scoring from third base on an out; the only method I came up with for it essentially coincides with the Retrosheet years entirely. Now I just need to work out what to DO with these.


#16    terpsfan101      (see all posts) 2008/11/11 (Tue) @ 04:16

Great Job!

Wasn’t SABR Matt looking for something like this a few years ago? What does necromancy mean? One of the songs on Rush’s 1975 album, Caress of Steel, is called “The Necromancer”. It is one of my favorite Rush songs.

Look at the RE from the bases empty no out state from 1894: .851 runs. In 1930 NL, the REOI with a runner on 1st and 0 outs is 1.113 runs.

If you estimated a frequency table for each event in the 24 base-out states, you could calculate LW for every season dating back to 1871. Easier said than done, however.


#17    Colin Wyers      (see all posts) 2008/11/13 (Thu) @ 04:27

Necromancy is, broadly speaking, magic having to do with life and death. More specifically, it’s the raising of the dead.

I can’t for the life of me figure out how to do this. Or I should correct myself and say I’ve figured out two ways to do this, and neither of them seems appropriate. The first is to just use a state-to-state transition matrix from the Retroera to determine the value of each event. I guess that would “work” if you were willing to accept that level of imprecision.

The other method involves estimating the frequency of each state from the available data - how often a PA happens in each state, essentially. The “problem” is that once I’ve estimated that I’m not sure I need the run expectancy matrixes I’ve calculated, because essentially by that point I’ll have a full Markov model!

(The problem is that you can’t summarize the odds of the bases being loaded as:

Odds of runner on first * odds of runner on second * odds of runner on third

Which was what I was doing. This should have occurred to me when I was figuring out these things in the first place. So, in order to estimate all of these things properly, I essentially have to build a Markov, I think. And once I’ve done that, I don’t need the RE tables from above, because I can build them from the state-to-state transition matrix. I could be wrong here.)


#18    terpsfan101      (see all posts) 2008/11/13 (Thu) @ 08:07

I’ve read a couple of books by Frederic H. Myers, including “Human Personality and Its Survival of Bodily Death”, and I don’t recall coming across one reference to necromancy.

After thinking about your historical RE tables for a couple of days, the main problem I see with trying to derive LW from them is accounting for the advancement value of hits and batting outs. It’s not hard to figure out the “getting-on” value for each event. That component can be figured from the RE table. Do we have any idea how often a batter went from 1st to 3rd on a single during the 1880’s: 70%, 60%, 50%, 40%?


#19    Colin Wyers      (see all posts) 2008/11/13 (Thu) @ 14:22

A lot of it can be assumed - some things, like going first to second on a batting out, seem very stable across the entire Retro era. Some seem a lot less stable, like scoring from second on a single - runners are less likely to take chances like that in high run-scoring environments.


#20    Peter Jensen      (see all posts) 2008/11/13 (Thu) @ 15:38

Colin - Is it just that the runners are less likely to take chances or is there a different distribution of the type of singles (infield, ground ball through infield, fly ball or line drive) that may be affecting the rate of 2nd to home on a single?


#21    Colin Wyers      (see all posts) 2008/11/13 (Thu) @ 15:47

It goes in the reverse direction for it to be that. If that was the case, you’d see MORE second to home on a single in the modern era, when in fact what you see is more runners holding up at third.


#22    Peter Jensen      (see all posts) 2008/11/13 (Thu) @ 17:27

Colin - Are you guessing or do you have examples that you can show me from high run environments (around 2000) and low run environments (mid 60s)?


#23    Colin Wyers      (see all posts) 2008/11/13 (Thu) @ 17:47

I ran the numbers year by year from the Retro database. I don’t have them available right now, but I’ll see what I can do tonight.


#24    Tangotiger      (see all posts) 2008/11/13 (Thu) @ 18:55

Here are numbers from a medium environment (4.3) and a high one (probably almost 5.0):
http://tangotiger.net/destmob1.html

Note that pre-1974 or so, you have to be careful with the PBP.  You won’t necessarily have it correct when it shows the guy went to 2B on a single.  A lot of times, Retrosheet doesn’t know.  So, they will PRESUME he ONLY went to 2B, even if he might have gone to 3B.  And so, a single with a guy on “2B” might be a single with a guy on 3B.


#25    john      (see all posts) 2008/11/13 (Thu) @ 19:11

I’ve been trying to follow this with 2007 data.  Here’s what I have so far:

Innings 43425.2
Home Runs 4957
Runs Scored 23322

My first line is correct I think.  It reads:
.537 .288 .109

Now to get the next line I need to add .398 .276 and .134 to these numbers?  How do we know that 39.8% of runners score from 1st with no outs?  And when I do that it gives me the line:

.935 .564 .243 respectively.  Is this correct?  These numbers look a bit high.  Looking at the RE table on BP for 2007 I get

.926 .542 .235


#26    Colin Wyers      (see all posts) 2008/11/13 (Thu) @ 19:22

Your figures are going to be a little high, because of some data you’re leaving out of the equation. Try this:

(IP*3) - (AB+SF+SH-H-ROE)

For 2007, that gives you:

(43425.6 * 3) - (167783 + 1540 + 1441 - 44977 - 1746) = 6235.8

That’s an approximation of how many outs on base there were. We have DP and CS data:

3985 + 1002 = 4987

That gives us:

6235.8 - 4987 = 1248.8

That’s (roughly) how many outs we’re undercounting in this analysis, as well as runners who aren’t scoring.
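The same bookkeeping as a Python sketch, using the 2007 totals as written in this comment (the innings figure is taken as 43425.6, exactly as typed above; the variable names are illustrative):

```python
innings = 43425.6
ab, sf, sh, hits, roe = 167783, 1540, 1441, 44977, 1746
double_plays, caught_stealing = 3985, 1002

total_outs = innings * 3
batting_outs = ab + sf + sh - hits - roe

# Outs made on the bases, then the part not explained by DP and CS.
outs_on_base = total_outs - batting_outs
unaccounted_outs = outs_on_base - (double_plays + caught_stealing)

print(round(outs_on_base, 1), round(unaccounted_outs, 1))  # 6235.8 1248.8
```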


#27    john      (see all posts) 2008/11/13 (Thu) @ 19:24

Ahhh thanks.


#28    Tangotiger      (see all posts) 2008/11/14 (Fri) @ 00:14

You can’t use the 39.8%.  The first thing you have to figure out is the chance of scoring from 3B, 2 outs:

In short, your chance of scoring from 3B = batting average + error average + “3 consecutive walks” average

The batting average is whatever it happens to be for the league (make sure to count SF as a regular out).  It was .267 in 1994.  The error average is typically around .015.  The do-over plays occur 10% of the time, which means 3 consecutive do-over plays is .001.  So, the chance of scoring from 3B with no one else on and two outs in 1994 is .283.

So, start from there…


#29    john      (see all posts) 2008/11/14 (Fri) @ 14:59

Yeah.  After writing that I read down a bit further and saw that. Thanks


#30    Tangotiger      (see all posts) 2008/11/14 (Fri) @ 15:08

Looking forward to seeing your RE chart…


#31    Tangotiger      (see all posts) 2009/09/04 (Fri) @ 13:26

Looks like Colin implemented the above in SQL:

http://basql.wikidot.com/run-expectancy-without-play-by-play

Great job!


#32    Tangotiger      (see all posts) 2009/10/30 (Fri) @ 14:58

I forgot Colin implemented this.

***

I noticed I made one mistake.  For chance of scoring from 3b and 2 outs, I said this:

Of the various base/out states, the easiest one to figure out the chance of scoring is the man on 3B, no one else on, and 2 outs.  What has to happen for this runner to score?  Well, the batter has to get a basehit, or reach base on error.  If he’s out, the inning is over.  If he walks or is hit by a pitch, it’s a “do-over”, unless you get three consecutive “do-over” plays.

In short, your chance of scoring from 3B = batting average + error average + “3 consecutive walks” average

The batting average is whatever it happens to be for the league (make sure to count SF as a regular out).  It was .267 in 1994.  The error average is typically around .015.  The do-over plays occur 10% of the time, which means 3 consecutive do-over plays is .001.  So, the chance of scoring from 3B with no one else on and two outs in 1994 is .283.

That’s incorrect.

It should be:
(hits + error) / (pa - walks - hitbatter) + .001

That’s because a do-over doesn’t count.  It’s like a 2-strike foul.
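A sketch of the corrected formula as a Python function.  The function name is made up, and the example inputs are hypothetical round numbers chosen only to show the call, not real league totals.

```python
def score_from_third_two_outs(hits, errors, pa, walks, hit_batters,
                              three_do_overs=0.001):
    """Chance of scoring from 3B, no one else on, 2 outs.

    Walks and hit batters are do-overs, so they are removed from the
    denominator instead of being counted as failures.
    """
    deciding_pa = pa - walks - hit_batters
    return (hits + errors) / deciding_pa + three_do_overs


# Hypothetical inputs, for illustration only.
example = score_from_third_two_outs(hits=240, errors=13, pa=1000,
                                    walks=90, hit_batters=10)
print(round(example, 3))
```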


#33    Tangotiger      (see all posts) 2010/11/10 (Wed) @ 10:36

Bumping…


#34    Bukanier      (see all posts) 2011/03/18 (Fri) @ 11:02

I’m trying to get to the probability of the runner on 1st (2nd and 3rd empty, no outs) scoring, the 39.8% here.

Is it enough to check if in those cases the total number of runs until the end of the inning is greater than 0 or not, at least if there are no baserunning, non-PA related outs in the remainder of the inning?

If that particular runner was out in a force out, and a run subsequently scored, isn’t that outside of the number that represents 0 outs, bases empty, the .555 here? Because in this case, it would be safe to trust the scoresheet information, namely that the forceout would have been a regular out without the particular runner on 1st.


#35    Tangotiger      (see all posts) 2011/03/18 (Fri) @ 11:27

In CWEVENT, Ted tracks the runner to the end of the inning.  There’s a field called something like RUN1_DEST_ID.  Something like that.


#36    Colin Wyers      (see all posts) 2011/03/18 (Fri) @ 12:08

_DEST_ID fields track the runner for that event only. _FATE_ID fields track the runner to the end of the inning.

Also, Retrosheet uses a destination of 5 and 6 to account for unearned runs (IIRC, 5 is for unearned runs, 6 is for runs counted as unearned for the team but earned for the pitcher).


#37    Tangotiger      (see all posts) 2011/03/18 (Fri) @ 12:46

Colin, ah, thanks for the correction.  I should have known better than to say something without checking it first.


#38    Bukanier      (see all posts) 2011/03/18 (Fri) @ 21:55

Thanks for the answers. I wish I was doing Retrosheet :) It’s for my local amateur baseball league in Germany, and all I have is scoresheets from 15 games, involving just my own team, and team stats for my own and another team, accounting for 21 of 45 games in 2010.

My strictly empirical RE24 looks like this:
     0     1     2
---  1.62  0.85  0.22
X--  2.50  1.52  0.33
-X-  2.11  1.44  0.45
--X  2.94  1.44  0.52
XX-  3.89  2.44  0.40
X-X  2.29  1.22  0.40
-XX  4.05  2.11  0.79
XXX  3.62  3.80  1.46

The secret ingredients tell me for the first line:
1.84 0.92 0.31
due to 0.02 HR/Inning.


#39    Tangotiger      (see all posts) 2011/03/19 (Sat) @ 08:37

Can you run it here and report back:

http://www.tangotiger.net/markov.html


#40    Bukanier      (see all posts) 2011/03/19 (Sat) @ 09:38

I got this:

10.611 : Runs Scored per Game

Chances of Scoring
Event   0 outs  1 out  2 outs  AVERAGE
1B/BB   0.612   0.416  0.191   0.406
Double  0.721   0.540  0.313   0.525
Triple  0.891   0.732  0.352   0.658

Run Expectancy
Bases  0 outs  1 out  2 outs
xxx    1.179   0.599  0.196
1xx    1.791   1.014  0.387
x2x    1.900   1.139  0.509
xx3    2.070   1.331  0.548
12x    2.580   1.622  0.733
1x3    2.696   1.768  0.761
x23    2.806   1.893  0.883
123    3.521   2.447  1.213

Obviously, because of catcher skill, the overall number of runs is significantly lower (there are 16.54 runs per 9 innings). Average SB/CS is somewhere around 10/1, and there are a lot of PBs and WPs.


#41    tangotiger      (see all posts) 2011/03/19 (Sat) @ 17:58

Your empirical of 1.62 runs per inning means 14.6 runs per 9IP.

If Markov is saying you should get only 10.6 runs per 9IP, then clearly a lot of non-batter events are happening.  (Do you count reaching base on error as a single?  You should for Markov purposes.)

Because of that, you should set the baserunner movement numbers a lot higher to account for things like SB/CS being out of whack.  Instead of runners on 1B moving 30% to 3B, bump it up to 60% or 70% or something.

Play with those movement numbers until you get 14.6 runs per game.  Once you do that, then you’ve got a more realistic RE chart.  Try it and report back if you can.


#42    Bukanier      (see all posts) 2011/03/22 (Tue) @ 10:44

The 1.62 runs per inning contains a lot of selection bias (my team’s good pitching and defense, and some scorers don’t properly note the sequence of non-batter events, and I can only safely reconstruct the low-scoring innings there). Using team stats I get 1.89 runs per inning.

With ROEs counted as singles (I could check for ROE doubles), and all extra-base values set to 1 except for ended innings, I get this:

15.644 : Runs Scored per Game

Event 0 outs 1 out 2 outs AVERAGE
1B/BB 0.791 0.588 0.247 0.542
Double 0.887 0.634 0.385 0.635
Triple 0.978 0.884 0.385 0.749

Bases 0 outs 1 out 2 outs
xxx 1.738 0.903 0.281
1xx 2.529 1.491 0.528
x2x 2.625 1.536 0.666
xx3 2.716 1.786 0.666
12x 3.444 2.183 0.935
1x3 3.510 2.384 0.935
x23 3.606 2.430 1.072
123 4.432 3.108 1.444


#43    Tangotiger      (see all posts) 2011/03/22 (Tue) @ 11:21

Great stuff.  If you need an RE chart for that kind of run environment, that’s what you should use.

You said your team scores 1.89 runs per inning (though “empirically” you get 1.62), and I’m not sure what that means.  If you have your total runs and your total innings, then that’s what you should work toward.  You should exclude from your counts the bottom half of any last or extra innings.


#44          (see all posts) 2011/03/22 (Tue) @ 11:39

Breakeven success rate on stolen bases is around 94%

Crazy

Going from 1st to 2nd or 2nd to 3rd gains only 0.1 runs with 0 outs.  If the player is caught stealing, the RE drops at least 1.6 runs.


#45    Tangotiger      (see all posts) 2011/03/22 (Tue) @ 11:48

Note that this basic Markov does not include DP.  So, that could be a big limitation possibly.


#46    Bukanier      (see all posts) 2011/03/22 (Tue) @ 12:36

DPs on the ground are very rare; in fact, I can’t remember one between 1st and 2nd. A good 1B once threw out a runner going from 2nd to 3rd after a GO last season.

My team had a SB-CS of 136-10.

Yeah, this explains (though it isn’t a good excuse for that) why I was fuming at my basecoach ordering a double steal and I was thrown out at 3rd, in a blowout win nonetheless.

The 1.62 figure is from scoresheet information of games involving my team only, excluding incomplete innings, and as written above, badly scored innings, the latter usually high scoring. I compared the runs to the end of the inning to the extremely small sampled RE.

Those are the only scoresheets I have right now.

In addition to that the offensive and defensive stats of another team (out of 6) are on the internet. Out of 45 total league games I have data for 27 games (15 games for each of the two excluding the 3 head-to-head meetings). That’s where the 1.89 is from (although incomplete innings are obviously not excluded here, I could do that for the games of my team only though).


#47    Tangotiger      (see all posts) 2011/03/22 (Tue) @ 13:05

xxx 1.738 0.903 0.281
1xx 2.529 1.491 0.528
x2x 2.625 1.536 0.666

Steal with 0 outs = +.096
CS with 0 outs = -1.626
breakeven = 1.626 / (1.626+.096) = 94%

Steal with 1 out = +.045
CS with 1 outs = -1.210
breakeven = 96%

Steal with 2 outs = +.138
CS with 2 outs = -.528
breakeven = 79%

If your SB/CS numbers are 136-10, that’s 93%.  Your fast guys should be stealing ALL THE TIME with 2 outs. (*)

(*) Of course, we really should be using win expectancy not run expectancy.
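
The breakeven arithmetic above can be reproduced with a short script. This is a minimal sketch: the three RE rows are copied from comment #42, while the function name `steal_second_breakeven` and the dictionary layout are illustrative choices, not anything from the original tool.

```python
# RE rows from comment #42: (0 outs, 1 out, 2 outs).
RE = {
    "xxx": (1.738, 0.903, 0.281),
    "1xx": (2.529, 1.491, 0.528),
    "x2x": (2.625, 1.536, 0.666),
}

def steal_second_breakeven(re, outs):
    """Success rate at which stealing 2nd neither gains nor loses expected runs."""
    start = re["1xx"][outs]
    gain = re["x2x"][outs] - start  # runner on 1st becomes runner on 2nd
    # Caught stealing: bases empty with one more out (inning over if that was the 3rd).
    after_cs = re["xxx"][outs + 1] if outs < 2 else 0.0
    loss = start - after_cs
    return loss / (loss + gain)

for outs in range(3):
    print(outs, "outs:", f"{steal_second_breakeven(RE, outs):.0%}")
```

The same function works for any RE table that has these three rows, so it can be rerun whenever the run environment is re-estimated.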


#48    Bukanier      (see all posts) 2011/03/22 (Tue) @ 14:00

Big thanks for the useful linear weights. And the big responsibility to explain to my teammates why wOBA is better than batting average. I may dive into the secret recipes again soon.


#49    Tangotiger      (see all posts) 2011/03/22 (Tue) @ 14:47

In your case, you’ll have a much different wOBA equation.  The HR weight and BB weight will both be closer to 1.

