THE BOOK--Playing The Percentages In Baseball

Friday, February 23, 2007

Run Expectancy by Run Environment

By Tangotiger, 12:05 PM

Ever wanted to have the run expectancy chart for a 3.5 RPG environment, or 2.4 or 6.7?  Here you go:


I published it on Google Docs.

Here’s how to read the first line
In a 2.00 runs per game environment, with the bases empty and 0 outs, this state will occur 25.9% of the time.  The run expectancy (RE) is 0.222 (this is the only “duh” part, as it’s 2.00/9).  You will be held scoreless from this point onward to the end of the inning 84.9% of the time.  You will score exactly 1 run 10.4% of the time.
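
As a quick sanity check on that first row, the quoted figures can be combined to see what is left over for the 2-or-more-run innings. This is only a sketch: the variable names are mine, and the only inputs are the numbers quoted above.

```python
# Figures quoted for the 2.00 RPG row: bases empty, 0 outs.
rpg = 2.00
p_zero = 0.849   # held scoreless for the rest of the inning
p_one = 0.104    # score exactly 1 run

# At the start of an inning, run expectancy is runs per game spread over 9 innings.
re_start = rpg / 9

# Whatever probability remains is the chance of scoring 2 or more runs.
p_two_plus = 1 - p_zero - p_one

# RE = 1 * P(1 run) + (average runs when scoring 2+) * P(2+ runs),
# so the average size of a 2+ run inning can be backed out.
avg_when_two_plus = (re_start - p_one) / p_two_plus

print(f"RE at start of inning: {re_start:.3f}")
print(f"P(2+ runs): {p_two_plus:.3f}")
print(f"Average runs in a 2+ run inning: {avg_when_two_plus:.2f}")
```

The implied average is a bit over 2.5 runs, though the inputs are rounded to three decimals, so treat that as approximate.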

I suggest you “ctrl-A” to select all the data, and copy it to Excel.  Delete the header and footer.  In Excel, you can do “Data / Filter / AutoFilter”, and it’ll set up drop-down filters to make this thing real easy to use.

Method
Now, how did I do all this?  I started with all games from 1999-2002 that had at least 8 full innings.  I selected only the first 8 innings.  I classified each game as A,B,C,D,E.  All the games with a low wOBA were placed in the A group, and all the games with a high wOBA were placed in the E group.

Then, it was simply a matter of weighting each group as I needed it.  For example, for the 2.00 RPG chart, the A group was weighted at 98%, the B group at under 2%, and the rest accordingly.  This let me use actual games with actual results, while weighting certain games more than others, so that I end up with a real 2.00 runs per game.
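
One way to picture the weighting, as a sketch only. The group averages in GROUP_RPG are made-up numbers, not the actual 1999-2002 values, and blending just the two groups that bracket the target is a simplification of my own for illustration (the description above gives the remaining groups some weight too). The function name blend_weights is hypothetical.

```python
# Hypothetical average runs per game for each group, lowest wOBA to highest.
# These are invented for illustration; the real group averages are not given.
GROUP_RPG = {"A": 1.97, "B": 3.50, "C": 4.60, "D": 5.90, "E": 8.40}


def blend_weights(group_rpg, target):
    """Return a weight per group so the weighted mean RPG equals target.

    Simplification: only the two adjacent groups that bracket the target get
    a non-zero weight. Assumes group_rpg is ordered from lowest to highest RPG
    with strictly increasing values.
    """
    names = list(group_rpg)
    for lo, hi in zip(names, names[1:]):
        r_lo, r_hi = group_rpg[lo], group_rpg[hi]
        if r_lo <= target <= r_hi:
            # Linear interpolation between the two bracketing groups.
            w_hi = (target - r_lo) / (r_hi - r_lo)
            weights = {name: 0.0 for name in names}
            weights[lo] = 1.0 - w_hi
            weights[hi] = w_hi
            return weights
    raise ValueError("target is outside the range covered by the groups")


weights = blend_weights(GROUP_RPG, 2.00)
blended_rpg = sum(weights[g] * GROUP_RPG[g] for g in GROUP_RPG)
for name, w in weights.items():
    print(f"{name}: {w:.1%}")
print(f"weighted RPG: {blended_rpg:.2f}")
```

With these invented averages the blend comes out near 98% A and 2% B, but only because I picked the averages to echo the example above. The same weights would then be applied to each game's base-out states and runs scored to build the chart.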

You will of course notice that I removed all 9th and extra innings, which means a lot of smallball-9thInning-style data is not represented in the RE charts.  This may actually be a good thing, since one-run strategies are best evaluated in terms of “what if I don’t play for 1-run?”.  Therefore, these charts are not polluted with such events.

In any case, I’ve already provided on my site and in The Book the RE matrix that included those events.  The reader is free to use whichever is appropriate.

Next?

I’m hoping that Fangraphs.com or Baseball-Reference.com or Retrosheet.org uses this file, or applies my methodology to create their own files.  And from that, you can generate value-added performance results.  Anyone wanting to do so should make sure to let their readers have free access to the results.  In return, I grant you a perpetual, non-exclusive licence.

I’m also going to be using this file to generate WE and LI charts by run environment.  I have already arranged to provide these charts to Fangraphs.com.  I’d be happy to extend that offer to whoever comes calling, with the same provision as the previous paragraph.

#1    John Beamer      (see all posts) 2007/02/24 (Sat) @ 05:18

Tango

This is pretty cool stuff. Quick question though: What is stopping you from generating the RE matrix from a Markov? I would have thought that would have been a lot easier?


#2    tangotiger      (see all posts) 2007/02/24 (Sat) @ 09:59

I could have generated the five base RE matrix from Markov, and then extrapolated that as I have done. 

However, with Markov, I force the state-to-state transitions as a constant, and force the frequency of each positive event to be the same, relative to the other positive events, and simply modify the frequency of the batting outs.  While this is a quick and cool way to generate the Markov (as I did in The Book, for the 3.2 RPG table), I’m not sure that I have such solid ground to do that.
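
To make the Markov idea concrete, here is a toy sketch, not the model from The Book. The event mix in POSITIVE_MIX is invented, and the runner-advancement rules are deliberately crude (runners hold on outs, a walk moves only forced runners, and on a hit every runner advances as many bases as the batter). What it does share with the description above is the structure: the positive events keep the same frequency relative to each other, and only the out rate is changed to hit a target run environment.

```python
from itertools import product

# Share of each positive event among all non-out plate appearances.
# Invented values for illustration; they sum to 1 and are held fixed.
POSITIVE_MIX = {"walk": 0.27, "single": 0.47, "double": 0.14,
                "triple": 0.02, "homer": 0.10}
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def advance(bases, event):
    """Return (new_bases, runs) for a non-out event; bases = (on1, on2, on3)."""
    on1, on2, on3 = bases
    if event == "walk":  # only forced runners move
        if on1 and on2 and on3:
            return (True, True, True), 1
        if on1 and on2:
            return (True, True, True), 0
        if on1:
            return (True, True, on3), 0
        return (True, on2, on3), 0
    n = BASES_GAINED[event]
    # Position 0 is the batter, 1..3 are the occupied bases; everyone moves n bases.
    positions = [0] + [i + 1 for i, on in enumerate(bases) if on]
    new, runs = [False, False, False], 0
    for pos in positions:
        if pos + n >= 4:
            runs += 1
        else:
            new[pos + n - 1] = True
    return tuple(new), runs


def run_expectancy(p_out, sweeps=100):
    """Run expectancy for every (outs, bases) state, given the out rate per PA."""
    probs = {e: (1 - p_out) * share for e, share in POSITIVE_MIX.items()}
    states = list(product([False, True], repeat=3))
    # States with 3 outs stay at 0.0: the inning is over.
    table = {(outs, b): 0.0 for outs in range(4) for b in states}
    for _ in range(sweeps):  # repeated sweeps converge because p_out > 0
        for outs in (2, 1, 0):
            for b in states:
                total = p_out * table[(outs + 1, b)]  # runners hold on an out
                for event, p in probs.items():
                    nb, runs = advance(b, event)
                    total += p * (runs + table[(outs, nb)])
                table[(outs, b)] = total
    return table


def p_out_for_target(target_rpg, lo=0.5, hi=0.9):
    """Bisect on the out rate until 9 * RE(bases empty, 0 outs) hits target_rpg."""
    empty = (0, (False, False, False))
    for _ in range(40):
        mid = (lo + hi) / 2
        if 9 * run_expectancy(mid)[empty] > target_rpg:
            lo = mid  # too many runs, so raise the out rate
        else:
            hi = mid
    return (lo + hi) / 2


p_out = p_out_for_target(3.2)
table = run_expectancy(p_out)
print(f"out rate for 3.2 RPG in this toy model: {p_out:.3f}")
print(f"RE, runner on 1B, 0 outs: {table[(0, (True, False, False))]:.3f}")
```

Because the transitions and the event mix are frozen, every run environment produced this way differs only in its out rate, which is exactly the assumption being questioned above.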

You can compare the 3.2 from the Google Docs to what I have in The Book, or the 5.0 as well.

The process I’ve outlined here however can now be used to go through all the Retrosheet years, and create a larger sample for the various base RE tables.

However, even that is not good.  If you think about why a runner goes 1B to 3B on a single, that’s based on EXPECTATION of the run environment, and not on the after-the-fact knowledge of the game wOBA.  Therefore, more work needs to be done to figure out how the state-to-state transitions are affected by the EXPECTED run environment.


#3    Tangotiger      (see all posts) 2007/02/28 (Wed) @ 13:48

I changed my process slightly (most won’t notice the difference), so here’s the latest Google Docs.

I’ve also sent Fangraphs a custom WE chart by run environment as well.  This was the biggest issue with WPA on Fangraphs (wasn’t summing to zero at the hitter and pitcher level).  This will no longer be an issue.

More minor issues will be a custom LI chart by run environment, and park factors. 

An even more minor issue is the HFA.  As I wrote on Fangraphs:

The “HFA” is made up of 4 things:
1 - getting to play at home
2 - getting 3 more outs than your opponent when you are at bat
3 - getting opponent to play you differently because they have 3 more outs to get through
4 - use of relievers

The charts I provided Fangraphs only consider #2. As I’ve repeatedly said, I don’t care too much about the other 3, since it’ll come out in the wash over a season, and even for a game the impact is limited. And I have much bigger fish to fry.


#4    Tangotiger      (see all posts) 2010/03/24 (Wed) @ 10:49

Bumping for Peter…

(And, where is John Beamer?)


#5    Peter Jensen      (see all posts) 2010/03/24 (Wed) @ 11:29

Thank you Tango.  I am not sure that I saw this post when you made it originally.  It is an interesting methodology and helpful to those who don’t have the data or inclination to calculate the RE tables each year.  The question that I now have is how is Fangraphs using these tables to calculate RE24?  Also, linear weights should be derived from the same data as the RE tables if RE24 and linear weights for a player are going to be compared and inferences drawn from the comparison, and I am not sure if that is possible using your methodology.


#6    Tangotiger      (see all posts) 2010/03/24 (Wed) @ 11:57

There’s 3245 threads in this blog.  Even if you DID see it, I can’t blame you for not remembering. 

It’s often happened to me that I don’t remember a thread I posted, wished I had done some work on something, and then see that I had already done the work.

***

The run environment calculation is different between Fangraphs and B-R.com.  Sean uses his park factors.  I don’t know if Fangraphs uses Patriot’s park factors, or if he uses my quick park environment calculation.

***

Right, in order to make the apple-to-apple comparison, we should calculate the run value of a single from the same RE table.  Chances are, the run value of the single relative to the double is .30 runs, regardless of how the RE tables are generated.

One thing I’ve been meaning to do is create the RE tables in a more dynamic form.  The one area I wanted to tackle was the HR rate.  The HR has such an impact on the run value of the OTHER events that I wanted to use that as a parameter, rather than just saying “run environment”.  It makes a difference if the run environment is 4.0 runs per game and you have 0.7 or 1.2 HR per game.  The run value of the single and double and walk will change a lot.

How much?  Well, we can run the Markov calculator to see.


#7    Peter Jensen      (see all posts) 2010/03/24 (Wed) @ 12:16

Tango - Doesn’t it make more sense to just calculate the RE tables empirically every year and calculate the linear weights at the same time?  It is not particularly difficult to do.  Or the other choice would be to calculate every 3 years or 5 years or ten years.  It really doesn’t matter much as long as the linear weights are calculated from the same data and the same events are used for each. 

By the way I am now calculating my linear weights by the added runs method that you used for table 4 in The Book rather than the more commonly used delta RE method.  It corrects for IBB and the discretionary aspect of walks and HRs, without significantly changing most of the other values.  I also have increased the number of categories by adding infield singles, outfield air ball singles, outfield GB singles, and double plays, and I separated safe FCs from out FCs and pickoffs from pickoff errors and K from K+CS and K+SB.  No big changes, but a slightly more rational approach for allocating value.

