THE BOOK--Playing The Percentages In Baseball


Monday, July 30, 2007

Base/Out and Run Frequency

By Tangotiger, 11:48 PM

I have this lying around, and people keep asking me “how often does each base/out state occur?”  So here you go:


Legend:
reoi: runs to end of inning
R0: scoreless innings
R1+: at least 1 run scoring
freq: frequency of base/out state

IIRC, this was all the games, excluding partial innings and home halves of 9th and later innings

So, runner on 1B and 0 outs: 44.4% of the time a run scored.  Runner on 2B and 1 out: 41.4% of the time a run scored.

reoi   R0     R1+    freq   basesit  outs

0.561  0.701  0.299  0.240  Empty    0
0.300  0.824  0.176  0.170  Empty    1
0.118  0.921  0.079  0.136  Empty    2

0.959  0.556  0.444  0.056  1st      0
0.576  0.712  0.288  0.064  1st      1
0.253  0.862  0.138  0.062  1st      2

1.196  0.361  0.639  0.019  2nd      0
0.730  0.586  0.414  0.031  2nd      1
0.348  0.773  0.227  0.039  2nd      2

1.579  0.352  0.648  0.015  1st_2nd  0
0.974  0.566  0.434  0.026  1st_2nd  1
0.470  0.764  0.236  0.032  1st_2nd  2

1.486  0.128  0.872  0.003  3rd      0
0.989  0.332  0.668  0.011  3rd      1
0.390  0.732  0.268  0.016  3rd      2

1.911  0.121  0.879  0.006  1st_3rd  0
1.245  0.342  0.658  0.012  1st_3rd  1
0.537  0.711  0.289  0.015  1st_3rd  2

2.064  0.140  0.860  0.004  2nd_3rd  0
1.472  0.299  0.701  0.009  2nd_3rd  1
0.643  0.717  0.283  0.010  2nd_3rd  2

2.412  0.122  0.878  0.004  Loaded   0
1.642  0.323  0.677  0.010  Loaded   1
0.815  0.671  0.329  0.012  Loaded   2

#1    John Beamer      (see all posts) 2007/07/31 (Tue) @ 02:08

Tango, what’s the source for this? Is it actual Retrosheet pbp data or is it Markov-generated?


#2    tangotiger      (see all posts) 2007/07/31 (Tue) @ 07:26

I think this was from actual 1999-2002 data.  And now that I think about it, it’s only innings 1 through 8, excluding partial innings.  I think.


#3    John Beamer      (see all posts) 2007/07/31 (Tue) @ 08:33

Out of interest, here are my numbers for the 2006 season, which are generated from a Markov model:

       0 out   1 out   2 out   Total
xxx    24.1%   17.0%   13.3%   54.4%
1xx    6.2%    7.3%    7.5%    21.0%
x2x    1.5%    2.7%    3.5%    7.6%
xx3    0.2%    0.8%    1.4%    2.4%
12x    1.6%    2.5%    3.2%    7.3%
1x3    0.5%    1.1%    1.6%    3.2%
x23    0.3%    0.7%    0.8%    1.8%
123    0.4%    0.9%    1.0%    2.3%


#4    tangotiger      (see all posts) 2007/07/31 (Tue) @ 10:21

Fairly close, except for the 1xx line.  Does your Markov process include GIDP, SB?
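
To see where the two sets of frequencies part ways, the comparison can be sketched in a few lines of Python. The per-out percentages are copied from the freq column in the post and from the Markov table in comment #3; the names `ACTUAL`, `MARKOV` and `gaps` are illustrative only. Keep in mind the two tables cover different seasons (1999-2002 actual, 2006 Markov), so some of the gap is just that.

```python
# Percent of plate appearances by base state, for 0, 1 and 2 outs.
ACTUAL = {  # freq column from the post, times 100
    "xxx": (24.0, 17.0, 13.6),
    "1xx": (5.6, 6.4, 6.2),
    "x2x": (1.9, 3.1, 3.9),
    "xx3": (0.3, 1.1, 1.6),
    "12x": (1.5, 2.6, 3.2),
    "1x3": (0.6, 1.2, 1.5),
    "x23": (0.4, 0.9, 1.0),
    "123": (0.4, 1.0, 1.2),
}
MARKOV = {  # 2006 Markov numbers from comment #3
    "xxx": (24.1, 17.0, 13.3),
    "1xx": (6.2, 7.3, 7.5),
    "x2x": (1.5, 2.7, 3.5),
    "xx3": (0.2, 0.8, 1.4),
    "12x": (1.6, 2.5, 3.2),
    "1x3": (0.5, 1.1, 1.6),
    "x23": (0.3, 0.7, 0.8),
    "123": (0.4, 0.9, 1.0),
}

# Markov minus actual, summed over outs, in percentage points.
gaps = {state: sum(MARKOV[state]) - sum(ACTUAL[state]) for state in ACTUAL}
largest = max(gaps, key=lambda state: abs(gaps[state]))

for state, gap in gaps.items():
    print(f"{state}: {gap:+.1f}")
print("largest gap:", largest)   # largest gap: 1xx
```

The 1xx line is off by about 2.8 points, more than double any other base state.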


#5    Pizza Cutter      (see all posts) 2007/07/31 (Tue) @ 12:03

Tom, I can very easily update that to reflect 2000-2006 using Retrosheet logs.  Interested?


#6    Tangotiger      (see all posts) 2007/07/31 (Tue) @ 12:11

Sure, feel free to post it here, or in a Google Doc.


#7    John Beamer      (see all posts) 2007/07/31 (Tue) @ 13:12

Yes, it includes both (although SBs are reasonably situational and I don’t adjust for that). I have been wondering about the discrepancy and I think it may be to do with how I treat errors. If there is an error on the fielding play my Markov logs that as a batting play from xxx to 1xx and then a “non-batting” error advancing from 1xx to x2x. I think that accounts for the difference.

For instance, assume that 24% of PA are xxx_0, that single + walk (ignoring HBP) is .25, and that the out rate is .66. These are 2006 data, pretty much.

Then you get

xxx_0 = 24%
1xx_0 = 6%
xxx_1 = 16%
1xx_1 = 6%*.66 + 16%*.25= 7.9%
1xx_2 = 8.5% (by the same process)

1xx_Y = 22% ish, which is more in line with my Markov than the actual. Of course, there are other ways to get into the 1xx state that will add to this, and other ways to get out of it (SB, force outs, errors). The real process is dynamic rather than the static picture I present above.
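
The static arithmetic above can be reproduced with a short Python sketch. It is only a reading of the comment, not the commenter's actual Markov model: it assumes the three inputs quoted there (24% of PA at xxx_0, .25 for single + walk, .66 for outs), and it assumes an out with a runner on first leaves that runner on first, which is what the 1xx_1 formula implies. The variable names are made up for illustration.

```python
P_ON_FIRST = 0.25   # single or walk per PA (from the comment)
P_OUT = 0.66        # out per PA (from the comment)

empty = [0.24]                      # share of PA at xxx_0 (from the comment)
first = [empty[0] * P_ON_FIRST]     # share of PA at 1xx_0

for outs in (1, 2):
    # Bases stay empty after an out from bases empty.
    empty.append(empty[-1] * P_OUT)
    # Reach 1xx either by an out from 1xx (runner holds) or a single/walk from xxx.
    first.append(first[-1] * P_OUT + empty[-1] * P_ON_FIRST)

total_first = sum(first)

for outs, share in enumerate(first):
    print(f"1xx_{outs} = {share:.1%}")
print(f"1xx total = {total_first:.1%}")
```

Run without intermediate rounding, this gives roughly 6.0%, 7.9% and 7.8% for 0, 1 and 2 outs. The last figure is a little below the 8.5% quoted, which may reflect rounding or terms not spelled out in the comment, but the total still lands at about 22%.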

Given the difference is between the 1xx and x2x in my model, and given how I account for errors I think that is the issue.

Thoughts?


#8    John Beamer      (see all posts) 2007/07/31 (Tue) @ 13:58

On second thoughts, my error explanation probably isn’t correct. After all, the majority of errors on first are those that turn an out into 1xx (from xxx). Errors that allow players to get to x2x from xxx after a single will be much rarer.

Hmm. I’m a bit stumped.

Taking out SB/CS changes the % of 1xx by about .3% (up). Taking out GIDP changes the 1xx % by another .8% (up).

As I said, thoughts appreciated.


#9    John Beamer      (see all posts) 2007/07/31 (Tue) @ 14:00

Tango, if you are interested, I am happy to share my Markov with you. The complete version is large, but I can send you a cut-down version that will allow you to see the logic at least.


#10    John Beamer      (see all posts) 2007/07/31 (Tue) @ 16:45

Here is the 2000-05 Retrosheet data:

       0 out   1 out   2 out   Total
xxx    23.4%   16.7%   13.3%   53.4%
1xx    5.9%    6.8%    6.8%    19.6%
x2x    1.8%    3.1%    3.9%    8.9%
xx3    0.3%    1.0%    1.6%    2.9%
12x    1.4%    2.6%    3.3%    7.3%
1x3    0.5%    1.2%    1.6%    3.3%
x23    0.3%    0.9%    1.0%    2.2%
123    0.4%    1.0%    1.2%    2.5%


#11    John Beamer      (see all posts) 2007/07/31 (Tue) @ 16:55

I seem to be having a one-man conversation here. No matter, it is fun.

I think I can account for the discrepancy between Markov and actual. It is situational hitting. Here is the probability of getting either a single or walk by base state:

xxx    0.234
1xx    0.248
x2x    0.292
xx3    0.308
12x    0.236
1x3    0.278
x23    0.361
123    0.251

For other hitting events (2B/3B/HR) there is less difference:
xxx    0.093
1xx    0.094
x2x    0.089
xx3    0.087
12x    0.087
1x3    0.097
x23    0.092
123    0.102

This means that there is less chance of going xxx_0 to 1xx_0 in reality than the model assumes.
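
A quick way to see the point is to express each single-or-walk rate relative to the bases-empty rate. The Python sketch below uses only the numbers from the first list in this comment; the names `ON_FIRST_RATE` and `relative` are illustrative.

```python
# Probability of a single or walk, by base state (from the list above).
ON_FIRST_RATE = {
    "xxx": 0.234, "1xx": 0.248, "x2x": 0.292, "xx3": 0.308,
    "12x": 0.236, "1x3": 0.278, "x23": 0.361, "123": 0.251,
}

baseline = ON_FIRST_RATE["xxx"]
relative = {state: rate / baseline for state, rate in ON_FIRST_RATE.items()}
lowest = min(ON_FIRST_RATE, key=ON_FIRST_RATE.get)

for state, ratio in relative.items():
    print(f"{state}: {ratio:.2f}x the bases-empty rate")
print("lowest rate:", lowest)   # lowest rate: xxx
```

Bases empty has the lowest rate of the eight states, so a model that applies one pooled rate everywhere would overstate how often xxx turns into 1xx.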


#12    tangotiger      (see all posts) 2007/07/31 (Tue) @ 17:10

The biggest difference is definitely in walks.  The next big one, far behind, is HR.  The others (singles, doubles, triples, hit batters) are fairly random, from what I remember.


#13    John Beamer      (see all posts) 2007/07/31 (Tue) @ 17:16

Yup, you’re right. Walks are a bigger factor than 1B, but 1B aren’t altogether unimportant:

       1B/AB    BB/PA
xxx    17%    7%
1xx    19%    7%
x2x    17%    14%
xx3    18%    14%
12x    17%    8%
1x3    20%    8%
x23    18%    20%
123    19%    6%

Apologies for different denominators—easiest way for me to pull the data
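
One rough way to put a number on "walks are a bigger factor" is to compare how much each rate moves across base states. This Python sketch uses the rounded percentages from the table above, denominators as given; `spread` is just an illustrative helper name.

```python
# Rounded percentages by base state, copied from the table above.
SINGLE_PER_AB = {"xxx": 17, "1xx": 19, "x2x": 17, "xx3": 18,
                 "12x": 17, "1x3": 20, "x23": 18, "123": 19}
WALK_PER_PA = {"xxx": 7, "1xx": 7, "x2x": 14, "xx3": 14,
               "12x": 8, "1x3": 8, "x23": 20, "123": 6}


def spread(rates):
    """Gap in percentage points between the highest and lowest base state."""
    return max(rates.values()) - min(rates.values())


print(spread(SINGLE_PER_AB), spread(WALK_PER_PA))   # 3 14
```

Singles vary by 3 points across base states and walks by 14, with the walk rate highest when first base is open and a runner is in scoring position.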


#14    tangotiger      (see all posts) 2007/07/31 (Tue) @ 17:25

I’m ok with it, as long as “AB” really means “AB+SF”.


#15    John Beamer      (see all posts) 2007/07/31 (Tue) @ 17:50

No, it didn’t. But below it does:

      1B/AB
xxx    17.1%
1xx    19.0%
x2x    17.2%
xx3    17.0%
12x    17.0%
1x3    18.7%
x23    17.0%
123    17.9%
Average = 17.5%. The 1B effect is there but pretty small; as you said, it is mostly walks.
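
The comment does not say how the 17.5% average was computed. As a sketch under a stated assumption, the Python below weights each base state by its Total share from the 00-05 table in comment #10. Those shares are per PA rather than per AB+SF, so they are only a proxy for the true weights. The names are illustrative.

```python
# 1B rate (percent) by base state, from the table above.
SINGLE_RATE = {"xxx": 17.1, "1xx": 19.0, "x2x": 17.2, "xx3": 17.0,
               "12x": 17.0, "1x3": 18.7, "x23": 17.0, "123": 17.9}
# Assumed weights: Total column of the 00-05 frequency table (percent of PA).
WEIGHT = {"xxx": 53.4, "1xx": 19.6, "x2x": 8.9, "xx3": 2.9,
          "12x": 7.3, "1x3": 3.3, "x23": 2.2, "123": 2.5}

simple = sum(SINGLE_RATE.values()) / len(SINGLE_RATE)
weighted = (sum(SINGLE_RATE[s] * WEIGHT[s] for s in SINGLE_RATE)
            / sum(WEIGHT.values()))

print(round(simple, 1), round(weighted, 1))   # 17.6 17.5
```

The unweighted mean of the eight rows is 17.6%, while the frequency-weighted figure comes out at 17.5%, which is at least consistent with the quoted average being a weighted one.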


#16    tangotiger      (see all posts) 2007/07/31 (Tue) @ 19:38

The 1xx for the singles doesn’t surprise me.  It’s one of the findings in The Book, especially for LHH. (Not really a big finding, since it’s conventional wisdom.)


#17    John Beamer      (see all posts) 2007/08/01 (Wed) @ 13:02

Tom, a quick question. In your data, how do you account for SB? If a runner goes from 1xx to x2x on a SB, even though this is still the same PA, I assume you add 1 to the 1xx bucket and 1 to the x2x bucket?


#18    tangotiger      (see all posts) 2007/08/01 (Wed) @ 13:30

I track two items, “n” and “PA”.  So, for a SB, I add 1 to each n-bucket.  But the PA-bucket is only incremented once (at the end of the PA).

Saves me the trouble of trying to justify anything by doing everything.
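
The two-counter bookkeeping described here might look something like the Python sketch below. It is a guess at the idea, not the actual code behind the table: the event format (a base/out state plus a flag for whether the play ends the plate appearance) and the name `tally` are assumptions made for illustration.

```python
from collections import Counter


def tally(events):
    """Count base/out states two ways.

    events: iterable of (state, ends_pa) pairs, one per play, where state is
    the base/out state when the play starts and ends_pa is True if the play
    ends the plate appearance (hypothetical format).
    """
    n = Counter()    # every play counts, including mid-PA events like a SB
    pa = Counter()   # only the play that ends the PA counts
    for state, ends_pa in events:
        n[state] += 1
        if ends_pa:
            pa[state] += 1
    return n, pa


# Runner on first, 0 out: he steals second, then the batter strikes out.
events = [("1xx_0", False),   # stolen base, PA continues
          ("x2x_0", True)]    # strikeout ends the PA
n, pa = tally(events)
print(dict(n))    # {'1xx_0': 1, 'x2x_0': 1}
print(dict(pa))   # {'x2x_0': 1}
```

Keeping both counters means the same pass over the data supports either definition of frequency, per play or per plate appearance.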

