THE BOOK--Playing The Percentages In Baseball


Wednesday, October 25, 2006

The Perfect Run Modeler

By Tangotiger, 03:19 PM

It’s finally published:

http://www.tangotiger.net/markov.html


#1    David Smyth      (see all posts) 2006/10/25 (Wed) @ 16:22

Very cool.

A couple suggestions: remove the default stats, such as the 37 AB. It just makes more work for us to have to delete them to enter our desired numbers. And give the out value, in addition to those for the positive outcomes.

I entered the following ordinary line of 550 AB, 144 H, 26 2b, 2 3b, 14 HR, 52 BB. This comes to .262/.326/.393

The values I got were .379/.491/.778/1.074/1.477

The BB and HR values look too high. Does this have to do with my using the default runner advancement numbers?
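The slash line quoted in this post can be reproduced from the raw totals. A minimal sketch, assuming OBP is computed as (H + BB) / (AB + BB) since no HBP or SF are listed; the function and field names are illustrative and are not taken from the modeler's code:

```javascript
// Slash line (BA / OBP / SLG) from raw batting totals.
// Assumption: OBP = (H + BB) / (AB + BB); HBP and SF are ignored
// because the post does not list them.
function slashLine({ ab, h, doubles, triples, hr, bb }) {
  const singles = h - doubles - triples - hr;
  const totalBases = singles + 2 * doubles + 3 * triples + 4 * hr;
  return {
    ba: h / ab,
    obp: (h + bb) / (ab + bb),
    slg: totalBases / ab,
  };
}

const line = slashLine({ ab: 550, h: 144, doubles: 26, triples: 2, hr: 14, bb: 52 });
// Prints: 0.262 / 0.326 / 0.393
console.log([line.ba, line.obp, line.slg].map((x) => x.toFixed(3)).join(" / "));
```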


#2    tangotiger      (see all posts) 2006/10/25 (Wed) @ 18:03

No, it has to do with no runners being put out on bases.  For example, if when you reach first base you have a 10% chance of being out on a DP, that reduces the value of being on 1B.  But, if you have a 0% chance of getting a DP, that increases your value of being on 1B.  Furthermore, with more guys on 1B, the walk becomes a force in advancing runners over.  Finally, because I assume randomness, the number of walks with men on 1B or bases empty is exactly the same.  This is not true in reality.  So, an extra .02 runs here, .02 runs there, and you get .05 runs too much.

For the HR, again, in reality, HR are hit disproportionately with bases empty.  As well, there are more runners on base in my model (no outs on base), so that helps the HR more.

I put in those default values as a way to “force” in the LWTS values as close to reality.  That is, while the true advancing to third on a single is 30-35%, I put in 25% to account for the fact that there are no outs on base.  I just played around with the numbers so that the LWTS figure in my perfect model matches reality as best I could.

You can of course create your own model.

This is good data to know:
http://www.tangotiger.net/destmob.html


#3    John Beamer      (see all posts) 2006/10/26 (Thu) @ 04:47

Tango, great stuff. Quick question on reEngine(). For the calculation “state_3b_2_2”, what do the numbers refer to?

Thanks


#4    tangotiger      (see all posts) 2006/10/26 (Thu) @ 06:03

I should probably put some documentation in there.  In your case, that means:
lead runner is on 3B
there are two outs
there are two other runners on base

The code shows:
state_3b_2_2 = freqOUT;

This means that the chance he will NOT score is the chance that the batter makes an out.

This code here:
state_3b_2_1 = freqBB*state_3b_2_2+freqOUT;

This means there is a lead runner on 3B, two outs, and only ONE other runner on base.  Therefore, the chance of the lead runner NOT scoring equals the chance of the batter drawing a walk times the chance the runner does not score from state_3b_2_2, plus the chance that the batter makes an out.

You just keep working backwards based on this idea until you get to lead runner on 1B, no outs.
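The two-out states described above can be chained in a few lines. This is only a sketch: the frequencies come from the 550 AB / 144 H / 52 BB line quoted earlier in the thread, and `state_3b_2_0` is an assumed extension of the same pattern (a walk adds one trailing runner), not a line copied from the modeler:

```javascript
// Per-PA frequencies from the 550 AB / 144 H / 52 BB line used in the thread.
const pa = 550 + 52;
const freqBB = 52 / pa;
const freqOUT = (550 - 144) / pa;

// Chance that the lead runner on 3B does NOT score, with two outs.
// Last index = number of other runners on base.
const state_3b_2_2 = freqOUT;                         // bases loaded: only an out strands him
const state_3b_2_1 = freqBB * state_3b_2_2 + freqOUT; // a walk loads the bases; an out ends the inning
// Assumption: same pattern with no trailing runner (a walk moves to state_3b_2_1).
const state_3b_2_0 = freqBB * state_3b_2_1 + freqOUT;

// Chance of scoring is the complement of each state.
for (const [name, value] of Object.entries({ state_3b_2_2, state_3b_2_1, state_3b_2_0 })) {
  console.log(name, "chance of scoring:", (1 - value).toFixed(3));
}
```

Working from the most crowded state back toward the emptiest one means every term on the right-hand side is already known when it is needed.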


#5    tangotiger      (see all posts) 2006/10/26 (Thu) @ 08:04

If you want to change the run environment real quick, just change your value for the AB.  For example, under the default values, change AB from 37 to 11.  Or even 10.1, or 10.01.

You’ll see how the run values all converge toward 1.00.


#6    tangotiger      (see all posts) 2006/10/26 (Thu) @ 08:11

Any suggestions for changes?  I’m thinking I can include K easy enough.

The user just needs to be aware that the “chances of moving on an out” at the bottom will exclude Ks.  So, it’ll read for Event: “Out, not K”.  I can then include the K value, relative to the out value, in the LWTS portion, so you can see how much impact the K has, and in what kind of environments it has a lot or a little impact.

Like I said, this model ONLY works if there’s no outs on base, nor non-PA runner movements, so please, don’t suggest that.  By definition, every player in the batting order must be exactly the same.


#7    John Beamer      (see all posts) 2006/10/29 (Sun) @ 09:23

I love the logic of this run model. The Markov chains that I have done in the past have involved fairly complex matrix algebra but you have simplified it greatly with your negative logic.

I adapted the code to include double plays and using David’s numbers with the batter hitting into 15 DPs I get the following values:

0.354 : BB
0.468 : 1B
0.784 : 2B
1.086 : 3B
1.493 : HR

HR still a shade high. I’ll probably add OOB as well as stolen bases over the next week or so.


#8    John Beamer      (see all posts) 2006/10/29 (Sun) @ 09:37

Quick point. If you put 100AB and 75HR, for instance, the model breaks down ... you get -0.8 runs per game with values:

1.460 : BB
2.507 : 1B
1.792 : 2B
1.111 : 3B
1.762 : HR


#9    David Gassko      (see all posts) 2006/10/29 (Sun) @ 11:01

Quick point. If you put 100AB and 75HR, for instance, the model breaks down ... you get -0.8 runs per game with values:

***

John, are you sure you remembered to put 75 hits? I get 81 runs per game with the following linear weight values:

0.892 : BB
0.892 : 1B
0.893 : 2B
0.904 : 3B
1.000 : HR

I just checked, and you did forget to specify 75 hits. You can’t have 0 hits and 75 home runs!
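The mix-up above is the kind of thing a small input check would catch, since the hit field has to contain the doubles, triples and home runs. A hypothetical validator, with field names and messages chosen for illustration rather than taken from the modeler:

```javascript
// Hypothetical input check; field names are illustrative, not the modeler's own.
// Returns a list of problems (an empty list means the line is consistent).
function validateBattingLine({ ab, h, doubles, triples, hr }) {
  const errors = [];
  if ([ab, h, doubles, triples, hr].some((n) => !(n >= 0))) {
    errors.push("negative or non-numeric field");
  }
  if (h > ab) {
    errors.push("hits exceed at-bats");
  }
  if (doubles + triples + hr > h) {
    errors.push("2B + 3B + HR exceed hits");
  }
  return errors;
}

// The line from the post: 100 AB, 75 HR, but the hit box left at 0.
console.log(validateBattingLine({ ab: 100, h: 0, doubles: 0, triples: 0, hr: 75 }));
// The corrected line: 75 hits, all of them home runs.
console.log(validateBattingLine({ ab: 100, h: 75, doubles: 0, triples: 0, hr: 75 }));
```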


#10    John Beamer      (see all posts) 2006/10/29 (Sun) @ 11:22

David,

Quite correct. I (completely wrongly) assumed that the hit box was 1B!!

Thanks for pointing it out


#11    David Gassko      (see all posts) 2006/10/29 (Sun) @ 13:02

Wow! I decided to run the following theoretical team through the Markov: 427 at-bats, 400 hits, 100 doubles, 100 triples, 100 home runs, 100 walks. Markov says the team will score 490 runs per game, with the following linear weight values:

0.977 : BB
0.988 : 1B
1.000 : 2B
1.006 : 3B
1.028 : HR

BaseRuns says the team will score 489 runs with the following linear weight values:

0.973 : BB
0.981 : 1B
0.999 : 2B
1.017 : 3B
1.027 : HR

I am beyond impressed...I’m shocked.


#12    David Gassko      (see all posts) 2006/10/29 (Sun) @ 13:10

Actually, I probably shouldn’t be, since BaseRuns is the only run estimator that acknowledges everything is worth one run when OBA = 1.000, and we’re so close here. But even when I plug in ridiculous lines like some Barry Bonds seasons, BaseRuns is only marginally off. It doesn’t adjust the walk and single values fast enough as OBA increases (until it’s really close to 1.000), and it (partially) over-compensates by giving too much weight to extra-base hits. But it’s still much closer than anything else.


#13    tangotiger      (see all posts) 2006/10/29 (Sun) @ 15:28

Likely the only time BaseRuns will blow up is with an all-walk team, or all-singles team, or a triples-heavy team (because of the known limitation, but we have a fix for that in another thread).

John: I guess I should put in some error-checking in there, since not all the fields are mutually-exclusive.

As well, when you say you have a “fix” for the DP and OOB, can you clarify?  The reason that I couldn’t do it in my model is that I assume the number of PA at each out level is exactly the same.  So, if I have an OBP of .333, then I know I have 1.5 hits or walks and 3 outs, and that I have 1.5 PA with 0, 1, and 2 outs.

OOB will kill this.
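The 1.5 figure follows from treating each out level as a run of plate appearances that ends at the first batting out. A small sketch of that arithmetic, assuming every out is a batting out (the function names are made up for illustration):

```javascript
// With no outs on base, the PA at a given out level continue until the batter
// makes an out, so the expected count is 1 / (1 - OBP), the out PA included.
function paPerOutLevel(obp) {
  return 1 / (1 - obp);
}

// Expected batters reaching base per 3-out inning.
function baserunnersPerInning(obp) {
  return (3 * obp) / (1 - obp);
}

const obp = 1 / 3;
console.log(paPerOutLevel(obp).toFixed(2));        // PA with 0 outs (same for 1 and 2 outs)
console.log(baserunnersPerInning(obp).toFixed(2)); // hits or walks per inning
```

A double play or an out on base removes an out without a plate appearance at that out level, which is why the equal split stops holding.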


#14    John Beamer      (see all posts) 2006/10/29 (Sun) @ 20:57

Tango,

To be honest I am not 100% sure what you mean by the example above, specifically the phrase “I have 1.5 PA with 0, 1, and 2 outs”. I suspect though that I may have missed a trick with your code. Let me explain what I have done (DP only, not OOB yet); if you want to look at the alterations to the code then I can either post it or mail it to you.

Quite simply I have changed the state transitions by adding the number of DP into the relevant states. This involved creating a few new states, for example, differentiating between lead runner on 3rd with a man on 1st vs a man on 2nd (which you previously had as a single state). By including the DP the chance of NOT scoring a run obviously goes up. This is where I spent most of my time fixing the code, but I made one or two other changes on the RE table to reflect the right states.

The chance of scoring from the various events now decreases because of adding the double play; hence rpi also decreases.

The LWTS generator I didn’t touch because it looked like you generate the LWTS using a type of +1 method.

I actually have a fully fledged matrix-based Markov somewhere, which, while by no means as elegant as this, definitely does include DP. I need to compare the accuracy of the two models.

I must say in your model, with my adjustments, I was slightly surprised to see the HR value increase as the DP removes men from the bases. I suppose (thinking aloud) that a HR guarantees that the runners score and reduces (to 0) the chance of a DP, therefore making the HR worth more.

Keen to hear your thoughts and why I can’t simply add states like I did!!


#15    tangotiger      (see all posts) 2006/10/30 (Mon) @ 04:27

I’ve also got a full Markov version (used in THE BOOK for the batting order and other places), without limitation.

The limitation in this simple one posted here though is fairly rigid.  Adding the DP or OOB to the first part of the code in the reEngine function is fairly straightforward. 

The problems happen right after.  If we look at this piece of code:
chance1B_3 = 1 - (state_1b_2_0 + state_1b_1_0 + state_1b_0_0) / 3.0;

This says that the chance of scoring is based on the player’s chance of not scoring, with 2 outs, 1 out, and 0 outs, EQUALLY weighted.  That is, we expect an equal number of singles to be hit with 2, 1, 0 outs.  However, with GIDP and OOB, we “skip” over PA at either 1 or 2 outs.  So, we don’t have 33.33% of PA at each of 0,1,2 outs.  More like 34, 33.5, 32.5 or some such.

Then, this code:
runs1B_3 = chance1B_3 * (freq1B + freqBB);
won’t work.  Since chance1B_3 is wrong, that is, the chance of scoring from a RANDOM single is wrong, then the “getting on” runs portion of the single is wrong.

Then this code:
runsALL = (runsHR + runs3B_3 + runs2B_3 + runs1B_3) * freqPA ;

See the freqPA?  That was simply determined by this code:
freqPA = freqOBA/freqOUT * 27.0 + 27.0;

This makes no allowance for the OOB.  I suppose this one can be fixed, but you’d have to know the exact percentage of the outs that were DP or OOB.

So, those are your limitations.  If you want to introduce DP, OOB, you have to be aware of that.
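To make the weighting issue concrete, here is a rough sketch comparing the equal-weight average with the skewed split mentioned above. The three "not scoring" values are invented for illustration, and the helper names are not from the modeler; only the `freqPA` expression mirrors the quoted code:

```javascript
// Hypothetical chances of NOT scoring from 1B with 0, 1 and 2 outs.
// These values are illustrative only; they are not output from the modeler.
const notScoring = [0.60, 0.73, 0.87];

// Chance of scoring from a random single, given how PA are split across out levels.
function weightedChanceOfScoring(states, weights) {
  const total = weights.reduce((a, b) => a + b, 0);
  const notScore = states.reduce((sum, s, i) => sum + s * weights[i], 0) / total;
  return 1 - notScore;
}

const equal = weightedChanceOfScoring(notScoring, [1, 1, 1]);         // the model's assumption
const skewed = weightedChanceOfScoring(notScoring, [34, 33.5, 32.5]); // the split suggested for GIDP/OOB

// PA per 27-out game when every out is a batting out (same form as freqPA).
function paPerGame(freqOBA) {
  const freqOUT = 1 - freqOBA;
  return (freqOBA / freqOUT) * 27.0 + 27.0;
}

console.log(equal.toFixed(4), skewed.toFixed(4), paPerGame(0.341).toFixed(2));
```

Even a small shift in the split moves the chance of scoring, and that error then carries into the runs and PA terms built on top of it.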


#16    tangotiger      (see all posts) 2006/10/30 (Mon) @ 08:23

I have made an update, which will post to my site tonight.  The changes are:
- Added Strikeouts. 
- Added LWTS values for Strikeouts, Outs
- Added run values for the out value for Runs Created

The default values remain.  David, I don’t understand the “more work”, since you have to tab to those fields to change it, and when you tab to the fields, the whole field is highlighted, and therefore, will delete once you start typing in the values you want.  That is, highlight the “37” in the AB field, type what you want, then hit TAB.

If there are any other requests, please put them in before 5 PM ET today, and I’ll consider them.  Otherwise, look for the updated file tonight.


#17    Tangotiger      (see all posts) 2006/10/30 (Mon) @ 14:41

Update: I have also added a BsR estimator, as well as the basic Bill James RC estimator.

As well, I show the linear weights values for both of these.  So, you will see the following:

4.957 : Runs Scored per Game

4.954 : Estimated, Runs Scored per Game, BaseRuns
5.122 : Estimated, Runs Scored per Game, Bill James Runs Created

BA / OBP / SLG : 0.270 / 0.341 / 0.405

...
...
...

Marginal Run Values by Event, i.e. Linear Weights (BaseRuns, RunsCreated in parenthesis)

0.403 : BB (0.328, 0.239)
0.521 : 1B (0.495, 0.587)
0.809 : 2B (0.775, 0.935)
1.090 : 3B (1.049, 1.282)
1.488 : HR (1.412, 1.630)
-0.304 : OUT, excludes strikeouts
-0.326 : SO

Check back tonight.


#18    Tangotiger      (see all posts) 2006/10/30 (Mon) @ 14:47

What you will find interesting is when you put in some extreme games.  For example, put in a game with 27 outs, 4 hits, all of them HR.  Obviously, this game has 4 runs.  This is what the output will show:

4.000 : Runs Scored per Game

4.000 : Estimated, Runs Scored per Game, BaseRuns
2.065 : Estimated, Runs Scored per Game, Bill James Runs Created

BA / OBP / SLG : 0.129 / 0.129 / 0.516

...
...
...
0.243 : BB (0.211, 0.445)
0.244 : 1B (0.216, 0.583)
0.278 : 2B (0.225, 0.720)
0.535 : 3B (0.233, 0.858)
1.000 : HR (1.000, 0.996)
-0.148 : OUT, excludes strikeouts
-0.148 : SO

It’s pretty obvious that RC breaks down here.  Or go the other way, and make it 100 hits, all of them HR.  So, that’s 100 HR with 27 outs.  Result:

104.000 : Runs Scored per Game

104.000 : Estimated, Runs Scored per Game, BaseRuns
330.260 : Estimated, Runs Scored per Game, Bill James Runs Created

BA / OBP / SLG : 0.794 / 0.794 / 3.176
...
...
...
0.915 : BB (0.874, 0.648)
0.915 : 1B (0.875, 1.444)
0.916 : 2B (0.875, 2.240)
0.924 : 3B (0.876, 3.036)
1.000 : HR (1.000, 3.832)
-3.852 : OUT, excludes strikeouts
-3.852 : SO

We see the ridiculousness of Runs Created here.  We also see how BsR is lagging.  The marginal impact outside of the HR should be at the .91-.92 runs per event, but BsR is showing .87-.88.  So, it’s likely that BsR is underestimating the runs scored as the run environment goes up.

Or, of course, it’s possible that my model, which assumes no outs on base, makes the two not directly comparable.
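The all-HR results are easy to check by hand. The sketch below uses the basic Runs Created formula, (H + BB) * TB / (AB + BB), and the general BaseRuns form A * B / (B + C) + D. The B coefficients are one commonly cited simple version and are an assumption here, since the thread does not say which version the modeler uses; with no non-HR baserunners, A is zero and the choice of B does not matter. The second line uses 104 HR in 131 AB, matching the BA and SLG shown in the output:

```javascript
// Basic Bill James Runs Created: (H + BB) * TB / (AB + BB).
function runsCreated({ ab, h, tb, bb }) {
  return ((h + bb) * tb) / (ab + bb);
}

// BaseRuns: A * B / (B + C) + D.
// Assumption: these B coefficients are one commonly cited simple version,
// not necessarily the ones used by the modeler.
function baseRuns({ ab, h, tb, hr, bb }) {
  const a = h + bb - hr;                                      // baserunners, excluding HR
  const b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02;  // advancement factor
  const c = ab - h;                                           // outs
  const d = hr;                                               // automatic runs
  return (a * b) / (b + c) + d;
}

const fourHomers = { ab: 31, h: 4, tb: 16, hr: 4, bb: 0 };
const manyHomers = { ab: 131, h: 104, tb: 416, hr: 104, bb: 0 };

for (const team of [fourHomers, manyHomers]) {
  console.log("BsR", baseRuns(team).toFixed(3), "RC", runsCreated(team).toFixed(3));
}
```

Runs Created multiplies on-base and total bases, so a team with nothing but home runs is treated as if it had runners on for each of them, which is where the 2.065 and 330.260 figures come from.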


#19    Tangotiger      (see all posts) 2006/10/30 (Mon) @ 14:49

That was obviously 104 HR, 131 AB.


#20    tangotiger      (see all posts) 2006/10/30 (Mon) @ 19:09

Site updated.

Also added limited documentation.


#21    John Beamer      (see all posts) 2006/10/30 (Mon) @ 20:23

Tango—great update. On the DP / OOB condition can’t you parse the pbp data to find out what the right weightings should be for the chance3B etc. line and make the adjustment in the code?

The approach you use here is so elegant and reduces the need for complex matrix algebra that I am trying to think whether it can be adapted to a full Markov.

I guess the other question is how far out will your simpler model be?? Probably pretty damn close ...


#22    tangotiger      (see all posts) 2006/10/31 (Tue) @ 07:08

I think you’ll have to immerse yourself in the code and the concept to appreciate the limitation of the OOB. 

In short, the model works because every player on base either scores or doesn’t score (i.e., left on base), and that there are 27 batting outs.  Anything that deviates from that, will cause problems.


#23    Rally      (see all posts) 2006/10/31 (Tue) @ 10:59

Dino Ebel’s purpose in life is to foil your run modeler.


#24    John Beamer      (see all posts) 2006/10/31 (Tue) @ 12:02

Point taken Tango. I am actually going to dig up my full Markov and recode it in Java and probably post it in a month or two once I have redone the interface to allow the user to flex the inputs like your basic model ....


#25    Tangotiger      (see all posts) 2006/10/31 (Tue) @ 12:43

Another useful thing we can look to the run modeler for is the impact of baserunning.

If we go with the default data, the team scores 4.957 runs per game.  Now, what happens if I drop all the runner advancements on base hits by .15?  That is, turn an average runner into a slow runner?  In that case, the team scores 4.734 runs per game. And if we similarly drop the movement on outs?  that drops further to 4.526.

So, the difference between an average running team, and a team of slow-footers is 0.431 runs per game, or 70 runs per season.  Divide by 9 hitters, and that’s 8 runs per player, with half the gain on hits, and half the gain on outs.

If on the flip side we had a team of speedsters, what happens? 5.167 runs, when moving fast on hits.  5.374 when moving on both hits and outs.  That difference is 0.417 runs per game, or 68 runs per team per season, or also almost 8 runs per player.

(Note: for moving on outs, with runner on 1B, I made the range .00 for slow footers, and .20 for speedsters.)

So, there you go.
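The per-season and per-player figures above come from scaling the runs-per-game gap. A short sketch of that arithmetic, assuming a 162-game schedule (the constant and function names are illustrative):

```javascript
const GAMES_PER_SEASON = 162; // assumption: standard MLB schedule
const LINEUP_SLOTS = 9;

// Scale a runs-per-game difference to a team season and to a single lineup slot.
function seasonImpact(runsPerGameA, runsPerGameB) {
  const perGame = Math.abs(runsPerGameA - runsPerGameB);
  const perTeamSeason = perGame * GAMES_PER_SEASON;
  return { perGame, perTeamSeason, perPlayer: perTeamSeason / LINEUP_SLOTS };
}

const slow = seasonImpact(4.957, 4.526); // average team vs. slow-footers
const fast = seasonImpact(5.374, 4.957); // speedsters vs. average team

console.log(slow.perTeamSeason.toFixed(1), slow.perPlayer.toFixed(1));
console.log(fast.perTeamSeason.toFixed(1), fast.perPlayer.toFixed(1));
```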


#26    Tangotiger      (see all posts) 2006/10/31 (Tue) @ 13:55

You may also want to compare the simple Markov program, using the default values, to what I did here:
http://www.insidethebook.com/ee/index.php/site/article/the_secret_recipes_of_the_run_expectancy_matrix/

In there, I just use rules of thumb to come up with an RE matrix.

I then compare it to actual data.  As luck would have it, all three examples use 4.95 runs per game.


#27    John Beamer      (see all posts) 2006/11/04 (Sat) @ 11:25

I think a useful addition, if possible in the simple model, is to show the scoring distribution ie, from a particular base out state what is the probability of 1 run scoring, 2 runs scoring etc ...


#28    tangotiger      (see all posts) 2006/11/04 (Sat) @ 15:35

Unfortunately, I don’t think it’s possible, though I’d be happy to learn I’m wrong.  The best I can do is “0 or 1+” runs.  However, since I know the average runs, we might be able to come up with a simple function to give us what you want (similar to what Woolner did in a recent BP annual… I think BP 2005).


#29    John Beamer      (see all posts) 2006/11/05 (Sun) @ 09:38

How did you do it in the full Markov? Did you create additional states for 1 run, 2 runs scored etc .... or did you run a Monte Carlo to work out the probabilities?

You could probably do a Monte Carlo in this instance but it’d be messy


#30    tangotiger      (see all posts) 2006/11/05 (Sun) @ 13:33

Semi-Monte Carlo, with 1 million trials for each of the 24 base/out states.  Only counted transition rates to next base/out states, and then used prob theory, as I did here.


#31    John Beamer      (see all posts) 2006/11/06 (Mon) @ 23:35

My only other thought is to somehow use the tango distribution but modified for rpg and base/out state. I’ll give it some thought and try to find a function.


#32    tangotiger      (see all posts) 2006/11/07 (Tue) @ 11:41

UPDATE

I have added the Run Frequency matrix, as described earlier.  It simply shows the chance of at least 1 runner scoring (i.e., the lead runner on base scores, or if bases empty, some batter reaching base and eventually scoring).

This is useful for “one-run” strategies, though of course, we always prefer Win Expectancy tables. 

If you set the “AB” box to “53.4”, you will see the chance of scoring from 1B with 0 outs, and from 2B with 1 out, is identical (under the model constraints).


#33    tangotiger      (see all posts) 2006/11/08 (Wed) @ 11:42

Slight update.  Added “Chances of Scoring” by outs, in addition to the overall average already there.

I also formatted the output a bit nicer.

I’m thinking there should be a nice easy (and correct) way to get the Run Frequency Matrix to show “0, 1, 2, 3, 4, 5+” runs, and not just “0, 1+”.  If someone wants to think about it, please do.  Once we have that, we can… wait for it… generate a Win Expectancy matrix on the fly for any possible run environment!  (And of course, Leverage Index too.)


#34    John Beamer      (see all posts) 2006/11/08 (Wed) @ 14:23

Studes’ winexp spreadsheet at baseballgraphs.com calculates the run frequency matrix for all base-out states given rpg as an input. I think it uses the Tango distribution but I am not 100% sure ... I was planning to delve into it this weekend to work out how it works, but it is probably the best way to implement this in the simple Markov.


#35    tangotiger      (see all posts) 2006/11/08 (Wed) @ 17:51

I’m sure he’s either using the Tango Distribution, or Woolner’s framework.

***

I just had a crazy thought.  I can modify the program, with some effort, to make the “outs per inning” flexible.  That is, I’ll still have 27 outs in a game, but I can make the outs per inning 1, 3, 9, or 27. 

Is this just a trivial fancy for you guys, or do you see something worthwhile?


#36    John Beamer      (see all posts) 2006/11/08 (Wed) @ 23:32

Tango

Definitely has *some* merit, though whether that is enough to justify you spending time on this I don’t know. For instance, I do coach some 9-out softball, which the model would be appropriate for. Also, personally, it is always interesting to see what would happen if you tweaked the rules of baseball somewhat: what would happen if baseball was a game of 4 outs rather than 3? Those questions, although theoretical, are interesting and I believe allow us to have a deeper understanding of the context in which the game is played ...


#37    Tangotiger      (see all posts) 2006/12/18 (Mon) @ 14:50

To convert wOBA (or OBP) into a Runs per game figure, you can either do it the linear way, or this more accurate way:

Runs = (OBP/(1-OBP)) ^ 1.6 * x
where x is set to calibrate to the league average

It’ll be something like 14 or 15.

So, an OBP of .333, and an x value of 14.5 means 4.78 runs per game.  An OBP of .500 means 14.50 runs per game.

This shortcut is valid for OBP of .000 to .500.
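The shortcut translates directly into code. In this sketch the function name and the default x of 14.5 are illustrative, and rejecting inputs outside the stated .000 to .500 range is a design choice rather than part of the formula:

```javascript
// Runs per game from OBP (or wOBA): (OBP / (1 - OBP)) ^ 1.6 * x,
// where x is calibrated to the league average (14.5 is the example value).
function runsPerGameFromOBP(obp, x = 14.5) {
  if (!(obp >= 0 && obp <= 0.5)) {
    throw new RangeError("shortcut is only stated to hold for OBP between .000 and .500");
  }
  return Math.pow(obp / (1 - obp), 1.6) * x;
}

console.log(runsPerGameFromOBP(1 / 3).toFixed(2)); // OBP of one third
console.log(runsPerGameFromOBP(0.5).toFixed(2));   // OBP of .500
```

At an OBP of .500 the ratio OBP / (1 - OBP) is exactly 1, so the result is simply x.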


#38    tangotiger      (see all posts) 2007/01/30 (Tue) @ 13:51

A couple of you baseball programmers may enjoy this:
http://www.footballcommentary.com/bbmodel.htm


#39    HarryAbles      (see all posts) 2007/04/06 (Fri) @ 17:17

Can someone explain where the taking-the-extra-base numbers are within the code?


#40    tangotiger      (see all posts) 2007/04/06 (Fri) @ 19:04

It’s in the function where you see this code:

state_3b_1_2 = freqOUT*state_3b_2_2*(1-rateNonK_OUT*eventOUT_state3B_1);


#41    jacksprat      (see all posts) 2007/07/02 (Mon) @ 00:58

Plugging in league totals from the past few years, the modeler consistently overshoots by .1 to .2 R/G.  Any idea why that would be?


#42    tangotiger      (see all posts) 2007/07/02 (Mon) @ 09:43

The preconditions were set in the first paragraph of the article.

You can try to compensate by modifying the baserunning values at the bottom of the page, or by excluding IBB.

If you remove IBB (and leave the baserunning numbers as-is), the 2006 Markov gives you 4.93 runs per game, compared to the actual 4.91.

More important than trying to get a perfect match, is seeing the impact when you modify things.


#43    tangotiger      (see all posts) 2007/09/14 (Fri) @ 14:51

A reader has made enhancements to my program (to include the full scoring distribution, rather than the one I have of at least 1).  I have not vetted it, but it looks ok:

http://www.tangotiger.net/markov_wes.html

It has a long background javascript process, meaning it will ask you to abort or not.  Say “no”.

This is pretty fascinating, and was the missing link in terms of creating a full win probability matrix.  Now that the heavy work was done, it just requires a bit more work to be able to create a full win probability matrix (and given two teams of ANY kind)!  Maybe in a few months…

And, because of the processing time, I have to change it from javascript into perl or PHP.


#44    tangotiger      (see all posts) 2007/09/14 (Fri) @ 15:15

Just to be clear, all that’s needed to be done is to use the runs per game distribution from the updated Markov program of post 43, and feed it directly into the Tango Distribution program (last link on my home page).

It would be ideal if I were to turn the javascript of the Markov and the pascal of the Tango Distribution into a common language (perl, PHP, Java, whatever).  That’s where the hold up will be (the pain in the b-tt to do a rewrite).


#45    tangotiger      (see all posts) 2007/09/23 (Sun) @ 08:47

The reader who offered his upgrade spotted a coding bug in my original program.  Those interested will see a new equation for:
state_1b_1_0

It was sloppy on my part.  I will now do a recheck of the code, and will reformat it so that it’s slightly more structured.


#46    Tangotiger      (see all posts) 2007/09/24 (Mon) @ 15:58

state_1b_0_0 had an even more egregious mistake (paren in the wrong spot, and part of a term missing).

Anyway, both files have been updated.  I have reformatted that part of the code so that it is more readable. 

Why did I have the mistake?  Basically, I had a script to generate the script, and I didn’t double-check it.  I did now, so we should be good with that part of the code.

However, making those changes caused the runs per game to go down, meaning I need to readjust by changing my default input variables.  I’ll work on that soon.


#47          (see all posts) 2007/09/24 (Mon) @ 17:31

Tango, did both those bugs affect your original code? I used your original code (with parameters) to double check my Markov and, while not identical, it was very very close. I mean close enough not to quibble over.


#48          (see all posts) 2007/09/24 (Mon) @ 17:47

Your Markov gives .545 runs per inning mine is .549 for the same assumptions. LWTS values are close to identical.

Not a train smash.


#49    tangotiger      (see all posts) 2007/10/03 (Wed) @ 21:10

I would hope we’d get identical results, though I guess we can live with it. 

Yes, the bugs affected things.  If you are basing your code on mine, check those two variables I noted, and make sure you’ve got them exactly the same.


#50          (see all posts) 2007/10/04 (Thu) @ 11:48

Tango—quick question on how you account for baserunning on an out.

Suppose the start state is x23_0 and the runner on second takes the base 50% of the time on an out.

What do you assume in the above state. Do you automatically assume that the man on 3rd takes the base (in this case home) as well or do you assume that the baserunning event only occurs if the next base is empty?


#51    Tangotiger      (see all posts) 2007/10/04 (Thu) @ 13:03

Ah, that’s the little gem in the code.  The perspective is always based on the lead runner.  In your case, if the start state is x23_0, I represent that as TWO states.  It is critical however, absolutely critical, that the lead runner advances as much or more than the trailing runner.

So, if we go to my code, the x23_0 is represented as these two states:
state_3b_0_1
state_2b_0_0

As I say in my documentation:
// chance of *not* scoring for each state
// the variable name is of the form:
// - state_bb_o_r, where
// --- bb = base of the lead runner
// --- o = number of outs
// --- r = number of other runners on base

So, what we first handle is the guy who is on third base (with a trailing runner somewhere).  We try to figure out how often he’ll be stuck at third base, and it’s represented by this:
state_3b_0_1 = state_3b_0_2*freqBB + state_3b_1_1*freqOUT*(1-rateNonK_OUT*eventOUT_state3B_0);

That second term tells me: “state_3b_1_1” the runner is on 3B, there’s a trailing runner somewhere, and there’s now 1 out.

If you look at the equation for state_2b_0_0 (I won’t republish the whole one here), one of the terms shows:
state_3b_1_0*freqOUT*rateNonK_OUT*eventOUT_state2B_0

As you can see, our runner on 2b can find himself on 3B, with 0 outs.  Is it possible we have two runners on 3B at the same time?  No!  That’s because the runner on 3B has this:

1-rateNonK_OUT*eventOUT_state3B_0

And the runner on 2B has this:
rateNonK_OUT*eventOUT_state2B_0

If eventOUT_state3B_0 and eventOUT_state2B_0 are equal, then “rateNonK_OUT*eventOUT_state3B_0” is just a constant term, which means that the runner on 2B will advance 1 minus the number of times the runner on 3B will stay.  As long as eventOUT_state2B_0 is less than or equal to eventOUT_state3B_0, the thing works.

In baseball, this is perfectly acceptable.


#52          (see all posts) 2007/10/04 (Thu) @ 15:29

I found a minor error on how I treat walks—one of the states was wrongly coded. The runs per inning are now 0.543 for me and 0.545 for you. If I

Our philosophy on baserunning is the same in that we focus on the lead runner. If I enter all baserunning events as 0 in both our models I get an identical run exp matrix; that is, if no-one advances we are the same. If we code that as a 1, i.e., everyone advances, we are also identical. If I advance the 1B runner 50% of the time on a single, double, and out, and everyone else all the time, then we are also identical.

So the problem is runner advancement when two guys are on base. Let’s make it simple. We’ll just have 37 AB and 7 H, and that’s it. The only advancement we allow is 1B 100% and 2B 50% with 0 outs. This means whenever there are men on 1st and 2nd (12x, 123), the advance should only happen 50% of the time. Sure enough, comparing run exp, we match except for the 12x_0 and 123_0 states.

If we look at the run value of the 12x_0 state mine is 0.328 and yours is 0.354. That means that I assume runners advance less than you do. If we dive into the transition matrix then on a hit, which happens 19% of the time, we’d expect that at 12x_0 to see a transition to 123_0 (no advancement & no run scored) 9.5% and to 1x3_0 (advancement of 2B and 1B and run) 9.5% of the time. My transition matrix does exactly that.

So 2 questions: (1) Is my logic right and if not why? (2) Can you generate a transition matrix from your code under the above assumptions specifically for the 12x_0 state?


#53    Tangotiger      (see all posts) 2007/10/04 (Thu) @ 15:50

Ah, but I see my provision is that the lead runner’s advancement must be greater than or equal to the trailing runner.  Therefore, your inputs won’t work for me.  (I should probably put an error check for that.)

With a runner on 1B, he can’t move 100% of the time… if there’s a runner on 2B who only moves 50% of the time.  What you are saying is that if there’s a runner on 1B and 2B, the runner on 1B moves only 50% of the time.  If there’s a runner on 1B only, he moves 100% of the time.  You determine that by inference.

I don’t do that.  The value in the row of 1B/2B must be greater than or equal to the row of 1B/1B in my chart.

The way I coded it, I can’t do the inference.
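The error check mentioned above could look something like the sketch below. It is hypothetical: the function name, arguments and message are invented for illustration, and it only encodes the rule as stated in the thread, that the trailing runner's advancement rate may not exceed the lead runner's for the same event and out count:

```javascript
// Hypothetical check of the lead-runner rule; not taken from the modeler's code.
// Returns an error message, or null when the inputs satisfy the rule.
function checkLeadRunnerRule(leadRate, trailingRate, label) {
  if (trailingRate > leadRate) {
    return `${label}: trailing runner advances ${trailingRate} but lead runner only ${leadRate}`;
  }
  return null;
}

// The inputs from the example: runner on 2B moves 50%, runner on 1B moves 100%.
console.log(checkLeadRunnerRule(0.5, 1.0, "0 outs, runners on 1B and 2B"));
// Equal rates satisfy the rule.
console.log(checkLeadRunnerRule(0.5, 0.5, "0 outs, runners on 1B and 2B"));
```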


#54          (see all posts) 2007/10/04 (Thu) @ 16:03

Okay I get it. And as you say, provided you are using logical run advancement assumptions it isn’t an issue.

Great! I was worried I had made a monumental screw up.


#55    Tangotiger      (see all posts) 2007/11/27 (Tue) @ 11:33

John touches on his work that will appear in the book:
http://www.hardballtimes.com/main/article/introducing-markov-chains/

The amount of work to code a Markov that allows for baserunner outs is enormous.  The modeler I posted does 90% of what you need, with about 10% of the programming effort.  I didn’t think anyone was crazy enough to do the work, but John did it.  In this case, what I have (and most people have) is a simulator or semi-Markov simulator that you run for 1 million trials.

I should also note that Tom Tippett also has a Markov chain (that includes batting order, but I don’t know if it includes baserunner outs), that he discussed on his old blog a few years ago.

Anyway, I’m looking forward to the book.  There seems to be enough in there to keep readers of this blog happy.

