THE BOOK--Playing The Percentages In Baseball


Monday, March 01, 2010

Markov Lineup simulator

By Tangotiger, 01:22 PM

Berselius.  For those not aware, I also have a Markov calculator, but it doesn’t do lineups.  It uses recursion, and the code is there for anyone to use.  Adding in lineups simply means adding one more dimension to the array.  I’ve done it (for The Book).  The code is very small when you use recursion.

I encourage all you programmers to look at my code and expand it for lineups.


#1    berselius      (see all posts) 2010/03/01 (Mon) @ 13:38

Thanks for the link!

I would be the first to admit that my code (linked from the bottom) is pretty clunky and could use some streamlining.  What I do plan to do is link it to something like a CHONE projections file/database to make doing the inputs easier. I’m going to use your numbers from http://www.tangotiger.net/destmob.html to add in the baserunning stuff, since it’s simply station-to-station right now.


#2    Xeifrank      (see all posts) 2010/03/01 (Mon) @ 13:58

This is a baby step start.  You really need to take many more inputs into account if you ever want to draw a serious conclusion from the results.  And I don’t think it would be too difficult to code up something that cycles through all 9! permutations of lineups.
vr, Xei


#3    Peter Jensen      (see all posts) 2010/03/01 (Mon) @ 14:25

And I don’t think it would be too difficult to code up something that cycles through all 9! permutations of lineups.

Xei - It might not be too difficult, but perhaps berselius would like to use his computer for some other purpose during the 252 days it would take to run 100000 iterations of each permutation.  If you think you can do better, then do it. If you are not willing or able to do that, then just accept the work of others for what it is.  Criticise the work for what the author attempts to do, not on what you would wish it would do.
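
As a rough check on that estimate, here is a minimal Python sketch of the arithmetic. The 252-day figure and the 100,000 iterations come from the comment above; the games-per-second rate is back-solved from those two numbers and is an inference, not something stated in the thread.

```python
import math

# Number of distinct batting orders for nine hitters: 9! = 362,880
lineups = math.factorial(9)

# Simulated games per lineup, as quoted in the comment
games_per_lineup = 100_000
total_games = lineups * games_per_lineup

# 252 days expressed in seconds
seconds = 252 * 24 * 60 * 60

# Throughput the simulator would need to finish in 252 days (inferred)
games_per_second = total_games / seconds

print(f"{lineups:,} lineups, {total_games:,} games in total")
print(f"implied speed: about {games_per_second:,.0f} games per second")
```

In other words, the estimate corresponds to a simulator that gets through a full game in a bit over half a millisecond on one processor.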


#4    Matt K. (d_f)      (see all posts) 2010/03/01 (Mon) @ 14:28

Cool idea… can I ask if matlab is easy enough (and small enough) to install? I’d love to play with this myself.


#5    berselius      (see all posts) 2010/03/01 (Mon) @ 15:06

Matt K/4

matlab is a computing software package that is licensed. I have access to it because I work at a university, but getting a personal license is crazy expensive. I’m only writing it in matlab right now because that’s what I do 95% of my research coding in, so it’s what I’m most comfortable with.

Given the interest people seem to have in this I’ll probably convert it over to perl in the next iteration so more people than me can actually run it!


#6    berselius      (see all posts) 2010/03/01 (Mon) @ 15:10

Xeifrank/2 and Peter/3

Peter is right - running all the possible permutations on a single processor would take waaay too much time. But there’s a lot of redundancy in the solution space. There are algorithms that can explore these kinds of state spaces (i.e. the spaces of all possible lineups), though they are complicated by the fact that these are discrete and not continuous.

Something to consider is that this should be straightforward to parallelize - even down to the inning level, everything is independent of everything else. If only I had a RoadRunner in my backyard…


#7    Tangotiger      (see all posts) 2010/03/01 (Mon) @ 15:29

You don’t have to do all the permutations.

Let’s say, for example, that you have 1 through 9, and you get your runs per inning when the “1” leads off.  You run that 1000 times or 1 million times, and you have your runs per inning for that state.

Now, say you change 8 and 9, so that it’s 1,2,3,4,5,6,7,9,8

Well, the 99% (or whatever) of the time that the inning went 1 through 7, you don’t have to rerun those innings.  You just have to run those times when the 8th and 9th place hitters were in play.

Aaaaaand, you don’t even have to do it in this crude manner.  What you are asking is: “when there are 2 outs, and there’s a runner on 1B, what happens when the next hitter is Mr #9?”

So, think of the 24 base/out fixed states, rather than the sequencing of the batters.  The end-result is far more manageable.

This is what I did for The Book.  It’s really not that bad.
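
A minimal Python sketch of the base/out-state idea, for illustration only. It is not the code used for The Book. It assumes station-to-station baserunning (walks only force runners along, and a hit moves every runner the same number of bases as the batter), and the names `advance`, `half_inning`, and the `generic` batting line are hypothetical. Rather than simulating, it pushes probability through (outs, bases, batter) states until the inning is all but certain to be over, and it returns both the expected runs and how often each lineup slot leads off the next inning.

```python
from collections import defaultdict

HIT_BASES = {"1b": 1, "2b": 2, "3b": 3, "hr": 4}

def advance(bases, event):
    """Station-to-station advancement (an assumption of this sketch).
    bases is a 3-bit mask: bit 0 = 1B, bit 1 = 2B, bit 2 = 3B.
    Returns (new_bases, runs_scored)."""
    if event == "bb":
        # The first empty base, counting up from 1B, absorbs the force.
        for bit in (1, 2, 4):
            if not bases & bit:
                return bases | bit, 0
        return bases, 1  # bases loaded: a run is forced in
    n = HIT_BASES[event]
    shifted = (bases << n) | (1 << (n - 1))  # move runners, then place the batter
    return shifted & 0b111, bin(shifted >> 3).count("1")

def half_inning(lineup, leadoff, eps=1e-12):
    """Expected runs in one half-inning, plus the probability that each
    lineup slot leads off the next inning. Every batter needs a nonzero
    chance of making an out, or the loop never finishes."""
    mass = {(0, 0, leadoff): 1.0}  # (outs, bases, batter) -> probability
    expected_runs = 0.0
    next_leadoff = [0.0] * len(lineup)
    while sum(mass.values()) > eps:
        new_mass = defaultdict(float)
        for (outs, bases, b), p in mass.items():
            nxt = (b + 1) % len(lineup)
            for event, q in lineup[b].items():
                if event == "out":
                    if outs == 2:
                        next_leadoff[nxt] += p * q  # third out ends the inning
                    else:
                        new_mass[(outs + 1, bases, nxt)] += p * q
                else:
                    new_bases, runs = advance(bases, event)
                    expected_runs += p * q * runs
                    new_mass[(outs, new_bases, nxt)] += p * q
        mass = new_mass
    return expected_runs, next_leadoff

# Made-up batting line (probabilities sum to 1), used for all nine slots.
generic = {"out": 0.68, "bb": 0.08, "1b": 0.16, "2b": 0.05, "3b": 0.005, "hr": 0.025}
runs, ends = half_inning([generic] * 9, leadoff=0)
print(round(runs, 3), [round(x, 3) for x in ends])
```

Swapping two hitters only changes the states in which one of them is at the plate, which is where the savings over re-running whole games would come from.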


#8    berselius      (see all posts) 2010/03/01 (Mon) @ 15:40

That makes sense if you flip the last two. But what if I scramble them all up? I.e.

2,7,3,1,9,8,4,5

I do agree that this would be easier to re-write in terms of base-out states for each batter’s hitting line. But you still have to account for the cumulative effect of the lineup somehow. Maybe you could shorten things by figuring out how many runs are scored in an inning depending on where in the lineup you are starting, but you still have to figure out who is leading off each inning. Or maybe I’m missing something here.


#9    berselius      (see all posts) 2010/03/01 (Mon) @ 15:46

maybe you could make some distribution of how many spots in the lineup are gone through depending on who starts off the lineup, and use these as weights along with the runs scored depending on who leads off. But generating all of these for your specific lineup permutation AND the specific players you stick in the simulator seems like it would be far more work than just banging out the full 9 innings using the sim.


#10    Tangotiger      (see all posts) 2010/03/01 (Mon) @ 15:47

Maybe you could shorten things by figuring out how many runs are scored in an inning depending on where in the lineup you are starting, but you still have to figure out who is leading off each inning.

YES.  That’s exactly what you are doing.

And think of it recursively.  Start in the 9th inning.  Start with the #4 hitter leading off.  Figure out how many runs are scored to the end of the inning.  So, now you figure out what happens when the #4 hitter leads off and the #5 is the second guy and the #6 is the third guy, etc.  You ALSO have to figure out how often the inning ends with the #6 and #7 and #8 hitters, etc.  You will need this for the recursion, so that when you repeat for the 8th inning, you know how often it ends with say the #3 hitter, and then, guess what, you know exactly how many runs are scored in the 9th inning when the #4 leads off.

So, this is why I’m saying that there are ways around the permutation issue.  Think of things in a more piecemeal fashion.
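
The inning-by-inning recursion can be sketched like this in Python, assuming you already have two tables for each leadoff slot: the expected runs in an inning, and the probabilities of who leads off the next one. The function name and the toy tables are hypothetical, and the sketch ignores extra innings and a home team skipping the bottom of the 9th.

```python
def expected_game_runs(inning_runs, next_leadoff, first_leadoff=0, innings=9):
    """Work backward from the last inning.
    inning_runs[i]     = expected runs in one inning when slot i leads off
    next_leadoff[i][j] = chance slot j leads off the next inning, given that
                         slot i led off this one
    """
    n = len(inning_runs)
    # value[i] = expected runs from now to the end of the game if slot i
    # leads off. After the final inning there is nothing left to score.
    value = [0.0] * n
    for _ in range(innings):
        value = [
            inning_runs[i] + sum(next_leadoff[i][j] * value[j] for j in range(n))
            for i in range(n)
        ]
    return value[first_leadoff]

# Toy input (made up): every inning is exactly three batters, and only an
# inning led off by slot 0 produces a run.
n = 9
toy_runs = [1.0] + [0.0] * 8
toy_next = [[1.0 if j == (i + 3) % n else 0.0 for j in range(n)] for i in range(n)]
print(expected_game_runs(toy_runs, toy_next))  # slot 0 leads off innings 1, 4, 7 -> 3.0
```

Each pass through the loop adds one earlier inning, so the per-inning tables are computed once per lineup and reused nine times instead of replaying whole games.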


#11    Tangotiger      (see all posts) 2010/03/01 (Mon) @ 15:51

"seems like it would be far more work than just banging out the full 9 innings using the sim”

But in my case, you get to reuse a lot of the cpu time.  In your case, you have to re-run the entire game if all you do is switch two spots.  In my case, the only time you re-run is when those two spots are actually hitting in the inning.

In the end, it comes down to how “clever” you are trying to make your algorithm.  I’m a programmer, so, to me, 95% of this was the challenge to code this in as few lines as possible and make it run as fast as possible.  The other 5% was to learn something new about batting orders.  If you are not a programmer, then this hidden G-spot won’t do anything for you.


#12    berselius      (see all posts) 2010/03/01 (Mon) @ 15:53

Ooh, that is clever. And obviously the final step is that we know who led off the first inning. And you have the bonus of figuring out the chances that an inning ends on each batter for each possible ‘leadoff’ batter in each inning by calculating the one-inning stuff in the 9th.


#13          (see all posts) 2010/03/01 (Mon) @ 16:01

Matt,

Octave is an open-source Matlab clone:

http://www.gnu.org/software/octave/index.html

It’s designed to run Matlab code as is.  You could try the script out with that.


#14    berselius      (see all posts) 2010/03/01 (Mon) @ 16:01

Thinking out loud about this some more, now I’m not sure there’s a big savings. You need to calculate the runs/inning for when each of the nine batters leads off. Which means you basically need to run....9 innings - a full game! Maybe the savings comes in because you don’t need to do as many iterations for a single inning since there should be less noise in one inning than a whole game. But the big savings would be in what you referenced above - if you know what happens generally when 1,2,3,4,5,6 bat you won’t need to recompute it later.


#15    berselius      (see all posts) 2010/03/01 (Mon) @ 16:02

Thanks Bill/13. This shouldn’t have any problems at all in Octave.


#16    J. Cross      (see all posts) 2010/06/21 (Mon) @ 20:34

We just got matlab at the school and I’m playing around with Berselius’s code to try to teach myself the program.

I added in runners taking extra bases (using the %’s from Tango’s markov chain).

The Cubs go from 3.29 runs/game to 3.97 runs/game which is a bigger jump than I imagined. 

Here’s the distribution:



I think once unearned runs are added in this lineup will perform about as expected.

