
THE BOOK--Playing The Percentages In Baseball


Thursday, January 27, 2011

Batting Order for Red Sox 2011

By Tangotiger, 03:05 PM

A reader wrote this up and asked me to post it.  I’m doing this blind, and will read it in a second.


#1    Tangotiger      (see all posts) 2011/01/27 (Thu) @ 15:13

Ok, just read it.  The author notes some of the things he did not consider, and I suggest that they are important enough that they need considering.

1. Opposing pitching hand.  In optimization of batting order, it’s all about a little edge here or there, and you have to include the pitching hand.

2. If it’s close, break up the lefty/righty.  You might gain in the late innings.

3. Include GIDP.  This is actually pretty significant.

4. Include SB/CS.

5. I would suggest the author goes through the process step-by-step like I do here:

http://www.insidethebook.com/ee/index.php/site/comments/optimizing_the_batting_order/

(Posts 4, 5; 20, 21)

After you try a few, show us how many runs per game each will give you.


#2    RMR      (see all posts) 2011/01/27 (Thu) @ 19:13

It doesn’t seem this accounts for the issues related to the structure of innings (i.e. the bases being cleared after every 3 outs).  I believe that leads to some other considerations, such as putting a lower-SLG hitter in the leadoff spot (so as not to waste the advancement value of extra-base hits) and batting a mediocre player 3rd due to the frequency of ‘0 on, 2 out’ PA.

It would seem Monte Carlo would be the more effective way to optimize.  Though it would be interesting to see how close the approaches get you.


#3    tangotiger      (see all posts) 2011/01/27 (Thu) @ 20:58

Definitely Monte Carlo or Markov is better.  This method however is much faster.


#4          (see all posts) 2011/01/28 (Fri) @ 13:03

I will admit that my examination of the Red Sox batting order was driven more by a desire to use a certain process (linear programming) than by a passion for finding the “perfect” batting order. I do agree that a Monte Carlo simulation or Markov model will enable one to incorporate more “real world” events and serve as an [...]. The issue I had was letting the model sort through the 362,880 possible orders quickly and with some degree of confidence that we did find the “optimum” batting order. Hence the use of linear programming. Having said that, I did run into some issues that I could use some advice with.

Let me start by showing my analysis (actual numbers, unlike my first post). My initial attempt used player projections from FanGraphs.com. I converted these to a percentage outcome for each plate appearance, as shown below:

NAME               1B     2B     3B     HR     BB     HBP    K      OUT    SUM
ADRIAN GONZALEZ    0.135  0.063  0.001  0.062  0.137  0.004  0.170  0.427  1.000
DUSTIN PEDROIA     0.172  0.074  0.001  0.027  0.100  0.009  0.085  0.531  1.000
KEVIN YOUKILIS     0.153  0.063  0.007  0.045  0.130  0.025  0.178  0.399  1.000
CARL CRAWFORD      0.196  0.049  0.017  0.026  0.072  0.009  0.156  0.476  1.000
JACOBY ELLSBURY    0.209  0.047  0.009  0.012  0.074  0.001  0.102  0.545  1.000
JED LOWRIE         0.146  0.070  0.000  0.031  0.116  0.004  0.177  0.456  1.000
J.D. DREW          0.149  0.046  0.006  0.037  0.134  0.006  0.191  0.431  1.000
DAVID ORTIZ        0.128  0.054  0.002  0.049  0.125  0.005  0.239  0.398  1.000
J. SALTALAMACCHIA  0.135  0.054  0.003  0.032  0.100  0.000  0.237  0.439  1.000

I verified that each player’s row sums to one (it has to, because I set OUT equal to 1.0000 minus the sum of the other outcomes).
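The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name `outcome_rates` and the counting stats in the example are hypothetical placeholders, not the projections used in this comment. OUT is computed as the remainder, which is why each row sums to one by construction.

```python
def outcome_rates(pa, counts):
    """Convert projected counting stats into per-plate-appearance rates.

    `counts` maps an outcome name (1B, 2B, 3B, HR, BB, HBP, K) to its
    projected count. OUT is set to 1 minus the sum of the other rates,
    so the row sums to one by construction.
    """
    rates = {name: n / pa for name, n in counts.items()}
    rates["OUT"] = 1.0 - sum(rates.values())
    if rates["OUT"] < 0:
        raise ValueError("outcome counts exceed plate appearances")
    return rates


# Hypothetical projection line, for illustration only.
example = outcome_rates(
    pa=600,
    counts={"1B": 90, "2B": 36, "3B": 3, "HR": 24, "BB": 66, "HBP": 6, "K": 108},
)
print({name: round(rate, 3) for name, rate in example.items()})
```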
The next step is to use Table 52 from The Book (I incorrectly identified it as Table 51 in my first post) as shown below.

Run Value by Order  1B     2B     3B     HR     BB     HBP    K        OUT
1                   0.515  0.806  1.121  1.421  0.385  0.411  (0.329)  (0.328)
2                   0.515  0.799  1.100  1.450  0.366  0.396  (0.322)  (0.324)
3                   0.493  0.779  1.064  1.453  0.335  0.369  (0.317)  (0.315)
4                   0.517  0.822  1.117  1.472  0.345  0.377  (0.332)  (0.327)
5                   0.513  0.809  1.106  1.438  0.348  0.381  (0.324)  (0.323)
6                   0.482  0.763  1.050  1.376  0.336  0.368  (0.306)  (0.306)
7                   0.464  0.738  1.014  1.336  0.323  0.353  (0.296)  (0.296)
8                   0.451  0.714  0.980  1.293  0.312  0.340  (0.287)  (0.286)
9                   0.436  0.689  0.948  1.249  0.302  0.329  (0.278)  (0.277)

Then I built a matrix showing the expected run value of each plate appearance for each player in each position in the batting order. The result is shown below:

Run Value Matrix   1         2         3         4         5         6         7         8         9
ADRIAN GONZALEZ    0.0679    0.0695    0.0657    0.0667    0.0667    0.0643    0.0619    0.0602    0.0578
DUSTIN PEDROIA     0.0286    0.0295    0.0262    0.0275    0.0280    0.0268    0.0254    0.0250    0.0238
KEVIN YOUKILIS     0.0710    0.0717    0.0668    0.0684    0.0689    0.0662    0.0636    0.0618    0.0594
CARL CRAWFORD      0.0195    0.0210    0.0178    0.0188    0.0197    0.0187    0.0176    0.0175    0.0165
JACOBY ELLSBURY    (0.0103)  (0.0090)  (0.0117)  (0.0113)  (0.0100)  (0.0097)  (0.0100)  (0.0091)  (0.0092)
JED LOWRIE         0.0136    0.0148    0.0115    0.0117    0.0128    0.0126    0.0118    0.0117    0.0109
J.D. DREW          0.0225    0.0236    0.0198    0.0197    0.0209    0.0206    0.0195    0.0191    0.0181
DAVID ORTIZ        0.0226    0.0244    0.0213    0.0207    0.0217    0.0216    0.0206    0.0201    0.0190
J. SALTALAMACCHIA  (0.0219)  (0.0199)  (0.0219)  (0.0234)  (0.0218)  (0.0201)  (0.0198)  (0.0189)  (0.0187)
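Each cell of the run value matrix appears to be the player's outcome rates multiplied by the run values for that batting slot and summed. A minimal Python sketch, assuming that reading, using Adrian Gonzalez's row and the run values for slots 1 and 4 as posted (parenthesized entries are entered as negatives); the function name `run_value_per_pa` is illustrative:

```python
# Column order shared by both tables: 1B, 2B, 3B, HR, BB, HBP, K, OUT.

# Adrian Gonzalez's per-PA outcome rates, copied from the first table.
GONZALEZ = [0.135, 0.063, 0.001, 0.062, 0.137, 0.004, 0.170, 0.427]

# Run values for batting slots 1 and 4, copied from the Table 52 figures
# quoted above. Parenthesized table entries are negative.
SLOT_1 = [0.515, 0.806, 1.121, 1.421, 0.385, 0.411, -0.329, -0.328]
SLOT_4 = [0.517, 0.822, 1.117, 1.472, 0.345, 0.377, -0.332, -0.327]


def run_value_per_pa(rates, slot_values):
    """Expected run value of one plate appearance in a given slot."""
    return sum(rate * value for rate, value in zip(rates, slot_values))


print(round(run_value_per_pa(GONZALEZ, SLOT_1), 4))
print(round(run_value_per_pa(GONZALEZ, SLOT_4), 4))
```

Because the inputs are rounded to three decimals, other cells may differ from the posted matrix in the last digit.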

This is where we insert all the caveats about ignoring steals, left/right issues, GIDP, etc. With that out of the way, I then looked at how to select the best combination of player and batting order position to maximize the expected total runs per plate appearance. This selection can also be viewed as a matrix, as shown below:

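Batting Order Matrix  1    2    3    4    5    6    7    8    9    SUM
ADRIAN GONZALEZ       -    1.0  -    -    -    -    -    -    -    1
DUSTIN PEDROIA        -    -    -    1.0  -    -    -    -    -    1
KEVIN YOUKILIS        1.0  -    -    -    -    -    -    -    -    1
CARL CRAWFORD         -    -    -    -    1.0  -    -    -    -    1
JACOBY ELLSBURY       -    -    -    -    -    -    -    1.0  -    1
JED LOWRIE            -    -    -    -    -    -    1.0  -    -    1
J.D. DREW             -    -    -    -    -    1.0  -    -    -    1
DAVID ORTIZ           -    -    1.0  -    -    -    -    -    -    1
J. SALTALAMACCHIA     -    -    -    -    -    -    -    -    1.0  1
SUM                   1    1    1    1    1    1    1    1    1    9.0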

Here the value 1.0 marks the batting order slot assigned to each player. The assignment is based on the objective function shown in the matrix below:

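Objective Matrix   1         2         3         4         5         6         7         8         9         SUM
ADRIAN GONZALEZ    -         0.0695    -         -         -         -         -         -         -
DUSTIN PEDROIA     -         -         -         0.0275    -         -         -         -         -
KEVIN YOUKILIS     0.0710    -         -         -         -         -         -         -         -
CARL CRAWFORD      -         -         -         -         0.0197    -         -         -         -
JACOBY ELLSBURY    -         -         -         -         -         -         -         (0.0091)  -
JED LOWRIE         -         -         -         -         -         -         0.0118    -         -
J.D. DREW          -         -         -         -         -         0.0206    -         -         -
DAVID ORTIZ        -         -         0.0213    -         -         -         -         -         -
J. SALTALAMACCHIA  -         -         -         -         -         -         -         -         (0.0187)
SUM                                                                                                    0.2136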

The objective function that we are maximizing is the sum of all cells in the matrix (shown as 0.2136). Assuming 40 plate appearances per game yields 8.54 runs per game, an extremely high value (or great offense). I’m guessing that Table 52 in The Book may have some “double counting” of runs. In any event, the linear programming model found a solution.

The next step was to try alternate batting orders. You can do this by pre-assigning a 1.0 in the desired matrix spot (called a “constraint” in linear programming speak). Forcing Jacoby Ellsbury to bat first reduces the team expected runs per plate appearance to 0.2088. I also looked at what the “worst” order would be by minimizing the LP objective function and came out with 0.1857. I concluded that batting order mattered.
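For readers who want to cross-check these figures without an LP solver, here is a rough Python sketch that swaps in plain exhaustive search over all 9! = 362,880 orders. It is not the linear program itself. The matrix is the posted Run Value Matrix (parenthesized entries entered as negatives); the function name `search_orders` and its `forced` and `maximize` parameters are illustrative assumptions.

```python
from itertools import permutations

PLAYERS = ["Gonzalez", "Pedroia", "Youkilis", "Crawford", "Ellsbury",
           "Lowrie", "Drew", "Ortiz", "Saltalamacchia"]

# Posted Run Value Matrix: rows follow PLAYERS, columns are slots 1-9.
RUN_VALUE = [
    [0.0679, 0.0695, 0.0657, 0.0667, 0.0667, 0.0643, 0.0619, 0.0602, 0.0578],
    [0.0286, 0.0295, 0.0262, 0.0275, 0.0280, 0.0268, 0.0254, 0.0250, 0.0238],
    [0.0710, 0.0717, 0.0668, 0.0684, 0.0689, 0.0662, 0.0636, 0.0618, 0.0594],
    [0.0195, 0.0210, 0.0178, 0.0188, 0.0197, 0.0187, 0.0176, 0.0175, 0.0165],
    [-0.0103, -0.0090, -0.0117, -0.0113, -0.0100, -0.0097, -0.0100, -0.0091, -0.0092],
    [0.0136, 0.0148, 0.0115, 0.0117, 0.0128, 0.0126, 0.0118, 0.0117, 0.0109],
    [0.0225, 0.0236, 0.0198, 0.0197, 0.0209, 0.0206, 0.0195, 0.0191, 0.0181],
    [0.0226, 0.0244, 0.0213, 0.0207, 0.0217, 0.0216, 0.0206, 0.0201, 0.0190],
    [-0.0219, -0.0199, -0.0219, -0.0234, -0.0218, -0.0201, -0.0198, -0.0189, -0.0187],
]


def order_value(matrix, order):
    """Sum the matrix cells picked by `order` (order[slot] = player row)."""
    return sum(matrix[player][slot] for slot, player in enumerate(order))


def search_orders(matrix, forced=None, maximize=True):
    """Score every batting order and return (value, order).

    `forced` optionally maps a slot index to the player row that must bat
    there, mirroring the pre-assigned 1.0 constraint described above.
    """
    forced = forced or {}
    pick = max if maximize else min
    candidates = (
        order for order in permutations(range(len(matrix)))
        if all(order[slot] == player for slot, player in forced.items())
    )
    return pick((order_value(matrix, order), order) for order in candidates)


best_value, best_order = search_orders(RUN_VALUE)
ellsbury_first, _ = search_orders(RUN_VALUE, forced={0: PLAYERS.index("Ellsbury")})
worst_value, _ = search_orders(RUN_VALUE, maximize=False)

# Order reported in the Batting Order Matrix, slot 1 through slot 9.
REPORTED = [2, 0, 7, 1, 3, 6, 5, 4, 8]

print([PLAYERS[p] for p in best_order], round(best_value, 4))
print(round(ellsbury_first, 4), round(worst_value, 4))
```

Enumeration takes a few seconds and guarantees the true optimum of this matrix, which makes it a useful check on the LP output. It would not scale to much larger problems.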

I also looked at the results for the NY Yankees using the FanGraphs.com forecasts. Jeter bats last with a team run expectancy of 0.1391. Good news for a Boston fan, but doubt over the player forecasts is surfacing.

The next step was to use the new forecasts for 2011 available through The Hardball Times. Here things started to raise serious doubts. The Hardball Times forecasts were slightly less optimistic than the FanGraphs forecasts, but the LP results were significantly different. Most of the players had negative run expectancies. The optimum order was essentially unchanged but the run expectancy declined to 0.0072 runs per plate appearance (less than 0.3 runs per game). The NY Yankees declined to a negative run expectancy.

Auugghhh. Is there something dramatically wrong with the process? Am I using Table 52 from The Book incorrectly? Any insights would be appreciated.


#5    Tangotiger      (see all posts) 2011/01/28 (Fri) @ 13:15

You must have a programming bug if you are getting over 8 runs a game.

In any case, you are getting a 13% difference between worst and optimal, which if given 750 runs in a season is 98 runs.  That’s far higher than I would have expected.  In a non-pitcher league, I’d be shocked if you can get 10% difference between best and worst, and realistically, I would have expected something closer to 5%, maybe 6%.
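The arithmetic behind these figures can be checked in a few lines of Python. The inputs are the numbers quoted in comments #4 and #5; the variable names are illustrative.

```python
best_per_pa = 0.2136    # LP optimum reported in comment #4
worst_per_pa = 0.1857   # LP minimum reported in comment #4
pa_per_game = 40        # plate appearances per game assumed in comment #4
season_runs = 750       # season run total used in this comment

runs_per_game = best_per_pa * pa_per_game
spread = (best_per_pa - worst_per_pa) / best_per_pa
season_gap = spread * season_runs

print(f"{runs_per_game:.2f} runs per game")
print(f"{spread:.1%} spread, about {season_gap:.0f} runs over a season")
```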


#6    Dave St      (see all posts) 2011/01/28 (Fri) @ 14:36

I’ve checked the programming and everything cross checks except the result. I guess my real question is if there is a flaw in using Table 52 with projected plate appearance outcomes to predict expected runs?
Thoughts?


#7          (see all posts) 2011/01/29 (Sat) @ 02:15

Baseball is composed of 9 innings of 3 outs each.  Hits, walks, errors, etc. only occur in this framework.  A Markov or a Monte Carlo will preserve this.  I’m not sure that what you’re doing with averages handles this correctly.  I would be much more comfortable if you could show me the 3 times 9 in your numbers.
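A rough sketch of what a Monte Carlo version might look like in Python, keeping the 9 innings of 3 outs explicit. Everything about the baserunning here is a simplifying assumption and not something anyone in this thread specified: runners advance exactly as many bases as the batter on a hit, walks and HBP move only forced runners, and outs never advance anyone. Steals, GIDP, errors, and pitcher handedness are ignored. The rates are the ones posted in comment #4, with blank cells treated as 0.000.

```python
import random

OUTCOMES = ["1B", "2B", "3B", "HR", "BB", "HBP", "K", "OUT"]
BASES_GAINED = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}

# Per-PA rates from comment #4 (same column order as OUTCOMES), listed in
# the order the LP selected. Blank table cells are assumed to be 0.000.
LINEUP = [
    ("Youkilis",       [0.153, 0.063, 0.007, 0.045, 0.130, 0.025, 0.178, 0.399]),
    ("Gonzalez",       [0.135, 0.063, 0.001, 0.062, 0.137, 0.004, 0.170, 0.427]),
    ("Ortiz",          [0.128, 0.054, 0.002, 0.049, 0.125, 0.005, 0.239, 0.398]),
    ("Pedroia",        [0.172, 0.074, 0.001, 0.027, 0.100, 0.009, 0.085, 0.531]),
    ("Crawford",       [0.196, 0.049, 0.017, 0.026, 0.072, 0.009, 0.156, 0.476]),
    ("Drew",           [0.149, 0.046, 0.006, 0.037, 0.134, 0.006, 0.191, 0.431]),
    ("Lowrie",         [0.146, 0.070, 0.000, 0.031, 0.116, 0.004, 0.177, 0.456]),
    ("Ellsbury",       [0.209, 0.047, 0.009, 0.012, 0.074, 0.001, 0.102, 0.545]),
    ("Saltalamacchia", [0.135, 0.054, 0.003, 0.032, 0.100, 0.000, 0.237, 0.439]),
]


def simulate_game(lineup, rng):
    """Play nine innings of three outs each and return the runs scored."""
    runs, batter = 0, 0
    for _ in range(9):
        outs = 0
        bases = [False, False, False]  # runner on first, second, third
        while outs < 3:
            _, weights = lineup[batter % len(lineup)]
            batter += 1
            outcome = rng.choices(OUTCOMES, weights=weights)[0]
            if outcome in ("K", "OUT"):
                outs += 1
            elif outcome in ("BB", "HBP"):
                # Only forced runners move up.
                if all(bases):
                    runs += 1
                elif bases[0] and bases[1]:
                    bases[2] = True
                elif bases[0]:
                    bases[1] = True
                bases[0] = True
            else:
                gain = BASES_GAINED[outcome]
                # Move the lead runner first so runners never collide.
                for base in (2, 1, 0):
                    if bases[base]:
                        bases[base] = False
                        if base + gain >= 3:
                            runs += 1
                        else:
                            bases[base + gain] = True
                if gain == 4:
                    runs += 1          # the batter scores on a home run
                else:
                    bases[gain - 1] = True
    return runs


def mean_runs(lineup, games, seed=0):
    """Average runs per game over `games` simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games


print(round(mean_runs(LINEUP, 2000), 2))
```

Reordering `LINEUP` and rerunning with the same seed and a larger number of games gives a crude way to compare batting orders with the inning structure intact.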

