Thursday, January 27, 2011
Batting Order for Red Sox 2011
A reader wrote this up and asked me to post it. I’m doing this blind, and will read it in a second.
It doesn’t seem this accounts for the issues related to the structure of innings (i.e. the bases being cleared after every 3 outs). I believe that leads to some other considerations, such as leading off with a lower-SLG hitter (so as not to waste the advancement value of extra-base hits) and batting a mediocre player 3rd due to the frequency of ‘0 on, 2 out’ PA.
It would seem Monte Carlo would be the more effective way to optimize. Though it would be interesting to see how close the approaches get you.
Definitely Monte Carlo or Markov is better. This method, however, is much faster.
I will admit that my examination of the Red Sox batting order was driven more by a desire to use a certain process (linear programming) than by a passion for finding the “perfect” batting order. I do agree that a Monte Carlo simulation or Markov model will enable one to incorporate more “real world” events and serve as an. The issue I had was letting the model sort through the 362,880 possible orders quickly and with some degree of confidence that we did find the “optimum” batting order. Hence the use of linear programming. Having said that, I did run into some issues that I could use some advice on.
Let me start by showing my analysis (actual numbers, unlike my first post). My initial attempt used player projections from FanGraphs.com. I converted these to a probability of each outcome per plate appearance, as shown below:
NAME               1B     2B     3B     HR     BB     HBP    K      OUT    SUM
ADRIAN GONZALEZ    0.135  0.063  0.001  0.062  0.137  0.004  0.170  0.427  1.000
DUSTIN PEDROIA     0.172  0.074  0.001  0.027  0.100  0.009  0.085  0.531  1.000
KEVIN YOUKILIS     0.153  0.063  0.007  0.045  0.130  0.025  0.178  0.399  1.000
CARL CRAWFORD      0.196  0.049  0.017  0.026  0.072  0.009  0.156  0.476  1.000
JACOBY ELLSBURY    0.209  0.047  0.009  0.012  0.074  0.001  0.102  0.545  1.000
JED LOWRIE         0.146  0.070         0.031  0.116  0.004  0.177  0.456  1.000
J.D. DREW          0.149  0.046  0.006  0.037  0.134  0.006  0.191  0.431  1.000
DAVID ORTIZ        0.128  0.054  0.002  0.049  0.125  0.005  0.239  0.398  1.000
J. SALTALAMACCHIA  0.135  0.054  0.003  0.032  0.100         0.237  0.439  1.000
I verified that each player’s row sums to one (it has to, because I set OUT equal to 1.000 minus the sum of the other outcomes).
The next step is to use Table 52 from The Book (I incorrectly identified it as Table 51 in my first post) as shown below.
Run Value by Order
ORDER  1B     2B     3B     HR     BB     HBP    K        OUT
1      0.515  0.806  1.121  1.421  0.385  0.411  (0.329)  (0.328)
2      0.515  0.799  1.100  1.450  0.366  0.396  (0.322)  (0.324)
3      0.493  0.779  1.064  1.453  0.335  0.369  (0.317)  (0.315)
4      0.517  0.822  1.117  1.472  0.345  0.377  (0.332)  (0.327)
5      0.513  0.809  1.106  1.438  0.348  0.381  (0.324)  (0.323)
6      0.482  0.763  1.050  1.376  0.336  0.368  (0.306)  (0.306)
7      0.464  0.738  1.014  1.336  0.323  0.353  (0.296)  (0.296)
8      0.451  0.714  0.980  1.293  0.312  0.340  (0.287)  (0.286)
9      0.436  0.689  0.948  1.249  0.302  0.329  (0.278)  (0.277)
Then I built a matrix showing the expected run value of each plate appearance for each player in each position in the batting order. The result is shown below:
Run Value Matrix    1         2         3         4         5         6         7         8         9
ADRIAN GONZALEZ     0.0679    0.0695    0.0657    0.0667    0.0667    0.0643    0.0619    0.0602    0.0578
DUSTIN PEDROIA      0.0286    0.0295    0.0262    0.0275    0.0280    0.0268    0.0254    0.0250    0.0238
KEVIN YOUKILIS      0.0710    0.0717    0.0668    0.0684    0.0689    0.0662    0.0636    0.0618    0.0594
CARL CRAWFORD       0.0195    0.0210    0.0178    0.0188    0.0197    0.0187    0.0176    0.0175    0.0165
JACOBY ELLSBURY    (0.0103)  (0.0090)  (0.0117)  (0.0113)  (0.0100)  (0.0097)  (0.0100)  (0.0091)  (0.0092)
JED LOWRIE          0.0136    0.0148    0.0115    0.0117    0.0128    0.0126    0.0118    0.0117    0.0109
J.D. DREW           0.0225    0.0236    0.0198    0.0197    0.0209    0.0206    0.0195    0.0191    0.0181
DAVID ORTIZ         0.0226    0.0244    0.0213    0.0207    0.0217    0.0216    0.0206    0.0201    0.0190
J. SALTALAMACCHIA  (0.0219)  (0.0199)  (0.0219)  (0.0234)  (0.0218)  (0.0201)  (0.0198)  (0.0189)  (0.0187)
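Each cell of the Run Value Matrix is a sum over the eight outcomes of the player's per-PA probability times the run value of that outcome in that lineup slot. Here is a minimal Python sketch of that calculation using the rounded figures from the two tables. The names `PROBS`, `RUN_VALUES` and `run_value_matrix` are illustrative, and the values missing from the Lowrie and Saltalamacchia rows are assumed to be 3B = 0 and HBP = 0. Because the inputs are rounded to three decimals, the results can differ from the posted matrix by a few ten-thousandths.

```python
# Per-PA outcome probabilities, in the order 1B, 2B, 3B, HR, BB, HBP, K, OUT.
# ASSUMPTION: the value missing from the Lowrie row is 3B = 0 and the value
# missing from the Saltalamacchia row is HBP = 0.
PROBS = {
    "ADRIAN GONZALEZ":   [0.135, 0.063, 0.001, 0.062, 0.137, 0.004, 0.170, 0.427],
    "DUSTIN PEDROIA":    [0.172, 0.074, 0.001, 0.027, 0.100, 0.009, 0.085, 0.531],
    "KEVIN YOUKILIS":    [0.153, 0.063, 0.007, 0.045, 0.130, 0.025, 0.178, 0.399],
    "CARL CRAWFORD":     [0.196, 0.049, 0.017, 0.026, 0.072, 0.009, 0.156, 0.476],
    "JACOBY ELLSBURY":   [0.209, 0.047, 0.009, 0.012, 0.074, 0.001, 0.102, 0.545],
    "JED LOWRIE":        [0.146, 0.070, 0.000, 0.031, 0.116, 0.004, 0.177, 0.456],
    "J.D. DREW":         [0.149, 0.046, 0.006, 0.037, 0.134, 0.006, 0.191, 0.431],
    "DAVID ORTIZ":       [0.128, 0.054, 0.002, 0.049, 0.125, 0.005, 0.239, 0.398],
    "J. SALTALAMACCHIA": [0.135, 0.054, 0.003, 0.032, 0.100, 0.000, 0.237, 0.439],
}

# Run values by lineup slot (rows are slots 1..9), same outcome order as above.
# K and OUT are negative (shown in parentheses in the table).
RUN_VALUES = [
    [0.515, 0.806, 1.121, 1.421, 0.385, 0.411, -0.329, -0.328],
    [0.515, 0.799, 1.100, 1.450, 0.366, 0.396, -0.322, -0.324],
    [0.493, 0.779, 1.064, 1.453, 0.335, 0.369, -0.317, -0.315],
    [0.517, 0.822, 1.117, 1.472, 0.345, 0.377, -0.332, -0.327],
    [0.513, 0.809, 1.106, 1.438, 0.348, 0.381, -0.324, -0.323],
    [0.482, 0.763, 1.050, 1.376, 0.336, 0.368, -0.306, -0.306],
    [0.464, 0.738, 1.014, 1.336, 0.323, 0.353, -0.296, -0.296],
    [0.451, 0.714, 0.980, 1.293, 0.312, 0.340, -0.287, -0.286],
    [0.436, 0.689, 0.948, 1.249, 0.302, 0.329, -0.278, -0.277],
]


def run_value_matrix(probs, run_values):
    """Expected runs per PA for every (player, lineup slot) pair."""
    return {
        name: [sum(p * v for p, v in zip(p_row, slot)) for slot in run_values]
        for name, p_row in probs.items()
    }


matrix = run_value_matrix(PROBS, RUN_VALUES)
for name, row in matrix.items():
    print(f"{name:<18}" + " ".join(f"{x:8.4f}" for x in row))
```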
This is where we insert all the caveats about ignoring steals, left/right issues, GIDP, etc. With that out of the way, I then looked at how to select the best combination of player and batting-order position (i.e. the batting order) to maximize the expected total runs per plate appearance. This selection can be viewed as a matrix as well, as shown below:
Batting Order Matrix  1    2    3    4    5    6    7    8    9    SUM
ADRIAN GONZALEZ            1.0                                     1
DUSTIN PEDROIA                       1.0                           1
KEVIN YOUKILIS        1.0                                          1
CARL CRAWFORD                             1.0                      1
JACOBY ELLSBURY                                          1.0       1
JED LOWRIE                                          1.0            1
J.D. DREW                                      1.0                 1
DAVID ORTIZ                     1.0                                1
J. SALTALAMACCHIA                                             1.0  1
SUM                   1    1    1    1    1    1    1    1    1    9.0
Here the value 1.0 marks the batting-order position assigned to each player. The assignment is chosen to maximize the objective function shown in the matrix below:
Matrix                1        2        3        4        5        6        7        8        9        SUM
ADRIAN GONZALEZ                0.0695
DUSTIN PEDROIA                                   0.0275
KEVIN YOUKILIS        0.0710
CARL CRAWFORD                                             0.0197
JACOBY ELLSBURY                                                                        (0.0091)
JED LOWRIE                                                                    0.0118
J.D. DREW                                                          0.0206
DAVID ORTIZ                             0.0213
J. SALTALAMACCHIA                                                                               (0.0187)
SUM                                                                                                     0.2136
The objective function that we are maximizing is the sum of all cells in the matrix (shown as 0.2136). Assuming 40 plate appearances per game yields 8.54 runs per game, an extremely high value (or great offense). I’m guessing that Table 52 in The Book may have some “double counting” of runs. In any event, the linear programming model found a solution.
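With only 9! = 362,880 possible orders, the selection can also be checked by brute force. The sketch below is a stand-in for the linear program, not the author's model: it enumerates every assignment of the nine players to the nine slots using the posted Run Value Matrix and compares the best total with the posted order. The names `RV`, `total` and `posted_slots` are illustrative, and the posted order is read off by matching each objective-matrix value to its column in the Run Value Matrix.

```python
from itertools import permutations

PLAYERS = [
    "ADRIAN GONZALEZ", "DUSTIN PEDROIA", "KEVIN YOUKILIS", "CARL CRAWFORD",
    "JACOBY ELLSBURY", "JED LOWRIE", "J.D. DREW", "DAVID ORTIZ",
    "J. SALTALAMACCHIA",
]

# Posted Run Value Matrix: RV[i][j] = runs per PA for player i in slot j + 1.
RV = [
    [0.0679, 0.0695, 0.0657, 0.0667, 0.0667, 0.0643, 0.0619, 0.0602, 0.0578],
    [0.0286, 0.0295, 0.0262, 0.0275, 0.0280, 0.0268, 0.0254, 0.0250, 0.0238],
    [0.0710, 0.0717, 0.0668, 0.0684, 0.0689, 0.0662, 0.0636, 0.0618, 0.0594],
    [0.0195, 0.0210, 0.0178, 0.0188, 0.0197, 0.0187, 0.0176, 0.0175, 0.0165],
    [-0.0103, -0.0090, -0.0117, -0.0113, -0.0100, -0.0097, -0.0100, -0.0091, -0.0092],
    [0.0136, 0.0148, 0.0115, 0.0117, 0.0128, 0.0126, 0.0118, 0.0117, 0.0109],
    [0.0225, 0.0236, 0.0198, 0.0197, 0.0209, 0.0206, 0.0195, 0.0191, 0.0181],
    [0.0226, 0.0244, 0.0213, 0.0207, 0.0217, 0.0216, 0.0206, 0.0201, 0.0190],
    [-0.0219, -0.0199, -0.0219, -0.0234, -0.0218, -0.0201, -0.0198, -0.0189, -0.0187],
]


def total(slots):
    """Objective value: slots[i] is the 0-based lineup slot of player i."""
    return sum(RV[i][s] for i, s in enumerate(slots))


# Exhaustive search over all 9! assignments.
best_slots = max(permutations(range(9)), key=total)
best_total = total(best_slots)

# Posted order, as 1-based slots in PLAYERS order (inferred from the tables).
posted_slots = [s - 1 for s in (2, 4, 1, 5, 8, 7, 6, 3, 9)]
posted_total = total(posted_slots)

PA_PER_GAME = 40  # the post's assumption
for slot, i in sorted((s, i) for i, s in enumerate(best_slots)):
    print(slot + 1, PLAYERS[i])
print(f"best total   = {best_total:.4f}")
print(f"posted total = {posted_total:.4f} "
      f"({PA_PER_GAME * posted_total:.2f} runs per game at {PA_PER_GAME} PA)")
```

Exhaustive search is only practical because the lineup is so small; the linear program gets to the same kind of answer without visiting every order.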
The next step was to try alternate batting orders. You can do this by pre-assigning a 1.0 in the desired matrix spot (called a “constraint” in linear programming speak). Forcing Jacoby Ellsbury to bat first reduces the team’s expected runs per plate appearance to 0.2088. I also looked at what the “worst” order would be by minimizing the LP objective function and came out with 0.1857. I concluded that batting order mattered.
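The same two experiments can be sketched in Python with an exhaustive search standing in for the LP solver. Pinning a player to a slot plays the role of the constraint, and swapping `max` for `min` finds the worst order. The function `search` and its `fixed` argument are hypothetical names, and since the matrix values here are the rounded ones from the post, the totals need not match 0.2088 and 0.1857 to the last digit.

```python
from itertools import permutations

PLAYERS = [
    "ADRIAN GONZALEZ", "DUSTIN PEDROIA", "KEVIN YOUKILIS", "CARL CRAWFORD",
    "JACOBY ELLSBURY", "JED LOWRIE", "J.D. DREW", "DAVID ORTIZ",
    "J. SALTALAMACCHIA",
]

# Posted Run Value Matrix: RV[i][j] = runs per PA for player i in slot j + 1.
RV = [
    [0.0679, 0.0695, 0.0657, 0.0667, 0.0667, 0.0643, 0.0619, 0.0602, 0.0578],
    [0.0286, 0.0295, 0.0262, 0.0275, 0.0280, 0.0268, 0.0254, 0.0250, 0.0238],
    [0.0710, 0.0717, 0.0668, 0.0684, 0.0689, 0.0662, 0.0636, 0.0618, 0.0594],
    [0.0195, 0.0210, 0.0178, 0.0188, 0.0197, 0.0187, 0.0176, 0.0175, 0.0165],
    [-0.0103, -0.0090, -0.0117, -0.0113, -0.0100, -0.0097, -0.0100, -0.0091, -0.0092],
    [0.0136, 0.0148, 0.0115, 0.0117, 0.0128, 0.0126, 0.0118, 0.0117, 0.0109],
    [0.0225, 0.0236, 0.0198, 0.0197, 0.0209, 0.0206, 0.0195, 0.0191, 0.0181],
    [0.0226, 0.0244, 0.0213, 0.0207, 0.0217, 0.0216, 0.0206, 0.0201, 0.0190],
    [-0.0219, -0.0199, -0.0219, -0.0234, -0.0218, -0.0201, -0.0198, -0.0189, -0.0187],
]


def total(slots):
    """Objective value: slots[i] is the 0-based lineup slot of player i."""
    return sum(RV[i][s] for i, s in enumerate(slots))


def search(fixed=None, maximize=True):
    """Best (or worst) order by exhaustive search.

    fixed maps a player index to the slot index that player must occupy.
    """
    fixed = fixed or {}
    candidates = (
        p for p in permutations(range(9))
        if all(p[i] == s for i, s in fixed.items())
    )
    chosen = (max if maximize else min)(candidates, key=total)
    return chosen, total(chosen)


best_order, best_total = search()
ellsbury = PLAYERS.index("JACOBY ELLSBURY")
ellsbury_order, ellsbury_total = search(fixed={ellsbury: 0})
worst_order, worst_total = search(maximize=False)

print(f"best           = {best_total:.4f}")
print(f"Ellsbury first = {ellsbury_total:.4f}")
print(f"worst          = {worst_total:.4f}")
```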
I also looked at the results for the NY Yankees using the FanGraphs.com forecasts. Jeter bats last with a team run expectancy of 0.1391. Good news for a Boston fan, but doubt over the player forecasts is surfacing.
The next step was to use the new forecasts for 2011 available through The Hardball Times. Here serious doubts started to arise. The Hardball Times forecasts were slightly less optimistic than the FanGraphs forecasts, yet the LP results were significantly different. Most of the players had negative run expectancies. The optimum order was essentially unchanged but the run expectancy declined to 0.0072 runs per plate appearance (less than 0.3 runs per game). The NY Yankees declined to a negative run expectancy.
Auugghhh. Is there something dramatically wrong with the process? Am I using Table 52 from The Book incorrectly? Any insights would be appreciated.
You must have a programming bug if you are getting over 8 runs a game.
In any case, you are getting a 13% difference between worst and optimal, which if given 750 runs in a season is 98 runs. That’s far higher than I would have expected. In a non-pitcher league, I’d be shocked if you can get 10% difference between best and worst, and realistically, I would have expected something closer to 5%, maybe 6%.
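The arithmetic behind the 13% and 98-run figures can be checked in a few lines of Python, using the best and worst totals reported in the post and the 750-run season from the comment above. The variable names are illustrative.

```python
best, worst = 0.2136, 0.1857  # runs per PA reported for the best and worst orders
season_runs = 750             # season total used in the comment

gap = (best - worst) / best   # relative difference between best and worst
print(f"gap = {gap:.1%}, about {gap * season_runs:.0f} runs "
      f"over a {season_runs}-run season")

# For comparison, the gaps the comment would have expected instead.
for expected in (0.05, 0.06, 0.10):
    print(f"{expected:.0%} of {season_runs} runs = {expected * season_runs:.1f} runs")
```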
I’ve checked the programming and everything cross checks except the result. I guess my real question is if there is a flaw in using Table 52 with projected plate appearance outcomes to predict expected runs?
Thoughts?
Baseball is composed of 9 innings of 3 outs each. Hits, walks, errors, etc. only occur in this framework. A Markov or a Monte Carlo will preserve this. I’m not sure that what you’re doing with averages handles this correctly. I would be much more comfortable if you could show me the 3 times 9 in your numbers.
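To make the “3 times 9” concrete, here is a rough Monte Carlo sketch in Python that plays 9 innings of 3 outs each and clears the bases after every inning. It is not the model from The Book and it rests on loud assumptions: every runner advances exactly as many bases as the batter's hit, walks and HBP only move forced runners, and there are no steals, double plays, errors or extra innings. The per-PA probabilities come from the projection table (with the missing Lowrie 3B and Saltalamacchia HBP values taken as 0), and the lineup is the order implied by the LP solution in the post.

```python
import random

EVENTS = ("1B", "2B", "3B", "HR", "BB", "HBP")
BASES_GAINED = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}

# Per-PA probabilities (1B, 2B, 3B, HR, BB, HBP); the rest of each PA is an out.
# ASSUMPTION: missing table cells (Lowrie 3B, Saltalamacchia HBP) are 0.
LINEUP = [
    ("KEVIN YOUKILIS",    (0.153, 0.063, 0.007, 0.045, 0.130, 0.025)),
    ("ADRIAN GONZALEZ",   (0.135, 0.063, 0.001, 0.062, 0.137, 0.004)),
    ("DAVID ORTIZ",       (0.128, 0.054, 0.002, 0.049, 0.125, 0.005)),
    ("DUSTIN PEDROIA",    (0.172, 0.074, 0.001, 0.027, 0.100, 0.009)),
    ("CARL CRAWFORD",     (0.196, 0.049, 0.017, 0.026, 0.072, 0.009)),
    ("J.D. DREW",         (0.149, 0.046, 0.006, 0.037, 0.134, 0.006)),
    ("JED LOWRIE",        (0.146, 0.070, 0.000, 0.031, 0.116, 0.004)),
    ("JACOBY ELLSBURY",   (0.209, 0.047, 0.009, 0.012, 0.074, 0.001)),
    ("J. SALTALAMACCHIA", (0.135, 0.054, 0.003, 0.032, 0.100, 0.000)),
]


def draw_outcome(probs, rng):
    """Sample one plate-appearance outcome; anything left over is an out."""
    r = rng.random()
    cumulative = 0.0
    for event, p in zip(EVENTS, probs):
        cumulative += p
        if r < cumulative:
            return event
    return "OUT"


def simulate_game(lineup, rng):
    """Runs scored in 9 innings of 3 outs; bases are cleared every inning."""
    runs, batter = 0, 0
    for _ in range(9):
        outs = 0
        bases = [False, False, False]  # first, second, third
        while outs < 3:
            event = draw_outcome(lineup[batter % len(lineup)][1], rng)
            batter += 1  # the batting order carries over between innings
            if event == "OUT":
                outs += 1
            elif event in ("BB", "HBP"):
                # Only forced runners move up.
                if bases[0] and bases[1] and bases[2]:
                    runs += 1
                elif bases[0] and bases[1]:
                    bases[2] = True
                elif bases[0]:
                    bases[1] = True
                bases[0] = True
            else:
                n = BASES_GAINED[event]
                # Simplification: every runner advances exactly n bases.
                for b in (2, 1, 0):
                    if bases[b]:
                        bases[b] = False
                        if b + n >= 3:
                            runs += 1
                        else:
                            bases[b + n] = True
                if n == 4:
                    runs += 1  # the batter scores on a home run
                else:
                    bases[n - 1] = True
    return runs


rng = random.Random(2011)  # fixed seed so the run is repeatable
games = 2000
runs_per_game = sum(simulate_game(LINEUP, rng) for _ in range(games)) / games
print(f"{runs_per_game:.2f} runs per game over {games} simulated games")
```

Reordering `LINEUP` and rerunning gives a runs-per-game figure for any candidate order, which is one way to compare the simulation with the LP ranking.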
Ok, just read it. The author notes some of the things he did not consider, and I suggest that they are important enough that they need considering.
1. Opposing pitching hand. In optimization of batting order, it’s all about a little edge here or there, and you have to include the pitching hand.
2. If it’s close, break up the lefty/righty. You might gain in the late innings.
3. Include GIDP. This is actually pretty significant.
4. Include SB/CS.
5. I would suggest the author goes through the process step-by-step like I do here:
http://www.insidethebook.com/ee/index.php/site/comments/optimizing_the_batting_order/
(Posts 4, 5; 20, 21)
After you try a few, show us how many runs per game each will give you.