Friday, April 04, 2008
Help Dusty make the optimal batting order
Justin will give it a go. Look for an update soon…
Justin released his results:
http://jinaz-reds.blogspot.com/2008/04/markov-dusty-bakers-lineups-arent-half.html
Tomorrow, I will run those forecasts against my simplified model from The Book, to give you my take on it.
I do find it very hard to believe that the gap is as large as is being shown. I’d like to know what kind of speed numbers, basestealing numbers, and DP numbers were input (if any).
Also, was there a difference between LHP and RHP?
And maybe MGL can put in a few of those lineups, and see what kind of difference he gets.
I put several lineups into my sim and got very different results from his. I also got Dusty’s lineup as being one of the worst you can put out there, at least among the lineups that I tested, which were Justin’s best 3 and worst 3 (not including the one with the pitcher hitting 8th, which my sim is not set up to handle).
My results are in the comments section of the above article.
I’d be interested to see what Tango gets.
My results are actually in the comments section of the follow-up post on jinaz’ web site:
https://www.blogger.com/comment.g?blogID=23241716&postID=2719216967425321508
John Beamer discovered an error in my usage of the model tonight. I’m re-doing the lineups with those modifications, and made it through the opening day lineups this evening. It isn’t changing the principal findings thus far, and if anything is resulting in larger gaps between the lineups…
I’m also going to take MGL’s projections and run them through the Markov. It will be interesting to see if the differences between the two systems remain.
As far as speed numbers go, it’s restricted to SB rates. GDP numbers are included, but they are estimates based solely on PECOTA-projected BIP rates, as PECOTA did not actually forecast GDP. In the version of the spreadsheet that Beamer released, all baserunning aside from stolen bases is a constant across all ballplayers...though you can manipulate the heck out of what happens in each situation (I’m leaving it at the default settings).
And no, no differences between LHP and RHP are entered into the model in its present form.
-j
I didn’t get a chance to run it through my simplified LWTS process, but I will. I’m sure it will generally agree with MGL far more than Justin’s implementation of Beamer’s model.
I agree with Beamer that the only way to test MGL’s and Beamer’s models is to use the exact same inputs. Though, the little differences, like ROE and GIDP, are not going to account for the massive differences Justin is reporting.
Reposted from my blog:
----------
Ok, I’ve run MGL’s projections through the model. And I’m sure I’m now using the model correctly.
Those projections are much worse than the PECOTAs! But the rank order from this model has Baker actually coming out on top. Here are the results, listed in the same order you listed them:
Baker OD: 4.41 r/g, 715 r/sea
Old Top 3:
Bluzer OD: 4.32 r/g, 700 r/sea, -16 above Baker
Chris-OD2: 4.34 r/g, 703 r/sea, -12 above Baker
Pickoff-OD: 4.36 r/g, 706 r/sea, -9 above Baker
Old Bottom 3 (bottom first):
redmanrick-OD: 4.26 r/g, 691 r/sea, -24 above Baker
Brad-OD2: 4.30 r/g, 697 r/sea, -18 above Baker
fareast-OD: 4.33 r/g, 701 r/sea, -14 above Baker
Mine:
jinaz-OD: 4.39 r/g, 711 r/sea, -4 above Baker
jinaz-OD-exploit: 4.33 r/g, 701 r/sea, -14 above Baker
I’m using 2007 #9-slot hitting totals now, which come to a 0.170/0.216/0.250 hitting line. Before I was using pitchers only, but this helps account for the late-inning pinch hitters.
So...Still disagreements between the two systems. These come out much lower than yours (using the same projections, mostly), and with a different rank order. And the range is a bit higher as well, with ~24 runs between the worst and best in the Markov and ~11 runs between the best and worst in your sim.
The rank order differences are just bizarre though. Baker’s lineup comes out on top in this Markov, but is doing terribly in your sim. And the worst lineup according to this Markov is the best in your sim...it’s almost like they’re inverted!
????
-Justin
Hmmmm. I don’t know. I guess I could have my sim spit out the results of everything and maybe we can see if we can figure out what is going on.
Let’s see what Tango comes up with.
I am confident with the actual rpg numbers from my sim. As I said, I am scaling everything to the average NL, last 3 years, which is around 4.68 rpg per team, I think. Given that I am running CIN at home and their lineup is a shade below average, I think, my rpg numbers seem about right. Yours are way too low, even using my projections, so something is wrong with the Markov such that it is not producing enough runs.
I also agree with Tango that the spread from best to worst seems too high, considering that they are all reasonable lineups.
I can copy over run expectancy matrices and state frequency matrices for any lineup I feed into the thing if that helps. I just don’t really know what to look for at this point. Beamer has said that he did a fair bit of testing of his model against ‘07 data. Maybe the first thing to do is ask him to replicate what I just did (using your data) and make sure that I’m (again) not doing something patently wrong. We chatted about it last night, though, so I’m pretty confident that I’m doing it properly…
I may also try to use 2007 NL totals for each lineup slot and see how the model does with those totals. That would get away from any peculiarities with these particular players, and SHOULD match up pretty well. Don’t have time to do that right now, though…
Any in-depth analysis of the Markov’s inner workings is probably going to have to be done by Beamer...though he has offered to send me the “source code” version of the spreadsheet and walk me through it, so maybe I can help at some point after I figure out how it all works. My time’s kind of limited though...been spending too much time on this the past few days as it is.
I know there are some things that it doesn’t include (like handedness), but I still wouldn’t expect the Markov to be as different from your sim as it is. It’s a complicated model, and Beamer did a nice job of trying to include a lot of small details in it. Ultimately, it should converge on the sim results (assuming your sim is accurate, and I believe you that you’ve tested it extensively), especially with a fairly standard set of players. Hopefully we can figure this out, as it would be a great tool to be able to toy around with.
-j
Cross-posting to Justin’s blog:
Using the PECOTA forecasts noted in the main blog entry, and ignoring batting handedness, speed, and GIDP (all things I would not normally ignore), with wOBA in parens:
1. Hatty (.358)
2. Dunn (.401)
3. Junior (.360)
4. Encarnacion (.367)
5. Keppinger (.348)
6. Phillips (.334)
7. Valentin (.333)
8. Pitcher (.160)
9. Patterson (.311)
I don’t see how it’s possible to have such a big disagreement here. Other than Dunn and Patterson, the PECOTA forecasts are very tight for the rest of the players.
Here’s a reasonable lineup (5 best players remain in the top 5, don’t touch the pitcher), but in the worst possible combination:
1. Encarnacion (.367)
2. Junior (.360)
3. Dunn (.401)
4. Keppinger (.348)
5. Hatty (.358)
6. Patterson (.311)
7. Phillips (.334)
8. Pitcher (.160)
9. Valentin (.333)
And that’s just 3 runs worse than the optimal one that my Linear Weights model would suggest.
Basically, it’s pretty difficult to create a bad lineup (notwithstanding the exceptions I noted at the start of this post).
Re: Beamer’s model.
I don’t know where the issue is regarding the application of Beamer’s model. It could be an issue anywhere (coding issue, data issue, user issue).
Beamer did test it against my basic Markov model (all players the same). That’s a good first step. But, there could be a “looping” or “indexing” issue that simply wouldn’t be exposed by looking at a lineup with all the players the same.
In The Book, I have a model where I move the pitcher up one slot each time, whereby it shows the difference between the worst place to put him (cleanup) and the best place (#9) as having a gap of 0.10 runs per game, when the other 8 slots are all equal players. So, that’s another good test. Table 61 is your reference.
In Table 63, I used typical players in each batting slot, and moved the pitcher up each time to see the effect. The gap between the best place to put the pitcher (#8) and worst (cleanup) is again close to 0.10 runs per game.
So, if Justin or Beamer can confirm that Beamer’s Markov is giving results consistent with my results, that’s a huge step forward.
Note: My model used a semi-Markov, meaning that I tracked state-to-state transitions along 3 dimensions (base, out, lineup slot), and then used basic probability to extend that to include innings. I do the same with my Win Expectancy model, using only base/out states to establish the Markov chains, and then using basic probability to extend to innings.
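For readers who want to see the mechanics, here is a minimal sketch of the base/out Markov idea described above. It is not Tango's or Beamer's model: it assumes nine identical batters, so it drops the lineup-slot dimension entirely, and the event probabilities and runner-advancement rules are made up for illustration.

```python
from itertools import product

# Hypothetical per-plate-appearance probabilities (illustrative only; they sum to 1).
EVENTS = {"out": 0.68, "walk": 0.08, "single": 0.16,
          "double": 0.05, "triple": 0.005, "homer": 0.025}
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def advance(bases, event):
    """Return (new_bases, runs) for a non-out event.

    bases is a (first, second, third) tuple of booleans. Assumed rule: on a
    hit every runner moves exactly as many bases as the batter; on a walk
    only forced runners move.
    """
    first, second, third = bases
    if event == "walk":
        runs = 1 if (first and second and third) else 0
        return (True, second or first, third or (first and second)), runs
    gained = BASES_GAINED[event]
    runs = 0
    new_bases = [False, False, False]
    for start in range(4):  # 0 is the batter, 1..3 are the runners
        if start == 0 or bases[start - 1]:
            end = start + gained
            if end >= 4:
                runs += 1
            else:
                new_bases[end - 1] = True
    return tuple(new_bases), runs


def run_expectancy(events, sweeps=200):
    """Expected runs to the end of the half-inning from each of the 24 base/out states."""
    states = [(b, o) for b in product((False, True), repeat=3) for o in range(3)]
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):  # repeated substitution until the values settle
        updated = {}
        for bases, outs in states:
            total = 0.0
            for event, p in events.items():
                if event == "out":
                    # The third out ends the inning, which is worth zero further runs.
                    total += p * (values[(bases, outs + 1)] if outs < 2 else 0.0)
                else:
                    new_bases, runs = advance(bases, event)
                    total += p * (runs + values[(new_bases, outs)])
            updated[(bases, outs)] = total
        values = updated
    return values


EMPTY = (False, False, False)
table = run_expectancy(EVENTS)
runs_per_game = 9 * table[(EMPTY, 0)]  # nine innings of identical batters
```

With identical batters, runs per game is just nine times the bases-empty, no-out entry. Adding the lineup slot as a third state dimension, as the semi-Markov description does, is what lets a model of this kind distinguish one batting order from another.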
Tango—I’m going to look at all this stuff on the weekend and try to identify divergences .... Is there any chance you can mail me table 61/63 as my copy of the book is at home and I am in South Africa at the moment (don’t ask) for the long haul.
If you are comfortable doing that you should have my email address.
Done.
Well, I can’t test the user issue, but John Beamer will be able to.
I’ve sent him copies of my input page, so if I’m screwing up something he’ll be able to catch it.
Here’s my attempt to replicate your table 63 using the Markov. I used 2007 NL splits by batting position, so the “pitcher” spot includes pinch hitters and Tony LaRussa’s #9 hitters. The expectation from that would be that the difference between the best and worst lineup would be less than you found, because “pitchers” are better hitters in this case.
Here are the data. When the pitcher hits 7th, the original #7 and #8 hitters hit 8 & 9 respectively.
Pitcher hits R/G Prd RS
9 5.061 819.9
8 5.063 820.2
7 5.061 819.9
6 5.059 819.5
5 5.062 820.0
4 5.028 814.5
3 5.028 814.5
2 5.083 823.4
1 5.070 821.3
Here’s the above work repeated, this time using b-ref’s 2007 NL pitcher hitting splits in the pitcher spot, with all other positions the same as above. I expected that the max difference between spots would be closer to what was in your (tango’s) study because pitchers are so incompetent in this lineup:
Pitcher hits R/G Prd RS
9 4.831 782.6
8 4.834 783.1
7 4.829 782.4
6 4.826 781.8
5 4.830 782.4
4 4.795 776.7
3 4.794 776.6
2 4.849 785.6
1 4.836 783.5
At this point I’m going to step back and see what Beamer finds--if there is a problem with the model itself, then I can’t be of much help.
-j
(My Table 63 includes a typical #9 hitter, meaning 65% pitcher, and 35% other.)
Well, that’s a huge bug right there in your output! There is no way that a team will hit better with the pitcher batting 1 or 2. It looks to me like there’s some sort of looping or indexing issue.
So, there’s some big ‘splaining to do, for sure.
As you say, it looks to me like there is a gigantic problem in the 1 and 2 holes. It almost seems like the model does not incorporate how many PA each slot gets, almost as if all slots get the same number of PA - like if the game started with a random slot. I don’t see how that mistake can be in the model, though. Of course if you put in the true batting lines for each lineup slot, rather than average batters, other than the pitcher, even if the game started with a random slot, batting the pitcher leadoff or second would still be devastating since your best hitters would have so many fewer RBI situations, so there is definitely something wrong there.
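The point about plate appearances per slot can be made concrete with a small sketch. It only counts how a game's plate appearances are dealt out around the order; the per-game totals below are hypothetical, not real data, and the function names are made up for this example.

```python
def pa_by_slot(team_pa):
    """Plate appearances for slots 1..9 when the team sends team_pa batters up in one game."""
    full_turns, extra = divmod(team_pa, 9)
    # Every slot bats once per full turn; the first `extra` slots bat once more.
    return [full_turns + (1 if slot <= extra else 0) for slot in range(1, 10)]


def average_pa_by_slot(game_totals):
    """Average per-slot plate appearances over a list of per-game team totals."""
    per_game = [pa_by_slot(total) for total in game_totals]
    return [sum(game[i] for game in per_game) / len(per_game) for i in range(9)]


# Hypothetical spread of team plate appearances per game (not real data).
example_totals = [34, 36, 38, 38, 40, 43]
average = average_pa_by_slot(example_totals)
```

Because the order always starts at the top, the average never increases from one slot to the next. A model that treated every slot as getting the same number of trips would lose exactly this effect.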
Looks like there is a significant bug in the model.
I think it’s a programming bug, where you loop from the 9th hitter to the leadoff hitter, and he’s indexed wrong.
Guys, bear with me. I’ll take a look over the next couple of days. Thanks for all your help on this ... especially Justin.
ps. This is a surprise as I did do this test before but let me see what I come up with
Tango,
Here are preliminary results for table 61 and 63. My rpg difference is a bit more than yours but the sequencing is the same. I certainly don’t get Justin’s 1/2 issue ... Later I’ll try to replicate Justin’s assumptions and see if I get the same results as him.
Pitcher hits  tb 61  tb 63
1  4.443  4.704
2  4.427  4.694
3  4.435  4.702
4  4.459  4.712
5  4.528  4.756
6  4.578  4.788
7  4.61  4.807
8  4.611  4.809
9  4.616  4.811
FYI ... my methodology for batting order is to run a Markov for each sequence of batters (batting once through the order only), e.g., 1-9 ... 2-1 ... 3-2, and then work out the probability that each batter will lead off an inning. For instance, a lead-off hitter will lead off an inning ~1.8x as often as average, and a #2 hitter ~0.78x as often (numbers approximate). I then iterate this to work out a weighted-average run expectancy matrix, from which I generate an rpg.
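The leadoff-frequency step can be sketched roughly as follows. This is not the spreadsheet's actual calculation: it assumes nine identical batters who each reach base with a single hypothetical probability, and it ignores double plays and other outs on the bases, so an inning always lasts three batting outs.

```python
from math import comb


def step_distribution(obp, max_extra=200):
    """Distribution of (batters used in an inning) mod 9.

    An inning uses 3 + k batters, where k is the number of batters who reach
    base before the third out (negative binomial under the assumptions above).
    """
    steps = [0.0] * 9
    for k in range(max_extra + 1):
        prob = comb(k + 2, 2) * obp**k * (1 - obp) ** 3
        steps[(3 + k) % 9] += prob
    total = sum(steps)  # renormalise away the tiny truncated tail
    return [s / total for s in steps]


def leadoff_counts(obp, innings=9):
    """Expected number of innings led off by each slot (index 0 is the #1 hitter)."""
    steps = step_distribution(obp)
    dist = [1.0] + [0.0] * 8  # the #1 hitter always leads off the first inning
    counts = [0.0] * 9
    for _ in range(innings):
        counts = [c + d for c, d in zip(counts, dist)]
        shifted = [0.0] * 9
        for slot, d in enumerate(dist):
            for shift, s in enumerate(steps):
                shifted[(slot + shift) % 9] += d * s
        dist = shifted
    return counts


# Hypothetical on-base probability, chosen only for illustration.
counts = leadoff_counts(0.33)
```

Over nine innings the average slot leads off exactly once, so each entry of `counts` is already a multiple of the average, which is the form the figures in the comment above take. These weights are what would then be applied to the per-sequence run expectancy matrices.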
MGL, do you have time to run your sim to replicate Tables 61, 63 from The Book?
I don’t see how we can have that much of a gap between the best and worst spots for the pitcher. In Tables 61 and 63 for me, it’s a 0.10 run gap. For John’s table 63, it’s a 0.12 run gap, which I can accept. But his table 61 is a 0.19 run gap, which seems inconsistent.
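The best-to-worst gaps can be checked directly from the tb 61 and tb 63 figures John posted above. The sketch below just keys his runs-per-game numbers by the slot the pitcher bats in; the variable and function names are made up for this example.

```python
# Runs per game from John's preliminary results, keyed by the pitcher's lineup slot.
tb61 = {1: 4.443, 2: 4.427, 3: 4.435, 4: 4.459, 5: 4.528,
        6: 4.578, 7: 4.61, 8: 4.611, 9: 4.616}
tb63 = {1: 4.704, 2: 4.694, 3: 4.702, 4: 4.712, 5: 4.756,
        6: 4.788, 7: 4.807, 8: 4.809, 9: 4.811}


def best_worst_gap(table):
    """Return (best_slot, worst_slot, gap in runs per game)."""
    best = max(table, key=table.get)
    worst = min(table, key=table.get)
    return best, worst, table[best] - table[worst]


for name, table in (("tb 61", tb61), ("tb 63", tb63)):
    best, worst, gap = best_worst_gap(table)
    print(f"{name}: best slot {best}, worst slot {worst}, gap {gap:.3f} r/g")
```

On these figures the gap is about 0.19 runs per game for tb 61 and about 0.12 for tb 63, with slot 9 best and slot 2 worst in both.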
Tom—I’ll also check my base running etc etc etc assumptions. Could be a factor too.
Great. Looks like I’m doing something wrong with my setup. I wonder if I somehow screwed up the spreadsheet when fiddling around with it over the past few months.
I won’t be around much today, but I’ll start with the original spreadsheet and try to reconstruct my results with a fresh version. I don’t really see what else I could be doing wrong, but clearly there’s an issue. -j
Justin—over the week end I’ll also reconstruct your work and see what I get as well ...
Tom, one reason for my slightly greater rpg spread is that my pitcher is worse than yours. I was using a .140/.150/.160 equivalent pitcher ... what were you using?
If we agree on that then I’ll dig into my base running plays, outs on base etc etc etc to see if that has any effect.
I was using what the average pitcher was 1999-2002, which IIRC was a wOBA of .164 (which implies around a .170 OBP, .180 SLG I guess). I do remember that if it was 9 pitchers, then the runs per game was exactly 1.00 (as luck would have it).
I finally figured out the difference between my and Beamer’s data--it was a second case in which I just didn’t understand how to use his spreadsheet. My results now match his perfectly. So, at some point here, I’m going to go back and re-do the Baker study and see how things fall out.
-j
Should be done by the weekend…
What’s interesting, preliminarily, is that it’s remarkably hard to beat Dusty Baker’s lineup with his players. If you start swapping in Jay Bruce and Joey Votto, you can do it. But the Markov rates his lineup pretty favorably. It might not be “perfectly” optimized, but somehow the interactions between players seem to work well under the Markov, despite Corey Patterson in the leadoff slot.
-j