THE BOOK--Playing The Percentages In Baseball


Thursday, August 02, 2007

Did the Padres get rid of Bochy because they are smart and he is not?

By , 05:31 AM

Actually, I don’t know whether he quit or was fired.  I’ve been watching Giants games lately hoping to catch a glimpse of history.  In two days I’ve seen a bunch of egregious errors by Bochy.  That does not bode well for his reputation in my book.  Here is what happened in Tuesday’s and Wednesday’s games:


In Tuesday’s game with the Giants up 3-1 in the top of the 7th, Lowry was allowed to hit with runners on 1st and 2nd and 1 out, with a good pinch hitter, Sweeney, waiting in the wings.  Surely that can’t be right.  And as usually happens when LaRussa pulls that nonsense, Lowry was taken out in the 7th anyway (it’s not like he was going to pitch a complete game).

To make matters worse, add insult to injury, rub salt into the wound, or whatever you want to call it, he bunted.  You have to be an awful hitter (and a decent bunter), which Lowry is not, in order to bunt with runners on 1st and 2nd and one out.  Lowry looked like a poor bunter and promptly popped up the bunt for an out, with 2 strikes I think.

The next night (last night), with the Giants up 4-2 in the 8th, the first Dodger batter, Furcal, bunted for a hit.  Bochy then brought in Kline to face Pierre, a lefty.  Good move so far.  Pierre bunted for a hit and after Vizquel made an ill-advised bad throw to first, Furcal ended up on third.  So now we have runners on 1st and 3rd and no outs and a RHB coming up followed by a lefty (Gonzo).

Plus, everyone knew that Pierre would likely attempt to steal a base.  What should Bochy have done?  Probably bring in his closer, but we know he is not going to do that.  In all fairness, I don’t know that his closer (Hennessy) would have had much time to warm up.  Everything thus far happened quickly.

So what did Bochy do?  Bring in a crappy really tall (easy to steal on) RHP in Messenger who has a good ERA, lousy peripherals and is basically a replacement level reliever.  Pierre promptly stole second easily on the first pitch, Bochy left him in there to pitch against the LHB (Gonzo), and the rest is history.  The Giants and Messenger lost the game of course.

First of all, bringing in one of your worst relievers (although I am not sure he is perceived as such - probably not) in a high leverage situation like that is a joke.  Second of all, leaving Kline in there to pitch to the RHB and then to the lefty Gonzalez was an easy choice to make.  Kline would have essentially kept Pierre from stealing (or at least lowered his SB success rate enough to probably make it incorrect for him to steal), which was critical in that situation.  Not to mention the fact that Kline is an excellent pitcher and not too bad versus righties.

So what was Bochy thinking?  I have no idea.

Then again, I had just watched the Baltimore skipper intentionally walk the bases loaded with no outs in the bottom of the 7th inning in the Boston game only to see that runner score, Baltimore blow the lead and the game. (In case you are wondering, it is almost never correct to issue an IBB with no outs.)

It is incredible to me how stupid managers can be and how many extra games in win expectancy can be garnered over the course of a season just by using some semblance of an optimal strategy with regard to these types of things.  Give me one day with a manager (assuming he is taking notes) and I can hand over 10-15 million dollars in “FA money” to just about any team, with my eyes closed…

#1          (see all posts) 2007/08/02 (Thu) @ 10:30

Just curious ... how many extra wins per year could a manager get his team by using optimal strategy?  And, in practice, given the managers in MLB at the moment, what is the difference in added wins per year between the best and worst?

A guy I know who follows NFL closely told me that he thought he could add about 1.5 wins per year to a typical team if he had the authority to over-ride any in-game strategy decisions.  At first, I thought that number sounded way too high, but he made a pretty good case that a lot of wins are given away by stuff like not being aggressive on fourth down, bad clock management, kicking too many field goals and the like.

I tend to think the biggest area for improvement with MLB would be situations like the first example you gave ... letting pitchers hit in NL games when they are about an inning away from being yanked anyway.


#2    Fargo      (see all posts) 2007/08/02 (Thu) @ 10:31

"That does not bode well for his reputation in my book.”

You should write a book about it.

Seriously, has anyone done a thorough analysis of the net gains-losses from trades? How many of these seemingly valuable properties are lemons (or go sour)? How often is there an essentially good trade, i.e., an even trade for value—even if one team is seeking to exploit immediate value and another is seeking to benefit from that value in the future?


#3    MGL      (see all posts) 2007/08/02 (Thu) @ 14:26

DFL, I really don’t know.  Wild guess is 3 wins.  In The Book, we lay out a strategy whereby when your 4th and 5th starters pitch, they never bat, and they actually relieve each other (they both pitch on the 4th and 5th days).  That is an extremely powerful strategy.  I would add the nuance that you only yank them in a non-low leverage situation.  The 3 wins does NOT include this strategy.

As far as trades, the tricky part of evaluating trades is that they include player contracts (duh), which people seem to forget (you don’t trade players - you trade their contracts).  If you trade Betancourt for Derek Jeter people will say, wow, that was the dumbest trade in the history of baseball (for the Yankees).  Actually that would be an awesome trade for the Yankees (actually let’s assume it was some team other than the Yankees who had Jeter).  You get rid of Jeter and his 19 mil a year and get a protected decent player.  Jeter is grossly overpaid according to his marginal win value, and any team that has him should be happy to get rid of his 3.5 wins above replacement, worth 14 mil on the open FA market, and pocket his 19 mil salary, netting them 5 mil.  IOW, he should be DFA’d, placed on waivers (for real), or traded for just about anything.  He is a 14 mil house with a 19 mil mortgage.
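
The house-and-mortgage arithmetic can be written out in a few lines. This is only a sketch using the three figures quoted in the comment; the per-win price is derived from them here and is not something the comment states directly.

```python
# Figures quoted in the comment above, in millions of dollars.
wins_above_replacement = 3.5   # Jeter's estimated wins above replacement
market_value_musd = 14.0       # what those wins would cost on the open FA market
salary_musd = 19.0             # what the contract actually pays per year

# Derived here (not quoted in the comment): implied market price of one win.
price_per_win_musd = market_value_musd / wins_above_replacement

# Negative surplus means the contract costs more than the production is worth.
surplus_musd = market_value_musd - salary_musd

print(f"Implied price per win: ${price_per_win_musd:.1f}M")
print(f"Contract surplus: ${surplus_musd:+.1f}M per year")
```

A team that sheds the contract for a roughly replacement-level return comes out about 5 mil ahead per year under these numbers, which is the point of the analogy.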


#4    MGL      (see all posts) 2007/08/02 (Thu) @ 15:28

I also forgot that there was another lefty in the pen, Taschner, who just started to warm up AFTER Messenger pitched to Gonzo.  An alternative strategy for Bochy was to have Messenger pitch to the RHB and then Taschner to the lefty.  Even then, he would still give Pierre a good chance to steal by bringing in the righty Messenger rather than leaving Kline out there.  Just bad managing no matter how you look at it.


#5    Jerome      (see all posts) 2007/08/02 (Thu) @ 18:00

mgl,

Re: the Orioles in the bottom of the 7th in a 3-3 game.  I’m pretty sure there were two outs, Coco Crisp on first and Erik Hinske, a left-handed batter at the plate.  A right-hander (Rob Bell) was on the mound.  Crisp steals second on the second pitch (both balls).  After a third ball, the Os intentionally walk the left-handed Hinske with first base open to bring the right-handed Doug Mirabelli up.  Any qualms with this scenario?


#6    Rally      (see all posts) 2007/08/02 (Thu) @ 21:58

They increased the Red Sox chance of winning from .564 to .575, but considering the Orioles already had 3 balls on Hinske, and would gain the platoon advantage, it’s probably a defensible move that just worked out badly.

The night before the O’s walked the bases loaded when they were down 5-3, 2nd and 3rd, one out.  That moved the Red Sox from .856 to .905.  Bad move, but in that case they got lucky and the next 2 hitters made outs, failing to score the runs.

In general the Oriole managing has been horrible this year, with Trembley slightly less dumb than Perlozzo was.


#7    MGL      (see all posts) 2007/08/02 (Thu) @ 22:18

In the 7th with no outs and runners on 2 and 3, they IBB’d Manny with a lefty (Parrish) on the mound and then brought in a RHP (I think Bradford - I can’t get gameday to work right now) to pitch to Youk.

If you let Manny hit, the Sox win 89.3% of the time in 100,000 games on the sim.  If you walk Manny, the Sox win 90.2%.  So the IBB costs around 1% in wp.  Not that big a deal, but definitely the wrong move.

That is not to mention the fact that Youk is a real good low ball hitter and that is all Bradford throws, low sinkers.


#8    Pizza Cutter      (see all posts) 2007/08/03 (Fri) @ 11:47

One clarifying note: MGL wasn’t sure of whether lefties really are better at holding runners.  They are.  I went back into my old “Throwing to First” database and sure enough, they save more runs on average than do righties through their pickoff moves.  Over the course of a season, an average lefty is worth a run more than the average righty in that department.  Also, SB success rates off lefties are about 10 percentage points lower than off righties (63% versus 73%).


#9    Fargo      (see all posts) 2007/08/03 (Fri) @ 12:20

MGL: Thanks for reminding me that trades were for contracts and not players, and thus the valuation of such trades has to take into account more than just the future performance of the players on each side of the trade. (There are, of course, other arcane rules that may lead to such trades, e.g., to the Tigers’ trade of Ledezma to Atlanta earlier this year.)

Pizza Cutter: What’s the average number of pick-off throws that pitchers make?  Other than the distraction factor, how much might a baserunner’s ability to attract pick-off attempts or “throw-overs” wear down a pitcher?


#10    Pizza Cutter      (see all posts) 2007/08/03 (Fri) @ 13:13

Fargo/9, the number of pickoff throws varies with the speed of the runner.  Pitchers keep a closer eye on gents with some wheels.  I don’t have the info handy on the average number of throws over, but it wouldn’t be informative outside of knowing some of the other contextual factors that go into it. 

As to whether pickoff attempts tire the pitcher out, I haven’t studied that.  Yet.


#11    Math Guy      (see all posts) 2007/08/03 (Fri) @ 13:48

If you let Manny hit, the Sox win 89.3% of the time in 100,000 games on the sim.  If you walk Manny, the Sox win 90.2%.  So the IBB costs around 1% in wp.  Not that big a deal, but definitely the wrong move.

You’ve got to be kidding me here.

You think your simulation has an accuracy of better than 1% error?  Even assuming your simulation is that accurate, if we replay this game 100 times, the Red Sox win one additional time.

I would have a hard time saying with a straight face that this was “definitely the wrong move”. 

I would think that any move that doesn’t have a negative swing of more than 10% in Win Probability is likely defensible given the manager’s greater knowledge of their personnel and the opposition’s, availability and health, etc. 

The Win Probability differences you are citing are laughably small.  There is no way your sim can be accurate enough to make these small differences meaningful given the level of error inherent in a simulation.


#12    Pizza Cutter      (see all posts) 2007/08/03 (Fri) @ 14:31

Math Guy/11, Your point about the accuracy of the simulator is well-taken, and I suppose the error term could be high enough as to render the change in WPA as non-significant.  I’ve never examined MGL’s simulator, but I’m guessing it’s a Monte Carlo type model.  A well programmed MC model is generally going to give some good results and with enough parameters, you can drive that error term down. 

I’m curious as to how something that demonstrably decreases a team’s chances of winning can be anything other than a wrong move.  Yes it’s “only” a difference of one game in 100 repetitions, but tell that to the teams that have missed the playoffs by one game in 162.


#13    Math Guy      (see all posts) 2007/08/03 (Fri) @ 14:47

The problem is that it is not a “demonstrable” effect.  A simulation does not mean it is a real effect.  It may mean that it is more likely than not to be a better move.  I’m mostly just reacting to the absolute certainty mgl has when denigrating these moves.

I’ve got some experience with Monte Carlo simulations and there is no way you can simulate something as complicated as a baseball game to that level of accuracy.  It just isn’t possible.  The actual swing in WP (which is incomputable, the simulation just estimates that parameter) for the two moves could be anywhere and a 95% confidence interval around that parameter is going to be much, much bigger than 1%.  I’d guess something like 10 or 15%.  You just can’t know things to that level of certainty.

The 1 in 100 argument really only carries weight if this is a situation that Trembley undertakes every night or multiple times a game.  In this one case (assuming the simulation is perfectly accurate), Trembley cost his team 1/100th of a win.  Should we really run down a manager for a move that might have cost his team 1/100th of a win?


#14    Xeifrank      (see all posts) 2007/08/03 (Fri) @ 16:19

Where is this sim, who wrote it, and where can we read about it?  Thanks!
vr, Xei


#15    David Gassko      (see all posts) 2007/08/03 (Fri) @ 16:52

Math Guy,

10 or 15 percent? Now THAT’s laughable. 1% is the difference between Albert Pujols and an average player, which is obviously pretty big.


#16    Anthony      (see all posts) 2007/08/03 (Fri) @ 16:54

In The Book, we lay out a strategy whereby when your 4th and 5th starters pitch, they never bat, and they actually relieve each other (they both pitch on the 4th and 5th days).  That is an extremely powerful strategy.  I would add the nuance that you only yank them in a non-low leverage situation.

If I remember correctly, that didn’t take into account the pinch-hitting penalty. How does that affect this strategy?


#17    Rally      (see all posts) 2007/08/03 (Fri) @ 18:03

One of these days I’ll try a similar strategy.  I’m in a DH league, so it won’t quite work, but perhaps the old LaRussa strategy of starting three pitchers, having them go about 3 innings each, and loading up on the bullpen, as good ERAs from relievers are easier and cheaper than from starters.

But right now I’ve got future Hall of Famer David Lefevre in my rotation, and he’s pretty outspoken and doesn’t like gimmicks like that.  He hated it when I tried a 4 man rotation a few years ago, and I gave in.  He’s 37 though and won’t be around forever.

Yes, it’s an advanced sim that includes player personalities.


#18    MGL      (see all posts) 2007/08/03 (Fri) @ 18:35

I don’t know what kind of “error” you are talking about, Math Guy, but one standard error in 100,000 games is .16%.
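
The .16% figure is easy to check. A minimal sketch, assuming each simulated game is an independent win/loss trial (nothing is known here about how the sim is actually built): the standard error of a simulated win rate is sqrt(p(1-p)/n), which is largest at p = 0.5. Note that this measures sampling error only. It says nothing about error in the model or its inputs, which is the part Math Guy is questioning.

```python
import math

def binomial_se(p: float, n: int) -> float:
    """Standard error of a win rate estimated from n independent simulated games."""
    return math.sqrt(p * (1.0 - p) / n)

n_games = 100_000

# Worst case (p = 0.5) reproduces the roughly .16% quoted above.
worst_case = binomial_se(0.5, n_games)

# Sampling error at the two simulated Red Sox win rates.
se_pitch = binomial_se(0.893, n_games)   # pitch to Manny
se_walk = binomial_se(0.902, n_games)    # walk Manny

# Standard error of the gap, assuming the two runs are independent.
se_diff = math.sqrt(se_pitch ** 2 + se_walk ** 2)
gap_in_se = (0.902 - 0.893) / se_diff

print(f"worst-case SE: {worst_case:.2%}")
print(f"SE of the gap: {se_diff:.2%}; gap is {gap_in_se:.1f} standard errors wide")
```

Under these assumptions the 0.9-point gap is several standard errors wide, so it is unlikely to be a fluke of the 100,000-game sample. Whether the inputs themselves are good enough to trust a gap that size is a separate question.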

My sim is extremely complex, it is not available publicly, and it does a fine job of estimating chances of a team winning given various alternatives.  Modeling Manny against a certain pitcher with runners on 2 and 3 versus bases loaded and Youk up is not all that difficult to do.  A manager (or Stephen Hawking) with all his “knowledge” has absolutely zero shot of knowing with any degree of certainty which is better.  IOW, there is plenty of uncertainty associated with the 89% and the 90% that the sim comes up with, other than the sample error associated with playing out only 100,000 games, but that uncertainty is symmetrical and does NOT get trumped by a manager who is taking an absolute wild guess.  As I said, if anything, the manager should have considered that Youk is an excellent low ball hitter.

When I say that something is “clearly the right or wrong” strategy, that is NOT to be confused with saying it is incontrovertibly, without doubt or any uncertainty, the correct strategy in the real world.  It means that “according to the numbers” (the parameters and conditions that we put into the model and the model itself, be it a Markov sim or Monte Carlo sim), the gap is large enough that the chance that we made a sample error and came up with the wrong optimal strategy alternative is very small.

The question is often whether the manager knows something that we don’t that would change the model and/or the parameters.  Many anti-sabermetric people or people that just like to argue with analysts think that they often do.  That is crap.  Trembley knows nothing more about Manny, his pitcher, Youk, or the wp of the game with and without the walk than we do.  To think that he does is presumptuous and naive.  I’m sure you (MG) think that I am being presumptuous and naive.

I’ll take my optimal strategy as dictated by my sim (or a good Markov model) for a season in REAL LIFE versus that of any manager in baseball for all my money and then some.

1% is in fact a decent number for one strategy decision.  Not a lot, but clearly in the “clearly wrong” (according to my above definition) category for anyone that is interested and NOT just interested in arguing the point.  As always, I could be wrong, but I am not interested in arguing the merits of a/my sim in terms of evaluating strategies with Math Guy (I think you need a different moniker, BTW) or anyone else.  No matter what I say, you (MG) are not going to be convinced that using a sim for this kind of analysis is worthwhile, while I think it is the lifeblood of figuring out proper in-game strategy in REAL LIFE.


#19    MGL      (see all posts) 2007/08/03 (Fri) @ 18:52

I don’t think that the Book considered the pinch hit penalty, but it is still a good strategy, plus part of the pinch hitting penalty I think is that they have not faced the pitcher yet.  In this case, this is the first time that the #9 batter would come up to bat anyway.

Again, the reason that managers let pitchers bat too often in the 5th, 6th, and sometimes 7th innings in close games is three-fold.  One, although they obviously know that a pinch hitter is better, they have no idea what the difference in wp is as compared to bringing in a reliever for an extra inning or two (sometimes less, like when they take out the pitcher anyway in the next inning).  Two, they think that if a starting pitcher is pitching well, he will continue to pitch well, even if he is an average or worse pitcher, which generally is not true.  Three, they want to save their bullpen, which is understandable; however, sometimes the difference in wp between a pitcher hitting and a pinch hitter is so large that saving your bullpen cannot possibly make up for it…


#20    Math Guy      (see all posts) 2007/08/03 (Fri) @ 22:02

I’ll take my optimal strategy as dictated by my sim (or a good Markov model) for a season in REAL LIFE versus that of any manager in baseball for all my money and then some.

An easy bet to offer when you’ll never have to pay.

The fact is that you can not possibly know the accuracy of your sim to that level of detail in this setting.  It is unknowable.  The difference between an 89.3% or 90.2% chance of winning is probably about the same as the difference between two hitters--one hitting .321 and one hitting .330--getting a hit in a particular at bat.  As an analyst are you going to argue that the .321 hitter should “definitely” be lifted for the .330 hitter?

If your sim says walking Manny in that situation leads to a 90.2% chance of winning, you’ve computed a statistic that is an estimate of the parameter (the actual, unknowable value).  What do you think is the 95% confidence interval for the parameter’s value using your statistic?

My dispute is really in this area.  How big of a gap is meaningful with regards to your sim?


#21    MGL      (see all posts) 2007/08/03 (Fri) @ 23:56

The thing is that no matter what the confidence intervals of those estimates are, based on the model and the parameters themselves, rather than sample error, if one mean is 89 and the other is 90, then, yes, the one that leads to 89 is the “clearly correct” decision, assuming that the sim has some idea as to what it is doing.  And I guarantee you that it does.  As I also said, it is NOT that hard to model these things.  It really isn’t.  All the, “Well you don’t know the personalities involved, etc.,” is just rhetoric and is not going to change the fact that we CAN come up with reasonable estimates of a win percentage in baseball given the pitcher/batter combo and knowledge only of the rest of the lineups.  The fact that we don’t know everything and that we are only estimating the actual parameters does NOT change the fact that we can come up with optimal strategies (we are not right 100% of the time, but we don’t have to be) by doing these kinds of analyses.

And for the record, although I would not bet “all my money” on anything, when I propose a wager for a lot of money, I am quite serious, even though there really is no wager to be made in this case of course! ;)

But, we are talking past each other. If you want to have the last word, that’s OK by me. I’ll bow out though.


#22          (see all posts) 2007/08/04 (Sat) @ 00:34

Speaking of inept management, how about Terry Francona this evening? Tie game, bottom of the sixth in Seattle. The leverage for this situation is 1.3, which calls for one of the Red Sox’ three excellent relievers. Instead, in comes Mike Timlin, with the fifth best peripherals on the team. Certainly, after the leadoff man reaches, raising the leverage to 2.0, a top reliever would come in. Nope, more Timlin. The Ms proceed to take the lead.

This is why I vehemently oppose the move to acquire Gagne. Given Francona’s frequently inept and antiquated management of his bullpen, Gagne is worth three to five runs at best over the course of the season to the Sox. Even with a more efficient use of resources, the value of a relief pitcher to a team with an already effective bullpen over the course of two months is small.


#23          (see all posts) 2007/08/04 (Sat) @ 01:05

By the way, I have to disagree with MGL’s classification of Kline as an excellent reliever. He has an 11/13 K/BB ratio, which translates to a 5.17 xFIP.


#24    MGL      (see all posts) 2007/08/04 (Sat) @ 03:47

Phil, one “magic” season does not a reliever make.  A reliever’s (any player’s) classification for the purposes of deciding whom to use when is and should be based on the best projection we can muster for that player.  That has been determined to be some version of a Marcel, which is a pitcher’s historical context neutral ERC weighted and age adjusted.  So far this year, Kline’s NERC (normalized context-adjusted component ERA) is 3.73, not great for a reliever, but not bad either.  I did not realize that he had gotten a lot worse over the last couple of years.  I just remember when he was excellent with the Cards.  In 05, he was a 4.36 and in 06, 4.39, pretty bad for a reliever.  So I amend my statements about Kline to say that he is probably around average or a little worse now as a reliever and probably only useful versus LHB.  That probably does not change the analysis though, as it was either a RHP facing 2 RHB and a LHB or a LHP facing 2 LHB and a RHB with the added benefit of Kline helping to prevent the stolen base.

Timlin is still an excellent reliever.  Phil, you are making the same (egregious) mistakes that managers and GMs make.  One is judging a player by the magical time period of exactly one season, the most recent one, and two, using garbage or pseudo-garbage stats.  Timlin’s NERC this year is 2.86.  The last 4 years it was 2.30, 3.23, 2.68, and 3.55.  That’s closer quality baby!  Yes, his K rate is down this year and last, but again, the best predictor of K rate, like anything else, is all of a pitcher’s historical stats, weighted and age adjusted, unless there is some injury or something like that to advise us otherwise.  As Timlin appeared to be still throwing in the mid to upper 90’s (96 last night) with a good slider, I see nothing to indicate that we should think that his last 426 TBF K rate is here to stay.


#25    Math Guy      (see all posts) 2007/08/04 (Sat) @ 09:16

How big of a gap is meaningful with regards to your sim?

We know how to do political polls really well also, but that doesn’t mean that a lead of 50.4% to 49.6% is significant.  Any statistician will tell you that they wouldn’t know who in fact was the true leader.
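
The poll analogy can be made concrete. A rough sketch, assuming a simple random sample and a poll of 1,000 respondents (the sample size is hypothetical; the comment gives only the two shares):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a polled share."""
    return z * math.sqrt(p * (1.0 - p) / n)

sample_size = 1_000          # hypothetical poll size, not from the comment
leader_share = 0.504
trailer_share = 0.496
lead = leader_share - trailer_share

moe = margin_of_error(leader_share, sample_size)

print(f"margin of error on the leader's share: +/-{moe:.1%}")
print(f"lead: {lead:.1%}")
```

At that sample size the margin on a single candidate's share is already several times the 0.8-point lead, and the margin on the lead itself is wider still. The comparison with a sim turns on how many trials there are and on how good the inputs are, which is exactly what the two sides disagree about.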

We aren’t talking past each other.  You are ascribing accuracy to your sim that is impossible to have no matter how skilled the sim maker in any endeavor similar to this.  If we were modeling roulette or black jack or chess or craps we could make these assertions but for a situation where the probabilities that the sim is based on are, in fact, themselves estimates of the true parameters, you just can’t know with that level of certainty.

I’m not saying your sim is faulty or poorly written, and that I have a better one, I’m saying even the greatest sim ever created could not know whether 90.2% and 89.3% is a significant difference.


#26    Phil D.      (see all posts) 2007/08/04 (Sat) @ 12:57

MGL,
But Timlin is 41.5 years of age. I’ll defer to you on this one, but shouldn’t we weight the more recent performance of a 41 year old much more heavily than that of a 31 year old? I know a projection system like PECOTA follows this rule. Over his last 98 innings, Timlin has a K rate of 4.2 and a GB rate south of 40%. If you have research contradicting this, I’ll gladly defer to you, but I’m just asking. And there was a very recent minor shoulder injury to Timlin that shut him down for a few days. It’s minor but it’s something.
But even if I concede those points to you, MGL, can you really disagree with my main point that Timlin is only the fifth best pitcher in the Red Sox bullpen? Even if Timlin’s numbers are quite good, they still lag behind those of Papelbon, Okajima, Gagne, and perhaps Delcarmen, no?


#27    MGL      (see all posts) 2007/08/04 (Sat) @ 16:07

I don’t weight any differently depending on age.  The age adjustments somewhat take care of that.  I don’t really know if you should or shouldn’t, but it is going to be a minor difference anyway.

As I said, Timlin is just fine for a 1.3 LI in the 6th inning. The ONLY pitcher in the pen who is clearly better is Papelbon and I don’t think it is correct to bring your best pitcher in in the 6th with a LI of 1.3.

Of course, at one time Gagne was one of the best relievers in baseball.  I read that a scout said that he has “average stuff” now.  Okajima is somewhat of an unproven commodity in my mind at least, as I don’t do Japanese MLE’s.  He appears to be a good one though.  He is still a lefty which means that you would tend to bring him in versus a lefty batter or two although he does not appear to be the type of lefty that would have large true splits.

The biggest “trap” in baseball is evaluating players based on the “magic current season performance.” If you or anyone else wants to do any serious analysis it MUST start with evaluating players based on a Marcel.  All “mistakes” in player and team evaluation, commentary, and analysis start and end with this “trap.” That is especially true with bullpens because each individual pitcher has relatively few TBF at any point in the season so that a decent percentage will have that “magic one season performance” far above or below their true talent level.  Saying that a pitcher or a bullpen IS good (with the implication that they will pitch similarly in the future) or bad is A LOT different from saying that they HAVE pitched well or not, regardless of how good they really ARE (and will likely pitch in the future).

Biggest trap in baseball among casual analysts, fans, commentators, managers, GM’s, players, etc.  I can’t emphasize that enough.  No one wanted to believe (it is human nature and understandable) after the first month or two of the season that Zambrano is still a great pitcher and that Marquis still sucks.  Everyone was looking for a reason why they suddenly changed their true talent level.  It doesn’t work that way by and large.  You can’t get around random flucs, and sometimes large ones, when it comes to player performance.


#28          (see all posts) 2007/08/04 (Sat) @ 17:55

MGL,
Firstly, let me say that I am enjoying the dialogue and getting quite a bit out of it. I hope you’ll indulge me one last time with this:

Here are the Marcel projected ERAs for the Sox relievers (PECOTA for Oki), their year-to-date QuikERAs (a DIPS like formula created by Nate Silver with GB, K and BB rates as inputs), and an amalgamation of the two. I used .22 as the weight for this season (2/3 [portion of this season played] divided by three [for the three years in the Marcels]). I have no idea as to whether my intuition behind it is sound, but I suspect that .22 would be close to whatever the “perfect” weight is. The projections largely match up closely with the QERAs, so the weight is somewhat trivial anyhow. For what it’s worth, PECOTA had a much more optimistic view of Gagne coming into the season while largely agreeing with Marcel on the others.

Pitcher: projected ERA, year-to-date QuikERA, blend
Papelbon: 3.16, 2.45, 3.00
Okajima: 4.31, 3.24, 4.07
Gagne: 4.00, 3.94, 3.98
Timlin*: 4.43, 4.78, 4.51
Delcarmen: 4.50, 3.40, 4.26

*Your component ERA for Timlin came out very differently.
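
The blend described above is a plain weighted average, with .22 on this season's QuikERA and .78 on the preseason projection. A short sketch that recomputes the third column from the first two; the function and variable names are illustrative only.

```python
# (preseason projection, year-to-date QuikERA) as listed in the comment
relievers = {
    "Papelbon": (3.16, 2.45),
    "Okajima": (4.31, 3.24),
    "Gagne": (4.00, 3.94),
    "Timlin": (4.43, 4.78),
    "Delcarmen": (4.50, 3.40),
}

SEASON_WEIGHT = 0.22  # the commenter's weight on this season's QuikERA

def blend(projection: float, qera: float, w: float = SEASON_WEIGHT) -> float:
    """Weighted average of a preseason projection and a year-to-date figure."""
    return (1.0 - w) * projection + w * qera

# Print relievers from best (lowest) to worst blended ERA.
for name, (proj, qera) in sorted(relievers.items(), key=lambda kv: blend(*kv[1])):
    print(f"{name:10s} {blend(proj, qera):.2f}")
```

Recomputed this way, Gagne comes out a hundredth higher than the figure listed, most likely because the listed inputs are already rounded. The ordering is unchanged: Papelbon, Gagne, Okajima, Delcarmen, Timlin.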

I’m definitely coming around to your point of the difference between Timlin and the non-Papelbon relievers being small. Yet I’ll reluctantly stand by my initial contention that A) When tied going into the bottom of the sixth, a manager should plan on using his three best relievers to get the last 12 outs and B) Timlin, while still good, is not one of the Sox’s three best guys right now. Again, thanks for reading and responding.


#29    MGL      (see all posts) 2007/08/04 (Sat) @ 20:42

I have a lot of respect for PECOTA projections.  I am not sure that even they have a whole lot of confidence in their Okajima projection and I think a projection for Gagne means almost nothing after his surgery.  I don’t know why their projection for Timlin is so poor.  And you can’t really compare Okajima to RHP’s.  RHP and LHP must be used differently.  Obviously you want a pitcher to face as many same side batters as possible, depending upon the true platoon ratio of the pitcher of course.  But your point is well taken and I will agree with you that the order of quality of those particular relievers, other than Paps, is not all that clear. And I am not sure that 1.3 is all that high a leverage situation to worry too much about your reliever, although I do agree that you DON’T want to use one of your poorer relievers in that situation. They should only be used in very low leverage situations, less than 1.0, other than in an emergency of course.


#30    tangotiger      (see all posts) 2007/08/04 (Sat) @ 23:53

To get some perspective on win expectancy numbers:
a top hitter is worth around +6 wins per 600 PA compared to an average hitter; that’s +.010 wins per PA.  As hard as it is to believe, a .010 win difference is ENORMOUS.  If you leverage that, say if the Leverage Index (LI) of a particular situation was 3.0, then the top hitter in the league would have a .030 win difference.

So, in an average situation, if a team has a .630 chance of winning with an average hitter, they’ll have a .640 chance of winning with a top hitter (one PA only).  If it was an LI of 3.0, then they’d have a .660 chance of winning (again, one PA only).

A top hitter would have 4.5 PA, and so, for a game, he’d be worth, on average +.045 wins.

***

A top pitcher is essentially a .650-.700 pitcher, meaning he’s +.180 wins above average, or +.020 wins per inning.

***

Those are the contexts to remember when dealing with win expectancy.
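
The arithmetic in this post can be checked with a short Python sketch. The function names are illustrative and not from the post. The .680 win percentage is an assumption, picked from inside the quoted .650-.700 range because it reproduces the +.180 figure, and the nine-inning game is likewise assumed.

```python
# Back-of-envelope win expectancy arithmetic using the post's round numbers.
# All names are illustrative; nothing here comes from an actual WE table.

def wins_per_pa(wins_above_average: float, plate_appearances: int) -> float:
    """Wins added per plate appearance relative to an average hitter."""
    return wins_above_average / plate_appearances

def leveraged(win_delta: float, leverage_index: float) -> float:
    """Scale a per-PA win difference by the Leverage Index of the situation."""
    return win_delta * leverage_index

per_pa = wins_per_pa(6.0, 600)           # top hitter: +6 wins per 600 PA
high_leverage = leveraged(per_pa, 3.0)   # the same PA at LI = 3.0
per_game = per_pa * 4.5                  # about 4.5 PA per game

# Assumed: a .680 pitcher (inside the quoted .650-.700 range), 9 innings.
pitcher_per_game = 0.680 - 0.500
pitcher_per_inning = pitcher_per_game / 9

print(f"hitter, per PA:         {per_pa:+.3f} wins")
print(f"hitter, per PA at LI 3: {high_leverage:+.3f} wins")
print(f"hitter, per game:       {per_game:+.3f} wins")
print(f"pitcher, per inning:    {pitcher_per_inning:+.3f} wins")
```

The point of laying it out this way is that every figure in the post is the same +.010 per PA, scaled by either leverage or playing time.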


#31    MGL      (see all posts) 2007/08/05 (Sun) @ 02:21

Seriously, where do managers come up with this crap?  Last night in the HOU/FLO game, HOU walked Cabrera with 2 outs and no one on to pitch to Jacobs.  I don’t think I’ve ever seen or heard of that other than with Bonds.  Of course that is incorrect (my sim says 66.5% to 65.7%, again not a huge difference).

Justice prevailed as two wild pitches later, Florida won the game as Cabrera scored.


#32    Math Guy      (see all posts) 2007/08/05 (Sun) @ 13:57

As hard as it is to believe, a .010 win difference is ENORMOUS.  If you leverage that, say if the Leverage Index (LI) of a particular situation was 3.0, then the top hitter in the league would have a .030 win difference.

That is not an enormous difference.  The fact is that in isolation there is little expected difference between Pujols and Eckstein batting in a single PA.  Over the course of the season these events build up drop by drop until you have large differences. 

In criticizing the managers, you are picking out the isolated cases where the probabilities are not in their favor, but are ignoring the hundreds of situations where the probabilities are in their favor and then extrapolating them as if these events are happening multiple times a game.  As you yourself point out, Garner’s move cost his team less than 1/100th of a win.  The two wild pitches cost them .335 wins.  In that context the walk was not even close to the deciding factor; the wild pitches were more than 41 times more important than the decision to walk the batter.

If anything you should be criticizing him for using a catcher incapable of catching the ball.


#33    John Beamer      (see all posts) 2007/08/05 (Sun) @ 14:09

Math Guy,

Why wouldn’t you do everything you possibly could to make your team win? There may be some good reasons why Garner did what he did ... I can’t think of many, but if you add all these “sub-optimal” decisions up then as you correctly point out they add up to a lot.

If you were at the casino you’d always play the odds because that is where the money is. It is no different in baseball.


#34    john      (see all posts) 2007/08/05 (Sun) @ 14:40

To me, it doesn’t matter whether the difference is big or small... why do something that’s obviously incorrect?


#35    Math Guy      (see all posts) 2007/08/05 (Sun) @ 16:41

The problem with the casino comparison is that these probabilities are not exact like they are in casinos.  This is a simulation using ESTIMATED probabilities.  Due to the level of error inherent in the estimate you simply cannot say that a 1% difference is significant. 

Law & Kelton is a very good reference for estimating error in these types of simulations.  People are acting like these simulations are the same as modeling coin flipping or dice rolling.  Those are exactly known probabilities; a player’s true BA is not an exactly known probability, and it especially isn’t known when looking at a particular defense or pitcher.

why do something that’s obviously incorrect?

Because it is not obvious.  If a national poll came out the day before the election saying that John Kerry was leading George Bush 50.4% to 49.6% in Ohio would you say, “John Kerry is obviously going to win Ohio.”? 

No, of course not, because you understand there are errors in sampling.  There are similar errors in doing a simulation of this type.  Estimating win probabilities using ESTIMATES of players’ true ability levels (which are unknowable) leads to errors.  People seem to believe that these WP estimates are exact to three decimal places.  They are not. 

When mgl gives a figure of .16% as a standard error all that means is that doing 100,000 simulations gives you a value within .16% of what doing an infinite number of simulations would (the true simulation value).  It does not mean that you are within 0.16% of the value that you would get if you played the game 100,000 times.

I’m not saying these simulations are useless.  I’m just saying that they don’t tell you much when looking at a swing of 1% or 2%.


#36    David Gassko      (see all posts) 2007/08/05 (Sun) @ 16:45

Math Guy,

Are you saying we can’t tell the difference between Albert Pujols and David Eckstein? If we can, then a good simulation can be accurate to within 1%.


#37    anon      (see all posts) 2007/08/05 (Sun) @ 19:28

MG, there are two reasons the simulator could be off, statisticians call them variance and bias. 

Variance is due to sampling variability and is simple to account for.  A typical pre-election poll contacts about 1000 voters, which gives a standard deviation of around 1.6%—large enough that a difference of 1% is not very meaningful.  The simulator was run 100,000 times, which drives the standard deviation down by a factor of 10.  With this much data, a 1% difference is easily detected and statistically significant. 

The other place the models can be wrong is by being fundamentally skewed or biased.  Even if the models are substantially off (which I doubt), the same bias will likely contaminate the simulations for each alternative in almost exactly the same way.  That is, if one of the estimates is exactly 5% too high, the other estimate is also likely to be very close to 5% too high.  When you look at the differences between win probabilities, the biases should cancel almost exactly.


#38    Math Guy      (see all posts) 2007/08/05 (Sun) @ 21:55

anon,

I agree with you on the variance issue.  It is the bias issue that I am concerned about.

I have a tough time believing that in a model as complicated as this one would be--pitching, 8 fielders, offensive ability, baserunning, etc.--that you can assume that the bias is going to be all in the same direction for the number of events necessary to model a game from this point forward. 

I don’t want to come across as saying that simulation is worthless, because it’s not.  I’m saying that when you say a difference of 1% in win probability implies a move is “certainly wrong”, I think you are stretching the tool’s applicability.

David G,

You are comparing apples and oranges.  One is comparing a pair of batters based on actual events that occurred.  The other is a repeated simulation of multiple events.


#39    MGL      (see all posts) 2007/08/05 (Sun) @ 23:16

I am repeating myself, but it is not that hard to model most baseball situations.  One reason is that you don’t really need to model the “rest of the game.” I mean how tough is it to figure out if Jacobs with a runner on first (Cabrera) is better or worse than Cabrera with no runners on, assuming that you have a decent estimate for what each batter will do, on the average, against that pitcher?  The reason (it is easy to model) is that everything after the Jacobs AB is pretty much the same for both alternatives.  So it is not important to be able to model an actual “game” with 8 fielders, multiple pitching changes, pinch hitters, etc.  That does not come into play in terms of evaluating the two alternatives.  In fact, for a situation like that, a Markov model is probably better anyway.

Anon in #37 is right on the money in terms of the variance, bias, etc.  There is of course a point at which if the model is so bad, any small difference (or any difference if the model is almost worthless) becomes meaningless.  But, as I said, that is rarely the case, because these kinds of situations are NOT hard to model.


#40    Math Guy      (see all posts) 2007/08/06 (Mon) @ 00:06

MGL,

Does your model consider the type of pitcher each batter faces?  Does it consider the fielders on the field at that time?  For instance, Brandon Webb’s success is going to vary depending on whether a player like Mark Loretta or Adam Everett is playing short for him, but it would matter much less for Sid Fernandez, and some of that is going to depend on the batter at the plate and what kind of hitter they are and what kind of pitches they like. 

Does the sim take any of this into account?

Does it assume that all runners can score from first on a double with the same regularity even if all of the outfielders have different arm strengths and some are more likely to cut the ball off than others?  And some are more likely to have the ball hit to them than others?

Does the sim care that a manager might pinch run Amezaga for some batters who are walked and might not for others?

Each of these factors adds to the uncertainty of the model.

For example, imagine the Marlins had a tremendous defensive outfield.  In that case putting the slow Cabrera on with two outs might be the better play.  Jacobs may be less likely to end the game on one swing than Cabrera is. 

Does the sim consider that the Marlins manager might sub a speedy pinch runner for the lumbering Cabrera if the walk is made?  Garner didn’t know for certain who would be the eventual runner on first when he made his decision to walk Cabrera.  That would have to affect the win probabilities wouldn’t it?

I find it hard to believe that the difference between Cabrera and even a moderately fast runner on first in such a situation wouldn’t be worth at least 2-3% in win probability in such a simulation.  To pick extreme examples, imagine a double by Jacobs.  Wouldn’t it matter a great deal whether Hanley Ramirez or Miguel Cabrera was on first base when that double was hit as to whether that double won the game?  If you put them both on first for 100,000 doubles, Ramirez has to score on a lot more of them, right?

Your sims are likely based on average baserunners and average fielders, right?  None of those apply in this particular game situation.  And even if you did try to model fielding and baserunning ability, you would never be able to get them completely accurate.

This is what I’m talking about. I’m saying that no matter how sophisticated the simulation is there are too many unknowns or factors difficult to estimate to model it as accurately as you are claiming.

Now I’m certainly not claiming that the managers are inherently better at computing these probabilities than your sim is.  I’m just saying that when you are talking 1% difference in win probability, that difference is almost meaningless in terms of managerial decision making.


#41    Xeifrank      (see all posts) 2007/08/06 (Mon) @ 01:40

How does your simulation do against Vegas odds?
vr, Xei


#42    Fargo      (see all posts) 2007/08/06 (Mon) @ 13:13

anon, the sampling error with a sample N of 1000, and the proportions of responses roughly divided 50-50 is about 3.1%, not 1.6%.

Here’s a sampling error calculator:

http://www.dssresearch.com/toolkit/secalc/error.asp

You are absolutely correct about the difference between sampling error and bias. 

At the same time, you might note that a lot of the observed variance is due not just to sampling error but also to other factors (poorly worded or confusing questions, respondent disinterest, coding/recording errors by the interviewers, etc.).  Sampling error is only one source of the error variance.
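
The 1.6% and 3.1% figures quoted in this thread are not in conflict: the first is the standard error of a 50-50 proportion with N = 1000, the second is the usual 95% margin of error (about 1.96 standard errors). A minimal Python sketch, assuming simple random sampling and the normal approximation; the function names are illustrative, and using p = 0.5 for the 100,000-run sim is an assumption made only to show the factor-of-10 scaling.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

def margin_of_error_95(p: float, n: int) -> float:
    """Half-width of the usual 95% normal-approximation interval (1.96 standard errors)."""
    return 1.96 * standard_error(p, n)

poll_se = standard_error(0.5, 1_000)       # the "1.6%" figure
poll_moe = margin_of_error_95(0.5, 1_000)  # the "3.1%" figure
sim_se = standard_error(0.5, 100_000)      # 100x the sample, so 10x smaller

print(f"poll standard error:      {poll_se:.2%}")
print(f"poll 95% margin of error: {poll_moe:.2%}")
print(f"sim standard error:       {sim_se:.2%}")
```

Note that this only covers sampling error. As the post above says, question wording, coding errors, and the like sit on top of it, just as model error sits on top of Monte Carlo noise in a sim.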


#43    MGL      (see all posts) 2007/08/06 (Mon) @ 14:12

Math Guy, you are right in that some of those things matter in terms of estimating the win probability for each alternative.  It is up to the person doing the analysis to figure out which ones matter and which ones don’t.  As I said, my sim is a complex one.  For the record, it uses defense, as estimated by UZR projections, the arms of the OF’ers, the speed of the baserunners (1-5), park effects, etc.  For batters and pitchers, not only their “log 5 matchup” projection (based on current projections) for all components, but their projected G/F ratios, platoon ratios, etc.

Keep in mind that when estimating the relative value of alternatives, you have “thresholds.” By that I mean, let’s say that your sim or Markov tells you that putting Cabrera on base is worse than not.  Well, if there is the possibility that the manager might pinch run for him, that obviously improves the chances of the Marlins scoring, so you don’t need to worry about that (assuming that not having Cabrera in the game in case of a continued tie is not that big a deal).  Same thing for OF arms. Let’s say that they have below average OF arms.  Again, you would not need to worry about that. If they did have above average arms, you would need to redo the Markov or sim, assuming you didn’t account for that already.

But, the bottom line is that #37 was right and you are confusing accuracy with reliability.  Even if you had no idea whether the OF arms were good or not or how fast or slow Cabrera was (and you just used average arms and baserunners in your model), your “answer” as far as which alternative was “correct” would still be right.  The uncertainty would just be larger.  That does not change the “answer” though.  I’m not sure you (Math Guy) understand this concept.

While your points are well taken as far as all of the factors you mentioned and more affecting the outcome of the analysis, that does NOT lead to the conclusion that you CANNOT model these alternatives (they can be easily accounted for) or that you CANNOT come up with a reliable answer as to which of several alternatives is better.  I don’t know what else to say.


#44    MGL      (see all posts) 2007/08/06 (Mon) @ 15:15

The point that DG was making and one of the points I am making with regard to the Pujols/Eckstein analogy is that even though batting one for the other in one PA only changes the wp for the whole game by a little, we don’t need to consider the intricacies of an entire game in order to analyze that situation (to see which batter produces the greatest wp) - we only need to determine which is the better batter versus that pitcher, etc.  That is easy of course.

Similarly, estimating with a reasonable degree of accuracy whether a run scores more often with Jacobs at the plate versus Randolph, 2 outs, and a slow/medium runner on first or with the bases empty and Cabrera at the plate, is not that difficult, whether we consider the OF arms or not.

BTW, Cabrera is rated as a “2” out of 5 baserunner in my sim.  If we make him a “5” (one of the fastest runners in the league), FLO wp goes up by .17%, not a whole lot and not even statistically significant for 100,000 games.  If we make him a “1” (rather than a 2), one of the slowest in the league, the wp goes down .39%.  If we pinch run for him with Abercrombie, who is a 4, so that Cabrera is out of the game, the wp for FLO goes down .12%, again, not statistically different from not pinch running for him.  It does not look like the speed of the runner is going to change the “answer” which is that walking him does not appear to be the correct thing to do (which is what I should have said in the first place, as opposed to “clearly the right thing” which can be misleading if interpreted as “without a doubt” or “with 100% certainty”).

BTW, the sim uses the 1-5 speed scores of the baserunners by using the actual advance rates on the various hits (and outs if less than 2 outs) of runners who are the fastest in the league (5), the next fastest (4), etc.


#45    Math Guy      (see all posts) 2007/08/06 (Mon) @ 16:19

I understand the difference between accuracy and reliability.  I’m pretty confident that if you run your sim for 100,000 runs 20 separate times you’ll get about the same results for WP for FLO for each case.  I have no concerns that your results aren’t reliable. 

It is the accuracy that I am concerned about.  Every single one of those values you plug into your sim is an ESTIMATE of the true parameters for the players.  Your running speed factor is an estimate of true running speed.  Your arms factor is an estimate of arm strength.  Each of these estimates has some amount of error.

These little errors all add up to widen the bounds for your confidence interval.  There is a reason that engineers have been known to double the level of estimated loads on structures when designing them.  They know their models are not accurate enough to be completely certain.

When you say there is only .16% error in your model, you are essentially stating that you believe your model is a 100% accurate representation of what happens on the field and the only error seen is that from not doing an infinite number of runs.  Is that really what you mean to say?

Perhaps an equation is the best way to get this across.

X = florida’s true Win Probability known only to an omniscient power.

Y = the win probability you would get if you ran your simulation an infinite number of times (this is the simulation’s view of the world).

Z = value you get from 100,000 runs.

I agree that
|Y - Z| ≤ 0.16%

I don’t agree that
|X - Y| ≤ 0.16% or
|X - Z| ≤ 0.16%

This X-Z value is where the rubber meets the road.  If the 95% confidence interval of Z as an estimate of X is more than 1% which it almost certainly has to be, then I don’t think you can malign Garner for making either decision.  And I’m not saying anyone else could do any better than this.  There are way too many factors that go into these situations to say with great precision the best way to model it.

For instance, does your model consider ballpark, wind speed, field conditions, day or night, or umpires working the game, game temperature?  I’m not saying any of these factors matter a whole lot, but they all add up to give you greater uncertainty and widen the differences that are required to give you meaningful differences.


#46    Math Guy      (see all posts) 2007/08/06 (Mon) @ 16:36

Perhaps an example will help.  This isn’t perfect because there isn’t a whole lot of randomness in it.

Let’s say you have 10 two-by-fours that are somewhere between 3 and 6 feet long.

When you model each event you are not able to do so with complete accuracy.  It is like having a tape measure only with inch marks on it.

You measure each of the ten boards rounding to the nearest inch, and add up your measurements.  You then do this 100 times and find the average of your 100 measurements.

If your average sum is 545.23 inches, you are not going to assume that the true, total length of all ten boards is then within half an inch of that amount.  It would be safest to assume that it is within five inches of that amount, or less if you don’t need to be absolutely certain it is in your range.

You could measure the boards a billion times and you still aren’t going to be a whole lot more accurate than you were with 100 measurements.  Your values would be more reliable, but not more accurate.

I’m trying to say that your tape measure (in this case your simulation of real events) is only so accurate, and this is especially meaningful when you are looking at something like a single event in a single game.
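
The tape-measure example can be sketched in Python. Everything specific here is an assumption added for illustration: the board lengths are random draws between 36 and 72 inches, and each reading gets a small uniform "hand jitter" of up to a quarter inch before being rounded to the nearest inch mark (the jitter is what makes repeated totals differ at all).

```python
import random

def measure_total(true_lengths, rng, jitter=0.25):
    """One pass with the tape: each reading is nudged by up to +/- jitter inches,
    then rounded to the nearest inch mark, and the ten readings are summed."""
    return sum(round(length + rng.uniform(-jitter, jitter)) for length in true_lengths)

def average_total(true_lengths, repeats, seed=0):
    """Average of `repeats` independent passes over the same boards."""
    rng = random.Random(seed)
    return sum(measure_total(true_lengths, rng) for _ in range(repeats)) / repeats

# Ten hypothetical boards between 3 and 6 feet (36 to 72 inches).
board_rng = random.Random(42)
boards = [board_rng.uniform(36.0, 72.0) for _ in range(10)]
true_total = sum(boards)

for repeats in (100, 10_000):
    avg = average_total(boards, repeats)
    print(f"{repeats:>6} passes: average {avg:.2f} in, off by {avg - true_total:+.2f} in")
```

In this sketch each reading can be off by at most 0.75 inch, so the total is always within 7.5 inches of the truth, but the rounding error for a given board leans the same way on every pass. Adding passes should steady the average without pulling it all the way to the true total, which is the reliable-but-not-accurate distinction the post is drawing.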


#47    MGL      (see all posts) 2007/08/06 (Mon) @ 18:10

You are underestimating the power of the sim, but it is not really important.  (It does consider the umpire’s strike zone (it changes the K and BB rates), the park, and the weather.)

The thing is that my estimates are unbiased!  So if a wp result for one alternative is .620 with a 95% confidence interval of .02 in each direction (.60 to .64) and another alternative has a wp estimate of .630, also with a 95% confidence interval of .02 in each direction, that DOES NOT change the fact that the .630 is preferable to the .620, as long as these errors are unbiased.

So I am not disagreeing with your assumptions (that the wp is merely an estimate of the actual real life wp based on estimates of certain parameters as well as not even considering other parameters).  I am disagreeing with the consequences of this.  As I said, as long as the models are reasonable and there is no reason to think that there is bias in the errors, as long as the mean of one estimate is higher or lower than the mean of the other, the correct choice is clear.  That does not mean that in reality we nail all of the correct choices.  It means that in reality we nail the correct choice more than 50% of the time.
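
The "more than 50% of the time" claim can be put into rough numbers under stated assumptions. The Python sketch below supposes the two alternatives truly differ by .010, and that each estimate carries independent, unbiased, normally distributed error with a 95% interval of .02 in each direction (the figures from the example above). Independence and normality are assumptions of this sketch, not something the thread establishes, and the function name is illustrative.

```python
from statistics import NormalDist

def prob_correct_choice(true_gap: float, ci_half_width: float, level: float = 0.95) -> float:
    """Chance that the truly better alternative also gets the higher estimate,
    assuming independent, unbiased, normal errors on each estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    sd_each = ci_half_width / z                   # error sd of one estimate
    sd_gap = sd_each * 2 ** 0.5                   # error sd of the difference
    return NormalDist().cdf(true_gap / sd_gap)

p = prob_correct_choice(true_gap=0.010, ci_half_width=0.020)
print(f"P(estimates rank the two options correctly) = {p:.3f}")
```

Under these assumptions the ranking comes out right roughly three times in four: better than a coin flip, well short of certain. If the errors in the two estimates are positively correlated, as argued in #37, the error in the difference is smaller and the probability is higher; if the estimates are biased in different directions, this calculation does not apply.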

That is also a different argument than criticizing a manager’s decision when the manager knows at least as much as your model and even more.  That is rarely if ever going to be the case.  The reason is that no matter how much more he knows than your model, he has no idea how to properly use that information to come up with an optimal decision.  For example, if he knows that Cabrera is nursing a sore hamstring and is really a “1” or a “0” in terms of baserunning, he still has no idea whether that makes it correct to walk Cabrera or not because he has no idea what the “numbers” are when not walking him, when walking him given that he is fast, walking him when he is slow, etc.


#48    Math Guy      (see all posts) 2007/08/06 (Mon) @ 19:12

MGL,

I don’t think you can assume that the results are unbiased.  On what basis can you make that assumption?  Your results are purely based on past observations; you can’t possibly have enough data to know for certain that you have unbiased estimates of how Mike Jacobs will do against whoever was pitching for Houston that night with those particular fielders on the field.  You can’t do experiments and you can’t check your results independently in any way.  I’m not going to take your word for it that these estimates are spot on accurate.  You say there is no reason that there would be a bias, but I could as easily say there is no reason there isn’t, and even if they aren’t biased, the size of the confidence intervals is what is important here.

If the confidence intervals overlap, you really should accept the null hypothesis which would be that the two strategies are indistinguishable in quality.  I believe that would be the standard statistical practice in things like drug testing and the like.  Feel free to correct me if I’m wrong.  A bigger point is that we can’t even put a reasonable estimate on the error bars for something as complicated as this.  And I’m talking about a single event in a single game, not bunting as a strategy over the course of the season.

As for the last paragraph, I really don’t get what you are saying there.  Of course, it is relevant.  My whole point in all of my notes is that sabermetricians love to criticize even in situations where there is no basis for doing so.  Most of the cases you cited above really should be judgment calls.  I don’t understand the problem with saying, these probabilities are so close that either strategy is reasonable.


#49    MGL      (see all posts) 2007/08/06 (Mon) @ 20:41

I can’t argue any further.  I say that most or many of these circumstances are easy to model.  You say that they are not.  I say that in most cases there is no reason to think that there is bias.  You say that isn’t so.  Nothing more to say.

According to your arguments, there is nothing we can model in baseball and come up with a reasonably accurate answer.  That is ridiculous of course. 

There are many instances of small differences in wp (such as the Pujols versus Eckstein thing) being extremely significant (our confidence that there is a true difference is very great) and in other instances we are not so confident.  The point is that we are going to be right more often than we are wrong, assuming that we are using a reasonable model.  That is all I am trying to say.  At this point we ARE talking past each other, because I am not sure WHAT you are saying.  That the models are not perfect?  Agreed.  That some analyses are more difficult to model?  Agreed.  That if we think that strategy A yields a higher wp than strategy B, that it is possible that we are wrong?  Agreed.  That a 1% difference could be dead even or 1% in the other direction (or virtually anything else close to that)?  Agreed.  I can’t think of anything left to argue.  At the very least you should concede the point that a sim can model things a lot better than you thought they could (umpires, weather, baserunning, defense, OF arms, etc.).  I mean you start out arguing that a sim cannot possibly account for all these things, with the assumption that they don’t.  Then I tell you that the sim accounts (and does a darn good job) for probably 10 things that you thought it didn’t, some of them quite important, yet you ignore all of that and stick to your argument, part of which is that the results of the sim are so inaccurate that a difference of 1% is meaningless.  That makes no sense.

And no, it is NOT correct when the error bars or confidence intervals intersect to “accept” the null hypothesis.  They intersect in virtually any analysis, especially if you make them wide enough (95%, 99%, whatever).


#50    Math Guy      (see all posts) 2007/08/06 (Mon) @ 22:28

That some analyses are more difficult to model?  Agreed.  That if we think that strategy A yields a higher wp than strategy B, that it is possible that we are wrong?  Agreed.  That a 1% difference could be dead even or 1% in the other direction (or virtually anything else close to that)?

Good, I’m glad we agree.  My point is that if you simulate two situations and get a difference of 1% in win probability between two managerial options, then there is a good chance that either option is the best option. 

I’m sorry if I’m frustrating you, but I don’t think it is useful to overstate the value of the tools at our disposal.


#51    MGL      (see all posts) 2007/08/07 (Tue) @ 00:35

NP. I don’t agree that there is a “good chance” that either option is the best option, although I realize where you are coming from.  There are clearly some situations, like the Pujols/Eckstein one where the difference in wp for the entire game might be 1% but clearly one option is better than the other.  So there must be other situations besides that one where the difference in wp is small BUT one is the clear choice over the other.  Surely you must agree that for all 1% differences in wp, there is a continuum from “clearly obvious which is the correct choice” to “it is not clear at all which is the correct choice so let’s just flip a coin.”

Intuitively it would seem that a 1% difference in wp is nothing and essentially is a “tie” but I have been working with wp’s with sims, Markov models and other ways to analyze (theoretical models) and you would have to trust me (I know you don’t, but I am referring to other readers who are following this thread) that 1% can be a lot and can signal a “clear choice.” One reason I know that is that I can plug all kinds of differing factors into the “equation” and it turns out that the wp results are not that sensitive to these changes in parameters.

For example, let’s say that I don’t incorporate speed of the baserunner into the Cabrera IBB model and you say, “Well I don’t trust your results because Cabrera is slow and you are assuming an average baserunner.” Now, I can test your argument very easily to see if it has any merit (that because I am not controlling for speed, my model is not accurate enough to say that a 1% difference is significant).  Let’s also assume, for the sake of argument, that that is the only parameter that I am not accounting for and your argument is that it is critical to incorporate that parameter.  I can simply plug in a slow runner into my model (which is easy to do - I have the runner advance no more than the slowest runners in baseball advance, whatever that is).  If it is still NOT correct to walk Cabrera then I know that your argument is not valid and that my result is “correct” (again, I am not stating that I know with 100% certainty, or even 80%, that it is correct in the world - that is not necessary for me to declare it “correct”).  And I have just shown you that the wp is NOT very sensitive to the speed of the baserunner…


#52    Math Guy      (see all posts) 2007/08/07 (Tue) @ 08:28

And no, it is NOT correct when the error bars or confidence intervals intersect to “accept” the null hypothesis.  They intersect in virtually any analysis, especially if you make them wide enough (95%, 99%, whatever).

Perhaps a Stat Guy or Gal can verify this for me, but I don’t believe this comment is factually correct.

Also, it is good that you are doing sensitivity analysis on your model, but when you say that the speed of the runner is not that important, you are really just saying that the speed of the runner is not important in terms of your sim.  You don’t know exactly how important the speed of the runner is in real life.  We have a decent idea, but I don’t think that you can say with any great certainty the probability of Cabrera advancing on a wild pitch with Munson catching and so-and-so pitching.  With an average pitcher and an average catcher yes, but in a specific situation probably not.  You are modeling it and that involves error. 

I believe we can model these things on a macro level and get results that show when general strategies are better or not, but on a micro level on a case-by-case basis, I think you need a bigger difference than 1% to make a definitive statement.


#53    Pizza Cutter      (see all posts) 2007/08/07 (Tue) @ 09:25

One never actually _accepts_ the null hypothesis.  We only ever fail to reject it.  It’s a small difference in terminology, but it’s more accurate.  However, yes, you can reject the null hypothesis, even if the error bars intersect.  When comparing two distributions, it’s the size of the overlap that counts.
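
A small numeric illustration of that last point, as a Python sketch with made-up numbers: two independent estimates three standard errors apart have overlapping 95% intervals, yet a test on their difference rejects the null at the .05 level. The helper names and the values 0, 3, and 1 are hypothetical, and normal errors are assumed.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def ci95(mean: float, se: float) -> tuple:
    """95% normal-approximation confidence interval for a single estimate."""
    return (mean - Z95 * se, mean + Z95 * se)

def two_sided_p(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Two-sided p-value for the difference of two independent normal estimates."""
    se_diff = (se_a ** 2 + se_b ** 2) ** 0.5
    z = abs(mean_a - mean_b) / se_diff
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Hypothetical estimates, three standard errors apart.
a, b, se = 0.0, 3.0, 1.0
ci_a, ci_b = ci95(a, se), ci95(b, se)
overlap = ci_a[1] > ci_b[0]      # upper end of A's interval vs lower end of B's
p = two_sided_p(a, se, b, se)

print(f"A: {ci_a[0]:.2f} to {ci_a[1]:.2f}")
print(f"B: {ci_b[0]:.2f} to {ci_b[1]:.2f}")
print(f"intervals overlap: {overlap}, p-value for the difference: {p:.3f}")
```

The reason is that the standard error of a difference is about 1.41 times the individual standard error, not 2 times, so checking for interval overlap is a stricter rule than the test it is standing in for.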

At this point, y’all are arguing about the model’s sensitivity and the possibility for Type II error (saying that there is no significant difference when there really is one).  We know it’s not sample size that will be the stumbling block (MGL can run his Markov/Monte Carlo a few hundred thousand times if it’s really important.) The only question is whether the model accurately models reality.  MGL’s not going to get absolutely everything (no simulator of anything can).  However, even in hypothesis testing, even if the difference between the two distributions in question doesn’t make it to the pre-ordained alpha level (whether you prefer .05 or .01 or .001), the fact remains that one number is less than the other, and even if we’re not statistically certain on that one, that is still the way to bet, if you trust the way in which it was generated. 

May I suggest, however, that you have reached an impasse in the argument.  MGL believes his sim accurately models reality and that the difference between 90.3 and 89.2 meets an appropriate alpha level cutoff.  Math Guy has problems with one or both of those criteria.  You might be using vastly different alpha levels in your head (solution, state the actual p-value).  As to the accuracy of the sim, MGL prefers to keep his sim private (his prerogative to do so), so the rest of us are pretty much guessing at what’s in the box.
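
As a sketch of what "state the actual p-value" could look like, the Python below runs a two-proportion z-test on the 90.3 vs 89.2 pair mentioned here and on the 66.5% vs 65.7% pair from #31. It assumes each figure comes from 100,000 independent simulated games, which is stated in the thread for some runs but is an assumption for these particular pairs. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """z statistic for the difference of two independent proportions (unpooled SE)."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

RUNS = 100_000  # assumed number of simulated games behind each figure

pairs = {
    "90.3 vs 89.2": (0.903, 0.892),
    "66.5 vs 65.7": (0.665, 0.657),
}
results = {}
for label, (hi, lo) in pairs.items():
    z = two_proportion_z(hi, lo, RUNS, RUNS)
    results[label] = (z, two_sided_p_value(z))
    print(f"{label}: z = {z:.2f}, p = {results[label][1]:.2g}")
```

These p-values only say that the gap is not Monte Carlo noise, i.e. that rerunning the same sim would show the same ordering. They say nothing about whether the sim's inputs match reality, which is the other half of the disagreement.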


#54    Rally      (see all posts) 2007/08/07 (Tue) @ 09:28

If you want to say you are right with 95% confidence, then you must accept any null hypothesis if the error bars intersect.  If you are just trying to be right more than half of the time, which it appears is MGL’s goal, then you need not pay attention to the error bars.


#55    Rally      (see all posts) 2007/08/07 (Tue) @ 22:33

Angels down 1-0 in the bottom of the first.  One out, Figgins on second, Vlad facing Tim Wakefield, Garret Anderson on deck.  This strikes me as a pretty dumb time to issue an intentional walk, mostly because it’s the first inning.  The Book says win% is increased for the Angels, but we are talking about Vlad vs Garret Anderson. 

I wonder what the sim says?


#56    MGL      (see all posts) 2007/08/08 (Wed) @ 01:11

The sim says that with Vlad batting, Boston wins 53.57% of the time in 100,000 games.

With the IBB, Boston wins 52.91%.  So the IBB is correct!  Or at least we can’t tell the difference, according to Math Guy!  Anderson is a really crappy hitter these days, and even though he is a lefty, Wakefield does not have much of a platoon ratio.

And BTW, Math Guy, if you think that a baserunner’s speed with respect to a possible wild pitch (as in the Cabrera IBB situation) has ANY effect on the wp of the game (more than de minimis of course - it has SOME effect), then you are not arguing from a solid foundation, to say the least.  I mean do you have ANY idea how often a wild pitch or passed ball occurs where a fast runner advances and a slow runner doesn’t and how much that affects wp?  Apparently not.  Sorry to be so harsh, but when you say ridiculous things you can’t expect someone to take you seriously in a serious, legitimate argument.


#57    Rally      (see all posts) 2007/08/08 (Wed) @ 10:34

How is the IBB correct from Boston’s perspective if it lowers their expected W% from .5357 to .5291?  Or did you transpose the numbers?
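
For a sense of how far apart .5357 and .5291 are relative to simulation noise, here is a back-of-the-envelope sketch of the "error bars" on the difference. It assumes both options were simulated for 100,000 independent games (the thread states that count only for the Vlad-batting run), and the helper name is hypothetical.

```python
import math

def diff_confidence_interval(p1, p2, n1, n2, z=1.96):
    """Approximate interval for p1 - p2 from two independent binomial samples."""
    diff = p1 - p2
    # Unpooled standard error of the difference between two proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se, diff / se

# Boston win rates from the thread; equal run lengths are an ASSUMPTION.
low, high, n_se = diff_confidence_interval(0.5357, 0.5291, 100_000, 100_000)
print(f"difference: {low:+.4f} to {high:+.4f} ({n_se:.1f} standard errors)")
```

With these inputs the gap is roughly three standard errors, so under the stated assumptions the interval excludes zero. Whether the sim itself is an accurate model is a separate question that this arithmetic cannot answer.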


#58    Rally      (see all posts) 2007/08/08 (Wed) @ 10:42

I wouldn’t call Anderson crappy.  He’s been about league average hitting the last 4 years.  Below average for a corner, yes, and vastly overpaid, but he’s not horrible.

Wakefield’s platoon split:  Last night the switch hitters Matthews and Izturis were batting righthanded against him.  Switch hitters often do that against knuckleballers, and it is the stumbling block in my attempt to get PBP data from MLB.com’s gameday files.  The Baseball Hacks script does not handle a switch hitter doing that.


#59    MGL      (see all posts) 2007/08/08 (Wed) @ 14:28

Oops!  The numbers are right.  I was thinking from the perspective of the Angels.  The IBB was probably NOT correct as you thought, although it IS pretty close.

A “stumbling block” for what?  Wakefield is the only knuckler in the league other than Haeger who has not pitched much.


#60    Rally      (see all posts) 2007/08/08 (Wed) @ 16:08

I copied a Perl script from the Baseball Hacks book that is supposed to download files from MLB gameday, run through them and create retrosheet files.  Early in the year, Wakefield faced Mark Teixeira and Mark batted from the right side.  The event “batter changes sides” is not handled well by the program, and my program blows up at that point.  After a while I gave up; my Perl programming skills aren’t that good.

From the Angels’ POV, it was a great move!  It didn’t work out, they failed to score in the first, but had no trouble scoring for the rest of the game.


#61    joe p      (see all posts) 2007/08/08 (Wed) @ 16:26

Rally,

There is a field in the inning xml files for each at-bat that lists the side that the batter is hitting from (stand).  My name has a link to the inning files from last night’s game.  In the 2nd inning Izturis hit right-handed vs. Wakefield, but in the 5th and 6th vs. Delcarmen and Tavarez he hit left-handed.

I couldn’t figure out how to use that program in Baseball Hacks either, but maybe you can use the different sides that guys hit from to create the ‘batter changes sides’ event.
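
A minimal sketch of that idea, in Python rather than the book's Perl: read the `stand` value for each at-bat and flag any batter who shows up on both sides. Only the `stand` field is described in this thread. The `atbat` element name, the `batter` and `pitcher` attributes, and the sample data below are assumptions for illustration, not the actual Gameday layout.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# HYPOTHETICAL trimmed-down sample; only `stand` is confirmed by the thread.
SAMPLE = """
<game>
  <atbat inning="2" batter="izturis" pitcher="wakefield" stand="R"/>
  <atbat inning="5" batter="izturis" pitcher="delcarmen" stand="L"/>
  <atbat inning="1" batter="guerrero" pitcher="wakefield" stand="R"/>
</game>
"""

def batters_who_changed_sides(xml_text):
    """Return batters seen hitting from more than one side of the plate."""
    root = ET.fromstring(xml_text)
    sides = defaultdict(set)
    for at_bat in root.iter("atbat"):
        sides[at_bat.get("batter")].add(at_bat.get("stand"))
    return sorted(b for b, seen in sides.items() if len(seen) > 1)

print(batters_who_changed_sides(SAMPLE))
```

This flags any switch hitter who bats from both sides in a game. Generating the retrosheet-style event would presumably also need the pitcher's throwing hand, to tell an ordinary platoon switch from a batter hitting from the unexpected side.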


#62    Rally      (see all posts) 2007/08/08 (Wed) @ 17:14

It’s a rare occurrence.  I would be happy to ignore batter changes sides if I could get PBP files for everything else, but I couldn’t do it.


#63    David Smyth      (see all posts) 2007/08/08 (Wed) @ 18:12

Isn’t it likely that when the odds are that close (such as the 1% in the Cabrera IBB case), the manager, in the real world, might as well pick the option that will be least likely to hang him out to dry, given the state of knowledge of the majority of his evaluators (the sportswriters, general fans, and his own GM)? In his (assumed) primary goal of job preservation, he has to find the balance between the move which is in fact better (and thus leads to more wins on his resume), and the move which is worse but looks better to his evaluators. When in doubt, which is apparently the case much of the time, since according to MGL not even S Hawking could figure out the ‘correct’ move on the spur of the moment, he might as well take the path of least resistance.

If all you care about is the technical math details of the move in question, then my point is pretty much irrelevant. But if you want to ‘apply’ the technical results to the real world, then it’s a different story.


#64    MGL      (see all posts) 2007/08/08 (Wed) @ 21:15

Well of course it is in the manager’s best interests to do whatever preserves his job longevity.  On the other hand, I think that managers somehow think they can figure out the correct play.  Actually it is more accurate to say that they don’t understand the concept that there is a correct play and an incorrect one, mathematically.  It is not like they say to themselves, “Gee, not even Stephen Hawking can figure this out so I’ll just do whatever is in my best interest.” While they may subconsciously do that, I think that for the most part they genuinely think they are doing the right thing.  Their “thinking,” in terms of the right thing, in cases like the Guerrero walk, consists of, “Well, I’m certainly not going to let THAT guy beat me.” Or, “Well, who would I rather pitch to?” or “Now I can get a DP.” As if the extra runner on first is not a factor.

Believe me, when I worked for the Cardinals and I tried to talk to Tony LaRussa about optimal lineups and strategies, he had no idea what I was talking about and thought I was nuts.  Seriously.  And he is one of the smart ones.


#65    Ty      (see all posts) 2007/08/09 (Thu) @ 09:28

DS/63,

It’s true that it is in almost every manager’s best interest to do whatever preserves his job longevity. But at least some managers of winning teams (maybe Tito is one of them) could choose the mathematically better strategies more often, since they don’t have to worry about losing their jobs too much. I wonder whether they would do that.


#66    Rally      (see all posts) 2007/08/09 (Thu) @ 09:43

In Tito’s case, his bosses know the odds.  He’s not fooling anybody by choosing the “safe” play here.  If it was the 7th inning or later, I could see the standard “I wasn’t going to let Vlad beat me” line, but in the first inning, with only 1 out, it’s not a “safe” decision; he’s making a risky decision with plenty of opportunity to be second guessed.

In this case I’ll stick with:
A) Francona doesn’t know the odds.
B) He played a hunch.


#67    MGL      (see all posts) 2007/08/09 (Thu) @ 09:48

The thing is, who is going to tell them what the correct decision is?  As I said, in most cases, it is not possible for a lay person to figure them out on his own and certainly not “on the fly.” So what is a manager supposed to do, hire his own personal analyst?  Read and utilize the advice in “The Book” without sanction from his front office?  No, if a manager is going to choose optimal decisions, it is going to come from the brass, his bosses.

What is true is that in the face of choosing one alternative over the other, given that the manager is uninformed, he will usually choose the least controversial and the most risk-averse, to his team and to his job.  I’m not sure, but I think that more often than not, the least controversial and risk-averse choice is the non-optimal one.

Everything everyone does is always motivated by their self-interest.  It is just that that self-interest varies from individual to individual, even in similar circumstances.  For example, some managers are more concerned with job preservation than others.


#68    MGL      (see all posts) 2007/08/09 (Thu) @ 09:56

For whatever reasons, I think that Boston intentionally gives Tito a lot of autonomy.  I don’t think they want to “pull his strings.” They may have briefed him about certain strategies in general, but I think that he basically has free rein on them.  And I doubt, for example, that he will get chided about that IBB.  And, let’s face it.  Bill James is NOT the best analyst in the world, although they do have plenty of other ones working for the team.

Now if it were up to me, I would make sure that my manager understood that I (the front office) was going to inform him of how to handle all strategy decisions (and that I would review his decisions with him periodically) and that if he was unwilling to do that or did not feel comfortable doing that, he should not manage my team.  Whether that is practical or not, or even whether that is the best way to run a team, I do not know.


#69    Fargo      (see all posts) 2007/08/09 (Thu) @ 10:28

I don’t see how that would work at all on a practical level, in part because the strategy decisions often involve assessments of the capabilities of the personnel on hand at the moment (which the field manager may have a better handle on than anyone in the front office).

A more general approach would be to organize a “sabermetrics school,” in which you would use your sim, the product of research in “the book” and other sabermetric research (e.g., on player valuation and business decisions). A three-day intensive seminar for field managers and front office personnel.  Paid for by MLBAM, perhaps.  In other words, train them in what is known from a statistical standpoint, have discussions about strategic decisions and how different managers might handle them (use real, known decisions, lots of video, and build the stats and odds into the discussion, but don’t just teach stats and odds).

My working assumption is that most people don’t read—not even these wonderful blogs—but they might enjoy “going to school” for a few days in the off-season. You’ve gotta involve more than just statheads in the training, however. . . . Banish the powerpoints, or at least minimize them. Lots of videos and “scenarios” that lead to interpretation and learning.


#70    Ty      (see all posts) 2007/08/09 (Thu) @ 11:03

"For whatever reasons, I think that Boston intentionally gives Tito a lot of autonomy.  I don’t think they want to “pull his strings.” They may have briefed him about certain strategies in general, but I think that he basically has free reign on them.”

Maybe we can try to examine Bob Geren’s in-game strategy? I guess Billy Beane would be willing to pull his strings, and would dare to do so, too.


#71    Fargo      (see all posts) 2007/08/09 (Thu) @ 11:12

I take back the term “Sabermetric School.” I’d call it a “Baseball Strategy Academy,” and focus on strategic decision making on the field and in games, in personnel management, and in finances. And draw “instructors” from the baseball business as well as from the baseball analysis business.


#72    MGL      (see all posts) 2007/08/09 (Thu) @ 14:30

Fargo, not a bad idea.  Ty, I’m not sure that Oakland has an analyst who works on these types of things.  But I agree that if they did, Beane would not hesitate to tell Geren what to do.


#73    Bill Melader      (see all posts) 2007/08/09 (Thu) @ 17:40

MGL,

Excellent thread, but I think you were a little hard on Bochy (and on Math Guy).  Bochy has more information available to him than you, & your simulation is only better than a coin flip?  Math is not my area, but does that summarize it correctly?


#74    Pizza Cutter      (see all posts) 2007/08/09 (Thu) @ 17:53

Bill, while Bochy may have more information available to him, he is a human being (I hope) and humans are very poor processors of information.


#75    MGL      (see all posts) 2007/08/09 (Thu) @ 20:13

Pizza Cutter said it very well.  This whole idea of “the manager or the team knows more than you do” is nonsense.  And no, the result of the sim is NOT a coin flip for reasons which I have articulated ad nauseam.

A manager has zero chance to be able to process any information he has in order to make correct decisions in most cases.  Zero.  Other than the obvious ones. And of course he has a 50/50 chance of making the right decision although they tend to make the wrong one much more than 50% of the time for various reasons.  And in general, managers do NOT have any more information than I have in order to make a correct decision.  In fact, it is quite the other way around.  The information that managers use to “make” decisions is generally quite erroneous, irrelevant, incorrect, etc.  Like, whether a player is hot or cold, batter/pitcher matchups, etc.  And let’s say that the manager knew exactly how often a player was going to ground into a DP (on the average), how often he was going to hit a sac fly, be walked, strike out, hit a fly ball, where the defense is playing, etc.  How in the WORLD would he be able to process that information in his head in order to make an optimal decision about an IBB, sac bunt, etc.?  Saying that the “manager knows more than you do, therefore you are probably wrong and he is probably right (or something like that)” is a ridiculous way to argue a point.  It is the same thing as saying, “I don’t like the idea that a stat nerd can tell a manager the right thing to do.  I have no idea why and whether he can or can’t, so I’ll just say that he can’t because he doesn’t have all the information at his disposal that the manager does. I don’t really know what relevant information the manager has that he doesn’t, but I don’t like the idea that a computer can dictate game strategy.”

With that belief, there is nothing I can say or do to change a person’s mind.


#76    Ty      (see all posts) 2007/08/10 (Fri) @ 02:34

The biggest contrast between stat guys and managers, to my mind, is that most stat guys are willing and eager to know how managers/coaches make their decisions (it might be confidential), but I have never heard of any managers/coaches trying to learn what Run Expectancy is (maybe some did). Perhaps they just have too much to worry about and are always busy with their heavy daily work.


#77    Bill Melader      (see all posts) 2007/08/10 (Fri) @ 08:47

Pizza Cutter:

Humans are poor processors of information?  I’m not sure what you mean...what is that thing between the ears for?

I’m confident any manager is capable of memorizing a set of “optimal strategy” rules (or of carrying them in his pocket, or of having a team of quants on the other end of the phone).  Even if this were the case, I would want MY manager to have over-ride authority because (contrary to MGL’s assertion), they do have information that’s not in the model.


#78    Ty      (see all posts) 2007/08/10 (Fri) @ 10:06

How about information like scouting reports and video studying? I guess they should be useful and have an effect on in-game strategy.


#79    MGL      (see all posts) 2007/08/10 (Fri) @ 11:23

What exactly in a scouting report or video would significantly affect an in-game strategy decision and what makes you think that any manager uses them for those decisions?


#80    Pizza Cutter      (see all posts) 2007/08/10 (Fri) @ 13:00

Bill, consider: human information processing is subject to a lot of biases.  Confirmation bias is the tendency of humans to only look for information that confirms what they want to be true (Some guys are clutch hitters… I mean, look at what he did in those couple of games...).

Hindsight bias is the strange belief of people that after a successful event (and I repeat, _after_), that they “knew all along” that’s what would happen.  Usually paired with “it was destiny,” when in fact, the more accurate statement would be “I had no idea, made a guess, and got lucky.”

Availability bias can be summed up in this one: “Name a baseball team.” Your mind just drifted to your favorite MLB team.  I didn’t say it had to be an MLB team, just any group of people who regularly play baseball together.  (I’m just naming a few of the many biases.)

Humans make decisions, often without considering all the info, but only parts of it, and often times not a representative sample of the information available.  We fool ourselves into believing things that aren’t really true.  I’m a psychologist by training.  I study this stuff all the time.


#81    Greg Rybarczyk      (see all posts) 2007/08/10 (Fri) @ 15:34

MGL - I agree with almost everything you are saying here, but what about on-the-spot changes in game conditions that override the stats?
- a manager observes that an opposing pitcher has just lost his control, and no one’s up in the pen.  Should he pinch hit a hitter with good ability to draw walks, or someone who will swing at nearly anything, even with the take sign on?
- manager needs a home run, but the wind is now blowing in hard, meaning the pinch hitter with a slight advantage in wp overall now has no chance of clearing the fence, while another guy with lower overall wp (on average) might still be able to get one out?
- manager has bases loaded with 1 out, winning run on third, the outfield is drawn in to prevent a shallow sac fly (i.e. normal outcome probabilities are significantly altered).  Should the manager pinch hit a hitter with slightly higher overall wp but a strong tendency to strike out, or a contact hitter with slightly lower wp overall?
- pinch hitting option #1 has higher overall wp than option #2, but is recovering from the flu, which no one outside the clubhouse is aware of?  who should hit?
- ph option #1 has higher overall wp than #2, but is recovering from a hamstring pull, and the manager is concerned about using him because he has to manage games tomorrow and the rest of the season, as well as tonight.  who should hit?
- ph has higher overall wp than other options, but is badly impaired (e.g. Kirk Gibson ‘88 WS), and cannot possibly do anything besides homer, single or walk.  His true, on the spot wp is different from what the stats say, but how different?
- 8th inning of close game.  Manager has to choose which of two ph options to use, one slightly better than the other in wp for the situation.  He also expects to need to hit for his pitcher the following inning.  who should he use?  Is he stupid for saving the better ph for the 9th?

You are dead on that most managers are ill-equipped to analyze all the situations correctly - no question, I agree completely.  But I think I disagree with the contention that managers know nothing that we don’t know.  Also, to reiterate, they have to play tomorrow’s game, while the simulation does not.


#82          (see all posts) 2007/08/10 (Fri) @ 15:38

I can see how an analyst can give a manager a general set of strategic rules, but it hardly makes sense to criticize a manager for botching a play where the delta between his strategy and the optimal strategy is 1%.  As noted, humans are poor processors of information, and since I rarely see managers in the dugout with a laptop and wi-fi, how can we possibly expect him to make the correct (as simulated) decision? 

It’s one thing if we’re going to criticize managers for screwing up when it comes to wholesale strategy (e.g., only the very worst hitters should sac bunt before the seventh inning [or whathaveyou]), but quite another if we expect them to choose between strategies that require a sophisticated simulation to distinguish.  MGL saying that the move is “clearly” wrong is clear only to an analyst with a powerful tool, not to a human manager in a dugout.


#83    Anthony      (see all posts) 2007/08/10 (Fri) @ 15:38

79: What about things like Felix Hernandez throwing virtually all fastballs in the first inning or a player coming back from a wrist injury and not being able to turn on pitches like he normally does? To me, those are valuable pieces of information that should be considered (and as Enhanced Gameday data becomes more prevalent, some of it can probably be assimilated into your simulation).


#84    MGL      (see all posts) 2007/08/11 (Sat) @ 22:51

It is certainly debatable at what point you can “criticize” a manager for not making an optimal decision.

Tonight in the MIN/ALA game there was a perfect example of a situation that comes up all the time where I think you CAN criticize a manager for not “thinking” enough.  MIN in the top of the 7th had runners on 1st and 2nd and 0 outs and Punto up.  The commentators said that he would be bunting as if there were no choice.  The first and third basemen were right down the batter’s throat.

Now, you would think that a reasonably intelligent manager would figure out that with the 1st and 3rd basemen charging that perhaps a bunt might be difficult and that swinging away would be more fruitful.  In other words, that perhaps you should never let the defense know that you are definitely going to be bunting, even though I would not expect him to be able to figure out the “numbers” of course.

I think it is fair to criticize a manager who does that.

BTW, Ozzie Guillen is an excellent manager along those lines.  You do not know when he is going to swing away or bunt (which is critical to optimal bunting strategy as I outline in The Book) and he does not hesitate to keep bringing in relievers to get the platoon advantage, even in the 9th in a save situation.


#85    Greg Rybarczyk      (see all posts) 2007/08/12 (Sun) @ 01:47

One element of the frequent pitching changes that I never hear discussed is the idea that the law of averages makes it very unlikely that every single pitcher in the bullpen is at his best on any given day.  If your strategy is to use every guy out there for 1 or 2 batters, then you make it very likely that whichever guy is having an off day will take the mound and hurt you. 

The nightmare scenario: your 2nd best setup man pitches a lights out 7th inning, but comes out because best setup man’s job is the 8th.  Best setup man mows down the first two hitters in the 8th, then is taken out because the 3rd hitter that inning is a lefty.  The LOOGY comes in and walks the batter, and you have to bring your closer in to finish the 8th.  Two guys had their top stuff today, and they’re both in the showers now while you either a) work your closer a bit harder than you might have had to, or b) your closer has lousy stuff today, in which case you lose.  All because the manager can’t seem to leave a relief pitcher on the mound when he’s dealing…

I blame Tony LaRussa…

[/rant]


#86    MGL      (see all posts) 2007/08/12 (Sun) @ 02:04

The reason you never hear it discussed (in a serious, qualified discussion) is that it has no validity.

There certainly are many considerations when deciding who to pitch and when, such as future bullpen use, warm-up time, nuances like whether you need a DP, want to avoid a HR but don’t care about a walk, etc., but the bottom line is to put your best pitchers in the highest leverage situations and the worst in the lowest, AND to choose the best reliever out of the pen, based on the handedness of the batter, and as a tie-breaker (actually more than a tie-breaker), the GF ratio of the batter as compared to the GF ratio of the pitcher.

And, unbeknownst to most if not all managers, the best and worst pitchers (the hierarchy of talent) are NOT based on “the magic current season’s stats” nor are they based on or influenced by the batter and pitcher’s previous results against one another.


#87    Greg Rybarczyk      (see all posts) 2007/08/12 (Sun) @ 11:50

So when Kason Gabbard tossed a 3-hit shutout on July 16, 71 of 107 pitches for strikes, Francona was wrong to leave him in to finish the game, because he had Okajima and Papelbon available in the bullpen, then?  And those guys are always, in every case better than Gabbard, because Gabbard&