In the second to last sentence above, I mean “WE of the pitching team”. Or alternately, the WE of the batting team was “lower.”
May I ask from where it is that you (Peter or MGL) are compiling your data? I wouldn’t mind having a run at this, but other than BB-Ref’s PI feature which doesn’t provide quite enough detail I’m not sure what else I could use.
I am using retrosheet play-by-play data, or at least some version of it. http://www.retrosheet.org
Heh. Of course. Sometimes I’m not too smart.
Nonetheless, I’m going to give this a go. Thanks MGL.
If you go to my wiki, I give you some instructions for creating your own DB.
I also suggest you join my yahoo group RetroSQL.
Ryan - I used Retrosheet for my database as well.
MGL - Rebuttal was probably not the best choice of words to use in the title as I did not disagree with anything that you did in your own study. Nor did I really disagree with the conclusion that it was very close to a break even proposition on whether to intentionally walk or not in the situation that you defined as I obviously came to a similar conclusion. In fact my only disagreement was whether the evidence that you put forward in your study was strong enough to conclude that the final result, when all factors are considered, would be even a little bit in favor of not intentionally walking. I also agree with your logic that if the evidence shows that intentionally walking a batter in a specific situation results in even a minutely greater chance of losing the game that you should not do it. I have said in the thread on your study and again in my study that the only way these strategic decisions can be evaluated correctly is to look at each one individually with the best simulation that you can devise that takes into account all the factors that might enter into the decision and then proceed with what the simulation recommends. You have stated elsewhere that there are decisions that are clearly wrong, others that are clearly right, and a third group where the answers we currently have are in a grey area, and we have to admit that there may be factors unconsidered in our analysis that could swing a decision one way or the other. In other cases you have decided to defer to a manager’s decision when it fell into a grey area. Here you did not, recommending that a manager not bother with an intentional walk.
Did you do this because you felt you had conclusively proven that intentionally walking the batter would almost always lead to a reduction of winning potential? With only a .1 to .2 average difference per walk on the aggregate of the 508 walks in ten years there surely must be factors that would cause enough variation in the specific run values that a significant portion of those 508 walks would increase the defensive team’s win potential.
My decision to do the study I did was not to try and prove you wrong or to prove that intentional walks are necessarily good strategy. I only wanted to show that there was another way of looking at the question that might leave the final conclusion in the grey area where we leave the recommendations until we have more and better information. Since I was not intending to offer conclusive proof but only casting doubt about whether we had sufficient information to come to a conclusion, my use of a small sample size should not be a problem. But you are certainly encouraged to use my methodology on your ten year sample and see if it follows the same pattern.
As to your point 3, I did look at the distribution of the score differences for both walks and non walks. They were very similar with the tied games being dead even. The small differences that existed in the ahead one and two runs and behind one and two runs both tended to support the tendencies I reported. I would have mentioned if they had not. Again, if I had been intending to “prove” anything rather than to show that there was another methodology that might return the whole question to the grey area of “needs more study” I would have taken the time to show both distributions.
I know that looking at 508 walks through your simulator is way too much trouble. But why not try at least a few to see what you come up with?
Are you in Central NY now? If so, get my email from Tango and let me know when we can get together.
I agree with you pretty much.
As to your point 3, I did look at the distribution of the score differences for both walks and non walks. They were very similar with the tied games being dead even. The small differences that existed in the ahead one and two runs and behind one and two runs both tended to support the tendencies I reported.
Are you sure about that?
First of all, I have 55.9% of the IBB’s with the batting team as the home team and only 51.3% of the non-IBB’s. That actually supports the IBB, since the WE of the home team is obviously greater than that of the road team, everything else being equal.
I have a very different distribution of score differentials between the IBB and non-IBB.
I have 25.8% of the IBB games being tied and 35.4% of the non-IBB. 14% of the IBB were pitching team up one run and 14.5% of the non-IBB. 5.4% of the IBB were up 2 runs and 12.2% of the non-IBB. 30.1% of the IBB were down by a run (the pitching team), and 22.9% of the non-IBB. 24.7% of the IBB were down 2 runs and 15% of the non-IBB were down 2 runs.
So the distributions are so different, I don’t see how you can compare the WP! You have to at least adjust for the run differentials.
Without going through the basic WP, I have no idea whether the differences support the IBB or the non-IBB, but it is clear that a comparison of the WP tells you nothing.
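To make the point about run differentials concrete, here is a minimal sketch of how the score-state mix alone can open a gap in raw WP. The two mixes are the 05-07 AL percentages quoted above. The per-state win probabilities are HYPOTHETICAL placeholders (not measured values), and the function names are made up for illustration. Both groups are given the same WP in every score state, so any difference in raw WP comes purely from the mix.

```python
# Score-differential mix from the pitching team's side (05-07 AL figures quoted above).
IBB_MIX = {"tied": 0.258, "up1": 0.140, "up2": 0.054, "down1": 0.301, "down2": 0.247}
NON_IBB_MIX = {"tied": 0.354, "up1": 0.145, "up2": 0.122, "down1": 0.229, "down2": 0.150}

# HYPOTHETICAL pitching-team win probabilities by score state.
# These are placeholders for illustration only, not values from any database.
HYPOTHETICAL_WP = {"tied": 0.45, "up1": 0.60, "up2": 0.72, "down1": 0.30, "down2": 0.20}


def raw_wp(mix, wp_by_state):
    """Overall WP of a group: per-state WP weighted by that group's own mix."""
    return sum(share * wp_by_state[state] for state, share in mix.items())


def standardized_diff(wp_a, wp_b, common_mix):
    """Difference in WP after both groups are weighted by the same score-state mix."""
    return raw_wp(common_mix, wp_a) - raw_wp(common_mix, wp_b)


if __name__ == "__main__":
    # Same per-state WP in both groups, different mixes: the raw comparison still shows a gap.
    gap = raw_wp(NON_IBB_MIX, HYPOTHETICAL_WP) - raw_wp(IBB_MIX, HYPOTHETICAL_WP)
    print(f"raw WP gap caused by the mix alone: {gap:.3f}")
    # Weighted by a common mix, identical per-state WP gives no difference at all.
    print(standardized_diff(HYPOTHETICAL_WP, HYPOTHETICAL_WP, IBB_MIX))
```

With real per-state WP for each group in place of the placeholders, `standardized_diff` would give the adjusted comparison, though each score state would then rest on a very small sample.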
Peter, I am in NY. Tango can send me your email address or vice versa.
MGL - You have those distributions for 2005-2007 or for your 10 year DB?
05-07 AL only.
Well, I gave it a shot. I set up a little blogger thingie to post the results:
Let me know what you think and what I can do to make it better.
Huh...I guess my comment got flagged as spam.
I gave this issue a look. It’s my first attempt at this sort of thing so go easy on me.
I set up a bloggie thingamajig to post the “study”:
Ryan/11,12 were flagged as spam, and have been unqueued.
Ryan, nice job so far. Can you give us a breakdown by inning? That is important to match up also. A 1 run lead or deficit in inning 1 is quite a bit different than in inning 6, for example. That is another “mistake” that Peter made - not controlling for the inning.
Also, your blog looks like it is underwater or something.
Nice job Ryan. A better study than mine, but so far consistent with what I found. Remains to be seen whether differences in hitting ability of the batters not walked and the batters walked would even out the total of about 6 games cost by the intentional walk over 8 years. And I would use slugging rather than wOBA to measure hitting ability in this instance since wOBA has the value of a walk as a large factor which you would not want for this analysis. For me, what you found still leaves the intentional walk decision in this particular subset taken as a whole in the grey area where it would be difficult to criticize a manager’s decision either way.
I still feel that it would be enlightening if MGL would analyze 4 or 5 specific instances with his simulation.
I would suggest using Table 50 of The Book, or similarly, this one:
http://www.tangotiger.net/RE9902event.html
We care about Linear Weights by base/out situation.
If you are feeling adventurous, you’d want Linear Weights by Game State. I *really* should publish those.
I am sorry Tango, I am not following what you are suggesting in post #16 at all. Could you add some more information about what you would be using Table 50 for?
I was responding to your point here:
“And I would use slugging rather than wOBA to measure hitting ability in this instance since wOBA has the value of a walk as a large factor which you would not want for this analysis.”
What you need is a base/out version of wOBA, one where the value of the BB goes down with 1B open, and shoots way up when the bases are loaded. And, extending further, one based on the game-state (inning, score, base, out).
Good idea, MGL. I altered my procedure and am re-running it now. I will post the results on my site when it is finished.
I also changed my blog colors. I am partially color-blind, so decorating isn’t my forté. : P
Whoa, those smilies are pretty, uh, unmanly there Tango…
I have posted the new study on my site.
The numbers still jump around quite a bit, which suggests to me that the samples are still too small to really say much. In any case, there does seem to be a result in favor of NOT issuing the walk, except in a couple of scenarios (mostly innings 5 and 6).
Hard for me to make heads or tails of any of it. What do you guys think?
If you don’t ignore the first 18 rows and total up all the IBB gains the result is losing about 1.4 games over the 8 seasons. This is before adjusting for the extra hitting ability of the batter that was intentionally walked.
As for making the adjustment for the hitting ability of the walked batter, I see where Tango was going now with his table #50. But I think you would have to create 9 table 50’s, one for each line up position because the lineup positions of the walked batters are not going to be represented equally and they are going to make a big difference in the expected runs scored. Personally, I don’t think it is worth the effort. It may swing the IBB to a slight win gainer, and it may not, but it is not going to remove it from the “grey” area and each decision is still going to depend on the individual factors for the specific situation and not any generic solution.
Good work Ryan. Thanks for doing all the research.
When I remove the restriction about the walk needing four straight intentional balls, I get quite different results:
inning/score  -5          -4          -3          -2          -1          0           +1          +2
------------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
1             0           0           0           1.69014085  -.24590164  -.22109827  .35         -.54901961
2             -.45454545  .25         -2.6276596  .088235294  .088888889  -1.7058824  0           0
3             -.2         -.64        -2.6119403  -.4         -.3196347   -.16666667  -1          .361111111
4             .738095238  .370967742  2.04950495  -.51456311  1.312       1.66666667  .747474747  -.77777778
5             -.51851852  .551724138  2.72972973  2.19166667  2.55905512  6.17687075  -1.3414634  -.77227723
6             0           -.15942029  -3.2727273  -1.5890411  -.40909091  2.41584158  3.07142857  1.2
Ryan - Are you sure that table is correct? It has the intentional walk gaining over 10 wins. That’s quite a swing from the losing 1.4 wins that you had before. You may want to look at just the non 4 ball int walks and see if everything adds up.
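The “over 10 wins” reading can be checked by simply adding up the cells of Ryan’s table. This is only a quick arithmetic sketch; it assumes each cell is a win gain for the intentional walk in that inning/score state, which is how the comment above reads it, and the variable names are made up.

```python
# Cells of the inning-by-score table above, rows = innings 1-6,
# columns = score differential -5 .. +2 (values copied as posted).
TABLE = [
    [0, 0, 0, 1.69014085, -.24590164, -.22109827, .35, -.54901961],
    [-.45454545, .25, -2.6276596, .088235294, .088888889, -1.7058824, 0, 0],
    [-.2, -.64, -2.6119403, -.4, -.3196347, -.16666667, -1, .361111111],
    [.738095238, .370967742, 2.04950495, -.51456311, 1.312, 1.66666667, .747474747, -.77777778],
    [-.51851852, .551724138, 2.72972973, 2.19166667, 2.55905512, 6.17687075, -1.3414634, -.77227723],
    [0, -.15942029, -3.2727273, -1.5890411, -.40909091, 2.41584158, 3.07142857, 1.2],
]

# Per-inning totals, then the grand total across all cells.
row_totals = [sum(row) for row in TABLE]
grand_total = sum(row_totals)

if __name__ == "__main__":
    for inning, total in enumerate(row_totals, start=1):
        print(f"inning {inning}: {total:+.2f}")
    print(f"all cells: {grand_total:+.2f}")
```

Most of the total comes from inning 5, so a few cells (the tied-game cell in inning 5 in particular) drive the whole swing.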
Peter, the problem with using my sim is that it assumes a static hitting (and pitching) profile for every batter in every PA, regardless of the base/out state. We know that with, for example, runners on second and third, 1 out, both the batter and the pitcher significantly change their approach. Ditto for bases loaded. While the sim is good for running a whole game or even analyzing a situation where the hitter or pitcher approach does not matter that much, it does not do a good job if the hitting and pitching approach matters to the analysis. In this case, I really don’t know if it matters that much to the analysis.
Ryan, the sample sizes are way too small to draw ANY conclusions on each individual category or even all categories combined. That is one of the problems with using WE. You generally need enormous sample sizes to find significant differences.
Imagine that you suspect that the difference between one of two alternative strategies is .01 wins, which is actually a lot for one strategy (the equivalent of .1 runs).
You would need 10,000 games for that to show up as 2 SD greater than the null hypothesis, if that makes any sense to you! IOW, if we found a 1% difference in WE between two alternative strategies in 10,000 games, we would say that it is a statistically significant result at the 2 sigma level. Actually, if comparing two samples of games, you would need 20,000 games in each sample to get a combined (sum of the variances) SD of .005 (so that .01 would be 2 SD or 2 “sigma”).
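The 10,000 and 20,000 figures follow from the binomial standard deviation of a win percentage. Here is a minimal sketch, assuming a true win probability near .500 (the worst case for variance) and treating games as independent. The function names are made up for illustration.

```python
def games_needed_one_sample(delta, sigmas=2.0, p=0.5):
    """Games needed so that a WE difference `delta` from a known baseline
    equals `sigmas` binomial standard deviations: sigmas * sqrt(p(1-p)/n) = delta."""
    return p * (1.0 - p) * (sigmas / delta) ** 2


def games_needed_per_sample(delta, sigmas=2.0, p=0.5):
    """Games needed in EACH of two equal-size samples. The variances add,
    so the SD of the difference is sqrt(2 * p(1-p)/n) and n doubles."""
    return 2.0 * games_needed_one_sample(delta, sigmas, p)


if __name__ == "__main__":
    print(games_needed_one_sample(0.01))   # about 10,000 games
    print(games_needed_per_sample(0.01))   # about 20,000 games in each sample
```

Because the required sample grows with the inverse square of the difference, halving the suspected effect to .005 wins quadruples the number of games needed.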
Yes, MGL I understand and fully agree.
If I eliminate the need for pitch sequence data, I can get a much bigger sample. Will work on it.
I don’t think anything will help you get the sample size you “need” other than looking at a hundred years of games maybe.
I am glad that someone has started to look at win expectancy, or in this case, actual wp, and as I have said several times, I was not happy with the way I did the analysis, but…
I can’t see ANY way to draw any conclusions from Peter’s look, for three reasons:
1) Peter says, “Overall the strategy of intentionally walking the batter may have resulted in a net loss of two games over 3 years…” In case that sentence is confusing, he means that the intentional walk cost 2 wins, as opposed to not issuing the intentional walk. In other words, issuing the intentional walk was a worse strategy than not issuing the intentional walk, at least if we only look at what happened with a walk and what happened without a walk, and not adjusting for the fact that the batters being walked are going to be better batters (especially when platoon advantage is considered) than the ones who are not walked. Granted 2 wins is not a lot in 3 years for an entire league (then again, in my article, I also mentioned that the cost of the IBB, assuming there is a cost, is likely to be very small). In his next sentence, he says, “Overall it is likely that the way that American League managers have employed the intentional walk during the last three years has not resulted in any reduction of win potential, and there is no good reason to advise them to change their current strategy.” What? That is quite a leap of logic, after you just get done telling us that issuing the IBB resulted in more losses than not issuing the IBB. Again, I am conceding that the times that the IBB was not issued may not generate as many wins because of the quality of the batters and because of the fact that they all started with a 0-0 count, whereas, as Peter mentions, sometimes the IBB starts with a hitter’s count (although I have always said that it is OK to issue IBB’s when the count goes in favor of the hitter, certainly more OK than with an 0-0 count). How many fewer wins? I have no idea, and neither does Peter. Is it more than the 2 extra losses? Again, we have no idea. How can he then say that it is likely that the IBB’s have not resulted in a reduction of win potential? Even if it is a total of a half win for all 3 years, why do it? To avoid criticism by the fans and media?
Maybe, but that is a separate issue! I tell them what to do to win more games and they decide what to do in the context of their jobs. It is not my job to tell them whether and when to sacrifice WE in order to cover their asses.
2) In the group where the pitching team is ahead or tied, they issue the IBB 42 times and lose 23, or 54.8%. When they don’t issue the IBB, they lose 373 out of 654 times, or 57%. So it looks like they are better off by 2.2%, assuming everything else is equal, which it likely isn’t, but that is another story. 2.2% in 654 and 42 games. Hmmm. One standard deviation due to chance for the difference between two samples, 654 and 42, is 7.9%! So the difference he found is significant at the .28 sigma level. I am not a big fan of deciding that something is “significant” or not based on some artificial and arbitrary cutoff of 2 or 2.5 sigma, but you cannot draw any conclusions whatsoever from a difference that is significant at the .28 sigma level! Peter, you should have immediately told the readers that, “Absolutely no conclusions can be drawn from this small sample of data, and the results that I got (a difference of 2.2%),” or you should have stopped dead in your tracks and used a larger sample size. Are you kidding me? If we take the whole sample, combining the two groups, behind in the game and ahead or tied in the game, we have 92 IBB and 1052 non-IBB, any difference having a standard deviation due to chance of 5.4%. With such a large standard deviation, you are simply not going to be able to find anything of interest/significance in 92 IBB! That should be the end of the discussion.
3) Finally, I agree with Peter that when comparing the batters when issuing the IBB and not issuing the IBB there is some chance that when not issuing the IBB the run/win potential is lower than if you allowed the IBB batters to hit away (maybe – even though the batters IBB’d are likely better than the batters not IBB’d, it is also likely that the next batter is better in the non-IBB group - and both batters count!). I also agree that when the count goes in the hitter’s favor, it is more likely to be correct to issue the IBB (a separate issue, though – he and I certainly could have just looked at IBB’s at the beginning of the PA and avoided that issue). However, his methodology only “works” if everything about the two teams and the game situation is the same. If it is not, you obviously cannot compare WP. And in only 92 IBB situations, there is a pretty good chance that the teams, pitchers, environment, etc. are not going to be the same, just by chance alone, let alone any systematic biases that may exist that render the WE inherently unequal when the batter who is either IBB’d or not IBB’d steps up to the plate. Finally, you can absolutely not do what Peter did (compare the actual WP when a batter was IBB’d and when he wasn’t) unless you make sure that there are an equal number of tie games, games where the pitching team is up or down by exactly 1 run, 2 runs, etc. You cannot just break everything down into 2 groups and then compare the actual WP. And remember, there were only 92 IBB’s! What if there were 23% tie games in the IBB group and 28% in the non-IBB group? You can’t possibly compare the WP of the two groups if that were the case, unless you compared them “tit for tat,” and then your individual sample sizes would be tiny. As I said, there needs to be the same percentage of tie games, games where the defense is up by 1 run, down by 1 run, etc., in the two groups.
In fact, it is likely that the exact distribution of run differential is different when a team IBB’s and when they don’t. If that is the case, and I don’t know that it is or it isn’t, the kind of analysis that Peter does is going to be worthless.
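The standard deviations quoted in point 2 can be reproduced with the usual formula for the difference between two independent proportions. This is a rough sketch using the counts given there (23 losses in 42 IBB games, 373 losses in 654 non-IBB games); for the combined 92 vs. 1052 comparison it assumes a loss rate of .500 in both groups, which is my assumption and not something stated above. Function names are made up for illustration.

```python
import math


def se_diff(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions
    (square root of the sum of the two binomial variances)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Ahead-or-tied group from point 2: pitching-team loss rates with and without the IBB.
ibb_loss = 23 / 42
non_ibb_loss = 373 / 654
se_ahead_or_tied = se_diff(ibb_loss, 42, non_ibb_loss, 654)
sigmas = (non_ibb_loss - ibb_loss) / se_ahead_or_tied

# Whole sample, ASSUMING a .500 loss rate in both groups (worst case for variance).
se_all = se_diff(0.5, 92, 0.5, 1052)

if __name__ == "__main__":
    print(f"SE, ahead or tied: {se_ahead_or_tied:.3f}")
    print(f"observed gap in sigmas: {sigmas:.2f}")
    print(f"SE, all 92 IBB vs 1052 non-IBB: {se_all:.3f}")
```

Almost all of the uncertainty comes from the small IBB group, so adding more non-IBB games would barely shrink the standard error.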
Sorry, but while I was not crazy about my analysis, and Peter is on the right track with looking at actual WP, I just don’t see how Peter’s analysis is a rebuttal to anything.
And please, let’s be clear about one thing! My analysis indicated that the RE in the 6th inning or earlier, after issuing an IBB, was quite a bit higher, and in fact the actual runs scored were quite a bit higher (and almost exactly what we would have expected using various mathematical models), than the runs that we would have expected to score had those batters not been walked. Whether that means that the WE was higher without the walk, I don’t know. I suspect that it was, but I don’t know.