Friday, May 06, 2011
Re-ordering batting streaks
It’s an interesting exercise, and well-presented. I think the most likely explanation is that the PH and non-PH games were intermingled. What do you guys think?
Interesting paper. They seem determined to present evidence that a hot hand exists no matter what the data says. They present lots of excellent reasons for the disparity between the number of real-life streaks of various lengths and that expected from their sims, yet they still proclaim that it is likely that the hot hand exists as well. I don’t see that in their data. At least they have not convinced me of it from their data. And I don’t know why they dismiss some of the things that should not be dismissed. For example, they say that playing a 3 or 4 game series in a hitter’s park (or playing a series against a team with bad pitching - the spread in the quality of a pitching staff is not randomly distributed among teams) is not enough to produce long hitting streaks. That is simply not true. Anything which makes one game not independent of another will contribute to more streaks (good and bad) of ALL lengths…
McCotter’s findings are largely or entirely a function of the fact that a hitter’s number of at-bats on each day is not independent from the day before. Any clustering of low-PA games means that a random re-sorting of the season will then greatly understate the true probability of streaks. We discussed this at Phil’s site: http://www.blogger.com/comment.g?blogID=31545676&postID=1208540960185471986.
Pinch hitting is indeed a big part of the problem. Toward the end of the paper he reports the results of an analysis using only games in which the player started. Just doing that, fully 82% of the “surplus” 20-game streaks disappear, and for 15-game streaks there are no longer any extra streaks. (But a significant gap persists at 30 games.)
But there are other ways in which PAs are not independent. A player with a nagging injury may get pulled early for defensive reasons several games over a short period, for example. Most important is lineup placement: a hitter who changes lineup position during the season will have a greater chance of a streak than this method estimates. A couple of examples:
* The year B. Santiago had a 34-game streak, he batted mainly 6th or 7th in April and May and averaged 3.76 PA/G. Then he moved up to 5th, and soon thereafter started his streak, during which he averaged 4.08 PA/G. So if you go back and randomly mix up Santiago’s games, you’ll underestimate the chance of his streak.
* Chase Utley (2006) batted 4th or 5th the first 17 games of the season, averaging just 4.1 PA, then moved to the 2nd slot where he averaged about 4.7. He had 9 0-fers in those first 17 games, a much higher rate than the rest of the season.
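To make the lineup-move point concrete, here is a rough sketch of why game order matters. It assumes every at-bat is an independent trial at a fixed .300 average, and the two 80-game "seasons" are invented for illustration (they are not Santiago's or Utley's actual game logs). The function name `prob_of_streak` is my own.

```python
def prob_of_streak(at_bats_per_game, streak_len, batting_avg=0.300):
    """Chance of at least one hitting streak of `streak_len` games,
    with the games played in the order given.

    Assumes each at-bat is an independent trial at `batting_avg`.
    """
    # state[k] = probability that the current run of games with a hit is
    # exactly k games long and no full-length streak has happened yet.
    state = [0.0] * streak_len
    state[0] = 1.0
    done = 0.0  # probability that a full-length streak has already occurred
    for ab in at_bats_per_game:
        p_hit = 1.0 - (1.0 - batting_avg) ** ab  # at least one hit this game
        new_state = [0.0] * streak_len
        for k, mass in enumerate(state):
            new_state[0] += mass * (1.0 - p_hit)  # hitless game resets the run
            if k + 1 == streak_len:
                done += mass * p_hit  # this game completes the streak
            else:
                new_state[k + 1] += mass * p_hit
        state = new_state
    return done


# Hypothetical seasons with identical totals (40 games of 3 AB, 40 of 5 AB).
clustered = [3] * 40 + [5] * 40   # low slot early, then moved up the order
interleaved = [3, 5] * 40         # same games, but mixed together

p_clustered = prob_of_streak(clustered, streak_len=20)
p_interleaved = prob_of_streak(interleaved, streak_len=20)
print(f"clustered:   {p_clustered:.4f}")
print(f"interleaved: {p_interleaved:.4f}")
```

Under these assumptions the clustered season gives a noticeably higher chance of a 20-game streak than the interleaved one, even though the hitter and the total at-bats are identical. Randomly re-sorting a real season pushes it toward the interleaved case.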
For longer streaks, you also have to assume that managers take steps to ensure the maximum number of PAs. The streaking hitter will never be removed early in the game (unless he has a hit), and might even be moved higher in the batting order. But I think the bigger factors are those not directly related to the streak.
Excellent point Guy! So, in the games in which they had their streak (plus the game after natch), how many at bats per game did they have? And show it by 20-29 game streaks, and 30+ game streaks. We should see a slight uptick in the latter set, right?
Yes, the paper mentions that hitters have more PA/game during their streaks than in the rest of that season. The author sees this as evidence that managers reward streaking hitters with more PA, which surely does happen late in the longer streaks. But mainly he’s reversing cause and effect: long stretches of high-PA games cause hitting streaks, not vice-versa.
Actually, it would be more accurate to say long stretches WITHOUT low-AB games is the key. It’s the 1 AB and 2 AB games that are real streak killers. Here are probabilities of a 20-game streak for a .300 hitter averaging 4 ABs, but with different AB configurations:
20 G @ 4 AB: .0041
4 G @ 2 AB, 8 G @ 4 AB, 8 G @ 5 AB: .0017
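For anyone who wants to check those two numbers, here is a minimal sketch. It assumes each at-bat is an independent trial at .300, so the chance of a hit in a game with n ABs is 1 - 0.7^n, and the streak probability is the product over the 20 games. The function name is mine.

```python
def streak_probability(at_bats_per_game, batting_avg=0.300):
    """Probability of at least one hit in every listed game,
    assuming independent at-bats at a fixed batting average."""
    prob = 1.0
    for ab in at_bats_per_game:
        prob *= 1.0 - (1.0 - batting_avg) ** ab
    return prob


uniform = [4] * 20                     # 20 games at 4 AB
mixed = [2] * 4 + [4] * 8 + [5] * 8    # same 4.0 AB average, uneven spread

p_uniform = streak_probability(uniform)
p_mixed = streak_probability(mixed)
print(f"{p_uniform:.4f}")  # 0.0041
print(f"{p_mixed:.4f}")    # 0.0017
```

Both configurations average exactly 4 ABs per game. The four 2-AB games alone cut the streak chance by more than half.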
Here’s a way the author could figure out whether opportunities, rather than a hot hand, explain the results, using his existing database. Use only games where the hitter started (could also throw out all seasons by <.280 hitters, since it’s the long streaks we care about). Looking at the 30-game segments, calculate the percentage of them that contained two or fewer games of <3 ABs, i.e. good streak conditions. Then do the same for the randomized segments he created. I am quite sure that the percentage will drop in the randomized segments. You could calculate how much the likelihood of a 30-game streak is reduced by inclusion of each <3 AB game, and estimate how much of the “extra” hit streaks are explained by the disparity.
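A rough sketch of that comparison is below. It is not the paper's method; the function names, the 200-shuffle default, and the 120-game example season are all made up for illustration. It assumes the input is one hitter's at-bats per game, in calendar order, already limited to games he started.

```python
import random


def share_of_good_windows(at_bats, window=30, low_ab=3, max_low_games=2):
    """Fraction of consecutive `window`-game segments that contain at most
    `max_low_games` games with fewer than `low_ab` at-bats."""
    n_windows = len(at_bats) - window + 1
    if n_windows <= 0:
        return 0.0
    good = 0
    for start in range(n_windows):
        segment = at_bats[start:start + window]
        low_games = sum(1 for ab in segment if ab < low_ab)
        if low_games <= max_low_games:
            good += 1
    return good / n_windows


def shuffled_share(at_bats, trials=200, seed=0, **kwargs):
    """Average share of good windows over random re-orderings of the season."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(trials):
        games = list(at_bats)
        rng.shuffle(games)
        total += share_of_good_windows(games, **kwargs)
    return total / trials


# Hypothetical season: 12 low-AB games bunched together, then 108 full games.
season = [2] * 12 + [4] * 108

actual = share_of_good_windows(season)
randomized = shuffled_share(season)
print(f"actual order: {actual:.3f}")
print(f"shuffled:     {randomized:.3f}")
```

In this invented season the low-AB games are bunched, so most real 30-game windows are clean. Shuffling spreads them around and the share of good windows drops, which is the disparity described above.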
I also wonder how biased umpires and scorers are when a hitting streak is on the line. I imagine that questionable calls tend towards keeping streaks alive.
David/#7: That seems plausible. I think someone did research on errors and found that hitters in the middle of long streaks had fewer ROE than expected. Does anyone remember that?
But I think one mistake people make in thinking about why there are “too many” streaks is focusing only on factors that presume a significant streak is underway (like official scorers giving credit for a hit on an error, or managers giving a hitter more PAs). The vast majority of potential long streaks are “lost” in game 6, or game 9, or game 12. I doubt that scorers, managers, or umpires are influenced in any way until a player has a streak of at least 20 games, maybe more. So the main explanation has to lie in other factors (mainly, the number of ABs being correlated from one game to the next).
I don’t think it’s just a question of errors though. There are far more opportunities for umpires to call close strike threes as balls and bang-bang plays as safe when a streak is on the line.
Now I can’t say that they do favor the streaking player (or whether it is even a conscious decision if they do), because I haven’t looked into it, but I would not be surprised if the data is fairly dirty.
I bet the impact on umps is quite small. Now, if a guy had a 55-game streak, was 0-for-2 in the 6th, and had a 1-2 count, will he get a break on a borderline pitch? Maybe (unless the ump is a huge DiMaggio fan). But below 40-game streaks, and prior to late innings, I doubt there’s an effect. Nothing you could measure, anyway.
I just read the article a couple of days ago and, while I want to re-read it to make sure I didn’t miss something, there were a few thoughts I had:
1) Health is something which is not iid. If you happen to stay healthy and avoid little injuries for a period you will probably be better during that period than you will over the year on average. This seems like it could bring about the effect but I’m not sure how you could test it.
2) If they are changing their approach that is something we should be able to test for, specifically. As their average went up did their SLG or OBP go down in exchange? They would have to be making a trade-off, right? Either way it should show up in a decreased wOBA and you should be able to show this. I suppose they could be hitting a different but equal equilibrium where their average is higher and their wOBA is exactly the same but if it is actually higher then they should always choose to play that way, if they can do that.