THE BOOK--Playing The Percentages In Baseball

Tuesday, October 31, 2006

Winning Streaks

By Tangotiger, 01:00 PM

Dackle gives us some great research.  Suppose you have a team that went 0-10.  And in the 5 games preceding that 10-game run, they were .358.  What would you expect their record to be after the 10-game run?  And suppose you have a team that went 10-0, and in the 5 preceding games they were .610, what’s your expectation for the games following the streak?  In short, does the hot/cold hand continue, or will teams simply revert to their previously established levels?

Then, dackle also shows the record game-by-game, leading up to the run.  In essence, when you are hot, you are hot… until you are not.  There’s no runup: nothing in the record signals that the run is about to start, nor that it is about to stop.


#1    MGL      (see all posts) 2006/10/31 (Tue) @ 14:42

Yawn.

I did team hot/cold hand research 20 years ago and found nothing as well, which is exactly what I would have expected.

You can print this study (which is sound) on the front page of every newspaper and magazine in the world and still you would hear nothing but globs and globs of gushing from fans, commentators, the media, players, coaches, managers, FO personnel, etc., about how so-and-so team is hot or cold with the clear implication being that it has significant predictive value.

It is worth noting, BTW, why the post-streak records for the cold teams tend to be a little worse than the pre-streak records, and vice versa for the hot teams.  The true strength of the team is the pre-streak record PLUS the record during the streak itself.  Assuming the streak has no predictive value (other than that bad teams have more bad streaks and good teams have more good streaks, or that teams with bad streaks tend to be bad teams, etc.), we expect the post-streak record to be exactly equal to the pre-streak record plus the “in-streak” record, with a little regression on the in-streak record thrown in, since we are selectively sampling teams that have had good or bad streaks.  (BTW, the pre-streak records need no regression, as they are randomly, not selectively, sampled.)

Sports is not important enough in the scheme of life for sabermetric research to have much impact on the “false and misleading ways” in which people, even intelligent ones, interpret such data as hot and cold streaks.

IOW, studies like Dackle’s can be published, books like The Book can be written, and still, 50 years from now, sports commentators will be saying all the stupid things they say now, maybe even more so. It is not like discovering that the world is not flat or that the Earth revolves around the Sun. No one cares (but sabermetricians and some scientists) that hot and cold streaks have little predictive value or that there is not much of a clutch hitting skill or that the best batter in the lineup should usually not bat third or that certain players should or should not attempt steals or bunts at various times, etc.  In fact, not only do most people not care, but knowing the truth would actually make sports and life itself less interesting for them.

For me and for a really small minority of people, life is more interesting when I discover and sometimes reveal objective truths.


#2    Guy      (see all posts) 2006/10/31 (Tue) @ 15:28

I think the persistence of these ideas reflects something other than people being “stupid,” or even ignorant.  Human intuition seems to consistently lead us to certain conclusions that contradict correct probability analysis.  The people who do behavioral economics have documented a number of these.  Part of what is happening is that our brains are wired to look for patterns in whatever data we’re exposed to.  We see patterns even when they aren’t in fact meaningful statistically. 

Why would the human brain do this?  One possibility, I think, is that in evolutionary terms a creature who looks for patterns probably survives more than one who doesn’t.  (Standing in field during thunderstorms bad, same for petting saber-tooth tigers.) Sometimes the pattern is wrong and we’ll act incorrectly.  But we’ll probably do a lot better survival-wise than someone who doesn’t perceive and act on patterns.


#3    MGL      (see all posts) 2006/10/31 (Tue) @ 23:17

Stupid or ignorant is really for lack of better words or simply somewhat arbitrary terms.  In any case if I call someone or something stupid or ignorant, what does that really mean?  Nothing without qualification.

I also agree that since the kind of thinking we are talking about is so pervasive, there must be some evolutionary advantage to it, or at least there was at some point in time.  As I said before, if nothing else it probably makes life more interesting to most people.  Far be it from me to speculate or theorize beyond that.  I either cut or slept through my anthropology and sociology courses in college, if I am even quoting the right disciplines.


#4          (see all posts) 2006/11/01 (Wed) @ 12:08

I tread this ground with apprehension, but let’s get metaphysical: I think the thinking that informs streaks/clutch/etc is, in fact, utterly important to the world because we do it in every facet of our lives.

If a woman contracts breast cancer and dies at age 48, the questions we frequently hear are “Why did it happen to her?” “Why did she deserve this?” or “Why did this happen to someone so young?” The truth is that statistically and probabilistically it happens. There is no why. But we ask these questions yearning for the connection to some truth to explain what is simply a function of life: events happen. And they happen in probabilistic ways and can be described by statistical means. But we’d rather seek a character flaw, a behavioral flaw, or even a dietary flaw to explain illness---or any “bad” event---because we can comfort ourselves in the illusion that it maybe, probably couldn’t happen to us.

Or take addiction, the ultimate losing streak.  Our country reviles drug addiction, but in reality addicts are no better or worse than anyone else. They’ve made mistakes (don’t we all?), but that doesn’t make their innate character inferior to someone else’s. How can I know this is true? Because people recover from addiction to lead productive, moral, compassionate lives. If they were bad people to begin with, and if character mattered so much as we all say it does, then every addict, being assumed to be of poor character, would fail to recover.

You could make similar statements about mental illness, incarceration, or even inverse arguments about white-collar crime.

What I think happens in these scenarios is that no one wants to recognize that bad things happen...to everyone. That no one gets a break, but that some people do endure worse because that’s the nature of an essentially randomly assorted world as opposed to a question of character.

So instead of recognizing that the world happens pretty randomly, we ask ourselves what someone did to deserve a bad break (or a good one!), and we take a comparative inventory of our sins and values versus theirs.

I think a probabilistic view of the world that says, “yes, anything can happen to anyone, anytime” might increase the compassion we have for others rather than push us toward assuming we have a moral one-upmanship on people with problems. And that’s why I, for one, think this stuff is important to the whole world. Yeah, it’s a leap, and, yes, I’m an idealist, but it’s there for me at least.

Sorry if no one wanted to go in this direction or if it’s too heavy or whatever.


#5    tangotiger      (see all posts) 2006/11/01 (Wed) @ 13:13

I don’t agree with the jump in Dr. C’s drug addict scenario.  I would bet that a drug addict’s character is inferior to a non-drug addict’s.  It’s not to say ALL drug addicts are worse than ALL non-drug addicts.  Other than saying “ALL people are human”, the qualifier ALL never works.  But, the average drug addict is likely inferior to the average non-drug addict, ceteris paribus, character-wise.  Note my qualifier.

And while the events around us are pretty random, our choices are certainly not.  However, the point is, can we find this before it happens?  Can we figure out who would run into a burning building to save the baby?  Can we figure out who is more likely to hit the game-winning HR?

We all think we see patterns, but most people wouldn’t bet on it.  Certainly not without hedging their bets.


#6    Guy      (see all posts) 2006/11/01 (Wed) @ 15:28

While I think there’s something to the idea that humans feel a need to attach “meaning” to things that may in fact be random events, I was trying to get at something a bit different.  There are sabermetric insights that I think casual fans will be open to considering if/when they are exposed to them.  Example:  the player with the highest BA is not necessarily the “best hitter.” While a fan may have been taught to overvalue BA, there’s nothing intuitive about that.

In a separate category are issues where our intuition about the world is simply wrong.  Most people believe, unless they’ve been taught otherwise, that a coin that comes up heads 5x in a row is more likely to come up heads on the next flip. Same thing for a ‘hot’ team.  In the real world, that’s probably a good instinct—without some assurance it’s a fair coin, the chance of heads is probably pretty high.  If 3 of the last 4 people I know who’ve gone to a particular neighborhood have been mugged, is it prudent to avoid the neighborhood, or say “bah, an N of 3 means nothing?” Put me down for “avoid.” But that instinct will lead us astray at times.  And people DO bet on improperly perceived patterns, Tango:  chasing hot stocks, for example.

Bottom line:  I think it will be difficult, if not impossible, to persuade the general public there’s no such thing as a hot player or team.
But that doesn’t mean lots of other misconceptions can’t be overturned.


#7    tangotiger      (see all posts) 2006/11/01 (Wed) @ 15:34

Stocks: that’s still hedged, since they won’t put all their money on it.  If they chase a volatile hot stock, they may be inclined to put some of the money in something safe.

In any case, it’s like flipping a coin.  Whether they chase a runup or not, they’ll still be at the same point compared to betting on the index.

The mugging is different since there are so few muggings.  You have a built-in prior of say one mugging per year in a neighborhood.  If you see 3 in one week, that’s significant.

Coins: I suppose if it’s a stranger, and you see heads-after-heads in a row, your prior on “chance it’s weighted” goes up significantly after each flip.  If it’s your friend, your prior on a weighted coin is close to nil.
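The stranger-versus-friend point can be put as a Bayesian update. The sketch below is only an illustration: the 90% heads rate for a weighted coin and the two starting priors (10% for a stranger, 0.1% for a friend) are made-up assumptions, not numbers from this thread. Strictly speaking, the number that rises after each flip is the posterior, which then serves as the prior for the next flip.

```python
def posterior_weighted(prior, n_heads, p_heads_weighted=0.9, p_heads_fair=0.5):
    """Probability the coin is weighted after seeing n_heads heads in a row.

    prior is the belief, before any flips, that the coin is weighted.
    The 0.9 heads rate for a weighted coin is an assumed value.
    """
    weighted = prior * p_heads_weighted ** n_heads
    fair = (1 - prior) * p_heads_fair ** n_heads
    return weighted / (weighted + fair)

# Assumed priors: a stranger's coin vs. a friend's coin
for label, prior in [("stranger", 0.10), ("friend", 0.001)]:
    for n in range(6):
        print(f"{label}: {n} heads in a row -> {posterior_weighted(prior, n):.3f}")
```

Under these assumptions, five straight heads moves the stranger's coin from a 10% suspicion to roughly two in three, while the friend's coin stays under 2%.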


#8          (see all posts) 2006/11/01 (Wed) @ 17:29

MGL (comment 1),

I’m not sure I agree with you that you’d expect the post-0-10-streak records to be worse than the pre-0-10-streak records.  Both sides are selectively sampled based on the 0-10, so they should be the same.  It seems to me that the fact that one happened chronologically earlier and the other happened later should have no bearing.

Might another reason for the effect be that 0-10 teams are more likely to have had an injury during the 0-10, which would carry forward but not backward?


#9          (see all posts) 2006/11/01 (Wed) @ 17:30

Another interesting thing is that in the second chart, every pair of numbers adds up to less than 1.000.  You’d expect that to be the average, with half above 1.000 and half below 1.000.  What’s the deal there?


#10    Guy      (see all posts) 2006/11/01 (Wed) @ 18:25

Tango:
Small investors, as a group, significantly underperform the market as a whole.  They buy and sell at the wrong times, doing literally worse than Marcel throwing darts. 

I’m sure people would also make huge mistakes in purchasing insurance—home, life, health—without insurance agents and government regulation to provide guidance and protection. People greatly overestimate some risks, underestimate others. 

I’d be interested in hearing from any math teachers who frequent this board, but in my experience teaching probability is very hard to do, and that’s because much of it is not simply unintuitive, it’s counter-intuitive.


#11    Dackle      (see all posts) 2006/11/02 (Thu) @ 13:10

Just wanted to chip in with an explanation of why the #s in the second table don’t average out to .500. I took a little shortcut when I worked out the average winning percentage at each 10-game level—rather than totalling the # of wins and dividing by the # of games, I just averaged the w% at each 10-game integer. For example, five games before the 10-game run, the records for the cold teams were:

0-10 ...    442-  741 (.374)
1-9  ...   2629- 3886 (.404)
2-8  ...   8547-11054 (.436)
3-7  ...  18437-21392 (.463)
4-6  ...  28484-31116 (.478)
Total     58539-68189 (.462)

Instead of using the .462, I went (.374+.404+.436+.463+.478)/5=.431
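The gap between the two figures is just an unweighted versus a weighted average. As a quick sketch using the cold-team records listed above (the variable and function names are only for illustration):

```python
# Cold-team records five games before the 10-game run, keyed by 10-game record.
# Values are (wins, losses), copied from the list above.
records = {
    "0-10": (442, 741),
    "1-9": (2629, 3886),
    "2-8": (8547, 11054),
    "3-7": (18437, 21392),
    "4-6": (28484, 31116),
}

def win_pct(wins, losses):
    return wins / (wins + losses)

# Shortcut: plain average of the five winning percentages,
# which gives the small 0-10 group the same weight as the large 4-6 group
unweighted = sum(win_pct(w, l) for w, l in records.values()) / len(records)

# Pooled: total wins divided by total games
total_wins = sum(w for w, _ in records.values())
total_losses = sum(l for _, l in records.values())
pooled = win_pct(total_wins, total_losses)

print(f"average of w%: {unweighted:.3f}")
print(f"pooled w%:     {pooled:.3f}")
```

The shortcut comes out lower because the rare 0-10 and 1-9 groups, which have the worst records, count as much as the far more common 3-7 and 4-6 groups.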

Sorry about that, this was a study I banged off a few months ago and scribbled down in a notebook for posterity (I can be a bit lazy with myself but not for other people). Here’s the table using total wins/total games rather than average w%. The scale changes but the effect is the same --

                    4-6 or worse   6-4 or better
5th game before         .462           .535
4th game before         .462           .536
3rd game before         .462           .538
2nd game before         .460           .538
Game before             .458           .541
------------------- 10-game run -------------------
1st game after          .457           .541
2nd game after          .459           .540
3rd game after          .462           .538
4th game after          .464           .537
5th game after          .463           .536

The other factor to consider is that 5-5 teams are being excluded, and for whatever reason, 5-5 teams were slightly above average (in first table they showed as .504 before, .501 after), so that drives the average of the 4-6 or worse and 6-4 or better teams slightly below .500. When 5-5 teams are included, then the total becomes .500. It’s also worth noting that the study basically includes game #15 to game #157 for most teams (at the beginning of the season, a set of five games for the “before” column, and then the first 10-game run. The last five games of the year are reserved for the “five games after” the given 10-game run). In some cases, team A is playing its 16th game of the season, team B is playing its 14th, so there’s a corresponding debit at one 10-game integer with no corresponding credit.

Also, for the first table I think the 19th century teams might be driving up and down the 10-0 and 0-10 record. If you win 20 games in a row, then you’ve got a lot of 10-0 records and also a 5-0 record in the next five games.


#12    awsytn      (see all posts) 2006/11/02 (Thu) @ 14:29

I will say that I find a lot of the appeal of baseball to be in its randomness, and that like life, a lot of what people think they know to be true about it simply isn’t the case.


#13    MGL      (see all posts) 2006/11/03 (Fri) @ 00:29

Phil, those selective sampling issues can be tricky.  I’d have to think about what I said and what you said and maybe even run it through a mini-sim, which is what I often do to solve those tricky selective sampling problems.  IOW, you might be right.  It sounds like you are not 100% sure, or you were just being nice to me.  I actually think you might be right (that the pre and post streak records should be exactly the same), but again, I have to think about it, or, as I said, run a sim of many teams with various true talents and then look at all 10-0 and 0-10 streaks and look at pre and post-streak records.  If I run enough monte carlo trials of course I will get the correct answer.
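A mini-sim along these lines might look like the sketch below. Everything specific in it is an assumption for illustration: true talents are drawn from a normal distribution centered on .500 with an SD of .060, every game is an independent coin flip at the team's true talent (no opponents, injuries, or schedule), and seasons are 162 games. It collects every 10-0 and 0-10 run and compares the five games before with the five games after.

```python
import random

def simulate(n_seasons=20000, n_games=162, talent_sd=0.06, seed=1):
    """Return {streak: (w% in the 5 games before, w% in the 5 games after)}."""
    rng = random.Random(seed)
    # keyed by wins in the 10-game run:
    # [wins before, games before, wins after, games after]
    tally = {10: [0, 0, 0, 0], 0: [0, 0, 0, 0]}
    for _ in range(n_seasons):
        # assumed talent distribution: normal around .500, clipped to [0, 1]
        p = min(max(rng.gauss(0.5, talent_sd), 0.0), 1.0)
        games = [1 if rng.random() < p else 0 for _ in range(n_games)]
        run_wins = sum(games[5:15])  # first window that has 5 games before it
        last_start = n_games - 15    # last window that has 5 games after it
        for start in range(5, last_start + 1):
            if run_wins in tally:    # the run was 10-0 or 0-10
                t = tally[run_wins]
                t[0] += sum(games[start - 5:start])
                t[1] += 5
                t[2] += sum(games[start + 10:start + 15])
                t[3] += 5
            if start < last_start:
                # slide the 10-game window forward by one game
                run_wins += games[start + 10] - games[start]
    return {
        ("10-0" if k == 10 else "0-10"): (t[0] / t[1], t[2] / t[3])
        for k, t in tally.items()
    }

results = simulate()
for streak, (before, after) in results.items():
    print(f"{streak}: {before:.3f} before, {after:.3f} after")
```

If games really are independent at a fixed talent level, the before and after columns should agree within sampling noise for both kinds of streak. A real-world gap would then point to something the sim leaves out, such as injuries or roster changes.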


#14    tangotiger      (see all posts) 2006/11/03 (Fri) @ 05:04

I agree with Phil that the pre and post must be exactly the same.  Since you selected on the 10-0 or 0-10 sample, that sample no longer plays a part in estimating the talent level.

In short, your choices are:
1. all I know is that the team is 10-0.  In order to determine the true talent (i.e., pre and post streak), you regress an appropriate amount. 

In this case, the regression equation is x/(x+G), where x=40, and G=games played in streak (i.e., regress 80% toward .500 with 10 games, or regress 50% toward .500 with 40 games.)

2. of the teams that were 10-0, I know that they played .600 before the streak.  That is their true talent, and they will play .600 after the streak.

***

Both come with an uncertainty level, of course.

***

Interestingly, the standard regression equation, regardless of streak or not (i.e., just choose a random sample) is:
regression = x/(x+G), where x=69.

That is, if you have 69 games, you regress 50% toward the mean.  In dackle’s sample, you’d regress only 37%.  It seems then that the streaks contain more information than a random sample of games.
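For anyone who wants to plug in other numbers, the regression equation x/(x+G) takes one line. This is only a sketch of the formula as stated in this comment; the function name is made up for illustration.

```python
def regression_fraction(games, x):
    """Share of the way to regress an observed record toward .500: x / (x + G)."""
    return x / (x + games)

# x = 40 is the constant implied by dackle's streak sample,
# x = 69 is the constant for a random sample of games
print(f"10 games, x=40: {regression_fraction(10, 40):.0%}")
print(f"40 games, x=40: {regression_fraction(40, 40):.0%}")
print(f"69 games, x=69: {regression_fraction(69, 69):.0%}")
print(f"69 games, x=40: {regression_fraction(69, 40):.0%}")
```

The smaller the constant x, the less you regress for a given number of games, which is the sense in which the streak sample seems to carry more information.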


#15          (see all posts) 2006/11/03 (Fri) @ 07:49

Hi, MGL,

Both ... I’m only maybe 95% sure, and I’m also being nice!


#16          (see all posts) 2006/11/03 (Fri) @ 08:19

Tango,

Could the difference between the denominators of 40 and 69 just be random chance in the sample?

I ask this because I can’t think of any decent reason why they should be different.  (That doesn’t mean there isn’t one.)


#17    tangotiger      (see all posts) 2006/11/03 (Fri) @ 09:05

I wouldn’t be surprised if it is.  The 69 is pretty solid.  If we applied that to the equation at 10 games, it would put the regression amount at 87% instead of the 80% from dackle’s sample. So, a 10-0 team would be a true .565, instead of the expected .600, or the empirical .610-.620.  The empirical figure is based on only 1031 games.

The 6-4 streaked team should regress to .513 using the “69” rule, .520 using the “40” rule, and .522 using empirical.  There are 60,871 games in the sample, giving us 1 SD = .002.
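These figures can be checked with a few lines of arithmetic. The sketch below is not a new method: the function names are made up, and the SD line assumes the .002 comes from the usual binomial formula sqrt(p(1-p)/n) with p = .5.

```python
import math

def regress_to_mean(observed, games, x, mean=0.5):
    """Regress an observed winning percentage toward the mean by x / (x + games)."""
    fraction = x / (x + games)
    return mean + (1 - fraction) * (observed - mean)

def binomial_sd(n_games, p=0.5):
    """One standard deviation of a winning percentage over n_games (assumed formula)."""
    return math.sqrt(p * (1 - p) / n_games)

print(f"10-0, 69 rule: {regress_to_mean(1.0, 10, 69):.3f}")
print(f"10-0, 40 rule: {regress_to_mean(1.0, 10, 40):.3f}")
print(f"6-4,  69 rule: {regress_to_mean(0.6, 10, 69):.3f}")
print(f"6-4,  40 rule: {regress_to_mean(0.6, 10, 40):.3f}")
print(f"1 SD over 60,871 games: {binomial_sd(60871):.3f}")
```

Unrounded, the 69 rule gives about .563 for a 10-0 team; the .565 quoted above comes from rounding the regression amount to 87% first. For the 6-4 teams, the gap between the .513 from the 69 rule and the empirical .522 is several SDs at .002 per SD.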

It’s possible that what a “streak” team denotes is a healthy team or disabled team.  And so, it contains more information than a random set of games.  It’s possible that a team that goes 7-3 means that we “know” Santana is pitching and healthy, and so, we expect that team to be a bit better.  Just speculating.


#18          (see all posts) 2006/11/03 (Fri) @ 09:29

Tango,

All makes sense.  Your first sentence implies that you think it’s *not* random, right?  Because that’s what I conclude from your calculation.


#19    tangotiger      (see all posts) 2006/12/22 (Fri) @ 22:30

Thanks to the Hot Hand for linking to this blog, and recognizing dackle’s work:

http://thehothand.blogspot.com/2006/12/every-so-often-one-hears-reference-to.html

