THE BOOK--Playing The Percentages In Baseball


Tuesday, March 17, 2009

Being Behind is a Good Thing (Part II)?

By , 07:12 PM

Follow-up to Part 1.

OK, I have halftime stats for all NBA games from 2001/2002 to 2006/2007.

That is not a whole lot of games to be breaking things down by halftime point differential, but here goes:


Halftime lead    Games    Final WP of leading team
0                  203    .500 (of course)
+1                 472    .456
+2                 413    .557
+3                 412    .604
+4                 403    .598
+5                 390    .664
+6                 387    .698
+7                 368    .750
+8                 314    .717
+9                 310    .803
+10 or more       1899    .888

What do you know!  The same effect!  Assuming that the expected wp is around .525, the sample wp is 6.9 percentage points less, which is just over 3 standard deviations for 472 games.
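A quick way to check the “over 3 standard deviations” claim is to treat each of the 472 games as an independent trial with the assumed .525 expected win probability (the .525 is the post’s own rough assumption, not something fitted from the data). A minimal sketch; `z_score` is just an illustrative helper name:

```python
import math

def z_score(observed_wp: float, expected_wp: float, n_games: int) -> float:
    """Distance of an observed win% from its expectation, in binomial standard deviations."""
    # Standard deviation of a win% over n independent games with true win probability expected_wp
    sd = math.sqrt(expected_wp * (1 - expected_wp) / n_games)
    return (observed_wp - expected_wp) / sd

# Teams up by 1 at the half: 472 games, .456 final win%, assumed expectation .525
print(f"z = {z_score(0.456, 0.525, 472):.2f}")  # z = -3.00
```

The result sits right at the 3 SD mark, so the conclusion is sensitive to the assumed expectation: moving it by one percentage point shifts z by roughly 0.4 SD.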

However…

If we look at the same table broken down by home and road teams, we see a strange effect which casts some doubt on the authors’ explanation for the effect (that when a team is down by a little at half time, they put out more effort in the second half, particularly in the first few minutes of the second half).  And they give us a lot of experimental support for that psychologically-based thesis.

Halftime lead    Home leads    Home WP    Road leads    Road WP
0                       203       .562           203       .438
+1                      224       .545           248       .375
+2                      240       .633           173       .451
+3                      226       .668           186       .527
+4                      226       .664           177       .514
+5                      225       .716           165       .594
+6                      216       .741           171       .643
+7                      224       .808           144       .660
+8                      165       .758           149       .671
+9                      192       .813           118       .788
+10 or more            1238       .920           661       .828

So what is likely going on, as suggested by the table above?  Well, for some reason the road team in the NBA, at least in this sample, tends to be up by 1 at the half far more often than it should be.  In fact, the road team is up by 1 at the half 248 times and the home team only 224 times.  Since the home team is around 5 points better for the whole game, it should be up at the half more often than the road team at every point differential.  That is the case everywhere except at 1 point.  I am not sure why that is, but it probably has something to do with the strategies employed by the home and road teams, resting players, etc., especially in a close game.
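The claim that the home team should lead at the half more often than the road team at every margin, with +1 as the lone exception, can be checked mechanically from the game counts in the home/road table. A minimal sketch (the dictionary is just a transcription of that table; the helper name is hypothetical):

```python
# Halftime lead -> (games the home team led by that margin, games the road team led by it).
# Transcribed from the home/road table; key 10 stands for the "10 or more" bucket.
lead_counts = {
    1: (224, 248), 2: (240, 173), 3: (226, 186), 4: (226, 177), 5: (225, 165),
    6: (216, 171), 7: (224, 144), 8: (165, 149), 9: (192, 118), 10: (1238, 661),
}

def margins_where_road_leads_more(counts):
    """Halftime margins at which road-team leads outnumber home-team leads."""
    return [lead for lead, (home, road) in sorted(counts.items()) if road > home]

# Share of all non-tied halftimes in which the home team is ahead
home_total = sum(h for h, _ in lead_counts.values())
overall_home_share = home_total / sum(h + r for h, r in lead_counts.values())
# The same share, restricted to 1-point halftime leads
home_share_at_1 = lead_counts[1][0] / sum(lead_counts[1])

print(margins_where_road_leads_more(lead_counts))  # [1]
print(f"home share overall: {overall_home_share:.3f}, at 1 point: {home_share_at_1:.3f}")
# home share overall: 0.592, at 1 point: 0.475
```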

So what is happening, again, at least according to this limited set of NBA data, is that when a team is up by one point at the half, on the average, they are the worse team!

So of course they will have a losing record for the game, regardless of the energy or effort put in during the second half.  I suspect that the same thing is happening in the NCAA data.  Did the authors break down the data by home and road teams?

The same effect could be true when you take into consideration the relative strengths of the two teams, independent of their home/road status.  For some reason, the better team (again, independent of home/road status) could be behind by one point at the half more often than the worse team.  And again, they will tend to win the game more than 50% of the time, even being behind by 1 pt, since they are the better team overall.  Or maybe that is not the case, independent of home/road status.  Maybe there is something fundamental about being the home team that leaves you down by 1 point at the half more often.  Can anyone think of a reason why that might be true?

Let’s look at points scored and allowed for the first and second half for home and road teams as a function of the various point differentials at half time.

It is too much to put in a chart, so I will summarize in words.

Tied at the half, the home and road teams each score 48.4 in the first half.  In the second half, the home team scores 49 (this includes overtime, which is why it is 0.6 pts more than in the 1st half) and the road team scores 47.7.  So the home team outscores the road team by 1.3 pts in the second half.  You would normally see the home team outscore the road team by around 2 to 2.5 points in the second half, but when a game is tied at the half, it suggests that the road team is better than the home team, not counting the home/road status of either team.

With a 2 point lead at the half, the home team scores 49 points in the first half (same as in a tie game) and only 49.3 in the second half.  In general, the bigger the home team’s halftime lead, the fewer points it scores in the second half relative to the first.

For example, with a 5 point lead, the home team scores 49.7 in the first half and only 48.7 in the second.  With a 10 or more point lead, they score 55.7 in the first and 47.3 in the second.

The reason for this is two-fold.  Teams that have a lead at the half will tend to have gotten lucky in their scoring and thus will regress in the second half, and teams with a big lead at the half will tend to have a big lead late in the game and will start resting starters and slowing down their offense.

Anyway, with a 2 point halftime lead, the home team scores 49.3 in the second half and allows 47.9.  When the home team leads by 2 pts at the half, the teams tend to be about the same strength, not counting home/road status.

In a tied game, the home team scores 49 in the second half and allows 47.7.

I am purposely leaving out the 1 point lead for a minute.

With a 3 point lead, the home team scores 49.1 and allows 48.4 in the second half.  So they score a little less and allow a little more than with a 2-pt lead, as we would expect.

So nothing unusual so far in tied games and when the home team is up 2 or 3 pts.

What about the road team?  Obviously in a tie game, the stats exactly mirror the home team.

With a 2 pt lead (now the road team is a bit better than the home team, not counting home/road status - but still worse of course if you include home/road status), the road team scores 46.5 in the second half and allows 49.2.  In the first half, the road team scored 49.1 and allowed 47.1.

With a 3 pt lead, the road team scored 48.5 and allowed 45.5 in the first half.  In the second half, they scored 47.0 and allowed 49.5, around the same second half differential as with a 2 pt lead.

So to summarize so far:

In a tie game, the differential in the second half is 1.3 pts in favor of the home team.

When the home team is up by 2 points, they outscore the road team by 1.4 pts in the second half.  When up by 3, they outscore them in the 2nd half by .7.  When up by 4, it is only .3.
Again, we see the second-half differential shrink as the halftime differential grows, because teams with a big lead near the end will rest their starters.

The road team when up by 2 at the half will score 46.5 and allow 49.2 in the second half for a differential of -2.7.  So, interestingly, the road team will do worse in the second half, when leading by 2 as opposed to a tied game.  You would expect the opposite as a road team that was ahead at the half by 2 pts should be a better team than a road team that is tied.  But then again, maybe it is because they will also tend to have a large lead at the end more often than if the game were tied at the half, but I would think that that effect would be very small.

Anyway, if the road team has a 3pt lead, they are outscored in the second half by 2.5 points.

With a 4 pt lead at the half, it is -3.3.

OK, so what about with a 1 point lead?  The $64,000 question.

The home team with a 1 pt lead scores 48.3 in the first half and of course allows 47.3.  That is around the same number of pts scored as in a tie first half game (that was 48.4).  In the second half, the home team scores 49.3 and allows 48.1 for a differential of 1.2. That is almost exactly the same as in a tie game (1.3 differential in a tie game)! They score and allow a little more for some reason with a 1 pt lead than in a tie.

So when the home team has a 1 pt lead, there is no discontinuity between the tie game and a 2 or 3 pt lead.

What about the road team?  With a 1 pt lead, they score 47.7 in the 1st half and of course allow 46.7.  That is a little less than in a tie game, or than the home team with a 1 pt lead, for some reason (or no reason at all - remember we are working with relatively small samples of games).  In the second half, they score 46.9 and allow 50.5, a differential of -3.6 pts.

Remember that in a tie game, they score 47.7 and allow 49.0 in the second half.

So everything looks normal except that the road team with a 1 pt lead at the half allows an inordinately high number of points in the second half.

With a 2 pt lead, remember, the road team scores 49.1 in the first half and then scores 46.5 and allows 49.2 in the second, for a differential of -2.7.  With a 3 pt lead, the second half differential is -2.5; with 4 pts, -3.3.

So, to really summarize:

Road team

Halftime lead    2nd half pt diff
Tie game         -1.3
1 pt             -3.6
2 pt             -2.7
3 pt             -2.5
4 pt             -3.3
5 pt             -2.5
6 pt             -2.3
7 pt             -3.8
8 pt             -3.8
9 pt             -2.8
10+ pt           -4.4

Home team

Halftime lead    2nd half pt diff
Tie game          1.3
1 pt              1.2
2 pt              1.4
3 pt               .7
4 pt               .3
5 pt               .7
6 pt              -.3
7 pt              1.0
8 pt              -.9
9 pt              -.5
10+ pt            -.9

So you can see an effect (disconnect) only when the road team is up by 1 point, and if you look at the points scored and allowed, you can see that it is in the points allowed by the road team when they are up by 1 at the half.  They allow an inordinately high number of pts in the second half.  That could be because of more effort by the home team, less effort by the road team, or some complex strategy thing (where “strategy” includes things like personnel on or off the court).

In addition, the whole effect is magnified because, again for some reason that I am not aware of, the home team tends to be down by a point at half time more than one would expect.
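The “disconnect” in the two summary tables can be made concrete by looking at how much the leading team’s second-half differential changes from one halftime margin to the next. A minimal sketch using the post’s numbers for a tie through a 9 pt lead (the open-ended 10+ row is left out; `largest_drop` is an illustrative name):

```python
# Second-half point differential for the team LEADING at the half, by halftime lead (index 0 = tie).
road_leading = [-1.3, -3.6, -2.7, -2.5, -3.3, -2.5, -2.3, -3.8, -3.8, -2.8]
home_leading = [1.3, 1.2, 1.4, 0.7, 0.3, 0.7, -0.3, 1.0, -0.9, -0.5]

def largest_drop(diffs):
    """Return (from_lead, to_lead, change) for the biggest one-step decline in the list."""
    steps = [(i, i + 1, round(diffs[i + 1] - diffs[i], 1)) for i in range(len(diffs) - 1)]
    return min(steps, key=lambda step: step[2])

print(largest_drop(road_leading))  # (0, 1, -2.3)
print(largest_drop(home_leading))  # (7, 8, -1.9)
```

For the road team, the single biggest step is from a tie to a 1 pt lead; for the home team, it is from 7 to 8 pts. With samples this small, that is suggestive rather than conclusive.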

#1    2009/03/17 (Tue) @ 21:26

Very, very interesting ... great work!

I have no idea what’s going on here.  Anyone?


#2    ubelmann    2009/03/17 (Tue) @ 21:29

This is interesting.  I took the data from your first table, and assuming that the percentages can be treated as binomial variables with variance p*(1-p)/N, I did a linear least squares fit for the home and road data separately.  The reduced chi-squared for the road data is 1.06 and the reduced chi-squared for the home data is 0.83.  So the data looks awfully linear to me.

The one thing I will say is that the +1 point for the road team is about 1.5 SD below the fit line and the +1 point for the home team is about 1.3 SD below the fit line.  Combined with the NCAA +1 data point being below the fit line, it starts to look like a pattern.

So it might be a real pattern, but I’m still shaking my head at their decision to fit a quintic through 11 data points.
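ubelmann’s check can be sketched as a weighted least-squares line through win% versus halftime lead, with each point weighted by the inverse of its binomial variance p(1-p)/N, followed by residuals in SD units and a reduced chi-squared. The sketch assumes the fit uses the ten margins 0 through +9 from the home/road table (dropping the open-ended 10+ bucket); his exact choices aren’t stated, so its numbers need not match his 0.83 and 1.06 exactly. Function names are illustrative.

```python
import math

def weighted_line_fit(xs, ps, ns):
    """Weighted least-squares fit p ~ a + b*x, with weights N / (p(1-p))."""
    ws = [n / (p * (1 - p)) for p, n in zip(ps, ns)]
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * p for w, p in zip(ws, ps))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * p for w, x, p in zip(ws, xs, ps))
    det = sw * sxx - sx * sx
    slope = (sw * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return intercept, slope

def residual_z_scores(xs, ps, ns):
    """Residuals from the weighted fit in SD units, plus reduced chi-squared (chi2 / (points - 2))."""
    a, b = weighted_line_fit(xs, ps, ns)
    zs = [(p - (a + b * x)) / math.sqrt(p * (1 - p) / n) for x, p, n in zip(xs, ps, ns)]
    reduced_chi2 = sum(z * z for z in zs) / (len(xs) - 2)
    return zs, reduced_chi2

leads = list(range(10))  # halftime lead 0..9 for the team in question
home_wp = [.562, .545, .633, .668, .664, .716, .741, .808, .758, .813]
home_n = [203, 224, 240, 226, 226, 225, 216, 224, 165, 192]
road_wp = [.438, .375, .451, .527, .514, .594, .643, .660, .671, .788]
road_n = [203, 248, 173, 186, 177, 165, 171, 144, 149, 118]

for label, wp, n in (("home", home_wp, home_n), ("road", road_wp, road_n)):
    zs, chi2 = residual_z_scores(leads, wp, n)
    print(f"{label}: +1 residual = {zs[1]:+.2f} SD, reduced chi2 = {chi2:.2f}")
```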


#3    Roger Freed    2009/03/17 (Tue) @ 22:57

Really nice work, MGL.

That study was a headscratcher.  I knew the results came out as statistically significant, but I ascribed it to the old “even something significant at the .05 level will happen 5 times out of 100 by chance.” By dividing up the NCAA results by every possible lead (-10 to +10) you’ve created 21 possible data sets, and, lo and behold, you’d expect to find spurious “significance” in exactly one of those data points ... and that’s what they found.

But MGL, your explanation slices and dices it in a more interesting manner. I think the home/road thing for the NBA data certainly leads to a more intuitively acceptable explanation for where the real quirky phenomenon may reside.

And let me offer another possible explanation for the “road team with a 1-pt. lead underperforms offensively in the 2nd half” phenomenon.  Officiating.  I wonder if the same discrepancy shows up in number of free throws (home vs. road team) in the second half?
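Roger’s “1 in 20” arithmetic can be sketched as follows. It assumes the tests are independent and that there is no real effect anywhere; as ubelmann points out in #17, the mirrored data leave only 11 independent points rather than 21, so both counts are shown. The helper name is illustrative.

```python
def false_positive_stats(n_tests: int, alpha: float = 0.05):
    """Expected number of spurious 'significant' results, and the chance of at least one,
    assuming independent tests and no real effect anywhere."""
    expected = n_tests * alpha
    at_least_one = 1 - (1 - alpha) ** n_tests
    return expected, at_least_one

for n in (21, 11):
    expected, p_any = false_positive_stats(n)
    print(f"{n} tests: expect {expected:.2f} false positives, P(at least one) = {p_any:.2f}")
# 21 tests: expect 1.05 false positives, P(at least one) = 0.66
# 11 tests: expect 0.55 false positives, P(at least one) = 0.43
```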


#4    MGL    2009/03/17 (Tue) @ 23:31

Remember we are talking about a “blip” at a 1 pt lead for the road team only.  So I don’t think that officiating can have anything to do with it.

ubelmann, if you construct a least squares/best fit line around the home and road teams’ results combined, you may find that one result at a 1 pt lead is 1.3 SD from the line and the other is 1.5, but I don’t think you want to do that (construct a line from both sets of data - home and road).  It is clear from the data, I think, that the only anomaly/blip is the road team being up by a point at the half.  The home team being up by a point at the half fits nicely along any kind of a best fit line using home data only, as far as I can tell from eyeballing the data.

BTW, I think that this is a “secret” that has been known by a few NBA half time bettors for a while now, and that the cat is finally out of the bag…


#5    Guy    2009/03/17 (Tue) @ 23:59

Great work, MGL.  And interesting patterns.  Maybe there’s something here (I’m not yet convinced).  Couple of observations:

The big anomaly is games in which the home team trails by 1. They outscore the road team by 3.6 in 2nd half, much better than in tie games and markedly better than +2 games. (Or, you could say road teams “choke” when up by 1.) On the other hand, n=248, so hard to know how real this is.  When road teams are down by one at half, they win more than we expect but actually perform the same as they do in ties and -2 games.  Maybe they are scoring “when needed,” but seems a stretch.  Here’s a helpful way to look at your data (for me):
Halftime diff    Road 2nd half    Home 2nd half
+4               -3.3             +.3
+3               -2.5             +.7
+2               -2.7             +1.4
+1               -3.6             +1.2
0                -1.3             +1.3
-1               -1.2             +3.6
-2               -1.4             +2.7
-3               -.7              +2.5
-4               -.3              +3.3

Clearly, the home team has a real ability to ratchet up their game in the 2nd half of close games (and not only when down 1).  Whether that’s effort, personnel, strategy, or something else, I don’t know.  It’s worth noting that a 4-point deficit actually inspires more improved performance overall than being down by 1:  the home team does nearly as well as when -1 (+3.3), and road teams really excel (-0.3).

Final thought:  we still see anomalies in the overall win% results other than at +1.  +4 teams win less than +3, and +8 less than +7.  It’s definitely intriguing that MGL also finds an anomaly at +1, but it’s not unique.


#6    MGL    2009/03/18 (Wed) @ 00:04

Let me make a couple of comments:

1) It looks to me like there is compelling evidence that the “1 point half time blip” is not just statistical noise.  Let this be a lesson for us not to categorically reject or criticize the result of research in sports, especially non-baseball, that does not seem at first glance to comport with our basic models.  In other sports, and probably to some extent in baseball, there are significant strategic and psychological things that make these models more complex than we sometimes think.  It is OK to critique and vet research of course - in fact, that is mandatory - but there is a fine line between properly and responsibly critiquing it, and assuming a priori that its results must be wrong.

2) While the researchers were correct in pursuing the “more effort when you are down a little but not too much” theory, which seems to be accepted in psychological circles and has experimental and statistical evidence to support it, I think they jumped the gun a little (or maybe a lot) in accepting that as THE theory to explain the data.  First of all, there are many other potential alternative explanations having to do with strategic and other variations within a basketball game.  Secondly, the jump from college students tapping on keys to professional athletes is a presumptuous one, I think.  Finally, and most importantly, I don’t find it particularly plausible that such an effect (extra effort to overcome a small disadvantage) would manifest itself with a 1 point deficit but not a 2 or 3 point one.  From the standpoint of a basketball player, there is barely any difference between a 1, 2 or 3 point deficit (can you imagine a player saying, “Yeah, we’re only down by one, so let’s come back and win this thing,” and then in another game, saying, “We’re down by 2, why even try since we are going to lose anyway”?).  And even if there were, you would expect to see a diminishing effect the more points a team is down, not an abrupt blip at 1 point.

I have to think there is something significant about the home team being down by 1 pt which has something to do with the structure of the game or how teams strategize at various points in the game.  For example, maybe the road team does not like to go into the locker room tied or down, so they take risks and put in their best players just before half time in order to get a lead.  Maybe in a tied game just before the half, the home team stalls a little because it thinks that it will win the game as long as it has a lead or the game is tied at the half.  I am not suggesting that these exact things occur or that they would produce the results we are seeing in the data - only that these kinds of things occur in games like basketball and football (and maybe hockey), but not so much in baseball (although the home team likes to “play more for the tie” than the road team, right?).


#7    MGL    2009/03/18 (Wed) @ 01:01

Guy and everyone else:

Had this been the seminal study, then yes, we would rightfully be critical of the significance of these statistical anomalies, just as we were critical of the significance of those found by the authors in question.

But, and this is a powerful but, with all due respect to Jennifer Lopez, this is essentially the same result (in the NBA) as the authors found in the NCAA.

That is a completely (and I mean completely) different ballgame.  The chances that both studies produced similar results by chance alone (that the anomalies we see in 1 point half time games are random noise) are, let’s say, quite a bit less than if we just had one study to mock.

For Guy to say, “I’m not yet convinced” in light of both studies, as opposed to one, is, well, he must be quite a skeptic.

That being said, some percentage of the hundreds of thousands of statistical studies in history that concluded X based upon finding only statistical significance, even at the 3 or more sigma level, is wrong.  Maybe this is one of them…


#8    Roger Freed    2009/03/18 (Wed) @ 01:16

MGL, your “that being said” qualifier sets the proper skeptical starting point.

Let’s look at how this type of research is initiated.  Here’s what does NOT happen:

Researcher 1:  “Research on other aspects of human performance suggests that people perform better when trying to close a small deficit than when trying to retain a slight lead.”

Researcher 2:  “Hey, let’s study what happens in basketball games ... we hypothesize that the team that’s behind by 1 point at halftime will actually outperform the team that leads.  Let’s just look at 1 point games and see if the “slight deficit = better performance” hypothesis can be confirmed.”

Nope.  It goes like this:

Researcher 1:  “We’ve got this huge dataset of basketball scores.  Let’s see what kind of number crunching we can do.”

Researcher 2:  “Hey, let’s check win expectancies by various scores at halftime.  Who knows, might not be a linear pattern, or there might be some fun outliers.”

And so the numbers are crunched, and we find one anomalous data point, and suddenly we forget everything we ever learned about statistics, and we say that because 1 out of 21 possible halftime scores doesn’t fall on the curve, we’ve uncovered a very important phenomenon.  After all, we’re 95 percent confident that it is a real effect. This kind of result could only happen by chance one out of 20 times, right?  Oh, wait a minute ...


#9    dan    2009/03/18 (Wed) @ 01:58

I think what Roger is alluding to is that this may be the black swan.

http://www.amazon.com/Black-Swan-Impact-Highly-Improbable/dp/1400063515/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1237355846&sr=8-1


#10    MGL    2009/03/18 (Wed) @ 02:44

Roger, that may be the case for study #1, but have you completely forgotten already that we are talking about study number 1 AND study number 2 (the NBA)?

I already talked about “curve fitting” when looking at lots of numbers and “cherry picking” the ones that are anomalous. I spoke about that in the initial thread which is about the NCAA study.

Now we have another basketball study which replicates the experiment using another independent data set - the exact scenario in your #1 above.  So why are you bringing up your #2 scenario when we already have moved past that, and in fact, this is the thread on the #2 study, or at least one study in light of the other?

And, BTW, the “curve fitting” scenario you describe in #2 is (properly) used all the time to make initial hypotheses.  Then they are tested against other independent data sets.  That is exactly what we have done here.

If you are criticizing the conclusions of the initial study, belatedly, then you are still a little over the top, as there is some question as to whether this is pure curve fitting or not.  The researchers may have suspected such an effect a priori, and anticipated it showing up when a team was perhaps down by 1, 2 or 3 points.  I am not saying that this is the case, but if they find an anomaly at 1 point behind when looking for such an effect “with a small deficit,” I would hardly characterize that as observing hundreds or even dozens of data points and choosing those that are anomalous, which is pure curve fitting and is what you are clearly suggesting they did.  Now, what the researchers were actually looking for, and what brought them to look at an NCAA data set, you would have to ask them.  Given that they study the psychology of economics or something like that, I sincerely doubt that they just happened to be looking at NCAA basketball stats, randomly looking for anomalies, as a handicapper might.


#11    Guy    2009/03/18 (Wed) @ 06:59

MGL:  I take your point, to an extent.  I was too quick to conclude “this must be statistical noise,” when the correct conclusion based on the first study was “this MAY just be noise” and therefore the case has not been made. 

And I would agree with you that it is highly likely the +1 phenomenon is real, if it weren’t for Brian’s dataset.  Brian’s data shows a +1 team with a .519 win%.  That’s why I remain skeptical, though I concede this may be real. Now, if Brian’s data is wrong, or other datasets confirm that +1 teams win less than tied teams, I’ll readily accept the conclusion.

Is it possible that, apart from any halftime effect, a 1-point lead is less than half as valuable as a 2-point lead (because FGAs are worth 2 pts)?  Is it particularly likely that a home team will outscore the visitors by exactly 2 points in the second half, just given the distribution of scoring?  I doubt this is the case, but I’ll throw it out there.

And I am skeptical about the explanation offered by the authors, for some of the reasons you state in post #6.  I think that’s still merited.


#12    Tangotiger    2009/03/18 (Wed) @ 07:01

The NCAA study had not only the discontinuity at 1/tied, but also discontinuity at 4/3 and 7/6.  And those discontinuities were larger.  See Figure 1A.

MGL also shows discontinuity at 4/3.

I have to believe that the scoring in basketball (1, 2, or 3 points per possession) also has a role to play.


#13    2009/03/18 (Wed) @ 13:23

My first suspicion, on reading these studies, is that the possession arrow might be contributing to the 1pt anomaly.  But after thinking about it more, this won’t be more than a minor factor.  (If the team that is behind were going to get the ball 100% of the time in the second half, and having the ball is worth about +1pt, then this would cancel the lead and we would expect it to be like a tie game.  But the losing team won’t have it 100% of the time; maybe they’ll have it more than 50% for some reason, but it won’t be a big enough difference to impact things very much.)

So it looks like this is just a black swan event, as others mentioned.  Especially given the other anomalies at +4 and +7 as well. 

I suspect that given a much larger dataset these will disappear.


#14    dan    2009/03/18 (Wed) @ 13:52

The authors seem to be concerned with a different phenomenon (the behavioral aspect) than we are (the scoring aspect).


#15    2009/03/18 (Wed) @ 14:10

The authors showed that, with 95% probability, there is an unusual effect that occurs at a point difference of 1, 4, and 7.  Going from there to saying that this effect is caused by a certain behavioral phenomenon is questionable, imo, especially if they are ignoring the anomalies at 4 and 7.

It seems more likely to me that this is caused by the nature of scoring in basketball (3 is the largest amount you can reasonably score on a play, and the anomalies are 3 points apart).  Not every point in football is equally valuable: going from even to +1 is very valuable, but +1 to +2 is almost meaningless, for example, because you score points mostly in 3s and 7s.  It wouldn’t surprise me if there was some similar effect in basketball.

I also wouldn’t be surprised if it was just random.


#16    ubelmann    2009/03/18 (Wed) @ 15:17

MGL/4: I was not the most clear in my initial comment, but I took the home data and did a linear regression, then I took the road data and did a linear regression.  In each separate regression, the +1 differential falls somewhat more than 1 SD below the fit line.  The reduced chi-square still seems reasonably good for fitting 10 points, so looking at each fit individually, it’s hard to say that there is anything special about the +1 data point.  However, since it falls in the same direction on both home and away, and also does so in NCAA, it starts to look more like there might be something.

At this point, I’m not really convinced either way.  There could be an effect, but it might also be noise.  My beef really isn’t that the authors are wrong, just that their data analysis and presentation seems pretty poor, so it’s difficult to say from the paper itself whether or not they are right.


#17    ubelmann    2009/03/18 (Wed) @ 15:28

Roger/3:  Remember that the authors of the original study don’t even have 21 independent data points.  The winning percentage (p) of the +N differentials is directly correlated to the winning percentage of the -N differentials (1-p).  They only have 11 independent data points in their fit.

I was thinking a little bit more about their quintic fit, though, and because they artificially extended their data to 21 points, it is anti-symmetric about 0, so that ought to constrain some of the parameters in their fit so that they don’t have a full 6 free parameters to vary like you would have for a general quintic polynomial.  I don’t see a goodness-of-fit statistic presented in the paper, but perhaps I have missed it.  Since I see no a priori reason to expect the data to follow a quintic, it would be good to know how well that model actually fits the data.  (I suspect that the model fits the data “too well.")


#18    dan    2009/03/18 (Wed) @ 15:56

Alex--

The authors set out to test a hypothesis that this effect occurs at +/- 1 point. They created a model that tested this specific point and hypothesis, and the data confirmed their hypothesis. That the effect also shows up at 4 and 7 is a coincidence, since the model used was designed to test only +/- 1.

As sports analysts, we might not like it so much. But the authors weren’t out to analyze college basketball specifically, they were trying to analyze a very specific hypothesis.


#19    MGL    2009/03/18 (Wed) @ 16:16

I don’t know that the authors set out to look at the wp with exactly a 1 pt deficit, dan.  We definitely don’t know that, unless I missed something in the paper that explicitly says that.  It is not even clear that they set out to examine the effect of a “small deficit” in basketball, period.  I’ll grant them that, though.

IOW, if they set out to investigate the possibility that a small deficit in NCAA basketball leads to greater effort, then I think we can safely assume that they would have drawn the same or a similar conclusion had the effect been evinced in 2 or 3 pt games also.

On the other hand, I agree that the anomalies at 4 and 7 points are straw arguments.  Clearly a 7 point deficit does not fit into their thesis. 4 pts is on the border, although if they found an effect at 4 pts and not at 1,2, or 3, I doubt they would have reached the same conclusion.

So I basically agree with Dan that the anomalies at 4 and 7 pts are pretty irrelevant.

If any of the skeptics want to bet on the result of a new dataset, I’ll take that bet in a heartbeat!  Let’s see you put your money where your mouth is.  If you truly believe with a high degree of confidence that this is noise, and you take that bet, you should be something like a 100-1 favorite to win, no?  So there is no, “I’m not a risk-taker” excuse with this bet offering!

Here’s the bet:  Randomly choose a new dataset (say, NBA, outside of the years I looked at).  Look at 1 point deficits by the home team only (I think that the effect only exists for the home team).  Establish some kind of expected wp from the best fit curve using all the other points.  I’ll take 1.5 standard deviations or more (given the size of the data set) below that and you can take the rest.  If there is no true effect, and this is all noise, you should win about 93% of the time, and I about 7% (one tail beyond 1.5 SD).  And I’ll bet even money!  That is a bet of a lifetime for you naysayers!
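For reference, here is the normal-approximation arithmetic behind a one-tailed bet at a 1.5 SD cutoff: the chance of landing 1.5 SD or more below the fit line is about 6.7%, and it takes roughly 1.645 SD to leave exactly 5% in one tail. A minimal sketch using only the standard library’s complementary error function (`lower_tail` is an illustrative name):

```python
import math

def lower_tail(z: float) -> float:
    """P(Z <= -z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(f"{lower_tail(1.5):.4f}")    # 0.0668  (one tail beyond 1.5 SD)
print(f"{lower_tail(1.645):.4f}")  # 0.0500  (the usual one-tailed 5% cutoff)
```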


#20    Tangotiger    2009/03/18 (Wed) @ 16:21

I disagree completely.  What if the effect was much, much larger at the 4/3 point than at 1/0?

Well, then whatever caused that could also explain the 1/0 point.

As far as I’m concerned, the authors of the paper cherry-picked the results, and by showing the mirrored data just obfuscated the pattern.


#21    MGL    2009/03/18 (Wed) @ 16:22

And BTW, I don’t think we see nearly as much of a “disconnect” at 4 and 7 pts as at 1 pt, if we look at pt differential, which is better than looking at wp from a noise reduction perspective, just as we are almost always better off looking at pythag record in baseball rather than real record.

Which data point jumps out at you?

Halftime lead    2nd half pt diff (road team)
Tie game         -1.3
1 pt             -3.6
2 pt             -2.7
3 pt             -2.5
4 pt             -3.3
5 pt             -2.5
6 pt             -2.3
7 pt             -3.8
8 pt             -3.8
9 pt             -2.8
10+ pt           -4.4


#22    Guy    2009/03/18 (Wed) @ 16:38

MGL:  the problem I have with your bet is the assumption that the best fit line gives us the right answer for a 1-point game.  I don’t think we know in advance that there isn’t a discontinuity there created by scoring distribution (that most scoring comes in 2 or 3 pt increments).  I’d take a bet that +1 teams win more than 50% of the time.  But I’m too lazy to find and compile the data! 

Ubelmann:  The reason to think this is mainly a home team effect is MGL’s point differential data.  The home team outscores the road team by a lot when down one, but this isn’t true for the road team:
Halftime diff    Road 2nd half    Home 2nd half
+2               -2.7             +1.4
+1               -3.6             +1.2
0                -1.3             +1.3
-1               -1.2             +3.6
-2               -1.4             +2.7
On the other hand, road teams do win more than expected when down 1.  So maybe MGL’s assumption that the pythag record is more “real” doesn’t apply.  I don’t think we can assume that in close basketball games, as we do in baseball.


#23    Guy    2009/03/18 (Wed) @ 16:42

On the other hand, it occurs to me we may be calling this a “home team” effect only because we’re influenced by the study authors’ greater-effort theory.  It would be just as true to say that road teams choke when they lead by one at the half.


#24    ubelmann      2009/03/18 (Wed) @ 17:09

Guy/22:  Looking at that table (with the redundant third column), the first thing my heart desires is to know the standard deviation of each point differential.  If MGL could give us that, it should help clarify whether or not this is noise.

Plotting point differential for the home team in the second half vs. point differential for the home team in the first half, it looks surprisingly linear, but it is certainly true that the difference between 0 and -1 is bigger than any other difference and the point at -1 looks like an outlier.

I still really don’t understand why anything is different at -1 than at 0 or -2, but maybe there is something there.  I’m too poor to take MGL’s bet, but I’m always interested in seeing more data.


#25    Tangotiger      2009/03/18 (Wed) @ 17:14

Here’s another fascinating result.  Using MGL’s data, we see that the home team ends the first half with the lead 3176 times, is behind 2192 times, and is tied 203 times.

Giving them half a win for the ties, the first-half win% for the home team is .588.

However, if the home team starts the second-half tied (albeit this happens only 203 times), their win% is only .562. 

I would think that the true win% for the second-half, starting tied, must be also .588 for the home team.  There could be legitimate reasons that it’s less (fatigue or whatnot).  But, I’d like to see the data over many more years to see if this is true.

Or, it could also go into the mindset of “hey we are tied on the road… we have a great chance”.  And that leads to only a .562 win%.

Of course, that is under 1 SD, so, likely not true.

I just wanted to bring up the point that the half-game win% for the home team is .588.
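Tango's .588 figure and the "under 1 SD" remark can be reproduced by counting ties as half a win and treating the 203 tied games as independent binomial trials (a simplifying assumption). A minimal Python sketch:

```python
from math import sqrt

# Home team's halftime standing in MGL's 2001/02-2006/07 sample (from the post above).
home_ahead, home_behind, tied = 3176, 2192, 203

# First-half "win%" for the home team, counting a tie as half a win.
first_half_wp = (home_ahead + 0.5 * tied) / (home_ahead + home_behind + tied)

# Observed home win% in the 203 games tied at the half.
tied_wp = 0.562

# Binomial standard error if the true rate in tied games were first_half_wp.
se = sqrt(first_half_wp * (1 - first_half_wp) / tied)
z = (tied_wp - first_half_wp) / se
print(f"first-half wp {first_half_wp:.3f}, SE {se:.3f}, z = {z:.2f}")
```

This gives a z of about -0.76, consistent with "under 1 SD."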


#26    Guy      2009/03/18 (Wed) @ 17:18

OK, I’ll take MGL’s bet for $1,000.  It’s about time someone took him up on one of his bets!

Only, that wouldn’t be fair, since I know the outcome (like the authors, perhaps?).  Ed Kupfer just posted some old NBA data here, how teams 1992 to 2002:  http://sonicscentral.com/apbrmetrics/viewtopic.php?t=2142.

Here’s win% for home team in the critical range:
-4 .436
-3 .481
-2 .503
-1 .554
0 .545
+1 .582
+2 .598
+3 .644
+4 .653

I still think it’s possible the curve is not smooth for leads <3 points, but who knows?  In any case, assuming Brian’s data is valid, we have now found a +1 effect in 2 of 4 datasets.  MGL, are we allowed to be skeptical again?


#27    Guy      2009/03/18 (Wed) @ 17:22

Tango/25:
In games that are tied at the half, the road team is overperforming by 1-2 points compared to our expectation.  So isn’t it likely that they are a little bit better than the home team, and that gives you the .562 win% (assuming this result is confirmed with larger sample)?


#28    Guy      2009/03/18 (Wed) @ 17:32

Sorry, Ed’s data is for home teams. Not sure how the “how teams” performed.


#29    2009/03/18 (Wed) @ 18:08

Was 2001-02 a particularly good year for -1 home teams?  I ask because that’s the only year that’s in both MGL’s and Ed’s dataset, and they both show an effect.


#30    ubelmann      2009/03/18 (Wed) @ 18:24

Guy/26:  The standard deviation for the points you list there is about 0.02, so the third decimal place almost certainly isn’t significant.  If we round to two decimal places, it looks like:

-4 .44
-3 .48
-2 .50
-1 .55
0 .55
+1 .58
+2 .60
+3 .64
+4 .65

Somewhat eye-balling a linear fit line in that region, the -1 differential seems about 1-1.5 SD above where it “should” be given a linear fit.  MGL’s point seems to be above where it “should” be by 1.5-2 SD (taking into consideration that his SD is larger.)
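ubelmann's eyeballed linear fit can be made explicit: fit ordinary least squares to Ed's home-team rows from -4 to +4 and express the -1 residual in units of the assumed 0.02 standard deviation. The 0.02 is ubelmann's rough figure, not something computed from game counts. A Python sketch:

```python
# Ed Kupfer's home-team win% by halftime lead (1992-2002), as posted by Guy above.
lead = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
home_wp = [0.436, 0.481, 0.503, 0.554, 0.545, 0.582, 0.598, 0.644, 0.653]
ASSUMED_SE = 0.02  # ubelmann's rough per-point SD; an assumption, not derived here

# Ordinary least-squares line through the nine points.
n = len(lead)
mx = sum(lead) / n
my = sum(home_wp) / n
slope = sum((x - mx) * (y - my) for x, y in zip(lead, home_wp)) / sum((x - mx) ** 2 for x in lead)
intercept = my - slope * mx

# How far the -1 point sits from the line, in SD units.
expected_at_minus1 = intercept + slope * -1
z_minus1 = (home_wp[lead.index(-1)] - expected_at_minus1) / ASSUMED_SE
print(f"expected {expected_at_minus1:.3f}, actual 0.554, z = {z_minus1:+.2f}")
```

That puts -1 about 1.26 SD above the line, inside ubelmann's 1 to 1.5 SD range and close to the .528 expected value in Guy's wider regression later in the thread.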

We haven’t seen the NCAA data broken down by home/road, which would be nice because this seems to happen for -1 and not +1, but both of the NBA data sets show an effect that’s in the same direction and by more than 1 SD.

It still doesn’t make any sense to me, but I certainly can’t rule out the possibility that there’s something there.


#31    Guy      2009/03/18 (Wed) @ 19:00

BTW, here’s the combined home and road win %, using Ed’s data:
Lead/win%
0 0.500
1 0.517
2 0.548
3 0.589
4 0.614
5 0.672
6 0.733
I think there’s something about the home and road scoring distributions that make a one point lead worth very little for a road team.  Something to do with fouls, maybe?

It does look to me like -1 for home team is about +1.5 SDs, if you assume a linear fit.  So MGL may have a draw for his bet.


#32    MGL      2009/03/18 (Wed) @ 22:03

Guy, I guess you corrected yourself in post #31.  In post number 26, the home team once again does better with a 1 point deficit than in a tie game!

Do we need any more data to be pretty darn sure that this is not a statistical artifact or “noise?”

Again, I don’t know the reason, but I suspect that it has little or nothing to do with “the players who are down a little exert more effort,” as in their lab experiment with the students and the keyboard.

Guy, if the -1 point for the home team in Ed’s data set were 1 or 1.4 SD from the best-fit point (whatever that might be, and it would depend on which data set we used to draw the best-fit curve or line), I would gladly pay off the bet (if you didn’t cheat, of course ;-), but to say that this vindicates the “statistical noise” theory is, well, just plain wrong.

When we keep testing data sets for the same effect, and each data set comes up with 1, 1.5, .5, etc. (small) differences from the “expected mean”, all in the same direction, we end up with like 3 or 4 SD’s when we combine the data sets, do we not?

I probably should not have used 1.5 SD as the cut-off point for the bet, since, while I am quite certain of the effect at this point, I certainly don’t know whether the “true effect” is .5 SD, 1 SD, 1.5, or 2, from the expected value.  Certainly anyone who thinks or thought that it was noise should have been more than willing to go 1 or even .5 SD on the bet.  1 SD alone would give them a 5-1 edge (84 to 16), assuming that they were 100% certain the effect was noise!
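MGL's point about pooling same-direction deviations across data sets can be made concrete with Stouffer's method, one standard way to combine independent z-scores (our choice; the thread does not name a method). Equal weights assume roughly equal-sized data sets, which is only an approximation here. A Python sketch:

```python
from math import sqrt
from statistics import NormalDist


def stouffer(zs, weights=None):
    """Stouffer's combined z: sum(w_i * z_i) / sqrt(sum(w_i ** 2)).
    With equal weights this reduces to sum(z) / sqrt(k)."""
    if weights is None:
        weights = [1.0] * len(zs)
    num = sum(w * z for w, z in zip(weights, zs))
    return num / sqrt(sum(w * w for w in weights))


for zs in ([1.0, 1.5, 0.5], [1.5, 1.5, 1.5, 1.5], [3.0, 1.1]):
    z = stouffer(zs)
    p = 1 - NormalDist().cdf(z)  # one-tailed
    print(zs, f"combined z = {z:.2f}, one-tailed p = {p:.4f}")
```

With equal weights, three data sets at 1, 1.5 and 0.5 SD combine to about 1.7 SD, four at 1.5 SD reach 3.0, and the +3.0 and +1.1 SD results Guy reports later in the thread combine to roughly 2.9. Weighting each z by the square root of its sample size would be the natural refinement when the data sets differ in size.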


#33    MGL      2009/03/18 (Wed) @ 22:06

BTW, if you keep testing data sets, eventually you will come up with one that shows NO effect, even when one exists (a Type II error).  The whole idea of retesting with different data sets is to combine the results (you can leave out the initial one, since it MAY have been cherry picked or the result of “curve fitting").


#34    Tangotiger      2009/03/18 (Wed) @ 22:15

On Freak blog someone posted:

I have box scores going back to 1998 on NCAAB and NBA results. Here are the results with a bigger sample size. Another interesting note is that the home team usually wins 1-point games whether they are ahead or behind at the half.

NCAAB (Men’s College Basketball), 29,753 games
Team leading by 1 at halftime, Team winning game, Number of games
Away, Away, 455
Away, Home, 695
Home, Away, 436
Home, Home, 777

1232 times / 2363 times (52.14%) the team winning by one at halftime won the game.

NBA, 13,123 games
Away, Away, 223
Away, Home, 309
Home, Away, 242
Home, Home, 304

In the NBA, it is only 48.9% (527 / 1078), but a smaller sample size.


#35    Guy      2009/03/18 (Wed) @ 22:45

Yes, MGL, I misread your original bet—didn’t realize it was limited only to home team -1 scenario.  I agree there’s enough data to conclude something different is going on there. 

I don’t think the authors’ original claim—that teams behind one are more likely to win—stands up to scrutiny. 

So that leaves the question of why?  I remain dubious that -1 teams are more motivated than those in tie games or +1.  It’s something about tactics, or officiating, or ???


#36    2009/03/19 (Thu) @ 00:22

Tango, and whoever made that post on Freakonomics: I have no idea what those charts are saying.  None whatsoever.  Can someone decipher them?

“I don’t think the authors’ original claim—that teams behind one are more likely to win—stands up to scrutiny.”

Clearly it seems only to apply to home teams, but it looks like the data shows a normal effect for the road team and an extremely abnormal one for the home team, so that combined, yes, their claim is 100% correct, although diluted of course.

Guy, we have all said 100 times that we are skeptical of the “psychological explanation” as opposed to a strategic, rules, nature-of-the-game, officiating, etc. one.  I would NOT rule out their explanation, though.  Why would you?  First everyone said that it was noise.  Now it is 99.7% apparent that it is a real effect.  And now everyone is rejecting their explanation for the effect.  Is this a crusade against these guys?

The principal things that should make one skeptical of their explanation are that the effect appears to be true only for the home team AND that it appears only at exactly one point (as I said before, you should expect to see diminishing effects at 2, 3, etc. points).

I suppose that it is possible that the psychological effect they posit for some reason only manifests itself for a home team in basketball and only for 1 point. Without another explanation, it seems silly to me to rule out theirs, although, as I said, I think they jumped the gun a little in declaring that as “the explanation,” and I think it was a (bad) mistake for them to overlook the home/road thing, assuming that it exists in their data set.

If I have time, I will also look at how the relative strength of the teams factors into this, using the pre-game Vegas lines as a proxy for relative team strength plus HF advantage (around 4.5 points I think).

Guy, do you know what it means to “dig in your heels?”


#37    Tangotiger      2009/03/19 (Thu) @ 00:30

Away, Away, 455
Away, Home, 695
Home, Away, 436
Home, Home, 777

Away team leads at the half by 1, and ends up winning the game 455 times

Away team leads at the half by 1, and the home team ends up winning the game 695 times

So, win% for away team is .396

Repeating the same steps for the other two rows of the chart, the win% for the home team is .641

Simple average is .518
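Tango's steps (the win% for each halftime leader, then a simple average) can be reproduced from the four counts in each chart; the pooled figure is the 52.14% / 48.9% quoted in the original comment. A Python sketch:

```python
# Freakonomics commenter's counts for games with a 1-point halftime lead:
# (team leading at the half, team that won the game) -> number of games
ncaab = {("Away", "Away"): 455, ("Away", "Home"): 695,
         ("Home", "Away"): 436, ("Home", "Home"): 777}
nba = {("Away", "Away"): 223, ("Away", "Home"): 309,
       ("Home", "Away"): 242, ("Home", "Home"): 304}


def leader_win_pcts(counts):
    """Return (road-leader win%, home-leader win%, simple average, pooled win%)."""
    away_wp = counts[("Away", "Away")] / (counts[("Away", "Away")] + counts[("Away", "Home")])
    home_wp = counts[("Home", "Home")] / (counts[("Home", "Home")] + counts[("Home", "Away")])
    pooled = (counts[("Away", "Away")] + counts[("Home", "Home")]) / sum(counts.values())
    return away_wp, home_wp, (away_wp + home_wp) / 2, pooled


for name, counts in (("NCAAB", ncaab), ("NBA", nba)):
    away, home, avg, pooled = leader_win_pcts(counts)
    print(f"{name}: road +1 {away:.3f}, home +1 {home:.3f}, simple avg {avg:.3f}, pooled {pooled:.3f}")
```

The simple average weights road and home leaders equally, while the pooled rate weights them by how often each leads by one, which is why the two figures differ slightly.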


#38    2009/03/19 (Thu) @ 00:36

OK, I see what the chart means now.  Without including the wp’s for the other point differentials, I don’t see the value in those numbers…


#39    Guy      2009/03/19 (Thu) @ 00:59

MGL:  I don’t understand your post at all.  I agreed there is a pattern for the home team.  (See below for another look at that). I said two other things:

1) The authors were wrong to say teams 1 point behind are more likely to win (than lose)—that it’s better than being tied.  That is the big story in their paper.  But it is clearly not true at all for road teams, and it looks like home teams win about equally at -1 and tied.  So their conclusion is incorrect.  You don’t disagree with that, right?

2) And I said I was “dubious” that this was a function of player motivation.  I didn’t “rule it out.” But the authors very much “ruled it in” as the explanation, and dismissed other possible causes with little evidence or even compelling logic.  I think we both agree their explanation is unlikely, and for some of the same reasons.  You seem impressed by their experimental findings—which surprises me given your interest in game theory— while I don’t think it shows what they say it does. But I’m not sure what we’re arguing about....

* *

I ran a regression line thru Ed’s data and got this result:  Lead / Actual W% / Exp W% / Diff (SD)
-8 0.360 0.314 1.7
-7 0.310 0.344 -1.4
-6 0.370 0.375 -0.2
-5 0.381 0.405 -1.0
-4 0.436 0.436 0.0
-3 0.481 0.467 0.6
-2 0.503 0.497 0.3
-1 0.554 0.528 1.1
0 0.545 0.558 -0.6
1 0.582 0.589 -0.3
2 0.598 0.620 -1.0
3 0.644 0.650 -0.3
4 0.653 0.681 -1.3
5 0.719 0.712 0.4
6 0.805 0.742 3.4
7 0.781 0.773 0.4
8 0.825 0.803 1.1

The -1 teams are +1.1 SDs, but don’t really stand out.  However, the same exercise with MGL’s smaller data set shows the -1 teams at +3.0 SDs, and it seems reasonable to assume that the study data would also show home -1 teams performing well (given their total result for -1 teams).  So with three studies, it seems fair to say these teams overperform in 2nd half. 

Interesting that the home team overperforms when down 1, 2, or 3, but underperforms when tied or up by 1 to 4 points.
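Guy's remark that the -1 row "doesn't really stand out" can be checked by counting how many of the 17 rows sit at least as far from the regression line as -1 does, and comparing that with what pure noise would produce. A Python sketch, assuming the per-row z-scores are independent standard normals when there is no real effect:

```python
from statistics import NormalDist

# Guy's "Diff (SD)" column for home-team halftime leads -8..+8 (Ed's data), from the table above.
diff_sd = {-8: 1.7, -7: -1.4, -6: -0.2, -5: -1.0, -4: 0.0, -3: 0.6, -2: 0.3,
           -1: 1.1, 0: -0.6, 1: -0.3, 2: -1.0, 3: -0.3, 4: -1.3, 5: 0.4,
           6: 3.4, 7: 0.4, 8: 1.1}

threshold = abs(diff_sd[-1])  # how extreme the -1 row is
observed = sum(1 for z in diff_sd.values() if abs(z) >= threshold)

# Two-tailed chance that a single row lands this far out by noise alone.
p_each = 2 * (1 - NormalDist().cdf(threshold))
expected = len(diff_sd) * p_each
print(f"|z| >= {threshold}: observed {observed}, expected under noise {expected:.1f}")
```

About 4.6 rows would be expected that far out by chance versus 6 observed, so -1 is one of several comparable deviations; only +6 (3.4 SD) is extreme on its own.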


#40    MGL      2009/03/19 (Thu) @ 01:40

“Interesting that the home team overperforms when down 1, 2, or 3, but underperforms when tied or up by 1 to 4 points.”

You have to understand the way basketball works.  It is not like baseball.  When any team gets a big lead towards the end of a game, they will pull their starters, so big leads shrink, on the average.  So, a team that is 10 points better than their opponent will win, on the average, by only 8 points.  A 15 point difference in true talent results in only a 10 point average win, etc.

So…

The home team, when up at halftime, does not “underperform” in the second half.  They just have more large leads, and the average point differential in the second half “shrinks.” When behind in the first half, they have fewer big leads, and in fact the road team has more large leads (late in the second half), so it will appear that the home team “overperforms” in the second half.  Comprende?

I am not “impressed” with the authors’ research.  I do not have the statistical skills to critique it one way or another.  The results are impressive.  That may or may not have anything to do with the authors other than they happened to be the authors.

The authors were wrong to say teams 1 point behind are more likely to win (than lose)—that it’s better than being tied.  That is the big story in their paper.  But it is clearly not true at all for road teams, and it looks like home teams win about equally at -1 and tied.  So their conclusion is incorrect.  You don’t disagree with that, right?

Are you kidding, or did you not read their study carefully, or have you already forgotten the data they presented?  They say that “Teams that are behind by a point end up winning more often than they lose,” which is 100% correct (in their data set), but since that is not true for road teams (and they did not even break up the data into home/road), they are wrong in that statement?  Huh????  If it turned out that teams with green uniforms won more often than they lost with a 1-pt deficit, but all other teams did not, would their statement have been incorrect as well?  How about good teams versus bad teams?  Are they required to look at all subsets of teams before making the statement they did, which was 100% correct?

From their write-up in the NYT:

The team trailing by a point actually wins more often...

So, yes, in their data set, ALL teams combined that were losing by 1 point at halftime won more than they lost: 51.3% of the time, in fact. Now, whether other data in the NBA or wherever supports that statement is not the authors’ concern, is it? And they don’t split the data up by home/road, so that is of no concern to them either, although it probably should have been, in order to gain more insight into what might be going on.

And yes, they also said that it is better than being tied, which is of course 100% true, since by definition a team that is tied at halftime will win 50% of the time (both teams are tied).  So if all (home/away, green/red, old/young, etc.) teams that are behind by 1 pt win 51.3% of the time and teams that are tied win 50% of the time, I guess that it is “better to be down by a point than tied.” Are we on the same planet here, Guy?  As it turns out, that appears to be true only for the home team, but the authors did not look at the home/road splits. There are many other subsets of “types of teams” for which it may or may not be true.

And I said I was “dubious” that this was a function of player motivation.  I didn’t “rule it out.” But the authors very much “ruled it in” as the explanation, and dismissed other possible causes with little evidence or even compelling logic.  I think we both agree their explanation is unlikely, and for some of the same reasons.  You seem impressed by their experimental findings—which surprises me given your interest in game theory— while I don’t think it shows what they say it does. But I’m not sure what we’re arguing about....

I don’t know that I agree that their conclusion is “unlikely.” Let’s say I am agnostic about it.

Yes, you did say “dubious,” so if I said or implied you said otherwise, I apologize and I was wrong.


#41    Guy      2009/03/19 (Thu) @ 06:20

MGL:  I am simply saying that I don’t think it’s true that being slightly behind makes a basketball team more likely to win than being tied.  That’s pretty clearly not true in the NBA, and I think would not be true in a larger NCAA sample.  Remember, Brian’s NCAA sample did NOT show that.  Maybe further research will support their stronger conclusion for college basketball, but I think that is very unlikely. 

Obviously, their finding is “correct” for their own data set.  Why would I challenge that?  Sheesh.....

And yes, I understand that late game performance in basketball depends on the score. Perhaps “overperformance” isn’t a good shorthand here.  I was just observing the surprisingly sharp break (to me) between tied/ahead and trailing. 

Again, I don’t think we really disagree here....


#42    MGL      2009/03/19 (Thu) @ 15:15

Guy, I don’t want to beat a dead horse, and you are a smarter guy (pun intended) than I am, but when you “dig your heels in” you tend to say some “incorrect” things.  Trust me, I know from whence I speak.

You said:

The authors were wrong to say teams 1 point behind are more likely to win (than lose)—that it’s better than being tied.

Now, come on, why would they be wrong to say that when that is what their data clearly showed????!!!!!!

Sure, the correct thing to say, I suppose, is, “This is what our data showed.  We are not saying that that is going to be true for another data set, or as a population mean.  It is possible that what we found was completely or partially a statistical anomaly.”

So, I’ll take your last post as an apology, a retraction, and an admission of some poor choices of words on your part. ;-)

I agree that there is essentially nothing on which we disagree, with respect to this issue at least…


#43    Guy      2009/03/19 (Thu) @ 17:33

Ed Kupfer has done some more analysis of his data:  http://sonicscentral.com/apbrmetrics/viewtopic.php?t=2142

MGL:  Maybe you’re interpreting my use of the word “wrong” differently than I intended.  I meant only that I believe they are incorrect in their belief that teams behind by one point will win more often than teams tied at the half, not that they shouldn’t have reported their data.  I mean, they call the study “When Losing Leads to Winning”; they are not just saying a team behind does better than you might expect, but literally better than if it weren’t behind.  Given all the data we’ve now seen, I believe a look at a larger NCAA dataset would show that -1 teams overall win less than 50% of the time.  And I believe their estimate of the benefit gained from being down one (a 75-point increase in win%) is far too high, especially if only home teams enjoy this gain.

I wasn’t really commenting on whether they were suitably cautious in presenting their data.  As it happens, I don’t think they were.  If you consider how small their sample was, and then consider the way they presented their data and how strongly they argue that “effort” is the sole cause, I think they overstated their case quite a bit. (And Wolfers managed to hype it even more at Freakonomics.) But that really wasn’t my point.

But you’re raising a larger issue which is fair and important.  When we criticize this kind of paper, it’s important to look for valuable insights and not just dismiss it because it appears to have flaws—even if those flaws are real.  You did that here and found something interesting as a result, which is great.  It’s good to be reminded of that. 

I will make only one excuse for my knee-jerk skepticism about this kind of academic analysis of sports, which is that I really didn’t start out with that inclination.  I assumed that work by academics in refereed journals would be uniformly of high quality, and would teach us amateurs things we didn’t already know.  But a few years of reading a lot of these articles has left me shocked at the generally poor quality, not only in terms of ignorance about sabermetrics, but also basic errors of logic and poor judgment.  So I approach this kind of article now with very low expectations, and I think deservedly so.  But that doesn’t mean we shouldn’t look for nuggets of insight in this kind of work, as you’ve done in this case.

And no, that’s not an “apology” :>)


#44    2009/03/19 (Thu) @ 18:04

More from Freakonomics:

http://freakonomics.blogs.nytimes.com/2009/03/19/when-winning-leads-to-winning-a-response/


#45    MGL      2009/03/19 (Thu) @ 18:08

One important thing to keep in mind, to go along with what Guy is saying (and we take it for granted on this site), is that when you find a sample result that is anomalous, the “true” result is likely much closer to the mean (in this case, the “best fit” line).  I would hope that they know that if they replicated their study with another data set, they would likely find a result that was not nearly as dramatic as the one they found.  Otherwise known as “regression toward the mean.”
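Regression toward the mean can be sketched as a weighted average of the observed value and the best-fit value, with the weight (the reliability) set by how much true spread we believe exists relative to sampling noise. This is the standard normal-normal shrinkage formula; the inputs reuse Ed's home -1 figures (.554 observed, .528 from Guy's regression, ubelmann's rough .02 SD), and the 0.01 spread of true deviations is a purely hypothetical assumption:

```python
def regress_to_fit(observed, fitted, sampling_se, prior_sd):
    """Shrink observed toward fitted.
    reliability = prior_var / (prior_var + noise_var); estimate = fitted + reliability * gap."""
    prior_var = prior_sd ** 2
    noise_var = sampling_se ** 2
    reliability = prior_var / (prior_var + noise_var)
    return fitted + reliability * (observed - fitted), reliability


# prior_sd=0.01 is a made-up illustrative value, not estimated anywhere in the thread.
est, rel = regress_to_fit(observed=0.554, fitted=0.528, sampling_se=0.02, prior_sd=0.01)
print(f"reliability {rel:.2f}, regressed estimate {est:.3f}")
```

With these assumptions only a fifth of the observed .026 gap survives, so a replication would be expected to land near .533 rather than .554. A larger assumed true spread would keep more of the gap.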


#46    Guy      2009/03/19 (Thu) @ 20:41

I find the authors’ response at Freakonomics disappointing.  They break out the data in a graph for home teams, which is helpful, and fit a more conventional curve to the data.  They say this reaffirms their finding, saying:  “For example, when the home team is ahead by one point, they end up only winning 57.5 percent of games while we would have expected them to win 65.6 percent of games.” Notice the “for example,” implying this is just one of perhaps several supporting pieces of evidence.  But they don’t mention that:
* a -1 home team is above the line, but it’s not clear that the difference is statistically significant;
* a tied home team exceeds expectations just as much as a -1 home team;
* there are several other scores as far from projected as the -1 home team.  (And yes, I think the authors should explain why those don’t cast doubt on the importance of the -1 results.)

They also repeat the claim that the -1 teams make most of their gains soon after the halftime break.  But if you look at their graph, it really looks like a steady climb over the first 12 minutes.  Could this effort effect really last that long, over countless score changes?

Interestingly, the stronger case in their data is actually for a -1 road team—exactly the opposite of what the NBA data suggests.  Curiouser and curiouser....


#47    MGL      2009/03/19 (Thu) @ 22:40

While I don’t have much criticism of their study, other than the couple of points I have already made, their response was just a rehashing of their original article and study.

The bottom line is that we have a relatively small sample of games that suggests that something is going on at one of the data points.  Absent an a priori or even after-the-fact “explanation” for that anomaly, we likely have nothing more than an anomalous data point among many.

However, given a reasonable explanation for the effect found, either a priori or after the analysis of the data, we are much less certain that the anomalous data point is nothing more than a statistical aberration.

That is about it, really.  The “explanation” for the aberration is important in these kinds of analyses.  It becomes a Bayesian problem once we introduce another probability, which we are essentially doing with an “explanation.” The more likely the explanation (the higher the a priori probability), the more likely that the effect seen in the data is “real” (and tied to the explanation, of course).  This is basic statistics!

An analogy in baseball which I use all the time, when I get into discussions about data mining and cherry picking with laypersons is this:

I look at pitcher splits and I find that power pitchers do better at night than finesse pitchers, and I also find that power pitchers do better on odd days than on even days.  Both results are around 2 SD from the norm.  Maybe I looked at hundreds of splits, such that I would expect to find a few that deviated by 2 SD by chance alone.
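The "hundreds of splits" point is a multiple-comparisons problem. A Python sketch, assuming the splits are independent and every one of them is pure noise (both simplifications):

```python
from statistics import NormalDist


def false_alarms(num_splits, z_cut=2.0):
    """Expected number of pure-noise splits beyond +/- z_cut, and the chance of at least one."""
    p_each = 2 * (1 - NormalDist().cdf(z_cut))  # two-tailed probability per split
    expected = num_splits * p_each
    at_least_one = 1 - (1 - p_each) ** num_splits  # assumes independent splits
    return expected, at_least_one


for n in (10, 100, 300):
    exp_count, p_any = false_alarms(n)
    print(f"{n} splits: expect {exp_count:.1f} beyond 2 SD, P(at least one) = {p_any:.2f}")
```

With 100 noise-only splits, about 4.5 would be expected to land beyond 2 SD, and finding at least one is nearly certain.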

Am I correct in making the same conclusion about both anomalous splits I found since they were both 2 SD from the norm?  Not even close!  Why?

Because now we have a Bayesian problem.  A priori, what are the chances that power pitchers will do better (true-talent-wise, in an infinite sample size) on odd days than even days, using common sense and everything we know about baseball?  Almost zero.  If we do a Bayesian analysis where the a priori probability is almost zero, and then, after looking at a sample, the result is 2 SD from the norm, guess what?  The chance that this occurred by chance alone is near 100% rather than 2.5%!  Amazing!  (The 2.5% does not consider the data mining, of course; considering that we looked at lots of splits, we have another a priori probability in our Bayesian formula.)

Anyway, in the second split, the day/night, we can construct an assumption that batters have a tougher time seeing pitches from power pitchers at night than finesse pitchers.  If we assign some non-zero a priori probability of that assumption being true (without considering the data - the two probabilities have to be independent), before looking at the data (hence, a priori), we will get a completely different number for the probability that the numbers we saw in the data occurred by chance.  We can assign that probability after looking at the data, but since it is supposed to be independent of the data results themselves, we have to be careful.

Whether you like their assumption about “teams or individuals who are slightly behind often show extra effort in a game or contest,” and whether you “think” it applies to NCAA basketball or not, and whether you think it should show up at a 1 pt deficit, but not a 2 or 3 pt deficit or not, you have to admit that there is a non-zero chance that it could manifest itself in NCAA basketball and you have to admit that there is evidence in the experimental literature that it is a legitimate phenomenon in certain settings.  That non-zero chance changes everything.  We are NOT talking about ONLY the chances of this data point coming up how it did.  We have another probability to contend with.

I’ll give you another example, which should make this crystal clear.  Say we do a study on blocked shots and height in basketball, and we find a strong positive correlation, but in a small sample of performance the result is only 2 SD from zero.  Do you think that the chance of this result occurring completely by chance (meaning we made a Type I error) is 2.5%?  I would hope not!  The reason is that we have another Bayesian problem, with the a priori probability that height makes a difference in blocked-shot rate (by how much is another story, of course) at around 100%.
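The Bayesian argument can be sketched with the simplest possible model: two point hypotheses ("no effect" versus "a true effect of 2 SD") and a normally distributed observation. The prior probabilities attached to the three examples are invented for illustration, and a real analysis would use a distribution of effect sizes rather than a single alternative:

```python
from math import exp


def posterior_real(prior, observed_z, effect_z):
    """P(effect is real | data) for two simple hypotheses:
    H0: true deviation is 0 SD; H1: true deviation is effect_z SD.
    The observed z is modeled as Normal(mean, 1) under each hypothesis."""
    # Likelihood ratio of N(effect_z, 1) to N(0, 1) at observed_z (shared constants cancel).
    lr = exp(-0.5 * (observed_z - effect_z) ** 2) / exp(-0.5 * observed_z ** 2)
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)


# Same 2 SD observation, three hypothetical a priori probabilities.
for label, prior in (("odd/even days", 0.001), ("day/night", 0.3), ("height/blocks", 0.99)):
    print(f"{label:14s} prior {prior:5.3f} -> posterior {posterior_real(prior, 2.0, 2.0):.3f}")
```

The same 2 SD observation leaves the odd/even-day effect under 1% likely, while the day/night and height examples end up well above 50%; the data are identical and only the prior differs.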


#48    ubelmann      2009/03/20 (Fri) @ 00:23

The graph that Wolfers presents on the Freakonomics blog just seems to raise more questions for me.  The -7 point is just as significant as the +/- 1 data points, and the -4 point is just as significant as the -1 data point.  Why don’t we have explanations for those magic numbers?

While I agree that there are some anomalies in there, and it’s something that seems interesting to look into, it doesn’t seem like anyone has a very good theory for why the data is the way that it is, so I’m pretty unsatisfied.


#49    MGL      2009/03/20 (Fri) @ 01:07

#48, we’ve discussed that already; read my post #47.  If the authors or anyone else had a nice theory (other than just pulling one out of their *ss) about why teams behind by 4 or 7 points would end up winning more than they should, then those points would have a lot more significance than just being anomalous data points, as I explain above (the -4 and the -7 are the odd/even days and the -1 is the day/night split).  The fact that the authors have a theory about -1 games (or at least close games) that has some support in the psychological literature makes all the difference in the world.  THAT is the difference between the -1 pt and the -4 and -7 pts.


#50    ubelmann      2009/03/20 (Fri) @ 03:33

I guess it seems to me that none of the authors’ arguments ought to apply exclusively to -1 point leads.  It doesn’t intuitively make sense that teams would act all that differently with a 1-point deficit than with a 2-point deficit or a tie game.  So their underlying reasoning, to me, is lacking.  They have a reason for why a team may react differently to a deficit than a tie game, but no reason for why a 1-point deficit in particular is special.

And since the data seems to show that other point differentials are just as anomalous, I’m still left wondering what is so special about a 1-point deficit.

In some sense, what I’m saying is that while the 1-point differential supports their theory, the 2-point differential doesn’t seem to fit it at all, and are basketball players really so flaky psychologically that a difference of less than a basket affects their effort levels?  Theory or not, that seems like a dubious claim to me, and it kind of seems like they are finding an effect simply because it’s the effect that they were looking for (confirmation bias.)


#51    MGL      2009/03/20 (Fri) @ 05:12

ubelmann, I sort of hate to be the authors’ apologist or spokesperson, but we (I) have already discussed the weakness of finding a large effect at -1, but none at -2 or -3.  However, you also say “it does not make intuitive sense...than with a… tie game.”

Again, their thesis, which is supported by numerous citations, is apparently well known in the field of psychology, and was the impetus for the study, is that a SMALL DEFICIT can lead to increased effort.  So, it is unfair of you to throw a “tie game” into the fray.  They were hypothesizing before they looked at the data that there would be an anomalous effect at a small deficit and NOT at an even game, and that is exactly what they found!  How can you criticize that?

The ONLY reasonable criticism of their conclusion, other than the zeal with which they accept it to the exclusion of anything else, is that it would have been stronger support of their thesis had they found a similar effect at a 2 or even 3 point deficit, as it is intuitively unreasonable, to paraphrase your own words, to think that such an effect or syndrome would manifest itself at exactly 1 point but not at all at 2 or 3 points.  And I don’t say that with much conviction, as I think the authors would take exception to that sentiment (I think they would say that, given what they know about this “syndrome,” it is entirely likely that an effect would show up at a 1 pt deficit but not at 2 pts or more).

In addition, the “blip” at exactly one point suggests something fundamental about the structure of the game that may be causing such an anomalous result, and it casts doubt on the psychological thesis they posit, at least in my opinion and that of several other smart people on this and other blogs.

Given all of that, the authors should have been more cautious with their conclusions and should have recommended further study with additional and larger databases.  I have not said this before, but I also think that some of their recommendations to the “real world” are silly and misguided, such as lying to employees by telling them that they are in close danger of being fired or overtaken by another employee or some such thing (I don’t remember exactly what they said). Academic researchers sometimes seem compelled to offer these real-life suggestions based on their research, especially when the research is esoteric, as in this case, as if just doing esoteric research were not enough. Maybe there is pressure from their universities; I don’t know.  In this case, their recommendations seemed contrived to me.


#52    Guy      2009/03/20 (Fri) @ 07:31

BTW, here are numbers for MLB home teams 1977-2006 in games after 4 complete innings:
-1 .371
0 .529
+1 .676
So -1 is slightly farther from tied (-.158) than being up 1 (+.147).  Really no difference.
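Guy’s comparison is just two subtractions. A minimal Python sketch using only the three home-team win percentages he quotes (the variable names are illustrative):

```python
# Home-team win probability after 4 complete innings, MLB 1977-2006 (Guy's figures)
wp_after_4 = {-1: 0.371, 0: 0.529, 1: 0.676}

# Win probability separating each one-run state from a tie
cost_of_trailing = wp_after_4[0] - wp_after_4[-1]   # down 1 vs tied
gain_from_leading = wp_after_4[1] - wp_after_4[0]   # up 1 vs tied

print(f"down 1 vs tied: {cost_of_trailing:.3f}")
print(f"up 1 vs tied:   {gain_from_leading:.3f}")
print(f"asymmetry:      {cost_of_trailing - gain_from_leading:+.3f}")
```

The asymmetry comes out to about .011, which is the “really no difference” Guy refers to.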

MGL:  Based on their paper, it sounds like the prior research findings they cite support the idea that people “close to their goal” have an incentive to work harder, and also that people hate losing (loss aversion, the phenomenon that causes people to hold poorly performing stocks too long, for example, because they don’t want to confront the loss).  But what does “close to the goal” mean in sports?  I’d say it means close to winning.  So the question is: is it plausible that players have this sense of being close to victory (and are determined to avoid the emotional pain of a loss) only when trailing by one, but not when down by 2?  Speaking just for my own intuition, I find the one-point-only proposition unlikely.  And as for what the authors were looking for when they began, I think it’s hard for us to know.

BTW, I think extra effort in a tie would also be consistent with the research, but it’s impossible to see in this data since both teams have the identical motivation.  Their experiment, in fact, does show better performance by both players when tied.

I agree that the authors don’t have to explain discrepancies other than -1, because that’s not their interest.  But the existence of similar size discrepancies at other scores SHOULD alert them to the fact that their sample size really isn’t large enough for them to be this confident in their conclusions—they don’t seem to want to get that (loss aversion?).

MGL, I’m interested in why you find their experimental results convincing.  Given the design of their game, it seems logical a player slightly behind would increase their speed.  In fact, every group improves in game 2, suggesting players don’t optimize their speed playing just once (seems reasonable).  If you want to measure effort, better to look at something where extra effort involves no possible penalty—like a 40-yd dash or something.  But maybe I’m missing something.  Why do you like this experiment?


#53    Guy      2009/03/20 (Fri) @ 14:28

Interesting post over at Freakonomics:  http://freakonomics.blogs.nytimes.com/2009/03/19/when-winning-leads-to-winning-a-response/#comment-396253

A poster named Hugh Critz has 25,000 college basketball games in his database, and shows these results (percentage of games won by the trailing team):

deficit | all games | games with pointspread 7.5 or less
5 | 27.4% | 29.7%
4 | 34.9% | 38.1%
3 | 38.1% | 40.9%
2 | 42.5% | 45.0%
1 | 48.0% | 48.1%
tied | 50% | 50%

Consistent with the idea that a one point lead is worth less than other one point gains, but not that it has zero or negative value as the original study found.
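One way to check Guy’s reading is to difference adjacent rows: how much win % the trailing team gains each time its deficit shrinks by one point. The Python sketch below does that with only the figures quoted above; `point_values` is a hypothetical helper name, and the step from down 1 to tied stands in for the value of the last point of a deficit.

```python
# Win % for the trailing team, keyed by deficit (Hugh Critz's college data above)
all_games = {5: 27.4, 4: 34.9, 3: 38.1, 2: 42.5, 1: 48.0, 0: 50.0}
close_spread = {5: 29.7, 4: 38.1, 3: 40.9, 2: 45.0, 1: 48.1, 0: 50.0}

def point_values(table):
    """Win % gained when the deficit shrinks from d to d - 1, for d = 5 down to 1."""
    return {d: round(table[d - 1] - table[d], 1) for d in range(5, 0, -1)}

for label, table in [("all games", all_games), ("spread <= 7.5", close_spread)]:
    print(label, point_values(table))
```

In both tables the down-1-to-tied step (2.0 and 1.9 points) is the smallest, but it is still positive.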


#54    Tangotiger      2009/03/20 (Fri) @ 14:57

I for one am glad that the thousands of bloggers peer-reviewed this (and provided a far larger dataset), and not the four or five academicians that are part of the “official” peer review process.

***

For a baseball theme, you can compare probability theory:
http://tangotiger.net/innwin3.html
to empirical:
http://tangotiger.net/innwin2.html

And you will see some differences in the ninth inning, when closers are used and/or when one-run strategies are used.


#55    ubelmann      2009/03/20 (Fri) @ 15:12

Tango: Having some experience in physics-flavored academia, I have no problem saying that the academic peer-review process could use an overhaul.  The biggest problem, one that seems difficult to get around, is that faculty members have so many things on their plate (research, teaching, writing grants, giving talks, sometimes traveling to give talks, going to conferences, advising students, department administration, etc.) that peer reviewing papers gets stuck way at the bottom of their priority list.  And while it is understandable that someone would want to focus on their own research rather than spend the time to be really critical of others’ research (focusing on construction rather than destruction, in a way), it can significantly water down the peer review process.

MGL: I guess I would have been a lot more satisfied if they had some reason to believe that 1 point was “small” and 2 points was “large.” I don’t think we really disagree in that I mainly think that they were too eager to accept their hypothesis that this was mainly a psychological effect without going into at least some discussion about whether or not this could be an artifact of strategic changes.


#56    MGL      2009/03/20 (Fri) @ 16:44

“I don’t think we really disagree in that I mainly think that they were too eager to accept their hypothesis that this was mainly a psychological effect without going into at least some discussion about whether or not this could be an artifact of strategic changes.”

Absolutely!  My major criticism.

Guy, I don’t necessarily “like” their experiment.  In fact, I barely remember the details.  I only mention it to point out that their thesis is not something that they just pulled out of their a**es.  It is apparently a known effect, which, as I explained in detail (about Bayesian probabilities), adds strong support to the reliability of their data anomalies, at least in close games.

Anyone who posts the results of more data MUST break it down by home/road.  The effect appears to manifest itself only for the home team (which is not that surprising if you know much about basketball and basketball handicapping).  If you combine home and road data, you will dilute the effect and add a lot of noise to the data.  For example, if the new poster’s data on freakonomics shows a small or slight effect, I would venture a guess that if he showed us the results just for the home team, that it would evince a much larger effect.


#57    Guy      2009/03/20 (Fri) @ 16:59

MGL:  I agree home and road splits would be nice, but the poster didn’t have that available. It’s still useful.  We now have 3 estimates of NCAA +1 win%: 49% in the original study, and two showing 52% (Brian and this new post).

Moreover, the original study does NOT show this to be a home team effect.  On the contrary, their data shows that it’s road teams that most exceed expectations when down by one. See Wolfers’ new post that I linked to in #53 above.  So I think you’ve been too quick to limit this to home teams only.  That may be true in the NBA (though I’m not sure the effect in your and Ed K’s data is large enough to be sure), but we don’t know about NCAA.


#58          2009/03/22 (Sun) @ 08:44

I was the poster with the larger dataset.  I’m glad I was forwarded this link, the participation here is interesting.

As for breaking down home/away, that is not possible for me to do.  The reason is that I did not accurately log home/away for my data.  Unlike baseball and the NBA, where teams almost never play on a neutral field, college basketball teams often play on neutral courts, especially in early-season tournaments and post-season tournaments.  To top it off, it gets trickier because some neutral courts are de facto home courts.  The University of Washington playing an NCAA Tournament game against Akron (Ohio) in Portland, Oregon basically made it a de facto home game as far as the fans and travel were concerned (but not as far as your own locker room and knowing the floor, if there is an effect in that). 

Anyway, the short of it is that I cannot break down my data into home/away.


#59    MGL      2009/03/22 (Sun) @ 21:47

A fella named “King Yao” was kind and generous enough to send me a dataset from the 93 - 08 NBA seasons, with quarter by quarter scores.

That is by far the largest NBA data set I have been able to use to look at the results we have been talking about.

We find the same thing as before: a large disconnect when the road team is up by 1 pt at the half and a smaller disconnect when the home team is up by 1. 

The home team has 707 games where the score was tied at the half.  They win 54.6%.  They have a pt differential of 1.5 in the second half.  IOW, they outscore the road team by 1.5 pts in the second half after playing them even in the first half. Keep in mind that home teams are usually 4.5 pt faves, on the average.  A home team that is tied at the half was probably a 3-4 pt fave or so for the whole game, so a 1.5 pt differential in the second half is perfectly normal.

With a 2 pt lead, the home team was probably around a 4 to 4.5 pt fave overall.  They win 62.5% of the time and outscore the road team by 1.3 pts which seems a little low, but that number will tend to shrink as a team has a bigger half time lead (especially if they are the better team) because when they have a large lead late in the game, they will pull their starters.

Let’s jump back to the 1 pt lead at halftime by the home team.  They are probably a 4 pt fave before the game starts.  They should probably outscore the road team by 1.5 to 2 pts in the second half.  And they should win somewhere between 54.6% and 62.5% of the time.

They win 57.1%, which seems reasonable, but they outscore the road team by only .8 pts, which is quite a bit lower than we would expect.  We are talking about only 739 games, so that is probably nothing to get excited about.

As I said, we only see a muted effect with the home team being ahead by 1 pt at the half.  That could easily be explained by random statistical fluctuation, I would think.

The large effect we see, again, is when the road team is ahead by 1 pt, or at least for the road team in general, as you will see if you continue reading.

When the game is tied, remember the road team only wins 45.4% of the time, because they are still the worse team by probably 3-4 points for the whole game.  And again, they are outscored by 1.5 pts in the second half, about what we would expect.

With the road team up 2 pts at the half, they win only 47.3% of the time and they are outscored by 2.5 pts in the second half.  We are actually seeing an effect here, I think.  If the road team is up by 2 pts at the half, we would expect them to be better than an average road team relative to the home team, and we would not expect them to be outscored in the second half by more than a point or 3.  Remember that in a tied game, they only get outscored by 1.5 pts.  So I think we are actually seeing an effect here with a 2 pt lead, which did not show up in the NCAA data in the original study.

Anyway, if we jump back to a 1 pt lead for the road team, we appear to be seeing the large effect we have been “expecting” (if the authors’ hypothesis has some significant merit):  Remember that in a tie game, the road team wins 45.4%.  With a 2 pt lead, they win 47.3%, and with a 3 pt lead, which I have not mentioned yet, they win 51.8% (finally more than 50%).  So we expect the 1 pt lead to be somewhere between 45.4% and 47.3%, and the second half pt differential to be between -1.5 and -2.5 pts.

In fact, with a 1 pt lead, the road team only wins 41.8% of the time and is outscored in the second half by 2.5 pts again.
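How surprising is .418? A rough check is a binomial z-score, assuming the expected rate for a road team up 1 is the midpoint of the tied (.454) and up-2 (.473) rates and treating each game as an independent coin flip. This Python sketch works under those assumptions only; it ignores the uncertainty in the .454 and .473 figures themselves, so it somewhat overstates the evidence.

```python
import math

def z_score(observed_wp, expected_wp, n_games):
    """Binomial standard errors between the observed and expected win %."""
    se = math.sqrt(expected_wp * (1 - expected_wp) / n_games)
    return (observed_wp - expected_wp) / se

# Road team up 1 at the half: 710 games, won 41.8%.
# Assumed expectation: midway between the tied (.454) and up-2 (.473) rates.
expected = (0.454 + 0.473) / 2
z = z_score(0.418, expected, 710)
print(f"expected {expected:.4f}, z = {z:.2f}")
```

With these inputs the road team up 1 sits about 2.4 standard errors below the assumed expectation.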

If we go down the list of pt differentials, no matter what the lead by the road team, it looks like they allow around 2.5 pts more than they score in the second half.  The only time that is not true is in a tie game.

So, honestly, maybe it is the tie game that is the anomaly and not the 1 pt lead, for the road team.  If we focus on the road team alone and change the tie-game data point, we probably change the best fit line completely, so whether the 1 pt lead fits on that line depends on whether the tie-game data point is the anomaly or not.

Here is some of the data, and you can draw any conclusions you want:

lead at half | 2nd half pt diff | final wp | N games

Home
0 | 1.5 | .546 | 707
1 | .8 | .571 | 739
2 | 1.3 | .625 | 717
3 | 1.3 | .646 | 727
4 | 1.8 | .662 | 777
5 | 2.7 | .714 | 674

Road
0 | -1.5 | .454 | 707
1 | -2.5 | .418 | 710
2 | -2.5 | .473 | 656
3 | -2.3 | .528 | 614
4 | -2.4 | .555 | 589
5 | -2.2 | .611 | 568

There also seem to be too many games where the road team has a 12 pt lead. There might be something about the game of basketball that causes that for some reason. I have no idea.

Actually if we put the two tables together from the standpoint of the road team, we get:

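Road

lead at half | 2nd half pt diff | final wp | N games
-5 | -2.7 | .286 | 674
-4 | -1.8 | .338 | 777
-3 | -1.3 | .354 | 727
-2 | -1.3 | .375 | 717
-1 | -.8 | .429 | 739
0 | -1.5 | .454 | 707
1 | -2.5 | .418 | 710
2 | -2.5 | .473 | 656
3 | -2.3 | .528 | 614
4 | -2.4 | .555 | 589
5 | -2.2 | .611 | 568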

You could make the argument that the anomaly is in tie games and when the home team is up by 1 pt, I think. I am really not sure, other than the fact that there does not appear to be a smooth relationship between lead at the half and the second half pt differential or final wp.
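A crude way to look for points that break a smooth relationship is to fit a straight line to the road team’s final wp across halftime leads of -5 to +5 and express each residual in binomial standard errors. The Python sketch below assumes an unweighted least-squares line (a real win-probability curve is S-shaped, so a line is only a local approximation over this range) and independent games; `fit_line` and `road` are illustrative names, and the numbers are the combined road-team table above.

```python
import math

# Combined road-team table: halftime lead -> (final wp, games)
road = {
    -5: (0.286, 674), -4: (0.338, 777), -3: (0.354, 727), -2: (0.375, 717),
    -1: (0.429, 739), 0: (0.454, 707), 1: (0.418, 710), 2: (0.473, 656),
    3: (0.528, 614), 4: (0.555, 589), 5: (0.611, 568),
}

def fit_line(points):
    """Ordinary (unweighted) least-squares fit of y = a + b*x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

a, b = fit_line([(lead, wp) for lead, (wp, _) in road.items()])

residuals = {}
for lead, (wp, n) in road.items():
    predicted = a + b * lead
    se = math.sqrt(predicted * (1 - predicted) / n)  # binomial SE at the fitted rate
    residuals[lead] = (wp - predicted, (wp - predicted) / se)
    print(f"{lead:+d}  resid {wp - predicted:+.3f}  z {(wp - predicted) / se:+.2f}")
```

With these inputs the +1 row has the most negative residual, roughly 2.6 standard errors below the line, while the tie row sits slightly above it. The largest of eleven residuals will look extreme more often than a single pre-chosen one would, so this is suggestive rather than conclusive.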


#60    Tangotiger      2009/03/22 (Sun) @ 23:22

King Yao seems to be a player on the betting scene.

He is also, I think, the first person to review The Book on Amazon:
http://www.amazon.com/review/R2D0AJYBRQ89I8/ref=cm_cr_rdp_perm

MGL: can you ask him (a) whether you can forward his data to me and (b) whether his data can be posted?


#61          2009/03/23 (Mon) @ 14:33

Tango, I sent you a message through this site’s message area (I could not find your email address).

I have a few comments about the data and results.

1. The data is still pretty small, only 700ish games for any point.  When a hitter (with about 700 plate appearances a year) slugs .550 one year but slugged .500 in each of the two preceding years, with 600-700 plate appearances in each year, what is your estimate for the next year?  To clarify the question: 2006 SLG = .500, 2007 SLG = .500, 2008 SLG = .550; what do you expect for 2009?  To isolate the question, let’s please assume age is not an issue in any of those years.  My initial thought would be something around .515. 

2.  My next question is - does the NBA data relate?  We see a tie for road teams = .454, a 2-point lead = .473....but a 1-point lead = .418 .... my opinion is that the two (SLG and NBA) are somewhat similar and going forward, I think the average away team in the NBA up by 1 at the half will win more than the .418 rate and closer to the .454/.473 rates than the .418 rate.  My personal guestimate for the average road team up by 1 at the half is that they will win greater than a rate of .43 and less than .46.  Thoughts on that estimate?

3. MGL and Tango both have (or will have) the raw data, including data for the 3rd Quarter.  In the NCAA study by the Wharton guys, they mentioned that most of the gain for the team down by 1 was in the first few minutes of the 2nd half.  The NBA breaks down the 2H into quarters (NCAA does not).  Although my data is not as detailed as theirs, I would think if there is some “early 2H” action, that we may see it in the NBA 3Q data.  Can you guys look at that and see if you find anything that syncs with the Wharton guys’ data or not?


#62    Jan Suchanek      2009/03/23 (Mon) @ 15:45

The data from the 3rd Q that I have is going to raise more questions than it answers, I’m afraid. My data goes from the 97-98 season to the present (including this year). I have 1076 games in total where a team was down by 1 point at the half. 550 are Away Teams, 526 are Home. They go on to win the game 51.0% of the time.

It turns out the teams actually do better in the 4th Q than they do in the 3rd. They win the 3rd Q 51.1% of the time by an average of 0.2 points. They win the 4th Q 52.1% of the time for an average of 0.5 points.

Another possibly relevant or interesting observation: Teams that are down by exactly 1-point going into the 4th Q win the game 44.0% of the time (899 games). This is much more in line with the surrounding points: for 2-point deficits the figure is 38.1% and for 3-point deficits it’s 38.3%.


#63    Tangotiger      2009/03/23 (Mon) @ 16:12

I will definitely look at your data, thanks!

***

If your *prior* (i.e., your true spread in talent of the population) is such that all players are identical in talent, then, *regardless* of what SLG a player shows, his true SLG is exactly league average.

If your prior is such that 1 SD = .050 in SLG for your population, then you have to use that to figure out how much to regress your sample performance.  That is, it becomes a Bayes probability.  Read this important thread:
http://www.insidethebook.com/ee/index.php/site/comments/chipper_does_not_compute/

MGL mentions Bayes a lot, and then I provide data later on.  Good thread.
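Tango’s prior-based regression can be sketched with the standard normal-normal model: the estimate moves from the league mean toward the observed average in proportion to how much of the observed variance is true talent rather than noise. In the Python sketch below, the 1 SD = .050 prior comes from Tango’s comment and the three seasons from King Yao’s question; the league mean (.430) and the single-season sampling SD (.040) are hypothetical placeholders, and the seasons are weighted equally because the question sets age aside. It illustrates the mechanics, not an answer to the question.

```python
def regressed_slg(seasons, league_mean, prior_sd, season_sd):
    """Normal-normal Bayes estimate of true SLG.

    seasons     -- observed SLG for each season (weighted equally)
    league_mean -- center of the prior (population mean)
    prior_sd    -- spread of true talent in the population
    season_sd   -- random-variation SD of one season's SLG (assumed)
    """
    sample_mean = sum(seasons) / len(seasons)
    sample_var = season_sd ** 2 / len(seasons)     # noise in the multi-season average
    prior_var = prior_sd ** 2
    weight = prior_var / (prior_var + sample_var)  # share of the deviation we keep
    return league_mean + weight * (sample_mean - league_mean)

seasons = [0.500, 0.500, 0.550]
LEAGUE_MEAN = 0.430   # hypothetical league-average SLG
SEASON_SD = 0.040     # hypothetical sampling SD for one ~650 PA season

print(round(regressed_slg(seasons, LEAGUE_MEAN, 0.050, SEASON_SD), 3))
print(round(regressed_slg(seasons, LEAGUE_MEAN, 0.0, SEASON_SD), 3))
```

Setting `prior_sd` to 0 returns exactly the league mean, which is Tango’s first case; larger prior SDs keep more of the observed .517 average.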


#64    MGL      2009/03/23 (Mon) @ 17:18

That (#63) is why the “why’s” become so important when evaluating anomalous data.  They greatly affect the “prior” probabilities.  These are often nothing more than an estimate (WAG) of the chances that something could possibly make a difference, like the day/night, odd/even thing I talked about above or in the other thread.
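MGL’s point about priors is Bayes’ rule at work: the same anomaly is far more believable when there is a prior reason to expect it. A minimal Python sketch with entirely hypothetical inputs (a 1% chance of seeing the anomaly by luck alone, a 50% chance of seeing it if the effect is real, and priors of 20% versus 0.1%):

```python
def posterior_real(prior_real, p_anomaly_if_real, p_anomaly_if_chance):
    """P(effect is real | anomaly observed), by Bayes' rule."""
    numerator = p_anomaly_if_real * prior_real
    return numerator / (numerator + p_anomaly_if_chance * (1 - prior_real))

# Hypothetical: same data, different prior plausibility of a mechanism
plausible = posterior_real(0.20, 0.5, 0.01)     # mechanism with prior support
far_fetched = posterior_real(0.001, 0.5, 0.01)  # e.g. an odd/even-day split
print(f"{plausible:.3f} {far_fetched:.3f}")
```

With these made-up numbers the posterior is about .93 for the plausible mechanism and under .05 for the far-fetched one, even though the data are identical.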


#65          2009/03/23 (Mon) @ 20:24

Thanks, I’ll visit that link tonight.

There is one other issue I’d like to touch on.  Fewer NBA games end with a differential of 1 than with other small differentials.  Here is the percentage of games in which the home team wins by X (a negative X is a loss).  The home team wins by 1 about 2.1% of the time and loses by 1 about 2.1% of the time, but it wins or loses by 2 through 5 roughly 1 percentage point more often.

Home team wins by | percentage of times
5 3.7%
4 3.1%
3 3.1%
2 3.1%
1 2.1%
0 0%
-1 2.1%
-2 2.9%
-3 2.8%
-4 2.7%
-5 2.8%

Here are the first-half numbers, which show nothing unusual for first halves ending with a 1-point differential:

Home team leads at the half by | percentage
5 3.7%
4 4.3%
3 4.0%
2 4.0%
1 4.1%
0 3.9%
-1 3.9%
-2 3.6%
-3 3.4%
-4 3.3%
-5 3.1%

I don’t know exactly how this fits, but the fact that the 1s occur less often for the end game differential feels like it could be related.
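The dip can be quantified by comparing how often margins of exactly 1 occur relative to margins of 2, at the final buzzer and at the half. A minimal Python sketch using only the percentages in the two tables above; comparing against ±2 alone is an arbitrary choice, and `one_point_ratio` is an illustrative name:

```python
# Percentage of games by home-team margin, from the two tables above
final = {5: 3.7, 4: 3.1, 3: 3.1, 2: 3.1, 1: 2.1, 0: 0.0,
         -1: 2.1, -2: 2.9, -3: 2.8, -4: 2.7, -5: 2.8}
half = {5: 3.7, 4: 4.3, 3: 4.0, 2: 4.0, 1: 4.1, 0: 3.9,
        -1: 3.9, -2: 3.6, -3: 3.4, -4: 3.3, -5: 3.1}

def one_point_ratio(table):
    """Average frequency of +/-1 margins divided by that of +/-2 (1.0 means no dip)."""
    ones = (table[1] + table[-1]) / 2
    twos = (table[2] + table[-2]) / 2
    return ones / twos

print(f"final: {one_point_ratio(final):.2f}")
print(f"half:  {one_point_ratio(half):.2f}")
```

With these tables the final-score ratio is about 0.70 while the halftime ratio is about 1.05, so the shortage of one-point margins shows up only at the end of the game.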


#66    Jan Suchanek      2009/03/23 (Mon) @ 22:06

#65 - “I don’t know exactly how this fits, but the fact that the 1s occur less often for the end game differential feels like it could be related.”

My first instinct says that they are not. Games don’t land on One because of the way the end of the game plays out, i.e., endgame strategy is such that the trailing team is “aiming” for OT as its goal. The most common instance of this, I am guessing, is that a team down by Three on the final possession will almost never lose by One, since it will never attempt a 2-point field goal. As evidence for this, 6.3% of all games over the past 11 years have gone into OT, more than land on the surrounding numbers of One, Two or Three.

I don’t see how this can relate to the One at the end of the 1H, since no such phenomenon exists there.

