Tuesday, March 24, 2009
Being Behind is a Good Thing (Part III)?
Using King Yao’s data, here is how the home team does if you look at their first-half scores:
half homeScore homeWins n
1 -7 0.413 1103
1 -6 0.419 1265
1 -5 0.455 1478
1 -4 0.463 1518
1 -3 0.515 1618
1 -2 0.539 1844
1 -1 0.575 1865
1 0 0.607 1918 <--
1 1 0.600 1996 <--
1 2 0.629 1923
1 3 0.646 1908
1 4 0.690 1756
1 5 0.704 1680
1 6 0.739 1542
1 7 0.754 1440
We see the discontinuity when the home team is up by 1 or tied at the half.
Now, let’s look at how the home team does if you ONLY look at the second half scores. That is, assume that the game starts at the 3rd quarter. Here then is how the home team does, based on their score in the second half of the game, and how often they won:
half homeScore homeWins n
2 -7 0.451 1164
2 -6 0.445 1329 <--
2 -5 0.486 1431 <--
2 -4 0.486 1513
2 -3 0.522 1680
2 -2 0.552 1795
2 -1 0.558 1884
2 0 0.597 1946
2 1 0.614 1963
2 2 0.654 2001
2 3 0.657 1886
2 4 0.687 1665
2 5 0.725 1703
2 6 0.728 1627
2 7 0.770 1331
We have discontinuities at different points, but also a lot of close calls too. For example, if they win the second half by 5 or 6 points, their chances of winning the game are virtually identical. Same for scoring 2 or 3 more points in the second half.
Remember, we didn’t look to see how well they did in the first half. There’s no reason that scoring 5 or 6 points in the second half should be biased based on the first half score, should it?
I’ll repeat the first half chart, this time adding a straight line regression, and the difference between the empirical and the regression line:
half homeScore homeWins regression diff
1 -7 0.413 0.407 0.006
1 -6 0.419 0.432 -0.013
1 -5 0.455 0.457 -0.002
1 -4 0.463 0.482 -0.019
1 -3 0.515 0.508 0.007
1 -2 0.539 0.533 0.006
1 -1 0.575 0.558 0.017
1 0 0.607 0.583 0.024
1 1 0.600 0.608 -0.008
1 2 0.629 0.634 -0.005
1 3 0.646 0.659 -0.013
1 4 0.690 0.684 0.006
1 5 0.704 0.709 -0.005
1 6 0.739 0.734 0.005
1 7 0.754 0.760 -0.006
The standard deviation of the differences is .012.
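For anyone who wants to check the first-half numbers, here is a minimal sketch. It assumes the line is an ordinary unweighted least-squares fit through the 15 win rates (the sample sizes are ignored) and that the standard deviation is the sample (n-1) version. Both are assumptions, though they appear to reproduce the figures in the chart above. The names `first_half` and `fit_line` are made up for illustration.

```python
from statistics import mean, stdev

# First-half rows from the chart above: (home margin at the half, home win rate)
first_half = [
    (-7, 0.413), (-6, 0.419), (-5, 0.455), (-4, 0.463), (-3, 0.515),
    (-2, 0.539), (-1, 0.575), (0, 0.607), (1, 0.600), (2, 0.629),
    (3, 0.646), (4, 0.690), (5, 0.704), (6, 0.739), (7, 0.754),
]

def fit_line(points):
    """Unweighted least-squares line through (x, y) points: returns (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(first_half)

# Difference between the empirical win rate and the fitted line, row by row
residuals = [y - (intercept + slope * x) for x, y in first_half]

print(round(intercept, 3), round(slope, 4), round(stdev(residuals), 3))
```

Swapping in the second-half win rates gives the corresponding second-half figures.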
Now, here it is for the second half scores:
half homeScore homeWins regression diff
2 -7 0.451 0.431 0.020
2 -6 0.445 0.454 -0.009
2 -5 0.486 0.478 0.008
2 -4 0.486 0.501 -0.015
2 -3 0.522 0.525 -0.003
2 -2 0.552 0.548 0.004
2 -1 0.558 0.572 -0.014
2 0 0.597 0.596 0.001
2 1 0.614 0.619 -0.005
2 2 0.654 0.643 0.011
2 3 0.657 0.666 -0.009
2 4 0.687 0.690 -0.003
2 5 0.725 0.713 0.012
2 6 0.728 0.737 -0.009
2 7 0.770 0.760 0.010
The standard deviation of the differences is .011.
It looks to me that the deviations are noise, and not related to anything beyond that. Certainly, there’s nothing really distinguishing the 1st half from the 2nd half.
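One rough way to judge whether deviations of this size look like noise is to compare them with the binomial sampling error of each win rate. The sketch below assumes each game is an independent trial, which is a simplification, and the helper name `binomial_se` is made up for illustration. It uses three rows from the first-half chart.

```python
from math import sqrt

def binomial_se(p, n):
    """Standard error of an observed win rate p over n games, assuming independent games."""
    return sqrt(p * (1 - p) / n)

# (home margin at the half, home win rate, number of games), taken from the first-half chart
rows = [(-7, 0.413, 1103), (0, 0.607, 1918), (7, 0.754, 1440)]

for margin, p, n in rows:
    print(margin, round(binomial_se(p, n), 3))
```

With sample sizes like these, the sampling error per row comes out at roughly .011 to .015, the same order as the .011 to .012 spread of the differences from the regression line.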


“There’s no reason that scoring 5 or 6 points in the second half should be biased based on the first half score, should it?”
Not exactly sure what you are asking here ("scoring 5 or 6 points?"), but in basketball the score and score differential in the first half are very much related to what happens in the second half.
Funny how people who follow basketball and know intimately how it “works” seem to be in almost unanimous agreement that there is something going on, which is not surprising, since basketball does not have nearly the “independence” that baseball has, while people who are just “numbers people” attribute all the anomalies to noise (or at least say that that is most likely).
Guess what? I am 95% sure that the numbers guys are wrong. Let this be a lesson. At least it should be. The numbers people are being hoisted by their own petard! They are the ones that know about Bayesian probabilities, but because they are ignorant of the unique characteristics of basketball and the potential for real effects to be causing these anomalies, they overlook, ignore, or understate the a priori probabilities.
The basketball guys get the conclusion right (most likely) because they are unknowingly using proper Bayesian analysis, which the numbers guys are not.
I repeat (and repeat and repeat and repeat), you can’t just look at the numbers in a vacuum! I don’t know how to stress that any more than I am trying.
It is just like the odd/even days and day/night example I gave earlier. You CANNOT look at a 2.5 SD anomaly in pitcher ERA for power and finesse pitchers and conclude that “there must be something going on” and you cannot look at a 1.5 SD in day/night splits and conclude that it is just noise without knowing or estimating some probability that day/night splits for finesse and power pitchers might be “real.” Same thing for odd/even day splits.
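To make the role of the prior concrete, here is a small sketch of the Bayesian arithmetic. Every number in it is hypothetical: the priors (1% and 50%) are invented for illustration, and the "real" alternative is assumed to be an effect exactly as large as the one observed, measured in SDs. The function name `posterior_real` is also made up.

```python
from math import exp

def posterior_real(prior, z, effect):
    """Posterior probability that a split is real, given an observed z-score.

    Compares two hypotheses: no effect (z drawn from N(0, 1)) versus a real
    effect of the given size in SD units (z drawn from N(effect, 1)).
    """
    # Likelihood ratio of N(effect, 1) to N(0, 1), evaluated at the observed z
    likelihood_ratio = exp(z * effect - effect ** 2 / 2)
    return prior * likelihood_ratio / (prior * likelihood_ratio + (1 - prior))

# Hypothetical: a 2.5 SD anomaly in a split with little a priori reason to be real
implausible = posterior_real(prior=0.01, z=2.5, effect=2.5)

# Hypothetical: a 1.5 SD anomaly in a split with a good a priori reason to be real
plausible = posterior_real(prior=0.50, z=1.5, effect=1.5)

print(round(implausible, 2), round(plausible, 2))
```

Under these made-up inputs, the smaller anomaly with the plausible mechanism ends up more likely to be real than the larger anomaly with the implausible one, which is the point about not reading the SDs in a vacuum.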
What if you knew nothing about baseball and started looking at all kinds of splits for players: day/night, versus lefty/righty opponents, parks, etc. Could you just look at the numbers and reach good conclusions and make reliable inferences about what is likely “real” or not “real”? No, no and no! You COULD, but you would do a hell of a lot better if you applied what you knew about baseball or just applied common sense. It would change those inferences and conclusions dramatically.
Tango, I am afraid you are doing the same thing with this data. Operating in a vacuum as if you know nothing about the way basketball potentially works (I don’t know whether you do or don’t). That is NOT a good way to do an analysis and come up with reliable conclusions.
I’ll say it again. From what I know about basketball, it is extremely unlikely that these anomalies are occurring by random chance, even though that might be the conclusion you have to reach without using a Bayesian analysis (which is fine if you know nothing else).