THE BOOK--Playing The Percentages In Baseball

Thursday, July 01, 2010

Self-policing sportsmen

By Tangotiger, 05:47 PM

Glove-slap Tom:

The paper formally tests a cognitive explanation for the hypothesis frequently put forth by Rasheed Wallace that says that a player will often miss free throws after getting a foul that he does not deserve. The explanation that Haynes and Gilovich point to is a phenomenon known as inequity aversion, the idea that people prefer to avoid unfairness and injustice (even if sometimes it is not in their best interest to do so)…

...the authors calculated free throw percentage for the first shot after these incorrect calls, which turned out to be a whoppingly low 53.2%, substantially lower than the league average for the season on first-shot free throws, 73.6% (the league average was 77.8% for second-shot free throws).  This suggests inequity aversion--players felt significantly less comfortable making a free throw after receiving an unjust foul call.


#1          (see all posts) 2010/07/01 (Thu) @ 18:17

Cool!  I wonder how the authors figured out whether the call was right or wrong?


#2    brent      (see all posts) 2010/07/01 (Thu) @ 18:42

If only the football players felt the same way and would stop diving.


#3    dq      (see all posts) 2010/07/01 (Thu) @ 21:51

No way that’s right. Sample size and/or selection of shooter.

It would be visually noticed if it were true.


#4    Josh K      (see all posts) 2010/07/01 (Thu) @ 22:15

@3 didn’t the quote say that Sheed says this all the time?  He noticed it.


#5    David P      (see all posts) 2010/07/01 (Thu) @ 22:36

@dq: Sample size is 262 free throws.

@Phil: They determined this subjectively (by one of the researchers) but they had three other people watch a stream of “bad calls” and “questionable calls.” They give the results for the calls that everyone agreed were bad calls and it’s about the same; a lot smaller sample size though, so I have to imagine the significance goes away.

They only control for that player’s free throw percentage...seems like in such a small sample, something else could be going on.  Do refs make all of their bad calls late in the game when shooters are tired?


#6    dq      (see all posts) 2010/07/01 (Thu) @ 23:03

5/the article said 77 in 102 games. If it’s 262 in 102 games that’s even more implausible.


#7    David P      (see all posts) 2010/07/01 (Thu) @ 23:26

@dq:  77 trips to the line after bad calls; 185 trips to the line for those same players after legit calls; 262 total.


#8    MGL      (see all posts) 2010/07/01 (Thu) @ 23:56

The results do seem a little extreme (they ARE something like 3-4 SD away from the null hypothesis, so sample size is not a problem), which should raise a red flag.  A large one at that.  Whenever I get extreme results like that, it is almost certain that I made a mistake in my methodology or interpretation of the data (or a computation error).
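The "3-4 SD" figure can be checked with a normal approximation to the binomial. This is a minimal sketch, not the paper's method. It assumes 77 first shots (the count given elsewhere in this thread), 41 makes (inferred from the quoted 53.2%), and the quoted league first-shot average of 73.6% as the null rate. The function name `z_score` is made up for illustration.

```python
import math

def z_score(made, attempts, null_rate):
    """Number of standard deviations between the observed make rate and the null rate."""
    observed = made / attempts
    # Standard deviation of a sample proportion under the null hypothesis.
    sd = math.sqrt(null_rate * (1 - null_rate) / attempts)
    return (observed - null_rate) / sd

# Assumed inputs: 41 of 77 is inferred from the quoted 53.2%, not taken from the paper.
z = z_score(made=41, attempts=77, null_rate=0.736)
print(f"z = {z:.2f}")
```

Under these assumptions the observed rate sits about 4 standard deviations below the league average, in line with the estimate in the comment.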


#9    Guy      (see all posts) 2010/07/02 (Fri) @ 06:23

Right, there is simply no way that a 20-point drop is true.  The only question is where and how they made their error.  This is confirmation bias of the worst sort.  The researchers are looking for this kind of effect, so they find it.  No one stops to ask if a 20 or 40 point decline in shooting is remotely plausible.  If it exists at all, the effect would have to be subtle—think about how routinized and automatic foul shooting is for professionals. Of course, to find a subtle effect would require a vastly larger sample.

Questions for someone who read the study:

Since the Abstract says they found this only applies when the shooter’s team is ahead, does that mean a 40-point drop in those games?  Or were all 77 cases when the shooter’s team led?

They are really claiming that 77/262 (29%) of foul calls are “obviously” incorrect?  Did the reviewers also review the 71% “correct” calls, w/o being told which were which, or knowing the shot outcome?  No way that 4 independent raters could agree on that.  And 29% is almost wrong by definition, since refs clearly are calling that behavior a foul with some frequency (unless we believe refs call fouls on a nearly random basis).


#10    David P      (see all posts) 2010/07/02 (Fri) @ 09:45

@MGL: Right, the results are definitely statistically significant; just in such a small sample I’m sure if they added in the relevant control variables (say, quarter dummies, etc.) the results would totally change and they’d lose about all of their degrees of freedom.

@Guy: There are 262 fouls called for the players who had “bad” calls called for them at some point in the game.  Presumably there’s a bunch of other players who are fouled and never involved in a bad foul.  So the percentage of bad calls is much lower...they watched 100+ games, so I’m sure the bad call frequency is really low.


#11    Guy      (see all posts) 2010/07/02 (Fri) @ 10:01

David P:  Thanks.  So how many of these 77 bad calls came when shooter’s team was leading, as opposed to tied/trailing?  Is the 53% success rate for all 77 first shots, or just a subset when team leads?

And did the 4 coders review only the fouls for these selected 77 players, or a sample of other fouls in these 102 games?


#12    Ken      (see all posts) 2010/07/02 (Fri) @ 11:14

The paper reports a 34% average when a team is leading and a 66% average when they are trailing. To end up with an overall average of 53% - it must be that a little over half the bad calls happen for teams that are trailing.
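The weighted-average reasoning in the paragraph above can be written out directly. A minimal sketch, assuming every bad call falls into exactly one of two buckets (leading or trailing) and using the rounded percentages quoted in the comment; the function name `trailing_share` is made up for illustration.

```python
def trailing_share(overall, leading_rate, trailing_rate):
    """Solve overall = (1 - w) * leading_rate + w * trailing_rate for w."""
    return (overall - leading_rate) / (trailing_rate - leading_rate)

# Rounded rates quoted in the comment: 53% overall, 34% leading, 66% trailing.
w = trailing_share(overall=0.53, leading_rate=0.34, trailing_rate=0.66)
print(f"share of bad calls while trailing: {w:.2f}")
```

With those rounded inputs the trailing share works out to a bit under 60%, which fits "a little over half".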

The sample size is not very large - but the measured effect appears to be huge. The idea that “It would be visually noticed if it were true” applies to all sorts of logic that sabermetrics have disproven. Most people believed that some players were “clutch” until that was shown to be mostly myth.

I would like to see if the effects of bad calls are more prevalent in players that are normally better or worse foul shooters.


#13    dq      (see all posts) 2010/07/02 (Fri) @ 11:29

“It would be visually noticed if it were true”

Did you ever try to intentionally miss a free throw at the end of the game to get the rebound?

“ a 66% average when they are trailing.”

Have you ever been trailing in a basketball game?

I can’t even comment on this anymore.
This is incredible garbage


#14    Guy      (see all posts) 2010/07/02 (Fri) @ 11:29

"The idea that “It would be visually noticed if it were true” applies to all sorts of logic that sabermetrics have disproven. Most people believed that some players were “clutch” until that was shown to be mostly myth.”

Most if not all of the myths disproven by statistical analysis were things that couldn’t be detected just by watching.  Which sabermetric insights are obvious through mere observation?

In any case, we are talking here about a sample of about 35 foul shots.  35!  And it won’t shock me to learn that 10 of them were taken by Shaq.
If you can tell me that five people independently watched those 102 games and all identified (at least mostly) these same 77 “wrong” fouls, without knowing either a) what the other scorers had said, or b) the outcome of the following foul shot, THEN I will start to take this seriously.

The idea that foul% drops to 34% when the player knows he wasn’t really fouled is just preposterous.  Many players would have to deliberately miss to get a 34% number.  For one thing, these players almost certainly believe that they too are often called unfairly for fouls, AND that refs have failed to call fouls committed against them—so it’s extremely easy to feel OK about benefiting from a bad call.  They also have an obligation to their teammates to try to score, and a big financial incentive to pad their own points total.

I continue to marvel that researchers put out crap like this, and that people take it seriously......


#15          (see all posts) 2010/07/02 (Fri) @ 12:36

Guy,

To your point, they also do the analysis with the calls that all 4 coders coded as ‘definitely incorrect’.

The number of calls they all agreed on: 27
(and 56 when 3 of the 4 agree)

However, they only reviewed the original 77 calls already deemed ‘incorrect’ by the primary researcher and some other ones the original researcher deemed ‘questionable’.  Not the entire 102 game video data set.  Ick.


#16    weskelton      (see all posts) 2010/07/02 (Fri) @ 13:33

So it sounds like we’re talking about a sample size of 77 free-throws, as opposed to the 262 that was suggested earlier.  When I saw that they were comparing the results of these free throws to the league average, my first reaction, like Guy’s, was “how many of these free throws resulted from a Hack-a-Shaq strategy”.  I did see the comment in the linked article that “the player’s normal free throw percentage did not matter”, so I’m assuming that this was considered. But it definitely seems like something is going wrong here.


#17    Ken      (see all posts) 2010/07/02 (Fri) @ 13:48

Guy /14 - “Most if not all of the myths disproven by statistical analysis were things that couldn’t be detected just by watching.  Which sabermetric insights are obvious through mere observation?”

I have no idea what you mean. The authors of the study present a statistical relationship that was not obvious to me from just watching the games. I agree that the result seems too large to be true, but that doesn’t mean that it isn’t.

My understanding is that a single coder watched 102 games, and identified 77 wrong calls, and an additional 62 questionable calls. They put these 139 calls on tape and asked 3 new coders to assess the validity of each call. They collectively agreed on 27, 3 out of 4 agreed on 56. In either case, or with the 77 calls initially defined as wrong, the shooter was worse on the first free throw attempt than if the foul was not determined to be wrong.

I don’t think the results are conclusive to me - partly because the robustness tests are all based around the same sample of plays - but there is a relatively easy solution. Watch a whole bunch of basketball games and prove them wrong. But simply dismissing research because it “is just preposterous” is a little arrogant, don’t you think? That same line of thinking is common to almost any interesting sabermetric finding.


#18    BC      (see all posts) 2010/07/02 (Fri) @ 14:07

I read the paper, just to get some of the facts straight. I would have liked a more detailed “methods” section, but here’s what’s in the manuscript:

1. The original N is 77 (77 instances of obviously incorrect foul calls) picked out of 102 games by one coder, before the coder saw the outcome of the foul shots.

2. Two sub-samples (n=56, n=27) obtained by bringing in 3 other judges looking at tape of 77 original instances randomly intermixed with 62 ambiguous calls, and getting 3/4 or 4/4 agreement on obviously incorrect calls. The results below didn’t change based on sub-sample.

3. The average free-throw percentage of the beneficiaries of the 77 obviously incorrect foul calls, for the entire season: 78.7%

4. The player’s season free-throw percentage was included in the regression.

5. Free throw % on first shot after bad call: 53.2%

6. Free throw % on second shot after bad call: 78.2%

There are clear problems with the data (i.e., we’d like more of it, and like more info about the method for collecting/analyzing the data the paper does present).

However, pts 4 & 6 indicate it’s not just a “Shaq” artifact.

Re: “there is simply no way that a 20-point drop is true.  The only question is where and how they made their error.” There aren’t any confidence intervals etc. reported, but I don’t think the researchers would be silly enough to claim that the 20% observed difference is the “true” size of the effect.


#19    Guy      (see all posts) 2010/07/02 (Fri) @ 14:28

"simply dismissing research because it “is just preposterous” is a little arrogant, don’t you think?”

I can see why it seems that way.  But when you’ve worked with a lot of statistical data, you have a sense of what’s plausible and what isn’t.  If the study said free throw % dropped by 10 points in this situation, I’d say “hmm, that’s interesting.  Unlikely, but possible.” But they are saying it drops by 40 points, and their sample is about 35 fouls.  So I don’t actually need to read the study to know this is bullsh!t.  The only question on the table is why the study is wrong.  I have no incentive personally to do the work necessary to figure that out, but if someone wants to go to the trouble I’m sure the flaw can be discovered. (Unless it’s just biased scoring, because the researcher did in fact know the outcome of the shot.)

A few years ago, I would have been more cautious and deferential, figuring the research had been done by a trained professional (PhD, I assume) and then vetted.  But now I know better.  Academic researchers make claims all the time that are obviously and ludicrously wrong.  Look at Phil Birnbaum’s recent dissection of a study finding that younger brothers attempt to steal bases 10x as frequently as older brothers in baseball.  Complete nonsense, but it got published. 

*

The review by scorers tells us nothing.  First, the reviewers probably knew or suspected these fouls had been singled out as “wrong.” But even if they didn’t, if there was an original bias in selecting these 77, any random subset the reviewers then agree were “wrong” will likely show the same low shot% result.  And that’s exactly what the data shows:  no matter how you slice the 77 you get the same result.  That means the same low shot% exists even on those fouls on which the scorers don’t agree, and which therefore can’t be “obviously wrong” enough to induce the necessary guilt by the player.

A proper review means letting scorers review the “correct” fouls too, to see how many of those get scored as wrong.  That will tell you if the researcher was truly identifying wrong fouls in an objective way.


#20    Guy      (see all posts) 2010/07/02 (Fri) @ 16:29

BC:  You report that at least 2 of 3 judges agreed that 73% of the 77 fouls were wrong.  Did they report the comparable percentage among the 62 fouls rated “ambiguous” by the researchers?  And what was the FT% on those fouls?


#21    dq      (see all posts) 2010/07/02 (Fri) @ 17:41

There were 733 player-games in the NBA last year where a forward or guard shot 10 or more free throws. In all but 4, the player made at least 1/2 the free throws, and the worst shooting was 4 for 10. (I used forwards and guards since the % given was 78%)

A player who shoots 78% from the line could make 53% of them with his opposite hand.

In order for a group of players who shoot 78% to shoot as poorly as 53%, they would have to be consciously trying to miss some of the shots.

A player in a losing game will not try purposely to miss free throws.


#22    Ken      (see all posts) 2010/07/02 (Fri) @ 21:22

Guy/#19/ - “I can see why it seems that way.  But when you’ve worked with a lot of statistical data, you have a sense of what’s plausible and what isn’t.”

I actually have worked with lots of data (though primarily not sports data), and I agree with you that the magnitude seems implausible. I just don’t tend to be dismissive of other people’s research without a really good reason. In my opinion the sample size seems small, the effect seems large, and if true, the effect should be easy, though time-consuming, to replicate. And since I’m not willing to actually do the analysis myself, I don’t feel I can make a statement stronger than that.


#23    MGL      (see all posts) 2010/07/02 (Fri) @ 22:13

If you read enough of these studies, eventually you will find one (or more) where the results are 2 or 3 SD off from the null hypothesis just by chance alone.  Maybe this is the one!  Or the true effect is 5% and the rest is chance. Or the true effect is 10% and the rest is chance. Who knows?
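The "by chance alone" point can be put in numbers. A rough sketch, assuming the studies are independent, that no true effect exists in any of them, and that each test statistic is normally distributed. The study counts below are hypothetical, and `chance_of_fluke` is a made-up name.

```python
from statistics import NormalDist

def chance_of_fluke(num_studies, sd_threshold):
    """Probability that at least one of num_studies null results lands beyond sd_threshold SDs."""
    # Two-sided tail probability for a single study under the null hypothesis.
    p_single = 2 * (1 - NormalDist().cdf(sd_threshold))
    return 1 - (1 - p_single) ** num_studies

# Hypothetical study counts, chosen only for illustration.
for n in (1, 20, 100):
    print(n, round(chance_of_fluke(n, 2.0), 3), round(chance_of_fluke(n, 3.0), 3))
```

Under these assumptions, a 2 SD result somewhere among 20 null studies is more likely than not, while a 3 SD result remains uncommon until the number of studies gets large.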

Also, these are Bayesian problems where the a priori probabilities are determined to a large extent by common sense.  Common sense tells us that this effect they are looking for, if it exists in sports, is extremely likely to be subconscious and small.  If we were to set up an estimated distribution of what we think the effect might be before looking at the data and then we get the results, what would the final conclusion be?  A lot less than the 30+% of whatever they found!

IOW, the Bayesian analysis would be:

We estimate the chance of this kind of effect being 30% is .00001%. The chance that it is 20% is .0001%.  The chance that it is 10% is 1%. The chance that it is 5% is 3%.  The chance that it is 3% is 10%.  The chance that it is 2% is 20%.  1%, 40%.  No effect, 25%.  Or whatever.

Now, given the results we got (30% in 77 shots, or whatever the numbers are), and given the above a priori probability distribution, what is the estimate of the mean effect?  Probably a lot less than 30%. Probably closer to 5%.  We can all probably live with that.
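The update described above can be sketched as a discrete Bayesian calculation. The prior below copies the "or whatever" numbers from the comment. They sum to about 99%, so the code renormalizes. The data are assumptions: 41 makes in 77 first shots (inferred from the 53.2% figure) against a 78.7% baseline (the season average for these shooters reported earlier in the thread). The name `posterior_mean_effect` is made up, and this is an illustration rather than the commenter's actual calculation.

```python
import math

# Prior from the comment: effect size (drop in make rate) -> prior probability in percent.
PRIOR_PCT = {0.30: 0.00001, 0.20: 0.0001, 0.10: 1, 0.05: 3,
             0.03: 10, 0.02: 20, 0.01: 40, 0.00: 25}

def posterior_mean_effect(made, attempts, baseline, prior_pct=PRIOR_PCT):
    """Posterior mean of the effect size given binomial data and a discrete prior."""
    weights = {}
    for effect, prior in prior_pct.items():
        p = baseline - effect  # make rate if this effect size is the true one
        likelihood = math.comb(attempts, made) * p**made * (1 - p)**(attempts - made)
        weights[effect] = prior * likelihood
    total = sum(weights.values())  # renormalizes; the prior need not sum to 100
    return sum(effect * w for effect, w in weights.items()) / total

# Assumed data: 41 of 77 is inferred from 53.2%; 0.787 is the shooters' season average.
estimate = posterior_mean_effect(made=41, attempts=77, baseline=0.787)
print(f"posterior mean effect: {estimate:.3f}")
```

With this prior the posterior mean comes out far below the raw gap between 78.7% and 53.2%, which is the comment's point. The exact value depends entirely on the prior chosen.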

Common sense/a priori probabilities are critical with research like this.  The example I always give is this:  Let’s say that we find that a certain sample of players hits 20 points in wOBA better during the day than at night, or in July as opposed to April. And let’s say that we find the exact same difference (20 points in wOBA) in the exact same sample size, but this time in odd days versus even days.  Is our conclusion from both sets of data the same?  Is our estimate of the true difference the same?  No!  Why not?  Bayes, Bayes, and more Bayes…


#24    Guy      (see all posts) 2010/07/03 (Sat) @ 09:00

"I just don’t tend to be dismissive of other people’s research without a really good reason.”

I think the amount of skepticism we bring should depend on the research.  Here, the finding rests entirely on a sample of 32 foul shots (wrong foul, team ahead), on which the players made 11 shots.  And, the sample was identified by the researcher himself, subjectively, creating the strong possibility of bias.  To me, that’s a study that deserves roughly zero deference.  The burden isn’t on us to disprove the finding, but on the researchers to go back to the drawing board.  There are other, well-designed studies, where I wouldn’t say that.

MGL left a 3rd crucial element out of his Bayesian analysis:  our prior estimate of the likelihood that a researcher who is looking for “inequity aversion” will generate biased data that appears to confirm the effect.  I don’t know what that P is, but it’s way higher than the P for a 34% shooting%.  If we include that, I think we will estimate the effect at zero.

The answer probably lies in their data.  Take a look at those fouls rated “ambiguous” by the researcher, and which 2 of 3 judges say were wrong.  Those should be the same kind of fouls as the 3/4 fouls within the initial 77 sample—the difference is we’ve removed any possible bias by the primary researcher.  I’ll bet the shooting% in that sample is close to normal.


#25    BC      (see all posts) 2010/07/03 (Sat) @ 09:53

One could clearly design a better study to test this hypothesis. As MGL notes “common sense” (i.e. common sense plus lots of previous social psychological research) indicates that the effect size is more likely to be something on the order of 5% than 20% (but who knows?). I did a quick back of the internet calculation, and the required N for an 80% powered study to detect that small diff would be around 250.
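A sample-size calculation of this kind can be sketched with the usual normal-approximation formula. The design here is an assumption, not the commenter's: a one-sample, two-sided test at alpha = 0.05 comparing a 73.7% rate against a 78.7% baseline. A different design (one-sided, paired, two-sample) gives a different total, so this sketch need not reproduce the figure quoted in the comment. The name `required_n` is made up.

```python
import math
from statistics import NormalDist

def required_n(p0, p1, alpha=0.05, power=0.80):
    """Shots needed for a one-sample, two-sided z-test of rate p1 against null rate p0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / (p0 - p1)) ** 2)

# Assumed rates: 0.787 baseline from the thread, 0.737 for a 5-point drop.
n = required_n(p0=0.787, p1=0.737)
print(n)
```

Whatever the design, the required sample grows roughly with the inverse square of the effect size, so a 5-point effect needs far more shots than a 20-point one.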

Much more interesting study design questions: what other variables would folks want to see included in the analysis? how should the coding of calls be done?

20/guy:"You report that at least 2 of 3 judges agreed that 73% of the 77 fouls were wrong.  Did they report the comparable percentage among the 62 fouls rated “ambiguous” by the researchers?  And what was the FT% on those fouls?”

Nope, they did not--actually it’s not even clear from the paper that the 2 of 3 judges agreed with the original judgment, or whether the original coder re-coded while watching the mix tape.

I wholeheartedly agree with you that the methods (or at least the reporting thereof) are grossly inadequate.


#26    MGL      (see all posts) 2010/07/04 (Sun) @ 01:35

I don’t really think that coding of “incorrect fouls” should be an issue.  If my coding is bad, wouldn’t it be even more unlikely to find a large difference between the two groups?

If I did this study and found no difference, then I think a reasonable criticism by someone who thought that there SHOULD be a difference would be that my coding was bad, thus watering down the true differences between the groups.  If I find a large difference between the groups, it makes little sense to me to criticize the coding of the fouls. 

“MGL left a 3rd crucial element out of his Bayesian analysis:  our prior estimate of the likelihood that a researcher who is looking for “inequity aversion” will generate biased data that appears to confirm the effect.”

Guy, I was assuming that the research itself is beyond reproach, other than the mistake of not considering it a Bayesian problem.  Now, if the researcher is biased and that affects the quality of his research, that is another story altogether.

The other significant issue we find with research where the researcher is a stakeholder or has an agenda, is publishing bias. If 1000 researchers research the same issue, and none of them report their findings if they find no evidence outside of the null hypothesis and the only ones who publish are the ones who find results that are different from the null hypothesis and those differences are statistically significant, we obviously have a problem…


#27    Guy      (see all posts) 2010/07/04 (Sun) @ 08:27

MGL, I’m not suggesting the coding is inaccurate in terms of how “wrong” the call was, but that it is linked (unintentionally, I assume) to the outcome on the following shot.  For one thing, the shot% seems to be the same regardless of how many independent judges agree a foul was wrong, suggesting the true “wrongness” of the call is not the cause of this apparent effect.  More important is the failure to disclose what the judges said about those fouls NOT deemed wrong by the original researcher.  My guess is 1) the judges also called a lot of those wrong, calling into question the actual basis of the original coding, and 2) that the shooting% on the fouls called wrong only by independent judges (but not the researcher) is not lower than usual.

