Friday, August 20, 2010
Bayes is Regression Toward The Mean
This comment from another thread sparked the exchange that follows between me and Jared:
You’ve got a weighted die. You know it’s weighted because you built it. It lands on “1” 25% of the time. You also have nine unweighted dice. You built those too.
You put all 10 in a pouch. You roll each die 36 times. You get these counts for the number of times you roll “1” for the 10 dice:
11-9-8-7-6-6-6-5-5-4
Which one is the weighted die? YOU DON’T KNOW!!!
What is the CHANCE that it’s the one that rolled 11 ones? MORE than the chance that it’s the one that rolled 4 ones. But EACH of the 10 has a chance to be the weighted die.
It’s all based on probability, a number that is GREATER than zero and LESS than one.
If anyone says “all luck” or “all skill”, leave this blog, and never come back. WE DON’T KNOW. All we can do is make a best estimate as to the mean, and a best estimate as to the uncertainty of that mean. And, if you like, a best estimate of the uncertainty of the uncertainty of that mean. And so on.
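For the pouch as described (exactly one weighted die among the ten), the chance that each die is the weighted one can be worked out directly. Here is a minimal sketch in Python. It assumes every die is equally likely beforehand to be the weighted one and that rolls are independent; the function name is only illustrative.

```python
from math import comb

def prob_each_is_weighted(ones, rolls=36, p_weighted=0.25, p_fair=1/6):
    """Posterior chance that each die is the single weighted die."""
    def binom(k, p):
        # Binomial probability of k ones in `rolls` rolls
        return comb(rolls, k) * p**k * (1 - p)**(rolls - k)

    # If die i is the weighted one, every other die is fair, so the
    # likelihood of the whole pouch is proportional to the ratio
    # P(count_i | weighted) / P(count_i | fair) for die i.
    ratios = [binom(k, p_weighted) / binom(k, p_fair) for k in ones]
    total = sum(ratios)
    return [r / total for r in ratios]

counts = [11, 9, 8, 7, 6, 6, 6, 5, 5, 4]
for k, p in zip(counts, prob_each_is_weighted(counts)):
    print(f"{k:2d} ones: {p:.3f}")
```

Under those assumptions the die with 11 ones comes out a bit under 50% to be the weighted one, and the die with 4 ones is still above zero, which is the point of the comment.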


It might be interesting to see what you get from Tango’s dice example using a) Bayes’ theorem and b) regression to the mean. In theory, they should be the same, but I think amid the Strasburg discussion MGL said that regression to the mean breaks down at the extremes b/c we don’t have a normal distribution (sorry if I’m misstating that).
I’d revise the problem slightly so that rather than taking all 10 dice from the pouch, you take 10 dice from an infinite supply of dice, 10% of which are weighted (since I think this better matches the baseball analogy).
So, using Bayes’ theorem, the prior probability of any die being weighted is .1, and the chance of getting a “hit” (a “1”) is .25 if weighted and 1/6 if not weighted.
Given the results, the revised probabilities of each die being weighted are:
hits, posterior p(weighted), expected hit%
11, 0.41, .200
9, 0.20, .183
8, 0.13, .178
7, 0.08, .174
6, 0.05, .171
5, 0.03, .170
4, 0.02, .169
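
The table can be reproduced with a few lines of Python. This is a sketch under the setup stated above (prior of .1, hit chance of .25 if weighted and 1/6 if not, 36 rolls); the function names are illustrative. The third decimal of the expected hit% can differ from the table by about .001 depending on how 1/6 and the posterior are rounded.

```python
from math import comb

def posterior_weighted(k, n=36, prior=0.1, p_weighted=0.25, p_fair=1/6):
    """P(die is weighted | k hits in n rolls), by Bayes' theorem."""
    like_w = comb(n, k) * p_weighted**k * (1 - p_weighted)**(n - k)
    like_f = comb(n, k) * p_fair**k * (1 - p_fair)**(n - k)
    return prior * like_w / (prior * like_w + (1 - prior) * like_f)

def expected_hit_rate(k, n=36, prior=0.1, p_weighted=0.25, p_fair=1/6):
    """Posterior-weighted average of the two possible true hit rates."""
    post = posterior_weighted(k, n, prior, p_weighted, p_fair)
    return post * p_weighted + (1 - post) * p_fair

for k in [11, 9, 8, 7, 6, 5, 4]:
    print(f"{k:2d}, {posterior_weighted(k):.2f}, {expected_hit_rate(k):.3f}")
```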
How does this compare to regressing each die’s hit% to the population mean of 0.175 (0.1 × .25 + 0.9 × 1/6)?
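
One way to set up the regression side of that comparison, as a sketch only: shrink each die's observed hit% toward the population mean by the fraction r = (variance of true rates) / (variance of true rates + binomial noise in 36 rolls). The linear shrinkage form is an assumption here, not something spelled out in the comment, and the population mean is computed in the code from the 10%/90% mix. The function name is illustrative.

```python
def regressed_rate(k, n=36, share_weighted=0.1, p_weighted=0.25, p_fair=1/6):
    """Linear regression-to-the-mean estimate of a die's true hit rate."""
    share_fair = 1 - share_weighted
    # Population mean hit rate implied by the mix of dice
    mean = share_weighted * p_weighted + share_fair * p_fair
    # Spread of true hit rates across the population
    var_true = (share_weighted * (p_weighted - mean) ** 2
                + share_fair * (p_fair - mean) ** 2)
    # Average binomial sampling variance of an observed rate over n rolls
    var_noise = (share_weighted * p_weighted * (1 - p_weighted)
                 + share_fair * p_fair * (1 - p_fair)) / n
    r = var_true / (var_true + var_noise)  # share of the deviation that is kept
    return mean + r * (k / n - mean)

for k in [11, 9, 8, 7, 6, 5, 4]:
    print(f"{k:2d}, {regressed_rate(k):.3f}")
```

With these inputs r works out to roughly 0.135, so the die with 11 hits regresses to about .193. For low counts the linear estimate can dip slightly below 1/6, which the two-point Bayes estimate cannot do.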