Tuesday, November 08, 2011
Rates without Sample Size
I agree with Matt wholeheartedly.
***
I’ve had a minor issue with Pizza Cutter’s threshold for “stabilization”, which I’ve mentioned several times in this blog. Basically, Pizza sets the threshold at r=.70, whereas I set the threshold at r=.50. Why do I prefer mine? Because with my threshold, I can tell you exactly how much to regress the stats. It gives you extra information. In addition, I can explain it in English. If I set the OBP threshold at PA=210, then I can say: “If the player has 210 plate appearances, then his OBP is half real and half noise. Regress his OBP by 50% toward the mean.”
And, if the player had 500 PA, then you would regress by 210 / (210 + 500) = 30%.
For Pizza, r=.70 would mean THE EXACT SAME THING. But his threshold would be PA=500. So, his threshold says: “If the player has 500 plate appearances, then his OBP is 70% real and 30% noise. Regress his OBP by 30% toward the mean.”
So, exact same thing. But, if the player had 400 PA, then what? Well, in my case, you know exactly how much to regress by: 210/(210+400) = 34%. But with Pizza’s case? You’d have to do: 1-400/(400+.3/.7*500) = 35%, which matches my 34% up to rounding, since .3/.7*500 is about 214 rather than exactly 210. That 3/7ths thing there is not very attractive to me.
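For anyone who wants to play with the numbers, here is a minimal Python sketch of the arithmetic above. The function names are mine, and the .330 league-average OBP and .380 observed OBP are made-up illustration values, not figures from this post.

```python
def regression_fraction(pa, half_point):
    """Share of the observed rate to regress toward the mean.

    half_point is the PA at which r = .50 (210 for OBP in the post).
    """
    return half_point / (half_point + pa)


def half_point_from_threshold(threshold_pa, r):
    """Convert a threshold quoted at reliability r into the r = .50 threshold.

    For r = .70 this is the ".3/.7 * PA" step from the post.
    """
    return threshold_pa * (1 - r) / r


def regress(observed, mean, pa, half_point):
    """Pull an observed rate toward the mean by the regression fraction."""
    w = regression_fraction(pa, half_point)
    return w * mean + (1 - w) * observed


# r = .50 threshold at PA = 210
print(regression_fraction(210, 210))  # 0.5
print(regression_fraction(500, 210))  # about 0.30
print(regression_fraction(400, 210))  # about 0.34

# r = .70 threshold at PA = 500, converted first
pizza_half_point = half_point_from_threshold(500, 0.70)  # about 214 PA
print(regression_fraction(400, pizza_half_point))

# Hypothetical player: .380 OBP in 210 PA, assumed league mean of .330
print(regress(0.380, 0.330, 210, 210))
```

With the r=.50 threshold, the only number you carry around is 210. With the r=.70 threshold, you need the extra conversion step before you can do anything at a different PA.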
Pizza is as stubborn as I am, because we both knew exactly what the other guy meant, and still, both of us stuck to our guns on this issue.
Note: no actual pizzas were hurt in the creation of this post.
***
Derek Carty posted the 50% threshold here:
http://www.insidethebook.com/ee/index.php/site/comments/when_is_the_observed_data_half_real_and_half_noise/