Friday, March 30, 2007
Mean, Median: Meandian?
Maybe a couple of you math students can help me. In the past, I would go through the ballots, and drop 1% or 2% of the obvious junk ballots. They’re easy enough to spot, but take a bit of time to set up. I’m thinking of another way to do it:
What if I start by focusing on the median? Typically, all the votes would be clustered, so that if I had a junk ballot or two way above or way below the median, the median itself would hardly change. But, if I use the median, I’m basically treating the distribution as normal around that median (whether or not it was, it would have the same median). I can’t take the mean, because the possible junk ballots would have undue weight. Then I thought: what if I take the square root of the distance between the ballot and the median?
For example, say I have the following forecasts for a player: .850, .875, .900, .910, .920. The median is .900, and the mean is .891. If I do the square root process, I get this: -sqrt(.050)-sqrt(.025)+0+sqrt(.010)+sqrt(.020). I average that sum over the five ballots, square the average while keeping its sign (it is negative here), and add the result to .900, which gives .899. In this case, I would be happy with the mean, since it’s easier to calculate, and I have no reason to suspect a junk ballot.
But, what if I had someone put in a .700? Now the mean is .859, while the median is .888. And with the square root process? .885. In this case, we see we don’t want the mean, because of the obvious junk ballot. But, rather than discarding it altogether, we keep it, just in case it’s not a junk ballot. If we followed the mean process, for me to get a mean of .885, I would have to change the junk ballot from .700 to .855. In this illustration, it would be counting the junk ballot (.700) as if it were roughly the realistic pessimistic forecast (.850).
And if someone put in 1.100 instead of .700, now the mean is .926, and the median is .905. The square root process gives me .906.
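For anyone who wants to check the arithmetic, here is a minimal Python sketch of the square root process as I read it from the three examples above. The function name `sqrt_meandian` is made up for illustration, and the one assumption worth flagging is that the sign is carried through both the square root and the squaring step.

```python
import math
from statistics import mean, median

def sqrt_meandian(ballots):
    """Hypothetical helper: median adjusted by the signed square-root process."""
    center = median(ballots)
    # Signed square root of each ballot's distance from the median.
    roots = [math.copysign(math.sqrt(abs(b - center)), b - center) for b in ballots]
    avg = mean(roots)
    # Square the average but keep its sign, then shift the median by that amount.
    return center + math.copysign(avg * avg, avg)

base = [0.850, 0.875, 0.900, 0.910, 0.920]
cases = {
    "no junk ballot": base,
    "junk ballot of .700": base + [0.700],
    "junk ballot of 1.100": base + [1.100],
}
for label, ballots in cases.items():
    print(label, mean(ballots), median(ballots), sqrt_meandian(ballots))
```

Rounded to three decimals, the last column should line up with the .899, .885 and .906 quoted above.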
So, all I’m doing is weighting the ballots more if they are closer to the median, and weighting them less if they are farther away.
My questions: is this something new? Does it have any validity? Do you see a problem?
I’ll also ask: why square root? I could have put it to the power of anything under 1.
Going back to the junk ballot with the .700:
- If I set the exponent to a number approaching zero, my “meandian” is actually exactly equal to the median (.888)
- If I set the exponent to exactly 1, my “meandian” is exactly equal to the mean (.859)
So, depending on how much I want to control the balance between mean and median, I can set the exponent to whatever I want. An exponent of 2/3 gives me a meandian of .879. Set it to 1/3 and I get .887. Set it to one-half (i.e., square root), and I get .885.
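The same idea with an adjustable exponent can be sketched in a few lines of Python. This is only my reading of the process, not a vetted statistic: the function name `meandian` and its `exponent` parameter are hypothetical, and I am assuming the exponent stays in the range (0, 1], with the average raised to 1/exponent to undo the power.

```python
import math
from statistics import median

def meandian(ballots, exponent):
    """Hypothetical helper: generalize the square-root process to an exponent in (0, 1]."""
    center = median(ballots)
    # Raise each distance from the median to the exponent, keeping its sign.
    powered = [math.copysign(abs(b - center) ** exponent, b - center) for b in ballots]
    avg = sum(powered) / len(powered)
    # Undo the power on the average (again keeping the sign) and shift the median.
    return center + math.copysign(abs(avg) ** (1.0 / exponent), avg)

ballots = [0.700, 0.850, 0.875, 0.900, 0.910, 0.920]
for exponent in (1.0, 2 / 3, 1 / 2, 1 / 3, 0.01):
    print(exponent, meandian(ballots, exponent))
```

An exponent of 1 reproduces the plain mean, and a very small exponent such as 0.01 lands on the median for all practical purposes, which matches the two end points in the list above.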