THE BOOK--Playing The Percentages In Baseball


Friday, March 30, 2007

Mean, Median: Meandian?

By Tangotiger, 03:20 PM

Maybe a couple of you math students can help me.  In the past, I would go through the ballots and drop the 1% or 2% that were obvious junk ballots.  They’re easy enough to spot, but the process takes a bit of time to set up.  I’m thinking of another way to do it:


What if I start by focusing on the median?  Typically, all the votes would be clustered, so that if I had a junk ballot or two way above or way below the median, the median itself would hardly change.  But, if I use the median, I’m basically treating the distribution as normal around that median (whether or not it was, it would have the same median).  I can’t take the mean, because the possible junk ballots would have undue weight.  Then I thought: what if I take the square root of the distance between each ballot and the median?

For example, say I have the following forecasts for a player: .850, .875, .900, .910, .920.  The median is .900, and the mean is .891.  If I do the square root process, I get this: -sqrt(.050)-sqrt(.025)+0+sqrt(.010)+sqrt(.020).  Average that, square the average (keeping its sign), add it to .900, and I get .899.  In this case, I would be happy with the mean, since it’s easier to calculate, and I have no reason to suspect a junk ballot.

But, what if I had someone put in a .700?  Now the mean is .859, while the median is .888.  And with the square root process?  .885.  In this case, we see we don’t want the mean, because of the obvious junk ballot.  But, rather than discarding it altogether, we keep it, just in case it’s not a junk ballot.  If we followed the mean process, for me to get a mean of .885, I would have to change the junk ballot from .700 to .855.  In this illustration, the process is counting the junk ballot (.700) almost as if it were the realistic pessimistic forecast (.850).

And if, instead of .700, someone put in 1.100, the mean is now .926 and the median is .905.  The square root process gives me .906.

So, all I’m doing is weighting the ballots more if they are closer to the median, and weighting them less if they are farther away.
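
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the square root process as described above. The function name `meandian` and its `exponent` parameter are labels chosen for illustration, not an established statistic or library call, and the sketch assumes the squaring step keeps the sign of the averaged value.

```python
import math
from statistics import mean, median

def meandian(values, exponent=0.5):
    """Signed power average of deviations around the median (illustrative only)."""
    center = median(values)
    # Replace each deviation d by sign(d) * |d|**exponent.
    signed = [math.copysign(abs(v - center) ** exponent, v - center) for v in values]
    avg = sum(signed) / len(signed)
    # Undo the power on the average, keeping its sign, and shift back to the median.
    return center + math.copysign(abs(avg) ** (1.0 / exponent), avg)

clean = [0.850, 0.875, 0.900, 0.910, 0.920]
low_junk = clean + [0.700]
high_junk = clean + [1.100]

for label, ballots in [("clean", clean), ("with .700", low_junk), ("with 1.100", high_junk)]:
    print(f"{label:>10}: mean={mean(ballots):.3f}  meandian={meandian(ballots):.3f}")
```

Run on the three sets of ballots from the post, the meandian column should come out at .899, .885 and .906.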

My questions: is this something new?  Does it have any validity?  Do you see a problem? 

#1    Tangotiger      (see all posts) 2007/03/30 (Fri) @ 16:48

I’ll also ask: why square root?  I could have put it to the power of anything under 1.

Going back to the junk ballot with the .700:
- If I set the exponent to a number approaching zero, my “meandian” is exactly equal to the median (.888)
- If I set the exponent to exactly 1, my “meandian” is exactly equal to the mean (.859)

So, depending how much I want to control the balance between mean and median, I can set the exponent to whatever I want.  An exponent of 2/3 gives me a meandian of .879.  Set it to 1/3 and I get .887.  Set it to one-half (i.e., square root), and I get .885.
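
A rough way to see the exponent acting as a dial between median and mean is to sweep it over the .700 example. This is only a sketch: `meandian` is a hypothetical helper name, and 0.01 stands in for "an exponent approaching zero".

```python
import math
from statistics import mean, median

def meandian(values, exponent):
    """Signed power average around the median; exponent must be > 0 (illustrative)."""
    center = median(values)
    total = sum(math.copysign(abs(v - center) ** exponent, v - center) for v in values)
    avg = total / len(values)
    return center + math.copysign(abs(avg) ** (1.0 / exponent), avg)

ballots = [0.700, 0.850, 0.875, 0.900, 0.910, 0.920]

# Small exponents hug the median; an exponent of 1 reproduces the mean.
for p in (0.01, 1 / 3, 0.5, 2 / 3, 1.0):
    print(f"exponent={p:.2f}  meandian={meandian(ballots, p):.4f}")
```

With an exponent of 1 the signed deviations are simply averaged, so the result is the mean. As the exponent shrinks, every deviation contributes close to plus or minus one, the positives and negatives cancel around the median, and the adjustment vanishes.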


#2          (see all posts) 2007/03/30 (Fri) @ 17:05

Makes sense to me.  Linear regression minimizes the sum of squares, which gives more weight to outliers.  (That is, the signed sum of squares in (-4, +2) is more negative than in (-2, 0), even though the sum of first powers is the same.) The analogy isn’t perfect, because you’re taking the sign out and putting it back in later, but still.

One of the economics blogs (either JC or TWOW) talked about minimizing just the absolute values (without squaring), because they said it gives less weight to the outliers.  So that’s been done before, and it must be a standard technique.  You’re just going one step further: the signed sum of square roots in (-4, +2) is smaller in magnitude than in (-2, 0), even though the sum of first powers is the same.

(Of course, if you wanted to use the sum of cubes, or the sum of fourth powers, or the sum of 10th powers, you’d be weighting outliers even higher.  The sum of 10th powers especially: you’d be giving a 4-point estimate over a million times the weight of a 1-point estimate.)
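
The (-4, +2) versus (-2, 0) comparison is easy to check numerically. The sketch below assumes "sum of powers" means each deviation is raised to the power with its sign restored afterward; `signed_power_sum` is a made-up helper name.

```python
import math

def signed_power_sum(deviations, power):
    """Sum of sign(d) * |d|**power over the deviations."""
    return sum(math.copysign(abs(d) ** power, d) for d in deviations)

wide = (-4, 2)     # one large miss, partly offset by a smaller one
narrow = (-2, 0)   # same sum of first powers, no large miss

for power in (0.5, 1, 2, 10):
    print(power, signed_power_sum(wide, power), signed_power_sum(narrow, power))
```

At power 1 the two pairs tie. Above 1 the pair containing the -4 dominates, and below 1 it counts for less than the pair without the large miss.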

Of course, you’re using sums of square roots from the median, rather than from the mean, but that shouldn’t matter.

It seems to me that the method “works” if you feel like it does.  There’s no “right answer,” really, is there?

The only issue to me is: what if a minority of raters legitimately believe a player will collapse?  Suppose you have 20 votes of .700, and 100 votes of .075.  Will your method agree with your intuitive feeling of what the summary statistic should be?

But you’ve probably realized all this already, and maybe I’m misunderstanding the question.


#3    cephyn      (see all posts) 2007/03/30 (Fri) @ 17:16

You’re doing a weird hodgepodge of things here. Basically, you’re finding a function that lets you weight the values so that the answer lands between the mean and the median. So the real key is: what exponent do you use? And why? Why should I trust your weighted average (which is what this ends up being), and what significance does your exponent have?

A faster way would be to find the mean, find the median, weight them (ie give them a coefficient) and average them. Don’t know what it would tell you though, other than you can calculate an arbitrary weighted average between the median and the true mean.
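
As a sketch of that shortcut, the blend is a one-liner. The name `blend` and the parameter `median_weight` are made up for illustration, and the ballots are the ones from the post with the .700 added.

```python
from statistics import mean, median

def blend(values, median_weight):
    """Weighted average of median and mean; median_weight is between 0 and 1."""
    return median_weight * median(values) + (1 - median_weight) * mean(values)

ballots = [0.700, 0.850, 0.875, 0.900, 0.910, 0.920]

for w in (0.0, 0.5, 1.0):
    print(f"median_weight={w:.1f}  blend={blend(ballots, w):.4f}")
```

A weight of 0 returns the mean and a weight of 1 returns the median, so the open question is the same as with the exponent: which value in between to pick, and why.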


#4          (see all posts) 2007/03/30 (Fri) @ 18:01

This is basically the field of robust statistics: trying to find the best estimate of an unknown quantity in sample data that is (possibly) contaminated with outliers. There is a large literature on the subject (most of which will be totally irrelevant to what you’re trying to do, of course).

One common technique is to minimize the weighted sum of square deviations, where the weight itself depends on the deviation, i.e. minimize
Sum(W(d) d^2) where W is some weighting function.
Taking W=1 is just the usual least squares minimization, which occurs at the mean of the samples.
Taking W=1/|d| minimizes absolute deviations, which occurs (more or less) at the median.
Taking W to be a step function that is 1 in some neighborhood of 0, and 0 outside of it, is the same as simply throwing out selected outliers.
There are many other choices of W which are more or less useful depending on the situation, and often depend on some parameter having to do with the scale of the data. Knowing which one is “right” depends on prior knowledge of what the distribution looks like, which we don’t have here.
An additional drawback is that there is in general not an analytic solution to finding the minimum, although iterative methods are easy to implement.
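
One iterative scheme of this kind can be sketched as follows: hold the weights fixed, take the weighted mean, recompute the weights, and repeat. The helper names, the starting point (the plain mean), the small floor inside 1/|d|, and the step width of 0.1 are all assumptions made for illustration, and the ballots are Tango's example with the .700 added.

```python
from statistics import mean

def reweighted_center(values, weight_fn, iterations=100, tol=1e-12):
    """Repeat a weighted mean with weights W(d), d = value minus current center."""
    center = mean(values)
    for _ in range(iterations):
        weights = [weight_fn(v - center) for v in values]
        total = sum(weights)
        if total == 0:  # every ballot received zero weight; keep the last center
            break
        updated = sum(w * v for w, v in zip(weights, values)) / total
        converged = abs(updated - center) < tol
        center = updated
        if converged:
            break
    return center

def w_least_squares(d):
    return 1.0                          # W = 1

def w_least_absolute(d):
    return 1.0 / max(abs(d), 1e-9)      # W = 1/|d|, floored to avoid dividing by zero

def w_step(d, width=0.1):
    return 1.0 if abs(d) <= width else 0.0   # W = 1 near zero, 0 outside

ballots = [0.700, 0.850, 0.875, 0.900, 0.910, 0.920]
l2 = reweighted_center(ballots, w_least_squares)
l1 = reweighted_center(ballots, w_least_absolute)
trimmed = reweighted_center(ballots, w_step)
print(f"W=1: {l2:.4f}   W=1/|d|: {l1:.4f}   step: {trimmed:.4f}")
```

With W=1 this stays at the mean. With W=1/|d| it settles between the two middle ballots, and with an even number of ballots any point in that gap minimizes the absolute deviations. With the step function it ends up at the mean of the ballots that remain once the .700 falls outside the window.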

Have your eyes glazed over yet?

Your method doesn’t exactly fit into this framework, and theoretically it looks unsound to me. One obvious problem is that it doesn’t scale correctly. That is, if you scale all the scores by 1000 (so that they are integers rather than fractions), then your adjustment scales by a different amount (the square root of 1000).

However, it might be that on this particular scale the answers it gives are good enough, and you’ll be able to use it without the statistics police breaking down your door.


#5    Tangotiger      (see all posts) 2007/03/30 (Fri) @ 18:07

I’m taking the square root of the difference (not of the absolutes), so multiplying everything by 1000 will still give you the same result.  I think.  I’ll double-check.
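
A quick numerical check of the scaling question, assuming the procedure is the one from the post: average the signed square roots, square that average while keeping its sign, then add it to the median. The function names here are hypothetical.

```python
import math
from statistics import median

def signed_root_average(values):
    """Average of sign(d) * sqrt(|d|), where d is each deviation from the median."""
    center = median(values)
    roots = [math.copysign(math.sqrt(abs(v - center)), v - center) for v in values]
    return sum(roots) / len(roots)

def adjustment(values):
    """The amount added to the median: the averaged root, squared, sign kept."""
    avg = signed_root_average(values)
    return math.copysign(avg * avg, avg)

ballots = [0.700, 0.850, 0.875, 0.900, 0.910, 0.920]
scaled = [1000 * v for v in ballots]

root_ratio = signed_root_average(scaled) / signed_root_average(ballots)
adjustment_ratio = adjustment(scaled) / adjustment(ballots)
print(f"averaged root scales by {root_ratio:.3f}, final adjustment by {adjustment_ratio:.3f}")
```

The intermediate averaged root does scale by the square root of 1000, but squaring it afterward restores a factor of 1000, so under this reading the final answer scales along with the ballots.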


#6    tangotiger      (see all posts) 2007/03/30 (Fri) @ 19:16

I guess the best way for me to test what the right answer is, is to go through the 1000 ballots, and remove the junk ballots, and then compare the results of the mean of the valid ballots to the mean and median of the 1000 ballots, and figure out the right balance between mean and median.


#7    David Smyth      (see all posts) 2007/03/30 (Fri) @ 19:30

Is it really “kosher” to throw out some ballots, even if they look like deliberate junk ballots? I mean, you are using the philosophy of the wisdom of crowds here. Shouldn’t you be true to the spirit of the concept, and accept both the good features and the bad features of the approach? Otherwise, the resultant ‘accuracy’ will be higher than it ‘should’ be.

This is not the ‘wisdom of crowds’, it’s the ‘wisdom of crowds edited by Tangotiger’.


#8    David Smyth      (see all posts) 2007/03/30 (Fri) @ 19:42

Let me add that I understand that it is an accepted statistical technique, to throw out outliers, in many applications. That’s what some of the prior posters are talking about. But for some reason, it strikes me as “cheating” in this particular application.


#9          (see all posts) 2007/03/30 (Fri) @ 20:05

David,

I think there’s a difference between an outlier (i.e. someone who has a different opinion of the situation) and a spammer (i.e. someone who just hates the Yankees). Think “wisdom of the crowds” if everyone had some money riding on the result.


#10    tangotiger      (see all posts) 2007/03/30 (Fri) @ 20:09

I agree with the second David.  Since this is a free-for-all, I have to put in controls after-the-fact.  Normally, all the controls would be in before-the-fact.  If I were to take the ballots face-to-face, there’s no way that a fan would say Pujols is a .500 OPS.  In one ballot, it’s clear that the fan put OBP numbers, not OPS.  I’m not interested in spammers, and those have to be vetted out.


#11    David Smyth      (see all posts) 2007/03/30 (Fri) @ 20:51

I understand, Tango. But I would guess (without any technical knowledge of the topic) that your after-the-fact adjustments, while perhaps ‘wise’, essentially remove your results from a strict ‘wisdom of crowds’ perspective. If so, then simply acknowledge this and move on. Human error (mistaking OPS for OBA), and presumably deliberate tampering, are an expected part of what you get from a large crowd…

I looked up ‘wisdom of crowds’ on Wikipedia, and don’t really have the energy to figure out if what is being done here really qualifies. Hopefully Tango has done that…


#12          (see all posts) 2007/03/30 (Fri) @ 22:09

I don’t think there’s any strict algorithm for figuring “wisdom of crowds.” For instance, take the stock market.  Suppose stock X is at $30.  Some think it’s worth $25, some $20, and so on.  Some think it’s $35, some $40, and so on the other way.

What does the price wind up as?  It depends.  If the guy who thinks it’s $40 is a billionaire with a high appetite for risk, it might wind up at $39.  But if he’s poor, he might not act on his belief at all.

This is an example where all participants are not weighted equally—they’re weighted by how much money they’re willing to stake on their opinions.  If you *do* take all participants equally, you wind up with the mean.  If you ignore participants who are obviously not using their “wisdom,” you use the mean without the outliers.  And if you don’t want to throw them out, but you want to weight them less, you use Tango’s algorithm.

I bet in real life, every case of “wisdom of crowds” uses a slightly different algorithm.

And so if the “wisdom of crowds” works, it has to work for any reasonable algorithm.  Some algorithms might be better predictors than others, but who really knows? 

You have to argue case by case.  And you’ve gotta think that deleting obvious flakes who predict a .500 or 1.400 OPS for Albert Pujols makes the crowd wiser.


#13    Greg Rybarczyk      (see all posts) 2007/03/31 (Sat) @ 01:18

A hybrid approach:  truncate the data set by removing the top and bottom 5% of all data points.  This is letting the data tell you which points are the outliers, it is not subjective at all. 

Then take the mean of the remaining inner 90% of the values.

Taking the tails off the distribution should not materially affect the identification of the center, IF it is a smooth distribution.  And if it is not, because some of the outer points diverge significantly from the distribution, then they ought to be removed anyway.  So, removing the outer 5% tails is either immaterial, or appropriate…
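
A minimal sketch of that truncation in Python. The name `trimmed_mean` and the choice to round the number of dropped ballots down are assumptions, and the 20 ballots are made up by repeating the five forecasts from the post and swapping in one low and one high junk value.

```python
from statistics import mean

def trimmed_mean(values, percent=5):
    """Mean after dropping `percent` percent of the ballots from each end."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100   # ballots dropped per tail, rounded down
    kept = ordered[k:len(ordered) - k]
    return mean(kept)

ballots = [0.850, 0.875, 0.900, 0.910, 0.920] * 4   # 20 illustrative ballots
ballots[0] = 0.700    # one low junk ballot
ballots[-1] = 1.100   # one high junk ballot

print(f"plain mean   = {mean(ballots):.4f}")
print(f"trimmed mean = {trimmed_mean(ballots):.4f}")
```

With 20 ballots, 5% is exactly one ballot per tail, so both junk values are dropped. With fewer than 20 ballots this version rounds down to zero and trims nothing.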

I think if you get any more involved than this, you’re just gilding the lily…

I don’t agree with weighting according to separation from the mean.  If each vote is an independent trial, then they all deserve identical treatment and valuation.


#14    tangotiger      (see all posts) 2007/03/31 (Sat) @ 10:03

I was considering the 5% scenario at first, but the problem is what to do with a guy with only 7 or 12 ballots.  Most of my players would have fewer than 20 picks each.

What I’ll likely do is what I did with the Fielding Scouting Report: remove the obvious junk ballots, and count all remaining values equally.


#15    Dave Clark      (see all posts) 2007/04/01 (Sun) @ 20:50

Part of the Wisdom of Crowds theory is that the crowd has to have some form of wisdom about the subject, isn’t it?

Isn’t that why obvious frauds can’t be included?

I read of an application of the Wisdom of Crowds to find a downed submarine in the North Atlantic.  If a respondent had said that the sub was in the Indian Ocean that should not be factored into the plot points.  By the way, the sub was found through the WoC methodology.

