Friday, September 22, 2006
How Reliable Are Fans?
Very. Here’s the evidence, using the Fans’ Scouting Report.
How many fans do we really need participating for each team?
Here’s a mini-study I ran, and I’ll take you through it step by step. The first thing I did was look for all ballots that had at least 8 players with all 7 traits filled in. Each of these can be considered a “fully-completed” ballot.
For each player, I figured out his unweighted score. So, a guy with all 5s was given a 5.0, a guy with some 5s, and some 4s would have been a 4-point-something, etc.
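The ballot filter and the unweighted score can be sketched in a few lines of Python. This is only an illustration: the ballot layout (a dict mapping a player name to a list of 7 ratings, with None for a blank) and the names `complete_players`, `is_fully_completed` and `unweighted_score` are assumptions, and the score is assumed to be the simple mean of the 7 traits.

```python
from statistics import mean

N_TRAITS = 7      # traits rated per player (from the post)
MIN_PLAYERS = 8   # fully rated players required per ballot (from the post)

def complete_players(ballot):
    """Keep only the players who have all 7 traits filled in.

    `ballot` uses a hypothetical layout: {player_name: [rating or None, ...]}.
    """
    return {
        name: ratings
        for name, ratings in ballot.items()
        if len(ratings) == N_TRAITS and all(r is not None for r in ratings)
    }

def is_fully_completed(ballot):
    """A ballot counts if it has at least 8 fully rated players."""
    return len(complete_players(ballot)) >= MIN_PLAYERS

def unweighted_score(ratings):
    """Simple average of the trait ratings; every trait counts equally."""
    return mean(ratings)
```

With this definition, a player rated 5 on every trait scores 5.0, and a mix of 5s and 4s lands somewhere between 4 and 5.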
I then took a random team, the Twins. They had 31 fully-completed ballots. I divided up the Twins fans into two random sets, and computed, for each player, their average score. I ended up with these two sets:
Set1 - Set2
3.8 - 4 Bartlett, Jason
2.3 - 2.3 Batista, Tony
3.6 - 4 Castillo, Luis
2.8 - 3 Castro, Juan
3.8 - 3.8 Cuddyer, Michael
3.6 - 3.5 Ford, Lew
4.2 - 4.4 Hunter, Torii
3.1 - 2.9 Kubel, Jason
4.4 - 4.5 Mauer, Joe
3.1 - 3.1 Morneau, Justin
3.9 - 4.2 Punto, Nick
3.1 - 3.6 Redmond, Mike
3.5 - 3.4 Rodriguez, Luis
2.3 - 2.5 Stewart, Shannon
3.8 - 3.8 Tyner, Jason
2.5 - 2.7 White, Rondell
That’s a total of 16 fielders, each of whom received 8 to 16 votes (for an average of 13 votes in each set).
Then, a simple correlation was run between the two. The correlation (r) was 0.96. For all intents and purposes, you take a group of 13 hardcore Twins fans, and compare their evaluation to 13 other hardcore Twins fans, and they will agree almost perfectly. Anyone else surprised that just getting 13 ballots is enough?
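For anyone who wants to check the Twins number, here is a minimal Python sketch of the correlation step. It uses the averages exactly as listed above, which are rounded to one decimal, so the result may differ slightly from one computed on the unrounded scores. The helper name `pearson_r` is mine, not part of the original study.

```python
from math import sqrt

# Twins averages as published above (rounded to one decimal), in table order.
set1 = [3.8, 2.3, 3.6, 2.8, 3.8, 3.6, 4.2, 3.1,
        4.4, 3.1, 3.9, 3.1, 3.5, 2.3, 3.8, 2.5]
set2 = [4.0, 2.3, 4.0, 3.0, 3.8, 3.5, 4.4, 2.9,
        4.5, 3.1, 4.2, 3.6, 3.4, 2.5, 3.8, 2.7]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of squared deviations and of cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

r = pearson_r(set1, set2)
print(f"Twins split-half correlation: {r:.3f}")
```

The same function works on the Reds, Pirates and Mariners tables by swapping in their two columns.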
Let’s drop it down some, and try it with another team. This time, we’ll try the Reds. I have 15 Reds players, who each received 7 to 12 votes (average of 9.5). Here is how one group of fans evaluated them against the other:
3 - 2.8 Aurilia, Rich
4 - 3.9 Castro, Juan
2.8 - 2.7 Clayton, Royce
3.7 - 3.9 Denorfia, Chris
2.2 - 2.3 Dunn, Adam
3.4 - 3.5 Encarnacion, Edwin
3.9 - 3.9 Freel, Ryan
3 - 3.1 Griffey Jr., Ken
2.6 - 2.6 Hatteberg, Scott
3.7 - 3.5 Kearns, Austin
3.6 - 3.3 LaRue, Jason
2.6 - 2.5 Lopez, Felipe
4.1 - 4.1 Phillips, Brandon
2.6 - 2.6 Ross, David
2.2 - 2.2 Valentin, Javier
Remember now, we have fewer fans here compared to the Twins (9.5 against 13). Our correlation between the two groups of Reds fans? 0.98!
Let’s look at one last one, the Pirates. I have 13 players evaluated for each group of fans, with each player receiving 4 to 7 evaluations (for an average of only six evaluations in each set). Pretty tiny, right? Here are their evaluations:
3.6 - 3.3 Bautista, Jose
3.3 - 3.4 Bay, Jason
2.6 - 2.5 Burnitz, Jeromy
2.8 - 2.7 Casey, Sean
3.2 - 2.9 Castillo, Jose
4.4 - 4 Duffy, Chris
3.6 - 3.5 McLouth, Nate
3.6 - 3.3 Nady, Xavier
2.8 - 2.9 Paulino, Ronny
3.3 - 3 Randa, Joe
3.7 - 4.2 Sanchez, Freddy
2.7 - 2.6 Wilson, Craig
4 - 3.9 Wilson, Jack
And the correlation? 0.90. Think about that.
I need 2,000 PA before a hitter’s performance represents his true talent level as reliably as SIX fans represent what other hardcore fans see. Now remember, fans in general can be completely wrong, but they will be wrong together. All we’re looking for here is how reliable a small sample of fans is, not how accurate their evaluation is.
Here’s a quick regression estimate:
r = fans / (fans + 0.6)
If you have 6 fans, the reliability is .90. If you have 13 fans, the reliability is .96.
Establish what reliability level you want, and I can tell you how many fans you need.
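Solving r = fans / (fans + 0.6) for the number of fans gives fans = 0.6 × r / (1 − r). A small Python sketch of that inversion follows. The function names `reliability` and `fans_needed` are my own, and the target levels in the loop are arbitrary examples, not figures from the study.

```python
import math

K = 0.6  # constant from the estimate r = fans / (fans + 0.6)

def reliability(fans):
    """Estimated split-half reliability for a given number of fans."""
    return fans / (fans + K)

def fans_needed(target_r):
    """Invert the estimate: fans = K * r / (1 - r)."""
    if not 0 <= target_r < 1:
        raise ValueError("target reliability must be in [0, 1)")
    return K * target_r / (1 - target_r)

# Example targets (chosen for illustration only).
for target in (0.80, 0.90, 0.95, 0.99):
    exact = fans_needed(target)
    print(f"r = {target:.2f}: {exact:.1f} fans, so at least {math.ceil(exact)}")
```

Because the number of fans has to be whole, the sketch rounds up, which makes the achieved reliability at least the target.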
Finally, the biggest representation, by far, was for the Mariners. I have an average of 68.5 evaluations for each of these players:
4.3 - 4.3 Beltre, Adrian
4.4 - 4.4 Betancourt, Yuniesky
2.8 - 2.7 Bloomquist, Willie
2.8 - 2.7 Ibanez, Raul
3.3 - 3.2 Johjima, Kenji
3.4 - 3.6 Jones, Adam
3.5 - 3.5 Lopez, Jose
3.5 - 3.5 Reed, Jeremy
2.6 - 2.6 Rivera, Rene
2.8 - 2.7 Sexson, Richie
4.8 - 4.8 Suzuki, Ichiro
The correlation is 0.994. What would our equation say? 68.55 / (68.55+.6) = 0.991.
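To line up the quick estimate against all four teams at once, here is a short Python sketch. The fan counts and observed correlations are the ones quoted in this post; the names `observed`, `predicted_r` and `rows` are just labels I picked for the example.

```python
K = 0.6  # constant from the estimate r = fans / (fans + 0.6)

# (team, average evaluations per player in each set, observed correlation),
# all taken from the figures quoted in the post.
observed = [
    ("Twins", 13, 0.96),
    ("Reds", 9.5, 0.98),
    ("Pirates", 6, 0.90),
    ("Mariners", 68.55, 0.994),
]

def predicted_r(fans):
    """Reliability predicted by the quick estimate."""
    return fans / (fans + K)

rows = [(team, fans, obs, predicted_r(fans)) for team, fans, obs in observed]

print(f"{'team':<9} {'fans':>6} {'obs':>6} {'pred':>6} {'diff':>7}")
for team, fans, obs, pred in rows:
    print(f"{team:<9} {fans:>6} {obs:>6.3f} {pred:>6.3f} {obs - pred:>+7.3f}")
```

On these four splits the estimate tracks the Twins, Pirates and Mariners closely, and the Reds split is the one that sits furthest from it.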


I think I’m on record as not being a fan of the Fan Scouting Report.
Given that admission, I’m wondering what the significance is of the high agreement among the responders. There are obvious reasons why that should be somewhat expected, having little or nothing to do with the accuracy of ratings. And, even if the ratings *are* reasonably accurate, it’s not necessarily (or even likely, IMO) because fans are good enough scouts to produce accurate ratings “en masse”.