THE BOOK--Playing The Percentages In Baseball


Friday, September 22, 2006

How Reliable Are Fans?

Very.  Here’s the evidence, using the Fans’ Scouting Report.


How many fans do we really need participating for each team?

Here’s a mini-study I ran, and I’ll take you through it step by step.  The first thing I did was look for all ballots on which at least 8 players had all 7 traits filled in.  Each of these can be considered a “fully-completed” ballot.

For each player, I figured out his unweighted score.  So, a guy with all 5s was given a 5.0, a guy with some 5s, and some 4s would have been a 4-point-something, etc.

I then took a random team, the Twins.  They had 31 fully-completed ballots.  I divided up the Twins fans into two random sets, and computed, for each player, their average score.  I ended up with these two sets:

Set1 - Set2
3.8 - 4 Bartlett, Jason
2.3 - 2.3 Batista, Tony
3.6 - 4 Castillo, Luis
2.8 - 3 Castro, Juan
3.8 - 3.8 Cuddyer, Michael
3.6 - 3.5 Ford, Lew
4.2 - 4.4 Hunter, Torii
3.1 - 2.9 Kubel, Jason
4.4 - 4.5 Mauer, Joe
3.1 - 3.1 Morneau, Justin
3.9 - 4.2 Punto, Nick
3.1 - 3.6 Redmond, Mike
3.5 - 3.4 Rodriguez, Luis
2.3 - 2.5 Stewart, Shannon
3.8 - 3.8 Tyner, Jason
2.5 - 2.7 White, Rondell

That’s a total of 16 fielders, each of whom received 8 to 16 votes (for an average of 13 votes in each set).
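The scoring and splitting steps described above can be sketched roughly as follows.  This is only an illustration, not the code actually used for the study: the ballot layout (a dict of player name to seven trait ratings), the function names, and the sample ballots are all made up, and the toy ballots list only two players rather than the eight or more a fully-completed ballot would have.

```python
import random
from statistics import mean

def unweighted_score(traits):
    """Plain average of one player's trait ratings (1 to 5) on one ballot."""
    return mean(traits)

def split_half_averages(ballots, seed=0):
    """Randomly split ballots into two sets and average each player's score per set.

    `ballots` is a hypothetical layout: a list of dicts, player name -> trait ratings.
    """
    rng = random.Random(seed)
    shuffled = ballots[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    results = []
    for group in (shuffled[:half], shuffled[half:]):
        scores = {}
        for ballot in group:
            for player, traits in ballot.items():
                scores.setdefault(player, []).append(unweighted_score(traits))
        results.append({p: round(mean(v), 1) for p, v in scores.items()})
    return results[0], results[1]

# Made-up ballots, for illustration only.
ballots = [
    {"Player A": [5, 5, 4, 4, 5, 5, 4], "Player B": [2, 3, 2, 2, 3, 2, 2]},
    {"Player A": [4, 5, 4, 4, 4, 5, 4], "Player B": [3, 3, 2, 2, 3, 3, 2]},
    {"Player A": [5, 4, 4, 5, 5, 4, 4], "Player B": [2, 2, 2, 3, 2, 2, 3]},
    {"Player A": [4, 4, 4, 4, 5, 5, 5], "Player B": [3, 2, 3, 2, 2, 2, 2]},
]

set1, set2 = split_half_averages(ballots)
for player in sorted(set1):
    print(f"{set1[player]} - {set2[player]} {player}")
```

Fixing the seed just makes the random split repeatable; a different seed gives a different, equally valid pair of sets.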

Then, a simple correlation was run between the two.  The correlation (r) was 0.96.  For all intents and purposes, you take a group of 13 hardcore Twins fans, and compare their evaluation to 13 other hardcore Twins fans, and they will agree almost perfectly.  Anyone else surprised that just getting 13 ballots is enough?
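For anyone who wants to check the Twins figure, here is a minimal sketch of a plain Pearson correlation on the two columns listed above.  The function name and list names are my own, and since the inputs are the rounded values from the table, the result may differ slightly from one computed on the unrounded averages.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Twins averages from the table above, in the same player order.
twins_set1 = [3.8, 2.3, 3.6, 2.8, 3.8, 3.6, 4.2, 3.1,
              4.4, 3.1, 3.9, 3.1, 3.5, 2.3, 3.8, 2.5]
twins_set2 = [4.0, 2.3, 4.0, 3.0, 3.8, 3.5, 4.4, 2.9,
              4.5, 3.1, 4.2, 3.6, 3.4, 2.5, 3.8, 2.7]

r = pearson_r(twins_set1, twins_set2)
print(f"Twins split-half correlation: {r:.2f}")
```

The same function can be pointed at the other teams' columns in exactly the same way.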

Let’s drop it down some, and try it with another team.  This time, we’ll try the Reds.  I have 15 Reds players, who each received 7 to 12 votes (average of 9.5).  Here is how one group of fans evaluated them against the other:
3 - 2.8 Aurilia, Rich
4 - 3.9 Castro, Juan
2.8 - 2.7 Clayton, Royce
3.7 - 3.9 Denorfia, Chris
2.2 - 2.3 Dunn, Adam
3.4 - 3.5 Encarnacion, Edwin
3.9 - 3.9 Freel, Ryan
3 - 3.1 Griffey Jr., Ken
2.6 - 2.6 Hatteberg, Scott
3.7 - 3.5 Kearns, Austin
3.6 - 3.3 LaRue, Jason
2.6 - 2.5 Lopez, Felipe
4.1 - 4.1 Phillips, Brandon
2.6 - 2.6 Ross, David
2.2 - 2.2 Valentin, Javier

Remember now, we have fewer fans here compared to the Twins (9.5 against 13).  Our correlation between the two groups of Reds fans?  0.98!

Let’s look at one last one, the Pirates.  I have 13 players evaluated for each group of fans, with each player receiving 4 to 7 evaluations (for an average of only six evaluations in each set).  Pretty tiny, right?  Here are their evaluations:
3.6 - 3.3 Bautista, Jose
3.3 - 3.4 Bay, Jason
2.6 - 2.5 Burnitz, Jeromy
2.8 - 2.7 Casey, Sean
3.2 - 2.9 Castillo, Jose
4.4 - 4 Duffy, Chris
3.6 - 3.5 McLouth, Nate
3.6 - 3.3 Nady, Xavier
2.8 - 2.9 Paulino, Ronny
3.3 - 3 Randa, Joe
3.7 - 4.2 Sanchez, Freddy
2.7 - 2.6 Wilson, Craig
4 - 3.9 Wilson, Jack

And the correlation?  0.90.  Think about that.

I need 2,000 PA before a hitter’s performance is as reliable a guide to his true talent level as SIX fans are to what other hardcore fans see.  Now remember, fans in general can be completely wrong, but they will be wrong together.  All we’re looking for here is how reliable the small sample of fans is, not how accurate their evaluation is.

Here’s a quick regression estimate:
r = fans / (fans + 0.6)

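If you have 6 fans, the estimated reliability is .91 (6 / 6.6), in line with the .90 observed for the Pirates.  If you have 13 fans, the estimated reliability is .96 (13 / 13.6), matching the Twins.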

Establish what reliability level you want, and I can tell you how many fans you need.
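To make that concrete, here is a small sketch of the estimate and its inverse.  Solving r = fans / (fans + 0.6) for fans gives fans = 0.6 * r / (1 - r).  The function names are mine, and the 0.6 constant is simply the quick estimate from above, not a precisely fitted value.

```python
def reliability(fans, k=0.6):
    """Estimated split-half reliability for a given number of fans: fans / (fans + k)."""
    if fans < 0:
        raise ValueError("number of fans cannot be negative")
    return fans / (fans + k)

def fans_needed(target, k=0.6):
    """Invert the estimate: how many fans give the target reliability (0 < target < 1)."""
    if not 0 < target < 1:
        raise ValueError("target reliability must be strictly between 0 and 1")
    return k * target / (1 - target)

for n in (6, 9.5, 13):
    print(f"{n} fans -> estimated reliability {reliability(n):.3f}")

for target in (0.80, 0.90, 0.95):
    print(f"reliability {target:.2f} -> about {fans_needed(target):.1f} fans")
```

Since you can’t poll a fraction of a fan, round the result up to the next whole number.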

Finally, the biggest representation, by far, was for the Mariners.  I have an average of 68.5 evaluations for each of these players:
4.3 - 4.3 Beltre, Adrian
4.4 - 4.4 Betancourt, Yuniesky
2.8 - 2.7 Bloomquist, Willie
2.8 - 2.7 Ibanez, Raul
3.3 - 3.2 Johjima, Kenji
3.4 - 3.6 Jones, Adam
3.5 - 3.5 Lopez, Jose
3.5 - 3.5 Reed, Jeremy
2.6 - 2.6 Rivera, Rene
2.8 - 2.7 Sexson, Richie
4.8 - 4.8 Suzuki, Ichiro

The correlation is 0.994.  What would our equation say?  68.55 / (68.55+.6) = 0.991.

(55) Comments • 2006/10/02 • Sabermetrics, Fielding, Scouting