THE BOOK--Playing The Percentages In Baseball


Friday, November 16, 2007

Reliability of statistics

By Tangotiger, 04:44 PM

Print, read, put aside, re-read then come back here.

I really wish Pizza had included the mean, not just the minimum, for each stat, but he said he’d get back to that later.  The “intraclass correlation”-type equation I use is:
r = PA/(PA+x), where x is unique to each metric. 
For things like OBP or wOBA, x is 200.  This means that to get an r=.50, you need 200 PA.  Pizza likes to use r=.70, which means you need a mean of 467 PA.  Pizza showed us that you need a minimum of 350 PA (which, in the context that he chose, means a range of 350 to 700-odd PA), and that likely supports the standard equation that I use.
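To make the arithmetic above easy to check, here is a minimal Python sketch of the equation and its inverse. The function names `reliability` and `pa_needed` are my own labels for illustration, not anything from Pizza's post, and x=200 is the value quoted above for OBP or wOBA.

```python
def reliability(pa, x):
    """Expected correlation r = PA / (PA + x) for a metric with constant x."""
    return pa / (pa + x)


def pa_needed(r, x):
    """Solve r = PA / (PA + x) for PA: PA = r * x / (1 - r)."""
    return r * x / (1.0 - r)


if __name__ == "__main__":
    x_obp = 200  # constant quoted in the post for OBP or wOBA
    print(reliability(200, x_obp))        # PA equal to x gives r = .50
    print(round(pa_needed(0.70, x_obp)))  # PA required for r = .70
```

Plugging in r=.70 and x=200 gives roughly 467 PA, matching the figure in the text.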

It’s a great post that he did, and a great service that he’s doing.  But, I take exception to this part:


Context Neutral wins (sum (WPA/LI)) - never did.  at 650 PA, it was at .588

The implication here is that you get an r=.588 at a mean of around 675 PA or so, meaning you’d get a correlation equation of roughly r=PA/(PA+480).  And that’s ridiculous.  The sum of WPA/LI should be virtually identical to wOBA or OPS or LWTS or anything else in terms of reliability. 
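The constant implied by a reported correlation can be backed out by solving the same equation for x. This is only a sketch: `implied_x` is a hypothetical helper name, and the 675 PA mean is the rough guess from the paragraph above, not a number Pizza published.

```python
def implied_x(r, pa):
    """Solve r = PA / (PA + x) for x: x = PA * (1 - r) / r."""
    return pa * (1.0 - r) / r


if __name__ == "__main__":
    # r = .588 is the figure quoted from Pizza; 675 PA is an assumed mean.
    print(implied_x(0.588, 675))
    # Sanity check against OBP/wOBA: r = .50 at 200 PA should give x = 200.
    print(implied_x(0.50, 200))
```

With those inputs the implied x lands in the 470 to 480 range, depending on the exact mean PA you assume, which is more than double the 200 used for OBP or wOBA.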

Here are standard year-to-year correlations from Fangraphs, taken from the main blog entry plus the 4th comment (data from 05/06):

AVG: .12
WPA: .27
BRAA: .35
OBP: .36
OPS: .36
WPA/LI: .36 (from the 4th comment, for 2005 to 2006; for Clutch, it’s .01, as suspected)
SLG: .38

Those numbers are r-squared, not r. As you can see, WPA/LI was at the same level as OPS.  I definitely think that Pizza made a calculation goof somewhere.
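Since the Fangraphs figures are r-squared, they need a square root before being compared with the r values discussed earlier. A quick sketch of that conversion, using only the numbers listed above (the dictionary name `r_squared` is mine):

```python
import math

# Year-to-year r-squared values as listed above (2005 to 2006 data).
r_squared = {
    "AVG": 0.12,
    "WPA": 0.27,
    "BRAA": 0.35,
    "OBP": 0.36,
    "OPS": 0.36,
    "WPA/LI": 0.36,
    "SLG": 0.38,
}

# Correlation r is the square root of r-squared (all of these are positive).
r = {stat: math.sqrt(value) for stat, value in r_squared.items()}

if __name__ == "__main__":
    for stat, value in r.items():
        print(f"{stat}: r = {value:.2f}")
```

An r-squared of .36 corresponds to r=.60, and WPA/LI and OPS come out identical.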

That rant aside, great work.

As for calculating the mean, most people will just take the straight mean.  But, as Andy has shown me, what you really want to do is take the average of 1/PA, and then take 1 over that (in other words, the harmonic mean).  This has to do with the variance, and if you play around with it, you’ll see that it makes sense.  That may be the reason Pizza didn’t present the mean: he may want to describe something like that.
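If you want to play around with it, here is a small sketch comparing the straight mean with the "1 over the average of 1/PA" version. The PA values are made up for illustration and are not from Pizza's data; the hand-rolled function is checked against Python's built-in `statistics.harmonic_mean`.

```python
import statistics


def harmonic_mean_pa(pa_values):
    """Take the average of 1/PA, then take 1 over that."""
    mean_of_inverses = sum(1.0 / pa for pa in pa_values) / len(pa_values)
    return 1.0 / mean_of_inverses


if __name__ == "__main__":
    sample_pa = [350, 500, 700]  # hypothetical player PA totals
    print(statistics.mean(sample_pa))           # straight mean
    print(harmonic_mean_pa(sample_pa))          # average of 1/PA, inverted
    print(statistics.harmonic_mean(sample_pa))  # standard library equivalent
```

The harmonic mean is never larger than the straight mean, and it gets pulled toward the low-PA players, who carry the most noise.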

(38) Comments • 2008/01/08 • Sabermetrics • Statistical_Theory