THE BOOK--Playing The Percentages In Baseball


Monday, June 13, 2011

When is the observed data half real and half noise?

By Tangotiger, 08:12 PM

Derek does exactly what I do (or at least one of the ways I do it).  I don’t know that I would get the same results, but the process is bang-on.

Stabilizes    Years    Stat                Denominator
100           0.2      K                   PA-IBB-HBP
168           0.3      UIBB                PA-IBB-HBP
253           0.4      IBB                 PA
501           0.8      HBP                 PA-IBB
959           2.1      1B                  PA-HBP-K-BB-HR-ROE
833           1.8      2B+3B               PA-HBP-K-BB-HR-ROE
48            1.5      2B                  2B+3B
48            1.5      3B                  2B+3B
1126          2.4      1B+2B+3B (BABIP)    PA-HBP-K-BB-HR-ROE
143           0.3      HR                  PA-K-BB-HBP
62            0.5      HR (HR/FB)          OF FB [MLBAM]
65            0.5      HR (HR/FB)          OF FB [RS]
109           0.2      GB [MLBAM]          GB+OF+IF+LD
116           0.2      GB [RS]             GB+OF+IF+LD
182           0.4      OF FB [MLBAM]       GB+OF+IF+LD
189           0.4      OF FB [RS]          GB+OF+IF+LD
194           0.4      IF FB [MLBAM]       GB+OF+IF+LD
233           0.5      IF FB [RS]          GB+OF+IF+LD
795           1.7      LD [MLBAM]          GB+OF+IF+LD
979           2.1      LD [RS]             GB+OF+IF+LD
Inconclusive*          SB%                 SB+CS
39            0.3      SBA%                1B+UIBB+HBP+ROE+FC

UPDATE: For pitchers:

Stabilizes    Years    Stat                Denominator
126           0.2      K                   PA-IBB-HBP
303           0.5      UIBB                PA-IBB-HBP
943           1.5      IBB                 PA
1346          2.1      HBP                 PA-IBB
3893          8.4      1B                  PA-HBP-K-BB-HR-ROE
2305          5        2B                  PA-HBP-K-BB-HR-ROE
4977          10.7     3B                  PA-HBP-K-BB-HR-ROE
1882          4        2B+3B               PA-HBP-K-BB-HR-ROE
351           11       2B                  2B+3B
351           11       3B                  2B+3B
3729          8        1B+2B+3B (BABIP)    PA-HBP-K-BB-HR-ROE
1271          2.7      HR                  PA-K-BB-HBP
1239          9.4      HR (HR/FB)          OF FB [MLBAM]
105           0.2      GB [MLBAM]          GB+OF+IF+LD
205           0.4      OF FB [MLBAM]       GB+OF+IF+LD
288           0.6      IF FB [MLBAM]       GB+OF+IF+LD
2026          4.3      LD [MLBAM]          GB+OF+IF+LD
36            2.3      SB                  SB+CS
161           1.2      SBA                 1B+UIBB+HBP+ROE+FC


#1    Tangotiger      (see all posts) 2011/06/13 (Mon) @ 20:20

Two interesting things:

1. While HR stabilizes faster “per denominator unit” when the denominator is FB than contacted PA, it stabilizes faster in terms of elapsed time with the contacted PA.

2. I don’t believe the BABIP result, as it certainly looks inconsistent with the 1B and 2B+3B numbers.  It would imply a negative correlation between 1B and 2B+3B (I think).

Anyway, great stuff.


#2    Colin Wyers      (see all posts) 2011/06/13 (Mon) @ 21:25

While HR stabilizes faster “per denominator unit” when the denominator is FB than contacted PA, it stabilizes faster in terms of elapsed time with the contacted PA.

And FB/CON stabilizes slower than HR/CON, so breaking it down into FB/CON and HR/FB is a net loss in terms of predictive power.

As for 1B being uncorrelated with 2B+3B - it should be a little, shouldn’t it? I mean, there are some hits that, depending on how the fielder reacts and on the runner’s speed, could be either a single or a double; in theory, if you leg out more doubles your singles will drop, all else being equal. Obviously, there are countervailing pressures, but it doesn’t sound totally outlandish.


#3    Tangotiger      (see all posts) 2011/06/13 (Mon) @ 21:56

It just seems that the relationship should be more positive correlation than negative correlation (all things considered). 

But I could very well be wrong…


#4    Tangotiger      (see all posts) 2011/06/13 (Mon) @ 21:57

I’m interested in seeing the pitching results.  HR numbers will shoot way up, and the BABIP will shoot way way way up.


#5    Derek Carty      (see all posts) 2011/06/14 (Tue) @ 00:30

Yeah, BABIP was one that looked a little off to me initially, but I checked and double-checked and triple-checked and couldn’t find anything wrong with how it was being run.


#6    Colin Wyers      (see all posts) 2011/06/14 (Tue) @ 01:08

Tango, I just took a look at batters with 300+ AB from 2001-2010, and the correlation between 1B/AB and (2B+3B)/AB was -0.22.


#7    Tangotiger      (see all posts) 2011/06/14 (Tue) @ 06:51

Colin: fascinating!


#8    Guy      (see all posts) 2011/06/14 (Tue) @ 13:08

I’m not sure the negative correlation between 1B and 2B is because a pool of BIP could (somewhat randomly) become either a 1B or 2B.  I think it has more to do with the fact that players tend to have one skill (hit singles) or the other (hit XBH).  Singles are also negatively correlated with BB and with HR/AB, while 2B are positively correlated with both.  Singles are strongly negatively correlated with strikeouts, while 2B have little relationship and HR have a positive correlation.  I think you’re just seeing two sets of skills (hit singles vs. power/BB) that are in fact negatively correlated.


#9    Tangotiger      (see all posts) 2011/06/20 (Mon) @ 09:31

I updated the above with Derek’s pitcher data.


#10    Tangotiger      (see all posts) 2011/06/20 (Mon) @ 09:50

By the way, I was extremely excited to see Derek’s number here:

Stabilizes Years Stat Denominator
3729 8 1B+2B+3B (BABIP) PA-HBP-K-BB-HR-ROE

Why?  Because a few years ago I said:
http://www.insidethebook.com/ee/index.php/site/comments/career_dips_numbers/

The halfway point (r=.50) is with BIP = 3700 or so. 


#11          (see all posts) 2011/06/20 (Mon) @ 15:52

Has there been any discussion whether the number at which R=.5 is the best way to present this type of information? 

I believe a credibility standard is less prone to the types of misuse noted in the linked article.  Using something like either of the following would acknowledge that 99 is almost as good as 100.
1) Z = N/(N+K)
2) Z = square root ( N / # of N needed for 100% credibility)

where
Z = credibility to give to the observations
N = number of observations (PA or AB, etc)
K = a ballast factor (not perfect terminology here)


#12    Tangotiger      (see all posts) 2011/06/20 (Mon) @ 16:02

I have been pushing for the r=.50 for several years.  There are two reasons for this.  If I say “ballast” = 300 PA, then it tells me TWO things:

1. r=.50, when PA = 300.
2. I can calculate the correlation at any given number of PA by doing PA / (PA + ballast)

If on the other hand you do r=.707, then you only learn ONE thing.  In order to figure out the r at other points of PA, you need to do extra work.

Furthermore, by stating the ballast number as r=.50, we can use it conversationally, like “half the performance is noise when you have PA = 300”.
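
A small sketch of the two uses of the ballast number, assuming the Z = N/(N+K) form from comment #11 with K read as the ballast. The function name `reliability` and the PA values in the loop are illustrative, not from the post.

```python
def reliability(pa: float, ballast: float) -> float:
    """Correlation r implied at `pa` trials.

    Assumes r = PA / (PA + ballast), so r = .50 exactly when PA == ballast.
    """
    if pa < 0 or ballast <= 0:
        raise ValueError("pa must be >= 0 and ballast must be > 0")
    return pa / (pa + ballast)


ballast = 300  # the example ballast used in the comment above
for pa in (100, 300, 600, 900):  # illustrative PA levels
    r = reliability(pa, ballast)
    # 1 - r is the share of the observation treated as noise
    print(f"PA={pa:4d}  r={r:.2f}  noise share={1 - r:.2f}")
```

One number therefore gives both the halfway point and the whole curve, which is the convenience being argued for.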


#13          (see all posts) 2011/06/20 (Mon) @ 16:03

I should note that I am aware pretty much everyone in this blog uses #1 from my post #11.  Just not sure if the Bühlmann credibility approach was selected for any particular reason.


#14    Tangotiger      (see all posts) 2011/06/20 (Mon) @ 16:28

Bühlmann?  Don’t know what that is…


#15          (see all posts) 2011/06/20 (Mon) @ 16:44

Bühlmann is the name for the Z = N/(N+K) credibility approach (also called least squares).  Textbook Bühlmann uses the prior mean as the complement instead of a larger group, but there are many considerations to selecting a complement.

There is also the Classical approach (also called limited fluctuation approach and uses the square root rule in #2 from post #11), and a Bayesian approach.


#16          (see all posts) 2011/06/20 (Mon) @ 18:24

I guess the question I’m trying to answer is: why select K as the number of observations at which the credibility is .5?

Maybe I am thinking about this incorrectly.  But my thinking is the value of K should be a ratio of how much variance there is between players and how much variance there is within an individual player if the complement is the league average.  In other words, is this measuring how quickly variance is explained for a stat, or how quickly the stat differentiates between players?

(Note that I haven’t run any numbers, so all the following examples are purely for demonstration purposes.) Ground ball rate stabilizes faster in general than BABIP does.  Would the 0.5 benchmark be met by just taking a random sample from all players of 116 classifications to compare it to?  It would in all likelihood take more than 116 (maybe 150, maybe 300), but would it take the 1,126 observations that BABIP takes to get to 0.5, even if Year 1 in the Year to Year correlation was pure random from everyone?

If K is used to get how much credibility is given to the one player compared to the average, shouldn’t it account for how much different that one player is from the league average?


#17    Tangotiger      (see all posts) 2011/06/20 (Mon) @ 18:58

#16: your 2nd paragraph sounds right.

Let me make it clear however: we do NOT look for number of trials such that r=.50.  We look at various levels of number of trials to IMPLY r=.50.

So, you’d find that if your sample has 100 PA each, r=.33. If your sample has 200 PA, r=.50.  If your sample has 300 PA, r=.60.  If your sample has 400 PA, r=.67.

Each of these implies PA=200 r=.50. 

Of course, it’s never so clean.  But, that’s what we are after, by looking at various PA levels to get an implied r=.50.
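
The back-solving described above can be sketched like this, assuming r = PA / (PA + ballast). The helper name `implied_ballast` is hypothetical, and taking the median of the per-level estimates is just one simple way to combine them, not necessarily what Derek or Tango does.

```python
from statistics import median


def implied_ballast(pa: float, r: float) -> float:
    """Solve r = PA / (PA + ballast) for ballast."""
    if not 0 < r < 1:
        raise ValueError("r must be strictly between 0 and 1")
    return pa * (1 - r) / r


# (PA per sample, observed r) pairs quoted in the comment above
observed = [(100, 0.33), (200, 0.50), (300, 0.60), (400, 0.67)]

estimates = [implied_ballast(pa, r) for pa, r in observed]
for (pa, r), k in zip(observed, estimates):
    print(f"PA={pa}  r={r:.2f}  implied ballast={k:.0f}")

# Combining the levels with a median is an assumption made for this sketch.
print("median implied ballast:", round(median(estimates)))
```

Each PA level gives its own estimate of the same underlying number, so disagreement between levels is a quick check on how clean the data are.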


#18          (see all posts) 2011/06/20 (Mon) @ 20:43

I am missing where there is a distinction for the variance of the true talent between all players and the variance within the individual’s performance in this study.

The sample is split into two groups, right? What would it look like if the second group had the playerID completely randomized? 

My hypothesis is the R would still increase as observations were added for half or more of these stat categories even though we know there “should” be zero correlation between the two because we made the second group completely random.  I am guessing that it will still be implied that r=.50 at some PA threshold using this design.


#19    Tangotiger      (see all posts) 2011/06/20 (Mon) @ 21:00

You would get r=0


#20          (see all posts) 2011/06/21 (Tue) @ 09:31

Thanks for the responses.  I appreciate the effort you put into making your comment section a valuable piece of the site. 

I think I see now why I would get a terribly low r with randomized playerIDs with enough observations (because I would be trying to find a correlation between a random number and essentially a mean).  So that line of thinking won’t help me.

I am still trying to make it make sense for me as to why that is the proper value to use for K, though.  I think I keep trying to get something resembling an intercept coefficient at the same time as the variable coefficient.


#21    Kincaid      (see all posts) 2011/06/23 (Thu) @ 20:05

Maybe I am thinking about this incorrectly.  But my thinking is the value of K should be a ratio of how much variance there is between players and how much variance there is within an individual player if the complement is the league average.

That is what it is.  For example, the SD between pitchers for BABIP talent is about .0075.  The variance for a single BIP for an individual pitcher randomly drawn from the complement distribution is about .21 [p(1-p), where p is about .300 (average BABIP)].  .21/(.0075^2) ~ 3729.
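
The arithmetic in the paragraph above can be sketched as follows. This is only an illustration under the binomial assumption stated there; the function name `ballast_from_variances` is hypothetical, and the inputs are the rounded figures quoted above, which is why the result lands near 3729 and not exactly on it.

```python
def ballast_from_variances(p: float, talent_sd: float) -> float:
    """K = (process variance of one trial) / (variance of true talent).

    p         -- average rate for the stat; one trial has variance p * (1 - p)
    talent_sd -- SD of true talent between players
    """
    process_variance = p * (1 - p)    # expected value of the process variance
    talent_variance = talent_sd ** 2  # variance of the hypothetical means
    return process_variance / talent_variance


# Rounded inputs from the comment: average BABIP ~ .300, talent SD ~ .0075
k = ballast_from_variances(p=0.300, talent_sd=0.0075)
print(round(k))
```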

In other words, is this measuring how quickly variance is explained for a stat, or how quickly the stat differentiates between players?

It measures how quickly the observed variance is explained as much by the variance between players as by the random process variance within an individual player, which more-or-less tells you how quickly you can distinguish between players.

If K is used to get how much credibility is given to the one player compared to the average, shouldn’t it account for how much different that one player is from the league average?

K is dependent on the complement (namely how much variance there is in true talent between players-i.e. the variance of the hypothetical means- and how much random variance there is for a randomly selected player-i.e. the expected value of the process variance).  In this case, the complement is the pool of players in MLB.  That complement doesn’t change no matter which player you select from the pool, so whatever you observe from that player doesn’t change how much weight you give that observation. 

If you wanted to assess something like how confident you are that the player is truly above average, then an observation further from average would lead to a more confident result, but as far as determining how much weight to give it in estimating the expected mean going forward, it doesn’t matter.


#22          (see all posts) 2011/06/24 (Fri) @ 13:24

I have taken credibility theory and have a working knowledge of it.  However, I have never used this type of year to year correlation to get an implied K before.  If someone gives me a VHM and EVPV (or the underlying distributions) I can get a K for the N/(N+K) formula.

But, I am struggling to get this year to year correlation process to reconcile that for me (yes, I know the correlation results give the amount of variance explained by separating players).  Do the results correspond to what is believed to be the true distribution in all cases?  I tried to test this.

Using hitters’ BABIP (because I already have that from Fangraphs in a spreadsheet) as an example gives .21/1126 = .00019 as the variance.
10-90 percentile range.
BIP of 50 or more = .233 to .339
151 most BIP* = .270 to .343
BIP of 1 or more = .000 to .349

*that is about 350 BIP, which was about equivalent to 500 PA I believe.

The MLB BABIP was .296, the BABIP for this pool of 151 was .306 (I don’t have/didn’t use reached on error).  I plugged those into a normal distribution to give me the percentile range from the implied distributions.


Mean    K       10-90 Range
.296    1126    .279 to .313
.306    200     .264 to .348
.296    100     .237 to .355

Obviously there is noise in these numbers, but it would be nice for them to match as well as possible.

The group of players who reach 350 BIP in a year are presumably more homogenous than the entire MLB.  I presume this is also true for the group of players who accumulated the necessary 1126 given as the linked result.  The group changes to be more homogenous, but the complement didn’t change to match the new group.  The complement is for a different group than what the K was computed on.

If K is computed using p(1-p) as EVPV where p = BABIP, and I compute the VHM as the variance of observed individual BABIP, I get these K results for the following BIP thresholds:
BIP = 1 … K = 94
BIP = 50 … K = 144
BIP = 350 … K = 206

Those K numbers make the implied normal distributions match the observed distributions better.  So I looked at 2010 David Wright (because he was near one end of the 10-90 range and had a nice round N).


N         400      400      400      400      400
K         94       144      206      1126     1126
Comp.     0.296    0.300    0.306    0.296    0.306
Actual    0.343    0.343    0.343    0.343    0.343
Cred.     0.810    0.735    0.660    0.262    0.262
Result    0.334    0.331    0.330    0.308    0.316
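
For anyone who wants to rerun the table, here is a minimal sketch of the calculation, assuming Cred. = N/(N+K) and Result = Cred. × Actual + (1 − Cred.) × Comp. The function names are made up for the sketch; the inputs are the N, K, Comp. and Actual values from the table.

```python
def credibility(n: float, k: float) -> float:
    """Weight given to the player's own observations: Z = N / (N + K)."""
    return n / (n + k)


def shrunk_estimate(actual: float, complement: float, n: float, k: float) -> float:
    """Blend the observed rate with the complement using credibility Z."""
    z = credibility(n, k)
    return z * actual + (1 - z) * complement


n, actual = 400, 0.343
# (K, complement) pairs taken from the columns of the table
columns = [(94, 0.296), (144, 0.300), (206, 0.306), (1126, 0.296), (1126, 0.306)]

for k, comp in columns:
    z = credibility(n, k)
    est = shrunk_estimate(actual, comp, n, k)
    print(f"K={k:5d}  Comp={comp:.3f}  Cred={z:.3f}  Result={est:.3f}")
```

Small differences in the last digit against the table are possible, since the table's inputs may have been carried at more precision than shown.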

Maybe 2010 was a fluke and I need to look at more years and larger data sets, but this is why I am having trouble understanding and embracing this process fully.


#23    Tangotiger      (see all posts) 2011/06/24 (Fri) @ 13:41

You might like this:

http://www.tangotiger.net/solvingdips.pdf

Especially the second-half of that…


#24    Kincaid      (see all posts) 2011/06/24 (Fri) @ 14:50

For hitter BABIP, .21/1126=.00019 represents only the variance of the hypothetical means, not the total variance of the observed means.  Over 350 BIP, the total observed variance would be estimated by:

variance of hypothetical means + random process variance = total observed variance

The random process variance over 350 BIP with p=.306 is about 0.00061 [(.306*(1-.306))/(350)].  The total observed variance then is .00019 + .00061 = .00080.  Using the normal approximation with mean=.306 and SD=sqrt(.00080), the 10th and 90th percentiles are .270 and .342, which is a pretty close match to the observed.

Setting the variance of hypothetical means to equal the total observed variance implies no random process variance (since total observed VAR = VHM + EVPV), so K would end up being zero if you did that no matter what the variance of hypothetical means is.
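
The percentile check in this comment can be reproduced with a short sketch, using the normal approximation described above. The function name `observed_range` is hypothetical, and for simplicity the sketch computes the variance of hypothetical means as p(1-p)/K with p = .306, where the comment uses the rounded .21/1126; the two differ only in the fifth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def observed_range(mean: float, k: float, n: float,
                   lo: float = 0.10, hi: float = 0.90) -> tuple[float, float]:
    """Percentile range of observed rates over n trials (normal approximation)."""
    per_trial_variance = mean * (1 - mean)
    vhm = per_trial_variance / k               # variance of hypothetical means
    process_variance = per_trial_variance / n  # random variance over n trials
    total_sd = sqrt(vhm + process_variance)    # total observed SD
    dist = NormalDist(mu=mean, sigma=total_sd)
    return dist.inv_cdf(lo), dist.inv_cdf(hi)


low, high = observed_range(mean=0.306, k=1126, n=350)
print(f"10th percentile: {low:.3f}, 90th percentile: {high:.3f}")
```

Dropping the `process_variance` term reproduces the narrower range from comment #22, which is the mismatch being explained here.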


#25          (see all posts) 2011/06/24 (Fri) @ 15:30

I’ll eagerly await the pitchers, but what are the points for platoon drag for individual components (singles against lefties versus righties) and for parks (singles against lefties in Fenway)?


#26    Tangotiger      (see all posts) 2011/06/24 (Fri) @ 15:40

The pitcher update is in the main post at the top.

