THE BOOK--Playing The Percentages In Baseball


Friday, February 13, 2009

R=.50 at BIP = 1500 for BABIP

By Tangotiger, 02:45 PM

That number sounds about right.

PizzaCutter gives these results:
r=.174, BIP=250 ... my equation says: .143
r=.253, BIP=500 ... my equation says: .250
r=.696, BIP=3750 ... my equation says: .714
r=.742, BIP=4000 ... my equation says: .727

I’m happy saying that Pizza’s correlation equation would give us
r=BIP/(BIP+1500)

Good job by Pizza presenting the data as he did.  It makes my life easier.  I hope this post shows why it is useful to present things in terms of r=.50.  When BIP=1500, r=.50.  So, all you have to do is say at what point r=.50, and you then have an automatic regression equation.  I can feel Pizza inching closer to my dark side.  Come closer, Pizza.  You are almost there.
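
For anyone who wants to check the arithmetic, here is a minimal sketch of the equation in Python. The function names are made up for illustration, and the sample pitcher (.280 BABIP over 1500 BIP in a .300 league) is hypothetical. The regression step assumes the usual reading of r as the weight on the observed rate, with the rest going to the league mean.

```python
def expected_r(bip, c=1500):
    """Split-half correlation implied by r = BIP / (BIP + c)."""
    return bip / (bip + c)


def regress_to_mean(observed, league_mean, bip, c=1500):
    """Weight the observed rate by r and the league mean by (1 - r)."""
    r = expected_r(bip, c)
    return r * observed + (1 - r) * league_mean


# Pizza's reported correlations next to the equation's estimates.
reported = {250: 0.174, 500: 0.253, 3750: 0.696, 4000: 0.742}
for bip, r_obs in reported.items():
    print(f"BIP={bip:5d}  reported r={r_obs:.3f}  equation r={expected_r(bip):.3f}")

# Hypothetical pitcher: .280 BABIP over 1500 BIP in a .300 league.
print(round(regress_to_mean(0.280, 0.300, 1500), 3))
```

At BIP=1500 the weight is exactly one half, so the hypothetical pitcher lands midway between his observed rate and the league mean.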


#1    Tangotiger      (see all posts) 2009/02/13 (Fri) @ 14:57

The comments in that thread remind me of this piece I did a long time ago:
http://tangotiger.net/dipsbands.html


#2    Guy      (see all posts) 2009/02/13 (Fri) @ 15:41

Tango:  In this post you indicated it took 3700 BIP to regress 50%, when looking at a pitcher vs. his teammates.  Why so much larger than Pizza’s estimate?  I assume the answer is that the teammate approach controls for quality of fielding, league/environment, and park, while Pizza’s correlation in part reflects those factors. 

http://www.insidethebook.com/ee/index.php/site/article/career_dips_numbers/


#3    Pizza Cutter      (see all posts) 2009/02/13 (Fri) @ 17:53

Guy, I’m saying 3700 to get to 50% of the variance, which is r = .707.  I’d have to look to see where the actual reliability was at 1500 BIP.


#4    Tangotiger      (see all posts) 2009/02/13 (Fri) @ 17:56

Pizza/3: I will be shocked if the r at BIP=1500 is outside of the .48 to .52 range.


#5    Tangotiger      (see all posts) 2009/02/13 (Fri) @ 17:58

Guy/2: right, that’s what it probably is.  Pizza’s correlation includes a park and fielder bias, which is why his correlation is as strong as it is.


#6    Guy      (see all posts) 2009/02/13 (Fri) @ 18:31

Year will also be a big factor, as Pizza is working with 1979 to 2008.  For 1979 to 1992, the BABIP will be around .280; after that it’s around .300.  So that will add quite a bit of correlation, I’d think.


#7    Tangotiger      (see all posts) 2009/02/13 (Fri) @ 19:31

Not really, because of the way that Pizza draws his samples.  He basically takes every other BIP, which is brilliant.


#8    Guy      (see all posts) 2009/02/13 (Fri) @ 20:25

I don’t think that fixes the problem.  A pitcher who finished pitching before 1993 will tend to be above average (i.e. low BABIP) in both his A and B samples, and post-1993 pitchers will tend to be below average in both samples, creating a correlation.  Right?


#9    Pizza Cutter      (see all posts) 2009/02/13 (Fri) @ 20:30

At 1500 BIP, the correlation (N = 359) is .460.  As to the dark side, I’m not convinced that the r = x/(x+c) method for correlations is a good one.  It’s a decent approximation (as shown by your calculations), because it’s an asymptotic function, and reliability is asymptotic as well.  But it can’t be considered anything more than a thumbnail approximation.

Also, Guy, the question isn’t “What is Larry’s BABIP?” but, “Is Larry’s BABIP consistent given two relatively equal samples?” Even if there is a park/fielder/year bias, it won’t matter because it’s the same bias in both samples.


#10    Tangotiger      (see all posts) 2009/02/13 (Fri) @ 21:04

Pizza, how about this.  Restrict yourself to those 359 pitchers and their 1500 PA in each half.  Now, randomly select 750 BIP in each half, and find the correlation.  Select the other 750 BIP in each half, and find the correlation. 

If the correlation is r=.46 at 1500 BIP in each half, then the correlation at 750 BIP will be right around r=.32.


#11    Guy      (see all posts) 2009/02/13 (Fri) @ 21:59

Pizza:  I understand, but the question is consistent compared to what?  Compared to the other pitchers in your sample.  And by mixing pre- and post-1993 pitchers, you’ll create the illusion of consistency.  Pettitte’s two samples are similar in part because Pettitte has a high BABIP relative to his peers, but also because he pitched post 1993.  He’s virtually guaranteed to be below average in both samples, just by virtue of when he pitched.  Run your analysis again looking separately at pre- and post-1993 careers, and I’d expect your correlations to drop (or, normalize your BABIPs for year/league).

Factors like park and team defense will also increase the correlation.  If you pull all the Rockie pitchers out of your samples, for example, correlations will drop.  What’s nice about Tango’s teammate approach is that it controls for an awful lot of those factors in one step (though not perfectly).

Or, I’m really misunderstanding your method here.  Always possible…


#12    Tangotiger      (see all posts) 2009/02/13 (Fri) @ 22:08

Guy, I think you are misunderstanding.

Take Clemens’s games in 1992, put half of them in pile A and half in pile B.  Take his games in 1995.  Put half in A, half in B.  Take his games in 2004, put half in A, half in B.

In each pool, the A and the B, you have identical contexts.  So, what would it matter if Clemens has a lot of games pre-1993 or not?  They are equally represented in both pools.

Presuming I’m following Pizza’s methodology.

So, the two pools have not only Clemens, but his parks and fielders.  The correlation is not only about Clemens, but Clemens+parks+fielders.  That’s why the correlation is as strong as it is.

If Pizza were to limit his look to only having say Clemens in Toronto in Pool A, and Clemens in NY in Pool B, the correlation will be nowhere near as strong (aging issues notwithstanding).


#13    Tangotiger      (see all posts) 2009/02/13 (Fri) @ 22:18

Now, he’s also making sure that Clemens has 1500 BIP in each pool.  And that each pitcher also has the same number of BIP in each pool.


#14    Guy      (see all posts) 2009/02/13 (Fri) @ 22:30

Tango: 
Take a more extreme example: suppose the pre-1993 BABIP was .250, and post-1993 it was .350.  In both your A and B samples, every above-average pitcher would come from the pre-1993 period, every below-average pitcher from the post-1993 era.  That will create a correlation, even if every single pitcher’s true talent = his own league mean.  Right?


#15    Guy      (see all posts) 2009/02/13 (Fri) @ 23:13

Just to follow up, Tango’s career DIPS spreadsheet has 1100 pitchers who started their career in 1979 or later.  The 200 with the best BABIP have a mean start year of 1989, the 200 worst have a mean of 1995 (that’s a huge difference, given the boundaries of 1979 and 2004).  In the 1980s a lot of guys were putting up BABIPs under .270 (John Tudor had a career mark of .266), levels that no non-closer would reach today.  Divide these players’ careers in 2, and what you’ll find is some correlation based only on when they pitched (i.e. both of your John Tudors will look good, because they’re being compared to post-1993 pitchers).


#16    Tangotiger      (see all posts) 2009/02/13 (Fri) @ 23:14

I don’t understand.

In Pool A, you have 1500 BIP of Clemens with a mean of .240, 1500 BIP of Greg Maddux with a mean of .250 and 1500 BIP of Jamie Moyer with a mean of .260.

In Pool B, you have 1500 BIP of Clemens with a mean of .345, 1500 BIP of Maddux with a mean of .351, and 1500 BIP of Moyer with a mean of .352.

What would it matter if Pool B is .245, .251, .252 instead?

In any case, Pizza is taking all of the Clemens data, from 1986-2004, and splitting half of the BIP into Pool A and the other half in Pool B.  He’s doing that by taking his first BIP and putting that in Pool A, his second BIP into Pool B, his third into Pool A, and so on.

So, your scenario doesn’t apply.


#17    Guy      (see all posts) 2009/02/14 (Sat) @ 00:33

I may be misunderstanding what correlation Pizza is calculating.  But one more try....

If all or most of the pitchers had pitched both before and after 1993, like Clemens, then you’d be right and there wouldn’t be a problem.  But most pitchers in the sample will be predominantly or exclusively pre-1993, or predominantly/exclusively post-1993. That’s the problem:  mixing pitchers from two different worlds.  What you’ll get is samples like this:
A      B      Pitcher
.240   .254   Sid F.
.270   .250   Stieb
.253   .273   Bud Black
.270   .258   Dan Petry
.
.
.300   .320   Sele
.305   .325   Shane Reynolds
.315   .300   Pettitte
.330   .310   Rusch

Each sample will have the same mean (around .290), same SD, and same mix of pitchers.  But the pre-1993 guys will be clustered at one end (in BOTH samples), the post-1993 guys at the other, and that will increase the correlation for individual players.  Because they haven’t really been drawn from a single pool with a mean of .290, but rather 2 different pools. In fact, I think you would find correlation even if every single pre-1993 pitcher had a .280 true talent and every post-1993 pitcher was .300.
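
The claim in that last sentence can be checked with a quick simulation. What follows is only a sketch under stated assumptions: 200 invented pitchers per era, 750 BIP in each of the A and B samples, every pre-1993 pitcher with a true rate of exactly .280 and every post-1993 pitcher exactly .300. The two samples are drawn independently, which is a simplification of the odd/even split. None of these numbers come from Pizza's data.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def sample_babip(p, bip, rng):
    """Observed BABIP over `bip` balls in play for a true rate p."""
    return sum(rng.random() < p for _ in range(bip)) / bip


def split_half_r(true_rates, bip, rng):
    """Correlate two independent samples (A and B) for each pitcher."""
    a = [sample_babip(p, bip, rng) for p in true_rates]
    b = [sample_babip(p, bip, rng) for p in true_rates]
    return pearson(a, b)


rng = random.Random(2009)
n_per_era, bip = 200, 750          # assumed sizes, for illustration only
pre_1993 = [0.280] * n_per_era     # every pitcher exactly at his era's mean
post_1993 = [0.300] * n_per_era

r_mixed = split_half_r(pre_1993 + post_1993, bip, rng)
r_single_era = split_half_r(post_1993, bip, rng)
print(f"eras mixed:   r = {r_mixed:.3f}")
print(f"one era only: r = {r_single_era:.3f}")
```

With no pitcher-to-pitcher talent differences inside an era, any correlation in the mixed run comes from the era gap alone, while the single-era run should hover around zero.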


#18    Tangotiger      (see all posts) 2009/02/14 (Sat) @ 01:31

Ah, I get you now.


#19    Zach      (see all posts) 2009/02/14 (Sat) @ 01:34

I’ll try to jump in this…

What Pizza is doing (or seems to be doing) is taking yearly numbers and separating them into two groups.

If Pitcher Joe has 30 games in 1990 and 30 in 2000, then in his first group is 15 games from 1990 PLUS 15 from 2000; his second group is the rest of the games. So it shouldn’t matter where the games or PAs in question come from, because there’s an even amount of games/PAs from each time period in each sample.


#20    Pizza Cutter      (see all posts) 2009/02/14 (Sat) @ 01:36

Tom, per your request, I took the same group of 359 pitchers and looked at their reliability at 750 PA.  The result was .303.

Guy/17 (and preceding) - I see what you’re saying now.  I have to wonder if it would be that bifurcated, but it’s a point well taken.  I hesitate to run the analyses as you suggest, not for theoretical reasons (it seems a practical thing to do).  I hesitate because it’s going to cut into the sample size and then I’m going to be basing correlations on 5 or 6 guys.

Funny enough, my point in this article had more to do with the thought that while one year of BABIP doesn’t tell us much about a pitcher, the leap to “and it’s completely out of his control… it’s all noise, no signal” can be shown to be false.

I shall do some pondering on the subject.


#21    Pizza Cutter      (see all posts) 2009/02/14 (Sat) @ 01:40

Zach, it’s even finer grained than that.  Let’s say that Clemens faces 30 batters today.  PA’s #1, 3, 5, 7, 9… go into the “odd” basket and 2, 4, 6, 8, 10… go into the “even” basket.  I group by plate appearance.
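
The odd/even grouping is easy to picture in code. This is a sketch of the idea only, not Pizza's actual script, and the outcome sequence is made up (1 = hit, 0 = out).

```python
def split_odd_even(outcomes):
    """Alternate a pitcher's chronologically ordered events into two baskets."""
    odd = outcomes[0::2]    # 1st, 3rd, 5th, ...
    even = outcomes[1::2]   # 2nd, 4th, 6th, ...
    return odd, even


def babip(outcomes):
    """Share of balls in play that went for hits (1 = hit, 0 = out)."""
    return sum(outcomes) / len(outcomes)


# Hypothetical outcome sequence for one pitcher, in the order it happened.
sequence = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
odd, even = split_odd_even(sequence)
print(babip(odd), babip(even))
```

Because neighbouring events land in opposite baskets, both baskets see nearly the same games, parks, and fielders.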


#22    Tangotiger      (see all posts) 2009/02/14 (Sat) @ 08:34

Pizza:

(1-r)/r*BIP

So:
(1-.46)/.46*1500 = 1761
(1-.30)/.30*750 = 1750

Still not convinced?

If you take 500 PA, you should get r=.22
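
The same arithmetic as a small Python sketch (function names are illustrative): solve r = BIP/(BIP+c) for c at each sample size, then reuse the constant to project r at 500.

```python
def implied_constant(r, bip):
    """Solve r = BIP / (BIP + c) for c."""
    return (1 - r) / r * bip


def expected_r(bip, c):
    """Correlation implied by a given constant c."""
    return bip / (bip + c)


print(round(implied_constant(0.46, 1500)))   # from the 1500-BIP halves
print(round(implied_constant(0.30, 750)))    # from the 750-BIP halves
print(round(expected_r(500, 1750), 2))       # projection at 500
```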


#23    Guy      (see all posts) 2009/02/14 (Sat) @ 08:46

Pizza, I’m sure it won’t be as bifurcated as I suggested—I was exaggerating to make my point clear.  The problem would be much worse if you started in 1970 or earlier.  Still, there’s probably some impact.  It would be much more convenient for researchers if MLB hadn’t engineered a sudden, huge one-time shift in offense!

If you haven’t already seen it, Woolner used a somewhat similar split-season approach to look at this:  http://www.baseballprospectus.com/article.php?articleid=883.  He compared pitchers’ BABIP to their own teams, similar to Tango’s approach, to control for park, defense, etc.

Another complication is that your larger samples will only include long-career pitchers.  But high-BABIP pitchers won’t have long careers as a rule, and so will be very underrepresented in these samples.  That will tend to artificially reduce your correlations.


#24    Pizza Cutter      (see all posts) 2009/02/14 (Sat) @ 12:17

Tango/22 - I don’t dispute that it’s a handy formula, but that constant does wobble a little bit more than I would like when applied to actual data.  Surely, it’s not from 500 to 1500, but it’s not quite as constant as I’d like.

Guy/23 - Thanks for the Woolner link.  I’d not read that.  (Considering that the man works up the street from me—seriously, he does—I really ought to stop by.)

The sampling issue has occurred to me, although there’s no really good way around that one.  If only teams let all players, good and bad, have an equal number of plate appearances!


#25    Guy      (see all posts) 2009/02/14 (Sat) @ 14:53

Another great piece is Tippett’s:  http://www.diamond-mind.com/articles/ipavg2.htm.  The graphs on total innings pitched illustrate how much worse the pitchers are who wash out with short careers.  I really think that’s what creates the relatively narrow range of BABIP talent we observe:  the skill is so important that if you aren’t pretty good (<.310 in today’s game), you’re out of MLB pretty fast.  Of course, there’s a potential selection bias:  some pitchers lose their job due to bad BABIP luck.  But most pitchers with any talent get multiple chances to succeed in the majors, so I’m inclined to believe that most of the short-career pitchers really would get hammered if they continued to pitch.


#26    Tangotiger      (see all posts) 2009/02/14 (Sat) @ 16:37

Pizza: the ONLY reason it wobbles as it does, is because you are not using the same data/pitchers in each of your samples.  If you started with those 359 pitchers at BIP=1500, then the handy-dandy formula will barely wobble as you check those very same pitchers at BIP=1000, 500, 250, 100, 50.

We should expect some wobbling if you are not using the same data.


#27    Guy      (see all posts) 2009/02/14 (Sat) @ 17:53

Tango:
For your method, can’t we skip the step of measuring actual y-t-y correlations and just use combined multi-year data?  The number of PA to regress 50% should = PA/(SDratio^2-1), where PA = number of PA in your sample, and SDratio = SD (observed)/SD(error).  So if you have 7000 PAs and a SD of .012, your ratio is 2.19 and your 50% regression PA = 1842. 
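
A short sketch of that calculation, for checking the numbers. It assumes SD(error) is the binomial standard deviation at a mean rate of .300; the comment does not state the mean, but .300 reproduces the 2.19 ratio. The function name is made up.

```python
import math


def regression_pa(pa, sd_observed, mean_rate=0.300):
    """PA needed to regress 50%: PA / (SDratio^2 - 1)."""
    sd_error = math.sqrt(mean_rate * (1 - mean_rate) / pa)   # binomial noise
    sd_ratio = sd_observed / sd_error
    return pa / (sd_ratio ** 2 - 1), sd_ratio


pa_50, ratio = regression_pa(7000, 0.012)
print(round(ratio, 2), round(pa_50))
```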

An awful lot of research focuses on y-t-y correlations, when it seems to me combined multi-year samples give you what you want with a lot less effort.


#28    Pizza Cutter      (see all posts) 2009/02/14 (Sat) @ 22:38

Tom, any wobbling at all, even what has been observed in this very thread, is too wobbly for me (1761 vs. 1750, when I kept the sample itself the same).  When I see “c” for constant in a formula, I like to take that literally.  The number should be as constant as Cal Ripken.  That might simply be a point of disagreement on how much ambiguity we’re willing to accept, but it’s the flaw that I see in the formula.

Still, let me show you something.  Same sample of 359 pitchers with at least 3000 BIP’s between 1979-2008 in all of the following analyses.  Taking matched samples of X BIP’s, the split-half correlation and the resulting value for “c” in the equation r = x/(x+c).

50 BIP, r = .080, c = 575
100 BIP, r = .090, c = 1011
250 BIP, r = .191, c = 1058
500 BIP, r = .284, c = 1260
750 BIP, r = .303, c = 1725
1000 BIP, r = .358, c = 1793
1200 BIP, r = .390, c = 1876
1250 BIP, r = .404, c = 1844
1300 BIP, r = .414, c = 1840
1350 BIP, r = .436, c = 1746
1400 BIP, r = .441, c = 1774
1450 BIP, r = .450, c = 1772
1500 BIP, r = .460, c = 1760

Over small ranges, the constant wobbles a little, meaning that if you restrict your analyses to a small range of BIP values, it might work.  But even then there are some big jumps (look at 1300 to 1350) in the value of that “constant”, and it doesn’t seem to follow any sort of function that I can think of.  At the low end of the sampling chart, it’s clearly out of whack.  As I increase the sample size, the constant is settling into a range somewhere around 1760ish, which is promising.  The problem, of course, is that to present r = x/(x+c) with some specified value of c, based on a point or two worth of data, as a general equation doesn’t hold with the data.


#29    Tangotiger      (see all posts) 2009/02/15 (Sun) @ 01:25

Pizza, great work.

If you make c=1793, you get these expected r to your reported r:

BIP     reported r   c       estimated r
50      0.080        575     0.027
100     0.090        1011    0.053
250     0.191        1058    0.122
500     0.284        1260    0.218
750     0.303        1725    0.295
1000    0.358        1793    0.358
1200    0.390        1876    0.401
1250    0.404        1844    0.411
1300    0.414        1840    0.420
1350    0.436        1746    0.430
1400    0.441        1774    0.438
1450    0.450        1772    0.447
1500    0.460        1760    0.456

So, the wobbliness occurs at BIP=500 and below.

The “jump” in r from 1300 to 1350 (where the difference in observed r and estimated r is .006) is hardly a jump. 

It’s clear that when you have a function like x/(x+y), whether the y is 1750 or 1850 will hardly matter.

If I had said that I could estimate r at within .01, for all sampling where BIP equals at least 750, I’d say that’s a pretty fantastic estimate, wouldn’t you?

Indeed, here is the estimated r, if I use c=1750 or 1850:

BIP     reported r   c       est. r (c=1750)   est. r (c=1850)
50      0.080        575     0.03              0.03
100     0.090        1011    0.05              0.05
250     0.191        1058    0.13              0.12
500     0.284        1260    0.22              0.21
750     0.303        1725    0.30              0.29
1000    0.358        1793    0.36              0.35
1200    0.390        1876    0.41              0.39
1250    0.404        1844    0.42              0.40
1300    0.414        1840    0.43              0.41
1350    0.436        1746    0.44              0.42
1400    0.441        1774    0.44              0.43
1450    0.450        1772    0.45              0.44
1500    0.460        1760    0.46              0.45

Using 1750 or 1850 is the mathematical equivalent of toMAYto and toMAHto!

Anyway, great work, and you give me pause to think that the function only works most of the time, not all of the time.
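
For readers who want to try other values of c, here is a rough sketch that redoes the comparison against Pizza's reported correlations from comment #28. The helper names are invented. It prints the worst-case gap between reported and estimated r, so the size of the miss at 750+ BIP versus the small samples can be read off directly.

```python
# (BIP, reported split-half r) from Pizza's table in comment #28.
reported = [
    (50, 0.080), (100, 0.090), (250, 0.191), (500, 0.284), (750, 0.303),
    (1000, 0.358), (1200, 0.390), (1250, 0.404), (1300, 0.414),
    (1350, 0.436), (1400, 0.441), (1450, 0.450), (1500, 0.460),
]


def estimated_r(bip, c):
    """Correlation implied by r = BIP / (BIP + c)."""
    return bip / (bip + c)


def max_abs_error(c, min_bip=0):
    """Largest gap between reported and estimated r at or above min_bip."""
    return max(abs(r - estimated_r(bip, c)) for bip, r in reported if bip >= min_bip)


for c in (1750, 1793, 1850):
    print(f"c={c}: max error at 750+ BIP = {max_abs_error(c, 750):.3f}, "
          f"over all rows = {max_abs_error(c):.3f}")
```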


#30    Guy      (see all posts) 2009/02/15 (Sun) @ 09:30

The question is why does Pizza’s data show a “too high” correlation at low Ns?  If we use his large sample results, r=.45 at 1450, that implies a true talent SD of 0.0109.  Using that, we can project the r for any sample size.  This is what I get:

N Expected r
50 0.028
100 0.054
250 0.124
500 0.221
750 0.298
1000 0.362
1200 0.405
1250 0.415
1300 0.424
1350 0.434
1400 0.443
1450 0.451
1500 0.460

The expected r matches Pizza’s very well at 750+ PA, but not below.  So why do very small samples produce a correlation higher than expected?
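
A sketch of how a projection like this can be built, assuming r = var(true)/(var(true) + var(luck)) with binomial luck at a league BABIP of .300. The .300 is an assumption, since the exact value behind the table is not stated, so the output may differ from the table in the third decimal.

```python
import math

P = 0.300  # assumed league BABIP; the comment does not state the value used


def true_talent_sd(r, n, p=P):
    """Back out SD(true) from r = var(true) / (var(true) + p(1-p)/n)."""
    var_luck = p * (1 - p) / n
    return math.sqrt(r / (1 - r) * var_luck)


def projected_r(n, sd_true, p=P):
    """Expected split-half r at sample size n for a given talent spread."""
    var_true = sd_true ** 2
    return var_true / (var_true + p * (1 - p) / n)


sd_true = true_talent_sd(0.45, 1450)
print(round(sd_true, 4))
for n in (50, 500, 1000, 1500):
    print(n, round(projected_r(n, sd_true), 3))
```

Algebraically this is the same curve as r = N/(N+c), with c = p(1-p)/var(true).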


#31    Guy      (see all posts) 2009/02/15 (Sun) @ 10:37

Pizza:
Any chance the selection of the samples is non-random, like matched pairs of PAs from a pitcher’s two samples, or something like that?


#32    Guy      (see all posts) 2009/02/15 (Sun) @ 14:04

Looking again at Pizza’s r values, they seem higher than we usually see.  At 750 BIP, which is roughly a full season, his r is .30.  IIRC, the typical y-t-y correlation is much lower, like .12.  I’m guessing Pizza drew matching nth-select samples, so they look something like:
A:  PA #10, #25, #40, ...
B:  PA #11, #26, #41, ....
If so, that will ensure each sample is virtually identical in terms of year, league, park, and team defense.  More importantly, it controls hitter quality pretty well:  Manny in A is matched by Papi in B, etc.  And that means much less random variation (and higher r) in the smaller samples than we’d expect with two random samples.  There’s nothing wrong per se with this approach, but it won’t match up with the assumption underlying Tango’s approach, that SD(observed) = sqrt(SD(true)^2 + SD(error)^2).


#33    Peter Jensen      (see all posts) 2009/02/15 (Sun) @ 14:31

Guy - That assumption may be part of the problem.  That equation is NEVER true in uncontrolled research.


#34    Guy      (see all posts) 2009/02/15 (Sun) @ 15:32

Well, I think it’s true to the extent the “true SD” means the SD we’d find among this population given an infinite sample size.  Now, that doesn’t mean it’s a true measure of pitcher talent in this case, because a number of other factors are involved (park, league, fielders, etc.).  But I think the formula does a good job of telling us how to separate sampling error from the remaining variance.  Or do you disagree?


#35    Peter Jensen      (see all posts) 2009/02/15 (Sun) @ 15:49

I disagree.  The correct formula is SD(observed) = sqrt(SD(true)^2 + SD(Sample size error)^2 + SD(external factors)^2).  By leaving out the SD due to external factors you seriously distort the process of determining when there is an actual effect.  You can only depend on SD due to sample size error to be decreased with an increase in sample size.  SD due to external factors may stay present at values that overwhelm SD(true) at any sample size.
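
To make the last two sentences concrete, here is a small sketch of the three-term formula. The magnitudes (SD(true) = .010, SD(external) = .008, mean rate .300) are hypothetical, picked only to show the shape of the argument: the sampling term shrinks with n, the external term does not.

```python
import math


def sd_observed(sd_true, n, sd_external, p=0.300):
    """sqrt(SD(true)^2 + SD(sample size error)^2 + SD(external factors)^2)."""
    var_sampling = p * (1 - p) / n          # shrinks as n grows
    return math.sqrt(sd_true ** 2 + var_sampling + sd_external ** 2)


# Hypothetical magnitudes, chosen only to show the shape of the argument.
SD_TRUE, SD_EXTERNAL = 0.010, 0.008
for n in (500, 5000, 50000, 5_000_000):
    print(n, round(sd_observed(SD_TRUE, n, SD_EXTERNAL), 4))

floor = math.sqrt(SD_TRUE ** 2 + SD_EXTERNAL ** 2)  # what remains as n grows
print(round(floor, 4))
```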


#36    Pizza Cutter      (see all posts) 2009/02/15 (Sun) @ 18:04

Guy/32.  That’s how I did it.  I numbered all the BIPs up and then went evens and odds.  As to why my correlations are higher than the usual y-t-y, most times, the yty’s are “minimum 100/200/250 BFP”.  My method ensures that everyone has 750.  The other one might have a guy with 200 next to a guy with 500 next to a guy with 700.

Peter/35.  Right on.  The observed variance is always going to be a function of true, random, and error/external variances.  External factors could be biases in the sample or a bad metric to begin with.


#37    Tangotiger      (see all posts) 2009/02/16 (Mon) @ 09:27

“IIRC, the typical y-t-y correlation is much lower, like .12.”

That’s not true.  750 BIP is a full season for, like, Greg Maddux maybe.  So, there is no “typical” y-to-y at that level.  Here are the correlations I found a few years ago:

http://tangotiger.net/archives/stud0084.shtml

***

I agree with Peter and Pizza.

But, if you choose your samples randomly, those biases won’t apply to one group of samples more than the other.

So, the sd “true” is really whatever it is you think you are studying (pitchers) plus whatever else you think is random but may not be (fielders, park, batters).

***

In any case, I agree that Pizza’s correlation is way higher than expected at the low levels.  Indeed, it is possible that Pizza is not taking a random sample, but perhaps taking the first n number of PA for each pitcher.  And, that means that the correlation is capturing the fielders+park effect.

***

I’m going to try my own this week, to see what I get…


#38    Bjorn      (see all posts) 2009/02/17 (Tue) @ 08:40

I don’t know if this is a sidetrack or not…

But has there been any attempt to separate the impact the pitcher can have on BABIP by his pitching and by his own fielding?

Even if you do some type of with-or-without-you type analysis to strip away park effects and fielding by the pitcher’s teammates, this factor should still remain, shouldn’t it?

Or maybe the distinction is meaningless?
(In a sort of an out is an out is an out kind of way.)


#39    Tangotiger      (see all posts) 2009/02/17 (Tue) @ 10:54

Search PZR in this blog.


#40    Tangotiger      (see all posts) 2009/02/17 (Tue) @ 17:46

This is what I did:

1. Created a true talent distribution of 100 pitchers, such that the mean was .300, 1 SD = .010, and that 50% of the pitchers were within .007 of the mean.  That’s a fairly normalish distribution.

2. Gave each pitcher exactly 4000 trials (balls in play), and generated a random number as to whether it was a hit or out, based on the true talent mean each pitcher was given in step 1.

3. Created 8 random buckets of 500 BIP for each pitcher.

4. Ran a correlation of bucket1 to bucket2, bucket3 to 4, 5 to 6, and 7 to 8.

5. The results:  the correlation was r=.268.

6. Use my trusty formula r=500/(500+x) to get a value of x=1366

7. Change Step3 to 2 buckets of 2000 BIP for each pitcher, correlate the two buckets (Step4), and produce the results (Step5) as r=.568, and use the trusty formula (Step6) of r=2000/(2000+x) to get a value of x=1521

8. Create 2 different buckets of 2000 BIP, and run another correlation: r=.477, to get a value of x=2193

What do we learn?  Well, there is not very much stability in correlations when using random samples of the same data.  You’d basically have to get an almost infinite number of combinations of the sample data to run your correlations.

Is there another way?  Let’s figure out the sample BABIP for our 100 pitchers over those 4000 BIP: the mean was .3000 and the standard deviation was .0129.

The true spread of the underlying talent was actually 1 SD = .010 (actually, .0098).

Also note that the standard deviation from our binomial is sqrt(.3*.7/4000)=.0072

var(observed) = var(true) + var(luck)
var(observed) = .0098^2 + .0072^2 = .0121^2

First thing to notice is that what we actually observed (sd=.0129) was wider than what we expected (.0121).  So, even with such large samples, the observation is still not exactly what we would have expected.

r-squared = 1 - var(luck)/var(observed)
which is either .688 if we use the actual observed or .646 if we used the expected observed.

r=4000/(4000+x)=.688, meaning x=1814

So, when you have two 500 BIP samples, your r=.22 (compared to the .28 that we ran our correlations on)
When you have two 2000 BIP samples, your r=.52 (around what we got).

Therefore, we are going to get a lot of non-smoothness, even if we focus on the exact same data split in different buckets.

Indeed, I find Pizza’s results TOO smooth.  Pizza: did you take random samples each time you did what you did, or did you do BIP1-BIP250 for one, then BIP1-BIP500 for the next, etc.?
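
For those who would rather script the experiment than open the spreadsheet, the steps above can be sketched roughly as follows. This is an approximation, not the code behind the numbers in this comment: it assumes a plain Gaussian talent distribution (mean .300, SD .010), pools the bucket pairs into a single correlation, and uses an arbitrary seed, so the output will differ from the figures quoted.

```python
import math
import random
import statistics


def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_pitchers=100, bip=4000, mean=0.300, sd=0.010, seed=40):
    """Steps 1-2: draw true talents, then a hit/out outcome for each BIP."""
    rng = random.Random(seed)
    talents = [rng.gauss(mean, sd) for _ in range(n_pitchers)]
    outcomes = [[1 if rng.random() < p else 0 for _ in range(bip)] for p in talents]
    return rng, outcomes


def bucket_r(outcomes, size, rng):
    """Steps 3-4: random buckets per pitcher; pair 1-2, 3-4, ... and pool."""
    a, b = [], []
    for seq in outcomes:
        shuffled = rng.sample(seq, len(seq))
        rates = [sum(shuffled[i:i + size]) / size for i in range(0, len(seq), size)]
        a.extend(rates[0::2])
        b.extend(rates[1::2])
    return pearson(a, b)


def implied_x(r, size):
    """Solve r = size / (size + x) for x."""
    return size * (1 - r) / r


rng, outcomes = simulate()
for size in (500, 2000):
    r = bucket_r(outcomes, size, rng)
    print(f"{size}-BIP buckets: r={r:.3f}, implied x={implied_x(r, size):.0f}")

# Variance route: var(observed) = var(true) + var(luck).
rates = [sum(seq) / len(seq) for seq in outcomes]
mean_rate = statistics.fmean(rates)
var_obs = statistics.pvariance(rates)
var_luck = mean_rate * (1 - mean_rate) / 4000
reliability = 1 - var_luck / var_obs
print(f"sd(observed)={math.sqrt(var_obs):.4f}, reliability={reliability:.3f}, "
      f"x={implied_x(reliability, 4000):.0f}")
```

Changing the seed is a quick way to see how much the bucket correlations move around on identical underlying talent.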


#41    Tangotiger      (see all posts) 2009/02/17 (Tue) @ 18:01

Here is the data I used for those who want to try:
http://www.insidethebook.com/ee/images/uploads/correlation_tests.zip


#42    Pizza Cutter      (see all posts) 2009/02/17 (Tue) @ 18:39

Tom, for twin 500 BIP samples, I took BIP numbers 1-1000 and split them into evens and odds.  This was to specifically match things as best as I could so that I was matching for quality of opposition/park/age/fatigue.  I considered random sampling, but decided to go with purposeful matching.  I could, theoretically, re-run it with random sampling instead.


#43    Tangotiger      (see all posts) 2009/02/17 (Tue) @ 19:19

Pizza: ah, well that makes perfect sense.  You are in effect capturing not only the impact of the pitcher, but also his fielders, park, and batters.  That totally explains why your correlation was so high, compared to the rest of your data.

Indeed, by doing even/odds, you are getting basically 12-15 BIP from the same game in each bucket.  So, not only are you capturing what I said, you are also capturing the effect of the climate.


#44    Pizza Cutter      (see all posts) 2009/02/17 (Tue) @ 23:53

Alright, same 359 guys with at least 3000 BIP from 1979-2008.  This time, I took a random sampling (and shook up the order) of plate appearances.  BIP, split-half correlation under randomization, split-half previously found (see comment #28)

50 BIP, new = .015, old = .080
100 BIP, new = .097, old = .090
250 BIP, new = .035 (sic), old = .191
500 BIP, new = .270, old = .284
750 BIP, new = .335, old = .303
1000 BIP, new = .382, old = .358
1200 BIP, new = .350 (sic), old = .390
1250 BIP, new = .425, old = .404
1300 BIP, new = .436, old = .414
1350 BIP, new = .390 (sic), old = .436
1400 BIP, new = .402, old = .441
1450 BIP, new = .417, old = .450
1500 BIP, new = .439, old = .460

The reason that it’s so messed up and, at times, actually becoming less reliable in bigger sampling frames is that I’m only getting one random shot at the data.  If I were to re-run these same analyses, I could get totally new numbers.  In other words, my reliability estimate is not very reliable.

I’m still more inclined to go with my serial method for sampling.  The reason is that in this case, the question I want answered is “given roughly the same set of circumstances again, would Larry do roughly the same again?” There probably is some effect for climate/defense/quality of opposition/etc.  If I wanted to pull apart what amount of this was Larry’s doing and what amount was everything else, the random method makes more sense as it brings in a broader range of contexts in the hopes that they will all shake out in the wash.  That, I suppose, is a matter of taste as to which question is more interesting to you.
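
One way to see the "one random shot" problem, and a possible way around it, is to repeat the random split many times and average. The sketch below uses synthetic data as a stand-in for real play-by-play (150 invented pitchers, 1000 BIP each, Gaussian talent with SD .010); it is not a re-run of the numbers above.

```python
import math
import random
import statistics


def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_split_r(outcomes, n, rng):
    """One random split-half: draw 2n BIP per pitcher, n into each half."""
    a, b = [], []
    for seq in outcomes:
        draw = rng.sample(seq, 2 * n)
        a.append(sum(draw[:n]) / n)
        b.append(sum(draw[n:]) / n)
    return pearson(a, b)


# Synthetic stand-in for real data: 150 pitchers, 1000 BIP each.
rng = random.Random(44)
talents = [rng.gauss(0.300, 0.010) for _ in range(150)]
outcomes = [[int(rng.random() < p) for _ in range(1000)] for p in talents]

runs = [random_split_r(outcomes, 250, rng) for _ in range(40)]
print(f"single runs range from {min(runs):.3f} to {max(runs):.3f}")
print(f"average over {len(runs)} runs: {statistics.fmean(runs):.3f}")
```

Averaging smooths out the luck of any one shuffle, though it cannot remove the noise that comes from having a finite set of pitchers.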


#45    Guy      (see all posts) 2009/02/18 (Wed) @ 00:24

I agree that either approach can be valid, depending on your purpose.  But the original question on the table was, how many BIP/PA do we need to see before having a good sense of what a pitcher’s talent is?  To answer that question, I think you’d want the random sample approach, as we obviously can’t anticipate our pitcher will face identical hitters, parks, etc. in the future. 

More importantly, you probably want to try to control for some of those factors, whether by Tango’s teammate approach or something else.  Unfortunately, when you do that the correlations fall much further.  So you probably do need something like 3-4,000 BIP to get a good read on the pitcher’s own talent.


#46          (see all posts) 2009/02/18 (Wed) @ 00:54

Bjorn #38.  This is really only one data point, so don’t give it much importance.  I seem to remember seeing an estimate a couple of years ago that showed Greg Maddux with one of the higher differences over his career between his BABIP allowed and the average, something like 3 hits per year.  About the same time I saw Dewan’s +/- numbers for Maddux’s fielding: about 6 hits per year above average!  (At age 40 or so!)


#47    Tangotiger      (see all posts) 2009/02/18 (Wed) @ 12:42

I’m with Guy on this matter as well.

