THE BOOK--Playing The Percentages In Baseball

Friday, April 17, 2009

Is the ball juiced again?

By Tangotiger, 10:43 AM

Greg sent me an email:

Thought you might like to hear about something I’ve been following for the first week of the season.  I began wondering at the large number of long home runs being hit in the first two full days of the season, and started watching the numbers closely.  The distance of the home runs being hit this year (the true distance, i.e., where they actually land, as well as the standard distance, which factors out weather and altitude) is significantly higher than last year, with the average standard distance being 8.5 feet longer this year than last.

You may be wondering about sample sizes, and of course I took that into account.  I used a 2-sample T-test on the 2009 and 2008 full season data, and got this:

Two-Sample T-Test and CI: 2009, 2008

Two-sample T for 2009 vs 2008

N Mean StDev SE Mean
2009 199 399.8 27.8 2.0
2008 4820 391.3 25.4 0.37

Difference = mu (2009) - mu (2008)
Estimate for difference:  8.49
95% CI for difference:  (4.54, 12.45)
T-Test of difference = 0 (vs not =): T-Value = 4.23 P-Value = 0.000 DF = 211

The p-value actually works out to 0.0000341, which is a very strong indicator that something is making 2009 home runs fly farther than 2008 home runs, in isolation of the weather, and to me that implicates the ball.  In the course of observing all the homers, I have also heard lots of comments from announcers who were surprised at how far the ball had carried.
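For readers who want to check the arithmetic, here is a minimal sketch that recomputes the test statistic from the summary numbers quoted in the email. It assumes the unpooled (Welch) form of the two-sample test, which is what a DF of about 211 suggests; the function name `welch_t` is just an illustration, not anything from Greg's workflow.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and degrees of freedom,
    computed from summary statistics only."""
    v1 = sd1 ** 2 / n1          # squared standard error of sample 1
    v2 = sd2 ** 2 / n2          # squared standard error of sample 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics as quoted in the email (2009 vs 2008 standard distance)
t_stat, dof = welch_t(399.8, 27.8, 199, 391.3, 25.4, 4820)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Because the inputs are rounded means and standard deviations, the result should land close to, but not necessarily exactly on, the reported T-Value and DF.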

When I look at only April, 2008, I get a p-value of 0.01, so I don’t think it’s just some sort of calendar thing here.  I’ve done the same comparison to 2007, 2006, and the month of April for each of those years, and all indications agree that the difference is significant.

So, you might want to dust off your calculations from that “Changes in HR Rates from the Retrosheet Years” article and see what you get.  Looks like a big year for homers, and so far the actual rate of 2.14 HR per game (in April!) doesn’t contradict that…

I’ll only be able to report my results at the end of the year.  Drastic single-year changes only happen when you have a catalyst, as discussed in my article on the subject.

***

These are Greg’s images from post 48:


[three images]

#1          (see all posts) 2009/04/17 (Fri) @ 11:59

For the moment, let us take seriously that the average home run distance is 8.5 ft farther for this year.  It would be nice to see how that extrapolates to an increase in hit ball speed.  Greg has those numbers, at least in the context of his model.  Assuming he has not changed his model, he could compare hit ball speed for 2009 to 2008.  Absent that, here is an estimate.  Given that each mph of hit ball speed corresponds to about an additional 5.5 ft of fly ball distance (based on my own aerodynamics model), I estimate an increase in hit ball speed of about 1.5 mph.  Knowing typical properties of bats, swing speed, and pitch speeds, I can estimate that such an increase would result from an increase in the ball coefficient of restitution of about 0.013 (out of a typical value of 0.46 at these speeds).  That is a 3% increase in the “juiciness” of the ball.
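The chain of estimates in this comment can be retraced with a few lines of arithmetic. This is only a sketch of the two ratios that can be checked directly; the step from 1.5 mph to a COR change of 0.013 depends on the commenter's bat and swing model and is taken as given here, not derived.

```python
extra_distance_ft = 8.5   # observed gain in average standard distance
ft_per_mph = 5.5          # distance gained per mph of hit ball speed (from the comment)

# Implied gain in hit ball speed
delta_speed_mph = extra_distance_ft / ft_per_mph

cor_increase = 0.013      # taken from the comment, not derived here
typical_cor = 0.46        # typical COR at these speeds (from the comment)

# Relative change in COR, in percent
cor_increase_pct = 100 * cor_increase / typical_cor

print(f"speed gain: {delta_speed_mph:.2f} mph")
print(f"COR gain:   {cor_increase_pct:.1f}%")
```

Both results round to the figures quoted above: roughly 1.5 mph and roughly 3%.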

However, I would caution about overinterpreting Greg’s observation or my own calculation I just presented.  Recall the juiced ball claims from 2000 when the number of home runs in April and May was significantly higher (per game) than in previous years.  By the end of 2000, everything looked pretty normal again.


#2    Brian      (see all posts) 2009/04/17 (Fri) @ 12:24

I joked with a friend before the season started that they’d be juicing the ball this year since attendance was expected to be down. Perhaps I should not have been joking.


#3          (see all posts) 2009/04/17 (Fri) @ 12:46

Well, when I look at Speed Off Bat numbers for this year’s homers vs. last year’s, I get a nominal difference of about 1.42 mph, which agrees well with Alan’s estimate.  Here’s the analysis output on the 2-sample T-test for that:

Two-Sample T-Test and CI: 2009 All SOB, 2008 All SOB

Two-sample T for 2009 All SOB vs 2008 All SOB

N Mean StDev SE Mean
2009 All SOB 311 105.62 4.74 0.27
2008 All SOB 4821 104.20 4.54 0.065

Difference = mu (2009 All SOB) - mu (2008 All SOB)
Estimate for difference:  1.424
95% CI for difference:  (0.881, 1.968)
T-Test of difference = 0 (vs not =): T-Value = 5.15 P-Value = 0.000 DF = 347

I don’t have data for 2000, unfortunately, but looking at things this way is much better than just looking at HR totals, which is how people made their judgment back in 2000.  Here we’re looking at several things:

1.  Weather-neutralized distance is up significantly.

2.  Speed Off Bat is up significantly.

3.  HR totals are up significantly.

I’ll add one more thing to this list, at the risk of bringing some scorn down on my head: when I watch the home runs (as I have every one of the 15,000 plus hit since April, 2006), the ball seems to my eyes to be flying farther than it should this year.  This is a “Blink” sort of perception of mine, and in fact this is the reason I looked at this in the first place.  After watching the first 30-40 homers this year, I had already said to myself “what the %^&*” about 7-8 times when I watched a so-so swing and indifferent contact result in a ball that sailed (well) over the fence.  My first thought when that happens is always “tailwind”, but a lot of these balls were indoors, or outdoors lacking a tailwind, so I dove into the numbers.

I will offer some possible explanations for the longer homers so far, but to me it’s not a very convincing list:

1.  Wind inaccuracies.  I could be putting bad wind numbers on these balls, leading to wrong weather-neutralized numbers.  The trouble with this theory is that when I look at the comparisons for indoor-only homers, the effect is still there, so I don’t think my winds are (significantly) wrong.

2.  Sampling bias.  There have been more games played in Arlington than the Bronx so far, and so on.  This could possibly have an effect, but it’s not a very strong bias, and everyone’s supposed to be using the same ball, so this shouldn’t matter if the comparison is being made on weather-neutralized distance.  If I had only compared true distance, this would be a valid concern.

3.  Luck of the draw.  This was a possibility earlier, when I first spoke up about this after a day and a half of the season, but now after nearly 300 home runs, the p-value describing the possibility that this is just a statistical artifact is exceedingly small.

I invite everyone’s theories, that’s why I shot this Tango’s way.  But if I were a betting man (and generally I’m not), I’d be taking the overs on runs and home runs this year, and if I were a fantasy baseball player (and generally I’m not), I’d be looking for cheap mid-level power hitters who figure to get 600 PA’s, as they as a group are going to benefit greatly from a lively ball.


#4    Phantom Stranger      (see all posts) 2009/04/17 (Fri) @ 13:25

Just anecdotally, I have noticed a jump in homers off swings that did not look very good.  It would make financial sense for MLB to juice the ball in a year with poor expected attendance.  I would load up on sinkerballers for your fantasy rotations.  That would be an exploitable market inefficiency if this data is to be believed.


#5    weskelton      (see all posts) 2009/04/17 (Fri) @ 13:33

If there is an adjustment for weather, then I’m thinking it shouldn’t matter whether you are comparing the first two weeks of 2009 with any two week span in any other season, right?  If that’s true, how unusual is it to have a two week span with an adjusted HR distance that is 8 feet greater than the overall average?  I’m probably missing something here.

One more thought… If we’re only looking at HRs, is it possible that there has just been a greater number of games played at larger ball parks, which would require the balls to carry farther in order to become HRs?  If you have data for non-HR flyballs, do you see a similar pattern?


#6          (see all posts) 2009/04/17 (Fri) @ 13:40

I agree with Greg that the analysis tools available now are much more extensive than those available in 2000, given his seminal work in analyzing home runs.  And the tools will soon become even better.  When Sportvision completes their hitf/x analysis for 2008 and starts their analysis of 2009 data, we will have a direct measure of batted ball speed.  That will then allow a direct comparison between the two years, which should allow some direct statements comparing the COR of the balls for the two years.  The sample size will be huge, since it will look at all batted balls and not just home runs.


#7          (see all posts) 2009/04/17 (Fri) @ 14:14

I looked for 300 homer sequences, to find the biggest average standard distance.  There were about 15,000 in-season sequences, with the mean around 392, of course, since that is the approximate mean for 2006-2008 homers.  The longest sequence I found was 398.3 feet for April 9-22, 2007 (interesting, also in April).  For 2008, there were no sequences over 396, and for 2006, none over 397.5.
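The search described here amounts to sliding a fixed-size window over the home runs in chronological order and tracking the window with the highest mean. The sketch below is one way to do that, not necessarily how it was actually done; `best_window` and the toy data are made up for illustration, and a real run would handle each season separately so that no window spans two seasons.

```python
def best_window(distances, window=300):
    """Return (start_index, mean) of the consecutive run of `window`
    home runs with the largest average distance."""
    if len(distances) < window:
        raise ValueError("not enough home runs for one window")
    total = sum(distances[:window])
    best_start, best_total = 0, total
    for start in range(1, len(distances) - window + 1):
        # Slide by one: add the newest homer, drop the oldest
        total += distances[start + window - 1] - distances[start - 1]
        if total > best_total:
            best_start, best_total = start, total
    return best_start, best_total / window

# Toy example with a window of 3 instead of 300
toy = [390, 400, 410, 395, 385, 420, 380]
start, mean = best_window(toy, window=3)
print(start, round(mean, 1))
```

The running-sum update keeps the scan linear in the number of home runs, which matters little at 15,000 rows but avoids re-summing 300 values at every step.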

One thing I noticed about that sequence in April 2007 was that it seemed to be overrepresented by sluggers; here are the hitters with 4 or more homers out of that 300 homer sample:

Alex Rodriguez, 7
Barry Bonds, 5
Josh Hamilton, 5
Chipper Jones, 5
Ian Kinsler, 5
Adrian Gonzalez, 4
Travis Hafner, 4
JJ Hardy, 4
Carlos Lee, 4
David Ortiz, 4

The 2007 list is loaded with long-ball artists.

Compare that to the April 2009 sequence (this is the first 311 homers, actually), check the guys with 4 or more homers here:

Nelson Cruz, 5
Evan Longoria, 5
Carlos Pena, 5
Miguel Cabrera, 4
Albert Pujols, 4
Brandon Inge, 4
Carlos Quentin, 4
Alfonso Soriano, 4
Nick Swisher, 4

Pujols is the only guy on there with any real long-distance credentials (prior to this year, anyway).

To me, this makes the 2007 sequence look like an expected periodic “planetary alignment” of the best long-ball sluggers, while the 2009 list looks like a rather ordinary cross-section of home run hitters.


#8    Tangotiger      (see all posts) 2009/04/17 (Fri) @ 14:31

I dunno Greg, next year, I would not be surprised to see Longoria, Soriano, Cabrera, and Pena on such a list.  Cruz’s travels are well documented, and he is not really a surprise.  Quentin is not much of a surprise.

Swisher and Inge are mild surprises.

I don’t think that anecdote tells us anything really.  You’ve got such a fantastic argument already that I don’t think we need to bring up such a list as meaning anything.


#9    Greg Rybarczyk      (see all posts) 2009/04/17 (Fri) @ 14:40

Those guys will likely pop on any random sequence of 300 homers, yes, because they hit a lot of homers, but to me it would be surprising to find them headlining THE list of the longest sequence of the year.  Which makes me expect that this sequence to start 2009 will most likely not be the longest sequence of 2009…

But anyway, the nice thing is that every day we get 25-35 more data points rolling in.

One other thought, it would be interesting to hit the sporting goods stores and see if there are any 2008 MLB game balls left on the shelves, and if so, box them up and send them to Dr. Sherwood at UMass Lowell along with some 2009 game balls for a little side by side test… or maybe send them to any other baseball physicists we know who have some time on their hands… :)


#10          (see all posts) 2009/04/17 (Fri) @ 15:16

Is it also possible that there’s a bias in the reporting of distances?  Maybe MLB changed the method so that the reports are of higher distances even though the balls aren’t going any farther.


#11    Tangotiger      (see all posts) 2009/04/17 (Fri) @ 15:28

Greg is calculating them himself.  Sounds like someone who doesn’t visit http://www.hittrackeronline.com enough to me :)


#12    Greg Rybarczyk      (see all posts) 2009/04/17 (Fri) @ 15:33

Phil, these are not reported distances I’m getting second hand, they are my own observations.  I’m making them in exactly the same way I have the last three seasons, so there is definitely no procedural bias on the distance numbers.

And incidentally, home run distances should be (and are, IMO) more accurate than other hits that stay inside the fence, because of the profusion of easily located landmarks beyond the fence, as contrasted with the (largely) unmarked expanse of green grass inside the fence.  I would trust my home run distances much more than non-homer data…


#13          (see all posts) 2009/04/17 (Fri) @ 15:36

Re Greg (#9):  Jim Sherwood at UML does the “official” testing of balls for MLB.  There are other options, such as Lloyd Smith at WSU (http://www.mme.wsu.edu/~ssl).  I have used Lloyd’s lab for testing baseballs in the past.  The main issue is coming up with enough 2008 balls to compare with the new ones in a statistically meaningful way.


#14    Greg Rybarczyk      (see all posts) 2009/04/17 (Fri) @ 15:37

Here’s a good example of how the home runs are landing in an area rich in landmarks, while fly balls come down in “the ocean of green”…

http://www.hittrackeronline.com/parks/citifield.jpg


#15          (see all posts) 2009/04/17 (Fri) @ 15:39

Um, yeah, actually, I’ve never visited http://www.hittrackeronline.com ever.  Until now.  Looks awesome!

Greg, no slight intended.  It’s MLB I don’t trust.  :)


#16    MGL      (see all posts) 2009/04/17 (Fri) @ 16:19

Alan, do you really need that many balls to test to come up with a meaningful result?  Is there that much variation ball to ball when you test them?  Or I should ask, “What is the standard error of the test results (either through testing error or fluctuation in actual COR from ball to ball)?”

You say a 3% increase in COR?  What, BTW, are the tolerances allowed by MLB?  (They were quite large, IIRC.)

How hard can it be to come up with plenty of 2008 balls if you ask the right people?  If nothing else, you can put a “wanted” ad on the internet and pay people 100 bucks for a ball from a game last year.

How much does time (like 6 months to a year) affect the COR of a ball which is sitting around in someone’s drawer at home or in a closet somewhere?

Are you seriously able to get these balls tested? I think that someone needs to start doing that (testing balls every year) - someone independent of course.  This idea that we don’t know and have to speculate on balls being juiced every year is ridiculous.  As is the idea that MLB can (and maybe does) do that at their whim.  In fact, that is even more ridiculous.  Consider the fact that people scream bloody murder when records get affected from so-called PED use (whether they actually do or do not is the subject of another thread).  It would put a whole new perspective on that train of thought if people knew that MLB changed the COR of balls (and hence the scoring and HR rates) at a whim to suit their financial interests!

Alan, if you are serious about testing baseballs and you know the right people to do it, drop me an email and I’ll help make it happen!

And BTW, Tango, showing us the people who hit the HR’s is not about an “anecdote.” It is important for Greg’s HR distances to be normalized by the people who hit them. What is the point of going through the weather normalization process without normalizing for the players who hit them?  I think it is more likely that it is a particularly good group of hitters so far this year that has hit homers than that the weather has been particularly warm, or the homers hit in high altitude parks on windy days, etc.

So my suggestion for Greg is to divide each person’s homer distance by their average homer distance from the last 3 years.  I would think he has that data handy.  If not, I can run that for you.
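The suggested normalization could look something like the sketch below. The data layout, the function name `normalized_distances`, and the placeholder hitters are all assumptions for illustration; hitters with no prior baseline are simply skipped, which is one possible choice among several.

```python
def normalized_distances(homers, baseline):
    """Divide each home run distance by the hitter's prior average distance.

    homers:   list of (hitter, distance_ft) for the current season
    baseline: dict mapping hitter -> average distance over prior seasons
    Hitters without a baseline are skipped.
    """
    ratios = []
    for hitter, distance in homers:
        if hitter in baseline:
            ratios.append(distance / baseline[hitter])
    return ratios

# Made-up numbers, purely illustrative
homers_2009 = [("Hitter A", 410.0), ("Hitter B", 385.0),
               ("Hitter A", 402.0), ("Rookie", 400.0)]
baseline = {"Hitter A": 400.0, "Hitter B": 390.0}

ratios = normalized_distances(homers_2009, baseline)
avg_ratio = sum(ratios) / len(ratios)
print(f"average ratio: {avg_ratio:.4f}")
```

An average ratio above 1 would mean this year's homers are longer than the same hitters' own history, which separates a livelier ball from a stronger-than-usual mix of hitters.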


#17          (see all posts) 2009/04/17 (Fri) @ 16:27

mgl:  I’ll give you a longer response later on (I’m in a meeting right now, so this has to be quick).  Several years ago, I suggested on the SABR-L board that SABR take it upon itself to test a batch of balls each year so that at least there be some unbiased record for each year.  It would be very easy to do this.  Unfortunately, I never pushed the idea hard enough, so it did not happen.  I have more time nowadays, so I’ll give some thought to taking this on as a project.  More later…


#18    Greg Rybarczyk      (see all posts) 2009/04/17 (Fri) @ 16:32

I email all the time with Zack Hample, aka the Baseball Collector, who has huge numbers of baseballs (well over 3,000) going back many years, and he can tell the provenance of every one. 

http://snaggingbaseballs.mlblogs.com/

On the down side, a lot of these are BP balls and may therefore be unsuitable, but I could certainly ask him if he has any more or less pristine balls to loan.

However, I think a quick trip to Sports Authority might work, too.  Back when they switched the NBA game ball to that one that feels like needles in your hands, I was able to find an old one pretty easily…


#19    Tangotiger      (see all posts) 2009/04/17 (Fri) @ 17:00

Alan, you will probably find a good and willing partner in MGL.

MGL: I meant strictly in terms of showing the top 10 list, and then making any kind of comment based on that, as he did.  That was my sole objection.


#20          (see all posts) 2009/04/17 (Fri) @ 17:08

Greg...one has to be very careful testing old baseballs, since they tend to deteriorate with age (i.e., the COR is reduced).  That leads to a potential bias that has new balls looking hotter than old ones.  I did an experiment on untouched baseballs from the late 1970’s (in 2004).  We were very careful to store the balls in a climate-controlled container for about 3 weeks prior to the measurements.  I’ll dig up a plot showing the results and post it when I get a chance.


#21    MGL      (see all posts) 2009/04/17 (Fri) @ 19:57

Alan, send me an email if you want.  I’ll do whatever it takes to get this done. It pi**es me off to no end to think that MLB can and does change the COR to suit their whims and their financial interests.  I don’t know that they do this, but if they do, it is extremely dishonorable, as they are the ones that are constantly talking about a level playing field, the sanctity of the game, etc.

And if they are not actively telling the manufacturers to change the ball, but they know about a change and don’t at least make that information public, that is almost as bad.

It is ridiculous that they don’t test the balls themselves using a certified, independent lab, and then make those results public.  Absolutely ridiculous.  I would LOVE to do that every year. We don’t need SABR for that. There are thousands of balls available of course, from foul balls that fans get.


#22    Greg Rybarczyk      (see all posts) 2009/04/17 (Fri) @ 22:23

I have to say that I don’t believe that MLB would tweak the ball intentionally - quite aside from the integrity argument, they have too much to lose if anyone ever found out, compared to what they might get out of it.

Besides, having worked for two huge corporations on quality issues, including lots of work with manufacturing, I can attest that it can be difficult to maintain production precisely constant long-term, particularly when lots of materials are involved.

So, MGL, I agree that IF MLB did tweak the ball for some purpose, that would be a major integrity violation, but I suspect that the process in the Costa Rican plant has just drifted.


#23    SirKodiak      (see all posts) 2009/04/18 (Sat) @ 01:53

I have worked with major US corporations in the automotive, petroleum, and aeronautical fields on quality issues, and found quality problems not just in the manufacturing process. I have seen improper/incomplete training of operators, management producing a working atmosphere where operators are afraid to record any number not within tolerances, and outright fraud. 

I have seen plants where the only time that they meet industry and contractual standards are the day of announced audits, which they spent over a week to clean up.  It is not hard for me to believe that plants in Costa Rica could encounter the same problems, on top of normal deterioration of the process which the quality software should be able to detect prior to failure in most cases. 

I have also worked in petroleum production, and I still find it surprising how poor QA, QC, efficiency, and overall plant management can be.  I have seen managers that hold the position because of nepotism, political ties, or they were gassed by hydrogen sulfide and the company wanted to avoid a lawsuit. 

There are plenty of reasons for why the COR could change, but it would certainly be good for analysis to know when it does.  It should also give MLB cause to tighten the oversight of the production if it was not purposeful.  Kudos to those who have the means and will to make it happen.


#24    MGL      (see all posts) 2009/04/18 (Sat) @ 03:58

Two issues here:  One, if the ball is different this year or in any other year, I agree that the most likely explanation is that the manufacturing process fluctuates and that MLB is not actively instructing them to change that process in order to increase (or decrease) the COR of the ball.

On the other hand, you guys that are talking about flawed manufacturing processes and what have you are on the wrong track.  According to Alan, the tolerance that MLB provides to the manufacturer is 10%, which is a ridiculous number.  They might as well tell them, “Just make sure that the balls are somewhere between a super ball and a shot put!”

So if the balls are coming out of the plant plus or minus 2% in their COR, they are still well within the tolerance dictated by MLB.

Now, that being said, I don’t know where Alan got that info from, but I also find it a little hard to believe that that is what MLB tells the manufacturers.  They might as well just tell them nothing, if that is the case.

The important question of course (other than is the COR this year the same or almost the same as last year and do/can they change significantly from year to year) is given the manufacturing process, how easy or difficult is it to control the COR.  They seem to have a pretty easy time doing that with golf balls that cost a buck a piece. 

I mean make a few balls, test them, and if it ain’t right, fix it.  If MLB is truly telling them not to worry about plus or minus a few percent (which is apparently enormous in terms of home run distance), then the only question is how likely is it for the balls to come out a little different each year.  If it is pretty likely and it is a random thing, then we are bound to see some plus or minus 2 or 3% from time to time.

If that is happening, should baseball be allowing that?  I think also that MLB tests the balls every year. Shouldn’t they be testing them before the season starts and if they are not within like .5% or even less, they should be telling them it is not good enough?  Again, is it possible that MLB does test them before the season starts and if they come out a little high they turn their heads?

Could it have come out high (by chance) this year and MLB thought, “Hey, maybe that will increase attendance or make up for no PED’s?”

I don’t know.  No matter what if the balls are in fact a little different each year and sometimes a lot different, MLB is doing something wrong.  There simply is no excuse for that, considering the importance they attach to all these “sacred” records…


#25          (see all posts) 2009/04/18 (Sat) @ 07:34

I tried to post this last night (before mgl #24) but it didn’t show for some reason.  I then sent it as an e-mail to mgl.  Here it is:

Thanks to mgl and greg for their offers of help.  Some comments:

1.  Obtaining balls to test from the current year is easy.  Obtaining *unused* balls from past years is not so easy.  Perhaps one can still obtain 2007 balls.  If anyone out there knows where to obtain such balls, let me know.

2.  I am pretty sure that the UMass/Lowell facility run by Jim Sherwood tests baseballs yearly for MLB.  However, the information is certainly not for public consumption.  So, it does not help in the present discussion.  I suspect his clear conflict of interest would preclude his lab from doing independent measurements.

3.  I have a very good working relationship with Lloyd Smith, having collaborated with him on a variety of different research projects.  I have asked him whether he would be interested in working with me on such a project.

4.  Lloyd and I have already collaborated on a similar study done in 2004 comparing 2004 baseballs with late-1970’s baseballs.  The latter were unused balls in their original boxes provided to me by a niece of Charlie Finley, former owner of the A’s.  The results of our study can be seen at this link: http://webusers.npl.illinois.edu/~a-nathan/pob/COR120.JPG

Some explanation is required.  All measurements on the plot were done at 120 mph and consist of measuring the coefficient of restitution (COR). The six red points are the older, the six blue points the newer balls. Each point represents an average over six impacts.  The diamonds are the averages over the six balls in each group.

These data provide no evidence that this particular sample of 2004 balls outperform the older ones.  I indicate on the plot that a change of 0.01 (or about 2%) in the COR would lead to about a 7 ft change in home run distance.  (Note:  this is probably a more accurate number than the one I posted earlier today.)  The allowable range of COR specified by MLB is about 10%, leading to a 35 ft spread in fly ball distances.  That is a huge range.
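The 35 ft figure follows from the two numbers given in this paragraph. A short sketch of the arithmetic, assuming the sensitivity is linear over the whole range and that "0.01 (or about 2%)" implies a reference COR of about 0.50:

```python
ft_per_unit_cor = 7.0 / 0.01   # about 7 ft of distance per 0.01 of COR
reference_cor = 0.01 / 0.02    # "0.01 is about 2%" implies a COR near 0.50

allowed_fraction = 0.10        # allowable COR range of about 10%
allowed_cor_range = allowed_fraction * reference_cor

# Spread in fly ball distance across the allowed range (linear assumption)
spread_ft = allowed_cor_range * ft_per_unit_cor
print(f"allowed COR range: {allowed_cor_range:.3f}")
print(f"distance spread:   {spread_ft:.0f} ft")
```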

Note that the 2004 baseballs show considerably less variation than the older balls, perhaps representing an improvement in quality control.


#26          (see all posts) 2009/04/18 (Sat) @ 08:08

More on the MLB spec:

The COR of MLB baseballs is measured by firing a ball at a flat piece of ash mounted on a block of concrete at a speed of 85 ft/s (about 58 mph).  The COR is the ratio of rebound speed to initial speed and has to be in the range 0.514 to 0.578.  That rule has been in existence for a long time.
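As a quick check on the numbers in this specification, the sketch below converts the test speed to mph, expresses the allowed window as a percentage of its midpoint, and tests a rebound speed against the window. The helper names and the sample rebound speeds are hypothetical.

```python
FT_PER_MILE = 5280
SEC_PER_HOUR = 3600

COR_MIN, COR_MAX = 0.514, 0.578   # allowed range quoted in the comment
TEST_SPEED_FPS = 85.0             # incoming speed used in the test

def fps_to_mph(fps):
    """Convert feet per second to miles per hour."""
    return fps * SEC_PER_HOUR / FT_PER_MILE

def cor(rebound_fps, incoming_fps=TEST_SPEED_FPS):
    """Coefficient of restitution: rebound speed over incoming speed."""
    return rebound_fps / incoming_fps

def within_spec(rebound_fps, incoming_fps=TEST_SPEED_FPS):
    """True if the measured COR falls inside the allowed window."""
    return COR_MIN <= cor(rebound_fps, incoming_fps) <= COR_MAX

test_speed_mph = fps_to_mph(TEST_SPEED_FPS)
width_pct = 100 * (COR_MAX - COR_MIN) / ((COR_MAX + COR_MIN) / 2)

print(f"test speed: {test_speed_mph:.1f} mph")
print(f"window width: {width_pct:.1f}% of midpoint")
print(within_spec(46.0), within_spec(50.0))
```

Measured against the midpoint of the window, the allowed range comes out a little wider than the "about 10%" quoted in the earlier comment.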

Several things to criticize about this specification.  First, 58 mph is much too low.  The COR is known to decrease as the speed increases (in the link I posted in #25, you see that the COR had dropped to about 0.47 at 120 mph).  And the manner in which the COR drops with speed need not be the same for every ball.  So, balls that are identical at 60 mph may not be identical at 150 mph, a speed more relevant for the game.  The 58 mph speed was decided long ago when that may have been the only technology available.  Nowadays, one can go much higher, to at least 135 mph (which would be the ideal speed for a ball hitting a rigid wall rather than a bat that is free to recoil).  Second, as mgl has noted, the very large spread in values is really ridiculous.  I really don’t know why it is as large as that.  I know that balls can be made to a much smaller tolerance than that, but I don’t know how small a tolerance can be achieved.

In particular, if indeed we are seeing a 2% increase in the COR of balls this year, can that be explained as falling within the range of manufacturing tolerances?  I don’t know the answer, but I would bet the answer is “yes”.  Moreover, I would bet that at the level of a few percent, one can’t characterize the baseballs used in any given year.  Who’s to say that balls used in April will have the same COR as balls used in August?  To be independent “COR police” would require constant vigilance.


#27    Tangotiger      (see all posts) 2009/04/18 (Sat) @ 08:09

MGL:

This will bother you like crazy:

http://www.wired.com/wired/archive/15.05/posts_baseball.html


#28    Peter Jensen      (see all posts) 2009/04/18 (Sat) @ 11:01

Alan - I believe that I have read somewhere that individual teams use unused balls from previous years as practice balls for the current year.  If that is correct there should be some out there that might be had by a request from the right person.


#29    Ron Stevens      (see all posts) 2009/04/18 (Sat) @ 16:49

The BB rate in the NL so far this year is 3.89 per 9 innings.  Last year the average was 3.43 per 9 innings.  If the pace continues, it would be the highest rate on record.  In the AL so far, it is 3.76 per 9 innings, compared with last year’s 3.29 per 9 innings.  Perhaps it is an indication of fatter pitches being served up, due to more baserunners (if the hits added, minus HRs, would also support the BB increase).


#30    Tangotiger      (see all posts) 2009/04/18 (Sat) @ 17:25

Ron, can you give those numbers per PA?  Giving them per 27 outs doesn’t mean as much.


#31    Dave Allen      (see all posts) 2009/04/18 (Sat) @ 18:00

The BB/PA rate in the AL so far this year is 9.6% compared with last year’s rate of 8.5% in the AL.  But last year it was 9.2% in April and 8.1% in May.  Since it probably drops steadily from the beginning of April until May (rather than suddenly jumping down when the calendar flips), it is safe to assume that during the first half of last April it was above the April average of 9.2%, and probably right about the current average.  The NL numbers probably show the same trend.

This is going off of the baseball-reference month splits, but with a little work one could get the walk rate for the first half of last April.


#32    MGL      (see all posts) 2009/04/19 (Sun) @ 01:16

The bottom line here is, “When the ball strikes the bat, is it coming off harder than in the recent past, and if yes, is that a random fluctuation and the balls and bats (and pretty much everything else) are the same as last year, or has something changed in the construction of the baseball to ‘cause’ this?”

The best indicators or proxies of batted ball speed are flyball distance, line drive and ground ball hit percentage, and DP percentage.

I will look at fly ball distances in a few minutes.  Maybe someone else can look at GDP rates and line drive and ground ball hit rates…


#33    Greg Rybarczyk      (see all posts) 2009/04/19 (Sun) @ 01:39

This was mentioned before, but I wanted to elaborate on the fact that the parks in the sample will affect the home run distances.  What we saw today in the Bronx, with 3 or 4 really short homers, will distort the distance numbers I’m trying to interpret.  There are competing effects: the livelier ball (if it is livelier, of course) makes fly balls go farther, so distance goes up, but it also turns some balls that formerly were not homers into homers, which adds data points to the lower tail of the distribution.  So, on days where they play at NYS, the numbers may dip…
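The competing effects described here can be seen in a toy example. The sketch below assumes an evenly spaced set of fly ball distances and a single fence distance, neither of which is realistic; it only shows that adding a fixed boost to every fly ball raises the average home run distance by less than the boost, because new short homers enter the sample.

```python
def mean_homer_distance(fly_balls, fence_ft):
    """Average distance and count of fly balls that clear the fence."""
    homers = [d for d in fly_balls if d >= fence_ft]
    return sum(homers) / len(homers), len(homers)

# Toy data: fly balls every 5 ft from 300 to 445 ft, fence at 380 ft
fly_balls = list(range(300, 450, 5))
fence = 380
boost = 8   # every fly ball carries 8 ft farther with the livelier ball

before, n_before = mean_homer_distance(fly_balls, fence)
after, n_after = mean_homer_distance([d + boost for d in fly_balls], fence)
gain = after - before

print(f"homers: {n_before} -> {n_after}")
print(f"average homer distance gain: {gain:.1f} ft (boost was {boost} ft)")
```

If this effect is present in the real data, the observed gain in average home run distance would understate the true gain in carry.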


#34    dan      (see all posts) 2009/04/19 (Sun) @ 03:53

Greg--

Speaking of the new Yankee Stadium… The listed dimensions in right, left, etc. are the same as the old stadium. But I heard somewhere that the changed curvature of the walls actually made the stadium smaller and more homer-prone. Click my name.


#35    MGL      (see all posts) 2009/04/19 (Sun) @ 04:25

Well, I looked at avg. fly ball and HR distances in April from the STATS databases from 06-09.

Avg. Fly ball distance (coded by STATS as “F”)
06 325.8
07 322.6
08 322.1
09 325.6

Avg. HR distances
06 391.3
07 391.3
08 389.1
09 391.0

Avg. Fly balls ("F") per HR
06 7.35
07 9.24
08 9.35
09 7.90

I don’t see a whole lot here to tell you the truth, other than things seem to be reverting back to the days of 06. 

Actually, now that I think about it, that does not include line drive HR’s.  Let’s do the same thing but include all balls coded by STATS as line drives ("L") or fly balls ("F").  That is ALL air balls other than pop flies ("P"). I don’t think they code any HR’s as pop flies.

Avg. Fly ball distance (coded by STATS as “F” or “L")

06 287.6
07 286.3
08 285.7
09 288.8

Avg. HR distances (All HR - F and L)

06 389.8
07 389.5
08 388.0
09 388.3

Avg. Fly balls and Line drives (F and L) per HR

06 11.26
07 14.02
08 14.21
09 11.49

According to this, HR distances are almost exactly the same as last year and actually less than in 06 and 07.  They are just hitting more HR per air ball than in 08 or 07 and around the same as in 06.  The difference in HR/air ball percentage between 06 and 09 levels and 07 and 08 levels is 1.76%.  That is like 3 SD I think for 3700 air balls in 09 and 7000 in 08.
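The "like 3 SD" estimate can be checked with a quick two-proportion calculation. This is only a sketch: it compares 2009 against 2008 alone (not the pooled 06/09 vs. 07/08 levels), it takes the 3700 and 7000 air-ball counts from the post as rough figures, and the function name is made up for illustration.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z-score for the difference of two rates, using the pooled rate."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Air balls (F and L) per HR, from the STATS table above.
rate_2009 = 1 / 11.49
rate_2008 = 1 / 14.21

# Rough air-ball counts quoted in the post; treat them as approximate.
z = two_proportion_z(rate_2009, 3700, rate_2008, 7000)
print(f"HR per air ball: 2009 {rate_2009:.2%}, 2008 {rate_2008:.2%}, z = {z:.1f}")
```

With these inputs the z-score comes out a little above 3, in line with the estimate in the post.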

Greg, I don’t know why such a discrepancy between your average HR distances and STATS. I realize that they are crudely estimating the distances of HR (and all fly balls for that matter), but that shouldn’t make much difference when comparing one year to another.

Don’t like or trust STATS? Here is the same stuff with BIS data:

Avg. Air ball distance (coded by BIS as “F”, “L”, “FLL”, “FLF”, or “P")

So these include pop flies, which won’t affect the HR data but will affect the other data.

06 260.4
07 259.3
08 257.6
09 254.0

Avg. HR distances

06 385.8
07 388.1
08 386.8
09 383.8

Avg. Air balls per HR

06 13.45
07 17.03
08 17.05
09 13.68

Actually, the BIS data indicates that HR distance and air ball distance are down this year by 3 feet from last year, and BIS also has a lower average HR distance in 06 than in 07 and 08.

Actually, let me re-run the STATS data including all air balls (pop flies included so that it should be exactly the same batted ball set as BIS):

Avg. Air ball distance

06 263.5
07 261.1
08 262.1
09 264.4

Avg. HR distances

06 389.8
07 389.5
08 388.0
09 388.8

Avg. Air balls per HR

06 13.41
07 16.86
08 16.95
09 13.77

This data is similar to the BIS data, all the way around, although it is a little scary that they can be off, using the exact same batted ball set, as in the average HR distance for 2009.  It is 388.8 for STATS and 383.8 for BIS.

Also keep in mind that I compared April so far this year (through April 17) to all of April in previous years.  It should be warmer in the second half of the month, so these 2009 numbers may be higher than the comparable prior year numbers.  In fact, let me redo 06-08 only including games between April 5 and April 17:

STATS

Avg. Air ball distance

06 264.1
07 260.4
08 263.6
09 264.4

Avg. HR distance

06 391.3
07 388.1
08 388.9
09 388.8

Avg. Air balls per HR

06 12.74
07 18.18
08 16.58
09 13.77

Absolutely no evidence of anything but a higher HR rate per air ball!

BIS

Avg. Air ball distance

06 261.8
07 259.3
08 257.1
09 254.0

Avg. HR distances

06 386.6
07 386.9
08 386.5
09 383.8

Avg. Air balls per HR

06 12.69
07 18.25
08 16.67
09 13.68

Pretty much the same thing as STATS, but again they have HR and air ball distances DOWN this year.

So, in summary, I think the data we are getting from STATS and BIS is crap.  But there is also no evidence that HR or fly balls are being hit any further than in prior years, again, at least according to this crappy data.

Yet Greg has HR distances a full 8 feet further so far this year. I can’t believe that all of a sudden STATS and BIS would be shorting HR distances in 2009 by 8-12 feet!  So I basically have no idea what is going on…


#36    Greg Rybarczyk      (see all posts) 2009/04/19 (Sun) @ 11:04

MGL, a couple comments.  First, compared to 2006, true distance (how far the balls actually flew) is not up all that much.  Standard distance (the weather-neutralized number) is up a lot more.  This is because 2006 was an exceptionally favorable year for weather, and April, 2006 was no exception to that.  The weather this year is (so far) nothing special, which means that balls are flying farther without the assistance of weather than they did in 2006 with a lot of weather help.  I’ll pull some numbers together later.

Also, I’d say be double cautious of in-park fly ball distances if the companies compiling it think (for example) that the dimensions of NYS are the same as the old park.  If they think that, they’ll plot a warning track fly ball to RF at NYS as 5-9 feet longer than it really is, which could throw off the numbers considerably.  And recall also from our project work that they log where the catch is made, without adding any distance to account for the height of the catch…

In any event, we need to keep watching this, which of course we will!


#37    MGL      (see all posts) 2009/04/19 (Sun) @ 16:43

And recall also from our project work that they log where the catch is made, without adding any distance to account for the height of the catch…

Greg, I realize that and other inaccuracies, but there is NO reason for these bad estimates to be biased in favor of under-valuing the distances this year but not in previous years.

I mean that is a gigantic discrepancy - you have 6 feet longer on all HR’s and they have a foot or two less, if you combine STATS and BIS.  That is more than gigantic.

When you compare non-standardized (actual) HR distances from this year to last year (either April only of last year or the whole year), what do you get?


#38    McCoy      (see all posts) 2009/04/19 (Sun) @ 17:20

http://www.baseball-fever.com/showthread.php?t=31016

Discussion on baseballs through the years with input from Scientist Dennis Hilliard who did the 2000 Rhode Island study on baseballs through the years.


#39    Tangotiger      (see all posts) 2009/04/19 (Sun) @ 19:28

I think the confusion is that Greg is reporting weather-adjusted distances (what he calls “true") and MGL is reporting actual distances (which is what we’d call “sample").

As we know, true does not equal sample.

Perhaps Greg can report both numbers, the “true” and the actual (for 2006 and 2009, April), so that we can see what kind of impact the weather has.

From what I’m reading, it sounds like it’s about 8 feet of impact in the difference in the two weathers.

Could it also be that Greg’s climate-adjustments are wrong?  While I’m sure I trust Greg as much as any other researcher out there, exactly why do we believe that the climate has as much impact as Greg is reporting?


#40    Greg Rybarczyk      (see all posts) 2009/04/19 (Sun) @ 20:03

Tango, actually you’ve misunderstood my terms:

true distance: how far it actually went (or would have gone if permitted to make it back to field level).  Same as MGL’s sample.

standard distance: how far it would have gone if hit on a 70-degree, no wind day at sea level.

Standard distance is the weather-neutralized figure, true distance is what it really did.

As for temperature and altitude, I’m pretty confident that my data is correct, and that Hit Tracker models it correctly.

As for the wind, it’s certainly possible that my wind numbers could be off, but not uniformly wrong in the same direction.  For one thing, I can usually pick up the flags at the park during the home run video, so I’m pretty happy with those ones.  For the ones where I can’t see a flag, I use an internet weather station close to the park.  I’ll readily admit that connecting to weather stations at the park would be much better, but until we can do that I don’t think those differences will always line up in a certain way…


#41    MGL      (see all posts) 2009/04/19 (Sun) @ 23:01

Yeah, even if Greg’s adjustments are off by a little or he doesn’t have perfect wind data, just like there is no reason to think that the BIS and STATS data are all of a sudden biased this year (but not in 06-08), there is no reason to think that Greg’s adjustments or wind inaccuracies are all of a sudden biased in favor of longer distances this year, but not last year.

Greg sent me a file of his actual distances that he estimates from video (Greg can correct me if that is wrong).  These should be exactly the same as the BIS and STATS distances, more or less.  I am going to compare them.

The bottom line is that either Greg is estimating distances too far this year as compared to last year or BIS and STATS are going the opposite.  And, as I said, the differences are gigantic.

I suppose it is possible that the HR hit so far this year have been hit in cold weather, and/or against the wind, and/or in low altitude parks, such that his normalized distances will be higher this year, but the actual distances will be about the same or lower.

Greg, before I look at your data, how do the STATS and BIS actual distances compare with yours for this year and for 07 and 08?


#42    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 00:49

I hate to just sit here and say my numbers are better than theirs, because people who say that sort of thing without backing it up come off as rather obnoxious, IMO, but let me just provide one example as to how their *methods* are flawed, and thus how you can’t really depend on their home run numbers.

On June 22, 2008, Mark Teixeira hit three home runs at Turner Field against Seattle.  I will describe them:

2nd Inning:  Teixeira hits a towering home run down the RF line that hits the RF pole about 57 feet in the air (plus or minus a couple feet on that height, maybe, but that’s a minor point.) True distance from HT: 361 feet.  BIS distance: 329 feet.  The pole is listed at (and is actually) 330 feet from home plate.  BIS is wrong on this one, no other way to put it.  The reason is that they are ignoring the height dimension.

4th Inning: Teixeira rips a long homer to RCF which lands a couple rows up in section 143, at a spot that is 399 feet from home plate and 14 feet above field level.  True distance from HT: 408 feet.  BIS distance: 399 feet.  They are wrong again because again they are ignoring the third dimension, height.  They are less wrong this time because the impact height was less.

7th Inning:  Batting right handed this time, Tex rips a scorching line drive homer to the left field corner, which impacts 354 feet from home plate and 15 feet above field level.  True distance from HT: 373 feet.  BIS distance: 360 feet.  This time their distance doesn’t precisely match the horizontal element of my spot, but that’s not totally unexpected; when a home run is hit that has a lot of fans reaching up for the ball, sometimes my mark catches where a fan first touches the ball, and sometimes my mark catches the ball a split second later when it hits the seats.  Either one will work and give the same result, as both are on the flight path of the ball at slightly different times.  The BIS number matches the seat impact point pretty closely, while my mark was the fans’ hands point.

In any event, the BIS number ignores the height above the field again, and thus is short.  When I average their numbers for Teixeira’s 2008 homers, they are short by an average of 11.5 feet across 33 home runs.

I could repeat this same sort of analysis for other players, or for the same players inside the fence, and I expect I would get the same result: BIS does not account for height, while I do.  That’s the difference (seemingly the main difference)…
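To make the height effect concrete, the three Teixeira examples can be turned into implied descent angles. This is a sketch under a strong assumption: the last stretch of the flight is treated as a straight line, so extra distance = impact height / tan(descent angle). That is not Hit Tracker's model, which is not described in this thread, and the function name is hypothetical.

```python
import math

# (label, impact height above field in ft, horizontal distance to impact in ft,
#  Hit Tracker true distance in ft), all taken from the post above.
homers = [
    ("2nd inning, RF pole", 57, 330, 361),
    ("4th inning, RCF seats", 14, 399, 408),
    ("7th inning, LF corner", 15, 354, 373),
]

def implied_descent_angle_deg(height_ft, horizontal_ft, true_ft):
    """Angle below horizontal implied by a straight-line final descent."""
    extra_ft = true_ft - horizontal_ft  # distance added by falling to field level
    return math.degrees(math.atan2(height_ft, extra_ft))

angles = {label: implied_descent_angle_deg(h, d, t) for label, h, d, t in homers}
for label, angle in angles.items():
    print(f"{label}: about {angle:.0f} degrees")
```

Under that approximation the towering shot and the RCF homer come down steeply (roughly 60 degrees), while the line drive comes in much flatter, so the same impact height adds more distance to a line drive than to a fly ball.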


#43    MGL      (see all posts) 2009/04/20 (Mon) @ 01:31

I wouldn’t call that “wrong” Greg.  STATS and BIS are simply recording all fly balls at a different point in their trajectories than you are.  In fact, their definition of all their distances is the point at which the ball made contact with something (I think).

So basically you are both recording exactly the same thing.  Neither one is necessarily more accurate than the other.  You are simply using technology to extend the trajectory of the ball so that you can record where it “would have” landed had it been allowed to fall to field level, and they are not.

Again, this has nothing to do with the discrepancies in the data, unless they used to record how far it might have gone, and now (2009) they do not, which is NOT the case.  Your data is not more accurate just because you are recording additional data (the remainder of the trajectory).

One or the other of you (or both) is not doing an accurate estimation/recording and in fact is doing a biased estimate/recording.  I should say that is true for this year or for previous years, or for both, but not necessarily for both this year and prior years.

The only way we can reconcile whose data is more accurate is first to compare apples and apples, which is to compare Greg’s data for where the ball impacted something (which he does record) with the STATS and BIS data, and then to compare everyone to another independent observer of the video.

First thing is to see if there are any patterns to their divergence for this year or prior years.  Maybe one or the other group is totally screwing up the data in one or a few parks.  Maybe someone actually recorded the wrong data, like files got mixed up or something.

As I told Greg in an email, “I WILL get to the bottom of this!” Or, maybe not…


#44    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 06:47

I agree with MGL that first you have to compare apples.

***

That said, Greg is right in his process, because we don’t live in a Simpsons world of 2-D.  This is a 3-D world, and to only measure the horizontal, while not also measuring the height means that STATS/BIS is wrong.

Why do we care about the distance travelled?  It’s a proxy for its trajectory.  If you hit a moon shot (going straight up, no downward descent yet, but hits the pole), that is far far different than a HR that just clears the fence on its way down.  So, no one cares about only the horizontal.  It’s misleading to not extrapolate to the projected landing point.

Now, we “can” give STATS/BIS a standard height of say 20 feet (or whatever is the average) to increase their accuracy.  Otherwise, we are presuming a height of 0 feet.


#45    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 07:22

Dan/34 was marked for moderation and is now open.


#46    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 10:59

Dan/#34:

My thoughts on the dimensions of NYS are encapsulated in this post on WasWatching:

http://waswatching.com/2009/04/19/is-the-new-yankee-stadium-a-homer-haven/

As for the diagram at your link, it seems to derive from Clem’s diagram - here are some comments on that:

http://www.baseball-fever.com/showpost.php?p=1498586&postcount=2225


#47          (see all posts) 2009/04/20 (Mon) @ 11:14

Greg:  I still can’t convince myself whether the increase in home run distances (now 6 ft, if I read correctly what you wrote at WasWatching.com, down from 8 ft from a few days ago) is a statistical fluctuation or not.  So, here is something you can do that might be an interesting exercise.

Go back to the 2006--2008 data and determine an average home run distance for every 2-week period during those seasons.  Make a histogram of that average and see how it looks.  Does it follow a normal distribution?  What is the mean (I guess we already know that)?  More importantly, what is the standard deviation?  Given that standard deviation, what is the probability that we would see a number 6-8 ft larger than the mean in any two-week period?


#48    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 11:39

Alan, I sent you an email with some histograms, I had already done what you suggested (a couple of days ago, mind you, so not counting data after Thursday).  The histograms are not normal, but when you force a normal on them, they give this:

Season Mean Std. Dev.
2006 392.7 1.91
2007 392.6 2.43
2008 391.1 2.10

As of today, the avg. std. distance is 395.5 feet, which would be a 1.5 SD event in 2006, a 1.2 SD event in 2007, and a 2.1 SD event in 2008.
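The SD figures follow directly from the table. A minimal sketch, assuming the forced-normal fit is taken at face value (the post notes the histograms are not actually normal, so the tail probabilities are only indicative):

```python
import math

# Two-week average standard distance: (mean, std. dev.) per season, from the table above.
seasons = {2006: (392.7, 1.91), 2007: (392.6, 2.43), 2008: (391.1, 2.10)}
current_avg = 395.5  # 2009 average standard distance as of this post

def z_score(value, mean, sd):
    return (value - mean) / sd

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z_by_season = {yr: z_score(current_avg, m, s) for yr, (m, s) in seasons.items()}
for yr, z in z_by_season.items():
    print(f"{yr}: {z:.1f} SD, one-sided tail about {upper_tail(z):.3f}")
```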

So, the p-values have definitely risen over the past few days.  It looks like a lot of that came from the 20 home runs at NYS in the first 4 games.  The avg. std. distance outside of New Yankee Stadium is 396.6 (i.e. 1.1 feet higher), while the average inside NYS is 373.1 feet.

I think that the lack of games in the Bronx in the first 10 days drove the number up, and now it’s come back down.  That is clear.  I’m planning to watch closely and see what we get over the next couple weeks.

And by the way, I don’t think anyone should be convinced yet of anything - I would describe myself as suspicious and intrigued :)


#49          (see all posts) 2009/04/20 (Mon) @ 11:52

Nice work, Greg.  If I look at the distribution for 2008 (perhaps you should post a link so we can all look at it), a 395.5-ft average would appear to be quite unusual, falling right at the upper end of the distribution.  Interesting!  And looking more statistically convincing.  But, let us watch further, as you suggest.


#50    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 13:02

Again, these are the distances of the average HR, right?  As I said, that’s the wrong way to look at it, since warning track balls, when juiced, will be “just over” HR, and bring down the average.

At the very least, why not report the 300 longest HR for each season.  This way, if you have 310 HR in one year and 360 HR in another year, you lop off the bottom 10 in one and bottom 60 in another.


#51    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 13:11

Tango #50

I definitely plan to do that, but that won’t work until October… I could try looking at the top 10 or 20% of homers now, but that chops the sample size a lot, of course…

Alan #49

I’ll post those histograms later, or maybe Tango could if I mail them to him…


#52    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 13:38

Greg, posted.

Everyone here can go to the top of this thread.  Three histograms are there.

***

Greg: you can do that now, by looking at 100% of the 2006 HR through April 19, and only enough of the 2009 HR so your “n” is equal.


#53          (see all posts) 2009/04/20 (Mon) @ 14:00

Re, Tango’s suggestion in #50:  Just thinking out loud a bit here, but Greg has information about fence location.  A somewhat different (but related) suggestion is that you could eliminate the “just enoughs” from the data sample by demanding that the landing location is xx ft beyond the fence (not sure what xx should be).

The 2007 distribution looks very strange.  Not sure what to make of that.


#54    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 14:25

Alan: yes, I’m not sure that would still be good enough.  Again, you could still get more “just enoughs” that qualify.  The problem is that the lower threshold is always below the mean, so anything that just makes it over whatever that threshold is will add to the sample, and decrease the mean.

When people talk about “long HR”, they don’t mean “average”. 

So, I don’t see the issue in having an equal number of HR in each bucket (over the same monthly time period) and simply figuring the mean of those samples.

This is the same issue with “average fastball”.  What if a pitcher decides to throw a “slow fastball”, such that it has more bite but less speed (but the same number of “regular fastballs").  But this slow fastball is not enough of each to drop it into a sinker or slider category. 

Now, we’re talking about a pitcher who loses speed on his fastballs.  But, it’s purely a trick.

If we agree to only count the 25% fastest pitches he throws as “fastball speed”, then we don’t have this issue.

Basically, our classification system does not model the reality of the question.


#55    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 15:27

Hot off the keyboard, some numbers on the upper tail of the home run distributions over the past few years.

Here we are looking at home runs with standard distance greater than 415 feet (which should get out of almost every park in almost every direction).

Season  # HRs      %      True Avg.  Std. Avg.

2009    105/406    25.9%  428.5      429.2
2008    841/4879   17.2%  430.5      428.0
2007    865/4957   17.5%  430.8      428.5
2006    1034/5386  19.2%  431.9      429.3

The average standard distances look pretty close to each other, as do the true distances (which depend greatly on league-wide weather conditions).  The number that looks out of place is the % of 2009 homers hit with standard distance greater than 415 feet.

I thought about it, and decided to see what the 2009 numbers would look like if I lopped a few feet off of all of them.  This would simulate the idea that this year’s homers have a few feet of “juice” in them.  If I take out that “juice”, do my numbers look reasonable?

So here’s what it looks like for different values of “juice” taken out of the 2009 ball:

Juice # HR % Avg. Std Dist

-1 98/406 24.1% 429.1
-2 93/406 22.9% 428.8
-3 88/406 21.7% 428.6
-4 86/406 21.2% 427.9
-5 81/406 20.0% 427.6
-6 77/406 19.0% 427.2
-7 69/406 17.0% 427.5
-8 64/406 15.8% 427.4
-9 58/406 14.3% 427.6
-10 51/406 12.6% 428.2
-11 48/406 11.8% 427.9
-12 44/406 10.8% 428.0
-13 41/406 10.1% 427.9
-14 40/406 9.9% 427.2
-15 39/406 9.6% 426.5

From this it looks like the best fit would be around 6-7 feet of “juice” on the ball.  This is no proof, of course, but it doesn’t contradict the idea that the ball might be “hot"…
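The "best fit" reading of the table can be sketched as a simple nearest-match search: for each prior season, find the amount of juice whose 2009 share above 415 feet is closest to that season's share. The counts are copied from the tables in this post; the matching rule (smallest absolute difference in share) is an assumption for illustration, not necessarily how the 6-7 feet figure was arrived at.

```python
# 2009 homers still above 415 ft standard distance after removing N feet of "juice".
counts_2009 = {
    0: 105, 1: 98, 2: 93, 3: 88, 4: 86, 5: 81, 6: 77, 7: 69,
    8: 64, 9: 58, 10: 51, 11: 48, 12: 44, 13: 41, 14: 40, 15: 39,
}
total_2009 = 406

# Share of homers above 415 ft in prior full seasons.
prior_share = {2008: 841 / 4879, 2007: 865 / 4957, 2006: 1034 / 5386}

def best_juice(target_share):
    """Feet of juice whose 2009 share is closest to the target share."""
    return min(counts_2009, key=lambda j: abs(counts_2009[j] / total_2009 - target_share))

best = {season: best_juice(share) for season, share in prior_share.items()}
print(best)
```

Matching against 2007 and 2008 picks 7 feet and matching against 2006 picks 6 feet, which agrees with the 6-7 feet read off the table.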


#56    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 15:50

Greg, your data proves my point about the “average”.  You can take 12 feet off every single HR, and you end up with virtually the same average above a fixed threshold!

Whether that fixed threshold is 415 feet, 390 feet, or the outfield fence of MLB is practically irrelevant.

Focus only on the 350 longest HR in each of the last 4 years as of Apr 19, and I’d bet you’d get more instructive numbers.  Indeed, you might end up with exactly the “juice” number you are looking for.


#57          (see all posts) 2009/04/20 (Mon) @ 15:52

Greg:  not to quibble too much, but when you lop off n ft, shouldn’t you change both the numerator and denominator of your ratio?


#58    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 16:09

Tango # 56:

Not sure I understand: the top 350 homers as of April 19th of each year is nearly all of them (for example, in 2007 I only have 380 homers measured through Apr. 19, 2007. In 2009, there have been 406)…

The main reason to position this threshold comfortably above the average HR distance is so you are not missing any fly balls.  If I selected everything over 390 feet, for example, I’d lose an unknown number of home runs, including most home run balls hit to center field.  By putting it at 415 feet, I lose hardly anything, but keep the sample size as big as practical…

Alan #57:

Alan, when I lop off a few feet, what happens is that they all get shorter, and some drop out.  What you see above is the average of the remaining homers still above the 415 foot threshold (after I took 1 foot away from each).

When it says “98/406” for 1 foot of juice, it means that 98 of the homers have a standard distance greater than 415 feet after I take 1 foot away (so, 98/406 are above 416 feet, really).


#59    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 16:11

To clarify, I want to stay above the threshold of non-homer fly balls.  Going any lower than 415 starts to let those creep in, and since my data set hasn’t got them, I prefer to stay clear above that level so as to be confident of having a complete set of fly balls…


#60    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 16:14

Greg:

Pool1:
415 ft
410 ft
400 ft
380
380
380

Pool2:
415
410
405
400
390
385
380
370
370
370
370
370

Take the 6 longest HR in pool1 and the 6 longest HR in pool2.  Compare.

The key point is that the number of HR in both pools are identical (n1=n2).
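The two pools above can be worked through directly. A small sketch (the helper names are made up) comparing the plain average with the average of the 6 longest in each pool:

```python
pool1 = [415, 410, 400, 380, 380, 380]
pool2 = [415, 410, 405, 400, 390, 385, 380, 370, 370, 370, 370, 370]

def mean(values):
    return sum(values) / len(values)

def top_n_mean(values, n):
    """Average of the n longest distances."""
    return mean(sorted(values, reverse=True)[:n])

n = len(pool1)  # equalize the sample size at the smaller pool
print(f"plain mean: pool1 {mean(pool1):.1f}, pool2 {mean(pool2):.1f}")
print(f"top-{n} mean: pool1 {top_n_mean(pool1, n):.1f}, pool2 {top_n_mean(pool2, n):.1f}")
```

The plain mean makes pool2 look shorter because of its extra 370-foot homers, while the top-6 comparison shows pool2's longest homers are actually longer.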


#61    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 16:20

OK, I see what you mean.  Let me check that


#62    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 16:26

Here’s what I get looking at the top 100 longest HR by standard distance from the first 14 days of the 2009 and 2008 seasons.  Looks like the 2009 homers are 4.1 feet longer on average, with a p-value of 0.028 for the null hypothesis that they are really the same (i.e. they most likely aren’t)

Two-Sample T-Test and CI: 2009 top 100, 2008 top 100

Two-sample T for 2009 top 100 vs 2008 top 100

N Mean StDev SE Mean
2009 top 100 100 429.9 11.4 1.1
2008 top 100 100 425.7 14.7 1.5

Difference = mu (2009 top 100) - mu (2008 top 100)
Estimate for difference:  4.12
95% CI for difference:  (0.45, 7.79)
T-Test of difference = 0 (vs not =): T-Value = 2.22 P-Value = 0.028 DF = 186
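The T-value and DF in this output can be reproduced from the summary rows alone, assuming the test is the unequal-variance (Welch) form, which the non-integer-looking DF of 186 suggests. The sketch uses the reported difference of 4.12 rather than the rounded means, and the function name is hypothetical. Getting the p-value would additionally need a t-distribution CDF, which is left out here.

```python
import math

def welch_from_summary(diff, sd1, n1, sd2, n2):
    """Return (standard error, t statistic, Welch-Satterthwaite df)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, diff / se, df

# Top 100 of 2009 vs. top 100 of 2008, from the output above.
se, t_value, df = welch_from_summary(4.12, 11.4, 100, 14.7, 100)
print(f"SE = {se:.2f}, T = {t_value:.2f}, DF = {int(df)}")
```

Small differences in the last digit against the printed output are expected, since the standard deviations here are the rounded values.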


#63    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 16:29

Fantastic, exactly what I’ve been looking for.  If you look at the top 200, is it also 4 feet?  I presume it would be down to 3.5, maybe 3 feet?


#64    Greg Rybarczyk      (see all posts) 2009/04/20 (Mon) @ 16:42

Nope, it went the other way:

Two-Sample T-Test and CI: 2009 top 200, 2008 top 200

Two-sample T for 2009 top 200 vs 2008 top 200

N Mean StDev SE Mean
2009 top 200 200 418.1 14.8 1.0
2008 top 200 200 411.2 18.4 1.3

Difference = mu (2009 top 200) - mu (2008 top 200)
Estimate for difference:  6.94
95% CI for difference:  (3.66, 10.21)
T-Test of difference = 0 (vs not =): T-Value = 4.16 P-Value = 0.000 DF = 380


#65    Tangotiger      (see all posts) 2009/04/20 (Mon) @ 17:14

Well, that’s very interesting isn’t it?

It would seem that the “top-end” HR (100 longest) gain a bit (4.1 feet), but the next 100 longest HR (101 to 200 longest) gain 9.76 feet each.
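The 9.76 figure follows from the fact that the top-200 average is the midpoint of the top-100 average and the 101-200 average, so the same holds for the year-to-year gains. A minimal sketch of that arithmetic (the function name is made up):

```python
def next_block_gain(gain_top_n, gain_top_2n):
    """Gain for ranks n+1..2n, given the gains for the top n and the top 2n."""
    return 2 * gain_top_2n - gain_top_n

# Gains reported in the two T-tests: 4.12 ft (top 100) and 6.94 ft (top 200).
gain_101_to_200 = next_block_gain(4.12, 6.94)
print(f"{gain_101_to_200:.2f} ft")
```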

Is it also possible that the longest HR are in the air for so long that friction and gravity have more of an effect than other HR?  So that there’s a maximum-type of length for a HR?

I’m thinking in golf or frisbee and the kind of “boomerang” kind of effect where you hit a wall if the trajectory is too steep.

Could there be a bias in your standard distances whereby the launch point affects your measurements?

I don’t know about anyone else, but I find all this very fascinating, especially in light of the accu-score thread.


#66    Bjorn      (see all posts) 2009/04/21 (Tue) @ 11:09

First a stupid question, couldn’t an increase (or any change for that matter) just as likely come from the bats as the balls? From what I understand virtually everyone uses bats from the same company so couldn’t they just have got a batch of unusually “good” wood?

Also, if the tolerance for the balls is actually as big as about 10% and given that MLB should have a financial incentive to have higher production I would find it close to negligent if balls were not ALREADY (i.e. for years and years) at the upper end of the allowed tolerance window.


#67    MGL      (see all posts) 2009/04/21 (Tue) @ 16:35

Yes, I suppose that it could be the bats, but I would think that it is much easier to manufacture a ball with more or less COR than a bat.

I would find it close to negligent if balls were not ALREADY (i.e. for years and years) at the upper end of the allowed tolerance window.

I don’t know that that makes much sense.  It depends on when they set the tolerance interval.  Also, the interval is so large that I doubt that it is ever going to be anywhere near one of the tails.  10% is apparently so (ludicrously) large, that a ball that was near the upper end would travel about 600 feet!  And of course it depends on what the middle is.  If the middle is about what it is now - i.e., the standard was set 10 or 20 years ago, then of course you would not expect it to be anything but near the middle at all times, plus or minus 2%.

And even IF (and it is a big if) it is financially advantageous for baseball to encourage more run scoring, there has to be a practical limit.

But, it is certainly not beyond the realm of possibility that in a panic this year because of the economy that they asked the manufacturer to “tweak” the baseball just a tad, if that is even possible.  As far as “fear of getting caught” as someone suggested would be a deterrent to that, they could easily have a reasonable explanation which would be that run scoring and HR rates have been down recently and that they were just trying to get them back to historically normal levels.  They could even say that the COR of the balls has been dropping the last few years and that they were trying to have more quality control in the manufacturing process.  Or they could just do what almost everyone does when caught - which is to just deny it, even in the face of strong evidence, and eventually it goes away.  Fear of getting caught, when there is no punishment and you can easily and always deny whatever it is you are accused of, is NOT a big deterrent…


#68    Tangotiger      (see all posts) 2009/04/21 (Tue) @ 16:39

In the NHL, they go out of their way to say they want more scoring, and try to tweak their rules to that effect.  And they have one heckavu time figuring out how to do this without increasing the size of the net.

It’s so funny that in baseball you can make just slight modifications and have such profound effects, but the “purity” of the game is so prevalent that they’d have to commission an inquiry if it is determined that run scoring is up 10% based on balls/bats.


#69          (see all posts) 2009/04/21 (Tue) @ 18:35

Some comments on some recent posts:

Bjorn #66 asks: 

Couldn’t an increase (or any change for that matter) just as likely come from the bats as the balls? From what I understand virtually everyone uses bats from the same company so couldn’t they just have got a batch of unusually “good” wood?

Quick answer:  NO.  To lowest order, all wood bats perform the same. 

MGL #67 responds: 

10% is apparently so (ludicrously) large, that a ball that was near the upper end would travel about 600 feet!

Most baseballs in use today are at the upper end of that range.  In fact, the UML study done in 2000 showed exactly that.  It is most certainly not true that balls at the upper end would travel 600 ft.  I realize MGL probably did not want us to take that number completely seriously, but the fact remains that there is very little headroom between “normal” baseballs and the upper end of the range.  On the other hand, a large change is not needed to have an observable effect.  In one of my earlier posts, I noted that a 2% change would pretty much account for the extra distance that Greg was seeing.


#70    SirKodiak      (see all posts) 2009/04/21 (Tue) @ 19:04

Tom,

If the NHL wants more scoring, why don’t they change the puck?  Wikipedia says

the use of the Firepuck was discontinued because the slight structural change increased the tendency for the puck to bounce on the ice. This made it more difficult for the goaltender and resulted in increased scoring.

I have no idea if it is true, but it seems likely small changes in the puck (weight, COR, balance, edge texture, etc) could increase scoring.  Do they not do it for safety reasons? Integrity of the game reasons? Or have they tried it and failed?  I ask because I do not know.


#71    Guy      (see all posts) 2009/04/21 (Tue) @ 20:05

Alan:  Can you think of anything about the distribution of HR lengths that would logically produce the results Greg reports in #64 (greater increase for HRs 101-200 than 1-100)?


#72    MGL      (see all posts) 2009/04/22 (Wed) @ 03:13


#73    MGL      (see all posts) 2009/04/22 (Wed) @ 05:26


#74    joe arthur      (see all posts) 2009/04/22 (Wed) @ 06:49

The hittracker spot for McCann’s home run (#1 on MGLs list; 4/5/09) is not quite right. It is touched by a fan in the aisle between the 2nd and 3rd sections (counting from the foulpole) of the second deck, perhaps 4 rows back. Hittracker appears to be spotting it a section further to the left, in the aisle between the 3rd and 4th sections. That overstates the impact distance by about 10 feet.


#75    Tangotiger      (see all posts) 2009/04/22 (Wed) @ 10:15

MGL: one good reason to expect that the HR distances are longer is because there are more HR being hit this year.

This is why I asked Greg to list his top 200 longest HR, and that they come out as 7 or so feet longer than the longest 200 from the past.  Considering that there are more HR hit this year, one would hope it’s because all flyballs are flying farther.


#76    Paul Scott      (see all posts) 2009/04/22 (Wed) @ 12:41

Isn’t there another, in some ways more significant, concern here, one completely unrelated to HRs?  The basic methodology - a pool of trained individuals looking at video to determine where a ball is hit - is identical to everything that goes into +/- (and, iirc, is the same data relied upon by MGL for UZR).  If BIS is doing a terrible job on HR distance, why should we trust the same group doing essentially the same (but, I would think, difficult) job for the non-HRs that form the basis of all advanced defensive metrics?


#77    Tangotiger      (see all posts) 2009/04/22 (Wed) @ 12:55

Paul, I don’t know that it’s that big a deal.

For example, in my WOWY system, I use no subjective data (just the identities of the players and parks).  And I get results that are consistent from the play-by-play (PBP) metrics.

The key point is that as long as the errors are random, and not biased, then sample size is our friend.  We have not shown if the errors in the PBP recordings are biased or not.  They may be.

In the case of the HR, there is a data recording issue of some sort that sample size simply doesn’t make go away.
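
A small simulation of the random-versus-biased point, as a sketch only. The 20 ft of scatter and the 10 ft bias are made-up numbers for illustration, not estimates of any stringer's accuracy, and `mean_recording_error` is a hypothetical helper.

```python
import random
import statistics

def mean_recording_error(n, noise_sd, bias, seed=0):
    """Average error over n recorded locations, each off by bias + random noise."""
    rng = random.Random(seed)
    return statistics.fmean(bias + rng.gauss(0, noise_sd) for _ in range(n))

# Hypothetical figures: 20 ft of random scatter per recording, 10 ft of bias.
for n in (10, 1000, 100000):
    unbiased = mean_recording_error(n, noise_sd=20, bias=0)
    biased = mean_recording_error(n, noise_sd=20, bias=10)
    print(f"n={n:>6}  random only: {unbiased:+6.2f} ft   with bias: {biased:+6.2f} ft")
```

With random errors only, the average error shrinks toward zero as n grows. With a bias, it settles near the bias no matter how large the sample gets.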


#78    Tangotiger      (see all posts) 2009/04/22 (Wed) @ 13:10

Paul’s issue is real.  I didn’t mean to imply less of it.  Results from Andruw and Ichiro show that.  And Peter has shown it as well in his comparisons.

But, as I said, given a large enough sample size, I really don’t need to know where the ball went, because it’s almost good enough to just know who the batter and pitcher is in place of actual location of hit ball.


#79    Hizouse      (see all posts) 2009/04/22 (Wed) @ 14:02

One possible discrepancy between MGL in #73 and Greg is that MGL was apparently using Clem’s ballpark diagrams, whereas Greg has his own ballpark diagrams.  I am betting Greg’s are more accurate.  If memory serves, he derived the diagrams in part from satellite pictures where he uses the known 90 ft between the bases to measure everything else.

Here’s a link to one of Greg’s diagrams (note you can view it with or without range rings):
http://www.hittrackeronline.com/detail.php?id=2009_106&type=ballpark


#80    MGL      (see all posts) 2009/04/22 (Wed) @ 14:30

Tango is right about the fact that you don’t really need much accuracy in the long run for these metrics.  I’ll go one step further and say that even a little bias won’t make much difference if it is a systematic, across the board bias.  The problems arise when you have biases like scorers in one park doing things differently than scorers in another park.

I don’t think there are many significant differences between Clem’s diagrams and Greg’s, to tell you the truth.  Plus, the satellite views are not the end-all because of the angles of the views.  In any case, I was not very meticulous in my estimations, so there is no doubt that I am off.  I should have used Greg’s diagrams when I looked at the video. I didn’t know he had diagrams of all the parks on the web site.

I am guessing that STATS and BIS non-HR estimates are going to be much more accurate than the HR ones. It is difficult to track the HR’s and I am guessing that some of the scorers just take a wild guess.  Sometimes you are not even sure where the ball lands.  You often have to look real carefully at the crowd to see their reaction.


#81    Peter Jensen      (see all posts) 2009/04/22 (Wed) @ 15:38

I am guessing that STATS and BIS non-HR estimates are going to be much more accurate than the HR ones.

MGL - As Greg has pointed out, it is likely that STATS and BIS are going to be less accurate with their on field measurements than with their HR measurements because there are fewer visual cues that one can use to locate where the ball is in the outfield, especially when trying to record from video.  I also believe this is true from my own limited experience trying to record hit ball locations from video.  The camera is often switched to a closeup on the fielder while he is still running to make the catch and it is near impossible to figure out what his final position is.  Also the original wide shot can be from any one of several cameras and if you don’t pay very strict attention to which camera is being used the perspective on any background cues can cause confusion as to position.  And the amount of telephoto magnification can cause the fielder to appear to be closer to the fence than he actually is.

Scoring at the game avoids these problems but introduces others.  There is no instant replay or slow motion so when you are trying to determine where the ball hits the ground on a hit you have to go with your first guess.


#82    KJOK      (see all posts) 2009/04/22 (Wed) @ 15:49

One potential bias between years in the BIS data is that the BIS stringer personnel change, at least a little, and perhaps a lot, from year to year, while the stringer for Hit Tracker (Greg) remains the same from year to year.


#83    Tangotiger      (see all posts) 2009/04/22 (Wed) @ 16:08

I think by far the most important reason is that for Greg, he needs accuracy.  It’s his bread-and-butter, and his site lives and dies on it.  Couple that with the fact that he has a singular focus on the HR, and he’s his own boss, and I would put Greg up against all 100 stringers of STATS and BIS.

Furthermore, if Greg is ever wrong, he’ll be there to take the hit.  For the other outfits, it’s simply not that important for them to get this right.  It’s “important”, but not IMPORTANT.


#84    Peter Jensen      (see all posts) 2009/04/22 (Wed) @ 16:40

It’s his bread-and-butter, and his site lives and dies on it.

Well, yes and no.  Greg’s site depends on his reputation for accuracy, but he doesn’t.  He has a full time job to fall back on.  BIS has contracts to supply data to the teams.  If their reputation for accuracy comes into question it’s their livelihood that is being threatened.  And I don’t think they are as diversified as STATS so it might be the end of the company.  BIS’s analysis is nothing that the teams can’t either get just as good on line for free or from in house sources.

If Sportvision (or STATS) finally perfects its plans to electronically track the ball all the way from the pitcher to the glove of the fielder or its landing place in the stands, then I can’t see what either HitTracker or BIS are going to be able to market and it might mean their end anyway.


#85    MGL      (see all posts) 2009/04/22 (Wed) @ 16:40

Peter, yeah you’re probably right about the HR versus non-HR thing except for one thing, which I think is an important consideration.  I think that the BIS and STATS stringers know that distances for non-HR are important, and I think they think or know that for HR’s no one really cares (except for a few obsessed researchers and analysts).  So when it comes to the HR’s I really think they are just guessing some percentage of the time, or worse than that.  For non-HR BIP, I think they take their time to try and do a good job.

I agree wholeheartedly with # 83 above…


#86    dan      (see all posts) 2009/04/22 (Wed) @ 16:47

If you look at post 72 and 73, MGL messed something up with homerun #8. Post 72 has two home runs listed as HR#7, the second one should be changed to 8, 8 should be changed to 9, etc. I got confused, so I’m just clearing that up for anybody else.

Does anybody know if BIS or STATS use anything more than a guesstimate? (i.e. they look at where it landed and approximate)


#87    Tangotiger      (see all posts) 2009/04/22 (Wed) @ 16:50

Peter: Greg can easily shift from being a data provider to a data analyst, as his work in THT has shown.  Furthermore, nothing can match his inspiration.

Indeed, your line of thinking would spell the doom for b-r.com, wouldn’t it?  Everything on Sean’s site is reproducible.  b-r.com thrives in spite of what MLB.com and ESPN can throw at it.


#88    Brian Cartwright      (see all posts) 2009/04/22 (Wed) @ 20:15

Greg - do you have a chart of Nationals Park? When I followed the link to the Ballpark section of Hit Tracker, Nats Park linked to PNC. Further, I know that numbers for that park were late in coming in 2008. I also know that Nats Park still isn’t on Google Earth.

However, I work for an aerial mapping company, and we recently mapped that section of D.C. for a proposed sewer line, and have excellent low level (from 2000 ft) photos. I made a Microstation .dgn file of the park, and found that they have some errors in the “official” fence distances, the worst in cf, where published is 403, but actual is 409.7

I’d be glad to send you my drawing.


#89    Greg Rybarczyk      (see all posts) 2009/04/22 (Wed) @ 20:34

http://www.hittrackeronline.com/parks/nationalspark.jpg


#90    MGL      (see all posts) 2009/04/22 (Wed) @ 20:54

It blows me away that the signs on outfield walls are not 100% correct.  Especially with laser technology these days.  You can buy a $200 golf laser range finder that is accurate to within a fraction of a foot.  How can they possibly put a wrong number on an outfield wall?  If nothing else, the team can’t run a string from home plate to the wall?


#91    Greg Rybarczyk      (see all posts) 2009/04/22 (Wed) @ 21:29

I have CF in Nats Park at 409 in my diagram


#92    joe arthur      (see all posts) 2009/04/23 (Thu) @ 00:01

Brian #88 - Nationals Park is visible on Google Earth; its measurement tool agrees that the distance is 409 feet to the deepest point, just slightly left of straightaway center. It looks like 404 feet to dead center ...


#93    Brian Cartwright      (see all posts) 2009/04/23 (Thu) @ 00:25

Last time I checked Google Earth it wasn’t there, so that’s a very new addition.

I did measure the distances to the corners in the outfield fence.


#94          (see all posts) 2009/04/23 (Thu) @ 05:03

I was just looking into this subject of juiced balls on a project I was working on, and wanted to share some information on this.  If this is common knowledge, just clip it.

In 2000, MLB made public a report from UMass Lowell on the balls used in 1999-2000.  It should still be available online.  From my saved copy it says:

“The weight and COR tolerances provide maximum distance differences of 8.7 and 40.4 feet, respectively.  This means that theoretically, two baseballs could meet the specifications but one ball could be hit 49.1 feet further than the other could be hit”

That’s about a +/- 5% tolerance.  Also, the variation between balls tested was nowhere near the 5% tolerance allowed; it was about +/- 0.5%.  The 2000 balls ranged from 396 to 400 ft.

I suspect the balls in 2007 and 2008 were less juiced, to give the illusion that the testing program was working.  Today’s balls are more like we saw in 2006 and earlier.  If they do any testing and confirm the balls are the same as 2000, that will mean nothing since they have most people convinced it’s all about the steroids, and folks will just assume players are finding loopholes in the testing program.

Of course, we have no idea if they will use the same batch of balls all year, they might tone the balls down in the summer for example. 

I am a little puzzled by Alan Nathan’s test showing little difference between balls in the 1970’s and 2004.  There was a study by the University of Rhode Island that suggested otherwise.

http://www.uri.edu/news/releases/html/02-0611.html

“Platek and Gregory also conducted another series of bounce tests on the pills. They found that the average height of the bounces from the 1995 and 2000 pills was 83 and 82 inches respectively, when dropped from a height of 182 inches. Each of the pills from 1963, 1970 and 1989 bounced no higher than an average of 62 inches. For each of the pills, the scientists conducted about 20 drops.”

I understand it’s not the same test, but they also found other differences in the balls, and I am convinced the balls account for most of the HR gains in the juiced era.

Of course, there is the off chance MLB is not aware of what’s happening, since I have known manufacturers who had special production for testing purposes if they knew a particular lot was going to be subjected to testing. MLB’s own tests may show no change if their sampling is not random and the manufacturer knows which lots will get tested.  But I have a hard time figuring out Rawlings’ motive to cheat, while MLB has a clear motive, and I consider the probability they are not aware of the ball quality to be very low.

Slightly OT, but it’s also interesting that the minor league balls according to the report are deader than MLB balls, about 8.5 ft less than an MLB ball traveling 400 ft.  These balls are made in China. Not sure if those who project minor league players are aware of this.  I have never heard it mentioned outside the MLB report.


#95    Mike      (see all posts) 2009/04/23 (Thu) @ 07:57

Even more home runs hit yesterday at Yankee Stadium and it looks like Cashman has been following Greg’s work:
http://sports.espn.go.com/mlb/news/story?id=4090704

“But Cashman also said home runs are traveling about eight feet farther so far this year compared to last season.”


#96          (see all posts) 2009/04/23 (Thu) @ 09:44

Re Todd #94 and the tests I did on the 1970’s vs. 2004 baseballs:

1.  The sample size was small--very small--so one should not draw general conclusions.  I did the experiment mostly for amusement and certainly not to make some general statement.  And the only reason why I publicize the results is to demonstrate how to address the issue of juiced balls in a scientific way.  It was not meant to be a definitive study.

2.  On the other hand, the URI tests that were referred to did not measure the COR of the balls.  They only studied the pill and only at very low speed.  I tried to contact Hilliard back then to engage him in a dialogue and to suggest he take the balls to UML to test there.  I could never get him to respond to my e-mails.  I am skeptical of claims of balls performing differently because the materials used in their construction are different (e.g., synthetic vs. wool yarn).  The issue is not the materials but the COR.  It is easy enough to measure correctly.  So, why not do it?

BTW, in a previous thread, Erik Walker claimed major changes in the baseball could account for changes in his “power factor”.  I suggested to him that he take the suspicious baseballs to the testing laboratory at WSU, not so far from where Walker lives.  He has never done that. 

There is pseudo-science and there is real science.  It is often times hard to distinguish one from the other.

Sorry for the rant.


#97    Mitch      (see all posts) 2009/04/23 (Thu) @ 14:27

I’m late to this discussion, having picked it up only after Cashman was quoted in an ESPN story this morning.  I think there was a pretty significant sample bias in first week homers.

Average for first 199 home runs, through April 12: 401.7 feet (True distance from Hit Tracker)
Average for home runs since: 394.0 feet

What could be the difference?  Well, over the course of one week, not every park will host games, so that’s one possibility.  Looking at the same period from 2008 and 2009 (Apr 6 through 20), there is no difference at the 95% level, at least according to my calculations (and I fully admit I could be doing this wrong; I’ve taken one stats class).  More here…

baltimorebirdsnest.blogspot.com/2009/04/2009-offense-and-home-runs.html


#98          (see all posts) 2009/04/23 (Thu) @ 18:29

Greg...please keep us updated on how the home run distances are evolving.  Has the effect virtually disappeared, as Mitch (#97) suggests?  Should we all move on and talk about something else--like the Yankee Stadium effect?


#99    paul todd      (see all posts) 2009/04/23 (Thu) @ 19:25

# 96 Alan Nathan.

Thanks for the rant (grin).  Seriously, any study is going to have SSS issues, otherwise there would be more studies with no SSS issues. 

However, while looking at the ball’s construction is not conclusive, it has value, assuming of course that balls are not constructed differently by the same supplier.  It is for sure that synthetic windings absorb less moisture than wool fibers, so balls in the 70’s made of wool fibers were likely to be heavier, and this could account for balls not traveling as far.

Not sure if you measured weight and analyzed and compared the construction of the balls in your study.

While the drop test done was not the best, it does suggest the pill is not the same as in 1989 and has less bounce.  But even if the proper test was performed, MLB would argue that the aging of the ball would invalidate such results.  That’s why I was a bit surprised your results indicate no deterioration or aging effect since I expect that to be true as well after 30 years.  This depends on storage conditions, so if your balls were kept away from high temperatures they may have aged less.

The facts though are that when balls suddenly travel farther, there has to be a reason.  And while any hypothesis can not be proven, it can be disproved.  I have yet to see any disproof on the ball hypothesis and the lack of scientific evidence based on testing, is not proof that it is not the ball.

I can’t comment on those who did not reply to your emails or take up your suggestions.  I don’t imagine the cost of testing is cheap though, and if the aging of the ball, or the fact that it has been hit already can be used to invalidate the results, it would be money down the drain.


#100          (see all posts) 2009/04/23 (Thu) @ 21:13

Re Paul #99:

My point really is the following:  Why do an indirect study of baseball performance when it is easy to do a direct measure of the COR?  That is in fact what I suggested to both Hilliard and Walker.  The cost of such an effort:  only your own time.  The guys who run the testing facilities are academics, like me.  For an academic-type study (like what we are talking about), they would take it on as a research project and would not charge to do the actual testing.  Again, this assumes it is a research project as opposed to a commercial endeavor.

Re this quote:

The facts though are that when balls suddenly travel farther, there has to be a reason.  And while any hypothesis can not be proven, it can be disproved.  I have yet to see any disproof on the ball hypothesis and the lack of scientific evidence based on testing, is not proof that it is not the ball.

I must confess that I don’t understand your point.  Of course longer fly balls have a logical explanation, whatever that explanation may be.  Of course it is true that a juiced ball *may* be the explanation.  I am not claiming otherwise.  However, I am seeking more direct evidence that proves the ball is juiced.  I don’t take lack of evidence that it is not a juiced ball as proof that the ball is juiced.  Perhaps that is not what you meant, but that is how I interpreted it.

We will learn a lot once we have some hitf/x data.


#101          (see all posts) 2009/04/24 (Fri) @ 00:11

In my haste to answer Paul #99, I forgot an important point.  Paul wonders in #94 how it can be that the older pill is less lively than the newer pill, yet the baseballs from the two eras have about the same COR at high speed.  Let’s assume for the sake of argument that both those facts are correct.  What does that tell you?  What it tells me is how irrelevant the pill is to the COR of the ball.  And that does not surprise me.  The volume of the ball is mostly yarn and it is the energy losses in the yarn that mostly determine the COR.  The pill overall is more lively than the bulk ball. You could never get a baseball to bounce to a height of 82” when dropped from a height of 186”.  So, while Hilliard did an interesting experiment, he has failed to show the relevance of that experiment to the juiced ball issue.

Which brings me back to my principal point.  If you want to know if the ball is juiced, do COR measurements on the ball.


#102    Greg Rybarczyk      (see all posts) 2009/04/24 (Fri) @ 09:11

Mitch #97:

A couple things, first we are mainly looking at standard distance in order to factor out the weather, so the comparison ought to be on that output variable.  Next, we’ve discussed a ways back in the thread the idea that when the ball flies farther, you have the competing effects of increasing the existing homers and admitting more homers at the bottom tail of your distribution.

For this reason we’ve been looking at the standard distance on the top 100/200/300 homers, which keeps us above the noisy boundary layer between homer and non-homer.  There the difference is still about 5 1/2 feet (sorry I can’t post the Minitab stats now). 

This is a bit smaller difference than it was a week ago, but we obviously have to keep watching this number over a longer period - if you’ve got one time period with a high average and one with a low, you can’t know a priori which is closer to the underlying “truth”.


#103    dcj      (see all posts) 2009/04/26 (Sun) @ 17:53

Here’s another way of looking at it. Take every ball hit at a speed off the bat of >= 110 mph and an angle of elevation of >= 22 degrees. These are balls that were crushed, so almost all of them should be in the HitTracker database.

In 2006-08, there were 1286 home run balls in this category.

889 “no doubt” (69%)
353 “plenty” (27%)
44 “just enough” (3%)

This gives me confidence that very few of these balls are not HR.

In 2006 there were 509 “crushed HR.” The average speed off the bat was 112.4 mph and the average standard distance was 434.7 ft.

2006: N = 509, 112.4 mph, 434.7 ft
2007: N = 419, 112.1 mph, 432.4 ft
2008: N = 358, 112.2 mph, 433.3 ft
2009: N = 65, 112.2 mph, 434.9 ft

(2009 figures are through 4/23.)

The crushed HR this year look just like the crushed HR of previous years. Maybe they are traveling a little farther, but it’s too soon to tell.

The difference is that crushed HR are occurring more frequently this year. The results are about the same whether the denominator is PA, BIP+HR, or HR. I’ll use a denominator of PA-IBB-SH, which I abbreviate PA*.

2006: 184991 PA*
2007: 185734 PA*
2008: 184778 PA*
2009: 17303 PA* (estimated; I took the number of PA* through 4/25 and multiplied by 0.88.)

At the 2006 rate, we’d expect to have 47.6 crushed HR so far, 1 SD = 6.9. The actual number, 65, is +2.5 SD.

2006: 47.6 +/- 6.9, +2.5 SD
2007: 39.0 +/- 6.2, +4.2 SD
2008: 33.5 +/- 5.8, +5.4 SD
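
The expected counts and SDs above can be reproduced with a few lines, assuming (as the post does) that every PA* is an independent trial with the same probability of a crushed HR. This is a sketch of that binomial arithmetic using only the counts quoted in this post; `binomial_z` is a name made up for the example.

```python
import math

# Crushed-HR counts and PA* by season, as quoted in the post.
crushed = {2006: 509, 2007: 419, 2008: 358}
pa_star = {2006: 184991, 2007: 185734, 2008: 184778}
pa_2009, crushed_2009 = 17303, 65

def binomial_z(rate, n, observed):
    """Expected count, SD and z-score if each of n PA* had probability `rate`."""
    expected = n * rate
    sd = math.sqrt(n * rate * (1 - rate))
    return expected, sd, (observed - expected) / sd

for year in sorted(crushed):
    rate = crushed[year] / pa_star[year]
    expected, sd, z = binomial_z(rate, pa_2009, crushed_2009)
    print(f"{year}: {expected:.1f} +/- {sd:.1f}, {z:+.1f} SD")
```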

I’m sure that the differences between 06, 07, and 08 would also show as statistically significant.

Barring something weird going on with the HitTracker data, I see three realistic possible explanations.
1. Pitchers are throwing more meatballs.
2. Batters are making more solid contact.
3. The ball is juiced.

Is there any way of distinguishing between these?


#104    dcj      (see all posts) 2009/04/26 (Sun) @ 18:09

The SD figures above are under the assumption that every PA has an equal probability of producing a crushed HR. Obviously that is not true. It’s not clear to me whether the numbers are overstated or understated.

Let’s say that the overall probability of a crushed HR is p. In 2006-08 p was around 0.002 to 0.003. So far in 2009 it’s about 0.004. The denominator is PA* as in my previous post.

Here’s a slightly more realistic model.
20% of PA* have a probability of 0.1*p
60% of PA* have a probability of 0.3*p
20% of PA* have a probability of 4*p

So, 20% of the PA* produce 80% of the crushed HR. This may be overstating the skew, but in that case the truth will be in between the results here and the results in the previous post.

To be continued…


#105    Greg Rybarczyk      (see all posts) 2009/04/26 (Sun) @ 18:15

The problem with the “crushed” approach is that IF the ball is livelier, then you will get more crushed homers (as seems to be the case), but whenever you admit more home runs into your subpopulation, you by definition are admitting them at the bottom tail of the distribution, which drags down the average distance.  I.e. you “expect” 48 crushes, but get 65.  The “extra” 17 are all at the lower end, so naturally this counteracts the increase from the livelier ball.  This will be a problem for any method with a variable number of population members - to avoid it, you have to choose the top X members of each population…
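
A toy simulation of this effect, as a sketch only. The fly ball distribution (normal, mean 330 ft, SD 40 ft), the single 370 ft fence and the 5 ft shift are all invented for illustration and are not fitted to Hit Tracker data.

```python
import random
import statistics

def simulate(shift, n_flyballs=50000, fence=370.0, seed=1):
    """Distances (longest first) of fly balls that clear a fixed fence,
    after adding `shift` feet to every ball."""
    rng = random.Random(seed)  # same seed, so both runs share the same swings
    distances = [rng.gauss(330.0, 40.0) + shift for _ in range(n_flyballs)]
    return sorted((d for d in distances if d > fence), reverse=True)

before = simulate(shift=0.0)
after = simulate(shift=5.0)   # hypothetical livelier ball: every fly ball +5 ft

top = 200
all_shift = statistics.fmean(after) - statistics.fmean(before)
top_shift = statistics.fmean(after[:top]) - statistics.fmean(before[:top])

print(f"homers: {len(before)} -> {len(after)}")
print(f"change in mean of all homers: {all_shift:+.1f} ft")
print(f"change in mean of top {top}:    {top_shift:+.1f} ft")
```

In this setup the average over all homers moves by much less than the 5 ft added to every ball, because the extra homers come in just over the fence, while the average of the top 200 moves by the full 5 ft.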


#106    dcj      (see all posts) 2009/04/26 (Sun) @ 19:13

Under the new assumptions, the SD at the 2006 rate is still 6.9. Actually it decreases from 6.89 to 6.87. So it turns out, accounting for the differences between PA has very little effect. Not only that, the adjusted standard deviations are lower than in #103, so the z-scores are higher.
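
For anyone who wants to check the 6.89 vs. 6.87 figures, here is a sketch. It assumes every PA* is independent and uses the 2006 rate with the three tiers from #104; `sd_equal` and `sd_tiered` are names made up for the example.

```python
import math

n = 17303                 # 2009 PA* from #103
p = 509 / 184991          # overall 2006 crushed-HR rate per PA*

# (share of PA*, multiplier on p) for the three tiers in #104
tiers = [(0.2, 0.1), (0.6, 0.3), (0.2, 4.0)]

def sd_equal(n, p):
    """SD of the count when every PA* has the same probability p."""
    return math.sqrt(n * p * (1 - p))

def sd_tiered(n, p, tiers):
    """SD of the count when each tier has its own probability m*p.
    Variances of independent trials add, so sum q*(1-q) over all PA*."""
    return math.sqrt(sum(n * share * (m * p) * (1 - m * p) for share, m in tiers))

print(round(sd_equal(n, p), 2), round(sd_tiered(n, p, tiers), 2))
```

The tiers keep the overall rate at p (0.2*0.1 + 0.6*0.3 + 0.2*4 = 1), so the expected count is unchanged and only the variance shrinks slightly.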

This argument doesn’t account for the possibility that a disproportionate share of the PA so far this season have been taken by power hitters (which would fall under my explanation #2 above). But that should be easy to check.

Bottom line, this post and #104 can be safely ignored.


#107    dcj      (see all posts) 2009/04/26 (Sun) @ 19:32

Greg/105:

You’re absolutely right. My method doesn’t give an estimate for the increase in length of HR. All it does is provide more evidence that the ball is juiced (or the hitters are better/pitchers are worse). One could compare the top 48 to the top 48 in different years, but in that case there’s no point in restricting to such a small sample.

I did the calculation to get a better handle on how many SD away from the mean we are this season. Using t-tests is only accurate if the distribution is normal. The SD’s from my method come from the binomial distribution, so they’re more reliable in that regard.


#108    weskelton      (see all posts) 2009/04/27 (Mon) @ 10:35

dcj,

Where are you getting your “speed of bat” data?


#109    weskelton      (see all posts) 2009/04/27 (Mon) @ 11:12

make that “speed off bat”


#110    dcj      (see all posts) 2009/04/27 (Mon) @ 14:10

wes, it’s on the HitTracker site. It’s back-calculated from Greg’s model rather than measured directly, but that shouldn’t matter for these purposes.

It will be interesting to see how these “speed off bat” numbers compare to what HitF/X gives. I bet that most of the discrepancy will be due to inaccuracy in the wind measurement.


#111    Johnson      (see all posts) 2009/04/30 (Thu) @ 03:38

Conspiracy theory 101: MLB decided to use a livelier ball so that the effects of drug testing would be less obvious. 

Oops!


#112    weskelton      (see all posts) 2009/05/07 (Thu) @ 10:19

So here we are three weeks and about 600 homers later.  It appears that we’ve settled down to a HR/G rate that is in line with the 2005-2008 period.  We still seem to be about 3+ feet longer in standard distance, but I’m pretty sure if we looked at just the last 3 weeks, we’d be lock step with the past.

I’m certainly no t-Test expert, so maybe 3+ feet with the expanded sample size is still significant.  Or is it safe to say that this conspiracy theory is DOA?


#113    Greg Rybarczyk      (see all posts) 2009/05/07 (Thu) @ 11:32

We’re overdue for an update here, sorry about that.

Looking at the distributions via a 2-sample T-test, here’s the latest:

Two-Sample T-Test and CI: 2009, 2008

Two-sample T for 2009 vs 2008

          N   Mean  StDev  SE Mean
2009    829  395.8   26.6     0.92
2008   4821  391.4   25.3     0.36

Difference = mu (2009) - mu (2008)
Estimate for difference:  4.409
95% CI for difference:  (2.463, 6.354)
T-Test of difference = 0 (vs not =): T-Value = 4.45 P-Value = 0.000 DF = 1101

I haven’t figured out how to get more digits on the p-value, so I ran it separately in Excel, and got 9.6 X 10^-6.  That says that this year’s balls are flying farther than last year’s, with almost no possibility of being wrong.  It doesn’t say that the difference is definitely 4.4 feet, of course, but we won’t be able to give the exact figure… ever.  Just probabilities…
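
The unequal-variance (Welch) form of the two-sample t-test can be recomputed from the summary table alone; the DF of 1101 in the output suggests that is the form Minitab used. A minimal sketch: it uses the rounded means and SDs printed above, so it will be close to the Minitab numbers but not identical, and it uses a normal approximation for the p-value, which is an assumption of this example and only reasonable because DF is so large.

```python
import math
from statistics import NormalDist

def welch_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Welch two-sample t statistic, degrees of freedom and SE of the difference."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, se

# 2009 vs 2008, rounded values from the table
t, df, se = welch_from_summary(829, 395.8, 26.6, 4821, 391.4, 25.3)

# Normal approximation to the two-sided p-value (assumption: df is large)
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"t = {t:.2f}, df = {df:.0f}, approx p = {p_approx:.1e}")
```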

I also took a look at the top 200 and top 300 homers by standard distance for each season, through the first 31 full days of the season, and got average differences of 4.6 feet and 5.3 feet, respectively.

I think the difference is really there, and I expect it to still be there at the end of the season…


#114    weskelton      (see all posts) 2009/05/07 (Thu) @ 12:50

Greg,

Any reason you’re limiting your T-test comparison to just 2008?  One thing I noticed is that the standard distance was down over a foot in 2008 (391.4), compared to a remarkably consistent stretch from 2005-2007 (2005=392.8, 2006=392.8, 2007=392.7).  What would a T-Test look like if you compared this year to 2005-2008?  I’m also semi-curious about what a T-Test would say about 2008 vs 2005-2007.

Also, way back in post #7, you had previously looked at groupings of 300 homers.  As of today, we have over 800.  How odd is the 395 average standard distance if you look at groups of 800 homers?


#115          (see all posts) 2009/05/07 (Thu) @ 13:04

I was not sure exactly what weskelton #114 is asking for.  I think he is asking the following (which is exactly the question I would like to see answered):

Suppose you compare the average distance of the first 400 home runs to the average distance of the next 400 home runs, all from 2009.  What is the mean of each?  What is the standard error on the mean for each?  What is the probability that both sets of home runs come from the same distribution?  I think the latter question is the same as asking for p.


#116          (see all posts) 2009/05/07 (Thu) @ 13:08

I just realized I am not asking the same question as #114.  But his question is a good one too.  Suppose you take groups of 800 home runs from 2008.  Find the mean and standard error for each of the groups.  How do they compare with each other?  What is the probability that they come from the same distribution?  How do they compare with the single group of 800 from 2009?


#117    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 13:17

Again, don’t forget that the mean distance of a HR doesn’t help much.  My comment in post 56, in reference to Greg’s data in post 55:

Greg, your data proves my point about the “average”.  You can take 12 feet off every single HR, and you end up with virtually the same average above a fixed threshold!

This is the reason we don’t care about the mean.  A short HR, if it were any shorter, would be removed from the sample (now a long flyball), thereby increasing the mean of the population of HR.

Greg’s data shows precisely this.  And this is the reason we need to focus on an equal number of HR.


#118    Greg Rybarczyk      (see all posts) 2009/05/07 (Thu) @ 13:18

Wes #114:

Good idea, I can definitely look at other years, and in fact I have previously, I just didn’t want to cloud the discussion with too much information.

Here’s 2009 vs. 2006-08 (I’m going to omit 2005, for that year I only have data for Fenway Park, Tropicana Field and a few other random homers)

Two-Sample T-Test and CI: 2009, 2006-08

Two-sample T for 2009 vs 2006-08

             N   Mean  StDev  SE Mean
2009       829  395.8   26.6     0.92
2006-08  14941  392.3   25.6     0.21

Difference = mu (2009) - mu (2006-08)
Estimate for difference:  3.502
95% CI for difference:  (1.646, 5.358)
T-Test of difference = 0 (vs not =): T-Value = 3.70 P-Value = 0.000 DF = 915

Next, here’s 2009 vs. just 2006-07:

Two-Sample T-Test and CI: 2009, 2006-07

Two-sample T for 2009 vs 2006-07

             N   Mean  StDev  SE Mean
2009       829  395.8   26.6     0.92
2006-07  10120  392.8   25.7     0.26

Difference = mu (2009) - mu (2006-07)
Estimate for difference:  3.071
95% CI for difference:  (1.192, 4.949)
T-Test of difference = 0 (vs not =): T-Value = 3.21 P-Value = 0.001 DF = 959

Here we see the p-value come up off the peg, though still 0.001.  So, the ball in 2009 is still very likely flying farther (weather-neutralized) than it did in 2006-08 or 2006-07.

Next, 2008 vs. 2006-07:

Two-Sample T-Test and CI: 2008, 2006-07

Two-sample T for 2008 vs 2006-07

Sample       N   Mean  StDev  SE Mean
2008      4821  391.4   25.3     0.36
2006-07  10120  392.8   25.7     0.26

Difference = mu (2008) - mu (2006-07)
Estimate for difference:  -1.338
95% CI for difference:  (-2.210, -0.466)
T-Test of difference = 0 (vs not =): T-Value = -3.01 P-Value = 0.003 DF = 9642

Looks like 2008 was significantly less than 2006-07, though not by much.

As for groupings of 800, here’s what I found inside 2006, 2007 and 2008 (so, not crossing the off-season, that takes more work than I can do right now):

For 2008 800 homer sequences, the average standard distance:

Descriptive Statistics: 2008 800 hr avg

Variable            N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
2008 800 hr avg  4078   0  390.89   0.0206   1.32   388.76  389.94  390.60  391.50   394.23

For 2007:

Descriptive Statistics: 2007 800 hr avg

Variable            N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
2007 800 hr avg  4157   0  392.18   0.0242   1.56   389.34  391.10  391.89  393.56   395.26

For 2006:

Descriptive Statistics: 2006 800 hr average

Variable                N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
2006 800 hr average  4586   0  392.66   0.0194   1.31   389.73  391.70  392.53  393.42   396.10

So, from the past 3 years, around 13,000 in-season sequences of 800 consecutive homers, there was only one period where we have seen a similar standard distance average: this was mid-April to mid-May 2006.  Interesting.  I’m not sure what conclusion this supports, maybe nothing yet, but the complete lack of 800-homer sequences similar to this season from 2007-08 seems to support the idea that the ball is traveling farther in 2009 than in those seasons.  2006 is a bit less clear…
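The 800-homer sequence check can be sketched as a rolling mean over a season's homers in date order. The data below is synthetic (a normal(391, 25) stand-in with 4800 homers, both assumptions of mine), so the counts it prints say nothing about the real seasons. The helper name `rolling_means` is also mine.

```python
import random
from itertools import accumulate

def rolling_means(values, window):
    """Mean of every run of `window` consecutive values, via prefix sums."""
    prefix = [0.0] + list(accumulate(values))
    return [
        (prefix[i + window] - prefix[i]) / window
        for i in range(len(values) - window + 1)
    ]

random.seed(1)
# Synthetic stand-in for one season of standard distances, in date order.
# The normal(391, 25) shape and the count of 4800 are assumptions.
season = [random.gauss(391.0, 25.0) for _ in range(4800)]

means = rolling_means(season, 800)
target = 395.8   # 2009 mean standard distance quoted earlier in the thread
hits = sum(m >= target for m in means)

print(f"{len(means)} windows, range {min(means):.2f} to {max(means):.2f}")
print(f"windows at or above {target}: {hits}")
```

One caution when reading this kind of output: consecutive windows share 799 of their 800 homers, so the window averages are strongly correlated and should not be treated as thousands of independent samples.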


#119    Greg Rybarczyk      (see all posts) 2009/05/07 (Thu) @ 13:27

Here’s the 2009 comparison, 1st 400 homers vs. 2nd 400 homers:

Two-Sample T-Test and CI: 1st 400 2009, 2nd 400 2009

Two-sample T for 1st 400 2009 vs 2nd 400 2009

Sample          N   Mean  StDev  SE Mean
1st 400 2009  400  395.7   27.3      1.4
2nd 400 2009  400  396.0   25.7      1.3

Difference = mu (1st 400 2009) - mu (2nd 400 2009)
Estimate for difference:  -0.28
95% CI for difference:  (-3.96, 3.40)
T-Test of difference = 0 (vs not =): T-Value = -0.15 P-Value = 0.881 DF = 795

That’s about as entangled as you can get, no sign from this that weather-neutralized distances are changing as the season progresses…


#120    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 13:35

Can you report on the 200 longest HR in the first 15 days compared to the 200 longest HR in the next 15 days?


#121    Greg Rybarczyk      (see all posts) 2009/05/07 (Thu) @ 13:45

Here it is.  Looks inconclusive - first 15 days had slightly longer homers, but p = 0.339 says don’t bank on it, very possibly a SSS effect…

Two-Sample T-Test and CI: Top 200 first 15 days, Top 200 second 15 days

Two-sample T for Top 200 first 15 days vs Top 200 second 15 days

Sample                    N   Mean  StDev  SE Mean
Top 200 first 15 days   200  418.8   14.5      1.0
Top 200 second 15 days  200  417.4   15.0      1.1

Difference = mu (Top 200 first 15 days) - mu (Top 200 second 15 days)
Estimate for difference:  1.41
95% CI for difference:  (-1.49, 4.32)
T-Test of difference = 0 (vs not =): T-Value = 0.96 P-Value = 0.339 DF = 397


#122    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 13:50

Good job.  So, we see that the HR were 1.4 feet shorter these last 15 days, but that difference is only about 1 standard error away from zero, meaning nothing to get excited about.


#123    Tangotiger      (see all posts) 2009/05/13 (Wed) @ 15:13

http://sportsradiointerviews.com/2009/05/13/dan-duquette-on-clemens-steroids-and-juiced-balls/

On the 1999 Home Run Derby in Boston:

“That was something when (Mark) McGwire was hitting them out; they were going up over the light tower.  I’m gonna tell you for a fact, those balls were juiced.  We’ve got juiced balls for the Home Run Derby, I bet you didn’t know that… Rawlings [juiced the balls].  It added to the entertainment value.”

***

Rawlings tester: Boss, 100% of the balls conform to the specs.

Rawlings boss: No sh!t.  How could they not, considering the window they give us.

Tester: Right.  If I tighten the window, 90% of the balls I can recommend sending for the regulation games.  I have another 10% batch here.  What do you want to do with them?

Boss:  Hmmm… let’s send them to the All-Star game.  And, send an extra box to Chicago and StLouis, and let’s have our fun with Sosa and McGwire.  Shhh… our secret.

Tester: Shouldn’t the balls be sent out randomly?  You know, to preserve the sanctity of the records?

Boss: That’s not our job.  That’s the job of the sportswriters.


#124    Greg Rybarczyk      (see all posts) 2009/05/19 (Tue) @ 16:12

OK, another update, including data through May 18th, 2009.  Using top 300 home runs by standard distance (weather factored out), in first 43 full days of each season.

Two-Sample T-Test and CI: 2009 Top 300 May 18, 2008 Top 300 May 12

Two-sample T for 2009 Top 300 May 18 vs 2008 Top 300 May 12

Sample                 N   Mean  StDev  SE Mean
2009 Top 300 May 18  300  430.5   11.0     0.63
2008 Top 300 May 12  300  423.4   13.7     0.79

Difference = mu (2009 Top 300 May 18) - mu (2008 Top 300 May 12)
Estimate for difference:  7.07
95% CI for difference:  (5.07, 9.06)
T-Test of difference = 0 (vs not =): T-Value = 6.95 P-Value = 0.000 DF = 570

So, from this it looks like about 7 feet more distance this season vs. last season.

The p-value is too small to show up in this output, but I got it from Excel to be 9.9 X 10^-12.  Clearly the difference is significant…


#125    Tangotiger      (see all posts) 2009/05/19 (Tue) @ 16:40

Good job.  Thanks for keeping up with this.

***

To correct:
“Clearly the difference is significant”

Should actually read:
“Clearly a greater than zero difference is significant”

You are at 97.5% of at least 5.07 feet.  And if you want to be 99.9999999% sure, you’ll be at, I dunno, at least 0.01 feet?
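To put numbers on the trade-off between confidence and the size of the effect, here is a rough sketch that computes a one-sided lower bound on the 2009 vs 2008 difference at several confidence levels. It uses the estimate and the "SE Mean" values printed in the table above, and assumes a normal approximation instead of the t distribution, so the bounds are approximate.

```python
import math
from statistics import NormalDist

estimate = 7.07                      # feet, from the Top 300 comparison
se_2009, se_2008 = 0.63, 0.79        # "SE Mean" column of that table
se_diff = math.sqrt(se_2009 ** 2 + se_2008 ** 2)

def lower_bound(confidence):
    """One-sided lower bound on the difference (normal approximation)."""
    z = NormalDist().inv_cdf(confidence)
    return estimate - z * se_diff

for conf in (0.975, 0.999, 0.999999999):
    print(f"{conf:.9f}: at least {lower_bound(conf):.2f} ft")
```

Under this approximation the bound at 97.5% lands close to the 5.07 feet quoted above, and it shrinks toward zero as the required confidence goes up, staying at roughly a foot even at the most extreme level shown.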


#126    Greg Rybarczyk      (see all posts) 2009/05/19 (Tue) @ 16:45

Yes, true.  I have to watch how I say that…


#127    Tangotiger      (see all posts) 2009/05/19 (Tue) @ 17:00

Only around here :)


#128    MGL      (see all posts) 2009/05/19 (Tue) @ 21:42

I don’t think the actual HR rates per PA are up all that much, are they?  Greg, with an increase of 7 feet or so in distance for all long fly balls, how much should the HR rate per PA or per FB go up?

And of course, we need to know whether we are answering the question of whether the ball is “juiced” this year as compared to last year (last year the ball could have been “de-juiced") or as compared to the last 3 or 4 years combined (which makes more practical sense I think).  Wasn’t last year a down year in HR rates?

Plus, are we controlling for the players, Greg? The better way to do this is as we do for aging studies - which is the “delta method”.  Although I doubt that would explain more than a small portion of the difference, it is possible that this year’s HR are being hit by bigger stronger players than last year, or that the pool of players this year is bigger and stronger or that the pool of pitchers is worse, no?
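A minimal sketch of what a delta-method comparison could look like for HR distance: only players with homers in both seasons are used, and each player's year-to-year change is weighted by the smaller of his two HR counts. The player names and distances are invented for illustration, and the weighting choice is an assumption, not something specified in this thread.

```python
from statistics import mean

# Hypothetical per-player standard distances (feet); names and values are
# invented for illustration and are not from the thread's data.
hr_2008 = {
    "Player A": [410.0, 398.0, 421.0],
    "Player B": [385.0, 392.0],
    "Player C": [402.0],
}
hr_2009 = {
    "Player A": [415.0, 409.0],
    "Player B": [396.0, 390.0, 401.0],
    "Player D": [430.0],
}

def delta_method(before, after):
    """Weighted mean of within-player changes, for players in both years.

    Each player's weight is the smaller of his two HR counts (an assumed
    convention for this kind of matched comparison).
    """
    total, weight_sum = 0.0, 0
    for player in before.keys() & after.keys():
        w = min(len(before[player]), len(after[player]))
        total += w * (mean(after[player]) - mean(before[player]))
        weight_sum += w
    return total / weight_sum if weight_sum else float("nan")

print(f"matched-player change: {delta_method(hr_2008, hr_2009):+.2f} ft")
```

Players C and D drop out because they appear in only one season, which is the point: a change in who is hitting the homers cannot move the matched-player figure.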


#129    Greg Rybarczyk      (see all posts) 2009/05/19 (Tue) @ 22:03

Using the “30% of homers clear by 10 feet or less”, for a 7 foot increase, I get +21% home runs.

For April, 2009, the HR/PA was 0.027094, while for April, 2008 the number was 0.023102.  So 2009 is a 17.3% increase.

The HR/game numbers are the same, +17.3% actual 2009 over 2008.  So, pretty good agreement from theory to reality…
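The arithmetic behind those two figures can be written out as follows. The linear rule is my reading of the "30% of homers clear by 10 feet or less" shortcut: it assumes fly balls that fall just short of the fence are about as dense, per foot, as homers that barely clear it. The function name `expected_hr_gain` is mine.

```python
def expected_hr_gain(share_within_margin, margin_ft, extra_ft):
    """Linear estimate of the fractional HR increase from extra carry.

    Assumes fly balls falling just short of the fence are about as dense,
    per foot, as home runs that barely clear it.
    """
    per_foot = share_within_margin / margin_ft
    return per_foot * extra_ft

predicted = expected_hr_gain(0.30, 10.0, 7.0)

# April HR/PA figures quoted above: 2009 vs 2008
observed = 0.027094 / 0.023102 - 1.0

print(f"predicted: {predicted:+.1%}")
print(f"observed:  {observed:+.1%}")
```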

I’ll look at some different combinations later.

I doubt the hitting population has changed enough to matter, if it’s changed at all in terms of age or power makeup… but we can always try to figure that out…


#130    MGL      (see all posts) 2009/05/20 (Wed) @ 01:26

For the first “quarter” of 06-08 (thru May 15) in both leagues, I have 13.2 HR per 500 PA, where a PA is all PA not including sac attempts or IBB.

For approximately the same period of time for 2009, I have 13.83, which is only a 4.8% increase in HR rate.

These numbers are for position players only, no pitcher hitting.

Why the discrepancy?  Can someone double check these numbers?


#131    Greg Rybarczyk      (see all posts) 2009/05/20 (Wed) @ 11:07

MGL, I think there isn’t really a discrepancy, those previous numbers were a comparison to 2008 only, not to 2006-08.  When you bring 2006 in, you are bringing in a season that had very favorable weather overall, and particularly in April IIRC.  And remember that most of the distance comparisons we’ve been making have been with weather factored out, but HR/PA and HR/game do not factor out weather, so when you bring in an atypical weather month like April 2006, that can alter the comparison…


#132    MGL      (see all posts) 2009/05/20 (Wed) @ 20:42

Yup, in the first quarter of 2008, the HR rate per 500 PA (no sac attempts or IBB) was 12.1.  So this year is 14% higher than last year (again, this is still different from the 17.3% you quote).

But I don’t see any reason to compare it to 2008 only and then declare that “the ball is juiced.”

For us to declare that a ball is “juiced,” we first have to define or state “as compared to what.” If we are comparing it to last year only, then maybe last year’s ball was de-juiced (since we are going on the assumption that the COR of the ball can fluctuate greatly from year to year).  I think the only fair thing to do in order to declare something about this year’s ball is to compare it to many years of data, maybe even the last 10 years or so. Certainly not just last year.  If we are only comparing to last year, especially since last year was quite a “down year” at least as far as recent years go, the only fair thing to do is to declare that this year’s ball seems to go quite a bit further than last year’s ball, but not necessarily much further than balls from 4 or 5 years ago, or whatever the case may be.

If nothing else, we would expect the HR rate and everything else to “bounce back” this year if last year was a down year, just by virtue of regression towards the mean alone.  So again, I caution against getting too excited about this year by only comparing to last year, a “down” year…


#133    Greg Rybarczyk      (see all posts) 2009/05/20 (Wed) @ 21:56

MGL, I’ve compared it to all the years I have data for, which is 2006-08, and it’s a significant difference for all of those comparisons - since 2006 was the year out of those 3 in which the hitters hit the balls the farthest, that comparison shows the smallest difference.  And I have been saying that the ball seems livelier than last year’s ball.  We’re on the same wavelength here…


#134    Greg Rybarczyk      (see all posts) 2009/05/26 (Tue) @ 13:24

Here’s an update through May 25, 2009, 50 full days into the season, or 29% of the way through the season.

This first one compares the standard distance (weather/altitude factored out) of the longest 300 home runs of 2009 with the aggregated longest 300 homers from the first 50 days of the previous 3 seasons 2006-08.

Two-sample T for 2009 Top 300 May 25 vs 2006-08 Top 300

Sample                 N   Mean  StDev  SE Mean
2009 Top 300 May 25  300  432.2   10.8     0.62
2006-08 Top 300      900  427.5   13.2     0.44

Difference = mu (2009 Top 300 May 25) - mu (2006-08 Top 300)
Estimate for difference:  4.653
95% CI for difference:  (3.155, 6.152)
T-Test of difference = 0 (vs not =): T-Value = 6.10 P-Value = 0.000 DF = 621

Here we see a difference of about 4.7 feet, or in other words, the 2009 homers are flying about 4.7 feet farther on average than they have been over the last three seasons.  Since these are the top 300 homers, and every one of the top 300 from all four seasons was at least 411 feet, we are comfortably above the “home run/not a home run” boundary layer, and thus this figure is a good one to use.

Here are the comparisons for 2009’s Top 300 to the corresponding data from the past three seasons:

Two-Sample T-Test and CI: 2009 Top 300 May 25, 2008 Top 300 May 19

Two-sample T for 2009 Top 300 May 25 vs 2008 Top 300 May 19

Sample                 N   Mean  StDev  SE Mean
2009 Top 300 May 25  300  432.2   10.8     0.62
2008 Top 300 May 19  300  425.6   13.1     0.76

Difference = mu (2009 Top 300 May 25) - mu (2008 Top 300 May 19)
Estimate for difference:  6.543
95% CI for difference:  (4.617, 8.470)

*********************************************

Two-sample T for 2009 Top 300 May 25 vs 2007 Top 300 May 21

Sample                 N   Mean  StDev  SE Mean
2009 Top 300 May 25  300  432.2   10.8     0.62
2007 Top 300 May 21  300  426.8   13.5     0.78

Difference = mu (2009 Top 300 May 25) - mu (2007 Top 300 May 21)
Estimate for difference:  5.350
95% CI for difference:  (3.391, 7.309)
T-Test of difference = 0 (vs not =): T-Value = 5.36 P-Value = 0.000 DF = 570

*************************************

Two-sample T for 2009 Top 300 May 25 vs 2006 Top 300 May 22

Sample                 N   Mean  StDev  SE Mean
2009 Top 300 May 25  300  432.2   10.8     0.62
2006 Top 300 May 22  300  430.1   12.6     0.73

Difference = mu (2009 Top 300 May 25) - mu (2006 Top 300 May 22)
Estimate for difference:  2.067
95% CI for difference:  (0.182, 3.952)
T-Test of difference = 0 (vs not =): T-Value = 2.15 P-Value = 0.032 DF = 583

***********************************

So, distances are up about 6.5 feet and 5.3 feet over 2008 and 2007, respectively, and 2.1 feet over 2006.  Distances are up 4.7 feet over the collective data of the past 3 seasons. 

These data seem to show noise in the baseball manufacturing process at a level that to me is believable - it would be exceedingly difficult to keep the process exactly stable, and given the width of the COR spec. on the ball, I have little doubt that the baseballs are in spec.  We can just expect some variation from year to year, it seems.

I’d love to have an independent lab test some balls, if they could get some intact vintage balls from the past three seasons, to see how good these numbers are.  But, as noted above, they do agree quite well with the HR per game numbers we’ve been seeing…


#135    Paul Scott      (see all posts) 2009/05/26 (Tue) @ 13:38

Have you looked in these 300 (for each season) to determine a potential player bias? 

As a quick example of what I mean, Adam Dunn will hit longer home runs than Alex Rodriguez.  Is it possible that the difference in long ball average distances is the result of a disproportionate representation of Dunn HRs in the top 300 from Start - May 25, 2009 and a disproportionate representation for ARod in Start - May 25, 2008?  (Those two being merely examples of a possibility, not meant to specifically reference a truth of these two seasons.)


#136    Greg Rybarczyk      (see all posts) 2009/05/26 (Tue) @ 14:29

I think the only reason to worry about a sampling bias like that would be if in one of the seasons under comparison, a lot of the sluggers were injured or otherwise absent.  If the sluggers were present, but for some reason hit a few more Top 300 homers in one year than in another, that’s no reason to call the sampling biased - in fact, it is quite likely considering the small sample sizes involved when you break it down to individual players. 

Among the Top 300 so far this year, the leaders are Ryan Howard and Mark Reynolds with 7 each.  Last year they had 4 and 3 in the Top 300 respectively.  Adam Dunn had 7 last year and 5 this year.  Lance Berkman had 8 last year and 4 this year.  Justin Upton 5 last year, 4 this year.  Mike Jacobs 4 last year, 5 this year.  Hanley Ramirez 5 last year, 3 this year.

A few names have changed out: last year Jayson Werth, Chipper Jones, Matt Holliday, Ryan Braun and Rick Ankiel all had 5 homers in the Top 300.  This year, they’ve had 1, 1, 1, 1 and 0, respectively.  In their place are Nelson Cruz with 6, Miguel Cabrera with 5, Brandon Inge with 5, Adam Lind with 5 and Torii Hunter with 5, where collectively these guys had 3 in last year’s Top 300.

Honestly, I don’t think this is a factor, and I have trouble imagining how it could be a factor, given the fact that every full-time player in the league gets 200+ plate appearances in 50 days worth of games, and given that the selection for the Top 300 data points is by distance, not the identity of the hitter.  Using the Top 300 ensures that you are capturing the tail of the league distribution, and unless you have some very specific injuries or suspensions to some particular players, you’re not likely to get a significant effect.


#137    McCoy      (see all posts) 2009/08/20 (Thu) @ 15:54

How about another update?


#138    Greg Rybarczyk      (see all posts) 2009/08/20 (Thu) @ 16:47

Funny you should ask, I pulled this data together the day before yesterday, with data through Aug. 16th.  Using the top 750 homers now, in order to get the same approximate part of the upper tail of the distribution…

**********************

Two-Sample T-Test and CI: 2009 Aug 16 Top 750, 2008 Aug 10 Top 750

Two-sample T for 2009 Aug 16 Top 750 vs 2008 Aug 10 Top 750

Sample                 N   Mean  StDev  SE Mean
2009 Aug 16 Top 750  750  431.6   10.9     0.40
2008 Aug 10 Top 750  750  428.1   12.4     0.45

Difference = mu (2009 Aug 16 Top 750) - mu (2008 Aug 10 Top 750)
Estimate for difference:  3.504
95% CI for difference:  (2.322, 4.686)
T-Test of difference = 0 (vs not =): T-Value = 5.81 P-Value = 0.000 DF = 1476

************************

Strange, I don’t see the update I did in early July posted here, maybe I didn’t put that one up on this thread.  Anyway, the July number was +4.4 feet, with 95% CI’s of 2.9 to 5.8 feet.  These August 16 numbers are 3.5, with 95% CI’s of 2.3 to 4.7 feet.

Could be that something has changed, and the balls aren’t flying as far, but maybe this is just the underlying signal bouncing around.  The earliest estimate put the 95% CI’s at 4.5 to 12.5 feet.  In May it was 5.1 to 9.1 feet.

I’d be interested to hear if anyone can tell how many distinct production batches of baseballs are used each year in MLB.  In other words, are the balls used in April from the same batch as the balls used in June, and in September?  I don’t know the answer…


#139    MGL      (see all posts) 2009/08/20 (Thu) @ 18:17

What is also interesting is that overall run scoring is significantly depressed from last year.  For the HR distance to be greater due to a juiced ball and run scoring to go down is no small feat I would think.


#140    Greg Rybarczyk      (see all posts) 2009/08/20 (Thu) @ 19:42

MGL, have you got the run scoring data handy, by month?  I haven’t gotten to dig into it yet, but the weather in the East during June-July has been pretty bad (anecdotally, for me, so far, hoping to change that).  I’m interested to see just how bad the weather has been, and just how the run scoring data lines up with it…


#141    Tangotiger      (see all posts) 2009/08/20 (Thu) @ 20:01

Greg, when in doubt, b-r.com:

http://www.baseball-reference.com/leagues/split.cgi?t=p&lg=MLB&year=2009#month

Sean REALLY should be showing runs per 9IP, but so far he only has ERA, which is going to be good enough for our purposes here.  You can see that June scoring plummeted.


#142          (see all posts) 2009/08/20 (Thu) @ 20:05

Yes, always gotta use runs per inning or per out and not rpg, because then you introduce the fluctuations of how many innings per game there have been.  I don’t have the data handy Greg.
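A toy illustration of why the denominator matters. The monthly totals below are invented so that both months have exactly the same scoring rate per out, while the second month has more extra-inning play. None of these numbers come from real data.

```python
def runs_per_game(runs, games):
    """Runs per team-game; sensitive to how long the games ran."""
    return runs / games

def runs_per_nine(runs, outs):
    """Runs per 27 outs, which does not depend on game length."""
    return 27.0 * runs / outs

# Hypothetical monthly totals (invented for illustration): same scoring
# rate per out, but the second month has more extra-inning games.
month_a = {"runs": 1800, "games": 400, "outs": 400 * 27}
month_b = {"runs": 1850, "games": 400, "outs": 11100}

for name, m in (("A", month_a), ("B", month_b)):
    rpg = runs_per_game(m["runs"], m["games"])
    r9 = runs_per_nine(m["runs"], m["outs"])
    print(f"month {name}: {rpg:.3f} R/G, {r9:.3f} R/9")
```

In this made-up case the runs-per-game figure rises in the second month even though the per-out scoring rate is unchanged, which is the fluctuation being warned about.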


#143          (see all posts) 2009/08/20 (Thu) @ 20:23

I am not sure why one might expect runs per anything is a measure of weather effects.  I can see how weather might affect home run production but not total run production.  Of course, home runs contribute to total runs, but I would expect the effect of weather on total runs to be severely diluted by runs scored having nothing to do with the weather.  Or am I missing something here?


#144          (see all posts) 2009/08/21 (Fri) @ 00:25

Nathan - There is definitely a relationship: as the temperature increases, so do hitting and run scoring.  Here is a link to an article on it, with several links in it to other pieces.

The biggest key will be the 4 extra feet of distance a ball travels for every 10 degree difference.  Since there is a 20 degree average difference from the coldest times of year to the peak summer months, those warning track flies caught in April are home runs in July.


#145          (see all posts) 2009/08/21 (Fri) @ 00:37

Jeff (#144):  I think you are agreeing more than disagreeing with me.  I agree that the 4 extra ft will add to the home runs.  I just don’t see how weather affects total runs (other than home runs).  I suppose there will be 2nd-order effects as warning-track-shots-turned-into-home-runs lengthen the inning and give the opportunity for more runs.  My only point is that home run production is more sensitive to weather than total runs (normalized to whatever you like).  Is anyone disagreeing with me on that?  If so, please explain.


#146    Jeff      (see all posts) 2009/08/21 (Fri) @ 00:55

Alan - I think we are on the same page, but let me see if I can run a query and see which hits increase over the months.


#147    MGL      (see all posts) 2009/08/21 (Fri) @ 01:03

More home runs mean more runs scored, everything else being equal.  I don’t understand your argument Alan.  As well, more HR generally means longer fly balls which generally means harder hit everything which generally means more run scoring (although slightly more GDP to go along with that).  But I really don’t get your point Alan.  Warmer weather = more runs.  You are a physicist, right?  You can tell us why that is.  I can only “guess” that the warmer the weather the less friction there is in the air and the harder and further the ball travels. Longer and harder hit balls means more runs, again, everything else being somewhat the same (or at least not enough to cancel the effect of the harder and longer balls).

There may be a little more going on, like pitchers getting more fatigued, but I think the primary reason why warmer weather means more runs is the same reason why higher altitude in Colorado means MORE HR and MORE runs in general. 

Alan, isn’t the reason why many more runs are scored in COL (especially before the advent of the humidor) the same reason why more runs are scored in hotter weather? Less dense air? Is this the real Alan Nathan?


#148    Tangotiger      (see all posts) 2009/08/21 (Fri) @ 07:53

Alan, here are a couple of links that show that performance, over and above the HR, is directly linked to the temperature:

http://www.insidethebook.com/ee/index.php/site/comments/weather_park_factors/

The images in the linked story are not accessible, which is too bad, as they tell the better story.  But, the summary text is good.  Maybe Chris Constancio will oblige.

Here’s another by Chris on wind:
http://www.insidethebook.com/ee/index.php/site/comments/wind_patterns_affecting_stats/

Finally, one by Jonathan Hale (PITCHf/x-er) on the movement of the pitches by temperature:
http://www.insidethebook.com/ee/index.php/site/comments/impact_of_temperature/

For those interested, all the park-related threads can be found here:
http://www.insidethebook.com/ee/index.php/weblog/category/Parks/

There’s only about 30 of them, and they’re decent reads.


#149          (see all posts) 2009/08/21 (Fri) @ 08:42

MGL and Tango:  you have stimulated my interest.  I’ll go back and read some of the stuff in the links.


#150    pft      (see all posts) 2009/08/29 (Sat) @ 03:33

Anyone noticed that HR’s in the AL are up almost 15% over last year and in August, balls are just flying out of the park at rates not seen since the pre-testing era.  An incredible 1.33 HR/G (league average) in August.  The Red Sox are up close to 2 and Fenway is no longer suppressing HR’s.  Going back to 1998, no August has approached these HR rates and only in 2000 (May and June) were these rates approached (1.30).

Yet this does not seem to be translating into the big boppers hitting more HR, but 15 HR a yr guys are all of a sudden hitting 25.


#151    MGL      (see all posts) 2009/08/29 (Sat) @ 19:29

In June of 1963 in the NL, HR rates were sky high. I wonder what was going on?


#152    Greg Rybarczyk      (see all posts) 2009/08/29 (Sat) @ 23:02

Yeah, I wouldn’t read too much into it.

The weather in August has been (anecdotally) better than in June, and somewhat better than July.  And I get a lot of anecdotes, since I check the weather on every game which has a homer in it (which is around 90%).  I’d be surprised if the weather wasn’t at least partly responsible for the August rate being higher than earlier in the year…


#153    pft      (see all posts) 2009/08/30 (Sun) @ 02:26

Highest rate of any August since 1998, and highest rate of any month between 1998 and 2009 since May and June of 2000.

The rates earlier in the year were already much higher than recent years (presumably why this thread was started in May) .

The YTD rates are higher by 15% than the last 2 years.

Maybe guys are spending too much time on pitch f/x or defensive metrics, this one is like missing the forest for the trees, or missing the barn wall with a FB.

The interesting thing is this is not going on in the NL. HR rates are much lower, even in August.

