THE BOOK--Playing The Percentages In Baseball

Friday, June 27, 2008

Are home teams cheating this year?

By , 04:03 AM

So far this year, the home team has won 56.64% of its games.  Over the last 3 full years (05-07), the home team WP was 54.2%.  Over the 1174 games played so far in 2008, that difference is 1.65 SD.  The true difference may be even smaller, because in 2005, while the home team won 53.7% of its games, the home and road team run scoring suggested that the true home team WP was 54.4% (using a basic pythag formula).

This year, the home and road run scoring suggests a true 56.4% home team WP (as opposed to an actual WP of 56.6%), so perhaps we can say that the home team WP is 2% higher than over the previous 3 years.
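For readers who have not seen it, the "basic pythag formula" is presumably the Pythagorean expectation with an exponent of 2. The sketch below shows the shape of the calculation only; the exponent is an assumption, and the runs-per-game figures are made-up placeholders, not the actual 2005 or 2008 scoring data.

```python
def pythag_wp(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical runs per 9 innings, NOT the real league figures.
home_rpg = 4.9
road_rpg = 4.4

implied_home_wp = pythag_wp(home_rpg, road_rpg)
print(f"implied home team WP: {implied_home_wp:.3f}")
```

Feeding in the real home and road scoring rates would give the "true" home team WP figures quoted in the post.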


Technically, if we want to know the SD of the difference between the last 3 years and this year, we have to take the square root of the sum of the variances between this year and 05-07 combined.  That is 1.6%.  So a 2% difference is 1.27 SD, not a whole lot to get excited about, but worthy of some further investigation.
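The 1.6% and 1.27 SD figures above can be reproduced with a binomial approximation. This is a sketch under two assumptions that are not stated in the post: that 05-07 covers three full 2430-game seasons, and that 54.2% is used as the home win probability for both samples.

```python
import math


def binomial_sd(p: float, games: int) -> float:
    """SD of an observed winning percentage over `games` independent games."""
    return math.sqrt(p * (1.0 - p) / games)


P_HOME = 0.542           # home team WP for 05-07, from the post
GAMES_2008 = 1174        # games played so far in 2008, from the post
GAMES_05_07 = 3 * 2430   # assumption: three full seasons

sd_2008 = binomial_sd(P_HOME, GAMES_2008)
sd_05_07 = binomial_sd(P_HOME, GAMES_05_07)

# Variances add for the difference of two independent samples.
sd_diff = math.sqrt(sd_2008 ** 2 + sd_05_07 ** 2)
z = 0.02 / sd_diff       # the 2% difference expressed in SDs

print(f"SD of the difference: {sd_diff:.2%}")
print(f"2% difference = {z:.2f} SD")
```

Almost all of the uncertainty comes from the 2008 half season; the three prior seasons contribute little to the combined SD.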

It has been suggested (by Jayson Stark, who spoke to some baseball insiders) that road team players are no longer taking “greenies” and that they had been doing that for 40 or 50 years (I don’t know what the home team WP was prior to 40 or 50 years ago, but the travel schedule was completely different back then, so we might not be comparing apples to apples anyway).  Thus the road team players are more fatigued this year than in years past.

The answer to the question, “Why suddenly this year?” is two-fold (I am just speculating here - I am not necessarily suggesting that the 56.64% home team WP is “real” or that “greenies” is the reason):  One, HFA has actually been climbing a little the last few years (I think), and two, with stricter testing and penalties, and the off-season publicity surrounding the Mitchell report and PED’s in general, virtually all players have gone cold turkey on PED’s this year.

There is another possible explanation that I have not heard at all.  Again, I am speculating and thinking out loud.  Earlier in the season, I read somewhere that starting this season MLB is allowing teams to store baseballs in climate controlled conditions, much like the Rockies and the humidor.

I have not heard anything else about this.  I have no idea which teams, if any, other than the Rockies are doing this, or even if it is true.  The only other team I can think of that might want to do this for the same reason as the Rockies, is the D-Backs.  They are at a high elevation and it is very dry in Phoenix.  Perhaps in the humid cities, they are storing balls in a “dry-ador.” I don’t know.

As soon as I heard this (that all teams can store baseballs in climate controlled environments), I immediately thought, “What is to prevent them from cheating by using livelier balls when they are batting or when they are behind late in the game?” I have no idea who is in charge of storing baseballs and then providing them to the umpire during the game.  Do they have to give all of the balls to the umpires before the game starts, which limits their ability to cheat?  Are the balls “rubbed up” in the umpire’s room after they are taken out of any climate controlled environment?  Even if all of the balls are given to the umpires prior to the game, can the home team “mark” the balls?  That seems a little far-fetched, even when I put on my “conspiracy hat” as I am now wearing.

Anyway, I took the STATS data and looked at average fly ball distance (which is a proxy for ball “liveliness,” all other things being equal) for the home and road teams this year.  For 2005-2007, the home team averaged 324.6 feet per fly ball, and the road team, 324.4, for a difference of .2 feet, in favor of the home team.  We expect this of course, as a natural consequence of the HFA.

What about in 2008 so far?  The home team is averaging 324.0 feet and the road team 323.0, for a difference of 1.0 feet, quite a bit different than 05-07.  Now, that could be a consequence of whatever is causing the home team to score more and win more so far this year, whatever that might be - greenies or whatever.  It does suggest, one way or another, that maybe the extra HFA this year is NOT a fluke.  Could some teams be pulling out livelier balls when they are hitting?  I guess that is possible.

The other thing I thought of was that maybe home teams are pulling out livelier balls when they are losing late in the game. If I were going to cheat and did not want to get caught, I would probably do that rather than try and use different balls every half inning.  Plus, it is possible that the ball boy goes into a room somewhere every couple of innings or so and grabs a bucket of balls.  Again, I have no idea how that works.  Maybe someone who has worked at an MLB stadium can elucidate that for us.

So of course I looked at average fly ball distances late in a game (7th inning or later) when the home team was losing and when they were winning (and not when they were tied).  The results are pretty alarming, I think.

In 05-07, when the home team was winning in the 7th inning or later, the average fly ball distance for both teams was 323.9.  When they were losing, it was 323.5. 

In 08, when they were winning, it was 320.7 and when they were losing, it was 322.5.  Wow.  In 05-07, when losing, the ball actually travels .4 feet less, for whatever reason.  This year, when losing, the ball travels 1.8 feet further, a swing of 2.2 feet.  I don’t have to tell you how much that is worth.  It is worth a lot.  My guess is that each foot is worth an extra .005-.007 HR per fly ball or maybe .1 runs per game (a WAG).  So 2.2 feet is worth maybe .25 runs per game, which is a lot of course.

What about if teams are cheating, but they are only “monkeying” with the baseball in the 9th (or later) inning?  Let’s say that they bring in their closer and bring out the “dead balls.”

In 05-07, when the home team was winning in the 9th or later, the average fly ball distance was 323.4.  When losing it was 323.0.  So again, the ball travels .4 feet less when the home team is losing at the end of the game.  That was in 05-07.

What about in 08 (obviously the above 7th-or-later data includes the 9th-or-later data)?  When the home team is winning, the ball travels 317.9 feet.  When losing, it travels 321.2, or 3.3 feet further!  That is a 3.7-foot swing as compared to 05-07!  Wow!

Looking at it another way, in 05-07, the average fly ball traveled 324.5 feet.  When the home team was winning late in the game, it traveled 323.9 or .6 feet less.  With the home team losing, it traveled 323.5, or 1 foot less than overall.

Average fly ball distance (feet)

Period   Innings        Overall   Home team winning   Home team losing
05-07    7th or later   324.5     323.9               323.5
08       7th or later   323.5     320.7               322.5
05-07    9th or later   324.5     323.4               323.0
08       9th or later   323.5     317.9               321.2
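The 2.2-foot and 3.7-foot swings quoted earlier are differences of differences: the losing-minus-winning gap in 08, minus the same gap in 05-07. A minimal sketch using only the numbers in the table (the dictionary layout and function names are just for illustration):

```python
# Average fly ball distance in feet: (home team winning, home team losing)
DISTANCES = {
    ("05-07", "7th or later"): (323.9, 323.5),
    ("08", "7th or later"): (320.7, 322.5),
    ("05-07", "9th or later"): (323.4, 323.0),
    ("08", "9th or later"): (317.9, 321.2),
}


def losing_minus_winning(period: str, innings: str) -> float:
    """How much further the ball travels when the home team is losing."""
    winning, losing = DISTANCES[(period, innings)]
    return losing - winning


def swing(innings: str) -> float:
    """Change in the losing-minus-winning gap from 05-07 to 08."""
    return losing_minus_winning("08", innings) - losing_minus_winning("05-07", innings)


for innings in ("7th or later", "9th or later"):
    print(f"{innings}: swing of {swing(innings):.1f} feet")
```

Using 05-07 as the baseline removes whatever ordinary game-state effect already existed before 2008.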

As I said, I find these numbers alarming and troublesome.  More research and more investigation have to be done.  I would like it if someone with access to fly ball distances could verify the numbers, especially for this year. I would also love it if someone could shed some light on whether teams other than the Rockies are actually storing baseballs in some kind of climate controlled environment and what the procedures are for giving them to the umpires and distributing them during a game.

#1 2008/06/27 (Fri) @ 07:14

I find it hard to believe that home teams are cheating in the sense that you mention. If that was going on, someone on an opposing team would get wise to it and blow the whistle. Tightly organized conspiracies can work without detection, but a conspiracy involving several teams? There are just too many people who would know, from club officials, to at least some players, and even the ball boys. It’s the ball boys who take balls out to the umpire, isn’t it? If it is, they would notice. There are just too many people involved to keep it a secret.

I have no idea why this is the case, but it could just be the way things are working out this season. Even if the statistical methods that you apply suggest that something is going on, that doesn’t mean it isn’t just luck. I remember reading one of the Bill James abstracts back when. He talked about running one hundred seasons of Willie Mays on his baseball simulator. He found that in one season Mays was miserable, only hitting about .250. Stuff happens. Anomalies occur. I think you’d need more than one season’s worth of data to be certain.

That said, from a business point of view, it’s in the interests of all sports leagues to give as much of an advantage to the home team as is reasonable. A team that wins at home brings out the fans, or so they say. It seems to be the consensus among those who claim to know.


#2 bsball 2008/06/27 (Fri) @ 07:33

mgl: You say that the winning % difference for 2008 was not much to get too excited about because it’s only 1.3 SD away from 2005-07.  What’s the SD on fly ball length?  I would expect the SD to widen as you narrow the sample size (i.e. go from all fly balls to > 6th inning to > 8th inning).  Could that be all we’re seeing in your data?


#3 Peter Jensen 2008/06/27 (Fri) @ 09:15

MGL - Here is the article I told you about that says all the clubs are using temperature controlled storage starting in 2007.

http://colorado.rockies.mlb.com/news/article.jsp?ymd=20070208&content_id=1798476&vkey=news_col&fext=.jsp&c_id=col

I am sorry. I have never learned how to add a link.  But you can easily find the article by searching on humidor at the MLB.com web site.  It is from February 8th 2007.

I have an alternative possible explanation as to why the home teams are performing better this year.  Us.  By “us” I mean advanced sabermetrics.  If more teams are using better sabermetrics to acquire players who will perform better in their home parks, it would increase the home field advantage.  Doesn’t do a thing to explain the fly ball distances however.


#4 Peter Jensen 2008/06/27 (Fri) @ 09:26

Apparently I can add a link successfully after all.


#5 2008/06/27 (Fri) @ 09:27

MGL - why do you find these numbers alarming?  The HFA in baseball has always been lower than other sports; seems like baseball is just starting to find ways to catch up.

When the Miami Dolphins blast the air conditioning in the visiting team’s locker room, so that when the teams come outside for the game the temperature feels like 110 instead of 95, I find that a little alarming because I worry that it might increase injury from cold muscles.

I suppose it’s a slippery slope (if you’re messing with the balls with temp/humidity, why not scuff some and give those to your pitchers, etc.), but still… hats off to them for taking advantage of things they’re allowed to control.


#6 bsball 2008/06/27 (Fri) @ 10:09

OK.  I played around with some numbers just to get a feel for how big a spread to expect with these samples.  Warning - I don’t do this for a living so it could be off.

I’m assuming about 54,000 FB per year for MLB and evenly split among the subgroups.  So, the sample sizes are

81,000 for each home/road group in 2005-2007
12,500 for 2008

27,000 for innings 7+ 2005-07
4,200 for innings 7+ in 2008

9,000 for innings 9+ 2005-07
1,400 for innings 9+ 2008

Given those, we expect the spread in the sample means to be about 7.5 times as wide for the 2008 9th inning samples as for the 2005-07 all innings samples.  But how wide is the confidence interval?

To estimate that I assumed 95% of FB are between 250 and 400 feet and normally distributed (mean is 325, which is pretty close to actual).  That gives a SD of about 37 per FB.

For the sample sizes given 95% confidence interval for comparing sample means is:

.4 feet for 2005-07 all FB (i.e. comparing home to road)
.9 feet for 2008 (vs. H/R diff of 1)

.6 feet for 2005-07 inn. 7+
1.6 feet for 2008 (vs. Hwin/Hlose diff of 1.8)

1.1 feet for 2005-07 inn. 9+
2.9 feet for 2008 (vs. Hwin/Hlose diff of 3.3)

From that it seems like all three sets for 2008 are a little beyond the edge of the 95% confidence interval.  Seems like it may be reasonable to guess something besides random variation is behind this.
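The interval widths above can be approximately reproduced as follows. This is a sketch of one plausible reading of the calculation, not necessarily the exact method used: it assumes the SD is a quarter of the 250-400 foot range (37.5 feet), that both groups being compared have the same size, and that "95%" means roughly 2 SD of the difference in sample means.

```python
import math

SD_PER_FB = (400 - 250) / 4   # 95% of fly balls within +/- 2 SD -> 37.5 feet


def ci_width(n_per_group: int, sd: float = SD_PER_FB, z: float = 2.0) -> float:
    """Approximate 95% bound on the difference of two sample means."""
    return z * sd * math.sqrt(2.0 / n_per_group)


# Assumed sample sizes per group, taken from the comment above.
SAMPLES = {
    "2005-07 all innings": 81_000,
    "2008 all innings": 12_500,
    "2005-07 innings 7+": 27_000,
    "2008 innings 7+": 4_200,
    "2005-07 innings 9+": 9_000,
    "2008 innings 9+": 1_400,
}

widths = {label: ci_width(n) for label, n in SAMPLES.items()}
for label, width in widths.items():
    print(f"{label}: about {width:.1f} feet")
```

The results land within about a tenth of a foot of the figures listed, with the small gaps presumably due to rounding of the SD.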


#7 Greg Rybarczyk 2008/06/27 (Fri) @ 11:37

How’s it look when you remove the 9th inning?  I would expect a skew to arise from that, as the 9th inning of a home win frequently consists *only* of 3 outs pitched by the home closer, who is typically the best reliever on the home club and thus likely to render a lower-than-average fly ball distance (that effect is apparent in the data you show). 

In comparison, the 8th inning of a home win will often include a strong (but not the best) home reliever to hold the lead, and a weaker visiting reliever to eat the inning (this is my impression, of course, and should be confirmed).

Also, are there any significant effects apparent from park to park?  I’d be curious to see if any of the 30 show significant effects in either direction…


#8 MGL 2008/06/27 (Fri) @ 12:24

I’ll have to come back and comment later.  Greg, if I look at individual parks, the sample sizes will be really small, so I’m not sure you are going to get much useful information.

I don’t think that fly ball distances are normally distributed, but it could be that that assumption is OK for determining the SD by chance.

When I say “alarming” I mean that the fly ball distance data supports the theory that some home teams are cheating.  Whether that is “OK” or not, is another story.

Peter, do you know anything else about this other than the article you cite?  I had thought they started only this year, and the data show nothing unusual in 07.

There are other reasons why the HFA might truly be large now (such as “no greenies”, sabermetrics, etc.), but that would not explain the home team winning and losing fly ball distance discrepancy of course.

Plus, if it were something like more teams using sabermetric principles to construct their team to take advantage of their home park (and I don’t know that sabermetrics says much about that anyway), it would not occur “all of a sudden” in one year.  That is for sure.


#9 Greg Rybarczyk 2008/06/27 (Fri) @ 14:27

I can see how the sample size might make a park effect indistinguishable in all but the most extreme cases, which may not exist here…

I think the selection bias for the 9th inning has to be a part of it, but I have no feel for how much, so I’ll bide my time and wait to see what you guys can come up with…

Also, I recall reading somewhere (I think the Boston Globe) that someone on the Red Sox responsible for taking care of the baseballs was asked about the new policy (as described in the link Peter provided), and he replied, in effect, that the Sox were storing the balls exactly as they always had, stacked in a non-climate-controlled closet.  Sorry no link, I tried to find this, but haven’t succeeded so far… I’ll keep looking…

So, I wouldn’t assume that teams are actually doing what they supposedly have been directed to do…


#10 2008/06/27 (Fri) @ 15:49

MGL -

I think an easier way to cheat, rather than alternating climate-controlled baseballs with regular ones during the game, would be to simply use the humidity-controlled ones ONLY in games where it was likely to play to your advantage as the home team.

Here’s what I mean:  The home team is starting a fly-ball pitcher on Day 1 of the series, while the road team is starting a ground-ball pitcher.  On Day 1, the home team uses the baseballs that won’t travel as far. 

On Day 2, home team starts a ground-ball pitcher and road team starts a fly-ball pitcher, and naturally the home team supplies the ump with the balls that aren’t humidor-treated. 

Day 3, the two teams start pitchers with nearly identical GO/AO ratios, and thus strategically selecting the baseballs is a non-factor. 

The home team could do this fairly easily. It requires no doctoring or sleight of hand during a game.  Just replacing the entire rack of baseballs for the next day’s game each night, depending on the tendencies of tomorrow’s pitchers.


#11 2008/06/27 (Fri) @ 15:53

Another thought:

Would it be possible for home teams to use discreet, high-tech surveillance footage to their advantage during games?

(Ex. Are runners getting caught stealing at a significantly higher rate on the road than they are at home?)


#12 MGL 2008/06/27 (Fri) @ 16:13

Yes, #10 would work, although it would be more subtle and less effective.

Teams could always steal signs if they wanted to.  Most teams don’t.


#13 dcj 2008/06/28 (Sat) @ 00:16

I haven’t yet read past the first couple paragraphs. But, it is incorrect to use the standard pythag formula to estimate home team winning percentage, because of the disparity in innings (home half of the 9th). There’s a study in By The Numbers

http://www.philbirnbaum.com/btn2004-05.pdf

that comes up with an adjusted version of the formula.


#14 MGL 2008/06/28 (Sat) @ 13:40

#13, yes of course.  It is not the disparity in innings that creates a problem, since you want to use runs per 9 innings and not runs per game (of course), but the fact that the home team often scores “partial runs” in the bottom of the 9th or later when they win the game.  Because of that, you want to do some adjusting, although it is not that big a deal.  I’ll take a look at the article by Phil, though, thanks.

I looked at individual teams.  The sample sizes are small so there is going to be lots of fluctuation, but the most egregious (more than 10 feet of difference) teams so far this year in terms of average fly ball distance when winning or losing in the 7th inning or later, are:

ATL
PHI
SD
BAL
CHA
TEX

Mildly egregious (more than 5 feet but less than 10 feet difference) teams are:

CHN
COL
BOS
NYA
TBA
TOR

And BTW, one SD of fly ball distance is 50.6 feet.


#15 bsball 2008/06/28 (Sat) @ 19:20

If 1 SD of fb distance is 50 ft, then I think 2 sd of the sample mean for 2008 home games in innings 7+ for a single team so far is going to be in the ballpark of 10-15 ft.  That’s going to be something like 150 fb or so, I think.  So I don’t think looking at single teams is going to tell you anything.
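A rough check of that 10-15 foot ballpark, as a sketch only. It assumes about 150 fly balls in each of the two situations (home team winning, home team losing) for a single team, which is one reading of the "150 fb or so" above, and it uses a round 50-foot SD per fly ball.

```python
import math

SD_PER_FB = 50.0      # feet, rounded from the 50.6 quoted by MGL
N_PER_GROUP = 150     # assumption: fly balls per situation for one team


def two_sd_of_difference(n_per_group: int, sd: float = SD_PER_FB) -> float:
    """Two SDs of the difference between two sample means of equal size."""
    return 2.0 * sd * math.sqrt(2.0 / n_per_group)


team_threshold = two_sd_of_difference(N_PER_GROUP)
print(f"2 SD for a single team: about {team_threshold:.1f} feet")
```

Under these assumptions, a winning/losing gap of 10 feet for one team is about what chance alone would produce at the 2 SD level.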


#16 MGL 2008/06/28 (Sat) @ 19:50

I agree that looking at individual teams won’t tell you much at all, although I am never one to be “married” to how many SD’s from the null hypothesis something is, especially when we suspect something to start with (which makes it a Bayesian problem).


#17 bsball 2008/06/29 (Sun) @ 02:40

What if the reason we suspect something is purely from statistical analysis, rather than having a new piece of basic information?  I think that’s the case here and I think there is no new information to add to make it a Bayesian problem.


#18 MGL 2008/06/29 (Sun) @ 12:30

#17, right, then it is not Bayesian and is a different story.  I don’t think that is the case here.  I only looked at the statistics because I thought that allowing teams to keep baseballs in humidors or other climate controlled conditions was just asking for someone to cheat unless MLB had strict “chain of command” rules and procedures with regard to the baseballs.

For example, what if we were told by a bunch of ball boys that 6 teams were definitely cheating?  And then we measured the fly ball distances and found NO evidence of cheating, but it was a relatively small sample of fly balls (like we have now).  Well, we could certainly still conclude that teams were cheating but that the evidence from the data suffered from sampling error.  Here, we don’t have strong evidence of cheating going in, but we have a suspicion.

Almost everything is a Bayesian problem.  It is one of the things that social and other scientists often forget or ignore.  For example, let’s say that we manufacture a drug and that it “works” in lab animals and it is a known fact that when drugs work in lab animals, they tend to work on human beings.  Now, we do a double blind test on human subjects and we find a difference between our control group and the group that used the drug, but the difference was not quite significant at the 2 sigma level.  Do we reject the hypothesis that the drug works better than a placebo?  No!  That would be ridiculous, if only because we had a good idea that it would work going in.


#19 bsball 2008/06/29 (Sun) @ 19:50

#18, I guess I read your post as if you had noticed a possible difference in HFA and then went looking for an explanation.  If your starting point was independent evidence of cheating then that’s a different story.

For your drug company testing example, I think it’s probably best that medical studies don’t use the Bayes tests.  Those studies tend to be biased enough towards positive results without Bayes help.


#20 2008/07/01 (Tue) @ 23:12

To see how HFA has changed over time, go to

http://www.geocities.com/cyrilmorong@sbcglobal.net/HomeRoad.htm

It does fluctuate a lot from year to year. That does not mean that mgl is not on to something. I just thought people might like to see it.


#21 Tangotiger 2008/07/08 (Tue) @ 09:08

Post 20 was marked for moderation and is now open.


#22 MGL 2008/07/16 (Wed) @ 02:54

Regarding post #20, we don’t need to see a chart of historical HFA to know that it fluctuates a lot (whatever “a lot” means).  There are 2430 games in a major league season these days.  One SD is 1.025% on WP.  So we “know” that every 20 years or so, the observed home WP will be 2% or more above or below the true HWP.  And that is just from random fluctuation.  There are probably other factors that influence the true HWP, so the likely SD is even higher.
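The "every 20 years or so" figure follows from a normal approximation to the binomial. A sketch, assuming a true home WP of about 54% (the exact value barely matters) and treating games as independent:

```python
import math

GAMES_PER_SEASON = 2430
P_HOME = 0.54   # assumption: approximate true home team WP

# SD of the observed season-long home WP from chance alone.
season_sd = math.sqrt(P_HOME * (1.0 - P_HOME) / GAMES_PER_SEASON)

# Two-sided probability of landing 2 percentage points or more from the truth.
z = 0.02 / season_sd
p_miss = math.erfc(z / math.sqrt(2.0))
years_between = 1.0 / p_miss

print(f"one SD: {season_sd:.2%}")
print(f"2%+ miss expected about once every {years_between:.0f} years")
```

This gives an SD of roughly 1.0%, close to the 1.025% quoted, and a 2% deviation is then almost exactly a 2 SD event.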

Anyway, here is an update, as of the ASB.  I am missing about 10% of the data from the first half.

Remember that in 05-07, when the home team is winning or losing after the 6th inning, the average fly ball distance is around the same.  It is actually .4 feet higher when the home team is winning.  That is the baseline we are working with.

In 08 so far, when the home team is winning in the 7th inning or later, the average fly ball distance is 321.5 feet.  When they are losing, it is 322.6.  One SD of fly ball distance is 50 feet per fly ball.  For around 7000 fly balls, which is the sample size of each of these two points of data (HT winning and HT losing in the 7th or later), the SD is .60 feet.  Add together the variances and take the square root, and we get .85 feet.  So a difference of 1.1 feet is only 1.3 SD.

Nothing to get too excited about, but again, troubling, IMO.
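The arithmetic behind the .60, .85 and 1.3 SD figures, written out as a short sketch using only the numbers given above (50 feet per fly ball, about 7000 fly balls in each group):

```python
import math

SD_PER_FB = 50.0       # feet per fly ball
N_PER_GROUP = 7000     # fly balls with home team winning, and again with losing

# SD of each group's average distance.
sd_mean = SD_PER_FB / math.sqrt(N_PER_GROUP)

# Variances add for the difference between the two group averages.
sd_diff = math.sqrt(2.0 * sd_mean ** 2)

observed_gap = 322.6 - 321.5   # losing minus winning, in feet
z = observed_gap / sd_diff

print(f"SD of each mean: {sd_mean:.2f} feet")
print(f"SD of the difference: {sd_diff:.2f} feet")
print(f"observed gap: {observed_gap:.1f} feet = {z:.1f} SD")
```

Note that this compares winning against losing within 08 only; it does not subtract the .4 foot baseline from 05-07.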

Oh, and the difference is only in the AL.  Not in the NL.  And in the AL, it is 2.7 feet.  That is 2.3 SD, but I am looking at 2 leagues, so that changes the probability of a Type I and II error.

