Friday, September 24, 2010
Attendance and HFA revisited
I did the following study:
I looked at all games in 2007-2009. I split each team’s home games into 2 buckets - the lowest 1/3 (roughly) in attendance and the highest 1/3 (roughly) in attendance. Attendance numbers were from the retrosheet.org game files (which I understand may be tickets sold and not a turnstile count, although I don’t think it makes that much difference).
So I now have 2 total buckets - one with the highest attendance home games for all teams and one with the lowest attendance home games for all teams.
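For anyone who wants to try the same split, here is a minimal sketch of the per-team bucketing described above. The (home_team, attendance) tuple layout and the function name are hypothetical, not the actual database code, and the exact one-third cut is an assumption: the real split was only roughly by thirds, so the bucket sizes will not match exactly.

```python
from collections import defaultdict

def split_by_attendance(games):
    """Split each team's home games into low and high attendance buckets.

    `games` is an iterable of (home_team, attendance) tuples. This layout
    is a hypothetical stand-in for whatever the game files provide.
    Returns (low, high): the pooled lowest third and highest third of
    each team's home games, ranked within that team only.
    """
    by_team = defaultdict(list)
    for game in games:
        by_team[game[0]].append(game)

    low, high = [], []
    for team_games in by_team.values():
        ranked = sorted(team_games, key=lambda g: g[1])
        k = len(ranked) // 3  # assumption: an exact one-third cut
        low.extend(ranked[:k])
        high.extend(ranked[len(ranked) - k:])
    return low, high
```

Ranking within each team, rather than across the whole league, keeps a team with a small park or a small fan base from landing entirely in the low bucket.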
I then looked at home winning percentage for each of the two buckets, as well as total runs scored, home runs scored, and visitor runs scored. I compared these to “expected” home winning percentage, and expected total runs scored, home runs scored, and visiting runs scored.
“Expected” numbers are from my pitcher and batter projections (using the starting lineup and starting pitcher for each game, and estimated bullpen and pinch hitters), including park factors, and weather. It is quite a lot of work to come up with these projected game stats for each game, but I already had these in my database (IOW, I already spent many, many hours constructing these).
Here are the results:
For the low attendance bucket, we had an average attendance of 25,501 in 2,385 games. The home team was expected to win .589 and they won .577. The home team was expected to score 4.82 rpg and they scored 4.83. The vis team was expected to score 4.55 and they scored 4.48. Total runs expected (home and vis combined of course) was 9.35 and total runs actual was 9.30.
For the high attendance bucket, the average attendance was 38,347 in 2,451 games. The home team was expected to win .541 and they won .537. The home team was expected to score 4.75 rpg and they scored 4.73. The vis team was expected to score 4.62 and they scored 4.68. Total runs expected (home and vis combined of course) was 9.39 and total runs actual was 9.41.
I don’t see any difference whatsoever in HFA as a function of attendance. It looks to me like more fans come out to see a game when the home team is facing a good team, which is why you see slightly fewer runs scored and more runs allowed by the home team in higher attendance games, and why you see a smaller home win percentage.
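As a rough check on how big the gaps between expected and actual win percentage are relative to chance, the sketch below computes a simple z-score for each bucket from the numbers above. It assumes every game in a bucket is an independent coin flip with the bucket's expected win probability, which is a simplification (the per-game expected probabilities vary), and the function name is just illustrative.

```python
import math

def win_pct_z_score(expected_pct, actual_pct, games):
    """Z-score of actual vs. expected home win percentage.

    Assumption: each game is an independent Bernoulli trial with the
    same win probability `expected_pct`, so the standard error of the
    observed win percentage is sqrt(p * (1 - p) / n).
    """
    std_err = math.sqrt(expected_pct * (1 - expected_pct) / games)
    return (actual_pct - expected_pct) / std_err

# Figures taken from the results above.
low_z = win_pct_z_score(0.589, 0.577, 2385)
high_z = win_pct_z_score(0.541, 0.537, 2451)

print(f"low attendance bucket:  z = {low_z:.2f}")
print(f"high attendance bucket: z = {high_z:.2f}")
```

Under those assumptions both buckets come out within about 1.2 standard errors of expectation, so neither gap stands out from ordinary sampling noise.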
Honestly, I think that the authors of the study I cited in the original thread from a few days ago are full of crap. Either they falsified data, made an honest mistake with the data, or their methodology was simply terrible. And I am not afraid to say that publicly…


Dear MGL
I think your last paragraph accusing them of possibly falsifying data is not justified by your findings. You have used a different methodology and a smaller, different data set (your 07-09 vs their 96-05 dataset). The effect could (theoretically) be different over the different time periods.
They also found that better visiting teams increase attendance, and I’m not sure if your method can untangle higher attendance for good teams from higher attendance boosting home team performance, as the former was a stronger effect in their model. Perhaps if you first subdivided by good and weak visiting teams and then looked at low and high attendance you might have more evidence to build your case that they are wrong.
I think you need to at least look at the same games and fail to find their effect before you can even think of accusing them of falsification. (I realise this could be an enormous amount of work for you.)
To really build a case for fraud you also need to replicate their exact methods. If you do so and find a different result to theirs then that leaves error or fraud.
I didn’t particularly like their paper either, and they may very well be “full of crap”, and it is probably another example of overenthusiastic use of regression techniques, but I can’t see how you can make these serious claims without a lot more work to build your case.
Please don’t take this the wrong way as I am a big fan of you and Tango but I think everyone should be given the benefit of the doubt until there is compelling evidence of fraud which in this case I don’t think you have provided (yet).
James