THE BOOK--Playing The Percentages In Baseball


Friday, September 24, 2010

Attendance and HFA revisited

By , 12:16 AM

I did the following study:

I looked at all games in 2007-2009.  I split each team’s home games into 2 buckets - the lowest 1/3 (roughly) in attendance and the highest 1/3 (roughly) in attendance.  Attendance numbers were from the retrosheet.org game files (which I understand may be tickets sold and not a turnstile count, although I don’t think it makes that much difference).

So I now have 2 total buckets - one with the highest-attendance home games for all teams and one with the lowest-attendance home games for all teams.

I then looked at home winning percentage for each of the two buckets, as well as total runs scored, home runs scored, and visitor runs scored.  I compared these to “expected” home winning percentage, and expected total runs scored, home runs scored, and visiting runs scored.

“Expected” numbers are from my pitcher and batter projections (using the starting lineup and starting pitcher for each game, and estimated bullpen and pinch hitters), including park factors, and weather.  It is quite a lot of work to come up with these projected game stats for each game, but I already had these in my database (IOW, I already spent many, many hours constructing these).
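For readers who want to try the bucketing themselves, here is a minimal sketch of the per-team split described above. It is not the code used for the study: the record layout (`home`, `attendance`, `exp_home_win`, `home_won`) and the toy numbers are assumptions for illustration, and the pre-game expectation is taken as given since the projection system itself is not shown here.

```python
from collections import defaultdict

def split_into_buckets(games):
    """Return (low, high): each home team's bottom and top third of games by attendance."""
    by_team = defaultdict(list)
    for g in games:
        by_team[g["home"]].append(g)
    low, high = [], []
    for team_games in by_team.values():
        ordered = sorted(team_games, key=lambda g: g["attendance"])
        third = len(ordered) // 3
        low.extend(ordered[:third])
        high.extend(ordered[len(ordered) - third:])
    return low, high

def summarize(bucket):
    """Average the pre-game expectation and the actual outcome over a bucket."""
    n = len(bucket)
    return {
        "games": n,
        "exp_hwp": sum(g["exp_home_win"] for g in bucket) / n,
        "act_hwp": sum(g["home_won"] for g in bucket) / n,
    }

# Toy data (made up): two hypothetical teams, six home games each.
toy = []
for att, won in zip([10000, 12000, 20000, 22000, 30000, 32000], [1, 0, 1, 1, 0, 1]):
    toy.append({"home": "AAA", "attendance": att, "exp_home_win": 0.55, "home_won": won})
for att, won in zip([40000, 41000, 42000, 43000, 44000, 45000], [0, 1, 1, 0, 1, 1]):
    toy.append({"home": "BBB", "attendance": att, "exp_home_win": 0.55, "home_won": won})

low, high = split_into_buckets(toy)
low_summary, high_summary = summarize(low), summarize(high)
print(low_summary)
print(high_summary)
```

Because the split is done within each team, team BBB's "low" games (40,000) can draw more fans than team AAA's "high" games (32,000). That is the point of splitting per team rather than across the whole league.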

Here are the results:

For the low attendance bucket, we had an average attendance of 25,501 in 2,385 games.  The home team was expected to win .589 and they won .577.  The home team was expected to score 4.82 rpg and they scored 4.83.  The vis team was expected to score 4.55 and they scored 4.48.  Total runs expected (home and vis combined of course) was 9.35 and total runs actual was 9.30.

For the high attendance bucket, the average attendance was 38,347 in 2,451 games.  The home team was expected to win .541 and they won .537.  The home team was expected to score 4.75 rpg and they scored 4.73.  The vis team was expected to score 4.62 and they scored 4.68.  Total runs expected (home and vis combined of course) was 9.39 and total runs actual was 9.41.
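The comparison that matters is actual minus expected within each bucket, and then the gap between the two buckets. A short sketch using only the figures quoted above (the tuple layout is just a convenience for this example):

```python
# Figures quoted in the post:
# (games, exp hwp, act hwp, exp home rpg, act home rpg, exp vis rpg, act vis rpg)
buckets = {
    "low":  (2385, 0.589, 0.577, 4.82, 4.83, 4.55, 4.48),
    "high": (2451, 0.541, 0.537, 4.75, 4.73, 4.62, 4.68),
}

residuals = {}
for name, (games, exp_hwp, act_hwp, exp_home, act_home, exp_vis, act_vis) in buckets.items():
    # Actual minus expected: positive means the home side beat the projection.
    residuals[name] = {
        "hwp": act_hwp - exp_hwp,
        "home_rpg": act_home - exp_home,
        "vis_rpg": act_vis - exp_vis,
    }
    print(name, {k: round(v, 3) for k, v in residuals[name].items()})

# How much better the home team did, relative to expectation, in the high bucket.
hwp_gap = residuals["high"]["hwp"] - residuals["low"]["hwp"]
print("high minus low hwp residual:", round(hwp_gap, 3))
```

For scale, the binomial standard error of a winning percentage over roughly 2,400 games is about .010, so a gap of this size between the two buckets is well within noise.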

I don’t see any difference whatsoever, in terms of HFA as a function of attendance.  It looks to me like more fans come out to see a game when the home team is facing a good team, which is why you see slightly fewer runs scored and more runs allowed in higher attendance games, and why you see a smaller win percentage.

Honestly, I think that the authors of the study I cited in the original thread from a few days ago are full of crap.  Either they falsified data, made an honest mistake with the data, or their methodology was simply terrible.  And I am not afraid to say that publicly…


#1    James      (see all posts) 2010/09/24 (Fri) @ 07:23

Dear MGL

I think your last paragraph accusing them of possibly falsifying data is not justified by your findings. You have used a different methodology and a smaller, different data set (your 07-09 vs their 96-05 dataset). The effect could (theoretically) be different over the different time periods.

They also found that better visiting teams increase attendance, and I’m not sure if your method can untangle higher attendance for good teams from higher attendance boosting home team performance, as the former was a stronger effect in their model. Perhaps if you first subdivided by good and weak visiting teams and then looked at low and high attendance you might have more evidence to build your case that they are wrong.

I think you need to at least look at the same games and fail to find their effect before you can even think of accusing them of falsification. (I realise this could be an enormous amount of work for you.)

To really build a case for fraud you also need to replicate their exact methods. If you do so and find a different result to theirs then that leaves error or fraud.

I didn’t particularly like their paper either, and they may very well be “full of crap,” and it is probably another example of overenthusiastic use of regression techniques, but I can’t see how you can make these serious claims without a lot more work to build your case.

Please don’t take this the wrong way as I am a big fan of you and Tango but I think everyone should be given the benefit of the doubt until there is compelling evidence of fraud which in this case I don’t think you have provided (yet).

James


#2    tangotiger      (see all posts) 2010/09/24 (Fri) @ 08:01

Right.  I think you can either claim their methodology is full of sh!t, or their interpretation of the results is full of sh!t.  To claim falsification of data requires a much bigger burden of proof.


#3    Hizouse      (see all posts) 2010/09/24 (Fri) @ 11:06

If there is an effect, it would probably show up more in teams that have larger variations in attendance.  So maybe the teams that are pretty much always at full or near-full capacity (e.g., Red Sox) should be eliminated from the study.  I don’t think the authors of the original study would claim there’s a meaningful HFA difference between a 95%-full stadium and a 99%-full stadium.


#4    MGL      (see all posts) 2010/09/24 (Fri) @ 11:07

James (and Tango), I understand your point.  Also, I will separate by good and bad visiting teams, although I am 95% sure it won’t make any difference.  As far as different data sets - come on!

I am not really accusing them of falsifying data. Not at all.  When I first read or heard of the study, I was 95% sure that their results were incorrect, based on pure common sense (of a sabermetrician who has worked with baseball data for almost 25 years), and based on what I know of the “Vegas lines.”

When I finished my study, I was 99.9% sure that their results were incorrect, hence the “full of crap” comment.  I was merely listing the ways that they could have gotten such bad results.  Obviously I don’t have any evidence that they were faking data, and that possibility is probably less than the others, although I think that happens a lot more often than people suspect…


#5    MGL      (see all posts) 2010/09/24 (Fri) @ 11:37

Hizouse, right I thought about that too.  I certainly can eliminate those teams, like BOS, NYY, Dodgers, and the Cubs.

For James and Hizouse:  My experience is that if you find absolutely nothing when you have a sample which is possibly “watered down,” even if you pare your sample down to more relevant data, as you suggest, you are not likely to find any effect either.  The reason is this:  If you have a significant effect with nice, pure data, even if you water down that data a little, you will still find an effect, albeit smaller.


#6    MGL      (see all posts) 2010/09/24 (Fri) @ 12:14

Here are some quick breakdowns:

For opponents (visiting team) with good starting pitchers (must be above-average, based on my projections) pitching that day:

Low attendance (771 games, avg att=26,385)

Ex vis runs: 4.54
Act vis runs: 4.38
Ex home runs: 4.36
Act home runs: 4.43
Ex hwp: .498
Act hwp: .549

High attendance (1055 games, avg att=38,475)

Ex vis runs: 4.67
Act vis runs: 4.72
Ex home runs: 4.34
Act home runs: 4.27
Ex hwp: .484
Act hwp: .482

Interestingly, we seem to have the reverse effect here, although it is likely that my projections are off and that the attendance is a better indicator of the vis team’s strength than my projections are.

Here is the same data for games in which I project the visiting team starting pitcher to be below average:

Low attendance (1602 games, avg att=25,113)

Ex vis runs: 4.55
Act vis runs: 4.51
Ex home runs: 5.04
Act home runs: 5.03
Ex hwp: .566
Act hwp: .583

High attendance (1383 games, avg att=38,297)

Ex vis runs: 4.59
Act vis runs: 4.63
Ex home runs: 5.06
Act home runs: 5.08
Ex hwp: .565
Act hwp: .572

Very similar result as with the better visiting starters.

Here is for all bad overall visiting teams (for that day), starting pitcher and offense combined:

Low attendance (729 games, avg att=24,636)

Ex vis runs: 4.40
Act vis runs: 4.31
Ex home runs: 5.22
Act home runs: 4.95
Ex hwp: .598
Act hwp: .594

High attendance (558 games, avg att=38,213)

Ex vis runs: 4.37
Act vis runs: 4.55
Ex home runs: 5.26
Act home runs: 5.40
Ex hwp: .605
Act hwp: .604

Interestingly, here we have quite a bit fewer home runs scored than expected, although the exp and actual hwp are around the same.  I’m not sure what to make of that.

I’ll revisit the data later.

And of course, I am not splitting the attendance numbers after controlling for the visiting team strength, which I should be doing in order to hold the opposing team strength “constant.” And I probably should be using visiting team actual records to date rather than visiting team player projections as a measure of visiting team strength, since attendance is likely based on how the fans view the opposing team, which is likely based on their records rather than their “true” strength, based on to-date player projections.
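To judge whether any of the expected-versus-actual gaps in these breakdowns is more than noise, one rough check is a binomial z-score on home winning percentage. This is only a sketch: it treats every game in a split as an independent trial with the same win probability (the split's expected hwp), which is a simplifying assumption, and the labels are made up for this example. The game counts and percentages are the ones listed above.

```python
import math

def hwp_z(games, exp_hwp, act_hwp):
    """Z-score of actual vs expected home win pct under a simple binomial model."""
    se = math.sqrt(exp_hwp * (1 - exp_hwp) / games)
    return (act_hwp - exp_hwp) / se

# (label, games, expected hwp, actual hwp), copied from the breakdowns above
splits = [
    ("good vis SP, low att",    771, 0.498, 0.549),
    ("good vis SP, high att",  1055, 0.484, 0.482),
    ("bad vis SP, low att",    1602, 0.566, 0.583),
    ("bad vis SP, high att",   1383, 0.565, 0.572),
    ("bad vis team, low att",   729, 0.598, 0.594),
    ("bad vis team, high att",  558, 0.605, 0.604),
]

z_scores = {label: hwp_z(n, exp, act) for label, n, exp, act in splits}
for label, z in z_scores.items():
    print(f"{label:24s} z = {z:+.2f}")
```

Only the first split (good visiting starter, low attendance) stands out, and with six splits examined, one such value is less surprising than it would be in isolation.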


#7          (see all posts) 2010/09/24 (Fri) @ 15:56

Hello,
I’m one of the authors of the allegedly dishonest and/or idiotic study.  I thought I might take a moment to write a few comments in my defense.
One of the responses above emphasizes the importance of replicating every methodological decision when trying to confirm a finding or identify where it went wrong. I would have happily provided the data so you could look into this. Unfortunately you didn’t make any attempt to follow the methods in the paper. Had you tried you would have realized a few things that make your comments completely irrelevant.

First, as you correctly point out, it’s hard to determine causality as opposed to correlation. I completely agree, and this is why we devote over half the text to motivating our use of instrumental variables, which use random shocks to attendance (like weather or game time) to get a kind of treated and untreated group. The goal here is to approximate an ideal experiment of two identical games, one with high attendance and one with low attendance. Our instruments perform very well, so to the extent you buy any academic research claiming causality, you should probably buy this one too.

Just using an IV isn’t enough though. What if some teams in rainy cities always perform better in the rain? What if the schedule lends itself to high attendance games when the home team is playing bad opponents? Etc. We use numerous controls to account for variation in the data that could confound our results. We also used team-year fixed effects, which essentially eliminate any team-specific effect, and only look for an attendance / home field advantage relationship within a particular team year. All these team year effects are then effectively averaged to get an overall effect.

The study devotes pages to motivating these precautions so it’s a little funny that your post implied that maybe I hadn’t thought of a reverse causality argument. You may be 95% sure our study is bogus, but I am 95% sure you didn’t ever read the paper.  How can you write such a harsh (and slanderous) critique without having read the paper (which is available for free)?

Intellectually, your post doesn’t upset me much – even if done correctly, how could a crude 30-minute pass at the data not be plagued with measurement issues? Your approach of taking averages over “buckets” conveys pretty close to zero information. You also could have found similar numbers in the summary statistics table provided in the paper. The numbers are meaningless because you are completely ignoring the causality issues you accused me of neglecting. Not to mention crucial measurement issues – you can’t use raw attendance numbers because stadium capacity varies widely. The important metric is the percent attendance.  (Actually, it’s the log of percent attendance; most of the driving force comes from the teams with very low levels of attendance – another reason why your buckets missed quite a bit of information.)

Unfortunately, your post did affect me - I found it incredibly disheartening. I have decided to devote my life to the academic research process.  I really believe that despite all the crappy research out there, there are huge benefits to giving other researchers the benefit of the doubt. Don’t get me wrong, one of the best ways to learn is to try to find every flaw with their work so as to better understand your own shortcomings, but how depressing would it be if the academic community were so distrustful of their peers’ studies that each individual had to reinvent the wheel before building upon it. I guess this is why it was somewhat devastating that the first public criticism of my work was an attack on my honesty, and it was by someone who didn’t even take the time to read the abstract of the paper.

I genuinely look forward to defending my decisions with individuals who have demonstrated enough respect for the field to inform themselves before unleashing serious accusations. I hope I will have better luck next time around.


#8    Tangotiger      (see all posts) 2010/09/24 (Fri) @ 16:21

Erin,

Thanks for stopping by.  We discussed your paper at this earlier thread:

http://www.insidethebook.com/ee/index.php/site/comments/either_a_horrible_study_or_terrible_journalism/

My comments are at post 13.

You will find a few sympathetic researchers in that thread.


#9    Andy      (see all posts) 2010/09/24 (Fri) @ 17:09

I’m glad MGL posted the results of this. I’m still uncomfortable with the statistics behind his approach, as I detailed in the other thread (see post #32).

Fundamentally, I think there’s some confusion over what people are trying to evaluate. In my reading, both MGL and Tangotiger are interested in the actual correlation between wins and attendance not merely the causal relationship between attendance and wins.

We can write this really nicely with some simple math:

First, we suspect that attendance (A) is affected by home talent, away talent, weather, promotions, and a million other things. Let’s summarize the talent part of that with R.  Let’s call the other observable parts D. And there’s always some random or unobservable parts, let’s call that E.  So we can summarize this with:

A = b1 * R + b2 * D + E1

where b1 and b2 are some numbers that map talent and other stuff into attendance.  Linearity isn’t particularly important.

Let me be very clear about this equation: it is not the OBSERVED relationship, it is the CAUSAL relationship from talent to attendance.  That is, if God came down and changed the talent by increasing R by 1 unit, holding everything else constant, attendance would increase by b1 people.

There is another CAUSAL relationship, from attendance to HFA/talent.  Using the same notation, we can write

R = b3 * A + b4 * D + E2

Again, this tells us what we expect to happen to HFA (ie, R) if God increased attendance by 1 person, holding everything constant.

What we ACTUALLY see in the real world is both of these equations happening at the same time. We can thus solve these equations for R and A, which are simultaneously determined by D, E1, E2, and the b’s.  The math is merely tedious algebra, but it is easy to convince yourself that the covariance in the data (what MGL is measuring) is VERY different from both b1 and b3, which are the ‘causal’ relationships.  (This is a basic supply and demand framework, and the reason you cannot look at market data to tell how people’s demand responds to prices.)
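A small simulation can make the point concrete. This is a sketch under made-up assumptions, not anything from the paper: the observable D terms are dropped (b2 = b4 = 0), E1 and E2 are independent standard normal shocks, and b1 = 0.5, b3 = 0.1 are arbitrary values. Solving the two equations simultaneously gives A = (E1 + b1*E2)/(1 - b1*b3) and R = (b3*E1 + E2)/(1 - b1*b3), and the slope you would measure by regressing R on A in such data matches neither causal coefficient.

```python
import random

def simulate(b1, b3, n=100_000, seed=0):
    """Draw (A, R) pairs from the two simultaneous equations with no D terms."""
    rng = random.Random(seed)
    det = 1 - b1 * b3
    a_vals, r_vals = [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a_vals.append((e1 + b1 * e2) / det)   # solves A = b1*R + E1
        r_vals.append((b3 * e1 + e2) / det)   # solves R = b3*A + E2
    return a_vals, r_vals

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

b1, b3 = 0.5, 0.1                      # arbitrary illustrative values
a_vals, r_vals = simulate(b1, b3)
observed = ols_slope(a_vals, r_vals)   # what a naive look at the data measures
# With unit-variance shocks the population slope is (b3 + b1) / (1 + b1**2).
theory = (b3 + b1) / (1 + b1 ** 2)
print(f"true b3 = {b3}, observed slope of R on A = {observed:.3f}, theory = {theory:.3f}")
```

With these values the observed slope lands near 0.48, almost five times the true b3 of 0.1, because the feedback from R to A is mixed into it.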

What Erin and her coauthor are attempting to do is estimate JUST b1, using standard and simple econometric techniques. You can be interested in b1, or b2, or the observed relationship, or whichever, but accusing the authors of erring in estimating b1 because it doesn’t equal the observed correlations is missing the point of the paper.

@Erin: Honestly, I am not sold on your result, and would like to see more robustness checks/alternative specifications, especially with just instruments I believe are closest to random. But I agree the methodology itself is sound for the question at hand.


#10          (see all posts) 2010/09/24 (Fri) @ 18:26

Tangotiger,

Thanks for pointing the thread out, which was mostly a constructive discussion and only occasionally mildly insulting.  I have more comments, but I’m feeling remarkably more cheerful while making them.

I agree, it was weird that WSJ didn’t name my coauthor.

On Robustness checks: we had many more but the extra tables got taken out through the referee process. I can’t remember entirely - the sample period should give you an indication of how long ago I actually worked on this. I do remember the attendance effect persisted through many different fixed effect specifications and also works with different functional forms for attendance (although these other functional forms violated normally distributed errors, which, I think, limits the asymptotic conclusions).

On validity of the instruments: having many candidates allowed us to be pretty stringent - the ones we ended up with did very well - by that I mean the model wasn’t over-identified (I was imposing independence inappropriately) and the instruments were strong predictors of attendance. Some posts questioned the impact of weather on the likelihood of a home win - that intuition turned out to be basically correct. Temperature and home win are certainly not independent. We had many weather variables so were able to find the ones that worked best as instruments.

On the magnitude of the results: I’m pretty surprised that readers see these results and think they are too large to be realistic. My co-author and I had exactly the opposite response. We thought the results weren’t economically significant enough to be interesting - a 48% increase in attendance generates one additional run ?! That’s about a two standard deviation move in attendance when you lump all the attendance together. What you REALLY need to get the result is a 48% change in attendance for a particular team’s attendance. I would be surprised if you saw this kind of fluctuation for any more than a handful of teams this year.

I like the use of Vegas odds to measure the likelihood of winning. At one point I tried to use trade-sports prices to proxy for the probability of a republican election but somehow I never thought of doing this with sports. I’m pretty sure this will be my last sports paper though - there are just too many critiques. I’ll stick to finance.


#11    Tangotiger      (see all posts) 2010/09/24 (Fri) @ 18:32

Andy, isn’t she measuring b3 or b4 since she’s saying the causal agent (attendance) has an effect on runs and wins?

We all agree that increasing wins by +.100 wins (b1) per game can cause a 50% increase in attendance.

So, your post, which looked like it was going to be promising, makes me more confused.


#12    Tangotiger      (see all posts) 2010/09/24 (Fri) @ 18:41

Erin, one run is gigantically enormous.  Huge, beyond belief.

+.100 wins per game (or +1 run) -> +50% attendance
that’s totally believable

+50% attendance -> +.100 wins per game (or +1 run)
that’s totally impossible

It’s not the magnitude, it’s the direction.


#13    Tangotiger      (see all posts) 2010/09/24 (Fri) @ 18:42

Btw, major kudos for bathing into the fire.  Most people would find a reason not to come here, so your stock is going up with every post.


#14    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 18:55

Am I the only one who finds it ironic that saberists are responding to claims of the impact of attendance on winning with “there’s no way that’s possible” etc.?  It’s obvious that only a couple people here understand IV methods and have actually read the study.  This sounds awfully similar to parts of the baseball establishment that write off fielding metrics, etc. because they don’t understand them.  Just because a study gives results that seem of unlikely magnitude (I would echo Andy and Depot’s concerns) doesn’t make it useless.

@Tango: I appreciate your desire to actually understand the methods.  Believe me, no one in economics believes ordinary regressions.  BUT, the instrumental variables version of regression that Erin and her co-author use is used all the time.  It’s a big step forward.  Time spent learning it is well spent.  And yeah, Andy must’ve meant b3.

@Andy and the other obvious economists on here:  So, here’s the deal.  We know that we have something useful in IV.  How do we package it so that it’s clear and useful to current saberists?


#15    Erin      (see all posts) 2010/09/24 (Fri) @ 18:59

I really have no comment on the first direction because we never estimate it. However I really don’t think the other result is as large as you are thinking because +50% attendance just doesn’t happen. +50% attendance doesn’t mean going from 50% attendance to 75% attendance, it means going from 25% to 75%. This will happen only a few times a season. Interpreting the effect as +1 run per game isn’t accurate, it’s more like +1 run in maybe two games of the season for the three teams that experience swings of this sort.


#16    MGL      (see all posts) 2010/09/24 (Fri) @ 19:10

“I really believe that despite all the crappy research out there, there are huge benefits to giving other researchers the benefit of the doubt.”

Actually it should be exactly the opposite, because a large majority of the research, even in the most prestigious academic journals, is bad/wrong, for a variety of reasons.  The default position should be that until the methodology is scrutinized AND the research is duplicated/verified, its conclusions are speculative, if not wrong.  Obviously all research is a good first step, but until it is scrutinized and verified/duplicated, it is nothing more than a tentative theory.  To give it the “benefit of the doubt” is both meaningless and dangerous.  You, as an aspiring academic should really take that to heart.

Of course I read your study, but unfortunately I do not have the requisite knowledge to understand the details.  So I am not the one to critique the study, per se.

However, as one who is an expert on baseball, I can say categorically that both your results and your inferences are wrong.  And I am NOT the one who addressed the causality issue.  Several of our knowledgeable readers stated that you DID address the causality with your methodology, but I have no idea whether that is correct or not.

Regardless of the direction of the correlation, the relationship is just flat out wrong.  I don’t even see how either increased run scoring or home wp can “cause” increased attendance and certainly increased attendance causing a significant increase in either home run scoring or home winning percentage is preposterous, and is not borne out by my “10-minute” (actually about 2 hours plus a hundred more hours in constructing the projected numbers) study.

If any of your findings were correct, it WOULD have been evident from my study, I am afraid.  Sometimes (actually most of the time with sports studies of this kind) it DOES not take a fancy regression analysis to evince the kinds of results we are usually looking for.  In fact, as we on this blog (and Phil B. on his blog) have demonstrated time and time again, using fancy (and opaque) statistical techniques without an expert and detailed knowledge of the sport itself, more often than not leads to faulty conclusions.

I am afraid that your study is faulty, for whatever reasons. Again, I have not in any way, shape, or form stated where I think you went wrong. Maybe it was simply a Type I error (a false positive).  I have no idea.

If you are insulted (or whatever), well, that’s just too bad.  I just call them as I see them.  And there is simply no way that there is an effect of the magnitude you suggest, regardless of the actual cause/effect relationship.

As to where you went wrong, or providing the data for others to replicate your work, or do something different with the data in order to test your results/thesis, I will leave that to the statisticians, of which I am not…


#17    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 19:24

@Tango:  Erin is really strongly resisting the argument that the causality goes the wrong way for a reason.  Her method explicitly tries to fix this problem.  She, you, and I all know that regressing wins on attendance is a bad idea because of the reverse causality.  So, she doesn’t.  She uses IV, which does the following:

Building on Andy’s framework, here’s what IV does.  Take his two equations:

A = b1 * R + b2 * D1 + E1

R = b3 * A + b4 * D2 + E2

Now, solve them using some algebra.  This gives you:

R = (b2*b3)/(1-b1*b3)*D1 + b4/(1-b1*b3)*D2 + error

A = (b1*b4)/(1-b1*b3)*D2 + b2/(1-b1*b3)*D1 + error

where the errors are functions of the E’s.  Now we have the relationships for runs and attendance WITHOUT depending on each other.  So, we’ve intentionally removed the reverse causality.  Now, imagine running a regression of A only on the D’s; i.e. regress attendance on promotions and some controls.  We get:

b2/(1 - b1*b3)

Then regress Runs on promotion and some controls.  You get:

(b2*b3)/(1-b1*b3)

Now divide the second by the first.  You get:

b3

To recap.  What we did was regress attendance on promotions.  Then we regress runs on promotions.  Then we divide one by the other.  Now, this post is really long...intuition coming in a sec.
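The recap can be checked numerically. The sketch below is an illustration with made-up values (b1 = 0.5, b2 = 1.0, b3 = 0.1, b4 = 0.5, independent standard normal D1, D2, E1, E2), not the authors' specification. It generates A and R from the solved equations, where substituting the R equation into the A equation gives D1 a coefficient of b2/(1 - b1*b3) and D2 a coefficient of b1*b4/(1 - b1*b3). It then regresses each of A and R on D1 alone, divides, and compares the ratio with a naive regression of R on A.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate(b1, b2, b3, b4, n=100_000, seed=1):
    """Draw (D1, A, R) from A = b1*R + b2*D1 + E1 and R = b3*A + b4*D2 + E2."""
    rng = random.Random(seed)
    det = 1 - b1 * b3
    d1s, a_vals, r_vals = [], [], []
    for _ in range(n):
        d1, d2 = rng.gauss(0, 1), rng.gauss(0, 1)   # D1 plays the role of promotions
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = (b2 * d1 + b1 * b4 * d2 + e1 + b1 * e2) / det
        r = (b2 * b3 * d1 + b4 * d2 + b3 * e1 + e2) / det
        d1s.append(d1)
        a_vals.append(a)
        r_vals.append(r)
    return d1s, a_vals, r_vals

b1, b2, b3, b4 = 0.5, 1.0, 0.1, 0.5         # arbitrary illustrative values
d1s, a_vals, r_vals = simulate(b1, b2, b3, b4)

first_stage = ols_slope(d1s, a_vals)         # about b2 / (1 - b1*b3)
reduced_form = ols_slope(d1s, r_vals)        # about b2*b3 / (1 - b1*b3)
iv_estimate = reduced_form / first_stage     # the ratio recovers b3
naive = ols_slope(a_vals, r_vals)            # contaminated by reverse causality

print(f"true b3 = {b3}, IV ratio = {iv_estimate:.3f}, naive slope = {naive:.3f}")
```

The ratio only recovers b3 because D1 is generated independently of E2 and D2 here, so it moves R only through A. Whether a real promotion or weather variable satisfies that condition is exactly what the thread is arguing about.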


#18    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 19:30

@ MGL:

“In fact, as we on this blog (and Phil B. on his blog) have demonstrated time and time again, using fancy (and opaque) statistical techniques without an expert and detailed knowledge of the sport itself, more often than not leads to faulty conclusions.”

I wholeheartedly agree with this statement.  Two comments, though.  First, IV methods are much more intuitive than they seem at first...trying to get to this in other posts.  Second, sometimes you need fancy methods to get correct answers.  I honestly don’t fully understand fielding metrics because I haven’t taken the time to, even though I was a scorer for BIS once upon a time.  But I trust that even though there’s surely flaws in them, they are moving in the right direction, away from some old guy making decisions about errors and giving us some objective knowledge about fielding.


#19    Tangotiger      (see all posts) 2010/09/24 (Fri) @ 19:33

If a team goes from 75 wins to 91 wins (+16 wins in a 162 game season, or +.100 wins per game, or +1.00 runs per game), it is totally believable that their attendance (that season, or by next season) will increase from 2 million fans to 3 million fans (+50%).

Is there any issue from anyone on that statement?
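The unit conversions in that statement are easy to verify. A quick sketch; the 10-runs-per-win figure is the usual rule of thumb and is an assumption here, consistent with the thread treating +.100 wins per game and +1 run per game as equivalent:

```python
GAMES = 162
RUNS_PER_WIN = 10            # rule-of-thumb assumption, not a figure from the study

wins_gained = 91 - 75
win_pct_gain = wins_gained / GAMES                 # wins per game
runs_per_game = win_pct_gain * RUNS_PER_WIN        # run differential per game
attendance_gain = (3_000_000 - 2_000_000) / 2_000_000

print(f"+{wins_gained} wins = +{win_pct_gain:.3f} wins per game "
      f"= about +{runs_per_game:.2f} runs per game; attendance +{attendance_gain:.0%}")
```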


#20    MGL      (see all posts) 2010/09/24 (Fri) @ 19:37

“We thought the results weren’t economically significant enough to be interesting - a 48% increase in attendance generates one additional run ?!”

I am afraid that that comment alone evinces the fact that you know virtually nothing about baseball.  Something other than the talent of pitchers and position players changing that causes even a .25 increase in runs is ENORMOUS!  1 run is ridiculous.

I am not saying that you need to be a baseball expert to do econometric research in baseball, but to be honest it is REQUIRED as a sanity check at the very least.  No academician who is not also a baseball expert (or close) should EVER publish a baseball study without first running it by a sabermetrician…


#21    Bob Tarryan      (see all posts) 2010/09/24 (Fri) @ 19:40

@MGL: In your post you suggest the authors falsified data.  Then, in your comment (#16) you readily admit you don’t have the knowledge/toolset to understand the paper.  Seriously?  Do you think it’s fair to make claims like that without being capable of comprehending the paper?

Frankly, I’d expect you to at least PRETEND you could understand an author’s work before accusing them of fraud.

Kindly avoid making the excuse that you were just listing all possible explanations for discrepancies.  By mentioning fraud/falsification, you’re clearly making an accusation.


#22    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 19:47

Ok, enough math.  Explaining instrumental variables (IV) in words:

Say that nothing matters except temperature, attendance, and runs, and suppose that we know that if the team has a promotion, then 1,000 more fans will show up and cheer (totally made up numbers).  So, 1,000 fans per promotion.

Now, suppose that when a promotion happens, we observe that the home team wins 1% more often.  So, .01 wins per promotion.

Now, divide the second by the first so that the units cancel:

(.01 wins / 1 promotion) / (1,000 fans / 1 promotion) = .01 wins / 1,000 fans

So, if we have 1,000 more fans we get .01 more wins.  This is what the math in 17 is doing. 

It’s really much simpler than it seems at first.  We’re looking at the impact of promotions on winning.  Then, we credit ALL of the effect that promotions have on winning to the effect through attendance.  Given that assumption, all we need to do is change the units of measurement.

Now, it should be clear that the assumption that ALL of the effect goes through attendance is REALLY IMPORTANT.  For example, if promotions motivate the players directly, we’re in trouble.  More likely in this case, the team might set promotions for Tuesday afternoons when they’re playing the Astros.  This could be a problem.  Maybe, maybe not.

In my opinion, if you substitute weather for promotions, we’re in big trouble.  Weather affects play on the field greatly, of course.  For it to be a problem here, we need it to affect the home team differently than the visiting team.  I don’t know which it is, but it seems like it should have an impact since it affects play so much.


#23    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 19:56

@MGL/20:  I think review by sabermetricians would be great.  Who’s willing? =)


#24    MGL      (see all posts) 2010/09/24 (Fri) @ 20:00

“Kindly avoid making the excuse that you were just listing all possible explanations for discrepancies.  By mentioning fraud/falsification, you’re clearly making an accusation.”

Please don’t tell me what I meant based upon your interpretation of what I said.

I meant exactly what I said I meant.  I think their study is wrong.  How or why it is wrong, I have no idea.  It could be falsification of data, it could be… Oh, I already said that.

It really pisses me off (so far it is 2 or 3 people, probably more when the smoke clears) when someone says, for example:

“I find X totally implausible.  Either they falsified data or made some other kind of a mistake.”

And then someone takes part of that out of context and says, “How dare you accuse someone of falsifying data?”

REALLY pisses me off!  It is not politically correct in some circles to even suggest that as a possibility, when OF COURSE it is a possibility.  Well, in my world and on my blog, I don’t give a rat’s ass about being politically correct.

If I publish a study on this blog or anywhere else, and someone thinks it is wrong, then one of the reasons it could be wrong is me falsifying data!

So, if in fact, you think (which I do!) that the study in question is wrong - blatantly and clearly wrong - please list for me the possibilities why it could be wrong.  If you don’t include falsification of data in that list, then you are a stone cold liar.

Here was my original comment, and I’ll even bold it for those of you who have reading comprehension or vision problems:

“Either they falsified data, made an honest mistake with the data, or their methodology was simply terrible.  And I am not afraid to say that publicly”

That statement could NOT be any clearer.  What part of “Either they...” do you not understand? 

Is there anything not factual in that statement, given that I do not believe the study is correct?  Or is it just that uttering the words “falsification of data” is taboo in academic circles?


#25    Tangotiger      (see all posts) 2010/09/24 (Fri) @ 20:02

"regress attendance on promotions”

But this kind of variable, promotions, seems so… unstable.

The framework may be sound, but this particular implementation doesn’t necessarily make it right does it? 

And, yes, if you get a result that says
+50% attendance -> +.100 wins
then I don’t have to know any more about that particular implementation.  Something seriously wrong occurred, either in parameter selection or assumptions.

If you had +50% attendance -> +.010 wins (or 0.100 runs) that would have been more interesting.

And if you got
+.100 wins -> +50% attendance
that would simply be confirming quantitatively what one would think qualitatively.

Otherwise, you are asking us to accept something clearly impossible based on a sound framework and what is supposedly a solid implementation.


#26    Andy      (see all posts) 2010/09/24 (Fri) @ 20:04

@Tango #11, yes, she is attempting to measure b3 (and control for b4). That was a mistake in my write-up.


#27    Guy      (see all posts) 2010/09/24 (Fri) @ 20:15

"+50% attendance doesn’t mean going from 50% attendance to 75% attendance, it means going from 25% to 75%. This will happen only a few times a season.  Interpreting the effect as +1 run per game isn’t accurate, it’s more like +1 run in maybe two games of the season for the three teams that experience swings of this sort.”

OK, but then are you saying that a 25% change in attendance means a change of about .5 runs?  In a typical stadium, that’s a change of 12,000 fans.  That would mean a typical home team averaging 30,000 fans becomes a .490 team when attendance drops to 18,000, and becomes a .590 team on days when 42,000 fans show up (everything else equal of course).  That’s still implausibly large. 

That means a drop in attendance of 10,000 fans entirely wipes out home field advantage (about .040 in win%)!  And HFA advantage includes umpire bias, field familiarity, and the entire effect of playing in front of thousands of your own fans.
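
The arithmetic behind those figures can be sketched as follows. This is only a rough check, and it assumes the common rule of thumb that one run per game is worth about .100 in win% (that conversion, the function name, and the .540 baseline are illustrative assumptions, not something taken from the paper):

```python
# Assumption: ~10 runs of differential per win, so 1 run/game is ~.100 of win%.
WIN_PCT_PER_RUN = 0.100
RUNS_PER_12K_FANS = 0.5   # the effect size being questioned in this comment

def implied_win_pct(base_win_pct, base_att, att):
    """Home win% implied by the disputed effect at a given attendance."""
    extra_runs = (att - base_att) / 12_000 * RUNS_PER_12K_FANS
    return base_win_pct + extra_runs * WIN_PCT_PER_RUN

base = 0.540  # hypothetical home team: .500 plus roughly .040 of HFA
for att in (18_000, 30_000, 42_000):
    print(att, round(implied_win_pct(base, 30_000, att), 3))

# Fans needed to add (or erase) .040 of win%, i.e. all of HFA:
fans_for_hfa = 0.040 / (RUNS_PER_12K_FANS * WIN_PCT_PER_RUN) * 12_000
print(round(fans_for_hfa))
```

Under those assumptions the team swings between roughly .490 and .590, and a change of a bit under 10,000 fans is worth as much as the whole home field advantage.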

Like MGL, I lack the statistical expertise to judge the methodology.  And I wish he had stated his criticisms with a bit more diplomacy.  But baseball knowledge also counts for something.  We know that players have little/no ability to perform better in the clutch.  We know recent performance tells us virtually nothing about the next at-bat.  We know how big HFA is, and that the influence of home fans is at most a fraction of that.  So MGL doesn’t actually need to understand the methodology to know there is a 99% chance that the finding is incorrect. (Although I expect that sounds very arrogant to someone not familiar with past baseball research.)

If you want to say 12,000 fans adds .005 to winning percentage, I’m ready to listen.  But there is virtually no chance this finding is right.  So rather than the rest of us all learning IV, it’s just more efficient for those who already know it to figure out where the study went wrong.

#23:  a lot of people here would be happy to review papers free of charge.


#28    Tangotiger      (see all posts) 2010/09/24 (Fri) @ 20:16

Let me give you another example.  Academics run regressions and say that a single is worth +.52 runs, a double is worth +.67 runs and stolen base is worth +.15 runs.

Well, I know that’s wrong.  It’s wrong, and I don’t care what the uncertainty level is of that .67.  It can even be .001.  It’s wrong.

A double is a single plus a stolen base plus the advancement of runners on base beyond what a single is.  A logical, even mathematical, framework can be constructed to prove the gap between a single and a double is close to .30 runs.  ANYTHING anyone says otherwise under whatever “analysis” is 100% wrong.  I don’t need regression when logic and probability will tell me clearly the difference.

It’s along these lines that +50% -> 0.100 is rejected.  It doesn’t make sense.  The current response is to look at how the parameters were used or the implementation constructed that led to the absurd and false result.

That would be more instructive, that you can show other people what NOT to do.


#29    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 20:20

@Tango:  I agree.  I don’t buy the conclusions of the study because of the choice of instruments (annoying jargon alert: in the example, promotion is an “instrument”).  They use temperature and day of the week, which I don’t buy.  These have some relationship with performance outside of attendance.

Now, imagine we’re in a perfect world.  Promotions are set totally randomly; they just flip a coin a bunch of times at the beginning of the season to choose nights.  It’s really hard to imagine that they directly affect performance, except through attendance, so given that they’re set randomly any effect of promotions should be evidence of the importance of attendance.  So, say we regress home win pct on these randomly set promotions.  Because they were randomly set, we don’t need to control for anything else.  Just find the raw correlation.  If we get a significant positive coefficient on promotions, then we attribute it to attendance...all IV does after that is change the units.

Perhaps more intuitive: think about randomized drug trials.  They randomly select half the people to give a pill and the other half they don’t (or a placebo).  They compare average results for treated to average results for controls and the difference is the effect of the pill.  Now, imagine that instead of giving them the pill, they just made the pill free for the treatment group but gave both groups the opportunity to get it.  More people in the treatment choose to get the pill and take it.  If we assume that making the pill free only affects health through making it more likely that they take the pill, then we can attribute the effect of free pills to the efficacy of the drug.  Now, replace pills with fans and free pills with promotions.

In all of this, we don’t have to control for anything because the treatments are set randomly.  They’re totally unrelated to anything else.  This makes regression less about predicting the outcome precisely (is this what you mean by stability?) and more about getting the right causal effect of one factor.
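
The coin-flip promotion thought experiment can be simulated in a few lines. This is a minimal sketch with made-up numbers (the true effect, the 8,000-fan promotion bump, and the talent model are all assumptions for illustration); it is not the model from the paper:

```python
import random

random.seed(0)
TRUE_EFFECT = 0.002  # assumed win probability added per 1,000 extra fans

def avg(values):
    return sum(values) / len(values)

promo_att, promo_win, plain_att, plain_win = [], [], [], []
for _ in range(200_000):
    talent = random.gauss(0, 1)        # unobserved home-team quality
    promo = random.random() < 0.5      # coin-flip promotion night
    # Attendance in thousands: good teams draw more, a promotion adds ~8k.
    att = 30 + 5 * talent + (8 if promo else 0) + random.gauss(0, 3)
    p_win = 0.54 + 0.05 * talent + TRUE_EFFECT * (att - 30)
    win = 1.0 if random.random() < p_win else 0.0
    (promo_att if promo else plain_att).append(att)
    (promo_win if promo else plain_win).append(win)

def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit of ys on xs."""
    mx, my = avg(xs), avg(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Ordinary regression of wins on attendance: contaminated by team quality.
naive = ols_slope(promo_att + plain_att, promo_win + plain_win)

# IV with a binary instrument (the Wald ratio): effect of promotions on
# winning, divided by effect of promotions on attendance ("change the units").
reduced_form = avg(promo_win) - avg(plain_win)
first_stage = avg(promo_att) - avg(plain_att)
wald = reduced_form / first_stage

print("true", TRUE_EFFECT, "naive", naive, "IV", wald)
```

In this setup the naive slope comes out several times larger than the assumed true effect, while the ratio built only from the random promotions lands near it, though with a much wider margin of error than the naive fit.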


#30    Guy      (see all posts) 2010/09/24 (Fri) @ 20:30

MGL:
Yes, you mentioned falsification of data as only one possibility.  But the point is you don’t need to mention that option in the absence of any evidence.  You can just say “I think the authors have a problem with their data and/or their methodology,” and leave it at that.  When you raise the possibility of deliberate dishonesty, as opposed to error, you cross an important line. I’m surprised you don’t see that.

If I comment that I think the UZR rating for Albert Pujols is clearly wrong this year, do you think YOU would react the same to these two statements?

A) I think there’s probably a problem with the BIS data, or maybe MGL has a programming error.

B) I think MGL deliberately distorted this result, maybe because he doesn’t like LaRussa, or he doesn’t even bother to use real data at all and just generates the ratings randomly, or maybe he made a mistake. Definitely one of those three. 

C’mon.....


#31    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 20:37

Ok, I think I’ve been won over that the “smell test” can be useful and legitimate.  When it’s not possible to judge work based on its merit (because it uses unfamiliar methodology, or whatever), you have to weigh trust in authority/academic credentials/whatever against your current knowledge.  If you know a lot, then it’s possible to use prior knowledge to reject a conclusion, even if you can’t say where it went wrong.  Everyone here knows a lot about HFA and baseball in general, so conclusions can be drawn.  Meanwhile, I have no basis to judge fielding metrics a priori because I have no outside knowledge to draw on.  Fair enough.

I think my remaining concern is that the “smell test” is over-used.  Sometimes, outsiders (say, named Bill James) come in with new methodology and contradict conventional wisdom and don’t pass a smell test.  But even if the conclusion is wrong because of errors along the way, it’s still very possible that the basic methodology is a big improvement.


#32    Tangotiger      (see all posts) 2010/09/24 (Fri) @ 20:48

Remember though why we are rejecting it.  It’s not just that it smells wrong, say the way a BBWAA writer won’t vote for a guy with a 12-12 record for the Cy Young.  (That’s a smell with no thinking.)

It’s that as subject matter experts, we already know what it requires to get one run differential added to a team: it requires Albert Pujols AND Evan Longoria, the two best players in baseball.

To think that adding 10,000 fans per game in lieu would have a similar impact is absurd.  You can add 100,000 fans, and you still won’t get that impact.

And, seeing that it is so easy to see that winning more games will bring more fans (I mean, duh, right), and that the relationship shown in the paper is exactly what is expected, then we have to believe that the direction is what our expectation is, and not the reverse direction that this implementation instead shows.


#33    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 20:53

@27: I’m seriously going to take somebody up on that.  Pre-requisite: you have to have read these posts on IV methods…

@Tango:

“The current response is to look at how the parameters were used or the implementation constructed that led to the absurd and false result. That would be more instructive, that you can show other people what NOT to do.”

This is why I’m trying to explain the IV method.  I think understanding how the methodology works is important to get at the promise of this paper as well as where it could be better.  So, in short, here is where I think this study gets an invalid result:

(i.) Their instruments aren’t valid, in my opinion.  In particular, temperature probably has a direct effect on HFA.  Higher temperatures may be correlated with the home team winning somehow, through performance (higher run scoring environment favors the home team somehow?) or sample selection (interleague takes place during hot months...home team has a bigger advantage then?).  But, because of the way IV works, these other effects of temperature are funnelled through attendance, creating a coefficient that’s too big.  It seems a big stretch to say temperature only affects HFA through attendance.

(ii.) The standard errors may be wrong.  It’s not clear to me from the paper, but I don’t think they cluster their standard errors.  This is a sort of technical issue, but suffice to say, IV methods often mean that typical standard errors are too small.  I would bet that their estimate has much bigger uncertainty in reality than is reported.

(iii.) I wish the referees hadn’t cut the sensitivity analysis.  IV estimates can be really unstable sometimes (see ii.).  I’ll take the author at her word that it checks out, but I would like to see this.  Because IV estimates are so uncertain, it’s often really easy to data mine the result you’re looking for if you run enough regressions.
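
Point (i.) can be made concrete with the large-sample algebra of the IV ratio. The sketch below assumes a single instrument (temperature) and linear effects; every number in it is invented for illustration and none comes from the paper:

```python
def iv_estimate(true_effect, first_stage, direct_effect):
    """Large-sample value of the IV ratio when the instrument also moves
    the outcome directly.

    true_effect   : win% per 1,000 fans (what we want to measure)
    first_stage   : extra fans (thousands) per extra degree of temperature
    direct_effect : win% per degree that does NOT go through attendance
    """
    outcome_per_degree = true_effect * first_stage + direct_effect
    return outcome_per_degree / first_stage

# Hypothetical numbers, chosen only to show the mechanism.
first_stage = 0.10
true_effect = 0.0005

valid = iv_estimate(true_effect, first_stage, direct_effect=0.0)
leaky = iv_estimate(true_effect, first_stage, direct_effect=0.0004)
print(valid, leaky, leaky / valid)
```

Because the direct effect is divided by the first-stage coefficient, even a tiny direct effect of temperature on winning gets magnified when the instrument moves attendance only a little.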


#34    Butler Blue      (see all posts) 2010/09/24 (Fri) @ 21:02

32: Right, I think we’re on a similar page now.  Your smell test is legitimate because of prior knowledge.  I would draw a dividing line, though, even for people with lots of knowledge.  Consider two statements:

1: Higher attendance raises HFA

2: Higher attendance raises HFA by X amount

#2 can fail the smell test (as it did here) if X is unreasonably sized.  I don’t think that #1 can ever fail the smell test.  Agreed?

Regarding the reverse causality: someone using IV methods is always going to strenuously object to this because they are not looking at the correlation between winning and attendance.  They’re looking only at the correlation between winning and temperature as well as the correlation between attendance and temperature.  They then make an inference about winning and attendance.  That inference may be wrong, but reverse causality cannot be their problem.  I agree that causality generally runs from winning to fans in the seats, but they don’t use that correlation.  This is why IV is really, really different from ordinary regression, even though it looks similar at first glance.


#35    MGL      (see all posts) 2010/09/24 (Fri) @ 22:47

I am NOW rejecting any correlation at all, regardless of the direction because I am finding NO relationship whatsoever between attendance and run scoring, run prevention, or home winning percentage in 2007-2009.  None whatsoever.

Here are the data, BTW, without team years that do not have at least a 5,000 difference between the threshold for high and low attendance.  The “thresholds” are that number above or below which a game falls into one or the other bucket.  I am basically eliminating team years where there is not much of a variance in attendance from game to game, such as teams who are sold out or nearly sold out every game, like Boston, Yankees, Cubs, and the Dodgers.

Low attendance (N=1401 avg. att = 21,124)

Exp. runs per game = 9.43
Actual runs per game = 9.21
Exp. vis rpg = 4.62
Act. vis rpg = 4.44
Exp. home rpg = 4.85
Act. home rpg = 4.77

Exp. hwp = .542
Act. hwp = .572

High attendance (N=1401 avg. att = 37,699)

Exp. runs per game = 9.44
Actual runs per game = 9.32
Exp. vis rpg = 4.69
Act. vis rpg = 4.63
Exp. home rpg = 4.75
Act. home rpg = 4.69

Exp. hwp = .524
Act. hwp = .533

So we have two groups of games - one with an average attendance of 21,124 and one with 37,699.  The pools of games from each team are the same number.  That is a difference of 16,500 fans in the same year.  That should be a difference of over .6 runs scored, according to the authors of the study in question.  By using expected runs scored and allowed, I am in essence holding everything constant - or more accurately, we don’t care if there are biases in the quality of the teams, since the expected numbers already account for that (in any case, as you can see from the expected numbers, in high attendance games, the other team is of better quality than the low attendance games, especially the pitching).

Yet there is NO difference in home runs scored as compared to expected home runs scored.  None whatsoever!

One study uses an opaque, complex regression to determine that 12,000 extra fans equals an extra .5 runs scored (approx.).  Yet the data for 2007-2009 shows that with a difference of over 16,000 fans, NO extra runs are scored.  None.  Gee, why am I skeptical?  I can’t imagine why.
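
The comparison can be written out directly from the numbers posted above. In this sketch the dictionary layout and variable names are illustrative, and the ".5 runs per 12,000 fans" figure is the rough reading of the paper used in this thread, not an exact coefficient:

```python
# Home runs per game, expected vs. actual, copied from the two buckets above.
low = {"att": 21_124, "exp_home": 4.85, "act_home": 4.77}
high = {"att": 37_699, "exp_home": 4.75, "act_home": 4.69}

def residual(bucket):
    """Actual minus expected home runs per game."""
    return bucket["act_home"] - bucket["exp_home"]

att_gap = high["att"] - low["att"]
observed = residual(high) - residual(low)     # what the buckets show
claimed = 0.5 * att_gap / 12_000              # what the disputed effect implies

print("attendance gap:", att_gap)
print("observed change in home rpg vs. expected:", round(observed, 2))
print("change implied by .5 runs per 12,000 fans:", round(claimed, 2))
```

The observed gap is a couple of hundredths of a run per game, against an implied gap of well over half a run.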

And BTW, Guy (or anyone else), if you think that the data I am presenting is just not intuitively likely or possible, given your knowledge of sabermetrics, which is vast, then by all means list ALL of the ways that I could be wrong, including making up all or some of the numbers.
Do you think that I or the authors of this study would be the first or last to do that?


#36    Guy      (see all posts) 2010/09/24 (Fri) @ 23:20

I can’t make sense of the main model (table 4).  BB, maybe you can clarify.  It appears to me that it gives much more weight to visitor’s record-to-date than season record, which can’t be correct (season record is stronger predictor), and it also includes home record-to-date but NOT home season record.  It also gives more weight to visitor starter ERA than home ERA, which doesn’t make a lot of sense.  So I don’t have a lot of confidence in the main model, if I’m reading it correctly.  (Modeling starter ERA and team record this way also seems problematic:  on a .650 team a league-average starter should lower the team’s expected win%, while the same pitcher on a .350 team should increase expected win%).

The “virtual attendance” variable is basically a function of temperature and day of week.  Temperature becomes a proxy (in part) for month, which certainly could have impact on HFA because in some months teams tend to play more out-of-division opponents who are less familiar and travel greater distances.  I can’t see any obvious reason why HFA should be larger on weekends, although it’s possible that home teams are less likely to rest star players on the weekend. 

But this is small stuff, and likely can’t explain the huge finding.  Perhaps the season-record-to-date variable, which becomes a better predictor as season goes on, interacts with temperature in some way that distorts things. 

Separately, on the run differential calculations it’s not clear to me they adjust for home team not batting in victories.  That will distort those models.


#37    Depot      (see all posts) 2010/09/24 (Fri) @ 23:23

This is a surreal debate.  First, Erin, I understand why you find this discussion disheartening.  It’s frustrating when you spend a lot of time on a study, worry about all these concerns and people just say, “Nope, I don’t agree with that conclusion.”

Second, this distrust of regression at this site is pretty bizarre.  Yes, regression can be misused.  But, and I’ve said this many times here, if you like means, you like regression.  Just because something is complicated does not imply that it’s wrong.  Using “buckets” is just a way of doing a really bad, noisy regression.

Third, MGL, you’re pissed because people are assuming what you meant by something.  But you assumed the authors know nothing about baseball or that they didn’t consult a sabermetrician.  Maybe they did.  Even if they didn’t, I’m not really sure I would hold that against them.  Is there a literature on how attendance affects win%?  Then there aren’t any experts on the topic anyway.  It’s like the Mitch Hedberg line, “You’re a great cook, can you farm?” Yeah, it’s nice to consult people in the field you’re working in and I highly recommend it.  But, in practice, you don’t always get that much out of it.  In this case, what would’ve happened?  It sounds like Erin would’ve spent the rest of her life trying to explain IV to someone. 

Again, I feel like we should stick to addressing the methodology.  If you don’t think the instruments are valid, why?  Also interesting, the instruments - even if invalid because of omitted variable bias - are finding some strange effects of weather and day of week impacting HFA.  Did we know that before?  My point is that these results are not being driven by reverse causality (but it’s either causal or omitted variable bias).


#38    Guy      (see all posts) 2010/09/24 (Fri) @ 23:39

"Also interesting, the instruments - even if invalid because of omitted variable bias - are finding some strange effects of weather and day of week impacting HFA.  Did we know that before?”

Maybe so, and yes that would be interesting.  But as I said above, the coefficients in the model seem so screwy that it’s hard to have confidence something has been discovered about a connection between HFA and day of week (or temp). 

And look, if the result is real then MGL’s “crude” method should pick it up.  As the authors say themselves, there’s quite a bit of variance in attendance.  If it really has an independent impact on HFA, we should see something.  Much more likely that their variable is correlated with something else, or the model is just broken for reasons we don’t understand.

MGL, I find your results 100% plausible.  Not sure why you’re directing it at me....


#39    MGL      (see all posts) 2010/09/24 (Fri) @ 23:42

"Using “buckets” is just a way of doing a really bad, noisy regression.”

True, but, and this is really important point:

If you have a regression that says one extra X equals .1 extra Y, then if you use buckets you will ALWAYS come up with the same relationship, and in fact, it is a sanity check for the regression!  If you create two buckets, where the average difference in X is 3, then the average Y difference MUST be .3!

It does not work the other way (that is, when you get two buckets with a difference in X of 3 and a difference in Y of .3, it does not follow that there is a nice linear (or other shape) relationship between variables X and Y).  That is why the regression is better. When you use buckets, the relationship could be because of one or two outliers and all the other data points have NO relationship. This usually does not happen when we do these kinds of studies in baseball, which is why I have no problem doing the “bucket thing” when I do studies.  In fact, I call them “poor man’s regressions.” I like it better than using regressions because the methodology is much more evident and transparent.

But, I’ll say it again, and I’ll also say again that it is a critical point.  “Bucketing the data” is a sanity check on complex (or even simple) regressions, which can easily go awry because of a computer or other “bug” and no one will know it!  And, if a relationship exists in a regression then it MUST exist when you bucket the data. Absolutely, 100%, must exist!

So if the authors say that a regression found a strong correlation between attendance and home runs scored after adjusting for all kinds of things, then if I bucket the data, I MUST find the same or a similar relationship.  Home run scoring must be higher in the high attendance bucket than the low attendance bucket. And the larger the difference in attendance between the buckets, the larger the difference I should find in home runs scored.  (Obviously if I make the buckets too small, I might run into sample size issues).
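
A small simulation illustrates the sanity check for the simple case. It assumes a plain linear relationship with noise that is unrelated to X (the slope, intercept, and noise level are made up); it says nothing about what happens when other variables offset the relationship:

```python
import random

random.seed(1)
SLOPE = 0.1  # assumed: one extra X is worth .1 extra Y

def avg(values):
    return sum(values) / len(values)

points = []
for _ in range(100_000):
    x = random.uniform(0, 10)
    y = 2.0 + SLOPE * x + random.gauss(0, 1)   # noisy linear relationship
    points.append((x, y))

# Two buckets: the lowest third and the highest third of X.
points.sort()
third = len(points) // 3
low, high = points[:third], points[-third:]

dx = avg([x for x, _ in high]) - avg([x for x, _ in low])
dy = avg([y for _, y in high]) - avg([y for _, y in low])
bucket_slope = dy / dx

print("difference in X:", round(dx, 2))
print("difference in Y:", round(dy, 2))
print("Y per unit of X from buckets:", round(bucket_slope, 3))
```

When the regression slope is real, the ratio of the bucket differences recovers it; a regression slope with no trace in the buckets needs some other explanation.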

And using projected runs scored allows for me to control for other things (for example, if higher attendance means a better team, that will show up in my projected data).

I find zero difference in home runs scored between my two buckets yet there is over a 16,000 fan difference in average attendance between the two buckets.  That tells me that the sanity check failed miserably.  It is NOT possible to find such a large relationship in the regression (around .5 runs per 12,000 fans, as Guy estimates), and NONE in the bucketing.  Not possible at all.

If you want to argue that I used a different data set, you will also have to argue that this relationship existed in 96-05 but not in 07-09.  That is not too plausible, although I suppose my study could have failed, as compared to theirs, because of a gigantic Type II error.

This is Bayesian, guys!  Start with the chances of a 12,000 rise in attendance miraculously causing (or just being related to) a .5 run increase in scoring.  Then add to the Bayesian analysis the chances that their study was done correctly with no errors, intentional or otherwise, and also that it wasn’t a gigantic Type I error, and see what you get.  If you put the a priori at something like 1 in 10,000, which would be my estimation, you are going to get something like a 99% chance that they are wrong…
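
That calculation can be sketched with Bayes' rule. The 1-in-10,000 prior is the one stated above; the two likelihoods are assumptions added here for illustration (a study always reports the effect if it is real, and a study reports it by error or chance 1% of the time if it is not):

```python
def posterior_true(prior, p_report_if_true, p_report_if_false):
    """P(effect is real | a study reports it), by Bayes' rule."""
    hit = prior * p_report_if_true
    false_alarm = (1 - prior) * p_report_if_false
    return hit / (hit + false_alarm)

prior = 1 / 10_000   # a priori chance the effect is real, as stated above
p_real = posterior_true(prior, p_report_if_true=1.0, p_report_if_false=0.01)

print("chance the finding is right:", round(p_real, 4))
print("chance the finding is wrong:", round(1 - p_real, 4))
```

With a prior that low, even a 1% chance of a flawed or unlucky study leaves about a 99% chance that the finding is wrong; a different assumed error rate would move that number.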


#40    MGL      (see all posts) 2010/09/24 (Fri) @ 23:49

Guy, right, we are saying the same thing - that if the relationship existed to any significant degree, let alone the .5 runs per 12,000 fans, it would have to show up in my “bucket” study.

The only thing I was directing at you was about the comment I made regarding falsification of data. You seem to think that was over the line, and I can understand why you would think that.

However, I was merely stating a fact (that that is one of the ways that a study can yield incorrect results and that it happens all the time), and I was not limiting myself to what might be politically correct or what might insult the authors.

To me there is a clear difference between, “I think you falsified data,” and, “I think your study is wrong, and one of the ways that it could be wrong was if you falsified data, and not knowing who you are, that is always a possibility, whether you want to hear that or not.”


#41    MGL      (see all posts) 2010/09/24 (Fri) @ 23:51

To relate it to Phil B.’s lemon study, it is kind of like saying to someone honest who is selling a perfectly food car, “I will not pay you $5,000 for your care, even though that is what it is worth if it is in good or even average condition, because I think that by virtue of the fact that you are selling it, it is likely in less than average condition and you know that.”

Insulting (and I have a significant chance of being “wrong” about the care being in less than average condition), but true…


#42    MGL      (see all posts) 2010/09/24 (Fri) @ 23:58

BTW, Erin, you say this:

“First, as you correctly point out, it’s hard to determine causality as opposed to correlation.”

This is part of the abstract from your study:

Using two-stage least squares, we find that attendance has a significant effect on the home-field advantage. Our results indicate that a one standard deviation increase in attendance results in a 4% increase in the likelihood of a home team win. We also find that if attendance as a percent of stadium capacity were to increase by 48%, we would expect the home team’s run differential to increase by one run.

Can it be any clearer that you are stating causation?


#43    Guy      (see all posts) 2010/09/24 (Fri) @ 23:59

MGL:  Personally, I would set the bar a little higher, and say you should only mention falsification if you have some good reason to suspect it.  If, for example, they were reporting a strong correlation between attendance and HFA, given your data I’d say there’s reason to suspect falsification.  But they aren’t even looking at that correlation, so it’s still quite possible their error is innocent.  (Although I think it’s fair to say that, given their conclusion, the authors SHOULD have examined that correlation, as a sanity check just as you say.)

*

“And, if a relationship exists in a regression then it MUST exist when you bucket the data. Absolutely, 100%, must exist!”

Well, almost.  There can be offsetting correlations that hide true relationships.  Let’s say they claimed a positive in-season attendance/HFA relationship.  You might still find zero correlation, because attendance rises when a good team comes to town.  The net effect could be zero correlation. 

In this case, I think you’ve controlled for the factors that might plausibly “cover up” attendance as a causal factor.  But that might not always be the case in a simple “bucket” analysis.


#44    MGL      (see all posts) 2010/09/25 (Sat) @ 00:00

And…

A “food car” is actually a “good car,” not a car that you can eat. And a “care” is a car…


#45    Depot      (see all posts) 2010/09/25 (Sat) @ 00:08

So, yeah, it’s nice when all results turn out similarly.  But your buckets approach isn’t the same as Erin’s approach so it’s not clear that it should get the same answer.  The discrepancy could be caused because you’re relying very, very heavily on the accuracy of your projections.  If those are wrong, you’re prone to omitted variable bias. 

I don’t agree with this sanity check point.  I’m not sure how “buckets” is less prone to error than a regression.  Economists - and anyone with experience with regression - find the regression-running step pretty easy and straightforward.  And, really, the point of using IV is because you don’t think raw correlations between buckets and outcomes exist even when a causal relationship does.


#46    MGL      (see all posts) 2010/09/25 (Sat) @ 00:19

I would have to be under-projecting the high attendance games by a lot for my buckets study to actually be the same as their results (since teams actually score fewer runs in high attendance games).  That is unlikely even if my projections were not very good for the simple reason that there is no particular reason that my projections would be biased in favor of high or low attendance. In any case, a projection is a projection. They always work in the aggregate. Give me 1,000 players, and I can ALWAYS tell you exactly how they will perform in the future, in the aggregate.  You can always use them as a proxy for expected performance (see Phil’s lemon study). (And these are actually very detailed projections, offense, defense, base running, pitching - they are not Marcels.)

In any case, I presented my data. I don’t know what IV is and I don’t understand the methodologies in their study. I’ve spoken my peace (or is it piece?).  I don’t really have anything else to add, other than philosophical comments.


#47    Guy      (see all posts) 2010/09/25 (Sat) @ 00:25

"But your buckets approach isn’t the same as Erin’s approach so it’s not clear that it should get the same answer.”

Sure it is.  The relationship Erin is describing is very strong, and the variance in attendance is substantial.  It should be visible to the naked eye in MGL’s analysis.  Otherwise, you have to postulate some other variables that are correlated with attendance but suppress winning %, in just the right proportions to obscure this huge relationship.  That isn’t remotely plausible.

A question for Depot or BB:
I had assumed that in the 2nd stage regression, only the instruments are included (day of week and temp).  But as I look at the paper (table 4) it appears the Percent Attendance variable is included (i.e. her full modeled attendance estimate).  But the controls don’t appear to be identical in the two models.  For example, home team’s average attendance—which likely is a very strong predictor of team performance—is part of the Percent Attendance model, but not included in the performance models.  Couldn’t that explain why the attendance variable appears to predict performance?  Or am I misunderstanding the process here?


#48    MGL      (see all posts) 2010/09/25 (Sat) @ 00:28

"Economists - and anyone with experience with regression - find the regression-running step pretty easy and straightforward.”

You’re kidding, right?  You ARE being sarcastic?

I posted about the original study 3 days ago.  We’ve had lots of smart economists and statisticians (and other really smart people) make comments on this paper, yet no one seems to be able to make heads or tails of it.  Or at least, no one seems to be able to even have a clue as to where they went wrong or right!

Easy and straightforward?

Take a look at Phil’s lemon study and my numbers above. Those are straightforward and anyone can see what is going on.  If that were the first study shown, and assuming that my numbers are not wrong (falsified or otherwise), it is pretty clear that attendance has no relationship whatsoever with runs scored, runs allowed, or hwp.  If someone wants to quibble with those numbers or any conclusions thereof, all is there in black and white to quibble with - the projections, the method for creating the buckets, etc.

Now, if you are talking about a simple single regression with a linear (or even non-linear) result, then sure…


#49    MGL      (see all posts) 2010/09/25 (Sat) @ 00:35

And BTW, by creating the buckets on a year to year basis, it pretty much eliminates the good team, high attendance, bad team, low attendance problem, since a team tends to have the same talent and roughly the same performance all year.

And even if the fans are responding to past performance within a given year, you will still NOT find a relationship between attendance and team talent, if you use a projection as your proxy for team talent, which I am doing.

For example, let’s say that team A is a true talent .600 team and therefore they have lots of fans at the beginning of the season, but they go .500 for the first half.  And let’s say that the fans for game 82 perceive the team as a .500 team and fewer fans go to the game.  They will still win at a .600 clip on that day (and any future days), so if you treat each season separately as I did, and if you use player projections for each game as a proxy for team talent, you will NOT see any relationship between attendance and team talent!


#50    Depot      (see all posts) 2010/09/25 (Sat) @ 00:44

So, I would say that your average economist could read this paper and understand it pretty quickly.  I skimmed it in just a few minutes.  Um, that’s not bragging at all - you just get so used to reading these things that it’s easy.  The study was written for economists so it’s easy for economists to read.  Yeah, people didn’t understand it but many didn’t read it and others aren’t trained to (which is fine).

And I was specifically talking about programming.  If I’m programming, the places I’m most likely to mess up are the parts where I’m coding the data into variables - a step you do regardless of whether you’re using a regression.  The regressions themselves are rarely where you make a coding error.

As much as I understood Erin’s method, I really have no idea what you did.  Your method is based on the accuracy of your projections and how you go from individual projections to team runs.  But you never explained that.  Back to my original point though...I’m just not sure why Erin’s method is more error-prone than yours.  It’s not.


#51    Guy      (see all posts) 2010/09/25 (Sat) @ 00:49

OK, following up on #47:  the key model predicting HFA includes the home team record-to-date, but not their season record.  Record-to-date is a weak measure of a team’s true talent, especially early in the season.  Given that, it makes complete sense that the Attendance variable is a strong predictor of HFA.  Teams with a high attendance will obviously tend to be stronger teams.  If I have two 15-15 teams, one of which draws 40,000 fans a night and one that draws 20,000, we can say with some confidence that the first team is likely to be much better going forward.  Because the model includes only this weak control for team talent, the attendance variable is essentially “sneaking in” additional information about the home team’s true talent.

At least I think that’s what is going on....

Depot:  I don’t see what’s complicated about MGL’s method.  He is using estimates of the talent of each team’s players (based mainly on past performance) to control for the talent level of the two teams.  Erin used various versions of the two teams’ win% to do the same thing (though not as well).  Once you do that, any major effect from attendance will be obvious.  The likelihood that MGL’s projections were off in such a manner as to precisely obscure a large attendance influence is infinitesimal.  I think you must know that....


#52    Andy      (see all posts) 2010/09/25 (Sat) @ 01:59

Yes, I showed in the earlier thread how reverse causality can influence MGL’s approach.  Not that it does, but that it can.

We should be having a discussion about where the assumptions of specific approaches break down.  There is nothing wrong with IV, per se. And there is nothing wrong with MGL’s approach per se.  There are just possible errors that we should be able to reasonably discuss.

As I’ve said a few times, the best way to analyze Erin and her coauthor’s paper is to try to answer two questions:

First, Is it plausible that HFA is correlated with weather for other reasons? That is, are her instruments truly random? It’s easy to imagine that weather affects home players differently than away players. If so, this might explain her results.

Second, and most importantly in my view, what are the relevant covariances between her instruments and outcomes?  This sounds technical, but it’s really just a call for more data presentation. How much does attendance change when the weather changes? when promotions happen? Graphs of all the first stages and all the reduced forms.  What happens when we change the regression specification slightly? The paper jumps too quickly to the results and omits important robustness checks, at least for my taste. 

If it looks like ALL her instruments have a strong influence on attendance AND there aren’t reasonable stories about how her instruments might impact HFA, then her methodology and results are sound.  If you dispute the results, find a problem with the above two options - those are the only possibilities.


#53    MGL      (see all posts) 2010/09/25 (Sat) @ 02:10

"If I have two 15-15 teams, one of which draws 40,000 fans a night and one that draws 20,000, we can say with some confidence that the first team is likely to be much better going forward.  Because the model includes only this weak control for team talent, the attendance variable is essentially “sneaking in” additional information about the home team’s true talent.”

That is exactly true.  I think that attendance slightly helps to correct my projections just like it corrects a team’s actual w/l record (it seems that when attendance is high I slightly underrate the opposing team and when attendance is low, I overrate it - IOW, attendance is a good - albeit weak - proxy for the talent of the opposing team and it adds to just about anything you do to estimate that talent).

Now, how that affects the results of her study I have no idea.

I highly doubt that weather or anything like that that relates to attendance affects HFA (to any significant degree).  Home and opponent talent, yes.  Anything else, no.


#54    Guy      (see all posts) 2010/09/25 (Sat) @ 07:25

"If it looks like ALL her instruments have a strong influence on attendance AND there aren’t reasonable stories about how her instruments might impact HFA, then her methodology and results are sound.  If you dispute the results, find a problem with the above two options - those are the only possibilities.”

Andy:  What about my point that the performance regressions don’t include the same controls?  Isn’t that a problem?  It doesn’t seem to me that only her instruments are being tested in these regressions.

In any case, there’s a simple way to check this.  The most powerful instrument by far is the weekend/weekday variable.  So we should find that HFA is MUCH larger on weekends than during the week.  If someone wants to check, my guess is that we won’t see that.  If we do, that’s interesting and then we can argue about whether other factors might explain that.


#55    MGL      (see all posts) 2010/09/25 (Sat) @ 09:40

Here are the weekday/weekend numbers, using my same database:

Weekdays, 2007-2009

Avg. att = 29,082

Exp. rpg = 9.37 Actual = 9.38
Exp. vis runs = 4.58 Actual = 4.59
Exp. home runs = 4.79 Actual = 4.79
Exp. hwp = .537 Actual = .550

Weekends (Fri, Sat, Sun), 2007-2009

Avg. att = 34,687

Exp. rpg = 9.38 Actual = 9.35
Exp. vis runs = 4.58 Actual = 4.57
Exp. home runs = 4.79 Actual = 4.79
Exp. hwp = .538 Actual = .549


#56    MGL      (see all posts) 2010/09/25 (Sat) @ 09:46

Andy, I don’t understand why it matters whether the factors that could affect attendance also affect HFA, given that I have found ZERO relationship between attendance and HFA.

If weather affected HFA and weather also affected attendance (which I’m sure it does), then that would show up in my numbers.  Same for anything else.  I mean if the low attendance and the high attendance groups in my data have exactly the same numbers (RS, RA, etc.) as compared to expected numbers, then that is the end of the story, right?

If buckets show no difference then regression will show no difference, assuming that they are measuring the same thing.  In my buckets, it is simply measuring attendance versus home runs scored and runs allowed.  Using the expected numbers as a proxy for talent acts as a control for the only thing that needs to be controlled, which is the average talent of both teams on high and low attendance days.


#57    Andy      (see all posts) 2010/09/25 (Sat) @ 09:55

Awesome. MGL, that is exactly what I was looking for with my 2nd option! This is the type of evidence that directly attacks the paper using its own methodology.

The instrument (weekend vs. weekday) has NO impact on any game outcome. Using instrumental variable techniques with just that variable, we would find ZERO impact of attendance on wins/runs/etc.

As I’ve said, the big hurdle this paper faces is presenting a wealth of evidence that such a strong effect exists. If the effect were really this big, it should be there in many different cuts of the data. Not seeing the result in one of the most obvious comparisons (like MGL did here) is a big problem.
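
A minimal sketch of the single-instrument calculation described here, using the weekday/weekend figures from #55. With one binary instrument, the IV estimate reduces to a ratio of two bucket differences (often called the Wald estimator): the difference in the outcome divided by the difference in attendance. The function name, and the choice of actual minus expected hwp as the outcome, are assumptions made for illustration and are not the paper's specification.

```python
def wald_estimate(outcome_hi, outcome_lo, attendance_hi, attendance_lo):
    """Change in outcome per extra fan implied by a two-bucket instrument."""
    reduced_form = outcome_hi - outcome_lo        # instrument -> outcome
    first_stage = attendance_hi - attendance_lo   # instrument -> attendance
    return reduced_form / first_stage

# Numbers from #55 (2007-2009); outcome is actual minus expected home win pct.
weekend_resid = 0.549 - 0.538
weekday_resid = 0.550 - 0.537
per_fan = wald_estimate(weekend_resid, weekday_resid, 34687, 29082)

# Rescale to a more readable unit: change in win pct per 10,000 extra fans.
per_10k_fans = per_fan * 10_000
print(round(per_10k_fans, 4))
```

With these inputs the estimate is slightly negative and tiny (a few thousandths of win percentage per 10,000 fans), which is the "zero impact" being described.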


#58    MGL      (see all posts) 2010/09/25 (Sat) @ 09:58

Keep in mind that I am using 07-09 and they are using 96-05, for whatever that is worth.


#59    Andy      (see all posts) 2010/09/25 (Sat) @ 10:23

MGL: I completely agree about buckets versus regression. I prefer regression but that’s mostly due to economist training - I think more clearly about regressions. They are fundamentally the same, though.

The issue is how to choose buckets. The nice thing about the paper is that they correctly point out that you should not choose buckets based on attendance and look at runs (or deviation from projected runs). Those buckets are contaminated. That is, God did not randomly allocate attendance into those buckets.  Instead, people chose those buckets, and part of that choice is likely HFA and runs scored. These buckets include reverse causality.

Much better buckets are the buckets in post 55, weekend/weekday game. It is not random which bucket has more attendance, but the choices are plausibly unrelated to HFA. That is the authors’ methodology, which you replicated in effect in #55. Not finding any effect there is very effective criticism.


#60    Guy      (see all posts) 2010/09/25 (Sat) @ 10:44

Andy:  Doesn’t this raise questions then about the authors’ models?  MGL shows us that the main instrument cannot possibly be predicting home win%, and the temperature variable is a minor factor at most.  And yet the instruments appear to predict wins in the model.  So the authors’ attendance variable must be measuring something more than the instruments (perhaps because of the failure to use identical controls).  Right?


#61    Butler Blue      (see all posts) 2010/09/25 (Sat) @ 11:34

@Guy:  Good catch.  The controls don’t appear to be the same.  It’s not just for home record but also for visiting record if I read it right.  If this is the case in their regressions (rather than just a reporting error), then this is a clear methodological mistake…should’ve been caught by all of us (and the referee…) in 2 seconds.  My bad.

@MGL: I agree with Andy.  The results in #55 seal the deal.  This is a big place where the IV is going wrong.  Here’s what the methods in the paper claim and assume but without the jargon:

Verifiable Claim 1: Weekend implies higher attendance

Verifiable Claim 2: Weekend implies more HFA

Assume: Weekend only relates to HFA through its relationship with attendance

If all three of these hold, it logically MUST be true that higher attendance causes HFA.  All of us believe that Claim 1 is true.  But you have shown that Claim 2 is invalid, so regardless of what we think about the assumption, the argument is bad.


#62    Andy      (see all posts) 2010/09/25 (Sat) @ 11:47

Guy, I am not seeing where she changed controls. Can you be more specific?

If the question is about whether you should include the same controls in both the first and second stage of IV, then the answer is “yes, you absolutely must.”


#63    Butler Blue      (see all posts) 2010/09/25 (Sat) @ 11:53

@Andy:  Now I’m not sure.  They list “Home Average to Date” and “Visitor Average to Date” in the first stage (table 3) and “Home Record to Date” and “Visitor Record to Date” in the second stage (table 4).  Now I’m thinking this is just labeling the same variable different ways.  So, maybe not the obvious error I thought it was.  Depends on what they actually did...I’d expect they probably did it right but it’s not totally clear.


#64          (see all posts) 2010/09/25 (Sat) @ 12:00

In other news, I was serious about getting better interaction between academics and established sabermetricians.  If anyone’s interested in informally reviewing a draft of a paper, send me an e-mail.  I have a paper ready.  Econometric nerds and sabermetricians alike are welcome. =)


#65    Butler Blue      (see all posts) 2010/09/25 (Sat) @ 12:26

Extending #63:

So now we know the effect isn’t coming through weekend.  Then, it must either be coming through:

a.) weekday vs. weeknight
b.) temperature

OR

c.) the controls matter a LOT

I don’t buy C for almost any IV study unless there’s a strong, stated reason.  Not true here.  B seems immediately problematic, discussed above.  For A, there’s probably some difference in HFA between get-away games during the day and a regular weeknight game.  I would guess that this is really what’s going on...I can see this effect existing and they’re channeling this effect.

Any possibility we could see splits of day game vs. night game during the week?


#66    MGL      (see all posts) 2010/09/25 (Sat) @ 12:44

Here is the weekend/weekday data for 1996-2005, the years of their study:

Overall 23,728 games

rpg: 9.67
v rpg: 4.78
h rpg: 4.89
hwp: .5367

Weekend (Fri, Sat, Sun) 11,339 games

rpg: 9.67
v rpg: 4.79
h rpg: 4.91
hwp: .538

Weekday (M-R) 12,389 games

rpg: 9.65
v rpg: 4.77
h rpg: 4.88
hwp: .535

So, without controlling for the quality of each team on the field that day, we have .03 more runs scored by the home team on the weekends, with about 5,500 more average attendance.

Since more runs are scored on weekends in general probably because there are more day games, and we control for that, the home team seems to score .01 rpg more on the weekends than on the weekdays, and hence a higher home wp, although we would expect only a .001 higher wp while we find a .003 higher wp.

Anyway, not a whole lot of difference here.  To create .5 extra runs, it looks like it would take an extra 70,000 fans without controlling for the overall higher run scoring on the weekends (assuming that the home team scores .03 more runs) and over 250,000 extra fans if we control for overall run scoring (and assume that the weekend only adds .01 runs to the home team).


#67    Guy      (see all posts) 2010/09/25 (Sat) @ 13:24

BB:  I think the answer is likely “C”, the controls.  As you note, in model 1 the authors appear to be including Home attendance, while in model 2 they use Home record.  I think it’s very unlikely that in model 1 they are actually using Home team record (mislabeled).  First, I doubt you would be able to get an R^2 of .76 without using mean home attendance.  Second, home record can’t possibly be only half as powerful as visitor record in predicting attendance.  So as I suggested earlier, I think in the 2nd model their Attendance variable is capturing much more than just the two instruments. 

Also, there aren’t enough weekday games to matter anymore, and there is just no conceivable way that temperature is driving this (look at the coefficients in table 3).  If there isn’t a weekend relationship, there’s nothing here.


#68          (see all posts) 2010/09/25 (Sat) @ 13:47

@Guy: Yeah, I guess it would take a 65 degree swing in temperature (6 s.d.!!) to equal the impact of weekends.  It’s either got to be something coming through the inconsistent controls or HFA must be huge during weekday games.

In any case, as a Cubs fan I take exception to you saying there are barely any weekday games anymore.  =) Maybe Wrigley day games provide an exceptional HFA.  That would explain the results. =)

Yeah, probably the controls.


#69    Depot      (see all posts) 2010/09/25 (Sat) @ 13:56

I agree that with IV you rarely want to see controls matter.  The main exception to this is fixed effects.  The authors include ballpark fixed effects so they’re really looking “within-ballpark,” not comparing cold cities to warm cities.  The year fixed effects seem important too.  (And, really, they probably should have used ballpark*year interactions).  I think I’d be ok with these fixed effects mattering because it means the experiment is...how does a change in temperature/attendance affect HFA in Chicago?  Instead of...is there a correlation between HFA and playing in a cold city?

I also agree with the sentiment that it would have been nice to see results using only one instrument at a time.


#70    Butler Blue      (see all posts) 2010/09/25 (Sat) @ 14:05

66:  Exactly.  This analysis is exactly what IV is supposed to do: the comparison of weekend to weekday ‘buckets’ to get a weekend-weekday run differential.  Then change the units using the weekday-weekend attendance differential.  This is exactly IV logic.

So they say that half a stadium of fans => +0.5 run differential to the home team.  So, ~20,000 fans for half a run.  You replicate it and get 250,000 fans for the same run differential.  They’re off by an order of magnitude.

With your numbers, then the difference between an empty and a full stadium (say 40,000 capacity) is .08 runs => ~.008 wins.  Over 80 home games, two thirds of a win.
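
The unit conversions above can be sketched as a quick back-of-envelope check. The 10 runs per win figure is an assumption, implied by the ".08 runs => ~.008 wins" step; the function names are hypothetical.

```python
RUNS_PER_WIN = 10.0  # assumption: implied by ".08 runs => ~.008 wins" above

def runs_from_fans(extra_fans, fans_per_half_run):
    """Home run differential gained from extra_fans, given fans needed for 0.5 runs."""
    return 0.5 * extra_fans / fans_per_half_run

def season_wins(runs_per_game, home_games=80):
    """Convert a per-game run edge into wins over a season of home games."""
    return runs_per_game / RUNS_PER_WIN * home_games

full_vs_empty = runs_from_fans(40_000, 250_000)  # replication figure from #66
paper_claim = runs_from_fans(40_000, 20_000)     # "~20,000 fans for half a run"

print(round(full_vs_empty, 3), round(season_wins(full_vs_empty), 2))
print(round(paper_claim, 3), round(season_wins(paper_claim), 2))
```

Under these assumptions a full versus empty park is worth about 0.64 wins per season with the replicated numbers, against about 8 wins with the paper's figure.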


#71    MGL      (see all posts) 2010/09/25 (Sat) @ 14:06

If you wanted to investigate any possible relationship between att. and HFA, why not just start with a regular ole’ regression of att on hwp (and also home rpg), with dummy variables for year and home team?


#72    Andy      (see all posts) 2010/09/25 (Sat) @ 14:30

For what it’s worth, here is how I would write this paper:

1. Show raw correlations and simple OLS. (MGL’s suggestion #71). It is hard to think about IV without understanding the baseline.

2. Discuss why these numbers have potential problems in telling us about causality. It runs (potentially) both directions.

3. Discuss my instruments. Explain why we think each one is plausibly uncorrelated with HFA but correlated with attendance.

4. Show lots of plots of both the instruments + attendance and instruments + HFA. This is the story of the causality in a nutshell.

5. Show 10-20 different IV regressions with slightly different specifications and similar results.


#73    Andy      (see all posts) 2010/09/25 (Sat) @ 14:32

@Guy #67:

I agree with BB that there is likely a mislabeling of tables. There is no way that “home average to date” is an attendance measure. It surely is their average record (W/(W+L)). Further, even if the variables switched and Table 3 and Table 4 are different specifications, I strongly suspect that Table 4 was created in one Stata (or equivalent) command, which would make your concern about variable switching moot in interpreting Table 4.


#74    MGL      (see all posts) 2010/09/25 (Sat) @ 15:33

#70, I think any difference (and a small one) in HFA you find on the weekends is going to be due to day games.  Day games historically have a higher HFA, likely due to the road team being tired, especially after a night game, going out on the town at night, etc.  The last few years (not in the 96-05 database obviously), the day game HFA (as well as the overall HFA for all games) has been extremely large.  Some people have speculated that it is due to the players no longer using greenies to pep them up in day games after night games (or just day games alone).  And that the visiting team players are ones who usually need the greenies, as they are staying in hotels, partying at night, etc.

If I can, later today, I will re-run the weekday/weekend numbers, but control for day/night games.


#75    Guy      (see all posts) 2010/09/25 (Sat) @ 15:40

Andy, then how do you explain why visitor record is a better predictor of attendance than home record?  Doesn’t that seem unlikely?  (The R^2 also seems implausible without any variable that captures the team’s average attendance%, but maybe the ballpark fixed effect captures a lot of that?)

But let’s say you’re right.  What do you think explains the apparent (but nonexistent) relationship between attendance and performance?  There’s clearly an error; the only question is where.  This is an autopsy, not a diagnosis.


#76    Depot      (see all posts) 2010/09/25 (Sat) @ 15:51

Guy,

There’s already a measure of home team performance implied by the ballpark fixed effects.  Thus, the visitor’s season record is the only measure of visitor team quality while the home team’s “team quality” is already accounted for by fixed effects.  The R^2 is perfectly normal for a specification with fixed effects.  What you want to see is the partial R^2 relating to the instruments only.


#77    MGL      (see all posts) 2010/09/25 (Sat) @ 16:02

96-05

Weekday night games (9933)

hwp: .536
v rpg: 4.74
h rpg: 4.86

Weekday day games (2456)

hwp: .533
v rpg: 4.89
h rpg: 4.99

Weekend night games (6058)

hwp: .541
v rpg: 4.77
h rpg: 4.91

Weekend day games (5281)

hwp: .534
v rpg: 4.81
h rpg: 4.91

Looks like I was wrong about day games.  Of course without knowing the personnel, it is hard to infer anything from these numbers.  And day and night games probably have different average attendance numbers…


#78    Guy      (see all posts) 2010/09/25 (Sat) @ 16:07

Depot:  thanks, that makes sense.  But I still have trouble making sense of all the Table 4 coefficients for predicting home win %.  Why would visitor record-to-date be highly significant and powerful, while visitor season record is not even significant?  Home record-to-date is also highly significant, despite the fixed effects.  All of our knowledge about the hot hand effect tells us this can’t be right. 
I really don’t like a variable like record-to-date whose explanatory power changes over the course of the season (and roughly inverse to game temperature, as it happens).


#79    Depot      (see all posts) 2010/09/25 (Sat) @ 16:39

Coefficients on controls in these models are always a little hard to interpret for the reasons you’re alluding to - they might simply be picking up other things.  The study isn’t designed to identify those coeffs causally.  I’m not 100% thrilled with the controls.  My intuition is that it doesn’t matter, but it just bothers me when the controls include part of the effect (if you win a game, the visitor’s season W-L record is affected, pitcher ERAs are endogenous).  I would prefer it if they hadn’t included those but, again, I’m guessing this is a relatively small problem.  I could be convinced otherwise.


#80    MGL      (see all posts) 2010/09/25 (Sat) @ 17:15

Sorry for the sarcasm, but…

Regression - simple and easy to understand…


#81          (see all posts) 2010/09/25 (Sat) @ 18:13

@MGL:  Thanks for running the numbers.  And yes, this is the least straightforward IV I’ve ever seen.

@Everyone:  HFA advantage is at its worst during weekday-day games.  Hmmm… So they can’t be identifying off of that.  We already know that the effect from weekend vs. weekday isn’t big enough to do the job.  That only leaves temperature.

MAYBE the temperature instrument is the problem.  We know from the first stage that its correlation with attendance is weak.  Even if temperature has only a small direct impact on HFA, then the weak first stage is going to blow this up into a big effect.  Say we only used temperature as an instrument:

A = b1*T + error
R = b2*T + error

We know that the IV using only temperature is b2/b1.  But b1 is known to be really small.  They’re channeling the direct effect of temperature and then it’s getting blown up.
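
A rough sketch of that blow-up, assuming the reduced-form coefficient splits as b2 = beta*b1 + d, where beta is the true effect of attendance and d is a direct effect of temperature on the run differential. Both the decomposition and every number below are hypothetical; nothing here comes from the paper's tables.

```python
def iv_estimate(beta_true, direct_effect, first_stage):
    """Single-instrument IV estimate b2/b1 when the instrument also acts directly.

    A = b1*T + error   (first stage: attendance on temperature)
    R = b2*T + error   (reduced form: run differential on temperature)
    with b2 = beta_true*b1 + direct_effect.
    """
    b1 = first_stage
    b2 = beta_true * b1 + direct_effect
    return b2 / b1

# Hypothetical values: attendance has NO true effect on runs (beta_true = 0),
# but one degree of temperature shifts the home run differential by 0.001.
for b1 in (500.0, 50.0, 5.0):  # fans gained per degree; smaller = weaker instrument
    est = iv_estimate(beta_true=0.0, direct_effect=0.001, first_stage=b1)
    print(b1, est * 20_000)    # implied runs per 20,000 extra fans
```

The implied attendance effect is d/b1, so each tenfold drop in the first-stage coefficient inflates the spurious estimate tenfold, even though the true effect was set to zero.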


#82    MGL      (see all posts) 2010/09/25 (Sat) @ 18:25

"HFA advantage is at its worst during weekday-day games.”

Let’s not confuse hwp with HFA.  You have to know the personnel on the field for the home team and the opposition to make any inferences about HFA.  It might be that home teams tend to rest their starters in those weekday day games, or some such thing…


#83    Depot      (see all posts) 2010/09/25 (Sat) @ 18:27

MGL, but I still have no idea what your method does.  Yes, I get the basic gist.  But I have no idea how the projections are done, how they’re turned into runs/game and win%, etc.  Maybe the projections don’t handle certain types of teams well, maybe your adjustment for pitcher quality doesn’t take into account that great pitchers handle great lineups disproportionately well, maybe there are interactive effects for having 2 sluggers in the lineup, etc.  And attendance might respond to these exact things.  I’m not saying any of those specific things are true but there are a host of stories along those lines.

To put this slightly better...let’s pretend we had 10,000 situations where we saw the same lineup, same ballpark, same pitcher twice in the same season.  We could just compare the difference in outcomes for each situation to the difference in attendance.  That would be your perfect experiment (you could put attendance in buckets - that’s fine too).  Andy in the other thread would likely (uh, sorry if I’m getting this wrong) say, “You’re comparing Cliff Lee facing the Yankees June 1 to Cliff Lee facing the same Yankees lineup Oct 1.  There’s likely a reason more people showed up to the one game than the other and we don’t know what that is.” I think I agree with this.  And, so, I’m not sure why I should find your results “easy to understand” since I have no idea what’s driving them.

When I say regression is “better,” I’m making a pretty trivial point.  By taking means, you’re always assuming that your buckets are orthogonal to X=control variables and an error term.  With regression, I can do the exact same thing but with a weaker assumption - that the buckets are orthogonal to only the error term.  With IV, I can even let the buckets be correlated with the error term.  It’s much easier for me to understand this.  Using the temperature idea...it’s much easier for me to understand the underlying experiment when ballpark fixed effects are included - we’re looking at the impact of changes in temperature on outcomes/attendance.  Without those fixed effects (using only buckets), I would be confused - some of the variation is coming from variation within-ballpark and some variation is coming from the fact that some ballparks are in warmer weather.  Which one is driving the results? _That_ is confusing.


#84    Butler Blue      (see all posts) 2010/09/25 (Sat) @ 18:42

@MGL: Right, shouldn’t confuse HFA and home win pct.  Exactly those types of responses by managers are the reason why I’m skeptical of the instruments in this paper.  That aside, I’m interested in figuring out how in the world the authors got their result.  The numbers you give seem to conflict with that.

*

In the interest of not being a lazy bum relying only on MGL, here’s some numbers I ran to look at temperature.  It’s 2002-2009, but that’s all I have on hand.  In the interest of clarity, I’ll use buckets. =) The HIGH bucket includes all games where temperature is greater than the average for that park in that year (read: parkXyear fixed effects; I also tried it just by site alone and it’s identical).  The LOW bucket has all other games:

HIGH TEMP
Pct: .543
Home: 4.92
Vis: 4.73

LOW TEMP
Pct: .548
Home: 4.65
Vis: 4.49

So, the home team does (relative to opponents) better when it’s cold.  This blows my theory for where the authors’ results come from. 

I’ll get back in a second with temperature buckets that don’t reflect the seasons… This is off from the paper’s setup but should be more interesting.
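
The HIGH/LOW bucketing described above (warmer or cooler than that park's average for the year) can be sketched as follows. The record layout, field names, and sample games are all made up for illustration; this is not the 2002-2009 data or necessarily the code used for it.

```python
from collections import defaultdict

def temperature_buckets(games):
    """Label each game HIGH if warmer than its park-year average, else LOW."""
    temps = defaultdict(list)
    for g in games:
        temps[(g["park"], g["year"])].append(g["temp"])
    means = {key: sum(v) / len(v) for key, v in temps.items()}
    return ["HIGH" if g["temp"] > means[(g["park"], g["year"])] else "LOW"
            for g in games]

def home_win_pct(games, labels, bucket):
    """Home winning percentage among games carrying the given label."""
    wins = [g["home_win"] for g, lab in zip(games, labels) if lab == bucket]
    return sum(wins) / len(wins)

# Made-up records, for illustration only.
games = [
    {"park": "CHN", "year": 2005, "temp": 50, "home_win": 1},
    {"park": "CHN", "year": 2005, "temp": 80, "home_win": 0},
    {"park": "ARI", "year": 2005, "temp": 85, "home_win": 1},
    {"park": "ARI", "year": 2005, "temp": 95, "home_win": 1},
]
labels = temperature_buckets(games)
print(labels)
```

Note that the 85-degree game lands in LOW while the 80-degree game lands in HIGH, because each is compared only with its own park and year. That is the sense in which the comparison is "within-ballpark" rather than cold cities versus warm cities.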


#85    Butler Blue      (see all posts) 2010/09/25 (Sat) @ 18:51

I ran the numbers with the buckets defined relative to the monthXyearXballpark average temperature and it’s exactly the same.  So, home teams win slightly less in hot weather, even when we focus on temporary changes within a month in a particular park.  This is totally unrelated to the paper, now, but any intuition for why this would be the case?


#86    JEH      (see all posts) 2010/09/25 (Sat) @ 19:30

"So, home teams win slightly less in hot weather, even when we focus on temporary changes within a month in a particular park.  This is totally unrelated to the paper, now, but any intuition for why this would be the case?”

Just a guess, but a lower run scoring environment in cooler weather may make the last at-bat worth more (more close games) . . . perhaps because of all of those visiting managers saving their closer for a save situation. :)


#87    MGL      (see all posts) 2010/09/25 (Sat) @ 19:38

Using my projections for 07-09, here is the data for day/night and weekday/weekend:

For weekday day games (682 games)

Expected rpg = 9.32
Actual rpg = 9.24
Exp vis rpg = 4.55
Act. vis rpg = 4.43
Exp. home rpg = 4.77
Act. home rpg = 4.81
Exp. home wp = .540
Act. home wp = .550

For weekend day games (1354 games)

Expected rpg = 9.28
Actual rpg = 9.36
Exp vis rpg = 4.51
Act. vis rpg = 4.50
Exp. home rpg = 4.76
Act. home rpg = 4.86
Exp. home wp = .541
Act. home wp = .578

For weekday night games (2805 games)

Expected rpg = 9.34
Actual rpg = 9.36
Exp vis rpg = 4.58
Act. vis rpg = 4.61
Exp. home rpg = 4.77
Act. home rpg = 4.76
Exp. home wp = .536
Act. home wp = .544

For weekend night games (1541 games)

Expected rpg = 9.35
Actual rpg = 9.33
Exp vis rpg = 4.59
Act. vis rpg = 4.67
Exp. home rpg = 4.77
Act. home rpg = 4.67
Exp. home wp = .535
Act. home wp = .514

As you can see, a large HFA in day games over the last 3 years, hence the “greenies” speculation…


#88    MGL      (see all posts) 2010/09/25 (Sat) @ 19:42

JEH, what is the standard error for the difference between the .543 and .548?  Just wondering how significant that difference is, although I am not a slave to a 2 or 2.5 sigma “statistical significance.”


#89    JEH      (see all posts) 2010/09/25 (Sat) @ 20:25

MGL/88

I get .005 every time I subtract the larger from the smaller so I am guessing the standard error is 0. :) The standard error on the estimates themselves will be small as well, but I would need to know how the estimates were calculated to be less vague.

My intuition is that it’s probably not significant because there is no indication (that I saw) that expected winning percentage is included or how the games were split into hot/cold (if the latter is by absolute temperature, then certain teams, based on geographic location, account for a large share of certain samples, and if by relative temperature, then we have blurred hot/cold distinctions).

On the other hand, ButlerBlue (#85) seemed to think it might be significant and he knows how the data was used so I offered a guess.


#90    MGL      (see all posts) 2010/09/25 (Sat) @ 22:34

"I get .005 every time I subtract the larger from the smaller so I am guessing the standard error is 0.”

A good one!

And I asked the wrong person.  I meant to ask BB, and it should be based on the binomial (the sum of the variances) given the number of games in each bucket.


#91    Guy      (see all posts) 2010/09/26 (Sun) @ 12:57

"So, home teams win slightly less in hot weather, even when we focus on temporary changes within a month in a particular park.  This is totally unrelated to the paper, now, but any intuition for why this would be the case?”

BB:  Temperature will serve as a rough proxy for month unless you control for that.  It could be that HFA is slightly higher early in the season, as there’s research from other sports suggesting HFA is greater when a player first visits an opposing stadium.  That will tend to happen more often in April/May than rest of season.  But I’d expect this to be a small factor at most.


#92    MGL      (see all posts) 2010/09/26 (Sun) @ 18:29

1980-2009 (regular season only)

March, April and May

20,633 games
hwp: .5390  v rpg: 4.536  h rpg: 4.656

June-October

44,609 games
hwp: .5400  v rpg: 4.525  h rpg: 4.666

June-August (warm months)

33,726 games
hwp: .5391  v rpg: 4.554  h rpg: 4.686

March, April, May, September, and October (cool months)

31,516 games
hwp: .5404  v rpg: 4.502  h rpg: 4.639

Not much of a difference in all of these numbers.

However, in the post-steroid and post-greenies era, who knows?


#93    Butler Blue      (see all posts) 2010/09/26 (Sun) @ 18:39

@MGL: Yeah, it’s not significant.  It’s half a s.d.  And...I fail stats 101.

@Guy: Yes, of course temperature changes with month.  See 85.  In this case, it doesn’t change the numbers noticeably.


#94    MGL      (see all posts) 2010/09/26 (Sun) @ 19:21

"It’s half a s.d. “

Less than that.  (Sum the binomial variances for around 30,000 games, and the square root of that is the SD of the difference between the two hwp - around .0041.)
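
A minimal sketch of that binomial calculation. The hwp values are the two from #84; the bucket size of 30,000 games each is an assumption, since the thread does not give the actual game counts, and the function name is hypothetical.

```python
from math import sqrt

def se_of_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    # Each bucket's binomial variance is p*(1-p)/n; variances of independent
    # estimates add, and the square root of the sum is the SD of the difference.
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# hwp values from #84; 30,000 games per bucket is an assumed size.
se = se_of_difference(0.543, 30_000, 0.548, 30_000)
print(round(se, 4))
```

With 30,000 games in each bucket this comes out to about .0041, matching the figure quoted above; smaller buckets would give a larger standard error.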


#95    Non Sequitur      (see all posts) 2010/09/27 (Mon) @ 15:26

Did you ever hear the story about the scientist who trained a flea to jump at the sound of a bell? Some scientist trained a flea to jump at the sound of a bell. Once the flea was well-trained, the scientist pulled a leg off the flea. The flea still jumped at the sound of a bell. Off came another leg. It still jumped. A third leg. Jumped. The final leg. And then, when the scientist sounded the bell, the legless flea didn’t jump. The scientist concluded that the flea was deaf. Now, if the correlation between bell-soundings and legless non-jumps is R^2 = 1, does regression confirm the scientist’s conclusion? If not, pray tell, how could the regression procedure be improved?

