THE BOOK--Playing The Percentages In Baseball


Tuesday, March 02, 2010

Attendance and winning

By Tangotiger, 02:27 PM

Sky takes a look.  A similar study was done a year or three ago in By The Numbers.  If Phil is around, maybe he can link to it.


#1    Jeff Z      2010/03/02 (Tue) @ 16:03

Looks like the same results as this person got:

http://armchairgm.wikia.com/Predicting_MLB_Attendance:_Multiple_Regression_Analysis_of_MLB_Attendance_and_Ticket_Prices


#2    Tangotiger      2010/03/02 (Tue) @ 17:30

Found it:

http://www.philbirnbaum.com/btn2003-08.pdf

Article by Darren Glass.  Here’s how he started it:

Methodology
I looked at all seasons from 1973 until present. In particular, I looked at the correlation coefficients between the following variables:
• Average home attendance per game (ATT)
• Home attendance per game divided by Average Home attendance over all teams (to normalize for nation-wide trends) (ATT/AVE)
• Final place in divisional standings (PLACE)
• Winning Percentage. (WIN)
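As a rough illustration of the quoted methodology, the sketch below normalizes attendance by the league average for each season (ATT/AVE) and correlates it with winning percentage (WIN). The team-seasons are made-up numbers, and the `pearson` helper is a hypothetical name introduced here; none of this is taken from Glass's article.

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical team-seasons: (year, team, avg home attendance per game, winning pct)
rows = [
    (2008, "A", 30000, 0.580),
    (2008, "B", 22000, 0.450),
    (2008, "C", 26000, 0.500),
    (2009, "A", 33000, 0.600),
    (2009, "B", 21000, 0.430),
    (2009, "C", 27000, 0.510),
]

# League-average attendance for each season
league_avg = {}
for year in {row[0] for row in rows}:
    league_avg[year] = mean([att for y, _, att, _ in rows if y == year])

# ATT/AVE: attendance relative to that season's league average
att_over_ave = [att / league_avg[year] for year, _, att, _ in rows]
win = [w for _, _, _, w in rows]

r = pearson(att_over_ave, win)
print(f"corr(ATT/AVE, WIN) = {r:.3f}")
```

Dividing by the season average removes league-wide attendance trends before the correlation is taken, which is the stated purpose of the ATT/AVE variable.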

I think the more recent seasons are key, by the way.


#3    2010/03/02 (Tue) @ 17:52

Interesting couple of links.  I do like my approach better, but I guess I am biased :)

Tom, I limited the data to 1981 and onward, and all of the results stayed pretty much constant.  Coefficients changed a little, but all the conclusions remained the same.  You also get similar results if you extend back to 1901.  The forces that bring folks to the park have remained remarkably consistent.


#4    Tangotiger      2010/03/02 (Tue) @ 18:05

Sky: whoah, very cool.

Instead of logs, what happens if you convert all the attendance figures to an index relative to that year, so that average = 100 each year?

I have a good reason to not like logs, which I’ll try to articulate later.
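The indexing idea can be sketched as follows. The function name `index_to_year` and the attendance figures are hypothetical, chosen only to show that two seasons with very different raw levels end up on the same scale once each is indexed to its own average.

```python
from statistics import mean

def index_to_year(attendance_by_team):
    """Rescale one season's attendance so the league average is 100."""
    league_avg = mean(attendance_by_team.values())
    return {team: 100 * att / league_avg
            for team, att in attendance_by_team.items()}

# Hypothetical per-game attendance in two very different eras
season_1960 = {"A": 12_000, "B": 15_000, "C": 18_000}
season_2009 = {"A": 24_000, "B": 30_000, "C": 36_000}

index_1960 = index_to_year(season_1960)
index_2009 = index_to_year(season_2009)

print(index_1960)
print(index_2009)
```

In this made-up case the raw numbers double between the two seasons, but the indexed values are the same in both, so a team 20% above average looks the same in either era.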


#5    2010/03/02 (Tue) @ 18:24

I’m not a big fan of regressing on a ratio like that.  Also, I can’t create that index so easily right now, so that will have to wait.

I did try using the raw attendance figures on the 1982-2009 data.  That’s not ideal, but I got pretty similar results, with the added bonus that coefficients are easier to interpret.

Why don’t you like logs?  It’s a quite useful transformation.


#6    Tangotiger      2010/03/02 (Tue) @ 19:15

Sky, for now, let me just say that you end up trying to minimize the error of the log.  And that means extreme data points will have the same error distance as non-extreme data points, just because the log-fit is better.

But in reality, we want to minimize the squared error of the actual data points.
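One way to see the distinction being drawn here, using made-up numbers: two predictions that are each 10% too high have identical squared error on the log scale, but very different squared error in actual fans. The (actual, predicted) pairs below are hypothetical.

```python
import math

# Hypothetical (actual, predicted) attendance per game; both predictions are 10% high
pairs = [(10_000, 11_000), (40_000, 44_000)]

log_sq_errs = []
raw_sq_errs = []
for actual, predicted in pairs:
    log_sq_errs.append((math.log(predicted) - math.log(actual)) ** 2)
    raw_sq_errs.append((predicted - actual) ** 2)

for (actual, predicted), log_err, raw_err in zip(pairs, log_sq_errs, raw_sq_errs):
    print(f"actual={actual} predicted={predicted} "
          f"log sq. error={log_err:.5f} raw sq. error={raw_err}")
```

A fit on the log scale treats the two misses as equally bad; a fit on the raw scale treats the miss on the larger figure as 16 times worse. Which of those is desirable is exactly what the thread is debating.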

I’ll give you an example tomorrow if you can’t read my mind…


#7    Charles Saeger      2010/03/02 (Tue) @ 19:40

One thing I always wanted to see was some sort of table showing attendance by day of the week, month, day or night, divisional foe, opponent record and so on. I’ve been curious to see if interleague play actually helps attendance—since those games are played while school is out, they’d have a higher attendance expectation.


#8    2010/03/02 (Tue) @ 20:14

I left the following comment over at Baseball Analysts:

First, the analysis is interesting, and the results are more-or-less consistent with a lot of published attendance studies. But it’s not clear whether your analysis is a multiple regression analysis or a series of bivariate analyses. The results can change as you include more explanatory variables, for any number of reasons. (For example, multicollinearity between the explanatory variables.)

Second, the “ticket price effect” is complicated, because determining an appropriate ticket price is not straightforward. There are, essentially, two approaches. The first takes annual ticket revenue and divides it by annual attendance (defined as tickets sold, not people who actually show up). The second constructs a weighted average ticket price, where the weights are the percentage of seats available at a particular price, regardless of whether they sell or not. These will almost certainly yield different “ticket prices.” I believe the second approach is theoretically preferable, but there’s a lot of disagreement about this.

The first approach (ticket price as average ticket revenue per attendance), for example, is likely to yield “ticket prices” that decline as attendance rises, if fans tend to buy the best available seats first. So the declining-ticket-price-is-associated-with-higher-attendance finding is NOT lower-ticket-prices-lead-to-higher-attendance, BUT rather, higher-attendance-leads-to-lower-priced-tickets-being-purchased...the causation is from attendance to ticket prices, not from ticket prices to attendance.
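A small worked sketch of the two "ticket price" definitions, under the stated assumption that fans buy the best available seats first. The price tiers, seat counts, and function names are all hypothetical.

```python
# Hypothetical price tiers, best seats first: (posted price, seats available)
tiers = [(50.0, 10_000), (25.0, 20_000), (10.0, 10_000)]

def capacity_weighted_price(tiers):
    """Second approach: weight each price by the share of seats offered at it."""
    total_seats = sum(seats for _, seats in tiers)
    return sum(price * seats for price, seats in tiers) / total_seats

def revenue_per_ticket(tiers, tickets_sold):
    """First approach: ticket revenue divided by tickets sold,
    assuming the most expensive seats sell out first."""
    remaining = tickets_sold
    revenue = 0.0
    for price, seats in tiers:
        sold = min(seats, remaining)
        revenue += price * sold
        remaining -= sold
    return revenue / tickets_sold

print("capacity-weighted price:", capacity_weighted_price(tiers))
for sold in (10_000, 30_000, 40_000):
    print(sold, "tickets sold ->", round(revenue_per_ticket(tiers, sold), 2))
```

With posted prices held fixed, the revenue-per-ticket figure falls as more tickets are sold, while the capacity-weighted figure does not move. That is the reverse-causation concern described above, shown on invented numbers.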

The other ticket price weirdness is that a lot of published studies find that higher attendance is associated with higher, not lower, ticket prices…


#9    2010/03/02 (Tue) @ 22:24

The one factor that doesn’t seem to be mentioned (or even considered) is market size.  Granted, part of the problem is getting historical market sizes (a good example is the change in Cleveland and Pittsburgh actual population).  I guess my assumption would be that all things being equal, KC (29th in market size) will have a much harder time drawing than a mid-market like a Houston or Texas.


#10    Charles Saeger      2010/03/03 (Wed) @ 11:57

Oh, there was an article dealing with this topic in one of the last Big Bad Baseball Annuals. The author (I think it was Brock Hanke) looked at opening day payroll and found that to be a big deal as well.


#11    Tangotiger      2010/03/03 (Wed) @ 12:05

Interesting.  Payroll could be used as a proxy for “hope”, over and above whatever their wins were the past 2 years.


#12    Peter Jensen      2010/03/03 (Wed) @ 12:07

The market size problem that Tom Kniker mentioned in Post#9 also brings up the problem of confusing correlation with causation.  A larger market size in all probability leads to larger average attendance which likely leads to more available funds for investing in better players which may lead to more playoff appearances and World Series wins.  Such a scenario would negate some of the causal conclusions that Sky is drawing from the positive correlations in his regression analysis.


#13    2010/03/03 (Wed) @ 12:08

Thanks for the comments…

Tango/6, When you do the transformation, you are trying to reduce heteroskedasticity in the data.  If you don’t do that, the data with a lot of variability (modern data) will get far too much weight compared to data with a small amount of variability (data from 1910).

You’re right, that you’re no longer minimizing the square of the actual points, but when the variability varies a lot, you don’t actually want to do that.  For regression to work properly, you need the SD of the error to be constant across all points of your data, and the log transformation does this.
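A minimal sketch of the variance-stabilizing argument, assuming the team-to-team spread is roughly proportional to the attendance level. The era levels and multipliers below are made up, and the example looks at the spread of the data rather than at regression residuals.

```python
import math
from statistics import pstdev

# Hypothetical team-to-team multipliers around an era's typical attendance
multipliers = [0.8, 0.9, 1.0, 1.1, 1.25]

early = [5_000 * m for m in multipliers]    # low-attendance era
modern = [30_000 * m for m in multipliers]  # high-attendance era

# Spread on the raw scale grows with the level
sd_raw_early = pstdev(early)
sd_raw_modern = pstdev(modern)

# Spread on the log scale does not depend on the level
sd_log_early = pstdev([math.log(x) for x in early])
sd_log_modern = pstdev([math.log(x) for x in modern])

print(f"raw SD: early={sd_raw_early:.0f} modern={sd_raw_modern:.0f}")
print(f"log SD: early={sd_log_early:.4f} modern={sd_log_modern:.4f}")
```

On the raw scale the modern era's spread is six times larger, so it would dominate a least-squares fit; on the log scale the two eras have the same spread. Real data will only approximately follow the proportional-spread assumption.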

Donald/8, The analysis is just one multi-variate model - sorry if that wasn’t clear - otherwise the analysis would be worthless for reasons you mention.  As for ticket prices, they shouldn’t have a dramatic effect (see the link in #1 showing that they are not statistically significant).  As for finding the optimal ticket price, that’s a whole other issue.

Tim/9, Definitely market size would be a factor.  Market size should be captured in the team “brand” random variable, however, so the omission shouldn’t confound the results.  It would be interesting to look at that on its own.  I would suspect your theory is right…


#14    2010/03/03 (Wed) @ 12:12

Peter, That would be a problem if I didn’t control for team.  However, by including the team as a random variable, we avoid this possible confounder.


#15    Tangotiger      2010/03/03 (Wed) @ 12:22

“For regression to work properly, you need the SD of the error to be constant across all points of your data, and the log transformation does this.”

Which is why I suggested indexing.

***

Thanks for reminding me, as I will give you an example of why logs don’t work.  After lunch.


#16    Fargo      2010/03/03 (Wed) @ 14:27

“Making the playoffs the year before raises a .500 team’s attendance by about 3,000 fans per game - a major boost. Obviously making the playoffs raises hype around the team, and this appears to manifest itself in the form of increased attendance.”

Making the playoffs the previous year, winning the league, winning the world series all matter (in a study I did some years ago). But it’s probably not just a matter of “raising hype.” It’s a matter of raising season ticket sales. Those sales are probably driven much more by previous season’s record (specifically making playoffs, etc.) than by current. 

So a refined analysis—if data is available—would look for lagged effects of wins (making playoffs, etc.) on season ticket sales and for in-season effects of current wins on walk-up or game-day sales.


#17    Charles Saeger      2010/03/03 (Wed) @ 14:40

TT/11: Yes. The equation he had included wins this year, wins last year, and opening day payroll.


#18    Guy      2010/03/03 (Wed) @ 15:10

Sky:
You report the impact of each variable as a number of additional fans per game generated.  But to be slightly more precise, don’t your coefficients really tell you the percentage impact?  That is, when you say making the prior year’s playoffs is worth 3,000 fans per game, your equation is actually saying it boosts attendance by 13%.  So it might be worth just 2,000 fans to the Pirates, but 4,000 fans to the Yankees. Isn’t that right? 

However, the log model assumes the impact will be multiplicative (+13%) rather than additive (+3,000).  So I was wondering if you’ve looked at the data to see which approach produces a better fit?
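The percentage reading of a log-model coefficient can be sketched like this. It assumes the model uses the natural log of attendance; the 13% figure is the one quoted in the comment, while the baseline attendance levels and the `fans_added` helper are hypothetical.

```python
import math

pct_boost = 0.13                 # the +13% figure quoted in the comment
coef = math.log(1 + pct_boost)   # natural-log coefficient that would imply it (assumed)

def fans_added(baseline_attendance, coef):
    """Extra fans per game implied by a log-scale coefficient at a given baseline."""
    return baseline_attendance * (math.exp(coef) - 1)

# Hypothetical per-game baselines: a low-drawing, an average, and a high-drawing team
for baseline in (15_000, 23_000, 31_000):
    print(baseline, "->", round(fans_added(baseline, coef)), "extra fans per game")
```

The same coefficient translates to roughly 2,000, 3,000, and 4,000 fans at these three baselines, which is the multiplicative behavior described above. An additive model would instead give every team the same number of extra fans.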


#19    Fargo      2010/03/03 (Wed) @ 15:18

Doing it with # of fans is the most reasonable, it seems to me.  When you use population size (in the metro area) as a predictor, you can also make an adjustment for whether there are competing ML teams in the region (LA, NY, Chicago, specifically). Don’t split the population, just put in a dummy for “two teams”.


#20    2010/03/03 (Wed) @ 15:25

Guy, You are right.  I just converted them into numbers for the average team for 2009.  Describing it the way you did probably would have been smarter. 

But, yes, the way I have modeled it, the boost is worth more (in absolute terms) to the Yankees than the Pirates, and more to the average 2009 team than the average 1960 team.  This makes intuitive sense to me which is why I went with that approach.  I’ve tried running a regular linear model as well, although it’s difficult to directly compare the two different fits.


#21    Guy      2010/03/03 (Wed) @ 15:37

A multiplier effect seems more intuitive to me as well, but it would be nice to know for sure.  (And I suppose the answer could be different for different variables.)

But I have a problem with the log model and the assumption of a multiplier effect when it’s used for player salary models (as it often is).  It doesn’t make sense to assume that playing catcher, for example, increases salary by X% --the impact should be a fixed amount (the dollar value of the position adjustment).  Same for any given quantity of offensive production, which has a specific dollar value—it doesn’t magnify the value of a player’s other contributions.


