
THE BOOK--Playing The Percentages In Baseball


Monday, May 04, 2009

Additive park factors

By Tangotiger, 10:12 AM

Justin presents Patriot’s park multipliers as park addends.  I’m pretty sure the correct answer will end up being odds ratios. In the meantime, treating park factors as addends should be done just as often as treating them as multipliers.


#1    Dackle      2009/05/04 (Mon) @ 12:46

Baseball Hacks (page 400) has a nice method for computing power ratings and park factors using linear regression. I haven’t tried it in Excel, but I think you’d do it something like this: columns 1-30 for the offence, columns 31-60 for the defence and columns 61-90 for the park. Column 91 is the runs scored. Every game gets two rows. So if the Blue Jays beat the Orioles 4-3 in Toronto, then the Jays offence + the Orioles defence + Toronto’s park = 4 runs, and the Orioles offence + the Jays defence + Toronto’s park = 3 runs. So, in the first row, you’d put a 1 under Toronto in the offence section (columns 1-30) and zeros for the other 29 teams, a 1 under Baltimore in the defence section (columns 31-60) and zeros for the other teams, a 1 under Toronto in the park section (columns 61-90) and zeros for the other parks, and then a 4 in column 91. Do the same in the second row for the other side of the game (ie Orioles offence + Toronto’s defence + Toronto’s park = 3 runs). Repeat the process for all of the league’s games in the season, and then run your regression (although it’s probably easier to read the article in Baseball Hacks and install R on your computer).

The nice thing about this approach is that it takes care of schedule effects and the problem of the team not playing road games in its own park. Baseball Hacks calculates a multiplicative park factor, but I skipped the step of converting runs scored to logs, so the numbers presented below are additive park factors.
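The row layout described above can be sketched roughly as follows. This is only an illustration of the encoding, not the Baseball Hacks code: `game_rows` is a made-up helper name, and every team code other than TOR and BAL is a placeholder.

```python
# Hypothetical sketch of the row layout: 30 offence columns, 30 defence
# columns, 30 park columns, and runs scored in column 91.
N_TEAMS = 30


def game_rows(teams, home, away, home_runs, away_runs):
    """Return the two 91-entry rows for one game played in the home team's park."""
    index = {team: i for i, team in enumerate(teams)}

    def row(offence, defence, runs):
        values = [0] * (3 * N_TEAMS + 1)
        values[index[offence]] = 1             # columns 1-30: batting team
        values[N_TEAMS + index[defence]] = 1   # columns 31-60: fielding team
        values[2 * N_TEAMS + index[home]] = 1  # columns 61-90: park
        values[-1] = runs                      # column 91: runs scored
        return values

    return [row(home, away, home_runs), row(away, home, away_runs)]


# Placeholder team codes; only TOR and BAL matter for this example.
teams = ["TOR", "BAL"] + [f"T{i:02d}" for i in range(3, 31)]
rows = game_rows(teams, "TOR", "BAL", 4, 3)
```

Each row carries exactly three 1s (offence, defence, park) plus the runs total. Stacking the rows for every game gives the table a regression would be run on, in Excel, R or anything else.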

Here are the rankings by league, with the 2008 park factors from the Bill James handbook for comparison (based on the classic runs scored at home by both teams divided by runs scored on the road). Note that the program uses Anaheim as a base, so the Angels park is rated 0.00 and every other team is scaled off that number.

AMERICAN LEAGUE

Team  Factor  James
TEX   +0.65    114
CHA   +0.37    112
DET   +0.30    108
BOS   +0.24    108
NYA   +0.14    104
BAL   +0.13    105
ANA    0.00    102
CLE   -0.01     99
TOR   -0.08     96
TBR   -0.20     96
OAK   -0.42     92
KCR   -0.42     93
SEA   -0.43     93
MIN   -0.68     89


NATIONAL LEAGUE

Team  Factor  James
ARZ   +0.17    113
COL   +0.16    113
ATL   +0.02    106
HOU   -0.07    105
CIN   -0.08    107
CHN   -0.11    107
PHI   -0.13    103
WAS   -0.17    104
SFG   -0.19    105
FLA   -0.53     95
NYN   -0.55     95
STL   -0.55     94
MIL   -0.61     94
PIT   -0.75     90
LAN   -0.96     84
SDP   -1.25     80

In the traditional approach, the park factor for a team like the White Sox is boosted because they play a disproportionate number of road games in pitchers’ parks (eg 18 games last year in KC and MIN, another 3 in Dodger Stadium), whereas a team like Toronto faces the opposite problem (27 games in Bos/NY/Bal and 10 in Texas and Chicago, vs only 3 “road games” in Tex/Chi for the White Sox, since Chi doesn’t play a road game in its own park).

On a related note, if anyone is looking for the proper regression factors, I just wanted to throw this method out for consideration: dump into Excel all of the home and away runs scored and allowed data from Lahman, work out the simple park factors for each team/year, and then use LINEST to predict next year’s park factor based on this year’s park factor. I did this a few years back rather hastily, but came up with the following (Year1 is last year, eg 2008; Year2 is two years ago, eg 2007; etc):

Based on one year of data: Park factor = (.519 x Year1) + .482
Based on two years of data: Park factor = (.367 x Year1) + (.291 x Year2) + .343
Based on three years of data: Park factor = (.319 x Year1) + (.228 x Year2) + (.167 x Year3) + .286

So, if the simple park factor was 1.12 in 2008 and 1.07 in 2007, the regressed park factor for 2009 would be (.367 x 1.12) + (.291 x 1.07) + .343 = 1.065. Using only one year of data, the regressed park factor would be (.519 x 1.12) + .482 = 1.063.
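The arithmetic above can be reproduced with a short sketch. The coefficients are copied from the three equations in this comment; `regressed_park_factor` is a made-up name, and the function assumes the simple park factors are passed most recent year first.

```python
# Coefficients copied from the comment above: (weights, intercept),
# keyed by how many years of simple park factors are available.
REGRESSION = {
    1: ((0.519,), 0.482),
    2: ((0.367, 0.291), 0.343),
    3: ((0.319, 0.228, 0.167), 0.286),
}


def regressed_park_factor(simple_factors):
    """simple_factors: simple park factors, most recent year first (1 to 3 values)."""
    weights, intercept = REGRESSION[len(simple_factors)]
    return intercept + sum(w * pf for w, pf in zip(weights, simple_factors))


two_year = regressed_park_factor([1.12, 1.07])  # 2008 and 2007
one_year = regressed_park_factor([1.12])        # 2008 only
print(round(two_year, 3), round(one_year, 3))   # prints 1.065 1.063
```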


#2    MGL      2009/05/04 (Mon) @ 13:11

I think that the odds ratio will get you much closer to the correct way to apply a park factor to players than the additive method that Justin sort of advocates.

I also think that if you have a choice only between multiplicative and additive, the former is the better choice by far.

As much as I don’t like regression models for these sorts of things, I’d like to see one for predicting player stats, similar to the one that Dackle shows us above.  That would give us a better idea as to whether additive, multiplicative, odds ratio, or some combination is best.  I think you would have to play around with a best fit equation though…


#3    Tangotiger      2009/05/04 (Mon) @ 14:04

Good job.

Dackle, are you actually producing a park+league factor?

The mean for his AL teams is -.03 runs, and -.35 for the NL teams.  So, if you want to get the mean to be 0 for each league, subtract the appropriate league mean from each team’s factor.  e.g., Texas becomes +.68 and San Diego becomes -.90.
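A minimal sketch of that re-centering, using the factors from Dackle’s tables; `center_on_league_mean` is a made-up helper name.

```python
# Additive factors copied from Dackle's tables (Anaheim is the 0.00 base).
AL = {"TEX": 0.65, "CHA": 0.37, "DET": 0.30, "BOS": 0.24, "NYA": 0.14,
      "BAL": 0.13, "ANA": 0.00, "CLE": -0.01, "TOR": -0.08, "TBR": -0.20,
      "OAK": -0.42, "KCR": -0.42, "SEA": -0.43, "MIN": -0.68}
NL = {"ARZ": 0.17, "COL": 0.16, "ATL": 0.02, "HOU": -0.07, "CIN": -0.08,
      "CHN": -0.11, "PHI": -0.13, "WAS": -0.17, "SFG": -0.19, "FLA": -0.53,
      "NYN": -0.55, "STL": -0.55, "MIL": -0.61, "PIT": -0.75, "LAN": -0.96,
      "SDP": -1.25}


def center_on_league_mean(factors):
    """Subtract the league's mean factor so the league averages zero."""
    mean = sum(factors.values()) / len(factors)
    return {team: round(value - mean, 2) for team, value in factors.items()}


al_centered = center_on_league_mean(AL)
nl_centered = center_on_league_mean(NL)
print(al_centered["TEX"], nl_centered["SDP"])  # prints 0.68 -0.9
```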

***

Also, as discussed in the past, we should not look at runs, but at rate stats, like OBP.  The reason is that “runs” is really a combination of OBP and SLG, and it works in a non-linear fashion.  If LOB counted as partial runs, we wouldn’t have this issue.

That is, if a park adds +.01 HR, 3B, 2B, 1B, BB per PA, the net effect is not half as many runs as a park that adds +.02 of those events per PA.

So, only for runs per game do you actually want to use the multiplicative method.  For the component stats, the additive method is better.


#4    jinaz      2009/05/04 (Mon) @ 14:22

“That is, if a park adds +.01 HR, 3B, 2B, 1B, BB per PA, the net effect is not half as many runs as a park that adds +.02 of those events per PA.

“So, only for runs per game do you actually want to use the multiplicative method.  For the component stats, the additive method is better.”

Well, crap. 

So, your recommendation would be to either

a) use additive component factors, plus some kind of adjustment for PA’s in hitter vs. pitcher parks.

or

b) convert RAA data to absolute runs, apply the multiplicative park factor, and then convert back to RAA?

Just trying to figure out how to do this in an acceptable way.
-j


#5    Dackle      2009/05/04 (Mon) @ 15:54

“Dackle, are you actually producing a park+league factor?”

Tango, yes, I just dumped the scores for every game last year into R, so it includes the interleague factors.

Another nifty thing you could do is dump in five years’ worth of games, and if the park changes then it’s considered a new park. In fact you could dump 30 years’ worth of game scores into R and maybe have 50 or 60 parks, and it would compute each park’s factor relative to all the other parks in use during its existence.


#6    MGL      2009/05/04 (Mon) @ 19:57

I don’t think that the additive method for components works better than the multiplicative one. For example, I tested it for home field advantage and the multiplicative (which is essentially the same as an odds ratio) method worked much better.


#7    tangotiger      2009/05/05 (Tue) @ 04:26

Brian Cartwright recently did work on this, showing park factors by player quality, and the additive ones turned out better.

We had a thread on this a month ago…


#8    MGL      2009/05/05 (Tue) @ 20:56

I remember, Tango, and the analysis he did is fraught with problems.  I can easily show how multiplicative or odds ratio is not only better but much more appropriate, which I guess is the same thing.  I’ll see if I can put together some numbers later on tonight…


#9    Brian Cartwright      2009/05/05 (Tue) @ 23:04

mgl, is it mine that is fraught with problems? I do welcome constructive criticism. I have not yet finished that study. I see the effects on home runs (which I believe are a unique category), but I have not yet been able to find the math that will give me a useful way to say how much each player’s HR rate is affected by each park, given how each and every batter performed in each park.

I prefer using log5, which is most similar to odds ratio.


#10    MGL      2009/05/06 (Wed) @ 00:04

Brian, what I meant to say was that any analysis that attempts to see how each park affects different classes of talent (like hi, low, med HR) is difficult and fraught with potential problems - not your analysis necessarily.

For example, let’s say that you looked at players who had hi HR in one year and you wanted to see how they were affected by parks with hi and low HR park factors.  For one thing, those players will tend to play in hi HR parks in the first place, so you would want to use “out of sample” testing. Another problem is that when you test low HR (or low triples, for example) players, you have lots of players who hit zero HR’s and that really screws things up.


#11    Tangotiger      2009/05/06 (Wed) @ 06:29

The Odds Ratio Method is certainly the best way to go.

Given the choice between the two inferior methods (Multiplicative, Additive), Additive is likely the much better approach.  Yes, you pay a price for it with guys with 0 or close to 0 HR in a HR park (Juan Pierre, Coors).
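To make the three options concrete, here is a rough sketch of one common way to write each adjustment for a per-PA rate such as HR/PA. The formulas are an assumed reading of “additive”, “multiplicative” and “odds ratio”, not something spelled out in this thread, and the league, park and player rates are invented for illustration.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def additive(player, park, league):
    # Shift the player's rate by the park's difference from league average.
    return player + (park - league)


def multiplicative(player, park, league):
    # Scale the player's rate by the park's ratio to league average.
    return player * (park / league)


def odds_ratio(player, park, league):
    # Scale the player's odds by the park's odds relative to the league's odds.
    return from_odds(odds(player) * odds(park) / odds(league))


# Invented rates: a league HR/PA of .030 and a HR-friendly park at .040.
LEAGUE_HR, PARK_HR = 0.030, 0.040
for label, rate in (("low-HR hitter", 0.005), ("high-HR hitter", 0.060)):
    print(label,
          round(additive(rate, PARK_HR, LEAGUE_HR), 4),
          round(multiplicative(rate, PARK_HR, LEAGUE_HR), 4),
          round(odds_ratio(rate, PARK_HR, LEAGUE_HR), 4))
```

With these made-up inputs the additive version triples the low-HR hitter’s rate (.005 to .015), while the other two raise it by roughly a third, which is the kind of price paid for near-zero HR hitters in a HR park.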

