Awesome, I’ve been waiting for something like this for a while. I’m going to read it over the next few days when I have some time…
I was poking around the 2008 Texas Rangers page on BB-Ref, and things didn’t seem to be adding up for me. I looked at the top of the page and figured out why:
Park Factors
(multi-year): Batting - 100, Pitching - 100
(one-year): Batting - 97, Pitching - 98
Over 100 favors batters, under 100 favors pitchers.
What in the world!? My initial reaction was that maybe Sean’s formulae are a bit too far from the state of the art. But then I looked for this thread and found that MGL’s park factors for The Ballpark have it as basically neutral. David Gassko’s recent spreadsheet doesn’t look like it agrees, but it would take me a while to add up the components to get a better idea.
Has this been discussed somewhere at length? The Ballpark in Arlington, or whichever of its subsequent names it’s on now, was, in my mind, a pretty extreme hitter’s park.
No, no, the park is still a very good hitter’s park! HRs to left and doubles to right are the biggest culprits.
I basically take my component park factors and then do a component overall run factor (like a component ERA). The run factor for Arlington is 1.04, which is the third best in the AL behind the White Sox and Fenway.
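MGL doesn’t post his component weights, so this is only a rough sketch of the “component run factor” idea, assuming hypothetical league-average event rates and approximate run values (both made up for illustration): scale each event rate by its component park factor, then compare the run value of the park-adjusted line to the neutral one.

```python
# Illustrative run values per event (approximate, NOT MGL's actual weights).
RUN_VALUES = {"1B": 0.47, "2B": 0.77, "3B": 1.04, "HR": 1.40, "BB": 0.31}

# Hypothetical league-average event rates per plate appearance.
BASELINE = {"1B": 0.155, "2B": 0.048, "3B": 0.005, "HR": 0.028, "BB": 0.085}


def component_run_factor(factors, baseline=BASELINE, run_values=RUN_VALUES):
    """Combine component park factors into one overall run factor.

    Each event rate is scaled by its park factor (missing components default
    to 1.00), and the run value of the scaled line is divided by the run
    value of the neutral line. Outs are not rebalanced, so this is only a
    first-order approximation.
    """
    neutral = sum(run_values[e] * baseline[e] for e in run_values)
    in_park = sum(run_values[e] * baseline[e] * factors.get(e, 1.0) for e in run_values)
    return in_park / neutral


# Hypothetical component factors for a park that boosts HRs and doubles.
print(round(component_run_factor({"HR": 1.12, "2B": 1.06}), 3))
```

A fuller version would rebalance outs and use more components; the point is only that the overall run factor is built up from the components rather than taken from runs scored directly.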
Park factors are so sensitive to the sample data, as well as to weather and team personnel, that 3-year factors are almost useless. Let’s not even talk about one-year factors.
I use every year a park has been in existence, up to 15 years.
I adjust for the schedule and for the “other parks” (e.g., when COL came into the league in 93, all other parks became more of a pitcher park), so that is not a problem.
Plus I regress each of the components appropriately (the best I can), and I regress them towards a different mean. For example, I regress each park’s HR factors to right and left to a number which is commensurate (more or less) with the size of the field, height of the fence, altitude, and wind and weather patterns (I use the average fly ball distance factors as a proxy for weather and altitude).
For foul territory, which is important, I use its actual size (in square feet) to adjust (regress) the sample data.
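The regression step can be sketched as a weighted average of the observed factor and a park-specific prior mean. This is a minimal sketch under assumptions: the function names, the regression constant `k`, and the linear foul-territory prior (with its `elasticity`) are all hypothetical, not MGL’s actual numbers.

```python
def regress(observed, n, prior_mean, k):
    """Shrink an observed park factor toward a prior mean.

    n is the sample size behind the observation (e.g. plate appearances);
    k is a hypothetical regression constant: the sample size at which the
    data and the prior get equal weight.
    """
    return (n * observed + k * prior_mean) / (n + k)


def foul_out_prior(foul_sq_ft, league_avg_sq_ft, elasticity=0.5):
    """Hypothetical prior for a foul-out factor from foul territory size.

    More foul territory than the league average gives a prior above 1.00.
    The elasticity value is an illustrative assumption, not a fitted one.
    """
    return 1.0 + elasticity * (foul_sq_ft / league_avg_sq_ft - 1.0)


# Example: a park with 20% more foul territory than average whose raw
# foul-out factor over a modest sample came out at 1.25.
prior = foul_out_prior(24000, 20000)  # prior a bit above 1.00
print(regress(1.25, n=3000, prior_mean=prior, k=3000))
```

The same `regress` call works for the HR components, with the prior built from dimensions, fence height and fly ball distance instead of foul territory.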
Basically almost any park factor you see in any source is crap.
Except mine, of course!
Oh, and of course, I adjust for changes in each park. For example, in 06 SD shortened RC a little, and PHI moved LF out and raised the fence a little; Coors starting in 06 is different from before then, and completely different from the pre-humidor days; Dodger Stadium has been removing foul territory since 05. Etc.
I consider these to be the best “true” run park factors (as of 07) you will find for each park. Remember that each park is compared to only those parks in their league. So a park in the NL with a run factor of 1.0 is NOT necessarily the same as one in the AL with the same run factor, although it might be.
NL
ARI 1.08
ATL 1.00
CHN 1.05
CIN 1.04
COL 1.10
FLO 1.00
HOU 1.06
LAN .99
MIL 1.02
NYN .96
PHI 1.07
PIT .98
SDN .92
SLN .98
SFN .96
WAS ?? (I have an estimate based on dimensions and the like. I just don’t have it in front of me right now.)
AL
ALA 1.00
BAL .99
BOS 1.05
CHA 1.06
CLE .99
DET .97
KCA 1.00
MIN .99
NYA .97
OAK .98
SEA .97
TBA .99
TEX 1.04
TOR 1.02
OK, I was waaay stupid on this. I forgot that the Google Doc on this thread was showing the OTHER parks, not the actual park! Just skipped forward to the data because I for some reason thought it was basically the same as DSG’s that came out a month later. I know 1.04 ain’t neutral at all.
What are the chances Sean changes his PF’s in the near future with a little nudging?
You’d have to ask him (Sean). Change to what? He is using the traditional formulas and 3-year factors, no?
Change to a more rigorous methodology/formula. I guess I should say he should update or enhance his formula, rather than simply change it. If people everywhere are looking at Rangers’ OPS+’s that are based on them playing in a neutral park, I think that’s an issue. (Putting aside whether OPS+ and ERA+ are themselves worthwhile.) I don’t have anything particular in mind. I think he’s just using the same formula he’s had in place for what, five plus years? The body of literature on park factors since he implemented his is pretty substantial. His 3-yr formula has Texas as a neutral park in 2007, not just 2008.
http://www.baseball-reference.com/about/parkadjust.shtml
I’m not sure I want to suggest to Sean that he completely overhaul his park factors. He has, I’m sure, hundreds or thousands of other things on his plate. But some regression and use of component data instead of or in addition to runs wouldn’t hurt. To his credit, he does have an innings pitched correction, which I had forgotten about (I thought he just used R/G instead of R/Out, when the latter is what we want).
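For comparison, a basic runs-only factor of the kind being discussed is just a ratio of run rates. A minimal sketch, assuming pooled multi-season totals; the function name and numbers are hypothetical, and this is not B-Ref’s actual code (the parkadjust page linked above describes Sean’s method). It has no regression and no component data, which is exactly the limitation raised here.

```python
def run_factor_per_out(home, road):
    """Basic (unregressed) run park factor from runs per out.

    home, road: lists of (runs, outs) per season, where runs and outs count
    both teams in that team's home or road games. Using outs instead of games
    handles the home team often not batting in the bottom of the ninth.
    """
    home_rate = sum(r for r, _ in home) / sum(o for _, o in home)
    road_rate = sum(r for r, _ in road) / sum(o for _, o in road)
    return home_rate / road_rate


# Hypothetical three-season totals: identical run totals at home and on
# the road, but fewer outs recorded at home.
home = [(760, 4200), (790, 4180), (745, 4210)]
road = [(760, 4370), (790, 4350), (745, 4380)]
print(round(run_factor_per_out(home, road), 3))
```

A per-game version would call this park exactly neutral; the per-out version shows it slightly favoring hitters.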
I don’t know that he is using anything worse than what anyone else is using, and there is nothing particularly wrong with what he is using. It is just that using 3-year park factors (at most) created from runs scored alone, and even then splitting that into batters and pitchers (I think; see my discussion below), is going to lead to lots of mistakes. And not regressing whatever numbers you come up with is REALLY going to lead to a lot of mistakes.
Plus, I am not sure what he means by batter and pitcher park factors. You DON’T want to compute those separately and it seems like that is what he is doing. There is no such thing as a separate batter and pitcher park factor.
A batter and pitcher park factor, as originated by Palmer (I think), was/is a misnomer. What he did was combine a park factor and an opponent factor (the fact that a team’s batters don’t hit against their own pitchers and vice versa).
The “batter and pitcher” park factors Palmer computes have NOTHING to do with the park factors per se, at least the “batter/pitcher” part. The batter park factor is a combination of a park factor and adjusting for a team’s batters’ opponents and vice versa for the pitcher park factor.
It seems like Sean is doing something else with his “batter and pitcher” park factors, but I am not sure. As I said, if he is just somehow splitting up the batter and the pitcher data (I am not even sure how you would do that, and certainly there is NO reason to do that), that is completely wrong.
Right, they (Sean, Pete) are doing “park + opponent”. Park factor is a bad name, but, chalk that up to a whole list of bad names out there.
For clarification, the way I did the year to year changes in each component (the second two charts) was to simply subtract the old park’s PF (for each component) from the new one’s PF. If it is an additional park/team, then of course I subtracted 1.00 from the new park’s PF.
For the total changes from 93-07, I simply added up all the yearly changes.
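The year-to-year bookkeeping is simple enough to sketch. Assumptions: the event format and function names are hypothetical, and the PF values in the example are invented for illustration.

```python
from collections import defaultdict


def park_change(new_pf, old_pf=None):
    """Change contributed by one park: new PF minus old PF.

    An expansion team (no old park) is compared against a neutral 1.00.
    """
    return new_pf - (1.00 if old_pf is None else old_pf)


def accumulate_changes(events):
    """events: iterable of (year, component, new_pf, old_pf_or_None).

    Returns per-(year, component) changes and the running total per component.
    """
    yearly = defaultdict(float)
    total = defaultdict(float)
    for year, component, new_pf, old_pf in events:
        delta = park_change(new_pf, old_pf)
        yearly[(year, component)] += delta
        total[component] += delta
    return dict(yearly), dict(total)


# Hypothetical HR factors, for illustration only.
events = [
    (1993, "HR", 1.30, None),  # expansion park
    (2000, "HR", 0.95, 1.05),  # old park replaced by a new one
]
yearly, total = accumulate_changes(events)
print(yearly, total)
```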
To compute the “other park factors” to see what the unbalanced schedule does to each team, here is what I did:
Every time a team plays in its home park, I add 1/15 of its home PF (for each component, of course) to the “tally” for that team. (Technically, it should be 1/14 for the AL and 1/16 for the NL.) For each road game, I add the PF of the park being visited (the home team’s PF) to the “tally”. Then I divide the tally by the number of games played on the road plus 1/15 of the number of games played at home. Again, I exclude all inter-league games.
If a team played all other teams in the league equally, this number should always equal 1.00 for all components. But they do not play each team equally, so the number represents the total, composite PF (for each component) of the “league” that each team plays in, where “league” means the actual teams they play against and the number of games played against each team.
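A sketch of the “other park factor” tally as described, for one component. The function name and data layout are hypothetical; `home_weight` defaults to the 1/15 used in the post. The toy three-team league uses `home_weight=1/2` (one over the number of opponents), which makes a perfectly balanced schedule come out at exactly 1.00, matching the property just described; the PFs are invented.

```python
def other_park_factor(team, games, pf, home_weight=1 / 15):
    """Composite PF of the parks a team's schedule puts it in.

    games: park (host team) -> intra-league games `team` played there,
           including its own park. Interleague games are excluded.
    pf:    park -> park factor for one component.
    Home games count home_weight each; road games count fully at the
    host park's PF.
    """
    tally = 0.0
    weight = 0.0
    for park, n in games.items():
        w = home_weight if park == team else 1.0
        tally += w * n * pf[park]
        weight += w * n
    return tally / weight


pf = {"A": 1.10, "B": 1.00, "C": 0.90}  # hypothetical PFs
balanced = {"A": 20, "B": 10, "C": 10}
skewed = {"A": 20, "B": 4, "C": 16}  # more road games in pitcher's park C
print(other_park_factor("A", balanced, pf, home_weight=1 / 2))
print(other_park_factor("A", skewed, pf, home_weight=1 / 2))
```

Skewing the road schedule toward the pitcher’s park pulls team A’s OPF below 1.00.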
Also, keep in mind that since these “OPF’s” affect a team’s PF, I used a recursive process to first compute each team’s (multi-year) PF’s, with no OPF adjustments, then adjusted them for the unbalanced schedule (using the first set of OPF’s), then redid the OPF’s, then redid the adjusted PF’s, then redid the OPF’s, etc. In reality, you only have to do that a few times, before everything settles in.
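The recursive process is a fixed-point iteration. The post doesn’t spell out how an OPF is applied to a team’s PF, so this sketch ASSUMES adjusted PF = raw PF × OPF, rescaled each pass so the league average stays at 1.00; that rule, the names, and the toy numbers are all hypothetical.

```python
def other_park_factor(team, games, pf, home_weight):
    # Home games weighted by home_weight, road games by 1.0 (see the tally above).
    tally = sum((home_weight if p == team else 1.0) * n * pf[p] for p, n in games.items())
    weight = sum((home_weight if p == team else 1.0) * n for p, n in games.items())
    return tally / weight


def settle_park_factors(raw_pf, schedule, home_weight=1 / 15, tol=1e-9, max_iter=100):
    """Alternate OPF and adjusted-PF passes until the PFs stop changing.

    raw_pf:   team -> multi-year PF with no schedule adjustment.
    schedule: team -> {park: intra-league games played there}.
    ASSUMPTION: adjusted PF = raw PF * OPF, rescaled so the league mean
    stays 1.00. This rule is illustrative, not necessarily MGL's.
    Returns (pfs, opfs, number_of_passes).
    """
    pf = dict(raw_pf)  # first pass starts from the unadjusted PFs
    for iteration in range(1, max_iter + 1):
        opf = {t: other_park_factor(t, schedule[t], pf, home_weight) for t in pf}
        adjusted = {t: raw_pf[t] * opf[t] for t in pf}
        mean = sum(adjusted.values()) / len(adjusted)
        adjusted = {t: v / mean for t, v in adjusted.items()}
        if max(abs(adjusted[t] - pf[t]) for t in pf) < tol:
            return adjusted, opf, iteration
        pf = adjusted
    return pf, opf, max_iter


raw = {"A": 1.10, "B": 1.00, "C": 0.90}  # hypothetical raw PFs
schedule = {
    "A": {"A": 20, "B": 4, "C": 16},
    "B": {"B": 20, "A": 10, "C": 10},
    "C": {"C": 20, "A": 16, "B": 4},
}
pf, opf, n = settle_park_factors(raw, schedule, home_weight=1 / 2)
print(n, {t: round(v, 4) for t, v in pf.items()})
```

On this toy schedule the loop settles quickly, consistent with the observation that only a few passes are needed.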
Remember that the first two charts (NL and AL) represent the composite PF’s for the “league” that each team played in in 2007. They do NOT contain each team’s own PF’s.
The second two charts (NL and AL) represent the changes from year to year (when parks are added or changed) to the whole league, with the last line being the total accumulated changes since 1993. Again, they do not contain any team PF’s.