
THE BOOK--Playing The Percentages In Baseball


Thursday, December 25, 2008

The latest in park factors

By Tangotiger, 12:50 PM

From the man with the many handles…


#1    Adam      2008/12/25 (Thu) @ 14:03

Is anyone else very surprised that 18 teams have more fair area than Petco Park?


#2    MGL      2008/12/26 (Fri) @ 04:20

I don’t know how he computed his fair areas, but I used the drawings from Clem’s baseball site and a computer tracing application. I got around 111,000 square feet for Petco, which is large. Only 6 parks are larger (ARI, COL, STL, NYY, and TEX among them).
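MGL does not say how his tracing application turns an outline into an area. One standard way, sketched here as an assumption, is the shoelace formula applied to the traced vertices of fair territory (listed in order around the boundary, in feet). The function name `polygon_area` is hypothetical.

```python
def polygon_area(points):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula.

    points: list of (x, y) vertices in order around the boundary, e.g. home
    plate, the foul poles, and points traced along the outfield fence.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

The more points traced along a curved fence, the closer the polygon gets to the true fair area.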

Petco, in addition, is at sea level, and is cold and damp (even though humidity makes the air less dense, it also makes the ball softer, etc.).  Plus, the critical area for HR rate, and therefore runs scored, is in the alleys.

Also, I don’t think that relative humidity is as important as absolute humidity, but I am not sure.  In my research, I have found no correlation between humidity (relative or absolute) and fly ball distance or run scoring.
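The relative vs. absolute distinction matters because the same relative humidity holds very different amounts of water at different temperatures. A minimal sketch, assuming a common Magnus-type approximation for saturation vapor pressure (this formula is not from the thread, and MGL does not say how he measured humidity); the function name is hypothetical:

```python
import math


def absolute_humidity(temp_c, rel_humidity_pct):
    """Approximate absolute humidity in grams of water per cubic meter of air.

    Saturation vapor pressure (hPa) from a Magnus-type approximation:
        e_s = 6.112 * exp(17.67 * T / (T + 243.5)), with T in degrees Celsius.
    Actual vapor pressure is e_s * RH / 100; the factor 216.74 converts
    hPa per kelvin to g/m^3 using the ideal gas law for water vapor.
    """
    e_s = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    vapor_pressure = e_s * rel_humidity_pct / 100.0
    return 216.74 * vapor_pressure / (273.15 + temp_c)
```

With this approximation, 50% relative humidity at 30 °C (86 °F) works out to roughly 15 g/m³, about double the roughly 7.5 g/m³ at 80% and 10 °C (50 °F), so the "drier" day by relative humidity carries more water in absolute terms.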

Is this guy using runs scored at home with no adjustment for the team’s offense and defense/pitching (and the offense and defense/pitching of the average opponent)?  And what is “original” in his charts?

Why would he be using runs scored at home with no adjustment for the home and road teams’ offense and defense (if that is what he did)? That seems odd. What is the point of doing these nice regressions (or THIS nice regression, I should say) with really nice temperature, altitude, etc., data while using lousy run scoring data? Am I missing something?


#3    2008/12/26 (Fri) @ 12:21

More important than the total area of the field is the area where the ball is most likely to be hit - there is not an even geographic distribution of batted balls.

You can move your CF fence in from 450 to 440 and it won’t matter a bit because hardly anyone hits the ball that far, but move the corners in 5 feet and see the HRs spike (such as Comiskey II).


#4    Rally      2008/12/27 (Sat) @ 20:58

I had the same thought as MGL. This is great regression data, but if you go through all the trouble to account for that, please don’t stop at such a rough approximation as runs scored.

Compare it to the team’s park factor.  Better yet, look at how these variables predict HR factor, XBH factor, BABIP factor, etc.
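As a rough illustration of a component factor, here is the simplest home/road ratio, which works for any rate: HR per PA, extra-base hits per PA, or hits per ball in play for BABIP. This is an assumption for illustration, not Patriot’s method, and it skips the opponent and schedule adjustments discussed in this thread; the function name and numbers are hypothetical.

```python
def component_park_factor(home_events, home_opps, road_events, road_opps):
    """Simple home/road park factor for one rate, scaled so 100 = neutral.

    home_*: events and opportunities in games at the park (both teams combined).
    road_*: the same events and opportunities in the team's road games.
    Example (events, opportunities) pairs: (HR, PA), (XBH, PA), (hits on BIP, BIP).
    """
    if home_opps <= 0 or road_opps <= 0 or road_events <= 0:
        raise ValueError("need positive road events and opportunities at home and on the road")
    return 100.0 * (home_events / home_opps) / (road_events / road_opps)
```

For instance, 60 HR in 6,000 home PA against 50 HR in 6,000 road PA gives an HR factor of about 120.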


#5    Sky      2008/12/28 (Sun) @ 21:15

I provided TR/Jeff with Patriot’s park factors. He’s updated the study with those plus some additional variables. The plan is to bump up the level of statistical analysis (p-values for each variable, etc.) and re-publish. Right now the r^2 between the model and the actual park factors is about .7.
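For reference, an r^2 like Sky’s figure is one minus the ratio of residual variation to total variation in the actual park factors. A minimal sketch (function name hypothetical):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

For a least-squares model fitted with an intercept and evaluated on the same parks it was fitted to, this equals the squared correlation between predicted and actual park factors.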


#6    2008/12/29 (Mon) @ 01:24

Here are some answers. I hope to have an update to the original article, but with Xmas, a 2-year-old and a 2-month-old, not much baseball writing is getting done.

Here are the current factors I am using:
Opponents’ errors per game (how tough the stadium is to field in, e.g., fly balls in the Metrodome)
Humidity %
Foul Area
Average Wall Height
Elevation (ft)
Average temperature
Left Field
Left Center Field
Center Field
Right Center Field
Right Field
Y-wind (positive # is wind toward CF)
X-wind (positive # is wind toward RF)
Surface
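The X and Y wind variables imply splitting each park’s wind into a component toward CF and a component toward RF. A sketch of that split, assuming the angle is measured clockwise from the home-plate-to-CF line (looking down with home plate at the bottom, RF is to the right); the function names and the compass-bearing inputs are hypothetical, since the source does not say how the wind data were oriented.

```python
import math


def wind_components(speed_mph, angle_from_cf_deg):
    """Split a wind into (x_wind, y_wind) for one park.

    angle_from_cf_deg: direction the wind blows TOWARD, in degrees clockwise
    from the home-plate-to-CF line (0 = straight out to CF, 90 = toward RF,
    180 = straight in from CF, 270 = toward LF).
    Returns (x_wind, y_wind): x_wind > 0 blows toward RF, y_wind > 0 toward CF.
    """
    theta = math.radians(angle_from_cf_deg)
    return speed_mph * math.sin(theta), speed_mph * math.cos(theta)


def wind_components_from_compass(speed_mph, wind_from_deg, cf_bearing_deg):
    """Same split from weather-style inputs.

    wind_from_deg: compass direction the wind comes FROM (as weather reports give it).
    cf_bearing_deg: compass bearing from home plate toward center field.
    """
    toward_deg = (wind_from_deg + 180.0) % 360.0  # convert "from" to "toward"
    return wind_components(speed_mph, toward_deg - cf_bearing_deg)
```

For a park whose CF points due north, a 10 mph wind from the south comes out as (0, 10): blowing straight out.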

#2 I computed the areas using the discussion here a few weeks back where Tango asked how to calculate park size. I am no longer using them. I am now using the 5 OF park measures (LF, LC, CF, RC, RF). It seems that parks that are short in the corners and deep in CF score more runs (e.g., a park at 360, 400, 440, 400, 360 scores more than one at 400 all the way around, even though their average distances are close).
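Checking the arithmetic on the two fence profiles in that example (LF, LC, CF, RC, RF distances in feet):

$$\bar{d}_{\text{short corners}} = \frac{360 + 400 + 440 + 400 + 360}{5} = \frac{1960}{5} = 392 \text{ ft}, \qquad \bar{d}_{\text{uniform}} = \frac{5 \times 400}{5} = 400 \text{ ft}$$

So the short-corner park averages 8 feet shorter; a profile such as 360, 420, 440, 420, 360 would average exactly 400.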

RH: From what I have read, RH should have no effect on the distance the ball travels. What I have seen quoted a couple of times is that a drier ball comes off the bat faster, so the fielders have less time to react. I have not been able to find any study or article that backs this up.

I plan to stop using runs scored when running the regression and use just park factors. I might convert the PF to runs after the fact, since it is easier to explain to most people that a 10F difference leads to .5 runs rather than .7 points of park factor.

#3 Agree, see first part of #2

#5 Sky thanks again for the help.

If anyone else wants an updated spreadsheet, let me know.

When I originally published the piece, I was just looking to see if I had missed any factor. As the article states, I ran PF without decimal places, and that was plain ugly. Now I have all the numbers I have ever wanted. I might publish the spreadsheet so that anyone can plug in their own data (I need to double-check a few facts and write an article on how to run multiple regression in Excel).
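In Excel or OpenOffice Calc, the LINEST function fits this kind of multiple regression. For readers who would rather script it, a minimal sketch of the same ordinary least-squares fit with an intercept, solved through the normal equations; the function name and the toy numbers in the test are hypothetical, not Jeff’s data.

```python
def ols_fit(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: one row per park, each row a list of predictor values
       (temperature, elevation, fence distances, ...).
    y: the park factor for each park, in the same order.
    Returns [intercept, b1, b2, ...], one coefficient per predictor column.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * float(t) for r, t in zip(rows, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or there are too few parks")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(k):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][k] for i in range(k)]
```

With about 30 parks and the 14 factors listed above, only about 15 degrees of freedom remain for the residuals, so individual coefficients will be noisy; that is one reason the p-values Sky mentions matter.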


#7    MGL      2008/12/29 (Mon) @ 03:24

Jeff, very nice.  Great job.  Look forward to the results.


#8    2008/12/29 (Mon) @ 12:06

Question: I would like to average all the factors over the last 3 years. The only problem is the new Nationals stadium. Should I leave it out of the data, use only one year of data for it (and 3 for the rest), or use just one year for all the stadiums?


#9    Rally      2008/12/29 (Mon) @ 14:30

I would either leave it out, or do one year factors for that stadium.  Definitely don’t let one new stadium keep you from using multiyear factors for the other 29 teams.


#10    MGL      2008/12/30 (Tue) @ 15:43

Jeff, that is a good question. The more years the park factors are based on, the better the regression, I would think. I don’t know what “mixing” lengths for the park factors (e.g., one year for the Nationals, 2 years for another stadium, 10 years for another, etc.) will do to the regression. Someone with a strong statistics background needs to answer that.

One of the problems with using long-term PF’s is adjusting for the other teams/parks in the league.  That can be handled though.  Definitely better to use those long-term factors (10 years or even more) in these types of regressions especially since you are using average temperature and wind (I think - or are you using the actual temp and wind for that year or years?) at each park.


#11    Jeff      2008/12/30 (Tue) @ 15:58

I think I am going to run 3 years for all but Washington and 1 year for them.
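That plan amounts to averaging each park’s most recent seasonal factors, with a new park simply keeping its single season. A sketch, assuming the seasons are weighted by home games; the weighting, function name, and numbers are hypothetical.

```python
def multiyear_factor(seasons, max_years=3):
    """Games-weighted average of a park's most recent seasonal park factors.

    seasons: list of (year, park_factor, home_games) tuples for one park.
    Uses at most `max_years` of the latest seasons, so a park with only one
    season (e.g., a brand-new stadium) gets its single-season factor.
    """
    if not seasons:
        raise ValueError("no seasons supplied")
    recent = sorted(seasons, key=lambda s: s[0])[-max_years:]  # latest seasons only
    total_games = sum(games for _, _, games in recent)
    return sum(pf * games for _, pf, games in recent) / total_games
```

Equal 81-game seasons reduce this to a plain average; the weights only matter when a season is short.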

I am planning on adding a whole section on how a person can run the analysis themselves, with a spreadsheet that, when you open it in OpenOffice and input your own PFs, automatically generates the final equation and weights.

I can tell there is plenty of interest in the subject, but there is one more large project I want to get done this off-season, a manager rating scorecard, so if it seems I am not going as deep or as thoroughly as I can/should, you’re right.

BTW, it looks like the “new Shea Stadium” will have fewer runs scored there than the old one.


#12    2008/12/30 (Tue) @ 17:06

When I do my park factors, I identify “versions” of each, and the number of years for each park’s factors equals the number of years each version was in effect. For Wrigley, it’s 56 years, for Nationals Park, one.

Per MGL #10, in the first pass I do matched pairs, holding the teams constant, weight by the smaller PA, and sum each bucket. Once the first-pass factors are computed, they are applied to each team’s road stats, park by park, and the factors are rerun, and then rerun once more, for three passes in total.
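A rough sketch of that procedure, under stated assumptions: each record holds one team’s batting runs and PA in one park; a park’s factor compares each team’s rate in that park with the same team’s rate in all other parks, weighting each pair by the smaller PA total; later passes deflate the other parks’ runs by their current factors. Rescaling the factors to average 1.0 after each pass is an added assumption (without some anchor the overall level is not pinned down), and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def matched_pair_park_factors(records, passes=3):
    """Iterative matched-pairs park factors (sketch).

    records: iterable of (team, park, runs, pa) tuples, where runs and pa are
    that team's batting totals in that park. Returns {park: factor}.
    """
    runs = defaultdict(float)  # (team, park) -> runs
    pas = defaultdict(float)   # (team, park) -> plate appearances
    for team, park, r, n in records:
        runs[(team, park)] += r
        pas[(team, park)] += n
    teams = {t for t, _ in pas}
    parks = {p for _, p in pas}

    pf = {p: 1.0 for p in parks}  # first pass: other parks are not adjusted
    for _ in range(passes):
        new_pf = {}
        for p in parks:
            num = den = 0.0
            for t in teams:
                pa_in = pas.get((t, p), 0.0)
                others = [q for q in parks if q != p and (t, q) in pas]
                pa_out = sum(pas[(t, q)] for q in others)
                # Same team elsewhere, runs deflated by each park's current factor
                r_out = sum(runs[(t, q)] / pf[q] for q in others)
                if pa_in == 0 or pa_out == 0:
                    continue
                w = min(pa_in, pa_out)  # weight the pair by the smaller PA total
                num += w * runs[(t, p)] / pa_in
                den += w * r_out / pa_out
            new_pf[p] = num / den if den else 1.0
        mean = sum(new_pf.values()) / len(new_pf)
        pf = {p: f / mean for p, f in new_pf.items()}  # rescale to average 1.0
    return pf


# Toy example: three teams play 1,000 PA in each of three parks whose
# (hypothetical) run multipliers are 1.2, 1.0 and 0.9.
mult = {"A": 1.2, "B": 1.0, "C": 0.9}
toy = [(t, p, 100 * m, 1000) for t in ("X", "Y", "Z") for p, m in mult.items()]
print(matched_pair_park_factors(toy))
```

Each extra pass feeds the improved factors for the other parks back into the comparison, which is why the ordering settles quickly even though the exact values still move a little between passes.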


#13    Jeff      2009/01/07 (Wed) @ 17:41

Finally finished the Factors for Park Factors and it is viewable on my blog.

There are some technical difficulties with adding it over at BtB, but once they are figured out I hope to add it there also.

http://jeffsqanda.blogspot.com/2009/01/what-factors-have-effect-on-runs-scored.html





