THE BOOK--Playing The Percentages In Baseball


Wednesday, September 15, 2010

B-R.com’s updated park factors

By Tangotiger, 01:55 PM

Sean uses a three-year average, where the current year is the middle of the three years.  Since we won’t know the best estimate of the 2010 park factors until after the 2011 season is over, he is using only 2008-2009.  I don’t have a big problem with this… just a small one.

However, I want to comment on Sean saying this:

Seattle has been much more of a pitcher’s park than previously

“Has been”.  I’m sure Sean wasn’t looking for precision in his statement, so let’s be precise here:

“The performances by Seattle players and their opponents at Safeco and away from Safeco in 2010 indicate to us that Safeco is a bit more of a pitcher’s park than we otherwise thought.  This applies not only to our understanding of Safeco this year, but to all other years, to some extent.  Our estimate of the pitcher friendliness of a park, for this year, and past years, changes as more games are played.”

If the net result of this is that Yankee Stadium is in fact more of a hitter’s park than we previously estimated, and that Safeco is more of a pitcher’s park, then this means CC has pitched better than we previously estimated, and Felix not quite as well, to the point that the two may in fact be equals.


#1    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 14:13

Someone also reminded me that we’d be better off using component numbers, and not actual runs scored, to compute park factors.

The obvious reason is that the sequencing of events is not a park characteristic ("hey, it’s easier to hit with men on base at Chase than bases empty!"), but sequencing (i.e., large portion of luck) is a characteristic of the number of runs a team has scored.

I myself use wOBA.  I also control for the specific players in each park, rather than just presuming that Yankees players played the same amount of time at home and on the road.

I would bet MGL does this as well.
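
A minimal sketch of a component-based park factor built on wOBA rather than runs. The function name, the input layout, and the matching rule (weighting each player by the smaller of his home and road PA) are assumptions for illustration only; this is not a description of how Tango or MGL actually control for players.

```python
def matched_woba_park_factor(players):
    """Ratio of home wOBA to road wOBA over a matched set of players.

    players: iterable of (home_pa, home_woba, road_pa, road_woba) tuples.
    Each player gets the same weight at home and on the road (assumed
    here to be the smaller of his two PA totals), so a park is not
    credited or debited for who happened to play more there.
    """
    home_total = 0.0
    road_total = 0.0
    for home_pa, home_woba, road_pa, road_woba in players:
        weight = min(home_pa, road_pa)  # hypothetical matching rule
        home_total += weight * home_woba
        road_total += weight * road_woba
    if road_total == 0:
        raise ValueError("no matched plate appearances")
    return home_total / road_total
```

Because wOBA is built from individual events, the ratio is not affected by how those events happened to be sequenced into runs.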


#2    LarryinLA      (see all posts) 2010/09/15 (Wed) @ 15:21

My quick reaction to the idea that this year tells us something about all previous years of Safeco is, don’t we need to be careful that park factors are relative to the average park, but the universe of parks is different in 2010 than 2005?  That is park factors are not directly comparable year to year because the meaning of 1.0 changes when stadiums are replaced.  Or does park factor correct for this?


#3    Tangotiger      (see all posts) 2010/09/15 (Wed) @ 15:27

Larry, you are right.  Read the other thread.


#4          (see all posts) 2010/09/15 (Wed) @ 15:44

I really should be telling this to Sean, but he should have a 100-game or so regression for all these factors. That will let some of the air out for the switch between number of years.
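
For readers who want the arithmetic, here is a rough sketch of what a "100 game or so regression" could look like: add that many games of a neutral park (factor 1.0) to the observed sample. The function and parameter names are made up for illustration, and 100 is just the figure floated above, not a fitted value.

```python
def regress_park_factor(observed_pf, games, ballast_games=100, neutral_pf=1.0):
    """Shrink an observed park factor toward a neutral park.

    The observed factor is weighted by the games it is based on and
    the neutral factor by `ballast_games`, so small samples are pulled
    hard toward 1.0 and large samples barely move.
    """
    if games < 0 or ballast_games < 0 or games + ballast_games == 0:
        raise ValueError("game counts must be non-negative and not both zero")
    return (observed_pf * games + neutral_pf * ballast_games) / (games + ballast_games)
```

With this form, going from two years of data to three changes the weight on the observed factor gradually instead of all at once.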


#5    Kukla      (see all posts) 2010/09/15 (Wed) @ 15:50

Never quite understood why there had to be a magic number such as “three years” in this context. Why not do an ongoing tabulation wherein you listed the results not just for three years, but also for two years, four years, one year, and five years. You multiply your total number of columns by five, but why not—it’s not as if in this Internet era you must constrain your space to meet print limitations.
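
The tabulation suggested here is cheap to produce. A small sketch, assuming a plain dict of single-year park factors keyed by season (the helper name and the data layout are hypothetical):

```python
def window_averages(pf_by_year, end_year, max_window=5):
    """Average park factor over the last 1, 2, ..., max_window seasons.

    Returns {window_length: average}. A window is skipped when any of
    its seasons is missing from pf_by_year.
    """
    table = {}
    for n in range(1, max_window + 1):
        years = [end_year - k for k in range(n)]
        if all(y in pf_by_year for y in years):
            table[n] = sum(pf_by_year[y] for y in years) / n
    return table
```

This uses a simple unweighted mean of seasons ending at `end_year`; weighting by games or centering the window would be equally easy variations.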


#6    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 02:55

I made this note on Sean’s blog:

I agree you need regression.  Any time you are dealing with sample data, which, presumably, is 100% of the time, and you are trying to find the “true” of something, you need to regress.

Sean, if you want, send me a file of your PF like this, and I’ll tell you how many games I would add:
year,park,B_PF,P_PF

Also, what Sean does is not really PF, is it?  It includes opponents?  So, that is a problem too if you apply it over multiple years.

What you REALLY need to do is have a factor that applies only to the park, and then a factor that applies to opponents of that year.  And when you do multi-year, you do multi-year on that park-portion, and single-year on the opponent portion.
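
A rough sketch of that split, assuming the two pieces are already available as separate single-year numbers and that they combine multiplicatively (both are assumptions made here for illustration; the names are hypothetical too):

```python
def combined_adjustment(park_only_by_year, opponent_by_year, year, window=3):
    """Multi-year park portion times single-year opponent portion.

    park_only_by_year: {season: factor for the park itself}
    opponent_by_year:  {season: factor for that season's opponents}
    The park portion is averaged over the `window` seasons ending at
    `year` (using whichever of them are present); the opponent portion
    is taken from `year` alone.
    """
    years = [y for y in range(year - window + 1, year + 1) if y in park_only_by_year]
    if not years:
        raise ValueError("no park data in the requested window")
    park_part = sum(park_only_by_year[y] for y in years) / len(years)
    return park_part * opponent_by_year[year]
```

The point of the structure is only that the opponent term never gets averaged across seasons.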


#7          (see all posts) 2010/09/16 (Thu) @ 14:49

Yes, I assume he’s still making Pete Palmer-style park adjustments, which includes not having to face your own pitchers or hitters, and yes, that must be separate.

Essentially, what we really have is a park factor, then opposing hitting and opposing pitching factors. He just combines the opposing factors with the park factors and presents the overall adjustments.


#8    Tangotiger      (see all posts) 2010/09/16 (Thu) @ 14:59

Right, it’s park+SoS adjustments, but he calls it “park”.  It then doesn’t make any sense to do multi-year like that, since it presumes the SoS somehow means something year-to-year.


#9    KJOK      (see all posts) 2010/09/16 (Thu) @ 18:42

"I made this note on Sean’s blog:

I agree you need regression.  Any time you are dealing with sample data, which, presumably is 100% of the time, and you are trying to find the “true” of something, you need to regress.”

But why just regress park factors then?  Shouldn’t you also regress the actual batting and pitching results for the year, since they too are just ‘sample data’?

Seems like you’d be better off to simply use 1-year runs-based parkfactors, to get ‘value’, or else you need to regress EVERYTHING?


#10    tangotiger      (see all posts) 2010/09/16 (Thu) @ 18:53

You are regressing the observed park factors to get to the true park factors so that what you have left is a performance batting line based on the true talent of the player + luck.

If you don’t regress the park factor, then you are not going to get that.

You CAN also regress the performance line after you do that to get to the true talent of the player if you so wish.


#11    MGL      (see all posts) 2010/09/16 (Thu) @ 21:59

KJOK, you don’t need to regress both the underlying stats and the park factors.  Regressing the park factors is the same thing as regressing the underlying stats.  Doing both is “double counting.” You are regressing the park factor because the underlying stats are unreliable.  I suppose it MIGHT be more accurate to regress the underlying stats since each one regresses differently.  For example, if a park has a PF of 1.05 and that is because of a large difference in HR and another park has a PF of 1.05 and that is because of a large difference in singles, I guess that you should really regress the second one more.  So, actually that is a good point you made, although I’m not sure you knew you made it! ;)
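
To make the HR-versus-singles point concrete, here is a sketch that regresses each component factor with its own amount of ballast. The ballast values are placeholders chosen only so that singles regress more than HR, as described above; they are not estimates from data, and the names are hypothetical.

```python
# Placeholder ballast (in games) per component; NOT fitted values.
HYPOTHETICAL_BALLAST = {"HR": 100, "1B": 300}

def regress_component_factors(observed, games, ballast=HYPOTHETICAL_BALLAST):
    """Regress each component park factor toward 1.0 separately.

    observed: {component: observed factor}, e.g. {"HR": 1.20, "1B": 1.20}
    A component with more ballast is pulled closer to 1.0.
    """
    return {
        comp: (factor * games + 1.0 * ballast[comp]) / (games + ballast[comp])
        for comp, factor in observed.items()
    }
```

Two parks with the same overall factor can then end up with different regressed factors, depending on which component produced the difference.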


#12    Lee Panas      (see all posts) 2010/09/17 (Fri) @ 11:08

As I understand it, the BR park factors for the current year are based on at most two years of data and we don’t get three years until a year after a season is complete.  Since ballpark factor is a volatile measure, this is going to cause numbers to bounce around a lot until the three years surrounding a season are complete.

This method seems fine for historical purposes (if you are comfortable with a three year park factor) but confusing if you are more concerned with the current year.  I’d like to see B-R also report what WAR would be without ballpark adjustments along with the ballpark adjusted WAR.

Since one or two years seems too limited to me, another thought would be to report the most recent three-year period available. So for the beginning of 2010, you would include park data for 2007-2009.  At the end of 2010, you would use 2008-2010.  At the end of 2011, you arrive at the final historical estimate.

Not knowing the exact methodology used, this may be impractical. It just seems you would get more stable park factors over time if you always used three years (I’d prefer 4 or 5 but that’s another issue).  I’m not sure this method would help most of us here.  However, I think it would be easier for the public to digest if they knew that three years were being used all the time rather than between 1 and 3.  It also might make changes in WAR from the current year to the year after less dramatic which I also think would be beneficial for presentation.
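
The always-three-years idea can be written down as a small window rule. This is a sketch of the proposal in this comment only, not of B-R's method; the function name and arguments are hypothetical.

```python
def trailing_window(season, last_complete_season, window=3):
    """Seasons to use for `season`'s park factor under the proposal.

    The window always holds `window` seasons. It ends at the most
    recent completed season, but never later than season + 1, which is
    where the final historical (centered) estimate sits.
    """
    end = min(season + 1, last_complete_season)
    return list(range(end - window + 1, end + 1))

# For the 2010 season:
#   before 2010 is played (last complete = 2009): 2007, 2008, 2009
#   after 2010 (last complete = 2010):            2008, 2009, 2010
#   after 2011 (last complete = 2011):            2009, 2010, 2011 (final)
```

The window slides forward one season at a time, so each update replaces one year of data rather than adding a year to a short sample.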

