THE BOOK--Playing The Percentages In Baseball

Tuesday, November 22, 2011

Selective end points AND data mining AND publishing bias rearing their ugly heads again…

By , 11:29 PM

Let’s see how many posts it takes for the geniuses on BBTF to figure this one out.  So far 9 and counting…

Anyway, here is the link:

http://www.thegoodphight.com/2011/11/21/2485197/phillies-citizens-bank-park-not-a-hitters-haven

to an article which tells us that CBP has played almost neutral for the last 4 years, therefore it is now a neutral park, as opposed to the first 4 years when it was an extreme hitters’ park (around 1.07).

Let’s forget for a second how a park can all of a sudden change its true PF (it can’t, other than through changes to the other parks in the league, and even then it won’t change much; of course the “effective” PF can change a little with weather and with different players).

Instead, let’s do this thought exercise:

You have 30 parks with a true PF of x, y, z, etc.  I am telling you that they never change (which is actually reasonably true, as I indicated above, barring a remodel of course).  We track the observed (sample) PF’s for 8 years.  What are the chances that in the last, say, 3, 4 or 5 years (you get to choose the end points) some park will show an observed PF that is quite different from its true PF AND/OR quite different from the observed PF in its first 3, 4, or 5 years?
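The thought exercise above can be run as a small simulation. This is only a minimal sketch under stated assumptions, not anyone's actual park-factor method: each season's observed PF is modeled as the park's fixed true PF plus independent normal noise, and the function name, `yearly_sd=0.05`, and `gap=0.07` are all illustrative choices that do not come from the post. It also fixes the end points at first 4 versus last 4 years; letting the analyst pick among 3, 4, or 5 years can only raise the chance of finding a gap.

```python
import random


def simulate_leagues(n_leagues=2000, n_parks=30, n_years=8,
                     yearly_sd=0.05, gap=0.07, seed=1):
    """Fraction of simulated leagues in which at least one park's observed
    PF over its first half of seasons differs from its second half by
    `gap` or more, even though no park's true PF ever changes."""
    rng = random.Random(seed)
    half = n_years // 2
    hits = 0
    for _ in range(n_leagues):
        found = False
        for _ in range(n_parks):
            # True PF is set to 1.00 for every park; the gap between the
            # two halves does not depend on the true value, only on noise.
            obs = [rng.gauss(1.0, yearly_sd) for _ in range(n_years)]
            first = sum(obs[:half]) / half
            last = sum(obs[half:]) / (n_years - half)
            if abs(first - last) >= gap:
                found = True
                break
        hits += found
    return hits / n_leagues


if __name__ == "__main__":
    print(simulate_leagues())
```

With these assumed inputs, a gap of that size shows up somewhere in the league in most simulated 8-year stretches, which is the point of the exercise: someone scanning 30 parks will usually find one that appears to have "changed."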

IOW, what can we conclude about the true PF of CBP?  Not much other than its true PF is likely the un-weighted average of the observed PF over the last 8 years, regressed toward some mean (of a similar park, dimension, weather, altitude-wise, etc.).  If you want to weight more recent years slightly more than past years, I don’t have much of a problem with that, although I don’t think that any weighting is necessarily appropriate…
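As a rough illustration of "un-weighted average, regressed toward some mean," here is a minimal sketch. The function name and `prior_weight_years` are hypothetical; the post gives no regression amount, so the default of one season's worth of weight is purely an assumption, as is using 1.00 for the mean of similar parks.

```python
def regressed_pf(observed, prior_mean=1.00, prior_weight_years=1.0):
    """Shrink the unweighted mean of yearly observed PFs toward prior_mean.

    prior_weight_years is an assumed stand-in for how many seasons of data
    the prior is 'worth'; it is not a value taken from the post.
    """
    n = len(observed)
    sample_mean = sum(observed) / n
    return (n * sample_mean + prior_weight_years * prior_mean) / (n + prior_weight_years)


if __name__ == "__main__":
    # Rough figures quoted in the post: about 1.07 for four years,
    # then about neutral for four years.
    observed = [1.07] * 4 + [1.00] * 4
    print(round(regressed_pf(observed), 3))  # 1.031
```

A larger `prior_weight_years` pulls the estimate harder toward the mean; weighting recent seasons more heavily would mean replacing the plain average with a weighted one.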


#1    Matt Swartz      (see all posts) 2011/11/23 (Wed) @ 08:16

In all fairness, the fences were moved back in LF in 2006, so it should be the last 6 years instead of the last 4 years. The park factors for runs in 2006 and 2007 ranked 8th and 14th, so his overall point would remain true.

He didn’t mention it probably because he didn’t expect the article to go past Phillies fans who all already remember the peculiar move to have Pat Burrell cover more ground and turn all his pulled short porch home runs into fly outs.

I’d think the last six years provide a more appropriate number.


#2          (see all posts) 2011/11/23 (Wed) @ 11:53

Right, you don’t want to be fooled by randomness into thinking something changed when it didn’t.

But ... how do you know nothing changed?  I’d be curious to see a study.  Suppose the PF changes significantly over three years.  Is the next year’s PF closer to the three-year average, or the eight-year average?

My gut guess ... if you don’t have any evidence that there were real changes, 60% would be closer to the 8-year, and 40% would be closer to the 3-year.  Expressing it differently (by regressing to the mean), that would mean maybe 70% of the sudden changes are just random, and 30% are real.

Very gut guess.  Probably wrong.
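One way to get a baseline for the study proposed above is to simulate the "nothing changed" case and see how often the next year lands closer to the 8-year average than to the most recent 3-year average. This is only a sketch under assumptions: a constant true PF of 1.00, independent normal yearly noise, and hypothetical names and parameter values throughout.

```python
import random


def share_closer_to_long_average(n_trials=20000, yearly_sd=0.05, seed=2):
    """With a constant true PF, the share of trials in which year 9 is
    closer to the 8-year average than to the average of years 6-8."""
    rng = random.Random(seed)
    closer_long = 0
    for _ in range(n_trials):
        obs = [rng.gauss(1.0, yearly_sd) for _ in range(9)]
        long_avg = sum(obs[:8]) / 8      # years 1-8
        recent_avg = sum(obs[5:8]) / 3   # years 6-8
        nxt = obs[8]                     # year 9
        if abs(nxt - long_avg) < abs(nxt - recent_avg):
            closer_long += 1
    return closer_long / n_trials


if __name__ == "__main__":
    print(share_closer_to_long_average())
```

Under these assumptions the share comes out only modestly above one half, so a real-data share would have to be compared against that no-change baseline rather than against 100%.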


#3    Tangotiger      (see all posts) 2011/11/23 (Wed) @ 11:59

I did this some 10 years ago, but it’s relevant to the discussion:

http://www.tangotiger.net/parks.html

A park is NOT just about its configuration.


#4          (see all posts) 2011/11/23 (Wed) @ 12:01

Tango/3: Good points.  I hereby revise my estimate of randomness downward from 60%.  No idea how far.


#5    aweb      (see all posts) 2011/11/23 (Wed) @ 12:06

I’m not sure sample size can be appealed to here. 81 games, for all players on both teams, is a significant sample size. For four years, that’s two full seasons of games, or four seasons’ worth of team-level numbers (81 games, counted for both teams, is a single season of numbers).

Considering how often teams make tweaks to their parks (new seats that might affect airflow, changes to fences, dimensions, foul territory, the hitter’s backdrop, maybe even the mound), and that park effects depend on the other parks in the league (Texas looks like a pitcher’s park in a Colorado/Texas-only league), I’m not sure how much better your park factors get by looking long-term.

Since Philadelphia got their new park, just in that division, the Mets, Nationals/Expos (played in 3-4 places), and Marlins (this year) also got new parks. Only Atlanta has remained constant. Elsewhere, Colorado played with the humidor, Pittsburgh got a new park around the same time, as did Milwaukee. St Louis got a new park (or big upgrades, I can’t even remember). Houston may have changed dimensions slightly as well, although I’m not sure. The NL West, with its weird combination of hitters/pitchers parks, has actually been relatively constant. That’s a lot of turnover and changes to the comparables. Park factors should be changing year-over-year as the overall baseline changes. I think it is more than fair to say that Philadelphia, compared to the 2011 baselines, was an almost neutral park for hitters. In 2008, the baselines are different, as they will be in 2012 and 2015.

It’s possible that park effects are actually estimated worse, by the current simple methods, by using more and more data.


#6    Tangotiger      (see all posts) 2011/11/23 (Wed) @ 12:29

aweb: your points are reasonable.  What we need is actual testing, evidence.

And I certainly would not characterize 81 games as being “substantial”.  That’s nowhere close to being true.


#7    schmenkman      (see all posts) 2011/11/23 (Wed) @ 12:43

mgl,

Granted that there is bias in the endpoints, which is why…

1) I showed the 2004-07 data

2) I said in the title “Is Not Playing Like a Hitter’s Park” (the URL sounds more definitive, but was generated when the draft was first created; the published post never had that title)

3) I specified in the two key, bolded headings, that I was referring to the past 4 years:
“For the past 4 years, Citizens Bank Park has NOT been a hitter’s park” and
“For the past 4 years, Citizens Bank Park has NOT been easier to hit home runs in”

You seem to be making two other points:

a) PFs don’t change, except “a little” due to weather and team composition. 
- This makes intuitive sense, but I’m not sure what you mean by “a little”.

b) 4 years is not enough time to calculate PFs
- That may very well be true, which, again, is why I was careful to be specific about the time frame analyzed.  However I wonder how many of the park factors which are published and widely used are calculated over 6 or 8-year windows.

Going back to 2006, as Matt suggests, gives us these PFs (by averaging annual PFs, rather than adding up the year-by-year runs, games, etc., but should be close):

Scoring: 1.02 (12th highest)
HRs: 1.12 (9th highest)
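The two aggregation approaches mentioned above (averaging annual PFs versus adding up runs and games across years) can be compared with a short sketch. It assumes the simple park factor definition of home runs per game divided by road runs per game, counting both teams' runs; the function names and the three seasons of totals are invented for illustration and are not Citizens Bank Park data.

```python
def park_factor(home_runs, home_games, road_runs, road_games):
    """Simple PF: runs per game at home over runs per game on the road.
    Run totals are runs scored plus runs allowed."""
    return (home_runs / home_games) / (road_runs / road_games)


def mean_of_annual_pfs(seasons):
    """Average the PF computed separately for each season."""
    return sum(park_factor(*s) for s in seasons) / len(seasons)


def pooled_pf(seasons):
    """Add up runs and games across seasons, then compute one PF."""
    totals = [sum(col) for col in zip(*seasons)]
    return park_factor(*totals)


if __name__ == "__main__":
    # (home_runs, home_games, road_runs, road_games); invented numbers.
    seasons = [(760, 81, 720, 81), (700, 81, 710, 81), (690, 81, 650, 81)]
    print(mean_of_annual_pfs(seasons), pooled_pf(seasons))
```

When every season has the same number of games and similar run levels, the two results differ only in the third decimal place, which is consistent with the "should be close" remark.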

I stand by the key points of the article:
- “bandbox” is a misnomer
- over the past 4 years, it’s been neutral overall, for both scoring and home runs
- in addition (I alluded to this but didn’t have all the data to quantify it), over the past 4 years it has helped left-handed hitters but hurt righties


#8    aweb      (see all posts) 2011/11/23 (Wed) @ 13:05

81 games is a full season’s worth of data for the teams involved, essentially the equivalent of a team’s seasonal stats. Over the course of a single season, what is insufficiently powered for a team-level estimate?

81 games is a terrible sample for a single position player (worse for a reliever, better for a starter). But for figuring out the relative skills of a group of players (16-18 full-time equivalents), it seems like a substantial sample to me. A single season in a single park is (roughly) the equivalent of 9 full-time seasons for a single player, isn’t it?


#9    Tangotiger      (see all posts) 2011/11/23 (Wed) @ 13:19

Sorry, aweb, but you are just wrong.  If you do a correlation year-to-year of PF, even limiting it to parks that don’t change, you are going to get something like an r=.50, or worse.

So, even though you have 6000 PA in each sample, your r is just .50.

That means we have as much faith in figuring out if Ryan Braun is really good after 200 PA as we do in figuring out if Tim Lincecum has a BIP skill after 3000 BIP, or in figuring out if Dodger Stadium really is a pitcher’s park after 6000 PA.

It’s not just the size of the sample.  It’s also how directly what you are measuring is linked to the metric of choice.  And it’s also, most importantly, how much variation there is among the population you are looking at (hitters for Braun, pitchers for Lincecum, parks for Dodger Stadium).
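To put the r=.50 figure in regression terms, here is a small sketch using the common shorthand r = n / (n + k), where k is the sample size at which you regress 50% toward the mean. Treating the year-to-year correlation this way is an assumption of the sketch, and the function names are hypothetical.

```python
def regression_constant(n, r):
    """Solve r = n / (n + k) for k, given sample size n and correlation r."""
    return n * (1 - r) / r


def reliability(n, k):
    """Expected correlation at sample size n for a given constant k."""
    return n / (n + k)


if __name__ == "__main__":
    k = regression_constant(6000, 0.50)  # 6000.0
    print(k)
    print(reliability(4 * 6000, k))      # four seasons of PA: 0.8
```

Read this way, an r of .50 at 6000 PA means one season of park data gets regressed halfway to the mean, and even four seasons would still be regressed about 20%.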

Resist the temptation to just believe that big numbers are big numbers.


#10    Nivra      (see all posts) 2011/11/27 (Sun) @ 02:25

9/tango:  The problem with taking year-to-year PF correlation is that part of the variance you get should be true variance of your measured statistic inasmuch as your statistic is expected to change each year based on team composition, league park changes, and yes, weather. 

If you could somehow control for the above mentioned factors which will impact your measured statistic, you may get r=0.6 or r=0.7 in your year-to-year correlation, which would give you a much better indication of what the correct sample size will be.

I think PF is the incorrect way to think about it.  I’d be more interested to know y-t-y correlations for the individual components of park factors.  I’d expect HR component of PF would be very volatile, but LF/RF biases and 2B component to be less volatile.


#11    Brian Cartwright      (see all posts) 2011/11/27 (Sun) @ 03:22

I did a piece at StatSpeak several years ago (no longer on the web) where I looked at how long it took park factors to stabilize. Being early on in my career, I did not use a split half method, but one I still believe is valid.

There was a six year stretch in the NL in the 1980s where there was no change in parks, configurations or schedule. I calculated park factors for each stat for the entire six year period, then redid them in groups of one, two, three years, etc., to see how close each was to the six year factors.

I do recall that HR/balls contacted took three years for all parks to get within 0.1 of their six year factor.
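The comparison described above might look something like the following sketch. It is not the original StatSpeak code: it assumes the "groups" are consecutive rolling windows, that a multi-year factor is the plain mean of the yearly factors, and the park names and yearly values are invented.

```python
def years_to_within(factors_by_park, tol=0.1):
    """Smallest window length k such that, for every park, every k-year
    rolling mean is within tol of that park's full-period mean."""
    n_years = len(next(iter(factors_by_park.values())))
    for k in range(1, n_years + 1):
        ok = True
        for yearly in factors_by_park.values():
            full = sum(yearly) / len(yearly)
            for start in range(len(yearly) - k + 1):
                window = yearly[start:start + k]
                if abs(sum(window) / k - full) > tol:
                    ok = False
        if ok:
            return k
    return n_years


if __name__ == "__main__":
    # Invented six-year HR factors for two hypothetical parks.
    parks = {
        "Park A": [1.30, 0.90, 1.20, 1.00, 1.25, 0.95],
        "Park B": [0.80, 0.85, 1.15, 1.10, 0.80, 1.00],
    }
    print(years_to_within(parks))  # 3
```

Note that a result like this describes how close shorter samples come to the six-year figure; it does not by itself say how much to regress a given sample.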

I do park factors by matched pairs of each combination of ballparks, summed over the duration of each park configuration. I then iterate five times to account for the road park’s factors, then regress.


#12    Tangotiger      (see all posts) 2011/11/27 (Sun) @ 09:19

Brian: all that matters is how much you regress.  If you’d like to report those numbers, then we can talk more productively.

Nivra: obviously (except for your guess at r=.6 or .7).


#13    MGL      (see all posts) 2011/11/27 (Sun) @ 16:04

“I do recall that HR/balls contacted took three years for all parks to get within 0.1 of their six year factor.”

#12, right.  Brian, I have no idea what that represents. And as I have mentioned on a previous thread, I despise the term “how long a stat takes to stabilize.” If you or anyone else means “the 50% regression point,” and it is not nearly obvious that that is what anyone means, then say that. How can it possibly be that helpful for me to say, “it takes 3 years for UZR to stabilize,” when no one would have any idea if I mean that is the 50% regression point, 75% or 90% (or any one of an infinite number of possibilities), not to mention the fact that I make no mention of sample size in terms of batted ball opportunities even if I were to say “3 full seasons”?


#14    Brian Cartwright      (see all posts) 2011/11/27 (Sun) @ 23:11

But I never said “how long it takes to stabilize”.

I made a specific statement of how many years it took ALL teams HR factors to get within 0.1 of their six year factors.

Stated another way, the factors are equal to one decimal place.


#15    MGL      (see all posts) 2011/11/28 (Mon) @ 04:56

“But I never said ‘how long it takes to stabilize’.”

Brian, someone must have posted under your name, or we are speaking a different language.  This is what you said in #11 above, and that is what I was referring to, regardless of what you meant.

“I did a piece at StatSpeak several years ago (no longer on the web) where I looked at how long it took park factors to stabilize.”

“I made a specific statement of how many years it took ALL teams HR factors to get within 0.1 of their six year factors.

Stated another way, the factors are equal to one decimal place.”

I know what you said, and I know what it means semantically, but, as I said, it means nothing to me in terms of how much to regress given a certain sample size, which to me, is the only important thing to know.

Let’s not get into a pissing contest…


#16    Brian Cartwright      (see all posts) 2011/11/28 (Mon) @ 20:03

Damn, I even looked back at my post, and I overlooked that. It was sloppiness on my part, as I should have expected your reply. I think it’s just become part of the lexicon and I end up saying the words even when I mean something a little different.

At work a few years ago, where we construct digital elevation models from 3d photography, I was developing a macro to sample points testing the accuracy of the 3d surface. Pick an xy location, measure the z to the best of your ability, and compare that to the z calculated at that xy from the collected elevation model of lines and points. See if the rmse exceeds accuracy standards. It’s given that with a certain height of photography we can’t measure better than 0.7 feet (random variance of measurement). Accuracy standard for rmse is 0.25 feet. So for several stereo pairs of photography, I sampled 100 points in each, taking that as my true level of error. Then I took the rmse for 5, 10, 20, etc. points to see how it compared to the 100 point rmse. I found it took 20 measurements to get the rmse very close to the 100 measurement value. Going past 20 didn’t add any improvement to the measurement. I therefore told map compilers they needed to check 20 random locations for each stereo model.
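The subsampling check described above can be sketched as follows. This is an illustration only, not the original macro: the function names are hypothetical, and the errors are simulated as normal noise with an assumed standard deviation of 0.25 feet rather than taken from real stereo models.

```python
import math
import random


def rmse(errors):
    """Root mean square of a list of (measured z - model z) errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_by_sample_size(errors, sizes=(5, 10, 20, 50, 100), seed=0):
    """RMSE of the first n errors, after a seeded shuffle, for each n."""
    rng = random.Random(seed)
    shuffled = errors[:]
    rng.shuffle(shuffled)
    return {n: rmse(shuffled[:n]) for n in sizes if n <= len(shuffled)}


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated check-point errors in feet; sd of 0.25 is an assumption.
    errors = [rng.gauss(0.0, 0.25) for _ in range(100)]
    for n, value in rmse_by_sample_size(errors).items():
        print(n, round(value, 3))
```

Repeating this over several shuffles or several stereo models shows how much the small-sample rmse wanders around the 100-point value, which is the comparison the comment describes.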


#17    MGL      (see all posts) 2011/11/28 (Mon) @ 20:10

Sure.  Np.  Wasn’t really criticizing you for that term. Lots of people use it and I have stated many times why I don’t like it. I typically don’t like qualitative terms when what we want are quantitative ones.

