THE BOOK--Playing The Percentages In Baseball

Saturday, September 19, 2009

At what point would you prefer road stats only?

By , 06:55 PM

(and a portion, around 1/15 or so, of the home stats of course.)

I have thought about and written about this before, and this piece by DC got me thinking about it again.

Assuming that park adjusting home stats are problematic because we think that parks affect different hitters differently, there must be a point at which it is better to use road stats only (adjusting for HFA of course) plus a small portion of home stats, rather than using all of the home stats after park adjusting them.  This is with respect to “neutralizing” a player’s stats of course, in order to determine his context-neutral true talent.

What point would that be?  IOW, how many seasons or PA?  Would you only even think about doing that for non-traditional parks, like Coors or Fenway?  Would it matter if the player had extreme splits or not?

Or perhaps it is correct to always weight the road stats more heavily. If so, by how much?  And again, would you weight them even more heavily for players in unconventional parks, or for those with unusually large or small splits (for their home park)?


#1    pft      (see all posts) 2009/09/19 (Sat) @ 21:07

Maybe hitting well in your home park, any home park, is a skill, and some do it better than others after adjusting to a park’s unique features, which may adversely affect them on the road.

Also, many factors go into a player’s road performance: travel (for example, Seattle has tougher travel than any other team), night life, etc.  Some travel worse than others; it’s not that they hit so much better at home, it’s that they are much worse for wear on the road and hit worse.

BTW, we would like to see road splits for the defensive stats for the same reason we need it for hitting, especially for OF’ers.


#2    MGL      (see all posts) 2009/09/19 (Sat) @ 21:51

I don’t know that you “need” splits for defense.  I don’t provide them to Fangraphs because I don’t want people to make more of them than they deserve.

While park factors for UZR are a little tricky, especially in parks like Fenway and Coors, you don’t have the same problems with them as you do with offense, which is that parks affect players differently on offense.  Plus, even though I use park factors on the IF, they are not that important. The last thing I want is for people to start quoting home and road UZR splits to justify some silly argument.


#3    Nick      (see all posts) 2009/09/20 (Sun) @ 00:47

I agree.  People already abuse 1 year samples of UZR.  Can you imagine what would happen if FanGraphs intentionally showed even smaller samples?


#4    Anonymous coward      (see all posts) 2009/09/20 (Sun) @ 09:10

While park factors for UZR are a little tricky, especially in parks like Fenway and Coors, you don’t have the same problems with them as you do with offense, which is that parks affect players differently on offense.

Is that true? If I remember correctly, you wrote an article some years back saying that speedy outfielders were better in big outfields. And then wrote another one refuting yourself (http://www.insidethebook.com/ee/index.php/site/comments/i_could_use_some_help_with_some_data/). It would be more proper to say that you don’t really know.

Btw, have you tried running the same study with one of the inputs being Fans Scouting Report’s Speed rating instead of the baserunning based speed score? Since baserunning instincts and team policy affect baserunning aggressiveness and success rates, the FSR should provide a purer measure of speed.


#5    Xeifrank      (see all posts) 2009/09/20 (Sun) @ 11:26

Regarding how to treat the home vs road stats, what method would you use to test if your hypothesis was correct, or one method more correct than another?
vr, Xei


#6    MGL      (see all posts) 2009/09/20 (Sun) @ 12:34

“Btw, have you tried running the same study with one of the inputs being Fans Scouting Report’s Speed rating instead of the baserunning based speed score?”

No.  But speed is speed.  I don’t think you are going to get any different results regardless of how you come up with speed ratings.  IOW, if you used the fan ratings to determine speed and you then used a conventional “speed rating” formula (triples, SBA, etc.) to determine speed, I would guess that you would have 90% the same players in your “fast” and “slow” groups.

“what method would you use to test if your hypothesis was correct..”

What hypothesis?  That park effects are not as important for defense as for offense?

I suppose you could look at y-t-y or intra-class correlations for the home and road splits.


#7          (see all posts) 2009/09/20 (Sun) @ 12:53

Shouldn’t the strategy be to create better component factors that take into account a player’s batted ball profile? 

Because we don’t really understand what it is exactly that makes the home team win 54% of the time, we definitely don’t know why it affects some players more than others.  Maybe it’s adopting a style of hitting that fits your ballpark, maybe it’s just feeling more comfortable at home and playing better because of it. It’s most likely a combination of many of these factors, just probably in different proportions for different players.  I think if you use home/road splits, we don’t really know what we’re measuring.


#8    Xeifrank      (see all posts) 2009/09/20 (Sun) @ 16:24

#6, my question refers to the original blog entry not the stuff other people were asking about defense.  How would you know if 1/15th or x/15th is the correct weighting for home HITTING stats?  How to measure?
vr, Xei


#9    MGL      (see all posts) 2009/09/20 (Sun) @ 23:41

The 1/15 was just if you were using road stats only: 1/14 in the AL and 1/16 in the NL. If you are using road stats only, you want to include some of the home stats so that for every player, every park is represented (the imbalanced schedule notwithstanding).  That is all I meant.  I wasn’t trying to comment on how much to weight the home stats.
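
For readers who want to see the arithmetic, here is a minimal sketch of "road stats plus a small slice of home stats." The function name, its arguments, and the sample numbers are all made up for illustration. It applies no HFA or park adjustment; it only shows how a fractional weight on the home line keeps the home park from dominating the sample.

```python
def road_weighted_rate(road_events, road_pa, home_events, home_pa,
                       home_fraction=1 / 15):
    """Blend road stats with a small fraction of home stats.

    home_fraction is an assumed weight (roughly 1/14 in a 14-team league,
    1/16 in a 16-team league) so that the home park counts about as much
    as any single road park. No HFA or park adjustment is applied here.
    """
    weighted_events = road_events + home_fraction * home_events
    weighted_pa = road_pa + home_fraction * home_pa
    if weighted_pa == 0:
        raise ValueError("no plate appearances to average over")
    return weighted_events / weighted_pa


# Hypothetical hitter: 100 times on base in 300 road PA,
# 112 times on base in 280 home PA, 14-team league.
rate = road_weighted_rate(100, 300, 112, 280, home_fraction=1 / 14)
print(round(rate, 4))
```

With these invented inputs the home line contributes the equivalent of 20 PA out of 320, about what one road park would contribute.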

#7, HFA (home field advantage) and park factors are two different things.  I was mainly referring to the disparate impact that park factors have on players, not the HFA.  Even though we don’t really know what creates the HFA, I don’t think that it has much of a disparate impact on players.  IOW, I am comfortable assuming that all players get around the same advantage from playing at home, or the same disadvantage from playing on the road.  What I am not comfortable with is applying the same park factors to all players.

Sure, one way of mitigating the park factor problem is using component park factors. I don’t think too many serious forecasters DON’T do that.


#10    Jar Jar Binks      (see all posts) 2009/09/21 (Mon) @ 00:02

On average this would work, but a rather big problem is that road and home stands happen in relatively big chunks.

You’re just going to up the standard deviation of already noisy numbers by a pretty significant amount.

You may cut out an entire “hot streak” from a player’s season that is completely unrelated to park effects.

Baseball seems like too volatile of a game for this to work on anything beyond career numbers.


#11    MGL      (see all posts) 2009/09/21 (Mon) @ 02:47

#10, yes of course you are cutting your sample size in half, which is generally not a good idea.  I don’t know that there is a reasonable point at which road stats only are better, but there may be.  Off the top of my head, if a player plays in a relatively typical home park, I wouldn’t even consider it, but if a player plays in an unusual home park, there might be a point at which road stats only, even with the smaller sample size, gives you a more reliable indication of his context-neutral value/talent. 10 years?  I don’t know.  Maybe never.  I wouldn’t be surprised if the best answer is to use road stats weighted more heavily than home stats, regardless of the number of years, again, especially for players who play in unusual home parks, like Coors, Fenway, or even asymmetrical parks like the Giants’ stadium, whatever the heck it is called these days.


#12    Colin Wyers      (see all posts) 2009/09/21 (Mon) @ 10:12

I guess my question is, better for what?


#13    Tangotiger      (see all posts) 2009/09/21 (Mon) @ 10:38

My question is why would anyone take the handle “Jar Jar Binks”, and think we can actually take anything he writes seriously? 

If you were to come up with the most useless characters in a movie of all-time, you’ll be hard-pressed to top “JJB”.

(The guy who played JJB is a funny guy in Stomp.  If you watch Sesame Street, you can see him there too when Stomp guest-starred.)


#14          (see all posts) 2009/09/21 (Mon) @ 13:45

blog.oregonlive.com/behindblazersbeat/2009/01/snoozing_to_stop_losing.html

Semi on topic.  Related to secondary effects of being on the road.


#15          (see all posts) 2009/09/22 (Tue) @ 23:18

Tango: You ask how anyone can take me seriously? My answer: Irony.

99% of the people that comment on blogs do about as much for the post they are commenting on as Jar Jar did for Star Wars. I mock them in ways they’ll never know.

You know you’re just jealous you didn’t come up with it.

--

MGL: It’s unfair to say that all you are doing by dropping home games is cutting the sample in half.

Think of a player’s performance game by game as a sort of random walk that staggers above and below a player’s true talent level. It’s obviously not a true random walk, more a random process that is noisy around a true value and also has groups of steps correlated with each other. An autoregressive model of sorts. If those last two sentences didn’t mean anything to you, just ignore them.

Because you can have a few steps that are correlated with each other, it would be one thing to randomly remove observations on this random walk. This would probably increase the variance a bit.

But, what you are talking about doing is deleting 9 consecutive observations here, 13 there, etc. The fact that consecutive observations are correlated means that you could really be skewing things.

Just sayin’.
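
The block-deletion argument above can be checked with a toy simulation. This is only a sketch under loud assumptions: game-level deviations from true talent follow an AR(1) process, the autocorrelation `phi=0.5` is an illustrative value and not an estimate from real data, and home stands and road trips are fixed 10-game blocks. All names are hypothetical. It compares the variance of the sample mean when half the games are dropped at random versus dropped in alternating blocks.

```python
import math
import random
import statistics


def simulate_ar1(n_games, phi, noise_sd, rng):
    """Deviations from true talent, correlated from game to game."""
    # Start from the stationary distribution so early games are not special.
    x = rng.gauss(0.0, noise_sd / math.sqrt(1.0 - phi * phi))
    series = []
    for _ in range(n_games):
        x = phi * x + rng.gauss(0.0, noise_sd)
        series.append(x)
    return series


def block_keep_indices(n_games, block_len):
    """Keep one block (road trip), drop the next (home stand), and repeat."""
    return [i for i in range(n_games) if (i // block_len) % 2 == 0]


def compare_variances(n_games=160, block_len=10, phi=0.5,
                      trials=2000, seed=0):
    """Return (variance of mean, random half; variance of mean, block half)."""
    rng = random.Random(seed)
    keep_blocks = block_keep_indices(n_games, block_len)
    random_means, block_means = [], []
    for _ in range(trials):
        series = simulate_ar1(n_games, phi, 1.0, rng)
        keep_random = rng.sample(range(n_games), len(keep_blocks))
        random_means.append(statistics.fmean(series[i] for i in keep_random))
        block_means.append(statistics.fmean(series[i] for i in keep_blocks))
    return statistics.pvariance(random_means), statistics.pvariance(block_means)


random_var, block_var = compare_variances()
print(random_var, block_var)
```

In this toy model both estimates stay centered on true talent, so the cost of dropping whole home stands shows up as extra variance rather than bias, and it shrinks toward nothing as `phi` approaches zero.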


#16    Tangotiger      (see all posts) 2009/09/22 (Tue) @ 23:54

JJB: in that case, it’s brilliant!


#17    MGL      (see all posts) 2009/09/23 (Wed) @ 01:27

JJB, not exactly sure what you mean, but I agree (if that is sort of what you meant) that I am not randomly removing half of the sample.  I am definitely creating a biased remaining sample.  The attempt of course is to leave a sample that is less biased, but that is not always going to be the case. 

I would strongly suspect that using road stats only plus a small portion of home stats (in order to represent all parks on an equal basis) better represents a player’s true talent (on the average of course) than taking half (randomly eliminating half) of a player’s total stats.  Same sample size, less bias.  That stands to reason simply because when half of your sample is from one park, that is a problem.  IOW, would I be better off with 320 PA coming from 20 PA per park or 320 PA from the same player, with 160 PA coming from his home park and 160 coming from all other parks in his league?  I think the answer is obvious.

What about 320 PA with 160 from one park versus 300 with equal representation from all parks?  Which would you prefer if you had to estimate a player’s context-neutral true talent?  320/250?  Etc.?  That was my original question and I think it is an interesting one.
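
One way to frame the 320-versus-300 question is with a toy error model. Everything here is an assumption for illustration: the player's rate in each park equals his true talent `p` plus an independent, mean-zero, player-specific park quirk with standard deviation `tau` (what is left over after ordinary park factors), and the estimate is a PA-weighted average across parks. Under those assumptions the mean squared error is binomial sampling noise plus `tau**2` times the sum of squared park weights. The names and the values of `p` and `tau` are hypothetical.

```python
def estimate_mse(n_pa, park_weights, p, tau):
    """Toy MSE of a PA-weighted rate: sampling noise plus park-quirk noise.

    park_weights: share of PA in each park (should sum to 1).
    tau: assumed SD of the player-specific park quirk (hypothetical).
    """
    sampling_var = p * (1.0 - p) / n_pa
    quirk_var = tau ** 2 * sum(w * w for w in park_weights)
    return sampling_var + quirk_var


def break_even_pa(n_pa_mixed, mixed_weights, balanced_weights, p, tau):
    """PA a balanced sample needs to match the home-heavy sample's MSE."""
    target = estimate_mse(n_pa_mixed, mixed_weights, p, tau)
    balanced_quirk = tau ** 2 * sum(w * w for w in balanced_weights)
    return p * (1.0 - p) / (target - balanced_quirk)


n_parks = 16                                       # NL-sized league
mixed = [0.5] + [0.5 / (n_parks - 1)] * (n_parks - 1)   # half the PA at home
balanced = [1.0 / n_parks] * n_parks               # 20 PA per park out of 320

p, tau = 0.330, 0.020   # illustrative on-base rate and park-quirk SD
print(estimate_mse(320, mixed, p, tau))
print(estimate_mse(300, balanced, p, tau))
print(break_even_pa(320, mixed, balanced, p, tau))
```

With `tau` set to zero the larger sample always wins; the bigger the assumed park quirk, the fewer balanced PA are needed to break even. The real size of `tau` is exactly what the original question leaves open.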

And BTW, I simply asked a question and posed a problem.  I made no assertions whatsoever.  So those of you who automatically go into argue and refute mode (and there are lots of you), you’re out of luck on this thread!

