THE BOOK--Playing The Percentages In Baseball


Tuesday, September 30, 2008

HR rates by height

By Tangotiger, 09:37 PM

Reason?  Sampling bias.  Who are the players 6’5” and greater?  And do they appear in both sets (aging)?  You only have 10% of the sample, so wild swings are much more likely.  Create 4 groups of 25% of the ballplayers, and I’d bet you get smooth results.


#1    MGL      (see all posts) 2008/09/30 (Tue) @ 22:33

Although you never know, I would guess most of it is sampling error.  Maybe David can provide the standard errors of those numbers so we can get a sense of how much sampling error there is.

And remember that since we are dealing with two samples, the total standard error (of the difference or ratio between the two samples) is the square root of the sum of the two variances.
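
As a rough illustration of that point, here is a minimal Python sketch of the standard error of a difference between two independent HR rates. The function names and the sample numbers are hypothetical, each rate is assumed to be a simple binomial proportion, and only the difference (not the ratio) is covered.

```python
import math

def binomial_se(rate, n):
    """Standard error of a proportion observed over n trials."""
    return math.sqrt(rate * (1.0 - rate) / n)

def se_of_difference(rate_a, n_a, rate_b, n_b):
    """Standard error of (rate_a - rate_b) for two independent samples:
    the square root of the sum of the two variances."""
    var_a = binomial_se(rate_a, n_a) ** 2
    var_b = binomial_se(rate_b, n_b) ** 2
    return math.sqrt(var_a + var_b)

# Hypothetical HR/PA rates and plate appearances, for illustration only.
se = se_of_difference(0.030, 7000, 0.020, 7000)
print(f"SE of the difference: {se:.4f}")
```

The combined standard error is always larger than either sample's own standard error, which is why a gap that looks large against one sample's noise can shrink once both samples are accounted for.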

And, as Tango implies, unless you are using matched pairs, which I don’t think he is, you may have selective sampling problems.  If the environment changes (steroids, the baseball, whatever), the criteria for who plays the most innings might change.

Or David might be right about the steroids thing.  However, I don’t really buy the “tall players are not good at defense, so they have an incentive to take steroids” argument.  That sounds like a contrived argument to fit the data.

What if only the small and medium players had a precipitous drop-off in HR rates, but not the tall guys?  I could easily say, “Well small and medium guys have incentive to take steroids while the tall guys naturally hit more HR anyway.” Which is true.

Or what if only the tall guys had a large drop-off in HR but not the short and medium ones?  I could say, “The smaller guys (short and medium) are mostly playing for their defense anyway, whereas the tall guys, who are most likely playing for their offense in the first place, need to make sure that they maintain that offense (by taking steroids).”

Got to be really careful about speculation just to fit some unusual data.  I’d much rather see a hypothesis which is on firm ground first, and then an “experiment” with data to support that hypothesis or not.


#2    Rally      (see all posts) 2008/10/01 (Wed) @ 09:15

I’m surprised there are even enough players 6’5” and up to make up 10%.  Besides the 1B and DH, I think you could count them on one hand - Joe Mauer, Troy Glaus, Alex Rios, Matt Kemp…

I don’t think you’d wind up with 10% of players unless you were including pitchers.


#3    SirKodiak      (see all posts) 2008/10/01 (Wed) @ 18:26

6’5” and on 40-man roster right now according to MLB.com:

Frank Thomas
Alex Rios
Corey Hart
Troy Glaus
Chris Duncan
Derrek Lee
Adam Dunn
Tony Clark
Justin Maxwell
T.J. Bohn
Jayson Werth
Ryan Shealy
Mike Hessman
Joe Mauer
Jermaine Dye
Shelley Duncan

I did this manually, so I may have missed some.

It also doesn’t include players who have played this year and are not on a roster, like Sexson. A few of those on the list didn’t play this year. Kemp is listed at 6’2”.  If the bar were 6’4”, there would be a lot more players.


#4    David Gassko      (see all posts) 2008/10/01 (Wed) @ 18:29

I too thought about sampling bias. I guess I’ll have to look into this in a more systematic way. Article upcoming, I suppose…


#5    David Gassko      (see all posts) 2008/10/05 (Sun) @ 03:30

I’ve done some preliminary research - let me know what you think now. I used matched pairs of players who played in 2000 and 2007, weighting their stats by the lower number of plate appearances. Tall players saw a decline in home run rate of 23.8%; short players, 32.3%; and medium-height players, 18.6%. There are 2,455 weighted plate appearances in the tall player bucket, 7,138 in the short player bucket, and 43,517 in the medium-height bucket. If I’m doing my math right, at those levels of plate appearances, all differences are statistically significant.
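
One possible reading of that matched-pairs weighting is sketched below in Python. The comment only says the stats were weighted "by the lower number of plate appearances", so the exact formula, the function name, and the two sample players are all assumptions made for illustration.

```python
def matched_pair_rates(players):
    """Weighted HR/PA in two seasons for players who appear in both.

    Each player is a tuple (pa_first, hr_first, pa_second, hr_second).
    A player's weight is the lower of the two PA totals (an assumed
    interpretation of the weighting described in the comment).
    """
    total_weight = 0.0
    weighted_hr_first = 0.0
    weighted_hr_second = 0.0
    for pa_first, hr_first, pa_second, hr_second in players:
        weight = min(pa_first, pa_second)
        total_weight += weight
        weighted_hr_first += weight * hr_first / pa_first
        weighted_hr_second += weight * hr_second / pa_second
    rate_first = weighted_hr_first / total_weight
    rate_second = weighted_hr_second / total_weight
    return rate_first, rate_second, total_weight

# Two made-up players, not real stat lines.
players = [(600, 30, 500, 20), (400, 10, 450, 9)]
rate_first, rate_second, total_weight = matched_pair_rates(players)
decline = 1.0 - rate_second / rate_first
print(f"weighted PA: {total_weight:.0f}, decline in HR rate: {decline:.1%}")
```

Because each player contributes the same weight to both seasons, a change in who gets the most playing time cannot by itself move the group rate.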

The tall player bucket I’m willing to ignore, since the vast majority of weighted plate appearances can be attributed to just five players (Frank Thomas, Derrek Lee, Richie Sexson, Troy Glaus, and Tony Clark). The results for short players, however, seem a lot more robust, and interesting…

Thoughts?


#6    Peter Jensen      (see all posts) 2008/10/05 (Sun) @ 08:57

David - Why don’t you take a look at Greg Rybarczyk’s HitTracker web site.  He categorizes HRs as “just enough”, “plenty”, and “no doubt”.  I bet you will find that short hitters have a greater percentage of their home runs in the “just enough” category.  As all players age, the distance of their home runs should begin to decrease.  “Just enough” home runs will no longer be just enough.  Hence the greater drop-off for short hitters. That would be my hypothesis.


#7    tangotiger      (see all posts) 2008/10/05 (Sun) @ 11:29

David, the same players over a 7yr time span had a statistically significant change in HR rates.  That’s the claim? 

But are the three different rates (24%, 32%, 19%) statistically significantly different from each other?

All you’ve done is prove that each, on its own, is statistically significantly different from zero.  And even then, you haven’t shown how much different from zero.  Though, obviously, if you have a group of players who are 27 years old in one group and 34 years old in the other group, of course we expect to see a non-zero difference.

You need to lay out your hypothesis and test the null.


#8    David Gassko      (see all posts) 2008/10/05 (Sun) @ 13:38

No, Tom, I’m saying the differences are significantly different from each other. For example, the short players declined 32% in 7,138 weighted plate appearances. The random variance associated with that is equal to .32*(1-.32)/7,138 = .00003. The square root of that, which gives us one standard deviation due to random chance, is .0055.

So the difference between the short players, who experienced a 32% decline, and the medium players, who experienced a 19% decline, is (.32 - .19)/.0055 = 23.5 standard deviations, which is obviously highly significant. If you do the math for all the differences, you’ll see they are all statistically significant.


#9    tangotiger      (see all posts) 2008/10/05 (Sun) @ 18:44

I see your error.

If the HR rate went from .03 to .02 per PA, that’s a 33% drop.  With 7000 PA, that’s 1 SD = .002.  That difference of .01 is 5 SD.  The change itself is highly significant, which is not a surprise.

What you cannot do is take the % change and treat that as the binomial.

But, a .03 to .02 change, compared to say a .03 to .024 change?  I doubt those two differences are statistically significant.
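
To put a rough number on that comparison, here is a minimal Python sketch. It treats each HR rate as a binomial proportion on the HR/PA scale rather than on the percent change, and it assumes 7,000 PA in every group and season and four independent samples. Those sample sizes and the function names are assumptions for illustration, not figures from the thread.

```python
import math

def rate_variance(rate, pa):
    """Binomial variance of an observed HR/PA rate over pa plate appearances."""
    return rate * (1.0 - rate) / pa

def z_between_drops(before_a, after_a, before_b, after_b, pa):
    """Z-score for the gap between two drops in HR/PA.

    Each drop is a difference of two rates, so the gap between the drops
    involves four rates; their variances add (assuming independence).
    """
    drop_a = before_a - after_a
    drop_b = before_b - after_b
    total_variance = sum(
        rate_variance(rate, pa)
        for rate in (before_a, after_a, before_b, after_b)
    )
    return (drop_a - drop_b) / math.sqrt(total_variance)

# .03 -> .02 versus .03 -> .024, with an assumed 7,000 PA per group-season.
z = z_between_drops(0.030, 0.020, 0.030, 0.024, 7000)
print(f"z = {z:.2f}")
```

Under these assumptions the gap between the two drops comes out at roughly one standard deviation, well short of the usual two-standard-deviation threshold.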


#10    David Gassko      (see all posts) 2008/10/06 (Mon) @ 01:20

Okay, fair enough. If I do it your way, the short players saw a drop-off of .009 HR/PA, the medium players, .007, and the tall players, .013. If I compute the random variance using the 2000 home run rates, the short players are 1.22 standard deviations away from the medium, while the tall players are 1.40. If I use 2007 home run rates (slightly lower), those numbers are 1.48 and 1.60, respectively. I don’t know what the correct mean (that is, the correct probability) to use is…

Certainly, that is a smaller difference than what I reported before, but I do still think it is interesting.


#11    tangotiger      (see all posts) 2008/10/06 (Mon) @ 01:47

Interesting but not significant, sure!  Like clutch hitting…

