THE BOOK--Playing The Percentages In Baseball


Thursday, March 11, 2010

“Just Enough” Home Runs

By Tangotiger, 02:11 PM

Greg Rybarczyk has a chart showing that hitters with at least 30 HR, at least 40% of which were “just enough” HR (those that barely cleared the fence), saw their total home runs drop by 23% the next season.

There are two reasons for this drop that are unrelated to the Just Enough effect:
1. Fewer at bats: Greg shows us the raw HR counts, rather than HR per contacted PA or HR per swing.
2. Regression: anyone with at least 30 HR is bound to hit fewer HR the next year.

Greg, in order to improve your study, you need a control group: give us all the 30-HR guys with less than 25% “just enough” and all the 30-HR guys with 25-40% just enoughs. And show the group totals on a per-PA or per-swing basis.

Then show us the before/after. I will guess that after all that, the true drop in HR rate for the “just enoughs” will be somewhere close to 5%.
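The control-group design above can be sketched in a few lines: bucket each 30-HR hitter by the share of his HR that were “just enough”, then pool HR and AB within each bucket so the before/after comparison is on a rate basis. This is a minimal sketch, not Greg's actual method; the `Hitter` record, its field names, and the functions are hypothetical, and whether a hitter at exactly 25% or 40% falls in the lower or upper bucket is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hitter:
    # Hypothetical record: one hitter's totals for year 1 and year 2.
    name: str
    hr1: int  # home runs in year 1
    je1: int  # "just enough" home runs in year 1
    ab1: int  # at bats in year 1
    hr2: int  # home runs in year 2
    ab2: int  # at bats in year 2


def je_bucket(h: Hitter) -> str:
    """Assign a hitter to a bucket by the share of his year-1 HR that were JE."""
    je_pct = h.je1 / h.hr1
    if je_pct < 0.25:  # edge handling at 25% and 40% is an assumption
        return "under 25%"
    if je_pct < 0.40:
        return "25-40%"
    return "40%+"


def pooled_change(hitters, min_hr=30):
    """Pool HR and AB within each bucket and return the year-over-year
    change in HR/AB per bucket (e.g. -0.087 means an 8.7% drop)."""
    totals = defaultdict(lambda: [0, 0, 0, 0])  # hr1, ab1, hr2, ab2
    for h in hitters:
        if h.hr1 < min_hr:
            continue  # only hitters with at least min_hr HR in year 1
        t = totals[je_bucket(h)]
        t[0] += h.hr1
        t[1] += h.ab1
        t[2] += h.hr2
        t[3] += h.ab2
    return {
        bucket: (hr2 / ab2) / (hr1 / ab1) - 1
        for bucket, (hr1, ab1, hr2, ab2) in totals.items()
    }
```

Summing HR and AB before dividing means hitters with more at bats carry more weight, which matches how group totals like “X HR in Y AB” are reported.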


#1    Tangotiger      2010/03/11 (Thu) @ 15:11

OK, the table has been updated. The % change is now based on HR per AB, so that takes care of #1.

My point #2 still stands.


#2    Greg Rybarczyk      2010/03/11 (Thu) @ 15:13

Tom,

I agree, and I’ll show the rest of the data when I can pull it together.  In this post (target: 400 words), there wasn’t room for the look at the “control group”.

Also, the chart initially went up without the right annotation about the data: it is calculated on a HR/AB basis, which showed a 23% drop overall in the group of 13. The chart is fixed now.

When looked at by HR/PA, it was a 22% drop.

More on this later…


#3    statzombie      2010/03/11 (Thu) @ 15:50

How much of an effect do you expect regression to have? My guess is not much.

Unless you are assuming no player has a true HR ability above 30 HR (and guys like Pujols, Fielder, and Howard certainly do not seem to fit that assumption), it seems likely that only the players who were lucky, or well above their mean, should be expected to regress below their previous season's total. And how narrowly their HRs cleared the fence, which Greg is calculating, seems like a very good proxy for that.


#4    Greg Rybarczyk      2010/03/11 (Thu) @ 16:22

OK, here’s the “Control group” data for 2008 hitters. 

I’ve got the 2008 hitters who hit at least 30 HRs and had JE% of 25% or less, with their 2009 results.

The hitters are:

Pat Burrell
Miguel Cabrera
Carlos Delgado
Adam Dunn
Prince Fielder
Jason Giambi
Josh Hamilton
Ryan Howard
Albert Pujols
Hanley Ramirez
Alex Rodriguez
Grady Sizemore
Mark Teixeira
Jim Thome
David Wright

Collectively in 2008, these guys hit 531 HR in 8614 AB, or 1 HR per 16.2 AB.

In 2009, collectively they hit 376 HR in 6684 AB, or 1 HR per 17.8 AB. The change in HR on a per-AB basis for this group is -8.7%.

And while I know I probably shouldn’t say this: if you remove David Wright and his well-publicized *intentional* change in hitting approach, which arguably ought to be figured separately from regular regression processes, you get a change of -4.5%. Now throw something at me for cherry-picking. :)

That is significantly different from the change in the 40%+ JE% group of -23%.

Next: the hitters in 2008 who hit at least 30 HR, and whose JE% was between 25 and 40%. They are:

Jason Bay
Ryan Braun
Jermaine Dye
Adrian Gonzalez
Aubrey Huff
Ryan Ludwick
Carlos Pena
Carlos Quentin
Manny Ramirez
Dan Uggla
Chase Utley

Collectively in 2008, these guys hit 376 HR in 6190 AB, or 1 HR per 16.5 AB.

In 2009, collectively they hit 313 HR in 5552 AB, or 1 HR per 17.7 AB. The change in HR on a per-AB basis for this group is -7.2%.
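The group rates and percentage changes for the two control groups come straight from the pooled totals. A quick check of that arithmetic, using only the totals reported above (the function name is illustrative):

```python
def hr_rate_change(hr1, ab1, hr2, ab2):
    """Return (AB per HR in year 1, AB per HR in year 2, % change in HR/AB)."""
    rate1 = hr1 / ab1
    rate2 = hr2 / ab2
    return ab1 / hr1, ab2 / hr2, 100 * (rate2 / rate1 - 1)


# Pooled totals for the two 2008 control groups: (HR 2008, AB 2008, HR 2009, AB 2009).
groups = {
    "JE% 25 or less": (531, 8614, 376, 6684),
    "JE% 25 to 40": (376, 6190, 313, 5552),
}

for label, totals in groups.items():
    per1, per2, pct = hr_rate_change(*totals)
    print(f"{label}: 1 HR per {per1:.1f} AB -> 1 HR per {per2:.1f} AB, {pct:+.1f}%")
```

The changes work out to about -8.74% and -7.19%, which round to the -8.7% and -7.2% quoted.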

This data from 30+ HR hitters in 2008, with future performance in 2009, suggests that it does not matter a whole lot what the JE% is if it is less than 40%: these two “control groups” were very similar. You can expect a drop in HR output in the high single digits, due to regression.

However, if your JE% is above 40%, you are ripe for a much bigger drop in home run performance the next year, due to “regular” regression plus another special regression due to the JE effect on homers…

This data also suggests that “across the board” regression of 30+ HR hitters is not the best way to do it: the JE% *does* allow an analyst to differentiate within that group.

Now, I’ll have to check 2007 and 2006 later, so maybe I shouldn’t be too sure about this, but the data’s pretty strong so far…


#5    Jamie      2010/03/11 (Thu) @ 16:29

Jayson Werth might be one of the ten strongest men in baseball. Scouts from the Phillies have said that he has more raw power than even Ryan Howard.


#6    Tangotiger      2010/03/11 (Thu) @ 16:46

Greg: good stuff, exactly what I’m looking for.


#7    Greg Rybarczyk      2010/03/11 (Thu) @ 16:52

2007 data (in brief):

JE% <= 25:

HR/AB drops by 12.8%

JE% 26-39:

HR/AB drops by 11.6%

So, a bit higher for this season.  But still only half the drop the 40%+ group sees…

I probably can’t do 2006 until later tonight…


#8    Tangotiger      2010/03/11 (Thu) @ 17:28

Very interesting.  Generally speaking, let’s say that the high JE guys drop by a bit over 20% and the other JE guys drop by 10%.

So, we have something like this:

Control Group (pro-rated to same AB)
year1 - 33 HR: 23 regular, 10 JE
year2 - 30 HR: 21 regular, 9 JE

Test Group (pro-rated to same AB)
year1 - 33 HR: 19 regular, 14 JE
year2 - 26 HR: 17 regular, 9 JE

Basically, the JE regress A LOT while the rest of the HR regress a little.
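The pro-rated illustration can be broken down per HR type to see where each group's drop comes from. This sketch uses only Tango's illustrative numbers; `GROUPS` and `summarize` are names made up for the example.

```python
# Tango's pro-rated illustration: (year 1 HR, year 2 HR) for each HR type.
GROUPS = {
    "control": {"regular": (23, 21), "JE": (10, 9)},
    "test": {"regular": (19, 17), "JE": (14, 9)},
}


def summarize(parts):
    """Return the % change in total HR and the share of each HR type retained."""
    total1 = sum(y1 for y1, _ in parts.values())
    total2 = sum(y2 for _, y2 in parts.values())
    retained = {kind: y2 / y1 for kind, (y1, y2) in parts.items()}
    return 100 * (total2 / total1 - 1), retained


for name, parts in GROUPS.items():
    pct, retained = summarize(parts)
    shares = ", ".join(f"{kind} {share:.0%}" for kind, share in retained.items())
    print(f"{name}: {pct:+.0f}% total HR; retained {shares}")
```

Both groups keep roughly 90% of their regular HR, but the test group keeps only 9 of its 14 JE (about 64%), which is what pushes its total drop to about 21% versus about 9% for the control group.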

***

Greg: if you want to see this more plainly, do this:

1. Pool of players: 20+ HR, not counting the JE HR
2. Test Group: JE of at least 40%
3. Control Group: JE under 30%

This puts the two groups on an equal footing in non-JE HR.

And what you will find in the year x and year x+1 is that the non-JE HR will regress IDENTICALLY (or so my theory would go) for the two groups, and the JE HR will also regress IDENTICALLY.

That is, the non-JE HR might regress 25% toward the league mean, while the JE HR will regress 90% toward the league mean.

It just doesn’t look that way in your chart because both groups have a different number of non-JE HR to begin with.

Hopefully, I’m right in my theory and that will be the payoff you need.

***

Basically, my point is that the regression amount is tied in to the number of JE, regular, long HR per AB.  And we should be able to come up with a regression number like 90%, 30%, 10% or some such.
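Regressing each type of HR by its own amount can be written as moving each per-AB rate part of the way toward the league rate, then adding the pieces back up. A minimal sketch, assuming the placeholder 90% / 30% / 10% amounts map to JE, regular, and long HR in the order listed; the league rates are inputs the caller supplies, not measured values, and the function names are hypothetical.

```python
# Placeholder regression amounts: the share of the gap between a hitter's rate
# and the league rate that is expected to close. Mapping 90/30/10 to
# JE/regular/long follows the order in the comment and is an assumption.
REGRESSION = {"JE": 0.90, "regular": 0.30, "long": 0.10}


def regress_toward_mean(observed_rate, league_rate, amount):
    """Move an observed per-AB rate `amount` of the way toward the league rate."""
    return observed_rate + amount * (league_rate - observed_rate)


def projected_hr_rate(observed, league, regression=REGRESSION):
    """Regress each HR type separately, then add the results.

    `observed` and `league` map HR type ("JE", "regular", "long") to HR per AB.
    """
    return sum(
        regress_toward_mean(observed[kind], league[kind], amount)
        for kind, amount in regression.items()
    )
```

With hypothetical inputs of 0.02 / 0.03 / 0.01 HR per AB (JE / regular / long) for the hitter and 0.01 / 0.015 / 0.005 for the league, the projection is 0.011 + 0.0255 + 0.0095 = 0.046 HR per AB: the JE piece gives back most of its edge, the long piece almost none.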


#9          2010/03/11 (Thu) @ 17:30

Why not just run a linear regression?  Predict 2009 HR by 2008 long HR and 2008 JE HR.

That way you don’t have to worry about arbitrary categories.
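A minimal sketch of that suggestion: fit next year's HR rate on last year's rates for two HR types by least squares with an intercept, optionally weighting each player (J. Cross's fit in #19 weights by sqrt(Y AB × Y-1 AB)). This solves the 3×3 normal equations in plain Python for illustration; `fit_wls` is a hypothetical name, and in practice a statistics library's least-squares routine would normally be used.

```python
def fit_wls(x1, x2, y, weights=None):
    """Fit y = b0 + b1*x1 + b2*x2 by (weighted) least squares.

    Solves the normal equations (X^T W X) b = X^T W y with Gaussian
    elimination and partial pivoting. Returns [b0, b1, b2].
    """
    n = len(y)
    w = weights if weights is not None else [1.0] * n
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]  # intercept column first
    xtx = [[sum(w[i] * rows[i][r] * rows[i][c] for i in range(n)) for c in range(3)]
           for r in range(3)]
    xty = [sum(w[i] * rows[i][r] * y[i] for i in range(n)) for r in range(3)]
    m = [xtx[r] + [xty[r]] for r in range(3)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    b = [0.0] * 3
    for r in range(2, -1, -1):  # back substitution
        b[r] = (m[r][3] - sum(m[r][c] * b[c] for c in range(r + 1, 3))) / m[r][r]
    return b
```

Here `x1` and `x2` would be each player's year-1 rates for two HR types (e.g. long HR/AB and JE HR/AB) and `y` his year-2 HR/AB. Fitting both predictors jointly lets the data say how much each type carries over, without JE% cutoffs.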


#10    MGL      2010/03/11 (Thu) @ 17:45

Good stuff Greg!


#11    MGL      2010/03/11 (Thu) @ 17:46

Greg, can you do the same thing for pitchers?


#12    Guy      2010/03/11 (Thu) @ 17:51

Tango:  Are you sure you want to regress JE-HRs for your control group toward the league mean?  This group actually begins with a below-average # of JE-HRs given their overall power (10, vs. an average of maybe 12).  Isn’t it possible that they will post as many, or even slightly more, JE-HRs in year X+1? (while their non-JEs decline)

In other words, there is a relationship between the two types of HRs, and a player with a high non-JE/JE ratio probably should not be expected to see their JE-HRs decline at all.


#13    Greg Rybarczyk      2010/03/11 (Thu) @ 17:58

Tom, I like that idea, I will pull together the data for it later.

Phil, I’ll see if I can do what you’re describing as well.


#14    Greg Rybarczyk      2010/03/11 (Thu) @ 18:17

MGL, I will check on pitchers, but I may not have time tonight.  Logically, this same sort of effect ought to be detectable…

One thing I need to give some more thought to: if you regress each different type of homer, how do you do that? When regressing a No Doubt homer, do you say “due to regression, you didn’t happen” and take away a HR, or do you say “due to regression, you weren’t hit as far, so now you’re a Plenty HR”, and not take one off the board? In other words, are we regressing fly ball distance, or the end stats? I need to think about that…


#15          2010/03/11 (Thu) @ 18:22

What you might say is that (for instance) a “no doubt” HR this year might be worth 90% of a HR next year.  The 10% difference is due to the fact that some of the HRs are due to getting an extra lucky fat pitch to hit, or just randomly getting better wood on the ball than expected, and that won’t reproduce next year.

Then, you might say a “just enough” HR might be worth only 60% of a HR next year.  10% of the decline is the same as above.  The other 30% is that you were lucky the ball went 405 feet and not 399 feet.

Anyway, I’m not sure if that’s your question.  The point is that if you just regress on the two variables—ND rate and JE rate—the equation gives you the prediction for next year.  It tells you how much to regress, and you then interpret what the results mean.


#16    Guy      2010/03/11 (Thu) @ 18:51

Looking at this by age might also be interesting (though sample size issues are daunting).  A young player with a lot of JEHRs might still be gaining strength, and thus regress less than an older player.  If you break this into <29 and 29+, might see a difference.


#17    Tangotiger      2010/03/11 (Thu) @ 18:58

Phil,

You are absolutely right that running a regression will give us the answer. It’s just that real-life examples make it look more real. And it might give us some extra insight.

For example, a JE% of 40% or higher seems like a bigger break than anything below that. We’d only know that by looking at the data. Or it might be luck. So there’s value in doing it the way Greg is doing it.

He can add a “dummy variable” for “JE > 40” in addition to the JE rate in the regression equation.

And yes, in the end, we’d like to have a regression equation.
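Adding the dummy variable amounts to one extra input column per player. A minimal sketch, assuming “JE rate” here means JE as a share of the hitter's HR; `design_row` is a hypothetical helper whose rows could be fed to any least-squares fit.

```python
def design_row(je_hr, total_hr, threshold=0.40):
    """Build one regression input row: [intercept, JE share, JE-share dummy].

    The dummy is 1.0 when the hitter's JE share exceeds `threshold`, so the
    fitted model can add an extra step at that point on top of the linear
    JE-share effect.
    """
    je_share = je_hr / total_hr
    return [1.0, je_share, 1.0 if je_share > threshold else 0.0]
```

If the fitted coefficient on the dummy comes out near zero, the apparent break at 40% is likely noise; if it is large relative to its standard error, the step is real.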


#18          2010/03/11 (Thu) @ 19:02

Tango/17: Yup, I like to see it both ways.  I only suggest a regression because the breakdown didn’t seem to show a difference in the first two groups.

I’d run a separate regression for those groups too, just to make sure it isn’t just a >40 effect.  I don’t see why it should be just a >40 effect, other than perhaps noise in the data ...

Maybe the <25s are concentrated at 24, and the 26-39s are concentrated at 28, which is why there’s no difference observed.  A regression might tease that out.


#19    J. Cross      2010/03/11 (Thu) @ 20:20

Trying a regression (and I hope I’m not butting in):

For guys with 30 or fewer HR in Y-1 (n=725) I get:

Y HR% = 1.23*Y-1 ND% + 0.87*Y-1 JE%

(the standard errors in the coefficients are .11 and .094)

and for guys with more than 30 HR in Y-1 (n=136) I get:

Y HR% = 1.17*Y-1 ND% + 0.51*Y-1 JE%

(standard errors in the coefficients are .19 and .18)

HR% = HR/AB
JE% = JE/AB
ND% = ND/AB

and these are weighted by sqrt(Y AB*Y-1 AB)


#20    J. Cross      2010/03/11 (Thu) @ 20:26

I left out the intercepts in those equations.

+0.019 for the <= 30 group
+0.036 for the >30 group
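With the intercepts added, J. Cross's two equations can be applied directly. In this sketch the coefficients are the ones reported above; the function name and the use of the prior-season HR total to pick the equation are illustrative.

```python
# J. Cross's fitted equations: (intercept, ND coefficient, JE coefficient).
# All rates are per AB: HR% = HR/AB, ND% = ND/AB, JE% = JE/AB.
COEFS = {
    "30 or fewer HR": (0.019, 1.23, 0.87),
    "more than 30 HR": (0.036, 1.17, 0.51),
}


def predict_hr_rate(prev_hr, nd_rate, je_rate):
    """Predict year Y HR/AB from year Y-1 ND/AB and JE/AB."""
    key = "30 or fewer HR" if prev_hr <= 30 else "more than 30 HR"
    b0, b_nd, b_je = COEFS[key]
    return b0 + b_nd * nd_rate + b_je * je_rate
```

For example, a hitter coming off 35 HR with ND/AB and JE/AB both at 0.020 projects to 0.036 + 1.17(0.020) + 0.51(0.020) = 0.0696 HR per AB. In the >30 group, each JE homer carries less than half the weight of a no-doubter (0.51 vs 1.17).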


#21          2010/03/11 (Thu) @ 23:28

How do I get access to all the hit tracker data?  I can’t seem to find out how at the site.


#22    Nick Steiner      2010/03/11 (Thu) @ 23:50

I don’t see why >40% is some magical number that makes you prime for extra regression.  If the 25%-40% group shows no change from the <25% group, the fact that the change is so much bigger in the 40% group is likely just noise. 

I agree with Phil that a regression would be the best way to go.  x = 2008 JE rate, y = HR/AB drop% from 2008 to 2009.


#23          2010/03/12 (Fri) @ 00:14

In Ron Shandler’s Forecaster this year, there was an article about how very few players are capable of hitting opposite field home runs, and that players that do are either elite power sources, or candidates to break out with high numbers the next season.

My guess is that opposite field home runs are more likely to be “just enough” home runs.  And I wonder if these home runs are qualitatively different than other “just enough” home runs.


#24          2010/03/12 (Fri) @ 00:59

Josh/23

All the data is on the site. There are, on average, 50 pages worth of HR data per season since 2006. You can copy and paste each page into a spreadsheet and add IDs to run queries on it.

HTH.

P.S. A bunch of fantasy saberists have looked at JE in the past (and recently over at 911), as have I, and have found little actionable info. What you want is a 70% percentage play or better, which this is not for the vast majority of the hitter pool. Now for “real” baseball, see the post above regarding the data being worth 500K to a team on average, assuming they were in a position to put it to use.


#25    Tangotiger      2010/03/12 (Fri) @ 12:03

Nick: it’s not that you need to draw a bright line at 40%. It’s always a continuum. But rather than a linear line, it can be exponential. And so, rather than having JE as a parameter, you can have JE and JE^2 and JE^3.

If, let’s say, you have JE^3, then the gap between 50% and 40% is about the same as the gap between 40% and 0%. So just saying “> 40%” brings that out better than saying JE^3. (Just for illustration purposes.)
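The JE^3 example works out as 0.5^3 - 0.4^3 = 0.125 - 0.064 = 0.061, while 0.4^3 - 0^3 = 0.064, so the two gaps are close but not identical. A small sketch that builds the polynomial terms and checks those gaps (`poly_features` is a hypothetical helper):

```python
def poly_features(je_share, degree=3):
    """Return [JE, JE^2, ..., JE^degree] as candidate regression inputs."""
    return [je_share ** p for p in range(1, degree + 1)]


# Compare the cubic term's gap from 40% to 50% with its gap from 0% to 40%.
gap_high = poly_features(0.50)[2] - poly_features(0.40)[2]
gap_low = poly_features(0.40)[2] - poly_features(0.00)[2]
print(f"40% -> 50%: {gap_high:.3f}   0% -> 40%: {gap_low:.3f}")
```

In a fitted model, the coefficients on JE, JE^2, and JE^3 would decide how sharply the effect bends, rather than a fixed cutoff.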


#26          2010/03/12 (Fri) @ 19:55

The AL played quite differently than the NL in HR from 2008 to 2009:

AL 2008 1.00 HR/G
AL 2009 1.13 HR/G

+13%

NL 2008 1.01 HR/G
NL 2009 0.96 HR/G

-5%
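The two league changes are simple percentage changes in HR per game; the NL figure is -4.95%, which rounds to the -5% shown. A quick check (the helper name is illustrative):

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100 * (after / before - 1)


for league, (hr_g_2008, hr_g_2009) in {"AL": (1.00, 1.13), "NL": (1.01, 0.96)}.items():
    print(f"{league}: {pct_change(hr_g_2008, hr_g_2009):+.1f}%")
```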

Given the above numbers, lumping AL and NL hitters together in this study might lead to misleading results. The new Yankee Stadium and the new Mets stadium might account for as much as 3% of the difference in each league (so I would exclude Yankee and Mets hitters; call it cherry picking if you will), but something else (??) was at play last year.


#27    MGL      2010/03/13 (Sat) @ 00:19

Interestingly, in the NL it was .55 degrees cooler in 09 than in 08, and in the AL it was 1.5 degrees cooler (according to the Retrosheet game-time temps), so that makes the difference in home run rates between the two leagues even stranger.

Certainly, a change in quality of the hitters and pitchers for the two leagues (from 08-09) could account for that.  Then again, it could be noise.

