THE BOOK--Playing The Percentages In Baseball


Wednesday, December 15, 2010

Matt and batted balls, part 2 (a thread about baseball)

By Tangotiger, 02:27 PM

There’s tons of information here.  I’ll just point out this and get back to the article tomorrow:

“The .025 correlation year to year on line-drive BABIP is particularly surprising because it is at odds with previous research. Six years ago, Mitchel Lichtman found that line-drive BABIP was persistent for pitchers, but look at the line-drive BABIP net of team line-drive BABIP and this unravels. This is a mixture of team defense adjustment and official scorer adjustment, but it un-teaches something important about pitcher BABIP that many of us thought we knew.”

Except MGL also provided the correlation for the team-switchers, and the correlation for the 107 team-switching pitchers had an r = .365 for BABIP on OF line drives.  I don’t think we’re being untaught anything yet.

***

Anyway, Matt’s data is too overwhelming to analyze while working at the office.  I need time.  Maybe the smarty-pants out there are quicker on the draw here and can interpret some of this data.


#1    Matt Swartz      (see all posts) 2010/12/15 (Wed) @ 16:42

To avoid it getting lost, my response was the following when this was a one-part thread: “The outfield versus infield line drives thing I did not test-- the team switchers thing is notable.  I’m guessing that outfield line drives and fly balls look a lot alike, and outfield fly balls do have pretty high persistence in BABIP, so there is probably an increasing skill with controlling BABIP on balls launched with a certain angle.  Something like control of batted balls goes down to almost nothing when launched at +/- 90 degrees and 0 degrees, but is particularly high at about +/- 30 degrees.  Just throwing numbers out there.”

The key to the article was not the line drives thing so much as highlighting that:
(1) Ground ball pitchers have better BABIP and SLGBIP on ground balls than other pitchers.
(2) Ground ball pitchers have worse BABIP and SLGBIP on fly balls than other pitchers.
(3) As you increase GB%, BABIP goes up-- and then back down.  SLGBIP too.  Because we’ve been looking at this linearly, the curvature wasn’t clear originally.  (The pictures in the article are working now to highlight this.)


#2    Tangotiger      (see all posts) 2010/12/15 (Wed) @ 17:25

Matt, in your case, “BIP” includes or excludes HR from the denominator?  Because I don’t see why we would need to exclude it.  Therefore, it’s not BIP but BB (batted balls), or CON (contacted balls) as Colin calls it.  So, BACON and SLGCON. 

Can you confirm which way you are doing it?


#3    Matt Swartz      (see all posts) 2010/12/15 (Wed) @ 17:27

BIP excludes HR from the denominator.  In tomorrow’s article I dig into HR/FB more.  I guess the two results can be combined to say something about BACON and SLGCON, but I didn’t do that in either article.
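
To make the two denominators concrete, here is a minimal sketch of BABIP/SLGBIP (home runs removed from numerator and denominator) next to BACON/SLGCON (home runs kept in both). The function name, the argument names, and the stat line are made up for illustration, and sacrifice flies and other scoring details are ignored.

```python
def rate(numerator, denominator):
    """Safe division: returns NaN when there is nothing in the denominator."""
    return numerator / denominator if denominator else float("nan")


def contact_rates(singles, doubles, triples, hr, batted_balls):
    """Return BIP-based and contact-based rates for one pitcher.

    batted_balls counts every contacted ball, home runs included.
    """
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    bip = batted_balls - hr  # balls in play: home runs removed

    return {
        # HR excluded from numerator and denominator
        "BABIP": rate(hits - hr, bip),
        "SLGBIP": rate(total_bases - 4 * hr, bip),
        # HR included in numerator and denominator
        "BACON": rate(hits, batted_balls),
        "SLGCON": rate(total_bases, batted_balls),
    }


# Hypothetical stat line, not a real pitcher.
line = contact_rates(singles=100, doubles=30, triples=3, hr=20, batted_balls=500)
for name, value in line.items():
    print(f"{name}: {value:.3f}")
```

The gap between SLGBIP and SLGCON grows with the number of home runs allowed, which is the part of the picture that a BIP-only view leaves out.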


#4    Tangotiger      (see all posts) 2010/12/15 (Wed) @ 17:49

Well, I don’t understand why you would exclude HR then.  If Brett Myers gives up a lot of HR, why would we care about his SLG on BIP (excluding HR)?

You would do that if you are focusing on fielding.  But, that’s not what you are doing here.

You have GB pitchers with a worse SLG than FB pitchers, but what if FB pitchers give up tons of HR?  Why would they not be part of the equation?

There’s a time and place for excluding HR, and this one, unfortunately for you, is not it.  Unless you can make the case as to why you want to do this.


#5    Matt Swartz      (see all posts) 2010/12/15 (Wed) @ 17:54

You make a good point, though tomorrow’s article as I study HR/FB might shed some light on why I left that out. Specifically the last line of today’s article is: “This statistic has a low year-to-year correlation, lower than BABIP does (.07 for HR/FB versus .13 for BABIP, both net of team rates) - but SIERA still gives us some help at picking up some of this effect.” Maybe I should check BACON and SLGCON too though, but I’m somewhat fixated on how catchable the BIP are from a given pitcher so that’s been how I focused it.

A little tip of my hand for tomorrow’s article where this will be more detailed-- fly ball pitchers do not allow more home runs per outfield fly ball than ground ball pitchers do.


#6    Tangotiger      (see all posts) 2010/12/15 (Wed) @ 18:00

Matt: I understand why you did it, because I’ve been in that same “fixated” frame of mind.

There could be good reasons why FB pitchers don’t do that: it’s a skill that’s been selected for.  Imagine you are a FB pitcher: how long are you going to last in MLB if you allow a lot of HR?

If you allow a lot of HR/FB, you probably have a sh-tload of strikeouts to compensate.  If you are a low K FB pitcher, you probably would have to have a low HR/FB rate, otherwise, you won’t last long enough to make the sample.

Just throwing out some possibilities that should be looked into…


#7    Tangotiger      (see all posts) 2010/12/15 (Wed) @ 18:05

Oh, and I’ve already shown that there IS a HR/FB skill.  The noise overwhelms the signal when you look at it on a one-year basis, but, fortunately, we don’t need to self-constrain in that manner.  Unless we really intend to not look for the signal.


#8    Matt Swartz      (see all posts) 2010/12/15 (Wed) @ 18:16

Well, it’s perfectly plausible that ground ball pitchers could allow more HR/FB given that they would still allow fewer HRs.  It’s just that they don’t.  The relationship disappears after you adjust.

To adjust for different skills at once, I used regression.  It doesn’t model it, but it can find correlations after controlling for another variable, which is how I used it.  You’ll let me know tomorrow if you think I hit on the right stuff.  It sounds like we thought about it similarly from what you’re saying.

HR/(ifFB + ofFB) is about as much of a skill as BABIP.  HR/(ofFB) is less of a skill than BABIP.  One of the things I discuss tomorrow is how K% is correlated with the HR/(ofFB), and how SIERA works that in just like it works in BABIP skills.

Brett Myers is an interesting counterexample/natural experiment to the issues surrounding HR/(ofFB), because he gives up a lot historically even though he DOES strike people out.  Personally, I think it’s an issue of pitch selection.  While most pitchers who select pitches poorly or predictably will have this poor skill washed out by ceding pitch selection to a catcher or pitching coach (thereby reducing the effect the pitch selection skill has on their pitching outcomes), Myers has historically been reported to be incredibly stubborn.  I’ve never seen a pitcher refuse to throw curveballs the first time through the lineup and actually announce to the media that he was doing so.  It’s like playing with a clear glove.  I think hitters know what’s coming too often with him, making him an exception to the rule that pitchers are generally similar in HR/(ofFB) outcomes in similar classes of K%.


#9    Tangotiger      (see all posts) 2010/12/15 (Wed) @ 19:44

"HR/(ofFB) is less of a skill than BABIP.”

What you are really saying is:
“single-season HR/ofFB shows less signal:noise ratio than single-season nonHR hits per BIP”

Of course, with a smaller sample size (ofFB), you would expect that even if they were equally predictive.  (The smaller the sample, the lower the r, all other things equal.  That’s simply implied by the regression equation.)

Nonetheless, I don’t find that to be true.  Perhaps it’s because I’m capturing the park effect as part of the correlation, but I don’t remember now.


#10    Matt Swartz      (see all posts) 2010/12/15 (Wed) @ 19:57

Touche-- sample size affects the correlation.  I guess in being about 30% of batted balls, the outfield fly ball sample is smaller, so a correlation a little more than half (sqrt(.3)-ish) as big might mean similar variance in skill level.

But actually, I think the variance due to sample size should be smaller for statistics further from 0.5 (p*(1-p) = .21 for BABIP, .11 for HR/ofFB), so that may cancel out a lot of the effect.

I guess I’m saying that you shouldn’t be looking at one year of HR/ofFB as more predictable than one year of BABIP, and several years of BABIP will be closer to a pitcher’s true skill level than several years of HR/ofFB anyway.
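
The two effects traded in #9 and #10 (fewer opportunities, but a rate further from 0.5) can be put side by side with a rough sketch. It assumes pure binomial noise, the same number of opportunities in both years, and a true-talent spread (sd_true) that is simply invented here; none of these numbers come from either study.

```python
def expected_r(p, n, sd_true):
    """Expected year-to-year correlation of an observed rate.

    Assumes each year's observed rate is true talent plus binomial noise
    with variance p*(1-p)/n, and that n is the same in both years.
    """
    noise_var = p * (1 - p) / n
    true_var = sd_true ** 2
    return true_var / (true_var + noise_var)


# Hypothetical inputs: 500 balls in play, about 30% of batted balls as
# outfield flies, and the same (assumed) talent spread for both stats.
r_babip = expected_r(p=0.30, n=500, sd_true=0.010)
r_hr_offb = expected_r(p=0.125, n=150, sd_true=0.010)

print(f"BABIP    p*(1-p)={0.30 * 0.70:.3f}  expected r={r_babip:.3f}")
print(f"HR/ofFB  p*(1-p)={0.125 * 0.875:.3f}  expected r={r_hr_offb:.3f}")
```

Under this simple model, the smaller sample pulls the HR/ofFB correlation down and the smaller p*(1-p) pushes it partway back up, so equal r values would not imply equal spreads in skill.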


#11    Brian Cartwright      (see all posts) 2010/12/16 (Thu) @ 00:24

A few months ago I ran some numbers on what pitchers give up, divided into groups by gb%.

Although my study was not as rigorous as Matt’s (it’s been 25 years since college math and Matt has the PhD) I found the same things as Matt did.

Although Matt says in #5 “fly ball pitchers do not allow more home runs per outfield fly ball than ground ball pitchers do” which I found as well - it was the least varying stat - there is a difference in ldhr/ld, about double. If you take (ldhr+fbhr)/(ld+fb) then fb pitchers do allow about 27% more hr’s per outfield airball.

gb% slgcon  babip  _fbhr  _ldhr   _hr
.30   .564   .282   .124   .027  .091
.35   .536   .288   .114   .023  .081
.40   .529   .293   .115   .022  .080
.45   .518   .301   .115   .020  .077
.50   .500   .301   .112   .020  .073
.55   .483   .303   .115   .017  .071
.60   .467   .298   .123   .016  .073
.65   .443   .300   .113   .013  .066
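
As a quick check on how flat each column is, the sketch below parses the table above and compares the lowest-GB% bucket to the highest. This is only one way to slice it; the grouping behind the 27% figure is not given, so the extreme-bucket ratio printed here need not match it.

```python
TABLE = """\
.30 .564 .282 .124 .027 .091
.35 .536 .288 .114 .023 .081
.40 .529 .293 .115 .022 .080
.45 .518 .301 .115 .020 .077
.50 .500 .301 .112 .020 .073
.55 .483 .303 .115 .017 .071
.60 .467 .298 .123 .016 .073
.65 .443 .300 .113 .013 .066
"""

COLUMNS = ("gb", "slgcon", "babip", "fbhr", "ldhr", "hr")

# One dict per GB% bucket, ordered from most fly-ball to most ground-ball.
rows = [dict(zip(COLUMNS, map(float, line.split())))
        for line in TABLE.strip().splitlines()]


def spread(column):
    """Ratio of the lowest-GB% bucket to the highest-GB% bucket."""
    return rows[0][column] / rows[-1][column]


for column in ("fbhr", "ldhr", "hr"):
    print(f"{column}: low-GB bucket / high-GB bucket = {spread(column):.2f}")
```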


#12    Matt Swartz      (see all posts) 2010/12/16 (Thu) @ 00:48

Not sure about that, but I’m not seeing it when I work net of team.  I’m guessing the ldhr phenomenon is classification of batted ball issues and even the ld/gb batted ball classification could be an issue for the (ld+fb) thing.  Two questions:
(1) is this net of team for batted ball rates and home run rates?
(2) is this including infield pop-ups?

For (GB% - team GB%) I get a -.085 correlation with next year’s [(HR/(LD+ofFB+ifFB))-team(HR/(LD+ofFB+ifFB))] for pitchers with 300 batted balls (BIP+HR) in both years, and -.094 correlation with this year’s.  I’m getting a slightly higher .090 correlation for just [(HR/(LD+ofFB+ifFB))-team(HR/(LD+ofFB+ifFB))] with itself.

There does seem to be some skill here, so there’s probably a way to tease it out better, but it should all be net of team or normalized in some way, and the general point is that you’re not getting a very different amount of skill with HR/FB-type stats than you do with BABIP.
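
For anyone who wants to replicate this kind of number, here is a minimal sketch of a year-to-year correlation with a minimum of 300 batted balls in both years. The tuple layout and the sample rows are hypothetical; only the 300 cutoff comes from the comment above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def year_to_year_r(seasons, min_batted_balls=300):
    """seasons: (net_rate_y1, batted_balls_y1, net_rate_y2, batted_balls_y2).

    Keeps only pitchers who reach the cutoff in BOTH years, then correlates
    the year-1 net-of-team rate with the year-2 net-of-team rate.
    """
    kept = [(r1, r2) for r1, n1, r2, n2 in seasons
            if n1 >= min_batted_balls and n2 >= min_batted_balls]
    xs, ys = zip(*kept)
    return pearson_r(xs, ys), len(kept)


# Made-up rows; the last pitcher misses the cutoff in year 1 and is dropped.
sample = [
    (0.01, 400, 0.02, 350),
    (-0.02, 500, -0.04, 320),
    (0.00, 310, 0.00, 600),
    (0.05, 100, -0.90, 400),
]
r, n_pitchers = year_to_year_r(sample)
print(f"r = {r:.3f} over {n_pitchers} pitchers")
```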


#13    Brian Cartwright      (see all posts) 2010/12/16 (Thu) @ 01:01

My last three columns are fbhr/fb, ldhr/ld and (ldhr+fbhr)/(ld+fb). I did not use popups.

So my _hr is the pct of airballs in the outfield that were hr’s. I’d rather use this than break it down into ld & fb, as that is subjective, but it was interesting to see that the fbhr was very flat while ldhr was not. My thinking is that gb pitchers lower the mean angle off the bat (illustrated by a limited hitfx sample). Some high flies are lowered and then go out of the park, while some lower angle fly hr’s drop and are reclassified as ld’s. Net is that fbhr doesn’t change, while ldhr goes up 50% (former ld’s that come off at a lower angle are then sharp grounders).

“(1) is this net of team for batted ball rates and home run rates? “

Don’t understand the question.

I grouped pitchers by their gb%, then looked at the cumulative stats for each group. Quick and dirty, but I believe I got the same meanings out of it that you did, except for the overall hr%


#14    tangotiger      (see all posts) 2010/12/16 (Thu) @ 02:35

When you do net of team, do you do OTHER teammates, or are you including the pitcher himself?  Obviously, it makes no sense to compare a pitcher to himself, and at the team level, a starting pitcher is going to get 13%-15% of the innings.  That’ll bias the results toward seeing less variability than there actually is.
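
The difference between the two versions of "net of team" is easy to see with a toy staff. The sketch below uses invented HR and fly ball counts and assumes every staff has at least two pitchers with fly balls (otherwise the teammates-only rate is undefined).

```python
def net_including_self(staff):
    """Pitcher HR/FB minus the full team HR/FB (pitcher is in both terms)."""
    team_hr = sum(hr for hr, fb in staff.values())
    team_fb = sum(fb for hr, fb in staff.values())
    return {name: hr / fb - team_hr / team_fb
            for name, (hr, fb) in staff.items()}


def net_of_teammates(staff):
    """Pitcher HR/FB minus the HR/FB of the OTHER pitchers on the team."""
    team_hr = sum(hr for hr, fb in staff.values())
    team_fb = sum(fb for hr, fb in staff.values())
    return {name: hr / fb - (team_hr - hr) / (team_fb - fb)
            for name, (hr, fb) in staff.items()}


# Hypothetical team-season: name -> (home runs, fly balls).
staff = {"A": (30, 200), "B": (10, 100), "C": (20, 200)}

with_self = net_including_self(staff)
teammates_only = net_of_teammates(staff)
for name in staff:
    print(f"{name}: incl. self {with_self[name]:+.3f}  "
          f"teammates only {teammates_only[name]:+.3f}")
```

Because the pitcher's own results sit inside the team rate in the first version, his net figure is pulled toward zero, which is the shrinkage in variability described above.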


#15    MGL      (see all posts) 2010/12/16 (Thu) @ 02:52

A few comments as I read through the first part of the series, which is very good so far:

“There are two more important reasons why Skill-Interactive Earned Run Average (SIERA) is so successful at predicting the following year’s ERA. First, most other Defense-Independent Pitching Statistics, like FIP and xFIP, assume that pitchers have no control over their Batting Average on Balls in Play (BABIP), but we know that they do have some control.”

This may have been discussed already in the first thread, but this is an unfair characterization of FIP and xFIP, as compared to something like SIERA.  No one claims that a fixed BABIP is correct.  FIP is a shortcut.  It is simply a 100% regression, where something less than 100% is correct.  Any metric that regresses BABIP somewhat properly (maybe 90% for one year, 80% for two years, etc.) is going to do better than FIP in predicting another year’s ERA.  FIP will do better for one or two years (and maybe 3) than not regressing at all.  For many years, not regressing will do better than FIP.  We know all that.  To “criticize” FIP because it regresses BABIP 100% is a little bit unfair and misleading I think.  In addition, as Matt points out, the regression should be towards different means, depending upon the GB rate of the pitcher.  As well, BABIP does not tell the whole story.  The SLG portion needs to be taken into account too.
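
The "100% regression" framing can be written down in a couple of lines. In this sketch the observed BABIP and the mean are invented, and the 90% and 80% amounts are just the illustrative "maybe" figures from the paragraph above, not estimated values.

```python
def regress(observed, mean, regression):
    """Shrink an observed rate toward a mean.

    regression=1.0 ignores the observation entirely (the FIP-style
    shortcut for BABIP); regression=0.0 takes the observation at face value.
    """
    return mean + (1.0 - regression) * (observed - mean)


observed_babip = 0.280  # hypothetical pitcher
group_mean = 0.300      # hypothetical mean for pitchers with a similar GB rate

for label, amount in [("100% (FIP-style)", 1.0),
                      ("90% (one year, illustrative)", 0.9),
                      ("80% (two years, illustrative)", 0.8),
                      ("0% (no regression)", 0.0)]:
    print(f"{label}: {regress(observed_babip, group_mean, amount):.3f}")
```

Swapping group_mean for a value that depends on the pitcher's GB rate is where the "different means" point would enter.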

“Each of these guys has a career BABIP on ground balls that is at least 12 points below their teams’ ground-ball BABIP. This is not a coincidence. They don’t just induce contact with downward trajectories - they induce ground balls that are easier to field.

“Do ground-ball pitchers induce weak contact on all balls in play? No. The reverse seems to be true for fly balls. Looking at outfield fly balls only, and excluding home runs, ground-ball pitchers have a distinctly higher BABIP and slugging average on balls in play.”

We’ve (I’ve) known this for years.  That is why I have an adjustment in UZR for the GB and FB rate of the pitchers.  GB pitchers give up easier to field ground balls and harder to field fly balls, and vice versa for FB pitchers.  I have written about this many times – particularly in my UZR primers.  The reason should be obvious.  When a batter hits a ground ball from a ground ball pitcher, it tends to get beaten into the ground.  At least at one end of the ground ball spectrum.  When a ground ball is hit off of a fly ball pitcher, it tends to be hit hard, at the other end of the ground ball spectrum.  Same thing in reverse for fly ball pitchers.

“However, these ground-ballers do not exhibit any tendencies toward line-drive BABIPs and infield pop-up BABIPs that are different than other pitchers. There is almost zero correlation year to year for BABIP on line drives or pop-ups for any pitchers.”

This should also be somewhat obvious or at least not too surprising.  A line drive is a line drive and a pop fly is a pop fly.  Imagine that the stringers are told this:  “Record a ground ball as any ball that hits the ground at least once in front of where a normal infield plays and a fly ball for any air ball that does not have a ‘line drive’ trajectory, in your opinion.” There could be hard hit or softly hit ground balls and fly balls.  For LD and pop flies, the stringers are essentially told:  “Any air ball that hangs more than 3 seconds per 100 feet in distance is a pop fly and any air ball that hangs less than 1 second per 100 feet is a line drive.” The range of “hardness” for pop flies and line drives is limited.  IOW, by classifying a batted ball as a line drive or a pop fly, the stringer is already telling us approximately how hard the ball was hit.
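
The hang-time rule of thumb in that last instruction can be sketched as a tiny classifier. The 3-second and 1-second cutoffs are the illustrative ones from the paragraph above, not actual stringer guidelines, and the function name and inputs are hypothetical.

```python
def classify_air_ball(hang_time_s, distance_ft):
    """Label an air ball by seconds of hang time per 100 feet of distance.

    Assumes distance_ft > 0. Anything between the two cutoffs is left as
    a generic fly ball, whose "hardness" can still vary a lot.
    """
    seconds_per_100ft = hang_time_s / (distance_ft / 100.0)
    if seconds_per_100ft > 3.0:
        return "pop fly"
    if seconds_per_100ft < 1.0:
        return "line drive"
    return "fly ball"


# Made-up batted balls: (hang time in seconds, distance in feet).
for hang, dist in [(5.0, 120.0), (2.0, 250.0), (5.0, 330.0)]:
    print(hang, dist, classify_air_ball(hang, dist))
```

Once a ball lands in the pop fly or line drive bin, its hang time per foot is already pinned to a narrow band, which is why those labels carry most of the information about how hard the ball was hit.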

“The .025 correlation year to year on line-drive BABIP is particularly surprising because it is at odds with previous research. Six years ago, Mitchel Lichtman found that line-drive BABIP was persistent for pitchers, but look at the line-drive BABIP net of team line-drive BABIP and this unravels. This is a mixture of team defense adjustment and official scorer adjustment, but it un-teaches something important about pitcher BABIP that many of us thought we knew.”

First of all, the two methodologies are slightly different.  One uses the “net team” method to remove park, defense, and possible stringer bias.  The other (my 2004 study) uses only those pitchers who switched teams to do the same thing. I think my method is better, but I am not sure.  Second of all, we are using different years and different data sets.  I was using 1993-2002 STATS data and Matt is using more recent BIS (I think) data.  Third of all, and more importantly, when one study conflicts with another study, it “un-teaches us something we already knew?” Huh?  Or am I interpreting that incorrectly (that his study is right and my study is wrong)?


#16    Matt Swartz      (see all posts) 2010/12/16 (Thu) @ 10:53

Brian/13:
I agree with your assessment and hitf/x analogy. The “net of team batted ball rates and home run rates” was asking whether you subtracted the team batted ball rates and the team home run rates out, or normalized for park and scorer bias some other way.  I did {HR/FB - team(HR/FB)}.

Tango/14:
Darn. You’re right. I screwed that up; that’ll lower the correlation, I think.  I don’t think it’ll change it extraordinarily, but it’s definitely a coding mistake.

MGL/15:
Re: BABIP-- Yeah, that’s discussed in the other thread.  I meant “assumes” as in “the model assumes” for tractability.  It’s not a criticism so much as highlighting a limitation of the FIP model.  SIERA has model limitations too, and I tried to make that clear.  SIERA effectively regresses BABIP less than 100%, which is a benefit of the model relative to FIP.  The obvious model benefits of FIP include the fact that it measures the run-scoring effect of HR, BB, and K more exactly.  Not meant as a slight.

Re: GB beaten into ground-- This is exactly what I had in mind.  I think that probably the best expression of a pitcher’s batted ball tendencies could be something like a normal or at least unimodal distribution of launch angles where the mean of the launch angle is lower for high-GB pitchers.
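
A rough sketch of that idea: treat launch angle as normal, shift only the mean, and see how the batted ball mix moves. The standard deviation and every angle cutoff below are placeholders chosen for illustration, not measured values.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and sd sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def batted_ball_mix(mean_angle, sd_angle=25.0,
                    gb_max=10.0, ld_max=25.0, fb_max=50.0):
    """Share of GB / LD / FB / PU for a given mean launch angle (degrees).

    Cutoffs are hypothetical: below gb_max is a ground ball, up to ld_max a
    line drive, up to fb_max a fly ball, and anything steeper a pop-up.
    """
    below_gb = normal_cdf(gb_max, mean_angle, sd_angle)
    below_ld = normal_cdf(ld_max, mean_angle, sd_angle)
    below_fb = normal_cdf(fb_max, mean_angle, sd_angle)
    return {
        "GB": below_gb,
        "LD": below_ld - below_gb,
        "FB": below_fb - below_ld,
        "PU": 1.0 - below_fb,
    }


for label, mean_angle in [("high-GB pitcher", 5.0), ("fly ball pitcher", 15.0)]:
    mix = batted_ball_mix(mean_angle)
    shares = "  ".join(f"{k} {v:.2f}" for k, v in mix.items())
    print(f"{label} (mean {mean_angle:.0f} deg): {shares}")
```

Lowering the mean raises the GB share and also shifts where within each bin the typical ball sits, which fits with ground balls being beaten into the ground and the remaining fly balls being the lower, harder-hit ones.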

Re: stringers and selecting on PU & LD-- Agreed, and agreed with reasoning.  I think the finding helps narrow down what to look for with batted ball data, so it’s good to highlight and check.

Re: your 2004 study-- What I was going for with “un-teaches” was that something that appeared true with earlier data is no longer true.  I don’t think that part of the study is useless, and I keep the study bookmarked on my computer.  I’m just saying that the more strict classification of line drives in recent years has suggested no effect.  Perhaps it would be interesting to see more recent data and see the same thing about team switchers.  BTW I was using Retrosheet data.  I don’t know enough to have a preference but that’s just what I have access to more easily.

Everyone:
Thanks for these thoughts.  I think there’s a lot of information coming out of this discussion.  I understand my results better now because of these comments.  Today’s article is up at http://www.baseballprospectus.com/article.php?articleid=12584 if anyone’s interested in the rest of the study.  I don’t have anything else written about this right now, but I have some other data I need to wrangle into something useful to look into.

