THE BOOK--Playing The Percentages In Baseball


Wednesday, May 09, 2007

Groundball Distributions

By Tangotiger, 10:39 AM

John Walsh gives us a good graphical representation of where groundballs go.

(Aside note to John: Dan Levitt or Dan Levine… I wish everyone would have an easy handle to remember like Tango or MGL… did something similar, but in table format, not the cool graphs you have.)

Anyway, he did the very first important thing you need to do: break it down by handedness of batter.  Anyone who has ever played baseball knows the reason.  And John does it.  But then, shockingly to me, in his table “Who fields grounders”, he went back to the main data without the handedness split.  And he shows, unsurprisingly, overlaps in fielding slices.  Once he introduced the split by batter hand, he had to carry that split through the rest of the article.

You would then see far different outs/BIP rates for each slice, based on handedness.  That’s why we need to split it up.  Hopefully John can handle that in a part 2.

And, I agree with John’s basic point about getting back to basics.  If there’s one thing Dewan’s Fielding Bible showed, it’s that it is important to present something in no uncertain terms: data.  Don’t just present the final adjusted numbers, but present the underlying data in something manageable.  And that’s what John is doing here.


#1    Tangotiger      (see all posts) 2007/05/09 (Wed) @ 10:52

Here’s the Dan article:
http://www.baseballthinkfactory.org/btf/scholars/levitt/articles/fielding_opps.htm


#2          (see all posts) 2007/05/09 (Wed) @ 11:07

An interesting thing to note about the fielding data is this:  Mr. Clemen$ just left the best ground ball fielding Defense for a bottom 10 infield D.  In addition to switching to the ‘harder’ league, he’s going to get the added degree of difficulty of having to get more outs based on more ground balls getting thru.  For a guy struggling to get 6 innings in already, this seems a bad omen. Balls in the SS hole would seem to be the most obvious place he’s going to see the difference.


#3          (see all posts) 2007/05/09 (Wed) @ 13:11

Natedogg, yes, between another year in age, a much better pitching league, and the poorer Yankee defense, Clemens is going to appear to be much worse than last year.  OTOH, he is truly still a great pitcher, so unless age has really caught up to him or he gets hurt, he is still one of the best pitchers in baseball (top 5), according to my calculations and projections.

With Houston, aided by the league and the HOU defense, he was ridiculous in his numbers.


#4          (see all posts) 2007/05/09 (Wed) @ 13:20

Great graphics in this article!  Let me compare John’s team numbers with those of UZR.  UZR is essentially the same thing but adjusts for handedness of batter and pitcher, speed of ball, runners on base and outs, park factors, and G/F tendencies of the pitcher (ground ball pitchers tend to allow easier to field GB’s, even after controlling for speed of ball according to STATS).

Infield Plays Made Above Average in 2006
Team    Total    UZR runs
HOU       44       +73
COL       43       +31
DET       40       +53
SDN       30       +18
SLN       29       +17
SFN       29       +18
NYN       28       +20
PHI       23       +12
TOR       18       -12
FLO       15       -33
MIL        5       +15
ARI        5       +29
CHA        2        +7
ATL        1       -11
BOS        0        -4
OAK        0       -11
SEA       -3       +13
CHN       -8        +9
LAN       -8       -19
MIN      -11         0
TEX      -12       -22
KCA      -13       -17
NYA      -14       -14
ANA      -15       -18
PIT      -16         0
WAS      -22       -21
CIN      -25       -21
TBA      -38       -52
BAL      -40       -12
CLE      -90       -37

There appears to be reasonable agreement, as you would expect, save TOR, FLO, ARI, SEA, CHN, and PIT.


#5    John Walsh      (see all posts) 2007/05/09 (Wed) @ 16:33

Tango: thanks for the pointer to the Levitt article (I always mix him up with the Freakonomics Leavitt). I don’t understand why he broke down the data in terms of pitcher handedness, instead of batter.

I’ll be looking at handedness more closely in a future article. I only included it in that one graph, because I was trying to explain the unexpected (to me) structure of the GB distribution.  I have already looked at the data broken down that way, and I don’t see a _huge_ difference. Anyway, more on that in the near future.

In defense of Dewan, he’s trying to make a living (I imagine) off this stuff, so I’m not surprised he doesn’t reveal the data that his company is trying to sell to teams, etc.  I’ve got no problem with that, but it would be nice to have a freely available pbp system (beyond Pinto’s).

mgl: How do you deal with overlapping zones in UZR? Let’s say in a given zone (using “zone” to denote any way you want to divide up the data), league average for 100 BIP is: 30 fielded by 3B, 45 fielded by SS, 25 hits.  So, what is league average out fraction for a 3B for that zone? SS?

Dewan published team +/- numbers in the Hardball Times annual this year.  I did a quick comparison with my simple system and the r-value was around .7, so there are some differences with that system, as well.


#6    Tangotiger      (see all posts) 2007/05/09 (Wed) @ 17:01

John, if you look at groundball outs per groundball BIP for each zone, by handedness of batter, you will see some substantial differences.

And the issue of overlap, I’d expect, will be somewhat reduced.  That is, the reason we see as much overlap as we do, is because the SS/3B hole is shared by the SS with the RHH and the 3B with the LHH.

I’d encourage you to make these charts:
1. Same “Where Grounders Go”, but change the Y-scale to a percentage.  The area under each of the three curves should sum to 1.00.

2. Same “Who fields grounders”, but make it as a percentage of outs per BIP.

What will these charts give you?  Ideally, the peak of the first one (distribution) will match the peak of the second (out-conversion).  That is, you convert into outs the plays you most are involved in (it goes to positioning).

And, you can use this for positioning.  If you slide a guy over 4.5 degrees toward 3B, you can try to model his new out percentages (slightly higher to his right, and slightly lower to his left).
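The two normalized charts suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: each grounder is taken to be recorded as an angle in degrees plus an out/hit flag, and the 5-degree bin width, the function names and the sample data are all made up here, not John's data or method. The same functions would be run separately for LHB and RHB, never on the combined data.

```python
from collections import Counter

def normalized_distribution(angles_deg, bin_width=5.0):
    """Return {bin_start: share of grounders}; the shares sum to 1.0."""
    counts = Counter(bin_width * (a // bin_width) for a in angles_deg)
    total = sum(counts.values())
    return {b: n / total for b, n in sorted(counts.items())}

def out_rate_by_bin(plays, bin_width=5.0):
    """plays: iterable of (angle_deg, is_out). Return {bin_start: outs per BIP}."""
    bip = Counter()
    outs = Counter()
    for angle, is_out in plays:
        b = bin_width * (angle // bin_width)
        bip[b] += 1
        outs[b] += int(is_out)
    return {b: outs[b] / bip[b] for b in sorted(bip)}

# Hypothetical RHB sample: (angle in degrees, converted to an out?)
sample_rhb = [
    (8, True), (9, True), (12, False), (22, True),
    (23, True), (24, False), (31, True), (47, True),
]

dist = normalized_distribution([angle for angle, _ in sample_rhb])
rates = out_rate_by_bin(sample_rhb)

for b in dist:
    print(f"{b:5.1f} deg  share={dist[b]:.3f}  out rate={rates[b]:.3f}")
```

With real data, comparing where `dist` peaks against where `rates` peaks is the check described above: whether fielders convert the most outs where the most balls are hit.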


#7    Tangotiger      (see all posts) 2007/05/09 (Wed) @ 17:03

And, in all my suggestions, the LH/RH split is a given.  In none of these charts should they be combined.


#8    MGL      (see all posts) 2007/05/10 (Thu) @ 03:53

I think I simply use the percentage of balls fielded in a zone by that type of fielder.  If that type of fielder catches a ball, he gets credit for 1 minus that number.  If someone else catches it, he gets no demerits, and if no one catches it, then he gets docked his percentage.  I think that is what I do, but I am not sure.  I’d have to look at the program.
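A literal reading of that rule, as a minimal sketch. It follows only the description in this comment (which MGL says he is not sure about), not the actual UZR program; the function name and outcome labels are hypothetical. The zone is the example from #5: per 100 BIP, 30 fielded by 3B, 45 by SS, 25 hits.

```python
def play_credit(fielder_share, outcome):
    """Credit for one ball in a zone, per the rule described in the comment.

    fielder_share: league fraction of balls in this zone fielded by this position.
    outcome: "self"  (this position made the play),
             "other" (another fielder made the play),
             "hit"   (no one made the play).
    """
    if outcome == "self":
        return 1.0 - fielder_share
    if outcome == "other":
        return 0.0
    if outcome == "hit":
        return -fielder_share
    raise ValueError(f"unknown outcome: {outcome!r}")

# Zone from comment #5 (league totals per 100 BIP)
share_3b = 30 / 100
share_ss = 45 / 100

for position, share in (("3B", share_3b), ("SS", share_ss)):
    for outcome in ("self", "other", "hit"):
        print(position, outcome, round(play_credit(share, outcome), 2))
```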


#9    Joe Arthur      (see all posts) 2007/05/10 (Thu) @ 07:25

The mlb.com hit location data which John used omits the hit location of errors. These are about 2.5% of ground balls and have a biased distribution - mostly on the left side of the infield (whether the batter is lefty or righty) and somewhat more likely along the “tougher” vectors. This should (slightly) smooth out the peaks and valleys in John’s charts…


#10    Peter Jensen      (see all posts) 2007/05/10 (Thu) @ 09:16

Joe - Interesting.  We should include hit location of errors in our requests of information we would like to see in EG.  Certainly wouldn’t require any new changes in technology, just a change in policy.


#11    John Walsh      (see all posts) 2007/05/10 (Thu) @ 09:28

Tango: I’ve made a couple more graphs, breaking out lefty and righty batters. They’re not normalized the way you suggested, but I think they tell the story.

Two things I noticed right off:

1) The overlap region between 3B/SS and 2B/1B is not significantly reduced even when looking at only LHB or RHB.

2) The out fraction for the 3rd base zones is actually lower against LHB. I would have thought: LHB, weaker gb’s to 3B, higher out fraction, but I see just the opposite. I don’t think it’s due to extreme shifts put on some LHB, because the same thing happens on 1B side for RHB. Maybe it is a positioning effect.

You can view the 2 graphs by clicking on the link.


#12    John Walsh      (see all posts) 2007/05/10 (Thu) @ 09:34

Joe: good catch, I was not aware of that.  It would be nice to have the hit locations for the errors, too.


#13    Tangotiger      (see all posts) 2007/05/10 (Thu) @ 10:58

John, fantastic stuff.  I am surprised that the overlap stayed as much as it did. 

The distribution of balls has just a subtle shift.  It seems that shift is around 4.5 degrees.  Doing some quick math, if the radius is 130 feet, then the circumference of a circle (360 degrees) is 817 feet.  So, 4.5 degrees is around 10 feet.  That seems to be how much a player should move over between RH/LH hitters.  So, I suppose with such a tiny shift, we should still expect to see a lot of overlap.
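The quick math can be checked with a few lines. The 130-foot radius and the 4.5-degree shift are the figures from the paragraph above; the function name is just for illustration.

```python
import math

def arc_length_feet(radius_ft, angle_deg):
    """Length of the arc swept by angle_deg on a circle of radius radius_ft."""
    return 2 * math.pi * radius_ft * angle_deg / 360.0

circumference = 2 * math.pi * 130   # full circle at a 130-foot radius
shift = arc_length_feet(130, 4.5)   # lateral move for a 4.5-degree shift

print(f"circumference: {circumference:.0f} ft, shift: {shift:.1f} ft")
```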

As for the out rate: if you were to redo your first graph normalized as I indicated, you will see that groundballs to 3B have more of a peak from RH, and a bit more flat from LH.  This probably indicates that the positioning of a 3B will end up being more static (i.e., RH are fairly consistent as to where they hit the ball, when they pull toward 3B), while more dynamic with a LH (they spray it more, so, we are less certain as to where the 3B is positioning himself).

The difference in out rates for 3B is enormous.  Remember, the difference between the absolute best fielder and the absolute worst is around .10 outs per play.  Your cool chart at the end (difference in out rates between RH and LH) shows the gap at that level or even larger.  So, if you have a 3B with plenty of LHP (i.e., faces a lot of RHH), the 3B will look much better than someone who has mostly RHP (i.e., faces a lot of LHH).

(Friendly design note: your 3B dotted line should be a color other than red or blue, as those undotted colored lines represent something else.)

Now, wanna try something cool?  Redo the charts, but only for Astros and Yanks (broken by RH/LH).  We should be able to see how their SS do, and possibly even infer their positioning.


#14    Guy      (see all posts) 2007/05/10 (Thu) @ 11:56

Very interesting work, John.

The LHB advantage in the 3B zone must result mainly from positioning.  With a LHB, 3B is playing more toward 2B, so balls down the line will have low out%.  Similarly, they have an advantage on balls to right of 2B bag, as 2Bman has shifted toward 1B.  Conversely, RHBs have an advantage on balls hit down 1B line (1B is further from line).  In addition, LHBs have a generic advantage in that they have fewer steps to 1B.

* *

In the article you note that 3B have lower out% than SS and 2B, perhaps because they have less reaction time.  Another factor, probably more important, is the need to guard the line to prevent doubles.  The proper positioning for a 3B (or 1B) has to weight the XB threat, while a SS or 2B basically wants to stand where they can get to highest % of GBs.  At 1B you have the same issue, plus an additional consideration: he has to stay close enough to 1B to easily beat the runner to the bag.  As a result, a 3B has much more freedom to shift appropriately when a LHB pull hitter is up than does a 1B when facing a RHB pull hitter (especially if hitter is fast).


#15    Rally      (see all posts) 2007/05/10 (Thu) @ 13:47

The LHB is a step quicker to first, but I don’t know if that explains it, as you see a mirror image of the effect when a RHB hits the ball towards 2nd or 1st.

It might be that weaker hit groundballs = higher hit percentage.

A hard hit GB is more likely to get through the infield than a medium, but if I remember correctly soft GB are tougher to get outs on than medium, they are likely to be infield hits.

Or maybe it’s just a distribution issue.  I played infield, yet it never occurred to me that there was a difference in GB out% for batter handedness until I worked with PBP data.  Perhaps I just didn’t see many lefty batters when I played.


#16    tangotiger      (see all posts) 2007/05/10 (Thu) @ 14:53

There’s also a difference based on the tendency for a pitcher to be a GB or FB pitcher.  A GB pitcher will get more outs, in the same zone, than the FB pitcher, on ground balls.  Obviously, most teams don’t have GB/FB tendencies, just individual pitchers.


#17    John Walsh      (see all posts) 2007/05/10 (Thu) @ 16:20

Tango:

The difference in out rates for 3B is enormous.

Agree that this is an effect that needs careful investigation. What I don’t know (yet) is the variance in the proportion of LHB faced by any given team.  If all teams see (roughly) the same proportion of lefties, the overall effect could be small.  They probably don’t, but all this has to be quantified.

Redo the charts, but only for Astros and Yanks (broken by RH/LH).  We should be able to see how their SS do, and possibly even infer their positioning.

I’ll be looking at teams asap.

Guy:

The LHB advantage in the 3B zone must result mainly from positioning.  With a LHB, 3B is playing more toward 2B, so balls down the line will have low out%.

That’s what I would think, too, but wouldn’t you expect a shift in the 3B out fraction curve, much like you see for SS and 2B? Instead, all balls hit towards 3B are harder to field if they are hit by a LHB. I don’t think positioning explains it.


In the article you note that 3B have lower out% than SS and 2B, perhaps because they have less reaction time.  Another factor, probably more important, is the need to guard the line to prevent doubles.

Again, if the issue was mostly positioning, you’d expect that somewhere in the 3B area, the out fraction would be as high as the peak out fraction for SS. In other words, if the 3B is standing near the line, the balls hit near the line (at the 3B) should be converted to outs at a high rate. But, my graphs don’t show that.  That’s why I mentioned the reduced reaction time as an explanation.

Thanks to all for the useful discussion.


#18    tangotiger      (see all posts) 2007/05/10 (Thu) @ 16:36

but wouldn’t you expect a shift in the 3B out fraction curve, much like you see for SS and 2B

As I mentioned, I think this is based on “spraying” the ball.  And if the ball sprays differently from different LHH, then you are positioned differently, unlike against RHH, where you may have a more set position.

As well, the weak and strong grounders would also have an impact.


#19    tangotiger      (see all posts) 2007/05/10 (Thu) @ 16:40

The 3B/SS is based on reaction time (3B plays much closer), and the different types of balls hit (a ball hit to SS is not the same as a ball hit to 3B).  Plus, SS are better fielders to begin with. 

So, you have two samples that have nothing in common, other than “MLB”.


#20    Tangotiger      (see all posts) 2007/05/10 (Thu) @ 16:56

In 2006, 51% of Pirates BFP were LHP, and 1.8% of Arizona were LHP.  (The league standard deviation was 11%.)

Presumably, Pirates pitchers faced very few LHH, and Arizona faced mostly LHH.

Checking retrosheet, it looks like 70% of opposing hitters were RHH against Pit, and about 50% of opposing hitters were RHH against Ari.

So, if you had an out rate gap of .10 per play, then a 3B on the above two teams would show a .02 outs per play gap, if you didn’t adjust for handedness.
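The arithmetic behind that .02 figure, as a small sketch. The 70% and 50% RHH shares and the .10 gap come from this comment; the individual out rates of .75 against RHH and .65 against LHH are hypothetical values chosen only so that they differ by .10.

```python
def blended_out_rate(rhh_share, rate_vs_rhh, rate_vs_lhh):
    """Unadjusted 3B out rate given the mix of batter handedness faced."""
    return rhh_share * rate_vs_rhh + (1.0 - rhh_share) * rate_vs_lhh

# Hypothetical out rates for an identical 3B, differing by .10 per play
rate_vs_rhh = 0.75
rate_vs_lhh = 0.65

pit = blended_out_rate(0.70, rate_vs_rhh, rate_vs_lhh)  # 70% RHH faced
ari = blended_out_rate(0.50, rate_vs_rhh, rate_vs_lhh)  # 50% RHH faced
gap = pit - ari

print(f"PIT: {pit:.3f}  ARI: {ari:.3f}  gap: {gap:.3f}")
```

The gap depends only on the difference in RHH share (.20) times the out rate gap (.10), so the particular .75 and .65 values do not matter.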


#21    Guy      (see all posts) 2007/05/10 (Thu) @ 17:01

You’re right, the 3B doesn’t appear to change position much based on L/R batter—peak out% is same location.  So advantage for LHB could be:
1) the step closer to 1B (which matters less as distance of throw to 1B declines)
2) slow rollers the runner beats out (more likely when hitting to opposite field)
3) ?

* *

I agree that reaction time is likely part of the explanation.  Also probably more slow rollers to 3B than SS, and of course SSs are generally better fielders.  But I do think guarding the line is part of the issue.  Based on your graphs, it looks like 3B stand at about 10 degrees.  It sure looks like they would get to higher % of balls if they were somewhere between 12 and 15 degrees.  But then they would give up more doubles.


#22    Tangotiger      (see all posts) 2007/05/10 (Thu) @ 17:11

Absolutely, guarding the line is an issue.  STATS Scoreboard looked at this once, and compared the out rates and XBH rates in close/late games and otherwise, and there was a definite shift.  The overall effect, IIRC, cancelled out.


#23    Joe Arthur      (see all posts) 2007/05/11 (Fri) @ 07:03

A minor observation about one of John’s graphs:
pitchers have a slightly higher out percentage on ground balls toward the first base side. Is that mostly a result of pitcher handedness - more pitchers are right-handed, with the glove hand toward first?


#24    John Walsh      (see all posts) 2007/05/11 (Fri) @ 08:25

Joe, I noticed the same thing about pitchers.  RHP also tend to fall off the mound towards their left, which makes fielding a comebacker to the right side of the mound difficult.


#25    Guy      (see all posts) 2007/05/11 (Fri) @ 10:10

Handedness may play a role, but balls fielded to the right of the mound also involve a shorter throw, and thus more time to recover on bobbles and difficult chances.  In general, it appears that balls to the right are easier to convert to outs:  the out% for 2Bmen is about as good as for SSs, despite their being clearly inferior fielders, and the out% for 1Bmen is higher than for 3B, again despite inferior talent.


#26    dcj      (see all posts) 2007/05/12 (Sat) @ 23:46

Wonderful graphs!

Any ideas on what might be the cause for the two-peaked distribution of balls hit to the left side by RHB? I’m stumped.


#27    tangotiger      (see all posts) 2007/05/13 (Sun) @ 10:07

In order not to have the two-peaks, you’d have to have an extra 1000 GB in that one particular valley separating the two peaks.

What is a GB?  If you hit it just a little harder, you’d get a line drive.  So, this could very well be selection bias.  A ball that goes between SS/3B and hits the ground just outside the dirt area of the infield is likely marked a line drive, not a GB.

It would be more instructive not to look at GB/LD/FB, but balls by distance.  How many balls were hit that landed or were fielded within, say, 200 feet, or at various distance levels, like, first hits the ground between 75 and 150 feet, etc.

***

The other possibility is that you have distinct types of RHH: those that really try to pull, and those that spray a bit more.  You end up with the peaks we see.  Of course, it’s more likely you have a continuum of such hitters, and therefore this theory shouldn’t hold water.


#28    Guy      (see all posts) 2007/05/13 (Sun) @ 14:51

Someone in another thread (maybe Joe A?) speculated there’s a coding bias, such that if a ball is fielded by 3Bman it gets coded on a trajectory a bit closer to 3B, and if fielded by the SS a trajectory closer to 2B.  Seems plausible.  It would be interesting to look only at balls not touched by a fielder to see if the distribution is more smooth.  Might still be affected by which fielder was closer to ball, but bias would likely be less.


#29          (see all posts) 2007/05/14 (Mon) @ 00:35

First, hats off to John W. for creating the most accessible map of groundball distribution ever. While a certain amount of adjustment is needed for purposes of quantifying run prevention values, this method has the inestimable virtue of showing WHERE the strengths/weaknesses are.

In the old BBBA days, we had a method to measure the force of the grounder and the distance the fielder travelled. (It was never quantified into a system because the data collection requirements were astronomical.) These two elements are still missing from the current data set, but I’m not sure that “distance” doesn’t get mostly canceled out ("distance" accounts for positioning w/o having to code it into the data, but it’s infinitely easier to assign a location than to track the fielder movement).

As noted by a number of posters, it’s “force” that determines a lot of the differences here, with weaker GBs causing more problems than hard hit balls, especially those hit to the left side. That could also account for the markedly lower out% between 3B and SS (hence the double hump)--if the 3B can’t get to the ball, the SS is going to take too much time to get the out on anyone other than a slow RHB.

It would be extremely interesting to see these charts broken out for pitchers, though you’d probably need several years’ worth of data to be able to trust them.


#30    joe arthur      (see all posts) 2007/05/14 (Mon) @ 10:18

I liked Tango’s 1st suggestion in #27, because I expect that the same trajectory batted ball can become either a line drive or a one-hop grounder, depending on whether the fielder is in or back, and whether he handles the ball or not. However, I looked at some 2006 data I have (collected from Fox hit charts, whose underlying source is STATS) and it looked like the peaks persist even when line drives are counted (though my data isn’t perfect and I had to make an assumption.)

Then I spot checked Tango’s 2nd suggestion, on spray hitters vs pull hitters, by looking at Derek Jeter’s distribution. He really had 3 peaks: along vector D (3B), along vector J (SS), and along vector M (just about up the middle, on the SS side of the bag), as well as a declining but much greater than typical number of grounders to the right side. So he still shows two peaks on the left side.

I don’t remember suggesting scorer bias. While I don’t rule it out completely, I don’t think it’s so hard to locate the direction of balls hit in the infield; bias could soften the valley, but not eliminate it. I’ll suggest yet another theory: that batted ball location may correlate somewhat with pitch location. A pull hitter may tend to hit an inside pitch on the ground to 3rd, and an outside pitch toward straightaway short. A pitch down the middle would be hit in the hole, but those are perhaps rarer. A spray hitter is aiming up the middle, and will tend to hit an inside pitch to short and an outside pitch to 2nd, while hitting a “down the middle” pitch up the middle.

Just as we seem to have discovered that there’s more to getting a ground ball than just being up or down in the strike zone, but that being down in the zone does help, I don’t expect that pitching inside or outside is the full explanation of the peaks and valleys in the distribution of ground balls in play, but it may be a good part of the explanation.



#32    Mike Green      (see all posts) 2007/05/24 (Thu) @ 15:58

Superb work.  The next step is addressing park differences, which presumably would mostly be about playing surface.

