THE BOOK--Playing The Percentages In Baseball

Friday, May 22, 2009

Fielding positioning, part 2

By Tangotiger, 10:08 AM

Max comes back for more.  It’s excellent work, though I have a couple of issues.

1. You need to use the run value of all batted balls, not just those that go for hits.  Outs have a run value as well.  From his chart, it’s apparent that he only considers positive values.  Either you do it the linear weights way, where the average is zero, or you do what wOBA does, and compare the run value of the positive events to the out.

2.

“If you believe the dotted lines should cross the (blue) density line in places where it peaks, you are quite right.”

No.  Suppose you only have one peak (a parabola)?  Do you put the three infielders all at the same spot?  We want to lower the area under the graph. 

You take this graph, which is the opportunities you are faced with:

And move players so that the results of this graph can be changed:

So that when you multiply the two graphs, the area under the line is minimized as much as possible.
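A minimal Python sketch of the multiply-and-minimize idea. The assumptions are not from the post: the spray density is a single Gaussian peak (borrowing the hypothetical 52-degree median and 11-degree SD MGL uses in comment #6), each fielder's chance of turning a ball into an out falls off as a Gaussian bump with a made-up reach of 4 degrees and a 90% ceiling, fielders act independently, and the target is the expected hit rate rather than run value. The names `spray_density`, `miss_curve`, `hit_area`, and `best_alignment` are hypothetical.

```python
import math
from itertools import combinations_with_replacement

# Spray angle in degrees, following MGL's convention in #6:
# 0 = first-base line, 45 = up the middle, 90 = third-base line.
ANGLES = range(0, 91)


def bump(x, center, width):
    """Unnormalized Gaussian bump, used for both curves below."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def spray_density(center=52.0, sd=11.0):
    """Hypothetical single-peak ground-ball density over angle, normalized to sum to 1."""
    raw = [bump(a, center, sd) for a in ANGLES]
    total = sum(raw)
    return [value / total for value in raw]


def miss_curve(position, reach=4.0, max_out=0.9):
    """Assumed chance that a fielder at `position` fails to turn a ball at each angle into an out."""
    return [1.0 - max_out * bump(a, position, reach) for a in ANGLES]


def hit_area(density, positions, miss_table):
    """Area under density x P(hit); a ball is a hit only if every fielder misses it (independence assumed)."""
    area = 0.0
    for i, share in enumerate(density):
        p_hit = 1.0
        for pos in positions:
            p_hit *= miss_table[pos][i]
        area += share * p_hit
    return area


def best_alignment(density, n_fielders=3, step=3):
    """Brute-force the fielder angles on a coarse grid that minimize hit_area."""
    candidates = range(0, 91, step)
    miss_table = {pos: miss_curve(pos) for pos in candidates}
    return min(
        combinations_with_replacement(candidates, n_fielders),
        key=lambda positions: hit_area(density, positions, miss_table),
    )


print(best_alignment(spray_density()))
```

Even with a single peak, the search spreads the three fielders across it instead of stacking them on it, which is the point about the parabola: a second fielder on the same spot only catches what the first one misses. Weighting each angle by a run value instead of counting hits would follow the same pattern.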


#1    MGL      2009/05/22 (Fri) @ 22:26

I just skimmed the article, so Max or someone else can correct me if I am making any incorrect assumptions, but one of the things that must be considered when deciding where to position your fielders is the fact that, like everything else, a hitter’s spray chart will regress toward the mean.  Thus, you rarely want to position your infielders in such a way as to minimize the run value of a hitter’s actual batted ball distribution (unless of course you can go back in time and THEN position your defense). You want to take a hitter’s spray chart, regress it towards the mean, and THEN figure out how to position your fielders.  This is a profound difference.  In fact, I am pretty sure that almost all teams make this mistake in positioning their fielders, assuming that a hitter’s spray chart will more or less be the same in the future as it was in the past, which will NOT be the case unless it (the distribution of the batted balls) were exactly average given the population that the batter belongs to.  I am assuming that Max makes the same mistake (I could be wrong), but his methodology is probably good; in order to use it, however, first you have to “regress” that spray chart…


#2    Peter Jensen      2009/05/23 (Sat) @ 02:38

MGL - I don’t understand your post at all.  Maybe you could illustrate what you are saying with an example.  Just how would you regress Ryan Howard’s spray chart to the mean?  What mean would you use?  If you had taken his spray chart through 2007 and regressed it, was his 2008 spray chart closer to his regressed chart or his actual prior spray chart?


#3    Max      2009/05/23 (Sat) @ 04:50

Tom,
thank you for your quick review.

Issue #2 was actually going to be the core of my next work on shifts.

#1 I’m not sure I have understood.
Here’s why I used just balls gone for hits.
The likelihood of a batted ball being turned into an out depends on the actual defensive positioning, so taking all batted balls into account would give lower values near where the infielders play.
Let’s assume I find that for a particular player the leftmost player (third baseman) must be placed on the 2nd base bag: obviously every ball hit to the usual 3rd-baseman position will go for a hit; if I took all batted balls in my chart, those balls’ weight would be deflated just because somebody plays there against other batters.

If I missed your point, please let me know.

---

MGL,
I didn’t regress and I understand your point that without doing it I’m basically predicting the past; I think I’d benefit from an example too.


#4    David Gassko      2009/05/23 (Sat) @ 13:20

I think for a first pass through, regressing to the mean is unnecessary. To generate actionable predictions, however, you would indeed have to.


#5    Peter Jensen      2009/05/23 (Sat) @ 14:51

David - I’ll ask you the same questions I asked MGL.  What mean would you use?  Can you give an example using data up to 2007 to predict 2008 for a specific player?


#6    MGL      2009/05/23 (Sat) @ 18:51

Let’s say that the average RHB has a median vector for his ground balls of 52 degrees (where 0 is the RF line, 90 is the LF line, and 45 is right up the middle) and the SD of the directions of his GB is 11 degrees.  Now, I don’t know that these two numbers exactly define the ground ball distribution for a player, but I suspect that they come close.

Now let’s say that you have 3 GB of data for RH player A and they are right down the first base line (he cued one right down the line for a double), one up the middle and one right to the SS.

Do you think that the proper defensive alignment in the future can be gotten from those 3 ground balls?  Of course not!  You would probably go with the average distribution for all RHB.

If you have a RH player with a few hundred ground balls and his median vector is 48 degrees, I can guarantee you that his “true talent” median vector will be between 48 and 52.  And if his SD is 9 degrees his true talent SD will be between 9 and 11 degrees.  I have done the research.  There is a high year to year correlation on both median vector and SD for both fly balls and ground balls (as well as distance of fly balls), but it ain’t 1.0 (of course).

That goes for any number of ground balls. Whatever his actual/sample/historical distribution, his future distribution is going to be closer to the normal or typical ("mean") distribution of all similar players.
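MGL’s argument amounts to a weighted average. A minimal sketch, assuming the usual form where the player’s own data gets weight n / (n + k) and k is the number of “average” batted balls to add. The k = 90 below is only a placeholder (close to the ground-ball direction figure MGL derives later in #13), 52 degrees is his hypothetical RHB average, and the sample medians are made up.

```python
def regress_toward_mean(player_value, n, population_value, k):
    """Shrink a sample statistic toward the population average for similar hitters.

    The hitter's own value gets weight n / (n + k), where n is his number of
    batted balls and k is the number of "average" balls added (estimated separately).
    """
    weight = n / (n + k)
    return weight * player_value + (1.0 - weight) * population_value


# Hypothetical numbers: RHB average median vector of 52 degrees, placeholder k = 90.
print(regress_toward_mean(48.0, 300, 52.0, 90))  # a few hundred grounders
print(regress_toward_mean(30.0, 3, 52.0, 90))    # only three grounders
```

With 300 ground balls and a 48-degree median, the projection lands between 48 and 52 (about 48.9); with 3 ground balls it stays within a degree of 52, no matter where those three went.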

Is this a new concept?  You guys are acting like I just made some outlandish statement.  I thought it was obvious that a player’s distribution of batted balls will “randomly” fluctuate (around his “true talent” distribution) like any other sample performance and that we have to regress that distribution toward the average distribution for all similar players, as we do with everything else.

Again, his methodology for positioning fielders may be correct, but you DON’T use a player’s actual distribution of batted balls if you want to figure out the optimal positioning for that player in the future.  First you have to project his batted ball distribution, and like anything else that involves sample data, that projection is NOT going to be the same as the sample data itself as I make crystal clear with the example of the player who has hit only 3 ground balls.


#7    Peter Jensen      2009/05/24 (Sun) @ 09:20

MGL - There’s an interesting non-answer answer.  I asked three questions in post 2.  You didn’t answer any of them.  Instead you give me a silly example of a player with 3 ground balls.  Who in his right mind is going to construct a spray chart for three ground balls?  Let alone make predictions from that spray chart.  Max uses two examples in his paper.  Derek Jeter who hit 299 ground balls in 2008 and Ryan Howard who hit 172.  You say that there is a high year to year correlation for hit distribution.  Howard has 546 GB 2005-2008 and Jeter 1233.  If you had said that Max’s point would have been stronger if he had used multiple years instead of just 2008, I would have agreed.  What would be gained by regressing their actual distribution to some other mean?

And I ask again: What would you use for that mean?  Above you say all same handed batters.  Surely using all left handed hitters wouldn’t give a better projection for Howard.  Nor all right handed hitters give a better projection for a spray hitter like Jeter.

And how useful is this information going to be anyway?  Is Jeter’s spray chart going to be the same against right handed pitching as against left handed pitching?  Against a pitcher with an 88 MPH fastball as against a pitcher with a 94 MPH fastball?  Against a fastball as against a curve?  With the bases empty as with runners in scoring position?  On the first pitch as in hitter’s counts or in pitcher’s counts?  There are just too many variables.  And on top of that we have no idea of the extent to which a batter can control the horizontal angle of his hit balls to avoid fielders.  Although the charts are interesting and I think Max did a great job with both the research and presentation, I have serious doubts that, even when we have accurate hit ball information and the location of the fielders, this type of research will be able to help fielders prevent hits by positioning themselves better, with the exception of gross movement of fielders like shifting for Howard.


#8    MGL      2009/05/24 (Sun) @ 13:36

Peter, I have no idea whether Max’s methodology would be good for positioning fielders.  I was not commenting on that whatsoever.  I was simply commenting on the fact that if you are going to “project” a hitter’s average batted ball distribution (against all pitchers or any class of pitchers you want), as with any projection based on sample data, you have to regress that sample data towards a population mean. Period, end of story.  And I think that I made that point crystal clear in my last post, including the example of the player with 3 ground balls, which was given in order to illustrate the point.  Obviously a player that has several hundred or several thousand batted balls is not going to get regressed very much.  I would even “admit” that not regressing the spray chart of a hitter with several thousand batted balls is probably good enough for “government work” since the regression is probably going to be something on the order of a few percent or so.

As far as whether the kind of methodology that Max advocates actually works in reality, don’t talk to me about that!  I didn’t write the article and I am not advocating it (although I did hastily say that I thought it “worked”).  I actually agree with you that the fielders knowing the pitcher and the type of pitches he might throw in various situations is going to trump any “optimal position” that is determined from Max’s method.  But again, my comments had nothing to do with that whatsoever.  My comment was on one thing and one thing only, which is simply that if you are going to use a player’s batted ball distribution to determine ANYTHING about the future, you have to regress that distribution towards some mean, as we do with ALL sample data when we want to make inferences about the future or about “true talent.” Obviously the more data you have, the less it is necessary to regress.  And I am pretty sure I explained how and why, more or less, 100% in my last post.  So I have absolutely no idea what your issue is.  I have nothing else to add to the discussion.


#9    Tangotiger      2009/05/24 (Sun) @ 17:14

If I had to guess, I would say you need to add 50 “average” balls in play to the distribution.  That is, you don’t really need to add much (compared to what we are used to).

However, for the “run value” one, I’d expect you need to add much more.
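One way to read “add 50 average balls in play” for a whole spray chart, sketched in Python: blend the hitter’s counts per zone with k pseudo-balls distributed in league proportions. The three-zone split (pull, middle, opposite field), the league shares, and the hitter’s counts are hypothetical; only k = 50 comes from the comment above.

```python
def regress_spray_chart(player_counts, league_shares, k=50):
    """Add k league-average pseudo-balls to a hitter's spray chart.

    player_counts: the hitter's batted balls in each zone.
    league_shares: fraction of similar hitters' balls in each zone (sums to 1).
    Returns the projected fraction of the hitter's balls in each zone.
    """
    if len(player_counts) != len(league_shares):
        raise ValueError("player_counts and league_shares must cover the same zones")
    total = sum(player_counts) + k
    return [(count + k * share) / total for count, share in zip(player_counts, league_shares)]


# Hypothetical zones: pull, middle, opposite field.
projected = regress_spray_chart([120, 50, 30], [0.40, 0.35, 0.25], k=50)
print([round(p, 3) for p in projected])
```

Here a hitter with 120 of 200 balls pulled (60%) and a league pull share of 40% projects to 56% pulled: (120 + 50 × 0.40) / 250.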


#10    MGL      2009/05/24 (Sun) @ 18:35

Well, at least SOMEONE actually understands what I was saying, which I thought was simple and obvious.

Off the top of my head, I think that 50 is way too little.  I ran some year to year correlations a while ago on fly ball distance, median direction, and SD of direction, and the same for ground balls without the distance.  I think that the “r’s” suggested quite a bit more than 50 “average” balls, but I’m not sure.  I’ll have to check.

But I think the concept is important.  I will reiterate that I think that teams improperly position their fielders sometimes based on short term spray charts.  Although, as I said, and Tango says as well, as long as you have at least a couple hundred batted balls, you are probably OK using an actual spray chart with no regressing.


#11    Tangotiger      2009/05/24 (Sun) @ 20:57

There are two reasons I am guessing at “50”.

1. When you look at K/PA rates, or GB/BIP rates, you need something like 100 PA or BIP or something.  That is, you don’t need much.  And the reason you don’t need much is…

2. ...if a “skill” is not that important, the spread in talent will be rather wide.  And the larger the variance, the less noise affects the data.

Since I presume that the spray pattern is not that important to a player’s skill level (just as you have lots of successful low-K/PA and high-K/PA hitters, and high-GB/BIP and low-GB/BIP hitters, I figure you have a similar situation with pull and spray hitters), the observed spray pattern should reflect real talent fairly quickly.

Since I also know that the regression amount for OBP and wOBA is around 200-250 PA (which includes BABIP, of course), I’m presuming that something as basic as the spray pattern will behave more like GB/BIP.

I guessed 50, I will presume it’s not more than 100, and will be in shock if it’s at 150.
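Point 2 can be made concrete with a simple true-talent-plus-noise model (an assumption, not something stated in the comment): if each ball’s angle is the hitter’s true mean plus independent noise with SD σ, and true means vary across hitters with SD τ, then the reliability of an n-ball average is n / (n + k) with k = σ² / τ². A wider talent spread τ therefore means fewer average balls to add. The 21-degree per-ball SD matches the figure MGL reports in #13; the talent SDs are placeholders.

```python
def regression_constant(talent_sd, per_ball_sd):
    """k = per-ball variance / talent variance under a true-talent-plus-noise model."""
    return (per_ball_sd / talent_sd) ** 2


def reliability(n, k):
    """Expected year-to-year r between two n-ball samples; equals 0.5 when n == k."""
    return n / (n + k)


# Placeholder talent spreads, 21-degree per-ball SD.
for talent_sd in (3.0, 1.5):
    k = regression_constant(talent_sd, 21.0)
    print(f"talent SD {talent_sd}: add {k:.0f} average balls; r at 200 balls = {reliability(200, k):.2f}")
```

With σ = 21 degrees, a talent SD of 3 degrees gives k = 49, while halving it to 1.5 degrees quadruples k to 196.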


#12    MGL      2009/05/24 (Sun) @ 22:26

Sure, one of the determining factors for the talent spread you see is how “important” it is.  But, if there simply isn’t a lot of spread in true talent among all players at all levels, then there won’t be much spread in the major leagues even if the skill is not important.  Plus, even with a large spread there is a certain minimum number of average “N” that you have to add. I would think that 50 is near that minimum.  IOW, even with a large spread in true talent, I think you are looking at closer to 100.


#13    MGL      2009/05/24 (Sun) @ 23:31

OK, here is some data.  I recorded for every player, their mean and median distance and location vector for all of their fly balls and the mean and median vector for all of their ground balls.

For switch hitters, I recorded the data separately for when they were batting RH and when they were batting LH.

I did this for 2002-2008.

I then created regression pairs for each player with a min of 150 fly balls or ground balls in each year.  I regressed 02 on 03, 03 on 04, 05 on 06, and 07 on 08 (I overlapped one year - 03).

I computed the “r” for the regressions.
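The year-to-year procedure can be sketched like this, assuming a hypothetical data layout of `{player: {year: (n_balls, mean_direction)}}` and a plain Pearson correlation. The `year_pairs` argument lets you choose which seasons to pair, as MGL did (02 on 03, 03 on 04, and so on), and pairs where either season falls below `min_balls` are skipped, mirroring the 150-ball minimum.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def year_to_year_r(seasons, year_pairs, min_balls=150):
    """Correlate a stat between paired seasons for hitters meeting the minimum in both.

    seasons: {player: {year: (n_balls, mean_direction)}} (hypothetical layout).
    Returns (r, number_of_pairs).
    """
    first, second = [], []
    for by_year in seasons.values():
        for y1, y2 in year_pairs:
            if y1 in by_year and y2 in by_year:
                n1, d1 = by_year[y1]
                n2, d2 = by_year[y2]
                if n1 >= min_balls and n2 >= min_balls:
                    first.append(d1)
                    second.append(d2)
    return pearson_r(first, second), len(first)
```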

For ground balls for RHB, the average mean vector was 193 degrees where 180 goes through second base.  The “r” for all players with at least 150 ground balls in back to back years, an average of 203 ground balls, was .746.

For the median vector, the average was 198 and the “r” was .629.

If we call that an “r” of .700, that implies that “r” = .5 at around 87 ground balls.

The average SD in direction is 21 degrees.  The “r” is .551, so that implies that “r” = .5 at around 165 ground balls.

So for ground balls, we add around 90 average for mean or median direction and 165 average for the “spread” of the ground balls.

For lefty batters, the “r’s” for average direction are about the same, but for spread, it is only .377 for an average of 204 ground balls per player, so we have to add around 335 average ground balls.  Average mean and median direction for LHB are 169 and 163 degrees.  SD is 21 degrees.
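The step from an “r” to a number of average balls to add follows from assuming r = n / (n + k) at the average sample size n, so k = n(1 - r) / r. A short Python check against the ground-ball figures above (the function name `implied_k` is just for illustration):

```python
def implied_k(r, avg_n):
    """Average balls to add, assuming the year-to-year r = n / (n + k) at n = avg_n."""
    return avg_n * (1.0 - r) / r


for label, r, n in [
    ("RHB GB direction", 0.700, 203),
    ("RHB GB spread", 0.551, 203),
    ("LHB GB spread", 0.377, 204),
]:
    print(f"{label}: add about {implied_k(r, n):.0f} average ground balls")
```

This gives about 87, 165, and 337, in line with the 87, 165, and “around 335” quoted for ground balls.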

For fly balls, we have 3 parameters: distance, direction, and spread (SD of direction).

For LHB and direction, we have:

An “r” of around .660 for an average of 234 fly balls per player for mean and median direction.  Average direction is 182 degrees, just LEFT of CF (for long fly balls, it is .  Average spread is 29 degrees and the “r” for spread is .433.

For RHB, the average direction is 178 degrees, just RIGHT of CF.  Average spread is 29 degrees.  The “r’s” are around .720 for direction and .483 for spread.  The average number of fly balls per player is 251.

So the number of average fly balls for RH and LH batters to add is around 200 for direction and 330 for spread.

For fly balls, it also depends if we are talking about all air balls or just, for example, long fly balls (excluding pop-ups).  So the fly balls numbers are a little tricky to figure.


#14    MGL      2009/05/24 (Sun) @ 23:39

Tango, to follow up on your statement that spray pattern is not important in baseball (I agree that it isn’t) and that you would therefore expect a large spread in talent:

Again, the most important factor is what the spread is among all possible baseball players.  I agree that even if the spread in a certain skill is large among the general population or even among all levels of baseball, if that skill is important at the major league level, then we will see a small spread in talent at the major league level, which is ONE of the reasons why we see little spread in BABIP among pitchers, for example.

But, again, if there is not much spread in talent to start with, it will not matter whether it is important or not in the major leagues.  For example, height is not THAT important in major league baseball, so we see a fairly large spread in height (say from 5’7” to 6’7”).  Head size is not that important either, but we don’t see much of a spread in it because there is not much of a spread in head size among all human beings.

So even though spray pattern is not important in baseball, we don’t know without looking at the numbers whether there is much of a spread in that “skill” at all among all potential major leaguers.  So I think you jumped the gun a little with that statement.


#15    Peter Jensen      2009/05/25 (Mon) @ 01:27

Tango and MGL - The last 6 posts are the most ridiculous I have seen on this blog.  You are discussing mean and median vectors, r, and SD of vector direction, and “skill” and true talent of vector direction as if they actually mean something.  How can the mean of vector direction of GB, or FB have any meaning if the batter is not trying to hit the ball to the same location every time?  When I played ball, every at bat I would pick out at least 4 locations, given where the fielders were playing me, where I would try and hit the ball depending on the location, speed, and type of pitch.  My “skill” is how close I can come to the location I intend to hit the ball.  Sometimes that will be pulled down the line, sometimes in the gap, sometimes up the middle, and sometimes going the other way.  I could have all the “skill” in the world and hit my exact location every time and my vectors would still have a large SD, and the year to year means might vary considerably if the pitches thrown to me changed or the defensive alignment changed.  Unless you find a way to measure the batter’s intent, what you are measuring isn’t a skill at all.


#16    Colin Wyers      2009/05/25 (Mon) @ 02:54

Peter, the entire idea of defensive positioning is to place yourself where the batter is likely to hit the ball even after they know that’s where you’re standing. If there is no element of a hitter’s spray chart that is both persistent and unrelated to where the hitter is aiming the ball (presuming that the hitter wouldn’t aim the ball directly at a defender), then there is no point to trying to position defenders based upon the specific hitter at all.

And that’s wrong, because we know that batters of certain handedness have certain pull tendencies; if they all had the same pull tendencies regardless of the batter, then, again, we would simply regress every spray chart to the mean 100%, because there would be no difference between players.


#17    MGL      2009/05/25 (Mon) @ 03:24

Peter, I think you are going to be a minority of exactly one in terms of your assessment of my and Tango’s comments.  I don’t really know how to respond to your posts.  Maybe someone else can help, as Colin has tried to do.  Major league baseball is not softball.  By and large, batters cannot control where they hit the ball.  Batters have certain tendencies which is why they have their own somewhat unique spray charts, but when you are facing major league pitchers throwing 88-98 mph with all kinds of wicked off-speed and other pitches, it is all you can do to try and make solid contact.  Now, I don’t know that mean or median distance, direction, and spread exactly define a hitter’s “true talent” batted ball pattern, but I suspect that it does to a large extent.

And of course when we say “true talent” or “talent” in this context, we are talking about a hitter’s performance over an infinite number of trials.  In practice, we also equate that with our best guess as to a player’s performance in the future over X number of opportunities.

At this point, we seem to be speaking two different languages, but as I said, maybe someone else can bridge the gap or act as an interpreter, because I really don’t understand what your quibble is, and saying that Tango’s and my posts “are the most ridiculous things you have ever heard” is not helping to further the discussion, not to mention the fact that the odds of either Tango or me, let alone both of us, EVER expounding something ridiculous is about 1000-1.  Maybe you just hit the jackpot though.  You never know…


#18    Max      2009/05/25 (Mon) @ 12:34

MGL, I’m wondering if working with means and standard deviations might have some issues, since the distributions of batted balls (GBs at least) are skewed.
Anyway, thank you very much for showing the whole process at #13.
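To illustrate Max’s concern about skew, here is a made-up ground-ball sample (angles on the 180-through-second-base scale from #13, with most balls pulled and a short tail to the opposite field) in which the tail drags the mean well away from the median. As MGL says in #21, every distribution still has a mean, median, and SD; the point is only that they summarize a skewed shape less completely.

```python
import statistics


def summarize(angles):
    """Mean, median, and population SD of spray angles in degrees."""
    return (
        statistics.mean(angles),
        statistics.median(angles),
        statistics.pstdev(angles),
    )


# Hypothetical pull-heavy sample with two balls hit to the opposite field.
sample = [200.0, 202.0, 203.0, 204.0, 205.0, 206.0, 170.0, 150.0]
mean_dir, median_dir, sd_dir = summarize(sample)
print(f"mean {mean_dir:.1f}, median {median_dir:.1f}, SD {sd_dir:.1f}")
```

Here the mean is 192.5 degrees but the median is 202.5, so which one you regress can matter for a skewed hitter.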

Tango, what about your point number 1 and my reply at #3?


#19    Peter Jensen      2009/05/25 (Mon) @ 14:08

“Peter, the entire idea of defensive positioning is to place yourself where the batter is likely to hit the ball even after they know that’s where you’re standing.”

No, Colin, that’s wrong.  The idea of defensive positioning, as Tango correctly pointed out, is to minimize the win value of a hit ball given the probability distribution of where that ball might be hit by the batter GIVEN ALL KNOWN FACTORS AFFECTING HIT BALL DISTRIBUTION and the current game state.  If you were correct and MGL were correct that “By and large, batters cannot control where they hit the ball” then it would be advantageous to have a shift on for every batter.

The reason that Max’s charts are interesting and potentially useful is that they are not a normal distribution around a central mean, but are multimodal as well as skewed, as he mentions in post #18.  That seems to me to indicate that batters can exhibit some control over their hit balls.


#20    Mike Fast      2009/05/25 (Mon) @ 17:20

Peter, my work would also tend to indicate that batters have a significant ability to “hit ‘em where they ain’t”, although this ability is curtailed by the pitcher’s ability to choose pitch type and location.  There is an element of random chance as well, but it’s not at all clear to me that the random element is the controlling one.  In fact, all the evidence I’ve seen tends to indicate that the random element is a lesser one compared to the control exhibited by the batter and pitcher.  I would not assume a normal distribution as a starting point and would be hesitant to accept several of MGL’s assertions about the fundamental nature of hit balls.


#21    MGL      2009/05/25 (Mon) @ 20:12

I am not by any means assuming a normal distribution.

“Several of MGL’s assertions.” What assertions are those?  I am making one assertion, which is that whatever method one uses to position fielders so as to maximize the defensive team’s chances of winning the game, one must make some adjustments to a player’s actual, historical distribution of batted balls, and that that adjustment will involve “regressing” that actual/historical batted ball distribution towards that of the average, similar batter.  That has nothing whatsoever to do with how much batters can or cannot control where they hit the ball (other than the magnitude of the regression), or whether the batted ball distributions of most or all batters are normal, multimodal, skewed, etc.  For the record, Max, all distributions have an SD, a mean, a median, etc.  I said at least twice that I am not claiming that those things (mean or median and SD) define the distribution as they would with a normal distribution.  So what assertions of mine exactly are being disputed?


#22    Mike Fast      2009/05/25 (Mon) @ 22:58

Re MGL #21, I misunderstood that you were not specifying a normal distribution, so I appreciate your clarification on that point.  Your use of regression analysis in #13 implied to me that you thought the distribution of vectors was random around a central mean.

Still, there are two assertions of yours that I disagree with or at least am very skeptical of.

“spray pattern is not important in baseball”

I know that was originally Tango’s assertion, but you agreed with it, too.  I guess I’d be curious to hear what both of you mean by that because at face value that claim does not make sense to me.

“Now, I don’t know that mean or median distance, direction, and spread exactly define a hitter’s ‘true talent’ batted ball pattern, but I suspect that it does to a large extent.”

I suspect that it only does to a small extent, such that an approach based on your assertion would not help a team much more than simply placing their fielders at the typical positions, whereas an approach based on Peter’s approach would find more success.

Maybe Peter and I are both tainted by having looked at too much HITf/x data.  I know that my belief in the randomness of baseball has been shrinking the more I have learned about it.  Newton’s equations of motion are pretty damn deterministic.  On the other hand, I wonder if you are “tainted” by having looked at too much aggregated fielding data, where the “randomness” washes out the real information that was to be had.

I would love to see the whole set of data and a proper approach that would show who is right.  Maybe some day we can have that.  We are getting closer to it, anyway.


#23    MGL      2009/05/25 (Mon) @ 23:20

Mike, yes I was pretty much just agreeing with Tango without thinking much about it.  But, without speaking for him, what he means is that when there is not a large selection process in the major leagues around a certain attribute, then you are more likely to have a large spread of talent in the majors with respect to that attribute.  Now that is not a hard and fast rule, but he is correct. We all realize that a hitter’s spray pattern is part and parcel of his overall hitting skill and success, but it is not like there are too many scouts looking at a player in the minors or at the amateur level and saying, “Yeah, he has great power, speed, and strike zone judgment, but I just don’t like his spray pattern.” Seriously, it is not at the top of the list of things that determine whether a player makes the major leagues, other than as a small reflection of his overall hitting skill. Another way to look at it: there are successful hitters who pull the ball, hit to the opposite field, and everything in between. That is NOT the case with, for example, BABIP.  There are not successful pitchers who allow a high and low BABIP in the major leagues.

So Tango is right that if there is indeed (and this is a big IF, as I stated earlier) a large or fairly large spread among all baseball players at all levels with respect to that attribute, and if that attribute is not a significant selection factor at the major league level, we are not likely to see that spread shrink considerably at the major league level, such as we do with BABIP.

“...such that an approach based on your assertion would not help a team much more than simply placing their fielders at the typical positions, whereas an approach based on Peter’s approach would find more success.”

Again, I am NOT advocating any approach at all.  In fact, I have said twice that Max is probably on the right track.  I only said that rather than use a player’s actual spray chart to implement that approach, or whatever approach works, one has to somehow regress his actual spray chart towards some population “mean” (where I use the word “mean” as a proxy for the average distribution of a similar player).

We KNOW that has to be the case by invoking the example of the player with 3 batted balls, which Peter mocked, yet which is a perfect example of why you HAVE to regress these actual spray charts if you want to set a defense for the future.  Obviously you can’t use the actual spray chart for a player who has 3 batted balls.  So what WOULD you use for that player?  Hmmmm....  The average spray chart of a similar player (same handedness, same type of swing, etc.)? Yes, of course!  Somehow we all know that.

So what about a player with 10 batted balls?  What do we use then?  Hmmm… An average, similar player, with a shade towards that player’s actual spray pattern?  Yes, of course.

What about a player with 40 or 50 balls?  Does Peter or anyone think we magically start to use a player’s actual spray chart at 29 or 68 batted balls?  I would hope not.  So what are we doing here?  Let’s see.  We’ve only talked about this a million plus times on this blog.  We are regressing a hitter’s actual batted ball distribution towards that of an average similar player in order to determine where to defense him in the future!  How to do the actual regression and how to set the defense based on a projected spray chart (rather than an actual one), I am NOT commenting on at all.  I’ll leave that up to Mike, Max, and the rest of the very competent hit and pitch f/x guys…


#24    Mike Fast      2009/05/25 (Mon) @ 23:31

MGL, I agree with your main assertion, that you should not take a spray chart at face value, unregressed, especially if the sample is small.

You are, however, commenting on how to do the actual regression even if you are not laying out a full and detailed method for doing so.  You are suggesting a beginning point, but I would begin at a different place.

I think Peter’s point is, and mine is whether Peter’s is or not, that you are better off starting that “regression” process by determining the factors of the game state and the players on the field than you are starting by regressing to a typical RHB or LHB distribution.


#25    Peter Jensen      2009/05/26 (Tue) @ 02:26

“Seriously, it is not at the top of the list of things that determine whether a player makes the major leagues, other than as a small reflection of his overall hitting skill.”

But I bet that the phrase “Has the ability to hit with power to the opposite field” has made it into many scouting reports.  A player’s ability to have some control over his “spray chart” is not just “a small reflection of his overall hitting skill”; it can be one of the major determining factors of that player’s potential to be a high average hitter.

“There are successful hitters who pull the ball, hit to the opposite field, and everything in between. That is NOT the case with, for example, BABIP.  There are not successful pitchers who allow a high and low BABIP in the major leagues.”

A better analogy would be pitchers who have mastery over more than two pitches.  There are successful pitchers in the majors with only two pitches.  But having 3 or 4 or more pitches in his arsenal gives a pitcher more options and a greater chance of success if he doesn’t have a dominating fastball.  Ryan Howard is a successful hitter without much ability to adjust his spray chart because he can crush pitches from right handers that are anything less than perfect.  But it is not as likely that he will be as great or have as long a career as Albert Pujols or Alex Rodriguez because those players have a greater range of skills, including the ability to adjust to good pitches from both left and right handers and hit them with authority to all parts of the field.  And some players like Ichiro and Jeter or Pete Rose or Wade Boggs who didn’t have much ability to hit with power may owe their very successful major league careers to their ability to control where they hit the ball.  It has been pointed out many times that the difference between a .240 hitter and a .320 hitter is only two hits a week.  We just don’t have enough information to know whether a batter’s ability to control where he hits the ball is an important factor in getting those extra two hits.  So it is a bit premature to dismiss spray pattern as “not being important in baseball” and to say “batters cannot control where they hit the ball.”

As to your 3 batted ball player being a “crystal clear” example of why we need to regress a player’s spray chart, I agree that we cannot use small sample sizes to make projections without additional information.  But all I asked is what population you are going to regress to, and to show me an example (a real-life example, not an absurd hypothetical) of where regressing to that population gives a better projection than using a player’s personal spray chart history.  You have made vague assertions about regressing to “all same handed players” or “similar players” or “players with the same type swing”.  Very few players make it to the major leagues without playing a year or two in the minors.  Is another major league player or group of players going to be more similar than the same player in the minor leagues?  Is adding 200 or 300 hit balls from any other population going to give a better projection than adding 200 or 300 hit balls from the same player when he was younger, even if they were from when he was in the minor leagues?


#26    Tangotiger      (see all posts) 2009/05/26 (Tue) @ 11:20

Max, I think you probably don’t appreciate the many past discussions (which you probably missed) on the run value of a hit and an out.

You must use the run value of both.  If you want only positive numbers, then subtract the run value of the out from the run value of every batted ball, so that the out becomes zero and the hit carries the full difference.

So, your choice is either:
+0.47 single
-0.28 out

or:
+0.75 single
0.00 out
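To make the two scales concrete, here is a minimal Python sketch, assuming run values are kept in a plain dictionary keyed by event name (the function and key names are illustrative, not from any existing tool). It takes the linear-weights values above and subtracts the out's value from every event, which produces the second, out-relative scale.

```python
def out_relative(linear_weights):
    """Shift linear-weight run values so that an out is worth zero.

    Subtracting the out's value from every event keeps the gap between
    any two events unchanged; only the zero point moves.
    """
    out_value = linear_weights["out"]
    return {event: round(value - out_value, 2) for event, value in linear_weights.items()}


# Values from the comment above, on the linear-weights scale (average event = 0).
lw = {"single": 0.47, "out": -0.28}
print(out_relative(lw))  # {'single': 0.75, 'out': 0.0}
```

Because only the zero point moves, ranking two fielder alignments by their total run value over the same set of batted balls gives the same order on either scale, as long as the outs are counted too.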


#27    Peter Jensen      (see all posts) 2009/05/26 (Tue) @ 11:34

Max - What graphing package did you use to generate your density plots?  I read the explanation of density plots that you linked to and found them interesting.  Also can anyone help me with the HTML that would allow me to put charts generated in Excel into a post here?


#28    Tangotiger      (see all posts) 2009/05/26 (Tue) @ 12:04

This software blocks the posting of images, other than for the main entry.  You can email the images to me, and I can post them if you like.


#29    Tangotiger      (see all posts) 2009/05/26 (Tue) @ 12:08

Hmmm… looks like I had a setting to block images.  I will open it up and see how it works.

This is how you code it:

<img src="http://www.tangotiger.net/thebookthumb2.jpg" />


#30    Matt Mitchell      (see all posts) 2009/05/26 (Tue) @ 12:31

Peter/#27,

My guess is Max used R, just based on the style of the graphs. Max can correct me if I’m wrong.


#31    Max      (see all posts) 2009/05/26 (Tue) @ 16:26

Tango #26. Ok, I got it; I misunderstood what you wrote on the main entry. Agreed.

Peter #27. Yes, it’s R.

MGL #21. I know that every distribution has a mean and a standard deviation, but only the Gaussian is so well defined by those two values.
You have not assumed that the spray pattern has a distribution close to normal, but by calculating the numbers to regress against on the mean and the standard deviation (thus defining the spray pattern by a position and a dispersion), I believe there is some implicit normality assumption.
I agree with you that regression is important, and I think the values you provided are a very good guide, but I think it’s not an easy task to find out the “right” numbers and the similar players (unless we choose to regress to all same handed batters).


#32    MGL      (see all posts) 2009/05/26 (Tue) @ 18:42

"I agree with you that regression is important, and I think the values you provided are a very good guide, but I think it’s not an easy task to find out the “right” numbers and the similar players (unless we choose to regress to all same handed batters).”

I agree, other than one thing.  It is not any more “difficult” to find the appropriate similar population of players towards which to regress.  As with anything else, you do the best you can.  Certainly you start with the same handedness and go from there.  If you know nothing else about the player but his handedness, then you simply use that. If you know something else, then you use that too. Etc.  Finding similar players in order to regress sample data towards them is never “difficult.” By that I mean “difficult” is not the right word.  You do the best you can given the information you have on the player and what attributes you think are relevant in order to define the population you want to use.  Obviously it is better to narrow the population if you can, but it is not “necessary.” For example, with our 3 batted ball player, if all we know is his handedness, then we simply use the spray chart of the average player of the same handedness.  It is not “difficult” to figure out what population he belongs to, given the information we have.  To be honest, I don’t think there are many really significant factors other than handedness that you can use to define the population a player comes from in terms of his spray pattern.  What you DON’T want to do, which is a mistake that many people make, is to use ANYTHING that is based on or inferred from the sample data you are using.  IOW, let’s say that you have a RHB who appears to spray his ground balls all over the place rather than pull them as much as the typical RHB.  Well, guess which population you want to (MUST) use in order to regress that spray pattern?  That of the typical, average RHB who mostly pulls his GB.  That is the whole point of regressing and using the “mean” of a certain population.
It is NOT to try and figure out what kind of a player you have - it is the exact opposite, again unless you know something about the player that suggests a different population AND that is not a function of or implied by the sample data itself!  Off the top of my head, things you MIGHT use to determine the population a player comes from in terms of spray pattern are how close or far from the plate he stands, his bat speed if you know it, whether his stance is open or closed, his K rate, his contact rate, his power (as measured by some stat), and maybe a few other things.  But by and large, handedness (and power) is going to be the primary determinant.

As far as HOW to do the regressing, as I said, I don’t really know.  I would guess that treating the distribution as somewhat normal and regressing the spread (SD of direction) and the mean or median direction (and distance for air balls) is a pretty good start.  If nothing else, I would take a player’s actual spray pattern and merely shift it towards that of the average player of that handedness and probably shrink or widen it, again, towards the width of the average player of that handedness.  And I definitely would shift the distances of all air balls toward that of the average player (of that size and weight).  We can easily show that NOT doing any of that (basically regressing) would be a mistake.  For example, let’s take all the players in 2005 and 2006 whose fly balls averaged 280 feet when the average player’s fly balls averaged 260 feet.  If we set the defense in 2007 and 2008 assuming that these same players’ average fly ball distance was going to be 280 feet, I guarantee that would be wrong.  I guarantee that their average fly ball distance in 07 and 08 will be somewhere in between 260 (the league average) and 280.  I guarantee the same thing for the average direction of their ground and air balls and the spread of those ground and air balls.  How much they will regress will depend, of course, on how much data we have for each player in 05 and 06.  That is exactly why you HAVE to regress in some form or fashion when using some method to formulate an optimal defensive position.  Again, with players for whom you have lots and lots of batted ball data, you can probably get away with not regressing at all.  But for players that you don’t have a lot of data on, not regressing will be a mistake.  Again, that should be obvious.  That was the ONLY point I was trying to make.  Why was I mocked for that?  I am not being sensitive.  I don’t EVER care if someone agrees or disagrees with me or even mocks me personally.
But I do WANT to know WHY I was being mocked for something as basic, obvious, and correct as saying that a hitter’s actual spray chart must be regressed toward the distribution of a similar population of players before you use ANY method to set up an optimal defense for that player - and that how imperative that is depends on the amount of data you have. Everything else that was brought up in response to my initial post was a red herring.
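As a rough illustration of the shift-and-shrink approach described above, here is a Python sketch, assuming a simple weighted-average (shrinkage) form in which the player's n batted balls are blended with k batted balls' worth of population average. The regression constant k, the angle sign convention, and the angle and spread values in the example are hypothetical; only the 280-foot and 260-foot fly ball distances come from the comment. Regressing just a mean and a spread also carries the implicit normality assumption Max pointed out in #31.

```python
from dataclasses import dataclass


@dataclass
class SprayProfile:
    mean_angle: float    # average spray direction, degrees from center (sign convention hypothetical)
    sd_angle: float      # spread of spray direction, degrees
    fly_distance: float  # average fly ball distance, feet


def shrink(observed, population, n, k):
    """Weighted average of a player's observed value and the population mean.

    n is the player's number of batted balls; k is a hypothetical
    regression constant (how many 'population-average' balls to add).
    """
    weight = n / (n + k)
    return weight * observed + (1 - weight) * population


def regress_profile(player, population, n, k):
    """Shift the center, rescale the spread, and pull fly distance toward the population."""
    return SprayProfile(
        mean_angle=shrink(player.mean_angle, population.mean_angle, n, k),
        sd_angle=shrink(player.sd_angle, population.sd_angle, n, k),
        fly_distance=shrink(player.fly_distance, population.fly_distance, n, k),
    )


# Example: 200 batted balls and k = 200, so the player's own data gets half the weight.
player = SprayProfile(mean_angle=-8.0, sd_angle=18.0, fly_distance=280.0)
rhb_average = SprayProfile(mean_angle=-12.0, sd_angle=22.0, fly_distance=260.0)
print(regress_profile(player, rhb_average, n=200, k=200))
# SprayProfile(mean_angle=-10.0, sd_angle=20.0, fly_distance=270.0)
```

With far more data (say n = 20,000 and the same k) the weight on the player's own numbers approaches 1, which matches the point that hitters with lots of batted ball data barely need regressing.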


#33    Mike Fast      (see all posts) 2009/05/26 (Tue) @ 19:35

MGL, correct me if I’m summarizing your position incorrectly, but it seems that you believe that as long as you’re regressing a player’s batted ball distribution toward some mean from a population of similar players, how you do the regression is of secondary importance.

I just can’t agree with that.  I don’t agree that the discussion of HOW is a red herring.  You have to apply baseball sense to this problem, not simply do a regression.  If you simply regress toward a population mean, you’re going to end up positioning three infielders in the gaps halfway between where the infielders typically currently stand (I don’t know where you’d put the fourth infielder, maybe in the shallow outfield of the pull field, but let’s ignore him for now).  The typical batted ball distribution has very notable peaks in between the infielders.  Presumably that’s because the hitters are trying to hit the ball between the infielders. 

The minute you move the infielders, the batters are going to aim elsewhere and the “typical” batted ball distribution is going to change.  Simply applying a handedness regression is going to get you a very poor baseball answer.


#34    Guy      (see all posts) 2009/05/26 (Tue) @ 21:20

Mike:  My sense is just the opposite:  that batted ball distributions generally show peaks exactly where fielders stand.  And that makes sense:  fielders can shift their position easily from batter to batter, while hitters I think would have to train themselves to change their spray pattern.  And by the time a hitter did that, opposing fielders would just shift accordingly.  So fielders are always going to “win” this game theory exercise. 

Of course, that’s in the aggregate.  Good hitters for average presumably tend to have more varied spray patterns (in addition to hitting ball harder), making it harder to defend them.

* *

This whole exchange is frankly rather odd.  I think everyone would agree that a small sample of BIP would not provide an adequate prediction of a hitter’s future spray pattern.  And I think everyone agrees that there might be multiple “mean” distributions, and figuring out which one a hitter should be regressed to would be very challenging. 

Now, maybe we don’t all agree on how many discrete spray “types” there could be.  Surely, regressing Ryan Howard and Wade Boggs both to a “mean LHH” would not be helpful.  Perhaps there are only 3 or 4 categories for each hand (extreme pull, standard, opposite field), with tiny differences within each. On the other hand, there might be so many different types that in practice regression would amount to a smoothing of the hitter’s actual distribution (after some reasonable N is reached).  This seems like a topic worth exploring.


#35    MGL      (see all posts) 2009/05/26 (Tue) @ 22:17

I am going to bow out of this discussion as it has passed the ridiculous phase.  I’ll end my contribution by saying to Mike that yes, you are summarizing my position 100% incorrectly.  If you want to know how, simply re-read my last post and treat it as if it were my only post.  I think my “position” (if you want to even call it that, as I am not really positing anything at all that is not obvious) is crystal clear and there should be NO argument whatsoever with it, any more than there should be an argument over whether a player’s offensive stats equate to their projection.


#36    Mike Fast      (see all posts) 2009/05/26 (Tue) @ 22:41

Guy, in the outfield, yes, batted ball distributions peak where the outfielders stand.  In the infield (ground balls, line drives) they peak between the fielders.  To me that implies that the hitters are trying to drive the ball between the infielders (and not necessarily trying to--or able to--place it in the gaps between the outfielders) and they have a fair amount of success at that placement. 

When they get under the ball too much or too far out in front of the pitch, they loft it in the air instead of hitting it hard on a line, and then it goes to the outfield, still in the gaps between the infielders, and voilà, there are the outfielders standing in those infield gaps.  How else can you explain why there are fewer balls hit into the outfield gaps?  I haven’t been able to come up with any other good explanation, and my explanation (which, to be fair, I originally heard from Peter Jensen) has the added benefit of lining up with the data we see on the infield.

Even the term “spray pattern” can be a biased term since it implies that the hitter is just fighting off the pitch and spraying it around the yard hoping to have it land fair rather than carefully trying to place it in a certain chosen direction.  Of course, there is some of both effects at play, but I don’t think it’s a good idea to ignore the latter and assume only the former.


#37    Guy      (see all posts) 2009/05/27 (Wed) @ 06:50

Mike, can you provide a source that shows the peaks occur between fielders?  When I look at the distribution in John Walsh’s THT piece, http://www.hardballtimes.com/main/article/infield-defense-mdash-back-to-basics/, it looks to me like the peaks are very close to infielders’ traditional positions (with the caveat that 1B and 3B also try to minimize doubles down the lines).

Look at it this way:  about 70% of GBs become outs.  Unless you think that 4 infielders can cover something like 80% of the total perimeter, it can’t be true that hitters are disproportionately hitting the ball into gaps. And that doesn’t sound plausible to me.


#38    Peter Jensen      (see all posts) 2009/05/27 (Wed) @ 09:13


Guy - This is a chart of Pedroia’s GB 2005-2008.  Of course we don’t know the exact positions of the fielders and they will change a bit depending on the pitch type and base-out situation and other factors, but my best guess at an average position is where the percentage of ground balls that are outs is at or near a maximum.  I have also used the average position where an infielder catches a line drive for an out as an approximate location.  As you can see from the chart, the peaks are not exactly where the fielder is playing.  Pedroia’s largest peak is the gap between the 3B and SS.  There is another big spike near, but not exactly at, where the shortstop is playing.  There are also smaller peaks directly up the middle, and very close to, but not directly at, where the 2B is playing.

As you can see by the chart, even when Pedroia is successful at hitting the ball in the gap, it is fielded for an out more than 50% of the time.  That is how players can hit them where they ain’t but still only have a BABIP around .350.
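Peter's rule of thumb for locating the fielders (take the spot where the share of ground balls turned into outs is at or near its maximum) can be sketched in Python as a search for local peaks in the per-slice out rate. The slice counts below are invented for illustration, and the 50% floor that filters out minor bumps is an assumption.

```python
def out_rates(balls, outs):
    """Out rate per slice; slices with no batted balls get None."""
    return [o / b if b else None for b, o in zip(balls, outs)]


def fielder_guesses(rates, min_rate=0.5):
    """1-based slice numbers whose out rate is a local maximum of at least min_rate.

    A fielder's average position is guessed to be where the out rate peaks.
    min_rate is a hypothetical threshold; equal neighbors both count as peaks.
    """
    peaks = []
    for i, r in enumerate(rates):
        if r is None or r < min_rate:
            continue
        left = rates[i - 1] if i > 0 else None
        right = rates[i + 1] if i + 1 < len(rates) else None
        if (left is None or r >= left) and (right is None or r >= right):
            peaks.append(i + 1)
    return peaks


# Invented ground ball counts and outs for nine slices.
balls = [40, 60, 80, 70, 50, 70, 90, 60, 40]
outs = [20, 48, 72, 42, 20, 49, 81, 36, 16]
print(fielder_guesses(out_rates(balls, outs)))  # [3, 7]
```

Note that the slices with the most batted balls need not be the slices with the highest out rate, which is the distinction Peter draws between the spray peaks and the fielders' positions.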


#39    Peter Jensen      (see all posts) 2009/05/27 (Wed) @ 09:16

Whoops! Doesn’t seem to have worked.  Tango, any chance you can fix this?  The image location I am trying to include is “C:\Users\Peter\Documents\Book2_files\Pedroia_GB.gif”


#40    Tangotiger      (see all posts) 2009/05/27 (Wed) @ 09:46

Peter: oic.  You are trying to link to a document that is on your hard drive.  That won’t work.  You need to post it online somewhere, like flickr.com.  Then, you can link to it via the IMG tag.  If uploading is a problem, you can email me the image, and I can upload it on my server.


#41    Peter Jensen      (see all posts) 2009/05/27 (Wed) @ 10:01

Thanks, I have already sent you the image.


#42    Mike Fast      (see all posts) 2009/05/27 (Wed) @ 10:16

Guy, one quibble I have with using John Walsh’s data to draw that conclusion is that it excludes balls which were labeled as line drives.  “Ground” balls which are hit hard enough to make it sharply between the infielders seem more likely to be labeled as line drives than balls hit just as sharply but at an infielder.

Here’s the graph I have of all ground balls plus line drives.  I divided the field into 18 slices of five degrees each.

Approximate average position of the infielders is Slice 4 (3B), Slice 8 (SS), between Slices 13 and 14 (2B), and between Slices 17 and 18 (1B).
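For readers who want to build the same kind of histogram, a minimal Python sketch of the binning step follows. It assumes each batted ball has already been reduced to a horizontal angle measured from the third-base line (0 degrees) to the first-base line (90 degrees); that convention, and the function names, are assumptions rather than anything specified for the Gameday data.

```python
from collections import Counter

SLICE_WIDTH = 5.0  # degrees per slice
N_SLICES = 18      # 18 x 5 = 90 degrees of fair territory


def slice_number(angle_from_3b_line):
    """Map an angle (0 = third-base line, 90 = first-base line) to slices 1..18.

    Angles outside fair territory (foul balls) return None.
    """
    if not 0.0 <= angle_from_3b_line <= 90.0:
        return None
    idx = int(angle_from_3b_line // SLICE_WIDTH) + 1
    return min(idx, N_SLICES)  # a ball exactly on the first-base line goes in slice 18


def slice_counts(angles):
    """Number of batted balls in each of the 18 slices, in order from the 3B line."""
    counts = Counter(s for s in map(slice_number, angles) if s is not None)
    return [counts.get(s, 0) for s in range(1, N_SLICES + 1)]


print(slice_number(17.5), slice_number(37.0), slice_number(95.0))  # 4 8 None
```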

You can see that hitters do a pretty good job of hitting the infield gap on the pull side and a fair job of sending balls back up through the gap in the middle.  They do less well at hitting the infield gap on the opposite field side.

But this is just the aggregate graph.  If you look at individual players, you can see some players who are much better at this than others, and you can see some players who are good at it at certain ball-strike counts but not at others, or who just take a different approach, whether good or bad.  For most hitters, the “spray” pattern changes noticeably as soon as the pitcher gets one strike, as hitters begin aiming more for the opposite field.  But for Ryan Howard, for example, he continues to aim for the infield gap on the pull side even with two strikes in the count.
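One way to look for the by-count shift Mike describes is to group batted balls by the number of strikes and compute the share that lands on the pull side. The Python sketch below assumes each ball is recorded as a (strikes, slice) pair using the 18-slice layout above, and treats slices 1-9 as the pull side for a right-handed hitter; both that cutoff and the sample data are hypothetical.

```python
from collections import defaultdict


def pull_share_by_strikes(batted_balls, pull_slices):
    """Fraction of batted balls landing in pull_slices, grouped by strike count.

    batted_balls: iterable of (strikes, slice_number) pairs. Which slices count
    as 'pull' depends on handedness and is left to the caller.
    """
    totals = defaultdict(int)
    pulled = defaultdict(int)
    for strikes, s in batted_balls:
        totals[strikes] += 1
        if s in pull_slices:
            pulled[strikes] += 1
    return {k: pulled[k] / totals[k] for k in sorted(totals)}


RHH_PULL = set(range(1, 10))  # hypothetical: slices 1-9 are the pull side for a RHH
sample = [(0, 4), (0, 5), (0, 12), (1, 6), (1, 14), (2, 15), (2, 16), (2, 5)]
shares = pull_share_by_strikes(sample, RHH_PULL)
print({k: round(v, 2) for k, v in shares.items()})  # {0: 0.67, 1: 0.5, 2: 0.33}
```

A falling pull share as strikes accumulate would match the pattern described for most hitters; a roughly flat share would look more like the Ryan Howard example.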


#43    Guy      (see all posts) 2009/05/27 (Wed) @ 10:41

I guess I just read this graph very differently.  Fielders obviously have a range in either direction, and within that band they will turn almost all GBs into outs.  For example, I’d guess 3Bmen are able to handle most GBs in slice 5, and SSs can usually get to slice 7.  So, hitting to the gaps really means hitting between the outer edges of the 3B/SS ranges, the SS/2B ranges, and the 2B/1B ranges.  So it looks to me like hitters should be trying to hit to slices 6, 11, and 15-16.  There’s some indication that LHH are pulling the ball into the 1B/2B gap with success, though it’s hard to know without looking separately by different baserunner/out states.  But other than that, I don’t see evidence that hitters successfully find the gaps.  (Of course, that doesn’t mean some hitters don’t show this ability.)
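Guy's reasoning about where the real gaps are can be written as a small set calculation: mark every slice inside some fielder's range and list what is left. In this Python sketch the coverage bands are hypothetical, picked only to reproduce his guesses (3B reaching slice 5, SS reaching slice 7, and gaps at slices 6, 11, and 15-16 for a right-handed hitter).

```python
def uncovered_slices(coverage, n_slices=18):
    """Slices (1..n_slices) outside every fielder's coverage band.

    coverage maps a fielder to an inclusive (first_slice, last_slice) band.
    """
    covered = set()
    for first, last in coverage.values():
        covered.update(range(first, last + 1))
    return [s for s in range(1, n_slices + 1) if s not in covered]


# Hypothetical coverage bands for a right-handed hitter.
coverage = {"3B": (1, 5), "SS": (7, 10), "2B": (12, 14), "1B": (17, 18)}
print(uncovered_slices(coverage))  # [6, 11, 15, 16]
```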

On the other hand, if I’m greatly overestimating fielders’ range—if most slice 5 and slice 7 GBs are hits—then I’d be more inclined to agree with your assessment.

Does your data allow you to see if good hitters (for average) tend to have a wider dispersal of BIP?  I would think that would make them harder to defend against.


#44    Mike Fast      (see all posts) 2009/05/27 (Wed) @ 11:12

Guy, we’re looking at something where a 35-40% success rate by the hitter is very good.  So if a RHH aims for the 3B-SS gap and can get the ball within Slices 5-7 most of the time, he’s going to get enough of them through that it will be worth his while, as opposed to simply hitting in a “random” direction “typical” of his handedness.  Remember that I’m including line drives here, too, which probably will go for a hit as long as they are not directly at an infielder.
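The arithmetic behind “worth his while” is a probability-weighted average: the hit rate on grounders and liners is the sum, over zones, of the share of balls hit there times the hit rate in that zone. The Python sketch below uses three coarse zones, and every number in it is invented for illustration; it only shows how a modest change in where balls go moves the overall rate.

```python
def expected_hit_rate(distribution, hit_rate):
    """Probability-weighted hit rate: sum over zones of P(zone) * P(hit | zone)."""
    total = sum(distribution.values())
    return sum(share / total * hit_rate[zone] for zone, share in distribution.items())


# Invented hit rates per zone for ground balls plus line drives.
hit_rate = {"at_fielder": 0.15, "edge_of_range": 0.30, "gap": 0.55}

# Invented distributions: a 'typical' hitter and one who steers some balls into the gap.
typical = {"at_fielder": 0.50, "edge_of_range": 0.30, "gap": 0.20}
aimed = {"at_fielder": 0.40, "edge_of_range": 0.30, "gap": 0.30}

print(round(expected_hit_rate(typical, hit_rate), 3))  # 0.275
print(round(expected_hit_rate(aimed, hit_rate), 3))    # 0.315
```

In this made-up example, moving 10% of batted balls from “at a fielder” to “in the gap” adds 0.10 × (0.55 - 0.15) = .040, or 40 points, to the rate.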

I am not arguing that the infielders’ typical fielding positions are sub-optimal.  In fact, they look pretty good to me, for the reasons you describe.

I am arguing that there is some very good evidence that the hitters are often trying to aim for a certain direction on the field, namely the gap between the infielders on the pull side of the field.  Both the infield and outfield batted ball distributions show this.

Also, as the count changes, you can see the hitters begin to shift their aim toward the infield gap on the opposite field side, which isn’t shown in my graph.  The distribution isn’t a nice, smooth, skewed version of a normal distribution where one central peak gradually inches over toward the opposite field as the count changes.  Instead, the multi-modal peaks begin to shift from one infield gap to the other. (The point is probably clearer if I break the data down by number of strikes on the batter, but I don’t have the time or space to write a whole in-depth article here.)


#45    MGL      (see all posts) 2009/05/27 (Wed) @ 11:20

OK, I’ll bow in again. Call me crazy, but that chart above by Mike looks pretty much like batters have little control over where they hit their line drive/ground balls.  There is a small “peak” up the middle for both left and right handed batters (which is not THAT surprising as many batters in many situations try and hit the ball up the middle) and another one next to the 2B and SS (depending upon whether the batter is LH or RH), but by and large for both LHB and RHB, they look like skewed normal curves to me.  More importantly, if you completely smoothed out those curves, I don’t think the optimal fielder positioning would change much at all. (I say “more importantly” because that enables you to do the “regression” quite easily.) And as Guy said, the reason the 1B and 3B play “too” close to the lines is to balance doubles and singles.
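One simple way to “completely smooth out those curves” is a moving average across neighboring slices. This is a generic smoothing technique chosen for illustration, not a method anyone in the thread specified, and the window width is an assumption.

```python
def smooth(counts, window=3):
    """Centered moving average over adjacent slices; the window shrinks at the foul lines."""
    half = window // 2
    out = []
    for i in range(len(counts)):
        lo, hi = max(0, i - half), min(len(counts), i + half + 1)
        out.append(sum(counts[lo:hi]) / (hi - lo))
    return out


print([round(x, 1) for x in smooth([10, 30, 20, 40, 10])])  # [20.0, 20.0, 30.0, 23.3, 25.0]
```

Whether the alignment chosen from the smoothed curve matches the one chosen from the raw curve is MGL's conjecture; the sketch only produces the smoothed input. Because the window shrinks at the ends, the smoothed values do not preserve the exact total count.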

If batters in general truly had a significant ability to “hit the ball where they ain’t,” then no matter where the fielders played, they would be able to hit away from them since there is NOT much game theory involved as the fielders must position themselves FIRST.

Even for individual good hitters, I doubt that they have much of an ability to hit them where they ain’t.  In addition to simply making more and better contact (which is the primary determining factor for a “good” hitter), I imagine that they (hitters with high averages) simply have a wider distribution, which makes it more difficult to defend of course, although in order to have a wider distribution, you probably have to sacrifice some speed on those batted balls.  If they could truly hit balls away from fielders, you would see a very “non-curve” looking spray chart.  Do any hitters (in the long run) have any “non-curve” looking patterns?

Mike can you show the same chart for fly balls only?

BTW, I am not even convinced that those small spikes in between fielders are real.  It is possible that they are an artifact from biases by the people who record the data.  Mike, what data source are you using for that chart?  If it is pitch f/x (as opposed to retrosheet, BIS or STATS), then maybe there is no possible bias.

And BTW, the comments above about Howard and Boggs are completely wrong and are the “trap” that I mentioned in one of my posts above.  Maybe Tango can explain why I say this, since I am sure he understands what I mean.  Basically, unless you something about the kind of swing that a hitter like Boggs or Howard have (or their power or K numbers or something like that), independent of and blind to their results, it is imperative that you DO regress them to EXACTLY the same distribution.


#46    MGL      (see all posts) 2009/05/27 (Wed) @ 11:22

Just as it is imperative to regress Mike Matheny’s HR totals to EXACTLY the same number as Mike Piazza, as both are big, strong RH catchers, yet Matheny has no power whatsoever and Piazza is one of the best HR catchers of all time!


#47    Guy      (see all posts) 2009/05/27 (Wed) @ 11:32

Mike:  It certainly makes sense to me that as the number of strikes increases, hitters focus more on making contact at the expense of power.  And I’d expect that to mean more hits to the opposite field, less pulling.  But I don’t think that’s evidence either way for hitters’ relative ability to hit the gaps. 

We just have to agree to disagree here.  But I would say that if RHH hitters are aiming for the 5-6-7 gap, as you suggest, shouldn’t we see the highest frequency at slice 6 (which I imagine has the lowest out%)?  But instead, that’s the lowest frequency of the three.  Instead the peak is at 5, where the 3B man makes the play moving toward first and thus has an easy throw.  And why are far more balls hit to slice 8, right at the SS, than to 9 or 10?

This is a complicated question.  But we both seem to agree that fielders are optimally positioned, given the current spray pattern.  But now do the reverse exercise:  given fielders’ current average positions, are your spray patterns optimal for the hitters?  Not even close.  So that strongly suggests that fielders have the upper hand in this battle.


#48    Peter Jensen      (see all posts) 2009/05/27 (Wed) @ 11:37


Trying again!



#49    MGL      (see all posts) 2009/05/27 (Wed) @ 11:40

"Unless you something” should be “unless you KNOW something..”

And yes, fielders have the “upper hand” because trying to hit a baseball from a major league pitcher “between” fielders (other than, for example, “up the middle,” “pull” or “opposite field” in general) is virtually impossible.  Of that I am certain.


#50    Peter Jensen      (see all posts) 2009/05/27 (Wed) @ 11:44

I give up.  The spray charts can be seen at flickr.com under my “misterdirt” screen name if you are interested, unless Tango (or someone else) can help me get them posted here.


#51    Tangotiger      (see all posts) 2009/05/27 (Wed) @ 11:44

You can have a classroom of students, from obvious smarty-pants to obvious knuckleheads.

You give them a test.  You select the top 20% scores, and the bottom 20% scores.  You ask all the students to take a different test (but of the same subject). 
- The scores in the second test for the group of students who were in the top 20% in the first test will be closer to the average. 
- Similarly, the scores in the second test for the group of students who were in the bottom 20% in the first test will be closer to the average.

BUT, if you were to preselect the “obvious smarty-pants”, the group scores in each of the two tests will be identical.  And the same for the “obvious knuckleheads”.

AND, the group score of the “obvious smarty-pants” would be LOWER than the group score of the top 20% scores in either of the two tests (regardless if you remove the overlaps).
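Tango's classroom example is easy to check with a simulation. The Python sketch below assumes each score is a student's true ability plus independent test-to-test noise; the ability and noise scales are hypothetical, and picking the “obvious smarty-pants” by true ability is a stand-in for selecting on information that is independent of either test.

```python
import random
import statistics


def simulate(n=10_000, seed=1):
    """Two tests per student: score = true ability + independent noise.

    The ability and noise scales are hypothetical; only the selection logic
    mirrors the classroom example.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(70, 10) for _ in range(n)]
    test1 = [a + rng.gauss(0, 8) for a in ability]
    test2 = [a + rng.gauss(0, 8) for a in ability]

    k = n // 5  # top 20%
    top_on_test1 = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]
    smarty_pants = sorted(range(n), key=lambda i: ability[i], reverse=True)[:k]

    def mean_of(scores, group):
        return statistics.fmean(scores[i] for i in group)

    return {
        "top20_test1": mean_of(test1, top_on_test1),
        "top20_test2": mean_of(test2, top_on_test1),
        "smarty_test1": mean_of(test1, smarty_pants),
        "smarty_test2": mean_of(test2, smarty_pants),
    }


for name, value in simulate().items():
    print(f"{name}: {value:.1f}")
```

With these settings the top-20% group picked on test 1 should score several points lower on test 2, the preselected group should score about the same on both tests, and its mean should sit below the test-1 mean of the top-20% group.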

When we say “Wade Boggs”, it carries with it an abundance of information.  If you say “Wade Boggs, 1982-1983”, that’s much different.  Just like saying “Dwight Gooden” is different from “Dwight Gooden 1984-1985”.

***

I also re-recommend this link:
http://www.socialresearchmethods.net/kb/truescor.php

As well as all the links below that one in the left-hand menu.


#52    Mike Fast      (see all posts) 2009/05/27 (Wed) @ 11:46

I have a hard time responding politely to most of your comments, MGL, so I will respond to the one that I feel I can. 

“If batters in general truly had a significant ability to ‘hit the ball where they ain’t,’...”

I’d be curious to know what you consider significant.  To me, a difference of 50 points of batting average would be hugely significant.  It seems that you’re setting the bar for “hitting where they ain’t” at being able to bat close to 1.000, and when you don’t see that in the distribution, you dismiss it.

Btw, did I kick your dog or something?  Why do you feel the need to describe my ideas as “passed the ridiculous phase” and “completely wrong” and “100% incorrect” and imply that I’m so far past the logical pale that you can hardly bear to descend to converse with me?  Can we not discuss things on this blog without the personally directed hyperbole?  I have disagreed with some, but not all, of the things you’ve said, and I’ve tried to keep it civil and calm.  I appreciate your intelligence and experience in the field and would appreciate the same respect in return.


#53    Mike Fast      (see all posts) 2009/05/27 (Wed) @ 11:57

Tango, re #51, I think everybody involved in this discussion knows that, so where are you going with that?

Do you believe that Ichiro and Ryan Howard properly belong in the same population for batted ball distribution simply because they are both left-handed?  There are plenty of reasons to distinguish them into very different populations, not the least of which are power and K-rate, as even MGL mentioned.


#54    Guy      (see all posts) 2009/05/27 (Wed) @ 11:59

On Boggs/Howard, I’m the guilty party.  What I was trying to suggest is that there are probably a few “types” of spray distributions, rather than a single mean and a normal distribution around that.  If you analyzed a lot of hitters, you could probably identify a few discrete profiles.  And the guys who hit a lot to the opposite field would, for example, tend to have less power.  So you would then want to regress Boggs toward that profile, not a generic LHH.  But I agree that if the only info you had was handedness and a small spray sample for a hitter, your best projection would be the kind of regression MGL suggests.  (Of course, given minor league experience, this situation could never occur in real life.)

I really don’t think we all disagree on this issue.....


#55    Tangotiger      (see all posts) 2009/05/27 (Wed) @ 12:16

First, good job on Mike/52 moderating here, which is probably what I should have been doing.

As for my comment in 51, I wasn’t going anywhere with it.  Just setting the stage for the difference between sample and true rates. 

The only reason Ichiro and Howard belong in different pools is because we have a visual as to how they hit.  The plane of their bat, how quickly they uncork, how they work the strike zone and the count, etc. 

The bottom line is regress toward whatever population the player is drawn from.  The number of parameters to consider would be fairly large, but the handedness of the batter will undoubtedly be the one that has the most impact.

As for the general disagreements in this thread, I think all of us probably have more in common than the thread suggests, and we may either disagree on the peripherals or be speaking a different language on those.


#56    Mike Fast      (see all posts) 2009/05/27 (Wed) @ 12:26

Guy, I don’t think we all disagree either.

I would agree with your summary and would add that ball-strike count makes a huge difference, as much or more than handedness.

I completely agree with MGL’s main point, that you can’t cherry-pick and then think you’re doing a real regression after that.  I don’t think he believes I get that, but I do.  What I am mainly opposed to in the view that MGL is offering is that handedness is the primary divider for populations and that if you really only have two populations of RHH and LHH you’ll get close to the real distribution.  I think that division into two populations is very inadequate, primarily because it ignores ball-strike count, and secondarily because I believe we can profile hitters better into more populations (K-rate and HR-rate might be where I’d start) that would also make a significant difference on the same level as handedness.

I am also interested in how the distribution comes to be, i.e., can hitters hit them where they ain’t?  I think this is a fascinating topic to explore.  I don’t believe we’re anywhere near a conclusion, rather, we’re just scratching the surface of what the data we’ve begun to gather can show us.  So I hate the idea that that avenue of research should be shut off with a summary decision that the conclusion is fore-known--there’s an inkling of an indication that hitters can’t hit em where they ain’t, therefore no need to sift the evidence, case closed, move on.  I don’t know for sure that hitters can hit em where they ain’t, but I see indications they can and think it’s worth an investigation.  I don’t see that the conclusion is obviously known.


#57    Mike Fast      (see all posts) 2009/05/27 (Wed) @ 12:37

I would add that maybe MGL has looked at all this already in more detail.  I wouldn’t doubt that he has looked at many times more batted ball data in his life than I have.  However, I’m not aware of anything anyone has published on this topic (which isn’t to say it isn’t out there).

Also, my point would be more clearly made if/when I publish the data I was looking at last night.  My initial comments in the thread were based on the research I did for my “Confessions of a DIPS apostate” article at THT about fly balls and line drives, but I spent some time yesterday breaking down ground balls and line drives by count and pitch type and so forth.  None of that is published, of course, and I know it would be helpful for the discussion for everyone to see what I was looking at, both to understand what I was saying and, surely, to point out the flaws in my thinking or things I may have overlooked.


#58    Mike Fast      (see all posts) 2009/05/27 (Wed) @ 12:38

Oh, MGL asked about my data source.  It is the MLB Gameday stringer fielding data from 2007 to 25 May 2009.


#59    Guy      (see all posts) 2009/05/27 (Wed) @ 12:43

The tendency to hit the other way in pitchers’ counts is interesting.  Do fielders tend to adjust accordingly?  For example, on a 1-2 count on RHH, will infielders shift a bit toward 1B?


#60    MGL      (see all posts) 2009/05/27 (Wed) @ 13:46

"Btw, did I kick your dog or something?  Why do you feel the need to describe my ideas as “passed the ridiculous phase” and “completely wrong” and “100% incorrect” and imply that I’m so far past the logical pale that you can hardly bear to descend to converse with me?”

Kind of an odd statement, as only one of those comments referred specifically to you.  It was Guy that wrote the Boggs/Howard comment as he pointed out.  Why did you take my “completely wrong” as an attack on you?  That had nothing whatsoever to do with you.  The “100% incorrectly” was a bit strong and I take that back and apologize for it.

The “past the ridiculous phase” referred to the discussion on this thread and was mostly in reference to Peter’s comments.  I don’t think you and I have much of a disagreement.  My dander was ruffled a little by Peter’s initial comment, which was completely out of line, and I took it out on everyone else.  So I apologize in general.

Arguing about whether a batter’s ability to hit ’em where they ain’t is “significant” or not is a fruitless argument, as are most arguments using qualitative words.  One person’s significant is another one’s not significant…


#61    Peter Jensen      (see all posts) 2009/05/27 (Wed) @ 14:51

MGL - I am sorry if my post #15 offended you.  Other than using “ridiculous” in the first sentence I can’t find anything else offensive about it.  Perhaps ridiculous was an inflammatory word to use.  I was nonplussed that two people that I know to be very smart were treating hit location vectors as if a batter were attempting to hit every ball to the same location.  I still don’t understand it.  And I can’t believe that, even with all the spray charts indicating that batters react to the location of pitched balls and the positioning of the fielders by trying to hit balls to different locations, and that they have at least some success at doing so, you still insist that they can’t do that against major league pitching.  You have backed off your original statements a little to allow that hitters may have enough control to pull the ball, hit it up the middle, or hit it to the opposite field.  If you extended that to “pull into the gap between the two outfielders”, “pull into the gap between the two infielders”, “hit up the middle” and “hit to the opposite field”, then I think Mike and I would agree with you that that is about the extent of what a batter is trying to do.  That pretty much describes the typical spray chart: 4 peaks in those locations, usually 2 major and 2 minor but sometimes 1 major and 3 minor.  There probably would be a fifth peak for “pull down the line” if we had all the information on foul ball locations.

I get a little frustrated when I ask specific questions and for real life examples that back your opinions and only get hypotheticals and generalizations in return.

Here is the link to the Pedroia Chart. http://www.flickr.com/photos/38798618@N05/3569791679/

Sorry I’m too dumb to get it to show up here.


#62    Tangotiger      (see all posts) 2009/05/27 (Wed) @ 15:27

I was nonplussed that two people that I know to be very smart were treating hit location vectors as if a batter were attempting to hit every ball to the same location.

I don’t think you read it the way I intended it.  All I said was to take any player’s actual spray, and add 50 “average spray” BIP.  So, if the average spray by a LHH is 50% pull, 30% up the middle, and 20% opposite, then you add 25, 15, and 10 balls to whatever a player actually had (say 750, 200, 50), to get his “true” spray pattern (which in this example would be 775, 215, 60).
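Tango’s recipe fits in a few lines of Python. This is a minimal illustration, not his actual code. It assumes spray is bucketed into three zones and uses the 50-ball figure and the LHH shares from his example; the function and zone names are hypothetical. In practice the population shares would come from whatever population the hitter is drawn from (handedness, and possibly count, as discussed above).

```python
def regress_spray(actual, population_share, phantom_balls=50):
    """Add `phantom_balls` batted balls spread like the population average.

    actual: {zone: observed batted balls}
    population_share: {zone: share of batted balls for the population}
    """
    if set(actual) != set(population_share):
        raise ValueError("zones in actual and population_share must match")
    # Normalize in case the shares don't sum exactly to 1.
    total_share = sum(population_share.values())
    return {
        zone: actual[zone] + phantom_balls * population_share[zone] / total_share
        for zone in actual
    }


def to_shares(counts):
    """Convert zone counts to zone percentages."""
    total = sum(counts.values())
    return {zone: count / total for zone, count in counts.items()}


# Numbers from the example above (hypothetical LHH population shares).
lhh_average = {"pull": 0.50, "middle": 0.30, "opposite": 0.20}
player = {"pull": 750, "middle": 200, "opposite": 50}

regressed = regress_spray(player, lhh_average)
print(regressed)  # approximately 775, 215 and 60 balls
print(to_shares(regressed))
```

The shares from `to_shares`, not the raw counts, are what you would feed into a positioning calculation.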


#63    MGL      (see all posts) 2009/05/27 (Wed) @ 15:53

Peter, I was not “offended.” I’ve said many times that I don’t care how someone refers to my ideas, comments, etc. If someone wants to call me an idiot, that is their prerogative and it does not bother me personally in the least.  (I know that you were not calling me an idiot.) I am not saying that that is the correct thing to do (name-calling). In fact, I don’t encourage that at all on this site, although Tango is more of the “enforcer” than I am.  It is just that it does not bother me at all.

Anyway, my comment about being “out of line” was referring strictly to the content.  Tango and I are on exactly the same page.  You take a hitter’s spray chart and you regress it, or add X number of balls from an average hitter from the same population.  I said you were “out of line” because I think you misinterpreted or misread (or whatever) what I was trying to say.  I’m not sure why, as I thought I was making myself clear.  Apparently not, though, as I think several people misinterpreted it, which may have been my fault and not theirs.

And if anyone DOES kick my dogs (2 Basset Hounds)…


#64    MGL      (see all posts) 2009/05/27 (Wed) @ 16:00

If batters as a group were hitting the ball away from fielders, you would see “dips” where the fielders are or some “bumps” around the fielders.  If they are simply hitting the ball where it is pitched or toward a certain direction (either a pull hitter, an opp field hitter, or somewhere in between), you will see a smooth curve that peaks toward a “natural” direction.  The fact that we see some bumps suggests that perhaps hitters as a whole (obviously some more than others) have some skill at keeping the balls away from fielders; however, not nearly enough to get me excited.  I suppose you can smooth out Mike’s curve above and then take the difference between the smooth curve and the actual curve (in run value or BA or whatever), and that should give you a pretty good idea as to the value of that skill.  Keep in mind that I am only referring to ground balls and line drives as represented by Mike’s curves above.  I would have to look at a fly ball chart in order to comment on them.  It is also possible that the only significant thing a batter can do is to hit the ball towards the middle a little more than would be “normal” since, no matter what, fielders are not going to play up the middle.  And when I say “middle”, I mean the middle of a player’s spray chart, which may mean between first and second or second and third for an otherwise extreme pull hitter.
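MGL’s idea of smoothing the curve and pricing the difference might look like the Python sketch below. It is only an illustration under stated assumptions. The spray chart is a list of counts per angle bin with a hit rate for each bin. A centered moving average stands in for “smoothing” (any smoother would do), and the result is in hits rather than runs. The function names and the toy numbers are hypothetical.

```python
def moving_average(counts, window=5):
    """Centered moving average; the window shrinks at the ends of the chart."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo, hi = max(0, i - half), min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


def excess_hits(counts, hit_rate, window=5):
    """Hits gained by the bumps: (actual - smoothed balls) x hit rate, summed over bins.

    The smoothed curve is rescaled to the same total as the actual chart, so the
    result only reflects where the balls went, not how many there were.
    """
    smoothed = moving_average(counts, window)
    scale = sum(counts) / sum(smoothed)
    return sum((c - s * scale) * r for c, s, r in zip(counts, smoothed, hit_rate))


# Toy chart: a bump of extra balls in a high-BABIP bin (a gap between fielders).
counts = [10, 10, 30, 10, 10]
hit_rate = [0.10, 0.10, 0.60, 0.10, 0.10]
print(excess_hits(counts, hit_rate, window=3))  # about 6.7 extra hits
```

A perfectly smooth chart scores zero. The wider the smoothing window, the more of any bump gets credited as “skill” rather than as part of the natural curve, so the window choice matters.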


#65    Peter Jensen      (see all posts) 2009/05/27 (Wed) @ 16:30

MGL - did you look at Pedroia’s chart that I gave the link to above?  His highest spike is in the 3B-SS hole.  His next two highest spikes are on the 2B side of the SS.  And he has a group of small spikes up the middle on which he has a BABIP well over 50%.  This is stronger than normal but by no means out of the ordinary for a high average GB hitter. Here’s the link to Jeter’s chart: http://www.flickr.com/photos/38798618@N05/3569791817/

And here is Suzuki’s: http://www.flickr.com/photos/38798618@N05/3569791781/

And just for fun here is Howard’s Fly ball - Line Drive spray chart: http://www.flickr.com/photos/38798618@N05/3570604352/

No wonder teams are willing to give up ground ball singles to the opposite field on the shift if Howard is willing to forgo hitting the ball in the air.


#66    Peter Jensen      (see all posts) 2009/05/27 (Wed) @ 16:37

If Pedroia doesn’t have much control over where he hits the ball, it would make sense to move the third baseman over to 60 degrees and the shortstop to 77 degrees where the spikes are.  But my guess is that Pedroia has the ability to put a lot of pitches close to the third base line if the 3B moves over.


#67    MGL      (see all posts) 2009/05/27 (Wed) @ 20:32

I’m really mystified.  I am staring at Pedroia’s chart and it looks like a skewed normal distribution, more or less.  Clearly some of those peaks and valleys are random fluctuations.  I don’t see a whole lot of ground balls up the middle.  I guess we can argue (which would be pointless, as I already stated) about what constitutes “significant”, but if hitters, any hitters, truly had a lot of control over where those batted balls went, wouldn’t there be a lot more balls hit up the middle?  Every chart I have looked at (Jeter, Pedroia, Suzuki) looks more or less like a skewed normal distribution to me, with some small peaks and valleys here and there.  It wouldn’t take much to smooth all of those curves out.  SOME of those peaks and valleys have to be random.  So what is left simply cannot be all that significant.  But, again, I guess it depends on your definition of significant.  Plus, I would like to see another chart for all those hitters that includes infield line drives just to make sure that we aren’t seeing the possible scorer bias that Mike (or Guy or whoever it was) mentioned above.
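One way to put a number on “some of those peaks and valleys have to be random” is a simulation. Draw as many balls as the hitter actually had from a smooth version of his chart, then count how often the simulated charts come out at least as bumpy as the real one. The Python sketch below measures bumpiness with a chi-square statistic. It is a hypothetical illustration, not anything run in this thread. It assumes you already have smooth bin probabilities, for example from a moving average. Because that smooth curve is usually fitted to the same data, the resulting p-value is only approximate.

```python
import random


def chi_square(observed, expected):
    """Pearson chi-square distance between a chart and its expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_chart(probs, n, rng):
    """Drop n batted balls into bins according to probs."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[i] += 1
    return counts


def bump_p_value(observed, smooth_probs, trials=2000, seed=1):
    """Share of simulated charts at least as bumpy as the observed one.

    smooth_probs must be positive for every bin; it is normalized to sum to 1.
    A small value suggests the bumps are more than random fluctuation.
    """
    total = sum(smooth_probs)
    probs = [p / total for p in smooth_probs]
    n = sum(observed)
    expected = [p * n for p in probs]
    observed_stat = chi_square(observed, expected)
    rng = random.Random(seed)
    at_least_as_bumpy = sum(
        chi_square(simulate_chart(probs, n, rng), expected) >= observed_stat
        for _ in range(trials)
    )
    return at_least_as_bumpy / trials


flat = [0.2] * 5  # toy "smooth" curve
print(bump_p_value([100, 100, 100, 100, 100], flat, trials=200))  # 1.0: no bumps at all
print(bump_p_value([300, 50, 50, 50, 50], flat, trials=200))  # 0.0: far too bumpy for chance
```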


#68    Tangotiger      (see all posts) 2009/05/27 (Wed) @ 21:31

Let me see if I can put in an image with this code:

<img src="http://farm4.static.flickr.com/3343/3569791781_686f2839d3.jpg?v=0" />


#69    Guy      (see all posts) 2009/05/27 (Wed) @ 22:46

I’m with MGL here:  I look at this (or Pedroia’s) and I just don’t see a lot of evidence of strategic hitting.  If you ran a correlation between hit% and frequency at each vector, it would clearly show a strong negative correlation.  Again, fielders are clearly winning the positioning battle by a very large margin, even though (as MGL noted) they have a huge disadvantage: they commit to their position before the hitter swings. 

Now, Ichiro clearly has more dispersion than most hitters, which I’d expect to see for high-BA hitters.  It would be interesting to develop a metric for dispersion, and see how that relates to BABIP.
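Guy’s proposed check is a plain Pearson correlation across angle bins, with one (frequency, hit%) pair per vector. A minimal Python sketch follows. The five bins and their numbers are made up purely to show the calculation. On Python 3.10+, `statistics.correlation` computes the same quantity.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical vectors: busy vectors (where fielders stand) have low hit rates.
frequency = [40, 120, 60, 150, 50]
hit_pct = [0.45, 0.15, 0.35, 0.12, 0.40]
print(round(pearson(frequency, hit_pct), 2))  # strongly negative
```

Each vector counts equally here. Weighting by frequency would be a reasonable alternative if sparse vectors at the foul lines are noisy.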


#70    MGL      (see all posts) 2009/05/27 (Wed) @ 23:23

Ichiro’s chart is interesting.  Even with his, though, he hits a large percentage right to the 2B and SS.  Why would he do that?  I suspect that he goes with the pitch.  Inside pitch, he pulls it; outside pitch, he hits it to the opposite field.  Pitch right down the middle, he hits it up the middle.  He definitely seems to have a lot of dispersion, which, as I said earlier, makes him difficult to defense, which is probably why he has a high BA and BA on ground balls (I assume), in addition to the fact that he gets a lot of infield hits, I think.  In fact, the average SD in degrees for lefty batters is 18 for GB and 24 for FB.  Ichiro is 23 and 29.

Certainly batters can control to some extent whether they pull a ball or hit it to the opposite field, as evidenced by where balls tend to go with a runner on second and 0 outs.  But I just can’t imagine that batters can keep it away from fielders or hit it in between fielders more than a de minimis amount of the time.  Plus, even when most batters deliberately try to pull balls or hit them to the opp field, they have to pay a price for that: not hitting the ball as hard as they typically can, or not making as much contact.
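Guy’s dispersion metric can take the form MGL quotes: a standard deviation of spray angle in degrees, split by batted-ball type. It is easy to compute once each ball has an angle. The sketch below is a minimal Python version. It assumes batted balls arrive as hypothetical (type, angle) pairs, and it uses the sample standard deviation, since the thread doesn’t say which variant was used. It measures spread around the hitter’s own mean angle, so mirroring the angles for left-handed hitters doesn’t change the result.

```python
from collections import defaultdict
from statistics import stdev


def dispersion_by_type(batted_balls):
    """Standard deviation of spray angle (degrees) for each batted-ball type.

    batted_balls: iterable of (bb_type, angle) pairs, e.g. ("GB", -12.5).
    Types with fewer than two balls are skipped (SD is undefined).
    """
    angles = defaultdict(list)
    for bb_type, angle in batted_balls:
        angles[bb_type].append(angle)
    return {t: stdev(a) for t, a in angles.items() if len(a) >= 2}


sample = [("GB", -10), ("GB", 10), ("FB", 0), ("FB", 20), ("FB", 40), ("LD", 5)]
print(dispersion_by_type(sample))  # GB about 14.1, FB 20.0, LD skipped (one ball)
```

The same function applied to the balls allowed by each pitcher would give the pitcher-side version of the metric.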


#71    Guy      (see all posts) 2009/05/29 (Fri) @ 08:40

It occurs to me that you could also measure BIP dispersion for pitchers, which might be interesting.  It might provide some insight into which pitchers have a true BABIP rate that is above or below average.  Distance of LDs and non-HR FBs might also have predictive value. Similarly, length of FBs might give us a better measure of HR/9 talent than the actual number of HRs.


#72    Dave Allen      (see all posts) 2009/05/30 (Sat) @ 18:04

Here is the number of GBs at each angle (-45 is the third base line and 45 the first base line) and the average babip of a GB to that angle.  It covers all GBs from 2007 and 2008 with hit locations from the GameDay data.  It would probably be better to break it up by batter handedness, but anyway:

The number of GBs is in black and the babip is in blue.  You can see the location of the four infielders where the babip drops below 0.2.  There are small peaks in the number of groundballs at -30 (between the third baseman and shortstop), at 0 (between the shortstop and second baseman) and at 30 (between the second baseman and first baseman).  But for the most part the groundballs go to locations of low babip.  As per Guy’s suggestion in 69, here is the correlation between number of GBs and babip:

Overall a negative relationship with the three outlier peaks evident.  For the most part it looks like fielders position themselves at the high hit locations, but batters have a slight ability to hit into the three gaps.
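Charts like these boil down to grouping ground balls into one-degree slices between the foul lines and computing BABIP in each slice. A minimal Python sketch follows. It assumes each ball already has a spray angle in degrees (-45 at the third base line, +45 at the first base line, as described above) and a hit/out flag. The function name is hypothetical, and in practice you would run it separately for RHH and LHH.

```python
import math
from collections import Counter


def bin_by_angle(balls, width=1, lo=-45, hi=45):
    """Group (angle, is_hit) pairs into slices of `width` degrees.

    Returns {slice_start: (balls, babip)}; angles outside [lo, hi] (foul
    territory or bad coordinates) are dropped, and a ball exactly on the
    first base line goes into the last slice.
    """
    balls_in = Counter()
    hits = Counter()
    for angle, is_hit in balls:
        if not lo <= angle <= hi:
            continue
        start = lo + width * math.floor((angle - lo) / width)
        start = min(start, hi - width)
        balls_in[start] += 1
        hits[start] += bool(is_hit)
    return {s: (balls_in[s], hits[s] / balls_in[s]) for s in sorted(balls_in)}


grounders = [(-44.5, True), (-44.2, False), (0.3, False), (45.0, True), (50.0, True)]
for start, (n, babip) in bin_by_angle(grounders).items():
    print(start, n, round(babip, 3))
```

With the defaults this produces 90 one-degree slices, from -45 up to 44.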


#73    Mike Fast      (see all posts) 2009/05/30 (Sat) @ 18:34

Dave, if you exclude line drives from your analysis, it biases the results toward being closer to the infielders since balls scored as line drives tend to go between the infielders.  I’m not sure if this is as a result of scoring bias or is a documentation of an actual difference in where line drives go, or some of both, but it will have a significant effect on your results.


#74    Dave Allen      (see all posts) 2009/05/30 (Sat) @ 18:57

Good point, Mike, I totally forgot you had made that point in 42.  Here are my graphs reproduced with GBs and LDs.

Yeah, you are right, the BIP peaks between the fielders grow when you include LDs.  So those LDs are coming in mostly in the angles between fielders.  The relationship between number of LDs + GBs and BABIP is still negative, but the ability to ‘hit it where they ain’t’ looks much more pronounced.


#75    Guy      (see all posts) 2009/05/30 (Sat) @ 19:54

Question for those of you who work with this data:  do you believe the peak at zero degrees is real?  Or is that balls hit up the middle which are not fielded tend to be scored at zero, while balls reached by the SS or 2B get scored in an adjoining vector? 

It seems to me that if RH hitters “naturally” hit toward the SS, but are trying instead to hit it to the gap—and have some ability to do so—then we would see at least as many balls at -10 as we do at zero (probably more, given the gravitational pull toward -20).  So I wonder if this could be a scoring artifact.


#76    Dave Allen      (see all posts) 2009/05/30 (Sat) @ 20:36

Guy, it looks to me like there is an artifact in the data.  If you break up the slices even smaller (the ones I posted above used 90 slices, each one degree wide), you see way too many balls in play right at 0.
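A quick way to quantify a pile-up like this is to compare the count in one slice with the average of its neighbors. The Python sketch below is a hypothetical heuristic, not something from the thread. The idea is that a genuine hitting tendency should rise and fall over several degrees, while a coding artifact tends to show up as a single slice far above the slices on either side. The counts are invented for illustration.

```python
def spike_ratio(counts_by_slice, center, radius=3):
    """Count in slice `center` divided by the mean count of the slices within
    `radius` degrees on either side (the center slice itself excluded)."""
    neighbors = [
        counts_by_slice.get(center + d, 0)
        for d in range(-radius, radius + 1)
        if d != 0
    ]
    mean = sum(neighbors) / len(neighbors)
    if mean == 0:
        return float("inf")
    return counts_by_slice.get(center, 0) / mean


# Hypothetical one-degree slices around straightaway center.
counts = {-3: 50, -2: 52, -1: 48, 0: 160, 1: 49, 2: 51, 3: 50}
print(spike_ratio(counts, 0))  # 3.2: the 0-degree slice holds over three times its neighbors' average
```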


#77    Guy      (see all posts) 2009/05/30 (Sat) @ 21:02

Dave:  Can you remove GBs fielded by the pitcher or catcher?  In addition to the possible bias I mentioned above, I could imagine that many weakly hit grounders might get coded as zero.


#78    Dave Allen      (see all posts) 2009/05/30 (Sat) @ 21:23

Guy, when I scraped the gameday data I didn’t get the ‘fielded by’ part.  I need to go back and rescrape it and pick that info up this time.  Maybe Mike or Max could look into that.


#79    MGL      (see all posts) 2009/05/30 (Sat) @ 22:24

I think there are a plethora of potential problems and biases with this kind of data.  If a line drive goes through the IF, even if it is hit right where the SS or 2B typically play, I think it is going to tend to get coded at a vector that is NOT where the SS and 2B typically play.  And if it gets caught when the 2B and SS are not where they typically play, it will tend to get coded where they do typically play.

So on and so forth.

In addition, Dave, what line drives are you including in your second graphs?  If you are including outfield line drives that have no chance to be caught by the IF, rather than the hitters hitting them “where they ain’t” in terms of the infielders, the outfielders may be positioning themselves exactly where fly balls and line drives are being hit and the batters cannot do anything about it.

Again, lots of problems with this type of analysis when you have stringers manually coding each batted ball as opposed to a computer working with video and automated digital coding of batted ball locations and locations of fielders.

Also you really need to split these graphs up by handedness, otherwise you end up with curves that make little sense to the naked eye.


#80    MGL      (see all posts) 2009/05/30 (Sat) @ 22:37

Another thing that strikes me as kind of odd is the notion that a peak in the middle for ground balls and line drives suggests that batters are hitting ‘em where they ain’t.  Since I don’t think that batters can control to a large degree whether they hit ground balls or fly balls (otherwise we would see more sac flies and fewer GDPs), if they are intentionally hitting it up the middle, they are hitting it right to the CF’er a good proportion of the time.  If we saw fly balls in the gaps and ground balls up the middle, that would be a different story.  Do we? 

My main criticism of trying to find evidence of batters hitting ‘em where they ain’t is simply that the entire field is pretty much covered, as well it should be.  The infielders and outfielders fill in each other’s gaps.  The notion that batters can, to any significant degree, hit it between fielders seems preposterous to me.

I played a lot of high-level baseball and never once, not even once, did I ever think about trying to hit a ball where the fielders were not playing. It never even crossed my mind.  Never.  I would try and hit the ball hard as often as I could, period.  And in order to do that, I would try and go with the pitch as much as possible (hit the outside pitch to RF and the inside pitch to LF).  I also NEVER heard any other player EVER talk about trying to hit the ball to a place on the field where fielders were not playing, other than some players in some circumstances tried to hit the ball up the middle.  Even then, it was not in order to scoot a ground ball up the middle.  No one ever tries to hit a ground ball. It was to square the ball up. Often the best chance of squaring the ball is to try to hit it up the middle.  That way if you are a little late or early with your swing, you can still hit it hard and fair.  That is the main reason for trying to hit it up the middle.  Not because there are no infielders up the middle.  After all, the CF’er is playing pretty much up the middle and you are NOT trying to hit a ground ball, unless perhaps you are trying to move a runner from second to third with no outs, and even then you are still trying to get a line drive base hit or even a deep fly ball to RF if you get a pitch high and away (for a RHB).

