THE BOOK--Playing The Percentages In Baseball

Wednesday, February 09, 2011

HITf/x

By Tangotiger, 10:52 AM

A precursor to HITf/x: using Greg’s HitTracker to try to estimate the probability of a batted ball being a HR.  Love this stuff.


#1          (see all posts) 2011/02/09 (Wed) @ 13:07

Tom:  Since you say that you love this stuff, perhaps you could tell us what exactly about it you love, as a basis for further discussion.


#2    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 13:35

I love the idea of trying to find the characteristics of a moving ball such that we can estimate its landing or contact point.

This separates the ball from the opponent and its outcome.


#3    Alan Nathan      (see all posts) 2011/02/09 (Wed) @ 14:08

OK, here you go:

This article makes no sense to me and I don’t really see the point the author is trying to make.  Home run probability depends on fly ball distance, fence distance, and fence height.  Nothing else!  If the ball makes it over the fence, it is a home run.  It’s just that simple.  The real focus of an analysis should be on the question: What determines the fly ball distance?

So, what does fly ball distance depend on?  It will depend generically on two things:  the batted ball parameters and atmospheric conditions.  The latter are quite interesting to investigate (wind, temperature, elevation, etc.), and I did such an analysis back in 2009 using combined hitf/x and hittracker data.  See
http://webusers.npl.illinois.edu/~a-nathan/pob/carry/carry.html.  The purpose of that article was to investigate how a fly ball carries differently in different ball parks.  Equally interesting to investigate are the batted ball parameters, including batted ball speed, vertical launch angle, and spin, and how these affect fly ball distance.  Using hitf/x and hittracker data (albeit only for home runs), there are many questions one could investigate.  For example, one could study the dependence of fly ball distance on vertical launch angle or batted ball speed; the dependence of hang time on vertical launch angle; the dependence of spin on the various batted ball parameters; the dependence of hang time on spin; etc.  I have begun to investigate these things and someday I will write it up.

As an example of the “wrongheadedness” of the article under discussion, the author looks at a scatter plot of distance vs. vertical launch angle.  I would argue that it makes no sense whatsoever to do that, since these are not independent quantities:  distance depends on vertical launch angle.  Doing a multi-parameter regression analysis on quantities that are not independent also makes no sense.

I agree with Tango’s sentiments (#2).  I just don’t think the article in question even begins to address the interesting questions.


#4    Colin Wyers      (see all posts) 2011/02/09 (Wed) @ 14:44

Not to pile on, but - the model for “is a fly ball a home run” is simple, easy to understand, and PROVABLY true. It’s not an estimate, it’s a measurement of reality.

Given that, why on earth would you build a SEPARATE “is a fly ball a home run” model using a regression that is NONE of those things? What could it possibly tell you?


#5    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 14:57

The way I was reading the article (and perhaps I was reading INTO the article) is one from a true talent perspective.

Let me back up, and just tell you what I would do. Suppose you have 1000 batted balls from Manny Ramirez.  You would look at the speed off the bat of those flyballs, and the spray angle, and vertical launch angle.  And if you can, the spin angle and speed.  (Am I missing anything?)

If you had those parameters for each ball, and you have a layout for each park, you would introduce random weather conditions (specific to each park and date), and come up with a “true talent” HR per contacted batted ball rate.

Suppose you don’t have all the parameters linked on a per-ball basis, but you have them as if they are all independent.  You have a list of 1000 launch angles, you have a list of 1000 spray angles, etc, without a notation of which is linked to the 1000 batted balls.  Can you still find this information useful?

The way I was reading the article, the author was treating these parameters independently, and then trying to come up with some sort of equation to see what each player’s talent level is.

***

Setting aside whether I read the article well or not, or whether the author wrote it well or not, my question is:

How far can we get by delinking each of the parameters from the batted ball, and treating them as if they are independent?  How much uncertainty does this give us?


#6    Alan Nathan      (see all posts) 2011/02/09 (Wed) @ 15:16

Here is something that one can do.  In the three-parameter space of batted ball speed, vertical launch angle, and fan angle (assuming we have all those data from hitf/x), we can examine the probability of hitting a home run in each park.  I leave spin out of this since we have no measure of it anyway.  Then for any given batter in any given park, we can examine the batted ball characteristics (same three parameters) and come up with a metric for “home run prowess” for that batter in that park.  The metric might be the distance in three-parameter space from the batter’s average quantities to the peak of the HR probability for the park.  One can average over all parks to establish prowess for the batter, independent of park.

A simpler thing to do is to repeat the above procedure but not separate things by parks.  Of course, you lose information that way but it might be good enough.

I haven’t given this much thought beyond what I have written here.  But it is a line of possible research (if we had hitf/x data to do it!).
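
A minimal Python sketch of the metric described in this comment, under stated assumptions. The per-park peak values, the batter averages, the parameter scales, and every name here (`SCALES`, `prowess_distance`, `park_independent_prowess`, the park labels) are hypothetical and not taken from hitf/x. Each parameter is divided by an assumed scale so that mph and degrees can be combined into one distance; a smaller distance would mean the batter's average batted ball sits closer to the park's HR sweet spot.

```python
from math import sqrt
from statistics import mean

# Assumed scales for making mph and degrees comparable (hypothetical values).
SCALES = {"speed_mph": 10.0, "launch_deg": 10.0, "spray_deg": 15.0}

def prowess_distance(batter_mean, park_peak, scales=SCALES):
    """Scaled Euclidean distance from a batter's average batted ball
    to the peak of a park's HR probability in three-parameter space."""
    return sqrt(sum(((batter_mean[k] - park_peak[k]) / scales[k]) ** 2
                    for k in scales))

def park_independent_prowess(batter_mean, park_peaks):
    """Average the per-park distance over all parks."""
    return mean(prowess_distance(batter_mean, peak)
                for peak in park_peaks.values())

# Made-up numbers purely for illustration.
batter = {"speed_mph": 90.0, "launch_deg": 20.0, "spray_deg": -15.0}
peaks = {
    "park_a": {"speed_mph": 100.0, "launch_deg": 30.0, "spray_deg": -30.0},
    "park_b": {"speed_mph": 100.0, "launch_deg": 30.0, "spray_deg": -15.0},
}

per_park = {name: prowess_distance(batter, peak) for name, peak in peaks.items()}
overall = park_independent_prowess(batter, peaks)
```

Dropping the park separation, as suggested above, would amount to passing a single league-wide peak instead of one peak per park.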


#7    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 15:34

Alan: that’s exactly what I was getting at.

MGL used to do this many years back using batted ball distances and spray angles as tracked by STATS to come up with “virtual HR”.  Basically, he’d overlay the hit location of batted balls in each park to see how many would have cleared the fence.

So, it’s just an extension of that: rather than relying on distance and spray, we’d use batted ball speed + launch and spray.
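
The "virtual HR" overlay can be sketched in a few lines of Python. This is only an illustration of the idea as described here, not MGL's actual method: the fence outlines, the batted balls, and the names `BattedBall`, `fence_distance`, and `virtual_hr` are all made up, the fence is interpolated linearly between a few points, and fence height is ignored.

```python
from dataclasses import dataclass

@dataclass
class BattedBall:
    distance_ft: float  # landing distance from home plate
    spray_deg: float    # -45 = left-field line, +45 = right-field line

def fence_distance(park, spray_deg):
    """Linearly interpolate the fence distance at a spray angle from a
    list of (spray_deg, distance_ft) points describing the park."""
    pts = sorted(park)
    for (a0, d0), (a1, d1) in zip(pts, pts[1:]):
        if a0 <= spray_deg <= a1:
            t = (spray_deg - a0) / (a1 - a0)
            return d0 + t * (d1 - d0)
    raise ValueError("spray angle outside fair territory")

def virtual_hr(balls, park):
    """Count batted balls that land beyond the fence in the given park."""
    return sum(1 for b in balls
               if b.distance_ft > fence_distance(park, b.spray_deg))

# Hypothetical parks and batted balls.
small_park = [(-45, 330), (0, 400), (45, 330)]
deep_park = [(-45, 355), (0, 420), (45, 345)]
balls = [BattedBall(350, -40), BattedBall(390, 0),
         BattedBall(340, 40), BattedBall(250, 10)]

hr_by_park = {"small": virtual_hr(balls, small_park),
              "deep": virtual_hr(balls, deep_park)}
```

The extension discussed above would replace the tracked landing distance with one computed from batted ball speed and launch angle.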

***

I’m not even sure that we’d need to treat them as a triplet data point.  Perhaps we can separate (to some extent) the spray from the distance. 

For example, if you have one hitter that pulls FB at 350 feet, and opposite fields at 250 feet, and some other hitter pulls at 340 feet and opposite fields at 330 feet, isn’t it more likely that the second guy may actually pull the ball better than the first guy?  He just chooses to spray more.

So, that’s the question I have in terms of the validity of treating the various parameters independently.  I’m not saying I would, but I’d have to check to see if it’s value-added.


#8          (see all posts) 2011/02/09 (Wed) @ 15:57

Since we have equations of motion, why in the world would you want to use linear regression to come up with a new, but stupider, version of the equations of motion?

If you’re missing a piece of data, you can estimate it, and still apply the equations of motion, and you can use them to determine the error in your estimate.

There’s very little reason you would want to do what this author did or what Tango is suggesting, as far as I can tell.
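
For readers who want to see what "apply the equations of motion and determine the error in your estimate" might look like, here is a deliberately crude Python sketch. It is not the validated aerodynamic model referred to in this thread: it uses a constant assumed drag coefficient, ignores spin, lift, and wind, and all names and constants (`K_DRAG`, `carry_distance`, `carry_spread`) are assumptions made for illustration. The second function shows how an uncertain launch angle turns into a spread of carry distances.

```python
from math import cos, sin, radians, hypot

G = 9.81        # gravitational acceleration, m/s^2
K_DRAG = 0.006  # 1/m; assumed value of 0.5 * rho * Cd * A / m for a baseball

def carry_distance(speed_ms, launch_deg, k_drag=K_DRAG, dt=0.001, y0=1.0):
    """Horizontal distance (m) travelled before the ball reaches the ground,
    using simple Euler integration with quadratic drag and no spin."""
    x, y = 0.0, y0
    vx = speed_ms * cos(radians(launch_deg))
    vy = speed_ms * sin(radians(launch_deg))
    while y > 0.0:
        v = hypot(vx, vy)
        vx += -k_drag * v * vx * dt
        vy += (-G - k_drag * v * vy) * dt
        x += vx * dt
        y += vy * dt
    return x

def carry_spread(speed_ms, launch_lo, launch_hi, steps=5):
    """Min and max carry over a grid of launch angles, for the case where
    the launch angle is only known to lie within an interval."""
    angles = [launch_lo + i * (launch_hi - launch_lo) / (steps - 1)
              for i in range(steps)]
    distances = [carry_distance(speed_ms, a) for a in angles]
    return min(distances), max(distances)

# A 45 m/s (roughly 100 mph) ball with launch angle known only to 25-35 degrees.
shortest, longest = carry_spread(45.0, 25.0, 35.0)
```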


#9    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 16:05

What we are trying to do is infer the true talent of the player that caused that motion.  We are not necessarily trying to figure out the motion equation.

As I said, if you have a player that pulls balls at 340 feet, and hits opposite field at 330 feet, while someone else hits at 350/250, and we are trying to figure out a player’s HR talent level, we don’t need to presume that the batter at 340/330 is fixed at that.  What if he can change his split to 350/310, or 360/290, or 370/270? 

That is, he changes his approach so that he gives up 20 feet opposite field to gain 10 feet of pull.  Suddenly, we are comparing a 370/270 hitter to a 350/250 hitter, and the former is naturally the much better HR hitter.
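
The trade-off in this example is simple arithmetic, sketched here in Python. The exchange rate (20 feet of opposite-field distance for 10 feet of pull distance) is the one stipulated in the comment, not a measured quantity, and the function name `shift_profile` is made up.

```python
def shift_profile(pull_ft, oppo_ft, steps, pull_gain=10, oppo_cost=20):
    """Apply `steps` trades of opposite-field distance for pull distance."""
    return pull_ft + steps * pull_gain, oppo_ft - steps * oppo_cost

# Profiles reachable from a 340/330 hitter under the stipulated exchange rate.
profiles = [shift_profile(340, 330, k) for k in range(4)]
print(profiles)  # [(340, 330), (350, 310), (360, 290), (370, 270)]
```

At one trade the hitter matches the other hitter's 350 feet of pull while still reaching 310 feet the other way, which is the sense in which his observed 340/330 may understate his pull power.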


#10          (see all posts) 2011/02/09 (Wed) @ 16:10

But you are trying to answer a question that the equations of motion are designed to answer correctly.

You are just coming up with a new and error-prone way of answering the question, when there is already a much better way of answering the same question.


#11    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 16:19

I’m also asking if there’s any value-added to treating the parameters independently.

For example, in Alan’s example, he is taking it as a given that he can transplant a particular batted ball into each of the 30 parks for the purposes of determining if a ball would clear the fence.  But, what if the batter swung a certain way because he was at that park?

Similarly, Alan is asserting (or seems to assert) that the three parameters need to be treated as a single data point.  And so, if he has 1000 batted balls, that’s what he’s going with.  But, what if each of the three parameters can be treated independently?  Now, instead of having 1000 batted balls, we have 1,000,000,000 batted balls for that batter (1000 x 1000 x 1000 combinations).

We can then put those 1 billion batted balls into Alan’s motion equation, overlay that on each ballpark, and come out with a HR rate.

I don’t know that you can do that.  I am asking if we can, and if this method is value-added.


#12          (see all posts) 2011/02/09 (Wed) @ 16:38

You CAN do all sorts of things.  There’s no fence around sabermetrics to keep people from wandering off into stupidity.  Thus, you see a new version of a run estimator every so often that’s basically a reconstruction of Total Average.  Do they have value?  Well, compared to knowing nothing about baseball, yes.  But compared to knowing how to use BaseRuns, no, they have negative value.

What you’re proposing here is trying to invent Total Average for fly balls when we already have BaseRuns.

Be my guest, and it probably won’t be worthless, it will just be worth less than what we already have.


#13    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 16:45

I can explain why the others are inferior to BaseRuns, which is why I can discourage others from trying to do something inferior.

What would be helpful is to know the reason that decoupling (detripleting?) the three parameters of a batted ball by treating them independently is necessarily inferior to keeping it as a triplet of 1000 data points.


#14    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 16:53

Or, more importantly, would doing both (keeping as a triplet, and detripleting) be better than keeping as a triplet?


#15          (see all posts) 2011/02/09 (Wed) @ 16:58

I’ve tried to explain it already, but for whatever reason, my explanation is not connecting with you.

We have a working model already of how baseballs move through the air.  We can change the input parameters to this model and see how the outputs change.  This model has been experimentally verified and is sound.

What you are doing by “decoupling” is simply coming up with a poor man’s model of how baseballs move through the air.  It is a less accurate model than the current model that we have.

Having 1 billion inaccurate answers is inferior to having 1000 correct answers.

However, you can change whatever parameters you want within the current aerodynamic model.

You don’t need to develop a new poor man’s aerodynamic model in order to be able to vary the input parameters to the aerodynamic model.

So the model does not constrain you there.  What constrains you is effectiveness or common sense, or whatever you want to call it.

The parameters are not independent.  If you try to treat them like they are, you will get worse results than if you recognize how they are dependent on one another.

If you don’t recognize that the rate of doubles and home runs is not independent, your regression will give you bad results.  Moreover, you have no way to tell that you’ve gone wrong. Similarly, if you abandon all knowledge of aerodynamics, your regression model will give you bad results, and you will have no way to tell that you’ve gone wrong.


#16    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 17:08

When I said this:

We can then put those 1 billion batted balls into Alan’s motion equation, overlay that on each ballpark, and come out with a HR rate.

I am accepting whatever aerodynamics equation Alan has.  I am not ignoring it, nor trying to recreate a motion equation.  I am not trying to create a poor man’s anything.

Now, what you said here is what I am after:

However, you can change whatever parameters you want within the current aerodynamic model.

So, given the model, we need to feed it a triplet of data, using Alan’s parameters: speed, launch angle, spray angle.

Where do we get that data?  From the 1000 batted balls hit by Manny Ramirez.  We have the following:
80mph, +20 launch, -17 spray
82mph, +15 launch, +12 spray
75mph, -2 launch, 0 spray

And so on.

The standard process would be to keep those triplets intact, and put those 1000 values into Alan’s equation, and we’ll come out with a hit location, which we’d overlay on a park.

What I am suggesting is to do the following:
80mph, +20 launch, -17 spray
80mph, +20 launch, +12 spray
80mph, +20 launch, 0 spray
...
...
And so on

The three data points of triplets become 27 data points of three independent parameters.

So, that’s what I’m talking about.  That by taking just three batted balls, we can tell so much more about the player.

The question on the table is: can we treat these three parameters as if they were obtained independently?
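
The decoupling step can be written out directly in Python. This sketch only enumerates the combinations; it says nothing about whether they are physically attainable, which is the open question. The three triplets are the ones listed above, and the variable names are made up.

```python
from itertools import product

# (speed mph, launch deg, spray deg) for the three observed batted balls.
observed = [
    (80, 20, -17),
    (82, 15, 12),
    (75, -2, 0),
]

# Split the triplets into three independent lists of values.
speeds, launches, sprays = zip(*observed)

# Recombine every speed with every launch angle and every spray angle.
decoupled = list(product(speeds, launches, sprays))
never_observed = [t for t in decoupled if t not in observed]

print(len(decoupled))       # 27
print(len(never_observed))  # 24
```

With n distinct batted balls the decoupled set has n cubed members, and all but n of them are combinations the hitter was never actually seen to produce.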


#17    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 17:16

Let me take an absurd example:

1. Hitter1: every batted ball is either hit at a speed of 100mph with a vertical launch of -10 degrees (hard ground ball), or 60mph with a vertical launch of +15 degrees.

2. Hitter2: every batted ball is hit at 70mph with a vertical launch of +15 degrees.

Now, is it necessary to think that the only way Hitter1 can hit the ball hard is when he hits it into the ground?  Or, is it possible that we can say that his real talent is that he can hit 100mph balls, and 70% would be into the ground and 30% would be at +15 degrees?

That is, even though we’ve never observed him hit that combination of 100mph/+15degrees, we think that his talent level IS to be able to do that on occasion.  And we infer that based on the breadth of the observations we have by decoupling the parameters.
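
To make the absurd example concrete, the Python sketch below compares Hitter1's observed joint distribution with the one implied by treating speed and launch angle as independent. The 70/30 split of his batted balls is an assumption made here for illustration (the comment does not give his observed mix), and the variable names are made up.

```python
from collections import Counter
from fractions import Fraction

# Assumed sample: 70 hard grounders and 30 soft balls at +15 degrees.
observed = [(100, -10)] * 70 + [(60, 15)] * 30
n = len(observed)

# Observed joint distribution of (speed mph, launch deg).
joint = {k: Fraction(c, n) for k, c in Counter(observed).items()}

# Marginal distributions of each parameter on its own.
speed_marginal = {s: Fraction(c, n)
                  for s, c in Counter(s for s, _ in observed).items()}
launch_marginal = {a: Fraction(c, n)
                   for a, c in Counter(a for _, a in observed).items()}

# Joint distribution implied by full independence of the two parameters.
independent = {(s, a): ps * pa
               for s, ps in speed_marginal.items()
               for a, pa in launch_marginal.items()}

print(joint.get((100, 15), Fraction(0)))  # 0
print(independent[(100, 15)])             # 21/100
```

Full decoupling would credit him with a 100mph/+15 degree ball 21% of the time even though he has never hit one, so the disagreement in the thread is over how much of that 21% is real talent and how much is an artifact of ignoring the dependence.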


#18          (see all posts) 2011/02/09 (Wed) @ 17:17

Okay, so you’re asking whether we can ignore that there is a physical reality to how bat-ball collisions happen. 

Can we? Yes. Should we? No.  Granted, our model of the ball-bat collision is less well-developed than our model of flyball aerodynamics.  But it still informs us.

You basically seem to be asking here whether ignoring our knowledge of physics is better than not ignoring it.  I don’t see how the answer to that question can possibly be yes.

The batted ball parameters are not independent of one another.


#19          (see all posts) 2011/02/09 (Wed) @ 17:29

What you’re basically saying here is that if we have a hitter who hits a lot of long fly balls to the pull field and a lot of weak fly balls to the opposite field, that we should infer that we would also expect him to hit a lot of weak fly balls to the pull field and a lot of long fly balls to the opposite field.  But baseball doesn’t work that way.  There’s a physical reality that imposes itself and produces two of those results and does not produce the other two.  The parameters are not, in fact, independent.


#20    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 17:31

I still don’t think Mike/18 is getting at what I’m trying to say.

Tango/17 (which cross-posted with Mike/18) probably does the best job of illustrating what I want to say.

***

Let me try another example.  Let’s say that you throw a baseball at either 70mph, 80mph, or 90mph.  Let’s say you can put a break on a ball of say 2 inches, 8 inches, or 14 inches.

However, we can stipulate that if you throw a pitch at your top speed of 90mph, you will not be able to choose a break of 2, 8, or 14 inches.  The only possible answer is 2 inches.

That’s because the break you put on a ball is limited by the speed that you can throw the ball.

I accept this as fact because in my experience, it strains the arm too much to try to be able to throw both a curve ball and a fastball at the same time.  Even if my n=1 wasn’t good enough, I accept it as fact because no MLB pitcher puts as much break on his fastball as he does his curveball.

So, that’s why we CANNOT decouple the values we observe from speed + break.  They are linked.

My question is on the batted ball side: we have batted ball speed, we have launch angle, and we have spray angle.

Now, I can accept that there is some relationship there, that in order for me as a RHH to hit a ball opposite field at +45 spray (down the 1B line), that I won’t be able to also launch it at +15 degrees and 90mph.

But that kind of relationship is nowhere near that of a pitched ball, is it?  While I would think that decoupling the pitched ball into independent speed and break numbers would get you nowhere (no value-added to simply keeping as a couplet), wouldn’t decoupling the batted ball into launch, spray, speed get you somewhere (value-added in addition to keeping as a triplet)?


#21    Peter Jensen      (see all posts) 2011/02/09 (Wed) @ 17:32

Tango - I described how we could create batter and pitcher metrics using Hit f/x data in my articles on Skill Based Metrics at THT.

http://www.hardballtimes.com/main/article/using-hitf-x-to-measure-skill/

I continued with a presentation on a skill based fielding metric using Hit F/x at the 2009 Sportvision Summit.  Until Hit F/x data becomes available, further discussion is pretty much a moot point.

You can’t decouple the 3 parameters because they are not independent physically.  A batter’s bat speed does not reach his maximum until his swing arc is past the point where he is most likely to hit a ball to the opposite field. Similarly, the plane of the swing arc increases toward the end of the swing arc so the hardest hit pulled balls would have a tendency to have a greater vertical angle off the bat than balls hit to the opposite field.  Batters can overcome these tendencies somewhat by altering their stance and swing, but almost always with an overall decrease of maximum bat speed.

Third, just because we are not currently measuring the spin of hit balls does not mean that it is not an important factor in the distance that a ball travels.  Leaving it out of any projection model for HRs seriously compromises that model’s utility.


#22    Alan Nathan      (see all posts) 2011/02/09 (Wed) @ 17:33

(just returned from my WWI class, which explains my absence from the dialogue between Mike and Tom):

Except for the issue of the spin of a batted ball, if we know the hitf/x parameters (i.e., the initial speed and angles of the ball), then we know everything to predict the rest of the trajectory, in particular the landing point.  We can predict whether or not the ball will be a home run.  For the sake of argument, let us suppose that we can do that.  Given that that is the case, is there any value to Tango’s proposal?  I would argue that the answer is “yes” (and in doing so, I would have to disagree with Mike).  Let me explain.

The value lies in park-to-park differences between fence distances and heights.  Aside from being able to hit the ball a long way, is there another skill involved in playing to the particular ball park?  Remember the old saying about Fenway:  The Green Monster giveth and taketh away.  Hard hit balls on a relatively low trajectory that might be homers elsewhere turn into long singles in Fenway.  Weak popups that are routine outs elsewhere become HR at Fenway.  So, the question is, do skilled HR hitters know how to take advantage of the park they are in to maximize the number of home runs, whether it be the Monster in Fenway or the short porch in RF in NY or whatever?

If you ignore park differences and just look at global HR production, I would say the answer is easy and already well known:  The batter wants to hit the ball as hard as possible (high batted ball speed) with a vertical launch angle roughly in the range 25-35 degrees.  So, if you want to evaluate HR prowess, you simply need to look at mean batted ball speed and launch angle (using some appropriate algorithm to combine the two pieces of information).
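
The comment leaves the "appropriate algorithm" unspecified. Purely as one hypothetical way to combine the two pieces of information, the Python sketch below reports the share of a batter's balls launched in the 25-35 degree window and the mean speed of those balls. The function name `hr_profile`, the sample data, and the choice of summary are assumptions made here, not Alan's method.

```python
def hr_profile(balls, lo=25.0, hi=35.0):
    """Return (share of balls launched between lo and hi degrees,
    mean speed of those balls or None if there are none).
    `balls` is a list of (speed_mph, launch_deg) pairs."""
    in_window = [speed for speed, launch in balls if lo <= launch <= hi]
    share = len(in_window) / len(balls)
    mean_speed = sum(in_window) / len(in_window) if in_window else None
    return share, mean_speed

# Made-up batted balls: (speed mph, vertical launch angle in degrees).
sample = [(100, 30), (90, 28), (95, 10), (80, -5)]
share, mean_speed = hr_profile(sample)
print(share, mean_speed)  # 0.5 95.0
```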


#23    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 17:36

that we should infer that we would also expect him to hit a lot of weak fly balls to the pull field and a lot of long fly balls to the opposite field.  But baseball doesn’t work that way

Alright, we’re getting closer to what I’m talking about!

I’m not saying it’s interchangeable.  At the beginning, I said a batter that pulls 340 and opposites 330 could be the kind of hitter that pulls 350 and opposites 310, or pulls 360 and opposites 290.  I am NOT suggesting that if you pull 360 and opposite 290 that you can also pull 290 and opposite 360.

I am also asking the following:
If you have a batter that pulls 340 and opposites 330 given n=100, isn’t it more likely that he can pull farther than someone who pulls 350 and opposites 250, given n=100?

That is, why must we keep the observations we have of the same hitter separate between pull and opposite?

Couldn’t we say that the profile of 340/330 will lead us to believe that this player’s true talent is in fact 360/290, and therefore a much better HR hitter than someone who is 350/250?


#24          (see all posts) 2011/02/09 (Wed) @ 17:37

Peter is correct.  In addition, the vertical and horizontal launch angles are dependent on one another.  Matt Lentzner does a good job of explaining why that is here:
http://www.hardballtimes.com/main/article/why-flies-go-one-way-and-grounders-go-the-other/


#25    Greg Rybarczyk      (see all posts) 2011/02/09 (Wed) @ 17:38

To use a home run example, I think what Tom is getting at is if a hitter knocked 3 home runs into section 110 of the outfield seats, and 2 more homers into section 112, but none into section 111, then can we create a “smoothed-out” distribution of batted balls for that hitter that includes some balls landing in section 111.  This “smoothing” would be done in some mathematically rigorous and valid way, and once you were done, you’d feed this smoothed out distribution back into the model to see how the player looks.  The model would, of course, still be the same aerodynamic one you used before.
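
One very simple form such smoothing could take is sketched below in Python. It spreads each section's home run count over its neighbors with a fixed three-point kernel; the kernel weights, the function name `smooth_counts`, and the idea of smoothing by section number are illustrative assumptions, not a claim about what a rigorous method would be.

```python
def smooth_counts(counts, kernel=(0.25, 0.5, 0.25)):
    """Spread each section's count over adjacent sections using the kernel.
    The kernel weights sum to 1, so the total count is preserved."""
    half = len(kernel) // 2
    smoothed = {}
    for section, n in counts.items():
        for offset, weight in enumerate(kernel, start=-half):
            key = section + offset
            smoothed[key] = smoothed.get(key, 0.0) + weight * n
    return smoothed

# Observed home runs by outfield section, as in the example above.
observed = {110: 3, 111: 0, 112: 2}
smoothed = smooth_counts(observed)
print(smoothed[111])  # 1.25
```

The smoothed distribution, rather than the raw five home runs, is what would then be fed back through the aerodynamic model.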

I don’t think Tom is proposing disregarding, or dismantling the aerodynamic model, he’s talking about decoupling the constituent launch parameters, essentially saying “If a guy can pull a ball straight down the LF line, and a guy can hit a ball 115 mph, and a guy can hit a ball at a 19 degree elevation angle, all on separate batted balls, can I take that to mean that he’s capable of doing them all on one hit?”

Tom, am I on the right track here?


#26          (see all posts) 2011/02/09 (Wed) @ 17:45

If the question is, does pull-field power tell us anything about opposite-field power, and vice versa, I don’t know the answer to that.  That’s a worthwhile question. 

I don’t see that the correct way to answer it, though, is decoupling batted ball parameters which are physically dependent on each other.

It would probably make sense to see, if you had the batted ball data set available to do so, how well pull-field power and opposite-field power correlated with themselves and each other from year to year (or in split-half samples, or whatever).
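
A minimal Python sketch of the split-half check suggested here, assuming a per-batter list of pulled fly ball distances were available. The data are made up, the odd/even split is just one way to form halves, and the names `pearson`, `split_half`, and `split_half_correlation` are hypothetical. The same `pearson` function could be applied to pull versus opposite-field means to look at the cross-correlation.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def split_half(distances):
    """Mean distance over even-indexed and odd-indexed batted balls."""
    return mean(distances[0::2]), mean(distances[1::2])

def split_half_correlation(by_batter):
    """Correlate each batter's first-half mean with his second-half mean."""
    first, second = zip(*(split_half(d) for d in by_batter.values()))
    return pearson(first, second)

# Made-up pulled fly ball distances (feet) for three batters.
pull_distances = {
    "A": [350, 346, 354, 352],
    "B": [300, 310, 304, 298],
    "C": [330, 325, 335, 333],
}
r_pull = split_half_correlation(pull_distances)
```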

In the same way, curveball movement and fastball movement are not independent of each other.  If you research it, you will find that arm angle has an effect on both of them.  But I don’t think that you’d arrive at that conclusion by decoupling speed and movement parameters and looking at them as if they were independent data points to be mixed and matched with each other.


#27    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 17:45

Peter:

Third, just because we are not currently measuring the spin of hit balls does not mean that it is not an important factor in the distance that a ball travels.  Leaving it out of any projection model for HRs seriously compromises that model’s utility.

I had it!  For purposes of Alan’s discussion, he removed it, and I’m keeping it removed for that purpose only.

As for the rest of your post, I agree that they are not totally independent. I’m asking how independently we must treat them.


#28          (see all posts) 2011/02/09 (Wed) @ 17:49

Greg/25, I originally took from Tom’s support for Kevin’s article that he was in support of what Kevin did, which was to disregard the aerodynamic model.  It’s clear to me now that Tom did not mean that.

However, Tom is proposing disregarding the realities of the bat-ball swing and collision model.

It’s not going to be any more fruitful.

That’s not to say that the questions he’s trying to answer aren’t worth answering, just that the method of decoupling physically dependent parameters is not the right way to get there.


#29    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 17:50

So, the question is, do skilled HR hitters know how to take advantage of the park they are in to maximize the number of home runs, whether it be the Monster in Fenway or the short porch in RF in NY or whatever?

Right, that’s what I’m getting at.  But more! 

We shouldn’t treat the OBSERVED values as if they are the only values we have.  Couldn’t we, for example, be able to estimate which batter would be able to better take advantage of Fenway?  And couldn’t part of that estimate be based on the individual parameter information we have?

Take for example how someone hits against LHP and RHP.  We accept that the batter will hit differently.  But, surely, how he performs against LHP will be a strong indicator as to how he hits against RHP, regardless of what we happen to observe.

Similarly, if we see someone hit the ball off the bat at 100mph a lot, but almost all of the time, it’s a groundball, can’t we infer that this guy was simply getting unlucky to some extent?  That knowing this guy hits 100mph groundballs will tell us something about how he can hit flyballs, even if all the flyballs we’ve observed from him are at 60mph.


#30    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 17:55

Greg:

he’s talking about decoupling the constituent launch parameters, essentially saying “If a guy can pull a ball straight down the LF line, and a guy can hit a ball 115 mph, and a guy can hit a ball at a 19 degree elevation angle, all on separate batted balls, can I take that to mean that he’s capable of doing them all on one hit?”

Tom, am I on the right track here?

For the bold part: I wish I came up with that.  That’s a great term.  Decoupling, constituent, launch, parameters.  That should be in the sabermetric word-of-the-day dictionary.  (Why don’t we have one?)

For the italics part: not necessarily “all” at the same time, but some reasonable combination that was informed by those three values.

As a simple example: if a pitcher can throw a curve ball at 70mph, can’t we infer his fastball speed is at least 80mph?  But, if a pitcher can throw his fastball 80mph, that tells us nothing about if he can even throw a curve ball.


#31          (see all posts) 2011/02/09 (Wed) @ 18:02

We shouldn’t treat the OBSERVED values as if they are the only values we have.  Couldn’t we, for example, be able to estimate which batter would be able to better take advantage of Fenway?  And couldn’t part of that estimate be based on the individual parameter information we have?

Take for example how someone hits against LHP and RHP.  We accept that the batter will hit differently.  But, surely, how he performs against LHP will be a strong indicator as to how he hits against RHP, regardless of what we happen to observe.

Similarly, if we see someone hit the ball off the bat at 100mph a lot, but almost all of the time, it’s a groundball, can’t we infer that this guy was simply getting unlucky to some extent?  That knowing this guy hits 100mph groundballs will tell us something about how he can hit flyballs, even if all the flyballs we’ve observed from him are at 60mph.

Yes, but through the physical model of how the swing and ball-bat collision happen, not in ignorance of them.


#32    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 18:05

just that the method of decoupling physically dependent parameters is not the right way to get there

1. How dependent are they?
2. Even if they are dependent, can’t we infer a distribution of possible real-world combinations?

Let’s say we have a pitcher that throws his curveball at 70mph with 14 inches of “break”.  We have another pitcher that throws a curve at 70mph and 6 inches of break.

Both pitchers are able to throw as fast as 90mph at 2 inches of break.

Couldn’t we be able to take this information and be able to come up with a reasonable couplet for each pitcher at 80mph?  That the first pitcher might be able to throw a changeup at 80mph with 8 inches of break and the other’s 80mph changeup would have 4 inches of break?
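
The 80mph figures in this example are what straight-line interpolation between each pitcher's two observed pitches gives, as the Python sketch below shows. Linear interpolation is an assumption made here for illustration; nothing in the thread establishes that break actually varies linearly with speed, and the function name `interpolate_break` is made up.

```python
def interpolate_break(speed, slow, fast):
    """Linearly interpolate break (inches) at `speed` between two observed
    (speed_mph, break_in) pitches."""
    (s0, b0), (s1, b1) = slow, fast
    t = (speed - s0) / (s1 - s0)
    return b0 + t * (b1 - b0)

# Each pitcher's curveball and fastball as (speed mph, break inches).
pitcher_1 = interpolate_break(80, slow=(70, 14), fast=(90, 2))
pitcher_2 = interpolate_break(80, slow=(70, 6), fast=(90, 2))
print(pitcher_1, pitcher_2)  # 8.0 4.0
```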

***

So, we have a hitter with 100 batted balls of which the first three are this:
80mph, +20 launch, -17 spray
82mph, +15 launch, +12 spray
75mph, -2 launch, 0 spray

Rather than be limited by n=100, can’t we use the launch parameters to be able to construct a profile of a batter that would tell us his true talent distribution?


#33    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 18:08

Yes, but through the physical model of how the swing and ball-bat collision happen, not in ignorance of them.

I have no idea why you would say I’m trying to be ignorant of how the collision happens.

Let’s say a guy hits groundballs at 100mph because that’s how he gets most of his power.  His swing is simply made to generate the most speed by hitting groundballs.

And, if he starts to hit flyballs, he can’t generate as much power.  And so, this guy, who hits 100mph GB will only be able to hit 80mph FB.

I’m not being ignorant as to how the bat-ball collision works.

I’m asking how much can we infer by looking at various data points independently of each other.  And if we can’t, how much can we infer by looking at data points as couplets or triplets.


#34    Tangotiger      (see all posts) 2011/02/09 (Wed) @ 18:11

I’ve gotta go now, and I won’t be back online until tonight.


#35    Peter Jensen      (see all posts) 2011/02/09 (Wed) @ 20:33

I’m asking how much can we infer by looking at various data points independently of each other.  And if we can’t, how much can we infer by looking at data points as couplets or triplets.

If or when the Hit F/x and Field F/x data becomes available to us, we will be able to infer a whole host of new information about batters’ specific skills, including maximum bat speed, bat acceleration during the swing arc, ability to control the bat position striking the ball, even what the batter’s intentions were when hitting the ball.  We will also be able to infer the spin on the hit ball, and the bat-ball offset at the time of the bat-ball collision.  We will be able to see how different pitch types affect all the above batter parameters as well as different movements on pitches and different pitch combinations.  We will finally be able to answer the question of whether batter-pitcher matchups have any predictive value by looking at how well the batter is able to hit the ball rather than at hit ball outcomes.  We will even be able to infer different weather patterns within different areas above the playing field and at what temperature and humidity the balls are being kept by the home team.

But we won’t learn about these things by doing correlations or regressions.  Instead, as Mike Fast has stated above, they will be the result of applying the established laws of physics and physiology.  So young people who desire a career in the stats department of major league baseball had better add physics and physiology courses to the economics and statistics courses they have been taking.


#36    Alan Nathan      (see all posts) 2011/02/09 (Wed) @ 20:46

I am not as optimistic as Peter regarding learning all about the swing parameters from hitf/x and fieldf/x.  However, I do agree with his point (which echoes Mike’s point) that any progress we make will be in the context of good models for the batter’s swing (physiology) and for the ball-bat collision (physics).  The physics part of that has been a goal of mine for a long time.  Doing a regression analysis, much like what was done in the original article that started this thread, will not teach us much of anything.


#37    tangotiger      (see all posts) 2011/02/09 (Wed) @ 21:04

Let’s take another example of where correlations can help us.

Suppose that there is a relationship, a strong relationship, between batted ball speed, and spray angle.  That the batted ball speed is higher the more you pull, and lower the more you go opposite.  As an example, a RHH pulls at -30 degrees at a batted ball speed of 90mph.  But when he goes opposite at +30 spray, the batted ball speed is 70mph. 

I’m not suggesting a linear relationship, or even a smooth relationship.  But simply SOME relationship.  If I know that the average for this RH hitter is the above on pulls and opposites, does this then allow me to possibly create a distribution of batted ball speed from -45 to +45 spray angles?  Could we then also establish frequencies at which he hits at each spray angle?

I’m asking questions as a novice.

***

I have similar questions for outfielders taking routes.  We can see how often he takes direct routes at various spray angles and launch angles.  Could we look at those two, independently and/or as a couplet to then infer the chance of him taking direct routes at any combination of spray and launch angles (and batted ball speeds)?

I’m thinking more like a video game designer.  I’m trying to capture a player’s “true talent level” in various specific categories, and then he responds to various stimuli based on how well I can capture his true talent at these granular levels.

So, as a fielder, I’ll have say 10 or 15 things, true talent things, that represent him.  And then, based on the launch parameters, I can determine based on his 10 or 15 granular skills, how often he’ll take the correct route.

***

Same thing as a hitter.  That I would capture some 15 or 25 things of him as a hitter, granular things.  And, based on how he sees the ball, he’ll respond a certain way, and hit a certain way (distribution wise of course).

I’m suggesting that all the observations we see, keeping as a couplet, and/or decoupling, will let us infer the things that basically a scout would give his right arm to figure out.

***

I hope at least one person gets what I’m trying to say…


#38    tangotiger      (see all posts) 2011/02/09 (Wed) @ 21:15

To summarize:

1. Given the launch parameters observed, what true talent level can we give this player in 15-25 granular physiological things?

2. Once we have these 15-25 things, simulate 1 million contacted balls, and enter those launch parameters into Alan’s equation. 

3. Take the outcome of these 1 million hit locations, overlay on parks and fielder positioning, and figure out his hits and outs.
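
A rough sketch of how those three steps might be wired together, in Python. Everything here is an assumption made for illustration: the TALENT numbers are invented, each launch parameter is drawn independently from a normal distribution (which ignores the speed/angle coupling discussed above), and vacuum_distance_ft is a drag-free placeholder that badly overstates real distances. It is NOT Alan’s equation; a real trajectory model would replace it.

```python
import math
import random

G_FT = 32.174                # gravitational acceleration, ft/s^2
MPH_TO_FPS = 5280 / 3600     # miles per hour -> feet per second

# Hypothetical "true talent" summary for one hitter (invented numbers).
TALENT = {
    "speed_mean": 88.0, "speed_sd": 10.0,    # batted ball speed, mph
    "launch_mean": 12.0, "launch_sd": 15.0,  # vertical launch angle, deg
    "spray_mean": -5.0, "spray_sd": 20.0,    # spray angle, deg
}

def vacuum_distance_ft(speed_mph, launch_deg):
    """Placeholder range formula with no drag or spin (NOT Alan's equation)."""
    if launch_deg <= 0 or launch_deg >= 90:
        return 0.0  # treat as a ground ball or a ball hit straight up
    v = speed_mph * MPH_TO_FPS
    return v * v * math.sin(2 * math.radians(launch_deg)) / G_FT

def simulate(talent, n, seed=0):
    """Step 2: draw n batted balls and return (distance_ft, spray_deg) pairs."""
    rng = random.Random(seed)
    balls = []
    for _ in range(n):
        speed = max(rng.gauss(talent["speed_mean"], talent["speed_sd"]), 0.0)
        launch = rng.gauss(talent["launch_mean"], talent["launch_sd"])
        spray = rng.gauss(talent["spray_mean"], talent["spray_sd"])
        balls.append((vacuum_distance_ft(speed, launch), spray))
    return balls

def fraction_over_fence(balls, fence_ft):
    """Step 3, crudely: share of fair balls travelling at least fence_ft."""
    fair = [d for d, spray in balls if -45 <= spray <= 45]
    if not fair:
        return 0.0
    return sum(d >= fence_ft for d in fair) / len(fair)

if __name__ == "__main__":
    # Scaled down from 1 million draws to keep the run short.
    balls = simulate(TALENT, 100_000)
    print(fraction_over_fence(balls, 400.0))
```

Step 1, inferring the talent parameters themselves, is the hard part and is not attempted here. A single uniform fence distance also stands in for the park and fielder overlay.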


#39    tangotiger      (see all posts) 2011/02/09 (Wed) @ 21:25

Let’s take my #1.  We have a launch parameter of -30 spray, 90mph; +30 spray,70mph; -5 launch, 100mph; +20 launch, 80 mph.

Now, what most people would do is take that, and go right to #3: put it in Alan’s equation and see where the hit locations are.

I’m suggesting no, don’t do that, because it limits you.  We are far more interested in INFERRING what that SAMPLE data is telling us about that player than in explicitly knowing what the outcome of that sample data was.

So, given those launch parameters, do we necessarily want to keep it as a couplet or triplet, or can we decouple the launch parameters (TO SOME EXTENT) so that we can establish #1: the player’s granular true talent level?


#40    tangotiger      (see all posts) 2011/02/09 (Wed) @ 21:31

Not to belabor the point, but if it’s not clear, this is a Bayes problem.
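
One minimal way to picture the Bayes framing, as a sketch only: treat a single granular skill (say, average batted ball speed on pulled balls) as having a normal prior across the league, and treat each observed ball as that talent plus normal noise. Both distributional choices and every number below are assumptions for illustration, not anything measured.

```python
def posterior_talent(prior_mean, prior_sd, obs_mean, obs_sd, n):
    """Normal-normal update for one skill.

    prior_mean, prior_sd: assumed league-wide distribution of true talent
    obs_mean: the player's observed average over n batted balls
    obs_sd: assumed ball-to-ball noise around his true talent
    Returns (posterior mean, posterior sd) of his true talent.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / obs_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * obs_mean)
    return post_mean, post_var ** 0.5

if __name__ == "__main__":
    # Hypothetical: league prior 88 +/- 3 mph, player averages 92 mph
    # over 16 pulled balls, with 12 mph of ball-to-ball noise.
    print(posterior_talent(88.0, 3.0, 92.0, 12.0, 16))
```

With these made-up inputs the prior and the sample carry equal weight, so the estimate lands halfway between 88 and 92. More observed balls pull the estimate toward the player’s own average.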


#41          (see all posts) 2011/02/09 (Wed) @ 21:37

I would like to note that I wasn’t in any way trying to say a linear model approach was better than using known and established aerodynamic equations to infer a ball’s projection.

My opinion of home runs is that they are very arbitrary. A fly ball caught way deep has a very different run value compared to a home run that may be considered a lucky shot. I’m not arguing that aerodynamic equations are the wrong way to go. I’m more trying to figure out what affects the event of a HR, since home runs are categorical, and even knowing the three variables Alan mentions for the ball’s projection won’t tell us if a home run occurred (unless it’s something ridiculous like 550 feet), since there are other environmental factors. What I was trying to figure out is, including physical properties of the fly ball, along with environmental factors, what relationship do these have with the outcome of the event (HR or no HR)? The topic of the original article was more of a basis for finding better inference methods for evaluating talent beyond home run totals or HR/FB rates.

[tangent]
I’m just curious, but why can’t we look at the data in both ways? Physics/Physiology and from a statistical standpoint? Don’t other fields do this, like studying the effects of body size, diet, exercise, and family history in predicting the event of a stroke, while another researcher might look into the more physiological variables (blood pressure or other variables of the sort) and write a report that doesn’t use linear modeling?


#42    Alan Nathan      (see all posts) 2011/02/09 (Wed) @ 21:41

Back in #21, Peter gave a link to his hitf/x article from two years ago.  He divided the three hit ball parameters (speed and two angles) into buckets with a granularity appropriate for the sample size.  Based on a reasonably large sample, he then examined the outcome (out, single, double,...), from which he could assign a linear weight to each bucket.  One could as well do the same thing for home run probability.  Then we have a metric for evaluating home run prowess, independent of the actual outcome.  For each batted ball, we see which bucket it falls into and we have it. 

Actually, I like using this technique more for things like BABIP than for home runs.  One assigns a probability of a hit for each bucket.  Doing this type of analysis gets at the heart of the question about the degree to which an elevated BABIP is based on luck.  If a batter consistently hits balls hard and in a narrow range of low launch angles (say, 8-15 deg--but don’t hold me to those numbers), that is the type of hit that will lead to a high BABIP, regardless of whether it actually does.  I don’t recall the actual words used by Peter, but I like to call it “outcome-independent batting metrics”.
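
The bucket idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bucket widths (10 mph, 10 degrees of launch, 30 degrees of spray) and the function names are invented here, and they are not the granularity or the method Peter actually used.

```python
import math
from collections import defaultdict

def bucket(speed_mph, launch_deg, spray_deg,
           speed_w=10, launch_w=10, spray_w=30):
    """Map one batted ball to a (speed, launch, spray) bucket index."""
    return (math.floor(speed_mph / speed_w),
            math.floor(launch_deg / launch_w),
            math.floor(spray_deg / spray_w))

def bucket_hit_rates(sample):
    """sample: iterable of (speed, launch, spray, is_hit) from a large
    reference population. Returns the observed hit rate per bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [hits, balls]
    for speed, launch, spray, is_hit in sample:
        c = counts[bucket(speed, launch, spray)]
        c[0] += int(is_hit)
        c[1] += 1
    return {k: hits / balls for k, (hits, balls) in counts.items()}

def expected_hit_rate(batted_balls, rates, default):
    """Average bucket probability over a batter's (speed, launch, spray)
    balls, ignoring what actually happened on each one. Buckets missing
    from the reference sample fall back to `default`."""
    probs = [rates.get(bucket(*b), default) for b in batted_balls]
    return sum(probs) / len(probs)
```

Swapping the is_hit flag for an is_home_run flag gives the home run version. Either way the batter is scored on what his batted balls usually become, not on what they happened to become.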


#43    Alan Nathan      (see all posts) 2011/02/09 (Wed) @ 21:44

My last post crossed Kevin’s, so let me respond quickly.  My last post was indeed statistical.  It’s the best one can do with the information we have.  If we had the full trajectory (say, from TrackMan), we could then get back to the physics, meaning we could make a good start to using that trajectory to determine the swing parameters.


#44    Peter Jensen      (see all posts) 2011/02/09 (Wed) @ 21:52

Essentially you are asking whether having Hit F/x and Field F/x information will allow us to build better simulation models based on more granular “true talent” player skills rather than observed performance results.  The answer is, of course, yes, we should be able to build much better simulations.  With several caveats.  First, we don’t have access at this time to enough of the key pieces of data.  Second, when we get the data we will be able to identify more player true talent skills. But the tradeoff is that if you identify 15-25 skill factors, as you speculate above, you divide your data up into very small relevant groups and you have sample size problems in assessing significance.  Third, if you attempt to incorporate 15-25 skill factors into your simulation you vastly increase the complexity of the simulation and the time it takes to run. 

The bottom line is whether all the work involved creates enough of an improvement to be worth the cost in time, energy, and ultimately money.  A lot of work goes into better predicting the weather using the most sophisticated supercomputers because of the billions of dollars and actual lives of people at stake.  The same is true with modeling war and international negotiations or the mechanisms of world economies.  How much is it worth knowing whether Pujols is going to be worth 42 or 35 WAR over his next seven years?  To me, as a fan, the answer is not much.  To his potential employers it might be 35 million dollars or more.  Only time will tell if individual teams or MLB as a group will see the value in increasing analysis to this new level.


#45    tangotiger      (see all posts) 2011/02/09 (Wed) @ 23:25

Peter, I think we are in general agreement here.

Sabermetrics will reach its pinnacle when scouting observation and performance outcomes can converge.  We’ll reach that, eventually, with the FX systems.  To do that, we need to use the FX data to infer the underlying granular talent of the players.

Everything (most anyway) we do, in sabermetrics, is about inference of the observed data.


#46    Tangotiger      (see all posts) 2011/02/10 (Thu) @ 12:00

To highlight the two links noted above.

Peter’s article was discussed in 2009:
http://www.insidethebook.com/ee/index.php/site/comments/fielding_independent_hitting_stats/

Lots of good comments in there.  MGL does a good job in showing what to be careful about:

Without at least the speed of the batter and some measure of power for the batter (as a proxy for fielder positioning), I would not even think of using a formula like this to replace a traditional lwts one.

Notably: if you are looking at batted balls in isolation, and overlaying over a particular park with a “standard” fielding alignment, you will be making a drastic error.  In addition to overlaying to a park, you need to overlay to a fielding alignment typical for that hitter (and really, a distribution of such fielding alignments, that not only looked at the hitter, but also the game situation).

***

As for Matt’s article: I quite enjoyed it, and I have no idea why I never linked to it.

It did make an appearance here:
http://www.insidethebook.com/ee/index.php/site/comments/contact_rate_by_pitch_type/
And here:
http://www.insidethebook.com/ee/index.php/site/comments/where/

That last thread was about outfielder positioning.

