THE BOOK--Playing The Percentages In Baseball


Wednesday, November 16, 2011

HITf/x: (horizontal) batted ball speed

By Tangotiger, 10:31 AM

Great stuff from Mike:

Batters have a good deal of correlation between halves of the sample, with a correlation coefficient of r=0.76 with an average of 201 batted balls in each half. That means that we would add 63 batted balls (or about one month’s worth) at league average to the observed average speed for each batter in order to estimate his true skill.
...
Pitchers have fairly good correlation between halves of the sample, though not as good as batters. The correlation coefficient is r=0.48 with an average of 251 batted balls in each half. That means that we would add 269 batted balls (or about three months’ worth for a starter) at league average to the observed average speed for each pitcher in order to estimate his true skill.

Just fantastic stuff, and I’m glad Mike did it, and that he showed the key number: the sample size at which r=.50.
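To make the arithmetic in Mike's quotes concrete, here is a minimal sketch. It assumes the standard split-half relationship, where the amount of league-average ballast is n * (1 - r) / r; the article excerpt does not spell out the formula, so treat that as an assumption. The function names are made up for illustration. With the rounded r=0.48 for pitchers this gives about 272 rather than the quoted 269, presumably because the quoted r is rounded.

```python
def regression_amount(n_per_half, r):
    """Batted balls of league average to add, assuming ballast = n * (1 - r) / r."""
    return n_per_half * (1.0 - r) / r


def estimate_true_speed(observed_avg, n_observed, league_avg, ballast):
    """Weighted average of the observed speed and the league average."""
    return (observed_avg * n_observed + league_avg * ballast) / (n_observed + ballast)


# Figures quoted from Mike's article (r and batted balls per half).
batter_ballast = regression_amount(201, 0.76)    # roughly 63 batted balls
pitcher_ballast = regression_amount(251, 0.48)   # roughly 272 with the rounded r

# At r = .50 the ballast equals the sample size itself, which is why
# "the point at which r=.50" is the key number.
half_point = regression_amount(200, 0.50)

# Hypothetical batter: 75 mph observed over 300 batted balls, league at 70 mph.
example = estimate_true_speed(75.0, 300, 70.0, batter_ballast)
print(round(batter_ballast), round(pitcher_ballast), round(example, 1))
```

The estimate always lands between the observed average and the league average, and moves toward the observed figure as the sample grows.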

***

I’m not really surprised by the results.  The closer you get to someone’s base physical and mental skills, the fewer observations you need.  This is why scouts are so important.  And the F/X and Trackman systems are, at their heart, scouting tools.

What we’ve had until recently are outcomes, results, things like OBP and K/PA, etc.  What drives OBP and the like are the players’ base skills AND luck.  That’s why we infer a player’s base skills by stripping out as much luck as we can figure out.  We do this through a Bayesian process (or its equivalent in regression toward the mean).  We need a few hundred contacted balls for a hitter, and thousands for a pitcher, in order for us to be able to strip out that luck to infer the base skill.

Inside a player’s contacted ball skill is not only the horizontal speed off the bat, but placement as well.

Unseen in Mike’s data is what the horizontal speed off the bat really means.  Let’s take a pitcher’s fastball speed.  We presume that a pitcher’s fastball speed is highly consistent from one sample to the next.  I have no doubt that if you do a split-half correlation, you’ll get something ridiculous like r=.99 (really, it’s a question of how many nines) for pitchers who throw 1000 fastballs.  So, that gives us a scouting observation: we can readily ascertain a pitcher’s underlying true fastball speed.
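For anyone who wants to see what a split-half correlation looks like in practice, here is a rough simulation. Every number in it is an assumption made up for illustration (100 pitchers, true fastball speeds spread around 91 mph, pitch-to-pitch noise of 1.5 mph); none of it comes from real data.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_half_r(samples_by_player):
    """Split each player's sample into even/odd halves and correlate the half averages."""
    first, second = [], []
    for samples in samples_by_player:
        evens, odds = samples[0::2], samples[1::2]
        first.append(sum(evens) / len(evens))
        second.append(sum(odds) / len(odds))
    return pearson_r(first, second)


rng = random.Random(2011)  # fixed seed so the run is repeatable
pitchers = []
for _ in range(100):
    true_speed = rng.gauss(91.0, 2.5)  # assumed spread in true talent
    # 1000 fastballs per pitcher, each with assumed 1.5 mph of noise
    pitchers.append([rng.gauss(true_speed, 1.5) for _ in range(1000)])

r = split_half_r(pitchers)
print(round(r, 4))
```

Under these assumptions the pitch-to-pitch noise is tiny next to the spread in talent, so r comes out very close to 1.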

But, what does THAT give us?  He throws really hard or really soft.  But that, by itself, still doesn’t tell us how EFFECTIVE he is.

The next step is to translate that particular base skill, that scouting-level observation, into results.  And Mike has given us that:

We see that a player who hits the ball at close to 80mph has a BACON of close to .300, while those who hit the ball at close to the league average (70mph) has a BACON of close to .200, and those at the league low (60mph) is just above .150. 

I have to say, all those numbers look pretty low.  I guess that’s what happens when you have non-linearity.  For example, suppose you hit one-third of your balls at under 60mph, another third at 60-80, and the last third at over 80mph.  (Numbers for illustration purposes only.)  If it’s under 60mph, you get a batting average of .050 to .150, or say an average of around .120.  If you hit it between 60-80, it’s .150 to .300, or an average of .220.  And above 80mph, it’s from .300 all the way up to .650, for an average of say .500.  That gives you an overall batting average of .280 for a hitter whose average speed is 70mph.  As you can see, the overall average for a distribution centered around 70mph is way above the batting average at the 70mph point itself.

Anyway, so what I’d like to see is this: create a DISTRIBUTION for each player, centered around his true talent horizontal speed off the bat, and apply the rates from the above chart (or a more smoothed version actually).  This way, we can end up with a player’s true talent BACON, if all we know is his horizontal speed off the bat.

THAT will tell us how valuable knowing his horizontal speed off the bat is.
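A rough sketch of that idea follows. The curve points are the approximate figures mentioned in this thread (.050 at 40mph, .150 at 60, .200 at 70, .300 at 80, .600 at 100), joined by straight lines as a stand-in for the real chart. The normal shape of each player's distribution and the 15mph spread are pure assumptions for illustration.

```python
from statistics import NormalDist

# Approximate (speed, BACON) points taken from the discussion; not the real chart.
CURVE = [(40, 0.050), (60, 0.150), (70, 0.200), (80, 0.300), (100, 0.600)]


def bacon_at(speed):
    """Piecewise-linear BACON at a given horizontal speed, flat beyond the end points."""
    if speed <= CURVE[0][0]:
        return CURVE[0][1]
    if speed >= CURVE[-1][0]:
        return CURVE[-1][1]
    for (x0, y0), (x1, y1) in zip(CURVE, CURVE[1:]):
        if x0 <= speed <= x1:
            return y0 + (y1 - y0) * (speed - x0) / (x1 - x0)


def expected_bacon(mean_speed, sd_speed, step=0.1):
    """Average the curve over an assumed normal distribution of batted-ball speeds."""
    dist = NormalDist(mean_speed, sd_speed)
    low = mean_speed - 6 * sd_speed
    n_steps = int(12 * sd_speed / step)
    total = weight = 0.0
    for i in range(n_steps):
        speed = low + (i + 0.5) * step  # midpoint of each slice
        w = dist.pdf(speed)
        total += w * bacon_at(speed)
        weight += w
    return total / weight


at_the_mean = bacon_at(70)
over_the_distribution = expected_bacon(70, 15)  # 15 mph spread is an assumption
print(round(at_the_mean, 3), round(over_the_distribution, 3))
```

Because the curve bends upward, the figure averaged over the distribution comes out higher than the figure read off the curve at the mean speed, which is the point about non-linearity.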


#1    Peter Jensen      (see all posts) 2011/11/16 (Wed) @ 11:54

Excellent research Mike.  I hope you will also be allowed to publish follow up studies on the effect of vertical and horizontal angles off the bat from this data.  Since MLB teams now have 4 full years of Hit Fx data available to them I can only assume that at least some of them are already employing their own version of skill based metrics for player evaluations.


#2          (see all posts) 2011/11/16 (Wed) @ 12:15

Thanks, Peter. I’m working on a follow-up article on the effect of the vertical launch angle and its interaction with speed off the bat as they relate to BABIP.


#3          (see all posts) 2011/11/16 (Wed) @ 12:53

Fantastic stuff.

Did the league average include players without the necessary 300 batted balls?

How much would the results change if you exclude bunts?  Or make bunts a variable in the model?  Taveras, Gomez, Blanco, Bourn, etc putting down bunts 15% of the time is really going to drag down their average and add to between-player variation.  While this might be a real result, it’s a result of tactics, not of skill.  And it is a tactic which is easily identified, unlike “going the other way to advance the runner” or an “intentional sac fly to take the lead in the 8th”.


#4          (see all posts) 2011/11/16 (Wed) @ 12:57

Yes, the league average included players without the necessary 300 batted balls.

I did not exclude bunts, but that would be a good idea.


#5    Tangotiger      (see all posts) 2011/11/16 (Wed) @ 13:13

The “average” is really going to be deceptive. 

Take for example a hitter, Bang Orbuster, and another hitter, Steady Eddy.

Bang Orbuster hits half his pitches at a 40mph level and the other half at 100mph, for an average of 70mph.

Steady Eddy hits all of his pitches at 60 to 80mph.

Since the relationship between speed and BACON is pretty linear from 60 to 80mph, the BACON at the 70mph average will really represent the average BACON across the whole 60 to 80 range.  The average BACON at 60 to 80 mph and the average speed of 60 to 80 mph will correspond directly.  And 70mph is around .220 (whether you take the exact point there, or the average of 60 to 80, more or less).

Bang Orbuster however has a 40mph BACON at .050 and a 100mph BACON at .600, for an average of 70mph and .325 BACON.

As you can see, treating things as an “average” really only works if the relationship you are looking at is linear.  If you have a non-linear relationship, getting the average doesn’t help.
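The two hitters can be put side by side in a few lines. The BACON values are the ones used in this comment; the lookup table and function name are just for illustration.

```python
def average(values):
    return sum(values) / len(values)


# BACON at selected speeds, as used in the comment above.
BACON_BY_SPEED = {40: 0.050, 70: 0.220, 100: 0.600}

# Bang Orbuster: half his batted balls at 40 mph, half at 100 mph.
bang_speeds = [40, 100]
bang_avg_speed = average(bang_speeds)
bang_bacon = average([BACON_BY_SPEED[s] for s in bang_speeds])

# Steady Eddy: everything in the roughly linear 60-80 band, so the
# BACON at his 70 mph average stands in for his overall BACON.
eddy_avg_speed = 70
eddy_bacon = BACON_BY_SPEED[70]

print(bang_avg_speed, round(bang_bacon, 3))   # same average speed as Eddy
print(eddy_avg_speed, round(eddy_bacon, 3))   # very different BACON
```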

You can think of also spray patterns: a pitcher throws inside or outside equally (great!), or a pitcher throws down the middle all the time (terrible!).  On “average”, both pitchers throw down the middle.  So, averaging doesn’t help us.

What we really care about is that throwing on the edges, either edge, is great.  In that case, we are more interested in averaging the ABSOLUTE VALUES of the distance from the center of home plate.  So, you’ll get that he throws his pitches 0.8 feet from the center.  If you instead average his -0.8 feet pitches and his +0.8 feet pitches, you get 0.

Hence, the average, without knowing the distribution, is going to skew things, and it’ll happen any time you don’t have a linear relationship.
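The pitch location example works out like this. The two pitchers and their locations are hypothetical, using the 0.8 feet figure from above.

```python
def mean(values):
    return sum(values) / len(values)


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


# Horizontal location in feet from the center of home plate (hypothetical pitchers).
edge_pitcher = [-0.8, 0.8, -0.8, 0.8]    # works both edges equally
middle_pitcher = [0.0, 0.0, 0.0, 0.0]    # down the middle every time

edge_mean = mean(edge_pitcher)            # 0.0, looks like "down the middle"
middle_mean = mean(middle_pitcher)        # 0.0 as well
edge_mean_abs = mean_abs(edge_pitcher)    # about 0.8 feet from center
middle_mean_abs = mean_abs(middle_pitcher)

print(edge_mean, middle_mean)
print(round(edge_mean_abs, 2), middle_mean_abs)
```

The plain average cannot tell the two pitchers apart; the average of absolute values can.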


#6    Tangotiger      (see all posts) 2011/11/16 (Wed) @ 13:26

Another example is “average” flyball distance.  An average flyball of 370 feet and an average of 350 feet may seem “similar”, but if the 350 guy gets a lot of warning track plays, then the number of HR he hits will be drastically different from the 370 guy, and really similar to the 330 guy.


#7          (see all posts) 2011/11/16 (Wed) @ 13:28

I would say that average works pretty well in this case, though of course it’s not perfect.  Almost all the batted balls are in the 40-100 mph range where an increase in horizontal batted ball speed results in an increase in BACON.

The missing piece that adds a lot more than accounting for the non-linearity here is to understand how launch angle impacts the results, as Peter said.  It works better to incorporate that into the model first, which I hope to do soon in another article.


#8          (see all posts) 2011/11/16 (Wed) @ 13:33

This is nothing like using average plate location where the two extremes are good and the middle is bad.  That is not an applicable example.

I do see some similarities to your average flyball distance example, but I’m not sure what you’re saying there.  In that case, also, I’d expect some non-linearity but that average flyball distance would be a pretty valuable metric.

If you’re arguing that what I presented in this article is not sufficient for modeling BABIP, I have no disagreement with that.

I was mainly addressing the conclusion that some have drawn from DIPS that the pitcher has little control over the quality of contact.  That conclusion is false.

I will address more about how to model expected BABIP in another article.  As I said in #7, that requires understanding more about how launch angle affects the likelihood of a hit.


#9    Tangotiger      (see all posts) 2011/11/16 (Wed) @ 15:25

“If you’re arguing that what I presented in this article is not sufficient for modeling BABIP, I have no disagreement with that.”

Yes, this. 

If you look at your chart from 60 to 80 (which is where your extreme hitters lie), we see relatively little difference between 60 to 70, and then a huge jump from 70 to 80.  So, a guy who hits disproportionately more at 80+ and 60- to get an average of 70 is much better than a guy who hits in the 60-80 range.

And it has to do with the non-linearity.

As for my analogy to pitch location, my only intent there is to show that “average” does not mean “midpoint”.  Average hit location, average distance, etc.  For the average of anything, if the relationship is not linear, then using the average is going to be limiting… to some extent.  The extent depends on how non-linear the relationship is.

“I was mainly addressing the conclusion that some have drawn from DIPS that the pitcher has little control over the quality of contact.  That conclusion is false.”

It was false from the outset!  As I noted in the solvingdips file:

It may very well be that if we look at very specific breakdowns by zones, opponent, fielders, park, weather, etc, that we CAN ascertain what a pitcher’s skill is at preventing hits on balls in play (see: PZR). It’s just that, for the moment, the metric called “hits per ball in park” does not do a good enough job at establishing the pitcher’s skill with “hits per ball in park”.

***

“I will address more about how to model expected BABIP in another article.  As I said in #7, that requires understanding more about how launch angle affects the likelihood of a hit.”

Looking forward to it!


#10    Tangotiger      (see all posts) 2011/11/16 (Wed) @ 15:37

By the way, in Mike’s recap of DIPS history, MGL’s article should also be included:

http://www.baseballthinkfactory.org/files/primate_studies/discussion/lichtman_2004-02-29_0/

It was the first one that dealt with batted ball outcomes, as it pertains to BABIP. 

I’d also include the Solving DIPS compilation (notably from Arvin Hsu and Erik Allen) as groundbreaking as well, not only for BABIP, but for regression toward the mean as well:

http://www.tangotiger.net/solvingdips.pdf


#11    Tangotiger      (see all posts) 2011/11/16 (Wed) @ 15:43

As for the “who has more influence”.  There are two ways to answer that.

#1: They each have the exact same influence.

#2: The batter has much more influence.

How can you have both as the correct answer?

Let me give you one example: we know that the range in OBP for hitters is far wider than it is for pitchers.  The reason should be clear, but in case it’s not: to be a nonpitcher, you can be a great hitter or a great fielder or a great runner.  There are different ways to be in MLB for a nonpitcher.  For a pitcher, there’s really just one way: get outs.

So, if you have a .300 true OBP for a pitcher, facing a .400 true OBP for a hitter, the resulting matchup will be IDENTICAL to a .300 true OBP for a hitter facing a .400 true OBP for a pitcher.  ANY AND ALL combinations of pitcher and batter, if you reverse them, will give you the EXACT SAME ANSWER.

Therefore, from that perspective, they are “identical” in terms of how much influence they exert on the OBP.

But, because the range in OBP is much tighter for pitchers than it is for hitters, it is far more important to know the OBP of the hitter than that of the pitcher.  It is purely because of the spread in talent that we can say that the hitter influences the OBP more than the pitcher.

So, it depends on how you look at it.  Either way you look at it though, the final answer of the .300 v .400 matchup is the same.
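Here is a small sketch of that symmetry. It assumes the matchup is computed with the Odds Ratio method; the comment above does not say which formula is behind the claim, and the league OBP of .330 is a made-up figure.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def matchup_obp(batter_obp, pitcher_obp, league_obp):
    """Expected OBP for a batter/pitcher matchup, assuming the Odds Ratio method."""
    o = odds(batter_obp) * odds(pitcher_obp) / odds(league_obp)
    return o / (1.0 + o)


LEAGUE_OBP = 0.330  # hypothetical league average

# .400 hitter against a .300 pitcher, and the reverse.
hitter_400_pitcher_300 = matchup_obp(0.400, 0.300, LEAGUE_OBP)
hitter_300_pitcher_400 = matchup_obp(0.300, 0.400, LEAGUE_OBP)

print(round(hitter_400_pitcher_300, 3), round(hitter_300_pitcher_400, 3))
```

The formula treats the two inputs identically, so swapping them cannot change the answer. The hitter only "matters more" because hitters' inputs vary over a wider range.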


#12    Brian Cartwright      (see all posts) 2011/11/16 (Wed) @ 21:23

Great work Mike.

If you’ve seen my THT Annual article, I’ve been working on some similar analyses.

You answered a question at BP on how the hSOB was derived, and talked about the horizontal distance travelled and hang time, but I don’t have those values in my Hitf/x sample. I’ve been using cosine(vert_angle) * init_speed.
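The calculation described above, cosine(vert_angle) * init_speed, looks like this as a sketch. It assumes vert_angle is stored in degrees and init_speed in mph; the function name is made up.

```python
import math


def horizontal_speed(init_speed_mph, vert_angle_deg):
    """Horizontal component of the initial speed: cos(vertical angle) * initial speed."""
    return math.cos(math.radians(vert_angle_deg)) * init_speed_mph


line_drive = horizontal_speed(100.0, 10.0)  # most of the speed is horizontal
pop_up = horizontal_speed(100.0, 85.0)      # almost none of it is
print(round(line_drive, 1), round(pop_up, 1))
```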

Looking at the init_speed at each vert_angle, calculating a delta and then combining into a weighted mean, I found a range of about 10 mph for batters and 5 for pitchers.

Mike’s method weighted down the observed init_speed based on the vert_angle, so that pop ups would have a very small value. My WOWY style method of grouping the results by vert_angle found that even at the same vertical angles, Mariano Rivera was allowing about 3 mph less init_speed than MLB average (Apr-June 2009).

This result could tie in to the Trackman stats linked here yesterday.  Those showed that faster spinning pitched balls, whether fastball or breaking ball, yielded lower hit rates.  I presume the faster spin causes more movement or break, which makes it more difficult to consistently square the ball up.  This also relates to the recent article at THT on whether slider pitchers allow lower BABIPs.


#13    Brian Cartwright      (see all posts) 2011/11/16 (Wed) @ 21:26

I found a range of 10 mph from best to worst hitters, 5 mph for pitchers.


#14          (see all posts) 2011/11/16 (Wed) @ 21:52

Thanks, Brian.

“If you’ve seen my THT Annual article, I’ve been working on some similar analyses.”

I haven’t yet.

“You answered a question at BP on how the hSOB was derived, and talked about the horizontal distance travelled and hang time, but I don’t have those values in my Hitf/x sample. I’ve been using cosine(vert_angle) * init_speed.”

No, I don’t have distance and hang time in my data.  I was using that to try to explain the concept of hSOB.  But the HITf/x data only has the initial parameters, same as what you got from Sportvision, I presume.


#15    Tangotiger      (see all posts) 2011/11/17 (Thu) @ 01:45

Mike also noted this:

“For that sample, the best prediction for the horizontal speed of the ball off the bat comes from weighting the pitcher’s regressed average hSOB by 1.04 and the batter’s regressed average hSOB by 0.99. “

The expectation would have been to have a coefficient of exactly 1 for each of the pitcher and hitter.

There are two reasons that it’s not:
1. Sample size.  1.04 and 0.99 are so close to 1 that the true coefficients are basically 1, with sampling variation making them off by a little.

2. You REALLY should use the Odds Ratio method, and not the additive method (which is what the above would imply).  The additive method is the way, say, Strat-O-Matic would do it.  So, using the additive method is an approximation, and so, the weighting is off a little.


#16          (see all posts) 2011/11/17 (Thu) @ 12:34

Sorry to have missed out on all this discussion.  I’ve been at a conference the last two days and am just now getting caught up.  I’ll try to find time today to read Mike’s article plus all the comments here.


#17          (see all posts) 2011/11/17 (Thu) @ 14:27

Quick comment on Mike’s excellent article.  I believe that the particular batting skill for high babip (high horiz velocity as Mike says) is not the same skill for high hr probability.  The latter requires both high batted ball speed and a launch angle in the approx range 25-35 deg.  When I get a chance I’ll post my analysis of hr from 2009-10 seasons.


#18    Colin Wyers      (see all posts) 2011/11/17 (Thu) @ 14:30

“You REALLY should use the Odds Ratio method”

How?


#19    Tangotiger      (see all posts) 2011/11/17 (Thu) @ 16:23

I obviously can’t use the Odds Ratio method, since that would imply a binary choice, and that’s not what we have here.  Thanks to Colin for pointing that out.

We’d really have to test what we’d need to use. 

Probably it would be multiplicative, so that if a guy has a 60mph talent and the other guy has an 80mph talent, and the league average is 70, then the expected should be 80*60/70 = 68.6.  As it stands, Mike is doing 80+60-70=70.
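The two candidate ways of combining the talents, using the numbers from this comment (function names are just for illustration):

```python
def additive(batter, pitcher, league):
    """Additive combination: batter + pitcher - league."""
    return batter + pitcher - league


def multiplicative(batter, pitcher, league):
    """Multiplicative combination: batter * pitcher / league."""
    return batter * pitcher / league


# 60 mph talent against 80 mph talent, league average of 70 mph.
add_result = additive(60, 80, 70)
mult_result = multiplicative(60, 80, 70)
print(add_result, round(mult_result, 1))
```

Which of the two fits better is an empirical question that would have to be tested against the data, as noted above.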


#20          (see all posts) 2011/11/17 (Thu) @ 19:21

I believe you can use a copula to model joint distributions if you have a distribution for the hitter, a distribution for the pitcher, and know (or can estimate) how much one affects the other.


