THE BOOK--Playing The Percentages In Baseball


Ball_Tracking

Tuesday, January 31, 2012

Dissecting a mystery pitch

By Tangotiger, 04:46 PM

The always affable, generous, and insightful Alan Nathan follows up.

KNUCKLEf/x

By Tangotiger, 11:42 AM

From Alan:

Within the precision of the tracking data, knuckleball trajectories are just as smooth as those of ordinary pitches. Read on to find out how I arrived at this conclusion.
...
With apologies to John Walsh, I conclude that knuckleballs are more like bullets than butterflies.

How about: it’s more like bullets fired by a 10-year-old?  The kid doesn’t have good aim, he’s jittery when he fires, and if you ask him to hit the same target again, he won’t be able to.  So, it follows a smoothish pattern after the fact, but even at release (knowing the release angle, speed, and spin angle), you won’t be able to predict that path.

That about right?

(3) Comments • 2012/01/31 • Sabermetrics • Ball_Tracking

Wednesday, January 11, 2012

Umpire strike zone size

By Tangotiger, 10:22 AM

Josh takes a look.

I must say, this looks like a bit of a step back from others that I’ve seen.  Suppose that an umpire has pitchers that throw a lot of pitches in the middle of the strike zone.  The way Josh calculates it, that’s included in his metric.

The way the other guys have done it, they looked for a “contour” of x-inches of width where the called strike to called ball ratio was 1.  Basically, the wide pitches and the center pitches tell us nothing about the umpire. 

Or, do they?  Perhaps, rather than a step back, Josh took a step to the side.  If a pitcher is throwing a lot of pitches in the middle of the strike zone, he may be doing it in ANTICIPATION of a smaller strike zone from the umpire.  So, if an umpire happens to see a lot of down-the-middle pitches, then that may tell us something.  Except... Josh removed any pitches that the batter swung at.

So, I think there are a lot of different considerations here, as to exactly what it is that Josh (or the interested reader) actually wants.  I’m not sure whether Josh answered his intended question or some other question.  And it’s quite possible that the huge sample size mitigates any of this anyway, since umpires are not paired with pitchers.

Anyway, lots of things to consider, and reformulate…

(9) Comments • 2012/01/11 • Sabermetrics • Ball_Tracking

Thursday, January 05, 2012

Run values by strike zone location

By Tangotiger, 05:41 PM

Great job by Josh:

On pitches down the middle, the balls that are put into play have, on average, about twice the magnitude of run value as pitches that aren’t put in play. That means for the two to come into equilibrium, you would need to have about 33% of pitches put into play and 66% not put into play. But as discussed earlier, far fewer than 33% of pitches are put into play. This means that, on average, pitches thrown down the middle are good for the pitcher, not the batter.

Merging everything together, we can see this visually:

(13) Comments • 2012/01/07 • Sabermetrics • Ball_Tracking

Wednesday, December 07, 2011

PITCHf/x leaderboards that span multiple seasons?

By Tangotiger, 12:35 AM

Yup.

(4) Comments • 2011/12/08 • Sabermetrics • Ball_Tracking

Monday, December 05, 2011

PITCHf/x on Fangraphs update

By Tangotiger, 07:58 PM

Some updates on Fangraphs.  Check out the various tabs in there.  Tons of good stuff.

And, if you have suggestions, post them here.  David is pretty much incomparable in terms of turnaround time of taking suggestions and implementing them.

(7) Comments • 2011/12/06 • Sabermetrics • Ball_Tracking • Data

Wednesday, November 30, 2011

Swing area predictability

By Tangotiger, 11:50 AM

More good stuff from Josh.

But if we have plate discipline metrics from 2010, we also know strikeout rate from 2010. Do these metrics give us any information that the previous year’s strikeout rate does not?

If I run a regression of 2011 strikeout rate on 2010 strikeout rate and 2010 swing area, I find that swing area no longer has significance. I find the same result for O-swing. In other words, these plate discipline metrics are not useful in predicting the next year’s strikeout rate if we already know the previous year’s strikeout rate.

Tuesday, November 22, 2011

HITf/x: (vertical) launch angle

By Tangotiger, 10:31 AM

A followup to Mike’s terrific piece on the horizontal speed off the bat, this time with the added focus of the vertical launch angle.

There’s actually plenty of info here, and I can’t comment properly yet until I do a second re-read.  There’s also something that seems inconsistent, and I’m hoping Mike can set me straight on whether one of the charts published needs to be updated, or my reading skills need to be improved.  I’m hoping it’s the latter.

(41) Comments • 2011/11/28 • Sabermetrics • Ball_Tracking • Hit_Tracking

Friday, November 18, 2011

Swing area

By Tangotiger, 10:23 AM

Excellent work, and exactly the kind of thing that is actionable.  As noted, it’s more helpful to break up by count, but that’s just one small step away.  (And pitch types, natch.) He also did it for pitchers.  Just fantastic work.

(7) Comments • 2011/11/19 • Sabermetrics • Ball_Tracking

Wednesday, November 16, 2011

HITf/x: (horizontal) batted ball speed

By Tangotiger, 10:31 AM

Great stuff from Mike:

Batters have a good deal of correlation between halves of the sample, with a correlation coefficient of r=0.76 with an average of 201 batted balls in each half. That means that we would add 63 batted balls (or about one month’s worth) at league average to the observed average speed for each batter in order to estimate his true skill.
...
Pitchers have fairly good correlation between halves of the sample, though not as good as batters. The correlation coefficient is r=0.48 with an average of 251 batted balls in each half. That means that we would add 269 batted balls (or about three months’ worth for a starter) at league average to the observed average speed for each pitcher in order to estimate his true skill.

Just fantastic stuff, and I’m glad Mike did it, as well as showing the key points, which is the point at which r=.50.
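As a minimal sketch of what "adding 63 batted balls at league average" does in practice: the batter below is hypothetical (201 batted balls averaging 75 mph), the 70 mph league average is assumed for illustration, and the function name is mine, not Mike's.

```python
def regress_to_mean(observed_mean, n_observed, league_mean, ballast):
    """Blend an observed average with `ballast` extra trials at the league average."""
    total = observed_mean * n_observed + league_mean * ballast
    return total / (n_observed + ballast)


# Hypothetical batter: 201 batted balls averaging 75 mph.
# League average of 70 mph is an assumption for illustration.
estimate = regress_to_mean(75.0, 201, 70.0, 63)

# Weight that ends up on the batter's own observed speed.
own_weight = 201 / (201 + 63)

print(round(estimate, 2))
print(round(own_weight, 2))
```

The weight on the player's own data, 201 / (201 + 63), comes out at about 0.76, which is the split-half correlation Mike reports for batters.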

***

I’m not really surprised by the results.  The closer you get to someone’s base physical and mental skills, the fewer observations you need.  This is why scouts are so important.  And the F/X and Trackman systems are, at their heart, scouting tools.

What we’ve had until recently are outcomes, results, things like OBP and K/PA, etc.  What drives OBP and the like are the players’ base skills AND luck.  That’s why we infer a player’s base skills by stripping out as much luck as we can figure out.  We do this through a Bayesian process (or its equivalent in regression toward the mean).  We need a few hundred contacted balls for a hitter, and in the thousands for a pitcher, in order for us to be able to strip out that luck to infer the base skill.

Inside a player’s contacted ball skill is not only the horizontal speed off the bat, but placement as well.

Unseen in Mike’s data is what the horizontal speed off the bat really means.  Let’s take a pitcher’s fastball speed.  We presume that there’s a high degree of correlation in a pitcher’s fastball speed.  I have no doubt that if you do a split-half correlation, you’ll get something ridiculous like r=.99 (really, it’s a question of how many nines) for pitchers who throw 1000 fastballs.  So, we can ascertain a scouting observation: we can readily and easily ascertain a pitcher’s underlying true fastball speed.

But, what does THAT give us?  He throws really hard or really soft.  But, that by itself, still doesn’t tell us how EFFECTIVE he is.

The next step is to correlate that particular base skill, that scouting-level observation, into results.  And Mike has given us that:

We see that a player who hits the ball at close to 80mph has a BACON of close to .300, while those who hit the ball at close to the league average (70mph) has a BACON of close to .200, and those at the league low (60mph) is just above .150. 

I have to say, all those numbers look pretty low.  I guess that’s what happens when you have non-linearity.  For example, suppose you hit one-third of your balls at under 60mph, another third at 60-80, and the last third at over 80mph.  (Numbers for illustration purposes only.) If it’s under 60mph, you get a batting average of .050 to .150, or say around an average of .120.  If you hit it between 60-80, it’s .150 to .300, or an average of .220.  And above 80mph, it’s from .300 all the way up to .650, for an average of say .500.  That gives you an average of .280, for an average of 70mph.  As you can see, the overall average for a distribution around 70mph is way above the batting average at the 70mph point.
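The arithmetic in that illustration can be checked with a short sketch. The bucket shares and per-bucket averages are the made-up numbers from the paragraph above, and the .200 figure at 70 mph is the value quoted from Mike's chart; none of this is real player data.

```python
# Illustrative numbers from the paragraph above (not real data):
# (label, share of batted balls, assumed average BACON within the bucket)
buckets = [
    ("under 60 mph", 1 / 3, 0.120),
    ("60-80 mph", 1 / 3, 0.220),
    ("over 80 mph", 1 / 3, 0.500),
]

# Overall BACON is the share-weighted average across the buckets.
overall_bacon = sum(share * bacon for _, share, bacon in buckets)

# BACON quoted from the chart for balls hit at the league-average 70 mph.
bacon_at_70 = 0.200

print(round(overall_bacon, 3))                # average over the distribution
print(round(overall_bacon - bacon_at_70, 3))  # gap caused by the non-linearity
```

Because the curve rises much faster above 80 mph than it falls below 60 mph, the average over the whole distribution sits well above the value at the midpoint.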

Anyway, so what I’d like to see is this: create a DISTRIBUTION for each player, centered around his true talent horizontal speed off the bat, and apply the rates from the above chart (or a more smoothed version actually).  This way, we can end up with a player’s true talent BACON, if all we know is his horizontal speed off the bat.

THAT will tell us how valuable knowing his horizontal speed off the bat is.

(21) Comments • 2011/12/15 • Sabermetrics • Ball_Tracking • Hit_Tracking

Tuesday, November 15, 2011

Trackman Leaderboards

By Tangotiger, 03:04 PM

trackman_leaders.pdf

(4) Comments • 2011/11/16 • Sabermetrics • Ball_Tracking

Wednesday, October 26, 2011

Where do pitchers pitch to Pujols?

By Tangotiger, 02:02 PM

I like these “differential” graphs, because it saves me the trouble of comparing to the league average (though as noted later in the article, it would be better to match on the count).  Chart is from the catcher/batter/ump perspective.  Pujols gets more pitches (red) low and away, and fewer pitches (blue) up.  No surprise of course.

Meaning in hot and cold zones?

By Tangotiger, 10:16 AM

This article was a long time coming, so thanks to Mike for all the hard work on this one:

I ran a regression for all the right-handed batters with at least 630 plate appearances in 2007-2011 that ended on a pitch in the strike zone.
...
With larger sample sizes, the split-half correlation improved somewhat, as expected. However, even with only four zones, much noise remained in the results. Here is the regression equation for right-handed batters:

Zone Performance in Split Half 2 = (0.32 * Zone Performance in Split Half 1) + (0.32 * Performance in Other 8 [Ed note: Mike meant 3 here] Zones in Split Half 1) + (0.36 * League Average Performance).

The correlation coefficient was r=0.46, and the p-values for both input variables were highly significant (<.0001).

With sample sizes from larger zones between 200 and 300 plate appearances in each half of the sample, both the split-half correlations and the statistical significance of the results have improved.

Let’s say the average number of PA per player in the sample is 2000 PA.  So, we can say that for someone with 2000 PA, if you want to know how good he is at balls in the top left corner, you take one-third based on his performance in zone A, one-third based on his performance in the other three zones B, C, D, and one-third the league average.

Michael Young for example had OBSERVED TAv of the following: .270 (up and away), .353 (up and in), .198 (low and away), .257 (low and in).  If you want his TRUE low-and-away, you would take one-third .198, one-third the other three (.293), one-third league average (whatever that would be… let’s just say it’s .220), to come up with a(n estimated) TRUE TAv of .237.
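Here is a minimal sketch of that one-third / one-third / one-third blend, using the Michael Young numbers above. The function name is mine, the .220 league average is the placeholder from the text, and taking a simple average of the other three zones assumes roughly equal plate appearances in each, which the post does not confirm.

```python
def true_zone_estimate(zone, other_zones, league_avg, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Blend a zone's observed TAv with the other zones and the league average."""
    # Simple (unweighted) average of the other zones: an assumption.
    others_avg = sum(other_zones) / len(other_zones)
    w_zone, w_others, w_league = weights
    return w_zone * zone + w_others * others_avg + w_league * league_avg


# Michael Young, low and away: observed .198; other zones .270, .353, .257.
# League average of .220 is the placeholder value used in the text.
estimate = true_zone_estimate(0.198, [0.270, 0.353, 0.257], 0.220)
print(round(estimate, 3))
```

Passing `weights=(0.32, 0.32, 0.36)` would use the coefficients from Mike's regression equation instead of the rounded thirds.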

Now, I’m thinking we’re going to have some selection bias here.  It doesn’t look like Mike controlled for count, and he’s only looking at the very last pitch of the PA.  If you know a plate appearance ended on a pitch low and in, it’s possible that you got an out.  That may be one reason we see some difference.  I don’t know, but we need to control for count, and even after that, I’m not sure that’s enough.

This is a great first step, so I definitely want to encourage more work like this.  Just seeing the OBSERVED hot/cold, but not the TRUE hot/cold is a definite hole in (public) sabermetrics right now. 

And I think the next step is to treat each pitch in the plate appearance, one by one, rather than just looking at the last pitch of the plate appearance.

(16) Comments • 2011/10/26 • Sabermetrics • Ball_Tracking

Sunday, October 23, 2011

Passive-aggressive hitters

By Tangotiger, 12:43 PM

Good stuff.

Thursday, October 20, 2011

INTROf/x

By Tangotiger, 07:16 PM

Mike did a bang-up job on PITCHf/x in the THT Annual a couple of years ago, and studes has made it available for free for the public (pdf).  Tremendous stuff.

There are two other must-haves as well in book form.  Dave Allen did one (I don’t remember where), and I think John Walsh or Harry Pavlidis did another.  Heck, there might even be more, and I don’t remember.

In any case, thanks to studes for opening up the vault on this one.  I’m looking forward to getting the new THT annual.  This will be the first one where I haven’t contributed something in a while.  I think I wrote in each of the last 3 or 4.

Tuesday, October 18, 2011

Outcomes by fastball speed

By Tangotiger, 01:42 PM

Jeff presents some interesting data.

This is a perfect example of a sampling bias.  While it looks like this is a complete population of pitchers, the reality is that MLB pitchers are a sample of all pitchers.  Notably, if your fastball speed is below 90mph, then the only way to be an MLB pitcher is to be able to do something else well (location, movement, other pitches, etc).

Ultimately, the bias is so strong as to render the data presented as applying only to those pitchers who happen to be in MLB, and you can’t apply it to all pitchers who throw at that speed.

(4) Comments • 2011/10/19 • Sabermetrics • Ball_Tracking

Landing/crossing spot of uncaught pitches

By Tangotiger, 09:14 AM

Great stuff from Bojan, who models the wild pitch / passed ball scenario into two types: those that actually land in front of the catcher, and those that don’t.  It adds a level of complexity, but it represents reality, so I’m very happy he went the extra mile here:

Now that we see how he models it, we get the payoff.  If it helps, you can mentally “fold” it at the line:

Then he has a ton more good stuff.  And the payoff is seeing the impact by catcher, where you want to focus on the last column, which shows that we’re talking about 4 runs of value:

We can compare to data I produced here for 1978-1990, and see that, other than Bruce Benedict, the best catchers saved 15 “passed pitches”, which converts to around 4 runs.

If we both end up in the same place, then why go to the lengths Bojan did?  Well, two good reasons.  Number one is we learn, and for that Bojan does a fantastic job.  Number two is that his model can pinpoint things with a much smaller sample size than what I would need.

Remember the thread I had yesterday about fielding opps not created equal?  The same applies here.  Whereas after a few years, we’d expect all catchers to have the same kind of catching opps (after adjusting for the identity of the pitcher), a CATCHERf/x type of system would require a far smaller sample.

Here’s his list for worst catchers at blocking:

He also shows the correlation, and from there, we can actually figure out how much to regress the observed sample.  The average sample size was over 6000 pitches for each catcher in each bucket.  To figure out how much to regress, you do (1-r)/r * N.  Since r=.68, you add about 3000 pitches of league average performance.  It looks like there’s around 40 pitches per game in his sample, so we’re talking about adding around 75 games of league average performance to get from observed rate into a true talent rate.  That is, r=.50 when G=75.
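The (1-r)/r * N rule can be sketched as follows. The inputs are the figures from the paragraph above (r=.68, about 6000 pitches, about 40 pitches per game); the function names are mine, and the n / (n + ballast) reliability form is the standard regression-toward-the-mean relation that the rule implies.

```python
def ballast(r, n):
    """Trials of league-average performance to add: (1 - r) / r * n."""
    return (1 - r) / r * n


def reliability(n, k):
    """Expected correlation at sample size n, given ballast k."""
    return n / (n + k)


pitches = ballast(0.68, 6000)  # league-average pitches to add
games = pitches / 40           # assuming about 40 pitches per game

print(round(pitches))
print(round(games, 1))
print(reliability(pitches, pitches))  # sample size equal to the ballast
```

Unrounded, the ballast comes out a little under the round 3000 pitches and 75 games used above. By construction, the correlation is .50 when the sample size equals the ballast.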

Anyway, this is in the running for my favorite research piece of the year.

(37) Comments • 2011/10/20 • Sabermetrics • Ball_Tracking • Fielding

Wednesday, October 05, 2011

Three-area strike zone

By Tangotiger, 04:51 PM

I missed Matt’s article for some reason, but someone was kind enough to point it out to me.

Great stuff.  The three-area strike zone is definitely a step in the right direction.  I’ve been trying to model it more like a six-area strike zone.  (Fans of hockey probably know what I’m talking about, with the 4 corner targets and the five-hole.)

I’m still not satisfied with the down-the-middle pitches not being as damaging as I would think.  I’m thinking it’s more like with a goalie’s 5-hole that moves as the goalie moves in his crease.  The down-the-middle is not a fixed point pitch to pitch, if that makes any sense.

(4) Comments • 2011/10/05 • Sabermetrics • Ball_Tracking

“Why Baseball Info Solutions pitch location data is the best in the industry”

By Tangotiger, 12:58 PM

John Dewan:

One of the questions that has come up is: How can the video scouts who track pitch location data at Baseball Info Solutions (BIS) be as good as Sportvision’s very cool Pitch F/X technology that tracks pitch location using hi-tech camera angles. In short, how can a human being be as good as the technology?

The answer is that, at BIS, it’s not simply human vs. technology. The equation at BIS is that technology PLUS human review is much better than technology alone. Let me explain. Pitch F/X technology is a huge step forward in baseball analytics and the pitch location data it provides is excellent. But not perfect. At BIS, they take it a step further. Thanks to the fact that Pitch F/X data is publicly available, when BIS video scouts review video to determine pitch location, they also have information about how Pitch F/X plotted the location. The video scout reviews both the actual video of the pitch and the Pitch F/X location to determine where the pitch is located. In essence, pitch location charting at BIS enhances the charting done by Pitch F/X to come up with what BIS believes to be the best data possible, a kind of Enhanced Pitch F/X.

As a way to test this, BIS conducted an impartial study. They selected the 100 pitches from their database of the 2010 season that represented the biggest discrepancies in pitch location between BIS data and raw Pitch F/X data. They then meticulously reviewed video once again on all these pitches. The video scouts reviewed the pitch location and selected the data source, either BIS or Pitch F/X, that they believed best represented the true location.

These impartial video reviewers chose BIS plotted pitch location data 55 percent more often than the raw Pitch F/X location as the correct location. The details: 59 choices for BIS pitch location (Enhanced Pitch F/X), 38 choices for the raw Pitch F/X location, 2 pitches that Pitch FX has since corrected, and one pitch where neither location was close.

While these results still leave room for improvement, they do indicate that the BIS method of enhancing pitch location data improves on the charting of raw pitch location by itself.

I’d like to see this evidence posted for everyone to discuss.

(51) Comments • 2011/10/06 • Sabermetrics • Ball_Tracking

Tuesday, October 04, 2011

PITCHf/x v BIS

By Tangotiger, 05:00 PM

Colin is showing the compiled data from PITCHf/x, and Fangraphs continues to use BIS.

I’m only doing a very cursory review here, so hopefully, someone will be inspired to do a lot more.  I sorted by “Zone_rt” or “zone%”.  I put in a minimum 2000 pitches on Colin’s site, and 500 PA on David’s.  That should correspond fairly well.  Anyway, #1 for Colin and #5 for BIS for percentage of pitches in the strike zone is Vlad (41% with PITCHf/x, 39% with BIS… guys, can you drop the decimals please?).

But, #1 for BIS is Prince Fielder at 38%, and he’s #23 with Colin at 46%.

I stopped there.  Now, I understand that there is no fixed strike zone to speak of.  And maybe Prince gets pitched more on the edges, so we expect to see some differences.

Hopefully, someone else does some leg work here and gives us some correlation numbers.

(6) Comments • 2011/10/04 • Sabermetrics • Ball_Tracking