
THE BOOK--Playing The Percentages In Baseball


Wednesday, January 11, 2012

Umpire strike zone size

By Tangotiger, 10:22 AM

Josh takes a look.

I must say, this looks like a bit of a step back from others that I’ve seen.  Suppose that an umpire has pitchers who throw a lot of pitches in the middle of the strike zone.  The way Josh calculates it, those pitches are included in his metric.

The way the other guys have done it, they looked for a “contour” of x-inches of width where the called strike to called ball ratio was 1.  Basically, the wide pitches and the center pitches tell us nothing about the umpire. 

Or, do they?  Perhaps, rather than a step back, Josh took a step to the side.  If a pitcher is throwing a lot of pitches in the middle of the strike zone, he may be doing it in ANTICIPATION of a smaller strike zone from the umpire.  So, if an umpire happens to see a lot of down-the-middle pitches, then that may tell us something.  Except.... Josh removed any pitches that the batter swung at.

So, I think there are a lot of different considerations here, as to exactly what it is that Josh (or the interested reader) actually wants.  I’m not sure whether Josh answered his intended question or some other question.  And it’s quite possible that the huge sample size mitigates any of this anyway, since umpires are not paired with pitchers.

Anyway, lots of things to consider, and reformulate…


#1  2012/01/11 (Wed) @ 12:01

Having worked with the method that Josh is using to measure the zone, I think his methodology for zone measurement itself is perfectly fine.  He essentially does what you say above: uses the contour at which Pr(Strike)=Pr(Ball), given the location.  The methodology is what I use for my zone comparisons and is superior to other methods I’ve seen, IMHO.  This is precisely the technique I use for two academic papers I have under construction.

The advantage of the method is that it doesn’t really matter what the pitcher anticipates about the umpire’s zone size.  With his method, we still find the 50% contour that you describe.  This is the case even if 100% of the pitches are thrown either down the middle or well outside the zone (though the imputation would be less accurate...but for most umpires, there isn’t any serious issue with having a full distribution of pitch locations).
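To make that claim concrete, here is a minimal simulation sketch, not Josh's code or data. It assumes a made-up umpire whose strike probability depends only on horizontal distance from the center of the plate (the 0.83 ft edge and 0.10 ft softness are invented numbers), and it swaps the GAM for a crude binned strike rate. Two equally invented pitch-location mixes are fed to the same umpire.

```python
import math
import random

EDGE_FT = 0.83       # hypothetical true half-width of the called zone
SOFTNESS_FT = 0.10   # hypothetical fuzziness of the zone edge

def strike_prob(dist_ft):
    """Assumed umpire: Pr(called strike) falls off smoothly past EDGE_FT."""
    return 1.0 / (1.0 + math.exp((dist_ft - EDGE_FT) / SOFTNESS_FT))

def spread_out(rng):
    """Pitch locations scattered widely around the plate."""
    return rng.gauss(0.0, 0.8)

def middle_heavy(rng):
    """Mostly down the middle, with a minority thrown wide."""
    if rng.random() < 0.7:
        return rng.gauss(0.0, 0.3)
    return rng.gauss(0.0, 1.0)

def simulate_called(n, draw_location, rng):
    """Return (distance from center, was called strike) for n called pitches."""
    pitches = []
    for _ in range(n):
        dist = abs(draw_location(rng))
        pitches.append((dist, rng.random() < strike_prob(dist)))
    return pitches

def fifty_percent_edge(pitches, bin_width=0.1, max_dist=2.0):
    """Distance at which the binned strike rate crosses 50% (interpolated)."""
    n_bins = int(round(max_dist / bin_width))
    called = [0] * n_bins
    strikes = [0] * n_bins
    for dist, is_strike in pitches:
        b = int(dist / bin_width)
        if b < n_bins:
            called[b] += 1
            strikes[b] += is_strike
    rates = [s / c if c else None for s, c in zip(strikes, called)]
    for i in range(n_bins - 1):
        hi, lo = rates[i], rates[i + 1]
        if hi is None or lo is None:
            continue
        if hi >= 0.5 > lo:
            center = (i + 0.5) * bin_width
            return center + (hi - 0.5) / (hi - lo) * bin_width
    return None

def strike_share(pitches):
    """Plain fraction of called pitches that were called strikes."""
    return sum(is_strike for _, is_strike in pitches) / len(pitches)

rng = random.Random(2012)
for label, draw in (("spread out", spread_out), ("middle heavy", middle_heavy)):
    sample = simulate_called(40000, draw, rng)
    print(f"{label}: 50% edge = {fifty_percent_edge(sample):.3f} ft, "
          f"strike share = {strike_share(sample):.3f}")
```

Under these assumptions the overall strike share moves a good deal with the pitch mix, while the estimated 50% edge should land near the same spot for both.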

If pitchers are throwing to the middle of the plate more due to the smaller strike zone, the hypothesis is that we would pick it up in the kwERA (easier pitches to hit and/or possibly more walks, since fewer are being called strikes if they aren’t changing their behavior).  As you mention, there are important considerations here with non-called pitches as well that may or may not be accounted for.

I agree it would also be interesting to look at the average location overall by umpires...but as for calculation of the strike zone size itself (leaving aside its impact on the game), I find this method to be the most appropriate one out there. 

Of course, the contour chosen is up to the researcher, but I also use the same choice that Josh does here (50% contour...i.e. more likely to be called a strike than a ball means “in the strike zone” and less likely to be called a strike than a ball means “outside the strike zone”).


#2  Tangotiger  2012/01/11 (Wed) @ 12:39

If he did use the contour method (a bunch of concentric circle-ish lines, one inside the other), then that’s fine.  That’s what I meant about the x-width, and 50% of called strikes in that contour region.

But, Josh is NOT using the contour method (as far as I can tell).  This is especially obvious when he talks about intentional balls.  I mean, why even talk about them, if he’s using the contour method, since they’d virtually never be in the sample of the particular 50% contour.

This is how I was reading Josh’s method: if an umpire had 3000 called pitches (including intentional balls), then take the 1500 called strikes closest to the center.  That becomes his “called strike” region.

Did I read him wrong?


#3  2012/01/11 (Wed) @ 12:57

Just sounds like a misunderstanding, maybe he didn’t explain well enough.

Technically, calling it a ‘contour’ method is too vague, as you can draw contours for both densities and for probabilities.  But I understand the confusion here.

What you suspect Josh did amounts to contours from a kernel density estimate.  If that is the case, then that is not a good way to do it.  This would be just the % of called pitches that are called strikes.  This would make the strike zone much smaller than it would otherwise be (it would likely be at the 100% ‘contour’ the way you want to see it).  Here, we have:

px*pz contour at 50% == 1500/3000 as you say.

However, Josh uses a generalized additive model, in which it is modeled just like a binomial regression, but with smooth functions for pitch location.  So the 50% contour is the line at which Pr(Strike)=Pr(Ball).  If we say f() is a smooth function, then we have:

g(u) = f(px,pz)

Where u is 0 for a ball call, and 1 for a strike call, and g() is the logit link function.  The only difference between this and a logistic regression is that there aren’t parameter estimates for px and pz since they are non-monotonic due to the spatial features of the zone.  They are estimated non-parametrically and cross-validated using the data to ensure you don’t over/under fit.
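Written out, and reading u as the mean of the 0/1 call (that is, the probability of a called strike at a given location), the model and its 50% contour look roughly like the following. The symbol \mu is introduced here only for clarity and is not notation from Josh's article.

```latex
\operatorname{logit}(\mu) = \log\frac{\mu}{1-\mu} = f(p_x, p_z),
\qquad \mu = \Pr(\text{strike} \mid p_x, p_z)

\text{50\% contour: } \{(p_x, p_z) : f(p_x, p_z) = 0\},
\quad \text{since } \operatorname{logit}(0.5) = 0
```

So the zone boundary is simply the set of locations where the fitted smooth crosses zero.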

This is also what he does for the swing areas, but “u” just equals 1 if the batter swings, and 0 if he does not.  Including intentional balls WOULD affect swing RATE overall, but should not affect the swing area estimation if done as I believe Josh did.

Josh can correct me if I’m wrong here, but I’m nearly 100% certain that he did it the way you are hoping he did.  I generally get rid of intentional balls and pitch outs in my models, but I have found that they don’t have much of an impact on the contour if done the latter way (and they really shouldn’t).  They certainly would in the 1500/3000 that you describe above.
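A rough sketch of the difference between the two readings, under loudly stated assumptions: the umpire, the 0.83 ft edge, and the pitch mix are all invented; the "density-style" cutoff is taken, hypothetically, to be the distance from center that encloses half of all called pitches; and the share of intentional balls is exaggerated to make the effect visible. The probability contour is again approximated by binned strike rates rather than a GAM. A binned rate is local by construction, so it is unaffected by far-away pitches; a GAM's smooth is only approximately local.

```python
import math
import random
import statistics

EDGE_FT = 0.83       # hypothetical true half-width of the called zone
SOFTNESS_FT = 0.10   # hypothetical fuzziness of the zone edge

def strike_prob(dist_ft):
    """Assumed umpire: Pr(called strike) falls off smoothly past EDGE_FT."""
    return 1.0 / (1.0 + math.exp((dist_ft - EDGE_FT) / SOFTNESS_FT))

def simulate_called(n, rng):
    """Return (distance from center, was called strike) for n called pitches."""
    pitches = []
    for _ in range(n):
        dist = abs(rng.gauss(0.0, 0.8))
        pitches.append((dist, rng.random() < strike_prob(dist)))
    return pitches

def intentional_balls(n, rng):
    """Pitches thrown far outside, always called balls (made-up locations)."""
    return [(rng.uniform(2.5, 3.5), False) for _ in range(n)]

def fifty_percent_edge(pitches, bin_width=0.1, max_dist=2.0):
    """Distance at which the binned strike rate crosses 50% (interpolated)."""
    n_bins = int(round(max_dist / bin_width))
    called = [0] * n_bins
    strikes = [0] * n_bins
    for dist, is_strike in pitches:
        b = int(dist / bin_width)
        if b < n_bins:
            called[b] += 1
            strikes[b] += is_strike
    rates = [s / c if c else None for s, c in zip(strikes, called)]
    for i in range(n_bins - 1):
        hi, lo = rates[i], rates[i + 1]
        if hi is None or lo is None:
            continue
        if hi >= 0.5 > lo:
            center = (i + 0.5) * bin_width
            return center + (hi - 0.5) / (hi - lo) * bin_width
    return None

def half_of_pitches_radius(pitches):
    """Hypothetical density-style cutoff: distance enclosing half of all pitches."""
    return statistics.median([dist for dist, _ in pitches])

def strike_share(pitches):
    """Plain fraction of called pitches that were called strikes."""
    return sum(is_strike for _, is_strike in pitches) / len(pitches)

rng = random.Random(2012)
base = simulate_called(40000, rng)
padded = base + intentional_balls(4000, rng)
for label, sample in (("without intentional balls", base),
                      ("with intentional balls", padded)):
    print(f"{label}: 50% edge = {fifty_percent_edge(sample):.3f} ft, "
          f"half-of-pitches radius = {half_of_pitches_radius(sample):.3f} ft, "
          f"strike share = {strike_share(sample):.3f}")
```

In this toy setup the far-outside pitches shift the half-of-pitches radius and the overall strike share, but leave the 50% edge where it was.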


#4  2012/01/11 (Wed) @ 12:59

Whoops, u above is the mean of the “0” and “1” binomial dependent variable.  You still get the idea, though.


#5  Tangotiger  2012/01/11 (Wed) @ 13:16

Right, exactly on the intentional balls.  Almost all the intentional balls (if not all) are not going to be in the 50% region.  So, even talking about them was pointless in my view.  That’s where I got my confusion.

Given Josh’s other work on the swing areas, where he does use the concentric circles, it seemed odd that he didn’t maintain that.

So, he probably did, and I read him wrong.


#6  Josh Weinstock  2012/01/11 (Wed) @ 14:07

I did use the contour method. It was not based on a kernel density estimate, but on a GAM model as Millsy correctly guessed.

I did basically the same thing that I did with swing area with the concentric circles. The circle now is just the 50 percent contour, whereas before it was lower. Guess I wasn’t clear about that.


#7  Tangotiger  2012/01/11 (Wed) @ 15:18

Josh: what percentage of the intentional balls fell at the 50% contour?  If it’s zero, then this means it’s irrelevant how you handled the intentional balls, correct?

Put another way, what is the closest intentional ball to the contour line?  Must be at least 6 inches, no?


#8  Josh Weinstock  2012/01/11 (Wed) @ 15:27

I haven’t checked, but I can’t imagine any intentional balls fell within the 50 percent contour. It’s probably 0, and if not, it’s like 1 or 2 intentional balls when the PITCHf/x cameras were badly miscalibrated. The intentional balls are so far away from the contour line that they make no difference. That’s why I included them, because it wouldn’t have made a difference to exclude them.

Same thing with pitches in the dirt, or any other pitch with a very low likelihood of being called a strike. These pitches wouldn’t have changed anything.


#9  Tangotiger  2012/01/11 (Wed) @ 15:35

Right exactly, hence my initial confusion.

Ok, we’re good.

