THE BOOK--Playing The Percentages In Baseball


Thursday, January 18, 2007

Do Umpires Have Their Own Strike Zone?

By Tangotiger, 01:29 PM

Yes.  Here’s how to tell.


DanAgonistes asks: “It would be interesting to see whether there is any trend that holds over from year to year.”

We don’t need year-to-year data.  We just follow the same process we follow for everything else.  Figure out what a random distribution would look like.  Figure out the actual distribution.  The difference is the umpires’ inherent “skill”.  Convert that measure into a correlation.

Using the wonderfully-compiled umpire data here:
http://www.baseballprospectus.com/statistics/sortable/index.php?cid=131616

I added the “strikes + balls” of each umpire.  (Balls in Play were excluded.  And, it seems that the data is somewhat off, as I expected strikes+balls+battedBall to equal pitches.  It doesn’t.)

I took the 64 umpires with the most strikes+balls (which I’ll now call pitches; I hope that is not confusing). The strike percentage, or strikes per pitch, was .534.  For each umpire, I figured what one standard deviation would be if all calls followed a binomial distribution around this population mean.  I then figured how many standard deviations each umpire was above the population mean, which is his z-score.  Finally, I took the standard deviation of the z-scores.

(A similar process was followed here in more detail: http://www.tangotiger.net/dipsbands.html )

If the standard deviation of the z-scores were 1.00, then we would know the spread was all random, and that umpires call it by the book.  The standard deviation of the z-scores was a high 1.65.  This translates into a correlation coefficient of r = .63 (or 1 - 1/1.65^2).
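The z-score procedure can be sketched in Python as follows. This is only an illustration, not the spreadsheet actually used for the post: the function name zscore_spread and its list inputs are hypothetical, and it assumes the population form of the standard deviation (dividing by the number of umpires).

```python
import math

def zscore_spread(strikes, pitches):
    """Return (population strike rate, SD of per-umpire z-scores, implied r).

    strikes[i] and pitches[i] are the totals for umpire i (hypothetical inputs).
    """
    # Population mean: total strikes over total pitches.
    p = sum(strikes) / sum(pitches)

    # z-score for each umpire: distance from the mean in binomial SD units.
    z = [
        (s / n - p) / math.sqrt(p * (1 - p) / n)
        for s, n in zip(strikes, pitches)
    ]

    # Population standard deviation of the z-scores (an assumption).
    mean_z = sum(z) / len(z)
    sd_z = math.sqrt(sum((v - mean_z) ** 2 for v in z) / len(z))

    # r = 1 - 1/sd_z^2, the conversion used in the post.
    r = 1 - 1 / sd_z ** 2
    return p, sd_z, r
```

If every umpire called the same zone, sd_z would come out near 1.00 and r near zero. Plugging in the post's 1.65 gives 1 - 1/1.65^2, or about .63.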

Another way to get to that number is to note that with an average of 8000 pitches, the random standard deviation is .0056.  The actual observed standard deviation was .0092. 

Regression toward the mean = var(luck) / var(observed)
where var = variance, and is standard deviation squared.

So, regression toward the mean = (.0056/.0092)^2 = .37

Note also that r = 1 minus regression toward the mean, making r = .63
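The same arithmetic, written out as a small Python sketch. The helper names binomial_sd and regression_to_mean are made up for illustration; the inputs (.534, 8000 pitches, observed SD of .0092) are the figures quoted above.

```python
import math

def binomial_sd(p, n):
    """Random (luck) standard deviation of a rate p over n trials."""
    return math.sqrt(p * (1 - p) / n)

def regression_to_mean(sd_luck, sd_observed):
    """Share of the observed variance that is luck: var(luck) / var(observed)."""
    return (sd_luck / sd_observed) ** 2

sd_luck = binomial_sd(0.534, 8000)            # roughly .0056
regress = regression_to_mean(sd_luck, 0.0092)  # roughly .37
r = 1 - regress                                # roughly .63

print(round(sd_luck, 4), round(regress, 2), round(r, 2))
```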

Now that we have our r of .63, and we know the number of pitches was 8000, we can create our regression toward the mean equation:

regression toward the mean = 4700 / (4700 + pitches)

If you have an umpire with 8000 pitches, you regress his actual observed strike percentage 37% toward the mean.  Greg Gibson had 9037 pitches, meaning his regression toward the mean is 34%.  His observed strike percentage is 51.9%, which we regress 34% of the way toward .534, making Gibson’s “skill” strike percentage .524.

McClelland is at .520 and Eddings at .550.
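A short Python sketch of where the 4700 comes from and how it is applied. The function names are hypothetical. The constant is just the number of pitches at which regression toward the mean would be 50%, which works out to pitches * (1 - r) / r.

```python
def regression_constant(r, n):
    """Pitches to add so that k / (k + n) equals 1 - r."""
    return n * (1 - r) / r

def regressed_rate(observed, n, mean=0.534, k=4700):
    """Regress an observed strike rate toward the population mean."""
    weight = k / (k + n)  # share of regression toward the mean
    return observed + weight * (mean - observed)

k = regression_constant(0.63, 8000)   # about 4700
gibson = regressed_rate(0.519, 9037)  # about .524

print(round(k), round(gibson, 3))
```

Only Gibson is worked here, since his observed rate and pitch count are the only ones given in the post.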

If anyone wants to do a year-to-year correlation, I would bet that if the average number of pitches in the sample was around 8000, the correlation would be an r = .63.

#1    David Gassko      (see all posts) 2007/01/18 (Thu) @ 15:03

Tom,

Please kindly cease-and-desist in stealing my Hardball Times article ideas. smile

(Note: I still plan on doing the piece, but you’ve certainly taken some of “oomph” out of it. I guess it’s my fault for being too slow.)


#2          (see all posts) 2007/01/18 (Thu) @ 15:45

Awesome!  Tom, did you include the variance caused by umpires not necessarily seeing the same set of pitchers?  It’s probably pretty small, but still ...


#3    Peter Jensen      (see all posts) 2007/01/18 (Thu) @ 15:50

I have analyzed called strike/ball ratios for 2003-2005 using retrosheet data.  Eddings was by far the most prolific strike caller in this period, more than 3 SDs above the mean.  McClelland and Gibson were near the top of the small strike zone umps with called strike rates around 1.5 SDs below the mean.


#4    tangotiger      (see all posts) 2007/01/18 (Thu) @ 16:33

Phil,

You are right that the variance should include the pitchers (and batters and parks).  The full equation would be:

var(observed) = var(true) + var(luck)

var(true) = var(ump) + var(pit) + var(hit) + var(park)

From the perspective of the umpires, the variance of the true skill of the pitchers of one umpire must be pretty much the same as the variance for another umpire.  So, the variance of these variances would be zero.  Same for hitters and parks.  Unless of course umpires are not rotated randomly.

***

DSG: don’t worry about any oomph.  This site gets a tenth the visitors of THT, and it’s certainly not written in a reader-friendly manner.


#5    Ryan Armbrust      (see all posts) 2007/01/18 (Thu) @ 16:48

Nice work. It looks like you’ve proven, in a more statistically confident way, the conclusions I’ve come to in the last two weeks.

http://thepastime.net/category/general-baseball/umpires/

Specifically, the balls and strikes data I worked with here:

http://thepastime.net/2007/01/13/graphing-the-umpires/

If you’re interested, I’ve compiled complete umpire data from 1995 to 2006 in a spreadsheet. 1998 is partially incomplete, due to bad ID data from BP that I can’t fix. BP’s umpire data is fraught with ID errors; in fact, it was somewhat difficult to combine the years. I’d be glad to share that data with you.

Also, it would mean a lot to me if you (and David Gassko) would consider mentioning my work on umpire stats in your posts/articles. Thanks.


#6          (see all posts) 2007/01/18 (Thu) @ 16:53

Hi, Tango (#4),

Not sure what you mean about “variance of the variances”.  The individual variances are greater than zero, and will therefore reduce var(ump), which you’re trying to estimate, no?

Even if the rotation is random, there will be some positive var(pit) and var(hit).


#7    tangotiger      (see all posts) 2007/01/18 (Thu) @ 17:17

Ryan: I had a recent thread to your site.  Check out the main page.

Phil: I definitely didn’t speak well.  I should have said variance of the means, for each ump.  Each umpire sees a certain distribution of pitchers (obviously) of a certain mean.  An umpire who only calls games by Santana, Halladay and Oswalt will get lots of strike calls.  When you look at the mean quality of pitchers for each umpire, it should come out to pretty much zero.  So, the variance of these means will be zero.


#8    tangotiger      (see all posts) 2007/01/18 (Thu) @ 17:19

That is, the “quality” of pitching for each umpire is such that his “opponent” (the pitcher) will have a true strike skill of 53%.  His other opponent (the batter) will also have a true strike skill of 53%, for each umpire.  The umpire will call 1/30th of each of his games at each park (or, the skill of each park in terms of visibility is such that it’s average).


#9          (see all posts) 2007/01/18 (Thu) @ 17:37

Regarding Ryan’s graphs here:

http://thepastime.net/2007/01/13/graphing-the-umpires/

I find it interesting that McClelland is one of the more well-respected umpires (I think - anecdotally), even by pitchers, but he’s actually pretty unfavorable to pitchers.  Kind of reminds me of how students never think they like the strict disciplinarian teachers they have in high school, but after it’s over they realize that they often got what they deserved, and were treated fairly.


#10    Tangotiger      (see all posts) 2007/01/18 (Thu) @ 18:19

It’s possible that the umpires’ opponents are not random.  Even such benign things as GB rates and line drive rates have z-scores above 1.00.  The GB, FB, line drive, and pop up rates all have z-scores between 1.1 and 1.2, which implies a correlation of r between 0.2 and 0.3.  It’s not much, considering we are talking about 2000 BIP for each ump.  But, it may indicate that the umpire is not seeing random action (pitcher, hitter, and/or park).

Another interesting tidbit is that while the correlation between strikes % and GB and LD is virtually zero, it was .06 with FB and .11 with popups.


#11    MGL      (see all posts) 2007/01/18 (Thu) @ 19:23

Since you are including swinging strikes in your “strikes” (and not just called strikes), I think, why are you excluding balls in play?  Aren’t they the same as a swinging strike from the standpoint of balls/strikes?  In fact, isn’t a ball in play more likely to be an actual pitch in the strike zone than a swinging strike?

I only use called strikes and balls in my umpire analyses, since those strikes are definitely in the umpire’s strike zone, by definition.  That reduces the sample size of course, but the data is more “pure.”

You are obviously increasing the sample by using swinging strikes, but I still don’t understand your rationale for not using balls in play.  Unless I am missing something…


#12    Ryan Armbrust      (see all posts) 2007/01/18 (Thu) @ 19:38

I agree with MGL’s comment above. In my work, I decided to lump all non-balls (called strikes, balls in play, foul balls) together as “in the zone”.

I think that it’s more telling if you consider just the percentage of all pitches thrown (balls+nonballs) that are called balls.

It seems to me that the only outcome of a pitch that the umpire completely controls is to call a ball. It would then follow that if he calls more balls, as a portion of all pitches seen, he has a smaller strike zone. Correct?


#13    Trader Joe      (see all posts) 2007/01/18 (Thu) @ 19:41

Fascinating findings.  Doesn’t this analysis probably underestimate the variance in strike zones across umpires? That would be true, I think, if we make the reasonable assumption that pitchers adjust their pitching to the biases of the umpire they’re facing on a given day (and thereby reduce the effects of umpire bias).

Another interesting investigation in this context would be if it were possible to examine whether and how the variance has changed over time (or between parks) as a result of QuesTec.  One might hypothesize that umpires are being driven to a more consistent strike zone over time. That should be testable.


#14    MGL      (see all posts) 2007/01/18 (Thu) @ 19:41

I have not used 06 data yet, but based on 02-05 ball and called strike data from Retrosheet, the most extreme umps are:

pitcher-friendly

1) Eddings (by far)
T2) Craft
T2) Hernandez
T2) Reed

hitter-friendly

1) Scott
T2) Cousins
T2) Davis
T2) Hollowell
T2) Hoye
T2) Kelley
T2) Wolf


#15    MGL      (see all posts) 2007/01/18 (Thu) @ 19:46

Ryan, yes of course that is the case all other things being equal.

If you wanted more “pure” data, as I said before, you could use only called strikes since that eliminates any fluctuation in batters swinging at balls outside of the zone.

I used to think that was the better way, but on reflection, it is probably better to use all pitches since that greatly increases the sample size and the fluctuation on those swinging strikes (including balls in play) should be minimal.


#16    MGL      (see all posts) 2007/01/18 (Thu) @ 19:54

Trader Joe, good point.  Very good point.  If pitchers (and hitters) adjusted their pitches, then it would greatly reduce the observed variance among umpires.

I am not sure how much pitchers do that.  I suspect a little, but not a lot, especially with umpires who are not well known as hitters’ or pitchers’ umps (but actually are one or the other).

Also, I think that Ryan, in his series of articles, showed that Questec has definitely reduced the variance among umps, which goes along with my findings.  I also found, several years ago, that in Questec parks (there used to be like 10 or 11 of them - I am not sure how many there are now), umpires had different zones than in non-Questec parks.  Basically, in Questec parks, the extreme umps like Eddings and Davis were not so extreme, which makes sense of course.  In fact, given that Questec exists in at least 1/3 of the parks, I don’t know how Eddings and his ilk get away with being so extreme.  Wouldn’t he and the other “extremists” get terrible marks from Questec?  I suspect that no one (umpires union, MLB, etc.) takes the Questec grading seriously anymore, OR maybe all they care about is consistency within each umpire’s zone…

BTW, how the heck do you highlight text in other posts in IE?  I can’t seem to do that.


#17          (see all posts) 2007/01/18 (Thu) @ 21:32

Tango (#7),

But var(pit) can’t be zero unless all the umpires see *exactly* the same pitchers.  In real life, it might be small, but not zero.  What I’m wondering is: how small? 

The theoretical SD should have been .0056.  But after the effects of random pitcher distribution, what would it be?  .00561?  .0057?  .0060?  .0062?

It wouldn’t affect the conclusion that umpires have different strike zones, because obviously the SD won’t rise all the way to the observed .0092.


#18    MGL      (see all posts) 2007/01/18 (Thu) @ 22:11

Yes, Phil is right.  The SD expected from the binomial distribution is exact only when the mean is constant.  It is not, regardless of whether each umpire faces the same pool of pitchers. 

The theoretical SD, even if all umpires have the same strike zone is going to be larger if each pitcher has a different mean strike % which they clearly do.  That is what Phil is referring to I think.

As he says, how much higher, we don’t know.  It depends on the SD of means among the pitchers.  It probably does not raise the .0056 all that much.  But it has nothing to do with what the pool of pitchers are that each umpire “faces.” As I said, even if all the umpires faced exactly the same pool of pitchers AND all the umpires had the same strike zone, the variance in strike % among the umpires would be the binomial variance PLUS the “skill” variance among the pitchers.

It is the same thing with team w/l%.  If a team has an average theoretical wp% of .500, the SD expected in 162 games is NOT 6.36 games (as expected from the binomial distribution).  It is larger - that plus some more variance from the fact that each day a team’s true wp is not exactly .500.


#19    Mike      (see all posts) 2007/01/18 (Thu) @ 22:18

I have a book on gambling that lists the best over and under umpires:

UNDER UMPIRES (Pitcher umpires)
Doug Eddings
Brian Gorman
Angel Hernandez
John Hirschbeck
Ron Kulpa
Alfonso Marquez
Jerry Meals
Dan Iassogna
Jeff Nelson
Hunter Wendelstedt
Tim Tschida
Bill Welke

OVER UMPIRES
Gary Cederstrom
Jerry Crawford
Gary Darling
Larry Poncino
Mike Reilly
Paul Schrieber
C.B. Bucknor
Derryl Cousins
Mike Everitt
Ed Montague
Charlie Reliford
Tim Timmons


#20          (see all posts) 2007/01/18 (Thu) @ 22:20

Tango #10: if umps are known to have a relatively low strike zone, that would probably influence GB rates.  Certainly when an ump calls a questionably low strike early in a game, pitchers on both teams are told by their coaches that he’s “giving the low strikes” and thus you aim for that area more than you otherwise would.  If an ump is predisposed to calling these low strikes, batters are more likely to miss or make incidental contact above the ball, and hit a GB.  Just a potential non-biased-sample source of GB correlation.


#21    MGL      (see all posts) 2007/01/18 (Thu) @ 23:09

Those gambling sources that list “over” and “under” umpires often go by rpg, which is a ridiculously coarse measure of an umpire’s strike zone.  They sometimes use K and BB rates, which are much better than rpg, but not as good as balls/strikes of course.  And they don’t usually account for sample size, regression, etc., although most of the umpires on that list appear to be experienced ones with lots of games under their belt. And of course, these days with umpires doing NL and AL games (previously, there were NL and AL umps), you would certainly have to normalize rpg, as umpires will have different % of AL or NL games.


#22    Peter Jensen      (see all posts) 2007/01/19 (Fri) @ 00:19

MGL - Are you sure about Scott? I have him as slightly more likely to call a strike than the average ump for 2003-2005.  He must have had a strange 2002 to top your list.  Or he quickly adjusted to QuesTec starting in 2003.  Otherwise my list for 2003-2005 is very similar to yours.  I still think called strikes/balls is the best method.  The umpire has no decision to make on balls that are swung at so I don’t see how including those pitches actually increases the sample size.


#23    tangotiger      (see all posts) 2007/01/19 (Fri) @ 00:35

Right, the true SD would be the sum of each individual binomial.  Sticking with MGL’s example of using teams, what would we get if we had say these opponents: .400, .450, .475, .500, .525, .550, .600?

This gives us a true opponent skill standard deviation of .062, which is close to reality (.060 true standard deviation).

In each case, the random standard deviation of winning percentage over 162 games is between .0385 (for the .400 and .600 teams) and .0393 (the .500 team). The overall becomes .0390.  As you can see, not that much of a difference from assuming each team is in fact a .500 team.
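Those team figures can be checked with a few lines of Python. This is a sketch that assumes a 162-game schedule and equal weight for each of the seven opponents. Using the population form, the spread of opponent skill comes out around .061, in the same neighborhood as the figure quoted above.

```python
import math

opponents = [0.400, 0.450, 0.475, 0.500, 0.525, 0.550, 0.600]
games = 162

# Spread of true opponent skill (population standard deviation).
mean = sum(opponents) / len(opponents)
skill_sd = math.sqrt(sum((p - mean) ** 2 for p in opponents) / len(opponents))

# Random SD of winning percentage over a season, one value per opponent.
per_team_sd = [math.sqrt(p * (1 - p) / games) for p in opponents]

# Overall random SD: average the per-game variances, then scale by games.
avg_var = sum(p * (1 - p) for p in opponents) / len(opponents)
overall_sd = math.sqrt(avg_var / games)

print(round(skill_sd, 3), round(min(per_team_sd), 4),
      round(max(per_team_sd), 4), round(overall_sd, 4))
```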

Going back to our pitchers, let’s assume our average pitcher is a 53% strike pitcher, and the range is between 45% and 61%.... well, you can see where I’m going here.  It’s pretty much a similar distribution to what we get with teams, right?  If the true distribution of strike % was uniform, and spread between 25% and 80%, then maybe MGL’d have a point on this one.  But, the reality of MLB is that, on a pitch-by-pitch basis, pitchers are very clustered around the mean (like teams are). 

***

Phil: yes, it won’t be exactly zero. 

The observed standard deviation was .0092, and the random was .0056.  Subtracting the variances leaves a true standard deviation of .0073.

That true variance is the variance of the true ump skill plus the variance of the true pitcher skill + ... .  How much do we think the non-random pitcher distribution could be?  A standard deviation of .0020?  In that case, the true umpire standard deviation would be .0070.

Let’s assume the pitcher, hitter and park were each .0020.  The true umpire standard deviation would be .0064.
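The subtraction is done on variances, not on standard deviations. A minimal Python sketch, with a made-up helper name and the .0020 figures taken as the guesses they are:

```python
import math

def remove_variance(sd_total, *sd_parts):
    """Standard deviation left after subtracting the listed components' variances."""
    return math.sqrt(sd_total ** 2 - sum(s ** 2 for s in sd_parts))

true_sd = remove_variance(0.0092, 0.0056)                     # about .0073
ump_sd_pitchers_only = remove_variance(true_sd, 0.0020)       # about .0070
ump_sd_all_three = remove_variance(true_sd, 0.0020, 0.0020, 0.0020)  # about .0064

print(round(true_sd, 4), round(ump_sd_pitchers_only, 4), round(ump_sd_all_three, 4))
```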

I’d be surprised if the non-random nature of the hitters, pitchers, park could be that much, but who knows.  Just a matter of someone rolling up their sleeves.


#24    tangotiger      (see all posts) 2007/01/19 (Fri) @ 00:38

Peter: if the non-called strikes are truly random, then they are noise and the increase in sample size makes our correlation *worse*.  You are right.  However, if they are not random, then they could help.  I don’t know the answer.

What would definitely help is to include the count.  After all, if you have an umpire that pushes a pitcher to a 3-0 count, the called balls/strikes will be different from other umpires that have lots of pitcher counts.

So, if we want to do this right, let’s make it called balls and called strikes, by count, by umpire (excluding IBB).
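One way this could be set up, as a rough Python sketch. The record layout (umpire, count, is_strike) and the function name are assumptions, not an existing data format. Each called pitch is compared with the league called-strike rate at that count, so an umpire who sees a lot of 3-0 counts is not judged against the overall average.

```python
from collections import defaultdict

def count_adjusted_strikes(calls):
    """Called strikes above expectation per called pitch, by umpire.

    calls: list of (umpire, count, is_strike) for called pitches only
    (no swings, no intentional balls). Hypothetical record layout.
    """
    # League called-strike totals at each count: count -> [strikes, pitches].
    league = defaultdict(lambda: [0, 0])
    for _, count, is_strike in calls:
        league[count][0] += is_strike
        league[count][1] += 1

    # Per umpire: [actual strikes, expected strikes, called pitches].
    totals = defaultdict(lambda: [0, 0.0, 0])
    for ump, count, is_strike in calls:
        strikes, pitches = league[count]
        totals[ump][0] += is_strike
        totals[ump][1] += strikes / pitches
        totals[ump][2] += 1

    return {ump: (a - e) / n for ump, (a, e, n) in totals.items()}
```

The resulting per-umpire rates could then go through the same z-score and regression steps as in the original post.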


#25    MGL      (see all posts) 2007/01/19 (Fri) @ 03:02

I knew there was some reason why I did not include swinging strikes before!  Sure, they are basically noise.  The idea that the batter will tailor his swinging or not swinging to the size of the zone is true but I don’t think the effect is large enough to turn the noise into useful data.  So I will continue to use balls and called strikes only.

In 46 games from 03 to 05, Scott had a ball to called strike ratio of 2.46.  Compare that to Eddings at 2.03, Cousins at 2.67, and the league average at 2.50.  I don’t know from where I got that he was a hitter’s umpire.  During the season, when I watch games, I note any extreme umpires.  I may have seen Scott in a game where nothing was a strike.  I don’t remember.  Also, I weight the last 3 years to come up with an “umpire projection.” Umpires do change their zones from one year to another for various reasons, beyond normal fluctuations.


#26    tangotiger      (see all posts) 2007/02/01 (Thu) @ 10:09

http://www.hardballtimes.com/main/article/the-outside-corner/

And in fact, there has been a greater than 25% reduction in the true spread between umpires since the MLB first installed QuesTec in selected major league parks… Indeed, Eddings and Miller, for example, had the two largest strike zones between 1999 and 2001, calling 2.7% and 2.4% more strikes than expected, respectively… In the past three years, they have still called a higher number of strikes than anyone, but only 1.9% and 1.0% more than the league average, respectively.


#27    Tangotiger      (see all posts) 2007/11/28 (Wed) @ 11:26

Which umpires are calling the outside pitches and which are not:
http://www.hardballtimes.com/main/article/a-zone-of-their-own/

Related thread:
http://www.insidethebook.com/ee/index.php/site/comments/where_is_the_strike_zone_exactly/

