THE BOOK--Playing The Percentages In Baseball

Wednesday, October 11, 2006

Zone Rating

By Tangotiger, 12:12 PM

When I was looking for fielding data to generate my Scouting Report ballots, it took me a while to find a good source.  Some places merged guys like Abreu into one line, or ignored one of his records, or put multi-position records into one, etc.  Everything you wouldn’t want in a database, these sites had.  Except SI.  They had the data split the right way, and in an easy-to-use format.  If we look at their SS data:


CNNSI

And select all SS with “Ch” greater than 81, we have 44 records.  It looks like “Ch” is chances, not in the PO+A+E sense, but actual chances in their zone of responsibility.  Or more likely, balls in their zone, plus outs made outside their zone, to be true to the classic definition of STATS, the data supplier to SI.

Anyway, let’s get back to our 44 SS.  Ideally I’d split them up by league, but, whatever.  I’m only trying to make a general point.  Someone else can do the heavy lifting.  We have the ZR for each SS, the Chances, and we can calculate the population average ZR easily enough (.831 for our group of 44).  All we want to do is figure out each SS’s z-score.  The random standard deviation is simply sqrt(.831*(1-.831)/Ch) for each SS.  Do that.  Then figure out each SS’s ZR relative to the .831.  The respective figures for Adam Everett are .017 and +.074.  Divide the second by the first to get the z-score.  Everett is 4.4.  To get a feeling for z-scores, just remember that in a normal distribution, 68% are within a z-score of -1 to +1, and 95% are within -2 to +2.  So, Everett is really out there.  Do this for all the SS.  Then, take the standard deviation of those z-scores.  If ZR were purely random, the answer would be something close to 1.  We get 1.37, which is significant.  Of course, with such an outlier as Everett, it’s not much of a surprise.  It would be more beneficial to do this with multiple years of data, or to also include 2B and 3B.
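As a rough sketch of the z-score step in Python: the .831 average and Everett’s +.074 come from the post, while the 486 chances is a hypothetical figure back-solved from his .017 random SD, not a number read off the SI table. The function names are made up for illustration.

```python
import math

LEAGUE_ZR = 0.831  # average ZR of the 44 SS, from the post


def random_sd(chances, p=LEAGUE_ZR):
    """Binomial standard deviation of ZR over `chances` opportunities."""
    return math.sqrt(p * (1 - p) / chances)


def z_score(zr, chances, p=LEAGUE_ZR):
    """How many random SDs a fielder's ZR sits from the group average."""
    return (zr - p) / random_sd(chances, p)


# Everett: ZR is .074 above average; 486 chances is an assumed,
# back-solved value that reproduces the post's .017 random SD.
everett_sd = random_sd(486)
everett_z = z_score(LEAGUE_ZR + 0.074, 486)
print(round(everett_sd, 3), round(everett_z, 1))
```

Running z_score over all 44 shortstops and taking the standard deviation of the results is what produces the 1.37.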

Nevertheless, what can we do with this 1.37?  Lots!

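Regression toward the mean is 1/1.37^2, or .53.  That is, the z-scores have a variance of 1.37^2, pure randomness accounts for 1 of that, and so 53% of the observed spread is noise.  Pretty nifty, eh?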

Next, figure out the average number of Ch.  A simple average is 334, but the way Andy does it in The Book, it’s 239 (*).  So, at 239 chances, the regression amount is .53.  Setting our equation x/(x+Ch) equal to .53 makes x=270.  So, with 270 chances, you regress 50% toward the mean.  270 chances is 723 innings, or 80 games, or half a season.  This is a great result!
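A small sketch of those two steps, going from the SD of the z-scores to the regression amount and then to x. The function names are hypothetical; the 1.37 and 239 are the figures from the post.

```python
def regression_fraction(sd_of_z):
    """Share of the observed spread that is random noise: 1 / SD_z^2."""
    return 1 / sd_of_z ** 2


def regression_constant(sd_of_z, avg_chances):
    """Solve x / (x + avg_chances) = regression fraction for x."""
    r = regression_fraction(sd_of_z)
    return r * avg_chances / (1 - r)


r = regression_fraction(1.37)
x = regression_constant(1.37, 239)
print(round(r, 2), round(x))
```

Carrying full precision puts x a few chances above 270; the 270 comes from rounding the regression amount to .53 before solving.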

Now, the heavy lifting.  Repeat this for all positions, and split it up by league.  Group results for IF, OF, 1B, C, P.  Tell me what you get.

***

The cool part is that you can now figure out how much to regress each player’s ZR!  Continuing with Everett, we regress his ZR by 35.7% toward the mean.  Since his ZR was +.074, this becomes +.048.  That’s his number.  That’s our best guess, using this data, as to the true talent of Adam Everett.  He makes +.048 more outs per play than the average of these 44 SS.  With about 540 plays in a full season, that’s +26 plays, or about +20 runs.
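The Everett arithmetic can be sketched as follows. The 486 chances is again a hypothetical, back-solved figure, and the runs-per-play value is an assumption chosen to match the post’s “+26 plays, or about +20 runs”; neither is stated in the post.

```python
X_SS = 270              # regression constant for SS, from the post
PLAYS_PER_SEASON = 540  # full-season plays, from the post
RUNS_PER_PLAY = 0.77    # assumed: roughly the ratio implied by +26 plays ~ +20 runs


def regress(diff, chances, x=X_SS):
    """Shrink an observed ZR difference toward zero by x / (x + chances)."""
    amount = x / (x + chances)
    return diff * (1 - amount), amount


# 486 chances is a hypothetical, back-solved figure for Everett.
true_diff, amount = regress(0.074, 486)
plays = true_diff * PLAYS_PER_SEASON
runs = plays * RUNS_PER_PLAY
print(round(amount, 3), round(true_diff, 3), round(plays), round(runs))
```

The same regress call works for any of the 44 SS: the fewer chances a player has, the larger the share of his observed difference that gets pulled back to zero.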

***

(*) To figure out the average number of chances, you do 1/average(1/ch1, 1/ch2, ..., 1/chN), which is the harmonic mean.  If you do this in Excel, create a column called 1/Ch that holds 1/Ch for each player.  Then, just do
=1/average(myCol:myCol)
with myCol simply being the column in Excel with the 1/Ch formula, like Q or something.
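For anyone not working in Excel, here is the same average in Python. The three chance totals are made up purely for illustration.

```python
from statistics import harmonic_mean, mean

# Made-up chance totals for three fielders, for illustration only.
chances = [100, 200, 400]

# The footnote's recipe: 1 / average of the 1/Ch column.
by_hand = 1 / mean([1 / ch for ch in chances])

# The standard library gives the same number directly.
built_in = harmonic_mean(chances)

print(round(by_hand, 1), round(built_in, 1), round(mean(chances), 1))
```

The harmonic mean always comes out at or below the simple average, which is why 239 is lower than 334 in the SS data.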

#1    Rally Monkey      (see all posts) 2006/10/11 (Wed) @ 17:41

Nice explanation.  I should dust off my college Statistics book.  For my ZR projections I used X = 300 (suggested by DSG).


#2    Tangotiger      (see all posts) 2006/10/12 (Thu) @ 08:40

300 seems like the right number, given that I got 270 just with SS.

***

Betancourt comes out pretty low in ZR.  As we know, the one big problem with ZR is how it counts the number of opps.  It’s silly to include all balls in a particular zone, plus balls caught outside the zone, into the denominator. 

The second problem with ZR is that the zone of responsibility does not change between LH and RH.  You’d think the guys who came up with ZR never played baseball in their lives.  Is it not apparent that a RH hitter has a different spray pattern than a LH hitter, and that fielders know this?  So, at the least, there should be two zones of responsibility, one for LH and one for RH.

The third problem with ZR is how to draw the lines of responsibility.  Unless things have changed, there’s a whole bunch of little grids, I don’t know, say 5ft x 5ft or 3ft x 3ft (not important for my purposes).  Any subzone where the league-average ZR is over .500 counts as a subzone of responsibility, and the totality of these subzones becomes the zone of responsibility.  Seeing that the average of the big zone is .8x, that’s a pretty wide disparity, including all subzones from .500 to 1.000.  Imagine a SS, maybe Betancourt, who gets a lot more balls hit into the .600 zones than the .900 zones.  But, ZR just says “hey, that’s a .8x zone, and it all counts the same”.
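To put made-up numbers on that last point, suppose the SS zone consists of only two kinds of subzones, ones where the league converts 60% of balls and ones where it converts 90%. Every value below is hypothetical; the sketch only shows how the mix of balls moves ZR for a fielder who is exactly average everywhere.

```python
def expected_zr(share_hard, p_hard=0.60, p_easy=0.90):
    """ZR an exactly average fielder would post for a given mix of balls."""
    return share_hard * p_hard + (1 - share_hard) * p_easy


typical_mix = expected_zr(1 / 3)  # one-third .600 balls: a league-like ZR
tough_mix = expected_zr(0.60)     # a fielder who sees mostly .600 balls
print(round(typical_mix, 3), round(tough_mix, 3))
```

Under these assumptions an average fielder with the tougher mix shows up well below the zone average, purely because of where the balls were hit.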

Play-by-play metrics (PBPm) address all three of these problems. 

Even so, PBPm are not without their issues.  We all watch baseball, and so, we see lots of routine balls.  My estimate is that, say, 60% of all plays are routine: even if Manny Ramirez played SS, he’d make the plays on the routine outs, and not make the plays on the routine hits.  Those plays are noise.  They should be counted as “0” or “1”.

But, PBPm can’t be that sure.  They try to include as many parameters as possible (the GB tendency of the hitter and pitcher, the park characteristics, the men-on-base situation, how hard the ball is hit) and try to come up with an estimate as close to 0 or 1 as they can, but they’ll never get there.  Likely the closest they’ll come is something like .05 or .95, partly because we don’t have data as strong as we’d want, and partly because certain fielders position themselves in a way that lets them turn a .05 play into a .50 play, say by putting on a shift that the parameters used didn’t recognize as being required.  (i.e., no “Ortiz” parameter.)
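A generic sketch of how a play-by-play metric credits a fielder, not any particular published system: each play is scored as the actual outcome minus the estimated out probability for an average fielder. The probabilities below are invented for illustration.

```python
# Each play: (estimated out probability for an average fielder, out made?)
# The probabilities are invented for illustration.
plays = [
    (0.95, True),   # routine out, converted
    (0.05, False),  # routine hit, not converted
    (0.50, True),   # genuinely contested ball, converted
    (0.50, False),  # genuinely contested ball, missed
    (0.20, True),   # tough play, converted
]


def plays_above_average(plays):
    """Sum of (actual outcome - expected out probability) over all plays."""
    return sum(int(made) - p for p, made in plays)


total = plays_above_average(plays)
print(round(total, 2))
```

Note that the two routine plays contribute +.05 and -.05 rather than exactly zero. That residue is the noise that comes from estimates stopping at .05 or .95 instead of reaching 0 or 1.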

***

People will wrongly say that since we just don’t know, better to use all the systems.  That’s completely wrong.  Since PBPm uses all the parameters of ZR, plus more, it is, therefore, better.  The reader’s only recourse is to say that he doesn’t trust the extra parameters that PBPm uses because he doesn’t trust the data quality, and that’s fine.  Therefore, he has to use ZR.

Or, better, would be for PBPm to produce multiple estimates, one that addresses the ZR problems noted above, and create a basic-metric, one that handles the data intelligently, and is not dependent too much on the quality of the scorekeeping.  (This may simply mean that we can only address the LH/RH issue.)

And then, various upgraded versions of PBPm, which include more and more parameters, so that each version of PBPm has the same “uncertainty” level around its estimate.

The LH/RH data has no uncertainty.  The location of ball hit has a certain uncertainty.  The speed of batted ball has a higher uncertainty. 

Unless PBPm publish uncertainty levels around their estimates, the right thing to do is publish different versions of their results, so that we can assign a “global” uncertainty level to each version.


#3    Rally Monkey      (see all posts) 2006/10/12 (Thu) @ 17:16

I would consider Zone Rating a PBP method.  If not, what is your definition?

I will agree that a PBP using all the parameters of ZR plus more is better.  Of course, ZR is infinitely cheaper and easier to calculate, and gets you results that are close enough anyway.

For 2B I used the process from above.  I took the top 50 2B (minimum of 75 CH).  The STDev of the z scores was 1.5, and the average chances was 174 (straight avg of 268).  So for 2006 2B, you only need 137 chances to regress 50%, assuming I did this right.


#4    tangotiger      (see all posts) 2006/10/13 (Fri) @ 08:15

I guess I should have said “other PBPm”.

I just received an email with all the heavy-lifting done.  I’ll publish the results soon.


#5    tangotiger      (see all posts) 2006/10/13 (Fri) @ 08:38

In the data that I was provided from Joe, the SD was 1.83 for AL 2B, and 1.33 for NL 2B, for a simple average of 1.58.  When I combine them as just one league, the SD is 1.60 (total of 45 2B).  The regression equation has an “x” = 136, which is 42 games. 

So, certainly the spread in talent at 2B is much higher than at SS (as you’d expect).  Placido Polanco, poster boy for UZR and one of the unsung players of our time, comes out near the top in ZR as well.  Aaron Hill leads the league, while Jorge Cantu is at the bottom.  The Fans have Cantu as the 2nd worst fielding 2B of the year:
http://www.tangotiger.net/scouting/pos2006_2B.html

Only Vidro is worse, and Vidro’s z-score is in the bottom third.

The first big disagreement between the Fans and ZR-based z-scores (I guess I’ll call them zZR), is Marcus Giles of Atlanta.

***

For 3B, the SD of the z-scores is 1.40, whether I take the average of the two leagues, or put all the players into one league.  The “x” in the regression equation is 210, which is 74 games.

***

For IF, it seems therefore that I shouldn’t try to lump them together, because the distribution of talent is nowhere near the same, with the tightest at SS, and the widest at 2B.  It’s not too long ago that I think the spread in fielding talent was probably wider at 3B than at 2B.  Perhaps a shift has been happening, as teams realize that they don’t need a premium fielder at 2B, relative to 3B.

If I go through each team, I’d guess that I’d probably find an even distribution of fielding talent at 2B and 3B.  For example, the better Cardinal fielder is found at 3B not 2B, the Tigers 2B/3B are probably comparable, while D’Backs 2B is better than their 3B.  I don’t think that’d be the case if I were to go back 15-20 years ago.


#6    tangotiger      (see all posts) 2006/10/13 (Fri) @ 08:57

Comparing 2B/3B, here are the teams with the better 3B:
Bos (Lowell)
CHW (Crede)
LA (not Kent)
Mil (not Weeks)
Pit (Sanchez)
Sea (Beltre)
SF (Feliz)
STL (Rolen)
TB (not Cantu)
Was (Zimmerman)

And better 2B:
Ari (Hudson)
Atl (Giles)
Bal (Roberts)
Cin (Phillips)
Col (Carroll)
KC (Grudz)
Phi (Utley)

I count 10 teams with the better fielder at 3B, and 7 with the better fielder at 2B.

We know the better hitters are at 3B as well.  It seems to me that there’s been a slight shift in how teams are operating.  I know, I know, “How do you know that Beltre is better than Lopez in Seattle?” The Fans seem to think that Beltre has the stronger arm, the surer hands, the better legs, and the better instincts.  It’s gotta count for something.  At the very least, if the Fans perceive this, certainly the managers may be perceiving this as well.  That is, Seattle, and the other teams (Lowell, Rolen, Zimmerman, et al) think that their better fielder is at 3B.


#7    tangotiger      (see all posts) 2006/10/13 (Fri) @ 08:58

Results above based on Fans’ data.


#8    Rally Monkey      (see all posts) 2006/10/13 (Fri) @ 10:17

Two things I’ve found -

Comparing zone rating +/- for players who’ve played both positions, 2nd base and 3rd base are even.

From my projection database last year, the replacement level for 2B was actually a little bit higher than 3B, though by only 0.1 RC/27 outs.  It’s really easy to find a halfway decent player at 2B.  Even the Royals had 3 - Grudz, German, and Keppinger.

Yet the average 3B was a much better hitter than the 2B.  I think it’s safe to say that major league 3B are better players than the 2B.


#9    Rally Monkey      (see all posts) 2006/10/13 (Fri) @ 10:21

Tango, would it be possible to use a similar process as outlined above for catcher data, if you were working with just runs above/below average, and number of innings as your sample size?  Since your mean is zero, the first problem is that the random SD (0*(1-0)) won’t get you anywhere.

Is there a way to work around this or is it a dead end?


#10    tangotiger      (see all posts) 2006/10/13 (Fri) @ 11:04

We’re never at a dead end! 

What I do in these cases is come up with something reasonable.  For example:
1 - What is the number of opps per game?  So, for catchers, what are we talking about?  Baserunning?  Blocking pitches?  etc.?  All of it?

2 - Come up with a “success rate”.  Going back to BIP, where the league DER is .700: if all I had was, say, Ozzie Smith being +300 runs in 15 full years of games, I’d do:

5 BIP per game x 162 games x 15 years = roughly 12,000 opps

league average = .700 outs per opp

Ozzie = +300 runs, or roughly +400 plays made, on 12,000 opps, or about +.030 per opp

1 SD = sqrt(.7*.3/12,000)= .004

That puts Ozzie, in this illustration, 7 SD above the league mean.
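The Ozzie illustration in a few lines of Python, using the rounded figures from the comment (12,000 opps, .700 league rate, +.030 per opp). The variable names are just for this sketch.

```python
import math

opps = 12_000        # ~5 BIP x 162 games x 15 years, rounded as in the comment
league_rate = 0.700  # league DER
ozzie_edge = 0.030   # outs per opp above average, from the illustration

one_sd = math.sqrt(league_rate * (1 - league_rate) / opps)
z = ozzie_edge / one_sd
print(round(one_sd, 3), round(z, 1))
```

The point of the exercise is that a runs-above-average total can be turned into a rate with a binomial SD once you pick a reasonable number of opportunities and a success rate.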


#11    tangotiger      (see all posts) 2006/10/13 (Fri) @ 11:07

Rally, when you compare the 2B/3B, you should do two things:

1 - Do you have an equal number of “natural 2B” going to 3B as you have “natural 3B” going to 2B?  If this is skewed toward more 2B going to 3B, then you have to remember that there is a “familiarity” penalty that is quite substantial, on the order of 4 to 8 runs.

2 - What are the physical traits of the players moving back and forth?  Because only certain types of players move between positions, the guys with poor 2B arms won’t see the light of day at 3B.  That’s why the Fan data is important.  It’s a *great* way to see the kind of guys moving between positions, compared to what the average guy at that position is.

The 2B/3B analysis would be the most illuminating of all multi-position switches.


#12    tangotiger      (see all posts) 2006/10/13 (Fri) @ 11:26

Rally, if you want to be the most conservative, set the success rate = .5, since that will maximize the variance, and set the number of opps per game as low as possible.

For example, say you have IRod as +28 plays per 140 games, and he’s “the best”.  If you set the number of plays at 2 per game, then he’s +.10 better than average.  If it’s 1 per game, then he’s +.20 better than average.

What happens with his z-Score in either case?

Well, 1 SD = .5/sqrt(n), where n = 140 or 280, meaning 1 SD = .042 or .030

His z-score is either .20/.042 or .10/.03 = 4.8 or 3.3.

After you figure the regression toward the mean equation, and then multiply his value per play above average times the average number of plays per game, you’ll probably get very similar results anyway.
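The IRod example can be sketched like this, with the +28 plays, 140 games and .5 success rate taken from the comment. The function name is hypothetical.

```python
import math

PLAYS_ABOVE_AVG = 28  # IRod's edge over 140 games, from the example
GAMES = 140
SUCCESS_RATE = 0.5    # most conservative choice: maximizes the variance


def z_for(opps_per_game):
    """Return (edge per opp, one random SD, z-score) for an assumed opp rate."""
    n = GAMES * opps_per_game
    edge = PLAYS_ABOVE_AVG / n
    one_sd = math.sqrt(SUCCESS_RATE * (1 - SUCCESS_RATE) / n)
    return edge, one_sd, edge / one_sd


edge_1, sd_1, z_1 = z_for(1)
edge_2, sd_2, z_2 = z_for(2)
print(round(edge_1, 2), round(sd_1, 3), round(z_1, 1))
print(round(edge_2, 2), round(sd_2, 3), round(z_2, 1))
```

At full precision the one-per-game z-score lands slightly below the 4.8 quoted above, which was computed from the SD rounded to .042.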


#13    Rally      (see all posts) 2006/10/13 (Fri) @ 15:42

Thanks.  I gave it a try, but didn’t work out too well.  I made up a value for chances, set at 2 per 9 innings of play, and used (.5*(1-.5)) to calculate the SD.

The SD of my z-scores was less than one.  That can’t be right.  By playing around with the fudged success/fail ratio I could get an SD over 1, but then the regression would be determined by my guess - I don’t want that.

Good thing I have multiyear data, and there’s another way to do this.

I did matched innings for catchers who had 100 or more innings in both 2005 and 2006, and got an r = .605

My average innings was 305, so we can regress catcher runs by 50% on just 198 innings caught or 22 games.

Using a straight average of innings, instead of the way it’s calculated above, it’s 500, and x = 326 innings.
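A sketch of the year-to-year route, assuming the usual relationship r = n/(n+x) between the correlation at an average sample size n and the regression constant x. That relationship is an assumption of this sketch, although it reproduces the figures above to within rounding.

```python
def regression_constant_from_r(r, avg_sample):
    """Solve r = n / (n + x) for x, given correlation r at sample size n."""
    return avg_sample * (1 - r) / r


x_harmonic = regression_constant_from_r(0.605, 305)  # harmonic-style average
x_straight = regression_constant_from_r(0.605, 500)  # straight average
print(round(x_harmonic), round(x_harmonic / 9), round(x_straight))
```

This route needs no guess about opportunities per game or success rate, which is what makes it attractive when multi-year data is on hand.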

