THE BOOK--Playing The Percentages In Baseball


Wednesday, August 16, 2006

Primates Look at Fielding

By Tangotiger, 11:58 AM

Looks like some interesting fielding numbers will be published for our consumption.

In the long-run, a player’s ZR should pretty much match his UZR, making this a good metric when looking at careers.

And yes, the aging pattern for fielding follows a similar one to hitting, just a tad earlier.  If you remember, a hitter’s speed skills peak at around age 23, while his other skills peak later.  For fielding, speed is a lot more important than it is for hitting, and therefore fielding peaks a little earlier than hitting does.  IIRC, it’s about 1 year earlier than for hitting.  And for the SS/CF positions, it was 2 years earlier.  1B peaked 1 year later than hitting, I believe.

In any case, that was based on 4 years of data.  With the Primates’ data, we will be able to see this better.

As well, with so much more data, we’ll be able to do the “multiple fielding position” analysis I did several weeks (as well as years) ago.

#1    David Gassko      2006/08/16 (Wed) @ 13:20

In the long-run, a player’s ZR should pretty much match his UZR, making this a good metric when looking at careers.

***

This is actually incorrect. We know that the biggest flaw in zone rating is that it undervalues guys with good range by including balls out of zone in both the numerator and the denominator. Over a small (say, one-year) sample, this is not a big deal in terms of comparing to UZR because ZR has such an overwhelming advantage over non-PBP metrics due to its use of PBP data. However, as distributions smooth out, the non-PBP metrics pull up and ahead of ZR, as their main flaw is minimized while the main flaw in ZR is not. Michael Humphreys showed this quite nicely in his three-part series on DRA for THT.

With multi-year samples (especially over a player’s career) or for grouping purposes (like constructing aging patterns), a non-PBP metric like DRA or even DFT is better than ZR. Dial’s project might help us identify the best fielding seasons, but I don’t think it will lend any extra information beyond that.


#2    Tangotiger      2006/08/16 (Wed) @ 13:29

I should say the “good ZR”, and not the STATS version of ZR.  This is another implementation/framework issue.  The framework of ZR is simple enough: count the number of outs in a particular zone, and count the number of BIP in that same zone.  Assign various zones to each position.  Add it all up.
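To make that framework concrete, here is a minimal sketch of the simple counting version. The zone labels ("6M", "56", "4M") and the sample balls in play are made up for illustration; they are not STATS zone codes or real data.

```python
def zone_rating(balls_in_play, assigned_zones):
    """Outs made divided by balls in play, counting only the assigned zones.

    balls_in_play: iterable of (zone_label, is_out) pairs.
    assigned_zones: set of zone labels assigned to the position.
    Returns None when no ball was hit into the assigned zones.
    """
    outs = 0
    chances = 0
    for zone, is_out in balls_in_play:
        if zone in assigned_zones:
            chances += 1
            if is_out:
                outs += 1
    return outs / chances if chances else None


# Hypothetical sample: three balls in the shortstop's zones, one elsewhere.
sample_bip = [("6M", True), ("6M", False), ("56", True), ("4M", True)]
shortstop_zones = {"6M", "56"}
print(zone_rating(sample_bip, shortstop_zones))
```

The ball hit to "4M" is ignored entirely, which is the point of the simple framework: the same zones are counted for every player at the position.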

STATS, year-to-year, had different implementations of this framework.  I prefer the simple way.

A slight adjustment to make to ZR is the % of LHH faced.  Another adjustment is % of GB allowed.  These two adjustments should vault ZR above any non-PBP metric.


#3    Tangotiger      2006/08/16 (Wed) @ 13:34

It’s pretty simple to construct a fielding system.  Simply write down all the variables that you’d like to consider.  Then, figure out if you can infer it, or if you can observe/record it.  There’s nothing a non-PBP system has on a PBP system.

Things like % of LHH is something that the STATS implementation of ZR purposefully does not consider, while other systems do consider.  As anyone who has played baseball as young as five knows, the spray pattern of a LHH is different than a RHH.  Why STATS/ZR would not even consider such a simple adjustment is rather bothersome.  They show batter/pitcher splits by handedness.  Why not fielder splits by batter handedness?

Anyway, once (royal) you do this, you quickly realize that a semi-intelligent PBP system will always trump a very intelligent non-PBP system.


#4    David Gassko      2006/08/16 (Wed) @ 19:41

Tom, I’m not arguing with any of that. A good play-by-play system is always better than a great non-PBP system. The problem is that zone rating has very clear biases (beyond the LH BFP and GB% adjustments that you mentioned—which would actually be very minor). Over the short run, those biases are outweighed by the fact that ZR knows how many balls were hit in the vicinity of the fielder. In the long run, they’re magnified and put it behind a great non-PBP metric like DRA.


#5    tangotiger      2006/08/16 (Wed) @ 21:05

I don’t get the biases in the ZR framework.  I understand the silly implementation of STATS’s version.


#6    David Gassko      2006/08/16 (Wed) @ 22:42

Well all we have is STATS’s version. It’s not like the raw data is out there. The construct of ZR = Outs/(BIZ + Outs on BOZ) is clearly wrong. In fact, if you subscribe to David (Smyth)’s theory “that fielders should be evaluated on the margins (great plays vs terrible plays),” zone rating is a really crappy stat to use in the long run since it does not include half of that equation (the good plays). Well, pretty much does not include. The point is that the data that this project will give us, while useful for single-season evaluation, tells us little and probably nothing that we didn’t already know about long-term fielding performance—except for which players are over- and underrated by ZR.


#7    Joe Arthur      2006/08/17 (Thu) @ 01:48

I agree with David that the systematic error in STATS ZR will not cancel out over time. But at last, the Fielding Bible did make available an entirely alternate ZR, which separately counted outs on balls outside zone. So there are 3 years of those ZR, for regulars.

Chris Dial implies that the ZR he received were spidered off the web. The only sites I know of which have ZR going back to 1987 (including retired players) are ESPN and FoxSports (which I’m guessing is the more likely source). These sites do not report opportunities [cnnsi.com does, but not for retired players]. 

The interesting thing is that the Fox version of ZR is sometimes a bit lower than other web sources. [Of course they’re all much lower than the ratings which were published contemporaneously, meaning that historical ZR were recalculated at some point using STATS’ newest methodology.] I’m guessing that Fox’s ZRs are more recently refigured than the ZRs on other sites, which tend to agree with each other. I suppose it could be late breaking corrections to the data, but perhaps they keep re-calculating opportunities when the cumulative percentage of outs in a particular zone crosses the 50% threshold. When looking at ZR over time, it isn’t clear at the moment whether each year of STATS ZR has its unique version of zones of responsibility, based on percentages within that year, or whether the zones are determined using multi-year data. [Not clear with Dewan’s either.]

This of course has the potential to muddy the waters a bit when using ZR to do longitudinal studies.


#8    tangotiger      2006/08/17 (Thu) @ 06:59

I said:

I should say the “good ZR”, and not the STATS version of ZR.  This is another implementation/framework issue.  The framework of ZR is simple enough: count the number of outs in a particular zone, and count the number of BIP in that same zone.  Assign various zones to each position.  Add it all up.

Therefore, we should stop talking about the STATS implementation.  I’m talking about the good ZR, the semi-intelligent ZR.

I disagree that “all we have” is the STATS version.  It’d be rather simple for anyone with the PBP database to come up with the semi-intelligent ZR.  That no one’s published this on a large scale doesn’t mean we should limit ourselves to a poor implementation of ZR.

So, going back to the good ZR, what biases won’t even out?


#9    MGL      2006/08/17 (Thu) @ 20:28

I have always said, especially in response to people who think that it is SO difficult to evaluate fielding, that in the long run, a simple PBP (ZR) system is perfect.

For some reason, we call a system that only assigns a few zones to each fielder a “ZR” system and one that uses all zones for all fielders a “UZR” system, when in fact, you can call them all zone rating systems if you want to.

What will even out in the long run are the baserunners, outs, L/R batters, G/F of the pitchers, speed of the balls, exactly where “in” the zones the balls are hit, bad hops or not, trajectories and hang times of fly balls and line drives, etc.

Lwts is also a perfect measure of a player’s offensive talent in the long run, assuming you are using the correct lwt values.

Ditto for fielding and a ZR type system. Which is why I laugh when I hear people, even intelligent, analyst-types, say and write that you “can’t” measure fielding.  I am extremely comfortable with many years of UZR or “full” ZR data, in terms of representing a player’s true defensive value in runs saved or cost (and in projecting future defensive value).

The fact that a ZR system may not incorporate ALL defensive value, like relay throws and scooping bad throws at first, etc., is another story.  Of course, these things can be accurately measured too, given the right data.

I also disagree with David about the “bias” in STATS ZR.  I think he overstates the problem with not including balls outside a player’s zone.  There are many difficult and hard hit balls within a player’s zone that are not caught by poorer fielders and caught by the good ones.  The relative infrequency of balls hit outside of a player’s normal zones that can be caught by ANYONE make them fairly inconsequential, IMO.

In fact, many of these plays (outside of zone that are caught) are not that difficult to field, either because they have great hang time for a fly ball, are hit softly for a ground ball, or the fielder is in a completely unconventional position, because of a shift or something like that.

I believe that a STATS ZR in the long run is going to be very close to a full ZR or UZR in the long run and will be BETTER than a non-PBP metric in the “medium run.” Of course, a non-PBP metric in the very long run is going to be better than a STATS ZR.  That goes without saying since a STATS ZR is purposely truncating the data precisely in order to make it more accurate in the short run (I think).


#10    tangotiger      2006/08/18 (Fri) @ 08:15

Part 2 is up.

As Walt notes later, it is completely wrong to present the career total of RS/150 by summing the RS/150.  I hope Chris cleans that up, because it’s a great presentation otherwise.
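A small sketch of why summing a rate stat goes wrong, assuming RS/150 means runs saved per 150 games (that definition is an assumption here, and the three seasons below are invented). The career figure is rebuilt from total runs and total games rather than by adding the seasonal rates.

```python
def career_rs_per_150(seasons):
    """Career runs saved per 150 games.

    seasons: list of (games, rs_per_150) pairs, one per season.
    Each seasonal rate is converted back to runs, the runs and games
    are totaled, and the total is re-expressed per 150 games.
    """
    total_games = sum(games for games, _ in seasons)
    total_runs = sum(rate * games / 150 for games, rate in seasons)
    return 150 * total_runs / total_games


# Hypothetical career: one full season and two partial ones.
seasons = [(150, 10.0), (75, 10.0), (30, -10.0)]

naive_sum = sum(rate for _, rate in seasons)  # adds rates as if they were totals
career_rate = career_rs_per_150(seasons)      # weights each season by games played

print(naive_sum, round(career_rate, 2))
```

The naive sum treats a 30-game season as if it counted as much as a 150-game one, and it grows with the number of seasons played rather than with the quality of the fielder.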


#11    David Gassko      2006/08/18 (Fri) @ 10:06

I also disagree with David about the “bias” in STATS ZR.  I think he overstates the problem with not including balls outside a player’s zone.  There are many difficult and hard hit balls within a player’s zone that are not caught by poorer fielders and caught by the good ones.  The relative infrequency of balls hit outside of a player’s normal zones that can be caught by ANYONE make them fairly inconsequential, IMO.

***

Looking at the Fielding Bible’s Zone Rating section, it looks like for shortstops, over three years, BOZ_Plays/BIZ can vary anywhere from 10% to 20%. Now let’s assume that there are around 350 BIZ per year, and that an average ZR (without BOZ) is .840. So the average player is making 294 plays in-zone and between 35 and 70 out of zone.

Now let’s take two players, of equal fielding talent. One makes 329 plays in-zone and 35 out of zone, the other makes 294 in-zone and 70 out of zone. How are they rated by zone rating? The first guy, the sure-handed one, has a (329 + 35)/(350 + 35) = .945 zone rating. The second guy, the rangy one, has a (294 + 70)/(350 + 70) = .867 zone rating. The first guy is 30 runs above average. The second guy is two or three runs above average.

But in fact they’re equal fielders!
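The arithmetic in that example can be checked with a few lines. This sketch uses the construct as described in this thread (out-of-zone plays added to both numerator and denominator); the function name is just a label for illustration.

```python
def merged_zone_rating(plays_in_zone, balls_in_zone, plays_out_of_zone):
    """ZR with out-of-zone plays added to both numerator and denominator."""
    return (plays_in_zone + plays_out_of_zone) / (balls_in_zone + plays_out_of_zone)


# The two hypothetical shortstops from the example above.
sure_handed = merged_zone_rating(329, 350, 35)
rangy = merged_zone_rating(294, 350, 70)

print(f"{sure_handed:.3f} {rangy:.3f}")  # 0.945 0.867
print(329 + 35, 294 + 70)                # 364 364 total plays made
```

Both players make 364 plays, yet the merged rating separates them by almost 80 points, because each out-of-zone play also inflates the denominator.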

This may be an extreme example, so let’s look at a real life one. Last year, Adam Everett made 296 plays on balls in zone (in 344 chances) and 78 plays on balls out of zone. That makes his zone rating .886. Omar Vizquel made 340 plays on balls in zone (in 397 chances) and 43 on balls out of zone, for a zone rating of .870. So that makes Everett about five runs better per year. But in actuality, he was more like 35 runs better, if we break out balls out of zone! So which is it? Are they practically equivalent fielders, or is Everett clearly superior? To me, the answer is easy, and why I don’t trust zone rating for long-term evaluations. For in-season numbers, I use it all the time.
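Breaking the same two lines apart shows where the difference hides. A rough sketch, using only the play counts quoted above; the run values are not reproduced here because the conversion from plays to runs is not given in this thread.

```python
def split_zone_line(plays_in_zone, balls_in_zone, plays_out_of_zone):
    """Return (merged ZR, in-zone-only ZR, out-of-zone plays) for one fielder."""
    merged = (plays_in_zone + plays_out_of_zone) / (balls_in_zone + plays_out_of_zone)
    in_zone_only = plays_in_zone / balls_in_zone
    return merged, in_zone_only, plays_out_of_zone


# (plays in zone, balls in zone, plays out of zone), as quoted in the comment.
lines = {"Everett": (296, 344, 78), "Vizquel": (340, 397, 43)}

results = {name: split_zone_line(*line) for name, line in lines.items()}
for name, (merged, in_zone, boz) in results.items():
    print(f"{name}: merged {merged:.3f}, in-zone {in_zone:.3f}, out-of-zone plays {boz}")
```

On balls in zone the two are nearly identical (about .860 and .856), so almost all of the separation comes from the 78 versus 43 plays out of zone, which the merged number mostly buries.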


#12    tangotiger      2006/08/18 (Fri) @ 10:35

David,

Again, I am talking about the good ZR, not the STATS ZR.  The good ZR focuses only on plays in the same zones for all players!  Why do you keep bringing up the in-zone, out-of-zone, and the way STATS merges them?  Other than STATS, everyone thinks it’s ridiculous to do that.

I like the way Dewan splits it up, by looking only at same-zones for all players, and then reporting the additional outs made on out-of-zone balls.

All I’m saying is that if we create a ZR implementation where we only add up balls made and not made on same-zone plays, then the biases in ZR will wash away for a career.

Obviously, we are going to be missing a chunk of the out-of-zone plays, but I’d expect a large correlation, over a career, with same-zone ZR skills and out-of-zone plays.

Where exactly is our disagreement?


#13    David Gassko      2006/08/18 (Fri) @ 10:56

In that case, there is none which is why I didn’t respond to your earlier comment. However, Mickey was talking about the Stats ZR, which is pretty clearly flawed. “Good” ZR I think is, well, good, but the question is: Where do you get the data? And once you have the PBP data, why not just go the extra mile and calculate UZR?


#14    tangotiger      2006/08/18 (Fri) @ 11:19

You can get the data from any PBP source.  They’ve all got the hit locations.  Gameday on MLB.com gives you hit locations.  I’m not saying it’s easy, but it’s there.

There’s a world of difference between ZR and UZR.  ZR is based purely on a counting stat.  “Plays Made In Assigned Zones” divided by “Balls Going Through Assigned Zones”.

UZR makes several adjustments to these base numbers.

The “extra mile” would be ZR(RHH), ZR(LHH).  UZR would be a trip to China.
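For a sense of how small that extra mile is, here is a sketch of ZR split by batter handedness. The record layout (zone, batter hand, out or not) and the zone labels are assumptions for illustration, not any data provider's format.

```python
from collections import defaultdict


def zone_rating_by_hand(balls_in_play, assigned_zones):
    """Plays made / balls through assigned zones, split by batter hand.

    balls_in_play: iterable of (zone_label, batter_hand, is_out) tuples,
    where batter_hand is "L" or "R".
    Returns a dict mapping each hand seen in the assigned zones to its rate.
    """
    outs = defaultdict(int)
    chances = defaultdict(int)
    for zone, batter_hand, is_out in balls_in_play:
        if zone in assigned_zones:
            chances[batter_hand] += 1
            outs[batter_hand] += int(is_out)
    return {hand: outs[hand] / chances[hand] for hand in chances}


# Made-up balls in play for a shortstop.
sample = [
    ("6M", "R", True),
    ("6M", "R", False),
    ("56", "L", True),
    ("56", "L", True),
    ("4M", "L", False),  # not an assigned zone, so ignored
]
by_hand = zone_rating_by_hand(sample, {"6M", "56"})
print(by_hand)
```

It is still a pure counting stat, just with one more key on the counters.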


#15    tangotiger      2006/08/18 (Fri) @ 11:20

In terms of effort.


#16    MGL      2006/08/19 (Sat) @ 17:04

I may have been wrong about STATS ZR, but obviously it is a matter of degree (the in-zone only “bias”).

I don’t understand why you (David) would like STATS ZR in the short run but not in the long run.  If you think that only counting in-zone plays is a big problem, then it is a problem short and long-term.

Bottom line really is that there is a pretty good correlation between STATS ZR and UZR.  As far as when a good non-PBP system “overtakes” STATS ZR, I have no idea.


#17    Joe Arthur      2006/08/20 (Sun) @ 08:00

1) The mlb.com hit location data can’t easily be converted to a ZR system, since the locations recorded are where the ball was fielded, not where it landed. I suppose you could use it to create a GB-only infield zone rating, if you decided not to worry about deflected balls. For outfielders, the best you could do would be some poor man’s version of Pinto’s PMR, because only direction and not distance can be treated as reliable.

2) In terms of the potential impact of the construction problem in STATS ZR, there was a decent peek behind the curtain in the 1996 STATS baseball scoreboard, which had an article on why Robbie Alomar’s ZR at 2b was so low; it published 3 year totals for 2b (min 2000 innings, ‘93-’95) and presented plays made outside zone. The extreme comparison in that sample of players was Brent Gates [2755 innings] vs Chuck Knoblauch [3361 innings]. Their overall zone ratings were a point apart, each looking a little below average [-.020 or so from average ZR], but Gates was #1 in plays outside zone and Knoblauch nearly last.  When comparing for plays within zone only and then adjusting for expected plays outside zone, Gates came out about +25 plays over average and Knoblauch about -75. So over 3 years, one had about a +50 play adjustment and one had about a -50 play adjustment, when “raw” ZR would have suggested they were about even.


#18    tangotiger      2006/08/21 (Mon) @ 07:28

The following was from the comments section at the BTF article:

Tango did some of this. He doesn’t like using hitters “above position”, so he did a relationship analysis. I think a SS > 1B = +9 runs. I think that is wrong because any LHer will be much worse. Also he used older UZR, which is now overhauled, and my data is now much closer to UZR (using ZR zones).  So, Tango’s work can be re-done with the database I have - it will have thousands more innings than the original work Tango did (Tango had 3 seasons), now we have 20 and many many more players that have multiple positions. So, when Tango gets this database, he’ll produce more/better data.

Actually, it was more like 15 runs, and I did it both ways, with old UZR and new UZR.  With old UZR it was based on 99-03 data, and the new UZR was based on the 00-05 data.  More can be seen here

And if I get the ZR database, I’ll be sure to rerun the fielding adjustments.


#19    David      2006/08/21 (Mon) @ 22:09

I don’t understand why you (David) would like STATS ZR in the short run but not in the long run.  If you think that only counting in-zone plays is a big problem, then it is a problem short and long-term.

***

Because in the short-run, having PBP data trumps poor construction. In the long-run, the biases in a non-PBP fielding system even out, while those in STATS ZR do not, and eventually the non-PBP becomes more accurate.


#20    Chris Dial      2006/08/25 (Fri) @ 11:16

Hi,
Tango, drop me an email. I have seen the grid for MLB, and it should be ZR-able.

I don’t see how non-PBP data becomes more accurate.  See Jones, Chipper.  It’s flat out wrong by nearly 200 runs.  Same with Jeter.

I haven’t read TFB, but does Dewan use the same zones?  Are his definitions of BOZ the same?


#21          2006/08/25 (Fri) @ 11:54

Dewan follows the MGL approach of looking at it on an x,y basis.  Unlike MGL, they don’t smooth out their probabilities.  And, no handling of park factors.

Though, for 2005, he did break down each player’s record into home/away.

Chris, send me an email, as I must have an old email address of yours (it bounced a while ago).

(Click on my name)


#22          2006/08/25 (Fri) @ 12:02

Yes, x,y, but how is he defining the assigned “zones”?  How can he declare something as a BOZ (in DSG’s above example), if he doesn’t have a similar zone?


#23    tangotiger      2006/08/25 (Fri) @ 13:14

Dewan doesn’t use that for his rating system, but maybe he was using it for something else to make a point?  I dunno.  I’ll have to re-read that part.

The basis is the original UZR in the STATS Scoreboard, but using the x,y point.  What percentage of balls are turned into outs on average, and how many did this guy turn into outs.  He then provides a nice breakdown, where he splits up the zones into “left, middle, right”, just to show you how he did in each zone.  But, the underlying calculations are always:
playsMade(x,y) minus (lgRate(x,y) * BallsHit(x,y))

It’s just about the most basic calculation you would do, if using PBP data.
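As a sketch of that calculation summed over every location, assuming each (x, y) cell is keyed by its grid coordinates and the league out rate per cell is already known. The cell coordinates, counts, and league rates below are invented.

```python
def plays_above_average(player_cells, league_rate):
    """Sum of playsMade(x,y) - lgRate(x,y) * BallsHit(x,y) over all cells.

    player_cells: dict mapping (x, y) -> (plays_made, balls_hit) for one fielder.
    league_rate: dict mapping (x, y) -> league-average out rate for that cell.
    """
    total = 0.0
    for cell, (plays_made, balls_hit) in player_cells.items():
        expected = league_rate[cell] * balls_hit
        total += plays_made - expected
    return total


# Hypothetical fielder: two cells, ten balls hit into each.
player_cells = {(10, 20): (8, 10), (11, 20): (4, 10)}
league_rate = {(10, 20): 0.7, (11, 20): 0.3}

print(round(plays_above_average(player_cells, league_rate), 2))
```

Everything interesting lives in `league_rate`: how large the cells are and how many balls back each rate determine whether the result can be trusted.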

The key is:
1 - how big to make the x,y points.  IIRC, each pixel is around 3 feet x 3 feet.
2 - how reliable is your lgRate value

With STATS ZR, their x,y “point” is huge, probably 30 feet x 30 feet for the IF, and much bigger for the OF.

On the flip side, I don’t see how Dewan can get a reliable lgRate with such small zones, and you would need some smoothing function.  If your x,y is 30x30, you obviously wouldn’t need one, since there’s only one zone per player.

And, obviously, you would need to adjust for the handedness of the batter (or better, the batter’s actual spray tendencies), and, if possible, the tendency of the pitcher/batter to get the ball on the ground or not.


#24    Tangotiger      2006/08/28 (Mon) @ 07:16

If I remember correctly, one of the conditions for selecting a “zone” as a “zone of responsibility” is that the ZR for that zone, for the league, be above .500.  What does this mean?  Well, if this condition is accurate, then you won’t necessarily have the same zones year-to-year!  As well, the zones floating in and out are the borderline ones: a zone is in during a year when the league turns it into outs at a .510 rate, and that same zone is out in a year when the rate is .490.
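If the .500 rule works the way it is remembered here (which is not confirmed), the drift is easy to illustrate. The zone names and league out rates in this sketch are invented.

```python
def zones_of_responsibility(league_out_rate, threshold=0.5):
    """Zones whose league-wide out rate is above the threshold."""
    return {zone for zone, rate in league_out_rate.items() if rate > threshold}


# Hypothetical league out rates by zone in two consecutive years.
year_1 = {"A": 0.80, "B": 0.51, "C": 0.30}
year_2 = {"A": 0.79, "B": 0.49, "C": 0.31}

zones_1 = zones_of_responsibility(year_1)
zones_2 = zones_of_responsibility(year_2)

print(sorted(zones_1 ^ zones_2))  # ['B'], the zone that flipped
```

A two-point wobble in the league rate moves zone B out of the fielder's responsibility, so his opportunities are counted differently in the two years even if he plays exactly the same.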

As well, it is incredible to me that they would not, at the least, use:
1 - the exact same zones year-to-year
2 - have different zones based on the batter’s hand

And of course, the way the grids are determined for the outfield will affect all OF ratings, notably Manny.


#25    David Gassko      2006/09/08 (Fri) @ 23:00

Tom,

Later in the book, Dewan publishes Zone Rating using the old format, but breaking out balls out of zone. I think his zones are a little smaller than Stats’ but I’m not sure.

Chris,

How is it wrong with Jeter?


#26    Joe Arthur      2006/09/09 (Sat) @ 00:26

Just to clarify a couple of things which weren’t clearly enough distinguished in prior comments [since Chris indicated somewhere he hadn’t seen Dewan’s book]…

Dewan has TWO different new systems -

1) a plus/minus system which is an x,y [really vector,distance] system with an average granularity of 1 foot by 1.5 feet, with no park effects, but some listing of home/road splits. The plus/minus is in terms of outs. [Dewan has a further variant called enhanced plus/minus which is in terms of total bases allowed instead of outs.]

2) a revised zone rating system which counts only balls in zone and lists plays out of zone separately. The zones aren’t fully described, but he implies that he is using a 50% rule to define the zones. If so then for outfielders apparently he is not following the ‘later’ STATS elaboration with separately defined zones for line drives and fly balls.  The denominator [opportunities] in Dewan’s ZR is much bigger and the ZRs around 200 points lower for corner outfielders than the STATS ZRs [for CF, more like 100 points lower]. Separation of balls outside zone is responsible for only a little of that. His numerator for plays made in zone + plays outside zone then ties pretty well to total outfielder putouts, so Dewan is not excluding popups for outfielders as the STATS ZR would. For infielders, on the other hand, Dewan’s zones seem to be more restricted [fewer opportunities recognized], and total plays made (inside + outside zone) less than the total implied by the STATS ZR. Perhaps Dewan is excluding short line drives as opportunities where STATS does not ...

So his ZR is fairly different from the STATS version.

