THE BOOK--Playing The Percentages In Baseball

Friday, March 30, 2007

BIS Fielding Data

By Tangotiger, 11:10 PM

Here’s a good look at how to use the BIS fielding data found at Hardball Times. 

(Hat Tip: studes.)


(This is from Joe Arthur, and should have been published as part of post #9)

Fielding Opportunities: BIS Zones vs STATS Zones, 2004-2006
(BIZ = balls in zone; OOZ = out-of-zone plays made; BIS Opp = BIZ + OOZ; STATS = opportunities in the STATS zones)

    |            2006             |            2005             |            2004             |
Pos | BIS BIZ+OOZ  BIS Opp  STATS | BIS BIZ+OOZ  BIS Opp  STATS | BIS BIZ+OOZ  BIS Opp  STATS |
1B  | 4851+2012       6863   8137 | 5493+1940       7433   8419 | 5406+1783       7189   8612 |
2B  | 12679+1211     13890  15383 | 12825+1478     14303  15814 | 12129+1203     13332  15557 |
3B  | 10880+1636     12516  13463 | 9271+2396      11667  13282 | 9007+2074      11081  13244 |
SS  | 13218+1659     14877  16071 | 12821+1948     14769  15990 | 11995+1919     13914  16067 |
LF  | 14663+445      15108  10589 | 13712+718      14430  10530 | 12242+847      13089  10211 |
CF  | 12667+2046     14713  13800 | 12590+1963     14553  13446 | 11905+2034     13939  13654 |
RF  | 15037+461      15498  11067 | 14161+695      14856  10890 | 13442+781      14223  11091 |

#1    Joe Arthur      2007/03/31 (Sat) @ 16:10

I agree that this is a good example of what can be done with the HBT data, and that what jinAZ has done forms perhaps the fairest readily available baseline for evaluating the more advanced metrics. When they deviate significantly, there should be a good explanation in terms of an unusual distribution of easy or difficult balls in play (or park effects), things which the HBT zone rating does not account for.

If you take on faith that either PMR or Fielding Bible plus/minus is a highly accurate measure, jinAZ’s results still differ by 10 plays made fairly often, and by 20 plays made in a few cases. [For PMR, he does not use the most recent variation using distance/100 as a parameter, but instead the older version using “soft/medium/hard hit” as a parameter. PMR sometimes varies by 5 or more expected plays made between those versions.] This gives some idea of how much adjustment a detailed accounting of difficulty can still bring to the analysis.

To pick up on a comment you left on jinAZ’s blog:  “I don’t trust the year-to-year quality of the data recorders”; yes, BIS batted ball data quality does remain a concern.

Here are BIS line drives by season, with STATS (estimated) and Retrosheet in brackets for comparison:
2002: 28,512 [STATS 26,487; Retrosheet n/a]
2003: 30,473 [STATS 26,820; Retrosheet 25,686]
2004: 25,606 [STATS 26,169; Retrosheet 25,647]
2005: 28,241 [STATS 26,096; Retrosheet 25,445]
2006: 26,597 [STATS 26,274; Retrosheet 25,904]

Recording of line drives by BIS has fluctuated significantly every single year, with a range from low to high of nearly 5000, vs. not much more than 700 for STATS and less than 500 for Retrosheet. In 2006 BIS agrees closely with STATS for the first time. If 2007 shows “internal” consistency with 2006, then I’ll start to be more confident in BIS hit-typing.


#2    JinAZ      2007/03/31 (Sat) @ 17:41

Joe,

I did, in fact, use the distance/100 PMR data when it was available.  Thanks for the comments. -j


#3    Peter Jensen      2007/03/31 (Sat) @ 19:09

The 2003 fielding assessments in the Fielding Bible ignored significant portions of BIS data that were considered not reliable enough for Dewan’s +/- analysis. Consequently, neither BIS data for that year nor Dewan’s ratings for that year should be considered for serious fielding analysis. My understanding is that these problems in data collection were fixed by 2004.


#4    Tangotiger      2007/04/01 (Sun) @ 08:58

In the other blog, I talked about the difference in ZR in the OF.  Here’s more data:

Here are the totals by position (scoring numbers: 3=1B, 4=2B, 5=3B, 6=SS, 7=LF, 8=CF, 9=RF), for 2004-2006:
Pos    BIZ     PM    OOZ   inZR  ozPlaysMade
3    15750  12290   5735  0.780  0.318
4    37633  30667   3892  0.815  0.113
5    29158  20714   6106  0.710  0.228
6    38034  31165   5526  0.819  0.151
7    40617  25308   2010  0.623  0.074
8    37162  30012   6043  0.808  0.168
9    42640  27518   1937  0.645  0.066

The second-to-last column is plays made (PM) per balls in zone (BIZ). 
The last column is out of zone plays made (OOZ) per total plays made (PM+OOZ).
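Both derived columns can be recomputed directly from the raw totals. Below is a minimal Python sketch that does so; the dictionary layout and function names are illustrative choices, not anything published by BIS or The Hardball Times.

```python
# 2004-2006 BIS totals by scoring position (3=1B ... 9=RF), copied from the
# table above: (balls in zone, plays made in zone, out-of-zone plays made).
TOTALS = {
    3: (15750, 12290, 5735),
    4: (37633, 30667, 3892),
    5: (29158, 20714, 6106),
    6: (38034, 31165, 5526),
    7: (40617, 25308, 2010),
    8: (37162, 30012, 6043),
    9: (42640, 27518, 1937),
}


def in_zone_rate(biz: int, pm: int) -> float:
    """inZR: plays made in zone per ball in zone."""
    return pm / biz


def out_of_zone_share(pm: int, ooz: int) -> float:
    """ozPlaysMade: out-of-zone plays as a share of all plays made."""
    return ooz / (pm + ooz)


if __name__ == "__main__":
    for pos, (biz, pm, ooz) in TOTALS.items():
        print(f"{pos}  {in_zone_rate(biz, pm):.3f}  {out_of_zone_share(pm, ooz):.3f}")
```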

If we look at the LF/RF and compare to the CF, we see that the percentage of plays made in-zone is far lower.  At the same time, the percentage of out of zone plays made relative to all plays made is also far lower.

It seems that the “net” is being cast a lot wider in the LF/RF zones, that they’re treating far more balls as being in-zone than they should be.

Using a “matched pair” of guys who played both in CF and in the corner OF (and weighting the players based on the minimum BIZ of the two “positions”), I get the following totals:

“minimum” BIZ: 9588
ZR in CF: .804
ZR in corner: .648

Remember, we are talking about the exact same players. As you can see, talent level is not the cause.
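One way to read the matched-pair weighting is sketched below in Python. It assumes each player's weight is the smaller of his CF and corner-OF BIZ, and that each position's ZR is the weight-averaged ZR of the paired players. The `PairedPlayer` records are hypothetical, not the actual players behind the .804 and .648 figures.

```python
from dataclasses import dataclass


@dataclass
class PairedPlayer:
    """Hypothetical outfielder who logged time both in CF and in a corner."""
    name: str
    cf_biz: int        # balls in zone while playing CF
    cf_zr: float       # in-zone plays made / BIZ in CF
    corner_biz: int    # balls in zone while playing LF or RF
    corner_zr: float   # in-zone plays made / BIZ in the corners


def matched_pair_zr(players: list[PairedPlayer]) -> tuple[int, float, float]:
    """Return (total weight, weighted CF ZR, weighted corner ZR).

    Each player counts only as much as his smaller BIZ total, so a player with
    300 CF chances and 100 corner chances is weighted at 100 in both averages."""
    total_weight = 0
    cf_sum = 0.0
    corner_sum = 0.0
    for p in players:
        w = min(p.cf_biz, p.corner_biz)
        total_weight += w
        cf_sum += w * p.cf_zr
        corner_sum += w * p.corner_zr
    if total_weight == 0:
        raise ValueError("no player has BIZ at both positions")
    return total_weight, cf_sum / total_weight, corner_sum / total_weight


if __name__ == "__main__":
    sample = [
        PairedPlayer("Player A", cf_biz=300, cf_zr=0.800, corner_biz=100, corner_zr=0.650),
        PairedPlayer("Player B", cf_biz=50, cf_zr=0.820, corner_biz=200, corner_zr=0.630),
    ]
    weight, cf, corner = matched_pair_zr(sample)
    print(weight, round(cf, 3), round(corner, 3))
```

With these two made-up players the total weight is 150, and the weighted rates come out near .807 in CF and .643 in the corners. Because the same players appear on both sides, any gap between the two averages reflects the zones rather than talent.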


#5    Rally      2007/04/01 (Sun) @ 12:45

The first base zone is not nearly large enough - some 1B make more plays out of zone than in zone (for 2006 at least).

The data seems OK at shortstop, 2B, and CF, but may not be very useful at the corners.  I did compare their total plays made estimate to STATS, and for SS at least, BIS comes closer to crediting the player with the right amount of plays.


#6          2007/04/01 (Sun) @ 14:13

Tango + Rally,

That may not be the case. After all, we don’t know what the distribution of plays by first basemen and corner outfielders looks like. It may well be that corner outfielders make barely over 50% of plays in a bunch of zones, giving them more zones of responsibility, and lowering their zone rating. Park effects also certainly play into it, as many parks may suppress zone ratings.

Again, first basemen may just be making a lot of plays in various zones due to the shift, playing on or off the bag, etc., but not making plays on 50% of the balls that are in those zones.

To say with such certainty that BIS is doing something wrong here is incorrect. At best, we might say that we think they are making some mistake.


#7    tangotiger      2007/04/01 (Sun) @ 15:40

The STATS zones from the 90s, when Dewan was involved, had the corner OF in the 80% range.  And they still do.

I find it extremely implausible that the zone of responsibility for the corner OF is so wide as to average 62%, with a range of 50% to 99%, yet the CF have an average of 80% with the same 50% to 99%.


#8    tangotiger      2007/04/01 (Sun) @ 15:54

Let’s also not forget that the hit rate on FB, if we include ALL balls in play (including those in no-man’s land), is between 14 and 27%, making the out rate between 73 and 86%:
http://www.baseballthinkfactory.org/files/primate_studies/discussion/lichtman_2004-02-29_0/

The hit rate on GB is 23%, making the out rate 77% on ALL balls in play.

So, no surprise that the out rate on GB in the “zones of responsibility” would be a bit over 80%.

It’s impossible for the out rate in the corner OF to be 62% if you’ve already weeded out all the almost-sure hits.

My guess is that line drives are included in the “FB” that BIS is reporting, but I’d have to study the numbers. Maybe Joe or Peter, who already have the Retro data handy, can compare the numbers to my post #4.


#9    Joe Arthur      2007/04/01 (Sun) @ 19:26

[The out rate on GB (remembering errors) is 74-75%.]

Retrosheet 2004-2006
Pos     Fly     LD
LF    29290  22122
CF    40197  21237
RF    31424  21331

There’s no doubt BIS is counting many line drives in the LF and RF zones.

In Tango’s table, LF and RF actually have more chances than CF under the BIS definition!! Compared to the STATS zones, in 2006 BIS LF and RF had 40% more chances, while CF only had 7% more chances according to BIS. In the infield the BIS zones are counting fewer opportunities than STATS; about 85% as many for 1B, and 90-93% for 2B,3B,SS. This might mean the BIS infield zones were smaller; it might mean that BIS only counts ground balls and ignores IF line drives while STATS counts both.
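The percentages in the paragraph above follow directly from the 2006 figures in the BIS vs STATS table at the top of this page, as this small Python sketch shows (the names are illustrative).

```python
# 2006 opportunities by position, from the BIS vs STATS table at the top of
# this page: (BIS balls in zone + out-of-zone plays, STATS opportunities).
OPPS_2006 = {
    "1B": (6863, 8137),
    "2B": (13890, 15383),
    "3B": (12516, 13463),
    "SS": (14877, 16071),
    "LF": (15108, 10589),
    "CF": (14713, 13800),
    "RF": (15498, 11067),
}


def bis_to_stats_ratio(bis: int, stats: int) -> float:
    """BIS chances expressed as a multiple of STATS chances."""
    return bis / stats


for pos, (bis, stats) in OPPS_2006.items():
    print(f"{pos}: {bis_to_stats_ratio(bis, stats):.2f}")
```

It gives roughly 0.84 for 1B, 0.90 to 0.93 for 2B, 3B and SS, 1.07 for CF, and 1.40 to 1.43 for LF and RF.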

If BIS is actually using a 50% rule (and the Fielding Bible does not explicitly say that they do), with their much finer grid for the playing field, BIS could have a sharper boundary for the zones, but it can’t greatly vary from STATS zones if they’re both using the same rule. Without doing the calculation in detail, “optimistically” BIS could refine the edges that STATS calls just above or below 50%, and find that around half of the area of the STATS boundary subzones could be included (or excluded) from what BIS recognizes as a 50% zone. But I don’t think that the BIS zone could get more than 10% larger or smaller in area from boundary sharpening.


#10    Tangotiger      2007/04/02 (Mon) @ 07:49

Joe has additional data, which you can see in the main blog entry at the top of this page.


#11    Tangotiger      2007/04/02 (Mon) @ 07:57

And here’s Rally’s take on THT:

http://www.hardballtimes.com/main/printarticle/what-is-zone-rating/


#12    Rally      2007/04/02 (Mon) @ 09:11

STATS has always counted line drives as part of zone rating. What they do is use 2 separate zones of responsibility, a large one for flyballs and a small one for line drives. It just looks like BIS is using the big zone for everything.


#13    Rally      2007/04/02 (Mon) @ 09:13

Thanks for the THT link.

One extra thing I learned is that you can credit 2 assists to one putout, for example a shortstop taking a cutoff throw from OF and gunning a guy out at home. Both OF and SS get credit with assists. 

Up to now I thought you could have no more than 1 assist to a putout, but I guess baseball is more like hockey than I thought.


#14    Peter Jensen      2007/04/02 (Mon) @ 09:35

There also seems to be some field-by-field subjectivity in the classification of FB vs. LD in the Retrosheet data. For the time being, until we have the hit ball speed and vectors that we need to do the analysis properly, I am leaning toward lumping all balls in the air together for fielding analysis. If you believe, as some studies have suggested, that pitchers have very little ability to control their line drive percentages, there is no reason a three-year aggregation of hit balls from many different batters should differ significantly in distribution from fielder to fielder (at a specific position).

My understanding is that Dewan does the fielding analysis on his own, and that only the raw data, not the analysis, should be considered a BIS product. For his analysis Dewan considers all hit balls, not just balls hit in a certain zone. He explains this in his book, and it was confirmed to me when I called and talked to his assistant to reconcile some differences between his numbers and Retrosheet’s. By the way, when I called BIS, no one I spoke to there had ever heard of Retrosheet.


#15    Peter Jensen      2007/04/02 (Mon) @ 09:45

I noticed that last night’s game didn’t have the enhanced Gameday data, so maybe MLB.com wasn’t as successful at implementing their grand statistical information gathering as soon as they had hoped. Their new Gameday format does, however, have categories where the new information could be included (speed and curvature for pitches, but not hit ball speed and vector for batted balls). Let’s hope that they are still trying. It sounds like our best bet.


#16          2007/04/02 (Mon) @ 11:07

The ideal, of course, is to have bat contact to ground/glove time measurement.  Failing that data, the best approach, it seems to me, is to use 3 year information with standardized zones for all balls in the air.  One needs to keep count of line drive rates simply to ensure that a fielder is not penalized for a weak pitching staff, and make necessary adjustments, but the effort to attempt to determine in/out zones for each possible line drive (described on a subjective basis) seems not likely to be worth it.

Does anyone have data for particular outfielders about 3 year zone distributions of line drives?  How often does it occur that an outfielder will have a particularly deceptive (positively or negatively) zone rating because of disproportionately many balls at him or in the gaps? My sense is that it is a relatively small item.

The more important points, I think, are park adjustments and reconciliation of individual ratings to team park-adjusted DERs.


#17    tangotiger      2007/04/02 (Mon) @ 12:47

Rally: maybe STATS counts the lineouts-in-zone for Cabrera?  That would make up just about the whole difference.


#18    Rally      2007/04/02 (Mon) @ 13:36

That could be.  I didn’t think they were supposed to do that.  But until last year, I didn’t think they counted hits off the green monster either.


#19    Tangotiger      2007/04/02 (Mon) @ 16:47

These are the totals for all MLB SS, according to STATS (via SI)
http://sportsillustrated.cnn.com/baseball/mlb/stats/2006/fielding/ml_6_byCHANCES.html

   Ch    INN    PO      A    E    DP  OutsMade
16158  43255  7181  14186  634  3139     13389

According to Retrosheet, we have:
http://www.retrosheet.org/boxesetc/2006/YT_2006.htm

Team  POS     G    GS    CG    INN    PO      A  ERR    DP  TP   AVG
NL    SS   2832  2590  2354  23137  3752   7581  322  1644   0  .972
AL    SS   2494  2268  2045  20121  3423   6605  311  1495   3  .970
Tot   SS                     43258  7175  14186  633  3139   3

STATS is missing 3 innings, which we can all live with. The assists are identical, while Retrosheet has 6 fewer putouts. Errors are off by 1, while DPs are a match. All in all, pretty close.

STATS/SI has 13,389 “outs made in zone”, which I calculated by multiplying the ZR by the “Ch”.  “Ch” is chances, and is the number of ground balls “in zone” (whether assisted, or unassisted putout). 

BIS has 13,218 in-zone chances, with 10,809 outs made in-zone, plus another 1,659 outs made outside zone, for a total (in and out zone) plays made of 12,468.

So, STATS/SI has 7% more plays made “in zone” compared to the BIS plays made “in+out zone”. I think we can fairly conclude that STATS/SI includes the plays made out of the zone in the numerator and (since I can’t find anyone above 1.000, even with limited playing time) in the denominator. That still leaves a lot of plays to account for.

The question is: how many of those 14,186 assists are ground balls (as opposed to relay throws, lineout DPs, or knock downs of line drives with subsequent out)?  And, how many of those putouts are unassisted plays?

Using the Chone/Cabrera data, about 58% of Cabrera’s DPs are as the “second out”. So, of those 3,139 DPs, we knock out 1,820 assists as “second outs”. That splits our 14,186 assists into 12,366 “first outs” and 1,820 “second outs”.

According to the Chone/Cabrera data, only 1% of the putouts are “first outs” (other than the DPs, which we’ve accounted for). That gives us another 72 “first outs”.

Our total therefore is 12,366+72 = 12,438 “first outs”.

To remind you, BIS has 12,468 first outs, and STATS/SI has 13,389 “outs made”.  That’s 921 extra outs that STATS/SI is recording, that aren’t easily accounted for.
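The chain of estimates in this comment is easy to lose track of, so here it is as a short Python sketch. The constants are the league-wide 2006 SS figures and the Chone/Cabrera shares quoted above; the 58% and 1% shares are rough sample estimates, as the post treats them, and the constant names are mine.

```python
# League-wide 2006 SS figures quoted in this comment.
STATS_OUTS_MADE = 13389      # ZR x Ch, summed over all SS (STATS/SI)
ASSISTS = 14186
PUTOUTS = 7181
DOUBLE_PLAYS = 3139
BIS_IN_ZONE_OUTS = 10809
BIS_OUT_OF_ZONE_OUTS = 1659

# Shares estimated from the Chone/Cabrera sample.
DP_SECOND_OUT_SHARE = 0.58   # DPs where the SS is the "second out"
PO_FIRST_OUT_SHARE = 0.01    # non-DP putouts that are a "first out"


def estimate_first_outs() -> int:
    # Truncating 3139 * 0.58 (about 1820.6) reproduces the post's 1,820.
    second_out_assists = int(DOUBLE_PLAYS * DP_SECOND_OUT_SHARE)
    first_out_assists = ASSISTS - second_out_assists          # 12,366
    first_out_putouts = round(PUTOUTS * PO_FIRST_OUT_SHARE)   # 72
    return first_out_assists + first_out_putouts


first_outs = estimate_first_outs()
bis_plays_made = BIS_IN_ZONE_OUTS + BIS_OUT_OF_ZONE_OUTS
print(first_outs, bis_plays_made, STATS_OUTS_MADE - bis_plays_made)
```

Running it prints 12438, 12468 and 921: the estimated ground-ball first outs, the BIS in+out zone plays made, and the STATS/SI surplus that still needs explaining.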


#20    Tangotiger      2007/04/02 (Mon) @ 17:03

And the ZR calculation for STATS is definitely OutsMade / Ch. How do I know? Because if you do this:
OutsMade = round(zr*ch,0)

And then do this:
newZR = round(OutsMade/ch,3)

My ZR and newZR match 100% of the time.  If for example I used PO+A instead of CH in both equations, I don’t get 100% match.  Nor if I make it PO+A+E.  The only way to get the ZR to calculate exactly is to assume that it’s equal to outsmade per CH.
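The round-trip test can be sketched in Python as below. The function names are mine, and the sketch assumes ordinary round-half-up (spreadsheet-style) rounding; Python's built-in `round()` rounds halves to even, so it is avoided here. Run over every player in the STATS/SI file, a candidate denominator that passes for every player is consistent with being the one STATS uses, while one that fails anywhere is ruled out.

```python
import math


def round_half_up(x: float) -> int:
    """Round a non-negative number to the nearest integer, with .5 going up."""
    return math.floor(x + 0.5)


def reproduces_zr(zr: float, denominator: int) -> bool:
    """Round-trip test for one player and one candidate denominator.

    Back out outs made as round(zr * denominator), recompute the rate to three
    decimals, and report whether it matches the published ZR."""
    outs_made = round_half_up(zr * denominator)
    new_zr_thousandths = round_half_up(outs_made / denominator * 1000)
    return new_zr_thousandths == round_half_up(zr * 1000)


# Hypothetical player with a published ZR of .800.
print(reproduces_zr(0.800, 15))  # True: 12 / 15 = .800
print(reproduces_zr(0.800, 16))  # False: 13 / 16 rounds to .813
```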

***

Also worth noting is “outs not made in zone”.  For STATS/SI, that’s 16,158 minus 13,389 = 2769.  That would be hits and errors in-zone.

For BIS, balls in zone is 13,218 minus outs in zone of 10,809 = 2409.  That would be hits and errors in-zone.

So, STATS/SI is counting 15% more hits and errors in zone (for SS). This simply means that the STATS/SI in-zones are wider. Not a problem, of course. Just an observation.


#21    Peter Jensen      2007/04/02 (Mon) @ 17:42

13,389 outs made is about what you would get if you included both ground balls and line drives but not pop ups according to Retrosheet.  Retrosheet’s count of “plays made” on just ground balls is 12,440 Event Type #2 (generic outs) plus some of the 116 Event Type #19 (fielder’s choice) and some of the 463 Event Type #18 (errors) where the error was by someone other than the shortstop.


#22    Tangotiger      2007/04/02 (Mon) @ 17:54

Peter, great stuff. In post #17, I brought up the lineouts, which in Cabrera’s case are 11% of all putouts. That would be an estimate of almost 800 lined-out putouts, close enough to the 921 gap.

So, I think that’s that then.  STATS/SI includes in the numerator:
+ GB outs in-zone
+ outs outside zone
+ line outs

And in the denominator:
+ all the above
+ GB hits + GB errors in-zone
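Written as formulas, the STATS/SI rating implied above can be set next to the BIS in-zone rate from post #4, which reports out-of-zone plays separately. The symbols are my own shorthand: O for outs, H for hits, E for errors; "GB,in" means ground balls in zone, "OOZ" means plays made outside the zone, and "LD" means line outs.

```latex
\mathrm{ZR}_{\mathrm{STATS}}
  = \frac{O_{\mathrm{GB,in}} + O_{\mathrm{OOZ}} + O_{\mathrm{LD}}}
         {O_{\mathrm{GB,in}} + O_{\mathrm{OOZ}} + O_{\mathrm{LD}} + H_{\mathrm{GB,in}} + E_{\mathrm{GB,in}}},
\qquad
\mathrm{ZR}_{\mathrm{BIS}} = \frac{O_{\mathrm{GB,in}}}{\mathrm{BIZ}},
\qquad
\mathrm{BIZ} = O_{\mathrm{GB,in}} + H_{\mathrm{GB,in}} + E_{\mathrm{GB,in}}
```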

***

I prefer the approach that BIS takes, of comparing apples with apples, and then presenting the extra data (outs outside zone, lineouts, or whatever) that lets the reader choose how to put them together.


#23    Peter Jensen      2007/04/02 (Mon) @ 18:12

I counted the fielder’s choices that should go in the SS “plays made” column and there were 96 that either resulted in an out on a runner or where the fielder being thrown to made an error.  Of the 463 Event Type #18 all were errors made by the shortstop.  Total of “plays made” on ground balls according to Retrosheet should then be 12,536.  So BIS is still shorting the shortstop 68 outs.  “Plays made” is where Dewan was way off on his 2003 data.  Some individual shortstops had 50 or more fewer “plays made” according to Dewan than Retrosheet said they had.


#24    Joe Arthur      2007/04/03 (Tue) @ 00:22

Tango,

thanks for posting my table. The point I wanted to draw attention to was that, just as STATS has a lot of consistency when it comes to year to year line drive totals, and BIS has a lot of variability, a similar effect can be seen in the zone totals.
Note the range between each source’s 3 year low and 3 year high for total opportunities (in + out of zone):
1B: STATS 475 BIS 570
2B: STATS 431 BIS 971
3B: STATS 219 BIS 1435
SS: STATS 81 BIS 963
LF: STATS 378 BIS 2019
CF: STATS 354 BIS 774
RF: STATS 201 BIS 1275

Again, the great variation from year to year in the BIS totals is disconcerting.
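The low-to-high ranges above can be reproduced from the BIS vs STATS table at the top of this page. In the Python sketch below, each position holds its 2006, 2005 and 2004 totals; the names are illustrative.

```python
# Total opportunities (2006, 2005, 2004) from the BIS vs STATS table at the top
# of this page: BIS = balls in zone + out-of-zone plays, STATS = STATS-zone chances.
OPPS = {
    "1B": {"BIS": (6863, 7433, 7189), "STATS": (8137, 8419, 8612)},
    "2B": {"BIS": (13890, 14303, 13332), "STATS": (15383, 15814, 15557)},
    "3B": {"BIS": (12516, 11667, 11081), "STATS": (13463, 13282, 13244)},
    "SS": {"BIS": (14877, 14769, 13914), "STATS": (16071, 15990, 16067)},
    "LF": {"BIS": (15108, 14430, 13089), "STATS": (10589, 10530, 10211)},
    "CF": {"BIS": (14713, 14553, 13939), "STATS": (13800, 13446, 13654)},
    "RF": {"BIS": (15498, 14856, 14223), "STATS": (11067, 10890, 11091)},
}


def low_to_high_range(values: tuple[int, ...]) -> int:
    """Spread between a source's lowest and highest season total."""
    return max(values) - min(values)


for pos, sources in OPPS.items():
    stats_range = low_to_high_range(sources["STATS"])
    bis_range = low_to_high_range(sources["BIS"])
    print(f"{pos}: STATS {stats_range}  BIS {bis_range}")
```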


#25    Rally      2007/04/03 (Tue) @ 09:36

Peter, if BIS is short 68 plays, could those be where the shortstop fields a ground ball and throws the runner out at home?

From my look at Cabrera, I don’t think they were counting those plays.


#26    Peter Jensen      2007/04/03 (Tue) @ 12:45

I think that it is probable that the difference lies somewhere in how BIS scores fielder’s choices in general plus possible accounting mistakes.  There is room for interpretation of what should count as a “play made” on a fielder’s choice especially if it doesn’t involve a force out.  If someone wanted to count all 116 fielder’s choices as “plays made” (minus any shortstop throwing errors) I wouldn’t argue with them too much.  The decision to try for the out on the runner rather than on the batter that results in both runners being safe might have just been a bad decision or it might have been a good decision given the game situation.  You have to keep in perspective that we are only talking about a play or 2 a year per shortstop.


#27    Peter Jensen      2007/04/03 (Tue) @ 16:01

Check out what is happening at the HitTracker web site. All the information is there for fly balls: speed off the bat, angle of elevation, horizontal angle. If he can really do this for all hit balls, it will revolutionize fielding evaluation. He doesn’t seem to be tracking ground balls, though.

MLB.com so far has only used the enhanced Gameday information for 1 game, yesterday’s CLE at CHA.  It includes pitch speed, amount of break, and a couple of categories that they don’t explain and I have yet to figure out.  No hit ball information though.  We really must impress upon them how important hit ball speed and angles are to us.  It would be nice to have MLB and HitTracker providing two independent sources for this data, especially if HitTracker isn’t able to do ground balls or all the teams with his volunteers.


#28    Tangotiger      2007/04/03 (Tue) @ 16:43

Peter, from my discussions with the MLB.com guy, he has a different point of view. Whereas I would prefer to have as much observational data as possible, even if it’s “manual” and prone to error, his preference is to have machinery do the recording (like getting the speed and trajectory off a 3-angled video feed).

***

Here’s what Peter is talking about:
http://www.hittrackeronline.com/hits.php
http://www.hittrackeronline.com/hrdetail.php?id=2007_15


#29    Tangotiger      2007/04/03 (Tue) @ 16:45

He dismissed the idea of the stopwatch, preferring something more accurate.  However, to me, a stopwatch is more accurate than the null set.


#30    Peter Jensen      2007/04/03 (Tue) @ 18:08

Tango - They should be able to get speed off the bat and both the vertical and horizontal angle of flight automatically, from the same setup that measures pitch speed and amount of curve. All they have to do is program it to look at the first frames after the ball is struck instead of when the ball is on its way from the pitcher to the plate. Ask him to present that possibility to the tech guy who set up the pitch-speed program.


#31    Peter Jensen      2007/04/03 (Tue) @ 18:18

Although it would be preferable to actually measure where the ball landed and how long it took to get there, there is no current way to do that automatically.  But what I just told you should be able to be done automatically.  And the information that it would give you would be almost as good.  You would have an accurate measure of just how hard the ball was struck and where it was initially headed.  Combined with what HitTracker might give us or even with the old Gameday files of where the ball was fielded we can have a very accurate picture of what is going on.


#32    Rally      2007/04/03 (Tue) @ 20:14

Better idea would be for MLB to hire Tango to design and direct their game outputs.  And give him the budget to hire some of us to do the work.

One can dream.


#33    tangotiger      2007/04/05 (Thu) @ 18:24

Straight from a long-time STATS employee:

What goes into the numerator and denominator of zone ratings is a little complicated. If you have any old editions of the Baseball Scoreboard, there is usually a Glossary section at the end with a definition. The one from the 2001 Scoreboard goes on for over a page.

I’ve got all of mine in the basement.  If I can get to it tonight, I’ll reprint it.


#34    Rally      2007/04/06 (Fri) @ 01:13

I have mine.  I’ll just cut to the chase for infielders:

“Only groundballs are considered when zone rating is calculated.  Line drives, popups, and flyballs are ignored....Infielders no longer get credit for two outs when they start a double play”

Despite this, I’m 95% sure they were counting liners in the 2000 stats.  I checked a 3B, Larry Jones.  He is credited with 319 outs.  He only had 297 assists that year.


#35    Joe Arthur      2007/04/07 (Sat) @ 08:37

Unfortunately the text descriptions in the Scoreboards, even the really detailed one in the glossary of the final edition [p.300], can be “buggy” re-statements of what the computer program actually did. That description at least does look as though it was derived from inspection of the program logic. There are earlier, briefer descriptions in the scoreboards which do claim line drives are counted; for instance p.212 of the Baseball Scoreboard 1996 “Zone rating is simply the total number of outs recorded by a fielder on line drives and ground balls ...”

MGL dropped by BTF recently and pointed out that Orlando Cabrera’s 2006 STATS ZR plays made reconciled exactly when both ground balls and line drives were counted.

Checking 2000 data against Retrosheet, as Rally suggested - I get the following comparisons for plays made:
3B
Larry “Chipper” Jones: 295 g + 20 ld = 315 vs STATS PM 319
Troy Glaus: 343 g + 15 ld = 358 vs STATS PM 359
Joe Randa: 289 g + 16 ld = 305 vs STATS PM 307

SS
Neifi Perez: 449 g + 34 ld = 483 vs STATS PM 484
M. Tejada: 441 g + 29 ld = 470 vs STATS PM 475
Jose Valentin: 394 g + 18 ld = 412 vs STATS PM 411
Deivi Cruz: 412 g + 25 ld = 437 vs STATS PM 436
Guzman: 359 g + 25 ld = 384 vs STATS PM 391
Rey Sanchez: 394 g + 23 ld = 417 vs STATS PM 418
Royce Clayton: 349 g + 33 ld = 382 vs STATS PM 382
A-Rod: 379 g + 26 ld = 405 vs STATS PM 402
Mike Bordick: 366 g + 28 ld = 394 vs STATS PM 392
Garciaparra: 365 g + 28 ld = 393 vs STATS PM 392

Not perfect matches when counting line drives, but the biggest discrepancy among these players is 7 plays made and about half are within 1; so we’re far closer to reconciling when counting line drives than omitting them.
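The comparison can be tallied with a short Python sketch using the 2000 figures listed above (names shortened as in the list). It reports, with and without line drives, the largest gap between STATS plays made and the Retrosheet count, and how many players land within one play.

```python
# 2000 plays made, per the list above:
# (player, Retrosheet GB outs, Retrosheet LD outs, STATS plays made).
PLAYERS = [
    ("Larry Jones", 295, 20, 319),
    ("Troy Glaus", 343, 15, 359),
    ("Joe Randa", 289, 16, 307),
    ("Neifi Perez", 449, 34, 484),
    ("M. Tejada", 441, 29, 475),
    ("Jose Valentin", 394, 18, 411),
    ("Deivi Cruz", 412, 25, 436),
    ("Guzman", 359, 25, 391),
    ("Rey Sanchez", 394, 23, 418),
    ("Royce Clayton", 349, 33, 382),
    ("A-Rod", 379, 26, 402),
    ("Mike Bordick", 366, 28, 392),
    ("Garciaparra", 365, 28, 392),
]


def stats_minus_retrosheet(include_line_drives: bool) -> list[int]:
    """STATS plays made minus the Retrosheet count, one entry per player."""
    gaps = []
    for _name, grounders, liners, stats_pm in PLAYERS:
        retro = grounders + liners if include_line_drives else grounders
        gaps.append(stats_pm - retro)
    return gaps


for include in (False, True):
    gaps = stats_minus_retrosheet(include)
    worst = max(abs(g) for g in gaps)
    within_one = sum(1 for g in gaps if abs(g) <= 1)
    print(f"line drives counted={include}: worst gap {worst}, within 1: {within_one}/{len(gaps)}")
```

With line drives counted, the worst gap is 7 and 7 of the 13 players are within one play; with ground balls alone, every player is off by at least 16.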

So line drives were included in STATS ZR in 2006, which is relevant for comparison to BIS, which appears to exclude line drives. For historical purposes, STATS is counting line drives in 2006, certainly seems to be counting them in 2000, and said they were counting them after the 1995 season.


#36          2007/04/07 (Sat) @ 09:15

A STATS guy has told me he’ll be taking a look at this, this upcoming week.  I will pass on Joe’s additional research.

Joe, can you email me (click my name)?


#37    tangotiger      2007/04/07 (Sat) @ 21:45

The STATS guy had this to say:

Tom - yes, despite what the definition says, I’m sure lineouts are included.

So, there you go.  Problem solved.  Great work from you guys.


#38    studes      2007/04/11 (Wed) @ 13:08

Just had a chat with John Dewan.  Thought I’d pass some notes along to you guys:

- The BIS system is still developing, and the 2006 Zone Ratings were based on different parameters than previous years. John didn’t go back and change the old systems. That’s probably one of the main reasons there’s so much variance between years. John does feel that comparisons for specific players between years are still valid, however.

- In 2006, they changed the number of vectors from 262 to just 90. Also, they measured the distance of flyballs in five-foot increments (I forget what they used before, but it was less).

- For infielders, he includes groundballs only in ZR, and the zones will be the same each year (going forward).  For outfielders, he includes all airballs (fly, fliner, line drive) and the zones will change from year to year, based on which vector/feet segments reached 50% in that year.  As a result, his outfield zones may not be contiguous—that is, there might be a zone inside a wider zone that doesn’t reach 50%, and it won’t be included.

- His guess is that STATS doesn’t change its zones at all from year to year. John uses the major league figures for each year (not league-specific) to set zones in the outfield.

- I showed him this thread, and he is intrigued by what’s happening in left and right field, vs. center field.  Hopefully, he’ll find some time to investigate it further.
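The 50% rule for outfield zones, as described above, amounts to a per-cell filter. Here is a minimal Python sketch under stated assumptions: cells are keyed by a hypothetical (vector, distance bucket) pair, each cell carries league-wide (outs, balls in play) counts for one season, and "reached 50%" is taken to mean an out rate of at least 0.5. The grid and counts are made up, not BIS's. Because each cell is judged on its own, the example produces exactly the kind of non-contiguous zone mentioned above.

```python
from collections.abc import Mapping

# Hypothetical grid key: (vector, distance bucket).
Cell = tuple[int, int]


def build_zone(cells: Mapping[Cell, tuple[int, int]], threshold: float = 0.5) -> set[Cell]:
    """Return every cell whose season out rate is at least `threshold`.

    `cells` maps a cell to (outs, balls in play) for one season, league-wide.
    Each cell is judged on its own, so the resulting zone need not be contiguous."""
    return {
        cell
        for cell, (outs, balls) in cells.items()
        if balls > 0 and outs / balls >= threshold
    }


if __name__ == "__main__":
    # Three cells along one hypothetical vector; the middle one falls short of 50%.
    season = {(40, 25): (9, 10), (40, 26): (4, 10), (40, 27): (6, 10)}
    print(sorted(build_zone(season)))  # [(40, 25), (40, 27)]
```

Rebuilding the zone every season from that season's counts is what lets the zone boundaries drift from year to year.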


#39    Tangotiger      2007/04/11 (Wed) @ 13:29

Great stuff studes.  My comments:

1. He used 1 foot I believe, and then 3 feet.  I argued then that it wasn’t a good choice, because of the sample size.  You could have conversion rates of 80%, 90%, 70%, 85% all next to each other. What you want is something continuous, be it as a function (my preference), or, as Dewan is doing here, getting your zones big enough.

2. Having the same zones year-to-year in ZR is good. Why not also in the OF? And zones must be contiguous. The “50% rule” is really just a guideline. All you’ve got to do is set a zone and stick to it. It’s not clear from your post, but each type of airball should have its own zone.

3. I would absolutely have a different zone based on handedness of batter, especially for the CF, but also for the SS and 2B.  It is a slippery slope (GB/FB tendency of pitcher-batter, base/out configuration, inning/score, etc).  But, handedness goes back to little league.

4. I hope he’s more than intrigued!  Me, I’m stopping the presses, and no one goes home until this gets solved.  Seriously, good of you to bring it up to him, and good for him to take it seriously.

5. Fenway Park.

6. Dewan should release the 2007 event files of one day (say Apr 10, 2007). Then the big guns from this site (http://www.insidethebook.com/ee/index.php/site/comment_leaders/) can meet and come up with recommendations. In a sense, Dewan should appoint a Zone Rating commission.

I have a version of the 2004 BIS database, and let me tell you: {big kiss}.  It is fantastically designed, and just wonderful to parse through.


#40    studes      2007/04/11 (Wed) @ 13:47

John doesn’t have different zones for different ball types.  All air balls = one zone.  All groundballs are the same zone, regardless of batter handedness, etc.

I agree more can certainly be done with zone definition.  The issue for John is a strategic/business one.  He really wants to sell his plus/minus system, and if Zone Rating gets too good, it undermines the value of plus/minus.

He actually hasn’t said that to me, but that’s how I would think about it if I were him.


#41    Rally      2007/04/11 (Wed) @ 16:13

Any chance John will give us a chart showing what zones are counted?

I’d love to compare it to the STATS zones.


#42    tangotiger      2007/04/11 (Wed) @ 16:22

… if Zone Rating gets too good, it undermines the value of plus/minus.

Sounds like MGL, Pinto, and Protrade need a new agent. Dewan’s system is halfway between ZR and MGL, but Dewan’s presentation and adaptability are much better than MGL’s.

The presentation of Dewan (granular data), Pinto (vector-based charts), and Protrade (grid-based charts) are each fantastic.  I’d marry all three concepts into one.

The choice of making all airballs the same zone is not good.


#43    Rally      2007/04/11 (Wed) @ 16:51

You’re right, that’s not a good decision.  For some reason though, it affects corner outfielders more than centerfielders.  Are there relatively few line drives hit to center?


#44    studes      2007/04/11 (Wed) @ 16:54

Rally, John doesn’t have any charts or anything.

I can tell you the zones for infielders, though.  Picture the field split into 90 vectors, with zero at first base and 90 at third.  The zones are:

1-15: first base
18-38: second base
52-69: shortstop
72-87: third base

So they’re equidistant on both sides of second and between infielders on each side.  The first baseman plays closer to the line, natch.  And the second baseman’s zone is wider than the shortstop’s.  Remember, these only apply to the 2006 data.
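Taken at face value, the 2006 infield zones map a ball's vector to a responsible infielder as in the Python sketch below. Treating the endpoints as inclusive, and leaving balls in the gaps (16-17, 39-51, 70-71) or outside 1-87 with no infielder, are my assumptions, not something John described.

```python
from typing import Optional

# 2006 BIS infield zones as described above, on a 0-90 vector scale
# (0 = first-base line, 90 = third-base line). Inclusive bounds are assumed.
INFIELD_ZONES = {
    "1B": (1, 15),
    "2B": (18, 38),
    "SS": (52, 69),
    "3B": (72, 87),
}


def zone_owner(vector: int) -> Optional[str]:
    """Return the infielder whose zone contains this vector, or None."""
    for pos, (low, high) in INFIELD_ZONES.items():
        if low <= vector <= high:
            return pos
    return None


if __name__ == "__main__":
    print(zone_owner(60))  # SS
    print(zone_owner(45))  # None: the up-the-middle gap around second base
```

Counting vectors per zone (15 for 1B, 21 for 2B, 18 for SS, 16 for 3B) also confirms that the second baseman's zone is the widest.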


#45          2007/04/11 (Wed) @ 17:05

Yeah. According to Retrosheet data, about 30% of line drives are hit to center field versus 40% of fly balls.
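The split can be checked against Joe's Retrosheet 2004-2006 counts in post #9, assuming those counts are by the outfield position charged with the ball. A quick Python sketch:

```python
# Retrosheet 2004-2006 outfield counts from post #9: position -> (fly balls, line drives).
COUNTS = {
    "LF": (29290, 22122),
    "CF": (40197, 21237),
    "RF": (31424, 21331),
}


def share_to_center(index: int) -> float:
    """Share of outfield balls of one type (0 = fly, 1 = line drive) charged to CF."""
    total = sum(counts[index] for counts in COUNTS.values())
    return COUNTS["CF"][index] / total


print(f"fly balls to CF:   {share_to_center(0):.1%}")
print(f"line drives to CF: {share_to_center(1):.1%}")
```

This prints about 39.8% for fly balls and 32.8% for line drives, in line with the rough figures above.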


#46    tangotiger      2007/04/11 (Wed) @ 17:09

Sounds like the vectors are simply degrees.

When I was processing the 2004 data, I was in quite a bit of contact with the programmer there (Damon).  I was also doing some data cleanup.  There was a fair bit that year.  I offered to do a complete quality control of 2005 in exchange for data.  No dice.

Anyway, he explained to me how the “vectors” worked.  It was a bit strange, but essentially the top of the screen had pixel positions numbered from 1 to 270 or some such, and you would draw a line from home plate through the location of the ball, all the way to the top of the screen.  The pixel where that line hit the top edge was your vector.  (Been a while, so bear with me.) The calculation for distance was a bit more confusing, and the locations behind the plate were a little more cumbersome too, I think.  Can’t remember the details.
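Here is a rough Python reconstruction of the pixel scheme described above, assuming screen coordinates with y increasing downward and the top edge at y = 0. The function name and the sample coordinates are hypothetical; this is not BIS's code.

```python
def top_edge_vector(home, ball, top_y=0.0):
    """Project the ray from home plate through the ball onto the top edge.

    home, ball: (x, y) screen coordinates, y growing downward as on most
    screens. Returns the x pixel where the ray crosses y == top_y.
    Hypothetical reconstruction, not the actual BIS implementation.
    """
    hx, hy = home
    bx, by = ball
    if by >= hy:
        # Ball at or below home plate's screen row (e.g. behind the plate):
        # the ray never reaches the top edge, so this scheme breaks down.
        raise ValueError("ball must be above home plate on screen")
    t = (hy - top_y) / (hy - by)  # scale factor from home->ball to home->top
    return hx + t * (bx - hx)


# Hypothetical 270-pixel-wide screen with home plate at (135, 300)
print(top_edge_vector((135, 300), (135, 150)))  # 135.0, straightaway
print(top_edge_vector((135, 300), (105, 200)))  # 45.0
```

The position on the top edge moves with the tangent of the ball's angle from straightaway, so equal pixel steps do not correspond to equal angles, and balls behind the plate never reach the top edge at all.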

I said something like: “Can’t you just use degrees?  This way we can cover the whole field from 0 to 360, and figure out the distance using trig?” It seems that this is what they must have implemented.
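The degrees-plus-trig suggestion is a standard polar conversion. A minimal Python sketch, assuming home plate at the origin with x in feet along the first-base line and y in feet along the third-base line; this frame is hypothetical, chosen so that fair territory runs from 0 to 90 degrees to match studes' numbering.

```python
import math


def to_polar(x, y):
    """Convert a ball location to (angle_degrees, distance_feet).

    Assumed frame (hypothetical): home plate at the origin, x along the
    first-base line, y along the third-base line. Fair territory spans
    0 to 90 degrees; balls behind the plate fall between 180 and 270.
    """
    angle = math.degrees(math.atan2(y, x)) % 360.0
    distance = math.hypot(x, y)
    return angle, distance


def to_cartesian(angle_deg, distance):
    """Inverse of to_polar: recover (x, y) from angle and distance."""
    rad = math.radians(angle_deg)
    return distance * math.cos(rad), distance * math.sin(rad)


print(to_polar(90, 90))    # second base in this frame
print(to_polar(-10, -10))  # foul pop behind the plate
```

In this frame second base at (90, 90) comes out at 45 degrees and about 127.3 feet, and a pop-up at (-10, -10) behind the plate lands at 225 degrees, so one angle covers the whole 0 to 360 range.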


#47    tangotiger      (see all posts) 2007/04/11 (Wed) @ 17:14

Re: LD

It sounds, therefore, like they create the “50%” zones based only on the FB out rate, but then include all airballs for that zone.

So for line drives on the periphery of the FB zones, where a FB would be converted at a 60% clip, the line drive would be converted at, say, a 20% clip.

And line drives to CF are probably outside the FB zones (in the 40% and lower zones), and therefore, are not part of the BIZ.

Whatever, pretty weird.  Kinda strange that no one at BIS would notice the ZR of LF/RF being under .650, while the CF was at .800.
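To see how that hypothesis would depress corner outfielders' ZR, here is a minimal Python sketch. It assumes, as Tango suggests (studes disagrees below), that the zone is drawn from fly-ball out rates but scores every airball landing in it. The counts are hypothetical; only the 60% and 20% conversion rates come from the comment above.

```python
def zone_rating(counts):
    """Outs divided by balls in zone, pooled across batted-ball types.

    counts maps a batted-ball type to (balls_in_zone, outs). Every airball
    landing in the zone counts, whatever its type.
    """
    balls = sum(b for b, _ in counts.values())
    outs = sum(o for _, o in counts.values())
    return outs / balls


# Hypothetical periphery cell: 100 fly balls converted at 60%,
# then the same cell with 50 line drives converted at 20% added.
fly_only = zone_rating({"FB": (100, 60)})
with_liners = zone_rating({"FB": (100, 60), "LD": (50, 10)})
print(round(fly_only, 3), round(with_liners, 3))
```

Under these made-up counts the cell drops from .600 to about .467 once the liners are scored. If center fielders' liners mostly land outside their zones, their ZR would escape that dilution, which would fit the .800 vs. sub-.650 gap.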


#48    studes      (see all posts) 2007/04/11 (Wed) @ 18:13

That’s not my understanding, Tango.  I believe they create their zones based on all air balls.


#49    Los Angeles Waterloo of Black Hawk      (see all posts) 2007/04/11 (Wed) @ 19:35

You’re right, that’s not a good decision.  For some reason though, it affects corner outfielders more than centerfielders.  Are there relatively few line drives hit to center?

Yeah. According to Retrosheet data, about 30% of line drives are hit to center field versus 40% of fly balls.

Could this be a scoring issue?  Maybe borderline batted balls are more likely to be scored as line drives when they go to the corners, for whatever reason, than when they go to center.  How standardized is the scoring of liners vs. flies in such instances (and how did BIS adding “fliners” last season alter it)?

Great discussion guys; sorry for butting in ...


#50    Rally      (see all posts) 2007/04/11 (Wed) @ 20:37

Studes,

It seems strange that first basemen make so many plays outside the zone: roughly half as many as the plays they make in zone, on average.

Strange, since their zone is vectors 1-15, so the only out-of-zone opportunities for them are 16-17 and 0.


#51    studes      (see all posts) 2007/04/11 (Wed) @ 21:40

I agree Rally, but one correction: all zones higher than 15 are out of zone for first basemen.

Black Hawk, the difference between line drives, fliners and flyballs doesn’t affect BIS, because they use all of them to establish zones.


#52    Rally      (see all posts) 2007/04/11 (Wed) @ 21:56

You think they are making plays in zones 18 and up?

It doesn’t seem like it could be that many plays.


#53    tangotiger      (see all posts) 2007/04/11 (Wed) @ 22:31

Don’t forget those are slices.  The grid for in-zone plays for the 1B might start at 80 feet or so, meaning no bunts would be part of the equation.

If bunts are not treated separately, they should be.
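The point is that a zone is an angular slice that can also be bounded by distance. A small Python sketch, where the `Zone` class is hypothetical and the 80-foot floor is only Tango's guess, not a documented BIS parameter:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """An angular slice of the field, optionally trimmed by distance."""
    min_vector: int
    max_vector: int
    min_feet: float = 0.0
    max_feet: float = float("inf")

    def contains(self, vector, feet):
        """True if a ball at this vector and distance falls in the zone."""
        return (self.min_vector <= vector <= self.max_vector
                and self.min_feet <= feet <= self.max_feet)


# 1B slice from the 2006 vectors; the 80-foot floor is a guess.
FIRST_BASE = Zone(1, 15, min_feet=80.0)

print(FIRST_BASE.contains(8, 100.0))   # True: grounder toward the bag
print(FIRST_BASE.contains(8, 30.0))    # False: a bunt falls short of the zone
print(FIRST_BASE.contains(16, 100.0))  # False: outside the 1-15 slice
```

With a distance floor in place, bunts drop out of the in-zone counts automatically, which is why they would need separate treatment.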


#54    Joe Arthur      (see all posts) 2007/04/11 (Wed) @ 22:50

Rally,

The formal 1B zone is probably skewed to the right by how often he holds a runner (30% of the time?). Note that the 3B zone is shifted 2 degrees further from the line.

As David pointed out, CF see fewer line drives than corner OF, relatively speaking. This probably has two causes: 1) they play deeper, so fewer line drives hit their way carry far enough to land in their zone; 2) pulled balls are relatively more likely to be line drives, and the ratio of fly balls to line drives increases as the direction of the batted ball moves from the pull field through center to the opposite field. [Also, pulled line drives are more likely to become hits than opposite-field line drives, probably because they are harder hit, possibly because within the family of line drives, pulled balls have less arc. This will be an interesting thing to look at with hittracker this year.]

I have a Google Docs spreadsheet based on Retrosheet 2003-2006 data, grouped by hit type and by batter and pitcher handedness. Rather than out percentage, I just figured hit%. The link seems to be flagged as spam, so I can’t post it directly in the text here. E-mail me for the link if interested.
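The grouping described above amounts to a keyed tally. A minimal Python sketch with invented rows; real input would come from Retrosheet event files, and the tuple layout is an assumption:

```python
from collections import defaultdict

# Hypothetical rows: (hit_type, batter_hand, pitcher_hand, is_hit).
rows = [
    ("LD", "R", "R", True),
    ("LD", "R", "R", False),
    ("LD", "R", "R", True),
    ("FB", "L", "R", False),
    ("FB", "L", "R", True),
    ("GB", "R", "L", False),
]


def hit_pct_by_group(rows):
    """Return {(hit_type, bat_hand, pit_hand): hits / balls in play}."""
    totals = defaultdict(lambda: [0, 0])  # key -> [hits, balls in play]
    for hit_type, bat, pit, is_hit in rows:
        tally = totals[(hit_type, bat, pit)]
        tally[0] += int(is_hit)
        tally[1] += 1
    return {key: hits / bip for key, (hits, bip) in totals.items()}


for key, pct in sorted(hit_pct_by_group(rows).items()):
    print(key, round(pct, 3))
```

Computing hits over balls in play per group gives hit%, the complement of the out percentage.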


#55          (see all posts) 2007/04/11 (Wed) @ 22:54

Well, that didn’t work as I hoped. I don’t know if you’ll be able to get into the document by clicking on my name in the comment above or not. You should get my e-mail from this one.


#56    JinAZ      (see all posts) 2007/04/12 (Thu) @ 01:08

- The BIS system is still developing, and the 2006 Zone Ratings were based on different parameters than previous years.  John didn’t go back and change the old systems.  That’s probably one of the main reasons there’s so much variance between years.  John does feel that comparisons for specific players between years are still valid, however.

That’s troublesome to me.  What do you folks think this means for estimating the expected number of plays per BIZ when looking at this year’s stats?  I was planning to use the 2004-2006 totals posted by Tangotiger in comment #4, but now I’m wondering if sticking with 2006 data ratios alone might be more accurate given the change in the system.  Regardless of what I use, it’ll have issues… -j
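One way to see how much the choice matters is to compare a single ratio under both options. The sketch below uses the BIZ and OOZ counts for first and second base from the opportunity table at the top of this post and computes out-of-zone plays per ball in zone; whether that is the exact ratio JinAZ needs is an assumption.

```python
# BIZ and OOZ counts copied from the BIS opportunity table at the top of
# this post: position -> year -> (balls in zone, out-of-zone plays).
TABLE = {
    "1B": {2006: (4851, 2012), 2005: (5493, 1940), 2004: (5406, 1783)},
    "2B": {2006: (12679, 1211), 2005: (12825, 1478), 2004: (12129, 1203)},
}


def ooz_per_biz(pos, years):
    """Pooled out-of-zone plays per ball in zone over the given years."""
    biz = sum(TABLE[pos][y][0] for y in years)
    ooz = sum(TABLE[pos][y][1] for y in years)
    return ooz / biz


for pos in TABLE:
    only_2006 = ooz_per_biz(pos, [2006])
    pooled = ooz_per_biz(pos, [2004, 2005, 2006])
    print(f"{pos}: 2006 {only_2006:.3f}, 2004-06 {pooled:.3f}")
```

For first base the 2006 ratio (about .415) sits well above the pooled 2004-06 figure (about .364), while for second base it is slightly lower (about .096 vs. .103), so pooling shifts the estimate in different directions depending on position.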


#57    studes      (see all posts) 2007/04/12 (Thu) @ 08:36

I’d just use 2006.

