Friday, March 30, 2007
BIS Fielding Data
Here’s a good look at how to use the BIS fielding data found at Hardball Times.
(Hat Tip: studes.)
(This is from Joe Arthur, and should have been published as part of post #9)
Fielding Opportunities: BIS Zones vs STATS Zones, 2004-2006

| POS | 2006 BIS biz+ooz | 2006 BIS Opp | 2006 STATS | 2005 BIS biz+ooz | 2005 BIS Opp | 2005 STATS | 2004 BIS biz+ooz | 2004 BIS Opp | 2004 STATS |
|---|---|---|---|---|---|---|---|---|---|
| 1B | 4851+2012 | 6863 | 8137 | 5493+1940 | 7433 | 8419 | 5406+1783 | 7189 | 8612 |
| 2B | 12679+1211 | 13890 | 15383 | 12825+1478 | 14303 | 15814 | 12129+1203 | 13332 | 15557 |
| 3B | 10880+1636 | 12516 | 13463 | 9271+2396 | 11667 | 13282 | 9007+2074 | 11081 | 13244 |
| SS | 13218+1659 | 14877 | 16071 | 12821+1948 | 14769 | 15990 | 11995+1919 | 13914 | 16067 |
| LF | 14663+445 | 15108 | 10589 | 13712+718 | 14430 | 10530 | 12242+847 | 13089 | 10211 |
| CF | 12667+2046 | 14713 | 13800 | 12590+1963 | 14553 | 13446 | 11905+2034 | 13939 | 13654 |
| RF | 15037+461 | 15498 | 11067 | 14161+695 | 14856 | 10890 | 13442+781 | 14223 | 11091 |
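
As a quick check on the table, BIS Opp is simply balls in zone plus out-of-zone plays (biz+ooz), and dividing it by the STATS figure shows where the two zone systems diverge most. Here is a minimal Python sketch using the 2006 columns only; the names `data_2006` and `bis_vs_stats` are illustrative and do not come from either data provider.

```python
# 2006 columns copied from the table above:
# position -> (BIS balls in zone, BIS out-of-zone plays, STATS opportunities)
data_2006 = {
    "1B": (4851, 2012, 8137),
    "2B": (12679, 1211, 15383),
    "3B": (10880, 1636, 13463),
    "SS": (13218, 1659, 16071),
    "LF": (14663, 445, 10589),
    "CF": (12667, 2046, 13800),
    "RF": (15037, 461, 11067),
}


def bis_vs_stats(rows):
    """Return {position: (BIS opportunities, BIS/STATS ratio)}."""
    out = {}
    for pos, (biz, ooz, stats) in rows.items():
        bis_opp = biz + ooz  # the "BIS Opp" column
        out[pos] = (bis_opp, bis_opp / stats)
    return out


summary_2006 = bis_vs_stats(data_2006)
for pos, (opp, ratio) in summary_2006.items():
    print(f"{pos}: BIS Opp {opp}, BIS/STATS {ratio:.2f}")
```

With these inputs the four infield ratios come out below 1 and the two corner outfield ratios well above 1, so the zone definitions are clearly not interchangeable across systems.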
I agree that this is a good example of what can be done with the HBT data, and that what jinAZ has done forms perhaps the fairest readily available baseline for evaluating the more advanced metrics. When they deviate significantly from it, there should be a good explanation in terms of an unusual distribution of easy or difficult balls in play (or park effects), things which the HBT zone rating does not account for.
If you take it on faith that either PMR or Fielding Bible plus/minus is a highly accurate measure, jinAZ’s results still differ from them by 10 plays made fairly often, and by 20 plays made in a few cases. [For PMR, he does not use the most recent variation, which uses distance/100 as a parameter, but the older version, which uses “soft/medium/hard hit” as a parameter. PMR sometimes varies by 5 or more expected plays made between those versions.]
This gives some idea of how much adjustment a detailed accounting of difficulty can still bring to the analysis.
To pick up on a comment you left on jinAZ’s blog: “I don’t trust the year-to-year quality of the data recorders”; yes, BIS batted ball data quality does remain a concern.
Here are BIS line drives by season, with STATS (estimated) and Retrosheet counts in brackets for comparison, as [STATS;Retrosheet]:
2002: 28,512 [26487;n/a]
2003: 30,473 [26820;25686]
2004: 25,606 [26169;25647]
2005: 28,241 [26096;25445]
2006: 26,597 [26274;25904]
Recording of line drives by BIS has fluctuated significantly from year to year, with a range from low to high of nearly 5000, vs. not much more than 700 for STATS and less than 500 for Retrosheet. In 2006 BIS agrees closely with STATS for the first time. If 2007 shows “internal” consistency with 2006, then I’ll start to be more confident in BIS hit-typing.
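
The low-to-high ranges quoted above can be reproduced directly from the season totals. A short Python sketch follows, with the counts copied from the list above; the dictionary layout and the helper names `spread` and `yearly_changes` are only illustrative.

```python
# Line drive totals by source and season, copied from the list above.
# Retrosheet has no 2002 figure, so that season is simply omitted.
line_drives = {
    "BIS": {2002: 28512, 2003: 30473, 2004: 25606, 2005: 28241, 2006: 26597},
    "STATS": {2002: 26487, 2003: 26820, 2004: 26169, 2005: 26096, 2006: 26274},
    "Retrosheet": {2003: 25686, 2004: 25647, 2005: 25445, 2006: 25904},
}


def spread(counts):
    """Difference between the highest and lowest season total."""
    return max(counts.values()) - min(counts.values())


def yearly_changes(counts):
    """Change from each season to the next, in chronological order."""
    years = sorted(counts)
    return [counts[b] - counts[a] for a, b in zip(years, years[1:])]


spreads = {source: spread(counts) for source, counts in line_drives.items()}
bis_changes = yearly_changes(line_drives["BIS"])

print(spreads)      # low-to-high range per source
print(bis_changes)  # season-to-season swings in the BIS counts
```

The season-to-season changes make the same point as the ranges: every BIS swing is larger than the entire range of either of the other two sources.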