THE BOOK--Playing The Percentages In Baseball


Thursday, December 06, 2007

Simple Fielding Runs

By Tangotiger, 04:38 PM

Dan Fox comes up with his Retrosheet fielding system.  Last year, Chone also came up with his Retrosheet fielding system.  Hopefully, jinaz will come up with a comprehensive comparison of UZR, PMR, RZR, Fans, Fox, and Chone.

In this year’s THT Annual, you will see parts of my version, which uses Retrosheet but no zones at all. In my case, it’s not useful year by year, only over a period of years.


#1    DanAgonistes      2007/12/06 (Thu) @ 17:19

I didn’t mean to diss Chone (hey, I think that’s the first time I’ve actually tried to type that slang word). I just didn’t find his article.

It’ll be interesting to see how they differ. Thanks!


#2    Guy      2007/12/06 (Thu) @ 17:39

Dan:  Nice work, as usual.  I understand that this is just a quick-and-dirty first look (Dan doesn’t control for batter or pitcher handedness, for example), but even before working on those kinds of issues, I think it would be worth looking at a version that excludes popups, and even one that uses only GBs.

I think there’s a consensus here that popups tell you nothing helpful about infielders, since 95%+ will become outs in any case.  And there are a few “popup hogs” (e.g. Orlando Hudson) whose rating gets inflated by including them. 

The case on LDs is less clear, but my suspicion is that, in the absence of zone data, your estimates of LD opportunities are so rough that you may be adding more error than signal by including them.  Surely the percentage of LDs fielded by an OF that could possibly have been caught by an IF must be pretty small.  In any event, I think it’s worth checking whether your correlation with metrics like UZR or Plus/Minus is any better (or worse) for a version of SFR that includes only GBs.


#3    Tangotiger      2007/12/06 (Thu) @ 17:41

And I didn’t mean to suggest otherwise, either!

***

Fielding Bible can be found here:
http://www.billjamesonline.net/fieldingbible/2005-2007-plus-minus-leaders.asp

(Not sure if it’s for subscribers or not.)

Most recent Fans results are here:
http://www.tangotiger.net/scouting/scoutResults2007.html


#4          2007/12/06 (Thu) @ 18:50

I did some research on IF LDs a few years ago, and I think I came up with a year-to-year correlation similar to that for GBs, for 2B and SS only.  Some day, I’ll have to look into that again.

Right now, I think that either way is fine.  For 2B and SS, it might be slightly better to include LD, but I don’t think it matters much one way or another.  You definitely don’t want to include pop flies for IF!

One of the philosophical problems when you combine things into one metric is what to include when some of the things have high correlations and some have low ones.  That is the thing with DIPS, right?  BABIP is a skill, it is just that the skill to noise ratio is a lot lower than for K, BB, and HR rate.  So the question is to include it in one metric or not.  The same goes for defensive metrics and GB, LD, and pop-ups for the IF. 

One of the answers is that for small samples, it is “better” to not include the low correlation/skill items and for large samples it is better TO include them.
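The sample-size point can be sketched with the familiar regression-toward-the-mean weighting, where an observed rate gets weight n / (n + k). This is a textbook technique swapped in for illustration, not a method anyone in this thread describes, and the k constants are hypothetical: the noisier a component (say, infield line drives next to grounders), the larger its k and the more opportunities it needs before it earns real weight.

```python
def reliability(n: float, k: float) -> float:
    """Weight given to an observed rate after n opportunities: n / (n + k).

    k is a hypothetical constant; noisier components need a larger k.
    """
    return n / (n + k)


def regressed_rate(observed: float, league: float, n: float, k: float) -> float:
    """Blend an observed rate with the league rate according to reliability."""
    r = reliability(n, k)
    return r * observed + (1 - r) * league


# Hypothetical constants: grounders stabilize quickly, IF line drives slowly.
K_GB = 200.0
K_LD = 2000.0

for n in (100, 1000, 10000):
    # Print the weight each component earns at this sample size.
    print(n, round(reliability(n, K_GB), 3), round(reliability(n, K_LD), 3))
```

With these made-up constants, the LD component earns about 5% weight at 100 opportunities but about 83% at 10,000, which is the sense in which a low-signal item is better left out of small samples and kept in large ones.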

And BTW, let us not forget that PBP defensive metrics and good non-PBP ones eventually approach one another as the sample sizes get large enough, since the ONLY benefit of the PBP metrics is to get finer data for small sample sizes.  Exactly where the ball is hit, the proportion of LH and RH pitchers and batters, the baserunners, etc., “even out” such that the number of balls caught at each position per ball hit will eventually be a constant percentage for each fielder.  And once we account for the pitchers’ and batters’ handedness, which we can with non-PBP systems, the only thing left to “even out” is the exact location of each batted ball.  And that should “even out” for pretty much all players over time, as long as there is not some major bias in a player’s pitchers or the batters they face, or something like that.  Of course, park factors are always a problem, but those can also be accounted for in non-PBP metrics.

So, basically I am a big believer in non-PBP defensive metrics. Their ONLY negative is sample size.  It is harder to take them seriously for small sample sizes, and for small samples the PBP ones are ALWAYS better.  At some point (at some large sample), the non-PBP ones may actually be better, since there is less to “screw up,” although that sample may be larger than most careers provide; I don’t know.  And one of the problems with defensive metrics based on long-term samples (this is true for pitching and, to a lesser degree, for offensive metrics) is that defensive true talent changes significantly over time (with age and injury), such that knowing someone’s average defensive talent over the last 6-8 years may not be that helpful for a projection if you don’t have confidence in your last 2 or 3 years’ results.


#5    Rally (Chone)      2007/12/06 (Thu) @ 20:00

Dan, no diss taken.

I think I explained exactly how I came up with those ratings, but feel free to email me if I didn’t.  Anyway, it’s good to see somebody else looking at this, and to see whether we get similar results.

As to line drives: with detailed ball location data like MGL has from STATS, you could look at how many line drives are caught in a certain area.  Working from Retrosheet, I know how many line drives a shortstop caught, and how many were fielded by the left and center fielders.  I don’t feel comfortable estimating opportunities from that, so I stick with groundballs for infielders.  My groundball data is similar: I know how many a SS fields, how many go to the LF or CF, and how many become infield hits or errors.  Of course, some groundball singles to center are impossible for any SS to field, but I don’t know which.

I guess I look at it this way: at least I know it’s on the ground, so the error can only be in one dimension.  With line drives, it could be to the right of second base, or it could be 25 feet over his head.
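The ground-ball bookkeeping Rally describes can be sketched as below. The field names and the `reachable_share` parameter are hypothetical: Retrosheet records that a grounder went through to the left or center fielder, but not whether any shortstop could have reached it, so some assumed fraction of those singles has to be charged as opportunities.

```python
from dataclasses import dataclass


@dataclass
class ShortstopGroundballs:
    """Hypothetical Retrosheet-style counts for one shortstop's grounders."""
    outs: int           # grounders the SS turned into outs
    errors: int         # grounders the SS misplayed
    infield_hits: int   # grounders the SS reached but could not convert
    singles_to_lf: int  # ground-ball singles through to the left fielder
    singles_to_cf: int  # ground-ball singles through to the center fielder


def out_rate(c: ShortstopGroundballs, reachable_share: float = 0.5) -> float:
    """Outs per estimated opportunity.

    reachable_share is an assumed fraction of the singles through to LF/CF
    that are charged to the shortstop as opportunities.
    """
    through = reachable_share * (c.singles_to_lf + c.singles_to_cf)
    opportunities = c.outs + c.errors + c.infield_hits + through
    return c.outs / opportunities


# Made-up counts for illustration only.
ss = ShortstopGroundballs(outs=400, errors=15, infield_hits=25,
                          singles_to_lf=60, singles_to_cf=60)
print(round(out_rate(ss), 3))                       # → 0.8
print(round(out_rate(ss, reachable_share=1.0), 3))  # → 0.714
```

The choice of `reachable_share` moves the estimate a lot, which is the uncertainty Rally describes; with line drives, the same guess would have to be made in two dimensions.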


#6    Guy      2007/12/07 (Fri) @ 14:48

Dan has team numbers up at his site now.  CO ranks first at +93 outs, FL is last at -140 (these are outs, not runs).  I think they reveal a serious problem with including LDs in the estimates for infielders, which is that some teams’ pitchers give up more LDs (and/or fewer GBs/IFs) than others.  Consider CO and FL:  using THT data, I estimate that for CO infielders 26.8% of their chances were LDs, compared to 30.7% for FL infielders.  Assuming that 9% of LDs become outs by infielders, compared to 65% of non-LDs (estimated from numbers in Dan’s article), that means the 4-pt. higher LD incidence for FL accounts for about 38% of the reported difference between them and the Rockies IFs.  Presumably, individual players may face even greater differences in the distribution of BIP types faced, skewing individual player ratings as well.

So I think Dan needs either to exclude the LDs, or find a way to control for the LD/GB/IF distribution that players and teams face.
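Guy’s back-of-the-envelope estimate can be reproduced as below. The 9% and 65% conversion rates and the 26.8%/30.7% line-drive shares come from his comment (he notes in #7 that both shares run a bit high); the number of infield chances per team is not given, so `chances` is a hypothetical input. Back-calculating, roughly 4,050 chances per team is what makes the LD mix explain about 38% of the 233-out gap.

```python
LD_OUT_RATE = 0.09     # share of LDs turned into outs by infielders (Guy's estimate)
OTHER_OUT_RATE = 0.65  # share of non-LD chances turned into IF outs (Guy's estimate)


def expected_if_out_rate(ld_share: float) -> float:
    """Expected infield outs per chance, given the share of chances that are LDs."""
    return ld_share * LD_OUT_RATE + (1 - ld_share) * OTHER_OUT_RATE


co_rate = expected_if_out_rate(0.268)  # Colorado infielders
fl_rate = expected_if_out_rate(0.307)  # Florida infielders
gap_per_chance = co_rate - fl_rate     # outs per chance explained by LD mix alone

REPORTED_GAP = 93 - (-140)             # CO at +93 outs vs. FL at -140 outs


def share_of_gap(chances: float) -> float:
    """Fraction of the reported CO-FL gap explained by LD mix,
    for a hypothetical number of infield chances per team."""
    return gap_per_chance * chances / REPORTED_GAP


print(round(gap_per_chance, 5))      # → 0.02184
print(round(share_of_gap(4050), 2))  # → 0.38
```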


#7    Guy      2007/12/07 (Fri) @ 14:52

Oops—I failed to account for the fact that some LDs become OF outs, and so won’t count as opportunities for IFs.  So my 26.8% and 30.7% estimates are both a bit too high.  But this shouldn’t change the spread between the two teams much, so I think the larger point stands.


#8    Rally      2007/12/07 (Fri) @ 18:43

I’ve created a link to my 2005-2006 ratings here:

http://home.comcast.net/~briankaat/statsite.html


#9          2007/12/08 (Sat) @ 10:46

Dan,

Thanks for mentioning DRA.  Your system looks excellent.

Rally,

On your website, you say the following:

“For the 2003-2006 the batted ball type is reasonably complete, and it tells you who fielded every out or hit. Its enough to make a pretty good, though certainly not perfect system.

“For the older seasons its a lot tougher. Batted ball type is generally available only for outs, and a lot of the time the position that fielded hits is incomplete.”

Dan and Rally--for which seasons is the Retrosheet data good enough to apply Dan’s method and Rally’s ‘optimal’ version of Total Zone?


#10    Guy      2007/12/08 (Sat) @ 14:49

Having gone back to Dan’s original article, I see that he DOES adjust expected outs for the distribution of LD/GB/IF.  So the concern I raised in #6/7 shouldn’t be a problem.  However, there is a pretty high correlation (-0.47) between a team’s SFR and the LD/(LD+GB) ratio its pitchers allowed.  With one year of data, that could be a fluke and/or a function of my looking at BIS rather than Retrosheet data to estimate the LD proportion of IF opportunities.  But it’s large enough that it seems worth checking to see if it occurs in other years as well.

Another check would be to look at the correlation between a player’s rating and that of the other 3 IFs on his team.  There should be little or no correlation, so if you find one then probably something about the distribution of BIP is impacting all the players on that team.


#11    Rally      2007/12/08 (Sat) @ 20:17

Michael, I was able to get all the data I need for TotalZone for 2003-2006.  2000 to 2002 is incomplete, and 1999 is missing from Retrosheet’s files.  I think for 1993-98 I could do TotalZone and then some: the data for those years has Project Scoresheet hit locations, so I could really do a poor man’s UZR for them.  But so many projects, so little time.  I’m doing some stuff for the Hardball Times 2008 Preview and have a deadline at the end of the month.

I tried working on it today, but Studes and the gang distracted me by having my THT Annual show up on my doorstep.  Tango has an article about another approach (or approaches) to using Retrosheet to measure defense.  Check it out; the cool thing is there’s no reason it wouldn’t work for every year Retrosheet has data.


#12          2007/12/08 (Sat) @ 22:29

Rally,

Thanks.  I know you’re probably all sick and tired of hearing this, but I really am writing the d*mn book applying DRA to rate the best fielders from 1893-2006.  I should have all the actual ratings by year end.  (I’ve already done the shortstop chapter, as well as the chapters explaining the system.) It’s been a lot of work, particularly developing a system for outfielders before 1957 and testing it for contemporary fielders (that is, using the limited pre-1957-type data on contemporary outfielders and seeing how well they match with zone-type systems).

I don’t want to overstate in my book DRA’s accuracy relative to other systems not reliant on proprietary zone data.  I have incorporated more Retrosheet data into DRA, which solves the problems at third base and to some extent in right field.  The correlation with Plus/Minus in the infield and PMR/UZR (averaged) in the outfield is now .76 without any edits, though as always, only including full-time fielders who played at least two full-time seasons between 2003-05.  As before, the standard deviation in runs saved under DRA is virtually identical to that under P/M or PMR/UZR.  As Mitchel rightly points out, zone systems should be better for smaller samples.

It would seem that your Total Zone and/or Dan Fox’s system could be best for years from 2003 onward, and, of course, if you have Project Scoresheet data for 1993-98, you can do a good zone-type system for those years.

Looking forward to reading Tom’s article in THT Annual.


#13    Rally      2007/12/11 (Tue) @ 10:17

Chris Dial alerted me to new Retrosheet data, and I spent last night cranking out the 2007 TotalZones.  Here are the players at each position with at least 100 chances last year.  I plan on writing an article on the methodology and publishing some split data in the future.

http://home.comcast.net/~briankaat/tz2007.xls


#14    Guy      2007/12/13 (Thu) @ 16:00

Dan Fox has made some revisions to SFR, improving the correlation with UZR and bringing its variance more in line with other metrics:
http://www.baseballprospectus.com/article.php?articleid=6990.


#15    studes      2007/12/13 (Thu) @ 16:14

Rally (or anyone else):

Something I’ve wanted to analyze is the impact of Coors Field on infielder fielding.  One of the things they do at Coors to keep offense down is let the grass grow long, and this has to help Tulowitzki.  The question I have is how much.

We don’t get RZR data from BIS broken out by home/away.  Is that something you or others could look at?


#16    Rally      2007/12/13 (Thu) @ 17:33

No problem, Studes.  I use one combined park factor for all infielders: simply the % of groundballs turned into outs at home vs. on the road.  When I get home I’ll post the groundball factor for Coors.  Actually, I might as well post the factor for all teams.
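The infield park factor described above can be sketched as the difference between the home and road rates of grounders turned into outs. Expressing it as a difference rather than a ratio, and charging half of a fielder’s chances to his home park, are assumptions made for illustration; neither detail is spelled out in the thread. The Colorado rates in the usage lines are the 2003-07 figures Rally posts in #17.

```python
def gb_park_factor(home_rate: float, road_rate: float) -> float:
    """Home minus road grounder out rate for a team and its opponents."""
    return home_rate - road_rate


def park_adjusted_rate(observed: float, factor: float, home_share: float = 0.5) -> float:
    """Strip the home-park effect from a fielder's grounder out rate,
    assuming (hypothetically) that home_share of his chances came at home."""
    return observed - home_share * factor


# Colorado and opponents, 2003-07: .751 at home, .754 on the road.
coors = gb_park_factor(0.751, 0.754)
print(round(coors, 3))                               # → -0.003
# A park rated -.017 lifts a made-up .700 fielder by less than one point.
print(round(park_adjusted_rate(0.700, -0.017), 4))  # → 0.7085
```

Under these assumptions, even a -.017 park is worth less than one percentage point of out rate to a shortstop, which fits the conclusion that Yankee Stadium does not rescue Jeter’s numbers.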


#17    Rally      2007/12/13 (Thu) @ 20:41

For Colorado and their opponents, 2003 to 2007, the out rate was .751 at home and .754 on the road.

The infields that seem to help most in turning grounders into outs are Cincinnati, Oakland, and San Diego, with a +.018 rating.  The Yankees are at the other end (-.017), but that’s not enough of an adjustment to prevent Jeter from ranking as one of the worst in the league.


#18    tangotiger      2007/12/13 (Thu) @ 20:57

In the THT Annual, I also find that Yankee Stadium is tough on SS.  And I concur with Rally that it’s nowhere near enough to get Jeter out of the big negative, especially considering that I included all of Jeter’s “peak” years in there.


#19    studes      2007/12/13 (Thu) @ 21:42

Thanks, Rally.  As I understand it, the long grass in COL was a recent development.  Can you look at the breakout for 2007 only?


#20    studes      2007/12/13 (Thu) @ 21:48

Well, I just realized that we actually get that info from BIS, in a different form.  Looking at 2007 only, 78.5% of COL pitchers’ GB were fielded for outs, vs. 76.0% away from home.

I need to think through how that would have affected Tulo’s stats.


#21    studes      2007/12/13 (Thu) @ 22:16

Clarifying, that 78.5% is the figure for groundballs at Coors.


#22    Rally      2007/12/13 (Thu) @ 22:20

Here is my shortstop data, with some analysis:
http://lanaheimangelfan.blogspot.com/2007/12/2007-total-zone-shortstops.html#links


#23    studes      2007/12/13 (Thu) @ 22:31

Looking at 2007 only, the Rockies had the 7th largest home GB advantage, not an overwhelmingly big factor.  Interestingly, the average team did better at home: 74.3% vs. 73.5% away from home.

Even more interesting: the team that had the biggest home advantage was the Pirates, a whopping 75.4% vs. 69%.  Seems like a one-year fluke.


#24    david smyth      2007/12/14 (Fri) @ 20:30

Here are some numbers for SS in 2007, in +/- runs/162G, range only (no GDP, etc.). This is a simple system I’ve been playing around with for a while. How do these numbers look, compared to the usual data-intensive PBP stats?

Tulowitzki, +36
J Wilson, +23
McDonald, +22
Everett, +22
Pena, +7
Crosby, +7
Hardy, +6
Rollins, +5
Renteria, +5
Bartlett, +5
Vizquel, +4
Greene, +4
Tejada, +3
Reyes, +1
Peralta, +1
Furcal, 0
Cabrera, -2
Drew, -2
Lugo, -2
Eckstein, -3
Theriot, -5
Harris, -6
Young, -9
Uribe, -10
Betancourt, -11
Gonzalez, -11
Lopez, -17
Jeter, -19
Ramirez, -20
Guillen, -22


#25    tangotiger      2007/12/14 (Fri) @ 22:09

Comparing to here:
http://www.tangotiger.net/scouting/pos2007_SS.html

It’s a generally reasonable list.  The big pluses and big minuses deserve their numbers, while the smaller pluses/minuses are more or less on the right side of the ledger.


#26    Tangotiger      2007/12/27 (Thu) @ 18:36

Dan has updates on players.  I’ll provide the general link to all his fielding work:

http://danagonistes.blogspot.com/search/label/Defense


#27    Tangotiger      2008/01/24 (Thu) @ 16:45

Updates, including a downloadable version, which includes the retroId, making it easy to link to my UZR database.

http://danagonistes.blogspot.com/2008/01/sfr-v10.html


#28    tangotiger      2008/02/01 (Fri) @ 23:36

Downloadable minor league data:
http://danagonistes.blogspot.com/2008/01/outfield-defense-redux.html


#29    Tangotiger      2008/02/14 (Thu) @ 15:21

In similar spirit to MGL’s UZR, and as a blueprint for what everyone should be doing, Dan Fox presents complete results of his fielding system, 2003-07, including providing a retroID:
http://danagonistes.blogspot.com/2008/02/defending-in-wide-open-spaces.html

(Look for link at the bottom.)

***

You guys have made my Retro fielding database the #1 priority once the MLB season starts.  So, between Dan, Rally, and me, we should be in a great position to come to a consensus.


#30    tangotiger      2008/02/21 (Thu) @ 22:27

Dan gives us more data than we can handle:

http://danagonistes.blogspot.com/2008/02/maybe-he-is-but-maybin-he-isnt.html


#31    Tangotiger      2008/03/31 (Mon) @ 10:40

1988-1998:

http://danagonistes.blogspot.com/2008/03/historical-sfr.html


#32    Tangotiger      2008/04/17 (Thu) @ 10:41

More data for you:
http://danagonistes.blogspot.com/2008/04/get-your-sfr-data.html

