THE BOOK--Playing The Percentages In Baseball


Friday, November 07, 2008

Do fielders with good range commit more errors?

By , 02:10 AM

No.


You have heard this for a long time, usually as a reason why errors (fielding percentage) should not be used to evaluate fielding, as they often are, either directly (the “best defensive team” is always the team with the fewest errors), indirectly (as in the GG awards, where the voters, if they look at any stat at all, look at fielding percentage), or simply in conversation and in the mainstream media:

“Players with more range get to more balls; therefore they are likely to make more errors.  You can’t boot what you can’t reach.”

Poppycock.  Guess what?  Players with more range make fewer errors.  As to why, I don’t know.  It could be that they are simply more athletic or it could be that they get more love from the official scorers.

Here is the data:

I looked at all fielders who were at +3 or greater in range runs (UZR), in total and not per anything, and put them in one bucket - the good range fielders.  I put those with -3 range runs or less in the bad range bucket.

Then for each bucket, I looked at their “error runs” which is just error rate turned into runs above/below average using the error rate of the average fielder at that position as the baseline.
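The bucketing described above can be sketched roughly as follows. This is only an illustration: the `Fielder` record, the function names, the made-up sample players, and the choice to pool runs by games before scaling to 150 are all assumptions, since the post does not say exactly how the per-150 figures were aggregated.

```python
from dataclasses import dataclass

@dataclass
class Fielder:
    games: float
    range_runs: float  # total range runs (UZR), not per anything
    error_runs: float  # total error runs vs. the average fielder at the position

def per_150(total_runs, total_games):
    """Scale a run total to a 150-game rate."""
    return 150.0 * total_runs / total_games

def bucket_summary(fielders, cutoff=3.0):
    """Split fielders into good/bad range buckets and summarize each one.

    Assumption: rates are pooled over all games in the bucket.
    """
    buckets = {
        "good": [f for f in fielders if f.range_runs >= cutoff],
        "bad": [f for f in fielders if f.range_runs <= -cutoff],
    }
    summary = {}
    for name, group in buckets.items():
        games = sum(f.games for f in group)
        if games == 0:
            summary[name] = {"n": len(group), "range_per_150": None, "error_per_150": None}
            continue
        summary[name] = {
            "n": len(group),
            "range_per_150": per_150(sum(f.range_runs for f in group), games),
            "error_per_150": per_150(sum(f.error_runs for f in group), games),
        }
    return summary

# Made-up players, purely to show the mechanics (not real data).
players = [
    Fielder(150, 12.0, 1.5),
    Fielder(75, 5.0, 0.0),
    Fielder(150, -10.0, -1.0),
    Fielder(50, -4.0, 0.5),
    Fielder(100, 1.0, 0.2),  # between -3 and +3, so in neither bucket
]
summary = bucket_summary(players)
print(summary)
```

Swapping the roles of `range_runs` and `error_runs` in the bucket test gives the "turned around" version, with buckets built on error runs instead.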

2008

Good range N=144 players

+15.1 range runs per 150 games
+1.1 error runs per 150

Bad range N=136 players

-15.1 range runs per 150
-.32 error runs per 150

If we turn that around and create 2 buckets according to error runs (essentially error rate or fielding percentage), here is what we get:

Good fielding perc. N=61 players

+3.8 error runs per 150
+.6 range runs per 150 games

Bad fielding perc. N=61 players

-6.2 error runs per 150
-5.6 range runs per 150

So a player with a lot of errors tends to have lousy range.  A player with few errors, however, tends to have only slightly better than average range (and of course the players with average errors are also slightly above average in range).

Bottom line is that a player’s fielding percentage, especially if it is poor, actually tells us a lot about his range, and it is NOT in the direction that a lot of people were thinking…

#1    2008/11/07 (Fri) @ 02:46

Interesting stuff MGL.
What do the results look like if you separate the infielders from the outfielders?


#2    2008/11/07 (Fri) @ 05:27

For some reason, I actually laughed when I scrolled down and saw “No.”

This does seem counter-intuitive, and IIRC Moneyball had a little section about errors where it said (in effect) that more range = more errors. Not sure if there was any research backing the statement or not.


#3    2008/11/07 (Fri) @ 08:21

Might this be biased by the number of outfielders vs infielders in each bucket?

Could you just correlate range and fielding percentage and figure it out that way?


#4    2008/11/07 (Fri) @ 16:53

Shoot this down if it’s screwy, but this is what occurred to me:

Fielders are positioned to cover the ‘hot spots’ where the hit ball is most likely to pass through. The edge of a fielder’s range is, perhaps, where most errors originate, but it is also where balls are least likely to be hit (and for a fielder with wider range, the likelihood is less still). Balls hit to the edge of the range of a fielder with poor range will be fielded cleanly by a fielder with good range, so usually no error occurs. If a fielder with poor range has those chances, he is more likely to make an error (or make a questionable play that is more likely to be scored an error). Balls on the edge of the range of the fielder with more range may result in an error, but they will occur less often, or the fielder will be given the benefit of the doubt (“He made a great play even to get to the ball, so that’s gotta be a hit”).


#5    MGL    2008/11/07 (Fri) @ 22:32

Matt, sure that could be what is going on.  Absolutely.

In baseball, or at least in things that tend to be nice and linear, doing a “poor man’s correlation,” as I did (where you essentially look at 2 pairs of data points rather than all of them, as you would in a regular regression/correlation), usually yields the same results as a regular/full correlation/regression, but not always.

Splitting by outfield/infield makes sense to give us a better idea as to what is going on.  Of course, OF’ers make very few errors, so there is a lot of noise in anything you do regarding outfield errors.  As well, I would think there is only a very small skill component in OF errors, as opposed to IF errors.

OK, here goes:

IF only

Good range infielders N=72

+13 range
+1.6 errors

Bad range infielders N=72

-13 range
-.51 errors

OF only

Good range OF’ers N=72

+19 range
+.52 errors

Bad range OF N=64

-18 range
-.1 errors

Of course, the range of OF error runs is small as compared to IF runs, but there still seems to be a “correlation” among OF, between errors and range.

Let’s turn it around again:

OF only

Low errors OF’ers N=31

+.65 range
+1.7 errors

High errors OF N=58

-2.8 range
-3 errors

IF only

Low errors IF’ers N=56

+2.7 range
+3.9 errors

High errors IF N=53

-4.8 range
-6.4 errors

So the relationship is still there.  And it seems to be stronger for the IF, but again, that might just be because OF don’t make many errors, so the range of “error runs” is small no matter what, with a lot of noise, because I think there is a lot of luck in an OF error (like dropping a fly ball or overrunning a ground ball or line drive hit).

Here are the actual correlations ("r")

Including everyone with at least one chance, I regressed range runs on error runs.

All players
r=.153

IF
r=.241

OF
r=.028

If I restrict the data to players with at least 50 games:

All players
r=.143

IF
r=.180

OF
r=.045

Not much help there!

BTW, that is why I like to do the “poor man’s correlation,” where we look at the two extreme groups.  If we just look at the “r” or especially the “r-squared,” we might conclude that there is little to no relationship.  But if you look at the “poor man’s correlation,” you can see that there is a pretty big difference/relationship, or whatever you want to call it.  You can probably tell the same thing if you look at the regression equation, which I don’t have to show.
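To make the contrast concrete, here is a small sketch that computes both a regular Pearson “r” and the “poor man’s correlation” (the mean of one variable in the two extreme buckets of the other) on the same data. The data is synthetic and every number in it (the 0.05 slope, the noise levels, the 2000 players, the +/-3 cutoff applied to simulated values) is an assumption chosen only to produce a weak “r”; none of it comes from the actual UZR data.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def extreme_group_means(xs, ys, cutoff=3.0):
    """Mean of y among players with x >= cutoff and among those with x <= -cutoff."""
    high = [y for x, y in zip(xs, ys) if x >= cutoff]
    low = [y for x, y in zip(xs, ys) if x <= -cutoff]
    return sum(high) / len(high), sum(low) / len(low)

# Synthetic data (assumption): a small true slope buried in a lot of noise.
rng = random.Random(0)
range_runs = [rng.gauss(0, 10) for _ in range(2000)]
error_runs = [0.05 * x + rng.gauss(0, 3) for x in range_runs]

r = pearson_r(range_runs, error_runs)
good_mean, bad_mean = extreme_group_means(range_runs, error_runs)

print(f"r = {r:.3f}, r-squared = {r * r:.3f}")
print(f"good range bucket: {good_mean:+.2f} error runs")
print(f"bad range bucket:  {bad_mean:+.2f} error runs")
```

The point of the sketch is only that a small “r” and a visible gap between the two extreme buckets can come from the same data.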


#6    Pizza Cutter    2008/11/08 (Sat) @ 13:00

I pulled out my OPA! numbers for shortstops from 2004-2007.  The “range rating” (per GB) was only slightly correlated with the “hands” rating.  (r = .192, min 100 GB in the SS’s general direction.) Range and arm correlated at .113 (same parameters.)


#7    MGL    2008/11/08 (Sat) @ 15:25

Your .192 “r” is in the same ballpark as mine.  It seems like not much, but, as I said, if you do the “poor man’s correlation,” you can see that it is significant.  I suppose you would “see” the same thing if you looked at the regression equation which would tell you, for example, if I have a fielder who is -10 in range, what can I expect in error runs.  Probably something like -.5 runs.  And if I have a fielder who is +10 in range, probably something like +1 in error runs.  To me, that is a significant finding.
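As a quick worked example, the two rough guesses above (-10 range giving about -.5 error runs, +10 range giving about +1) pin down a straight line. The sketch below just computes the slope and intercept that those two illustrative points imply; it is not a fitted regression, and the function names are hypothetical.

```python
def line_through(p1, p2):
    """Slope and intercept of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# The two "probably something like" guesses from the comment (not fitted values).
slope, intercept = line_through((-10.0, -0.5), (10.0, 1.0))

def expected_error_runs(range_runs):
    """Error runs implied by the two-point line for a given range-runs value."""
    return intercept + slope * range_runs

print(f"slope = {slope}, intercept = {intercept}")
print(f"a +5 range fielder: {expected_error_runs(5.0):+.3f} error runs")
```

Under those guesses, each run of range would be worth a bit under a tenth of a run in error runs.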

In any case, at the very least, it heartily debunks the prevalent notion or assumption that fielders with greater range make more errors.  That was my original intent (not to debunk, but to examine the question of course).


#8    Pizza Cutter    2008/11/08 (Sat) @ 16:39

Guys with good range probably do get charged with more errors in the absolute sense (Smith made 9 errors last year), because they are better fielders and will get more playing time at the position. 

The low correlation suggests though that the two skills are largely independent, and so there probably are some guys who, because of their bad hands, make more errors because of their good range.  It’s just not a 1:1 relationship.


#9    MGL    2008/11/08 (Sat) @ 20:42

“Guys with good range probably do get charged with more errors in the absolute sense (Smith made 9 errors last year), because they are better fielders and will get more playing time at the position.”

Huh?  You don’t think we’re talking about error rate?

The correlation suggests that the two skills ARE related.


#10    MGL    2008/11/08 (Sat) @ 20:47

This type of correlation tells you nothing (well, a little actually) about the relationship between the skills, other than as a suggestion that there is or is not a relationship. 

It tells you about the relationship between one small sample of performance (with respect to that skill) and another.  You, as a “statistician,” should know that!

Remember that in these kinds of regressions/correlations, the “r” is a function of the relationship between the underlying skill or skills, and the underlying sample sizes.  Suppose I found a gigantic “r”, say .9, but then looked at only samples of 10 games or less.  What would my “r” be then?  Probably .1 or .2.  How can the relationship between the skills change when the only thing that has changed is the underlying sample size?  It can’t; the “r” changes because when we do regressions like these we are comparing a sample of the skill (not the skill itself) as represented by a certain sample size.
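A small simulation can illustrate the sample-size point. Everything here is an assumption made for illustration: two skills whose true correlation is set to .9, normally distributed skills and noise, a per-game noise level of 6 skill standard deviations, and 2000 simulated players. Only the number of games per player changes between the two runs.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def observed_r(n_games, true_r=0.9, n_players=2000, per_game_noise_sd=6.0, seed=1):
    """Correlation between two observed performances over n_games per player.

    Each observation is the player's true skill plus sampling noise that
    shrinks with the square root of the number of games.
    """
    rng = random.Random(seed)
    noise_sd = per_game_noise_sd / math.sqrt(n_games)
    obs_a, obs_b = [], []
    for _ in range(n_players):
        skill_a = rng.gauss(0, 1)
        # Build a second skill whose true correlation with skill_a is true_r.
        skill_b = true_r * skill_a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        obs_a.append(skill_a + rng.gauss(0, noise_sd))
        obs_b.append(skill_b + rng.gauss(0, noise_sd))
    return pearson_r(obs_a, obs_b)

r_small = observed_r(n_games=10)
r_large = observed_r(n_games=1000)
print(f"10 games per player:   r = {r_small:.2f}")
print(f"1000 games per player: r = {r_large:.2f}")
```

The skills and their relationship are identical in both runs; only the amount of noise in each player’s observed performance differs.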


#11    Pizza Cutter    2008/11/08 (Sat) @ 22:24

There very well could be sampling error, although a good minimum inclusion criterion usually solves that problem.  As your sampling frame gets bigger (and thus, your measurements more reliable) the correlation between the two variables approaches its “true” value, in an asymptotic manner (you always get closer, but never truly touch it.) In this case, it will not likely approach 1.0.  It will in the type of split-half stuff that I usually do because split-half correlates HR rate to HR rate.

I bumped up the minimum inclusion criterion to 200 groundballs and got a range-hands correlation of .169 and a range-arm correlation of .191.  Pump it up to 300 and you get .200 and .151 (again).  At 400, it’s .167 and .112.  At 500 GBs it’s .140 and .174.

The numbers are likely fluctuating due to some selective sampling issues (as I increase the floor number, fewer players qualify, so the samples of players are not the same).  But the story is the same.  The correlation between being able to get the ball and being able to field it cleanly is a very weak (although positive) one.  Its “true” value is probably somewhere around .15.


#12    MGL    2008/11/09 (Sun) @ 03:37

Pizza, fair enough.  I guess when you use qualitative language (such as “weak”, “strong”, “good”, “bad”), you can go around in circles.  The correlation is what it is.  What is more important to me than the correlation is the regression equation - the slope.  Nice job!

