THE BOOK--Playing The Percentages In Baseball


Friday, December 01, 2006

Minor League Fielding

By Tangotiger, 10:46 AM

Genius is 1% inspiration and 99% perspiration.  We need more hard workers like Jeff Sackmann.

The run value of a play is around 0.80 runs.  I’ll post a clear illustration as to why this is so, in a few minutes.


#1    Rally    2006/12/01 (Fri) @ 14:33

If you have precise data then .8 is the right run value.

David Gassko’s Range stat is an estimation using traditional fielding stats.  Jeff Sackmann has actual gb/fb stats, so he doesn’t have to estimate chances based on team assist and putout totals, but it’s still just an estimation.

If you look at fielding for major league players, you’ll get wild results like this, with players having run values of +/-50 or +/-60.  Do that for recent years where we have PBP data, and the “adjusted range factor” stats have a much larger spread than the PBP-based numbers.

So I think it’s appropriate to use a smaller run value to bring your results into line with what we know is the talent spread from using PBP data.


#2    tangotiger    2006/12/01 (Fri) @ 14:38

Rally, what you are saying is that you want to regress first to reduce the uncertainty level of the metric, and then secondly apply the runs per play figure.  I don’t disagree, but unless it’s explicitly stated that way, I don’t think anyone else will understand.


#3    tangotiger    2006/12/01 (Fri) @ 14:40

Illustration on why to use .80 runs, not .50 runs for a fielding play:

http://www.tangotiger.net/archives/stud0247.shtml#1011


#4    Rally    2006/12/01 (Fri) @ 14:47

Yes, that is a better way of stating it.


#5    Rally    2006/12/01 (Fri) @ 14:50

And these numbers are up for all players on Sackmann’s site.

I was confused at first since his headline still says Defensive stats are coming soon!

But just go to a player page, and after his splits there’s a link for defensive stats.  Totally sweet.


#6    Anthony    2006/12/01 (Fri) @ 20:38

How reliable would you say these numbers are? How much regression is appropriate?


#7    Jeff Sackmann    2006/12/01 (Fri) @ 20:44

Thanks for the kind words. 

A couple of questions, Tango: is .8 the proper number for IF and OF?  I had thought that the number would be greater per play for outfielders, since more of those balls turn into extra-base hits.

Second, are you confident that the same values (at least roughly) apply to the minors?  I can’t think of compelling reasons why they wouldn’t, but I’m curious what your thoughts are on that.


#8    Tangotiger    2006/12/01 (Fri) @ 23:11

I think .75 for IF and .85 for OF would probably fit the bill.

As for the majors/minors, it doesn’t matter.  It’s based on the run environment.  So, for a roughly 5 RPG environment, it would be .80 runs.


#9    Peter Jensen    2006/12/02 (Sat) @ 00:19

The minor league stats for hit balls are like the major league stats and retrosheet’s hit ball types in that they indicate who ultimately fields the ball rather than the fielder that best had a chance to make a play on the ball.  This moves a lot of GB’s to the outfield instead of being included as balls that an infielder might have made a play on.  This can be adjusted for, of course, but it does mean that you will get wacky numbers if you try and use metrics that were developed on STATS or BIS data.


#10    MGL    2006/12/02 (Sat) @ 04:07

Hmmm.  I am not really familiar with the methodology (Gassko’s) although I read his work a while ago.

You definitely want to use the “correct” run value for a play, which is indeed around .80 runs and THEN do your regressions afterward.

Generally when you are using less granular data and then make estimates on what you really want, the spread is usually smaller and not larger.

So I am not sure why this methodology yields larger spreads.  That seems counterintuitive to me.  For example, using ZR rather than UZR always comes up with smaller spreads, as it should.  If you are using non-PBP data, you are really estimating ZR and certainly not UZR so you should come up with spreads similar to that of ZR, maybe even smaller.

These numbers seem awfully large.  Of course, if you look at the “per 150” numbers and they are only based on 50 or 75 games, the spreads will automatically look large.

I’ll really have to delve into DG’s methodology to see why the spreads seem so large (maybe they aren’t - maybe they just look like that to me at first glance).

One thing I always wondered about was a comparison of AAA defense to the major leagues.  While you would expect that major league players are much better at defense (since they are partially being selected based on their defense), you have two things working significantly against that:  One, minor league players are mostly promoted based on their offense and not their defense, and two, younger is generally better when it comes to defense. I would not be surprised if the average minor league defender were as good as the average major league one. 

What do you guys think about that?


#11    Rally    2006/12/02 (Sat) @ 11:00

I’ve probably looked at every type of “adjusted zone rating” stat that can be imagined, which is what RANGE is, and the spreads are always huge, far larger than what you get with PBP data.

It’s because players get different amounts of chances to field, and it is very difficult to estimate how many chances each player gets, even if you know gb/fb and rh/lh batters faced.  The spread gets even larger when you have to estimate these.

I think one reason is there is greater variance between players in number of chances than in their ability to convert those chances, but I’d have to check and see if this is true.


#12    Rally    2006/12/02 (Sat) @ 11:35

Ok, here’s some numbers to back up what I vaguely remembered:

Take all 2006 shortstops with at least 500 innings.  Figure chances per inning, using the “CH” denominator for zone rating.  Figure every player’s chance percentage compared to the league leader.  In this case Craig Counsell is the leader, and Felipe Lopez the trailer, with only 74% as many chances per inning.

Do the same thing for zone rating.  Adam Everett is our fearless leader, and Felipe Lopez is again the trailer (since he’s such a bad SS, it’s a good thing he doesn’t get a lot of chances); he fields balls 86.7% as well as Everett.

Take the Standard deviation of each column, and I’ve got .067 for chances and .030 for zone rating.  So something like range factor is about 70% noise and 30% real.  Gassko’s RANGE corrects for some of that, but any non-PBP method faces a long uphill battle to be useful at all.
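A rough sketch of the arithmetic behind that 70/30 split, in Python. The function names are hypothetical, and treating the spread in chances as noise and the spread in zone rating as real signal is the framing of the comment above, not an established method. Comparing the two SDs directly gives roughly the quoted share; comparing variances instead, which assumes the two sources are independent, puts even more weight on the noise.

```python
# SDs quoted in the comment above (2006 shortstops, 500+ innings).
SD_CHANCES = 0.067      # spread in chances per inning, relative to the leader
SD_ZONE_RATING = 0.030  # spread in zone rating, relative to the leader


def noise_share_by_sd(sd_noise, sd_skill):
    """Share of the spread attributed to noise when SDs are compared directly."""
    return sd_noise / (sd_noise + sd_skill)


def noise_share_by_variance(sd_noise, sd_skill):
    """Share attributed to noise when variances are compared.

    Assumes the two sources are independent (an assumption of this sketch).
    """
    return sd_noise ** 2 / (sd_noise ** 2 + sd_skill ** 2)


print(round(noise_share_by_sd(SD_CHANCES, SD_ZONE_RATING), 2))        # 0.69
print(round(noise_share_by_variance(SD_CHANCES, SD_ZONE_RATING), 2))  # 0.83
```

Either way the conclusion is the same: most of the spread in a range-factor style stat comes from opportunities rather than from conversion ability.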


#13    Rally    2006/12/02 (Sat) @ 14:15

I’ve been thinking that using the denominator for ZR might lead to bad results here - since it includes balls out of zone that are fielded.  A player’s range (or lack thereof) might influence the variation in opportunities.

Luckily for us, John Dewan has an improved range factor in his Fielding Bible, so I looked at the same data for 2005, not counting plays made outside of zone.

The results were pretty much the same, std(opps) = .078, std(zr) = .033.


#14    David Gassko    2006/12/02 (Sat) @ 15:57

Mickey,

The reason Zone Rating has a small SD is because it does not properly reward good fielders who get to a lot of balls out of zone.

A non-PBP metric should have a greater SD than UZR, IF it is calculated on the individual level. Imagine the following scenario: Every player has a true talent of 0, there is no random variation in fielding performance, and STATS perfectly captures the probability of play being made on each BIP. UZR will rate everyone as 0, and will have captured the exact spread in true talent, which is also 0.

On the other hand, take the polar opposite of UZR: Range Factor. There will still be plenty of variation in the number of balls hit at every position and the difficulty of fielding them. So even if everyone is average (or maybe ESPECIALLY because everyone is average), different players will have different RFs, and the SD > 0.

This is essentially what happens. The variance of any metric is equal to SD(Performance)^2 + SD(Noise)^2, so its SD is the square root of that sum. Because there is a lot less noise in an advanced metric, its SD will be lower.
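The thought experiment above can be sketched as a small simulation. This is only an illustration under made-up assumptions: the number of players, the conversion rate, and the range of opportunities are all hypothetical, and `simulate` is not anyone's actual metric. Every fielder converts chances at exactly the same rate, yet a stat that counts plays alone still shows a spread, while one that knows each fielder's opportunities does not.

```python
import random
import statistics


def simulate(n_players=30, conversion_rate=0.8, seed=0):
    """Return (SD of raw plays made, SD of opportunity-adjusted rating).

    Every player has identical talent and there is no randomness in
    performance; only the number of opportunities differs between players.
    """
    rng = random.Random(seed)
    raw, adjusted = [], []
    for _ in range(n_players):
        opportunities = rng.randint(400, 550)    # hypothetical range of chances
        plays = conversion_rate * opportunities  # identical skill for everyone
        raw.append(plays)                        # Range Factor style: plays only
        adjusted.append(plays / opportunities - conversion_rate)  # knows chances
    return statistics.pstdev(raw), statistics.pstdev(adjusted)


raw_sd, adjusted_sd = simulate()
print(raw_sd > 0, adjusted_sd < 1e-9)
```

The raw stat's SD is pure opportunity noise here, which is the point of the example: a larger spread does not by itself mean a larger spread in talent.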

This does not apply to Davenport or Win Shares because they use a top-down system, which limits how high players can be rated. It also seems to me that Davenport may “re-distribute” ratings or something; that is, if the first and second baseman are +20 each, and the shortstop and third baseman are -20 each (after controlling for left-handed innings pitched), he’ll make the first and second baseman +10 each and the shortstop and third baseman -10 each, though I’m not sure that’s the right thing to do (I also don’t know that’s what he actually does...just a suspicion).


#15    MGL    2006/12/03 (Sun) @ 01:10

DSG,

OK, I’ll buy that but I am still not sure it applies to every situation.  I’ll have to mull it over a bit.  Your example makes perfect sense of course.  Maybe I simply have it backwards for some reason.  My thinking has not been too sharp lately.  Someone in my family is seriously ill and I am using baseball as merely a diversion these days.


#16    Chris C.    2006/12/03 (Sun) @ 13:01

For what it’s worth, I’ve developed a PBP fielding metric (similar to PMR) for minor leaguers and the spread is generally much smaller than what Jeff is presenting for the same players/seasons.


#17    2006/12/03 (Sun) @ 13:51

Is this something that you can share?  Or is it in the subscription part of your site?

How do you get this: from the game logs on MiLB that Sackmann queries, or is there a way to pay for more detailed data?


#18    Jeff Sackmann    2006/12/03 (Sun) @ 14:39

“I would not be surprised if the average minor league defender were as good as the average major league one.

What do you guys think about that?”

My preliminary findings support that.  All I have to go on as of yet is running MLEs for players who played in MLB in 05 and AAA in 06, but there’s still a substantial sample at each position.  IIRC, at no position is there more than a 2% difference in (plays made)/(est avg plays) except for CF, where it’s 3%--and AAA defenders look better!  I’m not putting a ton of stock in that particular translation (or any of them, just yet), but your reasoning (players promoted based on offense, and average age) suggests that it’s plausible.


#19    Chris C.    2006/12/03 (Sun) @ 14:41

I don’t use the exact game logs that Jeff uses, but at milb.com there is hit location data freely available for all minor league games from 2005 on. So I have two years of fielding data at the minor and major league levels. That’s also the data source I use for the batted ball charts, etc.

Anyway, if you e-mail me I can share specific results if you are interested in certain players, positions, etc. To be completely honest, seeing folks respond to Jeff’s work (which is a fine effort, by the way) is discouraging me from releasing everything publicly any time soon. So many people seem to be taking results like +44 for Ellsbury or +24 for Kouzmanoff at face value, and that tells me I really need to think hard about how to present any minor league fielding data responsibly. There are issues of small sample sizes and extreme park effects in the minor leagues that everyone really needs to consider when looking at the results.


#20    2006/12/03 (Sun) @ 15:09

Chris, I didn’t see an email link at firstinning.com, but you can reach me at rallymonkey5 - at - comcast - dot - net.

I would be interested in seeing anything you are willing to share.  I cannot get enough of this stuff.  But I’d certainly be interested in seeing data for the Angel shortstop prospects, and for Hanley Ramirez in 2005.  I have a crude defensive metric called JAARF (just another adjusted range factor), and Ramirez did not rate well in 2005.  (Before your site and Jeff’s, my defensive data came from typing in numbers from Baseball America’s almanac, which doesn’t even have innings played.)

Ramirez was really bad as a rookie by the pbp stats, but had a good reputation.  I’m wondering if that reputation was deserved or if we should have expected bad defense from him.

Jeff, have you looked at the significance of the minor league vs major league data?  I tried looking at it and found virtually none, an r of .02, from a sample of about 35 players, but I don’t have the complete set of players and was looking at range vs zone rating, and those two don’t correlate that well anyway, since to a great extent they are measuring different things.


#21    Jeff Sackmann    2006/12/03 (Sun) @ 15:45

“Jeff, have you looked at the significance of the minor league vs major league data?”

Not yet.  I’m guessing there would be very little in the sample I have to work from.  There are precious few players in the sample with more than a couple hundred innings at each level, so their individual stats are pretty meaningless.  The validity of the translations is predicated on all of that meaningless variation averaging itself out over 3000-3500 innings per position.


#22    JoeArthur    2006/12/03 (Sun) @ 21:50

Jeff,
As I understand your method from your article at Baseball Analysts, it ultimately builds upon Charlie Saeger’s ball in play expectations derived from major league data a few years ago [e.g. how likely it is that a LHB will hit a ball to the 3rd baseman]. I would expect that these ratios might vary somewhat, especially in the lower minors. Have you looked into this already? Likewise, in terms of run value, certainly you’re right that the average run value of a missed play by an outfielder should be higher than for an infielder because it is more likely to result in an extra base hit; as Tango points out, the value depends on the run environment. But because extra base hits are rarer, especially in the lower minors, I suppose that the run values are a little different too. Depending on how messy you want to get, if you want to denominate your results in runs, you might want to normalize the results so they are comparable across leagues. In a sense perhaps that’s what you’re already doing if you’re using major league values for the calculations.

Chris,
I have only glanced at the inning_hit files for the minor leagues; for the majors these files omit the locations of balls in play which became errors, and they record where the ball was picked up, not where it landed, which of course makes it more difficult to rate outfielders. I assume but don’t know that the same rules apply with the minor league data. 1) do you try to fill in the missing fielding errors when you construct your PMR-like metric? 2) caroms off a side wall or the fence can mislead about the direction in which the ball was actually hit. That would be very park-dependent and in the long-enough run would shake out in park factors. Without something like mlb.tv to use as a double check, do you have any other ways to catch limitations in the minor league data?

This is exciting work, though it will take some time to know how to interpret the results; I hope the milb.com data will remain accessible and that you both will continue to work with it. Thank you.


#23    tangotiger    2006/12/04 (Mon) @ 00:01

I don’t really see much point in using anything other than a .80 run value.  A great fielder would be +40 plays at the extreme, meaning a .75 runs per play would be +30 runs and .85 would be +34 runs.  And that’s at the very extreme.

And considering that you’ll rarely be comparing the IF to the OF, you are really splitting hairs here.

I kinda prefer the way Pinto presents it, as plays made.  Because at least he makes it rather clear that he is not looking to see if Ichiro prevents more XBH than average because he sets himself deep.

If you do go to the trouble of doing that, then an “Extended” measure, as Dewan does it, would call for a run conversion.

We don’t convert OBP to runs, do we?  We just show it as OBP, even though we can convert it as:
(OBP minus lg OBP)/1.15 = runs per PA
whereby we assume a typical distribution of XBH
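For anyone who does want the run figure, the conversion quoted above is a one-liner. A minimal sketch in Python: the 1.15 divisor comes from the comment, while the function names and the sample OBP, league OBP, and PA values are hypothetical.

```python
OBP_TO_RUNS_DIVISOR = 1.15  # divisor quoted in the comment above


def runs_per_pa(obp, league_obp):
    """Runs per plate appearance relative to league average."""
    return (obp - league_obp) / OBP_TO_RUNS_DIVISOR


def runs_above_average(obp, league_obp, plate_appearances):
    """Scale the per-PA figure to a number of plate appearances."""
    return runs_per_pa(obp, league_obp) * plate_appearances


# Hypothetical hitter: .360 OBP in a .330 league over 600 PA.
print(round(runs_above_average(0.360, 0.330, 600), 1))
```

As the comment notes, this assumes a typical distribution of extra-base hits, which is the same kind of assumption a flat runs-per-play figure makes for fielding.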


#24    JoeArthur    2006/12/04 (Mon) @ 02:16

Tango,

Of course I mildly disagree, though I suppose it depends on what accuracy you’re claiming for a particular metric, and how people intend to interpret it. People regularly argue roster construction decisions and MVP votes involving defensive value across positions. But that’s not really my point. I’m principally concerned with comparisons to fielders playing the same position.

Assuming we have the data on type of hit allowed, which Jeff and Chris do, it’s probably a loss of accuracy of plus or minus 2 or 3 runs or so to ignore it. In today’s analytic environment, I don’t think that falls all the way to the level of splitting hairs. 

I know you don’t need this spelled out, but one can imagine 2 outfielders with equivalent range; one plays a bit shallower. Because of this he takes away 7 singles on balls hit in front of him but gives up 5 more doubles over his head. He’s made two extra plays but basically broken even on run value with the outfielder who plays deeper. In this example, attending to run value of the hits allowed enables a truer judgement of defensive value than mere plays made.
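The two-outfielder example can be put into numbers with a short sketch. The hit-versus-out run values below are borrowed from comment #26 (out .30, single .475, double .776 in a 5-run environment); pairing them with this example is an assumption of the sketch, and the function name is hypothetical.

```python
# Run value of turning a hit into an out, using comment #26's figures.
SINGLE_VS_OUT = 0.30 + 0.475
DOUBLE_VS_OUT = 0.30 + 0.776
GENERIC_RUNS_PER_PLAY = 0.80  # the flat value discussed in this thread


def shallow_vs_deep(singles_saved, extra_doubles):
    """Compare the shallow fielder to the deep one two ways.

    Returns (net plays, runs using the flat per-play value,
    runs using hit-type-specific values).
    """
    net_plays = singles_saved - extra_doubles
    runs_by_plays = net_plays * GENERIC_RUNS_PER_PLAY
    runs_by_hit_type = (singles_saved * SINGLE_VS_OUT
                        - extra_doubles * DOUBLE_VS_OUT)
    return net_plays, runs_by_plays, runs_by_hit_type


plays, by_plays, by_type = shallow_vs_deep(singles_saved=7, extra_doubles=5)
print(plays, round(by_plays, 1), round(by_type, 1))
```

Counting plays alone credits the shallow fielder with two extra plays, or 1.6 runs at .80 per play, while the hit-type accounting leaves the two fielders within a small fraction of a run of each other.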

It’s polluted by park effects, but the spread between top and bottom MLB teams in extra base hits allowed is eg 110 or so. That certainly creates room for differences of 35 or more between the top and bottom teams at the CF position specifically. Worth accounting for; after park effects, for CF accounting for XBH prevented might be a little greater than just plus or minus 3 runs.


#25    tangotiger    2006/12/04 (Mon) @ 08:56

You missed my larger point when I said:

“If you do go to the trouble of doing that, then an ‘Extended’ measure, as Dewan does it, would call for a run conversion.”

So, I’m not saying not to do that. I’m saying that you *need* to do this, if you want to express it in runs, and you need to really distinguish between .70, .80, and 1.00.

But, if you are doing what Pinto is doing, looking only at plays made, without looking to see if you have extra XBH, then just using a generic .80 is fine, since you’ve got more uncertainty there, so what’s the point of trying to get really accurate?

Like I said, what Pinto does is analogous to OBP.  And since almost all non-Tango fielding comps are intra-position, it makes little sense to convert a Pinto-model to runs as a final product.  Let the analyst who needs it do that.  Presenting the OBP-equivalent (like ZR) is sufficient.


#26    Rob McQuown    2006/12/04 (Mon) @ 15:41

Without mining the retrosheet data, I would assume a breakdown of balls hit to outfielders (over the population) that are not caught to be something like:

1b: 33%
2b: 50%
3b: 17%

That would provide the following run expectancies (in a 5-run environment):

1b: (.33)(.3+.475)
2b: (.5)(.3+.776)
3b: (.17)(.3+1.070)

That would make a defensive “+” play made by an outfielder worth approximately 1.0 runs.  I agree with Tango’s point ("I’m saying that you *need* to do this, if you want to express it in runs, and you need to really distinguish between .70, .80, and 1.00"), and that if you are going to do inter-positional comparisons, you need to use something more accurate than 0.8 runs/play.


#27    tangotiger    2006/12/04 (Mon) @ 16:26

Rob, I would be surprised if your numbers are anything close to reality.  For every 6 OF hits, you expect 3 doubles, 2 singles, and 1 triple?

For every 135 hits, there are about 100 singles, 32 doubles and 3 triples.  Of the 100 singles, probably 80 of them are OF singles.  I’d guess 28 of the doubles are OF doubles, and all 3 triples being OF triples.  That gives us a breakdown: 72/25/3.  Maybe 70/27/3?  But certainly not 33/50/17.
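To see how much the hit mix matters, here is a small sketch that derives the 72/25/3 breakdown from the counts above and then applies the hit-versus-out run values from comment #26 (.30 for the out plus .475, .776, and 1.070) to both mixes. Combining the two comments this way is an assumption of the sketch, and the function names are hypothetical.

```python
# Run value of a hit relative to an out, from comment #26's figures.
RUN_VALUE_VS_OUT = {"1b": 0.30 + 0.475, "2b": 0.30 + 0.776, "3b": 0.30 + 1.070}


def mix_from_counts(counts):
    """Turn hit counts into shares of all outfield hits."""
    total = sum(counts.values())
    return {hit: n / total for hit, n in counts.items()}


def runs_per_play(mix):
    """Weighted run value of converting an average outfield hit into an out."""
    return sum(share * RUN_VALUE_VS_OUT[hit] for hit, share in mix.items())


# Outfield hits per 135 total hits, as estimated in the comment above.
estimated_mix = mix_from_counts({"1b": 80, "2b": 28, "3b": 3})
# The 33/50/17 guess from comment #26.
guessed_mix = {"1b": 0.33, "2b": 0.50, "3b": 0.17}

print({hit: round(share, 2) for hit, share in estimated_mix.items()})
print(round(runs_per_play(estimated_mix), 2))  # 0.87
print(round(runs_per_play(guessed_mix), 2))    # 1.03
```

With the singles-heavy mix, the outfield figure lands much closer to the .85 suggested earlier in the thread than to 1.0.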


#28    MGL    2006/12/05 (Tue) @ 03:25

BTW, the value of the out in modern, high offense baseball is much closer to .27 or .28 than to the .30 of years ago.  It is not that big a deal, but let’s stop using .30 as the value of the out.  I don’t have the numbers in front of me, but the value of the IF and OF hit vs. out is close to the .75 and .80 that has been mentioned.  It is easy enough to figure out of course.


#29    tangotiger    2006/12/05 (Tue) @ 11:34

In the 5.0 RPG environment, the run value of the out is -.30 runs, as shown in THE BOOK.  This is also verified by:
http://www.tangotiger.net/markov.html

If you use the default values on that page, you get a RPG of 4.96, and a run value of the out of -.304 for the non-K out and -.326 for the K.  These values are a little more extreme than expected, because my program doesn’t allow runners to be out on base, and therefore, a batting out will be more detrimental (more runners on base).  The important point though is that it establishes a baseline.

So, what do we do now?  Change the AB figure from 37 to 38.4, and change the K value from 7 to 7.36.  The RPG is now 4.55 runs.  The run value of the out goes to -.280 (non-K) and -.300 (K).

So, to change the run value of the out by about .025 runs, you need to change the run environment by about .40 runs per game.  (Obviously not a linear relationship to extend, but, that’s why we have the Markov program.)
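The sensitivity described above can be written down as a local slope. This is only a sketch: it uses the two quoted points, assumes both out values are for the non-strikeout out, and should not be extended beyond that range, since the comment itself warns the relationship is not linear. The names are hypothetical.

```python
# (runs per game, run value of the out) at the two settings quoted above.
POINT_HIGH = (4.96, -0.304)
POINT_LOW = (4.55, -0.280)


def out_value_slope(p1=POINT_HIGH, p2=POINT_LOW):
    """Change in the out's run value per one-run change in RPG (local only)."""
    return (p1[1] - p2[1]) / (p1[0] - p2[0])


def approx_out_value(rpg):
    """Linear interpolation between the two quoted points only."""
    return POINT_LOW[1] + out_value_slope() * (rpg - POINT_LOW[0])


print(round(out_value_slope(), 3))
print(round(approx_out_value(4.75), 3))
```

Between these two points the out becomes roughly .06 runs more costly per additional run per game; anything outside that range is better taken from the Markov program itself.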

