THE BOOK--Playing The Percentages In Baseball

Friday, June 01, 2007

UZR, 2007, complete list

By Tangotiger, 11:37 AM

Here you go, courtesy of MGL.

expOuts: number of outs made by an average fielder, given this fielder’s
- ball in play distribution (location, trajectory, hardness of hit)
- park
- pitcher’s GB/FB tendency
- runners on base and outs in inning

G: expOuts divided by the average number of expOuts per game for that position.  For example, the average LF makes 2 outs per game.  If Manny has 86 expOuts, then he’s got around 43 “games.”
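A minimal sketch of the G conversion, using the example's assumption that the average LF records about 2 outs per game (the function name is illustrative):

```python
def fielding_games(exp_outs: float, avg_exp_outs_per_game: float) -> float:
    """Convert a fielder's expected outs into 'games' at his position."""
    return exp_outs / avg_exp_outs_per_game

# Manny: 86 expOuts in LF, where the average LF makes about 2 outs per game
manny_g = fielding_games(86, 2.0)
print(manny_g)  # 43.0
```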

UPDATE:
And here’s the complete UZR, 2003-2007:
http://www.tangotiger.net/mgl/UZR0307.zip


#1    Rally      2007/06/01 (Fri) @ 13:27

Awesome.  Thanks for bringing this back.


#2    Tangotiger      2007/06/01 (Fri) @ 14:24

John Dewan, one week ago, posted this:
http://actasports.com/sows.php

Adam Everett, Hou +11
Tony Pena, KC +9
John McDonald, Tor +7
Julio Lugo, Bos +5
Troy Tulowitzki, Col +5
J.J. Hardy, Mil +5
Vizquel is at a respectable +2 so far while last year’s American League Gold Glover, Derek Jeter, is at -9, second worst in MLB at shortstop to Hanley Ramirez at -10.

MGL and Dewan agree on all of them within 4 runs.  The average difference is about 2 runs.

Seems like they both see things the same way.


#3    Mike Green      2007/06/01 (Fri) @ 14:29

Thanks.  I guess the Dodgers are going to have to move Jeff Kent.  He’s had a great run, but there does come a time…


#4    JinAZ      2007/06/01 (Fri) @ 14:41

Thanks for this!  Just in time for some Reds stuff I was about to do. -j


#5    Tangotiger      2007/06/01 (Fri) @ 14:52

Some Mariner fans don’t believe Betancourt’s rating.  It should be pointed out that MGL has him at +1 for range, and -4 for errors.  (Kent is worst in the league at -5 for errors.)

Betancourt has 12 errors in 413 IP (equivalent of 46 games).  Per 162 GP, that’s 42 errors.  That’s pretty horrible.

I don’t see any reason to doubt the -4 runs on errors.
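The per-162 proration above, as a sketch that assumes 9 innings per game (which is how 413 IP becomes roughly 46 games); the function name is illustrative:

```python
def errors_per_162(errors: int, innings: float) -> float:
    """Prorate an error total to a 162-game season, treating 9 innings as one game."""
    games = innings / 9
    return errors / games * 162

print(round(errors_per_162(12, 413)))  # 42
```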


#6    Jeff Sullivan      2007/06/01 (Fri) @ 15:05

I don’t think anyone’s questioning Betancourt’s error value - his throws have been awful this year. The issue seems to be his +1 range, since the general consensus is that he covers as much ground as any shortstop in baseball.


#7    Tangotiger      2007/06/01 (Fri) @ 15:41

http://sportsillustrated.cnn.com/baseball/mlb/stats/2007/fielding/ml_6_byCHANCES.html

Betancourt has 165 “balls in zone of responsibility” (chances) in 413 IP.

I don’t know what the heck is wrong with the CNNSI site (it lists all players, not just SS), but Adam Everett is right there with 164 chances in 416 IP.

So, both guys have had about an equal number of balls to handle.  (We can’t tell from this list if one guy faced tougher balls, but that’s what UZR is for.)  Anyway, Betancourt has 144 assists, and Everett has 156.  On this basis alone, Everett made 12 more plays, or 10 runs.

If you do in-zone chances x ZR, Everett is ahead by 6 plays on the in-zone balls.  On the out-of-zone plays, Everett is also ahead.

A .830 ZR seems to be around league average for a SS.  UZR seems to reflect whatever data is being recorded.  That is, Betancourt’s numbers are around league average, and Everett is around +10 over league average.

It’ll be interesting to see if Mariner fans are still going to be high on him, as he marches toward Robin Yount’s 44 E in 1975, or Jose Offerman’s 42 E in 1992.

Ozzie Smith’s career high in E (25) still allowed him to be better than league average in E per chances.  In fact, in every season (from age 23 to age 41, except age 40), Ozzie Smith made fewer E per chance than the league average SS.  And at age 40, he was 1 error worse than average.

So, we have to ask: why aren’t the numbers backing up Betancourt’s reputation at this point?


#8    Edgar for Pres      2007/06/01 (Fri) @ 15:50

Another M’s fan.  I’ll grant you that Betancourt won’t be a legit SS gold glove until he can stop sailing throws.  I am very surprised to see Beltre at zero runs.  In my mind he might be the best defensive 3B (probably some bias) but he’s got to be at least quite a bit above average.


#9    Tangotiger      2007/06/01 (Fri) @ 16:25

UZR loves Beltre, historically.

I do want to point out something: Scott Rolen was forecast by everyone to slug .500, while right now, he’s under .400.  Scott Rolen, barring injury, *is* a true .500 slugger, even if he’s “showing” .390.  (Of course, we need to revise his .500 daily to incorporate his .390, but it won’t change as much as you’d think.)

So, even if Beltre is showing 0 runs right now, he is still an above average fielder.


#10    philly      2007/06/01 (Fri) @ 16:53

Speaking of Rolen… does the fact that his UZR is still so good impact our offensive expectations the rest of the way?

I think Rolen will end up with a fine year as he’s been pretty hot the last couple of weeks anyway.  That hot stretch + his good defense (by observation) has led me to believe that he was mostly just in a terrible slump and not in the throes of injury-related decline.

But in general, could we say that if we have two groups of underperforming hitters and half are playing to defensive projections and half are not, that we can be more confident that the former will bounce back? 

Or would the in-season data just be too noisy?


#11    Tangotiger      2007/06/01 (Fri) @ 16:56

I updated the main blog entry.  You can now download the full UZR, 2003-2007.


#12    Tangotiger      2007/06/01 (Fri) @ 17:03

philly, that’s a good question.

After all, one can infer, to a point, that the poor production can be related to injury.  If a guy is underperforming (say by 2 SD) in offense *and* in defense, that might tell you something.  If he’s +2 SD in one case, and -2 SD in the other case, that might be less likely due to injury.

As for sample size, we know what the binomial is.  If a guy has a “true” .400 OBP, and he’s currently at .300 after 200 PA, 1 SD = .035.  So, that .300 performance is 3 SD below.  If his true OBP was .370, his performance would be 2 SD from there.

So, a bad performance does alter your perception, but, you’ll still stick closer to what you originally thought when the season started.
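The binomial arithmetic above can be checked with a short sketch.  It treats each PA as an independent trial at the player's true OBP, which is the simplifying assumption behind the 1 SD = .035 figure; the function names are illustrative.

```python
import math

def binomial_sd(p: float, n: int) -> float:
    """One standard deviation of an observed rate over n trials when the true rate is p."""
    return math.sqrt(p * (1 - p) / n)

def z_score(observed: float, true_rate: float, n: int) -> float:
    """Number of SDs between the observed rate and the assumed true rate."""
    return (observed - true_rate) / binomial_sd(true_rate, n)

# A .300 OBP after 200 PA, measured against two possible "true" OBPs
for true_obp in (0.400, 0.370):
    print(f"true {true_obp:.3f}: 1 SD = {binomial_sd(true_obp, 200):.3f}, "
          f"z = {z_score(0.300, true_obp, 200):.2f}")
```

Against a true .400 the .300 sits nearly 3 SD low; against a true .370 it is only about 2 SD low.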


#13    SG      2007/06/01 (Fri) @ 17:21

A .830 ZR seems to be around league average for a SS.

Through yesterday, I’ve got the AL average SS at a ZR of .831, NL average at .839.  I think you have to separate the leagues, because I think that balls in play by pitchers tend to skew some infield numbers.


#14    Chris Miller      2007/06/01 (Fri) @ 17:44

#13, pitchers (as hitters) hit more GB than DHs do, and those GB are probably (but I can’t verify at the moment) easier to field, so maybe that’s why.  It’d probably be easy to figure out with Retrosheet data.  I’m not sure what the y-t-y ZR has been for different infield positions though, so maybe it’s just being caused by early-season noise.


#15    Tangotiger      2007/06/01 (Fri) @ 17:58

I did my old matching exercise: I looked at guys who played both LF and RF (weighted by the lesser of their games) and saw how their UZRs compared.

Of the 6549 matching games, the UZR for LF was +0.8, while for RF it was -0.2.  This difference, 1.0 runs, implies that you have slightly weaker fielders in LF, for 2003-2007. 

I then compared CF to each of LF (5026 games) and RF (4048 games).  The average CF is +12 runs above the average LF and RF.

So, in terms of “position difficulty” or whatever you want to call it:
CF: 12 runs
RF: 0 runs
LF: 0 runs
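The exact procedure isn't spelled out beyond "weighted by the lesser of their games," so here is a minimal sketch of that weighting, assuming each player contributes his UZR per 150 games at each position.  The units, class name, and player data are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DualPlayer:
    """One player who appeared at both positions (all values hypothetical)."""
    name: str
    games_a: float      # games at position A (e.g. LF)
    uzr150_a: float     # UZR per 150 games at A
    games_b: float      # games at position B (e.g. RF)
    uzr150_b: float     # UZR per 150 games at B

def matched_comparison(players):
    """Weighted average UZR/150 at A and B, weighting each player by min(games_a, games_b)."""
    total_w = sum_a = sum_b = 0.0
    for p in players:
        w = min(p.games_a, p.games_b)
        total_w += w
        sum_a += w * p.uzr150_a
        sum_b += w * p.uzr150_b
    return sum_a / total_w, sum_b / total_w, total_w

# Hypothetical players, not real UZR data
players = [
    DualPlayer("Player X", 120, 5.0, 40, 2.0),
    DualPlayer("Player Y", 30, -4.0, 90, -6.0),
]
lf_avg, rf_avg, matched_games = matched_comparison(players)
print(f"LF {lf_avg:+.1f}, RF {rf_avg:+.1f}, gap {lf_avg - rf_avg:.1f} over {matched_games:.0f} games")
```

If the same players grade out better at A than at B, the inference is that the regular population at A is the weaker group, which is how the +0.8 vs -0.2 split is read above.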

***

Next up is the 2B/SS/3B.  The average SS is 4 runs better than the average 2B (based on 4056 games), and 3 runs better than the average 3B (2338 games).  The average 2B is 3.5 runs better than the average 3B (3147 games). 

This is the typical inconsistent pattern I always get.  From the SS perspective, the average 2B and 3B fielders are equals.

From the 2B perspective, they are at the midpoint between SS (above) and 3B (below).

From the 3B perspective, 2B and SS are equally above them.

I end up resolving it this way:
SS: 4
2B: 2
3B: 0
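The inconsistency is resolved by judgment here.  One alternative, not the method used above, is a weighted least-squares fit: pin 3B at 0 and choose SS and 2B values that best match all three pairwise gaps, weighting each gap by its matched games.  A stdlib-only sketch using the numbers above (the function name is illustrative):

```python
def reconcile_infield(g_ss_2b, w_ss_2b, g_ss_3b, w_ss_3b, g_2b_3b, w_2b_3b):
    """Weighted least squares for SS and 2B ratings, with 3B pinned at 0.

    Minimizes  w_ss_2b*(SS - 2B - g_ss_2b)**2 + w_ss_3b*(SS - g_ss_3b)**2
             + w_2b_3b*(2B - g_2b_3b)**2
    by solving the 2x2 normal equations with Cramer's rule.
    """
    a11 = w_ss_2b + w_ss_3b
    a12 = -w_ss_2b                      # symmetric off-diagonal term
    a22 = w_ss_2b + w_2b_3b
    b1 = w_ss_2b * g_ss_2b + w_ss_3b * g_ss_3b
    b2 = -w_ss_2b * g_ss_2b + w_2b_3b * g_2b_3b
    det = a11 * a22 - a12 * a12
    ss = (b1 * a22 - a12 * b2) / det
    second_base = (a11 * b2 - a12 * b1) / det
    return ss, second_base

# Gaps (runs) and matched games from the comparisons above
ss, second_base = reconcile_infield(4.0, 4056, 3.0, 2338, 3.5, 3147)
print(f"SS {ss:+.1f}, 2B {second_base:+.1f}, 3B +0.0")
```

With these inputs the fit lands near SS +4.9 and 2B +2.1, the same ordering as the 4/2/0 resolution, with a slightly larger SS edge.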

***

The 1B is only 5 runs behind the average 2B/SS/3B.

The 1B is also 8 runs behind the average OF.

***

This next part fails the smell test.

Fixing 1B at 0 gives us the following:
SS: 7
2B: 5
3B: 3
1B: 0

CF: 16
LF: 4
RF: 4
1B: 0

Subtracting 6 from everyone, we get:
CF: 10
SS: 1
2B: -1
RF: -2
LF: -2
3B: -3
1B: -6

I know, I know.  It doesn’t make any sense.

***

The things that make the most sense are the intra-OF comps.  In short, when comparing the CF to the corner OF positions, give an extra +1 win to the CF.  This puts them all on the same scale.

The SS is barely ahead of either 2B or 3B, which certainly seems wrong.  We’d expect a much larger gap.  This may be due to selective sampling.  The average UZR of the 2B and 3B who also play SS is 0.  This means that you don’t really have poor fielding 2B or 3B who move over to SS, where they’d be really exposed.

***

Then you have the issue of using 1B as a centering point between the other IF and OF.  It sure makes it look like the average OF and average IF are similar.

I also looked at 2B+3B and how they did in LF+RF.  On average, they posted roughly the same UZR (-3 in IF and OF).  Of course, the movement only goes one way: IF play OF, and not the other way around.  So, there may be some familiarity/experience issue here.  We could grant, say, a 3-5 run factor for this.

***

So, what we can come up with is this:
SS: 7
2B: 2
3B: 0

CF: 8
LF: -4
RF: -4

1B: -9

So, IF-wise, we give a little boost to the SS.
OF-wise, we keep them relatively the same (CF, 12 runs ahead of the corners).

IF-OF, we make LF/RF 5 runs worse than 2B/3B.

In short, we get these positional-adjustments
SS/CF: +0.5 wins
2B/3B: 0 wins
LF/RF: -0.5 wins
1B: -1 win

We’d give catchers +1 win to balance it all out.


#16    traced.out      2007/06/01 (Fri) @ 18:28

Wow, I had no idea Lowell and Drew were doing so poorly - haven’t they always been stellar?


#17    Anthony      2007/06/01 (Fri) @ 19:05

Just so you know, in the UZR spreadsheet, Griffey is listed as “Griffey, Ken” for 2005-2007 and “Griffey Jr., Ken” for 2003-2004. There end up being two lines for him on the totals sheet (1984 and 1995, embarrassingly enough). Looks like combining them will push him past Manny for worst total runs.


#18    MGL      2007/06/01 (Fri) @ 19:58

For those that are “surprised” at some of the 07 ratings, remember that it is based on an embarrassingly small sample of performance.  Let your eyes and brain focus on the total runs column rather than the runs per 150.  Looking at non-regressed UZR runs per 150 is like looking at a column that says 80 HR’s per season for A-Rod, and 55 HR per season for Hardy, or whatever it is.

Given the small sample, I really doubt that you can infer much of anything about overall health.  Plus batting health is not necessarily the same as fielding health of course.  But even if it were, and I am sure they are strongly correlated, I wouldn’t make much of the fact, for example, that Rolen has been good (presumably) defensively, and bad offensively.

Also, about Betancourt, keep in mind that while we think, and it makes intuitive sense, that a fielder’s error rate is a strong skill, there is a very weak y-t-y correlation in error rates per chance.  If someone can do a quick y-t-y correlation, separately for IF and OF, that would be great.


#19    tangotiger      2007/06/01 (Fri) @ 21:11

Anthony/17: thanks.  I normalized the data now, so that won’t be an issue.  I’ll republish on Monday.


#20    tangotiger      2007/06/01 (Fri) @ 21:26

Applying the positional adjustments noted earlier (+5 runs to SS, CF, -5 to LF/RF, -10 to 1B), here are the top 20 and bottom 10 in total runs, 2003-2007, unregressed:

runs_pos_neutral runs_pos_neutral_per150G Name
129 37 Everett, Adam
79 19 Beltran, Carlos
78 25 Counsell, Craig
69 27 Sizemore, Grady
66 17 Wells, Vernon
59 18 Polanco, Placido
59 20 Rowand, Aaron
58 16 Rolen, Scott
58 16 Cameron, Mike
56 16 Uribe, Juan
54 20 Perez, Neifi
53 15 Hunter, Torii
53 16 Matthews, Gary
53 12 Beltre, Adrian
52 15 Feliz, Pedro
52 18 Patterson, Corey
52 14 Eckstein, David
52 12 Hudson, Orlando
51 19 Ellis, Mark
51 16 Crisp, Coco

***

-58 -16 Matsui, Hideki
-60 -18 Wigginton, Ty
-60 -32 Cantu, Jorge
-64 -31 Pena, Wily Mo
-64 -17 Berkman, Lance
-67 -18 Sexson, Richie
-71 -18 Dunn, Adam
-73 -16 Young, Michael
-109 -41 Griffey Jr., Ken
-128 -33 Ramirez, Manny

***

Jeter is -8 per 150G, relative to the average fielder (not SS).

As you can see, UZR thinks that Polanco is 26 runs above Jeter, fielding-wise.  Regressed, that’s probably a bit over 20 runs.  That’s the equivalent of 35 OBP points and 50 SLG points.


#21    Rally      2007/06/02 (Sat) @ 00:11

I’ve added up the team totals here and compared to the plus minus on THT.

http://mvn.com/mlb-stats/2007/06/01/uzr-data-for-the-last-4-13-years/


#22    studes      2007/06/02 (Sat) @ 09:38

Truly awesome.  Thank you, mgl.

What do you think is up with the Cubs’ center field?  They have been very positive each of the past three years, despite having a different center fielder out there each year.  (or, a bunch this year alone, each with a positive UZR).  Pierre had his best year in Wrigley.  Is there some weird park effect going on, perhaps?

I believe you already include park effects in UZR, don’t you?  So maybe it’s just a random fluke.


#23    Rally      2007/06/02 (Sat) @ 10:30

This year, it’s the Cubs’ whole outfield.  Even Cliff Floyd, though Floyd has rated surprisingly well by a lot of systems the last few years.


#24    Vlad      2007/06/02 (Sat) @ 11:38

No Ron Paulino in the spreadsheet? Or am I just looking right past him?


#25    Pizza Cutter      2007/06/02 (Sat) @ 11:59

As sort of requested by MGL/18: Intra-class correlations (read it like y-t-y, only it’s a better method) over all four complete years (2003-2006) for Runs/150G, limited to players who appeared in > 10 games per position-year.

Everyone: .377
Infielders: .364
Outfielders: .383

1B: .246
2B: .440
3B: .241
SS: .521
LF: .466
CF: .412
RF: .249


#26    MGL      2007/06/02 (Sat) @ 12:44

PC, thanks.  I was actually referring to the error runs only (fielding plus non-fielding, or each one separately).


#27    Pizza Cutter      2007/06/02 (Sat) @ 13:38

Oops… same parameters as above in #25 for fielding errors.

Everyone: .244
Infielders: .266
Outfielders: .156

For “other” errors.

Everyone: .074
Infielders: .047
Outfielders: .092


#28    studes      2007/06/02 (Sat) @ 14:20

This year, it’s the Cubs’ whole outfield.

In the past, too.  Jeromy Burnitz, for one example, had his only good year when he was in right for the Cubs.  According to the BIS data I’ve analyzed in the past, Wrigley isn’t a good place for a flyball pitcher (and not only because of the home runs).  I wonder what’s up?


#29    MGL      2007/06/02 (Sat) @ 15:48

Probably the wind.  The park factors should really make everyone’s UZR at any one park “fair.” IOW, the park adjustments should make all players’ home UZR roughly equal to their road UZR, minus any home park advantage in fielding.  Unless perhaps there is a large park factor at Wrigley because of the wind and I regressed it too much towards 1.0.  I don’t think so, as it is one of the few parks that has not had any changes for many, many years so that I use lots of data for the park factors and consequently would not regress them very much.  I have to check though.


#30    tangotiger      2007/06/02 (Sat) @ 16:22

Pizza: assuming you didn’t merge players like Abreu Phi/NYY in 2006, I get that the average number of G is either:
a. 57
b. 28

a. is just the straight average
b. is 1/average(1/g) ... this is usually the way Andy calculates it… it makes sense the way he explains it

Anyway, Pizza calculates the r=.38.  Remembering that r = G/(G+x), then x = 93 or 46, depending which of the two ways you calculate the mean.

This means, to regress 50% toward the mean, you need 46 games or 93 games.  (When I typically run this, I usually get close to 100 games)
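The two averages and the regression constant can be sketched as below.  `regression_constant` simply rearranges r = G/(G+x); the list of games is hypothetical and only shows how far the harmonic mean (option b) can sit below the straight mean.

```python
from statistics import harmonic_mean, mean

def regression_constant(r: float, n: float) -> float:
    """Solve r = n / (n + x) for x: the sample size at which you regress 50% toward the mean."""
    return n * (1 - r) / r

# Hypothetical games per player-season
games = [10, 20, 40, 150]
print(mean(games), round(harmonic_mean(games), 1))

# Pizza's r = .38 with the two ways of averaging G quoted above
for avg_g in (57, 28):
    print(avg_g, round(regression_constant(0.38, avg_g)))
```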

Note the following though: a game at SS tells you more than a game at 1B.  That’s why I convert a “game” to “BIP”, by using the following BIP/g multipliers:
SS, 2B: 5
CF, 3B: 4
LF, RF: 3
1B: 2 or 3

Pizza, can you rerun, but this time using the BIP approximation as I’m showing it here for your n, rather than G?


#31    Mike      2007/06/02 (Sat) @ 16:31

I know it’s early and I’m not basing any conclusions on the first third or so of the season...but J.D. Drew has looked awful at the plate and on defense.  About the only thing he does well is see a lot of pitches and walk.  I swear he’s rolled over to 1st or 2nd the last 20 at-bats...and MGL’s rating on his defense seems to be about right.  What happened to the all-star defense we were supposed to get from this guy?  He’s come up short on several plays that would appear catchable by a very good defensive outfielder.  The only thing I can maybe conclude is that he’s playing hurt.


#32    MGL      2007/06/02 (Sat) @ 17:50

Tango, I don’t know how you get the BIP/games, but I think they are too high.  For purposes of a binomial, we want to define chances as something potentially catchable.  I don’t think those numbers are catchable balls.  For example, for every ball that is barely or not catchable to a particular fielder or fielding position, we are artificially increasing the sample size with no benefit to the robustness of the data.

Also, if you figure some kind of correlation and you are using players who have anywhere from 20 to 500 chances for each data point, that is problematic.  What do you use for your number of chances?  The average in the group?  That is not good, as the ones who are way less than average are going to bring down your “r” more than the ones that are above average, I think.  I’m not sure how to handle that though, other than to use a threshold that is high enough that most of the players in your dataset have near the same number of chances.  IOW, use mostly full-time players.  Of course, if you do that, you greatly reduce your sample of data points and the confidence interval around your resultant “r” is very wide.  I’m confident that there is a statistical way to maximize the integrity of the “r” for a given number of chances per data point and also to maximize the number of data points used, but I don’t know what that is.


#33    jto      2007/06/02 (Sat) @ 19:00

Tango,
When you say someone is +20 runs defensively compared to Jeter, how did you make the translation to 35 points of OBP and 50 points of slugging?  Thanks.


#34    MGL      2007/06/02 (Sat) @ 19:57

jto, I think there is a recent thread below which talks about how to translate OBP and other rate metrics into runs.  And of course, if that 20 runs is per 150 defensive games, you would use around 630 PA for the average player (150 games) I think.


#35    MGL      2007/06/02 (Sat) @ 20:00

There is absolutely no doubt in my mind that Jeter is THE most overrated player in baseball and it is not even close.  He is certainly one of the most overpaid, even for a Yankee, perhaps THE most overpaid, not including players who got a big contract and then got hurt.


#36    MGL      2007/06/02 (Sat) @ 20:50

I did a regular linear regression and y-t-y correlation for errors.  I used all players with at least 100 chances per season and regressed season 99 on 00, 01 on 02, 03 on 04, and 05 on 06.

Here are the “r’s”:

All errors (ROE and other)

All fielders: .301 (N=613) Average # chances=259
IF: .303 (367) chances=265
OF: .293 (246) chances=251

ROE errors only

All fielders: .249 (N=613) Average # chances=259
IF: .249 (367) chances=265
OF: .221 (246) chances=251

“Other” (Non-ROE) errors only

All fielders: .060 (N=613) Average # chances=259
IF: .080 (367) chances=265
OF: .055 (246) chances=251

So the 50% regression for fielding errors is around 600 chances or around 2 seasons for a middle fielder and 3 seasons for a corner fielder.
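Applying the rearrangement x = n(1 - r)/r from #30 to the correlations and average chances in this table suggests where the “around 600 chances” figure comes from: the all-errors rows (r around .30 at roughly 260 chances) land near 600, while the ROE-only rows imply closer to 800.  A sketch, assuming the r = n/(n+x) model holds here:

```python
def chances_for_half_regression(r: float, n: float) -> float:
    """Rearrange r = n / (n + x) to get x, the chances at which you regress 50%."""
    return n * (1 - r) / r

# (label, y-t-y r, average chances per season) from the table above
ROWS = [
    ("All errors: all fielders", 0.301, 259),
    ("All errors: IF", 0.303, 265),
    ("All errors: OF", 0.293, 251),
    ("ROE only: all fielders", 0.249, 259),
    ("ROE only: IF", 0.249, 265),
    ("ROE only: OF", 0.221, 251),
]

results = {label: chances_for_half_regression(r, n) for label, r, n in ROWS}
for label, x in results.items():
    print(f"{label}: about {x:.0f} chances")
```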


#37    Tangotiger      2007/06/04 (Mon) @ 10:42

MGL, I don’t see the problem with doing regressions of data points of differing sample sizes.  Just weight them by that much.  I’m pretty sure Andy goes through the exercise in the Appendix or the end of the Toolshed chapter.
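The weighting isn't spelled out.  One common reading of “weight them by that much” is a weighted Pearson correlation, where each player's pair of seasons counts in proportion to his chances.  A sketch under that assumption (not necessarily the procedure in the Toolshed appendix; the sample data are hypothetical):

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation in which each (x, y) pair counts in proportion to w.

    Here w would be each player's number of chances, so part-timers with few
    chances pull the estimate around less than regulars do.
    """
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

# Hypothetical error rates in consecutive seasons and each player's chances
year1 = [0.020, 0.035, 0.015, 0.028]
year2 = [0.024, 0.030, 0.018, 0.031]
chances = [450, 120, 380, 60]
print(round(weighted_pearson(year1, year2, chances), 3))
```

With equal weights this reduces to the ordinary Pearson r.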

***

As for my using the total number of BIP to determine chances (when I really should be excluding the noise ones—no man’s land and extremely routine), sure I could do that.  But, then I’d have to justify why I use what I use.

For example, right now, I’m using 5 chances per game for SS and 2B.  Even though they both get the same number of outs, 2B probably have more routine outs than SS.  You could say that perhaps I should use 4.5 for SS and 4.0 for 2B.


#38    Mike Green      2007/06/04 (Mon) @ 11:53

Tango,

The matching exercise is probably not the optimal way to perform positional adjustments, because of age-related position change.  Shortstops move to second and third disproportionately when they are older.  Comparing a 25 year old shortstop season with a 31 year old second or third baseman season has its difficulties.  Similarly, with outfield to first base conversions. 

In the results, I have the most difficulty with the centerfield vs. second base adjustments.


#39    tangotiger      2007/06/04 (Mon) @ 12:17

The dataset I’m looking at is 2003-2007.  Therefore, age-related issues would be mitigated.

As well, most position switches are not “switches”, but playing multiple positions.  This would definitely be the case for LF/RF, and mostly be the case for CF/cornerOF.

I’m sure if I look at the list of SS/2B players, it would be a small sample of players that played SS mostly exclusively in 2003/2004, and then played 2B mostly exclusively 2005/2006.

Even then, as I said, the age-based dropoff of the 27/28 yr old SS to then become the 29/30 yr old 2B would be fairly tiny.

It is a fair point though, one that deserves hard numbers.


#40    Rally      2007/06/04 (Mon) @ 14:49

You can handle an aging bias by only looking at players who play multiple positions in one season.


#41    Tangotiger      2007/06/04 (Mon) @ 15:33

I figured out the average year for each of the multiple position players, by weighting them by number of games.  For example, if you had 100 games at 2B in 2003, and 0 through 2007, your “year” was 2003.  If on the other hand you had an equal number of games in each year, your “year” would be 2005.

The largest gap was with the SS/1B dual positions.  The average year of the dual players at SS was 2003.9, while they were in season 2005.3 at 1B.  Makes perfect sense, since we obviously don’t expect any 1B to SS moves.

The next largest gap was SS/CF, with the dual players being in SS in year 2004.3 and in CF in year 2005.4. 

Every other dual position players differed by 0.5 years or less.

2B/SS specifically differed by 0.1 years.  In effect, these players played both positions at the same time.

2B/3B were the same age.

3B/SS were of 0.4 years difference, in the direction you would expect.

The LF/RF, CF/LF, CF/RF matches were all within 0.2 years.

1B were all a little bit older than at other positions (i.e., players shift to 1B), except for CF (thank you, Darin Erstad).


#42    MGL      2007/06/05 (Tue) @ 05:54

You could do age adjustments, which might be -2 runs a year or so, IIRC.  That might be for 300 or so balls fielded, so you would have to adjust that for different positions since an age adjustment should be a “rate” adjustment of course.

Tango, I think that a player’s opps should be the number of balls that the best fielder would make given his distribution of BIP.  I think that is reasonable.  Would you agree?  From the standpoint of the denominator of a binomial, trying to figure opps is a tricky thing.  I’d have to think about it a little more.  Maybe one of the statisticians can chime in.  For a batter, we know that when a ball is in play, those are the opps where a hit is a success.  What is the equivalent for a fielder?  Obviously fielding is a conglomeration of many different opps with different “p’s” and different N’s.  An easy ball to field is obviously one opp with a p of near 1.0.  The more difficult ones to field are also opps with p’s anywhere from less than 1.0 to .01 or even less.  Actually given that, opps would be a lot more balls than the best fielder would field.  A lot more.  But we have 2 issues.  One is that I don’t know how to represent the numerator and denominator for a binomial which is really a conglomeration of a lot of different binomials, as I explain above.  Although I suppose we could say the same thing about hitting. Each type of pitch actually has a different p.  Anyway, the second issue is trying to figure out all of the N’s combined for a total number of opps.

So let’s say that we knew a fielder had 10 balls that he made .9 of the time, 10 balls that he made .50 of the time, and 10 balls that he made .1 of the time.  How would we figure the random SD of his result?  Can we use one binomial of N=30 and p=.5 (for a SD of .0913)?  Would the rigorous result be close to that?  If we take the individual variances of all 3 separate binomials and average them and then take the square root, we get .09915, a little higher than just using one binomial with p =.5.  That sounds about right to me.

Is that the way to figure out the SD of a number of trials, N, where p may be different for each trial?  Figure the mean variance for all the trials and then take the square root of that?
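One way to answer this, assuming each ball is an independent trial with its own p: the variance of total outs is the sum of p(1 - p) over all balls (a Poisson binomial), which is the same per-line calculation used in #44.  Dividing its square root by N gives the SD of the rate.  A sketch on the three-group example above (the function name is illustrative):

```python
import math

def rate_sd(groups):
    """SD of a fielder's success rate when trials have different probabilities.

    groups: list of (n_trials, p) pairs. Trials are treated as independent, so the
    variance of the number of outs is the sum of n * p * (1 - p) over the groups.
    """
    n_total = sum(n for n, _ in groups)
    var_outs = sum(n * p * (1 - p) for n, p in groups)
    return math.sqrt(var_outs) / n_total

mixed = [(10, 0.9), (10, 0.5), (10, 0.1)]
single = [(30, 0.5)]
print(f"{rate_sd(mixed):.4f} {rate_sd(single):.4f}")
```

Under that assumption the mixed case comes out lower (about .069) than the single p = .5 binomial (about .091), because p(1 - p) is largest at p = .5 and shrinks as the individual p's move away from it.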


#43    tangotiger      2007/06/05 (Tue) @ 10:24

MGL, that’s a good point.  Let’s start off with this sample for SS:

BIP ZR outs
100 0.99999 99.999
100 0.95000 95
100 0.90000 90
100 0.80000 80
100 0.70000 70
100 0.50000 50
100 0.30000 30
100 0.10000 10
3600 0.00001 0.036

The first line says: for a particular set of 100 BIP, the average SS will convert it to an out 99.999% of the time.

The total comes out to 4400 BIP, and 525.035 outs.

How much is 1 SD using the above?

So, can one of you geniuses solve for realBIP:
realZR = 525.035/realBIP
oneSD = sqrt(realZR*(1-realZR)/realBIP)

realBIP should be somewhere between 650 and 800 or so.


#44    Tangotiger      2007/06/05 (Tue) @ 11:02

I’m just putzing around, but I get a realBIP=658, which implies a realZR of .800.

I calculated ZR*(1-ZR)*BIP for each list item, added them all up (I get 106), and took the square root (1 SD = 10.3).

In order to match that, I made realBIP=658.

Does this make sense?
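The #43 puzzle has a closed-form answer if you match both the mean and the variance of the outs: a binomial with realBIP trials and rate outs/realBIP has variance outs x (1 - outs/realBIP), so realBIP = outs^2 / (outs - variance).  A sketch of that calculation (the helper name is illustrative):

```python
import math

# The hypothetical SS distribution from #43: (balls in play, average SS out rate)
SAMPLE = [
    (100, 0.99999), (100, 0.95), (100, 0.90), (100, 0.80), (100, 0.70),
    (100, 0.50), (100, 0.30), (100, 0.10), (3600, 0.00001),
]

def effective_chances(rows):
    """Find the single-binomial sample size that reproduces the mixed distribution.

    Expected outs = sum(n * p); variance of outs = sum(n * p * (1 - p)).
    A binomial with N trials and rate outs / N has variance outs * (1 - outs / N);
    setting that equal to the summed variance gives N = outs**2 / (outs - var).
    """
    outs = sum(n * p for n, p in rows)
    var = sum(n * p * (1 - p) for n, p in rows)
    real_bip = outs ** 2 / (outs - var)
    return outs, math.sqrt(var), real_bip, outs / real_bip

outs, sd, real_bip, real_zr = effective_chances(SAMPLE)
print(f"outs={outs:.3f} SD={sd:.1f} realBIP={real_bip:.0f} realZR={real_zr:.3f}")
```

This gives a realBIP of about 658 and a realZR just under .80, matching the numbers above.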


#45    Tangotiger      2007/06/05 (Tue) @ 11:34

I tried a different distribution of balls hit:
BIP ZR
130 0.99999
120 0.95000
110 0.90000
100 0.80000
95 0.70000
90 0.50000
85 0.30000
80 0.10000
3590 0.00001

Removing the last line, the average ZR is .701.  So, I think this distribution makes more sense.

Anyway, the realZR now comes out to .826.

Coincidentally or not, it seems that the DER for a team (around .700), if you take its square root, will give you the realZR we are interested in.

I tried a few different combinations, and they came out pretty close.
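The same mean-and-variance match applied to this second distribution, alongside the square root of the playable-ball average ZR, as a sketch (the variable names are illustrative):

```python
import math

# The second hypothetical SS distribution: (balls in play, average SS out rate)
ROWS = [
    (130, 0.99999), (120, 0.95), (110, 0.90), (100, 0.80), (95, 0.70),
    (90, 0.50), (85, 0.30), (80, 0.10), (3590, 0.00001),
]

outs = sum(n * p for n, p in ROWS)
var = sum(n * p * (1 - p) for n, p in ROWS)
real_bip = outs ** 2 / (outs - var)   # binomial N with the same mean and variance
real_zr = outs / real_bip

# Average out rate on the "playable" balls (every row except the last one)
playable = ROWS[:-1]
avg_zr = sum(n * p for n, p in playable) / sum(n for n, _ in playable)

print(f"realZR={real_zr:.3f}  avgZR={avg_zr:.3f}  sqrt(avgZR)={math.sqrt(avg_zr):.3f}")
```

Here realZR comes out near .825 and the square root of .701 near .837, the kind of “pretty close” agreement described above.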

***

To get back to MGL’s point then, if the “expected outs” is say 500, then the “probable BIP” should be 500/.83 = 600.


#46    Tangotiger      2007/06/05 (Tue) @ 16:45

In light of our discussion, this is what I propose.

pos minOuts actOuts neutOuts maxOuts
2 2.35 3.00 2.92 3.49
3 1.09 1.28 1.36 1.62
4 2.00 2.48 2.48 2.97
5 1.53 1.90 1.90 2.28
6 2.08 2.63 2.59 3.09
7 1.53 1.86 1.90 2.27
8 2.00 2.53 2.49 2.98
9 1.62 1.97 2.01 2.40

The SS line says the following:
The fewest number of plays that the worst fielder would make at SS is 2.08 outs.
The 2003-2007 MLB average, according to UZR, was 2.63 outs.
An average MLB player from any position, would record 2.59 at SS.
The maximum outs to make at SS is 3.09.

(I threw in a line for catchers.)

Where did I get all these numbers?  This is what the chart looks like if I divide it by “maxOuts”:
pos minZR actZR neutZR maxZR
2 0.673 0.859 0.837 1.000
3 0.673 0.789 0.837 1.000
4 0.673 0.837 0.837 1.000
5 0.673 0.837 0.837 1.000
6 0.673 0.849 0.837 1.000
7 0.673 0.820 0.837 1.000
8 0.673 0.850 0.837 1.000
9 0.673 0.821 0.837 1.000

As you can see, “maxOuts” is really opportunities, or chances, or BIP.  minZR was fixed at 67.3%, meaning that the worst fielder will always make at least 67.3% of the plays.  neutZR was fixed at 83.7% (the square root of 70%), meaning that if you plop the average fielder (say, Willie F Bloomquist) at any position, he’ll make 83.7% of the “available” plays.

actZR is the actual MLB average for that position.  As you can see, the best fielders are at C, SS, CF, and the worst are at 1B.

Now, here’s the first or second chart, but converted into runs:
pos minRuns actRuns neutRuns maxRuns
2 -74 10 0 74
3 -34 -10 0 34
4 -63 0 0 63
5 -48 0 0 48
6 -65 5 0 65
7 -48 -5 0 48
8 -63 5 0 63
9 -51 -5 0 51

This tells us that the actual MLB SS is 5 runs better than the average MLB fielder.  The worst possible fielder in MLB would be -65 runs at SS.  The perfect fielder would be +65 runs.

minRuns and maxRuns are opposites.

If you focus on the 1B line, this tells us the following: the worst possible 1B (-34 runs relative to neutral fielder) is 24 runs worse than the actual MLB average 1B (-10 runs relative to neutral fielder).

So, every fielder gets his own “neutral ZR” number, like Everett might be .920, and Frank Thomas might be .700.  Then, using those numbers, we can figure out how they would be at any position.

You’d probably want to knock off a certain amount, say .030 or .050, if you have a LH playing 2B/SS/3B.
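The conversion from outs per game to runs isn't stated.  A sketch that reproduces the maxRuns and actRuns columns to the nearest run, assuming 0.8 runs per play (roughly in line with the 12 plays ≈ 10 runs in #7) over 162 games.  Both of those constants are assumptions, and minRuns is taken as the negative of maxRuns, as stated above.

```python
# Constants from the post; RUNS_PER_PLAY and GAMES are assumptions, not stated above
NEUT_ZR = 0.837        # sqrt(.70): share of available plays a neutral fielder makes
MIN_ZR = 0.673         # share the worst fielder still makes
RUNS_PER_PLAY = 0.8    # assumed run value of one extra out made
GAMES = 162            # assumed season length

# position number -> (actOuts, neutOuts) per game, from the first chart
# (2 = C, 3 = 1B, 4 = 2B, 5 = 3B, 6 = SS, 7 = LF, 8 = CF, 9 = RF)
POSITIONS = {
    2: (3.00, 2.92), 3: (1.28, 1.36), 4: (2.48, 2.48), 5: (1.90, 1.90),
    6: (2.63, 2.59), 7: (1.86, 1.90), 8: (2.53, 2.49), 9: (1.97, 2.01),
}

def outs_to_runs(outs_per_game: float, neut_outs_per_game: float) -> float:
    """Season runs relative to a neutral fielder at the same position."""
    return (outs_per_game - neut_outs_per_game) * GAMES * RUNS_PER_PLAY

for pos, (act_outs, neut_outs) in POSITIONS.items():
    max_outs = neut_outs / NEUT_ZR      # maxOuts: the available plays per game
    min_outs = MIN_ZR * max_outs
    max_runs = round(outs_to_runs(max_outs, neut_outs))
    act_runs = round(outs_to_runs(act_outs, neut_outs))
    # minRuns is taken as the mirror image of maxRuns, as the post states
    print(pos, f"{min_outs:.2f}", f"{max_outs:.2f}", -max_runs, act_runs, max_runs)
```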

