THE BOOK--Playing The Percentages In Baseball


Monday, November 23, 2009

Is UZR park adjusted

By Tangotiger, 03:04 PM

Eric makes his case:

If UZR had no park error, this estimate of staff BABIP skill would not correlate with our very reliably calculated Park Factors. But it does so, enormously (r = .47, p = 10^-15). In fact, the best predictor of what UZR thinks is staff BABIP skill is .751 * Park Factor. Which is an awful lot.

I dunno… my head was spinning quite a bit there.  I think you would just need to correlate UZR to BPro’s park factor for BIP, much like I have it here.  In that chart, we see that Coors and Fenway were fielding-unfriendly and Dodger and Yankee Stadiums were fielding-friendly.  So, when you run your correlation, if UZR has properly handled the park effect, then the correlation should be close to zero.  Eric, however, is reporting a high correlation, but ... to something.  I don’t like the way he says that if you subtract this from that, you are left with the other thing.  Luck is always part of the equation too.
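The check described above can be sketched in a few lines. This is only an illustration: the park factors and team UZR values below are made-up placeholders, not real data, and `pearson_r` is a hypothetical helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Made-up illustrative numbers, NOT real park factors or UZR totals.
park_factor_bip = [1.04, 1.03, 0.97, 0.98, 1.00, 1.01]
team_uzr = [-12.0, 5.0, 8.0, -3.0, 10.0, -6.0]

# If the park effect has been fully handled, r should sit near zero
# (up to sampling noise, since luck is always part of the equation).
r = pearson_r(park_factor_bip, team_uzr)
print(f"r = {r:.3f}")
```

With real inputs you would pair each team-season's UZR with that park's BIP factor for the same season.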

Now, the first thing that jumps out at you is that there’s no way the 2005-6 New York Yankees were both the worst fielding and best BABIP-pitching team in recent memory. They were certainly bad at the former and good at the latter, but the size of the numbers suggests that their UZR for those years was low, maybe way too low, and thus the data is giving their pitchers undeserved credit and their fielders too much blame.

Equally suspicious are the ‘06-’07 Royals, who are the opposite. The ‘03 A’s, another crazy good-fielding, bad pitching team, are also suspect.

In fact, if UZR were doing a perfect job of separating fielding from BABIP skill (which is precisely what it is attempting to do), these two tables would not correlate at all. In fact, they have a mild inverse correlation (-.18); you can predict the numbers in the second table to a mild but very significant degree by multiplying the first table by .16 and flipping the sign.

I think at the least he’s given us enough to consider in order for us (or MGL) to show that bias does not exist.  If it shows that we have an inverse correlation, then we can be pretty sure that the level of adjustment is not enough.


#1    2009/11/23 (Mon) @ 15:33

IIRC from trying to duplicate MGL’s work, the toughest things to deal with were irregular outfield areas and parks with massive year-to-year changes in expected performance (Coors.) We’d get real outliers - Manny has great range; Tulowitzki is +6 wins at shortstop (due to us using a 4-year park factor but Coors became less extreme in his rookie year).  Etc…

This is a little off-topic: one thing that I’ve been wondering is if Randy Winn’s defensive prowess in RF is real or if it’s an artifact of the shape of RF in AT&T Park.  He was a league-average CF five years ago; it’s hard to see him turning into the best RF in the league at age 33 (yet still not being good enough to play CF).


#2    Colin Wyers    2009/11/23 (Mon) @ 15:37

I just briefly skimmed it and will have more later, but… Eric is wrong. (Well - his methods are wrong. If his conclusions are right, well, that’s not because of his method.)

BABIP measures two things:

1) The performance of the defense, and
2) The distribution of BIP.

In other words, if you have two teams of equal defensive ability, you can still expect to see different BABIPs if their BIP distribution differs.

UZR purposefully focuses only on the first point, and ignores the second - it compares the observed defense with an estimate of what an average defense would do given that BIP distribution.
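To make the distinction concrete, here is a heavily simplified sketch. It is not the actual UZR method: the three batted-ball buckets, the league out rates, and the team totals are all invented for illustration. Both teams field each bucket exactly at the league rate, yet they turn different shares of balls in play into outs because their BIP mixes differ.

```python
def plays_above_average(bip_counts, outs_made, league_out_rate):
    """Observed outs minus the outs an average defense would be expected
    to make on the same mix of balls in play."""
    expected = sum(bip_counts[b] * league_out_rate[b] for b in bip_counts)
    observed = sum(outs_made[b] for b in bip_counts)
    return observed - expected

def defensive_efficiency(bip_counts, outs_made):
    """Share of all balls in play converted into outs."""
    return sum(outs_made.values()) / sum(bip_counts.values())

# Hypothetical league out rates per bucket (assumed, not from any data source).
league_out_rate = {"grounder": 0.75, "fly": 0.80, "liner": 0.25}

# Team A sees few line drives; Team B sees many. Both have 390 BIP.
team_a_bip = {"grounder": 200, "fly": 150, "liner": 40}
team_a_outs = {"grounder": 150, "fly": 120, "liner": 10}
team_b_bip = {"grounder": 160, "fly": 150, "liner": 80}
team_b_outs = {"grounder": 120, "fly": 120, "liner": 20}

for name, bip, outs in [("A", team_a_bip, team_a_outs), ("B", team_b_bip, team_b_outs)]:
    paa = plays_above_average(bip, outs, league_out_rate)
    der = defensive_efficiency(bip, outs)
    print(f"Team {name}: plays above average = {paa:+.1f}, DER = {der:.3f}")
```

The bucket-relative number is the same for both teams while the raw out rate is not, which is the gap between the two points listed above.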

Pitchers may have some control over BIP distribution, but not a lot:

http://www.hardballtimes.com/main/article/whats-a-batted-ball-to-do/

So the idea that what’s being measured here is pitcher BIP “skill” is well off-base, I think.

That said, there are some interesting things here - I’m just not sure yet what they mean. I do want to further examine the idea that there are park biases in UZR. But I don’t think they have anything to do (necessarily) with MGL’s park factors. I think they have to do with the BIS data. Read this discussion between me and Dan Turkenkopf:

http://www.hardballtimes.com/main/blog_article/hr-fb-park-factors/

I may be overplaying a hunch here, but something doesn’t smell right to me.

BenJ, if you’re around, it’d be great to hear from you on this - could you describe the process used to assign games to video scouts to score? Are games randomly assigned to scorers?


#3    2009/11/23 (Mon) @ 16:58

Colin-

BIS Scorers are assigned “randomly”.  We’re not using a random number generator, but it’s almost as effective.  Scorers have a designated number (Ex. Scorer #11) which are then rotated through different slots in the schedule.  If scorers 7 and 8 are scoring the late (west coast) games one day, they’ll be rotated to early games the next time around.  There’s some miscellaneous switching to accommodate vacation, etc. too.  In the end, everyone’s getting a good mix of every team in every park. 

We also have several different quality control methods in place to make sure that scorers are consistent with their hit locations and types.  We added some new tests this season using the hit timer to flag the batted ball data, so the 2009 data is better than ever.

There shouldn’t be any bias (in theory), but I ran some numbers to check anyways after reading Harry’s article at THT on the topic a few weeks (months?) back.  I was working on designing an appropriate significance test and haven’t gotten back to it yet.

And FYI, I don’t always see every thread here, so in the future just shoot me an email (ben-at-baseballinfosolutions.com) and I’ll be sure to get back to you there.


#4    BenJ    2009/11/23 (Mon) @ 17:10

Now that that’s out of the way… I glanced through Eric’s article.  Does he use UZR with the Outfield Arms, Double Plays, etc. all included?  Also, UZR and Runs Saved weight deep fly balls differently than soft grounders based on potential extra base hits.  PADE doesn’t. 

Without spending too much time on it, I wonder if by “park-adjusting” he’s taking out the extra base hit impact?  That doesn’t seem like something you’d want to do…


#5    BenJ    2009/11/23 (Mon) @ 17:25

One more thing for you Colin-
BIS gets an almost entirely new set of video scouts each season.  If you’re seeing the same “bias” in the same parks year after year, I can’t see how it would be related to the individual scorer.


#6    Tangotiger    2009/11/23 (Mon) @ 17:32

“BIS gets an almost entirely new set of video scouts each season”

That’s strange isn’t it?  You mean that there’s an almost 100% turnover?  The commonality then is simply the “BIS University” ?


#7    BenJ    2009/11/23 (Mon) @ 17:58

Tango/6:

Yes.  The scoring is split between full-time staff and scorers/interns.  There is a new crop of interns each season, though some have been known to return for more than one season.  The full-timers spend a month before Opening Day training the crop of newcomers and quality-checking their games so they’re ready to go once the season starts.


#8    2009/11/23 (Mon) @ 18:00

By the way, if anyone’s interested in a BIS scoring and charting internship for the 2010 season, shoot me an email and I can get you the info.  Or just find the BIS group at the winter meetings.


#9    MGL    2009/11/23 (Mon) @ 18:50

I have not read the article yet, but one of my first thoughts is that the park factors that UZR uses, while based on multi-year data, are pretty heavily regressed.  So if you are using unregressed park factors for BABIP and correlating that with UZR, you will see a positive correlation rather than zero, by definition.  Again, I did not read the article yet, so I’m not sure if that is one of the issues.
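A minimal sketch of why regression matters here. The regression fraction below is purely hypothetical (no actual value is given in this thread), and the linear shrink toward 1.00 is an assumed form, not a description of how UZR's park factors are really built.

```python
def regress_park_factor(raw_pf, regression_fraction):
    """Shrink a raw park factor toward the neutral value of 1.00.

    regression_fraction = 0 keeps the raw factor; 1 discards it entirely.
    """
    return 1.0 + (raw_pf - 1.0) * (1.0 - regression_fraction)

def unadjusted_remainder(raw_pf, regression_fraction):
    """Part of the raw park deviation left over after applying the regressed factor."""
    return raw_pf - regress_park_factor(raw_pf, regression_fraction)

# Hypothetical raw factors and a hypothetical 50% regression.
for raw in (1.10, 1.00, 0.90):
    leftover = unadjusted_remainder(raw, 0.5)
    print(f"raw {raw:.2f} -> regressed {regress_park_factor(raw, 0.5):.2f}, leftover {leftover:+.2f}")
```

The leftover equals the regression fraction times the raw deviation, so it always moves in step with the unregressed factor. Any metric adjusted with the regressed version will therefore still correlate with the raw one.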

That being said, my park factors are by no means perfect or even very good, and I am working on some changes for next year, which will be reflected in the FanGraphs UZR data of course.

And yes of course the distribution of batted balls will fluctuate for a team, even in one (or more) seasons, which is why team UZR is a much better reflection of team defense than DER.

Also, there is a year-to-year correlation (around r=.2 for pitchers with at least 500 BIP per season) for PZR for pitchers who change teams, which means that somehow pitchers have some control over their BABIP independent of the distribution of those batted balls.  IOW, if pitcher A has the exact same distribution of batted balls as pitcher B, yet pitcher A somehow allows fewer hits (even though the number of hits should be the same, given an average defense), at least part of the difference is likely pitcher “skill.” For example, all of the ground balls by pitcher A that are classified as “medium hard” might actually be a little softer than those classified as medium hard by pitcher B.  That seems to be true even after adjusting for pitcher G/F ratios which UZR does.


#10    2009/11/23 (Mon) @ 19:37

Big update: *it is BP’s PADE which is the source of the error I attributed to UZR.* (I’ve mentioned that on SoSH, but not in that thread!  Which I will now do since it has been bumped back to the main page.)

I took PADE on faith because the methodology is straightforward and they described it well.  But it turns out that PADE is wildly overcorrected, by precisely the factor of 4 that I ascribed to UZR. And it is not consistent about it; my preliminary figures for 2009 correlate with theirs only decently.

Lesson: if there is a discrepancy between two metrics, don’t assume that it’s the simple one that must be accurate! (I refrain from making any generalization about the reliability of any one source of analysis ...)

I am in the process of generating my own set of park adjustments for DE, for the 8 years that we’ve had UZR data.  I’m waiting for Sean Forman to fix a bug in the Play Index for ROE vs. pitchers, and then I have to write some VBA code to grab the necessary data (which means re-learning VBA, so this might take me another month or more).

I do think the findings concerning the outlier teams (Yankees and Royals) will hold up when I redo all the work.

And I think that what I’ve found concerning Y2Y correlations is very interesting and may end up challenging our assumptions about the nature of defensive data.  Preliminary versions of those findings are in the thread at SoSH.


#11    Rally    2009/11/23 (Mon) @ 21:04

Eric, wouldn’t it just be easier to get ROE vs pitchers from retrosheet event files?

Sounds a lot easier than code to grab data from play index, and is encouraged, rather than against the terms of service.


#12    MGL    2009/11/23 (Mon) @ 21:04

I quickly read through the SoSH thread.  Interesting stuff that will take a while to digest.

One thought:  If in fact true fielding skill can change dramatically from one year to the next, much more so than, say, batting skill, that can obviously have a profound impact on how many years and with what weights one uses to construct either a projection or an estimate of the average true talent over that time period or any time period that the data encompasses.

That being said, while it makes sense that defensive talent is more sensitive to age, injury, weight, and perhaps one or two other things, I doubt that there is a huge fluctuation in true defensive talent from year to year (or month to month or whatever).

Of course, we should be able to estimate that by comparing the variance across time with that expected by chance (assuming that true talent remains the same), although the integrity of the data (and the methodology, like UZR, which produces the “output” data) may make that difficult to do.

Gotta also be careful about inferring too much from team data, because a “team” is actually a bunch of individuals, and that makes a big difference in terms of the impact of sample size.

For example, let’s say that you did y-t-y correlations for anything (UZR, DER, OPS) for teams and then did the same thing with individual players.  Do you think that you will get similar “r’s” (correlations) given similar sample sizes?  Not even close!  1000 team PA for example might yield an “r” of .2 for a team and .7 for an individual player.

Why is that?  Because the magnitude of “r” is a function of two things: Sample size and the magnitude of the spread in true talent in the population.  The spread in true talent in OPS (or UZR or whatever) is far greater among the population of all players than among all teams.
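The two ingredients can be put into the standard reliability formula, sketched below under simplifying assumptions: true talent is constant from year to year, the noise in each year is independent, and both years have the same sample size. The spread and noise figures are invented for illustration and are not estimates from real data.

```python
def expected_yty_r(true_talent_sd, noise_sd_one_trial, n):
    """Expected year-to-year correlation of an observed rate.

    true_talent_sd     : spread of true talent across the population
    noise_sd_one_trial : random variation of a single trial (e.g. one PA)
    n                  : trials per year for each unit (player or team)
    """
    var_true = true_talent_sd ** 2
    var_noise = noise_sd_one_trial ** 2 / n  # noise in a mean shrinks with n
    return var_true / (var_true + var_noise)

# Hypothetical inputs: same sample size and per-trial noise,
# but players are assumed to vary more in true talent than teams do.
n = 1000
noise_sd = 0.47
player_r = expected_yty_r(true_talent_sd=0.03, noise_sd_one_trial=noise_sd, n=n)
team_r = expected_yty_r(true_talent_sd=0.01, noise_sd_one_trial=noise_sd, n=n)
print(f"players: r = {player_r:.2f}, teams: r = {team_r:.2f}")
```

Holding n fixed, the only thing that changes between the two calls is the spread in true talent, and that alone is enough to separate the two correlations.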

One thing that concerns me at first glance is the number of correlations that Eric does without any reference to the confidence intervals.  While it might be OK to draw some “conclusions” and be reasonably satisfied with them based on one or two statistical outcomes, once you start working with dozens of them, as Eric is doing, you are bound to have some of them create Type I and Type II errors such that a certain significant percentage of your “conclusions” will be just plain wrong.

It is kind of like this. Let’s say that I generate 1 statistical number and based on that I am 90% confident of a certain conclusion. I make that conclusion and everyone is happy, even though there is a 10% chance that I am making an error and my conclusion is wrong.

But what if I generate 100 such numbers and make 100 conclusions?  Now all of a sudden, we know that on average 10 of them will be wrong (assuming everything is independent of course).  Well, we are not happy anymore.
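The arithmetic, assuming independent conclusions that each carry the same 10% chance of being wrong, can be sketched as follows. The function names are hypothetical helpers for this example.

```python
def expected_wrong(n_conclusions, error_rate):
    """Average number of wrong conclusions among n independent ones."""
    return n_conclusions * error_rate

def prob_at_least_one_wrong(n_conclusions, error_rate):
    """Chance that at least one of n independent conclusions is wrong."""
    return 1.0 - (1.0 - error_rate) ** n_conclusions

for n in (1, 10, 100):
    print(
        f"{n:>3} conclusions: expect {expected_wrong(n, 0.10):.1f} wrong, "
        f"P(at least one wrong) = {prob_at_least_one_wrong(n, 0.10):.4f}"
    )
```

A single conclusion is wrong one time in ten, but with 100 of them it is nearly certain that at least one is wrong.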

Anyway, I bring that up not to criticize Eric’s work, but to point out that he throws out a lot of numbers ("r") and draws a lot of conclusions or at least inferences from those numbers, when it is likely that a lot of them are suffering from sample error.  I would at the very least like to see confidence intervals around all those “r’s”. The average person might be surprised at how large they can be when doing regressions and dealing with samples of fewer than several hundred.
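One common way to put an interval around a correlation is the Fisher z-transformation, sketched here. It is an approximation that assumes roughly bivariate-normal data and independent observations. The sample sizes in the loop are hypothetical, chosen only to show how the interval width changes.

```python
from math import atanh, sqrt, tanh

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r from n pairs.

    z_crit = 1.96 gives a roughly 95% interval.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = atanh(r)                # transform r to the z scale
    se = 1.0 / sqrt(n - 3)      # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

# Same observed r at different (hypothetical) sample sizes.
for n in (30, 100, 500):
    lo, hi = fisher_ci(0.47, n)
    print(f"n = {n:>3}: r = 0.47, approx. 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval tightens only slowly as n grows, which is why an r from a few dozen team-seasons deserves less trust than the bare number suggests.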


#13    2009/11/24 (Tue) @ 10:50

Great feedback, MGL.

I try to report p values whenever I report r, but I guess I’m not as consistent about it as I think I am!  The r values for player Y2Y correlations, though, are based on lots of data and would have extremely low p values, so there would be no need for a Bonferroni-style correction (the need for which you describe well, and the fact that I know what it’s called is, I hope, evidence that it’s always part of my thinking!) Nevertheless, when I do my final version I’ll be sure to include them.
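For readers who have not met it, a Bonferroni-style correction simply divides the significance threshold by the number of tests. A minimal sketch follows; apart from the p = 10^-15 quoted in the post, the p values are placeholders, not results from the analysis.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p values stay significant after a Bonferroni correction.

    Each test is held to alpha divided by the number of tests, which keeps
    the chance of any false positive across the whole family at or below alpha.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# 1e-15 is the p value quoted in the post; the other two are made up.
p_values = [1e-15, 0.01, 0.04]
print(bonferroni_significant(p_values))
```

An extremely small p value survives the correction no matter how many tests are run, while a p of .04 stops counting as significant once it is one of several tests.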

If someone could point me at the best public-domain Retrosheet parser (ideally one that interfaces with Excel or Foxpro), that would be very helpful.  I’m a laggard when it comes to working with that data.

My latest take on all this (with some more cool findings) should be on SoSH later today (it’s apparently too long to work here). The biggest news is that there is no correlation between UZR (Range + Err) and the leftover part of (unadjusted) DE, *once you remove the five extreme outlier team years.* There definitely seems to be a freaky bug in UZR which caused those Yankee and Royal teams to come out wrong (and maybe the one year of the A’s), but there is no evidence for a systematic weakness in separating fielding from non-fielding aspects of DE.

