THE BOOK--Playing The Percentages In Baseball


Sunday, December 07, 2008

UZR on Fangraphs: using BIS on Ichiro

By Tangotiger, 10:09 PM

If this isn’t the greatest news since the launch of Fangraphs itself: UZR, going back to 2002, available on Fangraphs.  The very first guy I went to was Ichiro.

UZR, using BIS as the source, since 2003, has Ichiro as +33 runs in RF (+7 runs per season) and 0 in CF.  Using STATS as the source, he’s +14 in RF (+3 per season) and -24 in CF (-13 per season).

That, ladies and gentlemen, is the reason that fielding metrics are considered so untrustworthy.


Sabermetrics, Data, Fielding
#1    devil_fingers      (see all posts) 2008/12/07 (Sun) @ 22:41

Thanks again, all involved. It is great news…

Re BIS vs STATS: what should a relative novice like myself say to people when they say that discrepancies such as the above mean that fielding metrics are “worthless.”


#2    Peter Jensen      (see all posts) 2008/12/08 (Mon) @ 01:12

Fielding metrics aren’t worthless or untrustworthy.  Observational fielding data is untrustworthy.  With better data the best metrics would be just fine as they are structured now.  For now they give as good results as they can with the data they have.  Something is better than nothing.


#3    Trev      (see all posts) 2008/12/08 (Mon) @ 04:44

Thank you MGL for making this available.


#4    MGL      (see all posts) 2008/12/08 (Mon) @ 05:07

#3, sure no problem.  As I said in a previous post, it is a little disturbing to me that there are some significant differences in the UZR results, using two different data sets that are both supposedly very reliable.  (Although I don’t know that it SHOULD be disturbing.) The correlation ("r") between STATS UZR and BIS UZR for 05-08 for all players with at least 50 defensive games was .712 (620 data pairs).  For players with at least 100 games, it was .727 (517 data pairs).  Seems kind of low, but I am not sure what it “should” be.  And maybe if I compare aggregate data from 2, 3, or 4 years (say, a min of 250 games), that the correlation will be .9 or so.
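For readers who want to try the same kind of check on their own exports, here is a minimal sketch of the correlation with a games cutoff. The field names (`games`, `stats_uzr`, `bis_uzr`) and every row of data are made up for illustration; this is not MGL's code or data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def paired_r(rows, min_games):
    """r between the two systems for player-seasons meeting a games cutoff."""
    kept = [row for row in rows if row["games"] >= min_games]
    stats_vals = [row["stats_uzr"] for row in kept]
    bis_vals = [row["bis_uzr"] for row in kept]
    return pearson_r(stats_vals, bis_vals), len(kept)

# Made-up illustrative rows, NOT real UZR data.
rows = [
    {"games": 150, "stats_uzr": 8.0, "bis_uzr": 12.0},
    {"games": 120, "stats_uzr": -5.0, "bis_uzr": -1.0},
    {"games": 60, "stats_uzr": 3.0, "bis_uzr": -2.0},
    {"games": 140, "stats_uzr": -10.0, "bis_uzr": -6.0},
    {"games": 30, "stats_uzr": 1.0, "bis_uzr": 4.0},
]

r, n_pairs = paired_r(rows, min_games=50)
print(f"r = {r:.3f} on {n_pairs} data pairs")
```

Raising `min_games` shrinks the sample but should also shrink the noise in each pair, which is the trade-off behind the 50-game and 100-game cutoffs quoted above.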

I’m really not sure what that says.  As I just said, if both observational systems and data sets were equally accurate and reliable AND each one was accurate and reliable by itself (by whatever standard you choose), I have no idea what the correlations SHOULD be (given a reasonable measurement error).  It does suggest, of course, that recording the data accurately is not as easy and straightforward as one might think.

In any case, I think that a UZR which simply averages the two would be much better than either one by itself, again, assuming that both data sets are roughly as “good” as one another.

Ichiro was cherry picked by Tango, because his STATS UZR results, especially in CF, did not comport with what we think we “know” about his fielding (according to his speed, “the fans”, and traditional scouting reports and CW).  So it is merely an anecdote.  If we look at either UZR (STATS or BIS), and choose a result that does not comport with “the fans” or with “common sense” (given what we think we “know” about a player), it is likely that the “other” system’s UZR will be different.

For example, here is the first player I looked at whose BIS UZR did not “jibe” with what I think I know about the player:

Garret Anderson was +12 per 150 in 82 games in LF in 2008.  That seemed too high, especially for his age and speed.  Sure enough, he was +1 in STATS.

In fact, I would venture a guess, that for any group of persons with a certain range of UZR in one system, that the UZR in the other system will be regressed toward zero, on the average.  I’m almost certain of that.
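A toy simulation illustrates why that guess is plausible. It assumes each system reports true talent plus its own independent error; the spreads (5 runs of talent, 7 runs of error) and the +10 selection cutoff are invented for the sketch, not estimated from UZR.

```python
import random

random.seed(2)

SD_TALENT = 5.0   # hypothetical spread of true fielding talent, in runs
SD_ERROR = 7.0    # hypothetical per-system measurement error, in runs

players = []
for _ in range(5000):
    talent = random.gauss(0, SD_TALENT)
    stats_uzr = talent + random.gauss(0, SD_ERROR)
    bis_uzr = talent + random.gauss(0, SD_ERROR)
    players.append((stats_uzr, bis_uzr))

# Select players who look very good in ONE system only.
selected = [(s, b) for s, b in players if s >= 10]

mean_stats = sum(s for s, _ in selected) / len(selected)
mean_bis = sum(b for _, b in selected) / len(selected)

print(f"{len(selected)} players selected on STATS >= +10")
print(f"mean STATS UZR of the group: {mean_stats:+.1f}")
print(f"mean BIS UZR of the group:   {mean_bis:+.1f}")
```

Under these assumptions the group's average in the second system sits between zero and its average in the system used for selection, because selecting on a high reading also selects for favorable error.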

I might write an article about the similarities/differences in the data sets as well as the UZR results.  I have to be careful though that I don’t report too much of the raw data itself, as I am not authorized to do so, and I take my licensing restrictions very seriously of course.


#5    MGL      (see all posts) 2008/12/08 (Mon) @ 05:46

If I run a correlation (linear regression) between 05 STATS on 06 BIS, and 07 STATS and 08 BIS for players with at least 50 games in each year at the same position, I get: .358 (N=301)

If I do it in reverse (05 BIS on 06 STATS and 07 BIS on 08 STATS), I get: .385 (N=309)

BIS on BIS yields .478 (312) and STATS on STATS yields .547 (311).

These numbers are for “range runs” only (“error runs” should be the same for both systems).

I guess that suggests that there is some bias in each system that is not shared by the other system, which suggests that a combination of the two systems might be “better.”

If I regress 05 on 06 and 07 on 08 using an average STATS/BIS UZR, I get an “r” of .508 (N=305).

If I run an average in 05 on STATS in 06 and an average in 07 on STATS in 08, I get .478 (420).  Running an average on BIS yields .415 (436).
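One way to see how a system-specific bias would produce this pattern is a toy model in which each player has a true talent, a persistent bias in each system, and fresh noise every season. All three spreads below are invented and the model is not fitted to the figures above; it only sketches the mechanism.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(1)
N = 3000
# Hypothetical standard deviations, in runs.
SD_TALENT, SD_BIAS, SD_NOISE = 4.0, 3.0, 6.0

def season(talent, bias):
    """One season's reading: talent + persistent system bias + fresh noise."""
    return talent + bias + random.gauss(0, SD_NOISE)

stats_y1, stats_y2, bis_y1, bis_y2 = [], [], [], []
for _ in range(N):
    talent = random.gauss(0, SD_TALENT)
    stats_bias = random.gauss(0, SD_BIAS)  # follows the player in STATS only
    bis_bias = random.gauss(0, SD_BIAS)    # follows the player in BIS only
    stats_y1.append(season(talent, stats_bias))
    stats_y2.append(season(talent, stats_bias))
    bis_y1.append(season(talent, bis_bias))
    bis_y2.append(season(talent, bis_bias))

avg_y1 = [(s + b) / 2 for s, b in zip(stats_y1, bis_y1)]
avg_y2 = [(s + b) / 2 for s, b in zip(stats_y2, bis_y2)]

same_system = pearson_r(stats_y1, stats_y2)
cross_system = pearson_r(stats_y1, bis_y2)
averaged = pearson_r(avg_y1, avg_y2)

print(f"STATS on STATS:     {same_system:.3f}")
print(f"STATS on BIS:       {cross_system:.3f}")
print(f"average on average: {averaged:.3f}")
```

In this model the cross-system correlation is the lowest of the three, since only true talent is shared between the two systems, while averaging halves the variance of both the unshared bias and the noise.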


#6    Rally      (see all posts) 2008/12/08 (Mon) @ 10:55

Anderson does not seem to be that big a deal.  Expressing it as per 150 games just magnifies the difference for a guy playing part-time at a position.  His actual number is +6 BIS, +1 or +.5 in STATS.  Just a couple of plays scored differently could account for that.

I’ve found similar differences in offensive runs measures, for example Torii Hunter’s batting runs on baseball reference, compared to my baseruns calculation, which uses the team’s actual runs scored to generate custom linear weights.

Is one data source better than the other?  I’m not sure if we should pay more attention to one, or average them.  Some sort of objective standard is needed.  Which one predicts defense better, using an external standard?

My suggestion is using DER, adjusted for the mix of FB/GB/LD/POP, and ballpark adjusted.  Figure projections based on 2004-2007 from each dataset, and see which set predicts adjusted DER better for 2008.


#7    Tangotiger      (see all posts) 2008/12/08 (Mon) @ 11:21

Of course Ichiro was cherry picked, since we know that MGL/STATS and the Fans have not matched on him for the longest while.

***

What is most bothersome is that MGL is using (I suppose) the exact same engine on two different datasets that record the exact same balls in play.

If I run a Linear Weights that uses batted ball type (different run value on groundball and flyball outs), and run a correlation on the results of two different sources, I will get an r of .999 on the results.

As MGL is getting an r of only .72ish on the results, this is a big concern.

The errors should be random, not biased.  If the recording errors were all random, then we should still see the r’s pretty high.  What would it take to have a 26 run difference for one player over 300 games?  According to the Fangraphs page, the average CF was expected to have, given BIS’s understanding of the ball distribution that Ichiro saw, 709 outs, which is close to what Ichiro got.  (Actually, he had 741 putouts, but I suppose that there are a certain number of balls in play that MGL discarded, like liners or something).

MGL/STATS said the average player should have gotten 30 fewer outs, or 679.

Remember, we are talking about recording identical plays.  (Maybe not, if MGL is removing liners that are not “liners” in the other system.)

Usually, when we try to establish the out rates, we are trying to figure out “is this a .85 out play, or a .90 out play”.  We are not trying to figure out “is this a .15 out play or a .75 out play”.  The slice and the distance would be the biggest parameters in play, and all the others move things a bit, here and there.

Anyway, the total is 30 outs, meaning there might be a .20 out discrepancy for particularly tough plays to mark here, and .20 out there.  In that case, we have 150 opportunities in which two systems see the plays different enough to see a .20 out difference, all involving the same fielder.  That is enormous!

Like I said, if this was random, we’re not going to see these big differences after seeing 1000 balls in play.  To see 150 plays where the difference is .20 outs per play for one guy?  That is very bothersome, even if this may be the most extreme case.
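The arithmetic behind the 150 plays, as a quick sketch. The 709 and 679 expected outs are the figures quoted above; the .20 outs per play is the illustrative per-play disagreement from the comment, not a measured value.

```python
expected_outs_bis = 709     # average CF's expected outs per BIS, as quoted above
expected_outs_stats = 679   # average CF's expected outs per STATS, as quoted above
per_play_gap = 0.20         # illustrative disagreement on a single tough play

total_gap = expected_outs_bis - expected_outs_stats
plays_needed = total_gap / per_play_gap

print(f"total gap: {total_gap} outs")
print(f"plays needed at {per_play_gap:.2f} outs each: {plays_needed:.0f}")
```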

Remember, the difference between the best and worst fielder is less than .10 outs per play.  It’s almost like saying: “Well, of course Adam Dunn strikes out a lot: he faces CC Sabathia and Johan Santana and Mariano Rivera in 500 of his PAs!” We know he doesn’t, but the recording of the data shows that he does.  That’s what’s happening with Ichiro: we are giving him a context with STATS that is simply not the same as what BIS sees.

(Note: I don’t know if the “500” is accurate.  I’m sure you can pick out a more reasonable number.)


#8    Tangotiger      (see all posts) 2008/12/08 (Mon) @ 11:37

By the way, Rally’s point doesn’t apply here about his Linear Weights and B-R.com’s LWTS: they are using two different engines.  In MGL’s case, it’s the identical engine on two different data sources.  To make the same comparison, Rally should run his LWTS on data from the Lahman database and Retrosheet and Gameday. 

I am very disappointed at the low r MGL is reporting.

Here’s a few things that MGL can try:
1. remove the consideration of GB, FB, LD.  What happens to the r?  That is, is the difference simply in the batted ball type, that some scorers are counting FB that are LD and vice versa?

2. remove the hard/soft ball hit.  Again, what happens to the r?

Or, more simply, start with a UZR system that ONLY looks at the vector.  What’s the r?

Start with a UZR system that ONLY looks at the distance.  What’s the r?

Start with a UZR system that ONLY looks at the GB/FB/LD: what’s the r?

See where I’m going here?  Exactly what is it that BIS is seeing differently from STATS.
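The experiment suggested above can be sketched with a stripped-down stand-in for UZR: credit each fielder with outs made minus the average out rate of the play's bin, where the bin is defined only by the chosen features. Everything here is hypothetical (the synthetic plays, the `zone` and `kind` fields, the flip rates); the real UZR engine is far more detailed.

```python
import random
from collections import defaultdict
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def plus_minus(plays, features):
    """Outs above average per fielder, baseline binned by `features` only."""
    outs, total = defaultdict(int), defaultdict(int)
    for p in plays:
        key = tuple(p[f] for f in features)
        total[key] += 1
        outs[key] += p["out"]
    result = defaultdict(float)
    for p in plays:
        key = tuple(p[f] for f in features)
        result[p["fielder"]] += p["out"] - outs[key] / total[key]
    return dict(result)

def recode(plays, flip_rate):
    """Same plays as a second observer might log them: identical outcomes,
    but the batted-ball type is flipped on a fraction of plays."""
    logged = []
    for p in plays:
        q = dict(p)
        if random.random() < flip_rate:
            q["kind"] = "LD" if p["kind"] == "FB" else "FB"
        logged.append(q)
    return logged

random.seed(0)
truth = []
for _ in range(4000):
    kind = random.choice(["FB", "LD"])
    p_out = 0.75 if kind == "FB" else 0.25   # made-up out rates
    truth.append({"fielder": random.randrange(20), "zone": random.randrange(6),
                  "kind": kind, "out": int(random.random() < p_out)})

source_a = recode(truth, 0.05)   # hypothetical observer A
source_b = recode(truth, 0.15)   # hypothetical observer B

results = {}
for features in (("zone",), ("kind",), ("zone", "kind")):
    a = plus_minus(source_a, features)
    b = plus_minus(source_b, features)
    fielders = sorted(a)
    results[features] = pearson_r([a[f] for f in fielders],
                                  [b[f] for f in fielders])
    print(features, round(results[features], 3))
```

Because the two synthetic observers disagree only on batted-ball type, the zone-only run agrees perfectly and any drop in r shows up only when `kind` enters the bins. That is the kind of isolation the list above is asking for.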


#9    Rally      (see all posts) 2008/12/08 (Mon) @ 12:32

How about showing the correlation separately for infielders and outfielders?

I think there may be some disagreement on what is a flyball or liner (or fliner if they are using the hybrid), but groundballs are pretty easy to define.  At least usually.


#10    Peter Jensen      (see all posts) 2008/12/08 (Mon) @ 13:07

Tango - While you are waiting for MGL to run some studies, here is some data to chew on from the 947 observations of hit balls to the outfield from the BIS and STATS Torii Hunter and Andruw Jones data from 2007.  STATS doesn’t give a vector in degrees; they only give the zone, so the vector difference that I am giving is measured from the vector at the center of the STATS zone recorded.  Since most of the STATS zones (20 of 22) are 4 degrees wide, a difference of 8 degrees would put the ball at least 2 zones away.

80 of 947 vectors differed by 8 degrees or more
291 of 947 distances differed by 15 feet or more

Observational data, even by professional observers who are being paid to get it right, is just not very good.
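A tally like the one above could be produced along these lines. The tuple layout (BIS vector in degrees, center of the recorded STATS zone in degrees, BIS distance, STATS distance) and the four rows are invented for illustration; only the 8-degree and 15-foot thresholds come from the comment.

```python
def count_disagreements(pairs, min_degrees=8.0, min_feet=15.0):
    """Count plays where the two sources differ by at least the thresholds."""
    vector_count = sum(
        1 for bis_deg, stats_deg, _, _ in pairs
        if abs(bis_deg - stats_deg) >= min_degrees
    )
    distance_count = sum(
        1 for _, _, bis_ft, stats_ft in pairs
        if abs(bis_ft - stats_ft) >= min_feet
    )
    return vector_count, distance_count

# (bis_deg, stats_zone_center_deg, bis_feet, stats_feet), made-up rows
pairs = [
    (10.0, 12.0, 300, 310),
    (20.0, 30.0, 280, 300),
    (-5.0, -4.0, 350, 330),
    (0.0, 8.0, 250, 255),
]

vector_count, distance_count = count_disagreements(pairs)
print(f"{vector_count} of {len(pairs)} vectors differed by 8 degrees or more")
print(f"{distance_count} of {len(pairs)} distances differed by 15 feet or more")
```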


#11    Tangotiger      (see all posts) 2008/12/08 (Mon) @ 13:33

Good stuff Peter.  It certainly makes one wonder, especially if it’s the difference between random and systematic biases.  Perhaps it doesn’t apply to these two CF, and applies more to Ichiro, but here’s my argument:

If you look at all BIP that travelled a certain number of feet in BIS, say 290-310 feet (average of probably 300), what was the average travel for those STATS plays?  Is it also 300?  Or, was it biased to like 310 or 285 or something?  Like I said, maybe it doesn’t apply for these two OF, but it might apply to others (almost certainly Ichiro).

And since the systematic biases would follow a particular scorer, then what we’ll see is most teams have little bias, with a few huge biases on the other teams (Seattle almost certainly).
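The check described above, sketched for one distance band. The `(bis_feet, stats_feet)` pairs are invented; in practice the same function would be run per park or per scorer to see where a gap concentrates.

```python
def band_bias(pairs, low, high):
    """For plays whose BIS distance falls in [low, high], compare the mean
    BIS distance with the mean STATS distance recorded for the same plays."""
    in_band = [(b, s) for b, s in pairs if low <= b <= high]
    mean_bis = sum(b for b, _ in in_band) / len(in_band)
    mean_stats = sum(s for _, s in in_band) / len(in_band)
    return mean_bis, mean_stats, mean_stats - mean_bis

# (bis_feet, stats_feet) for the same batted balls, made-up values
pairs = [(295, 300), (300, 312), (305, 309), (310, 318), (250, 251), (292, 301)]

mean_bis, mean_stats, bias = band_bias(pairs, 290, 310)
print(f"BIS mean {mean_bis:.1f} ft, STATS mean {mean_stats:.1f} ft, "
      f"gap {bias:+.1f} ft")
```

A gap near zero in every park would point toward random disagreement; a large gap confined to a few parks would point toward a scorer-specific bias.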

***

For those with a conspiracy mind, let me point you to this thread from over 4 years ago about Vernon Wells:
http://www.battersbox.ca/article.php?story=20040625122702999

Where in the comments I noted:

Someone from Batter’s Box gave me a contact name. I contacted him twice. On the second contact, he said he changed his mind about discussing this with me. It was rather strange, since my question was one of procedure and not opinion. It sure sounds like something strange is going on with the scoring.

Since I published the above link, Wells’s ZR is now down to .931 overall. In that time period, he made 59 plays on 69 balls in zone, for a ZR of .854. He was 161 for 167 before that (.964).

If he’s a true .920 player, that means his performance when I first made the note went from 2 SD from the mean one way to 2 SD away from the mean the other way.
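The "2 SD each way" claim can be checked with a binomial standard deviation, treating each ball in zone as an independent trial at the assumed true rate of .920 (the independence is a simplifying assumption of this sketch).

```python
from math import sqrt

TRUE_RATE = 0.920  # the assumed true zone rating from the comment

def z_score(plays_made, balls_in_zone, true_rate=TRUE_RATE):
    """Standard deviations between the observed rate and the true rate."""
    observed = plays_made / balls_in_zone
    sd = sqrt(true_rate * (1 - true_rate) / balls_in_zone)
    return (observed - true_rate) / sd

z_before = z_score(161, 167)  # before the note was published
z_after = z_score(59, 69)     # the stretch after it

print(f"before: {161 / 167:.3f}, z = {z_before:+.2f}")
print(f"after:  {59 / 69:.3f}, z = {z_after:+.2f}")
```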

I sure wish I had the data.


#12    Tangotiger      (see all posts) 2008/12/08 (Mon) @ 13:36

I also had a primer on the difference between ZR and UZR:
http://www.battersbox.ca/article.php?story=20040930025019999


#13    jinaz      (see all posts) 2008/12/08 (Mon) @ 13:58

Let me add to the chorus of “thank you”s to MGL for working to get this integrated to Fangraphs.  And also to David Appleman for continuing to produce such an innovative and important site.  Now he just needs to start calculating WAR based on his linear weights and UZR fielding values!

I will say that I’m surprised at the apparent surprise here of a lack of agreement between BIS and STATS.  Michael Humphreys found this same thing in 2007, with MGL and David Pinto’s help:
http://www.hardballtimes.com/main/article/ghosts-in-the-outfield/

And my quick’n’dirty correlation matrix between different fielding systems had similar findings:
http://jinaz-reds.blogspot.com/2007/10/player-value-part-3b-comparing-of.html

FWIW, Fans Scouting Report agreed with BIS statistics slightly better than it agreed with STATS statistics.  And in Humphreys’s study, his own stat, DRA, correlated better to BIS data than STATS data.  So I think we’d be justified in favoring the BIS data.

Nevertheless, I continue to think that our best bet, as Rally has advocated for a long time, is to average statistics from at least these two sources (using the best engine we can in each case--now it’s UZR for BIS data and probably also for STATS data).
-j


#14    jinaz      (see all posts) 2008/12/08 (Mon) @ 14:01

Oops, that was MGL and Shane Jensen, not David Pinto.  My memory deceived me. -j


#15    Tangotiger      (see all posts) 2008/12/08 (Mon) @ 14:09

You have to understand the difference here.

On the one hand, you have two different engines using two different data sources.

In THIS case, it’s the SAME engine, using two different data sources ON THE SAME PLAYS.

To give you a perspective of how crappy an r=.7ish is when you have the SAME engine using two different data sources ON DIFFERENT PLAYS, consider running Offensive Linear Weights on players with at least 400 PA in 2007 and on the same players with at least 400 PA in 2008.  You will get an r of .7.

That is, it’s the same engine, on TOTALLY DIFFERENT data, and you get an r of .7.

In the fielding case, we are looking at the SAME plays, but from two different observers. 

Anyway, you may not like the analogy (hitting v fielding), but r=.7 when you use the same engine on the same plays (but from 2 different observers) is simply… not good.

***

Also, I suggest calling one bUZR and the other sUZR, to distinguish between the data sources.


#16    Rally      (see all posts) 2008/12/08 (Mon) @ 14:12

I second that.  I’ve loaded defensive data from a variety of sources into a database, and I call this stuff BUZR.  It’s got a good sound to it.


#17    Tangotiger      (see all posts) 2008/12/08 (Mon) @ 14:15

Do you pronounce that Boozer, or Buzzer?


#18    jinaz      (see all posts) 2008/12/08 (Mon) @ 14:16

In Humphreys’s study, he tried to get both Shane and MGL to use the same engine as best he could.  Probably wasn’t perfect, but it was close.  I agree, this is a tighter comparison, but I think it’s been pretty clear for a while now that STATS and BIS have massive disagreements.

In any case, at least now if there were any doubts before, they surely must be placated with this new comparison. 

I’m just going to keep on averaging--but now I can use Fangraphs’ UZR data instead of RZR, which is really nice.  Say “bye bye” to the OOZ denominator problem. 

I also completely support the bUZR and sUZR notation.
-j


#19    Peter Jensen      (see all posts) 2008/12/08 (Mon) @ 14:18

I actually included all of Jones’ and Hunter’s hit balls in post #10.  Limited to non HR Line Drives and Fly Balls (as defined by Retrosheet) the numbers are:

170 of 382 distances varied by 15 feet or greater
22 of 382 vectors varied by 8 degrees or greater
123 of 382 vectors varied by 4 degrees or greater

STATS and BIS were almost evenly divided between whose distance was greater, both overall and at greater than 300 feet.


#20    Tangotiger      (see all posts) 2008/12/08 (Mon) @ 14:32

It’s possible then that the Braves and Twins scorers have random biases, while the Mariners have a systematic bias.

A view by park would be far better.  I wouldn’t even run UZR by fielder, but simply by park.  That is, what is the UZR (using vectors only) for all LF at Fenway, at Coors, at Shea, etc, using STATS and BIS as the sources?

Ideally, we should see something like:
plays,park,BIS,STATS
4000,Fenway,-8,-11
4200,Shea,+3,+4
4100,Safeco,+6,-17

You know, something that is very close to a differential of zero, but a few enormous outliers.
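A sketch of how such a park table could be screened for outliers. The rows are the illustrative ones from the comment (they are an example of what one might see, not real results), and the 10-run flagging threshold is an arbitrary assumption.

```python
import csv
import io

# Illustrative rows copied from the comment above; not real data.
TABLE = """plays,park,BIS,STATS
4000,Fenway,-8,-11
4200,Shea,+3,+4
4100,Safeco,+6,-17
"""

THRESHOLD = 10  # hypothetical cutoff, in runs, for calling a park an outlier

def park_gaps(text):
    """Map each park to its BIS-minus-STATS differential."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["park"]: int(row["BIS"]) - int(row["STATS"]) for row in reader}

gaps = park_gaps(TABLE)
outliers = [park for park, gap in gaps.items() if abs(gap) > THRESHOLD]

for park, gap in gaps.items():
    print(f"{park}: {gap:+d}")
print("outliers:", outliers)
```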


#21    MGL      (see all posts) 2008/12/08 (Mon) @ 21:45

I’ll try and do as much research as I can to try and identify potential sources of disparity between the two sets of data.  As I said, I might write an article on Fangraphs looking at this issue.  Lots of good suggestions above about how to go about the inquiry.  FWIW, the only batted balls I don’t include at all are pop flies to and near the infield.  Those are defined as balls that are recorded as flies or pop flies that are less than a certain distance (150 feet I think).  Obviously some of those are caught by outfielders as well as infielders.  That could be one of the sources of the problem.


#22    Tangotiger      (see all posts) 2008/12/08 (Mon) @ 22:27

If I had to bet, I’d say line drives / flyballs.  That is, if the out rate of the average line drive is .25 and that of a FB is .75, then misclassifying one as the other makes an ENORMOUS difference. 

Naturally, if this was random, we don’t care, since .50 here or there works out in the wash.  But, if the Mariners’ scorer is always calling something a flyball, then it’s going to make Ichiro look bad if other scorers are calling that a line drive.
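To put a number on that, here is the credit a fielder gets for the same caught ball under each label, using the illustrative .25 and .75 out rates from the comment (hypothetical round numbers, not measured rates).

```python
# Illustrative out rates from the comment above, not measured values.
OUT_RATE = {"FB": 0.75, "LD": 0.25}

def credit(made_out, label):
    """Outs above average credited on one play, given how it was labeled."""
    return (1.0 if made_out else 0.0) - OUT_RATE[label]

# The same caught ball, labeled two different ways.
as_line_drive = credit(True, "LD")
as_flyball = credit(True, "FB")
swing = as_line_drive - as_flyball

# Plays of systematic relabeling needed to open a 30-out gap.
plays_for_30_outs = 30 / swing

print(f"credit if called LD: {as_line_drive:+.2f}")
print(f"credit if called FB: {as_flyball:+.2f}")
print(f"swing per play: {swing:.2f} outs")
print(f"plays needed for a 30-out gap: {plays_for_30_outs:.0f}")
```

The swing is the same .50 outs whether or not the ball is caught, so a scorer who consistently leans toward one label shifts the fielder's baseline on every affected play.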

My second choice would be that whatever the scorers are using as reference point for the OF is completely mismarked, perhaps causing some say 20 feet of bias in one direction or another.  Not big enough to stand out as a problem, but sensitive enough for UZR to pick up on.

Why teams don’t demand guys like us do quality check, I don’t know.  Especially since we would all do it for free, if STATS and BIS were to ask us.


#23    WaddellCanseco      (see all posts) 2008/12/08 (Mon) @ 23:21

“if the Mariners’ scorer is always calling something a flyball, then it’s going to make Ichiro look bad if other scorers are calling that a line drive.”

If this were a problem, wouldn’t we see a difference in Ichiro’s numbers home vs away?


#24    Bookie Monster      (see all posts) 2008/12/09 (Tue) @ 00:11

Speaking of data sources…

Any word on the progress of Hit F/X?


#25    Rally      (see all posts) 2008/12/09 (Tue) @ 00:35

“Do you pronounce that Boozer, or Buzzer?”

Buzzer


#26    Colin Wyers      (see all posts) 2008/12/09 (Tue) @ 00:53

I’d be interested to see the results using average error instead of correlation. I took a look at bUZR year-to-year correlation and average error, and the results on correlation just look… wrong.

http://statspeak.net/2008/12/how-well-does-uzr-predict-uzr.html

Outfield UZR seems to have a higher y-t-y correlation than infield UZR. With average error it’s the other way around.


#27    terpsfan101      (see all posts) 2008/12/09 (Tue) @ 02:02

Thanks MGL. I never thought I would see UZR again. The article you were thinking about writing in #21 sounds like a great idea.


#28    Sky      (see all posts) 2008/12/09 (Tue) @ 18:48

#23 makes a great point.  Might we see all fielders by home and away bUZR?  Although, I’m guessing MGL’s already looked at that sometime in the past five+ years.


#29    Xeifrank      (see all posts) 2008/12/10 (Wed) @ 00:20

How does the author recommend using UZR/150?  Do we need to regress based on games played?  If so, by how much??
vr, Xei


#30    Colin Wyers      (see all posts) 2008/12/10 (Wed) @ 03:29

Real quick:

OF: 31.25 games
IF: 32.37 games

So I think we can just say 32 games for a quick and dirty solution.

[For the curious, I used y-t-y correlation and the x/(x+CH) method to come up with the regression amount. For outfielders, r was .273, average of 83 games. (These are weighted with no cutoffs, using DG as the weight.) IF had .249 r, average 97 DG.]
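For anyone trying to reproduce those constants, here is a sketch of two ways to turn r and an average sample size into a regression constant. It assumes the x/(x+C) convention in which r = x/(x+C) at the average sample size, which solves to C = x(1-r)/r; the other expression, x*r/(1-r), is included only because it lands near the figures quoted above. Which one was actually used is not stated in the comment.

```python
def constant_from_r(avg_games, r):
    """Solve r = n / (n + C) for C."""
    return avg_games * (1 - r) / r

def inverse_form(avg_games, r):
    """The reciprocal-style expression n * r / (1 - r), for comparison."""
    return avg_games * r / (1 - r)

# r and average games as quoted in the comment above.
of_solved = constant_from_r(83, 0.273)
of_inverse = inverse_form(83, 0.273)
if_solved = constant_from_r(97, 0.249)
if_inverse = inverse_form(97, 0.249)

print(f"OF: C = {of_solved:.1f} games solving r = n/(n+C); "
      f"n*r/(1-r) = {of_inverse:.1f}")
print(f"IF: C = {if_solved:.1f} games solving r = n/(n+C); "
      f"n*r/(1-r) = {if_inverse:.1f}")
```

With the rounded inputs, the second expression comes out close to the 31.25 and 32.37 games quoted, while the first gives a constant of a couple hundred games.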


#31    Colin Wyers      (see all posts) 2008/12/10 (Wed) @ 03:43

So using Bobby Abreu as an example (-26 in 146 DG in RF):

(-26*150)/(146*150+32*150)
-0.15
answer*150
-22

(Everything is rounded, which is why your results may differ slightly.)
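The same calculation as a small function, taking the 32-game constant from the previous comment at face value. The function name is made up for this sketch.

```python
def regressed_per_150(runs, games, constant_games=32):
    """Add `constant_games` of average (zero-run) play, then rescale the
    regressed per-game rate to 150 games."""
    per_game = runs / (games + constant_games)
    return per_game * 150

abreu = regressed_per_150(runs=-26, games=146)
print(f"Abreu, regressed UZR/150: {abreu:.1f}")
```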

I think that works, at least. I should obviously be in bed right now anyway so I’ll do that.


#32    MGL      (see all posts) 2008/12/10 (Wed) @ 04:05

For those of you who have no idea what Colin is talking about (including me) - sorry dude - I’d have to check, but a good rule of thumb, I think, is to regress 300 expected outs (around 150 games for a 1B, 3B, LF, or RF and 120 games for a 2B, CF, or SS) 50% towards some estimated population mean (zero if you know nothing about the player’s “population”).

Which is the same thing as adding 300 average (for whatever population you think your player belongs to) expected outs to a player’s UZR.

To use Colin’s example, if Abreu is -26 total runs in 146 DG, which is around 300 ex outs, we regress the -26 50% towards zero (if you want to use zero as his population mean), which is -13. 
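This rule of thumb amounts to padding the sample with 300 expected outs of population-average play. A minimal sketch, with a hypothetical function name and the population mean left as a parameter:

```python
def regress_uzr(runs, expected_outs, population_mean=0.0, ballast_outs=300):
    """Shrink observed UZR runs toward a population mean by padding the
    sample with `ballast_outs` expected outs of average play."""
    weight = expected_outs / (expected_outs + ballast_outs)
    return population_mean + weight * (runs - population_mean)

# Abreu: -26 runs in about 300 expected outs, population mean of zero.
abreu = regress_uzr(runs=-26, expected_outs=300)
print(f"Abreu regressed: {abreu:+.0f} runs")

# With half the playing time, the same raw total is regressed further.
half_season = regress_uzr(runs=-26, expected_outs=150)
print(f"Same total in 150 expected outs: {half_season:+.1f} runs")
```

At exactly 300 expected outs the weight is one half, which reproduces the 50% regression described above. Note that `population_mean` here is in the same units as `runs` (a total, not a rate).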

I don’t know where or how he got the -22.  I am not saying it is not correct for whatever regression he is using. I am just saying that what he wrote in posts 30 and 31 is hard to follow - starting with:

Real quick:

OF: 31.25 games
IF: 32.37 games

So I think we can just say 32 games for a quick and dirty solution.

Real quick what?  Solution to what?  32 games for what?


#33    terpsfan101      (see all posts) 2009/02/13 (Fri) @ 15:29

MGL,

Thanks for posting the Arm and DP Ratings on Fangraphs.


#34    MGL      (see all posts) 2009/02/15 (Sun) @ 16:00

Sure, no problem!


#35    Tangotiger      (see all posts) 2009/09/02 (Wed) @ 11:11

Bumping, and I will especially want to highlight posts 8 through 10.


#36    terpsfan101      (see all posts) 2009/11/27 (Fri) @ 17:43

UZR data, thanks to Fangraphs, MGL, and the newest version of the Baseball Databank Database:

UZR 2002-2009:

http://spreadsheets.google.com/pub?key=tjzH1kXO74ZIhhUMz8pytRQ&output=csv

UZR 1999-2003:

http://spreadsheets.google.com/pub?key=tvdNP7RO_wccwxK1xBN2Gwg&output=csv


#37          (see all posts) 2009/11/29 (Sun) @ 20:27

Has the data on fangraphs changed since 08 to increase the r of the BIS STATS data?


#38    terpsfan101      (see all posts) 2009/11/30 (Mon) @ 02:03

I can’t answer your question JD, but the UZR data on Fangraphs has been modified at least 2 times by my count. The very first figures didn’t include Arm and DP ratings (Dec 2008). The second modification added the Arm and DP ratings (Feb 2009). In June, I noticed a third modification, but I’m not sure what MGL tweaked the third time around.


#39          (see all posts) 2009/11/30 (Mon) @ 11:56

Thanks Terp - I noticed the arm too. I love that.


#40          (see all posts) 2010/07/17 (Sat) @ 02:15

“Why teams don’t demand guys like us do quality check, I don’t know.  Especially since we would all do it for free, if STATS and BIS were to ask us.”

I can tell you why you won’t get too far even if teams do demand it.  The methods used by firms like STATS are proprietary and valuable, and they are reluctant to allow people to really look around under the hood as it were just because they want to.  I know because I report MLB data for them. 

And why would you “do it for free”?  What you do is a skill that has great value.  I know because I’ve read THE BOOK!  Do ball players work for free because they love it so much?  While it may be true that you love it, doing it for free makes it hard for anyone to make a living at it, yourself included.


#41    Tangotiger      (see all posts) 2010/07/17 (Sat) @ 06:58

While it may be true that you love it, doing it for free makes it hard for anyone to make a living at it, yourself included.

There’s the conundrum, right?

If none of us did it for free, then I would not know about Dan Fox and Alan Nathan and John Walsh et al.  And Mike Fast and Dan Brooks.  Where would we be?

I’m even using the free Firefox to write these words.

The way I see it, you expose enough of your work for free as a portfolio. 

I mean, look at my current situation.  I’ve got more job leads from people who read this blog, than from those I’ve contacted directly who have posted a want-ad.  It’s insane really.  My portfolio to corporate america is my resume of jobs I’ve been paid for.  My portfolio to the employers who contacted me through my blog is my free baseball work (and The Book).

So, in some sense, trying to maximize each unit of work for its capital is not necessarily the best way to do things.  You should do it just enough that in the grand scheme of things you reach the optimal point.

(I guess like Strasburg pitching in college.  He could have signed in an independent league and gotten paid money.)

You can argue we do too much for free and so are not being optimal.  That’s possible.  At this point, any more work I do doesn’t really add value to my portfolio.

And really, I should stop highlighting the work of others, as I’m increasing competition too. FWIW, I don’t think like that.

