THE BOOK--Playing The Percentages In Baseball


Tuesday, December 09, 2008

UZR v PMR v UZR

By Tangotiger, 05:51 PM

Dan The Turk compares PMR to bUZR (both use the same data source, but obviously different engines) and lists the 10 players with the largest differences between them.

Soooo… I took those guys, and compared them to sUZR (UZR using STATS).
Iwamura: almost perfect match to bUZR
Feliz: 3 runs away from bUZR
Glaus: perfect match to bUZR
Renteria: much closer to PMR
Reyes: perfect match to bUZR
Hawpe: almost perfect match to bUZR
Mora: almost perfect match to bUZR
Torii: half-way between bUZR and PMR
Uggla: closer to bUZR, and even further away from PMR (in all other cases, sUZR was between bUZR and PMR)
Abreu: perfect match to bUZR
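The labels in the list above ("perfect match", "half-way", "closer to") can be put on one scale by asking where sUZR falls on the line running from bUZR to PMR. This is a minimal sketch. The function names, the 0.1 tolerance, and the run values in the usage lines are hypothetical and are not the players' actual ratings.

```python
def relative_position(b_uzr: float, pmr: float, s_uzr: float) -> float:
    """Return t such that s_uzr = b_uzr + t * (pmr - b_uzr).

    t = 0: sUZR matches bUZR; t = 1: sUZR matches PMR; t = 0.5: half-way.
    t < 0: sUZR lies beyond bUZR, even further away from PMR.
    """
    gap = pmr - b_uzr
    if gap == 0:
        raise ValueError("bUZR and PMR agree, so the position is undefined")
    return (s_uzr - b_uzr) / gap


def describe(t: float, tol: float = 0.1) -> str:
    """Turn t into a label like the ones in the list (tol is an arbitrary choice)."""
    if abs(t) <= tol:
        return "match to bUZR"
    if abs(t - 1) <= tol:
        return "match to PMR"
    if t < 0:
        return "beyond bUZR, further from PMR"
    if t > 1:
        return "beyond PMR"
    if abs(t - 0.5) <= tol:
        return "about half-way"
    return "closer to bUZR" if t < 0.5 else "closer to PMR"


# Hypothetical ratings in runs: (bUZR, PMR, sUZR).
for b, p, s in [(10, -10, 0), (5, -5, 7), (3, -8, 3)]:
    print(b, p, s, "->", describe(relative_position(b, p, s)))
```

A single number per player also makes it easy to average t across a larger, random sample of players instead of eyeballing 10 of them.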

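First of all, based on this non-random sample, it sure doesn’t look like bUZR and sUZR are that different. Which reminds me that “correlation” is not what we want to test; we want the average difference or the RMSE. Correlation ignores any difference in slope (one system could rate every player twice as far from average as the other and the two would still correlate perfectly), so it can’t tell us how far apart the numbers are.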

In any case, it certainly looks like MGL’s engine behaves very similarly with either data source: there’s only a couple of runs’ difference either way. MGL: can you re-run your bUZR/sUZR comparison, but look at the average difference and not the correlation?

And clearly, Pinto’s engine is doing something very different from MGL’s.


#1    Tangotiger      2008/12/09 (Tue) @ 18:15

By the way, this is what I expected to find when using two different observers, and the same engine: lots of close results, with a huge outlier.  In this case, the huge outlier is Edgar Renteria.

So, I would look at his batted-ball distribution in the STATS and BIS data, and I expect to see a huge difference.

I expect we’ll find that, for most teams, the differences between STATS and BIS are random, while a few teams will show systematic biases. Renteria, and to a lesser extent Torii, is likely evidence of that.


#2    Sky      2008/12/09 (Tue) @ 19:09

Am I right that UZR computes a probability of each batted ball being turned into an out?  Or is it an expected run value?  Either way, it should be easy to compare the rating of every play between STATS and BIS data, if one had access to both, right?  Then you could group the plays however you wanted and see where the biases are… ballpark?  Ballpark and pitcher?  Ballpark and position?  Hitter?  Groundballs?
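If one had per-play ratings from both data sources, Sky’s grouping idea could look roughly like this sketch. It assumes each batted ball carries one rating computed from STATS data and one from BIS data; whether that rating is an out probability or a run value doesn’t matter for the grouping. The `Play` record, its fields, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Play:
    # Hypothetical per-play record: the same batted ball rated twice.
    park: str
    position: str
    ball_type: str
    rating_stats: float  # fielder's rating on this play using STATS data
    rating_bis: float    # same quantity using BIS data


def bias_by(plays, key):
    """Average (STATS - BIS) rating difference for each group of plays."""
    groups = defaultdict(list)
    for p in plays:
        groups[key(p)].append(p.rating_stats - p.rating_bis)
    return {k: mean(v) for k, v in groups.items()}


plays = [
    Play("BOS", "SS", "GB", 0.30, 0.10),
    Play("BOS", "SS", "LD", -0.20, -0.20),
    Play("NYA", "CF", "FB", 0.05, 0.15),
]
print(bias_by(plays, key=lambda p: p.park))                 # by ballpark
print(bias_by(plays, key=lambda p: (p.park, p.position)))   # ballpark and position
```

Any other grouping (pitcher, hitter, ball type) is just a different `key` function.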


#3    David Pinto      2008/12/09 (Tue) @ 19:16

Cool stuff, thanks for the comparison.

It might be interesting to create fielding charts like the ones at my site for bUZR and see how much they look alike.


#4    Dan Turkenkopf      2008/12/09 (Tue) @ 20:17

Thanks for the info Tom.

Here are the correlation, the average difference, and the RMSE for each position.
Position         Correlation   Avg Abs Diff   RMSE
First Baseman    0.715016446   5.485277778    6.998029419
Second Baseman   0.581294074   5.684          7.131289708
Third Baseman    0.626725693   4.742564103    6.686325016
Shortstop        0.687982392   4.841794872    6.237151409
Left Fielder     0.748395813   4.64175        4.490245957
Center Fielder   0.528499668   5.173181818    5.065314979
Right Fielder    0.802550034   5.06952381     4.323938691


#5    Tangotiger      2008/12/09 (Tue) @ 20:50

Right, the correlation is irrelevant here, since it depends on the spread in the results. If all the CFs are close to each other, the correlation will be small just because of that. Imagine that all the CFs are between -5 and +5 runs in fielding, and the differences between Pinto and MGL are 0-4 runs each. That’s great, right? But the correlation will be pretty low.
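The point is easy to check numerically: hold the disagreement between two systems fixed and change only the spread of the players being rated. In this sketch the player values and disagreements are made up. The average difference stays at 2.5 runs in both cases, while the correlation changes a lot.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mean_abs_diff(xs, ys):
    """Average absolute difference between two systems."""
    return mean(abs(x - y) for x, y in zip(xs, ys))


# Made-up disagreement between the two systems, in runs (identical in both cases).
noise = [3, -2, 4, -3, 1, -4, 2, -1]
# Case 1: narrow spread, every CF between -5 and +5 runs.
narrow = [-5, -3, -2, 0, 1, 2, 4, 5]
# Case 2: the same players stretched to four times the spread.
wide = [4 * x for x in narrow]

results = {}
for label, system_a in [("narrow", narrow), ("wide", wide)]:
    system_b = [a + e for a, e in zip(system_a, noise)]
    results[label] = (pearson(system_a, system_b), mean_abs_diff(system_a, system_b))
    print(label, "r =", round(results[label][0], 2), "avg diff =", results[label][1])
```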

So, the average difference is about 5 runs at every position, and there’s no position bias.

Kinda weird how the RMSE is lower than the average difference.


#6    Dan Turkenkopf      2008/12/09 (Tue) @ 20:55

Yeah, that explanation makes sense.  I keep finding myself wishing I had paid closer attention in my college stats class.

I would assume part of the reason for the larger errors in the infield is because PMR includes infield popups (or at least it used to).  UZR leaves those out because of their discretionary nature, right?


#7    Colin Wyers      2008/12/09 (Tue) @ 23:23

Probably for the same reason that the correlations are the way they are - the spread among players at that position.

For all players:

Pos STDDEV
1B 2.9
2B 3.7
3B 4.5
CF 4.9
LF 3.7
RF 4.0
SS 4.3

100+ DG:

Pos STDDEV
1B 6.3
2B 7.7
3B 9.3
CF 9.9
LF 9.7
RF 9.7
SS 8.3


#8    dcj      2008/12/10 (Wed) @ 01:30

“Kinda weird how the RMSE is lower than the average difference.”

I believe it’s a mathematical fact that the RMSE must always be larger than the average difference. So, there may be a mistake in the calculation.


#9    Dan Turkenkopf      2008/12/10 (Wed) @ 08:29

@dcj

“I believe it’s a mathematical fact that the RMSE must always be larger than the average difference. So, there may be a mistake in the calculation.”

Average difference was just the mean of the absolute values of PMR minus bUZR.

RMSE is what Excel told me it was after running a regression.

I’ll go back and check the numbers this afternoon.


#10    TangoTiger      2008/12/10 (Wed) @ 10:21

If the absolute differences are 1 and 2, then the average is 1.5, and the RMSE is SQRT((1+4)/2) = 1.58.

If they were 2 and 3, then it’s 2.5 for the average and 2.55 for the RMSE.

If they were 1 and 99, then the average is 50, and the RMSE is 70.

Basically, the ratio RMSE/AVG will be between 1 and sqrt(n), where n is the number of data points. The more unequal the differences are (a few big misses and the rest near zero), the closer the ratio gets to sqrt(n).

As you can see, it is impossible for the RMSE to be less than the average of the absolute differences.
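The three examples can be reproduced directly, along with the bound 1 ≤ RMSE/average ≤ sqrt(n). This is a minimal sketch; `mad` and `rmse` are just illustrative helper names.

```python
from math import sqrt


def mad(diffs):
    """Mean of the absolute differences."""
    return sum(abs(d) for d in diffs) / len(diffs)


def rmse(diffs):
    """Root mean square of the differences."""
    return sqrt(sum(d * d for d in diffs) / len(diffs))


for diffs in ([1, 2], [2, 3], [1, 99]):
    a, r = mad(diffs), rmse(diffs)
    n = len(diffs)
    # RMSE can never be below the average absolute difference,
    # and can never exceed it by more than a factor of sqrt(n).
    assert a <= r <= sqrt(n) * a
    print(diffs, "avg =", round(a, 2), "RMSE =", round(r, 2), "ratio =", round(r / a, 3))
```

The ratio for 1 and 99 comes out close to sqrt(2), which is the lopsided case described above.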


#11    Tangotiger      2008/12/10 (Wed) @ 15:41

You know, we have THREE systems based on BIS data.  Dewan already published his numbers. 

Dewan’s process is more similar to MGL’s, but the number of parameters MGL uses is more similar to Pinto’s. So, let’s see what Dewan thinks of Renteria:

-9 plays

That’s about -7 runs, which puts him right between bUZR and PMR (neither of which is close). But it’s closest to sUZR!

MGL, if you are looking for an infielder test case, then I suggest Edgar Renteria.  Tell us why bUZR and sUZR don’t agree at all on him.  And, maybe we can get Pinto in on this as well.


#12    Tangotiger      2008/12/10 (Wed) @ 15:43

Torii Hunter is also about half-way between PMR and bUZR, and close to sUZR.

Weird, right? UZR based on STATS and Dewan based on BIS are closer to each other than UZR is to itself (bUZR vs. sUZR), or than bUZR is to PMR.


#13    Dan Turkenkopf      2008/12/10 (Wed) @ 19:31

This is another one of those instances where I should understand statistics better, but should I be getting different numbers depending on which set of values is the Input X range and which is the Input Y range in Excel?

I would have thought RMSE would be the same regardless of which columns ended up where.  But for RFs I get 4.32 if PMR is the Y range and 6.26 if bUZR is the Y range.

That makes things look better for the OF, but now the IF has the problem with the RMSE being less than the average of the abs differences.

So now I’m really confused.  I’m starting to think that the Regression functionality in Excel is not what I want to use to find the RMSE.  Any thoughts?


#14    andeux      2008/12/10 (Wed) @ 21:11

I’m no expert on Excel, but I’d guess that the RMSE reported with a regression gives you the RMS difference between the Y-values and the scaled and translated X-values (the fitted line), so yes, it is asymmetric in X and Y if the regression line doesn’t have slope 1. The fact that this line has a slope significantly different from 1 is somewhat interesting in itself.

What you probably want to do is just make a column of (X-Y)^2 and then find the SQRT(AVERAGE(that_column)).

Alternatively, you can make your new column (X-Y) and find the AVERAGE and STDEVP of that column. If the AVERAGE is 0, the STDEVP computed this way will be the same as in the previous paragraph. Otherwise it will be slightly smaller.
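Both points can be checked with a small sketch: the residual RMS from a regression changes when X and Y are swapped, the RMS of the raw differences does not, and STDEVP of the difference column matches the RMSE only when the mean difference is zero, because RMSE² = (mean difference)² + STDEVP². The run values are made up. The regression helper divides by n; Excel’s regression output may use a different divisor, so its numbers won’t necessarily match this sketch.

```python
from math import sqrt
from statistics import mean, pstdev


def regression_residual_rms(xs, ys):
    """RMS of the residuals after an ordinary least-squares fit of ys on xs (divides by n)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sqrt(mean(r * r for r in residuals))


def direct_rmse(xs, ys):
    """RMS of the raw differences, SQRT(AVERAGE((X-Y)^2)): symmetric in X and Y."""
    return sqrt(mean((x - y) ** 2 for x, y in zip(xs, ys)))


# Made-up run values for five players under the two systems.
pmr = [8, -3, 5, -10, 0]
buzr = [12, -6, 2, -14, 8]

print(regression_residual_rms(buzr, pmr), regression_residual_rms(pmr, buzr))  # differ
print(direct_rmse(pmr, buzr), direct_rmse(buzr, pmr))  # identical

diffs = [x - y for x, y in zip(pmr, buzr)]
# RMSE^2 = (mean difference)^2 + STDEVP^2, so STDEVP equals the RMSE only when the mean is 0.
assert abs(direct_rmse(pmr, buzr) ** 2 - (mean(diffs) ** 2 + pstdev(diffs) ** 2)) < 1e-9
```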


#15    Dan Turkenkopf      2008/12/10 (Wed) @ 21:59

@andeux

Thank you!  That makes sense, and brings back old memories.

The values using the formula you suggested (and which I now can see in Tango’s example) are as follows:

Pos  Avg Diff  RMSE
1B   5.49      6.97
2B   5.68      7.01
3B   4.74      6.56
SS   4.84      6.27
LF   4.64      5.43
CF   5.17      6.34
RF   5.07      6.21


#16    MGL      2008/12/11 (Thu) @ 05:01

For 05-08, the average absolute error in total runs between bUZR and sUZR is 5.17, for all players with a min of 100 chances (N=898).  If I weight that by the number of chances, it is 5.65.  The RMSE (square root of the average squared error) is 6.75.  The correlation coefficient is .702.
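For readers who want to reproduce this kind of summary, here is a minimal sketch of the unweighted and chance-weighted average absolute error. The player differences and chance counts are made up; the sketch only shows that weighting by chances pulls the average toward the players with the most chances.

```python
def avg_abs_error(diffs, weights=None):
    """Mean absolute difference, optionally weighted (e.g., by chances)."""
    if weights is None:
        weights = [1] * len(diffs)
    total = sum(weights)
    return sum(w * abs(d) for d, w in zip(diffs, weights)) / total


# Hypothetical players: bUZR minus sUZR in runs, and number of chances.
diffs = [2.0, -4.0, 9.0]
chances = [120, 300, 450]
print(avg_abs_error(diffs), avg_abs_error(diffs, chances))
```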

For each position:

Pos Avg. Diff r (correlation)
1B 4.27 .603
2B 4.45 .727
3B 4.65 .716
SS 5.20 .759
LF 4.97 .806
CF 6.40 .585
RF 5.67 .689


#17    MGL      2008/12/11 (Thu) @ 06:16

Here are some preliminary comparisons of the STATS and BIS data:

05

Type of BIP STATS BIS

GB 60,236 59,894
FB+PF 49,084 47,286
LD 26,149 28,241
Bunts 3,203 3,181
Total BIP 138,672 138,602

06

Type of BIP STATS BIS

GB 60,194 59,276
FB+PF 49,527 49,744
LD 26,280 26,551
Bunts 3,118 3,093
Total BIP 139,119 138,664

07

Type of BIP STATS BIS

GB 59,950 58,945
FB+PF 49,448 51,369
LD 26,672 25,273
Bunts 2,988 2,973
Total BIP 139,058 138,560

I am not too impressed with the agreement, or lack thereof. In all fairness, the BIS data includes “fly fliners” (which I counted as fly balls) and “liner fliners” (which I counted as line drives). Still, I am not sure how there can be such large differences in the number of ground balls, and I am not sure how there can be any difference at all in the total number of BIP.

Maybe someone can calculate the number of BIP from traditional data or from the retrosheet files, and we can see which data set (STATS or BIS) seems to be the better one in terms of total BIP.
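The gaps in these tables can be tabulated directly. This sketch uses only the counts above. It also checks that the four categories add up to each reported Total BIP (they do, for both providers in all three years), so the total-BIP gap is not an arithmetic slip in the tables.

```python
# MGL's counts from the tables above: (STATS, BIS) by batted-ball type.
COUNTS = {
    2005: {"GB": (60236, 59894), "FB+PF": (49084, 47286), "LD": (26149, 28241), "Bunts": (3203, 3181)},
    2006: {"GB": (60194, 59276), "FB+PF": (49527, 49744), "LD": (26280, 26551), "Bunts": (3118, 3093)},
    2007: {"GB": (59950, 58945), "FB+PF": (49448, 51369), "LD": (26672, 25273), "Bunts": (2988, 2973)},
}
TOTALS = {2005: (138672, 138602), 2006: (139119, 138664), 2007: (139058, 138560)}

for year, by_type in COUNTS.items():
    stats_total = sum(s for s, _ in by_type.values())
    bis_total = sum(b for _, b in by_type.values())
    # The four categories should add up to the reported Total BIP.
    assert (stats_total, bis_total) == TOTALS[year]
    for bb_type, (s, b) in by_type.items():
        pct = 100 * (s - b) / b
        print(f"{year} {bb_type:6} STATS-BIS = {s - b:+6d} ({pct:+.1f}%)")
    print(f"{year} total  STATS-BIS = {stats_total - bis_total:+6d}")
```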


#18    terpsfan101      2008/12/11 (Thu) @ 07:23

Retrosheet Data 2005-2007:

YEAR  GB     FB     LD     BUNT  PA BIP
2005  60289  49644  25445  3373  138751
2006  59912  49754  25903  3251  138820
2007  59737  49371  26448  3079  138635

FB include flyouts and popups
PA BIP = plate-appearance BIP
GB, FB, LD totals don’t include bunts

Foul errors could be a culprit. I didn’t include them in the above totals because the batter isn’t charged with a plate appearance. No foul errors were charged on bunts; here are the foul errors on fly balls:

YEAR FB
2005 53
2006 44
2007 54

This is probably unnecessary, but here is the breakdown of the bunts by batted-ball type:

YEAR  GB    FB   LD  BUNT
2005  2945  417  11  3373
2006  2870  374  7   3251
2007  2758  312  9   3079


#19    terpsfan101      2008/12/11 (Thu) @ 07:54

One more thing: Retrosheet does code home runs as batted balls, and they are included in the above totals. Here is the breakdown for inside-the-park and outside-the-park home runs. “u” means that the HR doesn’t have a batted-ball code.

Inside the park HR

year  fb  ld  u
2005  1   0   11
2006  1   2   10
2007  5   4   6

Outside the park HR

year  fb    ld   u
2005  4504  501  0
2006  4820  553  0
2007  4335  607  0


#20    terpsfan101      2008/12/11 (Thu) @ 08:44

I will have to see if the discrepancy in the number of bunts is due to counting foul bunts with 2 strikes.

There are probably a few other events that were not assigned a batted-ball code, considering that a few of the inside-the-park HRs didn’t have one.


#21    dkappelman      2008/12/11 (Thu) @ 13:50

This just goes to show that it’s not that easy to get 100% accurate play-by-play data.  I’ve dealt with a number of play-by-play providers and let’s just say there have been some pretty bad ones that I’ve had to cancel contracts with because the data wasn’t reliable.

I typically don’t see too many problems with the BIS data, and they’re good about correcting things. STATS, from whom I receive live PBP data, occasionally had errors in their live data, but those were usually cleared up in the after-game data. I used to see a TON of inaccurate PBP data from STATS up on ESPN in 2006, and even tried to contact ESPN about how unreliable their data was.

For a very brief amount of time I was using their feed to test the Win Probability stuff and it was just painful to deal with on some days.


#22    terpsfan101      2008/12/11 (Thu) @ 15:45

Yes, foul two-strike bunts were coded as flyouts. Removing them, here are the new bunt totals, followed by the updated BIP totals:

2005: 3203, 138581
2006: 3107, 138676
2007: 2969, 138525

If you look at the total BIP, the new amounts are in better agreement with the BIS data.
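The adjustment is simple arithmetic: subtract the foul two-strike bunts from the original Retrosheet BIP total, then compare with MGL’s STATS and BIS totals from #17. This sketch uses only numbers posted in this thread.

```python
# Retrosheet figures from #18: (bunts incl. foul two-strike bunts, total BIP).
retro = {2005: (3373, 138751), 2006: (3251, 138820), 2007: (3079, 138635)}
# Bunt counts after removing foul two-strike bunts, from the list above.
fair_bunts = {2005: 3203, 2006: 3107, 2007: 2969}
# Total BIP reported by MGL in #17: (STATS, BIS).
provider_totals = {2005: (138672, 138602), 2006: (139119, 138664), 2007: (139058, 138560)}

adjusted_bip = {}
for year, (bunts, bip) in retro.items():
    removed = bunts - fair_bunts[year]  # foul two-strike bunts taken out
    adjusted_bip[year] = bip - removed
    stats, bis = provider_totals[year]
    print(year, "adjusted BIP", adjusted_bip[year],
          "minus STATS", adjusted_bip[year] - stats,
          "minus BIS", adjusted_bip[year] - bis)
```

With these inputs the adjusted totals land within a few dozen of BIS in each year, versus roughly 90 to 530 away from STATS.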


#23    MGL      2008/12/11 (Thu) @ 16:11

I have to check if my numbers included 2-strike foul bunts, foul errors, and the like.  I’ll also try and break it down into parks, to see if there are any biases.


#24    Tangotiger      2008/12/11 (Thu) @ 16:12

When I work with Retrosheet data, the first thing I do is run:
UPDATE events  -- "events" is a placeholder for your event table
SET BATTEDBALL_CD = 'B'
WHERE BUNT_FL = 'T';

I couldn’t care less whether the bunt is a “groundball” or a “flyball”.
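Here is the same reclassification run end to end with Python’s built-in sqlite3 module, so it can be tried without a Retrosheet database. The `events` table and its rows are hypothetical; only the `BATTEDBALL_CD` and `BUNT_FL` column names come from the snippet above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the two column names follow the snippet above.
conn.execute("CREATE TABLE events (event_id INTEGER, BATTEDBALL_CD TEXT, BUNT_FL TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "G", "T"), (2, "F", "T"), (3, "G", "F"), (4, "L", "F")],
)
# Put every bunt in its own category, regardless of its trajectory.
conn.execute("UPDATE events SET BATTEDBALL_CD = 'B' WHERE BUNT_FL = 'T'")
counts = dict(conn.execute(
    "SELECT BATTEDBALL_CD, COUNT(*) FROM events GROUP BY BATTEDBALL_CD"
).fetchall())
print(counts)
```

After the update, the ground-ball and fly-ball counts no longer contain any bunts.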


#25    terpsfan101      2008/12/11 (Thu) @ 18:06

Sounds like a good idea Tango. Thanks for the suggestion. This way you don’t have to worry about accidentally mixing in bunts with GB, FB, and LD.


#26    terpsfan101      2008/12/11 (Thu) @ 21:00

MGL or Tango or dkappelman,

What is your guys’ position on classifying two-strike foul bunts? I am thinking it would be best to exclude them from the total number of bunts, since they are already classified as strikeouts.


#27    Tangotiger      2008/12/11 (Thu) @ 21:37

I call those bunts.  I don’t really care if technically they are “strikeouts”.


#28    MGL      2008/12/11 (Thu) @ 21:55

For what? It depends on what you are doing or trying to find out. I generally classify them as strikeouts and ignore them. I only use fair BIP for UZR, so I eliminate everything else. As far as the cumulative numbers I posted above go, I have to go back to my databases and see what I was counting and what I was not counting. For UZR, I usually take a “raw” database and then produce another database which only includes the things I want to look at. So I am probably not including foul bunt third strikes in my numbers above. I am probably including foul outs on fly balls, but I may or may not be including foul errors on fly balls (where the PA continues).

As long as you specify what you are including where, that is all we care about.  So, in the bunts category, just specify whether or not it includes third strike bunt fouls.  In the fly ball, line drive, and pop fly categories, specify whether it includes foul errors where the PA continues.


#29    MGL      2008/12/11 (Thu) @ 22:02

Tango, sure. I just meant that in the eyes of most teams, fans, media, and the like, defense is never going to get the same appreciation (or value) as offense, especially in the OF. If you ever want to amuse yourself, ask a bunch of typical fans, sportswriters, GMs, managers, etc., how many theoretical “runs” a good, very good, or great outfielder is worth, OR how many OPS points, OR how many BA points those categories of players are worth, and see what they say. I would think you’d be quite amused at the answers (all over the board), assuming they didn’t look at you with glassy eyes thinking, “What the hell is this guy even talking about?” Also, after they answer, ask them whether they would trade a player worth 10 runs above average (most won’t know what that means), or with an OPS of .900, or a BA of .310, for an “equivalent” defensive player by their own standards, and why, and watch their heads explode.


#30    MGL      2008/12/12 (Fri) @ 03:37

Question for you guys. For a defensive metric like UZR, how do you think a fielder’s choice (FC) with no outs recorded should be treated for the fielder who fields the ball? A typical play: runner on first, the third baseman (or any fielder) fields a slow ground ball and tries to get the runner at second, the throw is late, and everyone is safe. The fielder is usually not charged with an error, but since the official scorer thinks the batter would have been thrown out if the fielder had gone to first base, it is scored as “reached on a FC” or just “FC.” It is a goofy scoring term, since if you are told that a play is a FC, or that the batter reached base on a FC, you don’t know whether an out was recorded at another base or not. Anyway, how do you think it should be treated for the fielder? One option is to treat it as a “non-play” (ignore it). Another option is to treat it as an out for the fielder (as if he got the out at first or at another base). A third option is to treat it as an error on the fielder, since he elected to throw to another base rather than first, and his judgment was presumably bad (although technically, he does not have to get the out at the other base 100% of the time in order to be “right”). A fourth option is to give him a partial error and a partial out, since trying for an out at another base is the correct play as long as you get that out at least 80% of the time, or whatever the break-even (BE) point is.

Thanks.


#31          2008/12/12 (Fri) @ 04:56

MGL, bottom line is, the fielder had an opportunity, and failed to record an out, in this case because he chose to go for the presumably tougher play.

If your rating is the percent of balls in play where an out is recorded, then he has failed.

You may want to create FC as a new field, but it’s nothing the fielder should get any positive credit for.


#32    joe arthur      2008/12/12 (Fri) @ 04:57

If you want to credit the fielder with the extra run value saved whenever he actually eliminates the lead runner instead of the batter, you will reward him for trying to make the “right” play, but you should fully penalize him for not making any play. This is closest to your option 3. If the fielder has better or worse than average judgment on these plays it will be captured as part of his defensive value, though I’d guess this will rarely be as much as plus or minus 1 run for a season. FC without any outs are fairly rare plays, but it just takes 2 in a season to cost more runs than that.


#33    terpsfan101      2008/12/12 (Fri) @ 07:37

I agree with Joe and Brian. If you reward fielders with the additional run value for erasing the lead runner, then you need to penalize fielders when they do not successfully eliminate the lead runner. Not only did they fail to erase the lead runner, but they are also responsible for adding another baserunner, since they didn’t record any outs on the play. So I think you would be perfectly justified in treating these plays the way you would treat plays where a batter reaches base on an error.

A “Reached on Fielder’s Choice with no outs recorded” (RFC) does not occur very frequently. From 2002 to 2008, there were a total of 1140 RFCs, 220 of them on sacrifice bunts. While they are infrequent, they are very costly in terms of run expectancy. From 2002-2008, the average RFC on non-sacrifices was .66 runs, and the average RFC on sacrifice bunts was .73 runs. So RFCs are responsible for creating about 100 runs each season.
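The “about 100 runs each season” figure follows from the numbers in this comment. A quick check, assuming the .66 and .73 figures are per-event run values and that 2002-2008 covers seven seasons:

```python
# Figures from the comment above, 2002-2008.
seasons = 2008 - 2002 + 1            # 7 seasons
rfc_total, rfc_sac = 1140, 220
rfc_other = rfc_total - rfc_sac      # 920 non-sacrifice RFCs
# Assumption: .66 and .73 are average run values per RFC.
total_runs = rfc_other * 0.66 + rfc_sac * 0.73
per_season = total_runs / seasons
print(round(total_runs, 1), round(per_season, 1))
```

That comes to roughly 110 runs a season, in line with the estimate.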

In Linear Weights, I lump the RFCs in with Reached on Errors. However, I do not include the RFCs that occur on sacrifice bunts, and I also do not include ROEs that occur on sacrifice bunts.


#34    MGL      2008/12/12 (Fri) @ 20:09

Thanks guys.  I don’t treat erasing another runner on a FC (where the batter is safe but an out is recorded at another base) as anything more positive than getting the runner at first, even though it is more valuable to get a lead runner of course.

So I guess I should treat it as a full error, even though technically it should be a partial error: a player with perfect judgment will only get the lead runner 80% of the time or so, and the other 20% of the time everyone will be safe.


#35    terpsfan101      2008/12/13 (Sat) @ 04:41

MGL,

If you do not award fielders the extra value for eliminating the lead baserunner compared to recording the out at first base, I would recommend treating an RFC as a partial error and partial out. I keep saying lead runner even though it isn’t always the lead runner who gets erased on a fielder’s choice.

Looking at the breakdown of RFCs by base-out state from 1954 to 2008, I have 7 RFCs that occurred with the bases empty. Of course, this is impossible, because the definition of a fielder’s choice implies that there is a runner (or runners) already on base before the event occurs.

I think one of the reasons the Linear Weights don’t sum to zero when you use Chadwick to parse the PBP files is that the base-out states are occasionally recorded incorrectly.


#36    terpsfan101      2008/12/14 (Sun) @ 00:16

According to the official rules, a fielder’s choice is awarded when a fielder throws the ball to the wrong base. So it is possible to reach on a fielder’s choice with nobody on base. One of these RFCs with nobody on occurred after a caught stealing. I don’t have the PBP in front of me, but I imagine the fielder threw the ball to the 2nd baseman (or in the direction of the 2nd baseman), thinking there was still a runner on 1st base.

Also, foul 2-strike bunts that are caught by a defensive player (excluding foul-tips) are not counted as strikeouts.


#37    MGL      2008/12/15 (Mon) @ 03:07

#35, yes I agree that if I don’t give extra credit for erasing another runner other than the batter/runner, I should not give a full “error” for a RFC (unless maybe there are 2 outs).  If I did that, everything would not sum to zero of course.  Probably no big deal though.  Maybe I’ll actually change it so I DO give extra credit for an out at another base (not first).  It would not be much more than 1 out though.  It would be roughly the ratio of FC (with an out) to RFC, which would be something like 1.01 I would think.


#38    terpsfan101      2008/12/15 (Mon) @ 22:13

I see your point about everything not summing to zero if you treat RFCs with fewer than 2 outs as a partial error and partial out. If you think about it, an RFC really is an error, although it isn’t classified as one. I keep changing my opinion on this, but I think you should treat RFCs like errors, since they allow a runner to reach base without an out being recorded, just like a hit or a reached-on-error does.


#39    joe arthur      2008/12/16 (Tue) @ 02:42

Does UZR have base/out situation adjustments for the probability of recording an out which implicitly incorporate the probability of failed attempts on the lead runner anyway? The vast bulk of such plays must be at 2nd in DP situations, and most of the remainder in infield in/play-at-the-plate situations. I thought UZR did at least do something to adjust for DP situations - even though the rationale might be to pick up the changed probabilities caused by positioning middle infielders at double play depth and holding the runner at first, that changed baseline for recording an out would also capture the normal probability of a failed fielder’s choice.

