THE BOOK--Playing The Percentages In Baseball

Thursday, May 27, 2010

How lucky has Scott Rolen been with his opportunities to field?

By Tangotiger, 10:27 AM

Take the number of balls in play for the league and the team.  See what percentage of BIP the shortstop typically fields, then see how many extra plays Ozzie made.

Right, that’s how WOWY kinda works, and you do end up with “unbelievable” numbers.

I was just looking at Scott Rolen’s career numbers, and whereas the average 3B converts 9% of BIP into outs, Rolen converts 10% of BIP into outs.

Now, if the reason that Rolen is converting so many BIP into outs is because his pitchers, his batters or his park are predisposed to get balls hit to 3B, then Rolen will end up looking better than he is. 

However, his batters, his pitchers and his park have all been virtually league average over his career (as you would kind of expect given a player that has played on as many teams and for as many years as Rolen has).

Now, even given that, perhaps Rolen got lucky and ended up having a disproportionate number of GB/airballs, when he is playing.  That is also not true.

So, we are left with the fact that when there are 4000 BIP per year, the average 3B will make an out 360 times, while Rolen has made an out 400 times, making him +40 plays PER YEAR (+32 runs).

Now, perhaps the reason Rolen is so good is that even given that he’s had average hitters and average pitchers and average parks and average GB/Airball ratios and he’s played so often behind everyone, that even after all that, Rolen is getting a disproportionate share of those ground balls hit to 3B. And that’s why WOWY has Rolen at +170 runs since 2002 while UZR and Dewan has him at +90 to +100 runs.  And that’s why WOWY has Rolen at +300 runs in his career, while Total Zone has him at +150 runs.

So, I ask: what is more believable, that Rolen, given 42,917 balls in play has faced basically an average distribution of plays, such that the league average 3B would have converted 9% of those BIP into outs while Rolen did in fact convert 10%, or that the scorers are so precise in their batted ball locations that an average 3B would have converted 9.5% of those BIP into outs?

The scorers’ marking of the hit locations implies that Rolen’s opportunities are 3.6 standard deviations from the mean.  Now, that, in and of itself, means nothing.  After all, look at enough data, and someone, by luck, will be at 3.6 SD.
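A rough check of the 3.6 SD figure, as a sketch only: it assumes each ball in play is an independent trial with a 9% chance of becoming a 3B out (a plain binomial model, which may not be exactly how the number above was derived). The function name is made up for illustration.

```python
import math

def sd_gap(bip, base_rate, implied_rate):
    """How many binomial SDs separate the implied out total from the baseline."""
    # SD of the number of outs if every BIP is an independent trial at base_rate
    sd_outs = math.sqrt(bip * base_rate * (1 - base_rate))
    # Extra outs implied by the higher opportunity rate, in SD units
    return bip * (implied_rate - base_rate) / sd_outs

# 42,917 BIP, 9% baseline, 9.5% implied by the scorers' locations
z = sd_gap(42917, 0.09, 0.095)
print(round(z, 1))
```

Under that assumption the gap comes out at roughly 3.6 standard deviations, in line with the figure quoted.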

Rolen, however, is not an isolated incident.  And so, I submit that the marked locations of the batted balls have enough imprecision in them that a more “earthy” approach, like I’m doing, is warranted to at least regress to.  That even though I’m saying 9% and even though the scorers are implying 9.5%, that maybe it should really be 9.3%.

(The actual numbers I have are this: 42,917 BIP for Rolen, of which he made 4197 outs, or 9.8%.  The average 3B made 8.9% outs.  The average 3B, facing Rolen’s batters made 8.9% outs.  The average 3B, fielding behind Rolen’s pitchers made 9.0%.  The average 3B, playing in Rolen’s parks, made 8.8% outs.  The average 3B, given Rolen’s GB/FB/LD distribution, made 8.9% outs.)
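The arithmetic behind those numbers can be sketched as follows. The 0.8 runs per play is an assumption taken from the "+40 plays (+32 runs)" conversion above, and the function and variable names are illustrative only.

```python
RUNS_PER_PLAY = 0.8  # assumption: implied by "+40 plays PER YEAR (+32 runs)"

def plays_above_average(bip, outs, league_rate):
    """Outs made minus the outs an average fielder would make on the same BIP."""
    return outs - bip * league_rate

rolen_bip = 42917
rolen_outs = 4197
league_rate = 0.089  # average 3B outs per BIP, from the post

rolen_rate = rolen_outs / rolen_bip
extra_plays = plays_above_average(rolen_bip, rolen_outs, league_rate)
extra_runs = extra_plays * RUNS_PER_PLAY
per_4000_bip = (rolen_rate - league_rate) * 4000  # "full season" equivalent

print(round(rolen_rate, 3), round(extra_plays), round(extra_runs), round(per_4000_bip))
```

With the unrounded rates, the career total lands near the +300 runs that WOWY reports, and the per-season figure is a bit under the +40 plays obtained from the rounded 9% and 10%.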


#1    MGL      (see all posts) 2010/05/27 (Thu) @ 11:27

He could like to play further from the line than the average 3B and therefore make more plays but give up more doubles.

Look at his career simple STATS ZR and see what that shows.  That should be more accurate than just looking at outs per BIP (your method).  I mean regardless of the sample size and how ‘average’ the batters and pitchers are, to think that a computation that uses mostly balls that are hit nowhere near the third baseman is “accurate” is NOT believable. At the very least have someone take the retro-sheet files and only count ground balls that are hit anywhere near the 3rd baseman.  Those results I might believe.  Tango, are you only counting ground ball outs made by Rolen and all third basemen?  Where are you getting that info?


#2    Rally      (see all posts) 2010/05/27 (Thu) @ 11:42

Do Rolen’s shortstops make as many plays with him as without him?  Maybe he’s cutting in front of the shortstop to make some extra plays.


#3    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 11:48

At the very least have someone take the retro-sheet files and only count ground balls that are hit anywhere near the 3rd baseman.

That’s what Total Zone essentially is, correct?

Tango, are you only counting ground ball outs made by Rolen and all third basemen?  Where are you getting that info?

Retrosheet.

that uses mostly balls that are hit nowhere near the third baseman is “accurate” is NOT believable.

Right, if we knew where the balls are actually hit, that would be one thing.  But, we actually don’t know.  However, you’ve given me an idea to improve something.  A really good idea.  You’ve just given me several hours of rewrite to do.  Thanks!


#4    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 11:54

MGL, can you try this?  Presume that BIS is marking all of Rolen’s outs as “too conservative”.  What if we alter the outs made as being farther from the central point?

For example, let’s say you have angle 0 at 2B, with -45 down the 3B line and +45 down the 1B line.  The average 3B out is made at -40 degrees (let’s say).

Let’s say that some Rolen out, actually made at -44, is being marked at -42.  And let’s say that some Rolen out, actually made at -32, is being marked at -36.  And that Rolen is being mostly affected because he’s the best fielder around, and so he’s got a lot of “out of zone” plays that are being marked closer to zone.

Therefore, how much does his 2002-09 UZR change if you were to “stretch” the recorded values?  That is, change the -39 to -38, change the -38 to -36, the -37 to -34.  Stretch the zones by doubling the distance from some central point.  (Up to a point… don’t make him go past angle -46 or -48.)

How much does UZR change if you do that?

Basically, I’m trying to see how much impact some little change can have.
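One way to picture the proposed "stretch", as a hedged sketch: it assumes the central point is the -40 degree average 3B out, a doubling factor, and a cap at -48 on the foul-line side. The function name and defaults are hypothetical and have nothing to do with how UZR actually stores locations.

```python
def stretch_angle(recorded, center=-40.0, factor=2.0, foul_limit=-48.0):
    """Push a recorded angle away from the central point by `factor`.

    Angles follow the convention in the comment: 0 at 2B, -45 down the 3B line.
    The result is capped so it never goes past `foul_limit`.
    """
    stretched = center + factor * (recorded - center)
    return max(stretched, foul_limit)

for recorded in (-42, -39, -38, -36, -45):
    print(recorded, "->", stretch_angle(recorded))
```

With these defaults a ball marked at -42 moves back to -44 and one marked at -36 moves to -32, which matches the two mis-marked outs described above.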


#5    Guy      (see all posts) 2010/05/27 (Thu) @ 12:30

Let’s use this thread (not the WAR thread) for all comments on TZ. 

“And that’s why WOWY has Rolen at +170 runs since 2002 while UZR and Dewan has him at +90 to +100 runs.  And that’s why WOWY has Rolen at +300 runs in his career, while Total Zone has him at +150 runs.”

We should treat TZ separately from UZR/Dewan.  I don’t see any reason why UZR necessarily has to produce a more conservative estimate than WOWY (unless the raw numbers are regressed in some way I’m not aware of).  But Total Zone MUST understate Rolen’s value.  It will only give him credit for about half of the extra plays he makes.  In other words, if Rolen’s teammates are average, a TZ rating of +150 actually means he is about +300 runs!  This isn’t an issue of imprecision in the data, it’s a systematic bias built into TZ. 

“I’m a lot more comfortable calling him +30 than +95.  Seems like most other people are too.” [Rally in WAR thread.]

I have huge respect for Rally and the work he’s done, but this is just a terrible argument.  TZ says that Ozzie is +214 runs over his career.  Assuming a SS gets credit for about .4 plays for every extra out he makes (Rally can tell us the exact ratio), that means Ozzie is really +535 runs.  I’m not making some radical, crazy assumption—that’s what his TZ rating truly means if everything else is just average.  That’s what I think people aren’t understanding about TZ—it systematically reduces the rating of great fielders, even if you accept the validity of all its data.  On top of that, if we just look at a crude measure of plays made (Range), we would guess Ozzie is about +700 runs. 

So I don’t care what people are “comfortable” with.  Putting Ozzie at +535 (or whatever we’d get by comparing actual and expected outs) is actually a much more reasonable and conservative estimate for Ozzie than using a metric which we know with 100% certainty will only show about 40% of his actual performance.

“I mean regardless of the sample size and how ‘average’ the batters and pitchers are, to think that a computation that uses mostly balls that are hit nowhere near the third baseman is “accurate” is NOT believable.”

You are greatly underestimating the value of randomness here.  Over a whole career, a lot of things will wash out.  On top of that, Tango has controlled for most of the factors that could change Rolen’s opportunities.  Now, UZR may be more accurate than WOWY—I think that’s an open question.  But TZ, which systematically underestimates Rolen’s contribution by a huge amount, has virtually no chance whatsoever of providing a better estimate than WOWY.


#6    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 12:56

Where UZR shines is in the smaller sample, the 0 to 3 years level.  That’s because the thing that kills WOWY, that the randomness hasn’t had a chance to spread out yet, is what lets UZR’s information of batted ball location step up.

At the same time, there are obvious biases in subjective data recording.  So, after 43,000 plays, one would expect that randomness would wipe out most of the issues with WOWY.  We shouldn’t expect a disproportionate number of GB hit to SS or 1B compared to 3B at that point.  However, with UZR and the others that rely on batted ball locations, there could be systematic biases. 

For example, an outfielder that makes an out is more likely to have that ball marked as a flyball than a line drive, and OF make more “runs” on line drive outs.  So, if Ichiro or Rolen are making outs on tough plays that are really being marked as easy, the UZR and Dewan will underestimate these players.

So, I think I’ve said this in the past, that if I had less than 3 years of data, I’d go mostly with UZR.  And if I had more than say 7 years of data, I’d go mostly with WOWY.

***

I know we talked about the conservative nature of TZ.  We had a thread on it about a year ago where we went through the process and determined that TZ does underestimate.  I’ll see if I can find that thread.


#7    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 13:17

This is the thread starting at post 16, where Guy, Rally, and I discuss the workings of TZ:

http://www.insidethebook.com/ee/index.php/site/comments/best_worst_wowy_since_1993_through_age_34/#16

(Interestingly, in that thread, I talk about Beltre, much the same way I talk about Rolen here.)

Anyway, in post 23, Guy provides an excellent illustration:

To illustrate this more clearly, consider an Adam Dunn level SS, who only makes outs on 6% of BIP rather than 12%.  So he’s -6 plays per 100 BIP.  TZ will define his opportunities as approximately outs + .5 * GB hits (to LF or CF), or about 14 in this case.  Then his out% is 6/14 or 43%, vs. a league average of 70%.  Apply the difference of -27% to his 14 opportunities, and he’s now just -3.8 plays (rather than -6).

The trick is we’re now saying this SS had only 14 opportunities per 100 BIP, while the average SS has over 17.
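Guy's illustration can be reproduced with a few lines. This is a sketch of the simplified formula as he states it (opportunities = outs + .5 * GB hits), not of the real Total Zone code; the 16 ground-ball hits are back-derived from his "about 14" opportunities and are an assumption.

```python
def tz_plays(outs, gb_hits, league_out_rate=0.70, hit_share=0.5):
    """Simplified TZ rating per 100 BIP: (out% - league out%) * opportunities."""
    opps = outs + hit_share * gb_hits
    plays = (outs / opps - league_out_rate) * opps
    return plays, opps

# Poor SS: 6 outs per 100 BIP, with 16 GB hits through to LF/CF (assumed)
bad_plays, bad_opps = tz_plays(outs=6, gb_hits=16)

# Opportunities implied for an average SS: 12 outs at a 70% success rate
avg_opps = 12 / 0.70

print(round(bad_plays, 1), bad_opps, round(avg_opps, 1))
```

The poor shortstop is charged with 14 opportunities against a bit more than 17 for the average one, so his true -6 plays shrink to about -3.8.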

In my long-response, I put in this snippet:

Actually, we can see that the hits allowed are inversely proportional to the outs made.

For example, over a long period of time, we know that the number of opps will be the same for every SS.  So, if one SS has more outs, then he also allowed less hits.

Rally’s model (is this the same problem with Dan Fox’s?) presumes that the two are independent, when in fact, they should be inversely proportional.

The problem is that for any given year, we don’t know if a SS faced a normal number of opps.  So, if he had few outs, we presume that he probably had few opps as well.

Actually, everyone should read that thread in its entirety, just so that we don’t end up repeating ourselves here.


#8    Rally      (see all posts) 2010/05/27 (Thu) @ 13:28

"We should treat TZ separately from UZR/Dewan.”

In Scott Rolen’s case, there is no difference.  TZ has him at +90 for 2002-2009.

“Putting Ozzie at +535 (or whatever we’d get by comparing actual and expected outs) is actually a much more reasonable and conservative estimate for Ozzie than using a metric which we know with 100% certainty will only show about 40% of his actual performance.”

We don’t know that.  We sure as hell don’t know that with 100% certainty.  In the 1980 example, we know Ozzie made about 100 more plays than an average shortstop.  We do not know that there were 100 fewer hits going through the shortstop’s area of responsibility.  And to give him +500 runs, you have to know that.

Most plays made in the field are relatively routine.  Using plays made as the standard to build WOWY on is dangerous, in my opinion.  What really separates fielders is how many hits they allow. 

If you run the same WOWY for Scott Rolen and find there are 300 fewer runs worth of opponent hits when he’s in the field I’d be more convinced.  Especially if we break it down a bit.  I know Tango doesn’t trust the groundball category on retrosheet but you could look at hits fielded by the left fielder (plus infield singles of course) - Unless MLBAM stringers are randomly generating the data you will not have the centerfielder or rightfielder picking up a ball the third baseman had a chance to make a play on.

OK - centerfielder I can see - freak play like a ground ball down the line, ricocheting off the wall, bouncing past the LF and backed up by the CF.  But such plays will be so rare that you don’t have to worry about them changing the results of the data.


#9    Peter Jensen      (see all posts) 2010/05/27 (Thu) @ 14:02

(The actual numbers I have are this: 42,917 BIP for Rolen, of which he made 4197 outs, or 9.8%

Tango - I have Rolen as fielding 2012 ground balls for outs from 2002 to 2009. B-ref has him with 2103 assists for 2003 to 2009.  How are you getting him with 4197 outs?


#10    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 14:07

This is an interesting thread:

http://www.insidethebook.com/ee/index.php/site/comments/how_many_runs_is_a_good_fielding_ss_worth/

From 1957-2006 (excluding 1999), there have been 50 shortstops that have played at least 1037 games at SS (specifically, at least 28,000 outs). ... Our 50 shortstops with the most playing time played a total of 682,330 innings (468 full seasons).  Their teams allowed 4.30 runs per game.  Their counterparts (all other SS, weighted by the same park/year combination) allowed 4.54 runs per game.  Therefore, our top 50 SS were on teams that allowed 0.24 fewer runs per game than their counterparts.  That’s a nearly 40 run difference over a full season, which is quite hefty.

We talked about a similar finding in the Mark Ellis thread
http://www.insidethebook.com/ee/index.php/site/comments/how_much_is_a_great_fielder_worth_25_runs/
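The "nearly 40 run" figure in the quoted passage is simple arithmetic. A quick check, assuming a 162-game schedule and 9-inning games (neither is stated in the quote):

```python
GAMES_PER_SEASON = 162  # assumption: modern full schedule
INNINGS_PER_GAME = 9    # assumption

def runs_saved_per_season(rpg_counterparts, rpg_top50, games=GAMES_PER_SEASON):
    """Runs-per-game gap scaled up to a full season."""
    return (rpg_counterparts - rpg_top50) * games

season_gap = runs_saved_per_season(4.54, 4.30)
full_seasons = 682_330 / (GAMES_PER_SEASON * INNINGS_PER_GAME)

print(round(season_gap, 1), round(full_seasons))
```

Under those assumptions the gap is a little under 39 runs per season, and the innings total works out to about 468 full seasons, as quoted.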


#11    Guy      (see all posts) 2010/05/27 (Thu) @ 14:08

Just to clarify one thing (and Rally should jump in if I state this incorrectly):  There are TWO versions of TZ—for more recent years it uses PBP data, prior to that it does not.  In the PBP version, about 40% of the credit/blame for a play made by the SS is assigned to another player.  In the pre-PBP version, I’m estimating 60% or more of the credit/blame is spread to other players.  And for positions that make fewer plays, the pre-PBP version will have a larger error—for example, a 3Bman will only get credit for about .3 plays on each extra out made. 

*

I’d certainly agree that 3B poses a special challenge, in terms of the possibility that Rolen is stealing opportunities from his SS.  We should be able to figure out if that’s true.  But we shouldn’t ASSUME this to be true just because he makes more plays than some people (or even a lot of people) think is likely.  I don’t think this is a factor we need to worry about for any other IF position.

*

“We don’t know that.  We sure as hell don’t know that with 100% certainty.”

Rally, I’m not sure what ‘that’ means here.  I agree we aren’t 100% sure Ozzie is really +535 (a number I made up).  But yes, we are 100% sure that using expected outs to estimate Ozzie’s opportunities is a MUCH better estimate than TZ, given the data you use. 

Look at it this way: 
If Ozzie faced a typical distribution of BIP given those opposing hitters, then Ozzie’s real runs saved will be about 2.5X bigger than TZ says.  And with 76,000 BIP (!), isn’t that the conservative assumption to make?  For TZ to be correct, Ozzie’s real opportunities would have to be about 5 or 6 SDs above average.  So maybe I should say 99% rather than 100% certain. 

*

“I know Tango doesn’t trust the groundball category on retrosheet but you could look at hits fielded by the left fielder (plus infield singles of course)”

That’s fine, and what TZ does.  I agree the LF will handle those balls.  But then for every hit Rolen prevents, you give half the credit to the SS!  So if Rolen’s SSs have been average, your result and WOWY are totally consistent!  (And if Rolen’s SSs are below average, you will further undervalue him.)


#12    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 14:11

Peter: That’s his career.

Since 2002, he’s at 2439 outs (excluding bunts and pitchers-as-batters).  It includes any lineouts.


#13    Guy      (see all posts) 2010/05/27 (Thu) @ 14:24

Rally:  To be clear, I didn’t mean to ask/suggest that you overhaul TZ, just suggesting an additional metric that I think may be superior at the career level.  I thought Fangraphs might have the interest and capacity, since you have already done most of the work required to calculate expected outs by measuring how often each hitter makes a BIP out at each position.  But I am perhaps underestimating the amount of programming work required to create this expected/actual out metric.


#14    Rally      (see all posts) 2010/05/27 (Thu) @ 14:30

"But yes, we are 100% sure that using expected outs to estimate Ozzie’s opportunities is a MUCH better estimate than TZ, given the data you use.”

I don’t know who “we” is, but does not include me.  Say God, Buddha, Muhammad, Corey Schwartz, and Sean Forman all decided to get together, unearth game broadcasts and retroactively install field f/x to measure everything we want in a fielder, and came back with runs saved results.  Before the numbers were released, MGL opens a casino so we can all take bets on where they’ll be.  I would put my money on Ozzie being closer to +200 than to +500.  Maybe I’m wrong, but that is where I stand.


#15    Rally      (see all posts) 2010/05/27 (Thu) @ 14:43

Guy #13 - No problem.  The different versions of TotalZone are in several different databases.  The programs are complex enough that it takes me a long time to look over them and figure out what I was doing.  My baseball time is very limited being a dad now.  And I have other projects I want to move on to. So as for any work on TZ, the only thing I’m willing to do is make what I’ve produced in the past available (my site, Fangraphs, B-ref) and update the current version of the program when I get new data.  Any revisions of the old stuff aren’t going to be done by me.  I have nothing but encouragement for anyone who wants to try and implement what amounts to a new system.

I am curious though.  We see Rolen has made enough extra plays to account for twice the run saved value that TZ,UZR, and DRS credit him with.  Using the same parameters, does that mean there are 300 runs worth of extra hits happening?  Or is that 1% of BIP going somewhere else?

1) If it’s hits, then I may have to revise my skepticism here, especially if the same pattern holds for other top fielders.
2) If it’s fewer outs for other fielders, we could have a few causes.
a) Some explainable interaction, like 3B cutting off the SS
b) If it’s something else, like rightfielders making fewer plays when Rolen is in there, then there’s probably too much noise even in multi-year data to trust it.


#16    Guy      (see all posts) 2010/05/27 (Thu) @ 14:53

"I would put my money on Ozzie being closer to +200 than to +500.  Maybe I’m wrong, but that is where I stand.”

Then, with all due respect, you don’t even believe in your own metric.  Because +200 in Total Zone MEANS about +500 on the baseball field (on average).  Or if I’m wrong, please answer this:  if Ozzie really were +500 on an average BIP distribution, and his 3Bmen were also average, about what would his TZ be? 

Again, +500 is just a guess, based on TZ.  I’m assuming that in pre-PBP TZ, a SS gets about .4 credits for each hit/error he prevents.  Am I in the right ballpark?

Basically, TZ sees a player who is +50 and says “This is unbelievable.  So he must have had an unbelievable number of opportunities.  I’m going to call him +20.” This is not based on data for the most part (at least for the pre-PBP years), but is just an assumption.  And for some rookie SS, an assumption most of us would probably agree with.  But do you really want to make the same assumption when you have 76,000 BIP?  You really think we should continue to assume that 60% of his observed performance, year after year after year, was luck?  I can’t imagine you believe that makes sense....


#17    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 14:54

Ozzie played the equivalent of 2400 full games.  My money is on him lowering the runs per game at least 0.20 runs per game, and I wouldn’t be surprised by 0.30 runs per game.  Heck, even 0.40 runs per game.  But, less than 0.10 runs per game?  That would be shocking.  Put me down for +600 runs (0.25 runs per game) as the fair estimate, and I’ll take a dozen chocolate-glazed donuts.


#18    Rally      (see all posts) 2010/05/27 (Thu) @ 15:02

My belief is that 200 runs of TZ means 200 runs on the scoreboard.  If I had a dozen +20 shortstops in TZ, all together they would save their teams 240 runs.  Not 500.  If I believed it was 500 I’d change the system.

I’d have to think about that question.  (What a real +500 shortstop should look like in TZ).  I don’t think it’s as simple as doubling the reported run values for everybody.


#19    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 15:04

Or is that 1% of BIP going somewhere else?

Right, an excellent point.  If there is NO bias, then we should see Rolen’s SS make the same % of outs on BIP that those same SS make with non-Rolen 3B.

If, on the other hand (and this may be likely), those SS make fewer outs per BIP, then it’s very likely due to Rolen cutting those balls off.

And so, while we always presume we are talking about “turning a sure hit into a sure out”, what may be happening is “half the time, we are turning a sure hit into a sure out, and the other half of the time, we are turning a possible hit, possible out into a sure out”.

Excellent.

So, what I’m going to do is take the 3B leaders of our generation in WOWY (Rolen, Chavez, Beltre).

It’s also lovely that those three guys also happen to be great in UZR and in the Fans Scouting Report, and Gold Glove.

I’ll check it out over the weekend.


#20    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 15:12

Basically, TZ sees a player who is +50 and says “This is unbelievable.  So he must have had an unbelievable number of opportunities.  I’m going to call him +20.”

Right, as discussed, TZ puts a relationship that the more outs made, the more opportunities available.

While this makes some sense on a seasonal level (if a guy made 300 outs at SS, he probably didn’t have a lot of balls hit to him, and if he made 600 outs at SS, he probably had a lot of balls hit to him), this does NOT necessarily transfer to the career level. 

But, TZ processes things at the seasonal level, and simply adds it up.  If those same SS continue to make 300 outs and 600 outs each, well, guess what, those figures almost definitely reflect their value, that they both had say 650 balls hit to them, and one made 600 outs and the other made 300 outs.

This is the key point, that TZ implicitly does a regression at the seasonal level.  And, as we know, just because you regress at the seasonal level doesn’t mean you get to regress at the same rate at the career level.

For example, look at Rivera’s BABIP each year.  While you would HEAVILY regress the annual BABIP, you would NOT simply average out his annual BABIP for his career rate.  This is important: each of Rivera’s seasonal lines is not independent when looking at his observed BABIP.

You may want to add 1000 BIP of league average performance to infer Rivera’s BABIP skill for each of 2009 or 2008 or 2007.  But, if you have 2007-2009, you do not add 3000 BIP of league average performance.  You still add the 1000 BIP of league average performance.

That’s why TZ, which may work on a seasonal level, can’t work as a “sum of parts” at the career level.
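The regression point can be made concrete with a small sketch. The 1000 BIP of league-average ballast comes from the comment; the pitcher's numbers (200 BIP per season, a .250 observed BABIP, a .300 league average) are hypothetical and chosen only to show the effect.

```python
def regressed_rate(hits, bip, league_rate, ballast_bip=1000):
    """Observed rate shrunk toward league average by adding ballast BIP."""
    return (hits + league_rate * ballast_bip) / (bip + ballast_bip)

LEAGUE = 0.300                       # hypothetical league BABIP
season_hits, season_bip = 50, 200    # hypothetical: .250 BABIP on 200 BIP

one_season = regressed_rate(season_hits, season_bip, LEAGUE)

# Three such seasons pooled: still only 1000 BIP of ballast
pooled = regressed_rate(3 * season_hits, 3 * season_bip, LEAGUE)

# "Sum of parts": regress each season separately, which amounts to 3000 BIP of ballast
sum_of_parts = regressed_rate(3 * season_hits, 3 * season_bip, LEAGUE, ballast_bip=3000)

print(round(one_season, 4), round(pooled, 4), round(sum_of_parts, 4))
```

Regressing each season and then adding them up leaves the three-year estimate exactly where the one-year estimate was, while pooling first lets the larger sample pull the estimate further from league average.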


#21    Chris Dial      (see all posts) 2010/05/27 (Thu) @ 15:17

You guys talk differently than me.

Rolen in DRS through 2008 (career) is at +157 runs which is about 200 plays in 14000 IP.


#22    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 15:38

Chris, that’s consistent with UZR, and TZ, and Dewan, and that works out to +.005 outs per ball in play (presuming about 4000 BIP per season).

Since we know he converted .098 outs per BIP, and since we know the average 3B converted .089 outs per BIP, this means that some combination of this is true in order for him to be only worth +.005 as you suggest, instead of being +.009 as I suggest:

1. he faced easier chances than average
2. he gave up more extra base hits than average
3. on some of the balls he made an out, they could have been made by the SS

If ALL of that is not true, then +300 runs is the better estimate.

I’d bet that #1 is not true.  I suspect that #2 is not true.  It’s very possible that some degree of #3 may be true.
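The conversion from "200 plays in 14000 IP" to outs per ball in play can be sketched as below. It assumes 162 games of 9 innings per season and the 4000 BIP per season used in this thread; the function name is illustrative.

```python
INNINGS_PER_SEASON = 162 * 9   # assumption: full schedule, 9-inning games
BIP_PER_SEASON = 4000          # full-season equivalent used in the thread

def plays_per_bip(plays, innings):
    """Extra plays expressed as a rate per ball in play."""
    seasons = innings / INNINGS_PER_SEASON
    return plays / (seasons * BIP_PER_SEASON)

drs_rate = plays_per_bip(200, 14000)   # from the DRS figure quoted above
wowy_rate = 0.098 - 0.089              # Rolen minus average 3B, outs per BIP

print(round(drs_rate, 3), round(wowy_rate, 3))
```

That puts the DRS figure near +.005 outs per BIP against +.009 from the raw out rates, which is the gap the three listed factors would have to explain.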


#23    Rally      (see all posts) 2010/05/27 (Thu) @ 15:48

"Basically, TZ sees a player who is +50 and says “This is unbelievable.  So he must have had an unbelievable number of opportunities.  I’m going to call him +20.””

For the current version of the program, I get +36 as the assumed rating of the guy who is a true +50 fielder.  What I did was assume 67 more outs, and 67 fewer hits and errors.  The 67 saved were prorated from the average mix of hits to the LF hole (where the SS gets charged .40), up the middle (SS charged .55), and infield singles/errors (where the SS is charged 100%).

I don’t know about the old version of TZ.  It charges hits to fielders based on where the batter makes his outs.  Some of the +50 shortstop’s extra plays would be fewer errors, the blame for those are not shared.  I would have to assume that much of the saved hits would be saved from batters who hit a lot of balls to short, so the shortstop should be getting credit for much of those.  But certainly not all. At the other extreme, a hit saved on a ball hit by an extreme lefthanded pull hitter might only give the shortstop .03 hits or something like that.  What I can see is that the spread of good and bad fielding ratings has not changed that much from the 1950-1990 version to the 2003-2009 one.  Adam Everett has a +40, a few others are in the +20’s.  Mark Belanger made the +25 season regularly.  There are others who scored at +20 or more.  So I suspect that TZ is shrinking their ratings by a similar amount, and a +500 would only get 60-70% credit.


#24    Chris Dial      (see all posts) 2010/05/27 (Thu) @ 15:49

You say 4000 BIP per season, but he has a couple of half seasons.

Generally, plays Rolen makes, the SS doesn’t have a good chance of making, so that’s likely a small percentage.  Yes, the SS would make some of them, but the gap there is good. 

How many popups a season are you including?  And couldn’t he get more chances than average (I could check this actually)?  And I am not sure that “Since we know he converted .098 outs per BIP, and since we know the average 3B converted .089 outs per BIP,”

DRS is compared directly to what the average 3B gets, so I question that (hopefully this isn’t in those other threads you didn’t want to re-hash, but I am working at the same time...)

Also, join SABR.


#25    Guy      (see all posts) 2010/05/27 (Thu) @ 15:54

"My belief is that 200 runs of TZ means 200 runs on the scoreboard.”

Here’s how I think Ozzie gets rated if he’s actually +50 plays in a season.  You tell me where I mess up. 

4500 BIP.  1260 hits, 3240 outs.
Average SS makes 648 outs (20%), credited with 252 hits, success rate = .72.

Ozzie turns 50 hits into outs, so he has 698 outs.  Team now allows 1210 hits, Ozzie is assigned 242.  His success rate is 698/(242+698) = .7425.  His Total Zone is (.7425-.72) * (242+698) = +21.2 plays.  So he gets credit for .42 plays for every play he actually makes.

So if Ozzie were +500 runs in reality, he would be +211 in Total Zone (under these assumptions).
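The worked example above can be written out as a short sketch. It follows the stated assumptions only (hits assigned to the SS in proportion to a 20% share, success rate = outs / (outs + assigned hits)); it is not the actual Total Zone program, and the function name is made up.

```python
def old_tz_example(team_hits=1260, team_outs=3240, ss_share=0.20, extra_plays=50):
    """Rating a SS would receive if he turns `extra_plays` hits into outs."""
    avg_outs = ss_share * team_outs          # 648 outs for the average SS
    avg_hits = ss_share * team_hits          # 252 hits charged to him
    league_rate = avg_outs / (avg_outs + avg_hits)

    outs = avg_outs + extra_plays            # great SS converts extra hits to outs
    hits = ss_share * (team_hits - extra_plays)  # team allows fewer hits overall
    opps = outs + hits
    tz_plays = (outs / opps - league_rate) * opps
    return tz_plays, tz_plays / extra_plays

tz_plays, credit = old_tz_example()
print(round(tz_plays, 1), round(credit, 2))
```

The shortstop who really made 50 extra plays is rated at about +21, so he receives roughly .42 of a play for each one he actually makes.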


#26    Guy      (see all posts) 2010/05/27 (Thu) @ 16:02

Note that I’m assuming “old” TZ for Ozzie, which has a larger problem. 

I’d estimate that a 3Bman in the old system only gets credit for about 30% of his extra plays made.  That could cost Mike Schmidt about 25 WAR over his career (unless the rest of his team was consistently very good defensively).


#27    Chris Dial      (see all posts) 2010/05/27 (Thu) @ 16:03

So if Ozzie were +500 runs in reality, he would be +211 in Total Zone (under these assumptions).

That seems like a flaw to me.


#28    Guy      (see all posts) 2010/05/27 (Thu) @ 16:28

Chris:  I agree we have to add “ballhog on IFs” to Tango’s 3 factors (comment #22) that could explain Rolen’s high out total.  But as it happens, Rolen’s ratio of assists:outs (2.73) appears to be exactly average for 3B.  So I doubt he’s grabbing a huge number of easy flyballs.  I’m sure we all agree that, when data permits, IFs should be excluded.


#29    Guy      (see all posts) 2010/05/27 (Thu) @ 16:40

To go back to Schmidt, he appears to be +589 plays made, or about +400 runs for his career (using range factor per 9 innings as very rough estimate).  TZ says he is +129.  My estimate is that TZ gives 30% credit to 3B.  So that would mean Schmidt was +430 runs in “adjusted TZ.” See how nicely this works?  Rally should be able to calculate a multiplier for each position, separately for each version of TZ, which would tell us the player’s rating IF his teammates were all average fielders.  (For SS in new-TZ, it’s 1.4 as he showed above).

(Schmidt faced about 4% more RHH than average, so maybe he needs a small adjustment for that.)
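The "adjusted TZ" idea is just a division by the share of credit the fielder keeps. A minimal sketch, where the credit shares (30% for old-TZ 3B, 40% for old-TZ SS, 36/50 for new-TZ SS) are the estimates given in this thread rather than confirmed values:

```python
def adjusted_tz(tz_runs, credit_share):
    """Rating the player would have if his teammates were all average fielders."""
    return tz_runs / credit_share

schmidt = adjusted_tz(129, 0.30)   # old TZ, 3B: Guy's 30% estimate
ozzie = adjusted_tz(214, 0.40)     # old TZ, SS: Guy's 40% estimate
new_tz_ss_multiplier = 1 / (36 / 50)  # Rally: a true +50 SS is rated +36

print(round(schmidt), round(ozzie), round(new_tz_ss_multiplier, 1))
```

These reproduce the +430 for Schmidt, the +535 for Ozzie, and the 1.4 multiplier for shortstops in the newer version.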


#30    Chris Dial      (see all posts) 2010/05/27 (Thu) @ 16:40

Right, and in DRS, IFFs *are* excluded.  How many of these POs are CS?  Is that in there?


#31    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 16:42

For the current version of the program, I get +36 as the assumed rating of the guy who is a true +50 fielder.

Good stuff.  So if you have Rolen at +150 runs at a career level, he’s probably +208 runs using your system.  That inches it closer, almost half way, to what WOWY had.

You say 4000 BIP per season, but he has couple of half seasons.

I’m just saying a full year equivalent is 4000 BIP, just like a full game equivalent is 9 innings, even though Pedro’s 1.74 ERA is not really 1.74 ER per game… it’s 1.74 ER per 9IP.  So, that’s what I mean about 4000 BIP.

And I am not sure that “Since we know he converted .098 outs per BIP, and since we know the average 3B converted .089 outs per BIP,”

At the bottom of the main blog entry, I show all the numbers.  Those are factual numbers (insofar that Retrosheet is correctly reporting the balls in play, and the PO/A correctly.)

How many popups a season are you including? 

Since WOWY only deals with factual data, it doesn’t consider the subjective designation of flight path.  So, this would fall into the #3 I noted whereby plays that Rolen made could have been made by other players.  So, rather than around .80 runs per play, maybe it’s .70 or .60 runs per play (i.e., some of Rolen’s outs would have been someone else’s outs).

***

And I love Guy’s examples as it gives us the roll-up-your-sleeve-and-understand-what-you-are-doing basis for discussion.  Otherwise, it becomes a philosophical discussion, without any understanding of the impact on the landscape.

So, great job on Guy, and hopefully, Rally will continue to come back to refute or confirm.


#32    Chris Dial      (see all posts) 2010/05/27 (Thu) @ 16:44

(Schmidt faced about 4% more RHH than average, so maybe he needs a small adjustment for that.)

Ugh.  We know this often is incorrect.  Why do we still do it?


#33    Chris Dial      (see all posts) 2010/05/27 (Thu) @ 16:48

At the bottom of the main blog entry, I show all the numbers.  Those are factual numbers (insofar that Retrosheet is correctly reporting the balls in play, and the PO/A correctly.)

Um, yeh, about that.  I’m gonna need you to go ahead and break that down into BIP types for us to have comparable datasets.

So, great job on Guy, and hopefully, Rally will continue to come back to refute or confirm.

Thank you?


#34    Guy      (see all posts) 2010/05/27 (Thu) @ 16:51

Sorry, Chris, I got that number from B-Ref.  Just trying to anticipate arguments against the claim that Schmidt was +400.  Are you saying the RHH% is likely wrong, or that a higher proportion of RHH does not produce more expected outs at 3B?  (or both?)


#35    Tangotiger      (see all posts) 2010/05/27 (Thu) @ 17:09

How many of these POs are CS? 

If that was directed to me, I am ONLY looking at BIP.

Um, yeh, about that.  I’m gonna need you to go ahead and break that down into BIP types for us to have comparable datasets.

You can start by comparing to this:

“The actual numbers I have are this: 42,917 BIP for Rolen, of which he made 4197 outs, or 9.8%. “

That’s Rolen, for his career, excluding bunts, outside-the-park HR, and any pitchers-as-batters.


#36    Guy      (see all posts) 2010/05/27 (Thu) @ 17:26

Making the assumption that it must be more than a little annoying to have kibbitzers complaining about all the work you’ve done FOR FREE, I want to say to Rally that I’m only pushing this point because your version of WAR is rapidly becoming the most frequently used—if not hegemonic—metric for rating player careers.  It’s only because your work is getting so much totally-deserved attention that it’s worth arguing about getting it as right as possible.

And considering all the work that goes into measuring offense, position adjustments, etc.  exactly right, this is a rather big issue.  WAR says that Jeter and Ozzie are peers (Jeter is 4 wins better, 69 vs. 65).  But if we make these adjustments on fielding, Jeter becomes 64 (using Rally’s adjustment) and Ozzie perhaps 91—27 wins ahead of Jeter.  Giving Mike Schmidt another 30 wins moves him ahead of Wagner into 6th place all-time (until we revise Wagner, I suppose).  So this could change things a lot.


#37    Brian Cartwright      (see all posts) 2010/05/27 (Thu) @ 18:56

I looked up Rolen’s numbers in Oliver’s fielding stats

from 2005-2010 per Gameday, infield groundballs hit to Rolen, using WOWY to calculate expected values on batter handedness, bunt flag, etc. I do not currently have vector data but will shortly.

I consider these reliable as they only include balls Rolen actually got his hands on

BIP = 1610 
       IFH    RFE    RTE    RCE    Tot
exp  135.5   35.7   22.7    2.8  196.7
obs    103     37     12      1    153

IFH Infield Hits
RFE Reached on Fielding Error
RTE Reached on Throwing Error
RCE Reached on Catching Error

It’s harder to assign infield responsibility for ground ball hits to the outfield. Opportunities for 3B is all groundball hits to LF, then using WOWY and some formulas to decide how many are on the SS and how many on 3B. Being able to use the vector data will make the data more granular.

BIP = 1440
       GBH
exp  174.7
obs  181.3

Here I have Rolen -22 plays (-12 runs) 2005-2007, but +16 plays (+8.5 runs) 2008-2010.

Overall, +50 plays 2005-2010


#38    Rally      (see all posts) 2010/05/27 (Thu) @ 20:50

Third basemen are charged with .6 hits on singles to left.  They are charged 100% on extra base hits, errors, and infield hits.  So the rating of a true +50 player will be more than the +36 I get for shortstops.  I’d just have to look up the exact proportion of those events and I’ll do the calculation.

I’m not sure what it is with pre-1990 data, but the 40% estimate seems too low, because the range of values is similar to what we see with today’s numbers.  That’s what you’re getting assuming a shortstop gets .20 credit for hits.  I suspect the guys who get fewer hits when Ozzie is at short are usually batters who generally hit more balls to short.

If a top fielding 3B was +50, and the methods that got him to +40 today only would have gotten him to +20 pre 1990, then how are Nettles, Robinson, Bell, and Boyer still up at the top of the list?  Their too-conservative values should have been blown away by Beltre and Rolen.

At shortstop Belanger and Ozzie are at the top.  Vizquel and Luis Aparicio, who are perfect comps for each other all around, rank very closely at #4 and #5.

I see no bias towards either era.


#39    Rally      (see all posts) 2010/05/27 (Thu) @ 22:11

TZ would show a true +50 third baseman, playing alongside average fielders, as a +44 fielder.  For 2B the calculation should be similar to the SS (+36) and the 1B similar to the 3B.

For outfielders, there is no partial credit for hits.

Caveat: This does not mean that the guy who is reported as a +44 fielder is truly a +50 fielder.  He may be, or he may just have had more chances to field.


#40    Guy      (see all posts) 2010/05/27 (Thu) @ 22:51

Rally:  Starting with the old-TZ, can I assume that my calculation for Ozzie is roughly correct, that a +500 SS will appear to be about +200 in TZ?  Or if not, can you tell me the error in my calculations?  And if you think my estimate for 3B is too low (which it probably is, since I didn’t deal with errors), please tell us the right answer.  I can’t imagine TZ gives more than .5 for each extra play made.

Citing a few examples from the career leader board doesn’t really tell us anything.  I could as easily cite the fact that you have Boggs, Cirillo, Pendleton and Matt Williams as all about equal to Mike Schmidt as “proof” that the pre-90 TZ is too low on great fielders.  In any case, this isn’t about finding specific cases where TZ appears credible.  This is a systemic issue:  over a career, it HAS to give too little credit to good fielders and too much credit to poor fielders. It’s designed to do that. 

A complication is that TZ is polluted by the performance of teammates (adjoining players post 1990, all teammates pre-1990).  So Brooks looks so good in part because he was on the best defensive team of all time. (If you want to see a great example of this problem, check out Terry Pendleton who averaged about +14 playing next to Ozzie, but never exceeded +7 once he left St Louis).  Some of the examples you cite may well be explained by teammate influences.

The real test would be to compare the two versions of TZ to some simple measure of outs/expected outs.  For players who make a lot more outs than we expect based on total BIP, which version of TZ better approximates that?  I’m confident the post-1990 version will do at least somewhat better.

Attributing full responsibility on errors is also problematic.  That’s probably one reason Schmidt doesn’t do that well—he made a lot of errors.  TZ is concluding from this that an unusually high proportion of the Phillies’ non-outs were in Schmidt’s zone.  In fact, it just means that Schmidt missed a bunch of balls that scorers thought he should have had.  But to the team, it doesn’t matter if a player makes easy plays and misses hard ones, or vice-versa—all that matters is how many outs you make.

The issue of 3B cutting off balls that might reach the SS is an interesting one, and may paradoxically hurt Rolen’s rating.  If he makes a play on a ball that the SS would have made, we don’t want him to get credit (or not much).  But if he makes an out on a ball that would have otherwise been touched by the SS but not become an out, then I think Rolen receives very little credit while the SS actually gets a big boost. 

And that raises a more general question:  if a fielder touches a GB but fails to make the out, does TZ assume that hit was 100% his responsibility?  (I would look this up, but I can’t find Rally’s original MVN articles on line.) If so, that strikes me as a serious problem that explains a big chunk of what we’re seeing.  Players with great range will then get more blame for balls hit in the holes, while Jeter-types never touch the ball and thus share blame with their 3B or 2B.


#41    Guy      (see all posts) 2010/05/27 (Thu) @ 23:00

The last 2 grafs refer specifically to new-TZ.


#42    Peter Jensen      (see all posts) 2010/05/28 (Fri) @ 09:19

Brian - From your data on infield hits it looks like you are combining the numbers for ground balls and bunts.  I don’t know if this is a wise idea, as I think that bunt coverage is a separate skill from range for a third baseman. 

Also, you have Rolen as a +43.7 on infield hits and errors and a -6.6 on ground ball hits.  But then you state that you have him as a +50 plays overall.  Is this an error, or are you incorporating additional information that you haven’t shown to us?  Even if the +50 plays is correct that should translate to around +40 runs which seems extremely low for Rolen.  I have him at 54.9 runs better than average for 2005-2009 not including bunts.  Is it possible that he has lost that much value on bunts and during 2010?
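
For reference, here is the arithmetic behind those two figures, taken straight from the exp/obs rows in #37.  It treats "reached base less often than expected" as plays saved, which is my reading of the tables, and it simply adds the two tables even though they are on different BIP bases (1610 and 1440):

```python
# Expected vs. observed times a batter reached, from the tables in #37.
expected = {"IFH": 135.5, "RFE": 35.7, "RTE": 22.7, "RCE": 2.8, "GBH": 174.7}
observed = {"IFH": 103, "RFE": 37, "RTE": 12, "RCE": 1, "GBH": 181.3}

# Plays saved = expected - observed (fewer batters reaching is good).
saved = {k: expected[k] - observed[k] for k in expected}

infield = sum(saved[k] for k in ("IFH", "RFE", "RTE", "RCE"))
outfield = saved["GBH"]
total = infield + outfield

print(round(infield, 1), round(outfield, 1), round(total, 1))  # 43.7 -6.6 37.1
```

So the two tables as posted add up to about +37, not +50, which is the gap being asked about.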


#43    Guy      (see all posts) 2010/05/28 (Fri) @ 09:49

Brian/Peter: 
Isn’t it problematic to assign 100% responsibility to fielders on hits simply because they touch the ball?  It sounds like this is what both of you are doing, and TZ too if I understand the system correctly.  Please correct me if I’m misinterpreting (btw, if anyone has link to Rally’s articles explaining TZ, please post that). 

This means that if a 3B ranges deep into the hole to knock down a ball but has no play at first, he gets a full -1.  But if the ball goes thru to LF, 40% of the debit goes to the SS.  We want the 3B to knock that ball down—it may have prevented baserunner advancement—but he gets penalized for it.  Now, every system will have a few idiosyncratic plays with “bad results,” but this seems like something that will happen quite a bit. 

So Brian says Rolen is +44 on IF hits.  But maybe he was really +75 and also handled another 30 hits that typically would have been fielded by the SS or LF.  We don’t know.  But we can expect that fielders with good range will touch more hits than poor fielders, with the perverse result that they get assigned more responsibility.  Basically, the more balls you touch, the more “opportunities” are assumed.  And someone like Jeter, who doesn’t even touch a lot of balls that other SSs would stop, will benefit a lot.


#44    Rally      (see all posts) 2010/05/28 (Fri) @ 10:24

"Starting with the old-TZ, can I assume that my calculation for Ozzie is roughly correct, that a +500 SS will appear to be about +200 in TZ?”

I don’t think that’s right, as I’ve stated a few times.  But I can’t prove it.

“check out Terry Pendleton who averaged about +14 playing next to Ozzie, but never exceeded +7 once he left St Louis”

Aging has something to do with that as well.

“The real test would be to compare the two versions of TZ to some simple measure of outs/expected outs.  For players who make a lot more outs than we expect based on total BIP, which version of TZ better approximates that?  I’m confident the post-1990 version will do at least somewhat better. “

Of course.  I’d love to use the 2003-2009 version for everyone.  But the data won’t allow that.

“if a fielder touches a GB but fails to make the out, does TZ assume that hit was 100% his responsibility?”

Infield singles are counted towards the fielder it’s hit to.  So, yes, letting it go by would improve his rating unless the system knows more about the ball location than TZ does.  Hopefully fielders don’t know this, or fear of being macked as a Hanley will mitigate.

On a ground ball 5 feet from the 3B, TZ is making the wrong choice in docking the player more for an infield single than for an outfield one.  On all infield singles like that, the 3B should have more responsibility for those plays than for the average ball hit between third and short.


#45    Peter Jensen      (see all posts) 2010/05/28 (Fri) @ 11:06

Guy - My understanding of the post 1990 version of TotalZone is that Rally has used the zone information from the 1990’s Retrosheet data to establish the relative rates that 3B and SS field balls that are hit in the zone between them and has used those rates to assign responsibility for the hits in that area, giving the average SS 60% of the singles to left and the average third baseman 40% (from Post #38).  The 60-40 split may or may not be correct, but that is not the main problem here.  For discussion purposes let’s assume that it is. 

Having a fixed split based on average players at both positions works fine if the player that you are trying to evaluate is equally skilled as his adjacent team mates.  But if one player is much better than the other TZ tends to penalize the better player and reward the less skilled as you have correctly pointed out.

However, I think you are in error in saying that the solution can be found in a multiplier that can be applied across the board to the TZ values.  This is actually a variation on the same adjacent fielder problem that I brought up in the UZR thread.  The correct solution is to only compare a fielder with other players at the same position and not with what a combination of average fielders would do. 

For example, take a team of average fielders where there are 1000 ground balls fielded by the SS, 3B, and LF.  TZ knows that 600 are fielded for outs by the average SS and 3B at a 60-40 split, so 360 by the SS and 240 by the 3B, with 400 fielded for singles by the LF.  TZ then assigns 240 of the singles to the SS and 160 to the 3B.  Since all 1000 GBs are hit past the 3B, his conversion rate is 24%.  Since the 3B has successfully fielded his 240 balls, only 760 go past the SS and he successfully fields 360 for a conversion rate of 47.4%.

Rolen’s team has the same 1000 ground balls hit to the LF, SS and 3B, but Rolen fields 300 of the GBs for outs instead of the average 240.  That leaves 700 balls as possible chances for the SS, and since he is average he fields 47.4% or 332.  That leaves 368 hits fielded by the LF.  These get divided 60-40, so 221 to the SS and 147 to the 3B.

The average 3B above had 240 outs and 160 hits, for an out rate of 60% of total chances.  Rolen playing next to an average SS is credited with 447 chances (300 outs and 147 hits).  His expected outs are 268 and expected hits are 179, so he is underrated at +32 plays instead of +60 plays.  An average SS is expected to make 60% of his chances (360 of 600) and that is exactly the rating Rolen’s SS is credited with (332 of 553).

Now take the example where Rolen is playing next to a SS who is also 60 plays above average.  So Rolen still fields 300 GBs for outs, the SS fields 392 instead of 332, leaving 308 hits to be divided up.  The SS gets 60% of 308 or 185 and the 3B gets the remaining 40% or 123.  Rolen now is computed to have made 300 outs in 423 chances.  His expected outs are 254 and his rating is a +46 plays made, much closer to his actual +60. 

A single multiplier will not work.  The actual multiplier is dependent on the ratio of skill between the two adjacent fielders.  The solution is not in multipliers, but in changing the methodology so that 3B are compared only with 3B and SS only with SS.
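
The two scenarios above can be reproduced with a short sketch.  This is only the fixed-split bookkeeping as I have described it in this post, not Rally's actual TZ code; the function and constant names are made up for the illustration:

```python
GB = 1000                                    # ground balls toward 3B / SS / LF
AVG_3B_OUTS, AVG_SS_OUTS = 240, 360          # outs by average fielders
SS_RATE = AVG_SS_OUTS / (GB - AVG_3B_OUTS)   # 47.4% of balls that get past the 3B


def fixed_split_rating(outs_3b, outs_ss, ss_share=0.6, avg_out_rate=0.6):
    """Return (3B plays above average, SS plays above average).

    Hits are everything not converted by the 3B or SS. They are charged
    60% to the SS and 40% to the 3B, and each fielder's chances are his
    outs plus his share of the hits. An average fielder at either spot
    converts 60% of chances defined that way (240/400 and 360/600).
    """
    hits = GB - outs_3b - outs_ss
    chances_3b = outs_3b + (1 - ss_share) * hits
    chances_ss = outs_ss + ss_share * hits
    return (outs_3b - avg_out_rate * chances_3b,
            outs_ss - avg_out_rate * chances_ss)


avg_ss_outs = SS_RATE * (GB - 300)           # average SS behind a 300-out 3B

next_to_avg = fixed_split_rating(300, avg_ss_outs)
next_to_star = fixed_split_rating(300, avg_ss_outs + 60)

print(round(next_to_avg[0]), round(next_to_star[0]))  # 32 46
```

The same +60 third baseman comes out at +32 or +46 depending only on who plays next to him, which is why no single multiplier can undo it.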


#46    Guy      (see all posts) 2010/05/28 (Fri) @ 11:30

Peter:  I’ve never suggested that a multiplier will fix the problem of a fielder’s rating being influenced by the quality of his teammates.  I think I’ve been very clear that this is an additional problem, which is why I favor rating players by comparing actual outs to expected outs, with expected outs determined by the average distribution of opposing hitters. 

However, there is a separate problem, which is that TZ hugely regresses players’ performance, even if their teammates are in fact exactly average (this is especially true for pre-1990 TZ).  I asked Rally to estimate multipliers, just so we could understand the scale of this regression.  If the multiplier is 3X for 3Bman, that doesn’t prove that Mike Schmidt specifically was three times better than his current TZ rating.  But it DOES tell us that should be our working estimate of his performance, perhaps to be modified by other considerations.

*

“I don’t think that’s right, as I’ve stated a few times.  But I can’t prove it.”

But Rally, I HAVE proven it (roughly).  It’s basically a question of mathematics, not data.  If TZ assigns .13 hits to CF for each extra hit allowed, then Willie Mays will get credit for .37 plays for each hit that he turns into an out.  It’s not hard to estimate this. 

“Aging has something to do with that as well.”

Well, Pendleton aged a lot the minute he arrived in Atlanta.

*

In terms of post-1990 TZ (and other PBP metrics),
I hadn’t really thought enough about the implications of assigning full responsibility for balls when a fielder touches it.  That seems like a very problematic approach, which really punishes IFs with good range.  I also wonder how well it works in the OF.  On balls hit in the gap, won’t the faster/better OF often get to the ball first?  And if both have a play, won’t the better arm often pick up the ball?


#47    Peter Jensen      (see all posts) 2010/05/28 (Fri) @ 11:33

Isn’t it problematic to assign 100% responsibility to fielders on hits simply because they touch the ball?

My BZM metric assigns responsibility for a ball only on where it was hit, as determined by the MLBAM hit ball locations.  The chances are calculated by the number of balls hit in his zone minus the number of balls touched by fielders in front of him.  The expected plays are calculated from the average rates of how many outs, outfield hits, infield hits, and errors, times the player’s actual chances.  So in BZM touching a ball and changing an outfield hit ball to an infield single doesn’t add to his chances and is only a plus for a fielder. Just not as big a plus as if he actually converted it to an out.

I have no idea what Brian does as he has not shared the details of his defensive metric with us.  As I discussed in my previous post, if I am correctly interpreting how Rally computes TZ, then it appears that computing a player’s chances by adding outs, infield hits and a percentage of outfield singles leads to problems such as you and I have described.

For the years not having the detailed PBP data that Retrosheet now provides us, I am not sure that we will ever even approach accuracy in evaluating fielding.  Maybe TZ is as close as we can get.  But for the years in which any hit location data is available, the numerical ratings of TZ should be ignored as it is clearly not accurate beyond an ordinal measure.


#48    Rally      (see all posts) 2010/05/28 (Fri) @ 13:09

"But Rally, I HAVE proven it (roughly).”

Proof based on an assumption which would take me too much time to verify.  If it was something like 3X, then the range of performances in old TZ would be 1/3 what we get from something like UZR, and maybe 1/2 of what we get from current TZ.  What I might be able to do is get a standard deviation for guys with 1000 innings in a season.

For now I’ll start with this:

TZ: 2003-2009
TZ_ps (project scoresheet, 1989-1999)
TZold: 1954-1988, 2000-2002

Average TZ of league leader, shortstop:
TZ +16, TZold +17, TZps +18

3rd base
TZ +19, TZold +16, TZps +15

Center field
TZ +19, TZold +18, TZps +21

The cost/benefit to me is not worth it to figure out a way to estimate what this multiplier should be, given the way it varies by the batter/pitcher combo.  The idea of a 3x multiplier has ramifications.

I can buy that the +16 shortstop might really be +21.  I cannot buy that the +17 shortstop really must be +40.  I see no reason to think that the spread of fielding talent was that much greater back then.

Maybe Ozzie was a +300 shortstop instead of +240.  If you argue that Schmidt was much better than +130, maybe he was.  The system is a rough approximation.  But if you tell me the typical +100 fielder was really a +300 fielder, I’m not buying.

To convince me there you’d have to show me either:
1) the spread in fielding talent has significantly, drastically narrowed

or

2) That modern fielding metrics (ignore TZ as Peter says. Look at UZR/DRS.) underestimate the spread of talent to a huge and significant degree.


#49    Guy      (see all posts) 2010/05/28 (Fri) @ 13:59

Rally:  Thinking about this some more, I don’t think you can answer this by comparing the variance of different systems.  I did say I thought TZold would have less variance, but that’s only true for the impact of plays where a hit becomes an out.  I had forgotten that TZnew assigns 100% responsibility on both errors and fielded hits.  That will also have a huge “regressing” effect, pulling good fielders back toward the mean.  So both systems apply massive regression, through somewhat different methods (at the heart of both systems, though, is the central problem that the more plays you field, the larger your defined opportunities).
So even if you found very similar variance in the methods, it wouldn’t say anything about whether it’s correct to apply this massive regression—it would only suggest that TZnew may be just as wrong as TZ old.

The issue that matters—which Tango said many comments ago—is how much does each metric assume that a player’s distribution of BIP differed from the average distribution?  So you should look at some great fielders (under both systems), and tell us how many chances above average they had given the total BIP.  If you tell us that TZ only assigns Ozzie 50-100 more chances than average over his career, then that seems plausible.  But if you tell us TZ estimates he had 500 or 700 more chances than average, well, that’s just not believable.  And the latter is the answer you will get.

You say “I cannot buy that the +17 shortstop really must be +40,” as though this is an issue of judgment or opinion.  But it’s not.  Leaving errors aside, TZold will always credit fielders with about .4 plays for each extra out they make (varies slightly by position).  It really is that simple.  It’s not that Ozzie “might be +500,” or “could be +500”—TZold is telling us that Ozzie MUST be +500 unless he faced a +6 SD ball distribution over his career.

Somehow the burden of proof seems to be placed on those who disagree with the “advanced metrics.” But with large samples, it’s the other way around.  The default assumption should be an average distribution of opportunities.  The burden of proof is on anyone who wants to argue that Ozzie’s 76,000 BIP (or Rolen’s, or Schmidt’s) were extremely atypical.


#50    Peter Jensen      (see all posts) 2010/05/28 (Fri) @ 14:18

However, there is a separate problem, which is that TZ hugely regresses players’ performance, even if their teammates are in fact exactly average

Guy - But it isn’t a separate problem.  As I showed in my example in post 45, a star 3B playing next to an average SS could be rated at a +32 instead of a +60 just because of the way TZ prorates the hits as extra chances between the 2 fielders.  It has nothing to do with any regression; it is purely a fault of the methodology.  And it shares this fault with UZR as I tried to show in the other thread.  It just has much less effect in UZR because of the smaller zones, some of which will hurt a fielder and some of which will help.  But when a disparity of skill exists between two adjacent infielders UZR will always underestimate the skill of the better fielder.

The correct basis for defining what is an opportunity for a fielder has to be determined first.  From that one can determine the expected number of outs.  Trying to fix TZ with a multiplier will cause more problems than it will solve.  You are better off starting from scratch to design a new metric for the pre PBP data.


#51    Tangotiger      (see all posts) 2010/05/28 (Fri) @ 14:24

It goes back to my three points:

1. he faced easier chances than average
2. he gave up more extra base hits than average
3. on some of the balls he made an out, they could have been made by some other fielder

For players with long careers who played on multiple teams, most players will have 1 and 2 as false.

So, it’ll come down to #3: the shared plays.  Be it groundballs that could have been handled by the SS, or popups, or whatnot... ANYTHING where the out is not automatically a “sure hit turned into a sure out”.

The more we talk about it, the more I should simply presume that the standard should be that rather than using 0.80 runs, maybe I should use 0.65 runs, to account for the fact that some outs would have been made anyway.  Maybe pitchers should be 0.50 runs because there’s a decent chance that one of the other infielders would have made the play.
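
To see how much the choice of run value moves Rolen's career number, here is a quick sketch using the career totals quoted in #35 and the .089 average rate from earlier in the thread.  The helper name is just for illustration:

```python
BIP = 42_917        # Rolen's career balls in play, as quoted in #35
OUTS = 4_197        # outs he made on them
AVG_RATE = 0.089    # average 3B outs per BIP, as quoted earlier in the thread


def career_runs(runs_per_play):
    """Career runs above an average 3B for a given run value per extra play."""
    extra_plays = OUTS - AVG_RATE * BIP   # about +377 plays
    return extra_plays * runs_per_play


for value in (0.80, 0.70, 0.65, 0.60):
    print(value, round(career_runs(value)))
```

At 0.80 that is about +300 runs, and at 0.65 it is closer to +245, so the shared-play discount matters but does not close the gap to the other systems on its own.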

But yes, going back to Guy’s point, as a sanity check, you have to at least ascertain if #1 and #2 are true or false.  If we all agree that Rolen faced a typical distribution and allowed a typical rate of extrabase hits per basehit, then it comes down to #3.


#52    Guy      (see all posts) 2010/05/28 (Fri) @ 14:41

"It has nothing to do with any regression; it is purely a fault of the methodology.”

Peter:  We are saying the same thing.  I was using “regression” as a shorthand—what TZ really does is share blame/responsibility with other players, because it doesn’t know who actually allowed any given hit.  Plus, it assumes that extra outs = extra opportunities.  These two factors both push TZ toward the mean.  And this particular effect will be apparent even when teammates are average fielders.  So the two issues are related, but distinct....

And thanks for the clarification earlier on BZM.


#53    Brian Cartwright      (see all posts) 2010/05/28 (Fri) @ 14:47

Peter, I calculated bunts separately from swing away, but the report I was quoting has combined totals.

I’m sorry you think I haven’t described my process. I haven’t done a full blown article, but last year I did a couple of articles on infield and outfield (Defending Manny, Jeter Conspiracy) and I responded to you here a month or so ago.

My assignment of groundball hits is just about what is described in #45, except that I use WOWY to estimate the 3B/SS split pct for individual fielders, instead of using the same split for everyone.


#54    Rally      (see all posts) 2010/05/28 (Fri) @ 15:07

"Trying to fix TZ with a multiplier will cause more problems than it will solve.  You are better off starting from scratch to design a new metric for the pre PBP data.”

This I can agree with.  And I wish great luck to the unfortunate soul who decides to undertake the task.

“So even if you found very similar variance in the methods, it wouldn’t say anything about whether it’s correct to apply this massive regression—it would only suggest that TZnew may be just as wrong as TZ old.”

I can agree with this too.  Way too much work to try and find how much TZold understates the great fielders, so if the variance is similar, I’d rather focus on TZnew.  I can say that if I find the time to work on things I’d much rather try to incorporate gameday hit location into current TZ than try another approach to cover for the missing data in the past.

“As I showed in my example in post 45 a star 3B playing next to an average SS could be rated at a +32 instead of a +60 just because of the way TZ calculates prorates the hits as extra chances between the 2 fielders.”

In your example, your star 3B is saving runs only by making extra plays in the 3b-ss hole.  I could construct an example where the 3B saves 60 runs from doubles hit down the line, errors, and infield singles, and would be getting 100% credit for his efforts. 

A 3B who saves a proportional amount of each hit type (based on league averages) will get credit for 88% of his run prevention - 44 plays out of 50.


#55    Guy      (see all posts) 2010/05/28 (Fri) @ 15:46

"And I wish great luck to the unfortunate soul who decides to undertake the task.”
Would it actually be that hard for Fangraphs (not you!) to calculate projected outs based simply on opposing batters?  Haven’t you already done the hard work of determining every hitter’s ratio of outs by position? 

*

For TZnew, I still think it would be a useful exercise to estimate how “lucky” TZ thinks great fielders are.  And how unlucky it assumes poor fielders are.  I think you will be surprised by how much implicit regression there is, even in TZnew.

*

“A 3B who saves a proportional amount of each hit type (based on league averages) will get credit for 88% of his run prevention”

This is only true IF you accept the assumption that every IF hit and error represented an extra fielding opportunity.  Let’s say we have this typical distribution for a 3B:
Out 70
Hit-LF 20
IF hit 5
Error 5
Now we have a Schmidt-type player, with good range but he makes errors.  With same ball distribution, let’s say he makes 1 additional error, saves 4 hits (1 IF, 3 LF), and makes 73 outs.  Plus, he knocks down one ball that usually would go to LF (but no out).  This 3B is +3 plays, but TZnew will only call him +1.9, or .63 plays for each extra out made.  (Giving him credit for XBH will improve things a little, not sure how you do that.)


#56    Rally      (see all posts) 2010/05/28 (Fri) @ 16:52

Your example is assuming 20 hits to LF that are charged 0.6 runs to the 3B.  This is only the case for singles.  For doubles to LF, they are assumed to be down the line, and charged 100% to the third baseman.  Assuming his runs saved are from a proportional mix of events, and that on the team level we have the exact same number of opportunities (though the 3B estimated opps will change) I’m still going to get him saving 88% of the actual runs.

We can spend all day here coming up with variations on that (what if he makes x more errors but saves y hits) and get an infinite number of possibilities.  What a waste of time.  I gave you what the average should be.  Yes, it could be more or less than that depending on the 3B particular strength.  But we can come up with a virtually infinite number of possibilities by changing the parameters!

And that is with a method that treats a ground ball off any pitcher/batter the same!  J.H.C., and you tried to get me to commit to one number on the old TZ where the credit changes every batter?


#57    Peter Jensen      (see all posts) 2010/05/28 (Fri) @ 16:57

But yes, going back to Guy’s point, as a sanity check, you have to at least ascertain if #1 and #2 are true or false.  If we all agree that Rolen faced a typical distribution and allowed a typical rate of extrabase hits per basehit, then it comes down to #3.

Tango - Not necessarily.  It may have a lot to do with the criteria that you used for your WOWY.  You looked at Total Outs/BIP.  You have Rolen at 4197 when the average 3B would have had 3819 for the same number of BIP.  That makes Rolen +378 plays.  But that is all plays, pop ups and line drives included.  Which is certainly relevant to Rolen’s skill, but not comparable to other fielding metrics that only are looking at ground balls.  I looked at assists per inning and compared to the average 3B and found Rolen to be +278 assists.  At .8 runs per assist that’s about +222 runs.  Much closer to the 147 runs Rally’s TotalZone has for his career. And we know TZ even in the modern version is going to compress the value of a star player like Rolen.  Rolen has 15496 defensive innings, which translate to 11.4 150-game seasons.  That would make Rolen about +19.4 runs per 150 games, figured by assists.  That is back to a “believable” number but is still likely higher than his value actually has been.
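
The arithmetic above, spelled out as a short sketch.  The only thing added is the assumption of nine defensive innings per game when converting innings to 150-game seasons; unrounded, that gives about 11.5 seasons rather than 11.4, and the per-150 figure still lands at 19.4:

```python
wowy_plays = 4197 - 3819          # all outs vs. an average 3B on the same BIP

plus_assists = 278                # Rolen's assists above an average 3B
runs_per_assist = 0.8
innings = 15_496
innings_per_150 = 150 * 9         # assumes nine defensive innings per game

runs = plus_assists * runs_per_assist   # about +222 runs
seasons = innings / innings_per_150     # about 11.5 150-game seasons
per_150 = runs / seasons                # runs per 150 games, by assists

print(wowy_plays, round(runs), round(per_150, 1))  # 378 222 19.4
```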


#58    Guy      (see all posts) 2010/05/28 (Fri) @ 17:25

I’m not saying that some odd combination of factors could cause TZ to underestimate a 3B.  Of course that’s true for any metric.  I’m saying that even new TZ will SUBSTANTIALLY underestimate the contributions of good fielders (on average), by far more than the 12% you are suggesting.  If you want to convince us otherwise, just tell us how many “chances” above average you allocate to Rolen and some other great fielders. 

On TZold, estimating a multiplier is extremely easy.  Every fielder will face an average array of opposing hitters overall.  There’s nothing hard about it at all.  I’m going to assume that our best estimate for pre-1990 players is about 2.5*TZold, unless/until someone can provide a more precise estimate.


#59    Rally      (see all posts) 2010/05/28 (Fri) @ 21:33

I did not write post #58, just for the record.


#60    Tangotiger      (see all posts) 2010/05/28 (Fri) @ 22:41

Peter, good point about the lineouts.


#61    Guy      (see all posts) 2010/05/28 (Fri) @ 23:34

Doh.  #58 is mine, obviously, not Rally’s.  Tango, can you fix?


#62    Chris Dial      (see all posts) 2010/05/29 (Sat) @ 14:18

Peter, good point about the lineouts.

Tango, that’s *exactly* what I was saying in #30 and #33 (that’s a joke line from the movie “Office Space”).  Since most other metrics only include GBs, and I CLAIM it is pretty well established that the best fielder gets the shared outs, and that’s always Rolen, whereas for most 3B it is the SS.

So Rolen is +160 on +200 GB plays.  How many +GB plays does WOWY have him at?  And why are bunts excluded?  That doesn’t seem right.


#63    Chris Dial      (see all posts) 2010/05/29 (Sat) @ 14:20

Guy,
RHH do not distribute BIP like that very often.  You cannot assume that a pct of RHHs result in a % of 3B plays.  This is known as “The Chipper Jones Fallacy”.

The OF scatter plots that MGL and Max did before should show some of that.


#64    Matthew Cornwell      (see all posts) 2010/05/29 (Sat) @ 14:25

If the data is right, then it’s right.  But I wonder out of curiosity how the perception of this quickly-being-accepted metric will be when the Rolens and Ozzies of the world end up with more WAR than Jimmie Foxx and other “legends”.  Isn’t that the expected result if they are being hugely shortchanged defensively?


#65    Rally      (see all posts) 2010/05/29 (Sat) @ 14:43

"But I wonder out of curiosity how the perception of this quickly-being-accepted metric will be when the Rolens and Ozzies of the world end up with more WAR than Jimmie Foxx and other “legends”.”

I would expect acceptance of the metric would come to a complete halt, it would be ridiculed.  Probably as it is, putting Ozzie Smith slightly ahead of Mark McGwire, would be the object of much ridicule if McGwire were still a hero and not a pariah.  But that doesn’t matter a bit to me if it were in fact true.  I’m just a long way from believing it.

To me it comes down to the spread of defensive talent.  If a typical league leader at a position should be in the +20 range (with a few legendary +30-+40 seasons now and then), as UZR shows, then TotalZone has the right spread.  If the typical league leader should be a +35 to +50 player, and a legend can be +70 or something, then TZ and UZR are wrong.  And Ozzie Smith was more on the level of Ripken/A-Rod than Trammell/Larkin.


#66    Tangotiger      (see all posts) 2010/05/29 (Sat) @ 17:20

Ozzie is 5 years older than Tim Raines.  When Raines hit free agency entering 1987, he signed for 5 million for 3 years under collusion (and likely might have hit 6 million otherwise).  Ozzie made 6.5MM$ for those same 3 years. And even when he was 39 years old, as a free agent entering 1993, he increased his salary yet again.

Ozzie was extremely highly regarded by the fans and MLB and paid as such.  He was voted overwhelmingly into the HOF in his first year.


#67    Matthew Cornwell      (see all posts) 2010/05/29 (Sat) @ 17:24

True, but I don’t think even the most avid Cardinal fans would put Ozzie in the top 20-30 players all-time.  Well, assuming at least moderate knowledge of baseball history.


#68    Tangotiger      (see all posts) 2010/05/29 (Sat) @ 17:25

"Since most other metrics only include GBs, and I CLAIM it is pretty well established that the best fielder gets the shared outs, and that’s always Rolen, whereas for most 3B it is the SS.”

I don’t think of lineouts as “shared outs”, certainly not the discretionary outs you are talking about here.  The popups, yes.  But lineouts, no.  I see no reason to exclude lineouts.

As for bunts, well, that’s not to say there’s no fielding talent in bunts.  Just something to take care of separately.

Anyway, let me try to figure out how many outs Rolen’s SS make.  If they make fewer than with other 3B, then there’s a shared-BIP issue to consider.


#69    Brian Cartwright      (see all posts) 2010/05/29 (Sat) @ 18:37

Peter 42 - yes, I goofed on the arithmetic. I have Rolen +43.7 on ‘hands’, -6.6 on ‘range’ for a total +37.1. I also have DP stats not listed.


#70    Guy      (see all posts) 2010/05/30 (Sun) @ 07:46

"I wonder out of curiosity how the perception of this quickly-being-accepted metric will be when the Rolens and Ozzies of the world end up with more WAR than Jimmie Foxx and other “legends”.

I think this overstates how much the ratings would be shaken up if we discovered TZ was indeed underestimating great fielders.  Yes, if you give Ozzie another 30 wins he probably does become one of the top 30-35 players.  But is it impossible to imagine that the greatest fielder ever at the most important defensive position belongs there?  (And he wouldn’t become Ripken’s equal, as Ripken too would likely get a boost). 

And Ozzie was a freak, the most extreme case.  Let’s say Rolen deserves another 10-15 WAR, and is as valuable as Chipper.  Why is that implausible?  If Willie Mays ends up close to Ruth and Bonds, would that shock anyone?  If we decide Gary Pettis earned 25 WAR instead of 18, will our understanding of the baseball universe be shattered?  I think we’d survive…

And looking at the single-season leaders isn’t going to settle this.  For one thing, the binomial variance and measurement error in these metrics are both so high at the single-season level that even a fairly large difference in implied talent variance will be hard to see.  More importantly, it misses the key point being made in this discussion, which is that careers and single seasons are fundamentally different. At the season level, assuming each play made implies more opportunities may give you more accurate results.  But at the career level, the assumption is far less justified.


#71    Matthew Cornwell      (see all posts) 2010/05/30 (Sun) @ 09:50

It’s just funny, because about 7-8 years ago, many in the sabermetric community were claiming Ozzie might not even be HOF worthy.  I will say that the discovery of the actual impact and run value of defense is one of sabermetrics’ most valuable “discoveries” of the past 10 years.  Right up there with the quantitative value of positional differences and the fact that pitchers have less control over BIP than we originally thought.  Those were the three that changed my perceptions of the game the most.


#72    Peter Jensen      (see all posts) 2010/06/01 (Tue) @ 11:10

Tango - What query did you use to get the 4197 outs for Rolen for his career?  In other words, how did you define converting a BIP into an out?


#73    Tangotiger      (see all posts) 2010/06/01 (Tue) @ 11:58

Peter, the responsible fielder is the first fielder who touches the ball.  And if there’s an out on a ball in play, then that’s an out.

In other words, all of these get credited to the 3B:

5-4-3
5-6
5-3
5
5-2
5-2-6-1-6

I understand that we can look at some of those plays (especially the rundown play) and not like some of those choices.

And, if there were some of those kinds of plays that occurred in a large enough frequency where they could disproportionately affect a player, I’d think about it some more.

The key point is that if things happen often enough in a proportionate amount, then everything cancels out, and we are left with the one fixed variable, Rolen.

Someone can fairly charge that Rolen’s SS are squeezed out of some easy plays a disproportionate number of times (say infield flies).  That’s up to me to show if there is that bias.

And maybe Rolen’s 2B get a lot of 5-4 plays because they are always in position to get those plays, etc.  Again, up to me to show bias or not.
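The crediting rule in #73 (the first fielder to touch the ball owns the play) can be sketched in Python.  This assumes plays are written as dash-separated position numbers, as in the list in that comment; the function name is hypothetical, and this is not the query Tango actually ran.

```python
def responsible_fielder(play: str) -> int:
    """Return the position number of the first fielder to touch the ball.

    Assumes a dash-separated fielding sequence such as "5-4-3".
    """
    return int(play.split("-")[0])


# The example plays listed in #73; all start with the third baseman (5).
plays = ["5-4-3", "5-6", "5-3", "5", "5-2", "5-2-6-1-6"]
credited_to_3b = [p for p in plays if responsible_fielder(p) == 5]

print(credited_to_3b)
```

Every play in the list is credited to the 3B under this rule, including the rundown, which is the choice the comment acknowledges one might not like.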


#74    Peter Jensen      (see all posts) 2010/06/01 (Tue) @ 12:28

Tango - Are bunts included?  Is Rolen credited with two outs on a double play or just one?


#75    Tangotiger      (see all posts) 2010/06/01 (Tue) @ 13:36

My exclusions: bunts, HR over the fence, pitchers-as-batters, nonpitchers-as-pitchers.

Outs: an out play counts as 1 out (yes/no).
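A minimal Python sketch of how the exclusions and the one-out-per-play rule in #75 might be applied.  The record layout, field names, and the five sample records are all made up for illustration; they are not Retrosheet fields or real plays.

```python
from dataclasses import dataclass


@dataclass
class BallInPlay:
    # All field names here are hypothetical, chosen for this sketch.
    first_fielder: int            # position number of first fielder to touch the ball
    outs_on_play: int             # outs actually recorded on the play
    is_bunt: bool = False
    is_hr_over_fence: bool = False
    batter_is_pitcher: bool = False
    pitcher_is_nonpitcher: bool = False


def is_counted(bip: BallInPlay) -> bool:
    """Apply the four exclusions listed in #75."""
    return not (bip.is_bunt or bip.is_hr_over_fence
                or bip.batter_is_pitcher or bip.pitcher_is_nonpitcher)


def out_rate(bips: list, position: int) -> float:
    """Out plays credited to `position` per counted BIP (yes/no per play)."""
    counted = [b for b in bips if is_counted(b)]
    if not counted:
        return 0.0
    out_plays = sum(1 for b in counted
                    if b.first_fielder == position and b.outs_on_play > 0)
    return out_plays / len(counted)


# Made-up sample: a double play still counts as a single out play.
sample = [
    BallInPlay(5, 2),                 # e.g. 5-4-3
    BallInPlay(5, 1),                 # e.g. 5-3
    BallInPlay(7, 0),                 # hit fielded by the LF
    BallInPlay(5, 1, is_bunt=True),   # excluded as a bunt
    BallInPlay(6, 1),                 # out made by the SS
]
print(out_rate(sample, 5))
```

The denominator is all counted BIP, not just balls near the fielder, which is what makes the rate comparable to the 9% and 10% figures in the original post.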


#76    Chris Dial      (see all posts) 2010/06/01 (Tue) @ 15:13

Tango,
my data includes line drives.  I don’t see a reason to exclude them either.  Plus I have demonstrated they don’t make a difference.  I do mean popups.  Bunts should be included.  Mostly, I’d be concerned when a bunt wasn’t properly noted.

There’s also foul territory based on his home park.  I am more concerned about his popup taking behind 3B than anything else.


#77    Tangotiger      (see all posts) 2010/06/01 (Tue) @ 15:29

Chris:

Just to be sure: are you suggesting a possible bias that is limited to home park in terms of fielding foul balls?  That is, a 3B, being more familiar with his home park, is more likely to field those foul balls at home than on the road?

As for bunts: if the classification of the BIP would be considered “objective” as bunt / not bunt, would you still have an objection?  Would you agree that we can separate (handle separately anyway) the sac bunts from the other BIP?

I agree that the farther back we go in time, the classification of a BIP that should be a bunt but is not would upset my claim that I am using “objective” data (other than SH).


#78    Chris Dial      (see all posts) 2010/06/01 (Tue) @ 15:36

Re: fielding foul balls - mostly that someone has a home park that allows more foul balls to be caught (like Oakland, historically).  I couldn’t tell you about Rolen’s home parks, but we should at least categorize them.

I didn’t mean “objective/not objective” as much as “wrong”.  Which I guess is an objective aspect.

I am really looking for this: in my data of 3800 plays, Rolen has 3600 GBs (including bunts) and 200 LDs.  In your data (4200), he has 3500 GBs, 200 LDs, 500 popups (some fair, some foul).

Some breakdown like that to see if the difference is “obvious”.


#79    Rally      (see all posts) 2010/06/01 (Tue) @ 16:10

If at all possible, popups should be excluded from the analysis.  Popups in fair territory are caught something like 99% of the time.  It doesn’t matter if the 3B makes more of them, somebody is going to catch it.  For foul popups you might have more skill come into play, a quick 3B might catch some foul pops that others won’t get to.  But the opportunities are not in your dataset, a foul pop that lands behind third base isn’t a BIP unless it’s caught.  And as Chris mentions, the parks and amount of foul space will make a big difference.

For Rolen’s career I think retrosheet has the bbtype coded, at least for the outs (hits since 2003).  So excluding popups should not be difficult. 

Line drives are an area where good fielders might be saving lots of runs we don’t know about.  You don’t know opportunities very well.  Retrosheet can tell you it was a line drive caught by the 3B or a line drive hit to left field.  But you don’t know if it was just missed by the 3B or 30 feet over his head.  Gameday data won’t tell you that either, though at least if you work out the angles you know if it was hit 5 feet or 20 feet from the typical 3B position (horizontal, not vertical).  Zone rating counts the line drives in the numerator and denominator, but that doesn’t change the ratings much, just allows those who catch a lot of liners to bump their ratings up a bit.


#80    Chris Dial      (see all posts) 2010/06/01 (Tue) @ 16:32

Line drives are an area where good fielders might be saving lots of runs we don’t know about. 

I don’t think so.  You did a bunch of the research on this (whether inadvertently or not), but IIRC, no player stood apart in the catching of linedrives, nor in the not catching (I think there were one or two now that I type this); mostly, people hovered around average.  No, it doesn’t ID opportunities, but going by LDs caught per inning (and assuming pitchers don’t prevent LDs), it stands to reason that no fielder is particularly good at them or bad at them over a large sample of innings.


#81    Peter Jensen      (see all posts) 2010/06/01 (Tue) @ 16:58

Rally and Chris - I looked at Rolen from 2000-2009.  He was +9.6 on Pop Ups and +7.2 on Line Drives in runs for the 10 years with no controls for park.  About 8% of the value that I calculated he had on GBs over the same period using the same methodology (roughly Tango’s Outs/BIP).


#82    Tangotiger      (see all posts) 2010/06/01 (Tue) @ 17:06

Peter, good job.  What was the frequency of pops, lines, ground, relative to other 3B?

***

Rally: insofar as I’m concerned, my decision to lump pops, liners and grounders together is because of the inherent subjectiveness of the classification, which may lead to bias.  As I’ve noted in the past, catching something called a line drive is worth a lot more than catching something called a popup or something called a groundball, and I’d be placing trust in the opinion of the scorer.

Separating out bunts is less problematic in my view because all SH are by definition bunts, and that leaves just the subjective non-SH called as bunts.  But, I have a hard time believing that there’s much subjectiveness in whether a batter bunted or not.


#83          (see all posts) 2010/06/01 (Tue) @ 17:15

Peter,
can we just get the counts?  If we differ on the number of GBs we have something else going on. 

Tango, wrt bunts, I just mean they didn’t write it down correctly, not that they couldn’t tell.


#84    Peter Jensen      (see all posts) 2010/06/01 (Tue) @ 17:43

Tango - I did the calculations yesterday before I had your descriptions that I asked for today.  I calculated lg avgs for 3Bs in all categories to give expected rates and then multiplied by Rolen’s BIP to give expected plays made in each category and then subtracted Rolen’s actual numbers.  I multiplied the plays made differentials by .8 runs for the pop ups and LDs and .71 runs for the GBs. The .71 is the lg avg run value for a GB that the 3B fails to field for an out including the run value of those balls fielded for an out by the SS.

I am going to redo with your new descriptions and Rolen’s 1990s years added.  What I am trying to get at is whether your Outs/BIP is a good estimator of a player’s actual chances or whether it is biased by the quality of the player.  I think that your methodology may solve some of the problems that TotalZone and UZR have with the shared fielders, but I want to make sure that it doesn’t introduce too much variability due to varying hit ball distributions.

You shouldn’t have a problem with any bias on hit types.  The way you are doing things here catching a line drive is worth exactly the same as catching a pop up.  Missing either one is still one fewer play made and they each use the same run multiplier.

If you really want to get an idea of the maximum runs that a superior 3B can save I would think you would want to include bunts and DPs.
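The method described in #84 (league-average rate times the player’s BIP gives expected plays, the difference from actual plays is then multiplied by a run value) can be sketched in Python.  The function names are hypothetical; the run values are the ones quoted in #84, and the worked numbers are the per-year figures from the original post (4000 BIP, 9% league rate, 400 plays made, .8 runs per play).

```python
# Run values per play as quoted in #84.
RUN_VALUE = {"GB": 0.71, "PU": 0.8, "LD": 0.8}


def plays_above_average(actual_plays: float, bip: float, league_rate: float) -> float:
    """Actual plays minus the plays an average fielder makes on the same BIP."""
    expected_plays = league_rate * bip
    return actual_plays - expected_plays


def runs_saved(plus_plays: float, runs_per_play: float) -> float:
    """Convert plays above average into runs."""
    return plus_plays * runs_per_play


# Per-year example from the original post: 400 plays vs. 9% of 4000 BIP.
plus = plays_above_average(actual_plays=400, bip=4000, league_rate=0.09)
runs = runs_saved(plus, 0.8)
print(f"{plus:+.0f} plays, {runs:+.0f} runs")
```

In #84 the same calculation is done per batted-ball category, with the category’s own league rate and the matching entry in `RUN_VALUE`.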


#85    Peter Jensen      (see all posts) 2010/06/02 (Wed) @ 09:21

Tango - There seems to be a problem with how BEVENT parses the Event Description 5/FL.  I interpret that as a line drive that the third baseman caught in foul territory.  BEVENT considers it a fly ball in the Batted Ball Type in field 47.


#86    Tangotiger      (see all posts) 2010/06/02 (Wed) @ 09:31

Tango, wrt bunts, I just mean they didn’t write it down correctly, not that they couldn’t tell.

Well, that’s a fantastic point.  I’ll have to think about it.


#87    Tangotiger      (see all posts) 2010/06/02 (Wed) @ 09:39

Peter/85: Point out the specific game_id and event_id (first and last field in the record), so someone else here can verify what CWEVENT gives us.

***

As a side note, and this applies to any BEVENT user: CWEVENT does everything BEVENT does… plus more.  It has 52 extra fields, of which a decent number are fields that I proposed to Ted that are very useful.  Ted put in a bunch of other useful fields.  Put simply, CWEVENT rocks.

If you go to the EVENTS table, you will see all the extra fields from CWEVENT (start after event_id):
http://www.tangotiger.net/wiki/index.php?title=Retrosheet_Database#RETRODUMP_schema


#88    Peter Jensen      (see all posts) 2010/06/02 (Wed) @ 10:01

ANA199707040, Event 21. Last play of the third inning.  It is happening for any play that has the Event Description of ?/FL*, which is over a hundred plays a year in the 1990s and over 1400 plays a year in the 2000s.  The foul flag field is correctly interpreted as “true”.  You still may want to point out the problem to the Retrosheet users group for those who still use BEVENT.


#89    Peter Jensen      (see all posts) 2010/06/02 (Wed) @ 10:20

Actually it’s only about 1000 a year that are misclassified in the 2000 to 2009 years since all the outfield ?/FL balls are probably fly balls anyway.  And FL signifies any ball caught in foul territory, so most of the infielder caught balls are probably pop ups and not line drives as I had first thought.
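A rough Python sketch of the reclassification discussed in #85 through #89: find events carrying the FL (foul) modifier and label them by who caught the ball.  It assumes a simplified reading of Retrosheet event text (basic play, then slash-separated modifiers, then runner advances after a period) and follows Peter’s guess that outfielder catches are fly balls and other catches are pop ups.  The function names are hypothetical, and this is not how BEVENT or CWEVENT work internally.

```python
def parse_event(event_text: str):
    """Split event text into the basic play and its list of modifiers."""
    play_part = event_text.split(".")[0]      # drop runner advances, if any
    basic, *modifiers = play_part.split("/")
    return basic, modifiers


def caught_foul_type(event_text: str):
    """Label a ball caught in foul territory, or return None if not one.

    Assumption from #89: outfielders (7-9) catch fly balls; catches by
    anyone else are treated as pop ups.
    """
    basic, modifiers = parse_event(event_text)
    if "FL" not in modifiers or not basic[:1].isdigit():
        return None
    fielder = int(basic[0])
    return "fly ball" if fielder >= 7 else "pop up"


for text in ["5/FL", "7/FL", "S7/L"]:
    print(text, "->", caught_foul_type(text))
```

A real implementation would need the full event grammar; the point here is only that the foul flag and the catching fielder are enough to apply the rule of thumb from the thread.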


#90    Rally      (see all posts) 2010/06/02 (Wed) @ 11:39

I’m not sure I follow.  The /FL plays are caught in foul territory.  BEVENT is correctly marking them as foul.  Is the problem that they are counted as flyballs when they should be popups or linedrives?

I would think that the only player who could catch a line drive in foul territory would be the 3B or 1B guarding the line.  If any other fielder has enough time to run to foul territory and make a catch, it’s not a line drive.  So the issue is popups vs flyballs, unless I’m not getting it.


#91    Peter Jensen      (see all posts) 2010/06/02 (Wed) @ 13:48

Rally - Yes, as I said in my post #89.

Tango - I was able to replicate your total plays made by Rolen exactly, 4197, but for some reason was a little higher on the total BIP at 42953 instead of your 42917.  My league conversion rate (Outs/BIP) was 8.87%.  Rolen was a plus 389 plays total.  With all the FL plays placed in the pop ups Rolen’s 389 broke down to +248 on GB, +106 on PU, and +34 on LD.  The +106 on pop ups is a bit suspect since one would need to do a WOWY on the parks for 3B pop ups to control for the different foul areas.  The +248 on GBs translates to +176 runs at .71 runs per ground ball.  That is much closer to Rally’s 147. 

What is unbelievable is how superior Brandon Inge’s numbers are to any other third baseman of this era (1996-2009) on the basis of total runs saved per inning played.  He is second to Rolen with 310 plays saved, but in fewer than half of Rolen’s innings.
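The totals in #91 can be roughly reproduced in Python from the figures quoted there.  Because the league rate is given only as a rounded 8.87%, the total comes out a couple of plays below the +389 reported; the names are illustrative.

```python
# Figures quoted in #91.
ROLEN_PLAYS = 4197
ROLEN_BIP = 42953
LEAGUE_RATE = 0.0887            # rounded in the comment
GB_RUN_VALUE = 0.71

expected_plays = LEAGUE_RATE * ROLEN_BIP
plus_total = ROLEN_PLAYS - expected_plays   # close to the +389 reported

# Breakdown by batted-ball type as reported in #91.
breakdown = {"GB": 248, "PU": 106, "LD": 34}
gb_runs = breakdown["GB"] * GB_RUN_VALUE

print(f"expected plays: {expected_plays:.0f}")
print(f"plays above average: {plus_total:+.0f}")
print(f"sum of breakdown: {sum(breakdown.values())}")
print(f"GB runs: {gb_runs:+.0f}")
```

The three categories sum to 388 rather than 389, which looks like rounding in the individual components.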


#92    Tangotiger      (see all posts) 2010/06/02 (Wed) @ 14:02

Right, Inge is off the charts.

Can you show Rolen’s BIP and out rate, along these lines:

GB
LD
PU
FB

And then show the same numbers for all 3B.

Good job…


#93    Peter Jensen      (see all posts) 2010/06/02 (Wed) @ 14:11

Tango - I don’t calculate individual player’s out rates as I explained in Post #84.


#94    Guy      (see all posts) 2010/06/02 (Wed) @ 14:17

Peter:  nice work.  One small point:  I believe Rally’s +147 includes the value of XBHs Rolen prevented/allowed to LF.  Presumably that includes some of the +34 you find for LDs.  So on GBs alone, it looks like there’s still a 40-50 run difference between TZ and your/Tango’s analysis. 

The critical question is how much of the +106 plays on PUs is real, if any.


#95    Rally      (see all posts) 2010/06/02 (Wed) @ 15:30

The line drives are not considered part of TotalZone.  The extra base hits charged to 3B are only groundball extra base hits.  Unless you think that some of the line drives Rolen snags would have been called groundballs had they not been caught, which could be possible.

If you use .80 runs per play for 3B (about what Chris Dial uses for ZR), then it’s a 50 run difference.
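As a quick check of that gap, a Python sketch using the +248 ground-ball plays from #91 and TotalZone’s +147 runs; only the variable names are invented.

```python
GB_PLUS_PLAYS = 248     # Rolen's ground-ball plays above average, from #91
TOTALZONE_RUNS = 147    # Rally's career TotalZone figure for Rolen

for runs_per_play in (0.71, 0.80):
    wowy_runs = GB_PLUS_PLAYS * runs_per_play
    gap = wowy_runs - TOTALZONE_RUNS
    print(f"{runs_per_play:.2f} runs/play: {wowy_runs:.0f} runs, gap {gap:.0f}")
```

At .80 runs per play the gap is about 51 runs; at Peter’s .71 it is about 29.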

On the popups, I suspect Rolen gets more of the elective ones just by being 6’4.  I remember the description when Troy Glaus played alongside David Eckstein, Glaus “outrebounded” the little guy.  Do you have the numbers for all the 3B?  If height is a factor I’d expect Figgins to be fairly low on the popup rating. 

The popup rating, as a measure of defensive performance, is only relevant if the guy not getting them is allowing popups to fall in for hits, or in the case of fouls, prolonged at bats.  If I remember correctly 99% of popups are caught.

Peter, you are just looking at player vs. league average rates, right?  Not controlling for mix of batters or pitchers?


#96    Guy      (see all posts) 2010/06/02 (Wed) @ 16:01

Sorry, Rally, my bad.  Thought you looked at all XBH.

*

It would be great if Tango and/or Peter could provide a similar breakdown for a couple of SSs.  Maybe Vizquel and Sanchez, who I think are the top two post-1990 SSs in TZ.  I believe TZ likes Vizquel much more than WOWY does, while Sanchez is the reverse.


#97    Peter Jensen      (see all posts) 2010/06/02 (Wed) @ 17:31

Peter, you are just looking at player vs. league average rates, right?  Not controlling for mix of batters or pitchers?

Yes, just league average rates.  Tango’s analysis in the original post had controlled for batters and pitchers and had found them average.  Although I did all 3Bs while I was at it I acknowledge that only Rolen’s stats have any controls over the other factors.

.71 is the correct run value to use for the ground balls to the 3B.  This run value takes into account both extra base hits and outs that the SS gets on GB hit past the 3B in the 56 hole.

Guy - Yes, you definitely need to adjust the pop ups for park.  And as Rally points out the run value needs to be adjusted to account for the balls missed that fall foul.  The value of a PA is .12 and the value of a pop out is about -.26 or -.27 I believe.  So the value of a +pop out couldn’t be any lower than +.35 or so even after adding in some value for an extra strike.  But seeing Rolen and Beltre at the top of the list for Plus PUs makes me think that there is some skill involved.  Mike Lowell and Ryan Zimmerman who are below average in GB runs saved are number 3 and 4 in PU plays saved with 82 and 66.  But any meaningful comparisons await some analysis of park effects for PU.
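One reading of the pop-up arithmetic in #97, sketched in Python: a caught foul pop turns a plate appearance worth about .12 runs into an out worth about -.26 or -.27, so the swing is the difference between the two before any allowance for the extra strike on a missed ball.  The function name and this interpretation are assumptions; the comment does not spell out the steps.

```python
PA_VALUE = 0.12                  # run value of a plate appearance, from #97
POP_OUT_VALUES = (-0.26, -0.27)  # run value of a pop out, from #97


def swing_before_strike_adjustment(pop_out_value: float,
                                   pa_value: float = PA_VALUE) -> float:
    """Run difference between the PA continuing and the pop out being made."""
    return pa_value - pop_out_value


swings = [round(swing_before_strike_adjustment(v), 2) for v in POP_OUT_VALUES]
print(swings)
```

Under this reading the swing is .38 to .39 runs, and the floor of about +.35 quoted in the comment would leave a few hundredths of a run for the value of the extra strike.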


#98    Guy      (see all posts) 2010/06/02 (Wed) @ 17:50

I’m more concerned about the “ballhog” factor than park effects.  There may be 1-2 extreme parks that cause a problem, but the differences in foul territory don’t seem that big these days for the most part.  My worry is 3B who take a lot of PUs away from SS, P, C, and/or LF (as I think was true for Orlando Hudson at 2B, at least in some seasons). 

If these are literally 99% outs, then whatever small talent Beltre, Rolen etc. might have doesn’t seem like enough to be worth introducing such a big potential bias.  On the other hand, if Beltre and Rolen are getting more of these, then maybe we have a scoring problem where shallow FBs caught by an IF get labeled a “PU” while those that drop or are caught by an OF get labeled “FB?” Is that possible?


#99    Rally      (see all posts) 2010/06/03 (Thu) @ 10:18

Mike Lowell #3 in popups seems strange.  There are a few reasons a 3B can get more popups than others:

1) He’s quick, and can get to foul popups down the left field line that would just be foul balls for other 3B.  This obviously does not describe Lowell.
2) The ballpark has acres of foul territory, like Oakland.  Fenway park, not so much.
3) He plays behind pitchers who get a lot of popups.
4) He takes a larger than normal share of popups that catcher or SS could also get.

Maybe Lowell fits #3 or #4.


#100    Tangotiger      (see all posts) 2010/06/03 (Thu) @ 10:41

WOWY controls for the park, as we know.  For the 12 3B with the most BIP, the guy who benefitted the most from his park was Eric Chavez (no surprise).  It’s not that big of an effect, though.  The league average was 9.0% outs per BIP in the years Chavez played, but in his parks, it was 9.1% (excluding Chavez).  So, it’s an extra .001 outs per BIP, so with 4000 BIP in a year, that’s 4 extra outs (I should have rounded to an extra decimal place anyway) that Chavez’s parks have generated over the average park.

At the low end is ARod, and in his parks, the 3B (other than him) make 8.6% of outs per BIP, compared to the 8.8% they made in the years he’s played.

However, the largest source of bias for any player is Mike Lowell and his pitchers.  His pitchers have been VERY 3B-friendly, as their 3B (other than Lowell) have gotten 9.6% outs per BIP.
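The park effects in #100 translate into outs per season as follows; a small Python sketch using the rates quoted for Chavez and ARod and the 4000 BIP per year used throughout the thread.  The function name is hypothetical.  Lowell is left out because the comment gives his pitchers’ rate (9.6%) but not the league rate to compare it against.

```python
BIP_PER_YEAR = 4000


def extra_outs(context_rate: float, league_rate: float,
               bip: int = BIP_PER_YEAR) -> float:
    """Extra 3B outs per season generated by the context vs. the league."""
    return (context_rate - league_rate) * bip


chavez_park = extra_outs(0.091, 0.090)   # Chavez's parks vs. league
arod_park = extra_outs(0.086, 0.088)     # ARod's parks vs. league

print(f"Chavez parks: {chavez_park:+.0f} outs per year")
print(f"ARod parks:   {arod_park:+.0f} outs per year")
```

That gives roughly +4 outs a year for Chavez’s parks, matching the comment, and roughly -8 for ARod’s.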


#101    Guy      (see all posts) 2010/06/03 (Thu) @ 11:14

Tango:  have you ever updated Jeter’s WOWY rating for 2008 and 2009?  I’m curious as to whether it supports the idea he performed as a league-average fielder in those years.


#102    Peter Jensen      (see all posts) 2010/06/03 (Thu) @ 11:48

Rally - Youkilis is also high in pop ups for his time at third.  When you think about it, there is some logic to it.  Batters trying to hit a HR must hit at a higher than average vertical angle to clear the wall in left.  If they miss by a little bit, the result has a higher likelihood of being a pop up.  This will show up in your category 3, but it is really a result of a change in batter strategy to increase home runs.  Just a guess on my part that needs some statistical evidence to back it up.  If Fenway has a higher than average pop up rate for third and SS, but not for 2nd and 1st, it would lend support to my theory.


#103    Tangotiger      (see all posts) 2010/06/03 (Thu) @ 12:09

Jeter:

No, I don’t see it.  I have him with 396 outs on 3563 BIP in 2009, for 11.1% outs per BIP.  And I see no evidence that his batters, pitchers, or park were biased against him.

Presumably the data recorders (BIS, MLB.com, and STATS) have the hit locations of Jeter’s balls farther than that of the average SS, and so Jeter gets more runs for each BIP he fields, which is why he ends up looking good in UZR, Dewan, etc.

It’s possible that this is what happened, that even though Jeter converted 11.1% of all BIP into outs in 2009, compared to his 11.4% of all his career (55,027 BIP), that in 2009 he happened to get a weird distribution of balls, even though he had a typical distribution of batters and pitchers.  That the average SS would get 12.1% outs, had he faced Jeter’s actual distribution he’d only get 10.8% or something.  I mean, it’s a very tough sell that he could have had that atypical a distribution.  It could have happened.
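The Jeter figures in #103 can be checked with a few lines of Python.  The counts and the career rate are the ones quoted in the comment; the comparison against his career rate is an added illustration, not something Tango computed in the thread.

```python
JETER_2009_OUTS = 396
JETER_2009_BIP = 3563
CAREER_RATE = 0.114        # Jeter's career outs per BIP, as quoted

rate_2009 = JETER_2009_OUTS / JETER_2009_BIP
expected_at_career_rate = CAREER_RATE * JETER_2009_BIP
shortfall = JETER_2009_OUTS - expected_at_career_rate

print(f"2009 rate: {rate_2009:.1%}")
print(f"outs at career rate: {expected_at_career_rate:.0f}")
print(f"difference: {shortfall:+.0f} outs")
```

The 2009 rate rounds to the 11.1% quoted, about 10 outs short of what his own career rate would give on the same number of BIP.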


#104    Peter Jensen      (see all posts) 2010/06/03 (Thu) @ 12:26

Tango - It is not only the distribution of hit balls, it is the number of balls that get fielded in front of Jeter by the pitcher, catcher and third baseman.  In 2008 and 2009 Jeter had more balls fielded in front of him than average.  This would lower his true chances to lower than the number that WOWY would calculate from his BIP alone.  Because of that, BZM has Jeter rated at 2 runs above average total for the 2 years combined.


#105    Chris Dial      (see all posts) 2010/06/03 (Thu) @ 12:46

It’s possible that this is what happened, that even though Jeter converted 11.1% of all BIP into outs in 2009, compared to his 11.4% of all his career (55,027 BIP), that in 2009 he happened to get a weird distribution of balls, even though he had a typical distribution of batters and pitchers.  That the average SS would get 12.1% outs, had he faced Jeter’s actual distribution he’d only get 10.8% or something.  I mean, it’s a very tough sell that he could have had that atypical a distribution.  It could have happened.

I don’t think so.  Going back as far as 2000-2002, Jeter has lower rates of BIZ.  The Yankees threw fewer GBs to SS than one would expect, just as the Braves threw fewer GBs to 3B than one would expect.


#106    Chris      (see all posts) 2010/06/03 (Thu) @ 15:42

"I don’t think” what?  Are you saying that you think Jeter did have an atypical distribution?

I know Mike Emeigh made this case for Jeter years ago.  But I find it very hard to believe that Jeter gets a more difficult distribution than expected, year after year, even after controlling for his pitchers and the stadium he played in.  To the extent the PBP data shows this (it isn’t clear to me that it really does), I find it vastly more likely that it reflects some bias in the data than reality. 

And let’s say, improbably, that it is true:  for a decade and a half, Jeter has gotten far fewer balls to field than anyone else.  That probably means that pitchers think Jeter is such a bad fielder that they consciously pitch in a manner designed to reduce GBs to SS.  And if they are pitching non-optimally, it means they are giving up more hits (or more XBH) than usual to other places on the field.  So even then, WOWY is likely giving you a better estimate of Jeter’s real contribution.

And let me ask this:  If this were anyone other than the leading star of the most famous team in baseball, would anyone take this argument seriously for more than 2 seconds?  I don’t think so.....


#107    Guy      (see all posts) 2010/06/03 (Thu) @ 15:49

Peter:
Isn’t it likely that most of the “extra” GBs balls fielded by the Yankees’ C, 3B and P would have become hits in Jeter’s zone had they not been successfully fielded?  I think there are relatively few GBs that can be turned into an out by 3B or C but which the SS—especially Jeter!—could also field.  (GBs fielded by the pitcher I can see in some cases.) It seems to me this information makes the opposite point:  it explains one reason the PBP data in 2008-09 is making Jeter look better than he really was.


#108    Guy      (see all posts) 2010/06/03 (Thu) @ 15:51

Damn. Comment #106 is mine, directed TO Chris, not from him.  Gotta stop doing that......


#109    Peter Jensen      (see all posts) 2010/06/03 (Thu) @ 18:45

Yes, I think you are right Guy.  But all any defensive metric can aspire to is to accurately record what actually happened.  To project a fielder accurately you need to look beyond the simple +-runs to try and understand why it is happening.  I don’t think pitchers can or would try to pitch in a manner to minimize the number of balls going to Jeter.  However, I think it is perfectly plausible to presume that a pitcher or third baseman would position themselves or make a special extra effort to field balls that they might let go through to the SS if the SS were more likely to field them for an out.


#110    Guy      (see all posts) 2010/06/04 (Fri) @ 12:59

"All any defensive metric can aspire to is to accurately record what actually happened.”

Well, yes and no.  The metric should definitely tell us what happened.  But these metrics also try to tell us what “should” have happened (given an average fielder), by measuring opportunities.  And what’s becoming clear to me, the more I learn and think about it, is that the PBP metrics face serious limitations in estimating opportunities.

It seems obvious that PBP data must be a big improvement over simply comparing a player to the average number of outs made (for given number of BIP), or even WOWY.  After all, PBP gives us a huge amount of additional information about where the balls really went.  That MUST be better than counting flyballs to RF when evaluating SSs, right?  MGL has said as much.

BUT, the PBP data is missing one very important piece of evidence:  it doesn’t know how many outs this player should have made, under normal conditions.  That’s a hugely powerful piece of information, but UZR, TZ, and +/- don’t use it.  So it doesn’t have to be true that PBP metrics are more accurate—it depends which information matters more, the variation in opportunities or the central tendency.  Indeed, over large enough samples, it’s almost certain that getting the central tendency right is more important, and something like WOWY will be more accurate (or that the results will converge, if the PBP metric is very good). 

These metrics have to calculate opportunities by looking at where specific balls went, or ended up.  And that introduces problems.  Each metric has to make a lot of decisions about how to allocate opportunities, and each decision—which seems to make sense in isolation—can actually bias our estimate of opportunities.  A few examples (not every example applies to every metric):

1) As Peter has just shown here, a SS’s teammates may make a lot of “extra” plays in front of him. When that happens, difficult opportunities for the SS are removed, and he appears to be better.

2) Fielders are sometimes assigned more responsibility (i.e. higher penalty) when they touch a ball. If a SS knocks down a GB in the hole and prevents runners from advancing, he gets penalized more than if he fails to reach the ball at all. So high range fielders are penalized.

3) Errors are assumed to be 100% the responsibility of the fielder. So a fielder who makes few errors but also converts fewer tough opportunities will get a higher rating than the reverse (even if they make the same number of outs).  I think this is why Vizquel looks good in TZ and UZR, but WOWY says he has been just average.  Vizquel makes about 8-9 fewer errors than average each year, but doesn’t make more plays than average.  That could account for almost all the positive value he gets in TZ and UZR.  UZR is concluding Vizquel faced fewer easy chances because he made few errors, but over his whole career that’s almost certainly not true.

Every one of these decisions represents a good-faith effort to measure fielders’ opportunities, given the data that metric has to work with. But each one also introduces some error of its own. It’s just not clear that the data is, or can be, good enough to really measure opportunities with enough precision to be materially better than WOWY.  We know the metrics aren’t worth much with only one year of data; and over a long career, we usually won’t need the metrics.  The question is whether they add value for players with 3-5 years—maybe, but I’m not yet convinced.

And on top of all this, there is the data.  I think it’s likely that the difficulty rating given to each ball by the stringers is impacted by how close the fielder comes to the ball (and whether it’s fielded)—so fielders who are well-positioned and/or get to the ball quickly get rated as having “easier” chances (and slow fielders are thought to have tougher chances).  If true, this will have a powerful “regressing” impact on ratings.


#111    Rally      (see all posts) 2010/06/04 (Fri) @ 13:41

In some cases a player with a lot of range may have more errors and stop more balls on the infield that are called hits.  But for the most part, these are not the same players getting shortchanged in both areas.  I’ve found a negative correlation between error rate and infield hit rate.  There are some players, who we can call saints, who the scorers don’t want to give errors to.  So they wind up with more infield hits than most.


#112    Tangotiger      (see all posts) 2010/06/04 (Fri) @ 14:14

Let me give you another one to chew on.  We all love Linear Weights, and we think it’s much better than Runs Participated In (RPI or R+RBI-HR).

(And, really, we should include my Batter Assists and Batter Blocks in some way in RPI.)

Let’s say however that each player is rotated in the batting order every game.  So, in Game 1, Rickey bats first, in Game 2, Rickey bats second and so on.

After 2000 games, what would you trust more: Linear Weights or RPI?

RPI contains extra information: how well you performed with men on base.  It also includes information about your teammates, but by being rotated around, a lot of that will get suppressed.

So, when we look at data, and if we have a reasonable expectation that all the biases in the data will get wiped away, we don’t have to go to extraordinary lengths to remove biases from each and every single data point.

This is what UZR and Dewan do.  And, like I said, up to 3 years, trust that more.  But, more than say 6 years?  Well, how much true bias can there be in most cases, and how much is now observer bias?

And, observer bias is a big killer.


#113    Guy      (see all posts) 2010/06/04 (Fri) @ 15:08

"There are some players, who we can call saints, who the scorers don’t want to give errors to.  So they wind up with more infield hits than most.”

That’s interesting.  I can see that those are offsetting factors in TZ, for the most part.  But I think in UZR, MGL just treats errors separately, so each error plus/minus is basically an extra play made or missed.  But if those become infield hits, the fielder isn’t debited anything extra.  (We had a long and painful thread on this recently that I can’t bear to look at again, but I think I’ve got that right.) So UZR will tend to overrate the “saints” you mention, with Vizquel probably the poster boy for this.

And Rally, would you agree that this is potentially a problem in TZ-old as well?  I think a high-error/high-range fielder (like Schmidt) gets penalized in TZ-old, while a “saint” would have a significant advantage.


#114    Rally      (see all posts) 2010/06/04 (Fri) @ 15:55

Yes, a “saint” will have an advantage.  Not sure how significant it is though.  Especially with low error rates in modern times.  There aren’t many players who consistently have higher error rates than average, like making 10 more. 

Schmidt’s not the best example.  No doubt he had great range.  But his error rates were actually a bit better than league average.


#115    Peter Jensen      (see all posts) 2010/06/04 (Fri) @ 16:40

I feel like I have learned a lot from this thread. I hope others feel the same way or at least have some new issues to think about.

We started with Tango’s assertion that his WOWY assessment of Rolen might have him at twice the value for his career compared to TZ and UZR.  From that came some evidence that TZ has major problems with how it computes opportunities and that both TZ and UZR have problems with how they assign responsibility to fielders in shared zones.  But we also discovered that much of the “extra” value claimed by WOWY was actually extra plays on pop ups and line drives that the other fielding metrics weren’t evaluating.  Plus, the extra pop ups might not really be contributing much if any value because half of them happen on foul balls and the fair balls might be skewed by park effects.  What a mess.

Guy, trying to accurately assign a value to opportunities is the crux of what a fielding metric is.  The plays a fielder actually makes are pretty straightforward.  The only other problem of significance is how to assign run values to the plays and I think a consensus on that could be easily reached. 

One also has to consider what use the data from a fielding metric is put to.  If TZ says that Ozzie is the best defensive SS that has ever played the game do we really care if it is underestimating his value by 20% or even 50%?  But it is irresponsible to use TZ to evaluate current players where contract dollars and trade value depend on the most accurate evaluations possible.  We really have three eras that require three different metrics: the hit location era, the PBP era, and the pre-PBP era.  And you could possibly separate the pre-PBP era into the box score era and the pre-box score era since the amazing Retrosheet is close to completing the box scores from 1920 to 1952.

I see no inherent advantage in accuracy to the WOWY method even for career assessment.  It may be easier to calculate.  It may be possible to use a WOWY version to give a reasonable approximation of defensive value for the pre-PBP era.  But essentially all the checks that WOWY does should be included in any responsible comprehensive PBP defensive metric.  I do think WOWY made perfect sense in evaluating catchers where the shared responsibility with pitchers for the outcomes of plays was a difficult problem.  And I am glad Tango introduced the method as it has helped to provide a check on the other defensive metrics.  But it doesn’t make sense to use WOWY to estimate fielding opportunities for the hit location era when even the problematic observer based hit locations that we have now give a more accurate description of events than does WOWY.  Even for the PBP era and even if Tango used WOWY to estimate by Batted Ball Type, there are better ways to estimate opportunities than WOWY. 

The adjacent fielder problem is very complex. We have only touched on some of the issues here.  I acknowledge the possibility that Guy suggests above, that a fielder may be taking more difficult-to-field balls from in front of another fielder, but that is not the only possible interpretation.  Fielding is definitely a team activity and fielders will adjust to the abilities of the adjacent teammates in ways that are not entirely predictable.

Many people seem to think that having more data through Hit f/x and Field f/x will make possible fielding metrics that will be orders of magnitude more accurate than our present metrics.  It will certainly be helpful to have the objective data replacing subjective data.  But don’t be surprised if all we gain is a little more confidence in the numbers that we are already generating.


#116    Guy      (see all posts) 2010/06/04 (Fri) @ 17:00

Rally:  Thanks.  Schmidt’s error totals (20+ per season) looked high to me, but I see that the average 3B was a tad higher.  And any player who plays a long time probably isn’t making a lot of errors, and/or the scorers are going to stop giving errors to an accomplished veteran.  So to the extent this is a problem, it’s really about overvaluing the “saints,” not undervaluing high-error players.  I do think you would find that Vizquel’s rating owes a lot to this, at least in UZR.


#117    Guy      (see all posts) 2010/06/04 (Fri) @ 17:26

"we also discovered that much of the “extra” value claimed by WOWY was actually extra plays on pop ups and line drives that the other fielding metrics weren’t evaluating.”

I think the LDs is clearly something we should measure at the career level. I understand the reason for leaving them out of TZ or UZR (it adds a lot of non-skill variance at the season level), but there’s a tradeoff there.

Peter:  can you comment on whether foul popups are in fact 99% out balls (or close to that)?  If so, I think we could all agree to ignore them.  But if not, this seems like a real skill.

“Guy, trying to accurately assign a value to opportunities is the crux of what a fielding metric is.”

Exactly my point.  But what I’m suggesting is that our data may not allow us to do appreciably better than assuming each player’s opportunities were average, given his pitchers and parks. 

“If TZ says that Ozzie is the best defensive SS that has ever played the game do we really care if it is underestimating his value by 20% or even 50%?”
Well, it may be more like 70%.  And yes, we care.  Even if the historical value issue doesn’t interest you, we need to know the true variance in fielding skill.  If TZ and UZR are regressing players 30% on average, as I (and maybe Tango?) suspect, that has important implications for the present.

“I see no inherent advantage in accuracy to the WOWY method even for career assessment....it doesn’t make sense to use WOWY to estimate fielding opportunities for the hit location era when even the problematic observer based hit locations that we have now give a more accurate description of events than does WOWY.”

These are just assertions.  What’s the evidence? I’ve suggested several reasons this may not be true, even though it seems quite logical.  You’ve given us one good example yourself:  if A-Rod and NY pitchers field an above-avg number of balls in Jeter’s zone, it will make him appear to be a better fielder. 

*

It occurs to me that one important check that could be made would be to compare the career opportunities WOWY assigns a player to those assigned by the major defensive metrics.  That tells us how much easier/harder than average each metric thinks a player’s chances were.  Let’s see what the spread looks like:  how much easier have Jeter’s chances been?  And most importantly, see if that correlates to the player’s rating in that metric.  The correlation should be zero.  But I suspect that we will find good fielders are assigned easier opportunities than average, and vice-versa, creating the regression effect we’ve been discussing.
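A rough sketch of how that check might be run, in Python. Everything here is an assumption for illustration: the row layout, the helper names, and above all the numbers, which are made-up placeholders and not real player data. The idea is only to show the shape of the test: compute how easy each metric thinks a player's chances were relative to WOWY, then correlate that with the player's rating in the same metric.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def easiness(metric_expected_outs, wowy_expected_outs):
    """Above 1.0: the metric thinks the chances were easier than WOWY assumes."""
    return metric_expected_outs / wowy_expected_outs

# PLACEHOLDER rows, invented for illustration only (not real players):
# (label, expected outs per the metric, expected outs per WOWY, metric rating in runs)
rows = [
    ("fielder_a", 410.0, 400.0, 12.0),
    ("fielder_b", 395.0, 400.0, -8.0),
    ("fielder_c", 400.0, 400.0, 1.0),
    ("fielder_d", 420.0, 400.0, 20.0),
]

easiness_values = [easiness(m, w) for _, m, w, _ in rows]
ratings = [r for _, _, _, r in rows]
r_value = pearson(easiness_values, ratings)
print(f"correlation between assigned easiness and rating: {r_value:.2f}")
```

If opportunities are being measured without bias, the correlation on real data should sit near zero; a clearly positive value would point to the regression effect described above.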


#118    Tangotiger      (see all posts) 2010/06/04 (Fri) @ 18:28

Popups are almost certainly 99% outs given that the run value of a popup is close to that of a K.

HOWEVER, there’s a difference between saying a popup should be ignored and saying that a play marked as a popup should be ignored.

If every 3B, given a long enough career, gets a fair share of popups, then it doesn’t matter if we ignore them or count them… they cancel out.  If a 3B gets more than his fair share, then either he’s hogging, or he’s being unfairly marked as getting too many popups.  So, you have to handle it.


#119    Peter Jensen      (see all posts) 2010/06/04 (Fri) @ 19:40

Popups are almost certainly 99% outs given that the run value of a popup is close to that of a K.

Pop ups in fair territory are 98% outs.  Fair pop ups in the 3B area of responsibility are caught 96.6% of the time.  Guy was specifically asking about foul pop ups.  Since no information is available about how many foul pop ups are not caught, we have no idea whether there is a skill involved in catching those.  There is a fairly wide range in the rate that 3Bs catch them for outs.  Whether that range is mostly due to the park rates at which foul pops are caught remains to be tested.  But even though a larger foul territory might give a 3B more potential foul pops to catch, the additional chances will be more difficult because they are farther away from the 3B’s position, so there will be some skill involved in converting those chances.

I was incorrect in the way I previously estimated the possible run value of catching a foul pop.  It should be much less.  Probably around -.23 reflecting the value of an out minus the value of some strikes, plus the value of some advancement.


#120    Guy      (see all posts) 2010/06/05 (Sat) @ 07:02

"Since no information is available about how many foul pop ups are not caught”

Thanks, Peter, that’s what I was trying to get at.  When a ball drops in foul territory, is it not rated at all? In other words, the out rate on foul popups is 100% by definition, because plays not made aren’t counted.  But that doesn’t really tell us whether this is useful.  My impression is that a non-trivial proportion of flyballs in foul territory are not converted to outs, and that 1B and 3B (and even SS and 2B) likely do vary in their ability to make these plays.  I guess you could see if there is a strong negative correlation between a player’s foul outs and the total for his adjoining fielders, and if the answer is “no” then we want to give Rolen credit for this.


#121    Tangotiger      (see all posts) 2010/06/05 (Sat) @ 07:40

Foul plays as events in Retrosheet are ONLY recorded if they are caught for an out or dropped for an error. 

They do show foul pitches that are not recorded for an out, but they don’t indicate where those were hit.


#122    Guy      (see all posts) 2010/06/05 (Sat) @ 08:01

Then we can’t say “Popups are almost certainly 99% outs,” right?  And we shouldn’t assume that “extra” foul outs by a player are just outs taken from other players.


#123    Chris Dial      (see all posts) 2010/06/05 (Sat) @ 18:27

Re: #106.  Yes, Guy, stop that.

Yes, for the first half of Jeter’s career, he saw fewer chances.  I appreciate the snark in your last sentence, but Chipper Jones ABSOLUTELY did as well.  Likewise, Aramis Ramirez, when he was with the Pirates, saw untold numbers of GBs.  Distribution isn’t even, and I don’t believe that it “has to even out” over a career at all.  That, IMO, is the biggest fallacy statheads make.  I am sure Chipper got fewer balls.  I am pretty sure Jeter did.

Also, Emeigh’s study had a massive flaw in the dataset that I pointed out all those years ago. 

Also, fielders may “cheat” a little one step or the other in leagues where they can play to cover for a weak fielder, but at the top level, they have to play where the hitter is most likely to hit the ball.  And the gaps between infielders are large enough that chance stealing is very very low.

At least, I believe so, and I think I can be proven wrong.  Here’s the test: is there a vector between each fielder where the out percentage approaches zero for both fielders?  If so, then the “chance stealing” doesn’t exist as a problem.  There will be the occasional play, but it isn’t something that causes odd rankings.
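That test can be sketched in a few lines of Python. This assumes you already have each fielder's out rate broken down by vector (angle) bin; the bin width, the 2% cutoff for "approaches zero", and the sample rates are all hypothetical choices made for illustration, not measured values.

```python
def dead_zones(out_rate_a, out_rate_b, threshold=0.02):
    """Return the vector bins where BOTH fielders convert almost nothing.

    out_rate_a / out_rate_b map a vector bin (degrees) to that fielder's
    out rate on ground balls hit along that vector. The threshold is an
    arbitrary stand-in for "approaches zero".
    """
    shared_bins = sorted(out_rate_a.keys() & out_rate_b.keys())
    return [
        angle
        for angle in shared_bins
        if out_rate_a[angle] <= threshold and out_rate_b[angle] <= threshold
    ]

# PLACEHOLDER out rates by vector bin, invented for illustration only.
third_base = {0: 0.60, 5: 0.45, 10: 0.20, 15: 0.01, 20: 0.00}
shortstop = {0: 0.00, 5: 0.00, 10: 0.01, 15: 0.02, 20: 0.30}

gap = dead_zones(third_base, shortstop)
print("bins where neither fielder makes plays:", gap)
```

An empty result would mean the two fielders' coverage overlaps everywhere between them, which is the situation where chance stealing could matter.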


#124    Peter Jensen      (see all posts) 2010/06/05 (Sat) @ 19:55

Chris - Watch the video of the missed call in Galarraga’s perfect game.  Look where the 1B fielded the ball.  That ball was certainly hit at a vector where the 2B makes plenty of plays.


#125    Chris Dial      (see all posts) 2010/06/05 (Sat) @ 20:10

Peter,
I said “There will be the occasional play, but it isn’t something that causes odd rankings”.


#126    Guy      (see all posts) 2010/06/06 (Sun) @ 07:12

Chris:
Can you provide a link to what you’ve written about Chipper?  I’ll try to keep an open mind on that one until I see your argument.  (And I certainly agree a player might get an unusual distribution over 3-4 years, or even longer if you don’t control for pitchers).

However, the idea that Jeter has had fewer than average opportunities—AFTER controlling for pitchers and park—borders on the absurd.  How could that be true?  The only other factor that could possibly create a large variation from average over 50,000 BIP would be teammates—and you’ve just said you don’t believe that matters. 

And I do have to note the extraordinary coincidence that the two players who have had extreme bad luck in fielding opportunities are also the 2 top stars for the two most successful franchises of the past 2 decades.  Are there any non-superstars who have had such misfortune over the same years?  Serious question.....


#127    Rally      (see all posts) 2010/06/06 (Sun) @ 10:27

Easy answer: No.  Because non-superstars do not play as long as Larry and Derek have played.


#128    Guy      (see all posts) 2010/06/06 (Sun) @ 11:01

C’mon Rally, I didn’t stipulate the player had to play for 15 years.  Just find me someone with a 7-8 year career whose opportunities were so much lower than average.

And remember, this kind of luck HAS to be independent of fielding skill.  So there should also be some fielders whose raw fielding stats are very good, but who in fact were all-time great fielders once you factor in their Jeter/Chipper level of diminished opportunities.  Who are those players?


#129    Guy      (see all posts) 2010/06/06 (Sun) @ 11:10

Tango:  have you ever run WOWY for Chipper?  Just looking at total plays made, it looks like he’s maybe -300 runs over his career.  If that’s close to being right (which it may well not be), then his WAR total isn’t even in HOF range.


#130    Tangotiger      (see all posts) 2010/06/06 (Sun) @ 13:00

Yes, Chipper is very low.  He’s gotten 8.2% outs per BIP.  His pitchers were slightly below average in getting outs to 3B.  Among the 25 active 3B with the most playing time, Chipper is at the bottom with Tatis, Hinske, and Atkins.

The top 6 are: Inge, Rolen, Mora, Beltre, Feliz, Chavez.

(Inge is very high in out rate, but his pitchers are also very conducive to getting outs to 3B.  Even after accounting for that, Inge is at the top.)

Other than Mora, the other 5 are all considered standout fielders at 3B. 

Take it for what you will.


#131    Guy      (see all posts) 2010/06/06 (Sun) @ 15:19

Here’s where you posted totals for a few players through 2008:  http://www.insidethebook.com/ee/index.php/site/comments/best_worst_wowy_since_1993_through_age_34/.  Chipper was -261 plays, or -182 runs.  And that’s just at 3B.  He was at least equally bad in LF (and SS), so extrapolating to almost 2100 total games, we have him at about -250 runs through 2009.  That compares to -25 in TZ.  If WOWY is right, Chipper drops down to Joe Torre level in career WAR (c. 55).  This is a huge disparity.

Of course, it could be that some portion of Chipper’s shortfall is catching fewer foul flies, which as Peter notes above are much less costly than allowing actual hits/errors.


#132    Guy      (see all posts) 2010/06/06 (Sun) @ 17:31

Slightly off-topic (but fielding related), I notice that Boston has improved its DER 2 points over 2009 (from 1% below average to +1%).  Small sample size, of course, but defense was obviously a priority in their personnel changes.  That’s about a 7-win swing if the improvement is maintained.
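For anyone who wants to see the back-of-envelope arithmetic behind a swing like that, here is a minimal Python sketch. The 4000 BIP per season and 0.8 runs per play are taken from the original post at the top of this thread (40 plays = 32 runs); the 10 runs per win is a common rule of thumb and is an assumption here, as is the function name.

```python
def der_swing_in_wins(der_change, bip_per_season=4000,
                      runs_per_play=0.8, runs_per_win=10.0):
    """Convert a change in DER into wins over a full season.

    bip_per_season and runs_per_play follow the figures used at the top
    of the thread; runs_per_win is an assumed rule of thumb.
    """
    extra_plays = der_change * bip_per_season   # extra outs on balls in play
    extra_runs = extra_plays * runs_per_play    # runs saved by those outs
    return extra_runs / runs_per_win

wins = der_swing_in_wins(0.02)  # a 2-point (2%) DER improvement
print(f"{wins:.1f} wins")
```

With these inputs the swing works out to a bit over 6 wins; a somewhat higher BIP total per season pushes it toward 7.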


#133    Rally      (see all posts) 2010/06/06 (Sun) @ 17:39

I put together a spreadsheet looking at Jeter and Jones for the 1990’s, when Retrosheet has the Project Scoresheet hit locations available.

http://www.baseballprojection.com/special.htm

For each player, year, and hit location code, I compared how many outs they made compared to their position average, and how many hits were allowed compared to league average.  This is limited to ground balls only.

For Jeter, he made 50 fewer plays than the average shortstop, and there were 44 more hits against the Yankees.  Take the average of that, and you probably get -35 runs or so.  His TZ rating was -25.  Looking at his Range factor compared to league average, he was -229 plays.  Most likely not as bad if you look at assists only, but still way off from batted ball analysis.

Same process for Chipper.  He made 7 fewer outs and allowed 30 more hits considering how many ground balls were hit at or near him.  Probably 14-15 runs, and TZ has him at -15 runs.  Looking at range factor he’s -252 plays during that same time.
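The averaging step above can be sketched in Python as follows. The play and hit counts are the ones quoted in this comment; the 0.75 runs per play is an assumed conversion factor (chosen because it roughly reproduces the run totals given), and the function name is made up for illustration.

```python
RUNS_PER_PLAY = 0.75  # ASSUMPTION: approximate run value of one ground-ball play

def gb_runs(outs_vs_avg, extra_hits_allowed, runs_per_play=RUNS_PER_PLAY):
    """Average the two views of the same shortfall, then convert to runs.

    outs_vs_avg: plays made relative to the position average (negative = fewer).
    extra_hits_allowed: hits allowed relative to league average (positive = more).
    """
    plays = (outs_vs_avg - extra_hits_allowed) / 2
    return plays * runs_per_play

jeter_runs = gb_runs(-50, 44)
chipper_runs = gb_runs(-7, 30)
print(f"Jeter: {jeter_runs:.1f} runs, Chipper: {chipper_runs:.1f} runs")
```

Averaging the outs view and the hits view is a simple way to hedge between two noisy counts of the same thing; a different run value per play would scale both results proportionally.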

I don’t know how this would look in the following decade: Retrosheet data is sparse for 2000-2002, and while beyond 2003 everything is coded for bbtype, hit location is no more.

For Jones, his TZ ratings were about the same in the 2000’s, a bit below average but far from terrible.  For Jeter, I’m switching from TZ(projectscoresheet) back to TZold, and he has his worst defensive seasons.  He goes from averaging -5 per season to no better than -15 for 2000-02.


#134    Guy      (see all posts) 2010/06/07 (Mon) @ 05:54

Interesting data.  How does the distribution of BIP by hit location compare to league average for each player?  Are their balls disproportionately in hard or easy locations, or basically average?


#135    Guy      (see all posts) 2010/06/07 (Mon) @ 08:04

Following up:  how do Jeter’s and Chipper’s total opportunities compare to average, as % of all BIP?


#136    Chris Dial      (see all posts) 2010/06/07 (Mon) @ 17:16

“Here’s where you posted totals for a few players through 2008:  http://www.insidethebook.com/ee/index.php/site/comments/best_worst_wowy_since_1993_through_age_34/.  Chipper was -261 plays, or -182 runs.  And that’s just at 3B.  He was at least equally bad in LF (and SS), so extrapolating to almost 2100 total games, we have him at about -250 runs through 2009.  That compares to -25 in TZ.  If WOWY is right, Chipper drops down to Joe Torre level in career WAR (c. 55).  This is a huge disparity.”

Right.  And that’s wrong.

IMO, this is where STATS gets it right.  Unfortunately, Fangraphs has completely bailed on the pre-2004 UZR, and they are now posting something called DRS that isn’t mine, which sucks completely.


#137    Chris Dial      (see all posts) 2010/06/07 (Mon) @ 17:17

Also, Guy, you say “controlling for pitcher,” but you can’t control for pitcher in this situation.  That’s the problem.


#138    Guy      (see all posts) 2010/06/07 (Mon) @ 17:45

Chris:  don’t just say it’s “wrong.” Tell us why you think that, or link to your past work. 

When I say “controlling for pitcher,” I mean Tango adjusts Chipper’s opportunities to account for how many outs are made by other 3B behind these same pitchers.  Now, Chipper is pretty exceptional in terms of playing a lot of innings behind a few pitchers who also had relatively few innings in front of other 3Bmen.  That’s the kind of player where WOWY has the most trouble.  Still, I bet the results would look similar even if you remove the big 3.  And why do you say that pitchers “can’t” be controlled for? 

And I’m still waiting to hear about some other players who have shared Chipper’s and Jeter’s extraordinary misfortune.


#139    Guy      (see all posts) 2010/06/07 (Mon) @ 18:00

BTW, what these kinds of discrepancies tell us is that there is likely to be serious scorer bias in at least some of the PBP data.  If you think about it, there almost has to be.  It’s almost impossible for a scorer not to be influenced by the position of the fielders and the outcome of the play.  If you have two identical balls to the 5-6 hole, one of which is fielded for an out and one of which gets through to LF, it’s almost certain that the hit will get scored as a tougher chance in terms of location and/or velocity (and maybe even BIP type).  Everything we know from psychological research about “anchoring” would tell us to expect this, and on a fairly large scale. 

To show this isn’t happening, someone would have to compare the scoring to some objective measure of videotaped plays, one that can’t be influenced by the player.  (Or, have a sample of videotaped plays rated by 2 scorers, with the player digitally removed half the time.) Has anyone ever tried to validate the Retrosheet data that way?  Or do STATS and BIS have a way of validating their data?


#140    Peter Jensen      (see all posts) 2010/06/07 (Mon) @ 18:36

Guy - I believe BenJ has explained the BIS process in detail, possibly even on this site.  It is my understanding that every game is scored by at least two video scorers, with any major disagreements decided by a third person reviewing the tape.  Last year for my Pitch f/x presentation I reviewed the video of 60+ SS GB plays where there were disagreements between the Hit f/x horizontal vectors and the Gameday vectors or where the Hit f/x SOBs seemed wrong.  Many were deflected balls where the place where the ball was eventually fielded was not in line with the vector off the bat.


#141    Tangotiger      (see all posts) 2010/06/07 (Mon) @ 18:47

Ben’s claim is that BIS rotates their scorers in-season well enough that there should be no park-bias.  Until I get the data to confirm that, color me skeptical of that.

Also, I agree with Guy that it’s very possible that some stringers will record a Chipper or Jeter play as “in the hole” if they don’t make an out on it, thereby making that play look harder than it was, and so, losing fewer runs per play on the misses.

We already know that there’s stringer-bias in recording an outfield play as a line-drive if it falls for a hit or flyball if it’s caught, even if it’s the same trajectory and hang time for both plays.

Like I said, these are biases.  Sample size doesn’t wash out scorer biases.  Sample size does wash out balls in play distribution.  Where that happens, I don’t know.  I suspect somewhere around 6 years, maybe 8. Just a gut feel.


#142    Guy      (see all posts) 2010/06/07 (Mon) @ 19:41

The kind of bias I’m talking about won’t be park- or team-specific. It will be universal, and the outcome would be an artificial regressing of all players toward the mean.  Good fielders will be assessed as having easier plays, precisely because they get to balls earlier (whether due to better positioning, range, or both), and the reverse will happen for bad fielders.  This kind of bias is inevitable with human scorers unless MLB puts chalk grid lines down on the field (and even then velocity ratings would be biased).  In the absence of clear unbiased benchmarks, it’s just impossible to imagine that scorers don’t evaluate the ball—consciously or not—based on the location of the fielder.  A ball that is never touched will be rated as harder than an IF hit, which in turn will be judged harder than an out.  A fly ball that Carlos Beltran catches easily will be rated easier than if the same ball barely eludes a diving stab by an inferior OF. 

The social psychology and behavioral economics literature is full of examples of how people make judgments in comparison to artificial benchmarks, often in cases where the benchmark is much less relevant than the proximity of a fielder on a ballfield.  I am 100% certain that I would be guilty of this kind of bias if I scored plays, and I doubt anyone could avoid at least some of it.

Now, maybe the BIS video system provides some kind of grid that prevents this.  I don’t know.  But for systems using scorers in a ballpark, this kind of bias is unavoidable and has to be somewhere between significant and massive.  And this bias won’t wash out with 1,000,000 BIP.


#143    Chris Dial      (see all posts) 2010/06/07 (Mon) @ 21:05

Guy,
I can’t access the data at the moment (computer issues).  But Chipper’s ZR chances/9 IP is much much lower than average.  He simply doesn’t get the same number of chances.  MGL can demonstrate that.  I was saying this in 1998-99, so this isn’t a recent concoction.  But it continued through most of his career.  It’s actually worsened in some ARF systems (like BPro) where assumptions are made for LH GB pitchers.  That’s what I meant by “controlling for pitchers”.  That assumption doesn’t hold.

I don’t disagree some bias exists for the 56 hole (but there are multiple scorers and there is a little QC - not a ton), but the difference between Chipper’s chances and average is too large to be attributable to that IMO.  This could be DEMONSTRATED by MGL, if he so chose.  I would wager the Braves gave up a low percentage of GB hits to LF, coupled with Chipper’s low ZR chance count shows that there were fewer GB opps hit to him than would be expected.

Here’s a quote from a 2002 USENET post:
“Yes.  I mention it in my other post (Chipper got 308 GBs hit to his zone
in 1297 innings.  In 1345 innings, Aramis Ramirez got 486 GBs). “
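To put those two lines on the same scale, here is a quick Python sketch that converts them to ground-ball chances per nine innings. The counts and innings are the ones in the quote above; the function name is made up, and innings are treated as plain decimals.

```python
def chances_per_nine(ground_balls, innings):
    """Ground-ball chances in the fielder's zone per nine innings played."""
    return 9 * ground_balls / innings

chipper_rate = chances_per_nine(308, 1297)   # Chipper, per the quoted post
ramirez_rate = chances_per_nine(486, 1345)   # Aramis Ramirez, per the quoted post

print(f"Chipper: {chipper_rate:.2f} per 9, Ramirez: {ramirez_rate:.2f} per 9")
print(f"Chipper saw {chipper_rate / ramirez_rate:.0%} of Ramirez's rate")
```

That is roughly two chances per game against more than three, so the gap in the quoted totals is not an artifact of playing time.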

It is hard to believe, but I am certain it happens.  Jeter’s lack of GBs is less severe, but real nonetheless. 

There used to be a system where you could look at all BIPs for a pitcher, and I wrote about the difference between Pettitte and Clemens going to the Astros and who threw more GB-2B.  Surprisingly, through the 2B zone, Pettitte did.  One would expect Clemens to, but he didn’t.

I’ll have to locate the data and post more later, but such events do happen.  The largest issue is that few players get so many seasons, and so the fluctuations year-to-year for three years get handwaved away.


#144    Tangotiger      (see all posts) 2010/06/07 (Mon) @ 21:38

Why would Chipper get fewer opps than other 3B behind the same pitchers?


#145    Tangotiger      (see all posts) 2010/06/07 (Mon) @ 22:40

Chipper + Glavine: 4318 BIP, .083 outs/BIP
Other3B + Glavine: 5380 BIP, .088 outs/BIP
Difference with Glavine: -.005 outs/BIP.

Chipper’s 10 most frequent pitchers:
BIP     diff      Pitcher
4318    (0.005)   glavt001
4205    (0.002)   maddg002
3816    (0.009)   smolj001
1946    (0.019)   millk004
1798    (0.004)   hudst001
1203     0.003    neagd001
879     (0.002)   jurrj001
832     (0.008)   burkj001
741      0.000    avers001
736      0.015    thomj005
19883   (0.005)   REST

Other than Jurrjens, each of the top 10 had between 2200 and 6000 BIP with someone other than Chipper at 3B.

Now, you can probably make a case for each pitcher individually, that perhaps the “other 3B” was some great fielding 3B, or that when they had Chipper, they were on the downside of their careers, or etc, etc, etc.  But, we’re talking the equivalent of about 10 full seasons at 3B for Chipper.  Excuses in isolation, however justified, don’t hold up when amalgamated.
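
The amalgamation can be sketched as a BIP-weighted average of the per-pitcher differences in Chipper's list above. This is only an illustration: weighting each pitcher by his BIP with Chipper at 3B is an assumption (the comment does not spell out how WOWY pools the pitchers), and the parenthesized values are read as negatives.

```python
# (BIP with Chipper at 3B, outs/BIP difference vs. other 3B), copied from the list above.
# Parenthesized values in the list are entered here as negatives.
chipper_rows = [
    (4318, -0.005),   # glavt001
    (4205, -0.002),   # maddg002
    (3816, -0.009),   # smolj001
    (1946, -0.019),   # millk004
    (1798, -0.004),   # hudst001
    (1203, 0.003),    # neagd001
    (879, -0.002),    # jurrj001
    (832, -0.008),    # burkj001
    (741, 0.000),     # avers001
    (736, 0.015),     # thomj005
    (19883, -0.005),  # REST
]


def weighted_diff(rows):
    """BIP-weighted mean of the outs/BIP differences (assumed weighting scheme)."""
    bip_total = sum(bip for bip, _ in rows)
    return sum(bip * diff for bip, diff in rows) / bip_total


total_bip = sum(bip for bip, _ in chipper_rows)
overall = weighted_diff(chipper_rows)
print(f"{total_bip} BIP, {overall:+.4f} outs/BIP")
```

Under that weighting the pooled difference is about -.005 outs per BIP over roughly 40,000 BIP, or around 20 plays per 4000 BIP.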

***

Here’s Rolen’s top 10:
BIP diff Pitcher
1960 (0.003) schic002
1400 (0.008) wolfr001
1273 0.026 morrm001
1266 0.010 persr001
1247 0.007 marqj001
1174 0.012 suppj001
1169 0.001 willw001
1155 0.020 carpc002
884 0.010 stepg001
849 (0.004) hallr001


#146    Guy      (see all posts) 2010/06/07 (Mon) @ 23:00

Chris:
From 1999-2001, Chipper had an average range factor per 9 of 2.33.  Then 34-yr-old Vinny Castilla arrived, and posted a 2.66 RF9 over the next 2 years.  Castilla was making .3 more plays a game, which is about a 30-run difference over a season.  I suppose Braves pitchers suddenly started pitching differently?  And let me guess:  in 2004, they once again began inducing balls to be hit anywhere but near Chipper when he returned to 3B.

The story is just silly.  And if Chipper and Jeter weren’t huge and widely-admired stars, absolutely no one would take it seriously.


#147    Tangotiger      (see all posts) 2010/06/07 (Mon) @ 23:09

Rolen’s SS made 11.9% outs on BIP.

Beltre’s SS made 12.1% outs.

Chavez’s SS made 12.9% outs.

Taking the 30 3B that played the most from 1993-2009, here is how many outs their SS made:
0.116 bross001
0.116 rodra001
0.117 camik001
0.118 lowem001
0.119 belld002
0.119 ventr001
0.119 roles001 <--
0.119 wrigd002
0.119 boona001
0.120 koskc001
0.120 frymt001
0.120 glaut001
0.121 felip001
0.121 belta001 <--
0.121 jonec004 <--
0.122 castv001
0.122 zeilt001
0.122 hayec001
0.122 muelb001
0.122 batit001
0.123 cirij001
0.124 sprae001
0.125 willm003
0.125 moram002
0.126 ramia001
0.126 alfoe001
0.126 randj002
0.127 palmd002
0.129 chave001 <--
0.129 credj001


#148    Tangotiger      (see all posts) 2010/06/07 (Mon) @ 23:10

Notice the top 2 have some famous SS in common.

Rolen could be ball hogging about .003 outs per BIP from his SS.

Chavez is either being the victim of ballhogging or Oakland Coliseum is very nice to its SS.  I’d have to look.

And Chipper Jones’ SS haven’t had any problems making outs, being 15th out of 30 SS.  So, I ask again: what’s more likely, that Jones has received some disproportionate number of opps while his SS have received the normal amount, or that Chipper is just not making the plays?

What I like about these charts is that it’s just factual stuff, counting bip and counting outs.  (Other than the issue with the bunts.)
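
The ".003 outs per BIP" figure for Rolen can be reproduced from the list in #147. This is a rough sketch, assuming the baseline is the unweighted mean of the 30 listed rates (the comment does not say which baseline was used) and using the 4000 BIP per year from the article at the top of the thread.

```python
from statistics import mean

# Outs/BIP made by the shortstops playing alongside each 3B, copied from #147.
ss_out_rate = {
    "bross001": 0.116, "rodra001": 0.116, "camik001": 0.117, "lowem001": 0.118,
    "belld002": 0.119, "ventr001": 0.119, "roles001": 0.119, "wrigd002": 0.119,
    "boona001": 0.119, "koskc001": 0.120, "frymt001": 0.120, "glaut001": 0.120,
    "felip001": 0.121, "belta001": 0.121, "jonec004": 0.121, "castv001": 0.122,
    "zeilt001": 0.122, "hayec001": 0.122, "muelb001": 0.122, "batit001": 0.122,
    "cirij001": 0.123, "sprae001": 0.124, "willm003": 0.125, "moram002": 0.125,
    "ramia001": 0.126, "alfoe001": 0.126, "randj002": 0.126, "palmd002": 0.127,
    "chave001": 0.129, "credj001": 0.129,
}

baseline = mean(list(ss_out_rate.values()))  # assumed baseline: unweighted mean
BIP_PER_YEAR = 4000  # figure used in the article at the top of the thread

for third_baseman in ("roles001", "belta001", "jonec004", "chave001"):
    gap = ss_out_rate[third_baseman] - baseline
    print(f"{third_baseman}: {gap:+.3f} outs/BIP, {gap * BIP_PER_YEAR:+.0f} plays/yr")
```

On that baseline Rolen's shortstops come in about .003 below the group, roughly 11 plays per 4000 BIP.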


#149    Rally      (see all posts) 2010/06/07 (Mon) @ 23:12

I don’t know why Chipper got fewer opportunities, but he did.

To answer Guy’s questions, I looked at all groundballs from 95-99 hit to a zone where the 3B has a chance of making a play.  The zones are: 5,56,56D,56S,5D,5L, and 5S.  You can go to retrosheet’s documentation to see where those zones are.  Some are fairly rare, more than 70% are in the 5 (straight at 3B) and 56 (3b-ss hole) zones.

24.2% of ground balls in play were hit to the combination of these 3B zones.  When Chipper was on the field it was only 22.8%.
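
The share calculation can be sketched as follows. The zone codes are the ones listed above; the function name and the toy sample are made up for illustration and are not real data.

```python
# Retrosheet hit-location codes listed above as reachable by a 3B.
THIRD_BASE_ZONES = {"5", "56", "56D", "56S", "5D", "5L", "5S"}


def third_base_share(ground_ball_locations):
    """Fraction of located ground balls in play that fall in a 3B zone."""
    located = [loc for loc in ground_ball_locations if loc]
    if not located:
        return float("nan")
    in_zone = sum(1 for loc in located if loc in THIRD_BASE_ZONES)
    return in_zone / len(located)


# Hypothetical toy sample (not real data): 3 of 10 ground balls land in 3B zones.
sample = ["5", "56", "6", "4", "34", "6", "5S", "3", "6M", "4M"]
print(third_base_share(sample))
```

Running the same function once over all ground balls and once over those with Chipper on the field would give the two percentages being compared.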

I’ll look at Jeter if I get chance on Wednesday.


#150    Guy      (see all posts) 2010/06/07 (Mon) @ 23:39

Rally:  Don’t mean to be a pain, but can you run those #s as % of all BIP, rather than of GBs?  That’s the relevant question.  Since Glavine, Smoltz, and (especially) Maddux were above-avg GB pitchers in those years, the percentage of all BIP that are GBs in Chipper’s zones could still be close to average.

Then we of course have LDs and foul pops to consider, which may also be weaknesses for Chipper.


#151    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 07:20

I took all battedball events from 1995-1999, included those with this data:

e.event_cd in (2, 18, 19, 20, 21, 22)
or
(e.event_cd = 23 and fld_cd between 1 and 9)

and excluded those with no data in the battedball_loc_tx field
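
For anyone working outside SQL, the same filter can be written as a Python predicate. This is a sketch, assuming the three columns arrive as an int, an int and a string; the function name is hypothetical, and the code meanings in the comments follow the usual Retrosheet event-type numbering as I understand it.

```python
# Event types kept by the filter: 2 = generic out, 18 = error,
# 19 = fielder's choice, 20/21/22 = single/double/triple.
BIP_EVENT_CODES = {2, 18, 19, 20, 21, 22}
HOME_RUN = 23  # kept only when a fielder (1-9) is recorded, presumably inside-the-park


def is_located_batted_ball(event_cd, fld_cd, battedball_loc_tx):
    """True when the event passes the filter above and has a hit location."""
    in_play = event_cd in BIP_EVENT_CODES or (
        event_cd == HOME_RUN and 1 <= fld_cd <= 9
    )
    has_location = bool(battedball_loc_tx and battedball_loc_tx.strip())
    return in_play and has_location


print(is_located_batted_ball(20, 0, "56"))  # single with a recorded location
```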

***

I broke up the data between RHH and LHH.

***

With a RHH, 19.4% of all BIP had one of the “5” loc fields that Rally noted.

With a LHH, it was 7.3%.

That was for MLB.

Chipper Jones was at 16.7% (negative SEVEN! SD from the mean) for RHH and 8.5% for LHH (plus THREE SD from the mean).
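
One way to get an "SD from the mean" figure like this is a binomial z-score against the league rate. That method is an assumption here, since the comment does not say how the SD was computed, and the BIP count below is a hypothetical placeholder because Chipper's RHH sample size is not given.

```python
from math import sqrt


def share_z_score(player_share, league_share, player_bip):
    """Binomial z-score of a player's zone share against the league share."""
    sd = sqrt(league_share * (1 - league_share) / player_bip)
    return (player_share - league_share) / sd


# 16.7% and 19.4% are from the comment; 11,000 BIP is a hypothetical
# sample size, not a figure from the thread.
z_rhh = share_z_score(0.167, 0.194, 11_000)
print(round(z_rhh, 1))
```

With a sample on that order, a gap of 2.7 percentage points works out to roughly seven standard deviations.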

***

Ken Caminiti is plus 4 SD and plus 5 SD for the two hands.

***

If I take the SD of the z-scores, I get 2.4 for RHH and 1.6 for LHH.  Seeing that I didn’t control for batters faced (other than hand), or pitchers, it’s plausible, but, sheesh, it sure doesn’t feel like someone can have that disproportionate a set of batters faced.

***

Anyway, it seems more plausible to me that Chipper’s locations are not being marked properly.


#152    Rally      (see all posts) 2010/06/08 (Tue) @ 09:50

Where were they being marked then?  That group of zones covers everything from the left field line to the shortstop’s position.

Calling a hit into the 5 zone - normally an easy chance for a 3B - as a ‘56’ if he doesn’t make it, that is possible and an example of scorer bias.  But if a ball has any chance of being fielded by the 3B, and it doesn’t go into one of the zones above, that can only happen by complete scorer incompetence.  I find that much harder to believe than Chipper having fewer chances than most 3B.


#153    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 10:23

There can also be sloppy recording practices, as Chris pointed out regarding bunts/nobunts.

Here’s the data against RHH for 1995-99:
BIP_RT SD POS5_FLD_ID
0.216 5.1 camik001
0.207 3.2 frymt001
0.204 2.8 castv001
0.209 2.7 berrs001
0.206 2.1 holld001
0.205 1.9 alfoe001
0.203 1.7 andrs001
0.201 1.7 cirij001
0.201 1.5 randj002
0.200 1.4 willm003
0.197 0.6 bross001
0.196 0.4 davir002
0.196 0.4 gaetg001
0.196 0.4 valej002
0.194 0.0 naeht001
0.194 0.0 ripkc001
0.193 (0.1) pendt001
0.193 (0.2) oriek001
0.191 (0.6) muelb001
0.191 (0.6) hayec001
0.188 (1.1) blowm001
0.186 (1.3) bonib001
0.184 (2.0) roles001
0.185 (2.2) zeilt001
0.183 (2.2) boggw001
0.185 (2.3) palmd002
0.180 (2.7) tatif001
0.182 (3.0) ventr001
0.179 (3.6) sprae001
0.167 (7.2) jonec004
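
The spread of the SD column can be checked directly. If opportunities differed only by binomial noise, these z-scores should have a standard deviation near 1. The sketch below uses the population SD, which is an assumption (the sample SD gives nearly the same answer); parenthesized values are entered as negatives.

```python
from statistics import pstdev

# SD column from the RHH table above, top to bottom.
z_scores = [
    5.1, 3.2, 2.8, 2.7, 2.1, 1.9, 1.7, 1.7, 1.5, 1.4,
    0.6, 0.4, 0.4, 0.4, 0.0, 0.0, -0.1, -0.2, -0.6, -0.6,
    -1.1, -1.3, -2.0, -2.2, -2.2, -2.3, -2.7, -3.0, -3.6, -7.2,
]

spread = pstdev(z_scores)  # population SD, assumed
print(round(spread, 1))
```

The result lands at about 2.4, in line with the RHH figure quoted in #151.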


#154    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 10:32

Chipper Jones was at 16.7% (negative SEVEN! SD from the mean) for RHH and 8.5% for LHH (plus THREE SD from the mean).

This also means that there could be a lot of pitches thrown outside that the hitters are hitting to the opposite field.

What I’ll do when I get home, if I can, is break down the distribution by pitchers, with and without Chipper.


#155    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 10:59

And by the way, I’m not expecting to find anything about the pitcher distribution because, well, I’ve already shown the outs made by 3B other than Chipper is pretty typical by Chipper’s pitchers.

What we’re going to be left with is that pitchers are purposely throwing outside when Chipper is on the field, and not throwing outside as much when Chipper is not on the field.


#156    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 11:08

Here’s another food for thought.  Let’s say that when you have a RHH, the Braves’ shortstop plays a bit more in the hole than the typical MLB shortstop.  He’ll be in a position to get to a lot of “56” location plays.

The stringer, seeing the SS make the play will mark him as making a “6” location play. 

So, all those true “56” location plays are being marked as “6” location plays.

I think it’s more plausible that the SS is cheating by a few feet toward Chipper and/or the stringer is biasedly marking “56” location plays as “6” plays if the SS makes a play on it.

Isn’t that plausible?

We already have seen evidence of this kind of bias with outfield lineouts.  Why not with Chipper and Jeter?


#157    Guy      (see all posts) 2010/06/08 (Tue) @ 11:50

"Isn’t that plausible?”

Totally.  It also seems quite possible that great 3B make some plays in the 6 and 6S zones, but these probably get scored as 56 most of the time (thus increasing the implied opportunities for those 3B). 

Similarly, it could be that a lot of Chipper’s 5 balls get coded as 56, because of his poor range and/or because he plays a bit closer to the line than average (Braves might decide he can’t go to his left anyway and they may as well prevent as many xbh as possible).  This won’t affect Rally’s total BIP #, but will affect how the metrics estimate his opportunities.  With Chipper on the field, balls to left side of IF break down 21%/35%/44% for zones 5/56/6.  Is that typical?

Tango, can you link to the evidence you’ve been referring to on bias in outfield plays?


#158    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 12:47

It seems to me what Tango posts in #151 says EXACTLY what we are saying.  The balls aren’t being hit there. 

Secondly, 5 vs 56 aren’t particularly relevant to ZR. 

Regardless of “why”, the fact is, there are fewer BIP to the left side of the infield when Chipper is playing.  Tango’s data says that exact thing. 

Guy, I would appreciate if you wouldn’t make the ridiculous inference that AROM and I “believe this because Jeter and Chipper are popular”.  You ought to have more respect for our integrity regarding the data than that.  *I*, ME, Chris Dial was the first to make the claim that Chipper got fewer chances and that was why his ARF was off.  Me.  You cannot find an earlier reference to that.  And I made that claim when Chipper was in the league about three seasons.  It has NOTHING to do with his status whatsoever, nor is it recent.


#159    Rally      (see all posts) 2010/06/08 (Tue) @ 13:03

And that is also coming from a Mets fan, who by all rights should hate Chipper Jones.

This discussion is moving slowly on my end.  I get to play with my retrosheet stuff no earlier than 9 PM, and for a few hours at most.  Nothing tonight, as I’m watching some rookie pitcher to see if he has what it takes.

It would help the discussion if more people set up retrosheet databases.  There are plenty of guides on the internet for doing that, and really, it’s not that hard.  Don’t ask me.  Ask retrosheet yourself.  I’ve got enough to keep myself busy.


#160    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 13:22

WRT Vinny Castilla and Chipper:
in 2001 and 2004, the Braves threw 4217 GBs (including bunts).  In 2002-2003, the Braves threw 4434 GBs (w/ bunts).  So, yes, in the years you were pointing to, the Braves *DID* throw more GBs in Castilla’s years.

Wow, looking at Chipper’s page, Chipper’s RF/9 in 2001 was 2.14, and in 2004, it was 2.64, getting your 2.35 average, but seriously?  It turns out in 2001, the Braves threw 1961 GBs and in 2004, they threw 2256.  And Castilla, in 2002 had a 2.44 with 2087 GB and 2.88 RF/9 in 03 with 2346 GBs.

So, yes, the Braves threw more GBs to Castilla.


#161    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 13:35

YEAR    playerID    RS    Innings    CH    PM    CH/IP
2002    castivi02    2.0    1222.0    360.0    279.0    0.2946
    derosma01    0.0    28.0    6.0    5.0    0.2143
    helmswe01    2.0    151.3    31.0    26.0    0.2048
    gilesma01    1.0    62.3    17.0    14.0    0.2727
    lockhke01    0.0    3.7    1.0    1.0    0.2727
2002 Total        5.0    1467.3    415.0    325.0    0.2828
2003    castivi02    4.0    1266.3    431.0    333.0    0.3404
    derosma01    -3.0    169.3    66.0    46.0    0.3898
    hessmmi01    0.0    11.0    4.0    3.0    0.3636
    garcije01    0.0    9.7    2.0    2.0    0.2069
2003 Total        1.0    1456.3    503.0    384.0    0.3454
2004    jonesch06    12.0    802.0    235.0    196.0    0.2930
    derosma01    -2.0    556.0    169.0    128.0    0.3040
    betemwi01    1.0    39.0    7.0    7.0    0.1795
    hessmmi01    1.0    29.0    17.0    14.0    0.5862
    garcije01    -1.0    12.0    5.0    3.0    0.4167
    greenni01    0.0    12.0    2.0    2.0    0.1667
2004 Total        11.0    1450.0    435.0    350.0    0.3000
2005    jonesch06    5.0    830.3    233.0    190.0    0.2806
    betemwi01    4.0    431.0    124.0    103.0    0.2877
    orrpe01    0.0    45.7    16.0    13.0    0.3504
    martean01    -4.0    130.7    25.0    15.0    0.1913
    gilesma01    1.0    6.0    6.0    6.0    1.0000
2005 Total        6.0    1443.7    404.0    327.0    0.2798
2006    jonesch06    -4.0    888.3    247.0    189.0    0.2780
    betemwi01    1.0    203.7    51.0    41.0    0.2504
    aybarwi01    1.0    241.3    49.0    40.0    0.2030
    pradoma01    -2.0    38.0    8.0    4.0    0.2105
    orrpe01    1.0    64.0    18.0    15.0    0.2813
    penabr01    0.0    3.0    1.0    1.0    0.3333
    penato02    0.0    3.0    0.0    0.0    0.0000
2006 Total        -3.0    1441.3    374.0    290.0    0.2595
2007    jonesch06    6.0    1080.7    300.0    239.0    0.2776
    pradoma01    0.0    44.0    10.0    8.0    0.2273
    orrpe01    -1.0    63.0    20.0    14.0    0.3175
    escobyu01    1.0    159.3    44.0    35.0    0.2762
    woodwch01    1.0    100.3    16.0    13.0    0.1595
    harriwi01    0.0    9.0    4.0    3.0    0.4444
2007 Total        7.0    1456.3    394.0    312.0    0.2705
2008    jonesch06    9.0    987.3    295.0    242.0    0.2988
    pradoma01    1.0    158.7    58.0    47.0    0.3655
    infanom01    2.0    228.7    72.0    59.0    0.3149
    gotayru01    -1.0    64.0    17.0    12.0    0.2656
    lillibr01    0.0    2.0    2.0    1.0    1.0000
2008 Total        11.0    1440.7    444.0    361.0    0.3082
2002-2008 Total        38.0    10155.7    2969.0    2349.0    0.2923

If this is at all legible, you can see from 2002 on (where we have chances for ZR, rather than an estimate of chances), that Chipper is up and down wrt the other 3B in a given season.  Some years his ZR chances per IP is higher and some lower.  The high GB year of 2003 seems to be significant enough to skew the data.  There doesn’t appear to be a bias that some 3B get the BOD and others do not (see the Mark DeRosa years).
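
The "up and down" pattern can be checked from the table by comparing Chipper's chances per inning with the combined rate of the other Braves 3B in the same season. This is a sketch using only the Chipper rows and the season totals for 2004-2008; it assumes innings such as 830.3 mean 830 1/3 (the table's last column equals CH divided by innings under that reading), and the helper name is made up.

```python
def innings(text):
    """Convert an innings string like '830.3' (830 1/3) to a float."""
    whole, _, frac = text.partition(".")
    thirds = {"": 0, "0": 0, "3": 1, "7": 2}[frac]
    return int(whole) + thirds / 3


# (season, Chipper innings, Chipper CH, team innings, team CH) from the table above.
seasons = [
    (2004, "802.0", 235, "1450.0", 435),
    (2005, "830.3", 233, "1443.7", 404),
    (2006, "888.3", 247, "1441.3", 374),
    (2007, "1080.7", 300, "1456.3", 394),
    (2008, "987.3", 295, "1440.7", 444),
]

chipper_higher = []
for year, c_inn, c_ch, t_inn, t_ch in seasons:
    chipper_rate = c_ch / innings(c_inn)
    # Everyone else at 3B that season: team total minus Chipper's line.
    others_rate = (t_ch - c_ch) / (innings(t_inn) - innings(c_inn))
    if chipper_rate > others_rate:
        chipper_higher.append(year)
    print(year, round(chipper_rate, 4), round(others_rate, 4))
```

On these rows Chipper's chances per inning are above his replacements' in 2005 through 2007 and below them in 2004 and 2008.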


#162    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 13:47

I seem to remember reading an article, maybe by Harry, at Hardball Times, maybe about 3 months ago?  Something like that.  I’ll check a bit later if someone else hasn’t found it.


#163    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 13:49

Can we at least agree that we should stop saying this:
“The balls aren’t being hit there.”

And start saying this:
“The balls aren’t being recorded there.”


#164    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 13:49

An article about what?


#165    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 13:50

Sure.  As long as sweeping assertions in the other direction are foregone.


#166    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 13:51

Am I the only one who “makes something” of the verification code?  The last two were “without45”, meaning Michael Jordan was retiring, and “hold61” meaning Livan has a runner on first.


#167    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 15:18

Harry’s article about line drives is linked to from here:

http://www.insidethebook.com/ee/index.php/site/comments/wowy_does_ichiro_hate_the_stringers/


#168    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 15:46

I don’t see what that’s supposed to tell us.  It seems to say that MGL and others say stringer bias isn’t the answer (which is what you and Guy are claiming here as well).

And GBs and FB/LDs are distinctly different.


#169    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 16:04

I am saying that there is stringer bias, be it in figuring out FB from LD or a 56 hit location from 6.  And it’s based on whether the player makes a play on the ball or not. 

That if a player makes a play, the stringer is biased to record it one way, and if a player doesn’t make the play, the stringer is biased to record it another way.

This bias exists.  The question is the extent to which it exists.


#170    Guy      (see all posts) 2010/06/08 (Tue) @ 16:21

"Guy, I would appreciate if you wouldn’t make the ridiculous inference that AROM and I “believe this because Jeter and Chipper are popular”. 

Totally fair point.  I don’t believe that’s why you or Rally are arguing your positions, and I shouldn’t have implied it.  Though I do think it’s true that your claims (and UZR ratings) would be met with much more skepticism from the saber community if these players weren’t stars.

I will say though that the way these discussions proceed, at least at BTF, is to assume UZR or TZ is “truth,” or at least the best approximation of truth we have.  And the burden of proof is on anyone who wants to argue the ‘advanced metrics’ are wrong.  In my view, the only facts in fielding are the outs—and Jeter and Chipper don’t make them.  Everything else is opinion and judgment.  The burden of proof should be on those who want to argue that these are average fielders despite their failing to make outs for 15 years.  Just as the burden would fall on anyone trying to argue they are really just average hitters who faced an unusually “easy” array of pitches over their careers.


#171    Guy      (see all posts) 2010/06/08 (Tue) @ 16:32

I took a look at Pinto’s PMR data, to see what the BIS data suggests about fielding opportunities at 3B.  I don’t think PMR is as good as UZR, especially in the way fielders are evaluated (e.g. PMR counts balls against fielders when a teammate makes an out).  But I think its basic expected outs calculation should reflect how easy/hard a player’s opportunities were, according to the BIS data.  For 2006-2008, I measured how many expected outs each 3B had compared to league average, per 4000 BIP (minimum 4000 BIP).  The list is below. 

To me, it looks like good fielders are in general being assigned more opportunities.  Obviously, that’s not all that’s going on—there are ballpark and pitcher effects, and random variation too.  But the question is what to measure this against.  It can’t just be outs made, as we expect players with easier chances to make more outs.  Maybe Tango could match this to the fan ratings? 

PLAYER / EXPECTED OUTS +/-
Inge 51
Crede 32
Bautista 32
Punto 21
Chavez 20
Rolen 20
Mora 18
Lowell 16
Wigginton 16
Ensberg 12
Feliz 12
Blake 8
Gordon 0
Glaus -1
Beltre -3
Zimmerman -7
Wright -10
Chipper -11
Hannahan -12
Cabrera -13
A. Ramirez -14
Atkins -16
Kouzmanoff -18
Tracy -19
Encarnacion -19
Blalock -27
A-Rod -28
Reynolds -29
Braun -31
Figgins -46

(Note:  Braun has only 2886 BIP)


#172    Peter Jensen      (see all posts) 2010/06/08 (Tue) @ 16:33

Stringers recording a ground ball out to the SS in zone 6 instead of zone 56 accounts for 11 chances fewer for Chipper than normal for the period 1995-1999.  The imprecision of including all BIP in the WOWY metric instead a logical subset of BIP relevant to 3B fielding skill is responsible for much more error even for a five year time period.


#173    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 16:41

Zimmerman, Beltre, and Figgins are the ones that stand out as pretty low in that list.

***

Why just a groundball out?  Why not any plays in zone 6?


#174    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 16:48

Ok, how about we do this:

In Chipper-games against RHH, show the % of BIP that were turned into outs by each of the 9 fielders.

And in non-Chipper Braves games against RHH, do the same thing.  (Tonight, I’ll do it for his pitchers without Chipper.)

Ideally, we’ll see out rates that are similar.  Does someone want to do that?

If this is not the case, if the out rates are shifted away from Chipper, then fine, that tells us a lot. 

But, if the out rates are consistent, the issue therefore is: where did the non-outs go?  We’re likely going to see a disproportionate number of hits not made near Chipper.  And if that’s the case, then that’s the hard sell, that even though the out distribution looks normal, the (recorded) hit distribution will be skewed.
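
The comparison proposed here can be sketched as below. The input format (one entry per BIP, holding the position number of the fielder credited with the out, or None when no out was made) and the function names are assumptions, and the two samples are toy data, not real counts.

```python
from collections import Counter


def out_rates_by_position(plays):
    """Outs per BIP for each of the 9 positions; None marks a BIP with no out."""
    plays = list(plays)
    if not plays:
        return {pos: 0.0 for pos in range(1, 10)}
    outs = Counter(pos for pos in plays if pos is not None)
    return {pos: outs[pos] / len(plays) for pos in range(1, 10)}


def rate_shift(with_player, without_player):
    """Difference in per-position out rates: with the player minus without."""
    a = out_rates_by_position(with_player)
    b = out_rates_by_position(without_player)
    return {pos: a[pos] - b[pos] for pos in range(1, 10)}


# Hypothetical toy samples: one out moves from the 3B (5) to the SS (6).
with_chipper = [5, 6, 6, None, 4, None, 8, 6, None, 3]
without_chipper = [5, 5, 6, None, 4, None, 8, 6, None, 3]
shift = rate_shift(with_chipper, without_chipper)
print({pos: round(delta, 3) for pos, delta in shift.items() if delta})
```

If the shift is near zero at every position, the remaining question is where the recorded non-outs went.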


#175    Guy      (see all posts) 2010/06/08 (Tue) @ 16:50

"Stringers recording a ground ball out to the SS in zone 6 instead of zone 56 accounts for 11 chances fewer for Chipper than normal for the period 1995-1999.”

Peter, how can you know this?


#176    Rally      (see all posts) 2010/06/08 (Tue) @ 17:08

"In my view, the only facts in fielding are the outs—and Jeter and Chipper don’t make them.  Everything else is opinion and judgment.  The burden of proof should be on those who want to argue that these are average fielders despite their failing to make outs for 15 years.”

It is convenient how your view puts burden of proof anywhere else.  In my view, show me the hits.  If Jeter/Chipper are as terrible as adjusted zone rating shows them to be, then there should be hundreds of extra hits in their areas of responsibilities.

And nobody is arguing Jeter was an average fielder.  Every metric has him below average, and I have shown a number of extra hits in his zone from 96-99. The question is how bad?  Was he a -15 fielder or a -50.


#177    Peter Jensen      (see all posts) 2010/06/08 (Tue) @ 17:21

Guy - I calculated it.


#178    Peter Jensen      (see all posts) 2010/06/08 (Tue) @ 18:39

Guy - Sorry.  I was a little curt with that answer.  Tango defined the question in post #156. A SS playing with Jones at 3B getting GB outs placed in zone 6 instead of 56 where they actually were.  So I counted number of GB outs that Jones SS got in zone 56 and 56D (164).  I also counted the number of outs that he got total in zones 56 and 56D and 6 and 6D (1164).  I then did the same for all SS.  For all SS I took the percentage of 56D + 56 GB outs (5926) divided by 56 + 56D + 6 + 6D GB outs (39764), and multiplied that rate (.14993) times the number of 56D + 56 + 6 + 6D GB outs that Jones SS had (1164).  The result (174.5) was the expected number of outs that an average SS would have been credited with if he had Jones SS opportunities.  Since Jones SS actually had 164 I determined that the scorer may have shifted 11 balls from zone 56 to zone 6 as Tango thought.
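
The arithmetic in this comment can be retraced in a few lines. One caveat: the league counts as printed (5926 and 39764) give a rate nearer .1490 than the quoted .14993, so one of the figures may contain a typo. The sketch runs both versions and changes none of the numbers; variable names are made up.

```python
jones_ss_hole_outs = 164   # GB outs in zones 56 + 56D by Chipper's SS
jones_ss_all_outs = 1164   # GB outs in zones 56 + 56D + 6 + 6D by Chipper's SS
league_hole_outs = 5926    # all SS, zones 56 + 56D (as printed)
league_all_outs = 39764    # all SS, zones 56 + 56D + 6 + 6D
quoted_rate = 0.14993      # rate as quoted in the comment


def expected_hole_outs(rate, total_outs):
    """Outs an average SS would be credited with in the hole zones."""
    return rate * total_outs


printed_rate = league_hole_outs / league_all_outs
shortfall_quoted = expected_hole_outs(quoted_rate, jones_ss_all_outs) - jones_ss_hole_outs
shortfall_printed = expected_hole_outs(printed_rate, jones_ss_all_outs) - jones_ss_hole_outs
print(round(shortfall_quoted, 1), round(shortfall_printed, 1))
```

Either way the implied shift from zone 56 to zone 6 is on the order of 10 balls over the five years.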


#179    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 22:19

Though I do think it’s true that your claims (and UZR ratings) would be met with much more skepticism from the saber community if these players weren’t stars.

They *were* a decade ago.  Hell, they were ignored enough for me to write a piece at Baseball Analysts 10 years after I first made the claim.  It has routinely been ignored by sabermetricians.  Jeter *maybe* gets the BOD but Chipper doesn’t, short of citing my work.


#180    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 22:21

Ok, how about we do this:

In Chipper-games against RHH, show the % of BIP that were turned into outs by each of the 9 fielders.

I don’t buy this.  The only thing that matters is the *actual* BIP distribution.  Pct of outs of all outs doesn’t have meaning for this player even if it works for 99.9999% of players. 

MGL *has* this data.  We don’t have to chase this.


#181    Chris Dial      (see all posts) 2010/06/08 (Tue) @ 22:23

But, if the out rates are consistent, the issue therefore is: where did the non-outs go?  We’re likely going to see a disproportionate number of hits not made near Chipper.  And if that’s the case, then that’s the hard sell, that even though the out distribution looks normal, the (recorded) hit distribution will be skewed.

I can tilt your way if you have skewed GB hits to LF.


#182    Tangotiger      (see all posts) 2010/06/08 (Tue) @ 23:41

The only thing that matters is the *actual* BIP distribution.  Pct of outs of all outs doesn’t have meaning for this player

As I keep saying, we don’t have actual BIP distribution.  If we have actual BIP distribution, we’d have no issue.

What we do have is RECORDED BIP distribution, which is not the same thing.

We also have recorded outs, which at least is a factual piece of evidence, and since, shift aside, a SS plays in the SS position, we know what the out distribution is.

We know the RECORDED hit distribution.  We have to prove whether the recorded data is conceivably also the actual data.


#183    Peter Jensen      (see all posts) 2010/06/08 (Tue) @ 23:53

What we do have is RECORDED BIP distribution, which is not the same thing.

Tango - I think it is up to you to show that the recorded BIP ISN’T the actual distribution.  So far, you have hypothesized one scenario of possible bias and I have shown why that source of bias would only account for a shift of around 11 chances.  Meanwhile, I have computed successively more accurate estimates of actual chances using more and more precise subsets of BIP data.  The problem lies with your using Outs/BIP for WOWY when there are better, more logical data subsets available.


#184    Chris Dial      (see all posts) 2010/06/09 (Wed) @ 08:42

As I keep saying, we don’t have actual BIP distribution.  If we have actual BIP distribution, we’d have no issue.

What we do have is RECORDED BIP distribution, which is not the same thing.

This isn’t exactly true.  We have actual BIP distribution if not to the granularity we’re looking at.  We have hits to LF.  Or is it your position that balls hit to the left of second base may well have been hit to the right of second base?

No system I know of uses a *lone* scorer.  Every system attempts to minimize this bias through the use of multiple scorers.  There needs to be better evidence (or any evidence) that this does not work for BIP.  Heck, why would road scorers have the same bias as home scorers for Chipper?  MGL *has this data*.  He has it.


#185    Guy      (see all posts) 2010/06/09 (Wed) @ 08:44

"Tango - I think it is up to you to show that the recorded BIP ISN’T the actual distribution.”

Not surprisingly, I don’t agree.  There is sufficient doubt about the accuracy of the PBP data that I feel no obligation to accept it as ‘facts.’ If you consider humans’ enormous capacity for perceptual bias—such as the disparities in accounts you get from multiple eyewitnesses to a crime—the data is almost guaranteed to have problems.  And you haven’t “shown” this particular bias accounts for only 11 plays, because you’re assuming a lot of things not in evidence about Chipper’s SS’s opportunities.  Maybe there were many fewer than average balls truly hit to 6 zone with Chipper on the field. 

But there’s no point in continuing a debate that says “you prove it” followed by “no, you prove it.” Let’s just agree that neither WOWY nor the PBP data represents “truth,” the truth likely lies between them for most players, and we should all make a good faith effort to figure out where that is. 

*

“It is convenient how your view puts burden of proof anywhere else.”

It’s not “convenient”—it’s inherent in what I’m arguing.  I’m saying that BIP distribution will tend to even out over large samples, and that I consider the PBP data unreliable.  If you believe those things, then obviously you don’t believe the PBP data should be accepted as the starting point for fielding discussions.  That’s like my saying it’s “convenient” for you to profess faith in the Retrosheet data, when obviously you wouldn’t have done TZ in the first place if you didn’t believe that.

*

Rally/Chris:  what do you make of the list in #171?  Do you really find it plausible that A-Rod, Ramirez, Braun, Atkins, Chipper, Cabrera etc. all get many fewer than average opportunities, while Rolen, Chavez, Inge, Lowell, Feliz etc. all get more than average opportunities?


#186    Guy      (see all posts) 2010/06/09 (Wed) @ 08:51

"No system I know of uses a *lone* scorer.  Every system attempts to minimize this bias through the use of multiple scorers.”

Chris, I think one of the problems here is that we’re talking about different kinds of bias.  i think you’re talking about “pro-Chipper” or “pro-Atlanta” bias.  That could exist here and there, but that’s not really the issue.  The issue is systemic bias against all good fielders and in favor of all bad fielders.  How much of that exists is the central question.


#187    Chris Dial      (see all posts) 2010/06/09 (Wed) @ 09:04

Guy (#186),
no, I understand that, but why does Chipper stand out?  No other fielder sees what he sees.  Secondly, how do you know who is a good fielder and who is a bad one?  I score Chipper as a “good” one (or at least slightly above average).  WOWY (and other ARFs) score him as atrocious.  PBP systems score Chipper as about average.  ARFs score him as horrible.

All PBP systems have a 200 run bias to them?  That hypothesis doesn’t hold water with me.


#188    Chris Dial      (see all posts) 2010/06/09 (Wed) @ 09:12

Rally/Chris:  what do you make of the list in #171?  Do you really find it plausible that A-Rod, Ramirez, Braun, Atkins, Chipper, Cabrera etc. all get many fewer than average opportunities, while Rolen, Chavez, Inge, Lowell, Feliz etc. all get more than average opportunities?

You are cherrypicking names there.  I see Beltre with a -3, and he’s a top 3 guy.  What’s the correlation between the numbers you are putting out?  When I look at the correlation between ZR Chances and DRS, I get 0.004.  One doesn’t follow the other.  The causation arrow, as I pointed out in another thread, could be that *because* they get more expected “easy” outs, they *appear* to be better fielders.


#189    Tangotiger      (see all posts) 2010/06/09 (Wed) @ 09:26

Or is it your position that balls hit to the left of second base may well have been hit to the right of second base?

It could be that, it could be several things.  For example, bad angle by the scorer will place a ball 10 or 30 feet away from its actual position.  That’s a systematic bias.

Or, sloppy writing can mark a play to the LF gap as RF gap instead (or marking a bunt as a groundball, even though the scorer knows it was a bunt).  That’s a random bias.

Just comparing UZR using STATS as opposed to BIS has a 100 run difference for Andruw Jones over 7 years.  100!!  It has huge differences for many players (more prevalent in OF than IF), even though it’s the same engine processing the same games.  The difference is that the scorers (and scoring system) for STATS and BIS are different.  GIGO.

***

And it’s not just Chipper that is affected.  There are many players who stand out.

***

Therefore, I am not going to accept as truth the following:
1. subjective views of the scorers
2. the competence of the scorers to enter their subjective views through a computer program
3. the computer program to translate the entries of the scorers

At least with MLB official scorers, they go through checks and balances to ensure accuracy.  And EVEN THEN, Ruane at Retrosheet reports thousands of errors of things as simple as whether it was the LF or the RF that caught the ball.

***

The only thing I will accept as guaranteed factual data is the identity of the batter, the nine fielders, and the park.

Who actually first fielded the ball (for an out) is almost always factual, but has some degree of uncertainty.  Any bias is almost certainly random.

Who actually first fielded the ball (for a hit) is mostly factual, and has a degree of uncertainty. Any bias is likely random.

Which zone the ball was first fielded (or hit something, or whatever the definition is as the scorer thinks) is somewhat factual, has a high degree of uncertainty, and any bias is somewhat systematic.

The trajectory of the ball in flight (both in how it is viewed, and how it is accurately recorded based on that view) has some degree of subjectiveness, a pretty high degree of uncertainty (relative to the above), and the biases are going to be systematic.

***

The questions on the table are:
a. Who has the burden of proof for the accuracy and precision of the data: the guy supplying the data, or the guy analyzing the data? 

b. How much effect can all these biases actually have?

For me, a) is always on the supplier of the data.  As for b), well, that’s what we’re after here in this thread.


#190    Rally      (see all posts) 2010/06/09 (Wed) @ 09:39

I second the cherry-picking observation.  Wigginton is among those getting a lot of chances (one of the worst defensive infielders I’ve ever seen, at 3 positions).  And Figgins is right at the bottom.

“When I look at the correlation between ZR Chances and DRS, I get 0.004.”

Interesting.  A long time ago I looked at the correlation between zone rating chances per inning, and range factor.  I found a correlation higher than .9.  I really haven’t paid attention to range factor since that day.

“I’m saying that BIP distribution will tend to even out over large samples”

I obviously don’t believe that as much as you do.  I’m sure it evens out more over a career than the season differences, but even at the career level I don’t believe all players have faced the same hit distribution.  But I can’t put numbers on it.  This is the impasse.  I obviously can’t prove it to you if you don’t believe the PBP data.  And I don’t think there’s any way you can prove your position to me, since without looking at PBP, you just have to take it on faith that Chipper Jones over 15 years is seeing as many chances as Scott Rolen.

As far as burden of proof, it is disingenuous for either one of us to sit back and demand the other person push a rock uphill until a case is made.  It is my burden if I want to prove my belief to you, and it is your burden if you want to prove yours to me.


#191    Chris Dial      (see all posts) 2010/06/09 (Wed) @ 09:46

“Just comparing UZR using STATS as opposed to BIS has a 100 run difference for Andruw Jones over 7 years.  100!!  It has huge differences for many players (more prevalent in OF than IF), even though it’s the same engine processing the same games.  The difference is that the scorers (and scoring system) for STATS and BIS are different.  GIGO.”

Um, MGL adjusts the crap out of those numbers.  GIGO.


#192    Chris Dial      (see all posts) 2010/06/09 (Wed) @ 09:48

“And EVEN THEN, Ruane at Retrosheet reports thousands of errors of things as simple as whether it was the LF or the RF that caught the ball.”
....
“Who actually first fielded the ball (for an out) is almost always factual, but has some degree of uncertainty.”

Those are somewhat contradictory, no?


#193    Chris Dial      (see all posts) 2010/06/09 (Wed) @ 09:53

I think Guy makes ONE GOOD POINT (everything else he says is complete nonsense, glub glub).

I went to his list in #171, and I copied it and pasted it in Excel, and then went to my db for DRS and compared them.  I didn’t like what I found, so I’ll keep it to myself.  Not really.  The correlation (for these types of comparisons) was very high.

PLAYER          ExOut      DRS
Aramirez       -14.00    -9.00
A-Rod          -28.00    -8.00
Atkins         -16.00   -19.00
Bautista        32.00   -22.00
Beltre          -3.00    28.00
Blake            8.00   -12.00
Blalock        -27.00    -1.00
Braun          -31.00   -14.00
Cabrera        -13.00   -34.00
Chavez          20.00     1.00
Chipper        -11.00    11.00
Crede           32.00    20.00
Encarnacion    -19.00   -17.00
Ensberg         12.00     4.00
Feliz           12.00    42.00
Figgins        -46.00    -2.00
Glaus           -1.00    -9.00
Gordon           0.00   -16.00
Hannahan       -12.00     5.00
Inge            51.00    46.00
Kouzmanoff     -18.00    -8.00
Lowell          16.00    25.00
Mora            18.00    -4.00
Punto           21.00     8.00
Reynolds       -29.00   -12.00
Rolen           20.00    27.00
Tracy          -19.00    -1.00
Wiggington      16.00    -9.00
Wright         -10.00   -16.00
Zimmerman       -7.00    19.00

Correlation (ExOut vs. DRS): 0.477
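
For anyone who wants to check the figure without Excel, here is a minimal sketch that recomputes the Pearson correlation from the rows as posted. The helper name `pearson_r` and the `ROWS` layout are illustrative choices, not anything taken from the original spreadsheet.

```python
from math import sqrt

# (player, ExOut, DRS) transcribed from the table above, in the same order.
ROWS = [
    ("Aramirez", -14, -9), ("A-Rod", -28, -8), ("Atkins", -16, -19),
    ("Bautista", 32, -22), ("Beltre", -3, 28), ("Blake", 8, -12),
    ("Blalock", -27, -1), ("Braun", -31, -14), ("Cabrera", -13, -34),
    ("Chavez", 20, 1), ("Chipper", -11, 11), ("Crede", 32, 20),
    ("Encarnacion", -19, -17), ("Ensberg", 12, 4), ("Feliz", 12, 42),
    ("Figgins", -46, -2), ("Glaus", -1, -9), ("Gordon", 0, -16),
    ("Hannahan", -12, 5), ("Inge", 51, 46), ("Kouzmanoff", -18, -8),
    ("Lowell", 16, 25), ("Mora", 18, -4), ("Punto", 21, 8),
    ("Reynolds", -29, -12), ("Rolen", 20, 27), ("Tracy", -19, -1),
    ("Wiggington", 16, -9), ("Wright", -10, -16), ("Zimmerman", -7, 19),
]

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient (r, not r^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

ex_out = [row[1] for row in ROWS]
drs = [row[2] for row in ROWS]
r = pearson_r(ex_out, drs)
print(f"r = {r:.3f}")
```

Run on these 30 rows, it should land on the same 0.477 reported under the table.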

So now I have to question Expected Outs, and how/why that is calculated. 
But I think it confirms that, to some degree, players with easier chances get a higher score - which I suppose is “known”, but it bothers me about DRS.


#194    Tangotiger      (see all posts) 2010/06/09 (Wed) @ 09:54

Just to give you another example of finding errors where none should be: back around 2003 or 2004, I had access to the BIS data, and I did my own quality check.  BIS even had something as simple as whether Jim Thome walked or not wrong (BIS had him at 2 fewer walks). This is not a slight against BIS. They corrected it when I pointed it out, among several other factual corrections I made.

(I even offered to do a free data quality check in exchange for access to the data… I was turned down, which was rather disappointing.  I figured I’d be able to provide much more value to them than I’d get back.)

So, imagine then things that don’t have independent values to check against, whether it’s bad opinions on who/where the data landed, or an honest mistake of transcribing, or a bug in the software in translating.

As far as I’m concerned, they are in Beta-mode.


#195    Chris Dial      (see all posts) 2010/06/09 (Wed) @ 10:04

“As far as I’m concerned, they are in Beta-mode.”

Agree.  For me it is mainly because they are still settling in on how they want to score things. 

STATS is at least consistent.


#196    Peter Jensen      (see all posts) 2010/06/09 (Wed) @ 10:11

“The issue is systemic bias against all good fielders and in favor of all bad fielders.  How much of that exists is the central question.”

Yes, Guy, you have demonstrated how TZ’s computation of chances biases the metric against good fielders and for worse than average fielders.  And I think you are exactly right.  And I have shown how both TZ and UZR are also biased in their division of responsibility for base hits of adjacent fielders, which also underestimates the ability of good fielders and overestimates the bad.  I am not defending the methodology of either of these two metrics or the resulting numbers that they have computed for Chipper or Rolen.  And I have demonstrated through my prior writing that I have little confidence in observational data getting the hit locations correct.  Would stringers be more likely to call a hit a line drive and a caught ball a fly ball? Absolutely, and I think I was one of the first people to point out that possibility.  Would stringers classify plays that good fielders make look easy as being “easier” by underestimating the speed or placing the ball in an easier to field zone? That is certainly a possibility and should be investigated further.  Are there recording and transcription errors? Certainly there are a few, but most of them are discovered in the internal controls that match the Retrosheet or STATS or BIS data to MLB data or with other internal controls.

But would a stringer place a ball hit in the 56 hole over in the 6M hole?  I don’t believe that for a second.  Would he or she mistake a ground ball for a line drive in the infield?  Maybe once in a thousand plays.  And this is the crux of the matter in using Tango’s 3B Outs/BIP as a serious measure of fielding ability.  It is just too broad.  3B GB Outs/GB BIP is better.  3B GB Outs/GB BIP left side is even better.  And by that measure Chipper is a -12 play to -20 play fielder for the period 1995-1999, or -2 to -3 runs per year. 

MGL is the only individual that I know who has access to both STATS and BIS data. And he certainly also has access to Retrosheet.  I have urged him to compare at least the batted ball designations from all three sources and publish the results.  The only reason that I can think of why he hasn’t is that it would violate his terms of use.  Maybe BenJ could be persuaded to do a comparison on BIS and Retrosheet for some past years. This study will get done sometime and when it does I expect that the amount of disagreement will be less than 1% a year with a small bias by park and no bias by player.  I do expect a much larger range of disagreement on vectors, distances, and speeds of hit balls if we ever get access to that data from the different sources and that data may likely be biased by source.


#197    Tangotiger      (see all posts) 2010/06/09 (Wed) @ 10:42

How do you explain the 112 run difference on Andruw Jones between BIS and STATS, using the same UZR engine?


#198    Chris Dial      (see all posts) 2010/06/09 (Wed) @ 10:47

“How do you explain the 112 run difference on Andruw Jones between BIS and STATS, using the same UZR engine?”

I have no reason to believe they operate “using the same engine”, nor do I know what that engine is comprised of.


#199    Guy      (see all posts) 2010/06/09 (Wed) @ 10:50

“A long time ago I looked at the correlation between zone rating chances per inning, and range factor.  I found a correlation higher than .9.  I really haven’t paid attention to range factor since that day.”

But Rally, it works both ways.  Your .9 correlation also means that TZ believes that almost all variance in plays made results from differences in opportunities, and therefore talent differences are very small.  So I could see that .9 and decide not to pay attention to TZ.

I think the way to figure this out is to look at how the correlation changes based on sample size.  For a single season, maybe a correlation of .9 is appropriate (I don’t know)—most of the variation is a function of different opportunities.  But as sample size grows, that correlation should be reduced a lot.  By the time you have 10 or 15 seasons, talent differences should explain much more, and differences in opportunities much less.  So if you tell me the range:TZ correlation remains .9 with samples of say 5 seasons, then you’ve pretty much proven that TZ isn’t really measuring opportunities very well—it’s just assuming that a lot of plays = lot of opportunities.  But if that correlation becomes much lower with large samples, that would be a good sign.  Same thing applies to UZR.
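
To put rough numbers on that argument, here is a minimal sketch of how the correlation should fall with sample size under a toy model. The model is an assumption made for illustration: plays made equal a fixed talent term plus an opportunity term that is drawn independently each season and is unrelated to talent. The function name `opp_correlation` is hypothetical, and the .9 single-season starting point is simply the figure quoted above.

```python
from math import sqrt

def opp_correlation(n_seasons, r_single=0.9):
    """Correlation between average plays made and average opportunities
    after n_seasons, under the toy model plays = talent + opportunity.

    Assumes opportunity luck is independent across seasons and
    uncorrelated with talent (illustrative assumptions only).
    """
    opp_share = r_single ** 2            # single-season variance share from opportunities
    talent_share = 1.0 - opp_share       # the rest is attributed to talent
    opp_var_n = opp_share / n_seasons    # averaging n seasons shrinks opportunity variance
    return sqrt(opp_var_n / (talent_share + opp_var_n))

for n in (1, 5, 10, 15):
    print(n, round(opp_correlation(n), 2))
```

Under those assumptions the correlation drops well below .9 by five seasons and keeps falling, which is the pattern the test in this comment looks for. A measured correlation that stays near .9 over multi-year samples would not fit this model.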

*

“As far as burden of proof, it is disingenuous for either one of us to sit back and demand the other person push a rock uphill until a case is made.”

Right.  Which is why I said this, right BEFORE the line you quote:  Let’s just agree that neither WOWY nor the PBP data represents “truth,” the truth likely lies between them for most players, and we should all make a good faith effort to figure out where that is.


#200    Tangotiger      (see all posts) 2010/06/09 (Wed) @ 11:03

“I have no reason to believe they operate “using the same engine”, nor do I know what that engine is comprised of.”

Fair enough.  Let’s agree that any black box, be it from MGL, STATS, BIS, or MLB.com, is a black box and therefore, they each carry a certain level of uncertainty in processing.


#201    Rally      (see all posts) 2010/06/09 (Wed) @ 11:16

Guy, I was looking at ZR, not TZ when I did that correlation, before TZ was invented.  Minor point I know, but a clarification.

“I have no reason to believe they operate “using the same engine”, nor do I know what that engine is comprised of.”

I don’t think we are talking about MGL’s ratings vs John Dewan’s, we are talking about MGL using the exact same methodology on both datasets.  But I can’t find the article on THT or Fangraphs.  Anyone have a link?

“I expect that the amount of disagreement will be less than 1% a year with a small bias by park and no bias by player.”

If you’re talking about batted ball classifications, there will be more difference than that, because they aren’t using the same standards and the totals don’t match.  For 2009, here’s retro/Mlbam: (percentages)

LD 18.9
GB 44.9
FB 28.6
Pop 7.6

And for BIS, from Fangraphs:
LD 18.9
GB 43.3
FB 34.1
Pop 3.7

So the line drive totals are the same, but I don’t think looking hit by hit will match up.  You’ll have one system that calls the LD on one end of the spectrum groundballs, and the other system calls them flyballs at the other end of the spectrum.

And BIS is much less likely to call anything a pop, but I’m not sure they even make the distinction, I just multiplied fb% by the infield fb% to get that.  While MLBAM is coding some as popups even if they get to the OF.


#202    Rally      (see all posts) 2010/06/09 (Wed) @ 11:24

OK, here’s the link:  Right here, most of the meat in the comments.  I thought MGL wrote up a full article on this, but maybe I was wrong about that.

http://www.insidethebook.com/ee/index.php/site/comments/suzr_v_buzr/


#203    Peter Jensen      (see all posts) 2010/06/09 (Wed) @ 15:10

“And BIS is much less likely to call anything a pop, but I’m not sure they even make the distinction, I just multiplied fb% by the infield fb% to get that.  While MLBAM is coding some as popups even if they get to the OF.”

Rally - Retrosheet coded 3 balls caught by outfielders as pop ups last year.  All 3 had the infield fly rule called.  Don’t make it seem like there is a problem when there isn’t one.

The problem with the Fangraphs hit ball percentages is not with the BIS coding, but with Fangraphs’ weird way of reporting the percentages.  The percentages of FB, LD, and GB add up to 100% for each team.  I believe that the IFFB % is the percentage of IFFB of all air balls, or LD + FB. That would make the BIS numbers and the Retrosheet numbers very close in the aggregate.  I can’t even find where you got the percentages for the league as a whole that you reported, so I can’t check for sure.  Fangraphs really, really needs a better system and a much better glossary that explains in detail how they calculate their percentages and formulas.  They present a lot of information, but it’s not very useful if you don’t know how they calculated it.


#204    Rally      (see all posts) 2010/06/09 (Wed) @ 15:33

What I did to get the MLB percentages was take the team page, which unfortunately has percents and not counts, dump them into Excel and just take the average.

I was thinking infield flies were as a percentage of flyballs.  Thanks, that makes the numbers match up better, though still not perfect.

If it’s as a percent of all non-grounders, then we get this:

FB 27.9%
Pop 6.2%

Much, much closer, but MLB still has more pops, and more groundballs.  I don’t think anyone is going to mistake a ground ball for a flyball, so that leaves us with a few issues where one system codes a ball differently than another:

1) popups vs flyballs
2) groundballs vs low line drives
3) high line drives vs flyballs

From these figures I’d guess MLBAM and Fangraphs/BIS agree about 95% of the time.  And yeah, I’m only comparing Fangraphs interpretation.  I’ve never seen the actual BIS data.  I know they talk about fliners now and then and even break it into more hybrid subsets (fliner-flies, etc.) but I’ve never seen that stuff published, it all gets turned into the familiar G/F/L codes first.


#205    Tangotiger      (see all posts) 2010/06/09 (Wed) @ 15:53

http://www.fangraphs.com/statss.aspx?playerid=755&position=P#battedball

Click, and page down, and that’s Santana’s counts.

In 2010, he has these numbers:
GB 82
FB 110
LD 37
BU 10 (bunts)
Total 239

He has 320 batters faced, including 81 K+BB.  That leaves him with 239 contacted balls.

So, that’s how you get 100% coverage.

Now, as for IFFB and IFFB%: he has 14 and 12.7% respectively.  If you take 14/.127, that will give you the denominator Fangraphs uses, and that number is 110.  And that is the number of FB he has.

So, BIP = GB+FB+LD+BU

FB = ofFB + ifFB
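
The arithmetic in this comment can be checked in a few lines. This is only a sketch using the Santana numbers quoted above; the variable names are illustrative.

```python
# Johan Santana's 2010 batted-ball counts as quoted in the comment above.
counts = {"GB": 82, "FB": 110, "LD": 37, "BU": 10}
batters_faced = 320
k_plus_bb = 81

# BIP = GB + FB + LD + BU, which should equal batters faced minus K+BB.
contacted = sum(counts.values())
assert contacted == batters_faced - k_plus_bb

# IFFB% is listed as 12.7% on 14 infield flies; back out the denominator.
iffb = 14
iffb_pct = 0.127
implied_denominator = round(iffb / iffb_pct)

# FB = ofFB + ifFB, so outfield flies are whatever is left over.
of_fb = counts["FB"] - iffb
print(contacted, implied_denominator, of_fb)
```

The implied denominator comes out equal to the FB count, which is the point being made: IFFB% is a share of FB, and FB already includes the infield flies.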


#206    Guy      (see all posts) 2010/06/09 (Wed) @ 16:32

Chris:  Thanks for posting that data in 193.  Very honest of you!  What DRS is that—career, 2006-2008, or something else?

I think expected outs should provide a pretty fair measure of how many chances, and how easy/hard, a fielder faced.  It’s a less complex version of UZR.  Pinto doesn’t remove BIP that are turned into outs in estimating expected outs (so a play made by another fielder on a ball with shared responsibility counts “against” the other fielders), which I don’t agree with in terms of evaluating fielders.  But that mostly affects OFs.  And as a reflection of what the BIS data says, it should work well.

That .48 correlation is huge.  It really suggests a big regressing effect.  I’ll try to post some other positions later.  I guess we could also compare expected outs to TZ, and even UZR.


#207    Peter Jensen      (see all posts) 2010/06/09 (Wed) @ 18:43

Tango Post #205 - Your calculations are correct, but Fangraphs are not.  Compare Fangraphs 2009 numbers for Santana with Retrosheet’s:

Hit_Type--------BIS-------------Retro

B---------------17---------------17
G--------------174--------------176
F--------------232--------------145
L---------------82---------------98
P---------------38---------------71
TOTAL----------543--------------507
BIS_TOT_WO_P_--505

My best guess is that BIS counts 505 hit balls and for some reason Retrosheet counts 507.  My money is on Retrosheet being correct.  Anyway, BIS divides its hit balls into Bunts, LD, GB, and FB.  The GB, FB and LD without the Bunts is the 100% that gets divided into the percentages that Fangraphs presents.  And then the FB get further subdivided into IFFB.  That’s where the 16.4% IFFB rate comes from.  The difference between the 38 IFFB for BIS and the 71 pop ups for Retrosheet is probably a difference in definition.  Retrosheet defines a pop up as a non line drive air ball fielded by an infielder.  BIS probably defines an IFFB as one fielded within the infield, and an outfield FB as any fly ball fielded in the outfield no matter what type of fielder fields it.  Makes it hard to compare without the actual BIS PBP data, but easily correctable if one has the BIS PBP data.  So the actual difference between the 2 data sets is probably just the 16 line drives and the 2 missing plays.

Fangraphs needs to subtract the 38 IFFB from the 232 in Santana’s stat line.
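
A quick reconciliation of the two columns, as a sketch. It assumes the reading given in this comment, namely that the BIS pop-ups (P) are already counted inside the BIS fly balls (F), while Retrosheet's categories do not overlap. The dictionary layout and variable names are illustrative.

```python
# Santana's 2009 counts from the table above
# (B = bunt, G = ground ball, F = fly ball, L = line drive, P = pop up).
bis = {"B": 17, "G": 174, "F": 232, "L": 82, "P": 38}
retro = {"B": 17, "G": 176, "F": 145, "L": 98, "P": 71}

# Assumption: BIS pops are a subset of BIS fly balls, so they are left
# out of the BIS total; Retrosheet categories are mutually exclusive.
bis_total = sum(v for k, v in bis.items() if k != "P")
retro_total = sum(retro.values())

# Share of BIS fly balls that are infield flies.
iffb_rate = bis["P"] / bis["F"]

# All air balls (everything that is not a bunt or a grounder).
bis_air = bis["F"] + bis["L"]
retro_air = retro["F"] + retro["L"] + retro["P"]

print(bis_total, retro_total, f"{iffb_rate:.1%}", bis_air, retro_air)
```

On that reading the air-ball totals agree between the two sources, the two-play gap sits entirely in ground balls, and the remaining disagreement is how air balls are split among line drives, flies and pops.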


#208    Tangotiger      (see all posts) 2010/06/09 (Wed) @ 20:49

“Fangraphs needs to subtract the 38 IFFB from the 232 in Santana’s stat line.”

Why does it “need” to do that?  Fangraphs is treating the ifFB as a subset of FB.

To do what you are saying, it needs to show ifFB and ofFB.  But obviously David doesn’t want to do that.  To him “FB” is ofFB+ifFB.  To you (or Retrosheet anyway), FB is ofFB.


#209    Rally      (see all posts) 2010/06/09 (Wed) @ 22:56

Peter, you think that if a batter lifts a pop beyond the shortstop, Erick Aybar backs up 20 feet and makes the catch, Retrosheet calls it a pop up and BIS calls it a flyball, and not part of the infield fly subset?

That should explain the differences.


#210    Guy      (see all posts) 2010/06/09 (Wed) @ 22:58

Here’s the same analysis for SS, with Pinto’s 2006-2008 PMR predicted outs per 4000 BIP based on BIS data.  The correlation of opportunities with outs/BIP is .89, which strikes me as too high when you have 3 years of data—that implies opportunities explain 80% of the variance in plays made—but hard to say for sure.  The correlation with outs/expected outs is .38, which is obviously too high—there must be zero correlation between fielding talent and opportunities above/below average.  So again, this seems like strong evidence the BIS data is “over regressing” by ascribing more opportunities to those who make more plays.  Maybe Chris can test this against DRS as well.

Player / Expected Outs +/-
Tulowitski 55
Escobar 32
Everett 29
J. Wilson 29
Clayton 24
Furcal 21
Uribe 20
Bartlett 14
Young 12
Izturis 11
Peralta 10
Guillen 7
Guzman 6
Berroa 5
Rollins 3
Scutaro 2
Theriot 1
Reyes -1
Greene -2
Eckstein -3
McDonald -3
Lopez -4
Hardy -4
Renteria -5
Vizquel -6
Cabrera -9
Crosby -10
Lugo -11
Hanley R.  -12
Tejada -12
A. Gonzalez -18
Jeter -19
Pena -20
Drew -23
Betancourt -26
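
The expectation of zero correlation can be illustrated with a small simulation. This is a toy model with made-up spreads, not a reconstruction of PMR or the BIS data: talent and true opportunities are drawn independently, and the hypothetical `bias` parameter is the number of extra opportunities the data credits per play of true talent.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate(bias, n_players=5000, seed=1):
    """Correlation between recorded expected outs and the resulting rating.

    bias = 0.0 means opportunities are recorded without error; a positive
    bias means good fielders are credited with extra opportunities.
    """
    rng = random.Random(seed)
    expected, rating = [], []
    for _ in range(n_players):
        talent = rng.gauss(0, 15)                 # true plays above average (assumed spread)
        opportunity = rng.gauss(0, 20)            # true extra chances, independent of talent
        plays = talent + opportunity              # plays made above average
        exp_outs = opportunity + bias * talent    # what the data records as expected outs
        expected.append(exp_outs)
        rating.append(plays - exp_outs)           # actual minus expected
    return pearson_r(expected, rating)

r_unbiased = simulate(0.0)
r_biased = simulate(0.5)
print(round(r_unbiased, 2), round(r_biased, 2))
```

With no bias the correlation hovers around zero; crediting good fielders with extra opportunities pushes it clearly positive, which is the signature described in this comment.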


#211    Peter Jensen      (see all posts) 2010/06/09 (Wed) @ 23:47

Rally - Yes, I think that’s what’s happening.  Santana in 2007 also had a difference in GBs with the BIS data reported at Fangraphs.  The only explanation that I have for that is the balls that are line drives as they pass the pitcher but are GBs by the time they reach SS or 3B.  If one wanted to test that theory one could look at GB opportunities reported in the Fielding Bible or at the Bill James web site and compare to those in Retrosheet.


#212    Guy      (see all posts) 2010/06/10 (Thu) @ 00:22

I checked UZR for a few players, and it tracks PMR pretty well in terms of the predicted outs it assigns.  For example, UZR expects about 28 fewer outs than average per 4000 BIP from Jeter, and +36 for Jack Wilson. 

Furcal really stands out:  he makes about +40 plays per year, but UZR expects him to make -50 plays, so he ends up with a negative UZR rating.  My guess is Furcal really got somewhat fewer opportunities, plus BIS/UZR adds some extra regression to get that result.

Maybe MGL can provide UZR’s predicted outs for a large sample of players, and see if it correlates at all with UZR.


#213    Guy      (see all posts) 2010/06/10 (Thu) @ 10:29

I ran the PMR #s for CF as well.  Same pattern:  correlation between Predicted Outs and plays made is over .9, and correlation with PMR (Actual-Predicted Outs) is .32.  Players below (check out Vernon Wells, with -37 predicted outs per season with over 10,000 BIP).

If you regress Predicted Outs on PMR, you find that for each extra play PMR thinks a SS or CF has made, a player is assigned .6 extra predicted outs.  In other words, it appears that SS and CF are actually 60% better/worse than PMR thinks they are (on average, of course).  At 3B, players get +.4 predicted outs for every extra PMR out made.

This is a crude approach, as I don’t love the PMR metric.  But it’s consistent with Chris’ DRS, and when I compare it to UZR it looks similar there.  Maybe Peter can do a similar analysis using his metric, or Rally with TZ.  But it looks to me like the BIS data assigns about .5 extra opportunities for every play made.  And that’s BEFORE we deal with issues in the metrics that further regress results (like UZR’s treatment of errors), or possible influence of teammates.

CF 2006-2008
Player / Predicted Outs +/-
Carlos Gomez 45
B.J. Upton 26
Melky Cabrera 22
Alfredo Amezaga 20
Curtis Granderson 17
Carlos Beltran 16
David DeJesus 13
Aaron Rowand 12
Corey Patterson 9
Ichiro Suzuki 6
Joey Gathright 5
Torii Hunter 3
Chris Young 2
Coco Crisp 1
Gary Matthews Jr.  -3
Grady Sizemore -4
Josh Hamilton -4
Brian Anderson -5
Marlon Byrd -6
Mike Cameron -6
Jim Edmonds -6
Willy Taveras -8
Andruw Jones -9
Nate McLouth -15
Johnny Damon -18
Mark Kotsay -21
Juan Pierre -26
Kenny Lofton -27
Shane Victorino -30
Vernon Wells -37


#214    Rally      (see all posts) 2010/06/10 (Thu) @ 11:11

I compared the TZ ratings for 2006-2008 to the list in 210.  Got about the same thing Chris found for the 3B, correlation of r=.45.

I looked at TZ chances above/below average and found almost no correlation to TZ rating.  This was for a longer period, 2003-2009.  For 3B it was .12 and for shortstops .008.  I was figuring expected chances based on innings in the field.  I know balls in play is a better measure, but that would take more time to compute, where I had innings readily available.  I’ll do per BIP at some point.


#215    Guy      (see all posts) 2010/06/10 (Thu) @ 11:23

Thanks, Rally.  BTW, can you post link to your original two articles explaining Total Zone, or maybe put it up on your site if the originals can’t be accessed? Would be great to have as a reference when I forget (often) how it’s constructed.


#216    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 11:50

Right, it would have to be BIP.


#217    Guy      (see all posts) 2010/06/10 (Thu) @ 12:19

Not to start another argument, but the correlation won’t tell us exactly the same thing for TZ, because TZ uses outcomes in part to define opportunities.  Let’s say (just theoretically), that counting infield hits as opportunities for the player who touches the ball tends to penalize players with good range.  TZ will count that as a 100% opportunity, but that also reduces the player’s TZ rating.  So we may not see a correlation, even if TZ is magnifying opportunities for good fielders in that respect.  Same thing would apply to an OF if you assign responsibility based on who picks up a hit, or if you assign 100% responsibility on errors.

*

One way we could settle a lot of this is to test all the metrics on team-switchers, in terms of predicting player DER (outs/BIP).  When players switch teams, they should get an entirely new set of opportunities.  So while those opportunities may vary a lot, it should be unbiased at the player level.  A metric’s ability to predict future DER should tell us how well it measures true talent.  (All players will be older, so that’s a constant). 

Say you put UZR and WOWY —or just UZR and Range Factor—in our model.  If WOWY/range is picking up no more information than opportunities, then it will have no predictive power.  But if it does have predictive power, it means UZR is regressing opportunities too much.  Repeat same exercise for TZ or any other metric. Thoughts?

Furcal is an interesting example (yes, totally cherry-picked).  He made an incredible number of plays in Atlanta, yet UZR has him as slightly below average (so it must assign a huge predicted outs total).  Then he goes to LA, and CONTINUES to make a huge number of plays there, and UZR continues to say he’s slightly below average. Could he have such enormously bad luck in both cities?  I suppose.  But then Renteria arrives in Atlanta, with basically the same pitching staff, and makes almost one less play per game over the next two seasons.  And UZR says he’s average too!  I’m not buying this story....


#218    Rally      (see all posts) 2010/06/10 (Thu) @ 13:01

“Not to start another argument, but the correlation won’t tell us exactly the same thing for TZ, because TZ uses outcomes in part to define opportunities.”

Good to know.  If it won’t tell us anything then I’m glad I didn’t waste my time redoing it by BIP.  While I fully agree BIP is a better measure, Innings may be inaccurate, but I don’t think they are biased, and would tell mostly the same story.

“A metric’s ability to predict future DER should tell us how well it measures true talent.”

YES!  I’ve been saying that for about two years now.  Put TZ, UZR, DRS, +/-, Fans scouting report, whatever else to the test.  You start with the ratings.  Then you apply a standard, and simple, formula to turn all the yearly ratings into projections.  For the next year, you look at how much playing time each player had on the field.  See how many hits the actual players used are projected to save.  Then compare it to actual DER.  Which of course has to be park adjusted.

http://lanaheimangelfan.blogspot.com/search?q=defense+projections+evaluation

One of these days I’ll get around to doing that. But I’d prefer if somebody without a dog in the fight did.  If Totalzone turns out to be the winner, everyone’s going to think it’s rigged.
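
The test described in this comment could be sketched roughly as follows. Everything here is hypothetical: the function names, the per-4000-BIP scale for projections, and the example inputs are made up for illustration, and a real version would need the projection formula and park adjustment the comment mentions.

```python
from math import sqrt

def projected_team_der(league_der, fielders, team_bip):
    """Projected team DER (outs per ball in play).

    fielders: one (projected plays saved per 4000 BIP, share of the team's
    BIP the player was on the field for) pair per player actually used.
    """
    plays_saved = sum(rate / 4000 * share * team_bip for rate, share in fielders)
    return league_der + plays_saved / team_bip

def rmse(projected, actual):
    """Root-mean-square error between projected and actual (park-adjusted) DER."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / len(projected))

# Made-up inputs for illustration only: a +10 fielder who played every
# inning and a -5 fielder who played half the time, on a 4000 BIP team.
example = projected_team_der(0.690, [(10, 1.0), (-5, 0.5)], team_bip=4000)
print(round(example, 6))

# Each metric would get its own list of projected team DERs; the metric
# with the lowest error against actual DER does best on this test.
print(round(rmse([0.700, 0.680], [0.690, 0.690]), 3))
```

Running the same comparison with each metric's projections, against the same actual DER figures, is what would let the systems be ranked on equal terms.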


#219    Rally      (see all posts) 2010/06/10 (Thu) @ 13:04

As for the original TZ articles, I’m not sure I have them.  I wrote those in the blogging software for MVN, not in word or anything.  If I can find them, then yes, that would be a good addition to my site.  I’ll try to find the archives, or else maybe google wayback.


#220    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 13:07

Furcal/WOWY (first column is through 2005, and second column is since 2006)

2005- / 2006+
12.6% / 13.1%: Furcal outs per BIP

12.3% / 12.1%: all other SS those years
12.3% / 12.0%: his batters’ career with other SS
12.2% / 12.3%: his parks’ career with other SS

12.8% / 12.9%: his pitchers’ career with other SS

As you can see, his batters and his parks were pretty neutral for his career.  But his pitchers… hooo boy, did they help him a lot, especially in LA (Derek Lowe at least).

So, he went from somewhat below average with the Braves to somewhat above average with the Dodgers.

I don’t see much difference with UZR actually.
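
To translate those percentages into plays, here is a minimal sketch using the 4000 BIP per year figure from the top of the article. It compares Furcal against one baseline at a time, which is a simplification: the full WOWY comparison weighs batters, pitchers and parks together. The function name is illustrative.

```python
BIP_PER_SEASON = 4000  # full-season balls in play, as used earlier in the article

def plays_vs_baseline(player_rate, baseline_rate, bip=BIP_PER_SEASON):
    """Plays above (+) or below (-) a baseline out rate over `bip` balls in play."""
    return round((player_rate - baseline_rate) * bip, 1)

# (Furcal outs/BIP, all other SS, his pitchers with other SS), from the lines above.
furcal = {
    "through 2005": (0.126, 0.123, 0.128),
    "2006 onward": (0.131, 0.121, 0.129),
}

for period, (rate, other_ss, pitchers) in furcal.items():
    print(period,
          "vs all other SS:", plays_vs_baseline(rate, other_ss),
          "| vs his pitchers' other SS:", plays_vs_baseline(rate, pitchers))
```

Against the league's other shortstops Furcal looks far above average, but measured against what his own pitchers got from other shortstops he is within about 8 plays a year of the baseline in both periods.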


#221    Guy      (see all posts) 2010/06/10 (Thu) @ 13:16

Rally, I was suggesting something a little different, though similar:  use the metrics to predict individual player DER for players who switch teams.  Maybe the team approach works too, but I could see the team results being pretty muddled.  And it probably couldn’t settle this “too much regression” issue, as a mix of good and bad fielders on a team would hide that.

Actually, having player-level DER available to everyone would be very useful.  Tango: maybe you could persuade Fangraphs and/or B-Ref to display it? (assuming you agree)


#222    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 13:17

As for Renteria, well, his Braves pitchers must have changed a lot in 2006-07.  They were very league average in terms of outs by their non-Renteria SS: 12.2%.  Renteria made only 11.5% outs in Atlanta, and his entire profile of batters, pitchers, parks were very close to league average.

For 2003-09, to match UZR years, he’s had perfectly average everything (parks, batters, pitchers), with 24,642 balls in play.  His out rate was in Jeter territory.

To UZR’s credit: he’s played on 5 different teams in 7 years, and so, whatever systematic bias there is should be washed away.  That is, unless he positions himself in a very non-traditional manner such that balls that go by him are marked as being farther than they actually were.  And therefore, bias follows him.

So, Renteria would be a very exciting test case to see how much bias there is in recording data.


#223    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 13:22

“Tango: maybe you could persuade Fangraphs and/or B-Ref to display it? (assuming you agree)”

I’ve been thinking of doing my own fielding website to also include the Fans Scouting Report.  Just one of those things I keep meaning to do, but my regular job slows me from doing it.


#224    Chris Dial      (see all posts) 2010/06/10 (Thu) @ 14:23

“They present a lot of information, but it’s not very useful if you don’t know how they calculated it.”

This is what I am saying about UZR.


#225    Chris Dial      (see all posts) 2010/06/10 (Thu) @ 14:47

Here are the SS.  The correlation (r, not r^2) is lots lower.

Player            ExO   DRS  PM diff
A.Gonzalez     -18.00    12       16
Bartlett        14.00    21       27
Berroa           5.00    -6       -8
Betancourt     -26.00   -15      -21
Cabrera         -3.00     6        8
Clayton         24.00     2        4
Crosby         -10.00    21       26
Drew           -23.00   -23      -30
Eckstein        -3.00    -1        0
Escobar         32.00     0       -1
Everett         29.00    35       45
Furcal          21.00     5        6
Greene          -2.00    18       24
Guillen          7.00    -1       -2
Guzman           6.00    -2       -2
H.Ramirez      -12.00   -33      -44
Hardy           -4.00   -20      -27
Izturis         11.00     0       -1
J.Wilson        29.00    -3       -3
Jeter          -19.00   -22      -31
Lopez           -4.00   -27      -36
Lugo           -11.00     1        0
McDonald        -3.00    15       21
Pena           -20.00     7       10
Peralta         10.00   -19      -26
Renteria        -5.00   -19      -25
Reyes           -1.00    11       17
Rollins          3.00    -2       -2
Scutaro          2.00     5        5
Tejada         -12.00    13       18
Theriot          1.00     0        0
Tulowitski      55.00    16       22
Uribe           20.00    17       22
Vizquel         -6.00    42       56
Young           12.00    -5       -7

Correlation with ExO: 0.305613747 (DRS), 0.306757641 (PM diff)

Also, I added the “Plays Made difference”, which should be more comparable to “expected outs” than DRS.  Also, could someone ask Bill James to rename his defensive runs saved?  That’s mine.


#226    Guy      (see all posts) 2010/06/10 (Thu) @ 14:52

“As for Renteria, well, his Braves pitchers must have changed a lot in 2006-07.”

Not so much.  Smoltz, Hudson, James, Sosa, Thomson, Ramirez, Davies and Reitsma accounted for 71% of IP in 2005, and 63% in 2006 (somewhat different proportions, of course).  What changes, of course, is that Renteria’s “other SSs” include Furcal, while Furcal’s include Renteria.  This may be a case where WOWY runs into sample size problems.  But perhaps Furcal really has played consistently behind extreme GB staffs--certainly possible.

What isn’t possible, I don’t think, is that Renteria has been -28 plays per 4000 BIP, over 5 teams, but is still somehow an average fielder (UZR).


#227    Guy      (see all posts) 2010/06/10 (Thu) @ 15:09

Chris:  is that DRS for 2006-2008?  Total or per year?


#228    Chris Dial      (see all posts) 2010/06/10 (Thu) @ 15:14

Renteria was a good fielder when he was young, and bad when he was old.  He drops off like a rock, though.  He’s +38 (one below avg season in nine) over his career for Fla and Stl, and -33 for Bos/Atl/Det (all below average).

So it *averages* to an average fielder, but the last four seasons have been awful.  Jeterrific, in fact.


#229    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 15:17

Guy:

I don’t see how that’s possible. 

The pitchers, without Furcal, are at 12.8% or 12.9%.  Seeing that Renteria is part of this group of SS, this number is, if anything, a smidge too low. 

The pitchers, without Renteria, are at 12.2%, which is very league average, including Furcal.

Therefore, there is definitely a huge shift in pitchers from 2000-05 and 2006-07.  It’s NOT JUST 2005/06.

Furcal’s pitchers, year-by-year:

PIT YEAR
0.129 2000
0.124 2001
0.131 2002
0.130 2003
0.127 2004
0.126 2005

0.132 2006
0.128 2007
0.129 2008
0.127 2009

So, in 2005, Furcal’s pitchers got 12.6% outs (in their careers) from all non-Furcal SS.

The SS-friendly pitchers reached a high in 2002-03, and then they started leaving Atlanta. 

Furcal got SS-friendly pitchers in 2006 in LA.


#230    Chris Dial      (see all posts) 2010/06/10 (Thu) @ 15:17

Dammit. 

Guy, it is for 2006-2008.  I matched the seasons you posted.  The numbers are intended to reflect the same period.  Total, which I assumed yours were.


#231    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 15:23

I’ve been meaning to include an age adjustment for WOWY, so that when Ozzie Smith appears in the list of SS that Andy Benes had, it would not look like an average-aged Ozzie Smith, but a 40-year-old Ozzie Smith.

Obviously, it’s going to work out pretty even for most pitchers, but for a few Ripken-mostly pitchers, that might have an effect.


#232    Chris Dial      (see all posts) 2010/06/10 (Thu) @ 15:28

I can’t really figure out the proper age adjustment.

A few years ago, Tango, you offered to run the age curves for defense that I was doing wrong.  Your age curves were originally done on fewer years and fewer players.  We have 5 seasons now for bUZR, but we have 20 for DRS.  Still interested in running with that?  I wanted to learn to do it for myself, but it was more work than I wanted it to be.


#233    Guy      (see all posts) 2010/06/10 (Thu) @ 15:44

Tango:  Yes, I realized after posting that you were comparing many more years.  There wasn’t a big shift from 2005 to 2006, but not having Maddux (and other changes) makes a big difference comparing the two larger time periods. 

Still, UZR is looking at this one year at a time.  You have Furcal making a LOT more plays in 2004-2005 than Renteria does in 2006-07, with a substantially similar staff and of course the same stadium.  Yet they get the same UZR rating.

Chris:  no, I was reporting predicted outs (vs. average) per 4000 BIP.


#234    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 15:50

Chris,

Sure, send what you got.  IDEALLY, you would send it like this, as one file:
playerId,pos,age,chances,rate

If you do that, I can even give you the “positional adjustments”.
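For anyone preparing a file in that layout, here is a minimal sketch of reading it and aggregating a chance-weighted rate by position and age.  The sample rows and the function name weighted_rate_by_pos_age are made up for illustration, and this is only a first aggregation step, not the age-curve method itself.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the playerId,pos,age,chances,rate layout.
SAMPLE = """playerId,pos,age,chances,rate
p001,SS,24,400,0.84
p001,SS,25,450,0.83
p002,SS,24,300,0.80
p002,SS,25,200,0.78
"""

def weighted_rate_by_pos_age(handle):
    """Return {(pos, age): chance-weighted mean rate} from a CSV handle."""
    weighted_sum = defaultdict(float)
    total_chances = defaultdict(float)
    for row in csv.DictReader(handle):
        key = (row["pos"], int(row["age"]))
        chances = float(row["chances"])
        weighted_sum[key] += chances * float(row["rate"])
        total_chances[key] += chances
    # Skip any group with zero chances to avoid dividing by zero.
    return {k: weighted_sum[k] / total_chances[k]
            for k in total_chances if total_chances[k] > 0}

rates = weighted_rate_by_pos_age(io.StringIO(SAMPLE))
for (pos, age), rate in sorted(rates.items()):
    print(pos, age, round(rate, 4))
```

Weighting by chances keeps a part-time season from counting as much as a full one.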


#235    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 16:06

I don’t really like to focus on single-season with WOWY, because, well, there’s only so granular I can take it.

Just compare Furcal 2004 to Furcal 2005 (both with Braves).  He had similar pitchers in both seasons.  He got 395 outs in 3272 BIP in 2004 (12.1%) and 503 outs in 3810 BIP in 2005 (13.2%).  That’s an enormous difference (about 40 outs difference).

One standard deviation is .005, so the difference here is around two SD.  He MIGHT have been a bit better in 2005, but, it’s certainly possible (especially when cherry-picked), that a player will go from 12.1% to 13.2%, all other things equal just by chance alone.

His UZR changed by 9 runs, so, that’s entirely consistent here.  That is, the 40 out difference (30 runs) is 9 real runs (as per UZR) and 21 lucky/timing runs.  That’s why I’m not crazy about WOWY for 1-2 years.
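The arithmetic above can be sketched as follows.  The outs, BIP, .005 SD, 30-run and 9-run figures are the ones quoted in this comment; scaling the rate gap to the average of the two seasons' BIP and the plain binomial SD check are my assumptions, not necessarily how the numbers were originally derived.

```python
import math

# Figures quoted in the comment (Furcal with the Braves).
outs_2004, bip_2004 = 395, 3272
outs_2005, bip_2005 = 503, 3810

rate_2004 = outs_2004 / bip_2004   # about 12.1%
rate_2005 = outs_2005 / bip_2005   # about 13.2%
rate_diff = rate_2005 - rate_2004

sd_quoted = 0.005                  # one SD, as stated in the comment
z = rate_diff / sd_quoted          # size of the gap in SD units

# Assumption: express the gap in outs over the average of the two seasons' BIP.
mean_bip = (bip_2004 + bip_2005) / 2
extra_outs = rate_diff * mean_bip

# Assumption: plain binomial SD of a single-season out rate, as a sanity check.
pooled_rate = (outs_2004 + outs_2005) / (bip_2004 + bip_2005)
binomial_sd = math.sqrt(pooled_rate * (1 - pooled_rate) / mean_bip)

# Split of the run gap quoted in the comment: UZR change vs. the remainder.
runs_gap, uzr_change = 30, 9
timing_runs = runs_gap - uzr_change

print(f"rates: {rate_2004:.3f} -> {rate_2005:.3f}")
print(f"gap: {extra_outs:.0f} outs, {z:.1f} SD (binomial SD ~ {binomial_sd:.4f})")
print(f"runs: {uzr_change} per UZR, {timing_runs} lucky/timing")
```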

***

As for Renteria, the 2004-05 Braves pitchers were more conducive to get outs at SS than the 2006-07 Braves pitchers (in their careers).  About .004 more outs because of the pitcher tendency.  So, we’re talking about an extra 15 plays Furcal gets over Renteria because of the pitcher composition.

That said, even Furcal-to-Furcal shows a big difference, so, you might not see what you expect.

***

As for Renteria’s career, you can draw a bright line between 2001 and 2002.  Slightly above average before, to completely abysmal afterwards. 

I didn’t check to see if he got hurt or something then, but I would bet something did happen.

I would definitely have a question about Renteria’s UZR, because it seems so inconsistent with the rest of the data.  He makes few outs, plays with 5 teams, but UZR sees him as average.  The only persistent bias possible is his positioning and how scorers use that as a reference point to mark plays.


#236    Chris Dial      (see all posts) 2010/06/10 (Thu) @ 16:43

I would definitely have a question about Renteria’s UZR, because it seems so inconsistent with the rest of the data.  He makes few outs, plays with 5 teams, but UZR sees him as average.  The only persistent bias possible is his positioning and how scorers use that as a reference point to mark plays.

I esplained this.  He was good and got bad.


#237    Chris Dial      (see all posts) 2010/06/10 (Thu) @ 16:50

Well, Guy, then we aren’t comparing apples and apples, so I would expect those correlations to go down.  Or up.  How many IP is 4000 BIP?


#238    Guy      (see all posts) 2010/06/10 (Thu) @ 16:50

"The only persistent bias possible is his positioning and how scorers use that as a reference point to mark plays.”

Well, that and the systemic bias that makes ALL players look more average than they are.  Unless someone thinks PMR has some enormous flaw in estimating predicted outs, I think we’ve shown that the BIS data has a huge bias problem based on fielder quality.  Whether you use PMR, TZ, or DRS as a measure of fielder talent, there is a huge correlation with predicted outs.  (If Chris adjusts his SS DRS for playing time, I bet the correlation will be higher.)

I hope MGL will do this analysis with UZR data, and either confirm or refute this.

*

I don’t disagree about 1 year (or even 2 year) samples.  That’s why I don’t think we can evaluate metrics by asking if the individual season results look “reasonable.” They should NOT look reasonable: a +15 player will end up being +40 or +50 some seasons.  If they do look reasonable, you’re regressing too much—and then your career numbers will be WAY off.


#239    Tangotiger      (see all posts) 2010/06/10 (Thu) @ 16:57

4000 BIP is about 150 games.


#240    Guy      (see all posts) 2010/06/10 (Thu) @ 16:57

"I esplained this.  He was good and got bad.”

You’re Ricky Ricardo now?  Tango is talking about his UZR ratings which are only available for the “bad” period (but still show him average).

4000 BIP is roughly 150 games.  Correlation will go up (I think).


#241    Chris Dial      (see all posts) 2010/06/10 (Thu) @ 22:01

“If Chris adjusts his SS DRS for playing time, I bet the correlation will be higher.”

Soda?  Also the 3B has the same requirements.  How many innings do you want me to use, because I have to normalize it.  150*8.75?


#242    Guy      (see all posts) 2010/06/10 (Thu) @ 23:51

Chris: sure, that’s fine.

I went ahead and estimated UZR Predicted Outs, using the Fangraphs UZR data for SS (2006-2008), just defining Predicted Outs = Actual Outs - Average Outs - UZR.  UZR Pred Outs looks pretty similar to the PMR version on the players—the two have an r of .89.  The spread is even larger in UZR, with a much higher SD—UZR is often attributing a lot more/fewer opportunities than PMR.

Interestingly, UZR Predicted Outs has NO correlation with UZR itself.  That seems good.  But I don’t think it is.  Because UZR Pred Outs still has a strong correlation with PMR—in fact, even higher than PMR Pred Outs.  So I’m guessing it is also correlated with DRS and TZ. 

It’s possible that UZR is right, and the other metrics show a correlation with UZR’s Predicted Outs precisely because the other metrics fail to account for these opportunities (making those fielders appear better than they are).  But it could also be that the lack of correlation means that UZR is doing a worse job of identifying good fielders, while the BIS data does in fact attribute more/easier plays to good fielders.  In addition, UZR—unlike PMR—defines opportunities in part based on outcomes of plays, which will necessarily reduce any correlation between UZR and UZR Pred Outs. 

SS 2006-2008, Predicted Outs per 4000 BIP
Player / PMR Pred Outs / UZR Pred Outs
Everett 29 23
Alex Gonzalez -18 -30
Berroa 5 4
Crosby -10 -6
Guillen 7 8
Izturis 11 12
Guzman 6 -9
Eckstein -3 -4
Jeter -19 -28
Renteria -5 -12
Lopez -4 -16
Hanley R.  -12 2
Hardy -4 -13
J. Wilson 29 38
Bartlett 14 20
Peralta 10 30
Rollins 3 1
McDonald -3 -4
Reyes -1 -23
Uribe 20 8
Lugo -11 0
Greene -2 -4
Scutaro 2 12
Young 12 10
Tejada -12 -23
Vizquel -6 -12
Cabrera -9 -16
Furcal 21 50
Clayton 24 21
Theriot 1 -2
Drew -23 -12
Pena -20 -25
Tulowitski 55 68
Escobar 32 36
Betancourt -26 -16
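As a check on the table above, here is a minimal sketch that computes the Pearson correlation between the two Predicted Outs columns.  The data are copied from the table; the pearson helper is hand-rolled for illustration, and the result should land close to the .89 reported in this comment.

```python
import math

# (player, PMR Pred Outs, UZR Pred Outs), copied from the table above.
ROWS = [
    ("Everett", 29, 23), ("Alex Gonzalez", -18, -30), ("Berroa", 5, 4),
    ("Crosby", -10, -6), ("Guillen", 7, 8), ("Izturis", 11, 12),
    ("Guzman", 6, -9), ("Eckstein", -3, -4), ("Jeter", -19, -28),
    ("Renteria", -5, -12), ("Lopez", -4, -16), ("Hanley R.", -12, 2),
    ("Hardy", -4, -13), ("J. Wilson", 29, 38), ("Bartlett", 14, 20),
    ("Peralta", 10, 30), ("Rollins", 3, 1), ("McDonald", -3, -4),
    ("Reyes", -1, -23), ("Uribe", 20, 8), ("Lugo", -11, 0),
    ("Greene", -2, -4), ("Scutaro", 2, 12), ("Young", 12, 10),
    ("Tejada", -12, -23), ("Vizquel", -6, -12), ("Cabrera", -9, -16),
    ("Furcal", 21, 50), ("Clayton", 24, 21), ("Theriot", 1, -2),
    ("Drew", -23, -12), ("Pena", -20, -25), ("Tulowitski", 55, 68),
    ("Escobar", 32, 36), ("Betancourt", -26, -16),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

pmr = [row[1] for row in ROWS]
uzr = [row[2] for row in ROWS]
r = pearson(pmr, uzr)
print(f"r = {r:.2f} over {len(ROWS)} shortstops")
```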


#243    Peter Jensen      (see all posts) 2010/06/11 (Fri) @ 01:38

Guy - I looked at BZM for the 5 years I have data.  I did player season correlations on BZM rating vs. (Actual Games - Games Chances).  Actual games are Innings played/9.  Games chances are Chances/(Lg avg chances per game).  The correlation for 3B is -.073, N = 190.  For CF correlation -.0097, N = 148.  For SS correlation -.119, N = 168.


#244    Tangotiger      (see all posts) 2010/06/11 (Fri) @ 07:53

Chance rates should simply be the number of chances per BIP.

Not innings, since innings is BIPouts plus strikeouts.

The point of trying to figure out whether you have bias is this: if you take the total number of BIP, the way you subdivide those BIP should not be biased in some way.
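A small sketch of why the denominator matters.  The two staffs below are hypothetical, and innings are approximated as (BIP outs + strikeouts) / 3, ignoring other outs such as caught stealing.

```python
def innings(bip_outs, strikeouts):
    """Approximate innings as BIP outs plus strikeouts, three outs per inning."""
    return (bip_outs + strikeouts) / 3

# Hypothetical staffs: same BIP and same SS chances, different strikeout totals.
staff_a = {"bip": 4000, "bip_outs": 2800, "so": 900, "ss_chances": 500}
staff_b = {"bip": 4000, "bip_outs": 2800, "so": 1300, "ss_chances": 500}

def chance_rates(staff):
    """Return (chances per BIP, chances per inning) for one staff."""
    per_bip = staff["ss_chances"] / staff["bip"]
    per_inning = staff["ss_chances"] / innings(staff["bip_outs"], staff["so"])
    return per_bip, per_inning

a_bip, a_inn = chance_rates(staff_a)
b_bip, b_inn = chance_rates(staff_b)
print(f"per BIP:    A={a_bip:.3f}  B={b_bip:.3f}")
print(f"per inning: A={a_inn:.3f}  B={b_inn:.3f}")
```

Per BIP the two shortstops look identical; per inning the one behind the high-strikeout staff appears to get fewer chances, even though nothing about the balls in play changed.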


#245    Guy      (see all posts) 2010/06/11 (Fri) @ 08:59

Peter:  Thanks for checking.  It’s tricky to figure out how to measure any bias that may exist in allocating opportunities.  Within some metrics, the player’s defined opportunities in part reflect his outs/errors/touches, so that creates an offsetting negative correlation between opportunities and rating.  We’re trying to detect the reverse correlation (if it exists), in which skill yields more defined opportunities.  So I think you need to compare opportunities to some independent measure of skill (as Rally and Chris have done with the PMR opportunities).  PMR is different, because predicted outs is defined entirely (I think) without regard to what the fielders did. 

More and more, I think the answer lies in looking at how sample size affects the relative importance of opportunities in each metric.  As sample size increases, the relative weight of skill should grow and opportunities should shrink, both because random variation shrinks and because fielders have new pitchers, teammates, and stadiums.  Does that happen?

Using 2006-2008 for SS, UZR says that 80% of the variance in plays made is explained by opportunities, and 20% by UZR.  UZR itself combines both skill and luck (for any given distribution of opportunities, we’ll still have binomial variation).  So UZR is telling us there is very little skill difference here.  In fact, the spread for UZR—about 11 plays per 4000 BIP in my data—appears to be less than expected random variation. 

So the question is what happens if you have 5-6 years of UZR for players?  Does it still claim that opportunities account for 80% of the variance?  If so, there’s a problem.  And even the 80% for 3 seasons of data seems high to me.


#246    Tangotiger      (see all posts) 2010/06/11 (Fri) @ 09:05

"Peter” I presume is “Guy”?


#247    Guy      (see all posts) 2010/06/11 (Fri) @ 09:38

Damn.  Yes.  Can you fix?


#248    Guy      (see all posts) 2010/06/11 (Fri) @ 10:37

I realize I made an error by using UZR, which incorporates errors and DPs.  So I re-ran the numbers using range UZR, which I think is the relevant measure.  Results change a bit, but not a lot (see below).  Still no correlation between R-UZR and Predicted Outs. 

It’s a weird pattern.  UZR’s and PMR’s predicted outs are very highly correlated (.89).  The correlation of outs made should be 1.  Yet R-UZR and PMR only have a correlation of .51.  Clearly, the UZR engine is doing a lot of work.  The result is that UZR attributes a lot more of a fielder’s outs to opportunities than does PMR.  Correlation with outs-above-average:  PMR .77, UZR .38.

With an average sample of about 8300 BIP, UZR is saying that 85% of a fielder’s outs above/below average reflect his opportunities, and just 15% is accounted for by skill/luck.  I don’t know what the right split is, and maybe Tango can weigh in here.  But that seems like a very high correlation between plays made and opportunities. 
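One possible reading of that split, offered as an assumption rather than the actual calculation: square each metric's correlation with outs-above-average to get the share of variance it accounts for, and treat the remainder as opportunities.  The .77 and .38 are the correlations quoted above.

```python
# Correlations with outs-above-average quoted in this comment.
r_pmr = 0.77
r_uzr = 0.38

# Assumption: share of variance "explained" by the metric is r squared,
# and the remainder is attributed to opportunities.
pmr_share = r_pmr ** 2
uzr_share = r_uzr ** 2
uzr_opportunity_share = 1 - uzr_share
pmr_opportunity_share = 1 - pmr_share

print(f"UZR: {uzr_share:.0%} skill/luck, {uzr_opportunity_share:.0%} opportunities")
print(f"PMR: {pmr_share:.0%} skill/luck, {pmr_opportunity_share:.0%} opportunities")
```

Under that reading the UZR line comes out near the 15/85 split mentioned above, while PMR would attribute well under half of the variance to opportunities.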

Again, the real test is how this changes by sample size: does the weight given to opportunities shrink as sample grows?  I’d love to see the correlation between plays and opportunities for all metrics over samples of 1, 3, and 5 years.  It’s easy to create a metric that looks reasonable, both in terms of variance and leaders/trailers:  just take outs above average divided by three.  It’s probably better than raw outs above average for 1 season.  But over a career, it will obviously be a lousy metric.  The question for any metric is how well it distinguishes real differences in opportunities, rather than just inferring opportunities from plays made.


#249    Peter Jensen      (see all posts) 2010/06/11 (Fri) @ 10:38

Guy - My Chances are entirely without regard to what the actual fielder did.  For infielders they are all the GBs hit within the angles that define his area of responsibility minus all the touches of the ball by fielders in front of him.  For outfielders they are all the LDs and FBs hit within the angles that define his area of responsibility minus HRs.  Batted ball type defined by Retrosheet.  Batted ball locations from MLBAM with my translations to angular measure.  Angles defining area of responsibility are varied slightly by batter handedness.


#250    Guy      (see all posts) 2010/06/11 (Fri) @ 10:41

And here are the correct R-UZR Predicted Outs per 4000 BIP, for SSs 2006-2008:

Everett 29
Alex Gonzalez -26
Berroa 3
Crosby -5
Guillen 1
Izturis 14
Guzman -14
Eckstein -8
Jeter -25
Renteria -12
Lopez -25
Hanley R.  -4
Hardy -8
J. Wilson 46
Bartlett 14
Peralta 34
Rollins 8
McDonald -5
Reyes -20
Uribe 11
Lugo -11
Greene 2
Scutaro 12
Young 14
Tejada -19
Vizquel -3
Cabrera -12
Furcal 46
Clayton 15
Theriot 1
Drew -12
Pena -25
Tulowitski 78
Escobar 33
Betancourt -16


#251    Guy      (see all posts) 2010/06/11 (Fri) @ 14:37

Peter:  Thanks for the clarification.  It makes sense (to me) that BZM would have relatively little bias, as the zones are so large and you aren’t measuring velocity.  Paradoxically, sometimes less information is better. 

If it isn’t a lot of trouble, I’d be interested in knowing if there’s any relationship between R-UZR’s Expected Outs (comment 250) and a SS’s BZM rating.  That is, if we use BZM as a measure of SS skill, is there any correlation with the opportunities that UZR/BIS estimates?


#252    Guy      (see all posts) 2010/06/13 (Sun) @ 13:28

Rally/Peter:
I notice that Peter reported correlations between his metric and opportunities based on player seasons.  Rally:  did you use seasonal data or multi-year samples?  I ask because the correlation for PMR with opportunities is very small at the season level (.09, about the same as Peter reports for BZM at 3B and SS), but quite robust once you have 3-year samples.  Even a strong relationship apparently gets lost in the noise of single-season data.  So it would be a good test of your metrics to look at 3-year samples and see if correlation remains close to zero.


#253    Rally      (see all posts) 2010/06/13 (Sun) @ 13:50

Guy, see #214.  I was looking at 2003-2009.


#254    Chris Dial      (see all posts) 2010/06/14 (Mon) @ 13:34

With an average sample of about 8300 BIP, UZR is saying that 85% of a fielder’s outs above/below average reflect his opportunities, and just 15% is accounted for by skill/luck.  I don’t know what the right split is, and maybe Tango can weigh in here.  But that seems like a very high correlation between plays made and opportunities.

I think I made this claim a few years ago at BTF.  I bumped it the other day in regard to replacement level.

70% (or so) of ZR chances are fieldable by anyone playing the position, and the last 15% (to get to the averages of .80-.85) are the difference makers. 

Third base is a bit lower.

