THE BOOK--Playing The Percentages In Baseball

Friday, August 24, 2007

Why two heads may be better than one

By MGL, 01:56 AM

Here is a good article by Michael Humphreys (MAH) analyzing various fielding metrics that use BIS and STATS data.  His premise is that because the recording of batted balls, particularly in the OF, is difficult to nail down with any degree of accuracy and is somewhat subjective, you (a person interested in evaluating player fielding) are better off using more than one data source and combining them (or at least combining metrics that use different data sources). He presents some numbers to back that up.  Personally, I am not sure that it makes all that much difference.  I can’t imagine a whole lot of meaningful differences between the two data sources.  It can’t hurt though. Since he does not have access to the raw data, he can’t compare the actual batted ball data itself.  Here is the link.  BTW, in the next or last installment, he is going to present his algorithm for his own pseudo-PBP defensive metric (DRA) that uses non-PBP stats to compute a sort of UZR.  It is a regression equation, I think. I think it is excellent and can certainly be used for the minor leagues and college (or when you don’t have any PBP data for any level) and would be invaluable to teams.


#1    Rally      (see all posts) 2007/08/24 (Fri) @ 09:26

In some cases the little differences between sources really add up.  There is a vast difference in how the systems based on STATS rate Andruw Jones (average or below) and how the BIS systems rate him (best CF in the game by far.)

I’m looking forward to seeing how DRA works.  I’ve got a system for traditional stats, JAARF (just another adjusted range factor), but I haven’t published it because I don’t think it’s any more useful than fielding runs or FRAA, which is to say not very good.


#2    studes      (see all posts) 2007/08/24 (Fri) @ 10:38

Personally, I am not sure that it makes all that much difference.

I’m surprised you say that, Tango.  Like a lot of people, I’ve wondered about the sometimes huge differences between PBP rating systems, but Michael’s analysis seems to show that the differences are primarily caused by the data, not the methodology of the systems.  In particular, the differences between infielders and outfielders are stark.

It also makes sense to me that there would be wide variations in marking spots in the outfield.


#3    MGL      (see all posts) 2007/08/24 (Fri) @ 11:19

That was me, MGL, not Tango.  Overall, I just can’t imagine there being that much difference between the data.  As I said, combining the results of systems based on different data sources can’t hurt, but I don’t think it is going to make much of a difference. And I’m not sure that Michael’s analysis tells us one way or another.


#4    Tangotiger      (see all posts) 2007/08/24 (Fri) @ 11:32

Pinto, for the longest while and perhaps still, did not (does not?) consider distance a ball is hit, instead relying on the “hardness of ball hit”.  It was pointed out to him, and I think he said he was going to take a look at it.

The best way to determine how different a system is, is in the results of the CF.  The CF, more than any other player, has more latitude in his positioning (lefty/righty, pitcher tendency, park, hitter tendency, and score/men on base).  A good system will be able to determine where the typical CF is positioned when all those parameters are considered, and how often he should make the play given where/how the ball was hit, and compare that to what the CF in question did.  (To the extent that a CF faces an “average” profile of all those parameters, it doesn’t make any difference in using all those “positioning” parameters… just the batted ball results.)

As for data gathering, it depends how balls are marked.  Some show where the ball first hits something (ground, player, wall), others show where it’s picked up or changes direction.  And, noting something as “line drive” or “fly ball” will make a world of difference to the PBP fielding system, since there’s a huge difference in expected out rates between the two.

As usual, let me once more ask that BIS, STATS, MLB.com, et al, invest in a stop watch.  Figure the hang time.


#5    Rally      (see all posts) 2007/08/24 (Fri) @ 11:47

Depends on what you consider a big deal in differences in data.  Maybe one has a flyball going 280 feet and another 265 feet.  May not seem like a big deal.  But over the course of a season, the little differences add up and we can differ by 20 runs or so in evaluating some players.  To me, that’s a big deal, but I don’t know how much is due to data and to methodology.

Michael’s latest is up on THT, and compares UZR and PMR using consistent methodology.  I don’t know how truly close the methodologies are in this comparison, but for Brian Giles 03-05 the difference is 27 runs per year.

I know PMR has some limitations vs UZR, like the distance thing, but if it’s just the data causing that difference, I’d say it’s a huge deal.

The link:
http://www.hardballtimes.com/main/article/ghosts-in-the-outfield/


#6    Mike Green      (see all posts) 2007/08/24 (Fri) @ 11:49

Wouldn’t it be possible to use an automated method to calculate hang time?  It wouldn’t surprise me if you could do it in almost every case from TV images.  You’d think that HitTracker would be on this project, as measuring outfielder defence is a lot more important than measuring how far home runs really go…

Ideally, of course, you would use bat to first contact (ground, player, wall) as the time measurement, and you would have a height measurement for wall contact.


#7    studes      (see all posts) 2007/08/24 (Fri) @ 12:12

That was me, MGL, not Tango.

Sorry!


#8    Rally      (see all posts) 2007/08/24 (Fri) @ 12:54

Mike, that would take some resources.  I’m not sure how much help Greg has with hittracker, but if there are 20-30 homeruns per night, a small team could cover the job.  For tracking all flyballs hit, you’d need someone for every game.  MLB could do it, STATS or BIS could do it.


#9    John Beamer      (see all posts) 2007/08/24 (Fri) @ 12:54

You’d think that HitTracker would be on this project, as measuring outfielder defence is a lot more important than measuring how far home runs really go

To sabermetricians perhaps, but to your average fan on the street, no. Also, the resources required to track every fly ball are significantly greater than those required to track HRs.

I know Greg is expanding Hit Tracker to include some close-to-the-wall flyballs. I expect it requires a step change in resource to expand this to cover the entire major leagues.


#10    tangotiger      (see all posts) 2007/08/24 (Fri) @ 14:02

Read the article from post 5, and it’s a good job by Michael in asking Mitchel and David to create a “simplified” version of their PBP systems.

What I don’t understand is if Michael said this:

David, Mitchel and I decided to reduce zone ratings back to the bare essentials—counts of BIP defined by distance, slice and trajectory—and compare the results based on BIS and STATS data.

Why bother asking Mitchel and David for the results of processing these input data, instead of simply comparing the input data? We still can’t tell if it’s the inputs or the processor.  While MGL and Pinto reworked their systems to only accept those inputs, it’s not clear that their processors are all that equivalent.  MGL uses “adjustments”, while Pinto uses “maximum likelihood estimation”.

The Dewan model seems to take a simplified version of Pinto’s MLE, and treats each set of parameters independently, to the point that if Andruw Jones is the only one to get BIP with a particular set of parameters, then he gets compared to himself, and gets a zero (likely confirming the results Michael is alluding to).  I’m not sure Pinto’s MLE process overcomes this, however.
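
To make the singleton-bucket problem concrete, here is a minimal Python sketch of a bucket-based plus/minus. It is not Dewan's or Pinto's actual code: the function name `bucket_plus_minus`, the bucket labels, and the three sample balls are all made up for illustration. Each ball is compared with the out rate of all balls in the same bucket, so a fielder who is alone in a bucket is compared only with himself.

```python
from collections import defaultdict

def bucket_plus_minus(balls):
    """balls: list of (fielder, bucket, out), where out is 1 if caught, else 0.
    Returns plays made above the bucket-average out rate, per fielder."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [outs, chances]
    for _, bucket, out in balls:
        totals[bucket][0] += out
        totals[bucket][1] += 1

    plus_minus = defaultdict(float)
    for fielder, bucket, out in balls:
        outs, chances = totals[bucket]
        # Credit is the result minus the average result for this bucket.
        plus_minus[fielder] += out - outs / chances
    return dict(plus_minus)

# Hypothetical sample: only Jones sees the first bucket.
balls = [
    ("Jones", ("CF", "deep", "slice 7", "fly"), 1),
    ("Jones", ("CF", "medium", "slice 8", "fly"), 1),
    ("Other", ("CF", "medium", "slice 8", "fly"), 0),
]
result = bucket_plus_minus(balls)
```

In this sample the deep catch adds nothing to Jones's total, because the bucket average is his own result. All of his credit comes from the shared bucket.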


#11    Rally      (see all posts) 2007/08/24 (Fri) @ 14:12

What would be really cool to see is MGL and David trading data - MGL figuring UZR from BIS and David PMR from STATS.


#12    MGL      (see all posts) 2007/08/24 (Fri) @ 15:41

Yeah, of course the idea is just to compare the data.  I’m not sure I am allowed to disseminate the raw data though.  To be on the safe side, I probably shouldn’t.  I take all of my agreements very seriously.


#13    Tangotiger      (see all posts) 2007/08/24 (Fri) @ 15:50

I’m wondering if you (STATS data) and Pinto (BIS data) could “bin” them by counts (park, distance, slice, trajectory).  After all, this is not raw data, and you are not linking it to any player (or even team).  STATS/BIS must have less of a problem with this, than with UZR/PMR, I’d think.


#14          (see all posts) 2007/08/24 (Fri) @ 19:41

MGL, Thanks for linking the article and for running the numbers.

Rally—DRA is clearly better than DFT, Fielding Win Shares, and Fielding Linear Weights.  Perhaps you can test JAARF against the same data set here.  The data is all laid out.  Thanks for linking to the second installment.

Tango—David (Pinto) told me he used depth information for the ratings he gave me, so a simple lack of depth data is not the source of BIS/STATS differences.  Actually I think David Pinto uses the same database Shane Jensen does for SAFE.

I am all in favor of the maximum legal disclosure of Zone Data, though I’m pretty sure the mega-corporate owner of STATS and the savvy millionaire owner of BIS would have some pretty tight contractual constraints on data disclosure.  After all, once the data is out, they lose their monopoly (or duopoly) on fielding consulting to the major league teams.

Maybe a limited form of ‘binning’ the data would be OK.  If not, one of the points of the article is that if we can’t have public disclosure of raw data, but we at least have independent analysts running the numbers using a reasonably consistent methodology, we can be more confident of ratings.

By the way, you are so right about hang time.  Using the pop-up, fly ball, fliner and line-drive distinctions is about as precise as recording whether a flyball landed in left, center or right.  That being said, over the course of two full seasons, the line-drive ratios should average out, based on research published in The Hardball Times Annual 2007, and what matters is the flyball/groundball tendencies of the pitching staff and the mix of right- and left-handed batters, which is why DRA can correlate with SAFE, S-PMR, and the average of S-UZR and S-PMR about as well as SAFE and UZR correlate with each other _IF_ you only look at full-time fielders with two or more full-time seasons.

What you really need the hang time information for are the multitude of players who don’t play full-time at one position year after year.

Based on the presentation I saw at SABR, a ‘perfect’ zone system could be set up the following way, and probably for not that much money.

(1) Purchase and install in each major league stadium a single video camera that would be kept still, and have a view of the complete field.  Videotape every minute of every game.  No human being needs to be there.

(2) After each game, fast-forward the tape to review each batted ball in play, whether caught or not caught.  It shouldn’t take more than ten minutes, if the video can fast forward fast enough.  Use a stopwatch to determine hang-time for balls in the air (or perhaps the video system can ‘count’ the number of ‘frames’ during which the ball was in the air) and do a similar calculation to determine when a ground ball reached or passed the nearest infielder.  Mark on the two-dimensional video screen where the ball lands or is caught (or if a grounder, leaves the infield).  If the ball dropped in for a hit, record whether it was a single, double or triple.

(3) Whip out your handy-dandy matrix equation to convert the (x, y) coordinates on the video screen to (u, v) coordinates (or to polar coordinates of depth and angle) on the true playing field.

(4) If you want to get really fancy, the SABR 37 guy also proposed tracking initial and final position of the fielder, as well as the time it took to reach the final position (where he caught the ball, if he did).

(5) Now the hard part.  How to devise a kernel smoother and numerical integration technique of the kind Shane is doing if we now have _three_ continuous variables (depth, angle, time) mapping onto a single scalar (the integer 0 or 1, for out or hit, or maybe .275 for out, .47 for single, .80 for double, 1.05 for triple, etc.).  Once the kernel smoothing and numerical integration are done, you have a mapping from continuous variables (depth, angle, time) to a continuous variable P{out} or E[runs].

(6) If we can’t figure out (5), we muddle our way through to a satisfactory way of creating buckets defined by ranges of outcomes for each of our continuous variables and do the kind of calculation that Mitchel and David have been doing for the past few years.  I believe you hit the nail on the head (using the Andruw example) why the first go at Plus/Minus didn’t work in the outfield. BIP bucket sample sizes of one make every outfielder average.
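
Step (3) above could be sketched as follows. The comment does not spell out the SABR 37 equation, so this assumes one standard choice for a fixed camera looking at a flat field: a 3x3 planar projective transform (homography) `H` that has already been calibrated from known landmarks such as the bases. The function names, the sample matrix, and the angle convention (home plate at the origin, 0 degrees pointing to straightaway center) are all assumptions made for illustration.

```python
import math

def screen_to_field(H, x, y):
    """Apply a 3x3 projective matrix H (nested lists) to a screen point (x, y).
    Returns field coordinates (u, v)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # divide by w to undo the perspective scaling

def to_polar(u, v):
    """Convert field coordinates to (depth, angle in degrees).
    Assumes home plate at (0, 0) and the v axis pointing to center field."""
    depth = math.hypot(u, v)
    angle = math.degrees(math.atan2(u, v))
    return depth, angle

# Hypothetical calibration: a pure 2x scaling with no perspective distortion.
H = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0]]
u, v = screen_to_field(H, 10.0, 20.0)
depth, angle = to_polar(0.0, 300.0)
```

A real calibration would have a non-trivial bottom row in `H`, which is what lets a single matrix handle the foreshortening of a camera that views the field from an angle.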

I think the total budget would be the cost of the videomachines, rental space for the videocamera at the ballparks, and a salary for one or two people.
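
For the smoothing half of step (5) above, one textbook option is a Nadaraya-Watson estimator with a Gaussian product kernel over (depth, angle, time). The sketch below is not Shane Jensen's SAFE method and leaves out the numerical integration entirely; the function name, the bandwidths, and the two sample balls are hypothetical.

```python
import math

def smoothed_out_prob(query, observations, bandwidths):
    """Nadaraya-Watson estimate of P(out) at query = (depth, angle, time).
    observations: list of ((depth, angle, time), out) with out equal to 0 or 1.
    bandwidths: one smoothing width per variable, in that variable's units."""
    num = den = 0.0
    for point, out in observations:
        # Squared distance, with each variable scaled by its own bandwidth.
        z2 = sum(((q - p) / h) ** 2 for q, p, h in zip(query, point, bandwidths))
        weight = math.exp(-0.5 * z2)
        num += weight * out
        den += weight
    return num / den if den > 0 else float("nan")

# Hypothetical balls: one caught at 300 feet, one dropped at 320 feet,
# both hit straight to center with 4.0 seconds of hang time.
observations = [((300.0, 0.0, 4.0), 1), ((320.0, 0.0, 4.0), 0)]
bandwidths = (10.0, 5.0, 0.5)
p_mid = smoothed_out_prob((310.0, 0.0, 4.0), observations, bandwidths)
p_near = smoothed_out_prob((300.0, 0.0, 4.0), observations, bandwidths)
```

Replacing the 0/1 outcome with run values would give a smoothed E[runs] instead of P{out}. Unlike fixed buckets, every nearby ball contributes something, so no fielder ends up compared only with himself.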


#15    Rally      (see all posts) 2007/08/24 (Fri) @ 21:58

Certainly any public disclosure of STATS or BIS data can’t be done - but I wonder if the companies would have a problem with 2 researchers who already have access to the data collaborating on a project.  The only results we’d (the readers) have is something like:

A Jones
+25 PMR, BIS
+17 PMR, STATS
+2 UZR, STATS....and so on.


#16          (see all posts) 2007/08/28 (Tue) @ 09:16

The last installment of the article is in today’s (August 28th) Hardball Times.  The link is: http://www.hardballtimes.com/main/article/how-to-calculate-outfield-dra/


#17    Rally      (see all posts) 2007/08/28 (Tue) @ 11:20

I like the approach of looking at groundball outs above average, rather than groundout/flyout ratio.


#18    Peter Jensen      (see all posts) 2007/08/28 (Tue) @ 11:26

Why is so much energy wasted by well-intentioned people developing fielding metrics for non-PBP data for current players when we have two free sources of PBP data?  Retrosheet is certainly not perfect, but it does record every play made by a fielder as accurately as BIS or STATS.  Plus it records hit ball type and the player who ultimately fields the ball, which gives some location information for base hits.  This can be supplemented with the hit ball location provided in the MLB gameday files.  This information is also not perfect since it records where the ball is picked up instead of where it hits the ground, but it seems to describe the on field location pretty accurately.  Combining these two sources comes very close to the data in the proprietary BIS and STATS sources and is a heck of a lot better than guessing from non PBP data. 

It is OK to dream of the day when we have complete player positioning from implanted microchips and 3D ball movement that can calculate hang time to the hundredth of a second, but it seems that currently too many people are using those future possibilities as an excuse to not make the best use of the abundant information that we already have.


#19    Tangotiger      (see all posts) 2007/08/28 (Tue) @ 12:23

I believe that Michael’s purpose was to create a system for leagues/years where there is no PBP information, and is testing his system for the current years by comparing against the PBP systems.

So, if you are trying to evaluate the fielding of Ted Williams, or some college kid, he’s giving you a tool for it.

***

As for hang time, I’d be happy if it was within 0.5 seconds accurate.


#20    Rally      (see all posts) 2007/08/28 (Tue) @ 12:25

The reason I’m interested is to better rate players from years we don’t have pbp data.  If Michael can show it works reasonably well for modern players, compared to pbp, then we can use the method to go back and compare Ozzie Smith to Rabbit Maranville or Willie Mays to Tris Speaker.


#21    Peter Jensen      (see all posts) 2007/08/28 (Tue) @ 13:17

My fault for just reading the second article and not the first.  I would be very surprised if much of the difference in systems based on STATS and BIS data is in the data rather than in the methodologies used to process the data.  But, as others have mentioned, until someone actually has both data sets to compare we won’t know for sure.  Has anyone who has access to either BIS or STATS bothered to compare with MLB Gameday data?  It would be interesting to know how much difference there is on comparable data.  I compared MLB Gameday with Hitracker Home Runs.  I was hoping to compare with Greg’s in park fly balls but he removed the early season data from his web site before I had a chance to download it.


#22    Tangotiger      (see all posts) 2007/08/28 (Tue) @ 14:21

That’s a great idea Peter!  You can even compare it to the “large zoned” Retrosheet, to at least see if everyone is in the same ballpark.  Counts by:
1. park
2. park, zone
3. park, zone, trajectory
would be a good start.

You’d want all the systems to have an exact match for #1, of course.  In the BIS and STATS finer zones, you’d probably want some “margin” count, to show that it could go either way.  That is, if Retrosheet has it at 78D, there’s enough uncertainty there that as long as BIS and STATS are in the general range, that would count as a match.
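
The three levels of counts above could be produced with something like the sketch below. The record layout (dicts with `park`, `zone`, and `traj` keys) and the sample balls are hypothetical, since neither the BIS nor the STATS format is public. The "margin" idea is not implemented here; it would need a table of which fine zones count as neighbors of each Retrosheet zone.

```python
from collections import Counter

def bin_counts(balls, keys):
    """Count batted balls by the chosen fields, e.g. ("park",) or ("park", "zone")."""
    return Counter(tuple(ball[k] for k in keys) for ball in balls)

def compare_counts(a, b):
    """Return {bin: (count_a, count_b)} for every bin where two sources disagree."""
    return {k: (a[k], b[k]) for k in sorted(set(a) | set(b)) if a[k] != b[k]}

# Hypothetical records for the same two batted balls as coded by two sources.
source_a = [
    {"park": "ATL", "zone": "78D", "traj": "F"},
    {"park": "ATL", "zone": "8", "traj": "L"},
]
source_b = [
    {"park": "ATL", "zone": "78D", "traj": "F"},
    {"park": "ATL", "zone": "78D", "traj": "F"},
]

by_park = compare_counts(bin_counts(source_a, ("park",)),
                         bin_counts(source_b, ("park",)))
by_zone = compare_counts(bin_counts(source_a, ("park", "zone")),
                         bin_counts(source_b, ("park", "zone")))
```

Only binned counts leave each analyst's machine in this setup, with no player or team attached, which is the point of the suggestion in post #13.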


#23          (see all posts) 2007/08/28 (Tue) @ 20:37

Peter,

I believe there are significant data differences between BIS and STATS because systems that used the same Zone Data (SAFE and S-PMR for BIS; UZR and S-UZR for STATS) had correlations of .85-.89, whereas the closest correlation for systems using different data was .70 (SAFE and S-UZR) and as low as .60.  That is a big difference.  I believe it can be fixed using the matrix equation shown at SABR 37.

Until then, and given these differences, I believe in taking the average of ratings based on both data sets.  Adding in the latest Retrosheet coding would be terrific.  Do you know anybody who is using the data available for the most recent seasons to create zone-type ratings?
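
A quick simulation shows why averaging could help, under one strong assumption: each data source measures the same true skill with its own independent error. If the two sources share a bias, the gain shrinks. All numbers below (2000 fielders, spreads of 10 runs) are invented for illustration and are not estimates from BIS or STATS.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
true_skill = [rng.gauss(0, 10) for _ in range(2000)]
# Two hypothetical sources: same truth, independent measurement error.
rating_a = [t + rng.gauss(0, 10) for t in true_skill]
rating_b = [t + rng.gauss(0, 10) for t in true_skill]
average = [(a + b) / 2 for a, b in zip(rating_a, rating_b)]

r_a = pearson(true_skill, rating_a)
r_b = pearson(true_skill, rating_b)
r_avg = pearson(true_skill, average)
```

With independent errors of equal size, averaging halves the error variance, so `r_avg` comes out higher than either `r_a` or `r_b`.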

In addition to providing good historical ratings, DRA provides contemporary ratings that, with the current exception of third and first, basically correlate as well with the state-of-the-art systems as they do with each other, at least with two full seasons of data.  Before this article came out, did anyone believe that that would ever be true of a non-pbp system?

The publication of the DRA book about the greatest fielders (any suggestions and help in getting published most welcome) would introduce a completely transparent system that would always be there as a ‘sanity’ check for the Zone Data systems.  Remember, when Mitchel was working for the Cardinals, fans couldn’t get good STATS-based ratings as a second opinion to BIS-based ratings.  Tom Tippett long ago stopped publishing his Gold Glove essays, which were based on close examination of STATS data.


#24          (see all posts) 2007/08/28 (Tue) @ 20:46

Peter,

Ah, I think I just got your point.  Why bother with DRA when you can get contemporary near-Zone Data from Retrosheet, and also use MLB Gameday?  Regarding the latter, the fact that it records where the ball was caught, not where it lands, is a big problem.  While at SABR 37, I met some Japanese analysts who work for a Japanese company that has similar “where the ball is caught” data.  They haven’t found it useful.

Regarding Retrosheet, I attended the SABR 37 seminar on Retrosheet data extraction, and I was amazed by the detail available for _some_ games, as scoring detail varied considerably over the years.  For seasons from 2003 to the present, are the batted ball descriptions reasonably complete?  Do you know some folks who are working with it to create Retrosheet Zone Ratings?


#25    tangotiger      (see all posts) 2007/08/28 (Tue) @ 20:57

Mike,

I suggest publishing through http://www.lulu.com

There are no startup costs whatsoever.  It’s the best way to get into the game.


#26    Peter Jensen      (see all posts) 2007/08/29 (Wed) @ 01:17

Mike - As I said in post #21 I apologize for criticizing your work without reading the first part of your series and understanding that your aim was to develop a system that would be useful for the pre-retrosheet years.

Since 2003 the retrosheet data on fielding is minimal; all hit balls are described as L,F,P,G (in field 47 as BATTED BALL TYPE) and the fielder that fields the ball is given (in field 46 as FIELDED BY).  That means that a ground ball through the 5-6 hole is described as a G single fielded by the left fielder.  In addition there is sometimes additional hit location information in field 50 (HIT LOCATION) and as part of field 29 (EVENT TEXT).  This is not the zone information that had been used in some previous years but is limited to the hole location for grounders (shortstop third base hole given as 56), gap location for outfield balls (78 or 89), and sometimes a letter designation as to the depth of balls to the outfield.  In my opinion this additional hit location is too inconsistent to be useful and I have ignored it.

I believe, but I do not know for sure, that balls that are classified as ground balls by Retrosheet had hit the ground before they passed an infielder and are not line drives through the infield that subsequently were rolling on the ground when they reached an outfielder.  This is an important distinction that should be verified through comparisons with video or by somebody from Retrosheet. 

Gameday hit ball location information is given as an XY coordinate where the ball was fielded.  The XY coordinate can be translated to a distance - vector location.  In this form the vector is useful for placing the path of ground balls through the infield.  These vectors should be compared with BIS and/or STATS data by those who have access to the proprietary systems.  This might be a way that BIS data can indirectly be compared with STATS without actually revealing proprietary information.  If those with access to BIS data compared it with Gameday and noted discrepancies and those with access to STATS data did the same, then perhaps only the discrepancies could be compared and still remain within legal bounds.
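
The XY-to-vector translation could look roughly like the sketch below. The home plate pixel position and the feet-per-pixel scale are passed in as parameters because the real Gameday calibration is not given in this thread; the values used in the example are placeholders, not measured constants. The sketch also assumes that image y grows downward and that 0 degrees points from home plate to straightaway center.

```python
import math

def xy_to_distance_angle(x, y, home_x, home_y, feet_per_unit):
    """Convert an image coordinate to (distance in feet, angle in degrees).
    Negative angles are toward left field, positive toward right field."""
    dx = x - home_x
    dy = home_y - y  # assumption: image y increases toward the bottom
    distance = math.hypot(dx, dy) * feet_per_unit
    angle = math.degrees(math.atan2(dx, dy))
    return distance, angle

# Placeholder calibration: home plate at (125, 200), 2.5 feet per pixel.
dist, ang = xy_to_distance_angle(125, 100, home_x=125, home_y=200, feet_per_unit=2.5)
```

Since the coordinate marks where the ball was fielded, the distance is most meaningful for outfield hits and the angle for grounders through the infield.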

I have worked on a Retrosheet based fielding metric based on a no-zone system where a fielder is responsible for all balls hit to his field or his quarter of the infield.  This metric assumes that the distribution of balls hit to a fielder’s position will contain the same fraction of “fieldable” balls and “unfieldable” balls for each fielder.  Whether this is true or not is debatable, but DIPS theory also assumes that there is little control over the fieldability of hit balls by an individual pitcher, so this should be even more true for an entire pitching staff.  Of course this assumption would be true only for large numbers and not small samples, so the metric would only be useful for regular starters and would be more accurate over several years than for individual years.  For this metric the Gameday information is accurate enough to not be a problem at all.


#27    Rally      (see all posts) 2007/08/29 (Wed) @ 09:42

I have worked with 2003 to 2006 retrosheet data.  I called it TotalZone.  In short, it uses the “fielded by” and bbtype to charge responsibility for all hits in play to a fielder.  I had to make some assumptions, 50% of singles to left are charged to the 3b, and 50% to shortstop.  Same for hits to cf (2b/ss) and rf (1b/2b).  All extra base hits on groundballs to left or right are charged to either the 1B or 3B, assuming they had to be down the line.
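
A minimal sketch of that allocation rule, as I read the description above, might look like this. It is not Rally's actual TotalZone code. Two details are assumptions of mine: extra-base grounders fielded in left go to the 3B and those fielded in right go to the 1B, and any hit not covered by the stated rules is charged in full to whoever fielded it.

```python
# Ground-ball singles: split 50/50 between the two nearest infielders.
GROUNDER_SINGLE_SPLIT = {
    "LF": {"3B": 0.5, "SS": 0.5},
    "CF": {"2B": 0.5, "SS": 0.5},
    "RF": {"1B": 0.5, "2B": 0.5},
}
# Ground-ball extra-base hits to left or right: assumed to be down the line.
GROUNDER_XBH = {"LF": {"3B": 1.0}, "RF": {"1B": 1.0}}

def charge_hit(bbtype, fielded_by, bases):
    """Return {position: share of the hit} for one hit in play.
    bbtype is 'G', 'F', 'L' or 'P'; bases is 1 for a single, 2 or 3 otherwise."""
    whole = {fielded_by: 1.0}  # fallback (assumption): charge the fielder himself
    if bbtype == "G":
        if bases >= 2 and fielded_by in GROUNDER_XBH:
            return GROUNDER_XBH[fielded_by]
        return GROUNDER_SINGLE_SPLIT.get(fielded_by, whole)
    return whole

single_to_left = charge_hit("G", "LF", 1)
double_to_right = charge_hit("G", "RF", 2)
liner_to_center = charge_hit("L", "CF", 1)
```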

I adjusted for handedness, ballpark, and bases occupied situation.  It was discussed here:

http://www.insidethebook.com/ee/index.php/site/comments/totalzone_a_new_fielding_measure/

and here:
http://mvn.com/mlb-stats/2007/04/23/totalzone-a-new-defensive-measure/

and here:
http://mvn.com/mlb-stats/2007/04/28/total-zone-part-2/

And put the data here:
http://home.comcast.net/~briankaat/tz0506.xls


#28    Peter Jensen      (see all posts) 2007/08/29 (Wed) @ 11:31

Rally - I read the discussions of TotalZone when you presented them back in April.  It follows much the same line of thinking as the work I have done on a Retrosheet metric. The accuracy of such a system depends on how the hits are apportioned to each fielder.  I understand why you made the assumptions you did, but the actual distributions are much more complex, dependent both on the absolute range of each fielder and the relative range of adjacent fielders.  Unfortunately, your simplified distributions are not close to being correct, so neither are your fielder rankings.  I think the problem of assigning reasonably accurate distributions is solvable, but I am not sure that I have found the best solution yet.  A small change in methodology makes a significant change in the rankings.  We both need to work on this some more, but I am confident that the result will be very close to the best systems using proprietary data.


#29    Rally      (see all posts) 2007/08/29 (Wed) @ 12:35

I don’t think there’s anything you can do for the ball distributions.  If all you know is that it’s a groundball picked up by the centerfielder, there’s not much you can do except recognize it’s a rough estimate and live with it.  Craig Biggio benefits from fewer GB going into center because he plays next to Everett and not Hanley Ramirez.  Yet it’s a good sign that the system is able to give a high rating to Jack Wilson even though he’s playing next to Jose Castillo.

As far as the rankings “not being close”, it looked to me like it’s rating the commonly accepted good fielders well and the poor ones low.  But I don’t want to get into semantics, I should see how it correlates with UZR.


#30    Rally      (see all posts) 2007/08/29 (Wed) @ 12:54

For the players in the last link I did in #27, for 2005-06, the correlation with UZR is .711.  I think that’s something to be reasonably proud of.  I’ll break that out by position later.  And a big thanks to MGL for including retroID in the UZR file - makes the job so much easier.


#31    Rally      (see all posts) 2007/08/29 (Wed) @ 13:04

That didn’t take long.  It’s better in the infield than the OF, especially strong in the middle:

1B .671
2B .859
3B .760
SS .885

LF .679
CF .669

But it blows up in right field:
RF .124

I know I’m not the first guy to have trouble in RF, I think it was the most problematic position for David Gassko’s Range stat as well.


#32    tangotiger      (see all posts) 2007/08/29 (Wed) @ 13:06

What is the correlation between UZR and simply outs made per BIP (i.e., DER)?


#33    tangotiger      (see all posts) 2007/08/29 (Wed) @ 13:46

Hmmm… that didn’t make any sense.  I guess I meant on an OF/IF basis.  So, as a group, what’s the correlation between OF DER and UZR, and IF DER and UZR.


#34    Peter Jensen      (see all posts) 2007/08/29 (Wed) @ 13:56

Rally - I still think that much improvement can be made with hit ball distribution, but if you are satisfied with your .71 correlation with UZR I guess we will have to live with that disagreement.


#35    Rally      (see all posts) 2007/08/29 (Wed) @ 14:28

I’d welcome any improvement that can be made, but there’s only so much you can do with limited data.  As for the .71 correlation, that’s actually higher than PMR and SAFE’s correlation to UZR, from looking at Michael’s THT article.


#36    Rally      (see all posts) 2007/08/29 (Wed) @ 14:42

Correction, the article just covers outfielders, so those systems do match up better with UZR for outfield.  There really isn’t anything you can do to the hit distribution for outfielders.  Sometimes the right fielder will field the ball even though it was hit closer to center, and the cf dived and missed it.  Such plays are rare.  Better allocation of the hits would be an issue for infield, but that’s where my system best matches UZR.

Tango,

Calculating DER for infield and outfield will take some time.  Once I get home to the database I could look at gbouts/gb and fly+line hits/fb+ld.  Should popups be included?


#37    tangotiger      (see all posts) 2007/08/29 (Wed) @ 15:32

Popups are usually noise and should be discarded, unless you believe that Jeter et al make a noticeable number of “run into the OF or stands” catches.


#38    Rally      (see all posts) 2007/08/29 (Wed) @ 20:50

Compared UZR to DER, years 2005/2006:

infield: .894!  Just knowing what percent of total ground balls are turned into outs gets you incredibly close to team infield UZR, without any adjustments at all.

outfield: .393 From retrosheet that’s outs per flyball + linedrive.  Without accounting for line drive/ flyball mix, parks, left/right, etc. you aren’t very close.
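
For anyone who wants to repeat this, the two rates could be computed along the lines below. The sketch uses the Retrosheet batted ball codes from post #26 (G, F, L, P) and drops popups, following post #37. The function name and the nine sample plays are made up; this is not the query Rally ran.

```python
def split_der(plays):
    """plays: list of (bbtype, is_out) pairs, with bbtype 'G', 'F', 'L' or 'P'
    and is_out equal to 1 or 0. Popups are ignored.
    Returns (infield_der, outfield_der); a side with no chances gives None."""
    gb = gb_outs = air = air_outs = 0
    for bbtype, is_out in plays:
        if bbtype == "G":
            gb += 1
            gb_outs += is_out
        elif bbtype in ("F", "L"):
            air += 1
            air_outs += is_out
    infield = gb_outs / gb if gb else None
    outfield = air_outs / air if air else None
    return infield, outfield

# Hypothetical team sample: 4 grounders, 2 flies, 2 liners, 1 popup.
plays = [("G", 1), ("G", 1), ("G", 0), ("G", 1),
         ("F", 1), ("L", 0), ("F", 1), ("L", 0), ("P", 1)]
infield_der, outfield_der = split_der(plays)
```

Because flies and liners share one denominator here, a staff that allows more line drives will pull the outfield rate down regardless of how well the outfielders play.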


#39          (see all posts) 2007/08/30 (Thu) @ 00:07

Peter—
I apologize for making you feel you had to apologize.  No problems.  Thanks for explaining the Retrosheet data.  Considering the price, it’s pretty amazing stuff. 

Rally—
I can’t look at Total Zone tonight, but will try this weekend.

What was the 2005-06 UZR sample you used? 

Right field is tough for everyone.  I wonder what the sample I reported shows for correlation between BIS and STATS in right.

Tango—
The only infielder who seemed to have significant flyball skill under Dewan’s Plus/Minus (which, rightly, has a ball-hogging adjustment for infielder flyballs) was Orlando Hudson.  Maybe 5 runs a year.  I consider that a reasonable upper bound.


#40    Rally      (see all posts) 2007/08/30 (Thu) @ 09:10

Michael,

MGL has posted everyone’s UZR for 2003 up to July of this year.  Just search through the old threads on this site.  I compared UZR to the fulltime players I gave out ratings for in post #27.


#41          (see all posts) 2007/08/31 (Fri) @ 11:34

Rally,

What kind of correlation do you get in the outfield if you only include players who played two Full-Time Seasons (defined in the article as 130 or more games played at one outfield position for one team) and take their two-year (2005-6) average runs saved per 1450 innings played?

The reason I ask is that I believe, based on THT Baseball Annual 2007, that line-drive percentages tend to average out over a two-year period.


#42    Rally      (see all posts) 2007/09/01 (Sat) @ 08:16

It’s a busy weekend, but I should have some time Monday to look at the 2 year avg correlation in right field.


#43    Rally      (see all posts) 2007/09/06 (Thu) @ 12:58

I did the correlation for RF who played fulltime in both 05 and 06.  Even using 2 years, it’s no better between totalzone and UZR.  Actually, it’s negative (-.07).


#44          (see all posts) 2007/09/06 (Thu) @ 23:22

Rally,

Thanks for running your numbers.  Do you have ratings for the outfielders in The Hardball Times article, i.e., outfielders with at least two Full-Time Seasons between 2003-05?

Right field is a b1tch.  Why do you think that is?


#45    tangotiger      (see all posts) 2007/09/06 (Thu) @ 23:50

Can you guys correlate against this list:
http://www.tangotiger.net/scouting/pos2006_RF.html

