THE BOOK--Playing The Percentages In Baseball

Tuesday, April 01, 2008

Cross-checking the data providers

By Tangotiger, 09:21 AM

Fabulous article by Peter Jensen:

Let’s take the two observers in closest agreement, BIS and Greg, split the difference between them and call that the best guess of the actual hit location. What is the minimum distance and degrees that will have 95 percent of both Greg’s and BIS’ observations included? The answer is +-18 feet and +-4 degrees. That’s a pretty big area. It is two whole zones in width.
...
It doesn’t matter if you have three observers or 3,000, the composite data will never have any less error than that of the two closest. Having many observers is only useful for finding those two best observers.

Fantastic stuff, and a great point.  Peter is right that, if I throw in as many observers as I can, I wouldn't want to weight each one equally.  The better the estimator (relative to the other 2999), the more I would weight that observer.  Ideally, you'd be down to just one observer, the perfect guy.  Realistically, you might have one observer carry 10% of the weight, another 9%, another 8%, and on and on, such that you only need about 20 observers out of the 3000.

However, his conclusion that the error is now 22 feet doesn't necessarily mean that's bad.  If the two closest observers were within 18 feet of each other, but the third observer was in fact the best for a particular data point, I'm not sure that we'd want the 18 feet.  For example, MGL and Marcel are built on similar forecasting engines, while Chone is not.  Selecting the two closest in agreement (MGL, Marcel) doesn't mean that it's necessarily bad to also include Chone.  Perhaps Greg and BIS are biased in the same manner (relying more on video than on in-park observation).

Question to Peter: what is the correlation of STATS with Greg, and of BIS with Greg?  And what weight would each of those two get?  Repeat for the other combinations.  Couldn't we come up with a better estimate of where a ball landed by weighting the sources differently?


#1    Peter Jensen      2008/04/01 (Tue) @ 11:39

It doesn't matter what the actual landing location is.  There is no possible actual landing location for which the observed landing points of any two of the four systems fall within 18 feet and 4 degrees of it 95% of the time.  That doesn't mean that Greg or BIS or STATS might not be exactly right with 100% of their observations.  But if Greg is exactly right, it means that both BIS and STATS, who are each trying equally hard to be correct, are each going to be at least 18 feet or 4 degrees away from the correct (Greg's) location at least 5% of the time.

You can’t weight systems and get a smaller error.  You can’t add observers and get a smaller error, unless one of those observers is closer to one of the two “best” observers that you already have.


#2    Tangotiger      2008/04/01 (Tue) @ 12:35

I'm not suggesting that you WANT to get a smaller error.  As in the MGL/Marcel example, I can get the smallest error by having two similarly biased systems.  But what I do want is the best estimate.

It's like when people say "All the fielding systems say...".  But if Win Shares, Baseball Prospectus, Pete Palmer, etc. all use a very similar process and the exact same data, then minimizing the error doesn't help me, since they are all similarly biased.  I'll take UZR at 100% over these other 3, even if these other 3 are closer to each other than to UZR.

So, what I am suggesting is that perhaps there is bias with Greg/BIS, in that they rely heavily, if not exclusively, on the same data feed.  And so, the fact that two sources show the smallest discrepancy between them does not, in and of itself, make them the best sources, nor is minimizing that discrepancy even necessarily what we want.


#3    Greg Rybarczyk      2008/04/01 (Tue) @ 12:45

Nice work, Peter.  I'll readily admit that it is tough to be accurate on most balls hit inside the fences, given only the MLB video.  I think I'm probably better than the in-park guys due to my ability to replay the video and sometimes see multiple angles, but still, the accuracy is nowhere near as good as it is for homers.

I would love to dive into your raw data if that wouldn’t get you into trouble.

One more thing to consider is that the error is not constant across the field (which might not be apparent from the zones you had everything broken into): what I mean is, it is certain that precision will be better the closer the ball landing spot is to a landmark - whether that is the warning track, the fence, the foul line, etc.  In the extreme case, if a ball landed on second base, everyone would have it nailed, and if it landed two feet behind it, same thing.  That’s why you did the right thing in excluding home runs - there are a multitude of landmarks beyond the fence that make it easy to be very precise about the landing spot, whereas it is very hard to be precise for a fly ball in the green expanse of center field, all the more so when the cameraman is zoomed in :(

It would be fantastic, at least for the purposes of tracking balls, if the field were marked, but I doubt that would ever happen, and I think I might oppose that on aesthetics.  For tracking home runs, I wish MLB would put tick marks on the foul poles for height, but not sure if the field can be marked unobtrusively… too bad.  Guess we’ll have to wait for the camera or radar system to have true precision on in-park hits…


#4    Tangotiger      2008/04/01 (Tue) @ 12:58

At the old multi-purpose fields (say when the 49ers played at Candlestick), we had great measurement lines.  I don’t think Giants fans complained, did they?

Regardless, I'm OK with the NFL broadcast having the 10-yard markers on the field of play.  I'd be OK with a two-second image of the lines being superimposed on the screen when a fielder is about to catch a ball.


#5    Peter Jensen      2008/04/01 (Tue) @ 14:46

Thanks Greg.  I was aware that the error is less near landmarks.  There are lots of opportunities for various biases to creep in with human observations.  The data collecting services are aware of that and that’s why STATS has at least two sources that they compare for consistency.  Only the STATS data came in zones for the angular portion.  BIS was directly on degrees and both BIS and STATS gave distances directly.

Tango - There seems to be a lack of communication between us in the above posts.  I don’t know how to resolve it.


#6    Greg Rybarczyk      2008/04/01 (Tue) @ 14:53

Sportvision is the company that does the 1st and 10 line for NFL football.  They could probably work out some way to show those lines on the screen, either on the broadcast or on other video streams they could make available somehow, so as not to offend the masses.

Peter, are you going to be in San Francisco next month?  We could ask them about that…


#7    Peter Jensen      2008/04/01 (Tue) @ 15:08

Yes, I’m going.


#8    Tangotiger      2008/04/01 (Tue) @ 15:24

Peter, ok, let me try a different way.

Let’s say you have one fantastic observer, the one guy you trust.  Let’s call him Goddy. And then you have two other observers, who you think are independent, but are covertly acting in concert with each other.  They are cheating.

You record the observations of these 3 sources.  Goddy is off by 20 feet from each of the other 2 guys.  The other 2 guys are off by 5 feet from each other.

There’s no reason to think that we want the two guys who are closest to each other, since the goal is not to minimize the differences, but to get the best data.  We would hope that we could also minimize the differences as a proxy.  But, this only works if all data sources are independent and unbiased.

In the case of Greg/BIS, they probably share very strong biases (they record their data predominantly off the same video feed).  It’s possible that STATS gives “perfect” data (Goddy-like) and the other two, while not acting in concert, are simply more biased, and biased in the same direction.

(Note to Greg/BIS: I’m not accusing anyone!  Just making the case here that having agreement doesn’t necessarily constitute reliability of data.  Just consistency of data.)
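A minimal sketch of the Goddy scenario, assuming landing spots are flat x/y coordinates in feet and treating Goddy's reading as the truth (something nobody actually knows in practice). The observer names A and B and all coordinates are hypothetical. It shows that the rule "trust the pair that agrees most closely" selects the two colluding observers, and their midpoint is still about 20 feet from the true spot:

```python
import itertools
import math

# Hypothetical readings (feet) for one batted ball on a flat x/y plane.
# Goddy is assumed exact; A and B share the same bias.
x = math.sqrt(20**2 - 2.5**2)  # puts A and B 20 ft from Goddy and 5 ft apart
observations = {
    "Goddy": (0.0, 0.0),
    "A": (x, 2.5),
    "B": (x, -2.5),
}

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# The rule under discussion: trust the pair of observers who agree most closely.
closest_pair = min(
    itertools.combinations(observations, 2),
    key=lambda pair: distance(observations[pair[0]], observations[pair[1]]),
)
a, b = closest_pair
midpoint = tuple((u + v) / 2 for u, v in zip(observations[a], observations[b]))

truth = observations["Goddy"]  # known only because the example is constructed
print(closest_pair)                         # ('A', 'B')
print(round(distance(midpoint, truth), 1))  # 19.8
```

The pair's 5-foot disagreement says nothing about the 20-foot error they share, which is the distinction between consistency and reliability drawn above.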


#9    Greg Rybarczyk      2008/04/01 (Tue) @ 15:29

When I was Navigator on USS South Carolina (CGN-37), taking the ship in and out of port we would maintain a plot with 1-minute fixes, where the fixes were all made with 3 (or more) lines of position (LOPs) to known landmarks ashore.  The “excellent” fixes were the ones where the lines all met and made a nice little 6-line star, and the “fair” fixes were the ones where the LOPs made a triangle.  When we couldn't get better agreement, we'd just use the center of the triangle as the position, and start another round of fixes immediately.
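The "center of the triangle" fix can be sketched in a few lines, assuming each LOP is stored as a point it passes through plus a direction vector, and taking "center" to mean the centroid of the three corners (one reasonable reading). The three LOPs are made-up values in arbitrary units:

```python
from itertools import combinations

# Each LOP: (a point on the line, a direction vector). Values are hypothetical.
lops = [
    ((0.0, 0.0), (1.0, 0.0)),
    ((0.0, 0.0), (0.0, 1.0)),
    ((0.0, 1.0), (1.0, -1.0)),
]

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def intersect(line1, line2):
    """Intersection of two non-parallel lines given as (point, direction)."""
    (p, d), (q, e) = line1, line2
    denom = cross(d, e)
    if denom == 0:
        raise ValueError("lines are parallel")
    t = cross((q[0] - p[0], q[1] - p[1]), e) / denom
    return (p[0] + t * d[0], p[1] + t * d[1])

# Each pair of LOPs meets at one corner of the triangle.
corners = [intersect(a, b) for a, b in combinations(lops, 2)]
fix = (sum(c[0] for c in corners) / 3, sum(c[1] for c in corners) / 3)
print(corners)                          # [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(tuple(round(v, 3) for v in fix))  # (0.333, 0.333)
```

When all three LOPs pass through one point, the three corners coincide and the centroid is that point, which corresponds to the "excellent" fix.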

So, why couldn't we expect a large number of observers to make, on average, a very accurate fix of the landing position?  The only reason to doubt that the average position would be accurate is either a) small sample size, or b) persistent bias.  These may both be in play at one point or another, but I would think the average of a lot of observers would be pretty good.  I'm not sure if Tango and/or Peter agree…  However, I am sure that with the data Peter has, we can't evaluate my theory, because we don't have a “gold standard” correct position to check against.

I certainly believe that if we had a large number of at least minimally qualified observers time the flight of a fly ball, the average time would be very accurate.  Why would this not be the case for position as well?
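A small simulation of both halves of that argument, assuming each reading is the true distance plus a shared bias plus independent Gaussian noise. The true distance, the 15-foot noise, and the 10-foot shared bias are hypothetical numbers chosen only for illustration:

```python
import random
import statistics

random.seed(2008)  # fixed seed so the run is repeatable

TRUE_DISTANCE = 300.0  # hypothetical true landing distance (feet)
NOISE_SD = 15.0        # assumed random error of each observer (feet)
SHARED_BIAS = 10.0     # assumed bias shared by every observer (feet)

def crowd_estimate(n_observers, bias=0.0):
    """Mean of n readings, each = truth + shared bias + independent noise."""
    readings = [TRUE_DISTANCE + bias + random.gauss(0.0, NOISE_SD)
                for _ in range(n_observers)]
    return statistics.fmean(readings)

for n in (3, 30, 3000):
    independent = abs(crowd_estimate(n) - TRUE_DISTANCE)
    shared = abs(crowd_estimate(n, SHARED_BIAS) - TRUE_DISTANCE)
    print(f"{n:>4} observers: error {independent:5.1f} ft without bias, "
          f"{shared:5.1f} ft with a shared {SHARED_BIAS:.0f} ft bias")
```

Under these assumptions the first error shrinks roughly like 15/√n, while the second settles near the 10-foot bias no matter how many observers are added: the crowd fixes case (a) but not case (b).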


#10    Tangotiger      2008/04/01 (Tue) @ 16:07

This is what I’m trying to get at.

Peter is looking at trying to minimize the absolute differences among two observers, when that is really not the objective.  Those two observers could be sitting right next to each other, so they could certainly be seeing things the same way.

My suggestion is that we’d need to figure out, of the 3000, who is the most reliable, and give those people more weight.

So, if you only had 3 or 4 observers, no matter how strong, you still need to weight all of them, since you can’t tell if any of them are biased in the same direction.  Without the “gold standard” to compare against, all must count to some degree.  And therefore, minimizing differences isn’t necessarily the objective.


#11    Peter Jensen      2008/04/01 (Tue) @ 16:36

Tango - I'll try again.  If you have your 3000 observers, the only ways you have to figure out the two most reliable are (a) know what the actual hit ball location is, or (b) pick the two whose differences have the lowest standard deviation.  Those two may share the same biased data and may not be the two whose data points are closest to the actual hit ball location.  But the standard deviation of their difference determines the reliability of the entire 3000 observers.  No matter where the actual hit ball location ends up being (presuming you could actually measure it exactly), you will never have ANY two observers out of the 3000 whose data points are closer to the actual hit ball location than the data points of the two observers with the lowest standard deviation are from a set of points midway between their data points.  That is the limit of precision of the observational data of that group of people.
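A short derivation of why a low SD of differences and a small error are not the same thing, assuming each reading is the true location T plus an error term (e_A for observer A, e_B for observer B) with standard deviations σ_A, σ_B and correlation ρ between the two errors (this notation is ours, not Peter's):

```latex
\begin{aligned}
\operatorname{Var}(A - B) &= \sigma_A^2 + \sigma_B^2 - 2\rho\,\sigma_A\sigma_B \\
\operatorname{Var}\!\left(\tfrac{A + B}{2} - T\right) &= \tfrac{1}{4}\left(\sigma_A^2 + \sigma_B^2 + 2\rho\,\sigma_A\sigma_B\right)
\end{aligned}
```

With σ_A = σ_B = σ, the SD of the difference is σ√(2(1 − ρ)), which falls toward 0 as ρ approaches 1, while the error variance of the midpoint, σ²(1 + ρ)/2, rises toward σ². Any bias the two share cancels out of A − B entirely but passes straight into the midpoint.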


#12    Greg Rybarczyk      2008/04/01 (Tue) @ 16:50

Sorry, Peter, I’m confused - are we trying to figure out accuracy or precision here in this discussion?  (this is apart from your original article, which was about comparisons between sources)

I think the objective is accuracy, because if a crowd can give me an accurate result on a particular batted ball position, I don’t really care how far from the true position any of them are (although, as Tango points out, if I only have a handful of observers, I may have to care who is more accurate).  In fact, there really isn’t such a thing as precision for a single data point, there is only accuracy.

You also used the term reliability in post #11 - not sure what you meant by that.

I think we’re working out something important here, so let’s try to continue to get on the same page…


#13    Tangotiger      2008/04/01 (Tue) @ 16:56

I’m not disagreeing with that.  I’m just saying that it is irrelevant.

The two guys with the lowest SD could simply be guys who are sitting right next to each other.

***

The preferred one, to me, is the one guy who is closest to the mean of the other 2999.  So, instead of looking at the diff of two individual points, and taking the SD of all those diffs, you take the diff of one point and the center of the other 2999 points, and take the SD of all those diffs.

Whoever has the highest SD is the least reliable, and can be safely dropped from the list.  You repeat this “survivor” process until you are left with a core group (100 observers?  1000?  I don’t know).  This core group represents the “gold standard”.

You can then compare each of the 3000 to this gold standard, and you can then assign a reliability (weight) to each of the 3000. 

So, the #1 reliable observer might get 5% of the weight.  The #2 might get 4.5% of the weight, and so on.  By the time you get to observer 100, the remaining observers might simply get no weight at all.
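A minimal sketch of the survivor process, under several assumptions not in the original: readings are one-dimensional distances, there are only five hypothetical observers and four balls, the core is cut to 3 rather than 100 or 1000, each observer's spread is measured as the root-mean-square of the differences (which, unlike a plain SD, also penalizes a constant offset; that choice is ours), and final weights are inverse mean-squared deviation from the gold standard (Tango gives no formula):

```python
import math
import statistics

# Hypothetical readings (feet) from five observers for the same four batted balls.
readings = {
    "obs1": [301, 249, 351, 279],
    "obs2": [299, 251, 349, 281],
    "obs3": [302, 250, 348, 280],
    "obs4": [320, 230, 370, 260],
    "obs5": [290, 262, 340, 292],
}
N_BALLS = 4

def rms(values):
    return math.sqrt(statistics.fmean([v * v for v in values]))

def deviation_from_others(name, group):
    """RMS of (this observer - mean of the rest of the group) over all balls."""
    others = [o for o in group if o != name]
    diffs = [readings[name][i] - statistics.fmean([readings[o][i] for o in others])
             for i in range(N_BALLS)]
    return rms(diffs)

def survivor_core(core_size):
    """Repeatedly drop the observer farthest from everyone else."""
    group = list(readings)
    while len(group) > core_size:
        worst = max(group, key=lambda n: deviation_from_others(n, group))
        group.remove(worst)
    return group

core = survivor_core(core_size=3)
gold = [statistics.fmean([readings[o][i] for o in core]) for i in range(N_BALLS)]

# Weight every observer (not just the core) against the gold standard,
# then normalize so the weights sum to 1.
raw = {
    name: 1.0 / max(rms([r - g for r, g in zip(vals, gold)]) ** 2, 1e-9)
    for name, vals in readings.items()
}
total = sum(raw.values())
weights = {name: w / total for name, w in raw.items()}
print(core)  # ['obs1', 'obs2', 'obs3']
print({name: round(w, 3) for name, w in weights.items()})
```

Here the two noisy observers are dropped first and end up with well under 1% of the weight, mirroring the idea that observers far down the list effectively get no weight at all.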


#14    MGL      2008/04/01 (Tue) @ 18:22

To give an idea as to how bad the observations can be, for every zone (there are 22 in fair territory, each one a little more than 4 degrees, in the STATS system), I went through the STATS data and recorded the 3 furthest non-HR fly balls and the 3 shortest HRs.  In each zone, those numbers should have been about the same, of course, although I am not sure how they score the distance of a ball hit off the fence (how far it would have gone without the fence in the way, or simply the distance of the fence).  In all zones, the furthest non-HRs were 10 or 20 feet further than the shortest HRs.  In addition, there was too much variability among the 3 shortest HRs and longest non-HRs.  With a lot of data, all those numbers should have been the same.  I was shocked at how bad the numbers were.

That being said, you CANNOT use how bad the data is to assail a system like a PBP defensive one that relies on the data.  The better the data, the more accurate and reliable the system will be in the short term.  But after a season or two, it makes VERY LITTLE difference whether a ball was hit in one spot or 15 feet to the right or left, since all of the variability will even out in the long run.  As an illustration of how little the exact location of the ball matters, Chone’s (I think it is his - is Chone the same person as Rally?) hit location system is very much in agreement with UZR and that system has no idea where the balls are hit!  Same thing with Tango’s WWOY systems.  And of course, for offensive “systems” which people seem to have no problem with whatsoever, we also have NO IDEA where a ball was hit, only whether someone caught it or not, and how many bases the batter got, yet somehow we deem that reliable (whether someone catches a ball and how many bases the batter got is a proxy for exactly where the ball landed and how it was hit).

So the next time you see or hear someone say that they don’t trust the advanced defensive metrics at all because the data gathering system is unreliable (which it is), tell them that the systems are going to be quite accurate, especially after a season or two, even if the data is not that accurate.


#15    Greg Rybarczyk      2008/04/01 (Tue) @ 18:41

MGL - why does the fly ball distance data seem wrong to you?  For different parks with different fence distances, you should get this sort of thing, if I understood you correctly.  Dodger Stadium is 395 feet to straightaway CF, while Comerica Park is 420 feet to the same place.  You ought to see some 410 foot fly balls and 400 foot homers in that zone.  Ditto RF in Fenway at 380 once you get away from the pole, vs. Yankee Stadium at 340 (or less close to the line).

Now if you’re talking about data within the same park, you might have a good point there, but some parks with sharp fence direction changes (e.g. the triangle in RCF at Fenway, the CF side of the Crawford boxes in Houston) might show that within a park…


#16    tangotiger      2008/04/01 (Tue) @ 18:58

MGL was referring to the slice (not the zone) of the same park, I believe. 

There are 22 slices in fair territory.  If you take a 370 foot radius, that makes each slice about 25 feet of wall.  So, in some cases, like Fenway, within those 25 feet, you could have a sharp change.  Perhaps MGL could remove those slices or parks.
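The arithmetic behind "about 25 feet", assuming fair territory's 90° is split into 22 equal slices and using arc length s = rθ with θ in radians:

```latex
\begin{aligned}
\theta_{\text{slice}} &= \frac{90^\circ}{22} \approx 4.09^\circ = 4.09 \times \frac{\pi}{180} \approx 0.0714\ \text{rad} \\
s &= r\,\theta_{\text{slice}} \approx 370\ \text{ft} \times 0.0714 \approx 26.4\ \text{ft}
\end{aligned}
```

The same relation gives Peter's later figure for the infield: at r = 100 ft, one degree spans 100 × π/180 ≈ 1.75 ft.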

It is disappointing that they could get distances like that wrong.

MGL is right about the “poor” data gathering not being a problem if it is random.  If a scorer or park has a systematic bias, that’s a different animal altogether.


#17    Renè      2008/04/01 (Tue) @ 19:10

What about installing a camera high above and behind home plate which doesn’t do closeups and has a superimposed grid with all the defensive zones? Seems cheap, relatively accurate and consistent to me, since the superimposed grids would make it easier to mark the landing spots.
While I’m not sure that MLB clubs would be happy to let Stats and BIS install proprietary cameras, it doesn’t seem like an issue that a little fee and other agreements can’t fix.


#18    Rally      2008/04/02 (Wed) @ 00:04

Chone’s (I think it is his - is Chone the same person as Rally?)

Yup, that’s me.


#19    joe arthur      2008/04/02 (Wed) @ 02:53

I think it would be useful if Peter could split his data into errors on balls hit less than 150 feet and balls hit further. For this purpose mlb.com data could be disregarded and comparisons made between the other three sources with the full data set.
To extend Greg’s point back in #3, the reason is that there are more landmarks in the infield both for distance and direction which aid the observer, whether live or on video. Besides the mound and the bases, you have the edges of the mound and the dirt cutouts around the bases, and the infield and outfield grass boundaries of the infield dirt. For a skilled observer with a good “map” of the park, gross errors are unlikely except to some extent on balls in the SS or 2B hole, and a very decent fraction of balls in play can be located very precisely. It won’t surprise me that the video observer may do better than the live observer on balls near the infield.
The situation is different on balls hit to the outfield unless they are hit close to the fence or close to the foul line. In these cases the live observer can make better use of landmarks than the video observer (due to the typical way the camera zooms in on the outfielder and loses context). Here gross errors should be larger and more common, but I’d expect a skilled live observer to be more accurate overall.


#20    MGL      2008/04/02 (Wed) @ 05:14

#15, #16: yup, I meant at each park.  Here are some examples of what I mean.  The first 3 numbers next to each “slice” are the 3 shortest HRs and the next 3 numbers are the 3 longest non-HRs.  They don't seem to jibe to me in a lot of cases, but I haven't really looked closely at them.

"lan","C",330,330,340,340,340,340
"lan","D",350,350,351,370,360,360
"lan","E",350,350,360,360,360,360
"lan","F",360,360,370,400,370,370
"lan","G",369,375,380,370,370,370
"lan","H",380,375,376,380,380,380
"lan","I",380,380,382,380,380,380
"lan","J",380,380,380,380,380,380
"lan","K",390,390,390,390,390,390
"lan","L",390,390,390,390,390,390
"lan","M",400,400,400,400,400,395
"lan","N",400,400,400,400,395,395
"lan","O",390,390,400,390,390,390
"lan","P",390,390,390,390,390,390
"lan","Q",390,390,390,390,390,380
"lan","R",377,380,380,380,380,380
"lan","S",378,380,380,380,380,380
"lan","T",370,370,380,380,380,370
"lan","U",365,368,370,360,360,360
"lan","V",360,360,370,360,360,360
"lan","W",340,340,340,360,360,360
"lan","X",330,330,334,350,335,330

"tor","C",328,328,328,330,330,330
"tor","D",333,333,333,340,340,340
"tor","E",337,340,340,360,350,350
"tor","F",340,340,340,370,370,360
"tor","G",360,360,360,370,370,370
"tor","H",360,370,370,380,380,375
"tor","I",384,384,384,380,380,380
"tor","J",380,380,380,390,390,390
"tor","K",390,390,390,395,395,395
"tor","L",390,390,390,400,400,400
"tor","M",400,397,397,400,400,400
"tor","N",400,400,401,400,400,400
"tor","O",390,405,410,400,400,400
"tor","P",380,380,380,400,395,395
"tor","Q",378,378,378,390,390,390
"tor","R",380,380,385,390,380,380
"tor","S",360,380,380,380,375,375
"tor","T",350,350,350,370,370,370
"tor","U",340,340,340,360,360,360
"tor","V",340,340,340,350,350,350
"tor","W",334,334,334,340,340,340
"tor","X",330,340,340,370,330,330
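One way to quantify the mismatch MGL describes is, for each slice, the longest non-HR minus the shortest HR: a positive value means a ball that stayed in the park was scored deeper than one that left it. This sketch parses a handful of rows from the table above; as noted in #15 and #16, a sharp change in fence shape within one slice can legitimately produce some overlap, so a positive value is a flag to check, not proof of an error:

```python
import csv
import io

# A few rows from the table above: park, slice, three shortest HRs,
# then three longest non-HR fly balls (feet).
DATA = """\
"lan","F",360,360,370,400,370,370
"lan","J",380,380,380,380,380,380
"lan","W",340,340,340,360,360,360
"tor","I",384,384,384,380,380,380
"tor","X",330,340,340,370,330,330
"""

def overlap(row):
    """Longest non-HR minus shortest HR; > 0 means a non-HR was scored deeper."""
    park, slice_, *dists = row
    dists = [int(d) for d in dists]
    shortest_hr, longest_non_hr = min(dists[:3]), max(dists[3:])
    return park, slice_, longest_non_hr - shortest_hr

for row in csv.reader(io.StringIO(DATA)):
    print(overlap(row))
```

For these rows the overlaps come out as 40, 0, 20, -4, and 40 feet respectively.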


#21    tangotiger      2008/04/02 (Wed) @ 07:11

Here’s the corresponding STATS diagram:
http://www.baseballthinkfactory.org/szymborski/zrgrid.jpg

And for kicks:
http://www.retrosheet.org/hitloc.jpg


#22    Peter Jensen      2008/04/02 (Wed) @ 08:22

Joe - Both BIS and STATS give the distance on ground ball hits as the distance to where it is picked up or where it hits the wall so limiting to 150 feet would eliminate ground ball hits fielded by outfielders.  Limiting to all ground balls (406 events) improves the vector standard deviation between Greg and both BIS and STATS by about 10%. The vector SD between BIS and STATS is unchanged.  The distance SD also improves but is not particularly important on ground balls.  Estimating vectors through the infield is made easier because of the “landmarks”, but is more difficult because the distance for each degree is less. At 100 feet, 1 degree is only 1.7 feet so an observer needs to decide not only where a third baseman was when he fielded a ball but whether his glove was nearer his left foot or right foot. 

MGL is absolutely correct about the variability evening out in the long run and minimizing the impact of lack of accuracy on fielding metrics; unless there is a bias by recorders at the individual home parks.  One would need access to both BIS and STATS data for at least an entire season to investigate whether that exists.

Greg #12 - Sorry about the confusion of terms.  Since we don’t know the actual hit location we can’t judge the absolute accuracy of any estimate of a hit location.  The problem then becomes establishing a confidence interval for any system of estimates that the actual hit location will be within that interval from the estimated location a certain percentage of the time.  Precision, as I have used the term, is choosing a percentage and then finding the interval that achieves that percentage.  Reliability is choosing the interval first (like existing zones) and estimating the percentage that will be within that interval.
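The two definitions above can be sketched as a pair of functions, assuming one-dimensional distance readings and Peter's framing from the article that the midpoint of two observers is the best guess, so each observer sits |a − b| / 2 from it. The ten pairs of readings are hypothetical:

```python
import math

def precision(deviations, coverage=0.95):
    """Choose the percentage first: smallest half-width covering that share."""
    ordered = sorted(abs(d) for d in deviations)
    k = math.ceil(coverage * len(ordered))  # how many points must fall inside
    return ordered[k - 1]

def reliability(deviations, half_width):
    """Choose the interval first: share of deviations that fall inside it."""
    return sum(abs(d) <= half_width for d in deviations) / len(deviations)

# Hypothetical distance readings (feet) from two observers for ten balls.
obs_a = [300, 250, 348, 280, 310, 295, 260, 330, 275, 320]
obs_b = [310, 246, 352, 300, 306, 291, 262, 318, 281, 322]

# Each observer is |a - b| / 2 from the midpoint of the pair.
deviations = [(a - b) / 2 for a, b in zip(obs_a, obs_b)]
print(precision(deviations, 0.95))   # 10.0
print(reliability(deviations, 5.0))  # 0.8
```

The two questions are inverses of each other: fixing 95% yields a half-width of 10 feet here, while fixing a ±5-foot window yields 80% coverage.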

Tango has been suggesting (I think) that having multiple observers will give a better estimate of the hit location.  He may be right.  I have been insisting that there is a limit on the precision of any estimate by human observers that is inherent in our ability to discriminate distance and angles by sight alone.  It is my opinion that the care taken by BIS, STATS and Greg has already approached that limit of precision and that further improvements will more likely be found in some method of electronic data gathering.  I don't think I would get an argument from anyone that full camera coverage of the field would be the ideal data gathering source.  My article was written to show that the current levels of precision were large enough that there may be an opportunity for Hit f/x to make a significant improvement until we can get full camera coverage.


#23    joe arthur      2008/04/02 (Wed) @ 20:11

Peter,
you’re right that grounders are in part a special case, regardless of how far they travel. The determination of vector likely depends on where the ball passed through the infield, so we shouldn’t use distance as a criterion for measuring vector agreement on ground balls. But observation of distance still depends on landmarks, regardless of hit type, as long as each observer has the same rules for measuring distance. Rarely, BIS may record a particular ball as a ground ball fielded 250 feet from the plate, while STATS recorded the same ball as a line drive traveling 140 feet in the air before hitting the ground. But if they agree on the hit type, I believe they use the same rule for measuring distance.

Though I yearn for you to break out the infield and outfield discrepancies separately, these are the sources we have, and you've done a great service by indicating the rough scope of their agreement. [Actually there is at least one more source, CBS SportsLine (http://www.sportsline.com/mlb/gamecenter/gamechart/MLB_20080401_TOR@NYY), which uses a grid system like mlb.com's but with different coordinates; I don't know if their underlying data source is really independent.]
Nonetheless, I don’t think your evidence actually justifies the strong conclusion that this is the “best” agreement we could get from independent human observers. This is just an observation of how much agreement there happens to be between these sources. Circa 1990, STATS described itself as having multiple stringers as backups to provide a double check on accuracy. Video was not routinely available then, of course, but as you describe it, their approach to verification is now down (usually) to a second source. I don’t know whether it is clear that BIS formally double checks every ball in play.

Also, I may be misreading your analysis, but since you say you treat the STATS vector as the center of its zone for computing errors, it sounds as though you have treated the difference in vector precision as an error in observational accuracy in itself. By analogy, if I report someone’s height to the nearest inch as 5’10”, and you report it as 179 cm, there need not be any observation error between us just because 179 cm converts to 5’10.47”. Within the limits of each measurement system they are identical. Disagreements should be counted only when the finer measurement does not lie within the range defined by the grosser measurement.
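Joe's rule, counting a disagreement only when the finer reading falls outside the coarser reading's bin, can be sketched as a single comparison. The height example is his; for direction, the 36° zone center is hypothetical, and the zone width assumes 22 equal zones across fair territory:

```python
CM_PER_INCH = 2.54

def consistent(fine_value, coarse_center, coarse_width):
    """True if a fine-grained reading lies inside the coarse reading's bin."""
    return abs(fine_value - coarse_center) <= coarse_width / 2

# Joe's example: 179 cm versus 5'10" reported to the nearest inch.
height_in = 179 / CM_PER_INCH                    # about 70.47 inches
print(consistent(height_in, 5 * 12 + 10, 1.0))   # True

# Same test for direction: a one-degree reading against a ~4.09-degree zone.
ZONE_WIDTH = 90 / 22                             # assumes 22 equal zones
print(consistent(37.0, 36.0, ZONE_WIDTH))        # True: 1 degree from the center
print(consistent(39.0, 36.0, ZONE_WIDTH))        # False: outside the zone
```

Scored this way, a one-degree BIS reading that lands anywhere inside the STATS zone counts as agreement rather than as a 1° or 2° error against the zone center.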

One source of error not yet mentioned is simple data entry error; the observer for example saw a location of vector S and distance 300, but recorded S/360. With more limited double checking, such large errors might not actually be caught today. How common are they? I’m not ready to agree that STATS or BIS really represent the limits of human observation systems… Like you I do look forward to a hit f/x system.
regards
Joe


#24    Peter Jensen      2008/04/03 (Thu) @ 03:32

Joe - I went through all the data looking for any large differences that may have been caused by data entry or transcription error.  There were none in BIS or STATS.  I think there were 3 in Greg’s data.  I eliminated those before I did my analysis.

Since STATS chooses to record data in 4-degree-wide zones instead of in one-degree increments, they are by default limited to a precision of ±2 degrees.  That BIS's observations only fall into the STATS zones less than 47% of the time is an indicator that the true limit of precision of human observation may be greater than that.

It is certainly possible that some other human observational data gathering method might be better than what we have now.  But it would certainly entail a lot more work.  And, as MGL points out, there wouldn't be much gain for defense metrics.  I want Hit f/x primarily for the speed off the bat, which is difficult if not impossible to measure accurately any other way.  That it will be an objective source of data for defense metrics, rather than a subjective one, is just another plus.

