THE BOOK--Playing The Percentages In Baseball

Tuesday, April 01, 2008

Cross-checking the data providers

By Tangotiger, 09:21 AM

Fabulous article by Peter Jensen:

Let’s take the two observers in closest agreement, BIS and Greg, split the difference between them and call that the best guess of the actual hit location. What is the minimum distance and degrees that will have 95 percent of both Greg’s and BIS’ observations included? The answer is ±18 feet and ±4 degrees. That’s a pretty big area. It is two whole zones in width.
...
It doesn’t matter whether you have three observers or 3,000: the composite data will never have any less error than that of the two closest. Having many observers is only useful for finding those two best observers.
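Peter’s "split the difference, then find the 95 percent band" procedure is simple enough to sketch. The version below is a minimal reconstruction, not Peter’s actual code. It assumes each observer’s reports are paired per batted ball as hypothetical `(distance_ft, angle_deg)` tuples, treats distance and angle as separate dimensions, and uses a nearest-rank percentile; the names `coverage_bounds` and `nearest_rank` are illustrative only.

```python
import math
from typing import List, Tuple

# Hypothetical record format: (distance_ft, angle_deg) reported for one batted ball.
Obs = Tuple[float, float]


def nearest_rank(values: List[float], pct: float) -> float:
    """Smallest value v such that at least pct of the values are <= v."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    k = max(1, math.ceil(pct * len(ordered)))
    return ordered[k - 1]


def coverage_bounds(obs_a: List[Obs], obs_b: List[Obs], pct: float = 0.95) -> Tuple[float, float]:
    """Split the difference between two observers and return the (+/- feet, +/- degrees)
    half-widths around the midpoint that contain pct of both observers' reports."""
    dist_dev: List[float] = []
    angle_dev: List[float] = []
    for (dist_a, ang_a), (dist_b, ang_b) in zip(obs_a, obs_b):
        # Midpoint = best guess of the true location. Each observer is exactly half
        # the gap away from it, so one deviation per ball covers both observers.
        dist_dev.append(abs(dist_a - dist_b) / 2)
        angle_dev.append(abs(ang_a - ang_b) / 2)
    return nearest_rank(dist_dev, pct), nearest_rank(angle_dev, pct)
```

Because both observers sit symmetrically around the midpoint, "95 percent of both observers’ reports" reduces to the 95th percentile of half the gap between them, computed once per dimension.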

Fantastic stuff, and a great point.  Peter is right that if I throw in as many observers as I can, I wouldn’t want to weight each one equally.  The better the estimator (relative to the other 2,999), the more I would weight that observer.  Ideally, you’d be down to just one observer, the perfect guy.  Realistically, you might have one observer carry 10% of the weight, another 9%, another 8%, and so on, such that you only need about 20 observers out of the 3,000.
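One standard way to turn "the better the estimator, the more weight it gets" into numbers is inverse-variance weighting. This is a swapped-in technique, not necessarily what Tango or Peter would use, and it assumes each observer’s error standard deviation is already known (for example, estimated from how well that observer agrees with the others). The observer labels and function names are hypothetical.

```python
from typing import Dict


def inverse_variance_weights(error_sd: Dict[str, float]) -> Dict[str, float]:
    """Weight each observer by 1 / sd**2, normalised so the weights sum to 1."""
    precision = {name: 1.0 / sd ** 2 for name, sd in error_sd.items()}
    total = sum(precision.values())
    return {name: p / total for name, p in precision.items()}


def combined_estimate(reports: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the observers who reported this batted ball
    (e.g. distance in feet), renormalising over just those observers."""
    total_w = sum(weights[name] for name in reports)
    return sum(weights[name] * value for name, value in reports.items()) / total_w
```

Under this scheme a near-perfect observer dominates: with assumed error SDs of 0.01 ft and 10 ft, the first observer’s weight is essentially 1, which is the "down to just one observer" limit.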

However, his conclusion that the error is now 22 feet isn’t necessarily bad.  If the two closest observers were within 18 feet of each other, but the third observer was in fact the best for a particular data point, I’m not sure that we’d want to settle on the 18 feet.  Consider the forecasting analogy: MGL and Marcel share a similar forecasting engine at their core, while Chone does not.  Selecting the two in closest agreement (MGL, Marcel) doesn’t mean that it’s necessarily bad to also include Chone.  Likewise, perhaps Greg and BIS are biased in the same manner (both relying more on video than on in-park observation).

Question to Peter: what is the correlation of STATS with Greg, and of BIS with Greg?  And what weight would each of those two get?  Repeat for the other combinations.  Couldn’t we come up with a better estimate of where a ball landed by using different weightings?
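One way to make the question concrete is a minimal sketch, assuming each provider’s reported distances (or angles) for the same batted balls are lined up in parallel lists. It computes each provider’s Pearson correlation with Greg, then fits Greg’s values on STATS and BIS by ordinary least squares to get a weight for each. Treating Greg as the reference is an assumption for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two providers' reports for the same batted balls."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_source_weights(target: List[float], src1: List[float],
                       src2: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit target ~ a + w1*src1 + w2*src2; returns (a, w1, w2)."""
    m1, m2, mt = _mean(src1), _mean(src2), _mean(target)
    d1 = [v - m1 for v in src1]
    d2 = [v - m2 for v in src2]
    dt = [v - mt for v in target]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1t = sum(u * v for u, v in zip(d1, dt))
    s2t = sum(u * v for u, v in zip(d2, dt))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("the two sources are perfectly collinear; weights are not identifiable")
    # Solve the 2x2 normal equations for the centred data (Cramer's rule).
    w1 = (s22 * s1t - s12 * s2t) / det
    w2 = (s11 * s2t - s12 * s1t) / det
    a = mt - w1 * m1 - w2 * m2
    return a, w1, w2
```

Because Greg is the reference in this sketch, his own errors flow into the fitted weights, so the result is a rough guide rather than ground truth. Rerunning the fit with each provider in turn as the target covers the "other combinations."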


(24) Comments • 2008/04/03 • Sabermetrics, Batted_Ball