THE BOOK--Playing The Percentages In Baseball


Thursday, February 19, 2009

Intentionally using less data?

By Tangotiger, 10:55 AM

Tim Marchman writes:

while reading the new edition of the Baseball Prospectus annual, I was a bit put off by this information on their new play-by-play defensive metric:

The best PBP systems rely on highly detailed batted-ball data—a direction for where the ball was hit, some indication of how hard, and the result of the play, with the field broken down into many, many fairly small zones. That data is typically available only for the majors. To keep the majors and minors on an even setting, we’re dealing with a reduced set of data.

As I understand the idea here, BP wants to make apples-to-apples comparisons between their minor league and major league defensive numbers, and so is artificially crippling the data set they’re using to derive the major league numbers to bring it into line with the less granular data available for the minors. I see the appeal, but it makes the topline numbers suspect, especially when the system arrives at seemingly wonky results like Bobby Abreu rating as a plus defender and Hanley Ramirez as a Gold Glove candidate last year. Of course even very good systems have outliers, but not every system intentionally deals with a reduced set of data. For now I’ll continue to rely on UZR and Plus/Minus, though I’ll be curious to see what people like Tom Tango have to say about the technical pros and cons of the new system.

Since Tim asked, I’ll respond. 

I have no doubt that Clay wrote that. Why? Because it is the same thing he said a few years ago when defending his use of the non-PBP numbers in FRAA to make an “apples-to-apples” comparison across all of baseball history. I disagreed completely then, and I disagree just as much now.

Why? Because the biases are not systematic. Presuming that Clay is using Dan Fox’s Simple Fielding Runs (SFR), the SFR of Darin Erstad in 2000 has no stronger relationship to the SFR of Darin Erstad in 2001 than to the UZR of Darin Erstad in 2001. Indeed, I would bet that SFR in year X correlates more strongly with UZR in year X+1 than with SFR in year X+1! (Shades of the ERA/FIP discussion.) The only way for the year-to-year SFR relationship to be stronger is if SFR has a systematic bias to begin with.
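To make that bet concrete, here is a toy simulation. It assumes (this is an illustration, not how SFR or UZR are actually computed) that each metric is nothing more than a player's true talent plus random, non-systematic error, with the reduced-data metric carrying more error than the detailed one. The names `pearson` and `simulate`, and every spread used, are hypothetical values chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_players=20000, talent_sd=5.0, coarse_sd=8.0, fine_sd=4.0, seed=2009):
    """Toy model: every metric = true talent + independent random error.

    'coarse' is a hypothetical stand-in for an SFR-like metric built from
    reduced data; 'fine' stands in for a UZR-like metric built from detailed
    batted-ball data. All spreads are made-up illustrative values.
    """
    rng = random.Random(seed)
    coarse_x, coarse_x1, fine_x1 = [], [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)  # same true talent in year X and X+1
        coarse_x.append(talent + rng.gauss(0.0, coarse_sd))   # coarse metric, year X
        coarse_x1.append(talent + rng.gauss(0.0, coarse_sd))  # coarse metric, year X+1
        fine_x1.append(talent + rng.gauss(0.0, fine_sd))      # fine metric, year X+1
    return pearson(coarse_x, coarse_x1), pearson(coarse_x, fine_x1)


same_metric, cross_metric = simulate()
print(f"coarse X vs coarse X+1: {same_metric:.2f}")
print(f"coarse X vs fine X+1:   {cross_metric:.2f}")
```

Under these made-up spreads the model predicts correlations of about 25/89 ≈ 0.28 for the coarse metric with itself and 25/√(89·41) ≈ 0.41 for coarse-with-fine: when the error is purely random, the noisier metric tracks the cleaner one better than it tracks itself.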

As a good example, an early version of PMR (Pinto’s model) had a super love affair with Orlando Hudson, while UZR merely liked-to-loved him. Why? One possibility was that Hudson was a popup hog, and he was getting credit for it every year. That’s a systematic bias.
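The Hudson case can be added to the same kind of toy model (again a hypothetical sketch, not the actual method of PMR, SFR, or UZR) by giving each player a bias term that the coarse metric repeats every year, the way a popup-crediting quirk would. The `bias_sd` parameter and all other spreads are made-up illustrative values.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(bias_sd, n_players=20000, talent_sd=5.0, coarse_sd=8.0, fine_sd=4.0, seed=2009):
    """Toy model where the coarse metric may carry a persistent per-player bias."""
    rng = random.Random(seed)
    coarse_x, coarse_x1, fine_x1 = [], [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)
        bias = rng.gauss(0.0, bias_sd)  # player-specific error repeated in both years
        coarse_x.append(talent + bias + rng.gauss(0.0, coarse_sd))
        coarse_x1.append(talent + bias + rng.gauss(0.0, coarse_sd))
        fine_x1.append(talent + rng.gauss(0.0, fine_sd))  # fine metric has no such bias
    return pearson(coarse_x, coarse_x1), pearson(coarse_x, fine_x1)


for bias_sd in (0.0, 6.0):
    same_metric, cross_metric = simulate(bias_sd)
    print(f"bias_sd={bias_sd}: coarse-coarse {same_metric:.2f}, coarse-fine {cross_metric:.2f}")
```

With `bias_sd=6.0` the model predicts roughly 61/125 ≈ 0.49 for coarse-with-coarse against 25/√(125·41) ≈ 0.35 for coarse-with-fine, the reverse of the unbiased case. The stronger self-correlation comes from the repeated bias, not from the coarse metric being any better.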

But this doesn’t apply here with SFR and UZR. There is no question at all that you always want to use the maximum data (combined with an intelligent approach, natch), and using the same methodology does not, in and of itself, provide an apples-to-apples comparison.

(Hat tip: Repoz)

2009/02/27 • Sabermetrics, Fielding