THE BOOK--Playing The Percentages In Baseball


Friday, October 21, 2011

A Warning from MGL!!

By , 11:32 PM

Now that I have your attention…

I just read a decent article on FG written by David Cameron about TLR’s bullpen choices in the 9th inning of game 2, which we have discussed on this blog.

In the comments section, I kind of lectured Dave, who has done much excellent analysis and written many fine articles on FG and other sites and publications, about using samples properly. I think it is important enough to warrant a thread on this blog. Essentially I said (I’ll reprint some things from my comments on his article):

I am very uncomfortable when an analyst gets to choose which sample he wants to present to support his point or his opinion. This year only? Last 2 years? 3 years? Career? Lately, as in last half season? The last 10 games? You should not be allowed to do that, for obvious reasons (cherry-picking your evidence makes your arguments intellectually dishonest, or misleading at best).

For example, Dave said this:

“While Hamilton’s strikeout rate against LHPs jumps to 22.1%, Rhodes K% against LHBs this year was just 16.1%. His career numbers are much better, but he’s not the same pitcher he was a few years ago, and Hamilton had hit an outfield fly against him the night before.”

Yes, he is not the same pitcher, but if this year his K% was higher than his career numbers, Dave would probably be quoting us his career numbers (heck, I would too if I had the choice!). The analyst should NOT have the choice. He should always be quoting a projection which is some kind of weighted career average or whatever the accepted standard is!

And for the last part of that last sentence, about Hammy hitting a fly ball the night before, David should get immediately thrown into the MGL jail. I can’t believe he even said that in that context. Shame on you Dave!

I followed up with this:

If you are allowed to split the samples up any way you want, you can probably support just about any thesis from one end of the spectrum to the other. Which is why a standard must be used. As in all scientific fields, in sabermetrics there is a generally accepted standard in the industry: weighted career (or, to simplify, the last 3 or 4 years).
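To make the point concrete, here is a minimal sketch of how the choice of window alone moves a pitcher's strikeout rate. The season lines are made up for illustration and do not belong to any real player; `seasons`, `k_rate`, and `windows` are hypothetical names.

```python
# Hypothetical (strikeouts, plate appearances) per season, oldest first.
# These numbers are invented for illustration only.
seasons = [(30, 100), (28, 110), (20, 105), (12, 70)]


def k_rate(rows):
    """Unweighted strikeout rate over the given seasons."""
    strikeouts = sum(so for so, _ in rows)
    plate_appearances = sum(pa for _, pa in rows)
    return strikeouts / plate_appearances


# The same pitcher, sliced four different ways.
windows = {
    "this year": seasons[-1:],
    "last 2 years": seasons[-2:],
    "last 3 years": seasons[-3:],
    "career": seasons,
}

rates = {name: k_rate(rows) for name, rows in windows.items()}

for name, rate in rates.items():
    print(f"{name:>12}: {rate:.1%}")
```

With these invented numbers the four windows give four different rates, from roughly 17% to roughly 23%, so an analyst free to pick the window can make the same pitcher look like a poor or a good strikeout option.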

That is not an arbitrary method, mind you. When we are trying to answer questions such as, “Who should be used in an upcoming situation,” we are essentially asking the question, “How do we expect so and so to perform at some time in the future?” In most cases, as in this one, that means the immediate future, such as the next PA or tomorrow.

To do that, again, the accepted standard in the industry, after years of very thorough research and analysis, is to use a “Marcel-like” projection for component rates, GB and FB frequency, platoon splits, etc. It is also the accepted standard to ignore things like clutch, home/road splits (other than the normal one of course), day/night, pitcher/batter historical matchups, hot and cold streaks, etc. Not because we KNOW that these don’t exist, but because we find, again, after years of thorough and extensive research, that even if they exist, they have little predictive value.
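For readers who have not seen one, here is a rough sketch of what a “Marcel-like” projection for a single component rate might look like. It is an assumption-laden illustration, not a definitive implementation: the 5/4/3 season weights and the 1200 PA of league-average regression are taken as illustrative defaults, the function name `marcel_like_rate` is hypothetical, and aging and playing-time adjustments are left out entirely. The sample numbers are invented.

```python
def marcel_like_rate(seasons, league_rate, weights=(5, 4, 3), regress_pa=1200):
    """Project a rate (e.g. K%) from recent seasons.

    seasons:     list of (events, plate_appearances), MOST RECENT FIRST.
    league_rate: league-average rate to regress toward.
    weights:     per-season weights, most recent first (assumed 5/4/3).
    regress_pa:  PA of league-average performance added in (assumed 1200).
    """
    weighted_events = 0.0
    weighted_pa = 0.0
    # zip() stops at the shorter input, so seasons beyond the
    # weights are ignored and missing seasons contribute nothing.
    for (events, pa), weight in zip(seasons, weights):
        weighted_events += weight * events
        weighted_pa += weight * pa
    # Regress toward the league rate: the less weighted PA a player
    # has, the more the league average dominates the projection.
    return (weighted_events + regress_pa * league_rate) / (weighted_pa + regress_pa)


# Invented example: three seasons of (strikeouts, PA), most recent first.
recent = [(12, 70), (20, 105), (28, 110)]
projection = marcel_like_rate(recent, league_rate=0.20)
print(f"projected K%: {projection:.1%}")
```

The design point is that the analyst does not get to choose the window: every player is run through the same weights and the same amount of regression, and a player with no data at all simply projects to the league rate.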

So I implore all analysts, including David, who is a fine one, to use these standards when presenting a thesis. If time or other constraints exist, which I understand, then some semblance of these standards should be used, or some qualifications issued, rather than disingenuously using one year, half year, or other similarly small and/or misleading samples (such as unweighted career) in order to support an argument.

(4) Comments • 2011/10/22 • Sabermetrics, Sampling, Statistical_Theory