THE BOOK--Playing The Percentages In Baseball

Sunday, October 09, 2011

How not to do a study

By Tangotiger, 07:12 PM

Yesterday, I made a point of saying that technically, you cannot include in your control dataset the data that you are actually studying.  And while practically you don’t need to worry about it if the data in question makes up only 5% of the whole control dataset, you still need to point out this bias.

Why?  Because today I see someone do this:

For 2011, BB-REF PI lists 504 starts where the starter pitched at least 24 outs, which I choose so as not to bias the results by looking only at complete games. The 504 starts were distributed among 140 pitchers.

When I add up the game data for these starts, I get the following line:

W    L   W-L%   ERA   GS   CG   SHO  IP       H      ER   HR   BB   SO     WHIP
351  82  0.811  1.23  504  169  73   4,189.2  2,532  572  197  678  3,220  0.77

That seems to be a pretty good line to me, and would argue keeping a starter in for the 9th inning who has already gone 8 IP, in the absence of any other data.

But he DID bias the study.

There are two things being said here, both important to note.  The first is that the poster selected all starts with at least 24 outs recorded, and in those starts the pitchers had an ERA of 1.23 for the whole game.  This should come as no surprise: a starter rarely gets to record 24 outs unless he is pitching well.  If I had selected all pitchers who pitched in a game their team won, you’d also see some ridiculously low ERA.

The second thing the poster said is this:
“keeping a starter in for the 9th inning who has already gone 8 IP,”

Now, if you have someone who pitched through 8 innings, this guy may have had an ERA of 1.00 through 8 (for illustration purposes only).  If he ends up with an ERA of 1.23, does this prove that you made the right choice to let him pitch in the 9th?

In the above line, there were 504 starts, meaning that in the first 8 innings of those starts (4032 innings), the pitchers would have allowed (if the completely made up ERA of 1.00 is true) 448 ER.  Those pitchers actually ended with 4190 innings and 572 ER.  That means that AFTER the 8th inning, they pitched an extra 158 innings and allowed an extra 124 ER, which works out to an ERA of 7.06.

So, when you look at the poster’s comment, “keeping a starter in for the 9th inning who has already gone 8 IP,” we see that in this illustration it was a horrible choice.

Now, what if instead, the pitchers through 8 had an ERA of 1.25?  In that case, in their 4032 innings, they’d have allowed 560 ER.  And therefore, in their performance AFTER the 8th inning, they’d be at 158 more innings and 12 more ER, for a minuscule ERA of 0.68.
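
The arithmetic in the two illustrations can be reproduced with a short sketch. The function name `implied_late_era` and its parameters are mine, not the poster's. The through-8 ERA is the same made-up input used in the illustrations, and innings are rounded to 4190 as in the text.

```python
def implied_late_era(starts, total_ip, total_er, assumed_era_through_8):
    """Back out what happened after the 8th inning from combined totals.

    assumed_era_through_8 is a hypothetical input: the combined line
    alone does not tell us what it really was.
    """
    ip_through_8 = starts * 8
    er_through_8 = assumed_era_through_8 * ip_through_8 / 9
    late_ip = total_ip - ip_through_8
    late_er = total_er - er_through_8
    return late_ip, late_er, 9 * late_er / late_ip


# Totals from the quoted line: 504 starts, ~4190 IP, 572 ER.
for assumed in (1.00, 1.25):
    late_ip, late_er, late_era = implied_late_era(504, 4190, 572, assumed)
    print(f"through-8 ERA {assumed:.2f}: "
          f"{late_ip} IP, {late_er:.0f} ER, ERA {late_era:.2f} after the 8th")
```

The same combined line is consistent with a post-8th ERA of 7.06 or of 0.68, depending only on the assumption about the first 8 innings.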

What does all this mean?  Well, the first thing you need is your control dataset, which in this case is performance through 8 innings.  This is the group of pitchers you are interested in.  And since the question is: “how did the guys who pitched 8 do in the 9th”, then what you need to study is the out-of-sample data: the performance in the 9th inning.

The above dataset from the poster doesn’t help us.  The poster presented us with a combination of the in-sample dataset, the data that we are selecting on, and the out-of-sample dataset, the data that we are interested in studying.  Presenting the two lumped together, as he did, introduces a sampling bias.

As my two illustrations showed, we have no idea if the pitchers allowed a ton of runs in the 9th, or hardly any.  And what we care about, what we are testing, is their performance in the 9th inning.
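
Here is a minimal sketch of that separation, assuming you had inning-by-inning data for each start (the poster's line does not provide this). The data layout, the function names `era` and `split_at_inning`, and the toy starts are all invented for illustration. They are not real 2011 data.

```python
def era(earned_runs, outs):
    """ERA = 9 * ER / IP, with IP = outs / 3."""
    return 27 * earned_runs / outs if outs else float("nan")


def split_at_inning(starts, cutoff=8):
    """Separate the innings we select on from the innings we test.

    `starts` is a list of starts; each start is a list of
    (outs_recorded, earned_runs) tuples, one per inning pitched.
    Only starts that completed `cutoff` innings are kept.
    """
    sel_outs = sel_er = test_outs = test_er = 0
    for innings in starts:
        through = innings[:cutoff]
        if len(through) < cutoff or any(outs < 3 for outs, _ in through):
            continue  # did not finish the cutoff inning: not in our group
        sel_outs += sum(outs for outs, _ in through)
        sel_er += sum(runs for _, runs in through)
        test_outs += sum(outs for outs, _ in innings[cutoff:])
        test_er += sum(runs for _, runs in innings[cutoff:])
    return (sel_er, sel_outs), (test_er, test_outs)


# Toy data, invented purely to exercise the function.
toy_starts = [
    [(3, 0)] * 8 + [(3, 0)],          # complete game, clean 9th
    [(3, 0)] * 7 + [(3, 1), (1, 2)],  # knocked out in the 9th
    [(3, 0)] * 8,                     # pulled after 8
    [(3, 1)] * 5,                     # never finished the 8th: excluded
]
selected, tested = split_at_inning(toy_starts)
print("through 8:", round(era(*selected), 3))
print("after 8:  ", round(era(*tested), 3))
```

The two numbers are reported separately: the first describes the group we selected, and only the second says anything about the decision to send them back out.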

Let me give you another one: in 2011, there were 141 pitchers that pitched 9 innings.  Their ERA was 0.65.  Does this mean that these guys would be fantastic candidates to pitch in extra innings?  Suppose they did, and each pitched one extra inning and gave up one run.  Well, now their ERA would be 1.48, even though their ERA in extra innings alone would be 9.00.

If all I did was tell you that there were 141 pitchers that pitched 10 innings and their ERA was 1.48 through 10 innings, would you therefore conclude that the manager was correct in letting them pitch 10?  If this is the only information I gave you, then you couldn’t come to any such conclusion.
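
A rough sketch of why the combined number hides the extra inning: the overall ERA is just an innings-weighted average of the two segments. The helper name `combined_era` is mine. The inputs are the 0.65 ERA through nine innings from the text and the supposed one run in one extra inning.

```python
def combined_era(segments):
    """Innings-weighted ERA over a list of (era, innings) segments."""
    total_ip = sum(ip for _, ip in segments)
    total_er = sum(seg_era * ip / 9 for seg_era, ip in segments)
    return 9 * total_er / total_ip


through_9 = (0.65, 9)  # ERA through nine innings, from the text
tenth = (9.00, 1)      # one run in one extra inning is a 9.00 ERA
print(combined_era([through_9, tenth]))
```

Nine good innings swamp one bad one, so the result lands at about 1.485, which the text gives as 1.48. The combined figure looks excellent while the only inning that tests the manager's decision was poor.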

This is why it’s important to separate the data that you sample on from the data you are actually testing.  Even in cases where it makes little practical difference, you should still do it.  Because many times, like here, a poster forgets this, or is unaware of it.

Thank you to the poster for providing the source material, which was used for instructive purposes.

(9) Comments • 2011/10/10 • Sabermetrics • Statistical_Theory