THE BOOK--Playing The Percentages In Baseball


Sunday, October 09, 2011

How not to do a study

By Tangotiger, 07:12 PM

Yesterday, I made a point of saying that, technically, you cannot include in your control dataset the data that you are actually studying. And that while, in practice, you don’t need to worry about it if the data in question makes up only 5% of the whole control dataset, you still need to point out this bias.

Why? Because today I saw someone do exactly this:

For 2011, BB-REF PI lists 504 starts where the starter pitched at least 24 outs, which I choose so as not to bias the results by looking only at complete games. The 504 starts were distributed among 140 pitchers.

When I add up the game data for these starts, I get the following line:

  W   L   W-L%   ERA   GS   CG  SHO       IP      H   ER   HR   BB     SO  WHIP
351  82  0.811  1.23  504  169   73  4,189.2  2,532  572  197  678  3,220  0.77

That seems to be a pretty good line to me, and would argue keeping a starter in for the 9th inning who has already gone 8 IP, in the absence of any other data.

But he DID bias the study.

There are two things being said here, and both are important to note. The first is that the poster selected every start in which the pitcher recorded at least 24 outs, and across those starts the pitchers had an ERA of 1.23 for the whole game. This should come as no surprise: a starter usually gets through 8 innings because he was pitching well, so selecting on 24 outs is selecting on good performance. If I had selected all pitchers who pitched in a game their team won, you’d also see a ridiculously low ERA.

The second thing the poster said is this:
“keeping a starter in for the 9th inning who has already gone 8 IP,”

Now, suppose the pitchers who got through 8 innings had an ERA of 1.00 over those 8 (a number made up for illustration purposes only). If they end up with an ERA of 1.23 for the game, does this prove that you made the right choice to let them pitch the 9th?

In the above line, there were 504 starts, meaning that in their first 8 innings (4032 innings in all), the pitchers would have allowed 448 ER (if the completely made-up ERA of 1.00 is true). Those starts actually ended with 4190 innings and 572 ER. That means that AFTER the 8th inning, they pitched an extra 158 innings and allowed an extra 124 ER.

So, when you look at the poster’s comment, “keeping a starter in for the 9th inning who has already gone 8 IP,” we see that in this illustration it was a horrible choice: 124 ER in 158 innings is an ERA of about 7 after the 8th.

Now, what if, instead, the pitchers had an ERA of 1.25 through 8? In that case, in their 4032 innings, they’d have allowed 560 ER. And therefore, AFTER the 8th inning, they’d have pitched 158 more innings and allowed 12 more ER, for a minuscule ERA of 0.68.
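Both illustrations can be checked with a short Python sketch. The assumed ERAs through 8 (1.00 and 1.25) are the made-up figures from the text, not data, and the helper names are hypothetical. It uses exact fractions because box-score IP such as 4,189.2 means 4189 and 2/3 innings, which is why the innings after the 8th come out to 157.7 rather than exactly 158.

```python
from fractions import Fraction


def ip_to_innings(ip: str) -> Fraction:
    """Convert box-score IP notation to exact innings ("4,189.2" -> 4189 2/3)."""
    whole, _, outs = ip.replace(",", "").partition(".")
    return int(whole) + Fraction(int(outs or "0"), 3)


def era(er, innings) -> float:
    """Earned runs allowed per 9 innings."""
    return float(Fraction(er) * 9 / innings)


def after_eighth(starts: int, total_ip: str, total_er: int, era_thru_8: str):
    """Back out the after-the-8th line implied by an ASSUMED ERA through 8 innings.

    Hypothetical helper: every start contributes exactly 8 IP to the
    'through 8' portion, which holds because every start recorded >= 24 outs.
    """
    ip_thru_8 = Fraction(8 * starts)
    er_thru_8 = Fraction(era_thru_8) * ip_thru_8 / 9  # ER = ERA * IP / 9
    ip_after = ip_to_innings(total_ip) - ip_thru_8
    er_after = total_er - er_thru_8
    return ip_after, er_after, era(er_after, ip_after)


# The posted line: 504 starts, 4,189.2 IP, 572 ER.
print(round(era(572, ip_to_innings("4,189.2")), 2))  # 1.23
for assumed in ("1.00", "1.25"):
    ip_after, er_after, era_after = after_eighth(504, "4,189.2", 572, assumed)
    print(assumed, round(float(ip_after), 1), er_after, round(era_after, 2))
# 1.00 157.7 124 7.08
# 1.25 157.7 12 0.68
```

The same season line is compatible with both outcomes; only the unobserved split between innings 1 to 8 and the 9th decides which one happened.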

What does all this mean? Well, the first thing you need is your control dataset, which in this case is performance through 8 innings. This is what defines the group of pitchers you are interested in. And since the question is “how did the guys who pitched 8 do in the 9th?”, what you need to study is the out-of-sample data: the performance in the 9th inning.

The above dataset from the poster doesn’t help us. The poster presented us with a combined dataset: the in-sample data, which is what we are selecting on, plus the out-of-sample data, which is what we are interested in studying. Lumping the two together the way he did introduces a sampling bias.
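The same trap can be reproduced in a toy model. The Python sketch below is a hypothetical simulation, not the poster's data: it assumes every starter is the same 4.05-ERA pitcher, that each inning's earned runs are an independent Poisson draw, and that the starts we keep (a stand-in for "went at least 8 innings") are those with at most one run allowed through 8. Because the 9th inning is drawn independently, its true ERA is 4.05 by construction; the question is what the blended line reports.

```python
import math
import random

# Hypothetical, made-up parameters for illustration only (not 2011 data).
RUNS_PER_INNING = 0.45  # every starter is the same 4.05-ERA pitcher (0.45 * 9)
MAX_RUNS_THRU_8 = 1     # toy selection rule: keep only starts with <= 1 ER through 8
N_STARTS = 50_000


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate(seed: int = 2011) -> dict:
    rng = random.Random(seed)
    kept = er_thru_8 = er_9th = 0
    for _ in range(N_STARTS):
        runs_thru_8 = sum(poisson(rng, RUNS_PER_INNING) for _ in range(8))
        if runs_thru_8 > MAX_RUNS_THRU_8:
            continue  # selection is made on the in-sample data (innings 1-8)
        kept += 1
        er_thru_8 += runs_thru_8
        # Out-of-sample: the 9th is drawn independently, so in this toy model
        # the first 8 innings carry no information about it.
        er_9th += poisson(rng, RUNS_PER_INNING)
    return {
        "starts_kept": kept,
        "era_thru_8": 9 * er_thru_8 / (8 * kept),    # what we selected on
        "era_9th": 9 * er_9th / kept,                 # what we want to study
        "era_combined": (er_thru_8 + er_9th) / kept,  # blended line: 9 * ER / (9 * kept)
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(name, round(value, 2))
```

With these made-up parameters, `era_thru_8` and `era_combined` should come out far below 4.05 (the combined figure lands near 1.2), while `era_9th` stays near 4.05. Real pitchers differ in talent and tire late in games, so real 9th innings are not independent of the first 8; the sketch only isolates the selection effect.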

As my two illustrations showed, we have no idea whether the pitchers allowed tons of runs in the 9th or hardly any. And what we care about, what we are testing, is their performance in the 9th inning.
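One way to see how little the combined line pins down: write the after-the-8th ERA as a function of the unobserved ERA through 8, $E_8$. Using $n = 504$ starts, $IP_{\text{tot}} = 4189\tfrac{2}{3}$ and $ER_{\text{tot}} = 572$ from the posted line, and counting exactly 8 innings per start in the through-8 portion:

$$\text{ERA}_{\text{after 8}} = \frac{9\left(ER_{\text{tot}} - E_8 \cdot \tfrac{8n}{9}\right)}{IP_{\text{tot}} - 8n} = \frac{5148 - 4032\,E_8}{157\tfrac{2}{3}} \approx 32.65 - 25.57\,E_8$$

Each 0.01 of through-8 ERA that we cannot observe moves the implied after-the-8th ERA by about 0.26. Plugging in $E_8 = 1.00$ and $E_8 = 1.25$ gives about 7.08 and 0.68, and the implied ERA falls to zero at $E_8 \approx 1.28$.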

Let me give you another example: in 2011, there were 141 pitchers who pitched 9 innings. Their ERA was 0.65. Does this mean that these guys would be fantastic candidates to pitch in extra innings? And if they did, suppose they each pitched one inning and gave up one run in extra innings. Well, now their ERA would be 1.48.

If all I did was tell you that there were 141 pitchers who pitched 10 innings and their ERA was 1.48 through 10 innings, would you therefore conclude that the manager was correct in letting them pitch the 10th? If this is the only information I gave you, then you couldn’t come to any such conclusion.
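Per pitcher, and assuming the extra-inning run is earned, the arithmetic shows what the blended 10-inning figure hides:

$$\text{ER}_{\text{thru 9}} = \frac{0.65 \times 9}{9} = 0.65$$

$$\text{ERA}_{\text{thru 10}} = \frac{9 \times (0.65 + 1)}{9 + 1} = 1.485$$

$$\text{ERA}_{\text{10th only}} = \frac{9 \times 1}{1} = 9.00$$

The 10-inning line looks excellent, yet the inning actually being tested was pitched at a 9.00 ERA.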

This is why it’s important to separate the data that you sample on from the data you are actually testing. Even in cases where it makes little practical difference, you should still do it, because many times, like here, a poster forgets this, or is unaware of it.

Thank you to the poster for providing the source material, which was used for instructive purposes.


#1    bobm      2011/10/09 (Sun) @ 23:19

Tango: while you decided to focus on ERA hypotheticals, I said that these starters had a “pretty good line.” That line includes data besides ERA, including W/L, CG and SHO, data which you apparently chose to ignore.

Don’t pitchers’ wins and ERA together comprise a reasonable enough accounting scheme to infer what starters did in the 9th inning, without having to examine the 15,822 plate appearances in 9th innings in the 2011 season?  I neither “forgot” nor was “unaware” of the sampling issues.

This group of starters won over 80% of their decisions and approximately 70% of the 504 starts, both stats at or above the rates for MLB leaders among qualifying starting pitchers.

Based on these win rates, how likely is your hypothetical situation where the starters went 8 and then blew the game in the 9th inning? 

Not very likely, especially since runners put on base by starters in the 9th who were then inherited by relievers and scored, whether or not those runs mattered to the outcome, were charged to the starters, not the relievers. Any resulting losses would have been charged to the starters.

(By the way, the relievers went 43-28, for a 61% win rate, a rate worse than the starters.)


#2    Tangotiger      2011/10/09 (Sun) @ 23:34

bobm:

The only thing we care about is how they did in the 9th inning.  Your study doesn’t show us what they did in the 9th.


#3    Tangotiger      2011/10/09 (Sun) @ 23:55

By the way, if anyone else disagrees with my assessment, please post your reasons.

If someone doesn’t fully understand what I said in this thread, please let me know at which point I lost you.

If you feel I’m being condescending, then ignore me.


#4    bobm      2011/10/10 (Mon) @ 00:16

Actually, most fans, players and managers care about a different outcome--winning the game. 

Remember, leaving the starter in was allegedly part of the “worst managing ever.” Isn’t the null hypothesis “not the worst managing ever”?

Does the starters’ W-L record necessarily indicate that they did poorly in the 9th inning, poorly enough to lose the game consistently, and that a manager who employed such a strategy would be doing “the worst managing ever”?

If starters who pitched into the 9th inning had won only 60% of their decisions and approximately 50% of the 504 starts, that would make me a lot more doubtful about the strategy in the context of starting pitcher performance in 2011.

Finally, with minimal effort I got a strong indication of the right strategy hours before the poster and thread on this site resistantly meandered to it.  Sometimes simple experiments up front save one from travelling far down the wrong path.


#5    MGL      2011/10/10 (Mon) @ 01:24

Tango, why do you bother? You are infinitely more patient and kind than I am!


#6    MGL      2011/10/10 (Mon) @ 01:26

The truly sad part about the whole thing is that had the Cardinals blown the game, the amount of disagreement and vitriol would be 10% of what it was…


#7    Tangotiger      2011/10/10 (Mon) @ 08:17

I’m not doing it for bob’s benefit here, though it would be great if he were to step back and learn.

Ideally, most of the regulars of this site will learn from this illustration.


#8    Steve C      2011/10/10 (Mon) @ 12:34

My main concern with citing reliever winning percentage is that relievers are consistently brought into close games, which introduces a significant bias.

How many times have we seen a manager leave in an ace who has been cruising when his team is up by 8? If you want to use W-L, you need to control for the game state, focus on games in which the lead is 1 run or the game is tied, and go farther back in time to increase the sample size. I would expect a significant shift in winning percentage.


#9    Neil S      2011/10/10 (Mon) @ 15:13

I’m reminded of opening day, Jays vs. Tigers, in 1992. Jack Morris cruised through the first 8 innings, and the Jays were leading 4-0 going into the bottom of the 9th. He then gave up a homer and got himself into some trouble, and the Jays won 4-2 when all was said and done. The Jays won, sure, and Morris’ whole line is still very good, but clearly he was exceptional in the first 8 and awful in the 9th. So this seems like a decent example of just how poorly bob’s methodology does at explaining whether it was a good idea to bring Morris back out for the ninth.

