THE BOOK--Playing The Percentages In Baseball


Friday, February 20, 2009

The Color of Clutch

By Tangotiger, 10:25 AM

This is the summary to the whole Clutch project, over at Hardball Times. 


#1    Guy      (see all posts) 2009/02/20 (Fri) @ 10:55

Nice writeup, Tango.  My only quibble is with your decision to just exclude IBB, which seems a bit arbitrary.  If you include IBB, your hitters increase their OBP advantage (compared to all PAs) by 4 points, while their SLG advantage declines 19 points—pretty close to a wash. 

Moreover, the IBB to these great hitters are likely not randomly distributed.  As a result, it’s possible that your hitters in their non-IBB PAs may face a greater platoon disadvantage, and/or be facing superior pitchers (or a reliever rather than starter).  A manager doesn’t pitch to these guys in a 3.0 LI situation unless a) he has no choice, or b) he likes the pitcher matchup.  Have you looked at platoon matchups, pitcher quality, or pitcher time-thru-lineup for the two samples?

It may be that the fans saw something, but I remain a bit skeptical.....


#2    Tangotiger      (see all posts) 2009/02/20 (Fri) @ 11:18

The SLG is unaffected by IBB.

***

The better measure would be WPA/LI.


#3    Tangotiger      (see all posts) 2009/02/20 (Fri) @ 11:36

That is:

WPA/LI (Situational wins) per 700 PA
- overall +0.37 (Fans), +1.87 (Tango)
- high leverage -0.35 (Fans), +0.59 (Tango)

So, overall, my hitters were +1.50 wins per 700 PA better than the Fans.  That’s around 25 wOBA points.

In high-leverage situations, my guys took a 1.3 win dump, while the Fans took a 0.7 win dump.  My guys ended up still being +0.94 wins per 700 PA ahead.  That’s around 16 wOBA points better.

That’s a 9-point relative improvement for the Fans, but not enough to close the gap.

(It should be noted that in high-leverage situations, you are facing good relievers, so we should expect everyone to go down.  My guys dumped more, but not enough to be worse than the Fans.)
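
The arithmetic in this comment can be checked with a short sketch. The WPA/LI figures are the ones quoted above. The conversion rate of about 0.06 wins per 700 PA for each wOBA point is an assumption, inferred from the comment's own "+1.50 wins ... around 25 wOBA points"; it is not a constant stated anywhere in the thread.

```python
# WPA/LI (situational wins) per 700 PA, as quoted in the comment above.
overall = {"Fans": 0.37, "Tango": 1.87}
high_leverage = {"Fans": -0.35, "Tango": 0.59}

# Assumption: wins per 700 PA per wOBA point, inferred from
# "+1.50 wins ... around 25 wOBA points". Not stated in the source.
WINS_PER_WOBA_POINT = 1.50 / 25


def gap(table):
    """Tango's hitters minus the Fans' hitters, in wins per 700 PA."""
    return table["Tango"] - table["Fans"]


def to_woba_points(wins):
    """Convert a wins-per-700-PA gap to wOBA points at the assumed rate."""
    return wins / WINS_PER_WOBA_POINT


overall_gap = gap(overall)
high_gap = gap(high_leverage)

# How far each group fell going from all PAs to high-leverage PAs.
drops = {name: overall[name] - high_leverage[name] for name in overall}

print(f"overall gap:       {overall_gap:+.2f} wins per 700 PA")
print(f"high-leverage gap: {high_gap:+.2f} wins per 700 PA")
print(f"relative gain for the Fans: "
      f"{to_woba_points(overall_gap) - to_woba_points(high_gap):.0f} wOBA points")
for name, drop in drops.items():
    print(f"{name} dropped {drop:.2f} wins per 700 PA in high leverage")
```

The point of the sketch is only that the gap narrows without changing sign: both groups decline in high leverage, and Tango's group declines more.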


#4    Guy      (see all posts) 2009/02/20 (Fri) @ 11:49

If these players had all performed in the clutch the same as they did overall, Tango’s hitters should be 13.3 WPA and Fans 1.4, or +11.9 for Tango.  In fact, they were Tango 4.2 and Fans -6.3, or +10.5 for Tango.  So the Fans overperform (and/or Tango underperforms) by 1.4 wins over about 1900 high-LI PA.  To put this in context, this means the average clutch hitter delivers an extra .05 wins per season (actually a bit less, since his less-than-expected performance in the other 90% of his PAs will cost something), or maybe 7/10 of a win over a long career. 

Still, I’m not convinced the two pools of hitters face equal pitchers/matchups.  Opposing teams should deploy their greatest resources against the best hitters.  They should seek the platoon edge against A-Rod, even if it means they give it up against a lesser hitter; they may give him an IBB, and risk a lesser hitter beating them.  In fact, it would be surprising (and an error for the opposing teams) if your hitters did NOT face tougher pitchers/matchups.....


#5    Tangotiger      (see all posts) 2009/02/20 (Fri) @ 12:02

Guy, you make a good point, and a comment on ballhype shows a disparity in handedness among the pools of hitters.

Hopefully other Retrosheeters will take this occasion to come up with more context to the numbers, specifically as to what the handedness difference should result in and the quality of pitchers faced by both groups.  Here is the list of players with handy-dandy MLBAM IDs:
http://www.tangotiger.net/clutch/fangraphs_list.txt


#6    Guy      (see all posts) 2009/02/20 (Fri) @ 12:04

Tango:  It would be helpful to know what hitters overall do in LI>2.0 PAs.  That will give us a sense of whether the Fans team overperformed, or your team underperformed (or both).  How much do WPA and wOBA decline overall?


#7    Tangotiger      (see all posts) 2009/02/20 (Fri) @ 12:13

Here is player data from 1999-2002:
http://tangotiger.net/files/clutchdata.txt

IIRC, at the high-end, the wOBA drops by 6 points on average

Baseball-Reference tracks it at LI >= 1.5, so you can see league totals here:
http://www.baseball-reference.com/pi/bsplit.cgi?lg=ML&team=TOT&year=2008#wpa-lever

But of course, the mix of players batting in high-LI situations isn’t necessarily the same as in low-LI situations.


#8    Dan Rosenheck      (see all posts) 2009/02/20 (Fri) @ 12:15

I’ll just post here the same thing I put up on BTF, even though you already answered the question:

I would think you would need to look at WPA or WPA/LI here rather than simple run estimation--clearly, the reason the Fans choose contact hitters is because clutch situations occur disproportionately often with runners in scoring position, when the difference between a single versus a walk or a strikeout versus a flyout can really be night and day. Take this situation: Bottom of the ninth, home team down by 1, two outs, runners on second and third. Who would you rather have up to bat there--the 2007 version of Jack Cust (.256/.408/.504) or the 1971 version of Glenn Beckert (.342/.367/.406)?

Cust is by far the superior hitter in a neutral context--he generates 7.78 runs per 27 outs, versus 5.57 for Beckert. But here, the only stat that matters is batting average. Since there are two outs, both runners will be off with the pitch, so any hit at all wins the game for the home team, regardless of whether it’s a single, double, triple, or homer. And a walk has no effect on the team’s win expectancy, since the winning run is already at second base. Finally, an out ends the game. Thus, with Cust, you have a 20% chance of a game-winning hit, a 21% chance of a walk (after which, according to winexp.walkoffbalk.com, the home team is 25% likely to win), and a 59% chance of a loss, for a win probability of .2+(.25*.21) = 25%. With Beckert, you have a 33% chance of a game-winning hit, a 4% chance of a walk, and a 63% chance of a loss, for a win probability of .33 + (.04*.25) = 34%. While Cust is a 40% more valuable hitter overall, Beckert is a 36% more valuable hitter in this situation.
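
The win-probability comparison above can be reproduced with a minimal sketch. It uses only the per-PA rates and the 25% post-walk win expectancy quoted in the comment; the function and variable names are made up for illustration, and the model assumes, as the comment does, that any hit wins the game and any out ends it.

```python
def win_probability(p_hit, p_walk, p_win_after_walk=0.25):
    """Home team's chance of winning from this one plate appearance.

    Simplifying assumptions taken from the comment: any hit wins the
    game, any out loses it, and after a walk the home team wins with
    probability p_win_after_walk.
    """
    return p_hit + p_walk * p_win_after_walk


# Per-PA rates as quoted in the comment.
cust = win_probability(p_hit=0.20, p_walk=0.21)
beckert = win_probability(p_hit=0.33, p_walk=0.04)

print(f"Cust:    {cust:.4f}")
print(f"Beckert: {beckert:.4f}")
print(f"Beckert / Cust: {beckert / cust:.2f}")
```

Carried through without rounding, Cust comes out at .2525 and the ratio is close to 1.35; the comment's 36% figure divides .34 by the rounded .25.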

It seems to me that an ideal bench would have one pure singles hitter (say a .317/.338/.380 1994 Félix Fermin) for a situation like the one above, one pure OBP guy to lead off innings (who can’t be all walks because in that situation the P will throw strikes; I’d toss out a .284/.353/.344 1955 Pete Runnels), and one raw slugger for when you’re down by 2 with 2 outs in the 9th and runners on first and second (Tony Armas’s .217/.254/.453 showing with 36 HR in 1983 would be a good bet).


#9    Guy      (see all posts) 2009/02/20 (Fri) @ 12:35

Dan:  I don’t disagree that different situations reward different skill sets.  But the fans may be underestimating the cost of DPs in their preference for someone who “hits the dam* ball.” The Fans’ hitters hit into about 14 more GDP than Tango’s hitters (relative to PAs), which effectively drops their OBP another .007 or so (more, really, since DPs are a huge WP killer).  It takes a lot of extra advanced runners to make up for that.  And since the high-K power hitters are FB hitters, they may get just as many SFs as the high-contact guys.


#10    Tangotiger      (see all posts) 2009/02/20 (Fri) @ 12:43

It’s an almost certainty that Fans think of clutch in terms of RBIs, and in that respect, they had 5% fewer RBI per PA in Clutch situations compared to my guys (as opposed to 18% fewer RBI per PA in all situations).


#11    Tangotiger      (see all posts) 2009/02/20 (Fri) @ 16:32

Here are some more component number breakdowns for your pleasure.

In all instances, the numbers in high-leverage situations are in parens.

wOBA
Fans: .340 (.342)
Tango: .361 (.355)

So, Fans improve a bit in their overall game, while my guys decline.  Seeing that the 6 point drop is consistent with my 1999-2002 study, that should have been expected.

TTO (i.e., BB+K+HR+HBP divided by PA, excluding IBB)
Fans: 25% (27%)
Tango: 34% (36%)

No surprise.  Putting bat on ball declines for everyone.

BABIP
Fans: .308 (.306)
Tango: .309 (.314)

Slightly interesting…

2B+3B per 1B+2B+3B (i.e., XBH per H in ball in play)
Fans: .239 (.219)
Tango: .278 (.249)

So, everyone loses power, but my guys lose more.

K per PA
Fans: increases by 8%
Tango: by 10%

BB per PA
Fans: +6%
Tango: +5%

HR per contacted PA (i.e., exclude BB,K,HBP)
Fans: .033 (.036)
Tango: .057 (.054)

***

The overall general trend is: be more careful with the game on the line, and put the ball in play at the expense of power.

This could also apply to all late-game situations, regardless of whether the game is close or not.  That is, it could be the basic trend that you do all this the later the innings go by.  At the very least, that was one of the findings in The Book, that the TTO goes up the 2nd and 3rd time through the order (I think that’s what I found). 

And of course, in the later innings, you are facing power relievers, so all this data we are seeing may simply reflect our expectations of the matchups, regardless of clutch or late-game performances.


#12    JD      (see all posts) 2009/02/20 (Fri) @ 16:32

I don’t think the discussion of clutch will end, at least not by me, until somebody can actually define the term satisfactorily. I still haven’t seen that. It was my one complaint with The Book, and I still don’t feel that has been settled. Every time I see a study with “clutch” and “non-clutch” situations these parameters are, I think, fatally flawed.


#13    Tangotiger      (see all posts) 2009/02/20 (Fri) @ 20:34

JD, rather than waiting for someone to define clutch to your satisfaction, why don’t you define clutch to your satisfaction?


#14    Terry      (see all posts) 2009/02/20 (Fri) @ 22:00

Ha!!!!!!! So clutch does exist!!!!  tongue laugh


#15    MGL      (see all posts) 2009/02/21 (Sat) @ 00:11

I agree with Guy on 2 points that I had not seen mentioned before.

One, in removing the IBB’s, you most likely have a biased sample wherein the pitchers are better than otherwise and more likely to have the platoon advantage.

Two, and perhaps more importantly, the pool of pitchers is likely to be better, maybe much better, versus Tango’s players in high leverage situations, than the Fans’ players.  No?  Tango?  So, wouldn’t you need to adjust for the pool of pitchers in the clutch data?

The argument that “We (whoever the researcher is) have not defined clutch to YOUR satisfaction,” is a specious one on several levels. One, who cares what YOUR definition is or whether you like the researcher’s?  As Tango said, if you don’t like the researcher’s, do your own study with your own definition.  I don’t mean that facetiously.  What I mean is that if you think that a researcher’s definition is completely unreasonable, then that is a legitimate argument.  If the researcher’s definition IS reasonable, then where is the argument, other than, “I’d also like to see the numbers when we look at X.” Let’s face it.  There is no ONE definition of clutch.  Any reasonable definition will do the trick.  The most important point is this: if you think that one study with a large sample of data (which this one does NOT have, by the way) and a reasonable definition of clutch can show a very small effect, but that changing the definition to another reasonable one will produce completely different results, well, you don’t know too much about the way this kind of baseball research works.

Finally, when someone does research like this (and again, this was NOT a bona fide research project on the existence and magnitude of clutch), all they can do is to present a reasonable definition - THEIR definition - of clutch, and apply the results to THAT definition.  If they end up generalizing the results and explicitly applying them to ALL possible clutch definitions and situations, then they are making somewhat of a mistake.


#16    birtelcom      (see all posts) 2009/02/22 (Sun) @ 18:23

Isn’t there a bit of a contradiction between the final two paragraphs of Tom’s article? The second-to-last paragraph points out that the experiment under discussion shows a fan-predicted level of “clutchness” that is not statistically significant and is thus unlikely to recur in a repeat of the experiment.  Then the last paragraph goes on (sarcastically perhaps?) to conclude with breathtaking certainty that the experiment has shown that a repeatable clutch skill identified by fans exists, that the size of it, albeit modest, can be generally identified, and that these conclusions are so well supported that all further discussion of the issue can and should now cease.  I hope that this set of final statements in the last paragraph really was tongue-in-cheek, because they seem to be undermined by the actual results of the experiment, which are best summarized in the second-to-last paragraph.

I do agree that the most interesting results of the experiment are not about objective clutch skill at all, but rather about the nature and parameters of belief about clutch skill held by this particular self-selecting group of fan participants.


#17    Tangotiger      (see all posts) 2009/02/25 (Wed) @ 12:10

birt: you are definitely reading way too much into what I said.  I said what I said.  This is how to parse what I said:

Let’s let this clutch debate end today (please?),

I’m tired, and most people who study this issue are tired.  So, let’s stop talking about it, because…

and simply agree that:

... we can have a healthy compromise that both sides can live with, which is…

a) yes, clutch exists,

... duh.... that’s a given.  There is no human endeavour you can point to in which everything he does is random.  Even f-ck-ng rock/paper/scissors is not random!

b) yes, fans can perceive clutch players, but

... they have some insight in this, even if it’s just tiny… it’s at least a smidge above 50/50.

c) the impact of clutch players is limited to less than the platoon advantage.

... somewhere between non-zero and less than the platoon advantage.  Isn’t that something that we can all agree to?

p.103 by Andy in The Book:
“About one in six players increases his inherent OBP skill by eight points or more in high-pressure situations; a comparable number of players decreases it by eight points or more.”

We all agree that Scutaro+clutch is worse than ARod. 

I’m quantifying the upper limits.  Andy quantified it more specifically.  We all agree it’s non-zero.

We’re all in the same ballpark, more or less.

My little project simply presents a useful boundary to think of clutch in real terms.  Once you think in those terms, then there’s really not much more to debate is there?

Exactly what is the debate at this point?  That the clutch skill is 1 SD = 10 wOBA instead of 6 or 8 wOBA?

Or that 1 SD = 50 wOBA?  (i.e., yes, given the choice between Scutaro and ARod, I’ll bring in Scutaro).


#18    Guy      (see all posts) 2009/02/25 (Wed) @ 13:12

I’m all for compromise where possible, but I have to quibble a bit.  What Andy’s analysis tells us (and all studies based on measuring total vs. expected variance) is that there is a clutch talent IF we have successfully accounted for all the factors that might produce non-random variance.  That’s a significant “if.” Maybe some managers do have a limited ability to select favorable hitter-pitcher matchups, based on the hitter having an unusual platoon split, or being able to hit certain kinds of pitches, or something else.  Or maybe some hitters are a little better with men on base, even controlling for handedness, because of their distribution of BIP, and that creates the illusion they are better in the “clutch.” I’m not arguing any of that is true, just that Andy’s analysis doesn’t quite prove the existence of clutch talent in the normal sense we mean it.

And of course the ability of fans to detect this skill is another issue altogether.  And I don’t think we have strong evidence yet that this is the case (though your experiment is an interesting first look at the issue).


#19    Tangotiger      (see all posts) 2009/02/25 (Wed) @ 13:29

Are you suggesting that you disagree with at least one of my 3 statements?


#20    Guy      (see all posts) 2009/02/25 (Wed) @ 13:40

Yes.  I’d say:
b is definitely an open question
a is somewhat open (though likely correct)
c is definitely true (assuming clutch does exist)


#21    Tangotiger      (see all posts) 2009/02/25 (Wed) @ 14:11

But what’s the extent of the disagreement then?  Take “b”, the statement that fans can perceive clutch.  Isn’t it more likely that Fans are not completely clueless as to who is a clutch hitter, just mostly clueless?  For example, a fan can say “Chuck Knoblauch doesn’t know how to throw any more”, or “Rick Ankiel forgot how to throw”, while a statistician would say “No, it’s way too early to say that… I can tell you in two to four years whether what you just said is real or not”.

As long as we agree that it’s not completely random, that fans are not completely clueless, then I don’t see the point of disagreement.

Indeed, the presumption would be on the person who says that fans are completely clueless with regards to clutch to prove his point.

Once we accept a and b, then the real argument is about c.  And saying it’s between non-zero and the platoon advantage is about as comforting a statement as both sides can live with.


#22    Guy      (see all posts) 2009/02/25 (Wed) @ 14:32

Look, if it will stop the debate, I would certainly be willing to concede a small clutch ability—which I think is your point.  But who will meet on the deck of the battleship to sign this truce? 

Until then......  I ran correlations between Fangraphs’ “clutch” statistic and your WPA/LI stat.  If I understand correctly, the latter is a context-neutral measure of hitter performance, and so shouldn’t be correlated with clutch.  Here’s what I get (hitters >300 PA):
2004 -.07
2005 -.16
2006 -.04
2007 -.11
2008 0.00
It sure looks to me like great hitters have a tendency to be “unclutch.” Unless we think there’s a reason for hitting ability and clutch “ability” to be inversely related, this must reflect good hitters facing better pitchers and/or worse platoons.  And couldn’t that explain much or all of the difference between your hitters and the Fans’?


#23    Guy      (see all posts) 2009/02/25 (Wed) @ 14:50

Additional thought:  can you ask Dave to calculate the average Clutch score for the Fans’ players and your players?  Would be interesting to see if Fans were above average. 

My guess is your team will be a little below average.  Over past 5 years, the top 30 hitters in WPA/LI each season have averaged a -0.12 Clutch score.


#24    Tangotiger      (see all posts) 2009/02/25 (Wed) @ 15:00

I would be happy to join Murray Chass to sign this truce.

And I agree that some of the gap between Fans and my team is a platoon/handedness issue.

All’s I’m saying is that it’s non-zero (as proven by Andy’s study), and could not possibly be more than the platoon advantage (as shown by this study).

It’s there whispering at us, not shouting at us.  And for all intents and purposes, is useless, because it has so little value in decision-making.


#25    Tangotiger      (see all posts) 2009/04/08 (Wed) @ 11:53

In the main article, I show how Silver got an r=.33 with two groups of players each with around 3500 PA, which led me to state this equation:

r=PA/(PA+7000)

I also said in my own clutch study, I found this:

r = PA / (PA + 6250)

Pizza Cutter shows this in his clutch study:
http://www.statspeak.net/2009/04/so-maybe-clutch-hitting-exists.html

number of PA   N (players)   split-half r
1000           869           .174
2000           429           .304
3000           186           .431
4000            74           .489
5000            20           .656

If I were to construct a handy-dandy regression equation for each of those data points, I’d get
r=PA/(PA+x)

where x is some number around 4500.  Here is the “x” for Pizza’s data:

number of PA   N (players)   split-half r   x
1000           869           0.174          4747
2000           429           0.304          4579
3000           186           0.431          3961
4000            74           0.489          4180
5000            20           0.656          2622

I’d ignore that last one, since it’s only 20 players.

Basically, at around 4500 PA, Pizza would have an r=.50 (if he had a large enough number of players).

This is fairly consistent with my findings using only 4 years of data, and a bit stronger than Silver’s process.

Given for example 2000 PA in two groups to run the correlation, this is the correlation that the three of us would say we’d get:
.31 Pizza
.24 Tango
.22 Silver

Pizza does show that he sees it better than we do.  But, as he noted, if you need 4000 career PA just to get an r of .30, it’s going to be very tough to get any real use out of it, other than a retrospective look.
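
The numbers in this comment follow from the one-parameter form r = PA/(PA + x), which can be inverted to x = PA(1 - r)/r. The sketch below recomputes the "x" column and the three correlations at 2000 PA. The function names are made up for illustration, and 4500 is the rounded constant suggested above for Pizza Cutter's data, not a fitted value.

```python
def implied_x(pa, r):
    """Solve r = pa / (pa + x) for x, given an observed split-half r."""
    return pa * (1 - r) / r


def expected_r(pa, x):
    """Correlation expected between two groups of `pa` PA each."""
    return pa / (pa + x)


# (PA per half, split-half r) from Pizza Cutter's table above.
pizza_data = [(1000, 0.174), (2000, 0.304), (3000, 0.431),
              (4000, 0.489), (5000, 0.656)]

for pa, r in pizza_data:
    print(f"{pa:5d} PA  r={r:.3f}  x={implied_x(pa, r):.0f}")

# Constants quoted in the comment; 4500 is the rounded figure for Pizza.
constants = {"Pizza": 4500, "Tango": 6250, "Silver": 7000}

# Expected correlation for each source at 2000 PA per group.
at_2000 = {name: expected_r(2000, x) for name, x in constants.items()}
for name, r in at_2000.items():
    print(f"{name:6s} r={r:.2f}")
```

In this form x is the number of PA at which r reaches .50, which is why a smaller x means the signal shows up sooner.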

