THE BOOK--Playing The Percentages In Baseball


Wednesday, August 19, 2009

Critique on the Clutch study in The Book

By Tangotiger, 01:54 PM

Phil makes his case, and I will email Andy asking for his response.


#1    AED      (see all posts) 2009/08/21 (Fri) @ 12:40

1. Inclusion of Walks

I’ll freely concede the point that pitchers approach different at-bats differently.  In some instances, the pitcher will challenge the hitter and try to get him out.  In others, the pitcher will avoid the middle of the strike zone at all costs, even if this results in a walk.  In yet others, the pitcher wants nothing to do with the batter and intentionally walks him.

To the extent that these approaches can be cleanly separated into different samples, that is a good thing.  Which is of course why intentional walks are excluded.  However, separating the first two is not so clear.  There are instances in which the pitcher is trying to paint the corners, and does so well enough to get a strikeout, induce a weak hit, or maybe he messes up and gives up a home run.  Likewise, there are instances in which the pitcher wants to challenge the batter, but the batter works the count and draws a walk anyway.

The danger here is that using the result to infer the approach biases the sample.  Specifically, if pitchers are more likely to pitch around a good hitter with the game on the line, then good hitters would be seen to have poor clutch performance.  They would lose credit for the semi-intentional walks when being pitched around, but get penalized for the outs they create when being pitched around.

I guess the issue is that Phil treats walks as a proxy for pitcher intention.  While that is undoubtedly part of the equation, he is completely ignoring the batter’s ability to work the count.  Thus, some part of the walk is indeed related to skill on the part of the batter, and discounting this is as questionable as including the part of the walk for which the pitcher is responsible.  (Why not also discard strikeouts, which are also a strong function of the pitcher’s style?) In addition, it seems reasonable that maintaining proper plate discipline with the game on the line might be challenging for players; thus by omitting walks one is potentially throwing out a key indicator of clutch performance.

For the record, while the study is six years old and thus I no longer have the exact code used to produce it, I have made a quick look and see little difference between the clutch measurement with and without walks included.  Thus, I stand by the results.

2. Statistical Significance

This is probably a reflection of my training, but I do not accept the notion that there is some magical degree of confidence at which a detection becomes “statistically significant”.  At least in scientific research, the proper way to report results is to give the result with a confidence interval (68% is standard, as it corresponds to 1 sigma) and let the reader deduce for himself how solid the result is.  I have done just that, and Phil even cites the passage where the confidence interval is given.

Also, Phil has grossly exaggerated his claim that I underestimated my uncertainty.  From the numbers in the Book, one computes about a 13% false detection probability.  (For the numbers folks out there, it looks like the measured variance due to clutch skill is something like 0.000071 +/- 0.000063, which gives a false detection rate of 13%.) From Phil’s simulations, he estimates 14%.  I don’t see a large discrepancy here.
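The arithmetic behind the 13% figure can be checked in a few lines. This is only a sketch: it assumes the error on the variance estimate is roughly normal and that "false detection" means a one-sided tail (a true variance of zero producing an estimate at least this far above zero). The function name `normal_upper_tail` is introduced here for illustration.

```python
import math

def normal_upper_tail(z: float) -> float:
    """Return P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Figures quoted in the comment above.
variance_estimate = 0.000071
variance_error = 0.000063

# How many standard errors the estimate sits above zero.
z = variance_estimate / variance_error
p_false_detection = normal_upper_tail(z)

print(f"z = {z:.2f} sigma")
print(f"one-sided tail probability = {p_false_detection:.1%}")
```

Under those assumptions the estimate sits a little over one standard error from zero, and the one-sided tail comes out at roughly 13%.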

Finally, I do have to point out that repeating a claim (that the result is insignificant) does not make it any more correct…


#2    bb      (see all posts) 2009/08/21 (Fri) @ 14:05

Walks are both a function of pitcher intent and batter ability.  Good hitters will get more walks in every situation.  They also get more balls per pitch thrown any way you cut it (with or without walks).  In exchange, they get more outs than they would if the pitcher pitched to them normally.  I.e. the pitcher is increasing the batter’s OBP in exchange for reducing his runs scored (primarily through a reduction in HRs).


#3    MGL      (see all posts) 2009/08/21 (Fri) @ 17:55

"The danger here is that using the result to infer the approach biases the sample.  Specifically, if pitchers are more likely to pitch around a good hitter with the game on the line, then good hitters would be seen to have poor clutch performance.  They would lose credit for the semi-intentional walks when being pitched around, but get penalized for the outs they create when being pitched around."

That is a great point (basically that you have a similar problem either way).  I was waiting for Andy to respond to Phil’s criticism of using OBP as opposed to BA (or whatever).  Also, I suspected that if he used BA or some other metric, the results would be the same and he confirmed that.  In any study I have ever done, when I got a non-trivial result I usually got the same or similar result whether I used BA, OBP, OPS, wOBA, etc.

Andy’s answer to the “it is not statistically significant” criticism by Phil is exactly what I have said for many years and how I responded to Phil in the other thread (I forget which one).  I usually love Phil’s work, but I see zero merit in any of his criticisms of Andy’s study.  That is WAY out of character for him.  While no study is ever perfect, I think he took on the wrong guy here.  Andy is really smart and really detailed…


#4          (see all posts) 2009/08/21 (Fri) @ 19:47

Thanks, Andy ... give me a little while for what you wrote to sink in.


#5          (see all posts) 2009/08/21 (Fri) @ 20:05

One quick question: you found a variance of .008 in OBA, but .006 in wOBA.  The difference between the two is (a) decreasing the relative value of a walk, and (b) including extra bases. 

Is there a straightforward interpretation of this?  Does it follow that there is more variance in clutch walks compared to clutch extra-base hits?


#6          (see all posts) 2009/08/21 (Fri) @ 20:20

"Also, Phil has grossly exaggerated his claim that I underestimated my uncertainty.”

1.  I think your numbers give you a 6.5% false detection rate, not 13%.  I think you’re looking two-tailed instead of one-tailed.

2.  I never claimed that you underestimated your uncertainty.  I just noted that my simulation and method gave me even lower significance than you had: 14% to 6.5%.  That could be because my simulation had every player the same (in terms of PA), or it could be that our methods were different. 

3.  One thing I did wrong that I shouldn’t have is assumed that your confidence interval of (3, 12) was based on a normal distribution, and you have to double the width to get to 95%.  If I understand your method, the estimate of the SD doesn’t come from a normal distribution (it’s the variance estimate that’s normal).  So I should have used a different method.  The results wouldn’t have been much different, though.
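One way to see why the interval on the SD is not symmetric: if the variance estimate is the quantity with a roughly normal error, the SD interval comes from taking square roots of the variance bounds. The sketch below assumes the (3, 12) interval was obtained this way from the variance figures quoted in comment #1 (0.000071 +/- 0.000063); that is an inference from the thread, not something the study's code confirms.

```python
import math

# Variance figures quoted in comment #1 of this thread.
variance_estimate = 0.000071
variance_error = 0.000063

# 1-sigma bounds on the variance (floored at zero, since a variance cannot be negative).
low_var = max(variance_estimate - variance_error, 0.0)
high_var = variance_estimate + variance_error

# Square roots turn variance bounds into SD bounds; the result is asymmetric.
sd_point = math.sqrt(variance_estimate)
sd_low = math.sqrt(low_var)
sd_high = math.sqrt(high_var)

print(f"SD estimate: {sd_point * 1000:.1f} points of OBP")
print(f"SD interval: ({sd_low * 1000:.1f}, {sd_high * 1000:.1f}) points of OBP")
```

Rounded to whole points, this gives an SD of about 8 with an interval of about (3, 12), so doubling the width as if the SD itself were normal is not quite the right way to reach 95%.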


#7    Guy      (see all posts) 2009/08/22 (Sat) @ 14:23

Contra MGL/3, I don’t find Andy’s response on walks very compelling.  Yes, of course hitters deserve some credit for the walks they draw.  But it’s certainly possible that pitchers walk very good hitters proportionately more often in clutch PAs.  Suppose good hitters were +.015 OBP in clutch PAs overall, while average/weak hitters had their usual performance, and the difference was entirely due to good hitters drawing more BBs.  In that scenario, Andy’s method would reveal a “clutch” ability.  But the clutch hitters would mostly be very good hitters, and not “clutch” in the sense we’re looking for.  (In fact, one interesting check on Andy’s results would be to see if the “clutch” and “unclutch” hitters differ at all in offensive abilities.) So I think Phil raised a valid question.  That said, if the difference holds up when looking only at BA, then that still provides some evidence for clutch ability.

My concern, though, is the underlying assumption that all variance beyond binomial variance reflects a clutch “skill.” That may well not be the case.  Andy controlled for quality of opposing pitcher, but I can’t tell if he also controlled for handedness.  Even if he did, hitters may face any number of other possible advantages (or disadvantages) in their clutch PAs:  playing at home/away, being hurt/healthy, having more/fewer PAs facing a pitcher for the 3rd time in a game, facing more relievers (who will often be more effective than their lifetime stats suggest), etc., etc. 

You can’t realistically control for all of these factors.  But you could run a version of Andy’s study to see what the variance looks like for a factor that we know is random.  For example, compare hitters’ performance in games played on the 1st, 2nd, or 3rd day of a month with their performance on all other dates.  (That would give you samples of similar size to Andy’s.) Apply the same controls.  Then, is the resulting spread of players’ “first 3 days performance” gap any smaller than what you find for clutch?  It wouldn’t surprise me to learn there is little difference. 

More generally, I think this is a better way to use the “Dolphin” method of searching for a skill.  Compare the variance to what you find in random samples of similar size in MLB, rather than to what we would see if only binomial variance were at work.
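A rough sketch of what such a placebo check might look like follows. Everything here is an assumption for illustration: the function names, the simulated league (OBP centered on .340), and the sample sizes are hypothetical, and the real study's controls for pitcher quality are not reproduced. The point is only the shape of the calculation: take the variance of each player's split-minus-rest gap and subtract what binomial luck alone would produce.

```python
import random
import statistics

def excess_variance(players):
    """Observed variance of (split OBP - rest OBP) minus the binomial expectation.

    players: list of (split_on_base, split_pa, rest_on_base, rest_pa) tuples.
    """
    gaps = []
    binomial_vars = []
    for split_ob, split_pa, rest_ob, rest_pa in players:
        gaps.append(split_ob / split_pa - rest_ob / rest_pa)
        # Pooled rate: best guess at the true OBP if the split means nothing.
        p = (split_ob + rest_ob) / (split_pa + rest_pa)
        binomial_vars.append(p * (1 - p) * (1 / split_pa + 1 / rest_pa))
    return statistics.variance(gaps) - statistics.fmean(binomial_vars)

def simulate(n_players, split_pa, rest_pa, skill_sd, rng):
    """Simulate players whose OBP in the split shifts by a N(0, skill_sd) amount."""
    players = []
    for _ in range(n_players):
        base = rng.gauss(0.340, 0.030)    # hypothetical talent spread
        shift = rng.gauss(0.0, skill_sd)  # 0.0 means the split is pure placebo
        split_ob = sum(rng.random() < base + shift for _ in range(split_pa))
        rest_ob = sum(rng.random() < base for _ in range(rest_pa))
        players.append((split_ob, split_pa, rest_ob, rest_pa))
    return players

rng = random.Random(2009)
placebo = simulate(300, 300, 1500, skill_sd=0.0, rng=rng)
print(f"placebo excess variance: {excess_variance(placebo):.6f}")
```

With real data, `players` would be built from an arbitrary split such as the first three days of each month. If the excess variance for that split is about as large as the one found for clutch, the clutch figure probably reflects something other than skill.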

Final point:  I don’t think Phil was “taking on” Andy in any personal sense, and certainly didn’t seem to be impugning Andy’s talents in any way.  He just raised disagreements with the study.  Even if you think his criticisms are all mistaken and that Andy has the stronger argument, I don’t think Phil said anything inappropriate or “out of character” here.


#8    MGL      (see all posts) 2009/08/22 (Sat) @ 16:05

"Final point:  I don’t think Phil was “taking on” Andy in any personal sense, and certainly didn’t seem to be impugning Andy’s talents in any way.  He just raised disagreements with the study.  Even if you think his criticisms are all mistaken and that Andy has the stronger argument, I don’t think Phil said anything inappropriate or “out of character” here.”

Wasn’t suggesting that he did.  Not at all.  What I sort of meant was that if you were to tell me that you are doing a critique of Joe Blow’s (who we have never heard of in a sabermetric context) study and one of Andy’s, I would bet beforehand that the chances of finding some critical errors in Joe Blow was something like 50% or greater and 10% or less with Andy.  That is all I meant.

Plus, there were two criticisms:  One, the significance issue, which I think is a non-criticism and I have many times voiced my opinion on that.  I’ll say it again:  That term (statistical significance), and the accompanying “yes/no” dichotomy, should never be uttered when analyzing sample data.  All a researcher can and should do is show his results and give you the certainty level, which Andy properly did.  If he wants to attach some qualitative words to those numbers, that’s fine by me too.

Two, he questioned the use of OBP versus some stat which doesn’t use walks or at least de-emphasizes them.  I suppose that was a legitimate criticism, but to me at least (maybe Andy and Phil are more sure of their positions, I don’t know), the jury is still out on that.

I guess in thinking about it again, what really annoyed me was the significance thing, and I have a feeling that Phil wishes that he never brought that up, although there is an 18.9% chance that I am wrong about that.  Of course we may never know, because we don’t know whether Phil would admit it or not if it were true… ;)


#9          (see all posts) 2009/08/22 (Sat) @ 22:01

mgl/8: I didn’t mean to imply a “yes/no” dichotomy on the significance level.  But the actual level is relevant, and 14% isn’t that strong.  That was all I was trying to say.  I usually mention 5% just to put the actual number in context, and also to make it obvious whether I’m phrasing it as “5%” or “95%”.  I should have made it clearer what I was doing.

My argument is:

a) we have people claiming zero;
b) some strong previous studies were also claiming zero;
c) and so, in that context, .008 with 15% significance isn’t strong enough to convince me to ignore those other studies.

More on this point in my next post ...


#10          (see all posts) 2009/08/22 (Sat) @ 22:05

Also, nothing I said was intended to be a criticism of Andy’s study ... my points related to the conclusions and what kind of arguments you could make based on the conclusions.  I’m sure Andy did things right.  Well, not 100% sure, because I haven’t seen the actual study.  But mgl is 90% sure, which sounds reasonable to me.  :)

I do wonder how Andy got a narrower confidence interval than I did ... I think he should have got a wider one, if any, because I used a constant number of PA (Andy’s average) for all my players, while Andy used some higher and some lower.  Andy’s way creates a larger variance, I think.

I think Andy reports a p=.065 significance, which is stronger evidence than my p=.14.


#11          (see all posts) 2009/08/22 (Sat) @ 22:09

I agree with Guy/7 that a comparison with another partition of the data would be more appropriate.

As Guy points out, the equation isn’t really

observed variance = binomial variance + talent variance

but actually

observed variance = binomial variance + talent variance + other variance

Guy’s method would hold “other variance” constant, so as not to confound it with the “talent variance” we’re looking for.
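Spelled out in symbols, the argument might run as follows. This is a sketch that assumes the "other variance" term is the same size in the clutch split and in the placebo split, which is the premise of Guy's suggestion rather than an established fact. The subscripts are notation introduced here, and talent variance in the placebo split is zero by construction.

```latex
\begin{align*}
\sigma^2_{\text{obs,clutch}}  &= \sigma^2_{\text{binom,clutch}} + \sigma^2_{\text{talent}} + \sigma^2_{\text{other}} \\
\sigma^2_{\text{obs,placebo}} &= \sigma^2_{\text{binom,placebo}} + \sigma^2_{\text{other}} \\
\sigma^2_{\text{talent}} &= \left(\sigma^2_{\text{obs,clutch}} - \sigma^2_{\text{binom,clutch}}\right)
                          - \left(\sigma^2_{\text{obs,placebo}} - \sigma^2_{\text{binom,placebo}}\right)
\end{align*}
```

Subtracting only the binomial term from the clutch split leaves talent variance plus other variance, so the placebo split is what separates the two.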


#12    MGL      (see all posts) 2009/08/22 (Sat) @ 22:16

Without checking, I think you said something like, “15% is not even statistically significant.” You used that term that I hate in the context in which I hate it, which is to claim that some result isn’t “strong enough” because it has not reached some magical level of significance.  There simply was no reason to “criticize” the study or Andy in that regard.  He did NOTHING wrong. In order to criticize someone or their work, they had to have done something wrong.  He would have had to overstate the certainty of his results, which he did not do, did he?

It really is a minor point, Phil.  I have no problem with you pointing out (not that it necessarily needed pointing out), in case someone missed it, the fact that the certainty level of his results was X, but you did use the magic words, which ALWAYS imply that anything below the threshold (less than 5%) means “yes” or “we are really sure” and anything above it (more than 5%) means “no” or “we’re really not sure”, which is what I object to.  No big deal though.

Which leaves one other criticism and we are not (at least I am not) sure whether it is valid or not.  That’s not a whole lot of criticism, which is why I said that you took on the wrong guy.  And I did not mean to imply that you intended to “take on” anyone.

I love your blog and I love when you critique studies which are usually fraught with problems. I am biased since this study is in a book with my name on it, so I should probably recuse myself from this discussion, but I don’t really see much to criticize so far.

Plus you (or anyone else) didn’t even address the Bayesian aspect of the certainty level of the results.  The statistical test he did and the sigma level he found assumed no priors.  Given the fact that I am pretty sure that some level of clutch skill HAS to exist, that changes the certainty level of his results.  I am not surprised that he found something and neither should anyone else.


#13          (see all posts) 2009/08/22 (Sat) @ 22:36

Yes, I said “is not statistically significant”.  I shouldn’t have said that.  I should have said “is not as statistically significant as you’d want in order to draw a strong conclusion.” I agree with you 100% on that point.

And I agree with you 90% that I shouldn’t have implied that I was criticizing the conclusion in The Book.  I was thinking of comments or posts on this blog that treated the .008 as if it were an established result.  My point absolutely WAS Bayesian: I said, explicitly in my summary, that my position was based on the fact that other studies found no effect.  If this had been the first study I had seen on the subject, I’d have gone with the .008.

You say, “I am pretty sure that some level of clutch skill HAS to exist.” I am not sure at all.  We have different priors.

Can we both agree on this:

-- the study found .008.
-- the study was well-constructed and valid (as far as I, Phil, can tell).
-- the study found statistical significance (from zero) only at the 14% level.
-- you have to combine this study with all other information to come up with a position or estimate of what’s really going on.

I think we both DO agree on this.  My argument is: where’s the third part, the evaluation of this evidence in the light of other evidence?  The Book doesn’t have that Bayesian discussion.

I have no problem if you conclude that “clutch hitting exists” based on Andy’s study and the other available evidence.  But it’s not enough to cite Andy’s study without a Bayesian argument about why you’re discounting other studies.
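To make the "different priors" point concrete, here is a minimal sketch of combining a prior belief with the study's estimate by inverse-variance weighting. It assumes both the prior and the estimate can be treated as normal, which is crude for a variance that cannot go below zero, and the prior widths are made-up placeholders rather than anyone's stated position. The function name `combine` is hypothetical.

```python
def combine(prior_mean, prior_sd, estimate, estimate_sd):
    """Normal-normal update: precision-weighted average of prior and estimate."""
    w_prior = 1.0 / prior_sd ** 2
    w_est = 1.0 / estimate_sd ** 2
    post_mean = (w_prior * prior_mean + w_est * estimate) / (w_prior + w_est)
    post_sd = (w_prior + w_est) ** -0.5
    return post_mean, post_sd

# Variance of clutch skill as quoted in comment #1.
estimate, estimate_sd = 0.000071, 0.000063

# Placeholder priors centered on "no clutch skill": one firm, one nearly flat.
firm_mean, firm_sd = combine(0.0, 0.000063, estimate, estimate_sd)
flat_mean, flat_sd = combine(0.0, 0.001, estimate, estimate_sd)

print(f"firm prior at zero: {firm_mean:.6f} +/- {firm_sd:.6f}")
print(f"nearly flat prior:  {flat_mean:.6f} +/- {flat_sd:.6f}")
```

A prior at zero that is as tight as the study's own error bar pulls the estimate halfway back to zero, while a nearly flat prior leaves it almost untouched. Two readers can accept the same study and still land in different places.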


#14          (see all posts) 2009/08/22 (Sat) @ 22:37

P.S.  I am coming around to the .008 point of view on the basis of more simulations I did (which I will post about), but that doesn’t make my original position invalid, IMO.


#15          (see all posts) 2009/08/22 (Sat) @ 22:50

And on the walks issue ... my point is that

-- conventional wisdom does suggest that any clutch walk effect would have a significant portion attributable to the pitcher.  There is such a thing as a “semi-intentional walk,” but not a “semi-intentional single.”

-- if you included intentional walks in clutch, you’d probably get a VERY significant clutch talent on the part of the guys who IBB a lot.  That would not reflect “clutch ability” the way it is normally defined.

-- similarly for “semi-intentional” walks.  That is simply NOT what people mean when they talk about clutch hitting: the ability to lay off the bad pitches and work the walk.

-- therefore, it seems like there are issues with using Andy’s result to answer the conventional clutch hitting question.

-- this does not mean that Andy’s result isn’t useful or important—just that it answers a slightly different question than was asked, and the “semi-intentional walk” issue could plausibly lead to very different answers to the two questions.

I agree with Guy that it would be interesting to see if the clutchier hitters differ in offensive skills from the chokier hitters.  Specifically, I’d look at intentional walks.


#16          (see all posts) 2009/08/22 (Sat) @ 22:58

I’ve updated the original post to fix Andy’s and mgl’s [valid] criticisms of my statements about statistical significance.


#17    MGL      (see all posts) 2009/08/22 (Sat) @ 23:26

While it probably wasn’t necessary to do that, Phil, I always have a lot of respect for people who are able to rethink their positions.  Lord knows, I have had to do that many times, and it is often not so easy…


#18          (see all posts) 2009/08/22 (Sat) @ 23:29

Well, it’s not a big deal ... what I said about significance was obviously too strong, and I should have caught it in the first place.


#19    MGL      (see all posts) 2009/08/22 (Sat) @ 23:48

I want to add one thing about the “walks” controversy.  Someone on a Cubs blog pointed to Phil’s site and his critique article and said something like, “clutch hitting may exist but certainly not clutch walking.” That is not true and is not the issue.

Unless you want to define clutch hitting as hitting only, and I see no reason to do that, of course walking is part of clutch hitting or more accurately, clutch offensive performance.  If a team has runners on first and second and is down by a run with 1 out, a walk is certainly an excellent event and should be rewarded as such.  On the other hand, with runners at second and third and 1 out in a close game, a walk is not such a great event and should not be rewarded too highly (it is certainly better than an out though).  In a tie or one run game in the 9th or later innings, a leadoff walk is also a very good event and could easily be classified as a clutch offensive event.

So the problem is not that walking should not be included in clutch offense, it is that it does not have the same value in clutch situations as in non-clutch situations since clutch situations disproportionately include RISP with bases open or situations where the pitcher does not mind walking a good batter.

So the solution to that is not to ignore them (if you ignore them, you, as Andy pointed out, may end up penalizing players who get pitched around a lot) but to give them the proper weight.  If you use OBA or even wOBA or OPS, you will overvalue walks in clutch situations.

A better solution is to use a metric, say lwts, or even wOBA which uses the correct values of the offensive events.  So, for example with RISP and first base open, an IBB has a value of something like 0 to .1 runs (I don’t really know off the top of my head) and a leadoff walk in a one run game might be .4 runs or whatever it is.
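A toy version of the idea, for illustration only: the run values below are placeholders loosely based on the rough guesses in the paragraph above, not real linear weights, and the context labels and function names are invented here.

```python
# Placeholder run values keyed by (event, context). NOT real linear weights.
RUN_VALUES = {
    ("walk", "risp_first_base_open"): 0.05,  # pitched-around walk, little value
    ("walk", "leadoff_one_run_game"): 0.40,  # leadoff walk in a close game
}

def times_on_base(events):
    """OBP-style credit: every walk counts the same."""
    return sum(1 for event, _ in events if event == "walk")

def weighted_credit(events):
    """Context-weighted credit: each event gets the run value of its situation."""
    return sum(RUN_VALUES[(event, context)] for event, context in events)

# Two hypothetical hitters, each with two clutch walks.
pitched_around = [("walk", "risp_first_base_open")] * 2
leadoff_walks = [("walk", "leadoff_one_run_game")] * 2

for name, events in [("pitched around", pitched_around), ("leadoff", leadoff_walks)]:
    print(f"{name}: on base {times_on_base(events)}, runs {weighted_credit(events):.2f}")
```

Both hitters get identical credit under an OBP-style count, but the context-weighted totals differ by a wide margin. That gap is the overvaluation of pitched-around walks described above.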

But I see no reason not to reward a player who gets a walk in a clutch situation when that walk has some value over and above an average PA which it almost always does.  Certainly part of “clutch hitting” is not swinging at bad pitches and taking a walk when either it is very valuable or when you have little choice.  I mean isn’t the opposite of clutch hitting swinging at bad pitches (say when you are nervous in clutch situations) and thus having fewer walks?

I remember when I was playing in a big playoff game in our AAA baseball stadium under the lights.  I had a 3-2 count on me late in a game and I was really nervous and I didn’t want to strike out looking.  I ended up taking a half swing on a pitch over my head and I looked ridiculous.  Had I been more “clutch” and not so nervous, I would have gotten a walk.  If walks were not included in clutch stats, I would have been penalized for my K (assuming that my PA was included in the clutch bucket), but had I not swung at that pitch, I would have received no credit.  That is not a good balance sheet…


#20    Guy      (see all posts) 2009/08/23 (Sun) @ 09:26

I agree that walks, properly valued, should be taken into account.  However, if you want to isolate “clutch” ability to draw a walk, you may first have to control for the quality of the hitter.  That is, if good hitters as a class have higher walk rates in clutch situations (relative to their non-clutch level) than other hitters, you need to control for that in determining whether some hitters have a clutch walk ability. 

**

A small point:  I think the clutch study needs to control for batter handedness.  Otherwise, LHH will tend to hit a bit better in the clutch (I would guess), because there will more often be a runner on first in clutch PAs.


#21          (see all posts) 2009/08/23 (Sun) @ 14:20

mgl/19

I think the issue goes back to Andy’s point

- I guess the issue is that Phil treats walks as a proxy for pitcher intention.  While that is undoubtedly part of the equation, he is completely ignoring the batter’s ability to work the count.  Thus, some part of the walk is indeed related to skill on the part of the batter, and discounting this is as questionable as including the part of the walk for which the pitcher is responsible. -

The walk is a skill in the offensive package; that is unquestionable. Just because baseball hasn’t yet been able to quantify semi-intentional walks doesn’t mean all walks should be discounted. However, there is probably too much game theory involved to actually quantify it properly anyway. If we were able to quantify semi-intentional walks, there could (should?) be a correlation to wOBA, i.e., better hitters get pitched around more often.

The other aspect is that drawing a walk in a “clutch” situation will only add to the run value state of the next PA. Just because one isn’t winning the game on a single doesn’t mean one isn’t clutch for one’s contribution.


#22    AED      (see all posts) 2009/08/25 (Tue) @ 15:41

A few points.

First, I’m not aware of other “strong studies” on this topic.  There have indeed been many studies that used a smaller data set or divided the sample in such a way that the resulting error bars are larger.  If my error bars were twice as large (or even 25% bigger, for that matter), they would be larger than the result.  Put differently, if two techniques to analyze the same data come out at 71+/-61 and 0+/-400, the first technique would be used in place of the second.

Second, I’m not sure how a measured clutch skill variance of .000071 +/- .000063 would imply a 6.5% false detection probability.  That’s 1.13 sigma, and there’s a 13% chance of a random draw exceeding 1.13 sigma.

Finally, as for wOBA vs. OBP, variance in wOBA is driven as much or more by extra-base hits than by walk rate.  So, it’s not correct to assume that the (very small) difference between the measurements is because of walks.  And of course, even if part of clutch differences are due to walk rates, this doesn’t prove that clutch hitting doesn’t otherwise exist for the arguments I gave in my first post.  I don’t have the Book in front of me, but wasn’t Spiezio the best clutch hitter?  If so, I’d be shocked if he were getting a bunch of semi-intentional walks.


#23          (see all posts) 2009/08/25 (Tue) @ 15:53

1.  The other studies I’m thinking of didn’t have error bars.  Agreed that your study is better in that regard, but I still think the Ruane study is “strong”.  Just that yours is stronger.  Had “The Book” mentioned that the other studies didn’t have confidence intervals, and that they were consistent with your study, I’d be satisfied.  But the Ruane study is pretty good and was the best we had until yours.

2.  Right, 13%.  Sorry.  My bad.

3.  OK.  So if you add in extra base hits, and recalibrate to the same scale (since wOBA is on the same scale as OBA), and the variance goes down, that means that walk rate and hit rate are more important for clutch than power rate, right?

Agreed that even if part of clutch differences are due to walk rates, it doesn’t prove that clutch hitting doesn’t otherwise exist.  But we’re not going to find “proof”; we’re just looking at how best to interpret the evidence.

