THE BOOK--Playing The Percentages In Baseball

Tuesday, October 06, 2009

Catcher’s strikeout rate

By Tangotiger, 04:37 PM

Jerry Crasnick:

According to Baseball-reference.com, Burnett has struck out 79 of 434 batters while pitching to Posada this season. Opponents are hitting .270 against Burnett and have a .775 OPS when Posada is his catcher.

In contrast, Burnett has struck out 77 of 288 batters while throwing to Molina. Opposing hitters have a .221 batting average and a .658 OPS against Burnett when he’s working with Molina.

Presuming that the quality of opposition is the same, we have a K rate of 18.2% with Posada and 26.7% with Molina.  Figure an average of 350 PA.  Burnett has a career K rate of 21.9%.  One standard deviation, given 21.9% as the true rate and 350 sample PA is .022 K per PA.  That puts Posada and Molina at roughly 2 SD from the mean (each going the other way, obviously).  (The real numbers are 1.9 SD for Posada, and 2.4 for Molina.)
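The standard-deviation arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it assumes each plate appearance is an independent binomial trial at Burnett's quoted career rate of 21.9%, and the function name `k_rate_z_score` is made up for illustration. With these inputs the Posada figure lands near 1.9 SD and the Molina figure near 2.0 SD; the 2.4 quoted for Molina presumably reflects a slightly different baseline or sample size.

```python
import math

CAREER_K_RATE = 0.219  # Burnett's career K rate, as quoted in the post


def k_rate_z_score(strikeouts, batters_faced, true_rate):
    """Return (observed rate, binomial SD of the rate, z-score vs. true_rate)."""
    observed = strikeouts / batters_faced
    sd = math.sqrt(true_rate * (1 - true_rate) / batters_faced)
    return observed, sd, (observed - true_rate) / sd


# SD of the K rate for the "average of 350 PA" shortcut used in the post
sd_350 = math.sqrt(CAREER_K_RATE * (1 - CAREER_K_RATE) / 350)

# Same calculation with each catcher's actual number of batters faced
posada = k_rate_z_score(79, 434, CAREER_K_RATE)
molina = k_rate_z_score(77, 288, CAREER_K_RATE)

print(f"SD at 350 PA: {sd_350:.3f}")
print(f"Posada: rate={posada[0]:.3f}, z={posada[2]:+.2f}")
print(f"Molina: rate={molina[0]:.3f}, z={molina[2]:+.2f}")
```

Using each catcher's own sample size rather than the 350 PA average changes the SD a little, which is why the per-catcher z-scores differ from the rough "2 SD" figure.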

Again, all other things equal, we see that Molina is the better catcher for Burnett.  We can presume a non-zero difference.  We CANNOT presume that the true difference is as large as the observed one.

Glove-slap: Michael


#1    Alex    2009/10/06 (Tue) @ 18:19

I think the fact that these players were (I assume) cherry-picked for the extreme performance differential makes presuming a non-zero difference problematic - you’d expect to see extreme stats a certain percentage of the time just by chance, and if you’re specifically selecting these extreme stats to test for significance, it seems to me like you’d get a certain number of false positives using this method.  I guess maybe this is such an extreme number of standard deviations apart that it outweighs this factor, I just have a minor issue with cherry-picking the extreme stats to test.

This is an interesting issue, though, and it would be interesting to see how predictable and significant these sorts of splits are.  Small sample sizes may be a problem across the board, though.


#2    Larry Mahnken    2009/10/06 (Tue) @ 18:37

Alex - they weren’t cherry picked for their extreme performance differential, as you’ll see if you click on the linked article.


#3    Alex    2009/10/06 (Tue) @ 19:38

Why were they picked, then?  I reread the article and I still don’t see evidence they were picked for anything other than the playoff lineup change (which, obviously, is a result of Burnett’s extreme performance difference between the two).  It seems to me like indirectly they were picked because of this, but I admittedly could easily be misreading something about the situation.


#4    MGL    2009/10/06 (Tue) @ 20:46

"I think the fact that these players were (I assume) cherry-picked for the extreme performance differential makes presuming a non-zero difference problematic - you’d expect to see extreme stats a certain percentage of the time just by chance, and if you’re specifically selecting these extreme stats to test for significance, it seems to me like you’d get a certain number of false positives using this method.  I guess maybe this is such an extreme number of standard deviations apart that it outweighs this factor, I just have a minor issue with cherry-picking the extreme stats to test.”

I have not read the article, but you are 100% correct.


#5    2009/10/06 (Tue) @ 20:48

Tango, thanks a lot, this is great!

I agree with Alex, they were picked (for this article, and for this playoff swap) because of their wildly differing numbers.  We’re not reading about Pettitte, Chamberlain, or Sabathia’s special catcher, so we know at least that Burnett was cherry-picked as the most extreme case on the Yankees.


#6    MGL    2009/10/06 (Tue) @ 20:53

I read the article and it is cherry picking.  Not cherry picking in the sense that someone looked at all the possible matchups and just reported the ones that had a large split.  But cherry picked in that someone (Crasnick, Girardi, whatever) looked at the Burnett/Posada/Molina matchup and if there were nothing to it, there would be no story, and if there were something to it, there would be a story, which there is.  That is also cherry picking.

Regardless of what you want to call it, showing one item among many that is several standard deviations off the norm tells us nothing unless we first show that there is some “skill” in the entire population.  In this case, if we were to find that the distribution of “catcher splits” were that which we expect by chance (with the normal 1SD, 2SD, etc. outliers), then this 2 SD split with Posada and Molina would be worthless information.

That being said, the difference between the two on offense is so large, you cannot afford to make the presumption that Burnett will pitch better with Molina behind the plate.  (Likely) Big mistake by Girardi.  Sounds like something Torre would do.


#7    2009/10/06 (Tue) @ 22:14

Cherry picking and publishing bias are very similar.  In this case, you can call it either one.

Look at it this way: Let’s assume that there is no such thing as a catcher significantly influencing a pitcher’s K rate (or anything else for that matter).  So we would expect the usual distribution of “splits” among all catcher duos (and triads) on each team.  Let’s take the last 5 years.  We might have 100 such “splits” to look at, where at least 2 catchers from each team get significant playing time with one pitcher on that team.  About 5 will have splits of 2 SD or more, like Molina and Posada.  What are the chances that a manager notices that over the last 5 years and someone writes an article about it?  That is one of the relevant questions.  You can call it cherry picking or publishing bias, or whatever you want.  The bottom line is that without looking at all the data to see if that distribution indeed looks random or not, we can infer almost nothing from the fact that someone reports that one “split” is 2 SD.
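To put rough numbers on the "100 splits" thought experiment, here is a small Python sketch. It assumes the splits are independent and that, with no catcher effect at all, each split's z-score is standard normal; the helper names are made up for illustration.

```python
import math


def two_sided_tail(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))


def expected_outliers(n_splits, z):
    """Expected number of splits at least z SD from the mean by chance alone."""
    return n_splits * two_sided_tail(z)


def prob_at_least_one(n_splits, z):
    """Chance that at least one of n independent splits is z SD or more out."""
    return 1 - (1 - two_sided_tail(z)) ** n_splits


for z in (2, 3):
    print(
        f"{z} SD: expect {expected_outliers(100, z):.2f} of 100 splits, "
        f"P(at least one) = {prob_at_least_one(100, z):.3f}"
    )
```

Under these assumptions a 2 SD split somewhere among 100 is close to a sure thing, while a 3 SD split is far less likely to turn up by chance.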

So, as Alex says, I think that the statement, “We can presume a non-zero difference.” is false.  It is perhaps true that we can presume an ever-so-slightly non-zero difference, but that is a technicality.  We can say the same thing about a hitter who in 10 PA has done better in the clutch than in the non-clutch, even if it is by .01 SD.  We can say that there is a non-zero chance that he has better clutch talent than the average player.  Again, without examining all the data, finding one instance of a 2 SD or even a 3 SD anomaly tells us nothing, since we don’t know how many samples that one instance was chosen from, even if the person doing the reporting did not look at any other samples.  That is one example of publishing bias.  It doesn’t have to come from the same person.  Maybe 100 (or 1000) people (managers, journalists, researchers) looked at these catcher splits.  Eventually someone is going to report a large one even if the entire distribution is random (there is no “true” difference among catchers in terms of their influence on pitcher performance).


#8    dan    2009/10/06 (Tue) @ 22:21

I fail to see how this is cherry-picking. Molina has been catching Burnett in place of Posada for a long time now. Molina wasn’t even in the discussion to catch any other pitcher because Posada hasn’t had problems catching any other pitcher. Burnett has complained about pitching to Posada in the past.


#9    Nick    2009/10/06 (Tue) @ 22:28

Dan - it’s cherry picking because Crasnick intentionally (at least it seems that way) picked a player that had a large split.  MGL is right that we need to know the SD of “catcher splits” before we can take any meaning from this.


#10    MGL    2009/10/06 (Tue) @ 23:28

Like anything else (clutch, etc.), we need to know the observed SD of catcher splits and compare that to the expected SD by chance.  Any difference would be the “skill” with a standard error around that of course.  If there is no significant difference between the observed and the expected variance by chance alone, then we “conclude” (with some degree of certainty - meaning that there is some chance that we are wrong - a Type II error) that there is no catcher skill, i.e., the Molina/Posada difference is likely due to chance. We cannot ever conclude that there is no apparent skill in the population but there happens to be ONE (or two) catcher tandem that has this skill.  Either there exists a skill in the population or there doesn’t.
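The observed-versus-expected variance comparison might be sketched like this in Python. Everything here is illustrative: the function names are made up, the data are simulated rather than real catcher splits, and the "expected" variance assumes pure binomial noise around a pooled K rate.

```python
import random


def variance_check(samples):
    """samples: list of (strikeouts, batters_faced) pairs, one per split.

    Returns (observed variance of the K rates, variance expected from
    binomial chance alone). Any excess of observed over expected is the
    candidate "skill" variance.
    """
    pooled = sum(k for k, _ in samples) / sum(pa for _, pa in samples)
    n = len(samples)
    observed = sum((k / pa - pooled) ** 2 for k, pa in samples) / n
    expected = sum(pooled * (1 - pooled) / pa for _, pa in samples) / n
    return observed, expected


def simulate(n_splits, pa, mean_rate, skill_sd, rng):
    """Simulate splits whose true K rates have the given spread (0 = no skill)."""
    samples = []
    for _ in range(n_splits):
        true_rate = min(max(rng.gauss(mean_rate, skill_sd), 0.0), 1.0)
        strikeouts = sum(rng.random() < true_rate for _ in range(pa))
        samples.append((strikeouts, pa))
    return samples


rng = random.Random(2009)
no_skill = variance_check(simulate(1000, 350, 0.219, 0.0, rng))
with_skill = variance_check(simulate(1000, 350, 0.219, 0.02, rng))

print(f"no skill:   observed/expected = {no_skill[0] / no_skill[1]:.2f}")
print(f"with skill: observed/expected = {with_skill[0] / with_skill[1]:.2f}")
```

A ratio near 1 is what chance alone produces; a ratio well above 1 suggests real spread in talent. With real data you would also want a standard error on that ratio before concluding anything.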

That being said (on the other hand), even if we find no skill in the population (the observed variance = the expected variance by chance), that doesn’t necessarily mean that there are not ANY tandems that have such a skill.  It just means that if there are, we probably can’t find them. 

And we’re only talking about 2 SD here folks.  If we find no skill in a population, but out of a population of a couple of hundred, we find some element that is 3 or 4 SD off, then we can talk (about the possibility of an “exception” to the “no skill” conclusion).

But if we find little or no skill in the population (and I am not saying that there is - I don’t know) and someone reports a 2 SD element in that population, that is not exactly something to write home about as we EXPECT that there are several 2 SD elements in a population of 100 or more.

Again, reporters and managers have lots of catcher tandems to choose from. It is NOT surprising that someone reports a 2 SD split if there is NO TRUE SPLIT (skill).  Not surprising at all.  Now, if we get these articles all the time over the years, then we can begin to think that maybe the distribution of these splits is NOT random.

Why is that so hard to understand?


#11    2009/10/07 (Wed) @ 00:17

hmmmmm--in this case, why do we need to assume a single, generalizable “skill”? (I presume your skill is “catcher ERA effect")

I mean, suppose some pitchers depend more than others on throwing curveballs in the dirt and some catchers are better than others at blocking said pitches.  I assume no one will argue with either of these assumptions.  Now, given that these assumptions are true, OF COURSE certain catcher-pitcher combos will be problematic--it may just be that those combos produce large enough effects to be visible w/ a smallish sample size in the case of extreme pitcher/catcher combos, which only occur, let’s say, .15*.15 of the time......


#12    MGL    2009/10/07 (Wed) @ 00:24

Nick, sure that could be true, in which case, you would want to investigate the issue on a more granular level.


#13    Paul Scott    2009/10/07 (Wed) @ 05:37

It is not quite as bad as replacing Posada’s bat with Molina’s.  Posada will DH, so it is replacing Matsui with Molina.  Still a significant drop and still clearly a mistake, but that is somewhat compensated by gaining a platoon effect with Brian Duensing announced as the starter for Game 1.

Also, forgetting the likely wrong-minded assumption that Molina makes Burnett better, Molina is a better defensive catcher - at least as it relates to stopping steals.  Both Posada and Molina have a 28% caught-stealing rate this year, but Molina over his career has been at 40%, while Posada has been at 28%.

Again, the difference in offense between Matsui and Molina (even giving Molina the platoon effect against an LHP) is still huge and this is certainly a mistake - but not as big a one as is being portrayed.


#14    2009/10/07 (Wed) @ 08:46

There are 8 teams in the playoffs, with, say, 3 starters each for the first round.  That’s 24 pitchers who probably have thrown some percentage of their pitches to their team’s main catcher and some much smaller percentage to their team’s backup.

According to Tango’s math, Burnett is about 2 standard deviations away from expectation with his catchers, which I believe translates to mean that about 5% of the time, numbers this wild will simply be the result of random variance and chance.

But, we have 20+ pitchers in the sample.  So one of them (one pitcher being ~5% of the group) is likely, just by random luck, to have statistics that are 2 SD from the mean.  In this case, that could be Burnett, and so Crasnick wrote about him, rather than the other 23 pitchers who presumably have less than a 2 SD difference in strikeout rate between their catchers.
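The "one of 24 pitchers" estimate can be checked with a binomial calculation. This sketch takes the 5% chance of a 2 SD split and the 24 playoff starters from the comment above, and assumes the pitchers are independent of one another; `prob_k_outliers` is a made-up helper name.

```python
import math


def prob_k_outliers(n_pitchers, k, p_outlier):
    """Binomial probability that exactly k of n pitchers show an outlier split."""
    return (
        math.comb(n_pitchers, k)
        * p_outlier ** k
        * (1 - p_outlier) ** (n_pitchers - k)
    )


N_PITCHERS = 24    # 8 playoff teams x 3 starters, per the comment
P_OUTLIER = 0.05   # rough chance of a 2 SD split with no true catcher effect

p_none = prob_k_outliers(N_PITCHERS, 0, P_OUTLIER)
p_one = prob_k_outliers(N_PITCHERS, 1, P_OUTLIER)
p_two_plus = 1 - p_none - p_one

print(f"expected outliers: {N_PITCHERS * P_OUTLIER:.1f}")
print(f"P(none) = {p_none:.2f}, P(exactly one) = {p_one:.2f}, "
      f"P(two or more) = {p_two_plus:.2f}")
```

Under these assumptions, seeing at least one such pitcher in a given postseason is more likely than not.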

But, as dan mentioned, Burnett has complained about Posada in the past, which is something pitchers rarely do regarding their team’s all-star catcher.  So given that Burnett has complained in the past, maybe it’s not so cherry-picked to examine his stats and write about him.


#15    Tangotiger    2009/10/07 (Wed) @ 09:22

What is the difference in hitting between Molina and Posada/Matsui?  (Aside: I took a wild stab, and figured the wOBA as .270 and .370.  The career for Molina is .269, and for Posada is .370.  I love wOBA!)

Anyway, if you presume the starter is going to pitch 6-7 innings, that gives your main catcher 3 PA, or a gap of 0.26 runs.  That’s on offense.  Defensively, say that Molina cuts that down .11 runs, so that the gap is now down to 0.15 runs.

Is it possible that Burnett “believes”, at least insofar as the next game is concerned, that his ERA will be affected by 0.15/6.5*9 or around 0.20 runs?  That is, because he’s been vocal, because it’s now a story, that it’s going to get into his head for the next game?

Is it too much to say that the benefit of the doubt (our margin for error) is around there?
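For anyone who wants to retrace the run arithmetic, here is a short Python sketch. The wOBA-to-runs conversion (dividing the wOBA gap by a scale of about 1.15 to get runs per PA) is an assumption on my part about how the 0.26 figure was reached; the .11 defensive adjustment, the 3 PA and the 6.5 innings come from the comment itself.

```python
WOBA_SCALE = 1.15  # assumed wOBA-to-runs scale; not stated in the comment


def offensive_gap(woba_better, woba_worse, plate_appearances,
                  woba_scale=WOBA_SCALE):
    """Runs lost on offense over the given number of plate appearances."""
    return (woba_better - woba_worse) / woba_scale * plate_appearances


offense_runs = offensive_gap(0.370, 0.270, 3)   # Posada/Matsui vs. Molina, 3 PA
defense_runs = 0.11                             # Molina's defensive edge, per the comment
net_runs = offense_runs - defense_runs          # net cost of starting Molina

innings = 6.5                                   # midpoint of "6-7 innings"
era_equivalent = net_runs / innings * 9         # break-even effect on Burnett's ERA

print(f"offense gap:    {offense_runs:.2f} runs")
print(f"net gap:        {net_runs:.2f} runs")
print(f"ERA equivalent: {era_equivalent:.2f}")
```

In other words, Burnett would need to be roughly a fifth of a run per nine innings better with Molina for the swap to break even.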

***

Yes, it is cherry-picking if we take the most extreme example.  It’s not clear to me that it is the most extreme example.


#16    2009/10/07 (Wed) @ 09:30

Paul,

I don’t think Posada will DH.  Matsui will, and Posada will ride pine until the ~6th inning, when he will pinch hit for Molina.

I think this is stupid.  Burnett is an inconsistent pitcher who has thrown both great and terrible games with each catcher behind the plate.  I have a much higher degree of certainty that Posada’s bat > Molina’s bat than I do that Molina’s work-with-Burnettness > Posada’s.  Plus, to my eyes Molina’s glovework has slipped. 

Obviously Girardi, the former weak-hitting backup catcher who was given playoff starts over Posada, disagrees.  Grrr.


#17    Guy    2009/10/07 (Wed) @ 09:52

Tango:  Have you ever used WOWY to look at catcher ERA (or catcher Ks)?  Seems like it might be an interesting exercise.  Of course, that would measure whether a catcher had an ERA skill across all pitchers.  It can’t answer whether specific pitcher-catcher combos are unusually good or bad.  But if it turned out there were little/no talent in general, that should probably make us slower to believe in the specific claims.


#18    2009/10/07 (Wed) @ 10:16

I’d love to see someone investigate this in-depth.  I think it’s an extremely important part of the game that I haven’t seen too much on.  The specific pitcher-catcher interactions are what I’m interested in.  As Guy says, taking the ERA across all pitchers may be misleading as to when specific pitcher-catcher matchups would be advantageous. 

My interest lies in personal experience pitching to different catchers.  I remember specific catchers that I hated to throw to, but my memory is obviously biased and the sample sizes are small.  Pitchers are quirky and having confidence even in how a catcher frames/catches a ball can affect the pitcher’s head.  Whether that truly translates to performance is up for debate.  Also, finding a difference at such a high level of play may be impossible (there are plenty more bad catchers in college than in MLB, so the variance may be negligible).

I think MGL and Tango are both correct here.  Given that Burnett complains, there is reason to investigate the single relationship between the battery whether ‘cherry picking’ or not, and there is a difference.  Whether that difference would be persistent is the question.  That’s where MGL’s argument comes in, I believe.  But the fact that there is currently a difference in past performance in this small sample size really could have effects on Burnett’s mechanics and confidence throwing certain pitches.  I’d love to see someone with more access/ability for this kind of data mining than I to try and see what they could find.


#19    Tangotiger    2009/10/07 (Wed) @ 10:19

"Plus, to my eyes Molina’s glovework has slipped.”

I say this all the time, but I’ll repeat it anyway.  Any single person’s observation, be it Rob here, or me, or MGL, or anyone else in the world, is virtually worthless.  One observation means nothing.  Nothing at all.  At the same time, collect a handful of worthless observations (even as few as 5, but preferably at least 15) and you get valuable observations.

http://tangotiger.net/scout/index6.php?prim_fld_cd=2

47 Yankee fans think that Jose Molina is the 5th best fielding catcher in baseball.  And 54 Yanks fans think that Posada is a slightly below average fielding catcher.  Those fans therefore see, say, a 15 run gap between the two, per full season (or .10 runs per game).

***

Guy: the reason I have resisted is that I need to control for age.  For BABIP, I didn’t have much worry, because a pitcher’s BABIP rate doesn’t change much due to age.  But, a pitcher’s K and BB rate do change much.  Not to mention that I don’t know that I need to count the HR/PA or BABIP against the catcher either.  Yes, given a large enough sample, it’ll balance out, but I don’t know that I can get that high.

Plus, I know I can control the running game easily enough, so I didn’t want to include that in the ERA part.  Finally, ERA (or RA) is itself subject to sequencing, and I don’t know if I need to include that.

All to say that yes, I’ve thought about it (a lot), but I’ve resisted until I can actually do it in a more intelligent fashion than the rest of the WOWY stuff I’ve done.


#20    Tangotiger    2009/10/07 (Wed) @ 10:21

Millsy:

You might not be aware of this:
http://tangotiger.net/catchers.html

I expanded that study in the 2008 Hardball Times Annual, to cover all the Retro years, through 2006.


#21    2009/10/07 (Wed) @ 10:41

Thanks, Tango.  Really interesting stuff.  Now that I read it, I think I had glanced over that a while back, but totally forgot.

Using Wild Pitches and Passed Balls is one way of evaluating the pitcher-catcher interaction.  It actually surprises me that there are stark differences in such rare events.  Did these apparent differences in confidence in the catcher lead to further performance deficiencies (HR allowed, runs allowed, walks, etc.)?

In my experience these rare events aren’t the entire story, but they may reveal that there is a story.  Sometimes being on the ‘same page’ as the catcher, or, as you state, ‘framing’ can be helpful as a pitcher.  Unfortunately, I’m not sure how to really define those or if they’re truly relevant.  Oftentimes, these are the things we actually can’t put our finger on in a statistical sense, and we have to make assumptions using the metrics that people like you have developed.  I’m worried that it gets into the realm of testing for ‘clutch’ ability.

That’s why I think it’s at least reasonable to compare Posada and Molina paired with Burnett: he claims there’s something askew.  Whether it’s random and truly affecting performance or not is up for debate--but whether it seems to be affecting Burnett’s confidence is not.


#22    Peter Jensen    2009/10/07 (Wed) @ 11:01

Millsy - For another study that may be more on point to your proposed “undefined” catcher skills read Tom Hanrahan’s study of rookie catchers at http://www.philbirnbaum.com/btn2004-11.pdf.


#23    2009/10/07 (Wed) @ 11:56

Peter/22,

Thanks!  Really really cool stuff.  I wonder if someone could extend that with Pitch F/X data in terms of location and change in hitting spots being the problem with the pairings, or if it’s something more non-quantifiable such as ‘feel’.

