
THE BOOK--Playing The Percentages In Baseball


Monday, October 06, 2008

Power or Finesse Pitchers in the Post-season?

By Tangotiger, 12:54 PM

Bill James, in what will surely be an article in the next Gold Mine, looks at whether power or finesse pitchers perform better in the post-season. He does one of his typically enjoyable matched-pair studies, selecting 100 power pitchers and 100 finesse pitchers who match in a variety of ways, except in K and BB. They match up quite well in the categories he selected. He also notes:

But the power pitchers had averaged 183 strikeouts, 76 walks; the finesse pitchers had averaged 107 strikeouts, 57 walks.  The two groups were nearly even in terms of home runs allowed (a few more for the power pitchers), but the finesse pitchers had given up, on average, 18 more hits.  18 more hits, 19 less walks, one less homer. . .the same results overall.

As you guys know, I’m big on simply doing K minus BB, per PA. And just looking at that passage, you can see why I think the two groups are biased. I responded:


Very enjoyable study.

If you look at the BABIP (batting average on balls in play, or H minus HR divided by PA minus BB, K, HBP, and HR), I think you will find that the finesse pitchers ended up with a BABIP 10 or 12 points better; that is, they were probably a bit luckier than the power pitchers that year. So, I think the study is biased: while the component ERA may come out equal for the two groups, the component ERA of the power pitchers is more indicative of their true talent.

I estimate the 10-12 point difference in BABIP to be worth roughly 0.20-0.30 in ERA, which would match the post-season difference almost exactly.
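A minimal Python sketch of the two calculations in this reply: BABIP using exactly the definition given above (other published versions differ slightly in which events they exclude), and a back-of-the-envelope conversion of a BABIP gap into runs per nine innings. The conversion constants, about 27 balls in play per nine innings and about 0.78 runs for a hit in place of an out, are rough assumptions of mine, not figures from the study, and the result is in runs rather than earned runs.

```python
def babip(h, hr, pa, bb, k, hbp):
    """BABIP as defined above: (H - HR) / (PA - BB - K - HBP - HR)."""
    balls_in_play = pa - bb - k - hbp - hr
    return (h - hr) / balls_in_play


# Assumed constants (rough, illustrative only; not from the study).
BIP_PER_9 = 27.0         # balls in play per nine innings
RUNS_HIT_VS_OUT = 0.78   # run value of a BIP hit relative to a BIP out


def babip_gap_to_runs_per9(gap, bip_per9=BIP_PER_9, run_value=RUNS_HIT_VS_OUT):
    """Extra hits per nine innings from a BABIP gap, times the hit-vs-out run value."""
    return gap * bip_per9 * run_value


# Hypothetical season line, just to exercise the formula.
print(round(babip(h=200, hr=20, pa=900, bb=60, k=150, hbp=5), 3))
for gap in (0.010, 0.012):  # the 10-12 point gap discussed above
    print(gap, round(babip_gap_to_runs_per9(gap), 2))
```

With these assumptions a 10-12 point gap works out to roughly 0.21-0.25 runs per nine innings, inside the 0.20-0.30 range estimated above.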

***

We can even try to estimate FIP, and I get a 38-point (0.38 runs per nine innings) difference in favor of the K pitchers. So, I don’t think that we really have matched pairs of pitchers here. The idea behind matched pairs is that you match on everything except the thing you are studying, so that the two groups are not biased. But I think Bill does have a biased group of pitchers. The FIPs aren’t close to matching, the BABIPs don’t match, and it is a pitcher’s FIP, not his ERA, that is more indicative of his future performance. His BABIP is the least indicative component of all, yet it makes up a substantial part of ERA, one of the indicators James matched on.

In any case, I really enjoyed the study, and it would become an ideal study simply by introducing one extra parameter (FIP or BABIP) into the matching.
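A rough Python sketch of the K-BB and FIP comparison, using only the group averages quoted from James above. It uses the standard FIP weights (13 per HR, 3 per BB+HBP, 2 per K, divided by IP); the league constant cancels when two groups are differenced. The excerpt gives no innings pitched, HBP, or PA totals, so the sketch assumes, hypothetically, equal HBP, a one-homer gap, and 200 IP for both groups. It therefore shows only the size and direction of the gap, not the exact 38-point figure.

```python
# Group averages quoted from James's study.
POWER = {"K": 183, "BB": 76}
FINESSE = {"K": 107, "BB": 57}


def k_minus_bb(line, pa=None):
    """K - BB, or (K - BB) / PA when a PA total is supplied."""
    diff = line["K"] - line["BB"]
    return diff / pa if pa is not None else diff


def fip_gap(d_k, d_bb, d_hr, ip, d_hbp=0):
    """Difference in FIP between two groups with the same (assumed) IP.

    FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant, so the constant
    cancels and only the component differences matter.
    """
    return (13 * d_hr + 3 * (d_bb + d_hbp) - 2 * d_k) / ip


print(k_minus_bb(POWER), k_minus_bb(FINESSE))
gap = fip_gap(
    d_k=POWER["K"] - FINESSE["K"],
    d_bb=POWER["BB"] - FINESSE["BB"],
    d_hr=1,   # "one less homer" for the finesse group
    ip=200,   # hypothetical IP for both groups
)
print(round(gap, 2))  # negative means the power pitchers have the lower (better) FIP
```

Under these assumptions the raw K-BB differentials are 107 and 50, and the FIP gap comes out at about 0.41 in favor of the power pitchers, the same ballpark as the 38 points above.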

#1    MGL      2008/10/06 (Mon) @ 13:55

If the study is what I think it is from your description (take two matched groups of pitchers, each with the same regular-season ERA but different K/BB ratios or differentials, and then see which group does better in the post-season), then you are being way too kind and solicitous to Mr. James.

DIPS has been around for many years now. Any sabermetrician worth his weight in salt knows that if you take two groups of pitchers with the same ERA, but one group has a higher BABIP (which, with equal ERAs, usually goes along with a higher K rate), then the group with the higher BABIP will see its ERA “lowered” in ANY OTHER TIME PERIOD, past or future (outside of the original sample period), whether it be the post-season, the regular season, or the Fourth of July.

The reason, as you say, is that the two groups are biased.

So if James concluded or even suggested that “power pitchers are more suited to the post-season than finesse pitchers” (or, even if he does not say that, lets the data suggest it), then he is being irresponsible (to say the least) as a result of a poorly constructed study, and he should be called out on it in no uncertain terms.

The next thing we know, we will be hearing for the next 10 years that power pitchers are more suited to the post-season (what BP also said) because that’s what Bill James said.

I am frankly appalled that Bill would have such an obvious flaw in his “study.”

I am holding this post in abeyance until I read the study, lest I end up putting my foot in my mouth.

OK, I read the article and it is as Tango describes and what I figured.

Bad study!

BTW, Bill says it took him 35 hours to do the study. Either he needs to hire a programmer or a new one!  That should take about 1-2 hours.  Seriously.


#2    Tangotiger      2008/10/06 (Mon) @ 14:44

I’ve done studies that have taken me quite a bit of time, and after the dust settles, I wonder where 90% of my time went. It helps me tremendously that my career is based in programming, so I can’t fault anyone for not being well-versed there.

Bill James does everything in Excel, by his own admission a while ago. It’s a g-dd-mn bear to do a “most similar” pitcher study in Excel; you basically end up doing it one player at a time. In a programming language where you can do loops, you can set this up in an hour. In a database system, this would take just minutes.
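A minimal Python sketch of the loop-based “most similar” search described here. The distance measure (Euclidean distance after scaling each stat by its standard deviation) and the greedy, order-dependent assignment are my own choices, not James’s method; an optimal one-to-one assignment would need something like the Hungarian algorithm. The pitcher data are hypothetical. Passing FIP as an extra key shows the fix suggested in the post: the same power pitcher gets a different partner once FIP is part of the match.

```python
import math
from statistics import pstdev


def match_pairs(power, finesse, keys):
    """Greedy "most similar" matching.

    Each power pitcher is paired with the unused finesse pitcher whose stats
    in `keys` are closest in Euclidean distance, after scaling every stat by
    its standard deviation across both groups.
    """
    scale = {}
    for k in keys:
        sd = pstdev([p[k] for p in power + finesse])
        scale[k] = sd if sd > 0 else 1.0

    def vector(pitcher):
        return [pitcher[k] / scale[k] for k in keys]

    unused = list(finesse)
    pairs = []
    for p in power:
        if not unused:
            break
        target = vector(p)
        best = min(unused, key=lambda f: math.dist(target, vector(f)))
        unused.remove(best)
        pairs.append((p["name"], best["name"]))
    return pairs


# Hypothetical pitchers, for illustration only.
power = [{"name": "P1", "era": 3.50, "fip": 3.40}]
finesse = [
    {"name": "F1", "era": 3.50, "fip": 4.10},
    {"name": "F2", "era": 3.60, "fip": 3.45},
    {"name": "F3", "era": 4.50, "fip": 4.50},
]
print(match_pairs(power, finesse, ["era"]))
print(match_pairs(power, finesse, ["era", "fip"]))
```

Matched on ERA alone, P1 pairs with F1 (identical ERA, much worse FIP); adding FIP moves the match to F2.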


#3          2008/10/06 (Mon) @ 16:30

Did James respond to Tango’s criticism?


#4    Tangotiger      2008/10/06 (Mon) @ 17:08

No, or at least, not yet.

However, Bill generally doesn’t respond anywhere near as often as I respond here.  It’s very hard to get any kind of dialogue going with him on his site.


#5    john      2008/10/06 (Mon) @ 17:50

I think I remember reading somewhere that power pitchers are more successful in the playoffs. The theory is that finesse pitchers rely a lot on the overaggressiveness of weak offensive teams swinging at bad pitches. Obviously, once you hit the playoffs you’re facing mostly good offensive teams. I could have sworn I read that in BP’s Baseball Between the Numbers, but I could be wrong.


#6    MGL      2008/10/06 (Mon) @ 18:37

John, easy enough to test.  Just look at power pitchers and finesse pitchers versus various types of batters - good ones, bad ones, power, etc.

Honestly, I just don’t think you are going to find any subset of pitchers or batters that defy a regular odds ratio prediction.

Unless, of course, you get some really small samples and do some “curve fitting.”
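The “regular odds ratio prediction” here presumably refers to the odds ratio method for batter/pitcher matchups. A minimal Python sketch, assuming that standard form; the rates in the example are hypothetical. A test of the power/finesse idea would compare observed matchup rates, batter group by batter group, with these predictions.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio_expected(batter, pitcher, league):
    """Expected event rate for a batter/pitcher matchup (odds ratio method).

    Matchup odds = batter odds * pitcher odds / league odds,
    converted back to a probability.
    """
    o = odds(batter) * odds(pitcher) / odds(league)
    return o / (1 + o)


# Hypothetical strikeout rates: .250 batter vs .250 pitcher in a .200 league.
print(round(odds_ratio_expected(0.250, 0.250, 0.200), 3))
```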


#7          2008/10/06 (Mon) @ 20:35

Are you positive that finesse and power pitchers have precisely the same true BABIP?

It would not surprise me if finesse pitchers have a “true” BABIP 10-12 points lower than power pitchers. I have a theory that the slower the pitch, the easier any resulting BIP is to field. It holds as an explanation for why lefties have a slightly better BABIP than righties (as a group, they pitch slower), and why knuckleballers have a substantially lower BABIP than normal pitchers.

Or, heck, maybe being a finesse pitcher is kind of crappy, unless you have some particular knack for getting people to hit very field-able balls. So the only finesse pitchers who make it to the majors are the guys who have that knack.


#8          2008/10/06 (Mon) @ 23:10

Baseball Reference tracks overall hitting stats versus power and finesse pitchers, so I’m sure someone could get the data from Sean if they wanted to study it. You would have to use his definition of power and finesse, I think, but it (the definition) seems pretty reasonable.


#9          2008/10/07 (Tue) @ 01:00

Well, per bb-ref (splits vs. power, average, and finesse pitchers):

Year    Power   Avg     Finesse
2008    .297    .300    .302
2007    .291    .305    .309
2006    .292    .302    .307
2005    .287    .297    .296
2004    .288    .299    .301
2003    .287    .294    .299
2002    .292    .293    .294
2001    .290    .298    .297
2000    .291    .301    .306
1999    .299    .301    .305
1998    .297    .300    .300
Total   .292    .299    .301

Definition:
Power pitchers strike out or walk more than 28% of batters faced; finesse pitchers strike out or walk less than 24% of batters faced. Stats are based on the season for which the split is computed, plus the three years before and after it (when available).
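A small Python sketch of that definition as quoted. The thresholds and the pooling window come from the definition above; the function names and season data are hypothetical, and the definition does not say how a rate of exactly 28% or 24% is handled, so the strict inequalities here are an assumption.

```python
def pooled_totals(seasons, year, window=3):
    """Sum K, BB and BF for `year` and up to `window` seasons on either side."""
    k = bb = bf = 0
    for s in seasons:
        if abs(s["year"] - year) <= window:
            k += s["k"]
            bb += s["bb"]
            bf += s["bf"]
    return k, bb, bf


def classify(k, bb, bf, power_cut=0.28, finesse_cut=0.24):
    """Label a pitcher by (K + BB) / BF, using the thresholds quoted above."""
    rate = (k + bb) / bf
    if rate > power_cut:
        return "power"
    if rate < finesse_cut:
        return "finesse"
    return "average"


# Hypothetical career; for 2005 only seasons from 2002-2008 are pooled.
seasons = [
    {"year": 2001, "k": 250, "bb": 90, "bf": 900},
    {"year": 2004, "k": 120, "bb": 50, "bf": 850},
    {"year": 2005, "k": 110, "bb": 45, "bf": 800},
    {"year": 2007, "k": 100, "bb": 40, "bf": 780},
]
print(classify(*pooled_totals(seasons, 2005)))
```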


#10          2008/10/07 (Tue) @ 02:09

Looking at Fangraphs 05-08 qualified pitchers, I get a correlation of .08 between fastball velocity (or average pitch velocity) and BABIP. That’s 339 pitchers at an average of 198 IP (they don’t have BFP). I get .20 if I look at “last 3 years”, but that’s almost all Wakefield’s doing, and the sample is really small.


#11          2008/10/07 (Tue) @ 02:20

339 pitcher seasons


#12          2008/10/07 (Tue) @ 02:22

Never mind, I just realized my data scrape off of Fangraphs is bad.


#13    MGL      2008/10/07 (Tue) @ 04:40

Chris, if those are BABIPs from B-R, you would expect finesse pitchers to have a higher BABIP due to a higher G/F ratio (which I assume they have).

If you want to see which group has a higher true BABIP, you would need to adjust for G/F ratio.
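One simple way to make that adjustment, sketched in Python under stated assumptions: compute the BABIP a pitcher’s batted-ball mix would imply, and compare his observed BABIP with that instead of with the league average. The per-type BABIP values below are placeholders, not measured figures; in practice they would come from league data for the same seasons, and the mixes are hypothetical.

```python
# Placeholder BABIP by batted-ball type (assumed values for illustration;
# fly balls here exclude home runs, matching the BABIP definition).
TYPE_BABIP = {"gb": 0.24, "ld": 0.70, "fb": 0.14}


def expected_babip(mix, rates=TYPE_BABIP):
    """Weighted BABIP implied by a pitcher's mix of balls in play."""
    total = sum(mix.values())
    return sum(count / total * rates[kind] for kind, count in mix.items())


def babip_above_expected(observed, mix, rates=TYPE_BABIP):
    """Observed BABIP minus the BABIP expected from the batted-ball mix."""
    return observed - expected_babip(mix, rates)


groundballer = {"gb": 50, "ld": 20, "fb": 30}
flyballer = {"gb": 35, "ld": 20, "fb": 45}
print(round(expected_babip(groundballer), 3), round(expected_babip(flyballer), 3))
print(round(babip_above_expected(0.300, groundballer), 3))
```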

And the point of the critique of the James study is not necessarily that the power and finesse pitchers have the same BABIP.  It doesn’t matter if they do or if they don’t.

The point is that if you match up power and finesse pitchers by ERA and HR rates, you will necessarily have a biased matched sample, in that one group will have gotten lucky and the other group unlucky.  That is guaranteed. This is the essence of DIPS.
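A small Monte Carlo sketch in Python of this point, under a deliberately simple toy model in which every parameter is hypothetical: all pitchers share the same true BABIP, power pitchers have a better true K-BB rate, and ERA is a linear function of the two. Pitchers are matched on season-one ERA only, and the same pairs are then checked in a fresh season. There is no post-season in the model at all, yet the matched finesse pitchers show a lower season-one BABIP (they were the lucky ones) and a worse season-two ERA.

```python
import random
from collections import defaultdict
from statistics import mean

# Toy model: every parameter below is a hypothetical choice for illustration.
LEAGUE_ERA = 4.50
LEAGUE_BABIP = 0.300      # true BABIP, identical for every pitcher
LEAGUE_KBB = 0.08         # league (K - BB) / PA
BABIP_WEIGHT = 10.0       # ERA change per 1.000 of BABIP
KBB_WEIGHT = 4.0          # ERA change per 1.000 of (K - BB) / PA
BABIP_LUCK_SD = 0.015     # season-level BABIP noise (pure luck)


def season(kbb, rng):
    """One season's ERA and BABIP: K-BB is pure skill, BABIP is pure luck."""
    babip = LEAGUE_BABIP + rng.gauss(0, BABIP_LUCK_SD)
    era = (LEAGUE_ERA
           + BABIP_WEIGHT * (babip - LEAGUE_BABIP)
           - KBB_WEIGHT * (kbb - LEAGUE_KBB))
    return era, babip


def simulate(n=5000, power_kbb=0.12, finesse_kbb=0.05, bin_width=0.05, seed=1):
    """Match power and finesse pitchers on season-1 ERA, then look at season 2."""
    rng = random.Random(seed)
    bins = {"power": defaultdict(list), "finesse": defaultdict(list)}
    for group, kbb in (("power", power_kbb), ("finesse", finesse_kbb)):
        for _ in range(n):
            era1, babip1 = season(kbb, rng)
            era2, _ = season(kbb, rng)  # out-of-sample season
            bins[group][round(era1 / bin_width)].append((era1, babip1, era2))

    # Pair pitchers whose season-1 ERAs fall in the same narrow bin.
    pairs = []
    for key, power_list in bins["power"].items():
        pairs.extend(zip(power_list, bins["finesse"].get(key, [])))

    def avg(side, field):
        return mean(pair[side][field] for pair in pairs)

    return {
        "pairs": len(pairs),
        "era1": (avg(0, 0), avg(1, 0)),     # (power, finesse)
        "babip1": (avg(0, 1), avg(1, 1)),
        "era2": (avg(0, 2), avg(1, 2)),
    }


result = simulate()
print("matched pairs:", result["pairs"])
for stat in ("era1", "babip1", "era2"):
    power, finesse = result[stat]
    print(f"{stat}: power {power:.3f}  finesse {finesse:.3f}")
```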


#14    Tangotiger      2008/10/07 (Tue) @ 15:32

Bill hasn’t replied, but I replied to one of his readers:

What we see here is that there are TWO uncontrolled for parameters: (1) one is the enormous K and BB difference (representing power/finesse), and (2) the other is the difference in BABIP. And what do we find in the out-of-sample results? That there is a difference of 0.25 runs. (And our supposition is that if Bill were to show the results of the following regular season, we’d also find a 0.25 difference.) However, the reason is NOT because of the power/finesse variable. Because there are two uncontrolled for variables, we cannot tell why there is a difference. And by NOT acknowledging the BABIP parameter as existing, the results are *biased*.


#15    Tangotiger      2008/10/07 (Tue) @ 16:18

In response to:

If Bill were studying ‘what type of pitchers do well’, it would be ‘biased’. He was instead answering a discrete question: Do power pitchers indeed do better than finesse ones? A touch more on this semantic question later. Right now I have to get back to work!

I said:

He is specifically answering this question: “Do power pitchers perform better or worse compared to finesse pitchers who have otherwise performed equally?” But the reason that they performed otherwise equally is that the finesse pitchers managed to post a better BABIP. They compensated for the lower K-BB differential by posting a better BABIP. And, in the out-of-sample results, our expectation is that their ERA will increase, because the BABIP was not controlled for. The determinant is not the power/finesse categorization. Indeed, that the finesse pitchers posted a smaller differential already tells me that they are not otherwise equal. Groups of pitchers with a smaller K-BB differential are expected to post worse out-of-sample ERAs than groups with a higher K-BB differential. If the question is nuanced, then the results must be nuanced. And if it’s that nuanced, who is going to understand it?


#16    MGL      2008/10/07 (Tue) @ 19:23

Yes, if you couch the question as, “Given that two groups of pitchers performed equally, with performance defined by me, which group, if any, performs better in the post-season?”, then yes, Bill’s answer is correct: the power pitchers.

The reason is what is in question. If you bias two groups such that one group is necessarily lucky and the other is necessarily unlucky, then it is tautological (and completely uninteresting) that the unlucky group will perform better than the lucky group in any out-of-sample time period.

There is another simple way to look at this which I will post on the BJ site:

We know that teams in the playoffs have pitchers AND hitters who performed better than average in the regular season. Therefore, by definition (anyone who does this kind of research, or reads and understands it, knows this next point), the pitchers and hitters in the post-season are better-than-average players AND got a little lucky in the regular season.

We also know for a fact that they will regress towards their true means in any out-of-sample period, including the post-season (or next regular season or the season before, or whenever).

Now, we know that power pitchers have better true talent than finesse pitchers.  So they will regress less than the finesse pitchers!

IOW, let’s say that the mean ERA for power pitchers is 4.00 and for finesse pitchers, it is 5.00 (I am exaggerating a little of course), in a league where the average ERA is 4.50.

Now, let’s say that all pitchers who get into the post-season are around 4.00 (better than average).  All of them will regress in the post-season simply because of regression to the mean (after adjusting for the quality of the hitters they face in the post-season).

But the power pitchers will regress towards 4.00 and the finesse pitchers towards 5.00, so that in the post-season or any other out-of-sample time period, the finesse pitchers that were 4.00 in the regular season will be around 4.25 in the post-season (if they regress 25% of the way toward their mean), and the power pitchers will still be 4.00 in the post-season (4.00 in the regular season and regressing towards 4.00).

So whether you focus on BABIP or not, the power pitchers will regress towards a lower mean than the finesse pitchers. That, by the way, is the problem with using a “matched pair” method when the two groups have unequal means (in true talent). In ANY out-of-sample time period, the group with the lower mean true ERA (in this case, the power pitchers) will ALWAYS be expected to have a lower ERA!

That is true even if you looked at average pitchers in the regular season but still used matched pairs. For example, let’s say that in the regular season, all of these post-season pitchers performed at league average, or 4.50. Well, the finesse pitchers would regress towards 5.00 and the power pitchers towards 4.00, so in any out-of-sample period the power pitchers will do better within those matched pairs.
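The regression arithmetic in these examples can be written out directly. A minimal Python sketch; the regression fraction r is an assumption, since no value for it is estimated here: r = 0.25 gives the 4.25 figure for the finesse group, r = 0.5 would give 4.50, and the league-average case uses the same function.

```python
def regress(observed, true_mean, fraction):
    """Move an observed ERA `fraction` of the way toward its group's true mean."""
    return observed + fraction * (true_mean - observed)


POWER_MEAN, FINESSE_MEAN = 4.00, 5.00   # the (exaggerated) group means above

for observed, fraction in ((4.00, 0.25), (4.00, 0.50), (4.50, 0.50)):
    power = regress(observed, POWER_MEAN, fraction)
    finesse = regress(observed, FINESSE_MEAN, fraction)
    print(f"matched at {observed:.2f}, r={fraction}: "
          f"power {power:.2f}, finesse {finesse:.2f}")
```

Whatever r is chosen, pitchers matched at the same observed ERA end up with the power group ahead, because only the group means differ.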

You CANNOT do matched pairs studies like he did when the two groups have different mean true ERAs and then look at what happens to their ERAs from one time period (in which you matched the players) to another time period.

So technically Bill is right that the power pitchers performed better, but it HAS nothing to do with the post-season or the fact that both groups are facing above-average batters, which was the whole point of his study, was it not? To see whether one group or the other has an “advantage” in the post-season?

While the answer to that question is probably “no” it could be “yes” but you cannot tell from his study.  Not at all.


#17    MGL      2008/10/07 (Tue) @ 19:41

There is no argument or controversy here.  Bill blew the study and Tango and I were kind enough to point it out.  I would hope that if I blow a study (or a study can be improved) that someone would be kind enough to point that out to me.

Yes, technically, “For finesse and power pitchers who performed the same in the regular season, the power pitchers will perform better than the finesse pitchers in the post-season (and in ANY other out-of-sample time period).” But CLEARLY he was attempting to answer another question, and thought that he had, which was, “Do power or finesse pitchers have an advantage in the post-season, given the same true talent?”

If I have two groups of players and I set it up so that one group got unlucky and another got lucky in a certain time period, do I need to do a study to see what will happen in any other time period?  I don’t think so.  Both groups will revert to their true talent levels. Always have and always will.

If your defense of Bill’s study is that, “Well technically he answered his question correctly,” you are being childish, as in, “Technically, I did clean up my room, mom.”


#18    david smyth      2008/10/08 (Wed) @ 10:35

Could it be that power pitchers are a bit more “consistent” than finesse pitchers of the same ability in the postseason? A finesse pitcher allows significantly more BIP. Over the long haul, the starts when those extra BIP “fall in” should be canceled out by the days when they don’t. But if a pitcher is only gonna get 1 to 3 starts in a do-or-die series, I think most managers would prefer their bread-and-butter starters to be as insulated as possible from luck on BIP. (If you have to start a bad pitcher, for some reason, maybe it’s the other way around.)


#19    Tangotiger      2008/10/08 (Wed) @ 10:40

What if you find that the results are the same in regular season year X+1?


#20    Bjorn      2008/10/08 (Wed) @ 11:54

David, I don’t know what the managers would prefer since their actions and wishes don’t always make sense.

But shouldn’t you (given the same “expected”, i.e. average, performance) prefer a higher variance (or, as you put it, susceptibility to luck) if you are the underdog in the game, and a lower variance if you are the favorite?

Or to put it in simpler terms, there is value in being consistently good, but 99%+ of that value comes from the “being good” part. Simply being consistent should have almost no value in itself.


#21    MGL      2008/10/08 (Wed) @ 13:30

Depends on how you define the “better” team. If it is by RS/RA, then yes, you can change the win percentage if you change the distribution of those RS and RA, I think. But if you define it as a “64% chance of winning,” then there is nothing you can do to change that, since the 64% already includes the variance.

If the question is whether a finesse pitcher who allows 4 rpg, and maybe has a larger variance, has a lower, higher, or the same wp as a power pitcher who allows 4 rpg with a smaller variance, well, that has nothing to do with the post-season. I think that any below-average pitcher benefits from a higher variance and an above-average pitcher does not. We have discussed that before, and it is addressed in various articles and publications as well.

One of my “beefs” with the article and with its defenders, like “Richie” on the BJ site, is that whatever Bill did or did not find has nothing whatsoever to do with the post-season (as opposed to any other out-of-sample time period) and clearly his intention was to see what might “work” in the post-season as opposed to the regular season, much like Nate Silver’s “secret sauce” which is mentioned in the article and which is getting a lot of “play” lately by writers like Rob Neyer.

As I said, it is unfortunate, in my opinion, that it is getting so much play. Pretty soon it will be accepted sabermetric wisdom that the “secret sauce” for the playoffs is legitimate, correct, etc., when good analysts like Tango, myself, and others are questioning it, to say the least, and have as yet gotten no responses defending it from anyone (not that I have written to Nate Silver or anything like that).


#22    Bjorn      2008/10/08 (Wed) @ 14:08

MGL, I don’t know if your latest post was in reply to me or not but anyway, here goes.

I completely agree that this has nothing special to do with the postseason. As far as I am concerned, aside from “downvaluing” depth and maybe a weather adjustment, I see no reason to make any special consideration for the playoffs at all.

I would like to point out in this variance discussion that perhaps for a specific game it is not how a pitcher is compared to average that matters but the opposing pitcher.

I.e. if you have two different 4 rpg pitchers to choose from, one with higher variance than the other, you should probably start the high-variance one if the opposing pitcher is 3 rpg and the low-variance one if the opposing pitcher is 5 rpg. (Offences being equal.)
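This question can be explored with a small Python model. The assumptions are strong and all mine: runs in a game are Poisson, the two offenses are equal so our runs depend only on the opposing pitcher’s rpg, ties count as half a win, and the high-variance starter is a hypothetical 50/50 blend of 1-run and 7-run outings (mean 4, like the steady one).

```python
import math


def poisson_pmf(mean, max_runs=30):
    """Poisson distribution of runs in a game, truncated and renormalized."""
    pmf = [math.exp(-mean) * mean ** r / math.factorial(r) for r in range(max_runs + 1)]
    total = sum(pmf)
    return [p / total for p in pmf]


def mixture(dists, weights):
    """Blend several run distributions (same length) with the given weights."""
    return [sum(w * d[r] for d, w in zip(dists, weights)) for r in range(len(dists[0]))]


def win_prob(scored, allowed):
    """P(runs scored > runs allowed), with ties counted as half a win."""
    win = tie = 0.0
    for rs, p_rs in enumerate(scored):
        for ra, p_ra in enumerate(allowed):
            if rs > ra:
                win += p_rs * p_ra
            elif rs == ra:
                tie += p_rs * p_ra
    return win + 0.5 * tie


# Two hypothetical 4-rpg starters: one steady, one boom-or-bust (mean still 4).
steady = poisson_pmf(4.0)
volatile = mixture([poisson_pmf(1.0), poisson_pmf(7.0)], [0.5, 0.5])

for opp_rpg in (3.0, 5.0):  # runs our offense scores off the opposing pitcher
    scored = poisson_pmf(opp_rpg)
    print(opp_rpg, round(win_prob(scored, steady), 3), round(win_prob(scored, volatile), 3))
```

Under these assumptions the volatile starter comes out ahead against a 3 rpg opposing pitcher; the 5 rpg line shows how the comparison looks for the favorite. Real run distributions are more spread out than Poisson, so the numbers are only indicative.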



#24    MGL      2008/10/08 (Wed) @ 21:32

I.e. if you have two different 4 rpg pitchers to choose from, one with higher variance than the other, you should probably start the high-variance one if the opposing pitcher is 3 rpg and the low-variance one if the opposing pitcher is 5 rpg. (Offences being equal.)

Interesting.  I’d have to think about that. Someone could probably answer that more quickly than I could.

I have not read Phil’s response yet, but I am glad he has entered into the fray.  He is very good at articulating a critique.  Very good indeed.


#25    MGL      2008/10/08 (Wed) @ 21:45

Phil does in fact sum up the issue very nicely:

Another way to put it is that Bill’s study is legitimate – it truly does find that, all else being equal, power pitchers do indeed outperform control pitchers in October. But the reason they do so is simply that for a control pitcher to have the same regular-season record as a power pitcher, he has to have been lucky with respect to balls in play. And the luck doesn’t carry forward into the future.

However, he too, I believe, is being overly solicitous toward Bill.

He (Phil) says:

Twenty years ago, Bill’s conclusion would have been valuable to bettors and GMs – it would have told us something new. But, in today’s world, sophisticated sabermetricians are already controlling for balls-in-play luck. So, in this case, Bill’s study just gives us another confirmation of what we already know about predicting pitcher performance.

I take very strong issue with the implication above. It is clear (to me at least) that Bill made a grave error in his study and did not realize it. He thought that he was comparing pitchers of equal quality, which he was not (as Phil, Tango, and I have pointed out, if two pitchers have the same ERA over any time period, and one has a higher K rate than the other, the pitcher with the higher K rate is the better pitcher, ERA-wise, and can be expected to perform better in any other time period). He also thought that he was validating the notion put forth by Nate Silver and BP (their “secret sauce”) that if two pitchers are of EQUAL QUALITY (that is exactly what Bill said in his last paragraph), the power ones appear to do better in the post-season. The study does NOT show that, because he did not test two groups of pitchers of equal quality.

As we all have said, he took two groups of pitchers who had the same ERA.  One group was lucky and the other was not (relative to one another).  He found that the unlucky ones performed better than the lucky ones in another time period.  Whoop de do, break out the champagne. If Bill KNEW that is what he did, do you really think he would write this article?

Come on, let’s call a spade a spade, Bill James or Bill Plaschke!


#26          2008/10/09 (Thu) @ 00:54

mgl/25,

Sure, I agree with you that Bill probably didn’t realize that the two pitchers in each pair aren’t really of equal quality.  However, it would have been legitimate to assume they were of equal quality 20 years ago, before Voros came along.

Perhaps I should have phrased it differently: Bill’s study is legitimate in that it shows that power pitchers will outperform finesse pitchers with equal *statistics*, if the statistics you choose are of the baseball-card variety and do not include BABIP.

But that’s something that we already know.

How’s that?



