THE BOOK--Playing The Percentages In Baseball


Sunday, August 23, 2009

How can we tell if a pitcher is any good?

By , 03:42 AM

I was reading the thread on BBTF about David Cameron’s article comparing Smoltz and Jered Weaver (who is a very good pitcher - we think).  By the way, “we think” is my standard disclaimer on all qualitative comments on players.  That is because we really don’t know for sure how good or bad any player is - we can only make an educated guess with some level of certainty - and that level of certainty has a level of certainty as well.  But that is not the point of this post.

David basically argues that any pitcher, good or bad, can have a run of bad performance just like Smoltz did.  He used Jered Weaver as an example.  Of course he is right.  While any bad spate of performance suggests that a pitcher is bad, we have to add that spate to all of his previous performance, with the appropriate weightings and age adjustments, to make any sense of it.  The difference between Weaver and Smoltz is that Weaver’s bad spate was a blip, and thus not much of a factor, in a healthy 3 or 4 year career, whereas Smoltz’ bad spate was all of his performance after serious arm surgery for a 42 year old player, albeit a formerly great one.

Still, we really don’t know how much of that bad - OK, terrible - spate of performance relates to his future performance, because it is only 40 innings or so.  That is not much of a sample to have much meaning, even if the performance is spectacularly good or bad.  Some meaning yes, but definitely not a lot.

So what do we have to work with in order to make an educated guess as to his, or any pitcher’s, future performance or true talent at the present time?  In the case of Smoltz, we have other information, or priors in a Bayesian sense.  The problem is that we are going to have a hard time quantifying those priors for obvious reasons.  One prior is that he was, before his most recent injuries and surgeries, a great, great pitcher, both as a reliever and starter.  We also know that he is old and has had major surgery.  Certainly those go into the equation, but we don’t really know what to do with them.
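One common way to fold a small sample into a prior is a simple weighted average, where the prior counts as some number of "phantom" innings. The sketch below is only an illustration of that idea, not a method from this post: the function name and every number in it (the 8.00 runs per 9, the 4.50 prior, the 400-inning prior weight) are made up, and choosing those inputs is exactly the hard part described above.

```python
def shrink_toward_prior(observed_rate, observed_ip, prior_rate, prior_weight_ip):
    """Blend an observed runs-per-9 rate with a prior, weighting each by innings."""
    total_ip = observed_ip + prior_weight_ip
    return (observed_rate * observed_ip + prior_rate * prior_weight_ip) / total_ip


# Hypothetical inputs: 8.00 runs per 9 over 40 IP, a prior of 4.50,
# and a prior that is "worth" 400 IP. None of these come from the post.
estimate = shrink_toward_prior(8.00, 40, 4.50, 400)
print(round(estimate, 2))
```

With a prior that heavy, 40 terrible innings move the estimate only a little, which is the "some meaning yes, but definitely not a lot" point in numeric form.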

We also know more about his 40 IP performance than his ERA or even his OPS against, FIP, etc.  David pointed out that he still has a 91 mph fastball and an 85 mph slider and splitter, and he said that is evidence that he is still capable of “getting major league hitters out” which is a euphemism for being not a terrible pitcher - maybe better than replacement level if we had to quantify that term.

Someone on the BBTF thread, which was mostly silly, as many of their threads are, pointed out that there are hundreds of AA and AAA pitchers (and probably lower minor league, high school and college) who have 91 and 85 mph pitches, like Smoltz.  He is right.  Actually both are right.  The person who said that is right in that having a 91 mph fastball and an 85 slider and splitter by no means is 100% or even 90% indicative that a pitcher, even a HOF one, is capable of pitching effectively in the majors.  Dave is being reasonable in pointing out that we are not talking about a HOF pitcher who had surgery and now pitches at 87 or 88 like some pitchers.  The fact that he can throw 91 shows some promise and potential.

Now, a lot of people on the thread kept saying things like, “Yeah, but if you saw him pitch, you would see that despite his 91 mph fastball and 85 mph slider and splitter, he could not locate them and he was getting hammered, especially by lefties.” They also said, “Yeah, he looked fine the first time through the order, but he got hammered after that.”

The problem with that argument is that ANY pitcher can look bad as far as his location goes and can get hammered in 1 or 5 or even 10 games, which was David’s original point in comparing Smoltz to Weaver.

Now, the people bringing up how he looked and how he got hammered and how his location was poor are making somewhat of a legitimate point.  There are two levels at which we can look at a pitcher’s performance.  One, the numbers only:  ERA, WHIP, K, BB, OPS against, whatever. If those are bad in say 40 IP, we can say something like, “There is a 10% chance that this guy is a terrible pitcher,” not knowing anything else about him. Now we can also look at his location and whether he got hammered or not, independent of his numerical results.  If those things are bad or even worse than the numbers, we might say that there is a 12% or 15% chance that he is a terrible pitcher.  If those things are not so bad, there might be only a 3% chance that he is a terrible pitcher.  (You can see how getting hammered or not actually trumps the numbers to some extent, but by no means is getting hammered indicative of anything in the short term.)
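The 10%, 12-15% and 3% figures are only illustrations, but the way a second piece of evidence moves a probability can be sketched with Bayes' rule in odds form. The likelihood ratios below are hypothetical; they were picked so the outputs land near the illustrative figures and are not estimates of anything.

```python
def update_probability(prior_prob, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


numbers_only = 0.10  # illustrative "terrible pitcher" probability from the numbers alone

# Hypothetical likelihood ratios for what the eye sees on top of the numbers.
looked_bad = update_probability(numbers_only, 1.5)    # poor location, hit hard
looked_fine = update_probability(numbers_only, 0.28)  # location was mostly fine

print(round(looked_bad, 3), round(looked_fine, 3))
```

A likelihood ratio close to 1 is the formal version of saying that getting hammered over a few games tells us little, since good pitchers get hammered over short stretches too.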

So, for Smoltz or anyone else for whom we only have 40 IP of data, while the observation of location and “hammerness” is interesting and important to the evaluation and can add to or even trump the numbers, it has its limitations because a good pitcher can get hammered or have bad location in any given number of games or innings.  Look at Weaver in the time period that David describes in his article and look at Sabathia the first few weeks of this season (he got absolutely hammered I think). You can probably look at 3 or 4 or 5 game stretches of just about any pitcher and find them getting hammered and having poor location skills.

So that brings us back to the title of this post. How the heck can we tell how good a pitcher is?  We know that pitchers are difficult to project statistically for various reasons.  I’ll also say - and people may disagree with me - that teams and their scouts and other evaluators do a horrible job at evaluating pitchers, other than the obvious ones.  Anecdotal evidence of that is the fact that teams routinely cut and sign what turn out to be terrible washed up pitchers.  If it were easy to scout a pitcher to see whether they were still effective or not, pitchers like Bruce Chen, Odalis Perez, Sidney Ponson, the bad Weaver, etc., would not bounce around from team to team, in and out of retirement, and from the majors to the minors and back again.  But that is also another story.

So what can we (anyone) do to evaluate pitchers in the long and short runs?  We can use the “numbers” but we will run into two problems:  One, for young pitchers we don’t have enough data to make these evaluations with much certainty.  Two, even when we have a lot of numbers, we find that pitching projections are not that reliable, depending on your definition of “not that” of course.  Personally, I put a lot of time and effort into pitching projections by the numbers and I get very frustrated each year when dozens of pitchers seem to belie their projections and many of them go on to do that for many more years, as if their true talent drastically changed from one year to the next (maybe it did and maybe it didn’t).  For example, all of a sudden Edwin Jackson and Jason Marquis are Cy Young and Rich Harden and Ervin Santana are Cy Espstein.  Of course, injuries play a large role in the uncertainty and difficulty in pitcher projections but they are an integral part of the game.

So what else can we do?  As I’ve said many times, I watch as many games as anyone, and I kind of specialize in pitching analysis.  I can tell you that almost ANY major league pitcher has either very good stuff or very good command or both.  However, on any given day, it is amazing how lousy or good a pitcher’s command can be and even how good or bad his stuff can look (especially when the command is there or not there).  For example, if guys with fairly average stuff happen to hit their locations for a game or two, they can look like Greg Maddux and absolutely dominate their opponents.  You would think they were great, great pitchers even if you recognized that they didn’t have great stuff, although I can tell you that command can go a long way in making it look like a pitcher has good or bad stuff.  Jason Marquis and Jeff Weaver are great examples of this in the 06 post-season. Both pitchers looked like Cy Young in the post-season, yet Jason Marquis, until this year, was a bad pitcher, and Jeff Weaver had already imploded before that post-season. 

On the other hand, a pitcher with great stuff can look like absolute crap when he is not in command of those pitches. And some days he can have great command and other days he will have lousy command. 

Now, you may be saying to yourself, “Well if a pitcher has great stuff he has more of a chance to be a good pitcher and if he doesn’t, he has less of a chance, so one of the first things we can do with a young or even an old pitcher is to evaluate his stuff. And the quality of a pitcher’s stuff should not fluctuate all that much, all the more reason to use it to evaluate that pitcher.” There is some merit in that, and of course that is one thing that scouts do - to a fault I think.  Here are some of the problems with that.  The obvious ones are that lots of pitchers have good or great stuff, but good pitching is more than that, as we all know.  So maybe you have solved 10% or 20% of the mystery by evaluating a pitcher’s stuff.  That still leaves a lot left.

Perhaps more importantly though, is what constitutes good stuff?  A scout may drool at a 96 mph fastball, but as we know, there are 96 mph fastballs and 96 mph fastballs.  IOW, some end up being quality pitches and some don’t, even independent of command, although command plays a large role in how effective a pitch is, not only when that pitch is thrown, but in the grand scale of pitching - the less command you have of your pitches, generally the more predictable you will be. For example, if you cannot throw your off-speed pitch in the strike zone with any regularity, you will be forced to throw fastballs in fastball counts and even that 96 mph fastball, especially if THAT is not commanded well, is going to get hammered in fastball counts.

So how do we tell whether a pitcher has good or great stuff?  Scouts will say, “Just watch him and see what he throws and look at the movement, velocity, and command.” I say BS!  If it were even close to that easy, teams would know who was a good pitcher and who wasn’t, which they clearly don’t.

Why is that? For several reasons:  One, the eye cannot see the exact movement of a pitch.  Two, the eye cannot see the deceptive element of a pitch, which is important to its effectiveness.  Three, the effectiveness of a pitch is partially based on when it is thrown and the other pitches that are thrown and when, and all of that is complicated.  Plus, as I keep saying, command is such a critical part of the equation, and one, a scout cannot necessarily quantify command with any precision from watching a pitcher, and two, his command is going to fluctuate a lot from session to session and from game to game.

Now, of course a lot of those things that the scout cannot see or cannot measure or see with any precision can be gotten from the data - like the pitch f/x data.  And as I have said from its inception, we have not even scratched the surface as far as using it to evaluate pitchers and pitching in general.  But, the problem with that is that that data is subject to fluctuation and sample size error.  So we are back to the same thing as we started when we just used numbers like ERA, ERC, OPS against, tRA, etc.  Sample size.

To conclude this already too long post, this is what we (teams or anyone that wants to evaluate pitchers) need to do with the pitch f/x data:  We start with velocity.  Smoltz has a 91 mph fastball.  OK, for 91 mph fastballs, what kind of movement is necessary for it to be effective?  Obviously we need to set arbitrary boundaries for effective versus not effective.  OK, given a certain movement, what kind of location and/or command does it need to be effective? 

Now we compare Smoltz’ fastball to those baselines.  For example, let’s say that we find that a 91 mph fastball is pretty good, but only if it moves at least X horizontally and Y vertically.  Does Smoltz’ meet that requisite?  No?  Then he is in trouble.  Let’s say that it does.  Now, we’ll look at all 91 mph fastballs with similar movement and we’ll see what kind of command and location is necessary for it to be effective.  Then we will compare that to Smoltz’.  We’ll do the same for his other pitches.  Then we’ll look at how often he throws the various pitches in the various counts and we’ll do a similar analysis. I know, this gets really complicated, but I think it is doable to some extent.
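A rough sketch of the "find the comparable pitches" step might look like the following. The `Pitch` record, its field names, the tolerances and the sample rows are all assumptions made for illustration; real pitch f/x data has its own schema, and the run values here are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pitch:
    velocity_mph: float
    horizontal_break_in: float
    vertical_break_in: float
    run_value: float  # runs relative to average; negative favors the pitcher


def comparable_pitches(pitches, target, velo_tol=1.0, break_tol=1.5):
    """Keep pitches whose velocity and movement are close to the target pitch."""
    return [
        p for p in pitches
        if abs(p.velocity_mph - target.velocity_mph) <= velo_tol
        and abs(p.horizontal_break_in - target.horizontal_break_in) <= break_tol
        and abs(p.vertical_break_in - target.vertical_break_in) <= break_tol
    ]


def baseline_run_value(pitches, target):
    """Average run value of the comparable pitches, or None if there are none."""
    comps = comparable_pitches(pitches, target)
    return mean(p.run_value for p in comps) if comps else None


# Made-up league sample and a made-up 91 mph fastball to compare against it.
league = [
    Pitch(91.2, -5.0, 9.0, -0.01),
    Pitch(90.5, -4.0, 8.5, 0.03),
    Pitch(96.0, -5.0, 10.0, -0.04),
    Pitch(91.0, 2.0, 3.0, 0.02),
]
target = Pitch(91.0, -4.5, 9.0, 0.0)
baseline = baseline_run_value(league, target)
print(len(comparable_pitches(league, target)), baseline)
```

The same filter could be extended with location bins and count, at the cost of thinner samples in each bucket.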

The one problem we are going to run into is the deception. I firmly believe that a big part of any pitch’s effectiveness is its deceptive nature by virtue of the pitcher’s motion and release point which may or may not be able to be observed or measured. If not, we have to rely on the actual effectiveness of each pitch in each location.  For example, if the average 91 mph fastball with the same movement as Smoltz in a 2-2 count has a lwts value of zero runs, and Smoltz’ is negative (good for him), then we might infer that he has some deception going for him that is better than the average ML pitcher.  Of course, we have to control for how often he throws his other pitches.  For example, if the average pitcher throws that same 91 mph fastball 50% of the time in that count and Smoltz throws it 40%, he is probably going to get a better result in that pitch.
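The "value of his pitch minus the value of the average comparable pitch" idea can be sketched as a residual. In this hypothetical version the league rows are assumed to be already filtered to the same count, velocity and movement, and the usage control is a crude one: keep only league pitches thrown by pitchers who use that pitch about as often in that count. The function name, tolerance and data are all made up.

```python
from statistics import mean


def deception_residual(pitcher_values, league_rows, pitcher_usage, usage_tol=0.05):
    """Pitcher's mean run value minus the mean for usage-matched comparable pitches.

    league_rows holds (run_value, usage_rate) pairs for comparable pitches
    in the same count. Returns None when either side has no data.
    """
    matched = [rv for rv, usage in league_rows if abs(usage - pitcher_usage) <= usage_tol]
    if not pitcher_values or not matched:
        return None
    return mean(pitcher_values) - mean(matched)


# Invented numbers: a pitcher who throws this fastball 40% of the time in a 2-2 count.
pitcher_values = [-0.02, -0.04, 0.00]
league_rows = [(0.01, 0.42), (-0.01, 0.38), (0.05, 0.50), (0.00, 0.40)]
residual = deception_residual(pitcher_values, league_rows, pitcher_usage=0.40)
print(residual)
```

A negative residual is good for the pitcher, but it lumps deception together with everything else the filter did not control for, and it carries the same sample size problem as any other result-based number.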

So it is a complicated process to be able to evaluate pitchers, and I think we have a long, long way to go as compared to what we (as analysts) and teams (scouts) are doing right now, including a combination of the two (scouts and stats).  A long way to go.  And I think that there will be or there is the potential for great strides in the next 10 years or so, owing in part to the availability of the data like pitch f/x. I also think that when the breakthroughs come, it will likely be in the sabermetric community and that the baseball world - the teams - will lag by 5 years or so (some teams more)…


#1    Matthew Cornwell      2009/08/23 (Sun) @ 04:31

Great info!  Just for the record, Jason Marquis didn’t throw a single pitch in the 2006 playoffs.  Doesn’t really change anything, but I know you like to be accurate in all that you write.


#2          2009/08/23 (Sun) @ 07:14

Coming out from my exile for a moment

Seems like you’re coming just short of saying that

Watching (top level) baseball makes little to no sense.

At least as far as getting any objective inferences about teams and players is concerned.

And this brings us to the sad baseball paradox: trying to know how baseball really works is detrimental to your enjoyment of baseball.

Very sad revelation this ...

OK, back to my exile ... and watching cricket.


#3    MGL      2009/08/23 (Sun) @ 11:44

Matt, thanks for the correction.  It must be the Jew in me that would like to think that he did! wink


#4    MGL      2009/08/23 (Sun) @ 11:47

Suppan was whom I was thinking of!  A bad pitcher who pitched extremely well during that post-season, I think, and looked like a control artist.


#5          2009/08/23 (Sun) @ 21:55

With so much that we still don’t know about evaluating pitching, I guess we shouldn’t be surprised that teams continue to pick up guys like Oliver Perez, Jeff Weaver, Sidney Ponson, and Bruce Chen.  Those guys can look okay (even dominant in Perez’ case) for brief stretches.


#6    MGL      2009/08/23 (Sun) @ 22:31

For teams, I think the answer is to pay more attention to the “real” numbers (for example, if I owned or ran a team, the words “wins” or “win/loss record” should NEVER be uttered in a discussion or meeting about a pitcher - how many teams do you think still use those terms?  I bet a lot!) and to figure out a way to get more out of the pitch f/x data. Clearly whatever some teams are presently doing to evaluate pitchers and make decisions (like recycling these retreads, giving Oliver Perez a lucrative contract, trading Kazmir for Victor Zambrano, etc.) is not working too well.

I think that teams are far, far away from taking the pitch f/x data and using it to better evaluate and project pitchers.  (And using it to improve upon the strategy that they teach their pitchers.)

As I said, I think that most of the breakthroughs in that regard are going to come from the sabermetric community and that most teams will lag far behind them.  And the sabermetric community has only scratched the surface of the pitch f/x data so far.


#7    cdm      2009/08/23 (Sun) @ 22:33

MGL,

You mentioned pitchF/X.  Has anyone gone through the whole pitchFX database, aggregated the velocity, movement and command of each pitcher for each of his pitches?  One could then use these data to predict ERA (et al).  One could even do something nifty like quantize each dimension (e.g., FF speed into plus, minus, etc.) and run a multivariate analysis, since clearly the factors will interact…

I ask because I couldn’t find it, so I went ahead and learned SQL yesterday to make an initial stab at it. There are some problems equating lefties and righties, and the fact that a slider has essentially 0 movement, but the preliminary data are interesting (and more important, fun).  If anyone has a tight (read: small) SQL database with eliasIDs and ERA in 2007, 8 and 9, I’d very much like to match the pitching data up with the outcomes.

If you normalize control (strike percentage) and velocity, and then weight them evenly, these are the top 10 fastballs in the league 07-09:
1. Jose Valverde
2. Matt Thornton
3. Joel Zumaya
4. Carlos Rosa
5. Kerry Wood
6. Jonathan Broxton
7. Jason Motte
8. Jonathan Papelbon
9. Matt Lindstrom
10. Frank Francisco

little bit of face validity thrown in there.
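For anyone who wants to try the same kind of ranking, here is a minimal sketch of "normalize, then weight evenly". It assumes "normalize" means converting each column to z-scores, which is only one reading of the description above, and the pitcher names and numbers are placeholders, not pitch f/x values.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1 (population SD)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def rank_fastballs(rows):
    """rows: (name, avg_velocity_mph, strike_pct). Returns names, best score first."""
    velo_z = z_scores([r[1] for r in rows])
    strike_z = z_scores([r[2] for r in rows])
    scored = [(0.5 * vz + 0.5 * sz, r[0]) for r, vz, sz in zip(rows, velo_z, strike_z)]
    return [name for _, name in sorted(scored, reverse=True)]


# Placeholder data for illustration only.
rows = [
    ("Pitcher A", 96.0, 0.66),
    ("Pitcher B", 92.0, 0.68),
    ("Pitcher C", 90.0, 0.60),
    ("Pitcher D", 94.0, 0.62),
]
ranking = rank_fastballs(rows)
print(ranking)
```

Note that `z_scores` divides by the standard deviation, so it would fail on a column where every pitcher has the same value. Even weighting also treats a standard deviation of velocity as worth exactly as much as a standard deviation of strike rate, which is a choice rather than a finding.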


#8    cdm      2009/08/23 (Sun) @ 22:59

MGL:

Now we compare Smoltz’ fastball to those baselines.  For example, let’s say that we find that a 91 mph fastball is pretty good, but only if it moves at least X horizontally and Y vertically.  Does Smoltz’ meet that requisite?  No?  Then he is in trouble.  Let’s say that it does.  Now, we’ll look at all 91 mph fastballs with similar movement and we’ll see what kind of command and location is necessary for it to be effective.  Then we will compare that to Smoltz’.  We’ll do the same for his other pitches.

Actually, Smoltz had decent velocity on his fastball (above league average), with good movement (again, league average). His change had good separation, and his slider still slid. He’s at least league average across the board.

Up until now, baseball was nice because the events were largely independent.  This allowed you to calculate linear weights, and play with simple regression models. Now Pitch F/X data have *huge* dependencies. How do you get a regression to tell you that “A change-up is only valuable when it is significantly slower than the fastball, and can be thrown for a strike in the lower half of the plate”? You can’t. The complexity of this problem is sufficient to really challenge the greatest contemporary minds using the most advanced methods.  The saberists who don’t know a kernel ridge regression from a support vector machine won’t get too far.

I think it would be brilliant for a team to jump in now that the Netflix prize is won, and announce the Billy Beane challenge to use pitchF/X data to predict future performance beyond some threshold for $1m. You’d get the attention of some really smart people, and advertise at the same time.  Short of that, though, this stuff is going to slowly creep along as it becomes too difficult for even the geekiest of fans.


#9    Mike      2009/08/24 (Mon) @ 00:14

The complexity of this problem is sufficient to really challenge the greatest contemporary minds using the most advanced methods.

I agree with that and what MGL said about us having just scratched the surface.

The saberists who don’t know a kernel ridge regression from a support vector machine won’t get too far.

I doubt that.  The people who know that stuff will probably be next to worthless with the PITCHf/x data set just like most academics are next to worthless when it comes to baseball problems because they don’t understand the realities of the game.  As Tango likes to say, the subject matter experts will beat the stuffy academics every time.

You’d get the attention of some really smart people, and advertise at the same time.  Short of that, though, this stuff is going to slowly creep along as it becomes too difficult for even the geekiest of fans.

You already have the attention of some very smart people on this problem.  But you’re right that some real money would go a long way toward making a difference.  Most of the very smart people who are looking at the data are doing it as a hobby because major league teams don’t pay enough for people to work for them.  That’s the reason it “slowly creeps along”, not because the people looking at the data aren’t smart enough.


#10    Nick      2009/08/24 (Mon) @ 00:30

While I agree that a kind of Pitch f/x “scouting” like you described is the ultimate goal, I think there is a happy medium that we can reach without an exceptional amount of work.

I’ve already mentioned this before, but things like swinging strikes, zone% and other such stats easily attainable using Pitch f/x have much less variance attached to them than even things like FIP and tRA.  That’s because pitchers can get extremely lucky/unlucky with umpires, or just get unlucky in terms of timing with their pitches.

I’m working on something that models ERA using those types of pitch f/x stats, and I think it could be pretty useful in a small sample.  Of course you would still need to know how to regress that information properly; however, I think a metric like that would take out MUCH more noise than FIP.

Of course the next step is breaking it up by pitch type; however, that takes a while to do, and it has to be done on a case by case basis.


#11          2009/08/24 (Mon) @ 00:36

Agree with cdm and Mike. Actually, I disagree with cdm about what Mike disagrees with him about.  (That was a mouthful!)

I don’t think it is necessary to have much of a hard-core statistical background to tackle these kinds of things.  It might help, but as Mike and Tango have said, for some reason, the academics with the best tools seem to be piss poor at analyzing these kinds of things.  In general.  Not always of course.

That being said, there seem to be more and more saberists who are also very good statisticians, so maybe by the time the pitch f/x data goes beyond the “scratching of the surface” stage there will be lots of academicians (or just people in various fields with similar training and skills) who also have the requisite knowledge of and interest in the game itself.

As far as money, baseball analysis is very lucky to have gotten as far as it has given the fact that almost no one makes a lot of money from the analysis and most people make none.  The reason we still have the plethora of work that we do have is because people love the game of baseball.  If this were analysis of something dull and boring, we’d get nowhere without money.  But yes, adding some money to the mix would go a long way.  Someday a bold and progressive (and smart) team will budget a few million for the statistical analysis department and hire a bunch of really smart saberists and pay them a couple of hundred thou a year.  Or someone will start a think tank with a large grant from the government or from baseball itself.


#12    pft      2009/08/24 (Mon) @ 01:58

#8.  Smoltz slider had nowhere near the bite he used to have.  As such, he could not get out LHB’ers, as this was a pitch he relied on against them. 

When your shoulder is weak or injured, you may still have decent velocity on your FB, as Smoltz does, but your breaking pitches suffer, as does command.  Even guys who throw 97 are hittable without effective secondary pitches and good command.

Now it may well be that as his shoulder gains strength, that bite comes back, and Smoltz regains his effectiveness.  Going to the NL does not hurt, as the AL East is no place for a pitcher coming back from an injury.


#13    Nick      2009/08/24 (Mon) @ 03:22

pft - Smoltz’ slider still has roughly average movement.  Saying stuff like this:

Smoltz slider had nowhere near the bite he used to have.  As such, he could not get out LHB’ers, as this was a pitch he relied on against them.

Is ridiculous.  Of course he doesn’t have the stuff he used to; he’s 42!!  What most people care about is whether or not his stuff is above average, and based on its movement and velo, and his pitch results, it probably is.

As to your second point, there is really no basis behind that.  The reason that he “couldn’t get out lefty batters” was that he faced 100 of them.  That is a completely insufficient sample from which to make any conclusions.


#14    Nick      2009/08/24 (Mon) @ 07:10

Also, how stable is pitch location?  I mean, how much does his first 40 innings of location predict the rest of the year? 

I checked his location with Pitch f/x, and he’s been throwing about 4% more pitches in the middle third of the plate compared to the league average player, which is obviously bad.  However, is that predictive for the rest of the year, or only a little like most other stats?
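One piece of this can at least be put in numbers: how noisy a location rate is over a sample that size. The sketch below treats each pitch as an independent trial and uses the usual standard error of a proportion. The league rate (25%) and pitch count (650) are assumptions chosen for illustration, and it reads "4% more" as four percentage points, which may not be what was measured.

```python
import math


def proportion_standard_error(rate, n_pitches):
    """Standard error of an observed proportion under simple binomial sampling."""
    return math.sqrt(rate * (1.0 - rate) / n_pitches)


league_rate = 0.25   # assumed share of pitches in the middle third
n_pitches = 650      # assumed pitch count for roughly 40 innings
observed_gap = 0.04  # "4% more", read as percentage points

se = proportion_standard_error(league_rate, n_pitches)
gap_in_standard_errors = observed_gap / se
print(round(se, 4), round(gap_in_standard_errors, 2))
```

Under those made-up inputs the gap is a little over two standard errors. That speaks only to sampling noise in the observed rate; how well 40 innings of location predicts the rest of the season would need a split-sample or year-to-year correlation study.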


#15    MGL      2009/08/24 (Mon) @ 09:21

The only thing that is fairly predictable from a small sample of performance that I am aware of is velocity.  Of course location is going to suffer from sample size error.  That is one of the biggest things that creates fluctuation in pitcher results from start to start, I would think.

I would suspect that movement is fairly stable too, but I am not sure.  That would be an interesting study.  Anyone aware of any work in that regard?  Deception is probably stable too, since that is a function of release point and other physical attributes of a pitcher’s delivery, but as we have discussed, that is hard to measure.  It is the value of a pitch minus the value of an average pitch for all pitchers given the same velocity and movement, with everything else being controlled for (such as how often it is thrown at the various counts, a pitcher’s other pitches, etc.).


#16    Peter Jensen      2009/08/24 (Mon) @ 10:07

I think understanding why a pitcher pitches well in one start and doesn’t pitch well in another is THE most interesting question in baseball.  But it will take more than Pitch f/x analysis to reach any valid conclusions.  You first have to control for the quality of the opposing team’s offense, the influence of the umpire, and the weather.  Those are important factors outside of the pitcher’s control that vary from game to game.  That means having an every pitch simulator that has inputs for those factors.  Until you have that simulator and run the game multiple times you have no idea whether a pitcher’s performance in any one game is better than expected or worse.  It will also help when we have a multi-year database of Hit f/x to help separate the pitcher’s contribution from the defense, and will become even more accurate when we get a multi-year Field f/x database that will provide an even more accurate assessment of fielding ability.

I just don’t think we have accurate enough data inputs at the present time to be able to differentiate the nuances of pitching performance changes from game to game.  That doesn’t mean that we shouldn’t begin thinking about the problem and developing plans for the type of metrics that we would like to use once we get accurate data.  Better Pitch f/x analysis will definitely be part of the solution.  But the problem is very complex and will require more tools than Pitch f/x.

