THE BOOK--Playing The Percentages In Baseball


Thursday, August 20, 2009

Sample size and granularity of data

By , 12:20 AM

I’m not going to get too much into this, but....

I was reading about Smoltz on FanGraphs.  He has pitched poorly in 40 IP or so this year.  Some people were saying, “That is a small sample size, so it doesn’t mean anything.” IOW, he still might be a good pitcher.  Other people responded with something like, “Yeah, but if you WATCHED him pitch, you would see that he was getting hammered, especially by lefty batters.” Some people were even saying, “You can see that he didn’t have much stuff,” while others were saying, “What do you mean, he had a 91 mph fastball (which is pretty good for a starter) and a decent slider.”

Let’s say we have a sample metric like ERA. And let’s say that a pitcher has an 8.00 ERA in 40 IP.  We intuitively know (at least some people do) that that ERA alone does not tell us very much because it is only in 40 IP, and in 40 IP almost anything can happen to any pitcher.  The more technical explanation is that the standard error around a pitcher’s “true talent” ERA in only 40 IP is pretty high.
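
To get a rough feel for how wide that standard error is, here is a minimal simulation sketch. It rests on assumptions that are not in the post: the pitcher's true ERA is fixed at a hypothetical 4.50, every run is treated as earned, and runs in each inning are independent Poisson draws with mean ERA/9. The function names are made up for illustration. Under that model, total runs in 40 IP have a mean of 20 and a standard deviation of sqrt(20), which works out to roughly 1.0 run of ERA. Real run scoring tends to come in bunches, so the true spread is likely somewhat wider than this.

```python
import math
import random
import statistics


def poisson_draw(rng, mean):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_eras(true_era, innings, trials, seed=0):
    """Observed ERA over `innings` innings, repeated `trials` times.

    Assumption (not from the post): runs per inning are independent
    Poisson draws with mean true_era / 9, and all runs are earned.
    """
    rng = random.Random(seed)
    runs_per_inning = true_era / 9.0
    eras = []
    for _ in range(trials):
        runs = sum(poisson_draw(rng, runs_per_inning) for _ in range(innings))
        eras.append(9.0 * runs / innings)
    return eras


# Hypothetical pitcher: true talent of 4.50 ERA, observed over 40 IP.
eras = simulate_eras(true_era=4.50, innings=40, trials=5000)
print(f"mean observed ERA: {statistics.mean(eras):.2f}")
print(f"SD of observed ERA: {statistics.pstdev(eras):.2f}")
```

Changing `innings` to 200 shows how much the spread shrinks over something closer to a full season of starts.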

Now, the naysayers, like the people who don’t think Smoltz is any good anymore, say, “Yeah, but if you see that the pitcher got hammered in those 40 IP...”

So what does that mean?  Does that mean that we are now pretty darn certain that he is a bad pitcher even if we only have 40 IP to work with, because we “saw that he got hammered?”

No.

Now, putting aside the subjective nature of what it means to get “hammered,” what we basically have by observing a pitcher getting hammered is more granular data.  With that, our standard error around a pitcher’s true talent in those 40 IP goes down.  So basically we are indeed more certain that he is a poor pitcher than we would be if all we knew was that he had an 8.00 ERA in 40 IP.

How much more certain?  I don’t know, but I would guess, “Not that much.”

Why is that?  Basically because there is “random” fluctuation in a pitcher’s “hammer factor” just as there is random fluctuation in a pitcher’s ERA.  A good pitcher may get hammered in any given day or month (3 IP or 50 IP) and a bad pitcher may not.

So basically, the rule is that the more granular your data, the more certainty you have for a given sample size of that data.  But it is only a matter of degree, and in most cases the differences are small.  What I mean is that your assessment of a pitcher from his ERA in 40 IP is not a whole lot less certain than your assessment from whether he got hammered or not in those 40 IP.  The reason is that the ERA and the “hammer factor” are very dependent variables.  IOW, it is pretty likely that a pitcher with a high ERA got hammered, and vice versa.  Not 100% likely, but the correlation is high.  If you had granular data that was more independent (of ERA), then the difference in your certainty between one set of data and the other might be greater.

Keep in mind that there are some levels of data that do NOT have much of an uncertainty factor as compared to other levels of data.  For example, let’s say that you used triples rates and other things like SB attempts or bases advanced as a measure of a player’s pure speed.  You would recognize that there is going to be sampling error and biases (e.g., park effects) with that.  But what if you just measured each guy in the 40 or the 60 with a stopwatch a few times on separate days?  While there is going to be some fluctuation and measurement error there, it won’t be much.  This level of data is going to give you an answer with MUCH more certainty than the “speed score” stuff.  Anyway, I digressed a little.
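
Going back to the dependent-variables point, a small sketch can put rough numbers on it. The setup is an assumption for illustration, not something measured: treat ERA and the “hammer factor” as two noisy readings of the same true talent, each with the same standard error, whose errors have correlation `rho`. The equal-weight average of the two then has standard error `se * sqrt((1 + rho) / 2)`. The function name and the correlation values below are hypothetical.

```python
import math


def combined_se(se, rho):
    """Standard error of the equal-weight average of two measures.

    Assumes both measures have the same standard error `se` and that
    their errors are correlated with coefficient `rho`.
    """
    return se * math.sqrt((1.0 + rho) / 2.0)


# Hypothetical correlations between the two measures' errors.
for rho in (0.0, 0.5, 0.9):
    print(f"rho={rho:.1f}: combined SE is {combined_se(1.0, rho):.3f} of the original")
```

With independent errors the second measure cuts the standard error by about 29%. With a correlation of 0.9 the cut is under 3%, which is the “not that much” above.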

OK, what about the “stuff” thing?  Two things:

One, you now have another level of granularity of data.  And that can fluctuate as well.  Plus, it is hard to quantify that data in the short run.  One pitcher’s 91 mph fastball may be great and another pitcher’s 95 mph fastball can suck.  And if you think that a trained eye can distinguish the two in a few outings, I have news for you:  It can’t.  Which is why you have pitchers with supposedly great stuff who pitch for a while in the major leagues until it becomes evident from their results that they are awful, and you have pitchers without supposedly great stuff who languish in the minors for years and then, when they finally get their chance in the bigs, they shine.  Why can’t the scouts tell beforehand?  I don’t know, but they can’t.  I don’t mean that they can’t at all.  I mean that they can to only a relatively small degree.  For some pitchers at the extremes it may be fairly obvious, but for many pitchers in the middle, it is not.  You need to run them out there for a while and see what they do.  And even then…

Two, judging from the fact that some readers (on FanGraphs) say he has terrible stuff and other readers say that his stuff is good, it is difficult for ANYONE (even scouts) to separate talent from results.  Really difficult.


#1          (see all posts) 2009/08/20 (Thu) @ 01:26

Awesome stuff. I have encountered many Sox fans who have just told me that “he sucks. He gets rocked every time out.” And the problem is exactly what you were saying--that because his ERA and “getting rocked/hammered” are so dependent, of course that is what people are seeing. What they don’t pay attention to is his 4.38 xFIP, his low LOB%, and his high HR/FB rate (16.5%).

If we are concerned with how hard guys are hitting Smoltz, Statcorner has him with a tRA+ of 94. While it isn’t great, it also isn’t anything to scoff at, and not the mark of a pitcher who gets shelled every time out.


#2    Ryan JL      (see all posts) 2009/08/20 (Thu) @ 01:58

I agree with you and was surprised the Sox of all teams would give up on him.

It really is a form of begging the question: Smoltz is getting rocked because he has bad stuff.  We know he has bad stuff because he’s getting rocked.


#3    MGL      (see all posts) 2009/08/20 (Thu) @ 02:25

Pat, be as careful about the other stats (FIP, tRA, etc.) as we are about the ERA and the “getting rocked.” The fact of the matter is that we have 40 IP of data, regardless of what metric you want to construct from that data. No metric is going to tell us much.  We would like to think that observation will tell us a lot, but I am afraid that just isn’t the case as much as we would like to think that it would be.  If that were the case, would BOS have signed him and then released him only to have STL sign him and actually start him?

Would team after team let Ponson pitch or Jose Lima (a few years ago) or Jeff Weaver (while other teams wouldn’t let them pick up a ball for them) if they could observe and scout these guys and easily figure out if they still have the stuff to pitch in the major leagues?  I don’t think so and apparently not.

I’ll be honest with you. I think that scouts and GM’s are biased by two things when they evaluate pitchers like these:  One, their reputations and pedigree, and two, the results of their performance in terms of ERA and wins.

Here is somewhat of an example:  I watched Cesar Carrillo, the young Padres starter, pitch last night.  The announcers, one a former pitcher himself (OK, he’s Mark Grant, who is not the sharpest tool in the shed, as when he was going on explaining the reasons why he “didn’t mind” an 0-1 count from a batter’s perspective), were talking about what a good job Carrillo did.  He allowed 3 runs in 6 or 7 innings, which isn’t bad, and he won the game.  Now, when I watch a game, I watch closely every pitch that the pitcher throws and don’t pay any attention to the score, etc. (other than how he pitches given the score of course).  To my eye, this guy was terrible.  Not even close to major league material.  He had OK stuff, maybe even better than that, but zero command of any of his pitches, especially the off-speed ones.  He was basically forced to throw fastballs in fastball counts, which is never a good thing unless that fastball is dominant, which his isn’t.  Sure, someday he might be a good major league pitcher, but he is not one now, at least from what I saw last night.

I think that scouts and GM’s, to some extent, do the same thing and think the same way as these announcers.  The guy allows 3 runs and wins a game, he pitched a good game.  It is so easy to pitch a bad game and allow 2 or 3 runs, it’s not even funny - or vice versa.  I just don’t think that GM’s and scouts (not all of them) are able to disassociate results from the process nearly as much as they should.


#4    Nick      (see all posts) 2009/08/20 (Thu) @ 02:32

MGL, would you not agree that Pitch f/x data can be especially useful in this situation?  Things like velocity, movement and even probably things like whiff rate would tend to “stabilize” much quicker. 

If you’ve found that Smoltz’ movement and velocity remained good on his pitches, and he was getting swings and misses at an above average rate, it seems right to conclude that his stuff is still there. 

Again, I don’t know how quickly pitches tend to stabilize, but I suspect very quickly for obvious reasons.


#5    Davor      (see all posts) 2009/08/20 (Thu) @ 08:43

Based on the small sample size, Smoltz still has good K/BB and LD%, but bad ERA, HR rate and BABIP. His stuff isn’t what it was before the injury, but nobody should have expected it to be. He should be a bonus in the 4th or 5th slot for any team, but he isn’t an ace anymore. It is quite possible that he is in that period of a career that every great pitcher has when his stuff isn’t exceptional anymore, and he should learn a different approach. He may be trying pitches which created outs previously, but now batters can hit them. Clemens, Pettitte, Moyer, Mussina,… all had such periods.
As for Ponson and others, they were mostly 2nd, 3rd, 4th… options for 5th starter. That usually means a bad pitcher. An old rule is that if you can choose between several bad players, you choose the one who has shown in the past that he could be good, because there is a small chance he could return to that form.


#6          (see all posts) 2009/08/20 (Thu) @ 09:58

I have always wondered if it’s possible to tell a “good pitch” from a “bad pitch” just on the pitch itself, without observing the result.  You know when a guy hits a home run and the announcer said “he hit a pretty good pitch?” Is that true?  Are they able to tell? 

Because, if it is possible to tell (maybe with the help of Pitch F/X data), you could come closer to settling this debate.


#7    walkoffblast      (see all posts) 2009/08/20 (Thu) @ 11:31

The whole thing seems like a public misinterpretation of events. It’s this false inference: Smoltz appears to be terrible when watched or looked at by traditional measures, the Red Sox get rid of Smoltz, thus the Red Sox got rid of Smoltz because he looked bad and had poor traditional numbers.

No, they got rid of him because of the way his contract was designed relative to his performance. They more than likely knew he was not terrible but still felt he was not worth giving more money every day to be a starter when they had better options. If he would have moved to the pen maybe they keep him. Did anyone from the Red Sox say they got rid of him because he was done? I doubt it. Did lots of fans and sportswriters assume they did? Likely.


#8    MGL      (see all posts) 2009/08/20 (Thu) @ 12:00

#7, yes of course, you only keep a player relative to his contract and your next best option.  Sure, no problem with that.  But…

If they think that he should have started for the 4 or 5 times he did start, given his salary and their other options, what changed?  I don’t think they acquired any other pitchers or someone came off the DL.  Quite the opposite.  Wakefield is still not ready and Penny is proving that he is horrible.  Buchholz was always available and a decent pitcher, according to my projections.  So what changed other than Smoltz’ bad performance?  I think that no matter how you look at it, they thought that Smoltz was X at the time they signed him and at the time that they let him pitch, and now they think he is Y, where Y is a lot worse than X.  That fact, if true, is interesting.  It says that their entire scouting staff, coaches, manager, and front office cannot evaluate him by “scouting him”, as compared to 40 IP of actual game performance, with all the inherent fluctuations.  What goes into that thinking?  “We watched him a lot Mr. Henry, and we were pretty sure he still had decent stuff.” And Mr. Henry says, “Well what changed between now and then?” “Well nothing sir, his stuff is still the same.” “Why did you let him go?” “He couldn’t get anyone out in the games he pitched, sir.” “We, of all teams, should know how much fluctuation in outcome there is in only 40 IP!  You go into Bill James’ office and you don’t come out until you write on the blackboard 1000 times, sample size, sample size, sample size....!”

“I have always wondered if it’s possible to tell a “good pitch” from a “bad pitch” just on the pitch itself, without observing the result.”

No, you cannot.  You can get a better idea, but you are going to be missing one important thing without observing the result, and you would still have to observe other pitches thrown before that pitch.  The quality of a pitch is based on three things:

1) The physical traits of that pitch, which you CAN get from pitch f/x (velocity, location, and movement).

2) The “deception” (for lack of an all-inclusive better word) given the pitcher’s delivery.  That you CANNOT get from the pitch f/x and you HAVE TO look at the result.

3) The other pitches he throws in that exact same situation, which you can get from the pitch f/x.

So #2, which is important, is missing, Phil.

“MGL, would you not agree that Pitch f/x data can be especially useful in this situation?  Things like velocity, movement and even probably things like whiff rate would tend to “stabilize” much quicker.”

Nick, absolutely.  Which is why I have been saying for a long time that the pitch f/x data, if analyzed correctly, is the Holy Grail (not nearly perfect though) in terms of determining overall pitcher quality.  The limiting factor is the fact that the characteristics of the pitch and game theory elements “randomly” fluctuate as well so you will ALWAYS have sample size issues, even with perfect pitch f/x analysis.

Here is a great example of how fluctuation and sample size can greatly affect results and why you need a large sample of data even with great granularity and perfect analysis of that data:

Let’s say that in a given situation a pitcher is supposed to throw a fastball 70% of the time and an off-speed pitch 30%.  And let’s say that he knows that and flips his mental 10 sided coin in his head each time that situation comes up.  And let’s say that the first 3 times, the 30% pitch comes up.  Chances are that the batter is not going to do very well with that pitch because he is looking fastball.  But if those first 3 “coin flips” came up “fastball” the batter is going to do a lot better.  In the long run of course, when there are 100 pitches like that and nearly 70% of them are fastballs, we’ll have a pretty good idea as to the pitcher’s true talent in that situation.  But imagine the hundreds of situations there are in a game, given the 9 batters a pitcher faces and the inning/score/outs/runners.  And then imagine the above example (70% fastballs and 30% off-speed) and you can see why it takes so long to get into the long run for pitchers.  Not to mention the many other sources of sample error, like what happens to the pitch after it leaves the pitcher’s hand!
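
The 70/30 example can be turned into numbers with a short sketch. It assumes each pitch choice is an independent “coin flip” with a fixed 70% fastball probability, as in the example, and the function names are made up for illustration. It shows how likely a run of three identical pitch types is, and how far the observed fastball share can sit from 70% after only a few pitches compared with 100.

```python
import math


def prob_first_n_all(p, n):
    """Chance that the first n independent pitch choices are all the type thrown with probability p."""
    return p ** n


def share_standard_error(p, n):
    """Standard error of the observed share of a pitch type after n independent choices."""
    return math.sqrt(p * (1.0 - p) / n)


fastball = 0.70
off_speed = 1.0 - fastball

print(f"first 3 all off-speed: {prob_first_n_all(off_speed, 3):.3f}")
print(f"first 3 all fastballs: {prob_first_n_all(fastball, 3):.3f}")
for n in (3, 10, 100):
    print(f"n={n}: observed fastball share is 0.70 +/- {share_standard_error(fastball, n):.3f}")
```

Even at 100 pitches in one situation, the observed mix is still only good to within about 4 or 5 percentage points, and that is for a single situation out of hundreds.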

