THE BOOK--Playing The Percentages In Baseball


Thursday, August 20, 2009

Sample size and granularity of data

By , 12:20 AM

I’m not going to get too much into this, but....

I was reading about Smoltz on FranGraphs.  He has pitched poorly in 40 IP or so this year.  Some people were saying, “That is a small sample size, so it doesn’t mean anything.” IOW, he still might be a good pitcher.  Other people responded with something like, “Yeah, but if you WATCHED him pitch, you would see that he was getting hammered, especially by lefty batters.” Some people were even saying that, “You can see that he didn’t have much stuff,” while others were saying, “What do you mean, he had a 91 mph fastball (which is pretty good for a starter) and a decent slider.”

Let’s say we have a sample metric like ERA. And let’s say that a pitcher has an 8.00 ERA in 40 IP.  We intuitively know (at least some people do) that that ERA alone does not tell us very much because it is only in 40 IP, and in 40 IP almost anything can happen to any pitcher.  The more technical explanation is that the standard error around a pitcher’s “true talent” ERA in only 40 IP is pretty high.
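
To put a rough number on “pretty high,” here is a minimal sketch. It rests on an assumption that is not in the post: that earned runs in each inning are independent Poisson draws with mean equal to the pitcher’s true ERA divided by 9. The function name `era_standard_error` and the 4.50 true-talent ERA are hypothetical, chosen only for illustration. Real run scoring is lumpier than a Poisson model, so if anything this understates the spread.

```python
import math


def era_standard_error(true_era: float, innings: float) -> float:
    """Approximate standard error of observed ERA over `innings` IP.

    Assumption (not from the post): earned runs per inning are
    independent Poisson draws with mean true_era / 9.
    """
    runs_per_inning = true_era / 9.0
    # Total runs ~ Poisson(innings * runs_per_inning); a Poisson variance equals its mean.
    sd_total_runs = math.sqrt(innings * runs_per_inning)
    # Convert the spread in total runs back to ERA units (runs per 9 innings).
    return 9.0 * sd_total_runs / innings


if __name__ == "__main__":
    # Hypothetical pitcher with a 4.50 true-talent ERA at several sample sizes.
    for ip in (40, 200, 1000):
        se = era_standard_error(4.50, ip)
        print(f"{ip:>4} IP: observed ERA is roughly 4.50 +/- {2 * se:.2f} (2 SE)")
```

Under these assumptions the standard error at 40 IP is about one full run of ERA, so a band of two standard errors spans roughly four runs. Note that this is the spread of observed ERA around a known talent level; going the other way, from an observed 8.00 ERA back to true talent, would also need a prior on how good pitchers are in general.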

Now, the naysayers, like the people who don’t think Smoltz is any good anymore, say, “Yeah, but if you see that the pitcher got hammered in those 40 IP...”

So what does that mean?  Does that mean that we are now pretty darn certain that he is a bad pitcher even if we only have 40 IP to work with, because we “saw that he got hammered?”

No.

Now, putting aside the subjective nature of what it means to get “hammered,” what we basically have by observing a pitcher getting hammered is more granular data. With that, our standard error around a pitcher’s true talent in those 40 IP goes down. So we are indeed more certain that he is a poor pitcher than we would be if all we knew was that he had an 8.00 ERA in 40 IP.

How much more certain?  I don’t know, but I would guess, “Not that much.”

Why is that? Basically because there is “random” fluctuation in a pitcher’s “hammer factor” just as there is random fluctuation in a pitcher’s ERA. A good pitcher may get hammered on any given day or in any given month (3 IP or 50 IP), and a bad pitcher may not.

So basically, the rule is that the more granular your data, the more certainty you have for a given sample size of that data. But it is only a matter of degree, and in most cases the differences are small. What I mean is that your certainty about a pitcher from his ERA in 40 IP is not a whole lot less than your certainty from knowing whether he got hammered or not in those 40 IP. The reason is that ERA and the “hammer factor” are highly dependent on each other. IOW, it is pretty likely that a pitcher with a high ERA got hammered, and vice versa. Not 100% likely, but the correlation is high. If you had granular data that was more independent of ERA, then the difference in your certainty between one set of data and the other might be greater.

Keep in mind that some levels of data do NOT have much of an uncertainty factor as compared to other levels of data. For example, let’s say that you used triples rates and other things like SB attempts or bases advanced as a measure of a player’s pure speed. You would recognize that there is going to be sampling error and bias (e.g., park effects) with that. But what if you just measured each guy in the 40 or the 60 with a stopwatch a few times on separate days? While there is going to be some fluctuation and measurement error there, it won’t be much. This level of data is going to give you an answer with MUCH more certainty than the “speed score” stuff. Anyway, I digressed a little.
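
The dependence point about ERA and the “hammer factor” can be illustrated with a small sketch. It assumes, purely for illustration, that the ERA-based estimate and the “hammer factor” estimate of true talent are equally noisy and that their errors have correlation `rho`. The function name `combined_se` and the values of `rho` are hypothetical; the post gives no actual correlation.

```python
import math


def combined_se(se: float, rho: float) -> float:
    """Standard error of the simple average of two equally noisy estimates.

    `se` is the standard error of each estimate alone and `rho` is the
    correlation between their errors (both values are assumptions here).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1 and 1")
    # Var((a + b) / 2) = (se^2 + se^2 + 2 * rho * se^2) / 4 = se^2 * (1 + rho) / 2
    return se * math.sqrt((1.0 + rho) / 2.0)


if __name__ == "__main__":
    se_era = 1.0  # hypothetical SE of the ERA-only estimate, in runs per 9 IP
    for rho in (0.0, 0.5, 0.9):
        print(f"rho={rho:.1f}: combined SE = {combined_se(se_era, rho):.3f}")
```

With fully independent errors (`rho = 0`), adding the second measure cuts the standard error by about 29%. With errors correlated at 0.9, the cut is only about 2.5%, which is the “not that much” case.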

OK, what about the “stuff” thing?  Two things:

One, you now have another level of granularity of data. And that can fluctuate as well. Plus, it is hard to quantify that data in the short run. One pitcher’s 91 mph fastball may be great and another pitcher’s 95 mph fastball can suck. And if you think that a trained eye can distinguish the two in a few outings, I have news for you: It can’t. Which is why you have pitchers with supposedly great stuff who pitch for a while in the major leagues until it becomes evident from their results that they are awful, and you have pitchers without supposedly great stuff who languish in the minors for years and then, when they finally get their chance in the bigs, they shine. Why can’t the scouts tell beforehand? I don’t know, but they can’t. I don’t mean that they can’t at all. I mean that they can, but only to a relatively small degree. For some pitchers at the extremes it may be fairly obvious, but for many pitchers in the middle, it is not. You need to run them out there for a while and see what they do. And even then…

Two, judging from the fact that some readers (on FanGraphs) say he has terrible stuff and other readers say that his stuff is good, it is difficult for ANYONE (even scouts) to separate talent from results. Really difficult.

