THE BOOK--Playing The Percentages In Baseball


Wednesday, May 14, 2008

Another article about when current season stats become “real” (or something like that)

By , 03:58 PM

Ron Shandler chimes in about the “reliability” of statistics at any point in the season.  While he makes some good points about how different measurements require different sample sizes (e.g., PA) to be equally reliable, the piece is littered with phrases like “become meaningful,” “reliable,” and “taken seriously” (I am paraphrasing, despite the quotation marks).

As I mentioned in another thread, I don’t like those terms being thrown around with respect to this issue.  They are misleading.  And dangerous.  One man’s reliable is another man’s unreliable.  More importantly, given identical sample sizes (of the current performance), the more a short-term statistic strays from a career one and/or from a population mean, the less “reliable” it is.  Not to mention that you cannot say anything about a statistic’s reliability based on its sample size without knowing the prior history of the player or the mean of the population.  A player with no history hitting .260 on May 15 is probably around a .260 hitter (at least that is our best estimate, albeit without a great deal of certainty).  A player hitting .260 on May 15 who has been a .300 hitter his whole career is probably a .295 hitter.

So how in the world can we say that May 15, or any other date, is the date at which a statistic becomes reliable, without knowing the prior history of the player and the mean of the population?

Similarly, if a player with no history is hitting .300 on May 15, his true BA is probably around .270.  So one player with no history who is hitting .260 on May 15 has a true BA of about .260, while another player with no history who is hitting .300 on May 15 has a true BA of about .270.  In the first case, his short-term BA is likely his true BA.  In the second case, his short-term BA is likely about 30 points above his true BA.  Again, how can we talk about a date or a current sample size, in isolation, that makes a player’s statistic reliable or not?  We can’t!
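The arithmetic behind these examples can be sketched as shrinkage toward a target. This is an illustration, not the author's method. It assumes the .260 no-history hitter implies a population mean BA of about .260, and it backs out the regression fraction implied by the .300-to-.270 example. For the career case it simply treats the .300 career mark as the target, which is a simplification because a real projection would regress the career line too. The helper names are hypothetical.

```python
# Hypothetical helpers illustrating the examples in the post.
POP_MEAN_BA = 0.260  # assumption: implied by the .260 no-history hitter staying at .260


def regress(observed: float, target: float, fraction: float) -> float:
    """Move an observed rate `fraction` of the way toward `target`."""
    return observed - fraction * (observed - target)


def implied_fraction(observed: float, estimate: float, target: float) -> float:
    """Back out how much of the gap to `target` an estimate has regressed."""
    return (observed - estimate) / (observed - target)


# No history: .300 on May 15 -> about .270 implies ~75% regression.
no_history = implied_fraction(0.300, 0.270, POP_MEAN_BA)

# Same sample, same fraction, very different adjustments.
for ba in (0.260, 0.300):
    est = regress(ba, POP_MEAN_BA, no_history)
    print(f"no history, hitting {ba:.3f}: estimate {est:.3f}")

# Career .300 hitter at .260 on May 15 -> about .295 implies ~87.5%
# regression, toward his own career mark rather than the population mean.
with_history = implied_fraction(0.260, 0.295, 0.300)
print(f"career .300 hitter at .260: fraction regressed {with_history:.3f}")
```

At the same date and sample size, the .260 hitter's line needs no adjustment, the .300 hitter's needs 30 points, and a player with a long career is pulled toward a different target altogether. No PA count on its own settles whether a statistic is "reliable."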


#1    2008/05/14 (Wed) @ 17:34

Plus, MLEs do give us history up through the first few years of a player’s career.

I’ve spent a lot of time projecting the Pirates, I believe pretty accurately. Nate McLouth is doing great, just as he did in Altoona in 2004; over the next three years he’s probably going to be back to hitting .260-.270. Waiting for Paulino to hit .315 again? For Bautista to be more than a .250 hitter with 15-20 HRs? That’s what they are. Their major league performance is perfectly in line with what they did in the minors.


#2    Pizza Cutter    2008/05/14 (Wed) @ 18:11

While I appreciate the nod… he kinda screwed up on something.  Contact rate is actually the number of times contact was made divided by swings.  I did it at the pitch level.  (Also, how is it that I found out from here that I was in USA Today?)


#3    Pizza Cutter    2008/05/14 (Wed) @ 18:19

On the bigger issue, about the best that we can do is to say “at X PA, the stat meets this reliability criterion.” As far as looking at it on an individual player level, a study could look to see which is the better predictor over time, historical numbers or the current season.  Maybe it’s already been done.


#4    watercott    2008/05/15 (Thu) @ 13:40

Tango, I think you’re confusing whether a sample is “reliable” with whether it is equal to the overall population mean.  I would take a sample to be “reliable” if, well, I would be willing to rely on it to draw conclusions.  Whether it turns out to be similar to the population mean is a separate question.

I think the question they’re asking here is at what point you would consider a recent sample large enough to rely on it OVER the performance before that sample period.  What would give us a more accurate picture of real ability: the last 5 years of a 10-year career, or all 10?  We spend a lot of time weighting more recent performance more heavily, etc., but what we really want to accomplish (in this case) is to figure out when true talent has actually changed, and weighting won’t help us do that.

Your point that one man’s reliable is another’s unreliable stands, though.


#5    Tangotiger    2008/05/15 (Thu) @ 14:11

Water/4: by “Tango” do you mean “MGL”?


#6    MGL    2008/05/15 (Thu) @ 14:56

You can leave it as Tango, and he can respond if he wants!


#7    2008/05/22 (Thu) @ 11:58

Yeah, I did.  In fact, I was really surprised later when I found a quote by Tango in another post stating (approximately) that “after 250 at bats, a sample is reliable.”

This is the question people are trying to ask, though, isn’t it?  When should we totally discount far-past performance and instead base conclusions only on some recent sample?


#8    Tangotiger    2008/05/22 (Thu) @ 12:21

water/7: Never!  You can never totally discount anything.

The weighting I use is:
weight for hitting = .9994^daysAgo
weight for pitching = .9990^daysAgo

So, even performance from 10,000 days ago has some weight, albeit realllllly tiny.
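Here is a minimal Python sketch of that exponential decay, using the two daily factors quoted above. The function names and the (days_ago, successes, opportunities) record layout are hypothetical and were chosen only for illustration.

```python
HITTER_DAILY_DECAY = 0.9994   # factor quoted above for hitting
PITCHER_DAILY_DECAY = 0.9990  # factor quoted above for pitching


def day_weight(days_ago: int, decay: float) -> float:
    """Weight for an event that happened `days_ago` days before today."""
    return decay ** days_ago


def weighted_rate(events, decay: float) -> float:
    """Recency-weighted rate from (days_ago, successes, opportunities) records."""
    num = sum(day_weight(d, decay) * s for d, s, _ in events)
    den = sum(day_weight(d, decay) * n for d, _, n in events)
    return num / den


for days in (0, 30, 365, 10_000):
    print(
        f"{days:>6} days ago: hitter weight {day_weight(days, HITTER_DAILY_DECAY):.4f}, "
        f"pitcher weight {day_weight(days, PITCHER_DAILY_DECAY):.4f}"
    )

# Hypothetical example: 2-for-4 today, 1-for-4 exactly a year ago.
print(round(weighted_rate([(0, 2, 4), (365, 1, 4)], HITTER_DAILY_DECAY), 3))
```

The sketch counts calendar days, off-season included. On that basis a hitter's performance from 365 days ago gets a weight of about 0.80, which matches the 80%-per-year rule of thumb MGL gives in the next comment. A pitcher's year-old performance gets about 0.69.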

As for the 200-250 PA being “reliable,” I’m sure what I meant is that at that point, your regression toward the mean is 50%.  That is, half of that performance is real, and half is not.  The amount to regress is:
x/(x+PA)
where x = 200 for hitters, 300 for pitchers, and 400 for fielders (for fielders, measured in BIP, not PA).
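The regression formula translates directly into code. This sketch uses the x values quoted above. The function names and the .330 mean in the usage line are hypothetical. Regressing by r = x/(x + PA) toward the mean is algebraically the same as adding x PA of league-average performance to the player's own PA, since r·mean + (1 - r)·observed = (x·mean + PA·observed)/(x + PA).

```python
# Regression constants quoted above (fielders are measured in BIP, not PA).
# For component stats (K rate, BB rate, BABIP) the constant differs; the post
# gives no values for those, so none are filled in here.
REGRESSION_X = {"hitter": 200, "pitcher": 300, "fielder": 400}


def regression_fraction(opportunities: float, x: float) -> float:
    """Share of the observed deviation from the mean to discard: x / (x + PA)."""
    return x / (x + opportunities)


def regressed_rate(observed: float, opportunities: float, mean: float, x: float) -> float:
    """Pull an observed rate toward the population mean by x / (x + PA)."""
    r = regression_fraction(opportunities, x)
    return r * mean + (1 - r) * observed


for role, x in REGRESSION_X.items():
    print(role, [round(regression_fraction(n, x), 2) for n in (100, x, 600)])

# Hypothetical hitter: .400 over 200 PA, population mean assumed to be .330.
print(round(regressed_rate(0.400, 200, 0.330, REGRESSION_X["hitter"]), 3))
```

When PA equals x, the fraction is exactly 0.5, which is the "50% regression" point described above. Under the assumed .330 mean, the hypothetical .400 hitter over 200 PA comes out at .365.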

If you are going the component route, then that “300” for pitchers will be much lower for K rates and BB rates, and far higher for BABIP rates.


#9    MGL    2008/05/22 (Thu) @ 12:38

There is no “magic point.” It is a sliding scale that changes with each PA.  For a batter, a good rule of thumb is that each previous year gets 80% of the weight of the following year, and for pitchers, 60%.  That is a very general rule of thumb.  There are many caveats, exceptions, etc.  For example, every component has a different weighting, because certain components have more “talent” associated with them (and thus a player is more likely to have achieved a new level of talent with respect to them).  By the same logic, if we are discounting last year’s stats 20% as compared to this year’s, we also have to discount yesterday’s PA a tiny amount as compared to today’s.  And so on.  So you tell me at what point current (I don’t even know what “current” means) stats become “reliable,” because I have no idea.
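MGL's rule of thumb can be sketched as geometric season weights. As he warns, this is a deliberately crude illustration. It uses a single factor for every stat (0.8 for batters, 0.6 for pitchers), whereas he notes that each component really has its own weighting. It also ignores the within-season discounting he describes. The function names and sample seasons are hypothetical.

```python
BATTER_YEAR_FACTOR = 0.8   # each prior year gets 80% of the following year's weight
PITCHER_YEAR_FACTOR = 0.6  # ... and 60% for pitchers


def season_weights(n_seasons: int, factor: float) -> list[float]:
    """Weights for the current season and each earlier one, most recent first."""
    return [factor ** k for k in range(n_seasons)]


def weighted_season_rate(seasons: list[tuple[int, int]], factor: float) -> float:
    """Weighted rate from (successes, opportunities) per season, most recent first."""
    weights = season_weights(len(seasons), factor)
    num = sum(w * s for w, (s, _) in zip(weights, seasons))
    den = sum(w * n for w, (_, n) in zip(weights, seasons))
    return num / den


print([round(w, 3) for w in season_weights(4, BATTER_YEAR_FACTOR)])
print([round(w, 3) for w in season_weights(4, PITCHER_YEAR_FACTOR)])

# Hypothetical hitter: (hits, at-bats) per season, most recent first.
seasons = [(150, 500), (165, 550), (140, 520)]
print(round(weighted_season_rate(seasons, BATTER_YEAR_FACTOR), 3))
```

The weighted rate is still an observed rate. Deciding how far to regress it toward a population mean is the separate question MGL raises next.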

If Tango ever did say “after 250 PA, you can consider a batter’s stats reliable,” what he meant was that that was the point (for whatever stats he was talking about, probably OBA or wOBA) at which you would regress it 50% towards the mean if you had no other history to work with.  In my opinion, the use of the word “reliable” is a poor choice as it has no inherent meaning in that context, and as I like to say, “One man’s reliable is another man’s unreliable.”

Plus, we have two issues.  One is: given no history at all, how much (percentage-wise) do we regress a player’s stats toward some mean or average (if we know nothing about the player, the mean of the population of all MLB players)?  The other is: how do we weight each PA, or group of PA, with respect to recency?  Two completely separate, albeit related, issues.


#10    Tangotiger    2008/05/22 (Thu) @ 13:18

It appears that MGL and I posted at the same time, and we said virtually the same thing. 

As a matter of fact, since this question comes up so often, I’ll copy these two responses to the Mailbag.


#11    2008/05/22 (Thu) @ 15:15

I completely understand what you’re saying.  When predicting for large numbers of players, and for aggregate data even more so, a gradual scale is obviously going to win out.  I think I’m trying to ask a different question, though.  Let me try to rephrase…

Is there any way that we can quantitatively determine whether there has been what I’ll call an inflection point in a player’s performance?  There must be real changes in a player’s activities that would make past performance worthless for predicting future performance, to a degree not captured by simply discounting exponentially.

Take, say, Rick Ankiel.  Does his performance as a hitter back when he was a pitcher, with all the associated lack of training, practice, coaching, experience, etc., have anything more than a tangential relationship to his performance as a hitter after his transition?  What about a player who undergoes major surgery?  Stops using PEDs?  Adds a pitch to his arsenal?

These changes are real, and we know that they cause changes in real performance levels (or might, in the case of PEDs).  It seems disingenuous to go about our evaluations ignoring the possibility of an “inflection point” in performance levels and continue to value all past performance the same for every player.  Perhaps there is no way to quantitatively take them into account, and we must leave this sort of thing to grizzled, pickup-driving scouts…


#12    Tangotiger    2008/05/22 (Thu) @ 15:51

Yes, there are inflection points.  Of this, there is no question.

But, no, we cannot determine them.  All we can do is assign probabilities that we have an inflection point, be it Ankiel (his ascension as a hitter), Ankiel (his collapse as a pitcher), Cliff Lee (his Schilling-like control of the strike zone), Brady Anderson (his ascension as a supreme power hitter), Brady Anderson (his collapse as a supreme power hitter).

See, with all of these things, we simply don’t know if a new talent level has been reached or not.  We can’t know based on the performance data (as currently compiled).  If, on the other hand, we see that Eric Gagne used to throw 99mph and now throws 89mph, well, we have a stark contrast.  Same with the collapse in Barry Zito’s fastball speed.  These are really “tools” or “scouting” type information.  They are subject to far less noise than performance data, which follows the binomial distribution because it is a collection of binary outcomes (unlike, say, fastball speed).
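The binomial point can be made concrete. The sketch treats each at-bat as an independent trial with the same hit probability, which is a simplifying assumption. Under that assumption the standard deviation of an observed batting average over n AB is sqrt(p(1 - p)/n). The .260 true average is an illustrative choice, not a figure from the thread.

```python
import math


def binomial_rate_sd(p: float, n: int) -> float:
    """Standard deviation of an observed rate over n independent binary trials."""
    return math.sqrt(p * (1 - p) / n)


TRUE_BA = 0.260  # illustrative true-talent batting average

for at_bats in (100, 250, 600):
    sd = binomial_rate_sd(TRUE_BA, at_bats)
    print(f"{at_bats:>3} AB: SD of observed BA = {sd:.3f}")
```

Over 100 AB, one standard deviation is about 44 points, so a true .260 hitter batting .300 early in the season is unremarkable. Even over a full 600 AB it is still about 18 points. Performance data carries that kind of noise. A directly measured tool such as fastball speed is not subject to it in the same way.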

