THE BOOK--Playing The Percentages In Baseball


Wednesday, April 08, 2009

GuyM on JC’s aging study, at Phil’s site

By Tangotiger, 11:07 AM

Phil’s got the lowdown on JC’s aging study.  GuyM provides his comments, which read in part:


...The most serious problem in my view is the selective sampling created by looking only at players who log 5,000 PAs between ages 24 and 35, and JC takes no steps to ensure this isn’t creating a bias. ... And in fact, if you look at a broader population, it becomes clear that it is simply not correct that aging is symmetrical around age 29, as JC claims. That would mean players are as good at 32 as they were at 26; as good at 33 as they were at 25. This is not true.

Here is the number of players who were average or better at various ages, defined as at least 400 PAs and an OPS+ of 100 or higher, from 1921 to 2006. This should give us a good look at the aging curve: if players tend to be better at a given age, more players will meet this performance threshold, and vice-versa. First, let’s test JC’s notion that performance is symmetrical around age 29:
Ages—# of players
29—683
28/30—704 / 621
27/31—725 / 543
26/32—706 / 463
25/33—629 / 353
24/34—449 / 277
23/35—320 / 214

In every pair, there are far more player seasons at the younger age. If 26-yr-olds and 32-yr-olds were truly equal players, why would we see 52% more players at age 26 (706 vs. 463)? Are baseball teams really forcing vast numbers of talented 32-yr-olds to retire prematurely? Of course not. The curve is not symmetrical around age 29.

Now let’s center this at age 27 (the mode):
27—725
26/28—706 / 704
25/29—629 / 683
24/30—449 / 621
23/31—320 / 543
Here we see a nice fit close to the age 27 peak, but it appears that the pre-peak curve is steeper than the post-peak curve. And really, that has to be the case if peak is 27-28, because we’re pretty sure that 18-yr-olds aren’t as good as 36-yr-olds.
...
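
The paired comparison in Guy's tables can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the tables above, and the function name `mirror_pairs` is made up here for illustration. A ratio above 1 means more qualifying players at the younger age of the pair.

```python
# Player-season counts copied from the quoted tables
# (at least 400 PA and OPS+ of 100 or higher, 1921 to 2006).
COUNTS = {23: 320, 24: 449, 25: 629, 26: 706, 27: 725, 28: 704,
          29: 683, 30: 621, 31: 543, 32: 463, 33: 353, 34: 277, 35: 214}


def mirror_pairs(counts, center):
    """Return (younger_age, older_age, younger_count / older_count) for each
    pair of ages equidistant from `center` that both appear in `counts`."""
    pairs = []
    offset = 1
    while center - offset in counts and center + offset in counts:
        young, old = center - offset, center + offset
        pairs.append((young, old, counts[young] / counts[old]))
        offset += 1
    return pairs


for center in (29, 27):
    print(f"Centered at age {center}:")
    for young, old, ratio in mirror_pairs(COUNTS, center):
        print(f"  {young} vs {old}: {COUNTS[young]} / {COUNTS[old]} = {ratio:.2f}")
```

If aging were symmetrical around the chosen center, every ratio would sit near 1. Centered at 29, all six ratios are above 1; centered at 27, the closest pair is nearly even and the remaining ratios fall below 1.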

I’d reprint the whole thing if Guy wants.

***

Related to this paper is Fair’s paper, which I was asked to peer review last year.  I broke down his paper (start at post 28), as well as whether my concerns were addressed. 

***

I discuss issues regarding the selection bias in aging studies here.  That was a really old article before I really knew what I was doing.  So, be (somewhat) kind to me on that one.

***

I have basic aging charts here using the delta method.

***

My aging article, which shows the steepness as you’d expect.

***

And my most-satisfying work to-date on the matter, where I actually control for the same pitchers facing the same hitters in the same parks in back-to-back years.  Post 27 has the payoff chart.  This is unlike any and all other aging studies to-date anywhere.  It is similar in methodology to how we can determine the true HR changes over time, and the aging of fielding talent.  Not to mention the entire basis of WOWY.

I’m at the point that any study that doesn’t specifically isolate the identity of the batter/pitcher or pitcher/fielder pairing is suspect.  We control for these things in true experiments, and we should do so in sports studies as well.  We can’t presume that there is no bias in the opponents or teammates in year-to-year studies.

#1    Xeifrank      2009/04/08 (Wed) @ 12:55

Seems like another reason for more 26 year olds than 32 year olds, and perhaps for some of the other age comparisons, is that the younger players are more likely to be under “cost control” than the older players.  Not so much that they are better.  If the talent level is the same, who would you rather have playing on your team for ONE year, a young player under cost control or an older free agent?
vr, Xei


#2          2009/04/08 (Wed) @ 15:52

Coupled with Xei’s comment, with which I am in complete agreement, I’d say that a 26 year old who has a catastrophic injury at 28 won’t show up as a counterpart to himself at 32.  Also, a 32 year old who hasn’t had a catastrophic injury shouldn’t have the catastrophic injury of some other 28 year old impact the projection of his ability.


#3    Tangotiger      2009/04/08 (Wed) @ 16:25

Crack’s message was marked for moderation and is now open.


#4    Guy      2009/04/08 (Wed) @ 16:32

Xeifrank:
I agree that the cost of players is a complicating factor.  Teams should prefer cheaper players.  On the other hand, cost concerns also cause them to keep some players in the minors longer, to prevent the service clock from running.  And I think we’d agree that many managers and GMs have a bias in favor of “proven major leaguers,” that results in veterans getting more playing time than talent alone justifies.  I don’t know how that all nets out, but I doubt that younger players get a big net advantage.

In any case, that should operate mostly at lower talent levels:  given two players slightly above replacement, teams would probably take the cheaper/younger option.  But I looked only at players who posted an OPS+ of 100 or better AND 400 PA.  I just don’t believe there are a significant number of 32-yr-olds capable of that level of performance who are sitting home in April hoping a GM will call.  And if there were, then surely they’d lower their salary demands to the level necessary to get hired (since they are better than many current players), which would eventually raise what we call “replacement level.” I just don’t see it.....

* *

I don’t have a strong view on whether “peak” is 27, 28, or 29, and maybe the truth is that those ages form a plateau within which we can never really identify a single peak. I think that debate frames the issue too narrowly.  What interests me is the rate of growth and decline on either side.  And I am certain that performance declines after 29, and at a faster rate than JC’s model suggests.  His linear weights model actually has peak at 29.5, meaning that he has 26-yr-olds and 33-yr-olds equal, and also 25 and 34.  That’s just silly—about half of all position players are out of the game by age 34. Pull any reasonable sample of 26-yr-olds you want and compare their age 33 performance—I’m sure you’ll find a decline (even ignoring those who can’t play any more). 

But maybe I’ll try raising the threshold a bit higher and report back on the results.....


#5    MGL      2009/04/09 (Thu) @ 02:05

I don’t really understand how this kind of study is going to yield any accurate results simply because of the tremendous bias that Guy mentions - that of only including players with at least 5000 career PA.  Clearly, that underrepresents players who don’t age well and overrepresents players who do - exactly what Guy said.  That is a fatal mistake - and a particularly egregious and significant one.  What does JC say about that?

And why all this complicated regression in order to get “an approximation” of peak age (for any component)?  I can do a “delta” study in about 20 minutes and come up with an accurate and easy to understand answer - albeit with some similar selective sampling issues, but not nearly so much as JC’s.

I don’t understand this at all. This is going to get published and the next thing you know, people will be quoting his results as “proof” that players peak at around age 29?  That is a joke.

The notion that the aging curve is symmetrical is also a joke.  Again, I can do a delta analysis in 20 minutes and show you that the aging curve is not nearly symmetrical.

How about if we just look at the weighted (by whatever you want) average for ALL players of the OPS increase or decrease from age 27 to 28?  I guarantee that it will be negative (a decrease).  In fact, I am willing to wager $1,000 on that. JC wanna take me up on that wager?  He would have to think that is a lock bet for him, or he must have some explanation for how his peak age can be 29 when somehow all players in history have their OPS (or whatever measure you want of overall production) go DOWN from 27 to 28.
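
For readers who have not seen a delta calculation, here is a minimal sketch of the kind of weighted year-to-year comparison MGL describes. Everything specific in it is an assumption for illustration: the `Season` record, the `delta` function name, the choice to weight each player by the smaller of his two PA totals, and above all the toy rows, which are made up to exercise the code and say nothing about real players.

```python
from typing import NamedTuple


class Season(NamedTuple):
    player: str
    age: int
    pa: int      # plate appearances
    ops: float


def delta(seasons, age_from, age_to):
    """Weighted mean OPS change for players who have a season at both ages."""
    by_key = {(s.player, s.age): s for s in seasons}
    total_weight = 0.0
    weighted_change = 0.0
    for (player, age), first in by_key.items():
        if age != age_from:
            continue
        second = by_key.get((player, age_to))
        if second is None:
            continue  # no season at age_to, so the player drops out of the comparison
        weight = min(first.pa, second.pa)  # assumed weighting; other schemes are possible
        weighted_change += weight * (second.ops - first.ops)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no player has seasons at both ages")
    return weighted_change / total_weight


# Made-up numbers purely to exercise the function; not real player data.
TOY = [
    Season("A", 27, 600, 0.800), Season("A", 28, 500, 0.780),
    Season("B", 27, 400, 0.750), Season("B", 28, 450, 0.760),
    Season("C", 27, 550, 0.700),  # no age-28 season, so excluded
]
print(round(delta(TOY, 27, 28), 4))
```

Player C shows where the method's own selective sampling comes from: anyone without a season at the later age contributes nothing to the average.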


#6    Tangotiger      2009/04/09 (Thu) @ 07:02

If the players in his sample show a peak age of 29, that does not mean we get to extrapolate that to all players.  Not unless JC proves that the in-sample players are representative of the population of MLB players.

This is no different than anything else we do, like MLEs or park factors.  The MLEs are strictly for those players that the teams selected to play in MLB. 

We cannot presume that SF’s home park affects all LHH the same on HR, if a significant portion of HR hit by LHH in SF are by Barry Bonds.

I can go on and on about analysts extrapolating their results to all players in the population based on a non-representative sample of players of that population, regardless of how many players they select.  Bias is bias, and a large sample, whether 20% or 50% of the population, still represents a source of bias.


#7    Guy      2009/04/09 (Thu) @ 08:53

Tango:  did you write the follow-up article referenced in that THT piece on SS aging?  If so, could you post a link?  (I couldn’t find it.)

And FYI:  it’s showing Studes as the author.


#8    Tangotiger      2009/04/09 (Thu) @ 09:46

Guy, I did one for CF linked in the thread here:
http://www.insidethebook.com/ee/index.php/site/comments/fielding_aging_curves/

Pretty much, they had all followed a similar curve, so I didn’t bother publishing the results.

***

MGL also published his a few years ago using UZR:
http://tangotiger.net/mgl/

See the PDF called “agecurve”.  Again, pretty much what we expect.


#9    Tangotiger      2009/04/09 (Thu) @ 10:37

Guy, thanks, studes fixed it.


#10    Guy      2009/04/09 (Thu) @ 12:05

That Studes—always trying to claim credit for others’ work.  :>)

“I don’t really understand how this kind of study is going to yield any accurate results simply because of the tremendous bias.... of only including players with at least 5000 career PA.”

JC is not alone here.  Jim Albert’s study was limited to hitters with a minimum of 5000 AB, giving him an n of 473.  Ray Fair looked only at players with at least 10 full seasons and only at seasons with at least 100 games, with a sample of 441.  JC also requires at least 10 seasons, plus 5000 PA from ages 24 to 35, giving him n=450.  The age constraint seems especially problematic, as it virtually requires the player to still be productive at ages 32-35, and is probably not unrelated to his finding of a 29-30 peak. 

So all of these end up with about 450 players.  It seems obvious to us that you can’t study the impact of age by looking only at the fraction of players who are still productive at an age when most major leaguers are retired.  How can economists make such an elementary error?
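
A back-of-the-envelope check shows why a 5000 PA requirement between ages 24 and 35 leans so heavily on the late years. The sketch assumes, purely for illustration, that a player debuts at 24 and plays every season without interruption; the helper name `pa_per_season_needed` is made up here.

```python
QUALIFYING_PA = 5000   # threshold as described in the comment above
WINDOW_START = 24      # first age counted toward the threshold


def pa_per_season_needed(last_age, total_pa=QUALIFYING_PA, first_age=WINDOW_START):
    """Average PA per season needed to reach total_pa when the last counted
    season is at last_age and the player plays every year from first_age."""
    seasons = last_age - first_age + 1
    if seasons <= 0:
        raise ValueError("last_age must be at least first_age")
    return total_pa / seasons


for last_age in range(29, 36):
    print(f"last season at {last_age}: {pa_per_season_needed(last_age):.0f} PA per season")
```

For scale, an everyday player typically gets somewhere around 600 to 700 PA in a full season. Under these assumptions a player whose last season came at 31 would need to average 625 PA for eight straight years, so most qualifiers will be players who were still getting regular playing time well into their 30s.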

I think the answer must be a variation on the old saw that “to a man with a hammer, every problem is a nail.” To an econometrician, the answer to all puzzles is using regression.  And if you’re going to use a regression model to build an aging curve, then you need players with reasonably long careers.  You don’t worry too much about the selective sampling problem, because—in your universe—there is no alternative way to solve the problem.  (I suppose it’s possible they all considered the delta method, and were stymied by the selective sampling issues.  But far more likely is they never considered using such a “primitive” tool.)

Oddly, JC recognizes that young and old players will be above-average players, and so excludes players under 24 or over 35.  But he’s worried about the wrong selection bias.  He has the ability to correct for the players’ talent, and does so, so that’s not a big problem (unless good players age differently).  The problem for this study is selection bias based on players’ aging pattern—he’s looking only at players who age well.  And he magnifies the problem by arbitrarily choosing cutoffs that are apparently not symmetrical around the true peak. 

* *

MGL:  when you say the aging curve is clearly not symmetrical around the peak, on which side are you suggesting it is steeper?


#11    MGL      2009/04/09 (Thu) @ 16:15

All the aging studies I have ever done (using the delta method, which, as I said, takes like 20 minutes to do) show a curve which is much steeper at the left (young side).  I guess I say that is obvious because of the reason for the curve in the first place - which is that players gain experience and grow physically on the left (most hitting changes are due to physical changes) and then gradually lose their physical skills, but not the experience, on the right. I guess that is not necessarily obvious unless you actually compose some aging curves, but if I had to guess at the shape, I am pretty sure I would guess the correct shape. I mean, if we plot a curve of physical development from birth to death, what is that curve going to look like?  Is it going to be symmetrical?  You quickly gain size and strength from age 0 to whenever and then you slowly lose it after that.  Should it be any surprise that baseball skills, and hence performance, would follow the same pattern?

Guy, you are probably right with the hammer and nail analogy, but I have no patience for these kinds of poor studies.  None, whatsoever.  I could give an economist who knows nothing about baseball a pass (the problem I spoke of in the basketball thread - researchers doing sports research who know nothing about the sport).  But JC?  You’ve got to be kidding me…


#12    TangoTiger      2009/04/09 (Thu) @ 16:42

If some people are having a hard time following along, let me ask them some simple questions:

1. Player plays from the age of 24 to 27.  What is the chance that his MLB performance data shows that he peaked at age 28?  30?  35?

2. Player plays from the age of 26 to 31.  What is the chance that his MLB performance data shows that he peaked at age 25?  33?

Those aren’t trick questions.  The answer is zero.  Since most players who have short careers have their last seasons in MLB prior to the age of 30, all their “peak” seasons will be somewhere between age 21 and 29.

So, by discarding all short-career players, you automatically bias the results the other way.  The discarding of the data was neither random, nor proportionate, but rather heavily skewed.
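
The effect can be seen in a small simulation. This is a sketch under loudly stated assumptions, not a model of real aging: every simulated player shares the same hypothetical talent curve peaking at 27, observed seasons are that curve plus random noise, and a career ends after the first season that falls below a cutoff. The curve shape, noise level, cutoff, and all names are invented for illustration.

```python
import random

FIRST_AGE, LAST_AGE = 24, 35
TRUE_PEAK = 27          # assumed shared peak; an illustration, not an estimate
NOISE_SD = 10.0         # assumed season-to-season noise
CUTOFF = 85.0           # assumed: a season below this ends the career


def true_level(age):
    """Hypothetical talent curve, identical for every player, peaking at TRUE_PEAK."""
    return 100.0 - 0.5 * (age - TRUE_PEAK) ** 2


def simulate_career(rng):
    """Return (last_age, observed_peak_age) for one simulated player."""
    best_age, best_value = None, float("-inf")
    last_age = FIRST_AGE
    for age in range(FIRST_AGE, LAST_AGE + 1):
        observed = rng.gauss(true_level(age), NOISE_SD)
        last_age = age
        if observed > best_value:
            best_age, best_value = age, observed
        if observed < CUTOFF:
            break  # player is out of the league after this season
    return last_age, best_age


def mean(values):
    return sum(values) / len(values)


rng = random.Random(2009)  # fixed seed so the run is repeatable
careers = [simulate_career(rng) for _ in range(20000)]

all_peaks = [peak for _, peak in careers]
long_peaks = [peak for last, peak in careers if last - FIRST_AGE + 1 >= 10]
short_peaks = [peak for last, peak in careers if last < 30]

print(f"mean observed peak, all players:   {mean(all_peaks):.2f}")
print(f"mean observed peak, 10+ seasons:   {mean(long_peaks):.2f}")
print(f"latest observed peak, gone by 30:  {max(short_peaks)}")
```

Every player here ages identically by construction, yet keeping only the 10-season careers gives a later average observed peak than the full population, because the discarded players can only have peaked at the young ages they actually played.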


