THE BOOK--Playing The Percentages In Baseball


Sunday, April 22, 2007

Peak Offensive Age

By , 07:24 PM

Here is a blog entry about peak offensive age.  In it, the author references a study by Jim Albert, which looks at the issue using some advanced and complex (to me at least) statistical techniques.  Here is a post I wrote on BTF which also references the above-mentioned blog entry.  My post sums up my thoughts on the matter.  I was wondering what the many bright minds on this blog think of this issue.  As I say below, I think the answer (what is peak offensive age) really eludes us.  And it makes a difference when doing projections.  Age adjustments are important.


The study is interesting. I’m not sure I had read it. Maybe I had. Whether James said so or not, I think he was referring to offense only. Even if he meant offense and defense, at the time he had no good way to measure defense and I don’t think he knew that defense peaks so early for most positions (probably all save first base).

There are other (easier) ways to estimate average peak age, and to be honest I don’t fully understand Albert’s, although he seems to know what he is doing. The traditional sabermetric way is to compute a weighted average of all players’ differences in offensive rate between each pair of adjacent years and to see when that number starts to turn negative. IOW, we look at all players’ differences between age 22 and age 21. That should be positive on average. Same for 23 and 22. Etc. If we do that, we generally find that from 27 to 28, the difference (28 minus 27) is slightly negative and stays negative from then on, suggesting that 27 is the peak age.
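As a minimal sketch of how a peak age is read off the delta method: the deltas below are made-up illustrative numbers, not results from any study, and `peak_age_from_deltas` is a hypothetical helper name.

```python
def peak_age_from_deltas(deltas):
    """deltas maps age X to the average change in offensive rate from X to X+1.

    Returns the first age whose following-year delta is negative (the last
    age before the average decline starts), or None if no delta is negative.
    """
    for age in sorted(deltas):
        if deltas[age] < 0:
            return age
    return None


# Hypothetical average deltas (per PA), keyed by the younger age of each pair.
example_deltas = {
    21: 0.0040,   # age 22 minus age 21
    22: 0.0030,
    23: 0.0025,
    24: 0.0015,
    25: 0.0010,
    26: 0.0004,
    27: -0.0005,  # age 28 minus age 27: slightly negative
    28: -0.0012,
    29: -0.0020,
}

print(peak_age_from_deltas(example_deltas))  # 27
```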

The problem with that methodology is that if the league gets better every year, which we think it does, then that introduces a negative value into every one of those differences which has to be subtracted. For example, if the league gets better by .001 per PA per year and the difference from age 27 to age 28 is -.0005, after adjusting for quality of league, we get +.0005, suggesting that players are still getting better.
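The arithmetic in that example can be written out as a tiny sketch. The numbers are the ones in the paragraph above; the function name is hypothetical, and a constant yearly league improvement is an assumption of the example, not a measured value.

```python
LEAGUE_IMPROVEMENT_PER_YEAR = 0.001  # per PA, taken from the example in the text


def adjust_delta(raw_delta, league_drift=LEAGUE_IMPROVEMENT_PER_YEAR):
    """Add back the amount by which the league improved between the two seasons."""
    return raw_delta + league_drift


raw_27_to_28 = -0.0005               # observed age-28-minus-age-27 difference
adjusted = adjust_delta(raw_27_to_28)
print(round(adjusted, 4))            # 0.0005, i.e. still improving
```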

The other problem with the “delta” method above is that there is a selective sampling problem that exists in every age pair which also tends to make us understate peak age. Any time a player has a very unlucky (bad) year, he tends to get fewer PA’s or no PA’s the next year. So what does this mean? It means that the first year in ANY pair of years tends to be populated by players who were slightly lucky, again, tending to make the difference between year X+1 and year X more negative than it should be if we simply let every player play X number of PA from age 20 to age 40 (or whatever).
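A small simulation can illustrate the selective sampling effect. This is only a sketch under stated assumptions: every player has the same constant true talent (no aging at all), luck is normally distributed, and a player whose first-year rate falls below a cutoff gets no second year. The noise level, cutoff, and function name are all hypothetical.

```python
import random


def simulate_selection_bias(n_players=20000, true_talent=0.0, noise_sd=0.020,
                            cutoff=-0.010, seed=1):
    """Mean (year 2 - year 1) delta among players who get a second year.

    True talent never changes, so any nonzero mean delta comes purely from
    which players are allowed to form a year pair.
    """
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_players):
        year1 = true_talent + rng.gauss(0.0, noise_sd)
        if year1 < cutoff:
            continue  # unlucky year: no playing time next year, so no pair
        year2 = true_talent + rng.gauss(0.0, noise_sd)
        deltas.append(year2 - year1)
    return sum(deltas) / len(deltas)


mean_delta = simulate_selection_bias()
print(mean_delta)  # negative, even though nobody in the simulation ages
```

Setting `cutoff` to a very low value (nobody is dropped) should bring the mean delta back to roughly zero, which is the comparison case.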

I don’t know if Albert’s method had a similar bias (the selective sampling) and I don’t know how an increase in league strength every year affects Albert’s model either.

One thing about Albert’s model that I do know is that since he limits his data to players who have accumulated at least 5000 PA, we may have a sample of players who have gotten a little lucky at some point in their careers (I am not sure if this would be early or late) and we definitely have a sample of players who probably had a somewhat later than average peak (otherwise they would not have played that long). Whether this somewhat later than average peak is inherent to their true aging curve or whether it was by luck also, I don’t know.

As a few posters above have alluded to, of course, the whole thing depends on what sample of players we want to determine true peak age for. Is it players who play a long time? These are necessarily good players. Do good players have naturally late or early peaks? Are their true peaks different from other players? If different kinds of players have different natural peaks, and we want to answer the question, “What is the peak age for the average player?” do we want to weight our findings by playing time? By the number of players in each group? Do we want to know the average peak age of players who actually end up having some kind of career? Or do we want to include the peak age of players who fizzle out and don’t end up having a career (maybe they had natural early peak ages)? I don’t know the answers to these questions but they are important ones, and difficult to answer.

I honestly think the jury is still out as to the natural peak age of MLB players, however we want to define that group of players. I think it is probably earlier than Albert finds and later than I and others (e.g. Tango) have found in the past.

#1    Pizza Cutter      (see all posts) 2007/04/22 (Sun) @ 21:12

In my business, we call this sort of problem an attrition bias.  Bad players don’t ever get a chance to show what they can do at 27, because they get sent to AAA/Japan/the real world.  I wonder if the trajectory of the curve is related to overall ability.  (If we were modeling it like a projectile trajectory, the quadratic term… x^2… has a smaller number in front of it.)

I’m thinking out loud.


#2    Guy      (see all posts) 2007/04/22 (Sun) @ 21:38

One observation:  Albert is using a rate stat in his study (linear wts per PA), while James was using a measure of performance effectively weighted by playing time ("VAM").  It’s certainly plausible that players log more PT at ages 26-27 than at 28-30, on average.  So I think the poster at WOW is off base to assert that James was “wrong”, based on Albert’s work.  If one thinks peak should mean “season with most value”—certainly a plausible definition—James may well be right.


#3    Tangotiger      (see all posts) 2007/04/22 (Sun) @ 22:50

Here’s my study from several years ago:
http://www.tangotiger.net/archives/artAging.shtml

We have a huge attrition bias at work.  Go through my study, and you see the rather obvious: a guy with a two-year career will have his stats deteriorate.  Hardly makes sense if the stats are random, but makes perfect sense in a real sense: a guy who stinks it up the second year most likely had a run of bad luck.  Same thing, but to a lesser extent the three-year guy.. and the 4-yr, etc.  On the other hand, a guy with a 15-yr career means he was a bit luckier than his true talent should dictate… maybe not as injured, etc.

If you start talking about “the average MLB player”, that itself is also a selected sample.  What is an “average MLB player”? 

It should be a given that some people peak at 16, at 19, at 24, at 28, at 34, at 42.  Each person is different, and there is no “average MLB player”.  If you are talking about the average of each player, then, are you weighting the guys with more PA more?  There were 900 nonpitchers last year in MLB (30 per team), and a few thousand players in the last 10 years.  Are you weighting each guy equally?

There’s an enormous selection bias, attrition bias, and sampling bias.  Even the question itself is ambiguous.

To the extent that the question is: “When will Albert Pujols peak”, then clearly we don’t really care about the aging curve of Angel Berroa.  What you do care about is Hank Aaron, Mickey Mantle, maybe Cesar Cedeno, etc.  As well, you care about the Ryan Howards, guys who you don’t even know how good they really were, because they were kept down in the minors too long.  (Did you realize that Pujols and Howard are virtually the same age?)


#4    MGL      (see all posts) 2007/04/23 (Mon) @ 03:53

Well, in terms of projection algorithms (methodologies) which don’t use “similar players” as Pecota does, these questions are important to answer.

For example, if we simply have a 27 year old player who has been a full time player in the bigs for several years and he has a lwts rate of exactly zero in his age 27 year, a little less in his age 26 year, and a little less than that in his age 25 year, what do we project him at in his age 28 year?

I think we need to know what the aging curve looks like (without attrition biases, selective sampling biases and the like) for this kind of player.


#5    Peter Jensen      (see all posts) 2007/04/23 (Mon) @ 09:02

I am not sure why anybody would think that any generic aging studies would help that much in predicting any individual player’s aging curve. As the graphs in the Albert study showed there is just too much individual variation for a “one size fits all” approach to aging projection.  An appropriate weighted year system (on rates, not playing time) would get the upslope and downslope pretty close, with the majority of the error at the year after the peak age for the player.  That may be the best we can do for individual players.

I also think that there is some confusion in the first 2 or 3 years of major league play between age related (physical body changes) skill improvement and time of service related improvement due to experience.


#6    tangotiger      (see all posts) 2007/04/23 (Mon) @ 10:33

In addition to comments posted on the dberri blog entry that MGL linked to (including me), there is a “us v them” blog entry by Phil, along with commentary by me (Tom):
http://sabermetricresearch.blogspot.com/2007/04/another-academic-champion-of-peer.html


#7          (see all posts) 2007/04/23 (Mon) @ 12:58

By the way, I’d refer to my comments as “them vs. us” rather than “us vs. them”.  I have never seen anything to suggest that anyone in the sabermetric community is biased against academics.  We have nothing against “them,” and we welcome their work.  To me, good research is good research, and I don’t care if it comes from academics, non-academics, or space aliens. 

In the last BTN (http://www.philbirnbaum.com/btn2006-11.pdf), we had one article from an academic, one from non-academic professional sabermetricians, and one from—well, I have no idea what the third one does for a living, which I suppose kind of makes my point.  In the previous issue, we had an article from someone who turned out to be a high-school student.

Let the record show that it’s “them” academics complaining about the research conventions of “us” non-academics, and not the other way around.


#8    tangotiger      (see all posts) 2007/04/23 (Mon) @ 13:50

I agree with Phil.  Drop the title and the clothes, and we’re all the same.


#9    Rally      (see all posts) 2007/04/23 (Mon) @ 14:30

I agree with Phil as well, but it’s something I never would have thought about if I hadn’t read his post.

I really don’t care what academics think about our research methods.  I know a lot of people, academics and non, who don’t take sabermetric research seriously, mostly because they aren’t baseball nuts.  That’s fine.  I’ve found this small corner of the internet where people do take it seriously, and that’s all I need.


#10    Pizza Cutter      (see all posts) 2007/04/23 (Mon) @ 14:41

Something I often say to my students is “Never let someone talk down to you because they have more letters after their name.” Then again, I often tell my mother that the only reason I want to get a Ph.D. is that everything that I say is automatically right!  After all, I (will eventually) have a Ph.D.


#11    MGL      (see all posts) 2007/04/23 (Mon) @ 17:38

Peter, we have no idea (at least I don’t) that players have large individual variations in their own aging curves.  What you are seeing is merely very small samples of their own true aging curves.  Players may have very distinct curves and players may not.  Your argument could be used with respect to clutch hitting.  If we looked at everyone’s clutch hitting splits, we would find that “there is just too much individual variation in players’ clutch hitting splits to use a one size fits all model.” That would be wrong of course.  After doing the analysis, we find that there IS a one-size fits all model with respect to clutch hitting and that is that we can assume that everyone has around the same true split which is “zero.” If we didn’t know that we would come up with some erroneous conclusions about any given player’s true clutch hitting splits.

Same thing with aging curves.  First we have to determine whether there is indeed great individual variation in true curves (not sample ones).  I think that would be hard to do, but maybe someone like Andy or Pizza Cutter or Albert could do that.  Then we would apply that to our projection model.  But until that is ascertained it is OK (probably more than OK - necessary) to use a “one-size fits all” approach.  You don’t throw the baby out with the bath water, by not using ANY age adjustment just because individual players may have their own unique aging curves (which I admit that they probably do).  And just using a basic Marcel without age adjustments is NOT going to capture each player’s unique aging curves, assuming they do have unique curves, as there is WAY too much random noise in a player’s career trajectory.

And just for the record, it is not really important to determine whether players have unique individual skills with respect to a certain talent, but the MAGNITUDE (spread, variance, etc.) of the difference in that talent among the population.  If that spread is small, such as with clutch hitting and pitcher’s BABIP, then not only is it OK to use a “one-size fits all” methodology but we MUST use one (or regress the sample stats the appropriate amount).

And BTW, I agree that not enough is known about the role that experience and not simply age plays in a batter’s career trajectory.


#12    David Gassko      (see all posts) 2007/04/23 (Mon) @ 17:44

Mickey,

On individual aging curves: Have you read Chris Constancio’s essay in the THT 2007 Annual? He extended that research for the projections we made in the THT 2007 Season Preview, and it was quite fascinating.


#13    Pizza Cutter      (see all posts) 2007/04/23 (Mon) @ 18:04

The problem with any growth curve modeling is that you’re most often dealing with the development of _several_ underlying parameters.  I’m not a developmental psychologist by any stretch of the imagination, but I do read a lot of child development literature.  It’s not that players (or kids) have differing curves per se, it’s that we’re dealing with the interaction of several factors that lead to growth.  Baseball (and anything in life) is the sum of several abilities marshalled together (and some luck).

Consider, a rookie will grow physically and mentally through his experience in the league.  Physical growth without “learning the game” is a lot of raw power without much in the way of refinement (see Branyan, Russell).  Knowing the game but not having the physical gifts is equally problematic.  But, when both happen together, the player “develops.” With kids, we speak of moments where they just “get it” especially around language development.  Usually, it’s because one part of their development needed to work with language (semantic classification) finally catches up to their abilities in producing sounds.  It looks like a sudden change, but it’s just been a case of uneven development.

Now, the math behind that is a little beyond me right now (I went to a conference presentation on it last year).  I’d have to bone up a bit before I jumped into that.  Maybe over the summer…


#14    MGL      (see all posts) 2007/04/24 (Tue) @ 16:52

David, I re-read Chris’ chapter in the Annual.  While interesting, I would have liked to see it extended to a higher age, peak age determined, and more variables than just ISO.


#15    Joe Arthur      (see all posts) 2007/04/24 (Tue) @ 21:08

Although he did not emphasize this, one other inference to consider from the Albert study is that “peak age” may be historically contingent; different physical skills probably have different curves, and different eras have emphasized marginally different skill sets. Also, intelligence is a kind of skill, and more intelligent players presumably make better use of their increasing experience to leverage their purely physical skills. They may tend to reach their performance peak later than their purely physical peak (if there is a unitary physical peak). Is it only a coincidence that relatively more college players are in the game now and the peak age appears to be increasing? It would be interesting to use college attendance as a proxy for intelligence and isolate these players as a group; this might be possible from the Lahman database.

As an illustration of historical contingency from another sport, I found the age of almost every 100 meter world record setting (or record-tying) male in track since 1912. Until 1983, the average age of the record setter was 22.4 years, and only 3 of 49 were over 24; several were in their teens. Starting in 1983, the youngest of 16 record setters was 22.6 years old and the average age increased to 25.8 years. This change is certainly substantially due to the end of amateurism in the Olympics in this sport in about 1982, so that top athletes realistically could continue to compete after college. Steroids may also have played some role in improving the performance of older athletes [but Carl Lewis is thought to have been clean and was the oldest record setter in this period.]

Similar factors are present in baseball with different weighting; it has been noted before that improved pay has allowed most players to train year round instead of getting off-season jobs, starting perhaps in the 1980s.

Anyway I am skeptical that you can take fifty years or more of data and identify a fixed peak age for baseball.


#16    MGL      (see all posts) 2007/04/24 (Tue) @ 23:29

Joe, in Chris’ piece in the THT Annual, he found that college draftees (as opposed to high school and international) had a much steeper aging curve wrt ISO.  He theorized that it was because the high school players had more professional experience before age 21 (he looked at aging curves from 21 to 25) than college players.  It could be due to intelligence, as you theorize, or some combo, as always…


#17    tangotiger      (see all posts) 2007/04/25 (Wed) @ 16:41

Calculating age as season minus birth year, this is what I did.

1. Figured the wOBA for all players
2. Figured the wOBA for all seasons (excluding pitchers)
3. Subtract 1 from 2 for each player
4. Limited my pool of players to those with at least 300 PA each year, from age 23 to age 32 (10 seasons of 300+ PA)
5. Limited my pool of players to those born since 1895

That gives me a pretty small number (168 hitters).

6. Figured the age where each player had his personal peak wOBA:

Age n
18 1
19 1
20 7
21 7
22 6
23 9
24 5
25 16
26 15
27 16
28 13
29 9
30 7
31 11
32 9
33 6
34 11
35 4
36 5
37 4
38 1
39 2
40 1
42 2

The average of the above is 28.3 years old.  However, don’t forget I looked for guys with age 23-32 seasons.  There were many peak seasons that occurred after that age.  In essence, guys who stick around get to enjoy the chance of putting up peak seasons, while prior to age 23, a lot of those guys are putting up peak seasons in the minors, and don’t count here.
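The steps above could be sketched roughly as follows. This is not the code behind the chart: the record layout, the function name, and the toy numbers are hypothetical, wOBA is assumed to be already computed, "relative to league" is taken as player wOBA minus league wOBA, and only seasons meeting the PA minimum are treated as candidate peak seasons (the post does not say either way).

```python
from collections import defaultdict


def personal_peak_ages(seasons, league_woba, min_pa=300, first_age=23, last_age=32):
    """seasons: iterable of (player_id, birth_year, season_year, pa, woba).
    league_woba: dict of season_year -> non-pitcher league wOBA.

    Returns {player_id: age of best league-relative wOBA} for players with at
    least min_pa PA at every age from first_age through last_age.
    """
    by_player = defaultdict(dict)
    for player_id, birth_year, year, pa, woba in seasons:
        age = year - birth_year  # age = season minus birth year
        by_player[player_id][age] = (pa, woba - league_woba[year])

    peaks = {}
    for player_id, ages in by_player.items():
        qualifies = all(
            age in ages and ages[age][0] >= min_pa
            for age in range(first_age, last_age + 1)
        )
        if not qualifies:
            continue
        # Assumption: a peak season must itself meet the PA minimum.
        candidates = {a: rel for a, (pa, rel) in ages.items() if pa >= min_pa}
        peaks[player_id] = max(candidates, key=candidates.get)
    return peaks


# Toy data with a shortened 23-25 window so the example stays small.
toy_league = {1993: 0.330, 1994: 0.335, 1995: 0.330}
toy_seasons = [
    ("a", 1970, 1993, 500, 0.340), ("a", 1970, 1994, 550, 0.360),
    ("a", 1970, 1995, 520, 0.350),
    ("b", 1970, 1993, 500, 0.400), ("b", 1970, 1994, 100, 0.300),
    ("b", 1970, 1995, 520, 0.350),
]
toy_peaks = personal_peak_ages(toy_seasons, toy_league, first_age=23, last_age=25)
print(toy_peaks)  # {'a': 24}; player b misses the PA minimum at age 24
```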

If we only look at a player’s career as age 23 to 32, this is what we get as the peak performance:
Age n
23 12
24 9
25 25
26 20
27 20
28 18
29 15
30 15
31 21
32 13

The average is 27.6 years old.  And, as you can see, it’s a lot more uniform than the first chart (which suggests not a real peak as we’d think, but a wide plateau of hitting a peak). 
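Both averages can be re-derived from the two charts as frequency-weighted means. The counts below are copied from the charts; only the function and variable names are made up for the sketch.

```python
def mean_peak_age(counts):
    """counts: {age: number of players whose peak came at that age}."""
    total = sum(counts.values())
    return sum(age * n for age, n in counts.items()) / total


# First chart: peak anywhere in the career.
full_career = {
    18: 1, 19: 1, 20: 7, 21: 7, 22: 6, 23: 9, 24: 5, 25: 16, 26: 15, 27: 16,
    28: 13, 29: 9, 30: 7, 31: 11, 32: 9, 33: 6, 34: 11, 35: 4, 36: 5, 37: 4,
    38: 1, 39: 2, 40: 1, 42: 2,
}
# Second chart: peak restricted to ages 23 through 32.
window_23_32 = {
    23: 12, 24: 9, 25: 25, 26: 20, 27: 20, 28: 18, 29: 15, 30: 15, 31: 21, 32: 13,
}

print(sum(full_career.values()), round(mean_peak_age(full_career), 1))    # 168 28.3
print(sum(window_23_32.values()), round(mean_peak_age(window_23_32), 1))  # 168 27.6
```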

Simply put, a guy who peaks at age 35 or 37 likely had his second best peak at age 32 rather than at age 27. 

So, these “peak ages” have a habit of stretching out the peak period, if you look at a certain class of players.

***

Changing my criteria in step 4 to:
4. Limited my pool of players to those with at least one season of 300 PA in their career

I get 2786 players (includes guys with only one season of 300+ PA).  The average age is 27.7 years old.

Here is the average peak age by birth half-decade.  (I extended the birth year back to 1855):

Start End peakAge n
1855 1859 27.5 119
1860 1864 26.5 121
1865 1869 27.4 107
1870 1874 28.1 89
1875 1879 28.0 122
1880 1884 27.9 146
1885 1889 27.6 150
1890 1894 27.1 142
1895 1899 27.4 128
1900 1904 28.4 128
1905 1909 28.2 128
1910 1914 28.6 125
1915 1919 28.3 174
1920 1924 28.3 119
1925 1929 27.8 122
1930 1934 28.0 128
1935 1939 27.8 143
1940 1944 27.3 168
1945 1949 27.3 178
1950 1954 26.5 200
1955 1959 27.5 160
1960 1964 28.4 218
1965 1969 28.7 202

As you can see, the anomaly is the 1950-1954 birth period, which is when there was a run (no pun intended) on gazelles.

While the peak age in 1965-1969 birth year is an historic high (28.7 years old), it would fall right in line with 1900-1924 birth years.
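For anyone wanting to rebuild a chart like this, the half-decade grouping might look something like the sketch below. The input layout, function name, and toy values are hypothetical; the real chart uses the 2786-player pool described above.

```python
from collections import defaultdict


def peak_age_by_half_decade(players):
    """players: iterable of (birth_year, peak_age).

    Returns {(start, end): (mean peak age, number of players)} with buckets
    such as 1950-1954 and 1955-1959.
    """
    buckets = defaultdict(list)
    for birth_year, peak_age in players:
        start = birth_year - birth_year % 5
        buckets[(start, start + 4)].append(peak_age)
    return {
        key: (sum(ages) / len(ages), len(ages))
        for key, ages in sorted(buckets.items())
    }


# Toy input, not real players.
toy_players = [(1950, 26), (1954, 27), (1955, 28), (1969, 29), (1965, 28)]
toy_chart = peak_age_by_half_decade(toy_players)
for (start, end), (mean_age, n) in toy_chart.items():
    print(start, end, round(mean_age, 1), n)
```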

So, the first thing we should do is stop talking about “peak” as if it’s a mountain where the slopes are well-defined.  It’s more like a rolling hill, with a “peak” barely noticeable. 

The second thing is to stop talking about “guys peak later these days”.  It may very well be that the profile of a player these days is disproportionate to what it was 15 years ago.  That is, each profile of players might “peak” at their same age, say 25 for speedsters and 29 for power hitters, but since there are far more power hitters these days, this brings up the average.  And when the gazelles ruled the world, it brought down the average.
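To make the composition point concrete, here is a small sketch. The peak ages of 25 and 29 are the hypothetical ones from the paragraph above; the player shares in each era are invented purely for illustration.

```python
def average_peak_age(mix):
    """mix: {profile: (share of players, peak age)}; shares should sum to 1."""
    return sum(share * peak for share, peak in mix.values())


# Each profile peaks at the same age in both eras; only the mix changes.
gazelle_era = {"speedster": (0.50, 25), "power": (0.50, 29)}
slugger_era = {"speedster": (0.25, 25), "power": (0.75, 29)}

print(average_peak_age(gazelle_era))  # 27.0
print(average_peak_age(slugger_era))  # 28.0
```

The league-wide average moves by a full year even though no individual profile peaks any later.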


#18    tangotiger      (see all posts) 2007/04/25 (Wed) @ 16:53

Little bug in my last chart.  Conclusions don’t change, but here’s the data:
Start End peakAge n
1855 1859 27.9 119
1860 1864 26.3 121
1865 1869 27.3 107
1870 1874 27.9 89
1875 1879 28.4 122
1880 1884 27.8 146
1885 1889 27.5 150
1890 1894 27.6 142
1895 1899 27.8 128
1900 1904 28.1 128
1905 1909 28.4 128
1910 1914 28.7 125
1915 1919 28.1 174
1920 1924 28.4 119
1925 1929 28.1 122
1930 1934 28.2 128
1935 1939 27.7 143
1940 1944 27.3 168
1945 1949 27.9 178
1950 1954 26.9 200
1955 1959 27.7 160
1960 1964 28.5 218
1965 1969 28.8 202


#19    tangotiger      (see all posts) 2007/04/25 (Wed) @ 17:04

This is what I did:
1. For the player’s peak season, figured out his SLG relative to the league
2. A guy who was 90 points or more above the league is “Power”, a guy who was -36 points or worse relative to league is “Punk”, and -36 to +90 is “Rest”.

SLGclass Start End peakAge n
Power 1895 1969 28.8 521
Rest 1895 1969 28.0 1288
Punk 1895 1969 27.3 512

The “Start” and “End” refers to the birth year.  As you can see, your power hitters peak at a later age than your punks.
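The classification rule can be sketched as below. The thresholds come from step 2; the function name is hypothetical, and a "point" is assumed to mean .001 of SLG.

```python
def slg_class(player_slg, league_slg):
    """Classify a peak season by SLG relative to the league, in points (.001)."""
    diff_points = round((player_slg - league_slg) * 1000)
    if diff_points >= 90:
        return "Power"
    if diff_points <= -36:
        return "Punk"
    return "Rest"


print(slg_class(0.500, 0.400))  # Power (+100 points)
print(slg_class(0.360, 0.400))  # Punk (-40 points)
print(slg_class(0.420, 0.400))  # Rest (+20 points)
```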

Here’s guys born from 1940-1969:
SLGclass Start End peakAge n
Power 1940 1949 28.4 89
Power 1950 1959 28.5 79
Power 1960 1969 29.1 116

Power hitters are peaking a bit later, and there’s a lot more of them.

Punk 1940 1949 27.2 78
Punk 1950 1959 26.1 85
Punk 1960 1969 27.9 67

The number of punks born 1950-1959 actually exceeded the sluggers, but now there are twice as many sluggers as punks.  As you can see, the punks born 1950-1959 peaked very early (probably a lot of one-dimensional guys who couldn’t hack it in their later years).

The big takeaway is: don’t presume the composition of the league is similar in any given time period.  What you think you are seeing (later peaks) may be explained partially by the change in personnel.


#20          (see all posts) 2008/09/13 (Sat) @ 01:33

Part of designing my projections tool was to look at how closely the projections fit the following year. After looking at the rms for each component, I checked the delta to look for indications of an age based bias.

I ran all major league players (including pitchers) from 1954-2007, with no minimum PAs, using matched pairs that compare the same player in year 1 and year 2, weighting by the smaller number of plate appearances in the two years, and then checked each component separately. Including everyone maximized the sample size, which gave me some smooth curves.
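A rough sketch of that matched-pairs calculation for a single component follows. The record layout, function name, and toy numbers are hypothetical; the only detail taken from the description is the weighting by the smaller PA total of the two years.

```python
from collections import defaultdict


def weighted_deltas(seasons):
    """seasons: iterable of (player_id, age, pa, rate) for one component.

    Returns {age: weighted mean of rate(age + 1) - rate(age)}, where each
    same-player pair is weighted by the smaller PA total of its two seasons.
    """
    by_player = defaultdict(dict)
    for player_id, age, pa, rate in seasons:
        by_player[player_id][age] = (pa, rate)

    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for ages in by_player.values():
        for age, (pa1, rate1) in ages.items():
            if age + 1 not in ages:
                continue  # no matched pair for this season
            pa2, rate2 = ages[age + 1]
            weight = min(pa1, pa2)
            if weight > 0:
                numerator[age] += weight * (rate2 - rate1)
                denominator[age] += weight
    return {age: numerator[age] / denominator[age] for age in sorted(denominator)}


# Toy data: two players, each with an age-26 and an age-27 season.
toy = [
    ("a", 26, 600, 0.030), ("a", 27, 400, 0.040),
    ("b", 26, 200, 0.010), ("b", 27, 100, 0.000),
]
toy_deltas = weighted_deltas(toy)
print(toy_deltas)  # one entry, for the 26-to-27 pair
```

In the toy data player a gets weight 400 and player b weight 100, so the better-sampled pair dominates the average.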

Power numbers (home runs, doubles) peaked earlier than I thought, at about 25. I reran, filtering by career hr/(hr+bip), broken into five groupings. The higher the career hr%, the later in age the peak occurred, with > .08 peaking at age 29.

Which was the cause, and which was the effect? Did players peak later because they had a higher hr%, or did they have a higher hr% because they peaked later?

Thinking that some of the problem was an attrition bias, I set a minimum of 4000 career PAs, then reran the same hr% groupings. Now, the players at all hr% levels, even those < .025, peaked at 29-30.

Players who had a long career, but who hit homers at less than 60% of the league average did not stay in the majors due to their hr rate, but they still peaked at the same age as those players in all other hr% groupings.

As the long playing career and the hr% appear to be independent, is it safe to assume that this is a more accurate aging curve for hr%?

