THE BOOK--Playing The Percentages In Baseball


Tuesday, June 19, 2007

Basic Aging Curve for Hitters, 1957-2006

By Tangotiger, 04:09 PM

I updated my aging curves, this time for the time period 1957-2006.  Here’s the step-by-step process:


1. Select only nonpitchers
2. Total on: player, year, team (i.e., Abreu gets two records in 2006) ...27,497 records
3. Determine the main home park for each team, and append that to the record (Abreu gets Citizens Bank Park for one record, and Yankee Stadium for the other record)
4. For every pair of consecutive years, match on player and main home park.  (This is important for guys who play on the same team, but switch home parks, like Pujols) ...15,715 paired matches
5. Figure the age for each player as season minus birth year
6. For each match, give a PA weight based on this formula:
2 / (1/PA1 + 1/PA2)
7. For each age, sum up the PA weights of all players, along with their wOBA in the two matched years.
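
Steps 6 and 7 can be sketched in a few lines of Python. This is only a rough illustration, not the code behind the spreadsheet: the function names and the tuple layout are made up for the example, and it assumes that the "weighted wOBA" for an age is the average of each pair's wOBA weighted by that pair's PA weight.

```python
from collections import defaultdict

def pa_weight(pa1, pa2):
    """Step 6: harmonic mean of the PA in the two matched seasons."""
    return 2.0 / (1.0 / pa1 + 1.0 / pa2)

def aggregate_by_age(pairs):
    """Step 7 (assumed interpretation): weight each pair's wOBA by its PA weight.

    pairs: iterable of (age1, pa1, woba1, pa2, woba2), one tuple per matched pair.
    Returns {age1: (n, total_weight, weighted_woba1, weighted_woba2)}.
    """
    acc = defaultdict(lambda: [0, 0.0, 0.0, 0.0])
    for age1, pa1, woba1, pa2, woba2 in pairs:
        w = pa_weight(pa1, pa2)
        row = acc[age1]
        row[0] += 1           # n: number of matched pairs at this age
        row[1] += w           # PA: running total of the weights
        row[2] += w * woba1   # weighted sum of year-1 wOBA
        row[3] += w * woba2   # weighted sum of year-2 wOBA
    return {age: (n, w, s1 / w, s2 / w) for age, (n, w, s1, s2) in acc.items()}

# Illustrative, made-up pairs (not real players)
sample = [(25, 600, 0.350, 600, 0.360), (25, 600, 0.300, 200, 0.320)]
table = aggregate_by_age(sample)
```

The harmonic mean keeps a 600 PA / 200 PA pair from counting as much as a 600 PA / 600 PA pair: the first gets a weight of 300, the second a weight of 600.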

You get this Google Docs spreadsheet.

Legend:
Age1: the age of the first of the two matched seasons
n: number of players in the sample
PA: the total of the weighted PA
wOBA_1: the weighted wOBA at Age1
wOBA_2: the weighted wOBA at Age1+1
diff: the difference between the two wOBA; a positive number means performance improved
ratio: the odds ratio, which can be loosely described as wOBA_2/wOBA_1; a number above 1.00 means performance improved

I then added another column called wOBA_curve.  This value is based on the ratio (or diff), and allows us to create a single curve.  For example, at age 22, we see .332, and at age 23, we see .343.  That’s an 11 point improvement between 22 and 23.  If you look at the “diff” column for age 22, you’ll see a 10 point difference (consider it rounding error).  The same player, at age 23 (.343) becomes .345 at age 24, or a 2 point improvement.  The “diff” column for age 23 shows a 3 point improvement.

This is called Chaining, and it allows you to take the known differences of samples, and merge them onto a single player.
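
Here is a minimal sketch of chaining, assuming the curve is built by adding each age's "diff" to a running value (the post says the ratio could be used instead). The function name `chain_curve` is made up for the example, and the inputs are the rounded figures quoted above.

```python
def chain_curve(start_age, start_value, diffs):
    """Chain year-to-year differences onto a single curve.

    diffs[i] is the "diff" for Age1 = start_age + i, i.e. the change
    from age start_age + i to age start_age + i + 1.
    """
    curve = {start_age: start_value}
    age, value = start_age, start_value
    for d in diffs:
        age += 1
        value += d
        curve[age] = value
    return curve

# Age 22 starts at .332; the rounded diffs for ages 22 and 23 are +.010 and +.003
curve = chain_curve(22, 0.332, [0.010, 0.003])
```

With the rounded diffs, age 23 comes out at .342 rather than the .343 in the spreadsheet, which is the same rounding error noted above; age 24 lands on .345 either way.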

You will note what I did not do (yet): regression.  You will notice that the values in wOBA_1 constantly increase from age 22 through age 40.  However, starting from age 27, the PA goes down every year.  This is survivor bias: the only guys left in the sample are those guys deemed good enough to still be there.  And, teams obviously have an age bias.  That is, the guys at age 40 who are allowed to play at age 41 (only 39 of them) are BETTER than the guys at age 25 who are allowed to play at age 26 (1652 of them).  Of course, if we only looked at the 39 best 25-yr olds, they’d be far better than the 39 best 40-yr olds.

Anyway, back to regression.  As you get older, teams will more likely depend on how you performed recently than on any “true talent” skill you may still have.  So, there is an enormous selective sampling issue.  As you can see by the year 2 performances, every single age class from age 30 onwards performed at around the .333 wOBA level.  The old man dropoffs you see in the late 30s and early 40s are biased.  The year 2 performances are more indicative of the talent than the year 1 performances.

If you don’t apply some regression, what happens?  According to this basic chart, a 19-yr old is equivalent to a 34-yr old.  Seeing that there are 10 times more 34-yr olds, and 40 times more PA given to 34-yr olds, compared to 19-yr olds, that seems like a ludicrous statement to make.

I also didn’t adjust for the change in year-to-year environment, like the juiced ball or mound raised.  However, we don’t expect to have a disproportionate number of players at a certain age group in 1968 or 1987.  So, this chart would be unaffected. 

If there is a systematic bias, like the talent pool keeps increasing each year, and therefore, the difference in the paired ages captures not only the change in performance, but also the increase in talent, then this aging curve will make it seem like players age slightly faster than they actually do.

At some point, I’ll repeat this for each of the “skills” (like BABIP, K/BB ratio, HR/XBH, etc), and I’ll apply regression.  Until then…

#1    Tangotiger      (see all posts) 2007/06/19 (Tue) @ 16:47

I’m running the numbers for walks, and I get similar numbers as here:
http://www.tangotiger.net/agepatterns.txt

That is, the peak age for walks is 37 years old (unregressed).  If we allow for regression, I’d suspect the peak age for walks would be into the 40s.  However, once you hit 30, you get very very modest gains.

I suspect it’s a balancing act: your power develops in your 20s, and pitchers pitch you more carefully, and your walks shoot up.

As your power diminishes in your 30s, your walks should go down, but as you get older, you are more selective (smarter).  So, these two (the pitchers going after you more in your 30s, and the hitter being smarter as he ages) seem to cancel out.


#2    David Gassko      (see all posts) 2007/06/19 (Tue) @ 18:42

Tom,

I’m not sure I buy not adjusting for league averages. If scoring on the whole has gone up (relatively) consistently (IOW, there is a positive trend line) over the past 50 years, then I think that would impact your aging patterns, showing less of a decline as players get older than there actually is.


#3    tangotiger      (see all posts) 2007/06/19 (Tue) @ 19:02

You’re right, good point.


#4    tangotiger      (see all posts) 2007/06/20 (Wed) @ 00:28

I think I’ll take that back.  Suppose you do have a perfectly straight trend line from RPG of 3.00 in 1957 to 5.50 in 2006.  That’s a gain of +2.5 runs over 50 years, or +0.05 runs per year.

In the first year of your matched pair, the average run environment would be an average of:
(3.00 + 5.45)/2= 4.225 runs per game.
And in the second year, it would be:
(3.05 + 5.50)/2 = 4.275 runs per game

Because 1958 through 2005 appears in both the first year and second year of the matched pair, they cancel out.  All you have left is the very first (1957) and very last year (2006).
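
The cancellation can be checked with a short Python sketch. It assumes every season gets equal weight in the two averages, which is what the simple averaging above does; the function name `environment_gap` and the toy run environments are made up for the illustration.

```python
def environment_gap(rpg_by_year):
    """Average run environment of the second years minus that of the first years.

    Assumes consecutive seasons with no gaps and equal weight per season.
    Every interior season appears once on each side, so only the two
    end-points survive: gap == (last - first) / number_of_pairs.
    """
    years = sorted(rpg_by_year)
    first_years = years[:-1]    # seasons that can be year 1 of a pair
    second_years = years[1:]    # seasons that can be year 2 of a pair
    n = len(first_years)
    mean_first = sum(rpg_by_year[y] for y in first_years) / n
    mean_second = sum(rpg_by_year[y] for y in second_years) / n
    return mean_second - mean_first

# Straight line from 3.00 in 1957 to 5.50 in 2006 (50 seasons, 49 pairs)
linear = {1957 + i: 3.00 + i * 2.5 / 49 for i in range(50)}
gap_linear = environment_gap(linear)

# Same end-points, arbitrary middle: the interior seasons cancel out
flat_ends = {1957: 3.00, 1958: 4.20, 1959: 5.10, 1960: 3.00}
gap_flat = environment_gap(flat_ends)
```

For the straight line, the gap is 2.5/49, or about 0.051, which is the 0.05 above in round numbers. When the two end-points match, the gap is zero no matter what happens in between.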

And, if let’s say the year 2006 was a run environment of 3.00, to match our fictitious 1957’s 3.00, then, we have no worries.  The average run environment of both pairs would be identical.

***

In 1953, there were 4.75 RPG, and in 2006 there were 4.76.  Those are the two end-points I should have chosen, which would have addressed the changing environment.  Even so, using the actual 1957 figure (4.38) means there’s an average bump of +.0074 runs per game, or roughly +.0002 runs per PA.

I believe if I did redo my exercise, but scaled to the league average of the year, nothing would change.

However, this does not address the change in talent levels, since it presumes the talent levels are static.


#5    tangotiger      (see all posts) 2007/06/20 (Wed) @ 18:01

I ran the aging algorithm for every single event.  Rather than my usual binomial grouping like I’ve used in the past (which requires a certain sequencing of events), I just treated each one relative to PA.  Here are the results.  (Note: since walks peak later, it will have an effect on the totals you see below.  After all, if you walk, you can’t get a hit.)

PA per G: maxes at age 25, plateau of 24-29
R per PA: age 24 (23-27)
H per PA: age 23 (21-26)
2B: 26 (24-29)
3B: 21 (19-23)
HR: 26 (25-28)
RBI: 26 (24-29)
SB: 24 (22-27)
BB: 37 (29-43)
SO: 28 (25-31)
IBB: 38 (30-40)
HBP: 28 (25-29)
SH: 20 (19-21)
SF: (26-39)
GIDP: 20 (19-26)


#6    Peter Jensen      (see all posts) 2007/06/21 (Thu) @ 14:35

I was intrigued by the statement that you made above - “Of course, if we only looked at the 39 best 25 year olds, they would be far better than the 39 best 40 year olds.”

I wondered, why isn’t that a basis for an aging study?  Look at the best x number of players in each age group and compare the averages for each age.  So I decided to look at BB/PA (actually UBB/(PA-IBB-HBP-SH)) for the years 1961-2004, ages 25-35.  I took the top 8 players in BB/PA with 150 or more PAs for each age in each year, leaving out any year-age group that didn’t have 8 players.

Here are the results:

AGE   Avg walk rate   Percent of peak     N
25        .1119             93.2         352
26        .1142             95.1         352
27        .1194             99.4         352
28        .1201            100.0         352
29        .1199             99.9         352
30        .1178             98.1         352
31        .1170             97.4         352
32        .1129             94.0         352
33        .1088             90.6         344
34        .1056             87.9         312
35        .1058             88.1         216
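
For anyone who wants to try the same approach, here is a rough Python sketch of the pieces described above. It is an assumption-laden illustration, not the code used for the table: the function names are made up, and filtering to 150+ PA is left to the caller.

```python
def walk_rate(ubb, pa, ibb, hbp, sh):
    """Unintentional walks per PA, excluding IBB, HBP and SH: UBB/(PA-IBB-HBP-SH)."""
    return ubb / (pa - ibb - hbp - sh)

def top_n_average(rates, n=8):
    """Average of the n best rates in one year-age group.

    Returns None when the group has fewer than n qualifying players,
    so that the year-age group can be left out.
    """
    if len(rates) < n:
        return None
    best = sorted(rates, reverse=True)[:n]
    return sum(best) / n

def percent_of_peak(avg_by_age):
    """Express each age's average as a percentage of the best age."""
    peak = max(avg_by_age.values())
    return {age: round(100.0 * v / peak, 1) for age, v in avg_by_age.items()}

# Three rows from the table above
pct = percent_of_peak({25: 0.1119, 28: 0.1201, 35: 0.1058})
```

Feeding in the rounded averages from the table gives 93.2, 100.0 and 88.1 for ages 25, 28 and 35. Other rows can differ in the last digit, since the table was presumably built from unrounded averages.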

As you also mentioned above it is hard to separate true walk rate ability from slugging so this curve may be a truer reflection of slugging than walk rate.  But it is still interesting in that it peaks at the age that you would expect but overall is flatter than most aging curves that I have seen.


#7    Tangotiger      (see all posts) 2007/06/21 (Thu) @ 14:51

Very interesting results.

A couple of suggestions:

1. In the 26-29 age classes, you have more samples to choose from.  So, looking at the top 8 *samples* of a group of say 30 compared to say a group of 12 will have selective sampling issues.  I’d rather see you take say the top 3 or 4, or a small enough number, such that it represents, at most 25% of the given sample. In years where you have an age class where you don’t have at least 3 for each age class, drop all the age classes for that year.

Unfortunately, this may mean that you’ll be stuck with only age groups 25 through 32.  Still, that period itself will be interesting, since the contention is that players should always be increasing in walk rates.

2. Restrict it to players who were above average HR hitters over the previous two years.  Since walks and HR will follow closely, especially for the 30s crowd, I think it becomes important that you don’t have different types of players in each age group.  Then again, I think I may be biasing the sample with this restriction.  I’m not sure.


#8    Peter Jensen      (see all posts) 2007/06/21 (Thu) @ 15:08

If the aging curve had been steeper than usual at either end, I would be more concerned with the smaller pool of players that I was drawing from for both the earlier and later ages.  Since the pools got to be very small at ages 34 and 35, and in some years I was taking all batters that had more than 150 PAs, I would expect those years would have much more variation and consequently a much lower percentage of peak.


#9    Peter Jensen      (see all posts) 2007/06/23 (Sat) @ 14:30

We had a tornado here and I was out of power for 27 hours so this is a delayed response to some of the suggestions you made in post #7.

You do have more players in MLB that are ages 26-29 that get 150+ ABs than you do at age 35. Roughly 3 times as many.  However, that doesn’t necessarily mean that there is a larger player pool.  Remember that my methodology is measuring the top range of performance possible at each age.  As long as the skill being measured is an essential skill for getting to and/or staying in the major leagues, the best players at that skill are still going to be playing.

I am reluctant to cut the included players to the top 3 or 4 because I want to keep the N up to smooth out the aging curve.  However, I did try the top 4 as you suggested and the curve had the same shape as the top 8: a peak at ages 27, 28, 29 and gradually decreasing through age 35.

I was still concerned about the effect of slugging in walk rate so I decided to normalize for slugging by calculating the average slugging of the players at age 35 and looking at the walk rate of a selection of players aged 28 that had the same average slugging.  Here is the table for the combined slugging of the top 8 walk rate players at each age.

AGE    SLG
25    .416
26    .421
27    .430
28    .428
29    .419
30    .426
31    .426
32    .424
33    .420
34    .414
35    .424

Not anywhere near the difference that I expected!  The difference wasn’t worth trying to go through the normalization process.

So, I think I will stand by my aging curve as accurately reflecting the maximum walk rate capabilities that can be expected at any age 25-35.  Why is it different than curves generated by the delta method?  My curve is showing the limits of potential performance at different ages, whereas the delta method is giving the changes in actual performance of players at those ages.

The players that remain at ages 35+ were superior performers in their prime ages of 27-29.  They had high skill levels in a variety of basic physiological characteristics (eyesight, reaction time, arm strength, leg strength, hand-eye coordination, etc.) that allowed them to be superior.  These superior players had the option in their prime to forego trying to maximize their walk rate, as they tried to maximize their slugging and batting average instead.  As their performance in slugging and batting average started to decrease in their 30s, they still had the option to increase their walk rates to keep their overall production high enough to keep them in baseball.  The other players, starting with lower slugging and batting average and lesser skill levels overall, could not compensate enough to keep their overall performance high enough to keep them in baseball.  The result is that the delta method is measuring an increase in walk rate that is due to a change in hitting strategy rather than an actual increase in walk rate ability due to age.


#10    tangotiger      (see all posts) 2007/06/23 (Sat) @ 15:53

This is rather fascinating.

I suppose someone like Frank Thomas, who was Ted Williams-esque in his command of the strike zone basically as soon as he was born, could only get so much better in the pros.  And perhaps someone like Sammy Sosa could leverage his power to get more walks.

Ok, so what I can do is similar to what I did here:
http://www.tangotiger.net/SpeedLead.htm

And look for the Frank Thomases and his family members, and look for his “twins”, sans walk rates (I dunno, say, Vlad).  And see how their walk progressions move through the years.


#11    Tangotiger      (see all posts) 2008/10/02 (Thu) @ 09:39

Brian does the component work I had meant to do for this time period and reports high-level results:

http://mvn.com/mlb-stats/2008/10/02/aging-patterns-its-all-downhill-from-here/


#12          (see all posts) 2008/10/03 (Fri) @ 15:56

Good thread, I wasn’t here last year and hadn’t found it yet.

Concerning procedures:
What I do in Oliver is first take the player/team/year data and normalize it, then drop team and group by player and year to get a single record for each season.

My multi-season park factors do not mean to zero for each season. For example, if there’s a bunch of bandboxes, the league HR factor will be above 1. This makes it helpful to normalize for parks from one season to another. (I think I have an idea for next week’s column).

I am interested in the “how to do” of the wOBA curve - that’s something I’ve wanted to be able to do, and I haven’t figured out your procedure from the description in the article. Could you give me some additional detail?


#13    Tangotiger      (see all posts) 2008/10/03 (Fri) @ 16:24

Brian, see if this helps:

http://www.hardballtimes.com/main/article/fielding-aging-curves/


#14          (see all posts) 2008/10/03 (Fri) @ 22:24

Got it now, thanks

