Tuesday, June 19, 2007
Basic Aging Curve for Hitters, 1957-2006
I updated my aging curves, this time for the time period 1957-2006. Here’s the step-by-step process:
1. Select only nonpitchers
2. Total on: player, year, team (i.e., Abreu gets two records in 2006) ...27,497 records
3. Determine the main home park for each team, and append that to the record (Abreu gets Citizens Bank Park for one record, and Yankee Stadium for the other record)
4. For every pair of consecutive years, match on player and main home park. (This is important for guys who play on the same team, but switch home parks, like Pujols.) ...15,715 paired matches
5. Figure the age for each player as season minus birth year
6. For each match, give a PA weight based on this formula:
2 / (1/PA1 + 1/PA2)
7. For each age, sum up the PA weights of all players, along with their wOBA in the two matched years.
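Steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration, not the code used for the chart: the tuple layout, the function names, and the reading of step 7 as a PA-weighted average of wOBA are all assumptions.

```python
from collections import defaultdict


def pa_weight(pa1, pa2):
    """Harmonic mean of the two PA totals: 2 / (1/PA1 + 1/PA2)."""
    if pa1 <= 0 or pa2 <= 0:
        return 0.0  # a season with no PA carries no weight
    return 2.0 / (1.0 / pa1 + 1.0 / pa2)


def aggregate_by_age(pairs):
    """Sum the matched pairs by Age1.

    pairs: iterable of (age1, pa1, woba1, pa2, woba2) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    # per age: [n, total weight, weighted wOBA year 1, weighted wOBA year 2]
    totals = defaultdict(lambda: [0, 0.0, 0.0, 0.0])
    for age1, pa1, woba1, pa2, woba2 in pairs:
        w = pa_weight(pa1, pa2)
        t = totals[age1]
        t[0] += 1
        t[1] += w
        t[2] += w * woba1
        t[3] += w * woba2

    rows = {}
    for age1, (n, w, s1, s2) in sorted(totals.items()):
        if w == 0:
            continue  # nothing to average for this age
        rows[age1] = {
            "n": n,
            "PA": w,
            "wOBA_1": s1 / w,
            "wOBA_2": s2 / w,
            "diff": s2 / w - s1 / w,
        }
    return rows
```

The harmonic mean leans toward the smaller of the two PA totals: a 600 PA season matched with a 200 PA season gets a weight of 300, not 400.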
You get this Google Docs spreadsheet.
Legend:
Age1: the age of the first of the two matched seasons
n: number of players in the sample
PA: the total of the weighted PA
wOBA_1: the weighted wOBA at Age1
wOBA_2: the weighted wOBA at Age1+1
diff: the difference between the two wOBA; a positive number means performance improved
ratio: the odds ratio, which can be loosely described as wOBA_2/wOBA_1; a number above 1.00 means performance improved
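To show how an odds ratio differs from the plain ratio wOBA_2/wOBA_1, here is a small sketch. It assumes the textbook definition of an odds ratio (treating wOBA as a rate between 0 and 1); the exact formula behind the "ratio" column is not spelled out here, so treat this as one plausible reading.

```python
def odds_ratio(woba_1, woba_2):
    """Textbook odds ratio of two rates, each strictly between 0 and 1.

    Assumption: this is the standard definition, not necessarily the
    exact calculation behind the "ratio" column.
    """
    odds_1 = woba_1 / (1.0 - woba_1)
    odds_2 = woba_2 / (1.0 - woba_2)
    return odds_2 / odds_1


def simple_ratio(woba_1, woba_2):
    """The loose description: wOBA_2 / wOBA_1."""
    return woba_2 / woba_1
```

Both land on the same side of 1.00, so either one tells you whether performance improved; the odds ratio just sits a bit further from 1.00 than the plain ratio does.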
I then added another column called wOBA_curve. This value is based on the ratio (or diff), and allows us to create a single curve. For example, at age 22, we see .332, and at age 23, we see .343. That’s an 11 point improvement between 22 and 23. If you look at the “diff” column for age 22, you’ll see a 10 point difference (consider it rounding error). The same player, at age 23 (.343) becomes .345 at age 24, or a 2 point improvement. The “diff” column for age 23 shows a 3 point improvement.
This is called Chaining, and it allows you to take the known differences of samples, and merge them onto a single player.
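A minimal sketch of chaining, using the "diff" column. The function name and the dict layout are made up for the example, and the chart's wOBA_curve may have been chained with the ratio instead, which is one reason results can be off by a point.

```python
def chain_curve(start_age, start_value, diffs):
    """Build a single curve by adding each age's diff to the running value.

    diffs: dict mapping Age1 -> change in wOBA from Age1 to Age1+1.
    Chaining stops at the first age with no diff.
    """
    curve = {start_age: start_value}
    age = start_age
    while age in diffs:
        curve[age + 1] = curve[age] + diffs[age]
        age += 1
    return curve


# Start at .332 for age 22, then apply the 10 point and 3 point diffs
# quoted above for ages 22 and 23.
curve = chain_curve(22, 0.332, {22: 0.010, 23: 0.003})
```

With those inputs the chain goes .332, .342, .345, within a point of the wOBA_curve values quoted above.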
You will note what I did not do (yet): regression. You will notice that the values in wOBA_1 constantly increase from age 22 through age 40. However, starting from age 27, the PA goes down every year. This is survivor bias: the only guys left in the sample are those guys deemed good enough to still be there. And, teams obviously have an age bias. That is, the guys at age 40 who are allowed to play at age 41 (only 39 of them) are BETTER than the guys at age 25 who are allowed to play at age 26 (1652 of them). Of course, if we only looked at the 39 best 25-yr olds, they’d be far better than the 39 best 40-yr olds.
Anyway, back to regression. As you get older, teams will more likely depend on how you performed recently than on any “true talent” skill you may still have. So, there is an enormous selective sampling issue. As you can see by the year 2 performances, every single age class from age 30 onwards performed at around the .333 wOBA level. The old man dropoffs you see in the late 30s and early 40s are biased. The year 2 performances are more indicative of the talent than the year 1 performances.
If you don’t apply some regression, what happens? According to this basic chart, a 19-yr old is equivalent to a 34-yr old. Seeing that there are 10 times more 34-yr olds than 19-yr olds, and 40 times more PA given to them, that seems like a ludicrous statement to make.
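For readers who have not seen it, regression toward the mean is often written as a simple shrinkage formula. The sketch below is that generic form only, not the regression that will eventually be applied here; the function name, the population mean, and the constant k are placeholders.

```python
def regress_to_mean(observed, pa, population_mean, k):
    """Shrink an observed rate toward a population mean.

    k is the number of PA of "average" performance mixed in.
    Both k and population_mean are placeholder inputs for
    illustration; no particular values are implied.
    """
    return (observed * pa + population_mean * k) / (pa + k)


# Made-up numbers: a .400 wOBA in 100 PA, pulled toward a .330 mean
# with k = 100, lands halfway between the two.
estimate = regress_to_mean(0.400, 100, 0.330, 100)
```

The fewer PA behind the observed number, the harder it gets pulled toward the mean, which is why the year 1 numbers of selected survivors overstate their talent.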
I also didn’t adjust for the change in year-to-year environment, like the juiced ball or the raised mound. However, we don’t expect to have a disproportionate number of players at a certain age group in 1968 or 1987. So, this chart would be unaffected.
If there is a systematic bias, like the talent pool keeps increasing each year, and therefore, the difference in the paired ages captures not only the change in performance, but also the increase in talent, then this aging curve will make it seem like players age slightly faster than they actually do.
At some point, I’ll repeat this for each of the “skills” (like BABIP, K/BB ratio, HR/XBH, etc), and I’ll apply regression. Until then…
I’m running the numbers for walks, and I get similar numbers as here:
http://www.tangotiger.net/agepatterns.txt
That is, the peak age for walks is 37 years old (unregressed). If we allow for regression, I’d suspect the peak age for walks would be into the 40s. However, once you hit 30, you get very very modest gains.
I suspect it’s a balancing act: your power develops in your 20s, and pitchers pitch you more carefully, and your walks shoot up.
As your power diminishes in your 30s, your walks should go down, but as you get older, you are more selective (smarter). So, these two (the pitchers going after you more in your 30s, and the hitter being smarter as he ages) seem to cancel out.