THE BOOK--Playing The Percentages In Baseball


Monday, December 20, 2010

THT Forecasts

By Tangotiger, 02:10 PM

Tim Lincecum:

Year     Org     Age     IP     ERA     WHIP     H     K     BB     HR     K/9     BB/9     HR/9     WAR
2011     SFN     27     217     3.26     1.21     186     243     77     17     10.1     3.2     0.7     5.2
2012     SFN     28     214     3.50     1.26     193     236     77     18     9.9     3.2     0.8     4.6
2013     SFN     29     212     3.65     1.29     197     229     77     19     9.7     3.3     0.8     4.2
2014     SFN     30     211     3.81     1.32     202     222     77     19     9.5     3.3     0.8     3.8
2015     SFN     31     209     4.02     1.36     208     214     77     20     9.2     3.3     0.9     3.2
2016     SFN     32     206     4.32     1.42     215     204     78     22     8.9     3.4     1.0     2.5

First thing I focused on was the WAR column.  You guys already know what my expectations were, and, basically, they were satisfied as being reasonable prima facie (i.e., passed the sniff test). 

Now, I don’t know that I agree with how Brian got there.  A pitcher forecast for 217 innings in 2011 would also be forecast for 206 innings in 2016?  I’ll say that I haven’t looked too much into this issue, but my rule of thumb is the Rule of 10, meaning a 10% drop in IP year-to-year.  Now, maybe for the elite, like Lincecum, and at such a peak age like 27, it should be more like 5% or even 3% perhaps.  The ERA, however, is climbing by about 0.20 runs per game each year, whereas my Rule of 10 would have said 0.10 runs per game each year.  So, basically, Brian’s overaggressiveness in one aspect and underaggressiveness in another aspect (relative to my expectations) cancel out to give us what we see.
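
To put numbers on that comparison, here is a minimal sketch that starts from the 217 IP forecast for 2011 and applies a constant year-to-year decline. The function name and the constant-rate assumption are illustrative only; this is not Brian's model.

```python
def project_ip(start_ip, annual_drop, years):
    """Apply a constant fractional drop in IP each year (an assumed simplification)."""
    return [round(start_ip * (1 - annual_drop) ** y) for y in range(years + 1)]

# 2011 forecast of 217 IP carried through 2016 under three decline rates.
for drop in (0.10, 0.05, 0.03):
    print(f"{drop:.0%} drop:", project_ip(217, drop, 5))

# Constant annual drop implied by the forecast going from 217 IP (2011) to 206 IP (2016).
implied_drop = 1 - (206 / 217) ** (1 / 5)
print(f"implied drop: {implied_drop:.1%}")
```

Under a straight Rule of 10 the 2016 figure lands near 128 IP, against 206 in the forecast, which works out to roughly a 1% drop per year.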

Nonetheless, I subscribed, partially to support THT and partially to support Brian, both of which I love (as a saberist anyway), and knowing that Brian is always willing to tinker with his model in face of our suggestions.

Ideally, Fangraphs, THT and BPro would merge into one, but we’re not there.  Until then, I’m spreading the love.


#1          (see all posts) 2010/12/20 (Mon) @ 14:30

Following a tangent from your last paragraph…

Ideally, Fangraphs, THT and BPro would merge into one, but we’re not there.

Ideally, because?


#2    Sky      (see all posts) 2010/12/20 (Mon) @ 14:33

I thought the Giants’ 2011 IP forecasts were high in general—all five starters with 200-220 IP.  All round numbers, so I wonder exactly how playing time estimates are done.

--- ---

Unrelated interesting tidbit: Initial team forecasts have NL with win advantage, 82.5 to 78.5 (ish).  Some FA signings to come, but that’s a solid talent swing.


#3          (see all posts) 2010/12/20 (Mon) @ 14:40

Sky/2, if it’s like last year, the playing time assignments on the depth charts for 2011 are done with human input.  For hitters, it was assigning a percentage of playing time at a position or positions, and for pitchers it was splitting up 1440 innings however you wanted among the pitchers on the team.  We updated the depth charts on a weekly basis during the season.

I’m sure Brian or David can let us know if they’ve changed anything about that for this year.


#4    Sky      (see all posts) 2010/12/20 (Mon) @ 14:46

Thanks, Mike.

That implies the “conservativeness” of playing time projections will vary from team to team.  Could make a big difference if Team X’s “Tim Lincecum” only gets 180 IP.


#5    David      (see all posts) 2010/12/20 (Mon) @ 15:05

Starlin Castro begins next season at the age of 21

2011 WAR: 1.2
2012: 1.8
2013: 2.0
2014: 2.1
2015: 2.1
2016: 1.9

I’m not sure I agree with 2011, but ignoring that, the improvement to 2012 seems fine, but what happens as he’s entering his age 23 season?  Shouldn’t we be seeing him continue to improve until he’s 27 or 28?  He’s at his peak at 23 to 25 and declining at the age of 26.


#6    Tangotiger      (see all posts) 2010/12/20 (Mon) @ 15:10

Mike: well, ideally for me, at least.

***

Generally speaking, whenever you have duplication of effort, mergers will remove that amount of waste.

And, there’s probably more impact from synergy by combining forces than there is by getting impact from competition-driven improvements.

On the other hand (arguing against my ideal scenario), if the marketplace can support three sites that highly overlap, then there’s more money to be made by staying separate.


#7    studes      (see all posts) 2010/12/20 (Mon) @ 15:16

Sky, Mike is right about playing time.  It’s set by team “experts” from the Internet, not by a computer algorithm.  They are then updated weekly during the year, based on injuries, etc.

Tango and I have talked about the good vs. bad of having separate sabermetric sites.  I think separate sites are good for everyone involved--competition and freedom of expression, etc.


#8    Tangotiger      (see all posts) 2010/12/20 (Mon) @ 15:21

David/5: no, you can’t look at WAR on its own like he’s “declining”.  WAR is talent x playing time.

He could very well be increasing in talent while decreasing in playing time.  Injuries and other forms of attrition need to be included.

Indeed, what you are showing is kind of what I would expect “on average”.


#9    David Gassko      (see all posts) 2010/12/20 (Mon) @ 15:22

I believe that Brian is still working on playing time aging curves for pitchers that he is comfortable with. The hitters he’s already done. So the simple answer is that the IP numbers for years beyond 2011 still need some tweaking, and will presumably go down once Brian figures out a curve that he is comfortable with. Of course, Brian can give you a much better answer himself if he reads this post.


#10    Rally      (see all posts) 2010/12/20 (Mon) @ 15:23

"He’s at his peak at 23 to 25 and declining at the age of 26.”

That’s not unique to Castro, look around at every young player you can find and you’ll see a peak age about 24.  On aging curves, Brian is the anti-JCB.


#11          (see all posts) 2010/12/20 (Mon) @ 17:18

As one of the “experts” (Royals) in the forecasts, it is tough to really guesstimate time right now.  Especially for non-starters and players in position battles. 

I like the method and the numbers can change as the season goes on.  So if Timmy gets hurt, we are to go back and adjust his playing time on a regular basis.


#12    MGL      (see all posts) 2010/12/20 (Mon) @ 19:17

There should be no reason why the peak age for virtually all offensive players is not 26-28.  Rate-wise, that is.  I have never done or seen any research on playing time aging curves.


#13    David      (see all posts) 2010/12/20 (Mon) @ 19:43

I realize playing time matters, but it’s basically the same for Castro for those years. 

Year: PA  wOBA
2011: 525 .308
2012: 564 .319
2013: 558 .323
2014: 548 .325
2015: 551 .324
2016: 547 .321

So 24 is still his peak age in terms of wOBA.  22 is his peak age in playing time. 
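
A quick check of those two peaks, as a minimal sketch using only the PA and wOBA figures listed above (the helper name is made up for illustration):

```python
# Castro's forecast from the table above: year -> (PA, wOBA).
forecast = {
    2011: (525, 0.308),
    2012: (564, 0.319),
    2013: (558, 0.323),
    2014: (548, 0.325),
    2015: (551, 0.324),
    2016: (547, 0.321),
}
AGE_IN_2011 = 21  # from comment #5: Castro begins 2011 at age 21

def peak_age(column):
    """Return the age at which the chosen column (0 = PA, 1 = wOBA) is highest."""
    best_year = max(forecast, key=lambda yr: forecast[yr][column])
    return AGE_IN_2011 + (best_year - 2011)

print("peak PA age:", peak_age(0))
print("peak wOBA age:", peak_age(1))
```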

I’ve always understood the aging curve to mean that a player’s best wOBA, or some other similar metric, came at 26, 27 or 28.  Is this not true for players that break into the big leagues at the age of 20?  If so, is there an article I can read about why that is?

Thanks.


#14    Brian Cartwright      (see all posts) 2010/12/20 (Mon) @ 19:55

As has been said above, I have an algorithm for estimating future playing time that’s much better worked out for the hitters than pitchers. First I do a weighted mean of the past three years. (That’s about as far as it goes for pitchers). Then I apply a regression formula to get a better fit for the next season, then finally another multiplier based on how good the player is relative to other players in his league. The best guys get bumped up, the worst go down.
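
The three steps described above (weighted mean, regression, quality multiplier) could be sketched roughly as follows. Every number and functional form here is a hypothetical placeholder: the 5/4/3 weights, the regression slope and intercept, and the shape of the multiplier are assumptions for illustration, not the values used in the actual forecasts.

```python
def weighted_mean_pa(pa_last3, weights=(5, 4, 3)):
    """Weighted mean of the last three seasons, most recent first.
    The 5/4/3 weights are an assumption, not the real values."""
    return sum(w * pa for w, pa in zip(weights, pa_last3)) / sum(weights)

def regress(pa, slope=0.8, intercept=60.0):
    """Linear regression step with a hypothetical slope and intercept."""
    return intercept + slope * pa

def quality_multiplier(player_rate, league_rate, strength=0.5):
    """Bump playing time up for above-average players, down for below-average.
    The functional form and strength are assumptions."""
    return 1.0 + strength * (player_rate - league_rate) / league_rate

def estimate_pa(pa_last3, player_rate, league_rate):
    base = weighted_mean_pa(pa_last3)
    return regress(base) * quality_multiplier(player_rate, league_rate)

# Same playing-time history, different quality relative to the league.
good = estimate_pa((600, 580, 550), player_rate=0.360, league_rate=0.330)
poor = estimate_pa((600, 580, 550), player_rate=0.300, league_rate=0.330)
print(round(good), round(poor))
```

With identical histories, the better hitter ends up with more projected PA, which is the "best guys get bumped up, the worst go down" behavior described in the comment.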

The numbers I provide are in the six year forecast. The current season playing time is provided by the stringer, er I mean team expert employed by THT who does the MLB depth chart, dividing up the available PA’s and IP’s as they see fit.


#15    Brian Cartwright      (see all posts) 2010/12/20 (Mon) @ 20:01

My aging curves were discussed here a few months ago. I used MLB, minor league and Japan, looking at players who were in the same league in consecutive years. As David has shown with Castro’s numbers in #13, the curve peaks at 25 for both batters and pitchers, but is rather flat from 23 to 27, especially for pitchers.

Some players continue to improve until 27 or later - these guys also then get to a higher peak, and are more likely to be in MLB. Those who peak before 25 likely never get to a level of play high enough to make it to MLB. Now, if we can figure out how to predict any given player…


#16    Brian Cartwright      (see all posts) 2010/12/20 (Mon) @ 20:11

Some things I am doing new this year -

1. Regression amounts have been increased slightly. I ran best fit test on each category for batters and pitchers, comparing regressed projections to the next season’s MLE.

2. Players are now regressed to the mean MLE of players with their same age, position and level, instead of just level. So a 19 year old in Double A will be regressed to a higher mean than a 21 year old in A, and a 1B will be regressed to a more power hitting mean than a SS.

3. I am calculating and applying age effects based on the player’s exact, decimal age. Before, a player born 1 Aug 1990 would be 19 in 2010, while another born a day earlier, 31 Jul 1990, would be 20. That would give the player born on Aug 1 a full year of extra development. Now the Aug 1 player is considered 19.999, while the one of July 31 is 20.001 - a day different. I also applied some LOESS type smoothing to the curves before they are applied.
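
The difference between whole-year and decimal ages in item 3 can be illustrated with a small sketch. The July 31, 2010 reference date is an assumption chosen to match the example birthdays, and dividing by 365.25 is an approximation; the actual cutoff and formula used in the forecasts are not stated.

```python
from datetime import date

def integer_age(birth, ref):
    """Whole-year age on the reference date (the old approach)."""
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)

def decimal_age(birth, ref):
    """Exact age in years; 365.25 days per year is an approximation."""
    return (ref - birth).days / 365.25

REF = date(2010, 7, 31)      # assumed cutoff date, chosen to match the example
older = date(1990, 7, 31)
younger = date(1990, 8, 1)

print(integer_age(older, REF), integer_age(younger, REF))
print(round(decimal_age(older, REF), 3), round(decimal_age(younger, REF), 3))
```

The whole-year ages differ by a full year, while the decimal ages differ by only one day's worth.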


#17    Tangotiger      (see all posts) 2010/12/20 (Mon) @ 20:26

There should be no reason why the peak age for virtually all offensive players is not 26-28.  Rate-wise, that is. 

My expectations would be different.  I would say the better you are the later you peak, that the great players probably peak at age say 28.  And the worse you are, the earlier you peak, maybe as early as age 23 or 24.

This is of course hard to validate, without getting into a selection bias issue.

You’ve got SOME evidence for this.  If you look at K rates for pitchers, and the SB or 3B rates for hitters (you know, more indicative of pure physical tools), those peak early, like at age 23.  You also have to figure that there are a lot of players who simply don’t learn (Nuke LaLoosh) for whatever reason.  And so, as their physical tools go away, they don’t improve on the things they should be learning (which would lead to, say, better walk rates).  Who knows, maybe Jeff Francoeur is exactly that kind of player.

So, I’d be more inclined to say the status quo should be that great players peak later, poor players peak earlier, and come up with some reasonable age range of say 23 to 29 to explain that, rather than saying the peak is 26-27 overall.

I have never done or seen any research on playing time aging curves.

I must have posted this at some point no?  I thought I did.  Well, I do remember I did it for good pitchers who were 29 years old, and they lost 10% of their IP year over year, hence my Rule of 10.


#18    Ed D.      (see all posts) 2010/12/20 (Mon) @ 23:03

mgl/12, this is hidden behind a subscriber firewall, but I completed a series of aging curves (delta method—no correction for survivor bias at individual component level) for various hitting skills indicators for BaseballHQ a few weeks ago and started with a playing time curve.  Games played per season “typically” seems to peak right along with performance from age 27-29, with a year-over-year delta of about +10 games through age 26 and -10 games from age 31 (it isn’t linear, but I’m simplifying here).

Plate appearances per game also peaks at the same time with year-over-year deltas of +/-0.03 PA/G leading up to and tailing off from the age 27-29 peak.

I did not separate the sample into any cohorts (e.g., good hitters vs. bad hitters, fast vs. slow runners) for this particular set of curves.

Link: http://www.baseballhq.com/members/news/research/ra101203.shtml


#19    MGL      (see all posts) 2010/12/21 (Tue) @ 01:18

Ed, how did you weight each “pair” when using the delta method?

E.g., player 1 at age 26 has a .300 BA in 100 AB and then at age 27, he has a .310 BA in 500 AB. 

Player B at age 26 has a .230 BA in 300 AB and then a .240 BA at age 27 in 30 AB.

What do you get for the “delta” from age 26 to 27?


#20    Ed D.      (see all posts) 2010/12/21 (Tue) @ 09:35

mgl/19, in general I used the harmonic mean of PA within each pair for weighting, but for the games per season curve I left it unweighted (didn’t see any other way to do it since the denominator was effectively 1) and for PA/G I weighted it by games played.

In your example, technically I would have used PA not AB as weights (each skill that I looked at had a different denominator so I wanted to be consistent across my analysis), but the 26-27 delta would be +.010 in BA (10 points) in raw terms or +3.58% in relative terms.  In general, I used raw terms when constructing the curves (some of my metrics were already expressed in % terms and I did not want to do % of %).
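
The harmonic-mean weighting can be checked against MGL's two-player example from #19. This is a minimal sketch that uses AB as the weight, as in the example (the comment notes PA was used in practice), and it reproduces the 3.58% relative figure.

```python
def harmonic_mean(a, b):
    return 2 * a * b / (a + b)

# (rate at age 26, opportunities at 26, rate at age 27, opportunities at 27),
# taken from the two-player example in #19 (AB used as the weight here).
pairs = [
    (0.300, 100, 0.310, 500),
    (0.230, 300, 0.240, 30),
]

weights = [harmonic_mean(n1, n2) for _, n1, _, n2 in pairs]
raw = [r2 - r1 for r1, _, r2, _ in pairs]
relative = [(r2 - r1) / r1 for r1, _, r2, _ in pairs]

def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

raw_delta = weighted_average(raw, weights)
relative_delta = weighted_average(relative, weights)
print(f"raw delta: {raw_delta:+.3f}")
print(f"relative delta: {relative_delta:+.2%}")
```

The harmonic mean keeps a pair from counting for much when either season is tiny: the 300 AB / 30 AB pair gets a weight of about 55, well below its average of 165.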


#21    Guy      (see all posts) 2010/12/21 (Tue) @ 15:15

"There should be no reason why the peak age for virtually all offensive players is not 26-28.  Rate-wise, that is.”

I don’t think we know this is true, and I doubt that it is.  But selective sampling of various kinds makes it hard to sort out. If you selected players who played full-time at age 23, and then compared wOBA in their age 24 and 27 seasons, I’d guess you would find very little offensive improvement.  Factor in speed and defense, and they might even decline. (WAR/PA might be a good way to evaluate total production.)

Players are given more playing time up through age 28 or so.  And great long-career players may continue to improve through age 28, as Brian speculates (though I’m not sure).  But on average, I don’t think players improve much after age 23 or so.


#22    MGL      (see all posts) 2010/12/21 (Tue) @ 17:40

If you selected players who played full-time at age 23, and then compared wOBA in their age 24 and 27 seasons, I’d guess you would find very little offensive improvement.  Factor in speed and defense, and they might even decline. (WAR/PA might be a good way to evaluate total production.)

Players are given more playing time up through age 28 or so.  And great long-career players may continue to improve through age 28, as Brian speculates (though I’m not sure).  But on average, I don’t think players improve much after age 23 or so.

It depends on how you define your population.  I’ve talked about this before.  If you generate a curve using the delta method, with or without survivor bias compensation, you will find a fairly steep rise up until age 26 or so.  However, that is NOT the curve of any one player.  That includes players whose first year is age 25 and also play the next year, first year is age 28 and also plays the next year, etc.

That is different from, say, only looking at players who play full-time at age 23, as in your example - different pools of players, with potentially different true aging curves.

So when we say, “the average aging curve of the average MLB player,” that statement or construct is meaningless.  The only meaningful statement in practical terms, is a more specific one, like, “the aging curve for the average player who first plays at age 23,” or, “at age 26.” And even then, you might get different curves depending upon whether that first year is full or part time, or whether the player is a good, average, or bad player.  And in order to deduce those curves, you will run into some pretty severe selective sampling issues.

Sometimes the concept of “average” anything is not very useful, instructive, or illuminating.

For example, you said this:

But on average, I don’t think players improve much after age 23 or so.

Now, that statement could be right or wrong, depending on what you mean, by “on average” or even, “players.”

Using the delta method, and chaining all the gains and losses from one age to the next from age 23 to age 27, but with lots of different players, some only playing for two years at various ages, the “average gain from age 23 to age 27” is 7-8 runs per 500 PA, not an insignificant number.

So, by that definition of an “average aging curve from age 23 to age 27,” I think your statement is clearly wrong.

But, if we interpret your statement to mean, “the average player who plays from age 23 to age 27,” much like how JC generates his curves, we are probably going to get a very different curve, since we now have players who are of a very different ilk from the “average MLB player” (they are called up at an early age, so are likely very good prospects, and they play from age 23 to 27 at least, so they are likely players who are good, healthy, and a little lucky).

In general (and usually, but not always), I think that the delta method is best used for most real-time analyses, since we obviously never know the career trajectory of a player before the fact. So, if I have a 23 year old player and I want to project him as a 24 year old, I MUST use the delta method to do the age adjusting.  And that (the delta method) tells me that the average (all) 23 year old player gains 4 runs per 500 PA from age 23 to age 24, again, rendering your (Guy’s) statement (that there is very little gain from age 23 to peak age) incorrect.

Now, if I wanted to estimate his gain (or loss) from age 23 to age 27 (such as in a 5-year projection), then I MAY have to assume that he is going to play from age 23 to age 27 in which case, the aging curve may be different.
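
The "chaining" of year-to-year deltas described in this comment amounts to a running sum. In the sketch below, only the 23-to-24 gain of 4 runs per 500 PA comes from the comment; the other three deltas are hypothetical placeholders chosen so the chain lands inside the quoted 7-8 run range.

```python
from itertools import accumulate

# Year-to-year gains in runs per 500 PA. Only the 23->24 value (4 runs) comes
# from the comment above; the other three are hypothetical placeholders.
deltas = {(23, 24): 4.0, (24, 25): 2.0, (25, 26): 1.0, (26, 27): 0.5}

# Running total of the deltas gives the curve relative to age 23.
ages = [23] + [later for _, later in deltas]
curve = dict(zip(ages, accumulate(deltas.values(), initial=0.0)))

for age, runs in curve.items():
    print(age, f"{runs:+.1f}")
```

Each delta can come from a different set of players, which is why the chained curve is not the curve of any one player.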


#23    Guy      (see all posts) 2010/12/22 (Wed) @ 12:44

MGL:  can you post a link to your prior delta method analysis/analyses?

I think you’re underestimating the selective sampling problem inherent in the delta method.  People worry a lot about the problem of unlucky players not returning, while lucky players do return, which creates a downward bias in the delta method.  But there is a very large bias the other way as well:  players who are allowed to return despite a poor or mediocre performance are players who are expected to improve (based on prior track record, scouting, etc.).  If a 23 yr-old with a .305 wOBA plays at age 24, it is only because his team has good reason to think he is much better than his age 23 performance.  So it doesn’t surprise me at all that your delta method shows improvement from year to year. 

For the delta method to provide convincing evidence that 23-yr-olds do get better, you MUST also include the minor-league performance of those not in MLB at age 24. (Wouldn’t it be great if using CAPS actually made our statements true?) That should be available for nearly all players, as it would be unusual for a player to be good enough to play in majors at age 23 yet not be in professional ball at age 24.  Perhaps you used minor league data as well as MLB performance, but I’m guessing not.

Fundamentally, I don’t think you can estimate age curves for young players in general without using minor league data, as Brian has done.  I don’t know all the details of this methodology so can’t comment on that, but at least he’s looking at the right population.  And I don’t think it’s a coincidence that he estimates a much younger peak.

I also think that my suggested method provides a very good way to avoid many selective sampling problems:  select on age X, then compare age X+1 to later years. And I’d amend it to include using MLEs at the later age if a player is no longer in MLB.  I agree there is some bias here, in that the player had to be in MLB by age X, and so may be an early-maturer.  But for the task of forecasting guys who do make the majors at a young age, it’s a very good method.  And my bet is still that improvements after age 23 are quite modest.  I think we’ve been fooled for years into believing in a later peak than is real by the fact that many players’ ability to return and keep playing is in large part a function of teams’ knowledge that they are likely to be better than their early performance indicates.


#24    Tangotiger      (see all posts) 2010/12/22 (Wed) @ 13:03

MGL had a huge two-parter I think at Hardball Times.  Should be easy to find.

Here you go…

http://www.hardballtimes.com/main/authors/mgl/

Written exactly a year ago.


#25    Brian Cartwright      (see all posts) 2010/12/22 (Wed) @ 13:13

I used matched pairs of park adjusted data whenever a player was in the same league in consecutive seasons, no minimum playing time, weighted for smaller PA of the two.
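
A minimal sketch of that matched-pairs setup: same player, same league, consecutive seasons, weighted by the smaller PA of the two. The records and the function name are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict

# Hypothetical park-adjusted records: (player, year, age, league, PA, rate).
seasons = [
    ("A", 2008, 22, "Eastern", 400, 0.320),
    ("A", 2009, 23, "Eastern", 150, 0.335),
    ("A", 2009, 23, "PCL", 300, 0.310),
    ("B", 2008, 22, "Eastern", 500, 0.300),
    ("B", 2009, 23, "Eastern", 450, 0.310),
]

def matched_pair_deltas(seasons):
    """Average rate change by starting age, using same-league consecutive seasons."""
    by_key = {(p, yr, lg): (age, pa, rate) for p, yr, age, lg, pa, rate in seasons}
    totals = defaultdict(lambda: [0.0, 0.0])  # age -> [weighted delta sum, weight sum]
    for (p, yr, lg), (age, pa1, r1) in by_key.items():
        nxt = by_key.get((p, yr + 1, lg))
        if nxt is None:
            continue  # no season in the same league the following year
        _, pa2, r2 = nxt
        w = min(pa1, pa2)  # weight by the smaller PA of the two seasons
        totals[age][0] += w * (r2 - r1)
        totals[age][1] += w
    return {age: s / w for age, (s, w) in totals.items()}

result = matched_pair_deltas(seasons)
print(result)
```

Player A's 300 PA in the PCL has no same-league partner in the prior year, so it drops out, and his Eastern pair is weighted at 150 rather than 400.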

I’ve also tried same team in consecutive years (so that I could use unadjusted data) and got similar results, but using same league gives a larger sample.

I did not use league adjusted data (MLEs) because I use the aging curve to calculate the league factors (age correct the minor and major seasons before comparing), and I wanted to use data with as few adjustments as possible.

For ages 16-22, all or almost all of the players are in the minors, and this is where the most changes occur year to year.


#26          (see all posts) 2010/12/22 (Wed) @ 13:23

Brian, is controlling for same league (or team) artificially depressing your results?  It seems that the players who improved year-to-year would be far more likely to be promoted to the next level.

I can understand why you wouldn’t want to use multi-level data to establish your aging curve, but shouldn’t you go back and check afterwards if your aging curve is different for the players who changed levels?  That wouldn’t tell you what the right answer was for that group, but it would tell you if you got your answer wrong.


#27    Brian Cartwright      (see all posts) 2010/12/22 (Wed) @ 13:35

The decision to promote is made before y2 begins. If a player is not promoted, it’s likely because y1 was unluckily low, and y2 would then probably regress back towards true talent.

Running right now with only MLB data, same model, age 27 has most PA’s, BABIP decreases every year, XBH peaks at 23, TR decreases every year, HR peaks at 25, SBA at 21, CS at 26, BB at 29, SO at 26.


#28          (see all posts) 2010/12/22 (Wed) @ 13:54

The decision to promote is made before y2 begins. If a player is not promoted, it’s likely because y1 was unluckily low, and y2 would then probably regress back towards true talent.

It’s a good point that there’s a bias in that direction which would be artificially inflating your results.  However, I don’t know which effect is bigger, the one I identified in #26 or this one you identified in #27.  I would suspect that the effect of promoting the good prospects would be much larger, but I don’t have data to support that hunch.  That’s why I’m suggesting you could check that by looking at the aging curve of only the players who were promoted a level between years.  If your aging curve is correct, and the selection bias of overperformance leading to promotion is the main bias at play, this group of players should perform worse in year 2 than in year 1.

Also, not every decision to promote is made between years, of course.  There are many mid-year promotions.  If you’re weighting both years’ data by the smaller of the two samples, that will operate in the same direction as the effects of promotions at the start of the year.


#29    Brian Cartwright      (see all posts) 2010/12/22 (Wed) @ 15:49

Many mid-season promotions will be included. If a guy is in A+/AA in y1, AA/AAA in y2, AAA/MLB in y3, then he will be included in AA y1/y2 and AAA y2/y3.


#30          (see all posts) 2010/12/22 (Wed) @ 17:14

Many mid-season promotions will be included. If a guy is in A+/AA in y1, AA/AAA in y2, AAA/MLB in y3, then he will be included in AA y1/y2 and AAA y2/y3.

I understand that mid-season promotions will be included, but they will be weighted lower in your sample than other players, assuming you’re weighting by whichever sample of y1 and y2 is smaller.

It’s a lesser effect than that from between-season promotions, but it acts in the same direction.  At least it’s a lesser effect on a per-player basis.  It may be a bigger effect overall if the population of mid-season promotions is much bigger than the population of between-season promotions.

In either case, you do see how there is a bias here that could push your aging curve down if it outweighs the out-performing-one’s-talent promotion bias that you mentioned, don’t you?


#31    Guy      (see all posts) 2010/12/22 (Wed) @ 17:14

I think that MGL’s delta-method findings are potentially consistent with the idea that most players reach their peak by age 23.  First, he weights his “couplets” by average PA, which will tend to give more weight to players who improve than those who are stable or decline.  More fundamentally, he is limiting his analysis to MLB performance.  Players who continue to improve after 23 are more likely to stay in MLB, while those who don’t improve (or get worse) are more likely to fall out of his samples over time. 

Let’s say we knew that among all 23 yr-olds in professional ball, 1/3 would continue to improve thru age 28, one-third would have the same true talent thru age 28 then decline, and one-third would begin declining immediately.  And let’s assume this age curve is independent of talent level (may or may not be true).  What would we see at the MLB level?  The “decliners” would tend to exit MLB fairly quickly (except the very best).  The “plateau” guys whose true talent is above replacement will tend to stay (especially those who underperform), while the weak and unlucky will drop out.  And at each successive age, the “improvers” will become a larger share of the population, both because their performance is better and their trajectory suggests more upside.  My guess is that would translate into something like MGL’s delta-method findings.
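
That thought experiment is easy to simulate. The sketch below is a toy model under assumed parameters: talent and noise are in arbitrary units, the yearly step for improvers and decliners is 0.25, and a player stays in "MLB" only if his observed performance in a season beats a fixed cutoff. None of these values come from real data.

```python
import random

def simulate(n_players=30000, seed=0, step=0.25, noise_sd=1.0, cutoff=0.0):
    """Track the share of 'improvers' among players still in MLB at each age."""
    rng = random.Random(seed)
    slopes = {"improver": step, "plateau": 0.0, "decliner": -step}
    kinds = list(slopes)
    # One third of each type; true talent at age 23 drawn independently of type.
    active = [(kinds[i % 3], rng.gauss(0.0, 1.0)) for i in range(n_players)]
    shares = {}
    for age in range(23, 29):
        shares[age] = sum(1 for k, _ in active if k == "improver") / len(active)
        survivors = []
        for kind, talent23 in active:
            talent = talent23 + slopes[kind] * (age - 23)
            observed = talent + rng.gauss(0.0, noise_sd)  # one season of results
            if observed > cutoff:  # kept only after an above-cutoff season
                survivors.append((kind, talent23))
        active = survivors
    return shares

shares = simulate()
for age, share in shares.items():
    print(age, f"{share:.1%}")
```

Improvers start as one third of the pool and make up a growing share of the survivors, even though the average true-talent change across the full starting population is zero by construction.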


#32    Guy      (see all posts) 2010/12/22 (Wed) @ 17:57

I took a quick look at players who qualified for the batting title at age 22, 2001-2005.  There were 17 of them, and at age 23 they had an average OPS+ of 117 (straight average).  At age 27, they put up an average OPS+ of 115.  This is obviously a small sample, but I think you would see a similar pattern in a larger sampling.  If you are good enough to play regularly at age 22, I don’t think we should expect much offensive progress after age 23.  Clearly, this is NOT reflective of most professional ballplayers, who aren’t FT players at age 22.  But to the extent we want to forecast players who are in MLB at a young age, I think it’s a valid way to do projections.  In fact, I think it’s far better than the delta method. 

While it’s not common to decline at such a young age, it does happen.  Just in my small sample, Sean Burroughs had his last MLB season at age 24, and Baldelli never cracked 200 PA after age 24.  These declining players won’t appear at all in many of MGL’s delta couplets (Burroughs isn’t in 24/25, 25/26, 26/27), or will get very little weight due to low playing time, creating a bias in favor of improving players.

Methodology notes:  S. Burroughs was out of baseball at age 27, so I credited him with a 70 OPS+.  Sizemore had almost no PT at 27, so I used his age 26 season.  Baldelli didn’t play at 23 so I used age 24 season.


#33    Brian Cartwright      (see all posts) 2010/12/22 (Wed) @ 19:17

Following up on Guy/32, I took all 25 year olds who had >= 502 PA, from 1998-2008, and compared to age 27 season, using park adjusted, MLB only stats.

115 players, all still playing at 27 (used left join), although Ken Harvey and Todd Walker had under 100 PA at 27. Of the 115, 58 improved their wOBA, 57 declined, for a weighted average difference of -.002.

I put the list on Google Docs, but it won’t let me see it in the browser. It can be downloaded
https://docs.google.com/leaf?id=0B0ieb136KCz2NDllYjU4MjYtMWIyMC00NGFhLTg5NDQtOTg2ZGZjYjc5MjQ0&sort=name&layout=list&num=50


#34    Brian Cartwright      (see all posts) 2010/12/23 (Thu) @ 05:56

Mike Fast, I got enough time after work (and sleep) to run the variants. I added a column for class rank, a number with MLB at the top, so I could do an easy numerical comparison to see if the player was promoted to a higher level, demoted to a lower one, or went to a different league on the same level.

These groups are not mutually exclusive. Say a player is at Eastern (AA) & PCL (AAA) in y1, and also in y2, plus Texas in y2. He would be in same league in AA and AAA in y1 & y2; promoted AA to AAA from y1 to y2; demoted from AAA to AA from y1 to y2; and moved laterally (Eastern to Texas) from y1 to y2. Not exactly the same data in each set of matched pairs, but which portions were relevant to the constraints.
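
The class-rank comparison and the overlapping groups can be sketched as follows, using the Eastern/PCL/Texas example from this comment. The numeric ranks are hypothetical; only their ordering (MLB at the top) is taken from the description.

```python
from itertools import product

# Hypothetical numeric ranks, higher = closer to MLB ("MLB at the top").
LEVEL_RANK = {"MLB": 6, "AAA": 5, "AA": 4, "A+": 3, "A": 2}
LEAGUE_LEVEL = {"Eastern": "AA", "Texas": "AA", "PCL": "AAA"}

def classify(league_y1, league_y2):
    """Label a year-1/year-2 league pairing by comparing class ranks."""
    if league_y1 == league_y2:
        return "same league"
    r1 = LEVEL_RANK[LEAGUE_LEVEL[league_y1]]
    r2 = LEVEL_RANK[LEAGUE_LEVEL[league_y2]]
    if r2 > r1:
        return "promoted"
    if r2 < r1:
        return "demoted"
    return "lateral"

# The player from the example: Eastern and PCL in y1; Eastern, PCL and Texas in y2.
y1 = ["Eastern", "PCL"]
y2 = ["Eastern", "PCL", "Texas"]
for a, b in product(y1, y2):
    print(f"{a} -> {b}: {classify(a, b)}")
```

One player shows up in all four groups here, which is why the groups are not mutually exclusive.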

I do not use one age curve for players; instead, each component has its own: BH, XBH, TR, HR, HP, BB, SO, SH, SF, GDP (each as rates with their own appropriate denominators).

In the spreadsheet, the blue “Same Lg” line is the one I use. This example is the HR rate.

Promoted players decrease in HR every year after 21. Looking at all the columns (not available on this spreadsheet) shows consistent signs of overachieving in y1, causing y2 to almost always be less - BH always less, BB always less, SO always more after 22.

Demoted players (for any portion of y2, can include injury rehabs, especially for the older players) almost always do better in y2 - they performed below their true talent in y1. BH always more to 36, XBH always more to 34, HR always up, BB always up to 36, SO always less.

Players who switched leagues at the same level show the same growth up to peak, but then are very flat afterwards - looks like a JC curve.

Demoted players have a strong bias. Management might be too quick to send someone down after a subpar performance not indicative of their true talent.

Promoted players have a bias, but it does not look as strong. The HR curve has the same slopes as the same league players, but is shifted about four years younger. Presumably many players are promoted on reputation (true talent estimate), despite their performance.

Biggest takeaway point - including promoted players would only skew the curve younger.

https://spreadsheets.google.com/ccc?key=0Akieb136KCz2dEJJMDl2WVYxQV9JSEtWZ2t3MVc3MWc&hl=en


#35          (see all posts) 2010/12/23 (Thu) @ 11:57

Very interesting!  Thanks for looking at that and sharing your results, Brian.  I would have guessed the net effect would lean the other way, but real data is always better than a guess.


#36    Brian Cartwright      (see all posts) 2010/12/24 (Fri) @ 03:21

Thinking about the numbers that I posted in #34 -

Some people have questioned using minor league stats to profile major leaguers. In the early ages, almost all the players are in the minors and it’s the only sample available for those players 17-23, when the most rapid changes are occurring. Likewise, I can say that after 25, most of the players input to the age study are in the majors, and I didn’t think that mixing the minor and major data was going to make a substantial difference in the results, as I was comparing players in the same league in consecutive years.

Looking at the ‘Lateral’ graph, players who changed leagues at the same level (and thus were not included) had pre-25 growth very similar to the league repeaters, but it was flat into the 30’s. Could I also use this data?

Where the two curves differed was in the post-peak years, when a large majority of the players were in MLB. There are differences in talent level between the NL and AL, but in many ways they are like one large league, and I was not including in the age study players who made lateral movements between the two major leagues.

When I adjusted my code to include these players, HRs peaked at 29, extremely flat 25-30. All categories carried skills better into the late 20’s. I have updated the spreadsheet linked in #34.

I will generate a composite line to see what a typical player’s stats would be at each age, calculating a wOBA which I now believe will peak closer to 27.

I actually feel very good to be wrong - I still believe my basic model was correct, and this adjustment, which does not violate any of my previous criteria, comes up with a result that is more intuitive and acceptable.


#37    studes      (see all posts) 2010/12/24 (Fri) @ 10:38

Brian, I assume your data is park/league adjusted?  Are you essentially comparing MLE’s for your age study?


#38    Brian Cartwright      (see all posts) 2010/12/24 (Fri) @ 13:16

Studes, they are park adjusted, but not league, as I am using the results to calculate each league’s MLE factors. (To compare the performance at 20 in AA to 24 in MLB, I age normalize the stats so that the only remaining factor is the league (level of competition))

I am comparing park adjusted stats where a player was in the same league in consecutive seasons, now treating MLB as one league.


#39    MGL      (see all posts) 2010/12/26 (Sun) @ 00:43

#32 and #33, using such high minimum PA, you are introducing a huge selective sampling issue. Of course you won’t see an increase in performance when comparing later years to those early years. It is like the discussion we had about whoever the young player was that Pos and Neyer were discussing.

But that does NOT mean that those players’ true talent is not increasing as they age.  Not even close. I can’t emphasize that enough.

Guy said this:

I took a quick look at players who qualified for the batting title at age 22, 2001-2005.  There were 17 of them, and at age 23 they had an average OPS+ of 117 (straight average).  At age 27, they put up an average OPS+ of 115.  This is obviously a small sample, but I think you would see a similar pattern in a larger sampling.  If you are good enough to play regularly at age 22, I don’t think we should expect much offensive progress after age 23.

So Guy found 17 players at age 23 with an average OPS+ of 117. If I asked any of you (Guy, Brian, Tango, etc.) to estimate their true talent, how would you respond?  You would take that 117 and regress it the appropriate amount (given a one year sample of 550 or so PA) toward the mean of all 23 year old MLB players and come up with something like an OPS+ of 104 (117 regressed, say, 50% toward 91 - I am guessing at some reasonable numbers).

So now we have those players at age 27 with an OPS+ of 115, for an increase of 11 points or about 3 points per year.  And that is not even weighted by future playing time.  If you weight it (and I am not suggesting that you necessarily should), it is going to be much more than that.
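The regression step described above can be sketched in a few lines. This is only an illustration of the arithmetic: the function name is made up for the example, and the 50% regression and the mean of 91 are the guessed figures from the comment, not measured values.

```python
def regress_to_mean(observed, population_mean, regression_fraction):
    """Shrink an observed rate toward the population mean.

    regression_fraction = 0 keeps the observed value,
    regression_fraction = 1 returns the population mean.
    """
    return observed + regression_fraction * (population_mean - observed)


# Guessed inputs from the comment: OPS+ of 117 at age 23, regressed 50%
# toward an assumed mean of 91 for all 23-year-old MLB players.
true_talent_23 = regress_to_mean(117, 91, 0.5)

# Observed OPS+ of 115 at age 27, compared with the regressed age-23 estimate.
gain_per_year = (115 - true_talent_23) / (27 - 23)

print(true_talent_23, gain_per_year)
```

With those guesses the regressed estimate is the 104 quoted in the comment; a different regression amount or population mean would move both numbers.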

Why in the world would you use 117 as your initial point, when that 117 was not randomly selected?

It is selected based on playing time, and we KNOW that when you select any players based on playing time, you will have a lucky or unlucky group depending upon whether that selected playing time is high or low, especially at a young age.

Lots of those (not all) 23 year-olds came to the majors as very good prospects.  Those who started out lucky likely got lots of playing time and those who did not were benched, relegated to part-time status, or sent back down to the minors.  I don’t have to tell any of the regulars here that, do I?

So this statement by Guy:

If you are good enough to play regularly at age 22, I don’t think we should expect much offensive progress after age 23.

is true only in terms of the first selectively sampled year as compared to any future year, and is misleading.  There certainly IS going to be progress after age 23 if we don’t selectively sample any year after the initial one.

Without looking at the data, I challenge Guy or Brian to look at all players who were full time at age 22, and then look at any two years after that, starting with age 23, regardless of how many PA they get - IOW the delta method.  I will guarantee that there will be progress up until at least age 25.  I say ONLY 25, not because I am backpedaling, but because in my aging studies I generally get a plateau around 26-28, and it is likely that players who are full time at age 22 are “early maturers” on average, such that their physiological age is closer to 23 than 22.

So basically I want to see the subsequent years (PA and OPS+), including if they did not play, of all players who were full time at age 22 and/or at age 23.  Again, I guarantee that we will see an increase in talent until at least age 25.  The list should look like this:

Players with at least 500 PA at age 22:

Player, OPS+ and PA at age 23, 24, 25, 26, 27

A: 108/500, 111/550, 109/400, 115/475, 120/520
B: 102/500, 0/0, 103/300, 112/405, 112/500
C: 117/534, 90/400, 0/0, 0/0, 0/0

Etc.

The same for players who were full time at age 23, listing their OPS+ and PA for ages 24-27.

I don’t care about their numbers in the sample year (age 22 or 23), as those are NOT unbiased samples.
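A minimal sketch of the delta method applied to a list in that format follows. The rows A, B and C are the placeholder lines from the comment, not real players, and weighting each pair of seasons by the smaller of the two PA totals is an assumption made for the example (other weights, such as the harmonic mean, are also common).

```python
# Placeholder rows in the proposed format: (OPS+, PA) for ages 23 through 27.
players = {
    "A": [(108, 500), (111, 550), (109, 400), (115, 475), (120, 520)],
    "B": [(102, 500), (0, 0), (103, 300), (112, 405), (112, 500)],
    "C": [(117, 534), (90, 400), (0, 0), (0, 0), (0, 0)],
}
FIRST_AGE = 23


def delta_by_age(players, first_age=FIRST_AGE):
    """Average OPS+ change from each age to the next.

    Each consecutive pair of seasons is weighted by the smaller PA of the
    two (an assumed weighting). Pairs where the player has 0 PA in either
    year are skipped, so the result is keyed by the starting age of the pair.
    """
    totals = {}  # starting age -> [weighted delta sum, weight sum]
    for seasons in players.values():
        for i in range(len(seasons) - 1):
            (ops1, pa1), (ops2, pa2) = seasons[i], seasons[i + 1]
            weight = min(pa1, pa2)
            if weight == 0:
                continue  # no playing time in one of the two years
            acc = totals.setdefault(first_age + i, [0.0, 0])
            acc[0] += (ops2 - ops1) * weight
            acc[1] += weight
    return {age: s / w for age, (s, w) in sorted(totals.items())}


deltas = delta_by_age(players)
for age, change in deltas.items():
    print(f"{age} -> {age + 1}: {change:+.1f}")
```

Skipping the zero-PA seasons is exactly where survivorship can creep back in, which is why the list above asks for those seasons to be shown rather than dropped silently.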


#40    Brian Cartwright      (see all posts) 2010/12/26 (Sun) @ 10:21

mgl, I agree with what you are saying vs the selective sample (and that is not how I do my curve, I use delta), even more so at age 22, when MLB players are a minority. My graph shows how the promoted players show a much earlier peak, because there is overperformance in y1.

I picked age 25 to repeat what Guy had done because there’s a larger population of MLB at that age, and 25 and 27 were the two peaks being discussed.

For a batter who is league average at 25, I now have the following wOBAs at each age:

17 .226
18 .248
19 .267
20 .284
21 .299
22 .312
23 .323
24 .331
25 .336
26 .337 
27 .337
28 .337
29 .336 
30 .334
31 .332
32 .328
33 .324
34 .320
35 .314
36 .308
37 .300
38 .291
39 .279
40 .266
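To make the table easier to work with, here is a small sketch that stores the curve, finds the peak ages, and rescales a wOBA from one age to another. The multiplicative rescaling in `age_normalize` is an assumption for illustration only; the thread does not say how the age normalization is actually done, and the function names are hypothetical.

```python
# wOBA by age for a batter who is league average at 25 (values from the table above).
WOBA_BY_AGE = {
    17: .226, 18: .248, 19: .267, 20: .284, 21: .299, 22: .312,
    23: .323, 24: .331, 25: .336, 26: .337, 27: .337, 28: .337,
    29: .336, 30: .334, 31: .332, 32: .328, 33: .324, 34: .320,
    35: .314, 36: .308, 37: .300, 38: .291, 39: .279, 40: .266,
}


def peak_ages(curve):
    """Return every age that shares the highest wOBA on the curve."""
    best = max(curve.values())
    return [age for age, woba in curve.items() if woba == best]


def age_normalize(woba, age, target_age=25, curve=WOBA_BY_AGE):
    """Rescale an observed wOBA to its equivalent at target_age.

    Assumes a simple multiplicative adjustment by the ratio of curve values.
    """
    return woba * curve[target_age] / curve[age]


print(peak_ages(WOBA_BY_AGE))
print(round(age_normalize(.299, 21), 3))
```

On this curve the maximum is shared by three consecutive ages, so the peak reads as a plateau rather than a single year.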


#41          (see all posts) 2011/01/03 (Mon) @ 09:23

If these aging curves don’t provide a “better projection system”, then is it fair to describe them as more accurate?  Is the variability high enough that aging curves, beyond what Bill James found 30 years ago, are only minimally helpful?

I presume we are using these to make projections.  However, if those projections are not more accurate than systems that don’t use this much aging curve assessment, what is gained?

Not that it isn’t interesting math and discussion, but if it doesn’t make more accurate projections, then I have a hard time believing the curve to be “right”.


#42    CircleChange11      (see all posts) 2011/01/03 (Mon) @ 10:54

Can the THT forecasts be C&P and/or exported to Excel? (by team, or by position, etc)


#43    Tangotiger      (see all posts) 2011/01/03 (Mon) @ 11:52

Let me try for the first time, so here’s the step by step after logging in:

click sortable batting
click oliver forecast
click spreadsheet.csv

Well, that was easy.  Here’s what Pujols’s record looks like:

MLBAM_ID    Retro_ID    BBREF_ID    Name    Pos    Org    Lg    Class    Age    PA    AB    R    H    2B    3B    HR    RBI    SB    CS    K    BB    HBP    GDP    BA    OBP    SLG    OPS    wOBA    Field    WAR
405395    pujoa001    pujols001jos    Pujols Albert    1B    STL    NL    MLB    31    638    533    104    173    38    1    42    123    9    4    60    93    5    18    0.324    0.425    0.635    1.06    0.445    0.6    6.3

My constant pleading for the MLBAM or RetroID is satisfied here.  Thanks Brian.
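For anyone who wants to use the IDs programmatically, a rough sketch of reading the export and looking a player up by MLBAM_ID is below. The sample text is a trimmed-down stand-in built from the Pujols row above; the real spreadsheet.csv has more columns, and its exact delimiter and quoting are assumed here, not confirmed.

```python
import csv
import io

# Trimmed stand-in for the downloaded file (assumed comma-delimited,
# with only a subset of the columns shown in the row above).
SAMPLE_CSV = """MLBAM_ID,Retro_ID,Org,Age,PA,OBP,SLG,OPS,wOBA,WAR
405395,pujoa001,STL,31,638,0.425,0.635,1.06,0.445,6.3
"""


def load_forecasts(text):
    """Index forecast rows by MLBAM_ID so they can be joined to other data."""
    return {row["MLBAM_ID"]: row for row in csv.DictReader(io.StringIO(text))}


forecasts = load_forecasts(SAMPLE_CSV)
pujols = forecasts["405395"]

# Sanity check on the row: OPS should equal OBP + SLG.
ops = float(pujols["OBP"]) + float(pujols["SLG"])
print(pujols["Retro_ID"], pujols["WAR"], round(ops, 3))
```

To read the actual download, the same function could be fed the contents of the saved file instead of `SAMPLE_CSV`.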


#44    Brian Cartwright      (see all posts) 2011/01/03 (Mon) @ 12:18

Tango - we added the IDs over the weekend, hadn’t announced it yet.

Chris #41 - valid points. The aging curves are first used to age normalize stats before calculating minor to major league factors, then to do the extended forecasts (more than one year). I’m in the process of doing new accuracy measurements.

Also, other people’s (published) aging curves don’t go down to age 16 or 17, which I need for some of the minor leaguers.


#45    Guy      (see all posts) 2011/01/03 (Mon) @ 12:42

I don’t understand MGL/39 (which I missed until now). I selected players based on their playing time at age 22, and then compared their age 23 performance (NOT age 22) to their age 27 performance.  I don’t see why we should regress their age 23 performance at all.  So I stand by my statement (though I’m quite willing to be proven wrong):  if you can play fulltime at age 22, you won’t improve from age 23 to age 27.  Now, even if that’s true it still leaves open the possibility that such players improved from 22 to 23.  However, if we also found this same pattern when looking at 21-yr-old regulars (i.e. a plateau from 22 to 27), and 23-yr-olds (no improvement from 24 to 27)—a big “if”—then I think you’d have to conclude that position players really don’t improve once they become regulars.


#46    Guy      (see all posts) 2011/01/03 (Mon) @ 21:15

Brian:
Is your curve based on chained deltas?  Can you point me toward an explanation of the methodology?

The post-peak curve looks too flat to me.  It suggests the average player is as good at age 33 as at 23, which I don’t believe is true.  Or that 35-yr-olds are as good as 22-yr-olds.
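The two comparisons can be checked directly against the wOBA values posted in #40. This short sketch only restates those four numbers; the variable and function names are made up for the example.

```python
# wOBA values for the four ages in question, copied from the curve in #40.
curve = {22: .312, 23: .323, 33: .324, 35: .314}


def woba_gap(curve, young_age, old_age):
    """Older-age wOBA minus younger-age wOBA, rounded to three decimals."""
    return round(curve[old_age] - curve[young_age], 3)


gaps = {pair: woba_gap(curve, *pair) for pair in [(23, 33), (22, 35)]}
for (young, old), gap in gaps.items():
    print(f"age {old} vs age {young}: {gap:+.3f}")
```

Both gaps are a point or two of wOBA in favor of the older age, which is the flatness being questioned.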


#47    MGL      (see all posts) 2011/01/03 (Mon) @ 23:07

I don’t understand MGL/39 (which I missed until now). I selected players based on their playing time at age 22, and then compared their age 23 performance (NOT age 22) to their age 27 performance.  I don’t see why we should regress their age 23 performance at all.

That is a good question (why I wrote that).  I have to think about it a bit.  As you say, there shouldn’t need to be any regression at age 23 if we select our sample at age 22…

