
THE BOOK--Playing The Percentages In Baseball


Wednesday, December 02, 2009

The latest (and last I hope?) JC thread on this site

By Tangotiger, 02:33 PM

Someone sent me this email:

In case you haven’t picked up on this yet, BTF has a post up about another Bradbury aging article:

http://www.baseballthinkfactory.org/files/newsstand/discussion/huffington_post_bradbury_is_your_favorite_free_agent_over_the_hill/

Rally gives the PERFECT response (I wish I had thought of doing that):

This is pretty bad. I’m fairly sure I know how he looked at this, and that is not an honest interpretation of the results. I believe he looked at players who were active at age 35 and at 27 or 29 or whatever his peak is. And he’s got it backwards. Knowing that a player has an .850 OPS at age 35, he probably was a .900 player at age 27.

But if you’re looking at age 27 here’s what I find:
All players with an OPS between .875 and .925 at age 27, in seasons from 1993 to 2001 (1993 marks the start of the modern offensive era, and 2001 is the last year in which a 27-year-old could also have an age-35 season on record; B-Ref PI is my source). 18 players qualify, and their average OPS is .900.

One player maintained a near .900 OPS, Jeff Bagwell. Another, Sammy Sosa, was at .849.

Six were between .770 and .804 (Ordonez, Karros, T Martinez, M Williams, M Sweeney, V Castilla).

Four were between .695 and .741 (Ventura, Catalonotto, John Valentin, I Rodriguez). There’s Tim Salmon (.628) and Mike Lieberthal (.540, 82 PA). Juan Gonzalez almost made it through a single plate appearance before being done for good (but not quite, hurt running to first). Dean Palmer, Todd Hundley, and Trot Nixon were out of baseball.

The PA-weighted average of this group is .779, the simple average of those still playing was .749, and the median (Robin Ventura) was .741.
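A minimal Python sketch of the three summaries Rally reports. The cohort below is hypothetical placeholder data, not his 18 players, and ranking out-of-baseball players below everyone still active when taking the median is an assumption about how he counted them.

```python
from statistics import mean, median

# Hypothetical cohort: (player, age-35 PA, age-35 OPS). None = out of baseball.
cohort = [
    ("A", 600, 0.890),
    ("B", 550, 0.800),
    ("C", 500, 0.750),
    ("D", 300, 0.700),
    ("E", 80, 0.540),
    ("F", 0, None),
    ("G", 0, None),
]


def summarize(players):
    """Return (PA-weighted OPS, simple mean OPS of active players, cohort median)."""
    active = [(pa, ops) for _, pa, ops in players if ops is not None and pa > 0]
    total_pa = sum(pa for pa, _ in active)
    # Weight each player's OPS by his plate appearances.
    weighted = sum(pa * ops for pa, ops in active) / total_pa
    # Give each player still in the majors one equal vote.
    simple = mean(ops for _, ops in active)
    # Rank dropouts below every active player before taking the median.
    ranked = [ops if ops is not None else float("-inf") for _, _, ops in players]
    return weighted, simple, median(ranked)


weighted, simple, med = summarize(cohort)
print(f"PA-weighted {weighted:.3f}, simple {simple:.3f}, median {med:.3f}")
# PA-weighted 0.789, simple 0.736, median 0.700
```

With these made-up numbers the PA-weighted figure sits above the simple average because the better hitters also get more playing time, the same pattern as Rally’s .779 versus .749.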

So given that a player has a .900 OPS at age 27, it is more likely that he’s out of baseball entirely at 35 than still posting an .850 OPS or greater. And the median expectation of the guy is Robin Ventura 2003: .242/.340/.401, 392 AB, 14 HR, 55 RBI.

Thanks Rally.  You have given me the strength to endure JC’s ramblings.


#1 2009/12/02 (Wed) @ 17:51

The first thing I thought of here was Bernie Williams.  Age 27 OPS: .926 (1996).  Age 35 OPS: .796 (which was, sadly, an improvement over his age 34 season).  Not to mention the total erosion of his defensive abilities.

I looked up those numbers, but I didn’t really have to.  Bernie was cooked by age 35, and it was painful to watch.  At age 27, he was a superstar level player.  His last really good year was 2002, at age 33 (during which he ran into a wall, went 0-17, had knee surgery, came back and was never the same).  You did NOT want Bernie Williams on your team at ages 34 and 35 making money you agreed to pay him when he was 27. 

Sample size of one, but any big fan can come up with a long list of guys who declined quickly in their early to mid 30s.  Sure, if you take all the good 35 year-olds and go back to look at their 27 year old seasons you’ll find… good players.


#2 2009/12/02 (Wed) @ 18:39

At what age do economists start to decline?


#3 JD 2009/12/02 (Wed) @ 19:20

Wait, am I misunderstanding something here? Did JC really just “prove” that good 35-year-old hitters were likely great 27-year-old hitters? Wow. Enlightening.


#4 Mike 2009/12/02 (Wed) @ 19:54

Ha! BrianK, well done.

What’s the opportunity cost of all the effort spent refuting JC’s asinine positions?  I’ve ignored him completely since the Francoeur debate, and I’m a happier individual because of it.


#5 JD 2009/12/02 (Wed) @ 21:15

Mike/4 - Same here. I will check these discussion threads about him, but I haven’t been to his site since. I’m not sure I even have the link to it anymore.


#6 MGL 2009/12/02 (Wed) @ 22:10

#2 is funny!

As I mentioned in the other thread, I am working on some research which indicates that peak age might indeed be around 29 (actually pretty flat between 27 and 29) once we account for survival bias.

Hopefully I’ll have that research done and I’ll write an article, probably on Fangraphs, within a week or so.

As I briefly mentioned in the other thread, I am not on the “vilify JC” bandwagon as others are.

For one thing, as I said, I think that peak age is in fact close to 29, according to my new research. For another, JC makes it “clear,” at least according to his methodology, that he is talking about players with long careers.  There is no ambiguity in that. Even if he does not specifically say that he is only dealing with good players with long careers, it is plainly obvious from the data set he is using. And there is nothing wrong with that.  I don’t recall him ever saying that we can extrapolate his conclusions to anyone who has ever played MLB at any age and for any length of time.  If he has, I stand corrected.

Sometimes looking at numbers in the aggregate distorts things.  For example, what if there are lots of players who play for one or two years or sporadically over several years, and these are the players who bring down the average peak age or change the typical trajectory?  Are we particularly interested in these players? Maybe yes and maybe no, but the answer is by no means certain.

I CAN say with some certainty that most of the time when we want to know at what age a player starts to decline or is going to peak or has peaked, as when we evaluate FA contracts, it is usually NOT these marginal players.  So we certainly don’t want to use “average” trajectories and peaks when dealing with this kind of subset of the population of all players who have ever played MLB.


#7 Guy 2009/12/02 (Wed) @ 22:25

MGL, what do you mean by “once we account for survival bias?” Accounting for survival bias should generally lower, not raise, the estimate of peak age.

JC absolutely does say that his findings extend to players beyond his sample.  Perhaps not “all players,” but players with any kind of significant career in MLB.

I don’t have a big problem with excluding 1- and 2-year players.  It won’t make much difference.  But I would say the fact that marginal players mostly make it to the majors at age 25-27, rather than ages 28-30, is one more piece of evidence that baseball players in general do not reach their best performance at ages 29-30. 

The important issue, as JC says, is less the precise peak than the rate of subsequent change.  He argues that players decline relatively little through age 35.  I think there is a vast body of evidence demonstrating this is not true.  (Unless you want to exclude every player who had an injury at some point that perhaps played some role in diminishing their playing skills.  But at that point, I’m not even sure what “aging” means.)


#8 2009/12/02 (Wed) @ 22:42

MGL, this is going to sound strange given the sort of thing you normally get criticized for, but I think you are being too charitable towards JC.  If you look at the article of JC’s that is being discussed, he is talking about applying his research to current free agents, where we obviously don’t know who is going to still be playing in their mid-to-late thirties.  Yes, his data set clearly implies that he was answering a different question, but that doesn’t seem to be how JC is interpreting his work…


#9 Tangotiger 2009/12/02 (Wed) @ 22:57

MGL: I agree that there is no ambiguity in JC’s dataset when he talks about the peak age being 29.

However, you are incorrect when you say that JC is not extrapolating beyond his selection parameters.  He is definitely extrapolating.  This is plain in the way he talks about this.  Indeed, why would he even bother comparing to the “traditional” age 27 peak if not to refute this “myth”.  As far as he’s concerned, it’s not apples and oranges that are being compared. 

If he were to say “the peak age for players who get 5000+ PA over ten years between the ages of 24 and 35 is 29”, we’d have no issue.  None.  But, this is not what he’s saying.

He goes further with the pitchers, when he says pitchers with a 3.50 ERA at age 29 will have an ERA of 3.75 at age 35.  This is obviously ridiculous.  That basically ages the pitcher by 1 run per year: you face 1000 batters in a season, and you age by 1 run.
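Tango’s arithmetic can be checked in a few lines. Treating “1000 batters” as about 225 innings (roughly 4.4 batters per inning) is an assumption, not a figure from the thread.

```python
def runs_lost_per_season(era_start, era_end, years, innings_per_season):
    """Spread an ERA change evenly over `years` and express it in runs per season."""
    era_change_per_year = (era_end - era_start) / years
    # ERA is runs allowed per 9 innings, so scale by the season's innings / 9.
    return era_change_per_year * innings_per_season / 9


# Assumption: about 1000 batters faced is roughly 225 innings (~4.4 batters per inning).
print(round(runs_lost_per_season(3.50, 3.75, 6, 225), 2))  # 1.04
```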

***

Furthermore, to JC, aging does not include attrition!  Basically, if a guy is finished by age 32, he drops out of JC’s study!  It’s like this player has provided no value in an aging study because JC presumes that “something” must have happened to him that is not due to “aging”.

So, at this point, we are debating the word “aging”.  Aging, as the rest of us discuss it, includes conditioning, and any reason whatsoever (including death possibly) whereby a player’s performance drops.  We can call it “quatlus” if we want, but when the rest of us talk about aging, we mean the whole kit and caboodle.

***

But, I agree with you generally if you want to say that different quality of players and different types of players have different aging trajectories.

And yes, it doesn’t help us to say that if the average of the 16,000+ MLB players in history, from Moonlight Graham to Nolan Ryan, peaked at age 26-27, then we should apply this to every player.

We agree that the players who are worse peak earlier.  This is probably why they are worse to begin with.  I’m sure there are PLENTY of players at age 20 that you can’t separate between stars and over-the-hill.  Some will peak at age 21 or 22, and won’t even make it into MLB.  Some will peak at 23 or 24 and will just make it, but be finished by age 30.  Others will peak at age 28 or 30, and have long careers.

So, there is a relationship between the quality of player and the peak of his performance: better players peak later, generally.

Regardless, the problem is with extrapolation.  If we can all agree that you can’t go beyond your sample parameters, we have no issue.  None.


#10 Guy 2009/12/02 (Wed) @ 23:20

"better players peak later, generally”

Doesn’t that depend on what you mean by “better players?” If you mean career value, then sure, almost by definition.  But if we mean peak talent, I’m not sure that’s the case.  Why do you think that’s true?

Just looking at the top OPS+ seasons of all time (200+), and excluding Sosa/Bonds/McGwire, we have these peaks:
Ruth 25
Williams 22
Hornsby 28
Mantle 25
Gehrig 24
Bagwell 26
Thomas 26
McCovey 31
Foxx 24
Brett 27
Cash 26
Musial 27

Hardly an exhaustive study.  But I’d be very surprised if even great peak players peak as late as 29. (And as we’ve discussed, using a better measure of performance like RAR or WAR will tend to give us even earlier peaks.)
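A quick summary of the twelve peak ages Guy lists, using only his numbers:

```python
from statistics import mean, median

# Age of each player's top OPS+ season, as listed above.
peak_age = {
    "Ruth": 25, "Williams": 22, "Hornsby": 28, "Mantle": 25,
    "Gehrig": 24, "Bagwell": 26, "Thomas": 26, "McCovey": 31,
    "Foxx": 24, "Brett": 27, "Cash": 26, "Musial": 27,
}
ages = list(peak_age.values())
print(round(mean(ages), 1), median(ages))  # 25.9 26.0
# How many of the twelve peaked at 29 or later?
print(sum(age >= 29 for age in ages))  # 1
```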


#11 Brian Cartwright 2009/12/03 (Thu) @ 00:32

We could be confusing cause and effect by saying ‘better players peak later’. Could it be that someone is a better player because they peaked later (continued to improve their skills past the normal age)? If you continue to improve until age 29, you have a greater chance of playing later into your 30’s?

It’s quite possible that two players have the same talent at age 20. Player A bulks up his legs, his speed goes, etc, and peaks at 23, maybe has a cup of coffee in mlb. Looked good once, but flopped. While player B keeps getting better until 29, and plays until 36.

And players who have a higher peak, even if it’s at 24 or 25, have a longer glide path down before they reach replacement level and attrit out.
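Brian’s “glide path” point can be sketched with a linear decline. The OPS+-style scale, the replacement level of 80, and the decline of 5 points per year are hypothetical values chosen only for illustration.

```python
def attrition_age(peak_age, peak_rating, replacement=80, decline_per_year=5):
    """Age at which a linear decline from the peak reaches replacement level.

    Ratings are on an OPS+-like scale (100 = average); `replacement` and
    `decline_per_year` are hypothetical illustration values.
    """
    if peak_rating <= replacement:
        return peak_age
    return peak_age + (peak_rating - replacement) / decline_per_year


# Player A: higher peak, reached early. Player B: lower peak, reached later.
print(attrition_age(25, 150))  # 39.0
print(attrition_age(29, 115))  # 36.0
```

Under these assumptions the player who peaks four years earlier, but higher, still lasts three years longer.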


#12 2009/12/03 (Thu) @ 01:57

Brian/11, I think that’s why Tango mentions attrition. As you state, attrition is a big part of the entire population that JC is using. However, if JC’s focus is “great” players the attrition you mention doesn’t exist in large quantities. Obviously, most seem to think JC is extrapolating his data beyond great players.

- - - - - - - - -
I have a problem with OPS being the statistic used. Does anyone else? Hitting is just one facet of the game.

Obviously skills deteriorate after one’s individual peak. However, batting skills aren’t an absolute measure of one’s abilities. Fielding is a huge component of a player’s value and probably deteriorates more quickly than a player’s hitting ability.


#13 MGL 2009/12/03 (Thu) @ 02:14

"MGL, what do you mean by “once we account for survival bias?” Accounting for survival bias should generally lower, not raise, the estimate of peak age.”

Simple. If we use the delta method (or some form of it), all marginal (or old, or really young, or whatever) players who have a really bad/unlucky year and who would have bounced back, at least a little the next year, and thus would have had a positive “delta” for those two years, are excluded from the study if they don’t get to play in Year II.

So if lots of players with a positive delta are excluded from the data, which they are, that brings the peak age down.
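MGL’s argument can be illustrated with a toy simulation. Everything in it is an assumption for illustration: talent and single-season luck are normal with standard deviation 1, true aging is a parabola peaking at 27, and a player keeps his job only if his observed season beats a fixed cutoff. The delta method is applied twice, once to every simulated player and once only to the players who actually got the next season.

```python
import random


def aging(age, peak=27, curvature=0.03):
    """Assumed true aging curve: a parabola that tops out at `peak`."""
    return -curvature * (age - peak) ** 2


def delta_method(n_players=40000, first=21, last=35, cutoff=-1.0, seed=7):
    """Average year-over-year change at each age, for everyone and for survivors.

    Toy model (all parameters are assumptions): observed season =
    talent + aging(age) + luck, and a player gets the next season only
    if his observed season beats `cutoff`.
    """
    rng = random.Random(seed)
    n_pairs = last - first
    sums = {"everyone": [0.0] * n_pairs, "survivors": [0.0] * n_pairs}
    counts = {"everyone": [0] * n_pairs, "survivors": [0] * n_pairs}
    for _ in range(n_players):
        talent = rng.gauss(0.0, 1.0)
        obs = [talent + aging(a) + rng.gauss(0.0, 1.0) for a in range(first, last + 1)]
        playing = True
        for i in range(n_pairs):
            delta = obs[i + 1] - obs[i]
            sums["everyone"][i] += delta
            counts["everyone"][i] += 1
            # A bad (often unlucky) season ends the career, so its rebound is never seen.
            playing = playing and obs[i] > cutoff
            if playing:
                sums["survivors"][i] += delta
                counts["survivors"][i] += 1
    return {k: [s / n for s, n in zip(sums[k], counts[k])] for k in sums}


def peak_age(deltas, first=21):
    """Chain the deltas into a curve and return the age at which it tops out."""
    level, best_level, best_age = 0.0, 0.0, first
    for i, d in enumerate(deltas):
        level += d
        if level > best_level:
            best_level, best_age = level, first + i + 1
    return best_age


curves = delta_method()
print("peak, all players:", peak_age(curves["everyone"]))
print("peak, survivors only:", peak_age(curves["survivors"]))
```

Because the players who drop out are disproportionately those coming off unlucky seasons, their rebound deltas never enter the average, every age-to-age delta is biased downward, and the chained survivor curve tops out earlier than the unfiltered one, which should land at or very near the assumed true peak of 27.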

Tango, regardless of what JC says or doesn’t say, his study speaks for itself. If I do a study that shows that first basemen are the best hitters in baseball and then I go on to say that SS are the best hitters, what difference does it make?  It is not worth arguing.


#14 2009/12/04 (Fri) @ 11:23

I feel like it would be possible to first calculate a survival function based on performance (or something else we deem important) at “Age X”, then use that for each year of the study (up to a season where there are only players that we know have left the game/retired by now).  I think we could leave out those we know have had a catastrophic injury.  Once we have that, we should be able to use that information to calculate the projected career length at each point in the career, and simply use those observations similar to that player to calculate the peak age for only those players that match up well.

However, I’m not certain exactly how to combine those, and perhaps this is similar to what MGL is putting together?  I think one problem is that different people are defining aging differently.  JC seems to feel that having an entire career curve is important (which does make sense depending on your interest), while others feel that the idea someone didn’t ‘survive’ is an important indicator of their aging.  I think both are correct.

I think it would be reasonable to throw out players on the cusp of ‘replacement level’, as they would only increase the survivor bias (and if they really are that level player, then they’re not that interesting anyway when it comes to aging).  Once that’s done, I think using a hazard function to control for survival bias would be an interesting model.  Using the performance variables, I imagine we could get a decent picture of projected career length, as well as the probability of attrition from the league at Year X conditional on their performance.  Like I said, incorporating this information into some sort of “peak year” function is the difficult part.
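One way to make the hazard idea concrete is a discrete-time (life-table) estimate: for each age and performance bin, the share of players who did not return the next season, chained into a survival curve. The tuple layout, the bin labels, and the toy records are hypothetical, and keeping each player in one performance bin is a simplification. As suggested above, only players whose careers are known to be over should go in, so no one is right-censored.

```python
from collections import defaultdict


def discrete_hazard(seasons):
    """Share of players in each (age, performance bin) who did not play the next year.

    `seasons` holds (player_id, age, perf_bin, played_next_year) tuples for
    players whose careers are already over, so no one is right-censored.
    """
    at_risk = defaultdict(int)
    exits = defaultdict(int)
    for _player, age, perf_bin, played_next in seasons:
        at_risk[(age, perf_bin)] += 1
        if not played_next:
            exits[(age, perf_bin)] += 1
    return {key: exits[key] / n for key, n in at_risk.items()}


def survival_curve(hazard, start_age, end_age, perf_bin):
    """Chain one-year survival probabilities into P(still playing) by age.

    Simplification: the player is assumed to stay in the same performance bin.
    """
    surv, curve = 1.0, {start_age: 1.0}
    for age in range(start_age, end_age):
        surv *= 1.0 - hazard[(age, perf_bin)]
        curve[age + 1] = surv
    return curve


# Hypothetical toy records.
seasons = [
    ("p1", 30, "good", True), ("p2", 30, "good", True),
    ("p3", 30, "good", False), ("p4", 30, "good", True),
    ("p1", 31, "good", True), ("p2", 31, "good", False),
    ("p4", 31, "good", True),
]
hazard = discrete_hazard(seasons)
print(survival_curve(hazard, 30, 32, "good"))
```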


#15 Tangotiger 2009/12/04 (Fri) @ 12:00

I have no problem if someone wants to create a career arc based on:
- aging of above-average players with no injuries
- aging of average players with no injuries
- aging of below-average players with no injuries
- injury rates by age, position
- attrition rates not due to injury, by age

And if JC wants to only focus on the first, that’s fine.  And if someone wants to figure all the above in one shot, that’s fine too.

But, if the question being asked is: “I have a 29-yr old Josh Beckett with a 3.75 ERA”, I don’t want to hear someone say “He’ll likely forecast for a 4.00 ERA in 6 years”.

What he’s actually saying is: “Based on similar players who played full-time for the next 6 years, if he can continue to be a full-time pitcher, he’ll have a 4.00 ERA as a 35-yr old pitcher.”

When you add all those conditions to the statement, the answer becomes far less interesting.  Indeed, what we want to know is: “How much should we pay Josh Beckett for the next 6 years”.

Presuming JC’s aging trajectory is not going to help us.
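The gap between the conditional forecast and the question of what to pay can be sketched as an expectation over contract years. All numbers are hypothetical, and a year in which the pitcher is not a full-time starter is crudely valued at zero.

```python
def expected_contract_value(value_if_pitching, p_still_pitching):
    """Sum over contract years of P(still a full-time pitcher) * value if he is."""
    return sum(v * p for v, p in zip(value_if_pitching, p_still_pitching))


# Hypothetical wins above replacement per season, conditional on pitching full time.
war_if_pitching = [5.0, 4.6, 4.2, 3.8, 3.4, 3.0]
# Hypothetical probability that he is still a full-time starter in each year.
p_full_time = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

conditional_total = sum(war_if_pitching)  # what the conditional forecast implies
expected_total = expected_contract_value(war_if_pitching, p_full_time)
print(round(conditional_total, 1), round(expected_total, 1))  # 24.0 18.7
```

The conditional total assumes he pitches every year; weighting each year by the chance that he still does gives the smaller number, and that is the one that bears on salary.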


#16 2009/12/04 (Fri) @ 12:29

I agree with that.  And I think it’s fully possible to do in one swoop.  Quite an undertaking, but I think there is a combination of statistical techniques that could be used to build a really cool model that could give a locally matched aging curve for whatever player you like.


#17 Tangotiger 2009/12/04 (Fri) @ 12:37

"Quite an undertaking”

Undertaken, two years ago:
http://www.insidethebook.com/ee/index.php/site/comments/win_shares_aging_curves/

Post 13 has the general function.


#18 Dackle 2009/12/04 (Fri) @ 13:05

Wonder if it’s possible to reduce the survival bias by getting more granular with the data—ie instead of comparing year over year, compare month over month. I wonder if it’s possible to even compare day vs day (age 27 years 150 days compared with age 27 years 151 days etc). Then chain the results back together into years.
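The chaining step amounts to summing the finer-grained deltas within each year and then accumulating the yearly totals. The monthly deltas below are hypothetical, in OPS points (thousandths), with six periods per year assumed; one of each year’s six is taken to cover the offseason transition, which the comment leaves open.

```python
from itertools import accumulate


def chain(sub_period_deltas, periods_per_year):
    """Sum sub-period deltas within each year, then accumulate across years."""
    yearly = [
        sum(sub_period_deltas[i:i + periods_per_year])
        for i in range(0, len(sub_period_deltas), periods_per_year)
    ]
    return yearly, list(accumulate(yearly))


# Hypothetical month-over-month changes in OPS points (thousandths), six per year.
monthly = [2, 2, 1, 1, 0, -1, -1, -2, -2, -1, -1, -1]
yearly, curve = chain(monthly, 6)
print(yearly, curve)  # [5, -8] [5, -3]
```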


#19 2009/12/04 (Fri) @ 13:14

Interesting, and yes quite an undertaking of data.  I was thinking of a smoother “bin” method, but definitely similar to that.  I was thinking of some combination of hazard functions and propensity score matching that I haven’t seen much of before in ANY literature (which makes me think my ability to apply it would be extremely limited if it’s even a possibility).  I assume this is something similar to what PECOTA does, though.

Just a question about the “5 year following average”.  Is it possible there are some truncation issues with dropouts in those following 5 seasons?  Or that perhaps 4 good years were not accounted for, given there was terrible performance in the first year following and the player got no chance to peak beyond the original year?  Should it be conditioning Year 5’s production on the fact that they played in Year 4, and Year 4 on the fact that they played in Year 3, etc.?  If not, there could be some survival bias issues where performing well in Year 5 doesn’t assume high performance in Year 1.  Whether this has any significant effect, I’m not sure.


#20 Tangotiger 2009/12/04 (Fri) @ 14:43

Millsy: what the “following 5 years” shows is the level of production in MLB.  If you want to argue that it could be higher, had the player shown a better “year one”, then sure, that’s certainly true.

All I’m showing is what kind of production you are going to get, if you let the management decide who gets to keep playing.

That is, I could scream until I’m blue in the face that so-and-so’s below-average production should be given limited weight in light of his star performance the prior 4 years, but it’s management that will decide how much this guy is going to play.

So, this is what we are measuring: observed future production, given his prior history and age.
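A sketch of the quantity described here, observed production over the following five seasons, where a season spent out of MLB simply contributes nothing. The dict-per-player layout and the Win Shares-style numbers are hypothetical, and this is not the function from post 13 of the linked thread.

```python
def following_production(career, age, years=5):
    """Production in the `years` seasons after `age`; a season out of MLB adds 0."""
    return sum(career.get(a, 0) for a in range(age + 1, age + 1 + years))


def average_following(players, age, years=5):
    """Average following production over every player who had a season at `age`."""
    eligible = [career for career in players if age in career]
    return sum(following_production(c, age, years) for c in eligible) / len(eligible)


# Hypothetical careers: {age: Win Shares-style production}.
players = [
    {30: 20, 31: 8, 32: 5},                          # released after age 32
    {30: 22, 31: 21, 32: 19, 33: 18, 34: 15, 35: 12},
    {29: 10, 31: 10},                                # no age-30 season, so excluded
]
print(average_following(players, 30))  # 49.0
```

Management’s decisions are baked in: the released player contributes only what he actually produced.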


#21 MGL 2009/12/04 (Fri) @ 16:36

"What he’s actually saying is: “Based on similar players who played full-time for the next 6 years, if he can continue to be a full-time pitcher, he’ll have a 4.00 ERA as a 35-yr old pitcher.”

When you add all those conditions to the statement, the answer becomes far less interesting.  Indeed, what we want to know is: “How much should we pay Josh Beckett for the next 6 years”.

Presuming JC’s aging trajectory is not going to help us.”

Agree 100%, but the traditional delta method of looking at all players who played at each age pair is not going to help either.  What you want to look at for that particular question are pitchers who were above average and who played for 8 or 9 years starting at around age 20 (and are big and strong power pitchers, etc.).


#22 Tangotiger 2009/12/04 (Fri) @ 17:01

Agreed 100% with MGL/21.

In that case, this is going to end as a boring thread.


#23 2009/12/04 (Fri) @ 17:57

Thanks for clarifications, etc.  I’m very curious to see MGL’s study.


#24 MGL 2009/12/04 (Fri) @ 19:12

It is going slower than I thought.  There are several good ways to do the “delta method” and accounting for survival bias is tricky.  As well, how much using any kind of delta method applies to any particular player is far from being clear, as is how to articulate the results from a delta method.

Wow, Amanda Knox guilty!  I am not surprised but I think she may have gotten railroaded, even though I am obviously not privy to the details of the case.


#25 Zach 2009/12/05 (Sat) @ 01:49

I wonder if using rates to bin players in above average, average, and below average, instead of raw totals, would get rid of the survivor bias. Take all players with career OPS’s over .850 as the top group instead of, for instance, 500 home runs.


#26 acerimusdux 2009/12/05 (Sat) @ 16:52

I wonder if players with great longevity don’t peak later, regardless of talent level. Not everyone ages at the same rate. If my biological clock simply ticks slower than yours, I might peak later, but play to a later age.

It might be interesting to only look at players who played to an advanced age, but weren’t especially good, who were maybe average performers for their best years. At what age did they peak?

Uniform aging curves based only on age seem to be only a rough approximation. Some players are good prospects who end up busts because they seem to have peaked at age 23, others make the majors at a very late age and unexpectedly go on to be productive regulars. If only there were a way to tell beforehand which were which!


#27 MGL 2009/12/05 (Sat) @ 18:57

"If only there were a way to tell beforehand which were which!”

To some extent you can, since a lot of it has to do with body type and physiological and not chronological age. For example, there was a study which showed that college draftees peaked later than high school draftees, which makes sense since high school players who are drafted tend to be more physically developed and likely have a higher physiological as opposed to chronological age than college draftees.  IOW, an 18 year old high school player who is drafted is more like a 20 year old in terms of height, weight and body type.


#28 philly 2009/12/05 (Sat) @ 20:42

mgl/27

That’s very interesting.  Do you have a link or a recollection where that study was posted?


#29 Tangotiger 2009/12/10 (Thu) @ 22:16

Phil:

http://sabermetricresearch.blogspot.com/2009/12/bradbury-aging-study-re-explained-part.html

It’s like … suppose you want to find the average age when a person gets so old they have to go to a nursing home. And suppose you look only at people who were still alive at age 100. Well, obviously, they’re going to have gone to a nursing home late in life, right? Hardly anyone is sick enough to need a nursing home at 60, but then healthy enough to survive in the nursing home for 40 years. So you might find that the average 100-year-old went into a nursing home at 93.

But that way of looking at it doesn’t make sense: you and I both know that the average person who goes into a nursing home is a lot younger than 93.

But what Bradbury is saying is, “well, those people who went into a nursing home at age 65 and died at 70 … they must have been very ill to need a nursing home at 65. So they’re not relevant to my study, because they didn’t go in because of aging – they went in because of illness. And I’m not studying illness, I’m studying aging.”

That one difference between us is pretty much my main argument against the findings of the study. I say that if you omit players like Giles, who peaked early, then *of course* you’re going to come up with a higher peak age!

Bradbury, on the other hand, thinks that if you include players like Giles, you’re biasing the sample too low, because it’s obvious that players who come and go young aren’t actually showing “aging” as he defines it. But, first, I don’t think it’s obvious, and, second, if you do that, you’re no longer able to use your results to predict the future of a 26-year-old player. Because, after all, he could turn out to be a Marcus Giles, and your study ignores that possibility!

All you can tell a GM is, “well, if the guy turns out not to be a Marcus Giles, and he doesn’t lose his skill at age 31 or 33 or 34, and he turns out to play in the major leagues until age 35, you’ll find, in retrospect, that he was at his peak at age 29.” That’s something, but … so what?

I’m certainly willing to agree that if you look at players who were still “alive” in MLB at age 35, and played for at least 10 years, then, in retrospect, those players peaked at around 29. And I think Bradbury’s method does indeed show that. But if you look at *all* players, not just the ones who aged most gracefully, you’ll find the peak is a lot lower. There are a lot of people in nursing homes at age 70, even if Bradbury doesn’t consider it’s because of “aging.”

Ditto.
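The nursing-home argument can be reproduced with a toy simulation in which every hypothetical player has his own talent and his own peak age (normal around 27 with a standard deviation of 2 years), and skill falls off as a parabola around that peak. Conditioning on still being above replacement at 35 plays the role of looking only at people alive at 100. All parameters are assumptions.

```python
import random
from statistics import mean


def simulate_cohort(n=20000, seed=3, replacement=-1.0, curvature=0.02):
    """Mean peak age of all simulated players vs. those still in MLB at 35.

    Assumptions: talent ~ N(0, 1), individual peak age ~ N(27, 2), and a
    player's skill at age a is talent - curvature * (a - peak) ** 2.
    """
    rng = random.Random(seed)
    all_peaks, survivor_peaks = [], []
    for _ in range(n):
        talent = rng.gauss(0.0, 1.0)
        peak = rng.gauss(27.0, 2.0)
        all_peaks.append(peak)
        # "Alive at 100": still above replacement level at age 35.
        if talent - curvature * (35 - peak) ** 2 > replacement:
            survivor_peaks.append(peak)
    return mean(all_peaks), mean(survivor_peaks)


all_mean, survivor_mean = simulate_cohort()
print(f"all players: {all_mean:.1f}, still playing at 35: {survivor_mean:.1f}")
```

Nothing about how any individual ages differs between the two groups; only the selection does, and it alone pushes the survivors’ average peak later.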


#30 MGL 2009/12/11 (Fri) @ 08:16

Finished my study, which is pretty good I think.  The article is quite long.  Pretty much says what we know already, but I think it is worth reading. 

I’ll summarize: Using the delta method and correcting (as best as I can) for survivor bias, peak age is anywhere from 27-28, depending on the era.

If I restrict it to players with 10 years and 5000 PA, it is around 29-30, depending on era, and a much more gradual decline into the 30’s.

If I use only those players with at least 1000 career PA (but no min number of seasons), the aging curve is not much different than for all players.


#31 Guy 2009/12/11 (Fri) @ 11:20

MGL:  Has this been, or is it being, published somewhere?


#32 Tangotiger 2009/12/15 (Tue) @ 15:27

Normally, I’d create a new thread.  But, I’d like for this to be the permanent home for all JC-related musings.

In the last few days, he has said:
http://www.sabernomics.com/sabernomics/index.php/2009/12/astros-sign-lyon/

I’ve got [Brandon Lyon] valued at $17.5 million over the next three seasons, so the deal seems about right to me.

And:

Halladay will make almost $16 million and I have him being worth around $17 million. Lee will make $8 million, and I estimate him being worth $15 million.

Think about that.  Roy Halladay, in the running for pitcher of the decade, and STILL at the top of his game, is worth as much in 2010 as Brandon Lyon is worth in 2010-12 (according to JC).

Here’s more to think about: in any given season, Roy Halladay will pitch a bit over 200 innings.  In any THREE seasons, Brandon Lyon will pitch a bit over 200 innings total.

So, JC is saying that Halladay’s 200 future innings is worth 17MM$ and Lyon’s 200 future innings is worth 17.5MM$.

And somehow, his model has it that a reliever with a career 4.20 ERA is equal to a starter with a career 3.40 ERA?
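Tango’s per-inning comparison, written out. The dollar figures are JC’s valuations as quoted, the 200-inning counts are Tango’s round numbers, and the ERAs are the career figures cited just above.

```python
# Valuations as quoted from JC; innings are Tango's round numbers.
valuations = {
    "Halladay (2010)": (17_000_000, 200),
    "Lyon (2010-12)": (17_500_000, 200),
}
per_inning = {name: dollars / innings for name, (dollars, innings) in valuations.items()}
for name, rate in per_inning.items():
    print(f"{name}: ${rate:,.0f} per inning")
# Halladay (2010): $85,000 per inning
# Lyon (2010-12): $87,500 per inning

# Career ERAs cited above: runs separating the two over those 200 innings.
runs_gap = (4.20 - 3.40) * 200 / 9
print(round(runs_gap, 1))  # 17.8
```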

And if you think that JC’s model is broken, think again:
http://www.sabernomics.com/sabernomics/index.php/2009/12/the-halladay-lee-blockbuster/

In all seriousness, starting pitchers is the one place where my estimates have been off in free-agent salary comparisons, on the low side. I designed my method to estimate fundamental values from baseball revenues, and after staring at the data for hours on end I’ve reached the conclusion that starting pitching is overvalued right now—call it the new market inefficiency.

Wow. 

***

Millsy, as the sole person who visits this blog who has been, if not a staunch defender of JC, at least one who is willing to give him the benefit of the doubt, the floor is yours.

Everyone, please, let Millsy give his best defense before you tell JC exactly what you think of his system.


#33 Guy 2009/12/15 (Tue) @ 15:39

Not to short-circuit Millsy, but I will contribute two things in JC’s defense here:

1) in the past, he completely rejected the idea of incorporating leverage into his valuation of relievers (as emphatically as he rejected replacement value).  He now says that relievers need to be more highly valued on a per-inning basis.  I credit him with acknowledging his earlier error and changing his approach.  (Which isn’t to say I necessarily agree with the Lyon valuation.)

2) In most cases, JC’s value estimates seem to come in pretty close to what the market is saying.  To me, that’s a mark in favor of his model.  Much of sports economics is devoted these days to identifying alleged inefficiencies in the market—it boils down to economists saying “we’re smarter than the sports decision makers.” The problem of confirmation bias is endemic in this area, as the researchers stop once they think they find an inefficiency (Berri is the worst offender here, but has lots of company.) But invariably, the SME’s then discover a flaw in the analysis.  To his credit, Bradbury is more inclined to think that baseball market outcomes are rational.  (Notwithstanding his new theory that starters are overpaid.)
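Guy’s first point concerns leverage. A common way to express it is to scale a reliever’s runs above replacement by a leverage multiplier; some public WAR implementations damp that multiplier to (1 + LI) / 2 rather than using LI directly. The replacement ERAs and the leverage index below are hypothetical, and this is not a reconstruction of JC’s model.

```python
def runs_above_replacement(era, innings, replacement_era):
    """Runs saved relative to a replacement-level pitcher over `innings`."""
    return (replacement_era - era) * innings / 9


def leveraged_runs(era, innings, replacement_era, leverage_index, damp=True):
    """Scale runs by leverage; damp=True uses (1 + LI) / 2 as the multiplier."""
    multiplier = (1 + leverage_index) / 2 if damp else leverage_index
    return runs_above_replacement(era, innings, replacement_era) * multiplier


# Hypothetical inputs: replacement ERAs and leverage index are assumptions.
starter = leveraged_runs(3.40, 200, replacement_era=5.50, leverage_index=1.0)
reliever = leveraged_runs(4.20, 200, replacement_era=4.75, leverage_index=1.8)
print(round(starter, 1), round(reliever, 1))  # 46.7 17.1
```

Under these placeholder inputs the reliever’s 200 leveraged innings are worth well under half of the starter’s; different replacement levels or leverage values would change the gap, which is the crux of the Lyon comparison.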


#34 2009/12/15 (Tue) @ 15:41

I have had little to comment on with respect to JC Bradbury’s salary model.  I’m in no position to defend someone’s model I know nothing about.  That is silly to me.  If you want a defense, ask that person.  It’s not my place, and I refuse to pretend otherwise.

SP vs. RP is something I’ve been curious about but am not sure how to value.  But I did see that JC seems to admit his model purposefully undervalues starting pitching.  He seems to think it’s a market inefficiency.  I honestly don’t know and I’m not convinced this is the case either (in fact, I’d argue the other way around: I think teams value RP a bit too much, especially when it comes to the type of stats they often seem to base their contract on--this is something I’ve stated in the past, but have no true empirical defense for).

My sympathies with Bradbury were on some of his perception of how the community sometimes attempts to communicate with others, not necessarily his salary model.  Maybe I’m wrong in agreeing with him (and others) on that part.  I think there is a pretty interesting discussion on aging going on on this site right now--a discussion in which I’ve stated I think both using a delta-type method AND using a fully known career curve are interesting additions.

I have no way to evaluate it other than the ‘ocular test’--or just eyeballing it.  I found the Francoeur debate here to be a bit silly on his part.  I’ve never claimed to know more than you or Bradbury about the subject of baseball analysis, but I do think he has some valuable insight into economics of sport, as I feel this site has an enormous amount of good information on evaluating player performance.  Hence, my visiting both sites.

Other than with respect to perceptions of interactions among a community, please tell me where I give him the benefit of the doubt so assuredly?


#35 Tangotiger 2009/12/15 (Tue) @ 16:02

Millsy: MGL used to be the one the most pro-JC (or least anti-JC).  Millsy is now the most pro-JC (or most neutral-JC).  JC refused to comment on this blog (he posted twice here, and neither post makes him look good), even after I asked him to move forward on a productive discussion.  He’s not interested in interacting with me or the readers here (though I have no doubt he tries to learn as much as he can from us… only a complete moron would refuse to learn from us because he has a personal issue with us).  Anyway, I wanted the best possible advocate for JC, and I thought you were it.

***

Anyway, Guy made the best case possible, I think.

So, everyone, feel free to post whatever you want to post.

***

I will also say that I would welcome JC to join the discussion here, where all attacks are above the belt, and all bouts end with a hearty round of drinks.


#36    Tangotiger      2009/12/15 (Tue) @ 16:04

Guy: if 200 innings of Brandon Lyon is worth 17.5MM$, I wonder what 200 innings of Joe Nathan and Jon Papelbon are worth?


#37    Guy      2009/12/15 (Tue) @ 16:12

Tango, I’m not defending the model.  In any case, that’s impossible since he hasn’t explained the new model AFAICT.  I assume that will come in his next book.  Which is fine, but until then I’m not sure why he expects anyone to care what his valuations say. 

I do think you’ve picked JC’s least plausible valuation to highlight (of those I’ve seen).  Are there others you have a big problem with?


#38          2009/12/15 (Tue) @ 16:14

I understand, Tango.  I just don’t feel that it would be good practice for me to do that to a significant extent.  This is especially true when I have no idea what the model entails (nor have I gone through all the data myself, as I’ve played around with some different types of sports econ stuff).  No harm, I just wanted to be clear on that.  I understand why you didn’t expect a response from JC as well.

Perhaps he has a more positive projection of Lyon than a 4.3 ERA?  I dunno.  If for some reason a model assumes Lyon will produce at a comparable talent level to Halladay for short outings, and he is only used in short outings, I guess it would make some sense, or at least give a value output in line with that sort of improvement+leverage advantage.


#39    Tangotiger      2009/12/15 (Tue) @ 16:29

Guy: I understand you were just trying to rationalize it, not defend it.

And, yes, I have to pick the least plausible, because that’s how you can tell if a system works or not.  We’re not going to learn anything if I take an average pitcher or an average nonpitcher.  Presumably, the average is calibrated to the average.

I’m sure he’s got Brandon Lyon as a better than 4.20 career ERA for his forecast.  It would still be hard for anyone to say that 200 innings from Lyon, at whatever reasonable forecast you want to give him, is going to be valued the same as 200 innings from the best Cy Young contender, and 2.5MM$ more than Cliff Lee.


#40    Colin Wyers      2009/12/15 (Tue) @ 17:58

There is no point defending Bradbury’s model (or decrying it, for that matter) until he actually publishes it, or at least publishes enough information that we can make an educated guess about it.

It may not be worth it even then.


#41    Tangotiger      2010/01/11 (Mon) @ 17:50

JC publishes on BPro:
http://www.baseballprospectus.com/article.php?articleid=9933

Can someone cut/paste the part where JC talks about MGL?  Thanks…

***

Good comments from the BPro readers.  And, to a few of them, it looks like aging articles are a brand-new thing too.


#42          2010/01/11 (Mon) @ 18:02

I guess I shouldn’t post with my name so I don’t get in “trouble”. Too late… Let me know if you would like more from the article.

Recently, Mitchel Lichtman used a modified delta-method approach to quantify aging that attempts to correct for the survivor effect. His solution was to include players who would typically be dropped from the sample (i.e., players who do not play in consecutive years) by assigning them hypothetical performances for the following year. This sounds good in theory, but where does this hypothetical performance come from? Lichtman explains:

The projection is their last three years lwts per 500 PA, weighted by year (3/4/5) added to 500 PA of league average lwts for that age minus five. In other words, I am regressing their last three years lwts (weighted) toward five runs lower than a league average player for that age.

While Lichtman believes using five runs below average generates a “conservative” projection, the substitution is just a guess informed by nothing more than a hunch. In this case, the guess imposes the outcome for the exact factor we are trying to measure: the estimated decline is a pure product of the assumption. Thus, it is no surprise that Lichtman’s adjusted delta-method estimates yield results that differ little from his raw delta-method estimates.

The good news is that we don’t have to guess what players might have done; instead, we can look at how players who continue to play over several years age and not rely on snapshots of one-time annual changes. Such a sample has two advantages over the delta method. First, the fluctuations in player performance due to random noise—not aging—will be smoothed out over time to generate a trend. Second, and more importantly, it allows us to track individual players over time to see how each player progresses according to his own unique baseline.

Doing this analysis requires using a multiple regression analysis technique that allows us to observe how a cross-section of players change over time, while controlling for other potential factors that affect performance but have nothing to do with age. Part II describes the study I conducted using this method and discusses its results.
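The projection MGL describes in the quoted passage can be sketched as a weighted average regressed toward a prior. This is only one reading of his wording: it assumes the 3/4/5 weights multiply both each season's runs and its PA, with the most recent season weighted 5, and that the 500 PA of "league average for that age minus five" enter unweighted. The function name and parameters are hypothetical.

```python
def projected_lwts_per_500(seasons, league_avg_per_500, offset=-5.0,
                           regression_pa=500.0, weights=(3, 4, 5)):
    """Sketch of a regressed projection in runs (lwts) per 500 PA.

    seasons: up to three (lwts, pa) tuples, oldest first, most recent last.
    league_avg_per_500: league-average lwts per 500 PA for the player's age.
    offset: runs added to the league average before regressing (-5 in the quote).
    """
    # Assumed weighting: most recent season 5, then 4, then 3.
    used_weights = weights[len(weights) - len(seasons):]
    weighted_runs = sum(w * runs for w, (runs, _) in zip(used_weights, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(used_weights, seasons))
    # Regress toward (league average + offset) by mixing in regression_pa of that rate.
    prior_runs = (league_avg_per_500 + offset) * regression_pa / 500.0
    return 500.0 * (weighted_runs + prior_runs) / (weighted_pa + regression_pa)


full_time = [(10, 500), (10, 500), (10, 500)]
print(projected_lwts_per_500(full_time, 0.0))               # offset of -5
print(projected_lwts_per_500(full_time, 0.0, offset=-10.0))  # offset of -10
```

For a player with three full 500-PA seasons, the weighted PA total is 6000, so in this sketch moving the offset from -5 to -10 shifts the projection by only 5 × 500 / 6500 ≈ 0.38 runs per 500 PA. That is consistent with MGL's later remark that the exact offset changes little; the offset matters more for players with few recent PA.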


#43    Tangotiger      2010/01/11 (Mon) @ 22:20

It sure would help BPro readers if someone at BPro did a summary of MGL’s articles at THT in an Unfiltered post. 

Really, the readers there, who see the obvious selection bias issues as well as the lack of forecasting potential, would appreciate MGL’s article.


#44    Nick Steiner      2010/01/11 (Mon) @ 22:27

^^ hint hint (Colin) hint hint


#45    MGL      2010/01/11 (Mon) @ 22:48

Yes, the “regressing to 5 runs worse than average” was just a complete guess, but it won’t change the results much at all if I use a “normal” mean or even 10 runs worse than average.  So JC’s point is completely invalid.  It is a straw-man argument I think.

And again, we have the problem with JC using players who have already played a long career, after the fact.  There is virtually no use for that kind of trajectory. He is not wrong in declaring that, “This is the observed trajectory for players who have played 10 years and 3000 PA (or whatever),” but to say, “Then this is also the trajectory for the ‘average’ MLB player,” is just plain wrong.  That trajectory is useless for helping us project player performance.

In a post a few weeks ago, I showed the future trajectory for a player who played at least 5 years and 2000 PA before the age of 27, or something like that. It was pretty much like my generic trajectory and nowhere like JC’s trajectory.  That suggests that his trajectory is extremely esoteric and almost surely not typical.  It is a very biased one and almost surely one produced by sheer luck.  IOW, I doubt that those are “true trajectories” (representing the change in “true talent” as a player ages).

Plus, there is the fact that when I did the 1000 career PA guys, I found a trajectory nowhere near what JC claims for those guys.  I seriously suspect that he is mistaken about that.  Let’s just say that I would like to see his data on that.  All I remember from his article is him saying that when he reduced the requirements to players with only 1000 PA (and no number of years I think), he got pretty much the same results (as his 10 years, 3000 PA or whatever it was). I think that is pure B.S.


#46    MGL      2010/01/11 (Mon) @ 23:02

JC also writes this (on BP):

The mode peak age will likely lie below a player’s expected peak, because there are two main factors that cause players to decline and leave the game: aging, and non-aging-related injuries.

Injuries and age are correlated, but players also suffer performance declines that have nothing to do with age. For example, in 2008 Chien-Ming Wang severely damaged his foot while running the bases, and it’s unlikely that Wang will pitch as well as he did when he was 27. This injury could have struck Wang just as easily at 24 or 36. Because players who wash out of baseball are normally replaced by young players, there will be many more players having their best season at a young age. When we look at the mode, we are not differentiating from the cause of deterioration, however; non-aging attrition gives more players the opportunity to have peak ages earlier than later. Studies that employ the mode to measure aging inadvertently pick up this bias, which has nothing to do with aging.

IMO, once you start trying to separate aging from injury, you are getting into a whole new arena.  I have always assumed that they are inextricable and that when we talk about aging, we are also talking about age-related injury.  Surely even when a player suffers a freak catastrophic injury, it is often (maybe always) related to age. If I fall off a bike I am a lot more likely to break something than my son is.

JC saying that he is talking about pure aging and not “non-age related” injury and that everyone else is conflating the two seems silly to me.  Maybe if we take away injury from the equation, JC’s numbers are more right than everyone else’s, but darned if I know how to do that or if I want to do that.

Here is what JC said about the 1000 career PA sample of players as compared to the 5000/10 year (I said 3000, but I meant 5000) players:

As a final test, I estimated the model using a larger sample of hitters who only had 1,000 career plate appearances. This had very little effect on the estimated aging function. The diagram below maps the aging functions for the original and less-restrictive samples, with the improving and declining performance measured in standard deviations below the peak. The estimated peak ages in both samples are virtually identical, and the rises and declines are similar. Thus, it appears that the sample restrictions are not biasing the estimates of peak age upwards.


#47    MGL      2010/01/11 (Mon) @ 23:23

Here is what I wrote in the comments section on BP, which pretty much sums up my thoughts on the matter.  JC’s study, his comments, and his “followers” are really starting to piss me off.  I’d like you guys to read these comments below very carefully.

This trajectory has no useful value. It surely cannot be used for any projection purposes.  It simply tells us the average “observed” (which is very different than “true,” as I explain below) trajectory for very good players who had long and prosperous careers.  Those players are a very small subset of all players at any age, especially at the younger ages (what percentage of young players end up having a career of at least 10 years and 5000 PA with at least 300 PA per year?).

So he comes up with an observed aging curve for a very, very small subset of players who by definition peak late and age gracefully (gradually).  If we assume that all players have somewhat different “true” aging curves (if for no other reason than differences between their physiological and chronological ages), his subset of players is one that necessarily is going to have an aging curve with a late peak and a gradual decline - otherwise they likely would not have lasted that long and played as much and as regularly as they did. 

In addition to that, and to make matters worse, the trajectory he found is not even a trajectory of “true talent.” Because of his selection bias, it will necessarily be comprised of players who, by chance alone, had late peaks and gradual declines.

To illustrate that, let’s say that all players have the exact same true aging curves.  Now, if we let all players play 10 years, obviously by chance alone, some players will peak at 26, some at 32, etc. (even though they all have the same true peak).  And some players will have steep post-peak (and pre-peak of course) trajectories and some will have shallow ones (in fact, every possible shape will occur if we have enough players in our sample), again by chance alone, even though they all have the same “true” shape. The players who peak late by chance and have a gradual performance decline by chance alone will tend to dominate JC’s sample.  Basically JC’s sample (a VERY small subset of MLB players) consists of players who have true trajectories that peak late and decline gradually AND players whose “observed” trajectories peaked late and declined gradually, by chance alone.  Is it any surprise that he finds a peak of 29 or 30 and a gradual decline after that?  Heck, if we look at players who played 15 years and 7000 PA, we are likely to get a later peak and more gradual decline still!  Does the name Bonds sound familiar?

I am sorry, but with every fiber in my body, I think that it is ridiculous to characterize JC’s resulting trajectories as “of the typical MLB player,” or some such thing.  It is an “observed” (as opposed to “true” - representing the changes in true talent of a player over time) aging trajectory of a very small subset of players who we already knew had long and prosperous careers.  Nothing more and nothing less.  Can someone tell me any practical use for this kind of data?
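MGL's thought experiment can be checked with a small simulation. Every simulated player shares one true aging curve (assumed here to peak at 27), differs only by a constant talent level, and gets independent season-to-season noise; a player counts as a "survivor" only if his observed performance stays above an assumed replacement bar in every season from 24 to 35. All parameters are made up for illustration, and averaging by age over survivors is a simplified stand-in for, not a reproduction of, JC's regression.

```python
import random

AGES = range(24, 36)  # the age-24-to-35 window discussed in the thread


def true_curve(age):
    # Assumed shared true-talent curve, peaking at 27 (runs per 500 PA vs. peak).
    return -0.6 * (age - 27) ** 2


def simulate(n_players=20000, talent_sd=10.0, noise_sd=12.0,
             threshold=-20.0, seed=1):
    """Return mean observed performance by age for everyone and for survivors."""
    rng = random.Random(seed)
    all_sums = dict.fromkeys(AGES, 0.0)
    kept_sums = dict.fromkeys(AGES, 0.0)
    n_kept = 0
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)  # player's level, constant over career
        observed = {a: true_curve(a) + talent + rng.gauss(0.0, noise_sd)
                    for a in AGES}
        for a in AGES:
            all_sums[a] += observed[a]
        # Hypothetical survival rule: stay above the bar in every season.
        if all(observed[a] >= threshold for a in AGES):
            n_kept += 1
            for a in AGES:
                kept_sums[a] += observed[a]
    everyone = {a: all_sums[a] / n_players for a in AGES}
    survivors = {a: kept_sums[a] / n_kept for a in AGES}
    return everyone, survivors, n_kept


def decline_27_to_35(curve):
    return curve[27] - curve[35]


def peak_age(curve):
    return max(curve, key=curve.get)


everyone, survivors, n_kept = simulate()
print("survivors:", n_kept)
print("peak age, everyone vs. survivors:", peak_age(everyone), peak_age(survivors))
print("decline 27->35, everyone vs. survivors:",
      round(decline_27_to_35(everyone), 1), round(decline_27_to_35(survivors), 1))
```

Because survivors must clear the bar at 33, 34 and 35, the filter mostly keeps players whose luck ran high at those ages, so their observed decline from 27 to 35 comes out noticeably gentler than the shared true curve, even though no player in the simulation actually ages better than any other. Whether the observed peak age also moves later depends on the made-up parameters; the sketch makes no claim about that.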


#48    Fargo      2010/01/11 (Mon) @ 23:33

re #2: “At what age do economists start to decline?”

You mean economists who never make it to a major league university? Kennesaw State University is Carolina League level.


#49    Tangotiger      2010/01/12 (Tue) @ 00:12

Ditto MGL/47 in its entirety.


#50          2010/01/12 (Tue) @ 01:21

Fargo, I would never discount someone solely on the basis of their institution of employment (or their alma mater either). Basing it on JC’s work and his discourse with others, however....


#51    MGL      2010/01/12 (Tue) @ 01:46

I am no fan of JC, and he seems to like criticizing the sabermetric crowd for often no good reason, but sorry Fargo, cheap shot....


#52    Blackadder      2010/01/12 (Tue) @ 06:52

It is a bit cheap to go after JC for his university affiliation.  Granting that, JC likes to play the credentialism card against online criticism to a rather sickening degree, which of course is practically inviting people to examine said credentials.


#53    Guy      2010/01/12 (Tue) @ 07:47

I of course agree with MGL’s overall criticisms.  Two small points:

1) Bradbury’s selection bias is actually even worse than requiring 5000 PAs and 10 seasons: the player has to meet those thresholds specifically from ages 24 to 35.  So Carlos Baerga, despite almost 6000 PA and 10 300-PA seasons, doesn’t qualify because about 30% of his PAs came before age 24.  His peak at ages 23-24 gets left out. 

2) I doubt JC is wrong in reporting that his results are the same including players with 1000+ PAs.  Most of the short-career players only play a few seasons in their mid-20s, and he’s controlling for their career mean, so it’s easy to see how those extra data points don’t change his overall curve very much.  (Of course, if you add hundreds of players to your sample who peaked at an earlier age than your original sample, and your method still reports the same peak age, you could conclude there’s a problem with the methodology rather than seeing it as vindication.)


#54    tangotiger      2010/01/12 (Tue) @ 09:02

Can someone ask JC this: if the peak age is 29-30 under your original criteria (say players n), and it is also the same when you lower your threshold to 1000 PA with no seasonal requirements (say players n+x), then what would the peak age be for just the guys that were added to your original sample (players x)?

If he reports the same 29-30, then I find it hard to believe.  If he reports 27-28, then ask him how come these guys are so underrepresented in his n+x study.

The answer, I’m afraid, is that we don’t understand his methodology, and we have to go to school to understand him.
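Tango's question can be framed with a little algebra. Assume, purely for illustration, that the original players n and the added players x each follow a quadratic aging curve with the same curvature c but their own level h and peak p, and that the pooled curve behaves like a simple average that puts weight w on group n at every age (a simplification; a regression with player-level controls, as Guy notes above, need not behave this way):

$$
\bar f(a) = w\,\bigl[h_n - c\,(a-p_n)^2\bigr] + (1-w)\,\bigl[h_x - c\,(a-p_x)^2\bigr]
$$

$$
\bar f'(a) = -2c\,\bigl[w\,(a-p_n) + (1-w)\,(a-p_x)\bigr] = 0
\;\Longrightarrow\;
a^{*} = w\,p_n + (1-w)\,p_x,
\qquad
1-w = \frac{p_n - a^{*}}{p_n - p_x}
$$

With hypothetical values p_n = 29.5 and p_x = 27.5, equal weights (w = 0.5) would put the pooled peak at 28.5. A pooled peak still at a* = 29.5 requires 1 - w = 0, meaning the added players carry essentially no weight, which is the underrepresentation Tango is asking about.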


#55    Tangotiger      2010/01/12 (Tue) @ 13:17

Maybe someone can send “sockeye” a link to MGL’s own study.

sockeye
(31659)

MGL,

This particular topic is new to me and I come into it with an open mind, unfamiliar with whatever controversy there is around it. Your “Us vs. Them” approach isn’t working for me here. I’m sorry if you don’t think an economist can cross over into baseball unless he’s got a 100% watertight study, but unfortunately, this is actually how research often progresses. It is increasingly accepted within scientific fields.

I say this because I went and downloaded Bradbury’s full article last night, and poked around enough on the intertubes to find that your initials are everywhere on this critique. Let it go, man, you’re beating a dead horse, and are so wrapped up in it you can’t see the good. You focus entirely on whether the study is flawless (it’s not, and he says so), and not on what kind of instruction it can provide going forward. Honestly, it smacks of a grudge, and such things don’t come off well within debate over published research.

Bradbury states in his ABSTRACT that he’s trying to “...isolate the effect of age on several player skills...” Paring a dataset down to try to understand underlying processes is OK, as long as you acknowledge it. He does this. It doesn’t mean he’s saying that all players will peak at 29. He’s providing evidence that these skills might do so, unfettered by things like injury, and limited to players good enough (or lucky enough, or persevering enough) to play during those ages.

Here’s some text from his conclusion:

“...while controlling for many exogenous factors that influence player performances and might inhibit the isolation of the impact of ageing, this study finds that both groups reach peak performance around age 29. Players peak earlier in skills that require more athleticism, and later in those that require less athleticism.”

I don’t see him making the claims that it’s the end-all study that you seem to think he’s saying it is, or that it predicts what will happen to a given player. He’s trying to isolate factors so he can look at the effect of ageing on SKILLS, sans confounding factors.

Your choices, as I see it, are to throw rocks all over the intertubes about it not being representative because of the filtering (or because he’s not in your sandbox), or to do some follow up research to find out HOW REPRESENTATIVE it might be if you applied it to other, different populations.

His model and methods are well documented. With your skills, you could easily replicate it, then compare the shape and fit of different models that address your concerns. What would happen if you widened the age group, still including age-29? Compressed it? Fit different curves and compared them with an AIC? That’s how you win the argument, not by saying “There is no one in his corner that I am aware of, at least that actually does any serious baseball work.”

Bradbury has provided something possibly worth building on, and instead it sounds like a very entrenched faction is just telling him to get out of their sandbox.

And MGL has no grudge.  Indeed, he came pretty late to this discussion, and he gave some sober data.

The problem is that we’ve already won this argument, but BPro has chosen to give JC yet another platform without vetting it among their own team.

If there is one saberist on BPro that supports JC’s article, please step forward.

BPro would have been better off posting Phil’s critiques, since he gives the balanced views here.  Especially since this topic is brand-new to most BPro readers to begin with.


#56    Tangotiger      2010/01/12 (Tue) @ 13:19

By the way, BPro, why don’t you hire Phil to be “ombudsman”.  He’s the most respected of all the analysts out there, the one guy who is balanced, and is fantastic at introducing topics and making sense of them.  He’s the one guy, above all else, who can introduce a topic like no one else can. 

And if you appoint him ombudsman, then we know that he’s untouchable.


#57    Mike Fast      2010/01/12 (Tue) @ 13:33

The problem is that we’ve already won this argument, but BPro has chosen to give JC yet another platform without vetting it among their own team.

If there is one saberist on BPro that supports JC’s article, please step forward.

BPro would have been better off posting Phil’s critiques, since he gives the balanced views here.  Especially since this topic is brand-new to most BPro readers to begin with.

I agree.

I am really disappointed to see this.  I am still waiting to see any evidence that the direction of BP is really changing from the top.  It’s great that they’ve brought Eric, Matt, Russell, and Colin onboard (and apologies if I forgot someone else).  Really it is.  But if Will, Kevin, and Christina are going to continue to drive the overall tone of the site, I’m not optimistic that it’s going to be any more of a friend to honest investigation of the facts than it has been.  Giving JC the platform that they just gave him was a stupid thing for them to do, and either (1) they didn’t realize that, which means the new authors have no real voice in the direction of the site, or (2) they realized it and went ahead with it anyway, which is probably worse than option 1.

The reading population at BP is largely unaware of the best saber work out there and BP shares a huge chunk of blame for that.  Allowing a few throwaway references to other sites is not going to change that.  It’s going to take a real attitude change by the powers at BP to make them a real contributing and involved member of the community again, and so far I don’t see much evidence of that.

I am disappointed because I had hoped that the recruiting of new authors was a sign of real change behind the scenes.  But if it’s only a few new guys writing columns to stanch the bleeding of the subscriber base and pretty up their stats, I’ll pass and leave BP in the sabermetric dustbin where it’s been for a while now.


#58    Nick Steiner      2010/01/12 (Tue) @ 13:43

And if you appoint him ombudsman, then we know that he’s untouchable.

That sounds like the mafia wink


#59    Nick Steiner      2010/01/12 (Tue) @ 14:01

I agree with Mike and Tango: it’s disgusting that BPro is giving JC a platform for this, and it displays an utter lack of awareness of the current sabermetric research (or perhaps it’s intentional… they need to be “unique” after all).

Their readers are even worse.  I believe that JC linked to MGL’s THT articles (or at least one of them) in the OP?  Well according to THT referral logs, there was ONE visit coming from a referral from the BPro article. 

Only one fucking reader actually clicked the link, yet you have guys like sockeye criticizing MGL.


#60    Tangotiger      2010/01/12 (Tue) @ 14:31

What’s weird is sockeye saying MGL had his hands all over this spat, when MGL has bent over backwards to tell us to cool down, because JC couldn’t possibly be saying what the rest of us said he was saying.  I think MGL is one of JC’s biggest defenders frankly, and that goes back to 12MM$ man Francoeur.

MGL also is beyond reproach because he doesn’t really care about anything other than the truth.  I think I am as well, as is Phil.  I’ll characterize MGL as a DH, who comes in to do his job when he has to (sit and hit).  I’m like an NHL enforcer, and I do my job when I see no one else is doing it, or that I need to throw in a few bloody punches to keep people honest.  Phil is like a referee or umpire, who steps in with lucid thoughts.

If sockeye or anyone else sees any other intent, then he’s mistaken.


#61    Tangotiger      2010/01/12 (Tue) @ 15:43

JC:

My study is imperfect, as making sample choices always involves tradeoffs. I went with what I thought was best and I openly acknowledged the downside as well as the upside. Furthermore, as I stated above when I drop the sample inclusion requirements considerably the results do not change. When I look at HOF players versus the entire sample, they don’t appear to age much differently.

I would welcome anyone to study aging in baseball through new methods. As I have highlighted above, the methods previously employed to measure aging have some serious problems, and I think that my study handles these issues better. I don’t intend for this study to be the last word on the subject, and I would be delighted to see further studies that employ advanced empirical methods designed to handle the relevant issues.

Yes, we’re all not qualified to do an aging study unless he approved.  As if what MGL did has little weight.

It is really an impossible task to talk to JC, and I get along with everyone.  Give me Mike Silva any day.  Mike and I may be diametrically opposed, but we can find a common ground.


#62    Tangotiger      2010/01/12 (Tue) @ 16:06

JC’s entire response to MGL:

And people accuse me of appealing to authority?

I’m using the same data to estimate aging functions, but I’m using a different estimation technique. Though advanced, it is quite common and controls for identified problems with other methods. Inserting a -5 runs below average performance by age for non-survivors as correction for the survivor effect arbitrarily imposes aging into the sample. It’s just the delta method all over again, and I’ve explained the problems with the delta method.

If not -5, try -10.  Try different things.  Try whatever. It is not the delta method problem all over again.

What a disappointing response.

And I noticed that “sockeye” got a rating of +7 and MGL got a 0.  {shakes head}


#63    MGL      2010/01/12 (Tue) @ 17:59

Sockeye said this:

“He’s providing evidence that these skills might do so, unfettered by things like injury, and limited to players good enough (or lucky enough, or persevering enough) to play during those ages.”

I think that is a somewhat fair statement actually.  However, JC’s sample is also biased toward players whose observed performance trajectory is skewed by chance (luck) alone.

It is exactly the same problem in reverse as survivor bias.  Players who do not survive in Year X+1 tend to have gotten unlucky in Year X.  Players in JC’s sample tend to have gotten lucky all along the way, pushing the peak age forward as they do so and flattening out the post-peak part of the curve as they go forward in their careers.

That is always a problem when you choose a group of players based on performance rather than choosing all players, a random selection of players or ones that are not selected based on performance.
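MGL's point that selecting on performance biases a sample is easy to see in the delta-method setting he compares it to. In this sketch, hypothetical players do not age at all: true talent is identical in Year X and Year X+1. Only players whose observed Year X clears an assumed cutoff get a Year X+1 in the sample, roughly as a consecutive-seasons delta sample would. The cutoff, noise size and talent spread are all made up.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def survivor_deltas(n_players=50000, talent_sd=10.0, noise_sd=12.0,
                    cutoff=-10.0, seed=7):
    """Mean Year X -> Year X+1 change for all players and for 'survivors'.

    True talent is the same in both years, so the true change is zero.
    """
    rng = random.Random(seed)
    all_deltas, kept_deltas = [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)
        year_x = talent + rng.gauss(0.0, noise_sd)
        year_x1 = talent + rng.gauss(0.0, noise_sd)
        all_deltas.append(year_x1 - year_x)
        # Hypothetical rule: only players above the cutoff in Year X get a Year X+1.
        if year_x >= cutoff:
            kept_deltas.append(year_x1 - year_x)
    return mean(all_deltas), mean(kept_deltas)


all_mean, kept_mean = survivor_deltas()
print(round(all_mean, 2), round(kept_mean, 2))
```

Every player's expected change is zero, yet the kept players show an apparent decline, because clearing the cutoff favored those who were lucky in Year X. A filter like JC's applies the same kind of selection across every season of a career rather than a single one.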


#64    MGL      2010/01/12 (Tue) @ 18:08

JC said in the BP comments:

“I especially appreciate the positive feedback.”

(To me) That says it all…


#65    Tangotiger      2010/01/12 (Tue) @ 20:15

I had the same thought.  Basically, it’s not that he thanked you for your well-reasoned arguments, but he thanked sockeye for his defense of your well-reasoned arguments.  That’s what it amounts to.


#66    Tangotiger      2010/01/12 (Tue) @ 20:19

"Ben Solow
(35415)

That’s cool MGL, now you’re being a bit ad hominem in your last couple lines. There’s good ways and bad ways to go about disagreeing with someone, and this particular post is an example of a bad way.

Not that I disagree with your substantive point, but it would be cool to keep your responses exclusively substantive.”

Really?  There are 88 comments in that thread.  I’m sure I can pick out at least 44 that are not exclusive to substance.  But it’s MGL that gets picked on.

BPro is the new pre-snark Primer: forget about arguing the issue, let’s argue on how others argue!  And here I am, arguing about the guy arguing about arguing.


#67    MGL      2010/01/12 (Tue) @ 21:23

It is not an ad hominem attack because it has nothing to do with my discussion of the substantive issue.  An ad hominem attack or argument is when an “attack” on a person is represented as or is a diversion from a substantive argument.  It is certainly OK to state, “And by the way, here is a comment on something else that the person said, which has nothing to do with the argument at hand.”

Did I have to explicitly say, “Warning! Time out.  This next comment I am about to make has nothing to do with the argument at hand.  Nothing at all.”

I’ve spoken my peace (or is it piece?) on JC’s research and on aging in general, and I don’t really have anything more to say on either issue, otherwise I run the risk of being even more redundant and repetitive than I already have been.  And as always, I could be wrong on one or more of the assertions that I have made.  Not to mention the fact that there is a lot of muddy water and gray area in this particular topic.


#68    Nick Steiner      2010/01/12 (Tue) @ 21:50

It seems to me that JC has a different definition and use for his study than MGL does.  JC apparently wants to isolate the aging factor from the injury factor by using players who weren’t seriously injured.  Using his cutoffs and age restrictions would do the trick I believe.  I think his view on the matter is that injuries are random, and not related to age - so he doesn’t want to include players who were injured for fear that they will impart noise into the study. 

The problem with that, even if his view that injuries are mostly random is valid (and I have a sneaky feeling that injuries are correlated to age, which would explain why the rest of the studies found a lower peak), is that he still has massive selection bias against players who peaked early and were out of baseball by 30 for non-injury reasons.


#69    MGL      2010/01/12 (Tue) @ 22:06

If you want to say that he is eliminating those players who suffer a career altering or ending injury, you also have to say that he is eliminating those players whose careers are altered or ended because of poor aging, right?

The problem with JC’s study can be summed up in two short sentences that can be understood by a smart high school math student:

He is eliminating all players who age poorly.  Therefore he comes up with the peak age and trajectory of players who age well.

How can that resulting trajectory be called or discussed as, “The aging curve of the MLB player?”

How can JC or anyone else argue with these two sentences?


#70    Nick Steiner      2010/01/12 (Tue) @ 22:09

If you want to say that he is eliminating those players who suffer a career altering or ending injury, you also have to say that he is eliminating those players whose careers are altered or ended because of poor aging, right?

Well yeah, MGL, that’s basically exactly what I said.  BTW, why not just post that line - just that line - at BPro and see if anyone can rebut it?


#71    Guy      2010/01/12 (Tue) @ 22:27

"He is eliminating all players who age poorly.  Therefore he comes up with the peak age and trajectory of players who age well.”

This is indeed the central issue.  And yet JC—over thousands of words in 5 or 6 posts throughout this debate—has not once responded to this criticism or even acknowledged it.  Instead, he says the main critique of his study is that his sample consists of very good players, and then knocks down that strawman by showing good players don’t age differently than inferior players. 

Now, I can’t say for sure whether JC is being disingenuous here, or literally doesn’t understand the critique despite many of us having stated it clearly many times.  But either way, what’s the point of continuing to engage him? And why reward BPro with more page views?  This isn’t a good faith intellectual exchange. JC’s been proven wrong six ways from Sunday, but he can’t or won’t admit it.  His loss.  Well, his and other economists who will read this paper in the future and mistakenly think it has merit (because, after all, it was peer reviewed!).  Time to move on.....


#72    Patriot      (see all posts) 2010/01/12 (Tue) @ 23:12

JC on Twitter:

Replace her w/ MGL RT @michaelianblack My wife asks why I Tweet so much...because my capacity for being hated cannot be met by her alone


#73    Tangotiger      (see all posts) 2010/01/12 (Tue) @ 23:48

I don’t really care that JC doesn’t get it.  What bothers me is that sockeye and guys like him are now following JC.  Thanks Will for “leading” this discussion.

Methinks JC likes groupthink, when it’s positive toward him.  Otherwise groupthink is something bad when people listen to me.

***

And yeah:

“He is eliminating all players who age poorly.  Therefore he comes up with the peak age and trajectory of players who age well.”

Totally. 

Here, just for fun, I went to b-r.com, looked for all players with at least 5000 PA between the ages of 24 and 35 born since 1931 (Mays/Mantle).  B-R.com comes back with 205 names, with 390 players.

#195, the median player, has an OPS+ of 114.  So, well above average.  That’s like a +10 runs per season player.  92 of the players have an OPS+ below 100, or 24%.  So, 24% of the players are below average hitters, and 76% are above average.

How is this representative?

If I bring it down to 1000 PA minimum, I get 1695 players.  The guy at number 848, the median, has an OPS+ of 96, which is a bit below average.  (Presumably, the overall average of these guys is close to 100, on the basis that the good guys are much better than the bad guys are bad.)

So, the 1000 PA seems to at least give you a decent sample.

In no way do I believe that the peak age remains the same, as JC continues to assert. If his methodology says that, then he’s messed up somewhere.


#74    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 00:02

Patriot, I don’t understand the tweet language.

Is this part written by Michael Ian Black:

“@michaelianblack My wife asks why I Tweet so much...because my capacity for being hated cannot be met by her alone”

And then, JC quoted that, and added this part himself:
“Replace her w/ MGL RT”

Did I get that right?  What’s RT?

Nobody hates you JC.


#75    Mike Fast      (see all posts) 2010/01/13 (Wed) @ 00:08

You got that right, Tom.  RT = re-tweet.


#76    Nick Steiner      (see all posts) 2010/01/13 (Wed) @ 00:55

You know, I’ve re-read his study and it doesn’t sound as preposterous as I first thought. 

He really is trying to do something completely different than MGL, and I don’t think that he really thinks his curve is predictive (meaning if you gave him a 24 year old player, he would probably say MGL’s curve would be more in line with his actual career).  He is really just trying to isolate the aging factor, and trying to factor out all other causes of decline or improvement.

So, since that forces him to look at players who were not injured, he has to use the cutoffs.  We know that the cutoffs will necessarily show an inflated peak age, because they don’t include players who peaked early for aging reasons.  However, I don’t think JC thinks that effect is that big, so he chooses to ignore it. 

One way I think we can prove him wrong is by looking at players who played in the majors at age 24 and the minors at age 35.  That should allow us to see the true aging peak of players who peaked early.


#77    Rally      (see all posts) 2010/01/13 (Wed) @ 01:03

I would bet that most early peak players who are not in the majors at 35 aren’t in the minors either.


#78    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 01:37

Nick, you sound like MGL was sounding at the beginning, giving JC all that benefit of the doubt.  Ask JC exactly what he thinks.  The group will wait for you to think like we do.


#79    Zach      (see all posts) 2010/01/13 (Wed) @ 01:45

To weed out players who fell out of the majors due to injury, why can’t you do a delta method like MGL’s aging study and remove the last year of each player’s career?

Tango did this a while back, and when he removed the last year, the peak age increased from 26 to 27 and the rest of the graph shifted right.

http://www.tangotiger.net/aging.html
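Zach's suggestion can be sketched in a few lines. The delta method pairs each player's consecutive seasons and averages the year-to-year change at each age; the `drop_last_season` flag below discards every player's final season before pairing. This is a hypothetical, unweighted minimal version: the `careers` layout is an assumption, and real delta-method studies (Tango's and MGL's included) weight each pair, typically by playing time.

```python
from collections import defaultdict

def delta_curve(careers, drop_last_season=False):
    """Unweighted delta-method aging curve.

    careers: {player_id: [(age, rate), ...]}, one entry per season (hypothetical layout).
    Returns {age: mean change in rate from age - 1 to age}.
    """
    deltas = defaultdict(list)
    for seasons in careers.values():
        seasons = sorted(seasons)
        if drop_last_season:
            seasons = seasons[:-1]  # ignore each player's final season
        for (age0, rate0), (age1, rate1) in zip(seasons, seasons[1:]):
            if age1 == age0 + 1:    # pair consecutive ages only
                deltas[age1].append(rate1 - rate0)
    return {age: sum(d) / len(d) for age, d in sorted(deltas.items())}

# Hypothetical toy careers on an OPS+-like scale; player A's last year is a collapse.
careers = {
    "A": [(25, 100), (26, 104), (27, 105), (28, 103), (29, 85)],
    "B": [(26, 110), (27, 112), (28, 111)],
}
print(delta_curve(careers))
print(delta_curve(careers, drop_last_season=True))
```

In the toy data, dropping final seasons removes A's 18-point collapse at 29 and B's last year, leaving a milder curve. Per Zach's summary of the page linked above, doing this on real data moved the peak from 26 to 27; the toy numbers only show the mechanics.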


#80    Nick Steiner      (see all posts) 2010/01/13 (Wed) @ 02:38

To be clear, I think the selection bias of players who had late peaks in JC’s study makes it a crappy study.  However, I think that JC and MGL, et al, are measuring two separate things.  MGL’s study is looking at the expected career trajectory of an average player, and allowing that to be influenced by other things besides aging (mainly).  JC is simply trying to look at how aging, and only aging, affects performance. 

Obviously MGL’s is more applicable to the average player; however, I don’t think it’s fair to compare JC’s study to MGL’s because they are trying to measure two separate things… UNLESS, JC tries to extrapolate outside the realm of his study by using it as a predictor of a player’s career trajectory.

Actually, looking at the discussion that started this thread, JC is doing just that - so I think it’s fair to rip him.


#81    dcj      (see all posts) 2010/01/13 (Wed) @ 06:10

I think it is impossible in practice to separate aging from injury. Players get hurt all the time, and frequently they play through it. I remember hearing during Adrian Beltre’s monster year that he had a leg injury which made it really painful to swing at low and away breaking balls. I have no idea whether that’s true, but it highlights the difficulty of trying to infer injury from the stats.

Also, why would you want to separate aging from injury? Right now Ubaldo Jimenez has one of the best fastballs in the game. Where will he be in 5 years? Possibly he’ll blow out his arm and be out of baseball. More likely, he’ll have had some minor arm trouble and be throwing 92 instead of 96. Or, he might stay perfectly healthy and keep his velocity a la Nolan Ryan.

We care about the whole range of possibilities, appropriately weighted. You could call any injury-related decline a deviation from the “true” aging curve, but what would be the point?

By contrast, it might be useful to split out severe injuries of the career-threatening kind. Like this: “Pitcher X has a 15% chance of blowing out his arm by 2014. If he doesn’t, here’s the projection for how good he’ll be that year.”

To oversimplify, there are three possible career trajectories.

(1) major injury
(2) minor injuries/wear and tear
(3) perfect health

JC, I think, wants to group (1) and (2) together while setting (3) apart. To me, it makes more sense to set (1) apart and group (2) and (3) together.

His study makes things worse by attributing any pronounced early-30s decline to injury rather than deteriorating skills. Guy mentioned Carlos Baerga. Andruw Jones is another case. MGL hit the nail on the head:

“He is eliminating all players who age poorly.  Therefore he comes up with the peak age and trajectory of players who age well.”

As Nick says, this is fine as far as it goes. The study is what it is. The problem is when JC writes:

There is no doubt that age will sap players of their ability over the course of most free-agent contracts; however, the decline isn’t as pronounced as many people believe.
...
According to my estimates, a hitter who has a .900 OPS at his peak would be expected to post around an .850 OPS at 35; a pitcher with a peak 3.5 ERA is expected to post around a 3.75 ERA at 35.
...
Therefore, a good free agent in his early-30s will likely still be a good player when his contract expires.

That is from the Huffington Post article linked at the top of this thread. It is just wrong. End of story.


#82    MGL      (see all posts) 2010/01/13 (Wed) @ 06:34

"According to my estimates, a hitter who has a .900 OPS at his peak would be expected to post around an .850 OPS at 35; a pitcher with a peak 3.5 ERA is expected to post around a 3.75 ERA at 35.”

Yes, of course that is flat out wrong, unless he means or adds…

“If in fact we have a crystal ball and we know that that player continues his career past his peak and eventually amasses at least 10 years and 5000 PA and at least 300 PA per season.”

In that case, he is correct, but is that really what JC means and is that really what anyone wants to know?  Can he be so dense as to not understand that that is what his study evinces?

Surely most people want to know what a player’s aging curve is likely to look like going forward, long before that player reaches his mid-to-late 30s.  JC’s study will NOT help you with that.


#83    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 08:39

There is exactly one good thing about JC’s study: it sets the threshold for how well a player can age (on average).  That doesn’t help us much, but it does help us see a potential goal in terms of handling survivorship bias in the delta method.

We can’t handle the bias in such a way that the delta method says a player peaks at age 33 or something.

Otherwise, all of JC’s conclusions are useless.  Keep his studies, throw out his analysis.


#84    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 11:01

BPro readers definitely don’t appreciate MGL’s style:

MGL
(2121)

Doesn’t this study boil down to the following statement?

“Players who play longer than average have later peaks than average.”

And isn’t that almost begging the question?

Yes and yes.

Not only do players who play longer have later “true” (underlying talent) peaks, but their observed peaks (which are not necessarily the same as their “true” peaks - for example, a player might have his best season at age 22 or at age 36) are also going to be later than their true peaks. It is sort of survivor bias in reverse. Players in JC’s sample tend to have gotten lucky all along the way, pushing the peak age forward as they do so and flattening out the post-peak part of the curve as they go forward in their careers.

JC said:

“I especially appreciate the positive feedback.”

When I write an article or do research, I especially appreciate the criticism. I have nothing to learn from the “pats on the back.” But that’s just me…
Jan 12, 2010 14:21 PM
rating: -2

So, even though what he said was quality stuff, because he added the Primer-approved snark at the end, BPro readers are slapping him with a negative.

Meanwhile, sockeye really adds nothing to the discussion:

sockeye
(31659)

MGL,

This particular topic is new to me and I come into it with an open mind, unfamiliar with whatever controversy there is around it. Your “Us vs. Them” approach isn’t working for me here. I’m sorry if you don’t think an economist can cross over into baseball unless he’s got a 100% watertight study, but unfortunately, this is actually how research often progresses. It is increasingly accepted within scientific fields.

I say this because I went and downloaded Bradbury’s full article last night, and poked around enough on the intertubes to find that your initials are everywhere on this critique. Let it go, man, you’re beating a dead horse, and are so wrapped up in it you can’t see the good. You focus entirely on whether the study is flawless (it’s not, and he says so), and not on what kind of instruction it can provide going forward. Honestly, it smacks of a grudge, and such things don’t come off well within debate over published research.

Bradbury states in his ABSTRACT, he’s trying to “...isolate the effect of age on several player skills...” Paring a dataset down to try to understand underlying processes is OK, as long as you acknowledge it. He does this. It doesn’t mean he’s saying that all players will peak at 29. He’s providing evidence that these skills might do so, unfettered by things like injury, and limited to players good enough (or lucky enough, or persevering enough) to play during those ages.

Here’s some text from his conclusion:

“...while controlling for many exogenous factors
that influence player performances and might inhibit
the isolation of the impact of ageing, this study finds
that both groups reach peak performance around age
29. Players peak earlier in skills that require more
athleticism, and later in those that require less
athleticism.”

I don’t see him making the claims that it’s the end-all study that you seem to think he’s saying it is, or that it predicts what will happen to a given player. He’s trying to isolate factors so he can look at the effect of ageing on SKILLS, sans confounding factors.

Your choices, as I see it, are to throw rocks all over the intertubes about it not being representative because of the filtering (or because he’s not in your sandbox), or to do some follow up research to find out HOW REPRESENTATIVE it might be if you applied it to other, different populations.

His model and methods are well documented. With your skills, you could easily replicate it, then compare the shape and fit of different models that address your concerns. What would happen if you widened the age group, still including age-29? Compressed it? Fit different curves and compared them with an AIC? That’s how you win the argument, not by saying “There is no one in his corner that I am aware of, at least that actually does any serious baseball work.”

Bradbury has provided something possibly worth building on, and instead it sounds like a very entrenched faction is just telling him to get out of their sandbox.

Jan 12, 2010 08:10 AM
rating: 10

And he gets a +10.

Jared Cross responds to sockeye with a big fat zero:

Jared Cross
(694)

sockeye, your argument would make sense if Bradbury’s study, while flawed, was the best study of aging. In that case, it would be something to build on.

But, Tango and Nate Silver had done better studies of aging before Bradbury’s, and MGL did a better study after. Those are the studies we should build on. That’s what’s being pointed out in the comments.
Jan 12, 2010 09:57 AM
rating: 0

The lesson?  BPro readers prefer the warm and cuddly local-morning-show style to the biting insight of Stewart/Colbert.

Style over substance.


#85    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 11:10

JC responds to Phil:

philb
(30678)

I disagree with JC on this point. A couple of months ago, I argued that the results of lowering the PA to 1000 actually DO (if properly interpreted) constitute evidence that peak age is closer to 27 than 29. This is using a methodology similar to JC’s (although not quite as rigorous).

That is: like MGL, I believe that JC’s finding that hitters peak at 29 is completely due to his selective sampling of long-tenured players. That’s even accepting his method itself without qualification.

I know that JC disagrees with my conclusions on this point, as he disagrees with MGL’s. Either MGL and I are wrong, or JC is wrong.

Or, maybe one of us is 90% wrong and 10% right; or maybe we’re each half right and half wrong; or maybe we’re all full of crap. If you’re interested, take a look and judge for yourself.

All my comments on JC’s study are here. For the 1000 PA case in particular, look for the “part II” post.
Jan 12, 2010 20:46 PM
rating: 0

jcbradbury
(54130)

Phil’s “proper” interpretation is violently incorrect, and based on a complete misunderstanding of how least squares estimates the aging function. You can read my response to Phil here.
Jan 13, 2010 04:26 AM
rating: 0

The lesson?  Phil, the keenest of all the students around, is the one who rolls up his sleeves the most.  He’s the one who tries hardest to understand.  Basically, the perfect student for any professor.  Not for Professor JC.

sockeye and the rest of the back-patters?  Just what the professor ordered.


#86    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 11:14

Gabe:
http://www.behindthenethockey.com/2010/1/13/1248869/why-do-economists-write-about

“It should be clear that we get completely different results if we consider all players or if we restrict our sample to only players with long careers.  You could argue that any of these curves reflects the actual aging process - but you can’t claim that they each imply the same thing.  Why an economist would want to get involved in this discussion is beyond me.”


#87    Hizouse      (see all posts) 2010/01/13 (Wed) @ 11:30

I think you guys may be a little too hard on Will here.  First, he isn’t very sabermetrically inclined, as his discussion of his Cy Young vote made painfully obvious.  So I don’t hold him to as high a standard or expect him to be able to follow or understand what’s going on in baseball research.  Secondly, at least he recognizes that BP is no longer the “pulpit for discussion.” I take this, along with the recent changes discussed in State of the Prospectus, as an admission by BP that a lot of the criticisms of BP have validity.

And from another thread: I don’t think you can blame Will for asking JC to do an article on a topic that has generated serious discussion recently and is now being introduced to more people (and generating page views for their for-profit business; maybe Silva could say something about that). 

That said, certainly we can criticize BP for putting Will in the position where he is or for not offering MGL (or maybe Colin, since I doubt MGL would agree to write an article) a similar chance to present opposing views.


#88    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 12:06

Well, MGL already prepared the opposing view, first in the PDF on this site, and secondly in a two-parter on Hardball Times. What’s MGL going to say differently?  Nothing.

The criticism is that this topic was INTRODUCED in such a terrible way.  JC’s article is not the standard.  It’s not something to build on.  JC’s article itself is a rebuttal.  It’s perfectly fine as a rebuttal.  But, to the readers of BPro, they think it’s the standard, the introduction.

BPro had an obligation to give that article the editorial-spin it needed to put it in the proper context.

Baseball’s Foremost Expert Commentators (TM) have already dismissed JC’s conclusions, while accepting the research for what it is: applicable to only the narrowest of conditions.

Too hard on them?  Not hard enough as far as I’m concerned.


#89    Hizouse      (see all posts) 2010/01/13 (Wed) @ 12:45

but JC’s original article was peer-reviewed....

#88 Tango, I agree with your post.  Perhaps I was reacting to something that wasn’t there (and to be clear, I was mostly reacting to the “leading the discussion” comment).  We shouldn’t be shocked that BP would publish a JC article or that BP readers treat it as gospel.  We should be sad that BP did not have anyone put it in context (and to be fair, maybe JC does that to some extent; I’m not a subscriber, so I haven’t read it).


#90    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 12:53

BPro has plenty of people who can put that article in context.  Pizza and Matt could be asked to write a followup article comparing MGL and JC’s articles, for one.

***

BurrRutledge
(18981)

jc, thanks for posting your research at BP - I don’t get out much, so this is my one source for baseball-related topics. This research is all new to me, so I’m sorry if I’m asking questions that have already been answered… how many players were in your sample? How big would the sample be if you restrict the sample to player ages 24 to 31 (instead of 35) and have a 2000 PA cutoff? And, finally, what are the results of that analysis? Thanks!
Jan 13, 2010 08:39 AM
rating: 0

Yes, absolutely and definitely! 

And after you do that, remove the age restriction, and set the PA cutoff to 1000 PA.

Thank you Burr Rutledge.  Too bad no one is replying to you, or rating you.  No worries, I’ll give you one: +1.


#91          (see all posts) 2010/01/13 (Wed) @ 13:51

It’s funny - I linked to JC’s work, I linked to Phil’s honest rebuttal, and I linked to this thread.  And JC says:

“You have linked to one side of the debate, you can find some of my responses here.”

JC - I understand that you like “positive feedback,” but pointing readers to your work means that the side of the debate I am linking to is yours!


#92          (see all posts) 2010/01/13 (Wed) @ 13:52

Here’s my main comment to JC:

“The reason we want to know what the peak age is in each sport is so that we can project player performance. If you really believe your method is correct, then you should build a projection system and put it up against PECOTA, Marcel and ZiPS – all systems that assume a mean peak age of 27. If you do better than they do, that’s a huge point in favor of your system. But I believe using a peak age of 29 will give you projected improvements for players who are, as a group, declining.”


#93          (see all posts) 2010/01/13 (Wed) @ 14:15

I’m a longtime BPro reader, and a more recent reader of this website (which I’ve come to respect very much).  I wanted to make some comments that, on this website, will come across as contrarian and arguably unwelcome, but which are well-intentioned.

(1) I actually think it’s wonderful that BPro gave JC a forum to describe his work, and that a very robust discussion has resulted, with participation from both JC and the Tango/MGL crowd.  My sense is that the two “sides” of this argument had hunkered down and weren’t able to constructively engage in discussions of the issue.  BPro, for all its perceived flaws, may be able to function as an ‘honest broker’.

(2) Could BPro have done a somewhat better job upfront to explain that JC’s findings were very controversial in the hardcore sabermetric community?  Perhaps; but that became clear enough quickly enough in the comments to avoid any damage.

(3) Should BPro commission an article from somebody like Colin to more formally discuss the other side of the argument?  Absolutely, and I will be disappointed in BPro if that isn’t forthcoming.

(4) Would BPro benefit from further engagement with, and respect for the work performed by, the hardcore sabermetric community?  Certainly; but surely the recent changes to the writing staff are the appropriate step, and I think it’s way too early to rush to judgment.


#94    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 14:23

Seeing that JC said this:

I have openly engaged the sabermetric community—-in addition to the academic community—-and responded to every concern.

I have engaged him on Gabe’s site with this concern:

1. Can you run your study that limits the players to 1000 – 4999 PA, with no age restriction. What is the peak age? I’m going to take a guess at 26-27.

2. You said you ran a study with a minimum 1000 PA (no age restriction?), and you still got the same 29-30 peak age from your original study.

Can you explain to us how you can take one subset (those in #1) and get one answer, take another subset (those in your original study) and get a second answer, but that when you combine those two subsets (those in #2), you still get the same answer as the second answer?

Shouldn’t you get something in-between the two?
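Tango's question has a precise form for least squares. With ordinary least squares, the pooled normal equations are just the sums of the two subsets' normal equations (X'X = X1'X1 + X2'X2 and X'y = X1'y1 + X2'y2). So, as long as the first subset spans at least three distinct ages, the pooled quadratic can reproduce the second subset's curve exactly only if the first subset's curve is identical too. The sketch below fits one quadratic aging curve to two hypothetical, noise-free groups: one peaking at 27 over ages 21 to 31, one peaking at 29 over ages 24 to 35. It is a simplification, not JC's actual specification, which controls for other factors.

```python
# Hypothetical, noise-free toy data: (age, rating) pairs for two groups of players.
short_careers = [(age, 100 - 0.5 * (age - 27) ** 2) for age in range(21, 32)]  # peak 27
long_careers = [(age, 100 - 0.5 * (age - 29) ** 2) for age in range(24, 36)]   # peak 29

CENTER = 30  # ages are centred to keep the 3x3 system well conditioned

def normal_equations(points):
    """Return (X'X, X'y) for y = b0 + b1*t + b2*t**2, with t = age - CENTER."""
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for age, y in points:
        row = (1.0, age - CENTER, (age - CENTER) ** 2)
        for i in range(3):
            xty[i] += row[i] * y
            for j in range(3):
                xtx[i][j] += row[i] * row[j]
    return xtx, xty

def solve(a, v):
    """Solve the square linear system a x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    m = [list(a[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(*groups):
    """Pooled OLS: add up each group's normal equations, then solve once."""
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for points in groups:
        g_xtx, g_xty = normal_equations(points)
        for i in range(3):
            xty[i] += g_xty[i]
            for j in range(3):
                xtx[i][j] += g_xtx[i][j]
    return solve(xtx, xty)

def peak_age(coefs):
    """Vertex of the fitted parabola, converted back to an age."""
    _, b1, b2 = coefs
    return CENTER - b1 / (2 * b2)

short_fit, long_fit = fit(short_careers), fit(long_careers)
pooled_fit = fit(short_careers, long_careers)
for name, coefs in [("short", short_fit), ("long", long_fit), ("pooled", pooled_fit)]:
    print(f"{name:>6}: peak age {peak_age(coefs):.2f}")
```

For these toy curves the pooled peak lands between the two subset peaks, a little above 28. Because the peak is a nonlinear function of the coefficients, "in-between" is the typical outcome rather than a theorem, but a pooled fit that simply returns the second subset's answer unchanged is ruled out whenever the subsets disagree.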


#95    Tangotiger      (see all posts) 2010/01/13 (Wed) @ 14:30

Rowen/93:

1. No issues with being given the platform.

2. Right, the issue is that they gave him a platform without telling the readers they are in the middle of a conversation, of which the keynote speaker was in the decided minority.  To rely on the comments of other readers to make that clear is not good, especially if a sizeable portion of the readers see his study and don’t bother with the comments.

Even River Ave Blues are citing him now.  And those guys are supersmart.  Very disappointed.

3. Agreed.  Pizza Cutter would be my guy, FWIW.

4. No rushing and no judgement, other than n=1.

***

You should have no reason to preface your post.  It’s received as you intended.


#96    Mike Fast      (see all posts) 2010/01/13 (Wed) @ 15:14

I’m a longtime BPro reader, and a more recent reader of this website (which I’ve come to respect very much).  I wanted to make some comments that, on this website, will come across as contrarian and arguably unwelcome, but which are well-intentioned.

Contrarian comments are very welcome here, in my experience.  Not always agreed with, but welcome and discussed in a fair manner.

(1) I actually think it’s wonderful that BPro gave JC a forum to describe his work, and that a very robust discussion has resulted, with participation from both JC and the Tango/MGL crowd.  My sense is that the two “sides” of this argument had hunkered down and weren’t able to constructively engage in discussions of the issue.  BPro, for all its perceived flaws, may be able to function as an ‘honest broker’.

I have two problems with this, beyond what I’ve already expressed and what Tango said.  This discussion doesn’t really have two sides in terms of positions on the issue.  It has a number of different perspectives that have been interacting and discussing, and then it has JC, who found a spot to stay, dug in there, and started lobbing bombs at all the other camps.  MGL started out in JC’s defense mostly, and I’d say his THT articles mostly stake out a third position from where Phil/Tango/Guy and JC have been.

BP giving first serve to JC, who has decidedly the most intellectually inferior position of the three groups and isn’t really engaging either of them, is a very poor way to start.  I’d like to see more debate between Phil’s ideas and MGL’s ideas.  Instead we get stuck in a stupid rut learning nothing because JC defends his position against all comers and against all logic and will make no concessions.

Also, it’s tough to have the real debate in comments to an article, as Tango mentioned, but also, I and many others have no ability to have a voice in the “discussion” because we’re not subscribers. 

(2) Could BPro have done a somewhat better job upfront to explain that JC’s findings were very controversial in the hardcore sabermetric community?  Perhaps; but that became clear enough quickly enough in the comments to avoid any damage.

Perhaps.  I tend to agree it’s become clear that there is a controversy going on.  It does seem, however, that the parameters of that controversy are pretty murky there.  The article and comments seem to imply that JC is the expert and MGL is his main rival.  I don’t think that’s a very accurate map of reality.  JC is trying to overturn years of established research that is being backed up by Phil, Tango, Guy, etc.  Nobody in the sabermetric community is buying what JC is selling.  MGL introduces another way.


#97          (see all posts) 2010/01/13 (Wed) @ 15:38

I must have unintentionally unsubscribed from this thread ... just noticed there are 50 new posts.  Subscribing again, then reading!


#98          (see all posts) 2010/01/13 (Wed) @ 15:52

OK, catching up ...

MGL/45 says: “Plus, there is the fact that when I did the 1000 career PA guys, I found a trajectory nowhere near what JC claims for those guys.”

Me too.  In fact, I found that when you include the 1000 career PA guys, and interpret the data, you find a peak of 27 instead of 29.  However, I had to eliminate the age requirement first.  Without that, there wasn’t enough data in the 1000 box to describe proper curves, and I did indeed get close to the same peak age. 

Even after my adjustments, if you take the data at face value, using my version of JC’s algorithm, the peak only drops six months.  You have to fix the weighting to get down to 27.  JC says that my interpretation is “violently incorrect,” but he hasn’t explained why, beyond implying that my problem is that I don’t understand his algorithm.

My view is that his algorithm is so opaque that it’s difficult to see it doesn’t work for players with short careers—unless you look closely.  There is absolutely no way I would have guessed that to be the case without trying to reproduce his study.


#99          (see all posts) 2010/01/13 (Wed) @ 15:57

In light of what I say in 98, Guy/53’s point number 2 is bang on. 

The algorithm JC uses is an overly complex one, and fitting a quadratic is something that just doesn’t work on players with short flat careers.  It only works well for players who have “typical” parabolic career trajectories.

Fitting a quadratic to Joe Charboneau is silly.
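A stripped-down illustration of why a quadratic says little about a short, flat career: with exactly three seasons, the least-squares parabola passes through all three points, and its implied peak swings wildly with tiny differences between seasons. The ratings below are hypothetical (not Charboneau's actual line), and JC's real fit uses more seasons and more terms; this only shows the instability.

```python
def parabola_through_three(y_prev, y_mid, y_next):
    """Coefficients (b, c) of y = a + b*t + c*t**2 through t = -1, 0, +1 (t = age - middle age)."""
    b = (y_next - y_prev) / 2.0
    c = (y_prev - 2.0 * y_mid + y_next) / 2.0
    return b, c

def implied_peak(middle_age, y_prev, y_mid, y_next):
    """Age at the parabola's maximum, or None if it has no maximum (flat or opening upward)."""
    b, c = parabola_through_three(y_prev, y_mid, y_next)
    if c >= 0:
        return None
    return middle_age - b / (2.0 * c)

# Hypothetical three-season careers at ages 25, 26, 27, all within about two points of each other.
for seasons in [(100, 101, 100), (99, 101, 102), (100, 101, 101.9), (101, 100, 101), (100, 100, 100)]:
    print(seasons, "->", implied_peak(26, *seasons))
```

Three nearly identical three-year lines imply peaks at 26, 27.5 and roughly 35.5, or no peak at all, which is the sense in which a quadratic fit carries little aging information for players like this.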


#100          (see all posts) 2010/01/13 (Wed) @ 16:01

Tango/54: agree 100%.  I think I know what results JC would get if he *did* do 1000-5000 alone—they would be similar to what I found when I tried to reproduce his results.  They would either be in the 27s, or meaningless, depending on his sample, which is what I found.

For the record, I think I *do* understand enough of JC’s methodology to know what’s wrong, and I stand by my criticisms as reflected in my attempt to reproduce his results.  This is not to say for sure that I am not “violently incorrect” about what JC did, but he has given me no information to convince me that I’m not close enough.


#101          (see all posts) 2010/01/13 (Wed) @ 16:18

Nick/59: >“I agree with Mike and Tango, it’s disgusting that BPro is giving JC a platform for this, and it displays an utter lack of awareness of the current sabermetric research ...”

Yes, indeed.  BPro had a responsibility to inform their readership that there was serious controversy around JC’s results.  It is seriously wrong that they neglected to do so.  Perhaps they didn’t know any better?  Or perhaps they don’t care that much.

>“Their readers are even worse.”

I don’t expect BPro’s readers to have state-of-the-art awareness of current sabermetric thinking, nor do I expect them to be willing to analyze an article like JC’s and judge its correctness.  That’s why I expect BPro to treat the controversy like a controversy, to warn the readers to consider the arguments with extra care.

There is a prejudice towards being more receptive to a long, intelligently-presented study than to its less intellectually complex rebuttal.  (You can see that in the comment ratings.)

Suppose I assume 1+1=3, and use that in an erudite-sounding 15-page proof of some theorem.  Commenter MGL comes along and says, “Phil, are you nuts?  1+1 equals 2, not three, and your proof falls apart.  What were you thinking?”

He’s absolutely right.  But another reader tells MGL, “hey, it’s a start, and at least he’s contributing to the field of knowledge, which is more than you’re doing, just criticizing.” A second reader says, “hey, let’s keep the discussion civil.” A third reader says, “hey, Phil’s view on the subject should be heard too.” A fourth says, “Hey, MGL, I don’t care that you can quote 20 mathematicians saying 1+1=2, that’s argument from authority.”

It’s just the way things are.  MGL should be the hero, and I should be the goat.  But I come off sounding more reasonable, even if my argument is completely wrong, and, for people not willing to actually wade through the logic, I’m the one they choose to believe.


#102          (see all posts) 2010/01/13 (Wed) @ 16:24

MGL/69:

>“The problem with JC’s study can be summed up in two short sentences ... He is eliminating all players who age poorly.  Therefore he comes up with the peak age and trajectory of players who age well.”

Perfect!  That’s really all that needs to be said.  If ever I get asked about this study, those two sentences of MGL’s will be my reply.


#103    Guy      (see all posts) 2010/01/13 (Wed) @ 16:24

I want to highlight dcj’s post 81, which shouldn’t get lost.  He makes exactly the right distinction in terms of the injury issue.  If you want to exclude from these studies catastrophic injuries that seem clearly unrelated to age (e.g. Tony Conigliaro, Juan Encarnacion), I have no problem with that.  I don’t think it will affect position players at all. It would probably impact pitcher aging curves, and I’m not sure it’s a good choice: asking “How do pitchers age, leaving injury aside” is like asking “Other than that, Mrs. Lincoln, how did you like the show?” But I can see the logic.

However, JC takes two extreme steps beyond that.  First, he implicitly excludes ALL injuries as unrelated to age, by looking only at the players who were productive into their mid-30s.  However, as players age they are both more likely to get hurt, and more likely to have that injury impede performance for a long time.  So he has not isolated “aging,” but has rather removed one of the key aspects of aging.  Second, in order to remove the “contamination” of injuries he selects a pool of players who—in addition to being relatively injury resistant—also have flat post-peak aging curves.  That’s the death-blow to his study:  his sample is selected on the basis of the variable he’s trying to study.

Ironically, he finishes his BPro piece by quoting Bill James saying that players have different aging curves.  But if that’s true, then JC’s sample MUST be biased in favor of slow-aging players.  For any given inherent talent level, slow-aging players will have longer careers than fast-aging players, by definition.  So if you study long-career players, as a group they MUST have a flatter curve than average.  This is a mathematical certainty.  So, what’s to argue about?
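Guy’s claim that selecting long-career players mechanically flattens the curve can be checked with a toy simulation. This is a minimal sketch under made-up assumptions, not anyone’s actual study: every simulated player has the same peak value at 27 (10 runs above replacement, a hypothetical figure) and his own decline rate drawn uniformly between 0.5 and 4 runs per year, and a player “survives” to 35 only if he is still above replacement at that age. The function `simulate` and all of its parameters are hypothetical.

```python
import random
from statistics import mean


def simulate(n_players: int = 10_000, seed: int = 1):
    """Toy model with hypothetical parameters: everyone peaks at 27 with the
    same value, but each player declines at his own rate afterwards."""
    rng = random.Random(seed)
    peak_value = 10.0  # runs above replacement at age 27 (assumed)
    all_rates, survivor_rates = [], []
    for _ in range(n_players):
        rate = rng.uniform(0.5, 4.0)  # runs lost per year after 27 (assumed)
        all_rates.append(rate)
        # Still above replacement (0 runs) at age 35, i.e. 8 years past peak?
        if peak_value - 8 * rate > 0:
            survivor_rates.append(rate)
    return mean(all_rates), mean(survivor_rates), len(survivor_rates)


all_mean, surv_mean, n_surv = simulate()
print(f"mean decline, all players:     {all_mean:.2f} runs/yr")
print(f"mean decline, survivors to 35: {surv_mean:.2f} runs/yr ({n_surv} players)")
```

Because survival here is decided by the decline rate itself, the survivors’ average decline is necessarily smaller than the population’s, and nothing computed from the survivors alone can recover the population figure without extra assumptions about the players who dropped out.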


#104    MGL      2010/01/13 (Wed) @ 16:42

Post 103 above is another great, concise characterization of JC’s study, one he inexplicably seems to fail to acknowledge.


#105          2010/01/13 (Wed) @ 17:08

Mike Fast said, “I’d like to see more debate between Phil’s ideas and MGL’s ideas.”

Do you mean between JC’s ideas and MGL’s ideas?  Because on this issue I agree with MGL’s ideas pretty much 100%.


#106          2010/01/13 (Wed) @ 17:26

Not to pile on, but Guy @ 103 makes a great point.

JC wrote this at my blog:

“The basic problem is that there will always be more younger players entering the league, because all players play at a young age before dropping out.”

But they drop out, in most cases, because they’ve reached their peak and they’re not good enough to stick.


#107    Mike Fast      2010/01/13 (Wed) @ 17:32

Phil/105, I didn’t mean so much that you and MGL are violently at odds as that you approached the study with different methods and from different viewpoints, so to speak.  I’d much rather hear the two of you discuss the pros and cons of your various approaches, and your methods of dealing with the drawbacks of the delta method, than hear JC reiterate his same position and insult everyone else in the sabermetric community for the thirtieth time.


#108          2010/01/13 (Wed) @ 17:45

Mike/107: Oh, okay. 

I think MGL and I (and Tango) have approaches that aren’t mutually exclusive.  To generalize a bit and mention only the biggest things we’ve done:

Tango shows that with algorithms other than JC’s, but with JC’s selective sampling, the peak does indeed come out around 29.  But with proper sampling, it comes out around 27, as we expect.

MGL shows that the delta method, with appropriate corrections, does give you 27.

I show that a version of JC’s own study also gives you 27 if you weight the results properly and eliminate the selective sampling.

And all three of us have argued, from the beginning, theoretical reasons why the selective sampling issue would bias the results.  I think we all reverted to empirical studies to refute JC once it was obvious that we weren’t getting through.

So we’re just giving three different ways to illustrate the same point.  It’s like three different proofs of the same theorem ... you don’t necessarily have to argue that one is better than the others.


#109          2010/01/13 (Wed) @ 17:50

I finally finished reading the updated posts. MGL’s concise characterization at 69 is dead on.

While I really do love the direction BPro is moving in with Pizza, Eric, Colin, Tommy, Matt, etc., I can’t believe they didn’t preface this piece. I wouldn’t be surprised if Kahrl, Goldstein, or Carroll didn’t look at MGL’s or Phil’s work on aging. But I’d be shocked if Eric and Matt (and so on) didn’t take a glance at it or at least know of its existence. Colin definitely knew. I hope, as suggested above by many, that BPro takes a hard look at Phil’s and MGL’s work and tries to compare it to JC’s.


#110    Guy      2010/01/13 (Wed) @ 17:57

Gotta say, this discussion makes me really look forward to JC’s forthcoming article (with Dave Berri) advising academics on how best to interact with the online sabermetric community. He’s clearly the man for the job.  :>)

Hawerchuk/106:  Yes, one of the odder aspects of JC’s argument is that many players who leave the majors at age 25, 26, or 27 would have continued to improve, but they drop out of sight so we can’t see them.  That’s a major reason why, in his view, there are more 27-yr-olds than 30-yr-olds in baseball, and more peak seasons at 27. But he provides no evidence for this fanciful view.  The mean age of AAA players is around 30, so clearly lots of guys stick around.  If they continued to improve until age 30, we should see lots of guys reaching (or re-reaching) the majors at that age.  But we don’t.  Because if you aren’t a major leaguer at 27, you won’t be one at 30 either (a few pitchers excepted).


#111          2010/01/13 (Wed) @ 18:22

Guy/110,

I think you present the idea of survival bias much more clearly than others have.  I actually don’t know whether that selection problem really exists.

However, may I suggest the following.  Given the decision to bring up a player depends on the expected performance for the rest of the career, it seems reasonable to think that, even if they eclipse the AAAA threshold, they won’t be brought back up at 30.  They were given their chance, and their peak is unfortunately not high enough to garner a re-callup given their current age.  Since at 30, they would actually be expected to decline from there, they would simply be passed over for a younger player with years left on the improvement side of the curve.  Hence, we see more of the younger players based not on true present performance, but on the projected performance curve into the future.  While they do stick around in the minors, they’re not sticking around at the MLB level and this is externally decided by a GM.  I think that is where JC’s article highlights an important point.  I’m not sure exactly how to show if this is the case, or measure it, and I don’t know if it is.

As for separating injury from aging, I think there are a lot of good points here.  That’s a tough issue, and I would argue that chronic injuries are a factor involved in aging, while others are completely random.  Of course, the probability of these injuries likely increases as they get older, but would someone be willing to do the math there?  I know predicting injury has been a tough spot for analysis in the past, so maybe it’s an issue to revisit.


#112          2010/01/13 (Wed) @ 19:45

Millsy - the flipside is that established players are often signed to multi-year contracts.  Teams are reluctant to cut these players and eat their contracts.  Plenty of capable young players have rotted in the minors behind weaker veterans.  Ignoring the short careers of young players because they supposedly overvalue potential only accentuates the unwillingness of teams to cut their losses on overvalued veterans.

I realize my argument is purely qualitative, but JC’s argument for rejecting young players is too.


#113    Tangotiger      2010/01/13 (Wed) @ 20:27

The sole issue here: JC draws a far-reaching conclusion when the study is really extremely limited in scope. 

This is exactly what MGL was saying in JC’s defense at the beginning, that it doesn’t matter what JC is saying, since the data speaks for itself: no extrapolating beyond the sample data.  That should be the end of the story.

JC simply keeps shouting otherwise, at his site, at HuffPo, and now at BPro.  If JC admits the very narrow scope, this issue dies immediately.


#114    Tangotiger      2010/01/13 (Wed) @ 20:32

Thank you to colintj for your nice post:
http://www.baseballprospectus.com/article.php?type=2&articleid=9933#44757

Unfortunately, the conviction in the rest of his post is too much for some readers at BPro, as he’s already at -2.

I’m coming to dislike the rating system, though at the beginning I thought it was great.  Really, the system should be “ignore this comment, trust me” and “this comment adds value, even if I might disagree with it”.  Instead, the ratings act as backpatters, and kickassers.

“Most of the comments here seem to be related to people not having read or not believing this paragraph.”

Yes, Pete, I don’t believe for a second that JC dropped the requirement to 1000 PA and still got the same conclusion.  Well, I believe that he did it, but I simply think it means his methodology is scr-wed up to give that kind of answer.


#115          2010/01/13 (Wed) @ 20:50

Tango/113,

I do understand and appreciate your difficulty with the sample, but to say that Bradbury hasn’t loudly admitted that there is a fairly narrow scope is not true:

“My study is imperfect, as making sample choices always involves tradeoffs. I went with what I thought was best and I openly acknowledged the downside as well as the upside.....I would welcome anyone to study aging in baseball through new methods. As I have highlighted above, the methods previously employed to measure aging have some serious problems, and I think that my study handles these issues better. I don’t intend for this study to be the last word on the subject, and I would be delighted to see further studies that employ advanced empirical methods designed to handle the relevant issues.”

And here, “As I reiterate throughout the two parts, there really is no one-size fits all aging curve. And in practice, you are better off addressing each player on a case-by-case basis.”

I’m not defending the work itself, but it seems to me that he did say these things, but they were ignored.  In addition, Bradbury seems very open to the idea of a Heckman model (though, I think there are better bias-correcting type models for this type of data, which I mentioned either earlier in this thread, or in another aging one).
In the end, I’m not sure we can get a true grasp of everything until we do model the selection bias itself.  This, I think, is a worthwhile endeavor for someone.

Hawerchuk/112,

I think you’re missing the point I was trying to make about the selection bias.  If veterans are kept beyond when they reach AAAA level, then that’s good for estimating an aging curve.  I’m not sure what you’re getting at with the overvaluing young potential.  Could you clarify?


#116    Tangotiger      2010/01/13 (Wed) @ 21:05

I did not say that everything he said was crap.  If he stops there, fine.

Millsy, if you insist, I will find you at least 10 sentences JC said that are patently false, starting with how he began this whole thing by calling the age-27 peak a “myth”.  I will only do this, spend 15-20 minutes, if you then accept that I did not do it all in vain.  (vein actually, cause that’s what I’ll pop when I have to re-read his falsehoods.)

Otherwise, accept that JC has made SOME absurd claims.  And it is only on THOSE particular claims that we are having problems.


#117          2010/01/13 (Wed) @ 21:24

I never implied you said that.  If I did, then that was not my intention.  Maybe I took your comment too simplistically when you said,

“If JC admits the very narrow scope, this issue dies immediately.”

He admitted a narrow scope.  I didn’t insist on any specific examples; I understand which things you have a problem with.  But it seems like beating a dead horse, is all.  It seems that rather than having him admit there are shortcomings, you’re waiting for him to anoint everyone else ‘right’ and himself ‘wrong’.  It’s going to be a long wait, whether or not it’s true.


#118    tangotiger      2010/01/13 (Wed) @ 22:30

JC is being inconsistent when he claims narrow scope in one sentence, and then makes far-reaching conclusions in another.

All he has to say is that his study only allows us to make conclusions on players good enough to have long careers.  And anything he says beyond that is unsupported by the evidence.  Actually, I don’t care if he says it.  I’ll be happy if you accept this as the basis.

If in addition to this, you want to discuss the potential merits, fine.  But, let’s first find the common ground.


#119    colintj      2010/01/13 (Wed) @ 23:33

np Tango.  like the comment implies, i’ve been lurking for a long time.  you guys are an invaluable resource for anyone learning or writing about the game.


#120    Nick Steiner      2010/01/14 (Thu) @ 00:21

Millsy/117

At the top of this very thread, Tango links to an article in which JC says that given a .900 OPS in a player’s peak, that player will have a .850 OPS at age 35.  Since that is a forward-looking analysis, instead of a retrospective one, it goes beyond the scope of his study.  Not to mention that Rally showed that particular example was untrue anyway. 

JC, to his credit, did not make any such statements in the BPro article.  If he continues to say,

“of players with long careers, and guys who peaked after the age of 24, the peak age is roughly 30 years old.  And, if we knew in advance that a player was going to fit the requirements of the study, we could use this curve as a general predictor.”

nobody will have a problem with it.  I’ve said before that JC’s attempts to separate injuries and aging are perhaps interesting; however, with the current sampling issues in his study, they are not applicable to the average major league player.


#121    Tangotiger      2010/01/14 (Thu) @ 00:42

“[Peak at age 27] may not seem like much of a myth, as the conventional wisdom has long been that baseball players peak around 30. This, it turns out, is a myth of sabermetrics.”

That’s his opening fart.  His very first words.  Right away, he’s telling us that age 27 peak is a myth constructed by sabermetrics.  He’s not saying that there’s a legitimate reason under certain conditions that 27 is peak.  No. It’s a myth.  Myth is very unambiguous.

“a pitcher with a peak 3.5 ERA is expected to post around a 3.75 ERA at 35.”

Wow.  I mean, he couldn’t be more wrong.  In SIX years, a pitcher will go from 3.50 to posting an ERA of 3.75.  If you like, I will find you exactly how wrong that is.

If he intends to say “a pitcher with a peak 3.5 ERA, who then pitches full-time for the next 5 years, will then have an ERA of 3.75 in year 6”, then OK, maybe that’s true.  I can believe that.

And boy, how uninteresting that statement is.


#122    Tangotiger      2010/01/14 (Thu) @ 00:48

Millsy: how about this: make some statements, and I’ll tell you if I agree.  Make them progressively stronger in stance.  I’ll stop you when I can’t agree with you.  Then at least we have some common ground.


#123          2010/01/14 (Thu) @ 00:50

Nick/120,

I make no assertions about the study itself.  Your suggestion is for Bradbury to simply concede his entire work is essentially useless.  I’m just saying, don’t count on that.  However, I do think the following statement (by Bradbury) sums up much of what you seem to be looking for:

“As I reiterate throughout the two parts, there really is no one-size fits all aging curve. And in practice, you are better off addressing each player on a case-by-case basis.”

In addition, I’m not sure you can make this statement, “they are not applicable to the average major league player”, without actually knowing what the bias is in either sample.  They may perhaps not be applicable, but there’s a burden of proof on both sides as to the survival bias problem, which I think is a valid point and one I don’t feel has been fully addressed by the responses.  I do, however, find the responses interesting and informative.  I honestly don’t know who’s right about the peak, or whether it’s all that relevant beyond ‘peak is around 27 to 30’, etc.


#124          2010/01/14 (Thu) @ 01:08

Tango/121,

I actually meant to state in my last comment that I think the “myth” argument is a bit overstated on his part.

When you say “make some statements”, what do you mean?  About aging?  I’m not arguing with anyone here about the methods, so I don’t know if the following is what you’re thinking.

I’ll start with:

1. Players probably don’t peak at 20 or at 35

2. If we were able to allow an experimental group of 600 or so players to play 15 full seasons all against one another without moving them up and down (and remove any catastrophic injuries), this would give us the best idea of a curve of peak performance and aging.

3. Unfortunately, the full sample of players likely has a survival bias problem, given the external factors deciding play time that may or may not have to do with the player’s true talent.

4. Given this survival bias, we likely do not see a significant number of players hit their peak performance that they would have realized had they been given the chance to play a full career.

5. These players who are not allowed to reach the peak could (I say COULD) have a significant bias downward, since we actually don’t know if their peak would have been later if they had been allowed to continue playing in MLB.

6. The problem with finding the selection bias is that our outcome variable is performance, and performance is unfortunately also the main determinant of survival.

7. Using a two-step model, such as the suggested Heckman correction, propensity scores, etc., could be a good step toward understanding the bias and ensuring a correctly weighted sample.  I believe this is the type of system that PECOTA implements (a matching scheme with some sort of propensity score).

8. While I actually like the method you linked me to previously more than Phil’s or MGL’s (essentially producing counterfactuals for players), I’m not sure it completely controls for the external bias put upon playing time.  Would you agree?

9. (actually, less strong here, but just thought about it) Injuries, as a whole, are random.  However, it’s likely that risk of injury increases with age.  Modeling this increased risk could play an important role in the aging curve.


#125    Tangotiger      2010/01/14 (Thu) @ 01:12

Millsy: I’m with you all the way.

Keep going if you like.


#126    MGL      2010/01/14 (Thu) @ 06:27

"If we were able to allow an experimental group of 600 or so players play 15 full seasons all against one another without moving them up and down (and remove any catastrophic injuries), this would give us the best idea of a curve of peak performance and aging.”

All that would produce is the aging curve for all of those persons.  What players/persons are those?  All males age 21+?  All players in professional baseball?  All players who had at least one PA in MLB before the age of 25?  Each one of those populations may have entirely different aging curves.

The only inquiry that makes any sense is one that includes most players who play at least a season or so of MLB, i.e., most MLB players.  That is what I do and why I use the delta method.  I suppose there are other methods one can use and still include all players, but using the delta method is the only way I know how to deal with such a diffuse group of players (part time, full time, long and short careers, etc.), as I am not a statistician, econometrician, or what have you.

And as I have said many times, a one-size-fits-all model, be it mine or JC’s, has limited practical value anyway.

However, if we do want to answer the question, “What is the likely general aging curve for players who belong to X population?” you better be careful about defining X.  JC was not.

Even I am not sure how to characterize, define, or describe X in my model. I am using a conglomeration of all players, and linking them all together doesn’t really make any practical sense.  I cannot point to any one group of players and say, “This is the player I am talking about with my aging curve.” At least JC can do that.  I’ll grant him that.


#127    Blackadder      2010/01/14 (Thu) @ 07:52

A quick word in defense of economists: the other night I was having drinks with a bunch of economics grad student friends of mine.  Probably the biggest laughs of the evening came after describing JC’s methodology.


#128          2010/01/14 (Thu) @ 08:23

MGL/126,

I was just making a general statement about the sample.  Optimally, we’d want to start it at different ages, vary experience, etc. in replications to see if/when/where the aging curve actually shifted.

Tango/125,

I’m not sure we disagree on much here, then.

Bradbury in the BP comments mentions that he had thought about a Heckman type selection model before, but can’t remember why he abandoned it.  I suspect (and I’m making assumptions here I don’t know are true) that he did so because the selection is so highly dependent on the variable of interest (as I state in #6 above). 

Getting around this is tough, and at this point I haven’t thought through the logistics of making it work (maybe include minor league performance after a player is moved back down, but then we’re dealing with a lot of variance and uncertainty with MLE-type data).  But the other problem is that there is usually a need for an exclusion restriction (i.e., a variable that affects whether a player stays in MLB but not his performance), and I have no idea what this might be.

I haven’t read Bradbury’s study carefully enough to make any real criticisms or defenses on the methodology.  Beyond the fact that I think he brings up possible deficiencies in prior sampling, and introduces a new way of looking at aging (specifically, aging different types of skills), I just don’t know.  But I don’t think a lot of the people here and I are so far off from one another on this issue.


#129    Mike Fast      2010/01/14 (Thu) @ 08:48

“introduces a new way of looking at aging (specifically, aging different types of skills)”

Bill James was writing about this in the eighties in his Abstracts.  For all I know he may not have even been the first to talk about it.

Others have worked on this over the last few years for incorporating into projection systems.  Brian Cartwright comes to mind, but I’m sure he’s not the only one.


#130          2010/01/14 (Thu) @ 09:06

Fair enough, Mike.

I think it’s a great thing to be working on.  I remember seeing a cluster or PCA type analysis at Statspeak by Pizza Cutter that really piqued my interest, and I think FA and PCA have been underutilized in the aging and player classification type research.  I think there was a discussion here about an FA article in JQAS that was subpar, leaving a lot of room.


#131    Tangotiger      2010/01/14 (Thu) @ 11:00

This was one of the very first saber articles I wrote, probably back in 2002 or 2003, so forgive me if something in there seems amateurish:

http://tangotiger.net/aging.html

Do me a favor, and go about half-way through, where I show a table of aging by debut age and years in MLB.  This table is for those players who debuted at age 25.  Obviously, I have a similar table for every debut age.

When you look at that table, what do you see?  Well, the more years in MLB, the later the peak.  For those nine players who debut at age 25 and play for 10 years, their peak was 27-28.  For those three players who debut at age 25 and play for 15 years, their peak was age 30.

For those players who played 4 or fewer years, their peak was the year before they were out of baseball.

For those who played for 5 years, their peak was their first year in MLB.
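A table like the one described can be built roughly as follows: group players by debut age and number of seasons, average performance at each age within each group, and report the age with the best average. This is an assumed reconstruction for illustration, not the code behind the linked article; the function `peak_by_cohort`, the input layout (player mapped to {age: runs per 500 PA}), and counting seasons played as “years in MLB” are all hypothetical choices.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def peak_by_cohort(careers: Dict[str, Dict[int, float]]) -> Dict[Tuple[int, int], int]:
    """Group players by (debut age, seasons played) and return, for each group,
    the age at which the group's average performance is highest.
    Hypothetical helper; seasons played stands in for years in MLB."""
    by_cohort: Dict[Tuple[int, int], Dict[int, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for seasons in careers.values():
        if not seasons:
            continue
        cohort = (min(seasons), len(seasons))  # (debut age, seasons played)
        for age, rate in seasons.items():
            by_cohort[cohort][age].append(rate)
    return {
        cohort: max(ages, key=lambda a: mean(ages[a]))
        for cohort, ages in by_cohort.items()
    }


# Invented data, purely to show the mechanics.
careers = {
    "a": {25: 5.0, 26: 8.0, 27: 9.0, 28: 7.0},
    "b": {25: 4.0, 26: 9.0, 27: 6.0, 28: 3.0},
    "c": {25: 6.0, 26: 2.0},
}
print(peak_by_cohort(careers))  # -> {(25, 4): 26, (25, 2): 25}
```

Each cohort gets its own peak, which is the point being made: a peak computed from one cohort says little about the others.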

Each one tells its own story.  If you choose a subset of those (debut 25, 10+ years in MLB), CLEARLY they will tell you very little about all the other players who debuted at 25 and played 5 or fewer years, correct?  This is not in dispute by anyone.

Therefore, all I am saying is: don’t tell me what your study of players with 5000 PA between ages 24-35 is going to tell me about the guy who debuted at age 25 and played for 5 years.  Don’t tell me that he did NOT peak in his first year in MLB.  Don’t tell me that if you let him play 5 more years, that the overall peak for those players will be 29-30.

Limit your conclusions to what the data tells you.


#132          2010/01/14 (Thu) @ 11:39

I understand your worry, like I said.  There are ways to create that “if he played 5 more years” curve that reduce the bias in that claim, which is what I’m getting at.  Given the type of data we have (and lack of much comparison), I don’t know how feasible it is.  I think your data in the link is very interesting.

I don’t know about Bradbury’s full methods for a few reasons: A. I read it only once and it was a while ago, B. I actually don’t have access to the journal it’s published in (which is odd, given we have pretty obscure journal access here), and C. I can’t find it on his site.  I also don’t have access to BPro (other than comments).


#133          2010/01/14 (Thu) @ 11:44

Millsy, JC has said that he sends a copy of the paper to anyone who requests it.  He sent it to me a couple of months ago.  It’s worth a read ...


#134    Tangotiger      2010/01/14 (Thu) @ 12:17

Right, I’m not saying you can’t reduce bias.  But no matter what you do, you will always be limited to the data.  And if you want to extrapolate beyond the data, well, the uncertainty level will shoot up the farther the data point you look at is from the sample your study is based on.

There are 1800 players who played in MLB just last year.  You probably have some 10,000 players in the last 50 years.  How many players in JC’s sample?  500?  Is he really trying to tell us that by taking the best 500 of those 10,000 players and applying various bias-reducing techniques, he can make a proclamation about the “average” MLB player?

He’s going out of his way to select the data points that are the most clustered and most skewed from the population.  And he’s going to extrapolate that to a point where the very first thing he’s going to tell us is that we’ve been following the myth of age 27?  And repeat that as a headline, in his blog, on HuffPo, and everywhere else?  And tell us that a pitcher with a 3.50 ERA at his peak will have a 3.75 ERA at age 35?  Really?

MGL had it right when he said we should ignore what JC says as a conclusion, since it’s obvious that the data doesn’t say what JC said.

EXCEPT: you have a lot of people who don’t follow closely, like sockeye, who are sucked in.  And you have people like me who have to call bullsh!t.

If you limit it to exactly what I have said, there’s nothing to counter me.  If you want to say more, then yes, I’ll be able to evaluate what you say and probably agree with you on many, if not all, things.  Limited to what I’ve said, I’m 100% right.


#135    Tangotiger      2010/01/14 (Thu) @ 15:15

This is a conversation MGL is having at BPro.  It’s a must-read.

===========================================
MGL
(2121)

JC says:

“The projection is their last three years lwts per 500 PA, weighted by year (3/4/5) added to 500 PA of league average lwts for that age minus five. In other words, I am regressing their last three years lwts (weighted) toward five runs lower than a league average player for that age.

While Lichtman believes using five runs below average generates a “conservative” projection, the substitution is just a guess informed by nothing more than a hunch. In this case, the guess imposes the outcome for the exact factor we are trying to measure: the estimated decline is a pure product of the assumption. Thus, it is no surprise that Lichtman’s adjusted delta-method estimates yield results that differ little from his raw delta-method estimates.”

JC completely mischaracterizes or does not understand what I was doing.

He seems to imply that I am assuming a 5 run decrease for all of the “one-year” players (those who do not get a Year II).

I am not. I am assuming a Year II performance equal to a basic Marcel projection. While aging is or should be a part of a Marcel, so that it is true that I have to make some aging trajectory assumptions in order to construct the projection, the “5 runs worse than average” is the mean that I am using in the regression that is part of the Marcel (the projection). That is completely different than assuming a Year II which is 5 runs worse than Year I. That would be ridiculous. And that is what JC is implying that I am doing, I think.

Normally a Marcel regresses toward a mean which is the league average performance of similar players (age, size, etc.). The reason I used a mean (to regress towards) that was 5 runs worse than a “generic” mean was that these players who do not see the light of day in Year II tend to be fringe players and therefore the means of their population are likely worse than the means of the population of all players.

In fact, if anything, I think I used a conservative (too optimistic/too high) mean. I contemplated using a mean which was 10 runs worse than a “generic” mean (mean of all MLB players).

Interestingly, as you can see from the charts in my articles, even using a “low” mean when doing the regressing, all of the players’ projections in Year II were BETTER than in Year I until age 30. So these players actually showed a “peak” age of 29 or 30 (it is not a “true” peak because Year I is an “unlucky” year), which pushed my overall peak age slightly forward.

The most important thing is that whether I used a typical MLB mean for the regressing, 5 runs less than that (as I did) or even 10 runs less than that (which, as I said, may have been even more correct), it would not have changed my results. So criticizing that aspect of my work cannot indict the conclusions generated from that work.
Jan 13, 2010 20:06 PM
rating: 2
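One possible reading of the projection JC quotes (last three years of lwts weighted 3/4/5, plus 500 PA of league-average lwts for that age minus five runs) is sketched below. The function name and argument layout are hypothetical; applying the weights to both runs and PA, adding the 500 regression PA unweighted, and leaving out any age adjustment are assumptions, not necessarily what MGL’s program does.

```python
from typing import Sequence, Tuple


def marcel_style_projection(
    seasons: Sequence[Tuple[float, float]],  # (PA, lwts runs) per season, oldest first
    league_avg_per_500: float,               # league-average lwts per 500 PA at that age
    offset: float = -5.0,                    # runs per 500 PA relative to league average
    weights: Sequence[float] = (3, 4, 5),    # oldest -> most recent
    regress_pa: float = 500.0,               # PA of regression toward the mean
) -> float:
    """Projected lwts per 500 PA for a phantom Year II (hypothetical helper)."""
    recent = list(seasons)[-len(weights):]
    # Align weights with the seasons available (most recent season gets the last weight).
    w = list(weights)[len(weights) - len(recent):]
    weighted_pa = sum(wi * pa for wi, (pa, _) in zip(w, recent))
    weighted_runs = sum(wi * runs for wi, (_, runs) in zip(w, recent))
    # Regression component: regress_pa PA at (league average + offset) per 500 PA.
    regress_runs = regress_pa * (league_avg_per_500 + offset) / 500.0
    return 500.0 * (weighted_runs + regress_runs) / (weighted_pa + regress_pa)


full_time = [(500, 20.0)] * 3  # three 500-PA seasons at +20 runs
fringe = [(100, -5.0)]         # one 100-PA season at -5 runs
for off in (0.0, -5.0, -10.0):
    print(off,
          round(marcel_style_projection(full_time, 0.0, offset=off), 2),
          round(marcel_style_projection(fringe, 0.0, offset=off), 2))
```

Under these assumptions, moving the offset from 0 to -10 shifts the full-time player’s projection by less than a run per 500 PA (the 500 regression PA are swamped by 6000 weighted PA), while the same change shifts the fringe player’s projection by 5 runs per 500 PA.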

dpowell
(1025)

I actually find the robustness to using different assumptions on the dropouts pretty convincing. At the same time, I wonder if it answers the question too rigidly. To me, the peak age question is basically synonymous with determining the true average age profile. This check, however, makes them separate questions. By assuming random values for the dropouts, you’re partially guessing the average age profile (for all the reasons described by people above). It just happens because of, say, attrition rates and lots of other factors that the “peak age” is unaffected. You’re getting all the age-specific values wrong (or at least, we have no idea if they’re right), but it just works out that the peak age is unaffected. Does that make sense? I’m not suggesting this is a criticism for this specific paper, but I think it’s an important caveat. Agree?
Jan 13, 2010 20:52 PM
rating: 1

philb
(30678)

MGL, does it make a difference whether you use 0, 5, 10, 15, or something higher?

Because, if it only makes a little difference, and the peak still comes up around 27 regardless, wouldn’t that pretty much settle the question of peak with respect to the delta method?

That is: the problem with the delta method is the dropouts. If the results are robust (roughly the same) regardless of what reasonable method you use to compensate for the dropouts, doesn’t that give you a solid conclusion?
Jan 13, 2010 20:23 PM
rating: 0

MGL
(2121)

Phil, yes, absolutely. I just went back to my program and changed one number in the code. That number is the mean towards which I regress the non-survivors in their Marcels. I changed that number from -5 to 0 and also to -10. It does not change the peak age or the trajectories very much at all. It is still 28 in the modern era (I arbitrarily define that as including any player season after 1979 in my data).

As I said, JC’s criticism of the “5 runs worse than average” turns out to be a red herring, as whatever I use does not significantly affect the results.

Really, the survivorship problem is not as large as I thought it would be. If I do not include these players (who do not have a Year II), and thus my remaining players are a little lucky in all of their Year I’s (that is the problem with not including the non-survivors), there is essentially a plateau from 27 to 28 (a .1 run increase from 27 to 28 actually).

Once I include the non-survivors and use the “5 runs less” for the mean that I regress towards, the 27 to 28 interval shows a .4 run increase (rather than .1 run without the non-survivors).

If I do not use “5 runs less” for the mean (I simply use a standard league average), that 27 to 28 interval is now a .5 increase rather than .4.

If I were to use “10 runs less” for the mean rather than “5 runs less”, I get a .3 run increase for that same interval.

So, the peak age and overall trajectory are not very sensitive to the mean I use for the regression in the projections for the non-survivors.

Well, JC wanted a response to his criticism on that topic and I have provided a very adequate one I think.
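To make the sensitivity check concrete, here is a minimal sketch of a Marcel-style phantom Year II for one non-survivor. This is not MGL's code: the function name, the single-season weighted-average form (no age adjustment), and the regression weight `REGRESSION_PA` are illustrative assumptions. The only thing taken from the thread is the idea of regressing toward a mean set 0, 5, or 10 runs below league average.

```python
# Hypothetical regression weight: how many PA of "mean" performance get blended in.
REGRESSION_PA = 1200


def phantom_year2(year1_rate: float, year1_pa: int, regress_mean: float,
                  regression_pa: int = REGRESSION_PA) -> float:
    """Project a non-survivor's missing Year II as a simple Marcel-style blend.

    year1_rate   -- runs above average per 600 PA in Year I
    year1_pa     -- plate appearances in Year I (weight on the observed data)
    regress_mean -- the mean the projection is pulled toward (0, -5, -10, ...)
    """
    total = year1_pa + regression_pa
    return (year1_rate * year1_pa + regress_mean * regression_pa) / total


if __name__ == "__main__":
    # A hypothetical non-survivor: 12 runs below average per 600 PA, in 300 PA.
    for mean in (0.0, -5.0, -10.0):
        print(mean, round(phantom_year2(-12.0, 300, mean), 2))
```

With these assumed weights, each 5-run change in the regression mean shifts this one player's phantom Year II by 4 runs (5 × 1200/1500). How much that moves the aging curve depends on how many non-survivors there are at each age, which is what re-running the full study with -5, 0, and -10 answers.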
Jan 13, 2010 23:58 PM
rating: 1

MGL
(2121)

To those people that don’t understand the whole issue of survivorship, I want to reiterate the fact that we are not really trying to include players in our resultant trajectories who do not actually play.

We are merely trying to balance out the players who do play a Year II, because they will have tended to be slightly lucky in ANY year (Year I) that is followed by a subsequent year. And when we use the delta method, we are only including player seasons in which there is a Year I and a Year II, so all of the players we are including will tend to show a false decline (or a false not-so-large increase) in ANY pair of years.
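The “slightly lucky in Year I” effect shows up clearly in a toy simulation. Every number here (talent spread, year-to-year noise, the survival cutoff) is an arbitrary assumption, and there is no aging at all in the model, so any average Year I to Year II change among survivors is pure selection bias.

```python
import random
import statistics


def simulate_survivor_bias(n_players: int = 20000, cutoff: float = -10.0,
                           talent_sd: float = 10.0, noise_sd: float = 8.0,
                           seed: int = 1):
    """Toy model with NO true aging: observed = talent + luck, every year.

    A player "survives" to Year II only if his observed Year I is above
    `cutoff`. Returns (mean Year I luck of survivors, mean Y2 - Y1 change).
    """
    rng = random.Random(seed)
    lucks, deltas = [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)
        luck1 = rng.gauss(0.0, noise_sd)
        year1 = talent + luck1
        if year1 <= cutoff:
            continue  # released: no Year II, so he never enters the delta sample
        year2 = talent + rng.gauss(0.0, noise_sd)
        lucks.append(luck1)
        deltas.append(year2 - year1)
    return statistics.mean(lucks), statistics.mean(deltas)


if __name__ == "__main__":
    luck, delta = simulate_survivor_bias()
    print(f"survivors' mean Year I luck: {luck:+.2f} runs")
    print(f"survivors' mean Y1->Y2 change: {delta:+.2f} runs (true aging is 0)")
```

Survivors come out with positive Year I luck and a matching negative Year I to Year II change, even though nobody in the model ages.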

In order to account or adjust for that, we include ALL players, even the ones who do not get a Year II (for any given Year I), by creating a phantom Year II and using a Marcel-type projection for their Year II performance (which doesn’t really exist). That way, we can simulate a random, controlled experiment, whereby all players are forced to play at least one more year at any age. That would be the only way we could really ascertain true aging curves and peaks - by either forcing all players to play until they are 40 years old or so, or by at least forcing all players to play “one more year” whether they were allowed to or not (and then use the delta method because we have players who have played only 2 years, 3 years, 5 years, 10 years, etc., and we want to include all of these players, unlike JC).
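A minimal sketch of that procedure: build Year I to Year II pairs for every player-age, substitute a phantom Year II for anyone who did not get one, and average the deltas by age. The data layout (a dict from player to `{age: (rate, PA)}`), the harmonic-mean PA weighting, the phantom's PA, and the single-season projection are all illustrative assumptions, not MGL's actual code.

```python
from collections import defaultdict


def project(rate, pa, regress_mean=-5.0, regression_pa=1200):
    """Single-season Marcel-style projection for phantom Year IIs (assumed form)."""
    return (rate * pa + regress_mean * regression_pa) / (pa + regression_pa)


def delta_curve(careers, include_phantoms=True):
    """Average Year I -> Year II change by age, weighted by harmonic-mean PA.

    careers: {player: {age: (runs_per_600_pa, pa)}}
    A season with no following season is either dropped (survivors only)
    or paired with a phantom Year II from `project`.
    """
    sums, weights = defaultdict(float), defaultdict(float)
    for seasons in careers.values():
        for age, (rate1, pa1) in seasons.items():
            if age + 1 in seasons:
                rate2, pa2 = seasons[age + 1]
            elif include_phantoms:
                # No real Year II: invent one, reusing Year I's PA as its weight.
                rate2, pa2 = project(rate1, pa1), pa1
            else:
                continue
            w = 2 * pa1 * pa2 / (pa1 + pa2)  # harmonic mean of the two PA totals
            sums[age] += w * (rate2 - rate1)
            weights[age] += w
    return {age: sums[age] / weights[age] for age in sorted(sums)}


if __name__ == "__main__":
    toy = {
        "A": {27: (10.0, 600), 28: (12.0, 600)},
        "B": {27: (-8.0, 300)},  # hypothetical player released after age 27
    }
    print(delta_curve(toy, include_phantoms=False))
    print(delta_curve(toy, include_phantoms=True))
```

In the toy data, player B's phantom Year II regresses up toward the mean, offsetting some of the survivors' false decline. A real run would also skip seasons in the last year of the data set, since those players have not yet had a chance to play a Year II.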

Actually forcing all players to play until they were 40 or so (and starting them in the majors when they were 20 or so) would not give us a very good answer either. That would answer the question, “What does the average aging curve look like for all players who had some time in MLB and were allowed to play until they were 40 regardless of how well they aged or how well they played?” That would be sort of the reverse of JC’s sample, but equally biased.

Forcing players to play one more year, which is essentially what I am doing when creating those “phantom Year II’s,” creates a little bit of bias as well, because there are reasons why these players do not get to play in Year II other than the fact that they got unlucky in Year I (although that is definitely part of it for some of these players), but it is a good method to balance out those slightly lucky players who do get a Year II at any age. And using the “5 runs worse” method of regressing, which JC does not like, is actually a good way to counteract that bias.

So using all players AND creating phantom Year II’s for non-survivors, and then using the delta method to construct an aging curve, is, I believe, by far the best method of answering the question, “How does the typical MLB player age?” where “typical” means all players combined, from the ones who have a cup of coffee to the ones who play for 5 or 6 years, to the ones (as in JC’s sample) who have long and illustrious careers.
Jan 14, 2010 00:12 AM
rating: 1


#136    2010/01/14 (Thu) @ 18:23

At BPro, two of MGL’s comments are not displayed because their ratings are too low.  You can override the autocensor and ask to see them anyway, so it’s not a big problem that way, but it’s kind of sad that BPro’s readers would vote those particular comments down.


#137    Tangotiger      2010/01/14 (Thu) @ 18:53

It’s 100% ridiculous.

Like I said, their comment ratings are based on whether they like MGL’s style, not substance.

Just look at the people who replied to MGL’s comment, as they got +6: they said nothing of substance, just that MGL should be nicer. 

People should get past the adjectives, and focus on the nouns and verbs.


#138    colintj      2010/01/14 (Thu) @ 19:48

@Phil

Yeah, this seems like a serious case of status affiliation.  MGL is the outsider, forced to come in to comment where JC gets the privilege of being BP approved.  That’s what reveals the importance of the editorial choice.  Now we’re at a place where a large number of BP readers are choosing JC over MGL!


#139    J. Cross      2010/01/14 (Thu) @ 20:26

Instead of creating phantom year 2 for players who only had year 1’s, what do you think about looking at the delta from Y1 to Y2 only for players who not only played in Y1 and Y2 but also Y3?

As MGL says this group of players is likely to have been somewhat lucky in Y1 (b/c we know that they were given the opportunity to play in Y2) but if you limit it to guys that also played in Y3 wouldn’t they have been lucky to a comparable degree in Y2?

Is this introducing an additional bias?  I can see why it might flatten out the aging curve but would it shift the peak?

btw, if our goal here is predictive (trying to assess contract value), isn’t the fact that there’s bias in the delta method beside the point?  After all, aren’t all players being offered contracts those who were good enough in “year 1” to get a “year 2” and possibly beyond?  Players getting offered contracts have, as a group, been somewhat lucky.
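The Year III alternative amounts to a filter on the pairs: keep a Year I to Year II delta only when the player also has a Year III, so that Year II has gone through roughly the same selection as Year I. A minimal sketch, assuming an unweighted mean and a simple `{player: {age: rate}}` layout (both hypothetical). Whether the two selection effects really cancel is exactly the open question in the comment.

```python
def delta_curve_require_y3(careers):
    """Average Year I -> Year II change by age, counting a pair only if the
    player also played in Year III. Unweighted mean, for simplicity.

    careers: {player: {age: runs_per_600_pa}}
    """
    sums, counts = {}, {}
    for seasons in careers.values():
        for age, y1 in seasons.items():
            if age + 1 in seasons and age + 2 in seasons:
                delta = seasons[age + 1] - y1
                sums[age] = sums.get(age, 0.0) + delta
                counts[age] = counts.get(age, 0) + 1
    return {age: sums[age] / counts[age] for age in sorted(sums)}


if __name__ == "__main__":
    toy = {
        "A": {26: 5.0, 27: 8.0, 28: 9.0},  # has a Year III after age 26
        "B": {26: 2.0, 27: 6.0},           # no Year III, so excluded
    }
    print(delta_curve_require_y3(toy))
```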


#140    Blackadder      2010/01/14 (Thu) @ 20:34

I agree with #138: I would bet dollars to donuts that if (say) Colin had written a BP article criticizing JC and then MGL and JC had had a similar exchange in the comments, the distribution of ratings would look very different (although not completely reversed; it’s clear that MGL, for whatever reason, just rubs some people the wrong way). The BP readership is obviously smart and comparatively well-informed, but a lot of them are seeing these debates for the first time, and it’s pretty natural to default to the position that has the approval of the “world leader” in sabermetric research.


#141    MGL      2010/01/15 (Fri) @ 01:46

I don’t see why all the criticism of BP. The editors are not in a position to judge the merits of JC’s article.  They are pretty much forced to assume that it has merit (which it does - at least some of it) because of JC’s status and reputation.  Plus, I don’t see too much harsh criticism of my comments.  I really don’t.

+123, Tango does essentially that, I think, by eliminating from the data set the last year for all players.  Your suggestion is fine I think.  It will work just as well as using a phantom Year II, maybe better.  Again, I don’t think the survivor bias is such a large problem, so just about anything to account for it will work.

It is true that you would not want to correct for the survivor bias if your goal was to project players that you knew were going to play in Year II.  That may or may not be your goal.  If you are a team, you definitely DON’T want to go that route.  If you are projecting players and only getting graded on players who actually end up playing in MLB, then you do want to go that route.  So, again, it depends on your goals.

It is similar to the MLE problem that Tango and I (and others) have mentioned in the past.  Traditional MLE’s only tell us about players who end up playing in MLB, especially if we regress their MLE’s toward that of a (rookie) MLB player.  That works if you are going to be graded only if your minor league player ends up playing in the majors.  If you are a team looking to decide whom to promote, that doesn’t work.  You have to at the very least regress those minor league numbers (or the resultant MLE’s) toward that of an average MINOR (not major) league player (BTW, David Gassko turned me on to this concept, which is an important one and one that I had overlooked) and you also have to tell the team you are working for that your MLE’s only apply to players who the team is thinking of promoting anyway.  If a team is NOT thinking of promoting a certain player, it is likely that your MLE’s for him are too optimistic.  Sorry to sidetrack the discussion a little, but this is an important point about MLE’s which also applies to using age trajectories to project players.
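The MLE point is about what the translated line gets regressed toward. A minimal sketch with a generic regression-to-the-mean formula; the regression constant `k`, the rate stat, and both mean values are hypothetical placeholders, not published figures. The minor-league mean is assumed to be expressed on the same MLB-equivalent scale as the MLE.

```python
def regress(observed: float, n: float, prior_mean: float, k: float) -> float:
    """Shrink an observed rate toward prior_mean.

    n -- sample size behind the observed rate (e.g. PA)
    k -- sample size at which observed data and prior get equal weight (hypothetical)
    """
    return (observed * n + prior_mean * k) / (n + k)


if __name__ == "__main__":
    mle_rate = 0.340            # hypothetical MLE rate for a minor league hitter
    pa = 400
    k = 600                     # hypothetical regression constant
    mlb_rookie_mean = 0.315     # hypothetical mean for MLB rookies
    minor_league_mean = 0.290   # hypothetical mean for a typical minor leaguer (translated)
    print("toward MLB rookie mean:  ", round(regress(mle_rate, pa, mlb_rookie_mean, k), 3))
    print("toward minor-league mean:", round(regress(mle_rate, pa, minor_league_mean, k), 3))
```

With any set of numbers like these, regressing toward the lower minor-league mean gives the more conservative estimate, which is the one that applies to a team deciding whom to promote.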


#142    Nick Steiner      2010/01/15 (Fri) @ 01:54

I think people are angry at BP for giving JC the first shot on the aging debate.  If they want to bring in several different viewpoints, and try to foster a legitimate discussion about aging studies, I think that’s excellent.

However, by giving JC first shot, they are essentially making his study the authority on the matter and forcing guys like you to work uphill, when in reality, he is the one who is advancing a contrarian position and on *very* shaky ground.

As colintj said, doing that either displays an ignorance of the current research and standards, or it is intentional shit-stirring so that BPro can be “unique”.  Either way, it’s not good.


#143    colintj      2010/01/15 (Fri) @ 10:02

“I don’t see why all the criticism of BP. The editors are not in a position to judge the merits of JC’s article.”

How is the second sentence not a criticism of BP?


#144    Tangotiger      2010/01/15 (Fri) @ 10:10

colin/143: touché!

I think the criticism is based on the viewpoint that BPro knew that the readers would be brought into the middle of a conversation, but there was zero effort on BPro’s part to get their readers up to speed.

Even in things like nightly news or any newspaper article (like say the Goldman / Brazil / kidnapping case), they ALWAYS, at the end of the article or in the second paragraph, lay out all the facts, to bring the readers up to speed.

So, two things could have averted the angst:
1. BPro giving a one-paragraph intro
2. JC not making a far-reaching conclusion

Two pretty simple things.  And we’d all be better off for it.  JC’s article would stand as quality research (which it still is, just that the conclusion as it is now is crap), and BPro is commended for bringing in such a very heavy subject to willing readers (they still should be, just that they need to do some due diligence).


#145    Nick Steiner      2010/01/15 (Fri) @ 21:45

Unrelated to the aging shenanigans, but JC apparently has Brandon Lyon fairly significantly better than Jose Valverde going forward.

http://www.sabernomics.com/sabernomics/index.php/2009/12/astros-sign-lyon/
http://www.sabernomics.com/sabernomics/index.php/2010/01/tigers-sign-valverde/

Over the past 3 years, Lyon’s pitched 212 innings with a 3.30 ERA, a 3.78 FIP and a 4.39 xFIP.  Valverde has pitched 190 innings with a 2.84 ERA, a 3.58 FIP and a 3.65 xFIP.  Their WAR’s, according to FanGraphs, are identical and CHONE projects both players to be 7 runs above average next year.  Lyon is 31 and Valverde is 32. 

JC has Lyon valued at $5.8 million per year over 3 years and Valverde at $4.5 million per year over 2 years.  It’s not a Francoeur-type mistake, but I don’t see how JC can have Lyon $1.3 million per year better than Valverde going forward.


#146    Tangotiger      2010/01/15 (Fri) @ 21:52

Maybe after I brought up the fact that Lyon is being paid the same per inning as Halladay that JC re-did his methodology?


#147    Brian Cartwright      2010/01/16 (Sat) @ 00:35

I ran a study on the players in my database, including all levels of the minors. I did not include college. Criterion: they played on the same team in two consecutive seasons.

I would expect that minor league players who repeated would tend to have underperformed in year 1 in order to have not been promoted between seasons, and so a regression to the mean in year 2 would suggest a bias towards a better year 2.

What I found is that the batters and pitchers, overall, peaked at 24. Probably not coincidentally, 24 was also the age that had the most weighted PAs. At 25, the number of players getting released outnumbers the new players coming in.

Whether or not anyone wants to argue the validity of 24, what I think it shows is that the result depends on which players are being studied.

Let’s say that the most common age for players to peak is 24. Not many players who peak that soon will be at a high enough level to make it to the majors (Carlos Baerga). Guys who are able (through genes, hard work, drugs) to continue to improve to age 27 reach a higher level, that of the average mlb player. Those who can keep improving to age 30 likely play ten seasons through age 35 and qualify for JC’s study.

In constructing projections, the key for me will be to find leading indicators - looking at a player at age 21 will it be possible for me to identify those who will continue to improve through age 27 or even 30? Which ones will follow the most common path and start declining at 25?

For example, Mike Stanton at age 19 projects to be a league-average RF, low on walks, high on K’s, with very good power, similar to Craig Wilson. Following a standard aging curve, by age 24 he is likely to look more like MVP Ryan Howard. PECOTA might have the best idea: look in the historical record for players at the same age who had comparable performances, skill sets and body types, and check those players’ future performances.


#148    Tangotiger      2010/01/16 (Sat) @ 00:46

Good stuff Brian!

Ok, so if the average age of the players in your sample is say 24 years old, then, basically no big surprise if the peak age is 24.

If the average age in JC’s sample is 30 years old, then no big surprise if the peak age is 30.

And if the average age in high school baseball is 17 years old, then no surprise if we eventually get the data and we find they peak at 17.


#149    Brian Cartwright      2010/01/16 (Sat) @ 01:10

I don’t think there is the average age/peak age relationship you are stating.

I believe attrition in pro ball starts at age 24 because the front office realizes that if a guy is 24 or 25 and not yet good enough for MLB he’s not likely to get any better, and they start getting released - about 5% at age 25, then around 15% each year from 26 to 35.
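Taking those rough release rates at face value (about 5% at 25, then about 15% a year from 26 through 35), compounding them shows how small the surviving pool of a cohort gets, which is the survivorship problem in miniature. The rates are Brian's approximations; this sketch just multiplies them out and ignores new players entering the pool.

```python
def surviving_fraction(release_rates):
    """Fraction of a cohort still active after applying each yearly release rate in turn."""
    remaining = 1.0
    for rate in release_rates:
        remaining *= 1.0 - rate
    return remaining


if __name__ == "__main__":
    rates = [0.05] + [0.15] * 10  # age 25, then ages 26 through 35
    print(f"still active after age 35: {surviving_fraction(rates):.1%}")
```

At those rates, only about 19% of the players active at 24 would still be around after age 35 (0.95 × 0.85^10 ≈ 0.187).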

I ran just Japan 1998-2009, and got similar results to MLB only, peaking at 27-28 (smaller samples gave a bumpier curve). NPB and MLB are both top end leagues (MLB is the highest you can go, and only a handful of players each year get ‘promoted’ from NPB to MLB) and they have similar average peaks.

When I have time I’d like to look at Cuba. It’s a large league (16 teams) for the population (although no foreigners) creating a wide range of skill levels. Players have to defect to move to a higher league, and they have to drop far to get released. That should help control survivor biases. I currently have 2005-2008, with 2009 season about half completed, but the stats are in text files and not yet in the db, and I have to look up ages in the player bio pages which I believe is only active players.

I also have many years of amateur summer league stats, ages 16-21, but they have yet to be digitized. Way back when, I did do some aging studies (on paper), with the players continuing to improve to the end of their age eligibility. Pony League’s top age division is 17-21, and they have a cap, I believe 6, on how many 21-year-olds can be on any one team, as they believe that a team that can stock its roster with a disproportionate number of the highest available age will have an unfair competitive advantage. (The free market guy in me says the best recruiter gets the best team; if you don’t win now, you have to hustle harder next time, not legislate against the winners.)

