THE BOOK--Playing The Percentages In Baseball


Tuesday, March 06, 2007

Hardball Times Preview Book

By Tangotiger, 12:13 PM

When one of your favorite websites posts a tip jar, it behooves all to drop a few bucks in there every year.  It’s gratitude for service.  The gang at Hardball Times certainly has provided me with more favorable service than the local pub.  But Hardball Times is not content to put out a tip jar, and is instead providing you with a goodie bag:
http://www.hardballtimes.com/main/article/its-the-hardball-times-2007-season-preview/
I’d say, regardless of the quality, drop the ten spot, and get it.  Don’t look at it in terms of “is it worth it?”

But, what about non-regular readers?  This blog entry is for you.  Is it worth it?  Beats me.  But I’ll tell you the highlights and you can decide:


1. A quick intro to the forecasting system.  There’s not much detail in there, but that’s the case with all non-Marcel forecasting systems.  We’ll have to check in 6 months to see how they stack up.  But, the guys who did the system are bright guys who “get it”, so it’s probably as good as anything out there.  And, you get a spreadsheet with all the forecasts.  It’s unclear how the distinction was made between starters and relievers, which to me is a huge issue.  The forecasts also aren’t separated by starters/relievers.  Maybe next year.

(Note to Studes: kinda dangerous to have the xls and pdf files share the same directory.)

2. You get two or three pages for each team of “Team In A Box”.  I’ve only read one so far, from my favorite blogger (Lisa Gray, or should I say: lisa gray), and the style appeals to me.  It’s very Bill Jamesian, and very much for the internet age.  If you know your writer, you’ll invest your time in whatever he says.  But, those days are gone, and the “Team In a Box” format works for me.

3. The layout is very nice.  Frankly, Studes should handle all baseball stat books that come out.

4. The cool part is that fielding is included, though I suspect the aging for fielding can use some more work.  For example, Ichiro is rightfully at the top as the best fielder in baseball for 2007 at +17.  But in 2009, he’s a 35yr old at +16.  On the other hand, his OPS goes from .764 to .679 in 2 years.  Jeter’s fielding improves each year (albeit from -25 to -23), and his hitting takes such a beating that he is a below-average hitter in 2 years (as a 35yr old in 2009).  The fielding aging is very light, while the hitting aging seems very excessive.  Overall, it probably works.  So, while I can, prima facie, trust the 2007 forecast for hitters, I can’t for 2009.  I’d have to do more work to evaluate those.

I actually prefer the WAR title, rather than WARP, since there is no such thing as WARP.  You can’t do a wins above replacement, without considering position.  Therefore, the position is a requirement if you are discussing replacement.  Therefore, WAR should capture position, and it does.

5. If we select all nonpitchers with at least +.40 WAR and all pitchers with at least +.14 WAR, and sum their PA and IP, we get pretty much the annual MLB totals for those categories.  This gives us the pool of players from which to do further analysis.  One such analysis is to sum their WAR, and we get 612 for nonpitchers and 483 for pitchers.  That total is 1095 wins, which is +36.5 per team, meaning +.225 wins per game above replacement for each team.  That makes the replacement-level team .275.  It’s a little low for my taste (I prefer .300), but certainly justifiable.

I operate with +583 for nonpitchers (+.12 x 162 x 30), and +437 for pitchers (+.09 x 162 x 30).

Of course, the THT results could simply have been the product of how I selected the pool to begin with.  After all, I could have sorted based on PA instead, and selected those guys.  If I did that, the total WAR would have been under 400, and that’s way too low. 

Instead, and this is my preference, if I select all nonpitchers with a WAR of greater than zero, and a PA of at least 380, I get the right total of PA, and the sum of WAR is +558.  Selecting pitchers with a WAR of greater than zero and at least 53 IP gives me a sum of WAR of +469.  That total is now 1027 WAR, which is +34 WAR per team, implying a replacement level of .290.  That sounds pretty good to me.
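The arithmetic behind these replacement-level estimates can be sketched as below. It assumes a 30-team league, a 162-game schedule, and a .500 average team; the function name is made up for illustration and is not anything THT publishes.

```python
def replacement_level(total_war, teams=30, games=162):
    """Implied winning percentage of a replacement-level team.

    Assumes the average team plays .500 ball and that total_war is the
    league-wide sum of wins above replacement.
    """
    war_per_team = total_war / teams
    wins_per_game_above_repl = war_per_team / games
    return 0.500 - wins_per_game_above_repl


# Pool selected by WAR thresholds (612 nonpitchers + 483 pitchers)
print(round(replacement_level(612 + 483), 3))
# Pool selected by WAR > 0 plus playing-time minimums (558 + 469)
print(round(replacement_level(558 + 469), 3))
# Tango's own working totals (583 + 437)
print(round(replacement_level(583 + 437), 3))
```

The first pool lands at about .275 and the second at about .289, which the post rounds to .290.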

#1    David Gassko      (see all posts) 2007/03/06 (Tue) @ 14:43

Glad you’re enjoying the book. The exact team replacement level is .305, actually. Must be a sampling problem for you. I’m glad to answer any questions about the projections, the projection method, and anything else that’s in the Preseason Preview, btw.


#2    tangotiger      (see all posts) 2007/03/06 (Tue) @ 15:14

Right, I was trying to infer the repl level based on the data, and trying to minimize the sampling issue.  Getting it to .290, which is fairly close to .305, shows that there is just a limited amount of sampling for me to worry about.

***

I just noticed that while I said Jeter’s fielding improves, that’s not quite right.  On a per 670PA basis, he’s a constant -25 runs.  The reason he looks like he gets better is because he is forecast for less playing time, and therefore, fewer opportunities to accumulate suckiness.  The same applies to Ichiro.  Therefore, I have to conclude that the fielding component has not been aged at all.

As a rough rule of thumb, I knock out 3 runs of hitting and 2 runs of fielding and half a run of baserunning per year (that’s -0.5 wins).  I haven’t verified my claim, but it seems reasonable.
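A minimal sketch of that rule of thumb, for illustration only. The runs-to-wins conversion of about 10 runs per win is an assumption (the comment gives only the rounded -0.5 wins), and the helper names are hypothetical.

```python
# Per-year aging rule from the comment above, in runs per season.
AGING_RUNS_PER_YEAR = {"hitting": -3.0, "fielding": -2.0, "baserunning": -0.5}

# Assumption: roughly 10 runs per win; the thread does not state the conversion.
RUNS_PER_WIN = 10.0


def aged_runs(current_runs, component, years):
    """Apply the flat per-year decline to one component, ignoring playing time."""
    return current_runs + AGING_RUNS_PER_YEAR[component] * years


def yearly_decline_in_wins():
    """Total decline per year across all three components, in wins."""
    return sum(AGING_RUNS_PER_YEAR.values()) / RUNS_PER_WIN


# A +17 fielder (Ichiro's 2007 forecast) two seasons later under this rule
print(aged_runs(17, "fielding", 2))
print(yearly_decline_in_wins())
```

Under this rule a +17 fielder would sit at +13 two years on, compared with the +16 THT shows for 2009.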


#3    Chris C.      (see all posts) 2007/03/06 (Tue) @ 15:20

Same here.

We asked Dave to write a readable introduction to the projections because we thought more readers would find that useful than a 20-page technical manual, but I’m happy to help out and explain anything I can.

Some early responses seem to indicate the aging adjustments are too conservative, while others think they are excessive. I’m confident in the basic results and will try to offer a more thorough explanation of the mechanics of the growth curves in an upcoming THT article or two later this month.


#4    David Gassko      (see all posts) 2007/03/06 (Tue) @ 15:24

Therefore, I have to conclude that the fielding component has not been aged at all.

***

This is correct. I found no aging effect for fielders, which of course is wrong, but the effect of a small sample. As I wrote in my Hit Tracker article, there are more than a dozen improvements we plan on making to the system next year, and this is certainly one of them. I preferred to not touch the fielding projections rather than guesstimate, though maybe some would have preferred otherwise. Anyways, I believe the aging effect for fielders might be smaller than you anticipate—as fielders get older, they get smarter, too.


#5    tangotiger      (see all posts) 2007/03/06 (Tue) @ 15:42

I definitely found an aging pattern with MGL’s UZR.  I don’t remember exactly how much it was, but I think it was around 1.5 or so a year.  And the speed positions (SS, CF) dropped off earlier.  While you are correct that wisdom and experience compensates, it doesn’t compensate enough, except for 1B (and even then, that’s only until age 29 or 30 or so).

It’s a lot like hockey, where defensemen age better because speed is less important for them.

Or in baseball, where walks (brains/experience) always improve, but they can’t compensate for the loss in other areas.

Anyway, I’m sticking with my -3, -2, -0.5 rule (-0.5 wins) until someone convinces me otherwise.


#6    tangotiger      (see all posts) 2007/03/06 (Tue) @ 15:59

It’s possible that you are conservative for players in their late 20s and excessive for players in their 30s.

The Marcels have the following as the HR leaders:
Howard (40 HR, 587 PA)
Ortiz (40, 614)
Pujols (38, 587)
Dunn (36, 609)
Andruw (35, 602)

My totals are 189 HR in 2999 PA, meaning an average of 37.8 HR per 600 PA.

THT has the exact same guys.  In their case, per 600 PA, they have 40.2 HR.

That’s a whopping 2.4 HR per 600 PA more than Marcel.  Does this make sense?  Maybe, if we know something more about these guys.  Marcel uses limited information, and therefore must, by definition, regress more.
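The Marcel side of that comparison can be reproduced from the five lines listed above. THT's player-level forecasts are not quoted in the thread, so the 40.2 figure is simply taken as given.

```python
# Marcel forecasts quoted above: (HR, PA)
marcel_leaders = {
    "Howard": (40, 587),
    "Ortiz": (40, 614),
    "Pujols": (38, 587),
    "Dunn": (36, 609),
    "Andruw": (35, 602),
}


def hr_per_600(players):
    """Pooled home-run rate, scaled to 600 plate appearances."""
    total_hr = sum(hr for hr, pa in players.values())
    total_pa = sum(pa for hr, pa in players.values())
    return 600 * total_hr / total_pa


marcel_rate = hr_per_600(marcel_leaders)
tht_rate = 40.2  # as quoted in the comment, not recomputed here

print(round(marcel_rate, 1), round(tht_rate - marcel_rate, 1))
```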

How about if we focus on 33yr olds?  Four top NYY players are 33 in 2007 (Abreu, Jeter, Matsui, Damon) for an OBP of .372.  In 2009, their OBP are at .357, for a 15 point drop.

Historically, I don’t have the OBP aging, but I have this:
http://www.tangotiger.net/aging.html
The Linear Weights Ratio drops from .939 to .902, which would be analogous to saying that their times on base drop by 4% while their outs remain constant.  So, a .372 OBP would become .363 in 2 years.

The other column in that link shows a 7% drop, meaning a .372 OBP becomes .355.
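Here is a small sketch of the conversion used in the last two paragraphs: shrink times on base by a fixed percentage, hold outs constant, and recompute OBP. The function name is hypothetical, and the 4% and 7% drops are the rounded figures from the comment.

```python
def aged_obp(obp, drop_in_times_on_base):
    """OBP after times on base fall by the given fraction with outs held constant."""
    times_on_base = obp      # per plate appearance
    outs = 1.0 - obp         # per plate appearance, unchanged by aging here
    new_on_base = times_on_base * (1.0 - drop_in_times_on_base)
    return new_on_base / (new_on_base + outs)


print(round(aged_obp(0.372, 0.04), 3))  # Linear Weights Ratio column, about a 4% drop
print(round(aged_obp(0.372, 0.07), 3))  # the other column, about a 7% drop
```

The two results bracket the .357 that THT forecasts for the four Yankees.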

I guess your .357 seems reasonable (though I wonder if it’s biased, in the way that I describe in my linked article).

Ok, so at first glance, it looked wrong, but the more I look at it, the more I suppose it may be pretty good.


#7    Chris Miller      (see all posts) 2007/03/06 (Tue) @ 16:01

I remember using the UZR data on Tango’s site from 2000-2003, and came up with .6 Runs per year, but that was for players who did not move position.  I also took some ZR data and found 1.3 Runs per year, but that included player movement to new positions, with adjustments (and regression).  The new THT fielding data gives me .05 runs per year, so yeah, it makes it look like next to nothing, which is odd.


#8    Chris Miller      (see all posts) 2007/03/06 (Tue) @ 16:09

Oh, and .05 Run drop was based on a regressed Plays/162 Above Average, non regressed #’s were at .01 runs per year.


#9    tangotiger      (see all posts) 2007/03/06 (Tue) @ 16:37

Chris: since THT is rounding numbers, your .05 or .01 is essentially zero, which confirms DSG’s point that he did no aging on fielding.

***

As for the 1.3 (which is more trustworthy), that probably works out to 1.0 runs for guys in their late 20s, and then 1.5 and then 2.0 as the guy gets into his 30s.  My “-2” may be a bit exaggerated, and perhaps “-1.5” would make more sense.  Regardless, it’s not zero.


#10    studes      (see all posts) 2007/03/06 (Tue) @ 21:17

I was thinking that the only people who have access to the xls file have already bought the PDF, so it wasn’t particularly dangerous to add it to the directory.  No?


#11    tangotiger      (see all posts) 2007/03/06 (Tue) @ 21:31

The difference being the single id for the xls file and the unique id for the original PDF payment.


#12          (see all posts) 2007/03/07 (Wed) @ 10:42

I’m a huge THT fan, so kudos to the whole team on the site and getting 2 books out this year.  Wow.

Having 3-year projections right out of the gate is exciting and ambitious.  But, as a stat-head and a fantasy player, I have some suggestions for next year.

1) There is way too much regression to the mean on pitcher W/L records.  Almost no pitchers are projected to have over 11 wins. 
2) The projections would be much more useful with some playing time assessments.  I know that ZIPS doesn’t do them, and I understand why.  But if you want this to be a competitive pre-season publication, fantasy players care about counting stats, and these just aren’t realistic.  The depth chart method seems to be the best way to make these adjustments.
3) Even for young players, the projected PA decline over the 3 years.  I know there is some injury risk, but I’d suggest taking the baseline injury risk as a given, then only doing injury adjustments for players with above average injury risks.


#13    studes      (see all posts) 2007/03/07 (Wed) @ 12:48

Great points, mulkowsky.  Thanks very much.


#14    tangotiger.net      (see all posts) 2007/03/07 (Wed) @ 12:55

You may be right about their W/L.

We should remember, that forecasters are not trying to predict the number of pitchers with 12+ wins.  They are trying to predict the number of wins each pitcher will have.

I talked about it here:
http://www.hardballtimes.com/main/printarticle/forecasting-2006/
When I said:

The highest forecasted RBIs were 112 (Tejada), 110 (Pujols), and 108 (Ortiz). What is this, the 1980s? If you had wanted me to only forecast RBIs, and not tell you who would do it, I would have said 150. Why would I give a number like that? Because from 2001 to 2004, the four highest RBI totals were 160, 150, 146, 145. It would therefore be reasonable to think that the league leader will be around 150. The league leader in 2005 had 148 RBI. So, I would have been pretty close, as an over/under.

But, how sure could I have been that it would be Ortiz? You could come up with a reasonable list of 15 or 20 players that would lead the league in RBI. But, that’s not what we are trying to figure out. We are trying to come up with reasonable over/unders, numbers that you could find equal reasons where the player will over-perform and under-perform. Injuries, as we know with Bonds, can devastate any forecast.

Anyway, if we look at what Marcel thinks:
http://www.fangraphs.com/projections.aspx?pos=all&stats=pit&type=marcel

Marcel has 16 pitchers forecast for around 13 wins (each of which is give or take a bunch, like 4 or 6).  THT has 4.

Not coincidentally, Marcel has 14 pitchers with 190+ IP, while THT has 5.

The IP to decisions ratio is around 9.3 or so on THT, which sounds exactly right.  The excessive regression is on playing time, not just wins. 

However, among the top save guys, the IP are much higher than Marcel.  For example, Marcel has Rivera leading the pack of closers with 70 IP, while THT has him with 80!  Considering that Mariano has pitched more than 80 innings just twice in his career (last time in 2001, and as the young stud we all loved in the 96 World Series), we shouldn’t assume his mean forecast will be 80 at the age of 37/38.

I think it’s likely that no distinction was made between starters and relievers (or if there was, not enough of one was made), and you end up regressing each set of pitchers to the wrong mean (that is, starters are regressed to a population mean that is too low, and relievers regressed to a population mean that is too high).


#15    Trader Joe      (see all posts) 2007/03/07 (Wed) @ 13:42

I also welcome the HBT entry into the projections game.  I hope their projections will be included in the “unified projections” database.

Seems to me that the HBT might have taken better advantage of Marcel as an assessment tool during the design stage to highlight some of the issues that have been raised on this thread—before publishing their results.  Also, the suggestion about using depth charts is very important.


#16    Chris C.      (see all posts) 2007/03/07 (Wed) @ 13:43

Those are interesting suggestions, mulkowsky.

I think it’s fair to say we are not completely satisfied with our playing time estimates. That was o.k. because we were most interested in projecting the important rate stats, but I can understand why better playing time estimates would be helpful if you’re interested in counting stats. That said, I’m never going to project an 18-game winner even if I’m pretty sure that *someone* is going to win 18 games this year. See tango’s above post for a good explanation regarding the reasoning behind that.
We did use growth curve modeling to estimate some aging patterns very generally, but it would be nice to account for injury risk on a player-specific level and also make adjustments for anticipated role changes (a reliever moving into the rotation, a utility guy expected to earn a starting second base job, etc).
It’s something we’re going to think about more over the next year.


#17          (see all posts) 2007/03/07 (Wed) @ 14:10

Thanks for the thoughtful discussion.  The “open source” nature of the on-line baseball community is really phenomenal. 

A few thoughts on the above.  Johan Santana has averaged 18 wins in the past 3 years and Carpenter 17 wins.  Given aging curves, I’d think smart Vegas over/under odds on their 2007 projections would be about the same. 

The playing time comment on the pitchers seems right on, and I think it is an issue for batters as well.  Carl Crawford (to take 1 example) has averaged 670 PA in the past 3 years.  THT has him projected at 638.  I think that the playing time regressions to the mean (or injury risk assumptions) are too aggressive here.

BTW, I don’t know if Nate Silver has commented on this, but PECOTA seems to limit its playing time regressions to the mean.  If you look at the weighted mean forecast for playing time, it’s typically at the 75th percentile for the player, not the 50th.  This leads to more helpful projections, at least for fantasy purposes. 

Thanks again and very glad to have THT’s contribution to the projection field going forward.


#18    tangotiger      (see all posts) 2007/03/07 (Wed) @ 14:36

You are wrong.

Since 1985, there have been 92 pitching seasons where a pitcher had at least 15 wins for three consecutive seasons, including repeaters like Maddux (15 times!), Clemens, etc.

The average number of wins in those three consecutive seasons were: 17.5, 17.7, 17.8.

Care to guess the average number of wins for those pitchers in the following season?  13.7

Injuries are such killers, it’s easy to forget about them.  RJ, Pedro, Hershiser.  List goes on.  26 of those 92 pitchers (28%) ended up with 10 wins or less (averaging 7 wins), after posting three consecutive 15 win seasons.

46 of the 92 (50%) ended up with 15 or more wins (averaging 18 wins).

If you have a pitcher that has averaged 18 wins over the last 3 years, he’s got a 50-50 shot of winning at least 15 games.  Because of the severe downside and limited upside, his MEAN forecast will be a little lower (13.7).

It’s really as simple as that.
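To see how the mean ends up below the over/under, the group figures quoted above can be combined. This is only a sketch: the inputs are the rounded averages from the comment, and the average for the middle group (11 to 14 wins) is backed out from them, so it is approximate.

```python
n_total, overall_mean = 92, 13.7
low_n, low_avg = 26, 7       # 10 wins or fewer in the following season
high_n, high_avg = 46, 18    # 15 wins or more in the following season

# Whoever is left finished with 11 to 14 wins.
mid_n = n_total - low_n - high_n
mid_avg = (n_total * overall_mean - low_n * low_avg - high_n * high_avg) / mid_n

# Half the group reaches 15+, yet the mean sits well below 15
# because the injury-driven low tail pulls it down.
share_15_plus = high_n / n_total

print(mid_n, round(mid_avg, 1), share_15_plus)
```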


#19    tangotiger      (see all posts) 2007/03/07 (Wed) @ 14:39

As for forecasting playing time, a good discussion was had here:
http://www.insidethebook.com/ee/index.php/site/comments/forecasting_pujols_ab/


#20    tangotiger      (see all posts) 2007/03/07 (Wed) @ 14:56

Also note that PECOTA has Carpenter with a mean forecast of 14 wins, but a 50% level of 13.  I don’t see how that’s right.  As noted, the mean forecast should be less than the over/under forecast.  In any case, PECOTA has the general basis correct.

Bill James (BIS actually) and ZIPS seem to follow a 50/50 approach, as opposed to a mean approach:
http://www.fangraphs.com/statss.aspx?playerid=1292&position=P

The BIS numbers almost look like they have no regression toward the mean component.  Forecasting 236 IP for Carpenter is a very optimistic forecast.

It will be fun to see in 7 months how it all shakes out.  My guess is that BIS will not do well when it comes to playing time forecasts.


#21          (see all posts) 2007/03/07 (Wed) @ 14:58

tango, my inner saber-geek is thrilled to be dialoguing with you.  Your work in the community is amazing, and I love “The Book”. 

Thanks for bringing those stats to the table. There are different ways to assess how accurate a forecast is.  (Correlation, LMSE, etc.) If the whole league consists of 10 players who will average 500 AB if healthy and 100 AB if injured, and there is a 10% chance of injury for all of them, is it “more accurate” to project 500 AB for all or 460 AB for all?  It’s a reasonable debate and I can see both sides.

However, I want to offer a fantasy player’s perspective, as I think that is one of the markets for the THT pre-season guide.  Fantasy players will never expect anyone to predict the random Halladay/Matsui injury to a previously healthy player, so to a fantasy player, the 460 AB prediction will appear low and inaccurate, even though it is mathematically correct as an Expected Value.

Similarly, on pitching wins, I think that for the fantasy market and for established players, the median is a more relevant measure of prediction than the mean.  Part of the reason for this is that if a player is injured in Roto, he can be replaced, so some of the downside risk is avoided.

Based on my Roto experience and how other fantasy oriented analysts do their projections (Shandler, PECOTA) I’d contend that most Roto players looking at projections would prefer the “most likely scenario” (i.e., median, aka Vegas over/under) to the pure arithmetic mean. 

Hope this is helpful for next year.


#22          (see all posts) 2007/03/07 (Wed) @ 15:06

PECOTA’s weighted mean forecasts for playing time tend to be between the 60%-75% percentile.  (Weighted mean for rate stats tends to be about 50


#23    tangotiger      (see all posts) 2007/03/07 (Wed) @ 15:31

Thanks for the kind words!

***

There’s nothing preventing *both* from being brought forth.  If THT says they are presenting “mean”, then it’s mean.  They can present both, as does PECOTA.

***

When it comes to evaluating the results, an RMSE or absolute difference, is what you want. 

In your example, forecasting everyone for 500 means that you have a gap of 400 for one guy, and 0 for the other nine, for an average of 40.

Forecasting 460 for all 10, means you have a gap of 40 for 9 and 360 for the other one, for an average of 72.

On the RMSE side, they are both close.

So, you would definitely have a point in this case, if the goal was to limit differences.  And this was addressed in the link in post#19.


#24    tangotiger      (see all posts) 2007/03/07 (Wed) @ 16:19

Continuing the example of post#23, forecasting 500 for everyone would minimize the average absolute difference.  Forecasting 460 would minimize the RMSE.

For the 500 forecast, the numbers would be 40, 126, for diff and RMSE respectively.
For the 460 forecast, the numbers would be 72, 120.

As you can see, there is a much bigger difference between the absolute difference results than the RMSE results.
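The four numbers above can be checked with a few lines. The setup is the hypothetical league from post #21, taking the 10% injury chance to mean that exactly one of the ten players ends up at 100 AB.

```python
from math import sqrt

# Nine healthy players at 500 AB, one injured player at 100 AB.
actual = [500] * 9 + [100]


def mean_abs_diff(forecast, actuals):
    """Average absolute gap between one common forecast and each actual."""
    return sum(abs(a - forecast) for a in actuals) / len(actuals)


def rmse(forecast, actuals):
    """Root mean squared error of one common forecast."""
    return sqrt(sum((a - forecast) ** 2 for a in actuals) / len(actuals))


for forecast in (500, 460):
    print(forecast, mean_abs_diff(forecast, actual), round(rmse(forecast, actual)))
```

Here 500 is the median of the ten outcomes and 460 is the mean, which is why each forecast wins on a different measure.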

My gut tells me that I care much more about the absolute difference.  I really don’t see the point, or understand the meaning, of squaring the differences, averaging them, and taking the square root.  I understand it relative to creating a distribution, and using statistical theory on that, but otherwise, I don’t.


#25    Rally      (see all posts) 2007/03/07 (Wed) @ 17:32

Tango, on pitchers winning 15+ games 3 years in a row, that includes the great pitchers and also some Russ Ortiz types.

I know you’re still going to have injuries and regression to the mean to deal with, but I’ll bet if you select something like pitchers with an ERA (better yet - FIPS) 10% better than average each year in addition to the wins, the group will average more than 13.7 wins.

On 3 year projections - I’ve played around with some but never published them.  It’s pretty easy the way I did it, just pretend your 2007 projection is exactly what the player will do.  Then, in Marcel’s case, weight your 2007 projection at 5, 2006 at 4, 2005 at 3, add the league mean and age factor, and you’ve got a 2008 projection.

If your max r = .70 for a one year projection, year 2 should be about .60 and year 3 around .50.

You could extend this indefinitely, but each year you add will be less accurate than the last, so it’s up to the forecaster how much uncertainty they are comfortable with.  Me, I’m comfortable with 1 year.


#26    Tangotiger      (see all posts) 2007/03/07 (Wed) @ 18:13

Limiting to players with a FIP of 4.00 or better in each of those years, I’m down to only 39 pitchers (including 14 by Maddux).  The average wins is 15.2, while the 3-yr average is 18.2.

Dropping the requirement to at least 14 wins in each season, I get 56 pitchers.  The average wins in those 3 years is 17.4, and the actual wins in the forecasted year is 14.8.

Assuming the population mean would be 9 wins, the regression amount among these pitchers is 30-35%. 

Chris Carpenter has a 3-yr average of 17 wins, which if we regress a third of the way toward 9, gives us 14.3 wins.  Marcel has him at 14.

Santana has a 3-yr average of 18.3 wins, which matches our 39 pitcher sample at the top of this post.  Those Maddux-heavy pitchers averaged 15.2 wins and Marcel has him at 15.
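The regression arithmetic in this comment, sketched with hypothetical helper names. The population mean of 9 wins is the assumption stated above.

```python
POP_MEAN = 9  # wins; the population mean assumed in the comment


def implied_regression(prior_avg, next_year_avg, population_mean=POP_MEAN):
    """Share of the distance to the population mean that the group gave back."""
    return (prior_avg - next_year_avg) / (prior_avg - population_mean)


def regress(observed, regression, population_mean=POP_MEAN):
    """Move an observed average part of the way toward the population mean."""
    return observed - regression * (observed - population_mean)


print(round(implied_regression(17.4, 14.8), 2))  # 56 pitchers, 14+ wins each year
print(round(implied_regression(18.2, 15.2), 2))  # 39 pitchers, 15+ wins each year
print(round(regress(17, 1 / 3), 1))              # Carpenter, regressed a third of the way
```

Both groups come out a little above 30%, in line with the 30-35% range quoted.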

I know we’d like to think that the regression is too strong, but, it is what it is.


#27    David Gassko      (see all posts) 2007/03/07 (Wed) @ 18:20

It’s pretty easy the way I did it, just pretend your 2007 projection is exactly what the player will do.

***

This is actually incorrect. Your projection for 2007 (minus park, league, and aging) is your expectation of the player’s true talent. That should not change when you project him for 2008, 2009, or 2172. All that changes is his age, and what we’ve tried to do is best-capture how specific players age.


#28    Tangotiger      (see all posts) 2007/03/07 (Wed) @ 18:53

I agree with DSG.  The 2007 forecast represents the sample data of 2006 and earlier, along with regression.  To then use that forecast, combined with the sample data to get a 2008 forecast is wrong.  As David is pointing out, you are mixing up apples (true talent data) and oranges (sample data).


#29          (see all posts) 2007/03/07 (Wed) @ 21:21

tango, in your examples of pitching wins, why are you making your measure of projection success having the same population mean?  shouldn’t it be having the lowest absolute difference?  i think absolute difference would be a better measure of whether marcel or any other projection system is regressing too much or just enough.


#30    MGL      (see all posts) 2007/03/07 (Wed) @ 22:38

I’ll chime in with a couple of things.

One, when you do aging curves and patterns, you absolutely MUST use the “delta” approach.

Tango suggests (I think) that you don’t use first and last years in the analysis, but I have played around with this and it does not make much difference whether you do or don’t.

His reasoning (I assume) is that there are strong selective sampling issues in first and last years.  Players with bad first years tend to not have second years and last years tend to be bad years.

Even if you don’t use first and last years, you have selective sampling problems.  Years which are NOT last years tend to be good years and even second years that are followed by third years tend to be good years.

You cannot get around selective sampling problems with age analysis, unless you somehow try and model it and then adjust, which would be a bear.  The bottom line is that any year which has a following year tends to be a good year, so that drop-offs are always going to be slightly exaggerated.  I don’t think this is a big problem though.  However, maybe this is why most analyses show peak overall or offensive age at 27.
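For readers who have not seen it, here is one way a "delta" calculation could look. It is a sketch under stated assumptions, not MGL's actual code: the seasons are made-up data, and weighting each pair by the smaller of the two PA totals is an illustrative choice.

```python
from collections import defaultdict

# Hypothetical seasons: (player, age, plate appearances, runs per 500 PA).
seasons = [
    ("A", 27, 600, 10.0), ("A", 28, 550, 8.0), ("A", 29, 500, 7.0),
    ("B", 27, 400, -2.0), ("B", 28, 450, -3.0),
    ("C", 28, 650, 20.0), ("C", 29, 300, 17.0),
]


def delta_aging(seasons):
    """Average year-to-year change by starting age, using matched pairs only."""
    by_player = defaultdict(dict)
    for player, age, pa, rate in seasons:
        by_player[player][age] = (pa, rate)

    sums = defaultdict(float)     # weighted sum of changes, keyed by starting age
    weights = defaultdict(float)
    for ages in by_player.values():
        for age, (pa, rate) in ages.items():
            if age + 1 in ages:   # only players who appear in both years count
                next_pa, next_rate = ages[age + 1]
                w = min(pa, next_pa)  # assumption: weight by the smaller PA
                sums[age] += w * (next_rate - rate)
                weights[age] += w
    return {age: sums[age] / weights[age] for age in sorted(sums)}


print(delta_aging(seasons))
```

A season with no following season drops out of the calculation entirely, which is exactly where the selective sampling problem comes from.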

Speaking of peak age, and we will confine it to offensive performance, there has been some suggestion that peak age has changed lately.  I find that if I use the delta approach after 1997, that peak age does seem to appear to be around 30, with a strange temporary drop from 27 to 28.  The drop could easily be a sample size issue.  If I were a conspiracy theorist, I could speculate that peak age has increased to 30 solely because of PED’s and even that the observed drop from 27 to 28, and then an increase until 30, is due to some players only taking steroids after their skills have started to erode (at age 28 or so).  Anyway, I think the jury is still out as to the peak age of offensive performance, especially if we are talking PED-free.

***

Using the same delta method for aging curves, here is what I find individually for the 3 “big” components (defense, offense, and baserunning), and overall:

For offense (lwts), from age 27 to 32, around 1.5 runs per 500 PA per year decline.  From 32 to 37, around 2 per year.  After that probably 3-4 per year, although I don’t have enough data to tell for sure.

For Superlwts (offense, defense, and baserunning), players peak at 26, and then lose about 2 per year from 26 to 30 and then almost 4 per year from 30 to 35 and then like 8 per year from 35 to 40.

UZR (all positions) declines from the getgo, like triples rates, at about .6 runs per year per 162 games until age 33, and then drops off considerably after age 33 at around -3 per year.

Baserunning also declines from the getgo, at around only .1 runs per year.
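The offensive rates above can be strung together into a cumulative curve. This is a rough sketch: the post-37 rate of 3.5 is an assumption (the midpoint of "probably 3-4"), the cutoff at age 40 is arbitrary, and the function name is hypothetical.

```python
# (start_age, end_age, runs lost per 500 PA per year) for offense (lwts).
# First two rates are from the comment; the third is an assumed midpoint
# of the "probably 3-4" range, capped at an arbitrary age 40.
OFFENSE_SEGMENTS = [(27, 32, 1.5), (32, 37, 2.0), (37, 40, 3.5)]


def cumulative_decline(age, segments=OFFENSE_SEGMENTS):
    """Runs per 500 PA lost between the first segment's start age and `age`."""
    lost = 0.0
    for start, end, rate in segments:
        years_in_segment = max(0, min(age, end) - start)
        lost += years_in_segment * rate
    return lost


for age in (30, 32, 35, 37):
    print(age, cumulative_decline(age))
```

By this reckoning a hitter gives back 7.5 runs per 500 PA between 27 and 32, and 13.5 by age 35.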

OF arms have a weird curve and I have no idea why.  OF arm lwts drop from 0 to -4 per year at age 28 and then go back up again to -2 by age 36. Maybe players lose their arm strength early and then learn to throw accurately.

I think that I am including positional changes in the data, but I am not sure.

One more thing.  I have looked at playing time decreases with age, including attrition, also using the delta method (which you must use as well).  For pitchers, even good ones, it is alarming.

Without going into all the details, almost all STARTING pitchers (without controlling for health or chance of injury), even very good ones who start out with lots of IP (150+ IP), lose around 20-25 IP per year including attrition (IOW, if a pitcher does not pitch, he gets credited with 0 IP for that year).

And you can NEVER project more than 190 IP in any subsequent year for a SP (I did not look at what happened when a pitcher has multiple years of high IP and/or multiple years of good performance, and again, I am not looking at different subsets of pitchers, grouped by health or chance of injury).

For a pitcher who was very good (good ERA) and pitched over 150 IP in any given year, his chance of pitching 5 years hence, even if he is young, is only around 54% and his average number of IP (counting not pitching as zero) is only around 80 IP!  For starting pitchers, again, even very good ones, and those who had over 150 IP in year X, the rate of attrition is almost 10% per year (it is not linear – the first year, it is only like 6%).

Long-term contracts for pitchers, even good and young ones, are bad news!


#31    MGL      (see all posts) 2007/03/07 (Wed) @ 22:45

"almost all STARTING pitchers (without controlling for health or chance of injury), even very good ones, who start out with lots of IP, lose around 20-25 IP per year including attrition (IOW, if a pitcher does not pitch, he gets credited with 0 IP for that year).”

I don’t know why I said “Almost all starting pitchers...”  What I meant was “starting pitchers as a group...”, given the way I conducted the research (I simply looked at what happens over the next 5 years after any year by a SP in which he pitched more than 150 innings, although I also limited it to certain age groups and certain ERA’s in that year).


#32    David Gassko      (see all posts) 2007/03/08 (Thu) @ 01:01

UZR (all positions) declines from the getgo, like triples rates, at about .6 runs per year per 162 games until age 33, and then drops off considerably after age 33 at around -3 per year.

...

Without going into all the details, almost all STARTING pitchers (without controlling for health or chance of injury), even very good ones who start out with lots of IP (150+ IP), lose around 20-25 IP per year including attrition (IOW, if a pitcher does not pitch, he gets credited with 0 IP for that year).

***

All good stuff, Mickey. The small decline in fielding before age 33 probably explains why my sample size was too small to find any effect—that’s good to know, and we’ll know to aim for about that kind of a decline when we re-visit our aging analysis for our 2008 projections.

Chris is the aging curve guru here, so I don’t want to speak for him, but as far as I understand, he found a similar (though smaller, I believe) effect for starting pitchers (and a lesser one for hitters as well)—players tend to lose a lot of playing time as they get older, no matter how much playing time they’ve had in the past.


#33    MGL      (see all posts) 2007/03/08 (Thu) @ 04:41

I have never done a detailed study of playing time curves with age, such as what factors might influence it one way or another. 

I think that most people, and certainly many teams, fail to realize the relatively small chance that any given pitcher has, age or talent notwithstanding, of pitching a substantial number of innings 4 or 5 years down the road.

For hitters, the decline in playing time is not nearly as dramatic.

For all hitters who compile at least 500 PA at age 25, only 74% of them are still playing at all 5 years later (that includes players who come back to play at a later date).  In year X, they average 613 PA.  In year X+5, they average only 363 PA (including zero for those who are not playing).  Of those who still play, they get 488 PA, still a considerable dropoff at only 30 years of age.  For those 25 yo who not only had at least 500 PA, but an OPS > .850 (very good players), still only 77% were still playing at age 30, and of those, they got 524 PA, as opposed to 641 at age 25.

For 30 yo players who had an OPS above .875, only 58% were still playing at age 35!  Including those who did not play, the average number of PA at age 35 was 238.  If they did play, it was 411.

For all 30 year olds, not just the good ones, only 49% were still playing at age 35, and the average number of PA was 381 for those who played and 185 including those who did not.

Even among elite 30 yo hitters (OPS > .900), only 58% were still playing at age 35 and the number of PA you can expect, including attrition, is only 243.

Obviously a player’s overall health and injury history (as well as defensive position and player size, etc.) would factor into this, as well as how many years they have sustained high PA numbers and their career OPS.  I only looked at the 5 years following any year in which a player had at least 500 PA at the various OPS thresholds.
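A quick consistency check on the figures above, as a minimal sketch: the average PA "including attrition" should be roughly the share still playing times the average PA among those who still play. The function name and group labels are made up for illustration, and reading "381/185" as 381 PA for those who played and 185 including the zeros is an assumption.

```python
def expected_pa(share_still_playing, pa_if_playing):
    """Average PA five years later, counting players who are out of the game as zero."""
    return share_still_playing * pa_if_playing


# (share still playing, avg PA among those playing, reported avg PA including zeros)
groups = {
    "500+ PA at age 25": (0.74, 488, 363),
    "age 30, OPS > .875": (0.58, 411, 238),
    "all 30 year olds": (0.49, 381, 185),  # assumed reading of "381/185"
}

for label, (share, pa_playing, reported) in groups.items():
    computed = expected_pa(share, pa_playing)
    print(f"{label}: computed {computed:.0f}, reported {reported}")
```

Each computed value lands within a few PA of the reported one, which is about what rounding of the percentages would produce.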


#34    MGL      (see all posts) 2007/03/08 (Thu) @ 04:45

I am pretty sure that the decline in fielding varies a lot with position. For example, I think that the speed positions decline a lot more than the other ones, and I don’t think that first base declines much at all until the mid or late 30’s.

I also need to make sure that I am only looking at pairs of years in which a player plays the same position.

The small overall decline in UZR before age 33 is probably a function of the player slowing down but gaining experience, one tending to cancel the other.


#35    Rally      (see all posts) 2007/03/08 (Thu) @ 11:43

DSG,

Have you looked at previous years (retrojections?) to determine how accurate a 2-year or 3-year forecast is compared to the one-year forecast?

I realize that may not be possible depending on what types of detailed batted ball data you are using.


#36    Guy      (see all posts) 2007/03/08 (Thu) @ 13:33

MGL:  do you have pitcher aging curve data (performance, not IP), similar to your post #30 info on hitters, that you can share?


#37    MGL      (see all posts) 2007/03/08 (Thu) @ 21:30

I simply looked at “age deltas” for all players in the career registry I have from around 1970 or something like that.  I used raw numbers (not park adjusted or anything like that).  I did not adjust for players who switched leagues from one year to the next.  I did not normalize each year’s stats either, except for opponent lwts (for that I set the out value for each year such that total league lwts=0).

I did not exclude any years for lack of a certain number of TBF or IP, but I did weight each “delta” by the lesser of the two TBF, so a year pair in which one of the years had very few IP or TBF got little weight anyway.  IOW, if a pitcher at age 25 had his ERA go down .5 runs from age 24, but he only faced 3 batters at age 24 or at age 25, that “delta” (the -.5 ERA from 24 to 25) was hardly counted at all - it got weighted by the 3 TBF.
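Here is a minimal sketch of that weighting, assuming one row per pitcher-season with age, TBF and ERA. The function name, the tuple layout and the sample numbers are invented for illustration and are not taken from the actual registry or code.

```python
from collections import defaultdict


def weighted_age_deltas(seasons):
    """seasons: list of (pitcher_id, age, tbf, era) tuples, one per pitcher-season.

    Returns {(age, age + 1): weighted mean ERA change}, where each pitcher's
    year-to-year change is weighted by the lesser of his two TBF totals.
    """
    by_pitcher = defaultdict(dict)
    for pitcher_id, age, tbf, era in seasons:
        by_pitcher[pitcher_id][age] = (tbf, era)

    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for ages in by_pitcher.values():
        for age, (tbf_1, era_1) in ages.items():
            if age + 1 not in ages:
                continue  # no consecutive season, so no delta for this pair
            tbf_2, era_2 = ages[age + 1]
            weight = min(tbf_1, tbf_2)  # lesser of the two TBF
            weighted_sum[(age, age + 1)] += weight * (era_2 - era_1)
            weight_total[(age, age + 1)] += weight

    return {pair: weighted_sum[pair] / weight_total[pair]
            for pair in weight_total if weight_total[pair] > 0}


# Made-up data: pitcher "b" faced only 3 batters at age 24.
sample = [
    ("a", 24, 800, 4.00), ("a", 25, 750, 4.20),
    ("b", 24, 3, 9.00), ("b", 25, 600, 3.50),
]
deltas = weighted_age_deltas(sample)
print(deltas[(24, 25)])
```

With these invented numbers, the 3-TBF season barely moves the weighted delta, which stays positive (about +0.18), whereas a plain average of the two changes would be about -2.65.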

Anyway, using the same methodology for pitchers, I don’t get a nice pattern.  Based on oppon. lwts, pitchers appear to peak at age 24.  Based on ERA, their ERA goes up slightly every year until age 26, at which point it goes up .05 per year.  At age 32 or so, it goes up around .1 a year.  BB rate appears to go down until age 26 or so and then remain stable until age 32 or so and then go up.  K rate appears to go down until around age 23 and then remain stable until around age 32 also.

I think there is a lot of selective sampling with pitchers.  For these numbers I did not remove first and last years.  Maybe it would look better if I did.  I’ll try that later.

A long time (years) ago, when I looked at pitching performance curves as a function of years of major league service, as opposed to age (of course, they are similar), it looked a lot cleaner, IIRC.  I’ll have to look at that again later.


#38    tangotiger      (see all posts) 2007/03/08 (Thu) @ 23:51

There’s a lot of selective sampling with pitchers, much more than with batters.  Here are my charts for pitchers, starting with the basic one, which, like MGL’s, shows the pitchers peaking at 24.  The later charts have regression applied to them.

http://www.tangotiger.net/adjacentPitching.html


#39    Guy      (see all posts) 2007/03/09 (Fri) @ 09:34

Could you deal with the selective sampling, at least somewhat, by looking only at pitchers who pitched at least X seasons (maybe 6 or 7)?  This would largely eliminate pitchers on the ‘bubble’ who don’t get to pitch in MLB in yr 2 because of poor performance in yr 1.  Obviously, you would only be looking at above average pitchers.  But I don’t see any reason to think the aging curve is talent-dependent.  And even if there is a link, it’s the guys who play for a while we mainly care about when looking at aging curves, not what a replacement level player is likely to do in 3 years.

Weighting by fewest IP could also introduce a bias that understates age decline.  A lot of declining players will pitch fewer innings in yr 2, and be weighted by yr 2 IP.  Improving players will sometimes have fewer IP in yr 1 and so also be weighted down, but this may not be symmetrical.


#40    tangotiger      (see all posts) 2007/03/09 (Fri) @ 11:18

Under such a scenario, Mark Prior and John Rocker might disappear from the pool. 

Whatever it is that you do, you have selective sampling.

If you choose only pitchers who pitch at least 50 IP at age, say, 30, you are excluding any guys who get hurt, who are flashes in the pan, etc.

Then, the next time you have a stud at age 27, you will be comparing him only to pitchers who satisfied your original conditions (i.e., excluding the flashes and the injured pitchers), and therefore giving this 27-yr-old stud an aging curve that is more favorable than it should be.

If on the other hand you say, up front, “since I’m only basing the aging curve on 25% of the pitchers that were in his situation at age 27, then this is the 75th and better percentile forecast”, then I’m ok with that.


#41    MGL      (see all posts) 2007/03/09 (Fri) @ 21:03

Off the top of my head, you are never going to get rid of all or even most of the selective sampling problems.  You can mitigate them, I think.  The only way to really figure out exact aging curves would be to make sure that every pitcher pitches a lot every year from age x to age y, no matter how well or poorly he pitches in any one year. Obviously that cannot happen.

On the other hand, using methodologies like similarity scores, as Pecota does, will presumably get you the exact answer you are looking for, which is “exactly what can we expect in the future, both performance-wise and playing-time-wise (and injury-wise, etc.), from this particular player or similar players?”

The problem with this kind of methodology is sample size (of the “comps”).

If you are able to use larger sample sizes to generate more accurate aging (and other) patterns AND those patterns roughly apply to all pitchers (pretty much), that is probably a better methodology.

Using some degree of both probably works the best. I think that is what Pecota does - uses both types of methodologies and somehow combines them.

