Wednesday, March 03, 2010
Wieters II
Courtesy of Clark Kent, because Kolin is not around.
It sounds like BP is satisfied with the system that produced its Wieters 2009 and Montero 2010 projections and doesn’t see those as problematic.
Dave Pease:
To follow up on this--I’m not much of a natural marketer, but the reason we haven’t been more forthcoming about serious issues in the 2010 projections is that we’re not aware of any serious issues in the 2010 projections. Seriously.
If you are in a non-keeper league, as far as I know there’s no reason you shouldn’t be able to make use of the projections in the book, or the ones released on the site in late January, to inform your draft strategy.
The long-term projections (and, by extension, the upside), the comps, the weighted means, the percentiles--we’ve had problems with all of those to some degree, and we’re still working on fixes. But if what you’d generally use is the output from the PFM, none of those are relevant.
http://baseballprospectus.com/article.php?articleid=10226
Also in that thread, Colin says that the 2009 and 2010 PECOTAs are just as accurate as they were in 2007 and 2008 by RMSE OPS. How is it then that PECOTA performed basically up there with CHONE in 2007 and 2008 and at the back of the forecaster pack in 2009? I’m asking because I don’t know, but something still doesn’t make sense here.
I have slightly older runs of the new PECOTA system for three years of data, which I have tested against the originally published PECOTAs. Old and new PECOTA both return identical RMSEs for OPS, at least in the version of the test I’m currently looking at.
Mike, I got very confused reading that thread, with old and new, and past and current runs. It was hard for me to follow.
If Colin or BPro wants to use this thread to try to make it clearer, feel free.
You know, Tom, on a second reading (or really third or fourth), I’m not sure Colin was saying what I thought he was saying.
Perhaps he was just saying that the current PECOTA software run against 2007, 2008, and 2009 data produces forecasts that are very close to the 2007, 2008, and 2009 forecasts that were released using the version of PECOTA software in existence at that time.
Which basically means they didn’t break anything major when they ported from Nate’s Excel spreadsheets to the current software. Which is a good thing to know, but it doesn’t really address their poor performance in the 2009 forecaster comparisons.
I can envision scenarios in which the RMSE OPS remains pretty decent and the projection system as a whole is in line with Marcel, but it slips on some fraction of players (like Wieters, Montero). In that case you certainly wouldn’t choose it ahead of many other better systems, yet when averaged out over the whole MLB population its performance looks okay.
I did appreciate this comment from Dave Pease:
If you really think that we’re trying to pull the wool over your eyes by producing test results that we’ve massaged, or made 2009-specific changes only for the purpose of passing tests on 2009 data, please send me an email and I will refund your money.
We’ve had serious execution and communication problems this year, and I must again apologize for those, but I’d sooner go out of business than resort to something like that.
I’d imagined/hoped they felt that way, but it was very bold of him to say that and I appreciate that.
I only wish that BP viewed the rest of us out here in saber-world as being on their side in getting this fixed. Dave says he hasn’t seen any issues raised about their core projections. I can’t speak to the issues raised in the comments at BP Unfiltered, but I have raised such an issue. Neither the issue that I raised in my THT Live post on Montero nor the Wieters issue that Colin raised last year has been addressed by BP. I’m trying to help here. I think it’s obvious that Colin’s criticisms last year were also intended to help, given that he’s now working for BP.
I wish that BP would engage me (and the rest of the community of saberists) seriously rather than dismissing me as a liar and saberist sycophant. I’m not looking to work for BP like Colin is. I don’t have the time or interest for that. But I do guarantee that if BP addresses my concerns about PECOTA, I will be as public as I can in announcing my happiness with the system.
Mike, I would guess that from their perspective my blog is not the best place to address these issues.
I’m sure they probably have 100 things on their to-do list for PECOTA that come ahead of “make Mike happy”. That’s fine. My happiness isn’t really at stake either way.
I sense that they are working very hard and trying to address all the issues. However, beyond the short-term concern of making the subscribers happy, if they are really trying to build the best projection system out there, what better place to discuss than with the readers here? I’m serious. Also, I don’t get the sense from what Dave said that they are addressing the issues at all or believe that they even exist, even privately. He seemed to be completely flat-out honest and open that they don’t see any issues with the weighted mean/median projections.
I do believe that the saberist mind share drives the fantasy viewpoint. Why do people, even the mass of commenters at BP Unfiltered, think that CHONE is now the best system out there? It’s not because they evaluated it for themselves. They looked to somebody they trusted to tell them so.
If BP wants to win or finish near the head of the forecaster competitions, they’re going to need to address the real issues that caused them to do so poorly in 2009. It’s fine if they don’t care about that, I guess.
Conversely, I love to see how Brian Cartwright has handled the criticism of his system here. He’s taken the advice and gone back and improved and fixed. He also defended the things that he thought he had done correctly/better. But he didn’t stop there.
I imagine he would say that in the balance he was glad to have his system criticized here. True, nobody likes criticism. I don’t. But I have a sneaking suspicion that Oliver is gonna be a contender for the heavyweight title one of these years in the very near future.
Tom, if you would prefer I quit posting on the topic of PECOTA, please let me know. I don’t want to drag your blog down a road you don’t want to go.
The reason I post my concerns here mostly and once at THT Live is that here (mostly) and at THT Live (a little) are the only two communities of saberists that I believe are capable of helping BP resolve the PECOTA problems. I haven’t addressed anyone at BP privately on the issue, other than responding to an email that Will Carroll sent me, because I don’t believe I alone am their best hope of identifying and debugging issues.
I suppose it should be plain to me by now that they don’t intend to use this forum (or THT Live) for help.
I certainly have no idea what goes on behind the scenes at BP, but…
I forgot who said it, but someone said something like, “Let’s not pretend that any business does what they do for any reason other than to make their owners money.”
While it is true that some businesses are more ethical and honest than others, NO business exists to make the world a better place. They all exist for one reason and one reason only. And what they do about things like this PECOTA thing is driven by one thing and one thing only.
Mike,
Normally, I have no problem. In this particular case, their PECOTA threads on their blog are filled with tons of their own readers making their voices loud and clear on the subject.
Unless BPro authors are actively engaging you in this or any thread, we just make our point in one or two posts and move on. What may seem like being helpful to you or me may be interpreted as something less than pious by others.
From one of the comments on BP:
“Basically, most players are projected to remain stable or improve from mid/late-20s through age 35 or so. This is not a normal aging curve, and differs significantly from past long-term PECOTA forecasts. E.g., Dustin Pedroia’s TAv taken from his latest 10-year forecast:
Age 26: .305
27: .294
28: .292
29: .290
30: .291
31: .293
32: .292
33: .291
34: .301
35: .292”
So that’s why they hyped JC Bradbury’s aging study.
For reference’s sake, Pedroia’s wOBA from Oliver:
Age 26: .353
27: .348
28: .346
29: .338
30: .333
31: .326
That part of the Oliver aging trend looks to be right in line with what we expect. But I have noticed all the young players are peaking at age 24. What up with that?
There are two things working against each other when predicting future performance of young players, right? On one hand, they are getting older and one would assume more work and more physical maturity would make a player better. On the other hand, we recognize that it’s harder to predict what will happen the further ahead we look, so more and more regression gets built into the process.
So if you’ve got a 20 year old with potential, the aging curve expects him to get better through age 27ish and then decline. But regression hedges our bets right from the get-go, and builds on itself the further out we look. The relative importance of those effects could vary widely depending on the actual numerical values, but I could easily see that the combination of the two would produce a “peak” at an age earlier than 27.
That’s not the age the projection expects him to peak, it’s just what happens when you throw a bunch of percentiles and probability curves into a blender of uncertainty.
Or am I crazy?
That can’t be it. Take a below average hitter, say Peter Bourjos. He is also shown as peaking at 24. If regression were working in the manner you describe we should expect regression to be pulling him up closer to league average and working with, not against, the positive age adjustments.
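To make the two sides of this exchange concrete, here is a toy sketch. It is not how Oliver, PECOTA, or any real system works. Every number in it (the talent index, the 0.9 per-year reliability, the quadratic aging curve peaking at 27) is an assumption chosen only for illustration, and the function names are hypothetical.

```python
# Toy model only: all constants are assumptions, not values from any real system.
LEAGUE_AVG = 100.0     # hypothetical talent index, 100 = league average
RELIABILITY = 0.9      # assumed share of a player's deviation kept per year of look-ahead
TRUE_PEAK_AGE = 27     # assumed peak of the underlying aging curve


def aging_value(age):
    """Assumed quadratic aging curve, highest at TRUE_PEAK_AGE."""
    return -0.3 * (age - TRUE_PEAK_AGE) ** 2


def projected_talent(current_talent, current_age, future_age):
    """Age the player along the curve, then regress toward league average.

    The regression compounds with each extra year of look-ahead.
    """
    years_ahead = future_age - current_age
    aged = current_talent + aging_value(future_age) - aging_value(current_age)
    return LEAGUE_AVG + RELIABILITY ** years_ahead * (aged - LEAGUE_AVG)


def apparent_peak_age(current_talent, current_age=20, last_age=35):
    """Age at which the projected (not true) talent is highest."""
    return max(
        range(current_age, last_age + 1),
        key=lambda age: projected_talent(current_talent, current_age, age),
    )


above_average = apparent_peak_age(110)  # 20-year-old, 10 points above average
below_average = apparent_peak_age(80)   # 20-year-old, 20 points below average
print(above_average, below_average)
```

With these made-up numbers the above-average 20-year-old shows an apparent peak at 23, earlier than the assumed true peak of 27, while the below-average one shows an apparent peak at 28. In this toy setup, regression pulls the projected peak earlier only for above-average players, which is the objection raised about Bourjos.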
Hey Rally,
Broadly speaking, Brian found that once you include minor league numbers in your aging curves, the peak for most hitters is 24. I think this makes some sense: Aging curves based on just major league numbers are biased in that they only include players that age well, while those that don’t drop out of the sample (this is the same problem that affected JC’s study). The more players you include, the younger the peak appears to be.
The ‘09 Wieters issues weren’t issues with PECOTA in and of itself at all - they were issues with the DTs (essentially MLEs) fed into PECOTA at the time. (And that’s in fact what I said before I got hired at BP.) Of course now I’m going to get someone asking if the current DTs are fine, and my short answer is I think so - but I haven’t done a lot of testing of this proposition.
(Wieters was something of a “perfect storm” where he had ONLY played in two leagues and those were the two leagues where the DTs seemed to be out of whack.)
Past the issues with the DTs, I don’t know that anything was out of order with the ‘09 PECOTAs, compared to where they were in ‘07 and ‘08. My RMSE tests on OPS produced similar results to ‘07 and ‘08. (I have Marcels in there as well, and Marcels “beats” PECOTA in projecting OPS by about .003 points of RMSE over a three-year period, bearing in mind that Marcels projects over 10% fewer players. CHONE and ZiPS are forthcoming, which is why this hasn’t been published yet.)
And the ‘10 PECOTA system (the new database-driven replacement for the Excel-based PECOTAs used in the past) run on the same data (ie what was available at the start of those seasons) provides similar results when forecasting OPS to the originally published PECOTAs over that three year period.
If you’re looking at things like the 10-year forecasts and such, yes, there are known issues there. I don’t know where Clay is at in addressing those, but they are being worked on.
But when it comes to the basic, one-year projections - they’re in the same condition that the past year’s projections have been in terms of accuracy.
David, that’s very interesting. It seems quite believable to me that players reach their physical peak by 24 (or even a little earlier). While they might gain a little strength after 24, that would probably be offset by decline in speed and reflexes. I assume growth after 24 is about continuing to learn the game, not raw physical talent.
It could be that there’s a negative correlation between physical talent and this kind of learning. First, the best players (those headed for long ML careers) have less incentive to learn at lower levels, since they are succeeding based on physical talent. It won’t be until they start to fail more often that they have the necessary incentive to learn and change their approaches. Also, some learning may simply not be possible until you are challenged by better opposing hitters/pitchers, and that won’t happen to the best players until they reach AAA and the majors.
If this is true, then the less-talented guys who don’t make the majors (or don’t last) may very well peak at an earlier age, because they faced adversity earlier and learned what they could, but in the end didn’t have the physical skills to go further.
Past the issues with the DTs, I don’t know that anything was out of order with the ‘09 PECOTAs, compared to where they were in ‘07 and ‘08.
Colin, are you talking about DTs in some limited number of minor leagues, or across the board, including MLB?
Because I can run my tests so that they are limited to players with substantial MLB playing time. I just don’t think it can possibly change much.
...bearing in mind that Marcels projects over 10% fewer players.
Correction: anyone not in the main Marcel file automatically gets a league average forecast. If it’s necessary for me to include all the extra 7000 players listed here:
http://tangotiger.net/files/REPORT_POSITION_CLASS.csv
...in a Marcel file to make that point, then I will. Otherwise, the FAQ is explicit in this regard.
Colin, are you talking about DTs in some limited number of minor leagues, or across the board, including MLB?
Check Colin’s parenthetical:
(Wieters was something of a “perfect storm” where he had ONLY played in two leagues and those were the two leagues where the DTs seemed to be out of whack.)
Colin’s THT post deconstructing the Wieters projection was based on problems with the DTs for the Carolina and Eastern League only. Wieters was a perfect storm because those were the only two leagues that he played in professionally prior to 2009.
There can’t be that many Eastern Leaguers to mess up the data that much. PECOTA did well projecting hitters in 2007/2008. And was dead last in J Cross’s test last year.
Colin, can you give us the Marcel vs PECOTA RMSE for each of the three years? You say Marcel is slightly ahead for combined. I suspect he had a monster comeback in the last inning there.
There are two different issues.
1. Why was the Wieters projection so terrible?
Colin’s THT article focused on the DTs for those two leagues and he now says that issue is fixed.
2. Why were the overall PECOTAs so bad in 2009?
I don’t believe anybody from BP has ever addressed this issue directly. BP is now saying that it’s not a result of moving from Silver’s Excel based system to Davenport’s database based system. It’s nice to think that once that issue is settled they will publicly address the larger issue - if that wasn’t the problem, then what was it?
philly, the reason I asked is that there’s something missing. PECOTA 2009 had a very below-Marcel level for hitters and Marcel-level for pitchers.
Colin said:
“Past the issues with the DTs, I don’t know that anything was out of order with the ‘09 PECOTAs, compared to where they were in ‘07 and ‘08. My RMSE tests on OPS produced similar results to ‘07 and ‘08.... “
And further said the thing about matching Marcel. Something doesn’t match here. PECOTA had a terrible 09, and the only hiccup that’s been noted is “DT” and Wieters’ leagues.
That’s why I asked what I asked. Something doesn’t add up because it sounds like, if you remove any players who played in Wieters’ leagues in 2008, that PECOTA spit out exactly what it was supposed to spit out for every other player.
I don’t believe that the database version was up and running for last year - I believe Nate and Clay produced the ‘09 PECOTA using the same version that was used in ‘07 and ‘08. I could be wrong on this, though.
My tests, by year:
year    npRMSE  opRMSE  mRMSE
2007    0.088   0.086   0.083
2008    0.084   0.083   0.082
2009    0.081   0.085   0.080
Total   0.084   0.085   0.082
np stands for “new PECOTA,” or the back-forecasts Clay ran using the new database-driven PECOTA. op is “old PECOTA,” or the ones published on the site for those seasons. m is of course “Marcels.”
For those wanting technical details - each system is graded on those players it forecasted, nothing more or less. (I have a version of the test that, as Tom says, provides a league-average forecast for any player not explicitly forecasted. My recollection is that the difference between systems doesn’t change much in that test.)
That is a weighted RMSE for players with 100+ AB in that season. (The playing time cutoff was a necessary evil, in order to cut down on the amount of numbercrunching done to produce the retro-forecasts.)
Before testing, all forecasted OPSes were adjusted so that each system’s forecasted mean matched the actual performance of those players in that season. In other words, I subtracted the projected mean of OBP and SLG for each system and added the observed mean of OBP and SLG for each system based upon the pool of players projected, weighted by actual (not projected) PAs.
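As a rough sketch of the test described above (not Colin’s actual code), the mean adjustment and the weighted RMSE could look like this. The function name and the sample numbers are hypothetical. The sketch shifts OPS directly rather than shifting OBP and SLG separately. With the same weights the two shifts add up to the same OPS shift, but that simplification is an assumption here.

```python
import math


def mean_adjusted_weighted_rmse(forecast_ops, actual_ops, actual_pa):
    """Weighted RMSE after shifting forecasts to match the observed weighted mean.

    forecast_ops, actual_ops: per-player OPS values for the players a system forecasted.
    actual_pa: per-player actual (not projected) plate appearances, used as weights.
    """
    total_pa = sum(actual_pa)
    forecast_mean = sum(f * w for f, w in zip(forecast_ops, actual_pa)) / total_pa
    actual_mean = sum(a * w for a, w in zip(actual_ops, actual_pa)) / total_pa

    # Subtract the projected mean and add the observed mean, as one additive shift.
    shift = actual_mean - forecast_mean

    squared_error = sum(
        w * ((f + shift) - a) ** 2
        for f, a, w in zip(forecast_ops, actual_ops, actual_pa)
    )
    return math.sqrt(squared_error / total_pa)


# Hypothetical three-player pool, purely for illustration.
rmse = mean_adjusted_weighted_rmse(
    forecast_ops=[0.700, 0.820, 0.760],
    actual_ops=[0.735, 0.790, 0.800],
    actual_pa=[450, 600, 250],
)
print(round(rmse, 3))
```

Applying the 100+ AB cutoff would simply mean filtering the three input lists before calling the function.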
Once I have finished mapping ZiPS and CHONE for those seasons as well, I do plan on making the full data set available to other analysts. (I also have a slightly older run of nPECOTA which I need to update.)
Colin, well that is fantastic stuff, exactly what we’ve been needing to see.
I had a different player pool (and normalized OPS differently, by multiplying each system by a certain factor so that they had the correct mean) and presented RMSE in a silly way, but if you divide my weighted RMSEs by sqrt(mean PA) so that they mean the same thing, you get numbers that are comparable to Colin’s 2009 numbers:
system RMSE
marcel .083
Pecota .085
and the other systems:
Chone .079
ZiPS .080
Fant .083
TSN .084
Steamer .085
So I’ve got Marcel and Pecota showing up closer than Colin does but I think that can be explained by the different player pools.
Thanks for posting that Colin.
I’m not sure what the difference is between this and what Tango did almost exactly 2 years ago (looking at the 2007 forecasts). In that, PECOTA had an RMSE of .069, just ahead of Marcel (.071). Must have been a smaller set of players in that thread.
Back to my aging curves, when I included minor leaguers (my matched pairs were of same team two consecutive years) the overall talent peak for both batters and pitchers was 24, although each component aged differently. I’ve sent the spreadsheet to Tango, mgl, and two other respected folks here (other than THT).
Of course, aging is a distribution. My research has shown 24 to be the mean for all players, but individuals vary, likely on a normal type curve. Right now I am working on two ideas (trending and player quality) to see if I can identify things to more personalize the aging curves. For example, we have seen that players in MLB and NPB peak around 27. These are the best players, so perhaps I can find a relationship which shows that best players continue to improve for another couple years past 24, while the less talented players don’t. I’ve got to watch for causation, as maybe those batters got to MLB caliber because they continued to improve, but hopefully I can find something useful. Even if individuals are shuffled up or down the curve, I will still have the group totals at each age match the existing curve.
Rally: I think I did average error, not RMSE. Probably.
***
The range is .079 to .085 RMSE. Now, all those who think that difference is small, raise your hand. Go on, do it. No one is watching. Ok, I count 990 hands up out of 1000 readers. Well, it’s actually a big difference, since we’ve seen Jared post his win% records and I posted my win% records.
Basically, RMSE means nothing to no one and really serves, well, nothing. Really. RMSE, correlations. What does it mean? I mean, I know what it means, but how is a casual reader supposed to say: “Well, Chone had an RMSE of .079, but Marcel was very close at .082”.
And, seriously, is ANYONE going to remember the RMSEs and r in 2 months? But, we’ll remember Jared’s 100-62 record for ZiPS pitching, and we’ll remember my .520ish record for Chone hitting and pitching.
RMSE and r-squared and all that is for the 1% of the people in the crowd. When ranking forecasting systems, this is not how to do it.
Head-to-head matchups? Well, that’s as plain as day. Chone beat Marcel 52% of the time. Pretty straightforward.
Jared’s win% presentation is another novel way to do it.
The win% scale is the best way to do this, guys.
Or, present things in terms of dollars won in a draft. Something that has meaning.
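For readers who want to see what a head-to-head comparison might look like in practice, here is a minimal sketch. It assumes a "win" means the smaller absolute error on a given player and that exact ties are dropped. The thread does not spell out either rule, so both are assumptions, and the function name and sample values are hypothetical.

```python
def head_to_head_win_pct(forecast_a, forecast_b, actual):
    """Share of players on which system A was closer to the actual value than system B.

    Ties are excluded from the denominator (an assumption, not a stated rule).
    """
    wins_a = 0
    wins_b = 0
    for a, b, observed in zip(forecast_a, forecast_b, actual):
        error_a = abs(a - observed)
        error_b = abs(b - observed)
        if error_a < error_b:
            wins_a += 1
        elif error_b < error_a:
            wins_b += 1
    decided = wins_a + wins_b
    return wins_a / decided if decided else 0.5


# Hypothetical OPS forecasts for four players from two made-up systems.
win_pct = head_to_head_win_pct(
    forecast_a=[0.800, 0.700, 0.750, 0.900],
    forecast_b=[0.780, 0.720, 0.760, 0.850],
    actual=[0.810, 0.730, 0.700, 0.880],
)
print(win_pct)
```

In this made-up pool system A is closer on three of the four players, so it "wins" 75% of the matchups.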
Brian, I think it’s a fair enough thing to have a floating peak of say 18-30 based on the talent level of the player.
Suppose we index all players so that the average is 100, where the replacement level is 80, and the best player is, say, 160. Say the average AAA is 70, average AA is 60, average A is 50, average college is 30, average HS is 10.
Just off the top of my head, you can do:
peak age = 5*ln(forecastedPeakTalent) + 4
You end up with this:
Talent PeakAge
160 29.4
100 27.0
80 25.9
70 25.2
60 24.5
50 23.6
30 21.0
10 15.5
Seems a reasonable enough set?
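The table above can be reproduced directly from the off-the-cuff formula. This is only a sketch of that one line of arithmetic. The function name is made up, and ln is taken to be the natural logarithm.

```python
import math


def peak_age(forecasted_peak_talent):
    """Peak age from the formula in the comment: 5 * ln(talent) + 4."""
    return 5 * math.log(forecasted_peak_talent) + 4


print("Talent PeakAge")
for talent in (160, 100, 80, 70, 60, 50, 30, 10):
    print(f"{talent:>6} {peak_age(talent):7.1f}")
```

Because the curve is logarithmic, each halving of talent lowers the peak age by the same amount, about 3.5 years.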
The ‘09 Wieters issues weren’t issues with PECOTA in and of itself at all - they were issues with the DTs (essentially MLEs) fed into PECOTA at the time. (And that’s in fact what I said before I got hired at BP.) Of course now I’m going to get someone asking if the current DTs are fine, and my short answer is I think so - but I haven’t done a lot of testing of this proposition.
Colin, it’s good to see you here! My perusal of the PECOTA cards available to me (the Yankees) seems to indicate that the problem with Eastern League DTs might still be there. Montero’s got quite a wild MLE from Trenton, as do the other Yankee prospects who played at Trenton last year.
IMO, PECOTA isn’t fine if the DTs are broken. I can imagine DTs being useful for analysis separate from PECOTA, but I can’t imagine PECOTA being useful separate from DTs. If we are only interested in the projections for established major league hitters, Marcel will do us just fine, and I don’t think PECOTA buys us much, if anything.
peak age = 5*ln(forecastedPeakTalent) + 4
This works for me since I peaked at 17 (just talking baseball here) but I like to think I was an above average HS player.
More seriously, although I really like the idea, I’m not sure I understand the logic behind this. At the low end of the spectrum are we just defining peak as “age when X stopped playing?”
Is it just that players who peaked at 24 never had a chance to get quite as good as guys who kept improving and peaked later?
If we took two 25 year old MLB players, both of whom would go on to play in the majors for the next few seasons, would the better one (as defined by how they performed at age 25) be more likely to be better at 29 than 27 while the worse one is relatively more likely to be better at 27 than 29? Or does this only work ex post facto (the players who peaked at 29 were better overall than the players who peaked at 27)?
We are here at the Hardball Times, where we’ve secretly replaced the fine Colin they used to serve with Folgers Crystals. Let’s see if anyone can tell the difference.
Thanks for the link, Tom.
I’m sure there are more capable people out there than me to look at this subject, but I’m curious, so if no one else will look under the hood, I will. I’m more than willing to take input about things I may have missed or misunderstood.
I appreciate Jeff’s comment at THT that reminded me that he has MLEs available at minorleaguesplits.com.