THE BOOK--Playing The Percentages In Baseball

Tuesday, September 28, 2010

Marcel = Chone = PECOTA

By Tangotiger, 07:17 AM

That’s pretty much what I get out of it.

Question for Colin: if you use these Depth Charts for Marcel (call it Marcel+Community), what kind of results do you get?


#1    Sky      (see all posts) 2010/09/28 (Tue) @ 11:35

The implied point was that PECOTA was not nearly as bad as its reputation indicated (although it’s not doing so well in predicting team wins this year: http://www.dugoutcentral.com/?p=1343)

But the main point I got out of it was what you wrote above, Tom—Marcel (a dumb system) and CHONE (a free system) are just as good.  Not a great marketing claim.

However, it nicely sets up BPro to say something like “PECOTA, right now, is among the best out there, but we have X, Y, and Z changes coming, which, based on research, will improve its accuracy.” I’m curious if we’ll get that claim this week or over the winter.

Like I tweeted, I’m not sure who I love more: Colin or whoever decided to hand him the keys.


#2    Colin Wyers      (see all posts) 2010/09/28 (Tue) @ 13:24

I think Marcel is a little less dumb than is typically acknowledged around here. There are plenty of forecasters who are a lot dumber than the monkey.

Let’s be blunt - PECOTA is, as a commercial proposition, in many respects a fantasy baseball product. That’s why we do the Depth Charts, that’s why we project things like saves and that - a lot of the purely sabermetric forecasters don’t put a lot of attention into that because from an analysis point of view, saves and RBI forecasts are boring.

Now - none of that matters if the underlying forecasting system isn’t competent. I have no interest in becoming yet another braindead fantasy tout. But I don’t think PECOTA has to be the absolute best forecaster to be worth the price - as the roto comparisons show, there is a value to what we’re doing there. (And there’s the articles, too.)

So endeth the sales pitch.

(That said - yes, I fully intend for PECOTA to be the most accurate projection system I can make it be. And we’re putting a lot of work into that proposition, and we’re going to start showing off some of it tomorrow.)


#3    Tangotiger      (see all posts) 2010/09/28 (Tue) @ 13:43

The intent of Marcel has always been to act as “the most basic forecasting system you can have, that uses as little intelligence as possible.”

By using Marcel as a common baseline, everyone has the same standard to compare against.  Basically, it’s the “Bad News Bears” of forecasting: a movie that is neither great nor terrible, that everyone our age has seen, and that has some obvious charms but, with some effort, can be beaten.

As for the BPro Depth Charts, I hope that you are going to take this time to compare against the Community forecasts that I publish.  Marcel for forecasting rate stats and the Community for depth charts should be the baseline to compare against.  It requires literally minutes of effort on my part, and is available to everyone.

Any commercial product should be able to beat this open source product.  Linux if you will.

By the way, I hope that you can make something that is value-added, and that you can get people to pay for it.  The same for Rally with his WAR files, and Fangraphs with their online PDF and THT with their book.  This is America.

I think the only thing you can do is be honest about what it is and what it isn’t.

I’ve always characterized these systems as Marcel being 81-81, the top systems being, at best 83 or 84 win systems.  That, I think, is the reality.  And if you look at the comments on BPro’s site, you see the readers are smart and appreciate honesty.


#4    J. Cross      (see all posts) 2010/09/28 (Tue) @ 14:52

All good points here.

Colin, one question: what did you do with players who have PA this year but weren’t projected by one of these 3 systems?  Were they dropped entirely from the comparison or still included in the other 2 systems’ pools?


#5    Tangotiger      (see all posts) 2010/09/28 (Tue) @ 14:57

I’m sure he dropped it.  It would be kind of hard to find any player like that in here anyway.  Marcel forecasts the universe:

FAQ: “But, what about a player who’s never played MLB? Where’s his forecast?” That’s simple. His forecast is the league mean over 200 PA, 60 IP (starter) or 25 IP (reliever). If you want to know what the league mean is, just take the average of anyone forecast with a reliability of 0.00. So, Marcel’s official forecast for anyone coming over from Japan is that.

PECOTA and Chone forecast basically anyone in AA and AAA, and maybe even single A.  Plus anyone expected from Japan. 

In any case, if you’ve got 95% overlap in terms of playing time, the missing 5% won’t matter at all.


#6    Ken      (see all posts) 2010/09/28 (Tue) @ 15:05

(if Colin is reading this)

fixing PECOTA should be job #1 at BP, at least as far as Fantasy goes. however, fixing PFM (the valuation engine) should not be far behind. having great projections is useless if it doesn’t help you create accurate $ values or rankings, and PFM is far too much of a black box for me to trust. I have no idea what it’s doing. I trust open source systems like Last Player Picked a whole lot more.


#7          (see all posts) 2010/09/28 (Tue) @ 15:09

Colin

I love that PECOTA takes a different approach than the other systems.  Uniqueness matters imo.  And it should be less likely to be mathematically driven to overregress to achieve a lower RMSE.  I also dig the probabilistic nature of the forecasts; the percentile rank thingy.

My question, by way of example, is this:

Chone Figgins has an OPS of .649 in 675 PA at the time I type this.  Where does PECOTA’s preseason forecast rank this result?  If, for argument’s sake, PECOTA estimated that there was only a 24% chance that Figgins would do worse than this after 675 PA ... then we throw a ball into bin #3.

Jose Bautista will end up in bin #10, I don’t think we even need to do the math there.

And on and on for everyone else.  Then how does the final histogram look?

That’s the real measure of the quality of the thinking behind a forecasting system, methinks.
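
A rough sketch of that order test in Python, for anyone who wants to try it. It assumes you already have, for each player, the percentile that the forecast assigned to his actual result (0.24 for the Figgins example). The function names and every sample value other than 0.24 are made up for illustration; nothing here is PECOTA's or Marcel's actual code.

```python
from collections import Counter

def decile_bin(percentile):
    """Map a forecast percentile in [0, 1] to a bin number from 1 to 10."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0 and 1")
    # 0.24 lands in bin 3; a percentile of exactly 1.0 is placed in bin 10.
    return min(int(percentile * 10) + 1, 10)

def order_test_histogram(percentiles):
    """Count how many players land in each of the ten bins."""
    counts = Counter(decile_bin(p) for p in percentiles)
    return [counts.get(b, 0) for b in range(1, 11)]

# Hypothetical input: 0.24 is the Figgins example, the rest are invented.
sample = [0.24, 0.99, 0.51, 0.08, 0.67]
histogram = order_test_histogram(sample)
print(histogram)
```

If the percentiles are well calibrated, the ten counts should come out roughly equal over a large pool of players. Counts piled up in bins #1 and #10 would suggest the forecast ranges are too tight.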

I have a script online somewhere that performs the same order test for Marcel forecasts (technically for Albert forecasts, but it’s a trivial matter to convert it to Marcel, just lock K in at 240 for all hitting elements).  This using Audrey Hristov’s beta cdf php script and a few lines of code.  Ubersimple, but slow as molasses to run.

PECOTA will obviously wallop marcel by this measure on component values (BB/PA, 1B/BIP, etc.) but I wouldn’t be surprised to see OBP, OPS and global measures like RC or wOBA ring in very similar to PECOTA.

I would also be very interested to see how Brad Null’s multinomial model fares by this simple order test.  Unfortunately I don’t have the math skills to calculate that myself.

Makes sense, no?


#8    Tangotiger      (see all posts) 2010/09/28 (Tue) @ 16:20

“And on and on for everyone else.  Then how does the final histogram look?”

“That’s the real measure of the quality of the thinking behind a forecasting system, methinks.”

I’m sure on a league level, PECOTA will do just fine.

Where it will NOT do fine is when you break it down by rookies and young veterans, or possibly relievers and starters.

That’s because Nate made a serious mistake.  The uncertainty range that he uses is NOT based on past playing time.  Instead, it is based on the comparable players selected, and THOSE guys may or may not be based on past playing time.

Beyond anything, the largest uncertainty level we have is based on the number of past PA and past IP he has.  There is *some* uncertainty level based on the profile of the player, but, that’s just icing.

This is why I’ve had the challenge to BPro ever since Nate rolled it out.  And he confirmed to me that he never tested it the way I said to test it, and that he only tested it at the league level.

Take for example this pitcher:
http://www.baseballprospectus.com/card/card.php?id=ALBALADEJ19821030A

He had 60 career innings, and his 90/10 forecast range was 4.03/5.20.

Then you have AJ Burnett and his forecast was 3.70/4.93.
http://www.baseballprospectus.com/card/card.php?id=BURNETT19770103A

It is impossible that the uncertainty range of these two pitchers is so similar, considering you know almost nothing of one, and so much of another.

Here’s Joba (3.91/5.04):
http://www.baseballprospectus.com/card/card.php?id=CHAMBERLA19850923A

Up and down the line, you get stuff like that.

Then, you get some really weird ones like Felix:
http://www.baseballprospectus.com/card/card.php?id=HERNANDEZ19860408A

His 90th point is 3.23 (his 80th is 3.22, so there’s a coding problem somewhere), and his 10th is 3.87.  His 50th is 3.54.  That must mean that this year, he’s at the 99.99th level, even though last year he had a 2.49 ERA.

You can look at CC for another with a very tight range.  This is yet another type of problem, where the uncertainty range is far too tight.

In no way should anyone have predicted that Felix had an 80% chance of posting an ERA between 3.23 and 3.87 (0.64 gap).  It is an impossible claim to make that you can give a pitcher that kind of range.

We can go to Clemens and Maddux and RJ and Pedro, and I would bet that their ERA in their primes was not within 0.64 in ERA 80% of the time.

“George is getting upset!”

What Colin should be doing right now is telling all his readers: “I disavow anything to do with the percentiles, because they make no sense and have not been tested.”

Unfortunately, he would be fired tomorrow for his honesty, so I’ll take the brunt of it.


#9    Tangotiger      (see all posts) 2010/09/28 (Tue) @ 16:40

Let me put it another way.  Say you have a guy where you know, with 100% certainty, that his talent level is a wOBA of .330.

And say you know that this guy will face 1000 batters.

Do you know what his OBSERVED wOBA will be at the 90th and 10th percentiles (equivalent to +/- 1.3 standard deviations)?  0.310 to 0.350.  Do you know what kind of ERA range that translates to?  Over 1 run.

And that’s if you KNOW his true talent range, and he faces that many batters.  The absolute limit of 90th ERA divided by 10th ERA is at most 75%.  If you find ANY pitcher anywhere, where his 90th percentile ERA divided by his 10th percentile ERA is more than 75% (like Felix at 83%), then you know, with 100% certainty, that it’s a b.s. forecast.

The other pitchers I listed above, by the way, are all at 75% to 78%, meaning that they are also b.s.
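
The ratio check is easy to reproduce. A minimal sketch in Python, using only the 90th/10th percentile ERA figures quoted earlier in the thread; the 0.75 ceiling is the limit claimed above, and the labels and helper name are just for illustration.

```python
# (90th percentile ERA, 10th percentile ERA) as quoted earlier in the thread.
forecasts = {
    "60-inning reliever": (4.03, 5.20),
    "Burnett": (3.70, 4.93),
    "Joba": (3.91, 5.04),
    "Felix": (3.23, 3.87),
}

RATIO_LIMIT = 0.75  # the ceiling claimed in the comment above

def era_ratio(p90, p10):
    """Ratio of the 90th percentile ERA to the 10th percentile ERA."""
    return p90 / p10

for name, (p90, p10) in forecasts.items():
    ratio = era_ratio(p90, p10)
    verdict = "too tight" if ratio > RATIO_LIMIT else "ok"
    print(f"{name}: {ratio:.3f} ({verdict})")
```

Every one of the four comes out above 0.75, with Felix the highest at about 0.83.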

Now, what if you only have 300 batters faced (like a reliever), and you KNOW he has a true talent of say .280 wOBA?  Well, now the 90/10 range would be a gap of 1.50 runs, or a ratio of 58%.
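
For the wOBA side of this, here is a sketch under stated assumptions. The per-PA standard deviation is not given anywhere in the thread; it is backed out of the .310 to .350 range quoted for 1000 batters, and it is treated as the same at every talent level, which is a simplification. The conversion from wOBA to ERA is not attempted, since no conversion is spelled out here.

```python
import math

Z_90_10 = 1.3  # the comment's rounding for the 10th/90th percentile points

# Assumption: per-PA standard deviation of wOBA, backed out of the
# .310 to .350 range quoted for a known .330 talent over 1000 batters.
PER_PA_SD = (0.020 / Z_90_10) * math.sqrt(1000)

def observed_woba_range(true_woba, batters_faced):
    """10th/90th percentile band of observed wOBA for a known true talent."""
    sd = PER_PA_SD / math.sqrt(batters_faced)
    half_width = Z_90_10 * sd
    return true_woba - half_width, true_woba + half_width

for talent, bf in [(0.330, 1000), (0.280, 300)]:
    low, high = observed_woba_range(talent, bf)
    print(f"{bf} BF, true wOBA {talent:.3f}: {low:.3f} to {high:.3f}")
```

The band widens with the square root of the drop in batters faced, so the 300-batter reliever gets a band about 1.8 times as wide as the 1000-batter starter, before any uncertainty about his true talent is added.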

And that unknown reliever up there in my previous example?  Well, he’s nowhere close to that.

If it was me, I would immediately take all the percentile ranges off the website.

“George is really getting upset!”


#10    Colin Wyers      (see all posts) 2010/09/28 (Tue) @ 16:41

“What Colin should be doing right now is telling all his readers: ‘I disavow anything to do with the percentiles, because they make no sense and have not been tested.’”

“Unfortunately, he would be fired tomorrow for his honesty, so I’ll take the brunt of it.”

What I’d get fired for is sheer laziness. I have the 2010 PECOTA percentile forecasts in a database. If they haven’t been tested, who but me do I have to blame right now?

We’ve spent two days at BP talking about PECOTA, and we’ve run images of Matt Wieters right up top. I got to call the article “Whatever Happened to the Man of Tomorrow!” And at the end I plug that tomorrow we’re doing an article looking at how PECOTA has projected Ichiro.

I mean, say what you want to about me or BP, but I would’ve hoped that by now there’d at least be SOME presumption that I’m not being muzzled.


#11    Guy      (see all posts) 2010/09/28 (Tue) @ 16:45

In the category of seeking to improve projections from 81 to 82 wins, I think Phil’s updated “Lemons” paper raises an interesting possibility: 
http://www.philbirnbaum.com/lemons2.pdf.  His data suggest the possibility that a player’s recent trajectory may give us additional information beyond Marcel’s weighted average.  More specifically, that players experiencing a sharp decline in production—at least those well into their 30s—should project at a lower level than Marcel anticipates. 

I’d be interested in what others think of this, and whether anyone else has already researched this issue.


#12    Tangotiger      (see all posts) 2010/09/28 (Tue) @ 16:50

Percentile ranges: Certainly not laziness.  It’s on your todo list I presume, and you haven’t gotten to it.

I don’t think you are being muzzled, but I do think there is some level of restraint. 

I’ve offered a statistical proof as to the limits that you can show the observed ERA range.  The data shown on the BPro cards is far beyond those ranges.


#13          (see all posts) 2010/09/28 (Tue) @ 17:17

Tango,

Thanks for the response.  I had no idea, obviously I’d been giving the proprietors of PECOTA too much credit.

I disagree with the first sentence though:
“I’m sure on a league level, PECOTA will do just fine.” If you’re right (and colour me convinced) then that won’t happen at all.  PECOTA will take a beating by this order test.

So tell the croupier to take my chips off of the PECOTA wager, and spread them out evenly between Null and Albert.


#14          (see all posts) 2010/09/28 (Tue) @ 17:29

Guy,

That is a great post on Phil’s site, and terrific commentary.  In a perfect world, everyone who put such an effort into their internet posts would be blessed with such knowledgeable and diligent critics.

I’m not sure that Phil’s study is telling us that about Marcel though.  It may be telling us that the decline in a player’s performance was not luck, but there for a reason (injury probably, a lot of possibilities there) and the selling team did indeed know this.

I also don’t think it is reasonable to expect any forecasting system to identify that.  This is why, to my mind, we should expect the result of the histogram, in the order test proposed above, to have a concave form, and we shouldn’t be judging by bins #1 or #10 --- not unless they are underrepresented, which would be a red flag.

You disagree?


#15    JEH      (see all posts) 2010/09/28 (Tue) @ 19:25

“I’ve offered a statistical proof as to the limits that you can show the observed ERA range.  The data shown on the BPro cards is far beyond those ranges.”

Are the BP percentile numbers simply the observed performances of the comparators?  I am in the dark here as I don’t know much about PECOTA, but I recall the Breakout, Improve, Collapse numbers and had the impression they were empirical (though I don’t recall why I had that impression).

I am in the process of coding my baseball projection/valuation process and recently was dealing with percentiles.

I deal with three separate percentiles: Ceiling, Projected (Observed), and one called Filtered (which, when combined with Ceiling, may function as what Tango refers to as True Talent . . . I try and isolate the two).  Some may use a percentile range for Opportunity (Playing Time), but I have always been more interested in Rate stats and just haven’t gone there yet.

Which gets me to the point: there are multiple places for error (uncertainty) to be introduced in the process if we try and model them and I am sure my projected numbers will span a very wide range [these are new, in the past I used the same steps but used single values (instead of ranges) and produced a single value], but something empirically based would likely be much narrower especially if disasters (low playing time seasons) are tossed from the sample.

And that gets me to the question:  What’s the case for using comparable players for projections?  I assume this has been discussed before but I have not blundered across a discussion on the topic.


#16    tangotiger      (see all posts) 2010/09/28 (Tue) @ 20:08

The best case for using them is if you are trying to figure out the best way to model a guy with say high speed, low power, and high walks and low K.  Or any kind of combination.  Rather than doing a*b*c*d as your parameter, you look for similar players, which gets you there.

More importantly, it’s fun.

Beyond that, I don’t think there’s anything there.


#17    tangotiger      (see all posts) 2010/09/28 (Tue) @ 20:11

BP staff member Steven Goldman
BP staff
(2079)

I would like to respond to the “Deadly Accurate” thing once and for all, since I’m the SOB that coined it. As I recall, we were asked by our then-publisher for some ways to describe what we do, and I jokingly offered a number of things, one of which was “deadly accurate.” There was much chuckling at the time, because who but Annie Oakley would say “deadly accurate” about anything? Six months later, there it was on the cover. It was never intended to be more than an obviously hyperbolic boast, something out of the Stan Lee school of breathless cover blurbs. I see how some might read it as written, but even so, it’s a line on a book cover, not a slap across the face, and I’ve never understood why some folks seem so exercised about a harmless bit of self-evident braggadocio.

My response:

TangoTiger
(57181)

I’m one of the people who is irked. The problem comes when people believe this press release. People start acting like PECOTA is the leader, when test after test shows that it’s possibly above-average, and possibly below Marcel. It’s not something to be boastful about.

If the intent was limited to a quasi joke, why do you have it here:
http://www.baseballprospectus.com/subscriptions/

“Complete depth charts and forecasts for AL and NL pitchers and hitters using Baseball Prospectus’ deadly-accurate PECOTA projection system--the same one used in MLB front offices.”


#18    tangotiger      (see all posts) 2010/09/28 (Tue) @ 21:41

JEH: right, you are on the right track.

Start with the easiest thing and presume that there is no uncertainty in the mean, and the uncertainty is only in the observations.  When you give hitters 600 PA and starting pitchers 900 PA and relief pitchers 300 PA, what kind of ranges do you get in wOBA (or OPS) and ERA?

Now, on top of that, add a second uncertainty to ERA because of sequencing.

On top of that, add a third uncertainty to the estimated mean (true talent).

Report each of those findings.

What you will end up with is a fairly wide ERA range at the 90 and 10th percentiles (around 1.3 SD).  And that’s for SP.  For relief pitchers, the results will be completely useless.
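
The three layers above can be sketched as follows, assuming the sources of uncertainty are independent and roughly normal, so their variances add. Every number in this block is a hypothetical placeholder chosen only to show the mechanics; none is a measured value, and the symmetric band is itself a simplification.

```python
import math

Z_90_10 = 1.3  # approximate z-score for the 10th and 90th percentiles

def era_range(mean_era, *sds):
    """10th/90th percentile ERA band, combining independent SDs in quadrature."""
    total_sd = math.sqrt(sum(sd ** 2 for sd in sds))
    return mean_era - Z_90_10 * total_sd, mean_era + Z_90_10 * total_sd

# All inputs below are hypothetical placeholders, not measured values.
MEAN_ERA = 4.00
SD_OBSERVATION = 0.45  # noise from a finite number of batters faced
SD_SEQUENCING = 0.25   # extra ERA noise from the order of events
SD_TRUE_TALENT = 0.35  # uncertainty in the estimated mean

steps = [
    ("observation only", (SD_OBSERVATION,)),
    ("+ sequencing", (SD_OBSERVATION, SD_SEQUENCING)),
    ("+ true talent", (SD_OBSERVATION, SD_SEQUENCING, SD_TRUE_TALENT)),
]
for label, sds in steps:
    best, worst = era_range(MEAN_ERA, *sds)
    print(f"{label}: {best:.2f} to {worst:.2f}")
```

Whatever the real inputs turn out to be, each added layer can only widen the band, which is why a range that is already too tight for observation noise alone cannot be rescued by the other two.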

I’d love to see the results.


#19    JEH      (see all posts) 2010/09/28 (Tue) @ 21:51

“More importantly, it’s fun.”

I’ll buy that. 

“Beyond that, I don’t think there’s anything there.”

It seems likely (to me) that there would be some information (predictive power) in similar players (so there may be some advantage to integration), but my intuition also is to project using other methods.


#20    JEH      (see all posts) 2010/09/28 (Tue) @ 22:02

@TT/18

“I’d love to see the results.”

I’ll share when I have them.

Right now I’m busy reinventing the wheel (this week’s wheel is an interface/conversion tool for retrosheet data) so it will be a while.


#21          (see all posts) 2010/09/28 (Tue) @ 22:12

For those that care (so probably just me), uncertainty in the true talent is referred to as “epistemic uncertainty” while uncertainty in the observations is known as “aleatory uncertainty” or “statistical uncertainty.”

We know that both types of uncertainty apply to baseball results, but it’s never been clear to me whether or not the PECOTA percentiles attempt to incorporate both types or just one type (and if one type, which one).


#22    MGL      (see all posts) 2010/09/28 (Tue) @ 22:24

To me, if you are one of the spokespersons (and a content creator of course) for a business, like Colin and BP:

1) You obviously cannot say anything derogatory or even close to that about the business, and nor should you, even if it is true. 

2) You have a responsibility to fix whatever problems (say, the percentile numbers in Pecota) exist in the organization, if that is your job.  If it is not, and you are aware of the problems, it is your responsibility to communicate that to the person(s) who are.  Sounds like Colin is in the process of doing just that.

3) You can (and probably have to) make some boastful claims for PR purposes. How much you stretch the truth, such as “deadly accurate projections” is not real clear.  However, if you are in the business of making such claims as many, many companies do, you are stretching the bounds of ethics, and you will always have a reputation for not being a completely honest, ethical company among some percentage of your readership/customers or people who just know about your company but don’t patronize it. 

The interesting thing is that the more your company provides legitimate products that do what you say they do, the less you need to and should do that kind of (not necessarily truthful) boasting.  Why is that?  Because if your product works, you will have ample customers. If your product does not work (like 99% of those in late night infomercials or on pay radio and cable TV), then the only way to sell it is with deception.

BP should NOT be making claims that stretch the bounds of credulity unless it wants to be known as a company that sells products that don’t work or don’t do what they say they do.


#23    tangotiger      (see all posts) 2010/09/28 (Tue) @ 22:31

MGL/22: ditto.  Wish I would have said all that.

Mickey: it is at a minimum aleatory, and almost definitely both.  When I asked Nate about it, he said it would be a forecast of the observations, and that that’s how you would test it.  Therefore, since there’s uncertainty in the estimate of the mean, it has to include that, plus the other.

Otherwise, how would you test it?  “If I’m 100% certain that Felix has a 3.54 true mean, then I’m 80% certain his ERA will be between 3.20 and 3.90”?

Or “I’m 80% certain Felix’s true mean is between 3.20 and 3.90”?

No, it’s both: “I’m x% certain that Felix has a 3.54 true mean, and so I’m 80% certain his ERA will be between 3.20 and 3.90.”


#24    Tangotiger      (see all posts) 2010/09/29 (Wed) @ 09:09

Guys,

I refer you to our Chipper thread from that time:

http://www.insidethebook.com/ee/index.php/site/comments/chipper_does_not_compute/

I will move your threads there for continuity…


#25    tangotiger      (see all posts) 2010/09/29 (Wed) @ 19:35

“I mean, say what you want to about me or BP, but I would’ve hoped that by now there’d at least be SOME presumption that I’m not being muzzled.”

Colin, I am sorry that what I said made you feel slighted in any way.  Certainly that was not my intent, and I think it was just poor writing on my part more than anything.


#26    Guy      (see all posts) 2010/09/29 (Wed) @ 22:31

Nate actually has a blog post today on confidence intervals, in the context of his political forecasts: http://fivethirtyeight.blogs.nytimes.com/2010/09/29/the-uncanny-accuracy-of-polling-averages-part-i-why-you-cant-trust-your-gut/#more-1545.  Remarkably, he actually cites the confidence intervals as what made Pecota superior to other baseball forecast models.

I suspect he does underestimate the uncertainty in his political projections.  But I’ll be interested to read the next two posts in the series.

