THE BOOK--Playing The Percentages In Baseball

Wednesday, October 15, 2008

Challenging Nate Silver (and all other forecasters)

By Tangotiger, 02:03 PM

Nate issued a challenge to a pollster a few months ago.  That pollster, too “big” for Nate, didn’t take him up on it.  Nate wisely said, regarding PECOTA:

“Sometimes being more accurate means you’re getting things right 52 percent of the time instead of 50,” says Silver. “PECOTA is the most accurate projection system in baseball, but it’s the most accurate by half a percent.”

Which is of course what I’ve been saying.  PECOTA is not necessarily the most accurate (Chone might be), but the basic idea is the same: the best one is an 84-win team (.519), the next best ones are 83-win teams, and your typical baseball fan is an 81-win team.  So, calling one forecasting system “the best” would be as dubious as calling an 84-win team better than an 83-win team, especially since we are not even sure whether PECOTA is the 84- or the 83-win team.

Anyway: Nate Silver, MGL, Dan Syzmborski, Rally Monkey, David Gassko, Chris Constancio, Bill James, Pete Palmer, Ron Shandler, Tom Tippett, and anyone else.  And I’ll put in Marcel of course, “the people’s choice”.  Step right up.  It’s time for a challenge: put all the forecasting systems out there and let them fight it out.  I have a simple enough idea, which I’d be open to modifying.  Each one of you provides forecasts for as many players as you wish.  I’ll tell you exactly what metric I’ll be using, and what position each player can play.  You provide a ranking based on that metric.  I then put you in one huge league.  I’ll draft the players for you in a snake-style draft.  That’s the team you get for the whole season, no trading or moves.

Then, I’ll do it again, with a new order.  And then again.  And again.  And again, and again, and.... and again.  I’ll do one hundred (maybe one thousand) drafts, all done programmatically.  Then, we’ll see how everyone does at the end of the year.  The winner gets to say “Best Forecasting System of 2009”.  No one gets to put on their book “deadly accurate” unless it’s true.

In the spirit of fun and highlighting how close all these forecasting systems truly are, I encourage all you readers out there to push for your favorites to join.  There’s no reason for any of them to be all snooty like American Research Group was to Nate.

I will update this post, as challengers accept.

For those who don’t know me, I created the Marcel The Monkey Forecasting System (affectionately called The Marcels) as a way to provide the minimum competence level all forecasters should beat.  I want and prefer that the Marcels not win.  And I have even provided the exact model so that you can create your own. So, there is no conflict of interest here.


#1    Lou      2008/10/15 (Wed) @ 16:10

Love it, Tango.  I hope they all accept.


#2    Tangotiger      2008/10/15 (Wed) @ 16:37

Let’s see who the bold ones are.


#3    2008/10/15 (Wed) @ 17:05

Some (maybe most, not sure) of those guys probably read this blog… did you bother sending this to them? Do you want one of us to send it to them?


#4    cannatar      2008/10/15 (Wed) @ 17:26

There’s plenty of time to work out the details, but you’re going to have to decide how purely you want to stick to evaluating the projection systems, or whether you want other issues to come into play (such as relative replacement values at different positions, relative values of pitchers vs. hitters, SPs vs. RPs, etc.).

And are you evaluating these systems based just on how well they predict rate stats, or do you also care about playing time projections?


#5    tangotiger      2008/10/15 (Wed) @ 17:37

Also playing time projections.

Basically, the Fantasy dollar figures per player.

***

No, I did not contact anyone.  I thought about it, and posted it here.  I’ll let the Crowd push the buttons.


#6    SG      2008/10/15 (Wed) @ 19:32

Can I play?  I’ll throw CAIRO into the mix.


#7    tangotiger      2008/10/15 (Wed) @ 22:04

All right.  Here are the competitors so far:

Marcel
CAIRO

Anyone else?


#8    Dan Szymborski      2008/10/15 (Wed) @ 22:19

I’m in!  One condition though - reverse that y and z in my last name!


#9    Chris C.      2008/10/15 (Wed) @ 22:46

Out of curiosity, how big are the ‘roster’ sizes for each team?


#10    2008/10/15 (Wed) @ 22:59

I’ll throw Oliver in.

I have to code the one-year age adjustment factor, to zero out the bias, but that’s not difficult.

The harder thing will be forecasting playing time.


#11    tangotiger      2008/10/15 (Wed) @ 23:27

Marcel
CAIRO
ZiPS
Oliver

***

Dan: ouch.  I actually thought I got it right.  I spent ten seconds writing your name.  Oh well, 8 out of 10 isn’t bad.

***

Roster size: I’m thinking that there should be some 200-300 position players selected, and 150-250 pitchers selected.  So, if we have say 10 forecasters taking the challenge, then 35 players per team.  If we have say 20 forecasters taking the challenge, then say 25 players per team.

***

Playing time will be the challenge, as it should be.  As we know, there is a linear relationship between WAR and salary.  And WAR is (rate minus baseline) times playing time.

And Fantasy Baseballers want to know the dollar value of the players they are bidding on.  So, it is imperative that anyone who wants their forecasts to be taken seriously by Fantasy Baseballers provide both rate stats and playing time estimates.

Now, if someone ONLY wants to provide rate stats, and expects the playing time forecasts to come from somewhere else, that’s fine.  Forecasters should decide whether they want to provide one or both.


#12    tangotiger      2008/10/16 (Thu) @ 07:37

Marcel
CAIRO
ZiPS
Oliver
RotoWorld


#13    Rally      2008/10/16 (Thu) @ 09:06

I’m in.


#14    Tangotiger      2008/10/16 (Thu) @ 09:47

Official Acceptance List

Marcel
CAIRO
ZiPS
Oliver
RotoWorld
Chone

Official Waiting List
(none of the following have yet been individually contacted)

Nate Silver
Ron Shandler
Bill James
Pete Palmer
MGL
Diamond-Mind Baseball
David Gassko
Chris Constancio
Anyone else?


#15    Pizza Cutter      2008/10/16 (Thu) @ 09:58

I’ve been toying with my predictor system for a while, but it probably wouldn’t be done in time for this particular project.  Certainly, the pitchers won’t.  If, somehow, it is, I’ll be happy to throw in my spreadsheet.


#16    2008/10/16 (Thu) @ 11:39

Just using ‘08 numbers with no regression might be worth it too, an even monkier monkey.


#17    Tangotiger      2008/10/16 (Thu) @ 11:52

That’s a good idea.  Marcel sets the line as minimum competence, and anyone who finishes just below that still might be good.  But, just using 2008 sets the line as completely incompetent, and anyone who finishes below that line is more than useless.

Harry, if you have collected any of the 2008 forecasts, I’d love to get them.  I’ve got what’s on Fangraphs already, plus what David at THT posted.


#18    Colin Wyers      2008/10/16 (Thu) @ 12:00

Maybe instead of using just 2008, use a two or three year average, unweighted. (At least this was argued by someone on a Cubs forum:

http://www.northsidebaseball.com/Forum/viewtopic.php?f=2&t=51758

in response to some unofficial Marcels projections of mine.)

I’d love to look at a collection of 2008 forecasts as well. Have some ideas for a study I’d like to try out.


#19    Tangotiger      2008/10/16 (Thu) @ 13:10

An unweighted 3-year average and a weighted 3-year average (plus regression and age) won’t differ much.  The point is to have some separation, and if Marcel and the unweighted average look similar, there’s not much point there.  The idea of using 2008 only is that it gives the absolute bare floor.


#20    MGL      2008/10/16 (Thu) @ 13:11

I’m not really sure what Tango is going to do with the forecasts.  I don’t follow his “multiple team” methodology.

But I am afraid that this is going to boil down to who has the best playing time forecasts, which I think is 90% luck with maybe a little bit of “insider knowledge” (which minor league players are likely to get called up, etc.) thrown in.  I could be wrong though, since as I said, I don’t follow Tango’s model.

I don’t really do playing time projections, and in fact, think they are kind of silly, at least for my purposes.

So I won’t be entering into this contest.  If it ends up being an evaluation of “rate” projections, then I will probably be in.


#21    Tangotiger      2008/10/16 (Thu) @ 14:22

To be clear, the idea is that I would provide some metric to judge against, like WAR (wins above replacement).  I’ll spell out exactly the definition so that we all know.  For the sake of illustration, let’s presume that the metric is: PA * (wOBA - .300).

1. You rank your players in order from first to last.  Each of the other forecasters does the same thing.

2. You each give me your rankings.

3. I randomly assign draft order.

4. My computer program picks the #1 player on the list of the first forecaster to draft.  The next forecaster gets the best player still available on his own list.  And on and on. 

So, what we have here is a league with 10 or 15 or 20 forecasters, each with a team of players that they selected as the best available according to their forecast.

Since anything can happen in one draft, I repeat steps 3 and 4.  Now I have two leagues.  MGL might have had the 4th draft pick in one league, and then the 11th in another.  And I do it again.  Now I have three leagues.  And on and on.  I’ll end up with 100 leagues.  Maybe 1000.  Or 2000.

5. You keep those players for the whole season.  So, it’s completely passive on the part of the forecasters.  At the end of the season, I simply tally up the WAR for each team, and declare the winner of each league.  With 2000 leagues and 20 forecasters, we should expect each forecaster to win 100 times if they were all equally good.
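For anyone who wants to see the mechanics, here is a minimal Python sketch of steps 3 through 5, assuming the snake-style draft from the original post. The names `rankings`, `actual_value`, `run_league` and `simulate` are hypothetical, and positions are ignored for brevity (the real contest supplies positions and drafts the same number of players at each one).

```python
import random
from collections import Counter


def snake_order(teams, rounds):
    """Yield the team on the clock for every pick: 1..N, then N..1, and so on."""
    for r in range(rounds):
        yield from (teams if r % 2 == 0 else teams[::-1])


def run_league(rankings, rounds, rng):
    """Draft one league: shuffle the draft order, then each forecaster takes the
    highest player on its own list that nobody has taken yet.
    (Hypothetical helper; positions are ignored in this sketch.)"""
    teams = list(rankings)
    rng.shuffle(teams)
    taken = set()
    rosters = {team: [] for team in teams}
    for team in snake_order(teams, rounds):
        pick = next((p for p in rankings[team] if p not in taken), None)
        if pick is None:
            raise ValueError(f"{team} ran out of ranked players")
        taken.add(pick)
        rosters[team].append(pick)
    return rosters


def simulate(rankings, actual_value, rounds, leagues, seed=0):
    """Repeat the draft many times and count league wins per forecaster,
    scoring each roster by its players' actual end-of-season values."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(leagues):
        rosters = run_league(rankings, rounds, rng)
        totals = {t: sum(actual_value[p] for p in r) for t, r in rosters.items()}
        wins[max(totals, key=totals.get)] += 1  # ties go to the earlier drafter
    return wins


# Toy example: two forecasters, four players, two rounds.
rankings = {"A": ["p1", "p2", "p3", "p4"], "B": ["p2", "p1", "p4", "p3"]}
actual = {"p1": 5.0, "p2": 1.0, "p3": 3.0, "p4": 0.0}
print(simulate(rankings, actual, rounds=2, leagues=100))
```

In this toy case A’s list is clearly better, so A ends up with p1 and p3 whichever order is drawn and wins all 100 leagues. With real rankings the wins spread out, which is why the results only mean something over many leagues.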

***

Now, MGL could provide his WAR numbers based on his rate stats and PECOTA’s Depth chart.  It would be preferable for him to say that (which I know he would).


#22    studes      2008/10/16 (Thu) @ 14:37

This part of your “contract”…

“Then, we’ll see how everyone does at the end of the year.  The winner gets to say “Best Forecasting System of 2009”.  No one gets to put on their book “deadly accurate” unless it’s true.”

...is off-putting to me.  This one process becomes the one and only arbiter for judging forecasting systems?  I don’t have a system at stake, but I wouldn’t do it.


#23    cannatar      2008/10/16 (Thu) @ 15:29

MGL’s probably right that playing time will wind up being the most important factor in winning.

You could do two separate competitions:

1. Each projection system bases list on both playing time and rate stats.
2. Each projection system submits rate stats only, and value is done for each using a single set of playing time projections (i.e. everyone has to use the PECOTA depth charts).

If anyone doesn’t want to bother with projecting playing time, they can just compete in #2.


#24    Tangotiger      2008/10/16 (Thu) @ 16:04

If playing time forecast is the single most important thing, then we should simply stop reporting forecasting studies based on rate stats.  The discussion should center on who can project playing time the best.

Regardless, the only test you can have is WAR or team wins.  And to do that, you need playing time.  There’s no way around that.

***

Studes, that statement was meant tongue-in-cheek. 

However, everyone proclaims their system the best.  I’m now putting everyone into the same competition, and making them play each other a thousand times.  I’m not sure what better arbiter you would want.  Isn’t it better to have Consumer Reports or CNET give out 5-star reviews on an HP printer, than for HP to review its own printers?  That is essentially what BP does by putting their “deadly accurate” statement on their book.  I find what their marketing arm does very off-putting.  I can’t believe that Nate would willfully agree to do that, considering his other very public statements that run contrary to that bold statement.

That said, I am offering all forecasters the opportunity here to shape the process.  I’m very open to hearing from them how they would like to proceed here.

My basic viewpoint is to create a model that matches what the Fantasyer expects, and that is basically: playingTime times (rate minus baseline)

That gives you a direct correlation to Fantasy dollars.  And that’s what the Fantasyer wants to know.

And, that’s also what a GM wants.


#25    Tangotiger      2008/10/16 (Thu) @ 16:23

Official Acceptance List
Marcel
CAIRO
ZiPS
Oliver
RotoWorld
Chone
Ask Rotoman

Official Semi-Acceptance List
MGL (if he can use published depth charts)

Official Waiting List
(individuals contacted, and awaiting word)
Baseball Info Solutions
Baseball Prospectus
Baseball HQ
ProTrade

Unofficial Waiting List
(none of the following have yet been individually contacted)
Pete Palmer
Diamond-Mind Baseball
David Gassko
Chris Constancio
Anyone else?


#26    studes      2008/10/16 (Thu) @ 16:24

Well, if it was tongue-in-cheek, that’s fine.  But even Consumer Reports doesn’t insist that its participants not make marketing statements or claims that don’t jibe with their results.  This would be just one test of many possible tests, not the definitive test.

BTW, “Fantasyer?” Here’s the link:

http://fantasyer.com/


#27    MGL      2008/10/16 (Thu) @ 16:28

While I don’t have anything against evaluating a forecast that includes playing time, and I am not suggesting that it isn’t fair or even the correct way to do it, I am just not interested.

If it is something like (wOBA-.300) * PA, and everyone’s projected wOBA will be about the same (which it will be), then it will be ALL about playing time.  How about we just have everyone forecast playing time and forgo the rate projection? 

That would probably yield the same results.  Whether someone predicts that Griffey retires or not, or that Harden gets 120 IP or 220 IP just does not interest me.

And doing 100 or 1000 drafts does not solve the problem of the results fluctuating randomly, I don’t think.  The fluctuation comes from players fluctuating around their true talents, the uncertainty of the projections themselves, and certainly using playing time, there will be tremendous fluctuations around playing time, because of injuries, etc.

Again, if someone wants to see how good the forecasters are in projecting injuries and the like (and there is nothing wrong with that - it is both an art and a science), I’ll pass - it is just not my cup of tea.

If it were me, I would only use rate stats - and I realize some of the problems with that.  But I don’t have the time to set up a contest like that.  Whoever is willing to set it up and track it should set it up however they want, I guess.


#28    Tangotiger      2008/10/16 (Thu) @ 16:45

But even Consumer Reports doesn’t insist that its participants not make marketing statements or claims that don’t jibe with their results.

Well, there are FDA mandates, and people have a certain level of trust in that.  For things that are not FDA mandated, all you get is a bunch of marketers making the “best” claim, meaning that all those claims are useless.  If a respected third-party outfit like Consumer Reports says that some SUV has serious rollover safety issues, that means a lot more than the manufacturer saying it doesn’t.

So, that’s what I’m offering here, to cut through all the pre-FDA marketing crappola.

And, since I’m offering all the forecasters the opportunity to shape the methodology, they can’t then say that the results are tainted or biased.

***

MGL: I don’t know that “all” the rate stats will be the same for each forecasted player.  When I’ve done the rate studies in the past, we do see differences, big enough that Chone wins in 55% of the head-to-head battles with other forecasters.

That said, we can do as others are suggesting and run two concurrent contests, one where everyone uses the same depth chart, and one where we have both the rate and playing time forecasted.

As for running 1000 contests, the fluctuations will come from having different players.  For example, if you have 2000 contests and you have 20 forecasters, then you expect each forecaster to draft Granderson 100 times.  Forecasters who really like him will have him drafted say 120 or 150 or 200 times, and those who don’t like him as much will get him drafted 75 or 50 times.

So, that’s what this will come down to.  You will (essentially) be weighting each player based on how much more you like or hate him than your fellow forecasters.  If Marcel thinks that Longoria is a $15 player, and everyone else has him as a $20-30 player, then Marcel will never get to draft Longoria.

Otherwise, by having, say, only 10 drafts, you could end up never having drafted Granderson, and so you get a zero weight for him, while some other forecaster counts him as (essentially) 1 or 2.  By having 2000 drafts, one forecaster will count him as a weight of 0.8, the average will be 1.0, and someone who really likes him might weight him at 1.3.

It keeps the random fluctuation from getting out of hand, while giving a fairer weight as to where you really have him ranked.
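The weighting in the last two paragraphs is plain arithmetic: the number of times a forecaster drafted a player, divided by the fair share of leagues / forecasters. A short Python sketch with a hypothetical `draft_weights` helper:

```python
def draft_weights(times_drafted, leagues, forecasters):
    """Express how often one forecaster ended up with each player as a weight,
    where 1.0 means exactly the fair share of leagues / forecasters.
    (Hypothetical helper, for illustration.)"""
    fair_share = leagues / forecasters
    return {player: n / fair_share for player, n in times_drafted.items()}


# 2000 leagues, 20 forecasters: the fair share is 100 drafts per player.
print(draft_weights({"Granderson": 80}, leagues=2000, forecasters=20))
# Only 10 leagues: the fair share is 0.5, so a single draft already counts as 2.0.
print(draft_weights({"Granderson": 1}, leagues=10, forecasters=20))
```

With 2000 leagues, 80 drafts of Granderson gives a weight of 0.8 and 130 gives 1.3. With only 10 leagues, each forecaster’s weight on him can only be 0 or 2, which is the instability that running many leagues is meant to smooth out.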


#29    Tangotiger      2008/10/16 (Thu) @ 17:13

Btw, I’m also open to having multiple metrics.  For example, one could be wOBA, and another could be the one I publish every year for Fantasy Baseball (based on AVG, HR, SB, R, RBI).  Or really anything.  Ideally, we have a small set of metrics so that I don’t ask the Forecaster to provide 10 different lists based on 10 different metrics.  I think 3 max should suffice.


#30    2008/10/16 (Thu) @ 17:22

I could be mistaken, but don’t Gassko and Constancio work together to create one projection for THT?

And what about asking someone from ESPN to throw theirs in? I have absolutely no idea who does their projections, and they probably wouldn’t even be allowed to enter, but might as well take a shot.


#31    tangotiger      2008/10/16 (Thu) @ 18:10

Fangraphs has something called MINER, which I think is from Chris.  So, I don’t know the relationship between Chris and Dave.

As for ESPN, I’ll see who I can find.


#32    Colin Wyers      2008/10/16 (Thu) @ 18:13

MINER is from Jeff Sackmann, who runs Brew Crew Ball and Minor League Splits.


#33    Tangotiger      2008/10/16 (Thu) @ 18:37

Ah, thanks for that.


#34    2008/10/17 (Fri) @ 10:48

#17:  I lined everyone up in Chone, Pecota, and Zips back when I thought I was actually going to be productive this season, and I think I’ve got Shandler too.  I can get them to you when I get home tonight.


#35    Tangotiger      2008/10/17 (Fri) @ 11:14

METRICS

I will judge based on three metrics (which are up for discussion).  In all cases, I’m going to separate nonpitchers from pitchers, since the ability to forecast one doesn’t necessarily tie in to the ability to forecast the other.

Metric 1: geared for Fantasy Players.

The basic formula for hitters is:
HR/10 + SB/10 + xH/10 + R/30 + RBI/30

And for pitchers, it’s:
W/5 + SV/10 + SO/50 + xER/10 + xWHIP/15

xH = H - AB*.277
xER = (4.10 - ERA) * IP / 9
xWHIP = IP*1.32 - H - BB

The “.277”, “4.10” and “1.32” are subject to slight alteration.  I will finalize those numbers long before the deadline.
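A Python sketch of Metric 1 as written above, reading xER as (4.10 - ERA) * IP / 9. The .277, 4.10 and 1.32 baselines are the provisional values from this post, so they are exposed as parameters; the function names are hypothetical.

```python
def fantasy_hitter(hr, sb, h, ab, r, rbi, base_avg=0.277):
    """Metric 1 for hitters: HR/10 + SB/10 + xH/10 + R/30 + RBI/30."""
    x_h = h - ab * base_avg  # hits above a .277 hitter over the same AB
    return hr / 10 + sb / 10 + x_h / 10 + r / 30 + rbi / 30


def fantasy_pitcher(w, sv, so, era, ip, h, bb, base_era=4.10, base_whip=1.32):
    """Metric 1 for pitchers: W/5 + SV/10 + SO/50 + xER/10 + xWHIP/15."""
    x_er = (base_era - era) * ip / 9  # earned runs saved vs. a 4.10 ERA
    x_whip = ip * base_whip - h - bb  # hits + walks saved vs. a 1.32 WHIP
    return w / 5 + sv / 10 + so / 50 + x_er / 10 + x_whip / 15
```

For example, a 15-win, 200-strikeout pitcher with a 3.10 ERA, 160 hits and 50 walks in 180 IP scores about 10.84: 3 (wins) + 4 (strikeouts) + 2 (xER of 20) + 1.84 (xWHIP of 27.6).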

Metric 2: geared for reality.

Hitters: PA * (wOBA - .300)

This wOBA equation will include SB (weighted at .25) and CS (weighted at -.50).

Pitchers: IP/9 * (5.75 - ERA)

The “.300” and “5.75” are subject to change and will be finalized long before the deadline.

The “.300” will differ by position.  Catcher will be .020 less (i.e., .280), SS .010 less (.290), CF, 2B, and 3B unchanged (.300), corner OF .020 higher (.320), and 1B/DH .030 higher (.330).
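A sketch of Metric 2 in Python. It takes wOBA as an input rather than computing it, since the wOBA weights (apart from SB at .25 and CS at -.50) aren’t spelled out here. Splitting “corner OF” into LF and RF keys is an assumption, and the .300 and 5.75 baselines are still provisional.

```python
# Offsets from the .300 hitter baseline, per the positional list above.
# "LF"/"RF" stand in for "corner OF" (an assumption of this sketch).
POSITION_OFFSET = {
    "C": -0.020, "SS": -0.010,
    "CF": 0.0, "2B": 0.0, "3B": 0.0,
    "LF": 0.020, "RF": 0.020,
    "1B": 0.030, "DH": 0.030,
}


def hitter_value(pa, woba, pos, base=0.300):
    """Metric 2 for hitters: PA * (wOBA - positional baseline)."""
    return pa * (woba - (base + POSITION_OFFSET[pos]))


def pitcher_value(ip, era, base=5.75):
    """Metric 2 for pitchers: IP/9 * (5.75 - ERA)."""
    return ip / 9 * (base - era)
```

A .300 wOBA shortstop over 700 PA comes out at about +7 rather than zero, which is the point of the positional baselines, while a .330 first baseman comes out at zero no matter how many PA he gets. A 3.75 ERA over 180 IP is worth 20 * 2.00 = 40.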

Metric 3: geared for rate stats.

Same as Metric 2, without the playing time component.

Hitters: wOBA
Pitchers: ERA

Hitters must get at least 200 PA and pitchers at least 50 IP.  Any shortfall gets a “replacement level” added.  The replacement level is based on the numbers in Metric 2.

I will then take the simple average of all the hitters on your team.

For example, if you have 14 hitters each with over 200 PA, and a 15th hitter who has a wOBA of .500 in 20 PA, I will add 180 PA of .300 to his total (giving him .320), and then take the simple average of the 15 wOBA.
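The replacement-level top-up for Metric 3, sketched in Python with hypothetical function names. `replacement` is passed in: .300 for hitters, as in the example above (it might instead be the positional baseline from Metric 2), and presumably a 5.75 ERA with a 50 IP minimum for pitchers.

```python
def padded_rate(rate, playing_time, minimum, replacement):
    """Top up a short season to the minimum with replacement-level playing time,
    e.g. 20 PA of .500 plus 180 PA of .300 gives .320 over 200 PA."""
    if playing_time >= minimum:
        return rate
    shortfall = minimum - playing_time
    return (rate * playing_time + replacement * shortfall) / minimum


def team_rate(players, minimum, replacement):
    """Metric 3: simple (unweighted) average of each player's padded rate.
    `players` is a list of (rate, playing_time) pairs."""
    rates = [padded_rate(rate, pt, minimum, replacement) for rate, pt in players]
    return sum(rates) / len(rates)
```

The first docstring example reproduces the .320 figure above. Topping up ERA by innings works the same way because ERA is a per-inning rate: 25 IP of 3.00 plus 25 IP of 5.75 gives 4.375 over the 50 IP minimum.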


#36    Colin Wyers      2008/10/17 (Fri) @ 12:14

I’d say to keep the evaluation position-neutral. Because if not, then you’re asking the forecasters to predict THREE things:

* Rate
* Playing time
* Position(s)

Sure, for the majority of players that’s not a big deal. But for some edge cases (superutility guys like Freel, DeRosa and Figgins) or aging players who get moved defensively, this could be problematic.


#37    Tangotiger      2008/10/17 (Fri) @ 12:24

I will supply the positions beforehand.  And everyone will be selecting the same number of players at each position.

So, that won’t be an issue.  I simply don’t want to have a SS with a .300 wOBA count as “zero” in metric2, whether he has 100 or 700 PA.


#38    studes      2008/10/17 (Fri) @ 13:56

So, that’s what I’m offering here, to cut through all the pre-FDA marketing crappola.

Right, you’re asking people to abide by your exercise (which will, of necessity, be a compromised procedure) and saying they “can’t” market themselves in a way that conflicts with the outcome.  Personally, I wouldn’t agree to those terms.

I’ve been in businesses in which consultants tried to position themselves in this way, and it never paid to cooperate with them, even if we ranked at or near the top.

I would think you’d have a better shot at participation if you were less heavy-handed about it.


#39    Tangotiger      2008/10/17 (Fri) @ 14:44

As I said, I’m being tongue-in-cheek about the marketing aspect.  So, I don’t understand the heavy-handed argument.

There are NO terms!  So, again, what is there to agree or disagree with?

Post 35 lays out the metrics.  If Rally or Dan or whoever wants to offer other ideas, then that’s all up for discussion.  As it stands, those metrics do what they are designed to do.

The main blog entry lays out the drafting process: all automated, and simply going down each person’s list.

So, I don’t understand the objection, other than the b-tch slap I gave BP for their ridiculous proclamation on their book.  But, since Nate himself confirmed that PECOTA barely beats out the other forecasting systems, if one system can call itself “deadly-accurate”, then the next system might as well call itself “99.44% deadly-accurate”, since that’s how close it is, according to Nate himself.  Like I said, it’s a ridiculous proclamation, and I would never expect Nate himself to verbally articulate those words.  The BP marketing arm has nothing to do with Nate, any more than MGL making some outlandish statement about Leverage Index has anything to do with me.

What exactly do you think I’m trying to “impose” that someone would object to, and therefore not want to participate?


#40    Tangotiger      2008/10/17 (Fri) @ 14:49

To be clear, I’m offering the opportunity for someone to say:
“Best Forecasting System of 2009
-- Tangotiger.net”

This does not preclude anyone from saying:
“Super-duper Deadly Accurate Forecasting System of 2009
-- self”

I think it would be a great advantage to have the first statement pasted on someone’s site, much like Jake Luft’s quote would go on THT’s book.  It doesn’t preclude THT from ALSO putting anything else on their book.

So, this is exactly like someone giving Roger Ebert an advance copy of their movie, and he gives his two thumbs up, and you put that on your movie poster, and at the same time, elsewhere on the poster, you put “The Most Incredible Movie of the Year”.  And that is regardless of whether Ebert gave his thumbs up.


#41    jinaz      2008/10/17 (Fri) @ 17:05

So, participants can submit three ranked lists--one for each metric?
-j


#42    Tangotiger      2008/10/17 (Fri) @ 17:06

Right.

Or in the case of MGL, just one.


#43    Tangotiger      2008/10/20 (Mon) @ 12:36

Official Acceptance List
Marcel
CAIRO
ZiPS
Oliver
RotoWorld
Chone
Ask Rotoman
MGL (Metric 3 only)
Baseball Info Solutions
ANONYMOUS

Official Waiting List
(individuals contacted, and awaiting word)
Baseball Prospectus
Baseball HQ
ProTrade

Unofficial Waiting List
(none of the following have yet been individually contacted)
Pete Palmer
Diamond-Mind Baseball
David Gassko, Chris Constancio
Jeff Sackmann
Anyone else?

***

An Anonymous forecaster came forward.  For reasons I cannot disclose, he said he can participate, but as an unnamed forecaster (otherwise, he declines).  I don’t have a problem with that, unless you guys find a problem with that.


#44    Colin Wyers      2008/10/20 (Mon) @ 12:43

I think the anonymous forecaster should get to audit the course - run him through all the tests, but if he posts the best results, the “win” is awarded to the second-best forecaster in that category. That strikes me as the fairest - he obviously isn’t interested in bragging rights, so it doesn’t do him any good to have them; and it doesn’t do the rest of us much good to know that the best forecasting system is utterly proprietary and so secret we can’t even know about it.


#45    Tangotiger      2008/10/20 (Mon) @ 13:24

That’s why I love blogs.  There’s always one guy who can come up with an idea in one second that would take me ten hours to realize.

Auditing in school is perfect, just like amateurs competing in the PGA don’t get the money.  I love it.


#46    studes      2008/10/20 (Mon) @ 15:04

As I said, I’m being tongue-in-cheek about the marketing aspect.  So, I don’t understand the heavy-handed argument.

Thanks for clarifying your terms, Tango.  Post 40 is markedly different from the tone you took in your original post.


#47    2008/10/21 (Tue) @ 01:01

Correct me if I’m wrong, but I think everyone really wants to see PECOTA in this competition. It’s by far the most famous projection system; the public should know how it fares versus the other systems by means other than calculating RMSE or something.


#48    MGL      2008/10/21 (Tue) @ 01:24

Can’t you just use what Pecota publishes?  Or do you need the data before they publish it (or is there some data you need from them that they do NOT publish)?

I can send someone my 2008 projections if they want, BTW.


#49    tangotiger      2008/10/21 (Tue) @ 07:03

Mick, you can email it to me at tom~tangotiger~net (replacing the ~ of course).

As for PECOTA, I will get them in.  It would be PREFERRED that Nate volunteers the data so that he can say that the list he provides is unequivocally his.  If he doesn’t, I will ask someone to step forward (like harryAbles or some other loyal reader), go through the published PECOTA forecasts based on the metrics I will post, and provide the list in his place.  So far, Nate has not replied.  However, he might be busy with his elections blog.

I’m thinking Nate will do it.  After all, he issued his public challenge, and it would be strange for him not to accept one in the same spirit.  As well, when they first rolled out PECOTA and I was running my first project on forecasts, they were offering me the entire set of forecasts.  I think the only way that Nate doesn’t participate is that someone at BPro is personally annoyed with my constant professional jabs at them.


#50    MGL      2008/10/21 (Tue) @ 13:02

I think the only way that Nate doesn’t participate is that someone at BPro is personally annoyed with my constant professional jabs at them.

That could very well be!  Although for the record, you are never over the top that I can recall, nor are any of your “jabs” without merit (although I suppose that someone could legitimately disagree with some of them).

There is no doubt that any legitimate person or entity that wants to claim some superiority or even any level of credibility or excellence should be more than willing (in fact, it should be mandatory) to have an objective entity evaluate or “grade” their work.

Any claim that one’s work or results are better than similar work done by other people, if that claim comes from the person himself, even if they claim that their evaluation techniques are objective and unbiased (and they appear to be), should be taken with as large a grain of salt as there is.


#51    Tangotiger      2008/10/21 (Tue) @ 13:16

I agree.  After all, it’s BPro itself that plasters the claim on their book every year, and it’s Nate himself that issued the challenge to a fellow pollster.  This is what really inspired me to take action.


#52    2008/10/21 (Tue) @ 19:10

I’m assuming the BIS projections are the ones found in the Bill James Handbook?

Btw Tango, do you know any system that’s just a little bit more complex than the Marcels, say maybe a system that uses 5 years of data and adjusts the weights accordingly?


#53    tangotiger      2008/10/21 (Tue) @ 20:49

Aaron, the Oliver system, by Brian Cartwright, is such a system.  You can check the archives, or wait for Brian to come around…


#54    tangotiger      2008/10/21 (Tue) @ 20:58

Right, BIS is from the Handbook.


#55    2009/01/09 (Fri) @ 17:16

If everybody issued a universal playerID with their projections, the consumers of these projections would be able to compare them on their own.


#56    Tangotiger      2009/01/09 (Fri) @ 18:04

Right, which is why I’m pushing for it.

Aaaaannnnd, I will be using the MLBAMid as the universal ID for this project.  And I would encourage all forecasters to include the MLBAMid.  I will be providing mappings of MLBAMid to the (MLB) RetroID and BDBid.  I will also be posting this mapping on this blog in a few weeks.

There will be simply no reason for forecasters not to provide the MLBAMid other than:
1. laziness
2. contempt for their consumers

And you can quote me on that.
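What a shared ID buys the consumer, as a minimal Python sketch: re-key one system’s forecasts to MLBAMid with a mapping table, then line them up against another system. The IDs and helper names below are made up for illustration; a real mapping would come from the MLBAMid/RetroID/BDBid table mentioned above.

```python
def remap_ids(forecasts, id_map):
    """Re-key {other_id: value} forecasts to MLBAMid using {other_id: mlbam_id}.
    Returns the re-keyed dict plus the IDs that had no mapping, so they can be
    fixed by hand instead of silently disappearing. (Hypothetical helper.)"""
    remapped, unmapped = {}, []
    for other_id, value in forecasts.items():
        if other_id in id_map:
            remapped[id_map[other_id]] = value
        else:
            unmapped.append(other_id)
    return remapped, unmapped


def join_on_id(a, b):
    """Pair two systems' forecasts for every player both of them cover."""
    return {pid: (a[pid], b[pid]) for pid in sorted(a.keys() & b.keys())}


# Hypothetical IDs and wOBA forecasts, for illustration only.
system_a = {"mlbam_1": 0.350, "mlbam_2": 0.310}  # already keyed by MLBAMid
system_b = {"retro_1": 0.340, "retro_3": 0.300}  # keyed by another ID
id_map = {"retro_1": "mlbam_1", "retro_2": "mlbam_2"}

b_mlbam, missing = remap_ids(system_b, id_map)
print(join_on_id(system_a, b_mlbam), missing)
```

Here only `mlbam_1` is covered by both systems, and `retro_3` is reported as unmapped rather than quietly dropped.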


#57    Zach      2009/01/12 (Mon) @ 23:46

Tango, you set the deadline at Feb. 1. Does that mean we have to enter and send you our projections by then? Mine won’t be done until late-Feb/early-March, but I wanted to enter.


#58    Tangotiger      2009/01/13 (Tue) @ 00:33

Feb 1 to put your hat in the ring.

Mar 1 for me to finalize all rules.

Opening pitch for all forecasts to be submitted.


#59    Craig Tomarkin      2009/12/12 (Sat) @ 11:54

Hi Tom,
What were the results of this contest? I would have liked to be included. Maybe there can be a rematch for 2010?

Craig
PS: My annual forecasts are posted at the link below, although I make improvements each year, so my current system should be the one posted.
http://www.baseballguru.com/bbinside4.html


#60    Tangotiger      2009/12/12 (Sat) @ 12:06

http://www.tangotiger.net/forecast/


#61    Craig Tomarkin      2009/12/12 (Sat) @ 13:08

Awesome! Thanks!

Is that collection of forecasts data available?

Can we collect it each year? Obviously, we’d all benefit in our drafts by averaging them all. We would need to agree that when people submit their forecasts, they include a common ID (I use Lahman’s playerID with mine as a default).

Ideally, there could be a site where people upload their forecasts in a common format, so the machine concatenates them all automatically. And, anyone could download the concatenated file as it develops.

For simplicity, the format could be to supply:
ForecastID (marcel, chone, etc) playerID nameFirst nameLast yearPredict

And counting stats:
G AB H 2B 3B HR SB BB SO IBB HBP SH SF
GP GS IP (innouts) ER HR BBA SOA SV

The machine calculates avg, obp, slg, woba, era
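The derived rates in that proposal are standard formulas. A Python sketch using the column names above (with `innouts` for outs recorded); it assumes BB already includes IBB and that AB and innouts are nonzero, and it leaves wOBA out because its weights aren’t given in this thread. The function names are hypothetical.

```python
def batting_rates(s):
    """AVG, OBP and SLG from one forecast line keyed by the proposed column names.
    Assumes AB > 0 and that BB already includes IBB."""
    tb = s["H"] + s["2B"] + 2 * s["3B"] + 3 * s["HR"]  # total bases
    on_base = s["H"] + s["BB"] + s["HBP"]
    obp_denominator = s["AB"] + s["BB"] + s["HBP"] + s["SF"]
    return {
        "avg": s["H"] / s["AB"],
        "obp": on_base / obp_denominator,
        "slg": tb / s["AB"],
    }


def era(p):
    """ERA from earned runs and outs recorded: 9 * ER / (innouts / 3)."""
    return 27 * p["ER"] / p["innouts"]


line = {"AB": 500, "H": 150, "2B": 30, "3B": 5, "HR": 20, "BB": 60, "HBP": 5, "SF": 5}
print(batting_rates(line), era({"ER": 60, "innouts": 540}))
```

The sample hitter works out to a .300 average and a .500 slugging percentage (250 total bases in 500 AB), and 60 earned runs over 540 outs (180 IP) is a 3.00 ERA.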

A comment on judging results. What I’ve seen is that the quality of these forecasts sometimes fluctuates substantially year to year. So, last year’s winner could finish last next year. I take a frequentist approach rather than a strictly Bayesian one to improve stability. In other words, I only include a refinement to my forecasts if I see it adds lift every year. So I trade off a bit of potential lift for consistency.

A true winner is among the top few consistently. I’d love to see how that plays out. Of course the perennial winner would be the crowd average.
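A sketch of the crowd average in Python, assuming every system’s forecasts are already keyed by the same player ID (the helper name and sample values are hypothetical). Players covered by only one system simply get that system’s number here; requiring a minimum number of systems would be an easy refinement.

```python
from statistics import mean


def consensus(by_system):
    """Crowd average: for each player, the mean of every system's forecast
    that covers him. `by_system` is {system_name: {player_id: value}}."""
    pooled = {}
    for forecasts in by_system.values():
        for pid, value in forecasts.items():
            pooled.setdefault(pid, []).append(value)
    return {pid: mean(values) for pid, values in pooled.items()}


print(consensus({"marcel": {"p1": 0.340, "p2": 0.300}, "chone": {"p1": 0.360}}))
```

Here `p1` averages to about .350 across the two systems, while `p2`, covered only by Marcel, keeps its .300.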


#62    Tangotiger      2009/12/12 (Sat) @ 13:45

Craig, you should read more of the threads in this blog.  I use the MLBAM ID as the universal ID.  There are about 6 or 8 threads that deal with this challenge.

