THE BOOK--Playing The Percentages In Baseball

Monday, February 08, 2010

Evaluating the 2009 forecasts - Chone/ZiPS + Fantistics win

By Tangotiger, 11:28 AM

Professor Jared and his merry band of high school students take on the task.  Those ambitious fellows created the Steamer forecasts, and they submitted them as part of my Forecasters Challenge last year.  They finished 6th out of 22 in the official competition, and 2nd out of 22 under the rules I’ll be using for the 2010 competition.  And they did this while submitting an abbreviated list of players (which I supplemented with the Marcel draft list in the late rounds).  Yes, we should all be impressed.

Anyway, let’s see what their analysis reveals. They compared these systems: Steamer, Marcel, PECOTA, Chone, ZiPS, Sporting News, Fantistics.

Missing data - 475 hitters had 50 or more PA in 2009.  465 of these hitters had projections from each of the big 3 (chone, pecota and zips).  438 were projected by Marcel.  We looked at these 438 hitters.  Projection systems that projected fewer players (Steamer Projections and Sporting News were the main guilty parties) were given the Marcel projection for that player.  This allowed for a comparison of all 438 hitters across systems.  Sporting News and Steamer only projected about 270 players each.  Systems could beat the monkey so long as the projections they actually made were better than Marcel.
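The fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code the study used: the function name, player names and point values are all made up.

```python
def fill_with_marcel(system, marcel):
    """Return one projection per Marcel player.

    Uses the system's own number where it has one and falls back
    to the Marcel number where it does not.
    """
    return {player: system.get(player, marcel_value)
            for player, marcel_value in marcel.items()}


# Hypothetical fantasy-point projections, not real data.
marcel = {"Player A": 10.0, "Player B": 6.5, "Player C": 3.0}
partial_system = {"Player A": 12.0, "Player C": 2.0}  # no line for Player B

filled = fill_with_marcel(partial_system, marcel)
print(filled)
```

Under this rule a system with a short player list can never score worse than Marcel on the players it skipped, which is why it only has to beat Marcel on the players it actually projected.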

A technical note: Marcel’s official forecast for anyone not in the downloaded file is:

FAQ: “But, what about a player who’s never played MLB? Where’s his forecast?” That’s simple. His forecast is the league mean over 200 PA, 60 IP (starter) or 25 IP (reliever). If you want to know what the league mean is, just take the average of anyone forecast with a reliability of 0.00. So, Marcel’s official forecast for anyone coming over from Japan is that.

So, to be fair to The Big 3, the 27 missing players should be included with a forecast exactly as I said it should be.  It makes no sense for me to include it explicitly in the download file, if it’s just going to be a line of data that repeats for all players.  But, this is what should be done, because that’s what I said it is.  This is using Fantasy points:

RMSE* System
2.41 Avg Projection
2.43 Fantistics

2.49 Sporting News

2.52 Marcel
2.53 Steamer
2.55 ZiPS
2.56 Chone

2.65 PECOTA
2.67 2008

Holy moley.  First of all, Marcel had a great year.  Secondly, PECOTA did so badly that it was only as accurate as simply taking the previous season’s stats and running with those.  Don’t like RMSE?  How about correlation:

R with actual System
0.729 Avg Projection
0.723 Fantistics

0.707 Sporting News

0.697 Marcel
0.696 Steamer

0.688 ZiPS
0.687 Chone

0.657 PECOTA
0.653 2008

Again here, Marcel did very well, while PECOTA got trounced.  This is pretty shocking actually.  I don’t know what happened to PECOTA, but I’d love to see Baseball Prospectus explain it.  And who the heck is Fantistics anyway?
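For anyone who wants to run the same two tests on their own forecasts, here is a minimal Python sketch of plain RMSE and the correlation r. The sample numbers are invented, and this is the unweighted textbook version: per comment #15 below, the asterisked RMSE figures in the tables are weighted by PA, so they are on a different scale.

```python
import math


def rmse(projected, actual):
    """Root mean squared error between two equal-length lists."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical projected and actual OPS for four hitters.
projected = [0.800, 0.750, 0.700, 0.820]
actual = [0.780, 0.790, 0.650, 0.850]

error = rmse(projected, actual)
r = pearson_r(projected, actual)
print(f"RMSE = {error:.4f}, r = {r:.3f}")
```

Lower RMSE is better and higher r is better. RMSE punishes a system for being biased high or low across the board, while r only cares about whether the players are ordered and spaced correctly, so it is worth looking at both.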

Anyway, let’s go on:

Also worth noting, each of the projection systems has a smaller standard deviation across its projections than the standard deviation of actual results from 2008 or 2009.  This is as it should be.  The projection systems are trying to forecast true talent whereas the variance in actual results is a combination of the variance in true talent and the variance in luck.
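A quick simulation makes the point concrete. This is a toy sketch, assuming talent and luck are independent and normally distributed; the two standard deviations are invented numbers, not estimates from the study.

```python
import random
import statistics

random.seed(2009)  # fixed seed so the run is repeatable

N_PLAYERS = 5000
TALENT_SD = 0.060  # assumed spread of true-talent OPS
LUCK_SD = 0.050    # assumed spread of one season of luck

# Each player's observed OPS is his true talent plus a random luck term.
talent = [random.gauss(0.750, TALENT_SD) for _ in range(N_PLAYERS)]
observed = [t + random.gauss(0.0, LUCK_SD) for t in talent]

sd_talent = statistics.pstdev(talent)
sd_observed = statistics.pstdev(observed)
expected_sd = (TALENT_SD ** 2 + LUCK_SD ** 2) ** 0.5  # variances add

print(f"SD of true talent:     {sd_talent:.3f}")
print(f"SD of observed OPS:    {sd_observed:.3f}")
print(f"sqrt(talent^2+luck^2): {expected_sd:.3f}")
```

The observed spread comes out wider than the talent spread because the variances add. A forecast that aims at true talent should therefore be no more spread out than the talent distribution itself.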

Nothing novel there, but it should be noted because lots of people who are new to this aren’t aware.

Continuing:

And, if you want the best linear equation of these systems for projecting actual 2009 SGPs:
Actual = 0.527*Fantistics + 0.409*Chone + 0.243*SportingNews – 0.761 (R² = 0.547)
While Sporting News projected SGP’s better than Chone, Chone added more information to Fantistics because Chone was the most unique system while Sporting News was the least unique system.

Very interesting.

And this is using OPS:

RMSE* System
1.55 Chone
1.57 ZiPS
1.57 Avg Projection

1.62 Marcel
1.63 Fantistics
1.65 Sporting News
1.66 Steamer
1.66 PECOTA

1.84 2008

Chone and ZiPS lead the way.  Marcel is where it should be, while PECOTA once again brings up the rear (just not so obvious this time).  And if you prefer r:

R with actual System
0.638 Chone
0.624 Avg Projection
0.623 ZiPS

0.590 Marcel
0.583 Fantistics

0.568 Sporting News
0.567 Steamer
0.564 PECOTA

0.401 2008

The usual story.  Chone does great, Marcel does average, PECOTA is in last place.

And, if you want the best equation to project 2009 OPS:
ActualOPS = 0.716*Chone + 0.199*ZiPS + 0.081 (R² = 0.414)
Although this equation doesn’t do much better than simply using Chone.
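To see what the OPS equation does with a given pair of forecasts, here it is as a small Python function. The coefficients are the ones quoted above; the function name and the sample hitter are made up for illustration.

```python
def blended_ops(chone_ops, zips_ops):
    """Apply the quoted best-fit equation for actual 2009 OPS."""
    return 0.716 * chone_ops + 0.199 * zips_ops + 0.081


# Hypothetical hitter: Chone says .800, ZiPS says .780.
estimate = blended_ops(0.800, 0.780)
print(f"{estimate:.3f}")
```

For this made-up hitter the blend lands at about .809. Chone carries more than three times the weight of ZiPS, which fits the remark that the equation does little better than Chone alone.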

So, if Fantistics was only in the middle of the pack in projecting OPS, how did it dominate SGP’s?  Ok, so Chone and ZiPS aren’t really trying to project playing time and, despite their excellence in projecting hitter quality (as evidenced by OPS), don’t do well here.  Pecota doesn’t try to project playing time in their weighted mean forecasts but does in their depth charts (used by Steamer).  The community forecasts that Marcel used do reasonably well, but not as well as the fantasy baseball gurus in projecting playing time.  Limiting this to the systems that try to project playing time (and using their proper names this time) we have:

R with actual System
0.721 Fantistics

0.694 Sporting News

0.666 Pecota Depth Charts for Fantasy
0.657 Community Forecasts

It looks to me like we found our secret recipe: Chone/ZiPS forecasts for rate, with Fantistics for playing time.  What I’d like for the professor and his kids to do is to run their study that uses Chone/ZiPS for rates and Fantistics for playing time.
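One way to read that recipe, sketched in Python: take a per-PA rate from the rate systems and multiply it by the playing time forecast. Averaging the Chone and ZiPS rates with equal weight is purely an assumption here, as are the function name and the sample numbers; none of it comes from the study.

```python
def combined_counting_stat(chone_rate, zips_rate, projected_pa):
    """Counting-stat forecast from two rate forecasts and one PA forecast.

    Assumes an equal-weight average of the two rates, which is an
    illustrative choice and not a tested one.
    """
    rate = (chone_rate + zips_rate) / 2
    return rate * projected_pa


# Hypothetical hitter: HR per PA of .040 (Chone) and .050 (ZiPS),
# with a playing time forecast of 600 PA.
projected_hr = combined_counting_stat(0.040, 0.050, 600)
print(round(projected_hr, 1))
```

The same function works for any stat that can be expressed per PA. Ratio categories like batting average would need the playing time applied to both numerator and denominator instead.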


#1    J. Cross    2010/02/08 (Mon) @ 12:34

Good idea to look at the Chone/ZiPS + Fantistics super system.  I think that’s what I’d use for a 5x5 league this year.

To be fair we should also try Pecota with their depth chart playing time instead of their weighted mean playing time.  While they didn’t do that well projecting hitter quality, it’s really their SGP’s which were especially brutal, and I think that was in large part due to the playing time projections.


#2    Tangotiger    2010/02/08 (Mon) @ 13:05

I think for PECOTA, since they actually publish forecasts that are explicitly intended to be used as-is, then what you did was totally fair game.

If you wanted to include an additional PECOTA data point, that would also be fair.

***

ZiPS and Chone explicitly said they don’t really forecast playing time.  Indeed, ZiPS, in the Forecaster’s Challenge, used the same playing time forecasts that Marcel and MGL used: community.  And Chone will be using some form of community playing time forecasts as well.

Seeing the results, it certainly seems that the #1 thing to get right is the playing time forecasts.

So, it will be interesting to see how the Fangraphs Community does in 2010.


#3    2010/02/08 (Mon) @ 13:33

(sarcasm) If we exclude the Matt Wieters for MVP forecast, would that help PECOTA? (/sarcasm)

Did PECOTA have any systematic bias?  Or were they all over the map?


#4    J. Cross    2010/02/08 (Mon) @ 14:05

Heh, actually Wieters was left out completely b/c this only included guys with a Marcel reliability > 0.

It did occur to me that we could give guys without Marcel projections the default projection and go from there.  This might have sunk the systems that didn’t project very many guys, however, despite the fact that they projected enough players for most fantasy drafts.  Someone could do just fine in a fantasy league without drafting any rookies.

For pitchers I’m thinking we should project anyone who is projected by marcel (reliability > 0) PLUS anyone who had either 10 game starts or 10 saves.  This would “punish” systems that failed to project players of fantasy significance while not being overly harsh.

What do you think?


#5    Tangotiger    2010/02/08 (Mon) @ 14:42

Jared: excellent point for the pure rookies.  Yes, Marcel did NOT include in its draft list guys like Elvis Andrus, and neither did Steamer.  And those two systems did GREAT in the Forecaster’s Challenge.

So, yeah, I agree, and it’s a great point.

That said, it would still be a bit more fair to also include the pure rookies, just to head off the complaints, as minor as they’ll end up being.

It’s an interesting point to select players, after the fact, of ONLY the guys who did do well.  It would be totally wrong of course to include in a sample only players who you’ve seen the results on.  It would give those systems a leg-up.

Then again, you are weighting the forecasts based on their actual playing time.

I’d say report it as two separate things: one without the pure-rookies, and one that includes pure-rookies.


#6    Rally    2010/02/08 (Mon) @ 17:22

The best way to describe my playing time forecasts is an estimation based on recent playing time data of how much the player is capable of playing.  I do not specify that such playing time will be in MLB.  I think it’s reasonable that Peter Bourjos gets 500 PA or so this year, but 90% of them will be in Salt Lake City, and maybe he’ll get a September callup to the big A.

Thank you J Cross and students for doing this analysis.  I see where the strengths of my system lie, and that is player ability.  Playing time projections take a human touch, and while I attempt to do that (unpublished) for my team W-L projections, I don’t think they are publication quality.  It’s tough for one person to keep up with 30 teams that way; a team approach to depth charts seems like the best way to go.

Jared, let me know if you guys want an updated spreadsheet anytime to combine with your PT projections.  I think a good collaboration could be found here, and if you guys are willing I could create a special set of webpages with your PT and my rates together.  And if you anoint closers for every team, maybe I can finally have saves on my site as well.

email for me is rally monkey (numeral five) at comcast dot net.


#7    2010/02/08 (Mon) @ 19:57

Thank you very much! This is exactly what I’ve been looking for.


#8    J. Cross    2010/02/08 (Mon) @ 22:06

Rally, we haven’t tried to make playing time projection yet although I think we will.  I’d definitely be game for combining them with Chone projections once we do.  They might not be any good though.

Last year, our spreadsheet predicted save opps for each team and then Peter simply spread those save opps among the team’s relievers based on his baseball expertise and we based sv% on ERA.  Not sure that our “system” will have done so well but we’ll see.

After this year, we’ll be able to see if there are meaningful biases in the fangraphs fan forecasts.  I’m guessing that they predict relatively too many at bats for older players.  Every system other than fantistics had a statistically significant age bias.  It seems like the human forecasters (the “fans” community, the prospectus depth chart guru and the sporting news guru) project too many at bats for older players while Zips, Pecota and Chone projected too many at bats for younger players.


#9    Zach    2010/02/08 (Mon) @ 22:59

When I projected saves last year, I found team save opps by dividing projected team wins by two (a surprisingly accurate shortcut), and then handed out saves based on ERA--i.e., a closer with a 2.50 ERA got 85% of the team’s total save opps, for instance. I chose the closer for each team, and the remaining slots were sorted based on ERA.

An 80 win team with 40 save opps, with the following relievers and their save %’s (picked out of thin air but lower ERA = higher %):

2.50, 85% ... 85% of 40 save opps is 34 saves
3.00, 80% ... 80% of 6 remaining save opps is 4.8 saves
3.50, 75% ... 75% of 1.2 remaining save opps is 0.9, or about 1 save

I have no idea if this method is accurate (especially the save percentages based on ERA), but it passes the eye test.
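Here is that allocation as a short Python sketch, using the same 80-win example. The function name is made up, and the shares are the thin-air percentages from the example, not fitted values.

```python
def allocate_saves(team_wins, shares):
    """Hand out saves in order, each reliever taking a share of what is left.

    team_wins / 2 is the save-opportunity shortcut from the comment;
    shares are listed from the closer down, sorted by ERA.
    """
    remaining = team_wins / 2
    saves = []
    for share in shares:
        taken = share * remaining
        saves.append(taken)
        remaining -= taken
    return saves


# 80-win team, relievers with 2.50, 3.00 and 3.50 ERAs.
saves = allocate_saves(80, [0.85, 0.80, 0.75])
print([round(s, 1) for s in saves])
```

This reproduces 34 and 4.8 saves for the first two relievers and gives 0.9 for the third. Because each reliever takes a share of what is left, the total can never exceed the team's save opps.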


#10    Zach    2010/02/08 (Mon) @ 23:01

Just to make it clear, those percentages are projected save opps / team save opps (or remaining save opps). That method can’t predict BS or individual save percentage.


#11    2010/02/09 (Tue) @ 00:41

Jared, I would appreciate if you would be able to evaluate my Oliver projections. Click on my name for my email address and I can send you a data file.


#12    J. Cross    2010/02/12 (Fri) @ 19:08

Update:

Oliver also beats Marcel

R w/ actual:

R with actual System
0.638 Chone
0.624 Avg Projection
0.623 ZiPS
0.607 Oliver
0.590 Marcel
0.583 Fantistics
0.568 Sporting News
0.567 Steamer
0.564 PECOTA

0.401 2008


#13    Colin Wyers    2010/03/02 (Tue) @ 03:04

J. Cross, can you clarify how you’re figuring RMSE for OPS?


#14    Colin Wyers    2010/03/02 (Tue) @ 16:31

Bumping.

Because I literally have no idea what an RMSE of 1.66 means in the context of OPS. What are the units on that?


#15    J. Cross    2010/03/02 (Tue) @ 19:17

Colin, if those numbers look funny, it’s because it’s the RMSE of OPS weighted by PA.  If you divide it by the square root of the mean (harmonic mean?) PA of the population you should get a number that looks like what you might expect for the RMSE for OPS (but with each error weighted by the number of PA).  I probably should have done this or just given RMSE for the OPS not weighted by PA.

Looks like the mean PA was 384 and here’s what I probably should have given:

RMSE for OPS (unweighted)

‘08: 0.109
chone: 0.093
steamer: 0.097
marcel: 0.096
pecota: 0.098
ZiPS: 0.094
sn: 0.097
fantistics: 0.098

If I take the numbers in my write up and divide by sqrt(384) I should get numbers slightly lower than these because the projections were more accurate for guys with more PA.

make more sense?
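A sketch of that rescaling in Python, for anyone checking the arithmetic. It assumes the write-up figure was computed as the square root of the average of PA times squared error, which is one reading of the description above and not confirmed; the function names and the three sample hitters are invented.

```python
import math


def pa_scaled_rmse(errors, pa):
    """sqrt(mean(PA * error^2)): the assumed form of the write-up figures."""
    n = len(errors)
    return math.sqrt(sum(w * e ** 2 for e, w in zip(errors, pa)) / n)


def pa_weighted_rmse(errors, pa):
    """RMSE with each squared error weighted by that hitter's PA."""
    return math.sqrt(sum(w * e ** 2 for e, w in zip(errors, pa)) / sum(pa))


# Hypothetical OPS errors (projected minus actual) and PA for three hitters.
errors = [0.050, -0.120, 0.030]
pa = [600, 150, 400]

mean_pa = sum(pa) / len(pa)
rescaled = pa_scaled_rmse(errors, pa) / math.sqrt(mean_pa)
weighted = pa_weighted_rmse(errors, pa)
print(f"{rescaled:.4f} {weighted:.4f}")

# Applying the same division to Chone's 1.55 with a mean PA of 384:
chone_rescaled = 1.55 / math.sqrt(384)
print(f"{chone_rescaled:.4f}")
```

Under that assumption, dividing by the square root of the arithmetic mean PA gives exactly the PA-weighted RMSE. For Chone that works out to about 0.079, a bit below the unweighted 0.093, which is the direction described above.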

btw, I envy you getting to play around with the inner workings of pecota.  That actually sounds like a lot of fun.

