THE BOOK--Playing The Percentages In Baseball


Friday, February 12, 2010

The Marcels takes on the field

By Tangotiger, 11:19 AM

Jared Cross and his students provided me with the raw data of their forecast evaluations.  It contains a list of 438 players who had a forecast in every one of the evaluated systems, and who had at least 50 PA in 2009.

Jared and his team also normalized all the forecasted OPS so that the simple group average matched the actual simple group average of 2009.  For example, Aaron Hill shows a Marcel forecast of .747, compared to the simple group average of .760, or -.013 relative to the group average.  Hill’s actual OPS was .829 against the simple group average of .733, or +.096.  Marcel was therefore off by .109 in OPS.
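The Aaron Hill arithmetic can be sketched in a few lines of Python. This is only an illustration of the step described above, not Jared's actual code; the function name `relative_error` and its parameters are made up for the example.

```python
def relative_error(forecast, forecast_group_avg, actual, actual_group_avg):
    """Gap between forecast and actual OPS, each taken relative to its group average.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    forecast_rel = forecast - forecast_group_avg  # -.013 for Hill under Marcel
    actual_rel = actual - actual_group_avg        # +.096 for Hill in 2009
    return abs(forecast_rel - actual_rel)


# Aaron Hill, Marcel forecast, using the figures quoted in the post.
hill_error = relative_error(0.747, 0.760, 0.829, 0.733)
print(round(hill_error, 3))  # 0.109
```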

Repeating this step for all the systems, this is how far off each system was on Aaron Hill:
0.058 Fantistics

0.090 Chone
0.109 Marcel
0.110 ZiPS
0.110 Steamer
0.112 Sporting News

0.135 PECOTA
0.146 2008

Relative to Marcel, we see that Fantistics did much better, and that PECOTA (and just taking Hill’s 2008 performance as the forecast) did much worse.  The other systems were all within .020 OPS(*) of Marcel’s deviation, and therefore were basically similar.  I give a win to Fantistics, a loss to PECOTA and 2008, and a tie to the other systems, relative to Marcel.

* Twenty OPS points represents about 5 runs in a full season.  I reason that if someone forecasts 87 RBI and someone else forecasts 98, and the actual is 90, then that’s the point where you go from a tie to a win/loss.
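As a rough sketch, the win/loss/tie call for Aaron Hill could be coded like this. The function name `outcome_vs_baseline` is hypothetical, and treating a gap of exactly 20 points as a tie is my assumption; the post only says "within .020".

```python
TIE_BAND = 0.020  # 20 OPS points, the tie threshold used in the post


def outcome_vs_baseline(system_error, baseline_error, tie_band=TIE_BAND):
    """Return 'W', 'L' or 'T' for a system, judged against the baseline's error.

    Hypothetical helper. A gap of exactly tie_band counts as a tie (assumption).
    """
    # Positive gap: the system was closer to the actual OPS than the baseline.
    gap = round(baseline_error - system_error, 3)
    if gap > tie_band:
        return "W"
    if gap < -tie_band:
        return "L"
    return "T"


# How far off each system was on Aaron Hill, from the list above.
hill_errors = {
    "Fantistics": 0.058,
    "Chone": 0.090,
    "ZiPS": 0.110,
    "Steamer": 0.110,
    "Sporting News": 0.112,
    "PECOTA": 0.135,
    "2008": 0.146,
}
marcel_error = 0.109

hill_outcomes = {
    name: outcome_vs_baseline(err, marcel_error)
    for name, err in hill_errors.items()
}
for name, result in hill_outcomes.items():
    print(name, result)
```

With these numbers, only Fantistics comes out as a win, PECOTA and 2008 as losses, and everything else as a tie, matching the calls in the text.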

I repeated this step for all 438 players.  Marcel v Chone gives this tally:
Marcel v Chone
W: 93
L: 107
T: 238
Players: 438

So, of the 438 players in the dataset, Marcel made a clearly better prediction than Chone 93 times.  That’s 93 wins for Marcel.  Marcel lost out to Chone on 107 players.  That’s 107 losses for Marcel.  And for 238 players, it was too close to call, as the forecast-differentials were within 20 OPS points of each other.  That is, 54% of the time, it doesn’t matter which forecasting system you choose between Chone and Marcel.  With Chone being right 107 times and being wrong 93 times, that means that Chone is just 14 players ahead of Marcel.  That is, only for about 3% of the players does choosing Chone add real value.

Giving 2 points for a win, 1 point for a tie, and 0 points for a loss, the average win percentage for Chone v Marcel is .516.  That is, we can say that Chone will have the better forecast 51.6% of the time.  Chone is actually the BEST forecasting system against Marcel.  Here are the totals for all the systems, based on the data provided to me:

Win% v Marcel
0.516 Chone
0.500 Marcel
0.500 ZiPS
0.487 Sporting News
0.487 Fantistics
0.482 Steamer
0.455 PECOTA
0.377 2008

As you can see, Marcel had an outstanding year. 
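The 2-1-0 scoring behind the Win% column is simple enough to sketch. The function name `win_pct` is just for illustration; the tallies plugged in are the ones quoted in this post.

```python
def win_pct(wins, losses, ties):
    """Win percentage with 2 points per win, 1 per tie, 0 per loss.

    Hypothetical helper; equivalent to (wins + ties / 2) / games.
    """
    points = 2 * wins + ties
    max_points = 2 * (wins + losses + ties)
    return points / max_points


# Chone v Marcel in 2009: Chone won 107, lost 93, tied 238.
chone_vs_marcel = win_pct(wins=107, losses=93, ties=238)
print(round(chone_vs_marcel, 3))  # 0.516

# The same tally seen from Marcel's side.
marcel_vs_chone = win_pct(wins=93, losses=107, ties=238)
print(round(marcel_vs_chone, 3))  # 0.484
```

The two sides always sum to 1.000, which is why Marcel sits at exactly .500 in a table measured against itself.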

What if I change the “20” to “10” OPS points in terms of establishing the tie?  Things change slightly:

0.516 Chone
0.509 Fantistics
0.500 Marcel
0.495 ZiPS
0.491 Steamer
0.478 Sporting News
0.460 PECOTA
0.364 2008

Fantistics takes a step up, while the others pretty much stay in their spots. 

If you have to go with one forecasting system for 2009 (for hitters), Chone would be the one.

I did a similar process in 2008:
http://www.insidethebook.com/ee/index.php/site/comments/evaluating_the_2008_forecasting_systems/#14
In that case, I showed these results for hitters:
win% system W L T
0.530 Chone 234 174 579
0.514 ZiPS 235 207 542
0.482 PECOTA 212 247 528
0.473 THT 204 257 526

In terms of the big 3 (Chone, ZiPS, PECOTA), it was the same order as in 2009.

And in terms of overall (hitters and pitchers), I said this in the same thread for the 2008 systems:

Adding up the hitters and the pitchers, and we get:
win% system W L T
0.522 Chone 418 339 1076
0.502 PECOTA 413 405 1015
0.501 ZiPS 433 431 969
0.476 THT 374 463 996

Chone, among these 4, is the clear leader.  I hope no one tries to tout their system as the best, other than maybe Chone.

Matt also studied the forecasting systems for the 2007/08 seasons:
http://statspeak.net/2009/04/testing-the-projection-systems-strengths-and-weaknesses.html
(link broken) and said this:

--CHONE was the best at projecting most things.

Therefore, once a system is able to beat Marcel, Chone is the next system up the ladder that a forecasting system needs to challenge.  Chone is the heavyweight.


#1    J. Cross      (see all posts) 2010/02/12 (Fri) @ 11:58

Good stuff.  Yes we will run the same thing for pitchers.

My bold prediction: 2010 Steamer will beat Marcel.


#2    Matt K. (d_f)      (see all posts) 2010/02/12 (Fri) @ 12:56

No surprises here, but congratulations, Rally, and nice work, Jared, et al.


#3          (see all posts) 2010/02/12 (Fri) @ 13:29

Aren’t the 50-PA guys mostly noise, even if the difference is 20 points?

What if you grouped anyone with less than 100 PAs, in alphabetical order, until you had 400—then, compare the groups to each other? 

Perhaps better, use a “tie” criterion that depends on the number of PA.  Something like (sqr(PA)/100) gives .20 for 400 PA.

That would be only a small improvement over what you have now, but just thinking out loud ...


#4    Matt      (see all posts) 2010/02/12 (Fri) @ 13:35

Must be my misunderstanding, but this IS surprising to me, since CHONE did so poorly in the forecaster’s challenge: http://tangotiger.net/forecast/prosjoes.html (for example). What’s the difference between these two results?


#5    Rally      (see all posts) 2010/02/12 (Fri) @ 13:45

Playing time is key for the forecasting challenge, and I didn’t do so well with that. 

I did manual ranking of players but I wonder if I would have done better just going with straight off the spreadsheet rankings.  It was set up to value players like a traditional 5x5 league, so stolen bases are an important category.  There are a lot of speedy guys in the minors who if allowed to play in MLB will add 5x5 value.  There are few, if any, .350/.500 sluggers stuck in the minors.  So if your projections are like mine and show 400 AB for minor leaguers you really have to sort out those speed guys.  I did some of that, but I guessed wrong on Josh Anderson and probably had him on a lot of my teams.  At the start of the season I thought he had a good shot to play CF for the Braves, and if he had he would have had Willy Taveras-like fantasy value.  He wound up getting released and played for a few teams as a backup.

For this upcoming year I’m not going to try and get MLB playing time right, I’ll be using the fan playing time forecasts.


#6    Steve Sommer      (see all posts) 2010/02/12 (Fri) @ 13:46

Matt #4, it might have to do with playing time projections.  I think the consensus was that Rally did rate stats the best but some version of community playing time was best… I could be way off too though as I’m going from memory.


#7    Steve Sommer      (see all posts) 2010/02/12 (Fri) @ 13:47

Or I could have waited a couple minutes for Rally to answer himself :)


#8    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 13:57

In the head-to-head contest, Rally did as well as Marcel:

http://www.tangotiger.net/forecast/index.html

Remember, when you do Pros v Joes you are ending up with only 5% of the players.  So, you really gotta nail the guys you really want.  Or get very lucky.

Marcel, the standard-bearer, did not have a good Pros v Joes, but it was excellent in head-to-head.  That’s because in h2h, he gets 50% of the players.  So, in that case, he just has to play it cool and smart.


#9    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 14:00

Phil: how about this in terms of “wins”: anyone who actually had at least 400 PA counts as “1 game”, and then everyone else is relative to that.  So, if you win with a guy with 200 PA, that’s 0.5 wins and 0.5 games.  If you lose with a guy with 80 PA, that’s 0.2 losses and 0.2 games.
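A quick sketch of that weighting, in Python. The names `game_weight` and `weighted_win_pct` are made up for the example, and counting a tie as half a win is an assumption carried over from the 2-1-0 scoring in the post.

```python
def game_weight(pa, full_season_pa=400):
    """Fraction of a 'game' a player counts for: 1.0 at 400+ PA, PA/400 below that."""
    return min(pa, full_season_pa) / full_season_pa


def weighted_win_pct(results):
    """results: iterable of (outcome, pa) pairs, with outcome in 'W', 'L', 'T'.

    Hypothetical helper. Ties count as half a win (assumption).
    """
    wins = losses = ties = 0.0
    for outcome, pa in results:
        weight = game_weight(pa)
        if outcome == "W":
            wins += weight
        elif outcome == "L":
            losses += weight
        else:
            ties += weight
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games


# A win with a 200 PA guy, a loss with an 80 PA guy, a tie with a 650 PA guy.
sample = [("W", 200), ("L", 80), ("T", 650)]
print(game_weight(200), game_weight(80), game_weight(650))  # 0.5 0.2 1.0
print(round(weighted_win_pct(sample), 3))
```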


#10          (see all posts) 2010/02/12 (Fri) @ 14:03

Tango,
Will we see the same comparison above for pitchers?


#11    Tangotiger      (see all posts) 2010/02/12 (Fri) @ 14:07

Using Phil’s scoring scheme in Tango/9, I get:

0.532 Chone

0.504 ZiPS
0.500 Marcel
0.498 Fantistics
0.486 Sporting News
0.478 Steamer

0.457 PECOTA
0.397 2008

Same jockeying for position around Marcel, while Chone stays on top and PECOTA/2008 stay on the bottom.


#12    Matt      (see all posts) 2010/02/12 (Fri) @ 14:09

Got it, thanks for the explanations. I also see that Tango basically said as much in the last paragraph of the pros-joes article.


#13          (see all posts) 2010/02/12 (Fri) @ 16:36

Thanks for the update, Tango ... it does make the standings scramble up a bit.

As an aside, I’m not big on forecasting contests that force you to predict playing time.  Trying to figure out what a manager is going to do (in terms of who he plays) or what a GM is going to do (in who he trades for and who thus becomes a bench player) is quite a different task than trying to figure out a player’s talent level.  If you rate forecasters on some kind of ability to predict both at once, you don’t really know what you’re measuring.


#14          (see all posts) 2010/02/13 (Sat) @ 01:07

Really cool stuff I think.

I agree playing time is really not something we should be trying to predict.

It would be really interesting to run this sort of analysis on specific player skills.  It’s likely that all these projection systems at least do something good.  Maybe PECOTA does really badly at projecting HR/FB but it does a great job at BABIP.  I’d claim that any projection system that does better than Marcels at any specific stat has something to add to our collective knowledge about players.

If we were interested in really having the best pitcher projections, we could do this sort of analysis on all the fundamental pitcher skills (K/9, BB/9, HR/FB, etc).  We could take the winner or combination of winners from each category and use the winners’ projections to develop a hybrid projection system.  We could even see if one of the projection systems does a good job of predicting deviation of a pitcher’s RA or ERA from their FIP, xFIP and tRA (if you adjust the pitcher’s RA or ERA for park and defense).

This hybrid projection system would hopefully be awesome; however, it would probably only be a marginal improvement.  It would be really powerful (and lucky) if this sort of analysis could be done on 2009 data to make a hybrid projection system that you could use for future data where, for example, you use CHONE to predict a player’s walk rate and ZiPS to predict the player’s ISO.

This would take a lot of work I’m guessing but could be cool.


#15    Tangotiger      (see all posts) 2010/02/13 (Sat) @ 01:40

Bryan: I don’t know.  They should ALL be at least as good as Marcel, right?

If they are good at predicting BABIP, but overall are worse than Marcel, then that must mean they undid something Marcel already knew, right?

I think only Chone, and maybe ZiPS, has learned from Marcel.  My money is on Rally who actually accepted the Marcel findings, and then value-added.


#16          (see all posts) 2010/02/13 (Sat) @ 02:07

Yeah I understand what you are saying.  They all should definitely be better than Marcels.  I guess my idea more revolved around making a composite projection system from stealing what works well from each projection system.

I guess what I pictured was that I have two people in the room.  One of them is really good at projecting the number of touchdowns football teams will score.  The other one is really good with field goals.  The predictions that these people will come up with may be better than using the average point totals the teams score but overall might be pretty bad.  If you let these two people combine their powers you have something much better.


#17    mulkowsky      (see all posts) 2010/02/26 (Fri) @ 22:09

Jared, great stuff!  Those of us with early fantasy drafts are waiting with bated breath for your pitchers 2009 analysis.  Thanks!


#18    J. Cross      (see all posts) 2010/03/06 (Sat) @ 14:27

Tango, I’m taking your advice on a more reader friendly scoring system.  Here’s my scoring system used for hitter OPS (and explained).  If it seems better than just giving R or RMSE, I’ll use it for the pitcher forecast evaluations (which should be done in the next couple of days).  I added in Oliver.

http://sites.google.com/site/steamerprojections/2009-ops-forecast-standings

