THE BOOK--Playing The Percentages In Baseball

Thursday, October 04, 2007

Forecast Evaluations

By Tangotiger, 11:37 AM

Thankfully, there’s always somebody one step ahead of me, which saves me a lot of time in doing stuff.  This time, it’s Nate Silver evaluating various forecasts, including Marcel.  He gives you a few different ways to evaluate things (which is good), but doesn’t give you the best one.  Let me explain:


He runs through the standard ones, which is good that he does so: correlation, average error, RMSE.  That’s exactly how I would have started it.  But, I’m going to tell you why each of those is wrong.

***

Correlation: the standard equation is y=mx+b.  That’s what we learned in high school.  You have your sample points (x, or the forecast), and you have a slope (m) and intercept (b) to best-fit it to the actual result (y, or OPS).  Let me make this clear: in what we are trying to do here, evaluate the forecasts, m MUST be 1.  It must!  If you were to chop in half every single forecast that Marcel made, do you know what would happen to its correlation?  Nothing!  The above equation would compensate by doubling the slope. 
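
A minimal sketch of the point about the slope, using made-up OPS values (the five players and both helper functions are illustrative assumptions, not anything from Nate's study). Halving every forecast leaves the correlation untouched, while the average error balloons:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_abs_error(forecasts, actuals):
    """Average absolute difference between forecast and actual."""
    return sum(abs(a - f) for f, a in zip(forecasts, actuals)) / len(actuals)

# Made-up OPS values, for illustration only.
forecast = [0.700, 0.750, 0.800, 0.850, 0.900]
actual = [0.690, 0.780, 0.770, 0.880, 0.910]
halved = [f / 2 for f in forecast]  # chop every forecast in half

print("r, original forecasts:", round(pearson_r(forecast, actual), 6))
print("r, halved forecasts:  ", round(pearson_r(halved, actual), 6))
print("avg error, original:  ", round(mean_abs_error(forecast, actual), 3))
print("avg error, halved:    ", round(mean_abs_error(halved, actual), 3))
```

The two correlations print as the same number because the fitted slope silently doubles to compensate, which is exactly why m has to be pinned at 1.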

With m firmly set to 1, all that is left is the intercept.  And that is simply a league corrector.  HOWEVER, it must be set PRIOR to seeing the sample data.  Once again, imagine that the players that make up the sample pool are NOT representative of the entire forecasted pool of players (and they are not).  What happens?  You are applying a league corrector to force the mean of your limited sample forecast to the mean of the same players’ actual totals.  What you should do is get the entire set of player forecasts and infer the league OPS that the forecast presumed.  And it is THAT value that makes up the intercept (b).

That is, after all is said and done, you have m=1 and b=lgOPS minus forecasterLgOPS.  Once you have that, all that is left is to compare the actual OPS to the forecast OPS + forecasted “b”.  You then have two choices: the average of the absolute differences, or the RMSE (square root of the mean of the squared differences).  I prefer the former, because salary and fantasy dollars are linearly correlated to production.
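
As a rough sketch of that recipe, assuming the forecaster's presumed league OPS has already been inferred from its full set of forecasts (the function name, argument names, and sample numbers below are made up for illustration):

```python
import math

def league_adjusted_errors(forecasts, actuals, actual_lg_ops, forecast_lg_ops):
    """Average absolute error and RMSE with slope fixed at 1.

    The intercept b = actual league OPS minus the league OPS the forecaster
    presumed; it is set before looking at the sample, not fitted to it.
    """
    b = actual_lg_ops - forecast_lg_ops
    diffs = [a - (f + b) for f, a in zip(forecasts, actuals)]
    avg_abs = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return avg_abs, rmse

# Made-up sample: three hitters, forecaster presumed a .750 league, actual was .760.
avg_abs, rmse = league_adjusted_errors(
    forecasts=[0.760, 0.800, 0.840],
    actuals=[0.780, 0.790, 0.900],
    actual_lg_ops=0.760,
    forecast_lg_ops=0.750,
)
print(round(avg_abs, 4), round(rmse, 4))
```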

***

Average error: this includes a benefit to the forecasting system that guesses the right league OPS (or runs per game).  Irrelevant.  All salary and Fantasy dollars are essentially indexed to the league average.  All you really care about is not OPS but OPS above league average.  It’s the same darn thing.  So, I’m glad that Nate did it, and I’m glad he pointed it out.  But, this must be completely discarded.

RMSE: again, it’s the exact same issue as average error.

***

In short, RMSE and Average error ARE the correct ways to do it.  But, only after applying the correction noted above.  That said, the correlation would likely come out with a slope (m) of 1 for each system.  And it would probably come out with the same “b” for each system, whether it used the sample of players, or the population of players.

When I finally get around to doing my evaluations (of some of those, plus BIS, and the Community Forecasts), you’ll see all this.

***

Finally, great job about throwing all the systems into the pot, and seeing which ones are “most unique”.  Ideally, Marcel should have been a “0” here, since it has no uniqueness whatsoever.  Every system should be building on it.  That it doesn’t means that some systems are not learning.

However, with so many similar forecasting systems, a couple that are very similar could wipe each other out.  For example, say that THT and Marcel are both very unique.  And, if you had just one of them, the uniqueness coefficient would stand out.  But because both are there, one, or both, gets knocked down in the correlation.  What would be better (and longer) is doing head-to-head, and seeing which one explains the actual OPS more.  In this case, you have to be careful about the slope and intercept, as explained above.

Finally, and this should be very easy for Nate: what is the correlation of ALL the forecasting systems against OPS? 

Double-finally: how about correlation if you only look at last year’s stats? This was a very bad year for forecasting systems, as I explained in MGL’s earlier thread today.  I’m thinking you might get a number right around .59 with your lower threshold.

#1    Tangotiger      (see all posts) 2007/10/04 (Thu) @ 12:27

Ouch, Rally reminded me of something that you have to do as well, and this will remove the need for a minimum playing time: the absolute error, weighted by PA.

So
abs(actual minus adjustedForecast) * actualPA

sum all that for all players, and divide by the sum of actualPA.
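
A minimal sketch of that weighting, with made-up players (the tuple layout and function name are assumptions for illustration):

```python
def pa_weighted_abs_error(players):
    """players: iterable of (actual_ops, adjusted_forecast_ops, actual_pa).

    Each player's absolute error counts in proportion to his actual PA,
    so no minimum playing-time cutoff is needed.
    """
    total_error = 0.0
    total_pa = 0
    for actual, adjusted_forecast, actual_pa in players:
        total_error += abs(actual - adjusted_forecast) * actual_pa
        total_pa += actual_pa
    return total_error / total_pa

# Made-up sample: a regular, a part-timer, and a 20-PA cup of coffee.
sample = [
    (0.900, 0.850, 600),
    (0.700, 0.760, 300),
    (0.650, 0.750, 20),
]
print(round(pa_weighted_abs_error(sample), 4))
```

The 20-PA player has the biggest miss in OPS terms, but he barely moves the weighted figure.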

For guys that don’t have a forecast, you have two ways to do it.  One, give them an OPS of league average (that’s what Marcel does).

A second way, is to give them a low OPS, say 100 points below league average. This way, this kills something like Marcel, which it should!

***

Another thing, this doesn’t help for Fantasy forecasts, where playing time IS part of the forecast.  Therefore, what you actually want to do is:

1. (actual OPS minus league OPS) * actualPA
2. (forecast OPS minus forecast League OPS) * forecastPA
3. abs(1 minus 2)
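
The three steps above can be sketched as follows. The helper names and the sample numbers are hypothetical, chosen only to show the arithmetic:

```python
def production(ops, league_ops, pa):
    """OPS above league average, scaled by playing time."""
    return (ops - league_ops) * pa

def playing_time_error(actual_ops, actual_lg_ops, actual_pa,
                       forecast_ops, forecast_lg_ops, forecast_pa):
    """Step 3: absolute gap between actual (step 1) and forecast (step 2)."""
    actual = production(actual_ops, actual_lg_ops, actual_pa)
    forecast = production(forecast_ops, forecast_lg_ops, forecast_pa)
    return abs(actual - forecast)

# Made-up player: forecast .820 in 500 PA (league .750), actual .850 in 600 PA (league .760).
err = playing_time_error(0.850, 0.760, 600, 0.820, 0.750, 500)
print(round(err, 2))
```

Here the forecast is charged both for missing the rate and for missing the playing time.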

For guys who are missing, I think it’s more appropriate to do what Marcel does (gives 200 PA standard), and make his forecasted OPS 100 points under the league OPS.

I would also force the actual OPS at 100 points below the league average.  Why do this?  Because what we care about is the Fantasy dollars or salaries.  And, those have floors.  So, ideally, what we are trying to do is combine forecasted OPS and forecasted PA into dollar figures.

If it makes it easier for people to follow/understand that we should convert the OPS,PA figures into dollars first (which have an obvious floor of zero), then do that.


#2    Tangotiger      (see all posts) 2007/10/04 (Thu) @ 12:38

For example (all numbers are for illustration only), the conversion of OPS to dollars, ignoring position, could be:

(OPS above league+.250)*PA*0.10

So, a top guy this year would be close to $40.  An average player with 400 PA would get $10.  Feel free to tweak this as you see fit.

A standard baseline for a forecasted player for Marcel is league OPS with 200 PA.  So, the minimum Marcel forecast is $5.  That doesn’t seem fair at all to apply to guys that are not in the forecast pool.  The minimum forecast should be either $0 or $1.

So, I think this may be the best way to go…


#3    FrankM      (see all posts) 2007/10/04 (Thu) @ 15:06

Is OPS the best way to evaluate projections? You could be off 50 points each way in OBA and SA and end up “nailing” the OPS.


#4    Tangotiger      (see all posts) 2007/10/04 (Thu) @ 15:21

Of course OPS is not the best way.  Linear Weights is (for real baseball) and Fantasy Dollars is (for fake baseball). 

But, since we barely see any difference among the forecasters, I doubt you’ll see a difference between using OBP+SLG and 1.7*OBP+SLG.
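
To put a number on FrankM's scenario, here is a small sketch with a made-up hitter whose forecast misses by 50 points each way in OBP and SLG. The function names are illustrative; the 1.7 weight is the one quoted above.

```python
def ops(obp, slg):
    return obp + slg

def weighted_ops(obp, slg, obp_weight=1.7):
    return obp_weight * obp + slg

# Made-up hitter: forecast .350/.450, actual .400/.400 (off 50 points each way).
forecast = (0.350, 0.450)
actual = (0.400, 0.400)

ops_miss = abs(ops(*actual) - ops(*forecast))
weighted_miss = abs(weighted_ops(*actual) - weighted_ops(*forecast))
print(round(ops_miss, 3), round(weighted_miss, 3))
```

Plain OPS shows no miss at all, while the weighted version shows one of 0.7 times the 50-point OBP error, so the choice of metric only matters for forecasts that miss in offsetting directions like this.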


#5    Nate Silver      (see all posts) 2007/10/04 (Thu) @ 15:51

Tom,

If I adjust each forecasting system such that it nailed the league average, the average errors change to the following:

PECOTA .060
Marcel .060
Chone .061
ZiPS .061
THT .063
ESPN .064
Rotowire .065
RotoTimes .066

And the RMSE’s are:

PECOTA .076
ZiPS .077
Chone .078
Marcel .078
THT .081
ESPN .082
RotoWire .083
RotoTimes .084

I agree that this is the proper way to do things.  Even as a purely academic exercise, there doesn’t seem to be much skill in predicting league averages; you pretty much can’t do any better than to just use the previous season’s numbers.  (Obviously, this might change if there is a change in the strike zone or a couple of big ballparks get cycled out for small ballparks or something like that).

It’s interesting that ZiPS “dominated” all forecasting systems but PECOTA in the regression exercise even though it isn’t very unique.  For example, if you stick just ZiPS and CHONE into a regression equation, then CHONE does not materially improve your R^2 versus what you were getting with just ZiPS.  Maybe the way to think of ZiPS is that it’s an exceptionally good “plain vanilla” forecasting system.  The philosophy is the same as something like CHONE, but because Dan’s a bright guy and has been working on his system for longer, the execution is a little better when Dan makes decisions about how to weight different seasons, or does his park factors, and so forth.

Also, with respect to the ESPN forecasts being unique, I’ve heard that those might have been done by Ron Shandler.  Ron Shandler always stood out from the other forecasters because he was willing to make manual adjustments to his forecasts after the fact; PECOTA on the other hand goes straight from my program to the retail shelf.  If Shandler is making “gut-feel” adjustments, that’s definitely proprietary to his system, and they seem to make a positive difference on balance. 

It’s interesting that all the forecasting systems are down from last year.  I don’t have the time to look at this now, but when I looked at it about six weeks ago (I was concerned that PECOTA was doing badly compared to last year; turned out it was but it wasn’t alone), I found that the year-to-year correlation in OPS was fairly normal this season.


#6    Tangotiger      (see all posts) 2007/10/04 (Thu) @ 16:11

Nate, thanks for all that.

Marcel having an average error of just .060 to lead the pack (with PECOTA) should pour a lot of cold water in everyone’s faces.  There’s nothing more plain vanilla than Marcel (and it’s the only one that’s completely Open Source).

Part of that success is because of the thresholds you use.  If a rookie has at least 250 PA, he did pretty well (say around league average).  And Marcel forecasts every rookie with around a league average OPS.  It’s one of the hidden tricks that lets Marcel hang in so well.

Believe me, I *want* to knock Marcel down, which is why I proposed the process in post #1 above.

If you can post the ESPN forecasts for a few players, like Tulo, Braun, someone here will be able to confirm if they are Shandler’s or not.


#7    Tangotiger      (see all posts) 2007/10/04 (Thu) @ 16:21

The formula for Marcel has not changed since I first introduced it:
http://www.tangotiger.net/archives/stud0346.shtml

(Note the age fix in posts 4 and 20.)

I also go through it step-by-step in post 25.

Pitching instructions are in post 28.


#8          (see all posts) 2007/10/04 (Thu) @ 16:49

From the ESPN fantasy baseball site:

Braun-no projection
Tulowitzki-0.781
S. Drew-0.760
B.J. Upton-0.843
C. Young-0.803
D. Young-0.828
Markakis-0.858
Kouzmanoff-0.801
Hart-0.798
Carlos Beltran-0.939
Matt Holliday-0.904
Brad Hawpe-0.938
Ichiro Suzuki-0.799
Alfonso Soriano-0.893
Carlos Lee-0.873
Magglio Ordonez-0.874
Grady Sizemore-0.93
Lance Berkman-1.011
Juan Pierre-0.718
Chase Utley-0.89
Brandon Phillips-0.741
Robinson Cano-0.773
Brian Roberts-0.788
Dan Uggla-0.797


#9    HarryAbles      (see all posts) 2007/10/04 (Thu) @ 17:01

Those aren’t from Shandler’s Forecaster.


#10    MGL      (see all posts) 2007/10/04 (Thu) @ 18:02

Can we just say that all of the Marcel-like projection systems by the “good” sabermetricians do equally well and that the “fantasy” ones are a step behind and that Pecota (which I like a lot) does AT LEAST as well as the former?

And not worry about anything else.

There are many other good and interesting ways to compare systems. One is to look at players with little statistical history only.  Or a lot of experience.  Another is to look at certain age groups of players.  Another is to look at players who were far off from the average forecast and see how each system did with them.  Another is to look at players who were forecast very differently among the systems and see how they did actually and which systems did the best. 

Just looking at how everyone did with regard to all players is not necessarily a good, interesting, or useful way to compare systems.  And it depends on what you want.  Do you want to “nail” most of the easy ones?  Do you want to do well on the ones that are difficult?  Etc.  Each system may also have its strengths and weaknesses with regard to that.


#11    Rally      (see all posts) 2007/10/04 (Thu) @ 22:14

What do we want?  Bragging rights.  And to beat that clever little monkey…


#12    Rally      (see all posts) 2007/10/04 (Thu) @ 22:48

Pitcher projection results here:

http://lanaheimangelfan.blogspot.com/2007/10/so-how-did-pitcher-projections-do.html


#13    Nate Silver      (see all posts) 2007/10/05 (Fri) @ 05:09

Rally,

I’m getting results that are a bit different than yours.  CHONE still does pretty well, though.  Not sure where the differences lie, since the methodologies sound very similar on the surface.  I’m using 4.75 (e.g. just slightly worse than league average) rather than 6.00 for missing pitchers, which could make a difference.  Also, a greater number of pitchers get excluded because I’m using more systems.  But there might also be version control issues.  FWIW, the versions I’m using are:

PECOTAs: Build labeled 04/04/07 (our start-of-season update) from my Hard Drive.
CHONE/Marcel/ZiPS: Downloaded from Fangraphs over past couple days.
THT: Version I got with the e-book I purchased on or about 3/15.
RotoWire/RotoTimes/ESPN: Downloaded from respective websites over past couple days.

MGL ... I agree with 99% of what you say in general and 100% of what you say in this thread.


#14    tangotiger      (see all posts) 2007/10/05 (Fri) @ 07:14

Rally,

For the “average error”, can you weight it by IP (this way, you don’t even need an IP cutoff)?


#15    Rally      (see all posts) 2007/10/05 (Fri) @ 09:20

The average error is in runs, so it’s already weighted by innings.  The formula is simple - ABS(era-proj_era) * ip/9.  I used a cutoff because of data validation issues and had to limit the time I spent on that.  Only Marcel has a common id, so I had to match up names.  There’s a reason other people didn’t use retroid: you can’t find one for minor leaguers.  THT was the worst offender, listing everyone’s proper name instead of what they are commonly known as.  Too many “Matthew”s when everyone in baseball goes by “Matt”.
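
A minimal sketch of that formula, averaged per pitcher (the function names and the two sample pitchers are made up; only the ABS(era-proj_era) * ip/9 expression comes from the comment above):

```python
def run_error(era, proj_era, ip):
    """Runs by which the projection missed: ABS(era - proj_era) * ip / 9."""
    return abs(era - proj_era) * ip / 9

def average_run_error(pitchers):
    """pitchers: iterable of (era, proj_era, ip). Mean run error per pitcher."""
    errors = [run_error(era, proj_era, ip) for era, proj_era, ip in pitchers]
    return sum(errors) / len(errors)

# Made-up sample: a 180-IP starter and a 90-IP swingman.
sample = [
    (3.50, 4.00, 180),  # missed by 0.50 ERA over 180 IP
    (5.00, 4.40, 90),   # missed by 0.60 ERA over 90 IP
]
print(round(average_run_error(sample), 2))
```

Because each miss is converted to runs before averaging, a given ERA miss on a 180-IP starter counts twice as much as the same miss on a 90-IP pitcher.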

Plus pitchers less than 50 IP are more likely to not have one or more projections, and I’d lose a lot of them anyway.

Nate, that’s what I’m using, except for PECOTA.  (You should be pleased to know I do not have access to your hard drive) I did not publicly update CHONE after fangraphs picked it up, so those are the projections in the test.  For PECOTA I’m going by the print copy.  I think that’s fair as that is about the same time I put my numbers on the web.  Out of over 300 pitchers in the sample, PECOTA took a “6” for only about 10 pitchers.


#16    tangotiger      (see all posts) 2007/10/05 (Fri) @ 09:52

Rally: ah, great, that’s exactly what you should have done.

Nate checks in as well for pitchers:
http://www.baseballprospectus.com/unfiltered/?p=569

(He has a typo for CHONE with this number: 1.804 needing to be 1.084.)

In any case, using Nate’s unweighted differences, it’s around 1.1 runs per 9 IP.  Presuming an average of 144 IP per pitcher, that makes his difference about 16 to 17 runs per pitcher.  Rally is reporting 9 to 10 runs.  (That’s almost definitely the result of Rally using proper weighting, which is why I highly recommend this method.)

In any case, Rally is reporting these numbers:
Chone 9.0
Pecota 9.2
THT 9.3
Marcel 9.4
Zips 9.5

If Marcel is off by 9.4 runs on average, and the best we can do is CHONE being off by 9.0 on average, and Marcel intentionally is brainless about rookies… well, you can see how there’s really not much that we can do here.

***

Rally, one last request: can you do a head-to-head with Chone and Marcel (and PECOTA and Marcel)?  Anyone that has at least a 4-run advantage gets a win and the other guy gets a loss.  Anything under 4 runs is a tie.  If “4” is too high (it implies a 0.50 ERA gap in 72 IP, and a 0.20 ERA gap in 180 IP), then choose 3.

I’d like to be able to say, in plain English: “Marcel’s record against Chone is 21-31, with 52 ties, for an overall win% of .452”, or something to that effect.
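
A sketch of how such a record could be tallied, assuming run errors are computed as ABS(era-proj_era) * ip/9 and that a tie counts as half a win in the win% (an assumption, though it does reproduce the .452 in the example sentence). The function names and sample pitchers are hypothetical:

```python
def head_to_head(pitchers, threshold=4.0):
    """pitchers: iterable of (actual_era, proj_a, proj_b, ip).

    Returns (wins, losses, ties) for system A against system B.
    A wins when its run error is smaller by at least `threshold` runs.
    """
    wins = losses = ties = 0
    for era, proj_a, proj_b, ip in pitchers:
        err_a = abs(era - proj_a) * ip / 9
        err_b = abs(era - proj_b) * ip / 9
        edge = err_b - err_a  # positive means A was closer
        if edge >= threshold:
            wins += 1
        elif edge <= -threshold:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties

def win_pct(wins, losses, ties):
    """Win percentage with each tie counted as half a win (assumed convention)."""
    return (wins + 0.5 * ties) / (wins + losses + ties)

# Made-up pitchers: one clear win for A, one clear loss, one tie.
sample = [
    (3.00, 3.50, 4.50, 180),
    (4.00, 4.80, 4.10, 72),
    (4.50, 4.40, 4.60, 60),
]
print(head_to_head(sample))
print(round(win_pct(21, 31, 52), 3))  # the 21-31 with 52 ties example
```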

***

I agree, EVERYONE should be using RetroID or BDBid.  I’m calling on ALL forecasters to do this.


#17    Rally      (see all posts) 2007/10/05 (Fri) @ 10:13

I’ll see what I can do to add a retroid in this year.  The problem is I don’t use that id to build the projections since I’m combining major and minor league stats.

I’ll try the head to head records tonight.


#18    tangotiger      (see all posts) 2007/10/05 (Fri) @ 10:20

As long as you use anything unique, that’s fine with me.  This way we can create a mapping table to use in the future.

***

Yes, the head-to-head is really the best way, since no one’s going to know what r=.68 or .65 really means in a real-world sense.  The question will always be “which one is better, and how much better”.  Saying that Chone was better 52% of the time than Marcel is straightforward… though it does mean it’ll be wrong 48% of the time (d’oh, nobody told me that).


#19    Nate Silver      (see all posts) 2007/10/05 (Fri) @ 10:33

Rally,

Philosophically, I’d probably raise an objection to going with the print copy of PECOTA.  Those projections are generated in November, which is before we’ve gotten to do things like update them to reflect each team’s projected defense based on their depth chart.  It’s also going to miss some players that looked unlikely to be relevant players on their team’s roster in the fall, but have improved their status by the spring.  For the other forecasters, I’m using what essentially boils down to the closest available copy as of the start of the season, so I think we’re being consistent at least; all else being equal, a projection system that is updated more frequently prior to the start of the season is an asset to the customer.


#20    Rally      (see all posts) 2007/10/05 (Fri) @ 10:44

Understood.

I’m just testing what I have - in PECOTA’s case the print version.


#21    tangotiger      (see all posts) 2007/10/05 (Fri) @ 10:50

I think Rally is justified in taking anything that is published.  He should call it “PECOTA/BP2007” to be explicit.

A reader can’t be influenced by the results of PECOTA/HardDrive or PECOTA/BP.comApr1.2007 in making his decision as to whether to buy the book.

***

Sticking on philosophical grounds, Marcel has an implicit forecast for all players to be league average, with 200 PA or 50 IP.  It’s in the actual formula that is posted.  Anyone can reproduce the results I generate.  I didn’t bother doing it for Dice-K, or any player in AA, AAA, Japan, or my pee wee league, because it’s a burden to do so.

However, I’m always looking for ways for other forecasters to beat Marcel.  If one way to do that is to give it an assumed forecast over an implicit forecast, that’s ok.


#22    Nate Silver      (see all posts) 2007/10/05 (Fri) @ 10:57

BTW, here are PECOTA’s head-to-head scores for the pitchers in response to Tango’s question.  This is using the version of the forecasts that recalibrates everyone to the correct league average.

PECOTA versus:
ZiPS 55.1% (n=287)
RotoTimes 53.7% (n=285)
ESPN 53.1% (n=285)
THT 52.3% (n=285)
CHONE 51.9% (n=287)
RotoWire 50.9% (n=287)
Marcel 50.3% (n=288)

So, not really a whole lot of advantage if you’re looking at all of the forecasts ... you could pretty much just flip a coin with Marcel. 

However, if we restrict those evaluations to cases in which there was at least a .50 difference between the forecasts, i.e. the pitcher was “difficult” to forecast for whatever reason, the numbers change thusly:

PECOTA versus:
RotoTimes 62.9% (n=62)
THT 62.0% (n=92)
ESPN 61.7% (n=81)
ZiPS 60.8% (n=74)
RotoWire 59.8% (n=92)
Marcel 59.5% (n=74)
CHONE 52.3% (n=44)

So now, PECOTA is in the 60% range against all systems except CHONE, which you can go back to flipping a coin with.
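
A rough sketch of this kind of restricted comparison. How exact ties were handled in the numbers above is not stated, so dropping them here is an assumption, and the rows are made up:

```python
def closer_rate(rows, min_gap=0.0):
    """rows: iterable of (actual, proj_a, proj_b).

    Keeps only rows where the two forecasts differ by at least `min_gap`,
    then returns (share of decided rows where A was closer, number decided).
    Rows where both systems miss by the same amount are dropped (assumption).
    """
    wins = decided = 0
    for actual, proj_a, proj_b in rows:
        if abs(proj_a - proj_b) < min_gap:
            continue
        err_a = abs(actual - proj_a)
        err_b = abs(actual - proj_b)
        if err_a == err_b:
            continue
        decided += 1
        if err_a < err_b:
            wins += 1
    rate = wins / decided if decided else None
    return rate, decided

# Made-up ERA rows: (actual, system A, system B).
rows = [
    (3.80, 4.00, 4.60),
    (4.90, 4.20, 4.80),
    (3.50, 3.90, 4.00),
    (4.20, 4.30, 4.25),
    (3.00, 3.20, 3.90),
]
print(closer_rate(rows))                # all pitchers
print(closer_rate(rows, min_gap=0.50))  # only where forecasts differ by >= .50
```

The restricted sample is much smaller, which is why the n values shrink so sharply in the second table.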


#23    Nate Silver      (see all posts) 2007/10/05 (Fri) @ 11:08

Tom,

All of these forecasts were available to our Fantasy and Premium subscribers.  We update things once a week or so through mid-June.  I reverted to the start-of-season update rather than using the version on our website because the latter is dated 6/15, which is clearly enough an unfair advantage since it can account for more team changes, etc.


#24    tangotiger      (see all posts) 2007/10/05 (Fri) @ 11:09

Great stuff Nate.  Looking at the PECOTA v Marcel, we can say: “25% of the time, PECOTA and Marcel did not agree on a forecast, and of those times, PECOTA won 60% of the time”.

Out of the 74 disagreements, PECOTA is 44-30.  On the other 214 pitchers, they pretty much agreed.  So, that’s the advantage for PECOTA: 14 pitchers.  That seems entirely reasonable to me.
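
The arithmetic behind those figures, as a short sketch (the helper name is made up; the inputs are Nate's 59.5% on n=74 out of 288 pitchers):

```python
def record_from_rate(n, win_rate):
    """Turn a win rate over n head-to-head cases into a (wins, losses) record."""
    wins = round(n * win_rate)
    return wins, n - wins

n_all = 288        # pitchers in the PECOTA vs. Marcel comparison
n_disagree = 74    # forecasts at least .50 apart

wins, losses = record_from_rate(n_disagree, 0.595)
print(wins, losses, wins - losses)        # record and net advantage
print(round(n_disagree / n_all, 3))       # share of pitchers with a disagreement
print(n_all - n_disagree)                 # pitchers where the systems roughly agreed
```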


#25    tangotiger      (see all posts) 2007/10/05 (Fri) @ 11:17

Nate, I don’t have any problem with whatever forecasts are used.  BP2007, BPpremium are clearly two different though related sources, and therefore, BOTH can be part of the analysis.  One does not preclude the other. 

A reader contemplating purchasing just one source (book or online) can make his determination based on the results of whatever the actual source is.

So, I don’t think we have any disagreement.  That said, I think the results will barely change regardless.


#26    Nate Silver      (see all posts) 2007/10/05 (Fri) @ 11:24

Tango, here are the same numbers for the hitters.

First, all hitters in the sample:

PECOTA versus:
RotoTimes 56.9% (n=295)
ESPN 53.2% (n=295)
ZiPS 53.2% (n=295)
RotoWire 52.9% (n=295)
THT 52.9% (n=295)
CHONE 51.9% (n=295)
Marcel 50.8% (n=295)

Now, for differences of >= 50 points of OPS between the respective forecasts.

PECOTA versus:
RotoWire 67.2% (n=61)
RotoTimes 62.3% (n=53)
Marcel 60.4% (n=48)
ESPN 55.8% (n=77)
THT 55.6% (n=54)
ZiPS 51.4% (n=37)
CHONE 51.3% (n=39)


#27          (see all posts) 2007/10/05 (Fri) @ 11:27

7/Tango:

I’m attempting to recreate Marcel in Access right now with the Lahman database (I could use the Access practice). I have a query with league totals for 2004-2006 (leagueBatting) and a query with the weighted player totals (Batting0406).

Two questions: how do I remove pitchers’ batting stats from the league totals? And how do I then add the correctly weighted league averages to the player totals?


#28    tangotiger      (see all posts) 2007/10/05 (Fri) @ 11:31

I ran a regression of Marcel OPS forecast, Marcel PA forecast, and Marcel Fantasy $ forecast (all published on my blog/site) for the 344 hitters in the Fantasy pool.  (Obviously, position forces this so that the correlation won’t reach 1.00)

In any case, correlation was r=.82 (forcing intercept at zero) with the following equation:
.12 * PA * (OPS-.800 + .200)

The .800 is the average OPS (weighted or not) of the 344 players forecast.

A guy not forecast (200 PA, .750 OPS) would come in at $3.60.  To make him worth $1, he’d have to be 150 PA with .650 OPS, or 100 PA with .700 OPS.

So, that’s my recommendation to do the correlation.  Use the above equation (changing “.800” to whatever the average is for the pool of players), and force $1 for any player not forecast.  This will put everyone in the same denomination (dollars).
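
A minimal sketch of that conversion. The coefficients are the ones from the equation above; the function name and the `has_forecast` flag are illustrative assumptions:

```python
def hitter_dollars(ops, pa, pool_avg_ops=0.800, has_forecast=True):
    """Fantasy dollars from forecast OPS and PA: .12 * PA * (OPS - pool avg + .200).

    Players with no forecast are forced to $1, per the recommendation above.
    """
    if not has_forecast:
        return 1.0
    return 0.12 * pa * (ops - pool_avg_ops + 0.200)

print(round(hitter_dollars(0.750, 200), 2))   # implicit 200 PA, .750 OPS line
print(round(hitter_dollars(0.650, 150), 2))   # one of the roughly-$1 lines
print(round(hitter_dollars(0.700, 100), 2))   # the other roughly-$1 line
print(hitter_dollars(0.0, 0, has_forecast=False))
```

The two "worth $1" lines come out a little under and a little over a dollar, so they are best read as round-number approximations.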

I’ll come back for the pitchers.


#29    tangotiger      (see all posts) 2007/10/05 (Fri) @ 11:39

Anthony, the SQL to figure a player’s primary position:

http://www.insidethebook.com/ee/index.php/site/comments/database_hacks#4

***

Nate: good stuff.  For hitters, PECOTA has a 10 player advantage over Marcel.  Over hitters and pitchers, that’s a 24 player advantage.  I wouldn’t be surprised if almost all of that was on guys with less than 2-years experience.


#30    tangotiger      (see all posts) 2007/10/05 (Fri) @ 11:53

To convert ERA into dollars (r=.91)

.10 * IP * (4.20-ERA + 0.70)

The “4.20” is the average of the 210 pitchers in my Fantasy pitchers.

The implicit Marcel forecast would be $2.  A $1 forecast is 50 IP, with a 4.70 ERA.  Again, I’d give anyone not forecast a $1 value.
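
The same idea for pitchers, sketched with the coefficients from the equation above. The inverse helper is an addition for illustration; it shows which ERA over 50 IP the formula maps to a given dollar value:

```python
def pitcher_dollars(era, ip, pool_avg_era=4.20):
    """Fantasy dollars from forecast ERA and IP: .10 * IP * (pool avg - ERA + 0.70)."""
    return 0.10 * ip * (pool_avg_era - era + 0.70)

def era_for_dollars(dollars, ip, pool_avg_era=4.20):
    """Invert the formula: the ERA that is worth `dollars` over `ip` innings."""
    return pool_avg_era + 0.70 - dollars / (0.10 * ip)

print(round(pitcher_dollars(4.70, 50), 2))   # the $1 line: 50 IP, 4.70 ERA
print(round(era_for_dollars(2.0, 50), 2))    # ERA implied by a $2 value at 50 IP
```

Under this formula a $2 value at 50 IP corresponds to a 4.50 ERA, which would be the league-average ERA implied by the $2 figure quoted for Marcel's implicit forecast.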

How about it guys?  Does OPS/PA-translatedDollars and ERA/IP-translatedDollars work for everyone?  It puts everything into the target market (Fantasy players), handles the playing time concerns, and handles guys who did not receive any forecast.


#31    Nate Silver      (see all posts) 2007/10/05 (Fri) @ 12:00

These were the guys with at least 50 points of OPS difference between their Marcel and their PECOTA, excluding guys who didn’t have a Marcel forecast at all.

Eric Byrnes
Javier Valentin
Abraham Nunez
Corey Hart
Jeremy Hermida
Jack Cust
Ryan Theriot
Matt Kemp
Prince Fielder
Jim Edmonds
Willy Taveras
Dustin Pedroia
Esteban German
Jason Tyner
Troy Tulowitzki
Jerry Owens
Marlon Byrd
Kelly Johnson
Nelson Cruz
Nook Logan
Sean Casey
Tony Pena
Josh Bard
Alfonso Soriano
Jason Kubel
Casey Kotchman
Darin Erstad
Frank Thomas
Norris Hopper
Scott Hairston
Adam Dunn
Reggie Willits
Gary Sheffield
Sammy Sosa
Chris B. Young
Jose Guillen
Ryan Klesko
Conor Jackson
Brian Roberts
Aaron Miles
Cliff Floyd
Kevin Kouzmanoff
Howie Kendrick

So indeed, mostly young guys, some old guys, and a few “weird skill set” guys like Adam Dunn and Willy Taveras.

I don’t think there’s much doubt that there’s little difference between any two essentially competent forecasting systems for the sort of 80% mainstream of the MLB population. Frankly, when I run the PECOTAs through each year, the only ones I pay a lot of attention to are the forecasts for minor leaguers and the longer-term (5-year) forecasts.

It’s a little easier to screw up a pitcher forecast if you don’t understand DIPS theory and the way that the components feed into ERA and things of that nature, but ZiPS, Chone, Marcel, PECOTA certainly have no problem with this.


#32    MGL      (see all posts) 2007/10/05 (Fri) @ 16:38

Tango, have you done your assessment yet?  I am curious how I stack up, although my forte is not necessarily forecasting by any means.  In fact, I let the computer do everything and don’t look at any of the forecasts it spits out (I don’t make any “manual” adjustments).  For example, if someone was not healthy all of last year and is healthy this year, my forecast for that player will probably be too pessimistic.  I think I might do some adjustments based on DL time, but I am not sure.

Also, I use retrosheet ID’s (I think).  For all minor league players, I use the same format (last 4 letters of last name plus first letter of common first name, spaces, hyphens, and apostrophes in first and last names removed), and then I add “991”.  If there are duplicates, I use “992”, “993” etc.  If everyone wants to use that same format from now on, that might work.  Just a suggestion, no big deal.

Nate, I take exception to the 1% that you disagree with!  JK. ;)


#33    philly      (see all posts) 2007/10/05 (Fri) @ 18:20

Nate

Any chance you could publish some of these tests exclusively on minor leaguers and/or the 5-year forecasts?

I’d love to see a comparison of 2007 forecasts published prior to 2006 to actual 2007 production.

Any sense of the increase in error from year N to year N+1 projections?


#34    Rally      (see all posts) 2007/10/05 (Fri) @ 19:03

I don’t make any manual adjustments either.  I may comment on it, something like “I’ve got Kotchman projected to hit X but I’m convinced he’s better than that” but I publish what the computer spits out.

I take that back.  I will adjust playing time for known injuries, I think I cut Francisco Liriano down to 30 innings just to keep his name in there even though he was unlikely to pitch at all, but I didn’t touch any of his rates.  I think I projected Juan Rivera for a half season after his winter injury, but again, his BA/OBP/SLG is unchanged.


#35    Nate Silver      (see all posts) 2007/10/05 (Fri) @ 20:14

Philly,

I looked at our numbers for minor league hitters back in August and it looked like our translations were coming in about 15-20 points of OPS too high.  I haven’t looked at how this ended up (some of the guys we were missing on finished the year strongly), but I don’t know that this was a particular strength of ours this year.  Clay has done some due diligence as a result and tweaked the translation algorithms slightly going forward.


#36    Tim Dierkes      (see all posts) 2007/10/10 (Wed) @ 21:24

Hey guys, great discussion.  I have a little fantasy website called RotoAuthority.com where I sell projections each year (’07 was my second year). 

I was wondering if I could get in on the action, figure out my correlation coefficient for pitchers?  I’m just wondering the exact sample of pitchers you guys used.

Thanks

Tim


#37    MGL      (see all posts) 2007/10/10 (Wed) @ 21:54

Tim,

I don’t know.  You can probably write Nate and ask him.  I assume his e-mail address is on the BP site or you can just write to him c/o BP.  He is the one that did the comparisons. We are only referencing his work here.  Tango can correct me if I am wrong, as he started this thread.


#38    Tim      (see all posts) 2007/10/10 (Wed) @ 22:06

Thanks - I will give it a shot.


#39    Tangotiger      (see all posts) 2007/10/11 (Thu) @ 10:55

Actually, I’d be happy to reference any evaluations of forecasts here.

I’m really behind on my work.  I need to get out the Scouting stuff this week, and then I need to go through all the forecasts I have lying around, a good dozen or so.  Plus hockey season just started, and the NHL changed their file layouts completely.


#40    Rally      (see all posts) 2007/10/18 (Thu) @ 18:51

We all know what Marcel doesn’t do - the monkey doesn’t like MLE’s and his simian brain freaks out when you discuss park factors.

I had a thought today to see how well the forecast systems do when they meet on even ground with the monkey.  So I took my sample of 500+ AB players and cut it down even further - keep only players who have been in the major leagues with the same team for the last 4 years.  I’m left with 48 players.

Even without minors and parks, the other systems have advantages, one being you can regress each component stat separately instead of Marcel’s one size fits all regression approach.  Plus whatever info the projectors come up with - speed scores, height/weight, sim scores, custom aging curves, who knows what else.

None of it beats the simple approach.  For correlation, Marcel comes in at .700, followed by Chone .698, and Zips .697.  For average error Pecota beats the monkey 14.2 to 14.3, with Chone at 14.4.

My CHONE spreadsheet is an Excel worksheet with 142 columns of mind-numbing calculations with multiple lookup tables, and it gets no better answers than a monkey.

I think the term I’m looking for is mathematical masturbation.


#41    tangotiger      (see all posts) 2007/10/18 (Thu) @ 21:31

Rally, that’s fantastic to know.  Great work.  Since Marcel only uses 3 years, you can cut that back down to same team with 3 years.  That should give a boost to anyone who uses 4 years of data.


#42    Rally      (see all posts) 2007/10/18 (Thu) @ 22:20

I was looking at every one with the same team for 4 years - 3 years of data to create the Marcel, and the 4th year that we’re testing.


#43    tangotiger      (see all posts) 2007/10/18 (Thu) @ 23:02

D’oh.

***

I should note that before I heard the call of Marcel, I too spent countless hours looking for various permutations of numbers, all adding so very little for so much effort.  (For guys with 3 years, anyway.)

***

I show a “reliability” number in my forecasts.  You could for example run your forecasts against the Marcels at various reliability numbers.  So, “when Marcel is at reliability >=.80, Chone and Marcel are a dead heat… when Marcel is at reliability <=.30, Chone trounces Marcel 70% of the time, etc, etc”.

Nonetheless, great stuff.
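The reliability split described in this comment could be sketched roughly as follows. The record layout, the "low/mid/high" labels, and every number in the sample are made up for illustration; only the .30 and .80 cutoffs are taken from the example in the comment, and nothing here is actual Marcel or Chone output.

```python
from collections import defaultdict

def bucket(reliability, cutoffs=(0.30, 0.80)):
    """Label a Marcel reliability value as low / mid / high."""
    low, high = cutoffs
    if reliability <= low:
        return "low"
    if reliability >= high:
        return "high"
    return "mid"

def head_to_head(players):
    """For each reliability bucket, return the share of players for whom
    the other system landed closer to the actual result than Marcel did.
    Ties count as half a win."""
    wins = defaultdict(float)
    counts = defaultdict(int)
    for p in players:
        b = bucket(p["reliability"])
        marcel_err = abs(p["marcel"] - p["actual"])
        other_err = abs(p["other"] - p["actual"])
        counts[b] += 1
        if other_err < marcel_err:
            wins[b] += 1.0
        elif other_err == marcel_err:
            wins[b] += 0.5
    return {b: wins[b] / counts[b] for b in counts}

# Hypothetical OPS forecasts and results, for illustration only.
sample = [
    {"reliability": 0.85, "marcel": 0.800, "other": 0.810, "actual": 0.790},
    {"reliability": 0.82, "marcel": 0.750, "other": 0.740, "actual": 0.735},
    {"reliability": 0.25, "marcel": 0.740, "other": 0.700, "actual": 0.690},
    {"reliability": 0.20, "marcel": 0.745, "other": 0.780, "actual": 0.800},
]
rates = head_to_head(sample)
```

With a real player pool, `rates` would give one head-to-head win share per reliability bucket, which is the kind of statement quoted in the comment.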


#44    Ender      (see all posts) 2008/01/17 (Thu) @ 11:13

Isn’t the real problem with the pitcher comparisons that ERA is just an unreliable stat? 

If you were to look at something even as simplistic as FIP or xFIP I bet you would get a much better view of the pitching projections.  Something like xERA or qERA would be even better.

Always seems weird to me that people do comparisons using WHIP and ERA which show weak year to year correlations.


#45    Tangotiger      (see all posts) 2008/01/17 (Thu) @ 11:27

But, this is what you are trying to forecast.  If you are trying to forecast how much money you will earn in the next 10 years, and it turns out your idiot brother did better than you, then that’s what you are interested in.  He got lucky to earn all that money, but that’s what you are interested in.  You are not interested in who is smarter, or more talented, or has the best path in life.

That’s why you look for ERA.  You have a combination of pitching talent, luck, fielding, park, etc.  There’s a lot of noise in there.  But, that’s what you care about.  You want to know how many runs will score when that guy is on the mound.  That’s the test.

Same for wins.  You bet on who will win more games, not on who will have the largest run differential, or the largest base differential, or other component-type analysis.  You want the whole thing.  If the Dbacks ended up being lucky, well, that’s the point.  That’s the bet.  Otherwise, why bet on blackjack?  You are betting on a combination of randomness and your talent.

The true test for a hitter is either WPA or WPA/LI.  It’s irrelevant if a guy ends up getting on base at a .400 clip, but ends up scoring 80 runs in a full season.  You can bet on components all you want, but the true test is really the final total output.  And ERA is pretty much it.


#46          (see all posts) 2008/01/23 (Wed) @ 14:11

Well, I think it depends on the type of league you’re in.  Scoresheet leagues tend to want more detailed data because it can affect that style of fantasy play a bit more.

--sam


#47    Tangotiger      (see all posts) 2008/01/23 (Wed) @ 14:28

Sam, which post or idea are you referring to?  Your statement is true based on the question I can infer, but I don’t know if that question was really brought up.


#48          (see all posts) 2008/01/28 (Mon) @ 01:41

Oh… I was just referring to the comment that some of the details are irrelevant (which I inferred meant that they were irrelevant for a typical roto league).  I was just pointing out that the details are more than welcome in other environments.  :D

By the way, thanks for Marcel!
--sam


#49    Ender      (see all posts) 2008/01/28 (Mon) @ 10:52

Well I’d say the final test is RA/G but that is nitpicking, heh.

My point was that no system is going to be able to predict the noise, so if you really want to find the system that best understands how pitchers work, you would be better off looking at the correlation to a component stat.

An absolutely perfect system that could 100% predict the pitcher’s K/9, BB/9, GB%, etc. would still show a poor year-to-year correlation for ERA, because it is just so random.


#50    Tangotiger      (see all posts) 2008/01/28 (Mon) @ 11:18

You are not predicting the noise, nor are you trying to achieve some high correlation, just for the sake of seeing a high number.

You are trying to predict the output, within the constraint that it will be polluted by noise.  That’s the reality of it.  We know exactly what the top-end correlation is, so that’s your comparison point if you are looking to see a high number.  The maximum r is *not* always 1.0.
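A small simulation can illustrate why the ceiling on r sits below 1.0. The sketch assumes a simple model that is not stated in the thread: observed ERA equals true talent plus independent random noise, with made-up standard deviations. Under that assumption a forecast that nails true talent exactly tops out at r = talent_sd / sqrt(talent_sd^2 + noise_sd^2).

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def max_r(talent_sd, noise_sd):
    """Ceiling on r when the forecast equals true talent exactly."""
    return talent_sd / math.sqrt(talent_sd ** 2 + noise_sd ** 2)

# Assumed values, for illustration only.
TALENT_SD = 0.50  # spread of true-talent ERA across pitchers
NOISE_SD = 1.00   # random variation in a single season's ERA

rng = random.Random(2008)  # fixed seed so the run is repeatable
talent = [rng.gauss(4.50, TALENT_SD) for _ in range(20000)]
observed = [t + rng.gauss(0.0, NOISE_SD) for t in talent]

ceiling = max_r(TALENT_SD, NOISE_SD)      # theoretical maximum
simulated = pearson_r(talent, observed)   # what a "perfect" forecast gets
```

With these assumed spreads the ceiling works out to about 0.45, so a system scoring 0.40 against noisy ERA would be close to the best that is possible, not a poor result.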


#51          (see all posts) 2008/01/28 (Mon) @ 11:49

I was just alerted to another set of free projections.  Has anyone looked at CAIRO…

http://www.replacementlevel.com/index.php/RLYW/comments/cairo_projections_and_zone_rating_database_updated


#52          (see all posts) 2008/01/30 (Wed) @ 13:05

Did more digging on CAIRO… it’s based on Marcel.  However, when I compare the two, the projections are somewhat different.


#53    Tangotiger      (see all posts) 2008/01/30 (Wed) @ 13:25

First I’ve heard of it.  SG at RLYW is/was a frequent contributor to Primer, and he does good work.

***

Here are the main CAIRO pages:

http://www.replacementlevel.com/index.php/RLYW/direct/C233

***

Here’s his description:
http://www.replacementlevel.com/index.php/RLYW/direct/cairo_projections_v01

It is based on Marcel, plus more.  It keeps the batting weights of 5/4/3, but extends them back two extra years (2/1 weights).  The pitching weights of Marcel are 3/2/1 (or 6/4/2 or 7.5/5/2.5); SG uses 7/5/3 and extends it 2 more years with 2/1.

The most rigorous weighting that I use (not in Marcel) is 0.8^yrs for hitting and 0.7^yrs for pitching.  You can see that he basically follows this model for pitching.
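To compare the decay weights with the integer weights quoted above, here is a minimal sketch. The function name and the choice to scale the most recent season to 5 (hitting) or 7 (pitching) are assumptions made only so the numbers line up with the 5/4/3/2/1 and 7/5/3/2/1 schemes.

```python
def geometric_weights(decay, years, scale=1.0):
    """Weight for each season, most recent first: scale * decay**age,
    rounded to two decimals for display."""
    return [round(scale * decay ** age, 2) for age in range(years)]

# Scaled so the most recent season matches the integer schemes quoted above.
hitting = geometric_weights(0.8, 5, scale=5)   # compare with 5/4/3/2/1
pitching = geometric_weights(0.7, 5, scale=7)  # compare with 7/5/3/2/1
```

The hitting weights come out to 5, 4, 3.2, 2.56, 2.05 and the pitching weights to 7, 4.9, 3.43, 2.4, 1.68. The pitching series tracks 7/5/3/2/1 fairly closely, while the hitting series decays more slowly than 5/4/3/2/1 in the older seasons.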

I don’t know how the regression toward the mean is handled, and that’s the most important part.

He also includes park and league adjustments, as well as MLEs.  So, a direct head-to-head comparison should leave Marcel with a bloody nose.

As long as a forecasting system uses Marcel at its core, but makes intelligent choices with the remaining known data, it leaves Marcel irrelevant as a forecasting system.

I guess we’ll find out in October if this is the case.

***

Here’s where you get to interact with the model:
http://www.replacementlevel.com/index.php/RLYW/direct/cairo_single_player_projector


#54    Rally      (see all posts) 2008/01/30 (Wed) @ 14:10

We already have forecasting systems that start similar to Marcel and go from there.  They beat Marcel by a few percentage points.  His nose isn’t bloodied.

It might be a different case if we evaluated against all players, including most recent year MLE’s.  Limiting to players who earned major league playing time gives the monkey a good chance, since his projection will be last year’s September call-up + a large helping of league average.


#55          (see all posts) 2008/02/02 (Sat) @ 16:37

My question goes back to assessing the projections after adjusting each to the league averages.  This puts them on a level playing field when looking back to see how they did.  So, if I’m looking forward to 2008, and plan to use projections from 5 different sources, would it make sense to adjust them all to the same league averages, rather than use the raw projections?

For example, at Fangraphs they have projections from Bill James, CHONE, and Marcels, and on the few players I’ve looked at so far, the Bill James projections seem well above the other two.  If the hitter population for the Bill James projections is well above the others, should each player be adjusted down?


#56    tangotiger      (see all posts) 2008/02/02 (Sat) @ 17:28

Yes, definitely.

Marcel is using 4.80 runs per game.
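One way to put several systems on the same league baseline is sketched below. It assumes an additive shift (the same constant applied to every player), which matches the idea in the article that the slope stays at 1 and only the intercept moves; a multiplicative rescale would be another reasonable choice. The record layout, the use of PA as the weight, and all numbers are hypothetical.

```python
def implied_league_mean(projections):
    """PA-weighted mean OPS over ALL of a system's projected players,
    i.e. the league level that the system implicitly assumed."""
    total_pa = sum(p["pa"] for p in projections)
    return sum(p["ops"] * p["pa"] for p in projections) / total_pa

def recenter(projections, target_mean):
    """Shift every projection by one constant so that the system's
    implied league mean equals target_mean."""
    shift = target_mean - implied_league_mean(projections)
    return [{**p, "ops": p["ops"] + shift} for p in projections]

# Hypothetical two-player "system", for illustration only.
system_a = [
    {"name": "Player 1", "pa": 600, "ops": 0.850},
    {"name": "Player 2", "pa": 400, "ops": 0.750},
]
adjusted = recenter(system_a, target_mean=0.760)
```

Here the system implies a league OPS of .810, so every player is moved down by .050 to reach the common .760 baseline. The key point is that the shift is computed from the full projected pool, not from whichever players end up in the evaluation sample.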


#57          (see all posts) 2008/02/05 (Tue) @ 18:49

Anyone know how Shandler does in these comps?


#58    Matt      (see all posts) 2008/02/13 (Wed) @ 18:59

Fantasy usefulness is unfortunately a lot more complicated than a simple comparison of predicting ERA.  Beating another system by a few tenths of a percentage point in predicting ERA probably won’t help me win my league. 

I say to myself, How does each system do in projecting Wins for pitchers?  How about Runs and RBI for hitters?  That’s what I would want to know. 

Some systems don’t predict all of the basic fantasy stats (not that they are beholden to, since obviously that’s not why the creators make them).  For Salary $ those questions don’t mean anything, but for fantasy value they are vital.


#59    makewayhomer      (see all posts) 2008/02/25 (Mon) @ 16:28

Does anyone have the time/access to compare PECOTA/CHONE/ZIPS to Baseball HQ?  Rotolab uses HQ projections, and is a nifty piece of software... but of course it all depends on the underlying projection assumptions.

Do you guys think HQ is closer in accuracy to the “SABR” methods (PECOTA, CHONE, ZIPS), or closer to the “Fantasy” guys (Rotowire, etc.)?


#60    Tangotiger      (see all posts) 2008/02/25 (Mon) @ 16:34

When we have looked at Shandler, he’s with the SABR methods group you are listing.  By the way, you should call them the “Marcels” group, since that’s what these guys basically are.

I get annoyed when Marcels are not considered, since they are the simplest, and are at least in the middle of the pack.

Plus, they are an open box.


#61    Chris Long      (see all posts) 2009/10/15 (Thu) @ 19:01

Forecasts should provide confidence intervals; given this, there are standard scoring methods in meteorology that take into account sharpness of projection as well as calibration.
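As a rough illustration of the two properties named in this comment, the sketch below checks interval forecasts for calibration (how often the actual lands inside the stated interval) and sharpness (how narrow the intervals are). It is not one of the meteorological scoring rules, which the comment does not name; the function, the tuple layout, and the sample numbers are all hypothetical.

```python
def interval_report(forecasts):
    """forecasts: list of (low, high, actual) tuples.
    Returns empirical coverage (calibration check) and mean interval
    width (sharpness; narrower is sharper)."""
    n = len(forecasts)
    hits = sum(1 for low, high, actual in forecasts if low <= actual <= high)
    mean_width = sum(high - low for low, high, _ in forecasts) / n
    return {"coverage": hits / n, "mean_width": mean_width}

# Hypothetical nominal-80% OPS intervals with actual results.
sample = [
    (0.700, 0.800, 0.760),
    (0.650, 0.850, 0.700),
    (0.720, 0.780, 0.790),
    (0.600, 0.900, 0.880),
    (0.750, 0.850, 0.800),
]
report = interval_report(sample)
```

A well-calibrated system claiming 80% intervals should show coverage near 0.80; between two systems with similar coverage, the one with the smaller mean width is giving the more informative forecast.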

