THE BOOK--Playing The Percentages In Baseball


Tuesday, March 04, 2008

Community Forecast, 2007 - Preliminary Results

By Tangotiger, 12:44 PM

Here’s the Google Docs spreadsheet of the Community Forecast for last year (2007), for hitters. Here’s how to read the chart:
Player: Albert Pujols (playerName), pujolal01 (LahmanID), 405395 (MLB.com/Elias playerID)
Forecast: 29 ballots (n1) averaged forecast OPS of 1.101; 32 ballots (n2) averaged forecast of 154 games; appeared on 94% of Cardinals ballots

And for pitchers.  I added the teamid.  There’s also a “depth” column, which is really just a sort order.  The five new columns (Starter, AceReliever, Setup_or_Swing, Mopup, Callup) each give the percentage of all ballots cast on which the fan thought that’s how the pitcher would be used.

I did substantial data cleanup, but I may have more to do.  That’s why I’m calling this one preliminary.  But a first-pass look seems reasonable, and I doubt any further cleanup will change much.  If you spot anything irregular (like all of a team’s players being too low), let me know.

I’d love it if someone out there did a study here.


#1    Tangotiger      2008/03/04 (Tue) @ 13:37

I added the pitchers data as well.  (See above.)

***

I’m hoping someone out there does a good study here.  Here are a few ideas:

1. We regress the Saves counts a lot in Marcel, simply because we are not sure who will be the closer.  I would guess that having the Community tell you who the closer is gives you an extra parameter to help in the regression.

2. The usage pattern of the pitchers (starting rotation, swingman, callup, etc.) will give you a better IP point to regress to.  Same idea for hitters, where you have a better regression point for games played (it implicitly gives you the injury history of the player).

3. How delusional is the Community?  The ERA forecasts for the ace relievers are super low.  Does the Community simply not understand regression?  How much do you have to regress their ERA and OPS forecasts?  (e.g., with Marcel, Chone, PECOTA, when you run a correlation of forecasts to performance, the “slope” will be extremely close to 1… I don’t think that’s the case with the Community).
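Idea 3 can be tested by regressing actual performance on the forecasts and checking the slope: a slope near 1 means the forecasts are spread about right, while one well below 1 means they need to be pulled toward the mean. A minimal sketch, assuming PA-weighted least squares; the function names and toy numbers are hypothetical, not anyone's published method.

```python
def fit_line(forecast, actual, weights):
    """PA-weighted least-squares fit of actual = intercept + slope * forecast.

    A slope near 1 means the forecasts are spread about right; a slope
    well below 1 means they are too spread out and should be regressed.
    """
    w_sum = sum(weights)
    mf = sum(w * f for w, f in zip(weights, forecast)) / w_sum
    ma = sum(w * a for w, a in zip(weights, actual)) / w_sum
    cov = sum(w * (f - mf) * (a - ma) for w, f, a in zip(weights, forecast, actual))
    var = sum(w * (f - mf) ** 2 for w, f in zip(weights, forecast))
    slope = cov / var
    return slope, ma - slope * mf


def regressed(forecast_ops, slope, intercept):
    """Apply the fitted line to turn a raw forecast into a regressed one."""
    return intercept + slope * forecast_ops


# Hypothetical: three equally weighted hitters whose forecasts
# were twice as spread out as what they actually did.
slope, intercept = fit_line([0.900, 0.800, 0.700], [0.850, 0.800, 0.750], [1, 1, 1])
```

With these toy numbers the slope is .5, so a .900 forecast would be pulled back to .850.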


#2    Tangotiger      2008/03/04 (Tue) @ 16:47

I looked at the top 14 forecasts by the Fans for hitters.  The average OPS they forecasted was 1.000.  Those 14 actually averaged 0.958.  So, the Fans forecast them .042 (42 points of OPS) higher than they performed.

I decided to do the next 14 highest after that.  The Fans came in at .931 forecast.  The gang of 14 did .883.  So, their forecast was .048 above what they actually did.

It’s not bad.  Basically, the Community might be collectively optimistic, but at least they were able to forecast the players 1-14 to be higher than 15-28.
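The check in this post (rank hitters by forecast, then compare each block's mean forecast with its mean actual OPS) is easy to script. A sketch using simple unweighted means, which is an assumption since the post doesn't say whether the averages were PA-weighted; the function name and data are hypothetical.

```python
def group_bias(players, group_size=14):
    """Rank hitters by forecast OPS (highest first), cut the list into
    consecutive groups, and return (mean forecast, mean actual, bias)
    for each group. `players` is a list of (forecast_ops, actual_ops)."""
    ranked = sorted(players, key=lambda p: p[0], reverse=True)
    groups = []
    for start in range(0, len(ranked), group_size):
        chunk = ranked[start:start + group_size]
        mean_f = sum(f for f, _ in chunk) / len(chunk)
        mean_a = sum(a for _, a in chunk) / len(chunk)
        groups.append((mean_f, mean_a, mean_f - mean_a))
    return groups
```

A positive bias in every group is the Community's optimism; if each group's mean actual OPS still falls in rank order, the Fans could tell the players apart anyway.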


#3    Colin Wyers      2008/03/04 (Tue) @ 18:02

I did a quick weighted correlation, and the Fans are beating Marcels, .642 to .600. (The Marcels correlation is only for players who the Fans projected.)
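Colin doesn't spell out his weighting. A sketch, assuming a Pearson correlation between forecast and actual OPS with each hitter weighted by his 2007 PA; the function name is hypothetical.

```python
import math


def weighted_corr(x, y, w):
    """PA-weighted Pearson correlation between forecasts x and actuals y."""
    ws = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / ws
    my = sum(wi * yi for wi, yi in zip(w, y)) / ws
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)
```

A hitter with zero PA contributes nothing, so the same function works on the full pool without a PA cutoff.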


#4    tangotiger      2008/03/04 (Tue) @ 18:26

Wow, that is great!

Rally, I hope you can do the same thing against your Chone.

***

Remember, the basic equation you are testing is the following:

popOPS = sum(actualPA * playerOPS) / sum(actualPA)

adjPlayerOPS = playerOPS - popOPS = X

diffOPS = sum(actualPA * abs(forecastedX - actualX)) / sum(actualPA)

You set NO PA limits, at all.

This will tell you the average error for each forecasting system.
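The three lines above translate directly into code. A minimal Python sketch, assuming each hitter arrives as a (PA, actual OPS, forecast OPS) tuple; the function name is hypothetical. Both the actual and the forecast OPS are centered on their own PA-weighted means, and there is no PA cutoff.

```python
def average_error(players):
    """PA-weighted mean absolute error of league-centered OPS.

    `players` is a list of (actual_pa, actual_ops, forecast_ops) tuples.
    popOPS is computed separately for actual and forecast, so a system
    is judged only on how it places hitters relative to each other.
    """
    total_pa = sum(pa for pa, _, _ in players)
    pop_actual = sum(pa * a for pa, a, _ in players) / total_pa
    pop_forecast = sum(pa * f for pa, _, f in players) / total_pa
    err = sum(
        pa * abs((f - pop_forecast) - (a - pop_actual))  # |forecastedX - actualX|
        for pa, a, f in players
    )
    return err / total_pa
```

A system that is uniformly optimistic (every forecast +.050) scores a perfect 0 here, so optimism alone does not hurt a forecaster under this metric.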


#5    Colin Wyers      2008/03/04 (Tue) @ 19:59

This is a bit biased, because the sample of players is “players projected in the Community Forecast;” for those players not projected by Marcels/PECOTA but included in the sample, popOPS is used.

Average error, by forecasting system:

Community Forecast: .0661
Marcels: .068
PECOTA: .074

And, just for kicks, the average player, from each system:

Actual: .781
Community: .808
Marcels: .796
PECOTA: .787

(This is a bit high, sure. But that’s probably an issue of sampling; there are probably more “good” players than “bad” in our sample, compared to the overall population of major leaguers.)

So the community does seem to be more optimistic than the forecasting systems, but seems to do a better job of gauging relative difference between players.

This makes some intuitive sense to me; the fans have access to a lot of information about a player that the forecasting systems aren’t capturing, largely about injuries and future utilization.


#6    tangotiger      2008/03/04 (Tue) @ 20:13

You should start by using exactly the same players in all the pools.  If you want to use an uneven pool, you have to make sure that the extra players don’t improve those systems.

But, this is incredible.  What we are suggesting here is that the best forecasting system is the one that takes me 1 hour to set up (plus the input of 1000 people x 10 minutes = 167 hours).  I love that.


#7          2008/03/04 (Tue) @ 21:25

Not sure if I’m reading this correctly, but is Colin (#5) saying that both the community and Marcel were more accurate (by average error) than PECOTA? And what would Nate Silver have to say about this?


#8    tangotiger      2008/03/04 (Tue) @ 21:37

Well, Marcel and PECOTA were in a dead heat last year.

That the Fans beat them both is incredible.


#9    Colin Wyers      2008/03/04 (Tue) @ 22:00

Just for kicks, here’s how the three forecasts included relate to each other by average “error”:

Marcels and PECOTA: 0.037
Marcels and Community: 0.030
PECOTA and Community: 0.039

Marcels seems to be right between the Community and PECOTA, on the whole.

And yes, for the sample we’re using, Marcels beats PECOTA, but not by a whole heck of a lot. (The difference in accuracy between projection systems in our sample is eight points of OPS.)


#10    Colin Wyers      2008/03/04 (Tue) @ 22:52

I don’t have LahmanIDs for PECOTA’s pitching projections on hand (a project for later on, I’m sure), but the Community comes out on top versus Marcels again. I used ERA above/below the average, weighting by outs recorded.

Average error for the Community is .91; average error for Marcels is .94. Again, not a massive difference, but worth noting.


#11    Rally      2008/03/05 (Wed) @ 22:41

I compared the forecasts to CHONE.  I didn’t have player IDs for CHONE last year (I do now, for the 2008 projections), so I compared by name.  There were 26 players that did not match by name (either one of the systems didn’t project the player, or there was a misspelling, or a nickname vs. a proper name); in the interests of time I just deleted them.  I also deleted the Forecasts without a player ID.  I still had 481 matches, so the sample is still a good size.

I took the actual OPS and PA from the Lahman database.  Then I figured the forecast averages: the actual sample had a .776, the CHONE sample .781 (so I multiplied each CHONE forecast by .994), and the Community forecasts .806 (multiplied by .962).
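Rally's adjustment rescales the forecasts multiplicatively so each system's average matches the actual .776, rather than subtracting the difference as in #4. A sketch, assuming PA-weighted averages (the post doesn't say); the function name is hypothetical. Note that scaling also shrinks every player's deviation from the mean by the same factor, which subtraction does not.

```python
def rescale(forecasts, pa, actual_mean):
    """Multiply every forecast by actual_mean / forecast_mean so the
    PA-weighted forecast average matches the actual sample's average.
    Returns (scaled_forecasts, factor)."""
    forecast_mean = sum(p * f for p, f in zip(pa, forecasts)) / sum(pa)
    factor = actual_mean / forecast_mean
    return [f * factor for f in forecasts], factor
```

For CHONE, .776 / .781 comes to about .994, the factor quoted above.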

I took the adjusted forecasts, and compared as in #4.  Community forecasts had an average error of .0661 (same as above, though my player group is not quite identical) and CHONE just edges it out at .0648.


#12    tangotiger      2008/03/05 (Wed) @ 23:24

Fantastic stuff Rally…


#13          2008/03/06 (Thu) @ 03:44

Here are the 2007 Pecota pitcher projections (I hope) with the Lahman and Retrosheet IDs added (Lahman is the first column).  The ones with “xxxxx” have not played in the majors yet and therefore have no IDs.

http://spreadsheets.google.com/pub?key=p4mB-r5bxU8gxxmJym_fXLw

I can do the batters if anyone wants.  Let me know.


#14    tangotiger      2008/03/06 (Thu) @ 07:19

You’re missing the headers on that sheet.  If you can tell us which of those numbers is the ERA for the first record, that should be good enough.


#15    MGL      2008/03/06 (Thu) @ 16:07

I think it is 4.43 in the first record.  The projected ERA, that is.  I thought all you needed was the ID maps; that is why I did not put in the headers.  Plus, it is the original Pecota spreadsheet, probably the same as this year’s.  Do you need the mapping for the batters?


#16          2008/03/06 (Thu) @ 17:15

I added all the headers from the original Pecota spreadsheet and deleted the last column (comparables) to make it more readable.


#17    Colin Wyers      2008/03/06 (Thu) @ 17:48

I have the batters mapped (have a lookup function that works with the Master table in the BDB); I can put it up later tonight if anyone needs it. It’s at home and I’m at work, though.


#18    MGL      2008/03/06 (Thu) @ 18:53

I mapped the batters.  Here is the link to the original Pecota spreadsheet with the Lahman and Retrosheet IDs added.  Again, no guarantees!

http://spreadsheets.google.com/pub?key=p4mB-r5bxU8jfPUCfOMF-Yg


#19    Rally      2008/03/06 (Thu) @ 23:20

Added PECOTA to the same list I had in #11.  Didn’t do any validation, I’m just assuming the Lahman IDs match up right.

Weighted error for the hitters is .0657


#20    Rally      2008/03/06 (Thu) @ 23:25

And the same thing for Marcel = .0669


#21    Rally      2008/03/06 (Thu) @ 23:30

So assume these one-year results are accurate in ranking the ability of the systems to project:

They tell you Joe Shlobotnik will have an OPS of .800

With CHONE, you think Joe’s actual output will be between .735 and .865

With PECOTA or Community, it’s .734-.866

and Marcel it’s .733-.867

But Marcel finishes his work early and has plenty of time to eat more bananas.
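Rally's ranges look like the forecast plus or minus each system's average error from #11, #19 and #20, rounded to three places. A quick check, assuming that's how they were built; the function name is hypothetical, and an average absolute error is not a formal confidence interval.

```python
def likely_range(forecast_ops, avg_error):
    """Forecast plus/minus a system's average absolute error,
    rounded to three decimals the way OPS is usually quoted."""
    err = round(avg_error, 3)
    return round(forecast_ops - err, 3), round(forecast_ops + err, 3)


print(likely_range(0.800, 0.0648))  # (0.735, 0.865): CHONE
print(likely_range(0.800, 0.0661))  # (0.734, 0.866): Community
print(likely_range(0.800, 0.0669))  # (0.733, 0.867): Marcel
```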


#22    MGL      2008/03/07 (Fri) @ 02:58

#21, you’re just saying that because you are both simians!


#23    tangotiger      2008/03/07 (Fri) @ 07:16

It’s official then.  The Community, Marcel, and PECOTA finished in a dead heat for 2nd place, a slice behind Chone.


#24    Rally      2008/03/07 (Fri) @ 09:25

Call them all a dead heat if you like, I don’t think one teeny little point of OPS is something worth bragging about.

Using one projection system instead of another is not going to win you any fantasy leagues.  The guy who won last year didn’t have better projections than other people, he was just lucky to be in the right waiver position when Ryan Braun got called up.  Plus he got outbid on the top 1B (Pujols, Morneau, Teixeira, and Fielder), so he took a flyer on a $1 scrub, and wound up with Carlos Pena.


#25    Tangotiger      2008/03/07 (Fri) @ 10:04

Rally, one last one, since you’ve got the numbers.  How about Chone+Marcel+PECOTA vs. Community?  Are these three forecasts basically so close that they barely improve on each other?


#26    Rally      2008/03/07 (Fri) @ 18:24

An average of Chone/Marcel/Pecota does pretty well: error of .0641
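Rally doesn't say how he handled players missing from one of the three systems. A sketch that simply averages the three forecasts for players common to all of them (an assumption); the function name is hypothetical. The resulting forecasts can then be scored with the #4 error metric like any other system.

```python
def consensus(*systems):
    """Average several systems' forecasts player by player.

    Each system is a dict {player_id: forecast_ops}; only players that
    every system forecast are kept."""
    common = set(systems[0]).intersection(*systems[1:])
    return {pid: sum(s[pid] for s in systems) / len(systems) for pid in common}
```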


#27    Colin Wyers      2008/03/07 (Fri) @ 20:53

Tango, I know you once did a study on the maximum accuracy of a forecasting system by correlation; is there any way you could figure out the maximum accuracy by average error?


#28    Tangotiger      2008/03/11 (Tue) @ 12:37

Good question.  First off, while I did say that the maximum r was .73 or so, it’s actually the square root of that, or .85 or so.  The reason is that .73 is a sample-to-sample correlation, but we are really after a sample-to-true correlation.

Anyway, I don’t know the answer to your question, but I’m sure there’s a very easy way to figure it out.
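The square-root step follows if two samples of the same players each correlate with true talent at r, with independent noise: they then correlate with each other at r squared. A sketch with a small Monte Carlo check; the normal talent and noise distributions are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def sample_to_true_r(r_sample_to_sample):
    """Two samples that each correlate with true talent at r (independent
    noise) correlate with each other at r * r, so take the square root."""
    return math.sqrt(r_sample_to_sample)


def simulate_sample_to_sample(r_true, n=100_000, seed=1):
    """Draw true talent, add independent noise sized so each sample
    correlates with talent at r_true, then correlate the two samples."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1 / r_true**2 - 1)  # corr(sample, talent) = r_true
    xs, ys = [], []
    for _ in range(n):
        talent = rng.gauss(0, 1)
        xs.append(talent + rng.gauss(0, noise_sd))
        ys.append(talent + rng.gauss(0, noise_sd))
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

sample_to_true_r(0.73) is about .854 (the ".85 or so" above), and feeding that back into the simulation gives a sample-to-sample correlation near .73.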


#29    Tangotiger      2008/03/11 (Tue) @ 12:46

Ok, I’ve got no more excuses.  I will now do the complete forecasting analysis of all the forecasts that I have on my USB drive, which various of you provided.  They are:
BIS (James)
CHONE
Community
DMB
Marcel
MGL
PECOTA
Pete Palmer
Shandler
THT
ZIPS

I have a couple of duplicates, as multiple readers may have sent the same thing.  I will do my best to do a good job.  If someone out there wants to help, send an email to tom!tangotiger!net (replacing ! with the appropriate character).

That’s 11 forecasts, for hitting and pitching.  I would hope that this becomes the definitive evaluation of who was the best for 2007, and that Ron, Nate, Rally, Mickey, Pete, Bill, Dan, David, Chris, Tom et al accept the results.

I will be most happy if Marcel ends up last, and the Community ends up first.  I will be mortified if Marcel ends up first.  I will be pleased if the Community lands right in the middle.

Let’s go…


#30    Tangotiger      2008/03/12 (Wed) @ 13:13

First up, I’ll do Marcel.

There are 577 hitters that Marcel forecasted that had at least 1 PA in 2007, totalling 172,469 PA.  Of those players, they averaged .773 OPS in reality, against a forecast of .799.  So, I drop the forecast by .027 points for all players, so that I get the matching means.

The average difference in OPS forecast against reality was .069.  This is similar to the values reported by others above.

***

There are 89 hitters that Marcel didn’t forecast, but played in 2007.  They totalled 9252 PA.  That is, Marcel didn’t provide forecasts for 5% of all the PA in MLB.  These players averaged a .759 OPS in reality.  The official position of Marcel is to grant each of these players a .773 OPS (league average).  If we do that, then we’ve got 666 players who played in MLB (with or without a forecast), and the average error is .071.

Note that because it’s only 5% of the population, whether I use .773, or .759, or .800 or .700 or whatever, the average error will be between .071 and .072.

I think it becomes imperative that all 666 hitters be included in the evaluation of the forecasting systems.  You guys can offer your suggestion as to how to handle the missing guys.  As it stands, like I said, since it’s a tiny part of the population, whatever stand-in forecast you want to put in will hardly change the results.  But, they should be included.

Barring objections, the first official result:

Marcel: .071 OPS, n=666


#31    Tangotiger      2008/03/12 (Wed) @ 13:20

Ugh.  Belay those results.  I didn’t account for guys playing on more than 1 team.


#32    Tangotiger      2008/03/12 (Wed) @ 13:28

The corrected numbers:
n = 611 players total,
525 with a forecast,
86 without

OPS = for forecasted players, .772 actual, .799 forecasted (.02645 adjustment)
for missing players, .759

diff = for forecasted players, .068
if we include missing players at a league average forecast, .070

***

Marcel: .070 OPS, n=611


#33    Tangotiger      2008/03/12 (Wed) @ 13:43

The next thing I did with Marcel was look at the “reliability”.  The reliability is based on the number of past PA: the more PA, the higher the reliability.  Splitting the players into reliability groups of greater than .80, less than .60, and in between, we get three roughly equal groups (180 players in the high, 173 in the medium, and 172 in the low).

The average error for the OPS for the high reliability group is .060.  For the medium, it’s .066.  For the low it’s .097.

How bad is the low?  If I forecasted EVERYBODY to be exactly the league average, I’d get .084 for the high, .081 for the medium, and .096 for the low.

In essence, the low reliability group is a complete shot in the dark on the part of Marcel.  And this makes sense, since these guys are regressed HEAVILY toward the mean.
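A sketch of the bucketing above, assuming each player's actual and forecast OPS have already been centered on the league mean as in #4 (so a league-average forecast is simply 0) and that errors are PA-weighted. The thresholds come from the post; the tuple layout and function names are hypothetical.

```python
def reliability_bucket(r):
    """Grouping from this post: high above .80, low below .60."""
    if r > 0.80:
        return "high"
    if r < 0.60:
        return "low"
    return "medium"


def errors_by_bucket(players):
    """`players`: list of (reliability, pa, actual_x, forecast_x), with the
    x values centered on league average. Returns
    {bucket: (system_error, league_average_error)}."""
    sums = {}
    for r, pa, actual_x, forecast_x in players:
        tot = sums.setdefault(reliability_bucket(r), [0.0, 0.0, 0.0])
        tot[0] += pa
        tot[1] += pa * abs(forecast_x - actual_x)
        tot[2] += pa * abs(actual_x)  # league-average forecast is 0 once centered
    return {b: (s[1] / s[0], s[2] / s[0]) for b, s in sums.items()}
```

A bucket whose system error is no better than its league-average error (like Marcel's .097 vs .096 for the low group) is a shot in the dark.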

***

If you forecast every player in the league with a league average OPS, you will get an average error for these 611 players of .086 in OPS.  That in essence becomes your worst-case scenario, and what everyone is trying to beat.  That Marcel is only able to close about 19% of that gap (1 minus .070/.086) shows you how incredibly tough it is to forecast.
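The "gap closed" arithmetic is one minus the ratio of the system's error to the league-average baseline. A short sketch using numbers from this post; the function name is hypothetical.

```python
def gap_closed(system_error, baseline_error):
    """Share of the league-average baseline's error that a forecast removes.
    1.0 would be perfect; 0.0 is no better than forecasting league average."""
    return 1 - system_error / baseline_error


print(round(gap_closed(0.070, 0.086), 3))  # 0.186: Marcel, all 611 hitters
print(round(gap_closed(0.097, 0.096), 3))  # -0.01: Marcel, low-reliability group
```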


#34    Tangotiger      2008/03/12 (Wed) @ 13:58

Pete Palmer:
433 batters forecasted that ended up playing
league OPS = .775
forecasted OPS = .792

adjustment = .018

Of forecasted players, difference of .067

For the 11% of PA that went to missing players (guys who did play, but were not forecasted), those players had an OPS of .746, and if we give them the league average (.775), the difference for the 611 players is now .071

***

Palmer: .071, n=611


#35    Tangotiger      2008/03/12 (Wed) @ 14:18

The Community:
474 batters forecasted that ended up playing
league OPS = .772
forecasted OPS = .806

adjustment = .0335

Of forecasted players, difference of .065

For the 8.5% of PA that went to missing players (guys who did play, but were not forecasted), those players had an OPS of .762, and if we give them the league average (.772) as their forecast, the difference for the 611 players is now .070.

***

Community: .070, n=611

(The tie is a rounding issue: unrounded, the Community is at .0695, AHEAD of Marcel and Pete Palmer.)

***

If we do a head-to-head of Marcel and the Community on the players both forecasted, and then group by reliability, the Community’s OPS diff is .061 for the high-reliability guys, and the same .061 for the medium-reliability guys.  These match the Marcels.

When it comes to the low reliability, the community is at .091, and also matches the Marcels.  In fact, if you predicted a league average OPS for ALL players in the low reliability group, the average difference would be .092.

As you can see, Marcel and the Community know nothing about the low reliability group.

And so, we can see why Marcel and the Community ended up in a dead heat, insofar as rate stats go, anyway.

I’ll check into the playing time soon.  My guess is that’s where Community will beat Marcel.  I hope.


#36    Tangotiger      2008/03/12 (Wed) @ 14:35

Among the high reliability group, Marcel had an average absolute error of 117 PA.  With the medium, it was 153.  With the low, it was 159.

Of the 86 players not forecasted, it’s hard to decide what to do with them.  If we give them the Marcel standard 200 PA, then we get an average error of 147 PA.  If we give them 0 PA, the average error is 108 PA.  If we give them 100 PA, the average error is 99 PA.  The minimum error comes from giving them each 50 PA, for an average error of 90.

Overall, depending how you treat the unforecasted group, the average error is 140 PA, give or take 3 PA.
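Searching over stand-in values like this has a shortcut: for a mean absolute error, the single constant that minimizes it is the median of the actual values. A sketch, assuming the PA error is a plain per-player average (the post doesn't say whether it is weighted); the function names and data are hypothetical.

```python
import statistics


def avg_abs_pa_error(actual_pa, stand_in_pa):
    """Mean absolute PA error if every unforecasted hitter gets the same guess."""
    return sum(abs(pa - stand_in_pa) for pa in actual_pa) / len(actual_pa)


def best_stand_in(actual_pa):
    """For absolute error, the best single guess is the median."""
    return statistics.median(actual_pa)


# Hypothetical PA totals for five unforecasted hitters.
pa = [5, 20, 40, 60, 300]
print(best_stand_in(pa), avg_abs_pa_error(pa, 40))  # 40 67.0
```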

Let’s cross our fingers that the Community does better.


#37    Tangotiger      2008/03/12 (Wed) @ 15:08

Yup!

The Community’s average error for each of the three groups was 100 PA!  While Marcel can come close with the high reliability group (117 PA), it provided no match at all for the medium and low reliability groups.  In all cases, the Community knew exactly how often each group would play.

So, what the Fans know is not the rate stats.  Marcel has that one in the bag.  What the Fans *do* know is the “Depth chart” for their teams.  They know who is expected to play, and for how much.

It is here where the Community shines, and it is here where we need to listen.

This is fantastic, as we don’t have to worry about the performance stats being tweaked.  Fans are no help there.  What they can give us is the number of games or PA.

And on that basis, since they are equal to Marcel on a rates basis, they are better than Marcel overall.

Congratulations Community!  You are indeed smarter than a monkey.


#38    Tangotiger      2008/03/12 (Wed) @ 16:02

I’ve got most of the PECOTA names mapped (8% of PA missing, when it should be less than 5%).  Anyway, I’ll get to it tomorrow, but I just wanted to give you a taste: of the 520 hitters I mapped, the average error is .069 in OPS.  And putting in the missing guys like I normally would gets us to .070 difference in OPS, a match to Marcel and the Community.


#39    MGL      2008/03/12 (Wed) @ 23:40

How do you “group by reliability?”


#40    tangotiger      2008/03/13 (Thu) @ 06:09

If you look at the Marcel forecasts I publish, each player has a “reliability” from 0 to 1.  The top guys are around .85 or so, meaning I regressed them 15% toward the mean, based on how many past PAs they had.
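Applying a reliability is a straight blend toward the mean. A minimal sketch on a single rate; the formula that turns past PA into a reliability isn't given in this thread, so it is left out, and the function name is hypothetical.

```python
def regress_to_mean(player_rate, league_rate, reliability):
    """Keep `reliability` of the player's deviation from the league mean;
    the rest (1 - reliability) is regressed away. A .85 reliability
    regresses the player 15% toward the mean."""
    return league_rate + reliability * (player_rate - league_rate)


# Hypothetical: a .900 hitter in a .750 league with reliability .85.
print(round(regress_to_mean(0.900, 0.750, 0.85), 4))  # 0.8775
```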


#41    Tangotiger      2008/03/13 (Thu) @ 13:17

Just as I did in post 38, I have most of MGL’s forecasts mapped.  (Basically, those I’m missing are the rookies, like Ryan Braun, Alex Gordon, etc.  I’ll get to them soon enough.  For now, this is just a preview.)

Of the 526 players that have been mapped, the OPS difference is .068.  I’m missing 5% of the PA, and if we do like we normally do, the overall difference is .070 in OPS, a match to PECOTA, Marcel, and Community.

So far, EVERY SINGLE FORECAST ACCURACY IS THE SAME.


#42    Tangotiger      2008/03/13 (Thu) @ 13:25

Request to Rally, Gassko, Constancio, Dan Szymborski, Nate Silver, MGL: if you have a map of the names or IDs in your forecasts to the Lahman ID or RetroID, that’d be really appreciated.

Let me give you a couple of examples:
1. MGL provided retroids for all his MLB players, and created his own retro-like ids for the non-MLB players (like Ryan Braun).  Now that Braun does have a retroid, I’d like to see something like:
MGLid, RetroID
braur991, braur002

2. Hardball Times would have:
last, first, retroid
braun, ryan, braur002


#43    Tangotiger      2008/03/13 (Thu) @ 13:58

MGL, I just noticed you took care of the PECOTA and posted the google file.  Thanks.  I can get to work on cleaning those up.


#44    Rally      2008/03/13 (Thu) @ 17:55

Tango, can you send me the list of the 611 players with your choice of ID? 

I’ll match them to last year’s CHONEs


#45    Tangotiger      2008/03/14 (Fri) @ 09:01

Rally was nice enough to send me his forecasts with the playerIds already mapped.  Here are the results:

531 players forecasted who played in 2007.

The average OPS of those players was .773, compared to a forecast of .778.  So, just a 5 point adjustment needed.

Of players forecasted, they have an average difference of .068 OPS.

Of the players not forecasted, their actual OPS was .725.  Doing as we are doing and presuming league average, our overall average difference for the 611 players is .070.

In fact, because of rounding, it is .0696.  The Community has an error of .0695.  Note two things however:
1. The players Rally didn’t forecast accounted for less than 4% of all PA, while the Community missed over 8% of all PA.

2. Because of the way we are counting the missing PA (presuming league average OPS), Rally takes a hit.  The players that Rally didn’t forecast were crappy hitters (as evidenced by the .725 actual OPS of these unforecasted players).  The players the Community didn’t forecast, however, ended up averaging .762.

So, I’m second-guessing my decision here to use league average for the unforecasted players.  I think using a flat .700 OPS might be more appropriate.  After all, the reason you didn’t forecast a player is that you didn’t expect him to play, and therefore if he does play, he was probably an emergency replacement.

After I said all that, I tried different combinations of “implied forecasted OPS” for the unforecasted players, and this is what I get for Chone:
imply .700, total difference for 611 players is: .0699
imply .725 (the actual average for those missing players), difference is: .0697
imply .750, difference is .0695
imply .778 (actual average), difference is .0696
imply .800, difference is .0699

As you can see, it doesn’t make a whole heckavulotta difference.  (Mostly because he was missing only 4% of PA to begin with.)

It seems using the implied league average works just fine.
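The sweep above can be reproduced with a small helper that follows the procedure from #30 and #32: shift the forecasted players so their PA-weighted mean matches reality, give every unforecasted hitter the same implied OPS, and take the PA-weighted mean absolute error over everyone. A sketch; the data layout and function names are assumptions.

```python
def error_with_stand_in(forecasted, missing, stand_in_ops):
    """PA-weighted mean absolute OPS error over every hitter who played.

    forecasted:   list of (pa, actual_ops, forecast_ops) for forecast hitters
    missing:      list of (pa, actual_ops) for hitters with no forecast
    stand_in_ops: the implied forecast given to every missing hitter
    """
    pa_forecasted = sum(pa for pa, _, _ in forecasted)
    # One constant shift so the forecasted hitters' PA-weighted mean
    # forecast matches their actual mean (the .027-style adjustment).
    shift = sum(pa * (f - a) for pa, a, f in forecasted) / pa_forecasted
    err = sum(pa * abs(f - shift - a) for pa, a, f in forecasted)
    err += sum(pa * abs(stand_in_ops - a) for pa, a in missing)
    total_pa = pa_forecasted + sum(pa for pa, _ in missing)
    return err / total_pa


def sweep(forecasted, missing, candidates):
    """Total error for each candidate implied OPS."""
    return {c: error_with_stand_in(forecasted, missing, c) for c in candidates}
```

Because the missing hitters carry only a few percent of PA, their term can move the total only slightly, which is why Chone's results barely change across the candidates.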

So, once again, yet another forecast that is identical in results to all other forecasts.

***

Chone: .070, n=611 hitters


#46    Tangotiger      2008/03/14 (Fri) @ 09:21

How did Chone do at the various reliability levels of Marcel?

For the high-reliability players, Chone was off by .060 and the medium, Chone was off by .066.  A match to Marcel.

It’s on the low-reliability players where Chone did some good: average difference of .089.  If Chone had forecasted league average for all players, he’d have been off by .095.

There were, however, 42 players that Chone has a forecast for which Marcel had no forecast (that is, guys with zero MLB experience).  Chone was off by an average of .111 OPS.  If Chone had simply presumed league average for those non-Marcel forecasted players, he would have been off by only .098 OPS points.

That is, by Chone going out of its way to forecast non-MLB players, he made his overall forecast worse.

What undoubtedly happens with these players is that even though Chone’s MLEs may say that the player SHOULD perform at a certain level, managers will simply go based on the limited PA they start to accumulate as rookies, and if they don’t perform, will simply drop them.

For rookies who DID have prior MLB experience, managers seem to give those players a longer leash, as evidenced by Chone soundly beating Marcel on the low-reliability players.  However, those players only accounted for 14% of all PA.

So, what you have is that for 83% of the players (MLB veterans), Marcel and Chone are neck-and-neck.  For the 14% of players that are rookies or sophs with prior MLB experience, Chone soundly beats Marcel (i.e., MLEs help).  For the 3% of players that are rookies with zero MLB experience, Marcel trumps Chone (i.e., managers don’t give those players a chance to live up to their MLEs).

All very fascinating to me.


#47    Rally      2008/03/14 (Fri) @ 10:31

Or in this case, live down to their MLEs.

Maybe I should check how well the projections did compared to those players’ 2007 MLEs.


#48    Tangotiger      2008/03/14 (Fri) @ 13:23

Ok, I updated the PECOTA player lists.  I now have 546 players mapped out of 611.  The other 65 players either didn’t get a PECOTA forecast, or I didn’t map them.  The “unforecasted or unmapped” players represent less than 2% of all PA, easily the fewest of unforecasted players among the forecasters.  As you can imagine, even if there’s a couple that I didn’t map, it won’t make any difference.

Anyway, of the mapped players of 2007, their average error was .068.  Including the missing players the usual way gets us a final of .070.

Again, that figure is rounded; the unrounded value is .0696, putting it in a virtual deadlock with Chone and the Community.

If every forecaster out there will agree to the following statement, I will stop this now seemingly pointless exercise:

I, as a professional forecaster, concede that when it comes to forecasting the rate stats of hitters, I provide very little value compared to Marcel The Monkey and The Community.

Pete Palmer has said as much in an email.  Rally has said as much in various posts.  MGL, Nate, Ron Shandler, Gassko, Sackmann, Tippett, Bill James: do you concede the point?  Show of hands please.

***

PECOTA: .070, n=611 hitters


#49    Tangotiger      2008/03/14 (Fri) @ 14:14

Among the 180 guys that Marcel has a high reliability on, PECOTA’s error was .061.  For the medium reliability players, PECOTA’s error was .063.  A virtual match to Marcel.

Among the low-reliability players, PECOTA was off by .090.  If PECOTA had simply guessed league average for all those players, their error would have been .093.  Basically, PECOTA provided barely any extra value above that league-average guess.

For the 58 players that Marcel did not forecast, but that PECOTA did, they were off by .112.  If they had forecasted league average, they’d have been off by .107.

Once again, the MLEs, minor league information, and whatever else PECOTA does, provided no new insights.

Sorry guys, Nate, Rally, MGL, et al.  I just don’t see it.


#50    Tangotiger      2008/03/14 (Fri) @ 14:25

I’ve got only 464 players mapped for Shandler, so I need to work on that one a little more.  Of these guys, the average difference is .071.

7% of MLB PA were either unforecasted or unmapped.  Doing what we normally do, the total for the 611 players is an average error of .075.  This was the worst of the group so far.

***

Shandler: .075, n=611.


#51    Tangotiger      2008/03/14 (Fri) @ 14:45

By the way, if anyone has any issue with what I’m doing, I don’t want to hear it on some other site, “Tango did that, yeah, but...”

This is the right (two-way) forum to engage in the issue.  I will go to any (two-way) forum to engage in the issue.


#52    Rally      2008/03/14 (Fri) @ 14:55

That’s quite a massive tie going on.

I am down with the statement that CHONE, and everyone else’s projections, provide no value beyond Marcel.

But to the extent that Marcel beats or ties us with the minor league players, that is pure selective sampling.  You have probably made this point but I think it needs to be repeated.

Once we limit our sample to players who earn major league playing time, Marcel is as good as the others.  But Marcel cannot tell you before the season starts which minor leaguer to take in your fantasy draft.  Marcel knows no difference between Evan Longoria, Jay Bruce, and Rubi Koko (and might prefer the one with a monkey’s name).

The forecasting of minor leaguers leaves a lot to be desired; our error rates are much higher than for established players.  Last year I had Alex Gordon rated ahead of Ryan Braun.

Yeah, if you just want to minimize error for players who earn PT, go with the league average.  But that won’t help you decide who to promote, who to take in the Rule V draft, or, for fantasy guys, who to grab in a keeper league.


#53    Tangotiger      2008/03/14 (Fri) @ 15:20

No arguing here.

However, I want to add that what you are offering is not a good MLE in terms of matching rate stats.  Like I said, you could just go with the MLB average there.

What you do offer is playing time expectations.  While Marcel will give no opinion at all as to whether Braun or anyone else will get a lot of playing time, your MLEs will point toward that.  A guy with a good MLE will have more playing time than a guy with a bad MLE.

So, my point was based strictly on the rate stats, and not on the playing time.  And only for hitters.


#54    Colin Wyers      2008/03/14 (Fri) @ 15:22

Right; Marcels wins because of selective sampling, not because it’s providing any particularly valuable insight on players without a lot of service time. For the low-reliability players, what we really ought to be doing is looking at how good the forecasting systems are at predicting who will/should be GETTING that playing time.

I’ll sketch this out real quick - I’m at work so I have almost no data available. But basically, take your low reliability player population, as well as any players Marcels didn’t forecast at all, including players in the high minors who didn’t get promoted. If PECOTA, CHONE, etc. are doing their jobs right, you should see some correlation between the players who received playing time at the major league level and the players that the forecasters saw doing well.
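One way to score this idea is a point-biserial correlation: the Pearson correlation between each prospect's forecast and a 0/1 flag for whether he got major league playing time. A sketch under that assumption; the function name and data are hypothetical, and a real study would also need the high-minors players who never got promoted.

```python
import math


def point_biserial(forecast_ops, got_playing_time):
    """Pearson correlation between forecast OPS and a 0/1 playing-time flag.
    Positive values mean higher forecasts went with getting MLB time."""
    n = len(forecast_ops)
    ys = [1.0 if g else 0.0 for g in got_playing_time]
    mx = sum(forecast_ops) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(forecast_ops, ys))
    vx = sum((x - mx) ** 2 for x in forecast_ops)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical: the two prospects with the best forecasts got the call.
r = point_biserial([0.650, 0.700, 0.750, 0.800], [False, False, True, True])
```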


#55    Rally      2008/03/14 (Fri) @ 15:27

I don’t think I’m offering playing time projections, at least for minor leaguers.  For major leaguers, yes.  It’s based on how often they have played in the past.

For minor leaguers it’s the same formula, but just because you see some A ball player with 400 AB in his CHONE does not mean I’m projecting he’ll get that much major league playing time, or any major league time for that matter.

And of course roles of players change as they move up.  The 500 AB minor leaguer who is good enough becomes a 150 AB backup.


#56    Tangotiger      (see all posts) 2008/03/14 (Fri) @ 15:34

The Community did a great job at figuring the depth chart for their teams.

Seeing how well the Community did at forecasting the rate stats (#1 by a nose), and how well they did in getting the playing time right, I’m not sure that any single forecaster can beat the collective knowledge of those fans (in the short term).

After all, the fans know who is in the plans for their team in 2008, whether those players deserve to be in there or not.


#57    tangotiger      (see all posts) 2008/03/19 (Wed) @ 19:07

I’ve got 420 players mapped for Bill James. Of these guys, the average difference is .067.

10% of MLB PA were either unforecasted or unmapped.  Doing what we normally do, the total for the 611 players is an average error of .071.  This ties him with Pete Palmer, and puts him just a shade behind the leading group: Marcel, Chone, Pecota, Community.

***

James: .071, n=611.

***

Question: why would anyone pay for the Bill James, PECOTA and Shandler forecasts, if Marcel and Chone are free, and the Community forecasts are just one effort away?


#58    tangotiger      (see all posts) 2008/03/19 (Wed) @ 19:18

ZIPS: 523 players forecasted and mapped, for an average error of .068.

The other 88 players totalled only 3% of all PA, as low a share as PECOTA’s.  Doing as we do, we get .070 as the diff.

***

Zips: .070, n=611.

Add Zips to the list of leaders.  (MGL is also in this group.)


#59    Tangotiger      (see all posts) 2008/03/19 (Wed) @ 22:15

I changed my mind again about how to value the unforecasted players.  I put in a blanket .700 OPS rather than league average.  It makes a sliver of a difference for PECOTA, ZIPS, and Chone (to their advantage), because they forecasted a lot more players.  Here then are my semi-final semi-official results:

diff Forecaster
0.069 Pecota
0.070 Chone
0.070 Zips
0.071 Marcel
0.071 Fans
0.071 MGL
0.072 James
0.073 Palmer
0.075 Shandler


#60    Colin Wyers      (see all posts) 2008/03/19 (Wed) @ 23:08

“Question: why would anyone pay for the Bill James, PECOTA and Shandler forecasts, if Marcel and Chone are free, and the Community forecasts are just one effort away?”

Branding and presentation, mostly. All of the forecasts you mention come in nicely bound books, with glossy covers and recognizable brand names on the front. Surely a CSV file available on a website for free can’t be as good as a book I paid good money for, right? Right?

It’s the price placebo effect in action: http://www.washingtonpost.com/wp-dyn/content/article/2008/03/16/AR2008031602168.html?hpid=news-col-blog


#61    tangotiger      (see all posts) 2008/03/20 (Thu) @ 06:56

That’s interesting.  We actually priced our book with that in mind.  We couldn’t price our book too low, as it might be considered “cheap”.

Similarly, did you notice that Marcel gets short shrift on other websites?  Do you know why?  Because it is NOT a black box!  The more secret the system, the more highly regarded it is!

I can put out the “Tangotiger Forecasts”, and they’d be a smash hit, as long as I don’t tell the people that it’s simply the Marcels, repackaged.


#62    Rally      (see all posts) 2008/03/20 (Thu) @ 09:46

I buy Shandler’s book not for the forecasts, but for the whole package.  He’s got some interesting ideas in there.  What really made me buy this year was a 5 year injury log.  I have quite a bit of experience working with safety & health statistics, so I was planning on entering that into a database and seeing what I come up with.

Not sure if I’ll ever get the work done, but I’d like to merge it with my retrosheet files and come up with an expected range of missed games by injury type and how injury types affect player performance.


#63    Tangotiger      (see all posts) 2008/03/20 (Thu) @ 10:41

Injuries: Yes, that would be good to do.  Maybe next year, I’ll do another Community one, but limiting it to the “depth chart” that I did, and ignore the OPS forecast.

***

Here is the entire set of Forecasts that I have mapped:
http://spreadsheets.google.com/pub?key=pkimQBCeCjbhAy9XfD57rZA

If someone wants to fill the holes (if any), you can contact me offlist at:
tom ~ tangotiger ~ net
(replace the obvious characters)

The DATA sheet is the 611 nonpitchers who batted in 2007.

The TOTALS sheet is sorted from best to worst (but note how very close everyone is).  The headings mean this:
Forecaster: the data as supplied to me by various anonymous donors

n1: number of players with a forecast that was mapped

totPA: number of PA of all players forecasted that were mapped

diffOPS: the average absolute difference between forecasted OPS and actual OPS (using the method in post 4)

n2: number of players with no forecast (or if forecasted, I didn’t have him mapped)

missOPS: the actual average OPS of the guys with no forecast

missPA: the total number of PA in 2007 for guys not forecasted

n3: total number of nonpitchers batting (611 for everyone)

totPlusMissPA: total number of nonpitchers batting PA (same for everyone)

totPlusMissOPS: forced in an implied .700 OPS for the nonforecasted batters, and included this with the “diffOPS” column; this is the FINAL result that shows the accuracy
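The exact recipe from post 4 isn't reproduced in this excerpt, so here is a minimal sketch of the totPlusMissOPS idea under one stated assumption: each batter's absolute OPS error is weighted by his actual 2007 PA, and anyone unforecasted or unmapped is scored as if the forecast had been the blanket .700. The `Batter` class and `tot_plus_miss_ops` function are hypothetical names, not part of the spreadsheet.

```python
from dataclasses import dataclass
from typing import List, Optional

FILL_OPS = 0.700  # blanket OPS assumed for batters with no (mapped) forecast


@dataclass
class Batter:
    pa: int                          # actual 2007 plate appearances
    actual_ops: float                # actual 2007 OPS
    forecast_ops: Optional[float]    # None if unforecasted or unmapped


def tot_plus_miss_ops(batters: List[Batter]) -> float:
    """PA-weighted mean absolute OPS error, filling gaps with FILL_OPS.

    The PA weighting is an assumption made for illustration.
    """
    total_pa = sum(b.pa for b in batters)
    weighted_err = 0.0
    for b in batters:
        forecast = b.forecast_ops if b.forecast_ops is not None else FILL_OPS
        weighted_err += b.pa * abs(forecast - b.actual_ops)
    return weighted_err / total_pa


batters = [
    Batter(600, 0.850, 0.820),   # forecasted, off by .030
    Batter(300, 0.700, 0.760),   # forecasted, off by .060
    Batter(100, 0.650, None),    # unforecasted, scored against .700
]
print(f"{tot_plus_miss_ops(batters):.3f}")  # 0.041
```

Under this weighting, a forecaster who covers more players only gains if his forecasts for the extra guys beat the flat .700, which is consistent with the note below that PECOTA, Chone, and ZIPS benefited from covering the most players.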

Congratulations to PECOTA, Chone, and ZIPS for essentially ending up in a 3-way tie (finished within .001 from the leader).

Marcel, the Fans, and MGL also finished very close to the lead, just .002 behind the leader.

Bill James, Pete Palmer, and Ron Shandler bring up the rear.

I have to point out that the biggest reason that PECOTA, Chone, and ZIPS led is that they forecasted the largest number of players.

***

The ONLY way I’m going to repeat the analysis next year is if forecasts are accompanied with either the Lahman ID, or the Retro ID.  It is a monumental time waster otherwise.


#64    Tangotiger      (see all posts) 2008/03/20 (Thu) @ 12:11

If I focus solely on the 440 batters that the “Big 6” all forecasted (meaning James, Palmer, Shandler have some missing forecasts), you get the following errors:

totPlusMissOPS Forecaster
0.0641 Chone
0.0642 Pecota
0.0648 Fans
0.0651 Zips
0.0653 Marcel
0.0663 MGL
0.0667 James
0.0672 Palmer
0.0690 Shandler

As you can see, Chone takes the slimmest of leads over Pecota, and the Fans make their way into the big 3.

Zips makes the top 3 in the final rankings published a post earlier because Dan made forecasts for more players than anyone else around.  And the Fans dropped a tiny smidge because they didn’t forecast enough players.

***

So, there are 3 things to evaluate a forecaster:
1. Rate stats for the guys with MLB experience
2. Rate stats for rookies
3. Playing time / depth chart

In evaluating #1, we can see that there’s little separating the top 5.  If you include #2, then Chone/Zips/Pecota step up a bit.  But, for #3, it’s likely the Fans take the spot here.

So, in terms of Fantasy forecasts, if you have to go with only 1, I’d go with the Fans.  If I’m allowed to merge the playing time forecasts of Fans, with the rate stats of the other systems, some combo of Pecota, Chone, Zips would be in order.

***

Here are the results if we look at the 172 players that Marcel had a low reliability on (reliability less than 0.60), but had played in MLB prior to 2007:

totPlusMissOPS Forecaster n1
0.088 MGL 171
0.089 Shandler 126
0.090 Chone 141
0.090 Zips 165
0.092 Pecota 147
0.093 Fans 120
0.096 James 78
0.097 Marcel 172
0.100 Palmer 91

MGL comes out on top here.  So, of the 172 players who Marcel didn’t trust (because of limited playing time), and did not debut in 2007, MGL provided forecasts for 171 of them.  And he was off by .088.  Marcel shows how bad he is with an error of .097.  A purely random forecast would be very close to that. 

We see here that MGL and Shandler bubble to near the top, as they used data (minor leagues) that Marcel simply ignores. 

***

And of the 86 players who made their debut in 2007:
totPlusMissOPS Forecaster n1
0.113 Zips 61
0.114 Pecota 58
0.115 Chone 42
0.119 Fans 25

Basically, a total crapshoot.

Note that the other forecasters had little to no mappings.

Which reminds me: I gotta go back and update MGL’s mappings.  I never updated his mappings for Ryan Braun et al.


#65    Tangotiger      (see all posts) 2008/03/20 (Thu) @ 12:21

Ok, I mapped MGL’s minor league players.  He actually now leads with most players mapped of all forecasters, missing less than 1% of all PA in 2007.

It doesn’t change the overall results much from the Google Docs I published:
totPlusMissOPS Forecaster
0.0692 Pecota
0.0699 Chone
0.0702 Zips
0.0707 Marcel
0.0707 MGL
0.0709 Fans
0.0717 James
0.0730 Palmer
0.0747 Shandler


#66    Tangotiger      (see all posts) 2008/03/20 (Thu) @ 12:23

Updating post 64 for forecasts of debut players of 2007:
totPlusMissOPS Forecaster n1
0.113 Zips 61
0.114 Pecota 58
0.115 Chone 42
0.117 MGL 69
0.119 Fans 25

As you can see, there’s very little that you can do here to beat the Fans, or just plain random.


#67    MGL      (see all posts) 2008/03/21 (Fri) @ 03:57

Nice (humbling I should say) work Tango!  Surely, a combination of everyone should do better than the best one.  Any work on that?

I forgot, does Marcel do an age adjustment?  How does it do that?  What is the peak age it assumes and how much is the delta per year around that peak age?

I gotta say (as an excuse) that one reason I don’t do as well as some of the other systems is that I do my forecasts by first neutralizing everyone’s raw stats.  Then I de-neutralize them for park only. I do that only because my forecasting is to see who are the best and worst players in a context-neutral environment.  I do NOT do forecasting per se.  If I just did the forecasting with no neutralizing except for players who change teams and/or leagues, I would do better.

Oh well.  Maybe next year, I’ll do a pure forecast.  I would surely do better than I do now.  For example, one of the neutralizing things I do is to adjust for opponents.  When I do a forecast for Tango, I don’t “de-opponent” the final numbers.  If I was just doing a forecast (and not stripping out any context), I would not do anything with opponent strength (again, unless someone changed teams).  I would simply assume that the pool of opponents in one year is roughly the same as that of the next year.

What I am trying to say is that I don’t actually do any forecasting.  I create a database of all MLB, AAA and AA players and do a “true talent” forecast.  Tango wanted to get some “real” forecasts (which is obviously what they are actually going to do, given their home park, average opponent, defense for the pitchers, etc.), so I simply took my database and put SOME of the context back in. That is NOT the way to do a forecast, so I am actually surprised that I came out as well as I did.  Then again, as Tango keeps emphasizing, anything reasonable is going to yield about the same results anyway.


#68    Rally      (see all posts) 2008/03/21 (Fri) @ 08:30

That’s what I do, neutralize the stats for park and league (though not for average strength of opponent beyond that).  Then re-neutralize to the new park and league.

If a player is staying in the same park, all that neutralizing/de-neutralizing will have a net effect of zero, or else you are doing something wrong.


#69    Tangotiger      (see all posts) 2008/03/21 (Fri) @ 08:46

MGL, when I redo by taking the average of all the forecasts, the average trumps all the forecasts, with an average error of .0678, compared to PECOTA at .0692 and Chone at .0699.  The gap between the first place and second place is as large as the gap between 2nd and 6th place (PECOTA, Chone, Zips, Marcel, MGL), with the Fans just missing that cut.
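A minimal sketch of the “average of all the forecasts” idea, assuming a simple unweighted mean over whichever forecasters covered each player (the post doesn't say whether any weighting was used). The `consensus` function and the toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def consensus(forecasts: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average every available forecast for each player.

    forecasts maps forecaster name -> {player_id: forecast OPS}.
    A player is included if at least one forecaster covered him,
    which is why the group's coverage exceeds any single system's.
    """
    by_player: Dict[str, List[float]] = {}
    for per_player in forecasts.values():
        for player_id, ops in per_player.items():
            by_player.setdefault(player_id, []).append(ops)
    return {pid: mean(vals) for pid, vals in by_player.items()}


toy = {
    "Marcel": {"a": 0.800, "b": 0.700},
    "Chone": {"a": 0.820},
    "Fans": {"a": 0.830, "b": 0.740},
}
print(consensus(toy))
```

Averaging tends to cancel out each system's idiosyncratic misses, which is one plausible reason the blend beats every individual forecaster here.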

The group also forecasted a total of 609 of the 611 hitters.


#70    Tangotiger      (see all posts) 2008/03/21 (Fri) @ 12:56

Spreadsheet updated with MGL’s data, plus the “average” as mentioned above:

http://spreadsheets.google.com/pub?key=pkimQBCeCjbj3i-RBLTi1Fg


#71    Tangotiger      (see all posts) 2008/03/21 (Fri) @ 13:58

Here’s another one:
http://spreadsheets.google.com/pub?key=pkimQBCeCjbiWeeUJr9S9XQ

This time, I broke up the players by “reliability”.  As a reminder, the high reliability guys are those with lots of career MLB PA, the low reliability have very few, and the medium are in-between.  Each group has roughly the same number of hitters.  The fourth group are those who made their debut in 2007.

Explanation of the additional columns:
rel_class: see above paragraph

n: total number of players in that rel_class (same for all forecasters)

n1: total number of players FORECASTED in that rel_class (for the high reliability class, most forecasters provided forecasts for virtually all the players)

gainOPS: randOPS - diffOPS

randOPS: if forecast was exactly league average for each and every player, how much would you have been off?

diffOPS: difference between actual and forecast

***

So, the gainOPS tells you how much of an improvement the forecast is, over just guessing league average for each player.
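To make the gainOPS definition concrete, here is a sketch that scores a set of forecasts against the naive “league average for everyone” baseline. As in the other sketches, weighting errors by actual PA is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def weighted_mae(pred: Sequence[float], actual: Sequence[float],
                 pa: Sequence[int]) -> float:
    """Mean absolute error weighted by actual PA (assumed weighting)."""
    return sum(w * abs(p - a) for p, a, w in zip(pred, actual, pa)) / sum(pa)


def gain_ops(forecast: Sequence[float], actual: Sequence[float],
             pa: Sequence[int], league_ops: float) -> float:
    """gainOPS = randOPS - diffOPS.

    randOPS: error if every player had been forecast at league average.
    diffOPS: error of the actual forecasts.
    A negative result means the forecast was worse than guessing
    league average for each and every player.
    """
    rand_ops = weighted_mae([league_ops] * len(actual), actual, pa)
    diff_ops = weighted_mae(forecast, actual, pa)
    return rand_ops - diff_ops


print(f"{gain_ops([0.800, 0.700], [0.850, 0.650], [500, 500], 0.750):.3f}")  # 0.050
```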

The number 1 group of all forecasts was Marcel, with the high_reliability group.  Marcel was better than the rest (by a smidge of course).  Basically, whatever extra information Chone, PECOTA et al consider (even switching of parks!) is simply not worth it.  Shandler and MGL bring up the rear on the high reliability players.

The number 1 forecast of the “medium reliability” group was the Fans.  Shandler and James bring up the rear here.

The number 1 forecast for the “low reliability” group was MGL, pretty much in a walk.  Marcel and Palmer were less than useless, as a forecast of pure league average for each and every player would have been better!

The number 1 forecast for players making their debut in 2007 was… The Fans!  And it wasn’t even close.  However, they only evaluated 25 players.

Among those forecasters who attempted to forecast a large group of debut players (PECOTA, MGL, Zips, Chone), PECOTA was the leader.  However, in each case, they would have been better off forecasting league average for each and every player.


#72    Colin Wyers      (see all posts) 2008/03/21 (Fri) @ 14:54

Tom, is there a tendency for certain projection systems to be consistently off in a certain direction?  I guess you would say “optimistic” or “pessimistic.”


#73    Rally      (see all posts) 2008/03/21 (Fri) @ 15:09

Doh - Last place with the debut players.  Bill James was the leader there (though he didn’t forecast as many as PECOTA did.) And Bill James’ forecasts for rookie players are usually considered wildly optimistic.

I’ll take that last place as something to be proud of.  Because of the selective sampling involved, the way to get the best score for the debut players is to be very generous on your MLE translations.  The flip side is that by doing so you will also predict a great many minor leaguers to have good major league numbers the next year.  Most of them won’t play at all, or play very little, but that won’t be held against you.


#74    tangotiger      (see all posts) 2008/03/21 (Fri) @ 16:56

Right.  Like I said, Marcel’s official forecast for any player that is not in my file is league average OPS.  There’s no point for me to put out a file with an extra 1000 names, if they each have the exact same batting line.

The problem is that we are NOT trying to forecast OPS, but OPS above replacement times PA.  So, that PA becomes critical. 

In Marcel’s case, the forecast for any player not in my dataset is something like 50 PA or 100 PA.  So, essentially, if his WAR is +2.25 wins per 700 PA, his WAR over 100 PA is about +0.3 wins.

Chone however will note the greatness of say Ryan Braun, will forecast a higher PA, will presume his depth chart is stronger, and might give him a WAR of say +1.5 wins.

The conversion of WAR into Fantasy dollars might be, say:
Fantasy $ = WAR * 7 + 1

So, a 2 WAR player would have a Fantasy $ of $15 (presuming it’s a 30-team league).
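The arithmetic above in a small sketch: prorate a per-700-PA WAR rate to a playing-time forecast, then apply the illustrative $ = WAR * 7 + 1 conversion. Both function names are hypothetical, and the conversion is the rough example from this post, not a calibrated pricing model.

```python
def prorate_war(war_per_700: float, pa: float) -> float:
    """Scale a WAR rate (per 700 PA) to a playing-time forecast."""
    return war_per_700 * pa / 700


def fantasy_dollars(war: float) -> float:
    """Illustrative conversion from the post: Fantasy $ = WAR * 7 + 1."""
    return war * 7 + 1


print(fantasy_dollars(2))                 # 15
print(round(prorate_war(2.25, 100), 2))   # 0.32
```

The point of the exercise: the PA forecast drives the dollar value at least as much as the rate forecast does, since a great rate over 50-100 PA still prices out at only a couple of dollars.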

In Marcel’s case therefore, every player not noted would be worth say $2.  Chone will make a distinction, putting some at $1 and others even higher, much higher.

And if we have that, then Marcel would eat Chone’s dust.

But, the entire key behind this is knowing the depth chart. It’s the single most important thing to know.  And no one can beat the Community over this, as who else other than a Brewers fan would be as knowledgeable about their plans for Braun?

So, a combination of Chone/ZIPS/PECOTA for the debut2008 players, with the Community depth charts should be the ones to provide the “deadly forecast”.


#75    tangotiger      (see all posts) 2008/03/21 (Fri) @ 16:59

Colin, until I also do the pitchers, I won’t know.  Right now, Bill James had the highest expected OPS.  That might be optimistic, if and only if his pitchers’ forecasted ERAs aren’t at the same level.

I consider a forecaster “optimistic” if the Pythagorean win percentage implied by its OPS and ERA forecasts is above .500.

Marcel, by definition, is exactly at .500.  So, while Marcel forecasted OPS a bit higher than actual, we’ll see that it forecasted ERA a bit higher than actual too.


#76    Colin Wyers      (see all posts) 2008/03/21 (Fri) @ 17:56

Should clarify - I meant OPS above/below average for the debut players. Sorry, we’re getting all-new systems in at work, so I’m putting in a lot of extra hours. I think it’s starting to show a bit in the “sleep” category.


#77    tangotiger      (see all posts) 2008/03/21 (Fri) @ 18:35

Colin, I can’t check at the moment, but the complete data is in post 70.  Feel free to report back.

The “debut2007” players are ALL the players that Marcel didn’t forecast.  By definition, Marcel will forecast anyone with at least 1 PA in the last 3 years.

Note that Rick Ankiel, because he counted as a pitcher prior to 2007, will show up as a “Debut2007” player.  So, “Debut2007” is really anyone who didn’t have a PA as a position player between 2004 and 2006.


#78    MGL      (see all posts) 2008/03/23 (Sun) @ 05:39

Tango, how do you compute the average error?  Is it just difference in OPS weighted by the number of actual PA?  Do you include all players or is there a min number of PA? 

Another way to look at (evaluate) the forecasters is to look at how they did in intervals of OPS.  First you take all their sub-.700 OPS guys, figure a weighted (by actual PA) OPS average, and then compare that to the actual weighted OPS of those same players.  Then do the same thing with .700-.725 projections.  Same for .726-.750, etc.  That would be interesting, I would think.  There are a lot of other ways you could rate and compare forecasters that would be a lot more interesting and intuitively meaningful than “average OPS error.”  If I told someone that I was off by an average of .068 OPS points per player in my OPS projections, and asked them how they think I did, I think they would have to answer, “Huh?”


#79    tangotiger      (see all posts) 2008/03/23 (Sun) @ 08:06

Post 4 has the exact details. 

I think the most intuitive way would be to do head-to-head.  Say we take Marcel v MGL: compare their differences to actual.  If both are within 20 points of actual, that’s a tie.  Otherwise, one guy gets a win, and the other a loss.

Nate Silver did this a few months ago.  A vast majority of the forecasts ended in a tie.

It should be noted that since Marcel’s official forecast for players not forecasted is league average, Marcel will likely win the debut2007 players (selective sampling).


#80    tangotiger      (see all posts) 2008/03/23 (Sun) @ 08:15

I should say, within 20 points of actual, or 20 points of each other, then it’s a tie.

So, an actual of .800 OPS means that a .780 forecast for one and an .810 for the other is a tie.

Or, an actual of .800 OPS, with a forecast of .900 for one and .915 for the other is also a tie.

20 points of OPS is roughly 4 runs.
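A sketch of the head-to-head scoring described in posts 79-80: a player is a tie if both forecasts are within 20 OPS points of actual, or within 20 points of each other; otherwise the closer forecast wins. Working in integer OPS points sidesteps float edge cases (.800 - .780 is not exactly .020 in floating point). Treating exactly equal errors as a tie is an added assumption; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

TIE_MARGIN_PTS = 20  # 20 OPS points, i.e. .020


def pts(ops: float) -> int:
    """Convert an OPS like .780 to integer points (780)."""
    return round(ops * 1000)


def head_to_head(actual: float, fc_a: float, fc_b: float) -> str:
    """Score one player: 'A', 'B', or 'tie'."""
    err_a = abs(pts(fc_a) - pts(actual))
    err_b = abs(pts(fc_b) - pts(actual))
    both_close = err_a <= TIE_MARGIN_PTS and err_b <= TIE_MARGIN_PTS
    forecasts_close = abs(pts(fc_a) - pts(fc_b)) <= TIE_MARGIN_PTS
    if both_close or forecasts_close or err_a == err_b:
        return "tie"
    return "A" if err_a < err_b else "B"


def season_record(rows: Iterable[Tuple[float, float, float]]) -> Counter:
    """rows: (actual, forecast_a, forecast_b) per player; returns W/L/T tally."""
    return Counter(head_to_head(act, a, b) for act, a, b in rows)


print(head_to_head(0.800, 0.780, 0.810))  # tie
print(head_to_head(0.800, 0.900, 0.915))  # tie
print(head_to_head(0.800, 0.830, 0.700))  # A
```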


#81    Tangotiger      (see all posts) 2008/03/24 (Mon) @ 09:38

Colin asked about the “optimistic” forecasts or not.  For the high reliability forecasts, the average forecast was 28 OPS points above average.  In reality, these players performed at 22 points above average.

MGL/78: this ties in somewhat to what you are asking for.  The high reliability guys are the MLB veterans, and we can see here that the average forecaster doesn’t regress these guys enough (we can’t tell here if it’s an age-based issue, or quality-based issue, or whatnot).

Marcel for example rated these 180 veterans as forecasting 20 OPS points above average.  They were actually 22 OPS points above average.  So, in this case, you might say that Marcel over-regressed.  This makes some sense, since Marcel only considered 3 years of data.  Clearly, there’s no reason not to use the 4th, 5th, 6th years.  Gives you more knowledge, and the likelihood is that these 180 veterans have such years of service.

So, here’s how smart everyone was in regression for MLB veterans:
#1 MGL: 21.3 OPS points above average forecast (compared to the 22 actual)
#2 PECOTA: 22.8 OPS points above average forecast

As you can see here, MGL slightly over-regressed, while PECOTA slightly under-regressed.  It is only 180 players, so we can’t make any definitive statements here.

#3 Palmer: 20.9 OPS points above average forecast.  Slight over-regress.

#4 Marcel: 20 OPS points above average forecast.  A bit too much regression.

Then, there’s a chasm.

An over or under-regression of 4-5 OPS points:
#5 James: 17.7 OPS points above average ... big over-regression

#6 Chone: forecasted OPS of 27.1 points above average… big under-regression

#7 Fans: 27.2 points above average

Finally, an enormous lack of regression:

#8 ZIPS: 30.7 points above average (when the average actual performance of the 180 players was 22 points above average)

#9 Shandler: 32.0 points

***

Note: I compare each forecaster’s forecast to their own overall average forecast.
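A sketch of that comparison for one reliability group: the group's mean forecast is measured against the forecaster's own overall mean forecast, the group's mean actual against the overall actual mean, and the sizes are compared. If the forecast is more extreme than reality, the system under-regressed; if less extreme, it over-regressed. Using unweighted means, and the overall actual mean as the baseline, are assumptions; `regression_check` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence, Tuple


def regression_check(group_fc: Sequence[float], all_fc: Sequence[float],
                     group_actual: Sequence[float],
                     all_actual: Sequence[float]) -> Tuple[float, float, str]:
    """Compare how far a group sits from average, forecast vs actual."""
    fc_rel = mean(group_fc) - mean(all_fc)              # vs forecaster's own average
    act_rel = mean(group_actual) - mean(all_actual)     # vs actual average
    if abs(fc_rel) > abs(act_rel):
        verdict = "under-regressed"   # forecast too extreme
    elif abs(fc_rel) < abs(act_rel):
        verdict = "over-regressed"    # forecast too timid
    else:
        verdict = "on target"
    return fc_rel, act_rel, verdict


fc_rel, act_rel, verdict = regression_check(
    [0.830, 0.810], [0.830, 0.810, 0.700, 0.740],
    [0.800, 0.790], [0.800, 0.790, 0.720, 0.730],
)
print(verdict)  # under-regressed
```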

***

So, ZIPS and Shandler have big problems in my view, as being overly optimistic.  Chone and the Fans are too optimistic.  James is overly pessimistic.

Palmer, MGL, and PECOTA obviously use a similar basis as Marcel for regression.

***

How about the “medium reliability” players?  There are 173 players in this group, who can be considered guys with some MLB experience, but not a whole lot.

The average forecaster forecasted this group to be 19 OPS points below average, and they were in fact 16 points below average.  So, not enough regression.  But, let’s see who did what.

First off, nobody regressed enough.  But, the guys who came closest were:

#1 MGL: forecasted 17 points below average, and were in fact 16 points below average.  MGL knows regression.  As does…

#2 Fans: forecasted 20.2 below average compared to actual of 18.3 below average.

In this case, I have 173 players for MGL, but only 153 for the Fans, which is why the comparison point of 18.3 is different than MGL’s 16 points.

#3 PECOTA: forecasted 18 points below average compared to 15.3

#4 CHONE: forecasted 20.0 compared to actual of 16.6

#5 ZIPS: forecasted 19.4, compared to actual of 16.0

#6 Palmer: 24.0 compared to 17.9

#7 Marcel: 22.5 compared to 16.0.  So, Marcel does not regress enough, which makes sense, since in comparison, it regressed the veterans too much.

#8 James: 25.3 compared to 15.4.  Way not enough regression, the opposite of its problem with the veterans.

#9 Shandler: 30.1 compared to 16.2. 

Once again, Shandler isn’t regressing toward the mean enough. 

He had the veterans at +32 OPS points compared to their actual performance of +22 OPS points, and he had these medium MLB guys at -30 OPS points compared to their actual of -16.  As you can see, the regression amount is just plain missing.  I’m not even sure that Shandler is using any regression actually.

***

How about the “low reliability” players, those with basically just a bit of MLB playing time, the bench or sophs?

Incredibly, the overall average was spot-on: the 172 players were forecasted at being 38 points below average and performed in fact at 35 points below average.  But, big difference in individual forecasters.

The top 4 forecasters, all within 2 to 5 points (over or under) of the actual, were:
James, Pecota, Palmer, MGL

It’s clear that MGL and PECOTA are the king of regression here.

The next set, at 10-13 points off, are Chone (not enough regression), Zips (not enough regression), and Marcel (way over-regressed).  Marcel forecasted these players at just 22 points worse than average, when they in fact performed at 35 points worse than average.

It’s clear to me a very quick way to fix Marcel: give a different regression point based on career PA.  Rather than regressing everyone toward the same league mean, regress the guys with 1800 PA in the last 3 years to a higher OPS than guys with 180 PA in the last 3 years.

Man, I really, really, really, don’t want to change Marcel.  But, this seems to be such an easy correction.  Let me think about it…
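One way the proposed fix could look, as a hypothetical sketch only: keep a reliability-style blend, but make the regression target drop below league average as recent PA shrink. The constants `k`, `penalty`, and `full_pa` are illustrative assumptions, not Marcel's actual values, and `observed_ops` stands in for whatever weighted 3-year OPS the system produces.

```python
def regressed_ops(observed_ops: float, pa: float, league_ops: float,
                  k: float = 1200.0, penalty: float = 0.040,
                  full_pa: float = 1800.0) -> float:
    """Hypothetical tweak: regress toward a PA-dependent target.

    k, penalty and full_pa are illustrative assumptions.
    """
    reliability = pa / (pa + k)
    # Fewer recent PA -> target sits further below league average,
    # by up to `penalty` for a player with almost no PA.
    shortfall = 1 - min(pa, full_pa) / full_pa
    target = league_ops - penalty * shortfall
    return reliability * observed_ops + (1 - reliability) * target


print(round(regressed_ops(0.850, 1800, 0.750), 3))  # 0.81
print(round(regressed_ops(0.750, 180, 0.750), 3))   # 0.719
```

With this shape, the 1800-PA regular is regressed toward the league mean exactly as before, while the 180-PA bench guy is pulled toward a lower target, which is the direction the low-reliability results above suggest.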

Dead last is Shandler, with a forecast of 55 points below average, when they performed at 27 points below average.

***

Finally, the guys who made their debut in 2007.

The average forecaster forecasted an OPS of 47 points below average… makes sense right?  But, these rookies performed at only 14 points below average.  That’s why Marcel can get by in forecasting league average.

I’ve only got 5 forecasters with enough sample size, and, once again, the one who is closest was MGL, forecasting 19 points below average against an actual of 13 points below average.

The far-too-pessimistic forecasters were:
PECOTA (-21 compared to -2)
Fans (-77 to -48)
Zips (-50 to -5)
Chone (-48 to +8)

***

So, for this little exercise, I’m going to proclaim MGL the king of regression, and Shandler as seemingly eschewing regression altogether.

***

What’s interesting to me is that even though all these forecasting systems have such a different basis for regression, they all come out so close to each other in the end!


#82    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:17

Similar to the reliability groupings, let’s see how the forecasters did by quality of batters.  I’m using the Marcel forecast as the baseline to tell us the quality of batters.

I have 23 batters forecasted by Marcel with an OPS of at least .900.  So, I look at those 23 players, and see how each of the forecasters forecasted them AS A GROUP. 

Marcel forecasted those 23 batters to be +.147 OPS above league average.  In fact, they were +.164.  So, Marcel was too conservative with these guys.

However, among the 35 batters between an OPS forecast of .850 to .900, Marcel forecasted +.070, but were in fact only +.047.  So, Marcel was too optimistic here. 

If you combine the two (58 players), it’s +.100 forecasted, and +.094 actual.  Pretty good.

Among the 47 batters with an OPS forecasted of under .700, Marcel forecasted an OPS of -.119 relative to league, and in fact were -.118.  So, great job here.

For all those in-between, Marcel was dead-on.  So, we see that Marcel has the right regression amount, more or less.
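A sketch of the grouping used in this and the following posts: bin each batter by his Marcel OPS forecast into .050-wide classes labelled by midpoint (.675 for under .700, .925 for .900 and up), then compare each class's mean forecast and mean actual, both relative to league average. Unweighted means are an assumption (MGL's post 78 suggested PA weighting), and the function names and toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def quality_class(marcel_ops: float) -> float:
    """Map a Marcel OPS forecast to a class midpoint (.675 ... .925)."""
    points = round(marcel_ops * 1000)              # integer points avoid float edge cases
    idx = min(max((points - 700) // 50, -1), 4)    # -1 = under .700, 4 = .900 and up
    return (700 + 50 * idx + 25) / 1000


def summarize(rows: Iterable[Tuple[float, float, float]],
              league_ops: float) -> Dict[float, Tuple[int, float, float]]:
    """rows: (marcel_ops, forecast_ops, actual_ops) per batter.

    Returns {class: (n, forecast above average, actual above average)}.
    """
    groups: Dict[float, List[Tuple[float, float]]] = defaultdict(list)
    for marcel_ops, fc, act in rows:
        groups[quality_class(marcel_ops)].append((fc, act))
    table = {}
    for cls in sorted(groups):
        pairs = groups[cls]
        fc_aa = mean(fc for fc, _ in pairs) - league_ops
        act_aa = mean(act for _, act in pairs) - league_ops
        table[cls] = (len(pairs), round(fc_aa, 3), round(act_aa, 3))
    return table


rows = [(0.950, 0.940, 0.960), (0.910, 0.920, 0.880), (0.680, 0.690, 0.660)]
print(summarize(rows, league_ops=0.750))
```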

How’d the others do?  Gimme a minute…


#83    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:29

Here’s Chone:
range n1 forecast actual
0.675 44 -0.113 -0.115
0.725 122 -0.059 -0.072
0.775 170 -0.022 -0.020
0.825 95 0.014 0.028
0.875 35 0.071 0.047
0.925 23 0.161 0.164

“range” of .775 implies a Marcel-forecasted OPS of “.750 to .800”.  These 170 players were forecasted to be 22 points below average, but were in fact 20 points below average.

As you can see, Chone nailed the 23 great hitters and, like Marcel (see post 82), was flummoxed by the next set of 35 great hitters.

Excellent job overall.


#84    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:32

PECOTA:

range n1 forecast actual
0.675 43 -0.137 -0.116
0.725 121 -0.070 -0.070
0.775 175 -0.018 -0.019
0.825 96 0.019 0.028
0.875 35 0.076 0.047
0.925 23 0.151 0.164

Like the other 2, flummoxed by the 35 second-set of great hitters.

Among the great hitters, too conservative, just like Marcel. 

Way too pessimistic among the bad hitters.  Possible that PECOTA’s aging curve is too aggressive for the bad hitters.

Not as good as Chone, and just a step behind Marcel.  So: Good.


#85    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:37

MGL:

range n1 forecast actual
0.675 47 -0.117 -0.118
0.725 134 -0.065 -0.071
0.775 189 -0.022 -0.019
0.825 96 0.013 0.028
0.875 35 0.065 0.047
0.925 23 0.182 0.164

Somewhat similar to Marcel, but too aggressive on the great hitters.

Overall: Pretty good, a step behind Chone, a step ahead of PECOTA.


#86    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:41

Shandler:

range n1 forecast actual
0.675 43 -0.144 -0.118
0.725 116 -0.080 -0.071
0.775 159 -0.030 -0.018
0.825 91 0.021 0.030
0.875 35 0.092 0.047
0.925 23 0.173 0.164

Did great on the great hitters, too strong on the second-set of great hitters.  And far too aggressively pessimistic on the bad hitters.

I really like Ron, and he was good to us when we launched The Book.  So I hope he doesn’t take this personally.  But, I have some doubts about his forecasting system.


#87    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:43

Bill James:

range n1 forecast actual
0.675 37 -0.139 -0.118
0.725 98 -0.079 -0.069
0.775 136 -0.026 -0.018
0.825 88 0.018 0.030
0.875 35 0.074 0.047
0.925 23 0.161 0.164

Nails the great hitters, pessimistic on the bad ones.  Overall, a shade behind PECOTA.


#88    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:45

Zips:

range n1 forecast actual
0.675 45 -0.136 -0.117
0.725 131 -0.068 -0.072
0.775 185 -0.023 -0.020
0.825 96 0.021 0.028
0.875 35 0.083 0.047
0.925 23 0.172 0.164

Similar range as Bill James.


#89    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:48

Community:

range n1 forecast actual
0.675 38 -0.142 -0.125
0.725 104 -0.075 -0.067
0.775 154 -0.032 -0.022
0.825 95 0.025 0.025
0.875 35 0.087 0.047
0.925 23 0.163 0.164

Fantastic on the great hitters, a bit too pessimistic on the bad hitters.  Overall, similar to PECOTA.


#90    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:49

Pete Palmer:

range n1 forecast actual
0.675 39 -0.116 -0.123
0.725 110 -0.063 -0.073
0.775 137 -0.022 -0.019
0.825 89 0.017 0.029
0.875 35 0.065 0.047
0.925 23 0.130 0.164

Seems too conservative among the good hitters.  Overall, a bit like Bill James.


#91    Tangotiger      (see all posts) 2008/04/01 (Tue) @ 14:55

Overall:

If I take the absolute difference of each quality class, and add them up, I get the following order:

58 Chone
61 MGL

73 PECOTA
76 Community
79 Zips
81 James
85 Palmer

109 Shandler

So, Chone is the best, followed very closely by MGL.  Shandler is way at the bottom, and everyone else is in the middle.
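The overall score is simple enough to check by hand: sum |forecast - actual| across the six quality classes, unweighted by class size, and express it in OPS points. This sketch (with a hypothetical `class_score` name) reproduces Chone's 58 and MGL's 61 from their tables in posts 83 and 85. A few other totals differ by a point or two from the rounded table values, presumably because the originals were computed before rounding.

```python
from typing import Sequence, Tuple


def class_score(table: Sequence[Tuple[float, float]]) -> int:
    """Sum |forecast - actual| over quality classes, in OPS points."""
    return round(1000 * sum(abs(fc - act) for fc, act in table))


# (forecast, actual) per class, .675 through .925, copied from the tables above
chone = [(-0.113, -0.115), (-0.059, -0.072), (-0.022, -0.020),
         (0.014, 0.028), (0.071, 0.047), (0.161, 0.164)]
mgl = [(-0.117, -0.118), (-0.065, -0.071), (-0.022, -0.019),
       (0.013, 0.028), (0.065, 0.047), (0.182, 0.164)]

print(class_score(chone))  # 58
print(class_score(mgl))    # 61
```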

Note: I excluded Marcel to avoid possibly biasing the rankings, since I used the Marcel forecast to decide which group each player goes to.  If that didn’t bias the results, Marcel’s total would be .055 (55 on the scale above), making it the best of the group.

***

It seems clear to me that when I group by “reliability” and by “quality”, that MGL is the best one at regression.  Chone is very strong when we focus on the quality level of the player, while PECOTA is very strong when we focus on the sample size of the player.

Marcel is among this class.

Shandler has his own forecasting system that seems to ignore regression.

The other systems are a bit weak on the regression front.


#92    Tom Meagher      (see all posts) 2008/04/01 (Tue) @ 20:22

Sorry if this double posts, but it didn’t go through.

Didn’t Shandler explicitly say he cheats on his forecasts - that is, he tweaks the numbers to “convince” fantasy players to draft/avoid players by giving them exaggerated lines?

OK, I found the article.
http://www.baseballhq.com/books/myths.shtml

I understand his point, but if he really wants to show us his stuff works, shouldn’t he release the ‘real’ projections in addition to the fudged ones?

From the article:
“‘For any player, what is the one piece of information that is far more important than the most accurate projection?’ That information is how the other owners in your league value that player. If you know that, and have a sense of a player’s potential, it doesn’t matter a whit how accurate your projections are.

“So our track record is not necessarily built on any given level of prognosticating accuracy. Our track record is built on a series of analytical tools and a decision-making process that has led to success in playing this game. And since your ultimate goal is to fare better in your fantasy competitions, I see this all as a justifiable means to an end.

“I’m not publishing deliberately inaccurate projections. I’m just taking a potential reality from an upper or lower decile, based on strong underlying indicators, and engaging in a bit of behavior modification. If you are offended by the psychological implications, I apologize. If you now consider me a sabermetric hack, I’ve been called worse. But the users of this information seem to be winning their leagues so I’ll accept the baggage that comes along with it.”

I don’t buy that the evidence is on Shandler’s side here, although I have no conviction either way. They can easily claim that their approach has generated a lot of success and hence business/reputation, but I don’t see the evidence that this is because they’re willing to manually adjust the numbers (that is what he’s saying they do, right?). If his actual projections don’t do better than the other projection systems, how exactly does that give the fantasy player an edge?

Now, it could be that they are right about the reasons they choose to alter the forecast from the strict formula. That is, if they know their customers will themselves be making manual adjustments, and if they know that their manual adjustments will be more fruitful, then they’re onto something. I’ve never played fantasy baseball, but I suspect that I would not benefit particularly from Shandler’s alterations because I simply refuse to incorporate information non-systematically. In other words, Shandler may be helping (though the evidence is missing, to my knowledge) players who would otherwise let their biases screw up their valuation, but someone who won’t make ad hoc adjustments doesn’t figure to benefit.

I don’t mean any of this to knock their methods or innovations, which I have no reason to assail. I wonder, though, if the reason their customers are (supposedly) successful is because they understand the underlying rationale and methods better than players who prefer other services, and not because Shandler et al. are the best in the business at countering their customers’ supposed irrationality.

There may also be a refutation of Shandler based on the fact that the fans’ forecast did better, but I’m not ready to connect the dots.


#93    tangotiger      (see all posts) 2008/04/02 (Wed) @ 06:07

He can’t claim “seem to be winning” without analyzing it.

In fact, I think perhaps the best way to test this is to have a 10-man Fantasy league (several of them), where each player relies EXCLUSIVELY on the forecasts of ONE system, and selects based on that.

In fact, I can program this, by forcing the same selection algorithm on every team, so that the only variable left is the forecasting system.

I could set up 10! (3,628,800) different leagues, where each league has a different ordering of draft position for the 10 forecasting systems.

Maybe we can do this for the 2009 season.
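A rough sketch of that simulation, assuming a snake draft in which every team takes the best remaining player according to its own system's projected value, and each league is won by the roster with the highest actual value. The function names, the snake format, and the toy projections below are hypothetical. With 10 systems, `itertools.permutations` would enumerate all 3,628,800 draft orders; the example uses three systems to keep it small.

```python
from collections import Counter
from itertools import permutations


def run_draft(order, projections, players, rounds):
    """Snake draft: each team takes the best remaining player by its own system."""
    available = set(players)
    rosters = {team: [] for team in order}
    for r in range(rounds):
        picking = order if r % 2 == 0 else order[::-1]  # reverse every other round
        for team in picking:
            proj = projections[team]
            pick = max(available, key=lambda p: (proj[p], p))  # name breaks ties
            available.remove(pick)
            rosters[team].append(pick)
    return rosters


def simulate_leagues(projections, actual, rounds):
    """One league per draft order; the best actual roster wins (ties split)."""
    wins = Counter()
    for order in permutations(projections):
        rosters = run_draft(order, projections, actual, rounds)
        scores = {t: sum(actual[p] for p in roster) for t, roster in rosters.items()}
        best = max(scores.values())
        winners = [t for t, s in scores.items() if s == best]
        for t in winners:
            wins[t] += 1 / len(winners)
    return wins


# Hypothetical player values; SysX happens to match the actual values exactly.
ACTUAL = {"A": 10, "B": 8, "C": 6, "D": 5, "E": 3, "F": 1}
PROJECTIONS = {
    "SysX": {"A": 10, "B": 8, "C": 6, "D": 5, "E": 3, "F": 1},
    "SysY": {"A": 9, "B": 6, "C": 7, "D": 2, "E": 4, "F": 3},
    "SysZ": {"A": 5, "B": 9, "C": 4, "D": 6, "E": 8, "F": 2},
}

print(simulate_leagues(PROJECTIONS, ACTUAL, rounds=2))
```

Because every system drafts from every slot exactly as often as every other, draft position drops out and the win shares reflect only the projections.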


#94    Rally      (see all posts) 2008/04/02 (Wed) @ 08:22

I can’t make any claim that the CHONE system will win you any fantasy leagues; after finishing 1st in 2004/2005, I’ve dropped out of the money the last 2 years.  Maybe my problem is that anyone in my league can see my projections.


#95    Tangotiger      (see all posts) 2008/04/02 (Wed) @ 09:02

Frankly, I don’t think any of them will win any more often than the others.  Maybe Chone will win 12%, and James/Palmer will win 8%, and the rest 9%-11%.  I think that’s pretty much it.

You see, it’s not even important whether Marcel forecasts rookies like last year’s Ryan Braun or Tulowitzki if you are in a draft league.  Or Longoria/Bruce or whoever else this year.  Let’s say that Tulo should have been valued at $15 coming into 2007.  But, 2 of the teams had him at $9, 7 of them had him at $0 and one had him at $15.

How much will he go for?  Well, you don’t know what the other 9 guys have him valued at.  You may HOPE that they didn’t value him at $15.  So, when you get to the $14-$16 draft rounds, you have to start deciding if the other guys have him at $15 or not.  Likely the most you are going to do is figure that someone else has him valued at $12, so you pick him up in the $13 draft round.

Marcel has completely ignored him, and it doesn’t matter.  While Tulo, a $15 player, was picked up in the $13 round, Marcel is drafting a $13 player in that round.  Big deal.

In an auction, it makes a much bigger difference.  But, as long as you have one other guy who is close to valuing Tulo correctly, Marcel can stay out of the bidding, content that the rest of the teams will prevent a $15 player from being bought for $5.
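The auction point can be illustrated with a simple ascending-auction model, which is an assumption rather than anything stated in the post: bidding stops once only one team is still willing to pay, so the winner pays about one increment over the runner-up's valuation, capped at its own. With the valuations from the Tulo example ($15, two at $9, seven at $0), the $15 team gets him for $10. Move one of the $9 teams up to $14 and the price rises to $15, whether or not the zero-valuing "Marcel" team bids at all. Team names are hypothetical.

```python
def auction_result(valuations, increment=1):
    """Ascending auction: the winner pays the runner-up's valuation plus one
    increment, never more than its own valuation."""
    ranked = sorted(valuations.items(), key=lambda kv: kv[1], reverse=True)
    (winner, top), (_, second) = ranked[0], ranked[1]
    return winner, min(top, second + increment)


# Valuations from the Tulo example; "Marcel" is one of the seven $0 teams.
valuations = {"Team1": 15, "Team2": 9, "Team3": 9}
valuations.update({f"Team{i}": 0 for i in range(4, 10)})
valuations["Marcel"] = 0

print(auction_result(valuations))  # ('Team1', 10)

# One more team close to the correct value pushes the price up to $15 ...
closer = dict(valuations, Team2=14)
print(auction_result(closer))  # ('Team1', 15)

# ... and Marcel sitting out changes nothing.
without_marcel = {t: v for t, v in closer.items() if t != "Marcel"}
print(auction_result(without_marcel))  # ('Team1', 15)
```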

It’s like the technicals and fundamentals of stocks.  Technicals only work if the fundamentals already work.  The fundamentals tell you how much MSFT and EBAY are worth.  The technicals will trade around that.  If there were no fundamentals to speak of, the technicals would be like trading penny stocks, with no rhyme or reason as to how a stock is priced.  You need some sort of centering, which is the fundamentals.

So, as long as you get a couple of people who come in with an MGL forecast for all players in MLB and minors (the fundamentals), that’s enough to force a decent price, and lets Marcel stay afloat.


#96    Tangotiger      (see all posts) 2008/04/02 (Wed) @ 13:35

The average of each player’s forecast:

range n1 forecast actual
0.675 47 -0.125 -0.118
0.725 135 -0.066 -0.071
0.775 189 -0.022 -0.019
0.825 96 0.022 0.028
0.875 35 0.080 0.047
0.925 23 0.164 0.164

Better than all the forecasting systems.


#97    MGL      (see all posts) 2008/04/03 (Thu) @ 06:18

I don’t get at all from the Shandler quote above that he is “deliberately giving out less accurate projections than he is capable.” That makes no sense whatsoever.  I am not sure exactly what he is saying, other than he is hemming and hawing in case someone points out that his projections were not very good after a season is over ("Well, I said that they were deliberately bad,” or something like that).  My best guess is that he is saying that his projections are not like other ones (Marcel, Pecota, etc.), in that he uses certain “keys” that sometimes or perhaps even often go against typical Marcel-type projections.  If that is the case, which I think it is, clearly it ain’t working.

“I’m not publishing deliberately inaccurate projections. I’m just taking a potential reality from an upper or lower decile, based on strong underlying indicators, and engaging in a bit of behavior modification. If you are offended by the psychological implications, I apologize. If you now consider me a sabermetric hack, I’ve been called worse. But the users of this information seem to be winning their leagues so I’ll accept the baggage that comes along with it.”

I have no idea what this means, which, in and of itself, is not right. If someone writes something ostensibly important about their projection system, and sells it, and it makes little sense, something is wrong.  The bolded part sounds like gobbledygook to me.

Like Tango, I have nothing against Shandler, and he seems like a real nice guy (I met him briefly when I was working for the Cardinals), and probably does a lot of good work. If he reads this blog and would like to explain himself, he is more than welcome of course.

BTW, this is the type of analysis that I think is much better than just giving us the average forecasting error per player (weighted by the number of PA I think).  These results are also an example of how misleading and/or deceptive the “average error” can be, especially when you can forecast just about anything for a player and still do pretty well on “average error.”

I would like to see the above breakdowns for low, medium and high reliability players, and debut players especially.  Before, based on average error, there was the inference that randomly forecasting debut players, or just giving them league average or rookie average forecasts, would have done just as well as or better than most if not all of the forecasters.  I don’t buy that and I doubt that that is the case.

Also, I don’t think Tango, you should use Marcel to determine the classes.  That makes little sense as Marcel could be awful at projections within classes.  You should use classes based on each projection system or based on actual performance. 

For example, you want to look at all of Chone’s projections between .650 and .700 and see how they did.  Then all of his .700 to .750 and see how they did.  etc.

Or, equally good is to use all actual .650 to .700 players, and see how Chone did for that group.  There is absolutely no reason to use Marcel to determine the classes.  Of course, it won’t make much difference what you use to determine the classes, but my way I think makes more sense and is fairer to all forecasters.
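MGL's alternative grouping can be sketched like this: bin players into .050-wide OPS classes (.650-.700 maps to .675, matching the class labels in the tables above) and let the caller choose which field drives the binning: the system's own forecast, the actual result, or a fixed reference forecast such as Marcel's (Tango's approach). The player records and field names are hypothetical, and raw OPS is used here rather than OPS relative to league average.

```python
from collections import defaultdict
from statistics import mean


def quality_class(ops, width_pts=50):
    """Map OPS to the midpoint of its class, e.g. .650-.699 -> .675."""
    pts = round(ops * 1000)                 # .712 -> 712 OPS points
    lower = (pts // width_pts) * width_pts  # 712 -> 700
    return (lower + width_pts / 2) / 1000   # 700 -> .725


def compare_by_class(players, key):
    """Group players by the class of player[key]; per class, return
    (count, mean forecast, mean actual)."""
    groups = defaultdict(list)
    for p in players:
        groups[quality_class(p[key])].append(p)
    return {
        cls: (len(g), mean(p["forecast"] for p in g), mean(p["actual"] for p in g))
        for cls, g in sorted(groups.items())
    }


# Hypothetical records: one system's forecast, the actual OPS, and Marcel's forecast.
PLAYERS = [
    {"forecast": 0.700, "actual": 0.660, "marcel": 0.690},
    {"forecast": 0.670, "actual": 0.720, "marcel": 0.680},
    {"forecast": 0.740, "actual": 0.790, "marcel": 0.760},
    {"forecast": 0.930, "actual": 0.950, "marcel": 0.910},
]

for key in ("forecast", "actual", "marcel"):
    print(key, compare_by_class(PLAYERS, key))
```

Binning by a fixed reference keeps the player groups identical across systems, which is Tango's point in post #100; binning by each system's own forecast is MGL's preference.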

For fantasy leagues, it is WAY more important to have good playing time projections.  Way more.  As we can see, anyone can nail the rate projections.  It is also important to have good projections for rookies and low reliability (little experience) players.  Again, anyone can project what A-Rod and Pujols and Jeter are going to do.  Not so with Rasmus, Longoria, Votto, Soto, etc., all of whom may get significant playing time and perform really well.

The guys that win the fantasy leagues, besides getting lucky of course, can pretty much use anyone’s (reliable ones of course) projections.  If they know how to value players (in draft $) from projections (I have no idea - I’ve never played fantasy baseball), given the type of league, and know how to conduct themselves in a draft, that is about all they can do.  I assume that if they do all that they will have a significant edge over the field regardless of whose projections they use.  Unless almost everyone else in the league is an expert fantasy player as well.


#98          (see all posts) 2008/04/04 (Fri) @ 00:25

Wow, this is terrific stuff and very interesting to read.  Thanks, Tango, for doing all the work on this. 

I used your method over the winter with the forecasts I had for 2007, but with a much smaller population of hitters (I only really cared about the guys I might draft, which ended up being 223 hitters, as opposed to the 600+ that you used).

My results:

All Projections Combined .065
Marcel .065
PECOTA .066
ZiPS .067
CHONE .068
Fantasy Sports Central .068
Bill James .068
Rototimes .069
Player’s OPS in 2006 .077

The “All Projections Combined” entry came out on top, but it’s very close. 
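A minimal sketch of an "All Projections Combined" entry and its scoring, assuming the combined forecast is a simple per-player mean across systems and the error metric is mean absolute OPS error (optionally weighted, e.g. by PA, as MGL guesses in post #97). The exact method behind the numbers above isn't stated, and the three-player data set is made up; it only shows how averaging can cancel errors that fall on opposite sides.

```python
from statistics import mean


def combine(forecasts_by_system):
    """Per-player mean across systems, limited to players every system projects."""
    common = set.intersection(*(set(f) for f in forecasts_by_system.values()))
    return {p: mean(f[p] for f in forecasts_by_system.values()) for p in common}


def mean_abs_error(forecast, actual, weights=None):
    """Mean |forecast - actual| over shared players, optionally weighted (e.g. PA)."""
    players = [p for p in forecast if p in actual]
    if weights is None:
        weights = {p: 1.0 for p in players}
    total = sum(weights[p] for p in players)
    return sum(weights[p] * abs(forecast[p] - actual[p]) for p in players) / total


# Hypothetical OPS forecasts and results.
SYSTEMS = {
    "SysA": {"p1": 0.800, "p2": 0.700, "p3": 0.900},
    "SysB": {"p1": 0.760, "p2": 0.740, "p3": 0.860},
}
ACTUAL = {"p1": 0.790, "p2": 0.715, "p3": 0.875}

combined = combine(SYSTEMS)
for name, fc in list(SYSTEMS.items()) + [("Combined", combined)]:
    print(f"{name}: {mean_abs_error(fc, ACTUAL):.3f}")
```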

In previous years, I used Shandler’s Forecaster, but stopped using it last year when I discovered that they weren’t including HBPs in figuring On-Base Percentage.  I contacted Shandler about this and he responded that they had intentionally left out HBPs in figuring OBP because they did a study 15 years ago that “showed that there was virtually no correlation between individual player hit batsmen numbers from one season to the next, except for a miniscule subset of players (0.3% I recall).” That bothered me probably much more than it should have and I just stopped buying the book.  Plus, there are plenty of free projections out there and I didn’t see any benefit to buying projections that weren’t any better than the free ones.  That probably sounds too much like I’m bashing Shandler, which I don’t want to do.  I have friends who swear by the Forecaster and Baseball HQ, but for all the hype some projections get, it looks like Marcel the monkey is right there with them.


#99    salb918      (see all posts) 2008/04/17 (Thu) @ 08:32

I’m late to the party, but this was fantastically interesting.  Thanks, Tango.

If the only place to make improvements is with playing time forecasts, wouldn’t it make sense to use Marcel to project rate stats and the community to project playing time?

I know that Nate Silver shows that playing time and performance are coupled, which is probably strictly true.  But decoupling would be much easier and maybe even more accurate!

Last year, I wrote (click on my name):

And yet, when we project variability, we treat performance and playing time together. As Dave Studeman noted on Ballhype, performance variation can enhance or detract from the value of a player. Playing time variation, or injury risk, can only detract. What I propose, in the most general terms, is that performance projection ought to focus on rate statistics (AVG, OBP, GPA, MLVr - pick your favorite) and that a separate projection be developed to account for playing time. I imagine that such a projection would involve:

1. A baseline distribution based on the occurrence of traumatic injuries to all players.
2. A positional factor to account for the type (e.g., collisions for middle infielders, hamstring pulls for outfielders) and frequency (I’m looking at you, catchers) of injury.
3. A specific factor based on a player’s injury history/proclivities.

What I didn’t realize was that the PT projection could just be community-based and would implicitly account for all of those factors!
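The decoupled approach could look something like the sketch below: take per-PA rates from one source (say, Marcel) and games from the community forecast, then multiply. The `pa_per_game` conversion and all the numbers are assumptions for illustration, and the sketch deliberately ignores the coupling between playing time and performance that Nate Silver points out.

```python
from dataclasses import dataclass


@dataclass
class RateProjection:
    """Per-PA rates from a rate-only system (hypothetical values below)."""
    hr_per_pa: float
    bb_per_pa: float


def project_counting_stats(rates: RateProjection, community_games: float,
                           pa_per_game: float = 4.2) -> dict:
    """Multiply rates by a separately forecast playing time.

    pa_per_game is an assumed games-to-PA conversion, not a measured value.
    """
    pa = community_games * pa_per_game
    return {"PA": pa, "HR": rates.hr_per_pa * pa, "BB": rates.bb_per_pa * pa}


rates = RateProjection(hr_per_pa=0.05, bb_per_pa=0.10)
print(project_counting_stats(rates, community_games=150))
```

Keeping the two inputs separate means the playing-time side can absorb injury and role information from the community ballots without touching the rate model.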


#100    Tangotiger      (see all posts) 2008/04/17 (Thu) @ 08:54

MGL/97:

I would like to see the above breakdowns for low, medium and high reliability players, and debut players especially.

This was answered in the very long post #81, where I concluded:

So, for this little exercise, I’m going to proclaim MGL the king of regression, and Shandler as seemingly eschewing regression altogether.

***

Also, I don’t think Tango, you should use Marcel to determine the classes.  That makes little sense as Marcel could be awful at projections within classes.  You should use classes based on each projection system or based on actual performance.

As I must have written elsewhere, it won’t really matter what I do, considering how well Marcel matches up to the rest.  Regardless though, you get a huge advantage: every forecasting system is being compared against the same group of players.  All I’m looking for is a reasonable group of “high quality players”.  They do *not* have to be the highest of quality, as per each system.  All you really need is a reasonable set of high quality and low quality. 

Really, this would come out to answering the question: “Hey, I think t