Tuesday, March 04, 2008
Community Forecast, 2007 - Preliminary Results
Here’s the Google Doc of the Community Forecast for last year (2007), for hitters. Here’s how to read the chart:
Player: Albert Pujols (playerName), pujolal01 (LahmanID), 405395 (MLB.com/Elias playerID)
Forecast: 29 ballots (n1) averaged forecast OPS of 1.101; 32 ballots (n2) averaged forecast of 154 games; appeared on 94% of Cardinals ballots
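To make the Forecast line concrete, here is a minimal sketch of how one player's row could be summarized from individual ballots. The Ballot structure, the summarize function, and its field names are hypothetical illustrations, not the actual layout of the spreadsheet. The sketch assumes n1 and n2 differ because a fan can list a player without giving every forecast, and that the team percentage is the share of that team's ballots that list the player at all.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Ballot:
    """One fan's entry for a single player (hypothetical structure)."""
    ops: Optional[float] = None   # forecast OPS, if the fan gave one
    games: Optional[int] = None   # forecast games played, if the fan gave one


def summarize(entries: List[Ballot], team_ballot_count: int) -> dict:
    """Build one row: n1 and average OPS, n2 and average games, % of team ballots."""
    ops = [b.ops for b in entries if b.ops is not None]
    games = [b.games for b in entries if b.games is not None]
    return {
        "n1": len(ops),                                   # ballots with an OPS forecast
        "ops": round(mean(ops), 3) if ops else None,      # averaged OPS forecast
        "n2": len(games),                                 # ballots with a games forecast
        "games": round(mean(games)) if games else None,   # averaged games forecast
        # Assumption: appearing on a ballot means being listed at all.
        "pct_of_team_ballots": round(100 * len(entries) / team_ballot_count),
    }


# Toy example (made-up numbers): three of four team ballots list the player.
row = summarize([Ballot(1.100, 155), Ballot(1.050, 150), Ballot(None, 160)], 4)
print(row)
```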
And here’s the same for pitchers. I added the teamid, and there’s also a “depth” column, which is really just a sort order. Each of the five new columns (Starter, AceReliever, Setup_or_Swing, Mopup, Callup) is the percentage of all the ballots cast on which the fan thought the pitcher would be used in that role.
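As an illustration of how those five usage columns could be tallied, here is a small sketch. The usage_shares function is hypothetical, and so is the denominator: by default it divides by the number of ballots that assigned the pitcher a role, but if the sheet actually divides by all of the team's ballots, pass that count as total_ballots.

```python
from collections import Counter
from typing import Dict, List, Optional

# The five usage roles named in the pitcher sheet.
ROLES = ("Starter", "AceReliever", "Setup_or_Swing", "Mopup", "Callup")


def usage_shares(roles_on_ballots: List[str],
                 total_ballots: Optional[int] = None) -> Dict[str, float]:
    """Percent of ballots that assigned this pitcher to each usage role.

    Assumption: the denominator defaults to the number of ballots that gave
    the pitcher a role; pass total_ballots to divide by a different count.
    """
    unknown = set(roles_on_ballots) - set(ROLES)
    if unknown:
        raise ValueError(f"unrecognized roles: {sorted(unknown)}")
    if total_ballots is None:
        total_ballots = len(roles_on_ballots)
    counts = Counter(roles_on_ballots)
    return {
        role: (100.0 * counts[role] / total_ballots if total_ballots else 0.0)
        for role in ROLES
    }


# Toy example: three fans see a starter, one sees a swingman.
print(usage_shares(["Starter", "Starter", "Starter", "Setup_or_Swing"]))
```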
I did substantial data cleanup, but I may have more to do. That’s why I’m calling this one preliminary. But a first-pass look seems reasonable, and I doubt any further cleanup will change much. If you spot anything irregular (like all of a team’s players being too low), let me know.
I’d love it if someone out there did a study here.
I added the pitchers’ data as well (see above).
***
I’m hoping someone out there does a good study here. Here are a few ideas:
1. We regress the Saves counts a lot in Marcel, simply because we are not sure who will be the closer. I would guess that having the Community tell you who the closer will be gives you an extra parameter to help in the regression.
2. The usage pattern of the pitchers (starting rotation, swingman, callup, etc.) will give you a better IP point to regress to. Same idea for hitters, where you have a better regression point for games played (it implicitly gives you the injury history of the player).
3. How delusional is the Community? The ERA forecasts for the ace relievers are super low. Does the Community simply not understand regression? How much do you have to regress their ERA and OPS forecasts? (e.g., with Marcel, Chone, PECOTA, when you regress actual performance on the forecasts, the “slope” will be extremely close to 1… I don’t think that’s the case with the Community.)
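For idea 3, here is a minimal sketch of the slope test: an ordinary least-squares fit of actual performance on the forecasts, whose slope tells you how much to pull a forecast back toward the mean. The function names are hypothetical, and the numbers in the example are toy values, not Community data. A slope of 1 means no extra regression is needed; a slope of 0.5 means keeping only half of each forecast's distance from the mean.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(forecasts: Sequence[float],
             actuals: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of actual = intercept + slope * forecast; returns (slope, intercept)."""
    if len(forecasts) != len(actuals) or len(forecasts) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    fx, ay = mean(forecasts), mean(actuals)
    sxy = sum((f - fx) * (a - ay) for f, a in zip(forecasts, actuals))
    sxx = sum((f - fx) ** 2 for f in forecasts)
    if sxx == 0:
        raise ValueError("forecasts have no spread; slope is undefined")
    slope = sxy / sxx
    return slope, ay - slope * fx


def regress_forecast(forecast: float, slope: float, intercept: float) -> float:
    """Apply the fitted line, shrinking the forecast toward the mean when slope < 1."""
    return intercept + slope * forecast


# Toy data: forecasts spread twice as widely as what actually happened.
slope, intercept = fit_line([0.700, 0.800, 0.900], [0.740, 0.790, 0.840])
print(round(slope, 3), round(regress_forecast(0.900, slope, intercept), 3))
```

Running the same fit on the Marcel-style forecasts and on the Community forecasts for the same players would show whether the Community’s slope sits noticeably below 1.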