Great, Tango. A wonderful public service.
Questions: no park or league adjustments, right? Also, the pitching file has a “league” column, but the batting file doesn’t. Why is that, and what does “league” mean in this case?
I’ll have to look at my code, but I think the reason I did that was because AL pitchers face more DHs than NL pitchers, which gives NL pitchers an advantage. Any shift from NL to AL would mean an increase in ERA.
Of course, I don’t handle league adjustments beyond that, so if the AL is even stronger than the NL, aside from the DH, then the ERA should be even worse. But this particular problem will exist with hitters too.
For league adjustments, I think it’s simply that I use the league stats for regression toward the mean (for pitchers only I think). For hitters, I regress everyone to the same mean. I think. I gotta open up my code to see exactly what I did.
No, no park adjustments.
I introduced Marcels here three years ago, and the code hasn’t changed at all:
http://www.tangotiger.net/archives/stud0346.shtml
One other question: it looks like your projected ERA is based on both projected earned runs and projected base earned runs (an average of the two). Is that right? If so, are projected earned runs simply projected from the past using your formula, and are projected base earned runs based on projected components? If so, which ones?
Thanks, Tango.
Right, as per the last line in post #28 in the archive thread.
Right, the ER is simply based on the standard Marcel equation, and the BaseRuns ER uses a “basic” version that looks only at H, HR, BB, and IP. I don’t have it handy at the moment, but the B equation is something simple along the lines of (H+HR+.1BB).
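To make the averaging concrete, here is a minimal sketch of a “basic” BaseRuns estimate built only from H, HR, BB, and IP, plus the simple average of two ER projections. It uses the standard BaseRuns shape A*B/(B+C)+D, but the B term is just the rough (H+HR+.1BB) recalled above, so treat the coefficients as assumptions. The function names are hypothetical, IP is assumed to be true innings (not the 200.1 notation), and the estimate is runs, treated as ER here only for illustration.

```python
def baseruns_runs(h, hr, bb, ip):
    """Basic BaseRuns estimate: A * B / (B + C) + D (assumed coefficients)."""
    a = h + bb - hr          # baserunners who did not homer
    b = h + hr + 0.1 * bb    # advancement term; rough form, an assumption
    c = 3 * ip               # outs, approximated from innings pitched
    d = hr                   # home runs always score
    return a * b / (b + c) + d


def blended_era(marcel_er, baseruns_er, ip):
    """ERA from the simple average of two ER projections."""
    return 9 * (marcel_er + baseruns_er) / 2 / ip


# Hypothetical pitcher line: 200 H, 20 HR, 60 BB in 200 IP.
br = baseruns_runs(200, 20, 60, 200)
print(round(br, 1), round(blended_era(80, br, 200), 2))
```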
When I read comments here:
http://p092.ezboard.com/fbrewersfandemoniumfrm3.showMessageRange?topicID=13719.topic&start=1&stop=99
like this:
probably my least favorite of the projections anyway...Seems most do at least one thing pretty well (either pitching or hitting) and marcel is bottom of the pack on both.
Ugh.
It’s as if people can just invent whatever they want.
First, the intent of marcel is laid out here:
http://www.tangotiger.net/marcel/
And as Rally has shown:
http://lanaheimangelfan.blogspot.com/2006/12/pitcher-projections.html
Marcel was pretty much in the middle of the pack for pitchers. .451 for Pecota, .445 for BIS and .442 for Marcel is pretty much the same thing. (Shandler was the lowest.)
And, as I’ve shown here:
http://www.insidethebook.com/ee/index.php/site/comments/whos_smarter_than_a_monkey/#7
and here:
http://www.insidethebook.com/ee/index.php/site/comments/whos_smarter_than_a_monkey/#12
Anyone who wants to claim that a forecasting system is better than another has hardly a leg to stand on. All forecasting systems should be humbled by how barely better, if at all, they are above Marcel. And Marcel gives all other systems a huge head start by intentionally not using minor league, scouting, or park factors. And yet, it still does extremely well.
***
As for the other comment on the Brewers board about how the sum of all the pitcher wins is way above the number of games (and you’ll find something similar with the players’ AB and H totals running way too high as well): the system intentionally puts a floor of 200 PA on every single batter. If I had 10,000 batters, I’d still give each one at least 200 PA.
This is an extremely basic forecasting system, and it’s not trying to figure out who will make the 25-man roster. It doesn’t know, so it bumps everyone to at least 200 PA.
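A minimal sketch of that playing-time rule, assuming the formula from the Marcel page (.5 × last season’s PA + .1 × the season before + 200). Because the 200 is added rather than applied as a max(), even a batter with no recent PA is projected for 200, which is why summed team totals run high. The function name is hypothetical.

```python
def marcel_pa(pa_last_year, pa_two_years_ago):
    # .5 and .1 weights on the last two seasons, plus a flat 200 PA,
    # so every batter ends up projected for at least 200 PA.
    return 0.5 * pa_last_year + 0.1 * pa_two_years_ago + 200


print(marcel_pa(600, 550))  # regular
print(marcel_pa(0, 0))      # no recent playing time still gets 200
```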
A sim like Diamond-Mind would be much better if you want to get technically accurate.
You’ll notice that the majority of your points were already brought up in that thread. There just seem to be a lot of misconceptions about projection systems as a whole. For instance, many get caught up in counting stat predictions based on AB projections. If you think he’ll get more or less ABs, adjust accordingly!
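Adjusting for your own playing-time estimate just means scaling every counting stat by the ratio of your PA to the projected PA; rate stats (AVG, OBP, HR/PA) stay the same. A minimal sketch, assuming a projection stored as a dict with a "PA" key (a hypothetical layout, not the actual file format):

```python
def rescale_counting_stats(projection, your_pa):
    """Scale all counting stats (including PA) to a new playing-time estimate."""
    factor = your_pa / projection["PA"]
    return {stat: value * factor for stat, value in projection.items()}


proj = {"PA": 500, "HR": 20, "BB": 50}
print(rescale_counting_stats(proj, 600))  # HR and BB rise by the same 20%
```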
Post projections from any system on a general baseball board and you’ll see some very similar objections.
What program must be used to read the player forecasts?
You need WinZip or other zip software to unzip, then any plain text editor to read it: Notepad, WordPad, TextPad, even MS Word.
However, Excel will automatically parse it for you.
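If you would rather script it, Python’s standard library can read the archive directly. This sketch assumes the files inside are comma-separated with a header row (which would explain why Excel parses them automatically); check the actual files, since the format and file names here are assumptions.

```python
import csv
import io
import zipfile


def read_forecasts(zip_source):
    """Return {file name: list of row dicts} for every file in a zip archive.

    zip_source may be a path or a binary file-like object.
    Assumes comma-separated text with a header row.
    """
    data = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                data[name] = list(csv.DictReader(text))
    return data
```

Each value is a list of dicts keyed by the header names, so `rows[0]["HR"]` gives the first player’s HR as a string.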
I wanted to note that the Brewerfan.net link Tangotiger provided above has been moved to the site’s Statistical Analysis forum.
New location:
http://p092.ezboard.com/fbrewersfandemoniumfrm21.showMessage?topicID=124.topic
Threads on the site’s Major League board are dropped once the forum becomes ‘full’. The move will preserve the thread for future reference and further discussion down the road.
Tango, should the BB and IBB columns in your Marcels be added together for total BB?
Nope. The columns match exactly to the Lahman or BDB databases. So, “BB” is total walks.
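So if you want unintentional walks, subtract rather than add. A one-line sketch, assuming a row dict keyed by the Lahman-style column names (the helper name is hypothetical):

```python
def unintentional_walks(row):
    # BB is already total walks (IBB included), so subtract, never add.
    return int(row["BB"]) - int(row["IBB"])


print(unintentional_walks({"BB": "80", "IBB": "10"}))
```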
I just noticed that Bill James is extremely high on Bonds:
http://www.fangraphs.com/statss.aspx?playerid=1109
Including an insanely high number of walks.
Each of the forecasting systems also gives Bonds, essentially, a 50/50 shot of breaking Aaron’s record, but James’ forecast gives him a far greater shot.
Tango—I noticed that Marcel doesn’t have a view on Matsuzaka. If it did what would it be? League average?
Correct.
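League average falls straight out of regression toward the mean: a player’s weighted MLB stats are blended with a fixed chunk of league-average performance, and with no MLB data only the league-average chunk remains. A minimal sketch, assuming 5/4/3 season weights and a regression amount of 1200 PA; both are parameters here and should be treated as assumptions rather than the exact Marcel code.

```python
def marcel_rate(stats, pas, league_rate, weights=(5, 4, 3), regress_pa=1200):
    """Weighted, regressed rate per PA (most recent season first).

    Seasons beyond len(weights) are ignored; missing seasons simply drop out.
    """
    num = sum(w * s for w, s in zip(weights, stats))
    den = sum(w * p for w, p in zip(weights, pas))
    # Add regress_pa worth of league-average performance.
    return (num + league_rate * regress_pa) / (den + regress_pa)


print(marcel_rate([], [], 0.03))                    # no MLB history -> league rate
print(marcel_rate([35, 30], [600, 580], 0.03))      # pulled toward .03
```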
DSG brought this up elsewhere, and I should set it in stone here. The issue is: if we weight 2006 more than we weight 2004, then shouldn’t we weight Sep 2006 more than Apr 2006?
YES!
Theoretically, anyway. And this is how I do it:
weight = .9994 ^ (daysAgo)
So, for a game that was played 365 days ago, the weight is .80.
If the game played was yesterday, the weight is .9994. If the game played was 180 days ago, the weight is .90.
As you can see, it won’t really make much of a difference, but, if you have things in game log format, you might as well do it.
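For game-log data, the weighting can be sketched like this. It computes the weight as annual_factor ** (days_ago / 365.25), which is the same as .9994 ** days_ago but without rounding the daily base. The (days_ago, events, opportunities) tuple layout is a hypothetical choice for illustration.

```python
def game_weight(days_ago, annual_factor=0.80):
    """Weight for a game played days_ago days ago (0.80 after one year)."""
    return annual_factor ** (days_ago / 365.25)


def weighted_rate(game_log, annual_factor=0.80):
    """game_log: iterable of (days_ago, events, opportunities) tuples,
    e.g. (3, 1, 4) for 1 HR in 4 PA three days ago."""
    num = den = 0.0
    for days_ago, events, opps in game_log:
        w = game_weight(days_ago, annual_factor)
        num += w * events
        den += w * opps
    return num / den


print(round(game_weight(365), 2), round(game_weight(180), 2))
```

Passing annual_factor=0.70 gives the pitcher version.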
***
Note that the “.9994” is actually:
EXP(LN(0.8)/365.25) = 0.999389253…
The equation I want to solve is:
x^365.25 = .80,
and the above equation solves for x as .9994.
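Written out, with t the number of days ago:

$$x^{365.25} = 0.80 \;\Rightarrow\; x = e^{\ln(0.80)/365.25} \approx 0.999389$$

$$w(t) = x^{t} = 0.80^{\,t/365.25}, \qquad w(365.25) = 0.80, \qquad w(730.5) = 0.80^2 = 0.64$$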
***
The stats from one year ago are weighted at 80%, those from two years ago at 64%, etc.
Feel free to work out the numbers yourself, as maybe .78 or .83 works better than .80.
***
For pitchers, I use .70, not .80. So, in their case, the performance of 180 days ago is weighted at only 0.84.
Again, feel free to work it out yourself, and maybe it’s really .65 or .72 or whatnot.
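The same arithmetic with the pitcher factor:

$$x_p = e^{\ln(0.70)/365.25} \approx 0.99902, \qquad w_p(180) = 0.70^{\,180/365.25} \approx 0.84, \qquad w_p(730.5) = 0.70^2 = 0.49$$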
Thanks for the work and the explanations. A couple of questions (feel free to point me somewhere if you’ve answered this before).
Why are PA estimated in a different way than the other stats? Are the .5 and .1 factors derived from something?
If I were to estimate fielding stats, are they more or less reliable than hitting or pitching? I figured they were somewhere in the middle, so I used a 4/3/2 weighting scheme.
I received two requests from students this week asking for more Marcels.
I have now added the Marcels back to 2001.
It takes about a minute to run each season, but I have to manually set it up. I could spend some time to tweak it to generate the Marcels back to 1958 or so (complete stats start in 1955 I think). I don’t know if it’s worth it, but if I get, I dunno, three of you to say that you want it, then I’ll do it.
Terrific, thank you!