THE BOOK--Playing The Percentages In Baseball

Monday, November 17, 2008

Marcel 2009 is here

By Tangotiger, 05:43 PM

Lots of requests, so here it is

Consider it preliminary until I can give it a once-over.  But, I expect no changes.

As always, don’t repost or redistribute.  Just be nice enough to send whoever wants them to my site.  Bandwidth is not an issue. 

UPDATE: David at Fangraphs has it in an easy-to-use manner.


#1    Ryan      (see all posts) 2008/11/17 (Mon) @ 18:52

Awesome, thanks. Is it cool to talk about specific projections in a blog as long as we link back? Like say we wanted to blog about the Pirates pitching staff?


#2    Tangotiger      (see all posts) 2008/11/17 (Mon) @ 20:31

I should have been clearer.  You can repost whatever you want, except the file in its entirety.

Also, you can repost with bells and whistles, like Fangraphs and Hardball Times do.

Basically, the only thing you can’t do is simply upload the exact file you just downloaded.  Hard to believe, but some people actually did that.


#3          (see all posts) 2008/11/17 (Mon) @ 20:55

I’ve mapped this name/ID set to the current names and IDs at MLB and CBS. I’ll be adding other name/ID sets as I get to them. You can download the file (Excel 07 format) here.


#4    Hyltzn      (see all posts) 2008/11/17 (Mon) @ 21:07

Great. I’ve been waiting for these. Is wOBA park-adjusted in the spreadsheet?


#5    Tangotiger      (see all posts) 2008/11/17 (Mon) @ 23:44

If you know Marcel, you know he doesn’t know anything about parks.


#6          (see all posts) 2008/11/18 (Tue) @ 00:10

The wOBA is only to two decimals - I assumed it would be 3. Is this in error?


#7    terpsfan101      (see all posts) 2008/11/18 (Tue) @ 00:59

It is probably just a rounding issue. I know that Access often truncates decimals to 2 places when you use the export feature. That is why I never use the export feature. I simply copy and paste the table into Excel, and then save it as a spreadsheet or CSV. CSV stores a maximum of 9 decimal places, XLS stores a maximum of 18 decimal places.

It’s interesting that you brought this up, Brian, because the person who provides the Zone-Rating database (he does the CAIRO projections) truncated the zone-rating values to 2 decimal places. ZR is typically presented to 3 decimal places.


#8    Tangotiger      (see all posts) 2008/11/18 (Tue) @ 08:02

What terps said.  I didn’t verify the output.


#9    Terry      (see all posts) 2008/11/18 (Tue) @ 10:19

Thanks Tango. Your work always stimulates the brain but it’s fun as hell too.  I really appreciate your efforts.


#10    Tangotiger      (see all posts) 2008/11/18 (Tue) @ 11:25

Ok, I just checked.  The league average baseline wOBA is .332.  So, when you do your comparisons, compare against that number.

Pitchers are .164, but I don’t provide forecasts for pitchers-as-hitters.

***

As for pitchers, the league ERA is 4.30, and league RA is 4.65.


#11    Tangotiger      (see all posts) 2008/11/18 (Tue) @ 11:51

Actually, Marcel does do pitchers_as_batters, and so, here are the leaders/trailers according to Marcel:
wOBA nameLast nameFirst
0.236 Owings Micah
0.224 Zambrano Carlos
0.215 Willis Dontrelle
0.211 Wainwright Adam
0.202 Backe Brandon
0.202 Peavy Jake
0.193 Parra Manny
0.191 Looper Braden
0.187 Sabathia C.C.
0.186 Eaton Adam
...
0.143 Rodriguez Wandy
0.143 James Chuck
0.142 Snell Ian
0.142 Smoltz John
0.142 Myers Brett
0.141 Bergmann Jay
0.140 Maddux Greg
0.136 Harang Aaron
0.128 Sheets Ben
0.123 Davis Doug

They all get regressed to .164, but, that’s not necessarily a good way to do it.  If you believe that Micah Owings could be a legitimate position player, then you would regress him to .332, not .164. 

That’s why you have to be VEEEEEEEEEEERY careful when regressing toward players of the same position.  For example, when ARod was a SS, some people would regress his stats to those of the average SS, which I always thought was ridiculous.  After all, you look at his body, and you realize he could be a 3B or 1B.  And so, just because he was such a good fielder doesn’t mean you now regress his hitting to such a lower level. 

Same thing here.  Is Micah Owings a legitimate hitter, such that if he were not a pitcher that he would actually be able to pull an Ankiel?

If so, it doesn’t make sense to regress him to .164, but instead to .332.


#12    Tangotiger      (see all posts) 2008/11/18 (Tue) @ 12:07

Ok, I uploaded the file with the better format on the numbers (integers are integers, and wOBA has 3 decimal places).


#13    devil_fingers      (see all posts) 2008/11/18 (Tue) @ 12:09

Tango:

Thanks so much for making this so readily available. I got mine into excel and have it calculating wOBA, bRAA, RV/700 with SB/CS figured in etc. Good times.

You actually brought up something I wanted to ask about: are you saying that if we want to do bRAA we should use a lgWOBA of .332? That seems pretty low, as opposed to the number between .335-.340 you suggest elsewhere. Did I misunderstand the point you’re making above? Why such a low number?


#14          (see all posts) 2008/11/18 (Tue) @ 13:53

I have batting totals by league and position and year which I copied from BPro into a spreadsheet. I have not yet updated with 2008. For 1998-2007 (post expansion) I got the same .332 wOBA for position players, ranging from .312 for C to .357 for 1B.

Right now I list both BRAA compared to all position players, and also to primary position.

Another example is Josh Phelps. He came to the majors as a catcher, then moved to 1b/of because of poor fielding. The Pirates did start him a couple of games at catcher in 2007. My projection for Phelps is .355 wOBA - which is above the average of all players, but just about MLB average for a 1b. If he can catch, his bat becomes more valuable, but you have to subtract off the defensive runs below average. Most teams are unwilling to take the defensive hit, even though he has considerable experience at that position in the past. So, even though Phelps is above league avg, he’s stuck at 1b defensively, where he’s just avg offensively, and therefore a AAAA player, hitting 25-30 HRs in AAA every year (but then there are guys like Adam LaRoche who are no better offensively, but have a job locked up, at least for a while.)


#15    Tangotiger      (see all posts) 2008/11/18 (Tue) @ 14:17

Devil: scoring was down this year, and the average expected for 2009 is .332.  It’s no biggie really, as long as we know what the average is.
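To make the baseline question concrete, here is a minimal sketch of bRAA computed from wOBA. The function name and the wOBA-to-runs scale of 1.15 are assumptions for illustration (the scale is not given in this thread); the .332 default is the 2009 Marcel average quoted above.

```python
def braa(woba, pa, lg_woba=0.332, woba_scale=1.15):
    """Batting runs above average.

    The wOBA gap over the league baseline is divided by woba_scale
    (assumed to be about 1.15) to get runs per PA, then multiplied by PA.
    """
    return (woba - lg_woba) / woba_scale * pa


# A .355 wOBA hitter over 600 PA, measured against the .332 baseline.
runs = braa(0.355, 600)
```

With these inputs the result is about 12 runs. Passing `lg_woba=0.338` instead lowers it by roughly 3 runs, which is why the baseline has to match the forecast set being used.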

***

Fangraphs has the Marcels in an easy to use fashion:
http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=marcel


#16    devil_fingers      (see all posts) 2008/11/18 (Tue) @ 14:54

Tango/15:

OK, that makes sense. Would you use .332 as the 2009 wOBA baseline even with other projections systems (e.g., Zips, CHONE), or only with Marcels because that’s what it spits out for next year?

All of a sudden Adam Dunn is looking like a 2.5 WAR player, which I hadn’t expected given his defensive limitations. But that’s a different thread.


#17    JD      (see all posts) 2008/11/18 (Tue) @ 14:58

Tango,

Never really checked Marcels before so this is a rather elementary question. What is a “good” reliability number? Anything over 50%? 75%? Obviously for some players (Gregorio Petit and his .09 reliability number) the Marcels shouldn’t be looked at too much. But what is the number where I can really start to trust the projection?


#18    Tangotiger      (see all posts) 2008/11/18 (Tue) @ 15:13

This is probably the most relevant post for you:
http://www.insidethebook.com/ee/index.php/site/comments/community_forecast_2007_preliminary_results/#64

(post 64)

Basically, if the reliability is under .60, don’t trust Marcel.  MGL, Shandler and Chone came through here, with ZiPS, PECOTA, and the Fans themselves not far behind.

***

Here’s the pitcher thread:
http://www.insidethebook.com/ee/index.php/site/comments/community_forecast_2007_pitcher_results/


#19    Tangotiger      (see all posts) 2008/11/18 (Tue) @ 15:14

As for wOBA of the other systems, I would say to simply do:
sum(wOBA*PA)/sum(PA)
for each forecaster (maybe limit it to the top 420 hitters BY PA), and use that as the baseline
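A minimal sketch of that baseline calculation, assuming each forecaster's projections are available as (wOBA, PA) pairs. The function name and input layout are hypothetical; the 420-hitter cutoff is the one suggested above.

```python
def baseline_woba(projections, top_n=420):
    """PA-weighted mean wOBA over the top_n hitters by projected PA.

    projections: iterable of (woba, pa) pairs for one forecasting system.
    """
    # Keep only the hitters with the most projected playing time.
    top = sorted(projections, key=lambda p: p[1], reverse=True)[:top_n]
    total_pa = sum(pa for _, pa in top)
    if total_pa == 0:
        raise ValueError("no plate appearances to weight by")
    # sum(wOBA * PA) / sum(PA)
    return sum(woba * pa for woba, pa in top) / total_pa
```

Run it once per system (ZiPS, CHONE, and so on) and compare each system's hitters against that system's own baseline.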


#20          (see all posts) 2008/11/18 (Tue) @ 20:53

Tom, a question about Marcels.

I understand why one weights the player’s last three years by placing more emphasis on the most recent.  My question is, why is this necessary when computing the league average?  Would you get a materially different result if you simply used last year’s league averages for regression purposes?  (Sorry if this is a dumb question.)


#21          (see all posts) 2008/11/20 (Thu) @ 13:39

I got the Oliver batter projections finished and posted. I will update with RetroIDs, they’re in the db. I’ll also add BRAA

http://statspeak.net/2008/11/2009-batter-projections.html


#22    TangoTiger      (see all posts) 2008/11/20 (Thu) @ 14:20

Craig: you won’t get materially different results, but the performance of the player was done in a particular context, and it is toward that context that you need to regress.


#23          (see all posts) 2008/11/20 (Thu) @ 15:44

To Craig’s question - a general principle is to compare observed results to the expected values. If the observed is a weighted mean 5-4-3, then it would be best to construct the expected values in a similar way, representing how many “opportunities” he had in that environment.
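One way to read that principle, as a sketch: weight each season's league average by the same 5-4-3 recency weight times the player's PA in that season. The function name, the PA weighting, and the league rates used in any example are assumptions, not Marcel's published code.

```python
def regression_target(player_pa, league_rate, weights=(5, 4, 3)):
    """League rate weighted the same way as the player's own seasons.

    player_pa:   the player's PA in each season, most recent first.
    league_rate: league-average rate (e.g. wOBA) in those same seasons.
    weights:     recency weights, most recent first (5-4-3 as above).
    """
    # Each season counts in proportion to recency weight * opportunities.
    w = [wt * pa for wt, pa in zip(weights, player_pa)]
    total = sum(w)
    if total == 0:
        raise ValueError("player has no plate appearances")
    return sum(wi * r for wi, r in zip(w, league_rate)) / total
```

A player who only played in the most recent season is regressed entirely toward that season's league average, while a three-year regular gets a blend of all three environments.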


#24    cannatar      (see all posts) 2008/11/21 (Fri) @ 12:01

Brian, does Oliver use different aging curves for different offensive components? Marcel doesn’t, correct?


#25    Tangotiger      (see all posts) 2008/11/21 (Fri) @ 12:26

Marcel does not, correct.

If it did, it would use something like this:
http://tangotiger.net/agepatterns.txt

But, Marcel prefers the K.I.S.S. approach.


#26          (see all posts) 2008/11/21 (Fri) @ 13:03

Yes, each component has its own aging curve.

I compared the multi-year weighted projections with the following year’s single season normalized stats, testing for a bias, plus or minus, in each component, then applied this correction to each individual.

The regression minimized the RMS error; this step should bring the bias to zero.


#27          (see all posts) 2008/11/21 (Fri) @ 19:27

It seems that Fangraphs might have a bug.  You project Jose Reyes to lead all of baseball with 13 triples.  If you sort the shortstops, there he is, but if you sort the “All”, he is not listed.  It might have something to do with the fact that you have projections for two Jose Reyeses.

I checked, and there is only one Alex Gonzalez in the “All” category as well, though there are two in the BattingMarcel.csv . Only one shows up in SS as well.

Must be a same-name thing.

Just thought I’d let you know.


#28    Greg Rybarczyk      (see all posts) 2008/12/03 (Wed) @ 19:33

Adam Dunn has hit 40 home runs 5 years running now.  His Marcel has him projected for 32 homers next year.

Is the difference between 40 and 32 due to a projected drop in playing time, or a drop in performance with comparable playing time?  It can’t be an aging thing, 2009 is his “Age 29” year…

On a related note, I understand that when you “grade” Marcel against other methods, you generally judge each method by how well it does overall.  Have you ever split out the comparisons, and tried to judge how well each method does on say the top 5% hitters and the bottom 5% hitters (and same for pitchers)? 

I ask because at some point on the right-hand tail of the hitting talent distribution, it stops making sense, *to me*, to “cut” the stats of a player like Albert Pujols with some % of league average stats, no matter how small, unless it is on the basis of a projected injury/drop in playing time (and maybe that’s it, I don’t know).


#29    Tangotiger      (see all posts) 2008/12/03 (Wed) @ 20:51

Yes, I have tested Marcel based on how you are saying it, and presented the results on my blog here, along with the competing forecasts.

The amount of regression is equal to 1 minus reliability, which I post with the Marcels.  If Dunn for example has a rel=.85, then 15% of his RATE forecast is league average. 
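As a sketch of that blend: the function name and the observed rate in the example are hypothetical, and any age adjustment is left out.

```python
def regress(observed_rate, reliability, target):
    """Blend the observed rate with the regression target.

    reliability is the weight on the player's own (weighted) rate;
    1 - reliability is the weight on the target, e.g. league average.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return reliability * observed_rate + (1.0 - reliability) * target


# Hypothetical .400 wOBA hitter with rel=.85, regressed toward .332.
forecast = regress(0.400, 0.85, 0.332)
```

Only `target` changes in the Micah Owings discussion earlier in the thread: the same observed rate gives a very different forecast when regressed toward .164 than toward .332.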

The playing time forecast is, 50% of what he played in 2008, 10% in 2007, plus 200 PA. 
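The playing time rule as a one-line sketch (the function name is hypothetical, and the 650 PA inputs in the example are illustrative, not any player's actual totals):

```python
def projected_pa(pa_last_year, pa_two_years_ago):
    """50% of last year's PA, 10% of the year before, plus 200 PA."""
    return 0.5 * pa_last_year + 0.1 * pa_two_years_ago + 200


# An everyday player with 650 PA in each of the last two seasons.
pa_2009 = projected_pa(650, 650)
```

That everyday player projects to about 590 PA, so roughly 9% of his counting stats disappear before any regression of the rate itself.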

So, a lot conspires to bring a player down to size.

Any concern should be addressed by this article:
http://www.hardballtimes.com/main/article/forecasting-2006/


#30    Zach      (see all posts) 2009/02/28 (Sat) @ 23:09

In the pitcher projections, is the “last league pitched in” used to determine the AL/NL ERA, K/9, and BB/9 adjustments, or the player’s current team? I noticed Francisco Rodriguez was projected to have a 3.27 ERA despite switching leagues.


#31    Tangotiger      (see all posts) 2009/03/01 (Sun) @ 08:14

Marcel in 2009 only uses the knowledge through to the last regular season game of 2008.


#32    MGL      (see all posts) 2009/03/01 (Sun) @ 14:37

Greg, regression is regression.  If a player hit 50 HR for 10 straight years, it is still more likely than not that he is NOT a true 50 HR a year player, but a lesser one who got lucky.  That goes even if he hit 50 HR for 100 straight years (if that were possible).  Of course, the more years he hits 50 HR, the less you will regress.

But…

We find that players’ talents change enough that going back more than 4 or 5 years adds nothing to our projections, so you can NEVER have a large sample size for a player.  Therefore a player with 50 HR for 10 years will regress around as much as a player with 50 HR for 5 years. That should not be true if players did not change each year, but it is because they do.

In any case, it is a simple task to test “classes” of these regressions and projections, like a player who hits 40 HR for 5 straight years.  Just use the Lahman database or whatever, and look at all players who hit, say, more than 30 HR for at least 5 straight years.  There will not be a whole lot of course, but enough to get some kind of answer with a fair degree of reliability.  See how many they hit the next year as compared to the average number for those 5 years (either weighted by recency or not) and you have your answer.  If the average over the 5 years is 40, my guess is that in the following year, it is around 34.  Regression.
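A rough sketch of that test, assuming season HR totals have already been pulled into a dictionary keyed by player (for the Lahman Batting table that means summing HR over stints per playerID and yearID first). The function name and input layout are hypothetical.

```python
def next_year_after_streak(hr_by_player, min_hr=30, streak=5):
    """Pair each qualifying streak's average HR with the next season's HR.

    hr_by_player: {player_id: {year: home_runs}}, one total per season.
    A streak qualifies when `streak` consecutive seasons all exceed
    min_hr and the following season is on record.
    Returns a list of (streak_average, next_year_hr) pairs.
    """
    pairs = []
    for seasons in hr_by_player.values():
        for start in sorted(seasons):
            nxt = start + streak
            if nxt not in seasons:
                continue  # no follow-up season to compare against
            years = range(start, nxt)
            if all(seasons.get(y, 0) > min_hr for y in years):
                avg = sum(seasons[y] for y in years) / streak
                pairs.append((avg, seasons[nxt]))
    return pairs
```

Averaging the two columns of the result gives the before and after figures. Two caveats with this sketch: overlapping streaks by the same player are counted separately, and players with no following season are dropped, which will flatter the "after" number.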

As well, playing time gets regressed just like everything else.  That is why Tango only takes 60% of a player’s past playing time and adds 200 PA.  That is simply a regression towards the mean.  Any player who averages less or more than the typical playing time (for that kind of player) more likely than not got lucky or unlucky with that playing time - on the average of course.

“If you believe that Micah Owings could be a legitimate position player, then you would regress him to .332, not .164.”

Be REALLY careful with statements like that!  I know Tango knows this, but many people will interpret that to mean, “Well, we just KNOW that Micah is not a typical hitter because of his great stats, therefore we are NOT going to regress him toward .164.”

No, no, and no!

You have to know something else about him OTHER THAN HIS HITTING STATS in order to regress him toward something other than the average pitcher’s wOBA.  Whether he used to be a position player.  His size.  His attitude (lots of pitchers “mail in” their AB and other pitchers actually try to hit well). MAYBE his swing (the problem with that is that his swing tends to be inextricably related to his sample performance - for example, let’s say that you see one PA for a pitcher and he puts a great swing on the ball - like a position player - and hits a HR.  You might be tempted to regress his one PA wOBA or BA to that of a position player or a great hitting pitcher based on that one swing.  Now, that one swing might allow you to regress to .180 or .200 or whatever - and not .164, but you are seeing a small sample of this pitcher’s swing as well, and it is more likely than not that he just had a “lucky” swing.  Almost all pitchers can occasionally put a good swing on a pitch - again, not to say that you can’t use a swing or two to tweak the regression mean).

In the Micah example, there is NO chance that you would want to regress him to that of an average position player (.332 wOBA).  No chance.  Even a former position player (I don’t know that he was) who is a pitcher is not going to hit enough that he has any reasonable chance to be near a true league average position player in the hitting department.  Plus the fact that someone is a pitcher suggests that he was not much of a hitter if and when he was a position player.  Etc.

