THE BOOK--Playing The Percentages In Baseball


Friday, March 19, 2010

Will Mariano Rivera save only 22 games this year, and with a 3.53 ERA?

By Tangotiger, 02:03 PM

I’m quoting Rob quoting Allen Barra using PECOTA:

Scariest of all, Rivera from 44 saves to 22, and an ERA that moves from 1.76 to 3.53.

Neyer’s response:

I don’t believe Mariano Rivera will save only 22 games. I will say, too, that if your system says those things, it’s probably worth checking under the hood just in case one of the belts is running a little loose.

Just last week, I wrote a pretty good piece about what it means to forecast 31 HR for Pujols in 2009 and I concluded:

As you can see, it runs the gamut from Delgado’s 4 to Pujols’ league-leading 47.  These 13 hitters were forecasted to hit a combined 401 HR in 2009.  And how many HR did they actually hit in 2009?  400.  That’s right, Marcel nailed it.

So, the forecasting systems work… if you know how to properly interpret what it is they are trying to tell you.

Let’s look at who Marcel has forecasted with the highest saves:


mSV   ERA    playerID    nameLast    nameFirst
 33   3.55   rodrifr03   Rodriguez   Francisco
 30   3.07   nathajo01   Nathan      Joe
 29   3.20   papeljo01   Papelbon    Jonathan
 28   3.18   riverma01   Rivera      Mariano
 27   3.60   cordefr01   Cordero     Francisco
 27   3.69   wilsobr01   Wilson      Brian
 27   4.04   fuentbr01   Fuentes     Brian
 26   3.39   hoffmtr01   Hoffman     Trevor
 25   3.58   valvejo01   Valverde    Jose
 23   3.26   soriajo01   Soria       Joakim
 23   3.80   jenksbo01   Jenks       Bobby
 23   4.50   lidgebr01   Lidge       Brad

321   3.57   (group total saves, group ERA)

Let me first tell you what this list is NOT:
1. It is NOT telling you that these 12 relievers are going to finish in that order
2. It is NOT telling you that these 12 relievers are going to finish in some order in the top 12
3. It is NOT telling you that these 12 relievers are going to end up with those exact saves
4. It is NOT telling you anything that specific about any single one of those relievers

What it IS telling you is:
1. That this group of 12 relievers will finish with around 321 saves (*)
2. That this group of 12 relievers will finish with an ERA around 3.57
3. That probably half of those 12 relievers will finish in the top 12 in saves

(*) Though with Joe Nathan already known to be out for the season, let’s say that the remaining 11 will finish with around 291 saves and a 3.62 ERA.
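
The group totals can be checked with a few lines of Python. This is only a sketch: the rows are copied from the table, and the group ERA is assumed to be a simple unweighted average of the twelve ERAs (the table does not show forecasted innings, so an innings-weighted figure can't be computed from it). The `group_line` helper is a name made up for this example.

```python
# Marcel forecasts copied from the table: (last name, saves, ERA).
forecasts = [
    ("Rodriguez", 33, 3.55),
    ("Nathan", 30, 3.07),
    ("Papelbon", 29, 3.20),
    ("Rivera", 28, 3.18),
    ("Cordero", 27, 3.60),
    ("Wilson", 27, 3.69),
    ("Fuentes", 27, 4.04),
    ("Hoffman", 26, 3.39),
    ("Valverde", 25, 3.58),
    ("Soria", 23, 3.26),
    ("Jenks", 23, 3.80),
    ("Lidge", 23, 4.50),
]


def group_line(rows):
    """Return (total saves, unweighted mean ERA rounded to 2 places)."""
    total_saves = sum(saves for _, saves, _ in rows)
    # Assumption: plain average of the ERAs, not weighted by innings.
    mean_era = sum(era for _, _, era in rows) / len(rows)
    return total_saves, round(mean_era, 2)


all_twelve = group_line(forecasts)
without_nathan = group_line([row for row in forecasts if row[0] != "Nathan"])

print(all_twelve)      # (321, 3.57)
print(without_nathan)  # (291, 3.62)
```

Under that assumption, the plain average reproduces both the 321 / 3.57 line and the 291 / 3.62 line without Nathan.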

Mariano Rivera has about as much chance of finishing with more than 28 saves as he has of finishing with fewer than 28.  What happened to Joe Nathan could happen to anyone.  It seems to me that we should be reporting the 75th percentile forecasts for all above-average players and the 25th percentile forecasts for all below-average players.  This will match the expectations of fans who don’t appreciate regression toward the mean.

But, regression toward the mean is real, and that’s why we report the 50th percentile forecasts for all players.

That said, 22/3.53 does seem low.  Not absurdly low.  I’ll guess that would be Marcel’s 35th percentile, if Marcel would do that.  But, why not report the even more absurd Bill James forecast: 44 saves, 2.12 ERA?  Rivera has 3 times exceeded that figure, and 9 times not.  Presuming Mariano does not age, a 44-save forecast seems like a 75th percentile forecast.  The Fangraphs FANS, who are optimistic across the board to begin with, have him with 40 saves, which probably translates to 30-35 saves once you deflate the optimism for all players.

This is just like when I asked for the chance of Lincecum posting a below 2.50 ERA: it’s the same chance of him posting a 3.90+ ERA.  This is proven by empirical evidence of similar superstar pitchers.  And, the Fans have a good sense of this to also say that 2.50 for Lincecum was as likely as 3.90.  And if 2.50 and 3.90 are just as likely, then you have to forecast Lincecum for close to 3.20.  If you were an oddsmaker and you were taking bets and you were putting your money on the line then you’d have to forecast Lincecum’s 50/50 line to be somewhere close to that.

So, exactly where would Rob Neyer and Allen Barra put Mariano Rivera’s 50/50 line, enough that they’d take bets on either side of that line?  It sure as heck is not going to be 44 saves.  They’d get at least 3 times as much under action as they’d get over action.  I agree that 22 saves is pretty low.  Somewhere in the high 20s, low 30s is right.
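
The "3 times as much under action" figure follows from Rivera's own history against the 44-save line (3 seasons over, 9 not). A minimal sketch, assuming each past season is treated as equally informative and that Mariano does not age; `over_under_split` is a hypothetical helper, not part of any forecasting system.

```python
def over_under_split(times_over, times_under):
    """Return (empirical chance of going over, under-to-over ratio)."""
    seasons = times_over + times_under
    p_over = times_over / seasons
    under_to_over = times_under / times_over
    return p_over, under_to_over


# Rivera against the 44-save line: exceeded it 3 times, did not 9 times.
p_over, under_to_over = over_under_split(times_over=3, times_under=9)

print(p_over)         # 0.25
print(under_to_over)  # 3.0
```

A 25% chance of going over puts 44 saves at roughly the 75th percentile, which is why a bookmaker setting the line there would see lopsided action on the under.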

But, why are these articles only written on downside forecasts for great players?  Why not on upside forecasts of bad players?  Because, it’s the same system doing it.  Bill James’ forecasts are getting a pass because he’s got optimistic forecasts for the above-average players.

And one thing I’ve learned from the Fangraphs fans voting in their forecasts: they like to forecast at the 75th percentile a lot more than the 25th percentile.  And so when they see a 44 save forecast for Rivera, they think it sounds about right.

But, they’re wrong.

#1    2010/03/19 (Fri) @ 15:11

Doesn’t everybody basically ignore the Bill James forecasts?

PECOTA has been spitting out odd things the last two years.  Others have noted that this coincides with the time Silver has been gone from BP.  It seems Neyer has noticed this now as well. 

I happen to agree with you here that the Mo/Jeter forecasts actually aren’t crazy at all (just slightly pessimistic, which I can understand from a projection system based on comparables, and we’re talking about a 36-yr old SS and a 40-yr old pitcher). 

Actually, the Jeter forecast is something you didn’t even discuss, I assume b/c it’s not really worth discussing (PECOTA’s forecast is right in line with all the others).  Mo… well hell, people have been missing high on Mo’s ERA for years.


#2    Greg Rybarczyk    2010/03/19 (Fri) @ 15:17

It might take more space than is usually allotted these things, but forecasts like this might be understood better if they were explained more like this:

“totally healthy & effective”: 44 saves, 2.00 ERA; likelihood: 10%

“mostly healthy & effective”:  35 saves, 2.00 ERA; likelihood: 15%

“mostly healthy, mostly effective”:  30 saves, 3.00 ERA; likelihood: 15%.

.
.
.
.

“significant injury time, effective:” 10 saves, 2.00 ERA; likelihood: 10%

“significant injury time, mostly ineffective”: 5 saves, 4.00 ERA

“season-ending injury, mostly ineffective:” 0 saves, 5.00 ERA

Then people could better understand how you arrived at the projection you did, and what the risks are, as perceived by the forecaster, for the upcoming season.

These sort of calculations *have to be happening* in projection systems, but this detail is submerged.  Maybe for a few key players it ought to be uncovered and described in gory detail… I think everyone would benefit from that.
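
As a rough illustration of how a scenario table like this collapses into a single number, here is a sketch in Python. The scenario labels loosely follow the list above, but the save totals are simplified and the probabilities are invented for illustration (the list above leaves several of them out); none of this is any real system's method.

```python
# Hypothetical scenarios: (label, saves, probability).
# The probabilities are made up for this example and sum to 1.
scenarios = [
    ("totally healthy & effective", 44, 0.10),
    ("mostly healthy & effective", 35, 0.25),
    ("mostly healthy, mostly effective", 30, 0.30),
    ("significant injury time, effective", 10, 0.20),
    ("significant injury time, mostly ineffective", 5, 0.10),
    ("season-ending injury", 0, 0.05),
]


def mean_saves(rows):
    """Probability-weighted average save total."""
    return sum(saves * prob for _, saves, prob in rows)


def percentile_saves(rows, q):
    """Smallest save total whose cumulative probability reaches q."""
    cumulative = 0.0
    for _, saves, prob in sorted(rows, key=lambda row: row[1]):
        cumulative += prob
        if cumulative >= q - 1e-12:
            return saves
    return max(saves for _, saves, _ in rows)


mean = mean_saves(scenarios)
p25 = percentile_saves(scenarios, 0.25)
p50 = percentile_saves(scenarios, 0.50)
p75 = percentile_saves(scenarios, 0.75)
print(round(mean, 2), p25, p50, p75)
```

With these invented numbers the mean comes out just under 25 saves while the median is 30: the injury scenarios drag the average down even though the single most likely outcome is a healthy season.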


#3    Tangotiger    2010/03/19 (Fri) @ 16:00

Yes, I agree, it should happen.  Maybe one page for one player, you do that.  And then the rest gets the summary treatment.

Buuuuuuuuuuuut, even PECOTA, which DOES show the percentile forecasts, has those percentile forecasts ignored in favor of discussing the mean forecasts!


#4    2010/03/19 (Fri) @ 16:01

When SG presents his CAIRO projections for Yankee players over at rlyw, he shows baseline, but also the 20%, 35%, 65% and 80% forecasts.  I think that presentation really does help.


#5    MGL    2010/03/19 (Fri) @ 16:07

As I have said before, you have to define what you mean by “forecast” as Greg says above.  It is not that the fans are necessarily wrong.  It is that they are giving you a forecast, given that “so-and-so stays healthy and plays most of the season.”

Marcel and the rest of the forecasting systems include the possibility of a player getting hurt and not playing for some or all of the season, which, especially for pitchers, can be a significant factor in the forecast.  Fans don’t think that way, and there is nothing wrong with that.  If the fans forecast Mariano at 40 saves and he doesn’t play or plays half the season, they don’t consider that a failed forecast.  They consider that, “Well, he got hurt.  Of course the forecast was ‘wrong.’ Had he not gotten hurt...”

So it is more about defining what you mean by a forecast than bad forecasting by the Fans or even James.  If you told the fans, “I want you to include the possibility that so-and-so gets hurt and can’t play for X number of games,” they might have a different forecast, although that concept (of including chance of injury in a forecast) is a hard one for them to wrap their arms around and even harder still for them to translate that into a “number.”


#6    Greg Rybarczyk    2010/03/19 (Fri) @ 16:11

Do you mean the “Breakout”, “Collapse”, etc. groups?  Because those don’t seem quite enough to me. 

I think some sort of grid that crosses a playing-time prediction model with a performance prediction model would be great.  Show how likely each is, show a typical stat line for each block, and then shade in the block deemed most likely by one’s projection system.  Then interested fans could clearly see upside from more playing time and/or better performance, and downside from poor performance or lost playing time due to either weak performance or injury.

There would no doubt be some interactions, too, as poor performance might lead to reduced playing time, while strong performance could increase playing time for a part-time hitter or a reliever, or someone not originally expected to be in the rotation…

Put grids for the whole pitching staff on one page, and you could even show the dependencies between players, as a downside risk for playing time on your ace pitcher would be reflected in an upside likelihood on playing time for a lesser pitcher on the staff.  You could even figure a way to display the likelihood of different leverage levels for pitchers based on the whole playing time model…

Hopefully I didn’t just exactly describe some fantasy baseball guide everyone but me has read - I suspect some of these ideas are already out there, but I know I haven’t seen a 2-dimensional grid crossing playing time with performance.... yet.
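
A bare-bones version of such a grid might look like the sketch below. Everything in it is an assumption made for illustration: the category names, the marginal probabilities, and above all the independence between playing time and performance, which the comment itself notes would not really hold.

```python
from itertools import product

# Hypothetical marginal probabilities, invented for this example.
playing_time = {"full season": 0.60, "half season": 0.30, "lost season": 0.10}
performance = {"strong": 0.30, "average": 0.50, "weak": 0.20}


def build_grid(rows, cols):
    """Joint probability of each (row, column) block, assuming independence."""
    return {(r, c): rows[r] * cols[c] for r, c in product(rows, cols)}


grid = build_grid(playing_time, performance)

# The block a projection system would "shade in" as most likely.
most_likely = max(grid, key=grid.get)

for r in playing_time:
    cells = "  ".join(f"{c}: {grid[(r, c)]:.2f}" for c in performance)
    print(f"{r:12s} {cells}")
print("most likely block:", most_likely)
```

Modeling the interactions (poor performance leading to lost playing time, one pitcher's lost innings becoming another's) would mean replacing the simple product with a conditional table per playing-time row.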


#7    David Gassko    2010/03/19 (Fri) @ 16:22

28 saves is almost certainly not the appropriate over/under for Rivera, though. After all, Marcel only knows about a pitcher’s save totals from the past three years—it does not know what his role will be in the upcoming season. There’s a ton of turnover among closers (see http://bookofodds.com/Daily-Life-Activities/Sports/Articles/A0144-Few-Baseball-Closers-Return-for-Seconds), so Marcel’s saves projections will obviously tend to be conservative for those who actually stay in the closer slot. We know that if Mariano is healthy, there is a 99% chance he will close for the Yankees all season. That is not the case for many of the players on your list. Therefore, with the knowledge that Rivera is the Yankees closer, his projection should be higher than Marcel says—probably on the order of 33 saves or something.


#8    Tangotiger    2010/03/19 (Fri) @ 16:36

David/7: I agree with you and for exactly the reason you stated, which is why I said:

So, exactly where would Rob Neyer and Allen Barra put Mariano Rivera’s 50/50 line, enough that they’d take bets on either side of that line?  It sure as heck is not going to be 44 saves.  They’d get at least 3 times as much under action as they’d get over action.  I agree that 22 saves is pretty low. Somewhere in the high 20s, low 30s is right.

Rivera’s median saves in his 13 closer seasons is 40.  Saying 33 sounds perfectly fine to me as an over/under.


#9    Brian Cartwright    2010/03/19 (Fri) @ 17:32

We have seen in the past that some pitchers with really ugly ERAs can achieve rather high save totals if they are given enough opportunities.

Estimate how many games this team should win, then how many team save opportunities there should be. Do a depth chart and split up those opportunities among the available relievers.

Even the worst pitchers can hold their lead 70% of the time (Sv+Hld)+(Sv+BS), the best 95%+. Do an analysis of ERA vs Hld%. Then multiply by opportunities.

(numbers made up)
Chacon converts 70% of 40 opps, gets 28 saves with a 6.00+ ERA. Rivera gets 90% of 50, for 45.
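
The last step of that recipe (conversion rate times opportunities) is simple enough to sketch. The function name and the `depth_chart_share` parameter are made up for this example, and the inputs are the same made-up numbers as above.

```python
def expected_saves(conversion_rate, team_opportunities, depth_chart_share=1.0):
    """Saves = conversion rate x the share of team opportunities a reliever gets."""
    return conversion_rate * team_opportunities * depth_chart_share


# Made-up numbers from the comment: a poor closer and a great one.
chacon = round(expected_saves(0.70, 40))
rivera = round(expected_saves(0.90, 50))

print(chacon, rivera)  # 28 45
```

The save total moves far more with opportunities than with ERA, which is the point: a bad pitcher handed 40 chances still banks 28 saves.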


#10    2010/03/20 (Sat) @ 02:31

Predicting saves is kind of like predicting RBIs.  RBIs of course are dependent on opportunity (ROB, and runners in scoring position), which is somewhat dependent on position in the batting order and quality of offense, and on the hitter’s HR total, which is dependent on park and PA. 

Saves are dependent on number of opportunities, as well as the distribution of 1, 2 and 3 run save opportunities, and usage (pitching more than 1 inning and coming in with ROB)

With the Yankees offense and projected wins, I would have to imagine Mo would be projected to have a high number of save opportunities, with the distribution of saves more toward the 3 run save, which even lousy relief pitchers can be expected to bank.

He had 46 save opportunities last year.  Even if his save rate drops to a poor 80%, if he gets the same number of opportunities, he is good for 36.  His save rate would have to be below about 65% (30 of 46) to save less than 30 games.
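
That arithmetic is easy to check. A small sketch, assuming the same 46 opportunities as last year; `saves_at` and `break_even_rate` are names invented here.

```python
import math

OPPORTUNITIES = 46  # Rivera's save opportunities last year, per the comment


def saves_at(rate, opportunities=OPPORTUNITIES):
    """Whole saves banked at a given conversion rate."""
    return math.floor(rate * opportunities)


def break_even_rate(target_saves, opportunities=OPPORTUNITIES):
    """Conversion rate needed to reach a target save total."""
    return target_saves / opportunities


saves_at_80 = saves_at(0.80)
rate_for_30 = round(break_even_rate(30), 3)

print(saves_at_80)  # 36
print(rate_for_30)  # 0.652
```

So with 46 chances, 30 saves corresponds to converting roughly 65% of them, a rate well below what even poor closers manage.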

I would not bet on Mariano getting 44 saves, but I suspect he will be closer to 44 saves than 22, or even 28 saves.

Only way Rivera gets 22 saves is if he blows out his arm and does not finish the season, or loses his job due to poor performance (which would likely be due to arm trouble), or if the Yankees lose 100 games.


#11    Toph    2010/04/17 (Sat) @ 20:53

What you guys are missing here is the 3.53 ERA that PECOTA is projecting.  I think Neyer is questioning that just as much as the save total.  Rivera has NEVER had an ERA over 3.15 since he became exclusively a RP in 1996.  3.53 is an absurd estimate IMO for his projection, as Rivera has had more seasons under 2.00 than over 3.00 as a RP.


#12    John Beamer    2010/04/19 (Mon) @ 05:59

Interesting - CHONE is at 2.70 for Rivera. ZIPS was 3.1 at the start of the season. So 3.53 does seem a tad high.

Anyway ...

Part of the issue here is that the number of 40 yo HoF-quality pitchers is tiny. And quite a few of that sample, the sample that Mo is being compared to, will see a sharp year-to-year spike in ERA, which is why Mo’s historic numbers will be regressed so much. We know PECOTA is heavy on comps, right?

One of the issues with any mathematical forecasting system is that at the extremes it is more liable to fail.

If an alien landed on this planet and looked at all similar pitchers to Mo (if there are any!) and then looked at Mo’s track record perhaps they’d pick 3.4 as the 50th percentile.

I’m guessing, but systems like CHONE/ZIPS which don’t really use comps suffer less from this issue (if it is an issue - I’m not sure it is - it is an equally valid way of looking at how his 2010 might evolve).

