THE BOOK--Playing The Percentages In Baseball

Thursday, January 05, 2012

Hard to beat Marcel’s pitching forecasts

By Tangotiger, 05:19 PM

Matt shows this:

Estimator    RMSE against next year’s ERA (2006-2011, N = 1,576 pitchers)
SIERA        1.126
Marcel       1.132
PECOTA       1.141
ZiPS         1.143
xFIP         1.148

FIP          1.212
tERA         1.236
ERA          1.387

First of all, there is no need to go to three decimal places.  We show ERA to two decimal places, so why bother showing the error to three?  As far as I’m concerned, there’s virtually no difference among the top five.

Secondly, I can’t tell if the future ERA is park-adjusted or not.  It MUST be unadjusted.  NO ONE is trying to estimate a pitcher’s park-neutral ERA in terms of testing.  The only test is how he actually did.  So, we don’t adjust for park, strength of schedule, or innings per start.

(MGL for example only cares about park-neutral.  And that’s fine.  But then, we can’t test his results.  SIERA is park-neutral I think, but FIP is not.  All the forecasting systems in fact are park-specific.  You can’t turn everything to park-neutral first.)

You COULD make the case that we should throw out any unexpected starter-relief switches, for reasons we’ve learned about over the years.  But, we need to be careful here, as we may end up with a selection bias.

In the comments, Matt notes that it was park-adjusted.  Again, I completely disagree here.  The test is against actual performance, not adjusted performance.  He notes it didn’t make a difference.  Well, given that the test is slanted toward SIERA, and SIERA is Matt’s baby, then, I’d REALLY like to see the results the right way.

Now, Matt may decide to introduce a park-specific SIERA, so that we can all make the apples-to-apples comparison.  Until then, SIERA will simply have to have one hand tied behind its back.

Thirdly, for the RMSE test, you MUST calibrate it so the league average for the forecast equals the league average for the actuals.  It should be clear that if you treat the forecasting system as its own universe, it’s irrelevant if the expected ERA was set to 3.9 or 4.3 and the actual ended up at 3.7 or 4.8 or whatever. I’m not sure if Matt handled this.
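To make the calibration step concrete, here is a minimal sketch. It assumes an additive shift and an unweighted league average, and the ERAs are made up for illustration; a real test would probably weight by innings and do this separately for each season.

```python
import math

def rmse(forecast, actual):
    """Root mean squared error between paired forecast and actual values."""
    n = len(actual)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / n)

def recenter(forecast, actual):
    """Shift every forecast by one constant so the forecast mean equals the actual mean.

    Assumption: an additive shift; a multiplicative rescale is another option.
    """
    shift = sum(actual) / len(actual) - sum(forecast) / len(forecast)
    return [f + shift for f in forecast]

# Made-up ERAs, for illustration only.
forecast = [3.50, 4.00, 4.50, 5.00]  # the system guessed a 4.25 league
actual = [3.20, 3.90, 4.00, 4.90]    # the league came in at 4.00

raw = rmse(forecast, actual)
calibrated_forecast = recenter(forecast, actual)
calibrated = rmse(calibrated_forecast, actual)

print(f"raw RMSE: {raw:.2f}, calibrated RMSE: {calibrated:.2f}")
```

The raw number punishes the system for missing the run environment; the calibrated number only measures how well it spread the pitchers around the league average.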

As we know, RMSE, not correlation, is the correct test.

Having said all that: great job to Matt!


#1    Matt Swartz      2012/01/05 (Thu) @ 18:12

The test was against regular ERA-- NOT park-adjusted ERA. So SIERA and xFIP should have been punished. I could have multiplied them by their park factors, and presumably, they would have done better.

The comment in the comments section was about whether I park-adjusted ERA in creating SIERA. The non-difference was about using park-adjusted BB,SO,GB,FB as inputs.

I didn’t adjust for league average, since I assumed that every year kind of uses the previous year as the run environment for projections, and obviously ERA Estimators are supposed to be the same year. I could check this later.

I actually disagree that RMSE is necessarily better than correlation here, but I’m not 100% sure. It’s better if your goal is to estimate exact talent level as closely as possible-- so for a straight test of projection systems. But FIP doesn’t do that-- it effectively gives the pitcher his HR rate as his talent, if used as an ERA projection. Neither does SIERA-- it assumes the pitcher’s strikeout rate is his strikeout skill level. So, correlation controls for that, by basically asking, “if each estimator/projection were optimally regressed and re-centered, what would the best guess be?” Does that make sense, or am I missing something about the logic of RMSE vs. correlation? It seems like correlation is the way to do apples to apples, or closer to it, when ERA Estimators are not intended to be projections themselves.


#2    Perceptron      2012/01/05 (Thu) @ 20:23

It took me a while to convince myself, but Matt does have a valid point. When we only care about projections, RMSE is absolutely the right choice, no questions asked. But in this case, if all we want to know is which estimator has the strongest relationship with next year’s ERA, then sure, correlation is fine.

That said, Matt’s article is focused on projections. You would never include projection systems as a measure of a player’s ability. Ability metrics, such as SIERA, are useful for inference. Forecasts are nice for predictions. In baseball, it just so happens that most players don’t change much from year to year, so the inference metrics can double as prediction metrics without worrying too much (of course, inference metrics don’t account for the player being a year older, and accounting for that would inevitably help predictions). However, I’m not sure I would agree with the converse here. For that reason, if you are going to include SIERA and Marcel in the same article, you have to be talking about predictions. And thus RMSE is clearly the best choice here.


#3    Tangotiger      2012/01/05 (Thu) @ 22:24

y=mx+b

RMSE forces m=1

Correlation will find the best-fit for both m and b.

So, if someone forecasts a range of ERA 3.75 to 4.25, a correlation will get us an m greater than 1.  So, even though the forecasting system went out of its way to regress the data toward the mean (and maybe OVERREGRESS incorrectly), allowing the correlation to undo the damage by getting m=2 or something is cheating.

You can’t do that, hence the reason the correlation won’t work.
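A small sketch of that point, using made-up numbers: take a sensible set of forecasts, then squeeze them into the 3.75 to 4.25 range. The correlation with the actuals does not move at all, while RMSE (which holds the slope at 1) gets much worse. The helper names and data here are assumptions for illustration, not anything from Matt’s test.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(x, y):
    """Pearson correlation; unchanged by any positive linear rescaling of x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(forecast, actual):
    """Root mean squared error; takes the forecast at face value (m=1)."""
    return math.sqrt(mean([(f - a) ** 2 for f, a in zip(forecast, actual)]))

# Made-up data, for illustration only.
actual = [2.80, 3.90, 3.70, 4.60, 5.00]
sensible = [3.00, 3.50, 4.00, 4.50, 5.00]

# Over-regress: keep only 25% of each pitcher's distance from the league mean,
# which squeezes the forecasts into the 3.75 to 4.25 range.
league = mean(sensible)
over_regressed = [league + 0.25 * (f - league) for f in sensible]

r_sensible = pearson_r(sensible, actual)
r_over = pearson_r(over_regressed, actual)
rmse_sensible = rmse(sensible, actual)
rmse_over = rmse(over_regressed, actual)

print(f"correlation: {r_sensible:.3f} vs {r_over:.3f}")
print(f"RMSE:        {rmse_sensible:.3f} vs {rmse_over:.3f}")
```

Both systems rank the pitchers identically, so correlation cannot tell them apart; only RMSE charges the second system for the spread it chose.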

I had a very long post about this a year or two ago, using Marcel and Pujols as the example.  Hopefully, googling Marcel Pujols forecasting correlation RMSE will find it easily enough.



#5    Matt Swartz      2012/01/05 (Thu) @ 22:31

That’s why as a pure projection system test, RMSE works best. However, for a potential projection system derived from ERA Estimators, correlation basically figures out how much to regress (setting “m” and “b”) to get a projection derived solely from the ERA estimator. It turns out that regressing SIERA about 30% towards the mean does a better job than anything else.

I set the mean of each estimator to be the same for each year, and then re-tested. The order was the same:

SIERA:   1.126 -> 1.108
Marcel:  1.132 -> 1.113
PECOTA:  1.141 -> 1.117
ZiPS:    1.143 -> 1.123
xFIP:    1.148 -> 1.133
FIP:     1.212 -> 1.195
tERA:    1.236 -> 1.214
ERA:     1.387 -> 1.351

Basically, you might as well always subtract about the same amount since they’re all starting from the same point. If I use post-dicting projection systems re-derived from old data, then I’d need to do this for sure, but this way, it seems like tomaeto/tomahto.


#6    Tangotiger      2012/01/05 (Thu) @ 23:32

Matt: yes, for the non-forecasting systems, you need to do the correlation, to turn them from estimators into forecasters.

FIP for example is not a forecaster, and so, it would make no sense to not change the slope.  SIERA as well.

My original point applies only to forecasting systems, those that had already decided on the slope.
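As a rough sketch of turning an estimator into a forecaster, an ordinary least-squares fit of next year’s ERA on this year’s estimator picks the m and b in y=mx+b. The data below are made up and built so the slope comes out at 0.7, which is what regressing about 30% toward the mean looks like; the fit is also in-sample here, whereas a fair test would fit on earlier seasons and score on later ones.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def fit_line(x, y):
    """Ordinary least-squares slope m and intercept b for y = m*x + b."""
    mx, my = mean(x), mean(y)
    m = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return m, my - m * mx

def rmse(forecast, actual):
    return math.sqrt(mean([(f - a) ** 2 for f, a in zip(forecast, actual)]))

# Made-up data, for illustration only: this year's estimator, next year's ERA.
estimator = [3.00, 3.50, 4.00, 4.50, 5.00]
next_era = [3.40, 3.45, 4.20, 4.15, 4.80]

m, b = fit_line(estimator, next_era)
forecast = [m * x + b for x in estimator]

rmse_as_is = rmse(estimator, next_era)   # estimator taken at face value (m=1, b=0)
rmse_fitted = rmse(forecast, next_era)   # estimator after the best-fit m and b

print(f"m={m:.2f}, b={b:.2f}")
print(f"RMSE as-is: {rmse_as_is:.3f}, RMSE after fit: {rmse_fitted:.3f}")
```

A slope below 1 pulls every pitcher toward the league mean, which is the regression an estimator like FIP never applied to itself.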

By the way, can you include (K-BB-HBP)/PA in all future testing (and best-fit the slope)?
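For reference, that rate is simple to compute from a pitcher’s line. A minimal sketch, with a hypothetical function name and a made-up stat line:

```python
def k_minus_bb_rate(strikeouts, walks, hit_by_pitch, plate_appearances):
    """(K - BB - HBP) / PA for one pitcher's season line."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return (strikeouts - walks - hit_by_pitch) / plate_appearances

# Made-up stat line, for illustration only.
rate = k_minus_bb_rate(strikeouts=200, walks=50, hit_by_pitch=5, plate_appearances=800)
print(f"(K-BB-HBP)/PA = {rate:.3f}")
```

Since this is a rate and not on the ERA scale, it only works as an ERA forecast after the slope and intercept are best-fit.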


#7    Matt Swartz      2012/01/06 (Fri) @ 11:19

Tango-- I can include it. This was done in Excel, so it’s a little harder to code it right in, but I can do it. I did remember to for the SIERA testing this summer, but just forgot here, since I was just trying to do vlookups and match the spelling of the names.

