THE BOOK--Playing The Percentages In Baseball

Wednesday, February 14, 2007

How reliable are PECOTA Forecast Percentiles

Chris Carpenter is arguably the second best pitcher in baseball.  Santana is obviously first, and after him you have a cast of characters that includes Carpenter, Halladay, and Oswalt, among others.

Five forecasting systems are all impressed with him.  His K minus BB per 9 IP is forecast as:


5.51 (ZIPS)
5.47 (Chone)
5.36 (Marcel)
5.08 (PECOTA)
4.73 (Bill James)
---------------
5.23 Average

They all love him.  He is 51-18 in his last 3 years.  If there’s one thing we can be certain about, it is that Chris Carpenter is one of the best pitchers in baseball.

Adam Wainwright looked great last year.  But are we as sure about his greatness as we are about Carpenter’s?  Of course we can’t be.  We’ve got history to judge Carpenter, and we don’t have anywhere near as much with Wainwright.  On top of that, since pitchers perform much differently as starters than as relievers, his change of role adds another level of uncertainty.

Wainwright’s K minus BB per 9 IP is forecast as:

5.66 (Chone)
4.74 (Bill James)
4.57 (Marcel)
4.36 (ZIPS)
4.33 (PECOTA)
-------------
4.73 Average

How they see them
Marcel is a great barometer, since we know exactly how it’s calculated, and we know exactly what kind of assumptions it makes.  We see here that Chone expects better performances from these two guys than Marcel does, while PECOTA expects worse performances than Marcel does.  In effect, Chone sees their historical numbers as being more “real” than does PECOTA.  Marcel has properly valued them, given the limited amount of data it looks at.  Chone, looking at additional data, has decided that they’re better.  PECOTA, also looking at additional data, has decided that they are worse.

Bill James (or I guess BIS) has decided that Carpenter isn’t as good as his numbers, and Wainwright is better than his numbers.  ZIPS takes the opposite viewpoint.

All fun and games, but not really germane to this blog entry.  What is more important is that the standard deviation of the five forecasts is much smaller for Carpenter than it is for Wainwright: 0.29 to 0.49.
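The averages and standard deviations quoted above can be checked directly from the two forecast tables.  Here is a minimal sketch; it assumes the population standard deviation (dividing by 5 rather than 4), since that is the version that reproduces the 0.29 and 0.49 figures.  The function name `summarize` is just for illustration.

```python
from statistics import mean, pstdev

# K minus BB per 9 IP forecasts, copied from the tables in this post
carpenter = {"ZIPS": 5.51, "Chone": 5.47, "Marcel": 5.36, "PECOTA": 5.08, "Bill James": 4.73}
wainwright = {"Chone": 5.66, "Bill James": 4.74, "Marcel": 4.57, "ZIPS": 4.36, "PECOTA": 4.33}

def summarize(forecasts):
    """Return (average, population standard deviation), rounded to 2 decimals."""
    values = list(forecasts.values())
    return round(mean(values), 2), round(pstdev(values), 2)

print(summarize(carpenter))   # (5.23, 0.29)
print(summarize(wainwright))  # (4.73, 0.49)
```

With only five forecasts per pitcher, the spread is a rough indicator of disagreement among the systems rather than a precise measurement.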

This is of course expected, and any other result would have been viewed with skepticism.

Reliability
Marcel also supplies “reliability” scores for each forecast, with Carpenter at 0.83, just a shade below the leader, Santana, at 0.84.  What the reliability score measures is how much of a player’s stats we can trust.  In the case of Carpenter, we regress his historical data 17% toward the league mean.  Not much really.  Wainwright, on the other hand, has a reliability score of 0.46, meaning we regress his stats 54% toward the league mean.  Qualitatively, this makes perfect sense.  While the exact numeric values of the reliability scores may not be intuitive, the fact that we get such a wide difference is.
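One way to read a reliability score is as the weight placed on the player’s own record, with the remainder placed on the league mean.  The sketch below assumes that simple linear reading; it is not Marcel’s actual computation, and the observed rate and league mean are made-up numbers used only to show the effect of the two reliability scores.

```python
def regress_to_mean(observed, league_mean, reliability):
    """Shrink an observed rate toward the league mean.

    reliability is the share of the observed figure we keep;
    (1 - reliability) is the share regressed toward the league mean.
    """
    return reliability * observed + (1.0 - reliability) * league_mean

# Hypothetical inputs, for illustration only (not taken from Marcel)
LEAGUE_MEAN = 3.0  # assumed league K minus BB per 9 IP
OBSERVED = 5.5     # assumed observed K minus BB per 9 IP

high_reliability = regress_to_mean(OBSERVED, LEAGUE_MEAN, 0.83)  # regress 17%
low_reliability = regress_to_mean(OBSERVED, LEAGUE_MEAN, 0.46)   # regress 54%

print(round(high_reliability, 3), round(low_reliability, 3))
```

Given the same observed record, the pitcher with the 0.46 reliability ends up much closer to the league mean than the one with 0.83.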

Enter the PECOTA percentile forecasts.  PECOTA does something similar to Marcel, and further expands it by introducing performance lines for various percentile levels.  If we focus on the 25th and 75th percentiles, PECOTA tells us that it’s 50% sure that Wainwright’s K minus BB per 9 IP will be between 3.97 and 4.53, or a gap of 0.56.  And with Carpenter?  Between 4.58 and 5.33, or a gap of 0.75.

Isn’t that the opposite of what we expected?  And if we look at the “equivalent peripheral” ERA, Carpenter is 3.18 to 4.37, or a difference of 1.19, while Wainwright is 3.73 to 4.96, or a difference of 1.23.  By this measure, they both have essentially the same level of uncertainty.  And, of course, that can’t be.

Why?
Why does PECOTA do that?  I suspect that PECOTA first establishes the comparable pitcher list, and once that is established, the unreliability of the comps is thrown out the window.  That is, with Carpenter, we have a solid track record as to how good he is.  We’re pretty sure of it (the “0.83” of Marcel).  With Wainwright, not so much (Marcel reliability of “0.46”).  But, once the comp list is created, the uncertainty of those comps is likely no longer considered.

Now, PECOTA does know more about Wainwright than Marcel does, since Marcel is intentionally oblivious to minor league performance, and PECOTA is not.  But, how much higher can that reliability estimate go?  0.55?  0.60?  Marcel has 135 pitchers with a reliability estimate of at least 0.70.  If we consider all minor league performances as well, can Wainwright jump into that pool of pitchers?  I don’t see how he could.

Finally, is being 50% sure that a pitcher will have a peripheral ERA within a 1.20 range something great?  Let’s assume you have a pitcher with an OBP of .340, and he will face 700 batters.  What would the binomial give us in terms of the 25th and 75th percentile levels?  That would be .328 and .352, which translates into an ERA of 4.13 and 4.90, or a difference of only 0.77.  If you were 100% sure that a guy had a .340 OBP, you’d expect to see a peripheral ERA within that 0.77 range 50% of the time.
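The .328 and .352 figures can be reproduced with a normal approximation to the binomial.  This is a minimal sketch under that assumption; the function name is hypothetical, and the conversion from OBP to peripheral ERA is left out because the post does not spell it out.

```python
from math import sqrt
from statistics import NormalDist

def binomial_rate_quartiles(p, n):
    """25th and 75th percentiles of an observed rate over n trials,
    using the normal approximation to the binomial."""
    sd = sqrt(p * (1 - p) / n)       # standard deviation of the observed rate
    z = NormalDist().inv_cdf(0.75)   # about 0.674 standard deviations
    return p - z * sd, p + z * sd

low, high = binomial_rate_quartiles(0.340, 700)
print(round(low, 3), round(high, 3))  # 0.328 0.352
```

So even with a perfectly known .340 true OBP, random variation over 700 batters alone puts half of all seasons outside the .328 to .352 band.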

But, we are not 100% certain of our true mean.  It seems to me that our uncertainty around the true mean must be fairly high, if the best we can say is that we are 50% sure a pitcher’s peripheral ERA will fall within a 1.20 range.
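To put a rough number on “fairly high”: if the spread in outcomes is treated as the sum of two independent, roughly normal pieces (uncertainty about the true mean, plus binomial luck), then the variances add, and the uncertainty about the true mean can be backed out from the two 50% ranges.  This is a back-of-the-envelope sketch under those assumptions, borrowing the 0.77 luck-only range from the .340 OBP example; it is not how PECOTA computes anything.

```python
from math import sqrt
from statistics import NormalDist

# For a normal distribution, the 25th-to-75th percentile range spans
# about 1.349 standard deviations.
IQR_TO_SD = 2 * NormalDist().inv_cdf(0.75)

def implied_talent_sd(total_iqr, luck_iqr):
    """Standard deviation of the true-mean uncertainty implied by a total
    50% range and a luck-only 50% range, assuming independent normal parts."""
    total_sd = total_iqr / IQR_TO_SD
    luck_sd = luck_iqr / IQR_TO_SD
    return sqrt(total_sd ** 2 - luck_sd ** 2)

talent_sd = implied_talent_sd(1.20, 0.77)
print(round(talent_sd, 2))  # 0.68
```

Under these assumptions, a 1.20 range implies a standard deviation of roughly 0.68 runs of ERA in our estimate of the pitcher’s true level, which is larger than the luck component itself.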

In short, I see no reason to believe that the forecast ranges of PECOTA actually represent what they purport to.  And I see no reason to believe that PECOTA’s uncertainty about a player’s forecast depends on how much information we have about that pitcher.

(20) Comments • 2007/03/11 • SabermetricsForecasting