
THE BOOK--Playing The Percentages In Baseball


Tuesday, March 09, 2010

Forecast evaluation 2009: pitchers

By Tangotiger, 06:17 PM

Jared and gang:

ERA Projections

System           W     L    W%      Z score
ZiPS            100    62   0.618    3.01
CAIRO            95    67   0.585    2.18
Chone            91    71   0.564    1.62
Steamer          90    72   0.558    1.47
PECOTA           88    74   0.541    1.05
Oliver           82    80   0.503    0.08
Marcel           81    81   0.500    --
Fantistics       80    82   0.492   -0.20
Sporting News    73    89   0.454   -1.18


#1    colintj      (see all posts) 2010/03/09 (Tue) @ 18:44

Anyone know where to find past season forecast results for pitchers?  Particularly the ones that did well?


#2          (see all posts) 2010/03/09 (Tue) @ 18:45

sry, meant to get the email followups…


#3    Mike Fast      (see all posts) 2010/03/09 (Tue) @ 18:46

That is an excellent evaluation!  Thanks for sharing, Jared.


#4    RMR      (see all posts) 2010/03/09 (Tue) @ 19:23

Thoughts on why the walk rate projections fared so poorly? 

Clearly CAIRO had the spread way off (I wonder what the actual distribution looks like), but was that the same problem to a lesser degree with the other systems?  I wonder if this is a function of having so many players in the system with relatively small sample sizes of IP.


#5    Tangotiger      (see all posts) 2010/03/09 (Tue) @ 19:27

Since Marcel is the baseline, then you can’t say there is an IP bias in the spread.  Marcel didn’t have a problem.

***

I did an extensive evaluation of the 2007 forecasts in this blog.  You can search for it.  It was probably done in mid-2008 to late-2008.


#6    anon      (see all posts) 2010/03/10 (Wed) @ 10:35

Where do their standings come from?  Is the win percentage determined by all pairwise comparisons against all other systems, and then the 162 game record gotten by multiplying this percentage by 162?

The z-scores look like they’re based on the 162 game record.  If the procedure is the way I outlined above, the z-scores are meaningless.


#7    Rally      (see all posts) 2010/03/10 (Wed) @ 10:55

Jared, how did these systems do in predicting BABIP?  Since innings are supplied but not at-bats, I assume, you can estimate it like this:

(H-HR)/(IP*2.83 + H - HR)


#8    Tangotiger      (see all posts) 2010/03/10 (Wed) @ 11:22

Rally, you forgot about SO.

As well, it would be better to do this:

[(IP*3-SO) * something + H - HR]

or

[(IP*3-SO + H-HR) * something]
on the idea that non-HR hits lead to runners being put out on the bases

The key point is that you should subtract strikeouts from innings first.


#9    J. Cross      (see all posts) 2010/03/10 (Wed) @ 11:42

I’ll take a look at BABIP

(H-HR)/[(IP*3-SO +H - HR)*.95]

ought to be good enough, right?

The something doesn’t actually matter for correlations.
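
A minimal Python sketch of the estimate above, and of why the "something" drops out of a correlation. The sample lines are made up for illustration, IP is assumed to be in true decimal innings (not the 200.1 box-score notation), and the helper names are hypothetical.

```python
from math import sqrt

def est_babip(h, hr, ip, so, factor=0.95):
    """Estimate BABIP without at-bats: (H-HR) / [(IP*3 - SO + H - HR) * factor]."""
    balls_in_play = (ip * 3 - so + h - hr) * factor
    return (h - hr) / balls_in_play

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Made-up (H, HR, IP, SO) lines: projected and actual for four pitchers.
projected = [(190, 20, 200.0, 180), (150, 15, 160.0, 120),
             (80, 10, 70.0, 60), (210, 28, 195.0, 110)]
actual = [(200, 22, 205.0, 170), (140, 18, 150.0, 130),
          (85, 8, 75.0, 55), (220, 25, 190.0, 120)]

def corr_with_factor(factor):
    """Correlation of projected vs. actual estimated BABIP for a given factor."""
    proj = [est_babip(*line, factor=factor) for line in projected]
    act = [est_babip(*line, factor=factor) for line in actual]
    return pearson(proj, act)

# The factor rescales every estimate by the same constant,
# so the correlation is the same whatever value is chosen.
r_095 = corr_with_factor(0.95)
r_100 = corr_with_factor(1.00)
```

The factor would matter for anything that looks at the level of the estimate (bias, RMSE, slope), just not for correlations.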

anon, the z-scores are based on a comparison to Marcel (a test of the hypothesis that a system’s predictions are as good as Marcel’s), so no pairwise comparisons.  Three things factor in: the correlation between system X and the actual results, the correlation between Marcel and the actual results, and the correlation between system X and Marcel.

w% = SQRT(0.25/162) * Z-score + 0.5

.25 b/c that’s p(1-p) at p = .5
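
As a quick check, the conversion can be sketched in Python. The function name is hypothetical, and rounding wins to the nearest whole game is an assumption about how the table was built.

```python
from math import sqrt

def z_to_record(z, games=162):
    """Turn a z-score vs. Marcel into an equivalent W% and W-L record."""
    se = sqrt(0.25 / games)        # standard error of W% for a .500 team
    w_pct = 0.5 + z * se
    wins = round(w_pct * games)    # assumption: nearest whole win
    return w_pct, wins, games - wins

zips_pct, zips_w, zips_l = z_to_record(3.01)   # ZiPS row of the ERA table
tsn_pct, tsn_w, tsn_l = z_to_record(-1.18)     # Sporting News row
```

With the z-scores from the ERA table this gives 100-62 (.618) for ZiPS and 73-89 (.454) for Sporting News, matching the posted records.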

That said, I think there’s a fair chance I’m not understanding your question.


#10    J. Cross      (see all posts) 2010/03/10 (Wed) @ 12:03

Also, here’s Tango’s analysis from 2007 (do you still think it’s spoon bending?):

http://www.insidethebook.com/ee/index.php/site/comments/community_forecast_2007_pitcher_results/

and another discussion that was helpful in thinking about this:

http://www.insidethebook.com/ee/index.php/site/comments/forecast_evaluations/

The thing that I didn’t check for, but should have (b/c it’s irrelevant in correlations) is the slope of actual stat v. pred stat.  I could add slope columns to these tables.


#11    anon      (see all posts) 2010/03/10 (Wed) @ 12:46

J. #9:  Thanks.  You calculate the z-score first and then derive a win % from that based on the formula you gave?  I had it backwards.


#12    J. Cross      (see all posts) 2010/03/10 (Wed) @ 13:11

yes, that’s right.


#13    Tangotiger      (see all posts) 2010/03/10 (Wed) @ 13:34

Jared referred to a statement I made here in one of those threads (work I’m pretty proud of):

“Forecasting is the sabermetric equivalent of bending spoons. “

Yes, I still believe that.  Jared: I think your win% are a bit deceptive in terms of interpretations.  The correct way to interpret it is the way you described it: the chance that system X is better than Marcel, and put it in terms of 162 trials. 

So, ZiPS is 100-62 (62% wins), and so the chance that ZiPS is better than Marcel is the same as the chance that a team OBSERVED to be playing 100-62 is better than a team that is 81-81.  But, that does not mean that ZiPS wins 62% of its head-to-heads against Marcel.

I’d rather (or at least in addition) like to see the head-to-heads.  How often did ZiPS actually beat Marcel?  If you can send me your data, I can run it through the process I had for the hitters.

I do love what you did when you showed the effect of Steamer and PECOTA v Marcel.  Steamer was better than PECOTA but counted as being worse because we are more confident (due to similarity of Steamer to Marcel) that Steamer really is worse.  Very novel way to do it.  I just don’t know if more than 10% of the readers will pick up on that nuance.  Maybe more than 50% of the big regulars here get it, but the passing-reader might not appreciate it.
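
To illustrate that nuance: the thread does not say which test Jared used, so the sketch below swaps in one standard test for comparing two correlations that share a variable (the Meng, Rosenthal and Rubin z-test). All inputs are hypothetical, and the function name is made up.

```python
from math import atanh, sqrt

def dependent_corr_z(r_sys, r_base, r_between, n):
    """z for H0: the system and the baseline correlate equally with actuals.

    r_sys     -- correlation of system X with actual results
    r_base    -- correlation of the baseline (e.g. Marcel) with actual results
    r_between -- correlation between system X and the baseline
    n         -- number of players
    Assumed method (not confirmed by the source): Meng-Rosenthal-Rubin test.
    """
    r2_bar = (r_sys ** 2 + r_base ** 2) / 2
    f = min((1 - r_between) / (2 * (1 - r2_bar)), 1.0)
    h = (1 - f * r2_bar) / (1 - r2_bar)
    diff = atanh(r_sys) - atanh(r_base)      # Fisher z difference
    return diff * sqrt((n - 3) / (2 * (1 - r_between) * h))

# Hypothetical: two systems trail the baseline by the same amount,
# but one is much more similar to the baseline than the other.
z_similar = dependent_corr_z(0.30, 0.33, r_between=0.90, n=500)
z_different = dependent_corr_z(0.30, 0.33, r_between=0.60, n=500)
```

Under this test the same gap in correlation produces a larger |z| when the system tracks the baseline closely, which is the sense in which we can be more confident that a Marcel-like system really is worse.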


#14          (see all posts) 2010/03/10 (Wed) @ 14:12

I think that so long as one reader picked up on it I’m probably over 10%.  Just sent you the data.  If anyone else wants it just let me know.


#15    Zach      (see all posts) 2010/03/10 (Wed) @ 19:09

Jared, what’s the correlation between ZiPS’s ERA forecasts and actual ERA (or whichever system has the largest correlation)?


#16    Brian Cartwright      (see all posts) 2010/03/10 (Wed) @ 19:21

Even though Oliver did well in BB & SO for pitchers, you are testing the accuracy with BB/IP and SO/IP, but I calculated them as BB/PA and SO/PA.

Did you pick IP because that’s the common denominator in the data that was provided to you? I know I sent PA (or TBF, BFP, whatever you want to label it) in my data.

For Oliver
BB% = (BB-IBB)/(PA-IBB)
SO% = SO/(PA-IBB)
HR% = HR/(AB-SO+SF)
BH% = (SI+DO+TR)/(AB-SO+SF-HR)
XBH% = (DO+TR)/(SI+DO+TR)
TR% = TR/(DO+TR)
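
The rate definitions above translate directly to Python. This is only a sketch: the function name and the sample line are made up, and the keys follow Brian's abbreviations (SI, DO, TR for singles, doubles, triples).

```python
def oliver_rates(s):
    """Compute the component rates listed above from a dict of counting stats."""
    non_so_ab = s["AB"] - s["SO"] + s["SF"]   # at-bats ending in contact
    non_hr_hits = s["SI"] + s["DO"] + s["TR"]
    return {
        "BB%": (s["BB"] - s["IBB"]) / (s["PA"] - s["IBB"]),
        "SO%": s["SO"] / (s["PA"] - s["IBB"]),
        "HR%": s["HR"] / non_so_ab,
        "BH%": non_hr_hits / (non_so_ab - s["HR"]),
        "XBH%": (s["DO"] + s["TR"]) / non_hr_hits,
        "TR%": s["TR"] / (s["DO"] + s["TR"]),
    }

# Made-up pitching line, for illustration only.
line = {"PA": 800, "AB": 720, "BB": 65, "IBB": 5, "SO": 160, "SF": 5,
        "HR": 20, "SI": 120, "DO": 35, "TR": 5}
rates = oliver_rates(line)
```

Each rate is conditional on the previous outcomes not having happened, so the denominators shrink step by step from PA down to doubles plus triples.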


#17    J. Cross      (see all posts) 2010/03/10 (Wed) @ 19:49

Zach, ZiPS had a correlation of .337 between predicted ERA and actual ERA.  If you narrow the pool to the 461 players with prior MLB experience that goes up to .360.  Both numbers were the highest.

Brian, yes, I didn’t have PA #’s for some of the systems.

I could look at, say, K% as K/(IP*2.83 +H) or K/(IP*2.83 + H + BB) and see if the rankings change.  I don’t expect them to but I am unsure enough to try it.
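
A small Python sketch of those two denominators, assuming IP is in true decimal innings. The function name and the sample numbers are hypothetical.

```python
def est_k_rate(so, ip, h, bb=None):
    """K% with an estimated denominator in place of PA.

    Without bb: K / (IP*2.83 + H), roughly K per at-bat.
    With bb:    K / (IP*2.83 + H + BB), roughly K per plate appearance.
    """
    denom = ip * 2.83 + h
    if bb is not None:
        denom += bb
    return so / denom

# Made-up line: 180 SO, 200 IP, 190 H, 60 BB.
k_per_ab = est_k_rate(180, 200.0, 190)
k_per_pa = est_k_rate(180, 200.0, 190, bb=60)
```

Unlike a constant multiplier, adding H and BB changes the denominator differently for each pitcher, so these versions can in principle reorder pitchers relative to K/IP.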

btw, one thing that’s not making sense to me is how Oliver projected K/9 better than anyone, BB/9 as well as Marcel and HR/9 better than Marcel but ~FIP (just a combination of the three) not as well as Marcel.  Maybe just an odd fluke; I don’t have an explanation.


#18    Rally      (see all posts) 2010/03/10 (Wed) @ 23:52

Yeah, that doesn’t make sense - on the FIP.  Chone’s poor ranking on FIP corresponds to a poor ranking in walks.  I would have thought good ratings in K and HR would offset that somewhat. 

I’m surprised nobody beat Marcel in walks.  What do I do differently than Marcel?  My year weights and regression are a bit different, though not that different.  I use park factors and league adjustments.  Maybe the park factors are inconsistent year to year and throw things off.

Jared, can you send me the data?
rallymonkey (numeral five) at comcast dot net.

