THE BOOK--Playing The Percentages In Baseball

Saturday, March 13, 2010

Pitching Forecasts 2009: Head-to-Head

By Tangotiger, 11:49 PM

Results:
0.517 ZIPS
0.511 CHONE

0.507 CAIRO
0.504 STEAMER
0.502 FANTASTICS
0.501 PECOTA
0.500 MARCEL
0.495 OLIVER

0.486 SPORTING NEWS

Here’s what I did:
Marcel forecast Felix Hernandez with a 3.87 ERA.  Given his 238.2 innings, that would mean he’d allow 103 earned runs.  He actually allowed 66, making Marcel +37 too high.  Chone was +29 too high.  Steamer was +40 too high.  Oliver was +48 too high.  So, Marcel gets a W against Oliver, and L against Chone and a tie against Steamer.  I counted as a tie any difference that was within 4.
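
A minimal sketch of that head-to-head scoring, using the Felix Hernandez numbers above. The function names are hypothetical, and two details are assumptions the post does not spell out: that the systems are compared on absolute error, and that a gap of exactly 4 runs still counts as a tie.

```python
def earned_runs(era, innings):
    """Convert an ERA forecast into earned runs over the given innings."""
    return era * innings / 9.0


def head_to_head(err_a, err_b, tie_margin=4.0):
    """Return 'W', 'L' or 'T' for system A against system B.

    err_a and err_b are forecast-minus-actual earned runs. The system with
    the smaller absolute error wins, unless the gap between the two errors
    is within tie_margin (assumed inclusive).
    """
    gap = abs(err_a) - abs(err_b)
    if abs(gap) <= tie_margin:
        return "T"
    return "W" if gap < 0 else "L"


# Felix Hernandez, 2009: 238 2/3 innings, 66 earned runs allowed.
innings = 238 + 2 / 3
marcel_er = round(earned_runs(3.87, innings))  # 103
marcel_err = marcel_er - 66                    # +37

# Errors for the other systems, as quoted in the post.
errors = {"CHONE": 29, "STEAMER": 40, "OLIVER": 48}
for name, err in errors.items():
    print(name, head_to_head(marcel_err, err))
# CHONE L
# STEAMER T
# OLIVER W
```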

(Also note that I calibrated all the ER to match the actual 19531.  FWIW, Marcel forecasted 19390, which was one of the closest.)
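
The post does not say how the calibration was done. One simple possibility, shown here purely as an assumption, is to scale every forecast by a single factor so the league total matches. The function names are hypothetical.

```python
def calibration_factor(forecast_total, actual_total):
    """Multiplier that makes the forecast total equal the actual total."""
    return actual_total / forecast_total


def calibrate(forecast_runs, actual_total):
    """Scale each pitcher's forecast earned runs by the same factor."""
    factor = calibration_factor(sum(forecast_runs), actual_total)
    return [runs * factor for runs in forecast_runs]


# Marcel's total was 19390 against an actual 19531, so under this
# assumption every Marcel forecast would be nudged up by under 1%.
factor = calibration_factor(19390, 19531)
print(round(factor, 4))  # 1.0073
```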

I went through all 512 pitchers in the dataset provided to me by J Cross, and tallied up the wins, losses, and ties.  The winner was ZiPS: 73 wins, 56 losses, 383 ties.  52% of the time, it was better than Marcel.  (With a nod to Homer, this does mean that 48% of the time Marcel is better. D’oh.) Marcel really is in a group with the majority at .500.
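
The .517 in the results table is consistent with counting each tie as half a win. That convention is inferred from the numbers rather than stated in the post, so treat this as a sketch:

```python
def win_pct(wins, losses, ties):
    """Winning percentage with each tie counted as half a win (assumed)."""
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games


zips = win_pct(73, 56, 383)
print(f"{zips:.3f}")  # 0.517
```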

Chone continued its strong showing, and given that it led the way for hitting, Chone is the best forecasting system of the year, among those in Jared’s collection.


#1    Tangotiger      2010/03/14 (Sun) @ 00:08

If you make the tie as anything within 1 run, then here’s what you get:

0.520 CHONE
0.519 ZIPS
0.513 STEAMER

0.505 OLIVER
0.500 MARCEL
0.496 PECOTA
0.489 CAIRO

0.464 SPORTING NEWS
0.450 FANTASTICS

I think 4 runs is appropriate.


#2    Tangotiger      2010/03/14 (Sun) @ 00:36

If I limit it to only those pitchers with a reliability of at least 0.50 (i.e., those pitchers with the most MLB experience), those 286 pitchers accounted for two-thirds of the innings in 2009.  Here’s the W/L record against Marcel:

0.530 ZIPS

0.507 CAIRO
0.505 STEAMER
0.503 PECOTA
0.500 MARCEL
0.495 CHONE

0.486 SPORTING
0.484 FANTASTICS
0.481 OLIVER

It looks therefore that ZiPS shines big time on existing MLB pitchers, while Chone shines big time on young or rookie MLB pitchers.  Here’s how everyone did against Marcel with the reliability score under .50:

0.531 CHONE
0.524 FANTASTICS
0.513 OLIVER

0.507 CAIRO
0.502 STEAMER
0.500 MARCEL
0.500 ZIPS
0.498 PECOTA

0.487 SPORTING

So, Chone was great here, and Oliver looks good too.

Ideally, all you guys would get together and come up with a super-forecast where each of your insights would be brought to the table.

By the way, for those pitchers where Marcel simply guessed league-average, well:

0.549 FANTASTICS

0.500 STEAMER
0.500 MARCEL
0.500 SPORTING
0.500 CHONE
0.490 CAIRO

0.480 OLIVER
0.461 ZIPS
0.431 PECOTA

Well, that’s pretty bad.  Basically, NO ONE other than Fantistics knows how to forecast someone with zero MLB experience.  So, forgive me when I’m incredulous at claims of great MLB forecasts for pure rookies.  Flipping a coin will give me just as good a result.


#3    J. Cross      2010/03/14 (Sun) @ 01:08

I think it’s true that none of these systems projects ERA of rookies better than a coin flip but I think that could have as much to do with rookies just not throwing that many innings as it does with the weakness of the projections.

Try adjusting the reliability and IP scales on this graph:

http://7791439752913419574-a-1802744773732722657-s-sites.googlegroups.com/site/steamerprojections/steamer-files/09pitcherforecastevaluation.swf

Oliver did project K-rates for rookies better than other systems and by a significant margin.


#4    J. Cross      2010/03/14 (Sun) @ 01:18

Oh, I should have said this before but ca is CAIRO.

Interesting about Chone/ZiPS and their specialties.


#5    Tangotiger      2010/03/14 (Sun) @ 08:18

1. Nice graph software.

2. Remember that I’m only counting it as a W or L if the runs difference is over 4.  That means that if you have 27 IP and the forecasted ERA is 4.50, he’d have to put up more than 5.83.  Or if you have 54 IP, more than 5.17, etc.

3. While it’s nice that we can forecast components, the ONLY thing we care about is actual production.  Ideally, forecasting FIP is the best test because of the high correlation between FIP and actual pitching skill.  But to the extent that FIP doesn’t capture everything, then it’s ERA (or RA actually) that provides the real test.
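
The arithmetic in point 2 can be sketched as follows: a 4-run gap is worth 4 × 9 / IP points of ERA, so the bar shrinks as innings pile up. The function name is hypothetical, and the 4.50 baseline is simply the figure used in the example.

```python
def era_needed_for_decision(forecast_era, innings, run_margin=4.0):
    """ERA a pitcher must exceed for the run difference to pass run_margin."""
    return forecast_era + run_margin * 9.0 / innings


for ip in (27, 54):
    print(ip, round(era_needed_for_decision(4.50, ip), 2))
# 27 5.83
# 54 5.17
```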


#6    Tangotiger      2010/03/14 (Sun) @ 13:15

The winner was ZiPS: 73 wins, 56 losses, 383 ties.

This really means that 75% of the time, it makes almost NO DIFFERENCE whether you choose ZiPS or Marcel.  None.  This is why it’s virtually impossible to have a healthy margin over Marcel. 

The other 25% of the time, ZiPS wins about 56.6% of the time (73 of the 129 decisions).

Imagine your best case scenario… 70% of the time, you go for the tie, and the other 30% of the time, you go for a .600 win record.  Overall, that makes your win% .530.

See?  It’s almost impossible to beat Marcel by anything approaching a knockout.
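
Here is a small sketch of that decomposition, assuming (as the .517 figure implies) that a tie counts as half a win. The function name is hypothetical.

```python
def overall_win_pct(tie_share, decisive_win_rate):
    """Blend a .500 result on ties with the win rate on decisive matchups."""
    return tie_share * 0.5 + (1.0 - tie_share) * decisive_win_rate


# ZiPS against Marcel: 73 wins, 56 losses, 383 ties.
wins, losses, ties = 73, 56, 383
tie_share = ties / (wins + losses + ties)   # about 0.748
decisive_win_rate = wins / (wins + losses)  # about 0.566

print(round(overall_win_pct(tie_share, decisive_win_rate), 3))  # 0.517

# The best-case scenario: 70% ties, .600 on the rest.
print(round(overall_win_pct(0.70, 0.600), 3))  # 0.53
```

Because the tie share sits near 75%, even a large edge on the decisive matchups moves the overall figure only a few points off .500.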


#7    J. Cross      2010/03/14 (Sun) @ 13:26

Okay, all fair points.

But, doesn’t OLIVER’s ability to better predict the K% of guys who make the majors hint at a better ability to predict who should be in the majors?


#8    Rally      2010/03/14 (Sun) @ 14:25

“Well, that’s pretty bad.  Basically, NO ONE other than Fantistics knows how to forecast someone with zero MLB experience.”

That is not right.  It’s an artifact of the evaluation criteria being only major league performance.

The systems that project several thousand pitchers deep into the minors wind up saying a lot of guys are 5.50-6.50 ERA talents.  Below replacement level, which is why they are in the minors.  A few of these guys are really better than that; they get called up and outpitch their projection.  Most, of course, do not, and stay in the minors or get released.

If I wanted to “project to the test” then I’d set the system up to never project anyone more than 20-25% worse than MLB average.  If I did that I would be closer on a guy like Rick Porcello, coming straight from A ball to earning a full season in an MLB rotation.  But I’d also be telling you that there are thousands of pitchers in professional baseball who you can expect an ERA in the 4.50 range from, and that is flat out wrong.

Marcel cannot tell you, preseason, which rookies deserve a chance to pitch in the majors.  Whether the other systems do this well, or provide any value above traditional scouting in making this decision is an open question.  But Marcel cannot tell you anything here.


#9    Mike Fast      2010/03/14 (Sun) @ 14:28

Rally/8, correct.  There is a huge case of selection bias at play here, worse than anything we deal with in aging calculations.


#10    Rally      2010/03/14 (Sun) @ 15:08

For pitchers with no experience, I don’t see any point in comparing anything to Marcel since Marcel by design does not tell you anything.

How about this: For all rookie pitchers allowed to pitch in the majors, rank their preseason forecasts from best to worst.  Then compare the rank of their performance.  If I’ve got two pitchers projected at 4.85 and 5.25, and they actually pitch at 4.25 and 4.50, then I have identified who is better.  But Marcel beats me since he says they’ll both be exactly average.  Before we praise the monkey for his foresight, keep in mind that he projected 2000+ more pitchers to be just as good, I projected them at ERAs above 6.00, and no MLB team allowed them anywhere near a mound.
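
One way to make “compare the rank” concrete is Spearman’s rank correlation. That choice of statistic is an assumption here, not something named in the comment, and this bare-bones version assumes no tied values (which is exactly why a forecast that gives every rookie the same ERA cannot be scored this way).

```python
def ranks(values):
    """Rank values from lowest (1) to highest (n); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(projected, actual):
    """Spearman rank correlation for tie-free data (needs n >= 2)."""
    n = len(projected)
    if n < 2:
        raise ValueError("need at least two pitchers")
    d2 = sum((p - a) ** 2 for p, a in zip(ranks(projected), ranks(actual)))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


# The two pitchers from the example: projected 4.85 and 5.25,
# actual 4.25 and 4.50. The ordering was right, so the score is perfect.
print(spearman([4.85, 5.25], [4.25, 4.50]))  # 1.0
```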


#11    David Gassko      2010/03/14 (Sun) @ 15:39

Hey guys,

This is simple. All we have to do is project pitchers in A, AA, AAA or whatever. That is, tell me not only how you think a given pitcher will do in the MLB but also in AAA, AA, or A, depending on the level he plays at.

Then, at the end of the season, we’ll take every A pitcher and compare his A projection from each system to his actual A performance, each AA pitcher and compare his AA projection from each system to his actual AA performance and so forth.

Marcel, of course, projects everyone to be a league average major league pitcher, so it will look terrible at the minor league level. That’ll be reflected in such a test.
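
A rough sketch of how such a level-by-level test might be scored. The error measure (RMSE), the row layout, and the sample numbers are all assumptions made up for illustration; the proposal itself does not specify any of them.

```python
from collections import defaultdict
from math import sqrt


def rmse_by_level(rows):
    """rows: iterable of (level, projected_era, actual_era) tuples.

    Returns the root-mean-square error of the projections at each level.
    """
    squared = defaultdict(list)
    for level, projected, actual in rows:
        squared[level].append((projected - actual) ** 2)
    return {level: sqrt(sum(v) / len(v)) for level, v in squared.items()}


# Made-up rows, only to show the shape of the input.
rows = [
    ("A", 3.80, 4.10),
    ("A", 4.50, 4.20),
    ("AA", 4.00, 4.60),
    ("AAA", 4.30, 4.30),
]
for level, error in rmse_by_level(rows).items():
    print(level, round(error, 2))
```

Running the same function once per projection system would give one error figure per system per level, which is the comparison described above.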

The projections themselves shouldn’t be too hard to generate—just apply your MLEs in reverse. Would the people who do projections and are reading this thread be interested in doing something like this? If so, I’ll put something together.


#12    J. Cross      2010/03/14 (Sun) @ 15:55

Cool idea, David.  Steamer hasn’t put too much thought into MLEs but we’d be game just to give these other systems someone to beat up on if you like.

Hitters and Pitchers?  Just players without MLB experience?


#13    Tangotiger      2010/03/14 (Sun) @ 16:48

There’s no question we have a selection bias issue.  Marcel lives on that basis.

And the correct test is one that is based on WAR. Marcel will forecast all the pure-rookies at somewhere between 0 and 0.5 WAR.

But, the forecasters seem to take the view that:
1. they don’t forecast playing time, but
2. want their forecasts treated such that a reasonable playing time matches their forecasted talent level

So, say you have a pure-rookie pitcher forecasted with a 3.30 ERA.  If you REALLY believe that, then you have to forecast some level of IP, or barring that, have the user infer an IP based on that talent level.

Marcel forces all the unknown pitchers to be around the 0.50 WAR level. 

All we have to do is get the forecasters to forecast WAR, and then Marcel should be trounced.


#14    Newcomer      2010/03/15 (Mon) @ 19:47

There are two things I’d be curious to see in future comparisons.  One, how does a simple average of ZiPS and CHONE perform?  I know that people like to combine the available projections, seeing as more information is better than less information, but how much does that actually help?  The other idea is that when you sometimes include the previous season’s stats as a projection, I’d be curious to see how a three-year average would compare.  In other words, teasing out the weighting, aging, and any other adjustments, how close is the simple average (which people cite online frequently) to the projection systems?
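
Both ideas come down to an unweighted mean, which can be sketched in a couple of lines. The function name is hypothetical. The only real figures used are the Felix Hernandez errors quoted in the post for Chone and Steamer, since no ZiPS number for him is given.

```python
def blend(*forecasts):
    """Unweighted mean of any number of forecasts (or past-season values)."""
    return sum(forecasts) / len(forecasts)


# Because both systems are scored against the same actual total, the error
# of a blended forecast is just the blend of the individual errors.
# Felix Hernandez: Chone was +29 runs too high, Steamer +40.
print(blend(29, 40))  # 34.5
```

The same call with three arguments, one per season, would give the simple three-year average; whether either blend beats the individual systems is the open question.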

