THE BOOK--Playing The Percentages In Baseball

Monday, November 21, 2011

Clay’s housekeeping

Clay notes how embarrassing, and otherwise confusing, decades-old code can look. All I can say to that is: guilty! When you don’t follow standards, things look so messy a few years later that you not only avoid looking at the code, you sometimes end up rewriting it entirely.

That’s a lesson for you kids: get it right the first time, by taking your time. That’s why, for the last several years, I’ve included a “readme” file in every new folder I create. It’s basically what we call a “run book”: if someone comes in cold, they know exactly what needs to be done to start from scratch. It’s tremendously helpful.

Anyway, that’s not really the reason I linked to his article.  What caught my eye is this:

I’ve also been validating projection systems from the 2011 season. While I’m pleased with how my system (which ran with some of Nate Silver’s ideas on PECOTA, threw out some of them, replaced them with some of my own tools and approaches, resulting in a chimeric Sildavenverport monster) graded, I was also pretty shocked at just how little difference even the most complex systems made when compared to an ordinary three-year average.

This is exactly why I created Marcel some eight years ago:
http://www.tangotiger.net/archives/stud0346.shtml

It’s also why I thought so little of forecasting systems that I published the code so you could build it yourself. And I thought so little of them precisely because I spent countless hours trying to beat my own system. I’d come up with the basics, then think of different parameters and try combining them in different ways to improve it. Each time, the gains were so negligible that they were hardly worth the time.
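For readers who haven’t seen Marcel, here is a minimal Python sketch of the kind of weighted three-year average with regression to the mean that it is built on. It is not the published Marcel code: the 5/4/3 season weights and the 1,200 PA of league-average regression are values commonly cited for Marcel’s hitter projections, but treat them as assumptions here. The age adjustment and playing-time projection are left out, and `Season` and `marcel_style_woba` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Season:
    """One MLB season for a hitter (hypothetical record type)."""
    pa: int       # plate appearances
    woba: float   # weighted on-base average for that season


def marcel_style_woba(
    seasons: Sequence[Season],
    league_woba: float,
    weights: Sequence[float] = (5, 4, 3),  # assumed: most recent season weighted 5
    regression_pa: float = 1200,           # assumed: PA of league-average play blended in
) -> float:
    """Project next-season wOBA from up to three seasons, most recent first."""
    weighted_woba = 0.0
    weighted_pa = 0.0
    # zip() stops at the shorter input, so missing seasons simply contribute nothing
    for season, weight in zip(seasons, weights):
        weighted_woba += weight * season.pa * season.woba
        weighted_pa += weight * season.pa
    # Regress toward the mean by adding regression_pa of league-average wOBA
    weighted_woba += regression_pa * league_woba
    weighted_pa += regression_pa
    return weighted_woba / weighted_pa
```

For example, three 600-PA seasons at a .400 wOBA with a .320 league average project to about .389, while a player with no MLB seasons is projected at exactly league average. Because the regression term is a fixed number of PA, a part-timer is pulled toward the mean much harder than a regular.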

Even things like park factors, which I presumed would make a huge difference, hardly made a dent. And for pure rookies (players who had never played in MLB), systems designed with extreme intelligence on the matter (Rally, MGL, ZiPS, PECOTA) were barely any better than if we simply presumed the players were ALL THE SAME. (While Marcel uses league average out of convenience, it’s better to just use the first-year average, which is about 15 points of wOBA under league average.) Rookie projections, which rely on MLEs (major league equivalencies), are ripe for selection bias, and that makes it basically impossible for those systems to beat the most basic system. Not to mention that it makes an enormous difference whether the rookie is going to be a reliever or a starting pitcher.
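The rookie rule above is simple arithmetic. A minimal Python sketch, assuming wOBA on its usual scale (so “15 points” is 0.015) and using hypothetical function names: every player with no MLB history gets the same projection, either the PA-weighted first-year average computed from your own data or, lacking that, league average minus 15 points. It covers hitters only and ignores the reliever/starter split.

```python
from typing import Iterable, Optional, Tuple

ROOKIE_WOBA_GAP = 0.015  # "about 15 points" of wOBA below league average


def first_year_average_woba(first_year_seasons: Iterable[Tuple[int, float]]) -> float:
    """PA-weighted mean wOBA of players in their first MLB season.

    Each item is a hypothetical (plate_appearances, woba) pair.
    """
    seasons = list(first_year_seasons)
    total_pa = sum(pa for pa, _ in seasons)
    if total_pa <= 0:
        raise ValueError("need at least one first-year season with PA > 0")
    return sum(pa * woba for pa, woba in seasons) / total_pa


def rookie_prior_woba(
    league_woba: float,
    first_year_seasons: Optional[Iterable[Tuple[int, float]]] = None,
) -> float:
    """Same-for-everyone projection for a player with no MLB history."""
    if first_year_seasons is not None:
        # Preferred: the observed first-year average
        return first_year_average_woba(first_year_seasons)
    # Fallback: the rule of thumb of ~15 points under league average
    return league_woba - ROOKIE_WOBA_GAP
```

With a .320 league average and no first-year data, `rookie_prior_woba(0.320)` gives .305 for every rookie.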

It’s not as if I just took some position on the matter and am now defending it. I took this position because this is where the path has led me after countless hours spent studying the matter in as many ways as I can. And it’s been reaffirmed when testing the systems of other people, people smarter than me who have spent more time on this than I have, only to find those systems perhaps one step above Marcel. If you need a visual: we all started on Canal Street, and while Marcel is at Penn Station, those systems are on 35th Street, and Times Square is just out of our reach.
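Testing one system against another comes down to scoring each set of projections against what actually happened to the same players. A minimal Python sketch, assuming you have each system’s projected wOBA and the actual wOBA for a common pool of players, weighted by actual PA (a common choice, but an assumption here). A three-year average or Marcel-style projection serves as the baseline, and the elaborate system’s gain is the share of the baseline’s error it removes. The function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_rmse(
    projected: Sequence[float],
    actual: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Root-mean-square projection error, weighting each player (e.g. by actual PA)."""
    if not (len(projected) == len(actual) == len(weights)):
        raise ValueError("projected, actual and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    squared_error = sum(
        w * (p - a) ** 2 for p, a, w in zip(projected, actual, weights)
    )
    return math.sqrt(squared_error / total_weight)


def improvement_over_baseline(system_rmse: float, baseline_rmse: float) -> float:
    """Fraction of the baseline's error removed by the system (positive = better)."""
    return 1.0 - system_rmse / baseline_rmse
```

Scoring both systems on the same players and the same weights matters: a system that drops hard-to-project players from its list will look better than it is.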

Any forecaster honest with himself, and his readers, will attest to this.


2011/11/22 • Sabermetrics, Forecasting