
THE BOOK--Playing The Percentages In Baseball


Monday, January 04, 2010

Non-stathead Chronicles - Bleed Cubbie Blue

By Tangotiger, 01:03 PM

(Note: I quite enjoy these. I encourage all non-statheads to send me tough questions. Give me at least five questions, and preferably at least 10. The tougher the better: whatever it is that drives you batty. I’ll try to make some sense of it so we can move forward. These questions from Cubbie-Tim seem pretty tame to me compared to the Silva questions. We also have 217 (and counting) Q&As in our mailbag: http://www.tangotiger.net/wiki/index.php?title=Mailbags . If you haven’t checked those out, please do.)

Some background for readers: I saw a 232-post thread that started with a link to the Mike Silva Chronicles at http://www.bleedcubbieblue.com . I asked one of the commenters, who seems to be on the anti- side, to send me his questions, and said I would answer them. Here they are, from Cubbie-Tim:


1. Who decides what data is “minimal effect” or “levels out” so to speak? Is that different data each time based on who the SABR is or a standard piece of data?

Just a correction, based on all the questions that follow: SABR is the Society for American Baseball Research. It has little to do with sabermetric research. The word you intend to use is sabermetrician or saberist.

As to your question, there is no one person who decides anything. The only thing researchers do is interpret the data. That’s our job. It is not to advance a theory and look for data to support it. Saberists follow the evidence. If Joe Carter has 115 RBIs, a piece of data that is factual, our job is to try to explain how someone with a SLG below .400 could have done that.

2. If different SABR’s use different pieces of data (for one example I linked to a SABR says to use CERA and another says not to), what does this do to change how it is viewed by “nonstatheads” since there is not a constant?

Ideally, saberists will rally around some principles, so we can move forward to finding other principles.  When that doesn’t happen, then you have to look at the merits of the arguments and decide who makes his case the best.  And if that’s still not enough, then you have to hedge your bets and take a bit of everything.

As for catcher ERA specifically, you certainly can’t use it for a single season.  Like most things, the more data you have, the more the signal sticks out from the noise.  The person to listen to is the one who tells you how much signal is in the noise.
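To make “how much signal is in the noise” concrete, here is a minimal sketch assuming a simple binomial model of a rate stat (an on-base rate, say, rather than catcher ERA, which has no clean model like this). The observed spread across players is treated as true-talent variance plus random noise, and the signal share is the true-talent part divided by the total. The league rate of .330 and the true-talent spread of .030 are illustrative assumptions, not measured values.

```python
import math


def noise_sd(true_rate: float, trials: int) -> float:
    """Random spread of an observed rate around the true rate (binomial model)."""
    return math.sqrt(true_rate * (1.0 - true_rate) / trials)


def signal_share(true_sd: float, true_rate: float, trials: int) -> float:
    """Share of the observed player-to-player variance that is real talent.

    Assumes observed variance = true-talent variance + binomial noise variance.
    """
    true_var = true_sd ** 2
    noise_var = noise_sd(true_rate, trials) ** 2
    return true_var / (true_var + noise_var)


# Illustrative assumptions: league rate .330, true-talent spread .030.
for trials in (100, 600, 3000):
    print(trials, round(noise_sd(0.330, trials), 3),
          round(signal_share(0.030, 0.330, trials), 2))
```

Under these assumptions the noise shrinks as the number of trials grows, so the share of the observed spread that reflects real skill climbs toward 1. That is the sense in which more data makes the signal stick out.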

3. Does a SABR believe leaving any data out truly gives the best picture, since even small data amounts can skew a stat? I know that in most any industry a variance is attached, just curious more than anything.

Leaving data out gives the best picture?  Well, if there’s a reason for leaving data out (say Coors data) because it’s a way to handle the bias, that’s ok.  You could try to correct for the bias by keeping the data as well.  That’s ok too.  You just have to tell the readers what you are doing and why you are doing it.  You make your case, and let the market judge you on the merits of the argument (not on their prejudices).
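The two options described above can be sketched side by side. This is a minimal illustration, assuming a single multiplicative park factor (1.00 for a neutral park) applied to runs scored in the home park; the 480/360 run totals and the 1.25 factor are hypothetical numbers, not real Coors data. Option 1 drops the home games; option 2 keeps them and deflates them.

```python
from dataclasses import dataclass


@dataclass
class Split:
    runs: float  # runs scored in these games
    games: int


def runs_per_game_road_only(road: Split) -> float:
    """Option 1: handle the park bias by leaving the home data out."""
    return road.runs / road.games


def runs_per_game_adjusted(home: Split, road: Split, park_factor: float) -> float:
    """Option 2: keep the home data, deflated by an assumed park factor.

    park_factor is assumed to be home-park scoring relative to a neutral park
    (1.00 = neutral, above 1.00 = hitter-friendly).
    """
    neutral_home_runs = home.runs / park_factor
    return (neutral_home_runs + road.runs) / (home.games + road.games)


# Hypothetical numbers for a team in a hitter-friendly park.
home = Split(runs=480, games=81)
road = Split(runs=360, games=81)
print(round(runs_per_game_road_only(road), 2))
print(round(runs_per_game_adjusted(home, road, park_factor=1.25), 2))
```

Both give a park-neutral estimate (about 4.44 versus 4.59 runs per game with these made-up inputs); the point is to tell readers which one you chose and why.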

4. Why is it that “statheads” will say that intangibles are of little to no importance, and “nonstatheads” are supposed to just accept it blindly? When I ask that, I never get a straight answer.

You say it as if there’s a 100% position for each “group”.  First off, each person is an individual, and no one follows the group blindly.  If you want to say there is a predisposition to lean toward one side, that’s ok.  That’s a far cry from the extreme position your statement suggests.

To the specific claim of intangibles: of course they are important. The question is not whether they are important, but how you identify them BEFORE the fact. Say for example your star goalie is mired in a messy divorce, with his spouse calling him minutes before game time. You are in the playoffs, and are making a run for the Stanley Cup. So, you know, for a fact, that this goalie, a human being, has his emotions in play. They are going to surface, in some form or other, at some point during the day. But how do you know how he’s going to handle it during game time? Is it going to cost his team? Is it going to make him more resilient? Are you willing to bet, with money, on your opinion? Martin Brodeur’s Devils won the Stanley Cup that year. AFTER the fact, we can say that his powers of concentration are superb. Would you have bet on it before the playoffs started?

The question is not if something exists.  No one denies the existence.  The question is one of identification.  How do you identify it, how do you find it, how do you measure it.  And once you do, what do you do with this?  And if you can’t identify it, what do you do with it?

5. Why does a “nonstathead” accept the importance of stats and projections, but many “statheads” will not agree on that?

A non-stathead accepts the importance of stats?  And projections?  If your question is what you intended, then I can’t agree this is the case.  If your question is meant to be flipped, then I can’t speak for the position of non-statheads.  The only thing I would say is that a forecast is more like an over/under bet.  It’s something that you are about equally sure is just as likely to happen more than that line as it is to happen below that line.
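One way to see the over/under framing: if a forecast line sits at the median of the outcome distribution, the result is about as likely to land above it as below it. This sketch assumes a hitter’s times on base over 600 PA follow a binomial distribution with a true on-base rate of .340 (both numbers are assumptions for illustration), and computes the exact probabilities on either side of a line of 204.

```python
from math import comb


def prob_above_below(true_rate: float, trials: int, line: int) -> tuple[float, float]:
    """Exact binomial chances that the final count lands above / below the line."""
    pmf = [
        comb(trials, k) * true_rate**k * (1.0 - true_rate) ** (trials - k)
        for k in range(trials + 1)
    ]
    below = sum(pmf[:line])      # outcomes 0 .. line-1
    above = sum(pmf[line + 1:])  # outcomes line+1 .. trials
    return above, below


# Assumed: 600 PA, true on-base rate .340, so the line is 0.340 * 600 = 204.
above, below = prob_above_below(0.340, 600, 204)
print(f"over: {above:.3f}  under: {below:.3f}  push: {1 - above - below:.3f}")
```

Both sides come out a little under 50%, with the remainder being the chance of landing exactly on the line, much like a push on an over/under bet.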

6. As SABR stats evolve almost annually (UZR, for example), why do some SABRs continue to use the old version while others use the newer one, which results in SABRs getting different results for the same question? (So to speak; I could not think of a better way to phrase this.)

Some saberists are stubborn. Any half-assed metric based on play-by-play (PBP) stats is better than one that is not. A PBP metric will say it’s important to know one, or all, of: where a ball was hit, how hard it was hit, which park it was hit in, who the pitcher was, who the batter was, and what the base/out situation was. A non-PBP metric will ignore all that information, and instead try to fill in the blanks in some other way. So, I would reject a non-PBP metric if a PBP metric exists.

Some of the parameters being used are not objective, but subjective. So there could be a good reason for not using some of that data. Even the location of a batted ball is not so objective: two different people can look at the same batted ball and place it 10 or 20 feet apart. This is not that uncommon.

As for the various competing PBP metrics (UZR, PMR, Dewan, among others), each makes decisions as to how much each of the parameters matters for a particular batted ball. Using as many parameters as they can, they try to establish a baseline: the chance that a particular batted ball would have been caught by an average fielder. As you can imagine, finding common ground is a tough job.
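The baseline idea can be sketched in a few lines. This is a deliberately simplified illustration, not the actual UZR, PMR, or Dewan method: each batted ball is reduced to a hypothetical (zone, speed) bucket, the league-average catch rate in each bucket is the baseline, and a fielder is credited with catches made minus catches expected. Real systems use many more parameters (park, batter, pitcher, base/out state) and finer location data.

```python
from collections import defaultdict

# A batted ball reduced to (zone, speed, caught). The zone and speed labels are
# hypothetical buckets; real systems use many more parameters.
BattedBall = tuple[str, str, bool]


def baseline_catch_rates(league_balls: list[BattedBall]) -> dict[tuple[str, str], float]:
    """League-average chance that a ball in each (zone, speed) bucket is caught."""
    caught: dict[tuple[str, str], int] = defaultdict(int)
    total: dict[tuple[str, str], int] = defaultdict(int)
    for zone, speed, was_caught in league_balls:
        total[(zone, speed)] += 1
        caught[(zone, speed)] += int(was_caught)
    return {key: caught[key] / total[key] for key in total}


def plays_above_average(fielder_balls: list[BattedBall],
                        baseline: dict[tuple[str, str], float]) -> float:
    """Catches made minus catches an average fielder would be expected to make."""
    return sum(int(was_caught) - baseline[(zone, speed)]
               for zone, speed, was_caught in fielder_balls)


# Made-up league data: hard-hit balls in zone A caught 3 of 10, soft ones 3 of 4.
league = ([("A", "hard", True)] * 3 + [("A", "hard", False)] * 7
          + [("A", "soft", True)] * 3 + [("A", "soft", False)])
baseline = baseline_catch_rates(league)
fielder = [("A", "hard", True), ("A", "hard", True), ("A", "soft", False)]
print(round(plays_above_average(fielder, baseline), 2))
```

In practice the fielder’s own chances are part of the league data, so every bucket he sees has a baseline. The finer the buckets, the more balls each one needs before its baseline is not itself mostly noise.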

Ideally, you would also have a human observer give his opinion when a ball was hit, so that we don’t have to infer so much, and we could rely on stringers to tell us how tough a ball actually was to field.  But, we’ll get to that point soon enough. 

7. Where would you recommend newbies or novice stat people to go for insight to the SABR world and how the various stats and projections have evolved?

I suppose my wiki is one such place:
http://www.tangotiger.net/wiki/

I’d recommend the annual Hardball Times:
http://www.hardballtimes.com

I would not recommend my book until you actually have some saber-leanings.

#1 2010/01/04 (Mon) @ 14:14

“The only thing I would say is that a forecast is more like an over/under bet.  It’s something that you are about equally sure is just as likely to happen more than that line as it is to happen below that line.”

A little off-topic, but does anyone have any data on distribution of error terms for the major projection systems?  Are median or mean errors zero for any?  Which ones?
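Checking whether a projection system’s errors center on zero is simple once forecasts and actuals sit side by side. A minimal sketch, using made-up numbers purely to show the calculation (they are not results for any real system): the error is actual minus forecast, and a median error near zero means the forecast behaves like a fair over/under line.

```python
from statistics import mean, median


def error_summary(forecasts: list[float], actuals: list[float]) -> tuple[float, float]:
    """Mean and median of (actual - forecast); near zero means no systematic lean."""
    if len(forecasts) != len(actuals):
        raise ValueError("need one actual per forecast")
    errors = [actual - forecast for forecast, actual in zip(forecasts, actuals)]
    return mean(errors), median(errors)


# Made-up OBP forecasts and outcomes, purely to show the arithmetic.
mean_err, median_err = error_summary([0.350, 0.320, 0.300, 0.340],
                                     [0.360, 0.300, 0.310, 0.330])
print(f"mean error {mean_err:+.4f}, median error {median_err:+.4f}")
```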


#2 shawndgoldman 2010/01/04 (Mon) @ 14:19

Thanks for this, Tom! I really appreciate it. I’ve recently been brought on to BCB to be their “statistics expert.” I want to write some posts introducing that community to the saber-minded approach and to bring them “up to speed” on the various tools available. The community is for the most part not well-versed in these types of analyses, but they are genuinely curious.

On a related note, I’m also in the process of applying for faculty positions, and if I get one I’d like to try to teach a course on this stuff to fulfill the “generic non-science major” requirement. The target demographic would be similar: interested people that have little to no background in these tools.

The biggest thing I struggle with is: where do I start? Your answer to one of these questions has given me a good idea:

Like most things, the more data you have, the more the signal sticks out from the noise.  The person to listen to is the one who tells you how much signal is in the noise.

To me, one of the main issues we face is getting people to understand how statisticians and scientists approach error and uncertainty. IMO, that single issue drives a lot of the debates between various parties. The “inaccuracy” of projections, the improvement of stats with time, the rejection of applying intangibles to expectations… these things all are related to error and noise in a fundamental way.

Is this a good starting place? Error and uncertainty? To me, it seems to provide a strong context for everything else. And if I’m teaching one science course to a bunch of non-science majors, it’s one of the principles I want them to retain years after the course is finished.


#3 Mitch 2010/01/04 (Mon) @ 14:27

Don’t know if you guys would agree, but I found the Baseball Prospectus book Baseball Between the Numbers a great place for a not-yet-stathead to start.

http://www.amazon.com/Baseball-Between-Numbers-Everything-About/dp/0465005969


#4 Tangotiger 2010/01/04 (Mon) @ 14:38

Mitch, I’ve often recommended that book. I would say it is more SABR 201, with The Book being SABR 301.

For novices at the SABR 101 level, Eric’s Bridging the Statistical Gap might be the best bet.

***

Bill:
http://www.insidethebook.com/ee/index.php/site/comments/community_forecast_2007_preliminary_results/

http://www.insidethebook.com/ee/index.php/site/comments/community_forecast_2007_pitcher_results/

http://www.insidethebook.com/ee/index.php/site/comments/evaluating_the_2008_forecasting_systems/

http://www.insidethebook.com/ee/index.php/site/comments/forecast_evaluations/

***

Shawn: I’d say there are two basic approaches: interpreting the data (as you are thinking it), and asking basic questions (and using data to find the answers).

Creating a metric for its sake doesn’t help.  It should really be thought of as a byproduct or toy (which you can then refine for more purposes).

Do a search for “syllabus” in this blog.  I just posted to a thread by Justin on the topic of teaching.


#5 shawndgoldman 2010/01/04 (Mon) @ 15:53

Thanks, Mitch and Tango! It’s much appreciated. To be honest, I would probably be well-served by sitting down with these books myself, first.


#6 cubbietim 2010/01/04 (Mon) @ 23:05

Thanks Tango.  I appreciate how you went into details about this and shed more light on it for me.  I appreciate you taking the time to answer my questions as well, and look forward to learning more about this as I continue to read on it.


#7 2010/01/05 (Tue) @ 16:49

As for diving into analytics, I spent most of my infancy over at FanGraphs. While their pieces have a penchant for various biases, they do a great job of immersing you in the data without just saying “here’s a bunch of charts...good luck”. It might be inefficient, but it’s definitely more fun to learn by playing with the data they have.

Their willingness to constantly add new stats and tweak old ones encourages me that they won’t settle too far into that “stubborn” problem you highlighted in your answer to question 6. I’ve become increasingly disillusioned with BP because of that (I’m hoping the new things they’re doing change that).

I mentioned where I spent my infancy, but I’m still a mere toddler…

