THE BOOK--Playing The Percentages In Baseball


Sunday, February 05, 2012

Is Nate Silver a lot more certain than he lets on?

By Tangotiger, 07:04 PM

I’ve been following Nate’s mean forecasts for the five primaries held so far. He has made 25 predictions over those five contests (and obviously, they are interdependent). His worst forecast was Santorum in Iowa: a mean forecast of 19.1 against an actual result of 24.6, a miss of 5.5 points. His average (absolute) error over those 25 forecasts is 2.34 points, and the root-mean-square error, which serves as one standard deviation of his misses, is 2.71 points.

However, his posted uncertainty level is much higher than that. Let’s take Mitt in Iowa as an example. Nate gave him a mean forecast of 21.9, with a range of 13 to 32 (a spread of 19 points). In another article, he notes that his range runs from the 5th to the 95th percentile. If the forecast distribution is normal, those levels sit at +/-1.645 standard deviations from the mean (a total spread of 3.29 standard deviations). This means that one standard deviation for Romney is 19/3.29, or 5.8 points.
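The interval-to-SD conversion can be written as a short sketch. It is in Python (the post has no code), the helper name `implied_sd` is hypothetical, and, like the calculation above, it assumes the forecast distribution is normal, so the 5th and 95th percentiles sit about 1.645 standard deviations either side of the mean.

```python
from statistics import NormalDist

def implied_sd(lo: float, hi: float, coverage: float = 0.90) -> float:
    """SD implied by a central interval, ASSUMING a normal forecast distribution.

    A central 90% interval runs from the 5th to the 95th percentile, which for
    a normal curve is mean +/- z * SD with z = inv_cdf(0.95), about 1.645.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return (hi - lo) / (2 * z)

print(round(implied_sd(13.0, 32.0), 1))  # Mitt in Iowa: 5.8
```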

So, I calculated this for all 25 forecasts, and the implied standard deviation averaged 4.6 points. However, as I noted earlier, the actual observed standard deviation of his misses was 2.71 points. This means that Nate’s uncertainty band is 4.6/2.71, or about 1.7 times, too wide.

Now, either he made a calculation error on his historical data (making his uncertainty band almost twice as wide as it should have been), or this year, things simply worked out a lot closer to the mean than expected, just by luck (after all, we only have 25 data points).

Here’s the data for those who want to take a crack at it:


1SD is simply the difference between the 95% and 5% columns, divided by 3.29. The diff column is Results minus mean.

Results  mean   5%     95%    1SD   diff   State  Person
24.6     19.1   10.0   29.0   5.8   5.5    IA     Santorum
22.9     18.6   11.0   27.0   4.9   4.3    NH     Paul
18.6     15.0   8.0    23.0   4.6   3.6    NV     Paul
17.0     13.9   7.0    22.0   4.6   3.1    SC     Santorum
11.1     8.1    4.0    14.0   3.0   3.0    NV     Santorum
24.5     21.9   13.0   32.0   5.8   2.6    IA     Mitt
31.9     29.5   20.0   38.0   5.5   2.4    FL     Newt
46.4     44.2   33.0   51.0   5.5   2.2    FL     Mitt
40.4     38.7   26.0   49.0   7.0   1.7    SC     Newt
39.3     38.5   27.0   47.0   6.1   0.8    NH     Mitt
21.4     21.0   12.0   31.0   5.8   0.4    IA     Paul
16.9     17.0   9.0    26.0   5.2   -0.1   NH     Huntsman
10.3     10.5   5.0    18.0   4.0   -0.2   IA     Perry
0.7      1.2    0.0    2.0    0.6   -0.5   NH     Perry
13.3     13.9   8.0    21.0   4.0   -0.6   FL     Santorum
27.8     29.3   19.0   39.0   6.1   -1.5   SC     Mitt
13.3     15.1   8.0    24.0   4.9   -1.8   IA     Newt
9.4      11.5   5.0    19.0   4.3   -2.1   NH     Newt
13.0     15.6   8.0    24.0   4.9   -2.6   SC     Paul
9.4      12.3   6.0    20.0   4.3   -2.9   NH     Santorum
5.0      7.9    2.0    15.0   4.0   -2.9   IA     Bachmann
22.7     25.6   20.0   32.0   3.6   -2.9   NV     Newt
0.6      3.8    0.0    8.0    2.4   -3.2   IA     Huntsman
47.6     51.3   41.0   56.0   4.6   -3.7   NV     Mitt
7.0      11.0   5.0    18.0   4.0   -4.0   FL     Paul
1.8                                        SC     Other
1.5                                        NH     Other
1.3                                        FL     Other
0.3                                        IA     Other
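As a check, the summary figures in the post can be recomputed from the table. This Python sketch takes each miss as result minus mean, uses the root-mean-square miss as the observed standard deviation, and divides each 5%-95% spread by 3.29 as described above (so it inherits the normal assumption). The "Other" rows are left out because they carry no forecast; the variable names are illustrative only.

```python
import math

# (result, mean forecast, 5th pct, 95th pct) for the 25 named forecasts;
# the "Other" rows have no forecast and are left out.
ROWS = [
    (24.6, 19.1, 10.0, 29.0), (22.9, 18.6, 11.0, 27.0), (18.6, 15.0, 8.0, 23.0),
    (17.0, 13.9, 7.0, 22.0), (11.1, 8.1, 4.0, 14.0), (24.5, 21.9, 13.0, 32.0),
    (31.9, 29.5, 20.0, 38.0), (46.4, 44.2, 33.0, 51.0), (40.4, 38.7, 26.0, 49.0),
    (39.3, 38.5, 27.0, 47.0), (21.4, 21.0, 12.0, 31.0), (16.9, 17.0, 9.0, 26.0),
    (10.3, 10.5, 5.0, 18.0), (0.7, 1.2, 0.0, 2.0), (13.3, 13.9, 8.0, 21.0),
    (27.8, 29.3, 19.0, 39.0), (13.3, 15.1, 8.0, 24.0), (9.4, 11.5, 5.0, 19.0),
    (13.0, 15.6, 8.0, 24.0), (9.4, 12.3, 6.0, 20.0), (5.0, 7.9, 2.0, 15.0),
    (22.7, 25.6, 20.0, 32.0), (0.6, 3.8, 0.0, 8.0), (47.6, 51.3, 41.0, 56.0),
    (7.0, 11.0, 5.0, 18.0),
]

errors = [result - mean for result, mean, _, _ in ROWS]
mae = sum(abs(e) for e in errors) / len(errors)             # average miss
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # SD of misses around zero
# Implied SD per forecast: 5%-95% spread / 3.29 (normal assumption)
avg_implied = sum((hi - lo) / 3.29 for _, _, lo, hi in ROWS) / len(ROWS)

print(f"{mae:.2f} {rmse:.2f} {avg_implied:.1f} {avg_implied / rmse:.1f}")
# 2.34 2.71 4.6 1.7
```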

Blogging
#1    J. Cross      2012/02/05 (Sun) @ 19:43

Interesting, although I don’t think these are independent.  Certainly, Santorum in Iowa and Romney in Iowa aren’t independent (if you’re off by a lot in one you’re more likely to be off by a lot in the other).  I think it *might* be fair to call the Santorum/Iowa and Santorum/South Carolina errors independent but there may be reasons why that isn’t so either.


#2    Tangotiger      2012/02/05 (Sun) @ 20:14

Right, within each vote, they are interdependent by definition.

However, it still doesn’t take away from the fact that his results are a lot closer than he’s predicting they should be.

We just don’t know why they are closer: luck or timid modeling of the uncertainty levels.


#3    J. Cross      2012/02/05 (Sun) @ 20:50

I’m not sure I’m following you here. Let’s say you roll a die many times and you want to see how accurate your predictions for the number of 5s and the number of 6s are. The number of 5s and the number of 6s aren’t independent (there’s a correlation of -0.2). Isn’t this the same thing? The Romney% and Santorum% in Iowa are inversely correlated because the percentages must sum to 100%.
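The -0.2 figure for dice can be checked by simulation. A minimal Monte Carlo sketch, assuming a fair die; the function name, roll count, trial count, and seed are arbitrary choices for illustration. The exact correlation is -0.2 for any number of rolls, so the estimate should land close to that.

```python
import random

def count_correlation(n_rolls: int, n_trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of corr(number of 5s, number of 6s) in n_rolls fair-die rolls."""
    rng = random.Random(seed)
    fives, sixes = [], []
    for _ in range(n_trials):
        rolls = rng.choices(range(1, 7), k=n_rolls)
        fives.append(rolls.count(5))
        sixes.append(rolls.count(6))
    mean_f = sum(fives) / n_trials
    mean_s = sum(sixes) / n_trials
    cov = sum((f - mean_f) * (s - mean_s) for f, s in zip(fives, sixes)) / n_trials
    var_f = sum((f - mean_f) ** 2 for f in fives) / n_trials
    var_s = sum((s - mean_s) ** 2 for s in sixes) / n_trials
    return cov / (var_f * var_s) ** 0.5

print(round(count_correlation(n_rolls=60, n_trials=10_000), 2))
```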

Basically, I think that there are 5 events that were predicted with better accuracy than expected, not 25.


#4    Tangotiger      2012/02/05 (Sun) @ 20:55

I keep saying they are interdependent.

But it’s not a 2-person choice on each ballot; there are 4 to 7 candidates. So, rather than saying it’s 25 choices, maybe it’s best to describe it as 20 choices.

Nonetheless, how are we to evaluate Nate’s uncertainty level then?  The actual results are much closer to the mean forecast than Nate is letting on.


#5    J. Cross      2012/02/05 (Sun) @ 21:29

Ah, sorry, my eyes were tricking me and I was reading interdependent as independent and wondering what you were thinking!

How to analyze it? I’m not sure. I think we could estimate the correlations. If Mitt and Santorum are each expected to get 20% in Iowa, then Santorum is expected to get 1 in every 4 votes that Mitt doesn’t get, so it’s a correlation of 0.25.


#6    J. Cross      2012/02/05 (Sun) @ 21:30

Sorry, a correlation of -0.25, I mean.
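The 1-in-4 reasoning matches the standard multinomial result. If each vote is treated as an independent draw with category probabilities p_i and p_j (an idealization of a real electorate), then Cov(N_i, N_j) = -n p_i p_j and Var(N_k) = n p_k (1 - p_k), so the correlation is -sqrt(p_i p_j / ((1 - p_i)(1 - p_j))). A sketch, with a hypothetical helper name:

```python
import math

def share_correlation(p_i: float, p_j: float) -> float:
    """Correlation between two categories' counts under a multinomial model.

    Cov(N_i, N_j) = -n*p_i*p_j and Var(N_k) = n*p_k*(1 - p_k), so n cancels.
    """
    return -math.sqrt(p_i * p_j / ((1 - p_i) * (1 - p_j)))

print(round(share_correlation(0.20, 0.20), 2))   # -0.25: Mitt and Santorum at 20% each
print(round(share_correlation(1 / 6, 1 / 6), 2))  # -0.2: 5s vs 6s on a fair die
```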


#7    J. Cross      2012/02/05 (Sun) @ 21:52

Not sure if this is correct, but if I sum up the chi-squareds (the squared z-scores), I get 10.86. Summing the degrees of freedom (5 - 1 for each contest, times 5 contests = 20), a chi-square calculator tells me that there would be more than this amount of error 95.0% of the time by chance. So, if I’ve done this right, this is an unusually small amount of error given his model.
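The calculation can be reproduced from the table. This sketch sums the squared z-scores (diff divided by 1SD) and computes the chi-square tail probability with 20 degrees of freedom, using the closed form that holds for an even number of degrees of freedom so that no statistics library is needed. Since the z-scores within a contest are not independent, the chi-square reference is only approximate; the helper name is hypothetical. The output should land near the 10.86 and 95% quoted above.

```python
import math

# (diff, 1SD) pairs from the table, one per named forecast
PAIRS = [
    (5.5, 5.8), (4.3, 4.9), (3.6, 4.6), (3.1, 4.6), (3.0, 3.0), (2.6, 5.8),
    (2.4, 5.5), (2.2, 5.5), (1.7, 7.0), (0.8, 6.1), (0.4, 5.8), (-0.1, 5.2),
    (-0.2, 4.0), (-0.5, 0.6), (-0.6, 4.0), (-1.5, 6.1), (-1.8, 4.9), (-2.1, 4.3),
    (-2.6, 4.9), (-2.9, 4.3), (-2.9, 4.0), (-2.9, 3.6), (-3.2, 2.4), (-3.7, 4.6),
    (-4.0, 4.0),
]

def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Closed form for even df: exp(-x/2) * sum over i < df/2 of (x/2)**i / i!
    """
    if df % 2:
        raise ValueError("closed form only holds for even df")
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term             # add (x/2)**i / i!
        term *= half / (i + 1)    # advance to (x/2)**(i+1) / (i+1)!
    return math.exp(-half) * total

stat = sum((d / sd) ** 2 for d, sd in PAIRS)  # sum of squared z-scores
p_more = chi2_sf_even_df(stat, df=20)         # chance of more error than this
print(f"{stat:.2f} {p_more:.3f}")
```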


#8    Telnar      2012/02/06 (Mon) @ 12:33

My gut reaction is that primaries in separate states are significantly dependent on each other.

One way for a candidate to suffer a sudden huge swing compared to the polling to date (the primary basis on which Nate’s model is built) is to have an extreme gaffe.  Gaffes are rare events, and could be modeled as a mostly unobserved talent of the candidates.  If Romney2012 (who might have a different talent level than Romney2008) has a true talent of gaffing at a rate of 0.01/month, his polling data prior to a gaffe opportunity will be more predictive than if his true talent is 0.05/month, but neither possibility is high enough for us to expect to observe it accurately in a year of active campaigning.

There are other factors which could easily be seen as Nate getting lucky.  For example, the revelations about past publications under Ron Paul’s name came at a time when there were enough polls remaining before a primary that they could be incorporated in the model predictions.  Had they occurred the day before a ballot, we would expect a less accurate forecast for all candidates in that primary.


#9    Tangotiger      2012/02/06 (Mon) @ 13:23

I’d like Nate to release his historical data, so that we can see that his model (though obviously fitted with the data we are looking at) does have these wide-ranging forecasts.


#10    Telnar      2012/02/08 (Wed) @ 10:30

If you feel like rerunning the variance calculation for 2012 including yesterday’s data (when Romney did much worse than expected), I suspect that it won’t be nearly as clustered.

Both this data and the rare-gaffe model make me suspect that the normal curve doesn’t do a good job of approximating primary results. Perhaps a fat-tailed distribution would be more appropriate. I doubt that we have enough 2012 data to do a decent estimate of the kurtosis, but if we got the historical data as you suggested in #9....
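For reference, a sample excess-kurtosis estimate is simple to compute; fat tails show up as values above zero. This sketch applies the plain moment estimator (no small-sample correction) to the 2012 z-scores from the table. With only 25 interdependent points the estimate is very noisy, which is exactly the limitation noted above. The function name is hypothetical.

```python
from typing import Sequence

def excess_kurtosis(xs: Sequence[float]) -> float:
    """Moment estimator of excess kurtosis (no small-sample correction).

    About 0 for normally distributed data; positive values mean fatter tails.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2**2 - 3.0

# diff and 1SD columns from the table (25 named forecasts, same row order)
DIFFS = [5.5, 4.3, 3.6, 3.1, 3.0, 2.6, 2.4, 2.2, 1.7, 0.8, 0.4, -0.1, -0.2,
         -0.5, -0.6, -1.5, -1.8, -2.1, -2.6, -2.9, -2.9, -2.9, -3.2, -3.7, -4.0]
SDS = [5.8, 4.9, 4.6, 4.6, 3.0, 5.8, 5.5, 5.5, 7.0, 6.1, 5.8, 5.2, 4.0,
       0.6, 4.0, 6.1, 4.9, 4.3, 4.9, 4.3, 4.0, 3.6, 2.4, 4.6, 4.0]

z_scores = [d / s for d, s in zip(DIFFS, SDS)]
print(round(excess_kurtosis(z_scores), 2))
```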


#11    Tangotiger      2012/02/08 (Wed) @ 11:43

I’m hoping that I set the process in motion so that someone ELSE can take what I did and add to it.

I need more fishermen!


#12    JD      2012/02/08 (Wed) @ 22:01

Tango/11 - “I need more fishermen!”

But it’s way more fun to just eat the fish. At Red Lobster.


#13    Guy      2012/02/28 (Tue) @ 14:59

I think Nate’s shifting forecast of today’s Michigan primary is a good illustration of how problematic it is to even estimate confidence intervals for his projections. Monday morning, he had Romney as a 77% favorite, but now Romney is down to 55%. To me, that strongly suggests the earlier 77% forecast was wrong (which I thought at the time I saw it): Romney’s position wasn’t really that much stronger than Santorum’s, whom he led by only a few percentage points.

Nate’s forecasts have two potential sources of error.  One, the polling he relies on could be wrong.  We can estimate the range of error there reasonably well, although not that well for primaries (because we don’t really know who will vote).  But then there is the second source of error:  the potential for voter preferences to change before election day.  I don’t know how Nate estimates that, but whatever he does will be very problematic.  And that’s because there is just too little data to rely on.  He could try to estimate how often there has been a shift of 3 points or more over the final 48 hours of a presidential primary. But the history of public polling is too limited to estimate this well.  And we already know that voter preferences in this GOP cycle are much more variable over short periods of time than in most or all past primaries.  So I would argue we really have no idea what the likelihood was that MI Republicans would shift a few points over the last 2-3 days, much less the combined error of that plus errors from polling and estimating the likely electorate.


#14    Tangotiger      2012/03/01 (Thu) @ 00:11

Since it’s awarded by district, it’s a major non-story that you get a 15-15 tie:

http://www.denverpost.com/dnc/ci_20069454

Really, so very embarrassing that the news channels tried to make this news.


#15    2012/03/01 (Thu) @ 02:46

Why do you say that, Tango?  The “game” is not “winning primaries”, it is collecting delegates.  In Michigan, Santorum did so more efficiently, both by dollars spent and by how his votes were accumulated, than Romney.  Why isn’t that a story?  In 2008, Obama beat Clinton in large part because he paid more attention to the rules on how delegates were awarded.  See the table entitled “Popular Vote Table” below; depending on how you count, Clinton and Obama were in a dead heat by votes, but Obama won the majority of delegates.  http://en.wikipedia.org/wiki/Results_of_the_2008_Democratic_Party_presidential_primaries


#16    J. Cross      2012/03/08 (Thu) @ 20:39

I took another look at Nate’s predictions now that we have more data and his error bars are looking right on now.  50 out of 55 within the 90% CI’s and the z-scores spread as they should be based on his confidence intervals.
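A rough way to read "50 out of 55 within the 90% CIs" is to compare it with a binomial model. This sketch assumes the 55 intervals are independent, which the thread has already noted is not true within a contest, so it is only a ballpark check; the helper name is hypothetical.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n, coverage = 55, 0.90
expected_inside = n * coverage        # hits expected from well-calibrated 90% intervals
p = prob_at_least(50, n, coverage)    # chance of 50 or more hits if calibrated
print(expected_inside, round(p, 3))
```

Under these assumptions, 50 hits sits right around the expected count, consistent with the intervals looking well calibrated on the larger sample.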


#17    Tangotiger      2012/03/08 (Thu) @ 21:36

Great stuff Jared.  Maybe at the end of all this, you can post the results like I have, and we can look at it some more.


#18    J. Cross      2012/03/08 (Thu) @ 23:11

Thanks.  I put up a post with the results thus far on the Steamer Blog with a link to download the data.


#19    2012/03/08 (Thu) @ 23:41

FWIW, what you probably want to do is look at the forecasts not just on Election Day but throughout the period leading up to the election, since we’ve sometimes issued forecasts as much as 45 days in advance.

When I did that prior to Super Tuesday, I found that the forecasts were pretty darn well-calibrated. There had been fewer misses than there were supposed to be on Election Day itself and the couple of days leading up to it, but more misses in the period about 7-14 days in advance.

http://fivethirtyeight.blogs.nytimes.com/2012/03/01/a-warning-on-the-accuracy-of-primary-polls/

Also, the model definitely doesn’t assume that the errors are normally distributed, nor that they are symmetrical. The whole approach is fairly nonparametric and relies on quantile regression.

Another challenge is that the errors aren’t independent from one another—the polling has better and worse years. This is probably even more apparent in the case of a general election when everything is held on the same day. IIRC, our forecasts in both 2008 and 2010 were a little underconfident(*). But in the long-run, there are also going to be years (1998 was one example) where the polling does badly all over the map and you wind up with a lot of misses.

(*) It could be that the polling is getting better but I think the prior for that is a poor one since polls are getting lower and lower response rates.


#20    J. Cross      2012/03/08 (Thu) @ 23:46

Thanks, Nate. I just realized that the z^2’s don’t really follow a chi-square since, as you say, they aren’t independent, and I edited the post.

I do wonder if the polling has gotten better in the last few years.  If so, you might deserve some of the credit for keeping tabs on them.


#21    J. Cross      2012/03/09 (Fri) @ 00:07

That is, even if the errors were normally distributed, the z-squareds wouldn’t follow a chi-square distribution.


#22    Uncle Herniation      2012/03/09 (Fri) @ 10:35

Not sure how you can generate a standard deviation from the 10th and 90th quantiles. The standard deviation is calculated from the sum of squares and variance, which you do not have. Your rudimentary method seems to at the very least assume a Gaussian distribution, but I’m not sure that can be correctly assumed in all cases. 90% confidence intervals are the product of a mathematical procedure, not a researcher’s whim.
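One check that skips the quantile-to-SD conversion entirely, and so needs no normality assumption, is to count how often the results fell inside the posted 5%-95% ranges. This sketch does that for the table's 25 forecasts. The "all inside" probability assumes each interval independently has a 90% chance of containing the result, which the thread has already noted does not hold within a contest; the variable names are illustrative.

```python
# (result, 5th pct, 95th pct) for the 25 named forecasts in the table
INTERVALS = [
    (24.6, 10.0, 29.0), (22.9, 11.0, 27.0), (18.6, 8.0, 23.0), (17.0, 7.0, 22.0),
    (11.1, 4.0, 14.0), (24.5, 13.0, 32.0), (31.9, 20.0, 38.0), (46.4, 33.0, 51.0),
    (40.4, 26.0, 49.0), (39.3, 27.0, 47.0), (21.4, 12.0, 31.0), (16.9, 9.0, 26.0),
    (10.3, 5.0, 18.0), (0.7, 0.0, 2.0), (13.3, 8.0, 21.0), (27.8, 19.0, 39.0),
    (13.3, 8.0, 24.0), (9.4, 5.0, 19.0), (13.0, 8.0, 24.0), (9.4, 6.0, 20.0),
    (5.0, 2.0, 15.0), (22.7, 20.0, 32.0), (0.6, 0.0, 8.0), (47.6, 41.0, 56.0),
    (7.0, 5.0, 18.0),
]

inside = sum(lo <= result <= hi for result, lo, hi in INTERVALS)  # hits
expected = 0.90 * len(INTERVALS)                                  # hits if calibrated
# Chance that every result lands inside, IF the 25 intervals were independent
p_all_inside = 0.90 ** len(INTERVALS)
print(inside, expected, round(p_all_inside, 3))
```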


#23    Tangotiger      2012/03/09 (Fri) @ 11:00

It’s the best I can do, with the data in hand.

If you would like to do something better, the floor is yours.

If you would like to show evidence that what I did has zero value, the floor is also yours.

If you are simply suggesting that what I did has to be qualified, but otherwise has some limited value, then fine, this thread has done its job, and it was worth my ten minutes, and the readers’ two minutes.


#24    Uncle Herniation      2012/03/09 (Fri) @ 20:16

“If you would like to do something better, the floor is yours.

If you would like to show evidence that what I did has zero value, the floor is also yours.”

Wrong. If I make up some mathematical mumbo jumbo that can only be disproved with data that aren’t available, then there’s no way that you could show how it’s incorrect or how it has zero value. The point is that without the full data, it’s impossible for someone else to compare your results to the results that would be expected if you had the complete “data in hand.”

If the purpose of your post was to claim that “Nate’s uncertainty level was too wide” then you should have realized you had insufficient data to decide this one way or the other and saved everyone’s time by finding something else to write about.

The problem is that your readers may not all be statistically sophisticated, and may actually trust that you know what you’re doing without questioning these problematic conclusions based on insufficient data. When it comes to political forecasting, that’s one thing, but when it comes to sabermetrics, people trust you. But if this is your analytic approach, maybe they shouldn’t.

Not trying to make this personal, but rather to suggest a more rigorous approach given your eminent standing in the field of sabermetrics.


#25    J. Cross      2012/03/09 (Fri) @ 20:34

U.H.,

Statisticians model data using imperfect models. When a histogram of errors looks roughly normal (as these errors do), it’s not unreasonable to model them as normal and see what that implies. You might take issue with that choice here and point out reasons why the model doesn’t work (as Nate did), but calling it “mathematical mumbo jumbo” is a bit unfair, particularly given that this is a blog post.


#26    Uncle Herniation      2012/03/09 (Fri) @ 21:11

Hi J Cross,

I’m a statistician, so I feel qualified to comment. And like I said, the point is not to make this personal, but to call attention to some faulty assumptions and methods that non-statistician readers might overlook. If it helps more baseball fans become educated consumers, it can only improve what sabermetricians are trying to do.


#27    J. Cross      2012/03/09 (Fri) @ 21:29

Definitely good to have that input.

So, if you were looking to analyze Silver’s error bars how would you go about it?  Simply too little data to say much of anything?  It seems to me that we can, at least, say that they’re in the right ballpark and not totally out of whack.


#28    J. Cross      2012/03/09 (Fri) @ 21:40

Although, the distribution of errors looks considerably different without Virginia which suggests that even that conclusion might be premature.


#29    Tangotiger      2012/03/09 (Fri) @ 21:41

I ended with this:

...or this year, things simply worked out a lot closer to the mean than expected, just by luck (after all, we only have 25 data points).

Now, perhaps I should have OPENED with that; that way, someone who expected a conclusion would know not to wait for it, and therefore not read my blog post at all.

In any case, I feel confident that what I did actually gave people something to think about, and inspired others to do more (like Jared).  That’s my only objective here.

If that’s not good enough for you, and if you can’t add anything of value here, then why are you here exactly?  To poke your head in the door and tell me that I have a lame party?


#30    Uncle Herniation      2012/03/09 (Fri) @ 22:20

The idea that criticism might be valuable is not a new idea. But if you like fanboys, I’ll leave.


#31    Tangotiger      2012/03/09 (Fri) @ 22:41

A statistician who makes conclusions based on n=1.

You said this:

But if this is your analytic approach, maybe they shouldn’t.
...
But if you like fanboys, I’ll leave.

So, you like to do the “if you...” conditional statement, draw a conclusion based on it, and rely on your limited sample size for it. 

Bravo!

What I like is *constructive* criticism.  So, if you can stop your utter b-llsh!t (*), and actually provide value, then please do so.

(*) Summary opinion with no evidence.  And n=1 is not evidence.


#32    Tangotiger      2012/03/11 (Sun) @ 22:42

Nate responds:

http://fivethirtyeight.blogs.nytimes.com/2012/03/01/a-warning-on-the-accuracy-of-primary-polls/

I wish I had thought of doing it that way.


#33    2012/03/14 (Wed) @ 13:49

This reminds me: does anyone know where the data for primary results comes from? I mean the county-by-county data that gets updated every couple of minutes and is used for all the graphic results on the nytimes and others.

