THE BOOK--Playing The Percentages In Baseball


Wednesday, May 19, 2010

How much talent is there with NHL goalies?

By Tangotiger, 04:21 PM

Thanks to Tom Awad for the data, here’s what I did. 


I took all goalies who faced at least 1200 shots over the last three seasons in 5-on-5 play.  I calculated each goalie’s save percentage (Vokoun was at .933), compared it to the league average of .921, and figured his z-score (2.82 for Vokoun).  That is, his performance was 2.82 standard deviations from the mean.

That by itself tells us nothing, as we expect someone to lead at some level.  What interests us is to take the standard deviation of all the z-scores.  If there was no such thing as goalie talent, we should get back 1.00.  If there is goalie talent, it’ll be higher than 1.00.  The more talent, the higher the number.

In the data I have, the standard deviation of the z-scores is 1.25.  So, yes, there is NHL goalie talent.  This is based on an average of 2650 shots faced per goalie.
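Here is a minimal Python sketch of the z-score step, assuming each goalie is just a (saves, shots) pair. The function names and the three sample goalies are made up for illustration; they are not Tom Awad's data.

```python
import math
import statistics

LEAGUE_SV = 0.921  # league-average 5-on-5 save percentage used in the post

def z_score(saves, shots, league_sv=LEAGUE_SV):
    """How many binomial standard deviations a goalie sits from the league mean."""
    binom_sd = math.sqrt(league_sv * (1 - league_sv) / shots)
    return (saves / shots - league_sv) / binom_sd

def sd_of_z_scores(goalies):
    """Population SD of the z-scores; about 1.00 is what pure luck would give."""
    return statistics.pstdev(z_score(saves, shots) for saves, shots in goalies)

# Hypothetical goalies: (saves, shots)
sample = [(921, 1000), (930, 1000), (912, 1000)]
spread = sd_of_z_scores(sample)
```

Because each z-score uses that goalie's own shot total, goalies with very different workloads can go into the same calculation.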

What can we do with this?  We will follow the exact same methodology as I used for measuring HR talent for pitchers.

With 2650 shots faced, and a .921 save percentage, the standard deviation of the binomial is .0052.  We put all this into our trusty equation:

1.25^2 = variance(true)/(.0052^2) + 1

variance(true) = .0039^2

And there we have it.  The spread in goalie skill per 5-on-5 shot is one standard deviation equal to .0039 goals per shot.
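The two numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post; the function names are my own.

```python
import math

def binomial_sd(p, n):
    """SD of an observed rate after n trials with success probability p."""
    return math.sqrt(p * (1 - p) / n)

def true_talent_sd(sd_z, p, n):
    """Solve sd_z^2 = variance(true) / variance(luck) + 1 for the talent SD."""
    var_luck = binomial_sd(p, n) ** 2
    return math.sqrt((sd_z ** 2 - 1) * var_luck)

luck_sd = binomial_sd(0.921, 2650)             # about .0052
talent_sd = true_talent_sd(1.25, 0.921, 2650)  # about .0039
```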

Now, let’s figure out our regression equation.

1/(1-r^2) = 1.25^2

And so, r=.60 when n=2650

We also have a general equation that says:

r = n / (x+n)

And in this case:
.60 = 2650 / (x+2650)

That makes x = 1790
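As a quick check on that arithmetic, here is a small Python sketch (the function names are mine). With r rounded to .60 it gives an x of about 1767; the 1790 above presumably comes from carrying more decimals than the rounded 1.25 and .60 shown here.

```python
import math

def r_from_sd_z(sd_z):
    """Solve 1 / (1 - r^2) = sd_z^2 for r."""
    return math.sqrt(1 - 1 / sd_z ** 2)

def x_from_r(r, n):
    """Solve r = n / (x + n) for x."""
    return n * (1 - r) / r

r = r_from_sd_z(1.25)   # 0.60
x = x_from_r(r, 2650)   # about 1767 with these rounded inputs
```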

Therefore, our correlation equation is:

r = Shots / (Shots + 1790)

Our regression equation is 1-r.  And so:

regression rate = 1790 / (Shots + 1790)

If you have 1790 shots, you regress the SV% 50% toward the mean.  If you have 3580 shots, you regress 33% toward the mean.  Simple enough?

Vokoun had a .933 save percentage on 4652 shots.  This means we regress his save percentage 28% toward the league mean of .921.  Therefore, our best estimate of his talent level, limited to this data, is .929.
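A minimal Python sketch of the regression step, using the 1790 and .921 from above. The function names are mine, and the 29 shots per game is the figure used for the list that follows.

```python
def regression_rate(shots, x=1790):
    """Fraction of the way to regress an observed SV% toward the league mean."""
    return x / (shots + x)

def talent_estimate(sv, shots, league_sv=0.921, x=1790):
    """Observed SV% pulled toward the league mean by the regression rate."""
    return sv - regression_rate(shots, x) * (sv - league_sv)

def goals_per_game(talent, league_sv=0.921, shots_per_game=29):
    """Goals saved relative to an average goalie over one game's worth of shots."""
    return (talent - league_sv) * shots_per_game

rate = regression_rate(4652)              # about 0.28
estimate = talent_estimate(0.933, 4652)   # roughly .9297 from the rounded .933
```

Feeding in the rounded .933 lands a hair above .929, so the third decimal depends on how many digits of the raw save percentage you carry.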

This is the list of all the goalies, ranked by their talent level, and how many goals they are worth per 29 even-strength shots (roughly one game).  Goals in parentheses are negative, that is, below average.

talent goals Goalie
0.929 0.23 TOMAS VOKOUN
0.929 0.23 TIMOTHY THOMAS
0.928 0.19 JONAS HILLER
0.927 0.17 ROBERTO LUONGO
0.926 0.15 CRAIG ANDERSON
0.926 0.12 JAROSLAV HALAK
0.925 0.11 MARTIN BRODEUR
0.925 0.10 PEKKA RINNE
0.925 0.10 TY CONKLIN
0.925 0.10 HENRIK LUNDQVIST
0.924 0.08 RYAN MILLER
0.924 0.08 ILJA BRYZGALOV
0.924 0.06 JEAN-SEBASTIEN GIGUERE
0.923 0.05 KARI LEHTONEN
0.923 0.05 EVGENI NABOKOV
0.923 0.05 JAMES HOWARD
0.923 0.04 NIKOLAI KHABIBULIN
0.923 0.04 CAREY PRICE
0.922 0.02 NIKLAS BACKSTROM
0.922 0.02 CAM WARD
0.921 0.00 DAN ELLIS
0.921 0.00 MARC-ANDRE FLEURY
0.921 0.00 SCOTT CLEMMENSEN
0.921 (0.00) JOSH HARDING
0.921 (0.00) MARTIN BIRON
0.921 (0.01) ANTERO NIITTYMAKI
0.921 (0.02) JONATHAN QUICK
0.920 (0.03) CHRIS MASON
0.920 (0.03) MIIKKA KIPRUSOFF
0.920 (0.04) JOSE THEODORE
0.920 (0.05) STEVE MASON
0.920 (0.05) ALEX AULD
0.919 (0.06) RICK DIPIETRO
0.919 (0.07) MANNY LEGACE
0.919 (0.07) MARTIN GERBER
0.919 (0.07) JASON LABARBERA
0.919 (0.07) DWAYNE ROLOSON
0.918 (0.09) MARTY TURCO
0.918 (0.09) MIKE SMITH
0.918 (0.10) PETER BUDAJ
0.918 (0.10) CRISTOBAL HUET
0.918 (0.10) JOEY MACDONALD
0.917 (0.14) ONDREJ PAVELEC
0.917 (0.14) JOHAN HEDBERG
0.916 (0.15) BRIAN ELLIOTT
0.916 (0.16) MATHIEU GARON
0.916 (0.17) PATRICK LALIME
0.915 (0.19) CHRIS OSGOOD
0.914 (0.20) PASCAL LECLAIRE
0.914 (0.21) VESA TOSKALA
0.914 (0.23) ANDREW RAYCROFT

#1    Guy    2010/05/19 (Wed) @ 17:35

For those of us who don’t follow hockey, what does this mean in terms of wins?


#2    2010/05/19 (Wed) @ 17:40

Tango,
Interesting- I didn’t recall the pitcher HR talent study so I might just be out of the loop on this methodology, but what is the reason for the 1200 shot cutoff over the last three seasons with this method?
Since you’re regressing based on the amount of data, couldn’t this be applied more or less to any goalie used in a reasonable stint as an NHL netminder over the last year or two?

I bring this up primarily because I couldn’t help but notice Thomas at #2, and how Rask has not yet accumulated enough shots to meet the cutoff; also as a Hawks fan I was curious how this sort of analysis would rank Niemi but he didn’t have enough shots to make the cutoff (746 shots at EV with a .914 this year, so not particularly good; pulled these #s from NHL.com which perhaps is different than the data you used as it might count 4on4 play?)

Anyhow I noticed Thomas was right around Niemi this year with a .913 SV% on 988 shots at EV; I wonder what this list would look like if the data were weighted by year? Although any weighting would just be pulled out of thin air, so perhaps that isn’t a good idea with such thin margins between players using this analysis anyhow…

Would be interesting to see similar numbers for PK SV%- it certainly seems that there might be some sort of separate talent measurable on the PK, although it might be tough to tease out given the smaller sample sizes with which to work.

Really interesting how quickly NHL goalie talent seems to come and go for many guys who are able to put up a few solid seasons; it certainly seems to me that teams with a flexible goaltending scenario are better off in the long run, given today’s salary cap environment- so few guys have been elite year after year in the salary cap era it seems like obsessing over goaltending is a fool’s errand.

I realize my thinking on this matter might be somewhat biased in the “goalies aren’t predictable enough to pay big bucks” direction, given the Hawks and Flyers success in the playoffs so far- just a few brief months ago every talking head who could find a microphone and a cheap suit was certain goaltending would be the downfall of these two teams.

Anyhow, great post and looking forward to discussion from the hockey portion of the readership on what exactly one can draw from these results.


#3    2010/05/19 (Wed) @ 18:01

Guy- I believe the generally accepted conversion rate is 6 goals per win?
So Vokoun would be around 2.5 wins in 70 games, plus any additional PK talent he might have that would come into play. Note that 70 games is a pretty aggressive workload, and it might be more prudent to suggest your top goalie play 55 games in a year, meaning the best guy in the league is only worth about 2 wins above average; going into a season it is probably accurate to say that an average goalie would be freely available on the market, so average is basically replacement level.

I don’t mean to say that below-average performance is impossible, obviously; just that replacement level for goalies doesn’t quite work like it does for skaters and baseball players. The downside of a goalie flop is huge, but the upside of a goalie (when regressed...) is only a couple wins. Of course without regressing for the sample size- Timing is everything, as the cliche goes, and Halak’s playoff performance vs WSH & PIT is obviously worth a ton, however if he doesn’t turn it around soon it will not really matter for a whole lot, historically speaking, and so MTL fans will feel the definition of “regression to the mean” as they complain all summer about Halak falling apart in the ECF, and not being clutch and top-notch down the stretch like Michael Leighton or Antti Niemi* who can take a team all the way to the Cup.

*Did not incl. Nabokov here because it doesn’t make my snark work, so crossing my fingers he doesn’t bring the sharks back past the hawks; I will take the blame for jinxing the Blackhawks if this does happen, since as any devoted reader of this blog surely can attest to, such jinxes are not real.

(tangent aside, hope that answered your ? and is not grossly incorrect; more experienced hockey analyst bloggers feel free to chime in if I did happen to mess up in the pre-tangent portion of this post)


#4    Tangotiger    2010/05/19 (Wed) @ 18:19

I don’t have a good reason for putting any kind of cutoff, other than I still don’t understand my own methodology perfectly.

Guy: around 6 goals = 1 win, and around 60 games = 1 season (for a regular goalie).  So, take the goals per game, multiply by 10, and you get wins per season.

A top goalie therefore adds 2 wins above average.  And a bad goalie is -2 wins relative to average.  So, in terms of wins above replacement, a top goalie would be 4 WAR, average goalie is 2 WAR and bad goalie is 0 WAR.
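In code form, a rough sketch of that conversion (the function names are mine, and replacement level is pegged at 2 wins below average, as in the paragraph above):

```python
def wins_above_average(goals_per_game, games=60, goals_per_win=6):
    """Goals saved per game scaled to a season and converted to wins."""
    return goals_per_game * games / goals_per_win

def wins_above_replacement(goals_per_game, replacement_wins=-2.0):
    """Wins above a replacement goalie who is 2 wins worse than average."""
    return wins_above_average(goals_per_game) - replacement_wins

top = wins_above_replacement(0.23)       # about 4.3 for a goalie at +0.23
average = wins_above_replacement(0.0)    # 2.0
```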

Sounds just like MLB players.


#5    Tangotiger    2010/05/19 (Wed) @ 18:52

I should note that Tom Awad performed a Bayes:
http://www.puckprospectus.com/article.php?articleid=565

He used more than just the 5-on-5.

His top 3 match my top 3.  The spread looks similar as well.  Bayes is better.  My method is faster and approximates Bayes.  And of course is more apparent as to the level of regression.


#6    Sunny Mehta    2010/05/19 (Wed) @ 19:09

I’m trying to leave a comment, but every time I click “preview” it takes me to a new page and my comment is gone. Is there a spam filter or something? Why doesn’t it do that AFTER the preview screen though (i.e. after I hit “submit”) or even let me know that that’s what’s going on?


#7    Tangotiger    2010/05/19 (Wed) @ 19:14

HTML tags or excessive use of caps sometimes trigger it.


#8    Sunny Mehta    2010/05/19 (Wed) @ 19:20

I believe you’ve introduced a ton of survival bias by looking at only goalies with 1500 or more shots against. Teams often make decisions about playing time based on short term results, so playing time ends up being correlated with results. (For example, an NHL team might bring up some rookie from the minors, and if he runs good he gets to keep playing, but if he doesn’t he’s back to riding pine.)

So by simply eliminating all the goaltenders with less than 1500 shots from the population, you’re polluting the data with guys who got lucky, and selectively removing guys who got unlucky. There’s absolutely no reason to just exclude a guy from the population who has 999 shots against over the past three seasons.

If you run basic binomial sims using the entire population of goaltenders over the past three seasons, you get very different answers than the ones you posted. I know because I’ve done it. Granted, leaving in the REALLY low shot guys widens the likelihood distribution a bit, and you end up with a pretty heavily skewed distribution of sd/var sims.

So fine, even if you leave out guys that have, say, less than 100 shots, what does that distribution look like? It ain’t what you’ve presented here. It’s much, much narrower. In fact, after you remove the less-than-100-shot guys, the histogram looks like a big bell curve with a small left tail. Examine the left tail a little further and you’ll find three particular goaltenders who are likely true fuck-ups that don’t belong in the population.

Okay fine, so take them out. You’re left with 94 goalies. Examine the distribution of save percentages. Compare it to binomial (or hypergeometric, etc) sims. Notice anything? You can’t tell the fkn difference, that’s what you notice. 94 goalies. Three years worth of even strength data. No difference in the spread from what we’d expect from chance alone. And this is all BEFORE even accounting for home shot recording bias, team shot quality, and playing to the score effects. IMO your “true talent skill” numbers are likely quite a bit off, and I’d bet real money on it.


#9    Sunny Mehta    2010/05/19 (Wed) @ 19:44

Tom,

Thanks for the tip on the html tags.  I was originally using the less-than sign instead of typing “less than”, and I think your spam filter thought it was an html tag.


#10    Tangotiger    2010/05/19 (Wed) @ 20:19

By setting the threshold as I have, I end up narrowing the talent level.  The more shots faced, the narrower the talent level.  So, I’m biasing it opposite to what you are saying.

Anyway, let’s lower the threshold to 500 ES shots.  That gives me 74 goalies.

The SD of the z-score is still high at 1.23.  It implies an “X” of 1478 shots.  (Now I only need almost 1500 shots to regress 50%, whereas before I needed 1800 shots.) The spread in true talent is now higher at 1 sd = .0041.

So, it’s just like I thought.. the lower the threshold, the more bad goalies come into play, and the wider the spread in talent.

Furthermore, Awad did a Bayes, which is the correct way to do it, and he gets similar results to me.  So, I don’t see what you are seeing.


#11    Tangotiger    2010/05/19 (Wed) @ 20:27

Setting it to a minimum of only 50 shots over 3 years gets me “X” = 1012, an SD of z-score of 1.26, and a spread in talent of 1 SD = .0046 goals per shot.  This is 102 goalies.

Again, completely consistent.

***

How many NHL goalies should we have over 3 years?  At least 60, naturally.  Maybe 70 or 80?  So, a 500 ES shot limit seems like the right threshold.


#12    Mike Rogers    2010/05/19 (Wed) @ 21:56

Just subscribing to this thread. Love this stuff.


#13    Tom Awad    2010/05/19 (Wed) @ 21:58

Tango, great work as always. Happy to see that our numbers match up close enough, it’s always good to get validation when you do something this messy.

Sunny: we’ve been over this at Gabe’s site. You CAN’T do the standard deviation of save percentage. Save%’s variance will depend on the number of trials. Either you do it with z-scores, as Tango did, or you do it with goals vs. average, as I did. Obviously, if you do it with save % and leave in guys who faced 100 shots, you’ll see only noise. We’re talking about an end-to-end talent spread of 1 goal every 100 shots.

That being said, I smell a good bet. I’ll try and propose something.


#14    Tangotiger    2010/05/19 (Wed) @ 23:35

“No difference in the spread from what we’d expect from chance alone.”

But this is EXACTLY what I am doing.  If you apply the binomial based on the number of shots of each goalie, then you get a spread that is 1.25 times wider than what chance alone would say.
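For anyone who wants to see what "chance alone" looks like, here is a small simulation sketch in Python. It is an illustration under my own assumptions, not the actual data: every goalie gets the same 2650 shots, talent is drawn from a normal curve, and I use 500 simulated goalies just to keep the noise down.

```python
import math
import random

def simulated_sd_of_z(n_goalies, shots, league_sv, talent_sd, rng):
    """Simulate goalies shot by shot and return the SD of their z-scores.

    talent_sd = 0 means every goalie is exactly league average (pure luck).
    """
    luck_sd = math.sqrt(league_sv * (1 - league_sv) / shots)
    zs = []
    for _ in range(n_goalies):
        true_sv = rng.gauss(league_sv, talent_sd)
        saves = sum(rng.random() < true_sv for _ in range(shots))
        zs.append((saves / shots - league_sv) / luck_sd)
    mean = sum(zs) / len(zs)
    return math.sqrt(sum((z - mean) ** 2 for z in zs) / len(zs))

rng = random.Random(2010)
no_talent = simulated_sd_of_z(500, 2650, 0.921, 0.0, rng)       # near 1.00
with_talent = simulated_sd_of_z(500, 2650, 0.921, 0.0039, rng)  # near 1.25
```

With no talent in the pool the spread should sit close to 1.00, and with a talent SD of .0039 it should sit close to 1.25, give or take sampling noise.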


#15    Tyler    2010/05/20 (Thu) @ 02:24

Just to add - if you’re talking WAR, it seems to me you need to factor in the PK stuff as well.  Although you’re using a shot figure of 29 per game, which would include PK/PP shots, I assume.

In any event, I don’t really disagree with any of this.  Interesting about the salaries though.  I have a marginal standings point having an average cost of $880K or so.  I’m convinced that there’s a chunk of goalies who simply do not get paid their value.  If Vokoun’s worth four wins a year and has a marginal salary of $5.8MM next year, his four WAR cost less than they should on AVERAGE, which doesn’t allow for the fact that he’s selling UFA years, when the price should be higher than average.

Love seeing Conklin and Brodeur so tightly paired too.  Prior to the 2005-06 season I advanced the argument that if save percentage was the best way to judge a goalie, there wasn’t much difference between them, although the error bars surrounding Conklin’s estimate would be much bigger.  Conklin was promptly atrocious and I had that comment thrown in my face all year.  I’ve enjoyed his subsequent revival.


#16    2010/05/20 (Thu) @ 05:46

Tyler,
isn’t the distribution of Goalies/Skaters between UFA/RFA/ELC going to bias that type of analysis?
Tons of skaters produce on an ELC, especially more marginal 3rd/4th liners who might have some marginal value but not be paid much at all unless they have future upside as a younger prospect. If you’re a 26 y/o 4th liner making little, it’s completely different than a 20 y/o rookie with massive upside; and is this 880k in cap dollars?

A 0.25 WAR 4th liner, just to pull a number out of nowhere, would get time if they were in your org or available at a minimum price; but the only guys who play tiny roles for expensive contracts are younger guys with upside, or older guys signed to deals based on long-lost skillsets and reps.

If a goalie has a breakout year, rarely is it pre-RFA/UFA- yet when a skater does, it is extremely rare that they would already be UFA-eligible. The differing development curves, combined with the same contract/cap rule structures, seems like it makes it inaccurate to assign a marginal value/pt to all players without adjusting for the differing age ranges, talent spread, & mkt $$.


#17    Neil    2010/05/20 (Thu) @ 10:38

I can’t get the Puck Prospectus article to load, so maybe he deals with my questions, but I have some questions.

The way in which ranking goalies seems entirely unlike ranking pitchers is that the pitcher v. batter match-up is over before the ball ever reaches the defense - we can evaluate it independent of the defense for that reason. But the offense v. defense match-up, in hockey, precedes the shooter v. goalie match-up. So it seems necessary to account for the effects of team defense - how, for example, does a team that’s good at blocking shots affect the number/quality of shots a goalie faces? (And relatedly - what about deflections? They seem to go in a disproportionate amount of the time, so has anyone ever tried to determine whether stopping them is wholly luck related? Or whether some defenses are particularly bad at preventing them?)

I’m also wondering whether a team’s penalty-rate (and penalty killing rate, maybe?) should be accounted for in some simple way, since the odds of a goal being scored are substantially higher during a power-play. Looking at the standings from this year only, the average team allowed something like 60 PPGA on about 300 PPOA. But the range in PPOA is roughly 240-340, and PK% ranges from about 75%-87%. Marc-Andre Fleury was facing-down the opposition’s best shooters on the power-play about 40% more often than Martin Brodeur was. That seems significant.


#18    Tangotiger    2010/05/20 (Thu) @ 11:13

The data I used was only 5-on-5.  Tom’s data is broken down by skater-on-skater situations.

Yes, team defense will affect the quality of each shot, and so, the save% is affected by that.

So, part of the reason that the spread is 1.25 times that expected from pure luck is that a goalie is married (mostly) to his team.  We would expect to get a spread wider than 1.00 times the pure luck spread simply because of that. 

Ideally, we would look at those goalies that switched teams to determine what kind of effect teammates have.  Perhaps the spread is 1.20 for the goalie and 1.04 for the teammates.  Or 1.15 for teammates and 1.09 for goalies.  Something.

This is NO DIFFERENT than pitchers/fielders and BABIP.

So, we need a database with goalie switchers to get a handle on this.  Or, goalies with long careers, since there’s going to be great turnaround in team personnel.  Mike Richter’s team in 1992 and 2002 is not the same, though his age will now play a huge factor.  And Leetch was there for almost all of his career.

There’s lots to consider.  The process I laid out is the starting point for it.


#19    Vic Ferrari    2010/05/20 (Thu) @ 13:05

Sunny, I think there may be a problem with your script or your data set. Or perhaps there is an error in mine.  In any case, I ran 1000 hypergeometric simulations and got an average non-luck variance slightly higher than Tango’s (.0041^2), median near enough the same.  So it would seem that his is a reasonable first approximation for the non-luck variance parameter.

Granted it’s an academic exercise with this data, scorer bias and score effects are huge components left unaddressed, and team effects (as suggested by Tango) are also affecting, though to a much smaller degree.  And this is an NHL measure that has survival and censorship bias issues every bit as dramatic as those in most MLB stats.


#20    Vic Ferrari    2010/05/20 (Thu) @ 13:26

Tango

If we chose a beta prior instead of a normal prior, just for the sake of convenience, then the K value for EVsave% in your model would be (knowing that we are conflating ability with scorer bias and score effects, this is just for the sake of argument):

K = 4783

The larger the K, the more an element is driven by luck, and this should not change with sample size, only the error surrounding it.

To compare it to baseball:

SO rate: K ~ 45
IP HR rate: K ~ 70
BB rate: K ~ 85
OBP: K ~ 230
...
and at the extreme
In Play Single rate (1BBIP): K ~ 530

These are as estimated by Jim Albert using predictive value as the criterion.  Which is more rigorous than we’ve done here.

So NHL goaltending raw EVsave%, it’s more in line with most measures of clutch hitting in baseball, in terms of impact on results and non-luck distribution of the population.

Makes sense, no?
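For readers who have not seen K values before, here is a minimal Python sketch of how one is typically used. It assumes K is the pseudo-count of a beta prior centred on the league mean, which is my reading of the comment rather than something it spells out; the function names are hypothetical.

```python
def shrunk_save_pct(saves, shots, prior_mean=0.921, k=4783):
    """Posterior mean under a Beta(k * prior_mean, k * (1 - prior_mean)) prior."""
    return (saves + k * prior_mean) / (shots + k)

def regression_weight(shots, k=4783):
    """Share of the estimate that comes from the prior mean."""
    return k / (shots + k)

half = regression_weight(4783)   # 0.5: with K shots faced you regress halfway
```

Under that reading K does the same job as the 1790 in the post, so the larger the K, the more shots it takes before the observed save percentage outweighs the league mean.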


#21    Vic Ferrari    2010/05/20 (Thu) @ 13:40

Tom Awad @ #13 said:

“Sunny: we’ve been over this at Gabe’s site. You CAN’T do the standard deviation of save percentage. Save%’s variance will depend on the number of trials.”

Yeah, you’re wrong on this, Tom.  In fact that is the strength of Sunny’s model.  He can use subsets of the data (say even and odd numbered games) and the over/under will be set at the same result.  Try it yourself if you don’t believe me.

I do think that your presentation of the data in your last post was terrific.  I have never seen the information presented better, and I’ve read tonnes of numberish articles on sports (mostly NHL and MLB).

The choice of a normal distribution as a prior was curious (though in that particular case, it does work out a shade better than the computationally trivial beta from assumption).

You either solved that using brute force methods or your software did (I count integration by iterative methods as brute force :) ).  The simple presentation of the goalie ‘ability’ likelihood is really terrific.  What software are you using, Tom?


#22    Tangotiger    2010/05/20 (Thu) @ 14:08

“So NHL goaltending raw EVsave%, it’s more in line with most measures of clutch hitting in baseball, in terms of impact on results and non-luck distribution of the population.

Makes sense, no?”

On a per opportunity basis, sure.  But, a goalie has 30 opportunities a game, not 4 or 5. 

The key point is the time frame, and the time frame that we talk about is the season.  And, on a seasonal basis, using my numbers, the goalies are +/- 2 wins from average.

***

The Albert number for OBP is right in line with what I have, and what we showed in The Book.  The other numbers seem a bit low.


#23    Vic Ferrari    2010/05/20 (Thu) @ 14:08

Tangotiger:

I read at Cosh’s MacLean’s blog that you are doing some work for the Blue Jays now.  A belated congratulations.  As a lad I was a huge Expos fan, now I am a fairweather BlueJays fan.  I lived in Toronto when they won their second World Series, kind of got swept up in it.

Encouraging to see Bluejays management using a mix of observational information and data to make decisions, rather than being strongly tied to either approach.  I would prefer that they are erring towards the former, but the mix is good imho.

As a Bluejays fan, I’d strongly encourage you to think in terms of likelihood distributions instead of forecast averages. 

As an academic example, obviously the NHL isn’t an area of expertise for you, and there are problems with the methodology that extend beyond what I have mentioned above, but we’ll use this data for the purpose of example:

Plot out sp^4400 * (1-sp)^380 ... that is roughly Vokoun’s likelihood distribution of ability, at least for our purposes today.  It gives you a sharp sense of the likelihood of outcomes for Vokoun in the following season.

From there you are into stochastic modelling, like we’d see in any business, or at least the oil business.  That’s straightforward.

Best of luck.  And go Jays!
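The curve sp^4400 * (1-sp)^380 from this comment can be tabulated without any plotting library. Below is a rough Python sketch; the grid range and step are my own choices, and the work is done in logs because the raw product underflows to zero.

```python
import math

def log_likelihood(sp, saves=4400, goals=380):
    """Log of sp^saves * (1 - sp)^goals."""
    return saves * math.log(sp) + goals * math.log(1 - sp)

def likelihood_curve(lo=0.90, hi=0.94, steps=401, saves=4400, goals=380):
    """Return (grid, weights), with weights normalised to sum to 1 over the grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    logs = [log_likelihood(sp, saves, goals) for sp in grid]
    peak = max(logs)
    weights = [math.exp(value - peak) for value in logs]
    total = sum(weights)
    return grid, [w / total for w in weights]

grid, probs = likelihood_curve()
mode = grid[probs.index(max(probs))]   # peaks near 4400 / 4780
```

The weights can be handed to any charting tool, or summed over a range to get the chance that the true save percentage falls inside it.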


#24    Tangotiger    2010/05/20 (Thu) @ 14:23

“The larger the K, the more an element is driven by luck, and this should not change with sample size, only the error surrounding it.”

This is true only if the players in your sample are randomly drawn.

In my case, I was using the number of shots faced.  And, as we know, the better the goalie, the more games he will play (and hence more shots).

As you can see by my numbers, the SD of the z-scores was 1.25 regardless of the threshold I was setting.  This made my “X” (your “K”) value go down substantially, thereby showing greater spread in talent.  As it should, since goalies with fewer than 500 shots faced over a 3 year period would basically be the bad goalies for the most part, therefore contributing to the greater spread in talent for all the goalies in that sample.

We can see this for example in baseball:
http://www.tangotiger.net/dipsbands.html

If you see the chart at the bottom, if you have fewer than 800 BIP in MLB, you represent a skewed portion of the pitcher talent.

And of course with more than 3200 BIP, we see it’s very skewed the other way.

Fascinating, no?


#25    Tangotiger    2010/05/20 (Thu) @ 14:27

“As an academic example, obviously the NHL isn’t an area of expertise for you”

Uhhh… actually, it is.  I’ve done far more work for NHL teams than MLB teams.


#26    Vic Ferrari    2010/05/20 (Thu) @ 14:46

Tango:

I haven’t read ‘The Book’.  Frankly, that’s mostly because I think that the title is terrible.  Also I have shelves full of books that I keep meaning to read, I want to put a dint in that before I start adding more.

Any road, I’m not surprised that your K values are more condensed than Albert’s.  That’s implicit in much of your writing here.  I suspect that the K implicit in marcel forecasting is optimized for wOBA, though I’ve never checked.

The thing is, Tango, the problem becomes obvious when we look at the likelihood distributions.  So back to Awad’s study, we look at Vokoun and see his calculated ability distribution ... now let’s say that Vokoun faces 1500 EV shots next year in the NHL ... we can take his result, let’s say it is .918, and calculate where that landed compared to our forecast of his ability.  Let’s say that qualifies in the 3rd percentile, to save ourselves the math ... i.e., if our ability likelihood distribution was correct, we would have expected him to do better than that 97% of the time.

Now that alone is no reason to worry.  We expect about 3% of guys to be in the 3% or less bracket.

If we look at everyone, though, and we’ve got too many guys at the extremes ... well then we’ve made a mistake with our original model.

And if we look at everyone and we’ve got too many guys bunched between the 40th and 60th percentiles, and almost nobody in the 1-20% bucket or the 80-100% bucket ... well then we’ve made a mistake with our original model.  We’ll call that over-regressing, for lack of a better term.  And Nate Silver has a bigger brain than mine, but I don’t think his test will catch that.  Plus he’s comparing it to other forecasting systems that are gravitating the same direction.

With that framework though, the veracity of the claims of guys like Brad Null, Jim Albert and even Robert Woods ... they become undeniable.  Though to date none have posted the information as explicitly as Awad (though I would very strongly recommend that Tom only bets against Sunny in a non-monetary way.  The badger in the woodpile is survival/censorship bias, and Mehta knows more about game theory than most of us, I suspect.  Certainly more than me.)

The hammer is predictive value.  And I suspect you’ve bored of listening to me, I know I’m starting to tire of writing :D You know a boatload more about the nuances of baseball and baseball stats and MLB measurement error than me, and those things need to be accounted for in a reasonable model.

Likelihood distributions.  Predictive value in the context of said distributions.  that’s what matters. 

Admittedly this is outside of the scope of what will be interesting in an online sabermetric conversation, but just pretend you believe me and run with that for a while, Tangotiger.
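The bucket check described in this comment is easy to sketch in Python. This assumes you already have, for each goalie, the percentile where his actual result landed within his forecast distribution; the function name and the sample percentiles are hypothetical.

```python
def bucket_counts(percentiles, n_buckets=5):
    """Count how many results landed in each equal-width percentile bucket.

    Percentiles are fractions between 0 and 1. A well-calibrated model should
    fill the buckets roughly evenly.
    """
    counts = [0] * n_buckets
    for p in percentiles:
        index = min(int(p * n_buckets), n_buckets - 1)
        counts[index] += 1
    return counts

# Hypothetical results: one goalie at the 3rd percentile, one at the 97th, two in the middle
counts = bucket_counts([0.03, 0.50, 0.55, 0.97])   # [1, 0, 2, 0, 1]
```

Too many results in the end buckets would point to under-regressing, and too many in the middle buckets to over-regressing.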


#27    Vic Ferrari    2010/05/20 (Thu) @ 15:03

Tangotiger said:

“In my case, I was using the number of shots faced.  And, as we know, the better the goalie, the more games he will play (and hence more shots).”

I think this is at the heart of the different views of Sunny and Tom here. 

Tom believes that teams are making the right decisions with regard to goalies and it is reflected in the results.

Sunny believes that teams are making decisions based on previous results, and misinterpreting luck as ability in doing so.

Both Tom and Sunny should correct me if I’m wrong, but that’s my sense of it.

What would happen in a game where all of you, me, Tom, Sunny and your 26 most frequent commenters were given a couple of sets of loaded dice ... in this case we are led to believe that:
a.) the dice vary in magnet strength.  We are given previous dice rolling results to illustrate that.
b.) using the same pair of dice too much over one stretch causes them to heat, and the magnetic force to wane. One of us, we’ll call him Mike Keenan, thinks that’s a fallacy ... the rest of us buy in to one extent or another.

How would we play the game?

After the fact, how would the dice-o-metrics crowd estimate the true weighting of our sets of dice?


#28    Tom Awad    2010/05/20 (Thu) @ 15:07

Vic #21,

Thanks. I must be misunderstanding Sunny’s model, because the way I understood it he was looking for the standard deviation of raw save %. But for the low shot total guys, save % will be 99.9% luck, and will widen the total standard deviation, making the other guys harder to notice. No? What am I not understanding? Could you explain to me in 15 lines what exactly Sunny/you are doing? For example, I don’t understand how you can reduce the data set and have the “luck” component not be larger.

Sunny himself said:

“Granted, leaving in the REALLY low shot guys widens the likelihood distribution a bit”

In Tango’s method, the Z-scores of the guys with low shot totals will converge around 1, which adds more “noise” samples but doesn’t harm the data for the high shot total guys.

For software, I do most of my analysis in C, because it’s what I know. In a perfect world, I’d be using Matlab.


#29    Tom Awad    2010/05/20 (Thu) @ 15:16

Vic #27,

I thought (though I may be incorrect) that my way of calculating things, based on variance of total goals, eliminated survival bias. For example, let’s play a game. We’ll roll 10 dice, then choose the 5 highest ones and roll them again. Obviously, the dice with 2 rolls will appear to be “better”, on average than the dice with 1 roll. But when you do a variance analysis of the sums, taking into account the fact that some dice got rolled twice, you should still see that it’s 100% luck. 

My calculation of the total amount of luck/skill will be correct. HOWEVER, my estimate of the a priori skill level of each individual die (assuming they were loaded, and I don’t see 100% luck) WILL be biased towards those who benefited from the survivorship bias. In this respect, Sunny is correct.
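The dice game in this comment is simple to simulate. Here is a rough Python sketch with fair dice, so any gap between the two groups is pure survivorship; the trial count and seed are arbitrary choices of mine.

```python
import random

def dice_game(trials, rng):
    """Roll 10 fair dice, re-roll the 5 highest, and compare per-die averages."""
    rerolled, single = [], []
    for _ in range(trials):
        first = sorted((rng.randint(1, 6) for _ in range(10)), reverse=True)
        for value in first[:5]:        # the five highest dice get a second roll
            rerolled.append((value + rng.randint(1, 6)) / 2)
        single.extend(first[5:])       # the other five keep their single roll
    return sum(rerolled) / len(rerolled), sum(single) / len(single)

two_roll_avg, one_roll_avg = dice_game(2000, random.Random(29))
```

The twice-rolled dice come out with the higher average even though every die is identical, which is the individual-level bias being conceded here.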


#30    Sunny Mehta      (see all posts) 2010/05/20 (Thu) @ 15:50

Imagine we gave a fair coin to a big population of people, and we told them all to flip it 1000 times. Then we told everyone who flipped fewer than x number of heads to put their coins down and stop flipping. We tell the remaining population to flip another 1000 times. And we repeat the process a couple times.

Now someone comes along and says “I’m analyzing coin-flipping skill, and there seems to be significant skill for guys who have a minimum of 4000 flips.”


#31    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 15:52

There were 102 goalies that faced at least 50 ES shots over the last 3 years.  Let’s split them up into 3 groups of 34 goalies, sorted by shots.

The first group, facing at least 2098 ES shots, had a standard deviation of their z-scores of 1.16.

The third group, facing at most 662 ES shots, had an SD of their z-scores of 1.15.

The middle group had an SD of their z-scores of 1.14.

What does this tell us?  If the shots faced was not a parameter (i.e., did not indicate quality), we would have expected all three groups to have had the same SD of z-scores, and that SD to be 1.26.

Instead, each group had about the same SD, but only 1.15 or so.  The implication here is that shots faced does tell us about quality of goalies.  The more shots you face, the better the goalie (on average).  And, the more goalies you add into your sample, the wider the talent distribution.
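For readers who want to repeat the grouping on their own data, here is a rough sketch of the calculation. The (shots, saves) pairs are invented for illustration and are not Tom Awad's data; the helper names are hypothetical, and the spread is taken about the group's own mean z-score, which is one reading of "standard deviation of the z-scores".

```python
import math
import statistics

def z_score(shots, saves, league_sv):
    """Standard deviations between a goalie's save % and the league rate."""
    binomial_sd = math.sqrt(league_sv * (1 - league_sv) / shots)
    return (saves / shots - league_sv) / binomial_sd

def sd_of_z_scores(goalies, league_sv):
    """Spread of z-scores; about 1.0 is what luck alone would produce."""
    return statistics.pstdev([z_score(shots, saves, league_sv)
                              for shots, saves in goalies])

def split_by_shots(goalies, n_groups=3):
    """Sort by shots faced (most first) and cut into equal-sized groups."""
    ranked = sorted(goalies, key=lambda g: g[0], reverse=True)
    size = len(ranked) // n_groups
    return [ranked[i * size:(i + 1) * size] for i in range(n_groups)]

# Hypothetical (shots, saves) pairs, for illustration only.
sample = [(4000, 3720), (3100, 2850), (2400, 2215),
          (1500, 1370), (900, 830), (700, 640),
          (400, 362), (150, 139), (60, 54)]
league_sv = sum(v for _, v in sample) / sum(s for s, _ in sample)

for group in split_by_shots(sample):
    print(len(group), round(sd_of_z_scores(group, league_sv), 2))
```

With only three goalies per group the printed spreads mean nothing; the point is the shape of the calculation, which scales directly to the 34-goalie groups described above.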


#32          (see all posts) 2010/05/20 (Thu) @ 16:04

The .921 league average number is the average of those guys who made the 1200 shot cutoff, right?
Why regress to the mean of this group rather than the mean of the league over that time period, which I believe is a point or two lower?  (I don’t see the number for 09-10 readily available, but .919 in 07-08 and .920 in 08-09 makes me think this .921 figure is most likely the 1200+ shot guys’ total EVSV%.)


#33    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 16:06

"For software, I do most of my analysis in C, because it’s what I know. In a perfect world, I’d be using Matlab.”

Damn, that’s impressive, Tom.

My advice would be to stick with that, refrain from Matlab, Minitab, R, etc.  You’re dealing with nuts and bolts now, that math you executed is extremely difficult.  Colour me impressed.  If you get into the stats packages then you’re dealing with black boxes, I think that the essence of what you are doing, and why, can become lost.  You end up with all sorts of madness coming from that.  I can’t speak for sabermetrics, but in hockey analysis you can see those guys a mile away.

Frankly, I still don’t know the true nature of Tango’s math in this thread.  I think he has assumed normal distributions for ability (non-luck) and centred them, then used Pearson’s (or Fisher’s?) math to deduce the joint probability distribution of the two.  I mean I believe that almost anyone is capable of almost anything (I learned that from the Jerry Orbach character on Law and Order), but damn, that derivation is big math.  Plus I’m not comfortable with the assumptions from the get go, so that was my excuse for not bothering.

Am I wrong?

In any case, it works out nearly the same as a Monte Carlo model, so I’m not going to argue.

On Sunny’s comment re the widening of the likelihood distribution ... since you’ve done Bayesian math from the ground up you should be the one guy that follows that.

Think of the non-luck distribution plotted out on a computer screen and representing 50 goalies.  Each of those 50 goalies owns an equal share of the pixels under the curve, and the total pixels are limited to 1000.  Adding in guys with shorter track records ... well they still get their fair share of those 1000 pixels under the curve, but because we know less about them, their personal non-luck likelihood distributions are wider ... so they combine to widen the population non-luck distribution.

Sunny was saying nothing more or less than that.

He has a hell of a point, though.  And while it would be convenient to disregard it, that wouldn’t be wise IMO.

Generally I suspect that the survivors, or at least the less censored, would tend to gravitate towards the mean more heavily, just thinking of my magnetic dice game above.  That would contradict both Sunny’s implied interpretation and your results, so it’s not a friendmaker of an argument.  Still, I don’t know.

Agree or disagree, I don’t think that either of you or Sunny are particularly tied to any philosophy.  It’s about what works best.

If you use your recent presentation, but with Sunny’s EV-tied on-the-road data (that’s not totally clean, but pretty good IMO) ... run your same scripts and output your data the same way.  We’ll run an order test on that data against future results.  Or just exclude the 09/10 data then test it against that.

Makes sense, no?


#34    Sunny Mehta      (see all posts) 2010/05/20 (Thu) @ 16:11

Vic,

Yeah I just ran the whole model myself using the entire population, and here’s what I get:

eta = 0.9190374
K = 2628.674

It appears that at K shots, the Total Variance is 5.660157e-05 (sd = 0.007523401), which is comprised of Binomial Variance equal to 2.830617e-05 (sd = 0.005320354), and Non-Binomial Variance equal to 2.82954e-05 (sd = 0.005319342).

The thing that just bugs the hell out of me wrt all of this is the shape of the observed distribution. It’s so obviously a bell curve (which is spread out no wider than what we’d expect from chance), but with a small left tail that does in fact make the entire distribution spread out wider than chance. But does the beta curve account for this?
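The figures in this comment can be reproduced from eta and K alone. The sketch below is a reading of the numbers, not Sunny's script: the binomial term is eta(1 - eta)/K, and the reported non-binomial term happens to match the variance of a beta distribution whose two parameters sum to K, which is an inference from the arithmetic rather than something the comment states. The function name `regression_fraction` is hypothetical.

```python
import math

eta = 0.9190374   # population mean save percentage, from the comment above
K = 2628.674      # shots at which luck and non-luck variance are about equal

binomial_var = eta * (1 - eta) / K        # luck component at K shots
beta_var = eta * (1 - eta) / (K + 1)      # variance of Beta(a, b) with a + b = K
total_var = binomial_var + beta_var

def regression_fraction(shots, k=K):
    """Share of the gap to the population mean removed after `shots` shots."""
    return k / (k + shots)

print(f"binomial variance:     {binomial_var:.6e} (sd {math.sqrt(binomial_var):.6f})")
print(f"non-binomial variance: {beta_var:.6e} (sd {math.sqrt(beta_var):.6f})")
print(f"total variance:        {total_var:.6e} (sd {math.sqrt(total_var):.6f})")
print(f"regression at K shots: {regression_fraction(K):.2f}")
```

At exactly K shots the regression fraction is one half; a goalie with more shots than K is regressed less than halfway toward the mean, and one with fewer is regressed more.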


#35    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 16:12

Sunny: as long as your subgroup has a variance that’s greater than that of the binomial, then you’ve identified… something, be it skill, or some bias.

***

Say, you have 1024 people flipping 100 coins, and the top 50% get to play again.  Then those 512 people flip 100 more times, and the top 50% get to play again.  Then, those 256 people flip 100 more times, to get down to 128, and then down to 64, then down to 32 who flip 100 final times.

What standard deviation do you think we’ll find for the 600 flips for those 32 people?  Well, it’s going to be LESS than that expected from the binomial had they each been given 600 flips, no?

So, given the survivorship issue, any spread we see will be understated.  This becomes obvious here, where we see the bunching happening:
http://www.tangotiger.net/dipsbands.html

And, it is also supported with the z-scores I reported for the three groups of goalies.
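The coin-flip tournament described above is easy to simulate. This is a sketch under one stated assumption: the "top 50%" at each cut are ranked on running total heads so far (the comment does not say whether the ranking is cumulative or per round). The helper names are made up for illustration.

```python
import math
import random
import statistics

def heads(rng, n_flips=100):
    """Heads in n_flips fair coin flips; each random bit stands for one flip."""
    return bin(rng.getrandbits(n_flips)).count("1")

def tournament(rng, n_players=1024, n_final=32, flips_per_round=100):
    """Return the total heads of the players left after repeated 50% cuts."""
    totals = [heads(rng, flips_per_round) for _ in range(n_players)]
    while len(totals) > n_final:
        totals.sort(reverse=True)
        survivors = totals[: len(totals) // 2]   # top half by running total
        totals = [t + heads(rng, flips_per_round) for t in survivors]
    return totals

def average_survivor_sd(n_tournaments=20, seed=1):
    """Average, over several tournaments, of the SD among the final survivors."""
    rng = random.Random(seed)
    return statistics.fmean(
        [statistics.pstdev(tournament(rng)) for _ in range(n_tournaments)]
    )

binomial_sd = math.sqrt(600 * 0.5 * 0.5)  # 600 fair flips, no selection
print(f"binomial SD for 600 flips: {binomial_sd:.2f}")
print(f"average SD among the 32 survivors: {average_survivor_sd():.2f}")
```

With the default sizes the 32 finalists end up with 600 flips each, and their spread should come out below the unselected binomial figure, which is the understatement the comment is pointing at.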


#36    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 16:15

Jared, for the 102 goalies with at least 50 shots faced, the sv% was .9202.

For the 74 with at least 500 shots faced, it was .9208.

Conclusions will not change.


#37    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 16:24

Vic: read the DIPS Bands link in #24 or #35.  That’s how I calculated the z-scores.  I then took the standard deviation of the z-scores.

***

Sunny: ok, so you’re showing non-randomness.  The K of 2628, while larger than my numbers, still shows a decent deal of skill for goalies.  While I regress Vokoun’s SV% 28%, you are going to regress his 36% toward the mean.  Close enough that we can agree to something.

Now, you seem to be implying that because the regressed observations give us a normal shaped talent spread that it should cause us some concern?  Why would that be?

You think it should be a right-tail of the normal distribution?

Perhaps this will explain it for you:
http://www.tangotiger.net/talent.html


#38    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 16:31

I’ll also add the following: because we have a HUGE level of uncertainty around our estimate of talent, we almost have no choice but to see a bell curve.  If for example we had NO uncertainty, then we would see something that would be the right-tail of a curve.

But, seeing that we have a decent level of uncertainty, all the players are being regressed toward a midpoint (NOT toward the left-point), and so, the right-tail ends up looking somewhat like a bell curve.  The more uncertainty in the talent metric, the more it will look like a bell curve.  If you had almost total uncertainty (among those 74 or 102 goalies), you would simply have a very narrow curve around the mean.

Basically, start with the right-tail with 0 uncertainty and with a very narrow curve in the middle with 99% uncertainty.  Change the shape of the curve from one to the other by changing the level of uncertainty of the talent metric.

At some point, you’ll get something that looks like a bell curve.
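A rough simulation of this argument: draw talent from the right tail of a normal curve, add measurement noise, and watch the skewness fade as the noise grows. Everything here is an illustrative assumption (the tail cutoff, the noise levels, the helper names), and it models observed performance as talent plus normal noise rather than the regressed estimates themselves; with equal sample sizes the two have the same shape.

```python
import random
import statistics

def right_tail_talent(rng, cutoff=1.28):
    """Draw a talent value from the upper tail of a standard normal."""
    while True:
        x = rng.gauss(0.0, 1.0)
        if x > cutoff:
            return x

def skewness(values):
    """Sample skewness: mean cubed z-score (0 for a symmetric bell curve)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def observed_skew(noise_sd, n=5000, seed=1):
    """Skewness of talent plus measurement noise for n simulated players."""
    rng = random.Random(seed)
    observed = [right_tail_talent(rng) + rng.gauss(0.0, noise_sd)
                for _ in range(n)]
    return skewness(observed)

for noise_sd in (0.0, 0.25, 0.5, 1.0):
    print(noise_sd, round(observed_skew(noise_sd), 2))
```

With no noise the distribution keeps its right-tail shape (clearly positive skew); as the noise term dominates, the skew heads toward zero and the histogram looks more and more like a bell curve.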


#39    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 16:37

Tango

Having reread this thread, forgive the lack of diplomacy in my debating style.  You and I may well both be wrong, and probably an anonymous reader who gambles MLB game lines for a living is clucking his tongue at both of us in this thread.

It is an interesting conversation, though.  The initial study aside, we’re hitting at the heart of some very important issues, methinks.  Oddly enough, issues that are even more relevant to MLB than the NHL.

And the initial study included ... top prize goes to anyone who can predict future road EVsave% (close game score only) within the narrowest range of error.  If that data isn’t available I’ll publish online at google docs or similar.


#40    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 16:41

Tango said:

Jared, for the 102 goalies with at least 50 shots faced, the sv% was .9202.

For the 74 with at least 500 shots faced, it was .9208.

Hrmm.  That’s a strange way of looking at things to my mind, though well presented.

The goalies in your selected sample (1200+ shots):
EVsave%:  .921

The remainder of the population:
EVsave%:  .914

How much of that future playing time was lost because they sucked?  How much of that future playing time was lost because they were perceived to suck? 

That’s today’s question.


#41    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 16:55

Sunny:

Well chance alone would give us a left skew in the posterior distribution, that is if non-luck were relatively small (as is the case here), or especially so if the true non-luck distribution were inclined the same way.

I think that ‘R’ will give you the prior for any presumed form, no?  With so few goalies it’s tough to peg.  In fact you could use R and the presumed Normal form prior to measure the error in Tango’s methodology.  Similar to the t-test assumptive error example in one of the books you’re reading right now.  I think so, anyways.


#42    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 17:03

How much of that future playing time was lost because they sucked?  How much of that future playing time was lost because they were perceived to suck?

That’s today’s question.

Right, that’s exactly what DIPS is all about:
http://www.tangotiger.net/dipsbands.html


#43    Guy      (see all posts) 2010/05/20 (Thu) @ 17:05

If I’ve followed this correctly, Tango says the true talent SD is .004 and Sunny says .005.  The only difference seems to be that Sunny thinks this is a trivial difference, in effect no different than zero.  But in hockey terms, that’s clearly mistaken. 

Selection bias will, if anything, lead Tango to underestimate (not overestimate) the talent spread. 

Otherwise, what’s to argue about?


#44    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 17:06

The save% for the three groups of goalies I was talking about:

.923 (lots of shots)
.916 (medium shots)
.911 (few shots)


#45    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 17:07

Is Sunny saying that 1 SD in true talent is .005?  If so, then, yeah, there’s nothing more to talk about.  I’d be happy to accept that conclusion.


#46    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 17:17

Tango, re dipsbands:

I think you’re advancing Sunny’s case with that, no?  There are magnetic dice scattered along the curb there.  Surely the obviously gifted pitchers can survive some bad luck early in their careers.  The mediocre guys ... not so much.

You see the same thing with NHL forwards and shooting%, though not as dramatically.

I’m sure that if you took a random pair of seasons for a randomly selected 100 pitchers in the middle group ... all the smart money says that the second season was worse. 

On any one test it will be a near enough a cointoss, but repeat it ten thousand times and let me bet on the season-before guys.  What odds would you suspect I’d need to declare to make that a fair wager?

Survival of the luckiest.  This is what Sunny is talking about.


#47    Guy      (see all posts) 2010/05/20 (Thu) @ 17:23

"I think you’re advancing Sunny’s case with that, no?”

No.  Sunny has it exactly backwards. 

Survival of the luckiest explains why some poor players get to play 2 years rather than one.  But it has little or no bearing on the pool of players Tango is looking at.


#48    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 17:35

Guy:

No.  Sunny made a programming error originally, now they are near enough.

The data is poor though, there are tremendous effects from scorer bias.  As well there are score effects that are just as significant, teams in hockey play differently based on the score.  It’s not as dramatic as in soccer (MacCracken’s current endeavour is a bit of a mess, he should google sisu hockey and read the stuff from a few years ago).

It’s similar to baseball, really.  The decision to play for one extra run, as we all know, is determined by the score, the stage of the game, who is pitching and who are in the bullpens and rested.  Further it depends on the relative quality of the relievers that you’d expect to see given whether or not you manage to score a run.  The pitch count in the game becomes important, because the previous behavior of the pitcher after heavy inning games (pitchers hate pitch count obviously, and never want to be pulled when things are going their way) will affect the manager’s interpretation of his best outcome in the present and future.  In fairness to MLB managers, they largely seem to have this stuff figured out, don’t let the fact that they usually speak like hill people fool you.

NHL coaches are similar, though IMO they play to the score too heavily.

In any case, the score effects have to be eliminated or goalies in good teams appear to be better than they really are, and vice versa.  Much the same as pitchers who play for teams that are usually leading tend to give up fewer doubles in general, and in late innings in particular ... no surprise, for obvious reasons.

It’s fine brushstrokes, I know, but it’s a very fine difference between goalies in the NHL right now.  Certainly wasn’t always the case, but it is right now.

And there are a few other factors that are finer brushstrokes yet.

As a discussion of goalie quality this thread is a bit of a nonstarter imo.  As a discussion of survival/censorship bias ... I think there is something to be gleaned here.


#49    Guy      (see all posts) 2010/05/20 (Thu) @ 17:48

Vic, my comment had nothing to do with programming.  Sunny’s claim was that by limiting the analysis to goalies with more than X shots against, Tango was creating the illusion of more talent variance than is true of the larger population of goalies.  He illustrated the point again with the coin flipping example.  But he is mistaken.  If you include short-career players, you increase the observed variance, even after accounting for those players’ smaller samples.  Maybe you think this is luck; most of us would probably say it reflects real talent differences.  But in any case, the effect is exactly opposite of what Sunny suggests.

As a result, I think your final comment also has it backwards.  As a discussion of survivor bias, the thread offers nothing of use.  As a discussion of goalie talent, interesting....


#50    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 17:51

Guy #47:

So you would confidently take my wager, then?  Your verbiage slants towards you thinking I’d be a fool for taking even odds.

In fact, given the total results from a thousand random samples of 100 guys from the middle group ... I have no concrete idea, I wouldn’t even know how to get the data (I’ve followed the Wyer’s instructions to build the SQL database, but have yet to become proficient in MySQL ... are you cats ALL database programmers?).  Still, I would think that the odds favoured me to a staggering degree.  10,000 to 1 maybe, I dunno.  Maybe more.

And you can’t rationalize by saying “that’s because all pitchers get worse from the minute they enter the league”.  That would be the crazy icing on the crazy cake, and would surely make your predictive values even worse.


#51    Guy      (see all posts) 2010/05/20 (Thu) @ 18:01

Vic:
Of course pitchers who pitch in consecutive seasons will have a tendency to decline.  Everyone who hangs out here is well familiar with that pattern and why it occurs.  I have no idea, however, what that has to do with Tango’s study or the point Sunny was making.  But if we can all agree that Tango’s initial analysis (and Tom’s) was correct, and that the objections you and Sunny have raised are not ultimately germane, then I guess we’re all on the same page.....


#52    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 18:05

Guy,

Intuitively I agree that the survivors, or least censored, should collectively see the angels lose their desire for them, and drift together a bit.

What about the forgotten dice though?  And why were they discarded?  That has a lot to do with fame, contract, Ws and Ls, overall save percentage and coach bias.  And more than anything it has to do with streaks, and the way that the human mind interprets them.

Let’s consider the excluded in Tango’s study, the unwashed 71 and compare their performance, in aggregate, next year to the guys he studied, those being the righteous 52.

Will the righteous do better?  Absolutely, even a fool can see that Luongo is better than Raycroft, after all.

Will they do as much better?

Not a chance in hell.

Why?

And if you tell me that the reason is because of aging, and that goalies peak at about 21.5 years old ... well I’ll give up and move on.  :D


#53    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 18:21

Guy said:

Of course pitchers who pitch in consecutive seasons will have a tendency to decline.  Everyone who hangs out here is well familiar with that pattern and why it occurs.

I don’t doubt that there is a consensus, Guy.  I’m just saying that it’s madass, and those that have truly bought into this line of thinking and wagered on the assertions ... well can be easily identified by their lack of worldly possessions.  :D

In fairness to tango, he seems to be a contrarian in this regard.  Admittedly I miss a lot of threads here, but that’s my sense of it.

Really, I don’t give two poops what the consensus here is.  If someone could show me a model with real predictive value, I’d be impressed.  It’s eluded me, I’m searching for bigger brains to answer this.  And I personally don’t care about postdictive values, though I understand why other people feel differently.

Usually simple answers reveal themselves, but I suspect that won’t be the case here.  It’s about streaks, humanity, and the survival of coaches/gms that can see through the noise as well.  There is an interplay between that small community as well.  The “saw him good” guys survive for a reason.  Springsteen was right, too much faith in anything will get you killed.  That includes hockey and baseball stats, as it turns out.


#54    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 18:43

Will they do as much better?

Of course not, because we already said that we have to regress ALL observed performance.

So, you are never as good as you show, and you are never as bad as you show.  If you take players who show good and players who show bad, it is a certainty that the future observed gap between the two groups will be smaller.


#55    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 18:46

Vic, you’d probably like this old research on DIPS:

http://www.tangotiger.net/solvingdips.pdf

It very much has applications here.  Change “fielding” to “team defense”, change “pitching” to “goalies”, and you have what you need.


#56    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 19:27

The stuff in that link is pretty crude, but it’s earthy, I like it.  Your analogy to hockey is very rough, Good Lord, we’re well beyond that.

I read that years ago, and reread it fairly recently, maybe a year ago.  I loved the organic nature of that thread, I remember that much.  I liked the people that were writing in it.

I think the lesson to be learned here is that the sum of variances can be a dangerous thing.

We both know that if you take 30 dice rollers with differently weighted dice, let them throw the bones an equal number of times ... then the best guess for the variance of the dice weighting is the difference between the actual variance and what we would have expected if all the dice were equally weighted.

The thing is, that distribution could be normal, it could be 25 guys with slightly better dice and 5 guys with terrible dice.  It could be a distribution that was a perfect square, it could be anything at all.  In all cases the Bernoulli trials will carry the variance through, at the grace of capricious gods, to something similar to the difference of the variance between random and non-luck.

The problem is that people, or specifically frequentist NHL fans, seem to be unwilling to believe that the process isn’t reversible.  The distribution of talent could be anything at all, moreover it could just be coincidence.

I can reread that thread and provide a critique if you’d like, is that what you’re driving at?  I’m going to be busy this weekend though.


#57    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 19:58

Yes, earthy.  I like that.  It puts it all on the table, very bare, so you can see exactly why things are happening.

No, no critique necessary.  Just showing a perspective.


#58    Vic Ferrari      (see all posts) 2010/05/20 (Thu) @ 20:02

Tango said:

Of course not, because we already said that we have to regress ALL observed performance.

So, you are never as good as you show, and you are never as bad as you show.  If you take players who show good and players who show bad, it is a certainty that the future observed gap between the two groups will be smaller.

Regression is not a force of God in the universe that drives any and all things towards mediocrity.

I was not divorcing bad players from good players by ice time.  You were doing that, Tango.  We can agree on that I’m sure.


#59    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 21:24

Regression is not a force of God in the universe that drives any and all things towards mediocrity.

I never said that.


#60    Sunny Mehta      (see all posts) 2010/05/21 (Fri) @ 10:32

Great discussion all around. Few thoughts…

1) I ran Tango’s model. I.e. I took the true talent save percentages he has listed above, and I put them through 10,000 binomial sims using the actual shots they faced over the past three seasons. The mean simmed sd was .0068 whereas the actual observed sd for these goalies over that time frame was .007. This seemed like a good result to me at first, but then I thought maybe it had to turn out that way because that’s how Tango reverse engineered it?

2) I think the point Vic is making (and if I’m understanding it correctly it’s a damn good one, but please correct me if I’m misunderstanding) is that this model will not have predictive value. It has postdictive value because it HAS to (it’s engineered to fit the past results). But a big reason why it won’t have predictive value is because each goalie’s “true talent” is assumed to be a point estimate mean, each with normally distributed results.

So if Tomas Vokoun’s true talent is .929 according to the model, we’d expect his future results to look like a big bell curve with the middle squarely on .929, and 95 percent of the results between .922 and .937. However, this is very unlikely to actually be the case. Vokoun is probably WAY more likely to be at .922 next season than .937. And this same type of discrepancy applies to every player. Next season if we compare each player’s results to how likely those results should’ve been based on the model, we’re gonna get some f’d up results.

The thing is, isn’t this problem unresolvable so long as we use a point estimate for each player’s true talent/projection/etc as opposed to using a probability distribution (and one that is not Gaussian)?

3) As far as the selection bias stuff, I actually didn’t make a programming error, I just didn’t explain myself clearly. I guess my point is simply that, in addition to the last point I made, a huge factor in the results of the model (and quality of the model, as dictated by predictive value) is which population one chooses as a prior. I think we all probably agree on that.

And it’s just interesting to me that if you tweak the population by picking and choosing a guy here and there, a) you can get pretty different results, and b) before you do it you should probably have good reason to do it other than “it makes the math more convenient”. The data is particularly fickle due to the relatively small number of goaltenders in the whole population.

For example, if I take the 123 goalies and filter out the guys who have fewer than 100 shots faced, I’m left with 97 goalies. If I analyze that population, the spread is in fact wider than what we’d expect from chance alone. However, a big reason that’s the case is because three particular schlubby low-playing-time goalies are dragging out the left tail. If I remove them from the population on grounds that they don’t belong, I’m left with 94 goalies. The distribution of these 94 goalies’ save percentages is a bell curve that is spread out no wider than what we’d expect from chance. So if I use this population as my prior, my model would show no skill.


#61    Sunny Mehta      (see all posts) 2010/05/21 (Fri) @ 10:37

D’oh! In the second paragraph of my second point, the sentence “Vokoun is probably WAY more likely to be at .922 next season” should read “Vokoun is probably WAY more likely to be at .922 over the next three seasons” (and not just because of age).


#62    Sunny Mehta      (see all posts) 2010/05/21 (Fri) @ 10:42

Tom Awad,

Thanks again for having the sense to make this data available for easy download. It has spawned a great discussion.

Note that Ilya Bryzgalov’s numbers are messed up in the spreadsheet. He is listed twice - once as “Ilja” and again as “Ilya”. And it says Ilja didn’t play in 2010 and Ilya didn’t play in 2008 or 2009.


#63    Tangotiger      (see all posts) 2010/05/21 (Fri) @ 11:09

This seemed like a good result to me at first, but then I thought maybe it had to turn out that way because that’s how Tango reverse engineered it?

variance(observed) = variance(true) + variance(binomial)

So, yes, I inferred the true spread based on the observed and the binomial.  So, you using the true spread, and applying the binomial should lead to a similar observed in your sim as in the NHL.
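The round trip described here can be sketched in a few lines: back out the true-talent spread from the z-score spread, then push that spread forward through binomial shots and compare. The inputs (.921, 2650 shots, z-score SD of 1.25) are the figures from the original post; the function names, the normal shape for talent, and the 300 simulated goalies are assumptions made for illustration.

```python
import math
import random
import statistics

def binomial_sd(p, n):
    """SD of a save percentage from luck alone over n shots."""
    return math.sqrt(p * (1 - p) / n)

def true_sd_from_z_spread(z_sd, p, n):
    """Solve z_sd^2 = var(true) / var(binomial) + 1 for the true-talent SD."""
    return binomial_sd(p, n) * math.sqrt(z_sd ** 2 - 1)

def simulated_observed_sd(true_sd, p, n, n_goalies=300, seed=1):
    """Give each simulated goalie a normal true talent, then n Bernoulli shots."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_goalies):
        talent = rng.gauss(p, true_sd)
        saves = sum(rng.random() < talent for _ in range(n))
        rates.append(saves / n)
    return statistics.pstdev(rates)

p, n, z_sd = 0.921, 2650, 1.25          # figures quoted in the original post
true_sd = true_sd_from_z_spread(z_sd, p, n)
expected = math.sqrt(true_sd ** 2 + binomial_sd(p, n) ** 2)
print(f"true-talent SD: {true_sd:.4f}")
print(f"expected observed SD: {expected:.4f}")
print(f"simulated observed SD: {simulated_observed_sd(true_sd, p, n):.4f}")
```

Because the true spread was inferred from the observed spread in the first place, the simulated figure landing near the expected one confirms the arithmetic is self-consistent; it is not an independent test of the model.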

***

So if Tomas Vokoun’s true talent is .929 according to the model, we’d expect his future results to look like a big bell curve with the middle squarely on .929, and 95 percent of the results between .922 and .937.

Correct.

However, this is very unlikely to actually be the case. Vokoun is probably WAY more likely to be at .922 next season than .937.

Why is that?

His observed for the last 3 years were: .927, .936, .936, for an observed mean of .933.  Given that the true mean of all goalies was .920 or .921, it doesn’t seem a stretch to say that this was a true .929 goalie who got a bit lucky for two years and a tiny bit unlucky one year.

His chance of being .922 (7 points below) or being .936 (7 points above) seems about right.  Basically, he has just as good a chance of playing league average as playing league-best.  (Aging notwithstanding.)
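The symmetric range under discussion follows from the binomial alone. A minimal sketch, assuming a normal approximation and a round 4000 shots (a hypothetical figure, not Vokoun's actual total); `binomial_interval` is a made-up helper name.

```python
import math

def binomial_interval(true_sv, shots, z=1.96):
    """Approximate 95% range of observed save % for a fixed true talent."""
    sd = math.sqrt(true_sv * (1 - true_sv) / shots)
    return true_sv - z * sd, true_sv + z * sd

low, high = binomial_interval(0.929, 4000)   # 4000 shots is an assumed round number
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the range comes out close to the .922 to .937 quoted above, and it is centred on .929, so a result 7 points low is about as likely as one 7 points high when the talent estimate is treated as a fixed point.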


#64    Tangotiger      (see all posts) 2010/05/21 (Fri) @ 11:15

Next season if we compare each player’s results to how likely those results should’ve been based on the model, we’re gonna get some f’d up results.

If you take the top 6 goalies,
TOMAS VOKOUN
TIMOTHY THOMAS
JONAS HILLER
ROBERTO LUONGO
CRAIG ANDERSON
JAROSLAV HALAK

Their average true talent is .9277.  And, this system is saying that next year their average observed save percentage will be .9277 +/- whatever uncertainty.

(Their actual observed over the last 3 years is .9313.)

Indeed, I can test this pretty easily.  I can use the 08 and 09 data, run my model against that only, generate the top 5 or 6 goalies, and then compare to the actual 2010 data.  Do you want me to do that?


#65    Tangotiger      (see all posts) 2010/05/21 (Fri) @ 11:34

However, a big reason that’s the case is because three particular schlubby low-playing-time goalies are dragging out the left tail. If I remove them from the population on grounds that they don’t belong, I’m left with 94 goalies. The distribution of these 94 goalies’ save percentages is a bell curve that is spread out no wider than what we’d expect from chance.

Obviously, we can’t select after the fact which goalies from the sample we are going to discard. 

It’s one thing to use a shots faced cutoff (and like I said, I *always* get a spread that is roughly 1.25 times wider than chance alone, regardless of the cutoff I choose).

It’s quite another to then select the three worst observed goalies from that list.  You’d better have a darn good reason for doing that, and it cannot have anything to do with the observed performance.

Let’s for the sake of argument, do what Sunny does.  I have the 102 goalies with at least 50 shots faced.  Here are the bottom 10 in observed save percentage:
0.847
0.859
0.865
0.866

0.880
0.887
0.888
0.891
0.894
0.898

Ok, so we don’t like the 4 worst because they just stick out.  Let’s take them out.

The result?  Standard deviation of 1.19 times wider than the binomial.  So, I reject the contention that the spread is made up purely of these guys.

Let’s go further, as these are the top 10:
0.953
0.952
0.951

0.938
0.933
0.933
0.932
0.932
0.931
0.930

Again, the top 3 stand out, and seeing they only have 64, 104, 61 shots, let’s take them out.

Spread is still 1.19 times as wide.  This would have to have been the case since their low shot totals barely gave them a high z-score.

So, I’m not seeing it.


#66    Tangotiger      (see all posts) 2010/05/21 (Fri) @ 12:03

I took the top 70 goalies in shots faced for the 08 and 09 seasons.  The spread in the z-scores was 1.22.  The regression point was “X” (or K) of 1094.

The top 5 for estimated true talent was:
talent Name
0.9350 TIMOTHY THOMAS
0.9294 CRAIG ANDERSON
0.9287 ROBERTO LUONGO
0.9283 TOMAS VOKOUN
0.9270 JONAS HILLER

This would mean that we’d expect those 5 guys to average .9297 in 2010.  What happened?

2010 talent Name
0.9129 0.9350 TIMOTHY THOMAS
0.9243 0.9294 CRAIG ANDERSON
0.9267 0.9287 ROBERTO LUONGO
0.9356 0.9283 TOMAS VOKOUN
0.9325 0.9270 JONAS HILLER

The average in 2010 was .9264.

Obviously, one set of small test cases is not going to prove anything.

Here’s the bottom 5 in 08+09 seasons (of goalies who played in 2010):

2010 talent Name
0.9361 0.9151 PETER BUDAJ
0.9023 0.9146 VESA TOSKALA
0.9103 0.9145 PATRICK LALIME
0.9217 0.9129 JOHAN HEDBERG
0.9080 0.9116 ANDREW RAYCROFT

Average in 2010 was .9157 compared to a talent expectation of .9137.

These two sets of data points do point to not enough regression occurring.

But, whatever it is, it’s certainly not the case that there’s no differentiation in talent.  Of these 10 goalies, 4 of the 5 best observed performances were by the estimated more talented goalies.
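The averages in this comment can be rechecked directly from the two tables. The sketch below copies the ten rows as given and uses simple unweighted means, which is an assumption about how the quoted averages were formed.

```python
from statistics import fmean

# (name, estimated talent from 08+09, observed 2010), copied from the tables above
top5 = [
    ("TIMOTHY THOMAS", 0.9350, 0.9129),
    ("CRAIG ANDERSON", 0.9294, 0.9243),
    ("ROBERTO LUONGO", 0.9287, 0.9267),
    ("TOMAS VOKOUN",   0.9283, 0.9356),
    ("JONAS HILLER",   0.9270, 0.9325),
]
bottom5 = [
    ("PETER BUDAJ",     0.9151, 0.9361),
    ("VESA TOSKALA",    0.9146, 0.9023),
    ("PATRICK LALIME",  0.9145, 0.9103),
    ("JOHAN HEDBERG",   0.9129, 0.9217),
    ("ANDREW RAYCROFT", 0.9116, 0.9080),
]

def averages(group):
    """Unweighted mean of (estimated talent, observed 2010) for a group."""
    return (fmean([talent for _, talent, _ in group]),
            fmean([observed for _, _, observed in group]))

top_talent, top_observed = averages(top5)
bottom_talent, bottom_observed = averages(bottom5)

# How many of the five best 2010 performances came from the top-talent group?
best_observed = sorted(top5 + bottom5, key=lambda row: row[2], reverse=True)[:5]
top_names = {name for name, _, _ in top5}
n_from_top = sum(name in top_names for name, _, _ in best_observed)

print(f"top 5:    expected {top_talent:.4f}, observed {top_observed:.4f}")
print(f"bottom 5: expected {bottom_talent:.4f}, observed {bottom_observed:.4f}")
print(f"best five in 2010 drawn from the top group: {n_from_top}")
```

The expected gap between the groups is larger than the observed 2010 gap, which is what "not enough regression" refers to, while the ordering of the two groups still holds.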


#67    Sunny Mehta      (see all posts) 2010/05/21 (Fri) @ 12:52

Tango,

It’s one thing to use a shots faced cutoff...It’s quite another to then select the three worst observed goalies from that list.  You’d better have a darn good reason for doing that, and it cannot have anything to do with the observed performance.

But we’ve already agreed that there is a correlation between shots faced and observed performance, so by you filtering by shots faced you ARE filtering by observed performance.


#68    Sunny Mehta      (see all posts) 2010/05/21 (Fri) @ 12:57

Tango/#66,

Thanks for running that. Out of curiosity, if you use the entire population for 08-09 (i.e. no minimum shots) to get your K, what numbers do you get for those 10 goalies’ expected versus actual?


#69    Sunny Mehta      (see all posts) 2010/05/21 (Fri) @ 13:11

Tango,

Using your z-score method, what number do you get if you take the whole population and remove the top 10 and bottom 10 save percentage guys? (All of whom are very low playing-time guys.)


#70    Tangotiger      (see all posts) 2010/05/21 (Fri) @ 14:28

But we’ve already agreed that there is a correlation between shots faced and observed performance, so by you filtering by shots faced you ARE filtering by observed performance.

Yes, agreed, which is why I was running with the 1200 shot cutoff, the 500 shot cutoff and the 50 shot cutoff.

Now, logically, if Vokoun faces 4000 shots, we ALSO know something else about him: his coaches really like him enough to let him face 4000 shots.  Therefore, in order to find a representative sample from the population of goalies, it’s perfectly fine to start with all the goalies with at least 1200 shots (or 2000 or 3000 even… but then we start losing sample size).

I’m not saying all this is easy. I am saying that, whenever you have human beings performing a task, we have to be careful about concluding that there’s no difference in their talent level, at whatever population you choose.

The question on the table is HOW MUCH of regression you need, and to WHAT regression point do you regress the observed sample.

With Vokoun, if you use ALL the goalies, you regress little to the population mean.  If you limit it to goalies with at least 1200 shots, you regress somewhat to that population mean.  If you limit it to goalies with 2500 shots, you regress a lot to that population mean.  Ideally, you find a population that Vokoun belongs to where he is indistinguishable (won’t ever happen) and regress 100% to that population mean.

That’s our job.  To find the most representative sample for Vokoun.  And do that for all 100 goalies.
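To make the "little / somewhat / a lot" point concrete, here is a small sketch of the regression step.  The .933 observed and .921 league mean are the figures from the original post and the 4000 shots is the round number used above.  The X values are made up for illustration, and the population mean is held fixed even though it would really move with the population chosen.

```python
def regressed_estimate(observed, shots, pop_mean, x):
    """Shrink an observed rate toward pop_mean; x is the regression constant."""
    weight = shots / (shots + x)          # share of the estimate that is the goalie's own record
    return pop_mean + weight * (observed - pop_mean)

OBSERVED, SHOTS, POP_MEAN = 0.933, 4000, 0.921   # shots and mean are simplifying assumptions

# Hypothetical regression constants: a larger X means the chosen population
# shows less spread in talent, so the goalie is pulled harder toward its mean.
estimates = {x: regressed_estimate(OBSERVED, SHOTS, POP_MEAN, x)
             for x in (600, 1300, 4000)}

for x, est in estimates.items():
    print(f"X={x:5d}  estimate={est:.4f}")
```

When X equals the goalie's own shot total, the estimate sits exactly halfway between his observed rate and the population mean.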


#71    DSMok1      (see all posts) 2010/05/21 (Fri) @ 14:44

Sounds like we’re about to go full-out Bayesian for this…

Good work, all.


#72    Tangotiger      (see all posts) 2010/05/21 (Fri) @ 14:45

if you use the entire population for 08-09 (i.e. no minimum shots) to get your K, what numbers do you get for those 10 goalies’ expected versus actual?

107: n
1.27: z-Score
982: average Shots per goalie (*)
613: X

(*) This is where I’m not sure, whether to take the mean or the harmonic mean.
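A sketch of how X can be backed out of the spread and the average shots.  It combines the equation from the original post, 1/(1-r^2) = spread^2, with the usual r = n/(n+X).  That this is exactly how the posted X values were computed is an assumption, though it reproduces them to within the rounding of the spread.

```python
import math

def regression_constant(spread, avg_shots):
    """Regression constant X from the z-score spread and average shots per goalie.

    Assumes 1/(1 - r^2) = spread^2 and r = n/(n + X), with n = avg_shots.
    """
    if spread <= 1.0:
        # No spread beyond luck: r = 0, so regress 100% toward the mean.
        return math.inf
    r = math.sqrt(1.0 - 1.0 / spread ** 2)
    return avg_shots * (1.0 - r) / r

x_all = regression_constant(1.27, 982)     # all 107 goalies
x_1000 = regression_constant(1.23, 1890)   # the 45 goalies with 1000+ shots

print(round(x_all), round(x_1000))
```

With the rounded spreads this gives roughly 611 and 1356, against the posted 613 and 1343.  Swapping the mean for the harmonic mean of shots only changes the avg_shots argument.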

The top 5 goalies will regress little, and therefore have a very high .9311 true talent.

If I limit it to 1000 shots faced:
45: n
1.23: z-Score
1890: average Shots per goalie
1343: X

True talent of top 5 is .9294.

Though, I suppose since I did 5/107 the first time, I should do 2 or 3 of 45 this time.  Top 2 is .9317, and top 3 is .9306.  That puts it right in line with the other.

Interesting.... so, the top 5% of true talent level is .9311, regardless of what cutoff I chose.

***

10: n
0.99: z-Score <-- Bingo!

The top 10 in shots faced showed a distribution that is no different than expected from luck!

That is, these 10 goalies, having faced the most shots, have these observed sv% for the two seasons (08, 09).  And, we CANNOT tell which one of these goalies is the best:

sv% Name
0.931 TOMAS VOKOUN
0.926 NIKLAS BACKSTROM
0.925 ILJA BRYZGALOV
0.924 HENRIK LUNDQVIST
0.924 RYAN MILLER
0.923 MARTIN BIRON
0.921 CAM WARD
0.919 EVGENI NABOKOV
0.915 MIIKKA KIPRUSOFF
0.913 MARTY TURCO

We would have to regress each of them 100% toward the population mean of .922.

Therefore, it becomes critical to know what population the goalies come from.


#73    Tangotiger      (see all posts) 2010/05/21 (Fri) @ 14:47

Sounds like we’re about to go full-out Bayesian for this

Right, what I’m doing is a shorthand for Bayes.  Tom Awad did the Bayes in the previously noted article.


#74    Sunny Mehta      (see all posts) 2010/05/22 (Sat) @ 15:54

Okay, I ran a full Bayesian model on Vokoun by doing the following:

1) I fixed the Bryzgalov data error.

2) I created a Prior ability distribution using a beta curve with K and eta calculated by using the Nelder-Mead algorithm on the data of all 122 goaltenders. (K = 2580.349, eta = 0.9189768)

3) I used Vokoun’s observed results to create a beta Likelihood curve, then I multiplied that by my Prior to get the Posterior. You can view the graph of all three curves here:

http://sunnymehta.com/public/Vokoun.jpeg
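For readers who want to reproduce the shape of this, here is a minimal sketch of the prior-times-likelihood step.  It assumes the prior is Beta(K*eta, K*(1-eta)), which is one common way to read "K and eta" but is not spelled out above, and it uses a hypothetical 4000 shots at .933 for Vokoun rather than his exact totals.

```python
import math

K, ETA = 2580.349, 0.9189768      # fitted prior parameters quoted above

# Assumed parameterization: prior mean = ETA, prior "sample size" = K.
alpha0 = K * ETA
beta0 = K * (1.0 - ETA)

# Hypothetical observed record (not Vokoun's exact totals).
shots = 4000
saves = round(0.933 * shots)
goals = shots - saves

# A beta prior times a binomial-shaped likelihood is again a beta curve:
# add saves to alpha and goals against to beta.
alpha1 = alpha0 + saves
beta1 = beta0 + goals

post_mean = alpha1 / (alpha1 + beta1)
post_sd = math.sqrt(alpha1 * beta1 /
                    ((alpha1 + beta1) ** 2 * (alpha1 + beta1 + 1.0)))

print(f"posterior mean {post_mean:.4f}, sd {post_sd:.4f}")
```

Under these assumptions the posterior mean is the same thing as regressing the observed .933 toward eta with weight shots/(shots+K), so the beta model and the regression shorthand agree whenever they share the same K and mean.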

A few comments…

While this was fun, I am skeptical of its accuracy. I definitely wouldn’t bet my own money on this as a predictive model. Basically I just don’t think the prior is right. I’m not sure exactly how the Nelder-Mead algorithm works, and honestly I don’t even know if a beta prior is the right tool at all. But from what I know about how goaltending works in the NHL, and from what I know about looking at the Sv% data a million times, I just don’t think this prior accurately reflects the true ability distribution. I may try putting together a prior distribution manually by brute force method and see how that works.


#75    Guy      (see all posts) 2010/05/22 (Sat) @ 16:10

Sunny: 
You started out here: “IMO your [Tango’s] “true talent skill” numbers are likely quite a bit off, and I’d bet real money on it.”

Now you appear to have reached similar results (yes?) but are nonetheless “skeptical” and “wouldn’t bet my own money on this.” Are you saying that you still don’t accept the basic outlines of what Tango is saying?  If so, how do you explain his results in comment #66?  Or if you do now agree, then it would be good form to say so and acknowledge your earlier error. 

I’m all for people criticizing others’ work, and stating their case as forcefully as they want.  But when you turn out to be wrong, then man up and say so—Tango certainly would if the situation were reversed.  Same goes for Vic, who had lots to say until the evidence against him became overwhelming.  Just a pet peeve of mine, I guess, but when someone loses a debate online they should acknowledge it, not just slink away…


#76    Sunny Mehta      (see all posts) 2010/05/22 (Sat) @ 22:24

Guy,

I guess we just don’t look at it the same way. I don’t consider any of this a “debate”, or an opportunity for anyone to “man up”.  I think of myself, Tango, Vic, Awad, etc as all being on the same team.  We all put a lot of time and thought into this stuff because we are passionate about it.  I consider this thread an exchange of information, and a good one at that.

The vehement tone of my first comment was a function of my enthusiasm, not some kind of attack on Tango. I highly doubt he took any offense to it, but if he did I would gladly reconcile with him.

As for me being wrong (and being able to admit it), lol, let me just say right now that I’m wrong all the time. Probably some percentage of every day. But what can I do but try to keep an open mind while accepting my humanness.

As for your assertions that there is “overwhelming evidence” for Tango’s original model being a good predictor of future goalie talent, good enough to bet on, well I just disagree. And yes, I’ve run my own tests, and I’ve looked at the data from different angles, and sometimes my math has come to similar conclusions as his, and sometimes it hasn’t, but the bottom line is that I just don’t have a ton of faith in any of it being the right answer. I have mentioned many of my reasons for skepticism in this thread.

However, if you still disagree, and you think there’s absolutely nothing to the qualms Vic and I have expressed, I’m sure we can work out some sort of wager.


#77    Tangotiger      (see all posts) 2010/05/23 (Sun) @ 09:10

I think the focus should be on the merits, not on our faith system as to how much money we would bet.  All I care about is the evidence, and what it means.

Now, there is one huge issue still to resolve: team play.  Just as with the Solving DIPS (which I hope everyone here has read… you should read it), there is a team play true talent.  And, to the extent that 3-years of goalie data doesn’t have these effects randomly distributed, it is something real.

Basically, the observed spread if all goalies were equals should STILL be wider than the binomial, because we’ve introduced team defense into the mix as well.  How much that is, I don’t know.  If someone wants to argue that the 1.25 width should really be 1.12 for true talent goalies and the rest because of team play, well, I won’t really be able to disagree much.

And of course the more years we add in, say 5 or 7 years of data, such that goalies + teamdefense pairings won’t be very strong (even if Brodeur is on the Devils, he doesn’t have the same team in front of him every year), we then have to worry about aging.

The only conclusion we can make is that there is some degree of goalie talent.  The question is how much.

So, you present what you know, and declare the missing parameters that need to be explored.


#78    Sunny Mehta      (see all posts) 2010/05/23 (Sun) @ 10:29

Tango,

Well, perhaps “faith” was a poor word choice on my part, but I disagree that “how much money we would bet” is irrelevant. We as a society use money to place value on things, and truth is no exception. Gambling markets are in essence a giant “truth gauge” imo, and they tend to be very efficient. Humans are empirically much more discerning about truth probabilities when forced to bet their own money on it.

But let me scratch “faith” and replace it with a reiteration of my actual qualms.

1) We’re dealing with a very high K-value to begin with. I.e. very small edges, not very much discernable skill relative to what chance alone would say or relative to other sport skills (e.g. hitters’ walk rate in baseball, forwards’ corsi/fenwick in hockey, etc.), and not much room for error.

2) We still haven’t come up with an accurate prior wrt the ability distribution. There is a clear left tail visible in the population, and both the beta prior and normal prior don’t seem to be quite depicting it correctly. This in addition to the survival bias issue, the “which population is a guy part of?” issue, and the inherent problems with putting certain guys in certain populations purely based on playing time/observed results (i.e. with no other real prior to do so).

3) Shot recording bias. It is very real, and it likely plagues these numbers. And remember, we’re dealing with such small edges to begin with that even a few percentage points difference in Sv% due to recording bias is HUGE. Using road-tied numbers is a great start imo, but it cuts down our already small sample sizes.

4) Team effects. I’m talking mostly about playing-to-the-score effects, which are very real. I’m skeptical of your assertion that there is team play true talent wrt affecting shot quality, if you mean beyond score effects, mostly because it doesn’t appear very strong empirically. You should read Vic’s great study of it here:

http://vhockey.blogspot.com/2009/07/shot-quality-fantasy.html


#79    Sunny Mehta      (see all posts) 2010/05/23 (Sun) @ 11:35

Also, one thought I’ve had about dealing with the selection bias thing is to look at Sv% at the team level. After all, even if coaches are “going with the hot hand” wrt particular players, they still have to start SOMEONE at the goaltender position every game. And we’re only talking about 82 man-games per season per team, so maybe we can glean something by examining the spread in team goaltending.

Thanks to JLikens I have the last four seasons worth of team data for road games, even strength only, with the game tied. I ran 10,000 sims and found the average simmed binomial sd to be 0.007031. The observed sd is 0.006954744. Using the Nelder-Mead algorithm I get a K of 4732.775. Doesn’t seem like much skill there.
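A rough sketch of that kind of simulation, for anyone who wants to try it.  The real shot totals are not reproduced here, so it assumes 30 teams with 1500 shots each and a common .920 true save percentage (all made up), and it runs far fewer sims so that plain Python finishes quickly.

```python
import random
from statistics import mean, stdev

def simulated_sd(n_teams, shots, p, rng):
    """SD of team save percentages in one simulated league with no skill differences."""
    sv_pcts = [sum(rng.random() < p for _ in range(shots)) / shots
               for _ in range(n_teams)]
    return stdev(sv_pcts)

# Hypothetical league: every team has the same true save percentage.
N_TEAMS, SHOTS, P = 30, 1500, 0.920
N_SIMS = 100                      # the study above used 10,000

rng = random.Random(42)
avg_sd = mean(simulated_sd(N_TEAMS, SHOTS, P, rng) for _ in range(N_SIMS))
theoretical_sd = (P * (1 - P) / SHOTS) ** 0.5

print(f"average simulated SD: {avg_sd:.5f}")
print(f"binomial SD:          {theoretical_sd:.5f}")
```

The observed SD across real teams is then compared against the average simulated SD: an observed figure at or below it leaves little room for anything beyond luck.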


#80    Tangotiger      (see all posts) 2010/05/23 (Sun) @ 12:17

4300 shots still means you are going to regress Vokoun 50% toward the mean.

You can characterize that as “not much”, but really who cares how YOU or anyone want to characterize it.  It’s 50% toward the mean given 4700 shots or whatever.  It is what it is.  There’s no ambiguity with that.
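The arithmetic behind the 50%, as a quick sketch.  It uses the K of 4732.775 from the previous comment and the standard K/(n+K) share, and the function name is just for illustration.

```python
def regression_toward_mean(shots, k):
    """Fraction of the gap between observed and mean that gets regressed away."""
    return k / (shots + k)

K = 4732.775

at_4300 = regression_toward_mean(4300, K)   # the shot total mentioned above
at_k = regression_toward_mean(K, K)         # shots equal to K

print(f"{at_4300:.3f} {at_k:.3f}")
```

A goalie with exactly K shots is regressed exactly halfway, and one with 4300 is regressed slightly more than half.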

Gambling markets are in essence a giant “truth gauge” imo, and they tend to be very efficient.

If that is the case, then we should see no change in spreads depending on who the goalie is, for a given team.  I would guess that we in fact do see a non-zero change in spreads.


#81    Sunny Mehta      (see all posts) 2010/05/23 (Sun) @ 12:20

Tango,

Fair points.

Question: Do you know how the Nelder-Mead algorithm works?  I.e. why did it spit out a K of 4732 when the observed sd was smaller than the average simmed sd?


#82    Sunny Mehta      (see all posts) 2010/05/23 (Sun) @ 12:23

Also, Tango, what is Vokoun’s Sv% in his last 4300 ES shots on the road w/score tied?


#83    Guy      (see all posts) 2010/05/23 (Sun) @ 12:27

"However, if you still disagree, and you think there’s absolutely nothing to the qualms Vic and I have expressed, I’m sure we can work out some sort of wager.”

I’m game.  The bet is that Vokoun will post a save% above the league mean next season.  I’ll put up $100 to your $50, with the money going to charity of choice.  If there’s no skill being measured here, that’s a GREAT bet for you. OK?
(And I offer this knowing nothing at all about hockey—I’d never heard of Vokoun until reading this thread).


#84    Sunny Mehta      (see all posts) 2010/05/23 (Sun) @ 13:04

Guy,

I am saying that I’d bet against the numbers in Tango’s original post being actual indicators of true goalie talent and predictors of future results.

If you take his top 5 goalies, he says their average true talent is .928. How about we take those 5 goalies, and if their combined even strength save percentage on the road w/score tied next season exceeds .928, I’ll lay YOU odds. And fk this hundred dollar stuff, you have nothing to lose with that. I’ll lay $3k to your $2k, and we are each free to do what we want with the money (charity, beer, strippers, whatever).

Deal?

(I’d even be comfortable escrowing the money with Tango, if he’s okay with that.)


#85    Guy      (see all posts) 2010/05/23 (Sun) @ 16:40

Sunny:  As I’ve said, I don’t know anything about hockey.  So I can’t possibly judge how all your conditions (road, tied, even strength) affect the odds here.  Or the sample size implications.

But the fact you won’t take my bet tells me everything I need to know.....


#86          (see all posts) 2010/05/23 (Sun) @ 22:34

Tango,
In #80, were you referring to gambling lines showing a preference based on starting goalie? Given the lack of hockey betting activity I’d be surprised if this effect would be significant, although it seems like the data needed re:spreads would be so tough to collect as to make this type of analysis nearly impossible to undertake with any accuracy?

Is this list predictive to some extent? It seems like that must certainly be true. But I don’t see how we will ever really know whether the numbers are really “right” or not. It feels like the regression step doesn’t really add anything to the predictive ability of a goaltending model.

Could there be some way to include expected playing time going forward, much like your fan scouting report? It seems like much of the uncertainty around goalie performance could stem from shifts in usage from season to season? Any ideas on how this could be explored from here?

