THE BOOK--Playing The Percentages In Baseball


Monday, April 19, 2010

Correlation coefficient in football

By Tangotiger, 07:09 AM

Brian Burke does find a strong relationship between QB performance and draft spot, even with the survivor bias:

Apr 18, 2010
Are Top Draft Pick QBs Any Better Than Late Round Picks?

Is quarterback performance related to where a passer is taken in the draft? It may seem like a silly question, but the answer is more complicated (and controversial) than it first seems. What if I said that the correlation between a QB’s draft rank and his career Adjusted Yards Per Attempt (AYPA) is only -0.07? You’d think that’s amazingly small. (The correlation coefficient would be -1 if performance fell along a perfectly straight line as draft order got later, and it would be 0 if there were no linear relationship at all.)
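To make those two reference points concrete, here is a minimal Python sketch of the standard Pearson correlation coefficient. The helper name `pearson_r` and the draft-slot and AYPA numbers are hypothetical toy values, not Burke’s data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data (hypothetical): draft slot vs. career AYPA.
slots = [1, 2, 3, 4, 5]
straight_line = [8.0, 7.5, 7.0, 6.5, 6.0]  # AYPA drops 0.5 per slot: r = -1
no_pattern = [6.0, 7.0, 6.5, 7.0, 6.0]     # rises then falls, no linear trend: r = 0

print(round(pearson_r(slots, straight_line), 6))
print(round(pearson_r(slots, no_pattern), 6))
```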

What if I told you the correlation coefficient is -0.72? That’s more like it, you’d think. But which correlation coefficient is correct? They both are.

Difficulties in the Analysis

There are several wrinkles to the question of whether draft position is related to QB performance. The first deals with opportunity. Top picks are offered more opportunities to play than later picks, for two reasons: first, the top picks may be better players; and second, teams have a larger sunk investment in them. It’s impossible to separate and measure the two causes.

To partially address the fact that top picks tend to get more starts and more pass attempts, we should use per-play “rate” stats as a measure of performance. Using aggregate “total” stats would favor the top picks. Unfortunately, there are many QBs who never played a down or who attempted so few passes that their stats are very erratic due to their small sample size.

Typically, to avoid the low sample size problem caused by these players, we use a cutoff of a minimum number of pass attempts, say 100 or 200 career attempts. By using a cutoff, however, we create a second major problem—selection bias. The draft itself is fundamentally a process of selection, so selection bias is going to be a large issue.

If we only consider those players who are good enough in their coaches’ eyes to play, we’re seeing an unrepresentative sample of QBs. Assuming coaches can judge talent at all, we’d see only the players above a certain threshold of ability. Diamonds in the rough like Tom Brady will skew the analysis, while the numerous 6th-rounders who never played aren’t considered.

My initial solution was to assign an arbitrary level of performance to QBs whose total attempts were below the qualifying total. I chose the 5th percentile performance, figuring they weren’t good enough to play much in real games, but it would be unfair to say they would all be as bad as the worst player in the group.
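A sketch of that imputation step in Python, under stated assumptions: the 100-attempt cutoff, the field names, and the sample careers are all hypothetical, and the article does not say how its 5th percentile was interpolated, so this uses the standard library’s `statistics.quantiles` with the “inclusive” method as one reasonable choice.

```python
import statistics

def impute_non_qualifiers(careers, min_attempts=100, pct=5):
    """Return one AYPA per career. Qualifiers keep their own AYPA; everyone
    below the attempt cutoff gets the pct-th percentile of the qualifiers."""
    qualified = [c["aypa"] for c in careers if c["attempts"] >= min_attempts]
    # quantiles(n=100) returns the 99 percentile cut points, so index pct-1
    # is the pct-th percentile; "inclusive" interpolates between observed values.
    floor = statistics.quantiles(qualified, n=100, method="inclusive")[pct - 1]
    return [c["aypa"] if c["attempts"] >= min_attempts else floor
            for c in careers]

# Hypothetical careers; None marks a QB who never threw a pass.
careers = [
    {"name": "QB A", "attempts": 3200, "aypa": 7.0},
    {"name": "QB B", "attempts": 1500, "aypa": 6.0},
    {"name": "QB C", "attempts": 800, "aypa": 5.0},
    {"name": "QB D", "attempts": 400, "aypa": 8.0},
    {"name": "QB E", "attempts": 250, "aypa": 4.0},
    {"name": "QB F", "attempts": 12, "aypa": 11.5},  # tiny, erratic sample
    {"name": "QB G", "attempts": 0, "aypa": None},
]
print(impute_non_qualifiers(careers))
```

QB F’s gaudy 11.5 on 12 attempts is replaced by the same floor value as QB G, which is exactly what the cutoff is meant to do with small, erratic samples.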

Dave Berri, whose research sparked the debate about whether top picks are any better than later picks, convinced me that my approach may penalize non-qualifying QBs too much. Still, we shouldn’t look only at QBs good enough in their coaches’ eyes to play, and we can’t know how well the guys who didn’t get to play would have done given a fair opportunity. I suppose the real question is how to treat low-attempt QB careers. (More on this in a subsequent article.)

An alternative way to include the performance of all QB draft picks without suffering the erratic effects of low sample sizes is to aggregate the passes of all QBs at a certain position in the draft. In other words, we treat all 1st-round QBs as a single case, all 2nd-round QBs as a single case, and so on. We could repeat the same analysis with position order instead of draft rounds as our “bins”: first QB taken, second QB taken, and so on.

The problem with this approach is that QBs with longer careers, who tend to be the better QBs, will bias the numbers because their many attempts carry more weight in the pooled average. But I’ll accept this “survivor” bias for now because it favors the later-round diamonds in the rough. If we can still detect a significant relationship between draft position and performance despite this bias, we can be reasonably confident that better QBs really are drafted earlier.
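The pooled, bin-as-a-single-case approach can be sketched in Python as below. It is a minimal illustration, not Burke’s code: the career records are hypothetical, and the 45-yard interception penalty is taken from the AYPA definition used in this article.

```python
from collections import defaultdict

INT_PENALTY = 45  # yards charged per interception in the article's AYPA

def pooled_aypa_by_bin(careers, bin_of):
    """Treat every QB in a bin as one combined passer: sum yards, interceptions
    and attempts across the bin, then compute a single AYPA for the bin."""
    totals = defaultdict(lambda: [0, 0, 0])  # bin -> [yards, ints, attempts]
    for c in careers:
        t = totals[bin_of(c)]
        t[0] += c["yards"]
        t[1] += c["ints"]
        t[2] += c["attempts"]
    # Bins whose QBs never threw a pass have no defined AYPA and are skipped.
    return {b: (yds - INT_PENALTY * ints) / att
            for b, (yds, ints, att) in sorted(totals.items()) if att > 0}

# Hypothetical careers, binned by draft round.
careers = [
    {"round": 1, "attempts": 4000, "yards": 28000, "ints": 80},  # long career
    {"round": 1, "attempts": 500, "yards": 3500, "ints": 20},    # short career
    {"round": 2, "attempts": 300, "yards": 1800, "ints": 12},
    {"round": 2, "attempts": 0, "yards": 0, "ints": 0},          # never played
    {"round": 3, "attempts": 0, "yards": 0, "ints": 0},
]
print(pooled_aypa_by_bin(careers, lambda c: c["round"]))
```

The round-1 pooled figure of 6.0 sits much closer to the long-career QB’s own 6.1 than to the short-career QB’s 5.2 (a simple average of the two would be 5.65). That is the survivor weighting described above.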

Measuring the Correlation

As the measure of performance, I’ll use Adjusted Yards Per Attempt (AYPA), which is yards per pass attempt with a 45-yard penalty for each interception. The correlation coefficient between overall draft order and AYPA is -0.07. But we get very different answers when we group picks together.
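The AYPA formula, written out as a small Python function (the function name and the sample numbers are illustrative only):

```python
def aypa(yards, attempts, interceptions, int_penalty=45):
    """Adjusted Yards Per Attempt as described in the article: passing yards,
    minus a 45-yard penalty per interception, divided by pass attempts."""
    if attempts <= 0:
        raise ValueError("AYPA is undefined with zero pass attempts")
    return (yards - int_penalty * interceptions) / attempts

# Hypothetical season: 4,200 yards, 600 attempts, 20 interceptions.
print(aypa(4200, 600, 20))  # (4200 - 900) / 600 = 5.5, vs. a raw 7.0 yards per attempt
```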

The first graph plots AYPA by overall draft order, grouped by bins of 10. Keep in mind there is still the matter of survivorship bias due to the diamond-in-the-rough effect.

The correlation coefficient becomes -0.22, which is relatively weak but still stronger than the original -0.07.

But now let’s look at AYPA by draft round, which is essentially the same as grouping picks in bins of about 30.

The correlation coefficient is now -0.38, which is considerably stronger, especially given that the diamond-in-the-rough bias works against finding any relationship.

Let’s look at the relationship a third way. The next graph plots AYPA by position-order, that is, the 1st QB taken, the 2nd QB taken, and so on.
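Position order has to be derived from the draft itself: within each draft year, QBs are ranked by overall pick. A minimal Python sketch, assuming a hypothetical input of `(draft_year, overall_pick, name)` tuples for QBs only (the names and picks below are made up):

```python
from collections import defaultdict

def qb_position_order(picks):
    """picks: (draft_year, overall_pick, name) for quarterbacks only.
    Returns {(draft_year, name): k}, where k = 1 for the first QB taken
    in that year's draft, 2 for the second, and so on."""
    by_year = defaultdict(list)
    for year, overall, name in picks:
        by_year[year].append((overall, name))
    order = {}
    for year, class_picks in by_year.items():
        # Sorting by overall pick puts each class in the order QBs came off the board.
        for k, (_, name) in enumerate(sorted(class_picks), start=1):
            order[(year, name)] = k
    return order

# Hypothetical draft classes.
picks = [(2005, 85, "QB C"), (2005, 1, "QB A"), (2005, 24, "QB B"),
         (2006, 3, "QB D"), (2006, 11, "QB E")]
print(qb_position_order(picks))
```

The resulting ranks can stand in for draft round as the bin key when pooling passes.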

The correlation coefficient now becomes -0.72, which suggests draft order really can predict QB performance. Even if we lop off the 13th QB taken from the analysis (there are relatively few of them), we’d still get a correlation of -0.64.

Correlation Coefficients Can Be Deceiving

Why do we get such different answers to the question of how well draft order correlates with performance? The reason lies in how correlation coefficients are calculated. Correlation coefficients measure how much variance two variables share (strictly, the squared coefficient is the proportion of shared variance). Variance is an abstract concept, defined as the average of the squared differences between each actual value and the mean. The observed variance of any measured stat always has at least two components:

var(observed) = var(true) + var(random error)
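A quick simulation makes the decomposition concrete. This Python sketch assumes a hypothetical model in which each observed AYPA is a true skill level plus independent, normally distributed sampling error; the means and spreads are invented for illustration. The two components add up this way only because the error is independent of true skill.

```python
import random

def variance(xs):
    """Population variance: the average squared difference from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(42)
n = 20_000
# Hypothetical model: true AYPA skill plus independent random sampling error.
true_skill = [random.gauss(6.0, 0.5) for _ in range(n)]
error = [random.gauss(0.0, 1.0) for _ in range(n)]
observed = [t + e for t, e in zip(true_skill, error)]

print(f"var(observed)          = {variance(observed):.3f}")
print(f"var(true) + var(error) = {variance(true_skill) + variance(error):.3f}")
```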

The larger we make each “bin” of comparison (the groupings of the various cases), the more the random sample error is averaged out, because sample error shrinks as samples grow. Consequently, a larger share of the observed variance is “true” variance rather than sample error. By definition, the random error component of one variable’s variance will not correlate with that of another variable, so it can only dilute the correlation. The result is that correlation coefficients will be larger in magnitude when we group cases into larger bins, and smaller when we group cases into very small bins.
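The binning effect can be reproduced with simulated data. In this Python sketch (a hypothetical model, not Burke’s data), true AYPA declines steadily with draft slot, each career adds large random noise, and we correlate bin averages for progressively wider bins. The slope, noise level, and counts are assumptions chosen only to illustrate the mechanism.

```python
import math
import random
from collections import defaultdict

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def binned_r(slots, values, width):
    """Average slot and value within bins of `width` consecutive draft slots,
    then correlate the bin averages."""
    bins = defaultdict(list)
    for s, v in zip(slots, values):
        bins[(s - 1) // width].append((s, v))
    xs = [sum(s for s, _ in b) / len(b) for b in bins.values()]
    ys = [sum(v for _, v in b) / len(b) for b in bins.values()]
    return pearson_r(xs, ys)

random.seed(2010)
# Hypothetical model: 8 QB careers at each of 250 draft slots.
slots = [s for s in range(1, 251) for _ in range(8)]
true_aypa = [7.0 - 0.004 * s for s in slots]                # true skill falls 1 yard over the draft
observed = [t + random.gauss(0.0, 1.5) for t in true_aypa]  # plus large sampling noise

r_raw = pearson_r(slots, observed)
r_10 = binned_r(slots, observed, 10)
r_50 = binned_r(slots, observed, 50)
print(f"raw {r_raw:.2f}, bins of 10 {r_10:.2f}, bins of 50 {r_50:.2f}")
```

With these settings the raw correlation comes out weak while the binned ones are much stronger, even though the underlying relationship never changes; only the amount of noise left after averaging does.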

In fact, theoretically, if we used only 2 very large groups of cases, say the top 3 rounds and the bottom 3 rounds, we would get a correlation coefficient of exactly -1.0 as long as the earlier group had the higher AYPA, because any two points lie on a straight line. Correlation coefficients can be deceiving. If you took the -0.07 correlation at face value, you might wonder if teams would be better off signing a couple of late-round QBs for the rookie-minimum salary than mortgaging the farm to sign a top prospect.
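The two-group case is easy to check. With only two points, any nonzero difference produces a perfect correlation, as this Python sketch with hypothetical bin averages shows:

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical two-bin summary: (mean overall pick, pooled AYPA).
# Rounds 1-3 average a hair better than rounds 4-6.
mean_pick = [48.5, 144.5]
aypa = [6.02, 6.00]
print(round(pearson_r(mean_pick, aypa), 9))          # a 0.02-yard gap still gives -1
print(round(pearson_r(mean_pick, [6.00, 6.02]), 9))  # flip the gap and it becomes +1
```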
