THE BOOK--Playing The Percentages In Baseball


Thursday, August 24, 2006

True Talent Levels for Sports Leagues

By Tangotiger, 09:19 AM

I’m engaged in a discussion on True Talent Levels across sports leagues. The question on the table is how many games you need before the truly better team ends up with the better record. In that thread, I give a couple of useful equations.

I will reprint all my comments into this thread here (which may look somewhat bizarre without the accompanying context), but I encourage all to also follow the discussion there.


var(observed) = var(true) + var(random)

where var=variance

If we look at OBP, var(true) in MLB is around .030^2.

For any single PA, it’s either safe or out. With an OBP of around .340, that makes our var(random) = .34*.66 = .474^2.

var(observed) therefore will be .475^2.

Your regression toward the mean therefore will be over 99%.

That’s for one PA. But, there are 80 PAs per game, more or less (hitters and pitchers). The var(random) drops down to .053^2.
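As a rough sketch of the arithmetic above: the snippet below recomputes the random standard deviation and the regression toward the mean for 1 PA and for 80 PAs. The league OBP of .340 is an assumption, chosen only because it reproduces the .474 figure; the function names are made up for illustration.

```python
import math

SD_TRUE = 0.030     # spread of true OBP talent, from the text
LEAGUE_OBP = 0.340  # ASSUMPTION: sqrt(.34 * .66) is about .474

def sd_random(n_pa, p=LEAGUE_OBP):
    """Binomial standard deviation of an observed rate over n_pa trials."""
    return math.sqrt(p * (1 - p) / n_pa)

def regression_to_mean(n_pa, sd_true=SD_TRUE, p=LEAGUE_OBP):
    """Share of observed variance that is random: var(random) / var(observed)."""
    var_rand = sd_random(n_pa, p) ** 2
    return var_rand / (sd_true ** 2 + var_rand)

for n in (1, 80):
    print(n, round(sd_random(n), 3), round(regression_to_mean(n), 3))
```

With one PA the regression comes out above 99%, as stated; at 80 PAs the random SD falls to about .053.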

So, it all depends on the number of “trials”. In football, you probably have around 150 possessions? Basketball is what, 200? Hockey likely in the 100+ neighborhood? Tennis, 4 sets x 9 games x 6 to 8 points = 250?

The fewer the trials, and the closer var(true) is to zero, the more luck plays a role. My guess is that tennis has far fewer upsets simply because the number of trials is so high, and the spread in talent is so much wider.

===================

I don’t see how it can be close to 100%. It’s not like we always see players ranked #1 through #16 in every tournament.

In any case, the exact answer can be determined either empirically (we have enough data), or through the process I explained.

===================

Ah, 60% is huge! I guess the simple question is: if a guy who wins 60% of the points faces a guy who wins 40% of the points, how often will the first guy win more than 50% of the points, over 250 trials? I get 99.9%.

===================

If the probability were simply 51% to 49% for any single point, the better guy would win 62% of the time. If, let’s say, this was the Sampras/Agassi head-to-head record, it shows you how very close they are, and it’s only the setup in tennis that allows Sampras to stand out much more.
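Here is a minimal sketch of how the two head-to-head figures (60/40 and 51/49 over 250 points) can be checked with an exact binomial sum. It treats a match as 250 independent points and counts an exact 125-125 tie as half a win. Both are simplifying assumptions, not how tennis is actually scored, and the function name is made up.

```python
from math import comb

def p_better_player_wins(p_point, n_points=250):
    """Chance of winning more than half of n_points independent points.

    ASSUMPTION: an exact tie is counted as half a win.
    """
    q = 1 - p_point
    half = n_points / 2
    total = 0.0
    for k in range(n_points + 1):
        prob = comb(n_points, k) * p_point**k * q**(n_points - k)
        if k > half:
            total += prob
        elif k == half:
            total += 0.5 * prob
    return total

for p in (0.60, 0.51):
    print(p, round(p_better_player_wins(p), 3))
```

Under these assumptions, the results should land close to the 99.9% and 62% quoted above.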

===================

For tennis, this is likely the case with women. The spread in talent in women’s tennis is likely far wider than in men’s tennis. To ensure that the same women don’t always win, you need fewer games per match.

As for baseball, var(true) for a baseball team is about .060^2 (which can be calculated in many ways).

var(random) reaches .060^2 when the number of games played is 69. That is, after 69 games, the “r” is .50.

I don’t know what the var(true) for a football team is. I’m sure it’s quite a bit higher. Just taking a quick stab at it now, let’s say var(true) is .150^2 for football. To get var(random) down to .150^2, you need 11 games. That is, after 69 baseball games, you’ll know about as much about the true talent of teams as you would after 11 NFL games.
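The 69-game and 11-game figures follow from setting var(random) = .5*.5/n equal to var(true) and solving for n. A small sketch, with hypothetical function names, and with the .150 football figure being the quick guess from the text:

```python
def games_for_half_regression(sd_true, p=0.5):
    """Games needed for var(random) to equal var(true), i.e. r = .50."""
    return p * (1 - p) / sd_true ** 2

def r_after(games, sd_true, p=0.5):
    """r = var(true) / (var(true) + var(random)) after a number of games."""
    var_true = sd_true ** 2
    var_rand = p * (1 - p) / games
    return var_true / (var_true + var_rand)

print(round(games_for_half_regression(0.060)))  # baseball
print(round(games_for_half_regression(0.150)))  # football (guess from the text)
print(round(r_after(69, 0.060), 2))
```

The same r_after function can be used to see how quickly r climbs as the season gets longer.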

===================

Here is one way to figure out the var(true) for any league.

Step 1 - Take a sufficiently large number of teams (preferably all with the same number of games).

Step 2 - Figure out each team’s winning percentage.

Step 3 - Figure out the standard deviation of that winning percentage.

I just did it quick, and I took the last few years in the NFL, and the SD is .19, which makes var(observed) = .19^2

Step 4 - Figure out the random standard deviation. That’s easy: sqrt(.5*.5/16)

16 is the number of games for each team.

So, var(random) = .125^2

Solve for:
var(obs) = var(true) + var(rand)

var(true), in this case, is .143^2

Knowing that var(true) is .143^2, to get an “r” of .50, you need var(rand) to also be .143^2. For that to happen, the number of games played has to be 12. That is, sqrt(.5*.5/12) = .144.
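The four steps can be sketched in a few lines. This is only an illustration: the function names are made up, it assumes every team plays the same number of games, and it assumes a .500 average team when computing the random variance. The numbers plugged in are the NFL figures from the text (observed SD of .19 over 16 games).

```python
import math
from statistics import pstdev

def sd_true_from_observed(sd_observed, games, p=0.5):
    """Solve var(obs) = var(true) + var(rand) for the true-talent SD."""
    var_rand = p * (1 - p) / games
    var_true = sd_observed ** 2 - var_rand
    if var_true < 0:
        raise ValueError("observed spread is smaller than the random spread")
    return math.sqrt(var_true)

def sd_true_from_records(win_pcts, games):
    """Steps 1 to 4, starting from a list of team winning percentages."""
    return sd_true_from_observed(pstdev(win_pcts), games)

nfl_sd_true = sd_true_from_observed(0.19, 16)
games_for_r_half = 0.25 / nfl_sd_true ** 2

print(round(nfl_sd_true, 3))
print(round(games_for_r_half))
```

For another league, sd_true_from_records could be fed that league's actual team winning percentages and schedule length.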

In baseball, var(true) is .060^2.

I haven’t figured out what it is in NHL, or NBA, but perhaps someone wants to look at it?

(52) Comments • 2008/03/11 • Sabermetrics, Talent Distribution, Other Sports, Football, Hockey