THE BOOK--Playing The Percentages In Baseball

Thursday, October 22, 2009

Why I hate regression “techniques”

By Tangotiger, 12:04 PM

A reader referred to a popular Moneyball paper by Sauer and Hakes. Boy, that paper makes my blood boil. Let's look at Table 2, column 2001. This is the equation:
ln(Salary) = 3.1*SLG - 0.13*OBP + 0.003*PA + 1.1*ArbEligible + 1.68*FreeAgent + 0.07*IsCatcherOrInfielder + 10.3

This is based on 357 players (min 130 PA in year 2000).
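The equation is easy to evaluate directly, which makes the salary figures worked out below easy to check. Here is a minimal Python sketch that plugs numbers into the 2001 equation exactly as quoted. The function and argument names are my own, not the paper's. It assumes the three yes/no variables are 0/1 indicators and that the predicted salary is simply exp(ln(Salary)), with no retransformation correction.

```python
import math

# Coefficients as quoted from Table 2 (2001 column) of the paper.
COEF = {
    "SLG": 3.1,
    "OBP": -0.13,
    "PA": 0.003,
    "ArbEligible": 1.1,
    "FreeAgent": 1.68,
    "IsCatcherOrInfielder": 0.07,
}
INTERCEPT = 10.3


def predicted_salary(slg, obp, pa, arb_eligible=False, free_agent=False,
                     catcher_or_infielder=False):
    """Hypothetical helper: the model's predicted salary in dollars,
    computed as exp of the fitted ln(Salary)."""
    log_salary = (
        INTERCEPT
        + COEF["SLG"] * slg
        + COEF["OBP"] * obp
        + COEF["PA"] * pa
        + COEF["ArbEligible"] * int(arb_eligible)
        + COEF["FreeAgent"] * int(free_agent)
        + COEF["IsCatcherOrInfielder"] * int(catcher_or_infielder)
    )
    return math.exp(log_salary)


# A .400 SLG, .330 OBP, 400 PA free-agent outfielder.
print(f"${predicted_salary(0.400, 0.330, 400, free_agent=True) / 1e6:.2f}MM")
```

Running it prints `$1.75MM`. Other hitters can be plugged in by changing the arguments.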

And here is their money quote (no pun intended):

The relative valuation of on base and slugging percentage is abruptly reversed for the year 2004, despite the inertia produced by long-term contracts. The returns to slugging are similar in 2004 to prior years, but this is the first year for which the ability to reach base is statistically significant. The labor market in 2004 appears to have substantially corrected the apparent inefficiency in prior years, as the coefficient of on base percentage jumps to 3.68, and the ratio of the monetary returns to reaching base and slugging is very close to the ratio of the statistics’ contributions to team win percentage.

Now, just because a regression equation (likely a poorly constructed one) shows that OBP had virtually zero impact on salary in 2001 doesn't mean the equation is valid. It's mathematical gymnastics that makes no sense.

Let's start with something basic: a .400 SLG, .330 OBP, 400 PA outfielder who is a free agent (not arbitration-eligible). How much do you think this guy should be getting in 2001? According to their model, $1.75MM. Sounds about right for that time, I guess. What if this guy had a .280 OBP? $1.76MM. So, according to this model of 2001, OBP is basically irrelevant, and a lower OBP even nudges the salary up. Maybe that's true, so let's keep going.
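Why is OBP so irrelevant here? The model is linear in ln(Salary), so two players who differ only in OBP have a salary ratio of exp(b_OBP × ΔOBP), whatever their other stats are. This small sketch computes that multiplier. The helper name is hypothetical, and the only input taken from the paper is the quoted 2001 OBP coefficient.

```python
import math

B_OBP = -0.13  # 2001 OBP coefficient, as quoted from the paper's Table 2


def obp_salary_ratio(obp_new, obp_old, b_obp=B_OBP):
    """Salary multiplier the log-linear model implies for changing OBP alone."""
    return math.exp(b_obp * (obp_new - obp_old))


# Dropping from .330 to .280 OBP raises predicted pay by about 0.65%.
print(f"{obp_salary_ratio(0.280, 0.330):.4f}")
# Going from a terrible .250 OBP to an elite .450 cuts predicted pay by about 2.6%.
print(f"{obp_salary_ratio(0.450, 0.250):.4f}")
```

Across the whole realistic range of OBP, the predicted salary moves by only a few percent, and it moves in the wrong direction.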

Now, let's give this guy 600 PA. He's at $3.2MM. Sounds OK, too. Give him a .250 SLG, a .250 OBP, and 600 PA: $2MM!! Does that make any sense whatsoever? No! This is the kind of b.s. mumbo-jumbo, mathematical-gymnastics regression "techniques" that drive me nuts. Nuts! Indeed, according to this model, a free-agent outfielder with 600 PA in 2000 who hit like a pitcher (say, .180 SLG and .160 OBP) would make $1.65MM.
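This sketch shows how the model lets playing time pay for bad hitting. It finds the number of extra PA that exactly offsets a drop in SLG and OBP, using only the three 2001 coefficients as quoted. The helper name is hypothetical.

```python
B_SLG = 3.1    # 2001 SLG coefficient, as quoted
B_OBP = -0.13  # 2001 OBP coefficient, as quoted
B_PA = 0.003   # 2001 PA coefficient, as quoted


def breakeven_pa(slg_from, obp_from, slg_to, obp_to):
    """Extra PA the model needs to keep ln(Salary) unchanged when a hitter's
    SLG/OBP change from (slg_from, obp_from) to (slg_to, obp_to)."""
    delta_log_salary = B_SLG * (slg_to - slg_from) + B_OBP * (obp_to - obp_from)
    return -delta_log_salary / B_PA


# Average hitter (.400 SLG / .330 OBP) turned into a pitcher-hitter (.180 / .160):
print(round(breakeven_pa(0.400, 0.330, 0.180, 0.160)))
```

The answer is about 220 PA. That is why the 600-PA pitcher-hitter ($1.65MM) lands only a little below the 400-PA average hitter ($1.75MM). PA enters as its own additive term in ln(Salary), with no interaction with how well the player hits, so in this model enough playing time buys back almost any amount of bad hitting.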

Just because you can throw everything into a regression model doesn't mean that you should. You "hope" that it all works out, that all the relevant parameters rise to the surface, and that they all work nicely independently of each other. But they don't. And this is a prime example of what NOT to do. I have no doubt that if I went through each year's equation I would find similarly absurd results.

As I’ve said (and shown) in the past, it is extremely easy to value players in terms of runs, wins, and dollars. 
