THE BOOK--Playing The Percentages In Baseball

Friday, January 29, 2010

Clustering exposes bias

By Tangotiger, 12:18 PM

This is a good example of something you can start with, but you can’t end there. Steve decided to take SOME traits of a hitter’s profile and create clusters based on those traits. For example:

I found it mildly amusing that if you only look at batted ball types (excluding HR/FB) that Yuniesky Betancourt and Albert Pujols fall in the same cluster.

This makes it abundantly clear that the traits you choose will drive your results.  He gives another example:

The set of clusters I decided to focus on were the ones based on LD, HR/FB, BB. Here are the cluster centers for it, along with the average wOBAs of each cluster.... And here are a couple guys that stand out by having a low wOBA relative to their cluster (potential for improvement maybe?)… this all boils down to an interesting thought experiment

The bold part is the only thing I would strike from his article.  After all, he just finished telling us that because of the limited number of traits in one experiment, he had Pujols and YuBet in the same cluster.  He took a better set of traits, but still, that doesn’t mean there’s no bias there.  Whatever it is he left out, that’s what will drive the differences.
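
To make the point about trait selection concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the four hitters A through D and their rates are invented, not real player data, and the bare-bones k-means is only a stand-in for whatever clustering method Steve actually used. It clusters the same hitters twice, once on batted-ball traits only and once on LD, HR/FB and BB.

```python
from statistics import mean, pstdev

# Invented hitters for illustration only; none of these rates are real data.
HITTERS = {
    "A": {"ld": 0.20, "gb": 0.40, "hr_fb": 0.20, "bb": 0.15},
    "B": {"ld": 0.20, "gb": 0.41, "hr_fb": 0.05, "bb": 0.04},
    "C": {"ld": 0.16, "gb": 0.55, "hr_fb": 0.19, "bb": 0.14},
    "D": {"ld": 0.16, "gb": 0.54, "hr_fb": 0.06, "bb": 0.05},
}


def standardize(rows):
    """Convert each column to z-scores so no trait dominates by scale."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain k-means, seeded with the first k points so the run is repeatable."""
    centers = list(points[:k])
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(col) for col in zip(*members))
    return labels


def cluster_by(traits, k=2):
    """Cluster the hitters using only the named traits."""
    names = list(HITTERS)
    rows = [tuple(HITTERS[n][t] for t in traits) for n in names]
    return dict(zip(names, kmeans(standardize(rows), k)))


batted = cluster_by(["ld", "gb"])           # batted-ball traits only
mixed = cluster_by(["ld", "hr_fb", "bb"])   # the LD, HR/FB, BB set

print("A and B together on batted-ball traits:", batted["A"] == batted["B"])
print("A and B together on LD, HR/FB, BB:", mixed["A"] == mixed["B"])
```

With these invented numbers, A and B share a cluster on the batted-ball traits and split apart once HR/FB and BB are included. The grouping is a product of the trait list as much as of the hitters.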

Ideally, you would look at ALL the parameters so that you leave nothing out and therefore have no bias. The one exception is when the actual identity of the player is what’s special (say there’s nothing in Ichiro’s or Jeter’s hitting profile that could do them justice).

And this is exactly how a forecasting system works: you identify the traits, you figure out the relationship of those traits to the hitting stats, and you come out with a forecast.  You will find, after hundreds of hours on this, that:
1. You spent hundreds of hours on this
2. The difference between the point you are now at, and the point you started at, can be measured in inches
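
The three steps above (identify the traits, relate them to the hitting stats, produce a forecast) can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real forecasting system works: the trait rows are invented, the wOBA values are generated from a made-up linear rule (`MADE_UP_RULE`) so the fit can be checked, and ordinary least squares stands in for whatever relationship a real system would estimate.

```python
# Hypothetical weights: intercept, then LD, HR/FB and BB. Not real estimates.
MADE_UP_RULE = (0.200, 0.4, 0.5, 0.3)

# Step 1: identify the traits. Invented (LD, HR/FB, BB) rows, not real players.
TRAITS = [
    (0.20, 0.20, 0.15),
    (0.20, 0.05, 0.04),
    (0.16, 0.19, 0.06),
    (0.16, 0.06, 0.12),
    (0.18, 0.10, 0.08),
]


def forecast(coefs, traits):
    """Apply intercept + weighted traits to get a projected wOBA."""
    return coefs[0] + sum(c * t for c, t in zip(coefs[1:], traits))


def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Synthetic "observed" wOBA, generated from the made-up rule above.
WOBA = [forecast(MADE_UP_RULE, t) for t in TRAITS]

# Step 2: figure out the relationship of the traits to the hitting stat.
coefs = fit_linear(TRAITS, WOBA)

# Step 3: come out with a forecast for a new (also invented) trait profile.
new_hitter = (0.19, 0.12, 0.10)
print(round(forecast(coefs, new_hitter), 3))
```

Because the synthetic wOBA values follow the made-up rule exactly, the fit simply recovers that rule. Real data is noisy, and the quality of the forecast depends on which traits went into `TRAITS` in the first place.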
