THE BOOK--Playing The Percentages In Baseball


Monday, February 01, 2010

The Universal Player ID and Biographical Data project

By Tangotiger, 02:57 PM

All those who are extremely annoyed at all the databases, files, and systems out there that use their own player IDs, please raise your hand.  Yes, me too.  It bothers me to no end.

I will propose the following: let’s make the MLBAM ID the universal ID, and let’s create a mapping table against that ID for every source we are interested in (Retro, BDB, STATS, BIS, etc.).  The reason for making the MLBAM ID the universal ID is that MLBAM has all the players we are interested in.  Indeed, at some point, I can see MLBAM being interested in college and high school players, which would be an easy way to expand their pool.  Japanese players?  Well, ok, you got a point, but let’s consolidate everything else first.

I will soon post a file, created in partnership with MLBAM, that has all the IDs.  They supplied me with their IDs, and their maps to Retro and BDB.  I then corrected the Retro and BDB IDs that were wrong, and added my own mappings for STATS and BIS.  Ideally, you guys then validate the data.  After we get that done, the fun starts: validating biographical data.

While the BDB has bio data for some 17,000 players, there are over 80,000 players in the MLBAM file.  For all of you guys who want to help sabermetrics but are afraid or intimidated, this is the grunt work that needs to be done.  All I can say is: help me. 

UPDATE:
You will see something like this:
MLBAM_ID,retro_id,bdb_id,stats_id,bis_id,source_id
110015,abbop001,abbotpa01,4543,1061,292

Those are the MLBAM ID, Retrosheet ID, BDB ID, STATS ID, and BIS ID.

The “source id” is for me, as it tracks where I got my data from.  “292” tells me the data started with “source 4” and was then updated with “source 32” and “source 256” (4 + 32 + 256 = 292).  I have 15 different sources that I cobbled together.
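
If you want to see which sources went into a given row, here is a minimal sketch in Python.  It assumes the “source id” is a plain sum of power-of-two flags, which is what the 292 example suggests but which is not spelled out anywhere; the function name is made up for illustration.

```python
def decode_source_id(source_id: int) -> list[int]:
    """Split a source id into its power-of-two flags.

    Assumption: each source is a distinct power of two and the
    source id is their sum, as in 292 = 4 + 32 + 256.
    """
    flags = []
    bit = 1
    while bit <= source_id:
        if source_id & bit:  # this source contributed to the row
            flags.append(bit)
        bit <<= 1
    return flags


print(decode_source_id(292))  # [4, 32, 256]
```

Note that the sum only says which sources were involved, not the order in which they were applied.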

Anyway, download it, link it with your own data sources, and report any problems.  This is version 0.1.  It is not complete.
EXPORT_ID_MAP.zip
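
As a rough sketch of the linking step, the Python below indexes the mapping rows by one of the ID columns and translates a Retrosheet ID to an MLBAM ID.  It assumes the file is comma-separated with the header row shown above; the sample text is just that header plus the one example row, and the helper names are made up for illustration.

```python
import csv
import io

# Header and example row copied from the post; a real run would
# open the extracted file instead of this inline sample.
SAMPLE = """MLBAM_ID,retro_id,bdb_id,stats_id,bis_id,source_id
110015,abbop001,abbotpa01,4543,1061,292
"""


def load_id_map(handle, key="retro_id"):
    """Index mapping rows by one ID column, skipping rows where it is blank."""
    index = {}
    for row in csv.DictReader(handle):
        value = (row.get(key) or "").strip()
        if value:
            index[value] = row
    return index


def to_mlbam(index, foreign_id):
    """Return the MLBAM ID for a foreign ID, or None if it is not mapped."""
    row = index.get(foreign_id)
    return row["MLBAM_ID"] if row else None


by_retro = load_id_map(io.StringIO(SAMPLE), key="retro_id")
print(to_mlbam(by_retro, "abbop001"))   # 110015
print(to_mlbam(by_retro, "nobody001"))  # None
```

Any ID from your own data that comes back as None is a candidate for the problem reports asked for above.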
