THE BOOK--Playing The Percentages In Baseball

Monday, February 01, 2010

The Universal Player ID and Biographical Data project

All those who are extremely annoyed at all the databases, files, and systems out there that use their own player IDs, please raise your hand.  Yes, me too.  It bothers me to no end.

I propose the following: let’s make the MLBAM ID the universal ID, and let’s create a mapping table against that ID for every source we are interested in (Retro, BDB, STATS, BIS, etc.).  The reason for making the MLBAM ID the universal ID is that MLBAM has all the players we are interested in.  Indeed, at some point, I can see MLBAM being interested in college and high school players, which would be an easy way to expand their pool.  Japanese players?  Well, OK, you’ve got a point, but let’s consolidate everything else first.

I will soon post a file that has all the IDs, which I created in partnership with MLBAM.  They supplied me with their IDs and their maps to Retro and BDB.  I then corrected the Retro and BDB IDs that were wrong, and added my own mappings for STATS and BIS.  Ideally, you guys then validate the data.  After we get that done, the fun starts: validating biographical data.

While the BDB has bio data for some 17,000 players, there are over 80,000 players in the MLBAM file.  For all of you guys who want to help sabermetrics but are afraid or intimidated, this is the grunt work that needs to be done.  All I can say is: help me. 

UPDATE:
You will see something like this:
MLBAM_ID,retro_id,bdb_id,stats_id,bis_id,source_id
110015,abbop001,abbotpa01,4543,1061,292

That’s the MLBAM ID, Retrosheet ID, BDB ID, STATS ID, and BIS ID, followed by the source ID.

The “source id” is for me, as it tracks where I got my data from.  “292” just tells me the data started with “source 4”, and was then updated with “source 32” and “source 256” (4 + 32 + 256 = 292).  I cobbled together 15 different sources.
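
If you want to unpack the source ID yourself, here is a minimal sketch in Python. It assumes each source is numbered with a distinct power of two and that the source ID is simply their sum, which is what the 292 example suggests but is not spelled out anywhere. The function name is made up for illustration, and the result only says which sources contributed, not the order they were applied in.

```python
def decode_source_id(source_id: int) -> list[int]:
    """Split a combined source ID into its power-of-two components.

    Assumption: each source is a distinct power of two and the
    source ID is their sum (a bit mask).
    """
    components = []
    bit = 1
    while bit <= source_id:
        if source_id & bit:  # this source contributed to the row
            components.append(bit)
        bit <<= 1
    return components


print(decode_source_id(292))  # [4, 32, 256]
```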

Anyway, download it, link it with your own data sources, and report any problems.  This is version 0.1.  It is not complete.
EXPORT_ID_MAP.zip
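
For anyone unsure how to do the linking, here is a rough sketch in Python using only the standard library. It assumes the file is a plain CSV with the header row shown above. The sample text stands in for the real file, and the function name and the "my_rows" list are hypothetical placeholders for your own Retrosheet-keyed data.

```python
import csv
import io

# Stand-in for the downloaded file, using the layout shown above.
# In practice you would open the extracted CSV instead.
SAMPLE = """MLBAM_ID,retro_id,bdb_id,stats_id,bis_id,source_id
110015,abbop001,abbotpa01,4543,1061,292
"""


def build_lookup(csv_file, key_column):
    """Map each non-blank value in key_column to its MLBAM_ID."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        key = row[key_column].strip()
        if key:  # skip players with no ID in this source
            lookup[key] = row["MLBAM_ID"].strip()
    return lookup


retro_to_mlbam = build_lookup(io.StringIO(SAMPLE), "retro_id")

# Hypothetical rows from your own data, keyed by Retrosheet ID.
my_rows = [{"retro_id": "abbop001"}, {"retro_id": "zzzzz999"}]

unmatched = []
for row in my_rows:
    row["mlbam_id"] = retro_to_mlbam.get(row["retro_id"])
    if row["mlbam_id"] is None:
        unmatched.append(row["retro_id"])  # candidates to report as problems

print(my_rows)
print(unmatched)
```

The unmatched list is the useful part for validation: any ID of yours that finds no MLBAM ID is either missing from the map or wrong on one side.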


(27) Comments • 2011/10/04 • SabermetricsData