THE BOOK--Playing The Percentages In Baseball

Wednesday, February 17, 2010

The long-awaited Injury Database

By Tangotiger, 02:41 PM

Last year, when Corey Dawkins unleashed his online injury database, I said this:

A few years ago, I went through all the DL transactions for a given year.  I created a pretty good model for an Injury Database.  I published the model, but then let it die there.  No one picked it up. Now, someone has decided to spend the time to do it.  Not only that, but he’s gone back several years, and given us a very nice interface as well.  His name is Corey.  Great job!

The next step was to get it in downloadable form.  Well, say hello to Josh:

The database uses the data model developed by Tom Tango, where each injury is broken out by body part injured, side and description. This allows for a pretty granular analysis. The work was hard, but I’m pleased to report my portion of it is finished (for now). I’m making the CSV file and the SQL dump available for download here. I’m hoping the community will find it useful enough to help me keep it updated.

All players are referenced using retroIDs.

There are only two restrictions on using the data, and folks can use it for commercial purposes if they choose. 1. You must post a link back to Rotobase. 2. You must make any additions to the database public in CSV or SQL dump form for others to use and enjoy.

I look forward to seeing what the saber community does with the info. For my part, I’ll be posting my analysis of some of the data in the next few days.

This is also an easy call for the BtB Sabermetric resource award nomination.  I just love being part of this community, this sabermetric-utopia.  Have at it boys. (zip file)


#1    Nick Steiner      (see all posts) 2010/02/17 (Wed) @ 15:39

Wow, this is amazing.  Jeff, did you know about this?


#2    Brad at Cubs Stats      (see all posts) 2010/02/17 (Wed) @ 16:11

Thanks Tango, Corey, Josh, and then Tango again! I’m looking forward to playing with this!


#3          (see all posts) 2010/02/17 (Wed) @ 16:18

I found out today.  The one I have been working on will have more details.  It will be nice to have as a reference.


#4    Josh      (see all posts) 2010/02/17 (Wed) @ 17:18

Thanks guys for the kind words. Hope you find it useful.

Jeff/3

David at Fangraphs has been working on one as well, so it seems a lot of folks were working on the same thing at the same time, duplicating efforts.

Seems a perfect opportunity for collaboration, but every time I reached out to folks I was told they had a profit motive, sunk costs etc.

For me this is just a hobby, and I needed the data. Hopefully now the community will add the extra detail and help keep it updated, but I’d say the odds are against it.

So if you are looking to keep yours proprietary, you’ll probably still have an audience.

Besides, there are no names in the csv and sql dumps. Someone needs to have a database to even use the info, and that rules a bunch of folks out.

Best,

Josh
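
(A minimal sketch of the point about names: since the dumps carry only retroIDs, a separate player table has to be joined in before the rows are readable. The column names, IDs, and player names below are invented placeholders, not the actual layout of the files.)

```python
import csv
import io

# Invented stand-ins: the real files' column names and IDs may differ.
injuries_csv = io.StringIO(
    "retro_id,body_part,side,description\n"
    "playa001,elbow,R,strain\n"
    "playb002,hamstring,L,strain\n"
)
players_csv = io.StringIO(
    "retro_id,name\n"
    "playa001,Player A\n"
)

# Build a retroID -> name lookup from the (hypothetical) player table.
names = {row["retro_id"]: row["name"] for row in csv.DictReader(players_csv)}

# Attach a name to every injury row; IDs missing from the lookup are flagged.
labeled = [
    {**row, "name": names.get(row["retro_id"], "(unknown)")}
    for row in csv.DictReader(injuries_csv)
]
```

Any player register keyed on retroID could presumably fill the lookup role; flagging unmatched IDs instead of dropping them keeps gaps visible.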


#5    Jeff Z      (see all posts) 2010/02/17 (Wed) @ 17:27

Josh—I am actually working with David on it along with a couple of other regulars.  We are doing a few more options and plan on keeping it updated throughout the year.  I am glad it is available.


#6    Joe Pawlikowski      (see all posts) 2010/02/17 (Wed) @ 17:30

The biggest additions we can make to this database:

1) Day-to-day injuries.
2) Injuries in spring training.
3) September injuries.

In other words, injuries that don’t result in a DL trip. That’s what I like about Corey’s tool.


#7    Josh      (see all posts) 2010/02/17 (Wed) @ 17:35

Joe

Totally agree. Corey’s is superior, and that’s why I reached out to him as well. But he’s in talks to license it to Bloomberg.

If anyone can generate a list of players that fit the categories you mention, I still have a few weeks and some resources I can devote to making it better. I just need a little help.


#8          (see all posts) 2010/02/17 (Wed) @ 18:51

Josh, this is very cool! Thanks. Can I just say that I love you for including the retroID? Is there any way you could include the eliasid in the future?

Thanks!

Josh


#9    Nick Steiner      (see all posts) 2010/02/17 (Wed) @ 18:55

Is eliasid the same thing as MLBAM?


#10    Tangotiger      (see all posts) 2010/02/17 (Wed) @ 19:04

MLBAM is not eliasID.  I know that others have used them interchangeably.  They are not the same thing at all.

MLBAM is used in MLB.com and Gameday.


#11    Rally      (see all posts) 2010/02/17 (Wed) @ 22:01

This is awesome.  What’s needed is a system that allows a collaborative effort to keep this up to date.  If a bunch of people take this and try to update by themselves, getting some injuries and missing others, it would be a fulltime job for someone to reconcile the various copies.


#12    Josh      (see all posts) 2010/02/17 (Wed) @ 22:23

Sean/11

It’s a great point - I have 2 ideas.

1. for players with non-DL stints from 02-09, all I need is a list of names that are missing. I will do the rest to make sure that the data is sound. This could be generated using a wiki or simply by posting in the comments section of a blog post.

2. Moving forward, it wouldn’t be too hard to code up a front end to enter 2010 data into the thing if there is real interest. It could check the date and player name and only allow additions that were unique.

Problem with 2 is data vandalism. There will always need to be someone involved to vet the data. Which is why, in the end, it will probably just be easiest to send additions to me (or whomever becomes the designated steward) and have them reconcile things.

Since this will likely happen at year end, I don’t think it would be a full-time job. Also, it seems David has the in-season stuff covered.
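
(A minimal sketch of the check described in idea 2: accept an addition only when its player and date pair has not been entered before. This is an illustration, not Josh's code; the function name, field names, and the ID `playa001` are hypothetical.)

```python
from datetime import date

def add_if_unique(records, seen, player_id, placed_on, details):
    """Append a record only if (player_id, placed_on) has not been seen yet."""
    key = (player_id, placed_on)
    if key in seen:
        return False  # duplicate submission, rejected
    seen.add(key)
    records.append({"player_id": player_id, "placed_on": placed_on, **details})
    return True

records, seen = [], set()
first = add_if_unique(records, seen, "playa001", date(2010, 4, 5),
                      {"body_part": "elbow", "side": "R"})
duplicate = add_if_unique(records, seen, "playa001", date(2010, 4, 5),
                          {"body_part": "elbow", "side": "R"})
```

A check like this only blocks exact repeats. It does nothing about wrong or malicious entries, so the vetting step described above would still be needed.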


#13          (see all posts) 2010/02/17 (Wed) @ 22:43

Meh, sorry. I meant Rally above.


#14    Rally      (see all posts) 2010/02/18 (Thu) @ 00:25

Actually, updates shouldn’t be too hard.  At least for official DL visits.  MLB.com’s transactions page lists players placed on the DL and activated from it.  I would just copy into Excel and search the text for “activated”/“placed”, and get rid of all the signings/trades.

I’m sure somebody using more efficient code than me could find a better way to do it, but as a fallback, this would be no big deal at all getting a list of players and dates.  The injury coding might be a bit trickier.
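
(A minimal sketch of the same filter in code, assuming the transactions have already been copied out as one line of text each. The sample lines are invented and do not reproduce the actual wording on MLB.com.)

```python
import re

# The two keywords mentioned above, matched as whole words, case-insensitively.
DL_PATTERN = re.compile(r"\b(placed|activated)\b", re.IGNORECASE)

def filter_dl_lines(lines):
    """Keep only lines that mention a placement or an activation."""
    return [line for line in lines if DL_PATTERN.search(line)]

# Invented sample text standing in for a pasted transactions page.
sample = [
    "Team A placed RHP Player One on the 15-day disabled list.",
    "Team B signed free agent OF Player Two.",
    "Team A activated RHP Player One from the 15-day disabled list.",
    "Team C traded INF Player Three to Team D.",
]
dl_lines = filter_dl_lines(sample)
```

A keyword match this loose would also catch lines such as "placed on waivers", so a second pass requiring "disabled list" may be worth adding.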


#15    Josh      (see all posts) 2010/02/18 (Thu) @ 00:51

That’s how I got my list. Works gud. ;)

Any idea where to get a list of day-to-day & offseason injuries, other than scouring the news aggregators and hoping to strike gold?


#16    Nick Steiner      (see all posts) 2010/02/18 (Thu) @ 03:04

Tango/10

In my Gameday database, players are referenced under their “eliasid”.  I’m using Mike Fast’s setup and all of his scripts. 

Okay, I now see that Elias has nothing to do with the actual ID’s, and it’s all just MLBAM.  How confusing!  And I found the Lahman to Gameday ID mapping that I was looking for. 

http://www.insidethebook.com/ee/index.php/site/comments/mapping_ids/#8

Thanks again Mike!


#17    Peter Jensen      (see all posts) 2010/02/18 (Thu) @ 03:25

Josh #11 - The Gameday files list players who had to leave the game due to an injury.  That might be a start.  The Rotowire reports at Fangraphs usually give pretty detailed information on injuries.  If you subscribe I think that you get the information in a downloadable format, but I am not certain of that.  If you know that an injury has occurred (from Gameday) the MLB team pages usually have an article that gives a good description of the injury and the player’s status.


#18    Brian Cartwright      (see all posts) 2010/02/18 (Thu) @ 05:12

Once a player has been placed on the DL, that status is listed in the players file at Gameday - but sometimes the operators get lazy and don’t list players on the DL. Instead of counting DL days from the player roster, I’ve queried for the dates when the status changes.
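
(A minimal sketch of the status-change idea, assuming one dated status value per player per day. The status strings and dates are invented, and the real Gameday player files may encode status differently.)

```python
from datetime import date

def status_changes(snapshots):
    """Return (date, old_status, new_status) for each change in a dated series.

    snapshots: iterable of (date, status) pairs for a single player.
    """
    changes = []
    previous = None
    for day, status in sorted(snapshots):
        if previous is not None and status != previous:
            changes.append((day, previous, status))
        previous = status
    return changes

# Invented daily statuses for one hypothetical player.
snapshots = [
    (date(2009, 5, 1), "active"),
    (date(2009, 5, 2), "active"),
    (date(2009, 5, 3), "DL"),
    (date(2009, 5, 4), "DL"),
    (date(2009, 5, 5), "DL"),
    (date(2009, 5, 6), "active"),
]
changes = status_changes(snapshots)

# Length of the stint: from the change into "DL" to the change back out.
stint_days = (changes[1][0] - changes[0][0]).days
```

Working from the two change dates gives the stint length directly, without depending on every daily roster in between listing the player.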

