
THE BOOK--Playing The Percentages In Baseball


Tuesday, January 16, 2007

Looking for Data?

By Tangotiger, 10:56 AM

Here’s a handy one-stop shop for all online data you can find.  Feel free to add more in the comments area, and I will update the main entry accordingly.


http://www.retrosheet.org/boxesetc/
http://www.retrosheet.org/boxesetc/Jpujoa0010.htm
Box scores, PBP data, players, umps, schedules, the mother lode

http://www.fangraphs.com/statss.aspx?playerid=1177&position=1B
Event data, including batted ball types, win probability, pitches thrown

http://mlb.mlb.com/NASApp/mlb/stats/historical/individual_stats_player.jsp?c_id=mlb&playerID=405395&section1=1&section2=1&statSet2=1&section3=1&statSet3=1&statSet1=2
Includes groundball outs, air outs, pitches thrown

http://sports.yahoo.com/mlb/players/6619/splits
Yahoo’s player pages where you can see current season, past season, career player splits.

http://www.baseballmusings.com/cgi-bin/PlayerInfo.py?PlayerID=1177
Day by Day player game log

http://baseball1.com/statistics/
The place to download the Lahman database

http://www.baseball-reference.com/
http://www.baseball-reference.com/p/pujolal01.shtml
The online historical baseball site.

http://www.thebaseballcube.com/
http://www.thebaseballcube.com/players/P/Albert-Pujols.shtml
Baseball encyclopedia that also provides minor league data for players

http://www.baseball-almanac.com/
An almanac of baseball

http://www.baseballprospectus.com/dt/pujolal01.php
Includes extra context-specific adjustments

http://www.minorleaguesplits.com/pl/425/425834DurILbbip06.html
Minor league data, including balls in play, and splits

http://firstinning.com/players/B.J.-Upton-665/
Minor league data, year-by-year register per player

***

Definitions, glossaries

http://www.hardballtimes.com/main/statpages/glossary/

http://www.baseballprospectus.com/glossary/index.php?context=all&category=true

SabermetricsData
#1    Patriot      (see all posts) 2007/01/16 (Tue) @ 11:24

This program doesn’t include any data that’s not available elsewhere, but it allows for some decent sorting/filtering for those who aren’t Access whizzes.  It also lists primary position with the offensive stats, which is proving quite useful to me right now.


#2    Tangotiger      (see all posts) 2007/01/16 (Tue) @ 11:46

Thanks, I’ve heard of a few people using that, but I’ve never tried.

Your position comment reminds me that I should do an add-on to the Lahman DB to include primary position.  I’ll try to get to that this month.


#3          (see all posts) 2007/01/16 (Tue) @ 14:44

Hardball Times (you’ve already included their glossary, but not their stats) has win shares and some other stats not found elsewhere:
http://www.hardballtimes.com/main/stats/

ESPN has more complete stats than yahoo (this may be redundant with the MLB link, but I find the MLB stats a real pain to navigate):
http://sports.espn.go.com/mlb/statistics

If you want to include salary information:
http://mlbcontracts.blogspot.com
http://mlb4u.com


#4    vro      (see all posts) 2007/01/17 (Wed) @ 15:42

pat - what program?


#5    tangotiger      (see all posts) 2007/01/17 (Wed) @ 15:50

Click Patriot’s name.


#6    Trader Joe      (see all posts) 2007/01/18 (Thu) @ 10:57

BP has yearly umpire stats:
http://www.baseballprospectus.com/statistics/sortable/index.php?cid=139814

Dan Fox has calculated baserunning stats (2006 only—using different method from James): http://danagonistes.blogspot.com/search/label/Baserunning

Fox also has a (free) Windows application for directional charts for BIP, described and linked here (may require BP sub to download): http://www.baseballprospectus.com/article.php?articleid=5764


#7    Rally      (see all posts) 2007/01/18 (Thu) @ 11:53

I downloaded Fox’s batted ball charts about a month ago and didn’t need a BP subscription.  It’s pretty cool.


#8    Rally      (see all posts) 2007/01/18 (Thu) @ 12:04

I have a link to his unfiltered post.  You can download the program from there.

http://lanaheimangelfan.blogspot.com/2006/12/batted-ball-charts.html


#9    Trader Joe      (see all posts) 2007/01/18 (Thu) @ 12:18

Thanks. Here’s a link to Fox’s BP Unfiltered post with his BIP program (to replace the link to BP premium):

http://www.baseballprospectus.com/unfiltered/?p=98


#10    salb918      (see all posts) 2007/01/18 (Thu) @ 12:31

Does anybody know where I could get a list of retrosheet player IDs?  I couldn’t find it easily on retrosheet, but I’ve been known to miss things.


#11    Rally      (see all posts) 2007/01/18 (Thu) @ 14:20

Doesn’t the master list of the Lahman database have that?  I can’t remember for sure, but they do list several player IDs.


#12    Tangotiger      (see all posts) 2007/01/18 (Thu) @ 14:43

Right, the MASTER table in the BDB or Lahman database has it.  Until Lahman is ready for distribution, you can use this:
http://www.insidethebook.com/ee/index.php/site/article/complete_baseball_database/

You can get the Retro Id in the .ros file in the 2006 event files.
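For anyone who wants to pull the IDs out of a .ros file programmatically, here is a minimal Python sketch. It assumes each line is comma-separated in the order ID, last name, first name, bats, throws, team, position; check that against the files you actually download. The field names, the `parse_roster` helper, and the sample rows are illustrative and not taken from a real file.

```python
import csv
import io

# Assumed column order for a .ros roster line; verify against your own files.
ROSTER_FIELDS = ["retro_id", "last", "first", "bats", "throws", "team", "position"]


def parse_roster(text):
    """Return a dict mapping player ID to a dict of the roster fields."""
    players = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(ROSTER_FIELDS, (field.strip() for field in row)))
        players[record["retro_id"]] = record
    return players


# Illustrative rows only, written by hand to show the assumed layout.
SAMPLE = (
    "pujoa001,Pujols,Albert,R,R,SLN,1B\n"
    "roles001,Rolen,Scott,R,R,SLN,3B\n"
)

if __name__ == "__main__":
    for retro_id, rec in parse_roster(SAMPLE).items():
        print(retro_id, rec["first"], rec["last"])
```

To use it on a real file, read the file's contents into a string and pass that to `parse_roster`.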


#13    salb918      (see all posts) 2007/01/18 (Thu) @ 18:08

Rally, Tango—thanks!


#14    vro      (see all posts) 2007/01/18 (Thu) @ 18:14

Tango - Other than MYSQL, What’s the best database to play with the data? Access? Is there anything you use that’s free?


#15    Tangotiger      (see all posts) 2007/01/18 (Thu) @ 18:29

If you are new, MS Access is the best one to use, bar none.  It has a fantastic user interface, and has a wonderful export/import facility, including with Excel.  And I hate Microsoft, so that says something.

Most other databases have a free version, including Oracle.  Even Oracle’s full version is free, for development-use only (i.e., non-production).
http://www.oracle.com/technology/products/database/oracle10g/index.html

At the bottom, you will see various flavors of Oracle.  On the right side, you’ll have downloads, including to the Express Edition, which is free to use, even in production.

Oracle also has SQL Developer for its GUI capabilities.

SQL Server is also fairly strong, and I think they came out with a free version as well.  It has a good GUI.

I’ve never really used MySQL.  SQLite seems to be good, and it has lots of fans, but I’ve never really used it either.

If you have fewer than 1 million records, and don’t anticipate doing too many joins of tables, I would stick with Access.  I processed the 99-02 data using Access for The Book.  I tried Oracle, but with all the off-the-cuff stuff I was doing, MS Access was far more practical.
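As a rough illustration of the kind of table join being discussed, here is a minimal sketch using the sqlite3 module that ships with Python, which is one free way to experiment. The table and column names are loosely modeled on the Lahman layout but should be treated as assumptions, and the players and numbers are made-up placeholders.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE master (playerID TEXT PRIMARY KEY, nameLast TEXT, retroID TEXT);
    CREATE TABLE batting (playerID TEXT, yearID INTEGER, HR INTEGER);
    """
)

# Placeholder rows, not real players or real statistics.
conn.executemany(
    "INSERT INTO master VALUES (?, ?, ?)",
    [("examplea01", "Alpha", "alpha001"), ("exampleb01", "Beta", "betab001")],
)
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [("examplea01", 2005, 10), ("examplea01", 2006, 5), ("exampleb01", 2006, 30)],
)


def career_hr(conn):
    """Join the two tables and total HR per player, highest total first."""
    query = """
        SELECT m.retroID, m.nameLast, SUM(b.HR) AS hr
        FROM master AS m
        JOIN batting AS b ON b.playerID = m.playerID
        GROUP BY m.playerID, m.retroID, m.nameLast
        ORDER BY hr DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for retro_id, name, hr in career_hr(conn):
        print(retro_id, name, hr)
```

The same SELECT should carry over to Access, MySQL, or Oracle with little or no change, since it uses only a plain join and GROUP BY.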


#16    Jim A      (see all posts) 2007/01/19 (Fri) @ 01:50

I agree that MS Access is best for database newbies.  But make sure you back up your data regularly, preferably to a non-MS format.  Access has had many well-documented issues with data loss and corruption (technically the problems lie in the underlying Jet engine, which is no longer maintained by MS).

I personally use MySQL.  The GUI tools are actually quite good these days.  Plus, as open source, there is an extensive user community available as a support network for troubleshooting.  PostgreSQL is another good open source database thought by many to be more reliable than MySQL.

I would be hesitant to use SQLServer or Oracle (even the free versions) unless you have DBA experience with them.  Too complex and too much of a black box for anything less than industrial-strength apps.


#17    tangotiger      (see all posts) 2007/01/19 (Fri) @ 08:05

I agree about backing up your Access data, but the same goes for your Oracle, MySQL, and even text data.  You’d be crazy not to back up your files on a weekly or daily basis.

Access makes it real easy to move your files from DB to text, and vice-versa.  Heck, I provided the shell script to load the BDB database, and it runs very quickly (under a minute I think, to load, I dunno… what half a million records...).  When I load flat files in Oracle, with my computer, I think it’s about 1 million records a minute.  So, the load/unload issue should be moot.
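The DB-to-text round trip described above can be sketched in a few lines of Python. This is not the shell script mentioned in the comment; it is a stand-in that uses SQLite through the standard sqlite3 and csv modules, and the helper names `export_table` and `import_table` and the `games` table are hypothetical. Note that the import side creates untyped columns, so every value comes back as text.

```python
import csv
import io
import sqlite3


def export_table(conn, table, out):
    """Write all rows of `table` to `out` as CSV, header first. Returns row count."""
    # The table name is interpolated directly, so only pass names you trust.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count


def import_table(conn, table, src):
    """Create `table` from a CSV stream with a header row. Returns row count."""
    reader = csv.reader(src)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    return len(rows)


def demo():
    """Round-trip a tiny placeholder table through CSV text."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE games (game_id TEXT, runs INTEGER)")
    src.executemany("INSERT INTO games VALUES (?, ?)", [("G1", 3), ("G2", 7)])
    buf = io.StringIO()
    export_table(src, "games", buf)
    buf.seek(0)
    dst = sqlite3.connect(":memory:")
    import_table(dst, "games", buf)
    return buf.getvalue(), dst.execute("SELECT * FROM games").fetchall()


if __name__ == "__main__":
    text, rows = demo()
    print(text)
    print(rows)
```

For a real backup you would pass an open file instead of the in-memory buffer and loop over every table in the database.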

I disagree about DBA experience required for Oracle.  You just need to be confident in what you are doing.  Oracle is not complex at all, but it is very powerful.

