Friday, November 30, 2007
Google Earth meets Retrosheet
By Tangotiger, 03:25 PM
Philip (Flip) Kromer writes:
Hello,
I needed a file that had geolocations for each park, and separately
wanted to match the BDB team info against the gamelogs database. I've
taken the retrosheet park info from
http://retrosheet.org/boxesetc/MISC/PKDIR.htm , the old parkcode.txt
http://www.retrosheet.org/parkcode.txt info, these Google Earth files:
http://bbs.keyhole.com/ubb/download.php?Number=721289 NL
http://bbs.keyhole.com/ubb/download.php?Number=721294 AL
the MLB team info http://mlb.mlb.com/team/index.jsp and David
Vincent's Alternate Site Games at
http://www.retrosheet.org/neutral.htm and
http://www.retrosheet.org/neutral19.htm , smashed it together and made
a unified file.
The result contains all teams, names and alternate site info from
Retrosheet, geolocations, and address and URL info for active teams.
Please enjoy
http://vizsage.com/apps/baseball/results/parkinfo/parkinfo-all.xml
-- This is the best format: it lists all the info hierarchically;
using Python's ElementTree (http://effbot.org/zone/element-index.htm)
or Perl's XML::Simple
(http://search.cpan.org/~grantm/XML-Simple-2.18/lib/XML/Simple.pm)
should give you a clean, simple data structure.
http://vizsage.com/apps/baseball/results/parkinfo/parkinfo-flatall.csv
http://vizsage.com/apps/baseball/results/parkinfo/parkinfo-flatall.xml
-- This is the same file, in .csv and in .xml formats, listing the
parks in a flattened (but still parsable) format; it's not a drop-in
replacement for parkcode.txt but it has the same flavor. If people
are interested in a drop-in parkcode.txt replacement I can spin that
off pretty easily.
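For the hierarchical parkinfo-all.xml file, here is a minimal sketch of the ElementTree approach. The actual layout of the file is not shown in this note, so the element names (park, team, othername) and the use of attributes for the fields are assumptions, and the sample record is a made-up placeholder rather than real data. Adjust the names to whatever the real file uses.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: element/attribute names below are hypothetical; the real
# parkinfo-all.xml may nest or name things differently.
# The record itself is placeholder data, not a real park.
SAMPLE = """\
<parks>
  <park parkID="XXX01" name="Example Park" lat="0.0" lng="0.0">
    <team teamID="XXX" beg="1990-04-01" end="" games="100" altsite=""/>
    <othername name="Example Park" auth="1" curr="1"/>
    <othername name="The Ex" auth="0" curr="0"/>
  </park>
</parks>
"""


def load_parks(xml_text):
    """Return {parkID: info}, with child records collected into lists."""
    root = ET.fromstring(xml_text)
    parks = {}
    for park in root.iter("park"):
        info = dict(park.attrib)
        info["teams"] = [dict(t.attrib) for t in park.findall("team")]
        info["othernames"] = [dict(n.attrib) for n in park.findall("othername")]
        parks[info["parkID"]] = info
    return parks


parks = load_parks(SAMPLE)
print(parks["XXX01"]["name"], [t["teamID"] for t in parks["XXX01"]["teams"]])
```

To read the downloaded file instead of the inline sample, ET.parse(path).getroot() would take the place of ET.fromstring.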
Parks
parkID -- Retrosheet parkID
name
-- The current name, or the last name this stadium was known by.
beg, end, active, games
-- Dates YYYY-MM-DD for the first and last recorded (according
to retrosheet gamelogs) games at that stadium (blank for active or
future sites); whether the site is currently the home stadium for an
active MLB team; and the total number of gamelogs games at that stadium.
lat, lng
-- Geolocation
streetaddr, extaddr, city, state, country, zip, tel
-- Address
url, spanishurl
-- The main URL for active teams, and the URL for the team's
Spanish-language site
logofile
-- The MLB logo file: prefix
http://mlb.mlb.com/mlb/images/team_logos/ to retrieve. I suppose this
should be a full URL; if someone would like to ship me team logos for
past teams I'll fix that.
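Since logofile holds only a file name, turning it into something fetchable is a one-line join with the prefix given above. A small sketch, assuming a blank logofile means no logo is available; the file name in the usage line is a hypothetical placeholder.

```python
LOGO_PREFIX = "http://mlb.mlb.com/mlb/images/team_logos/"


def logo_url(logofile):
    """Return the full logo URL, or None when the logofile field is blank."""
    if not logofile:
        return None
    return LOGO_PREFIX + logofile


# "example_logo.gif" is a made-up file name for illustration only.
print(logo_url("example_logo.gif"))
```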
Teams
teamID -- Retrosheet teamID, with 'ANA' used for the Los Angeles
Angels of Anaheim in Orange County, California, USA, Sol 3, Milky Way,
Local Cluster since 1997 to now.
parkID -- Retrosheet parkID
beg, end
-- Dates YYYY-MM-DD for the first and last games recorded by that
team at that stadium (according to retrosheet gamelogs), blank for
active or future sites.
games
-- total number of gamelogs games at that stadium by that team.
altsite
-- Given as "1" if the site is listed in David Vincent's
Alternate Site Games
OtherNames
parkID -- Retrosheet parkID
name
-- A name this park was known by. I used the Retrosheet park
info from http://retrosheet.org/boxesetc/MISC/PKDIR.htm to list off
the official names of each park (flagged with "auth"). If a park was
labelled in some way in one of my other sources, it was also thrown on
the heap.
beg, end
-- For "auth" names, the seasons that the park was marketed
under that name
auth
-- Was this an official name for the park ('Bank One Ballpark'
yes, 'The BOB' no)?
curr
-- Is this the current or last-used-while-MLB-active name for
the park?
Comments
parkID -- Retrosheet parkID
comment
-- Comments for each site. There may be some parkcode.txt
cruft left in there.
The flat files give all of the above in the format
"parkID","name","beg","end","active","games","lat","lng","allteams","allnames","streetaddr","extaddr","city","state","country","zip","tel","url","spanishurl","logofile","allcomments"
where
allteams lists each 'team (beg-end) [alt]', with end=>'now' for an
active park, separated by '; ', and with '[alt]' appended to
alternate-site games
allnames lists each 'name (beg-end)', auth names first, with
end=>'now' for an active park, separated by '; ', and no dates for a
non-auth name.
allcomments lists the comments separated by ' | ' in some arbitrary
order.
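The packed columns can be unpacked again with a little string handling. Below is a sketch, assuming the CSV starts with the header row shown above and that beg and end inside the parentheses are either a four-digit year or a YYYY-MM-DD date (the note does not say which). The sample row is placeholder data built in memory, not a real park.

```python
import csv
import io
import re

FIELDS = ["parkID", "name", "beg", "end", "active", "games", "lat", "lng",
          "allteams", "allnames", "streetaddr", "extaddr", "city", "state",
          "country", "zip", "tel", "url", "spanishurl", "logofile",
          "allcomments"]

# ASSUMPTION: dates look like YYYY or YYYY-MM-DD; end may also be 'now'.
DATE = r"\d{4}(?:-\d{2}-\d{2})?"
ENTRY_RE = re.compile(
    r"^(?P<label>.*?)"
    r"(?: \((?P<beg>" + DATE + r")-(?P<end>now|" + DATE + r")\))?"
    r"(?P<alt> \[alt\])?$"
)


def split_entries(cell):
    """Split an allteams/allnames cell into dicts: label, beg, end, alt."""
    entries = []
    for part in cell.split("; "):
        if not part:
            continue
        m = ENTRY_RE.match(part)
        if m is None:  # unexpected shape: keep the raw text as the label
            entries.append({"label": part, "beg": None, "end": None, "alt": False})
            continue
        entries.append({"label": m.group("label"), "beg": m.group("beg"),
                        "end": m.group("end"), "alt": m.group("alt") is not None})
    return entries


def read_flat(fileobj):
    """Yield one dict per park, with the packed columns unpacked into lists."""
    for row in csv.DictReader(fileobj):
        row["teams"] = split_entries(row["allteams"])
        row["names"] = split_entries(row["allnames"])
        row["comments"] = [c for c in row["allcomments"].split(" | ") if c]
        yield row


def make_sample():
    """Build a one-row CSV of placeholder data in the documented layout."""
    row = dict.fromkeys(FIELDS, "")
    row.update(parkID="XXX01", name="Example Park",
               allteams="XXX (1990-now); YYY (1995-1995) [alt]",
               allnames="Example Park (1990-now); The Ex",
               allcomments="first note | second note")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()


for park in read_flat(io.StringIO(make_sample())):
    print(park["parkID"], park["teams"], park["names"], park["comments"])
```

For the real file, pass open("parkinfo-flatall.csv", newline="") to read_flat in place of the StringIO object.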
I've left five 'proposed' stadiums in the set, including the future
NYA stadium and a few others whose status I'm unsure of. These are
easy enough to remove if the idea offends.
I'm going to try to spin off a Google Earth .kml file from this data,
hopefully later tonight, which will placemark all the geolocated sites
with all the above info in the descriptions.
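One way such a .kml file could be produced is sketched below. This is not the author's actual script: it assumes each park is a dict with name, lat, and lng keys plus an optional description (hypothetical names chosen to match the field list above), and it skips parks without a geolocation. Note that KML writes coordinates as longitude,latitude, the reverse of the lat, lng order used in the data files.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return "{%s}%s" % (KML_NS, tag)


def parks_to_kml(parks):
    """Return a KML document string with one Placemark per geolocated park."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    for park in parks:
        if not park.get("lat") or not park.get("lng"):
            continue  # no geolocation recorded for this site
        placemark = ET.SubElement(doc, q("Placemark"))
        ET.SubElement(placemark, q("name")).text = park["name"]
        ET.SubElement(placemark, q("description")).text = park.get("description", "")
        point = ET.SubElement(placemark, q("Point"))
        # KML order is longitude first, then latitude.
        ET.SubElement(point, q("coordinates")).text = "%s,%s" % (park["lng"], park["lat"])
    return ET.tostring(kml, encoding="unicode")


# Placeholder input, not real park data.
print(parks_to_kml([{"name": "Example Park", "lat": "1.5", "lng": "-2.5"}]))
```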
Cheers,
Philip (Flip) Kromer
http://vizsage.com
PS If you enjoyed playing with the Google Earth thing, I'd also like
to point to these other Google Earth files:
All Minor Lg
http://bbs.keyhole.com/ubb/placemarks/997756-minorleague.kmz
AAA Stadiums http://bbs.keyhole.com/ubb/download.php?Number=47579
Football http://bbs.keyhole.com/ubb/download.php?Number=12353
which give geolocations (and neat-o team logos) for minor league and
football stadiums.