Wednesday, April 16, 2008
Retrosheet Database, part 1
With taxes finally filed, and my (now) lack of interest in forecasting systems analysis behind me, I’ll be focusing on building a Retrosheet database. I was envisioning releasing everything I do all at once. But, I had second thoughts about that. I might as well release things as I do them.
So, follow along with me, and you can build your database with me, and we can finally share our SQL code once we finish building it, since we’ll all be using the exact same names for everything. This is what you have to do to start off:
1. Download CWEVENT
http://sourceforge.net/project/showfiles.php?group_id=57086
I’m using version 0.5.1, for Windows.
2. Download all the Retrosheet event files. I use Internet Download Accelerator:
http://westbyte.com/ida/
I think they have a 30-day free trial. I think I paid 30$ for the full version. It suits my needs, because I do alot of spidering. Baseball Hacks has some perl scripts to do the same thing. You can also go to CNET’s Download.com for other free versions. I’ve also used GetRight.com, and that one is free I think.
I can post the XML file that IDA uses for the list of files to download, this way you simply open up this XML file, and you will download exactly the files I download.
Placeholder for Tom
3. Execute CWEVENT.exe
Again, I have a batch file, and I’ll post that when I get home. You simply execute the batch file, and you’ll get everything extracted for you. Put this batch file in the same folder as the .ev* files. Name it $cwevent.bat. The $ simply sorts it to the top. My cwevent.exe file is in the folder noted below. You can create your own folder scheme.
REM CWEVENT
REM
REM First line includes the header flag (-n). Decide whether you need it or not.
REM C:\Retrosheet\Ted_Chadwick\cwevent -n -f 0-96 -x 0-50 -y 2007 2007*.ev* > all2007.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 2007 2007*.ev* > all2007.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 2006 2006*.ev* > all2006.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 2005 2005*.ev* > all2005.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 2004 2004*.ev* > all2004.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 2003 2003*.ev* > all2003.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 2002 2002*.ev* > all2002.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 2001 2001*.ev* > all2001.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 2000 2000*.ev* > all2000.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1998 1998*.ev* > all1998.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1997 1997*.ev* > all1997.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1996 1996*.ev* > all1996.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1995 1995*.ev* > all1995.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1994 1994*.ev* > all1994.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1993 1993*.ev* > all1993.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1992 1992*.ev* > all1992.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1991 1991*.ev* > all1991.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1990 1990*.ev* > all1990.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1989 1989*.ev* > all1989.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1988 1988*.ev* > all1988.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1987 1987*.ev* > all1987.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1986 1986*.ev* > all1986.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1985 1985*.ev* > all1985.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1984 1984*.ev* > all1984.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1983 1983*.ev* > all1983.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1982 1982*.ev* > all1982.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1981 1981*.ev* > all1981.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1980 1980*.ev* > all1980.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1979 1979*.ev* > all1979.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1978 1978*.ev* > all1978.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1977 1977*.ev* > all1977.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1976 1976*.ev* > all1976.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1975 1975*.ev* > all1975.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1974 1974*.ev* > all1974.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1973 1973*.ev* > all1973.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1972 1972*.ev* > all1972.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1971 1971*.ev* > all1971.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1970 1970*.ev* > all1970.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1969 1969*.ev* > all1969.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1968 1968*.ev* > all1968.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1967 1967*.ev* > all1967.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1966 1966*.ev* > all1966.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1965 1965*.ev* > all1965.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1964 1964*.ev* > all1964.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1963 1963*.ev* > all1963.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1962 1962*.ev* > all1962.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1961 1961*.ev* > all1961.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1960 1960*.ev* > all1960.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1959 1959*.ev* > all1959.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1958 1958*.ev* > all1958.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1957 1957*.ev* > all1957.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1956 1956*.ev* > all1956.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1954 1954*.ev* > all1954.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1922 1922*.ev* > all1922.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1921 1921*.ev* > all1921.csv
C:\Retrosheet\Ted_Chadwick\cwevent -f 0-96 -x 0-50 -y 1911 1911*.ev* > all1911.csv
4. Create your table.
This is the DDL for Oracle. Other flavors of database systems will be very similar, with likely only the VARCHAR2 needing to be changed to VARCHAR or CHAR or STRING. So, if someone is using MySQL, maybe you can post the equivalents for INTEGER and VARCHAR2.
Note: the names are 99% solid, and this is why I didn’t want to do this piecemeal, in case I change the names slightly.
Note2: I created an extra column, which is an autosequence field. I might end up using that as my alternate key to the database.
DROP TABLE events;
CREATE TABLE events (
seq_events integer
,GAME_ID VARCHAR2(12)
,AWAY_TEAM_ID VARCHAR2(3)
,INN_CT INTEGER
,BAT_HOME_ID INTEGER
,OUTS_CT INTEGER
,BALLS_CT INTEGER
,STRIKES_CT INTEGER
,PITCH_SEQ_TX VARCHAR2(40)
,AWAY_SCORE_CT INTEGER
,HOME_SCORE_CT INTEGER
,BAT_ID VARCHAR2(8)
,BAT_HAND_CD VARCHAR2(1)
,RESP_BAT_ID VARCHAR2(8)
,RESP_BAT_HAND_CD VARCHAR2(1)
,PIT_ID VARCHAR2(8)
,PIT_HAND_CD VARCHAR2(1)
,RESP_PIT_ID VARCHAR2(8)
,RESP_PIT_HAND_CD VARCHAR2(1)
,POS2_FLD_ID VARCHAR2(8)
,POS3_FLD_ID VARCHAR2(8)
,POS4_FLD_ID VARCHAR2(8)
,POS5_FLD_ID VARCHAR2(8)
,POS6_FLD_ID VARCHAR2(8)
,POS7_FLD_ID VARCHAR2(8)
,POS8_FLD_ID VARCHAR2(8)
,POS9_FLD_ID VARCHAR2(8)
,BASE1_RUN_ID VARCHAR2(8)
,BASE2_RUN_ID VARCHAR2(8)
,BASE3_RUN_ID VARCHAR2(8)
,EVENT_TX VARCHAR2(100)
,LEADOFF_FL VARCHAR2(1)
,PH_FL VARCHAR2(1)
,BAT_FLD_CD VARCHAR2(2)
,BAT_LINEUP_ID INTEGER
,EVENT_CD INTEGER
,BAT_EVENT_FL VARCHAR2(1)
,AB_FL VARCHAR2(1)
,H_CD INTEGER
,SH_FL VARCHAR2(1)
,SF_FL VARCHAR2(1)
,EVENT_OUTS_CT INTEGER
,DP_FL VARCHAR2(1)
,TP_FL VARCHAR2(1)
,RBI_CT INTEGER
,WP_FL VARCHAR2(1)
,PB_FL VARCHAR2(1)
,FLD_CD INTEGER
,BATTEDBALL_CD VARCHAR2(1)
,BUNT_FL VARCHAR2(1)
,FOUL_FL VARCHAR2(1)
,BATTEDBALL_LOC_TX VARCHAR2(5)
,ERR_CT INTEGER
,ERR1_FLD_CD VARCHAR2(8)
,ERR1_CD VARCHAR2(1)
,ERR2_FLD_CD VARCHAR2(8)
,ERR2_CD VARCHAR2(1)
,ERR3_FLD_CD VARCHAR2(8)
,ERR3_CD VARCHAR2(1)
,BAT_DEST_ID INTEGER
,RUN1_DEST_ID INTEGER
,RUN2_DEST_ID INTEGER
,RUN3_DEST_ID INTEGER
,BAT_PLAY_TX VARCHAR2(8)
,RUN1_PLAY_TX VARCHAR2(15)
,RUN2_PLAY_TX VARCHAR2(15)
,RUN3_PLAY_TX VARCHAR2(15)
,RUN1_SB_FL VARCHAR2(1)
,RUN2_SB_FL VARCHAR2(1)
,RUN3_SB_FL VARCHAR2(1)
,RUN1_CS_FL VARCHAR2(1)
,RUN2_CS_FL VARCHAR2(1)
,RUN3_CS_FL VARCHAR2(1)
,RUN1_PK_FL VARCHAR2(1)
,RUN2_PK_FL VARCHAR2(1)
,RUN3_PK_FL VARCHAR2(1)
,RUN1_RESP_PIT_ID VARCHAR2(8)
,RUN2_RESP_PIT_ID VARCHAR2(8)
,RUN3_RESP_PIT_ID VARCHAR2(8)
,GAME_NEW_FL VARCHAR2(1)
,GAME_END_FL VARCHAR2(1)
,PR_RUN1_FL VARCHAR2(1)
,PR_RUN2_FL VARCHAR2(1)
,PR_RUN3_FL VARCHAR2(1)
,REMOVED_FOR_PR_RUN1_ID VARCHAR2(8)
,REMOVED_FOR_PR_RUN2_ID VARCHAR2(8)
,REMOVED_FOR_PR_RUN3_ID VARCHAR2(8)
,REMOVED_FOR_PH_BAT_ID VARCHAR2(8)
,REMOVED_FOR_PH_BAT_FLD_CD VARCHAR2(8)
,PO1_FLD_CD INTEGER
,PO2_FLD_CD INTEGER
,PO3_FLD_CD INTEGER
,ASS1_FLD_CD INTEGER
,ASS2_FLD_CD INTEGER
,ASS3_FLD_CD INTEGER
,ASS4_FLD_CD INTEGER
,ASS5_FLD_CD INTEGER
,EVENT_ID INTEGER
,HOME_TEAM_ID VARCHAR2(3)
,BAT_TEAM_ID VARCHAR2(3)
,FLD_TEAM_ID VARCHAR2(3)
,BAT_LAST_ID INTEGER
,INN_NEW_FL VARCHAR2(1)
,INN_END_FL VARCHAR2(1)
,START_BAT_SCORE_CT INTEGER
,START_FLD_SCORE_CT INTEGER
,INN_RUNS_CT INTEGER
,GAME_PA_CT INTEGER
,INN_PA_CT INTEGER
,PA_NEW_FL VARCHAR2(1)
,PA_TRUNC_FL VARCHAR2(1)
,START_BASES_CD INTEGER
,END_BASES_CD INTEGER
,BAT_START_FL VARCHAR2(1)
,PIT_START_FL VARCHAR2(1)
,RUN1_FLD_CD VARCHAR2(2)
,RUN1_LINEUP_CD INTEGER
,RUN1_ORIGIN_EVENT_ID INTEGER
,RUN2_FLD_CD VARCHAR2(2)
,RUN2_LINEUP_CD INTEGER
,RUN2_ORIGIN_EVENT_ID INTEGER
,RUN3_FLD_CD VARCHAR2(2)
,RUN3_LINEUP_CD INTEGER
,RUN3_ORIGIN_EVENT_ID INTEGER
,PA_BALL_CT INTEGER
,PA_INTENT_BALL_CT INTEGER
,PA_PITCHOUT_BALL_CT INTEGER
,PA_OTHER_BALL_CT INTEGER
,PA_STRIKE_CT INTEGER
,PA_CALLED_STRIKE_CT INTEGER
,PA_SWINGMISS_STRIKE_CT INTEGER
,PA_FOUL_STRIKE_CT INTEGER
,PA_OTHER_STRIKE_CT INTEGER
,EVENT_RUNS_CT INTEGER
,FLD_ID VARCHAR2(8)
,BASE2_FORCE_FL VARCHAR2(1)
,BASE3_FORCE_FL VARCHAR2(1)
,BASE4_FORCE_FL VARCHAR2(1)
,BAT_SAFE_ERR_FL VARCHAR2(1)
,BAT_FATE_ID INTEGER
,RUN1_FATE_ID INTEGER
,RUN2_FATE_ID INTEGER
,RUN3_FATE_ID INTEGER
,FATE_RUNS_CT INTEGER
,ASS6_FLD_CD INTEGER
,ASS7_FLD_CD INTEGER
,ASS8_FLD_CD INTEGER
,ASS9_FLD_CD INTEGER
,ASS10_FLD_CD INTEGER
);
ALTER TABLE EVENTS
ADD CONSTRAINT PK_EVENTS
PRIMARY KEY (GAME_ID, EVENT_ID)
;
commit;
You also need to create a SEQUENCE:
CREATE SEQUENCE SEQ_EVENTS
START WITH 0
INCREMENT BY 1
MINVALUE 0
NOCACHE
NOCYCLE
NOORDER
;
5. Load the table.
I’ll post this when I get home, but this is where we’ll get divergence. Oracle has its own loader, so we’ll probably have to come up with a unique implementation for whatever you guys are using.
This is the control file you need:
OPTIONS ( ROWS=10000, bindsize=6000000, ERRORS=99999 )
LOAD DATA
INFILE 'C:\Retro2\data\events\parsed\all2007.csv' BADFILE 'C:\Retro2\data\events\parsed\all2007.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all2007.dsc'
INFILE 'C:\Retro2\data\events\parsed\all2006.csv' BADFILE 'C:\Retro2\data\events\parsed\all2006.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all2006.dsc'
INFILE 'C:\Retro2\data\events\parsed\all2005.csv' BADFILE 'C:\Retro2\data\events\parsed\all2005.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all2005.dsc'
INFILE 'C:\Retro2\data\events\parsed\all2004.csv' BADFILE 'C:\Retro2\data\events\parsed\all2004.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all2004.dsc'
INFILE 'C:\Retro2\data\events\parsed\all2003.csv' BADFILE 'C:\Retro2\data\events\parsed\all2003.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all2003.dsc'
INFILE 'C:\Retro2\data\events\parsed\all2002.csv' BADFILE 'C:\Retro2\data\events\parsed\all2002.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all2002.dsc'
INFILE 'C:\Retro2\data\events\parsed\all2001.csv' BADFILE 'C:\Retro2\data\events\parsed\all2001.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all2001.dsc'
INFILE 'C:\Retro2\data\events\parsed\all2000.csv' BADFILE 'C:\Retro2\data\events\parsed\all2000.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all2000.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1998.csv' BADFILE 'C:\Retro2\data\events\parsed\all1998.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1998.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1997.csv' BADFILE 'C:\Retro2\data\events\parsed\all1997.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1997.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1996.csv' BADFILE 'C:\Retro2\data\events\parsed\all1996.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1996.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1995.csv' BADFILE 'C:\Retro2\data\events\parsed\all1995.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1995.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1994.csv' BADFILE 'C:\Retro2\data\events\parsed\all1994.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1994.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1993.csv' BADFILE 'C:\Retro2\data\events\parsed\all1993.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1993.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1992.csv' BADFILE 'C:\Retro2\data\events\parsed\all1992.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1992.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1991.csv' BADFILE 'C:\Retro2\data\events\parsed\all1991.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1991.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1990.csv' BADFILE 'C:\Retro2\data\events\parsed\all1990.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1990.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1989.csv' BADFILE 'C:\Retro2\data\events\parsed\all1989.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1989.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1988.csv' BADFILE 'C:\Retro2\data\events\parsed\all1988.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1988.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1987.csv' BADFILE 'C:\Retro2\data\events\parsed\all1987.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1987.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1986.csv' BADFILE 'C:\Retro2\data\events\parsed\all1986.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1986.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1985.csv' BADFILE 'C:\Retro2\data\events\parsed\all1985.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1985.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1984.csv' BADFILE 'C:\Retro2\data\events\parsed\all1984.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1984.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1983.csv' BADFILE 'C:\Retro2\data\events\parsed\all1983.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1983.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1982.csv' BADFILE 'C:\Retro2\data\events\parsed\all1982.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1982.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1981.csv' BADFILE 'C:\Retro2\data\events\parsed\all1981.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1981.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1980.csv' BADFILE 'C:\Retro2\data\events\parsed\all1980.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1980.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1979.csv' BADFILE 'C:\Retro2\data\events\parsed\all1979.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1979.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1978.csv' BADFILE 'C:\Retro2\data\events\parsed\all1978.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1978.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1977.csv' BADFILE 'C:\Retro2\data\events\parsed\all1977.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1977.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1976.csv' BADFILE 'C:\Retro2\data\events\parsed\all1976.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1976.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1975.csv' BADFILE 'C:\Retro2\data\events\parsed\all1975.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1975.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1974.csv' BADFILE 'C:\Retro2\data\events\parsed\all1974.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1974.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1973.csv' BADFILE 'C:\Retro2\data\events\parsed\all1973.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1973.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1972.csv' BADFILE 'C:\Retro2\data\events\parsed\all1972.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1972.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1971.csv' BADFILE 'C:\Retro2\data\events\parsed\all1971.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1971.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1970.csv' BADFILE 'C:\Retro2\data\events\parsed\all1970.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1970.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1969.csv' BADFILE 'C:\Retro2\data\events\parsed\all1969.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1969.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1968.csv' BADFILE 'C:\Retro2\data\events\parsed\all1968.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1968.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1967.csv' BADFILE 'C:\Retro2\data\events\parsed\all1967.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1967.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1966.csv' BADFILE 'C:\Retro2\data\events\parsed\all1966.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1966.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1965.csv' BADFILE 'C:\Retro2\data\events\parsed\all1965.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1965.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1964.csv' BADFILE 'C:\Retro2\data\events\parsed\all1964.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1964.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1963.csv' BADFILE 'C:\Retro2\data\events\parsed\all1963.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1963.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1962.csv' BADFILE 'C:\Retro2\data\events\parsed\all1962.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1962.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1961.csv' BADFILE 'C:\Retro2\data\events\parsed\all1961.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1961.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1960.csv' BADFILE 'C:\Retro2\data\events\parsed\all1960.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1960.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1959.csv' BADFILE 'C:\Retro2\data\events\parsed\all1959.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1959.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1958.csv' BADFILE 'C:\Retro2\data\events\parsed\all1958.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1958.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1957.csv' BADFILE 'C:\Retro2\data\events\parsed\all1957.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1957.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1956.csv' BADFILE 'C:\Retro2\data\events\parsed\all1956.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1956.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1954.csv' BADFILE 'C:\Retro2\data\events\parsed\all1954.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1954.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1922.csv' BADFILE 'C:\Retro2\data\events\parsed\all1922.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1922.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1921.csv' BADFILE 'C:\Retro2\data\events\parsed\all1921.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1921.dsc'
INFILE 'C:\Retro2\data\events\parsed\all1911.csv' BADFILE 'C:\Retro2\data\events\parsed\all1911.bad' DISCARDFILE 'C:\Retro2\data\events\parsed\all1911.dsc'
INTO TABLE "EVENTS"
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"' AND '"'
TRAILING NULLCOLS
(GAME_ID
, AWAY_TEAM_ID
, INN_CT
, BAT_HOME_ID
, OUTS_CT
, BALLS_CT
, STRIKES_CT
, PITCH_SEQ_TX
, AWAY_SCORE_CT
, HOME_SCORE_CT
, BAT_ID
, BAT_HAND_CD
, RESP_BAT_ID
, RESP_BAT_HAND_CD
, PIT_ID
, PIT_HAND_CD
, RESP_PIT_ID
, RESP_PIT_HAND_CD
, POS2_FLD_ID
, POS3_FLD_ID
, POS4_FLD_ID
, POS5_FLD_ID
, POS6_FLD_ID
, POS7_FLD_ID
, POS8_FLD_ID
, POS9_FLD_ID
, BASE1_RUN_ID
, BASE2_RUN_ID
, BASE3_RUN_ID
, EVENT_TX
, LEADOFF_FL
, PH_FL
, BAT_FLD_CD
, BAT_LINEUP_ID
, EVENT_CD
, BAT_EVENT_FL
, AB_FL
, H_CD
, SH_FL
, SF_FL
, EVENT_OUTS_CT
, DP_FL
, TP_FL
, RBI_CT
, WP_FL
, PB_FL
, FLD_CD
, BATTEDBALL_CD
, BUNT_FL
, FOUL_FL
, BATTEDBALL_LOC_TX
, ERR_CT
, ERR1_FLD_CD
, ERR1_CD
, ERR2_FLD_CD
, ERR2_CD
, ERR3_FLD_CD
, ERR3_CD
, BAT_DEST_ID
, RUN1_DEST_ID
, RUN2_DEST_ID
, RUN3_DEST_ID
, BAT_PLAY_TX
, RUN1_PLAY_TX
, RUN2_PLAY_TX
, RUN3_PLAY_TX
, RUN1_SB_FL
, RUN2_SB_FL
, RUN3_SB_FL
, RUN1_CS_FL
, RUN2_CS_FL
, RUN3_CS_FL
, RUN1_PK_FL
, RUN2_PK_FL
, RUN3_PK_FL
, RUN1_RESP_PIT_ID
, RUN2_RESP_PIT_ID
, RUN3_RESP_PIT_ID
, GAME_NEW_FL
, GAME_END_FL
, PR_RUN1_FL
, PR_RUN2_FL
, PR_RUN3_FL
, REMOVED_FOR_PR_RUN1_ID
, REMOVED_FOR_PR_RUN2_ID
, REMOVED_FOR_PR_RUN3_ID
, REMOVED_FOR_PH_BAT_ID
, REMOVED_FOR_PH_BAT_FLD_CD
, PO1_FLD_CD
, PO2_FLD_CD
, PO3_FLD_CD
, ASS1_FLD_CD
, ASS2_FLD_CD
, ASS3_FLD_CD
, ASS4_FLD_CD
, ASS5_FLD_CD
, EVENT_ID
, HOME_TEAM_ID
, BAT_TEAM_ID
, FLD_TEAM_ID
, BAT_LAST_ID
, INN_NEW_FL
, INN_END_FL
, START_BAT_SCORE_CT
, START_FLD_SCORE_CT
, INN_RUNS_CT
, GAME_PA_CT
, INN_PA_CT
, PA_NEW_FL
, PA_TRUNC_FL
, START_BASES_CD
, END_BASES_CD
, BAT_START_FL
, PIT_START_FL
, RUN1_FLD_CD
, RUN1_LINEUP_CD
, RUN1_ORIGIN_EVENT_ID
, RUN2_FLD_CD
, RUN2_LINEUP_CD
, RUN2_ORIGIN_EVENT_ID
, RUN3_FLD_CD
, RUN3_LINEUP_CD
, RUN3_ORIGIN_EVENT_ID
, PA_BALL_CT
, PA_INTENT_BALL_CT
, PA_PITCHOUT_BALL_CT
, PA_OTHER_BALL_CT
, PA_STRIKE_CT
, PA_CALLED_STRIKE_CT
, PA_SWINGMISS_STRIKE_CT
, PA_FOUL_STRIKE_CT
, PA_OTHER_STRIKE_CT
, EVENT_RUNS_CT
, FLD_ID
, BASE2_FORCE_FL
, BASE3_FORCE_FL
, BASE4_FORCE_FL
, BAT_SAFE_ERR_FL
, BAT_FATE_ID
, RUN1_FATE_ID
, RUN2_FATE_ID
, RUN3_FATE_ID
, FATE_RUNS_CT
, ASS6_FLD_CD
, ASS7_FLD_CD
, ASS8_FLD_CD
, ASS9_FLD_CD
, ASS10_FLD_CD
, SEQ_EVENTS "SEQ_EVENTS.NEXTVAL"
)
And this is the batch file that will execute the load:
"c:\oracle\ora92\bin\sqlldr" userid=myname/mypass control="C:\Retrosheet\chadwick\control\events.ctl"
Where myname and mypass is the username and password of your database user.
This is what my Windows directory looks like, if you want to set it up exactly as I do:
C:
Retro2
data
events
parsed
unzipped
zipped
Retrosheet
chadwick
control
programs
sql
Ted_Chadwick
No guarantee that I won’t change things around, nor that I’ll update anything I do. This is all as-is, with no promise of support. Basically, you get to intermittently look over my shoulder.


Why don’t you just do what was in Mark Adler’s book? He provides code and the way he does it the database will stay up to date each year and there are less manual steps. Or at least use it as a starting point. He also used the simpler for beginners and free MySQL.