THE BOOK cover
The Unwritten Book is Finally Written!
An in-depth analysis of: The sacrifice bunt, batter/pitcher matchups, the intentional base on balls, optimizing a batting lineup, hot and cold streaks, clutch performance, platooning strategies, and much more.
Read Excerpts & Customer Reviews

Buy The Book from Amazon


SABR101 required reading if you enter this site. Check out the Sabermetric Wiki. And interesting baseball books.
MOST RECENT ARTICLES
MAIL : You ask | We say

Advanced


THE BOOK--Playing The Percentages In Baseball

<< Back to main

Wednesday, June 10, 2009

The HITf/x Database

By Tangotiger, 10:55 AM

This thread is reserved only for sharing your work, in creating a database, loading data, creating tables or views, creating equations to use, etc, etc, etc.  You can ask direct, specific questions.  Consider this thread to be a workshop.

More general posts can be made at the Summit thread.  Consider that thread to be a welcoming type of thread.

Specific discussion of results can be continued at the great thread that was started from Harry’s article on launch and spray angles.  Consider that thread to be alot of yapping, some results-based posts, etc, like a classroom discussion.


#1    John Walsh      (see all posts) 2009/06/10 (Wed) @ 11:45

A question for Mike or Harry or anybody already working with the data.  I have had a first look and am wondering how to use ball trajectory data (GB, FB, etc) with the hit-f/x quantities.  I don’t see a field in the database tables that corresponds to trajectory.

The “result” and “description” fields contain the information for outs, but not for hits.

Can anybody help?


#2    Mike Fast      (see all posts) 2009/06/10 (Wed) @ 11:56

John #5, I’m curious to see others’ answers to your question.

I use SQL queries to update the batted ball type field in my database.  However, afterward I still have a few batted balls that are unclassified.  If it’s particularly important to me for a particular analysis, I view the video.  Otherwise, depending on the application, I leave the unknown events out of my data set, or I assume that they are all ground balls (which the vast majority of them are).

Here are my SQL queries:

SELECT FROM `atbatsWHERE bbtype IS NULL AND event <> "Strikeout" AND
event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x 0

UPDATE 
`atbatsSET bbtype "bu" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND des LIKE "%bunt%"

UPDATE `atbatsSET bbtype "gr" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND des LIKE "%ground ball%"

UPDATE `atbatsSET bbtype "fl" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND des LIKE "%fly ball%"

UPDATE `atbatsSET bbtype "ld" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND des LIKE "%line drive%"

UPDATE `atbatsSET bbtype "gr" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND des LIKE "%grounds into%"

UPDATE `atbatsSET bbtype "ld" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND des LIKE "%lines into%"

UPDATE `atbatsSET bbtype "po" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND des LIKE "%pops into%"

SELECT FROM `atbatsWHERE bbtype IS NULL AND event <> "Strikeout" AND
event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x AND event
NOT LIKE 
"Catcher%" AND event NOT LIKE "Batter%" AND event NOT LIKE "Field%"
AND event NOT LIKE "Sac Fly%" AND event NOT LIKE "Home Run%"

UPDATE atbats SET bbtype "fl" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND event NOT
LIKE 
"Field%" AND des LIKE "%sacrifice fly%"

UPDATE atbats SET bbtype "fl" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND event NOT
LIKE 
"Field%" AND des LIKE "%grand slam%"

UPDATE `atbatsSET bbtype "fl" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND des LIKE
"%reaches on fielding error by right fielder%"

UPDATE `atbatsSET bbtype "fl" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND des LIKE
"%reaches on fielding error by center fielder%"

UPDATE `atbatsSET bbtype "fl" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND des LIKE
"%reaches on fielding error by left fielder%"

UPDATE `atbatsSET bbtype "gr" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND des LIKE
"%missed catch error%assist to second baseman%"

UPDATE `atbatsSET bbtype "gr" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND des LIKE
"%missed catch error%assist to shortstop%"

UPDATE `atbatsSET bbtype "gr" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND des LIKE
"%force attempt%shortstop%"

UPDATE `atbatsSET bbtype "gr" WHERE bbtype IS NULL AND event <>
"Strikeout" AND event NOT LIKE "%Walk" AND event <> "Hit By Pitch" AND hit_x
AND event NOT LIKE "Catcher%" AND event NOT LIKE "Batter%" AND des LIKE
"%force attempt%second baseman%"


#3    Tangotiger      (see all posts) 2009/06/10 (Wed) @ 11:58

I’m going to move some of the posts here to this thread:
http://www.insidethebook.com/ee/index.php/site/comments/pitchf_x_summit_introducing_hitf_x/

I’d really like to keep this thread as a pure workshop.


#4    John Walsh      (see all posts) 2009/06/10 (Wed) @ 12:11

Mike/#2 (for now!)

By “your database”, do you mean your own db that you populate with data downloaded from mlb.com? 

In other words, more than what sportvision has made available with the hit-f/x data?

I know that batted ball type is included in the pfx stream, at least it was last year.


#5    Tangotiger      (see all posts) 2009/06/10 (Wed) @ 12:12

To post the percent sign % you need to type this:

& #37;

WITHOUT the space between the & and #


#6    Mike Fast      (see all posts) 2009/06/10 (Wed) @ 12:16

John, I’m not aware that batted ball type is included in either the HITf/x database from this year or the Pfx stream, other than implicitly in the “event” and “des” fields, which are what my queries attempt to parse.  Am I missing something in the Pfx stream?


#7    Tangotiger      (see all posts) 2009/06/10 (Wed) @ 12:18

Mike’s SQL has been updated in post 2.


#8    Peter Jensen      (see all posts) 2009/06/10 (Wed) @ 12:41

Mike and John - The event descriptions in the Hit f/x file are not as complete as the event descriptions that are currently on the Gameday XML files.  They are missing many fielder references and some batted ball references.  Sportvision must have downloaded them on the same day they were logged and before they were proofed for completeness.  There are also 13 double entries in the Hit f/x files that have to be removed and you should be aware that the “ in the “innings” in the Hit f/x file are actually half innings. That is the bottom half of the first inning is recorded as inning number 2 in the Hit f/x file.


#9          (see all posts) 2009/06/10 (Wed) @ 13:14

I have my own clutzy way of downloading the data.  I save both the pitch and the hit data to separate Excel files.  Then I use the pitch-id field that is common to both, to get info contained in the pitch spreadsheet (such as the Result) into the hit spreadsheet.  As I said, it is a bit clutzy, but it works.  When all is said and done, I export the spreadsheet into Root, where I do all my analysis.


#10    john      (see all posts) 2009/06/10 (Wed) @ 13:35

I’m not too good with SQL but if I wanted all players Hit FX data from “X” team...how would I go about this?


#11    Dan Brooks      (see all posts) 2009/06/10 (Wed) @ 14:31

The sample query on this page allows you to do just that:
http://baseball.sportvision.com/summit/data

Replace the ‘xxx’ with a team name (e.g., ‘bos’wink.


#12    Dan Brooks      (see all posts) 2009/06/10 (Wed) @ 14:34

Obviously don’t add the smiley. For whatever reason a smiley got inserted into my post.


#13    john      (see all posts) 2009/06/10 (Wed) @ 14:43

I was trying to do something like this

SELECT * hitballs
INNER JOIN team INNER JOIN players
ON id.hitballs=id.teams and id.teams=id.players
WHERE teams.code=’NYN’;

I figured that would give me hitballs data for all Mets players.  It didnt work tho :(


#14    Dan Brooks      (see all posts) 2009/06/10 (Wed) @ 14:56

There are a few SQL errors there… you’re missing FROM and the quotes are wrong… what are you trying to do exactly? Following the example (just copy/paste) will do what you were asking for in your first post.


#15    john      (see all posts) 2009/06/10 (Wed) @ 15:02

The sample query they had on their page worked. Thanks.

Basically what I was trying to do was take players from a team and get their average “Initial speed” for each player.

Get something like
David Wright (initial speed)
Jose Reyes (initial speed)
etc etc.


#16    Peter Jensen      (see all posts) 2009/06/10 (Wed) @ 17:22

John - Why don’t you import the whole database into Excel and work with easy Excel queries instead of having to learn SQL?


#17    john      (see all posts) 2009/06/10 (Wed) @ 19:26

I think thats what I’m going to do, at least till I can figure things out

Its just that I got retrosheet, game log, and the baseball databank databases as well so I guess eventually ill have to learn SQL lol.


#18    Rally      (see all posts) 2009/06/10 (Wed) @ 20:43

How big is the database?  Excel has some limitations there, though if you can live with office 2007, they do allow up to a million records


#19    Peter Jensen      (see all posts) 2009/06/10 (Wed) @ 21:07

Rally - Under 15000 records.  I wouldn’t have recommended it as an option if it weren’t well under the 64k limit.  Although I have do Office 2007 and haven’t had any problems with it at all.  And I LOVE the 1000000 record capacity of Excel 2007.  And its ability to directly import XML files.  Despite Tango’s old fogie rants about Access not being quite the same as before, there are really no problems with it either.


#20    Harry Pavlidis      (see all posts) 2009/06/10 (Wed) @ 21:20

select concat(b.first_name,’ ‘,b.last_name),count(h.id) as `#`,
avg(h.hit_initial_speed) as `SOB`,avg(h.hit_vertical_angle) as `IVA`,avg(h.hit_horizontal_angle) as `IHA`
from hitballs h, pitches t, actions a, players b
where t.batter_id = b.id and t.id = h.sv_db_pitch_id and a.pitch_id = t.id
and b.first_name=’juan’ and b.last_name=’pierre’
group by b.id;

John - this will do it by player. And you can add max() min() std() functions, too.

More to come.  And a link to the csv export I sent Alan.


#21    Harry Pavlidis      (see all posts) 2009/06/10 (Wed) @ 21:28

Here’s the csv file I sent Alan. Even if you don’t “need” this, could you check it out and let me know if I’ve missed anything?

http://sites.google.com/site/harrypitchfx/Home/hitsforalan.zip


#22    Harry Pavlidis      (see all posts) 2009/06/10 (Wed) @ 22:03

This gets a team’s batted balls. If, for some reason, you want to see a team other than the cubs, edit ‘chn’ accordingly.You can switch to pitchers’ batted balls against by changing t.batter_id=b.id to t.pitcher_id=b.id. Change `who` = ‘team’ to `who` he ‘opp’ to see the opponents’ stats.

select
concat(b.first_name,’ ‘,b.last_name),
count(h.id) as `#`,
if((c.id=g.home_team_id and round(s.inning/2)=(s.inning/2))or(c.id=g.away_team_id and round(s.inning/2)<>(s.inning/2)),
‘team’,’opp’ ) as `who`,
avg(h.hit_initial_speed) as `SOB`,
avg(h.hit_vertical_angle) as `IVA`,
avg(h.hit_horizontal_angle) as `IHA`
from hitballs h, pitches t, actions a, players b,games g, teams c, scores s
where t.batter_id = b.id and
t.id = h.sv_db_pitch_id and
a.pitch_id = t.id and
g.id=a.game_id and
c.code=’chn’ and
(c.id=g.home_team_id or c.id=g.away_team_id)
and s.id=t.score_id
group by b.id
having `who`=’team’;


#23    Harry Pavlidis      (see all posts) 2009/06/10 (Wed) @ 22:54

btw,

and a.event not like ‘&#xb;u;nt%’

should be added to the where clauses above. Sorry ‘bout that.


#24    Harry Pavlidis      (see all posts) 2009/06/10 (Wed) @ 22:58

’%bunt%’


#25    John Walsh      (see all posts) 2009/06/11 (Thu) @ 03:03

Mike/#6

Yes, the batted ball types (which I sometimes nerdily call trajectories) can be determined, usually, from the “des” field of the pfx stream, that’s what I meant.  There is not a dedicated field in the pfx stream for bbtype. 

The same is true in the hitfx database provided by sportvision, except that the descriptions for hits (as opposed to outs) don’t seem to contain any batted ball info—so bbtype for hits cannot be obtained from the hitfx db.

So, to compare launch angle for different batted ball types, as Harry did in his THT piece, you’d have to link together data from the pfx stream and data from the hitfx db.  Harry, if you are reading, is that what you did?


#26    Peter Jensen      (see all posts) 2009/06/11 (Thu) @ 03:09

John - I don’t know about Harry, but that’s what I did.  See my Post#8 if you haven’t already.


#27    JBrew      (see all posts) 2009/06/11 (Thu) @ 09:20

Harry: that CSV file is great, thanks for supplying it to the less SQL-inclined.


#28    Harry Pavlidis      (see all posts) 2009/06/11 (Thu) @ 12:51

Mike and John - The data you’re looking for is in the action table. action.event and action.description have what you’re looking for.


#29    Mike Fast      (see all posts) 2009/06/11 (Thu) @ 13:43

Harry, do you do something similar to what I do in post #2 in order to get your batted ball types?  Or do you have a different method?

The steps I list in #2 cover almost everything except for some fielding errors and fielder’s choices by the pitchers, catchers, and corner infielders (which might be on bunts).  For 2007-2008, I looked up the batted ball type for fielding errors from Retrosheet and input that into my Pfx database, but of course I can’t do that for 2009 until after the season.


#30    Peter Jensen      (see all posts) 2009/06/11 (Thu) @ 14:27

Mike - I tried to do what you did on the Action table event Description in the Hit f/x database, but ended up downloading the XML files from Gameday for April and parsing the Description field in those instead. I then linked them to the Hit f/x file. My parsing program was similar to yours, but not quite them same, and I also had several hundred fields that my parsing program missed and had to be inputted by hand.  Mostly for the Fielded By information though.


#31    Harry Pavlidis      (see all posts) 2009/06/11 (Thu) @ 14:42

Mike, something like that. I’ll post some more code later tonight if I have a chance.


#32    John Walsh      (see all posts) 2009/06/11 (Thu) @ 16:56

Harry/#28

D’oh—I was sure I had checked both “event” and “description” in the action table, but in fact, I had missed “description”, which does indeed contain batted ball type for hits.

Thanks for pointing that out.


#33          (see all posts) 2009/06/12 (Fri) @ 00:05

Question:  Someone mentioned in another thread the act of calculating the Vxy (and I’d be interested in Vyz as well). 

You would accomplish this using the data under the hit_vx0 and hit_vy0 values, right?  So my first attempt to do this was simply using the pythagorean formula.  Still, the value comes out too high, as the data i have in those columns is way high (Particularly hit_vy0 which goes into the 140s and beyond...which seems a bit fast)

What am i doing wrong here?


#34    Mike Fast      (see all posts) 2009/06/12 (Fri) @ 00:20

Garik, the units on hit_vx0 and hit_vy0 are feet/second.  If you want to convert this to mph, you need to multiply by 3600/5280.


#35          (see all posts) 2009/06/12 (Fri) @ 00:24

Ah, thanks Mike


Page 1 of 1 pages


Name (required)
E-Mail (optional; WILL be published)
Website (optional)

<< Back to main


Latest...

COMMENTS

Feb 11 22:33
Clutch analogy

Feb 11 22:08
Who is Jeremy Lin?

Feb 11 20:11
Fighting leads to goals?

Feb 11 19:55
Why do players get crappy caps?

Feb 11 19:12
Hero of the month: Brittney Baxter

Feb 11 17:59
MGL: Today on Clubhouse Confidential

Feb 11 16:48
Reader Mail of the Day: Why do we need X years of fielding data?  And what about outliers?

Feb 11 10:29
Dwight Evans

Feb 11 02:12
Performance through the ages

Feb 10 23:01
For Your Soul