THE BOOK--Playing The Percentages In Baseball

Monday, November 29, 2010

Gameday slice bias

By Tangotiger, 02:22 PM

How much is the recorded hit location biased by where a fielder is normally positioned?  The following is a starting point and, as such, uses a crude estimate.

From the 1B line to the 3B line is 90 degrees.  The circumference of a circle is 2*PI*r, so a quarter circle is PI*r/2.  If we treat the radius as around 115 feet (meaning a spot somewhere between the 1B/3B bags and the 2B bag), then the distance from the 1B line to the 3B line, along a circular path, is about 180 feet.  Or, 1 degree = 2 feet.  I know it’s not a circle, but we just need crude approximations.
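As a quick check of that arithmetic, here is a minimal Python sketch. The 115-foot radius is the crude figure from the post; the function name is only illustrative.

```python
import math

def feet_per_degree(radius_ft: float) -> float:
    """Arc length in feet covered by one degree at the given radius."""
    return 2 * math.pi * radius_ft / 360

radius_ft = 115                               # crude radius used above
quarter_arc_ft = math.pi * radius_ft / 2      # foul line to foul line
print(round(quarter_arc_ft, 1))               # 180.6
print(round(feet_per_degree(radius_ft), 2))   # 2.01
```

So "1 degree = 2 feet" holds to within about half a percent at that radius.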

Also remember that the SS and 2B are positioned around -16 and +16 degrees, where 0 degrees is the 2B bag.

Peter provided us with this data from HITf/x (FX) and GameDay (GD), based on spray angles in 4 degree slices:

Spray    FX     GD    rate
 -44    107     61    175%
 -40    244    134    182%
 -36    253    297     85%
 -32    309    389     79%
 -28    306    458     67%
 -24    316    347     91%
 -20    369    382     97%
 -16    312    429     73%
 -12    339    300    113%
  -8    328    218    150%
  -4    329    204    161%
   0    314    246    128%
   4    291    180    162%
   8    322    156    206%
  12    305    214    143%
  16    280    377     74%
  20    293    455     64%
  24    260    276     94%
  28    234    314     75%
  32    208    293     71%
  36    205    248     83%
  40    149    130    115%
  44     85     55    155%

rate is FX / GD.

We see that at -16 (meaning -14 to -18 degrees), groundballs are recorded by Gameday far more frequently than by HITf/x.  We see an enormous bias in the holes as well, in the opposite direction.

Now, let’s try an experiment.  Let’s say that the Gameday scorers agree with HITf/x perfectly on half of the batted ball locations, and are off by 4 degrees (8 feet) on the other half.  Let’s start with the 314 balls up the middle (-2 to +2 degrees).  Gameday marks 157 of those up the middle, and records the other 157 half to the left (-6 to -2) and half to the right (+2 to +6).

At -4 degrees (-6 to -2), HITf/x had 329 balls.  Gameday agrees with half of them, and the other half are all marked more toward the SS side (at -8 degrees).

So, this is what we have at -4 degrees as per Gameday:
164.5 balls that HITf/x marked at -4 degrees
78.5 balls that HITf/x marked at 0 degrees

That’s a total of 243 balls marked by Gameday at -4 degrees under this illustration.  That is closer than the 329 recorded by HITf/x, but still far from the 204 actually recorded by Gameday.

The same thing happens at -8 degrees: of the 328, 164 are properly marked by Gameday, the rest are marked toward the SS side (at -12 degrees).  And we go on until we get to -20 degrees, where the shift happens toward the 2B bag.  This is the result of all that:

Spray    FX     GD    FX/GD    Tango    Tango/GD
 -44    107     61    175%      53.5       88%
 -40    244    134    182%     175.5      131%
 -36    253    297     85%     248.5       84%
 -32    309    389     79%     281.0       72%
 -28    306    458     67%     307.5       67%
 -24    316    347     91%     311.0       90%
 -20    369    382     97%     498.5      130%
 -16    312    429     73%     510.0      119%
 -12    339    300    113%     333.5      111%
  -8    328    218    150%     328.5      151%
  -4    329    204    161%     243.0      119%
   0    314    246    128%     157.0       64%
   4    291    180    162%     224.0      124%
   8    322    156    206%     322.0      206%
  12    305    214    143%     305.0      143%
  16    280    377     74%     426.5      113%
  20    293    455     64%     423.0       93%
  24    260    276     94%     247.0       89%
  28    234    314     75%     221.0       70%
  32    208    293     71%     206.5       70%
  36    205    248     83%     177.0       71%
  40    149    130    115%     117.0       90%
  44     85     55    155%      42.5       77%
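The half-stay, half-shift rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the pivot at plus or minus 18 degrees are mine, chosen because that pivot reproduces the left-hand rows of the Tango column (for example 243.0 at -4 and 510.0 at -16). The sketch mirrors the rule on the right side, whereas the Tango values listed for +8 through +20 do not follow the mirrored rule, so it will differ from the table there.

```python
# HITf/x ground-ball counts per 4-degree slice, copied from the table above.
FX = {-44: 107, -40: 244, -36: 253, -32: 309, -28: 306, -24: 316,
      -20: 369, -16: 312, -12: 339, -8: 328, -4: 329, 0: 314,
      4: 291, 8: 322, 12: 305, 16: 280, 20: 293, 24: 260,
      28: 234, 32: 208, 36: 205, 40: 149, 44: 85}

def smear(fx, pivot=18, step=4, keep=0.5):
    """Keep `keep` of each slice in place and move the rest `step` degrees
    toward the nearer pivot (+/- pivot). The pivot value is an assumption."""
    out = {angle: 0.0 for angle in fx}
    for angle, count in fx.items():
        out[angle] += keep * count
        moved = (1 - keep) * count
        if angle == 0:
            # Up the middle: split the moved balls evenly to both sides.
            out[-step] += moved / 2
            out[step] += moved / 2
            continue
        side = 1 if angle > 0 else -1
        # Inside the pivot, move away from 2B; outside it, move back toward 2B.
        shift = step if abs(angle) < pivot else -step
        out[side * (abs(angle) + shift)] += moved
    return out

model = smear(FX)
print(model[-4], model[0])   # 243.0 157.0, the worked values from the text
```

Changing `keep` or `step` is an easy way to try the "different inputs" idea without rebuilding the table by hand.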

My illustration here shows that my model bridges some of the gap.  The standard deviation of the FX/GD rate across the slices is 42%, while that of Tango/GD is 34%.
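For anyone who wants to recompute those two spreads, here is a minimal Python sketch using the three columns from the table. It assumes the population form of the standard deviation (`statistics.pstdev`); the post does not say which form was used.

```python
from statistics import pstdev

# Columns copied from the table above, ordered from -44 to +44 degrees.
FX = [107, 244, 253, 309, 306, 316, 369, 312, 339, 328, 329, 314,
      291, 322, 305, 280, 293, 260, 234, 208, 205, 149, 85]
GD = [61, 134, 297, 389, 458, 347, 382, 429, 300, 218, 204, 246,
      180, 156, 214, 377, 455, 276, 314, 293, 248, 130, 55]
TANGO = [53.5, 175.5, 248.5, 281.0, 307.5, 311.0, 498.5, 510.0, 333.5,
         328.5, 243.0, 157.0, 224.0, 322.0, 305.0, 426.5, 423.0, 247.0,
         221.0, 206.5, 177.0, 117.0, 42.5]

# Per-slice ratios, then their spread across the 23 slices.
fx_rate = [f / g for f, g in zip(FX, GD)]
tango_rate = [t / g for t, g in zip(TANGO, GD)]

sd_fx = pstdev(fx_rate)
sd_tango = pstdev(tango_rate)
print(f"FX/GD spread: {sd_fx:.0%}   Tango/GD spread: {sd_tango:.0%}")
```

A smaller spread means the modeled counts track Gameday more evenly across slices, which is the sense in which the model "bridges some of the gap".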

And trying different inputs didn’t make much difference.  If I treat anything between 50% and 75% of the HITf/x data as being perfectly recorded by Gameday, the remaining balls in play are about 4 degrees (about 8 feet) biased toward where the fielder is positioned.

I think we can try to construct a more elaborate model, and we’ll probably end up at the following: about half the data from Gameday will match HITf/x, and the other half will be off by 2 to 8 degrees (4 to 16 feet).  The amount it will be off will be biased by where a fielder is normally positioned, by whether a play was made or not, or by how much space there is between fielders (the holes).

This is the framework I’m proposing.  Implementations will vary.


#1    Rally      2010/11/29 (Mon) @ 14:45

I wanted to compare the gameday to hitf/x sample (.csv for April 2009 provided by Harry Pavlidis) on a hit by hit basis, but what stopped me was the hit f/x sample did not have the inning listed.

I’m pretty confident that with gameid, batter, pitcher, inning, and result (1b, 2b, out, etc.) I could match 95-99% of the rows and have a small leftover sample to hand match (basically what I do with retrosheet and gameday hit location).

But without inning it becomes a truly unappealing task.  You can of course learn something by comparing the aggregate distributions from both systems, but I was hoping to see how often disagreements occur, and how large to expect them.  Anyone else tried matching the samples?


#2    Peter Jensen      2010/11/29 (Mon) @ 14:55

I’ve done it.


#3    Peter Jensen      2010/11/29 (Mon) @ 14:59

Rally - I believe that the original Hit Fx data that we were given had an inning code that was actually a half inning code.  Something like 1 was the top of the first inning, 2 the bottom of the first, 3 the top of the 2nd, and so on.  Does the data Harry gave you have anything that looks like that?
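If the code really does work the way described above (1 = top of the 1st, 2 = bottom of the 1st, 3 = top of the 2nd), decoding it is simple. The sketch below is in Python and assumes exactly that scheme; the function name is hypothetical and the actual HITf/x field may differ.

```python
def decode_half_inning(code: int) -> tuple[int, str]:
    """Turn a running half-inning counter into (inning, half).

    Assumes odd codes are the top half and even codes the bottom half,
    starting from 1 for the top of the first inning.
    """
    if code < 1:
        raise ValueError("half-inning code must be 1 or greater")
    inning = (code + 1) // 2
    half = "top" if code % 2 == 1 else "bottom"
    return inning, half

print(decode_half_inning(1))   # (1, 'top')
print(decode_half_inning(2))   # (1, 'bottom')
print(decode_half_inning(3))   # (2, 'top')
```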


#4    Chris Dial      2010/11/29 (Mon) @ 15:10

I’ve done it.

Come here, Watson, I need you!


#5    Rally      2010/11/29 (Mon) @ 15:37

I’ll check tonight when I get a chance.  The data I have is just what Harry posted, and was linked to here.

Either I missed it, or Harry left it out when he posted the .csv file.


#6    Colin Wyers      2010/11/29 (Mon) @ 15:40

I don’t have Harry’s CSV file, but IIRC the useful data for merging Hit F/X with Gameday was contained in the pitches table - you’d join hits up with pitches and that’d give you those attributes. I could be wrong, it’s been over a year since I looked at that.


#7    2010/11/29 (Mon) @ 16:08

“the useful data for merging Hit F/X with Gameday was contained in the pitches table - you’d join hits up with pitches and that’d give you those attributes”

That’s what I did.  I matched on sv_id at the pitch level plus the home team (because there are a few duplicates in the sv_id field).
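A minimal Python sketch of that kind of match, keyed on sv_id plus the home team, might look like the following. The row layout, the `home_team` field name, and the sample values are made up for illustration; only `sv_id` is named in the comment above.

```python
def match_on_sv_id(hitfx_rows, gameday_rows):
    """Pair rows that share (sv_id, home_team); leave ambiguous keys unmatched."""
    index = {}
    for row in gameday_rows:
        index.setdefault((row["sv_id"], row["home_team"]), []).append(row)

    matched, unmatched = [], []
    for hit in hitfx_rows:
        candidates = index.get((hit["sv_id"], hit["home_team"]), [])
        if len(candidates) == 1:
            matched.append((hit, candidates[0]))
        else:
            # No pitch found, or more than one: set aside for hand matching.
            unmatched.append(hit)
    return matched, unmatched

# Made-up rows: the same sv_id appears in two parks, so the team breaks the tie.
gameday = [
    {"sv_id": "090406_130512", "home_team": "AAA", "event": "Groundout"},
    {"sv_id": "090406_130512", "home_team": "BBB", "event": "Single"},
]
hitfx = [{"sv_id": "090406_130512", "home_team": "BBB", "angle": -12.0}]

matched, unmatched = match_on_sv_id(hitfx, gameday)
print(matched[0][1]["event"])   # Single
```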


#8    2010/11/29 (Mon) @ 16:50

Sorry to ask a question that’s probably been asked elsewhere, but my searches on it have led to nothing: are there plans for Hit F/X data to be available to the public in the same way as the Pitch F/X data this year?


#9    2010/11/29 (Mon) @ 17:09

Geoff/8, no.  Sportvision released a month’s worth of HITf/x data from April 2009, but they have not released any further HITf/x data, and nothing from Sportvision has indicated that they are likely to release any additional HITf/x data to the public.


#10    2010/11/29 (Mon) @ 17:16

Thanks, Mike. I was actually starting to do some investigation into this data last night. I heard that the Gameday data has to be adjusted for each park, is that just for distance or is the orientation of the parks also different on each one? (I know there’s some distance issues due to the overlay graphic they use to mark hit locations or actually where the ball stopped moving on the Gameday app.)


#11    2010/11/29 (Mon) @ 17:29

Geoff, I use a home plate origin of hit_x = 125.5 and hit_y = 203.  For distance, I generally use a generic multiplier of around 2.4 feet/pixel, but that’s not completely accurate for all balls.  The infield and outfield are to different scales on the Gameday diagrams (or perhaps you would say they are not to scale).  The infield is actually close to a scale of 2.63 feet/pixel.  The outfield is more like a scale of 2.2 feet/pixel, but it’s not completely proportional.  Balls to each field may not have the same multiplier.  I haven’t typically done work with this data where I have the need to have a perfectly accurate distance from the plate.  I’ve done more work with the spray angles.

The park diagrams should all be roughly the same, with the exception of data for 2007 and earlier and the fact that the outfield dimensions may not all scale the same.  On the last point, what I am trying to say is that since within a given park the fences in the outfield are not all drawn to the same scale, it’s unlikely that the outfield fences are all mis-drawn the same way in all the parks.

And yes, you are correct that Gameday marks the fielding location, not the landing location.
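The origin and multiplier quoted above can be turned into a spray angle and a rough distance with a few lines of Python. This is a sketch under assumptions that are mine, not the commenter's: the pixel y axis is taken to grow downward (so the field is "up" from home plate), 0 degrees points at 2B, and negative angles point to the 3B side. The function name is hypothetical.

```python
import math

HOME_X, HOME_Y = 125.5, 203.0   # home plate origin quoted above
FEET_PER_PIXEL = 2.4            # generic multiplier quoted above

def gameday_to_polar(hit_x, hit_y, feet_per_pixel=FEET_PER_PIXEL):
    """Return (spray angle in degrees, distance in feet) for a Gameday point."""
    dx = hit_x - HOME_X
    dy = HOME_Y - hit_y          # assumes pixel y grows downward
    angle = math.degrees(math.atan2(dx, dy))
    distance = math.hypot(dx, dy) * feet_per_pixel
    return angle, distance

angle, distance = gameday_to_polar(125.5, 103.0)
print(round(angle, 1), round(distance, 1))   # 0.0 240.0
```

Passing `feet_per_pixel=2.63` for infield balls or `2.2` for outfield balls would follow the separate scales mentioned above, though, as noted there, neither is exact.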


#12    2010/11/29 (Mon) @ 17:51

Rally, I sent a file of HITf/x data cross-mapped to Gameday data to your Comcast email address.


#13    MGL      2010/11/30 (Tue) @ 00:06

How does the hit f/x mark the location for each type of batted ball?  Do they first indicate whether it is an air ball or a ground ball or do they just give the speed, angle of elevation and angle of direction?  Do they give any spin or does the user have to infer the lateral trajectory assuming average spin given the angle of direction?  For the ground balls, is it assumed that they all go straight after they leave the home plate area?  Etc.


#14    Peter Jensen      2010/11/30 (Tue) @ 01:35

MGL - Hit Fx doesn’t attempt to map the trajectory of the ball.  It only gives the speed of the ball as it comes off the bat, plus the initial vertical and horizontal angles as the ball leaves the bat.  It does not calculate any spin information nor does it infer any.  For the data I presented here I only compared the initial horizontal angle off the bat to the eventual location that gameday recorded as where the ball was fielded.  The Gameday location was calculated by me using the translation parameters for each MLB field that I had estimated.


#15    MGL      2010/11/30 (Tue) @ 02:20

Got it, that is what I thought.  Shouldn’t make much difference for ground balls, but air balls down the lines have lots of spin such that initial horizontal angles might be very different from where they end up, right?


#16    Brian Cartwright      2010/11/30 (Tue) @ 02:24

Rally, here’s the sql code I used to create a table from hit f/x which can be linked to Gameday using gamename and event_id. Ran in 24s just now when I tested to make sure it worked.

-- Build a HITf/x table that links to Gameday on (gamename, event_id).
drop table if exists test.hitfx;
create table test.hitfx
select
  concat('gid_',hb.mlbam_game_id) as gamename,
  a.number as event_id,
  br.mlb_id as resp_bat_id,
  pr.mlb_id as resp_pit_id,
  hb.hit_initial_speed,
  hb.hit_horizontal_angle,
  hb.hit_vertical_angle,
  a.event,
  a.description
from hitfx.hitballs as hb
-- the action recorded for the pitch that was hit
inner join hitfx.actions as a
   on hb.sv_game_id=a.game_id
  and hb.sv_db_pitch_id=a.pitch_id
-- the pitch row supplies batter_id and pitcher_id
-- (the original joined hitfx.pitches twice; one join is enough)
inner join hitfx.pitches as p
   on hb.sv_game_id=p.game_id
  and hb.sv_db_pitch_id=p.id
-- look up the MLB ids for the batter and the pitcher
inner join hitfx.players as br
   on p.batter_id=br.id
inner join hitfx.players as pr
   on p.pitcher_id=pr.id;

-- store the MLB ids as integers
alter table test.hitfx
  modify column resp_bat_id int(6),
  modify column resp_pit_id int(6);


#17    Colin Wyers      2010/11/30 (Tue) @ 02:37

“Got it, that is what I thought.  Shouldn’t make much difference for ground balls, but air balls down the lines have lots of spin such that initial horizontal angles might be very different from where they end up, right?”

Air balls all over the field will have pretty dramatic spin effects, compared to ground balls - the farther the distance to landing point, the greater the effect of sidespin. For the purposes of the number Peter presents, sidespin shouldn’t present much of a difference - once the ball lands that “deadens” most of the spin effects and it should roll mostly in a straight line.

What Peter has mentioned is that balls that make it to the outfield should present slightly more angle error than balls in the infield, due to the sidespin - I don’t know if that’s necessarily so, because we’re restricting ourselves to ground balls, most of which land in front of the infielder regardless.

And sidespin should correlate pretty well with horizontal angle - balls should slice a bit more toward the line than the horizontal angle off the bat would imply, in the aggregate. If you look at the ground balls fielded by outfielders, you do see balls shaded more toward the line in Gameday than Hit F/X on each side of the corner outfielders. But for ground balls fielded by the center fielder, they seem to be shaded closer to the CF than the lines, compared to Hit F/X. Again, due to sidespin, if scorers were marking the location objectively we should expect the opposite - the location of GB fielded by the center fielder should be MORE spread out than Hit F/X would lead us to believe.

(The other concern is what happens if a ball is deflected - the final fielding point will then appear radically different from the initial trajectory. I don’t know what Peter did in those cases, if anything. That certainly wouldn’t create the appearance of range bias, though.)


#18    Alan Nathan      2010/11/30 (Tue) @ 03:44

Those who attended the 2009 PITCHf/x summit may recall my presentation where I combined HITf/x data on the initial velocity vector (speed and angles) with Greg R. hittracker data on landing position and hang time (for home runs only).  One of the things I looked at was the difference between the initial and final spray angles, which I then used to infer the sidespin on the batted ball.  One of the very interesting things I found was an asymmetry between left-handed and right-handed batters, which was most apparent on hits toward centerfield.  While balls hit to LF and RF curve towards the line, those toward CF always slice, meaning they break towards RF for a RHH and LF for a LHH.  My interpretation of what is going on is that the bat is tilted downward upon contact, which leads to that sort of behavior for balls that are undercut (to give primarily backspin).

I have since examined this phenomenon for a large collection of batted balls for which I had the good fortune to obtain data from TrackMan.  The data included the initial velocity vector, landing point, and hang time, where the latter two are extrapolated to field level.  I see the same slicing behavior for balls hit in the general vicinity of CF.  I had hoped (and am still hoping) that data such as these might help me develop my model for the spin on a batted ball (the subject of my summit talk this year).  Still working on it.

If anyone is interested, I can post a plot of my results.


#19    Brian      2010/11/30 (Tue) @ 10:32

Does anyone know how Gameday data is influenced by the diagrams of each park? I ask because at least one of those parks (the Cell in Chicago or Busch in St. Louis, I forget which) is at least 10 feet off down the foul line. You can see it in every Gameday presentation. That is a pretty big error, isn’t it?


#20    Tangotiger      2010/11/30 (Tue) @ 10:37

Alan: the answer to any question that starts with:

“If anyone is interested, I can...”

is yes.  There’ll always be at least one person interested, so, please, post away.


#21    Rally      2010/11/30 (Tue) @ 12:18

I took the data Mike sent me, and refigured the gameday angle using Peter Jensen’s park-specific home plate location.  Then I compared it to the hit fx horizontal angle (-90 degrees).  There are some real outliers, but it’s not all bad data from gameday.  Some of the hit fx angles are not possible, such as a groundout to short with a -169 horizontal angle.

Anyway, I looked at how often you find differences in the angle for groundballs that are fielded by position 3 or higher.

Diff Cumulative freq
0 426
1 1229
2 2026
3 2767
4 3380
5 3952
8 5094
10 5493
15 5841
20 5910

out of 6006 ground balls.  So +- 5 degrees is about one standard deviation.


#22    Tangotiger      2010/11/30 (Tue) @ 12:39

So, one-third is within 0-2 degrees.  Remember that one pixel is roughly 2 degrees.  I had speculated that 50% of the balls might be “perfect”, and we have instead about one third.

I speculated half would be off 2-8 degrees, and that’s exactly what we have (5094 minus 2026 divided by 6006).

Where I erred was in missing the total wrongness on one-sixth of the data, which is off by 8+ degrees.

Good job guys.  This is what I mean, that as baseball-loving fans, we can construct reasonable models, and fill it in with data as it becomes available.
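The shares quoted in this comment follow directly from the cumulative counts posted in #21. A minimal Python sketch of that arithmetic, with the counts copied from the table and variable names of my own choosing:

```python
TOTAL = 6006   # ground balls in the matched sample

# Cumulative count of balls whose angle difference is at most N degrees.
CUMULATIVE = {0: 426, 1: 1229, 2: 2026, 3: 2767, 4: 3380,
              5: 3952, 8: 5094, 10: 5493, 15: 5841, 20: 5910}

within_2 = CUMULATIVE[2] / TOTAL                      # "perfect" matches
off_2_to_8 = (CUMULATIVE[8] - CUMULATIVE[2]) / TOTAL  # moderately off
beyond_8 = (TOTAL - CUMULATIVE[8]) / TOTAL            # badly off

print(f"{within_2:.1%} {off_2_to_8:.1%} {beyond_8:.1%}")   # 33.7% 51.1% 15.2%
```

The three buckets add up to the whole sample, so about a third, a half, and a sixth is a fair summary.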


#23    Guy      2010/11/30 (Tue) @ 12:48

Rally:
Is it possible for you to calculate the out rate on balls that are classified by GD as closer to the nearest fielder than is true (according to hitf/x), those that are classified correctly, and those (if there are any) classified too far from the fielder?  Or turn it around:  are outs more likely than hits/errors to be scored as closer to a fielder than is true?


#24    Tangotiger      2010/11/30 (Tue) @ 12:52

Excellent suggestion Guy.


#25    Rally      2010/11/30 (Tue) @ 16:08

Easily?  No.  It would take plenty of time just to figure out how to approach that.

What I’ll do is post my dataset.  I took the stuff Mike sent me and linked the groundballs to retrosheet fields.  So if anyone wants to look deeper into this stuff, have at it.

http://www.baseballprojection.com/special.htm

