THE BOOK--Playing The Percentages In Baseball


Wednesday, September 12, 2007

Clustering pitches

By Tangotiger, 10:36 PM

Hat tip Kevin.

Some great stuff is being done, this time with clustering pitches to figure out what pitchers are throwing.  See also the links in the comments on that page from the equally impressive Mike Fast.  Seriously, there’s a good dozen bright bulbs out there working their hearts out, bringing the work to the masses.  It feels great to sit back and watch this unfold.


#1    MGL      (see all posts) 2007/09/13 (Thu) @ 16:28

Is everyone still using MLB’s convention for the vertical break (is “vertical break” up and down movement?), which is the difference between a pitch thrown with no spin and what the pitch actually does?  That is preposterous and has to stop!  When looking at the vertical and horizontal break numbers, we want to get a visual idea of what the pitch does.  Since we have NO IDEA of how much a pitch thrown at 82 mph with no spin is supposed to drop, we also have no idea what a pitch at -3 inches (as compared to a no-spin pitch) does!  Does it rise?  Does it drop?  Does it stay perfectly parallel to the ground?

Is there anyone else who is bugged by this?

Can you imagine getting stopped by a cop for speeding and he says, “Do you have any idea how fast you were going?” You respond, “No.” He then says, “6.8 mph more than a 210 HP car with 40 pounds of torque and a 60% aspirated gas mixture in the carburetor, with no head or tail wind.  Don’t you think that is a little too fast for this neighborhood?”

Can Walsh, Sheehan, and all these other great researchers and programmers start telling us how much a pitch drops and how much it breaks from side to side - please?!  It will only take you about 2 extra minutes of code.


#2    Mike Fast      (see all posts) 2007/09/13 (Thu) @ 17:40

MGL, the vertical break as used in PITCHf/x makes much more sense to me than what you are proposing.

A fastball has backspin, so it “rises” compared to a theoretical spinless pitch.  A curveball has topspin, so it drops compared to the spinless pitch.

I can easily tell what a pitch with a vertical break of +6 inches did versus a pitch with a vertical break of -1 inch.  The first is a fastball, the second is a curveball.

What can you tell about the same two pitches if I tell you that one dropped 35.3 inches overall and another dropped 52.6 inches overall?  The first happens to be the curveball and the second was the fastball.

All pitches drop, and they drop a lot (with the exception maybe of a pitch from a submariner), and that drop due to gravity, left in the data, wipes out the ability to see most of the interesting things.
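To put a rough number on how large that gravity drop is, here is a minimal sketch that ignores drag and spin entirely and assumes a constant speed over an assumed 55 feet of flight. The function name and the 55-foot distance are illustrative assumptions, not anything PITCHf/x reports.

```python
G_FT_S2 = 32.2               # gravitational acceleration, ft/s^2
MPH_TO_FT_S = 5280 / 3600    # miles per hour to feet per second


def gravity_drop_inches(speed_mph, distance_ft=55.0):
    """Drop below the initial line of flight caused by gravity alone.

    Assumes constant speed (no drag) and no spin; distance_ft is an
    assumed release-to-plate distance.
    """
    flight_time_s = distance_ft / (speed_mph * MPH_TO_FT_S)
    return 0.5 * G_FT_S2 * flight_time_s ** 2 * 12


for mph in (75, 82, 90):
    print(mph, "mph:", round(gravity_drop_inches(mph), 1), "inches")
```

Under these assumptions an 82 mph pitch falls roughly 40 inches below its initial line of flight, which is several times larger than the spin-induced break being discussed in this thread.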

In addition, the overall x-z movement from release to plate is affected by the pitcher’s delivery and where he aims the ball.  That data is interesting on its own terms, but when you’re trying to look at groups of pitches for comparison by pitch type, for example, it muddies the data and makes pitch-to-pitch comparisons virtually impossible.

There are some folks (Dan Fox, Harry Pavlidis) who are doing 3-D plots of pitches that basically have the data you desire.  Or you can go to MLB’s Gameday app itself and see those plots for any pitch you want.  I know some people find those plots fascinating, but I don’t because all the pitches look basically the same to me--a parabolic arc down from the mound toward the strike zone. (grin)

When you remove the effect of gravity, as the pfx_z parameter reported by PITCHf/x does, that’s when the data really gets interesting, in my opinion.


#3    joe p      (see all posts) 2007/09/13 (Thu) @ 18:51

Are you asking for the differences between the release point and where the ball crosses the plate, like...a Verlander fastball drops (roughly) 3.4 feet and moves 2.1 feet to his left, while his curve drops 4.1 feet and moves 2.8 feet to his left?

As a point of reference, (like putting a radar gun on that cool car) I think the non-spinning version of Verlander’s fastball drops 4.6 feet and moves 3.3 feet to his left.  His “non-spinning curve” drops 3.5 feet and moves 2.3 feet to his left.  I got these non-spinning values by calculating the path of his regular version of that pitch, but assuming that the only force acting on the pitch the whole time was gravity.  I think this is right, but if someone with physics knowledge doesn’t agree, let me know.

I think break is a really hard thing to define, which is why the definitions MLB gives are a bit weird, and maybe why they tried using that ridiculous “break” value during the playoffs last year.  A curveball has a big break, but do you measure it from the top of its parabola or where it left the pitcher’s hand or somewhere else entirely?


#4    Mike Fast      (see all posts) 2007/09/13 (Thu) @ 19:25

Joe, doesn’t the absolute x-z movement of a pitch depend primarily on (1) where the pitcher released it and (2) where it was located in the strike zone, more so than on the movement, or break, of the pitch?  The difference in break between a typical fastball and a typical curveball is about a foot or less, and the typical strike zone is two feet tall.  The release point can also vary on the order of a foot, or more if the pitcher comes sidearm sometimes.

So your stats on Verlander may mean that he likes to locate his fastball down and in to righties, or that he comes down lower when he releases his curveball and throws his fastball more overhand.  I don’t think it tells us anything about the “real” break on the two pitches, since a curveball drops more than a fastball, ceteris paribus.


#5    joe p      (see all posts) 2007/09/13 (Thu) @ 20:45

Mike, I agree with you.  The drop values are very dependent on where the pitch crosses the plate.  Where the pitch crosses the plate is calculated based on 9 initial parameters though, and I’m not sure what is causing what.  Is a curveball that ends up low in the strike zone thrown low because that’s where the pitcher wants it, so he does something extra, different from his normal curveball release, to impart more spin?  Or is he just throwing a curve, and when a pitch is thrown with curveball-specific values for the parameters, it ends up low in the zone?  I’m not sure if I’m being clear with what I mean (or if the two options are even different) but either way, I prefer to compare pitches to their non-spinning counterparts and use the pfx values, even though those values are abstract.


#6    MGL      (see all posts) 2007/09/15 (Sat) @ 00:02

I see your point, Mike, and I have to think about it a bit more.  I am away from home and have not been able to get on the internet much, or do any baseball work.


#7    MGL      (see all posts) 2007/09/15 (Sat) @ 00:05

MLB has a lot of time, money, and manpower to devote to these things.  I should not be so quick or presumptuous to think that I have a better method than they or that they have not already thought about this carefully.


#8    Mike Fast      (see all posts) 2007/09/15 (Sat) @ 13:54

MGL, you had a very important point about much of the PITCHf/x data and the things written based on it not being presented in a consistent or intuitive fashion.

I think it’s important that the researchers in the field develop a common language so that we and the general baseball/sabermetric public can understand what we are saying.  That would help the information be more easily used by people in other parts of the discipline, which would in turn result in better analysis.

Part of the problem is that the PITCHf/x analysis is still in its infancy (only five months old, with most of the analysis taking place in the last three months).  Nobody really knows yet what the important parts of the data are.  Every rock that gets turned over has something new or surprising underneath it, and we’ve barely scratched the surface of what can be done with the data.  There are tasks of all sorts waiting to be tackled.

It makes me laugh when Gary Huckabay says that the analysis of quantifying player performance is dead.  PITCHf/x is about to release a tidal wave on the quantification of player performance.  Huckabay’s not going to know what hit him, it will be so revolutionary.


#9    MGL      (see all posts) 2007/09/15 (Sat) @ 15:04

How do they determine where a pitch should end up if it had no spin?  It seems to me that there are 3 determinants:  One, average pitch speed (or initial speed plus the density of the air), which determines the time to the plate and hence the amount gravity will cause the ball to drop.  Two, the trajectory upon release, and three, the height of the release point.  Can they determine trajectory with that much accuracy?  If they are off by just a little, doesn’t that screw up their numbers?  For example, let’s say that I throw a ball at 90 mph from 7 feet off the ground with a certain trajectory.  Let’s say that a no-spin ball with that same trajectory (and speed and height) should cross the front of the plate 2 feet off the ground.  Now let’s say that it crosses the plate 3 feet off the ground.  PITCHf/x would determine that the underspin I put on the ball caused it to arrive 1 foot higher than a no-spin ball would, and they would list the vertical “drop” at -1 (or maybe +1, I don’t know the convention).  But what if they were a little off on their estimate of the trajectory?  Maybe the reason it arrived 1 foot higher than a no-spin ball would arrive is that my trajectory was higher than they thought.  In that case, I might have had no spin on the ball.

Or do they not measure trajectory at all?  Do they look at the entire path of the ball and then use the computer to analyze that and the speed, etc., to determine the spin on the ball (the spin affects the arc/path)?

Is there somewhere where there is a good primer/explanation of exactly how they measure the pitch and come up with the numbers they do?


#10    Mike Fast      (see all posts) 2007/09/15 (Sat) @ 15:14

Some of the answers are here at Dr. Alan Nathan’s Physics of Baseball website:
http://webusers.npl.uiuc.edu/~a-nathan/pob/technology.htm

You might also find his paper on solving pitch trajectories interesting:
http://webusers.npl.uiuc.edu/~a-nathan/pob/Analysis.pdf

I’ve based a lot of my work on that paper.

You’ll find other good stuff if you wander around his site.  He used to have a glossary of PITCHf/x terms on his site, but that seems to have been taken down.  You can use the glossary on my site as a poor substitute:
http://fastballs.wordpress.com/2007/08/02/glossary-of-the-gameday-pitch-fields/


#11          (see all posts) 2007/09/17 (Mon) @ 13:59

MGL,

Following up on Mike Fast’s point, here’s a good link from that site: http://webusers.npl.uiuc.edu/~a-nathan/pob/tracking.htm

pitchFx is not the difference between the actual pitch location and the location of the same pitch without spin. A ball with no spin will knuckle. It would be closer (although not quite accurate) to think of pitchFx being the deviation between the actual pitch location, and the expected pitch location if the ball did not have raised stitches. Another way to look at it is the deviation between the actual pitch location and the expected pitch location if the ball did not accelerate after it left the hand of the pitcher except for the acceleration due to gravity and drag.

My understanding of the process:
(1) Cameras take 60fps pictures as the pitch is in the air.  The pixel location of the ball at each point in time, t, is determined.

(2) The pixel location data is fit to a model (simple least squares).  The model includes terms for start location, initial velocity, and acceleration along all 3 dimensions (9 parameters).  x(t) = x0 + vx0*t + .5 * ax0*t^2

(3) Using these parameters, the pitch location when the ball crosses the plate is calculated. (i.e., if the ball crossed the plate at t=340ms, x(340) = x0 + vx0*.34 + .5 * ax0 * .34^2)

(4) The predicted location of the ball if there were no acceleration due to the raised stitches is computed by the same equation, excluding the acceleration term (except in the Z dimension, where the acceleration is set equal to the acceleration due to gravity). (i.e., x(340) = x0 + vx0 * .34)

(5) The difference between the actual and the prediction sans acceleration is computed along each dimension, giving pfx_x and pfx_z.

I think this is oversimplified (for instance, unless the ball is thrown with no initial velocity in the X and Z dimensions, acceleration due to drag will affect the ball’s trajectory; I think the model accounts for drag in step (4)), but it may be helpful nonetheless.
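Steps (2) through (5) can be sketched for a single coordinate as follows. This is a minimal illustration only: it fits positions that are already in feet rather than camera pixels, the sample values are made up, and the function names are hypothetical. It uses an ordinary least-squares fit solved through the normal equations, which is not necessarily how the real system does it.

```python
def fit_constant_acceleration(times, positions):
    """Least-squares fit of p(t) = p0 + v0*t + 0.5*a*t^2; returns (p0, v0, a)."""
    # Normal equations for the quadratic c0 + c1*t + c2*t^2.
    s = [sum(t ** k for t in times) for k in range(5)]
    b = [sum(p * t ** k for t, p in zip(times, positions)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * w for u, w in zip(m[r], m[col])]
    c0, c1, c2 = (m[i][3] / m[i][i] for i in range(3))
    return c0, c1, 2.0 * c2   # the t^2 coefficient is a/2


def position_at(t, p0, v0, a):
    return p0 + v0 * t + 0.5 * a * t ** 2


# Synthetic 60 fps samples of the x coordinate (made-up values, feet/seconds).
times = [i / 60 for i in range(20)]
xs = [position_at(t, -2.0, 6.0, -10.0) for t in times]

x0, vx0, ax = fit_constant_acceleration(times, xs)   # step (2)
t_plate = 0.34
actual = position_at(t_plate, x0, vx0, ax)           # step (3)
no_accel = position_at(t_plate, x0, vx0, 0.0)        # step (4), x dimension
pfx_x = actual - no_accel                            # step (5)
print(round(ax, 3), round(pfx_x, 3))
```

The same fit would be repeated for y and z to get all nine parameters; in the z dimension step (4) would keep the gravity term instead of setting the acceleration to zero.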


#12    Mike Fast      (see all posts) 2007/09/18 (Tue) @ 11:31

CDM, thanks for the correct glossary link from Dr. Nathan’s site.  Somehow I had missed where that had moved, or perhaps I had the wrong link from the beginning.

You are right that it’s not perfectly accurate to say that the PITCHf/x system reports the break as compared to a non-spinning pitch.  It’s more accurately stated that it is compared to a theoretical pitch that is not affected by the Magnus force. (I think they include the effect of drag on the theoretical pitch, but I’m not 100% sure about that.)

The three main forces on a pitched ball are gravity, drag, and the Magnus force.  The Magnus force is the one that makes the ball break.  The stitches influence the drag but don’t really affect the break on a ball that’s spinning at typical spin rates (500-3500 rpm). 

However, as you pointed out, the knuckleball is an exception and the reason we can’t technically say the pitches are being compared to a “spinless” pitch.  When a pitched ball rotates very, very slowly, the seam orientation matters and will affect the movement, or break, on the pitch. 

I’ve read in the past that Wakefield likes to put 1/4 to 1/2 a rotation on the ball between his hand and the plate.  That corresponds to ~40 rpm, which is an order of magnitude lower than even the splitter/forkball.
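The arithmetic behind that figure can be checked with a short sketch. The 0.55-second flight time is an assumed value for a slow pitch, not a measured one.

```python
def rpm_from_rotations(rotations, flight_time_s):
    """Convert rotations completed during the flight into revolutions per minute."""
    return rotations / flight_time_s * 60


flight_time_s = 0.55   # assumed hand-to-plate time for a knuckleball
low_rpm = rpm_from_rotations(0.25, flight_time_s)
high_rpm = rpm_from_rotations(0.50, flight_time_s)
print(round(low_rpm), "to", round(high_rpm), "rpm")
```

With that assumed flight time, a quarter to half a rotation works out to a range that brackets the ~40 rpm quoted above.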

The PITCHf/x model accounts for gravity, drag, and the Magnus force.  It does not account for the effect of the stitches or the wind, as far as I know.  That works as a very good approximation except in the case of the knuckleball.


#13          (see all posts) 2007/09/18 (Tue) @ 12:40

I’m no physicist, and I’m not sure the distinction is too meaningful, but the PITCHf/x model doesn’t account for Magnus force per se.  All it sees is the trajectory of the ball, and from that estimates location, velocity and acceleration.  It assumes a constant gravity (a reasonable assumption, wink), and I think it can partial out the drag by looking at acceleration in the Y dimension.  But it can’t dissociate the effects of the Magnus force and wind, or the effect of stitches (though you’re right; I was mistaken in thinking that the stitches were what gave rise to the Magnus force), or any other factor that causes acceleration.

More relevant, though, is that the knuckleball won’t be modeled by PITCHf/x properly, because the model assumes a constant acceleration in each dimension.  Knuckleballs will change acceleration as the ball spins.  Thus, when PITCHf/x tries to find a least-squares fit to the frame-by-frame pitch location, it’s not going to fit the data properly.  I don’t know the algorithm it uses, but it seems likely that this would cause it to get everything--release point, p_x, p_z, and pfx_x and _z--wrong.  Wakefield’s release points, though, don’t look too out of the ordinary…

On an aside, Wakefield is the only pitcher I’ve found who regularly throws pitches that PITCHf/x believes have an acceleration in the Z dimension that is on par with gravity.  Other pitchers I’ve seen only have a few outliers hit -35 ft/s^2.  Weird.


#14    Alan Nathan      (see all posts) 2007/09/18 (Tue) @ 13:06

I am a bit late jumping in on this thread, so pardon me if I am covering material that everyone knows about already.  Some random comments:

1.  I think everyone understands correctly that the numbers provided by PITCHf/x are based on a 9-parameter constant acceleration fit to the “raw” data (i.e., the location of the pitch in each frame determined by the cameras and their calibration).

2.  I have done some modeling to confirm that for “normal” pitches (i.e., non-knuckleballs), the 9-p fit is an excellent representation of the actual trajectory.  That is, I start with a trajectory calculated using a model for drag and Magnus forces.  I then do a 9-parameter fit to the trajectory and observe that it is an excellent representation of the calculated one.  I conclude that the PITCHf/x technique is a reasonable one.

3.  I have not yet tried to apply this procedure to a knuckleball (which is hard to model but not impossible).  I will try that soon and see what happens.  It is interesting that the z acceleration for Wakefield is roughly consistent with g (32.2 ft/s^2).  For a non-spinning ball with no “extra” forces, that is exactly what I would expect.  In this case, the extra forces would come from the interaction of the air with the stitches, which is responsible for the movement on a k-ball.  I completely agree with the comment made by one of you that the 9-p fit to a k-ball might not get any of the parameters right.

4.  To figure out a non-Magnus trajectory, my suggestion is to recompute the trajectory with a_z set to g (-32.2 ft/s^2) and a_x set to 0 (but leave a_y alone, since the drag is mostly there).  In fact, that is exactly what is done with PITCHf/x when they report pfx_x and pfx_z.

5.  Following up on the previous point, there is a somewhat better way to do the non-Magnus trajectory (but it is only marginally better).  It comes from recognizing that if the initial velocity of the ball is not totally in the y direction, there will be a little drag in the other directions.  So in that case, setting a_z equal to g and a_x to 0 is not quite the right thing to do.  It is straightforward to modify that procedure and I will write it up for my web site sometime soon.
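The simple recipe in point 4 can be sketched from the nine fitted parameters as below. Several things here are assumptions made for illustration: the dictionary keys, the sample numbers, the convention that y is the distance from the back tip of home plate (so the front edge sits at about 1.417 ft), and the choice to measure the deviation over the whole flight from the fitted starting point. The small drag correction from point 5 is not included.

```python
import math

G = 32.2   # magnitude of gravitational acceleration, ft/s^2


def time_to_plate(y0, vy0, ay, y_plate=1.417):
    """Smallest positive t solving y0 + vy0*t + 0.5*ay*t^2 = y_plate."""
    a, b, c = 0.5 * ay, vy0, y0 - y_plate
    if abs(a) < 1e-12:
        return -c / b
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
    return min(t for t in roots if t > 0)


def pfx(params):
    """Deviation at the plate between the fitted path and a non-Magnus path.

    The non-Magnus path keeps a_y, sets a_x to 0 and a_z to -G, so both
    paths reach the plate at the same time t.
    """
    t = time_to_plate(params["y0"], params["vy0"], params["ay"])
    pfx_x = 0.5 * params["ax"] * t ** 2
    pfx_z = 0.5 * (params["az"] + G) * t ** 2
    return pfx_x, pfx_z, t


# Made-up fit for a fastball-like pitch (feet, ft/s, ft/s^2).
fastball = {"y0": 50.0, "vy0": -130.0, "ay": 25.0, "ax": -8.0, "az": -18.0}
dx, dz, t = pfx(fastball)
print(round(t, 3), round(dx * 12, 1), round(dz * 12, 1))   # seconds, inches, inches
```

A fitted a_z that is less negative than -G gives a positive pfx_z (the backspin "rise"), while a pitch whose a_z equals -G gives a pfx_z of zero.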


#15          (see all posts) 2007/09/18 (Tue) @ 17:46

A brief addendum to my earlier post.  Responding to post #3 in this thread (Joe P.), who talks about the definition of “break” with PITCHf/x.  As explained in my glossary, the data file utilizes two different definitions of break.  The “pfx_x” and “pfx_z” values are the deviation at home plate from the location expected if the ball were not spinning (ignoring any k-ball effects).  This is what one might call the “physicist’s” definition of break.  It is the one most easily related to the forces on a baseball.  The other definition is “total break” and “break angle”, which together can be used to compute an x and z break.  This definition starts by drawing a straight line from the release point to the front edge of home plate.  Then compare that line with the actual trajectory.  The “break” is the maximum deviation between the two lines.  I like to call this the player’s definition of break, since it is the one that makes most sense to them.  By this definition, a straight hard fastball with lots of backspin has very little break, so that the break on all other pitches is that relative to a hard fastball.  A splitter looks like a fastball that drops more, so it has a large positive break-z.  A 12-6 curveball would have an even larger positive break-z and very little break-x.  Sliders and cutters have varying degrees of break-z and break-x.  A changeup has a much larger break-z than a fastball, but only because it is thrown slower and therefore drops more due to gravity.  The Sportvision people decided that this definition of break is what made most sense to the players, announcers, etc.  Note that break-x and break-z include all deviations from a straight-line trajectory (including effects due to gravity) and not just effects due to spin.
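The “maximum deviation from a straight line” idea can be sketched with a constant-acceleration trajectory as follows. This is only one possible reading: it samples the path in time and takes the largest perpendicular distance from the release-to-plate chord. How Sportvision actually computes total break and break angle is not spelled out here, and the function names and sample numbers are made up.

```python
import math


def position(t, p0, v0, a):
    """Constant-acceleration position (x, y, z) at time t."""
    return tuple(p + v * t + 0.5 * acc * t ** 2 for p, v, acc in zip(p0, v0, a))


def max_deviation_from_chord(p0, v0, a, t_plate, samples=200):
    """Largest perpendicular distance between the path and the straight
    line joining its start point and its end point at t_plate."""
    start = position(0.0, p0, v0, a)
    end = position(t_plate, p0, v0, a)
    chord = tuple(e - s for s, e in zip(start, end))
    length = math.sqrt(sum(c * c for c in chord))
    unit = tuple(c / length for c in chord)
    best = 0.0
    for i in range(samples + 1):
        point = position(t_plate * i / samples, p0, v0, a)
        rel = tuple(q - s for q, s in zip(point, start))
        along = sum(r * u for r, u in zip(rel, unit))
        perp = tuple(r - along * u for r, u in zip(rel, unit))
        best = max(best, math.sqrt(sum(c * c for c in perp)))
    return best


# Gravity-only pitch (made-up numbers): the deviation is entirely gravity drop.
break_ft = max_deviation_from_chord(
    p0=(0.0, 50.0, 6.0), v0=(0.0, -130.0, 0.0), a=(0.0, 0.0, -32.2), t_plate=0.37
)
print(round(break_ft * 12, 1), "inches")
```

Even with no spin-related acceleration at all, this measure is several inches, which matches the note above that this definition includes the effect of gravity.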

