THE BOOK--Playing The Percentages In Baseball


Friday, August 06, 2010

Felix throws 91mph curveball?

By Tangotiger, 11:49 PM

I’m looking at the July 31 game, specifically the 6th inning against Cuddyer. There are three PITCHf/x sites I’m checking, and presumably all three use the MLBAM classification.

First up is Timothy Fisher’s site, pitches 78 and 79.  It seems clear to me that 4 of those pitches could not be curveballs.

Here’s TexasLeaguers. Can we guess the speed of those 4 “curveballs”?

Here is what his changeups look like:

Here’s Brooksbaseball.

My question: what attributes did those 4 90-91 mph pitches have that would make them be considered to be curveballs and not changeups?  MLBAM does show the confidence levels in their classification, and for those 4 pitches, they are the lowest among the curveballs, from .73 to .86 confidence.  All the other curveballs have confidence of .90 to .91. 

Anyway, I did NOT see those pitches. If someone wants to go to the tape and show us that Felix did in fact throw 4 curveballs at 90-91 mph, I think that would be pretty jaw-dropping.

Otherwise, I would say that any pitch that is outside a 5 mph range (for that pitcher) would have to have a low confidence level, regardless of the remaining attributes of that pitch, no?
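A minimal sketch of that rule in Python. Assumptions not taken from any of the sites above: the 5 mph window is centered on the median speed of the pitches this pitcher already has labeled as the candidate type, and outside the window the confidence is capped at a hypothetical 0.5. The function name and parameters are illustrative only.

```python
from statistics import median

def speed_window_confidence(pitch_speed, type_speeds, raw_confidence,
                            window_mph=5.0, capped_confidence=0.5):
    """Cap a classifier's confidence when a pitch falls outside a
    `window_mph`-wide speed range for the candidate pitch type.

    type_speeds: start speeds (mph) of this pitcher's pitches already
        labeled as the candidate type, e.g. his curveballs.
    capped_confidence: hypothetical ceiling used outside the window.
    """
    center = median(type_speeds)          # assumed center of the range
    half_width = window_mph / 2.0
    if abs(pitch_speed - center) > half_width:
        return min(raw_confidence, capped_confidence)
    return raw_confidence

# Hypothetical curveball speeds for one pitcher (mph).
curve_speeds = [81.0, 82.0, 82.5, 83.0, 81.5]
print(speed_window_confidence(91.0, curve_speeds, 0.80))  # 0.5 (capped)
print(speed_window_confidence(82.5, curve_speeds, 0.91))  # 0.91 (unchanged)
```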


#1    Peter Jensen      (see all posts) 2010/08/07 (Sat) @ 07:52

This is not Felix’s curve. It is being classified a curve because the Az is less than minus 32 (gravity), meaning the ball’s spin is helping it sink. But Felix’s true curve (see pitch 3 in the Cuddyer PA) has a much higher spin rate and a much slower start speed, around 82 MPH. This pitch is similar to 39 pitches thrown by other pitchers in 2009, mostly by Chien-ming Wang, Chad Qualls, and Brandon Webb. There they were classified as sinkers, fastballs, or sliders. It is something new for Felix in 2010 and probably should be classified a sinker or slider. It is a nasty pitch to have in his arsenal.
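The Az-versus-gravity test is simple arithmetic. A small sketch, assuming Az is the PITCHf/x vertical acceleration in ft/s² and using standard gravity of 32.174 ft/s²: adding gravity back leaves the part of Az not due to gravity (mostly spin; the small vertical drag component is not separated out here).

```python
GRAVITY_FT_S2 = 32.174  # standard gravity, ft/s^2

def spin_component_az(az):
    """Vertical acceleration (ft/s^2) left after removing gravity.

    Negative: spin is adding sink beyond gravity (the az < -32 case).
    Positive: backspin is holding the pitch up, as with a typical fastball.
    Any vertical drag component is left in, not separated out.
    """
    return az + GRAVITY_FT_S2

print(round(spin_component_az(-36.0), 3))  # -3.826: spin helps it sink
print(round(spin_component_az(-20.0), 3))  # 12.174: spin holds it up
```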


#2    Peter Jensen      (see all posts) 2010/08/07 (Sat) @ 08:49

Sorry. Got my databases mixed up.  The numbers I gave above were for 2008, not 2009.  And Felix did throw that pitch 1 time in 2008, but not in 2009.

In 2009, pitches with those characteristics (start speed over 89, Az less than -32, z0 greater than 6 feet) were thrown much more often: 324 times, mostly by Aaron Cook (84 times) and Scott Feldman (41 times). Cook’s pitch is a bit different from Hernandez’s: even though he has a release point above 6 feet, his pitch acts more like a sidearm fastball, with a large negative Ax component breaking in on right-handers and almost no spin component in the Az direction breaking down. Feldman’s pitch (pitches?) shows a lot more variation. Spin rates go from 68 to over 1500 and Ax’s from -14 to +6. Very difficult to classify. Maybe Mike Fast has been able to figure it out.
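The query described above amounts to a three-condition filter. This is a sketch over a hypothetical list of pitch records, not Peter’s actual database; the field names follow PITCHf/x conventions (start_speed, az, z0), and the sample pitches are made up.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Pitch:
    pitcher: str
    start_speed: float  # mph
    az: float           # vertical acceleration, ft/s^2
    z0: float           # height (ft) at the first tracked point, used as release height

def matches_profile(p: Pitch) -> bool:
    """Start speed over 89 mph, az below -32 ft/s^2, z0 above 6 feet."""
    return p.start_speed > 89 and p.az < -32 and p.z0 > 6

def count_by_pitcher(pitches):
    """Number of matching pitches per pitcher, most frequent first."""
    return Counter(p.pitcher for p in pitches if matches_profile(p)).most_common()

# Made-up records, for illustration only.
sample = [
    Pitch("Pitcher A", 90.5, -33.1, 6.2),
    Pitch("Pitcher A", 91.2, -34.0, 6.1),
    Pitch("Pitcher B", 92.0, -30.0, 6.3),  # az not below -32
    Pitch("Pitcher C", 89.8, -35.5, 6.4),
]
print(count_by_pitcher(sample))  # [('Pitcher A', 2), ('Pitcher C', 1)]
```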


#3    Harry Pavlidis      (see all posts) 2010/08/07 (Sat) @ 09:19

Are you talking about the 86 mph pitches thrown on 1-2? Change-ups, not curves. Arm side movement.

Target Field is a little funky calibration-wise. As a result, you get less deflection (tighter clustering towards 0,0—not as bad as Coors though) and extra/erroneous sink.

As a rule, Gameday’s classifications are best when (a) the pitcher is well known and (b) the PFX system is well-calibrated. Even well known pitchers with atypical stuff can be challenging. So, Felix + Target = Tough Challenge for Ross’ neural net.


#4    Peter Jensen      (see all posts) 2010/08/07 (Sat) @ 10:08

Harry - Yes, those are the pitches. But they started at over 91 MPH and crossed the plate at 86. You do much more of this stuff than I do. I didn’t consider the PITCHf/x funkiness of the park. Hernandez throws a changeup at other parks that has an Az in the range of -17 to -26 or so. Does the Target Field erroneous sink effect change the Az that much? Or does Hernandez have two types of changeup?


#5    Harry Pavlidis      (see all posts) 2010/08/07 (Sat) @ 10:14

Yes, I jotted down the plate speed.  Still his “change”. I do wonder as you are if he throws something less circle-like and more split like (as in a tumbler more than a tailer with depth).

My classifications have two change-ups, with the ones we’re discussing ending up, in some cases, marked splitter.

Another possibility is these are sliders that backed-up or simply didn’t get a clean read from PFX. I’ll have to look at the rest of the game and a few others, my 2010 classifications end at the All Star break. And I haven’t done review/reconcile yet either.

Mike would know better about the az variances than I would, so we’ll wait til he pops in.


#6    Rally      (see all posts) 2010/08/07 (Sat) @ 10:34

Without looking into it, I’m 99% sure those were sliders. What I’ve found with MLBAM pitch classification is that in the aggregate, they are pretty good. If they say a pitcher throws 25% sliders and 15% curveballs, that is going to be pretty close to what BIS, which classifies pitch types from scorers watching video, will get.

But when you look at individual pitches, some of them will be clearly wrong.


#7    Rally      (see all posts) 2010/08/07 (Sat) @ 10:44

My database is current through 7/31, and I see 837 curveballs with start speed over 85 MPH, and 74 over 90.  I’m suspicious of all of those, extremely so on the 90+ ones.

Felix had 3 90+ in the 7/31 game against the Twins, and another on 7/10 against the Yankees.

Evan Meek has 9 curveballs over 90, including a few 95+.

Interestingly, both Francisco Rodriguezes show up on my list.


#8    Peter Jensen      (see all posts) 2010/08/07 (Sat) @ 11:22

After thinking about this some more while walking the dogs and taking a second look at Feldman’s pitches in 2009 I’m thinking his pitches might be pronated and supinated fastballs.  Felix’s pitch may be a supinated fastball.  The only way to be sure is to ask him how he throws it.


#9          (see all posts) 2010/08/07 (Sat) @ 11:27

The pitches that MLBAM’s algorithm called curveballs for that game include changeups, sliders, and curveballs.

Felix’s changeup is 88-91 mph, and the pitches in that speed range that MLBAM labeled curves are really changeups.

His slider is 84-87 mph, and the pitches in that speed range that MLBAM labeled curves are really sliders.

The other pitches were accurately labeled curveballs.

Harry, my computer with my PITCHf/x database is down right now, so I can’t tell you whether there were any calibration issues at Target Field on 7/31, but I have the data from the first half of the year on another computer, so let me check that.


#10          (see all posts) 2010/08/07 (Sat) @ 11:29

Harry/5: “Still his ‘change’. I do wonder as you are if he throws something less circle-like and more split like (as in a tumbler more than a tailer with depth).

My classifications have two change-ups, with the ones we’re discussing ending up, in some cases, marked splitter.”

That’s an interesting question that requires some in-depth investigation.  Identifying changeup types accurately can be a challenging task.  However, with Felix we should at least have an extensive pictorial record, which should help.


#11          (see all posts) 2010/08/07 (Sat) @ 11:48

In 2010 through the All-Star Break, the Target Field PITCHf/x cameras were recording pfz values that were about one inch too negative.
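One way to use a finding like this is to apply a per-park offset to pfx_z before classifying. A minimal sketch, assuming a hypothetical lookup table whose only entry is the roughly +1 inch Target Field correction described above; real offsets would vary by park and by date.

```python
# Hypothetical offset table (inches added to recorded pfx_z). The only entry
# reflects the ~1 inch Target Field finding for 2010 through the All-Star
# break; a real table would be keyed by park and date range.
PFX_Z_OFFSET_IN = {"Target Field": 1.0}

def corrected_pfx_z(pfx_z, park):
    """Shift pfx_z (inches) by the park's offset; unknown parks are unchanged."""
    return pfx_z + PFX_Z_OFFSET_IN.get(park, 0.0)

print(corrected_pfx_z(-4.0, "Target Field"))  # -3.0
print(corrected_pfx_z(-4.0, "Safeco Field"))  # -4.0
```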


#12    Harry Pavlidis      (see all posts) 2010/08/07 (Sat) @ 12:02

I don’t think they’re sliders, looking through the game inning by inning.

Mike, I’m putting some Felix in the THT lightbox at Icon. I’ve found a circle change so far. Which reminds me, Rich Harden can throw a circle change that acts like a splitter.


#13          (see all posts) 2010/08/07 (Sat) @ 12:09

Harry/12, I wasn’t saying that the pitches that Tango was asking about were sliders.  They’re changeups.  But MLBAM also labeled Felix’s sliders from that game (and most of the season) as curveballs.

I found several images of circle changeups from Felix on Daylife.  None of them are the best pictures of his grip, but between the three of them you can see it pretty well.

http://www.daylife.com/photo/09tV3tIgL946Z?q=Felix+Hernandez
http://www.daylife.com/photo/0280bsM1wIfCq?q=Felix+Hernandez
http://www.daylife.com/photo/0b8McZkfuq2mk?q=Felix+Hernandez


#14          (see all posts) 2010/08/07 (Sat) @ 12:13

Harry, that 5/30/09 image in the Lightbox is definitely a circle change, but the 8/2/06 image is a breaking ball of some sort, maybe a curveball based on how much he’s supinating.


#15          (see all posts) 2010/08/07 (Sat) @ 12:25

There’s another picture of him throwing a changeup here:
http://davidrichard.photoshelter.com/image/I0000_yK8NWdtKxI
It looks like a circle change, too, but it’s hard to tell for sure without being able to see his middle and ring fingers.

I don’t see any images so far that suggest he throws anything but the circle change, and I don’t really see anything else in the PITCHf/x data, either.


#16          (see all posts) 2010/08/07 (Sat) @ 12:40

Harry, I put three more images of Felix throwing a circle change into the THT Lightbox.  It looks like he does vary how much he curls his index finger under on the circle change grip. 

It seems to me that he would get a little more sidespin the more he tucked his index finger under, but that would also depend on whether he rearranged his other fingers on the ball.  Basically, I wouldn’t consider those separate pitch types.

Honestly, it can be hard to tell apart the splitter, the circle change, and the three-finger change from PITCHf/x data alone if you don’t know which one a pitcher throws.  But when you look at the grips you can see how they cause different movement (and then you can go back and see it in the PITCHf/x data). 

The splitter and the circle change both lend themselves to more purchase on the ball for pronation on release, and thus more sidespin, but not every pitcher throws the splitter that way.  Some (e.g., Haren, Harden) bury it deeper in their hands, which cuts the pronation effect and also leads to less spin on the ball.


#17    Harry Pavlidis      (see all posts) 2010/08/07 (Sat) @ 12:59

Mike, re. sliders, I was responding to Rally/6.

Based on looks at data and pics, I agree, it’s only a circle change, with variations.


#18    Peter Jensen      (see all posts) 2010/08/07 (Sat) @ 13:10

Mike - Again I defer to your and Harry’s expertise in these matters. But I am trying to learn. I’ll ask you a variation on the question I asked Harry above. Some of Felix’s changeups have Az’s in the mid -20s, and the 2 Cuddyer pitches in question are -33 and -36, with corresponding differences in Ax’s. Can that happen by chance and park variability alone? Or does a pitcher have to be trying to throw the pitch differently?


#19          (see all posts) 2010/08/07 (Sat) @ 13:21

Peter/18, there are four sources of variation that I can think of.

1. Pitcher varies how he spins the pitch, either purposefully or unknowingly
2. Air density changes
3. Random measurement variation
4. Systematic measurement variation

Speaking in terms of pfx/pfz movement changes rather than acceleration changes, here’s the relative size of those variations.

1. Typically 2-4 inches unless the pitcher is really doing something unusual
2. Usually <1 inch unless at altitude (Coors especially, but also Chase Field, Turner Field, etc., to a lesser extent).  This effect is proportional to the spin on the pitch.
3. Around 2 inches
4. Most parks are in the 0-3 inch range most of the time, but occasionally this can be as far off as 6 inches.
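The list above is in inches of movement, while Peter’s question was in terms of acceleration. A rough conversion, assuming constant acceleration (movement ≈ ½·a·t²) and, as Gameday’s pfx values are usually described, movement measured from y = 40 ft to the front of the plate. The flight time uses the start speed and ignores drag, so it runs slightly short.

```python
MPH_TO_FT_S = 5280.0 / 3600.0
FRONT_OF_PLATE_FT = 17.0 / 12.0  # y-coordinate of the plate's front edge

def flight_time_s(start_speed_mph, y_start_ft=40.0):
    """Rough time from y = 40 ft to the front of the plate at constant
    speed (drag ignored, so this slightly underestimates)."""
    return (y_start_ft - FRONT_OF_PLATE_FT) / (start_speed_mph * MPH_TO_FT_S)

def accel_to_movement_in(delta_a_ft_s2, t_s):
    """Constant-acceleration displacement 0.5*a*t^2, converted to inches."""
    return 0.5 * delta_a_ft_s2 * t_s ** 2 * 12.0

t = flight_time_s(91.0)
print(round(t, 3))                              # 0.289
print(round(accel_to_movement_in(10.0, t), 1))  # 5.0
```

Under these assumptions, an Az gap of about 10 ft/s² (mid -20s versus -33 to -36) on a 91 mph pitch corresponds to roughly 5 inches of vertical movement.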


#20    Tangotiger      (see all posts) 2010/08/07 (Sat) @ 13:33

I’m going to be creating my own clustering algorithm because, basically, I want to learn how to do that.

When I see the data, it seems to me that the prevailing weight, in terms of something being curve/not-curve, is the speed of the pitch relative to the fastest (say 10% or 25%) of all pitches thrown by this pitcher. Even if you start from scratch for each game, those pitches occurred in the 6th inning, so there was already plenty of data from that game to establish his top speed.

So, my original question stands: what kind of an algorithm would you have that would not do that kind of weighting?  It seems that speed is being treated as one of say 5 parameters to distinguish each pitch, and perhaps it only counts as 20% of the weight, or something.

Even so though, it’s not like the speed of those pitches is the only parameter that was very different from the other pitches labelled as curves.  There are at least two parameters that would make you think this is not a curve.  So, it seems strange that the “confidence level” was so high. 

Or perhaps, any confidence level below .89 may as well be .50 or something?

***

I understand that for the various fastball pitches (4-seam, 2-seam) and sliders, the speed won’t carry that strong a weight.

That’s the other question: whatever clustering algorithms are being used, can you re-weight based on characteristics? That is, if you have a bunch of pitches at 93-98, which can be fastballs or sinkers, and others at 88-92 that are probably changeups but could be sliders or sinkers, then the weight for the speed won’t matter as much as the spin and movement numbers. But if a pitch is at 80 mph, then it’s going to be a curveball most of the time.
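Here is one way to express that idea as code. It is a hypothetical sketch, not any pitchfxer’s actual method: the baseline is the average of the pitcher’s fastest 10% of pitches, and the weight on the speed term grows as a cluster center sits further below that baseline, so speed counts heavily against a curveball center and lightly against fastball-range centers. The weight values (1 to 4, maxing out 10 mph below baseline) are arbitrary. Because the weight depends on the center, this is a weighted nearest-center assignment rather than standard k-means.

```python
import math

def baseline_speed(speeds, top_fraction=0.10):
    """Average start speed of the pitcher's fastest `top_fraction` of pitches."""
    ranked = sorted(speeds, reverse=True)
    n = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:n]) / n

def speed_weight(center_speed, baseline, low=1.0, high=4.0, gap_for_high=10.0):
    """Hypothetical weight for the speed term: `low` at the baseline, rising
    linearly to `high` once a center is `gap_for_high` mph below it."""
    gap = max(0.0, baseline - center_speed)
    return low + (high - low) * min(1.0, gap / gap_for_high)

def weighted_sq_distance(pitch, center, baseline):
    """pitch, center: (speed_mph, pfx_x_in, pfx_z_in). Only the speed term
    is reweighted, based on how slow the cluster center is."""
    w = speed_weight(center[0], baseline)
    ds, dx, dz = (p - c for p, c in zip(pitch, center))
    return w * ds ** 2 + dx ** 2 + dz ** 2

# Illustrative numbers only.
speeds = [95, 94, 93, 92, 91, 90, 88, 86, 83, 81]
base = baseline_speed(speeds)  # 95.0 (fastest 10% of 10 pitches)
curve_center = (82.0, 4.0, -6.0)
change_center = (90.0, -8.0, 3.0)
pitch = (91.0, -8.0, 4.0)
print(weighted_sq_distance(pitch, curve_center, base))   # 568.0
print(weighted_sq_distance(pitch, change_center, base))  # 3.5
```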

Is this how the clustering algorithms that the pitchfxers use work?


#21          (see all posts) 2010/08/07 (Sat) @ 13:47

Tango, lots of the answers about pitch classification methods depend on which data you are using and what you want to do with it.

If you are classifying in real time, like MLBAM is, you have to make some compromises. I assume you will be classifying after the fact and won’t have to concern yourself with that.

Are you going to be classifying one game at a time, or at a whole season level?

How accurate do you need to be?  Are you mostly concerned with separating fastballs from changeups from breaking balls?  Or do you care about properly identifying two-seam fastballs vs. four-seam fastballs, sliders vs. curveballs, etc.?

Are you only going to be working on one pitcher or a very limited number of pitchers?  If so, you can tune your algorithms for each pitcher.  If, on the other hand, you want to use the same algorithm for hundreds of pitchers, you have to be more generic.

Btw, the simplest, most accessible algorithm, that will give you decent results, after the fact, is K-means clustering.


#22    Colin Wyers      (see all posts) 2010/08/07 (Sat) @ 13:50

The most common clustering algorithm is k-means. It’s very simple to use (not that easy to program, but there are a lot of libraries for k-means - GNU R is loaded with them, Python has a few, etc.)

With k-means you pick a number of “centers,” and the algorithm places those centers so as to minimize the distance between each observation and the center it is assigned to. Distance between any two observations is figured as:

SQRT((x1-x2)^2+(y1-y2)^2+(z1-z2)^2)

Assuming that you’re using x,y,z values (say - speed, horizontal break and vertical break).

You can probably recognize why speed is a primary selection criterion for most k-means clustering uses: the differences between two pitches in speed (numerically speaking) are greater than the differences in break. If you want to compensate for this effect, you have to scale the input variables (typically using z-scores).
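A minimal sketch of the z-score scaling described above, in plain Python. The feature order (speed, horizontal break, vertical break) and the pitch values are illustrative assumptions. `math.dist` computes the same SQRT((x1-x2)^2+...) distance; after scaling, every column has mean 0 and standard deviation 1, so a unit of difference means the same thing in each dimension.

```python
import math
from statistics import mean, pstdev

def zscore_columns(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]

# Illustrative pitches: (speed mph, horizontal break in, vertical break in).
raw = [(95.0, -6.0, 10.0), (91.0, -8.0, 4.0), (84.0, 2.0, 1.0), (80.0, 5.0, -6.0)]
scaled = zscore_columns(raw)

# Distance between the first two pitches, before and after scaling.
print(math.dist(raw[0], raw[1]))
print(math.dist(scaled[0], scaled[1]))
```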


#23    Tangotiger      (see all posts) 2010/08/07 (Sat) @ 14:01

Right, that’s good. But I want to weight the speed, say, a lot more if it’s low, and a lot less if it’s high.

Basically, the way that equation works (the one posted by Colin) is that it treats each of the distances the same (or if you convert it to z-scores, it treats the standardized distances the same).  And it keeps all the three parameters independent.

I’m having a hard time trying to say this. If a pitcher’s baseline speed (the fastest 10% of his pitches) is 95 mph, and you come across a pitch that is 90 mph, then the chance that it is a curveball is less than 1%, regardless of what the other characteristics of the pitch are. Given that you have other choices (4-seam, 2-seam, slider, changeup) to pick from, the result can never be a curveball.

It doesn’t seem that these kinds of clustering algorithms can use this kind of knowledge.  Which is basically why I’m thinking that I’m better off (well, maybe not better off, but at least there’s going to be some redemption to it) just writing my own.

***

Mike: I just don’t see how, around a pitcher’s 80th pitch of the game, those 2 pitches can come up as curves, regardless of how generic the algorithm is, unless it doesn’t consider speed that important.


#24          (see all posts) 2010/08/07 (Sat) @ 14:14

Tango/23, because K-means and other similar algorithms classify based on the square of the distance, it’s not going to have the trouble that you think it will have.  If the avg curveball spd is 77, a pitch at 90 mph will be considered four times as far away from the center of the curveball cluster as a pitch at 83.5 mph.  K-means is not going to classify a pitch at 90 mph as a curve.
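The arithmetic behind the “four times” figure, as a quick check: a pitch twice as far from the center in speed is four times as far in squared distance.

```python
curve_center_mph = 77.0

def sq_dist(speed_mph, center_mph=curve_center_mph):
    """Squared one-dimensional distance, as k-means uses it."""
    return (speed_mph - center_mph) ** 2

ratio = sq_dist(90.0) / sq_dist(83.5)
print(ratio)  # 4.0: 13 mph away vs 6.5 mph away, squared
```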

As to your last paragraph, I prefer not to make any public comment on that topic since there’s been too much difficulty in the past when I’ve done that.


#25    Tangotiger      (see all posts) 2010/08/07 (Sat) @ 18:01

Let’s say you want to do “sim scores” between nonpitchers.  You have HR per PA, SB per 1B+BB, SO per SO+BB.  Let’s say.

You can find the standard deviation of each of those so you can use z-scores.  Then, you simply add the square of the z-scores for each comp pair to get the most similar players.  Straightforward stuff.

BUUUUUUUUUUT, let’s say that, by far, the most important thing is SB per 1B+BB, and you really don’t care too much about SO per SO+BB (just for this illustration). So what you end up doing is overweighting the speed component, so that you guarantee, say, that Carl Crawford and Juan Pierre match up regardless of whether they are similar otherwise, because, to you, similarity on speed is what matters.

That’s what I mean here, that you can put in all the parameters from PITCHf/x that you want, but you would like to overweight some, because you know it means more.

BUUUUUT, it means more in some instances (speed of ball for curves), and means a lot less for others (speed of ball for sliders).
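Tango’s sim-score example can be sketched as a weighted sum of squared z-score differences. The players and stat lines below are hypothetical (not Crawford or Pierre), and the weights (3 for the SB rate, 0.25 for the SO rate) are arbitrary choices meant only to show how overweighting one component changes the nearest comp.

```python
from statistics import pstdev

FEATURES = ("hr_per_pa", "sb_per_1b_bb", "so_per_so_bb")
EQUAL = {f: 1.0 for f in FEATURES}
# Hypothetical weights: the SB rate counts triple, the SO rate barely counts.
SPEED_HEAVY = {"hr_per_pa": 1.0, "sb_per_1b_bb": 3.0, "so_per_so_bb": 0.25}

def sim_distance(a, b, sds, weights):
    """Weighted sum of squared z-score differences (smaller = more similar)."""
    return sum(weights[f] * ((a[f] - b[f]) / sds[f]) ** 2 for f in FEATURES)

def most_similar(target, players, weights):
    """Closest other player to `target` under the given weights."""
    sds = {f: pstdev([p[f] for p in players.values()]) for f in FEATURES}
    others = [name for name in players if name != target]
    return min(others, key=lambda n: sim_distance(players[target], players[n], sds, weights))

# Hypothetical hitters.
players = {
    "A": {"hr_per_pa": 0.010, "sb_per_1b_bb": 0.30, "so_per_so_bb": 0.60},
    "B": {"hr_per_pa": 0.010, "sb_per_1b_bb": 0.28, "so_per_so_bb": 0.85},
    "C": {"hr_per_pa": 0.012, "sb_per_1b_bb": 0.10, "so_per_so_bb": 0.60},
    "D": {"hr_per_pa": 0.040, "sb_per_1b_bb": 0.02, "so_per_so_bb": 0.75},
}
print(most_similar("A", players, EQUAL))        # C
print(most_similar("A", players, SPEED_HEAVY))  # B
```

With equal weights, A’s nearest comp is C (same SO rate, very different SB rate); with the speed-heavy weights it becomes B (nearly identical SB rate).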

Anyway, that’s the best example I can give to describe what I mean.


#26          (see all posts) 2010/08/07 (Sat) @ 18:06

Tom, I understand what you’re saying.  I’m telling you that in practice I’ve not found that to be necessary.  Maybe as a second or third level of improvement, but I think before you get to that point, you’re going to find that there are a lot of other challenges to tackle first.


#27    Colin Wyers      (see all posts) 2010/08/08 (Sun) @ 00:15

Real quick and dirty here. Here’s a scatterplot of the game in question, using start_speed and spin as published in the Gameday XML files:

http://flic.kr/p/8qnnY4

And the same scatterplot, but this time with the results of a simple K-means cluster, with five centers, using start_speed and spin as my input parameters:

http://www.flickr.com/photos/42654229@N00/4870875602/in/photostream/

Now this is quick and dirty, and mostly for illustration purposes - I don’t know if five is the right number of centers, and if I was doing this “for real” I’d use more than two input variables. But this way, the scatterplot corresponds EXACTLY with what K-means is doing - the visual distance between any two points on that graph should represent the squared distance in K-means.

That green band of pitches in both plots is the curveballs. In the Gameday plot, you see the phantom curves on the far right - there is not a more distant cluster to put those pitches in than curves. It’s literally impossible for K-means to get that one wrong.

Now, look, what I’m doing here is very different from what the MLBAM automated classifier has to do. I am able to throw practically unlimited resources at the problem (both in terms of computer and human labor), I’m able to use the last pitch of a game as information on how to classify the first pitch of the game, etc. But this particular instance (which is to say, classifying Felix’s curveball in a single game) is a case that k-means can handle VERY robustly.
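For readers who want to see what k-means does under the hood, here is a bare-bones sketch of Lloyd’s algorithm in plain Python. It is not Colin’s code; in practice you would use R’s kmeans() or a Python library. It seeds the centers by farthest-point selection instead of random starts, an assumption made so the example is deterministic. As in the plots above, the inputs are unscaled, so the variable with the larger numeric spread dominates the distance.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. points: list of equal-length numeric tuples.
    Returns (centers, labels)."""
    # Farthest-point seeding (deterministic stand-in for random starts).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable, so the centers are too
        labels = new_labels
        # Update: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Made-up (start_speed, spin angle) pairs forming two obvious groups.
pts = [(81.0, 40.0), (82.0, 45.0), (83.0, 38.0), (90.0, 230.0), (91.0, 235.0), (92.0, 228.0)]
centers, labels = kmeans(pts, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```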


#28          (see all posts) 2010/08/08 (Sun) @ 00:41

Using spin angles in a classifier is a little dangerous unless you can tell your classifier that 360 degrees wraps back around to 0 degrees.  K-means won’t realize that.  Which works fine in this case but won’t always.  In reality those phantom curves aren’t that far from the real breaking balls in those two dimensions.  (Though I suspect even if K-means did understand that the angle wrapped around, it still wouldn’t put those changeups in with the curveballs or sliders.)
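A common fix for the wrap-around problem is to replace the angle with its cosine and sine, so that 355° and 5° end up next to each other. A sketch, assuming the angle is in degrees (as with Gameday’s spin_dir field); the optional scale factor is a hypothetical knob for weighting the angle against the other inputs.

```python
import math

def angle_features(deg, scale=1.0):
    """Map an angle in degrees to (cos, sin) so 359 and 1 are neighbors.
    `scale` is a hypothetical knob for weighting the angle against other inputs."""
    rad = math.radians(deg)
    return (scale * math.cos(rad), scale * math.sin(rad))

print(abs(355 - 5))                                                 # 350: raw gap
print(round(math.dist(angle_features(355), angle_features(5)), 3))  # 0.174
```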


#29    Colin Wyers      (see all posts) 2010/08/08 (Sun) @ 01:22

Yeah, you’re right. I went ahead and reclassified based upon my normal parameters (start_speed, pfx_x and pfx_z) and you still see separation between those (although it’s broken the “curve balls” into two groups - it’s late, and I obviously am not putting a lot of effort into this, so who knows if I have the right number of centers):

http://flic.kr/p/8qoaEz


#30          (see all posts) 2010/08/08 (Sun) @ 10:11

Colin, that’s a better classification.  The breaking balls are supposed to be in two groups, so that’s good.  I don’t know if it’s getting all the 90+ mph pitches divided correctly, but it’s at least pretty close.


#31    Harry Pavlidis      (see all posts) 2010/08/08 (Sun) @ 10:34

Yep, Mike/30 is right. The fun of Felix is sliders blend with curves, all his fastballs and even off-speed stuff get blended. Oh, how I love pitchers who have distinct groupings.


#32    Cory Schwartz      (see all posts) 2010/08/09 (Mon) @ 15:34

No doubt we’re wrong on those pitches for King Felix.

As many of you know we are transitioning from using a generic neural net (one for all righty pitchers, and another for all lefties) for all pitchers, along with biasing based on their specific repertoires, to using a custom neural net for each pitcher. We have several hundred custom nets built already, including one for Felix, but many of the ones we did back in the spring using 2009 data are not as accurate as those trained using 2010 data.

We last retrained the NN for Felix back on March 31 so he’s one of many on the list to retrain using 2010 data, after which we’ll reclassify every one of his pitches this year. This is an ongoing process that I suspect will continue on into the winter…

Thanks,
Cory

