
THE BOOK--Playing The Percentages In Baseball


Thursday, July 17, 2008

Converting PITCHf/x into performance stat lines

By Tangotiger, 04:23 PM

Great job by Paul in finding correlations between some PITCHf/x data and performance stat lines.

A couple of things I’d change: make it GB/(GB+FB), rather than the GB/FB ratio.  I’ve talked about this a lot.

Use K/BFP, not K/9IP.  I’d also suggest trying an additional correlation against (K-BB)/BFP.
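As a rough illustration of the suggested metrics, here is a minimal sketch with made-up counts for two hypothetical pitchers. The function name and the numbers are assumptions for illustration only. One reason often given for preferring the share (the post itself does not spell one out) is that GB/(GB+FB) is bounded between 0 and 1 and treats mirror-image pitchers symmetrically, while the GB/FB ratio does not.

```python
def rate_stats(gb, fb, k, bb, bfp):
    """Turn raw counts into the per-opportunity rates suggested above."""
    return {
        "gb_share": gb / (gb + fb),            # GB/(GB+FB)
        "gb_fb_ratio": gb / fb,                # the ratio form, for comparison
        "k_per_bfp": k / bfp,                  # K/BFP
        "k_minus_bb_per_bfp": (k - bb) / bfp,  # (K-BB)/BFP
    }

# Two hypothetical pitchers who are mirror images on batted balls.
groundballer = rate_stats(gb=200, fb=100, k=150, bb=50, bfp=800)
flyballer = rate_stats(gb=100, fb=200, k=150, bb=50, bfp=800)

for name, stats in (("groundballer", groundballer), ("flyballer", flyballer)):
    print(name, {key: round(val, 3) for key, val in stats.items()})
```

The two shares sit the same distance from 0.5, while the ratios (2.0 and 0.5) are not the same distance from 1.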


#1    Tangotiger      (see all posts) 2008/07/17 (Thu) @ 16:41

Also, include FB% (i.e., percentage of pitches thrown that are fastballs).


#2          (see all posts) 2008/07/17 (Thu) @ 17:50

Very cool.

Shouldn’t he be regressing the expected stats?  So if his equation predicts 50% of the variance in GB/FB ratio, shouldn’t he regress the expected GB/FB ratio 50% to the mean?  Or is that already accounted for in the best-fit equation?
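On the question of whether the shrinkage is already built into the best-fit equation: for an ordinary least-squares fit with an intercept, the in-sample predictions are already pulled toward the mean, and their variance equals r-squared times the variance of the observed values. The sketch below checks that on made-up numbers (the velocities and GB shares are invented, and this says nothing about how Paul's Excel model was actually fit). Out-of-sample forecasts may still call for further regression.

```python
from statistics import mean, pvariance

def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Invented data: fastball velocity (mph) and GB/(GB+FB).
velo = [88.0, 90.0, 91.0, 93.0, 95.0, 96.0]
gb_share = [0.40, 0.47, 0.41, 0.49, 0.44, 0.52]

slope, intercept = fit_line(velo, gb_share)
predicted = [intercept + slope * v for v in velo]

ss_res = sum((obs - pred) ** 2 for obs, pred in zip(gb_share, predicted))
ss_tot = sum((obs - mean(gb_share)) ** 2 for obs in gb_share)
r_squared = 1 - ss_res / ss_tot

# Spread of the predictions relative to the spread of the observations.
spread_ratio = pvariance(predicted) / pvariance(gb_share)
print(round(r_squared, 4), round(spread_ratio, 4))
```

The two printed values agree, which is the sense in which the fitted values are already regressed.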

Very cool to see Bush show up on that list, as he always seems to underperform his peripherals.  As for the savvy/unsavvy list for the next couple stats, I think it’s more of a lucky/unlucky list.  But a couple seasons worth of stats should clear it up one way or another.

The fastball speed impacting GB rate is very interesting.  I was just sort of thinking about that the other day, with regards to my work slow-pitch softball league.  Last season, I used a small bat because I liked the bat speed I could generate with it.  I was mostly a LD and FB hitter, and tended to pull the ball more than Ortiz (I’m a lefty as well).  Unfortunately, I could never quite get the distance I wanted on my fly balls, and pulling the ball as a lefty is doubly-bad because the one thing casual softballers know is that you’re supposed to shift over on a lefty.

This season, I began using a big bat, hoping it would slow down my swing enough that I would hit to center field, and maybe the added weight would put a little more pop into the ball.  What ended up happening was I turned into a GB hitter.  I just kept smacking grounders to the 2B and SS.

Two nights ago, I realized that the slower bat speed made me hit the ball later in its path than I was used to.  Which yes, made me hit more to center than pull it.  But it also meant the ball was LOWER than I was accustomed to it being when I struck it.  So I was always on top of the ball, hence the grounders.  With the light bat, I got to the ball sooner than I expected, and thus was under it and could get some lift on it.

Anyways, bottom line is… it makes total sense to me that faster pitches = more grounders.  If you’re late on a pitch, you thus hit the ball later in its path, and it’s going to be closer to the ground than you expect.  So makes sense you’d be on top of it and hit a grounder more often.


#3          (see all posts) 2008/07/17 (Thu) @ 18:38

I too think a lot of the “Savvy” is actually “luck” and that over a larger sample the difference from predicted will lessen. Calling it “Savvy” implies that under or over performing is a skill, and that is not yet proven. The skill is measured by the numbers input into the formula.

I think a slower fastball with more drop would normally be called a sinker - but this is a good way to measure it.

I like the way Paul was thinking; now he just needs to learn Perl and SQL and spider all those files!


#4    Tangotiger      (see all posts) 2008/07/17 (Thu) @ 18:52

I don’t mind that he calls it “savvy” for his intro article on it.  However, I’d expect that after we vet this for him, a lot of it will come out as luck.  Ignoring that part, the rest of it was very good.


#5    mgl      (see all posts) 2008/07/17 (Thu) @ 21:45

Great stuff!  I wonder how much projections can be improved by regressing toward a mean for pitchers of similar “stuff.”

We often forget that one of the components of “savvy” which is not savvy at all of course, is the deception that a pitcher has on his various pitches.  For example, if pitcher A throws a fastball the same percentage of time in exactly the same counts (and both have the same “other” pitches) as pitcher B, and they both have exactly the same speed and movement on their fastballs, one pitcher could have a more effective fastball because of his “deception” (I use “deception” as a catch-all for how the pitcher’s motion, release point, etc., affect how the batter sees and responds to each pitch).

Also a question on these multiple regression equations.  Since fastball vertical movement is obviously highly correlated with fastball speed (the higher the speed, the less the downward movement, right?), is that accounted for in the regression equation?  IOW, if he says that each 10 inches in extra movement (I assume “up”) means .25 lower FIP (numbers for illustration only), does that assume that the speed remains constant?


#6          (see all posts) 2008/07/17 (Thu) @ 22:21

MGL: assuming he says it is NOT controlled for… what would be the best way to tackle the issue?  I’d think you’d need to account for the covariance, but also add a dichotomous variable that codes for 2 seamer vs 4 seamer, right?


#7    Tangotiger      (see all posts) 2008/07/17 (Thu) @ 22:28

"Since fastball vertical movement is obviously highly correlated with fastball speed”

I don’t think that’s correct.  Remember the terms MOVEMENT and BREAK.  Break is what we think of in actual terms.  Movement is one parameter of the break.  It’s highly confusing, but it’s the amount of movement compared to a spin-less ball.  So, imagine a ball traveling at X speed, with the force of gravity but the ball not spinning (but not knuckling), and then compare that to the same ball at the same speed with the same gravity, but with the ball actually spinning.  The difference between the two is the movement.

Like I said, movement is really only something a physicist might talk about (and for some reason, analysts continue to present this information). 

For us viewers, we either want to know about the break (which is what we visibly see and what we talk about, as if we were the hitter or catcher) or we want to know the rpm of the ball, along with its spin axis (a pitcher would want to know that).

So, human terms, pitcher, catcher, batter… break, or spin rpm/axis.  Some isolated technical lab experiment… movement.


#8    MGL      (see all posts) 2008/07/18 (Fri) @ 00:20

Tango, you are right.  I always forget that the movement is as compared to a non-spinning ball of the same speed.  So basically it is a function of the speed of the spin on the ball and the angle of the spin.

Mike, I think the multiple regression does control for each of the variables with respect to the other ones.
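To make the "controls for the other variables" point concrete, here is a small sketch on invented, noise-free data in which the two predictors are correlated. All numbers and names are hypothetical. The outcome is built as 10 + 3*speed - 1*movement, so the true effect of movement with speed held constant is -1. A regression on movement alone gets a different answer, while the multiple regression recovers each coefficient.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Invented predictors that move together (arbitrary units).
speed = [0.0, 1.0, 2.0, 3.0, 4.0]
movement = [1.0, 0.0, 3.0, 2.0, 5.0]
outcome = [10 + 3 * s - 1 * m for s, m in zip(speed, movement)]

both = ols([speed, movement], outcome)   # movement effect with speed held constant
alone = ols([movement], outcome)         # movement effect ignoring speed
print([round(b, 3) for b in both], [round(b, 3) for b in alone])
```

In this toy data the movement-only slope even has the opposite sign, because movement is standing in for the speed it is correlated with.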


#9          (see all posts) 2008/07/18 (Fri) @ 08:18

So, can we find very similar pitches (fastballs, same speed, break, etc.) thrown by different pitchers, and see if there are different results?

For example, Chris Young throws about 92 on his fastball, and has a low BABIP, while Felix Hernandez is up about 96 but gives up more hits. What is there in the micro analysis of the pitch data that can show why? (back to Brian Bannister?)


#10          (see all posts) 2008/07/18 (Fri) @ 10:21

I appreciate the comments and will do my best to incorporate them into Version 2. As for the residuals being representative of luck or a skill, especially considering the low r-squared #’s, I think it is a big part luck, small part skill.  Maybe it is writer bias, but looking at the list of 97 pitchers sorted by “savvy,” it at least seemed that guys fell where you would expect.  With that said, I do not think any of us would accept such vague anecdotal evidence.

I think the best way to distinguish between luck and skill would be to see if it is repeatable and if it is a skill which increases with experience. I thought about toying around with age as a variable, but am uncertain how one would identify and quantify a self-selection error in the sample.

Another problem with the model was horizontal movement.  The data was messy in that when I tried to distinguish between left and right movement or look at absolute movement the data became less statistically significant.

On BABIP: after toying around with the effects of relative velocity (I am going to write up the results soon, although any mild effect I found is captured in the normal velocity #’s), I came to the conclusion that it is more effective to look at LD%.


#11    Peter Jensen      (see all posts) 2008/07/18 (Fri) @ 11:23

Paul - How did you select the 97 pitchers?  Am I correct that you used the aggregated season data from Kalk’s pitcher cards for your analysis and not data from individual pitches?

Tango - I am surprised at your enthusiasm for this research as it has been presented. There is very little description of the methodology used.  We don’t know how the pitchers were selected or whether there was any weighting for the pitchers by the number of pitches thrown or batters faced.  We don’t know what variables were tried in the regression analysis but rejected as insignificant. And the research seems to suffer from many of the flaws that you described in last week’s thread on studies using regression analysis.  There is no attempt to determine a causal relationship for the variables or establish the predictive value of the relationship formulae that were produced.  The only thing that can be determined from this research is that pitchers with faster fastballs and/or more movement to their fastballs also have more ground balls and are better pitchers overall.  A result that is both predictable and not particularly useful.


#12    Tangotiger      (see all posts) 2008/07/18 (Fri) @ 12:31

I think I’m more enthused at the potential and effort than the article as a final product, which is why I said we need to vet it.  Correlations are good as a first step, and this is how I see the article.  A faster fastball is better than a slow one (all other things equal), natch.  Now, we want to quantify that.  More movement is better than less movement.  We know that, so now we need to know the degree of that.  How does the whole thing get affected if a guy throws his fastball 50% of the time instead of 75%?  How much does it matter if his second pitch is a changeup or curve?  And how much does it matter in terms of the differential of changeup and fastball?

It’s a great first step, and I see the article in that light in trying to find those answers.

As long as the results stick to basic logic (faster fastball is better), then I really won’t have much problem here.

The difference between expectation and results may have something to do with “savvy” (mixing pitches and location, and awareness of the count), but it may not.  Hard to tell at this point, so I wouldn’t go around continuing to say “savvy”, unless you say “savvy plus luck and unknowns”.


#13    Tangotiger      (see all posts) 2008/07/18 (Fri) @ 12:37

Btw, the power here is that it gives us a regression point for our forecasts.  If you have two guys with the same ERA, same K/BB ratio over the last 3 years, but one “looks” (based on FB speed, movement, kinds of pitches) better, then you don’t regress their performance stats to the same league mean... you regress them toward what we expect from someone with those tools.  This is the exciting part.

If Tim Raines steals 50 bases and Jose Canseco steals 50 bases, I know that Raines’ steals are more “real”, because I know how much faster he is.  So, I would regress his steals toward guys with his speed.

Same for fielding, if fans tell us that Ichiro is a fantastically wonderful fielder, and they despise Ibanez, and they both have a UZR of -10 runs… well, Ichiro’s “tools” are far better and therefore, I need to regress Ichiro toward a much higher mean (say regress a certain percentage toward +25 runs), and regress Ibanez a certain percentage toward -25 runs.

Same thing here… we’re trying to quantify the quality of his tools that are manifested in a game.
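The idea of regressing toward a tools-based mean can be sketched in a couple of lines. The -10, +25, and -25 figures come from the fielding example above. The 50% regression amount is purely an assumption for illustration (the comment only says "a certain percentage"), and the function name is hypothetical.

```python
def regress_toward(observed, target_mean, regression_pct):
    """Move an observed value part of the way toward a chosen mean."""
    return observed + regression_pct * (target_mean - observed)

# Same observed UZR, different tools-based means (50% is an assumed amount).
ichiro = regress_toward(observed=-10.0, target_mean=25.0, regression_pct=0.5)
ibanez = regress_toward(observed=-10.0, target_mean=-25.0, regression_pct=0.5)
print(ichiro, ibanez)  # 7.5 -17.5
```

The same observed performance produces very different estimates once the mean being regressed toward reflects the player's tools.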


#14          (see all posts) 2008/07/18 (Fri) @ 12:59

Peter, the sample was pitchers with greater than 97 innings.  All the data except movement was taken from Fangraphs, which has a great export-to-Excel function.  With a few vlookups it was very easy to get the data together. I had to manually add in the movement data from the Kalk pitching cards. That was no fun and was the #1 factor limiting my sample size.

Concerning weighting. No, I did not weight my results.  However, I think there is less variance with “stuff” than with many other baseball metrics, so that relatively smaller sample sizes are more representative.

Concerning lack of methodology. I am thrilled that all of you, with far more knowledge than I, put thought into my article.  However, I wrote with the audience at Blastings! Thrilledge in mind, and I didn’t think they wanted to hear too much about the methodology. Simplified, it went something like this.
Step 1: Run a multivariable regression with Fastball %, Fastball Velocity, Horizontal Movement, and Vertical Movement against the Output variable.
Step 2: Rerun the regression with only those variables that were statistically significant in the first regression.
Step 3: Apply the results of Step 2 to determine the expected Output variable for the 97 pitchers.
I experimented with massaging the data, such as taking the natural log of Vertical Movement and the absolute value of Horizontal Movement.
I would be happy to send you the excel file. Just shoot me an e-mail at
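The three steps described above can be sketched in plain Python. This is only an illustration under loud assumptions: the data is synthetic (40 invented pitchers whose output depends on velocity alone), the |t| > 2 cutoff is a rough stand-in for whatever significance test the spreadsheet used, and every function and variable name is hypothetical.

```python
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit(names, data, y):
    """OLS with an intercept; returns (coefficients, t-statistics) by name."""
    rows = [[1.0] + [data[nm][i] for nm in names] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    labels = ["intercept"] + list(names)
    t_stats = {lab: beta[i] / (sigma2 * inv[i][i]) ** 0.5
               for i, lab in enumerate(labels)}
    return dict(zip(labels, beta)), t_stats

def predict(coefs, data, i):
    """Expected output for pitcher i from the fitted coefficients."""
    return coefs["intercept"] + sum(
        c * data[nm][i] for nm, c in coefs.items() if nm != "intercept")

# Synthetic inputs for 40 invented pitchers (not real PITCHf/x or BIS data).
rng = random.Random(0)
n = 40
data = {
    "fb_pct": [rng.uniform(0.4, 0.8) for _ in range(n)],
    "fb_velo": [rng.uniform(88, 96) for _ in range(n)],
    "h_mov": [rng.uniform(-8, 8) for _ in range(n)],
    "v_mov": [rng.uniform(4, 12) for _ in range(n)],
}
# Assumed truth for the demo: output depends on velocity only, plus noise.
y = [0.02 * v - 1.4 + rng.gauss(0, 0.01) for v in data["fb_velo"]]

names = list(data)
_, t_all = fit(names, data, y)                          # Step 1: all variables
kept = [nm for nm in names if abs(t_all[nm]) > 2.0]     # Step 2: keep significant
coefs, _ = fit(kept, data, y)                           #         and refit
expected = [predict(coefs, data, i) for i in range(n)]  # Step 3: expected output
residuals = [obs - exp for obs, exp in zip(y, expected)]
print(kept, {k: round(v, 4) for k, v in coefs.items()})
```

The residuals list is what the thread is debating: the gap between actual and expected output, which may be savvy, luck, or unmeasured factors.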


#15    Peter Jensen      (see all posts) 2008/07/18 (Fri) @ 13:12

I give plus points to anyone who actually conducts a research project and plus plus points to anyone who publishes his results for all to see and pick apart.  So I agree that Paul deserves encouragement.  But it is not a particularly good first effort.  And if he is going to continue his efforts and be better in the future, he needs more than just your pat on the back; he needs a thoughtful critique of both the bad and good points of his study.

You should not use the values predicted by Paul’s regression formulae as regression points unless some causal relationship is established between the variables and the outcome, and especially not until the methodology is fully explained and shown to be valid.  There are too many variables that haven’t been tested that are more likely to have higher correlations than those that Paul did test.  The covariances for the variables tested also need to be shown.

For instance, it appears that Paul used the aggregated data off Josh Kalk’s pitching cards for his analysis.  I asked Paul this question and am awaiting his confirmation, but let’s assume it for now.  Josh doesn’t make a distinction between two-seam and four-seam fastballs for most pitchers.  So the movement numbers shown are a combination of both if a pitcher throws both.  This is going to confuse the relationship between movement and the dependent variables.  Another problem that Paul acknowledged in his post above is that horizontal movement is going to have a different effect depending on the handedness of the batter.  Also, most pitchers have very different pitch selection profiles depending on the handedness of the batter.  None of these things can even be investigated using the methodology that I am assuming Paul used, because Josh doesn’t break down his statistics this way.

So while the question that Paul raises is a good one, and a first step is always worthy of praise, I am not convinced that this first step is in the right direction.


#16    Peter Jensen      (see all posts) 2008/07/18 (Fri) @ 13:48

Paul - Thanks for answering my questions.  So really the only data taken from PITCHf/x was movement.  Fastball speed and fastball percentage and pitch type are all BIS estimates.


#17    John Peterson      (see all posts) 2008/07/18 (Fri) @ 14:05

Paul acknowledged that his own skills were limited and his process may be flawed.

I have to admit, my own knowledge of how to find statistical significance is pretty barren. All I can do is tell Paul, what about this? Maybe these two things are correlated?

Anyway, I’m sure he will improve the research with your suggestions.


#18    Tangotiger      (see all posts) 2008/07/18 (Fri) @ 14:10

I tailor my posts to how I read the article.  Paul wasn’t presenting it as some end-all research, nor was he digging in his heels, nor did he present this as a better alternative to something else.  I read it in that light, that he was presenting some ongoing research results (as if we are looking over his shoulder).  I find his research here similar to the research on catcher skill blocking from a few months ago, using Gameday: great first effort with interesting results.

I also don’t have a monopoly on critiques, and I encourage all to offer theirs.  That I simply linked to his research, and gave some kind remarks and only a little critical review, doesn’t imply anything beyond that.  In any case, I think the focus would be best placed on what Paul has done, rather than on what I may or may not, or should or should not, have done.

Otherwise, I agree with Peter/11.


#19    Tangotiger      (see all posts) 2008/07/18 (Fri) @ 14:17

My biggest problem is with the site itself: it is very hard to read (for me).  Any site that has a dark or black background is enough for me to avoid, or I’d need to have my arm twisted to read.  It’s very, very difficult (for me).  I prefer any background color scheme that is no darker than a light grey (color code CCC), except for nav bars.


#20    John Peterson      (see all posts) 2008/07/18 (Fri) @ 15:09

Everyone complains about the color of the site. I don’t know, I prefer a black background. It’s easier on the eyes, I think. Anyway, I’ll consider changing it sometime.


#21    MGL      (see all posts) 2008/07/18 (Fri) @ 22:10

With all due respect, if “everyone complains about the color,” what difference does it make what you like?  Now if it is the color of your shirt, your carpet, or your car, that is a different story.  The design and layout of your web site should be what the readers or potential readers like/want, unless you don’t care about that.

I too cannot stand black backgrounds (besides being hard to read, they give the appearance of a hacker’s site designed by a teenager), and I think that is pretty universal, as your own words attest.


#22          (see all posts) 2008/07/20 (Sun) @ 00:00

MGL & Tango.  Maybe John should change the color (I have no problem with it myself, and this is a somewhat silly subject).  What do I know; I have never felt one way or the other.  With that said, when I pick on someone I respect, I make a point to also mention why I respect them, in an effort not to stir up that much ill will.  Let’s not forget that Blastings! Thrilledge is the most intelligently written Mets blog out there (Eric Simon’s commentary at Amazin’ Avenue is not too shabby either).


#23    MGL      (see all posts) 2008/07/20 (Sun) @ 01:52

Paul, I am not familiar with the web site, so I cannot comment on the content, however, I was one of the first ones to laud this author on his work.  I told him, “Great stuff!”

As far as the color of the web site, almost no “respectable” site has a black background for a very good reason.  What kind of a response is, “Yeah, I’ve been told by a lot of people, but I like it?”

I am not a web person, but wouldn’t it take someone about 2 seconds to change it?  If it is such a good site, and I have no doubt that it is, of course he should change the color.  But, it is his site and he can do whatever he wants.  It is no big deal, unless he does not want to lose and/or annoy readers, in which case it is, or should be, a big deal to him.


#24    John Peterson      (see all posts) 2008/07/20 (Sun) @ 17:47

Well, I’m not really seeking a high volume of readers. But the only problem with changing the color is the images I have with black backgrounds (that I worked hard to make).

Are black backgrounds the hallmark of “hacker sites”? MLBTradeRumors.com had a black background for a long time. Also, don’t most people use RSS anyway?

