Tuesday, April 24, 2007
Pitching Measure: call for comments
Someone is asking for more feedback on his work.
Well, it looks like he IS regressing LD%, IF%, and HR% 75% toward the mean. That’s probably not too far off, but I’d use the Z-Score method to figure out how much regression to apply.
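A minimal sketch of what "regressing 75% toward the mean" does to a single rate, assuming the regression is a straight linear blend of the observed rate and the league mean. The second helper shows one common way to make the amount of regression depend on sample size, k / (n + k); the constant `k` is a hypothetical per-stat value, and this is not necessarily the Z-Score method mentioned above.

```python
def regress_to_mean(observed: float, league_mean: float, fraction: float) -> float:
    """Move an observed rate `fraction` of the way toward the league mean.

    fraction = 0.75 means "regress 75% toward the mean"."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return observed + fraction * (league_mean - observed)


def fraction_from_sample(n: float, k: float) -> float:
    """Sample-size-based regression amount: k / (n + k).

    n is the number of opportunities (e.g. fly balls); k is a hypothetical,
    stat-specific constant: the n at which you would regress exactly halfway."""
    return k / (n + k)


# A pitcher with a 16% HR/F in a 12% league, regressed 75% toward the mean:
print(round(regress_to_mean(0.16, 0.12, 0.75), 4))  # 0.13
```

With a sample-size-based fraction, a pitcher with few fly balls gets pulled almost all the way to the mean, while a large career sample keeps more of its own signal.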
One more thing. I think once the Hit Tracker data is finished for the year, we’ll have a better idea of why some pitchers can have a lower HR/F. I believe Jamie Moyer, for example, had the lowest average HR distance last year per Hit Tracker. That makes sense. Having watched him during his tenure as a Mariner, especially after the move to Safeco, my Dad and I hypothesized that batters were not getting as much distance off of him; perhaps his much-lower-than-normal pitch speeds, combined with changing speeds a lot, make him harder to hit. Also a lot of high GB guys have high HR/F numbers. That totally goes in the face of a lot of currently held sabermetric views, but I think hit tracker will help clear a lot of it up.
I think the batted ball data is wonderful, but with the new B-Ref pitch data, Hit Tracker, and extended Gameday, the sabermetric community is on the verge of understanding things in much more detail. However, that also opens up the possibility of misinterpreting what the data mean, much like what happened when DIPS was introduced.
I’d guess you’re right about the lower pitch speeds affecting HR distance. It’s just a theory of mine, but I think pitch speed is the biggest factor behind the original DIPS adjustment for knuckleballers, and the smaller one for lefties. They throw slower, so balls are hit more slowly off of them, giving fielders more time to react. I’m not sure Moyer’s changing speeds have a huge impact on the distance of balls hit, but his overall low velocity is almost certainly a factor. Where does Wakefield fall on this list of average distance per HR? A better question would be average MPH per fly ball, I think, because if you play in a park with 200-foot fences, your average HR distance will be small no matter how you pitch.
Also a lot of high GB guys have high HR/F numbers. That totally goes in the face of a lot of currently held sabermetric views, but I think hit tracker will help clear a lot of it up.
I’m not sure why you think that flies in the face of sabermetric views. A bunch of us have studied HR/F rates, and we’ve all come to the same conclusion - as a group, GB pitchers post higher HR/F rates than FB pitchers.
This is pretty much an accepted fact among everyone who has looked at the issue, as far as I know. I know I’ve mentioned it many times on USSM.
I’ve been thinking about mining the Hit Tracker data. I think it’ll be more useful for 2007, seeing as he’s tracking all fly balls. I agree, velocity is probably a better proxy than distance until we get all of the fly ball data.
Sorry David, you’re right, which is why I even mentioned it. Two things: I kind of lumped the GB pitcher part and the Moyer part together. Also, I agree that you and others have looked at the issue and pointed that out, so it’s not “in the face of” per se; bad choice of words, I guess. Even with HR distance/velocity, I suspect a lot of saber people realized this before Hit Tracker; they just couldn’t quantify it. I stand corrected.
David, I don’t think that’s true at all:
http://www.hardballtimes.com/main/article/the-truth-about-the-grounder/
“The correlation between a pitcher’s ground ball rate, and the percentage of the balls hit in the air against him (measured as outfield fly balls plus line drives) that land in the seats is -.05, otherwise known as non-existent.”
If I look at all qualified pitchers (using THT data) from 2004-2006, the correlation between GB% and HR/F is r = .19. The average HR/F is 12.27%. If I look only at pitcher seasons with a GB rate of 52% or higher (1 SD above the average for that sample), the average is 13.83%. Also, I believe the relationship is not linear: if I use GB%^2 I get r = .20, and GB%^5 gives r = .22. So the effect seems to be most pronounced at the extremes. However, when I looked at Retrosheet data, high GB pitchers did very slightly better overall on HR/LD (I believe HR/F on the THT stats page is HR/OF; is that correct?), and worse on HR/OF. Overall, I got r = -.07 for HR/Air using Retrosheet (so yes, similar to what you found in the article), but I’m not sure how I was filtering that (how many batted balls).
We talked about this last year:
http://www.insidethebook.com/ee/index.php/site/comments/pitching_components/#40
This was my quick study on the matter, where I concluded:
I think this is a case where in the very extreme GB guys, they will have a higher-than-normal HR rate, but over the whole population of pitchers, the relationship is fairly weak.
Right - from that same thread, comment #29 was where I laid out the small study I did on the ‘04 to ‘06 starting pitchers and their HR/F rates. The extreme groundball group - Halladay, Lowe, Webb, Felix, etc… - posted a HR/F rate about 1.5% higher than the rest of the population.
It’s not a huge difference, but it’s there, and it makes intuitive sense.
I took a look at GB% and HR/FB% for all pitcher seasons 1988-2006 sans 1999 (RS data), normalized around the league’s numbers for each season and blocked on throwing hand.
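One way to do the season-and-hand normalization described above is to index each pitcher-season to the league rate for the same season and throwing hand (1.0 = league average). This is a sketch under that assumption; the record layout (`season`, `hand`, `bip`, `gb`, `fb`, `hr`) is hypothetical, and the actual study may have normalized differently (e.g. by subtraction or z-scores).

```python
from collections import defaultdict


def normalize_by_league(rows):
    """Index GB% and HR/FB to the league rate for the same season and hand.

    rows: dicts with keys season, hand, bip, gb, fb, hr (counts; bip and fb
    must be positive). Returns a list of (gb_index, hrfb_index) tuples in the
    same order as rows, where 1.0 means exactly league average."""
    # Pool counts for each (season, throwing hand) block.
    totals = defaultdict(lambda: {"bip": 0, "gb": 0, "fb": 0, "hr": 0})
    for r in rows:
        t = totals[(r["season"], r["hand"])]
        for key in t:
            t[key] += r[key]

    indexed = []
    for r in rows:
        t = totals[(r["season"], r["hand"])]
        league_gb = t["gb"] / t["bip"]
        league_hrfb = t["hr"] / t["fb"]
        indexed.append(((r["gb"] / r["bip"]) / league_gb,
                        (r["hr"] / r["fb"]) / league_hrfb))
    return indexed
```

Correlating the two index columns then compares pitchers only against their own season and hand, so league-wide drift in HR/FB does not masquerade as a GB effect.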
I don’t see any correlation between GB% and HR/FB%.
http://farm1.static.flickr.com/172/472612892_71955956c5_o.png
The theory is that guys who are exceptional in the GB rates are the ones who will show this tendency.
Can you do the following:
1 - don’t look at individual seasons, but look at pitcher careers (within the dataset you have)
2 - compare to the weighted league mean
3 - figure out how many SD their performance is from the mean
That is, follow this process:
http://www.tangotiger.net/dipsbands.html
Report back on the top 20 GB pitchers of that time period, and let us know their career HR/FB frequency, relative to league average.
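A sketch of the per-pitcher step of that process, assuming a binomial model: each season’s fly balls are treated as independent trials at that season’s league HR/FB rate, so both the expected HR total and its standard deviation scale with the pitcher’s own FB counts. The function name and tuple layout are hypothetical, and this is one reading of the linked method rather than a copy of it.

```python
import math


def career_hrfb_z(seasons):
    """Career HR/FB for one pitcher, in standard deviations from league.

    seasons: list of (fb, hr, league_hrfb) tuples, one per season.
    Expected HR = sum(fb * p); binomial variance = sum(fb * p * (1 - p)).
    Positive z = more HR per FB than league average; negative = fewer."""
    expected = sum(fb * p for fb, _, p in seasons)
    variance = sum(fb * p * (1 - p) for fb, _, p in seasons)
    actual = sum(hr for _, hr, _ in seasons)
    return (actual - expected) / math.sqrt(variance)


# 400 FB at a 10% league rate: expect 40 HR with SD 6, so 50 HR is z of about 1.67
print(round(career_hrfb_z([(400, 50, 0.10)]), 2))  # 1.67
```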
It was actually a long enough process that I decided to go ahead and write it up as an article, so the full piece should appear on THT at some point soon, but I’ll give you the punchline:
Looking at the top and bottom 20 GB pitchers and comparing their HR/FB rates to the league HR/FB, using each pitcher’s own number of fly balls to set the spread around the league mean, yielded:
for bottom 20: on weighted average, they were about 3/4 of a SD higher than the league mean
for top 20: on weighted average, they were about 1/4 of a SD lower than the league mean
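The group summary above (top and bottom 20 by GB rate, each expressed as a weighted average of SDs from the league mean) might be computed as below. Weighting each pitcher by fly balls is an assumption, since the thread says "weighted average" without specifying the weights; the field names are hypothetical.

```python
def top_bottom_magnitudes(pitchers, n=20):
    """Weighted mean z-score for the n highest- and n lowest-GB% pitchers.

    pitchers: list of dicts with keys gb_rate, z (SDs from league HR/FB)
    and fb (career fly balls, used as the weight). If there are fewer than
    2n pitchers, the two groups overlap.
    Returns (top_n_magnitude, bottom_n_magnitude)."""
    ranked = sorted(pitchers, key=lambda p: p["gb_rate"], reverse=True)

    def weighted_z(group):
        total_fb = sum(p["fb"] for p in group)
        return sum(p["z"] * p["fb"] for p in group) / total_fb

    return weighted_z(ranked[:n]), weighted_z(ranked[-n:])
```

A negative top-group magnitude with a positive bottom-group magnitude is the pattern reported above: extreme groundballers slightly below league HR/FB, extreme flyballers well above it.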
Just to be clear, the 20 guys least prone to give up a GB were also more prone than average in HR/FB? And to a lesser extent, the 20 high GB guys gave up HR/FB at a rate just below the average?
Fantastic… looking forward to the article.
Well, that’s very interesting. Perhaps the “apparent” positive correlation between GB% and HR/F that David is talking about, and that I thought I saw in the THT data, is the result of limited data, or perhaps differences between the BIS and Retrosheet data are to blame. I know the Retrosheet ‘F’ and ‘P’ batted ball types are not the same as OF and IF on the THT website, so that may be it. Also, I did not split by throwing hand; I only looked at the whole population. And although I found the same relationship when limiting to >=300 batted balls as I did for “qualified” starting pitchers (I’ve looked at this a couple of times), it could still be the result of a limited sample.
Interesting, but there is a lot of park-factor bias in HR rates here that needs to be controlled for.
Paul Maholm (Pirates) is a 65.6% GB pitcher and has an 11.8% HR/FB rate (28 HR in 237 FB). That doesn’t seem odd until you look at the park factors:
9 at PNC in 128 IP
10 of the remaining 19 HR came in two of the easiest parks in baseball to hit a HR in, parks that Wang and Webb, for example, had little exposure to.
Plus, look at the bias from the catcher’s experience in game management as an effect on the HR/FB rate. Paulino (Pirates), for example, gets very few calls on the black with his mitt waving, forcing his pitchers to come into the zone more.
Just something for you to think about.
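A rough sketch of controlling for park before comparing HR/FB rates: deflate the home runs allowed in each park by that park’s HR factor (1.0 = neutral), then divide by total fly balls. The park groupings and factor values below are hypothetical placeholders, not real park factors for PNC or any other park; only the 28 HR / 237 FB totals and the 9/10/9 split come from the Maholm example.

```python
def park_adjusted_hrfb(hr_by_park, fb_total, hr_park_factor):
    """Park-neutral HR/FB.

    hr_by_park: {park: HR allowed there}
    hr_park_factor: {park: HR factor, where 1.25 means 25% more HR than neutral}
    """
    neutral_hr = sum(hr / hr_park_factor[park] for park, hr in hr_by_park.items())
    return neutral_hr / fb_total


# Maholm's split from above (9 at home, 10 in two HR-friendly parks, 9 elsewhere),
# with made-up factors just to show the mechanics:
hr_split = {"home": 9, "hr_friendly_parks": 10, "elsewhere": 9}
factors = {"home": 1.0, "hr_friendly_parks": 1.25, "elsewhere": 1.0}
print(round(28 / 237, 3))  # 0.118 (raw HR/FB)
print(round(park_adjusted_hrfb(hr_split, 237, factors), 3))
```

Ideally the fly balls would be split by park as well, since a park’s factor applies to the fly balls hit there; pooling FB is a simplification.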
Here’s Matt’s article:
http://www.hardballtimes.com/main/article/groundballs-and-homerun-rates/
Matt’s conclusion:
The top 20 groundball pitchers had a magnitude of -0.28, while the bottom 20 groundball pitchers had a magnitude of +0.83. This tells us that the more extreme groundball pitchers are indeed seeing lower home run per flyball rates than the overall average, while the worst groundball pitchers are seeing higher home run per flyball rates than the overall average.
It looks like there are a lot of relievers in both samples, which gives me two concerns: 1) sample size, and 2) the possibility that relievers have some particular advantage in HR prevention. I’m not sure the latter is true, but we know that top closers tend to have below-avg BABIP, and I assume they are selected for the role in part for their ability to prevent HRs.
In any case, this is potentially a very important insight, but before we reject the conventional view that GB pitchers give up more runs on FBs (and vice-versa), I’d like to see more robust samples limited only to starting pitchers with a significant # of IP.
David Appelman did the same analysis as Matt, but focusing on 2002-2006 and using BIS data:
http://www.fangraphs.com/blogs/index.php/fly-balls-and-groundball-pitchers
And he concludes the opposite!
I decided to run a similar analysis using data from 2002 to the present. The average HR/FB rate during that same period is 10.7%. If we look at the 2002-present totals of all pitchers with a groundball percentage (GB%) greater than 55% and more than 100 innings pitched, they have an average HR/FB of 12.2%. That 12.2% is not a weighted average, it’s just a simple average of each qualified pitcher’s HR/FB.
Using the same method, if you look at pitchers with a GB% less than 35%, they have an average HR/FB of 9.9%.
So, in this case, as David Cameron pointed out, the higher the GB rate, the higher the HR/FB rate.
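The quoted numbers are simple (unweighted) averages of each pitcher’s HR/FB, which is worth keeping in mind when comparing them with pooled or weighted figures: in a simple average, a pitcher with a handful of fly balls counts as much as one with hundreds. A minimal sketch of the two calculations, with hypothetical inputs:

```python
def simple_mean_hrfb(pitchers):
    """Unweighted mean of each pitcher's HR/FB. pitchers: list of (hr, fb)."""
    return sum(hr / fb for hr, fb in pitchers) / len(pitchers)


def pooled_hrfb(pitchers):
    """Fly-ball-weighted HR/FB, i.e. total HR divided by total FB."""
    return sum(hr for hr, _ in pitchers) / sum(fb for _, fb in pitchers)


# One small-sample pitcher (2 HR in 10 FB) and one workhorse (20 HR in 400 FB)
sample = [(2, 10), (20, 400)]
print(simple_mean_hrfb(sample))  # 0.125
print(round(pooled_hrfb(sample), 4))  # 0.0537
```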
Tango, perhaps another couple of decent studies might be:
1) Do starters with low DERs behind them give up more HR per FB than those with high DERs?
and..
2) How much of an influence does throwing 20+ more pitches than in the previous outing have on a pitcher’s HR/9 rate?
(i.e., since making his debut in 2005, Zach Duke has been asked by the Pirates 8 times to throw at least 21 more pitches than in his previous outing.
In those 8 games, the Pirates went 1-7, Duke allowed 34 earned runs in 49 innings of work (6.24 ERA), gave up 71 hits, and 5 home runs. In the remaining 48 games Duke started, he has a fabulous 3.51 ERA.)
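The Duke split above can be reproduced from a game log with a few lines. This sketch assumes a chronological list of starts with pitch counts, earned runs, and innings (as decimal innings, e.g. 6.333 for 6 1/3); the first start has no previous outing and is counted in the "other" group. The ERA check on the quoted numbers: 9 × 34 / 49 ≈ 6.24.

```python
def era(earned_runs, innings):
    """Earned run average: earned runs per nine innings (None if no innings)."""
    return 9 * earned_runs / innings if innings else None


def split_by_pitch_jump(starts, jump=21):
    """ERA in starts that followed a jump of >= `jump` pitches vs. all others.

    starts: chronological list of (pitches, earned_runs, innings) tuples.
    Returns (flagged_era, other_era)."""
    flagged_er = flagged_ip = other_er = other_ip = 0.0
    previous = None
    for pitches, er, ip in starts:
        if previous is not None and pitches - previous >= jump:
            flagged_er += er
            flagged_ip += ip
        else:
            other_er += er
            other_ip += ip
        previous = pitches
    return era(flagged_er, flagged_ip), era(other_er, other_ip)


print(round(era(34, 49), 2))  # 6.24
```

The same loop could tally HR and innings instead of earned runs to answer the HR/9 version of the question directly.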
youth + too many pitches = disaster for some pitchers
I’m looking further into this for a possible follow-up, but there are some things I figured I’d share already:
1. Removing (most) RPs doesn’t seem to affect the *overall* results. I re-ran the seasonal regression using only pitchers who threw at least 120 innings (a good enough proxy in my view), and there’s still no significance.
http://farm1.static.flickr.com/227/496465377_07fcd2f6c4_o.png
2. There’s a huge issue concerning what data to use. I’m still confident that Retrosheet data is better since it gives us a much larger sample than BIS, but there’s also the question of cutoffs.
Using a 275 IP cutoff for career numbers I get:
HR/FB for GB < 35%: 14.62%
HR/FB for GB > 55%: 12.82%
HR/FB for else: 13.38%
but that a) includes RPs, which, I agree with others who have brought it up, may be affecting the breakdowns, and b) isn’t adjusted year by year for the league HR/FB rate, which was rising steadily until about 2004 and has been dropping since then (hmm, I wonder why, though the differences are small).
I think I’m going to table this for a few weeks until I get back home, at which point I can go back and re-run through Retrosheet’s event files, this time keeping track of the games started vs. games breakdown for each pitcher so I can more easily separate SP from RP. I also need to investigate their batted ball designations for the 2000-02 period, as they seem to be inconsistent.
For now, though, I still think number 1 is important and illuminating, since it’s both a) focused almost exclusively on SP and b) adjusted year by year for overall league changes in HR/FB rate, and the conclusion looks pretty clear: if there’s any correlation between GB rate and HR/FB, it’s not happening linearly on a macro level.
What I can believe, and this has some evidence behind it, is that the relationship is not linear but more of a double hump (like an M), with the HR/FB rate rising up to a GB rate of ~35%, falling from 35% to 50%, rising again from 50% to 65%, and falling off after that. But again, it looks like you’re talking about a weak relationship.
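The double-hump idea can be checked by pooling HR and FB within GB% bins instead of fitting a single line. A minimal sketch using the 35/50/65% breakpoints mentioned above; the input layout is hypothetical, and pitcher-seasons exactly on a breakpoint fall into the higher bin.

```python
from bisect import bisect_right


def binned_hrfb(pitchers, edges=(0.35, 0.50, 0.65)):
    """Pooled HR/FB within GB% bins.

    pitchers: list of (gb_rate, hr, fb). With the default edges the bins are
    <35%, 35-50%, 50-65%, and >=65%. Returns one HR/FB per bin, or None
    for a bin with no fly balls."""
    n_bins = len(edges) + 1
    hr = [0] * n_bins
    fb = [0] * n_bins
    for gb_rate, h, f in pitchers:
        i = bisect_right(edges, gb_rate)  # index of the bin this GB% falls in
        hr[i] += h
        fb[i] += f
    return [h / f if f else None for h, f in zip(hr, fb)]
```

An M shape would show up as bin values that go up, down, up, down; with real data, each bin’s pooled rate should also be compared against its binomial noise before reading much into small differences.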
Well, LD% and IF% should probably be heavily regressed, and I’d use BaseRuns, or your Markov approach, to calculate “expected” runs instead of linear run values.
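For reference, here is a sketch of one commonly cited basic form of BaseRuns, Runs = A × B / (B + C) + D. The B-factor coefficients differ between published versions, so treat these as illustrative; an “expected” pitcher line would be fed in as the counting stats, and folding HBP into `bb` (as in the BB%+HBP% line below) is an assumption.

```python
def base_runs(ab, h, tb, hr, bb):
    """Basic BaseRuns estimate of runs from a batting (or batting-against) line.

    A = baserunners other than home runs, B = advancement factor,
    C = outs (approximated here as AB - H), D = home runs (always score)."""
    a = h + bb - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02
    c = ab - h
    d = hr
    return a * b / (b + c) + d


print(round(base_runs(ab=550, h=150, tb=240, hr=20, bb=50), 1))
```

Unlike linear weights, the B / (B + C) term acts as a score rate that changes with the run environment, which is why it behaves better for extreme lines like those of very good or very bad pitchers.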
I’ve done something similar using just K%, BB%+HBP%, BattedBall%, and GB% of batted balls (all percentages of BFP) to create “expected” lines, and then used BaseRuns to calculate the expected run values. I had a version of it using LD% (as a percentage of balls in the air), since it has a strong correlation to GB% (for pitchers, at least). I don’t have the data handy, but it was something like r = .72 for >=100 IP, and IF% of non-LD fly balls was correlated at r = .57 with GB%. I used that as a basis to create “expected” IF and LD percentages, weighting them closer to the league averages. I used LD/Air = GB/BattedBalls*.625 + .0875 (or something close to that), and can’t remember what I did for IF/F. Truth be told, I think using something similar to that as a regression point, with the Z-Score method, would have been more accurate. Also, again with no data handy, the NL had a much larger spread in IF/F than the AL (something like stdev(Z) = 1.15 in the AL and 1.5 in the NL), but again, I don’t have the spreadsheets handy to give the exact numbers. One other thing I did was apply regression to the peripherals (K, BB, and GB%), even if it didn’t make much of a difference for most pitchers.
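A sketch of building an “expected” line from K%, BB%+HBP%, and GB share, with line drives estimated from the approximate LD/Air = GB/BattedBalls × .625 + .0875 relationship quoted above. The coefficients are the from-memory values given there (“or something close to that”), the function names are hypothetical, and the IF/F split is left out because the thread doesn’t give it.

```python
def expected_ld_share(gb_share, slope=0.625, intercept=0.0875):
    """Expected line drives as a share of balls in the air, from GB/BIP."""
    return slope * gb_share + intercept


def expected_line(bfp, k_rate, bb_rate, gb_share):
    """Split batters faced into K, BB(+HBP), GB, LD, and other fly balls.

    k_rate and bb_rate are per batter faced; gb_share is GB per ball in play."""
    k = bfp * k_rate
    bb = bfp * bb_rate
    bip = bfp - k - bb          # balls in play
    gb = bip * gb_share
    air = bip - gb              # everything hit in the air
    ld = air * expected_ld_share(gb_share)
    return {"K": k, "BB": bb, "GB": gb, "LD": ld, "FB": air - ld}


print(expected_line(1000, 0.20, 0.08, 0.50))
```

Each batted ball type would then be assigned league-average outcome rates (regressed as described above) to produce the hit totals that go into BaseRuns.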
Also, in the NL, pitchers (as hitters) have a higher GB% than DHs, so I would use a modifier of about .9 (I can’t remember the actual number; I think it was >.9), so a 55% GB/batted ball rate becomes 50%. It’s similar to the DIPS 3.0 work David Gassko wrote about. I’ve mentioned it here before, but I wouldn’t call what I did a study.
I think GB% needs the least regression of any pitcher peripheral (the pitch data on B-Ref may actually be stronger, but I looked before that was available), even less than K% and BB%, so it makes sense to use it in a DIPS-like stat. But as has been mentioned here, if you have enough data, regression to the mean is probably a better way to evaluate pitchers.
That said, I like Graham’s idea; I think it’s a good one. My suggestion would be to regress the peripherals, especially LD% and IF%, create an “expected” line, and use BaseRuns to determine the run value of the final line.
***
I’ve thought about re-doing what I did with Retrosheet data (2003-2006). I’d have to recalculate everything, since the Retrosheet pop-up and fly-ball designations are different from the BIS IF and OF designations. I’d especially like to use it to get better league adjustments, park adjustments, and (possibly) a separate set of hit values for relievers and starters. I looked at starter/reliever data using Retrosheet once, and relievers did better on fly balls and worse on ground balls, but that analysis used Linear Weights as a basis, so I’d like to re-do it at the component level if I were to re-do what I did before.
FYI, if anyone is curious, on Lookout Landing and USSM I go by chrisisasavage.