
THE BOOK--Playing The Percentages In Baseball


Tuesday, April 24, 2007

Pitching Measure: call for comments

By Tangotiger, 11:06 AM

Someone is asking for more feedback on his work.


#1    Chris Miller      (see all posts) 2007/04/24 (Tue) @ 13:19

Well, LD% and IF% should probably be heavily regressed, and I’d use BaseRuns or your Markov approach to calculate “expected” runs instead of linear run values.

I’ve done something similar using just K%, BB%+HBP%, BattedBall%, and GB% of Batted Balls (all percentages of BFP) to create “expected” lines, and then used BaseRuns to calculate the expected run values.  I had a version of it using LD% (as a percentage of Balls in the Air), since it has a strong correlation to GB% (for pitchers at least).  I don’t have the data handy, but it was something like r=.72 for >=100 IP, and IF% of non-LD fly balls was correlated r=.57 to GB%.  I used that as a basis to create “expected” IF and LD percentages, weighting them closer to the league averages.  I used LD/Air=GB/BattedBalls*.625+.0875 (or something close to that), and can’t remember what I did for IF/F.  Truth be told, I think using something similar to that as a regression point, together with the Z-Score method, would have been more accurate.  Also, again with no data handy, the NL had a much higher spread in IF/F than the AL (something like stdev(Z)=1.15 in the AL and 1.5 in the NL), but I don’t have the spreadsheets handy to give the exact numbers.  One other thing I did was use a regression for the peripherals (K, BB, and GB%), even if it didn’t make much of a difference for most pitchers.
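As a rough illustration of the LD/Air line quoted above, here is a minimal Python sketch. The slope and intercept are the ones recalled from memory in the comment ("or something close to that"), so treat them as approximate; the function name is hypothetical.

```python
def expected_ld_per_air(gb_per_batted_ball, slope=0.625, intercept=0.0875):
    """Expected line drives as a share of balls in the air, given a GB rate.

    The coefficients are the approximate values recalled in the comment,
    not a verified fit.
    """
    return gb_per_batted_ball * slope + intercept


# Hypothetical GB rates, as a fraction of batted balls, for illustration only.
for gb_rate in (0.35, 0.45, 0.55):
    print(f"GB/BattedBalls = {gb_rate:.2f} -> "
          f"expected LD/Air = {expected_ld_per_air(gb_rate):.4f}")
```

The result is only an "expected" rate; the comment describes weighting it closer to the league average rather than using it directly.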

Also, in the NL, pitchers have a higher GB% than DHs, so I would use a modifier of about .9 (I can’t remember the actual number; I think it was >.9), so that a 55% GB/Batted Ball becomes roughly 50%.  It’s a similar idea to the DIPS 3.0 stuff David Gassko wrote about.  I’ve mentioned it here, but I wouldn’t call what I did a study.

I think GB% needs the least regression of any of a pitcher’s peripherals (actually, the pitch data on B-Ref may be stronger, but I looked before that was available), even less than K% and BB%.  It makes sense to use it in a DIPS-like stat but, as has been mentioned here, if you have enough data, regression to the mean is probably a better way to evaluate pitchers.

That said, I like Graham’s idea.  My suggestion would be to regress the peripherals, especially LD% and IF%, create an “expected” line, and use BaseRuns to determine the run value of the final line.

***

I’ve thought about re-doing what I did with Retrosheet data (2003-2006).  I’d have to recalculate everything, since the Retrosheet pop-up and fly-ball designations are different from the BIS IF and OF designations.  I’d especially like to use it to get better league adjustments, park adjustments, and (possibly) a different set of hit values for relievers and starters.  I looked at starter/reliever data using Retrosheet once, and relievers did better on flyballs and worse on groundballs, but that data was using Linear Weights as a basis, so I’d like to re-do it at the component level.

FYI, if anyone is curious, on Lookout Landing and USSM I go by chrisisasavage.


#2    Chris Miller      (see all posts) 2007/04/24 (Tue) @ 13:39

Well, looks like he IS regressing LD%, IF%, and HR% 75% toward the mean.  That’s probably not too far off, but I’d use the Z-Score method to figure how much regression to apply.
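To make the 75% figure concrete, here is a minimal Python sketch of regression to the mean. The function names are hypothetical. The reading of the "Z-Score method" used here is an assumption, not something spelled out in this thread: if luck alone would give Z-scores a standard deviation of 1, the amount of regression is taken as 1 divided by the observed variance of the Z-scores.

```python
def regress_to_mean(observed, league_mean, regression=0.75):
    """Pull an observed rate toward the league mean by the given fraction."""
    return observed + regression * (league_mean - observed)


def regression_from_z_sd(sd_of_z):
    """Assumed Z-score method: the share of observed variance that is random.

    If random variation alone gives Z-scores a standard deviation of 1,
    an observed standard deviation of sd_of_z implies regressing by
    1 / sd_of_z**2.
    """
    if sd_of_z <= 1.0:
        return 1.0  # no detectable spread in talent: regress all the way
    return 1.0 / sd_of_z ** 2


# Hypothetical pitcher: 8% HR/F in a league averaging 11% (made-up numbers).
print(regress_to_mean(0.08, 0.11))

# Z-score spreads quoted from memory in comment #1 for IF/F.
for league, sd in (("AL", 1.15), ("NL", 1.5)):
    print(league, round(regression_from_z_sd(sd), 2))
```

Under that assumed reading, a Z-score spread of 1.15 works out to regressing roughly three quarters of the way to the mean, while a spread of 1.5 calls for noticeably less.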


#3    Chris Miller      (see all posts) 2007/04/24 (Tue) @ 13:53

One more thing.  I think once the Hit Tracker data is finished for the year, we’ll have a better idea of why some people can have lower HR/F.  I believe Jamie Moyer, for example, had the lowest average distance of a HR last year per Hit Tracker.  That makes sense.  Having watched him during his tenure as a Mariner, especially after the move to Safeco, my Dad and I hypothesized that batters were not getting as much distance off of him, perhaps because of the much lower than normal pitching speeds, and because changing speeds a lot makes him harder to hit.  Also a lot of high GB guys have high HR/F numbers.  That totally goes in the face of a lot of currently held sabermetric views, but I think hit tracker will help clear a lot of it up.

I think that the batted ball data is wonderful, but with the new B-Ref pitch data, Hit Tracker, and extended Gameday, the sabermetric community is on the verge of understanding things in much more detail.  However, it opens up the possibility of misinterpreting what the data means, much like what happened when DIPS was introduced.


#4          (see all posts) 2007/04/24 (Tue) @ 14:16

I’d guess you’re right about the lower pitching speeds impacting the HR distance.  It’s just a theory of mine, but I think pitch speed is the biggest factor in the original DIPS adjustment for knuckleballers, and the smaller one for lefties.  They throw slower, thus balls are hit slower off of them, giving fielders more time to react.  I’m not sure Moyer’s changing speeds have a huge impact on the distance of balls hit, but the overall low speed is almost certainly a factor.  Where does Wakefield fall on this list of avg. distance per HR?  A better question would be average MPH per fly ball, I think, because if you play in a field with 200 foot fences, your average HR distance will be small no matter how you pitch.


#5    David Cameron      (see all posts) 2007/04/24 (Tue) @ 14:27

“Also a lot of high GB guys have high HR/F numbers.  That totally goes in the face of a lot of currently held sabermetric views, but I think hit tracker will help clear a lot of it up.”

I’m not sure why you think that flies in the face of sabermetric views.  A bunch of us have studied HR/F rates, and we’ve all come to the same conclusion - as a group, GB pitchers post higher HR/F rates than FB pitchers. 

This is pretty much an accepted fact among everyone who has looked at the issue, as far as I know.  I know I’ve mentioned it many times on USSM.


#6    Chris Miller      (see all posts) 2007/04/24 (Tue) @ 14:29

I’ve been thinking about mining the Hit Tracker data.  I think it’ll be more useful for 2007, seeing how he’s tracking all fly-balls.  I agree, the velocity is probably a better proxy than distance, until we get all of the flyball data.


#7    Chris Miller      (see all posts) 2007/04/24 (Tue) @ 14:34

Sorry David, you’re right, which is why I even mentioned it.  Two things, I kind of lumped the GB pitcher part, and the Moyer part together.  Also I agree, you, and others, have looked at the issue and you have pointed that out, so it’s not “in the face of” per se, bad choice of words I guess.  Even w/ the HR distance/velocity, I suspect a lot of saber people realized this, even before Hit tracker, they just couldn’t quantify it.  I stand corrected.


#8    David Gassko      (see all posts) 2007/04/24 (Tue) @ 15:37

David, I don’t think that’s true at all:

http://www.hardballtimes.com/main/article/the-truth-about-the-grounder/

“The correlation between a pitcher’s ground ball rate, and the percentage of the balls hit in the air against him (measured as outfield fly balls plus line drives) that land in the seats is -.05, otherwise known as non-existent.”


#9    Chris Miller      (see all posts) 2007/04/24 (Tue) @ 15:53

If I look at all qualified pitchers (using THT data) 2004-2006, the correlation between GB% and HR/F is r=.19.  The average HR/F is 12.27%.  If I look only at pitchers who had a >=52% GB rate (1 SD above the average from that sample) for a season, the average is 13.83%.  Also, I believe the relationship is nonlinear rather than linear: if I use GB%^2 I get r=.20, and with GB%^5, r=.22.  So it seems to be more pronounced at the extremes.  However, when I looked at Retrosheet data, high GB pitchers did very slightly better overall on HR/LD (I believe HR/F on the THT stats page is HR/OF, is that correct?), and worse on HR/OF.  Overall, I got r=-.07 for HR/Air using Retrosheet (so yeah, similar to what you put in the article), but I’m not sure how I was filtering that (how many Batted Balls).
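The comparison described above (correlating HR/F with GB%, then with powers of GB%) can be sketched in a few lines of Python. The pitcher-season numbers below are made up purely to show the mechanics; they are not the THT or Retrosheet data, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical pitcher seasons (illustrative values only).
gb_pct = [0.36, 0.40, 0.43, 0.45, 0.48, 0.52, 0.58, 0.63]
hr_per_f = [0.125, 0.110, 0.130, 0.118, 0.121, 0.127, 0.135, 0.148]

# Correlate HR/F with GB%, GB%^2 and GB%^5, as in the comment.
for power in (1, 2, 5):
    r = pearson_r([g ** power for g in gb_pct], hr_per_f)
    print(f"GB%^{power}: r = {r:.2f}")

# Average HR/F among seasons at or above a GB% cutoff (52% in the comment).
high_gb = [h for g, h in zip(gb_pct, hr_per_f) if g >= 0.52]
print(f"mean HR/F for GB% >= 52%: {sum(high_gb) / len(high_gb):.4f}")
```

A slightly higher r for the higher powers would be consistent with an effect concentrated at the extremes, but with differences as small as .19 versus .22 it is weak evidence either way.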


#10    tangotiger      (see all posts) 2007/04/24 (Tue) @ 15:57

We talked about this last year:
http://www.insidethebook.com/ee/index.php/site/comments/pitching_components/#40

This was my quick study on the matter, where I concluded:

I think this is a case where in the very extreme GB guys, they will have a higher-than-normal HR rate, but over the whole population of pitchers, the relationship is fairly weak.


#11    David Cameron      (see all posts) 2007/04/24 (Tue) @ 17:58

Right - from that same thread, comment #29 was where I laid out the small study I did on the ‘04 to ‘06 starting pitchers and their HR/F rates.  The extreme groundball group - Halladay, Lowe, Webb, Felix, etc… - posted a HR/F rate about 1.5% higher than the rest of the population. 

It’s not a huge difference, but it’s there, and it makes intuitive sense.


#12    Matthew Carruth      (see all posts) 2007/04/25 (Wed) @ 13:38

I took a look at GB% and HR/FB% for all pitcher seasons 1988-2006 sans 1999 (RS data), normalized around the league’s numbers for each season and blocked on throwing hand.

I don’t see any correlation between GB% and HR/FB%.

http://farm1.static.flickr.com/172/472612892_71955956c5_o.png


#13    tangotiger      (see all posts) 2007/04/25 (Wed) @ 14:41

The theory is that guys who are exceptional in the GB rates are the ones who will show this tendency.

Can you do the following:
1 - don’t look at individual seasons, but look at pitcher careers (within the dataset you have)
2 - compare to the weighted league mean
3 - figure out how many SD their performance is from the mean

That is, follow this process:
http://www.tangotiger.net/dipsbands.html

Report back on the top 20 GB pitchers of that time period, and let us know their career HR/FB frequency, relative to league average.
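The three steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the league mean is weighted by the pitcher's own fly balls in each season, and the standard deviation is the binomial one for that many fly balls. The linked page may define these differently, and the function name and input layout are hypothetical.

```python
from math import sqrt


def career_hr_fb_z(seasons):
    """SDs between a pitcher's career HR/FB and the league mean.

    seasons: list of (hr, fb, league_hr_fb_rate) tuples, one per season.
    """
    total_hr = sum(hr for hr, fb, league_rate in seasons)
    total_fb = sum(fb for hr, fb, league_rate in seasons)
    # Step 1: career rate, pooled over every season in the dataset.
    observed = total_hr / total_fb
    # Step 2: league mean, weighted by the pitcher's fly balls each season.
    league_mean = sum(fb * league_rate for hr, fb, league_rate in seasons) / total_fb
    # Step 3: binomial standard deviation (an assumption of this sketch).
    sd = sqrt(league_mean * (1 - league_mean) / total_fb)
    return (observed - league_mean) / sd


# Hypothetical career: (HR allowed, fly balls, league HR/FB rate) per season.
career = [(18, 160, 0.105), (22, 175, 0.110), (15, 150, 0.112)]
print(f"career HR/FB is {career_hr_fb_z(career):+.2f} SD from the league mean")
```

A positive value means the pitcher gave up more home runs per fly ball than the league did over the same seasons.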


#14    Matthew Carruth      (see all posts) 2007/04/27 (Fri) @ 15:53

It was actually a long enough process that I decided to go ahead and write it up as an article, so the full-blown piece should appear on THT at some point soon, but I’ll give you the punchline:

Looking at the top and bottom 20 GB pitchers, comparing their HR/FB rates to the league HR/FB, and using the individualized number of flyballs as the population mean yielded:

for bottom 20: on weighted average, they were about 3/4 of a SD higher than the league mean

for top 20: on weighted average, they were about 1/4 of a SD lower than the league mean


#15    Tangotiger      (see all posts) 2007/04/27 (Fri) @ 16:14

Just to be clear, the 20 guys least prone to give up a GB were also more prone than average in HR/FB?  And to a lesser extent, the 20 high GB guys gave up HR/FB at a rate just below the average?

Fantastic… looking forward to the article.


#16    Chris Miller      (see all posts) 2007/04/27 (Fri) @ 16:41

Well, that’s very interesting.  Perhaps the “apparent” positive correlation between GB% and HR/F that David is talking about, and that I thought I saw in the THT data, is the result of limited data, or perhaps differences between the BIS and Retrosheet data are to blame.  I know the RS ‘F’ and ‘P’ batted ball types are not the same as OF and IF on the THT website, so that may be it.  Also, I did not look by throwing hand, but only at the whole population, and although I found the same relationship limiting to >=300 Batted Balls as I did for “qualified” starting pitchers (I’ve looked at this a couple times), it could still be a result of a limited sample.


#17    Joliet Jake      (see all posts) 2007/05/11 (Fri) @ 05:35

Interesting, but there is a lot of bias in the HR park factors in this thought that needs to be controlled.

Paul Maholm (Pirates) is a 65.6% GB pitcher and has an 11.8% HR/FB rate (28 HR in 237 FB). That doesn’t seem odd until you look at the park factors.

9 at PNC in 128 IP

10 of the remaining 19 HR came in two of the easiest parks in baseball to hit a HR in, parks that Wang and Webb had little exposure to, for example.

Plus, look at the bias from the catcher’s experience in game management as an effect on the HR/FB rate. Paulino (Pirates) gets very few calls on the black with his mitt waving, forcing his pitchers to come in the zone more, for example.

Just something for you to think about.


#18    Tangotiger      (see all posts) 2007/05/11 (Fri) @ 07:48

Here’s Matt’s article:
http://www.hardballtimes.com/main/article/groundballs-and-homerun-rates/


#19    tangotiger      (see all posts) 2007/05/11 (Fri) @ 10:47

Matt’s conclusion:

The top 20 groundball pitchers had a magnitude of -0.28, while the bottom 20 groundball pitchers had a magnitude of +0.83. This tells us that the more extreme groundball pitchers are indeed seeing lower home run per flyball rates than the overall average, while the worst groundball pitchers are seeing higher home run per flyball rates than the overall average.


#20    Guy      (see all posts) 2007/05/11 (Fri) @ 11:06

It looks like there are a lot of relievers in both samples, which gives me two concerns:  1) sample size, and 2) the possibility that relievers have some particular advantage in HR prevention.  I’m not sure the latter is true, but we know that top closers tend to have below-avg BABIP, and I assume they are selected for the role in part for their ability to prevent HRs. 

In any case, this is potentially a very important insight, but before we reject the conventional view that GB pitchers give up more runs on FBs (and vice-versa), I’d like to see more robust samples limited only to starting pitchers with a significant # of IP.


#21    tangotiger      (see all posts) 2007/05/11 (Fri) @ 16:13

David Appelman did the same as Matt, but focusing on 2002-2006, and using BIS data:
http://www.fangraphs.com/blogs/index.php/fly-balls-and-groundball-pitchers

And he concludes the opposite!

I decided to run a similar analysis using data from 2002 to the present. The average HR/FB rate during that same period is 10.7%. If we look at the 2002-present totals of all pitchers with a groundball percentage (GB%) greater than 55% and more than 100 innings pitched, they have an average HR/FB of 12.2%. That 12.2% is not a weighted average, it’s just a simple average of each qualified pitcher’s HR/FB.

Using the same method, if you look at pitchers with a GB% less than 35%, they have an average HR/FB of 9.9%.

So, in this case, as David Cameron pointed out, the higher the GB rate, the higher the HR/FB rate.
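Because the quoted 12.2% is a simple average rather than a weighted one, it is worth seeing how far the two can diverge. A minimal Python sketch with made-up numbers (not the BIS data), using hypothetical function names:

```python
def simple_average_hr_fb(pitchers):
    """Unweighted mean of each pitcher's HR/FB; every pitcher counts equally."""
    return sum(hr / fb for hr, fb in pitchers) / len(pitchers)


def weighted_average_hr_fb(pitchers):
    """Pooled HR/FB: total home runs over total fly balls."""
    total_hr = sum(hr for hr, fb in pitchers)
    total_fb = sum(fb for hr, fb in pitchers)
    return total_hr / total_fb


# Hypothetical group: (HR, FB) for one heavy-workload starter and one
# reliever with very few fly balls.
group = [(30, 300), (5, 25)]
print(f"simple average:   {simple_average_hr_fb(group):.3f}")
print(f"weighted average: {weighted_average_hr_fb(group):.3f}")
```

With a simple average, a pitcher with few fly balls moves the group figure as much as one with hundreds, so small-sample pitchers can shift the result in either direction.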


#22    Joliet Jake      (see all posts) 2007/05/12 (Sat) @ 02:47

Tango—perhaps another couple of decent studies might be:

1) Do starters with low DERs behind them give up more HR per FB than those with high DERs?

and..

2) How much of an influence do 20+ more pitches than in their last outing have on a pitcher’s HR/9 rate?

(e.g., since making his debut in 2005, Zach Duke has been asked by the Pirates 8 times to throw at least 21 pitches more than in his previous outing.

In those 8 games, the Pirates went 1-7, Duke allowed 34 earned runs in 49 innings of work (6.24 ERA), gave up 71 hits, and 5 home runs. In the remaining 48 games Duke started, he has a fabulous 3.51 ERA.)

youth + too many pitches = disaster for some pitchers


#23    Matthew Carruth      (see all posts) 2007/05/13 (Sun) @ 13:37

I’m looking further into this for perhaps a follow-up, but there are some things I figured I’d share already:

1. Removing (most) RP doesn’t seem to affect the *overall* results. I re-ran the seasonal regression using only pitchers that threw at least 120 innings (good enough proxy in my view) and there’s still no significance.

http://farm1.static.flickr.com/227/496465377_07fcd2f6c4_o.png

2. There’s a huge issue concerning what data to use. I’m still confident that using Retrosheet data is better since it gives us a much larger sample than BIS, but there’s also the matter of cutoffs.

Using a 275IP cutoff for career numbers I get:
HR/FB for GB < 35%: 14.62%
HR/FB for GB > 55%: 12.82%
HR/FB for else: 13.38%

but that a) includes RP, which, I agree with others who have brought it up, may be affecting the breakdowns, and b) isn’t adjusted yearly for the league HR/FB rate, which was steadily rising until about 2004 and has been dropping since then (hmmm, wonder why, though the differences are small).
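One way to address point b) is to index each pitcher's home runs against what the league rate for that season would predict, then pool by GB% group. A minimal Python sketch, assuming the same 35% and 55% cutoffs as the breakdown above; the row layout, names, and numbers are hypothetical.

```python
def gb_bucket(gb_pct):
    """Assign a GB% (as a fraction) to one of the three groups used above."""
    if gb_pct < 0.35:
        return "GB < 35%"
    if gb_pct > 0.55:
        return "GB > 55%"
    return "else"


def relative_hr_fb_by_bucket(rows):
    """Actual HR divided by league-expected HR, pooled within each GB% group.

    rows: (gb_pct, hr, fb, league_hr_fb_rate) per pitcher-season, where
    gb_pct is whatever GB% the caller wants to group on (e.g. career GB%).
    A value above 1.0 means more HR per fly ball than the league that year.
    """
    totals = {}
    for gb_pct, hr, fb, league_rate in rows:
        bucket = gb_bucket(gb_pct)
        actual, expected = totals.get(bucket, (0, 0.0))
        totals[bucket] = (actual + hr, expected + fb * league_rate)
    return {bucket: actual / expected
            for bucket, (actual, expected) in totals.items()}


# Hypothetical pitcher-seasons (made-up values, for illustration only).
rows = [
    (0.30, 14, 110, 0.120),
    (0.33, 9, 80, 0.135),
    (0.45, 20, 160, 0.120),
    (0.60, 8, 70, 0.135),
]
for bucket, index in relative_hr_fb_by_bucket(rows).items():
    print(f"{bucket}: {index:.2f} x league HR/FB")
```

Because each season is compared with its own league rate, a rising or falling league HR/FB over the period no longer leaks into the group comparison.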

I think I’m going to table this for a few weeks until I get back home, at which point I can go back and re-run through Retrosheet’s event files and this time keep track of the games started vs. games breakdown for each pitcher and thus more easily separate SP from RP. I also need to investigate their designations on batted balls for the 2000-2 time period as they seem to be inconsistent.

For now though I still think number 1 is important and illuminating since it’s both a) focused almost exclusively on SP and b) yearly adjusted for overall league changes in HR/FB rate, and the conclusion looks pretty clear: if there’s any correlation between GB rate and HR/FB, it’s not happening linearly on a macro level.

What I can believe, and this has some evidence behind it, is that the relationship is not linear, but more of a double hump (like an m), with the HR/FB rate rising from <~35, falling from >35 to <50, rising again from >50 to <65, and falling off after that. But again, it looks like you’re talking about a weak relationship.

