THE BOOK--Playing The Percentages In Baseball

Tuesday, January 09, 2007

GB plus K equals Great

By Tangotiger, 11:28 AM

Rich Lederer gives us a great graphical presentation of pitchers’ GB and K rates.  If he could have figured out how to do a three-dimensional chart to include the walk rate, that would have been perfect.  Otherwise, I would have suggested K minus BB rate instead (I think Guy first suggested it, and I’m on board with it).  It is almost as good as FIP and nice and clean. 


#1    Chris Miller      (see all posts) 2007/01/09 (Tue) @ 21:48

WOW!  I haven’t had internet for two and a half weeks, and my computer crashed, so all I had was a copy of the 2004-2006 THT data I had burned onto a CD.  The only real saber work I’ve done in that time just happens to be (attempted) run estimation using K/BFP, BB/BFP, and GB/Batted Ball from that data.

I won’t go into the nitty-gritty of it all, because the lack of data forced me to do some incorrect things, and I’ve found some obvious flaws in what I was doing.  Still, it’s pretty cool that someone went in the same direction on one of my favorite baseball sites.

I think a fair RA model can be constructed with those 3 stats.  As far as I can tell they’re the most stable peripherals a pitcher has, at least using currently (freely) available data (I came up w/ regression of 50% toward the mean at 57 batted balls for GB%, 80 BFP for SO%, and 190 BFP for BB%).
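To illustrate the "50% toward the mean at N" figures above, here is a minimal sketch of the usual shrinkage formula, assuming the regressed rate is a weighted average of the observed rate and the league mean. The function name, the league-mean value, and the example pitcher are hypothetical; only the three half-point sample sizes come from the comment.

```python
# Sample sizes at which each rate is regressed 50% toward the mean
# (values taken from the comment above).
HALF_POINTS = {"GB%": 57, "SO%": 80, "BB%": 190}


def regress_to_mean(observed_rate, sample_size, league_mean, half_point):
    """Shrink an observed rate toward the league mean.

    At sample_size == half_point the result sits exactly halfway
    between the observed rate and the league mean.
    """
    weight = sample_size / (sample_size + half_point)
    return weight * observed_rate + (1 - weight) * league_mean


# Hypothetical pitcher: 55% ground balls on 57 batted balls,
# with an assumed league-average GB% of 44%.
example = regress_to_mean(0.55, 57, 0.44, HALF_POINTS["GB%"])
print(round(example, 3))  # 0.495
```

With more batted balls the weight on the observed rate approaches 1, so the estimate moves toward the pitcher's own number.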

I did something like Gassko’s DIPS 3.0, except I calculated LD% of Airballs as GB% * 0.584 + 0.0972, and did something similar for IF/F, but I believe I goofed up my calculations there.  The reasoning is that LD% of Airballs correlates at R = .73 with GB% of Batted Balls over an average of 671 Batted Balls, and IF/F correlated at R = -.58 using the same sample (159 pitchers).  I then (using some old data I had) converted that into an expected ‘batted against’ line for each pitcher, both for individual years in the 2004-2006 data and for all 3 years combined, regressed toward the mean based on each pitcher’s BFP and Batted Balls totals during the time frame.  I then used BaseRuns to create a “league neutral” RA for each pitcher. 

What stood out to me was how well it treated both high SO% pitchers and high GB% pitchers; the top of the list was littered with both.  Of course it loved Felix and Liriano, but I think Liriano’s SO% is somewhat inflated by his relief appearances, and he’ll probably regress somewhat to the mean.  Since SO and GB rates are pretty much stable, though, I think if he’s healthy and returns to ‘06 form, he’ll be one of the best (and possibly the best) SP in baseball. 
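As a rough sketch of the line-drive step described above, the linear formula can be written out directly. The coefficients 0.584 and 0.0972 are from the comment; the function names are hypothetical, and treating "airballs" as every batted ball that is not a ground ball is an assumption about the commenter's definition.

```python
def ld_share_of_airballs(gb_rate):
    """Expected line drives as a share of airballs, from GB% of batted balls."""
    return gb_rate * 0.584 + 0.0972


def ld_share_of_batted_balls(gb_rate):
    """Expected line drives as a share of all batted balls.

    Assumes airballs are all batted balls that are not ground balls.
    """
    return (1 - gb_rate) * ld_share_of_airballs(gb_rate)


# Hypothetical pitcher with a 45% ground-ball rate.
print(round(ld_share_of_airballs(0.45), 4))      # 0.36
print(round(ld_share_of_batted_balls(0.45), 4))  # 0.198
```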

I also tried calculating Run per BB using BB% and Run per Batted Ball (using GB%), and then using (1-SO%-BB%)*RunPerBattedBall to calculate RA, but I never got satisfactory results that way.  I did find that Runs Per Batted Ball decreases with each point of GB%, but it’s not linear at all; it scales better to GB%^.5 or even GB%^.25 than to GB%. 

Anyway, I have a lot of things I can do to improve this, like creating modifiers for NL vs AL for BB%, K%, and GB%, creating park factors for each stat, separating the batting lines of NL and AL pitchers (and perhaps creating separate NL vs AL RA calculations), separating relief and starting pitching (I only looked at starters), etc.  Since I have dial-up now and get DSL at my new place tomorrow, I might do this soon.  I just got 03-06 Retrosheet data into MS Access, and hope to get MySQL installed tomorrow, and hopefully the Lahman DB will be updated soon.  Anyway, like I said, no results to post, but I just found it fascinating that Baseball Analysts had a piece on this, since it’s been right up my alley the last couple of weeks.  And as an M’s fan, any way I tried cutting the data made me happy (about Felix).


#2    tangotiger      (see all posts) 2007/01/09 (Tue) @ 23:04

Nate Silver came up with something, which I critiqued, and then I offered an off-the-cuff model as another way to do it:

http://www.insidethebook.com/ee/index.php/site/comments/quick_eras/


#3    Chris Miller      (see all posts) 2007/01/10 (Wed) @ 00:49

Thanks!  I knew you had done that, but I hadn’t had internet at home and had only been at work for a week during that time (and avoid surfing the web for the most part at work). Originally I was just trying to generate batted ball lines (GB%, IF%, OF%, LD%) from just GB%, then decided to reapply it to an “expected” batted ball line and generated BaseRuns based on it.


#4    David Arnott      (see all posts) 2007/01/10 (Wed) @ 04:00

The walk rate could be indicated by either larger/smaller plot points, with popups as the mouse goes over the points, or with a blue-to-red shading of points (more blue=more walks, more red=fewer). I especially like the color idea.
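The blue-to-red shading idea could be sketched as a simple linear color scale. This is only an illustration: the function name, the RGB convention (floats from 0 to 1), and the example walk-rate range are all assumptions, not anything used for the original chart.

```python
def walk_rate_to_rgb(bb_rate, low, high):
    """Map a walk rate to an (r, g, b) tuple of floats in [0, 1].

    The lowest walk rate is pure red and the highest is pure blue,
    matching "more blue = more walks, more red = fewer".
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp to the range so outliers do not produce invalid colors.
    t = min(max((bb_rate - low) / (high - low), 0.0), 1.0)
    return (1.0 - t, 0.0, t)


# Hypothetical range of walk rates per batter faced: 4% to 12%.
print(walk_rate_to_rgb(0.04, 0.04, 0.12))  # (1.0, 0.0, 0.0)
print(walk_rate_to_rgb(0.12, 0.04, 0.12))  # (0.0, 0.0, 1.0)
```

A tuple in this form can be passed as a point color to most plotting libraries that accept RGB values.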


#5          (see all posts) 2007/01/10 (Wed) @ 10:05

I had mentioned the same thing on Baseball Musings when he linked to this graph.  I also said that the 3D graph would, I think, further cement Liriano as likely the best pitcher in baseball.  He has about the best K rate, one of the best BB rates for anyone who actually strikes people out, and a pretty solid GB rate.

Those are the 3 factors that made Pedro amazing in ‘99 - a tremendous K rate, a minuscule BB rate, and a pretty solid GB rate.  I think Liriano is the best chance at us seeing anything like that again in the near future.


#6    Guy      (see all posts) 2007/01/10 (Wed) @ 11:11

We can keep this in 2 dimensions without losing much just by using K-BB per BF (as Tango suggests above). 
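For reference, K-BB per BF is just strikeouts minus walks, divided by batters faced. A minimal sketch, with a hypothetical function name and made-up season totals:

```python
def k_minus_bb_rate(strikeouts, walks, batters_faced):
    """Return (K - BB) per batter faced."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return (strikeouts - walks) / batters_faced


# Made-up season line: 200 K, 50 BB, 800 BF.
print(k_minus_bb_rate(200, 50, 800))  # 0.1875
```

Plotting this single number against GB rate keeps the chart in two dimensions while still reflecting walks.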

I thought the surprising part of Rich’s analysis was in the followup article on relievers, where he found that GB rate had virtually no impact on ERA, and the impact of K-rate seemed less than for starters.  Perhaps relievers give up fewer HRs (even controlling for their GB/FB tendencies)?  Interesting.....


#7    tangotiger      (see all posts) 2007/01/10 (Wed) @ 11:23

This is the article Guy is talking about:
http://baseballanalysts.com/archives/2007/01/categorizing_pi_1.php

Beware sample size.  The average reliever in his study probably has 40% of the batters faced of the average starter in his study.  On top of which, the way an “earned run” is credited is not necessarily a good way.  At the least, figure an RA, and figure it over a period of years.


#8    Rich Lederer      (see all posts) 2007/01/10 (Wed) @ 18:01

Re sample size, I agree that is an issue at the individual level.  However, the study encompassed more than 10,000 innings pitched in the aggregate.  As such, I think the information could be statistically meaningful.

I will do my best to try to test the results based on RA as well.


#9    tangotiger      (see all posts) 2007/01/10 (Wed) @ 18:11

Rich, it doesn’t matter if the “aggregate” had 10,000 IP.  You can have 1000 players with 10 IP and that is not equally meaningful to 100 players with 100 IP.  The 10 IP gives you the uncertainty level for the ERA itself.  The 1000 players tells you of your confidence for that uncertainty.

Just think of flipping a coin.  Does it matter if 10,000 or 10 million people flip a coin 10 times?  The mean “heads” rate will still be .500, with 1 standard deviation = .160.

If on the other hand 10 people flipped a coin 10 million times, the mean will be .500, with an SD of almost zero.  That is, each person will have a heads rate of just about exactly .500.
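The coin-flip numbers follow from the binomial standard deviation of a rate, sqrt(p(1-p)/n), which depends on the number of flips per person and not on how many people are flipping. A short sketch (the function name is hypothetical):

```python
import math


def rate_sd(p, n):
    """Standard deviation of an observed rate over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


# 10 flips each: SD of the heads rate is about 0.158,
# in line with the roughly .160 quoted above.
print(round(rate_sd(0.5, 10), 3))          # 0.158

# 10 million flips each: SD is almost zero.
print(round(rate_sd(0.5, 10_000_000), 6))  # 0.000158
```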


#10    Rich Lederer      (see all posts) 2007/01/11 (Thu) @ 11:55

I understand your point, but I’m not sure it is applicable here.  The average ERA for above-avg GB and K rates, and below-avg GB and K rates, as well as the four quadrants, were all based on weighted averages.  In other words, the averages given were the means and not an equal-weighted average or a median.  Therefore, I believe the aggregate number of innings provides for a statistically meaningful sample size on the whole.

As I mentioned in the article, the use of ERA (or even RA) for relievers can be problematic because there are many instances in which they can escape an inning without having to get three outs.  But I think that is a separate issue.


#11    tangotiger      (see all posts) 2007/01/11 (Thu) @ 12:40

Rich, you are wrong.

The average IP per player gives you an uncertainty-level-of-the-mean.  The number of players gives you an uncertainty level of the uncertainty-level-of-the-mean.

