Tuesday, January 09, 2007
GB plus K equals Great
Rich Lederer gives us a great graphical presentation of pitchers’ GB and K rates. If he could have figured out how to do a three-dimensional chart to include the walk rate, that would have been perfect. Otherwise, I would have suggested K minus BB rate instead (I think Guy first suggested it, and I’m on board with it). It is almost as good as FIP and nice and clean.
WOW! I haven’t had internet for two and a half weeks, my computer crashed, and all I had was a copy of the 2004-2006 THT data I had burned onto a CD. The only real saber stuff I’ve done in that time just happens to be (attempted) run estimation using K/BFP, BB/BFP, and GB/Batted Ball from that data.
I won’t go into the nitty-gritty of all of it, because the lack of data forced me to do some incorrect things, and I’ve found some obvious flaws in what I was doing. But it’s still pretty cool that someone went in the same direction on one of my favorite baseball sites.
I think a fair RA model can be constructed from those 3 stats. As far as I can tell, they’re the most stable peripherals a pitcher has, at least using currently (freely) available data. I came up with regression of 50% toward the mean at 57 batted balls for GB%, 80 BFP for SO%, and 190 BFP for BB%.
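To make the regression numbers concrete, here is a minimal sketch of how a "50% at N" figure is usually applied. It assumes the common shrinkage form in which the observed rate gets weight n / (n + N); the function name and the example league mean are placeholders, not values from my data.

```python
# Sample sizes at which the estimate sits halfway between the
# observed rate and the league mean (figures quoted above).
HALF_POINTS = {
    "gb": 57,   # batted balls, for GB%
    "so": 80,   # BFP, for SO%
    "bb": 190,  # BFP, for BB%
}


def regress_to_mean(observed_rate, sample_size, league_mean, half_point):
    """Shrink an observed rate toward the league mean.

    Assumes the usual form: weight on the observed rate is
    sample_size / (sample_size + half_point), so at
    sample_size == half_point the result is exactly halfway.
    """
    weight = sample_size / (sample_size + half_point)
    return weight * observed_rate + (1 - weight) * league_mean


# Hypothetical pitcher: 60% GB rate over 57 batted balls,
# against a made-up league GB rate of 44%.
halfway = regress_to_mean(0.60, 57, 0.44, HALF_POINTS["gb"])
```

With only 57 batted balls the estimate lands midway between 60% and 44%; with many more batted balls it stays close to the observed 60%.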
I did something like Gassko’s DIPS 3.0, except I calculated LD% of Airballs as GB% * 0.584 + 0.0972, and did something similar for IF/F, but I believe I goofed up my calculations there. The reasoning is that LD% of Airballs correlates at R = .73 with GB% of Batted Balls over an average of 671 Batted Balls, and IF/F correlates at R = -.58 in the same sample (159 pitchers). I then (using some old data I had) converted that into an expected ‘batted against’ line for each pitcher, both for individual years using 2004-2006 data and for all 3 years regressed toward the mean based on each pitcher’s BFP and Batted Balls totals during the time frame. I then used BaseRuns to create a “league neutral” RA for each pitcher.
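As a rough sketch of the first step, this is how the LD% estimate turns a GB rate into a batted-ball split. It assumes rates are fractions (0.45, not 45) and that "Airballs" means every batted ball that is not a grounder; the function names are made up for illustration, and the IF/F step is left out since I don't trust my numbers there.

```python
def ld_share_of_air_balls(gb_rate):
    """Estimated line-drive share of air balls from GB rate.

    Uses the linear fit quoted above: GB% * 0.584 + 0.0972,
    with gb_rate given as a fraction of batted balls.
    """
    return 0.584 * gb_rate + 0.0972


def batted_ball_split(gb_rate):
    """Split batted balls into grounders, line drives, and flies.

    Assumption: air balls are everything that is not a ground
    ball, and flies are whatever air balls are not line drives.
    """
    air = 1 - gb_rate
    ld = air * ld_share_of_air_balls(gb_rate)
    fb = air - ld
    return {"gb": gb_rate, "ld": ld, "fb": fb}


# Hypothetical pitcher with a 45% ground-ball rate.
split = batted_ball_split(0.45)
```

The three shares always sum to 1, so they can be fed straight into whatever per-type hit rates are used to build the expected batting line.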
What stood out to me was how well it treated both high SO% pitchers and high GB% pitchers; the top of the list was littered with both. Of course it loved Felix and Liriano, but I think Liriano’s SO% is somewhat inflated by his relief appearances, and he’ll probably regress somewhat toward the mean. Still, since SO and GB rates are pretty much stable, I think if he’s healthy and returns to ‘06 form, he’ll be one of the best (and possibly the best) SP in baseball.
I also tried calculating Run per BB (using BB%) and Run per Batted Ball (using GB%), and then using (1-SO%-BB%)*RunPerBattedBall to calculate RA. I never did get satisfactory results that way, but I did find that Runs Per Batted Ball decreases with each point of GB%. It’s not linear at all, though; it scales better to GB%^.5 or even GB%^.25 than to GB%.
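For what it's worth, the shape of that second attempt looks roughly like the sketch below. Everything here is an assumption: the intercept, slope, and run value per walk are parameters the caller has to supply (I have no fitted values worth posting), the walk term is my guess at how the two pieces combine, and HBP and everything else are ignored.

```python
def runs_per_batted_ball(gb_rate, intercept, slope, exponent=0.5):
    """Runs per batted ball as a decreasing function of GB rate.

    Assumed form: intercept - slope * gb_rate ** exponent.
    exponent=0.5 reflects the square-root scaling mentioned
    above; 0.25 is the other candidate.
    """
    return intercept - slope * gb_rate ** exponent


def runs_per_bfp(so_rate, bb_rate, gb_rate, run_per_bb,
                 intercept, slope, exponent=0.5):
    """Runs per batter faced from SO%, BB%, and GB%.

    Strikeouts contribute nothing, walks contribute run_per_bb
    each, and the remaining (1 - SO% - BB%) of batters put the
    ball in play at runs_per_batted_ball each.
    """
    batted_ball_share = 1 - so_rate - bb_rate
    per_ball = runs_per_batted_ball(gb_rate, intercept, slope, exponent)
    return bb_rate * run_per_bb + batted_ball_share * per_ball
```

Swapping the exponent between 1, 0.5, and 0.25 and refitting the intercept and slope each time is a simple way to compare the linear and non-linear versions on the same data.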
Anyway, I have a lot of things I can do to improve this, like creating modifiers for NL vs AL for BB%, K%, and GB%, creating park factors for each stat, separating the batting lines of NL and AL pitchers (and perhaps creating separate NL vs AL RA calculations), separating relief and starting pitching (I only looked at starters), etc. Since I’m on dial-up right now and get DSL at my new place tomorrow, I might do this soon. I just got 03-06 Retrosheet data into MS Access, and I hope to get MySQL installed tomorrow; hopefully the Lahman DB will be updated soon, too. Anyway, like I said, no results to post, but I found it fascinating that Baseball Analysts had a piece on this, since it’s been right up my alley the last couple of weeks. And as an M’s fan, any way I tried cutting the data made me happy (about Felix).