THE BOOK--Playing The Percentages In Baseball


Tuesday, September 30, 2008

Splitting the batting lines into binomial metrics

By Tangotiger, 10:56 AM

Pizza lays out the idea.  As studes noted, we talked about this a lot in the past.

What Brian suggests in the comments is the way I normally approach the problem, and the way Voros did it.  Here are my aging patterns by these metrics.

I also echo Pizza’s position on where to put the HR.  Sometimes I do it the way Pizza says it, and sometimes the way Voros says it.  The fact of the matter is that you can construct two equally plausible scenarios.

There is an undeniable relationship between K, BB, and HR.  There is also an undeniable relationship between HR and FB (and to a lesser extent LD).  The only right way to do it is to model this relationship.  If, for example, you do it as Pizza proposes, then you need an additional function on the HR/FB rate that includes the K and BB rates.  If you do it as Voros proposes, you need to include the FB rate when applying the HR rate.


#1          (see all posts) 2008/09/30 (Tue) @ 11:56

This is how I’ve been doing MLEs & projections.

Start with a fixed number of plate appearances. First adjust the HBP, BB & SO. Then find how many balls hit fair are left. How many are home runs? That leaves balls in play. How many of those went for hits? How many of those went for extra bases? How many of those were triples?
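The chain of questions above can be sketched as a series of successive splits. This is only an illustration in Python, not the commenter's actual code: the function name, the rate names, and the sample rates are hypothetical, and it assumes HBP, BB and SO are each expressed per plate appearance.

```python
def split_batting_line(pa, rates):
    """Walk a fixed number of plate appearances through a chain of
    yes/no splits. All rate names are hypothetical placeholders."""
    # Step 1: take out the events that never put the ball in fair territory.
    hbp = pa * rates["hbp_per_pa"]
    bb = pa * rates["bb_per_pa"]
    so = pa * rates["so_per_pa"]
    fair = pa - hbp - bb - so            # balls hit fair

    # Step 2: of the balls hit fair, how many are home runs?
    hr = fair * rates["hr_per_fair"]
    bip = fair - hr                      # balls in play

    # Step 3: of the balls in play, how many went for hits?
    hits = bip * rates["hit_per_bip"]
    outs_in_play = bip - hits

    # Step 4: of those hits, how many went for extra bases?
    xbh = hits * rates["xbh_per_hit"]
    singles = hits - xbh

    # Step 5: of those extra-base hits, how many were triples?
    triples = xbh * rates["triple_per_xbh"]
    doubles = xbh - triples

    return {
        "HBP": hbp, "BB": bb, "SO": so, "HR": hr,
        "1B": singles, "2B": doubles, "3B": triples,
        "outs_in_play": outs_in_play,
    }


# Made-up rates, for illustration only.
example_rates = {
    "hbp_per_pa": 0.01, "bb_per_pa": 0.09, "so_per_pa": 0.18,
    "hr_per_fair": 0.05, "hit_per_bip": 0.30,
    "xbh_per_hit": 0.25, "triple_per_xbh": 0.10,
}
line = split_batting_line(600, example_rates)
```

Because each step only divides what the previous step left over, the terminal outcomes always add back up to the starting plate appearances.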

I believe one point Pizza was making is that each of these should be regressed by different amounts. We discussed this here a couple months ago while I was working on my projections. I agree, it is true, but I also found that if I use 150 PAs of regression for all components it just didn’t make that much of a difference. For example, babip never found a minimum rms even after 1000 PAs, but the slope was so gradual that the difference in rms between 150 and 1000 PAs wasn’t really consequential (at least for batters).
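One common way to read "150 PAs of regression" is to add 150 opportunities of league-average performance to the observed sample. The sketch below assumes that reading; the function name and the example rates are hypothetical, and the commenter's actual procedure may differ.

```python
def regress_rate(observed_rate, opportunities, league_rate, ballast=150):
    """Blend an observed rate with the league rate by adding `ballast`
    opportunities of league-average performance (an assumed method)."""
    weighted_sum = observed_rate * opportunities + league_rate * ballast
    return weighted_sum / (opportunities + ballast)


# With exactly as many real opportunities as ballast, the estimate
# lands halfway between the observed rate and the league rate.
halfway = regress_rate(0.350, 150, 0.300)

# A larger sample moves the estimate closer to the observed rate.
larger_sample = regress_rate(0.350, 600, 0.300)
```

Using a different `ballast` per component is how one would express "regressed by different amounts" in this form.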

Heads up - I will be discussing how each of the components age on Thursday.


#2    Sky      (see all posts) 2008/09/30 (Tue) @ 12:59

I believe one point Pizza was making is that each of these should be regressed by different amounts.

I’m actually really surprised this idea hasn’t been fleshed out yet.  We’ve all been aware of its truth for years.

I believe the guys developing StatCorner.com (and tRA) have done some work on this (at least for pitchers, and they haven’t published it yet), but I’m really looking forward to seeing everyone’s results.


#3    Tangotiger      (see all posts) 2008/09/30 (Tue) @ 13:57

Four years ago, MGL did work here:
http://tangotiger.net/mgl/regression.pdf

And we had a thread here:
http://www.tangotiger.net/archives/stud0274.shtml

Post 6 gives you the summary:
http://www.tangotiger.net/archives/stud0274.shtml#1006


#4    Pizza Cutter      (see all posts) 2008/09/30 (Tue) @ 14:15

On HR rates, the pitcher gives up the flyball, but the batter makes it a home run.  That’s an over-simplification, but it gets the basic point across.  So, HR rates for pitchers will be about as consistent as FB rates over a long period of time.

And yeah, the point I’m making is that the regression coefficients are different from stat to stat.  In the background, I’ve been generating tables on exactly what those coefficients are on a PA-by-PA basis.  It’s slow to come, but it will eventually be out and available to all who are interested.


#5    Sky      (see all posts) 2008/09/30 (Tue) @ 15:40

Tango/3: thanks for the links, I don’t remember seeing those before.  I was thinking more of regression with regard to batted ball data, though, taking BABIP to the next sub-level.  Regression of GB%, LD%, FB%, etc.  But also regression of run-values on those things, such as a player’s SLG on LDs, HR/FB, AVG on FB, etc.


#6    Tangotiger      (see all posts) 2008/09/30 (Tue) @ 16:11

We did those as well a long while ago.  Check under Batted_Ball here:
http://www.insidethebook.com/ee/index.php/site/categorylinks/


#7    Tangotiger      (see all posts) 2008/09/30 (Tue) @ 16:12

We did something along those lines anyway… don’t remember exactly what…


#8    Tangotiger      (see all posts) 2008/09/30 (Tue) @ 16:21

And of course, MGL’s DIPS piece has the information you need:
http://www.baseballthinkfactory.org/files/primate_studies/discussion/lichtman_2004-02-29_0/

(Note: that is a ground-breaking piece, and should be required reading for all newcomers.)

He reports not only the correlation for the frequency of event types (GB, FB, LD), but also the correlation for the success on each event type.

He also points out that each data pair has around 500 BIP.

Those data should be enough to determine the regression components Sky is asking about.


#9    Tangotiger      (see all posts) 2008/09/30 (Tue) @ 16:29

I’ll get you started.

MGL reported a year-to-year correlation of .74 on GB per BIP.

r
= var(true)/var(observed)
= 1-var(random)/var(observed)

var(random) = .5*.5/500 = .022^2

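With r=.74, this implies var(observed) = .044^2, and so var(true) = .74 * .044^2 = .038^2

At the 50% regression point, var(random) equals var(true).  In order to get var(random) = .038^2, we need BIP = 176

So, at 176 BIP, we regress the GB rate 50% toward the league mean.

r = BIP/(BIP+176)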

regression amount = 1 - r

That simple.  You could also perform the same exercise using current one-year data, as I’ve done it before.
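A minimal sketch of that arithmetic in Python, using the quoted .74 correlation and 500 BIP per data pair. The function names are hypothetical. It solves for the constant two ways: directly from r = n/(n+k), and through the variances with a binomial p = .5 as in the post, taking the 50% point to be where random variance equals true variance (r times observed variance). Both routes return the same constant, about 176 for these inputs.

```python
def regression_constant(r, n):
    """Solve r = n / (n + k) for k, given correlation r at sample size n."""
    return n * (1 - r) / r


def regression_constant_from_variance(r, n, p=0.5):
    """Same constant via variances, assuming binomial noise around p."""
    var_random = p * (1 - p) / n         # sampling variance at n trials
    var_observed = var_random / (1 - r)  # from r = 1 - var_random/var_observed
    var_true = r * var_observed          # from r = var_true/var_observed
    # k is the sample size at which random variance equals true variance.
    return p * (1 - p) / var_true


def reliability(n, k):
    """Share of the observed rate to keep; regress by 1 minus this."""
    return n / (n + k)


k_direct = regression_constant(0.74, 500)
k_variance = regression_constant_from_variance(0.74, 500)
```

Plugging `k_direct` back into `reliability(500, k_direct)` recovers the .74 correlation, which is a quick consistency check on whichever constant is used.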


#10          (see all posts) 2008/09/30 (Tue) @ 17:30

PC #4 - One of the things I have on my to-do list is seeing how much HRs are the result of “mistake” or “fat” pitches. Data from hitracker shows that the variance of speed off the bat for HRs is much lower for pitchers than it is for batters. I’m not so sure that for pitchers HRs are just a random outcome of a flyball. I think it may be that certain combinations of pitch type/location/count are more likely to be a HR. It’s up to the pitcher to not throw it, and up to the batter to recognize it and not let it go by. Eric Seidman has looked at some of this with his Johan Santana HR rate article, and it also gets into your own pitch recognition work. This is one part of that where I have a hypothesis, but it still needs rigorous testing (so getting my own pfx database finished in the offseason is a priority).

