THE BOOK--Playing The Percentages In Baseball

Sunday, July 24, 2011

My issue with regression equations

By Tangotiger, 10:35 PM

Patriot captures it right here:

Building your metric around a run estimator does not necessarily restrict you to simply plugging in the numbers in the appropriate place. Suppose you wanted to construct a metric based on batted ball types, strikeouts, and walks. One way to go about it would be to simply go through and estimate singles, doubles, triples, homers, and outs in play based on the percentage of each batted ball type that wind up as each. So, you would end up with equations that might look something like this:

Singles = .057FB + .217GB + .516LD + .017PU

However, if you believe that you have gleaned some other insights into the relationship between events that could improve your metric (such as strikeout pitchers having lower HR/FB rates), you could still build that into your formula for estimated home runs, and plug those into the run estimator.
It’s more difficult than running a regression, and a more delicate balancing act (at least in terms of developing the formula), but it allows you to stay grounded in a model that estimates runs by taking a first step of, well, estimating runs.
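A minimal Python sketch of the first step Patriot describes: estimating component events from a batted-ball profile. The singles coefficients are the ones quoted above. The home-run model is purely hypothetical. Its baseline HR/FB rate, league strikeout rate, and strikeout adjustment are placeholders that only show where an insight like "strikeout pitchers allow fewer HR per fly ball" would plug in. The names `BattedBalls`, `estimate_singles`, and `estimate_home_runs` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BattedBalls:
    """Batted-ball counts by type for one pitcher (hypothetical container)."""
    fb: float  # fly balls
    gb: float  # ground balls
    ld: float  # line drives
    pu: float  # popups


def estimate_singles(b: BattedBalls) -> float:
    """Expected singles, using the coefficients quoted above:
    Singles = .057FB + .217GB + .516LD + .017PU
    """
    return 0.057 * b.fb + 0.217 * b.gb + 0.516 * b.ld + 0.017 * b.pu


def estimate_home_runs(b: BattedBalls, k_rate: float,
                       base_hr_per_fb: float = 0.10,
                       league_k_rate: float = 0.18,
                       k_effect: float = 0.2) -> float:
    """HYPOTHETICAL home-run estimate in which HR/FB falls as K rate rises.

    base_hr_per_fb, league_k_rate and k_effect are placeholder values,
    not fitted numbers. A pitcher at the league K rate gets the base
    HR/FB; a pitcher with double the league K rate has HR/FB trimmed
    by k_effect (20% with these defaults).
    """
    relative_k = (k_rate - league_k_rate) / league_k_rate
    hr_per_fb = max(base_hr_per_fb * (1 - k_effect * relative_k), 0.0)
    return hr_per_fb * b.fb


if __name__ == "__main__":
    profile = BattedBalls(fb=150, gb=220, ld=100, pu=30)
    print(estimate_singles(profile))                 # estimated singles
    print(estimate_home_runs(profile, k_rate=0.27))  # estimated HR, high-K pitcher
```

Keeping each event estimate in its own function means a new insight changes only that one estimate. The counts it produces are then handed, unchanged in form, to whatever run estimator you use.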

He’s saying this (or if he’s not saying it, then that’s how I am reading it, and, in any case, it’s how I think of it):

1. You start with a working model of how runs are created. This is the beauty of something like BaseRuns: it works so darn well… GIVEN its inputs. If you know the number of hits, HR, walks, and outs, then you have a fantastically good estimate of how many runs are expected to be scored.

2. If you don’t know the inputs, estimate the inputs… but don’t change the actual run scoring model. So, again, if you happen not to have the number of doubles, but can estimate the number of doubles that this pitcher gave up, deserved to give up, or was expected to give up, based on his batted ball distribution profile, and/or the number of HR he gave up, and/or his SO/BB ratio, then estimate the doubles in that manner… but do NOT touch the run scoring model.

Once you have the estimates of all your inputs, then you can plug them into an established working model.
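A minimal Python sketch of the two steps above. It assumes the basic published form of BaseRuns, A × B / (B + C) + D, with B = (1.4·TB − 0.6·H − 3·HR + 0.1·BB) × 1.02; fuller versions add terms such as HBP and stolen bases. The doubles and triples shares in `estimate_total_bases` are hypothetical placeholders standing in for an estimate built from a batted-ball profile, HR, or SO/BB. Only the inputs get estimated; `base_runs` itself is never modified.

```python
def base_runs(h: float, bb: float, hr: float, tb: float, ab: float) -> float:
    """Basic BaseRuns: A * B / (B + C) + D."""
    a = h + bb - hr                                      # baserunners, excluding HR
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02  # advancement factor
    c = ab - h                                           # outs (AB - H)
    d = hr                                               # home runs score automatically
    return a * b / (b + c) + d


def estimate_total_bases(h: float, hr: float,
                         doubles_share: float = 0.20,
                         triples_share: float = 0.02) -> float:
    """Step 2, HYPOTHETICAL: estimate the missing input (total bases)
    when doubles and triples are unknown. The shares of non-HR hits are
    placeholders for an estimate derived from the pitcher's profile."""
    non_hr_hits = h - hr
    doubles = doubles_share * non_hr_hits
    triples = triples_share * non_hr_hits
    singles = non_hr_hits - doubles - triples
    return singles + 2 * doubles + 3 * triples + 4 * hr


if __name__ == "__main__":
    # Known inputs for a hypothetical pitcher; doubles and triples are missing.
    h, hr, bb, ab = 180, 20, 60, 650
    tb = estimate_total_bases(h, hr)                   # estimate the input...
    print(base_runs(h=h, bb=bb, hr=hr, tb=tb, ab=ab))  # ...then use the unchanged model
```

A better doubles estimate would replace `estimate_total_bases` only; the run-scoring model stays exactly as it is.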

Even something like FIP is basically a regression equation, because it doesn’t adhere to an actual run scoring model. Of course, there is a tradeoff involving complexity. A linear equation is used at the expense of a real baseball run scoring model because it’s easier to compute and understand. But if you’ve got a complex linear equation, or even a complex multiplicative equation, or some other form of equation, then you’ve got the worst of both worlds: neither the simplicity of a short linear formula nor the logic of a real run scoring model.
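For contrast, FIP in its standard form is one linear equation with fixed weights: (13·HR + 3·(BB + HBP) − 2·K) / IP, plus a season-specific constant that puts it on the ERA scale. A minimal Python sketch; the constant is passed in rather than built in, and the 3.10 in the example call is an illustrative value, not an official one.

```python
def fip(hr: float, bb: float, hbp: float, k: float, ip: float,
        constant: float) -> float:
    """Fielding Independent Pitching as a fixed-weight linear formula.

    ip is true innings (6 2/3 innings = 6.667, not the box-score "6.2").
    constant scales the result to league ERA and varies by season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Each event carries the same weight regardless of the rest of the line,
# unlike BaseRuns, where an event's run value depends on the other inputs.
print(fip(hr=20, bb=50, hbp=5, k=200, ip=200, constant=3.10))
```

The whole model fits on one line, which is why its strengths and blind spots are easy to see.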

This is why I like FIP and wOBA: they are such simple metrics that their strengths and limitations are readily apparent.

So, ANY pitcher metric that is not grounded in BaseRuns is set up with a limitation from the start. The bigger that limitation, the simpler your metric must be.

SIERA is a good example of a metric that is too complex for its own good. The insights and benefits of SIERA are hidden inside its complexity. But if Matt were to follow Patriot’s lead here, and compute estimates for events (1B, 2B, 3B, HR, BB, SO) based on his findings about how things interact, then we would have a very helpful metric.

So, that’s my recommendation as to how you can really advance the cause: keep the logic of baseball intact if you insist on complexity.

(23) Comments • 2011/07/26 • Sabermetrics, Statistical_Theory