Tuesday, September 30, 2008
Splitting the batting lines into binomial metrics
Pizza lays out the idea. As studes noted, we talked about this a lot in the past.
What Brian suggests in the comments is the way I normally approach the problem, and it is the way Voros did it. Here are my aging patterns by these metrics.
I also echo Pizza’s position on where to put the HR. Sometimes I do it the way Pizza says it, and sometimes the way Voros says it. The fact of the matter is that you can construct two equally plausible scenarios.
There is an undeniable relationship between K, BB, and HR. There is also an undeniable relationship between HR and FB (and to a lesser extent LD). The only right way to do it is to model this relationship. If, for example, you do it as Pizza proposes, then you need an additional function on the HR/FB rate that includes the K and BB rates. If you do it as Voros proposes, you need to include the FB rate as an input to the HR rate.
This is how I’ve been doing MLEs & projections.
Start with a fixed number of plate appearances. First adjust the HBP, BB & SO. Then find how many balls hit fair are left. How many are home runs? That leaves balls in play. How many of those went for hits? How many of those went for extra bases? How many of those were triples?
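The chain of questions above can be sketched as a sequence of binomial rates, each measured against whatever is left after the previous step. This is only a rough illustration, not the commenter's actual code: the `BattingLine` container, the field names, the HBP-then-BB-then-SO ordering, and the sample numbers are all assumptions made up for the example, and sacrifices and other odd outcomes are ignored.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    # Hypothetical container; field names are illustrative only.
    pa: int
    hbp: int
    bb: int
    so: int
    hr: int
    hits: int      # all hits, including home runs
    doubles: int
    triples: int


def binomial_chain(line: BattingLine) -> dict:
    """Map each step to a (successes, trials, rate) tuple."""
    chain = {}

    def add(name, successes, trials):
        rate = successes / trials if trials else 0.0
        chain[name] = (successes, trials, rate)

    remaining = line.pa
    # Assumed order: HBP, then BB, then SO, each out of what is left.
    for name, count in (("HBP", line.hbp), ("BB", line.bb), ("SO", line.so)):
        add(name, count, remaining)
        remaining -= count

    # What is left is treated as balls hit fair (sacrifices etc. ignored).
    add("HR", line.hr, remaining)
    balls_in_play = remaining - line.hr

    # Hits on balls in play exclude home runs.
    hits_in_play = line.hits - line.hr
    add("H/BIP", hits_in_play, balls_in_play)

    # Extra-base hits among hits in play, then triples among those.
    extra_base = line.doubles + line.triples
    add("XBH/H", extra_base, hits_in_play)
    add("3B/XBH", line.triples, extra_base)
    return chain


# Made-up batting line, purely for illustration.
example = BattingLine(pa=600, hbp=5, bb=60, so=100, hr=25,
                      hits=160, doubles=30, triples=5)
for step, (s, n, rate) in binomial_chain(example).items():
    print(f"{step:7s} {s:4d} / {n:4d} = {rate:.3f}")
```

Because every rate is a simple successes-over-trials fraction, each one can be adjusted or regressed on its own and the batting line rebuilt by multiplying back down the chain from the fixed number of plate appearances.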
I believe one point Pizza was making is that each of these should be regressed by different amounts. We discussed this here a couple of months ago while I was working on my projections. I agree that it is true, but I also found that if I use 150 PAs of regression for all components, it just didn't make that much of a difference. For example, BABIP never found a minimum RMS even after 1000 PAs, but the slope was so gradual that the difference in RMS between 150 and 1000 PAs wasn't really consequential (at least for batters).
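For readers unfamiliar with "PAs of regression", one common way to express it is to add a fixed number of league-average trials to the observed sample. The sketch below assumes that form. The `regress` function, the league rate, and the sample counts are hypothetical, and applying the ballast to each component's own denominator rather than to plate appearances is a simplification, not something stated in the comment.

```python
def regress(successes: float, trials: float, league_rate: float,
            ballast: float = 150.0) -> float:
    """Regress an observed rate toward the league rate.

    Adds `ballast` league-average trials to the observed sample
    (assumed form of "N PAs of regression").
    """
    return (successes + ballast * league_rate) / (trials + ballast)


# Hypothetical hitter: 140 hits on 410 balls in play, league rate .300.
for ballast in (150, 500, 1000):
    estimate = regress(140, 410, 0.300, ballast)
    print(f"ballast={ballast:4d}  regressed rate={estimate:.3f}")
```

Running it with different ballast values shows how the estimate moves toward the league rate as the ballast grows, which is the quantity being compared when checking 150 against 1000 PAs.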
Heads up - I will be discussing how each of the components ages on Thursday.