Thursday, November 20, 2008
New BBTN
Phil just posted two new editions. I will read them over lunch.
Tom: Phil Birnbaum directed me to your comment, which I found very interesting. Your estimator is based on actual pitch counts, while mine is based on box scores. Yours is bound to be more accurate. The point of mine was to make quick-and-dirty pitch count estimates from pre-Project Scoresheet (i.e., pre-1983) data. It implicitly presumes that pitches per at bat never vary, which is likely false and probably the reason our estimates differ so much (your “nonlinear terms” in the last equation, which I imagine represent something like differences in pitches per at bat between low- and high-pitch games).
In mine, I was surprised by the size of the constant (175), which dominates the impact of the predictors (hits/walks/strikeouts) in the estimates. Mark Pankin believes I erred in the computation: I divided the predictors by mean innings per game but not the pitch count, so the two are measured on completely different scales. Technically, he is absolutely right, and I plan to redo the estimates when I have a bit of free time. There is no question that this will drastically shrink the constant. However, I will then have to multiply the estimate by mean innings per game to get estimated pitches per game. We’ll see if it matters in the end. If it does, I’ll retract the earlier piece in BBTN and present the new numbers.
Charlie
Actually, there’s nothing really to retract. You didn’t have a term for outs, ergo the constant of 175: it stands in for the 53.4 outs you didn’t account for. (The strikeout term is over and above that.)
So 175/53.4 = 3.28 pitches per out, which is close to the expected value.
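As a quick check of that arithmetic, here is a minimal Python sketch. The constant names are illustrative; 175 and 53.4 are simply the intercept and outs-per-game figures quoted above.

```python
# Convert Pavitt's regression intercept into an implied pitches-per-out rate.
PAVITT_INTERCEPT = 175.0  # constant from Pavitt's per-game equation
OUTS_PER_GAME = 53.4      # outs per game the constant implicitly covers

pitches_per_out = PAVITT_INTERCEPT / OUTS_PER_GAME
print(f"{pitches_per_out:.2f} pitches per out")  # prints "3.28 pitches per out"
```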
I think the presentation would be clearer if you used the same denominator for each term. For example, the 175 constant is per 53.4 outs, but your hit, walk, and K terms are per inning (3 outs).
If you scaled everything to the same basis (per event), as I intended in my post, you might see it more clearly.
***
All to say that your work is excellent and pretty much confirms mine, if the presentation were altered slightly.
You could, for example, run your sample data against your regression equation and my two pitch count estimators:
http://www.tangotiger.net/pitchCountEstimator.html
And if you highlight any biases, that would be the most instructive thing.
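A minimal sketch of such a bias check, assuming you have box-score lines paired with actual pitch counts. The `GameLine` record and the function names are hypothetical. Only the basic estimator (3.3 per PA, 2.2 per BB, 1.5 per SO) and Pavitt's coefficients, with the 175 intercept replaced by 3.3 pitches per out, are represented; the extended estimator at the URL above is not reproduced here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GameLine:
    """Hypothetical record: one box-score line plus its true pitch count."""
    pa: int
    h: int
    bb: int
    so: int
    outs: int
    actual_pitches: int


def tango_basic(g: GameLine) -> float:
    # Basic estimator: 3.3 per PA, plus 2.2 per walk and 1.5 per strikeout.
    return 3.3 * g.pa + 2.2 * g.bb + 1.5 * g.so


def pavitt_per_event(g: GameLine) -> float:
    # Pavitt's regression with the 175 intercept restated as 3.3 per out,
    # so it also works for partial games (e.g., a single pitcher's line).
    return 3.3 * g.outs + 2.6 * g.h + 6.5 * g.bb + 2.0 * g.so


def mean_bias(games, estimator) -> float:
    """Average of (estimate - actual); positive means the estimator runs high.

    Raises statistics.StatisticsError if `games` is empty.
    """
    return mean(estimator(g) - g.actual_pitches for g in games)
```

Running `mean_bias` separately on subsets (low- vs. high-pitch games, high-walk vs. low-walk games) is one way to expose where either estimator drifts.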
I liked the Morrow piece on how component performance changes by date, especially what happens to BB and K rates in September. I’d like to see the results without the September call-ups.
***
Harsh criticism of JC’s Sabernomics book. I disagree that it’s not worth the price, especially at under $6 at Amazon. It’s a decent to good read.
***
Pavitt cites my work on pitch counts and gives his own estimator, as pitches per game (where a game is 18 half-innings):
175
+ 2.6 * H/g
+ 6.5 * BB/g
+ 2.0 * SO/g
That 175 represents the intercept for 54 outs per game, so we can rewrite it as:
Pitches per game
= 3.3 * outs/g
+ 2.6 * H/g
+ 6.5 * BB/g
+ 2.0 * SO/g
We can merge the outs/g and H/g terms into a single weighted-average coefficient of 3.1. So, we have:
Pitches
= 3.1 * (outs + H)
+ 6.5 * BB
+ 2.0 * SO
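The 3.1 depends on how many hits occur relative to outs. A minimal sketch of the weighted average, assuming roughly 18 hits per 54-out game; that hits figure is an illustrative assumption, not a number from the post.

```python
# Blend the per-out (3.3) and per-hit (2.6) coefficients into one rate,
# weighting each by how often the event occurs in a game.
OUTS_PER_GAME = 54  # 18 half-innings x 3 outs
HITS_PER_GAME = 18  # ASSUMPTION: illustrative figure, not taken from the post

blended = (3.3 * OUTS_PER_GAME + 2.6 * HITS_PER_GAME) / (OUTS_PER_GAME + HITS_PER_GAME)
print(f"{blended:.3f}")  # 3.125, which rounds to the 3.1 used above
```

With fewer hits per game the blend moves toward 3.3; with more, toward 2.6.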
Or the same as:
Pitches
= 3.1 * (outs + H)
+ 3.1 * BB
+ 3.4 * BB
+ 2.0 * SO
The outs term includes SO.
We can further merge it as:
Pitches
= 3.1 * (outs + H + BB)
+ 3.4 * BB
+ 2.0 * SO
Which is the same as:
Pitches
= 3.1 * PA
+ 3.4 * BB
+ 2.0 * SO
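Each rewrite above is pure algebra once PA is taken as outs + H + BB, which ignores HBP and other ways of reaching base (an assumption of this sketch). A quick Python check that the three forms agree; the function names and the test counts are arbitrary and hypothetical.

```python
import math


def merged(outs, h, bb, so):
    # Outs and hits share the blended 3.1 coefficient; walks keep 6.5.
    return 3.1 * (outs + h) + 6.5 * bb + 2.0 * so


def split_walks(outs, h, bb, so):
    # The 6.5 walk coefficient split into 3.1 + 3.4.
    return 3.1 * (outs + h) + 3.1 * bb + 3.4 * bb + 2.0 * so


def per_pa(outs, h, bb, so):
    # ASSUMPTION: PA = outs + H + BB (no HBP, reached on error, etc.).
    pa = outs + h + bb
    return 3.1 * pa + 3.4 * bb + 2.0 * so


# Arbitrary (outs, H, BB, SO) counts used only to exercise the algebra.
for counts in [(54, 18, 6, 12), (27, 9, 3, 7), (0, 0, 1, 0)]:
    a, b, c = (f(*counts) for f in (merged, split_walks, per_pa))
    assert math.isclose(a, b) and math.isclose(b, c), counts
print("all three forms agree")
```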
Contrast that with mine:
Pitches
= 3.3 * PA
+ 2.2 * BB
+ 1.5 * SO
If Charlie’s does work out better, he is basically saying that what I have is correct, but you have to add an extra 1.2 pitches per walk and 0.5 pitches per K, because those events show the pitcher going deeper in the count.
Indeed, this is the whole point of the extended pitch count estimator, whereby the deeper in the count you go, the more pitches it takes you per batter.
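Lining up the two sets of coefficients makes those extras explicit. A small Python sketch (variable names are illustrative); note that the subtraction also surfaces a -0.2 difference on the PA term.

```python
# Coefficients as written above (pitches per event).
charlie = {"PA": 3.1, "BB": 3.4, "SO": 2.0}  # Pavitt's regression, rewritten per PA
tango = {"PA": 3.3, "BB": 2.2, "SO": 1.5}    # basic pitch count estimator

# Round to one decimal to hide floating-point noise in the subtraction.
extra = {k: round(charlie[k] - tango[k], 1) for k in tango}
print(extra)  # {'PA': -0.2, 'BB': 1.2, 'SO': 0.5}
```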
To tie this into the other thread, I would prefer seeing the presentation as:
Pitches
= 3.3 * PA
+ 2.2 * BB
+ 1.5 * SO
+ 1.2 * BB
+ 0.5 * SO
Here, the first group of parameters shows the average pitches per event (easy enough to calculate from any dataset), and the second group shows the non-linearity of the events.
Personally, I find this a lot clearer than a regression equation that has some truth in it (4.8 pitches thrown per strikeout) and some proxy in it (an extra 0.5 pitches per strikeout to account for “going deep”).
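To make the two groups concrete, here is a hypothetical Python helper (the name `split_estimate` and the dictionaries are illustrative, not part of either estimator). It returns the average-per-event part and the non-linearity part separately, using the coefficients from the two-group presentation.

```python
# Group 1: average pitches per event.
AVERAGE = {"PA": 3.3, "BB": 2.2, "SO": 1.5}
# Group 2: extra pitches that proxy for "going deep" in the count.
NONLINEAR = {"BB": 1.2, "SO": 0.5}


def split_estimate(pa, bb, so):
    """Return (average_part, nonlinear_part) of the pitch estimate."""
    counts = {"PA": pa, "BB": bb, "SO": so}
    average_part = sum(AVERAGE[k] * counts[k] for k in AVERAGE)
    nonlinear_part = sum(NONLINEAR[k] * counts[k] for k in NONLINEAR)
    return average_part, nonlinear_part


# A single strikeout is 1 PA that is also 1 SO.
avg, extra = split_estimate(pa=1, bb=0, so=1)
print(f"{avg:.1f} average + {extra:.1f} proxy")  # 4.8 average + 0.5 proxy
```

Keeping the two parts separate shows at a glance how much of any estimate comes from plain per-event averages and how much from the count-depth adjustment.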