Tuesday, January 05, 2010
Lego-metrics
This is a numbers/toy-driven post. There is no insight here other than seeing how you can play with numbers in a meaningful way. That is, I show that when you add and multiply and divide numbers, there’s a reason for it. I also talk about a new Bill James stat.
We all know FIP (fielding independent pitching). It’s 13*HR plus 3*BB minus 2*SO, all divided by IP. We add some constant, like 3.2, to put it onto an ERA scale. And, instead of BB, you can add hit batters to that and subtract intentional walks. Pretty straightforward. And it works darn well. It has its genesis in DIPS, a term that, to my surprise, has been supplanted by FIP. In retrospect, though, DIPS is complicated, FIP is simple, and FIP has a correlation of r=.99 to DIPS. So, Occam’s razor and all that.
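For anyone who wants to play along, here is a minimal Python sketch of FIP exactly as described above. The function name and parameters are my own. The 3.2 constant is only the "like 3.2" example, not a fitted value, and IP is assumed to be true innings (200 1/3, not the box-score 200.1).

```python
def fip(hr, bb, so, ip, hbp=0, ibb=0, constant=3.2):
    """FIP: (13*HR + 3*(BB + HBP - IBB) - 2*SO) / IP, plus a constant.

    hbp and ibb default to 0, which gives the plain BB version.
    constant=3.2 is only the example value from the text.
    ip must be true innings (200 1/3, not 200.1).
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    walks = bb + hbp - ibb
    return (13 * hr + 3 * walks - 2 * so) / ip + constant


# Hypothetical season: 20 HR, 60 BB, 180 SO in 200 IP
print(round(fip(hr=20, bb=60, so=180, ip=200), 2))  # 3.6
```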
A lesser-known metric is kwERA, which is ERA based only on BB (walks) and SO (k, strikeouts). Hence the “kw” in kwERA. I’ve sometimes called it szERA (for strike zone). The formula for that is even simpler: (SO-BB)/PA. Multiply it by 12, and subtract it from 5.40. That’s it. A guy with as many SO as BB will have a kwERA of 5.40. Again, works real nice. Credit to GuyM for the genesis. Anyway, all that was brought up a few (maybe several) years ago. The differential (SO-BB) works FAR better than the ratio (SO/BB).
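Here is the same thing for kwERA, again as a sketch with made-up stat lines. Note that it uses the differential SO-BB per PA, not the SO/BB ratio.

```python
def kwera(so, bb, pa):
    """kwERA = 5.40 - 12 * (SO - BB) / PA."""
    if pa <= 0:
        raise ValueError("pa must be positive")
    return 5.40 - 12 * (so - bb) / pa


print(round(kwera(so=50, bb=50, pa=500), 2))   # 5.4: as many SO as BB
print(round(kwera(so=200, bb=50, pa=900), 2))  # 3.4
```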
And we also know that a quick way to convert runs into wins is to divide the runs by 10. So, a guy who allows 4 runs when he gets 5 runs of support (that’s +1 for him) will win .600 of his games (that’s +1/10, or +.100 added to the average .500).
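The runs-to-wins rule of thumb is a one-liner. This sketch (my naming) works in runs per game; because it is linear, it only makes sense for modest run differentials.

```python
def win_pct_from_runs(runs_for_per_game, runs_against_per_game):
    """Rule of thumb: 10 runs of differential = 1 win.

    Each run per game of differential moves win% .100 away from .500.
    It is linear, so it is only sensible for modest differentials.
    """
    return 0.500 + (runs_for_per_game - runs_against_per_game) / 10


print(round(win_pct_from_runs(5, 4), 3))  # 0.6
```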
Let’s try to convert kwERA into wins and losses based on these principles. And, I’m going to use 4.3 PA per inning pitched, as well as 4.30 runs per game as the league average ERA. Ready?
wins per game
= .500 + (lgERA - kwERA)/10
= .500 + 4.30/10 - kwERA/10
= .500 + .430 - (5.4 - 12*(SO-BB)/PA) /10
= .500 + .430 - .54 + 12*(SO-BB)/(IP*4.3)/10
= .390 + .28*(SO-BB)/IP
So, a guy with as many strikeouts as walks will have a .390 win percentage. A guy with 8 K and 2 walks in 8 innings will have a .600 win percentage.
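To check the arithmetic, this sketch runs the derivation both ways: the long way (kwERA with PA = 4.3*IP, then the 10-runs-per-win rule against a 4.30 league ERA) and with the rounded shortcut. The 4.3 and 4.30 are the assumptions stated above; the function names are mine. The unrounded slope is 12/43, about .279, so the long way gives roughly .599 for the 8 K, 2 BB, 8 IP example versus .600 for the shortcut.

```python
PA_PER_IP = 4.3  # assumed batters faced per inning, from the text
LG_ERA = 4.30    # assumed league-average runs per game, from the text


def kwera(so, bb, pa):
    return 5.40 - 12 * (so - bb) / pa


def win_pct_long(so, bb, ip):
    """kwERA -> runs saved vs. league -> win% via the 10-runs-per-win rule."""
    pa = ip * PA_PER_IP
    return 0.500 + (LG_ERA - kwera(so, bb, pa)) / 10


def win_pct_short(so, bb, ip):
    """Rounded shortcut: .390 + .28 * (SO - BB) / IP."""
    return 0.390 + 0.28 * (so - bb) / ip


# The two examples from the text: SO = BB, and 8 K / 2 BB in 8 IP
for so, bb, ip in [(10, 10, 9), (8, 2, 8)]:
    print(so, bb, ip, round(win_pct_long(so, bb, ip), 3), round(win_pct_short(so, bb, ip), 3))
```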
This is where we are right now:
win% = .390 + .28*(SO-BB)/IP
To convert that into wins and losses, multiply the win% by the number of decisions, which we take as IP/9. So, a .600 pitcher with 225 innings has 25 decisions, and so he has 15 wins and 10 losses.
Wins are therefore win% * IP/9. Applying that to the above:
wins
= IP/9 * win%
= IP/9 * (.390 + .28*(SO-BB)/IP)
= .043*IP + .031*(SO-BB)
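A quick sketch of that step, assuming one decision per 9 innings as above. It confirms the 225-inning example and that the expanded coefficients are just .390/9 and .28/9.

```python
def decisions(ip):
    """One decision per 9 innings pitched, as assumed in the text."""
    return ip / 9


def win_pct_short(so, bb, ip):
    return 0.390 + 0.28 * (so - bb) / ip


def kw_wins(so, bb, ip):
    """IP/9 * win% expanded: (.390/9)*IP + (.28/9)*(SO - BB)."""
    return 0.390 / 9 * ip + 0.28 / 9 * (so - bb)


# The 225-inning, .600 pitcher from the text
print(round(decisions(225) * 0.600, 2))          # 15.0
# The expanded coefficients round to .043 and .031
print(round(0.390 / 9, 3), round(0.28 / 9, 3))   # 0.043 0.031
```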
Bill James proposed something interesting:
http://www.billjamesonline.net/ArticleContent.aspx?AID=1102&Code=James01043
wins = SO/13, where 13 is whatever the league-average SO per 18 innings is
losses = BB/6.5, where 6.5 is again the league-average BB per 18 innings
wins = SO/13 = .077*SO
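As a sketch, James’ construction is just two divisions. The 13 and 6.5 defaults are the values quoted above; the pitcher line is hypothetical.

```python
def sz_record(so, bb, so_divisor=13.0, bb_divisor=6.5):
    """Strike Zone Won-Lost as described by Bill James: szW = SO/13, szL = BB/6.5.

    The divisors are league averages per 18 innings; 13 and 6.5 are just
    the values quoted in the text and should be refigured per league-season.
    """
    return so / so_divisor, bb / bb_divisor


szw, szl = sz_record(so=195, bb=65)   # hypothetical pitcher
print(szw, szl)                       # 15.0 10.0
print(round(1 / 13, 3))               # 0.077 wins per strikeout
```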
So, if you compare that to what I have, .043*IP + .031*(SO-BB), you can see the differences. In the Bill James construction, the idea is to focus specifically on SO in order to give it a “wins” feel. Nothing wrong with that.
Suppose that we want to look at it strictly from a “SO” feel. In that case, we turn the BB into IP by figuring 3 walks per 9 innings (BB = IP/3), which gets us here:
wins
= .043*IP + .031*(SO-BB)
= .043*IP + .031*(SO-IP/3)
= .033*IP + .031*SO
And if you turn IP into SO by figuring 6 strikeouts per 9 innings (SO = IP*2/3, so IP = SO*3/2), you get:
wins
= .033*(SO*3/2) + .031*SO
= .080*SO
Rounding issues aside, we can see that we have pretty good support for James’ .077*SO = wins. I’d have thought we’d need some constant in there. But, the way we linked BB and IP to SO basically let us get away with that.
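Here is a numeric check of that substitution, assuming the same 3 BB and 6 SO per 9 innings. Because both substitutions scale with IP, the ratio of wins to SO comes out at .080 for any innings total (the 225 below is arbitrary), versus James’ 1/13.

```python
def kw_wins(so, bb, ip):
    """Rounded kwERA wins: .043*IP + .031*(SO - BB)."""
    return 0.043 * ip + 0.031 * (so - bb)


ip = 225.0
so = ip * 2 / 3   # assumption: 6 SO per 9 IP
bb = ip / 3       # assumption: 3 BB per 9 IP

print(round(kw_wins(so, bb, ip) / so, 3))   # 0.08 wins per SO via kwERA
print(round(1 / 13, 3))                     # 0.077 wins per SO via James
```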
Here’s how losses work for kwERA:
losses
= IP/9 * (1- win%)
= IP/9 * (1 - .390 - .28*(SO-BB)/IP)
= .068*IP - .031*(SO-BB)
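A sketch of the losses side, using the unrounded coefficients. For any (hypothetical) line, wins and losses add back up to IP/9, and .610/9 rounds to the .068 above.

```python
def kw_wins(so, bb, ip):
    return 0.390 / 9 * ip + 0.28 / 9 * (so - bb)


def kw_losses(so, bb, ip):
    """IP/9 * (1 - win%) expanded: (.610/9)*IP - (.28/9)*(SO - BB)."""
    return 0.610 / 9 * ip - 0.28 / 9 * (so - bb)


so, bb, ip = 200, 60, 210   # hypothetical season
w, l = kw_wins(so, bb, ip), kw_losses(so, bb, ip)
print(round(w, 1), round(l, 1), round(w + l, 2), round(ip / 9, 2))
print(round(0.610 / 9, 3))  # 0.068
```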
Bill has it as: losses = BB/6.5 = .15 * BB
Doing the same thing, we turn SO into BB by figuring 2 strikeouts per walk (SO = BB*2), and turn IP into BB by figuring 3 walks per 9 innings (IP = BB*3), and we get:
losses
= .068*IP - .031*(SO-BB)
= .068*(BB*3) - .031*(BB*2-BB)
= .17*BB
Again, forget about the rounding issues.
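The same kind of check for losses, assuming SO = 2*BB and IP = 3*BB. The kwERA route gives about .173 losses per walk against James’ 1/6.5, about .154; that gap is the rounding I said to forget about.

```python
def kw_losses(so, bb, ip):
    """Rounded kwERA losses: .068*IP - .031*(SO - BB)."""
    return 0.068 * ip - 0.031 * (so - bb)


bb = 75.0
so = bb * 2   # assumption: 2 SO per BB
ip = bb * 3   # assumption: 3 BB per 9 IP, so IP = 3*BB

print(round(kw_losses(so, bb, ip) / bb, 3))   # 0.173 losses per BB via kwERA
print(round(1 / 6.5, 3))                      # 0.154 losses per BB via James
```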
So, I think Bill’s come up with a pretty neat way of converting SO into W and BB into L in a way that’s clean, simple, and has some basis back into kwERA. It’s a pretty good toy I must say.
I would not take it too far. As I noted, it works because of the relationships we built into it (3 BB and 6 SO per 9 innings) to get the shortcuts to work. kwERA is what you want if you want to take it a step further. But, I really like his toy.
He calls it “Strike Zone Won-Lost”; szW and szL is how I saw it abbreviated, which made me think of szERA (later kwERA).
I therefore give my support to James’ construction, which is ridiculously easy to calculate, as noted:
wins = SO/13, where 13 is whatever the league-average SO per 18 innings is
losses = BB/6.5, where 6.5 is again the league-average BB per 18 innings
And you might as well use BB-IBB+HBP in place of BB. You just have to figure the league averages every year.
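Figuring the league averages each year is mechanical. This sketch computes both divisors from league totals, with BB-IBB+HBP on the walks side. The league totals are hypothetical, picked so the divisors land on 13 and 6.5, and the function names are mine.

```python
def sz_divisors(lg_so, lg_bb, lg_ibb, lg_hbp, lg_ip):
    """League-average SO and (BB - IBB + HBP) per 18 innings for one season."""
    so_div = lg_so * 18 / lg_ip
    bb_div = (lg_bb - lg_ibb + lg_hbp) * 18 / lg_ip
    return so_div, bb_div


def sz_record(so, bb, ibb, hbp, so_div, bb_div):
    """szW = SO / so_div, szL = (BB - IBB + HBP) / bb_div."""
    return so / so_div, (bb - ibb + hbp) / bb_div


# Hypothetical league totals, chosen so the divisors land on 13 and 6.5
so_div, bb_div = sz_divisors(lg_so=13000, lg_bb=6000, lg_ibb=500, lg_hbp=1000, lg_ip=18000)
print(so_div, bb_div)   # 13.0 6.5
print(sz_record(so=195, bb=60, ibb=5, hbp=10, so_div=so_div, bb_div=bb_div))  # (15.0, 10.0)
```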
You could also, if you like, do it at the TEAM level. If, for example, a team has 600 walks+hit batters and 60 losses, then you can divide a pitcher’s walks+hit batters by 10 instead of 6.5. This turns his K/BB into a W/L record that INCLUDES the team effect. You could, therefore, compare his actual W/L record to what his SO/BB record would have implied. Again, just a toy here, but it’s a fun toy. It’s the kind of toy that would lead you to kwERA eventually. Or, you could just as well stick with the szW and szL from the toy.
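A sketch of the team-level version. The losses divisor follows the 600/60 = 10 example; the wins divisor (team SO divided by team wins) is my assumption that the wins side mirrors it. All team and pitcher numbers, including the 14-7 actual record, are hypothetical.

```python
def team_sz_record(so, bb_hbp, team_so, team_wins, team_bb_hbp, team_losses):
    """szW/szL scaled to the pitcher's own team rather than the league."""
    win_div = team_so / team_wins          # assumption: mirrors the losses side
    loss_div = team_bb_hbp / team_losses   # from the text: 600 / 60 = 10
    return so / win_div, bb_hbp / loss_div


# Hypothetical team: 1224 SO, 102 wins, 600 BB+HBP, 60 losses
szw, szl = team_sz_record(so=180, bb_hbp=50, team_so=1224, team_wins=102,
                          team_bb_hbp=600, team_losses=60)
actual_w, actual_l = 14, 7   # hypothetical actual record
print(szw, szl)                         # 15.0 5.0
print(actual_w - szw, actual_l - szl)   # -1.0 2.0
```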
This is the fun part of sabermetrics, where there’s a bit of meaning behind the numbers, a bit of logic. And, you can start to learn some things. Lego-metrics?
Good job by Bill.
Isn’t replacement level about .390? The formula win% = .390 + .28*(SO-BB)/IP implies (in broad brush strokes) that a replacement pitcher strikes out the same number of hitters that he walks. Does that jibe with the known SO-BB tendency of replacement pitchers?
Please correct me if I am mistaken.