Tuesday, September 06, 2011
What WAR is… what WAR is not
a) WAR is wins above replacement.
b) WAR is a framework.
c) WAR condenses the performance of a player into a single number.
d) WAR is limited to the data points it considers.
e) WAR is limited by the bias in the data.
f) WAR is not all-encompassing.
So, what does all that bullsh!t mean?
a)
In order to evaluate the performance of a player, we need to compare him to some baseline. The baseline that is chosen is what is typically considered a “bubble” player. Those players are the kind who sign minor league free agent deals in the off-season. That is, guys who any team can sign, at the lowest of possible costs, and still have a warm body playing in MLB.
b)
A framework means it’s a way to string the data together, to weave a story into a consistent process. For nonpitchers, it considers the following:
1. Hitting over average
2. Running over average
3a. Fielding over (positional) average
3b. Positional value (over a “neutral” position)
4. Playing time
5. Anything else (Clutch, Heart, etc)
1. There is a high level of agreement as to how to condense the hitting performance of a player into a single number. Those who research the relationship of walks, hits, home runs, and outs to runs and wins largely agree on it.
2. There is a strong level of agreement as to how to measure the running performance.
3a. There is some level of agreement as to how to measure the fielding performance. The high level of agreement is focused on counting the number of plays a fielder actually makes. Then there is some level of disagreement as to how to measure the number of opportunities the fielder had to make a play.
3b. There is a strong level of agreement as to how to establish the positional value. It basically follows the Bill James fielding spectrum (DH, 1B, LF/RF, 2B/3B/CF, SS, C). Because there are few players who are capable of playing SS or C, and there are tons of players who can play 1B or in the corner OF, there is more value in shortstops and catchers. There is some level of agreement as to how big the gap between 1B and SS/C is. The DH has its own special limitations, in addition to the question of positional scarcity.
4. There is a high level of agreement as to how much value to give a player for being able to play.
5. There is a high level of disagreement as to how much value to give a player for clutch or heart, or, more importantly, how to determine a player’s clutch or heart. You can use performance in clutch situations (late and close, or in a pennant race). You can use the eye test. You can use multiple ways. Or you can believe all of that is just chance. Or you may want to consider other facets of a player’s skillset, over and above what already leads to what we’ve counted for hitting and fielding. The discussion here centers on being able to identify those characteristics, and, just as important, being able to apply them consistently.
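To make the framework concrete, here is a minimal Python sketch of how the nonpitcher components above could be strung together. The names (PlayerSeason, war), the 10 runs-per-win conversion, and the 20 replacement runs per 600 plate appearances are illustrative assumptions, not values taken from any particular WAR implementation.

```python
from dataclasses import dataclass

# Assumed conversion constants, for illustration only.
RUNS_PER_WIN = 10.0
REPLACEMENT_RUNS_PER_600_PA = 20.0


@dataclass
class PlayerSeason:
    """Runs relative to average for each component of the framework."""
    hitting: float           # 1. hitting over average
    running: float           # 2. running over average
    fielding: float          # 3a. fielding over positional average
    position: float          # 3b. positional value over a neutral position
    plate_appearances: int   # 4. playing time
    other: float = 0.0       # 5. anything else (clutch, heart), if you choose to count it


def war(p: PlayerSeason) -> float:
    """Add up the components, shift the baseline from average to replacement, convert to wins."""
    runs_above_average = p.hitting + p.running + p.fielding + p.position + p.other
    replacement_runs = REPLACEMENT_RUNS_PER_600_PA * p.plate_appearances / 600
    return (runs_above_average + replacement_runs) / RUNS_PER_WIN


# Hypothetical shortstop: good bat, average legs, slightly below-average glove.
example = PlayerSeason(hitting=25, running=0, fielding=-5, position=7, plate_appearances=600)
print(round(war(example), 1))  # 4.7
```

Swapping in a different fielding metric, or a non-zero "other" term, changes the inputs but not the structure, which is the point of calling it a framework.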
For pitchers, it considers the following:
6. Runs Allowed
7. Non-fielding components (BB, HB, SO, HR)
8. Batted ball fielding components (hits, outs)
9. Non-batted ball fielding components (SB, CS, PK, BK, WP, PB)
10. Sequencing of events (performance by men on base, bases empty)
11. Starter / reliever performance
12. Innings pitched
13. Leverage
14. Clutch
15. Anything else
You can make a reasonable and justifiable case to include or ignore any of the above. Great discussions have been had, and could still be had, on this. It’s all a question of how much the performance that we assign to a pitcher actually relates to the impact of that pitcher. If Cliff Lee (2010) has a .350 BABIP with runners on base, and .250 with the bases empty, is this really a reflection of Cliff Lee, or is it a reflection of his fielders? And if he maintains his K/BB ratio in both situations, does this help in determining how much that BABIP represents his impact? There are dozens of such questions.
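The BABIP question can be laid out with a small Python sketch. BABIP is computed here with the usual formula, (H - HR) / (AB - SO - HR + SF). The split lines are made up to reproduce the .350 / .250 scenario with an identical K/BB ratio; they are not Cliff Lee’s actual 2010 numbers, and the function names are hypothetical.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


def k_per_bb(strikeouts, walks):
    """Strikeout-to-walk ratio, a component that does not depend on the fielders."""
    return strikeouts / walks


# Made-up split lines for a hypothetical pitcher.
splits = {
    "bases empty": dict(at_bats=500, hits=105, home_runs=10, strikeouts=110, walks=11),
    "men on base": dict(at_bats=305, hits=89, home_runs=5, strikeouts=60, walks=6),
}

for situation, s in splits.items():
    b = babip(s["hits"], s["home_runs"], s["at_bats"], s["strikeouts"])
    kbb = k_per_bb(s["strikeouts"], s["walks"])
    print(f"{situation}: BABIP {b:.3f}, K/BB {kbb:.1f}")
```

With these inputs the K/BB ratio is the same in both situations while the BABIP differs by 100 points. The code only surfaces that gap. Whether to charge it to the pitcher, his fielders, or chance is the judgment call the framework leaves open.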
c)
Once you’ve established each individual component of a player, and how it relates to his team winning, you add it up, so you can come up with an opinion. Because every component is an estimate, the total comes with a level of uncertainty attached. The confidence range around the final number is going to have some non-zero width.
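As a rough illustration of how the uncertainty carries through to the total, here is a Python sketch that adds up component estimates and combines their standard deviations. It assumes the component errors are independent, so that variances add, and every number in it is hypothetical.

```python
import math

# Hypothetical component estimates in runs: (estimate, one standard deviation).
components = {
    "hitting": (25.0, 3.0),
    "running": (1.0, 1.0),
    "fielding": (-5.0, 6.0),
    "position": (7.0, 2.0),
}


def combine(parts):
    """Sum the estimates; combine the errors as a root sum of squares (independence assumed)."""
    total = sum(estimate for estimate, _ in parts.values())
    total_sd = math.sqrt(sum(spread ** 2 for _, spread in parts.values()))
    return total, total_sd


total, total_sd = combine(components)
print(f"{total:.0f} runs, give or take about {total_sd:.0f}")
```

Under these made-up inputs, most of the spread in the total comes from the fielding term, the component with the least agreement on how to measure it.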
d)
Sometimes, WAR ignores a statistic because it doesn’t know what to do with it. Other times, WAR ignores a statistic because it thinks that it’s not worth considering.
e)
There’s bias in all data, and the analyst tries to adjust the data to account for the bias. The bias can be obvious like parks or strength of schedule. Or it can be more subtle, like teammates (synergy, offense, pitching, fielding, bullpen support). Or it can be much more difficult to establish, like relying on stringers to plot the location of batted balls or technology to plot the location of pitched balls.
f)
WAR does its best to decide what stats are most worth considering. But it allows enough room for others to include additional parameters. WAR only goes as far as the data it uses allows it to go.
***
So, for all those who think they’ve figured out why WAR doesn’t work, sucks, or is otherwise unusable: that’s bullsh!t. WAR is a solid framework, logically constructed, and flexible enough that every single person out there can have his own personal WAR implementation. As I’ve already noted in the past:
We all come up with our “single number”, even though we kick and scream that we shouldn’t come up with a single number. If one guy argues that Felix is better than Lincecum, and the other argues the opposite, then guess what: they’ve each “smushed” a bunch of parameters, considerations and gut feelings to get to their final opinion.
I remember an old boss of mine deriding the idea of a spreadsheet that would take a bunch of factors into consideration to come up with everyone’s rating at the office, and, in turn, everyone’s salary. He said that he has to do everything on a case-by-case basis.
But what was lost on him is that, in the end, everyone DOES get a final number: a salary. So, you can have a consistent process, that considers everything objective and subjective. Or, you can consider those same objective and subjective things, and smush them together in your mind on a case-by-case basis. You are STILL considering the exact same things.
The difference is that by going case-by-case you may be applying different weights to different parameters for different people as the mood strikes you. If you have a process, that doesn’t happen.
No one is telling you not to overweight or underweight strikeouts or HR. But a system requires you to spell out the rules for weighting, and apply that consistently to everyone.
The one good thing about the case-by-case basis is that it forces you to think about parameters. You’d like to ding Manny Ramirez a little, you’d like to up Jeter a little. So, you have to create a “heart” parameter. And that’s perfectly fine! Just spell it out that that’s what you are doing. And tell us how much you are giving to each player for heart. I have no problem with giving out wins for heart, over-and-above whatever his actual performance tells us. Just spell it out and be consistent.
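Spelling it out can be as simple as the following Python sketch. The player names, the win totals, and the size of each “heart” adjustment are all hypothetical. The only point is that the weights are written down once and applied identically to everyone.

```python
# Weights are declared once and applied to every player the same way.
WEIGHTS = {"performance_wins": 1.0, "heart_wins": 1.0}

# Hypothetical inputs: wins from measured performance, plus an explicit "heart" adjustment.
players = {
    "Player A": {"performance_wins": 5.0, "heart_wins": -0.5},
    "Player B": {"performance_wins": 4.0, "heart_wins": 0.3},
}


def rating(stats, weights=WEIGHTS):
    """Weighted sum of every stated parameter; nothing is adjusted off the books."""
    return sum(weights[name] * stats[name] for name in weights)


# Rank players from best to worst under the stated rules.
ranked = sorted(players, key=lambda name: rating(players[name]), reverse=True)
for name in ranked:
    print(name, round(rating(players[name]), 1))
```

Anyone who disagrees with the ranking can point to the specific weight or adjustment they would change, which is not possible when the smushing happens in someone’s head.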
Instead of coming up with what you think are the problems, come up with your own solution. And if you say you don’t have a solution, and yet you still have an opinion as to which players are the best performing, that’s called bullsh!t. You have thrown your hands in the air, proclaimed it’s not possible to have a consistent, fair system, and yet you can still argue about who is having the better season. So, stop with that bullsh!t, and instead, add value to the discussion by telling us how you would evaluate a player’s performance, and show us your logical and consistent system. Stop polluting the discussion, and instead, offer us some solution.