Tuesday, June 21, 2011
This fellow seems to have his heart in the right place. However, he’s all over the place in terms of trying to get a grasp of WAR, what it means, why Fangraphs and B-R.com are different, and a host of other puzzling statements.
I’ll try to get to these in the morning. I am thankful that he made his post, because I think there must be tons of people as confused as he is, and it gives me something to work with.
Plus, awesome blog name.
UPDATE:
I have successfully deleted this post twice already because of the amazing functions of undo and autosave. Regardless, I hope to incite a forum on some of the sabermetrics that are becoming more ubiquitous as time passes. I have read Tom Tango’s book showing how wOBA is better than AVG, OBP, OPS, etc.
Kind of an odd takeaway from The Book. But, that doesn’t seem to be the issue at hand, so let’s skip that.
Of course there are the constant stream of intermediaries that people use to calculate these statistics, but the one that I’m most hesitant of is WAR. For those that don’t know, WAR (Wins Above Replacement) is an all-encompassing statistic that essentially determines how much a given player is worth. This includes offensive and defensive analyses.
Rather than say how much a given player “is worth”, let’s say “WAR is the number of wins attributed to the player for his past performance”.
I don’t like the idea of how “manufactured” the stat is because it’s essentially an average of an average of an average, etc.
I have no idea what average of an average means, nor the “etc” part. Let’s throw this sentence out the window. The blogger is trying to learn, but I think he’s reaching here for something.
And each statistic that is used in its calculation has limitations and assumptions, which aren’t usually discussed.
EVERY metric has limitations and assumptions, which aren’t usually discussed. OBP values a walk and HR equally. No one talks about this either. SLG has HR at 4 and single at 1, and that’s not discussed. Let’s not set a higher standard for WAR.
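To make the OBP and SLG point concrete, here is a minimal sketch using the standard definitions of the two stats. The two four-PA stat lines are made up purely for illustration, and the function and variable names are mine.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base divided by AB + BB + HBP + SF."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles, doubles, triples, hr, ab):
    """Slugging: total bases per at-bat, with hits weighted 1/2/3/4."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


# Hypothetical hitters, four plate appearances each.
# One reaches base once on a walk, the other once on a home run.
obp_walk = obp(h=0, bb=1, hbp=0, ab=3, sf=0)
obp_hr = obp(h=1, bb=0, hbp=0, ab=4, sf=0)

# Hypothetical hitters, four at-bats each: one single versus one home run.
slg_single = slg(singles=1, doubles=0, triples=0, hr=0, ab=4)
slg_hr = slg(singles=0, doubles=0, triples=0, hr=1, ab=4)

print(obp_walk, obp_hr)    # OBP cannot tell the walk from the home run
print(slg_single, slg_hr)  # SLG treats the home run as exactly four singles
```

Neither weighting matches actual run values, which is the point: these assumptions are baked in, and nobody discusses them.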
I see how it can describe how “valuable” a player was to his team last year, but can it really help when it comes to a player being traded or picked up?
Ah, excellent. Now, we have something to talk about. Can’t we say the same thing about OBP or ERA? By definition, every performance metric measures past performance. That’s what the stat is. If you want to know about the future value of the player, we need to INTERPRET that metric, be it WAR or any other metric.
First thing you have to figure out is: what is the metric actually trying to do.
Or can you simply add the WAR of each player on a team and predict the playoffs for the following year (and maybe the World Series teams)? I don’t think it can stretch that far.
No, you can’t do that.
The data I have below (which I can’t format well for the life of me) are total WAR for each team last year. Now, of course the better teams have better WARs since they were better. The reasoning is a bit circular which I think makes it robust for past analyses but not as useful for the future.
Right, if you are stuck on an unadjusted metric, it’s hard for it to be useful for the future. Same as any other metric.
Anyway, let’s look at them and see how well it did. The first table is from Baseball-Reference.com, and the second is from Fangraphs.com (WAR is also calculated differently at different places, another reason I’m not too high on it).
How well they “did”? Did at what?
As for different calculations: that’s why I call them rWAR and fWAR to show that they are in fact different calculations. They are part of the WAR family. Is it that hard to get past it?
I have them listed as batWAR, pitWAR, and Team WAR. These are the sum of the WARs for each individual position player (batWAR), pitcher (pitWAR), and collective team (Team WAR), respectively.
I can’t seem to write below these, so I apologize for any scrolling that’s necessary. If you look closely, there are some discrepancies. First off, the Fangraphs.com values are higher in general than the Baseball-Reference.com ones.
fWAR is higher than rWAR because fWAR uses a lower replacement level. There’s nothing wrong in either case. Just a reasonably justifiable choice by both systems.
And Fangraphs had the Twins as the best team in baseball. Baseball-Reference had them 5th. Seems to be a decent drop.
Here is probably where the big difference rests: rWAR tries to account for all runs scored and allowed. fWAR does not do that. Basically, rWAR tries to apportion the luck to the players involved, while fWAR largely ignores the luck aspect.
It’s a choice.
Anyway, as a comparison sake, I would say that Baseball-Reference better encompassed the results of last year so I’ll talk about it mainly. I just wanted to show the difference between the sites.
To the extent that luck is a result, and you need to see that luck somewhere somehow, then rWAR would be the better choice. In this particular instance.
Something that first strikes me as interesting is that the Yankees had a better RAR than the Rays in both systems, but Tampa won the division. That seems to be interesting. I can see how WAR would fail when comparing teams that didn’t have much of an effect on the other, but to me, it seems odd the Tampa was not 1st in it’s division’s WAR from either site.
I don’t find that interesting at all, nor is it even a requirement of anything really. The Rays scored 23.6% more runs than they allowed. The Yankees were at 24.0%.
If rWAR or fWAR were more interested in capturing the luck of wins, then, sure, you’d have a case to make. But, that’s not what they are about.
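For anyone who wants to check the run percentages, here is a rough sketch. The run totals are not in the post; they are the 2010 regular-season figures as I recall them, so treat them as assumptions and verify against a reference site. The helper name is mine.

```python
# Assumed 2010 regular-season totals: (runs scored, runs allowed).
# These numbers are supplied for illustration and should be verified.
teams = {
    "Rays": (802, 649),
    "Yankees": (859, 693),
}


def run_surplus_pct(runs_scored, runs_allowed):
    """Percentage by which runs scored exceed runs allowed."""
    return 100.0 * (runs_scored / runs_allowed - 1.0)


surplus = {
    name: round(run_surplus_pct(rs, ra), 1)
    for name, (rs, ra) in teams.items()
}
print(surplus)
```

On these inputs the two teams come out within half a percentage point of each other, so there is no reason to expect one to sit clearly above the other in a runs-based metric.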
Something impressive from BR (Baseball-Reference) is that the 8 playoff teams were in the top 9 in WAR. Only Boston (who was impressively 4th, meaning the AL East had 3 of the top 4 WAR teams last year) didn’t make the playoffs within the top 9 WAR teams. So, this measure pretty well “predicted” the playoff teams. FG (Fangraphs) didn’t do as well.
The use of predict here is very wrong. When you “predict”, you are making an estimate of a future event. In this case, rWAR is simply representing the runs scored and allowed by the team, and distributing them to the players. Obviously, the teams that make the playoffs will be predisposed to be those teams that score alot more runs than they allow.
This is another instance of the blogger wanting to learn, but he is stuck on something that he should get out of.
Something else that’s interesting is that of the 8 playoff teams, the Giants had the best pitching WAR according to BR. Seems to coincide with the old belief that pitching is everything in the playoffs.
Again, he’s grasping. n=1.
Actually, if you look closer, within each series from the playoffs, the team with the better pitching WAR won the series. That makes me feel more comfortable about the statistic, but again, these calculation included the successful pitching of those teams so it’s circular. However, it does seem promising.
No, you should forget about all this. None of this is relevant in discussing WAR. It’s fun trivia, but ultimately meaningless in validating WAR.
But I would like to have people talk about these context-neutral statistics. WAR is normalized based on the replacement player of that year, so it’s supposed to comparable across time and leagues.
Eh, sorta-kinda. It compares players to that year’s baseline. Whether that baseline player is identical across time and leagues is debatable.
However, wouldn’t the context change if that player were to change teams? They would play around different defenders which can take away plays from them or cause problems. The new pitching staff could affect a players defense. The ballpark obviously has an effect. And you see different pitching more than likely changing your ability to hit to some degree. Does this not seem to matter?
The exact same thing can be said of any metric. Again, the blogger is grasping here, looking for chinks in an armor.
Also, WAR takes into account some form of fielding statistic, and all of the fielding statistics seem to be a bunch of magic.
Granted, they “seem” like a bunch of magic. But, they have a logical, rational basis.
I’m not saying I know a better way, but not much can be quantitative. Anyway, please respond with thoughts about these statistics and what you feel is successful and appropriate in many discussions. I just feel a bit hesitant, but maybe someone can help ease my discomfort.
Take care.
That first sentence is the key: if you want to discard WAR, and you STILL want to have an opinion, then what do you do? Well, you come up with your own flimsy, half-rational metric, without any internal consistency. You’ll look at someone’s OBP and SLG, maybe his SB, look at his park, apply some visual observations of his fielding and how he looks at bat, see how his team did and say “Yeah, Ryan Howard is pretty good.” That’s really all you are going to do. And the more you try to do, the more rigid you make your system, the more consistent you try to make your ideas, the more logic you apply, well… congratulations, because you are on the path to WAR.
It’s almost like you don’t want to go to WAR, and are trying to figure out how to do it your own way. When your way is simply a circumventing of WAR. And eventually, the more you do the work, the more you realize that, “yup, that WAR is what I’ve been doing all along”.
Really, it’s not like I just came in and said: “This is WAR and this is how it’ll work.” This was a long process to get to where we are. And, if we have to change things, we will. This is not some religion. It’s a result.
And if you don’t want to use WAR, then use whatever else you want to use. But when you are challenged on logic and rationality, then, please, be kind enough to explain yourself. Don’t just say “this sucks” without offering an alternative. That’s what politicians do. Challenge the logic, and the rationale. That we can talk about.
Thursday, May 19, 2011
Ben approached me a while ago for a guest piece for BPro. I didn’t know what to write really, as I much prefer to not come up with an idea and then find my own answer. I’d rather someone else come up with an idea, and then I’ll be inspired to find an answer. So, I proposed that BPro readers ask whatever is on their mind, and then I’d answer them. It also has the benefit that the marketplace dictates where we actually are in this sabermetric conversation, rather than me dictating it. To that end, it gives me a chance to bring everyone up to speed. Or something like that. Anyway, I’ll answer the questions next week, but if you want to follow along with what they are asking, here it is. Lots of good questions off the bat already. I was hoping to get 10-15 good ones by the time commenting is closed next week, and there’s already five good ones just in the first hour.
I was asked to write a brief biography, as well as a preamble. In bullet form:
* I co-authored The Book—Playing the Percentages in Baseball and run a blog of the same name.
* I’m a heavy proponent of sabermetrics and especially enjoy discussions where both sides can move upward and onward to the next issue.
* If you have a summary opinion with no evidence, I will call you on it.
* In the “a lot” v. “alot” debate, I stand with “alot.” I also have a son and a dog, so I’ve got plenty of experience with bedtime stories and leashes. If you ask me about women, I can tell you the one, and only one, thing I’ve learned.
So, post your questions or thoughts below, and in a week or so, I’ll do my best to provide my comments. I know some of you think there are “too many numbers,” while others can’t get enough of this stuff. I’m very interested to find out what the typical Baseball Prospectus reader is thinking about regarding quantitative and qualitative analysis, as well as critical thinking. Even feel free to unload your exasperated thoughts, and maybe I can placate you to some degree.
Your turn.
Thursday, August 05, 2010
Ken is back for part 2. These are the ones I don’t believe:
25. I believe a player who comes in contact with a base prior to being tagged should be called safe, regardless of whether the throw beat him there.
In theory I believe that. In practice, where the ump doesn’t have things in slow-mo, you have dirt, and the sweep of the tag can happen on any part of the body, it’s just way too much to expect from a human being to make this call right a good portion of the time. Umpires are simply applying probability here. They know that when a throw beats the runner, 51%+ of the time the fielder applies the tag prior to the sliding runner reaching base. Unless the umpire can clearly see the play, he can and will call a routine-looking play as if it was routine. An umpire knows that if he has to call each play as if it was its own universe, he may get more than 50% of them wrong. They are just like managers, and you people out there: risk averse.
26.... I believe they need to protect their teammates if they’re being thrown at by opposing pitchers.
I prefer the hockey method, where, essentially, one player challenges the other to a duel. It’s an act of cowardice to do what pitchers do. That’s not to say it’s wrong. Sometimes, being a coward and throwing a ball at an innocent is the right thing to do to prevent an escalating conflict, especially if both sides expect that to be the way to end the conflict. So, is there a way to handle this in baseball, without it being cowardice?
Baseball’s greatest coward may have been Ben Christensen:
http://findarticles.com/p/articles/mi_m1208/is_27_223/ai_55198802/
We always say “does someone have to die to change the rules”. And the answer is: yes. Until then, we act as if what we are doing is quasi-dangerous, rather than playing with fire. (Yes, yes, I know, it’s “not the same thing”.)
It’s cowardice. As long as you believe what Ken believes is cowardice, but you accept it as needing to be done, then fine. I may also believe it is necessary.
However, the selection process to become a major-league pitcher ensures that the variation in this skill is much less than you might think – the hitability gap between Mariano Rivera and Nick Blackburn is tiny compared to that between Blackburn and, say, the best starter at your local community college.
The hittability gap may be tiny, but 75% of PA end up with BIP, so the small gaps add up. It’s like with hockey goalies, where the save percentages range from .890 to .930 (40 point gap). The BABIP gap is say .280 to .320 (also 40 point gap). In hockey, the goalies face about 28 shots per game. In baseball, pitchers give up about 27 BIP per 9 IP. The difference is that in hockey, goalies play 60 games, so they will face 1500-2000 shots per season, and so that small gap manifests itself for the season. In baseball, in order to give up 1500 BIP, you need to make about 90 starts (3 seasons).
Yes, not the same thing. Just giving you a different perspective on the matter.
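The sample-size arithmetic can be sketched like this. The shots, games, BIP rate, and 1500 target come from the paragraph above; the innings per start and starts per season are my assumptions, not figures from the post.

```python
# Figures taken from the comparison above.
shots_per_game = 28
goalie_games_per_season = 60
bip_per_9_innings = 27
target_sample = 1500

# Assumptions for illustration only: a starter averages 6 innings per start
# and makes about 30 starts in a season.
innings_per_start = 6
starts_per_season = 30

goalie_shots_per_season = shots_per_game * goalie_games_per_season
bip_per_start = bip_per_9_innings * innings_per_start / 9
starts_needed = target_sample / bip_per_start
seasons_needed = starts_needed / starts_per_season

print(goalie_shots_per_season)   # shots a goalie faces in one season
print(round(starts_needed))      # starts for a pitcher to allow 1500 BIP
print(round(seasons_needed, 1))  # the same thing, expressed in seasons
```

With 6 innings per start this lands a little under the 90 starts quoted above; shorter outings push it up toward 90. Either way, the pitcher needs close to three seasons to build the sample a goalie sees in one.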
However, I suspect (with admittedly no tangible evidence) that the extreme outliers in this data probably might be reasonably predictive of a given pitcher/hitter who truly does “own” a given hitter/pitcher, and it makes sense for a manager to use this information when selecting a player to use in a given situation, all other factors being somewhat equal.
I DID provide tangible evidence of the most extreme of the batter-pitcher matchups, and there is NO predictability.
38. I know it’s a bit of a chestnut, but I do believe that making consistent solid contact with a wooden bat on baseballs thrown by major-league pitchers who are trying to deceive you is the most difficult achievement in sports.
This is an oldie but a goodie. That’s bullsh!t isn’t it? If what baseball hitters are doing is the most difficult, then what baseball pitchers are doing is the easiest, right? Hockey goalies save 90% of the shots they face… man, that must be easy to be an NHL goalie. So far, Ken’s list has been good, a justifiable list. But this one is just so Field of Dreamers that it’s just out of place here. It’s something that is said without thought to make sure that your favorite sport is placed at the top. How about the most difficult thing in sport is a soccer goalie trying to stop a penalty kick? Or an NHLer trying to score a goal?
I believe that teams are limiting veteran pitchers to many fewer innings than they can safely work.
I dunno. The most pitches thrown, in their careers, were made by pitchers of the Nolan Ryan generation (born 1942-1951), and then by the Clemens/Maddux generation (born 1962-1971). It seems to me that teams may be justified in their approach.
I believe the combination of money and wisdom in New York and Boston will keep both the Yankees and Red Sox from posting another losing season for the next, oh, let’s say 20 years
If Ken is taking bets, I’m willing to accept. Let’s say that the Red Sox and Yanks make sure to spend, and spend wisely, so they have a 90-win team each and every year. By luck alone, it’s almost certain that one of those 40 team-seasons will end with 80 wins or less. Playing at 1.5 SD below your true mean will happen once every 15 tries. So, for any one season, Ken will be right 93.3% of the time. But, he needs to be right on 40 such rolls of the dice. Basically, you lose any time you throw two dice, and you get 1-2 or 2-1 (more or less). I’ll tell you, making 40 such rolls and not losing once will happen 6% of the time (.933 ^ 40).
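Here is a rough sketch of that calculation. It assumes each game is an independent coin flip at the team's true win rate (a binomial model) and uses a normal approximation for the season win total. Those modeling choices and the variable names are my assumptions; the 90-win talent level and the 40 team-seasons are from the paragraph above.

```python
import math

games = 162
true_wins = 90       # assumed true-talent level, per the paragraph above
team_seasons = 40    # two teams times twenty years

# Binomial standard deviation of season wins for a true 90-win team.
p_win = true_wins / games
sd_wins = math.sqrt(games * p_win * (1 - p_win))


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# A losing season means 80 wins or fewer; 80.5 is a continuity correction.
z = (80.5 - true_wins) / sd_wins
p_losing_season = normal_cdf(z)

# Chance that none of the 40 team-seasons is a losing one.
p_no_losing_seasons = (1 - p_losing_season) ** team_seasons

print(round(sd_wins, 2))              # roughly 6.3 wins
print(round(z, 2))                    # roughly -1.5
print(round(p_losing_season, 3))      # roughly 0.067
print(round(p_no_losing_seasons, 3))  # roughly 0.06
```

The two-dice comparison is 2 chances in 36, or about 5.6% per roll, which is why it is only "more or less" the same bet.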
***
42. I believe I personally owe a debt of gratitude to Rob Neyer for using his column at ESPN.com to initially fuel my interest in baseball analysis
Not a point of disagreement. I just want to take the time to thank Rob Neyer for continually linking to my blog, and to many other blogs, around the web. There’s no site around the web that generates as many referrals to this blog as his blog does. It is always an avalanche when he is kind enough to mention this blog.
He is easily the best spokesperson for saberists among the mainstream. Even when others are picking fights with him, or disagreeing with him, Rob rises above them all to accept when he’s wrong, or highlight why he is right.
Where do you see advanced metrics in 10 years? Fad? Major part of a front office operation? Replace traditional scouting?
Fad? You haven’t seen anything yet. Wait until PITCHf/x, FIELDf/x, and HITf/x take shape. You will wish and pray to get back to the simpler times of the 2000s. The 2010s will bring an avalanche of data. It will absolutely be a major part of the front office. The best-case scenario is that you have all these f/x systems set up at colleges and high schools. Instead of one scout seeing one game of some prospect in one town, while missing a game in another town, you will have every single pitch charted, every swing charted, and every single fielder charted. The question is to try to identify all of the contributions of each player to each pitch and each play. Having a summary opinion without evidence is bullsh!t. Scouts have a summary opinion based on a limited amount of data (say they see 5% of someone’s games in college). That’s valuable. Now, imagine having a summary opinion based on 100% of the data?
And no, it will never replace traditional scouting, because as I said, you will always need two lenses to your glasses. It will certainly make the scout more efficient. Instead of seeing 5% of each player’s games, maybe he will see 30% of the games of the players that the f/x system is high on, and only 2% for the less-than-stellar players. It’s another tool scouts can add, in addition to their radar gun.
If the goal of this industry is not to advance it monetarily or its role in MLB, then why have it? What’s the point? It seems like a very time consuming hobby with little reward.
The hobby itself is its own reward. You may as well ask the millions of bloggers why they blog. Those things also consume time. Why do you go watch a movie? Why do you have dinner with friends? In those cases, you actually pay with money to get your reward. In this case, the payment is time. And, we are more than happy to give it, especially if others also give their time. We all benefit.
Making money and having a role in MLB is a byproduct. I wrote The Book, and I spent several hundred hours on it, if not 1000 hours. And I made less than minimum wage. Based on your line of thinking, I’m crazy and stupid. Yes, you are probably right. But, that doesn’t mean I shouldn’t have done it, nor does it mean that others did not get a bigger benefit of it than I did. Yes, I’d have loved it if we had sold a million copies rather than 1 percent of that, so I could turn my hobby into a full-time profession. But this is true of anyone who has a hobby. It’s ideal if your hobby and your living can merge. But you are not going to stop your hobby if you can’t make money out of it. Your hobby is all about trading time for enjoyment. My job is what I do. My hobby is who I am.