Monday, March 15, 2010
The Church of Baseball, part 2
My response to these 9 questions:
1. Why are we bestowing OPS with an aura of universal verity? Why are we still using it at all?…
I snipped the rest of the question, but the entire passage is bang-on. OPS and OPS+ should ONLY be used to get someone through the door. Using OPS in saber studies, or giving it any kind of staying power, is a bad thing to do. It’s the wrong metric, often used at the wrong times, and its biases can doom a study. OPS should be the gateway drug.
2. Why can’t there be a left-handed shortstop?
There WERE LH shortstops in the 19th century. The net result is that a LH shortstop is going to cost you some 30 runs or so. Unless your choice is Endy Chavez v Frank Thomas, there’s simply way too much RH talent at SS to take the chance on a LH talent. That player may as well just make life easy for everyone and go to CF. We talked about it a few years ago, when I was developing WAR. It was one of my favorite topics, and really led to a key breakthrough in WAR.
3. Why don’t we incorporate base stealing in OPS (or the weighted OPS proposed in #1)?
Well, OPS should not be given anything complicated to do. Adding SB would be a bad thing. If that is what you want, Total Average is a better stat. It adds up the total bases the player himself gets via walk, hit, steal, etc., and compares that to the outs he makes.
Total Average is as reliable as OPS. I think Total Average is a better gateway drug, but it never caught on when it had its chance in the 1980s. It may be because its deficiency is self-evident (walk = single), while the problems in OPS and OPS+ are masked by doing something your teacher told you to never do: adding fractions with different denominators.
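To make the comparison concrete, here is a minimal Python sketch of both stats. The Total Average formula below is the commonly cited version (bases gained over outs made) and is an assumption on my part, as is the made-up batting line; neither comes from the original question. It shows the two things mentioned above: OPS adds two fractions with different denominators, and Total Average treats a walk exactly like a single.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BattingLine:
    """A hypothetical season line; field names are illustrative."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int
    sb: int
    cs: int
    gidp: int

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def obp(b: BattingLine) -> float:
    # Denominator: AB + BB + HBP + SF
    return (b.h + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)


def slg(b: BattingLine) -> float:
    # Denominator: AB only
    return b.total_bases / b.ab


def ops(b: BattingLine) -> float:
    # Two fractions with different denominators, simply added together.
    return obp(b) + slg(b)


def total_average(b: BattingLine) -> float:
    # Assumed formula: (TB + BB + HBP + SB) / (AB - H + CS + GIDP)
    bases = b.total_bases + b.bb + b.hbp + b.sb
    outs = (b.ab - b.h) + b.cs + b.gidp
    return bases / outs


# Made-up line, for illustration only.
line = BattingLine(ab=500, h=150, doubles=30, triples=0, hr=20,
                   bb=50, hbp=0, sf=0, sb=10, cs=5, gidp=10)

# Turn one walk into one single: Total Average does not move, OPS does.
walk_to_single = replace(line, ab=line.ab + 1, h=line.h + 1, bb=line.bb - 1)

print(f"OPS {ops(line):.3f}  TA {total_average(line):.3f}")
print(f"OPS {ops(walk_to_single):.3f}  TA {total_average(walk_to_single):.3f}")
```

The second printed line has the same Total Average as the first, which is the "walk = single" deficiency in plain view.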
4. Why aren’t there more “two-way” baseball players? Even at the very highest levels of amateur ball, many pitchers double as position players.
The potential gain is small, and the risk is huge. Basically, the payoff isn’t there for it to be attempted unless it’s with someone who has nothing to lose. We had a thread here a year ago that went through the process.
5. Is a strikeout by a batter more detrimental to his team than other types of outs, or isn’t it?
Good question. It’s barely more detrimental, something like .01 or .02 runs more costly than other types of outs.
...We typically treat BABIP for hitters also as if it were largely determined by luck.
That’s not true. Who says that?
...On the other hand, Edgar Martinez and Bobby Bonilla had the exact same number of at-bats (7213) and put the ball in play almost the same number of times (Edgar struck out 1202 times, Bonilla 1204). Their ISO is also similar (.204 to .193). Yet Edgar’s career BA is 33 points higher (.312 to .279), suggesting either an astronomical amount of luck or an ability to maintain an above-average BABIP.
He has the ability. I think he’s misinformed regarding BABIP for hitters.
Also, Edgar had nearly 400 more walks. I like the comparison to Bonilla, actually. A great comparison, frankly. Edgar had 150 more singles, some 80 more extra base hits (and over 200 fewer outs) and nearly 400 more walks. They both have the same kind of fielding+positional value. Really, I like this comparison in terms of highlighting Edgar. Bonilla is like a poor man’s Edgar.
...What I’m getting at is that I sense a paradox in the current sabermetric thinking on this question. Either BABIP is random (in which case punchouts are devastating instances of lost opportunity which should be avoided like a plague) or it is determined by a particular skill like a consistent ability to hit more line drives (in which case we can no longer expect abnormalities to regress toward the mean). I don’t know which of these viewpoints is correct, but I know they cannot both be.
NOTHING is random. Nothing, nothing, nothing. There’s no such thing as 100% luck. None. Anyone who says that, feel free to remove his sabermetric card from his wallet.
The #1 job of a saberist is to tell you, the reader, HOW MUCH the observed metric is associated with the player’s skill. A pitcher’s BABIP has A LOT of noise. It’s not 100% noise. But, it’s a lot. A hitter’s BABIP has a lot of noise. Not A LOT, but a lot.
I posted somewhere how much noise. For a hitter, it’s something like after 500 balls in play, half of it is luck. For a pitcher, it’s something like after 2000 or 3000 balls in play (I forget exactly), half of it is luck.
On the other hand, after fewer than 200 plate appearances, half of a hitter’s walk rate is luck. And after fewer than 100 balls in play, a pitcher’s tendency to give up groundballs is half-luck. So, you can see that we can tell VERY QUICKLY how much a pitcher is a groundball pitcher. It takes us a very, very long time to figure out how good the pitcher is at preventing hits on balls in play.
So, every single component has its own reliability level.
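One way to picture these reliability levels is a rough sketch in Python. It assumes the simple form r = n / (n + k), where k is the sample size at which the stat is half skill and half luck. That functional form, the 2500 figure for pitcher BABIP (a placeholder inside the "2000 or 3000" range above), and the sample rates are all assumptions for illustration; the 200 and 100 figures are the upper bounds from the text.

```python
# Sample size at which each stat is roughly "half luck" (k).
# Values are the rough figures from the text; 2500 is a placeholder guess.
HALF_LUCK_POINT = {
    "hitter_babip": 500,        # balls in play
    "pitcher_babip": 2500,      # balls in play (somewhere in 2000-3000)
    "hitter_walk_rate": 200,    # plate appearances (upper bound)
    "pitcher_groundball": 100,  # balls in play (upper bound)
}


def reliability(n: float, half_point: float) -> float:
    """Share of the observed stat attributed to skill after n trials."""
    return n / (n + half_point)


def regress_to_mean(observed: float, league_mean: float,
                    n: float, half_point: float) -> float:
    """Weight the observed rate by its reliability, the mean by the rest."""
    r = reliability(n, half_point)
    return r * observed + (1 - r) * league_mean


if __name__ == "__main__":
    n = 500  # same number of trials for every stat
    for stat, k in HALF_LUCK_POINT.items():
        print(f"{stat:20s} reliability after {n} trials: {reliability(n, k):.2f}")

    # Hypothetical hitter: .340 BABIP over 500 balls in play, .300 league mean.
    print(f"{regress_to_mean(0.340, 0.300, 500, 500):.3f}")
```

With the same 500 trials, the groundball rate is mostly signal while the pitcher's BABIP is still mostly noise, which is the point: each component gets its own amount of regression.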
6. Why do we have a run expectancy matrix showing the average number of runs scored from each base-out situation, but not one (that I can find) showing the probability of scoring at least one (or two or whatever may be needed in a particular scenario)?
You were sooooooooooo close. In the original question, the person actually linked to this page:
http://www.tangotiger.net/RE9902.html
For his specific question, it’s here:
http://www.tangotiger.net/RE9902score.html
It’s also in The Book. You can read it for free from Amazon.com’s Look Inside.
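The difference between the two tables is easy to see in a small Python sketch. The two run distributions below are entirely made up for illustration (they are NOT taken from the linked tables), and the names `swing_away` and `play_for_one` are hypothetical. The sketch only shows that "average runs scored" and "chance of scoring at least one" are different questions that can point in different directions.

```python
def expected_runs(dist: dict[int, float]) -> float:
    """Average runs scored: the run expectancy number."""
    return sum(runs * p for runs, p in dist.items())


def prob_at_least(dist: dict[int, float], k: int = 1) -> float:
    """Probability of scoring k or more runs."""
    return sum(p for runs, p in dist.items() if runs >= k)


# Hypothetical distributions of runs scored in the rest of the inning.
swing_away = {0: 0.60, 1: 0.20, 2: 0.10, 3: 0.10}
play_for_one = {0: 0.55, 1: 0.35, 2: 0.07, 3: 0.03}

for name, dist in [("swing_away", swing_away), ("play_for_one", play_for_one)]:
    print(f"{name:13s} expected runs {expected_runs(dist):.2f}  "
          f"P(at least 1) {prob_at_least(dist):.2f}")
```

In this invented example the first option scores more runs on average, while the second scores at least once more often. Which one you want depends on how many runs you need.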
This disconnect explains why saberites so obstinately despise the sacrifice bunt, and also, I think, explains why we’ve been criticized for failing to “actually watch a game.”
Actually, we have a 50-page chapter in The Book that supports a lot of what a manager does and refutes a great deal, if not the entirety, of the sabermetric “wisdom” with regards to the bunt. And yes, MGL could ONLY have written such a groundbreaking piece of work because he watches games. He says he watches some 200 games a year, which can only mean that he loves baseball, and that all his kids are grown up.
...The underlying sabermetric ethos has always been to examine questions in terms of runs scored or prevented, since that is what wins games. Why not carry that logic one step further and just think in terms of wins and losses?
Yes, yes, yes! The currency of baseball is wins, not runs. Runs are a useful proxy. But, you are 100% right that for these kinds of decisions, when it’s close, you should rely on win changes, not run changes.
7. Why doesn’t sabermetrics devote more study to the process, rather than just the results? An example would be the controversy surrounding maple bats. Carlos Pena, a maple-swinger, said “It feels harder to me. And if I was to put a formula on it, I’d want the hardest wood possible, the one with the least amount of give. That’s just straight physics.”
I think Dr. Alan Nathan has done some great explanations on our blog on balls and bats, as well as on his site. I agree with the questioner’s general point. To study Alan requires a lot of dedication. And after you do that, a month later, a lot of what he wrote just… I don’t want to say it disappears from my head, but it has a hard time sticking there.
I suppose many people have the same thoughts in reading stuff we write. That’s ok, been there, done that. But, I’m not going to turn around and then say that Alan makes no sense! The best thing to say is: “I read it, and I tried to get it, but it hasn’t sunk in yet.”
It takes time…
8. What’s the deal with Win Probability Added? It’s a delicious little garnish to the traditional game summaries, but is there any substance behind it at all? Any metric that gives a team, down a run in the ninth with the bottom of its order facing a dominant closer, the same chance to win as another team, down a run in the ninth with three Silver Sluggers facing some bush-league junkballer, is obviously lacking.
I could give you the win expectancy for any situation you want. That’s hard to do though. The taste-test is to give you the standard ones, and then adjust it for the situation accordingly. Indeed, I showed this when it was Bonds being walked with the bases loaded to face Mayne.
So, yes, you are right. But, this is a limitation of programming and presentation (the implementation), not of the concept (the framework).
9. Why the mistrust and disdain of observations that contradict statistics? ... Matt Tolbert dropped a routine pop-up, which could happen to anyone on such a windy day, but got the putout anyway because the infield fly rule had been called. He dropped another one in the next inning, but again was fortunate enough to do so with a runner on first, and so was able to salvage the play as a fielder’s choice. His fielding stats actually improved on those two plays, but Ron Gardenhire will see right through them. When a third pop-up began its descent toward Tolbert’s glove, Alexi Casilla raced over from short and snatched it.
There is no disdain. Indeed, I’ve been running my Fans’ Scouting Reports for seven years and counting exactly because I WANT and TRUST those observations.
I’ve been pleading with MLBAM to record more. I want to know the hang time, I want to know how many hops a ball takes to the fielder. Strong wind? Want that. I want all the observations, because they all make up data. I want the data recorder to record all observations, however subjective they may be. Let me worry about bias. As the data analyst, my job is to try to make sense of the data recording.
So, it’s not a disdain at all that we have. Indeed, it’s the data recorders themselves that are hamstrung by the limited role they have been given. The NHL employs over half-a-dozen scorers to capture all the subjective things like hits, giveaways, takeaways, faceoffs, wrist/slap shots. MLB has fewer, even though it has twice the revenue stream. If it were me, I’d hire ten scorers for each game because I want that subjective data the questioner is asking for.
My larger goal here is to bring the two factions (proponents of sabermetrics and of scouting, people living in basements and people like Don Zimmer) together in the hope that their mutual enmity can evolve into mutual respect.
Actually, there is mutual respect already at the scouting / stathead level.
These questions, however, do help in giving us a perspective from a third faction (the average fan), because the fourth faction (the media) is really slow to catch up. If there’s a divide, it’s in the media, where the old guard has plenty of angst for what we do, while there’s the new guard that finds what we do fascinating. It’s really a media v media fight.
I am seriously concerned that some of its building blocks may not be as strong as they were thought to be. In case they do buckle, I urge all sabermetricians to stop being so self-righteous.
The building blocks I use are as strong as I think they are, because I believe in them only as far as I can prove. Every model has an uncertainty level around it, and my convictions are, and should only be, as strong as the uncertainty level allows.
Even if the foundations of sabermetrics are strong, its essential objectivity can only see so much.
Right, exactly. It’s all based on the uncertainty level.
Like all collisions between science and art, the relationship between the two factions will remain tense until both realize that neither has a monopoly on the truth, and that the other has the complementary piece of the puzzle.
There is no tension between scouting and stat-heads. And, as I have said for many years, the pinnacle of sabermetrics is the convergence of performance analysis and scouting. Of this, I have no uncertainty.


"Why aren’t there more “two-way” baseball players? Even at the very highest levels of amateur ball, many pitchers double as position players.”
Not entirely related, but I have my own question. Could Babe Ruth have continued pitching if the DH rule had been in effect back then?