Friday, February 25, 2011
Scorecasting review, part II
I wrote this to the authors on their web site:
Dear Sirs:
I am a professional (having worked for several MLB teams, notably the Cardinals in 2004 and 2005) sabermetrician and I have been working extensively in this field for over 20 years. I am the “inventor” of one of the most widely used advanced defensive metrics, UZR, I am one of the co-authors of the sabermetric book, “The Book,” and I host, with one of my colleagues (Tom Tango), a popular and highly respected sabermetric blog, http://www.insidethebook.com (click on the “blog” link).
I recently read your new book, Scorecasting, and I liked it very much. It was well-written, clearly presented, and well-researched, as far as I can tell. I believe it has broken some new and important ground as well. I have recommended it to many of my colleagues and to our (blog) readers.
I was particularly interested in your baseball research of course, especially that pertaining to home field advantage (HFA), a topic that I am not unfamiliar with. While it has been researched some over the years, it is admittedly one area that we (sabermetricians) know comparatively little about.
While you provided some well-researched, eye-opening insight into the role (both in quality and quantity) that umpires may play in the HFA in baseball (and other sports), I must say that some of your research in that area conflicts with similar research I conducted recently, after reading your book.
I would appreciate it if you could take the time to read my comments below (they are edited and reprinted from my blog) and address each issue as you see fit to do.
Kindest regards,
Mitchel G. Lichtman
To recap part of what I wrote in Part I, the authors in Scorecasting said this:
To test our theory, we first compared all pitches, about 5.5 million of them, from 2002-2008 made in stadiums using QuesTec versus those without it. For example, we looked at all called pitches when the Astros visited the Cardinals (at their non-Questec stadium) and when the Cardinals visited the Astros (at their Questec equipped stadium).
What did we find? Called strikes and balls went the home team’s way, but only in stadiums without Questec, that is, ballparks where umpires were not being monitored. This is consistent with an umpire bias toward the home team causing the strike-ball discrepancy. We also found something surprising. Not only did umpires not favor the home team on strike and ball calls when Questec was watching them, they actually gave more strikes and fewer balls to the home team. In short, when umpires knew they were being monitored, home field advantage on balls and strikes didn’t simply vanish; the advantage swung all the way to the visiting team.
Basically, they say that the home team bias exists in non-QT parks in 02-08, but that in QT parks, it was reversed – there was a road team bias.
My data do not support that claim, using the same 11 QT parks that they did (they listed them in the footnotes) in the same years (02-08). I found a .58% bias in favor of the home team in QT parks and a .46% home bias in non-QT parks. So not only was the bias not reversed in QT parks, as the authors assert, but it was actually higher there (in favor of the home team, as always).
Now, as it turns out, comparing the home/road differentials in QT and non-QT parks is not fair anyway, in terms of isolating the effects of the park. The reason should be obvious and the authors, at least one of them, should have addressed this issue.
When you compare the H/R bias in QT parks to that in non-QT parks, you are comparing both the effects of the park and the effects of the QT teams.
Imagine that all the QT teams had league average batters but pitchers who threw many more strikes (per pitch) than league-average pitchers. And let’s say that the park had no influence on the umpires or anyone else. Well, in QT parks, the H/R differential would be enormous, because it would be driven by the home team pitchers who threw lots of strikes. In non-QT parks, in games where the QT teams were not involved, the H/R differential would be normal (the teams would not affect the numbers) and in games where the road team was a QT team, the H/R differential would be greatly reduced or reversed, because the road pitchers would be strike-throwing pitchers.
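The thought experiment above can be put into numbers. This is only a sketch: the base rate, the size of the QT pitchers' edge, and the evenly spread schedule are all made-up assumptions, and no park or umpire effect is built in at all.

```python
# Hypothetical numbers, chosen only to illustrate the confounding.
BASE = 0.310          # assumed called-strike rate for a league-average staff
BOOST = 0.020         # assumed extra called-strike rate for QT-team pitchers
N_QT, N_NON = 11, 19  # 11 QT teams per the article; 19 is the rest of a 30-team league


def staff_rate(is_qt_team):
    """Called-strike rate for a pitching staff; no park or umpire effect."""
    return BASE + (BOOST if is_qt_team else 0.0)


def avg_road_rate(home_is_qt):
    """Average staff rate of the visitors, assuming opponents are drawn
    evenly from the other 29 teams (a simplification of the real schedule)."""
    qt_visitors = N_QT - 1 if home_is_qt else N_QT
    non_visitors = N_NON if home_is_qt else N_NON - 1
    total = qt_visitors + non_visitors
    return (qt_visitors * staff_rate(True) + non_visitors * staff_rate(False)) / total


def raw_differential(home_is_qt):
    """Home-pitcher rate minus road-pitcher rate in one class of parks."""
    return staff_rate(home_is_qt) - avg_road_rate(home_is_qt)


qt_parks = raw_differential(True)
non_qt_parks = raw_differential(False)
print(f"QT parks:     {qt_parks:+.4f}")
print(f"non-QT parks: {non_qt_parks:+.4f}")
```

With these made-up inputs the raw split shows roughly +1.3 points of apparent home bias in QT parks and about -0.8 points in non-QT parks, even though the teams alone are driving the numbers.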
So the bottom line is that if we want to see if the park (QT or non-QT) has any effect on the differential, we can’t just look at one subset of parks (like QT parks) and compare them to another subset (like non-QT). We have to control for the teams. Their raw comparison does not do that, and neither does mine. So their finding (that the differential is reversed in QT parks) doesn’t mean anything and neither does mine (that the differential is larger in QT parks) until we control for the teams.
One way to do that, at the risk of losing some sample size, is to just look at QT teams in QT parks and compare that to non-QT teams in non-QT parks. That will tell us the true effect of the park, more or less, since the home and road teams in both sets of data are essentially the same.
Here is that data:
QT v. QT teams
H: .323
R: .319
.4% home bias.
Non-QT v. non-QT teams
H: .318
R: .312
.6% home bias.
So this time, the QT parks do indeed have a smaller home bias, but by no means is it reversed, as the authors claim.
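For anyone who wants to repeat the matched comparison on their own pitch data, here is a rough sketch. The record layout is hypothetical, and it assumes H means the called-strike rate when the home team is pitching and R the rate when the road team is pitching; games that mix a QT team with a non-QT team are skipped.

```python
from collections import defaultdict


# Hypothetical record layout: one tuple per called pitch (no swing):
# (home_team_is_qt, road_team_is_qt, home_team_pitching, called_strike)
def matched_home_bias(pitches):
    """Return {group: (H, R, H - R)} for the two matched groups.

    H is the called-strike rate with the home team pitching and R the rate
    with the road team pitching (an assumed reading of H and R).
    """
    # (group, home_pitching) -> [called strikes, called pitches]
    counts = defaultdict(lambda: [0, 0])
    for home_qt, road_qt, home_pitching, strike in pitches:
        if home_qt != road_qt:
            continue  # keep only QT v. QT and non-QT v. non-QT games
        group = "QT v. QT" if home_qt else "non-QT v. non-QT"
        tally = counts[(group, home_pitching)]
        tally[0] += int(strike)
        tally[1] += 1

    result = {}
    for group in ("QT v. QT", "non-QT v. non-QT"):
        h_strikes, h_total = counts[(group, True)]
        r_strikes, r_total = counts[(group, False)]
        if h_total and r_total:
            h, r = h_strikes / h_total, r_strikes / r_total
            result[group] = (h, r, h - r)
    return result


# Arithmetic check on the rates printed above.
for label, h, r in [("QT v. QT", 0.323, 0.319), ("non-QT v. non-QT", 0.318, 0.312)]:
    print(f"{label}: {100 * (h - r):.1f}% home bias")
```

The last loop only redoes the subtraction on the figures listed above; the function is what you would run on raw pitch records.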
(BTW, as it turns out, the QT teams are not league-average when it comes to the called strike percentage of their pitchers and their batters. Their pitchers have a .6% advantage compared to non-QT pitchers, and their batters have a .3% disadvantage, again, as compared to non-QT batters. So the QT teams have a net advantage of .3% in called strike percentage, whether they are the home or road team and regardless of what park they play in.)
To recap the leverage issue that I discussed earlier and then expand on it: the authors assert that the home bias is much larger in high leverage situations (when the game is on the line) than in low leverage situations (when the outcome is not too much in doubt). I’ll sometimes call that umpire bias, even though I am not certain how much of it, if any, is due to the umpire.
I also found that to be the case using my original, simple definition of low and high leverage. Here is my original data for 02-08:
HL
.74%
LL
.24% (I originally posted .18%, which was wrong)
So I found a called strike differential in HL situations more than 3 times as large as the one in LL situations.
I reran the data using a much more rigorous definition of high and low leverage. This time I used Tango’s complete LI chart (rpg, inning, relative score (home v. away), bases, and outs) to separate all PA into low and high leverage ones.
The Scorecasting authors did not say what criteria they used and unless they licensed Tango’s charts (which I don’t think they did) I’m not sure how they would do it, unless they duplicated his work. They did reference Tango when talking about LI though.
Anyway, I defined low leverage (LL) as any PA where the LI was less than .1. That seems low, but around 8% of all PA were LL.
High leverage (HL) PA were defined as LI > 2.0 and constituted about 9.5% of all PA.
I think that is a pretty fair, albeit arbitrary (and round), division. Again, I have no idea how that scheme compares to the authors’.
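In code, the split described above might look something like this. It is only a sketch of the cutoffs: the LI lookup itself (Tango's chart) is not reproduced here, the function names are mine, and the sample LI values are invented.

```python
def leverage_bucket(li, low_cutoff=0.1, high_cutoff=2.0):
    """Label a plate appearance by its leverage index (LI).

    Cutoffs follow the text: LL is LI < .1, HL is LI > 2.0.
    Everything in between is left in an 'other' bucket.
    """
    if li < low_cutoff:
        return "LL"
    if li > high_cutoff:
        return "HL"
    return "other"


def bucket_shares(li_values):
    """Share of plate appearances falling in each bucket."""
    counts = {"LL": 0, "HL": 0, "other": 0}
    for li in li_values:
        counts[leverage_bucket(li)] += 1
    total = len(li_values)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


# Invented LI values, just to exercise the functions.
sample_li = [0.05, 0.8, 1.2, 2.4, 0.09, 1.0, 3.1, 0.5]
print(bucket_shares(sample_li))
```

Running bucket_shares on the real LI for every PA is how you would check the roughly 8% and 9.5% shares quoted above.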
Remember that one of the authors’ central theses in the chapter on baseball HFA is that most of it is due to umpire bias, especially in ball and strike calls. Furthermore, they say that because of the psychology of this bias, umpires are much more biased (in favor of the home team) in HL situations, when the game is on the line, and much less biased in low leverage situations, where it doesn’t really matter (so they might as well just call them as they see them).
Of course you could create the reverse argument from a psychological standpoint - that the umpires favor the home team and the crowd, but only when it doesn’t influence the outcome of the game. But that is neither here nor there, and I am certainly no expert in prospect theory or cognitive or behavioral psychology.
The authors claim in their book that in low (and medium) leverage situations, there is no home/road bias in terms of the percentage of called strikes (of all pitches not swung at). In fact, if you look at their chart, “Difference in Percentage of Called Pitches that Are Called Strikes on Home v. Away Batters,” you will see that in low and medium leverage situations (however they define them), the bias is actually reversed in favor of the road team.
Here is what they said about HL and LL situations and umpire bias (my comments are in parentheses):
…in low leverage situations, when the game is not much in doubt, the home team advantage in receiving fewer called strikes and more balls (they mean to the batters, BTW) goes away (it actually reverses). But as the following chart shows, the called strike advantage for home teams grows considerably as the game situation gets more and more important.
Again, in my initial look, where I used a gross definition of high and low leverage, I found that from 02-08 there was a .74% home bias in HL situations and a .24% positive difference in LL situations. While this is a large difference, these numbers contradict the authors’ claim that there is a reverse bias in low and even medium leverage situations.
The difference between my old results and theirs could easily be due to our definitions of low leverage. As I said before, because my initial criteria were so coarse, my low leverage PA included lots of actual higher leverage situations.
With my new definition, that is not the case. Low leverage was any PA with an LI of less than .1. So what was the home/road called strike differential in this pass (in all parks from 02-08)? It was .3%, a little larger than when I used the much simpler definition of high and low leverage. So again, I have to dispute the authors’ claim that the bias “disappears in low leverage situations.” I say that with some trepidation because I don’t know how they bifurcated the PA (LL and HL), and I am not sure about the rest of their methodology or about the integrity of their pitch database and mine. (Plus, either one of us could have made one or more computation errors.)
What about the new high leverage differential? The old one was .74%. The new one was 1.1%, roughly 50% higher. So their claim that the home/road bias is much higher in HL situations is once again confirmed. Interestingly, in their chart, in “very crucial situations” they have the bias at only around .55%.
As I said, I am not 100% convinced that most or all of the difference is due to umpire bias.
What about their Questec claims with regard to leverage? They are pretty bold. They say that in non-QT parks, where umpires know they are not being monitored, the home bias is large in HL situations, and that in QT parks, where umpires are being evaluated, the bias in HL is actually reversed, that is, the road team gets favored. (In the last thread, I mis-reported what they said. I stated that they asserted that in QT parks, in HL situations, the bias was smaller than in non-QT parks, and that in LL situations, the bias was reversed. In fact, they say nothing about the difference between QT and non-QT parks in LL situations.)
Here is exactly what they say in their book:
That is, when the game is on the line, home teams in non-QT stadiums get a big strike-ball call advantage and those in QT stadiums get a huge strike-ball disadvantage.
As I said before, I would be skeptical of those claims off the top of my head, especially the last one (reverse bias), but the data will speak for themselves. We are not dealing with anything subjective here on the part of the researchers or me, other than perhaps their choice of words, like “large”, etc.
Using my rigorous criteria for high and low leverage, albeit likely different than the authors’ (unknown criteria), in HL situations in QT parks, I get 1.3%, which is higher than the overall 1.1% (all parks).
In non-QT parks, in HL situations, the home bias is .9%, as you would expect (after subtracting the 1.3% weighted by 11 parks from the overall 1.1% weighted by 30 parks) and after rounding.
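The subtraction described above can be sketched as follows, assuming every park gets equal weight (weighting by pitches would be more exact, but those counts are not given here). The function name is mine.

```python
def remaining_group_mean(overall, subset_mean, n_total, n_subset):
    """Mean of the complementary group, given the overall mean and the
    mean of one subset, with every park weighted equally."""
    return (overall * n_total - subset_mean * n_subset) / (n_total - n_subset)


# Rounded figures from the text: 1.1% over all 30 parks, 1.3% in the 11 QT parks.
non_qt_hl = remaining_group_mean(overall=1.1, subset_mean=1.3, n_total=30, n_subset=11)
print(f"{non_qt_hl:.2f}")
```

With the rounded inputs this comes out near .98%, a bit above the .9% quoted. The gap is consistent with rounding in the inputs: purely hypothetical unrounded values such as 1.05% and 1.34% would give about .88%.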
So, as I suspected, there is no reverse bias in the QT parks. Not even close. In fact, the home bias is larger (likely due to the fact that QT teams have a .3% called strike bias to begin with, as compared to non-QT teams, and they are always the home teams in QT parks of course). I cannot imagine how the authors came up with this result from the data. There does not seem to be any justification for it, no matter what their definition of high and low leverage situations is.
I can only assume that they accidentally reversed the home and road called strike percentages. In fact, I ran the numbers twice just to make sure that I didn’t make the same mistake.
BTW, in LL situations, QT parks had a home bias of .5% and it was .3% in non-QT parks, but again, the difference is likely due to the teams and not the parks. The “real” difference (between QT and non-QT parks in LL situations), after adjusting for the teams, is probably around the same.
As I mentioned before (in case you didn’t read the last post or forgot it), I believe that the authors make another error in the same section on leverage and Questec. They say:
In practical terms, when an umpire is not being monitored by QT, a home batter in crucial game situations will get a called strike only 32% of the time if he doesn’t swing. In the same situation, a batter from a visiting team gets a called strike 39% of the time.
Those numbers appear to be wrong. In their own charts, the authors show the H/R differences to be on the order of less than 1% and not 7%. In fact, I think they mean something like 30.2% and 30.9% and somehow it got printed as 32% and 39% - or some such thing.
Then they go on to say:
Now consider the same two situations when the umpire is being monitored by QT. Here the home batter gets a called strike 43% of the time, and the away batter only 35% of the time.
Again, those numbers appear to be wrong, as all of the called strike percentages are on the order of 30% to around 32%, and the differences between home and road are only as large as around 1.3% (not 7 or 8%), according to my data and analysis. Plus, they have the home and road effect reversed, as far as I can tell from my data output, as I mention above (the home team always has the advantage whether it is a QT park or not – small in LL situations, and large in HL situations – there is no “huge reverse bias”).


It used to seem a little bit pretentious when sabermetrics was described as a “science”. Technically, I guess that it fit the definition (which is kinda a flexible definition, anyway), but it seemed a little bit too haughty and even pretentious a word.
This, though, is exactly what I like to think of when I hear the term “peer review” - other scientists, hobbyists, and enthusiasts having their B.S. antenna raised and then, driven by a dedication to honesty, investigating the issue with rigorous dedication to empirical facts.
In fact, I think that this sort of article is BETTER than much modern science because I know that many issues are so politicized and/or corporatized that the ultra hyped “peer review” process has been shown to be totally ineffective and is, in many scientific studies, just a rubber stamp.
So, anyway, I think that this little investigation is a great testimony for sabermetrics and I hope that (a) you continue your scrutiny of Scorecasting and (b) you continue to publicize your work.
If the authors don’t respond honestly and completely, then I think that you should write strong letters to their publisher. I was loosely involved with the retraction of a fraudulent book by the scumbag “scientist” Charles Pellegrino and we had to send faxes and make calls to the publisher and lots of mainstream media outlets.
(There’s a small formatting error in the “quote box” near the top of your article when you’re quoting from the book. One of the quoted paragraphs isn’t in the box. No biggie.)