Tuesday, October 03, 2006
And in this corner…
Phil Birnbaum lets loose and restrains himself on his blog. He uses J.C. as his foil, which may well make it seem like a personal attack rather than an overall us-v-them professional gripe.
The two model works out there are by Allen/Hsu (pdf) and by MGL.
Any DIPS-related article that does not explicitly mention either of the above two articles is, in my opinion, doing a grave disservice.
This little piece by me is also interesting:
http://www.tangotiger.net/dipsbands.html
***
And, you should not be a believer. J.C. has stated that he is a believer, which is a strange position to take, in light of the above.
Tango,
Thanks for the disclaimer. I hope my post isn’t interpreted as a personal screed against JC, whose work I respect and enjoy.
My point was to complain about the tendency of academics to ignore (or be unaware of) our work, and JC’s paper came to my notice as a particularly explicit example of that.
Phil
At the end of the BTF thread were two excellent posts.
The first is by Gaelen, which I will copy here:
So here are things we know:
1) We know that some pitchers have an ability to suppress hits on balls in play and that this ability follows a normal age curve. (the Greg Maddux rule)
2) We know that minor league pitchers allow more hits on balls in play than major league pitchers. (the Kyle Snyder rule)
3) We know that when pitchers are injured or recovering from an injury they allow more hits on balls in play. (the Curt Schilling rule)
4) We know that when formerly good pitchers are done they allow more hits on balls in play. (the Kevin Brown rule)
5) We know that when pitchers are tired they allow more hits on balls in play. (the Pedro Martinez rule)
6) We know that when pitchers who rely on command are less than perfect they allow more hits on balls in play. (the Josh Towers rule)
7) We know that pitchers pitch differently with men on base than with the bases empty and this affects hits on balls in play. (the Tom Glavine rule)
Ok, I don’t actually “know” any of these things because I haven’t done any “studies” to “prove” them. Nonetheless . . .
Change the “we know that” to “we should find out if”, and Gaelan is spot-on. There’s a world of parameters to consider, and coming to any hard conclusion on DIPS without considering everything is really a dead end. Each of Gaelen’s points deserves its own study, and I hope there are some enterprising researchers out there willing to do the grunt work.
***
The other excellent post was by David, which I’ll copy here:
Alright, I’ll write this up later, as I’m running late to class right now, but I did a little study just now. I took all pitcher seasons since 1921 and divided them into even and odd years. I then tallied up each pitcher’s career totals for both halves, and looked only at pitchers with at least 1,000 BIP in both the even and odd years. There were 1,157 such pitchers. I then ran a regression in which BABIP in even years was the dependent variable and BABIP, K/BFP, BB/BFP, HR/BFP, and HBP/BFP were independent. The results were as follows:
Constant = .137 (p = .000)
BABIP = .532 (p = .000)
HR/G = -.002 (p = .267)
BB/G = -.002 (p = .004)
K/G = .000 (p = .542)
HBP/G = .009 (p = .009)
r = .557 (r^2 = .310)
*note: I used per game instead of per BFP notation. That simply means I multiplied those numbers by 38.5. So K/G = K/BFP*38.5.
So here’s what we’ve got:
(a) Controlling for defensive independent variables, BABIP is still highly, highly, highly significant. It’s more than half the equation.
(b) Unlike in JC’s THT study, strikeouts and home runs are NOT significant. The reason JC found such a significant effect for K/9 is because all other things being equal K/9 are inversely related with BABIP. The higher your BABIP, the more strikeouts you’re going to get per 27 outs because your fielders aren’t making outs. You must use K/BFP, especially with a one-year sample when BABIP can vary a lot from pitcher to pitcher. For HR/9, it could be the same effect, or it could be because pitchers with high HR-rates often have fluky HR/Fly ball rates, so more of their fly balls stay in the park the next season, and fly balls are converted into outs a very high percentage of the time.
(c) HBP and BB are significant but in opposite directions (more hit batsmen means a higher BABIP, more walks means a lower BABIP). This one I need to think about for a moment.
J.C. noted in a subsequent post that David should have controlled for aging. Guy quickly corrected J.C.: because David used the even-odd year method, controlling for aging is irrelevant. And this method is a fabulous way to jump your sample size enormously. Woolner followed this method when tackling DIPS a few years ago.
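For anyone who wants to try the even-odd split, here is a minimal Python sketch. The data layout (tuples of pitcher, year, hits on balls in play, balls in play), the function names, and the toy numbers are all made up for illustration; only the idea of pooling each pitcher's even and odd years and applying a minimum-BIP cutoff in both halves comes from David's description.

```python
from collections import defaultdict
from math import sqrt

def even_odd_babip(seasons, min_bip=1000):
    """Pool each pitcher's seasons into even-year and odd-year totals.

    seasons: iterable of (pitcher, year, hits_in_play, balls_in_play).
    Returns {pitcher: (even_babip, odd_babip)} for pitchers with at
    least min_bip balls in play in BOTH halves.
    """
    # totals[pitcher][0] is the even-year [hits, bip]; [1] is the odd-year pair.
    totals = defaultdict(lambda: [[0, 0], [0, 0]])
    for pitcher, year, hits, bip in seasons:
        half = totals[pitcher][year % 2]
        half[0] += hits
        half[1] += bip
    result = {}
    for pitcher, (even, odd) in totals.items():
        if even[1] >= min_bip and odd[1] >= min_bip:
            result[pitcher] = (even[0] / even[1], odd[0] / odd[1])
    return result

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Toy data (hypothetical): pitcher C falls short of the BIP cutoff.
seasons = [
    ("A", 2000, 140, 500), ("A", 2001, 150, 500),
    ("A", 2002, 145, 500), ("A", 2003, 148, 500),
    ("B", 2000, 130, 500), ("B", 2001, 135, 500),
    ("B", 2002, 128, 500), ("B", 2003, 137, 500),
    ("C", 2000, 60, 200), ("C", 2001, 55, 200),
    ("D", 2000, 150, 500), ("D", 2001, 145, 500),
    ("D", 2002, 150, 500), ("D", 2003, 145, 500),
]
splits = even_odd_babip(seasons)
r = pearson_r([s[0] for s in splits.values()], [s[1] for s in splits.values()])
```

Because each half mixes years from across the whole career, a pitcher's average age is about the same in both halves, which is why an aging control adds nothing here.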
Oh, and the other thing is, absolutely, you need to do K per BFP and not K per IP. What does IP represent? Essentially it’s K + batted ball outs. So, K per IP is, essentially, K per (K + batted ball outs). What in the world do we want that for, in a DIPS study?!? We are trying to figure out if the batted ball outs belong to the pitcher or fielder (or both). K per IP is a terrible measure to look at.
So many people have been conditioned to look at “per IP”, without realizing when it’s good, and when it’s not. For a DIPS study, it’s not good ever.
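To see the problem with a per-IP denominator, here is a small Python sketch. It assumes a simplified pitcher line in which every out is either a strikeout or a ball-in-play out (no double plays, caught stealing, or HBP), and the 1000 BFP line is invented for the example.

```python
def k_rates(bfp, k, bb, hr, babip):
    """Return (K per BFP, K per 9 IP) for a simplified pitcher line.

    Assumes outs = strikeouts + balls in play that are not hits.
    """
    bip = bfp - k - bb - hr           # balls in play
    outs = k + bip * (1 - babip)      # strikeouts plus batted-ball outs
    return k / bfp, 27 * k / outs

# Same strikeout, walk, and home run totals; only BABIP differs.
low_babip = k_rates(bfp=1000, k=200, bb=80, hr=30, babip=0.270)
high_babip = k_rates(bfp=1000, k=200, bb=80, hr=30, babip=0.320)
```

K per BFP is identical in both lines, while K/9 rises with BABIP simply because fewer batted balls become outs. That is the contamination you do not want in a DIPS study.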
***
(I realize that FIP uses per IP, but, that’s just part of the little trick of FIP, plus IP is much more readily available than PA. If you were to use FIP for a DIPS study, you would need to create a version that was based on BFP, not IP, like szERA.)
To add to Gaelan’s list:
8 - We know that more outs are made on flyballs than ground balls, and fewest outs on line drives. We know that pitchers have an enormous influence on whether they allow FB or GB (Lowe v Zito).
9 - We know that FB-heavy pitchers get more outs on FB, and that GB-heavy pitchers get more outs on GB, than their counterparts.
10 - We know that FB pitchers give up far more 2b+3b than GB pitchers.
11 - We know that GB pitchers get more GIDP than FB pitchers.
12 - We know that closers get far more outs on balls in play than other pitchers.
***
Even if you want to call everything in Gaelan’s list and mine a “maybe”, it’s apparent that pitchers have an enormous influence on the trajectory of the batted ball, and some influence on whether that ball is caught or not. The question is always “how much is this impact?”. And the impact to look at is not only on outs per BIP but, more importantly, the run value per BIP, because of the extra base hits and DP.
It could very well be that all these things come out fairly even (at the MLB level).
As for the David study, it only shows that BABIP correlates in years x and x+1. It doesn’t show why it does so. BABIP is a product of a pitcher, park, fielders, and luck. Looking at pitchers who switched teams within the same league would probably be the best way to neutralize the park and fielders. Of course, you then lose a huge sample.
Here’s a quick study I did. I took all pitchers:
- from 1946-2005
- who played on more than one team the same season
- both in the same league
- allowed at least 50 BIP
This gave me a list of 592 pitchers. The average BIP in the first stint was 119, and in the second stint was 131.
The BABIP in the first stint was .291 and .280 in the second stint. This likely shows a bias, that pitchers who are moved in-season probably were not having a good season (bad luck), and reverted to form with the new team.
I then computed a z-Score for each pitcher, in each stint. For example, Andy Ashby in 1993 went from 3.8 SD above the population mean, to 0.6 SD above the mean with the new team.
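The post does not say exactly how the z-scores were built. One plausible way, sketched below in Python, is to treat each stint as a binomial sample around the population BABIP. The binomial SD, the function name, and the sample numbers are assumptions for illustration, not necessarily the method used above.

```python
from math import sqrt

def babip_z(hits, bip, pop_mean):
    """z-score of a stint's BABIP against the population mean.

    Assumes a binomial SD, sqrt(p * (1 - p) / BIP), around pop_mean.
    """
    sd = sqrt(pop_mean * (1 - pop_mean) / bip)
    return (hits / bip - pop_mean) / sd

# Hypothetical stint: 45 hits on 125 balls in play, population mean .291.
z_example = babip_z(hits=45, bip=125, pop_mean=0.291)
```

With this definition a short stint needs a much larger BABIP gap to reach the same z-score as a long one, which keeps stints of different lengths on a comparable scale.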
Running a regression against the z-scores, I get an r=.10. Remember, the BIP size was around 125 for each sample. Our regression toward the mean equation therefore is:
x/(x+PA), where x=1250
If you have 125 BIP, you regress 90% toward the mean.
Therefore, you need 1250 BIP (two full seasons for a starter) to be able to regress 50% toward the mean.
For comparison, you need 200 PA (two months) for a batter to regress his OBP by 50%.
Extending it back to 1901, I have 864 pitchers. The r is now .17, with about 135 BIP for each pitcher. Now, the “x” in the regression equation is 660.
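The step from r to “x” can be written out explicitly. If r = n/(n + x), where n is the average BIP per sample, then x = n(1 - r)/r, and the share to regress toward the mean is x/(x + n). A minimal Python sketch (the function names are mine):

```python
def regression_constant(r, n):
    """Solve r = n / (n + x) for x, given correlation r at sample size n."""
    return n * (1 - r) / r

def regress_fraction(n, x):
    """Share of the way to regress toward the mean: x / (x + n)."""
    return x / (x + n)

x_since_1901 = regression_constant(r=0.17, n=135)  # roughly 660
share_at_125 = regress_fraction(n=125, x=1250)     # roughly 0.91
share_at_1250 = regress_fraction(n=1250, x=1250)   # exactly one half
```

Plugging in r=.10 and n=125 gives about 1125 rather than 1250, so the 1250 figure presumably reflects rounding or a slightly different average BIP.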
I gotta run right now, and I’ll continue tomorrow. It’s very possible that in the olden days, when Ks were very low, that pitchers needed to get outs on BIP, and there was a bigger talent disparity there (as opposed to these days where the talent disparity is on K).
Did you park adjust your stats?
You should also compare the pitchers with the year before.
Why would I need to park adjust? That’s the point of selecting pitchers in different teams, that the parks and fielders are neutralized.
And why would I need to look at the year before? Again, I’m comparing the same pitcher in the same year in different parks. That takes care of age, fielders, and park.
It increases the level of uncertainty (introducing randomness), but it doesn’t bias the results.
J.C. recommends certain books for the aspiring sabermetrician.
I added Andy’s Appendix in THE BOOK to his list.
Andy has recommended Numerical Recipes to us, and he said we can’t pry it away from him.
Leaving aside whether the Bradbury paper gave adequate citation of non-academic work on DIPS, what do people think of the paper? To me, the second half on compensation doesn’t really support the conclusion he reaches (that the market rewards DIPS performance, while properly ignoring BABIP). I made these observations at BTF:
“The year-to-year focus also makes the labor market analysis here pretty questionable. If I’m reading it correctly, he uses year 1 performance data to predict year 2 salary. But players don’t generally sign a series of one-year contracts. So even if teams DID overestimate the importance of BABIP/ERA, you’d have to look at a pitcher’s performance in their pre-contract year (or two years) to find it. Not to mention that teams obviously take account of a pitchers’ CAREER performance (including even minor lgs) to determine his value. So why in the world would anyone build a model to predict salary based only on one year of data? ... It would be really interesting to know whether teams overestimate the importance of ERA, wins, etc., but unfortunately this analysis can’t answer that.”
To answer this question, I think you would need to make the dependent variable the annual value of a player’s current contract, and the independent variables his performance PRIOR to signing that contract—career stats and also perhaps the pre-contract year.
Other than the two studies mentioned in comment 2, are there others I should read to get a feel for current thinking on DIPS?
Phil
Tippett: http://www.diamond-mind.com/articles/ipavg2.htm
Woolner (Search BPro archives)
Davenport (ditto)
I did this recently:
http://www.insidethebook.com/ee/index.php/site/comments/spread_in_talent_pitching/
***
In addition to the Tippett article is his followup (3rd blog entry down)
http://www.diamond-mind.com/weblog/2003_07_27_archive.htm
***
Here’s the excellent Woolner article:
http://www.baseballprospectus.com/article.php?articleid=883
***
And finally, an extensive primer into the history of DIPS, along with an excellent bibliography, can be found here:
Thanks, Guy and Tango.
Here’s Davenport: http://www.baseballprospectus.com/article.php?articleid=3946.
Also, look at Gassko’s “DIPS 3.0” work at THT. Bradbury has also written about DIPS there.
Continuing my study, I now only focused on 1901-1918. I dropped the BIP requirement to a measly 20 in each stint.
I have 119 pitchers, with an r of .21. The average number of BIP is 116. (Note: my BIP average is really the harmonic mean, 1/avg(1/BIP1, 1/BIP2, ..., 1/BIPn).) The “x” in the regression equation is a very low 440. If I put my requirement to a min of 50 BIP, my r is .19, and average BIP is 177. The “x” in the regression equation is 750. This is the average of the 1901-2005 dataset. The r is .16, if I drop the min BIP to 100 (70 pitchers). And with a min BIP of 150, average of 265, 53 pitchers, the r is .23, giving us an “x” of 900.
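The BIP average in that note is a harmonic mean, which gives small samples more weight than a plain average does. A quick Python illustration with made-up BIP counts:

```python
from statistics import harmonic_mean, mean

# Hypothetical BIP counts for three stints.
bip_counts = [60, 120, 300]

# Written out as 1 / avg(1 / BIP), as in the note above.
by_hand = 1 / mean(1 / b for b in bip_counts)

# The standard library computes the same quantity directly.
from_stdlib = harmonic_mean(bip_counts)

plain_average = mean(bip_counts)
```

The harmonic mean comes out well below the plain average, because the 60-BIP stint dominates. That is the appropriate "typical n" when sampling noise scales with 1/n.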
And more finally, using a “modified” version of PA, the 50 pitchers with the most PA gives me an r of .29, with an average BIP of 278, or an “x” of 680.
***
Going with the modified version of PA, I then go back to the 1901-2005 dataset. I select the 300 pitchers with the most PA who played with multiple teams in the same season. Their average PA for each team is 260. The r is .18, giving us an “x” of 1200.
Also, their “before” BABIP was .292 and their after BABIP was .279. The biggest switcheroo happened to this guy in 1967:
http://www.baseball-reference.com/b/barbest01.shtml
While his ERA shows remarkable consistency, just look at his hits!
For an even more remarkable story, look no further than last year’s Ohka:
http://www.baseball-reference.com/o/ohkato01.shtml
His BABIP went from .211 to .310! However, his K/BB rate, when he was on BABIP-fire was 17/26! That’s right, fewer K than BB. Talk about looking horrible in one spot, and great in the other. However when he was BABIP-garbage in his new team, his K/BB was an astounding 81/24!! (All IBB removed) A case study on Ohka should prove rather interesting.
And the most enigmatic going the other way, TWICE:
http://www.baseball-reference.com/m/morgacy01.shtml
Check this out:
.333 - 1907, team 1
.215 - 1907, team 2
.253 - 1908
.278 - 1909, team 1
.210 - 1909, team 2
It’s a switcheroo of 4 SD from the mean, twice.
The pitchers who switched teams in the middle of the year are usually underperformers in that year; especially with a rather low BABIP number you get guys on waivers.
I took pitchers from 1947-2005 with at least 50 (BFP - SO) for each of 2 different teams in the same year, and compared their ERA that year to their ERA the year before (I exclude 1946 due to the war). There were 501 pitchers, whose same-year ERA (simple average) went from 4.90 with the first team to 4.24 with the second, but their prior-year ERA was 3.92.
This tells me there is a lot of bias in the sample.
(I thought that there might be bias in PF, due to pitchers failing being more likely to be in high run environment in prior years, as teams did not park adjust in their decisions. I took the Lahman DB PF (the source of my data as well) and compared the PF for the players that switched teams. Both numbers were right around 100, so there appears to be no bias here, but I thought it was worth checking.)
dq, I reported this figure:
Also, their “before” BABIP was .292 and their after BABIP was .279.
So, we know there’s definitely a bias. However, I don’t see the point.
Choosing by ERA just shows the bias in terms of how managers decide on who to trade, but that ERA didn’t necessarily have to be correlated to the BABIP. (It is, as it turns out.) For example, say that the ERA was what you reported, but the BABIP was .290 in both sets. Do we have a biased sample, relative to what we are studying (BABIP)? Probably not.
Good check on the PF.
I’m also not surprised that the answer is 100. If you look, the SD of the park factor, at the multi-year park level, is like 3 or 4. Outside of a few parks, most parks are 95-105. Given that we didn’t expect to find much of a PF effect to begin with, coupled with the narrow range of PFs that exist, 100 was the expectation.
The point is that the players being traded performed much better in the year before than in the year being studied.
For the players in my sample (50+ BIP for 2 teams in same year, same league and 50+ BIP prior year with 1st team) I get a BABIP of .275 for prior year, .289 for test year, same team, .282 for test year, new team.
I don’t think looking at players who change teams in mid year tells you much, because there is a reason they are changing teams.
I would also think in a large enough sample, there would be correlation between BABIP & ERA.
Granted all that. I still don’t see why the correlation would be affected. If say *everyone* is affected to the same degree, it cancels out completely. If it’s not everyone, then we’ve introduced a bias, however, this bias would make our correlation go down. So, the r=.20 that I report would be too low. What I’m showing then is that even though we’ve introduced a bias into our sample, we can still see, rather clearly, a correlation between the two sets of data.
If you are arguing that my r is too low, I agree. But, I doubt it’d make more than a .01 or .02 difference. If you’ve read the Solving DIPS paper, you’ll see the correlation of all pitchers, including those that stayed on the same team, and therefore would have benefitted in their correlation with the same park/fielders year-to-year. (i.e., we’ve introduced a bias to make the correlation higher than it should)
Long story short: it’s ok to introduce a bias that makes the correlation too low, but it’s not ok to introduce a bias that makes the correlation too high.
About 71% of the pitchers have a higher ERA from year 1 to year 2 with the same team (real quick, gotta go)...what are the odds of that over 500 occurrences?
And it’s not just a population of pitchers who were “failures”. You obviously have a mixture of good pitchers who were traded for a purpose (say Ernie Broglio for a name that popped in my head).
So you can’t say it is a population of pitchers who failed.
If you have such a mixed population (and not a random one by any means) I’m not sure what types of conclusions you can draw.
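The “what are the odds” question about the 71% figure can be answered directly, under the assumption (mine, for the sake of the calculation) that with no bias each pitcher's ERA is equally likely to go up or down. A short Python sketch using exact binomial arithmetic, with 355 of 500 standing in for "about 71%":

```python
from math import comb, sqrt

def binom_tail_half(n, k):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

n, k = 500, 355                      # about 71% of 500 pitchers
p_tail = binom_tail_half(n, k)       # chance of a split at least this lopsided
z = (k - n / 2) / sqrt(n / 4)        # same result expressed in SDs
```

The split sits more than 9 SD from an even split, so chance alone is not a serious explanation. The sample is biased, which nobody in the thread disputes.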
dq, I’ve already stipulated that our group of pitchers is biased, so there’s no need to quote the odds that the pitchers are not biased!
And obviously it’s a mixture of pitchers, not just strugglers.
Just because it’s a mixture doesn’t mean that it’s necessarily a bad thing.
The point is that we have a hazier group of pitchers than we’d like, and still we find an r of .20. That’s a *good* thing! Imagine how much stronger our correlation would be if our pitchers were not such hard-luck cases.
I don’t see why the correlation would go down.
Other than the two studies mentioned in comment 2, are there others I should read to get a feel for current thinking on DIPS?
***
I would add JC’s and my study in the THT Annual 2006 (and an upcoming study in the 2007 Annual which will be even better).
Also, here: http://www.hardballtimes.com/main/article/dips-again/
And here: http://www.hardballtimes.com/main/article/another-look-at-dips1/
Though as I mentioned, in the last study JC makes some mistakes. Also, in my study, I understated Emeigh’s correctness. I later realized that he was likely correct, as the high-BABIP guys were much more likely to continue to have a high BABIP than the low-BABIP guys were to continue to have a low BABIP in the next year.
I always nominate the Solving DIPS paper as the best DIPS paper around. On the other hand…
J.C. said this on his site:
“I don’t think there is much useful in the Solving DIPS discussion. I agree with Walt Davis’s characterization in the BTF thread. “
In that thread, Walt Davis said this:
“Pardon my snooty academicism (no really, I know I am), but “Solving DIPS” is mainly a bunch of really smart people working their way through the very basics of the binomial distribution, measurement theory, and covariance algebra. It’s impressive in that many of them didn’t seem to know this stuff before and worked it out, but it’s stuff you get in intro stats courses. “
Walt is right about everything except using the word “mainly”. And, that single word is huge. Solving DIPS is mainly about trying to separate observation from the underlying true rates. I had never seen this done before. And, this should be done almost all the time. Using year-to-year correlations has enormous selective sampling issues, things that I don’t see much with the Allen/Hsu process. In fact, I showed how I can get the underlying true OBP for hitters, true BABIP for pitchers, and true win% for any major sports league. This process opens up a huge door for us. Walt and J.C. don’t seem to think so.
I think Walt does a terrible disservice by characterizing the Allen/Hsu paper as just wading themselves through some basic methods. And J.C.’s concurring doesn’t help either. While the wading portion may be correct, the conclusions are much more powerful than anything else that I’ve read on DIPS.
If Walt or J.C. or others in academia want to take that paper on, feel free. I’d be happy to learn. Otherwise, I’m left to believe that the paper is a model work.
(I’ll cc: them tonight.)
Phil posted his followup here:
http://sabermetricresearch.blogspot.com/2006/10/did-baseball-salary-market-anticipate.html
And I have a comment in that thread.
Re: Tango’s post (two posts above this one):
As I said in the BTF thread, I also hold “Solving DIPS” in very high regard. And I too am at a loss to understand what problems Walt and J.C. found with it.
And in any case, I don’t understand Walt’s comment. What does the complexity of the method have to do with the usefulness or correctness of the result? He seems to be confusing the mathematical prerequisites for solving the problem (which are perhaps simple) with the importance of the result (which is high).
Double posted on here and the latest Birnbaum thread, just in case:
I haven’t read all of the materials very carefully or rigorously - my apologies - but I wonder about the fundamental relevance of the question of whether or not teams ( ‘the market’ ) had ‘solved’ the BABIP question prior to DIPS theory.
Teams choose their players on the basis of expected performance; that’s a given. But were teams even necessarily looking predominantly at previous results to determine player value in the future? Voros’ DIPS theory was a corrective among those trying to project pitcher ERA *mathematically*. How many teams were even trying to do that in 2000 and before?
Teams, much more than any of McCracken’s audience, base their evaluations largely on scouting information, or at least I’ve been led to believe that that’s the case. Well, if the spread in BABIP is insignificant relative to the spread in K and BB, you would expect that the scouting info wouldn’t reflect BABIP too strongly, right?
Sure, scouting info and stats are heavily cross-contaminated, and much of the scouting info that makes its way to the fans is predicated on explaining results moreso than on giving objective analysis. But, as a WAG, I’d expect that the effect of that will never come close to vaulting BABIP near K and BB in terms of statistical significance vis-a-vis player salary.
So my hypothesis, which I hope someone here can speak to, is that this is not at all similar to the supposed “market inefficiency” wrt OBP, because the inefficent ‘market’ that DIPS was correcting (i.e., statheads’ projections) was almost entirely separate from the market which may or may not have ‘solved’ that question prior to DIPS.
And I would have to wrap my head around it much more before feeling confident in this final point, but here goes: given that BABIP determines, to varying degrees, the key factors that determine(d) player salaries in the time span in question (Wins, ERA, scouting info), we would expect BABIP to influence salaries at the margins and shouldn’t expect it to be statistically significant wrt player salary.
There’s a good chance I’m screwing my thinking up with gross negligence, but I wanted to throw that out there.
If you want to see a classic example of academics’ failing to learn from sabermetrics, look at this paper which uses stolen base attempts to understand decision making. Because they badly miscalculate the true ‘rational’ value of SBAs in different situations, they conclude that the pattern of actual SBAs demonstrates huge risk aversion and other biases on the part of baseball managers. If these 4 B School professors had just read The Book, they could perhaps have then done some legitimate analysis on decision making.
I haven’t read the paper yet, but I see that one of the authors is a professor at Arizona State U, which is where Andy was… uh, professoring. I’ll ask if he knows him.
I’ll just make my notes as I read the paper, and will stop when my kid wakes up.
Page 11: the 22.7% figure. It would have been much better, and easier, to report something like 27% of the runners on 1B will score, while 42% of those on 2B will score. And the 22.7% figure itself is bad, since it treats the walk as a “non-baserunner” movement value, for the purposes of driving the run in. However, in this case, it would be better to treat the walk as a “do-over”, like in tennis. So, a 26% figure (i.e., .260 batting average) would be more appropriate.
Ok, Table 1 reports the figures I wanted! Good.
Page 13: I don’t like the talk about a “48% increase”. What we care about is the differential, and that would be .20. The next sentence treats it properly.
So far, paper is good.
Page 14: You must use win (or run) expectancy to calculate the marginal cost of the CS. The authors only look at whether no run scores or at least 1 run scores in an inning. That is the wrong approach (unless that’s all you need). An introduction of the RE Table at this point is a necessity. They should have cited Pete Palmer at this point.
Ok, now they’re going straight to win expectancy. I don’t like missing the RE gap, but let’s go on.
Page 18, 19: I like the discussion on the economic theory of it. I’ll have to go through my charts about the home/visitor thing though. I didn’t seem to remember it being like that, so hopefully, I’ll be able to check it out. They are right about the general idea that you should steal when tied or ahead, not behind, since the cost/benefit is best here. I have a table in THE BOOK which shows this quite clearly.
I don’t like that they used “100” for the reputation opps. It’s absurdly low.
Page 26: The faster players lead off the most, and therefore are more likely to be there in the 1st inning. I don’t think it’s “regret” necessarily, though I would probably agree that it would come out as that.
Page 27: Hmm, they now talk about the leadoff hitter, but they don’t make the connection.
Gotta run…
The most obvious error I saw was focusing only on the probability of scoring a single run. This leads them to badly underestimate the cost of a CS, and specifically the cost of a CS to a team that is trailing. Then, when they discover that managers steal more with a lead and less when trailing, they conclude this is a non-utility decision (i.e., it doesn’t maximize the chance of winning) and therefore supports other theories of decision-making. I’d guess the posters here can identify other errors in the SBA analysis.
The authors’ premise that teams “should” steal more in close games is also rather odd. Their analysis leads them to conclude that SBAs are always a good move (but with small payoff). If that were true, then teams should always do it (given a catcher/runner pairing that meets or exceeds break-even success rate) regardless of inning and score. The fact that a SBA increases a team’s win prob more in close games is irrelevant: a team should always do what most increases its win prob at that moment. So a utility-maximizing manager shouldn’t steal any more (or less) in close games, even given the paper’s mistaken assumptions.
(I’m personally sympathetic to the idea people aren’t perfectly rational utility maximizers, but this paper doesn’t provide meaningful evidence for that view.)
Basically, the paper ignores the fact that while stealing increases the probability of scoring one run, it decreases the probability of scoring multiple runs—which is pretty much a full explanation of why you don’t steal when you’re down by three.
This fact is so well-known that it’s become sabermetric cliche ... I wouldn’t even know what to cite as a reference, so conventional is that wisdom.
P.S. check out the footnote that says players hit better with a runner on first. They say Ted Turocy discovered this in 2003.
I found that Guy previously made some points at “The Sports Economist.” He also mentions that the academic authors didn’t reference any sabermetric works, and that “academics should be very careful about dabbling in the world of baseball analysis.”
http://thesportseconomist.com/archive/2006_10_01__arch_file.htm#116068325670025759
Interesting is the response from Skip Sauer (comment following Guy’s, paragraph marked “third”). Dr. Sauer defends academic contributions to sabermetrics, and asks, if Guy finds their grasp of sabermetrics so misguided, “why are you wasting your time around here?”
Phil: In what way do you find Skip’s reply “interesting?”
I took notes as well:
I have a bunch of problems with this study:
(1) The study only looks at the increase in the probability of scoring a run. It should be looking at the change in run expectancy. This is a HUGE mistake. It completely ignores that baseball is not about scoring one run.
(2) It uses empirical win probabilities derived in 1961. They should have at least derived the WE from their data, and better yet, done the Markov math (or taken it from someone who has, like Studes, who has built a whole WPA spreadsheet that anyone — including these economists — can download from his site).
(3) They write, “We noted earlier that stealing second base improves the likelihood that a team will score one more run but has no impact on a team scoring more than one run.” This relates to number one. BUT WHAT ABOUT A CS?! I’m sorry, it just pisses me off for them to have missed something so obvious.
(4) They write, “Thus, if prospect theory holds, we should expect both visiting and home teams to attempt, ceteris paribus, to steal more often when they are ahead than when they are behind.” Of course this will happen because in baseball the one limit is outs! It has nothing to do with prospect theory, it just means that your outs are more valuable (and you’re less likely to waste them) if you’re down more runs.
Here’s another way of looking at it: The further down you are, the more runs you have to score to win — essentially, your run environment is higher. What happens in a high run environment? The value of a SB goes down and the value of an out or CS goes way up!
(5) They do adjust for batting count, but in a half-assed way. A 2-1 count, IIRC is actually about neutral, and certainly not at all the same thing as a 3-0.
(6) They include a variable for handedness and then look at a pitcher’s base-stealing prevention ability without adjusting for that. This will lead to double-counting, essentially. They’re going to under-predict the probability of a steal against lefties and over-predict the probability of a steal against righties.
I have no idea how all of this impacts their findings, but certainly, they should be taken with a grain of salt. I remain unimpressed.
Interesting in that not only does he not properly address your criticism of academia, but he takes it personally and basically suggests you go away.
I would like to see the academics/amateurs discussion move forward, but Skip doesn’t seem to want to have a conversation.
“We noted earlier that stealing second base improves the likelihood that a team will score one more run but has no impact on a team scoring more than one run.” This relates to number one. BUT WHAT ABOUT A CS?! I’m sorry, it just pisses me off for them to have missed something so obvious.
Totally agree, of course. The interesting part to me is that they aren’t even right about the part of this they thought about: a SB does increase the chances of scoring 2+ runs (mainly by preventing DPs, I assume).
I’ll bring it up again, what has been bothering me for a while now, which is the impact of a SBA when the batter puts the ball in play (fewer GDP’s, extra bases on a hit, extra hits because the IF’ers are out of position, minus a few extra DP’s on line drives and short fly balls). Does anyone know if this has ever been addressed by anyone and can anyone venture a guess as to how much “extra” value it might provide?
Well, let’s see...6.3% of all BIP are infield line drives (according to DIPS Revisited), and 1.6% are pop flies to the outfield. 3.74% of those will be converted into outs, and probably become double plays. When the batter singles, the runner has, what, a 27% chance of advancing to third? Let’s say he always ends up at third on a single. Let’s say that on a double, he always scores. He has maybe a 51% chance of scoring otherwise, right? And on an out, he always ends up on second except in those 3.74% of outs that turn into double plays. Now of course, 2.85% of all BIP result in GIDP anyways, so it’s actually just an extra 0.9% GIDP.
Now let’s assign some run values to this. I’ll assume that taking second or third is worth .2 runs, and scoring is worth .3 runs more than being on third.
Okay, let’s do the math. For every single or ROE, that’s an extra (1 - .27)*.2 = .146 runs. For every double, that’s an extra (1 - .51)*.3 = .147 runs. For every out, that’s an extra .2 runs. And then we have -.84 runs for every additional double play.
26.4% of all BIP are singles or ROE, so .264*.146 = .0385.
6.6% of all BIP are doubles, so .066*.147 = .0097.
65.3% of all BIP are outs, so .653*.2 = .1306.
And we have an additional .9% double plays, so .009*-.84 = -.0076.
Add it all up, and a SB attempt when the ball is put into play is worth an additional .17 runs. Of course, it’s not all that simple…
First of all, I guesstimated all the run advancement values and the probabilities of advancing. Don’t know if they’re accurate.
And secondly, and more importantly, this assumes that everything else stays constant. I doubt that a hitter hits the same way when a runner is attempting to steal as he does when there is no one on. In a hit-and-run type situation, the batter almost always tries to hit a groundball, which of course has a much lower probability of becoming a single or double and a higher probability of becoming an out. The runner might end up on second, but there’ll be an extra out on the scoreboard that much more often. If a groundball results in an out, say 10% more often than any other type of BIP, well then that’s an extra -.03 runs every time the runner attempts to hit-and-run. Actually that doesn’t change the results very much. Hmm...Did I make some kind of mistake?
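For anyone who wants to re-run the arithmetic in the estimate above, here is a minimal Python sketch. Every number is taken straight from the comment and is the commenter's guess, not a measured value; the constant and function names are made up for illustration.

```python
# Shares of balls in play (BIP), as quoted in the comment above.
P_SINGLE_OR_ROE = 0.264   # singles or reached-on-error
P_DOUBLE = 0.066          # doubles
P_OUT = 0.653             # outs
P_EXTRA_DP = 0.009        # additional double plays caused by the runner going

# Chance the runner gets the extra base anyway, without a steal attempt.
P_THIRD_ON_SINGLE = 0.27
P_SCORE_ON_DOUBLE = 0.51

# Guessed run values from the comment (assumptions, not measured).
RUN_EXTRA_BASE = 0.2      # taking the extra base
RUN_SCORING = 0.3         # scoring instead of stopping at third
RUN_EXTRA_DP = -0.84      # cost of one additional double play


def extra_runs_per_bip():
    """Extra run value of a steal attempt when the ball is put in play."""
    single = P_SINGLE_OR_ROE * (1 - P_THIRD_ON_SINGLE) * RUN_EXTRA_BASE
    double = P_DOUBLE * (1 - P_SCORE_ON_DOUBLE) * RUN_SCORING
    out = P_OUT * RUN_EXTRA_BASE
    double_play = P_EXTRA_DP * RUN_EXTRA_DP
    return single + double + out + double_play


if __name__ == "__main__":
    print(round(extra_runs_per_bip(), 2))  # 0.17
```

Changing any of the guessed constants shows how sensitive the .17 figure is to them; the outs term (.653 * .2) dominates the total.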
The mistakes made are bothersome, but they can be overcome. I like the effort, and if the authors want to learn, they can stop by here for a fair and critical review.
Turocy didn’t discover anything, any more than Hakes and I did. People have done this before. Anyone who claims such a thing as their “finding” at this late date should be embarrassed.
The paper on stolen bases that Guy discusses at TSE is not something academics should be proud of, as my comment makes clear.
Hakes & I (academics, fwiw) have investigated the probability structure & strategies of baseball since 2000. We tried to crack the nut of defensive contributions from the start, and Hakes has presented this work at SABR meetings. We are not “Johnny come latelys”. We like SABR people.
Hakes and I, like Brian Goff, are interested in using data in baseball to address issues in decision-making per se, from the perspective of academic research on that question (i.e. we care little if the data imply that Dusty Baker is a good manager). We do err when we miss the contributions made by non-academics, for sure.
But I don’t think throwing stones against people housed in academia advances the conversation.
Skip, thanks for stopping by, and I hope you can join in on more discussions.
Hi, Skip,
“But I don’t think throwing stones against people housed in academia advances the conversation.”
I didn’t interpret Guy’s comments the way you did ... as implying that all academic research on baseball is substandard. Rather, I think the comment says that some of the issues that are raised with regards to the current paper under discussion—“discoveries” that aren’t, questionable conclusions on issues that have been studied validly—are sadly too common in academic papers.
My feeling is that it’s a widespread (although by no means universal) phenomenon, that academics are often unaware that sabermetrics may have covered issues they are researching from scratch. I think it’s reasonable to note that fact and discuss it, while at the same time appreciating that tarring all academics with the same brush is not an acceptable form of debate.
That is, acknowledging that academic studies of baseball are often flawed is fair comment. Wondering why the flaws so often occur is fair comment. Wondering why they don’t reference non-academic research is fair comment.
But I agree that assuming that every academic study is therefore guilty of something or other ... well, that’s *not* fair comment. And it’s wrong—there’s lots of good academic work out there, and I’ve reviewed some of it on my blog. And I don’t think Guy, or I, implied otherwise.
But the existence of worthwhile papers doesn’t change the fact that flawed work like the base-stealing paper is by no means unusual, and it’s worth having a conversation about it.
Skip,
Here’s an analogy that may help me explain better what I’m trying to say.
I have heard economists say that many (but not all) non-academics have a poor grasp of economic logic, which is why (for instance) they tend to support policies like rent control, which economists have convincingly demonstrated are actually harmful to the tenants they purport to help.
Similarly, sabermetricians say that many (but not all) academics have a poor grasp of sabermetrics, which is why (for instance) they omit part of the cost of a caught stealing from their studies of managerial decision-making.
Both these positions are reasonable generalizations, and neither is a blanket attack on anyone.
Phil,
Point taken. Note that the base-stealing paper doesn’t cite Hakes and Sauer’s work on the issue either. This happens all of the time. Recently, some physicists published a supposedly novel study on outcome uncertainty in soccer. They completely ignored a couple of decades worth of studies undertaken by economists and statisticians, yet they and the press advanced their work as something new.
Communication across camps is naturally limited. This is unfortunate since it retards progress. But it is not easily addressed, and off-kilter criticism tends to reinforce existing barriers. I thought Guy was sniping at TSE and called him on it. Civility facilitates communication and so I will wipe the slate clean on the episode.
Re: value of a SBA when the ball is put into play. First of all, David, we are talking about a stolen base and NOT a hit and run. As well, whereas the batter MAY try and alter his approach when the runner is going on a straight steal (and I am not sure that they are), we are still talking theoretically in order to establish a proper BE point for the SB/CS, so we have to assume that the batter is NOT going to alter his approach.
If the number is .17, that is really high and substantially changes the BE point (I think)! How often does the batter put the ball in play on a SBA? Of course, the batter should not be deliberately taking a strike (or whatever pitch he might otherwise swing at) unless the success rate is really high (80’s or 90’s I guess) on that pitch.
Right. If we look at this completely theoretically, and guess that nothing changes, then we also have to look at the fact that a batter will only put the ball in play on 20% of all pitches. On the other 80%, we will just have a straight steal attempt, which is worth just about zero runs. So the actual value would be .2*.17 + .8*0 = .03 runs.
My guess is that because a steal attempt would change a batter’s approach for the worse (he wouldn’t try to hit a line drive, for example), the overall impact will actually be zero or negative.
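The weighting in the comment above is just an expected value over two cases, and can be sketched in Python as below. The 20% in-play rate, the .17 in-play bonus, and the zero value for a pitch not put in play are all the commenter's assumptions; the function name is hypothetical.

```python
def blended_extra_value(p_in_play, value_in_play, value_not_in_play=0.0):
    """Average extra run value of a steal attempt across all pitches.

    p_in_play:         assumed share of pitches put into play
    value_in_play:     assumed extra runs when the ball is put in play
    value_not_in_play: assumed extra runs otherwise (zero in the comment)
    """
    return p_in_play * value_in_play + (1 - p_in_play) * value_not_in_play


if __name__ == "__main__":
    # About .034, which the comment rounds to .03.
    print(round(blended_extra_value(0.2, 0.17), 3))
```

Passing a negative `value_in_play` is one way to explore the guess that a changed batting approach pushes the overall impact to zero or below.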
Hi, Skip,
Thanks for the response!
I agree that communication between camps would be very helpful. Any suggestions in that regard?
For what it’s worth, I’d be happy to point anyone who asks to existing work I know about (although I don’t know about everything and I occasionally have to ask at places like this). Also, someone suggested that I index issues of “By the Numbers,” which is on my list of things to do.
There’s also Charlie Pavitt’s sabermetric bibliography, but that doesn’t include abstracts, so you’d have to go by title alone. I’ve always thought that expanding it would make a good project for SABR, if Charlie didn’t mind.
What would best help you guys bridge the gap? What kinds of resources do you wish the sabermetric community had that would make the economist’s work easier?
I just read the paper and I think it is wonderful. The methodology with respect to SBA gains and losses and WP or WE seems fine, but even if it has some problems, it doesn’t matter, as the authors are not trying to determine any BE points at any point in a game. They are simply trying to see whether managerial strategy vis-a-vis the SBA conforms with expected utility theory or whether it conforms with prospect theory (including endowment and regret theory). They did this very nicely and not surprisingly found that managers do in fact manage contrary to expected utility theory (i.e. optimally), with respect to ordering a SB. Now whether in any given situation they order too many or too few stolen bases, based on the estimated BE point at those points, is another story and one which this article does not and does not need to address.
Very little if any sabermetric research I have seen addresses the issue of whether and how managers deviate from expected utility theory (what they SHOULD do) vis-a-vis the frequency of the SBA as a function of various pertinent variables (inning, score, outs, etc.).
A very, very good study, IMO, with no obvious flaws that I can discern.
Am I missing the sarcasm in MGL’s post, or do we have a case of identity theft here?
Phil,
A SABR-based archive with abstracts would be helpful. But note that the physics example comes from a culture where citation indexes and article searches are routinely accessible. It is a hard problem.
Well, I’m tempted to agree with Phil. It’s less significant that items besides McCracken’s are excluded from the bibliography. It’s OK for a bibliography to be selective, though normally I think it should list an item which can be consulted for more detailed bibliography. And in general discussing web-based material is problematic, because of the chance that it may disappear and remove the foundation for the discussion.
But having reasons not to cite that work or discuss it in detail doesn’t justify the dismissive approach to the work of others.
On the other hand, a bit off the main point, I don’t think that Phil is correct to say that a consensus has been reached on DIPS. I hope not, because I’m not a believer.