
THE BOOK--Playing The Percentages In Baseball


Saturday, November 14, 2009

The Sauer/Hakes Moneyball spreadsheet

By Tangotiger, 12:01 AM

I put it on Google Docs.  Here is my first example:

image

You can click the image to see it better.  Ok, so what do we see?  The data on the left is if you use the 2001 coefficients.  The data on the right is the 2004 coefficients.

Player 1 is an OF/1B, player 2 is a C, and player 3 is an IF.  I made this guy a free agent, with 600 PA, a .400 OBP and .500 SLG.  His salary as a free agent is 4.5MM$ or 4.8MM$.  An infielder with those stats is a huge star, and an OF with those stats is a borderline one.  And a catcher who can hit like that would be Mike Piazza.  There should be more of a differentiation in salary than we see here.

Now, look at 2004.  Notice the NEGATIVE coefficient for the infielder?  That’s right, while the OF is earning 6.7MM$ for his performance, the infielder is earning 6.0MM$ for the same performance.

Here’s another image:
image

In this case, I made all three players free agent outfielders.  Player 1 is a slugger, player 3 is an on-base machine.  Player 2 is in-between.  In all cases, their 1.8*OBP + SLG is the same.  As we know, this metric tracks Linear Weights (or wOBA) very well.  We see that in 2001, the slugger would earn 6.3MM$, while the on-base heavy guy would earn 4.0MM$.
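To make the setup concrete, here is a minimal sketch of three hitters who share the same 1.8*OBP + SLG. The slash lines below are made up for illustration; they are not the ones in the spreadsheet, and the function name is my own.

```python
# Hypothetical OBP/SLG lines (not taken from the spreadsheet), chosen so that
# 1.8*OBP + SLG comes out the same for all three hitters.
players = {
    "slugger": (0.350, 0.500),
    "in-between": (0.375, 0.455),
    "on-base machine": (0.400, 0.410),
}


def run_value_index(obp, slg):
    """The 1.8*OBP + SLG metric described in the post."""
    return 1.8 * obp + slg


indexes = {name: run_value_index(obp, slg) for name, (obp, slg) in players.items()}

for name, value in indexes.items():
    print(f"{name:16s} {value:.3f}")
```

Each 25 points of OBP given up is bought back with 45 points of SLG, which is what holds the index constant across the three lines.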

Fast-forward to 2004, and we have equilibrium!  All three players earn between 5.7MM$ and 5.8MM$.  The 2004 numbers are pretty believable and argue that the market has corrected itself.  I can also believe in the 2001 numbers.

So, in a limited sense, we can see how there are some rational results.  But, does it seem possible that the sluggers, after 3 years, dropped in salary?  That would be an interesting finding, but I don’t know if it’s true.

We know there are big problems, like the ones I’ve noted already (negative value for the infielders in 2004).  What if we now change it to 400, 500, and 600 PA?  Here’s what you get:
image

For the players in 2001, they all come out with a 3.35MM$ salary.  In order to do that, I have to give the guy with 500 PA a .415 OBP and .498 SLG and the guy with 600 PA a .333 OBP and .400 SLG.  And for the guy with 400 PA?  Forget it.  He’d have to be Barry Bonds.  Does this make any sense?  A 1B with a .333/.400 slash line is out of baseball practically.  A guy with a .415/.498 line and 500 PA is a very solid outfielder.  How can those two guys both earn the same salary?  And then match them to a 400 PA Barry Bonds?

The 2004 data is more reasonable here.  The .333/.400 hitter with 600 PA gets his 4.2MM$, the same total as the .381/.457 hitter in 500 PA.  And the same as the .429/.515 hitter with 400 PA.  I don’t agree with it still, but at least it’s more reasonable.

In short, I don’t see that their regression model reasonably models the reality of 2001 or 2004.  Sauer/Hakes did identify the important parameters, but the method in which those parameters are used in the model is not reasonable, nor are some of their coefficients even plausible.


#1    Tangotiger      (see all posts) 2009/11/14 (Sat) @ 01:01

Here’s another.  Put in a .350/.450 outfielder with 600 PA.  Make one a slave, one an arb, and one a FA.

In 2001, you get these reasonable salaries:
0.7MM$
2.2MM$
3.9MM$

Arb players earn around half the FA, so that makes sense.

The slave player is way too high, but let’s let that one slide.

In 2004:
1.0MM$
3.9MM$
5.0MM$

That make any sense?  The slave player is now much too high.  I’m not sure there’s any non-arb player who makes 1MM, never mind the average .350/.450 outfielder.

And the arb player is at 78% of the FA player?  Maybe an arb player in his last year of his contract prior to being an FA.  But, no way.

I mean, how much time do I have to put into this before 100%, rather than 99% of the people are convinced?


#2    Nick Steiner      (see all posts) 2009/11/14 (Sat) @ 02:06

I just love this quote from JC Bradbury:

That’s all I need, because I understand how multiple regression analysis estimates the coefficients. But, maybe someone who doesn’t understand econometrics is confused by this.

Any putz with proficiency in R, or even fucking excel with some macros, could do a damn fine multiple regression.  The problem is, of course, that a regression is dumb, in that it has no idea of logic or how baseball works.

If JC is really defending Sauer and Hakes’ model because they know how a multiple regression works, then I fear that there is just no reasoning.


#3    Tangotiger      (see all posts) 2009/11/14 (Sat) @ 11:18

JC is responding in the MVN thread, including (apparently) my thread in post 117.

This is basically my point of contention:

Again, interaction terms or some other correction could have been used, but they felt that their final specification was best. And they were able to convince many other economists (colleagues, editors, and referees) at different levels of review that what they produced was the best choice.

Fine.  They felt it was best, and they were able to convince a bunch of other economists.

They didn’t convince me.  I explained my reasoning, with which he does not take any specific issue.  My issues are at least justifiable, and possibly reasonable.

He’s saying that my issues are not relevant enough to make it a big deal.

I like the way I laid out my case, and if JC is happy with the way he laid out his, then the reader can make his informed opinion.


#4    Terry      (see all posts) 2009/11/14 (Sat) @ 12:26

I remember as a child that models were supposed to be fun to build and look cool when they were done.  Generally they weren’t really expected to work, though.

Maybe that’s the same minimum standard for economists these days?


#5    berselius      (see all posts) 2009/11/14 (Sat) @ 13:58


Any putz with proficiency in R, or even fucking excel with some macros, could do a damn fine multiple regression.  The problem is, of course, that a regression is dumb, in that it has no idea of logic or how baseball works.

If JC is really defending Sauer and Hakes’ model because they know how a multiple regression works, then I fear that there is just no reasoning.

I disagree with Bradbury and the economists on this stuff too. But there’s a big range of things you can mean when you say ‘they know how a multiple regression works’. As you said, anyone who knows how to use R or Excel or whatever knows how to run a regression. I think what he was trying to say is that they know the mathematics behind why multiple regressions should work. But speaking as a mathematician (but definitely NOT a statistician), why and how are two different things, and that’s the reason why their regression isn’t all that useful.


#6          (see all posts) 2009/11/14 (Sat) @ 14:10

You know, it’s hard not to notice a parallel with the “real world”.

All of the models used to construct and sell securitized mortgage products were based on regressions that purported to show how the packaged products would perform in different market environments.  The models were backwards-looking; they just looked at how prices had moved in the past. A lot of smart Ph.D.s in economics designed the models.

And they worked just fine for firms like Lehman, Bear Stearns, Merrill Lynch, AIG . . . until they didn’t.


#7    Nick Steiner      (see all posts) 2009/11/14 (Sat) @ 14:40

Exactly, berselius. 

I replied to JC at the other thread, in summary:

“If I were an expert on baseball, and only had moderate expertise in econometrics or statistics, I would consult an econometrician or a statistician before I did a study using those tools.

Conversely, if I were an expert at econometrics and statistics, but only had moderate expertise at baseball, I would consult a saberist before doing a study about baseball.”

I don’t see why that’s that hard for him to understand.


#8    Dackle      (see all posts) 2009/11/14 (Sat) @ 21:29

Here’s one of the problems with the way they designed their regression. Let’s say you’ve got at-bats, hits and batting average. You’re trying to predict hits based on batting average and at-bats, but for whatever reason, you don’t know the relationship between the three variables. So, you turn to multiple regression. Using all players with 200 at-bats last year, the best fit is:

Hits = (417.6 * bat avg) + (.268 * at-bats) -110.08

with a correlation of .992

Plugging in a few players:

Ichiro = (417.6 * .352) + (.268 * 639) - 110.08 = 208 expected hits vs 225 actual

Omar Infante = (417.6 * .305) + (.268 * 203) - 110.08 = 72 expected hits vs 62 actual

Maybe Sauer/Hakes should have used terms like (on-base% - league on-base%)*plate appearances and (slug% - league slug%)*at-bats in place of at-bats, on-base% and slug%.
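The arithmetic above can be reproduced with a short sketch. The coefficients and the two stat lines are the ones quoted in the comment; the function names are mine, and rounding to whole hits is an assumption about how the figures were reported.

```python
def additive_fit(avg, ab):
    """Best-fit additive regression for hits, using the quoted coefficients."""
    return 417.6 * avg + 0.268 * ab - 110.08


def true_hits(avg, ab):
    """The real relationship is a product, which an additive model cannot express."""
    return avg * ab


# (batting average, at-bats) as given in the comment.
hitters = {"Ichiro": (0.352, 639), "Omar Infante": (0.305, 203)}

results = {}
for name, (avg, ab) in hitters.items():
    expected = round(additive_fit(avg, ab))
    actual = round(true_hits(avg, ab))
    results[name] = (expected, actual)
    print(f"{name}: {expected} expected hits vs {actual} actual")
```

The misses land at the extremes of playing time, which is where an additive model is forced to get a rate-times-opportunities relationship wrong.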


#9    Guy      (see all posts) 2009/11/14 (Sat) @ 23:17

Despite the serious problems with the model, I’ve always thought it was still a bit surprising, and interesting, that the OBP coefficient is so low in many years.  It occurred to me that H-S never look to see how OBP and SLG predict productivity at the player level.  They only explore this at a team level (and they don’t get it quite right, concluding that OBP is more than 2x as valuable as SLG).  If regression doesn’t get you the right coefficients for productivity, then what appears to be mispricing in the salary model could just be a limitation of the regressions.

So I checked 2000 and 2003.  If you take RC (from B-Pro, using one of the James’ formulas) and regress it on OBP, SLG, and PA, the OBP coefficient is about 1.5 times as big as for SLG.  So it’s not terrible, but OBP appears to be less important than it really is, and quite a bit less than what H-S think it should be.  I think this is because SLG is more highly correlated with both PA and RC than OBP, at the player level.  Regression tries to sort this out, but doesn’t quite succeed.

Then I checked Runs Produced, which is R+RBI-HR.  Here the coefficients switch, and the coefficient for SLG is 2-2.5 times as large as OBP—very similar to what the H-S salary model tends to show.  That’s interesting!  If salaries mirror runs produced, which seems plausible, that would be roughly consistent with the H-S salary model.

So, one possible interpretation is that GMs were placing too much value on context-dependent production, which would be a kind of inefficiency.  We know, for example, that clutch hitting isn’t likely to repeat.  On the other hand, to the extent R and RBI reflect lineup position, they also reflect the managers’ knowledge of the hitters’ talent, and so could be a better predictor of future performance than one year of OBP.  So I don’t think we can assume that it’s “right” to pay for RC and “wrong” to pay for RP.  Would be an interesting question to explore. 

But here’s the weird thing:  the correlation between RC and Runs Produced?  It’s .96! (both seasons) So, how do we get radically different coefficients when predicting two variables that appear to be virtually the same?  I await the insights of the panel.  It certainly suggests to me how difficult it is to parse the value of these highly correlated factors using regression.  In fact, if you have one salary model that weights OBP to SLG 2:1, and another that weights them 1:2, they will both spit out about the same salary for Pujols, and the same salary for Adam Kennedy. (Which, by the way, means that it would have been extremely hard for the A’s to actually exploit any inefficiency that did exist. But that’s another story.)
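A toy illustration of that last point, under loud assumptions: the league averages, the two slash lines, and the idea of scoring a hitter as a weighted average of league-relative OBP and SLG are all hypothetical. This is not the H-S salary model, only a sketch of why 2:1 and 1:2 weights are hard to tell apart when the two inputs move together.

```python
# Hypothetical league averages, for illustration only.
LEAGUE_OBP, LEAGUE_SLG = 0.335, 0.420


def index(obp, slg, w_obp, w_slg):
    """Weighted average of league-relative OBP and SLG (a made-up scoring rule)."""
    rel_obp = obp / LEAGUE_OBP
    rel_slg = slg / LEAGUE_SLG
    return (w_obp * rel_obp + w_slg * rel_slg) / (w_obp + w_slg)


# Hypothetical hitters: one strong in both skills, one near average in both.
hitters = {"star": (0.440, 0.650), "average regular": (0.340, 0.400)}

gaps = {}
for name, (obp, slg) in hitters.items():
    obp_heavy = index(obp, slg, 2, 1)  # weights OBP to SLG 2:1
    slg_heavy = index(obp, slg, 1, 2)  # weights OBP to SLG 1:2
    gaps[name] = abs(obp_heavy - slg_heavy) / obp_heavy
    print(f"{name}: 2:1 -> {obp_heavy:.3f}, 1:2 -> {slg_heavy:.3f}")
```

With these made-up numbers the two weightings disagree by only a few percent for either hitter; they would separate only for a hitter whose OBP and SLG point in opposite directions.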


#10    Guy      (see all posts) 2009/11/14 (Sat) @ 23:18

Oops:  RC data is from B-Ref, not B-Pro.


#11    Cyril Morong      (see all posts) 2009/11/14 (Sat) @ 23:49

Guy

When you wrote “So, one possible interpretation is that GMs were placing too much value on context-dependent production,” it made me think of another paper by Hakes and Sauer called “Are Players Paid for ‘Clutch’ Performance?” It may not be related to what you are saying, but context made me think of clutch. It has been a while since I read this paper, so I can’t give much of a summary or analysis.

http://people.albion.edu/jhakes/pdfs/clutch.pdf

Cy


#12    Tangotiger      (see all posts) 2009/11/15 (Sun) @ 10:25

Here is JC’s response on his site, from the comments:
http://www.sabernomics.com/sabernomics/index.php/2009/11/defending-hakes-and-sauer/

And this is Guy’s response:

Guy November 14, 2009 at 10:33 am

The question isn’t whether the model is perfect, but whether it is accurate enough to make a rather subtle distinction in how the labor market was valuing two highly correlated skills, OBP and SLG, over short periods of time. Their claim, as you say, is that the market was assigning no value to OBP in 2001. But we know this is wrong. Their own later paper showed (see table 3) that the market was valuing both batting avg and the ability to draw BBs long before Moneyball, from 1986 thru 2003. So by definition, it was valuing OBP. And the relative value of BA and BBs was actually not far from what H-S say is correct. You can argue from their data that power was overvalued (though I think their power metric is too flawed to be sure), but their data proves conclusively that OBP and BBs were valued.

I agree we shouldn’t cherry-pick one year with an odd model to criticize. But the real point there is that these coefficients fluctuate wildly at the annual level — see table 5 in the 2nd paper. One-year models just don’t work — but H-S insist on drawing conclusions from individual years.

But let’s not neglect the other major claim of the paper, that in one year, 2004, OBP went from being severely under-valued to properly valued. However, this cannot possibly be true, because at least 80% of the players in their sample were either in a multi-year FA contract signed before 2004 or were still subject to arbitration decisions by arbitrators whose decisions are governed entirely by precedents set in 2003 or before. It is literally impossible for player salaries to adjust like this in a single year. Putting aside the details of whether Hakes and Sauer’s model was or was not properly specified, the simple fact is that they cannot possibly be right about this: it’s an economic and mathematical impossibility. Doesn’t that matter? To choose to believe a model over what we know about the baseball salary market makes no sense. Econometric tools can be powerful, but still need to be used within the constraints of good judgement and common sense.


#13    Guy      (see all posts) 2009/11/15 (Sun) @ 15:27

What do folks think of the log(salary) method?  It seems odd to value OBP or SLG in percentage terms.  For example, a 100-point gain in SLG increases salary by 26%.  Translating that into dollars depends on position, playing time, and OBP.  But why should it?  Its value is really distinct from all those other factors.  Similarly, being a catcher increases salary by 16%, when it presumably should be a fixed amount (controlling for offense).  Not sure how much any of this matters.  But I wonder what the advantages of the log method are that might offset these obvious shortcomings.
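A small sketch of what the percentage framing implies in dollars. The 26% figure is the one quoted above; the baseline salaries are hypothetical and stand in for whatever position, playing time, and OBP would otherwise produce.

```python
SLG_BUMP = 1.26  # multiplier for a 100-point SLG gain in a log(salary) model

# Hypothetical salaries (MM$) before the SLG gain.
baselines_mm = [1.0, 3.0, 6.0]

# Dollar value of the same 100 points of SLG at each baseline.
dollar_gains = [round(base * SLG_BUMP - base, 2) for base in baselines_mm]

for base, gain in zip(baselines_mm, dollar_gains):
    print(f"baseline {base:.1f}MM$ -> +{gain:.2f}MM$ for the same 100 points of SLG")
```

The same 100 points of SLG are worth six times as much to the 6MM$ player as to the 1MM$ player, purely because of the multiplicative form.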


#14          (see all posts) 2009/11/16 (Mon) @ 12:02

"All of the models used to construct and sell securitized mortgage products were based on regressions that purported to show how the packaged products would perform in different market environments.  The models were backwards-looking; they just looked at how prices had moved in the past. A lot of smart Ph.D.s in economics designed the models.”

To be fair, I think there was a fair amount of outright fraud in the financial sector, which I don’t think applies to authors of books about baseball statistics.

The fact that sabermetrics gets relatively little attention keeps it honest.

My own criticism is that I’ve noticed a tendency to try to be too precise with too little data.  There are always key variables that the model is going to miss.  Also, the game changes, so a model that predicts one decade accurately won’t predict the next as accurately, and often the changes won’t be apparent until after the fact.

One technical problem with the securitized mortgage products is that the data was fairly limited, and worse, limited to periods where house prices only went up!  A lot of economic statistics only date from the New Deal era, which means they capture the post-World War II boom very well, but it turns out that two or three decades of continuous economic growth is historically unusual.  The sabermetric equivalent would be making all your predictions based on 1995-2005 data, and either completely ignoring the influence of steroids, or assuming that MLB wouldn’t start a testing program and players would keep using them like they always did.


#15    Tangotiger      (see all posts) 2009/11/16 (Mon) @ 12:29

Guy,

I presume the log method is what works elsewhere in the real world, so, the presumption is that it should work in the sports world.

Let’s see what the Sauer/Hakes model says.  Everyone can follow along with my posted spreadsheet.  I’ll use the 2004 coefficients, since they make the most sense (the impossible infielder coefficient notwithstanding).

Start with a .375/.450, 600 PA free agent outfielder line.  He makes 5.48MM$.  Drop him to .3028/.3634.  That puts him at 3.48MM$.  Bring him up to .4245/.5094, which puts him at 7.48MM$.

So, we see here that the marginal effect to produce a change of 2 million dollars is:

take 80% of the base performance, and you get a drop of 2MM$.

Take 113% of the base performance, and you get an increase of 2MM.

Or, if you prefer differentials, dropping 72 OBP and 87 SLG is equivalent to gaining 50 OBP and 59 SLG from the base salary.

Or, to make it super clear:

If you take 2 guys at .375/.450, they will get 11MM in salary.

If you take 1 guy at .303/.363, and another at .425/.509 (for an overall average of .364/.436), they will ALSO get 11MM in salary.

This is what their model is saying, not me.
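The asymmetry follows from the log form, and the three quoted salaries are enough to check it. This sketch assumes log(salary) is linear in OBP and SLG, so that scaling both together moves log(salary) linearly in the scale factor; the slope is inferred from the quoted numbers, not taken from the paper, and the variable names are mine.

```python
import math

# Salaries (MM$) and OBP values quoted above, 2004 coefficients.
BASE_OBP, BASE_SALARY = 0.375, 5.48
LOW_OBP, LOW_SALARY = 0.3028, 3.48
HIGH_OBP = 0.4245

# OBP and SLG are scaled together, so one scale factor describes each line.
low_scale = LOW_OBP / BASE_OBP    # about 0.81
high_scale = HIGH_OBP / BASE_OBP  # about 1.13

# Assumption: log(salary) is linear in the scale factor along this ray.
# Solve for the slope from the low point, then predict the high point.
slope = math.log(LOW_SALARY / BASE_SALARY) / (low_scale - 1.0)
predicted_high = BASE_SALARY * math.exp(slope * (high_scale - 1.0))

print(f"implied slope: {slope:.2f}")
print(f"predicted salary at {HIGH_OBP:.4f} OBP: {predicted_high:.2f}MM$")
```

The prediction lands on the quoted 7.48MM$ to within rounding, so the three numbers are consistent with a single exponential curve. Because that curve is convex, a weak hitter plus a strong one are paid as much as two hitters at .375/.450 even though their combined line is worse.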


#16          (see all posts) 2009/11/16 (Mon) @ 12:57

I can almost guarantee that they have run the identical regression using salaries as a linear outcome - and simply didn’t report it in the paper because the results were not different in any important way.

Also, the goal of the paper was to test a specific hypothesis, that OBP was undervalued at the start of the decade, but that this has corrected itself. The regression is not intended to accurately model everything that goes into salary determination - two missing elements are age and defensive ability, along with a variety of interaction variables. However, to properly critique their analysis, you would have to explain why those omitted variables are correlated with OBP and SLG in such a way as to bias their key result. 

Every regression has omitted variables, few relationships are precisely linear (or log-linear), and therefore no regression is ever “right”. The question is - does the regression demonstrate a specific correlation that is interesting? Here, I think that Hakes and Sauer have done that.


#17    Mike Fast      (see all posts) 2009/11/16 (Mon) @ 13:20

Ken/#17, I don’t understand how what you are saying is any different than saying that you can make a regression produce whatever result you want by tweaking the input variables until the conclusion is in line with your desired outcome.


#18    Tangotiger      (see all posts) 2009/11/16 (Mon) @ 13:41

Ken/17 is basically saying the same thing as JC.

No one is talking about being “precisely linear” or being perfect, etc.  And yes, using the log or making it linear will yield similar results.  Guess what, you can use Total Average or RC or BaseRuns or something out of my a$$, and you will get a high correlation as well.  Such is what happens when you have clustering of data.

Though he is right that the omitted variables (fielding notably) being a bias is an important critique point. 

But, why ignore what I have found, that they had to force in that the infielder makes less than an outfielder for the same hitting stats.  Does that make any sense?

And, why use SLG, which forces the weights as 1,2,3,4?  Why not actually use singles, doubles, triples, HR individually?  And, if the regression gives far more weight to the triples than HR, then so be it, the regression has spoken.
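For reference, the 1,2,3,4 weighting is simply the definition of SLG as total bases per at-bat. A minimal sketch follows; the season line is hypothetical.

```python
def slugging(singles, doubles, triples, homers, at_bats):
    """SLG is total bases per at-bat: hits weighted 1, 2, 3, 4."""
    total_bases = 1 * singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# Hypothetical season line, for illustration only.
slg = slugging(singles=100, doubles=30, triples=5, homers=25, at_bats=550)
print(f"{slg:.3f}")
```

Regressing on the four hit types separately would let the data choose four coefficients instead of imposing the fixed 1:2:3:4 ratio.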

It is stupid, beyond stupid, to say that a guy, if he moves from the IF to the OF, and keeps the same hitting stats, will earn 1MM less dollars.  It’s irrelevant if the “best-fit” says that.  Totally irrelevant.  What it does show is that the data IS biased (or that the equation is not good enough).  Or that just using data from year X-1 is not good enough.  Or there’s not enough players.  Or that mixing FA and arb and slave players doesn’t work within the context of the equation. 

There very clearly IS a bias.  The indicator on the position proves it.


#19    Tangotiger      (see all posts) 2009/11/16 (Mon) @ 14:06

... will earn 1MM MORE dollars


#20          (see all posts) 2009/11/16 (Mon) @ 16:47

There is an endless stream of variables that one could include - but that doesn’t make the analysis better. In particular, the point of the regression is not to discover what affects salaries - it is to discover the relative weight of OBP and SLG. You could look at the relative weight of homeruns vs. doubles as well - but that is a different empirical test. Note that the authors are intentionally looking specifically at OBP vs. SLG because they are working off of the analysis of Moneyball. If Moneyball had said that doubles are undervalued, then they should have tested that.

The additional explanatory variables, such as position adjustments, seem inconsistent with experience - and that may be simply an anomaly, or it may be that controlling for additional factors would alter that to something that corresponds with our expectations. Given that explaining the positional adjustment in salaries was not the point of the article, it really isn’t very important. 

Think of a different test. Do players with a last name starting with a vowel get paid more than all other players? You can test this simply with salary data and information on last names. You don’t need additional information about player quality, because those things are most likely orthogonal to the main question. Can you use the regression coefficients to determine each player’s salary? - Of course not. Because that wasn’t the point of the exercise.

In this case, I agree that there could be a problem - mainly because the OBP/SLG tradeoff is likely to be correlated with the positional adjustments and therefore I would like to see more data to demonstrate that the result is robust to different specifications. Given the journal that it was published in, my guess is that the result is robust to a variety of additional . However, this doesn’t mean the original study is wrong, useless, or biased.


#21    Mike Fast      (see all posts) 2009/11/16 (Mon) @ 17:16

So you are saying that you can make a regression produce whatever result you want by tweaking the input variables until the conclusion is in line with your desired outcome, and you’re fine with that.  My word! 

Coming from the world of physics and engineering, I find that conception of “investigation” shocking.  I guess maybe I shouldn’t.

I don’t even know what there is to discuss at this point.  You economists can go off in your fantasy world and do your regressions and the rest of us will continue to work in the real world where the relationship between reality and the conclusions matters.


#22    Guy      (see all posts) 2009/11/16 (Mon) @ 18:54

Following up on my post #10, I wondered how much difference these salary models made in real life.  The claim of Hakes-Sauer is not just that OBP was mispriced, but mispriced enough to allow a team (Oakland) to gain significant advantage by exploiting the inefficiency.  Is that true?

So I looked at the division champion A’s of 2000 and 2001, comparing their hitters’ value under the pre-Moneyball 2000 model and the “corrected” 2004 model.  I took the top 9 hitters in PA, assumed all hitters were FAs for the purpose of this exercise, and didn’t worry about the small position coefficients (and I adjusted for the fact the 2004 model generates a salary 7% higher for an average hitter).  So, how much cheaper were the A’s players compared to their true value?

Not much.  In 2000, A’s hitters were only 3.6% cheaper than under the 2004 model. And that’s all Jason Giambi, who earns 16% less under the 2000 model.  All other A’s players actually cost 2% more in 2000.  Same results for 2001.  Given the enormous contract Giambi signed in 2002, I’m not sure it’s reasonable to say his skills were undervalued pre-Moneyball, and his values are so extreme we probably shouldn’t expect the model to get him right.  But even if you include him, the A’s would save only 3% due to this inefficiency.  When you consider how much they are being outspent, the uncertainty of the best player projections, and random variation in performance, this is just a trivial advantage.  It can’t possibly explain the A’s success in this era.

For the large majority of hitters, their 2000 and 2004 salaries differ by less than 5%.  An extreme player like Menechino (.369/.374) earns 8% less in 2000, while a SLG-heavy hitter like Chavez in 2001 (.338/.540) is paid 8% more in 2000.  Because many A’s hitters have good SLG as well as OBP, the net savings from the “cheap” pre-2004 OBP is tiny. And that’s just generally true.

I think it’s revealing that Hakes-Sauer and their many reviewers appear never to have even asked the question, “how big is this inefficiency?” Could it be exploited, and how much advantage could a team gain by doing so?  Their own models tell us the advantage would be very small, and that the A’s specifically gained little to no advantage from the inefficiency.  But I guess that doesn’t matter, as long as you pass peer review.


#23    Tangotiger      (see all posts) 2009/11/16 (Mon) @ 21:01

Guy, yet another fantastic point. 

Along the same lines, we’ve found a “statistically significant” existence for clutch hitting, but when we try to estimate that skill, it becomes basically useless in terms of impact.  The clutch hitting numbers will never be large enough to make a manager change his decision as to who to send to the plate.

So, even if we concede that the authors found something, there’s nothing to exploit.


#24    Tangotiger      (see all posts) 2009/11/16 (Mon) @ 22:20

I contacted the authors and they were nice enough to give me the data.  I don’t have the stats package they used, so they’re going to give it to me in csv form.

I’ll be able to report back exactly where I see the shortcomings, using their data.  That is, we’ll be able to see why we get the data we get for infielders in 2004, and maybe for the OBP in 2001.


#25    Ken      (see all posts) 2009/11/16 (Mon) @ 23:31

Mike,

If your comments were in response to mine, then I think that you missed the point. If the authors come to different conclusions when they use alternative specifications, then I would expect them to report it. The fact that they didn’t, and the paper was published in a respected journal, makes me think that they ran a whole series of alternative specifications that produced similar results, and chose to present this one because it was the simplest way to make their point.  In addition, apparently they are willing to make the data available, which is the best possible check on the results.


#26    Tangotiger      (see all posts) 2009/11/17 (Tue) @ 01:52

I now have the data, and will go through it this week.  I’ll open up a separate thread when that happens.


#27    Terry      (see all posts) 2009/11/17 (Tue) @ 09:48

Ken (#26), as someone who sits on an editorial board (a journal in the life sciences), I don’t think it is wise to beg that question....

I’m not sure what reputation the journal in question has achieved. Based upon impact factor it seems to be a mid tier journal in the field of economics suggesting it’s a decent one.

But in my experience, it’s safer to assume the reviewers vetted the nuts and bolts of the model but didn’t necessarily have the knowledge of the system it was modeling to critique how well the model worked outside of a vacuum. In other words, the 2 or 3 reviewers involved in the actual peer review process may not have understood major league baseball well enough to ask the many questions that have been asked here and elsewhere, e.g. does the model actually work in real life? 

Heck, this is JC’s baby and he seems to have some serious blind spots.  I imagine it would be hard to find 3 reviewers qualified to referee an article like the one in question. I’d suggest an editor should define such a person as being both a credentialed economist (an editor likely wouldn’t select a reviewer who wasn’t) and a qualified sabercat. Obviously the latter quality would be the limiting one.

To me that is the rub with JC’s suggestion that the peer review process elevates the status of the paper in question. I think the concern that the model doesn’t work in real life is a valid issue to raise because it certainly weakens the authors’ conclusions (potentially to the point where there is a high likelihood that they are wrong).

Again this is just supposition.


#28    Ken      (see all posts) 2009/11/17 (Tue) @ 10:42

Terry (#28),

There are a large number of economists that would have a working knowledge of sabermetrics - though it is impossible to know who they picked as referees for this paper. My objection was mainly to the implication that the authors chose the specification that best supported their claim. While I admit that it is possibly true, and a similar claim can be made about most papers in every field, it seems unlikely. More likely is that they ran a variety of specifications, the results were generally identical across regressions, and they reported the one that was simplest to explain - primarily because the article was not aimed at the sabermetric community.

And I should mention that the Journal of Economic Perspectives is a highly respected journal. The mid-range impact factor is the result of its focus - it is not intended to present cutting edge research, rather it publishes summary papers on a variety of topics at a level that is accessible to all economists, regardless of field, and down to roughly the level of an advanced undergraduate. As a result, its impact on research is relatively low, but its quality as a journal tends to be very high.


