Very nice work by David. What kind of program is he in at Rice—graduate economics?
His conclusion that teams generally pay position players what they are worth will probably not surprise too many people at this site, but it’s good to see it demonstrated so rigorously. He also shows empirically that rich and poor teams all pay the same amount for wins—again, not a surprise for posters here, but perhaps to some sports economists. And it’s nice to see a good empirical refutation of the Hakes/Sauer “Moneyball” paper, which never seemed plausible to me. I hope David gets this published somewhere.
My only quarrel is with David’s conclusion that “teams seem to place absolutely no weight on a player’s fielding abilities.” I have two concerns. First, his model shows that players are compensated based on their offensive production over a replacement-level player at their position. So that means teams obviously ARE taking fielding into account, paying more for SSs and Cs, less for 1B. So at a minimum he should say teams are not paying for “within-position differences in fielding performance.”
But how plausible is it that teams value defense between positions, but not at all within positions? Seems unlikely, even if there are some inefficiencies. I think the problem here is looking only at a single season’s fielding performance (as measured by Sean’s fielding runs). We know that such results are a poor estimate of a player’s true fielding skill, and also that within-position skill differences are relatively small (David mistakenly cites the single season standard deviation as evidence that differences in fielding skill are very large). If a player is +9 runs in a season, how much more should a team be paying him? Probably no more than the value of +2 or +3 runs.
Also, David is looking at both starters and bench players on the same zero-based scale. So a -10 fielder and a +10 fielder are both almost certainly starters, while a +1 fielder might be a bench player with 175 PA. Since David’s model has no playing time variable, that seems problematic. I also wonder whether his SB, CS, and baserunning variables might be picking up some of the value of defensive talent, at least for OFs.
Taking all these factors into account, it’s not surprising to me that teams don’t appear to be paying for defense using this methodology. Defense may be undervalued by the market, but I doubt it’s completely ignored.
*
Small point: I thought it was interesting that his coefficients for SB and CS indicate that the market places more value on basestealing than linear weights would suggest they are worth. That’s consistent with the idea that traditional sabermetric analysis overestimates the cost of basestealing, because of the confounding impact of hit-and-run plays.
Thanks for posting this, Tom.
Interesting comments as always, Guy. I think you get one thing wrong, however, which is that a +9 fielder is a true +9 fielder the way I do it, since I use a player’s statistics in the year AFTER his contract is signed. His numbers therefore are an unbiased expectation of his true talent—on average, all the +9 fielders will be true +9 fielders.
I don’t think the lack of a playing time variable is problematic, since playing time is embedded in the variables themselves.
I agree that the SB/CS and even 1B variables might be picking up some of the fielding impact. Still, even if teams didn’t ignore defense (which by the way is not my conclusion—merely that they undervalued it), I think that still would speak to the fact that they didn’t really have a good idea of how to evaluate it outside of a player’s speed/light hitting.
I’m an undergraduate, actually. (Well, for another week-and-a-half anyway.)
David #3
“His numbers therefore are an unbiased expectation of his true talent—on average, all the +9 fielders will be true +9 fielders”
But the variation is so large for Total Zone (I’m guessing) that “on average” doesn’t come close to holding, not over a 10-year sample position by position, and especially not for outfielders, since I’m guessing their batted-ball distributions to their zones vary much more than infielders’ do. (Does anyone have a link to year-to-year Total Zone correlations by position?)
It is possible that teams were paying attention to fielding, but TZ is not doing a good job of evaluating fielders. Unlikely though, since TZ does have a decent correlation with the other metrics out there (including Fans scouting reports), and anything better like field f/x would not have been available.
It’s hard to come up with examples of players who were paid for their defense though. And plenty of counterexamples that show defense meant nothing.
Look at Carlos Lee and JD Drew, both available in the same offseason. Over the previous 5 years, they had similar value as hitters. Drew was better, though less durable. By linear weights it’s close enough, a +22 hitter vs a +20 hitter. Drew was a good fielder and Lee an awful one, yet Lee gets slightly more money per year and one more year on the deal.
Problem in this case is that only one of the teams involved was smart enough to care about defense, so they didn’t have to pay more to get Drew. At the macro level it looks like nobody cares about fielding since it doesn’t affect free agent pay. It does affect player movement though.
The smart team says sure, let’s set prices based on hitting only. You can take that guy, we’ll settle for this one.
Let’s say Boston values Lee at 5 million (a huge penalty for defense) and Drew at 17. Houston values Lee at 17 and Drew at 14. Lee signs with Houston because he’s their first choice, and the Red Sox don’t show serious interest. Then Drew signs with Boston for less than they would have been willing to pay, because nobody else bothers to understand his value.
The other issue is that they only pay attention to fielding if it’s a positive, and ignore it if it’s a negative. And possibly half-count positives at LF, RF, 1B.
David:
Even more impressive work for an undergraduate. (I had it in mind that you went to college in Boston, but now I’m remembering you grew up there).
I don’t think I agree about the fielding stats. A fielder puts up a +9 performance in the first year of his contract. Your paper says we expect the salary coefficient to be 1.0, i.e., he should be paid as a +9 fielder. But that’s not his real ability, nor is it what we expect in the remaining years of his contract; that’s probably more like +3. So why do we expect him to be paid as a +9 fielder?
I think your expected coefficients should incorporate mean regression. In many cases, the actual coefficients are a bit lower, consistent with that idea. And where the coefficients don’t make sense—triples and HBP—that reflects the fact that one year’s performance in those statistics has almost no relationship with true talent.
Interestingly, two stats are overvalued in your model: singles and BB. I would guess that’s because they are the most closely correlated with playing time. They are telling you whether someone is a starter or not. That could mean that starters get paid some premium beyond what their above-replacement stats generate, but more likely this playing time data is providing additional information about the player’s true talent. Suppose you have two 0.5 WAR players, one guy with 200 PA and the other with 650 PA. Would you guess (A) they have the same salary, or (B) the second guy had a bad year and is paid more? I’d say “B.”
*
I understand why you didn’t look at all the years covered by a contract. But if you ever revisit this, you might want to try looking at the first two years of performance. That should bring observed performance closer to true talent, without shrinking your sample size too much.
David is correct that the +9 does represent that player’s mean estimate (albeit with a huge uncertainty level). It’s an out of sample point, and does not require regression.
Had the +9 occurred in the year preceding the signing, then yes, you’d have to regress, as you would for any other stat.
The out-of-sample number, on the other hand, is unbiased. We don’t want to touch it.
Guy,
I have to think about the fielding thing a little more. Still, there’s no way the coefficient should be 0. As for 1B and BB, I don’t see why 1B and BB above replacement should correlate more strongly with playing time than other numbers. Actually, running the numbers, the correlations between the marginal performance variables and PA in my database are all close to 0, except for mHR, which has an “r” of 0.47 (1B = 0.14, BB = 0.28).
Tango/David: I could definitely be wrong here, but let me push back a little. David’s model predicts salary based on the player’s performance in the first year of his contract. He is saying that players who are +10 in the field should be paid for 1 additional win (about $7M), while -10 fielders should be docked the same amount—if teams are correctly valuing fielding. Now that would be true if we were talking about true talent. But we aren’t—these are observed values. +10 fielders will not, as a group, be true +10 fielders. So we would expect them to be paid the value of maybe 3 or 4 runs saved. Same in reverse for the -10 fielders.
So the fielding coefficient shouldn’t be zero, but it probably should be something like .2 or .3.
*
David: do you agree that teams are paying players differently based on the position they play? And if so, isn’t that inconsistent with the claim that they place “no weight” on fielding?
*
Looks like my theory about the coefficient for singles is wrong. Still, I think it would be interesting to see if PA has any predictive value for salary in your model.
“+10 fielders will not, as a group, be true +10 fielders.”
That’s not true. They definitely will be.
For example, say I take all the +20 fielders in 2009. What will I see for their fielding in 2010? As a group, I’ll see it at around +10. I’m not going to further regress.
So, the +20 we observe in 2009 was actually +10 in 2009. And +10 in 2010. (Age notwithstanding.)
Tango: To use your example, David is saying all the +20 fielders in 2009 should be paid for the 20 runs they saved. He is saying that the one season we see is an estimate of their true talent.
Look at it this way: suppose we have two groups of players who both average +10 runs/season in the field. But one is based on a single year of data, the other on three years of data. Don’t we expect the second group to be better fielders? And shouldn’t they be paid more for their fielding? If David used 2-year or 3-year performance samples, we’d expect his coefficients to better approximate the linear-weight-based prediction.
And I believe this applies to all the statistics David uses, not just fielding. All of the salary coefficients should be at least a little bit less than his “expected coefficients,” because poor performers are likely to be better than they appear and vice-versa. I’m not sure why that doesn’t happen with singles and walks.
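Guy’s one-year versus three-year comparison can be made concrete with a small sketch that shrinks an observed per-season rate toward a league mean of 0. The single-season reliability of 0.4 is an assumption (picked so that an observed +10 maps to about +4), and using the Spearman-Brown formula for how reliability grows with more seasons is also an assumption, not something taken from David’s paper.

```python
# Sketch: shrink observed fielding runs toward the mean by the reliability
# of the sample. ASSUMPTIONS (illustrative, not from the paper): league mean
# of 0 runs, single-season reliability of 0.4, Spearman-Brown for n seasons.

def reliability(seasons: int, single_season_r: float = 0.4) -> float:
    """Spearman-Brown reliability of a per-season average over `seasons` seasons."""
    return seasons * single_season_r / (1 + (seasons - 1) * single_season_r)


def estimated_true_talent(observed_per_season: float, seasons: int,
                          single_season_r: float = 0.4,
                          league_mean: float = 0.0) -> float:
    """Regress the observed per-season rate toward the league mean."""
    r = reliability(seasons, single_season_r)
    return league_mean + r * (observed_per_season - league_mean)


# Two groups that both average +10 runs/season: one on 1 season, one on 3.
for n in (1, 3):
    print(n, "season(s):", round(estimated_true_talent(10.0, n), 2))
# prints "1 season(s): 4.0" and "3 season(s): 6.67"
```

Under these assumptions the three-year group projects about 2.7 runs better per season, which is the sense in which it should be paid more. A different reliability changes the numbers but not the ordering.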
One other question for David: outs are not included in your regression. Wouldn’t that mean that the marginal value of each offensive event should be about .25 higher than you estimate, since each hit/walk also means the absence of an out? If a player hits one more HR than a replacement player with the same number of PAs, and 1B, 2B, BB, etc. are equal, isn’t that worth 1.65 runs?
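Guy’s arithmetic is easy to check in code: with PA held fixed, an extra event means one fewer out, so its marginal value is the event’s run value minus the out’s (negative) run value. The weights below are illustrative assumptions, not David’s figures; only the HR entry follows from the 1.65 total in the question.

```python
# Sketch of the out-value adjustment at a fixed number of PA: each extra
# hit or walk replaces an out. ASSUMED illustrative run values; only the
# HR value is implied by the thread (1.65 - 0.25 = 1.40).
LINEAR_WEIGHTS = {"1B": 0.47, "2B": 0.77, "3B": 1.04, "HR": 1.40, "BB": 0.33}
OUT_VALUE = -0.25  # run value of an out, as used in the thread


def marginal_value(event: str) -> float:
    """Runs gained when one out is replaced by `event`, PA held constant."""
    return LINEAR_WEIGHTS[event] - OUT_VALUE


for event in LINEAR_WEIGHTS:
    print(event, round(marginal_value(event), 2))
```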
Firstly, congrats David both on a terrific thesis and on your graduation! In particular, I like that you used the run-values of events as your “null hypothesis” for calculating t-statistics and p values rather than using 0 and looking to reject it.
What Guy is saying about regression makes sense to me and, Tango, I don’t think I understand your counterargument.
Maybe it would be clearer to consider an extreme version of what David did: regress salary on each player’s first week of production in April. What would our model show? Presumably, a very large constant (close to league average weekly salary), and very small coefficients for every offensive event. At that point, offensive production would mainly reflect luck, not skill, and it would have a very weak relationship to salary. One season is far better, of course, but still contains a fair amount of noise, especially in some stats.
Guy,
This is a bit difficult to understand (in fact, I got confused myself for a bit), but the key is that the performance variables are out-of-sample while salary is determined in-sample (that is, based on already known statistics). Imagine a player that is exactly replacement-level at everything except for fielding, where his true talent is +10 runs. At $600k/run, he will get paid $6 million. If you take every such player and regress his salary (divided by 600k) vs. his performance variables the next season, they will average +10 fielding runs and 0 everything else, and so the fielding coefficient will be 1. This is basically what’s happening here. The key is that the dependent variable is being determined in-sample, while the independent variables come from out-of-sample.
***
I don’t know if teams pay players differently based on the position they play. Obviously, I make that assumption in the paper, but it’s an assumption rather than a fact.
***
PA would definitely have predictive value, but for the wrong reasons—that is, they are simply going to be strongly correlated with the residuals. Players who under-perform their salaries will get more PA and vice-versa.
***
I think you’re right about the outs thing but I have to think about that a little more.
Guy/15,
This is why I don’t include a constant in the model. If you had a large enough sample, just using April data should still give you the correct coefficients.
David,
Doesn’t its being out of sample, as you say, simply mean that the slope of the best-fit line of Fielding Runs v. Pay shouldn’t be affected by regression to the mean? I’d think that the slope of the best-fit line of Pay v. Fielding Runs still would be. And you looked at the latter, right?
Things that make me want to punch babies:
1) economics
2) double-spaced articles
All kidding aside, congratulations!
David/15: Yes, true talent +10 players will deliver +10 next year (on average) and be paid accordingly. But +10 performers—and performance is all your model has to work with—are not true +10 players, and should not be paid as such. All the +10 players in your sample should be paid as +4 players, just as the -10 players should be paid as -4—because that’s what they really are. If you used 3 years of data, you should then get coefficients much closer to your expectations. And with an infinite sample (and perfect market) they would converge.
Let’s say you repeated this exercise for pitchers, using K9, BB9, HR9, and BABIP. BABIP would appear hugely important in determining wins. However, when you ran your salary model, the coefficient would (presumably) be extremely low. In that case, we would say the market is smart. But that’s only because we know from separate analysis that single-year BABIP reflects only a small portion of skill. But you haven’t done a comparable analysis here, to assess how reflective of true talent each single-season stat is.
If you first regressed each player’s performance, then (I think) the coefficients would match in an efficient market. And it looks like that’s basically the case. The two problems are basically offsetting: the missing out value compensates for the lack of regression. Adding the out value back in will raise your expected coefficients, but if you regress performance I bet you end up with a pretty efficient market.
Guy,
I’m using OUT-OF-SAMPLE performance, so that isn’t a problem. It would be a problem if I regressed performance in the season BEFORE the contract was signed against salary, but not the season AFTER.
Right, David’s salary variable is essentially a proxy for 2007-2009 data, and the performance variable is 2010 data, which is unbiased.
great work - i enjoyed reading it very much. a lot more fun than the vast majority of business school cases i’ve slogged through the last 19 months.
Nothing much to contribute, just another voice saying that I find Guy more convincing than David/Tango on the fielding regression issue.
David/Tango: I don’t see how in-sample vs. out-of-sample addresses the problem. Chronology isn’t the issue here, it’s variance. When you add random variance, your coefficients will shrink (even if teams are correctly valuing true talent). Prove it to yourself: create a sample of players with various true values: $3M, $7M, $12M, etc. Assume they are paid accordingly. Now add some random variance to represent their actual performance in year 1, and run your regression. You will find that the coefficient for Year 1 performance is below 1 (using salary as dependent variable). The more variance you add, the smaller the coefficient.
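Guy’s “prove it to yourself” exercise takes only a few lines. This is a minimal standard-library sketch, not David’s model: the player count, standard deviations and seed are arbitrary assumptions, teams are assumed to pay exactly for true talent, and year-1 performance is true talent plus unbiased luck. It fits the regression in both directions.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope (with an intercept) of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(luck_sd, n_players=20000, talent_sd=1.0, seed=1):
    """Return (slope of pay on performance, slope of performance on pay)."""
    rng = random.Random(seed)
    # ASSUMPTION: an efficient market pays exactly for true talent,
    # with pay measured in the same units as performance (e.g., wins).
    talent = [rng.gauss(0.0, talent_sd) for _ in range(n_players)]
    pay = list(talent)
    # Year-1 performance: unbiased (out of sample) but noisy.
    performance = [t + rng.gauss(0.0, luck_sd) for t in talent]
    return ols_slope(performance, pay), ols_slope(pay, performance)


for luck_sd in (0.0, 0.5, 1.0, 2.0):
    pay_on_perf, perf_on_pay = simulate(luck_sd)
    print(f"luck sd {luck_sd}: pay~perf {pay_on_perf:.2f}, perf~pay {perf_on_pay:.2f}")
```

In runs like this, the pay-on-performance slope should land near talent_sd² / (talent_sd² + luck_sd²), roughly 1, 0.8, 0.5 and 0.2 here, while the performance-on-pay slope stays near 1. That is the textbook errors-in-variables (attenuation) result, and it shows how both sides can be partly right: performance is unbiased given pay, yet pay regressed on noisy performance still shrinks.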
Again, do the same exercise with BABIP: the first contract year’s BABIP is also “unbiased” in your terms. Are we then going to say teams “undervalue” BABIP when the salary coefficient comes out close to zero? David’s analysis would require us to say that. He is treating all performance as equivalent to sustainable true talent.
When you consider the variance problem, the results David gets for triples and HBP are just what you’d expect. Single-year performance in those categories has almost no correlation with skill.
1) I was kind of hoping for some way to use a player’s stats to come up with an estimate of what his salary will be.
2) David, you’re thinking too much like a SDCN. There are two other stats the front office will consider that we tend to ignore on a personal level: Runs Scored, and Runs Batted In. After controlling for Batting Runs, I’d be very surprised if there is not a correlation between salary, R and RBI.
“When you consider the variance problem, the results David gets for triples and HBP are just what you’d expect. Single-year performance in those categories has almost no correlation with skill.”
I didn’t think that statement was right for HBP, and it’s not. Looking at players with 500+ PA from 2009-2010, the y-t-y correlation for HBP/PA is .59.
My take is that teams might accept that those who get a lot of HBP have value, but this “skill” entails some injury risk - something that might hurt the player’s salary results.
And there’s one other variable that we need consider: melanin. Do teams pay more for white guys?
I’m trying to find some way to look for the Jeff Francoeur Factor—some irrationality in the market that causes players to be overrated.
Rally: Fair enough. There is some signal in even one year of data. And indeed, David finds a positive coefficient (but much smaller than true value of a HBP) that seems roughly consistent with your reported correlation. I would also guess that true HBP skill is somewhat correlated with true HR skill, so maybe the HR variable is picking up some of that value.
Rally: Can you shed some light on how strong a correlation there is between a single season TZ rating and a player’s true talent?
Guy/25,
Not if you force the intercept to 0.
Charles/26,
You can use my coefficients to estimate player salary. Of course, really all my findings point to is that you might as well use WAR, maybe minus the fielding.
As for runs and RBI, that’s a more difficult issue to tackle than you might think.
Charles/28,
Some studies have found that white players get paid more, some have found that they do not. You can take a look at some of the studies referenced in my bibliography to get a better feel for what others have looked at.
David/25: Good point: by forcing the intercept to zero, you will magnify the coefficients. However, you still need to grapple with the problem that the amount of regression is not equal for all statistics. So in cases where the gap between the salary coefficient and statistic X’s real run value is especially large, you can’t just conclude that “X is undervalued.” It could be that X is a single-season statistic with a relatively weak relationship to true talent (as would be the case with BABIP).
Also, despite forcing the intercept to zero, the ratios of your salary coefficients to the linear weights values are all less than one (once you include the missing .25 out value). According to your model, the market undervalues EVERY batting contribution, by 20-40%. Do you have a theory as to why that is?
*
I take your point that your analysis assumes, rather than proves, that teams pay more to players at premium defensive positions. But as long as you are going to argue that is true, I don’t think you can make the claim that teams place “zero weight” on fielding. The two ideas are mutually exclusive.
I ran the simulation* that Guy suggested in #25 and sure enough it plays out as he describes.
*Just 100 lines in an excel spreadsheet with one normally distributed random number for true talent and another normally distributed random number for luck.
One way to think of it is that as you add in more luck, the spread of your independent variable (fielding runs) increases but the spread in your dependent variable (pay) stays the same, so the slope of pay v. fielding runs must decrease.
J.Cross: what happens if you force your constant to zero, as David did in his regression? Do you then get a coefficient of 1?
My two random variables were both centered on zero so I didn’t have to force the constant to zero. If you center true talent/pay around some non-zero number (either positive or negative) but force the intercept of the pay v. performance best fit line to zero then the slope is pushed closer and closer to one the larger the offset is.
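J.Cross’s description of the offset effect can be checked with a short sketch in the same spirit as his spreadsheet (the sample size, standard deviations and seed are assumptions). Pay equals true talent centred on some offset, performance adds unbiased luck, and the fit is forced through the origin.

```python
import random


def slope_through_origin(x, y):
    """Least-squares slope of y on x with the intercept forced to zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def forced_zero_slope(offset, n_players=20000, talent_sd=1.0, luck_sd=1.0, seed=2):
    rng = random.Random(seed)
    # ASSUMPTION: pay equals true talent, now centred on `offset` rather than 0.
    talent = [offset + rng.gauss(0.0, talent_sd) for _ in range(n_players)]
    # Observed performance = true talent + unbiased luck.
    performance = [t + rng.gauss(0.0, luck_sd) for t in talent]
    return slope_through_origin(performance, talent)


for offset in (0.0, 1.0, 3.0, 10.0):
    print(f"offset {offset}: slope {forced_zero_slope(offset):.3f}")
```

Under these assumptions the forced slope should come out near (offset² + talent_sd²) / (offset² + talent_sd² + luck_sd²): roughly 0.5 with no offset and close to 1 at large offsets. That only shows what the zero-intercept restriction does mechanically; it doesn’t settle which specification fits David’s data.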
"Rally: Can you shed some light on how strong a correlation there is between a single season TZ rating and a player’s true talent?”
Sure. Just as soon as we can agree on a measure of true talent to compare it to.
Hope that didn’t come off as too snarky. Seriously though, TZ has been around since 2007, while David’s study goes back to 1999. It’s being used as a proxy for defensive skill, which teams would have evaluated by other means. How well TZ or any defensive metric evaluates true defense is an open question.
How much a team pays for an extra homer is pretty straightforward. How much do they pay for an extra TZ run, when for most seasons in the sample nobody would have even heard of TZ? That’s tough.
It’s a fair question (what is the correct measure of true fielding talent?). I was really just asking how well a single season TZ rating correlates with the next season, or a player’s career rating. There’s obviously quite a bit of y-t-y fluctuation. (We also know that TZ is influenced a bit by adjoining fielders, so even some of the observed correlation is not reflecting true talent.) And the more fluctuation there is—and this applies to offensive stats as well—the more the salary coefficient should differ from David’s “expected” coefficient.
As mentioned before, TZ is a good system and probably has a .75 or so correlation with something like Dewan’s latest systems. I would guess the year-to-year correlation for TZ is also about .5. So if the teams are valuing differences in defensive value within position at all, TZ should be picking up a significant chunk of that valuation.
MAH: A y-t-y correlation of .5 is decent, I agree. But it’s much less than we see for offensive statistics like BB or HR, so I would still argue we should expect a lower salary coefficient as a result (even if teams properly rewarded fielding talent).
But I realize now there’s a bigger problem that likely provides at least part of the explanation for David’s finding of a zero salary coefficient for fielding: TZ is negatively correlated with offensive talent. Since the single-year offensive statistics are an imperfect measure of true offensive talent, the TZ rating remains a (negative) predictor of players’ true offensive value. If you have two SSs with identical offensive statistics this year, but one is -10 in the field and the other +10, the first player is likely to outhit the second in year 2—the mere fact that he was allowed to play regularly at SS at a -10 fielding level signifies offensive talent. And that negative correlation will tend to offset any valuation that teams do give to fielding talent per se.
Looking at fielders with at least 2500 PA 2001-2010, here is the correlation between TZ/PA and OPS*:
SS: -.55
2B: -.23
OF: -.28
* I know, I know—but that’s what Play Index spits out.
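Guy’s point about fielding and offense being negatively correlated can be explored with a toy simulation. Everything here is an assumption for illustration: true offense and true fielding are standard normal with a chosen correlation, teams pay fully for both, and each observed stat adds its own noise (offense measured more reliably than fielding). Salary is then regressed on both observed stats.

```python
import random


def two_variable_ols(x1, x2, y):
    """Least-squares coefficients (b1, b2) for y = a + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations with Cramer's rule.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def fielding_coefficient(talent_corr, n_players=50000, offense_luck_sd=0.5,
                         fielding_luck_sd=1.0, seed=3):
    rng = random.Random(seed)
    bat_obs, field_obs, pay = [], [], []
    for _ in range(n_players):
        offense = rng.gauss(0.0, 1.0)
        # True fielding with correlation `talent_corr` to true offense.
        fielding = (talent_corr * offense
                    + (1 - talent_corr ** 2) ** 0.5 * rng.gauss(0.0, 1.0))
        pay.append(offense + fielding)  # ASSUMPTION: both fully paid for
        bat_obs.append(offense + rng.gauss(0.0, offense_luck_sd))
        field_obs.append(fielding + rng.gauss(0.0, fielding_luck_sd))
    return two_variable_ols(bat_obs, field_obs, pay)[1]


for corr in (0.0, -0.3, -0.55):
    print(f"talent corr {corr}: fielding coefficient {fielding_coefficient(corr):.2f}")
```

In this toy setup the fielding coefficient should fall from about 0.5 (which is just attenuation from fielding noise) toward about 0.37 as the correlation goes from 0 to -0.55, even though the simulated teams pay fully for fielding. The -0.55 figure only borrows the SS correlation listed above as an illustrative input; the sketch does not model TZ/PA or OPS directly.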


Enjoyed reading it. Truly shocking that fielding is apparently completely ignored in free agent pricing.
Does MGL or anyone else have any idea how many teams purchase batted ball data from BIS and/or STATS? It would seem very odd if many bought the data yet none paid for fielding value it might reveal.*
*David’s thesis used TotalZone to estimate fielding value, not a batted ball data system. Still, as reported in Wizardry, TotalZone probably has an approximately .75 correlation with batted ball data systems, so David’s model should have picked up a lot of the fielding value teams would theoretically pay for based on batted ball data.