THE BOOK--Playing The Percentages In Baseball

Thursday, September 29, 2011

Payroll v Wins, in the Moneyball era (2002-2011)

By Tangotiger, 01:10 PM

The chart below shows the total team payroll over the ten-year period of 2002-2011, in millions of dollars, from the Yankees’ 1.875 billion dollars to the Rays’ 393 million dollars.  It is plotted against the total number of wins in that time period, from the Royals’ 668 wins to the Yankees’ 975 wins.

The data shows an r=0.70 correlation, signifying a strong relationship between payroll and wins.

This is particularly strong, considering that service time is not included as a variable.  As any baseball fan would know, the performance from Evan Longoria’s first four years generated as much win impact as Chase Utley in 2008-2011.  But the cost for Longoria was just a fraction of what Utley earned, because MLB salaries are heavily tied to a player’s service time.  If we included service time as a variable, the correlation would naturally increase.

The teams noted below were the three teams that got the most bang for their buck.  The Moneyball A’s were #1, followed closely by the Twins and Cardinals.  The race for the least bang for the buck was a two-team affair, “won” by the Orioles over the Royals.  The Mets came in third worst.

[image: total team payroll (millions of dollars) vs. total wins, 2002-2011]


Someone asked me to index the data.  It works out to the same thing, as it turns out:

[image: payroll index vs. win%, 2002-2011]

#1    Bill Petti      (see all posts) 2011/09/29 (Thu) @ 13:43

Is this inflation-adjusted?


#2          (see all posts) 2011/09/29 (Thu) @ 13:47

I’ll say tie goes to the Royals; playing 1/3 of your games against the Yankees, Red Sox, and Rays is going to make any team look like they’re underperforming.


#3    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 13:50

Bill: there are no adjustments, neither for dollars nor for wins (not every team played exactly 1620 games).

Now, I can do that adjustment.  But, I will end up with virtually the same chart, so I don’t see a reason to add adjustments to increase precision, while still telling the exact same story.


#4    George      (see all posts) 2011/09/29 (Thu) @ 13:54

Where are the Jays on this graph? JP and AA have gotten a lot of wins out of below-average payrolls over the years.


#5    Bill Petti      (see all posts) 2011/09/29 (Thu) @ 13:57

Thanks, and I agree. I only ask because I did a similar study back in January (http://sbn.to/hIzDP2) and got a ton of questions because I didn’t adjust for it.

Also, when I ran my correlation I got an r equal to .41. My data included 2001 through 2010, though. Still, that’s a pretty good difference.


#6    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 14:02

Bill: I uploaded an image showing what happens if I annualize it.

So, I figured the “payroll index”, as the team payroll divided by the league average that year. 

I plotted that against win%.

Nothing changes really.
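A minimal sketch of that annualized index, in Python. The payroll figures below are made-up placeholders, not the actual data; only the calculation itself (team payroll divided by that year's league average) comes from the description above.

```python
from statistics import mean

def payroll_index(payrolls):
    """Return each team's payroll divided by the league-average payroll."""
    league_avg = mean(payrolls.values())
    return {team: pay / league_avg for team, pay in payrolls.items()}

# Hypothetical one-season payrolls in millions of dollars (illustrative only).
season = {"Team A": 200.0, "Team B": 100.0, "Team C": 60.0, "Team D": 40.0}

index = payroll_index(season)
for team, value in index.items():
    print(f"{team}: {value:.0%}")
```

Repeating this for each season and averaging a team's yearly indexes would give one payroll index per team for the whole period.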


#7    David Pinto      (see all posts) 2011/09/29 (Thu) @ 14:02

Tom,

It looks like there are three lines here.  One for the normal teams, one for teams that exceed expectations, and one for the teams that fail to meet expectations.  Is there any way I can get the data to see who is who?

David


#8    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 14:04

Jays: 681MM$, 808 wins.  Just a bit above the line.


#9    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 14:06

PayrollIndex   win%    expWin%   TeamName
234%           0.602   0.613     New York Yankees
162%           0.575   0.552     Boston Red Sox
145%           0.491   0.538     New York Mets
130%           0.494   0.525     Chicago Cubs
126%           0.561   0.522     Los Angeles Angels
126%           0.526   0.522     Los Angeles Dodgers
125%           0.555   0.521     Philadelphia Phillies
117%           0.549   0.514     Atlanta Braves
115%           0.468   0.513     Seattle Mariners
113%           0.557   0.511     St. Louis Cardinals
112%           0.523   0.510     San Francisco Giants
112%           0.522   0.510     Chicago White Sox
107%           0.469   0.506     Detroit Tigers
103%           0.494   0.502     Houston Astros
 94%           0.505   0.495     Texas Rangers
 91%           0.430   0.492     Baltimore Orioles
 88%           0.486   0.490     Arizona Diamondbacks
 85%           0.499   0.487     Toronto Blue Jays
 83%           0.534   0.485     Minnesota Twins
 81%           0.475   0.484     Colorado Rockies
 79%           0.475   0.482     Cincinnati Reds
 75%           0.477   0.479     Milwaukee Brewers
 74%           0.484   0.478     Cleveland Indians
 72%           0.526   0.477     Oakland Athletics
 68%           0.478   0.473     San Diego Padres
 65%           0.412   0.471     Kansas City Royals
 65%           0.448   0.471     Washington Nationals
 54%           0.420   0.462     Pittsburgh Pirates
 52%           0.499   0.460     Florida Marlins
 48%           0.464   0.456     Tampa Bay Rays

The regression line is: win% = payrollIndex x .084 + .416 (with payrollIndex as a fraction, so 234% is 2.34)
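For anyone who wants to check the line against the table, here is a rough Python sketch. It refits an ordinary least-squares line to the 30 rounded rows above and ranks teams by how far their win% sits above or below that line. The function and variable names are my own; since the inputs are rounded, the fitted numbers should only be expected to land close to the ones posted.

```python
from math import sqrt

# (payroll index as a fraction, win%) copied from the table above.
TEAMS = {
    "New York Yankees": (2.34, 0.602), "Boston Red Sox": (1.62, 0.575),
    "New York Mets": (1.45, 0.491), "Chicago Cubs": (1.30, 0.494),
    "Los Angeles Angels": (1.26, 0.561), "Los Angeles Dodgers": (1.26, 0.526),
    "Philadelphia Phillies": (1.25, 0.555), "Atlanta Braves": (1.17, 0.549),
    "Seattle Mariners": (1.15, 0.468), "St. Louis Cardinals": (1.13, 0.557),
    "San Francisco Giants": (1.12, 0.523), "Chicago White Sox": (1.12, 0.522),
    "Detroit Tigers": (1.07, 0.469), "Houston Astros": (1.03, 0.494),
    "Texas Rangers": (0.94, 0.505), "Baltimore Orioles": (0.91, 0.430),
    "Arizona Diamondbacks": (0.88, 0.486), "Toronto Blue Jays": (0.85, 0.499),
    "Minnesota Twins": (0.83, 0.534), "Colorado Rockies": (0.81, 0.475),
    "Cincinnati Reds": (0.79, 0.475), "Milwaukee Brewers": (0.75, 0.477),
    "Cleveland Indians": (0.74, 0.484), "Oakland Athletics": (0.72, 0.526),
    "San Diego Padres": (0.68, 0.478), "Kansas City Royals": (0.65, 0.412),
    "Washington Nationals": (0.65, 0.448), "Pittsburgh Pirates": (0.54, 0.420),
    "Florida Marlins": (0.52, 0.499), "Tampa Bay Rays": (0.48, 0.464),
}

def fit_line(points):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

slope, intercept, r = fit_line(list(TEAMS.values()))
print(f"win% = payrollIndex x {slope:.3f} + {intercept:.3f}  (r = {r:.2f})")

# Residual = actual win% minus the win% the line expects for that payroll.
residuals = {t: y - (slope * x + intercept) for t, (x, y) in TEAMS.items()}
ranked = sorted(residuals, key=residuals.get, reverse=True)
print("Most bang for the buck: ", ranked[:3])
print("Least bang for the buck:", ranked[-3:])
```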


#10    Bill Petti      (see all posts) 2011/09/29 (Thu) @ 14:09

Weird. I still don’t see why the correlation would be that different. What was it for just the 2011 season? For 2001 I have .31.


#11    Bill Petti      (see all posts) 2011/09/29 (Thu) @ 14:22

Wait, I see one difference--you ran it for total wins and total payroll. I ran it for each individual team, each individual season.


#12          (see all posts) 2011/09/29 (Thu) @ 14:31

Which makes sense, of course, since there will be more random variation from season to season, but over a period of years, the payroll advantage will become clearer.


#13    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 14:37

Mark: right.

Imagine, if you will, that you run a correlation of payroll against just April 2011.  Then you run it against April-June 2011.  And so on and so forth.

As you increase your sample size, your correlation goes up.

Up to a point where systematic bias allows for it.  And not accounting for service time is a systematic bias.

If I were to repeat this for only, say, I dunno, 4 or 5 years in total, I’d probably also get r=.70.  This is also because of team turnover.  All the extra years are really just “random”.  That is, the 2002 A’s and 2011 A’s aren’t really the same “team”.  They share a GM and financial constraints, etc., but they aren’t the same players.


#14    Devon      (see all posts) 2011/09/29 (Thu) @ 14:40

Apparently the Pirates haven’t read Moneyball yet, which explains everything in Pittsburgh.


#15    Bill Petti      (see all posts) 2011/09/29 (Thu) @ 14:44

I guess they are two different ways of looking at it. Over time, the general advantage goes to the higher spending team (r=.7), but within a given year that advantage does not necessarily hold (r=.41).


#16    David Pinto      (see all posts) 2011/09/29 (Thu) @ 14:49

Thanks Tom.


#17    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 14:56

Right, and the reason it doesn’t hold is that with 162 games, just about anything can happen.  Wins is NOT a proxy for talent, if you only have 162 games.

When you have 1620 games, wins IS a proxy for talent.

(Well, it’s not “IS” and “IS NOT”, but the more games you play, the more the talent leads to wins.)

However, regardless of number of seasons, service time is service time.

I’ve explained this in the past.  The causative relationships are the following:
talent + luck -> wins
talent + service time + management -> salary

So, when you run a correlation of wins to salary, you are not running a correlation of a cause-effect relationship!

The cause effect relationship is exactly what I have above.

Now, when you have 1620 games, the luck part starts to get dwarfed by the talent part, and so wins starts to become a proxy for talent.

In the second equation, the service time will remain regardless of number of games, because the Yanks are the Yanks and the Rays are the Rays.

That’s why you have to know what you are doing first, before you actually do it.  Running a regression in the hands of an “expert” at regressions but amateur in baseball is a recipe for disaster.
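To put rough numbers on the "talent + luck -> wins" relationship, here is a small Python sketch. It assumes luck behaves like binomial noise around a .500 team, and it assumes a spread in true-talent win% of 0.06 across teams; that 0.06 is an illustrative guess of mine, not a figure from this thread.

```python
from math import sqrt

def luck_variance(games, p=0.5):
    """Binomial variance of an observed win% over `games` games."""
    return p * (1 - p) / games

def talent_share(games, talent_sd=0.06):
    """Fraction of the variance in observed win% that is due to talent.

    talent_sd is a hypothetical spread in true-talent win% across teams.
    """
    talent_var = talent_sd ** 2
    return talent_var / (talent_var + luck_variance(games))

for games in (25, 81, 162, 810, 1620):
    share = talent_share(games)
    # Any correlation with wins is damped by sqrt(share) relative to the
    # same correlation with true talent.
    print(f"{games:5d} games: talent share {share:.2f}, damping {sqrt(share):.2f}")
```

Under those assumptions the talent share keeps climbing as games are added, which is the sense in which wins become a better proxy for talent over 1620 games than over 162. Nothing in this sketch touches the service-time term, which stays put no matter how many games are played.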


#18    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 15:09

David Pinto noticed that there may be some clustering effect.  I don’t know if it’s random or not, but this is how the teams clustered:

Oakland Athletics
Minnesota Twins
St. Louis Cardinals
Florida Marlins
Los Angeles Angels
Atlanta Braves
Philadelphia Phillies

Boston Red Sox

San Francisco Giants
Chicago White Sox
Toronto Blue Jays
Texas Rangers
Tampa Bay Rays
Cleveland Indians
San Diego Padres
Los Angeles Dodgers
Milwaukee Brewers
Arizona Diamondbacks
Cincinnati Reds
Houston Astros
Colorado Rockies
New York Yankees

Washington Nationals

Chicago Cubs
Detroit Tigers
Pittsburgh Pirates
Seattle Mariners
New York Mets
Kansas City Royals
Baltimore Orioles

I don’t know that there’s any common elements in the three main groups.

The teams are ordered from best to worst in “Moneyball”.


#19          (see all posts) 2011/09/29 (Thu) @ 15:20

I have a question but I’m not sure how to formulate it (or even what kind of answer I’m looking for).

Basically, I’ve been looking at this data and thinking that if you attempt to draw conclusions about a given franchise’s front office competence (or, the competitive balance in baseball), you’re making an implicit assumption that all teams enter every season with the goal of winning the most possible games.

That doesn’t sound terribly shocking, but it makes me wonder about franchises like the Florida Marlins that seem to have a schizophrenic attitude—they cut payroll and promote young players aggressively (or fill holes with cheap veterans), and then every 4 or 5 years, they bump the payroll up and take a shot at the postseason. Of course, that’s my impression of their plan; what they are actually trying to do might be entirely different.

So, I’m not sure what to say about this, other than to ask very generally, is it possible that this data might obscure some important information about a franchise that takes a somewhat nontraditional approach to building a contender?


#20    Micah      (see all posts) 2011/09/29 (Thu) @ 15:32

Tango, does this curve shift when you look at different ends of the decade? The A’s have a lot fewer wins now, in part because more teams are using the same strategies and market inefficiencies are harder to find. If that’s the cause, I would expect that the data contains a shift in front office strategy and may mask a trend you might expect in the next 10 years. (Of course, smaller sample size increases error, so maybe this is hard to determine.)


#21          (see all posts) 2011/09/29 (Thu) @ 15:39

how would you account for service time? could you just add up the starters’ service years per team, instead of breaking it down by games played or at bats? would age serve as a reasonable proxy for that?

then what would you expect to see, teams with younger players (or less service time) would have lower salaries, but not necessarily more wins, right?

sorry for all the questions, i’m just thinking through that out loud (or via keyboard). thanks for posting this. would make a perfect college or grad level stats example.


#22    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 15:50

To account for service time, you have to understand how salaries work:

1. for pre-arb players, all salaries are “dictated” by the team, to the point that there’s virtually no relationship between talent and salaries.

2. for arb-players, better players get paid substantially more, but still at a discount to free agent players; say they get about 50 cents on the dollar

3. free agent players get paid more the better they are

So, you need to weight service time based on the talent level of the players.  Something like:

Team’s talent level that they pay for
= 5% of talent level of pre-arb players
+ 50% of talent level of arb players
+ 100% of talent level of free agents

Something like that.

You see therefore that a team that relies heavily on pre-arb players (say, the Rays) would have very little of their talent actually reflected in their payroll dollars spent.
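Here is that weighting as a few lines of Python. The 5% / 50% / 100% weights are the rough ones given above; the two rosters are hypothetical, with talent measured in wins above replacement.

```python
# Share of a player's talent that shows up in payroll, by service class.
# Rough weights from the comment above, not measured values.
PAY_WEIGHTS = {"pre_arb": 0.05, "arb": 0.50, "free_agent": 1.00}

def talent_paid_for(talent_by_class):
    """Talent that a team's payroll actually reflects."""
    return sum(PAY_WEIGHTS[cls] * talent for cls, talent in talent_by_class.items())

# Two hypothetical rosters with the same total talent (40 wins).
young_team = {"pre_arb": 30.0, "arb": 5.0, "free_agent": 5.0}
veteran_team = {"pre_arb": 5.0, "arb": 5.0, "free_agent": 30.0}

for name, roster in (("young", young_team), ("veteran", veteran_team)):
    total = sum(roster.values())
    paid = talent_paid_for(roster)
    print(f"{name}: {total:.0f} wins of talent, {paid:.2f} reflected in payroll")
```

Same talent on the field, very different payroll, which is how a team can sit well above the line without doing anything clever in free agency.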


#23    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 16:18

To show how the service line variable affects everything, consider the best fit line I showed:

win% = payrollIndex x .084 + .416

So, a team that spends the minimum (a payrollIndex of almost 15%) would have an expected win% of .429, or 69 wins.

Of course, there’s a difference between spending that money on free agents who are only good enough to sign minor league contracts, and drafting players or bringing up your cost-controlled players and making them play for the minimum.

That’s why this regression equation implicitly includes some level of service time.
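The arithmetic behind the .429 and 69 wins, as a quick Python check. The slope and intercept are from the best-fit line above; the 162-game season is the only other input.

```python
def expected_win_pct(payroll_index, slope=0.084, intercept=0.416):
    """Expected win% for a payroll index given as a fraction (1.0 = average)."""
    return payroll_index * slope + intercept

minimum = expected_win_pct(0.15)   # team spending close to the minimum
print(f"{minimum:.3f} win%, about {minimum * 162:.0f} wins")
print(f"{expected_win_pct(1.0):.3f} win% at a league-average payroll")
```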


#24    VZelepukin25      (see all posts) 2011/09/29 (Thu) @ 16:43

It seems that adding in the payrolls of the other teams within the division does nothing to improve the win% prediction.  I guess it shouldn’t have a huge impact given how many out-of-division games are played, and how opposing team payroll only explains 50% of opposing team win%, but I still found this a little surprising.


#25    Richard Bergstrom      (see all posts) 2011/09/29 (Thu) @ 16:55

#14/Devon: In 2002, you had the likes of Kevin Young and Jason Kendall cluttering up the Rays and Pirates rosters. 2002 is just a cutoff point based on Moneyball, and I realize that fewer seasons of data increases the factor of luck. That being said, how the Pirates front office has operated the last two to three years is vastly different than in 2002 and would appear more “Moneyball”-ish in a few years’ time.


#26          (see all posts) 2011/09/29 (Thu) @ 17:08

It would be interesting to see how teams rank in terms of WAR per *free agent* dollar spent.  That way, you could classify teams by “perhaps exploit inefficiencies in the free agent market,” vs. “perhaps exploit inefficiencies in drafting/getting young players”.


#27          (see all posts) 2011/09/29 (Thu) @ 17:40

@tango23&24 - cool thanks! so accounting for service time you think there’d be more recognizable clusters forming, ie moneyball young, moneyball old and non moneyballers? course teams changing strategies during the “moneyball era” would probably muddle those clusters.


#28          (see all posts) 2011/09/29 (Thu) @ 17:48

I’m actually surprised by where the Cardinals rank.

The A’s get A LOT of attention for their approach, as they should.

So do the Twins ... there’s the lowly Twins winning all those games without money, using their intelligence and organizational approach to succeed.

Then there’s my Cardinals ... one of the most successful franchises in MLB, and the NL team with the most WS titles.

We hardly ever hear about StL in regards to having a good front office (at sabermetric sites), and I began to wonder why? Is TLR that detestable (Yes, he is)?

The more I thought about it, I wondered if it’s because the team is so dominated by Pujols. That he’s so good that he either [1] overshadows the front office or [2] trumps their mistakes.

According to FG, AP5 has been worth 80 WAR over these 10 years and has earned around 110M. The Cards during that time have 900 Wins and have spent 900M.

So if you remove Pujols from the team, that’s 820 wins and 790M. Replace him with a league average 1B for 2 WAR per year @ 9M (4.5M per WAR), and the “final” numbers are 810 Wins for 880M.

That moves StL BELOW the line. Not sure if I took the correct approach or not, but if so this reveals what every Cardinal fan has known since AP5 came on the scene ... the difference between being an elite team and an average team (on the graph) has been ALBERT PUJOLS.


#29          (see all posts) 2011/09/29 (Thu) @ 17:51

Oops, that would be 830 Wins, not 810 ... putting the Cards basically “right on the line”. Still ...


#30    Richard Bergstrom      (see all posts) 2011/09/29 (Thu) @ 17:56

#29/CircleChange11 Doesn’t that just mean that Pujols is signed to a very team-friendly contract in terms of dollars/value?


#31    Xeifrank      (see all posts) 2011/09/29 (Thu) @ 18:24

Would you get the same results if you used wins over replacement and payroll over minimum?


#32    Tangotiger      (see all posts) 2011/09/29 (Thu) @ 18:35

I’d get exactly the same results.  All you have to do is change the numbers on the x and y axis.

The data points remain, the slope of the line remains.  Everything is the same.
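A quick way to convince yourself, sketched in Python. The three data points are ten-year totals quoted in this thread (Jays, Cardinals in round numbers, Yankees); the two baselines being subtracted are hypothetical values I picked just to shift the axes.

```python
from math import sqrt

def fit(xs, ys):
    """Return (slope, r) for a least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Ten-year totals quoted in this thread: Jays, Cardinals (rounded), Yankees.
payroll = [681.0, 900.0, 1875.0]   # millions of dollars
wins = [808.0, 900.0, 975.0]

# Hypothetical baselines, used only to shift the axes.
MIN_PAYROLL = 100.0        # assumed ten-year minimum payroll
REPLACEMENT_WINS = 480.0   # assumed ten-year replacement-level wins

raw = fit(payroll, wins)
shifted = fit([p - MIN_PAYROLL for p in payroll],
              [w - REPLACEMENT_WINS for w in wins])
print(f"raw:     slope {raw[0]:.4f}, r {raw[1]:.3f}")
print(f"shifted: slope {shifted[0]:.4f}, r {shifted[1]:.3f}")
```

Subtracting a constant from either axis moves every point by the same amount, so the deviations from the mean, and with them the slope and r, do not change.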


#33    Michael K      (see all posts) 2011/09/29 (Thu) @ 18:38

@10- The Wall St Journal recently claimed the correlation is just .14 for 2011 (as of Sep 16).  They spent a lot of words digging for meaningful explanations (e.g. revenue sharing) and for the most part ignoring the “one season is a small sample size” issue.

http://online.wsj.com/article/SB10001424052748703743504575493942146685242.html


#34    Michael K      (see all posts) 2011/09/29 (Thu) @ 18:43

Oops-- that should read “for 2010” (the article was recently linked to by someone arguing that the wins/payroll correlation was decreasing and therefore there must be more market inefficiencies)


#35          (see all posts) 2011/09/29 (Thu) @ 20:49

Another interpretation is that for any given amount of spending, there is a huge range of wins even over 10 seasons.

I don’t think correlation is a great tool here because the relationship is so clearly non-linear.


#37          (see all posts) 2011/09/29 (Thu) @ 22:26

Having a bit of trouble with #23.

An average team spends about $90M a year. At average free agent prices, that would get you about 19 wins above replacement before you had to spend the rest on guys at the league minimum salary.

Since replacement level is about 48 wins, I’m getting 67 wins for an average payroll and no surplus value (think the discount teams get on draft picks). But if average surplus value and no payroll gets you about 69 wins, then average surplus value and average payroll should give you about 88 wins a season, which can’t be right.

Anyone want to show me what I’m doing wrong here?


#38          (see all posts) 2011/09/30 (Fri) @ 01:06

Paul/37, the explanation for that is in #22.  A lot of the talent is not paid at anywhere near free-agent prices.


#39          (see all posts) 2011/09/30 (Fri) @ 15:50

Glad to see this

I did something similar looking at payroll and wins from 1986-2005

http://cybermetric.blogspot.com/2008/05/another-look-at-salaries-and-wins.html

My graph looks similar. Could that mean things have not really changed that much? I really don’t know


#40          (see all posts) 2011/09/30 (Fri) @ 15:50

I think my correlation was about the same as it is here


#41          (see all posts) 2011/09/30 (Fri) @ 16:23

#29/CircleChange11 Doesn’t that just mean that Pujols is signed to a very team-friendly contract in terms of dollars/value?

That goes without saying. The guy has produced something like almost $250M in surplus value over 10 years.

I was wondering/questioning/thinking if that was the reason why StL’s FO doesn’t get the credit it deserves, according to the data in this thread/graph. Without AP5, StL is an average organization/FO.


#42    Jeff Luhnow      (see all posts) 2011/09/30 (Fri) @ 16:26

"We hardly ever hear about StL in regards to having a good front office (at sabermetric sites), and I began to wonder why? Is TLR that detestable (Yes, he is)?”

!@#$%^&*(!!!


#43    Richard Bergstrom      (see all posts) 2011/10/01 (Sat) @ 04:07

#39/Cyril Morong If Moneyball-type theory has changed the game so that there are fewer inefficiencies, should the slope of the graph change, i.e., is there evidence of teams across the board being more efficient?


#44          (see all posts) 2011/10/01 (Sat) @ 11:20

Richard

Good question. I did my regression differently. Not sure how it matters

What I did was to run a regression with average wins per year as the dependent variable and the average salary (SAL, the % above or below the league average) as the independent variable.

Suppose a team was 10% above average in salary one year and 30% above average another year; they would get 20 (if it were just over two years).

Here is the regression equation

Wins = 0.157*SAL + 80.22

So I am not sure how the .157 compares to the .084 Tango gets here. He had winning pct as the dependent variable.

But I think things end up being similar (at least in the following example).

In Tango’s regression, if you spend 20% more than average, you get a 1.2. The pct is

pct = .084*1.2 + .416 = .517

In my case, if you were 20% above average, I used 20. The wins would be

wins = .157*20 + 80.22 = 83.36

That gives a pct of .515, just about what Tango gets.

In my model, an average salary team has a % of .495. Tango gets .500.

In my model, a team that spends twice the average gets a % of .592. Tango’s model gives .584.

In my model, a team that spends half the average has a % of .447. Tango’s model gives .458.

I can’t tell if this makes any sense or is useful. It looks like things have not changed that much in the Moneyball era. At higher salaries my equation gives slightly higher winning percentages. At lower salaries, Tango’s equation gives slightly higher winning percentages. I can’t tell if this helps us understand the Moneyball era


#45          (see all posts) 2011/10/01 (Sat) @ 11:34

In my model, the slope is slightly higher. A 10% increment in salary (like going from 20% more to 30% more) makes pct go up 0.00969.

In Tango’s model it is, of course, .0084.
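The two equations side by side, as a Python sketch for anyone who wants to try other payroll levels. The coefficients are the ones quoted in this thread; converting wins to a percentage by dividing by 162 games is the same step used in the examples above. The function names are mine.

```python
GAMES = 162

def tango_pct(pay_ratio):
    """pay_ratio: payroll relative to league average (1.2 = 20% above)."""
    return 0.084 * pay_ratio + 0.416

def morong_pct(pay_ratio):
    """Same input, run through the 1986-2005 wins equation."""
    sal = (pay_ratio - 1) * 100        # percent above/below league average
    wins = 0.157 * sal + 80.22
    return wins / GAMES

for ratio in (0.5, 1.0, 1.2, 2.0):
    print(f"{ratio:.1f}x average: 1986-2005 model {morong_pct(ratio):.3f}, "
          f"2002-2011 model {tango_pct(ratio):.3f}")

# Change in win% for a 10-point step in payroll (e.g. 20% above to 30% above).
step_morong = morong_pct(1.3) - morong_pct(1.2)
step_tango = tango_pct(1.3) - tango_pct(1.2)
print(f"per 10% of payroll: {step_morong:.5f} vs {step_tango:.4f}")
```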


#46    Richard Bergstrom      (see all posts) 2011/10/01 (Sat) @ 15:58

It’s probably a hard question to answer, in retrospect. More efficient teams would be reflected in the overall quality of the competition in the league itself making “wins” that more expensive salary-wise.


#47          (see all posts) 2011/10/01 (Sat) @ 17:49

My first thought is that if teams got smarter about spending money, then the slope would get steeper (a flat slope would mean spending does not matter or there is no relationship). But if all teams got better at the same time to the same extent, would that change the slope? I don’t think I know


#48    novaether      (see all posts) 2011/10/04 (Tue) @ 09:41

Well, this is just perfect:

http://www.collegehumor.com/embed/6620498/too-much-moneyball


#49    Tangotiger      (see all posts) 2011/10/04 (Tue) @ 10:01

blocked at the office… can you email me:
tom~tangotiger~net


#50    Richard Bergstrom      (see all posts) 2011/10/04 (Tue) @ 18:31

#48/novaether: That was hilarious. College Humor really puts out some great stuff.


#51    Kansas City KC      (see all posts) 2011/10/27 (Thu) @ 21:25

I just don’t understand it - do David Glass and co just “pocket” all of the Royals’ money?  I mean I know that they spent some money updating the stadium (basically turned it into a giant BAR), but where does our money go in Kansas City?

1985 Royals.....I was THERE.


#52    Tangotiger      (see all posts) 2011/11/08 (Tue) @ 11:26

Matt applied the same idea historically, and by GM:

http://www.hardballtimes.com/main/article/bang-for-their-buck/

However, he didn’t come up with a good enough conversion from Payroll Index to win%.  He did:

win% = PayIndex/4 + .250

So, if you have a 100% PayIndex, your win% is .500.  If you are at 200% PayIndex, your win% is at .750.  He then ends up with a fudge.

Indeed, his conclusions are going to be biased because the slope of his line is wrong.

If you look at my chart, you can infer the equation as:

win% = PayIndex/12 + .417

So, a 100% PayIndex means .500, and a 200% PayIndex means .583.
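To see how far apart the two conversions get, here is a short Python comparison. I am treating the .417 as a rounding of 5/12, so that a 100% PayIndex comes out to exactly .500; that reading is my assumption. The function names are made up for the example.

```python
def steep_conversion(pay_index):
    """win% = PayIndex/4 + .250 (the conversion being criticized above)."""
    return pay_index / 4 + 0.250

def chart_conversion(pay_index):
    """win% = PayIndex/12 + 5/12 (taking .417 as a rounding of 5/12)."""
    return pay_index / 12 + 5 / 12

for pay_index in (0.5, 1.0, 1.5, 2.0):
    steep = steep_conversion(pay_index)
    chart = chart_conversion(pay_index)
    print(f"{pay_index:.0%} PayIndex: {steep:.3f} vs {chart:.3f} "
          f"(gap {steep - chart:+.3f})")
```

Both lines agree at a 100% PayIndex and then fan out. The steeper line expects far more from big spenders and far less from small spenders, which is where the bias would come from.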


#53          (see all posts) 2011/11/10 (Thu) @ 16:02

I’m way late to this (work gets in the way of life sometimes).  Michael K (@33) notes that the WSJ apparently concluded that the correlation of payroll & wins in 2010 was 0.14 (as of September 16, 2010).  Using the data I have (and will share), I found a correlation of payroll & wins for 2010 of 0.48 (for 2011, it’s 0.41).

How the WSJ managed to be quite that wrong is beyond me.


#54    Kincaid      (see all posts) 2011/11/11 (Fri) @ 07:15

There are various estimates of team payroll, so I guess it’s possible the correlation using the WSJ’s data was .14.  I get a correlation of .37 for 2010 using the salary table from the Baseball-Databank (.34 with games through Sept. 15).  WSJ’s .14 does seem pretty low compared to Donald’s data and the Baseball-Databank data.

Regardless of whether it is the correct r for their data, it is kind of misleading to report the r for single years like that.  Since each season only has 30 data points, r will be highly variable.  The two most extreme correlations they found over the 16 years they looked at (.71 and .14) still have overlapping 95% confidence intervals (.47 to .85 and -.23 to .48).  With that much random volatility in the year-to-year correlation coefficient, there’s no reason to split up the years and pick out variations as meaningful.
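One common way to get intervals like these is the Fisher z-transformation, sketched below in Python. Using that method here is my assumption, since the comment does not say how the intervals were computed, though with 30 teams it gives numbers that match the ones quoted to two decimals.

```python
from math import atanh, sqrt, tanh

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r from n points.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1/sqrt(n - 3).
    """
    z = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)

for r in (0.71, 0.14):
    low, high = r_confidence_interval(r, n=30)
    print(f"r = {r:.2f}: {low:.2f} to {high:.2f}")
```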


#55    Tangotiger      (see all posts) 2011/11/11 (Fri) @ 13:01

r-squared of .14 is r of .37.  Is the difference as simple as that?


#56    Kincaid      (see all posts) 2011/11/11 (Fri) @ 13:24

It might be if they did that by mistake for 2010.  I don’t think the rest of the numbers they report make sense as r^2, though, and since they just call it correlation in the article, I assume they mean it to be r.

