THE BOOK--Playing The Percentages In Baseball

Monday, October 29, 2007

Zimbalist’s regression

By Tangotiger, 10:31 AM

I just finished writing a post on another thread that shows that you can get a regression to tell you anything you want.  This is how I ended it:


If GP approaches infinity, r approaches 1.  If GP approaches 0, r approaches 0.  You see, the correlation tells you NOTHING, absolutely NOTHING, unless you know the sample size.

Read the rest at the link.  So, here comes Andrew Zimbalist, whom I must say I highly admire, almost at the Bill James level, and he is quoted as saying:

“If you do a statistical analysis [of] the relationship between team payroll and team win percentage, you find that 75 to 80 percent of win percentage is determined by things other than payroll,” says Andrew Zimbalist, a noted sports economist and professor at Smith College in Northampton, Mass.

However, I said this a while ago:

Finally, here is the overall average, 1992-2005, of team wins and payroll index.  Correlation?  r = .70.  There is an extremely high correlation between wins and payroll.
...
So, the relationship between payroll and wins is a little tricky and it depends on exactly what the question is.  There’s no question that the driver to wins is the base talent level.  That talent level is not necessarily going to be paid at the proper levels year in and year out.

You can get a correlation to tell you r=.50 as Zimbalist is saying, or r=.70 as I’m saying, or anything at all that you want.  It’s a matter of the sample size. 
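A minimal Python sketch of the sample-size effect described above. It assumes a toy league (hypothetical, not Tango's data) in which each team's true win% is completely determined by a payroll z-score (true win% = .500 + .060 × z), and the only other ingredient is coin-flip luck from game to game. Under that "everything else is random" assumption, only the number of games per team changes between runs, yet the correlation climbs as the games pile up. The helpers `simulate_r` and `pearson_r` are illustrative names, not anything from the post.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_r(games, n_teams=30, seed=1):
    """Correlation between a payroll z-score and observed win% in a toy league.

    Hypothetical model: true win% = .500 + .060 * payroll z-score, and each
    game is an independent coin flip at that probability (pure binomial luck).
    """
    rng = random.Random(seed)
    payroll = [rng.gauss(0, 1) for _ in range(n_teams)]
    observed = []
    for z in payroll:
        p = min(max(0.5 + 0.06 * z, 0.01), 0.99)  # keep p a valid probability
        wins = sum(rng.random() < p for _ in range(games))
        observed.append(wins / games)
    return pearson_r(payroll, observed)

# Same league, same payrolls; only the games per team change.
for games in (32, 162, 1620, 16200):
    print(games, round(simulate_r(games), 3))
```

In this toy model payroll is the whole story by construction, so r heads toward 1; the later comments (#14, #30) discuss what changes when payroll is not the only non-random factor.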

#1    Pizza Cutter      2007/10/29 (Mon) @ 12:25

Could the differences be due to different methods for adjusting payroll?  Certainly, salaries have gone up in nominal dollars in the last 15 years, and possibly in real (inflation-adjusted) dollars.  Perhaps someone was using some sort of z-score method (seems the only fair way?)

That could mess with the distribution and people would get different correlation/regression coefficients.
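The z-score idea can be sketched like this: standardize payroll within each season, so that salary inflation from one year to the next doesn't masquerade as a payroll difference when seasons are pooled. The data layout (a dict of seasons mapping team to payroll), the function name, and the example figures are all hypothetical.

```python
from statistics import mean, pstdev

def payroll_z_scores(payrolls_by_season):
    """Standardize payrolls within each season.

    payrolls_by_season: {season: {team: payroll}}, in any currency unit.
    Returns {season: {team: z-score}}, so each season has mean 0 and SD 1.
    """
    result = {}
    for season, payrolls in payrolls_by_season.items():
        values = list(payrolls.values())
        mu, sd = mean(values), pstdev(values)
        result[season] = {team: (p - mu) / sd for team, p in payrolls.items()}
    return result

# Hypothetical: every salary doubles between seasons, but the z-scores don't move.
z = payroll_z_scores({
    1995: {"A": 30, "B": 50, "C": 70},
    2005: {"A": 60, "B": 100, "C": 140},
})
print(z[1995]["A"], z[2005]["A"])
```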


#2          2007/10/29 (Mon) @ 12:43

In that previous thread, MGL says:

“I have not been talking about the number of observations.  I have been talking about the sample of data within each observation (100 PA, 700 PA, whatever).”

...which sort of clears up the confusion I was having over this.  In this instance, the number of games in the season is the “sample”, while the number of seasons is the number of “observations”, right?

So when Zimbalist finds the correlation of r=.50, you’re saying he is not using a full season of games for the teams he is looking at?


#3    Tangotiger      2007/10/29 (Mon) @ 12:43

I used a total of 14 years of data.  THAT’s the reason I get the r = .70.

He may have used 1 year or 2 years or who knows how long. 

Like I said, I can get r = .9999 for just about anything, and you can get r = .0001 for the exact same thing.


#4          2007/10/29 (Mon) @ 12:55

In practical terms, are you saying that you could get an r=.9999 over a million seasons because when you examine the teams by payroll, every other aspect of them that influences winning will balance out and you’ll be left with just the impact of payroll?


#5    Tangotiger      2007/10/29 (Mon) @ 13:30

Yes, that’s right.  The example would be better as 1 million games over one season (since the 1947 Yanks have nothing to do with the 2007 Yanks).

If you only looked at the last 32 games of the season, what do payroll and wins have to do with anything?  But look at the last 320 games, and you get a different story.

BP, in BBTN, did a clutch hitting chapter claiming an r of .33 or something.  But that was entirely driven by the number of PA for each player (half a career’s worth).  It is not impressive at all to get an r of .33 if you need 3000 or whatever PA to get it.

I’ll challenge any economist or statistician to any claim of correlation, if they don’t tell us the number of trials for each sample.  And, almost all the time, these studies that use correlations make NO mention at all about the number of trials per n.  What they do mention is the n (number of players, of teams, of people, etc).  The n is the driver for the confidence interval range of the correlation, but the number of trials per n is the driver for the correlation itself!
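To illustrate the distinction in the last sentence, here is a small sketch using the standard Fisher z-transformation (my choice of method; the post doesn't name one). The number of teams or players, `n`, only controls how wide the confidence interval around r is; nothing in the calculation depends on how many games or PA stand behind each observation. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def r_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r computed from n
    observations, via the Fisher z-transformation."""
    z = math.atanh(r)                              # map r to a roughly normal scale
    se = 1 / math.sqrt(n - 3)                      # standard error depends only on n
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Same r, different n: the interval narrows, but r itself is untouched.
for n in (30, 300):
    low, high = r_confidence_interval(0.50, n)
    print(n, round(low, 2), round(high, 2))
```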


#6          2007/10/29 (Mon) @ 14:18

OK, one more thing.  So you’re saying that the “r” approaches 1.00 as the sample approaches infinity.

So when you look at Derek Lowe’s ERA over a million seasons, split up by what he had for breakfast, everything else evens out and you’re left with r=1.00.  He’s slightly better when he eats Wheaties, slightly worse when he eats Cheerios, and over a million games everything else evens out and you’re able to predict his ERA over a million games with perfect certainty based on what he had for breakfast.

But then, you decide to look at whether or not he went out boozing the night before (totally hypothetical here).  So over a million games, everything evens out and you’re left with Lowe being a little worse when he got drunk the night before a start, and a little better when he didn’t.  Over a million games, you can predict with perfect certainty r=1.00 how he will do over a million games given whether he was drinking the night before or not.

We know r^2 equals the portion of the output that can be explained by the input variable.  In this case, we have r=1.00 for both breakfast and boozing.  Wouldn’t that indicate that ERA is 100% influenced by breakfast, and 100% influenced by boozing?  And isn’t that impossible?  This is why I intuitively feel like the ceiling, or asymptote, of a correlation should be some number other than 1.00.


#7    Tangotiger      2007/10/29 (Mon) @ 14:48

I didn’t say r=1, but rather r approaches 1.

In any case, your number of observations = 1.  That’s not what you need.  You need a larger sample size, say 30 pitchers or 100 pitchers.  You need SOME variance in your population.  I suppose even 2 pitchers would do it.  For each pitcher, you’d want 1 million trials (PA).


#8    Voros McCracken      2007/10/29 (Mon) @ 20:19

When I used to do this for individual seasons, I found that whether I used Opening Day Payroll or the end of year payrolls made a substantial difference.

I believe Opening Day Payroll is the right one to use, because for causation, causes must precede their effects. A team’s payroll on September 15, 2007 definitionally has no effect on the result of a game on April 15, 2007.

There is a reverse causation problem where teams who win add payroll as the year goes on and teams who lose cut payroll. I remember one season going way back where the correlation coefficient was as low as 0.3.


#9    tangotiger      2007/10/29 (Mon) @ 20:39

I had this discussion with Voros on baseballboards.

Yes, payroll MUST precede the games played.  But it should really be the payroll for the game actually being played.  If payroll is being dumped, as with the ’98 Marlins, you need to know the payroll for each game being played.

If Moises Alou played 81 games, then you want 50% of his annual payroll, since that is exactly how much the Marlins paid for him.

In short, you want actual money disbursed for the year.

All numbers for illustration only.

Unless of course you are looking for cause/effect beyond this, and are truly interested in payroll as of Apr 1.
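The "money actually disbursed" figure is just salary prorated by time with the team. A hypothetical sketch using the games-based proration in the comment (81 of 162 games means 50% of salary); the tuple layout, the function name, and the $10 million salary are made up for illustration, like the numbers in the comment.

```python
def disbursed_payroll(roster, season_games=162):
    """Total salary a team actually paid over a season.

    roster: list of (player, annual_salary, games_with_team) tuples.
    Each salary is prorated by the share of the season's games the player
    spent with this team.
    """
    return sum(salary * games / season_games for _, salary, games in roster)

# Hypothetical: half a season of a $10 million player costs the team $5 million.
print(disbursed_payroll([("Moises Alou", 10_000_000, 81)]))
```

For the opening-day alternative, you would instead pass every April roster player with `games_with_team=162`.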


#10    Voros McCracken      2007/10/29 (Mon) @ 20:53

That’s fine, but you still have to deal with the third variable of wins -> payroll if you’re going to untangle the amount of causation that goes into payroll -> wins.

It seems to me easier to do this (using opening day payroll) than it would be going to retrosheet and coming up with a payroll figure for each team in each game played that season.


#11    MGL      2007/10/30 (Tue) @ 00:12

Depending upon what question you are trying to answer, I don’t like the whole idea of mixing FA, arb, and pre-arb players anyway in these analyses and regressions.  It just muddies the waters.

Mike, the “reason” the sample size of the underlying measurement (in this case, games played for each observation) is so important is that the correlation is sensitive to the random fluctuation associated with “wins.” We are really interested in a team’s true winning percentage regressed on payroll (actually, we might be interested in a one-year sample of that true WP, as represented by wins in one season). One-year (or multiple-year) wins are merely a sample of that true winning percentage.

The reason economists and social and psychological scientists tend to ignore this issue is that they are not used to dealing with variables in a regression that have an underlying sample size. They are used to dealing with regressions like “a person’s intelligence as measured by IQ versus their income,” etc.

But again, once you include the underlying sample size in your explanation of the results of the correlation, everything is O.K.  But you must include it!  You cannot say, as Zimbalist apparently did, “The correlation between winning % and payroll is X!” That makes no sense.  As I said in the other thread, it makes as much sense as saying that a “pitcher’s K rate is 120 K.”

You have to say, “The correlation between a team’s one year (162 game) win rate and payroll is x.” If you say, “The correlation between a team’s 5 year win rate and its average payroll over those 5 years is y,” y will necessarily be a different number and likely much larger.

A low correlation between one-year team win rates and payroll, or between one-year pitcher BABIP and another year’s BABIP, simply tells us that a one-year sample of that independent variable does not explain the dependent variable very well. It does not tell us that the independent variable itself, stated without the qualifier that it is a one-year sample (the independent variable IS a one-year sample of X, not just X), does not explain the dependent variable well.

It is a tough concept to get your arms around, because it seems like everything that is associated with the independent variable ultimately yields a correlation of 1.0, which seems counterintuitive.


#12          2007/10/30 (Tue) @ 00:13

In Mike’s example of Lowe boozing and eating breakfast I suspect that we’d find absolutely no link whatsoever so r would tend to 0 not 1.


#13    Voros McCracken      2007/10/30 (Tue) @ 00:28

Mickey (from now on you’re ‘Mickey’ and Tango is ‘Tommy’ :) ),

I agree 100% with everything you said, with one possible exception: one reason to stick to single-season win percentages is that they are functionally more important than five-year win percentages. Having the best five-year winning percentage in the league in and of itself gets you nothing; having the best single-season record gets you home field through the LCS.

So what I’m saying is that the extra randomness involved in single season results is an important part of the discussion. Teams have more trouble “buying” championships because of it (and because of the playoffs) and so it is at least fair to look at both sets of numbers.

Five-year running totals also exacerbate the reverse causation issues I mentioned above.


#14    MGL      2007/10/30 (Tue) @ 03:20

I was wrong (and so was Tango, aka “Tommy”) in claiming that these correlations will always approach 1.0 as the underlying samples for each observation approach infinity.

That is ONLY true if you are measuring exactly the same thing in the x and y variables (but they are different samples of course), such as in an ICC type of correlation or even a y-t-y (which is essentially an ICC), or some other “one time period to another time period” regression. 

If you are measuring two different things, like in this case, team wins and payroll, the correlation is NOT going to approach 1.0 if there is any correlation at all.  It will approach whatever the correlation is between true team winning percentage (of which one season or multiple seasons is but a sample, not to mention the fact that within a season, and certainly over multiple seasons, you are measuring different true wp’s) and payroll.  That could be low or it could be high.  It is probably something like .8 or .9, which is why you would get something like .5 for one season, as in the Zimbalist regression, and .7 for multiple (14?) seasons, as in Tango’s regression.
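This point can be written as the classic attenuation formula: if observed win% is true win% plus independent binomial luck, the observed correlation equals the true-win% correlation shrunk by the square root of the reliability, var(true) / (var(true) + var(luck)). A sketch under those assumptions; `r_true` and the .060 spread of true win% are placeholder inputs, not estimates from the thread, and treating 14 seasons as one long season ignores that true talent changes from year to year, as noted above.

```python
import math

def expected_observed_r(r_true, sd_true, games, p=0.5):
    """Expected correlation between payroll and *observed* win%.

    r_true:  assumed correlation between payroll and true win%.
    sd_true: assumed spread (SD) of true win% across teams.
    games:   games behind each observed win%; luck variance is p(1-p)/games.
    """
    var_true = sd_true ** 2
    var_luck = p * (1 - p) / games
    reliability = var_true / (var_true + var_luck)
    return r_true * math.sqrt(reliability)

# As games grow, the observed r rises toward r_true, not toward 1.
for games in (162, 14 * 162, 10**9):
    print(games, round(expected_observed_r(0.8, 0.060, games), 3))
```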


#15    tangotiger      2007/10/30 (Tue) @ 07:48

It would approach 1 if you had 1 billion games in a season, and there was logic in payroll.

The reason it doesn’t is that the 1947 Yanks have nothing to do with the 2007 Yanks, and the Yanks selectively sample from free agents, while other teams embrace arb players.

Otherwise, you would get r approaching 1.


#16    Tangotiger      2007/10/30 (Tue) @ 10:36

I posted this on Voros’ blog, and will repeat here:

It really doesn’t matter how much you spend in dollars.  It’s what you are buying.  For example, if an Accord will give me $20,000 in value, it doesn’t matter how much I spend for it (15K, 25K, 40K).  I’m getting a 20K asset value.

Similarly, a marginal win is worth around 2.5MM each.  If the Angels or Cubs are going to spend 5MM for an ARod win, they aren’t “buying” more.  They are spending more.  They are still buying his 7 wins above replacement.

That’s what you want to figure out: how many wins are you buying.

And this is what studes does with his “Net Win Shares value”, as he figures out how much each team spends for each class of player.
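The distinction between what a team spends and what it buys, as arithmetic. The $2.5MM-per-win price and ARod's 7 wins come from the comment; the function and its return layout are hypothetical, and this is not studes' method.

```python
def wins_bought(wins_above_replacement, dollars_spent, price_per_win=2.5e6):
    """Split a contract into what was bought (wins) and what was overspent.

    Returns (wins, market value of those wins, amount paid above market).
    """
    market_value = wins_above_replacement * price_per_win
    return wins_above_replacement, market_value, dollars_spent - market_value

# Paying 5MM per win for 7 wins: still 7 wins, with 17.5MM spent above market.
print(wins_bought(7, 7 * 5e6))
```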


#17    MGL      2007/10/30 (Tue) @ 11:24

#15, I don’t think that is true, but I have to think about it some more.  I think that the “r” would approach some number, but not necessarily 1.0 (could be .01, could be 1.0).  As I said, it will approach the correlation between “true” wp and payroll.


#18    Guy      2007/10/30 (Tue) @ 12:07

I think I agree with MGL here.  Even if you let every 2007 team play an infinite number of games, you wouldn’t see an r of 1.0.  Cleveland really would be better than a number of higher payroll teams, the Dodgers would underperform, etc.  The r would reflect the true underlying correlation, which wouldn’t be 1.


#19    Tangotiger      2007/10/30 (Tue) @ 12:12

Like I said, you have the selective sampling issue of free agents.  Yanks have chosen to throw away $100 million, simply because they went to the free agent trough.

If this was a Fantasy Draft, and the Yanks had $200 million, and everyone else had 50MM to 120MM, and all players were free agents, and players came with a price tag, like a car, then r would approach 1.

The only reason it wouldn’t in MLB is because the Yanks pay 100K for a Lexus, when they can buy the same Lexus for 50K.


#20    Guy      2007/10/30 (Tue) @ 12:20

So, you’re saying the r would be 1.0, as long as we stipulate that it really is a perfect underlying correlation?  :>)

Anyway, I think the distinction MGL makes in #14 is correct.


#21    Guy      2007/10/30 (Tue) @ 12:37

I think Voros makes an important point in #13:  focusing on seasons is logical (as long as clearly reported), since that’s how championships are awarded.  And it’s fair, to a point, to include the impact of luck if you’re talking about whether spending money ensures success in a sport.  But there are two important caveats:

1) what we mainly care about in talking about payroll and wins is whether lower payroll teams have any chance to reach the postseason.  Looking at the r doesn’t shed a lot of light on that. Yes, some true 75-win teams will win 81 games while others will win 68—and that gives you a “low” r even if spending money does improve teams.  But to reach the playoffs usually requires a true talent level (let’s say 85-86 wins) that many low-payroll teams will rarely or never reach.  So when we look at making the postseason, the disparity is much larger than the r would lead one to believe. 

We had a good discussion of this here:  http://www.insidethebook.com/ee/index.php/site/comments/wins_and_payroll/.  I posted this data on payroll and postseason appearances there:
Looking at 1992-2005, and including hypothetical wildcards for 1992-1993 (I only gave 1 WC slot to the 12-team 1992 NL), I get the following results.  For each payroll quintile, this is the number of championships a team won per decade:
Top quintile: 4.9
2nd quintile: 3.0
3rd quintile: 2.4
4th quintile: 1.8
5th quintile: 1.2
So a top-quintile team can expect to reach the postseason 4x as often as a bottom-quintile team.  (Or, fans of rich teams have to endure one non-playoff year between championships, on average, while poor teams have to endure 7.)

2) Sports economists often take the analysis a step further, suggesting that the low r for payroll-wins at the season level indicates that teams don’t know how to spend money well, can’t evaluate talent, can’t predict performance, etc.  To address THOSE questions, you need to first extract luck.  All management can possibly do is assemble a strong team; it makes no sense to credit/blame them for luck. So if we know that something like 30% of the variance in team win% today results solely from luck, then what we care about is how much of the remaining 70% can be explained by payroll.  Plus, we need to adjust for the fact that a majority of the “production” in baseball comes from non-FAs.  Without taking these factors into account, using the overall r to evaluate team decision-making tells us more about the competence of the economist than of GMs.
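The arithmetic behind the "4x" and "one non-playoff year versus 7" statements in point 1, using the per-decade figures from that point; the helper name is hypothetical.

```python
# Postseason appearances per decade by payroll quintile, 1992-2005 (figures from point 1).
per_decade = {"Top": 4.9, "2nd": 3.0, "3rd": 2.4, "4th": 1.8, "5th": 1.2}

def seasons_between_appearances(appearances_per_decade):
    """Average number of non-playoff seasons per postseason appearance."""
    return (10 - appearances_per_decade) / appearances_per_decade

ratio = per_decade["Top"] / per_decade["5th"]
print(f"top vs. bottom quintile: {ratio:.1f}x")
for quintile, rate in per_decade.items():
    print(quintile, round(seasons_between_appearances(rate), 1))
```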


#22          2007/10/30 (Tue) @ 19:04

Interesting thread.

I’ve posted a long-winded agreement with Tango on the sample size issue here.

More on the other stuff later ...


#23    MGL      2007/10/30 (Tue) @ 23:14

Phil, there is no doubt about the sample size issue and the “r”. 

If you are measuring the same thing with both the x and y variables, the r always approaches 1.0 (unless the underlying correlation is zero).

If the two things are not the same, then as the sample size gets larger, the “r” approaches whatever the true correlation is between the thing being sampled and the thing being observed.  That number could be anything.

In the case of salary and wp, it is most certainly NOT 1.0, or even close, unless, as Tango said, and Guy jokes (but he is correct), you define the “r” (true wp versus payroll) as being 1.0 in the first place.

There is no controversy here.  I misled people in my prior comments (and Tango did too, although I can’t really speak for him), because, as I said, the issue is a little tricky, confusing, and hard to wrap one’s arms around (not necessarily completely intuitive).


#24          2007/10/30 (Tue) @ 23:19

Sure, if you’re measuring the same thing, the r definitely approaches 1.  I don’t think I disagreed with that anywhere ... I dealt only with payroll vs. wins, where I agree again that r won’t approach 1.

Did it sound like I agreed that r approaches 1?  Oops, maybe I kind of did, where I said that IF it’s only luck and payroll, r will approach 1.  Of course, it’s not just luck and payroll—it’s luck, payroll, drafting, and talent appraisal, to name just four.


#25          2007/10/30 (Tue) @ 23:23

MGL, thanks ... I added a clarification.


#26          2007/10/31 (Wed) @ 00:49

If you were able to decompose the variance of single-season wins into various factors, it might be something like this (all numbers made up except the first one):

25% payroll
10% ability to spend wisely on free agents
20% number of slaves in lineup
5% number of arbs in lineup
30% luck (exceeding talent, binomially)
10% other
-----------
100% total

Now, if you use multiple seasons instead of a single season, everything stays the same, except luck.  Luck drops.  If luck drops in half (which it would over two seasons, since we’re talking variances and not SDs), you now have

29% payroll
12% ability to spend wisely on free agents
23% number of slaves in lineup
6% number of arbs in lineup
18% luck (exceeding talent, binomially)
12% other

[Math: if you think of these as “points,” luck drops from 30 points to 15.  That makes everything add up to 85, not 100, so I multiplied all numbers by 100/85 to get it back to 100%.]

Everything goes up except luck.  The r-squared for payroll goes from .25 to .29. 

Eventually, luck goes to zero (over an infinity of seasons), and you get

36% payroll
14% ability to spend wisely on free agents
29% number of slaves in lineup
7% number of arbs in lineup
0% luck (exceeding talent, binomially)
14% other

As mgl says, the r-squared for payroll increases, but not to 1.00, since there are many other factors operating.

Right?
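The rescaling above can be reproduced in a few lines, assuming (as the comment does) that only the luck component shrinks, with its variance falling as 1/seasons, while every other component stays fixed before renormalizing to 100%. The category labels follow the comment; the function name is hypothetical, and the printed values are unrounded to one decimal rather than to whole percentages.

```python
def shares_over_seasons(shares, seasons, luck_key="luck"):
    """Rescale a single-season variance decomposition (in percentage points)
    to a multi-season one: divide luck by the number of seasons, keep the
    other components, and renormalize so the total is 100."""
    points = dict(shares)
    points[luck_key] = shares[luck_key] / seasons
    total = sum(points.values())
    return {name: 100 * value / total for name, value in points.items()}

single_season = {
    "payroll": 25, "FA spending skill": 10, "slaves": 20,
    "arbs": 5, "luck": 30, "other": 10,
}
for seasons in (1, 2, float("inf")):
    shares = shares_over_seasons(single_season, seasons)
    print(seasons, {k: round(v, 1) for k, v in shares.items()})
```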


#27    Tangotiger      2007/10/31 (Wed) @ 09:29

The claim that r approaches 1 is predicated on everything else being random (all other things equal).  For example, we expect a pitcher to pitch in random parks with random fielders over the course of his career, and, given enough IP (like a million), his hits per ball in play will represent his true talent level.

Phil in 26 is right.

You can think of other examples, like Tim Raines’s ability to score a run.  If he’s always paired with Frank Thomas, and other runners are always paired with someone else, then of course the r can’t approach 1 if you look at only one parameter.  What you are in fact doing is looking at TWO parameters: the runner’s ability to score, and the batter’s ability to drive him in.  If you were to look at the Raines+Thomas as a single unit (which it is in this illustration), then r=1.

The basic estimate for correlation is:
r = 1 - variance(luck)/variance(observed)

As the number of trials approaches infinity, then variance(luck) will approach zero. 

The equation for variance(observed) is:
variance(observed) = var(true1) + var(true2) + var(true3) + ... + var(luck)

Since variance(luck) will approach zero, the variance of the observed is simply the sum of all the variance of the true rates.  If things are not random, like Raines and Thomas always paired together, or Mariano Rivera and Derek Jeter always paired together, then you will have more than one parameter.

So, r approaches 1 as the number of trials approaches infinity, with respect to anything that is not random, be it one parameter, or a set of parameters.
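The estimate above can be written as a small function, assuming binomial luck, var(luck) = p(1-p)/trials with p = .5. The .060 SD of true team win% (variance .0036) in the example is an assumed figure for illustration only, and the function name is hypothetical.

```python
def expected_r(true_variances, trials, p=0.5):
    """r = 1 - var(luck) / var(observed), with
    var(observed) = sum(true_variances) + var(luck) and
    var(luck) = p * (1 - p) / trials (binomial luck)."""
    var_luck = p * (1 - p) / trials
    var_observed = sum(true_variances) + var_luck
    return 1 - var_luck / var_observed

# With one source of true variance, r climbs toward 1 as trials grow.
for trials in (162, 1620, 10**6):
    print(trials, round(expected_r([0.0036], trials), 4))
```

Passing several true variances (for a paired unit like Raines + Thomas) still sends r toward 1 for the combined unit, since only the luck term shrinks with trials.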


#28    Tangotiger      2007/10/31 (Wed) @ 09:55

Also remember what I said in the other thread: teams flush away nearly $1 BILLION to free agents every year (overpaying for the nice finish).  If that $1 billion was randomly distributed to all players, that’d be one thing.  But, it’s distributed almost exclusively to free agents.  And free agents are not randomly distributed, as hard as the Royals and Gil Meche tried.


#29    Guy      2007/10/31 (Wed) @ 10:19

That’s all fine, Tango, but you do agree that you were wrong when you said at the outset that “You can get a correlation to tell you r=.50 as Zimbalist is saying, or r=.70 as I’m saying, or anything at all that you want” (and repeated that claim in posts 3 and 5), right?  Even with seasons of infinite length, the win-payroll correlation probably maxes out at the .7 you got, or a little higher.


#30    Tangotiger      2007/10/31 (Wed) @ 10:45

I was wrong in not adding the provision: “, as long as all parameters are independent”.  That provision, however, is pretty much accepted, since the regression equation is an equation of a set of independent variables (x1, x2, x3) to the dependent variable (y).

Technically speaking, if your independent variables are not really independent, then you have a covariance term.  And that covariance term will stop you from having an r=1, if you look at each variable independently.

So, to answer your question: yes, I’m wrong in making my statement by not stating my assumption.  However, the assumption that I made was a commonly accepted one, which is why I didn’t even think of stating it.
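The covariance term mentioned here is the identity var(a + b) = var(a) + var(b) + 2·cov(a, b). A quick numerical check with made-up, paired "runner" and "batter" contributions, in the spirit of the Raines/Thomas example; the numbers and the `pcov` helper are hypothetical.

```python
from statistics import mean, pvariance

def pcov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical paired contributions: good runners always bat ahead of good hitters.
runner = [1, 2, 3, 4]
batter = [2, 3, 5, 6]
combined = [r + b for r, b in zip(runner, batter)]

lhs = pvariance(combined)
rhs = pvariance(runner) + pvariance(batter) + 2 * pcov(runner, batter)
print(lhs, rhs)  # equal; the 2*cov term is what independence would zero out
```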

