THE BOOK--Playing The Percentages In Baseball


Wednesday, August 30, 2006

Forecasters: How Accurate Can They Possibly Be?

By Tangotiger, 06:24 AM

0.73

Here’s how we can tell:


If you have thousands of samples, say students, and they each take several hundred tests in one session, say 550, and then have those same students take that exact same number of tests, but new, we can determine the correlation coefficient (r) in two ways.

1 - A sample-to-sample regression.
2 - Using only the results from the first session

The second one is based on taking the variance (true) of the students in question, and dividing by the variance (observed actual tests).  That’s your “r”.

The problem of course is that we don’t know the variance(true).  We could estimate it if we can plug it into the equation:
var(obs) = var(true) + var(random)

However, we don’t know var(random) either.

Enter the binomial.  Let’s forget about students, and look at baseball players, and their OBP.  Fortunately, an OBP is simply the safe plays divided by the safe plus out plays.  So, we can determine the random standard deviation using the binomial.

sqrt(OBP*(1-OBP)/PA)

Remember also that SD^2 = variance

If you select a few hundred ballplayers every year with 550 PA, we can figure that the var(random) = .020 ^ 2.  It’s also easy enough to observe their OBP and get the var(obs) as around .039 ^ 2, depending what years you select.  var(true) is then derived from these two numbers as .033^2.

Our “r” is .033^2 / .039^2 = .72
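The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the league-average OBP of .340 is an assumption (the post does not state which mean was used to get .020). Because the post rounds its inputs, the unrounded result can land anywhere in the low .70s.

```python
import math

def binomial_sd(p, pa):
    """Random (binomial) standard deviation of a rate p observed over pa trials."""
    return math.sqrt(p * (1 - p) / pa)

def max_r(sd_obs, sd_random):
    """var(true) / var(obs), using var(obs) = var(true) + var(random)."""
    var_true = sd_obs ** 2 - sd_random ** 2
    return var_true / sd_obs ** 2

# Assumed inputs: league OBP of .340 (an assumption), 550 PA, observed SD of .039.
sd_random = binomial_sd(0.340, 550)
sd_obs = 0.039
sd_true = math.sqrt(sd_obs ** 2 - sd_random ** 2)
r = max_r(sd_obs, sd_random)

print(f"SD(random) = {sd_random:.4f}")
print(f"SD(true)   = {sd_true:.4f}")
print(f"max r      = {r:.3f}")
```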

What does this mean?  You can take several hundred ballplayers, give them 550 PA one year, give them 550 PA another year, make sure that these guys’ true talent level in OBP does not change, make sure they play in the same parks, make sure they face the same quality of pitchers, and their year-to-year correlation will be 0.72.  That is, the absolute maximum year-to-year r you can hope for, given a large number of ballplayers is .72.

How about instead of OBP, we look at wOBA (which is analogous to OPS)?  Here, our var(true) is .036^2, var(random) is .022^2, and var(observed) is .042^2.  Our r is .73.

So, when looking at forecasters, and you look at their correlation coefficient of their forecast to the actual results, anything close to .73 means that they did as good a job as possible.  (They could actually go over that level, since the number of players in their sample is still small enough that the level of uncertainty of that r will be a bit high.  But, given thousands of players over several years, that uncertainty level will drop quite a bit.)

The other key question is: how does Marcel The Monkey do?  A few years ago, when I ran it, I think the r was .65.  I’ll have to redo that to see what it actually is after several years of results.  But, that’s what everyone is fighting for, to get from the .65 level to the impossible .73 level.

And remember, I used 550 PA for each player.  Drop that down, and the maximum r will drop down as well.
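To see how quickly the ceiling falls with fewer plate appearances, here is a rough sketch that holds the spread of true talent fixed and varies only PA. The .033 true SD comes from the OBP example above; the .340 mean OBP and the function name are assumptions for illustration.

```python
def max_r_for_pa(pa, sd_true=0.033, mean_obp=0.340):
    """Ceiling on year-to-year r when every player gets `pa` plate appearances."""
    var_true = sd_true ** 2
    var_random = mean_obp * (1 - mean_obp) / pa  # binomial variance of the rate
    return var_true / (var_true + var_random)

pa_levels = (100, 250, 400, 550, 700)
ceilings = [max_r_for_pa(pa) for pa in pa_levels]
for pa, ceiling in zip(pa_levels, ceilings):
    print(f"{pa:4d} PA -> max r = {ceiling:.2f}")
```

The only thing that changes from row to row is the binomial noise term, which shrinks in proportion to 1/PA.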

#1          (see all posts) 2006/08/30 (Wed) @ 13:16

Hi Tango,

If I could paraphrase, you’re essentially arguing that 1) if we model the population of hitters as having two sources of variance--skill and error; and 2) we assume that the error variance can be approximated as the variance of n Bernoulli trials with p = OBP and n = PA; and 3) we assume that no system can reliably predict the error variance; then we can empirically calculate the portion of the variance the best predictive system could achieve.

This seems reasonable, but what happens if we don’t fully accept #2?  How would it change the situation if the batter didn’t have a static OBP, but rather had a varying “true” OBP that depended on the circumstances of each PA?

Excuse me while I think out loud here… Let’s take a ridiculously extreme case, such that a player who appears to have a .335 OBP for 550 PA has a “true” rate of .100 for 225 PA, and .570 for 225 PA. The variance (np(1-p)) for a constant .335 OBP guy would be 122 on-bases per 550 PAs, whereas for a half & half .100/.570 guy, it would be 75.  The standard deviation for the .335 guys would be 11.0 on-bases per 550, whereas the .100/.570 guys would have a SD of 8.6 on-bases.

Do you think moving beyond a bernoulli model of hit probability could allow a predictive system to theoretically break the .73 barrier?


#2    dq      (see all posts) 2006/08/31 (Thu) @ 05:21

obviously a batter doesn’t have a static OBP - it changes in a given situation - the best example is quality of pitching - he might have an OBP of .2 versus the best pitchers and .5 versus the worst.
There are obviously other factors that affect performance.

The problems would be to (1) identify the situations that have different true variances and (2) be able to predict/project how many of each situation will occur in the year to be projected.


#3          (see all posts) 2006/08/31 (Thu) @ 05:40

I should clarify: I’m not trying to argue that creating a model of hitting performance that takes into account the necessary variables is practical (although, we could probably make a reasonable attempt). I’m just asking whether the theoretical limit of r = .72 depends on an assumption that we don’t necessarily want to commit to.

I’m just curious whether Tango’s theoretical limit is a bit too conservative, independent of the issue of how likely we are to reach it.


#4    John Beamer      (see all posts) 2006/08/31 (Thu) @ 08:05

Well, even if you know the specific subsets of OBP it doesn’t make a difference in reality. In order to calculate the random OBP you use the sqrt(OBP*(1-OBP)/PA) formula. Even if you know that for 100 pitchers the true OBP is around .100 and for another 100 pitchers it is .500. On average (ie together) the random variation would be the same.

The .500 OBP has greater random variation than the .100 OBP—plug it into the formula and see. But on average there won’t be much difference.

For example, take 550 PA

At .300 OBP st dev = .019
At .500 OBP st dev = .021
At .100 OBP st dev = .012

So at the extreme if you know if a hitter hits 0 against one set of pitchers and 1 against another, and as long as you can identify these pitchers you can get an r of 1.

However, in reality Tango’s method is spot on


#5    John Beamer      (see all posts) 2006/08/31 (Thu) @ 08:41

Further to my earlier post the st dev of the variation would actually be bigger as you’d have fewer PA for the example above at 275 PA

OBP of .100 = .019
OBP of .500 = .03
OBP of .300 = .027

Weighted average of 550 PA at .100 and .500 OBP = .024, which is worse than our original estimate with 550 PA


#6          (see all posts) 2006/08/31 (Thu) @ 09:55

J. Beamer: “Even if you know that for 100 pitchers the true OBP is around .100 and for another 100 pitchers it is .500. On average (ie together) the random variation would be the same”

Hi John,
I don’t think this is correct.  If we are assuming that a PA is a Bernoulli event, with a prob P of success, repeated N times, the variance is np(1-p).  So for a batter with a static P = .335, and an N = 550, his variance will be 122.5 (std = 11.1).

That is, with 550 PAs, the .335 OBP guy would be expected to get 184 on-bases (n * p), with a standard deviation of +- 11.1. 

If we split the batters season in half and say he has 225 PA at .100 OBP, and 225 at .570 OBP, we can take the sum of the variance:
225 * .1 * .9 + 225 * .57 * .43 = 75.4 (std = 8.6).

Thus, a guy with 225 PAs at .100 and 225 PAs at .570 would be expected to get 184 on-bases (n1 * p1 + n2 * p2), but with a standard deviation of +- 8.6.

Mathematically, this demonstrates that it is not the case that a player that has 225 PA at OBP .100 and 225 PA at OBP .500 will have the same variance in performance as a player with 550 PA at OBP .300.
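The comparison is easy to reproduce. The sketch below sums n*p*(1-p) over each block of plate appearances; the helper name is hypothetical. Note that two blocks of 225 PA only add up to 450 PA, so the 275/275 split (which does total 550) is shown alongside the figures used in the comment.

```python
import math

def onbase_variance(segments):
    """Variance of total times on base for independent (pa, true_obp) segments."""
    return sum(n * p * (1 - p) for n, p in segments)

static_var = onbase_variance([(550, 0.335)])
split_225 = onbase_variance([(225, 0.100), (225, 0.570)])  # figures as quoted
split_275 = onbase_variance([(275, 0.100), (275, 0.570)])  # halves of 550 PA

for label, var in [("static .335, 550 PA", static_var),
                   ("split, 225 + 225 PA", split_225),
                   ("split, 275 + 275 PA", split_275)]:
    print(f"{label}: variance = {var:.1f}, SD = {math.sqrt(var):.1f}")
```

Either way, the split hitter has less binomial variance than the static one, which is the point being made.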


#7    John Beamer      (see all posts) 2006/08/31 (Thu) @ 10:44

CDM,

You are right, it is slightly better; I was being a bit glib with my first statement. My point, poorly explained, is that there is very little difference, unless OBP against a certain (large) set of pitchers is very low or very high.

Therefore for all intents and purposes Tango is more or less correct. The difference in OBP for hitters against different pitchers is probably a range of .100 centred around .350ish.

In the great scheme of things it will make almost no difference.

One more point. Your calculation underestimates variance. N = 275 (550/2) not 225—but your conclusion still stands.


#8          (see all posts) 2006/08/31 (Thu) @ 11:26

Fair enough.  You’re probably right.  But let’s go through the motions. For the extreme example I came up with above, the r = .84, but let’s consider a more realistic case.

Let’s say we break down performance and probability distribution as such, so that for n PAs, the batter has a “true” OBP of p:

p = [.08 .13 .18 .23 .28 .33 .38 .43 .48 .53 .58];
n = [ 5 15 45 75 85 100 85 75 45 15 5];

then sample variance = sum(np(1-p)) = 115.83

The variance of the sample proportion would be 115.83 / n^2 = 0.0196^2

plugging this into tangos equation:
(var(obs) - var(rand)) / var(obs)
(.039^2 - 0.0196^2) / .039^2 = .74

Sure enough. The improvement is marginal.
Even if I multiply p above by 0.8 (to make a really bad OBP guy), the r doesn’t exceed 0.78.
So the limit seems to even apply to the Neifi Perez’s of the world.
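For anyone who wants to check the numbers, here is a small sketch of the same calculation using the p and n vectors listed above. The function name and the `scale` argument are made up for illustration, and the observed SD of .039 is held fixed in both cases, as in the comment.

```python
import math

p = [.08, .13, .18, .23, .28, .33, .38, .43, .48, .53, .58]
n = [5, 15, 45, 75, 85, 100, 85, 75, 45, 15, 5]

def ceiling_r(p, n, sd_obs=0.039, scale=1.0):
    """Return (variance of on-base count, SD of the rate, implied max r)."""
    total_pa = sum(n)
    var_count = sum(ni * (pi * scale) * (1 - pi * scale) for pi, ni in zip(p, n))
    sd_rate = math.sqrt(var_count) / total_pa  # SD of the OBP, not of the count
    r = (sd_obs ** 2 - sd_rate ** 2) / sd_obs ** 2
    return var_count, sd_rate, r

var_count, sd_rate, r = ceiling_r(p, n)
_, _, r_low_obp = ceiling_r(p, n, scale=0.8)  # every p multiplied by 0.8

print(f"variance of on-base count = {var_count:.2f}")
print(f"SD of the rate            = {sd_rate:.4f}")
print(f"implied max r             = {r:.3f}")
print(f"implied max r, p * 0.8    = {r_low_obp:.3f}")
```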


#9    John Beamer      (see all posts) 2006/08/31 (Thu) @ 12:01

CDM—good stuff. That’s what I should have done first off!!


#10    dq      (see all posts) 2006/08/31 (Thu) @ 18:32

I have one other question about the correlation here - it’s not whether you have different correlation for different at bats, but whether the correlation is affected because it is made up of different skills.. - walk %, hr %, contact % (non k), and ability to get a hit on a batted ball. On-base % is not one skill, but a combination of several.

I’m pretty sure that the 1st 3 skills have higher possible correlation than .72 - if you can devise better measures of those 3 skills, plus factor in the 4th, can you get a better potential correlation?  My advanced math days are long gone, so I’m hoping one of you guys can do the heavy lifting.


#11          (see all posts) 2006/09/07 (Thu) @ 20:34

Tango,

You write that the year-to-year correlation r is given by

r = var(real)/var(observed)

I am fascinated by this equation ... is there an easy derivation?  Is it a well-known result?

Also, you say that formula is for regressing actual against actual, where actuals are independent but based on the same “real” (such as two consecutive seasons).  Is there a similar formula for calculating r for a regression on actual versus real?


#12          (see all posts) 2006/09/08 (Fri) @ 05:52

Tango,

Are you sure your above result is for the actual/actual regression, and not the actual/real regression?  I ran a little numerical experiment on an actual/real and the r matched your formula above.

If that’s true, I’m guessing you’d have to divide by sqrt(2) to get the actual/actual case?


#13          (see all posts) 2006/09/08 (Fri) @ 05:56

OK, your first example shows actual/real.  But your second example shows actual/actual (year-to-year correlation).  I assume the first is correct.


#14    John Beamer      (see all posts) 2006/09/08 (Fri) @ 22:00

Phil,

Why do you divide by sqrt(2) to get the actual/actual correlation? Is it because you need to account for the random variation in the 2nd actual variable and the “r” essentially standardizes the distribution (ie, makes it a Z-score in effect)?

Thanks
John


#15          (see all posts) 2006/09/08 (Fri) @ 22:36

Hi, John,

Actually, I don’t know if you do that, I was asking.  But now I think that’s not correct. 

The reason I thought of root 2 is that if X and Y are independent, var(x+y) = var(x)+var(y).  So var(2x) = var(x) + var(x) = 2 * var(x).  And so SD(2x) = sqrt(2) * SD(x).  What I thought was that if you go from actual/real to actual/actual, maybe you’re doubling the variance.  But that’s not right, I don’t think.

Actually, I ran a simulation and found that the real/observed correlation worked out numerically very close to SD(real)/SD(observed). 

Tango says the observed/observed correlation is the square of that, or var(real)/var(observed).  That makes more sense than my sqrt(2) thing.


#16    tangotiger      (see all posts) 2006/09/09 (Sat) @ 07:22

Phil, can’t talk now, but check out the three links in the first comment here

I hope I’m calculating r, not r-squared, or vice-versa, or whatnot.  Anyway, will check into it next week.


#17          (see all posts) 2006/09/09 (Sat) @ 11:18

Hi, Tango,

It does sound like you’re doing it right.  I think I misread your post originally.

My next question, though, is this: in my numerical example, I found that the correlation between “real” and “observed” was roughly

SD(real)/SD(observed)

Is this generally true, or just a coincidence for my simulation?


#18    John Beamer      (see all posts) 2006/09/09 (Sat) @ 11:59

Phil

One of the links that Tango posted was this one: http://www.socialresearchmethods.net/kb/reliablt.htm

It confirms your result above.

John


#19    John Beamer      (see all posts) 2006/09/09 (Sat) @ 12:03

Hmm - you can’t edit posts! The example linked to is var(real)/var(observed), which is what Tango said originally. Apologies—no help at all whatsoever.


#20    John Beamer      (see all posts) 2006/09/09 (Sat) @ 22:10

Phil,

Is your equation SD(real)/SD(observed) what you are referring to when you say things like: A correlation of r means that 1 SD move in Y corresponds to a r SD move in X?

If this is so how does it gel with the var(real)/var(observed) = r formula

John


#21          (see all posts) 2006/09/10 (Sun) @ 05:33

Hi, John,

Actually, that’s true of any regression.  A 1 SD change in the independent variable leads to an r SD change in the dependent variable by definition.


#22    John Beamer      (see all posts) 2006/09/10 (Sun) @ 07:02

Phil

That is what I thought, and you did have a great explanation on your blog but .... If that is the case then I’d expect the equation r = SD(x)/SD(y) to be true by definition. But it differs from the r = var(x)/var(y) above, or even the true definition of r = cov(x,y)/(SD(x)*SD(y)) .... I guess I am missing something!


#23          (see all posts) 2006/09/10 (Sun) @ 09:23

Hi, John,

What I’m saying is that if you normalize X and Y to have mean 0 and SD 1, then

E(X | Y=c) / c = r

SD(X) and SD(Y) can be anything at all, obviously, whether there’s a correlation or not.  So SD(X)/SD(Y) alone doesn’t represent anything significant.

If that makes sense.


#24    tangotiger      (see all posts) 2006/09/13 (Wed) @ 10:44

Phil,

I don’t know why you got the results that you did.  I ran a test, which I will detail here.

Create a population distribution with a mean of .340, and a standard deviation of .030.  The population size is 250 players.

For each player, give him 250 PA for season 1 and season 2.

What should we expect?  Our random standard deviation is: sqrt(.34*.66/250 PA)= .030

Our expected observed standard deviation is: sqrt(.030^2 + .030^2) = .042.

Our expected r is .030^2/.0424^2=.500

When I run the year-to-year correlation, I get the following the first time I tried it: .48.  Running it 9 more times, and I get: .47, .54, .55, .60, .44, .48, .45, .52, .56.  That’s an average of .51, which is essentially what I expected without running any simulations.
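A test along these lines can be sketched with nothing but the standard library. This is not the original simulation: the normal talent distribution, the seed, and the function names are all assumptions made for illustration, so the individual r values will differ from those listed above.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def year_to_year_r(rng, players=250, pa=250, mean=0.340, sd_true=0.030):
    """Two seasons with unchanged true talent; returns the observed r."""
    # Assumption: true OBP is normally distributed across players.
    talents = [rng.gauss(mean, sd_true) for _ in range(players)]

    def season(p):
        return sum(rng.random() < p for _ in range(pa)) / pa

    year1 = [season(p) for p in talents]
    year2 = [season(p) for p in talents]
    return pearson(year1, year2)

rng = random.Random(2006)  # arbitrary seed, for repeatability only
trials = [year_to_year_r(rng) for _ in range(10)]
average_r = sum(trials) / len(trials)

print("individual r:", ", ".join(f"{r:.2f}" for r in trials))
print(f"average r:    {average_r:.2f}")
```

With only 250 players per run, single runs scatter noticeably around the expected value, which is why averaging several runs is useful.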

***

Note that the true talent level of our players was completely unchanged year-to-year, and therefore, the variability was completely due to the sampling.

However, even if talent changes year-to-year, say to the tune of 1 SD = .015, what “r” should we expect?  Assuming the change in talent level is rather random, then our SD(obs) will now be sqrt(.03^2+.03^2+.015^2)=.045, making our r as (.03/.045)^2= .44.

I would bet the true change in talent level is much closer to 1 SD = .005 than anything else, and so, the year-to-year r would be expected to be .49.  For all intents and purposes, treat the change in talent level, year-to-year, as being very minor.
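The three cases (no change, 1 SD = .015, 1 SD = .005) can be put side by side with a short sketch. It simply mirrors the bookkeeping used in this comment, where the talent-change variance is added to the observed variance; the function name is hypothetical, and other ways of modeling talent drift would give somewhat different numbers.

```python
def expected_r(sd_true, sd_random, sd_change=0.0):
    """var(true) / (var(true) + var(random) + var(change)), per the comment above."""
    var_true = sd_true ** 2
    return var_true / (var_true + sd_random ** 2 + sd_change ** 2)

r_no_change = expected_r(0.030, 0.030)
r_change_015 = expected_r(0.030, 0.030, sd_change=0.015)
r_change_005 = expected_r(0.030, 0.030, sd_change=0.005)

print(f"no talent change:   r = {r_no_change:.2f}")
print(f"change SD of .015:  r = {r_change_015:.2f}")
print(f"change SD of .005:  r = {r_change_005:.2f}")
```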


#25          (see all posts) 2006/09/13 (Wed) @ 14:29

Hi, Tango,

You ran a year-1-actual/year-2-actual regression.  I ran a year-1-actual/talent regression.  It’s the latter that I got the SD(talent)/SD(actual) result.

I ran it on teams playing 162-game seasons, not on the players.

See http://sabermetricresearch.blogspot.com/2006/09/payroll-vs-wins-still-significant-in.html .


#26    tangotiger      (see all posts) 2006/09/13 (Wed) @ 17:25

oic.

A regression of observed to true, eh?  Just thinking out loud, and I have no basis other than seat-of-my-pants.  If I do a regression of sample to true, and then a second sample to the same true, and then a regression of first sample to second sample, I should be able to find an equation that ties them all up, right?

I’d say:
r^2 = r1^2 * r2^2
where r1=sample1 to true
r2=sample2 to true

In our case, r1=r2.

If my r=.50, then r^2=.25=r1^4
r1=.71=r2

So, if the observed to true is .71, then the observed to observed is .71^2=.50

Does that jibe with your results?

I’ll ask Andy to chime in.


#27          (see all posts) 2006/09/13 (Wed) @ 17:33

Yup, that’s what I was thinking.  But it’d be nice to have a proof or something.


#28    tangotiger      (see all posts) 2007/11/09 (Fri) @ 18:58

I ran a test of correlation observed-to-true, and the result was the sqrt of the typical “r” equation that I’ve been using.

That is, when you look only at single-year data, the equation
r = 1 - var(error)/var(observed)
implies a year-to-year correlation based on observed-to-observed.

However, if you do a correlation of true-to-observed, the expected r is the sqrt of the above equation.

So, when I say that the best a forecaster can do is r=.73, that implies if the forecaster only has 1 year of data at his disposal.

The true upper limit is the sqrt(.73) or .85.
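For readers who asked for a proof earlier in the thread, here is a short derivation sketch of why the two correlations differ by a square root. It assumes the simple additive model used throughout: each observed value is true talent plus a mean-zero error that is uncorrelated with talent and with the other sample's error, and both samples have the same error variance. The symbols T, O and E are notation introduced here, not from the original comments.

```latex
\begin{aligned}
O_1 &= T + E_1, \qquad O_2 = T + E_2, \qquad
\operatorname{var}(O) = \operatorname{var}(T) + \operatorname{var}(E) \\[4pt]
\operatorname{cov}(O_1, T) &= \operatorname{var}(T)
\;\Rightarrow\;
\operatorname{corr}(O_1, T)
  = \frac{\operatorname{var}(T)}{\operatorname{SD}(T)\,\operatorname{SD}(O)}
  = \frac{\operatorname{SD}(T)}{\operatorname{SD}(O)} \\[4pt]
\operatorname{cov}(O_1, O_2) &= \operatorname{var}(T)
\;\Rightarrow\;
\operatorname{corr}(O_1, O_2)
  = \frac{\operatorname{var}(T)}{\operatorname{var}(O)}
  = \operatorname{corr}(O_1, T)\,\operatorname{corr}(O_2, T)
\end{aligned}
```

Under those assumptions the observed-to-observed correlation is the square of the observed-to-true correlation, which matches both simulations reported in the thread.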

***

The current forecasters, including Marcel, get you an r-squared of around 50%.  The upper limit r-squared is around 73%.


#29    Mike      (see all posts) 2007/11/09 (Fri) @ 21:15

Tango,

Real quick, why do you square root the correlation coefficient (.73) to get the upper limit of .85?  And, what you’re saying, is that if a team uses multiple years of data (4+), which they would do, they can reach .85?  I just wanted to make sure I understood that correctly.

Also, has anyone looked at these numbers using lwts (I apologize if this has been done - I haven’t read the whole thread)?  I’d be interested in seeing how those numbers would compare.


#30    MGL      (see all posts) 2007/11/10 (Sat) @ 00:30

The upper limit is the upper limit.  Nothing can bring that down.  Other than a higher OBP (which reduces the binomial variance).

Anything else (such as a variable skill for each batter for whatever reason) will lower that r.

Any other metric that is NOT a binomial (or I should say a combination of binomials), like lwts or wOBA, will result in a lower r (and a higher chance variance) as well.

I am not sure what Tango’s .85 is.  As the years get longer the r gets larger.  Maybe the .85 is regressing one year to an infinite number of years, I don’t know.  Remember that using a regression with an x and y variable, the magnitude of the r depends on only 2 things (assuming that we are measuring the same thing from among the same population), the size of x and the size of y.  As each one gets larger, the r gets larger.

If you use the other method (observed and expected chance variance), then you only have one variable, X.  As that gets large, so does r.  As that approaches infinity, r approaches 1.

With the “traditional” regression (X and Y, usually two different time frames, like y-t-y), the only way for r to approach 1 is for both samples to approach infinity.


#31    tangotiger      (see all posts) 2007/11/10 (Sat) @ 07:53

Yes, infinite years (infinite knowledge, or the true rate) to one year (given 550 PA in the observed year) will give you an r=.85.

You can easily test this by assuming you know the true rate OBP, that the population has a true 1 SD = .030, and that the observed has 550 PA.
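A rough version of that test, using only the standard library, might look like the sketch below. The normal talent distribution, the .340 mean, the number of players, and the seed are assumptions for illustration. With exactly these inputs (true SD of .030) the formula SD(true)/SD(observed) works out to roughly .83, a touch below the .85 obtained from sqrt(.73), so the simulated value should land near the former.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def true_to_observed_r(rng, players=2000, pa=550, mean=0.340, sd_true=0.030):
    """Correlate each player's true OBP with one simulated season of `pa` PA."""
    # Assumption: true OBP is normally distributed across players.
    talents = [rng.gauss(mean, sd_true) for _ in range(players)]
    observed = [sum(rng.random() < p for _ in range(pa)) / pa for p in talents]
    return pearson(talents, observed)

# Prediction from the formula: sqrt(var(true) / (var(true) + var(random))).
var_true = 0.030 ** 2
var_random = 0.340 * (1 - 0.340) / 550
predicted_r = math.sqrt(var_true / (var_true + var_random))

simulated_r = true_to_observed_r(random.Random(31))  # arbitrary seed

print(f"predicted r = {predicted_r:.3f}")
print(f"simulated r = {simulated_r:.3f}")
```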

