THE BOOK--Playing The Percentages In Baseball

Monday, December 04, 2006

Who’s smarter than a monkey?

By Tangotiger, 11:01 AM

A few people, but just barely.  No shame in that, as I’ve shown before.

As for using cutoffs, you don’t even have to worry about that, since your correlation should be weighted by the number of PA to begin with.  Simple correlations weight each data point equally.  But, here, we don’t need to be doing that.
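
A PA-weighted correlation can be sketched as below. This is a minimal illustration, not the method described in the Appendix mentioned in the comments: the function names and the five-player sample are made up, and PA is assumed to be used directly as the weight on each player.

```python
from math import sqrt

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_corr(x, y, w):
    """Pearson correlation of x and y, each observation weighted by w (e.g. PA)."""
    mx = weighted_mean(x, w)
    my = weighted_mean(y, w)
    cov = weighted_mean([(a - mx) * (b - my) for a, b in zip(x, y)], w)
    var_x = weighted_mean([(a - mx) ** 2 for a in x], w)
    var_y = weighted_mean([(b - my) ** 2 for b in y], w)
    return cov / sqrt(var_x * var_y)

# Invented numbers: forecast OPS, actual OPS, and PA for five hitters.
forecast = [0.780, 0.820, 0.700, 0.900, 0.750]
actual = [0.760, 0.850, 0.640, 0.880, 0.800]
pa = [650, 600, 120, 700, 90]

r_weighted = weighted_corr(forecast, actual, pa)          # low-PA players count less
r_simple = weighted_corr(forecast, actual, [1] * len(pa))  # every player counts equally
```

With equal weights the function reduces to the ordinary correlation, so no PA cutoff is needed: a 90 PA player stays in the sample but has little pull on the result.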


#1    Rally      (see all posts) 2006/12/04 (Mon) @ 11:27

Is there any easy way to do weighted correlations in excel?

I could do this without a cutoff for Chone, Marcel, and Zips, but will have to stick to the higher cutoff for the others, since I’m entering the data from books.


#2    Tangotiger      (see all posts) 2006/12/04 (Mon) @ 11:49

I’m sure Andy described the process in the Appendix (which I don’t have handy).

But this may help you:
http://www.stat.yale.edu/Courses/1997-98/101/anovareg.htm


#3    David Gassko      (see all posts) 2006/12/04 (Mon) @ 16:31

I’m going to copy over my post from “Chone”’s blog:

Good stuff, Sean, but…

The question is whether or not these differences are statistically significant. Actually, that’s not really a question. With just 114 observations, one standard deviation is around .093 points of correlation, which means that these differences are meaningless. With one year’s worth of data, you actually can’t really prove the superiority of one projection system over another.

Really you need over 1,000 observations for a difference of .05 points of correlation to be significant at the 5% level. That’s many years of data and a low plate-appearance cutoff. Since systems are continually improved, and we probably have to assume that the order of their quality changes due to those improvements (i.e., that they do not offset, and some systems improve relative to other systems and not just in relation to themselves), I’m not sure you can run any kind of basic correlation test that will give you significant results.
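
The comment does not say how the .093 was computed. One common approximation, the Fisher z standard error of 1/sqrt(n - 3), lands close to it, so the sketch below uses that as an assumption. It also ignores the fact that the systems are tested on the same players, so their errors are correlated. The function names are hypothetical.

```python
from math import sqrt, ceil

def corr_se_fisher(n):
    """Approximate standard error of a correlation on the Fisher z scale."""
    return 1.0 / sqrt(n - 3)

def n_needed(diff, z_crit=1.96):
    """Observations needed for z_crit standard errors to equal diff."""
    return ceil((z_crit / diff) ** 2 + 3)

se_114 = corr_se_fisher(114)   # about .095 with 114 players
n_for_05 = n_needed(0.05)      # roughly 1,500 players
```

Under these assumptions the required sample is on the order of 1,500 players, in line with the "over 1,000" figure above.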


#4    tangotiger      (see all posts) 2006/12/04 (Mon) @ 17:11

Rally, since you’ve got the data, can you do a “head-to-head”?  Compare say Marcel to Shandler, and for each player, give a win to the forecaster who was closest to actual *and* at least .010 better than the other forecaster.

So, if someone is .350, and Shandler said .344 (off by 6 points), and Marcel said .358 (off by 8 points), that’s a “no decision”.  A .366 by Marcel (off by 16 points, and 10 worse than Shandler) would be a win for Shandler.

What would the win-loss-tie be here?  If it’s too much of a mess, send the data my way, and I’ll run it.
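
The head-to-head rule above can be sketched as follows. The function names are hypothetical, and working in integer OPS points (so .344 becomes 344) is my choice, made so that the "at least .010 better" test is not thrown off by floating-point rounding.

```python
def to_points(ops):
    """Convert an OPS such as .344 to integer points (344)."""
    return round(ops * 1000)

def head_to_head(actual, forecast_a, forecast_b, margin=10):
    """Return 'A' or 'B' for the forecast that is closer to actual by at
    least `margin` points, otherwise 'no decision'."""
    err_a = abs(to_points(forecast_a) - to_points(actual))
    err_b = abs(to_points(forecast_b) - to_points(actual))
    if err_b - err_a >= margin:
        return "A"
    if err_a - err_b >= margin:
        return "B"
    return "no decision"

def record(rows, margin=10):
    """Tally (A wins, B wins, no decisions) over (actual, a, b) rows."""
    tally = {"A": 0, "B": 0, "no decision": 0}
    for actual, a, b in rows:
        tally[head_to_head(actual, a, b, margin)] += 1
    return tally["A"], tally["B"], tally["no decision"]

# The two cases from the comment, with A = Marcel and B = Shandler.
case_1 = head_to_head(0.350, 0.358, 0.344)  # errors of 8 and 6 points
case_2 = head_to_head(0.350, 0.366, 0.344)  # errors of 16 and 6 points
```

Here `case_1` is a no decision and `case_2` is a win for B (Shandler), matching the worked example.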


#5    Rally      (see all posts) 2006/12/04 (Mon) @ 18:13

I won’t have time to do anything tonight, but I can send you the data.  I’ll do it when I get to my home computer.


#6    MGL      (see all posts) 2006/12/04 (Mon) @ 23:09

As I remind people from time to time, there is no “magic” to a level that reaches some arbitrary level of significance (5%, 1%, etc.), so let’s NOT call a difference that is not significant at the 5% level “meaningless.” It is not meaningless.  It simply reflects a “conclusion” with a certain level of uncertainty, more than some and less than others.

It is a really bad habit in the “real world” (and sometimes even in science) to dismiss a result that is less than some arbitrary sigma level and “accept” it above that level.

The results of the “r’s” suggest that each system is better than the one below it.  Period.

As well, these things are often Bayesian problems with a priori probabilities.  For example, we can probably assume that there is a high likelihood that Shandler is going to beat Marcel and perhaps even the other ones on the list, since Shandler is a smart guy and puts a lot of time and research into his projections.

MGL


#7    tangotiger      (see all posts) 2006/12/04 (Mon) @ 23:44

Rally sent me his list.  This is what I did.

1. Selected all players with at least 400 PA. (n=223)

2. Figured out the simple average OPS for all systems (actual, chone, zips, marcel)

3. Recast all OPS against a common baseline (OPS=.800).  For example, the average Marcel OPS was .786, so I added .014 to each player’s OPS.  I added .019 to chone, and .010 to ZIPS.

4. Figured out the difference between the baselined Marcel and the baselined actual OPS.  Jermaine Dye’s Marcel is .227 off the actual.

5. Figured the standard deviation of the differences.  Marcel was .0758, Chone was .0763, ZIPS was .0791.  Big win for Marcel.

6a. Did a head-to-head of Marcel against Chone and Marcel against ZIPS.  Of the 223 players in question:
Marcel v Chone: 111-112 (.498)
Marcel v ZIPS: 122-101 (.547)

6b. Repeated step 6a, but limited to the players where there was the biggest disagreement between each.  For example, Bonds was 1.227 by Marcel and 1.038 by Chone, for the largest disagreement in forecast.  Chone won that one.  I selected the 111 biggest gaps in forecast (half the sample):
Marcel v Chone: 54-57 (.486)
Marcel v ZIPS: 62-49 (.559)
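
Steps 3 through 6b can be sketched roughly as below. The four-player data set is invented purely to exercise the functions, and the population standard deviation (`pstdev`) is an assumption, since the comment does not say which form was used.

```python
from statistics import mean, pstdev

BASELINE = 0.800  # common baseline from step 3

def rebaseline(values, baseline=BASELINE):
    """Shift every value so the group's simple average equals the baseline."""
    shift = baseline - mean(values)
    return [v + shift for v in values]

def error_sd(forecast, actual):
    """Steps 3-5: rebaseline both series, then take the SD of the differences.
    After rebaselining the mean difference is zero, so this equals the RMS error."""
    f, a = rebaseline(forecast), rebaseline(actual)
    return pstdev([fi - ai for fi, ai in zip(f, a)])

def biggest_disagreements(forecast_a, forecast_b, keep):
    """Step 6b: indices of the `keep` players where the two forecasts differ most."""
    fa, fb = rebaseline(forecast_a), rebaseline(forecast_b)
    order = sorted(range(len(fa)), key=lambda i: abs(fa[i] - fb[i]), reverse=True)
    return order[:keep]

# Invented numbers for four players.
actual = [0.900, 0.750, 0.820, 0.690]
sys_a = [0.850, 0.760, 0.800, 0.734]   # averages .786, so it is shifted up by .014
sys_b = [0.880, 0.700, 0.790, 0.750]   # averages .780, so it is shifted up by .020

sd_a = error_sd(sys_a, actual)
sd_b = error_sd(sys_b, actual)
top_half = biggest_disagreements(sys_a, sys_b, keep=2)
```

The head-to-head counts in 6a and 6b would then be tallied over all players and over the `top_half` indices respectively.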

Marcel did much better than I thought it would.  I then created a Chone+ZIPS forecast, and repeated the exercise.

In step 5, the result was .0755, slightly bettering Marcel.

In step 6a, 116-107 (.520).

In step 6b, 52-59 (.468)

***

In the 20 biggest differences between Marcel and Chone+ZIPS, Marcel was the clear favorite on 10 and Chone+ZIPS on the other 10.  The Marcel wins were:
1.Hanley Ramirez (where Marcel simply forecast the league average)
2. Soriano (Chone-ZIPS had him below lg avg, likely a park factor issue)
3. Uggla
4. Jeff Kent
5. Adam Dunn
6. Frank Thomas
7. Weeks
8. Teixeira
9. McCann
10. Glaus

Chone-ZIPS bested Marcel on:
1. Bonds
2. Jacobs
3. Hawpe
4. Atkins
5. Murton
6. Francoeur
7. Willingham
8. Clayton
9. Bautista
10. Gonzalez

***

I was expecting Marcel to only win on 40%, not around 50%.  There may be a few lessons to learn:

a. are these systems not using Marcel as a base?  These are simply based on historical accounts. 

b. obviously, these other systems will have a leg up on Marcel on park factors and minor leagues, since Marcel ignores both.

But, it seems that these systems are losing ground on “a” and making up for it in “b”.  Why?  They should be better than Marcel.

The Marcel model was laid out here:
http://www.tangotiger.net/archives/stud0346.shtml


#8    MGL      (see all posts) 2006/12/05 (Tue) @ 03:17

As Tango has mentioned in the past, you have a fairly selectively sampled group of players when you use 400 or 500 PA as a min, do you not?  Any unestablished player who does poorly at first is probably not going to get that many PA and bad players who got lucky will probably accumulate more PA than they would have had they not gotten lucky.  So I would expect that all players with a min of 400 or 500 PA should best all of the projections on the whole, especially for the unestablished players.  How you get around that when testing projections, I am not sure.

Park factors do not come into play for players who have not changed parks in the last 3 years, which is probably half the players, no?

Personally, I don’t think any system is going to beat Marcel for players with at least 3 years in MLB once you add park factors and a couple more context neutralizing adjustments.  About all you can do more than that is to maybe account for injury plagued seasons (and maybe a couple different aging patterns) and establishing a better (different for different players) mean to regress towards (perhaps by height and/or weight or defensive position).  Anything else (like trying to predict which players might develop and which might not, or individual aging patterns, etc.) is Quixote’esque I think.

Anyway, we discuss this just about every year don’t we, and pretty much come to the same conclusions and repeat the same old things?


#9    Rally      (see all posts) 2006/12/05 (Tue) @ 10:24

Selective sampling is an issue here for your minor leaguers.  For Uggla and Ramirez, the Marcel is simply the league average, while the others are based on their MLE’s.  Their MLE’s were not particularly impressive.

By forecasting the league average for every remotely talented minor leaguer who might surprise you and have a good year, you can beat Chone and Zips every time.

Why?  If the player is as bad as his MLE’s suggest, he gets sent back to the minors and doesn’t meet your playing time cutoff.  If he’s successful enough to hold a job, he’ll be closer to league average than a .650 OPS or whatever is projected for him.
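
A toy simulation of this selection effect is sketched below. Every number in it is invented: a group of fringe players with a true OPS of .650, random noise with an SD of .060, and a rule that only players who post at least .700 keep their job and reach the playing-time cutoff. Selecting on the same observed OPS that is later scored is a crude stand-in for real playing-time decisions.

```python
import random

def simulate(n=10_000, talent=0.650, noise_sd=0.060, cutoff=0.700, seed=0):
    """Observed OPS of the fringe players who play well enough to keep their job."""
    rng = random.Random(seed)
    observed = (rng.gauss(talent, noise_sd) for _ in range(n))
    return [ops for ops in observed if ops >= cutoff]

def mean_abs_error(forecast, outcomes):
    """Average miss of a single forecast value against the observed outcomes."""
    return sum(abs(forecast - ops) for ops in outcomes) / len(outcomes)

survivors = simulate()
mae_mle = mean_abs_error(0.650, survivors)      # forecast based on the MLE
mae_league = mean_abs_error(0.750, survivors)   # "everyone is league average"
```

Because every survivor posted at least .700, the league-average forecast is never further off than the .650 forecast for any of them, even though .650 is the true talent level in this setup.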


#10    Rally      (see all posts) 2006/12/05 (Tue) @ 10:34

Last year’s Chones started with a Marcel base: 3 years of data, with the same weighting.  Besides park factors, the main differences were homerun regression based on weight, a speed score boost to babip, and each component regressed individually.  I don’t use sim scores for players.

This year I’ve added a 4th year to the sample (weighted less, but it does help more players than it hurts) and am working on a few age factors based on player types.

Right now the spreadsheet that does the work is up to 15 MB, for very little gain.  If I were rational I’d just give up and stick with Marcel for a quick answer, or Shandler’s book to draft my team.  But I’m not.  I love this stuff.


#11    tangotiger      (see all posts) 2006/12/05 (Tue) @ 11:24

Rally, I’m with you!  I’ve spent countless (well into the hundreds of) hours to improve as much as possible, only to get to the point where I am now.

***

It’s kind of strange that we still are pretty far apart, if you are just doing some mild tweaking.  For example, the average OPS difference between Marcel and Chone is .022, for the 223 players.  The average difference between Marcel and ZIPS is .023.

Among players who ended up with the highest BABIP, Marcel is 32-35.  Lowest BABIP, 34-33. 

The top 20 guys in SB-CS averaged a BABIP of .326, 16 points above the sample unweighted mean.  Yet, Marcel was 10-10 on their OPS against Chone.

I think what this shows is that any changes can only be considered tweaks, and the small sample of performance (400 to 700 PA is “small”) makes it so that the noise will often overwhelm the signal.

***

If I look only at guys between 400 and 499 PA, Marcel is 36-31 against Chone+ZIPS.

Players with at least 629 PA: Marcel is 33-34.

Rally’s list started with guys with at least 300 AB, so that’s as much as I can do.

And yes, one “secret” is to assume everyone will play at league average, since forecasting tests all assume a minimum threshold in PA, like 300 or 400, so, to get to that level, you do end up playing better than expected for the fringe guys.


#12    tangotiger      (see all posts) 2006/12/05 (Tue) @ 16:33

Rally was also kind enough to present the forecast for Shandler and James, along with the other three for guys with at least 500 AB.  Repeating the same steps as above:

5. All were between .071 and .072 as the standard deviation of the individual differences.

6.  This time, for head-to-head, I chose to count as a tie any OPS difference of .010 or less.  An OPS difference of that is around 2.5 runs per 650 PA.  Here’s the head-to-head:

Marcel v
Chone: 34-43, 37 ties
ZIPS: 37-38, 39 ties
Shandler: 36-45, 33 ties
Bill James: 45-38, 31 ties

Before we are too hard on Bill, please make an IMPORTANT note that I rebaselined based on the sample mean, which in this case is these 114 players.  This is the WRONG thing to do.  I should be using everyone in the population.  The simple actual OPS of these 114 players is .825, and Bill James said .816, while Marcel said .799.

For the 223 players in the larger sample, Marcel said .786 against the actual of .810.  IN THIS CASE, the rebaselining was around the same.  That is, Marcel’s run environment for the 223 players was 24 points below actual, while the run environment for the 114 players was 26 points below actual.

I don’t know what run environment Bill used without looking at all his forecasts.

On the other hand, what happens if I simply remove the baseline estimate?  This works to Bill’s advantage, and Marcel’s disadvantage.  In this case:
Marcel v Bill James: 32-40, with 42 ties

Where did Marcel and James disagree the most?  By far, Ryan Howard.  James knocked this one right out of the ballpark, while Marcel and the rest had relatively modest forecasts.

But, the next guy on the list is Teixeira, and Marcel took the rest of the group to the cleaners. 

James won on Aaron Hill, Betancourt, Taveras, Jack Wilson.  Marcel won on ARod, Granderson, Carlos Guillen, Mauer, Jason Bay.

Out of the top 20 disagreements, Marcel was 11-9 against Bill James.

I continue to be completely unimpressed with any and all forecasting systems.


#13    dan      (see all posts) 2006/12/05 (Tue) @ 19:58

I know Chone’s blog says he’ll get to PECOTA but I couldn’t help myself.  Using BP’s weighted means spreadsheet and 400 PAs (which, for some reason, I get as n=208, rather than Tango’s 223), there looks to have been a correlation of .671.  That would be the least improvement over Marcel of all the systems.


#14    Rally      (see all posts) 2006/12/05 (Tue) @ 20:16

I was lazy, I used AB instead of PA as the cutoffs on my spreadsheet.


#15    tangotiger      (see all posts) 2006/12/05 (Tue) @ 21:53

No, you did a great job.  I took your “AB 300” sheet, and simply did AB+BB+HBP above 400.


#16          (see all posts) 2006/12/06 (Wed) @ 09:52

“Who’s smarter than a monkey,” you ask?

In honour of Chief Wiggum, I counter, “How big of a monkey?”


#17    Tangotiger      (see all posts) 2006/12/14 (Thu) @ 18:28

Rally follows up with pitchers:
http://lanaheimangelfan.blogspot.com/2006/12/pitcher-projections.html

Marcel does a SENSATIONAL job!  This makes for a very sorry state of forecasting.

.46 ZIPs
.45 PECOTA
.45 Baseball Info Solutions
.44 Marcel
.42 Shandler

The overall average of the Four Horsemen: .445.  Marcel?  .442.

I mean, really, why should I pay attention to the smartest forecasting guys around, when they can barely beat the back of a baseball card?

I hereby request that every single forecast produced be accompanied with a Marcel forecast, and a blurb telling the reader why we would listen to the forecaster.


#18    Tangotiger      (see all posts) 2006/12/17 (Sun) @ 18:16

Rally posted this:
http://lanaheimangelfan.blogspot.com/2006/12/limits-of-projections-system.html

And he comes up with an upper limit of r=.77.

I said here:
http://www.insidethebook.com/ee/index.php/site/comments/forecasters_how_accurate_can_they_possibly_be/
that it could be r=.73

I used 550 PA.  I don’t know how many PA’s Rally averaged, but let’s say it was 650.  In that case, repeating my exercise, for wOBA:
var(random) = .020^2
var(actual) = .036^2
var(observed) follows as = .041^2
r follows as = .77

That matches Rally.
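
The arithmetic can be checked with the short sketch below. It assumes the random and actual variances simply add to give the observed variance, and it reads the quoted figure as the ratio var(actual)/var(observed). That reading is my assumption, since the last step is not spelled out in the comment.

```python
from math import sqrt

sd_random = 0.020   # random variation in wOBA at about 650 PA (from the comment)
sd_actual = 0.036   # spread of "actual" wOBA (from the comment)

var_observed = sd_random ** 2 + sd_actual ** 2
sd_observed = sqrt(var_observed)              # about .041
ratio = sd_actual ** 2 / var_observed         # about .764
ratio_rounded = sd_actual ** 2 / 0.041 ** 2   # about .771 when .041 is used as rounded
```

The ratio comes to .76 unrounded, or .77 if the observed SD is first rounded to .041. Its square root would be about .87, so the ratio itself is the quantity that lands on the quoted number.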


#19    Rally      (see all posts) 2006/12/17 (Sun) @ 22:39

Just checked the average PA in the sample.  It’s 646.

Damn you’re good, Tango.


#20    tangotiger      (see all posts) 2006/12/17 (Sun) @ 23:15

I love it when it all makes preemptive sense!

***

Btw, I would not use correlations to test the forecasters to the actual data, but rather RMSE.  Why is that?  Because with a correlation, it tests against y=mx+b.  However, I see no reason for the “m”, the slope, to be fitted as anything but 1.  Salaries are paid based on that, and forecasting something where the m can be fitted .90 or 1.10 just is plain cheating.  So, you can do a y=x+b, where b is simply the correction for the baseline. 

Again, that b should be based not on the given sample, but on the forecaster’s presumption for his forecasts.  So, the best way is for each forecaster to present his OPS baseline along with the individual forecasts.
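
The difference between the two tests can be sketched as below, with invented data. The hypothetical forecaster gets the ordering exactly right but compresses every player halfway toward .800. The baseline offset `b` is passed in as a parameter and set to zero here, on the assumption that the forecaster's stated baseline matches the actual run environment.

```python
from math import sqrt
from statistics import mean

def rmse_slope_one(forecast, actual, b):
    """RMSE of actual against the line y = x + b, with the slope fixed at 1."""
    return sqrt(mean((a - (f + b)) ** 2 for f, a in zip(forecast, actual)))

def pearson_r(x, y):
    """Ordinary correlation, which implicitly fits both a slope and an intercept."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Invented data: five actual OPS values and a forecast compressed toward .800.
actual = [0.700, 0.760, 0.800, 0.860, 0.920]
forecast = [0.800 + 0.5 * (a - 0.800) for a in actual]

r = pearson_r(forecast, actual)                 # essentially 1: the fitted slope hides the compression
err = rmse_slope_one(forecast, actual, b=0.0)   # about .038: the compression is penalized
```

The correlation rates this forecast as perfect, while the slope-one RMSE charges it for every point of compression.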

***

As well, OPS is more like FIP than anything else.  That is, they are both “component” numbers.  ERA would be like Runs *or* RBIs, but not both R+RBI.

