
THE BOOK--Playing The Percentages In Baseball


Tuesday, September 23, 2008

Saberists predict better than Insiders

By Tangotiger, 09:09 AM

Thanks to Vegas Watch, we see that PECOTA and Neyer were off by 10 games, Vegas was off by 11, and Olney/Phillips were off by 13.  Lovely.

Now, here’s the fun part.  Ready?  If you had forecasted every single team to finish at 81-81, the RMSE would have been 10.6.  Here is the list of the 12 smartest and most experienced guys that Vegas Watch decided to track, with the perfectly competitive balanced forecast included:
9.6 PECOTA
10.2 Neyer
10.5 Law
10.6 Perfectly Competitive Balanced (all teams predicted at 81-81)
10.8 Vegas
11.1 Passan
11.3 Sheehan
11.4 Brown
11.7 Kurkjian
12.1 Stark
12.1 Henson
12.4 Phillips
13.0 Olney

Says it all, doesn’t it?


#1    MGL      (see all posts) 2008/09/23 (Tue) @ 09:41

We know that the “insiders” are terrible at pretty much anything resembling analysis.  So that is a dead issue.

The reason that 81-81 for every team does so “well” might just be that RMSE is a really bad way of evaluating these numbers.  By going with 81-81, obviously you guarantee that you won’t be off by more than a certain amount.  If you predict 93 wins for a team, there is a certain chance that they will win only 70 and you will be off by 23 wins!  Almost never going to happen (23 win difference) if you pick 81.

How about just evaluating everyone based on whether they picked a team to be under or over .500?  How does 81-81 do against everyone else?  Or against the Vegas line?  Lot of better ways of evaluating these projections than RMSE.  RMSE might work if you are comparing one person’s picks to another.  Even then, I am not sure.  One of the problems is that one really bad result is going to skew someone’s results badly.  Say I nail just about every team in terms of how good or bad they are, but I pick one team to win 130 games and they “only” win 91.  I probably have a bad average RMSE, but I pretty much nailed everyone.  In fact, if you knew that you were going to be evaluated using RMSE, it might behoove you to keep all your numbers near 81.  A good team gets 85 and a bad one 77.  How would each person do if we changed their picks using that criterion?  I bet that just about everyone does better than 81-81!


#2    Tangotiger      (see all posts) 2008/09/23 (Tue) @ 09:48

Excellent point.  Perhaps VegasWatch can do the exercise (or send me the data he compiled and I’ll do it).

Here’s the suggestion:
new_wins = round( 81 + (old_wins - 81) * factor)

Try factor at 0.75, 0.50, 0.25, and see what happens.

So, a team that has been forecasted for 93 wins will have a new_wins of 90, when the factor is 0.75 for all teams.  Run the RMSE.

Then try it at factor = .50, and this team has new_wins of 87.  Run the RMSE.

What is the factor value when RMSE is lowest?  Try different factor values around this factor value.  What is the ideal factor value in this data set?
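
A minimal Python sketch of this exercise follows. The forecast and actual win totals are made-up placeholders, not anyone's real picks, and the function names are hypothetical. Rounding halves up is an assumption about what `round` means in the formula above.

```python
import math

def shrink(old_wins, factor):
    """Pull a forecast toward 81 wins: round(81 + (old_wins - 81) * factor)."""
    # floor(x + 0.5) rounds halves up; Python's built-in round() rounds halves to even.
    return math.floor(81 + (old_wins - 81) * factor + 0.5)

def rmse(forecast, actual):
    """Root mean squared error between two equal-length lists of win totals."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(forecast))

def best_factor(forecast, actual, factors):
    """Return (factor, rmse) for the factor that gives the lowest RMSE."""
    scored = [(rmse([shrink(w, f) for w in forecast], actual), f) for f in factors]
    err, factor = min(scored)
    return factor, err

# Placeholder data for illustration only.
forecast = [93, 88, 81, 75, 68]
actual = [86, 90, 79, 80, 70]
factors = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00

print(best_factor(forecast, actual, factors))
```

A factor of 0 reproduces the all-81 forecast and a factor of 1 leaves the picks untouched, so the scan covers both extremes discussed in the thread.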

And how about historically?

***

Note that we know that, historically, RMSE is 1 SD = .072*162 when you forecast each team at 81-81.  (That is, RMSE is identical to simply taking the standard deviation of the actual win% around .500, times games played.)  So, that’s an RMSE of 11.7.  These days, I think it’s a bit lower, and so, it’s almost certainly a bad luck season for all forecasters.


#3    Vegas Watch      (see all posts) 2008/09/23 (Tue) @ 10:00

"How does 81-81 do against everyone else?  Or against the Vegas line?  Lot of better ways of evaluating these projections than RMSE.”

You could do this, but then you run into the opposite problem.  If a team’s O/U was 74 games, they actually win 92 and analyst X picked them to win 75 while analyst Y had 90, you’re giving both the same credit, when obviously that isn’t fair.

I would agree that using RMSE and comparing the analysts’ projections to 81 wins isn’t fair, which is why I used the ‘07 records as the baseline.

“These days, I think it’s a bit lower, and so, it’s almost certainly a bad luck season for all forecasters.”

I have the 2005-2008 projections of DMB, ZiPS, Joe Sheehan and PECOTA.  The average RMSEs for each year are: 9.28, 13.90, 6.59, 10.58.  Really an incredible difference between ‘06 and ‘07, but this year looks to be a bit above average.

I am going to do another post on this stuff once the regular season is over, I’ll include Tango’s suggestion, and the numbers v.500 and v.Vegas.  I can also send you guys the spreadsheet if you want.


#4    Peter Jensen      (see all posts) 2008/09/23 (Tue) @ 10:23

When I evaluate predictions, one method I like to use is how many predictions are within a certain value of the actual number.  So in this case, which forecasting system had the most team win estimates within +-4 wins (or whatever value seems appropriate for a successful prediction) of the actual team wins.
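
As a rough sketch of that count in Python (the function name and the sample numbers are hypothetical, and the +-4 tolerance is just the example value mentioned above):

```python
def hits_within(forecast, actual, tolerance=4):
    """Count forecasts whose absolute error is at most `tolerance` wins."""
    return sum(1 for f, a in zip(forecast, actual) if abs(f - a) <= tolerance)

# Placeholder data for illustration only.
forecast = [92, 87, 83, 74, 73]
actual = [89, 72, 92, 59, 84]

print(hits_within(forecast, actual))
```

Unlike RMSE, this treats a miss by 6 and a miss by 20 the same, so one badly wrong pick cannot dominate the score.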


#5    Tangotiger      (see all posts) 2008/09/23 (Tue) @ 11:16

There were 1082 team seasons since 1962 (excluding the partial seasons of 1972, 1981, 1994, 1995).

There were 368 teams (34%) that finished the season within 5 wins of .500.  There were 359 teams (33%) that finished the season more than 11 wins from .500.  The other 355 teams (33%) finished the season between 5.5 and 11 wins (inclusive) from .500.

If you were to construct a W/L, I would probably use those boundary points.

From this standpoint, predicting a .500 record for everyone will end up with, on average, a 10-10 record, with 10 “ties”.
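
One possible reading of that W/L scheme, sketched in Python: a pick within 5 wins counts as a win, a miss by more than 11 counts as a loss, and anything in between is a tie. That mapping is an assumption based on the boundary points above, and the function name is hypothetical.

```python
def record(forecast, actual, win_within=5, loss_beyond=11):
    """Return (wins, losses, ties) for a set of picks under the assumed scheme."""
    wins = losses = ties = 0
    for f, a in zip(forecast, actual):
        err = abs(f - a)
        if err <= win_within:
            wins += 1       # close enough: counts as a win
        elif err > loss_beyond:
            losses += 1     # badly off: counts as a loss
        else:
            ties += 1       # in between: counts as a tie
    return wins, losses, ties

# Placeholder example: an all-81 forecast against three made-up results.
print(record([81, 81, 81], [84, 90, 95]))
```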


#6    Tangotiger      (see all posts) 2008/09/23 (Tue) @ 11:23

By the way, 1 SD = .0709.  The average number of games is 161.8.

In the equation:
var(observed) = var(true) + var(binomial)
we get:
.0709^2 = x^2 + .0393^2

And that makes x= .0590

That’s why I always use x (i.e., 1 SD of true talent) = .06
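
The arithmetic can be checked with a few lines of Python. The only inputs are the two figures quoted above (.0709 and 161.8 games). The binomial term is computed here as sqrt(.5 * .5 / games), which is the standard formula for a .500 team and matches the .0393 in the equation.

```python
import math

sd_observed = 0.0709   # observed SD of team win%, from the comment
games = 161.8          # average games per team-season, from the comment

# Binomial (luck) SD for a .500 team over that many games: sqrt(p * (1 - p) / n).
sd_binomial = math.sqrt(0.5 * 0.5 / games)

# var(observed) = var(true) + var(binomial), solved for the true-talent SD.
sd_true = math.sqrt(sd_observed ** 2 - sd_binomial ** 2)

print(round(sd_binomial, 4), round(sd_true, 4))
```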


#7    Colin Wyers      (see all posts) 2008/09/23 (Tue) @ 12:47

RMSE applies an extra penalty to extreme errors. Average Absolute Error would probably knock down the “dumb” predictor a bit.


#8    Ryan J. Parker      (see all posts) 2008/09/23 (Tue) @ 15:03

"Lot of better ways of evaluating these projections than RMSE.”

This is a timely topic for me. I’m new to evaluating projections, so I’m very interested to know what other (especially better) ways to evaluate projections than RMSE. Do you (or anyone else) have a quick list of methods I can research?


#9    Tangotiger      (see all posts) 2008/09/23 (Tue) @ 15:21

I would say that the evaluation must be done based on how much money you can make.


#10    Ryan J. Parker      (see all posts) 2008/09/23 (Tue) @ 15:32

Well, let’s say we don’t care about the money. We want to know who performed the best. For strictly statistical methods, what other than RMSE are we talking about?


#11    Tangotiger      (see all posts) 2008/09/23 (Tue) @ 15:35

Perhaps an example.  Say Vegas gives you the following odds on the Mariners (and you think they are an 80W team):

85W or higher is 1:1
84W or lower is 1:1

84+ v 83- is 0.9:1.11 (i.e., betting 100$ will get you 90$ if you bet 84W and higher, and 111$ if you bet 83W or lower)

83+ v 82- is 0.8:1.25

82+ v 81- is 0.67:1.50

81+ v 80- is 0.5:2.0

80+ v 79- is 0.4:2.5

(All numbers for illustration.)

Since you believe they are an 80W team, you will bet on the first 5 actions (taking the lower side each time), and ignore the last one.

In this case, you put up 500$, and you’ll end up with a gain of +686$ if the Mariners win 80 or fewer games.  But, what if they win 82 games?  In that case, you put up 500$ and you have a gain of +136$.  And if they win 83 games, you lose about 90$.

Now, would this kind of process more closely mirror RMSE, or absolute error, or something else?
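
To make the payoffs concrete, here is a small Python sketch that tabulates the net gain for any win total. It assumes 100$ is staked on the lower side of each of the first five lines and uses the rounded payouts listed above (100, 111, 125, 150, 200). The names are hypothetical.

```python
# (threshold, payout): the bet wins `payout` dollars if actual wins <= threshold.
UNDER_BETS = [(84, 100), (83, 111), (82, 125), (81, 150), (80, 200)]
STAKE = 100  # dollars risked on each bet

def net_gain(actual_wins):
    """Total profit or loss across the five bets for a given final win total."""
    total = 0
    for threshold, payout in UNDER_BETS:
        total += payout if actual_wins <= threshold else -STAKE
    return total

for wins in range(79, 87):
    print(wins, net_gain(wins))
```

With these inputs the gain is 686 at 80 wins or fewer, 136 at 82 wins, and a loss of 89 at 83 wins. Plotting the gain against the size of the miss is one way to see what kind of error penalty this betting scheme implies.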


#12    Tangotiger      (see all posts) 2008/09/23 (Tue) @ 15:36

Ryan: your other choice is absolute error.  So, average(abs(x-y))
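
A short Python comparison of the two measures, using a made-up set of picks in the spirit of the example in #1: 29 teams missed by 3 wins each, plus one team picked at 130 that won 91. The data and function names are hypothetical.

```python
import math

def mae(x, y):
    """Average absolute error: average(abs(x - y))."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def rmse(x, y):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Placeholder data: 29 small misses and one very large one.
forecast = [84] * 29 + [130]
actual = [81] * 29 + [91]

print(mae(forecast, actual), rmse(forecast, actual))
```

Because RMSE squares each miss before averaging, the single 39-win miss moves it much more than it moves the absolute error.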


#13    Ryan J. Parker      (see all posts) 2008/09/23 (Tue) @ 16:17

That makes sense. Thanks for the information.

I would like to understand how existing methods perform with respect to predictions (in my case basketball). The goal is to improve what exists, but I can’t improve anything if I don’t know how other methods perform.

Thanks again.


#14    MGL      (see all posts) 2008/09/23 (Tue) @ 17:53

FWIW, I come in at a RMSE of 9.81 (I think), good for second place behind Pecota.  Maybe Vegas Watch can put me in the database.

Here were my picks:

MGL

NL East

Mets 92-70
Braves 87-75
Phillies 83-79
Nationals 74-88
Marlins 73-89

NL Central

Brewers 88-74
Cubs 86-76
Cardinals 78-84
Reds 77-85
Pirates 74-88
Astros 72-90

NL West

Dodgers 85-77
Padres 85-77
Diamondbacks 86-76
Rockies 83-79
Giants 66-96

AL East

Yankees 92-70
Red Sox 90-72
Rays 86-76
Blue Jays 82-80
Orioles 70-92

AL Central

Indians 90-72
Tigers 86-76
White Sox 77-85
Twins 76-86
Royals 70-92

AL West

Angels 87-75
Athletics 80-82
Mariners 77-85
Rangers 75-87


#15    Vegas Watch      (see all posts) 2008/09/23 (Tue) @ 21:41

Just put you in, MGL.  I only have the standings through Saturday loaded in right now, but you’re right behind PECOTA (9.62) at 9.70.  I’ll include you in the post next week, did you post those somewhere in March?  Obviously I believe you, I just want something to link to for reference.

Tango, what do you think is better for this, abs. error or RMSE?  I think I originally used abs. error last year and was asked to do RMSE.  Can’t win either way of course.

The other thing I was thinking of doing was standardizing the st. dev. for each set of projections.  For example, PECOTA’s st. dev was 8.39, while Sheehan’s was 10.38.  Correct me if I’m wrong, but I’d think that puts Joe at a huge disadvantage right off the bat, right?  I’d think normalizing the st. dev. would allow us to look at how good everyone is at actually judging the relative talents of the teams, rather than how good they are at knowing how far to space out their predictions.

I guess that is similar to your suggestion in #2, but I think it serves a slightly different purpose.
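
One way this normalization might look in Python: rescale each set of projections around its own mean so that every forecaster has the same spread, keeping the ordering of teams intact. Using the population standard deviation and rescaling around the forecaster's own mean are both assumptions, and the function name is hypothetical.

```python
import statistics

def rescale_spread(forecast, target_sd):
    """Rescale forecasts around their own mean so their SD equals target_sd."""
    mean = statistics.fmean(forecast)
    sd = statistics.pstdev(forecast)  # population SD (an assumption)
    if sd == 0:
        return list(forecast)  # a flat forecast has no spread to rescale
    return [mean + (w - mean) * target_sd / sd for w in forecast]

# Placeholder data for illustration only.
picks = [70, 81, 92]
print(rescale_spread(picks, 5.0))
```

After rescaling, any remaining difference in RMSE between two forecasters comes from how they ranked and spaced the teams relative to each other, not from how bold their overall spread was.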


#16    MGL      (see all posts) 2008/09/23 (Tue) @ 22:55

Vegas, I’ve posted my pre-season picks in several forums, I think, but I don’t recall where/when off the top of my head.  I just pulled up my “08projwins” file from before the season started.  I sent that file to Neyer I think (before the season - he actually computes his picks from a composite of other analysts plus his “opinion” to tweak them), and I always send it to the Cardinals.  I’ll see if I can find a link anywhere that goes back to March or April.  Maybe Tango can help out with a search of the blog archives.  I’m not very good at that.

If you don’t feel comfortable using them, that is fine of course.  I was just curious where I stand in relation to the other people.

And of course, just like a small sample of a player’s stats does not have all that much certainty, how the analysts and other people do with these pre-season estimates does not mean much either.  In fact, just eyeballing a person’s pre-season wins/losses gives you a much better idea as to their competence than any test, be it RMSE or otherwise, at least in the short run (say, less than 10 seasons).

For example, anyone who picked the Rays for 70 something wins was not paying much attention or did not have a good model at all.  Same thing with whoever picked the Tigers to win 95+ games.  Doing poorly on, for example, SD or CLE, does NOT, by any stretch of the imagination or logic, mean that a person had a poor model. In fact, quite the opposite.

As another example, let’s say that one forecaster had team A projected at 90 wins, and another forecaster at 75 wins.  And let’s say that team A had devastating and not too foreseeable injuries to the tune of 10 expected wins (like losing 2 or 3 superstars and replacing them with replacement to average players) and they won 78 games.  Who do you think did a better job at the forecast?

So to use ANY method which compares w/l records against projections without taking into consideration injuries and acquisitions, is almost useless, again, at least in anything less than lots of seasons (the “long run").


#17    Vegas Watch      (see all posts) 2008/09/23 (Tue) @ 23:17

MGL, do you have your ‘05 and ‘06 projections handy?  I have PECOTA for those years, I’d like to compare, just for fun.


#18    MGL      (see all posts) 2008/09/24 (Wed) @ 11:38

05

AL East

NYY 94-68
BOS 93-69
BAL 83-79
TOR 76-86
TB 69-93

AL Central

CWS 82-80
MIN 82-80
CLE 82-80
DET 80-82
KC 72-90

AL West

OAK 85-77
ANA 84-80
SEA 82-80
TEX 77-85

NL East

ATL 87-75
PHI 86-76
FLO 84-78
NYM 82-80
WAS 75-87

NL Central

STL 93-69
CHC 84-76
HOU 75-87
CIN 74-88
MIL 75-87
PIT 74-88

NL West

LA 86-76
SD 85-77
COL 76-86
SF 75-87
ARI 74-88

06

AL East

BOS 89-73
NYY 88-75
TOR 81-81
BAL 81-81
TB 72-90

AL Central

CLE 86-76
MIN 84-78
DET 83-79
CWS 82-80
KC 65-97

AL West

OAK 92-70
ANA 83-79
TEX 81-81
SEA 76-86

NL East

NYM 88-74
PHI 88-74
ATL 86-76
WAS 77-82
FLO 69-93

NL Central

STL 88-74
CHC 83-79
MIL 82-80
HOU 78-84
CIN 76-86
PIT 77-85

NL West

SF 86-76
LA 85-77
SD 78-84
COL 74-88
ARI 73-89

07

NYY 93-69
BOS 90-72
TOR 84-78
BAL 78-84
TB 74-88

CLE 89-73
MIN 84-78
DET 79-83
CWS 77-85
KC 71-91

OAK 86-76
ANA 84-78
TEX 80-82
SEA 77-85

ATL 85-77
PHI 85-77
NYM 84-78
FLO 79-83
WAS 69-93

STL 85-77
MIL 84-78
CHC 81-81
HOU 78-84
PIT 76-86
CIN 71-91

SD 89-73
ARI 84-78
LA 82-80
COL 77-85
SF 73-89


#19    Vegas Watch      (see all posts) 2008/09/24 (Wed) @ 17:24

Damn, those are good.

‘05: PECOTA 10.13, MGL 9.32
‘06: PECOTA 13.19, MGL 12.70
‘07: PECOTA 6.10, MGL 6.26
‘08 (through Sat.): PECOTA 9.62, MGL 9.70
Average: PECOTA 9.76, MGL 9.49

Other ‘05-’08 averages: DMB 10.36, ZiPS 10.54, Joe Sheehan 11.65


#20    MGL      (see all posts) 2008/09/24 (Wed) @ 19:05

To be perfectly open with the methodology, all I do each year is to use my pitching, defense, base running, and hitting projections for each player and prorate them for the amount of time that the Pecota depth charts say that each player is going to play for each team (that is probably why Pecota and I are so close).

From these I simply assign each team plus or minus X number of runs per game on offense and defense and that becomes their static wp for every game in the schedule.

Then I sim the schedule (with every game being a log5 matchup between the two teams - I don’t think I even use HFA as that will even out on everyone anyway, other than Boston, MIN, and COL - all have a little bit of a higher HFA) 10,000 or whatever times and average each team’s wins and losses.  Oh, and the AL gets an edge over the last few years in IL games.
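
A stripped-down Python sketch of that kind of schedule sim, for illustration only. It uses the standard log5 formula, gives each team a static win percentage, ignores home field (as described above), and leaves out the AL interleague edge. The three-team league, its win percentages, and the schedule are made up.

```python
import random

def log5(a, b):
    """Probability that a team with win% a beats a team with win% b (log5)."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

def simulate(win_pct, schedule, n_sims=1000, seed=0):
    """Average wins per team over n_sims replays of the schedule.

    win_pct: {team: static win%}; schedule: list of (team_a, team_b) games.
    """
    rng = random.Random(seed)
    totals = {team: 0 for team in win_pct}
    for _ in range(n_sims):
        for a, b in schedule:
            winner = a if rng.random() < log5(win_pct[a], win_pct[b]) else b
            totals[winner] += 1
    return {team: wins / n_sims for team, wins in totals.items()}

# Toy league for illustration only: every pair plays 54 games.
win_pct = {"A": 0.55, "B": 0.50, "C": 0.45}
schedule = [("A", "B")] * 54 + [("A", "C")] * 54 + [("B", "C")] * 54
avg = simulate(win_pct, schedule, n_sims=200)
print(avg)
```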

To be honest, my batting and pitching projections are probably no better than anyone else’s, maybe worse.  Maybe my UZR and base running projections give me a little bit of an edge.  Rally and Pecota, for example, have great projections.

And as I said, I am relying on the Pecota “depth charts” so they get a good deal of the credit.  As I also said, 1 or even 3 seasons is not going to tell us much of anything with any kind of certainty, but it is fun.

To me, the fun part is seeing how dumb the mainstream press can be when a team has a certain “reputation” or a certain record one year (the MSM tends to assume pretty much the same thing for the next year, with a little tweaking), like TB, SEA, and DET (to some extent) this year.

It would be interesting to assign everyone random pre-season picks for each team and then print out a few results for 3 or 4 seasons.  My guess is that one or two are going to look like geniuses for 3 seasons or so.  It would also be interesting to take 10 fictitious persons and give them all pre-season picks with some getting good picks (based on what actually happens) and others getting bad picks.  Again, I would guess that the good forecasters would have bad results (and vice versa) quite often.  That is especially true since most teams have injuries and trades that dramatically affect their true WP, that no one can anticipate very well.  Given that, the best and the worst forecasters are expected to do pretty badly in general (that is why we often see RMSE of 10-12).  Imagine that there were great forecasters (perfect actually) and terrible ones (basically randomly picking wins and losses).  If all teams were to dramatically alter their true wp by injuries and acquisitions, everyone would do just as well and just as badly as everyone else.  To some extent that is happening.
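
The first experiment (everyone gets random picks) could be sketched in Python along these lines. Every number here is an assumption made for illustration: actual wins are drawn from a normal distribution with mean 81 and SD 11.5, picks from an independent normal with mean 81 and SD 8, with 10 forecasters, 30 teams and 3 seasons.

```python
import math
import random

def rmse(forecast, actual):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(forecast))

def random_forecasters(n_forecasters=10, n_teams=30, n_seasons=3, seed=1):
    """Sorted average RMSE per forecaster when every pick is pure noise."""
    rng = random.Random(seed)
    # Assumed spreads: actual wins ~ N(81, 11.5), picks ~ N(81, 8), independent.
    seasons = [[rng.gauss(81, 11.5) for _ in range(n_teams)] for _ in range(n_seasons)]
    averages = []
    for _ in range(n_forecasters):
        errs = [rmse([rng.gauss(81, 8) for _ in range(n_teams)], actual)
                for actual in seasons]
        averages.append(sum(errs) / n_seasons)
    return sorted(averages)

print([round(x, 1) for x in random_forecasters()])
```

All ten forecasters here have exactly zero skill, so any gap between the best and worst average RMSE in the output is luck alone.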

Since I firmly believe that a team’s true wp is merely a combination of all their players’ true wp prorated by actual playing time at the end of the season, to me, looking at team pre-season picks tells me nothing more (and a whole lot less) than looking at individual player projections.

As I said, if I nail everyone’s projection on the Indians, but their playing time at the end of the season looks nothing like the pre-season depth charts (and again, it could not be foreseen), then it is kind of silly to “say” that I did a poor job in projecting their team w/l, at least IMO.

If anything, we should adjust a team’s actual w/l for the mid-season personnel moves and injuries and THEN see how everyone would have done.  Although no one (but me) would take those seriously, to me that tells me a lot about how good someone’s pre-season pick model is.

That is kind of what I did in last year’s THT Annual and pretty much what I am going to do again this year, but in more detail, and from a manager’s perspective.

As I said in last year’s Annual, if a team like TB was supposed to be bad according to the ignorant MSM and they end up doing a lot better, should the manager get credit for that?  He will of course.  I assume that Maddon is a shoo-in for the MOY award. Now, granted they did play better than Pecota and I (and other “real” analysts) projected - at least w/l-wise.  But if they won 88 games and won the WC, I guarantee that Maddon wins the award as well.

Anyway, one of the things I am going to do again in the Annual is look at how teams should have done (offense, defense, etc., run differential, and w/l based on the run differential), given exactly the playing time of the players and their pre-season projections.  To me this, and only this, tells us whether and how a team may have overperformed or underperformed.  Now, whether that was a fluke, bad projections, good coaching, good managing, etc., is another story.  Those are questions that could not even come close to being answered, at least by me, or by the numbers, which is one reason that I think the MOY award is one of the silliest.  Team that should not do well does well: manager wins MOY.  Absent that, a team like ANA continues to do real well despite some more conservative predictions: manager wins award.  Team gets in the playoffs and the manager never gets criticized during the year (like Manuel, maybe): manager wins award.  Etc.  Dumb award, although not necessarily dumber than some of the other ones, at least the way the voters consider them.


#21    tangotiger      (see all posts) 2008/09/24 (Wed) @ 21:08

Tippett used to track the rankings for a long time (1999-2006):
http://www.diamond-mind.com/articles/tmpred06.htm

As you can see if you page down a few times, the Vegas line and the Diamond-Mind sim. 

FWIW, Nate (PECOTA?) was near the bottom in the 2004-2006 time period, based on Tom’s methodology, but the BP consensus did well.


#22    Vegas Watch      (see all posts) 2008/09/24 (Wed) @ 22:35

The st. dev. for the ‘07-’08 average RMSE for the 14 guys I have is 0.92.  Here is how many standard deviations everyone is from the average, with + obviously being better:

PECOTA 1.40
MGL 1.28
Neyer 1.18
CHONE 1.17
O/Us 0.34
Passan 0.16
Law 0.11
ZiPS -0.25
Brown -0.38
Sheehan -0.45
Kurkjian -0.54
Stark -0.88
Phillips -1.32
Olney -1.78

As much as I may want them to be, those aren’t terribly significant.  The top four and bottom two really have separated themselves though.  It’s interesting that PECOTA and those similar have been a good deal better than Vegas the last couple years.  I should try to track down the Vegas totals from ‘05 and ‘06, not sure where I could dig those up.  Tippett’s work makes it look like they beat PECOTA those two years (or PECOTA lost to them, I suppose).
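
For anyone wanting to reproduce this kind of table, a minimal Python sketch follows. The sign is flipped so that a lower RMSE gives a positive score, as in the list above. Whether the original used the population or the sample standard deviation is not stated, so the population version here is an assumption, and the three forecasters are placeholders.

```python
import statistics

def sds_from_average(avg_rmse):
    """Map {name: average RMSE} to SDs better (+) or worse (-) than the group mean."""
    values = list(avg_rmse.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)
    return {name: (mean - value) / sd for name, value in avg_rmse.items()}

# Placeholder data for illustration only.
scores = sds_from_average({"A": 9.0, "B": 10.0, "C": 11.0})
print(scores)
```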


#23    MGL      (see all posts) 2008/09/25 (Thu) @ 00:10

I’ll see if I can get the Vegas lines for 05 and 06.  You have 07?


#24    Vegas Watch      (see all posts) 2008/09/25 (Thu) @ 00:25

Yeah, I have ‘07.


#25          (see all posts) 2008/09/26 (Fri) @ 02:33

Also, as you said you (Vegas Watch) would do in your blog, I would think that using each team’s pythag record would be much better if you were seriously evaluating each person’s picks.  (And again, even better would be to adjust for injuries and mid-season player changes, but that then makes it more of a player projection contest.)
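
For reference, a pythag record can be computed from runs scored and allowed as in this Python sketch. The exponent of 2 is the classic form of the formula. Other exponents are in common use, so the value here is an assumption about which version is meant, and the run totals are placeholders.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from the Pythagorean formula: G * RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

# Placeholder run totals for illustration only.
print(round(pythag_wins(900, 700), 1))
```

The forecasts would then be scored against these expected wins instead of the actual win totals.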

If it is just for fun, bragging rights, etc., then of course you want to use actual records.

The whole thing is kind of silly. There are two groups.  One is the analysts who use player projections, including playing time, to figure team w/l records based on team offense, defense and schedules.  The other is the media who use G-d knows what.  We know going into the season which group is going to destroy the other group in the long run and probably in the short run as well.  Everything else is just fluctuation.


#26    Tangotiger      (see all posts) 2008/09/30 (Tue) @ 13:51

http://blogs.whereistand.com/gethro/84

Keith Law is #1.


#27    Jim P      (see all posts) 2008/09/30 (Tue) @ 14:32

How about if you have every method go head-to-head against every other method on every team?  Whoever is closer on more teams gets a win (with a point differential if you want), whoever wins the most head-to-head contests is the best.  This would probably move Perfectly Balanced to the bottom.
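
A rough Python sketch of that round robin. In each pairing, the method that is closer on more teams wins the matchup. Teams where both methods miss by the same amount are skipped and drawn matchups count for neither side, which are assumptions the comment does not address. The names and data are hypothetical.

```python
from itertools import combinations

def head_to_head(forecasts, actual):
    """forecasts: {name: [wins per team]}. Returns {name: matchups won}."""
    wins = {name: 0 for name in forecasts}
    for a, b in combinations(forecasts, 2):
        closer_a = closer_b = 0
        for fa, fb, act in zip(forecasts[a], forecasts[b], actual):
            err_a, err_b = abs(fa - act), abs(fb - act)
            if err_a < err_b:
                closer_a += 1
            elif err_b < err_a:
                closer_b += 1
        if closer_a > closer_b:
            wins[a] += 1
        elif closer_b > closer_a:
            wins[b] += 1
    return wins

# Placeholder data for illustration only.
actual = [90, 70, 85, 75]
forecasts = {
    "Sharp": [88, 72, 84, 77],
    "Mild": [84, 78, 82, 80],
    "Flat": [81, 81, 81, 81],
}
print(head_to_head(forecasts, actual))
```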


#28    MGL      (see all posts) 2008/09/30 (Tue) @ 15:17

Keith Law is #1.

In other news, the winner of the 100 yard dash in this year’s Special Olympics was Carl Lewis, beating runner up Timmy Johnson by a mere 11.3 seconds.

With all due respect to Joe Sheehan and Rob Neyer.


#29    Tangotiger      (see all posts) 2008/10/02 (Thu) @ 13:44

Update from Vegas Watch:

http://vegaswatch.net/2008/09/evaluating-april-mlb-predictions-update.html

***

VW: if you can send me your datafiles, I’d love to take a look.  tom~tangotiger~net (replace the ~ with the appropriate character)

Thanks…


#30    MGL      (see all posts) 2008/10/02 (Thu) @ 14:44

His comments about Phillips were funny and perfectly appropriate.  (I actually don’t find him to be too bad on TV when commentating on a game.)

My guess is that GMs and managers would not do a whole lot better than Phillips.

While guys like Phillips may “know” a whole lot about the game, rational analysis (as a catch-all phrase) is just not one of their fortes.

There are different levels of “knowledge and understanding” when it comes to baseball.  There is that of the baseball insider (Phillips et al.) and that of the sabermetrician.  And never the twain shall meet.  I could easily say, “There are 2 kinds of people in the world...”

Which is why any team could benefit from both on their payroll.

