THE BOOK--Playing The Percentages In Baseball


Monday, March 19, 2007

MLB Odds

By Tangotiger, 11:17 AM

John Beamer checks in with his forecast for team wins, using the player forecasts of THT.  The first sanity check is…


Figure out the win distribution for the league.  Historically, the observed standard deviation of team winning percentage is a bit over .07 wins per game.  This implies a true rate of .06 wins per game.  That is, obs^2 = true^2 + random^2, where obs = .07 and random = sqrt(.5*.5/162) = .039, so true = sqrt(.07^2 - .039^2), or about .06.

So, .06 x 162 = 9.72.  We expect 1 SD = 9.72 wins.  The forecast shows 1 SD = 6.4.  In fact, if you were to just create a completely random league with each team having the exact same talent level of players, you would expect 1 SD = 162 * random = 6.4.
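The decomposition above can be checked numerically. This is a minimal sketch, assuming talent and luck are independent so that their variances add; the function names are illustrative and not from the original post.

```python
import math

GAMES = 162

def random_sd_per_game(games: int, p: float = 0.5) -> float:
    """Binomial SD of winning percentage for a team whose true talent is p."""
    return math.sqrt(p * (1 - p) / games)

def true_sd_per_game(observed_sd: float, games: int) -> float:
    """Back out the talent SD from obs^2 = true^2 + random^2."""
    rand = random_sd_per_game(games)
    return math.sqrt(observed_sd ** 2 - rand ** 2)

rand = random_sd_per_game(GAMES)
true = true_sd_per_game(0.07, GAMES)
print(f"random SD: {rand:.3f} per game, {rand * GAMES:.1f} wins")
print(f"true SD:   {true:.3f} per game, {true * GAMES:.1f} wins")
```

Carrying the unrounded true rate (about .058) through gives roughly 9.4 wins; the 9.72 in the post comes from rounding to .06 first.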

Something is terribly wrong here, and I think it’s because John says:

adjust the distribution a little so it agrees with the expected binomial distribution (it is debatable whether this is necessary but I unilaterally decided it was)

That was a terrible choice.

In 2006, the observed SD was .062, so if you want to argue that the talent level is more tightly distributed than historically, that’s ok with me.  But, you can’t possibly assume all teams have the same talent levels.  At the very least, make your estimate so that 1 SD = 8 wins.

I do look forward to this exciting nugget at the end:

Over the course of the season we’ll continue to follow how the odds of each team winning its division change. Also in the next month or two we’ll tackle a few related topics. One is our approach to calculating the odds of winning; two, is how our projected standings compare to prediction markets; and three is the accuracy of these projections.

#1    Tangotiger      (see all posts) 2007/03/19 (Mon) @ 11:44

Actually, what John did was worse than I thought, and how I explained it missed half the argument.

Let me restart.  If every team has exactly the same talent level, you *must* predict every team at 81-81 (1 SD = 0), even though you *know* that they will end up, after 162 games, with 1 SD = .039.

If you assume that the true distribution is 1 SD = .039 (which is just a pure coincidence that it matches the binomial), then you’d be correct in making that your forecast.  That is, if the true SD is .039, then the observed SD after 162 games will be sqrt(.039^2 + .039^2) * 162 = 8.9.
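Both cases can be illustrated with a small simulation. This is a rough sketch, not anyone's actual forecasting method: it assumes each team's games are independent coin flips at its true win rate, ignores the fact that real teams play each other, and uses many more teams than MLB has so that the estimate is stable. The function name and seed are arbitrary.

```python
import random
import statistics

GAMES = 162

def simulate_observed_sd(talent_sd: float, teams: int = 3000, seed: int = 0) -> float:
    """SD of season win totals when true win rates have the given spread."""
    rng = random.Random(seed)
    wins = []
    for _ in range(teams):
        p = rng.gauss(0.5, talent_sd)  # true win rate for this team
        wins.append(sum(rng.random() < p for _ in range(GAMES)))
    return statistics.pstdev(wins)

equal_talent = simulate_observed_sd(0.0)     # every team is a true .500 team
spread_talent = simulate_observed_sd(0.039)  # talent SD equal to the luck SD
print(f"equal talent:     {equal_talent:.1f} wins")
print(f"talent SD = .039: {spread_talent:.1f} wins")
```

With identical talent the spread in wins should land near 6.4, and with a talent SD of .039 it should land near 8.9, in line with the arithmetic above.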

So, choosing the binomial as the expected distribution was just a very convenient way to do something that simply had no place for it.

As MGL pointed out in the other thread I linked to, you also have other parameters, beyond the talent level, that feed into that .062, namely changes in talent level as the season goes on, parks, etc.  I doubt any of that really has much impact.

In any case, let the player forecasts determine the distribution level, and don’t try to match the binomial.


#2    HarryAbles      (see all posts) 2007/03/19 (Mon) @ 12:13

It’ll be nice if they track playoff odds the same way BP does.  The biggest discrepancies between the two sites are MIL, BAL, TB, CWS, and SEA.

Did you get the projection spreadsheets I sent you?  My e-mail can be weird.


#3    Tangotiger      (see all posts) 2007/03/19 (Mon) @ 12:23

Yes, thank you!  I’ll be merging all that with the Community Forecasts that I rolled out.


#4    tangotiger      (see all posts) 2007/03/19 (Mon) @ 16:29

This may seem like I’m belabouring the point:

SD = .5*sqrt(G), so that with 162 games, the SD is 6.4.

What if you had a 36 game schedule?  Now, 1 SD = 3 wins per 36 games or .083 wins per game.  And if you had a 400 game schedule?  1 SD = 10 wins per 400 games or .025 wins per game.

Why would your forecast for the Royals’ winning percentage be different if they played 36 or 400 games (with the same level of talent in either case)?  It shouldn’t.  If Santana is forecast for a .650 record, he’s forecast for a .650 record.  It’s not going to change based on the number of games played in the league.  (outside of the obvious human resting parameters, which is not the issue here)
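The schedule-length point can be put in a few lines. A minimal sketch, assuming a true .500 team; the function name is illustrative.

```python
import math

def luck_sd_wins(games: int) -> float:
    """SD of wins from luck alone for a true .500 team: .5 * sqrt(G)."""
    return 0.5 * math.sqrt(games)

for games in (36, 162, 400):
    sd_wins = luck_sd_wins(games)
    print(f"{games:>3} games: {sd_wins:4.1f} wins, {sd_wins / games:.3f} per game")
```

The luck SD per game shrinks as the schedule gets longer, while the team's true winning percentage is the same in every row. That is why a talent forecast should not be fitted to the binomial spread for one particular schedule length.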


#5    John Beamer      (see all posts) 2007/03/20 (Tue) @ 04:58

Tango,

First, thanks for posting to the article. I obviously incensed you (not intentionally, I assure you) given you had to post three times on the issue.

I think there has been a minor misunderstanding here, which is down to how we interpreted a critical sentence in my column—on reflection the language I used might not have been 100% clear (it obviously wasn’t given your posts)!

First, the projections are based on talent only. So they do not include luck (random variance). Therefore we’d expect the spread in the w/l record to be much greater if we included luck as well as talent.

When I say I adjusted the distribution so that it agreed with the expected binomial distribution this was so that var(talent) + var(luck) agreed with historical numbers. As var(luck) is fixed by the binomial then this means I had to adjust the talent level to fit the distribution I wanted (std dev 9.2 games over the course of a season or so). (I think the confusion arose because of how I penned the original sentence.)

Now you could argue that I shouldn’t have done that—and I accept that argument completely. I don’t think it’s wrong—but then again I’m not convinced it is right either.

The thing that is confusing is that after adjusting the talent the std dev of talent is very close to the std of luck.

I agree with MGL that there are other parameters—I didn’t take those into account.

To your last point, the forecast of a team’s winning percentage is independent of the number of games played.

Hope that has cleared one or two things up ...

Thanks
John


#6    Nate Silver      (see all posts) 2007/03/20 (Tue) @ 05:14

FWIW, it does seem like we have a relatively even distribution of talent this year.  There’s only one team in each league that’s really bad—the Royals and the Nationals.  And for my money, you’ve only got two real juggernauts: the Yankees and the Red Sox, and even those two teams are punting a position apiece (first base and closer, respectively—at least for now).

There are, of course, several other teams on the fringes of being really good.  The Twins could be really good if they did a better job picking up veteran spare parts or if they weren’t getting ready to do stupid things in the back end of their rotation.  The Mets could have been very good if they’d picked up an ace starting pitcher.  The Tigers if they’d picked up a good first baseman to go with Gary Sheffield.  And so on.  But these things didn’t happen. 

I suspect what you’re seeing happening is a gradual adaptation to the Wild Card structure, in which a lot of teams implicitly or explicitly target a number between 85-90 wins, which is probably pretty close to Pareto optimal from a macro perspective.  Of course, some of these teams undoubtedly overestimate their talent stock, but an 85-90 win target allows perhaps 2/3 of the teams to suspend their disbelief, and so you wind up with a huge cluster of teams between about 77 and 90 wins.  Under the old four-division format, teams had to be a bit more explicit about deciding whether they were going to try and contend or not, and that probably led to greater variance in team quality.

The THT projections do look a touch conservative to me (both on an individual and team level) but I don’t think this is a purely accidental result.


#7    tangotiger      (see all posts) 2007/03/20 (Tue) @ 08:01

John: ok, so the forecast is for var(observed), and not just for var(luck).  However, your distribution (1 SD = 6.4) implies that var(talent) = 0.  That’s obviously impossible, as long as the Yanks, Red Sox, Royals and Pirates play in MLB.

So, if you are adding in the luck (terrible choice, which we’ll see at the end of the season), you’ve got to get to the distribution you want to observe.  When was the last time an MLB season observed 1 SD = 6.4?  Last year was tight, and 1 SD = 10.1!

Nate does say that the talent distribution is tighter than usual, and I can accept that.  Over the last 45 years, the historical norm has been 1 SD = .060.  Last year, with a higher level of uncertainty, it was .048.

I think you, John, should present it both ways: with the luck factored into your forecast, and without.

I see no reason to introduce random noise into a forecast.  Otherwise, why not also introduce it into your player forecast, and get Ryan Howard to hit 55 HR?  You can’t have it both ways.  You can’t forecast players for their true talent levels, and forecast the teams for their talent level plus noise.


#8    John Beamer      (see all posts) 2007/03/20 (Tue) @ 09:03

No.

The forecast is for var(talent) adjusted so that var(observed) is in line with what we’d expect historically.

By happenstance var(talent) and var(luck) are similar (but are obviously distributed completely differently).

The only thing I use var(luck) for is to see how much I need to stretch out var(talent) to make it fit with var(observed).

Now, if what Nate said is correct and the talent distribution is tighter than usual then I shouldn’t have made the last adjustment, and I accept that. However, it makes a small difference. I’ll publish the unadjusted numbers later (work blocks Google docs).


#9          (see all posts) 2007/03/20 (Tue) @ 16:18

I’m just curious, but is fielding being double counted here? I know it’s fairly insignificant, but the THT book says that team fielding is included in pitching projections, and obviously that fielding is already included in the position players’ WAR.

Also, is there any benefit to using WAR (as opposed to using depth charts + baseruns to do RS/RA) other than reducing the number of calculations? I suspect - with no evidence - that projections would tend toward more accuracy if the replacement baseline was bypassed.


#10    tangotiger      (see all posts) 2007/03/21 (Wed) @ 23:00

I agree with Tom that there’s no reason to use replacement level here.  All we care about is total runs scored and allowed.

***

Fantastic work here:
http://yankeefan.blogspot.com/2007/03/2007-diamond-mind-projection-blowout.html

Here is the standard deviation for each system:
7.2 ZIPS
7.1 Chone
7.0 DMB
6.9 PECOTA

We know that luck is 6.4, therefore expect to see the following observed standard deviations this year:

9.6 ZIPS
9.5 Chone, DMB
9.4 PECOTA
6.4 Beamer (if I understand him right)

I’ll remind you that last year, MLB was 10.1. 

We probably want to add a few little wrinkles like injuries, trades, and actual changes in base talent level of the players themselves.  If let’s say we consider that to be 1 SD = 3 wins, then we get the following:
10.1 ZIPS
10.0 Chone
9.9 DMB
9.8 PECOTA

Seems like reasonable forecasts all-round.
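The figures above come from adding independent sources of spread in quadrature. A short sketch, assuming the three components (talent, luck, and the "wrinkles") are independent; the 3-win wrinkle figure is the illustrative value from the post, and the names are made up for this example.

```python
import math

LUCK_SD = 0.5 * math.sqrt(162)  # about 6.4 wins
WRINKLE_SD = 3.0                # injuries, trades, talent changes (illustrative)

talent_sd = {"ZIPS": 7.2, "Chone": 7.1, "DMB": 7.0, "PECOTA": 6.9}

def combine(*sds: float) -> float:
    """SD of a sum of independent components: sqrt of the summed variances."""
    return math.sqrt(sum(sd ** 2 for sd in sds))

for system, sd in talent_sd.items():
    luck_only = combine(sd, LUCK_SD)
    with_wrinkles = combine(sd, LUCK_SD, WRINKLE_SD)
    print(f"{system:<7} talent {sd:.1f} -> {luck_only:.1f} -> {with_wrinkles:.1f}")
```

Because the published talent SDs are already rounded to one decimal, the last digit may differ slightly from the numbers listed above.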

***

In none of the cases do the Royals, even by pure luck, make the playoffs.  They all agree that they are a 64 or 65 win team.  With 4000 sim runs, they are at best a 4000:1 shot of making the playoffs, and probably much worse.  With 8 playoff teams, and the Royals the worst of them, they probably have a 5% chance of winning the WS if they make it.  That puts their odds of winning the WS at 80,000:1 at best.  Since the 4000 sim runs top them out at 84 wins, they are in much worse shape than 4000:1 odds.


#11    David Gassko      (see all posts) 2007/03/21 (Wed) @ 23:35

I’m just curious, but is fielding being double counted here? I know it’s fairly insignificant, but the THT book says that team fielding is included in pitching projections, and obviously that fielding is already included in the position players’ WAR.

***

No, because fielding is not included in the pitching WAR calculations (neither is park or league). Those ARE all included in the projected numbers, however.


#12    John Beamer      (see all posts) 2007/03/21 (Wed) @ 23:37

Tango ... we still seem to be having a miscommunication! Oh, well. Let me put some figures down for you:

THT Unadjusted (ie, do not include my adjustment):

THT Talent std dev: 5.1
Luck std dev: 6.4
OVERALL EXPECTED STD DEV: 8.2

THT Adjusted (I made the adjustment outlined earlier):

THT Talent std dev: 6.3
Luck std dev: 6.4
OVERALL EXPECTED STD DEV: 9.0

This shows why I made the adjustment to var(talent). It is because I thought that the unadjusted figures were too tightly distributed. (As I said earlier, I understand the argument not to do that, but hey-ho)

If you add in your wrinkles, eg, injuries etc ... you probably get the THT number to a std dev of around 9.5 or so. We’re tighter than the rest but not by much.
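For reference, the adjusted and unadjusted figures above can be reproduced, and the stretch applied to the talent spread made explicit. This is only a sketch of the arithmetic, assuming independent talent and luck; it is not John's actual spreadsheet, and the function names are hypothetical.

```python
import math

LUCK_SD = 6.4  # wins, from the binomial over 162 games

def observed_sd(talent_sd: float, luck_sd: float = LUCK_SD) -> float:
    """Expected observed SD when talent and luck are independent."""
    return math.hypot(talent_sd, luck_sd)

def required_talent_sd(target_observed: float, luck_sd: float = LUCK_SD) -> float:
    """Talent SD needed to hit a target observed SD."""
    return math.sqrt(target_observed ** 2 - luck_sd ** 2)

unadjusted = observed_sd(5.1)
adjusted = observed_sd(6.3)
stretch = 6.3 / 5.1  # factor applied to each team's distance from 81 wins
print(f"unadjusted: {unadjusted:.1f}, adjusted: {adjusted:.1f}")
print(f"stretch factor: {stretch:.2f}")
print(f"talent SD needed for an observed 9.0: {required_talent_sd(9.0):.1f}")
```

Going from 5.1 to 6.3 means every team's projected distance from .500 was scaled up by roughly a quarter.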

By the way I agree with Tom too. The initial article (based on WAR) just grew from some very initial work that David Gassko did, which I added to—hence why it sort of stuck. I’ll get round to doing a BsR version this weekend and post the results here. Will be interesting to see the difference.

I can’t comment on Tom’s first comments as I wasn’t involved with pulling together the original projections.

Oh, and not that anyone cares but my wife has just gone into labor; it could be a long night ...


#13    tangotiger      (see all posts) 2007/03/22 (Thu) @ 00:16

Ok, now I follow you.

If THT does have a talent level of 5.1, while the other 4 are reporting 7.0, then it is too tightly bunched (for whatever reason… maybe too much regression at the player level?).  In any case, I understand why you’d want to stretch it out, but it still seems that you didn’t stretch it out enough.  However, I think you have to figure out why THT is so much lower than the rest. I would prefer no stretching, and instead look at the root cause.

***

If this is your first kid, say goodbye to your old life.  It doesn’t exist any more.


#14    John Beamer      (see all posts) 2007/03/22 (Thu) @ 00:28

I was loath to stretch it out too much. I’ll re-do it using BsR at some point this weekend and see where that gets us. I’ll post the results here when they are done. When we see the std. dev. of those we can take it from there ...

***

Yup—first kid. I feel hopelessly unprepared.


#15    Tangotiger      (see all posts) 2007/03/22 (Thu) @ 16:05

Continuing from post #10:

The biggest disagreements among the four:
DBacks: PECOTA 89 wins, DMB 78 wins
Cards: ZIPS 90, PECOTA 80
DRays: PECOTA 78, ZIPS 68
Jays: DMB 89, PECOTA 80
Nats: DMB 75, PECOTA 66

As you can tell, Chone is nowhere here.  In fact, if you take the difference between each forecast, and the group average, Chone is the closest to the group.  And PECOTA is the farthest.

Here’s the overall picture, sorted by the avg.

Chone  DMB  PECOTA  ZIPS  Avg   SD  Team
   95   97      94    92   95  2.2  NYA07
   92   87      92    87   89  3.1  Bos07
   92   88      90    88   89  1.7  Min07
   87   92      87    88   88  2.1  Cle07
   87   89      84    90   87  2.4  Det07
   86   85      88    87   86  1.2  Phi07
   86   88      86    85   86  1.4  SD07
   88   86      86    83   86  2.0  LAA07
   86   83      85    87   85  2.0  NYN07
   84   84      80    90   84  4.4  StL07
   83   85      82    87   84  2.4  Atl07
   84   83      85    86   84  1.2  ChN07
   81   89      80    84   83  4.1  Tor07
   83   78      89    83   83  4.5  Ari07
   81   84      80    87   83  3.0  Oak07
   82   82      81    86   83  2.0  LAN07
   81   80      84    79   81  2.5  Mil07
   82   81      81    76   80  2.6  Hou07
   80   77      80    79   79  1.6  Col07
   82   77      78    78   79  2.6  SF07
   79   77      81    78   79  1.6  Tex07
   77   78      73    77   76  2.0  Sea07
   76   77      74    77   76  1.6  ChA07
   72   75      75    79   75  2.8  Bal07
   75   73      77    72   74  2.2  Flo07
   74   72      76    72   73  1.8  Pit07
   73   76      74    72   73  1.7  Cin07
   71   69      78    68   71  4.3  Tam07
   69   75      66    71   70  3.8  Was07
   64   65      66    65   65  0.8  KC07


#16    HarryAbles      (see all posts) 2007/03/23 (Fri) @ 01:12

TAM and ARI among the highest SD makes sense with all the youngsters (sounds weird, Delmon’s got a year on me), and it’s interesting that everyone else backs up PECOTA’s view of CWS.  Great stuff.


#17    John Beamer      (see all posts) 2007/03/23 (Fri) @ 12:23

Tango

I found something similar at the player level as I wrote (scroll to bottom of each article)

http://www.hardballtimes.com/main/article/hitter-projections/

and

http://www.hardballtimes.com/main/article/2007-pitcher-projections/

My sample size is too small to read anything significant into the results, but they are nonetheless interesting.


#18    Tangotiger      (see all posts) 2007/03/30 (Fri) @ 13:37

Clay checks in with the postseason odds report:
http://www.baseballprospectus.com/statistics/ps_odds.php

1 million runs is better than 1 thousand runs.  Here we see the Royals, the worst team in the league, at a 3% chance of making the playoffs.  Presuming that such a team has a 5% chance of winning the World Series once in the playoffs, that gives them around 500:1 to 1000:1 odds of winning it all.

On the flip-side, the Yanks have a 54% chance of making the playoffs.  Presuming such a team has a 15% chance of winning the World Series, that makes it a bit worse than 10:1 odds of winning it all.

Given the choice between the Yanks and the Royals, the Yanks are around 50x more likely to win the World Series.
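The odds arithmetic in this comment, spelled out. The playoff chances are the ones quoted from Clay's report; the 5% and 15% conditional title chances are the presumptions stated above, not measured values, and the function names are illustrative.

```python
def title_probability(p_playoffs: float, p_title_given_playoffs: float) -> float:
    """Chance of winning it all: make the playoffs, then win once there."""
    return p_playoffs * p_title_given_playoffs

def odds_against(p: float) -> float:
    """Convert a probability into 'N:1 against' odds."""
    return (1 - p) / p

royals = title_probability(0.03, 0.05)
yankees = title_probability(0.54, 0.15)
print(f"Royals:  {royals:.4f}, about {odds_against(royals):.0f}:1")
print(f"Yankees: {yankees:.3f}, about {odds_against(yankees):.0f}:1")
print(f"Yankees are {yankees / royals:.0f}x as likely")
```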


#19    Tangotiger      (see all posts) 2007/03/30 (Fri) @ 14:00

Using the binomial distribution, setting the threshold of 85 wins as making the playoffs, and using the distribution of team wins listed in column 1, this is what I get as a theoretical model:

Col 1: Wins
Col 2: Chance of Making Playoffs
Col 3: Chance of Winning World Series, given in playoffs
Col 4: Chance of winning World Series: col2 x col3

87.0 0.594 0.144 0.086
86.5 0.563 0.141 0.080
86.0 0.532 0.139 0.074
85.5 0.501 0.137 0.068
85.0 0.469 0.134 0.063
84.5 0.438 0.132 0.058
84.0 0.407 0.130 0.053
83.5 0.377 0.127 0.048
83.0 0.347 0.125 0.043
82.5 0.319 0.123 0.039
82.0 0.291 0.121 0.035
81.6 0.270 0.119 0.032
81.3 0.255 0.117 0.030
81.1 0.245 0.117 0.029
81.0 0.240 0.116 0.028
81.0 0.240 0.116 0.028
80.9 0.235 0.116 0.027
80.7 0.225 0.115 0.026
80.4 0.211 0.114 0.024
80.0 0.194 0.112 0.022
79.5 0.173 0.110 0.019
79.0 0.153 0.108 0.017
78.5 0.136 0.106 0.014
78.0 0.119 0.104 0.012
77.5 0.104 0.102 0.011
77.0 0.091 0.100 0.009
76.5 0.078 0.098 0.008
76.0 0.068 0.096 0.006
75.5 0.058 0.094 0.005
75.0 0.049 0.092 0.005

As you can see, it matches decently with Clay’s work.

However, Clay’s high in team wins is 92 (+11 above average), while mine here is only 87 (+6 above average).  But then again, I compensate by lowering the threshold for making the playoffs (85 wins, or +4 above average).

If I used PECOTA’s win totals that Clay uses (and set the playoff threshold appropriately), the chance of the Yanks making the playoffs skyrockets.  Why is that?  Because in my theoretical model, I’ve got one league of 30 teams, as opposed to 6 divisions (of which the two best teams are in one of them).

So, if you want to be able to create something quick that matches Clay’s sim, that’s what you have to do.
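A quick version of the column 2 calculation might look like the following. It is a sketch under one assumption that the post does not spell out: that "making the playoffs" means finishing with more than 85 wins. That reading was chosen because it appears to land close to the figures in the table. Column 3 is not reproduced, since the post does not say how it was derived. The function name is hypothetical.

```python
from math import comb

GAMES = 162

def playoff_chance(expected_wins: float, threshold: int = 85) -> float:
    """P(wins > threshold) for a team whose true win rate is expected_wins / 162."""
    p = expected_wins / GAMES
    return sum(
        comb(GAMES, k) * p ** k * (1 - p) ** (GAMES - k)
        for k in range(threshold + 1, GAMES + 1)
    )

for wins in (87.0, 85.0, 81.0, 75.0):
    print(f"{wins:.1f} expected wins: {playoff_chance(wins):.3f}")
```

Each team is treated on its own here, with no schedule and no divisions, which is the same simplification as the one-league model described above.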


#20    tangotiger      (see all posts) 2007/04/02 (Mon) @ 11:43

This is the kind of thing that we’ve been talking about:
http://www.sciencedaily.com/releases/2007/03/070330185024.htm

The standard deviation of the forecasts is 10.7. 

My guess is that the professor did a true talent forecast, and then stretched out the win forecast to match the estimated observed.

The Yanks are forecast for 110 wins (!!), which is +29 wins above average.  If I reduce that +29 by 20%, that makes them +23, or 104 wins.  Repeating this step for all teams, I end up with a standard deviation of 8.5.  Add in the random standard deviation (6.4), and we get a standard deviation of 10.7.
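The reverse-engineering here takes only a few lines. This is a sketch of the guess described above, not the professor's actual method: shrinking every team's distance from 81 wins by 20% shrinks the SD by the same 20%, and luck is then added back in quadrature. The names are illustrative.

```python
import math

FORECAST_SD = 10.7  # SD of the published win forecasts
LUCK_SD = 6.4       # binomial luck over 162 games
SHRINK = 0.8        # reduce each team's distance from 81 wins by 20%

implied_talent_sd = SHRINK * FORECAST_SD
implied_observed_sd = math.hypot(implied_talent_sd, LUCK_SD)
yankees_talent_wins = 81 + SHRINK * 29  # forecast of 110 wins is +29

print(f"implied talent SD:   {implied_talent_sd:.2f}")
print(f"implied observed SD: {implied_observed_sd:.2f}")
print(f"Yankees talent estimate: {yankees_talent_wins:.0f} wins")
```

The implied talent SD comes out just under 8.6 rather than exactly 8.5, but the round trip back to about 10.7 holds.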

If this is what happened, this is the kind of mathematical gymnastics that I don’t like.  If this isn’t what happened, I’d like to place wagers with the professor.

