Wednesday, March 31, 2010
Poll: Chances for Redsox and Royals to surprise
Is the expected win probability a true (symmetrical) bell curve? Does an 81 true talent team have the same chances of winning 96 games as they do of winning 66 games? Does a true talent 50 win team have the same chances of winning 70 games as they do of winning 30 games? Just curious.
vr, Xei
It’s going to be very close to symmetrical.
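A rough way to check this is to treat a season as 162 independent games with a fixed win probability. That is a simplifying assumption, not something the post specifies, and the function names below are made up for illustration. Under that binomial model the 81-win team is exactly symmetrical, while the 50-win team is slightly lopsided, though both of its tails are tiny.

```python
from math import comb

GAMES = 162  # assumed season length

def win_pmf(wins, games, p):
    """Binomial probability of exactly `wins` wins in `games` games."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)

def prob_at_least(k, games, p):
    """P(wins >= k)."""
    return sum(win_pmf(w, games, p) for w in range(k, games + 1))

def prob_at_most(k, games, p):
    """P(wins <= k)."""
    return sum(win_pmf(w, games, p) for w in range(0, k + 1))

# True-talent 81-win team: 96+ wins versus 66 or fewer.
p_hi_81 = prob_at_least(96, GAMES, 81 / GAMES)
p_lo_81 = prob_at_most(66, GAMES, 81 / GAMES)

# True-talent 50-win team: 70+ wins versus 30 or fewer.
p_hi_50 = prob_at_least(70, GAMES, 50 / GAMES)
p_lo_50 = prob_at_most(30, GAMES, 50 / GAMES)

print(f"81-win team: P(>=96) = {p_hi_81:.5f}, P(<=66) = {p_lo_81:.5f}")
print(f"50-win team: P(>=70) = {p_hi_50:.6f}, P(<=30) = {p_lo_50:.6f}")
```

For the 50-win team the upper tail comes out somewhat larger than the lower one in this model, because a binomial with a win probability below .500 is skewed to the right.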
After 40 votes, it’s still the same results:
Redsox 92 wins as midpoint
Royals 72 wins as midpoint
Um, really? I don’t see how it would be symmetrical.
Regression to the mean, right?
If Josh Beckett is projected for a 3.75 ERA, does that mean he has an equal chance of putting up a 4.75 as a 2.75? I don’t think so.
If something changes in the true talent of a good player, isn’t it likely that it would be a BAD thing?
Heck, even for average MLB players… far more things (mostly injuries) will make them bad than will make them better.
Mike, we’re talking about team wins, not player ERA.
Just a suggestion—I know I’m a bit slow on the uptake, but I had to read the questions several times, and finally scroll down to your first comment to figure out how to vote properly (I hope!). Maybe I’m just particularly stupid (that certainly can’t be discounted), but a better explanation up top might help.
Matt K. I had to read it multiple times as well. I sorta think Tango wanted to obfuscate the question so that you didn’t feel like you were just predicting actual wins. SFWIW, saying “at most” and “at least” makes me think more than “no more than” and “no less than”.
What’s the difference between a player who’s projected to be above-average, and a team that’s projected to be above-average? Each of them has many smaller components that make up the whole. And again, as I see it, anything that occurs to change the projection is more likely to be something that lowers expected performance, right?
Matt, I, too, was perplexed by the wording. Would have favored something like “The Red Sox have as good a chance of winning 100 or more games as winning X or fewer games.” Royals version was even tougher: I don’t often think in terms of losses, just wins. For future reference, Tango.
Mike, it seems to me that it is less about actually changing the projection (i.e. changing underlying talent) than it is about realized performance around the projection (the error bars of the projection). Personally I see a difference between the two.
How about:
The REDSOX have as much chance at winning at least 100 games as they have at winning at most ___ games.
You guys can put up clearer questions as well, and we’ll go with the one that we can agree on.
I think #12 is easier to understand than what is posted up top. I’m not sure how I would phrase it.
The team you start the season with is not always the team you finish with. That’s what makes such projections difficult.
Health and injury can be unpredictable (not in JD’s or Daisuke’s case, but say a Pedroia going down would be unexpected). If a lot of things go right, the Red Sox could win 100 games, but if things go bad, well, just look at 2006.
I think you would be best to project 95+/- 5 for the Red Sox. Anything more or less would be surprising (improbable), although not impossible.
How about:
“The REDSOX are as likely to win MORE than 99 games as they are to win FEWER than ___ games.”
I was far more confused by the wording in the Royals question, FWIW
Isn’t there a selection bias on the voters in this blog? The readers of this blog aren’t what I’d call the average American baseball fan.
If Josh Beckett is projected for a 3.75 ERA, does that mean he has an equal chance of putting up a 4.75 as a 2.75? I don’t think so.
That’s what it SHOULD mean, yes. That’s the whole point of regressing a forecast to the mean - so that one SD above the forecast is as likely as one SD below. If that’s not the case, the forecast is not being regressed properly.
Colin #18
I think people get confused, because counting stats tend not to fall in a symmetrical curve. If someone is projected to hit 30 HR, he is much more likely to hit 0 than 60.
But, like you said, if the projection is regressed correctly, then RATE stats should be pretty much symmetrical.
At the team level, it ends up being symmetrical because with so many players, your chances of complete failure (replacement level) end up at basically zero, while there is a chance any one player ends up at this level. The Red Sox most likely have a better chance at being a 50 win team than a 130 win team (plane crash?), but both probabilities are so close to zero that it doesn’t matter.
KY,
Did you check the results of the Royals poll? If there was a bias the results should be a lot closer together.
In fact I don’t think the Royals poll tells us anything at all. The average of the poll is 99.5 with an SD of 4.6. If each choice was selected once you’d get an average of 99.5 with an SD of 5.2.
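The "each choice selected once" figures can be reproduced with a quick sketch. The poll's answer options are not listed here, so the range below is an assumption: 18 consecutive whole-number choices from 91 to 108 happen to give both quoted numbers.

```python
from statistics import mean, pstdev

# Hypothetical answer options: 18 consecutive integers, 91 through 108.
choices = list(range(91, 109))

uniform_mean = mean(choices)   # average if every option got exactly one vote
uniform_sd = pstdev(choices)   # population SD of that flat spread

print(uniform_mean)            # 99.5
print(round(uniform_sd, 1))    # 5.2
```

A poll SD of 4.6 against a flat-vote SD of about 5.2 is the comparison being made: the votes are only a little more concentrated than picking options evenly.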
For ERA, it’s a little bit more confusing. Had you said he is as likely to post a .270 wOBA as a .350 wOBA, around a mean of .310 wOBA, then yeah.
But those numbers translate to, say, 2.80, 3.75, and 4.95 ERA. That’s because of the multiplying nature of runs.
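One way to read "the multiplying nature of runs", using only the numbers in this comment: the wOBA steps are equal, the ERA steps are not, but the ERA ratios are nearly equal. The variable names are just for illustration.

```python
woba = [0.270, 0.310, 0.350]
era = [2.80, 3.75, 4.95]   # the rough ERA equivalents quoted above

# Equal steps on the wOBA scale...
woba_gaps = [woba[i + 1] - woba[i] for i in range(2)]
# ...become unequal steps on the ERA scale...
era_gaps = [era[i + 1] - era[i] for i in range(2)]
# ...but roughly equal multiplicative steps.
era_ratios = [era[i + 1] / era[i] for i in range(2)]

print([round(g, 3) for g in woba_gaps])   # [0.04, 0.04]
print([round(g, 2) for g in era_gaps])    # [0.95, 1.2]
print([round(r, 2) for r in era_ratios])  # [1.34, 1.32]
```

So a forecast that is symmetrical in wOBA is lopsided in ERA: the bad side sits further from 3.75 than the good side does.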
The issue at the team level is that the total number of wins (as opposed to hits/runs/etc.) is fixed. If you have 30 teams and 81 home games scheduled per team there are a total of 2430 wins and 2430 losses - you are absolutely set at one win per game.
So for the Red Sox to win, say, six more games than their projection, some combination of the other teams has to win six fewer games. That puts additional pressure toward the mean that you don’t get with individual players.
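The bookkeeping here can be sketched with a toy league. Everything below is an assumption made for illustration: the schedule is a made-up rotation, and every game is a coin flip. The only point is that total wins are pinned at 2430, so the teams' deviations from 81 wins always cancel out.

```python
import random

TEAMS = 30
HOME_GAMES = 81

def simulate_league(seed=0):
    """Return a list of win totals for one toy season of coin-flip games."""
    rng = random.Random(seed)
    wins = [0] * TEAMS
    for home in range(TEAMS):
        for g in range(HOME_GAMES):
            # Hypothetical schedule: rotate through the other 29 teams.
            away = (home + 1 + g % (TEAMS - 1)) % TEAMS
            winner = home if rng.random() < 0.5 else away
            wins[winner] += 1
    return wins

wins = simulate_league(seed=2010)
total_wins = sum(wins)
net_deviation = sum(w - 81 for w in wins)

print(total_wins)      # 2430: exactly one win per game
print(net_deviation)   # 0: every extra win is someone else's extra loss
```

This toy only shows the fixed total. It does not by itself say how much that constraint changes the spread of any single team's wins.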
Does it work the same with RA, or just ERA?
I should note that there is a technical name for what I described above, the hypergeometric distribution:
mathworld.wolfram.com/HypergeometricDistribution.html
You end up with a random variance for team wins that’s a bit less than what you’d expect if you simply assumed that everything was binomial.
"If something changes in the true talent of a good player, isn’t it likely that it would be a BAD thing?
Heck, even for average MLB players… far more things (mostly injuries) will make them bad than will make them better.”
Depends on how you came up with your team projection. If a team is truly a 90 win team, and the assumption is that that does not change, then the distribution of actual wins is going to be symmetrical.
If you assumed that everyone stays healthy, etc. then, of course fewer wins are more likely than greater wins. But that would be a bad projection.
You are supposed to already account for the chance of injury, etc., in your projection, even for a player projection, so that the chances of a player or team ultimately having a true talent better or worse than your projection are the same. Of course, a median and a mean projection are not necessarily the same, although that is more true for players than for teams. And yes, if your median and mean are not the same, that automatically means that your distribution of possible results is not symmetrical…
Right, ERA or RA… same deal.
Isn’t the best way to settle the question of whether the fans think the distribution is symmetrical to ask?
Have them assign a probability to each win total. Restrict it to units of X%, or group the wins, or both to make it manageable. Say, groups of 5 wins, and units of 5%.
You might still run into people not being able to follow directions, but it should answer the question.
Bill, I set up my polls so that the reader doesn’t have to think too much. If I did it your way, which is the correct way, I’d get one-third the voters, and I’d still get the same results.
I just find the way I do these polls, like with Lincecum and Strasburg, to convey the same thing, without making the reader sit down and come up with probability numbers.
You Blink, and you get an answer. It works, basically. That’s why I like it. And that no one else does it or thinks to ask a question in this way (forecasting the upper and lower boundaries) makes it the kind of quirky poll that I like.
Tango, have you played with the starting number to see if that affects the results at all? I wonder if the midpoint would move around much (or at all) if you had asked “The REDSOX are as likely to win MORE than 115 games as they are to win FEWER than ___ games.”
Just curious if that 100 figure is shaping people’s answers or not…
I wanted to use some reasonable number that was at 1 or 2 SD from the mean, so that I give people a choice of 10 or 12 answers. If I say 115, I’d have to give them numbers from 81 down to 60 or something.
After 20 votes, the Redsox are at 84.5. This means the chance that they will win at least 100 = chance they will win at most 84.5. That sets the midpoint at about 92 wins (halfway between 84.5 and 100).
The Royals are at 100 losses, setting the midpoint at 90.5 losses (or 71.5 wins).
It seems to me that fans have a good grasp of the role that probability plays. A fan may SAY that the Redsox are a 98-win team, but clearly he won’t believe that. Because if that’s true, then the chances that they win at least 100 is going to be the same as winning at most 96.
And the same for the Royals.
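The midpoint arithmetic above is just halfway between the two equally likely bounds. A small sketch, with helper names made up for illustration. The Royals' other bound of 81 losses is not stated in the post; it is what the quoted 90.5 midpoint implies.

```python
SEASON_GAMES = 162

def midpoint(bound_a, bound_b):
    """Halfway point between two bounds judged equally likely to be reached."""
    return (bound_a + bound_b) / 2

def mirror(mid, bound):
    """Reflect a bound across the midpoint to get its equally likely twin."""
    return 2 * mid - bound

redsox_mid = midpoint(100, 84.5)              # 92.25, i.e. about 92 wins
royals_loss_mid = midpoint(100, 81)           # 90.5 losses (81 is implied, not stated)
royals_win_mid = SEASON_GAMES - royals_loss_mid   # 71.5 wins

# If the Redsox really were a 98-win team, "at least 100" would pair with:
paired_bound = mirror(98, 100)                # 96

print(redsox_mid, royals_loss_mid, royals_win_mid, paired_bound)
```

The fans' answer of 84.5 sits well below 96, which is the sense in which they do not really believe in a 98-win midpoint.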
I think that if you do NOT ask the fans the question as I ask it, and instead just ask for one number of wins, they’re going to think “let’s see… some team is going to win 98-100 games, and I think the Redsox are the best team, so… Redsox at 99 wins”. They don’t really believe that, which is the point of these questions I ask.
This is just like the Strasburg v Lincecum poll. Yes, Strasburg can have a sub-2.50 ERA, and so can Lincecum. But, we’ll be far more shocked if Lincecum posts a 4.00 ERA than if Strasburg were to post a 4.00 ERA. And so, there’s no way you can call their midpoint the same.