Talent_Distribution
Friday, May 20, 2011
Matt compares the performance of players who played in both leagues in the same year. In this past decade, it looks like the hitters ended up with runs created about 7% lower relative to their AL peers than relative to their NL peers (meaning that the AL had the better hitters). 7% of about 4.5 runs is about 0.3 runs per game. We can figure that if you redid this with pitchers, you’d get something similar, maybe a bit less. Let’s say 0.2 runs per game. So, a team facing AL competition will end up with a 0.5 run disadvantage.
That’s pretty much what I’ve been using these past few years, that the AL team facing an NL team will have a .550 win% (i.e., +.05 wins, or +0.5 runs).
Friday, April 22, 2011
A few months back, I made the case that the wild card (i.e., the 4th team) was a perfectly reasonable thing to have in MLB. That’s because the wild card was, on average, better than the division leader of the worst division. The problem is that the 5th best team is not so lucky, and they would be, on average, worse than the division leader of the worst division.
Bringing in that 5th team will mean that the competition won’t be as strong overall. But, as Tommy reminds us, it’s not like the 162-game season is an accurate reflection of talent anyway. It’s a lot of luck. And so, what’s another team to add more luck?
We’re used to this in the NHL, seeing 16 teams. And if you followed the NHL playoffs this past week, you would realize that: WHO CARES! This is FANTASTIC hockey. This has been great. The action is non-stop, never a dull moment (a far cry from the regular season). And this is round #1 with 16 teams. Why in the world would I say: “You know what, give me LESS exciting hockey.”
The only reason you even have the thought is that you think that the regular season is supposed to “mean something” in terms of eventually winning the Stanley Cup. So, you think 8 teams is enough out of the 30 in the regular season to “earn” the playoffs.
Again, who cares who actually deserves a chance to win 16 games for the Stanley Cup. I’m perfectly happy having half the teams being given the chance, because they still have to win 16 games. Heck, I’d even put 24 teams in the playoffs. For teams 9 through 24, give them a best 2 of 3 (over 4 nights), with all the games played at the home of the better seeded team. The #1 through #8 are given a 5-day bye, well-rested, and ready to face whoever comes out of that play-in tournament. The best of all? More playoff hockey! And, by doing that, you are making it even more important to finish in the top 8 in the regular season.
You can even construct a system where all the teams make the playoffs, by tiering the teams so that the regular season means something, but you get playoff hockey.
And, the same applies in baseball. I’d love to have all 30 teams in there. How could that work? You take the bottom 12 teams, and give them a one-game play-in. After that one game, you are down to 24 teams. Then, the bottom 16 (of the 24) go into a best 2-of-3 play-in (all at the home of the better seeded team). These two play-in tournaments occur with no rest days. Now you’ve got 8 well-rested teams, and 8 teams that have played 2, 3, or 4 games. All the games of the third round are also played at the home field of the better seeded team, and you make it best 3 of 5.
After all that, you are now down to 8 teams.
I forgot to say: yes, I know I’m crazy to dare suggest that the virginal MLB playoffs be touched in any way. That any suggestion to change MLB is immediately met with resistance, regardless of how good the suggestion is. That inertia is really tradition, and we should never touch tradition. And how life was better in the old days. If you are of the inertia mindset, please, don’t post here. Do a google for Selig tradition, and post in that forum instead.
Monday, April 18, 2011
Matt gives us his write-up.
Friday, April 01, 2011
Someone supplied me with even strength shots faced and saves made for, well, I didn’t ask. It has 55 goalies, with an average of 2665 even strength shots faced, and Kipper leading at 4784. That’s probably about four seasons worth. Not important.
Step 1: Figure out how much one standard deviation is based on a binomial distribution. Vokoun faced 4249 shots, and the league average save percentage was .920, so one SD is sqrt (.92*.08/4249) = .0042.
Step 2: Figure out how much away you are from the mean. Vokoun’s save percentage was .931, and so was +.0108 from the mean.
Step 3: Figure out how many SD that is. .0108/.0042 = 2.59. That’s his z-score.
Step 4: Do it for all the goalies. (Thomas is 2.57, Luongo is 2.53… Holmqvist is -3.11, Raycroft is -2.34).
Step 5: Find the standard deviation of all the z-scores. In this case, for these 55 goalies, it’s 1.38.
Step 6: Rejoice if the number is substantially higher than 1.00. Happiness sets in at 1.10. You did good at 1.20. If you get 1.40, you’ve definitely found something.
Step 7: Figure out the average number of opportunities for each player. In this case, the average shots faced was 2665.
Step 8: Do this: 1 - 1/1.38^2 = 0.47. That’s your r or r-squared. (Longer story later. Just call it r for now.) That 1.38 was from Step 5.
Step 9: Do this: (1-r)/r * 2665. We get 2969. The 2665 is from Step 7.
That’s the key number. 2969. Let’s call it 3000. That’s how much you use to regress a goalie’s performance. You add 3000 shots of league average performance. So Vokoun’s 4249 shots at .931 save percentage gets added to 3000 shots at .920 save percentage for a best-estimate true talent level of .926. Holmqvist’s .900 with 1809 shots becomes .912. So, the observed difference between the two goalies (.031 saves per shot) becomes a true difference of .014.
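Here is a minimal Python sketch of the steps above, for anyone who wants to plug in their own numbers. The function names are mine, not from any library, and the 1.38 spread of z-scores is taken as given from Step 5 because the 55-goalie file isn't reproduced here. Expect small differences from the figures in the post (2.59, 2969), which were presumably computed from unrounded inputs.

```python
import math

LEAGUE_SV = 0.920  # league-average even-strength save percentage, from the post


def z_score(save_pct, shots, league=LEAGUE_SV):
    """Steps 1-3: distance from the league mean, in binomial standard deviations."""
    sd = math.sqrt(league * (1 - league) / shots)
    return (save_pct - league) / sd


def regression_shots(sd_of_z, avg_shots):
    """Steps 8-9: number of league-average shots to add to each goalie."""
    r = 1 - 1 / sd_of_z ** 2
    return (1 - r) / r * avg_shots


def true_talent(save_pct, shots, ballast, league=LEAGUE_SV):
    """Blend the observed record with `ballast` shots at the league average."""
    return (save_pct * shots + league * ballast) / (shots + ballast)


vokoun_z = z_score(0.931, 4249)
ballast = regression_shots(1.38, 2665)       # inputs from Steps 5 and 7
vokoun = true_talent(0.931, 4249, 3000)      # 3000 is the rounded ballast
holmqvist = true_talent(0.900, 1809, 3000)

print(round(vokoun_z, 2), round(ballast), round(vokoun, 3), round(holmqvist, 3))
```

The useful part is `true_talent`: once you have the ballast for your sample, every goalie gets the same number of league-average shots added, so the low-shot goalies get pulled toward the mean the most.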
A couple of important points:
1. This tells us how much talent there is FOR THE SAMPLE of goalies. Put in fewer goalies, the talent level will be tighter, and the regression will be higher. Put in more goalies, the talent level will be wider, and the regression will be smaller.
2. It’s best not to mix years without adjusting. As long as the seasons you are mixing have a fairly static talent level and league averages, then don’t worry.
Thursday, March 31, 2011
Poz’s list. The way to look at these lists is to ask:
Are his top 5 in my top 10?
Are his top 10 in my top 30?
Are his top 20 in my top 50?
Are his top 30 in my top 80?
Are his top 40 in my top 100?
Because that is really how the talent is spread out.
Wednesday, March 30, 2011
Bill said (and has said often in the past):
If we went from 30 teams to a mere 300, on the other hand, carefully managing the expansion, it would make no difference whatsoever in the quality of talent. That’s my view.
We can approximate this easily enough. Doing a back of the envelope calculation, the 750 best players would be at 4.30 standard deviations or better. In order to get 10 times that number, the standard deviation would be at 3.80. Creating a simple enough model, the talent level of the next best 6750 players is roughly two-thirds of the top 750.
Roughly speaking, the gap in talent level between the 75 best MLB players, and the next 675 best is roughly the same as the gap in talent between the top 750 and the next best 6750. So, expanding from 3 teams to 30 would have the same effect in terms of noticeable change in talent as would expanding from 30 to 300.
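The back-of-the-envelope part can be sketched in a few lines of Python. This assumes talent is normally distributed (my assumption for the sketch, and the helper names are hypothetical). It takes the 4.30 cutoff for the top 750 as given, backs out the implied talent pool, and then asks where the cutoff sits for 7500 players (300 teams) and for 75 players (3 teams). The cutoff for 7500 should land close to the 3.80 quoted above.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1


def pool_size(n_players, z_cutoff):
    """Population implied if the best `n_players` all sit above `z_cutoff` SDs."""
    return n_players / (1 - std_normal.cdf(z_cutoff))


def cutoff_for(n_players, population):
    """SD cutoff above which `n_players` out of `population` fall."""
    return std_normal.inv_cdf(1 - n_players / population)


population = pool_size(750, 4.30)            # 30 teams of 25
z_300_teams = cutoff_for(7500, population)   # 300 teams
z_3_teams = cutoff_for(75, population)       # 3 teams

print(round(population), round(z_300_teams, 2), round(z_3_teams, 2))
```

The step down in cutoff from 3 teams to 30 is about the same size as the step from 30 to 300, which is the point of the paragraph above.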
Monday, March 28, 2011
If I wanted to create a “true-talent” estimate of a past year, how would you suggest weighting the surrounding years? I.e., for 1994 I would weight ‘93 as 3, ‘94 as 5, ‘95 as 3… Or how would I go about answering my own question by analyzing the data?
When we normally do our best estimate for a player’s true talent level, it’s usually based on known past information. So, today, March 28, 2011, what is Justin Verlander’s true talent level? And so, we look at his career, we weight the more recent performances higher, we consider his past performance relative to his age, and we come up with a best estimate of his true talent.
We can figure out the weights because we can look at an unbiased measure of his performance, the out-of-sample performance in 2011, and correlate that performance to his past performance (after adjusting appropriately). We do this historically with all pitchers, and we come up with a weight that basically says to take 4 parts last year, 3 parts the year before that, 2 parts the year before that, and 1 part the year before that (or some other similar scheme).
But, the reader is asking, what if today is July 1, 2014, and with the benefit of hindsight, we ask what is our best estimate of Verlander’s true talent level on July 1, 2011.
Our problem is now we don’t have an unbiased data point to compare against. Could we simply extend the above scheme and say that the performance in 2011 will be weighted as 4, the performances in 2010 and 2012 (one year before and after) will be weighted as 3, the 2009 and 2013 performances will be weighted as 2, and 2008/2014 will be weighted as 1. And we adjust for age. And can we simply apply the standard regression amount based on playing time?
I don’t really have a good answer, and I’d like to hear from the straight arrows out there on presenting possible frameworks for discussion.
Sunday, March 20, 2011
Jeremy sent me his data. I separated the pitchers from nonpitchers.
For the nonpitchers, taking the top 500 players in each year, I get a best-fit of:
200 / (order + 36)
That minimizes the RMSE to 0.69.
So, for the #1 pick, that sets his forecast WARP to 5.41. The #2 is at 5.27. The #500 pick is at 0.35.
For pitchers, taking the top 400 pitchers, I get a best-fit of:
204 / (order + 40)
The RMSE is 0.60.
#1 pick is 4.98.
This seems to suggest that the level of uncertainty for pitchers and nonpitchers is about the same.
***
Keeping all the players together, I get a best-fit of:
400 / (order + 71)
RMSE is 0.65
#1 pick is 5.53. #900 is 0.41.
If we set the #1 player at 100%, then we get this:
100%: #1
75%: #25
50%: #73
25%: #218
As noted, this means trading a 75 percentile and a 25 percentile player for two 50 percentile players is a fair trade.
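A small Python sketch of the combined curve, using the rounded coefficients 400 and 71 quoted above (the function names are mine). With rounded coefficients the slots won't match the list exactly; the 25% slot comes out within one pick of #218.

```python
def forecast_warp(order, a=400.0, b=71.0):
    """Forecast WARP for the player taken at draft slot `order`."""
    return a / (order + b)


def slot_at_fraction(fraction, b=71.0):
    """Draft slot whose forecast is `fraction` of the #1 pick's forecast."""
    return (1 + b) / fraction - b


slots = {f: round(slot_at_fraction(f)) for f in (1.0, 0.75, 0.50, 0.25)}

# The "fair trade" check: a 75% player plus a 25% player vs. two 50% players.
one_high_one_low = forecast_warp(slots[0.75]) + forecast_warp(slots[0.25])
two_middle = 2 * forecast_warp(slots[0.50])

print(slots, round(one_high_one_low, 2), round(two_middle, 2))
```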
***
As I noted here last week, with regards to the Amateur draft:
So, a #1 pick is at 100%, a #8 pick is at 50%. A #3/#4 pick is at 75%. A #22 pick is at 25%.
The curve is far steeper with the amateur players, which is no surprise. This means there’s a huge dropoff in talent in the amateur ranks, from slot to slot, compared to the pros.
However, it’s not a fair comparison, because in the pros, you’ve got players aged from 21 to 41, while the amateurs being drafted are from a much more limited age class (18-21, and not already signed).
***
In order to make a fairer comparison, the pros should be limited to a more limited age group like the amateurs.
Anyway, fun stuff, and thanks to Jeremy for the idea, inspiration and getting the data.
Friday, March 18, 2011
Now this is a great idea. Rather than drafting HS and college players, and finding out how much WARP each draft pick generates historically, Jeremy does the seemingly simple thing of drafting MLB players and finding out how much WARP you get. Just a lovely idea.
As you know, the head-to-head matchups of AL v NL in MLB have shown a decided advantage for the AL teams. Quite stunningly high, if I cherry-pick since 2005: AL teams are .561 against NL teams. If I regress a little, I typically treat the average AL team as a .525 team and the average NL team as a .475 team, so that when they come head-to-head, the AL team would win .550.
By the way, since 2005, the KC Royals are 58-50 against NL teams. Only 5 teams in the AL have a sub-.500 record against the NL, with the lowest being the Indians at .444. Since 2005, THIRTEEN of the 16 NL teams have a sub-.500 record against the AL, with just the Rox, Cards, and Marlins as above .500. At the bottom are the Pirates at .333 against the AL. HALF of the NL teams have a worse record than the Indians (who are last in the AL). A truly horrible setup.
Gabe gives us the numbers in the NHL. Since the lockout, the Western Conference scores 51.64% of all the goals. To convert that to a win%, you simply do: 2*goalRate - .5. So, scoring 51.64% of goals means winning 53.28% of all games. This 53% figure is nowhere close to being MLB bad (56%).
What are the numbers in NBA and NFL over the same last six years?
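For anyone doing the legwork on the other leagues, the two conversions used above can be sketched in Python. The goal-share formula is the one given in the post. For the head-to-head number, the post doesn't say how .525 and .475 combine into .550, so the odds-ratio (log5) method here is my assumption; it is one standard way to do it and it reproduces the .550.

```python
def head_to_head(p_a, p_b):
    """Odds-ratio (log5) chance that a team with win% p_a beats one with p_b."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


def win_pct_from_goal_share(goal_share):
    """The post's conversion: win% = 2 * goalRate - .5."""
    return 2 * goal_share - 0.5


al_vs_nl = head_to_head(0.525, 0.475)
western_conf = win_pct_from_goal_share(0.5164)

print(round(al_vs_nl, 3), round(western_conf, 4))
```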
Wednesday, March 16, 2011
Very interesting, and it seems to fly in the face of what I remember reading from The Hidden Game:
***
Side note: I presume that ERA+ was averaged improperly, only because I’m cynical about it and experience has shown me that more than 50% get it wrong. The median, however, is the median.
Friday, March 11, 2011
One way to figure out how difficult it is to field a position is to see how well a player does relative to his positional peers, at various positions.
For example:
Thursday, March 10, 2011
Good stuff from Jeff (and MGL?). I’m not sure that setting the zero-point at age 20 was appropriate however, as the reader is going to make a conclusion that the data doesn’t necessarily support, not to mention that data at that low age has a high uncertainty level. Zero-ing it out at around age 23 would probably be clearer.
Tuesday, March 08, 2011
Kahrl makes the case that there’s been a large shift toward catchers who can hit. The reason could be that teams are trading defense for offense: with the running game down, teams shift toward catchers who can hit. Or it can simply be cyclical. Or teams have become more efficient at finding catchers who are better overall than they have been in the past.
Monday, February 28, 2011
Great charts from Hawerchuk.
***
Side note: all other things equal, a 210 lbs player is preferred to a 180 lbs player.
Speed and strength are much better these days. The “little things” that the oldtimers apparently were so good at.... well, I doubt that’s true, considering how much those guys caroused (drinking, smoking) compared to players today.
It’s a fool’s journey to try to compare players more than one generation apart. We can be foolish to try, but don’t remain the fool by continuing to try.
Thursday, February 24, 2011
Both! It all depends on where you set the PA threshold.
All data is from Stanley Cup seasons 1987-2010, when the NHL had 4 rounds of best-of-7. Excluded is the 1995 Cup 48-game season (and naturally the lockout year). That gives us 22 Stanley Cups.
In each season, I ranked the teams by points (1 through 30). In all 22 seasons, the Stanley Cup was won by a team that finished in the top 8 in the standings. The team that led the league in points won 7 of the 22 Cups, or 31.8%.
Let’s focus on the teams that led the league. Here’s how many rounds they won:
4: 7 times (and won the Cup)
3: 1 time (lost in Finals)
2: 5 times (lost in Final 4)
1: 4 times (lost in Final 8)
0: 5 times (eliminated first round)
Boring stuff to skip
Suppose the #1 team has a 75.1% chance of winning each series. The chance of them winning 4 consecutive series is 75.1% ^ 4 = 31.8%. So, given 22 seasons, we’d expect 7 Stanley Cups, just like it really happened.
The #1 team made the Cup finals 8 of 22 seasons (or 36.4%). In order for that to happen, they’d have to have a chance of winning each series 71.4%.
The #1 team reached the Final Four 13 of 22 seasons (or 59.1%). In order for that to happen, they’d have to win each of those two series 76.9% of the time.
The #1 team won at least one series 17 of 22 seasons (or 77.3%). That is its chance of winning that first series.
Putting it together, the #1 team has around a 77% chance of winning each series.
A three-round Cup run
If the NHL was down to a 3-series Cup run (8 teams, or 12 teams with byes for the top 4), the #1 team would need to win 3 series of 4-of-7. That would imply they’d win the Cup .77^3 = 46% of the time according to the boring stuff above.
An alternative way to look at it is that the 17 times that the #1 team entered the playoffs in the Final 8 round, they won 7 times. 7/17 = 41%.
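The boring stuff can be checked with a short Python sketch. The counts are the ones listed above for the 22 league-leading teams; the function names are mine. The idea is that if the #1 team wins each series with the same chance p, then the share of seasons in which it wins at least n rounds is p^n, so p is the n-th root of that share.

```python
SEASONS = 22

# rounds won by the league-leading team -> number of seasons (from the list above)
rounds_won = {4: 7, 3: 1, 2: 5, 1: 4, 0: 5}


def reached_at_least(n_rounds):
    """Seasons in which the #1 team won at least `n_rounds` series."""
    return sum(count for won, count in rounds_won.items() if won >= n_rounds)


def implied_series_pct(n_rounds):
    """Per-series win chance p such that p ** n_rounds matches the observed share."""
    share = reached_at_least(n_rounds) / SEASONS
    return share ** (1 / n_rounds)


implied = {n: implied_series_pct(n) for n in (4, 3, 2, 1)}
three_round_cup = 0.77 ** 3                   # Cup chance in a 3-series format
from_final_8 = 7 / reached_at_least(1)        # the alternative: 7 Cups in 17 tries

for n, p in implied.items():
    print(n, round(p, 3))
print(round(three_round_cup, 2), round(from_final_8, 2))
```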
Conclusion
An NHL fan is suggesting that the #1 team should win the Cup 32% of the time if they want a 16-team playoffs, and 41% of the time if they want an 8- or 12-team playoffs.
My question: how many MLB teams would you need in the playoffs in order to get a 32% or 41% chance of the #1 team in MLB of winning the World Series? How many in the NBA? The NFL?
Who wants to do the legwork on presenting the matching data for these sports?
Wednesday, February 23, 2011
As you know, the MLB 162-game regular season schedule is equivalent to the NHL 82-game schedule in terms of how the final standings reflect the true talent of their teams.
Since MLB has never tested various forms of playoffs in terms of having “not enough” or “too many” teams, I asked NHL fans what they think. And their preference is to have 12 teams (out of 30) make the playoffs.
The likely structure would be to give the top 4 teams a bye, and the next 8 teams play in an elimination round. Though I didn’t ask the question, it also seems unlikely that they’d want the elimination round to be a best-of-7 that could last 10-14 days. Presumably a one-week bye is the most that would be supported, and so a best-of-5, with limited off days would be the likely alternative (and extra bonus for the bye teams). Anyway, this means that the top 4 teams need to win 12 games for the Cup, and the bottom 8 teams need to win probably 15 games.
Would this apply directly to MLB? Now, don’t forget that one NHL game tells you as much as two MLB games. Therefore, in order to have 12 teams make the MLB playoffs, you would need the teams to win 24 (or 30) games for the World Series. This basically is to counteract the extra luck that you have in MLB compared to NHL. In order to reduce this aspect of luck, you need to reduce the number of teams in MLB playoffs.
If we treat the NHL 12-team as the model, then I suspect an 8-team playoff (12 wins for the Cup) is still too many teams (or too few games).
The NHL and MLB regular seasons each last six months. The NHL playoffs with 12 teams would last up to 7 weeks, compared to the 4-5 weeks that the MLB playoffs currently run. In order to get to the ideal point, you either need to make each playoff series a best of 9 (or even best of 11) to reduce the luck, or you need to have fewer than 8 teams make the playoffs.
I’m throwing the challenge out there. If the proposed NHL scenario of 12 teams is ideal (as outlined above), how often will the top 2 regular season teams win the Stanley Cup? In order to match that, what kind of MLB playoff scenario do you need (changing the number of teams and/or number of games)?
Tuesday, February 22, 2011
I dunno, Poz. It seems to me this is confirmation bias.
You look at when a player’s career ends, and then decide that he couldn’t beat father time that year. Well, what about when he did beat father time for all the years leading up to that point? Didn’t Gordie Howe beat father time when at the age of 46 his team won the WHA cup, to finish second in playoff scoring on his team behind his own 19-yr old son (the great and brutally underappreciated Mark Howe)? Even in his last season, Gordie Howe played EVERY GAME just as he was turning 52 years old.
What about the recreation of Carlton Fisk or Paul Molitor? Who would have believed these guys would play as long and as effectively as they did?
The proper way to do this is to look at a particular age, and say: ok, this is where the test will happen. You set up the test without knowing the end result first.
But, no, I don’t believe that a player will simply “fall off”, that he simply gets to a point where he goes from being above average to below replacement in one season. It only looks like that, because old guys are not given a chance to show that the observation just happened to have disproportionately bad luck.
To be sure, an old guy ages much worse and the older he gets the worse he ages. But, it’s not a sudden fall.
Monday, February 14, 2011
Here is what I don’t get about WAR. Suppose I have two catchers. My starter is +5 wins/600 PA and my backup is +1 win/600pa. I plan on my starter getting 600 PAs and my backup 100. My expected value from the catcher position is thus 5.17.
Now, if my starting catcher was to be hurt before the year, I would now have (to make it consistent) 600 PAs of my backup catcher, and 100 of my replacement-level catcher (we’ll call him +0 wins/600 PA). So what do I have? I have a catcher position worth 1 win.
So, by losing my starting catcher (valued at +5 wins), I actually lose 4.17 wins.
Do you follow? If his true value over replacement was 5 wins, I would lose 5 wins by replacing him, but in fact I don’t, because the backup (by way of being on the 25-man roster instead of being a freely available player) is actually better than replacement level. So, I guess the question is, should WAR be adjusted to accommodate that most players aren’t replaced by a minor leaguer, but rather by a guy off the bench?
Suppose your first catcher is +3 wins above average per 600 PA, and the second catcher is -1 wins relative to average per 600 PA. You give the first guy 600 PA and the second one 100 PA and so this team is +3 -1/6 = +2.83 wins above average.
Now, your main guy goes down, and the backup at -2 wins relative to average per 600 PA comes up. Now you’ve got -1 -2/6 = -1.33 wins relative to average.
Instead, your backup goes down, and his backup takes his place. We have +3 -2/6 = +2.67 wins above average.
What do we have? If the main guy goes down, the team goes from +2.83 to -1.33, or a change of 4.17 wins.
If the backup goes down, the team goes from +2.83 to 2.67 or a change of 0.17 wins.
How do we want to represent that first catcher? +3 wins per 600 PA above average? +4.17 wins above the chained replacement? +5 wins per 600 PA above the “minor league callup”? How do we want to represent his backup (-1 relative to average, +0.17 relative to chained replacement, +1 wins above minor league callup) ? And what of the callup?
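The chaining arithmetic in the example above can be written out as a small Python sketch. The rates are the ones from the example (wins relative to average per 600 PA); the function and variable names are mine, and the 600/100 PA split is the one assumed in the example.

```python
FULL_SEASON_PA = 600


def position_value(depth_chart):
    """Wins relative to average from (rate per 600 PA, planned PA) pairs."""
    return sum(rate * pa / FULL_SEASON_PA for rate, pa in depth_chart)


STARTER, BACKUP, CALLUP = 3.0, -1.0, -2.0   # wins vs. average per 600 PA

healthy = position_value([(STARTER, 600), (BACKUP, 100)])
starter_out = position_value([(BACKUP, 600), (CALLUP, 100)])
backup_out = position_value([(STARTER, 600), (CALLUP, 100)])

# Value of each catcher above his chained replacement
starter_vs_chain = healthy - starter_out
backup_vs_chain = healthy - backup_out

print(round(healthy, 2), round(starter_out, 2), round(backup_out, 2))
print(round(starter_vs_chain, 2), round(backup_vs_chain, 2))
```

Change the planned PA or the callup's rate and the chained numbers move, while the relative-to-average rates stay put. That is the playing time issue in a nutshell.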
Don’t use replacement level unless you know what you are doing with regards to handling the playing time issue and roster management.
What you need is performance relative to average. Use that. But at some point, you are going to need playing time. The moment you need playing time, you need to stop and figure out how things work. You’re going to end up with some sort of replacement-level model.
What replacement level gives us is a very useful shortcut. It gets us to where we want to go. It sets the lower boundaries to match the salary of the players (league minimum). It lets us do comparisons simply and quickly.
But we can “break” replacement-level by creating various scenarios. That just means you haven’t appreciated replacement level well-enough.
In the above example, we can call our player +4.17 wins per 600 PA above the chained replacement. But the chained replacement is not going to cost you the league minimum! We want to set it so that the zero point is the league minimum.