Statistical_Theory
Wednesday, May 09, 2012
Phil tries to reason out “confrontations”, but I think he’s wrong.
His example of “not counting” half the goals basically CHANGES the confrontations. When I talk about “confrontations”, I mean “quality of confrontation”. Someone else proposed a better name, which is scoring chances.
In the NHL today, there are as many shots taken on even-strength as not. But the quality of those scoring chances is quite different. You score twice as much on a PP, and half to a third as much on a PK. On average therefore, more goals are scored when it’s not even-strength than when it is, even if the NUMBER of confrontations remains relatively stable. What counts is the quality of confrontation, or the scoring chance.
Holding all other things CONSTANT, the greater the number of confrontations, then the larger the home field advantage.
Holding all other things CONSTANT, the greater the chance of scoring (holding # of confrontations constant), then the larger the home field advantage. In this respect, Phil’s argument is fine.
Monday, April 23, 2012
Suppose you have a team that scores 6 runs per game and allows 3 runs per game. There’s a second team that scores 4 runs per game and allows 2 runs per game.
What is the runs per game expectation for each team, if they face each other? It’s identical. The ratio of runs per game is what we care about. So, a 2:1 ratio for one team and a 2:1 ratio for the other team will result in a 1:1 ratio when these two teams face each other.
However, if we only focused on win%, the 6:3 team has a win% of .783. And the 4:2 team has a win% of .759. And when a .783 team faces a .759 team, the odds ratio would suggest that the 6:3 team will win 53.4% of the time. But, the expected result should actually be .500.
In baseball, the relationship of runs to wins is (roughly) to square the run ratio. But, the relationship of head-to-head scoring is the simple proportion.
This is one of those little quirks in the odds ratio that doesn’t work: baseball’s scoring distribution prevents the full use of the odds ratio method. Just because the odds ratio method does work in the typical cases, it doesn’t mean we can apply it in all cases.
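The numbers in this post can be reproduced with a short sketch. The exponent rule (runs per game raised to the 0.28 power) is an assumption: it is a Pythagenpat-style exponent that happens to return the .783 and .759 figures quoted above. The function names are made up for illustration.

```python
def pyth_win_pct(rs, ra, k=0.28):
    """Win% from runs scored/allowed per game.

    The exponent (rs + ra) ** k is an assumed Pythagenpat-style rule.
    """
    x = (rs + ra) ** k
    return rs ** x / (rs ** x + ra ** x)


def odds_ratio_matchup(p_a, p_b):
    """Odds-ratio estimate of team A beating team B, from league win%."""
    odds = (p_a / (1 - p_a)) / (p_b / (1 - p_b))
    return odds / (1 + odds)


def run_ratio_matchup(rs_a, ra_a, rs_b, ra_b, exponent=2.0):
    """Head-to-head estimate from the simple proportion of the run ratios."""
    ratio = (rs_a / ra_a) / (rs_b / ra_b)
    return ratio ** exponent / (1 + ratio ** exponent)


team_63 = pyth_win_pct(6, 3)   # about .783
team_42 = pyth_win_pct(4, 2)   # about .759
print(round(team_63, 3), round(team_42, 3))
print(round(odds_ratio_matchup(team_63, team_42), 3))  # about .534
print(run_ratio_matchup(6, 3, 4, 2))                   # 0.5
```

Both teams have a 2:1 run ratio, so the run-ratio route gives .500 whatever exponent you pick, while the odds ratio on win% carries along the difference in run environment.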
Friday, April 20, 2012
I’ve already offered in the past that to figure out a team’s rest of season record, just apply regression toward the mean: add 35 wins and 35 losses to whatever record you happen to have. This article makes two misses:
However, one thing is evident: April is not just any other month.
But you can’t tell by the data presented. Indeed, I have no doubt that we’d see the same results if we looked at May or June records.
The R-squared in the model—a statistical measure that describes how well the fluctuations of the response variable (in this case, end-of-season winning percentage) are described by the corresponding changes in the explanatory variable (winning percentage in April)— was 0.257. This means that 25.7 percent of the variation in end-of-season winning percentage can be explained by teams’ April winning percentage.
It’s an interesting finding, since the average team played 26 games in April, or only 16.0 percent of its 162-game schedule.
Since the end-of-season record INCLUDES the April record, we expect to see some relationship, even if the rest-of-season was random. The relationship you should show should be April to rest-of-season, and not April to April-plus-rest-of-season.
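To put a number on that, here is a minimal sketch under a strong assumption: every team is a true .500 team and every game is an independent coin flip, so April tells you nothing about the rest of the season. Even then, April correlates with the full season simply because it is part of it. The function name is hypothetical.

```python
def r2_if_rest_is_random(april_games=26, season_games=162):
    """R-squared of April win% vs. full-season win% when games are independent.

    Season wins = April wins + rest-of-season wins, with the two parts independent.
    cov(April, season) = var(April), so r^2 = var(April) / var(season).
    With equal per-game variance, that is april_games / season_games.
    """
    return april_games / season_games


print(round(r2_if_rest_is_random(), 3))  # 0.16
```

Under those assumptions the baseline R-squared is just April's share of the schedule, so any reported figure has to be read against that baseline rather than against zero.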
Anyway, I’d like to see two things:
1. Is April any different than the other months (lesser number of games notwithstanding)?
2. Can we simply apply regression toward the mean at any point in the season? That is, add 35 wins and 35 losses to whatever record you happen to have.
A very related article by John Dewan here.
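The "add 35 wins and 35 losses" rule is easy to sketch. The 15-5 start below is a made-up record for illustration, and the function names are hypothetical.

```python
def regressed_win_pct(wins, losses, prior_wins=35, prior_losses=35):
    """Estimate of true win% after adding 35 wins and 35 losses to the record."""
    return (wins + prior_wins) / (wins + losses + prior_wins + prior_losses)


def projected_final_wins(wins, losses, season_games=162):
    """Wins already banked plus regressed win% applied to the remaining games."""
    remaining = season_games - wins - losses
    return wins + remaining * regressed_win_pct(wins, losses)


# Hypothetical team that starts 15-5.
print(round(regressed_win_pct(15, 5), 3))     # (15 + 35) / (20 + 70) = 0.556
print(round(projected_final_wins(15, 5), 1))  # 15 + 142 * 0.556 = 93.9
```

Whether 35 is the right amount at every point in the season is exactly the second open question above; the sketch only shows the mechanics.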
Tuesday, April 10, 2012
Phil is at it again, with some good analogies.
Tuesday, April 03, 2012
This is just a crazy thought I had. I haven’t tested it. Maybe someone out there can run the numbers. You get 9 points for a win, and then one bonus point for every run differential up to a maximum of 6 bonus points. So, win by 1, and you get 10 points. Win by 6 (or more), and you get 15 points. So, if you win three games by 1 run, or you win two by 6, and lose the third game, you get 30 points either way.
There are three points to consider when constructing this:
1. How many points for a win, your base level (*)
2. What score differential to cap?
3. Do you give points for losing close games, or do you get zero points whether you lose by 1 or 6?
(*) In my case, it is 9, which is a pleasant number as far as baseball is concerned. I kind of did a trial and error to see what numbers would seem reasonable, and once I saw that I was coalescing toward 9, I decided that’s a good enough number. My guess is that you can do this for any sport, and set the number as the average number of points per game. In baseball, it’s about 9 runs per game (4.5 for each team). I’d bet you can do the same thing in NHL and use 6 as the base level, and then cap it off at half of that, so that the range in NHL would be 7 to 10 or 11. NFL is probably 42 points as the base, so the range would be say 43 to 63 (maximum differential of 21 points). NBA? I dunno, say have a base of 200 points, with a range of 201 to 220 or something.
So, I’d like to see a bit of work from the Straight Arrow readers, if you like to play around with stuff like this.
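For anyone who wants to run the numbers, a minimal sketch of the scoring rule follows. It takes the simplest answer to the third question (a loss is worth zero regardless of margin), which is an assumption, and the function name is made up.

```python
def game_points(run_diff, base=9, cap=6):
    """Points for one game: base for a win, plus one per run of margin up to cap.

    run_diff > 0 is a win. Losses score 0 here, which is only one of the options.
    """
    if run_diff <= 0:
        return 0
    return base + min(run_diff, cap)


three_close_wins = sum(game_points(d) for d in (1, 1, 1))
two_blowouts_and_a_loss = sum(game_points(d) for d in (6, 6, -1))
print(three_close_wins, two_blowouts_and_a_loss)  # 30 30
```

Changing `base` and `cap` gives the variants suggested for the other sports.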
Monday, April 02, 2012
Patriot applies Bayes.
Friday, March 30, 2012
Some Millsy news:
http://princeofslides.blogspot.com/2012/03/wolverine-grows-some-scales.html
He notes he worked with Rodney Fort, which reminded me of this thread.
Thursday, March 15, 2012
I can just as easily say:
My problem is that the people evaluating statistics in sports are not experts in sports.
***
Given the choice, I’d rather listen to a ballplayer or a scout that knows nothing about statistics, than a statistician that knows nothing about sports. Ballplayers know instinctively, through experience, how to play. They know where to set themselves up on the field, based on the score, the base-out situation. They know how to pitch. They know this far far more than a statistician.
Statisticians are not experts in sports, and the media is not an expert in sports. Players and scouts ARE experts in sports. Those people know what they are talking about. You can incorporate their knowledge into your model, so that your model can reflect reality. And, you can point out any possible shortcomings the players may have. But, these are small shifts, not seismic changes.
I couldn’t apply what I do to cricket or rugby until I’d be able to really understand those sports from a participant (player) view point. You have to be a subject matter expert first, before you can apply your technical skills.
I’m going to pick on this article, but the author shouldn’t feel bad, because really really smart people do the exact same thing, like Michael Schell, and SABRMatt (who I think was inspired by Michael).
Basically, what you have is two distributions, from two different eras, against different competition, and the number of players is also different. So, the natural tendency is to “normalize” everything, so you have the same mean and the same standard deviation. And then, voila, you can make a comparison.
This presupposes the two important points:
1. The mean of both distributions must be equal
2. The spread in talent of both distributions must be equal
This is the entire point of normalizing. But before we go ahead and ensure the above happens, you have to ask if you actually WANT that to happen.
If you are trying to ask: “Did Babe Ruth dominate his era more than Barry Bonds did?”, then, yes, standardizing in this manner is perfectly reasonable. That is NOT the same thing as asking “Was Babe better than Bonds?”.
For example, let’s say that Jesse Owens beats the field by an average of 0.20 seconds, with an average running time of 10.3 seconds (just made that up). While Usain Bolt beats the field by an average of 0.10 seconds, with an average running time of 9.8 seconds (I made that up too). So, what are we to make of that? Well, that Owens’s COMPETITION was weaker than Bolt’s RELATIVE to each runner’s talent. We can’t even compare their mean times, because of track configurations, conditions and footwear.
But, what if I said that little 12yr old Adam Dunn beats his field by an average of 0.80 seconds, with an average running time of 14.2 seconds (made that up too)?
So, these transformations of distributions are RELATIVE to the actors involved. And, in no way should we assume anything about the distributions relative to other distributions.
Here’s another example. Let’s say that we have 30 MLB teams, the average OBP is .330 and the league leader is at .430 (and the bottom guy is at .280). Then, MLB expands to 130 teams. The average OBP remains at .330 (for every bad pitcher added, you also added a bad hitter/fielder). The league leader is at .530 (every bad pitcher added makes it easier for the league leader to hit well), the OLD bottom guy (the one at .280) shoots up to .340, while the NEW bottom guy is now at .230.
Now what do you do? Do you transform this distribution? Well, I made the spread twice as large in the 130-team league as I did in the 30-team league.
So, that .530 guy in the 130-team league gets cut down to a .430. But, that .340 guy in the 130-team league gets cut to .335 in the 30-team league… even though he was actually a .280 hitter in the 30-team league!
And that .230 hitter in the 130-team league gets bumped up to .280 in the 30-team league… even though THAT guy would have been in the low minors in the 30-team league.
The point here is that when you do these transformations to bring the top guy down, you ALSO bring the bottom guy UP.
That’s how these transformations work. Because these transformations depend on the two points I noted above: matching the mean, and matching the spread.
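The arithmetic of that example can be sketched in a few lines. The function name and parameters are made up; the means and the 2-to-1 spread ratio are the ones from the example above.

```python
def rescale(obp, mean_from=0.330, mean_to=0.330, spread_ratio=2.0):
    """Map an OBP from the 130-team league onto the 30-team league's scale.

    Matches the means, then shrinks the distance from the mean by the ratio
    of the two spreads (the 130-team spread is twice the 30-team spread).
    """
    return mean_to + (obp - mean_from) / spread_ratio


for obp in (0.530, 0.340, 0.230):
    print(obp, "->", round(rescale(obp), 3))
# 0.53 -> 0.43, 0.34 -> 0.335, 0.23 -> 0.28
```

The same division that pulls the .530 hitter down to .430 pushes the .230 hitter up to .280, which is the whole problem.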
And this is the reason I shy away from all such distribution transformations. While they will answer the one very specific question, they are instead used in a more broad sense.
What we’d really want is to have some competition-neutral number. Say, we’d give Barry Bonds a number, say “200”. And we’d give an MLB regular player another number say “120”, and we’d give some bench player another number say “90”, and the average AA guy gets “60” and the top high school players get “40” and the best LL player gets “20”. You have a scale like that.
So, if you have say a 6-team league, your league is filled with 140-talent-level to 200-talent-level players. Their mean will be say 160, and the spread would be say one SD = 7. If you had a 20-team league, then you’d have a league of 110-talent to 200-talent-level players, with a mean of say 140, and a spread of one SD = 10. And if you had a 2000-team league, your talent would be from 20-talent-level to 200-talent-level, with a mean of say 50, and one SD = 9.
The problem is that we don’t have these competition-neutral numbers. We could try to come up with them, and that’s ostensibly what scouting is about, the point of the 20-80 scale. A guy who runs 3.1 to 1B gets an 80 regardless of what his competition-level is. He can be 15 years old, or 30 years old, he can be a great hitter, or he can never play baseball for his life. But, that 3.1 stands on its own.
These transformations of distributions simply try to hide all of that, and synthesize themselves into nice numbers. But those numbers are still only applicable under the original assumptions: matching mean, and matching spread of talent.
Wednesday, March 14, 2012
[My site] sponsors [large N] online leagues (12 team MLB, 5x5). For 2011, we have all the draft information and final standings.
I used my composite projections from 2011..., determined $ value per player, and then compared the correlation of each team’s summed pre-season value against their roto points.
Surprisingly, this only came in at 8%.
Let’s say that everyone works off the same list (Marcel). Everyone uses that to snake-draft, or (even better) auction off.
What do you think the expected value of each team is? Well, it’s going to be identical!
What is the correlation between the expected value of each team and its final observed value? It’s going to be zero!
That the correlation is so low can tell you two things:
1. It doesn’t matter how you draft because you get random results
2. It HIGHLY matters how you draft because everyone is so close to the same valuation system
All you need to drive the correlation UP is to have one knucklehead pick randomly. In that case, you have a larger spread in “true talent”, and so, that’s going to drive the correlation. Otherwise, since everyone will have most of the players roughly valued the same, the correlation is only picking up those guys that are valued differently. How many of those guys can there be?
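A small simulation illustrates the point. Every number in it is hypothetical: pre-season team values clustered near 100 with a standard deviation of 1, season-long luck with a standard deviation of 10, and a "knucklehead" team valued at 60. The function names are made up as well.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def average_correlation(knucklehead, leagues=2000, teams=12, seed=1):
    """Mean pre-season vs. final correlation across many simulated leagues."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(leagues):
        # Everyone drafts off nearly the same list, so values barely differ.
        pre = [rng.gauss(100, 1) for _ in range(teams)]
        if knucklehead:
            pre[0] = 60.0  # one team picked at random and is far worse on paper
        # Final result = pre-season value plus a season's worth of luck.
        final = [value + rng.gauss(0, 10) for value in pre]
        total += pearson(pre, final)
    return total / leagues


print(round(average_correlation(knucklehead=False), 2))
print(round(average_correlation(knucklehead=True), 2))
```

With these made-up settings the first figure should come out near zero and the second much higher, even though the drafting skill of the other eleven teams is unchanged.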
Wednesday, March 07, 2012
By , 10:39 PM
I’ve been doing some interesting research on all of the above - actually how they relate. The data I am using is this:
Average seasonal fastball pitch speeds from FG, which uses BIS data, not pitch f/x, I think. DL data, which, for this research, was just the number of days spent on the DL in a season. For most of this research, I only noted “on DL” or “not on DL” for a particular season. Finally, I used my own NERC (normalized component ERA) data which are ERA looking numbers produced from a BaseRuns formula on the underlying components, including WP I think. They are all adjusted for context - defense (as best as I can, using UZR data), park, opponents, and league. 4.00 is defined as a league average pitcher in both AL and NL and a 4.00 in the NL is equivalent to a 4.00 in the AL since I do league adjustments. Anyway, those details are not that important.
Here is some really interesting data and discussion:
Monday, March 05, 2012
By , 01:37 AM
From the Sloan thread, where we were discussing handicapping touts and cherry picking results…
Here is another MGL quiz:
Let’s say that you make 200 bets a year for 10 years and your “true” winning percentage is 50%. IOW, you flip 200 coins per year for 10 years. You publish your yearly results. Someone doesn’t like you and wants to make sure that everyone knows that you are a loser. He decides to be (somewhat) truthful and publish some or all of your results. He decides to choose your results anywhere from “the last 3 years” to “all 10 years”. He doesn’t want to choose the last year or 2 years because everyone will say that that could be just a fluke.
What are the chances that he can find a subset of your results, limited to “last 3 years,” “last 4 years,” all the way to “last 10 years,” such that you are indeed a loser, despite being 50/50 (theoretically) in actuality?
What about if you have a true 52% probability of winning?
55%?
Same question, but the guy decides that he’ll use 1 or 2 years as well if he has to?
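One way to attack the quiz is an exact calculation rather than a simulation. The sketch below makes two assumptions: "loser" means strictly fewer wins than losses over the chosen window, and every bet is an independent coin flip at the stated probability. The function name is made up. It walks backward from the most recent year, tracking the probability of each cumulative win total among paths where no losing window has been found yet.

```python
from math import comb


def prob_losing_window(p, bets_per_year=200, years=10, min_years=3):
    """P(some 'last k years' window, min_years <= k <= years, has wins < losses)."""
    n = bets_per_year
    pmf = [comb(n, w) * p ** w * (1 - p) ** (n - w) for w in range(n + 1)]
    alive = [1.0]  # alive[w] = P(w total wins so far, no losing window yet)
    found = 0.0
    for k in range(1, years + 1):
        # Add one more year (going back in time) to the window.
        nxt = [0.0] * (len(alive) + n)
        for w, mass in enumerate(alive):
            if mass:
                for x, q in enumerate(pmf):
                    nxt[w + x] += mass * q
        if k >= min_years:
            cutoff = (n * k + 1) // 2  # smallest win total that is not a losing record
            found += sum(nxt[:cutoff])
            for w in range(cutoff):
                nxt[w] = 0.0
        alive = nxt
    return found


for p in (0.50, 0.52, 0.55):
    print(p, round(prob_losing_window(p), 3))
```

For the last question in the quiz, call it with `min_years=1`, which can only raise the answer.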
Sunday, March 04, 2012
Fun stuff.
Friday, March 02, 2012
Let’s presume that the better team will win 60% at home and 50% on the road.
We have one scenario where they play two games on the road, then the next three at home. In that particular case, they have a 61.2% chance of winning a 5-game series, and, more importantly, a 15% chance of winning it in 3 games and a 10% chance of losing it in 3 games.
On the other hand, let’s say you have a 2-2-1 scenario. The odds of winning remains identical at 61.2%. However, the chance of winning it in 3 is all the way up to 18% and the chance of losing it in 3 is all the way down to 8%.
Indeed, suppose you have two closer teams. Let’s give the odds as 56% chance of winning at home, and 46% chance of winning on the road. (Overall, an average of 51%.) In a 2-3 scenario (or any kind of scenario actually), the better team has a 53.8% chance of winning. But, in that 2-3 scenario, the chance of winning it in 3 is 12%, which is LESS than their chance of losing it in 3 (13%)!
The key point is that the overall odds don’t change though. But, that’s going to be lost on a large number of people, as they focus on the chance of losing the series because they didn’t have the home field advantage.
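These figures can be checked by brute force. The sketch below enumerates every win/loss pattern over five games; playing out all five games, even after the series is decided, doesn't change who wins it. The function name is made up, and the game order in each list is the better team's per-game win probability.

```python
from itertools import product


def series_odds(game_probs):
    """Return (P(win series), P(win in a sweep), P(lose in a sweep)).

    game_probs[i] is the better team's chance of winning game i.
    """
    needed = len(game_probs) // 2 + 1
    win = 0.0
    for outcome in product((1, 0), repeat=len(game_probs)):
        prob = 1.0
        for won, q in zip(outcome, game_probs):
            prob *= q if won else 1 - q
        if sum(outcome) >= needed:
            win += prob
    sweep = swept = 1.0
    for q in game_probs[:needed]:
        sweep *= q
        swept *= 1 - q
    return win, sweep, swept


road_first = [0.5, 0.5, 0.6, 0.6, 0.6]       # two on the road, then three at home
home_first = [0.6, 0.6, 0.5, 0.5, 0.6]       # 2-2-1
closer_teams = [0.46, 0.46, 0.56, 0.56, 0.56]

for name, probs in (("2-3", road_first), ("2-2-1", home_first), ("closer 2-3", closer_teams)):
    print(name, [round(x, 3) for x in series_odds(probs)])
```

The series probability depends only on how many home and road games there are, not on their order, while the sweep probabilities depend entirely on which games come first.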
Sunday, February 19, 2012
By , 02:35 AM
http://www.sloansportsconference.com/?p=6137
Click on the link on that page to download or view the entire paper.
I read it. It is short. I don’t understand the model. I am not really sure what their “naive” model is. It sounds like it is simply a pitcher’s overall (across all situations, counts, etc.) fastball percentage, but they don’t really say.
If it is, of course anyone can come up with a model that incorporates inning, count, score, batter, etc. that is a lot better, so I have no idea how to judge their results.
Also, one of their parameters is “score differential.” They found very little weight to be attached to that parameter. That doesn’t make sense since surely if you are ahead by lots of runs in the late innings you are supposed to throw lots of fastballs. The reason their results don’t make sense, I think, is that they defined score differential as the absolute value of the difference between the scores. That is ridiculous of course. Pitchers are going to have markedly different approaches depending upon whether they are UP or DOWN in the game, especially when the margin is large. If I am up a lot of runs, I can throw mostly fastballs. If I am down lots of runs, I need to mix up my pitches as much as I can to prevent any more runs from being scored.
So the reason they got around the same fastball rate regardless of the “score differential” was probably because the change in pitching approach when down and when up cancelled one another out.
If they didn’t consider the “sign” of the score (whether the pitching team was ahead or behind) in their model, I can’t imagine that it can be a very good model.
Anyway, if anyone wants to read this (as I said, it is a short paper), I would like to hear your comments. I am going to the conference and would like to ask the authors some questions…
Wednesday, February 15, 2012
I agree with Phil:
Any of those things will necessarily change the results a tiny bit, in one direction or the other. Maybe concentration makes things worse, maybe it makes it better. Maybe it’s even different for different hitters.
But we *know* something has to be different. It would be much, much too coincidental if every batter did something different, but the overall effect is exactly .0000000.
Clutch hitting talent *must* exist, although it might be very, very small.
So why are we so fixated on zero? It doesn’t make sense. We know, by logical argument, that clutch hitting can’t be exactly zero. We also know, by logical argument, that even if it *were* exactly zero, it’s impossible to have enough evidence of that.
I’ve repeated that refrain for years. The simple fact that humans are involved guarantees that we’re going to get non-randomness. It’s always a question of the DEGREE of non-randomness.
Matt:
I made up a simple card game. There are two cards, A & B. I put one of them face down and you try to guess which card it is.
If it is A and you guess A you win $1
If it is A and you guess B you lose $3
If it is B and you guess B you win $3
If it is B and you guess A you lose $1
Obviously, me playing A is a much better move than playing B. I only have to give you $1 if you win and I can gain $3 if you guess incorrectly. Of course, you know that so if I always played A you would always guess A and clean me out.
I know game theory centers around finding the approach that makes your opponents decisions irrelevant. I did the math and came up with this:
My best strategy is to play A 50% of the time and B 50% of the time. Your best strategy is to guess A 75% of the time and B 25% of the time.
I think I did this correctly, but the result is so counter-intuitive that I’m wondering if I didn’t make a mistake.
MGL:
Yes, your answer is correct. I don’t know why you think it is counterintuitive. I must guess A enough to keep you from taking advantage of the fact that A is the best choice for you (if I played randomly or incorrectly too much toward B). You must not play A more than B because otherwise I would simply guess A all the time. Like the poker player who bluffs even slightly too much, his perfect opponent should always call a potential bluff.
In this case, it just so happens that because the A and B payoffs are “balanced” you must play A and B 50/50 and I must play the ratio of the payoffs (3/1).
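The 50/50 and 75/25 answers can be checked with the standard indifference conditions for a 2x2 zero-sum game. The sketch assumes the game has no pure-strategy solution (true here), and the function name is made up. Payoffs are written from the guesser's side.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d):
    """Mixed-strategy solution of a 2x2 zero-sum game, payoffs to the guesser.

    Rows are the hidden card (A, B); columns are the guess (A, B):
        a = hidden A, guess A    b = hidden A, guess B
        c = hidden B, guess A    d = hidden B, guess B
    Assumes neither player has a pure strategy that is always best.
    """
    denom = Fraction(a - b - c + d)
    hide_a = Fraction(d - c) / denom    # makes the guesser indifferent between guesses
    guess_a = Fraction(d - b) / denom   # makes the dealer indifferent between cards
    value = hide_a * a + (1 - hide_a) * c
    return hide_a, guess_a, value


hide_a, guess_a, value = solve_2x2(1, -3, -1, 3)
print(hide_a, guess_a, value)  # 1/2 3/4 0
```

The value of the game is zero, so with both sides playing these mixes neither player expects to come out ahead.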
Friday, February 10, 2012
Pete:
When it’s said 3 years’ defensive data is needed to judge a player, what does that mean? I’ll use Biggio as an example (since they’re talking about him at Baseballthinkfactory) - from ‘92-’02, he was worth about -5 runs/year fielding except for ‘97 where he was +19. Was he (1) a generally poor fielder who had a good/lucky year, or (2) still a poor fielder in ‘97 that looks good only because of the noise in the numbers?
Me:
You never throw data away, unless you have a REALLY REALLY good reason to do so. And even then, it better be REALLY REALLY REALLY good.
The more data you have, the less you need to regress. So, you need two years of fielding data to tell you as much as one year of hitting data. Would you make conclusions based on one year of hitting data? No? Then, you need more than two years of fielding data.
Saturday, January 28, 2012
This is an interesting story.
So in a sense, if they’re lucky enough to be unlucky, they could be tested numerous times.
According to Major League Baseball, about 1,200 random drug tests are administered during a regular season on the roughly 750 players, with 375 more tests given in the offseason.
If a player is found to have been in violation of the drug policy over a stimulant, he is subjected to six more random tests over the next year. Positive tests for stimulants are not made public.
Unless Bautista was exaggerating, you’d have to think he’d have to be really lucky to hit the drug testing lottery 16 times in the past two seasons. Unless somebody thinks there is a “reasonable or probable cause” or “suspicion” that he might be taking something.
But those additional drug tests are usually conducted based on a player’s sudden change in appearance, or demeanor, or if somebody saw something either in or outside the dressing room, or the player gets into trouble with the law.
Suddenly hitting home runs at a remarkable rate wouldn’t cause a drug test or a player to be specifically targeted, according to a Major League source. However, any player, who has previously failed a test, is subject to increased testing.
The Blue Jays third baseman/outfielder declined to comment on the matter via the team on Thursday but a team source said Bautista was neither upset about his testing nor wished to make an issue out of what was said at the banquet.
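For a rough sense of how lucky "16 times in two seasons" would be, here is a back-of-the-envelope sketch. It rests on assumptions that may not match the actual testing program: that the quoted 1,200 in-season and 375 offseason tests are spread purely at random over 750 players, that both kinds count toward the 16, and that one player's test count is well approximated by a Poisson distribution. The function name is made up.

```python
from math import exp, factorial


def poisson_tail(k, lam):
    """P(X >= k) for a Poisson random variable with mean lam."""
    return 1.0 - sum(exp(-lam) * lam ** i / factorial(i) for i in range(k))


tests_per_player_per_year = (1200 + 375) / 750    # 2.1
expected_over_two_years = 2 * tests_per_player_per_year
print(expected_over_two_years)                    # 4.2
print(poisson_tail(16, expected_over_two_years))  # on the order of 1 in 100,000
```

Under those assumptions a purely random draw of 16 tests would be very unlikely for any one named player, which is why either an exaggeration or targeted testing looks like the more natural explanation.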
Thursday, January 26, 2012
By , 04:36 AM
It is generally accepted in the sabermetric community that the AL is a better league than the NL, at least for the last several years. This is evidenced by the fact that the AL has a large advantage in IL games. At least some of that edge could be something other than overall “talent”, although this is not likely: several people, including myself, have found little or no inherent advantage to the AL in IL games. (For example, the NL teams do not have any DHs, so they have to juggle their lineups in AL parks; on the other hand, in NL parks, AL teams have to sit their DHs or juggle their lineups, perhaps putting a bad defender, their DH, in the field; the AL pitchers typically are poorer hitters than the NL pitchers; and so on.)