Statistical_Theory
Friday, February 10, 2012
Pete:
When it’s said 3 years’ defensive data is needed to judge a player, what does that mean? I’ll use Biggio as an example (since they’re talking about him at Baseballthinkfactory) - from ‘92-’02, he was worth about -5 runs/year fielding except for ‘97 where he was +19. Was he (1) a generally poor fielder who had a good/lucky year, or (2) still a poor fielder in ‘97 that looks good only because of the noise in the numbers?
Me:
You never throw data away, unless you have a REALLY REALLY good reason to do so. And even then, it better be REALLY REALLY REALLY good.
The more data you have, the less you need to regress. So, you need two years of fielding data to tell you as much as one year of hitting data. Would you make conclusions based on one year of hitting data? No? Then, you need more than two years of fielding data.
Saturday, January 28, 2012
This is an interesting story.
So in a sense, if they’re lucky enough to be unlucky, they could be tested numerous times.
According to Major League Baseball, about 1,200 random drug tests are administered during a regular season on the roughly 750 players, with 375 more tests given in the offseason.
If a player is found to have been in violation of the drug policy over a stimulant, he is subjected to six more random tests over the next year. Positive tests for stimulants are not made public.
Unless Bautista was exaggerating, you’d have to think he’d have to be really lucky to hit the drug testing lottery 16 times in the past two seasons. Unless somebody thinks there is a “reasonable or probable cause” or “suspicion” that he might be taking something.
But those additional drug tests are usually conducted based on a player’s sudden change in appearance, or demeanor, or if somebody saw something either in or outside the dressing room, or the player gets into trouble with the law.
Suddenly hitting home runs at a remarkable rate wouldn’t cause a drug test or a player to be specifically targeted, according to a Major League source. However, any player, who has previously failed a test, is subject to increased testing.
The Blue Jays third baseman/outfielder declined to comment on the matter via the team on Thursday but a team source said Bautista was neither upset about his testing nor wished to make an issue out of what was said at the banquet.
Thursday, January 26, 2012
By , 04:36 AM
It is generally accepted in the sabermetric community that the AL is a better league than the NL, at least for the last several years. This is evidenced by the AL’s large advantage in IL games. At least some of that edge could be something other than overall “talent” (e.g., the NL teams do not have any DH’s, so they have to juggle their lineup in AL parks; on the other hand, in NL parks, AL teams have to sit their DH’s or juggle their lineup, perhaps putting a bad defender, their DH, in the field; the AL pitchers typically are poorer hitters than the NL pitchers; etc.). That is not likely, though: several people, including myself, have found little or no inherent advantage to the AL in IL games.
Friday, December 23, 2011
Pinata time:
However, I scoff at the notion that pitchers have null impact on balls in play, as well as the corollary that anything outside of a league-average BABIP is attributable to the whims of random variation.
No current analyst holds the “null impact” position. I scoff at the notion that the sun will rise from the west.
Luck essentially removes the explanatory responsibility for any glitches in the sabr-matrix, which is irresponsible science on the part of those whose goal is to study the inner-workings of the sport.
Terrible sentence. Luck is luck. EVERY observation that results in a binary outcome is subject to luck. If something has a characteristic of happening that is greater than 0 and less than 1, say, something has a 98.567% chance of happening, then the fact that it did (1) or did not (0) happen at the moment you observed it is luck. It’s pure luck, because the chance that it would happen has already been established at .98567. Unless you are god, the timing of the event is unknowable. And so, when it happens, it’s luck. BUT IT’S LUCK CENTERED ON ITS TRUE MEAN. In this case, .98567. It’s not luck as in 50/50 chance of happening. It’s luck as to the timing of it. It’s random variation centered around a true (known with certainty or estimated with uncertainty) mean. And that mean is not 0 or 1.
Yet ignoring such constructs is at the core of modern concepts such as DIPS theory, as well as pitching statistics that eliminate from consideration any play that necessitates a fielder’s glove.
The eliminate part sounds like he’s talking about FIP. FIP doesn’t do that any more than OBP “eliminates” the fact that a HR is more valuable than a walk. OBP concerns itself with a subset of hitting performance (did the batter reach base). It “eliminates” the fact that a HR is more valuable. Heck, it even eliminates the fact that a runner has stolen bases and caught stealings. FIP concerns itself with the subset of performance that doesn’t involve the fielders. It takes NO POSITION on the other 75% of events that involves the fielders. It does not eliminate it, nor does it treat it as if it’s all random. FIP does what it does perfectly well.
Wednesday, December 14, 2011
You have a 66.7% chance of winning any game. You win the series if you win 3 games before your opponent wins 4 games. What are your chances of winning the series?
Try it yourself. Presuming I did this correctly:
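For anyone who wants to check their own answer, here is a minimal sketch of one way to work it out. It assumes that "66.7%" means exactly 2/3 and that games are independent; the function name is made up for illustration and is not the calculation from the original post.

```python
from math import comb

def series_win_prob(p, wins_needed, opp_wins_needed):
    """Chance of reaching wins_needed wins before the opponent reaches opp_wins_needed."""
    q = 1.0 - p
    # The series ends on our final win. Before that game the opponent has
    # k wins (k = 0 .. opp_wins_needed - 1) mixed in with our other wins.
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(opp_wins_needed)
    )

# Assumption: 66.7% is treated as exactly 2/3.
prob = series_win_prob(2 / 3, 3, 4)
print(f"{prob:.4f}")  # 656/729, just under 90%
```

The same function handles an ordinary best-of-seven by passing 4 and 4.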
Monday, December 12, 2011
By , 09:10 PM
Let’s say that the chance of a false positive for the Braun test is 1 in 1,000. What are the chances that Braun took a banned substance and that he did NOT have a false positive?
On the flip side, sort of, our friend Richard Justice (Houston sports journalist and radio guy) said that he believed Braun (that he did not take anything and it must have been a mistake of some sort). When asked why, he said the usual - Braun is a stand-up, honest, smart guy, etc.
So Justice (unbeknownst to him apparently) was simply stating the anterior or prior probability - that Braun is not a likely candidate for being a PED user. But, we have more information of course. We have a posterior probability that he took a banned substance, which is 1 minus the probability of a false positive or some other sort of “mistake.”
So, again, given the prior and the posterior, what are the chances that Braun cheated?
And Justice, among others, needs a lesson about Bayesian math…
Tuesday, December 06, 2011
Phil’s argument for it.
Wednesday, November 30, 2011
Rangers, Mets, Lotte Marines, and now his fourth team, the Red Sox. A compilation of reactions can be found here.
***
Anyway, I found this story:
“Ichiro is a mathematical genius,’’ he told me. “Because of that, he can read the angles of the field better than everyone else. When he runs to a spot in right field to make a catch, and the ball is there, waiting for him, it’s because he can see the angles better than anyone. I was in an elevator with him once. It was about a 40-floor hotel. He looked at the right side of the elevator, the even numbers, and added them up in his head in, like, two seconds.’’
There’s a little trick I like to play. It looks mighty impressive, even if it’s not. I can divide any number by 7, and get the answer to six decimal places. The trick is that the sequence after the decimal is always of the pattern 142857, and it loops back. So, 1/7 is .142857 and repeats. For 2/7, you start with the second smallest digit, and follow the same pattern: .285714. 3/7 starts with 4 (the third smallest digit), so you get .428571. And so on.
Even remembering the 6-digit sequence is easy enough. Double 7 (14), double that (28), double that and add 1 (57). So, once you have that 142857 sequence, you can divide any number by 7. Cute, right?
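If you want to see the rotation rule spelled out, here is a small sketch. The helper name is made up; it only covers the fractional part for 1/7 through 6/7, which is all the trick needs once you have pulled off the whole number.

```python
CYCLE = "142857"

def sevenths(k):
    """Six decimal places of k/7 for k = 1..6, using the rotating 142857 cycle."""
    # k/7 starts at the k-th smallest digit of the cycle, then follows the loop.
    start_digit = sorted(CYCLE)[k - 1]
    i = CYCLE.index(start_digit)
    return "." + CYCLE[i:] + CYCLE[:i]

for k in range(1, 7):
    print(f"{k}/7 = {sevenths(k)}")
```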
***
We know about adding all the numbers in sequence starting from 1, which is just n*(n+1)/2. The (n+1)/2 is simply the average of a sequence from 1 to n. And n is the count of numbers from 1 to n.
The Ichiro trick limits us to just the even numbers. In that case, instead of “n” in the above equation, we have “n/2”. That gives us the count. And instead of “n+1” over 2, it’s “n+2” over 2, which is the average between the maximum number and 2. So, we have n/2 * (n+2)/2.
We can further expand that to (n/2 * n/2) + n/2.
For odd numbers, it’s even easier (where n is the largest even number). The count remains at n/2, and the average is n-1 (to get your max odd number) plus 1, divided by 2, or n/2. So, simply n/2 squared.
To recap: adding up the odd numbers is n/2 squared. Adding up the even numbers is n/2 squared, plus n/2.
Therefore, adding up 1 through 40 is: 40*41/2 = 820
Limiting to odd numbers: 40/2 squared = 400
And to even numbers: 40/2 squared + 40/2 = 420
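The three shortcuts are easy to check by brute force. A quick sketch, assuming n is even as in the text (the function names are just for illustration):

```python
def sum_all(n):
    """1 + 2 + ... + n."""
    return n * (n + 1) // 2

def sum_evens(n):
    """2 + 4 + ... + n, for even n: (n/2 squared) plus n/2."""
    half = n // 2
    return half * half + half

def sum_odds(n):
    """1 + 3 + ... + (n - 1), for even n: n/2 squared."""
    half = n // 2
    return half * half

n = 40
print(sum_all(n), sum_odds(n), sum_evens(n))  # 820 400 420

# Brute-force check against the long way of adding them up.
assert sum_all(n) == sum(range(1, n + 1))
assert sum_evens(n) == sum(range(2, n + 1, 2))
assert sum_odds(n) == sum(range(1, n, 2))
```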
So, next time you see Bobby V in an elevator, you’ll know how to break the ice.
***
Are there little numeric “tricks” that you guys enjoy doing? I’d love to hear them.
Tuesday, November 29, 2011
Given enough trials, and given enough coins, we can always find outliers… and the same applies to pitchers.
Using this fact, it follows that in our first year, if we have 100 pitchers, we expect half to outperform their FIP. This means that there are 50 players that outperformed their FIP in year one. Of those 50 players that outperformed their FIP in year one, we would expect 25 (.5*50) of them to outperform their FIP in year 2 by pure chance. Of those 25 players that outperformed their FIP in year two, we would expect 12.5 of them to outperform their FIP in year 3 by pure chance. Similarly, we can continue down this path halving the number from the year before. In year four, we would expect about 6 pitchers to have continued to outperform their FIP, and by year 5 we would expect just over 3 pitchers to have consistently outperformed their FIP by pure luck.
Because we started with 100 pitchers, we expect that about three of the pitchers would outperform their FIP in 5 consecutive years, by randomness alone. Many people point to those three pitchers and say, “Clearly, FIP is not accounting for something those three pitchers do.” We can now completely discount that argument for the “simulation”, because we have assumed FIP to be perfect. Thought experiments are nice because they easily allow you to comprehend and visualize a phenomenon, but there is not a lot to glean, if the experiment is completely incongruent with reality.
To put it simply: if you have a 50/50 chance of winning each time, then by pure luck you will win five in a row 0.5^5 = 3.1% of the time. Start with 100 coins or 100 pitchers, and about 3 of them will flip heads, or beat their FIP, five times in a row.
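Here is a minimal sketch of that arithmetic, plus a quick simulation to confirm it. It assumes, as the thought experiment does, that each pitcher independently has a 50/50 chance of beating his FIP each year; the function names and the seed are arbitrary.

```python
import random

def expected_streakers(n_players, p_beat, n_years):
    """Expected number of players who beat their FIP every year by luck alone."""
    return n_players * p_beat**n_years

def simulate_streakers(n_players, p_beat, n_years, rng):
    """Count players who 'win' n_years coin flips in a row in one simulated league."""
    return sum(
        all(rng.random() < p_beat for _ in range(n_years))
        for _ in range(n_players)
    )

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000
avg = sum(simulate_streakers(100, 0.5, 5, rng) for _ in range(trials)) / trials

print(expected_streakers(100, 0.5, 5))  # 3.125
print(round(avg, 2))                    # simulated average, close to the line above
```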
And in reality?
giving us a not so unexpected final total of 3.6%
...
and finally 4% of the original starting pitcher group
This is not to say that FIP is perfect. But, just relying on the fact that you’ve been able to identify 3 or 4 extreme cases, when that’s exactly what you would have expected to find if it was all luck, doesn’t prove your point.
You need to find MORE than the expected extreme points, and NOT just “some” extreme points. Some extreme points means nothing, unless you know how many you expected to find by luck.
By , 02:23 AM
On XM radio’s MLB channel the other day, the talking heads were discussing Papelbon (right after his signing). One of them asked the question, “Do you think that Papelbon’s best days are behind him?”
Here’s a news flash, and an important statistical concept for you newbies:
For any overall pitching stat, like ERA, FIP, or ERC (component ERA), if a pitcher has been above average for that stat in the past, his better days are always behind him, regardless of his age or experience, assuming that we know nothing else about him. Of course when I say “always,” I mean that our projection for him going forward is always going to be worse than his past performance, using a weighted average of his last 3 years (say, 1, 2, 3 weights) to represent his past performance.
If you don’t believe me, I challenge you to give me any parameters that you think would defy that proclamation, and we’ll test it using historical data.
Wednesday, November 23, 2011
So says Phil.
Tuesday, November 22, 2011
By , 11:29 PM
Let’s see how many posts it takes for the geniuses on BBTF to figure this one out. So far 9 and counting…
Anyway, here is the link:
http://www.thegoodphight.com/2011/11/21/2485197/phillies-citizens-bank-park-not-a-hitters-haven
to an article which tells us that CBP has played almost neutral for the last 4 years, therefore it is now a neutral park, as opposed to the first 4 years when it was an extreme hitters’ park (around 1.07).
Let’s forget for a second how a park can all of a sudden change its true PF’s (it can’t other than by changing other PF’s in the league and even then it won’t change much - of course the “effective” PF can change - a little - with weather and with different players).
Instead, let’s do this thought exercise:
You have 30 parks with a true PF of x, y, z, etc. I am telling you that they never change (which is actually reasonably true, as I indicated above, barring a remodel of course). We track the observed (sample) PF’s for 8 years. What are the chances that in the last say, 3, 4 or 5 years (you get to choose the end points) some park will show an observed PF that is quite different than its true PF AND/OR quite different than the observed PF in its first 3, 4, or 5 years?
IOW, what can we conclude about the true PF of CBP? Not much other than its true PF is likely the un-weighted average of the observed PF over the last 8 years, regressed toward some mean (of a similar park, dimension, weather, altitude-wise, etc.). If you want to weight more recent years slightly more than past years, I don’t have much of a problem with that, although I don’t think that any weighting is necessarily appropriate…
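The thought exercise can be sketched as a simulation. Everything numeric here is an assumption for illustration: the year-to-year noise in an observed PF (a standard deviation of .05), the size of shift that counts as "quite different" (.07, roughly 1.07 versus neutral), and a fixed 4/4 split of the 8 years rather than letting you pick the end points.

```python
import random

def chance_of_apparent_shift(n_parks=30, years=8, split=4, yearly_sd=0.05,
                             shift=0.07, trials=2000, seed=0):
    """Share of simulated leagues where at least one park 'changes' by pure noise."""
    rng = random.Random(seed)
    leagues_with_shift = 0
    for _ in range(trials):
        for _park in range(n_parks):
            # The true PF never changes; 1.00 is arbitrary since only the
            # difference between the two periods matters.
            observed = [rng.gauss(1.0, yearly_sd) for _ in range(years)]
            early = sum(observed[:split]) / split
            late = sum(observed[split:]) / (years - split)
            if abs(early - late) >= shift:
                leagues_with_shift += 1
                break
    return leagues_with_shift / trials

print(chance_of_apparent_shift())
```

Under these assumed inputs, most simulated leagues contain at least one park that looks like it changed, and allowing yourself to choose the end points would only make that easier.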
Monday, November 21, 2011
Just love this.
That’s exactly what Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn did in a new academic paper (reported on in today’s National Post). They wanted to prove the hypothesis that listening to children’s music makes you older. (Not makes you *feel* older, but actually makes your date of birth earlier.) Obviously, that hypothesis is false.
Still, the authors managed to find statistical significance. It turned out that subjects who were randomly selected to listen to “When I’m Sixty Four” had an average (adjusted) age of 20.1 years, but those who listened to the children’s song “Kalimba” had an adjusted age of 21.5 years. That was significant at p=.04.
How? Well, they gave the subjects three songs to listen to, but only put two in the regression. They asked the subjects 12 questions, but used only one in the regression. And, they kept testing subjects 10 at a time until they got significance, then stopped.
In other words, they tried a large number of permutations, but only reported the one that led to statistical significance.
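The "keep testing 10 at a time until you get significance" part is easy to demonstrate on its own. This is a simplified sketch, not the paper's procedure: it assumes two groups with no real difference, uses a z-test with known variance in place of their regression, and caps the number of looks at ten.

```python
import random
from math import sqrt

def false_positive_rate(batch=10, max_batches=10, z_crit=1.96,
                        trials=2000, seed=0):
    """Share of null experiments declared 'significant' when you peek after every batch."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        a, b = [], []
        for _look in range(max_batches):
            # Both groups come from the same distribution: there is no real effect.
            a += [rng.gauss(0, 1) for _ in range(batch)]
            b += [rng.gauss(0, 1) for _ in range(batch)]
            n = len(a)
            z = (sum(a) / n - sum(b) / n) / sqrt(2 / n)
            if abs(z) > z_crit:
                false_positives += 1
                break  # stop as soon as it "works"
    return false_positives / trials

single_look = false_positive_rate(max_batches=1)
peeking = false_positive_rate(max_batches=10)
print(f"one look: {single_look:.3f}, up to ten looks: {peeking:.3f}")
```

With one look the false positive rate sits near the nominal 5%; with repeated looks it climbs well above that, before you even start choosing among songs and questions.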
Friday, November 18, 2011
Phil gives a back-handed slap in the face to all those who fall in love with regression, and forget about reasoning, insight, and interpretation.
Thursday, November 17, 2011
A pitcher’s outcome line represents what his team did WHILE the pitcher was on the mound, and NOT (necessarily) BECAUSE he was on the mound.
***
A pitcher’s W/L record is one such thing, as is his BABIP, his ERA, his HR, all of which are influenced to some degree or other on things that the pitcher has no control over.
The same can be said for a catcher’s ERA, or the number of outs recorded by a fielder. Or, really, a ton of things that we’ve so neatly compartmentalized directly to a single player.
These are team statistics, to some extent or other. You have to think about it in that context.
Don’t be so foolish as to think that since you care only about what did happen, that you actually know which single person it did happen to, as if the other 8 players on the field did not exist.
Tuesday, November 08, 2011
Jeff asks an innocuous question:
As I understand it, the Rookie of the Year award is supposed to go to the league’s best rookie. Consensus seems to be that “best” is some combination of performance and playing time. This is why Brett Lawrie doesn’t show up at the top of many lists. But why should playing time be that important? Brett Lawrie came to the plate 171 times and hit .293/.373/.580. That is an outstanding performance. An outstanding performance over a limited sample, sure, but a more outstanding performance than any other AL rookie, as far as I can tell. Why shouldn’t he get more consideration for the award? It isn’t the AL’s most valuable rookie. It’s the AL’s best rookie. There’s room for interpretation. Man, there’s room for interpretation with everything.
It is an outstanding RESULT. It is an outstanding OUTCOME.
And those results, those outcomes, are LINKED to Brett Lawrie.
Can we therefore INFER that because those outcomes are linked to Brett Lawrie that we (necessarily) conclude that Brett Lawrie had an outstanding performance?
Just today, I caught every single green light. I mean every single one. That was an outstanding outcome. And, it was me, Tom, driving the car. Can you infer that I had an outstanding driving performance? Intuitively, we know that virtually all of that was luck. So, since we know most of it is luck, we simply conclude that all of it is luck, and regress my “performance” 100% and treat it as all luck.
Brett Lawrie came to bat fewer than 200 times. And he was deeply involved in each one. It SEEMS like the outcomes linked to Lawrie are OWNED by Lawrie. But that’s not true! A large share of those outcomes are owned by Lawrie, but not all of them.
And here’s another weird part: the more outcomes he had, then the larger share of those outcomes that we can attribute directly to Lawrie. So, if you had 2 plate appearances for Lawrie, then we attribute very little of the OUTCOMES to Lawrie. We just don’t know if he was being a good driver, or just happened to be in the driver seat at that moment in time. If he had 20 PA, then we attribute more of those outcomes to Lawrie. If he had 20,000 PA, then we’d attribute 99% of each of those outcomes (including those first 2 outcomes) to Lawrie.
This is a Bayes world.
Unless you can make a perfect and direct connection from Brett Lawrie to a particular outcome, then we have no choice but to INFER FROM the outcome back to Lawrie, the extent to which Lawrie himself actually influenced that outcome. And one way to do that inference is through regression.
Remember: everything we see is an observation. And our job is to infer what caused that observation. And the more observations we have of Lawrie, the more we can infer from each single observation.
I agree with Matt wholeheartedly.
***
I’ve had a minor issue with Pizza Cutter’s threshold for “stabilization”, which I’ve mentioned several times in this blog. Basically, Pizza sets the threshold at r=.70, whereas I set the threshold at r=.50. Why do I prefer mine? Because with my threshold, I can tell you exactly how much to regress the stats. It gives you extra information. In addition, I can explain it in English. If I set the OBP threshold at PA=210, then I can say: “If the player has 210 plate appearances, then his OBP is half real and half noise. Regress his OBP by 50% toward the mean.”
And, if the player had 500 PA, then you would regress by 210 / (210 + 500) = 30%.
For Pizza, r=.70 would mean THE EXACT SAME THING. But his threshold would be PA=500. So, his threshold says: “If the player has 500 plate appearances, then his OBP is 70% real and 30% noise. Regress his OBP by 30% toward the mean.”
So, exact same thing. But, if the player had 400 PA, then what? Well, in my case, you know exactly how much to regress by: 210/(210+400) = 34%. But with Pizza’s case? You’d have to do: 1-400/(400+.3/.7*500) = 34%. That 3/7ths thing there is not very attractive to me.
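The two ways of stating the threshold can be sketched in a few lines. The 210 PA figure for OBP is the example value used above; the function names are made up for illustration.

```python
def regression_fraction(pa, half_point):
    """Share of the observed stat to regress toward the mean, given the r = .50 sample size."""
    return half_point / (half_point + pa)

def half_point_from_threshold(r, pa_at_r):
    """Convert 'reliability r at pa_at_r PA' into the equivalent r = .50 sample size."""
    return (1 - r) / r * pa_at_r

for pa in (210, 400, 500, 20000):
    print(pa, f"{regression_fraction(pa, 210):.0%}")

# An r = .70 threshold at 500 PA implies a half-point in the same ballpark as 210.
print(round(half_point_from_threshold(0.70, 500), 1))
```

Note that the conversion gives about 214 rather than exactly 210, so the two versions of the 400 PA calculation agree only to within rounding.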
Pizza is as stubborn as I am, because we both knew exactly what the other guy meant, and still, both of us stuck to our guns on this issue.
Note: no actual pizzas were hurt in the creation of this post.
***
Derek Carty posted the 50% threshold here:
http://www.insidethebook.com/ee/index.php/site/comments/when_is_the_observed_data_half_real_and_half_noise/
Friday, November 04, 2011
y = -243.83x^4 + 478.68x^3 - 170.49x^2 + 14.134x
Monday, October 31, 2011
As if the balloting process for players isn’t enough of a problem, it seems that the process for managers also has the same issues.
We were already aware of how silly it is to limit the number of people you can vote for (something like you can vote for up to 4 on a ballot that has more than 10 well-qualified people) AND make those people meet a minimum threshold (75% or something).
Is it so hard to get experts in ballot-making?
Wednesday, October 26, 2011
By , 02:56 AM
The overwhelming consensus on BP, FG, this blog, and lots of other sites I have visited is, “No!” How did all these people come to that conclusion? Because it failed and it “cost” the Cardinals a good chance to tie or win the game. Does that make any sense? Of course not. Not in a rational sense. Can the outcome of a play that swings the percentages one way or the other maybe 1 or 2% inform us of the “correctness” of the play? Not in one single instance and not enough that a human being could possibly discern even after dozens or even hundreds of such plays. But people are irrational beings. When it comes to sports, they are out of their minds irrational.
So, can one determine whether running was correct in that instance without “running the numbers?” Not a chance. One can take a guess and be right 50% of the time, I guess. If you are a good sabermetrician, you might be able to do some quick mental calculations and maybe come up with the right answer with some degree of certainty, as long as the actual answer is not particularly close (i.e., the WE from each alternative is not a dead heat).
So what are all those people doing with their “opinions”? I have no idea. To me, opinions should be reserved for ice cream flavors, what color car you like, and whom you would choose for your dream date. To me, there is no such thing as an “opinion” on which of two strategies yields the highest win expectancy. That is a matter of fact. That seems to be lost on 99.7% of the population.
So what is the right answer? I’m not going to tell you because I don’t know. I could know if I “ran the numbers” but I don’t want to deprive some aspiring sabermetrician of doing the work and making a name for himself.
OK, in all honesty, I can’t “know for sure” because I can only estimate the value of the requisite variables. Some more than others. But when the smoke clears, I could tell you one of three things with almost exactitude:
1) It is clearly a “run.”
2) It is clearly a “no run.”
3) It is close, depending on the exact values of all the variables, so we’ll just call it a draw.
Nowhere does my opinion matter…