THE BOOK--Playing The Percentages In Baseball


Wednesday, June 08, 2011

Testing the binomial distribution theory in baseball

By Tangotiger, 04:17 PM

Ichiro has had 802 games where he came to bat exactly 5 times.  His OBP in those games was .413.

The expected number of those games in which he reached base 0 or 1 times, using the binomial distribution, is 252.  In reality, it was 262.

Ichiro had 671 games where he came to bat exactly 4 times.  His OBP in those games was .326.

The expected number of those games in which he reached base 0 or 1 times, using the binomial distribution, is 406.  In reality, it was 399.

If you add the two above:
- the expected number of games in which he reached base 0 or 1 times, based on the binomial, is 659
- the actual number of games in which he reached base 0 or 1 times is 661
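A minimal Python sketch of the calculation above, assuming each PA in a game is an independent trial at the OBP quoted for that bucket of games (the helper name prob_at_most is ours, not from the post):

```python
import math

def prob_at_most(k_max: int, n: int, p: float) -> float:
    """P(X <= k_max) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

# Ichiro's figures as quoted above: PA per game -> (number of games, OBP in those games)
buckets = {5: (802, 0.413), 4: (671, 0.326)}

# Expected number of games with 0 or 1 times on base in each bucket
expected = {pa: games * prob_at_most(1, pa, obp) for pa, (games, obp) in buckets.items()}
total_expected = sum(expected.values())

for pa, value in expected.items():
    print(f"{pa} PA: {value:.1f} expected games with 0 or 1 times on base")
print(f"both buckets: {total_expected:.1f}")
```

This should come out to roughly 252.5, 406.4 and 658.9 games, in line with the rounded figures quoted in the post.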

Ichiro was the first guy I looked at.  That it ended up this close was fantastically fortunate for me.  But, it’s not a surprise.

So, there’s my challenge to anyone else: select 10 hitters.  I dunno… Rickey, Boggs, Gwynn, Raines… whoever.  Whoever you are interested in (though preferably not guys with lots of IBB).

Report the results.  You’ll find something close to what I found.

***

For those wondering why the OBPs are so different for 4 and 5 PA: the number of PA was selected after the fact.  If he came to bat 5 times, chances are his team (and he) were hitting pretty well.  To avoid this issue, I would instead look only at the FIRST FOUR PA of each game.  Then you wouldn’t have this problem.
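One way to set that up, sketched under an assumed data layout (one list of 1/0 on-base results per game, in PA order; the function name first_four_tob is hypothetical):

```python
from typing import List

def first_four_tob(games: List[List[int]]) -> List[int]:
    """Times on base in the first four PA of each game with at least 4 PA.

    Hypothetical input layout: one list per game with 1 (reached base) or 0
    (did not) for each PA in order. Every qualifying game is cut at 4 PA, so
    the sample is not split by whether a fifth PA happened.
    """
    return [sum(results[:4]) for results in games if len(results) >= 4]

games = [[1, 0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0], [0, 1, 0, 0, 0, 1]]
tob = first_four_tob(games)
obp_first_four = sum(tob) / (4 * len(tob))
print(tob, round(obp_first_four, 3))
```

The 3-PA game is dropped, and the 5- and 6-PA games contribute only their first four results.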


#1 2011/06/08 (Wed) @ 15:13

“You start with logic.  Logically speaking, I’m right.  The absolute minimum range, if you are God, is a 0.80 runs range.  It is indisputable.”

In no way am I defending the PECOTA percentiles, but that claim is certainly disputable.  It is based on your assumption about how the game of baseball works, that it’s a weighted random number generator.  If that assumption is incorrect, your logic falls apart.


#2 Tangotiger 2011/06/08 (Wed) @ 15:21

Mike, unless Colin or Nate Silver is suggesting that they know more, that they can model baseball performance such that the binomial does not apply (and that would be quite the declaration, perhaps the most profound declaration in baseball statistics imaginable), the presumption is that PECOTA Percentiles are doing something terribly wrong.

The status quo is that the binomial distribution is the basis.  If someone wants to put out a forecast that discards the binomial distribution, then you better publish the research.


#3 2011/06/08 (Wed) @ 15:25

Tango/4, I thought I was quite clear that I was not making a defense of the PECOTA percentiles.

A binomial distribution may well be the accepted basis for current projection systems.  I don’t dispute that.  But that’s a heckuva long way from saying that the binomial distribution model is indisputable God’s truth when applied to baseball.  For the latter claim you better bring some pretty strong evidence, rather than just stating that anyone who disagrees with you is illogical.


#4 Tangotiger 2011/06/08 (Wed) @ 15:44

No, it’s not a “heckuva long way”.  It’s a tiny step to take.


#5 2011/06/08 (Wed) @ 16:02

Is Keynesian economics indisputable God’s truth for economic theory?  Is Newtonian mechanics indisputable God’s truth for physics?  Just because a model is widely accepted and accurate within a certain strictly defined area of application does not make it indisputable God’s truth.  That’s a path to serious error and not a tiny distinction.


#6 Tangotiger 2011/06/08 (Wed) @ 16:18

Mike, I just started a new thread on the binomial distribution.  You may appreciate the initial results.


#7 2011/06/08 (Wed) @ 16:34

“You may appreciate the initial results.”

No, I don’t.  I’ve seen those kinds of results.  It’s not new data to me. 

Confirming Newton’s laws by colliding objects on an air track isn’t impressive.  It’s what you do when you’re confronted with the evidence of the black body problem or the photoelectric effect that shows what kind of scientist you are.  Do you wave them off as noise to be ignored in the face of the otherwise impressive performance of your model, or do you realize that they are pointing you to the limitations of your model and to an underlying reality that is different from what you have thought?


#8 Tangotiger 2011/06/08 (Wed) @ 16:42

What the heck… I did it for Tim Raines too.

Expected number of 0 or 1 time-on-base games is 869.  Actual number of 0 or 1 is 847.

So, a bit of clumping out of 1965 games.  We expected 44% of 0 or 1 time-on-base games, but we got only 43%.  That’s just one standard deviation from expectation.
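The “one standard deviation” claim can be checked with a quick sketch. It treats all 1965 games as a single binomial count with a pooled probability of 869/1965, which is a simplification (4-PA and 5-PA games really have different probabilities), so the result is only approximate:

```python
import math

games = 1965
expected_low = 869   # expected games with 0 or 1 times on base
actual_low = 847     # observed games with 0 or 1 times on base

p = expected_low / games                  # pooled probability of a 0-or-1 game
sd = math.sqrt(games * p * (1 - p))       # binomial SD of the count
z = (actual_low - expected_low) / sd      # distance from expectation in SDs
print(f"SD = {sd:.1f} games, z = {z:.2f}")
```

The SD comes out near 22 games, so a shortfall of 22 games is about one SD, as stated.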

We got fantastically lucky with Ichiro.  Raines looks more like what we’d typically find.

Also note that with Raines, we’d EXPECT some clumping, since we got the peak and valley of his career.  We KNOW that he’s not the same Raines throughout his career.

Nonetheless, it’s pretty much a bullseye.

Wherever you look, the binomial distribution is going to hold exceptionally strong in baseball (and other sports… or in time-series sports, the Poisson distribution).


#9 Tangotiger 2011/06/08 (Wed) @ 16:53

I’m saying the binomial is indisputable for the issue at hand.  There has been no evidence presented to dispute this, by you, or anyone anywhere.

In your Newtonian example (based on what you wrote), they DID find some scenario where Newton didn’t hold.  That’s evidence.

Of all the things to dispute that I’ve ever said, my reliance on the binomial, especially as it relates to PECOTA, is a terrible choice to make!  I’ve made far easier claims to dispute than this.


#10 Lee 2011/06/08 (Wed) @ 16:56

There has to be some clumping across the board.  Even if a batter had zero platoon split (the biggest factor that immediately comes to mind), there are many other things that would lead to daily shifts in true talent: opposing starting pitcher talent, home/road, minor injuries, etc. No one’s true talent is constant. And while the shift probably isn’t very large, if you looked at a serious sample you’d have to see a solid gap between the theoretical distribution assuming constant talent and reality.
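A small sketch of why varying talent produces clumping: when the true OBP moves around from day to day, the variance of times on base in an n-PA game grows by n(n-1)Var(p) over the constant-talent binomial. The .300/.400 split below is a hypothetical example, not a measured figure:

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def variance(pmf):
    """Variance of a distribution over 0..len(pmf)-1."""
    mean = sum(k * q for k, q in enumerate(pmf))
    return sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

n = 4
# Hypothetical: true OBP is .300 on half of the days and .400 on the other half
talents = [0.300, 0.400]
mixture = [sum(binom_pmf(k, n, p) for p in talents) / len(talents) for k in range(n + 1)]
constant = [binom_pmf(k, n, 0.350) for k in range(n + 1)]  # same average OBP, no variation

print(f"constant talent: {variance(constant):.3f}")  # n*p*(1-p)
print(f"varying talent:  {variance(mixture):.3f}")   # n*p*(1-p) + n*(n-1)*Var(p)
```

With a talent swing this size the extra variance is small (0.94 vs 0.91), which is consistent with Lee’s point that the gap exists but only shows up in a serious sample.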


#11 Raines 2011/06/08 (Wed) @ 16:58

If all you’re looking at is OBA, Raines is pretty close to the same hitter throughout his career.  His OBP is not much different whether you look at his Montreal peak or his Yankee run as a good part timer.

The offensive context around him is vastly different, as is the threat he poses as a baserunner, but for this exercise we don’t care about that.

Sammy Sosa would be a different case, spending part of his time as a .400 OBP slugger and part as a .300 OBP hacker - his career average doesn’t accurately tell you what kind of player he was at either point.


#12 Tangotiger 2011/06/08 (Wed) @ 17:11

Excellent point, thanks!


#13 2011/06/08 (Wed) @ 17:20

“Of all the things to dispute that I’ve ever said, my reliance on the binomial, especially as it relates to PECOTA, is a terrible choice to make!  I’ve made far easier claims to dispute than this.”

I’m not disputing it because I want to find something you said to disagree with.  Among the many things you have said that I agree with, there are probably some that I disagree with, and I could go find them and work on evidence to prove you wrong, and they would probably be far easier to tackle than this topic.  But that’s not what I’m after here.

I’m disputing it for several reasons.
1. You said it was indisputable God’s truth.  That’s a dangerous position to take about anything, much less something very much in dispute.
2. This is an issue with extremely important consequences for the analysis of baseball.  If the binomial model doesn’t hold, as you noted above, “that would be quite the declaration, perhaps the most profound declaration in baseball statistics imaginable.” I wouldn’t quite phrase it that way, but I think we agree on the basic level of importance.
3. You are ignoring and explaining away many scenarios where the binomial model doesn’t hold by lumping them into the uncertainty.  You say that uncertainty is the rock bottom truth that even God could not pierce, but that’s an excuse for not digging further to understand why.

You want one concrete example where your binomial model falls short and more granular data is more accurate?  How about this one: the BABIP against cutters on the outside half from RHP to LHB is .261 +/- .010 from 2007-2010.  That’s not a terribly granular example, but it’s more granular than the data you are using, and the difference is already fairly significant.  The more granular you get, the more the probabilities will depart from the model you are using.  So maybe what I am disputing is not so much the binomial model, but the level of data that you believe is sufficient to use within that model (and not just sufficient: you claim it’s indisputable God’s truth that you don’t need more granular data).

--Btw, feel free to move my posts on this subject to your other thread, as long as there is some link back to this thread so people know that I wrote what I did in response to what you wrote here (about indisputable God’s truth, etc.) and not in response to what you wrote in your other thread.  I recognize this is off topic to whether PECOTA percentiles work or not.--


#14 2011/06/08 (Wed) @ 17:29

This is fantastic. 

I did it for Mike Schmidt, looking at the full distribution of Times on Base for each of PA=4 and PA=5 and then doing a Chi-Squared test on the result.

The data follow, shown as TOB/PA followed by the number of games.  There are 1414 games with 4 PA and 657 with 5 PA.

0/4 254
1/4 535
2/4 456
3/4 155
4/4 14

0/5 42
1/5 155
2/5 214
3/5 168
4/5 69
5/5 9

The Chi-Squared Test gives p=0.85 for 5 PAs and p=0.35 for 4 PAs.  So, completely consistent with a binomial distribution.  Fascinating!
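To reproduce this kind of test, here is a hedged sketch in plain Python (no SciPy). It assumes the OBP for each group is estimated from the same games, which costs one degree of freedom (df = bins - 2); with that assumption it lands close to the p-values reported above. chi2_sf is a helper written for this sketch using the standard chi-squared recurrence for integer degrees of freedom, not a library function:

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-squared variable with a positive integer df.

    Uses Q(x; df + 2) = Q(x; df) + (x/2)^(df/2) e^(-x/2) / Gamma(df/2 + 1),
    seeded with the closed forms for df = 1 and df = 2.
    """
    half = x / 2.0
    if df % 2 == 0:
        nu, q = 2, math.exp(-half)                              # Q(x; 2)
        term = half * math.exp(-half)                           # step from df=2 to df=4
    else:
        nu, q = 1, math.erfc(math.sqrt(half))                   # Q(x; 1)
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # step from df=1 to df=3
    while nu < df:
        q += term
        nu += 2
        term *= half / (nu / 2.0)
    return q

def binomial_gof(counts):
    """Chi-squared test that counts[k] (games with k times on base) is Binomial(n, p)."""
    n = len(counts) - 1                                          # PA per game
    games = sum(counts)
    p = sum(k * c for k, c in enumerate(counts)) / (games * n)   # OBP fitted to these games
    expected = [games * binom_pmf(k, n, p) for k in range(n + 1)]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    df = len(counts) - 2                                         # bins - 1 - one fitted parameter
    return stat, df, chi2_sf(stat, df)

for label, counts in (("4 PA", [254, 535, 456, 155, 14]), ("5 PA", [42, 155, 214, 168, 69, 9])):
    stat, df, pval = binomial_gof(counts)
    print(f"{label}: chi2 = {stat:.2f}, df = {df}, p = {pval:.2f}")
```

Most of the 4-PA chi-squared comes from the 4/4 bin (14 games observed vs. about 21 expected), which is the kind of detail the grouped multi-onbase count hides.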


#15 Tangotiger 2011/06/08 (Wed) @ 17:52

“You want one concrete example where your binomial model falls short and more granular data is more accurate?  How about this one: the BABIP against cutters on the outside half from RHP to LHB is .261 +/- .010 from 2007-2010.”

That has nothing to do with the binomial.

Maybe your problem with me is based on a misunderstanding.

What you are talking about is bias.

As a clear example, let’s say that Rickey Henderson’s true-talent OBP is .400.  We’ve got 13,000 PA, so we are pretty positive that’s what it is.

But, then we find that against one group of pitchers, his OBP is .395 and against another, it’s .415.

That’s a finding of bias.  In this example, the split was based on pitcher handedness.

In your example, you have outside cutters.  That’s a finding of bias.

Findings of bias are great!  We learn.  There’s bias in performance against FB/GB pitchers.  There’s bias in parks (Coors/Petco).  There’s bias in lots of places.  Outside cutters?  Great finding.  You found bias.

None of this has anything to do with the binomial distribution.  The binomial STARTS with the true mean of something.  And then asks: given a certain number of trials, how often is something going to happen?
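As a sketch of what “starting with the true mean” looks like, assume (for illustration) a known true OBP of .400 and 4 PA in a game; the binomial then gives the probability of each possible number of times on base:

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

true_obp = 0.400   # the known true mean (the illustrative Rickey figure above)
pa = 4             # number of trials in the game
dist = {k: binom_pmf(k, pa, true_obp) for k in range(pa + 1)}
for k, prob in dist.items():
    print(f"{k} times on base: {prob:.4f}")
```

With these inputs, 1 and 2 times on base are equally likely (.3456 each), and a 0-for-4 happens about 13% of the time.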

And, as the Ichiro and Raines examples have shown, the binomial is a fantastic tool to use to answer that question.


#16 Tangotiger 2011/06/08 (Wed) @ 17:58

Some comments have been moved from here:

http://www.insidethebook.com/ee/index.php/site/comments/do_pecota_percentiles_make_any_sense_no/


#17 Tangotiger 2011/06/08 (Wed) @ 18:02

For Larry’s data in #14:

Expected number of games with 0 or 1 times on base = 991
Actual = 986

Fantastic!

Anyone else want to do one?  The only way to make believers is for you guys to roll up your sleeves.  This is how I learned.


#18 2011/06/08 (Wed) @ 18:05

Tango/15, I appreciate the response.

I think we have…
...areas of agreement on this topic,
...areas of disagreement that are exacerbated by semantics and terminology that could perhaps be resolved simply with further discussion and clarification of what we each mean, and
...areas of fundamental disagreement about the nature of the game of baseball that will only be resolved with further research.

It will probably take some time to sort through all those.  When I get a chance, I will elaborate more on my thoughts.


#19 Tangotiger 2011/06/08 (Wed) @ 18:06

Just for a running tally, here are the actual number of multi-onbase games, the expected number, total games, and the number of SDs from the binomial expectation (a positive value means fewer multi-onbase games than expected):

812, 814, 1473, +0.13, Ichiro
1085, 1080, 2071, -0.23, Schmidt
1118, 1096, 1965, -1.02, Raines


#20 2011/06/08 (Wed) @ 18:27

I didn’t get a chance to read the full thread.  However, let me just say, in case no one has said it yet, that simply looking at mean values does not prove that the binomial distribution is the correct one.  In the lingo, the mean is the 1st moment of the distribution and the variance is the 2nd (central) moment.  But those are just the two lowest moments.  If the binomial distribution is indeed the correct one, then the full distribution needs to be compared to the data (i.e., all statistically significant moments should be compared to their binomial predictions).
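As a rough illustration of comparing more than the mean, this sketch takes Schmidt’s 4-PA games from #14, fits a binomial with the same mean, and compares the 2nd through 4th central moments of the data and the model. It is only a descriptive comparison; on its own it does not say whether any gap is statistically significant:

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def central_moment(pmf, order):
    """order-th central moment of a distribution over 0..len(pmf)-1."""
    mean = sum(k * q for k, q in enumerate(pmf))
    return sum((k - mean) ** order * q for k, q in enumerate(pmf))

# Schmidt's 4-PA games from #14: counts[k] = games with k times on base
counts = [254, 535, 456, 155, 14]
games = sum(counts)
n = len(counts) - 1
observed = [c / games for c in counts]                  # empirical distribution
p = sum(k * q for k, q in enumerate(observed)) / n      # OBP chosen so the means match
model = [binom_pmf(k, n, p) for k in range(n + 1)]

for order in (2, 3, 4):
    print(order, round(central_moment(observed, order), 4), round(central_moment(model, order), 4))
```

Here the observed variance (about 0.876) is a little below the binomial’s (about 0.908), a detail that a mean-only comparison cannot show.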


#21 2011/06/08 (Wed) @ 18:33

I used a Chi-Squared test, rather than just looking at the number of multi-onbase games, for a reason.  Namely, the mean and standard deviation aren’t the whole story if you don’t have a normal distribution.  If you want to claim that getting on base is an i.i.d. binomial process, you should test that proposition as intensely as the available data allow.  By changing the question to “number of multi-onbase games” you do get something close enough to a normal distribution (a binomial process with 1500-2100 trials), but it’s not the strongest test of the model. 

Also, all this shows is that the data are explained by the model, not that some other model isn’t the correct one.  In fact, going back to the original question, nothing here has shown that the randomness predicted by a binomial process is actually found in the data.  If we find that all the predictions, as a group, turn out to be “too good,” it might be the case that a binomial model overstates the uncertainty.

So far, 3 cases all within 1 SD.  If we do 100 more, and they all fall within 1 SD, then maybe there IS a problem with the model.
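The arithmetic behind that warning, as a sketch: if each player’s z-score behaves roughly like an independent standard normal draw (an assumption), only about 68% should land within 1 SD, and the chance that all 100 do is vanishingly small:

```python
import math

p_within_1sd = math.erf(1 / math.sqrt(2))   # P(|Z| < 1) for a standard normal, about 0.683
n_players = 100
p_all_within = p_within_1sd ** n_players    # every one of the 100 falls within 1 SD
expected_within = n_players * p_within_1sd  # typical number within 1 SD
print(f"P(all {n_players} within 1 SD) = {p_all_within:.1e}")
print(f"expected number within 1 SD = {expected_within:.1f}")
```

About 68 of 100 should fall within 1 SD; if all 100 did, that would be strong evidence that the model’s spread is too wide.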


#22 2011/06/08 (Wed) @ 18:35

Alan/20:

That’s what I did in 14.  And I was making that point in 21, but it appears you beat me to the punch.


#23 Tangotiger 2011/06/08 (Wed) @ 19:14

Larry: you say Chi, and 90% of the audience has been lost.

I say multi-onbase games, and 90% of the audience is still with me.

Once you sell them on the multi-onbase, then you can get into the discrete numbers.  I’m tracking them discretely, but I’m only presenting the grouped level data.

And I have no doubt that the SD of all those SD values will come out to about 1.  Right now, we’ve been lucky to get 3 such great fits.
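A quick check of where the running tally in #19 stands, as a sketch. With only three players, the sample SD of the z-scores is a very noisy estimate, which fits the remark that the fits so far have been lucky:

```python
import statistics

# SD-from-expectation values from the running tally in #19
z_scores = {"Ichiro": 0.13, "Schmidt": -0.23, "Raines": -1.02}
spread = statistics.stdev(list(z_scores.values()))  # sample SD (n - 1 denominator)
print(f"SD of the z-scores so far: {spread:.2f}")
```

So far the spread is about 0.59; the expectation is that it drifts toward 1 as more hitters are added.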


#24 2011/06/08 (Wed) @ 19:23

Tango:

90% of the audience for a post titled “Testing the binomial distribution theory in baseball.” Really?  I would hope not!


#25 Jimmy 2011/06/08 (Wed) @ 19:56

Some of you need a primer in probability theory. First of all, there are (literally) infinitely many probability distributions out there. The binomial distribution is just a common one with nice properties and a functional form that is easy to play with. It is a theoretical construct and not something that we can ever observe being completely true of a process.

However, what we do know thanks to mathematicians is that the binomial distribution naturally arises when some event with a binary response (i.e. it either succeeds or fails, like a coin toss) is repeated with the same probability each time. As it turns out, this setup is a reasonable assumption for a lot of the random events we observe in life. For example, a coin toss: we know that it’s repeated, and we know that the coin and air density and flipping mechanism and what not don’t change that much, so it’s safe to assume that the probability of success also doesn’t change. Then we can safely model the situation with a binomial distribution. Now this doesn’t mean that we are completely sure the binomial distribution is exactly correct, but we are making a reasonable assumption that allows us to draw inferences.
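A simulation sketch of the point above: repeat a yes/no trial with a fixed probability, count the successes, and the frequencies match the binomial formula. The 4 trials and p = .350 are arbitrary illustrative choices:

```python
import math
import random

def simulate_counts(n_trials, p, n_reps, seed=1):
    """Repeat n_trials independent Bernoulli(p) trials n_reps times; tally the successes."""
    rng = random.Random(seed)
    counts = [0] * (n_trials + 1)
    for _ in range(n_reps):
        successes = sum(rng.random() < p for _ in range(n_trials))
        counts[successes] += 1
    return counts

n, p, reps = 4, 0.35, 100_000
counts = simulate_counts(n, p, reps)
for k, c in enumerate(counts):
    exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"{k}: simulated {c / reps:.4f}  binomial {exact:.4f}")
```

The simulated and exact columns should agree to within a few thousandths, because this setup matches the binomial assumptions exactly.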

The same with baseball. We don’t know for sure that the players’ probability of getting on base or getting a hit or whatever are generated by a binomial process, but from what we know about how hitters operate (i.e. hitters tend to have some true skill level that is roughly constant over time, which represents the probability of getting on base, getting a hit, etc.) it is reasonable to assume that the binomial model makes a good approximation. And that means a lot because it allows us to draw inferences and do other things with the data.

What Tangotiger has done here is to empirically test a few cases. He has not definitely determined that a binomial process is underlying baseball performance (nor will he ever, since it is impossible), but what he is saying is that it very well could.

For some of you who don’t believe that it’s a good fit, well, you might be right. But that’s something that should be determined through empirical testing like Tangotiger did, not through vague assertions. This is statistics, after all.


#26 2011/06/08 (Wed) @ 20:16

“What Tangotiger has done here is to empirically test a few cases. He has not definitely determined that a binomial process is underlying baseball performance (nor will he ever, since it is impossible), but what he is saying is that it very well could.”

“For some of you who don’t believe that it’s a good fit, well, you might be right. But that’s something that should be determined through empirical testing like Tangotiger did, not through vague assertions. This is statistics, after all.”

Jimmy, it might not be clear from how this thread is constructed where it came from.  It did not start with Tango’s post at the top, and what I have written is not in response to what he wrote up top.

First of all, this is an ongoing conversation over the span of several months and several threads.

But more directly, my comments were in response to Tango saying in another thread (Tango linked to it in #16) that the binomial distribution was God’s indisputable truth for baseball and any other model for the underlying process was illogical.  That’s very different than him saying that “it very well could” be the process underlying baseball.  Tango then produced this thread in response to my comments, and, with my permission, moved my comments on the topic from that thread into this one.

I agree that empirical testing is the way to resolve this.


#27 Tangotiger 2011/06/08 (Wed) @ 20:27

What I actually said was this:

“You start with logic.  Logically speaking, I’m right.  The absolute minimum range, if you are God, is a 0.80 runs range.  It is indisputable.  Once you add uncertainties and nuances of how runs are created in baseball, you have no choice but to accept the range is going to be much bigger.  It cannot logically be less.”

The God reference was in knowing the true mean of the entity in question with zero uncertainty.  So, given that we know as a question of fact that the pitcher’s talent level was an ERA of 2.70 (i.e., God told us), then the binomial distribution (not God) tells us that the 90th and 10th percentiles will be a range of 0.80 runs.

Could there be a model that explains OBP in baseball better than the binomial?  Sure, it’s possible.  But, I have not seen it, nor do I expect to see it.  Anytime I have needed to test the distribution in all facets of baseball in the past (and in other sports), the binomial has never failed me.  The binomial seems to apply to OBP just as much as it does to coin-flipping.

In any case, if the mean has no level of uncertainty, then the lowest range in runs was 0.80, and Colin had 0.49, which shows there was a huge problem with what Colin did.


#28 Tangotiger 2011/06/08 (Wed) @ 20:30

Here’s another article I did that references God:

http://www.insidethebook.com/ee/index.php/site/comments/god_and_500/

All my comments referring to God are meant solely to establish an uncertainty level of zero for the true mean.


#29 Jimmy 2011/06/08 (Wed) @ 20:32

Ah, I see. Thanks for the clarification.

As someone who makes his living on statistics, I am loath to say that any one distribution is “God’s indisputable truth.” It is never, ever possible to observe some data and then empirically back out the underlying data process (such as a binomial process) completely. What one can do is hypothesize an underlying data process and then test how well the observed data fit that hypothesis. But no matter how good the fit is, one can never really say that it is for certain.

I do agree, however, that the binomial process is definitely the most reasonable out of the common data processes to be used with baseball data. The other common ones don’t really make sense, except as approximations to the binomial process (which is what I think a previous poster was getting at with the normal distribution). But as I mentioned earlier, there are technically an infinity of probability distributions so just because the binomial is the most reasonable and well-known doesn’t necessarily mean there isn’t another one out there that fits better.


#30 Jimmy 2011/06/08 (Wed) @ 20:36

OK, I see what you mean. Yeah, I think this whole misunderstanding stems from people confusing “population” statistics and “sample” statistics. On the surface it’s a no-brainer distinction, but it seems like most of the people here haven’t got it quite right…


#31 Tangotiger 2011/06/08 (Wed) @ 20:53

“I am loath to say that any one distribution is ‘God’s indisputable truth.’”

Just to be clear, and not necessarily for Jimmy’s benefit, but just for anyone and everyone out there: I never said or implied that.

***

By the way, even if there were a better distribution than the binomial, if the binomial suggested a range of 0.80 runs for the 10th and 90th percentiles, a “better” one would still give you something like 0.78 or 0.83.  It’s going to be really darn close.

It won’t be 0.49.


#32 2011/06/08 (Wed) @ 21:07

I don’t see a theoretical reason that if you had perfect knowledge that the distribution wouldn’t be 0.00 runs.  That you claim the indisputable minimum to be 0.80 runs is what bugs me (or the assumptions that lead you to that statement are what bug me, however you want to say it).

Your application of the binomial distribution using a long-term true-talent mean works as well as it does with large samples because of our lack of knowledge of the underlying processes.

I’m not convinced that it works nearly as well with small samples.

I’m also not convinced that we would be able to differentiate the cases where it doesn’t work with the large samples from the statistical noise.

With better knowledge of the underlying processes, we ought to be able to better understand what is happening in the small samples and to identify the few large-sample cases where it doesn’t work.


#33 2011/06/08 (Wed) @ 21:21

Any one of a number of models will be able to predict results with as good precision as the binomial or any other distribution; for any finite set of data, there are an infinite number of ways to describe it perfectly mathematically. You could say that players have no talent at all, that the only determinant of what happens in a baseball game is the temporal coordinates of that game (down to some ridiculously small, I don’t know, picosecond level), and that with the correct, really enormous formula, with billions of seemingly-random-to-the-average-person coefficients on different terms added together, you get the correct outcome of everything, with no variation whatsoever.

Of course this would be totally absurd from what you’re calling a “logical” standpoint (though really it’s intuition more than logic). We just *know* that players have talent levels and that such precise measures of time aren’t all that important. But strictly empirically, it’s valid. Of course, this system also (almost certainly) will lack any predictive power whatsoever. But the only reason I think yours has good predictive power is that it makes a lot of sense to me.

In any case, even relying on empiricism rests on an intuitive assumption you’re making: that things will continue to happen as they have in the past. I think this is generally a very good assumption (again, because it just seems sensible), but if I have some compelling reason to overrule it, it shouldn’t be treated as some kind of hard-and-fast, unbreakable law. And there’s absolutely nothing that can prove this truth, just like there’s nothing that can prove that a proposition and its negation cannot be simultaneously true.

More to the point here, you’re begging the question. You assume that baseball has a binomial distribution, and there’s absolutely nothing that you can’t explain by that hypothesis. If there is, that’s just some kind of ‘bias’ in the results, which is taken into account in the new, improved binomial distribution, with a better approximation of the true mean.

This is not to say that you’re wrong. In fact, you could be totally right AND have everyone else and every other way of doing it be totally wrong. You could also be right and have other ways of doing it be right, too. You’re never going to be able to prove other people wrong by saying that you’re right; you can only show how accurate your predictions are on data in the past, which is kinda impressive, and make predictions as to what things will be like in the future, which, if accurate, are more impressive. And the others can do the same. And you can argue with them over why your way is more sensible, more intuitively correct, more “logical” than their way.


#34 2011/06/08 (Wed) @ 21:25

I’m not sure I understand the argument.  I _think_ there are three sides, and I _think_ this is what they are saying.  Please correct me if I have mischaracterized anyone’s position.

Tango (my interpretation): We have a measurement of A such that A = A’ +/- E1 +/- E2.  A is what actually happens.  A’ is the true mean.  E1 is the statistical uncertainty (random variation).  E2 is the systematic uncertainty.  Even if we could shrink E2 to zero ("God’s-eye view"), the irreducible E1 is much larger than the PECOTA error bars.

Colin (my interpretation): PECOTA is fine.  Tango’s estimate of E1 is much too big.

Mike/1 (my interpretation): E1 does not exist.  Baseball is not a Poisson process.  We’re not playing dice with the major leagues.  We’re playing with cards.  The cards are marked and have already been stacked.  We just can’t read the markings.

Is this at all accurate?


#35 2011/06/08 (Wed) @ 21:41

As for the binomial distribution itself, I think it makes a lot of sense as a baseball model. It makes sense that a batter has a certain talent level, the pitcher has a certain talent level, and given those two facts, X% of the time it will be a homer, Y% a strikeout, etc. It makes sense that these events should happen pretty randomly about that true mean, with the different events more or less independent. It makes sense that with some more basic information (Good defense? Ballpark? Handedness? etc.), we should be able to significantly improve our estimate of the mean in given circumstances, thereby reducing a lot of the noise and making our predictions better. It makes sense that there are other factors too, such as the batter/pitcher matchup (and it makes sense that such things, first, don’t have such a big effect and, second, don’t happen often enough for us to be able to predict them very well).

Of course it also makes sense that we’ll never be able to figure out all the different little factors to make the thing 100% accurate, but it makes a good amount of sense that it doesn’t take all that much to get to, say, 98%.

But it also makes sense that if you have all the factors, including things like temperature, focus, psychology, and conditioning, the true mean will slide to 1 or 0. It of course also makes sense that a lot of these factors are just unknowable by humans.

So basically I think it’s a good but slightly imperfect model. The main improvements come from what could be called biases, but when you start adding in a good fraction of these, the model stops looking binomial really: you stop being able to claim independence between events, because you have so many constraining variables. Of course, I don’t think we’ll get nearly that far.


#36 2011/06/08 (Wed) @ 21:53

“Mike/1 (my interpretation): E1 does not exist.  Baseball is not a Poisson process.  We’re not playing dice with the major leagues.  We’re playing with cards.  The cards are marked and have already been stacked.  We just can’t read the markings.”

If I understand your formulation, I believe you have characterized my position correctly.


#37 Jimmy 2011/06/08 (Wed) @ 21:54

I am going off the following passage: “That means if you have zero uncertainty of his true talent, and you are guaranteed 1000 PA, 68% of the time, we’d get a wOBA of .255 to .285 (which translates to 2.40 and 3.02).  The 80% range would get us to 2.32 and 3.11.”

And then the following statement by Tangotiger that: “The absolute minimum range, if you are God, is a 0.80 runs range.”
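The 0.80 figure can be reconstructed from the quoted numbers with a short sketch. It assumes, as the quoted passage implicitly does, that observed wOBA is roughly normal and that the wOBA-to-ERA conversion is linear over this range (both are assumptions, and the normal approximation is exactly what Jimmy questions below):

```python
from statistics import NormalDist

# Figures quoted from the original post (1000 PA, zero uncertainty in true talent)
woba_lo, woba_hi = 0.255, 0.285   # 68% range, i.e. mean +/- 1 SD
era_lo, era_hi = 2.40, 3.02       # the same range expressed as ERA

sd_woba = (woba_hi - woba_lo) / 2                        # one SD of observed wOBA
runs_per_woba = (era_hi - era_lo) / (woba_hi - woba_lo)  # assumed linear conversion
z90 = NormalDist().inv_cdf(0.90)                         # about 1.28
width_80 = 2 * z90 * sd_woba * runs_per_woba             # 10th to 90th percentile, in ERA
print(f"80% range width: {width_80:.2f} runs")
```

This gives about 0.79 runs, matching the quoted 2.32 to 3.11 range and the rounded 0.80 figure.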

After reading the original post, I think there are a few inaccuracies here. Namely, he got the binomial distribution, the Poisson distribution, and the normal distribution somewhat mixed up in his examples of wOBA and ERA.

The following might only make sense for mathematically-inclined readers. I know that this goes way over the head of the layman, but the post is tagged with “Statistical Theory” so why not:

The main nitpick I have here is that the binomial distribution is appropriate for the example of wOBA but not for ERA. If you assume a true wOBA, you can treat it as a probability and then apply it in the 1000PAs (which are Bernoulli trials) to get a binomial result, which is more commonly referred to as a binomial count (e.g. 350 times on base out of 1000PAs). Then you can take that count and divide it by the number of PAs/trials to get a sample statistic to estimate the true wOBA.

You can’t do this with ERA exactly because it’s not usable as a probability (probabilities must be between 0 and 1) and, more importantly, does not have a binary response. Its response is discrete but can theoretically take any value from 0 to infinity. What you are looking for is a Poisson distribution where there is a real-valued “rate” parameter (i.e. the true ERA) which produces positive, discrete outcomes on each trial (i.e. the number of runs scored in a given game). Then you might imagine running that pitcher through 1000 games (each time getting between 0 and infinity runs scored), and backing out a sample rate which estimates the true rate.

It seems that when you give the 0.80-run 80% confidence interval, you are implicitly approximating the Poisson distribution of runs/9 with a normal distribution. And when you do the same for the wOBA, you are similarly approximating the binomial distribution with a normal distribution. In this particular case, that’s fine for the binomial distribution because of the sample size, but not so for the Poisson/ERA case. The normal approximation for Poisson is only valid when the rate parameter is greater than 20 or so, and that definitely is not true here. With an ERA of 2.70, a lower bound for the SD is sqrt(2.70), and that can only increase with an approximation.
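To see the skew Jimmy is pointing at, here is a sketch comparing exact Poisson percentiles with the symmetric normal approximation at a rate of 2.70. It assumes, purely for illustration, that one nine-inning game is a single Poisson draw with rate 2.70; as Tango notes further down, real runs allowed are not exactly Poisson.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_quantile(q: float, lam: float) -> int:
    """Smallest k with P(X <= k) >= q for X ~ Poisson(lam)."""
    k, cdf = 0, poisson_pmf(0, lam)
    while cdf < q:
        k += 1
        cdf += poisson_pmf(k, lam)
    return k

LAM = 2.70    # assumed Poisson rate: runs per nine-inning game
Z80 = 1.2816  # standard normal z for the 10th/90th percentiles

exact = (poisson_quantile(0.10, LAM), poisson_quantile(0.90, LAM))
sd = math.sqrt(LAM)  # a Poisson variance equals its rate
normal = (LAM - Z80 * sd, LAM + Z80 * sd)

print("exact Poisson 10th/90th percentiles:", exact)
print("normal approximation: (%.2f, %.2f)" % normal)
```

The exact percentiles are whole numbers and sit asymmetrically around 2.70, which a symmetric normal interval cannot reproduce at such a small rate.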

Anyway, this really has no practical bearing on the discussion going on, but I thought I might as well put my statistical $0.02 in. Also, it goes without saying that Tangotiger does great work in general, so this is all just a minor trickle in the river.


#38    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 21:59

I don’t see a theoretical reason that if you had perfect knowledge that the distribution wouldn’t be 0.00 runs.

The ONLY way for the 10th percentile to equal the 90th percentile of OBSERVED OBP is for each trial to not be independent.

The binomial distribution is dependent on the idea that each trial IS independent.

In your case, if the OBP is 0.333, then for every 3 trials, the batter will reach base exactly once.  So, three trials, one reach base, three more trials, one reach base.  And so on.

This would mean that the observed standard deviation is going to be exactly 0.

This only happens in things that are perfectly cyclical in nature.
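Tango’s cyclical-versus-independent point can be checked with a small simulation. This sketch (the 30-PA window, the 30,000 PA total, and the seed are arbitrary choices) compares a perfectly cyclical hitter who reaches base exactly once every 3 PA with a hitter whose PA are independent trials at the same .333 rate.

```python
import random
import statistics

def window_obp(outcomes: list[int], window: int) -> list[float]:
    """Observed OBP over consecutive, non-overlapping windows of PA."""
    return [sum(outcomes[i:i + window]) / window
            for i in range(0, len(outcomes) - window + 1, window)]

N_PA = 30_000
WINDOW = 30  # PA per window; a multiple of 3 so the cycle lines up exactly

# Perfectly cyclical: out, out, on base, out, out, on base, ...
cyclic = [1 if i % 3 == 2 else 0 for i in range(N_PA)]

# Independent trials with the same true OBP of 1/3.
rng = random.Random(2011)  # fixed seed only so the run is reproducible
independent = [1 if rng.random() < 1 / 3 else 0 for _ in range(N_PA)]

sd_cyclic = statistics.pstdev(window_obp(cyclic, WINDOW))
sd_indep = statistics.pstdev(window_obp(independent, WINDOW))
sd_binomial = ((1 / 3) * (2 / 3) / WINDOW) ** 0.5  # binomial theory for independent trials

print(f"cyclic SD:      {sd_cyclic:.4f}")
print(f"independent SD: {sd_indep:.4f} (binomial theory {sd_binomial:.4f})")
```

The cyclical hitter’s observed OBP has zero spread across windows, while the independent hitter’s spread lands close to the binomial figure of about .086.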


#39          (see all posts) 2011/06/08 (Wed) @ 22:01

I agree with this part:

As for the binomial distribution itself, I think it makes a lot of sense as a baseball model. It makes sense that a batter has a certain talent level, the pitcher has a certain talent level, and, given those two facts, X% of the time it will be a homer, Y% a strikeout, etc. It makes sense that these events should happen pretty randomly about that true mean, the different events more or less independent. It makes sense that with some more basic information (Good defense? Ballpark? Handedness? etc.), we should be able to significantly improve our estimate of the mean in given circumstances, thereby reducing a lot of the noise and making our predictions better.

It’s this part that I have my disagreement with:

It makes sense that there are other factors too, such as the batter/pitcher matchup (and it makes sense that such things firstly don’t have such a big effect and secondly don’t happen enough for us to be able to predict very well).
Of course it also makes sense that we’ll never be able to figure out all the different little factors to make the thing 100% accurate, but it makes a good amount of sense that it doesn’t take all that much to get to say 98%.

I believe that the detailed tracking data that is now being acquired, and what we’ll be able to get in the next 10-20 years, will improve our model by A LOT more than from 98% to 99.9%.  I don’t see how anyone can claim that we’re anywhere close to 98% knowledge with our models right now.  There are so many things that happen in baseball, and we just don’t know why they happened, even after the fact.

The main improvements come from what could be called biases, but when you start adding in a good fraction of these, the model stops looking binomial really - you stop being able to claim independence between events, because you have so many constraining variables.

Yes, that was what I was trying to say, and you said it much better than I did.

Of course, I don’t think we’ll get nearly that far.

I don’t see any reason we can’t.  Of course it’s not trivial, we won’t get there overnight, and we’ll never have perfect knowledge.  But I believe the amount of knowledge about the underlying processes of the game that is available to us but not harnessed is greater than the amount of knowledge that we have currently harnessed.


#40    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:02

Wandering/34: speaking only for myself, yes.


#41    Jimmy      (see all posts) 2011/06/08 (Wed) @ 22:03

@Jeremy Williams:

“Tango (my interpretation): We have a measurement of A such that A = A’ +/- E1 +/- E2.  A is what actually happens.  A’ is the true mean.  E1 is the statistical uncertainty (random variation).  E2 is the systematic uncertainty.  Even if we could shrink E2 to zero ("God’s-eye view"), the irreducible E1 is much larger than the PECOTA error bars. “

This isn’t quite true. When we model data, we assume that there is some true PARAMETER (A’) and that it characterizes the probability distribution, which then has a certain variance (E1) when individual realizations are run. We would like to know what the true parameter A’ is, but we don’t, so we construct the sample statistic which ESTIMATES the true parameter, A.

E2 does not exist. The parameter is treated as given and does not change. All the variation comes from the probability distribution itself, which is contained in E1.

The variance is a function of the parameters of the probability distribution. So in the case of the binomial distribution, there are two parameters (n = number of trials, and p = true probability) and those two parameters uniquely determine the variance. There is no “shrinking” of the variance or whatnot.

This can get confusing because, in the normal distribution, the mean and variance are themselves parameters which characterize the distribution.


#42          (see all posts) 2011/06/08 (Wed) @ 22:03

All of this brings me to the biggest problem I have with Tango in this thread (and others): how he’s invoking God. Now I realise that he means these statements along the lines of “we know the true mean totally certainly”, but this is still wrong. God exists outside of time. God is omniscient. So to say that God would ever tell you that the true probability of any event happening is anything other than 0 or 1 (of course it’s much more likely that God just won’t tell you the probability at all) is either absurd or saying that God is lying.
If I asked you what the probability is that the Giants won the World Series last year, you’d just say 1 (assuming you aren’t a conspiracy theorist or wouldn’t want to play some strange semantic game). If I asked you what the probability was that the Giants would win the 2010 World Series in 1302, you should, now, with the more or less perfect information you have from the future, still say 1. If I asked you what their chances to win the 2010 World Series were on August 1, 2010 BASED ON (insert condition here, such as “their talent level” or “factors which they had control over” or “factors which they had reasonable control over given their innate intelligences and technology level” (though you’d have to ask the definition of ‘reasonable’ here)), then you should tell me something between 0 and 1, I suppose inclusive but almost certainly not at an endpoint, probably between .01 and .2.
So here’s where we get to the fate argument. I don’t think it was fate that the Giants won the World Series. Yeah, there were things that were beyond their control which contributed to their success (positively or negatively, most obviously to my mind the weather). But given a certain amount of what in common parlance is luck, which refers to these basically uncontrollable variations, they played the game well enough to win the particular games they needed in order to win the title. This doesn’t mean that they were necessarily any more or less likely to do so than these other teams, given their talent level, say. But they did it. Apart from their talent and natural abilities, this stemmed largely from the choices they made, their level of focus, etc.
But it’s sort of like asking if World War I was inevitable. No, it wasn’t. It was based on people’s choices, and they could have made different choices. But they did make those choices, so it happened, and it can’t not have happened now that it did happen.
Now I don’t understand how God thinks, clearly, but from God’s perspective, the thing either happened/happens/is happening/will happen, or it didn’t/doesn’t/isn’t/won’t. Either 1 or 0.
And most obviously, though God obviously could work out all these models and distributions absolutely perfectly, there’s clearly no reason for God to need to use one.


#43          (see all posts) 2011/06/08 (Wed) @ 22:05

The ONLY way for the 10th percentile to equal the 90th percentile of OBSERVED OBP is for each trial to not be independent.

The binomial distribution is dependent on the idea that each trial IS independent.

You dismissed out of hand my analogy of baseball as a chess game in one of the earlier threads, but that’s an important part of what I am saying.  The trials are not independent.

When you take a big enough sample, the assumption of trials being independent mostly works because there is such a variety of pitcher and hitter matchups and game situations in the large sample that the trials are to a large degree independent.  In smaller samples, the independence is not true.


#44    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:09

Jimmy/37: you are probably new around here, so let me correct a couple of your assumptions.

OBP follows binomial.

wOBA does NOT follow binomial, but can be approximated closely.  For discussion purposes, we use wOBA and OBP interchangeably.

ERA does NOT follow Poisson or any known distribution (maybe Weibull).  The best distribution is what I call “Tango Distribution”.  We can approximate ERA as: wOBA divided by 1-wOBA, then raised to the power of 1.5, and then multiplied by 12.
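Tango’s rule of thumb, ERA ≈ 12 × (wOBA / (1 − wOBA))^1.5, can be written as a one-line function. The sketch below applies it to the wOBA ranges quoted at the top of this excerpt; the 80% endpoints assume a mean of .270 and an SD of .015 (that is, ±1.2816 SD), which is an inference from the quoted 68% range rather than a figure stated in the thread.

```python
def era_from_woba(woba: float) -> float:
    """Tango's rule-of-thumb ERA approximation: 12 * (wOBA / (1 - wOBA)) ** 1.5."""
    return 12.0 * (woba / (1.0 - woba)) ** 1.5

# wOBA ranges for 1000 guaranteed PA; the 80% endpoints assume mean .270 and SD .015.
ranges = {
    "68%": (0.255, 0.285),
    "80%": (0.270 - 1.2816 * 0.015, 0.270 + 1.2816 * 0.015),
}

for label, (lo, hi) in ranges.items():
    era_lo, era_hi = era_from_woba(lo), era_from_woba(hi)
    print(f"{label}: ERA {era_lo:.2f} to {era_hi:.2f} (width {era_hi - era_lo:.2f} runs)")
```

Under those assumptions the function reproduces the 2.40/3.02 and 2.32/3.11 figures, and the 80% width of roughly 0.8 runs is the “if you are God” minimum range mentioned earlier.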

When I talk about binomial, strictly speaking, I’m talking about OBP.  Everything else is a translation.


#45    Jimmy      (see all posts) 2011/06/08 (Wed) @ 22:15

Tango,

I think that makes sense, since wOBA is weighted and the fact that the innings that go into ERA are not quite all the same (e.g. relievers vs. starters, etc) from a situational standpoint.

At any rate, all of these wrinkles in the specification of the distributions of these random variables should only make it less straightforward to estimate the variance of any one of the statistics… boy, deriving the distribution of ERA as you have defined it looks like quite the task (especially when the underlying statistic isn’t exactly binomial).


#46    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:16

Jimmy/41: there is uncertainty of the mean estimate, and then there is the random variation of the mean.  I’m sure you know that, and perhaps there’s something being lost in the translation.


#47    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:19

Wandering/42: god.  I always say that we are not talking about fate, that I am not talking about something predetermined, that I’m not talking about something that is 0 or 1.

So, whatever entity you can tell me knows the true mean with zero uncertainty, but also has that true mean above 0 and below 1, that’s the entity I want.  I called her God.

I’ll call her god instead to be clear.


#48          (see all posts) 2011/06/08 (Wed) @ 22:20

From Jimmy/37
“The main nitpick I have here is that the binomial distribution is appropriate for the example of wOBA but not for ERA. If you assume a true wOBA, you can treat it as a probability and then apply it in the 1000PAs”
Wait, how is the binomial distribution appropriate for wOBA? I mean, wOBA can be greater than 1. I know it basically never is, but a) isn’t that an artifact of the weighting? and b) won’t it really only be decent to use as a percentage in the range close to .330, where that weighting was set, and become a worse approximation of a percentage (the OBP it’s scaled to) the further you get from there?


#49    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:25

Jimmy/45: it’s even worse.  If you observe a worse performance with men on base than with the bases empty, that adds an even higher multiplier, over and above what I’ve noted!

So, not only do you have a non-linear relationship that makes the OBP to ERA translation not so clear-cut, but then you have an additional relationship based on performance with men on base or bases empty.

Therefore, when I saw a range of 0.49 runs, you can imagine why I was in shock.  My gut (semi-educated) feeling is that the range should be at a minimum 1.00, and probably closer to 1.50.  And for relievers, at least 2.00.


#50          (see all posts) 2011/06/08 (Wed) @ 22:26

Mike/36: Are you claiming baseball events are pseudorandom, then, or actually nonrandom?

If events are pseudorandom (like “random” numbers on a computer), there is an explanation for each event, but it cannot be determined in advance by any possible observation within the system.

If events are nonrandom, then there is some possible observation that can improve predictions of future events over the “true talent + random” formulation.
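The pseudorandom case can be made concrete with Python’s standard `random.Random`, which is a deterministic generator: the same seed always reproduces the same sequence, yet within a sequence the outcomes behave like independent trials. The .400 rate and the seeds below are arbitrary illustrative choices.

```python
import random

def simulate_pa(seed: int, p_on_base: float, n: int) -> list[int]:
    """Pseudorandom PA outcomes (1 = on base): fully determined by the seed."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_on_base else 0 for _ in range(n)]

run_a = simulate_pa(seed=42, p_on_base=0.400, n=10_000)
run_b = simulate_pa(seed=42, p_on_base=0.400, n=10_000)
run_c = simulate_pa(seed=43, p_on_base=0.400, n=10_000)

print(run_a == run_b)  # same seed, so the same sequence: every "event" has an explanation
print(run_a == run_c)  # different seed, so a different sequence
print(sum(run_a) / len(run_a), sum(run_c) / len(run_c))  # both observed rates near .400
```

Knowing the seed explains every outcome after the fact, but an observer who only sees the outcomes gets the same binomial-looking spread as with truly random trials.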


#51          (see all posts) 2011/06/08 (Wed) @ 22:35

@Tango/47
Yeah, I’m not talking about predetermined stuff or fate either. More like postdetermined, but really extratemporally known. But apart from the not-especially-baseball-relevant philosophy, I think that you can only give a “true mean” under certain conditions, i.e. what Player A will do against player B in park D with defense E, time through the order F, base/out state G, probably with a year-by-year or maybe month-by-month updating of the abilities.
But if you really want to get all the biases out you have to include focus levels G-R, injury levels, etc. etc. and if you can get them all, you’ve got 0 and 1. Now I’m guessing things like focus and pitch selection and sudden gusts of wind are what you’re saying is the variance, and that’s fine, but it’s explainable - just not by us really. I mean, you can’t really measure how focused or “in the zone” somebody is, but it definitely is a huge factor.
So I guess what I’m saying is that your god would tell us what the true mean is based on the talent levels and ballparks of all things involved, but not the other stuff? Things that are measurable, but not things that aren’t? Is that right?
If so, this seems more like something that we could actually do given enough time (several decades at least, probably much longer) and interest, and she doesn’t need to be a god at all. Maybe just somebody from a future where game records had been lost, or an alien from another planet where they happen to have come up with baseball as well, but a bit earlier.


#52    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:36

I think based on Mike’s chess analogy, that he thinks there’s a give-and-take kind of relationship, and that, eventually, we should be able to see that to some extent.

My counterpoint is that because baseball is physical, it’s not a question of choice, which is what is in chess.  It’s a question of execution.  And a human can’t execute in such a predetermined fashion.

Even a machine won’t be able to get p=1.000.  Dr Nathan can set up his bat-ball collision test, but I can guarantee you that he’s going to run that experiment at least 10 times, because he knows that there’s going to be something that is irreproducible.  And, I can also bet you that the results he gets will follow a binomial distribution.

In chess, you move your pawn forward one square, well, it moves one square, regardless of how long it takes you, and whether you move it exactly 1 square or 0.99 squares.

If the true OBP is .400 for a player, we’ll never get our estimate less than .250 or higher than .500, no matter what you know about the player (other than something obscene like he’s hungover or decided to give up).  We can’t know so much about the properties of something that we can push the mean all the way to 0 or 1, or even to .001 or .999.

Right now, our uncertainty of a mean estimate is around .020.  So, .400 +/- .020 (something like that).  If you are lucky, you can get .350 +/- .010 in one situation and .425 +/- .010 in another.

You are NOT going to get .250 +/- .001 in one situation and .500 +/- .001 in another.


#53    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:41

Wandering/51: this is what I think Mike would agree with, and I disagree with you.

You can know the property and behaviour of ALL entities.

What you can’t know is how and when everything will react when the collision happens.  You can’t know p=1 or p=0.

That’s because a human doesn’t just choose to swing, he actually has to swing.

So, I am including in god’s data the true mean, property, and behaviour of all entities involved.

What this god doesn’t know is at a particular point in time-space, exactly how all the entities will produce a binomial result at the collision point.

Replace a human with a pitching machine and a batting machine, then, yes, maybe you can get to 1.000.  (And god-damn it, you are going to be enormously precise in your calibration to make sure that the batting machine is going to swing at the right time without knowing the exact speed, trajectory or location of that pitch!)


#54    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:46

Continuing on that theme, imagine a perfectly calibrated pitching machine that allows throws 90, allows a fastball with the same break, and always down the middle.

If the ball travels 250 feet, it’s a hit.  If it doesn’t it’s an out.

Is the batter going to have exactly a .000 or 1.000 average?

Or, do we accept that, sometimes, the batter’s swing plane, power, or snap is simply going to be off a smidge?  And there is NOTHING we can do to predict that?


#55          (see all posts) 2011/06/08 (Wed) @ 22:49

@Jimmy/41: Sign error on my part.
A’ should be our best estimate of true talent.
A’ +/- E2 is the _real_ true talent (as observed by an omniscient being in a hypothetical universe without predestination. . . the paradoxes involved sound like a good idea for a novel).

I don’t see any true binomials wandering around.  E2 comes from uncertainties in the probabilities of the various state transitions; E1 comes from which of the probabilities are realized.  I think we can definitely state that E2 is nonzero.


#56    Tangotiger      (see all posts) 2011/06/08 (Wed) @ 22:50

"allows” = always

Sorry… watching hockey at the same time!


#57          (see all posts) 2011/06/08 (Wed) @ 23:06

Okay, think I got it now. Totally disagree with you (and Mike for what it’s worth), in a subtle but philosophically important way. Of course from a practical sabermetric standpoint, ESPECIALLY where we are now, you’re totally right.
And as for the ostensible topic of the thread, it’s really cool, though I’ve been aware of it for some time: somebody pointed out how few games Jeter didn’t make it on base in one of the last few seasons, and I thought to myself “isn’t that about what you’d expect from a random distribution of OBP and PA/G?” It was within about 3, and the next few players I checked were all within 6, I think (seasonal totals).
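The back-of-the-envelope check described here can be sketched as follows, assuming independent PA at a single fixed OBP: a game with k PA has no times on base with probability (1 − OBP)^k, so the expected count is the sum of that over all games. The function name is hypothetical, and the OBP and games-by-PA counts are made-up placeholders, not Jeter’s actual figures.

```python
def expected_no_on_base_games(obp: float, games_by_pa: dict[int, int]) -> float:
    """Expected games with zero times on base, assuming independent PA at a fixed OBP.

    games_by_pa maps the number of PA in a game to the number of such games.
    """
    return sum(count * (1.0 - obp) ** pa for pa, count in games_by_pa.items())

# Hypothetical season: OBP .360, with made-up counts of 3-, 4- and 5-PA games.
season = {3: 10, 4: 80, 5: 60}
print(f"{expected_no_on_base_games(0.360, season):.1f} games expected with no times on base")
```

Comparing that expectation with the actual count of such games is the whole test; a result within a few games is what the binomial model predicts.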


#58          (see all posts) 2011/06/08 (Wed) @ 23:18

@Tango/54
Totally agree that it wouldn’t be 1 or 0. But I think this is largely due to things like focus, “in the zone-ness”, practice, fatigue (this is a big one - muscles just don’t do the same thing after a while), some inherent consistency skill - basically the kind of things you find in free throwing in basketball, albeit with different muscle groups and muscle memory and stuff. And then there are going to be things like muscle spasms.
Even if it was just making contact it wouldn’t be 1, and even in chess you actually have to be able to move the piece where you want; that’s not going to be exactly 1, but it’s going to be pretty damn high if you’re not extremely fatigued. Grandmasters have written about their hands making moves their minds didn’t want, but I can’t say I really buy that.


#59    Sunny Mehta      (see all posts) 2011/06/09 (Thu) @ 01:14

Tango said:

“I’m saying the binomial is indisputable for the issue at hand.  There has been no evidence presented to dispute this, by you, or anyone anywhere.”

The following article presents evidence to dispute it (even though I get the feeling the author himself didn’t understand the magnitude of his findings):

http://www.fangraphs.com/blogs/index.php/were-going-streaking-again/

Also, Jim Albert has published several essays that exemplify breakdowns in the assumption of variance being binomial.

Bottom line, the binomial model has NEVER been proven to be “correct” in baseball, and in fact is pretty clearly not “correct.” I’ve discussed this personally with Albert. However, what the binomial model IS, is: 1) damn close, and 2) damn convenient.
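One simple way to look for the kind of departure from the binomial that Sunny describes is a dispersion ratio: the observed variance of per-game on-base counts divided by the variance the binomial implies. This is a sketch under stated assumptions (every game has the same number of PA), not the method used in the linked article or in Albert’s essays, and the data below are simulated, not real.

```python
import random
import statistics

def dispersion_ratio(counts: list[int], pa_per_game: int) -> float:
    """Observed variance of per-game counts divided by the binomial variance n*p*(1-p).

    Near 1: consistent with a binomial. Well above 1: overdispersion, e.g. a true
    rate that changes from game to game. Assumes every game has pa_per_game PA.
    """
    p_hat = statistics.fmean(counts) / pa_per_game
    binomial_var = pa_per_game * p_hat * (1.0 - p_hat)
    return statistics.variance(counts) / binomial_var

def simulate_games(rng: random.Random, rates: tuple[float, ...],
                   pa_per_game: int, games: int) -> list[int]:
    """Each game draws its own true rate from `rates`, then plays independent PA at that rate."""
    results = []
    for _ in range(games):
        p = rng.choice(rates)  # this game's true rate
        results.append(sum(rng.random() < p for _ in range(pa_per_game)))
    return results

rng = random.Random(7)
steady = simulate_games(rng, (0.350,), pa_per_game=5, games=4000)
streaky = simulate_games(rng, (0.200, 0.500), pa_per_game=5, games=4000)
print(f"steady:  {dispersion_ratio(steady, 5):.2f}")
print(f"streaky: {dispersion_ratio(streaky, 5):.2f}")
```

On real data the ratio would also pick up the biases Tango lists below (different opponents, times through the order), so a value above 1 is not by itself proof of streakiness.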


#60    Tangotiger      (see all posts) 2011/06/09 (Thu) @ 06:54

I don’t see any evidence there.

He didn’t account for bias, notably: times through the order.  Furthermore, he’s not facing the same pitcher each game. This means that each of the 661 PA has a DIFFERENT expected mean.  You can’t just randomly reorder them as he has.
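Tango’s objection that each PA has a different expected mean has a concrete statistical consequence. If the PA are independent but PA i has its own rate p_i, the number of times on base follows a Poisson binomial distribution with variance Σ p_i(1 − p_i), which is never larger than the binomial variance n·p̄(1 − p̄) at the average rate (the gap is n times the variance of the p_i). A minimal sketch, with hypothetical per-PA rates:

```python
def poisson_binomial_variance(rates: list[float]) -> float:
    """Variance of the success count when trial i succeeds with probability rates[i]."""
    return sum(p * (1.0 - p) for p in rates)

def binomial_variance_at_mean(rates: list[float]) -> float:
    """Variance if every trial instead had the same, average, probability."""
    n = len(rates)
    p_bar = sum(rates) / n
    return n * p_bar * (1.0 - p_bar)

# Hypothetical game: four PA against pitchers of varying quality.
rates = [0.300, 0.330, 0.380, 0.430]
print(f"differing means: {poisson_binomial_variance(rates):.4f}")
print(f"same mean:       {binomial_variance_at_mean(rates):.4f}")
```

The difference is small per game, but it is one reason a comparison that treats every PA as interchangeable can use a slightly wrong baseline.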

Anyway, even without accounting for that bias, I see nothing there to dispute the binomial.

That aside: wonderful research!


#61    Guy      (see all posts) 2011/06/09 (Thu) @ 08:39

#59 What’s astonishing about that research is how LITTLE streakiness he found.  You have lots of factors that should cause streaks:  weather, parks/opponents, player health, place in the lineup come to mind (I’m sure there are others).  Despite all that, he finds only a tiny amount of streakiness overall.  I would call this very strong evidence for Tango’s position.


#62    Rally      (see all posts) 2011/06/09 (Thu) @ 09:27

"I don’t see a theoretical reason that if you had perfect knowledge that the distribution wouldn’t be 0.00 runs.”

Could we be talking about 2 different things?  If you had perfect knowledge of a player’s ability, then by definition your distribution of his ABILITY will be 0.00 runs.

But there is still a distribution on what ERA, OBP, or WOBA you should expect in his next 200 innings.  If he faces one batter, his OBP will be either 0 or 1.  Face the next batter and the distribution shrinks.  Repeat 750 times for a top starter.
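Rally’s point that the distribution of observed results narrows as more batters are faced, even when the ability is known exactly, follows from the binomial SD of the observed rate, sqrt(p(1 − p)/n). A short sketch, assuming a hypothetical true OBP-against of .300 known with zero uncertainty:

```python
import math

def observed_sd(p: float, n: int) -> float:
    """SD of the observed rate after n independent trials at a known true rate p."""
    return math.sqrt(p * (1.0 - p) / n)

TRUE_OBP = 0.300  # hypothetical, known exactly

for batters in (1, 10, 100, 750):
    print(f"{batters:>4} batters: observed OBP-against has SD {observed_sd(TRUE_OBP, batters):.3f}")
```

Even at 750 batters the SD is not zero, which is why the spread of the observed line over a season stays wider than the spread of the ability estimate alone.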

Is PECOTA telling us that it thinks Felix Hernandez’s ERA ability is 80% likely to be between 2.36 and 2.85?  Or is it telling us that it expects an ERA in that range 80% of the time over his next 200 or so innings?

These are very different questions.  If PECOTA is trying to answer the first one, then it may well be correct.  If it’s trying to answer the second one, then it is almost certainly wrong.


#63    Tangotiger      (see all posts) 2011/06/09 (Thu) @ 09:48

It’s trying to answer the 2nd one.  Otherwise, how can we possibly test it?


#64    Rally      (see all posts) 2011/06/09 (Thu) @ 10:04

Is that explicitly stated?  I’d like to see Colin give an official answer to the question.


#65    Jimmy      (see all posts) 2011/06/09 (Thu) @ 10:05

@Tangotiger/46:

This may be true, but then your model would be something that we call a hierarchical or a mixture model. It is a combination of two distributions. To put it into baseball terms, let’s say you are trying to model some kind of binomial process like OBA. If the data process generating outcomes was not a mixture, then you would have to assume a constant “true” OBA ability, which generates a success or failure in each at bat according to that true OBA probability. But if you wanted that “true” OBA ability to be able to fluctuate, then you would have to specify that randomness of the “true” OBA itself with another distribution, say a Beta distribution or something else that is constrained between 0 and 1 (because probabilities have that property). Then you would have to incorporate that additional variation by using Bayes’ rule to derive the posterior distribution of the binomial distribution given the prior distribution of the parameter.

To repeat: if your data process is based off of one distribution only, for example a binomial distribution, then by definition the parameter value is fixed and does not change. If you want to allow that parameter value to vary, then you have to model the randomness of that parameter with another distribution.

For example, one common model used by statisticians especially of the Bayesian bent is the normal mixture:

X ~ Normal(mu1, sigma1)
Y ~ Normal(X, sigma2)

In this case, the ultimate outcome is measured as realizations of Y (call them “y”), where Y has an underlying parameter mean of X, which is itself random according to another normal distribution.

Even in the case of the very easy-to-use normal distribution, deriving the variance of Y is not easy. It involves a lot of conditional probability, and, unfortunately, math.
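For this particular two-level normal model the conditioning works out to a compact result: by the law of total variance, Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) = sigma2² + sigma1². The sketch below checks that by simulation, assuming the second argument of Normal(·, ·) is a standard deviation; mu1, sigma1, sigma2 and the seed are arbitrary illustrative values.

```python
import random

MU1, SIGMA1, SIGMA2 = 0.330, 0.020, 0.015  # illustrative values only
N_DRAWS = 100_000
rng = random.Random(65)  # fixed seed for reproducibility

ys = []
for _ in range(N_DRAWS):
    x = rng.gauss(MU1, SIGMA1)       # draw the "true" mean X
    ys.append(rng.gauss(x, SIGMA2))  # draw the observation Y around X

mean_y = sum(ys) / N_DRAWS
var_y = sum((y - mean_y) ** 2 for y in ys) / N_DRAWS
theory = SIGMA1 ** 2 + SIGMA2 ** 2   # law of total variance for this model

print(f"simulated Var(Y): {var_y:.6f}")
print(f"theory:           {theory:.6f}")
```

Models with a non-normal top level, such as the Beta prior on a binomial rate, need the same conditioning but do not always reduce to a simple sum.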

The way you have been modeling the binomial distribution and forming confidence intervals is consistent with a non-hierarchical model. In that case, you must assume that the parameter is fixed and has zero variance. Of course, none of this reflects reality but, as a famous statistician once said, “all models are wrong, but some are useful.”


#66    Tangotiger      (see all posts) 2011/06/09 (Thu) @ 10:58

"Is that explicity stated? “

It was definitely told to me by at least Nate.  Probably also Colin.  It was certainly implied by Colin in his expose of the PECOTA Percentiles, where he tested the forecasted range against the OBSERVED actual data.

Otherwise, like I said, what’s even the point of having a percentile range like they show?  It would be completely untestable.


#67    Tangotiger      (see all posts) 2011/06/09 (Thu) @ 11:06

Jimmy/65: this is exactly how I do it.

I estimate the true mean of something (OBP for a hitter, win% for a team, etc), and then I calculate the uncertainty of that estimate.  Then the expected observed distribution would be based on that non-fixed mean estimate.

What I was talking about here is to focus only on the idea that there is a fixed true mean, so the only uncertainty would be the random variation of the binomial trials.  This becomes the minimum variance to expect.  Add in the variance expected based on the uncertainty of the mean estimate, and you’ll get a wider observed variance.  Add in the variance expected from a non-fixed number of trials (300 PA?  1000 PA?), and you observe a yet wider variance (rather than presuming all pitchers will face 1000 batters).  And finally, add yet another variance for the way the sequencing of events impacts the runs allowed, and we get an even wider variance.
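The first two layers can be combined with the law of total variance. If the observed rate over n PA is binomial given a true mean p, and p itself is uncertain with mean m and SD s, then Var(observed) = (m(1 − m) − s²)/n + s². A sketch, assuming the .400 ± .020 figure from earlier in the thread and a guaranteed 1000 PA; the other two layers (variable playing time and sequencing) are not modeled here.

```python
import math

def observed_rate_sd(mean: float, mean_sd: float, n: int) -> float:
    """SD of the observed rate over n trials when the true rate has mean `mean` and SD `mean_sd`.

    Law of total variance: E[p(1-p)]/n + Var(p), where E[p(1-p)] = mean*(1-mean) - mean_sd**2.
    """
    within = (mean * (1.0 - mean) - mean_sd ** 2) / n  # binomial noise, averaged over p
    between = mean_sd ** 2                              # uncertainty of the mean estimate
    return math.sqrt(within + between)

PA = 1000
fixed_mean = observed_rate_sd(0.400, 0.000, PA)      # zero uncertainty in the true mean
uncertain_mean = observed_rate_sd(0.400, 0.020, PA)  # about .020 uncertainty in the mean

print(f"fixed true mean:     SD {fixed_mean:.4f}")
print(f"uncertain true mean: SD {uncertain_mean:.4f}")
```

The fixed-mean SD is the floor; any uncertainty in the mean estimate can only widen the observed range from there.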

Once you do all that, you then realize that someone saying that the +/- 1.28 SD gap will be 0.49 runs is outlandishly wrong, since it can’t even correspond to the most certain of forecasts (those of god… zero uncertainty in true mean estimate, guarantee of number of trials, and sequencing being a non-issue).


#68    Jimmy      (see all posts) 2011/06/09 (Thu) @ 11:10

@Jeremy Williams/55:

“A’ +/- E2 is the _real_ true talent (as observed by an omniscient being in a hypothetical universe without predestination. . . the paradoxes involved sound like a good idea for a novel). “

I understand your point that the true talent can vary, but in that case you can’t model it with a single binomial distribution. The binomial distribution is a mathematical function whose functional form is specified by A’. Once you allow A’ to vary, you have a multitude of binomial distributions. In order to USE that in any way (i.e. to get probabilities and variances, not just to talk about in a vague sort of way) you have to also specify the functional form of the randomness in A’. That is where the complication lies.

But no, if you are going off a single binomial distribution you cannot have variation in A’. It is fixed by definition.


#69    Jimmy      (see all posts) 2011/06/09 (Thu) @ 11:16

@Tangotiger/67:

That makes sense to me, but it certainly wasn’t how I read it in the beginning from the multitude of voices on this board. Your reasoning is sound, it’s just that some things are getting lost in translation and misinterpreted by people who don’t quite grasp the basics of statistical inference.

Once again, it is of the utmost importance to distinguish between the “true” talent (i.e. the population parameter) and the “observed” talent (i.e. the sample statistic) which estimates the “true” talent. These things must be kept distinct, or else one risks confusing everything with everything.

Thanks for your patience!


#70          (see all posts) 2011/06/09 (Thu) @ 11:33

Jimmy,

That makes sense to me, but it certainly wasn’t how I read it in the beginning from the multitude of voices on this board. Your reasoning is sound, it’s just that some things are getting lost in translation and misinterpreted by people who don’t quite grasp the basics of statistical inference.

If you are talking about me, please tell me specifically what you think I am doing wrong.


#71    Lee      (see all posts) 2011/06/09 (Thu) @ 11:42

I was pretty surprised to see a discussion this lengthy come out of Tango’s original assertion. It seems to me that the onus is to prove that the binomial is NOT the ideal **approximate** distribution.

Many thoughtful posters have brought their expertise on the nuance and explicit mathematics surrounding when exactly you can truly claim that the data really is following a true binomial distribution, but for the purposes of Tango’s original point, all you need to know is that it’s pretty darn close, which he attempted to show with the examples above.

Once you make the leap and say that the binomial is at least a pretty good approximation for the distribution, from there you can clearly see that the PECOTA percentiles are way off - and that is STILL assuming we have perfect information on the player (God’s knowledge… for anyone still confused.)

I thought Tango’s point was a slam dunk as soon as I read it, and what I find most interesting in this entire thread is Mike’s assertion in #32 (excuse my lack of block quotes, does this board use HTML? Not sure the best way to quote this...)

“I don’t see a theoretical reason that if you had perfect knowledge that the distribution wouldn’t be 0.00 runs”

This is the most flooring sentence in the entire thread, and I’m not sure where to begin trying to refute it. There are so many variables that affect the outcome of plays in baseball that are completely and utterly outside of the individual player’s control (and even his teammates’ and opponents’, assuming you have perfect knowledge of THEM and can model it into your projection as well), and therefore outside the control of “perfect knowledge/god”. The littlest thing, down to a hairline fracture in a bat, can affect a play.

To think that a 0.0 distribution is possible… is crazy.


#72    Jimmy      (see all posts) 2011/06/09 (Thu) @ 11:52

@Mike Fast/70:

Mike, I wasn’t talking about you. I think you bring up some interesting points. I was speaking more generally about those who don’t really know what a “probability distribution” is, or what a “population parameter” is, because those things actually have very precise definitions that shouldn’t be bungled. For example, a “0.0 distribution” is not a distribution, nor was Tango or you even originally referring to a distribution. It is a confidence interval, which is also something that has a very precise definition.

I think, in general, that statistics is a field where the underlying technique is often much more complex and non-intuitive than people make it out to be.


#73    Tangotiger      (see all posts) 2011/06/09 (Thu) @ 11:56

[quote]Type it this way[/quote]

Type it this way

***

To think that a 0.0 distribution is possible… is crazy.

I think it was brave of Mike to post his statement, because it shows where the disagreement is.  He’s making a statement that the observed variance will be zero after more than 1 trial.

It shows why and how we are disagreeing, and perhaps points to a misunderstanding.

The only way for the observed variance after N trials to be zero is:

1. fate (the estimated true mean was actually the exact true mean and that true mean was either exactly 0 or exactly 1)

2. cyclical (that the observed mean matched the true mean after X, Y, and Z trials)
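Under the binomial assumption used throughout this thread (independent trials, one fixed true mean p), the variance of an observed rate over N trials is p(1-p)/N. That is zero only when p is exactly 0 or exactly 1, which is the “fate” case above. A minimal Python sketch; the function name is our own:

```python
def observed_rate_variance(p, n):
    """Variance of the observed rate X/n when X ~ Binomial(n, p).

    Var(X) = n * p * (1 - p), so Var(X / n) = p * (1 - p) / n.
    """
    return p * (1 - p) / n

# Only a true mean of exactly 0 or exactly 1 gives zero spread in observed results.
for p in (0.0, 0.250, 0.420, 1.0):
    print(f"true mean {p:.3f}: variance of observed rate over 600 trials = "
          f"{observed_rate_variance(p, 600):.6f}")
```

Any true mean strictly between 0 and 1 leaves a positive variance, no matter how precisely that mean is known.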

You won’t get fate, unless all we have is a choice, like the chess example (you repeat your steps, and your pawn will always move one square).  If you have to exert yourself, if there are multiple moving parts, each dependent on the other to some degree, then you can’t have fate.  That’s why we run experiments more than once.

And cyclical is extremely problematic, as it requires the result of one trial to be completely dependent on the result of a previous one or many trials.  So completely that if you repeat the one trial, you will ALWAYS get back the same result.  Again, makes sense in chess that you would always move your pawn one way in response to your opponent’s move.  But, it can’t apply in a physical exertion / mechanical process.


#74    Lee      2011/06/09 (Thu) @ 12:13

Sure, I can accept that Mike may have exaggerated (or maybe personified his point of view) to identify the disagreement, which was not completely apparent from the outset. (Not to put words in Mike’s mouth.) Because 0.0 is indefensible.


#75          2011/06/09 (Thu) @ 12:39

Lee, I debated about making another post to say that I meant “arbitrarily close to zero” rather than perfectly zero.  My point is that how close to zero you can get depends on your knowledge.  There is not some floor set by the binomial distribution using p = true talent mean as determined from a multi-season sample beyond which additional knowledge cannot take you.  That’s a mathematical feature of the model Tango is using, but it’s not a physical reality.  The more perfect your knowledge, the closer to zero you can get, with no theoretical limit but zero itself.  Since Tango was granting a God’s-eye view, I assumed that included things like knowing the weather and the wind gusts and the exact mass and size of the bat, etc.  I was not in any way saying that I thought we could humanly measure all those things with that perfect accuracy.


#76    Lee      2011/06/09 (Thu) @ 12:58

Mike, I realized a bit after I posted that you didn’t literally mean 0.0; however, I still think there is an important distinction to make:

In our “God” scenario, which I understand as: In addition to knowing every player’s precise skill in every facet of the game, you have every other possible piece of data you could ever collect, and then some. You know which way the ball will bounce off each pebble, and the direction and speed of every droplet of water falling from the sky.

Even in your wildest “God” scenario, there is a human element. The first that comes to mind is umpires. Umpires are by nature imperfect, and while they may be predictable to a degree, they are not perfectly predictable, even to our “God”, assuming we all have free will. Plus, there is the human element in every player. While you can know skill and speed and muscle, you can’t know or perfectly predict decision making. This, I believe, is what Tango is referring to as “Fate”.

My point is that, you’ve backed off from 0.0 to:

The more perfect your knowledge, the closer to zero you can get, with no theoretical limit but zero itself.

Which still doesn’t incorporate the fact that humans are playing a game in which decisions are made in tenths of a second, by everyone involved, and no model can ever predict that with near-perfect certainty.

This has strayed sufficiently from the main point, and I apologize. But I do think it’s an interesting discussion.

In the end, I do think the term “God” may have muddled the discussion, even if most people understood how Tango meant it to be used. When you assume “God” means “knowing all true talent and every piece of data, ever,” this is more constructive than it meaning “knowing literally everything, including brain function down to the synapses in someone’s brain and how they will react.” It is ONLY under that second definition of “God”-like knowledge that you can ever reduce the uncertainty to effectively 0.


#77          2011/06/09 (Thu) @ 13:23

Lee, this is a discussion that spans a number of threads here.  I don’t believe I backed off at all or changed my position.  I did use a shorthand when I said “0.00”, but I had explained my position in great detail here in other threads already.

My core disagreement with Tango has nothing to do with what he posted at the top of this thread (in the green box).  I agree with that, if not 100%, then pretty darn close, not worth quibbling over anyway.  He posted that after I wrote much of what I wrote in the comments, which were moved to this thread after it was created.  It may appear that my comments were in response to what Tango wrote in the green box, but they WERE NOT.

My bone of contention with Tango is over whether we can move our knowledge level substantially beyond the “binomial distribution with true-talent mean from a multi-year sample”.  I contend we can make significant progress beyond that, Tango says anything we could do would be very marginal, what he calls small biases to the binomial model (correct me if I’m wrong, Tango).  In that view, though I don’t like it statistically (because I don’t think the biases are independent, which the binomial model requires), Tango and I are arguing over the size of the biases.  I think they’re likely quite big, and the challenge is finding ways to measure them.  Tango believes they are small, and that even the most creative measurements would only give small gains.  (I think.)

Another important point of difference (disagreement is not the right word; perhaps difference in focus?) between Tango and me is that Tango is mainly focused on the predictive power of our knowledge, whereas I am interested as much or more in a descriptive understanding of the past, simply for an understanding of how things work in its own right, but also for applying that knowledge for training, coaching, and diagnostic purposes.

Those are not unrelated, of course.  To the extent that you understand why something happened, you ought to be able to predict something about how it will happen in the future.  For deterministic processes (like ball-bat collision), there should be a pretty much 1:1 correspondence with understanding why it happened and predicting how a future event will happen.  For chaotic processes, like brain function (and perhaps muscle movement), maybe understanding why an event happened in the past will give us less predictive power than we want.

I certainly see how our ability to measure imposes limitations on our ability to gain knowledge about the causes of baseball events.  I do not see how the binomial model with seasonal-level inputs imposes any limitation on our ability to gain knowledge about the causes of baseball events.


#78    Tangotiger      2011/06/09 (Thu) @ 13:25

Even forget humans.  You can have a pitching machine throwing to a hitting machine, and the pitching machine can only throw fastballs at 90mph, and the only thing it can differentiate is location.  And you still won’t get anything close to 0 or 1.  Even if you want to control for location, you still won’t get close to 0 or 1 for a machine.

And if you can’t get it for a machine, where you know the property and behaviour of everything, including how it responds to various stimuli, you certainly can’t get it from humans.


#79          2011/06/09 (Thu) @ 13:30

Two of the previous threads in this discussion train can be found here:

http://www.insidethebook.com/ee/index.php/site/comments/two_kinds_of_luck/

http://www.insidethebook.com/ee/index.php/site/comments/reader_mail_of_the_day_what_is_luck/


#80          2011/06/09 (Thu) @ 13:48

Tom, what is the basis for your claim in #78?  I’m not aware of any experiment that remotely approaches the claim you make.

If you’re talking about standard pitching machines, are they any more accurate/controllable than a human pitcher?  And I’m not aware of anything but the very crudest robotic hitting machines.


#81    Lee      2011/06/09 (Thu) @ 13:59

Tango/78

I agree with you. I was trying to start from the most irrefutable source of uncertainty, the human element, to show that 0.0, or anything close to it, is not possible.

Can you explain to me what you mean by the “true mean being 0 or 1”? I reread 73 and am not getting it.

----------

Mike/77

To the extent that you understand why something happened, you ought to be able to predict something about how it will happen in the future.  For deterministic processes (like ball-bat collision), there should be a pretty much 1:1 correspondence with understanding why it happened and predicting how a future event will happen.

Do you think this quote does justice to the complexity of the game of baseball? Apologies if it feels like I am bringing up the same point without making progress; let me quote you again and ask you a question.

The more perfect your knowledge, the closer to zero you can get, with no theoretical limit but zero itself.

If this is too lengthy of a thought experiment, feel free to pass on it. But I’d love to hear a description of your perfect “God” model in which you can predict (to the level of certainty described in your quote) the performance of a player over the course of any moderate interval of time. It’s hard for me to believe that you can craft even a hypothetical model that would allow you to overcome the human element and decision making of players.

This is again off topic in terms of the binomial. Apologies.


#82    Jimmy      2011/06/09 (Thu) @ 14:04

@Mike Fast/77:

My bone of contention with Tango is over whether we can move our knowledge level substantially beyond the “binomial distribution with true-talent mean from a multi-year sample”.  I contend we can make significant progress beyond that, Tango says anything we could do would be very marginal, what he calls small biases to the binomial model (correct me if I’m wrong, Tango)

Hi Mike--the catch that I’ve been trying to point out here is that the “true-talent mean” you refer to cannot be observed from a multi-year sample, or any other kind of sample for that matter. The critical assumption of the binomial process (which is theoretical to begin with, thus making it fruitless to argue over whether it is 100% realistic or not) is that the agent has an underlying talent level which is constant and never changes.

I’m also a little confused by what you’re talking about when you refer to “biases to the binomial model.” A model is a model… what exactly is the bias you are speaking of? Are you speaking of small ways in which the assumptions of the binomial model (i.e. independent Bernoulli trials, constant true talent level) may be violated? That is a slightly different discussion.

In my opinion, the discussion has floated a bit away from the original point that Tango made. The overview he gave of his thinking in comment #67 is concise and to-the-point. Everyone should try to understand what he said there first before moving on, because it seems to me like everyone is starting to form different perceptions of the basic thing we’re discussing.


#83    Jimmy      2011/06/09 (Thu) @ 14:15

And I want to clarify that when I say “cannot be observed” I mean that the mean can only be inferred from observations, and can never be known with complete certainty by the (non-prescient) observer. He can make better and better estimates/inferences if he knows the data process, but he will never be able to conclude with certainty what the true-talent mean is solely from observations (no matter how many of them there are).

-Jimmy


#84          2011/06/09 (Thu) @ 14:27

@Jimmy/68,69:

I understand your point that the true talent can vary, but in that case you can’t model it with a single binomial distribution.

I think we’re mostly in agreement.

I am not claiming any time-dependence (or other variation) of true talent.  I am just claiming that the difference between our measured result (A) and our estimate of true talent (A’ ) is due not only to sampling uncertainty (E1) but also due to systematic uncertainties in our estimate of A’ (E2).

Once again, it is of the utmost importance to distinguish between the “true” talent (i.e. the population parameter) and the “observed” talent (i.e. the sample statistic) which estimates the “true” talent.

I’ll disagree about the “utmost importance” of distinguishing between the sample statistic and the population parameter.  It’s a good distinction to make, but not always an important one.

I was actually trying to make a 3-way distinction:
1) The measured sample statistic (A).
2) The estimated population mean (A’ ).
3) The true population mean (A’ +/- E2).

The description in my original post swapped 2 and 3.


#85          2011/06/09 (Thu) @ 14:35

In my opinion, the discussion has floated a bit away from the original point that Tango made.

This discussion didn’t start with what Tango wrote up in the green box.  Some of the comments in this thread were written before that.  Tango wrote what he wrote in the green box in response to me (after comment #5, specifically).  It seems to me we’re well in the center of the topic at hand.

I understand the reason for Tango moving these things out of the PECOTA thread, because they were off topic there, but I think it’s resulted in a bit of confusion here about what the topic really is.


#86    Jimmy      2011/06/09 (Thu) @ 14:38

@Jeremy Willians/83:

Unfortunately, I have to insist that you are incorrect. The true population mean, again, does not vary. Let me give you an example: suppose that I am trying to estimate the average age of all the people in the United States at a given point in time. I claim that there is some true average age at that given point in time which can be had by surveying every single person in the US and then taking the average.

Since it is infeasible to do such a survey, I instead take a sample of the population and get some average from that sample. That is a sample statistic. When I take repeated samples (assuming I can do them instantaneously), that sample statistic will vary. That is called sample variance and is what you refer to as “sampling uncertainty.” But the underlying population mean will stay the same.
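Jimmy’s age example can be simulated directly. This minimal Python sketch builds a hypothetical, synthetic population of ages (the numbers are invented, not census data), computes its mean once, and then draws repeated random samples: the sample means vary from draw to draw, while the population mean is a single fixed number.

```python
import random
import statistics

# Hypothetical, synthetic "population" of ages; invented for illustration only.
rng = random.Random(2011)
population = [rng.randint(0, 90) for _ in range(100_000)]
population_mean = statistics.fmean(population)  # the fixed population parameter

def sample_mean(pop, k, rng):
    """Average age in a simple random sample (without replacement) of size k."""
    return statistics.fmean(rng.sample(pop, k))

# Repeated samples give different sample statistics (sampling variance)...
sample_means = [sample_mean(population, 500, rng) for _ in range(20)]
print(f"population mean: {population_mean:.2f}")
print(f"sample means range: {min(sample_means):.2f} to {max(sample_means):.2f}")
# ...while population_mean itself never changes between samples.
```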

The problem here is that you are constructing an alternate source of variation that you call “systematic uncertainties in our estimate of [the true population mean].” But I claim that this is in fact the exact same thing you refer to as “sampling uncertainty.”

Again, the issue here is that the true population mean is not something that is sampled. It is an intrinsic characteristic of the thing you are studying which does not change over time OR repeated samplings. If you have an issue with this, then like I said earlier you have to either model that true population mean with a mixture model or abandon the attempt to impose a data process altogether.


#87    Tangotiger      2011/06/09 (Thu) @ 14:45

Jimmy clarified that Tango/67 is basically the crux of the argument for the most current discussion we are having.

***

Jimmy makes a good point that we can’t know the true mean of something without some sort of observation.  How fast is Usain Bolt?  We can measure how much he bench-presses, we can measure his muscles, we can measure every part of his anatomy in fairly precise terms with limited experiments.  But when you put it together, are you going to know what his expected true mean is over running 100m?  Or are you going to rely (mostly) on his actual performance in running the 100m to estimate his true talent level at the 100m?

Are you going to be able to estimate what the true mean is of Pujols facing Doc at 25C, at 16:00 at Busch?  You’re going to need an enormous amount of observation to get the uncertainty of that mean to be anything close to zero.  And even if you do, the mean certainly won’t be 0 or 1.  The mean estimate of Pujols v Doc will be determined, for the most part, by the Odds Ratio Method.
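The Odds Ratio Method isn’t spelled out in this thread. The sketch below uses its commonly cited form (equivalent to Bill James’s log5): combine the batter’s rate, the pitcher’s rate, and the league rate in odds space, then convert back to a probability. The input rates are hypothetical, not actual Pujols or Doc figures. Whenever the inputs lie strictly between 0 and 1, the matchup mean does too, which is Tango’s point that it won’t be 0 or 1.

```python
def to_odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def odds_ratio_matchup(batter, pitcher, league):
    """Expected rate for one batter vs. one pitcher.

    batter:  the batter's rate (e.g., OBP) against league-average pitching
    pitcher: the pitcher's rate allowed against league-average hitting
    league:  the league-average rate in the same environment
    """
    odds = to_odds(batter) * to_odds(pitcher) / to_odds(league)
    return odds / (1 + odds)

# Hypothetical inputs for illustration only.
print(round(odds_ratio_matchup(0.420, 0.290, 0.330), 3))
```

An average batter facing an average pitcher gets back exactly the league rate, which is a quick sanity check on the formula.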

***

True mean being 0 or 1 means predetermined fate, that the result of an event will have been perfectly foretold.


#88          2011/06/09 (Thu) @ 14:48

Hi Mike--the catch that I’ve been trying to point out here is that the “true-talent mean” you refer to cannot be observed from a multi-year sample, or any other kind of sample for that matter. The critical assumption of the binomial process (which is theoretical to begin with, thus making it fruitless to argue over whether it is 100% realistic or not) is that the agent has an underlying talent level which is constant and never changes.

I’m also a little confused by what you’re talking about when you refer to “biases to the binomial model.” A model is a model… what exactly is the bias you are speaking of? Are you speaking of small ways in which the assumptions of the binomial model (i.e. independent Bernoulli trials, constant true talent level) may be violated? That is a slightly different discussion.

In both these cases, I’m trying to use Tango’s language to argue a point with him.  I do not believe that the unchanging talent level is a good approximation.  Tango does. (I think.) I understand that the true-talent mean comes with an uncertainty when you observe it from sample data.  This is where I believe Tango is invoking the god concept--that if god told him the true talent mean and shrank the uncertainty to zero, then he believes he would have a very-near-optimal model for the game, and that there is no knowledge which could tell you why a particular plate appearance ended with a hit or an out, you simply have a probability and a binomial distribution, and pure random chance determines the result.

I’m arguing that that is a gross over-simplification of the processes in play for a single plate appearance, and that it only works fairly well for a large, seasonal-sized sample because many of the unique conditions specific to a single plate appearance wash out over a much larger sample (e.g., the pitcher’s ability to locate his changeup, the pitcher’s fatigue level, the pitches this pitcher threw to him in the previous plate appearance, whether there are men on base, the positioning of the fielders, the lighting conditions, etc.).

These are the sorts of things that Tango called “biases” to the probability for the binomial distribution.  If they are small, I think you can probably be okay modeling them as changes to the probability (the “true talent mean”) for the binomial distribution, like we do with things like the global platoon differential.

My impression is that Tango believes that things like the global platoon differential are the biggest kind of such “biases” to the mean that we could see, and other sorts of “biases” are likely to be much smaller.

I think the “biases”, when measured well, are likely to be of such big magnitude, and interdependent, such that modeling them as independent small changes to the binomial probability is no longer valid.


#89    Jimmy      2011/06/09 (Thu) @ 14:52

And to tie that population example back to baseball (this is my last one, I promise):

Let us assume we are trying to estimate the true talent level of some baseball player via their batting average. I claim that this player has some true talent level which generates outcomes on the field via a binomial process (either hit or no hit, over “n” trials) and furthermore is constant but unobservable.

Since this level is unobservable, I instead try to estimate it via a sample from a game or a season. That estimate, formed from observations, will vary with each game or season I observe. That is the sampling variance. But I claim that the underlying talent level will not change.

If you claim the underlying talent level will change, then you model that with another probability distribution. But that probability distribution will also have parameters which must remain fixed (otherwise it’s not a probability distribution).

You can stack probability distributions like this until the cows come home, but the fact is that at some point you have to hold some value fixed, which represents the underlying talent level of the player. And that makes sense, because just think about what you’re saying when you say that Ichiro is “better” at hitting than Jeff Cirillo--you’re saying that, on some level, one of these players has an intrinsic characteristic that makes them more successful than the other over repeated samplings and time.
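One common way to stack distributions as Jimmy describes is a beta-binomial model: each season’s talent level is drawn from a Beta distribution, and hits are binomial given that talent. In this minimal Python sketch, ALPHA and BETA play the role of the values that must eventually be held fixed; they, AT_BATS, and the seed are hypothetical choices, not estimates for any real player. The observed averages spread out more than a single binomial at the same mean would.

```python
import random
import statistics

rng = random.Random(7)

# Top layer: these parameters stay FIXED. Hypothetical values (Beta mean = 90/300 = .300).
ALPHA, BETA = 90.0, 210.0
AT_BATS = 500

def simulate_season(rng):
    # Layer 1: this season's talent level is itself drawn from a distribution...
    p = rng.betavariate(ALPHA, BETA)
    # Layer 2: ...and hits are binomial given that talent level.
    hits = sum(rng.random() < p for _ in range(AT_BATS))
    return hits / AT_BATS

seasons = [simulate_season(rng) for _ in range(2000)]
print(f"mean of simulated averages: {statistics.fmean(seasons):.3f}")
print(f"variance with varying talent: {statistics.variance(seasons):.6f}")
# A single binomial at the same mean, with no talent variation:
print(f"binomial-only variance:       {0.3 * 0.7 / AT_BATS:.6f}")
```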


#90          2011/06/09 (Thu) @ 15:01

If this is too lengthy of a thought experiment, feel free to pass on it. But I’d love to hear a description of your perfect “God” model in which you can predict (to the level of certainty described in your quote) the performance of a player over the course of any moderate interval of time. It’s hard for me to believe that you can craft even a hypothetical model that would allow you to overcome the human element and decision making of players.

I am definitely not trying to avoid the human element.  On the contrary!  I believe that learning why players choose certain approaches in certain situations is a critical facet to understanding the game.

I’m not sure that I can answer your hypothetical, simply because there are an infinite number of things which play into the results.  With perfect knowledge, you know all of them.  With slightly less than perfect knowledge, you lose some of them.  Hypothetically speaking, it’s just a matter of measurement.  That’s for past understanding.  For prediction, it gets more difficult when you have chaotic processes.  Take the weather, for instance.  We understand how the weather works quite well, but predicting what will happen very far into the future is very difficult.  However, it’s mainly not a knowledge problem; it’s a measurement and computational problem, because chaotic processes diverge so quickly from their inputs.  Some of baseball is probably like that, too, but I’m not sure how much.  At least a large part of it is just deterministic Newtonian physics.
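The divergence of chaotic processes from their inputs can be illustrated with the logistic map, a textbook chaotic system. It is used here only as an illustration; it is not a model of weather or of baseball. In this minimal Python sketch, two starting values that differ by 1e-10 end up far apart within a few dozen steps, even though every step is fully deterministic. The function names and starting values are our own choices.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x); chaotic at r = 4."""
    return r * x * (1 - x)

def trajectory_gaps(x0, delta, steps):
    """Absolute gap between two trajectories whose starts differ by delta."""
    a, b = x0, x0 + delta
    gaps = []
    for _ in range(steps):
        a, b = logistic_map(a), logistic_map(b)
        gaps.append(abs(a - b))
    return gaps

gaps = trajectory_gaps(0.2, 1e-10, 60)
for step in (1, 10, 20, 30, 40, 50, 60):
    print(step, f"{gaps[step - 1]:.3e}")
```

The rule is known exactly; what limits prediction is how precisely the starting value can be measured.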

Let me give a more concrete example.  Take Jose Bautista in September 2009.  Tango’s model would notice that he hit a few more HR and tweak its estimate of his true talent mean by a little bit.  But a method that measured his swing and body position and incoming and outgoing ball trajectories could have told you very quickly what he had changed and how it was affecting his results.  The Tango large-sample binomial model would simply say that it was mostly luck.

The more detailed model could also have given you much better confidence in predicting his future results than the Tango large-sample binomial model.  For prediction, unlike understanding past results, you’d have to include some study/understanding of how “sticky” batting stance adjustments (or whatever your evidence showed that Bautista had changed) are for the future and how able pitchers are to adapt against that, etc.  If you’re talking about that as the “human element”, then I agree, but that’s something somewhat different than (though related to) what I was claiming.


#91    Tangotiger      2011/06/09 (Thu) @ 15:04

Jimmy: Mike is saying, I think, that for any given plate appearance, given the property and behaviour of every single entity, we can then establish not only the uncertainty in the true mean to approach 0 (i.e., my god) but true mean itself is also 0 or 1 (i.e., fate).

If the true mean is anything other than 0 or 1, then the binomial applies.


#92    Tangotiger      2011/06/09 (Thu) @ 15:09

I’m not sure Mike is appreciating that if you have a true mean of something that is not 0 or 1, then you have no choice but to accept some sort of probability distribution (e.g., binomial) for the observed events for that true mean at that point in time-space.


#93    Jimmy      2011/06/09 (Thu) @ 15:14

Jimmy: Mike is saying, I think, that for any given plate appearance, given the property and behaviour of every single entity, we can then establish not only the uncertainty in the true mean to approach 0 (i.e., my god) but true mean itself is also 0 or 1 (i.e., fate).

Yeah, I think that the distinctions are quickly becoming philosophical… I’ll abstain here, haha.


#94          2011/06/09 (Thu) @ 15:15

I’m not sure Mike is appreciating that if you have a true mean of something that is not 0 or 1, then you have no choice but to accept some sort of probability distribution (e.g., binomial) for the observed events for that true mean at that point in time-space.

No, I understand and appreciate that quite well.  I’m not really arguing with you over the shape of the distribution.  I am arguing about how much the “true mean” can vary and how much of that variation we can potentially, as humans, measure.  I’m speaking mainly in a backward-looking sense, but I think the same applies to a lesser but still very significant degree for the future.

Btw, would you mind rescuing my earlier post in this thread from the Akismet filter?


#95    Tangotiger      2011/06/09 (Thu) @ 15:26

The discussion between Mike and me started with him disputing me here:

You start with logic.  Logically speaking, I’m right.  The absolute minimum range, if you are God, is a 0.80 runs range.  It is indisputable.  Once you add uncertainties and nuances of how runs are created in baseball, you have no choice but to accept the range is going to be much bigger.  It cannot logically be less.

Regardless of how we estimate the mean (whether it’s too hard to do anything other than say Pujols is always a true .420, or Mike can figure out in certain circumstances that he’s a .250 or a .600 based on some indicators), the key point is this:

Once you have a mean (and the mean is greater than 0 and less than 1), you need a distribution that tells you how often you are going to observe that mean over N trials.

The best distribution we have is the binomial.  If Pujols is a true .250 OBP in a particular environment, we still need to know how often we are going to actually observe him hitting more than .300 if given 200 trials, and how often we are going to actually observe him hitting less than .100 if given 50 trials.

We still need the binomial distribution (or some probability distribution).

That’s because Pujols cannot OBP at .250 if N=1.  At N=1, he either OBP at .000 or 1.000.  At N=2, he either OBP at .000, .500, or 1.000.  And so on.
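Both questions above can be answered directly from the binomial probability mass function. This minimal Python sketch assumes, as the comment does, that each PA is an independent trial with the same true mean p = .250; the function names are our own. It also lists the only OBP values that can be observed at N=1 and N=2.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(on base exactly k times in n independent PA), each with true mean p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_obp_above(threshold, n, p):
    """P(observed OBP > threshold) over n PA."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k / n > threshold)

def prob_obp_below(threshold, n, p):
    """P(observed OBP < threshold) over n PA."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k / n < threshold)

P_TRUE = 0.250  # the hypothetical true .250 OBP from the comment

print(f"P(OBP > .300 in 200 PA) = {prob_obp_above(0.300, 200, P_TRUE):.4f}")
print(f"P(OBP < .100 in 50 PA)  = {prob_obp_below(0.100, 50, P_TRUE):.4f}")

# With small N only a few OBP values are possible; .250 is not one of them at N=1 or N=2.
for n in (1, 2):
    print(n, {f"{k / n:.3f}": round(binom_pmf(k, n, P_TRUE), 4) for k in range(n + 1)})
```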


#96          2011/06/09 (Thu) @ 15:46

I think what Mike is saying (and I agree, if I characterize him right) is that “The best distribution we have is the binomial” is just wrong.  What we really have is a distribution that is the result of many Bernoulli trials, each with a different p, and which are not independent of each other.  Over large numbers of trials this composite distribution will approach a normal distribution due to the Central Limit Theorem, but over small numbers of trials, such as a month or even a season, it will not.

The fact that the binomial distribution works for the large sample case, and is convenient doesn’t make it right.  For large samples it explains things pretty well, but so does a normal distribution.  For small samples it isn’t contradicted, but neither are a lot of other distributions.
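For the part of Larry’s description that can be written down simply, independent Bernoulli trials each with its own p (the Poisson binomial distribution), the variance of the total count has a closed form. This minimal Python sketch uses hypothetical per-PA probabilities and does not model the dependence between trials that Larry also raises. For independent trials, the spread is smaller than a binomial at the average p by exactly the sum of squared deviations of the p’s from their mean.

```python
import statistics

def heterogeneous_count_variance(ps):
    """Variance of total successes for independent Bernoulli trials with
    (possibly different) success probabilities ps: sum of p * (1 - p)."""
    return sum(p * (1 - p) for p in ps)

def binomial_count_variance(ps):
    """Variance if every trial instead had the average probability p-bar."""
    p_bar = statistics.fmean(ps)
    return len(ps) * p_bar * (1 - p_bar)

# Hypothetical per-PA probabilities; not real data.
ps = [0.30, 0.45, 0.38, 0.52, 0.41, 0.35, 0.47, 0.40]
print(f"varying p:      {heterogeneous_count_variance(ps):.5f}")
print(f"binomial p-bar: {binomial_count_variance(ps):.5f}")
```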


#97          2011/06/09 (Thu) @ 15:50

I don’t agree with your assessment of the key point in #95.  To my mind the key question is whether Pujols is always a true .420 +/- .020 or something small like that, or whether we really can figure out that he’s a .250 or .600 in various circumstances.  If we can do the latter, then you can’t really say, “the absolute minimum range, if you are God, is a 0.80 runs range” or bigger.  You could potentially make your projection range much smaller.

I am not now and never have been disputing that as you move back from perfect knowledge, the uncertainty in your ability to predict will grow larger, and that you could describe those results with some level of accuracy by a probability distribution.


#98    Jimmy      2011/06/09 (Thu) @ 15:58

The fact that the binomial distribution works for the large sample case, and is convenient doesn’t make it right.  For large samples it explains things pretty well, but so does a normal distribution.  For small samples it isn’t contradicted, but neither are a lot of other distributions.

Larry, I think your points are technically correct. I completely agree that the binomial model is wrong. But it really is true that “All models are wrong, but some are useful.” There are no real-life cases where any of the common distributions are right, but people use them anyway because they have been fleshed out theoretically in a way that allows us to make useful inferences.

In reality, it is probably true that each PA involves a slightly different p, but the relevant question is: are the p’s different enough to warrant giving up on the binomial model? And if so, what is the next best alternative?

I still think the argument stands that the data process generating certain baseball numbers is most appropriately modeled by the binomial distribution. Having a shifting “p” doesn’t change the fact that the PAs are more independent than not. The normal distribution may in fact fit the final data well, but it isn’t as canonical--the structure of baseball (series of repeated plate appearances, with a binary response) lends itself more naturally to the Bernoulli paradigm.

Of course, the only way to settle this is to do an empirical comparison… so I’m not claiming the last word at all here. Just my $0.02.


#99    Tangotiger      2011/06/09 (Thu) @ 16:04

...or whether we really can figure out that he’s a .250 or .600 in various circumstances.  If we can do the latter, then you can’t really say, “the absolute minimum range, if you are God, is a 0.80 runs range” or bigger.  You could potentially make your projection range much smaller.

No, you can’t!

Once you have a mean p, whatever that mean is, then the OBSERVATIONS will have a certain distribution, and there is nothing at all that you can do to change that.

The binomial (or some distribution) will offer the minimum variance possible.  You can’t get smaller.

I think there’s a disconnect here between the uncertainty of a true mean, and the observed trials (1 or 0) around that true mean.


#100          2011/06/09 (Thu) @ 16:12

No, Tango.

Let’s make a hypothetical example where in half his plate appearances Pujols has a true mean of .250 and in the other half .600, and we know which of the two groups each plate appearance falls into.  We will get a better projection with that knowledge than we will if we lose the knowledge of which group each plate appearance falls into and we have to assume a true mean of .425 for the whole population of Pujols plate appearances instead.
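Mike’s hypothetical can be scored with the expected squared error of a single-PA forecast (a Brier-style score; using this score is our choice, not something from the thread). In this minimal Python sketch, forecasting each PA with its own true mean (.250 or .600) beats forecasting every PA at the pooled .425. Even with perfect group knowledge, the expected error is p(1-p) per PA, which is well above zero.

```python
# Mike's hypothetical: half of Pujols's PA have true mean .250, half have .600.
P_LOW, P_HIGH = 0.250, 0.600
P_POOLED = (P_LOW + P_HIGH) / 2  # .425

def expected_sq_error(forecast, p):
    """E[(Y - forecast)^2] for one PA where Y ~ Bernoulli(p)."""
    return p * (1 - forecast) ** 2 + (1 - p) * forecast ** 2

# Knowing which group each PA is in: forecast each PA with its own true mean.
informed = 0.5 * expected_sq_error(P_LOW, P_LOW) + 0.5 * expected_sq_error(P_HIGH, P_HIGH)
# Not knowing: forecast every PA with the pooled .425.
pooled = 0.5 * expected_sq_error(P_POOLED, P_LOW) + 0.5 * expected_sq_error(P_POOLED, P_HIGH)

print(f"informed: {informed:.5f}, pooled: {pooled:.5f}")
```

The informed forecast comes out at .21375 against .244375 for the pooled one. That supports Mike’s claim that knowing the split improves the projection, and also Tango’s reply that a binomial floor remains inside each group.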


#101    Tangotiger      2011/06/09 (Thu) @ 16:20

Yes, I’m saying I’m fine and I accept that you will have a true mean of .250 in half and .600 in the other half.

And if you have 300 PA at p=.250, you STILL need the binomial distribution to know the frequency at which he will hit more than .350 or less than .150.


#102    Jimmy      2011/06/09 (Thu) @ 16:23

Let’s make a hypothetical example where in half his plate appearances Pujols has a true mean of .250 and in the other half .600, and we know which of the two groups each plate appearance falls into.

Since this situation involves no uncertainty (you know exactly which PA involves which probability), it is no different than conducting two separate analyses where .250 is the true, constant mean in one and .600 is the true, constant mean in the other. The binomial paradigm still applies in the sense that you are still modeling a data process where the true probability is constant. Nothing is different.

Also, it’s unfair to compare this with the binomial analysis over the grand mean of .425, because you have revealed the information ex-post. To make a fair comparison, you have to take away the fact that you know exactly which PAs correspond to which probability, in which case knowing the split reveals no extra information.


#103    David Gassko      2011/06/09 (Thu) @ 16:25

Mike/100,

The perfect example being platoon splits. But, you will still be bound to have variance around each predicted true mean. Instead of .425 +/- .050, you might have .250 +/- .070 and .600 +/- .070 (all numbers for illustration only, but it is important to note that as you break the numbers down into categories, the uncertainties around each number will go up, though the overall uncertainty might go down). But ultimately you are bound both by the fact that you cannot perfectly estimate true talent and—and this you can’t get around no matter what—random variance.


#104    Tangotiger      2011/06/09 (Thu) @ 16:26

Jimmy: while your 2nd paragraph is correct, you added a point that is going to take us off on a tangent.  The focus should be on your 1st paragraph.  Once Mike and everyone are on board, we can talk about the 2nd.


#105          2011/06/09 (Thu) @ 16:31

Tango/101, David/103, well of course.  I’ve never argued otherwise.

And likewise to Jimmy/102, with the exception that the VERY point I’m trying to make is that the two situations are not the same.  Of course it’s not “fair”.  Knowing more allows you to produce better projections.  That’s the main part of my point.

You can’t say that the perfect-knowledge limit for a pitcher projection is an uncertainty of 0.80 runs (or any other number) because you can always find more knowledge that lets you move the uncertainty lower.


#106    Tangotiger      2011/06/09 (Thu) @ 16:34

The 0.80 is not an “uncertainty”… it’s a range of observed performance.

That you CANNOT lower.


#107          2011/06/09 (Thu) @ 16:34

David/103:

Given our current state of knowledge, there are higher uncertainties on the estimates of true mean for the smaller samples.  But that’s not guaranteed to always be true.  And it is irrelevant for the God-given knowledge of the true mean situation, anyway.

The point is that Tango started with the statement that the binomial distribution provides an absolute lower bound for the uncertainty in estimating the results of N upcoming PAs.  This is, in fact, not true.  Or at the very least, the evidence for this proposition is not currently sufficient.  It is, for all intents and purposes, true for the current state of predictive metrics.  That is a function of our current state of knowledge, not necessarily of the process by which batter-pitcher confrontations are actually resolved on the field.  It is entirely possible that we can understand the process in ways that give lower predictive uncertainties than a binomial distribution would give.  This is, I believe, the point under dispute.


#108          2011/06/09 (Thu) @ 16:36

Larry/107, yes, exactly.


#109    Jimmy      2011/06/09 (Thu) @ 16:40

You can’t say that the perfect-knowledge limit for a pitcher projection is an uncertainty of 0.80 runs (or any other number) because you can always find more knowledge that lets you move the uncertainty lower.

Mike, I agree with you. And I think the solution to this whole circular argument is that anytime a lower/upper bound is stated, it has to come with qualifications, like “...given information X the lower bound is...” because it is always the case in the real world that more information leads to less uncertainty. This isn’t a mind-blowing revelation.

I think this is where we’re getting crossed up. Tango is trying to say: “given information X which includes the true mean ability Y, we can construct a binomial model from which we can theoretically calculate an 80% confidence interval for the sample proportion y.”

Then Tango is saying that, once you have that confidence interval, if you suddenly take away the knowledge of Y and instead replace it with an empirical estimator Y-hat (which is derived from observations), it will always yield a worse confidence interval.

I don’t see how this steps on anyone’s toes. Talk about circular…


#110    Tangotiger      2011/06/09 (Thu) @ 16:41

Larry/107: that would be earth-shattering if true.

What you are suggesting is that p=.333 for OBP and p=.333 for rolling a 1 or 2 on a die will NOT give you the same observed frequency distribution.

While I can accept that it won’t necessarily be exactly the same, it’s going to be so incredibly close that it’s not worth pointing out.

For all intents and purposes, once you have a p, and once the uncertainty around that p is 0, then the binomial applies, regardless of what the activity is, be it baseball, dice rolling, or racing.
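A small Python check of the die side of that comparison (an illustration added here, not from the thread): enumerating every sequence of five fair-die rolls and counting the 1s and 2s reproduces the Binomial(5, 1/3) probabilities exactly, using exact fractions. The five-trial length is an arbitrary stand-in for a five-PA game, and the check assumes independent trials with a constant p, the same assumption behind the binomial for PA.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(n, p):
    """Exact Binomial(n, p) probabilities for k = 0..n successes."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def die_pmf(n):
    """Enumerate all 6**n sequences of n fair-die rolls and count how many
    rolls come up 1 or 2 (a 'success' with probability 1/3)."""
    counts = [0] * (n + 1)
    for rolls in product(range(1, 7), repeat=n):
        successes = sum(1 for r in rolls if r <= 2)
        counts[successes] += 1
    total = 6**n
    return [Fraction(c, total) for c in counts]


n = 5
print(die_pmf(n) == binomial_pmf(n, Fraction(1, 3)))
```

The comparison is exact, so any process with independent trials at p = 1/3, whether dice or PA, produces the same frequency distribution of totals.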


#111    Tangotiger      2011/06/09 (Thu) @ 16:42

it will always yield a worse confidence interval.

Exactly.


#112    Jimmy      2011/06/09 (Thu) @ 16:47

OK. I figured out where all the confusion is.

I think this is where we’re getting crossed up. Tango is trying to say: “given information X which includes the true mean ability Y, we can construct a binomial model from which we can theoretically calculate an 80% confidence interval for the sample proportion y.”

Then Tango is saying that, once you have that confidence interval, if you suddenly take away the knowledge of Y and instead replace it with an empirical estimator Y-hat (which is derived from observations), it will always yield a worse confidence interval.

What Tango really means is that, when you put the empirical estimator Y-hat back into the SAME MODEL you used to create the first confidence interval, it will always be worse. AND THIS DOES NOT DEPEND ON THE DISTRIBUTION YOU USE.

So whether you use a binomial distribution or another distribution which you discover has lower variance, once you replace the true population parameter with the sample estimator the variance will always increase. This is unequivocally true because of the added variability from generating the sample estimator of the true population parameter.

The key is to understand that the model (again, doesn’t matter whether it’s binomial or what) is the same in both cases.

-Jimmy
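Jimmy's point can be checked with a short Monte Carlo sketch in Python (added for illustration; all numbers are hypothetical assumptions): a true OBP of .425, a history of m = 600 past PA used to estimate p, and n = 100 upcoming PA to predict. Under the same binomial model, the prediction error with the estimate should have variance n*p*(1-p)*(1 + n/m), larger than the n*p*(1-p) you get with the true p, because the error in p_hat adds to the binomial noise.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance of a list of numbers."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


def prediction_error_variances(p=0.425, n=100, m=600, reps=5000, seed=1):
    """Predict on-base events in n future PA two ways: with the true p
    (god-level knowledge) and with p_hat estimated from m past PA.
    Returns the variance of the prediction error for each approach."""
    rng = random.Random(seed)
    err_known, err_plugin = [], []
    for _ in range(reps):
        past = sum(rng.random() < p for _ in range(m))    # observed history
        future = sum(rng.random() < p for _ in range(n))  # outcomes to predict
        p_hat = past / m                                   # empirical estimator
        err_known.append(future - n * p)
        err_plugin.append(future - n * p_hat)
    return sample_variance(err_known), sample_variance(err_plugin)


p, n, m = 0.425, 100, 600
known, plugin = prediction_error_variances(p, n, m)
print(f"true p:  simulated {known:.1f}   theory {n * p * (1 - p):.1f}")
print(f"p_hat:   simulated {plugin:.1f}   theory {n * p * (1 - p) * (1 + n / m):.1f}")
```

The model is the same in both rows; only the parameter changes from p to p_hat, which is the comparison Jimmy describes.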


#113          2011/06/09 (Thu) @ 16:52

Jimmy/109, as far as PECOTA is concerned, Tango may well be right, if it uses a binomial model.  Which I think it does for a large portion of its computations, though I’m not sure how the use of comparable players for aging would affect that.  But anyhow, I wasn’t ever arguing about whether Tango was right about PECOTA or not.

This argument between Tango and me goes back much further than this week.  You can read the threads I linked in #79 if you are interested.

I think it has applicability for our ability to project much better than Marcel.

But more than that, I’m interested in its applicability for things like assigning value to players in a backward-looking viewpoint, such as for MVP or Cy Young.  (I think an argument over how the 2010 Cy Young Award should be determined is what started this argument originally, iirc.)

And somewhere in between those two poles of backward-looking and forward-looking are things like training and coaching the players, which a Marcel-type model is completely agnostic toward, but for which a much-finer-grained model of “how baseball plate appearances happen” could be very useful.

What happens to take a .425-talent Pujols plate appearance from .425 to 0 or 1 is not all or nearly all “luck” and therefore unknowable or useless to know.  Much of it is knowable and useful.


#114          2011/06/09 (Thu) @ 16:54

So whether you use a binomial distribution or another distribution which you discover has lower variance, once you replace the true population parameter with the sample estimator the variance will always increase. This is unequivocally true because of the added variability from generating the sample estimator of the true population parameter.

I realize that Tango said that, but I have never disagreed with that.  Of course it is true.


#115    Jimmy      2011/06/09 (Thu) @ 16:59

Mike, I realize that this discussion is primarily between you and Tango, but I do not, in fact, intend for every single one of my posts to be directed at you. I wish you the best of luck going forward. Phew, time for me to get outta here.


#116          2011/06/09 (Thu) @ 17:05

Tango/110:

I don’t think it would be.  Yes, I am suggesting that PAs may not be die rolls.  Nobody who plays the game thinks they are.  That’s why pitchers study video and scouting reports and try to sequence their pitch locations and pitch types, and why hitters play mental games.  This isn’t done for theater, at least in their minds.  That’s hardly dispositive, since common practice can often be wrong, but still, these actions are taken to try to move the p needle, and maybe we just don’t know how it works, especially since we know very little about player intent with our current data.


#117    Tangotiger      2011/06/09 (Thu) @ 17:05

What happens to take a .425-talent Pujols plate appearance from .425 to 0 or 1 is not all or nearly all “luck” and therefore unknowable or useless to know.  Much of it is knowable and useful.

When you go from .425 with a certain level of uncertainty to .250 or .600 with no level of uncertainty, that is potentially knowable and useful.  That’s god-level knowledge.

When you go from that point (the .250 or the .600) to 0 or 1, that is ALL luck, and nothing but luck.  It’s called the random variation around the true mean.  That’s what the binomial distribution is.

To me, it sounds like Mike and I are simply right back at square 1.


#118          2011/06/09 (Thu) @ 17:06

Jimmy/115, you’re actually very helpful and have contributed good thoughts to the discussion, so please don’t take my responses as wishing for you to bow out.

I do feel this discussion must be circular in some fashion, but I can’t put my finger on how.  Tango keeps telling me things I agree with as if he thought I disagreed with them, which tells me that there is still miscommunication somewhere.


#119    Tangotiger      2011/06/09 (Thu) @ 17:06

Larry/116: I said that once the p is established with 0 uncertainty… but then you are saying basically that we don’t have 0 uncertainty.  We’re talking in circles.


#120    Jono      2011/06/09 (Thu) @ 17:16

Tango/117:

isn’t Mike just saying that instead of going from .425 to .250 or .600 we can actually in theory go from .425 to 0 or 1 given enough information?


#121          2011/06/09 (Thu) @ 17:28

isn’t Mike just saying that instead of going from .425 to .250 or .600 we can actually in theory go from .425 to 0 or 1 given enough information?

I am saying that, but there’s also no way to prove that either wrong or right.

My biggest contention is that I think with humanly-obtainable measurements we can push our knowledge of the mean from .425 to something like .250 or .600.  The exact numbers aren’t important.  If it was .300 and .550, I’d still feel vindicated.

It seems to me that Tango is arguing that pushing our knowledge of the mean from .425 to something like .400 or .450 (with knowledge of platoon matchups, and so forth) is about as far as we can go.  And as such, that we are really not far from our optimal humanly-attainable understanding if we act like the mean is always .425, even though we know it will vary a little in reality.  And that one of the reasons why it won’t make much difference, and may well be futile, to try to differentiate when we are .400 or .450 instead of .425 is that our ability to even know the .425 is clouded by the fact that it is estimated by observing a sample.  Is that an accurate summary of your position, Tom?


#122    Guy      2011/06/09 (Thu) @ 17:40

Mike:  Just to clarify, you think we could eventually learn enough to distinguish IN ADVANCE the .300 and .550 true probabilities?  And I assume you don’t mean there are 1 or 2 opposing pitchers (or other factors) that would create such divergent probabilities, but some non-trivial number?


#123          2011/06/09 (Thu) @ 17:49

Guy/122, yes.

For one example, what’s the platoon advantage?  I would think that we could learn things that would apply across the board that would give us 2x or 3x the knowledge (or 2x or 3x the impact, however you want to say it) about when and how the platoon advantage applies.

And that we could discover several other effects of similar size with application to a large portion of major leaguers.


#124    Tangotiger      2011/06/09 (Thu) @ 17:59

I’m not really debating how far we can go from the overall .425 for Pujols to more specific conditions, other than that we can NEVER get to 0 or 1.  I agree you can get from .425 to .475 or even .500.  But how far we can go is not what I’m arguing at all.

The ONLY thing I am talking about here in this thread is that once you’ve established the god-level mean p (that’s not 0 or 1), then the observed distribution of actual events will follow a probability distribution like the binomial.


#125          2011/06/09 (Thu) @ 18:02

Tango/124, so basically you’re choosing not to respond to my critique?

I know that you want to frame the discussion the way you did in the green box at the top of this thread and in the way you did in #124, but that has nothing to do with my critique, so I’m not sure why you keep falling back to it.


#126    Peter Jensen      2011/06/09 (Thu) @ 18:57

I have mentioned this hypothetical previously, but it might be apt to bring it up in this context.  Suppose a certain batter (let’s call him Ichiro) has a career OBP of .373.  Every year it varies somewhat, maybe a low of .350 to a high of .414, but his career is long enough that everyone is confident that his true talent OBP has a mean of .373 with a SD of about .025.  An analyst finds that his OBP behaves as a perfect binomial distribution. 

But this is before any analyst ever thought to investigate platoon splits.  The reality is that Ichiro gets on base every time he faces a left-handed pitcher and never gets on facing a righty.  (Reverse platoon, go figure.)  What is creating the illusion of a binomial distribution is the ignorance of the factor of his platoon splits.  Instead of the uncertainties increasing with the splits as David Gassko theorizes in Post #103, they actually decrease to zero, because the observed variances were not random variation; they were due to a factor not included in the original analysis.

Obviously, we already know about platoon splits, and park factors, and the times-through-the-order effect, and many other factors that could change the conditions of a player’s true talent from situation to situation.  Tango’s position seems to be that any additional factors that may be found in the future will be minor adjustments.  Mike’s position seems to be that there is a good chance that analysis using new data will discover currently unrecognized factors that will bring us much closer to the 0 or 1 probabilities like my hypothetical.  If that actually occurs, it will not necessarily be a rejection of the binomial distribution as applying to baseball, but just a change of the probabilities used to calculate the binomials for specific situations.


#127    Zack      2011/06/09 (Thu) @ 19:03

Now I’m pretty dumb, but I think the question in #50 is really the crux of the issue.  Two quotes from Mike:

#114:

So whether you use a binomial distribution or another distribution which you discover has lower variance, once you replace the true population parameter with the sample estimator the variance will always increase. This is unequivocally true because of the added variability from generating the sample estimator of the true population parameter.

I realize that Tango said that, but I have never disagreed with that.  Of course it is true.

#116:

Yes, I am suggesting that PAs may not be die rolls.

The second quote is, I think, saying that whether a batter reaches base or not is non-random.  If that is true, then there is a fundamental disagreement here. On one side is the somewhat platonic concept of true talent espoused by Tango, while Mike would be arguing that there is no such thing as true talent, that whether a batter reaches base or not is deterministic.

If, however, Mike is saying that whether a batter reaches base or not is at least partially random, then this isn’t really an argument about probability distributions; it’s just a disagreement about how accurately Y-hat can be estimated (and Tango’s point is that no matter how accurately Y-hat is estimated, the observed outcomes will still follow a roughly binomial distribution), which is not really a fundamental disagreement.


#128    Guy      2011/06/09 (Thu) @ 19:19

I think our future progress will be modest. Not sure how to quantify it, because I don’t know what spread our current knowledge would produce for Pujols (facing a LH replacement-level starter, 3rd time thru the order at home VS best RH closer w/ Pujols batting as pinch hitter on the road).  But our ability to better forecast PA in a way that matters to baseball teams will be limited.  We know this for 4 reasons (at least):

1) A lot of people, both inside and outside the game, have been looking for such advantages for a long time.  Harder-to-discover factors tend strongly to also be small factors (in impact). 

2) Competitive pressures in the game already weeded out most of the interesting factors.  Good hitters who can’t hit a curve, or can’t hit the outside pitch, don’t survive.  Good pitchers whose slider gets crushed by RHH don’t throw the pitch (or don’t make the majors).  Every interesting difference is also a potential weakness, in a game that is extremely good at finding and exploiting them. 

3) Strong relationships between skills would be revealed in surprising predictive power from small samples.  For example, hitter-pitcher matchups would have a lot of predictive power.  Even if we didn’t yet understand exactly why some pitchers “owned” certain hitters, the relationships would be predictive.  And yet, we know that these patterns usually prove to have no predictive power. 

4) Many of the discoveries one can imagine will have very limited practical application.  This isn’t football or hockey where you have lots of pieces you can move onto and off the chessboard at will.  You set your lineup from a limited pool of hitters, and beyond making a few substitutions and a few tactical decisions, that’s it.  If some hitters just can’t hit in day games, well, their backup is usually so much weaker at the plate that it won’t matter.  The best hope for practical applications by far, I think, is helping some players to improve their performance.  I believe there’s potential there.


#129          2011/06/09 (Thu) @ 19:23

I think the confusion is that Mike is talking about making a better estimate of the population mean, while Tango is talking about the observed sample variance around the population mean.

If I have a coin and ask you to estimate how many heads will come up in every 100 flips, you’d probably answer 50. If I did 100 sets of 100 flips I would get a distribution of results.

Now what if it turned out that I used a special coin that came up heads 67% of the time. Turns out your prediction of 50 heads sucked. So, I offer to repeat the test and let you use your new knowledge about the coin I am using. So, you now guess 67 heads for every 100 flips. We do 100 sets of 100 flips. Turns out the distribution is basically the same in both cases, but your prediction was much better the second time around.

So, Tango and Mike are both right. New info does allow you to make better predictions, but it won’t reduce the observed variance around your prediction. In order to reduce the observed variance around your prediction you would need a level of precision that isn’t realistic. If my coin came up heads 99.9999% of the time, then yes that knowledge would allow you to predict 100 heads out of 100, and in 100 sets of 100 flips the distribution wouldn’t look like my other examples.
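The coin example can be worked out exactly in Python (a sketch added for illustration, using the 100-flip sets and the 67% coin from the post). The spread of outcomes, n*p*(1-p), is the same whatever you predict; the expected squared error of a prediction is that spread plus the squared bias of the guess.

```python
from math import comb


def binomial_pmf(n, p):
    """Exact Binomial(n, p) probabilities for k = 0..n heads."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def expected_squared_error(guess, n, p):
    """Average of (heads - guess)**2 over the Binomial(n, p) outcomes."""
    return sum(prob * (k - guess) ** 2 for k, prob in enumerate(binomial_pmf(n, p)))


n, p = 100, 0.67            # 100 flips per set; the coin lands heads 67% of the time
spread = n * p * (1 - p)    # variance of the outcomes, whatever anyone predicts
for guess in (50, 67):
    bias_sq = (n * p - guess) ** 2
    print(f"guess {guess}: expected squared error "
          f"{expected_squared_error(guess, n, p):.2f} = "
          f"spread {spread:.2f} + bias^2 {bias_sq:.2f}")
```

Better knowledge of the coin removes the bias term (289 for the guess of 50) but leaves the 22.11 spread untouched.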


#130          2011/06/09 (Thu) @ 20:44

Where I agree with Mike (correct me if I’m wrong) is that I think there are reasons for everything happening. Many of these reasons are rather mundane and sometimes quite obvious - I’m typing this because I chose to, becuase I think it’s right, because I have internet access and a certain amount of knowledge, etc. With certain physical phenomena, it’s also simple - the Earth revolves around the sun due to its momentum and the gravitational pulls, etc. With others, it’s a lot more complicated, like in shooting a basketball (ranging from free throws on the simple end, which are still extraordinarily complex, to contested shots on the complex end), or basically anything in baseball. So if Player A has OBP p, whether he gets on base or not on a given instance is based on certain reasons, certain parameters (NOT fate). Those parameters over the whole population of events lead to p being what it is on average. In certain cases, part of this is well recognizable, such as with platoon splits. So in my view, you can’t really talk about someone’s true OBP, only the true OBP given certain conditions (their overall OBP would be based on average conditions, their OBP vs. righties would be based on average conditions where they play against righties, etc.). All of these will be dependent on more than just skill unless you specifically request that that be controlled for (i.e. if you’re asking god). If you have all the reasons, all these conditions, all the parameters, you should get the end outcome.

But unlike Mike, I don’t think we’ll ever get anywhere close to this amount of data. I think that the number of different parameters is astronomical, over 10^10. Even if you had all these reasons, I don’t think anybody could compute the equations fast enough to make any predictive statements. I also think it’s probably impossible to determine most of these reasons, and that it would definitely be a huge waste of resources to try. For a new parameter to be very useful, it’s got to be fairly simple. Most of them won’t be. So for all practical purposes, I’m with Tango. ‘cause figuring out how the appearance of the opponents’ uniforms affects somebody is just a big waste.


#131          2011/06/09 (Thu) @ 20:51

I think a good analogy for our predictive power in baseball would be to say that we (as a species; really I mean our best experts on the subject) have about as much chance of predicting the outcome of Pujols’ first PA tomorrow night as a meteorologist does of predicting what the weather will be like in St. Louis a year from today. We’ve got some information based on the general climate that time of year, but that’s about it. With better research and data, both will improve, but the best you could practically get is something like what a two-week weather forecast is right now, and we’re WAY far from that.

Trying to project out years in the future would be like guessing the weather 100 years from now, assuming that the players we’re projecting already have a decent amount of time in The Show. Even worse if they don’t.


#132    Tangotiger      2011/06/10 (Fri) @ 07:49

#127 and #129 are capturing the confusion well.

Mike, I’m not choosing to not respond (and this thread should be sufficient evidence!). I’m trying to make the question so precise that the answer will be unambiguous.

Let me try another: we all know about Gibson v Eck, right?  Let’s say Eck told Gibson what he was going to throw, how fast he was going to throw it, and where he was going to throw it.

Furthermore, let’s say that’s Gibson’s prime pitch and location.

Are we going to predict that Gibson’s 2% chance at a HR will become 100%?  No, no one will ever do that under any circumstances.

Will we predict 90%?  That is outlandish, but I WILL ACCEPT THAT.

Now, we replay the scenario 100 times.  How many times does Gibson hit a HR?  The AVERAGE is 90. 

That means that sometimes, we’ll OBSERVE him to hit 93 HR in 100 faceoffs, sometimes we’ll OBSERVE him to hit 83, or 98 or 92 or even exactly 90.

If you repeat this 1000 times, and you write down how often he hit a HR for each 100 times he faced Eck, you will get.... the binomial distribution.

That’s what the binomial distribution is: that an event will either happen (1, or HR) or not (0, or notHR).  And that the frequency distribution of observations will follow a pattern.

*If on the other hand* you are arguing that for each 100 faceoffs we’ll actually see exactly 90 HR, then what you are saying is that for each single matchup, you know with 100% certainty whether he was going to hit a HR or not, that you were 100% sure 90 times he was going to hit a HR and 100% sure 10 times he was not going to hit a HR.  You are, in effect, stating that each PA not only has its own mean (which is fine), but that we can expect to know that true mean to be either 1 or 0.  That’s called fate.

If you agree with everything here, then we can talk about how far we can go from Pujols .425 to a specific .250 or .600, which is, by the way, the far more interesting question.
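The replay thought experiment translates directly into a simulation. The Python sketch below is illustrative only: it assumes the hypothetical 90% HR chance from the post, treats the 100 faceoffs in each replay as independent, and compares how often each HR total shows up across 1,000 replays with the Binomial(100, .90) expectation. The seed and function names are arbitrary choices.

```python
import random
from collections import Counter
from math import comb


def binomial_pmf(n, p):
    """Exact Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def replay_faceoffs(p=0.90, faceoffs=100, replays=1000, seed=1):
    """Each replay is `faceoffs` independent trials, each a HR with
    probability p. Returns the HR total for every replay."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(faceoffs)) for _ in range(replays)]


p, n, replays = 0.90, 100, 1000
totals = replay_faceoffs(p, n, replays)
observed = Counter(totals)
expected = binomial_pmf(n, p)
for k in range(82, 99):
    print(f"{k:3d} HR   observed {observed[k]:4d}   binomial {replays * expected[k]:6.1f}")
```

The totals average close to 90 and scatter around it roughly as the binomial column describes; knowing the 90% exactly does not remove that scatter.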


#133    Guy      2011/06/10 (Fri) @ 09:04

then we can talk about how far we can go from Pujols .425 to a specific .250 or .600, which is, by the way, the far more interesting question.

One might even say the ONLY interesting question....


#134    Tangotiger      2011/06/10 (Fri) @ 09:20

Guy/133: It *is* the only interesting question… as long as we understand the distinction between the uncertainty of a true mean, and the random variation we observe *of* that true mean.

Since there still seems to be some level of confusion here, it doesn’t really make sense to talk about the interesting question until we all have the same basis of understanding of statistics and probability.


#135          2011/06/10 (Fri) @ 09:26

I think we have a philosophical disagreement about the underlying realities that is impossible to resolve.  Impossible, as in not worth discussing another moment.

Let me try another: we all know about Gibson v Eck, right?  Let’s say Eck told Gibson what he was going to throw, how fast he was going to throw it, and where he was going to throw it.

Furthermore, let’s say that’s Gibson’s prime pitch and location.

Are we going to predict that Gibson’s 2% chance at a HR will become 100%?  No, no one will ever do that under any circumstances.

Will we predict 90%?  That is outlandish, but I WILL ACCEPT THAT.

Now, we replay the scenario 100 times.  How many times does Gibson hit a HR?  The AVERAGE is 90.

That means that sometimes, we’ll OBSERVE him to hit 93 HR in 100 faceoffs, sometimes we’ll OBSERVE him to hit 83, or 98 or 92 or even exactly 90.

I agree with the above.

If you repeat this 1000 times, and you write down how often he hit a HR for each 100 times he faced Eck, you will get.... the binomial distribution.

That’s what the binomial distribution is: that an event will either happen (1, or HR) or not (0, or notHR).  And that the frequency distribution of observations will follow a pattern.

At a theoretical level, I don’t agree that the distribution always has to be binomial.  Yes, the observations will obey some probability distribution.  Given our current data and current methods of looking at it, that’s the binomial distribution.  I’m not proposing using a different distribution, but I hold out the possibility that the binomial may not apply in all circumstances.  However, that’s a theoretical objection at the moment, and again, not really worth discussing further, in my opinion.


#136    DavidS      2011/06/10 (Fri) @ 09:48

@126

If your scenario in the second paragraph were reality (and Ichiro’s observed fluctuations in OBP were due to the mix of LHP and RHP he faces), this would NOT produce a binomial distribution.  You would see MANY more games with either 5 hits or 0 hits than a binomial would predict. For the Pujols example, if he is .250 OBP in certain situations and .600 in others, this will not produce a binomial distribution with mean .425.  Tango, correct me if I’m wrong, but it appears as if you have backed off your earliest position. I thought you started by saying that Pujols’ PA results roughly follow a binomial distribution with a mean of about .425.  You seem to now be saying that if, in certain cases, we know his true talent to be .250, then those PA will follow a binomial distribution with mean .250.  I don’t think anyone is going to argue with a statement that basically says “if all of the known information gives a mean of X, then any variation around X is luck, and hence, the results will be binomially distributed”.  However, unless the distribution of these true-talent .250 and .600 events is quite random (or miraculously well-distributed), you are going to see more clumping than with the original assumption.  I believe you are correct when you say the additional gains to our knowledge of Pujols’ talent at any given time are small, and I believe this is justified by the data you provided about Ichiro and Raines in the very beginning.

@130 (and to the others who seem to be espousing a more deterministic/predictable model of human behavior)

I’m not criticizing at all, but I noticed in the fourth line of your post you typed “because” twice but had a typo in the first.  What do you think caused this?  Could it simply be that despite your best intentions, your fingers/brain aren’t accurate enough to type the letters in the correct order that you want them every single time?  I’m assuming there are no wind/park factors/defense/pitching that play into this.  Just you and your keyboard.  Do you (or anyone else here) think that this could have possibly been knowable beforehand?
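DavidS's clumping point, and its contrast with Peter Jensen's hypothetical in #126, can be sketched with exact calculations in Python. All inputs are assumptions for illustration: 5-PA games, a .250/.600 split with each true-talent state equally likely, and two ways of assigning the state. If the state is drawn independently for every PA, each PA is simply a .425 trial and the game totals are exactly Binomial(5, .425). If one state holds for the whole game (say, because of the opposing starter), the totals are a 50/50 mixture of Binomial(5, .250) and Binomial(5, .600), which puts extra weight on 0 and 5.

```python
from math import comb


def binomial_pmf(n, p):
    """Exact Binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def per_game_mixture_pmf(n, states):
    """Game totals when one true-talent state, given as (weight, p),
    holds for all n PA of a game."""
    pmf = [0.0] * (n + 1)
    for weight, p in states:
        for k, prob in enumerate(binomial_pmf(n, p)):
            pmf[k] += weight * prob
    return pmf


def variance(pmf):
    """Variance of a distribution over 0..len(pmf)-1."""
    mean = sum(k * prob for k, prob in enumerate(pmf))
    return sum(prob * (k - mean) ** 2 for k, prob in enumerate(pmf))


n = 5
states = [(0.5, 0.250), (0.5, 0.600)]
# Per-PA mixing: each PA is independently .250 or .600, i.e. a .425 trial.
per_pa = binomial_pmf(n, sum(w * p for w, p in states))
per_game = per_game_mixture_pmf(n, states)

print("on base  per-PA mixing  per-game mixing")
for k in range(n + 1):
    print(f"{k:7d}  {per_pa[k]:13.4f}  {per_game[k]:15.4f}")
print(f"variance {variance(per_pa):13.4f}  {variance(per_game):15.4f}")
```

Both columns have the same .425 mean, but only the per-game version shows the extra 0s and 5s, with a game-level variance of about 1.83 against about 1.22 for the binomial.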


#137    Tangotiger      2011/06/10 (Fri) @ 09:58

David/136: I haven’t backed off anything.  I may have made things clearer in my descriptions, but I have not changed my opinion on any matter in this thread.

***

Mike/135: Great!  Now we can discuss the good stuff.

I will start a new thread, if it’s ok with everyone, to discuss the particular issue.



#139    DavidS      2011/06/10 (Fri) @ 10:12

@137 - Fair enough, I am interested to see your opinions in the next thread.  The data available does allow us to place some limits on either how much improvement we can get, or how nicely distributed those extremes have to be.

and @130 again - I definitely agree with your conclusion at the end so I apologize if I seemed to be placing you in a camp in which you do not belong.  I would also say that it doesn’t really matter how deterministic these events are with “perfect” information because we’re never going to get anywhere close to that.


#140          2011/06/10 (Fri) @ 10:56

@136 I think it was due to carelessness. Do I think it could have been predicted beforehand? Basically only a probability could have been predicted - a binomial distribution - which could be increased with some knowledge about my level of fatigue, care, and distraction. But I do think that there were definite reasons why in the first case I misspelled the word and in the second I did not. It’s just that I don’t think we ever will, or probably ever could, know what those reasons are (at least in baseball; for typing we may well be able to get pretty close).


#141    Ben V-L      2011/06/10 (Fri) @ 15:58

Wow, what a thread.  I probably shouldn’t join a discussion this late in, but oh well....

If I understand some of the statements above about binomials and lower bounds, I’m going to disagree with them.  Let’s use the Pujols case: if we expect his true mean to be .425 OBP, and we’re planning for his next 1000 PA, the binomial distribution would tell us to expect him to reach base 425 +/- 15.6 times.

But in doing this calculation we didn’t include the additional knowledge that when he faces pitchers born in Jan-June he has an expected true mean of .600 OBP, and against pitchers born in July-Dec he has a .250 OBP.  In his 1000 PA, he has exactly 500 PA against each group, and the binomial distribution applied twice gives 300 +/- 11.0 and 125 +/- 9.7 against these separate groups.  Combining these 1000 PA (the variances add, since the groups are independent) then gives the expectation 425 +/- 14.6.
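A quick Python sketch of the arithmetic in this comment, assuming each PA is an independent trial so the two groups’ means and variances simply add. The .600/.250 birthday split is the comment’s hypothetical, and the function names are made up for illustration.

```python
import math

def binomial_spread(n, p):
    """Mean and standard deviation of times on base in n PA at a fixed OBP p."""
    return n * p, math.sqrt(n * p * (1 - p))

def combined_spread(groups):
    """Combine independent binomial groups given as (n, p) pairs.

    Means add, and because the groups are independent, variances add too.
    """
    mean = sum(n * p for n, p in groups)
    var = sum(n * p * (1 - p) for n, p in groups)
    return mean, math.sqrt(var)

# Hypothetical numbers from the comment: .425 overall, or .600 against
# Jan-June pitchers and .250 against July-Dec pitchers, 500 PA each.
pooled = binomial_spread(1000, 0.425)
split = combined_spread([(500, 0.600), (500, 0.250)])

print(f"pooled: {pooled[0]:.0f} +/- {pooled[1]:.1f}")
print(f"split:  {split[0]:.0f} +/- {split[1]:.1f}")
```

Both lines report a mean of 425, but the split version comes out with a narrower spread (about 14.6 versus about 15.6), which is the point of the comment.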

So if you simply applied the binomial distribution to the 1000 PA without knowledge of the birthday effect, you would be overestimating the variance.  The reality is that Pujols (and other players) would be hitting their expected values more closely than the binomial distribution would predict.  This is an indicator that you’ve got a determinable factor lurking around.
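One way to see the overestimate directly is to simulate many 1000-PA stretches under the hypothetical split and compare the simulated spread with what a single binomial at .425 predicts. This is a Monte Carlo sketch using only the standard library; the trial count, seed, and helper name are arbitrary choices made for illustration.

```python
import math
import random
import statistics

def simulate_totals(groups, trials, seed=0):
    """Simulate `trials` stretches of PA and return times on base in each.

    groups: list of (n_pa, obp) pairs; every PA is an independent
    Bernoulli trial that reaches base with probability obp.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        on_base = 0
        for n, p in groups:
            on_base += sum(1 for _ in range(n) if rng.random() < p)
        totals.append(on_base)
    return totals

# Hypothetical "true" process from the comment: 500 PA at .600, 500 PA at .250.
totals = simulate_totals([(500, 0.600), (500, 0.250)], trials=5000, seed=1)
sim_mean = statistics.mean(totals)
sim_sd = statistics.stdev(totals)

# What a single binomial at the pooled .425 OBP predicts for 1000 PA.
naive_sd = math.sqrt(1000 * 0.425 * (1 - 0.425))

print(f"simulated split: {sim_mean:.1f} +/- {sim_sd:.1f}")
print(f"pooled binomial predicts: 425 +/- {naive_sd:.1f}")
```

With 5,000 simulated stretches the simulated SD should land close to 14.6, below the roughly 15.6 the pooled binomial predicts; more trials tighten the estimate at the cost of run time.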


#142    Tangotiger      2011/06/10 (Fri) @ 16:14

Ben, the discussion is really more basic than this.

***

But we can have a technical discussion anyway.  Your example can be illustrated even more clearly this way: you can have a mean of .500 but split it as .100 and .900 (500 PA each), and we’ll get:

500 +/- 15.8

OR

50 +/- 6.7
450 +/- 6.7
= 500 +/- 9.5
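The narrowing can be written out in general. Assuming two equal groups of n/2 independent PA with on-base rates p_1 and p_2, and writing \bar p = (p_1 + p_2)/2 for the combined rate (notation introduced here for illustration), the two variances add and simplify to:

```latex
\begin{aligned}
\sigma^2_{\text{split}}
  &= \tfrac{n}{2}\,p_1(1-p_1) + \tfrac{n}{2}\,p_2(1-p_2) \\
  &= n\,\bar p\,(1-\bar p) \;-\; \tfrac{n}{4}\,(p_1-p_2)^2 .
\end{aligned}
```

The first term is the pooled binomial variance and the second is what the split removes. With n = 1000, p_1 = .100 and p_2 = .900, that is 250 - 160 = 90, an SD of about 9.5. The .600/.250 example gives 244.375 - 30.625 = 213.75, an SD of about 14.6.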

Basically, the observed spread is going to be widest at p = .500.  That’s because at its core, the variance is n*p*(1-p), and p*(1-p) is maximized when p = 1-p = 0.5.
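Completing the square makes that maximum explicit:

```latex
\sigma^2 = n\,p\,(1-p), \qquad
p\,(1-p) = \tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^2 \le \tfrac{1}{4},
```

with equality only at p = 1/2. For 1000 PA the largest possible SD is therefore the square root of 250, about 15.8, which is the pooled figure above.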

So, taken to the extreme, if you have p = .0001 and p = .9999 as the two possible choices, the observed spread around each mean is going to be close to 0 in both cases.  Now you have a case where p = .500 for the two combined and an observed spread of almost 0, even though half the time we observe .0001 and the other half it’s .9999!

That’s not really what I was talking about in my example.

