THE BOOK--Playing The Percentages In Baseball


Wednesday, June 11, 2008

Chipper: does not compute

By Tangotiger, 03:00 PM

Nate tells us that:

The probability of a .310 hitter getting at least 92 hits in 219 tries is 0.023 percent—that’s a one-in-4,423 chance, for those of you who like your odds Vegas style.

He also presents some more data, which presumes that Chipper is a true .348 hitter.  And in there, he’s got him hitting .438 in one simulation (out of 1000 run).  If we assume that he gets a total of 550 at bats, then .438 means getting 241 hits, or an extra 149 hits from now on, in 331 at bats (or a .450 batting average).

Here’s my problem: if the chance that a true .310 hitter will hit .400 or better in 219 tries is one in 4 thousand, how is it that a true .348 hitter will hit .450 (in 331 at bats no less) just one in 1 thousand?  It’s not.  That number is actually 20 thousand to 1.
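To make the comparison concrete, here is a minimal sketch of the fixed-talent arithmetic using an exact binomial tail. It assumes every at bat is an independent trial at one constant true average, which is the "static" case only; the function name `binom_tail` is mine, and since we don't know exactly how Nate computed his figure, the first number need not match his one-in-4,423 to the digit.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Nate's case: a fixed .310 hitter getting at least 92 hits in 219 AB.
p_first = binom_tail(219, 92, 0.310)

# The sim outlier: a fixed .348 hitter getting at least 149 hits in 331 AB.
p_rest = binom_tail(331, 149, 0.348)

print(f"fixed .310, 92+ hits in 219 AB:  {p_first:.6f} (about 1 in {1 / p_first:,.0f})")
print(f"fixed .348, 149+ hits in 331 AB: {p_rest:.6f} (about 1 in {1 / p_rest:,.0f})")
```

Under these assumptions the second event comes out rarer than the first, which is why it is odd to see it show up in only 1000 seasons of a fixed-talent sim.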

The issue, as I see it, is that Nate gave a non-fixed true talent level (which is good) in his sim, so that, presumably, when he came out with a .438 batting average, his true batting average for that sim was say .370 or .380 or whatever.  However, he did NOT give the same possibility in his original 4 thousand to 1 assertion.  In that particular case, he presumed that .310 was a fixed talent level (which is bad).

It’s very possible that someone who is *probably* a .310 hitter (but could very well be anything around .280 to .340) has a one in one (not four) thousand chance of breaking .400.  I don’t know the numbers.  But all Nate needs to do is re-run his sim, keeping all his other parameters the same, except with Chipper as a .310 hitter.  It’s on that basis that he should say what his chances are as a .310 hitter, and not say that he has a 4 thousand to 1 shot if he’s a true .310 hitter.
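A rough way to see how much the fixed-versus-uncertain distinction matters, without re-running anyone's sim: average the binomial tail over a spread of possible true averages. This is only a sketch. The normal shape and the .020 standard deviation are my assumptions (chosen so that most of the weight falls in the .280 to .340 range mentioned above), and `talent_grid` is a hypothetical helper, not Nate's method.

```python
from math import comb, exp

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def talent_grid(mean, sd, lo=0.200, hi=0.450, step=0.001):
    """Discretized normal weights over candidate true averages (assumed shape)."""
    n_steps = int(round((hi - lo) / step))
    points = [lo + i * step for i in range(n_steps + 1)]
    weights = [exp(-0.5 * ((p - mean) / sd) ** 2) for p in points]
    total = sum(weights)
    return [(p, w / total) for p, w in zip(points, weights)]

# Fixed talent: exactly a .310 hitter.
fixed = binom_tail(219, 92, 0.310)

# Uncertain talent: probably .310, give or take an assumed SD of .020.
uncertain = sum(w * binom_tail(219, 92, p) for p, w in talent_grid(0.310, 0.020))

print(f"fixed .310:     about 1 in {1 / fixed:,.0f}")
print(f"uncertain .310: about 1 in {1 / uncertain:,.0f}")
```

The uncertain case is always the more likely of the two here, because the small chance that he is really a .340 or .350 hitter dominates the tail.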

By the way, I ran a quick Marcel.  I’ve got him as a true .330 hitter.  I don’t see how .348 is possible, unless Nate is treating his 219 at bats as if it was 500-600 at bats.


#1    Tangotiger      (see all posts) 2008/06/11 (Wed) @ 15:45

Running my quick Marcel, I get him to be a .346 hitter if I consider his .420 batting average to have occurred in 500 at bats, not 219 at bats.  So, I have that big issue with this as well.  No way we can say he’s a true .348 hitter right now.

As well, looking at his chart, it looks like he made his probability distribution as a normal distribution with .348 as the mid-point.  This would also be false, and you would need to have a left-skewed distribution, since we know he comes from a population where the true talent is mostly concentrated in the .220 to .320 levels.

Since he’s looking at whether he’s going to cross some extreme threshold, all of this has a big impact on the results.


#2    Guy      (see all posts) 2008/06/11 (Wed) @ 16:23

I think you also need to account for the possibility of a skewed sample in terms of quality of opposition.  Nearly 30% of Chipper’s ABs have come against FL (5.00 ERA) or Pit (5.22).  Only 15% of his ABs have come against the 6 lowest-ERA staffs in the NL (AR, ATL, Chi, LA, Phi, StL).  I don’t know the weighted BA-allowed for his opposing pitchers, but it wouldn’t surprise me if it’s well above average.


#3    Sam      (see all posts) 2008/06/11 (Wed) @ 17:13

Tango, I don’t get your point here:

“Here’s my problem: if the chance that a true .310 hitter will hit .400 or better in 219 tries is one in 4 thousand, how is it that a true .348 hitter will hit .450 (in 331 at bats no less) just one in 1 thousand?  It’s not.  That number is actually 20 thousand to 1.”

Just because it happened once in a thousand simulations doesn’t mean that the odds of it happening are 1 in 1000.  Isn’t this just a sample size issue caused by the fact that he “only” ran 1000 trials?


#4    Guy      (see all posts) 2008/06/11 (Wed) @ 17:24

I really don’t see how Nate gets to his conclusion.

First, if Pecota had him as .316 pre-season, how can his performance in 219 AB possibly change our estimate of his true talent to .348?  (Not to mention his age.)

Second, the range of possible true talents around .348 can’t possibly be symmetrical (.370 is far less likely than .330).

Third, even if he IS now a .348 hitter, hitting .385 the rest of the way is about 1.5 SD above that.  Throw in the risk of injury, and I still don’t see a 12-13% chance of getting to .400 w/ necessary PAs.
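The "about 1.5 SD" figure can be checked with a few lines of arithmetic. This sketch assumes a 550 at-bat season (the total used in the post above) and a fixed true average of .348 for the remaining at bats; it ignores injury risk entirely.

```python
from math import sqrt

season_ab = 550                  # assumed season total, as in the post above
ab_so_far, hits_so_far = 219, 92
true_ba = 0.348

ab_left = season_ab - ab_so_far                  # 331 at bats remaining
hits_needed = 0.400 * season_ab - hits_so_far    # hits still needed to finish at .400
needed_ba = hits_needed / ab_left

# Binomial standard deviation of batting average over the remaining at bats.
sd_ba = sqrt(true_ba * (1 - true_ba) / ab_left)
z = (needed_ba - true_ba) / sd_ba

print(f"needs {needed_ba:.3f} the rest of the way, {z:.2f} SD above {true_ba:.3f}")
```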


#5          (see all posts) 2008/06/11 (Wed) @ 17:54

Tango, to try to figure out what’s going on with respect to your last point, I’d turn your analysis around a bit: since we know that the .420 sample average is based on 219 at-bats, what kind of prior distribution would we have to assume in order to get a posterior distribution centered around .348?

Well, first, eyeballing the first graph, it looks like it’s centered closer to .315 than to .310, so for simplicity let’s assume a normal distribution around .315.

Since the 219 AB sample has a built in standard deviation of around .031 points of BA, the kind of quick and dirty regression you use in the book suggests that the prior distribution would have to have a standard deviation of around .021 points.

And indeed, if you look again at that first graph, that looks just about like what he has. And here’s how he explains where that prior comes from: “I have generated a normal distribution based on the performance of Chipper’s comparables, after regressing the comparables’ batting averages to the mean.” That’s a bit unclear, but I take it to mean that his comparables’ actual performance was a .315 average with a standard deviation of (something like) .028, which regression would tell us breaks down into .019 due to sample size (600 AB) and .021 due to variation in true talent.
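For anyone who wants to retrace those numbers, here is a sketch of the quick-and-dirty normal approximation. It assumes the posterior mean is a precision-weighted blend of the prior mean and the sample average, and it plugs in the eyeballed .315 prior mean; the variable names are mine.

```python
from math import sqrt

prior_mean, observed, posterior_mean = 0.315, 0.420, 0.348
ab = 219

# Binomial standard deviation of a 219 AB batting average.
sample_sd = sqrt(prior_mean * (1 - prior_mean) / ab)

# Share of the gap between prior and sample that the posterior actually moved.
weight_on_sample = (posterior_mean - prior_mean) / (observed - prior_mean)

# In the normal model, weight = prior_var / (prior_var + sample_var); solve for prior SD.
prior_sd = sample_sd * sqrt(weight_on_sample / (1 - weight_on_sample))

# Spread of the comparables' observed 600 AB averages: sampling noise plus talent spread.
comp_sampling_sd = sqrt(prior_mean * (1 - prior_mean) / 600)
comp_observed_sd = sqrt(comp_sampling_sd**2 + prior_sd**2)

print(f"sample SD {sample_sd:.3f}, implied prior SD {prior_sd:.3f}")
print(f"comps: sampling SD {comp_sampling_sd:.3f}, observed SD {comp_observed_sd:.3f}")
```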

If I have that right, it seems reasonable to me, though whether it’s “right” depends on how much faith you put in PECOTA’s methodology compared to more traditional Marcel-type analysis. I’m an agnostic on that point.


#6    MGL      (see all posts) 2008/06/11 (Wed) @ 18:59

I actually had tried to link to this article but the Book server was busy.  Here is what I wrote:

I have said this before, Bayesian probability is the correct and rigorous way to do projections.  Marcels, our formulas, like PA/(X+PA), for regressing toward the mean, etc., are all approximations of the rigorous Bayesian method.

That being said, the rigorous Bayesian method is quite simple.  As long as we know the exact distribution of true talent in the population that a player comes from, we can combine that with basic binomial (or multi-nomial for things other than BA or OBP) expectations, using basic Bayesian probability methods to come up with a distribution or a mean of a player’s likely true talent.  That is exactly what Nate did.  He also threw in some nice simulation things (the pitchers, the walks, etc.) which were not really necessary, but made his results “nicer” (they look like they were modeled after “real life”) and a little more accurate.

There are, however, two problems I can see with his methodology.  What he used as a proxy for the likely distribution of true talent in the population of players to which Chipper belongs is Pecota’s pre-season estimate of the distribution of Chipper’s likely true BA.  In other words, he skipped a step in the Bayesian model for computing Chipper’s current likely true BA (the “step” is included in the Pecota formulas).  While this is fine from a mathematical perspective, the “problem” is that we don’t know if that distribution is even close to being correct, since we don’t know how Pecota came up with it.

The second problem is that the basic Bayesian model that he uses assumes that a player, like Chipper, does not and can not change his true talent at any given time.  It seems like it does, but it doesn’t.  It assumes that we don’t know what his true talent is (we can estimate it at any time), but that it is static at all times.  It estimates that true talent from two things (which is what a Bayesian model does - it uses two probabilities - one, the prior, or a priori one, which is that Pecota distribution, and two, the current one, which is his current 2008 performance, and the binomial uncertainty surrounding that).

Basically, it (his Bayesian model) says: given what we think his chances are of being a true .300, .250, .320, .350 (etc., etc.; in fact every possible BA) before the season started (that is the Pecota pre-season distribution of Chipper’s likely true talent BA, which he is assuming is absolutely true), and given that he has hit .420 in 219 AB so far, what are the chances that he is a true .300, .320, .250, .410 (etc., etc. again)?  Then it uses that new “likely true BA” distribution in the simulation.
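That update can be sketched on a grid of candidate true averages: weight each candidate by its pre-season probability times the binomial likelihood of 92 hits in 219 AB, then renormalize. I don't have the actual Pecota distribution, so the normal prior below (mean .315, SD .021, the values eyeballed in #5) is purely an assumption, and `posterior_grid` is a hypothetical helper. True talent is treated as static, as described above.

```python
from math import exp, log

def posterior_grid(prior_mean, prior_sd, hits, ab, lo=0.200, hi=0.500, step=0.001):
    """Posterior weights over candidate true BAs: normal prior x binomial likelihood."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    # Work in logs to avoid underflow, then shift by the peak before exponentiating.
    log_post = [
        -0.5 * ((p - prior_mean) / prior_sd) ** 2
        + hits * log(p) + (ab - hits) * log(1 - p)
        for p in grid
    ]
    peak = max(log_post)
    weights = [exp(v - peak) for v in log_post]
    total = sum(weights)
    return [(p, w / total) for p, w in zip(grid, weights)]

posterior = posterior_grid(0.315, 0.021, hits=92, ab=219)
post_mean = sum(p * w for p, w in posterior)
print(f"posterior mean true BA: {post_mean:.3f}")
```

With this particular prior the posterior mean lands in the mid-to-high .340s, so the answer depends almost entirely on how wide the pre-season distribution is taken to be.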

Again, that is technically the correct and completely rigorous way to do it, but, as I said, it assumes that the pre-season Pecota “likely true BA” distribution is accurate, and perhaps more importantly, the model assumes that those are the only two probabilities that go into the Bayesian methodology - one, the likely pre-season true BA distribution, and two, his BA in 219 AB so far this year (and of course the binomial distributions surrounding that BA).  What it does NOT include, which is a possibly important parameter, is the distribution of probabilities that a player’s true BA can change during his career, for whatever reasons.  This can drastically change the numbers.  I think it can only change the numbers (his chances of hitting .400 or greater this year) upward in this case.

As I said, it seems like his Bayesian model includes that chance, but it doesn’t.  All of the numbers in the resultant distribution, the ones which go into the simulation, are actually answering the question, “What if we were wrong in our pre-season mean estimate of his true BA?” That is different from, “What are the chances that his true BA has changed?” And will yield different results in both the resultant distribution and in his simulation.  Of course if you include the “chances of a true BA changing” parameter, you are really combining the questions in your final distribution (the final distribution represents the chances that a player’s true BA has changed and the chances that you were “wrong” in your initial distribution).

Let me give you an example.  Let’s say that hitters’ true BA do not change.  So when we use Silver’s Bayesian model and it produces a distribution whose mean is .350 (IOW, that is our mean estimate of his true BA now that he has done so well in 2008), that automatically means that we were “wrong” in our pre-season estimate (which was a distribution with a mean of .310).

But, what if there were a great chance that batters’ true BA can change significantly from year to year, both up and down?  Well, if we estimate now that Chipper is likely a true .350 hitter, there is a significant chance (at least more than zero, which was the chance in the example above) that we were “right” with our pre-season .310 estimate, but that it has changed and he is now a true .350 hitter. 

Basically, if there is a significant chance that batters’ true BA can change a lot from year to year, it makes our estimate of Chipper’s mean true BA now a lot higher than if batters’ BA do NOT change much from year to year.

So, as I said, that set of probabilities is an important part of the Bayesian problem here.  (A Bayesian problem can use more than one set of a priori probabilities).  Yet, Nate does not include that at all.  He assumes that a batter’s true BA never changes, only our estimate of that true BA.

Also, off the top of my head, his “revised” estimate of Chipper’s true mean current BA of .348 seems a little high, given that it started out at .310 before the season started.  Since he used a pure Bayesian model to get to that .348 (and he has no parameter for a true BA actually changing - only our estimate of it), his distribution before the season started must have either been quite wide or skewed towards a higher BA.  Or else, I am just miscalculating in my head what a Marcel would put his current true BA at.


#7    tangotiger      (see all posts) 2008/06/11 (Wed) @ 19:01

Sam/3: My basic point stands that the 4000:1 was calculated based on a static true talent level, while the sim he generated was based on a dynamic true talent level.  So, they are not comparable.

***

Guy/4: exactly what I’m saying.  The age thing too should also not be ignored.

It’s almost certain that the way he did his comparables is that he treated his current 2008 as a full-season.  This may even point to an issue I have with PECOTA in general, in that I’m not even sure if Nate distinguishes between rate stats of a guy with 300 PA and 600 PA.  In any case, it’s almost certain that he’s treating his .420 BA in 2008 as if it came from someone with 500-600 AB.


#8    MGL      (see all posts) 2008/06/11 (Wed) @ 19:14

Here’s my problem: if the chance that a true .310 hitter will hit .400 or better in 219 tries is one in 4 thousand, how is it that a true .348 hitter will hit .450 (in 331 at bats no less) just one in 1 thousand?  It’s not.  That number is actually 20 thousand to 1.

First of all, he does not do it “that” way.  He does not assume that Chipper is a .348 hitter in his sim.  He takes the distribution centered on .348 and has his sim randomly (according to the expected frequencies of course) select points in that distribution.  One time it will be .320. Another time .370.  etc.  Plus he has different pitchers, etc.

I assume that the results of that kind of sim will be close to the result of a binomial distribution around a mean of .348, but it should generate some more extremes if nothing else because he uses different quality pitchers.  That will create more variance of course. 

And more importantly, as Sam said, just because something happened in a sim one time in a 1000 times run does not tell us what the odds of that event occurring are.  It could easily be 200 to 1 or 10,000 to 1.  Now if he ran off 100,000 seasons in the sim and something came up 100 times, then I think we could assume that the odds of that event occurring are near 1 in a 1000.
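To put rough numbers on that, the sketch below asks how likely the observed count would be under a few candidate true odds: one occurrence in 1000 seasons, versus 100 occurrences in 100,000 seasons. The candidate odds are simply the ones mentioned above, and `binom_pmf` is my own helper, written with log-gamma so the large counts don't overflow.

```python
from math import exp, lgamma, log, log1p

def binom_pmf(n, k, p):
    """P(X == k) for X ~ Binomial(n, p), computed in logs for large n."""
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log1p(-p)
    )
    return exp(log_pmf)

for odds in (200, 1_000, 10_000):
    p = 1 / odds
    small_sim = binom_pmf(1_000, 1, p)      # exactly 1 occurrence in 1000 seasons
    large_sim = binom_pmf(100_000, 100, p)  # exactly 100 occurrences in 100,000 seasons
    print(f"true odds 1 in {odds:>6,}: P(1 of 1,000) = {small_sim:.4f}, "
          f"P(100 of 100,000) = {large_sim:.3g}")
```

A single hit in 1000 runs is reasonably compatible with all three candidates, while 100 hits in 100,000 runs is essentially impossible unless the true odds are close to 1 in 1000.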

Anyway, I thought in my original post that the .348 sounded high given .310 going in and only 200 some odd AB so far this year.  But if he did the Bayesian model correctly then the .348 must be correct. But, as I also said in my original post, even if that .348 is correct, that assumes that his original distribution around the .310 (the famous Pecota “confidence intervals”) is accurate.  Whether it is or is not is another story. If Nate says that that is the approximate distribution of similar players after regressing their sample BA toward a mean, then I guess I have to believe him.

The problem, of course, is who you are using as “comps.” If you are using players with similar stats, that is circular reasoning and wrong!  You really have to use as your comps players who are similar in any feature you want, but you are not allowed to look at their stats.  Not at all!  You can’t look at power hitters (high HR), etc.  As soon as you look at any other players’ stats and use them as comps, you are using as comps players who also got lucky or unlucky (depending on whether the player you are analyzing is above or below average in a certain stat or stats).  Now, if you are regressing the stats of the players who are comps, then what is the point of using those players as comps?  You might as well just regress the player you are analyzing.  Now, if you want to use similar players with similar stats and then use those same players’ future (or past) stats as a proxy for your player’s true talent, that is fine.  That is actually what I assumed that Pecota does, so I am not sure why he talks about “regressing” the stats of the similar players.  Either you use “out of sample” stats for those comps and don’t regress, or if you are using “in sample” stats for similar players, it makes no sense to even use those players since you are just using them as a proxy for the player you are analyzing - you might as well just regress his stats.


#9    Vegas Watch      (see all posts) 2008/06/11 (Wed) @ 19:46

"By the way, I ran a quick Marcel.  I’ve got him as a true .330 hitter.  I don’t see how .348 is possible, unless Nate is treating his 219 at bats as if it was 500-600 at bats.”

Before the season, PECOTA had him at .316 and MARCEL had him at .307.  I did this same exercise using CHONE, PECOTA, ZiPS, and MARCEL in the post I link to in my name.  If you weight in this year’s PAs using .8, .8^2, etc. to his PECOTA, you get .337.  .348 really does seem outrageously high, and that has a huge effect on the result.  When I did it I got 3.6%.

I have a separate question though, although I’m not quite sure how to explain my thought process here.  It arises because we’re dealing with such an extreme mark (.400).  Take two scenarios, one in which he hits .200 over the next two months, another in which he continues to hit .420.  These simulations are not factoring in any changes to his true BA over the rest of the season.  The addition of this would increase his chances of hitting .400, since the .420 example (where he still has a shot at hitting .400) has a larger effect than the .200 example (where he does not).


#10    tangotiger      (see all posts) 2008/06/11 (Wed) @ 20:11

MGL/6, last paragraph: I’d bet anything that my last paragraph in post 7 explains that.

To go from a .310 estimate to a .348 estimate (38 point jump) because he is currently a .420 hitter (110 point jump) implies regressing our sample 65% toward the forecasted mean.

There’s no way that 219 at bats will let you regress only 65% toward the preseason forecast, if that preseason forecast was based on the amount of PA that goes into Chipper’s recent career.  It’s got to be at least 80%.  It’d be 80% for OBP.  For BA, it would be even more regression.
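The percentages here follow from simple ratios. The sketch below reproduces the 65% figure and, as my own illustration rather than anything Nate or Tango published, shows what the updated estimate would be at 80% regression, plus the ballast x that each regression level implies in an x/(x+AB) formula.

```python
forecast, observed, updated = 0.310, 0.420, 0.348
ab = 219

# Fraction of the 110-point gap that was kept (38 points), and the rest regressed away.
kept = (updated - forecast) / (observed - forecast)
regression = 1 - kept

# If regression = x / (x + AB), then x = AB * regression / (1 - regression).
implied_x = ab * regression / (1 - regression)
x_for_80 = ab * 0.80 / (1 - 0.80)

# Updated estimate if we regress the .420 sample 80% toward the forecast instead.
estimate_at_80 = forecast + (1 - 0.80) * (observed - forecast)

print(f"regression implied by .348: {regression:.0%} (ballast of about {implied_x:.0f} AB)")
print(f"80% regression needs a ballast of {x_for_80:.0f} AB and gives {estimate_at_80:.3f}")
```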

Like I said, I’d bet that Nate treated 2008 as if it was 500-600 at bats.


#11    MGL      (see all posts) 2008/06/11 (Wed) @ 20:22

The sim is assuming two things exactly and then uses Bayesian probability to come up with a distribution of true BA for the season.

One, it has a distribution of true BA before the season starts.  For example, it says that there was a 10% chance he is around a .290 hitter, 15% around a .300 hitter, etc.

Two, it has him hitting .420 for the 219 AB (or whatever it is) so far this season.

So we have a player who had x1 probability of being a y1 hitter before the season started and yet hit .420 in 219 AB, a player who had an x2 probability of being a y2 hitter before the season started and hit .420 in 219 AB, and so on.  From all of these, we can compute the probability that now he is a y1 hitter, y2 hitter, y3 hitter, etc.

That is a simple Bayesian problem.

From the distribution that results he simply has the simulator pick a BA randomly using the chance that Chipper is a true whatever that batting average is, for each AB.

That is the correct way to do it.  Actually he should run each season separately using one BA for that whole season and weight the results by the chances that Chipper presently has that particular true BA.

IOW, if there is a 10% chance now that he is a .330 hitter, he should run a 1000 season sim using .330 as Chipper’s true BA.  If there is a 15% chance that he is a true .340 hitter, he should run another 1000 season sim with Chipper as a .340 hitter.  Then he should average all the results (he can’t choose every possible BA, so he will have to choose maybe 10 or 20 possible ones), weighting each result by the chance that Chipper has that BA as his true BA.
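The weighting scheme described here can be sketched without any simulation at all: for each candidate true BA, compute the exact chance of getting the hits he still needs, then average those chances using the probability that he has that true BA. Everything specific below is an assumption on my part: the normal prior (mean .315, SD .021, as eyeballed in #5), a 550 AB season, static talent, and no allowance for injuries, walks, or opposing pitchers.

```python
from math import comb, exp, log

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def posterior_grid(prior_mean, prior_sd, hits, ab, lo=0.200, hi=0.500, step=0.001):
    """Posterior weights over candidate true BAs: normal prior x binomial likelihood."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    log_post = [
        -0.5 * ((p - prior_mean) / prior_sd) ** 2
        + hits * log(p) + (ab - hits) * log(1 - p)
        for p in grid
    ]
    peak = max(log_post)
    weights = [exp(v - peak) for v in log_post]
    total = sum(weights)
    return [(p, w / total) for p, w in zip(grid, weights)]

season_ab, target_hits = 550, 220          # .400 over an assumed 550 AB season
ab_so_far, hits_so_far = 219, 92
ab_left = season_ab - ab_so_far            # 331
hits_needed = target_hits - hits_so_far    # 128

posterior = posterior_grid(0.315, 0.021, hits_so_far, ab_so_far)
post_mean = sum(p * w for p, w in posterior)

# One "season batch" per candidate BA, weighted by the chance he has that true BA.
chance_400 = sum(w * binom_tail(ab_left, hits_needed, p) for p, w in posterior)

# For comparison: pretend he is exactly the posterior-mean hitter.
chance_at_mean = binom_tail(ab_left, hits_needed, post_mean)

print(f"weighted over the distribution: {chance_400:.1%}")
print(f"fixed at the mean ({post_mean:.3f}):   {chance_at_mean:.1%}")
```

Weighting over the whole distribution gives a higher chance than plugging in the mean alone, since the candidates above the mean contribute most of the .400 seasons.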

The way he does it, by doing just one 1000 season sim and randomly choosing a BA for each AB (I think that is the way he does it) should yield the same results, only it will be for a much smaller sample size.

Anyway, he is NOT, as I already said, including in the model or any of the calculations the chance that Chipper’s true BA actually changes over time - only his (the model’s actually) estimate of that true BA.  If he included the chance that his true BA actually changes, he would have to specify in the model the chance that his true BA changes and by how much and how often.  That would be difficult.  I’m not sure how you would do that.  If it is true that player BA changes, then it is more likely that his true BA is higher now than either his model or a regular Marcel thinks it is.


#12    tangotiger      (see all posts) 2008/06/11 (Wed) @ 20:28

I don’t think anyone is disputing you mgl… sounds like you are arguing with yourself!

The two main points still stand: a .348 forecast seems much too high.  VegasWatch probably has the best estimate for PECOTA, not Nate.  When VegasWatch did the Marcel forecast as of today, he ended up with .330, which was the same one I got real quick.  So, I’m presuming he did PECOTA right when he says it should be .337.  And that is a far cry from .348.

The second point is that it doesn’t look like Nate used the population distribution of MLB, since we’d expect Chipper’s distribution to have a left-skew using Bayes.

The second point I’m not too concerned with.  The first however, is a huge problem.


#13    Vegas Watch      (see all posts) 2008/06/11 (Wed) @ 20:49

2005 PA- 432 * (0.8^3) = 221
2006 PA- 477 * (0.8^2) = 305
2007 PA- 600 * (0.8^1) = 480

221+305+480=1006

This year he has a .420 BA in 260 PA.

So, .316*(1006/1266) + .420*(260/1266) = .337

That’s all I did.  He would need to hit .420 for 188 more ABs (448 total) to get to .348.
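The same calculation as a short script, for anyone who wants to plug in other numbers. It just restates the weights and figures given above; the variable names are mine.

```python
# (plate appearances, seasons ago) for 2005, 2006, 2007
seasons = [(432, 3), (477, 2), (600, 1)]

# Each past season is discounted by 0.8 per year of age.
prior_weight = sum(pa * 0.8**years_ago for pa, years_ago in seasons)

pecota_ba = 0.316
current_ba, current_pa = 0.420, 260

# Weighted blend of the pre-season forecast and this season so far.
blended = (pecota_ba * prior_weight + current_ba * current_pa) / (prior_weight + current_pa)

# Total PA at .420 needed for the blend to reach .348.
pa_needed = prior_weight * (0.348 - pecota_ba) / (current_ba - 0.348)

print(f"effective prior PA: {prior_weight:.0f}")
print(f"blended estimate:   {blended:.3f}")
print(f"PA at .420 needed to reach .348: {pa_needed:.0f}")
```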


#14          (see all posts) 2008/06/11 (Wed) @ 20:59

.348 was what Chipper was hitting from the start of 2006 thru yesterday. 2004 & 2005 were injury years, 2004 had a career low BA although still with power, so it might be reasonable to say that .348 is Chipper’s “true” level at this moment, over the last 2.4 seasons.

He’s got a .423 BABIP, which would be unprecedented to keep up. His past two seasons were .339 and .348, which are the 2nd & 3rd highest of his career. .330 in 2001 and .349 in 2002 were his only other seasons above .320, although he’s only been below .314 four times. Since the beginning of 2006, it’s .360. That level may be sustainable - Jeter is career .361, Ichiro .357, Miguel Cabrera .354.

This year Chipper also has a career low SO% of .092, tying 2000. Nine times he’s been between .118 and .131. That’s about 8 more balls in play so far this season.

Adding 400 more ABs at .348 to what he’s done already in the 1st 200 AB gives him a projected .374 at the end of the season.


#15    Guy      (see all posts) 2008/06/11 (Wed) @ 21:33

MGL:
There are two related problems.  One is the normal distribution Nate uses for his talent estimate.  It assumes that it’s just as likely Chipper has been an unlucky .330 career hitter as a lucky .290 hitter, which can’t be true.  That leads to another normal curve for the new talent estimate, which says a new true talent of .370 is just as likely as .330—you can’t possibly agree with that. 

Or look at it this way:  Based on 6898 AB, Pecota said Chipper was a .316 hitter.  He’s hit 100 points higher for 219 AB, and that raised the talent estimate by 32 points.  That’s what you’d get if you just weighted his 2008 partial season at 1x, and the entire rest of his career at 2x! 

I think Nate’s preseason estimate significantly overestimated the likelihood Chipper was a true .330+ hitter, and the new estimate hugely overestimates the probability he is a true .360+ hitter.....


#16    tangotiger      (see all posts) 2008/06/11 (Wed) @ 22:16

Marcel only looked at starting with 2005:

2005: .296/.412/.556 (109 games)
2006: .324/.409/.596 (110 games)
2007: .337/.425/.604 (134 games)

2005 does not stand out as a season that needs to be discarded due to injury.

Why start with 2006?  That’s cherry-picking.  He missed 52 games, which I will presume is because he was injured in 2006.  So, I see no reason to start Chipper’s clock at 2006 as opposed to 2005.

Even if you somehow justify taking his 1100 AB since 2006, those still need to be regressed.  I’ll guess the AB regression required x/(x+AB), where x=300.  So, you need a 20% regression.  And if he was a .348 since 2006, that puts him as a true .332 hitter.
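Here is that regression written out. The post does not say which mean the .348 is regressed toward, so the .270 league average below is strictly my assumption, picked as a plausible round number; a different mean shifts the result by a few points.

```python
def regress(observed_ba, ab, ballast, mean_ba):
    """Regress an observed BA toward mean_ba using regression = x / (x + AB)."""
    regression = ballast / (ballast + ab)
    estimate = observed_ba - regression * (observed_ba - mean_ba)
    return estimate, regression

league_mean = 0.270   # ASSUMPTION: not stated in the post

estimate, regression = regress(0.348, ab=1100, ballast=300, mean_ba=league_mean)
print(f"regression: {regression:.0%}, estimated true BA: {estimate:.3f}")
```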

As far as I’m concerned, he’s around a true .330 hitter, give or take.  0.348 is not justifiable.


#17    MGL      (see all posts) 2008/06/12 (Thu) @ 00:27

No, I’m not arguing with anyone. Just thinking out loud.

And yes, Brian’s “analysis” is not right.

Discarding seasons because you think a player was injured is not right.  You might tweak them but you better be careful.  I don’t like doing that at all.  First of all, players play injured all the time and that should be included in a projection (the chance that they are injured and still play in the future).  Second of all, if you discard or discount any seasons where you think a player was playing hurt AND his stats were depressed, you are creating a biased sample of historical stats.  You have to also discount or discard seasons where a player is playing hurt and had GOOD stats (he got lucky!).  No one does that! 

Plus some people seem to think that if a player has a high sustained BA or whatever stats you are looking at, that somehow you don’t have to regress it.  Of course you do.  If Chipper or any other batter were hitting at a consistent .348 level for 3 years, you would still have to estimate his true BA at something like .320 or .330 (I don’t know how much to regress off the top of my head).

The question with the .348 that Nate comes up with is twofold:  One, is his distribution of likely BA before the season started, the one which centered on .316, correct?

I don’t know. I’m not sure if it can be perfectly normal, which it seems to be from Nate’s graph.  It is certainly symmetrical around the .316.  We know that the distribution of BA talent (not weighted by playing time) in baseball is heavily left-skewed, but this is not that kind of distribution.  Whether this distribution can be symmetrical, again, I don’t know.  I’d have to think about it.

The second part of the question is whether Nate did the Bayesian calculations correctly.  If the pre-season Chipper “likely true BA” is correct, then we can come up with the distribution of likely current true BA quite easily, given his current .419 BA in 222 AB.  I can do it one BA at a time, using a computer.  There is probably a cleaner way to do it using the whole distribution at once (assuming that it is in fact normal), but I don’t know how to do that.  In any case, if he did the Bayesian math correctly then his .348 estimate of the current true BA mean is correct.  If it is wrong, then his pre-season distribution must be wrong.

I doubt that he made a mistake in the Bayesian calculation.  So most of you who are disputing the .348 must be disputing the validity of the pre-season distribution.  What do you think it should look like and how would that affect the resultant distribution (of his likely current possible true BA’s) after doing the correct Bayesian calculations?


#18    Lion      (see all posts) 2008/06/12 (Thu) @ 03:28

Since Chipper’s average for any given plate appearance will be slightly higher or lower than his true average based on park, opposing pitcher, and game state, all estimates made using a binomial distribution will be a little optimistic.  Does anyone know if this would make a significant difference?  If it’s unclear what I mean: if a player’s true average is .990 at home and .010 on the road, but the coach starts him every game anyway, he will bat around .500 over the course of 200 at bats almost inevitably, while a player whose true average is .500 both home and away will have much more variance in his 200 at bat average.
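One way to get a feel for the size of this effect: when each at bat has its own hit probability, the variance of total hits is the sum of p(1-p) over the at bats, which is never more than the plain binomial variance at the average p. The sketch below checks the extreme example above and then a milder split; the .348/.288 split is a made-up illustration, not Chipper's actual home/road talent.

```python
from math import sqrt

def hits_sd(probabilities):
    """SD of total hits when each AB has its own independent hit probability."""
    return sqrt(sum(p * (1 - p) for p in probabilities))

# The extreme example: .990 at home, .010 on the road, 100 AB each.
extreme_split = [0.990] * 100 + [0.010] * 100
extreme_flat = [0.500] * 200

# A hypothetical milder case: 30 points above and below a .318 average.
mild_split = [0.348] * 100 + [0.288] * 100
mild_flat = [0.318] * 200

print(f"extreme: split SD {hits_sd(extreme_split):.2f} hits vs flat {hits_sd(extreme_flat):.2f}")
print(f"mild:    split SD {hits_sd(mild_split):.2f} hits vs flat {hits_sd(mild_flat):.2f}")
```

For splits of the size you would realistically expect, the two standard deviations differ by well under one percent, so under these assumptions the plain binomial is a close approximation.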


#19          (see all posts) 2008/06/12 (Thu) @ 04:25

MGL, I take your advice, although before that I was going to agree with Tango in #16.

Before I got into the conversation, I did want to take a look at Chipper’s year by year stats. The first thing that jumped out at me is where Nate got his .348. I agree that starting at 2006 is cherry-picking, but both you guys have written before about how as you get further back in time the stats can be discounted. I wrote that it “might be reasonable” to do it that way.

I know the math formulas tell you how much to regress, but has anyone here (I know, I haven’t finished The Book yet) done an empirical study? When I was working on park factors, it seemed I wasn’t regressing enough. I knew what the upper and lower bounds should be, but even after regression the outliers were dominated by the one and two year samples. I remember The Book says to regress not to a population mean, but to the player’s mean, which can be hard to establish. And if a player’s mean can change over time, how can we be sure that Chipper’s a .330 hitter (give or take) instead of mid .340’s? Intuitively, 2.4 seasons does seem fairly solid, although I do prefer 1500-1800 PAs (3 full seasons).


#20          (see all posts) 2008/06/12 (Thu) @ 05:43

This is a very similar argument to that on BP’s Playoff Odds Report. We have long and short term performance data, including what’s in the books so far for this season, and then we attempt to predict what will occur the remainder of the season, and then add it to the performance so far to get the end of season predicted total.

I don’t want to do “arbitrary” cherry picking, but in examining a player’s career, can a trained eye detect certain patterns or influences? Personally, I look at the BA/OB/SA, then also BABIP, ISO, BB%, SO%.

In Chipper’s case, 2004 & 2005 (and before) will be discounted already because of the amount of time that has passed since now to then. Chipper has been consistent over his career in most categories, but how important are the old seasons compared to the last two in predicting the rest of the season, or next season? Projections put Chipper’s established BA at .316, which is lower than each of his last two seasons. Now that 2008 so far is considerably above that, I think it’s good evidence that .316 is too low and maybe there is a new higher level. A .423 BABIP has been shown historically to not be sustainable, but given his recent performance, and what others have shown is possible, .350ish is likely.

One of the things I noticed in Chipper’s 2008 stats was that his SO% dropped while his BABIP went up (very much). I have seen this happen to other players (Steve Pearce 2007 & Nate McLouth 2004 in the minors quickly come to mind) but it doesn’t carry over. It might not last for a full season. Looking at a player’s career, I would see this as an outlier.

Tying the last two points together, Jay Bruce this year has also cut his SO% while boosting his BABIP. Looking at his last three seasons combined he projects a high SO% with a high BABIP. The only player in ML history to have a SO% as high or higher with a BABIP as high or higher is Ryan Howard. That combination of skills is possible, but very unlikely. Bruce’s BA line translates to around .295, but his comps in BA, BABIP, ISO, BB% & SO% all hit in the .265-.275 range, so I don’t think he can sustain the BA unless he cuts his SO%, or is a very special talent.

Should we not discount injuries? In Chipper’s case they were several years ago, just giving further reason in addition to elapsed time to discount those years. Jason Bay had a very consistent performance for three seasons. Before 2007, he had knee surgery, and performed in line with his previous record for two months, then tanked the last four months of the season. There’s evidence that he wasn’t running well and wasn’t driving the ball. He had many questions coming into 2008. Is the knee healed? Assuming it is, will he return to the pre-injury level, or should we expect something in between? I felt it was an either/or - if he was healthy, he should perform like he did previously (which he has done so far). If the injury lingers, then he should perform worse.


#21    Guy      (see all posts) 2008/06/12 (Thu) @ 05:47

MGL:  If you look at Nate’s graph for the pre-season estimate, it looks like the probability of Chipper being a true .340+ hitter was something like 20% (assuming each line = 5%, as on his last chart).  I don’t know the correct number, but that seems very high for a guy who has never hit .340 in a season over 13 seasons, and has a .310 average.  I assume Pecota throws out most of that prior data, and it may not improve the mean forecast to use it, but it should impact the size of the error bands.  And as Tango notes, the range should have a left skew. So I’d guess the pre-2008 likelihood Chipper was a true .340 is closer to 2% than 20%.

Unless.... we think what Chipper hit in 1998 is now irrelevant.  If players can make large changes in true talent late in their career, then arguably we don’t care what Chipper hit more than 3 or 4 years ago.  But such changes seem pretty rare to me, so I wouldn’t disregard 13 years of data.


#22    David Gassko      (see all posts) 2008/06/12 (Thu) @ 09:15

Guy, I don’t think that’s right. Over 480 AB, the random variance for a player with a true .316 batting average is around .021, meaning that his chances of hitting .340 this season (which is really what PECOTA is telling you) were around 15%. And that’s if we have his true talent exactly pegged. If we add some variance due to our imperfect knowledge of Chipper’s true talent (though with his sample size and consistency, I do think we have a pretty good idea of just what kind of hitter he is), 20% does not seem out of the realm of possibility.
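The arithmetic in this comment can be checked with a rough sketch. It assumes a normal approximation to the binomial, and the function names are invented here for illustration, not taken from PECOTA or any other system.

```python
import math

def binomial_sd(true_ba, at_bats):
    """Random (binomial) SD of an observed batting average for a known true rate."""
    return math.sqrt(true_ba * (1.0 - true_ba) / at_bats)

def prob_at_least(threshold, true_ba, at_bats):
    """Normal-approximation chance that the observed BA reaches the threshold."""
    z = (threshold - true_ba) / binomial_sd(true_ba, at_bats)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail area of the normal curve

sd_480 = binomial_sd(0.316, 480)            # the "around .021" figure
p_340 = prob_at_least(0.340, 0.316, 480)    # chance a fixed .316 hitter hits .340+
print(round(sd_480, 4), round(p_340, 3))
```

Under these assumptions the tail comes out a little below the quoted 15%, which is close enough for the point being made about a fixed true talent level.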

The problem as far as I can tell is that PECOTA is not giving us a range of his true talent, but simply of his outcomes. So even if we *knew* that Chipper was a true .316 hitter, Nate’s exercise would arrive at a different answer, probably not far from .348.


#23    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 09:56

Actually David, I think you are wrong and Guy is right.

That first chart is Nate estimating what Chipper’s true rate is, not his sample.  The chart is showing Chipper’s true rate with a mean of around .315 and 1 SD = .020 (eyeballing it).  So, Nate is saying that there’s a 16% chance that Chipper’s true rate, as of Apr 1, 2008, was at least .335.

That seems rather high to me.

It seems to me that what Nate did was not start with the correct population distribution of BA, which I will guess is around 1 SD = .025.  He probably used something much higher (i.e., based on sample BA, rather than regressing the sample BA).

I don’t know why Nate “spared” us from all the details.  This is a webpage, and so, we are not constrained to space.  Furthermore, simply a link to a page that showed his details would have spared us from all this second-guessing and reverse-engineering.

***

Here is Pinto’s chart:
http://www.baseballmusings.com/archives/027174.php


#24    David Gassko      (see all posts) 2008/06/12 (Thu) @ 10:26

Tango/Guy, you’re right. I should have re-read the article before posting. In that case, I simply can’t believe that PECOTA is giving us the right distribution of outcomes for Jones. No way that we can’t predict Chipper’s true batting average with a standard error of less than 20 points...I think.


#25    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 11:52

Bayes and Regression

I selected all players with at least 200 at bats in any season since 1993.  That gives me 4407 hitters.

For each hitter, I figured out the standard deviation that the binomial would expect given his number of AB.  Then, I figured out how many standard deviations his actual batting average was from the mean (of .267): his z-score.  I then took the standard deviation of the z-score.  It was 1.39.

If batting average was purely random, we’d expect 1.00.  As it stands, it was definitely not random.  While parks and uneven competition can contribute to a larger than 1.00 z-score, most of it is tied in to the true talent distribution of players.

r = 1-(1/1.39)^2 = .482

So, if you get a z-score of 1.39 from a group of players, that would imply that a year-to-year correlation of these players would give you r=.48.
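A minimal sketch of the z-score step, assuming the binomial SD is computed at the league mean and that the spread of z-scores is taken around their own average (the comment does not say which). The five-player sample is made up purely to show the call; only the 1.39 figure comes from the thread.

```python
import math

def z_score(hits, at_bats, league_ba):
    """How many binomial SDs a player's batting average sits from the league mean."""
    sd = math.sqrt(league_ba * (1.0 - league_ba) / at_bats)
    return (hits / at_bats - league_ba) / sd

def z_score_sd(players, league_ba):
    """Standard deviation of the z-scores for a list of (at_bats, hits) pairs."""
    zs = [z_score(h, ab, league_ba) for ab, h in players]
    mean_z = sum(zs) / len(zs)
    return math.sqrt(sum((z - mean_z) ** 2 for z in zs) / len(zs))

def reliability_from_z_sd(z_sd):
    """r = 1 - (1 / z_sd)^2; purely random data would give z_sd = 1 and r = 0."""
    return 1.0 - (1.0 / z_sd) ** 2

r_thread = reliability_from_z_sd(1.39)  # the figure reported for the 4407 hitters

# Hypothetical (at_bats, hits) pairs, for illustration only.
sample = [(500, 150), (420, 105), (300, 84), (610, 190), (250, 55)]
sample_z_sd = z_score_sd(sample, 0.267)
print(round(r_thread, 3), round(sample_z_sd, 2))
```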

The average number of AB of this group of players was 419 (harmonic mean of 376).

So:
r = 419 / (419+x) = .48, making x = 449.

If we use the harmonic mean (376), then x = 403

Our correlation equation becomes one of:
r = AB / (AB+449)
r = AB / (AB+403)

Similarly, we could have taken the unweighted standard deviation of our 4407 batting averages to get: one SD = .0303

The average of the binomial SD for batting average is one SD = .0225

Remembering that:
variance observed = variance true + variance binomial
.0303^2 = variance true + .0225^2

variance true implies one SD = .0203

Our correlation for this group of players is therefore:
r=(.0203/.0303)^2= .45

Fairly close to what we got.  Note that in our case here, we did not do any weighting, and therefore, we expected a somewhat lower number.
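The variance split above is short enough to write out. This is only a sketch of the arithmetic in the comment, with function names chosen here for illustration.

```python
import math

def true_talent_sd(observed_sd, binomial_sd):
    """Solve variance observed = variance true + variance binomial for the true SD."""
    return math.sqrt(observed_sd ** 2 - binomial_sd ** 2)

def reliability(observed_sd, binomial_sd):
    """Share of the observed variance that is true talent: r = (true SD / observed SD)^2."""
    return (true_talent_sd(observed_sd, binomial_sd) / observed_sd) ** 2

true_sd = true_talent_sd(0.0303, 0.0225)
r = reliability(0.0303, 0.0225)
print(round(true_sd, 4), round(r, 2))
```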

Anyway, going back to treating this as our equation going forward:
r = AB / (AB+400)

Then we know that at AB=400, correlation = .50, which implies that the standard deviation from the binomial equals the true standard deviation of our population of batting averages.

Since the binomial at AB=400 is one SD = .022, then we know that the true spread of our population of MLB players has a true talent of 1 SD = .022.

If the mean is .267, then this implies that we have a true distribution of players of .267 +/- one SD = .022.  95% of players will be roughly .267 +/- .044.
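As a quick check of this step, here is a sketch that takes the 400 AB constant and the .267 mean as given. The constant names are just labels for the numbers in the comment.

```python
import math

REGRESSION_AB = 400   # the constant in r = AB / (AB + 400)
LEAGUE_BA = 0.267     # population mean used in the comment

def reliability(at_bats, k=REGRESSION_AB):
    """Expected correlation for samples of this many at bats."""
    return at_bats / (at_bats + k)

def binomial_sd(ba, at_bats):
    """Random SD of batting average over this many at bats."""
    return math.sqrt(ba * (1.0 - ba) / at_bats)

# At r = .50 the binomial SD equals the true-talent SD of the population.
talent_sd = binomial_sd(LEAGUE_BA, REGRESSION_AB)
low, high = LEAGUE_BA - 2 * talent_sd, LEAGUE_BA + 2 * talent_sd
print(reliability(400), round(talent_sd, 3), round(low, 3), round(high, 3))
```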

Now, the Bayes question to ask is: given that we have this known distribution of .267 +/- oneSD=.022, then what is the likelihood that someone who hits .420 in 220 AB was really the guy at .220?  At .225?  At .26493022?  At .3221?  At .38764?

You figure out the probability of each of those answers, multiply by the frequency of each of those talent levels existing and multiply by the batting average in question, and you get Chipper’s Bayes-generated true talent batting average.

Now, we also have further information.  The guy who hit .420 in 220 AB is not some random player, but he also happened to come in with a career .310 (or whatever) BA, or if we treat this as some Marcel weighted number, say we also know that he got 320 hits in his previous 1000 (weighted) at bats.  You add that (say 410 hits in 1220 at bats), and you ask the question again.

I’m going to lunch, and someone else can pick it up from here.  But, I can pretty much guarantee that the answer will not be .348.

For it to be .348, I’m guessing the true spread in BA would have to be .030, or even higher, or that his past performance is weighted less than I am doing here.  And I think this is where we have the disconnect.


#26          (see all posts) 2008/06/12 (Thu) @ 12:38

I actually do understand all the math, but I think the problem is we are aiming at a moving target.

Players do change. I’m browsing the Lahman batting database, with my query adding lots of calculated columns (Age, BABIP, ISO, BB%, SO%, etc). With the luxury of hindsight, I can see where players talent changed from one level to another, for whatever reason (got fat, back gave out, snorted coke, shot up, finally got healthy).

The problem we have is identifying when the change occurs. If a player has been consistent (within a SD) for five years, then does much better or much worse, is this a one year blip, or are we at a new level? Is the change fluid, evolutionary, or does it come in spurts, up or down (from one plateau to another)?

If Chipper is projected to be a .315 hitter, but for 2.4 seasons he’s been hitting .348, and for 200+ ABs he’s over .400, then maybe he’s not really a .315 hitter and his “true” BA skill today is not the same as what it was one or two or three years ago.

Have we asked how high must his true talent level be to produce the results we are seeing recently?


#27    Pizza Cutter      (see all posts) 2008/06/12 (Thu) @ 12:58

FWIW, I’ve got the split-half reliability for 250 PA’s at .328 and for 300 at .351.  (Chipper’s had 264 as of right now… call it .330?)

The NL is hitting .259 as a league.

.33 * .419 + .67 * .259 = .312, which is around CJ’s career BA.  However, he’s also had seasons in the .330 range the last two years, suggesting that he’s probably closer to a .330 hitter.  Also, Chipper’s batted ball profile (something that stabilizes much quicker than BA) is different this year, with more line drives.  So, it’s likely that he’s a different hitter this year.
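The weighting in that first line is ordinary regression toward the mean. A tiny sketch, with a function name made up here:

```python
def regress(observed, league, reliability):
    """Weight the observed rate by r and the league rate by (1 - r)."""
    return reliability * observed + (1.0 - reliability) * league

estimate = regress(observed=0.419, league=0.259, reliability=0.33)
print(round(estimate, 3))
```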

I looked at this subject in the StatSpeak roundtable last week.  Even given the most optimistic projection (he’s a true .350 hitter), I have his odds at 1 in 26 of finishing the season above .400.


#28    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 13:07

Brian/26:

We have no idea what his true BA is.  All we can do is estimate what it might be.  And, entering 2008, we thought it was .310.  We now think it’s .330.

IMPORTANT: we don’t even know if his true talent itself has changed.  It could simply have always been .330 for the last 2.5 years, but we didn’t have enough information to proclaim that.  OR, he could actually have improved his talent this year, so much, that while we were rock-solid sure he was .310 entering 2008, we are now rock-solid sure he’s .330 today.

But, all of that is irrelevant.  All we have to go on, god notwithstanding, is our best estimate of what his true talent is today, and our uncertainty level around that estimate.

And, I would like to see the evidence that he is now a true .348 hitter.  I don’t want to be “spared” that detail.  I don’t want to give Nate, who does otherwise great work, any benefit of the doubt on this one.  It’s not up to me to prove that he is not a .348 hitter, but up to Nate to prove that he is. 

And, of all the things he wrote in the article, the two things that would make him go from a 12% chance of hitting .400 to a 1% chance of hitting .400 had nothing to do with all the extra niceties that he did, but rather:
a) his estimate of .348 being his true talent,
b) the uncertainty level of this .348 being so incredibly high, and so symmetrical around that .348, that he could possibly hit .438 one time in 1000 and hit .420 or better what looks to me like 1% of the time.  And this had nothing to do with running the sim “only” 1000 times.  Ten times out of 1000, he’s hitting at least .420.

I will remain in a state of disbelief, as I await evidence.


#29    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 13:18

FWIW, I’ve got the split-half reliability for 250 PA’s at .328 and for 300 at .351.  (Chipper’s had 264 as of right now… call it .330?)

I *plead* with you Pizza to, in addition, report the average number of PA.  As it is, since 250 PA in half a season is so high, I will assume that the average is around 300 PA, and the one for 300 PA is an average of 325.

Proceeding on that basis:
r = 300/(300+x)=.328
implies x = 615

r = 325/(325+x)=.351
implies x = 601

So, that’s fairly consistent, that you would get:
r = .50, when number of PA = 600 (at bats = 540 or so).
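Backing the constant out of a reported reliability is a one-liner. This sketch assumes the form r = n / (n + x), and the 300 and 325 PA averages are the guesses made in this comment, not reported figures.

```python
def regression_constant(sample_size, r):
    """Solve r = n / (n + x) for x, the sample size at which r would be .50."""
    return sample_size * (1.0 - r) / r

x_300 = regression_constant(300, 0.328)
x_325 = regression_constant(325, 0.351)
print(round(x_300), round(x_325))
```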

Note, I get similar numbers when I use a threshold of 400 AB (z-Score = 1.41, implying r=.50 when average AB = 520).  That is, I get virtually identical results as Pizza.

However, I think here we see a huge selective sampling issue.  By discarding so many of the players from our sample, it makes it look like there’s little differentiation among players.  And, it’s true!  At the very highest level of playing time, there’s little variance among guys in batting average.

But, do we want to say that Chipper is necessarily drawn from this group of hitters?  We could.  But, then we do not allow him to be drawn from the possible .330 talent level.

On the other hand, by lowering our threshold to include as many “reasonable” MLB players that we can, our spread in talent level increases (while the mean decreases).

(Pizza: in this case, you would have to report what the mean BA is of your population.  I am guessing it’ll be around .280-.285.)

So, either Chipper is drawn from a population with a mean of .267 with 1 SD = .022, or he’s drawn from a population of .282 with 1 SD = ... I dunno… say .014.


#30    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 13:24

I sent Nate an email asking him to engage us here.


#31          (see all posts) 2008/06/12 (Thu) @ 13:53

Did no one read my post (#5)? I think what Nate did was pretty clear.


#32    Nate Silver      (see all posts) 2008/06/12 (Thu) @ 14:09

Don’t have much time, but just to clarify a couple of things:

The .348 average does not really come from PECOTA.  The only thing that PECOTA was used for was to give a rough outline of the true talent distribution around Chipper’s batting average forecast.  But the performance of the individual comparables was regressed to the mean in order to estimate their true level of talent, and the distribution was normalized, which is not something that we do for normal PECOTAs.

There are a lot of different ways we could have estimated the true talent distribution without any reference to the PECOTAs.  Maybe some of you can experiment with this.  My guess is that the finding will actually be fairly robust, as no matter what the shape of the bell curve, the Bayesian mathematics are really going to accentuate the tail of the distribution.  On the other hand, if the distribution is asymmetrical, that might change the findings significantly (I don’t really know).


#33    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 14:14

Andeux/31:

I think what he tried to do was clear.  I think it’s also clear in what I am doing that I’m not going to get a 13% chance to break .400.

Regardless, he further compounds the talent level issue by giving such a wide uncertainty level around that .348.  He gives him a decent chance that he’s at least a .370 hitter.  On that basis, of course he’ll get a 13% chance of breaking .400.

I don’t think that Nate started with the league population mean of .267 +/- .022 (actually, probably more like .262 this year).  It seems pretty clear to me that he didn’t.  There’s simply no way he could get that .348 mean with the distribution that he’s showing (his second to last chart) if that were true.  He’s claiming that Jones is nearly 4 standard deviations from the mean, with a decent chance that he is 5 standard deviations from the population mean.  That’s a little hard to accept, isn’t it?


#34    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 14:24

Nate, thanks for stopping by.  Perhaps for his sake, if Nate has a chance to come back, you guys can precede your post by “NATE:”, if it’s a specific question that he can answer.

NATE: your uncertainty level around the .348 true mean seems to be around 1 SD = .020, and fairly symmetrical.  Can you confirm what the SD actually is?


#35    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 15:21

Using Bayes, with the following assumptions:
.267 = pop mean
.022 = standard dev of pop
219 = trials
92 = successes

I get a mean forecast of .321 for the batting average.

That is, if we know nothing at all about a hitter, other than he is 92 for 219, Bayes says he’s .321, with an uncertainty level of 1 SD = .017.

What if we add in 900 at bats of .310 hitting?  Bayes says .320.  Furthermore, the uncertainty level around this .320 is 1 SD = .012.

***

How do we get him to be a true .330 hitter?  The population spread has to be 1 SD = .025.  If that is the case, then Bayes says our uncertainty is 1 SD = .019.

If this hitter also happens to add in 900 at bats at .310, then Bayes brings him all the way down to .324 as a true hitter, with an uncertainty of that estimate of 1 SD = .012.

***

How do we make him a true .348 hitter?  Our population spread must be 1 SD = .031.  That is impossible, since we know that the *observed* population spread is 1 SD = .030.  Our uncertainty level in this case is 1 SD = .024.

***

I will have to retract one thing, and that is that the spread of our true talent uncertainty does follow very closely a symmetrical distribution.  So, I’ll have to take back that objection.

The rest of what I’ve said stands.


#36    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 15:26

By the way, if we add in a .267 league batting average of 400 at bats to 92 for 219, we get 199 hits in 619 at bats, for a batting average of .321.

Therefore, our regression toward the mean equation works!  You add in 400 at bats of average batting average (.267) to your players total sample, and you get .321.  The more precise Bayes says that a 92-219 player probably is a true .321 hitter.

Sweet, right?


#37    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 15:37

More by the way: the reason that Marcel said that he was a true .330 hitter, as opposed to the true .320 hitter, is that I’ve been using a standard regression model of adding league average performance of 240 PA.  While this works out good overall, it doesn’t work for individual components.  For example, for Ks and walks, you should use a much smaller number.  For OBP a bit smaller.  For batting average, as we’ve found, you need to add 400 AB.

So, if you’ve got Chipper with 92 for 219 today, and you credit his 2006/07 performance of 306 for 924 (.331) at say 70% for 214 for 647, and you add in .267 on 400 AB, you get:
AB, H
219, 92
647, 214
400, 107
---- ---
1266, 413 = .326

So, that I think becomes our best guess, that Chipper, today, is a true .326 hitter.
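The table above is the regression shortcut: pool the player's (weighted) samples with 400 AB of league-average hitting. A minimal sketch, with a made-up function name and the .267 and 400 AB figures taken from the thread:

```python
def regressed_ba(samples, league_ba=0.267, ballast_ab=400):
    """Pool (at_bats, hits) samples with ballast_ab at bats of league-average hitting."""
    total_ab = sum(ab for ab, _ in samples) + ballast_ab
    total_hits = sum(h for _, h in samples) + league_ba * ballast_ab
    return total_hits / total_ab

current_only = regressed_ba([(219, 92)])               # 2008 to date only
with_history = regressed_ba([(219, 92), (647, 214)])   # plus 2006/07 at 70% weight
print(round(current_only, 3), round(with_history, 3))
```

The 70% weighting of 2006/07 is applied before the call, so the function itself stays a plain pooled average.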

We can do further breakdowns by looking at strikeouts, linedrive rates, etc.  We’d need to do the same exercise, in terms of knowing the true spread among the population in each subcomponent.

Regardless, I doubt you’ll get him up to anything other than a bit over .330, if even.


#38    Pizza Cutter      (see all posts) 2008/06/12 (Thu) @ 15:53

Tango, when I quote those split-half reliability coefficients, they are generated by artificially restricting the sample for each player to X number of plate appearances.  So, when I say 250 plate appearances, I’m taking the first 500 PA’s for each player, assuming he has that many over the two year stretch that I’m looking at, splitting them in half by evens and odds, and running a correlation between the two numbers.  So, when you see 250, I’m comparing a sample of exactly 250 PA to exactly 250 PA.

The selectivity of that sample is driven down by the fact that in order to make it into my sample for 250, a player only had to amass 500 or 600 PA over a two year period.  Still a selective sample, but at least a little better methodologically.  I’m using data from 2001-2006 as my source.

I’m actually working to generate coefficients down to individual levels of PA (so what’s the split half at 228 PA or 357), but those charts take forever for my computer to generate.


#39    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 16:08

513, 555 therefore are my new numbers for Pizza Cutter for PA, respectively.


#40    MGL      (see all posts) 2008/06/12 (Thu) @ 17:21

But the performance of the individual comparables was regressed to the mean in order to estimate their true level of talent, and the distribution was normalized, which is not something that we do for normal PECOTAs.

There are a lot of different ways we could have estimated the true talent distribution without any reference to the PECOTAs.

I don’t know exactly what he means in the first paragraph above, but all you really have to do to generate these true talent distribution curves is to take all players in history who have a similar profile at a similar age, and other similar characteristics (for example, in this case, switch hitters), and then see how each one does over some subsequent time period.  I have always assumed that is what Pecota does anyway, more or less.

If you take the average of all those “subsequent” performances, you will get your player’s mean true talent of course.  If you take those individual performances of all the similar players and regress those toward the mean, you will get your distribution.

For example, let’s say that there were 130 players similar to Chipper as of the end of last year, and all those players had at least one more year of playing time left.  If you take all those players and look at their year after you identify them as similar players, the mean of those players’ performance will be exactly equal to Chipper’s expected performance in 2008, essentially his pre-season projection.

If you look at each of those 130 players individually and regress their one-year stats to get their “true stats” for that one year, then you can come up with a nice distribution of Chipper’s likely true talent.  For example, if 20 of those 130 similar players, or 15%, hit .320 in that next year, and you estimate them to be true .312 hitters in that year, then Chipper has a 15% chance of being a true .312 hitter.  Etc.

The problem with that methodology, and it is a big problem, I think, is multi-fold.  One, you are unlikely to have much of a sample size of similar players.  And the more similar you want to make them, the smaller the sample.  And with a small sample, you will have lots of noise in creating that distribution and even the mean.  And if you want to have a large sample, you will probably end up including players who are not all that similar to your player in question, which makes your conclusions (the mean and distribution of your true talent estimate) pretty unreliable.  Two, all kinds of things change over the years, not the least of which is player fitness and aging.  Do you really want to use comps from the 1950’s for today’s players?  I think that Pecota uses a fairly recent time period for their players comps, but again, that creates a small sample size.  Three, for your distribution (the mean is no problem), how do you go about estimating each player’s true performance in that one year?  You would have to do the same thing for those players as you are trying to do for your player in question!  That would be a mess.  If you are just going to do a Marcel on them then you might as well just do a Marcel on your player in question!


#41          (see all posts) 2008/06/12 (Thu) @ 18:05

Tom #37 - Why regress with 400 PA of league avg BA when we have 13 prior years of Jones’ own performance? I do remember from The Book you saying something about being able to regress to the individual player’s prior established record (presumably of enough length to not require regression of its own).

Replacing .267 league BA with Chipper’s .303 from 1993-2005
AB, H
219, 92
647, 214
400, 121
---- ---
1266, 427 = .337

I am intuitively quite comfortable with that.


#42    tangotiger      (see all posts) 2008/06/12 (Thu) @ 19:08

Well, your intuition is wrong, unfortunately.  You take all players with 5000 or 10000 career PA, and you will find that players with 3000 PA will forecast just as well.  That is, all those extra PAs really provide almost no additional information.

Read here for one such study:
http://tangotiger.net/banner.html

Basically, you need regression exactly as I have laid out.

***

In any case, I have provided the Bayesian solution. (Regression is really just a shortcut for Bayes.) And in the Bayesian solution, a guy who is hitting .420 after 219 at bats will receive almost no additional information from us to determine how good a hitter he is.  That is, regardless of how bad a hitter he has been prior to Apr 1, 2008, the performance of .420 is so extraordinary that we don’t need any more information!  The Bayes result, based strictly on 219 AB, gave us a true talent estimate of .321.  And when I tried different amounts of AB of .310 hitting added to that, it always stayed at .321.

So, regardless of intuition or comfort, you can’t escape the fact that Chipper is a .321 hitter.

(Note that since the BA across the league is down around 5 points, all of our original estimates need to go down 5 points to make the fair comparison.  This is an additional reason to make the .348 estimate even more unlikely.)


#43    Guy      (see all posts) 2008/06/12 (Thu) @ 22:08

It seems to me there’s an implicit assumption in the Bayesian calculation, that Chipper’s 92/219 was an unbiased sample.  But I think it’s likely that it came against substantially below-avg pitching.  If so, that makes it less likely he has made a big jump in talent.

(BTW, he went 1-for-5 against Zambrano tonight, and dropped his average by 5 points.  This is really hard.  I’d gladly take the “no” bet with 8:1 odds....)


#44    Mike Fast      (see all posts) 2008/06/12 (Thu) @ 22:40

Guy, according to the Baseball Prospectus stats page, Chipper has faced very close to average pitchers this year.  That is assuming I’m interpreting their page correctly, which I’m not sure about.

http://www.baseballprospectus.com/statistics/sortable/index.php?cid=350487


#45    fifth of      (see all posts) 2008/06/13 (Fri) @ 01:34

Guy/45: I basically agree but it is a lot of work to do an accurate accounting of quality of opposition. BP has very simple reports, and they show Chipper’s opposition yielding .246/.336/.386, which would mean his pitchers faced have been a bit better than average. Then again, that’s an incomplete way to do it and it wouldn’t surprise me if you were more correct than that number indicated.

(Cliff Lee’s opponents: .250/.320/.381. Just throwing that out there.)


#46    Guy      (see all posts) 2008/06/13 (Fri) @ 07:12

5th:  Looks like that isn’t a factor for Chipper.  (Though I’m a bit skeptical of the BP numbers:  I checked a few teams, and several—including Braves—appear to have faced good pitchers.  I wonder if they’re weighting pitchers properly.)

Still, I think in general the possibility of skewed samples after 60 games (home/away, quality of pitchers, platoon edge) does somewhat increase the probability of extreme performances in absence of large change in true talent.


#47    tangotiger      (see all posts) 2008/06/13 (Fri) @ 09:05

The BA is the only “quality” we are looking at here.  And, I don’t know how the opp BA is calculated.  I presume it is weighted based on the number of AB (good) and I presume they are using to-date (2008) performance (bad).  You really want to know the pitchers’ true talent level in BA.

Guy’s basic point stands that we can have a bias here.


#48    Tangotiger      (see all posts) 2008/06/13 (Fri) @ 11:24

Mike/46 was marked for moderation and has been unqueued.

***

Mike, that does not look very average to me…



#50    Tangotiger      (see all posts) 2008/06/20 (Fri) @ 11:46

In post 35, I said this:

Using Bayes, with the following assumptions:
.267 = pop mean
.022 = standard dev of pop
219 = trials
92 = successes

I get a mean forecast of .321 for the batting average.

That is, if we know nothing at all about a hitter, other than he is 92 for 219, Bayes says he’s .321, with an uncertainty level of 1 SD = .017.

What if we add in 900 at bats of .310 hitting?  Bayes says .320.  Furthermore, the uncertainty level around this .320 is 1 SD = .012.

Repeating this exercise for Chipper now (98/249, .394), and Bayes tells me that he’s a true .317 hitter.

This is what should happen.  Chipper is still Chipper.  Our evaluation of him should not change drastically, if he goes on a hot or cold streak.  Bayes had him as a true .321 hitter when he was hitting .420, and Bayes has him as a true .317 hitter after going on a 6/30 (.200) run.

A regression toward the mean model would give you similar results.  I suspect that Nate’s process has him dropping substantially from the .348 true talent level to probably something like .335.

One good way to evaluate these models is to see how “tight” the true rate remains.  We don’t want to see his true rate bouncing up and down all over the place, only to settle at .320 at the end of the year, if some other model can keep it tight around .320 the whole time.

You can similarly evaluate post-season odds this way, as you don’t want the Rays to have a 90% chance of making the playoffs followed by 30%, then 80%, then 10%, then 100%.  Well, you could if the case warrants it.  But, if you have a competing system that has less variability and leads to the same conclusion, then the competing system is probably better.


#51    Rally      (see all posts) 2008/06/20 (Fri) @ 13:29

I ran an updated CHONE projection last week for Larry and got .318, pretty close.

Can you explain the calculation process for the Bayesian method used above?  If I change those 4 inputs around, how would I calculate the BA estimate?


#52    Rally      (see all posts) 2008/06/20 (Fri) @ 13:33

Never mind, I see post #25.  It might take awhile to understand it though.


#53    Tangotiger      (see all posts) 2008/06/20 (Fri) @ 14:39

In Excel, you’ll want:
NORMDIST

I find the frequency of some mean with some standard deviation happening between a .1795 and .1805 batting average (midpoint of .180), and go up in steps of .001 all the way up to .450.

If you multiply the frequency by the sample rate and add them all up, you get the mean you started with.

BINOMDIST

You use this function to figure out, for each true batting average from .180 to .450, the probability that it could have produced a 98/249 or better average.

Multiply this probability by the frequency calculated earlier to give you your distribution of true rates.

And from there, you do a weighted average, and you get the Bayes batting average.

Play around with it, and if you get stuck, let me know.
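For anyone without Excel, the same steps can be sketched with only the standard library. This is one reading of the recipe above, not the actual spreadsheet. The function names are made up. The prior weight for each bin is a difference of normal CDFs, as with NORMDIST. By default the likelihood is the chance of exactly that many hits, which is an assumption on my part; set or_better=True for the "or better" version described above.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative normal, the counterpart of NORMDIST(x, mean, sd, TRUE)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def binomial_pmf(k, n, p):
    """Chance of exactly k hits in n at bats for a true rate p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def bayes_true_ba(hits, at_bats, pop_mean=0.267, pop_sd=0.022,
                  lo=0.180, hi=0.450, step=0.001, or_better=False):
    """Grid posterior over true BA; returns (posterior mean, posterior SD)."""
    n_bins = int(round((hi - lo) / step)) + 1
    rates, weights = [], []
    for i in range(n_bins):
        p = lo + i * step
        # How common this talent level is: e.g. the .1795 to .1805 bin for .180.
        prior = (normal_cdf(p + step / 2, pop_mean, pop_sd)
                 - normal_cdf(p - step / 2, pop_mean, pop_sd))
        if or_better:
            like = sum(binomial_pmf(k, at_bats, p) for k in range(hits, at_bats + 1))
        else:
            like = binomial_pmf(hits, at_bats, p)
        rates.append(p)
        weights.append(prior * like)
    total = sum(weights)
    mean = sum(p * w for p, w in zip(rates, weights)) / total
    var = sum(w * (p - mean) ** 2 for p, w in zip(rates, weights)) / total
    return mean, math.sqrt(var)

mean_219, sd_219 = bayes_true_ba(92, 219)   # the post 35 inputs
mean_249, sd_249 = bayes_true_ba(98, 249)   # the post 50 inputs
print(round(mean_219, 3), round(sd_219, 3), round(mean_249, 3))
```

With the thread's inputs both estimates should land in the neighborhood of .32, though the exact third decimal depends on the likelihood choice and the grid.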


#54    MGL      (see all posts) 2008/06/21 (Sat) @ 12:56

Since you are assuming normal distributions for both probabilities, isn’t there a short cut to doing the Bayes calculations, rather than taking every possible BA (in increments, as you do it)?  Using calculus?
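One possible closed-form shortcut, sketched under an assumption not stated in the thread: treat the sampling error of the observed BA as normal, with the binomial variance evaluated at the population mean. With a normal prior, the posterior is then a precision-weighted average, and no grid is needed. The function name is made up.

```python
import math

def normal_shortcut(hits, at_bats, pop_mean=0.267, pop_sd=0.022):
    """Normal prior plus normal sampling error: returns (posterior mean, posterior SD)."""
    observed = hits / at_bats
    # Assumption: sampling variance taken at the population mean, not the observed BA.
    sampling_var = pop_mean * (1.0 - pop_mean) / at_bats
    prior_precision = 1.0 / pop_sd ** 2
    data_precision = 1.0 / sampling_var
    post_precision = prior_precision + data_precision
    post_mean = (pop_mean * prior_precision + observed * data_precision) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)

mean, sd = normal_shortcut(92, 219)

# The same formula rearranges into "add k at bats of league average", with
# k = mean * (1 - mean) / pop_sd^2.
implied_ballast = 0.267 * (1 - 0.267) / 0.022 ** 2
print(round(mean, 3), round(sd, 4), round(implied_ballast))
```

Under that assumption the shortcut is algebraically the same as adding roughly 400 AB of league-average hitting, which is why the regression equation and the grid calculation agree so closely.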


#55    Tangotiger      (see all posts) 2008/07/09 (Wed) @ 15:08

Chipper is now 108/285 (.379).  Bayes now has him as a true .315 hitter.

If you add 900 AB of .310 hitter, regression to the mean has him a .317 hitter.

Like I said, a good way to test a system is for the talent level of a majority of the players to remain fairly flat season-to-season.

The highest we got Bayes on Chipper was .321 and the lowest was .315, even as his BA went from .424 to .379. 

As I’ve shown at the start of all this, Nate’s estimate of .348 was unsupportable, and so was his forecast as to the chances of him breaking .400.


#56    MGL      (see all posts) 2008/07/09 (Wed) @ 17:59

For what it is worth, I have watched Chipper quite a bit in the last couple of weeks, and he “looks” like a .240 hitter, taking some awful swings and having some awful AB’s.

The reason I say that is because everyone was talking about how “locked in” Chipper was and how he clearly had changed his talent level, etc.

I can’t emphasize enough how much nonsense and crap I think that kind of “analysis” is. Sometimes a batter looks great and sometimes he looks horrible.  You cannot, and I mean CANNOT, tell whether a batter is a .280 or a .320 hitter from looking at him (I am not including a professional scout in that).  And even more importantly, in virtually any stretch of PA, any batter can look like a .400 hitter or a .200 hitter.

Of course, the proponents of batters “pressing” or being “locked in” will say that he WAS “locked in” before but that he is “pressing” NOW.  That is after the fact.  I challenge any of these guys to tell me when a batter is locked in or pressing and we’ll track his next X number of PA.  If that batter beats his projection when “locked in” by more than 10 points, or is more than 10 points less than his projection if he is “pressing,” then I will give that person X dollars.  If not, I win X dollars.  Again, open challenge.


#57    Tangotiger      (see all posts) 2009/04/04 (Sat) @ 16:42

Bumping for blackadder…


#58    Blackadder      (see all posts) 2009/04/04 (Sat) @ 16:53

Thanks, I’ll read through this later…


#59    Guy      (see all posts) 2009/04/04 (Sat) @ 23:17

FYI: Chipper actually hit .324 for July/Aug/Sept last year.  And .328 in June (when most of this thread was being written).  Doesn’t prove what his true talent was, of course…


#60    Tangotiger      (see all posts) 2009/04/04 (Sat) @ 23:24

Sweet.  Doesn’t prove anything, true.  I called something between .320 and .330 in the posts above, so seeing “.324” makes my heart warm.  He could have actually hit .280 or .380, and I’d have still been right to say what I did.


#61          (see all posts) 2009/04/05 (Sun) @ 11:29

Hmm, I posted a long comment, but it doesn’t seem to have showed up.  Is it awaiting moderation, because of the length?  I hope it didn’t get lost!


#62    Tangotiger      (see all posts) 2009/04/05 (Sun) @ 12:28

There’s nothing in the queue.

This software doesn’t like it when you embed HTML.  There are other instances when people have lost posts.

Best thing is to copy/paste to Notepad first before hitting submit.

I should update to the latest version though.


#63          (see all posts) 2009/04/05 (Sun) @ 12:50

I removed the HTML link, which was just to a wikipedia page anyway.  Here’s the original comment:

Ok, after thinking about this, I still don’t agree that Nate made the mistake of assuming Chipper had a full season’s worth of at bats in his analysis, which is the mistake Tango claimed Nate made.  I am not claiming that Nate’s analysis was correct--in fact, I agree that it was very likely wrong ex ante, and that Tango’s estimate of Chipper’s true talent was the more reasonable at the time.  However, the problem was not that Nate was assuming a full season of at bats.

What I think Nate did was the following: start with a pre-season projection of Chipper’s true batting average.  Looking at the graph in the article, he assumes it is normal with mean around .315, and standard deviation a little over .02.  Now, you can certainly argue that that is not a good prior to take; Tango would no doubt argue that our prior should be determined by applying Bayes to the population distribution.  But it is pretty clear from reading the article that this is, in fact, the prior Nate starts with.

Now apply Bayes’s rule to that distribution, i.e. mean .315 and SD .02.  If you apply what Tango does in #50 to this distribution, you will get a mean very close to what Nate says (I haven’t done this computation, because I did an exact computation below that gets the result a different way).  In other words, if you take the distribution Nate explicitly says he is taking, and apply Bayes’ rule to it, you get a distribution with the mean that Nate uses in the article.

Tango’s reason for thinking that Nate must have used a full season’s worth of at bats was that that was the only way he could get a .348 average with Bayes, applied to the population distribution of BA.  But Nate was not applying Bayes to the population distribution, he was applying it to the prior that he writes down in the article.

The more interesting question, to me at least, was the one MGL raised in #54, namely, whether there was a way of doing the calculation exactly, obtaining a closed form solution, using calculus.  It turns out that the answer is, sadly, not quite, at least the way the problem is set up.  It is not hard, using Bayes’ rule, to set up the integrals in question.  And you want to evaluate the integrals from 0 to 1.  The problem is that the integrands you get are large polynomials times a gaussian, which do not in general have closed form anti-derivatives.

If you really want a closed form formula, you could extend the integrals over the entire real line.  Outside of [0,1], the quantity you are integrating is totally meaningless--what is the likelihood of hitting .420 given a true batting average of -42?--but because the gaussian will be so small in that region, the answer will be very close to what you want.  Even then, it will be a truly horrific expression.  Basically, this is just not the right way to go.

So what should one do, if one wants an exact answer?  I spoke to a friend of mine, who is finishing a Ph.D in statistics, and he informed me that a much better way to set up the situation is to use a Beta distribution, instead of a normal distribution (wikipedia’s entry on the Beta distribution is very good; you can just search for it)

The Beta distribution, like the normal distribution, has two parameters, B(a,b).  Unlike the normal, it is zero outside the interval [0,1].  However, since our normal in this case is very small outside of [0,1], a Beta with the same mean and variance will be VERY close to our original distribution; this I did check in excel, just to make sure.

Why use a beta?  Because it makes the Bayesian updating completely trivial.  Namely, suppose my prior for a binomial distribution is a B(a,b).  I then observe n trials, with k successes and l failures, so k+l=n.  Then my posterior is also a beta, B(a+k, b+l) (in technical terms, the beta is the conjugate prior to the binomial distribution).  Since it is easy to compute the mean of a B(a,b) (it is just a/(a+b)), this tells me how to update my beliefs about the mean.

In this particular problem, the prior Nate uses for Chipper will be close to B(a,b), with a=169.608, and b=368.830.  Then my posterior after the 219 at bats is B(c,d), with c=261.608 and d=495.830.  And when I compute the mean of this distribution, c/(c+d), I get .345, basically Nate’s answer!  Pretty cool, huh?  In other words, if you want to compute Bayesian updates exactly, you should always use a beta instead of a normal.
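The update described above can be checked in a few lines. This is only a sketch of the conjugate Beta-binomial step, using the prior parameters and the 92-for-219 line quoted in the comment; the function names are made up for illustration.

```python
# Conjugate Beta-binomial update with the numbers quoted in the comment.

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def update_beta(a, b, hits, outs):
    """Posterior Beta parameters after observing hits and outs."""
    return a + hits, b + outs

prior_a, prior_b = 169.608, 368.830   # prior quoted in the comment
hits, at_bats = 92, 219
outs = at_bats - hits                 # 127 at bats without a hit

post_a, post_b = update_beta(prior_a, prior_b, hits, outs)

print(f"prior mean:     {beta_mean(prior_a, prior_b):.3f}")
print(f"posterior:      Beta({post_a:.3f}, {post_b:.3f})")
print(f"posterior mean: {beta_mean(post_a, post_b):.3f}")
```

The posterior parameters come out as 261.608 and 495.830, and the posterior mean rounds to .345, matching the figures in the comment.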

I know this response is pretty long, but I feel like I learned something because of this.  Thanks guys!


#64    Tangotiger      (see all posts) 2009/04/05 (Sun) @ 13:16

I did it the brute force way.  I started with a true distribution of talent for MLB.  Then for every single true BA, from .180 to .380 I think, I figured out the chance that they could hit 92 for 219.

Multiplying the frequency of players times the probability for each, I come out with the exact answer.  And it was somewhere around .320.
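A rough sketch of that brute-force approach, for anyone who wants to try it: weight each candidate true BA by how common it is and by how likely it is to produce 92 for 219, then average. The prior shape (normal), the grid bounds (.100 to .500), and the population parameters (mean .265, SD .025, figures that appear elsewhere in this thread) are assumptions for illustration. It does not fold in Chipper's earlier at bats, so its output need not match the number quoted above.

```python
import math

def log_binom_likelihood(p, hits, at_bats):
    """Log of p^hits * (1-p)^(at_bats-hits). The binomial coefficient is
    constant in p and cancels when the weights are normalized."""
    return hits * math.log(p) + (at_bats - hits) * math.log(1.0 - p)

def grid_posterior_mean(prior_mean, prior_sd, hits, at_bats,
                        lo=0.100, hi=0.500, step=0.001):
    """Posterior mean of true BA on a grid, using a normal-shaped prior.
    Grid bounds and step are illustrative choices, not from the thread."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_points)]
    log_w = []
    for p in grid:
        log_prior = -0.5 * ((p - prior_mean) / prior_sd) ** 2
        log_w.append(log_prior + log_binom_likelihood(p, hits, at_bats))
    top = max(log_w)                      # subtract the max to avoid underflow
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return sum(p * w for p, w in zip(grid, weights)) / total

# Assumed league-wide prior (mean .265, SD .025), no player history.
population_estimate = grid_posterior_mean(0.265, 0.025, 92, 219)

# Same routine with the player-specific prior described in #63.
player_prior_estimate = grid_posterior_mean(0.315, 0.020, 92, 219)

print(f"population prior:      {population_estimate:.3f}")
print(f"player-specific prior: {player_prior_estimate:.3f}")
```

The only thing that changes between the two calls is the prior, which is the point of disagreement in this thread. The second call should land close to the .345 from the Beta calculation in #63.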

Sorry, but if you come out with .348, you are presuming a distribution of true talent that simply does not exist, or that the estimate process that you are using simply doesn’t apply.  Likely the former is your wrong assumption.

What is your true talent distribution for MLB that you are presuming?


#65          (see all posts) 2009/04/05 (Sun) @ 13:32

Like I said, Nate’s prior for Chipper is normal with mean .315, SD a little over .02.  He doesn’t say anything about the overall distribution of talent in MLB.  Now, you can certainly argue that he SHOULD have considered the overall talent in MLB more, and I would be pretty sympathetic to that.  But his error would then be using the prior that he did, not accidentally giving Chipper 500 AB.

I want to be clear, I am not arguing that Nate’s answer was the right one; I don’t think it was.  I just don’t think he made the mistake Tango seemed to attribute to him.


#66    Colin Wyers      (see all posts) 2009/04/05 (Sun) @ 13:44

I’m out of my depth here, but the Wikipedia article mentions that the Beta distribution is a case of the Dirichlet distribution, and so I bring up this old thread:

http://www.insidethebook.com/ee/index.php/site/comments/modeling_baseball_player_ability_with_a_nested_dirichlet_distribution/


#67    Jason D      (see all posts) 2009/04/05 (Sun) @ 17:05

Blackadder, how did you arrive at the a and b values? Can this be done in Excel?


#68    Tangotiger      (see all posts) 2009/04/05 (Sun) @ 18:39

There is no way at all to come up with the Bayes unless you know the underlying population of true talent of MLB.

So, I don’t see how you can say this:

Now, you can certainly argue that he SHOULD have considered the overall talent in MLB more

If for example I said that the true talent distribution of MLB was a mean of .265 with 1 SD = 0, then we know exactly what Chipper’s true talent is, even if he went 100 for 150: .265.


#69          (see all posts) 2009/04/05 (Sun) @ 21:32

You can apply Bayes rule to update any prior you want.  Nate thinks the appropriate prior is the one he uses in the article, and he thinks it is the appropriate one because of the PECOTA projections.  The overall distribution of talent is irrelevant.  Again, I am not endorsing doing this, but this is what Nate did.  Do the computation in #35 with mean .315, standard deviation .021, and you will get Nate’s answer.

Jason, you don’t even need excel to get the a and b; you can solve for them exactly.  Basically, if you look at the wikipedia page, you can see how to express the mean and variance (which is the square of the standard deviation) in terms of a and b.  You can use those formulas to solve for a and b in terms of the mean and standard deviation.

The answer turns out to be this: let c= (1-MEAN)/MEAN, and let d be the variance.  Then we have
a=(c-d*(c+1)^2)/(d*(c+1)^3)
b=c*a

If you plug in .315 for the mean and .02 for the standard deviation (so .0004 for the variance) you should get the values above.
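As a quick check of the formula above, here is a small sketch that solves for a and b and then recomputes the mean and variance from them. The function name is hypothetical; the formula is the one given in the comment.

```python
def beta_params_from_moments(mean, variance):
    """Solve for Beta(a, b) with the given mean and variance,
    using the formula from the comment (c = (1-mean)/mean, d = variance)."""
    c = (1.0 - mean) / mean
    d = variance
    a = (c - d * (c + 1.0) ** 2) / (d * (c + 1.0) ** 3)
    b = c * a
    return a, b

a, b = beta_params_from_moments(0.315, 0.02 ** 2)

# Round trip: the standard Beta mean and variance should give back the inputs.
mean_check = a / (a + b)
var_check = a * b / ((a + b) ** 2 * (a + b + 1.0))

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"mean = {mean_check:.3f}, sd = {var_check ** 0.5:.3f}")
```

This gives a = 169.608 and b = 368.830 to three decimals, the values used in #63.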


#70    Tangotiger      (see all posts) 2009/04/05 (Sun) @ 22:38

The overall distribution of talent is irrelevant.  Again, I am not endorsing doing this, but this is what Nate did.  Do the computation in #35 with mean .315, standard deviation .021, and you will get Nate’s answer.

Well, no wonder he got the wrong answer!

The question being asked is this: “Given that we have observed 92 for 219, what is the chance that it was done by someone with a .180 true mean?  By .210 true mean?  .242 true mean?  .349 true mean? etc, etc”

Once you have those answers, you ask: “How likely is it that this population of players has a player with a .180 true mean?  .210?  .242?  .349?  etc, etc”

You certainly cannot plug in .315 in the equation I have in post 35.

***

You can further ask, “Given that we have observed 92 for 219 AND his prior is .315, what is...” and continue with the questions.

In no way can you do this without knowing the population distribution.


#71          (see all posts) 2009/04/05 (Sun) @ 23:26

Be that as it may, that is what Nate did.  I don’t think it is as obviously crazy as you do--he was basically saying that the “population” Chipper was drawn from, if you want to think in those terms, was the abstract population of PECOTA comparables, suitably adjusted somehow--but I am inclined to agree that the way you do it does produce more sensible answers.


#72    Jason D      (see all posts) 2009/04/05 (Sun) @ 23:35

Blackadder, I get a similar result to your calculation by doing this:

sd = sqrt(p*(1-p)/n)
p = .315
sd = .02
solve for n = 539

.315 = x/539
x = 170

170 + 92 = 262
539 + 219 = 758

262/758 = 0.346


#73    Guy      (see all posts) 2009/04/06 (Mon) @ 14:41

MGL:  As the house Bayesian, could you comment on Nate’s method here?  (I say that seriously, not facetiously.) Does it make sense to use his prior estimate of Chipper’s talent, rather than the population statistics, in the way he does here? 

I’m not sure I see the basis for the SD of .021 on the pre-season estimate of Chipper’s .315 BA talent.  That would suggest there was a 16% chance he was truly—and always had been—a .336 hitter.  And a 2.5% chance—small, but non-trivial—that he was truly a .357 hitter.  That doesn’t seem like a plausible estimate given A) what we know about the MLB population in general, and B) Chipper’s 7000 ABs and lifetime .310 average.

It seems to me the real error range is probably about half as large, maybe smaller, and should be skewed (at least a little) toward lower estimates.  And that would in turn lead to a much more conservative estimate of his true talent after we take account of his 92-for-219 run.
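The 16% and 2.5% figures above are the usual one-SD and two-SD normal tail areas. A minimal sketch, assuming a normal prior with mean .315 and SD .021 and reading "a .336 hitter" as "at least .336":

```python
import math

def normal_upper_tail(x, mean, sd):
    """P(X >= x) for a normal distribution, via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_336 = normal_upper_tail(0.336, 0.315, 0.021)   # one SD above the mean
p_357 = normal_upper_tail(0.357, 0.315, 0.021)   # two SDs above the mean

print(f"P(true BA >= .336) = {p_336:.3f}")
print(f"P(true BA >= .357) = {p_357:.3f}")
```

The two probabilities come out near .159 and .023, which round to the figures in the comment.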


#74    Tangotiger      (see all posts) 2009/04/06 (Mon) @ 15:04

Andy Dolphin’s equation is this to calculate the uncertainty of your estimate of the true rate:

1/sqrt(1/V1+1/V2), where V1 and V2 are the random and population variances

For batting average, V1 = .32*.68/2000, where 2000 is the number of at bats from your sample.  So V1 = .0001 (or .010^2)

The population variance is probably V2 = .025 ^ 2

Plugging this above, you get:
1/sqrt(1/.0001 + 1/.025^2)
= .009

As you can see, the uncertainty of the estimate is virtually identical to the sqrt(V1).  That is, your uncertainty is almost totally dependent on the size of your sample.

For someone with no prior AB, your uncertainty is identical to the population standard deviation.

So, your guess about the uncertainty being half what Nate says is dead-on.
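The arithmetic above can be reproduced with a short sketch. The function names are invented for illustration; the inputs (.32 rate, 2000 AB, population SD of .025) are the ones in the post.

```python
import math

def binomial_variance(rate, at_bats):
    """Random (binomial) variance of an observed rate over at_bats trials."""
    return rate * (1.0 - rate) / at_bats

def estimate_uncertainty(random_var, population_var):
    """1/sqrt(1/V1 + 1/V2), the formula quoted in the post."""
    return 1.0 / math.sqrt(1.0 / random_var + 1.0 / population_var)

v1 = binomial_variance(0.32, 2000)      # about .0001088
v2 = 0.025 ** 2                         # population variance from the post

sd_estimate = estimate_uncertainty(v1, v2)
sd_rounded_v1 = estimate_uncertainty(0.0001, v2)   # V1 rounded as in the post

print(f"sqrt(V1)               = {math.sqrt(v1):.4f}")
print(f"uncertainty            = {sd_estimate:.4f}")
print(f"uncertainty, V1=.0001  = {sd_rounded_v1:.4f}")
```

With V1 rounded to .0001 the result is about .0093; with the unrounded V1 it is about .0096. Either way it sits just under sqrt(V1), and roughly half of the .021 discussed in #73.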

Here is a related thread:
http://www.battersbox.ca/article.php?story=20040923122101999


#75    Guy      (see all posts) 2009/04/06 (Mon) @ 15:11

Thanks, Tango.  And why do you choose 2,000 ABs?  Is your thinking that additional (prior) ABs don’t really add new information, because they are 5+ years in the past?  (But, theoretically, if Chipper had 5,000 ABs in the prior 4 seasons, then our variance would be based on that n?)


#76    Tangotiger      (see all posts) 2009/04/06 (Mon) @ 15:16

Right, if you use my formula of

weight = .9994^daysAgo

Then your total weight for the last 3 seasons would come in at 2.44.  And if you go further back, the maximum amount is 5.000.

So, the most you can count someone is 5 times the number of his average at bats, or 3000 or so.

Nate I think uses 3 years anyway, so it’d be close to 2.44 times whatever Chipper’s average at bats was.  Maybe I should have used 1500 or something.
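A rough way to see where 2.44 and 5 come from. This sketch makes a simplifying assumption that is not in the post: each season is collapsed to a single point exactly 0, 365, 730, ... days ago, rather than weighting each day separately.

```python
DAILY_DECAY = 0.9994   # per-day weight from the formula in the post

def season_weight(seasons_ago, days_per_year=365):
    """Weight of a season treated as one point in time (a simplification)."""
    return DAILY_DECAY ** (seasons_ago * days_per_year)

# Most recent season counts fully, then one and two years back.
three_season_total = sum(season_weight(k) for k in range(3))

# Going back forever is a geometric series with ratio season_weight(1).
limit_total = 1.0 / (1.0 - season_weight(1))

print(f"one year back:      {season_weight(1):.3f}")
print(f"last three seasons: {three_season_total:.2f}")
print(f"all-time limit:     {limit_total:.2f}")
```

Under that simplification the three-season total comes out just under 2.45 and the limit just over 5, in line with the figures in the post.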


#77    Peter Jensen      (see all posts) 2009/04/06 (Mon) @ 18:22

That would suggest there was a 16% chance he was truly—and always had been—a .336 hitter.

Why is this topic being discussed as if a) a batter’s talent for hitting doesn’t change over the course of his career and b) a batter’s talent for hitting is not dependent on the handedness of the pitcher or his home park or a host of other factors?

Chipper’s run at the beginning of 2008 over his first 219 ABs was at least partially the result of some unusual distributions not under his control.  First he was seeing an unusually large number of left handed pitchers.  37% of his first 219 AB were against LHP instead of the 29% of the previous 3 years.  He had previously hit for a better average against RHP, but in the beginning of 2008 he just destroyed LHP for a .432 BA and a .447 BABIP, an unbelievable improvement over the .272 BA and .286 BABIP of the previous three years.

No one would expect this to be sustainable, but was any of this improvement a result of anything other than a chance occurrence due to small sample size?  I believe there is some evidence that there was some real change in Chipper’s ability.  In 2008, Chipper was more aggressive hitting 18% of first pitches from left handers instead of the 13% in the previous 3 years.  He also was more successful hitting first pitches for 32 hits in 67 tries for the entire year against all pitchers for a .478 pace.  This compares to a .401 rate in 2005-7.  He seems to have accomplished this by flattening his swing and keeping his fly ball rate to around 25% where in the previous 3 years it had been around 37%.  Whether as a result of his success or a cause of it, his early 2008 splits saw a much greater number of first pitches taken for balls.  Of the 215 PAs where he didn’t hit the first pitch, 133 started with a ball and only 82 with a strike.

Chipper made right handed pitchers pay for giving him this advantage in the count. He hit 10 of 24 FBs for HRs in 75 ABs.

Chipper’s performance of hitting over .400 through early June was a combination of his extraordinary success hitting on first pitches, his low FB rate, his good decisions to not help pitchers by swinging at non strikes on first pitches, and his ability to make the most of his subsequent count advantage.  The actual skill that he has in these areas was certainly supplemented by luck, both small sample size luck and probably a lucky draw of pitchers.  But the analysis is much more complex than the simplistic answers in the previous posts.


#78    Guy      (see all posts) 2009/04/06 (Mon) @ 18:33

Maybe.  Then again, Bayes said he had become a .324 hitter, and he hit .324 for the rest of the year.  So, maybe you’re overthinking this, Peter.


#79          (see all posts) 2009/04/06 (Mon) @ 19:13

The article was done using stats through June 10.  From June 11 to the end of the season, Chipper hit .309.


#80          (see all posts) 2010/09/29 (Wed) @ 01:58

Wow, spirited stuff Tango.

I think that using mickey’s terminology will cause a lot of confusion; we’re dealing with a few different flavours of epistemic (systematic) uncertainty within any model, after all.  So I’ll avoid it.

The idea behind PECOTA is interesting.  It’s an end run around the math associated with joint probability methods.  More importantly it’s an end run around the assumptions associated with those methods (that chance variation is binomial, that the population subset is of the beta, Gaussian or other form with variance X, etc.).

Having said that ... as you’ve demonstrated, there are some serious issues with the execution.

Nate explicitly states that the PECOTA projections are the observed in this post on Chipper Jones from a couple of years ago.
http://www.baseballprospectus.com/article.php?articleid=7652

In that article he produces a chart that is not the PECOTA projection, but rather a narrower probabilistic curve that represents Chipper’s batting average ability at the beginning of the season.

That curve is far too wide.  Something has gone terribly wrong.  Apparently that’s a Gaussian curve, and just eyeballing it I’d guess a standard deviation of about .020, maybe a shade more.  That’s a whack.

For comparison’s sake, Marcel’s probabilistic estimate of Chipper’s batting average ability is g(p) :: BA^408 * (1-BA)^889
Plot that out on the same paper as Nate’s chart and they are strikingly different.  The stdev of the Marcel plot is .013.
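Reading the expression above as a density proportional to BA^408 * (1-BA)^889, it is a Beta distribution with parameters 409 and 890, so the quoted stdev can be checked directly. A minimal sketch (the function name is made up for illustration):

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(var)

# p^408 * (1-p)^889 is the kernel of Beta(408 + 1, 889 + 1).
marcel_mean, marcel_sd = beta_mean_sd(408 + 1, 889 + 1)

print(f"mean = {marcel_mean:.3f}, sd = {marcel_sd:.3f}")
```

That gives a mean of about .315 and an SD of about .013, against the roughly .020 eyeballed from Nate's chart.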

An Albert-Marcel plot would have a stdev of .012 and would be a touch to the left of simple Marcel.  Still, it’s really pencil widths difference between the latter two.  PECOTA is madass wide.

BTW: I didn’t bother to make an age adjustment for Marcel in the example, just slide the curve to the left if you’re keen.

BTW#2: In the linked post, best not to read beyond the chart.  Nate is a smart guy and knows a lot about baseball, but imo that article can best be described as a series of blonde moments.  We all have days like that, I certainly do.


#81    Sunny Mehta      (see all posts) 2010/09/29 (Wed) @ 03:56

Vic/#24,

In the article you link to, Nate himself says “(A technical note to my regular readers: what you see in the chart below is not our usual way of doing a PECOTA forecast. Instead, I have generated a normal distribution based on the performance of Chipper’s comparables, after regressing the comparables’ batting averages to the mean).”

It looks like he’s basically using a special (and possibly very optimistic) prior for Chipper, i.e., he’s considering him to be part of a sub-population of elite hitters.

Then to get Chipper’s posterior ability distribution it appears he’s using only that half-season’s worth of results for the likelihood component.

Certainly I could see one taking issue with both of those two decisions. However, I actually think the simulation model he sets up afterwards is pretty clever.

In other words, while I think his inputs are contentious, I can’t find much fault with his general framework.

Am I missing something?


#82    Sunny Mehta      (see all posts) 2010/09/29 (Wed) @ 04:13

Vic,

To clarify further about my last post, if I were doing the Chipper study myself I’d probably do the exact same thing Nate did except that

1) I’d probably be a little more conservative with the prior I chose. You know I feel fairly strongly about not inadvertently introducing selection/survival bias in studies like this, so I’d either choose his prior based on his general fielding positions, or perhaps if I were being ambitious I’d find a few other non-results-related conditions to throw in (e.g., round selected in draft, minor league history, etc.).

(One problem is we don’t know exactly how Nate came up with his particular prior because he doesn’t explicitly tell us. If he uses anything purely on the basis of observed results, imo it’s cause for concern. However, if he uses some other kind of condition filter, perhaps related to physical attributes or something, I’d be willing to hear him out.)

2) I’d likely include a bigger observed sample for the likelihood component. I tend to be a “I’m usually pretty fine with using a player’s entire career’s results” kind of guy, but I’d be basically okay with the last three seasons. I could also possibly be convinced of the “weighted last three seasons” stance. And I know you are big on players’ true talents fluctuating more between seasons rather than in-season, so perhaps you’d have an even better idea.

