THE BOOK--Playing The Percentages In Baseball

Wednesday, September 06, 2006

What are the chances of a certain player winning a batting title?

By , 06:32 PM

Rob Neyer likes to say something like, “So-and-so will win a batting title in the next 5 years,” usually referring to some excellent-hitting young player or prospect.  The last time I read one of those comments was in a chat a few days ago, when he was referring to Howie Kendrick of the Angels, the 23-year-old first/second baseman who has posted some excellent numbers in the minors, at least BA-wise, and is hitting .296 so far in 196 AB for the Angels.

I was curious as to how often a good hitter actually wins a batting title, given the competition and given that one standard deviation of BA due to luck alone is roughly 20 points in a full season’s worth of AB (so a two-standard-deviation swing is about 40 points).
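
As a quick check on the size of that luck term, here is a minimal sketch of the binomial standard deviation of a batting average. It assumes every AB is an independent trial with a fixed hit probability; the function name `ba_luck_sd` is made up for illustration. For a .270 hitter over 550 AB it comes out near 19 points, so 40 points is closer to two standard deviations than one.

```python
import math

def ba_luck_sd(true_ba: float, at_bats: int) -> float:
    """Binomial standard deviation of observed BA around a fixed true BA."""
    return math.sqrt(true_ba * (1.0 - true_ba) / at_bats)

# A league-average hitter over a full season's worth of AB.
sd_full_season = ba_luck_sd(0.270, 550)
print(f"1 SD of luck over 550 AB: {sd_full_season * 1000:.1f} points")
```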


I ran a sim of 10,000 seasons, 550 AB per season per player, assuming a mean (and median) BA of .270, 85 qualifying players, and one standard deviation of BA talent = 27 points (courtesy of Tango, based on an off-the-cuff estimate I think).  Here is my distribution of players and their true BA:

2 .335
3 .315
9 .303
11 .288
16 .275
3 .270
16 .265
11 .238
9 .225
3 .212
2 .205
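
For anyone who wants to rerun this kind of experiment, here is a rough sketch of the setup just described. It is not the original sim: instead of drawing a random number for every AB, it uses a normal approximation to the binomial (rounded to whole hits) to stay fast, and it breaks ties at random. The names `TALENT_GROUPS` and `simulate_titles` are made up for illustration. It prints both the group share and the per-player share, since it is easy to confuse the two when reading the percentages.

```python
import math
import random

# (number of players, true BA) as listed above; 85 qualifiers in all.
TALENT_GROUPS = [
    (2, 0.335), (3, 0.315), (9, 0.303), (11, 0.288), (16, 0.275), (3, 0.270),
    (16, 0.265), (11, 0.238), (9, 0.225), (3, 0.212), (2, 0.205),
]

def simulate_titles(groups, at_bats=550, seasons=10_000, seed=0):
    """Count batting titles won by each true-BA level over many seasons."""
    rng = random.Random(seed)
    talents = [ba for count, ba in groups for _ in range(count)]
    wins = {ba: 0 for _, ba in groups}
    for _ in range(seasons):
        hits = []
        for ba in talents:
            # Assumption: normal approximation to Binomial(at_bats, ba).
            mean = at_bats * ba
            sd = math.sqrt(at_bats * ba * (1.0 - ba))
            hits.append(round(rng.gauss(mean, sd)))
        best = max(hits)
        leaders = [i for i, h in enumerate(hits) if h == best]
        champion = rng.choice(leaders)  # ties broken at random
        wins[talents[champion]] += 1
    return wins

N_SEASONS = 2_000  # smaller than 10,000 to keep the example quick
wins = simulate_titles(TALENT_GROUPS, seasons=N_SEASONS)
for count, ba in TALENT_GROUPS:
    share = wins[ba] / N_SEASONS
    print(f"true {ba:.3f}: group wins {share:.1%}, per player {share / count:.1%}")
```

Everyone gets the same 550 AB here, so ranking by hits is the same as ranking by BA.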

So, how often does one of the 2 best players in each league, with a true BA of .335, win a batting title?  30.26%.  In 5 seasons, the chance of such a player winning at least one title is thus 83.5%.

If a certain player is one of the 5 best hitters in the league, then his chance of winning a title is 15.9%, which is 58% in 5 years.

For someone like Kendrick who is young and has a good MLE BA, I am going to guesstimate that the best we can do is put him in the first 3 categories above, or one of the best 14 hitters in the league, of all qualifying batters.  His chances, then, of winning a title, would be 6.9%, which would give him a 30% chance of winning a title, assuming he qualifies in each season, over a 5-year span.
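
The 5-year figures follow from the single-season ones if each season is treated as an independent try with the same chance, which is an assumption (it ignores aging and playing-time changes). A small sketch, with the helper name `at_least_one` invented for illustration:

```python
def at_least_one(p_single: float, seasons: int = 5) -> float:
    """Chance of at least one title in `seasons` independent tries."""
    return 1.0 - (1.0 - p_single) ** seasons

# Single-season chances quoted in the post.
for label, p in [("top 2", 0.3026), ("top 5", 0.159), ("top 14", 0.069)]:
    print(f"{label}: {at_least_one(p):.1%} over 5 seasons")
```

This reproduces the 83.5%, roughly 58%, and roughly 30% quoted in the post.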

Interestingly, a true .265 player won a title 6 times in 10,000 seasons, and a true .275 hitter won a title in 4 seasons, so, around 1 out of every 500 seasons, we can expect a hitter who is around league-average in BA to win a title in one of the leagues.  I thought that was interesting.

#1    Joe Arthur      2006/09/07 (Thu) @ 05:59

Interesting. Always good to have a cross-check on commentator blather.

Do you mean that each of the .335 hitters individually has a 15% chance in 1 year, and 41.8 in 5 years? Somehow the result doesn’t seem quite right to me, or guys like Carew and Boggs and Gwynn shouldn’t have been able to win so many individual titles. These guys with career averages of .328 or so - would you set their peak true talent to .340 or so? Or higher? I’m not sure that any current players have true talent as high as .335 ...

I don’t know what a longer view of distribution of BA would look like, but recent real-life distributions [2000-2005] don’t completely match the assumptions of your sim. Oddly, even though there are fewer teams, the AL has as many if not slightly more qualifiers for the batting title. In the last 6 years, the AL has averaged 77 qualifiers with a low of 70 and a high of 81; the NL has averaged 75.5 with a low of 66 and a high of 83.

The median does pretty well match the mean among the qualifiers each year[the qualifiers are perhaps an average of 10 points better than the overall league averages], but distribution of performance is more right-skewed than you’ve modelled, and the SD of actual performance (again limited to the qualifiers) appears to revolve around .027 [AL SDs of .021 to .031 for the 6 yrs, NL of .022 to .030 (twice)]. Shouldn’t the SD of true BA talent be set somewhat lower? If the true talent SD was set to something like .018 or .020, would you better model the dominance of Gwynn et al.?


#2    tangotiger      2006/09/07 (Thu) @ 07:18

The more important part is that MGL has two guys at .335, and the next 3 guys at .315, and then 9 more at .303.  That’s a huge gap in true talent, Boggs/Gwynn like.  In the more realistic scenario, you’d have true talent at .320, .317, .315, .313, .312, etc…


#3    tangotiger      2006/09/07 (Thu) @ 07:50

Over at BTF, Mike Emeigh said the following:

I don’t like the representation of expected performance levels as some “fixed value” of “true talent” plus “random variation”. “True talent” itself is a variable function of personal characteristics (day-to-day health being the largest one) and what we call “random variation” is better characterized as “unmodelled performance variation” - fluctuations in performance that very likely have an explanation, but for which we can’t explicitly account. We’d be far more likely to achieve greater acceptability of what it is that we do if we stop casting things that we can’t explain into the “random” bin.

He’s right, and he’s wrong.

No one is saying that true talent is some fixed, never-changing value.  Since we are dealing with humans, we know that can’t be true.

However, what is true is that once you account for the random variation, what’s left from the variance of the observed is the variance of the true talent.

That true talent is made up of some fixed amount of true talent, plus day-to-day variations, unexplained to the observer, but nonetheless real.  For our purposes, we don’t care.  The fixed plus variable portion of the true talent makes up the true talent.  There’s no real good reason to model the true talent as two separate things, since we can’t know what’s fixed and what’s variable.  And we can’t tell who has more variability than the other.  So, you lump it all into one true talent bucket.

However, if you look at aging, and 5 years is a lot of aging, then you can’t do that, since the mean true talent has now changed.  It’s one thing to assume a fixed level of true talent over 6 months, and the variable true talent around that.  It’s another to assume a fixed level of true talent over 5 years.


#4    Joe Arthur      2006/09/07 (Thu) @ 08:02

Yes; what I was loosely thinking was that you throw out one .335 guy and the other one has about 15/85-ths of a chance to win the batting title against the remaining field. Thus one guy whose true talent is 20 points higher than the next best player still doesn’t come close to modelling the dominance of a Gwynn. Can he possibly be something like 35-40 points better in true talent, or is there something wrong with the model?

In hindsight I shouldn’t have been surprised about the # of qualifiers in the AL, because those teams have a 9th full-time slot with the DH.

In terms of performance distribution, just adding buckets across years for 2000-2005 [which isn’t completely right because league averages did vary a little each year], I counted these [.20 = .195-.204, etc]:

.20:  1
.21:  1
.22:  2
.23:  19
.24:  40
.25:  77
.26:  95
.27: 122
.28: 123
.29: 101
.30:  96
.31:  70
.32:  37
.33:  32
.34:  13
.35:  6
.36:  5
.37:  4

If you march out from the center at .265-.284, there are many more high averages than lows. In real life, lower spots in the batting order + pinch-hitting & platooning will keep the lower talents from batting enough to qualify. The extreme high end performances are dominated by special cases [Helton and Walker at Coors, Garciaparra at Fenway, Bonds with his low AB totals giving randomness more sway.] It would be a further complication to the model to layer these effects on top of true talent; on the other hand they do impact the real chances of winning real batting titles ...
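
The bucket counts are enough to check the shape of that distribution directly. The sketch below treats each bucket label as the BA of every player-season in it (an approximation, since each bucket spans about 10 points) and computes the mean, standard deviation, and skewness. The name `BUCKETS` is made up for illustration.

```python
import math

# Bucket label -> number of qualifying player-seasons, 2000-2005, as tallied above.
BUCKETS = {
    0.20: 1, 0.21: 1, 0.22: 2, 0.23: 19, 0.24: 40, 0.25: 77,
    0.26: 95, 0.27: 122, 0.28: 123, 0.29: 101, 0.30: 96, 0.31: 70,
    0.32: 37, 0.33: 32, 0.34: 13, 0.35: 6, 0.36: 5, 0.37: 4,
}

n = sum(BUCKETS.values())
mean = sum(ba * c for ba, c in BUCKETS.items()) / n
var = sum(c * (ba - mean) ** 2 for ba, c in BUCKETS.items()) / n
sd = math.sqrt(var)
# Positive skewness means a longer tail on the high-BA side.
skew = sum(c * (ba - mean) ** 3 for ba, c in BUCKETS.items()) / n / sd ** 3
print(f"n={n}  mean={mean:.3f}  sd={sd:.3f}  skewness={skew:+.2f}")
```

On these counts the mean lands at about .282 and the SD at about .027, with positive skewness, which agrees with the right-skew described here.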


#5    MGL      2006/09/07 (Thu) @ 08:28

Of course, I used a crude and rough model for the sim.  All of the points are well taken, although I am not sure what Mike is talking about (of course true talent varies from day to day, although slightly I would think, and in actuality, the “true talent” is the batter’s context-neutral true talent plus the pitcher, park, weather, etc.).  It seems as if he is “denying” that the binomial random variance exists, but I’ll have to check his post on BTF.

And of course, just because on the average BA talent has some kind of distribution that we can model, that does not mean that in any given year or years there are not players who are 30 points better than everyone else, etc.  And of course, the talent among the baseball population itself is not nearly normal centered on the mean or median.  Not even close.  When we adjust for playing time it is close to normal because the great number of poor players get little playing time, are platooned, are shuffled around, etc.


#6    joe arthur      2006/09/07 (Thu) @ 10:48

re: true talent.

I’ve always understood the phrase much as Tango describes, but do feel it is a bit awkward. I think of talent as something more or less innate, capable of development. What people have at a point in time is an ability to perform. To me it is sensible to say “Michael Jordan had a talent for baseball, but developed his basketball talents instead. When he tried baseball again, his lack of practice at baseball-specific skills meant that his ability to perform lagged behind others of comparable talent.”
I’d prefer a phrase like ‘current true ability,’ understanding that a player’s ability can change pretty quickly as he masters the changeup or needs to start wearing contact lenses.


#7    Joe Arthur      2006/09/07 (Thu) @ 11:13

I do like alliteration, though. Tradeoffs, tradeoffs ...

Any takers on “actual absolute ability” ?


#8    tangotiger      2006/09/07 (Thu) @ 11:24

Once a term is defined, that’s all that we care about.  What people have a problem with is the actual name, not the definition.

I can say “a player’s chance of getting on base, at this exact point in time, with an average pitcher, at an average park, with an average men on base, and an average guy on deck” and call it a “quadralee”, or I can call it “true talent”, or “true score”, or “current ability”, or whathaveyou.  It’s simply and truly irrelevant.

(Average would of course comprise the entire distribution, so it’s not against a single average park, but 1/30th of each park, etc.)

I do believe that people simply have a hard time with definitions, if it uses words that they have predefined, and don’t want to allow for other definitions.


#9    dq      2006/09/07 (Thu) @ 11:36

The more interesting thing here is Gwynn, he won 8 batting titles in 14 years, and lost by .002 and .005

His simple average of those years (adding them up and dividing by 14) is .343 - the runner-up’s or winner’s average was .340 for those years.

How much better does someone have to be to win over 1/2 the batting titles in a 14 year span?

Carew won 7 in 10 years, Boggs 4 in a row and 5 out of 6.


#10    tangotiger      2006/09/07 (Thu) @ 11:59

I posted this elsewhere, so I’ll post here as well.

===============================
As for luck, etc.  There are two lucks: the pure random luck, where the mean of a coin-flip is 50/50, or rolling a 7 is 6/36.

The other luck is the random variation around a predefined mean.  A Pujols mean will be a lot higher than a Neifi mean.  But, both will have random variation around that mean, on a game-by-game basis.

For any single at bat, everything can be considered luck, since the random variation for any single event will far outweigh the true variance among the players in question.  If a guy hits a HR, you have little knowledge, if you look at that HR in a vacuum, if that guy is a power hitter, or not.  Each PA, on its own, is almost all luck.  As you string PAs together, the random variation around the mean will start to be reduced, and after a large enough number of PA, will be almost minimized.

Instead of using “luck”, just say “timing”.  Pujols can control his mean… he is a great hitter because he works at it… but he can’t control his timing.... one day he’s 3-4, and the other he’s 0-4.  Random variation.  Timing.


#11    MGL      2006/09/07 (Thu) @ 13:38

Tango, I have no idea what you mean by that, but it is not important.

Gwynn won so many titles “because” his true BA was likely a lot higher than everyone else’s in many (perhaps all) of those years.  Even if it wasn’t, it is possible for ANYONE to win ANY number of titles in ANY number of years.

How much better a player has to be to win that many titles in that many years is not a specific enough question to answer.  If you change the question slightly to, say, “How much better does someone have to be to have greater than a 50% (or whatever %) chance of winning X number of titles in Y number of years?” then we can approximate an answer.

Many people do not realize that there are ways to couch legitimate questions that really yield no satisfying answers.  This is actually a pet peeve of mine.  For example, being a recognized expert on baseball, people might ask me, “Who is going to win the WS this year?” My response might be, “How should I know?” and they look at me with a blank stare.  Bill James once said, in response to a question like that, “I am an analyst, not an oracle.” Another question that does not have an answer is, “So, who is the best player in baseball?” My answer might be, “What do you mean by that?” Of course, much of the time, these things are not very important, and the specifics of the question can be reasonably inferred, but often they are important, at least in someone’s world (running a business for example).  In those cases, one must be very careful about how they couch their questions, both to someone else and to themselves.


#12    tangotiger      2006/09/07 (Thu) @ 13:53

mgl, I don’t disagree with anything you say.

I’m only pointing out that the reason you got the results you did is because you “seeded” the population to have Boggs/Gwynn-like players at the top, and then a huge falloff after that.  You’ve got the 15th best hitter at .288, and Kendrick as one of .335, .315, .303.  It is no surprise at all to expect such a hitter to win one over five years, 30% of the time.  Or, as you said, 6.9% in any given year.

I mean, if only the 14 top hitters qualify for the batting title, that makes it 1/14 or 7.1%!  In effect, the 15th and later batters will almost never win a batting title!

So, your distribution does not seem to represent any reality, since your true top 14 hitters will win 97% of the batting titles.


#13    dq      2006/09/07 (Thu) @ 15:00

Your sim doesn’t appear to work outside the top players.

I get 9 average hitters winning the batting title from 1946-2004, 118 league-seasons. I took the player’s ba the year before he won the title and the year after, and called that his normal level. I compared that to the ba, for the year he led the league, of players with 400 ab + bb.

So instead of 10 times in 10,000, I get 9 out of 118.

Year  Name  Ave  PY/FY Ave  LgAve  Diff

1961 Cash 0.361 0.260 0.276 (0.016)
1953 Furillo 0.344 0.274 0.286 (0.012)
1953 Vernon 0.337 0.271 0.279 (0.008)
2003 Mueller 0.326 0.275 0.278 (0.003)
1960 Groat 0.325 0.275 0.276 (0.001)
1958 Ashburn 0.350 0.282 0.281 0.001
1981 Lansford 0.336 0.279 0.276 0.003
1985 McGee 0.353 0.274 0.271 0.003
1991 Pendleton 0.319 0.278 0.273 0.005


#14    dq      2006/09/07 (Thu) @ 15:03

Actually, he doesn’t have Gwynn/Boggs at the top - the 2 stars he has “only” win the batting title 30% of the time combined. Gwynn, Boggs, Carew each won the batting title more than 50% of the time over a long period of time. In order for them to have a 50% chance to do that, they must each be way better than the next best player in the scenario. If 2 players .025 better than the league can’t combine for the title more than 30% of the time, then 1 player must be way better than the league to win it over 1/2 the time.

How much better does a player have to be to have a 50% chance of winning batting titles:

4 in a row
8 out of 14 years
7 out of 10 years


#15    MGL      2006/09/07 (Thu) @ 17:04

Tango, what I did not understand was your “two kinds of luck” post.

DQ, interesting.  I like the idea of using the year before and after as a proxy for a player’s true BA.  I do that all the time, but, in this case…

You have to use ALL 3 years, which makes their true BA considerably higher.

I guess I can redo the sim with a more granular distribution of players, based on the same .027 SD, but I don’t think the results will be much different.

It is a basic sim, which simply generates a random number per AB for each player in order to produce a hit or an out, based on their true BA, and then crowns one player the batting champ at the end of 550 AB for all 85 players.


#16    tangotiger      2006/09/07 (Thu) @ 17:46

You have a mean, and you have a sample distribution around that mean.  When we talk about luck, we are talking about randomly choosing some point around that mean.

But, some people think of luck as being the mean itself.  “Albert Pujols isn’t lucky!  He’s Albert Pujols!”.  These people think we are talking about his mean, when in fact, we are not.  We are talking about the timing of his performance, centered around his own mean.


#17    dq      2006/09/07 (Thu) @ 19:51

I debated whether to include the middle season or not, as in some cases (Vernon, Cash, McGee) the ba championship season was clearly above any other performance, and that appears to be a case where “luck/chance” is involved. Many of the others had a bad season, great season, good season, and would be more represented with the average.

I think the problem with the sim is that there aren’t usually 2 guys in the .330 range. They restricted batting titles for guys below them, and limited the lower echelon guys too much.

I think (and this is only thinking) that there are 5-6 guys with true talent around .315, and one of those usually goes the 1 SD/variance over the range and bats .340. Now, in some years, a .270 guy can go 2 SDs over and get the .340. I think if you look at the current batting leaders you’ll see something like that. The AL has the 5-6 guys with true talent at .315, with Mauer and Jeter more than 1 SD above. The NL has Cabrera at 1 SD above, but the guy from Pittsburgh looks like the new Mueller/Cash/McGee.

Gwynn was at the .345 range where in a normal year his average was higher than the .340 the other players hit.

So, in a sim of 10,000 seasons, you probably have 2,000 or so that have a Gwynn/Boggs/Carew at .345 +/-. In those years they will probably win 1/2 the titles or so, and a .270 hitter probably won’t win.

In the other seasons they won’t be there, and your top true talent is the .315 guys- this allows Norm Cash, Bill Mueller, et al batting titles.

The 5-6 guys are also the group of guys you would write about that could win a batting title.


#18    MGL      2006/09/07 (Thu) @ 19:57

I really don’t know what a typical distribution of BA talent is off the top of my head.  I simply assumed a normal distribution centered on .270 with a SD of .027.

As far as the 3 seasons, you HAVE to use them (or any other number of seasons, as long as they are chosen without bias, which will generally mean that you must include the title season).  You have no choice.


#19    dq      2006/09/07 (Thu) @ 22:03

We are trying to determine talent, not outcome.

Let’s take Norm Cash

.286 (353 abs)
.361
.243
.270
.257
.266
.279
.242

His 2nd best season with 400 ab is .283, 7 years later. If I just take the 3 seasons, then I’m calling his talent level .299 - .019 higher than his 2nd best full season

according to you guys:
outcome = talent + luck

I’m trying to determine how much talent and luck there is in the .361.

He was a career .271 hitter.
So I’m thinking the .361 includes a lot of luck, and that his true talent is much below that. I think you have a guy who is 3 SDs above his talent level here.

If his talent level was .298, then his chances of not hitting at least .284 the next 12 years would have been pretty low - what would that be .30^12 power or so? 2 million to one.

I think I should have probably used the career average here, that would reduce the impact of the lucky/fluke season, but still consider it.

Bill James Win Shares has an article on Fluke Seasons.


#20    dq      2006/09/07 (Thu) @ 22:10

>>I really don’t know what a typical distribution of BA talent is off the top of my head.  I simply assumed a normal distribution centered on .270 with a SD of .027.

So you just made up some numbers to come to your conclusion.


#21    Joe Arthur      2006/09/07 (Thu) @ 23:22

Just treating the qualifiers in the 2000-2005 leagues as a single dataset, the mean was .282 and the observed SD .027. Following the math in the appendix to the Book pp.368-72, the standard deviation in “true” batting average skill then would be .019. This would mean 95% of the players between .244 and .320. With about 80 qualifiers per year, that would suggest about 2 a year beyond .320.
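
The step from an observed SD of .027 to a talent SD of about .019 can be sketched with plain variance subtraction: observed variance minus binomial luck variance. This is the generic idea, and it may differ in detail from the appendix cited. The 550 AB per qualifier is an assumption borrowed from the original sim, and the function name `true_talent_sd` is made up for illustration.

```python
import math

def true_talent_sd(observed_sd: float, mean_ba: float, at_bats: int) -> float:
    """Observed variance minus binomial luck variance, as a standard deviation."""
    luck_var = mean_ba * (1.0 - mean_ba) / at_bats
    return math.sqrt(max(observed_sd ** 2 - luck_var, 0.0))

# Assumption: a typical qualifier gets about 550 AB.
talent_sd = true_talent_sd(0.027, 0.282, 550)
low, high = 0.282 - 2 * talent_sd, 0.282 + 2 * talent_sd
print(f"talent SD ~ {talent_sd:.3f}; ~95% of talents in [{low:.3f}, {high:.3f}]")
```

With these inputs the result is close to the .019 and the .244 to .320 range quoted here. The answer is sensitive to the AB assumption: fewer AB means more luck variance and a smaller talent SD.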

The real distribution is a little skewed to the right.


#22    MGL      2006/09/08 (Fri) @ 01:20

dq, this is a civil blog with polite, civil, discussion.  I do not want any flames, trolls, sarcastic comments toward other posters OR their posts, comments, opinions, etc.  In the future, please keep to that decorum.

BTW, I am wrong about using the 3 years as a proxy for a player’s true talent.  Let’s assume that a player has the same true talent every year, such that his career average is always equal to his true BA.  If we use those 3 years (year of the title, the year before and the year after, or any other years plus that title year), we will overestimate the true BA. If we use any year or years but NOT the title year, we will underestimate the true BA.  When estimating a player’s true BA from his sample BA, you have to use an UNBIASED sample.  Deliberately including or not including an anomalously good year is using a biased sample.

Until I figure out how to do it, I have to defer to one of the statistical experts.  The question is, “If a player wins a batting title, what is the best way to estimate his true BA, if you know the BA in all other years of his career?”

I am even wondering if dq might be right.  If we start by looking at all the batting title winners over the years and we want to find out their collective true talent, can we look at all the years before and after only?  I don’t think so, but now I am not so sure.

Now, I am not so sure how dq came up with his list, but if he looked for the title winners with the lowest pre and post title year BA, then I am sure that that is a biased sample (toward flukey low pre and post years) and definitely does NOT represent the true BA of those players.

Estimating true BA (or whatever stat or talent) after you select players based on some sample of their performance, when that sample is selected for being high, low, etc., is a tricky thing, and one has to be careful.


#23    MGL      2006/09/08 (Fri) @ 01:25

I used the .027 as the true talent SD that Tango gave me which was admittedly, by him, off the top of his head, or something like that.  If it is closer to .019, that’s fine.

“Skewed to the right,” as in fewer players at the high end, and the median less than the mean?


#24    MGL      2006/09/08 (Fri) @ 01:30

I’ll also add that if we only look at title winners, their whole careers will actually be higher than their true BA on the average, I think.  To determine (estimate) their true BA, we have to regress their career averages.  For a random, unbiased sample of players, their career BA, on the average, will be exactly equal to their true BA.

I really have to think about this for a while, and perhaps “reverse engineer” some sims (which is an excellent way to figure these things out, BTW).


#25    dq      2006/09/08 (Fri) @ 04:52

All apologies offered, sorry for any offense taken.

My list was derived from asking the question, “How often will an average hitter win the batting title?” The sim said 10 times out of 10,000.

So I looked at the batting champs since WWII, and used PY/FY ba as their talent level (I probably should have used career average).
I compared the PY/FY average to the CY average BA for all players 400+ ab +bb -

So, my list of players (Cash, McGee et al) isn’t a sample. It’s a list of players who hit about or below the average and won batting titles.

To get true talent, you probably take career average, and adjust for age, park factor, and run content.


#26    MGL      2006/09/08 (Fri) @ 11:56

np, dq.  Anyway, I’ll be on vacation for a few days…


#27    Joe Arthur      2006/09/08 (Fri) @ 15:59

Mickey,

right-skewed = more players at the high end; median is slightly less than the mean in this sample.

Have a nice vacation! just got back from one myself ...


#28    bsball      2006/09/12 (Tue) @ 06:41

I just looked at the top 10 BA in the NL for 1984 - 1988 to see if I could get some insight for why Gwynn won so many batting titles.  In those 5 years Gwynn came top 3 times.  I looked at the career BA and the career high BA for each of the other players on the list and found:

1. Gwynn’s career BA (.338) was nearly .040 higher than the 2nd best career BA for any player on the list in that year.  The next best career BA for all players in the top ten in any of the 5 years was .303 (Mark Grace).

2. Gwynn’s career BA was above most of the rest of the top 10 players’ career BEST BA.

Gwynn won the batting title 6 of the 7 years when he hit above his career average (and 2 of the years when he hit below his average).  In 1993 he was beat out only by the Coors inflated BA of Andres Galarraga (.370).  It seems like he was just so much better than anyone else that the only way for anyone else to win was for him to have a below average year.

It seems like if you are Gwynn you have a better than 50% chance of winning in any year.  If you are in the group of hitters below him you better hope that your career high year coincides with his off year.


#29          2006/09/12 (Tue) @ 11:00

Hey guys,
Interesting thread.  There were some good criticisms of the original work, but no one went back to do more work, such as fixing the gap-between-the-best-players problem.  So I did some myself, and thought I’d share it:

Assuming Gwynn’s winning the batting title can be modeled by a binomial distribution, you would need a “true” batting title probability of ~45% to have a 25% chance of expecting to win 8 out of 14 batting titles.  The 7 out of 11 statistic is a biased sample, since ‘87-’97 was selected to be his best 11 years. 8 out of 14 is more legit, since it was his first 14 years in the league. Though one may argue that we should calculate the likelihood of 8 out of 16, since he had 2 more allstar years at the end of his career.
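
That binomial step is easy to check. The sketch below reads “8 out of 14” as “at least 8 of 14”, which is an assumption about what was intended, and treats each season as an independent trial with the same title probability. The helper name `prob_at_least` is made up for illustration.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

p_eight_plus = prob_at_least(8, 14, 0.45)
print(f"P(>= 8 titles in 14 | p = 0.45) = {p_eight_plus:.3f}")
```

Under that reading the result is about 26%, in line with the ~25% quoted.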

So if his true batting title percentage was 45%, he has a decent chance (25%) of winning 8 out of 14. What does that correspond to in terms of BA?

MGL’s original post can be used to answer that, with slight modifications. Tango is right about the huge leaps in skill between the top players.  So I ran a similar sim, running 10000 seasons, including 77 players who were randomly selected from a population density function fit to Joe Arthur’s frequency data.  Since the density function is continuous, there are no artificial gaps between players’ ability.  Any gaps are those that appear randomly, and should cancel out over the 10000 seasons.

My model says that Gwynn would need to be a .347 hitter to have an average title-winning rate of .45, and thus have a 25% chance at pulling off 8 titles in 14 years.

His actual BA for those 14 years was .342

Is there anything I missed?


#30    dq      2006/09/12 (Tue) @ 12:53

I used 7 of 11 since that was Gwynn’s peak, which I am more interested in. You may call it a biased sample, but it is a result that actually occurred.
My question was how good you have to be to win 7 of 11, which is what he did. The 1st 3 years I would contend he was not at his peak; 11 years is a very long peak for a player as it is.

Also, I don’t understand why you solve for Gwynn at only a 25% chance. He did do this, so I would assume you would give him at least close to a 50% chance.

Since Boggs & Carew did somewhat similar feats, I would think the 25% chance seems low.

If 3 guys did a feat that only has a 25% chance of happening, does this mean there are 9 similar guys to Carew/Boggs/Gwynn ?


#31          2006/09/12 (Tue) @ 14:09

Hi DQ,

How can you prove that 1987 to 1997 was Gwynn’s peak? He hit better in 1984 and 1986 than he did in 1990-1992.  He also hit better in 1998 and 1999 than he did in 1990-1992. Why not include those years?  Frankly, 1990-1992 were some of his worst years.  But that doesn’t necessarily imply anything about his “true” ability peak.  He could have had the best skills in 91, but got unlucky.  It’s impossible to guess when his “true” peak was, so why bias your sample by cherry picking your data?

Re: the 25% chance, think about it in terms of the confidence with which you draw your conclusion: 

1) Gwynn won 8 of 14 batting titles.
2) If Gwynn were a .347 hitter, he’d win 8/14 only 25% of the time.
3) Thus, there is a 25% chance that Gwynn was less than or equal to a .347 hitter.
4) So one can say with 75% confidence that Gwynn was better than .347.

Why would you want to know what you can say with 50% confidence? What use is a conclusion that has a 50% chance of being wrong?


#32    dq      2006/09/12 (Tue) @ 15:33

You are right about Gwynn’s peak; he had an unusual career in that he was batting better at ages 37-39 than he was at ages 30-32.

Your .347 is a little confusing, because I don’t know what league average you ran the sim against versus what Gwynn’s actual content was. I’m actually 99.5% certain that Gwynn did not bat .347; he batted .342.

I want to know how good Gwynn was. I’m not flipping a coin, I’m trying to determine a measure of talent. 75% probability of a substantive number gives you a conservative answer. 55-60% gives you a moderate answer.

He was much better than a hitter who would win only 8 of 14 titles 25% of the time, he did it 100% of the time. In those 14 years, he won 4 batting titles by .022 or better, 4 more by less than .010, was within .005 twice, and once lost to Galarraga while hitting .358 -

he actually had a chance to win 11 titles in 14 years


#33    dq      2006/09/12 (Tue) @ 21:48

Let’s try it this way. I know what Gwynn hit. My questions are (1) How much better is he than the next guy and (2) How many titles should he have won.

Okay, let’s start with (2). Gwynn hit .342 in this stretch, with a SD of .027. Assuming he was at the same skill level throughout, we can compute his chances of winning a title versus the next best average in the league. I simply took the average he had to beat, and the distribution of his .342 normally distributed to get this:

yr  next best  chance win

97 0.366 0.187
96 0.344 0.470
95 0.346 0.441
94 0.367 0.177
93 0.370 0.150
92 0.330 0.672
91 0.319 0.803
90 0.335 0.602
89 0.333 0.631
88 0.307 0.903
87 0.338 0.559
86 0.334 0.616
85 0.353 0.342
84 0.321 0.782

Avg / Total 0.340 7.335

So, he should have won 7.3 titles, he won 8. Not bad.
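
For reference, the “chance win” column can be reproduced by treating Gwynn’s season BA as normally distributed around .342 with an SD of .027 and asking how often a draw beats the next-best average. That is my reading of the method described, so take it as a sketch. It uses `statistics.NormalDist` from the standard library; `NEXT_BEST` is a made-up name holding the column above.

```python
from statistics import NormalDist

# Assumed model: Gwynn's season BA ~ Normal(.342, .027), same every year.
gwynn = NormalDist(mu=0.342, sigma=0.027)

# Next-best league average by season, copied from the table.
NEXT_BEST = {
    1997: 0.366, 1996: 0.344, 1995: 0.346, 1994: 0.367, 1993: 0.370,
    1992: 0.330, 1991: 0.319, 1990: 0.335, 1989: 0.333, 1988: 0.307,
    1987: 0.338, 1986: 0.334, 1985: 0.353, 1984: 0.321,
}

chances = {yr: 1.0 - gwynn.cdf(ba) for yr, ba in NEXT_BEST.items()}
expected_titles = sum(chances.values())
for yr, chance in chances.items():
    print(f"{yr}  {NEXT_BEST[yr]:.3f}  {chance:.3f}")
print(f"expected titles: {expected_titles:.2f}")
```

One caveat on the design: the next-best average is held fixed at what actually happened, so only Gwynn’s own variation is modeled, not the field’s.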

For point 1 I wanted to measure Gwynn’s true talent against the next best player. To calculate the true talent, I computed the BA for the 2 prior years, the current year, and 2 future years for all players with 2000+ pa for the 5 years and 400+ pa in the NL for that year, and compared the #2 guy versus Gwynn.

Player     Year   Ave     Gwynn   Diff

Raines     1984   0.307   0.326   0.019
Raines     1985   0.318   0.338   0.020
Raines     1986   0.315   0.336   0.022
Guerrero   1987   0.314   0.333   0.019
Guerrero   1988   0.304   0.332   0.028
Clark      1989   0.304   0.330   0.026
Larkin     1990   0.306   0.319   0.013
Larkin     1991   0.310   0.327   0.017
Morris     1992   0.315   0.336   0.020
Kruk       1993   0.309   0.349   0.040
Piazza     1994   0.326   0.357   0.030
Piazza     1995   0.337   0.368   0.032
Piazza     1996   0.339   0.362   0.023
Piazza     1997   0.335   0.352   0.018

Average           0.317   0.340   0.023

This shows Gwynn on the average .023 better than the next player (Gwynn is at .340 here because we factor in 1998 and 1999 in the calcs).


#34    Joe Arthur      (see all posts) 2006/09/15 (Fri) @ 17:55

Mickey’s original interest was in the chance that Howie Kendrick would win a batting title, and we diverged onto the question of what it takes to dominate like a Gwynn or a Boggs.

I basically replicated CDM’s model, simulating 10,000 seasons against a field of 77 qualifiers. Rather than comparing a single player at “true talent” of .347 against the field of 77, I leveraged the simulation by comparing the field separately against 20 different model players, whose true talent ranged from .270 to .365, and counting batting titles won by the model players against the field.
my results:
ability 0.270 won 2 times
ability 0.275 won 7 times
ability 0.280 won 17 times
ability 0.285 won 26 times
ability 0.290 won 46 times
ability 0.295 won 100 times
ability 0.300 won 197 times
ability 0.305 won 337 times
ability 0.310 won 523 times
ability 0.315 won 870 times
ability 0.320 won 1150 times
ability 0.325 won 1702 times
ability 0.330 won 2293 times
ability 0.335 won 2993 times
ability 0.340 won 3837 times
ability 0.345 won 4564 times
ability 0.350 won 5467 times
ability 0.355 won 6348 times
ability 0.360 won 7063 times
ability 0.365 won 7663 times

So, against the distribution of batting averages roughly typical of the 21st century, according to this simulation, a “true Talent” of .347 or .348 would be the break-even point to have a 50% chance of winning the batting title one season. A “Howie Kendrick” with a static “true talent” of .310 or so persisting across his first 5 years would have about a 24% chance to win at least 1 batting title if he qualified each year, which roughly agrees with Mickey’s result ...
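
Both numbers in that paragraph can be read off the results list with a little arithmetic. The sketch below is only an illustration: it assumes straight-line interpolation between neighboring abilities for the break-even point, and independent seasons with a constant per-season chance for the 5-year figure. The function names are hypothetical.

```python
# Titles won per 10,000 simulated seasons, copied from the results above
WINS = {
    0.270: 2, 0.275: 7, 0.280: 17, 0.285: 26, 0.290: 46,
    0.295: 100, 0.300: 197, 0.305: 337, 0.310: 523, 0.315: 870,
    0.320: 1150, 0.325: 1702, 0.330: 2293, 0.335: 2993, 0.340: 3837,
    0.345: 4564, 0.350: 5467, 0.355: 6348, 0.360: 7063, 0.365: 7663,
}
SEASONS = 10_000


def break_even_talent(wins=WINS, seasons=SEASONS, target=0.5):
    """Linearly interpolate the ability at which the win rate crosses `target`."""
    points = sorted(wins.items())
    for (a0, w0), (a1, w1) in zip(points, points[1:]):
        r0, r1 = w0 / seasons, w1 / seasons
        if r0 <= target <= r1 and r1 > r0:
            return a0 + (a1 - a0) * (target - r0) / (r1 - r0)
    raise ValueError("target rate is outside the simulated range")


def at_least_one(p_season, years=5):
    """Chance of at least one title in `years` independent seasons."""
    return 1 - (1 - p_season) ** years


if __name__ == "__main__":
    print("break-even talent:", round(break_even_talent(), 4))
    print(".310 hitter, 5 years:", round(at_least_one(WINS[0.310] / SEASONS), 3))
```

The interpolated break-even lands between .347 and .348, and the .310 hitter comes out just under 24% over five seasons, in line with the figures quoted.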


#35    tangotiger      (see all posts) 2006/09/15 (Fri) @ 18:01

Fabulous stuff!

Btw, what true talent distribution did you use for the field?


#36    Joe Arthur      (see all posts) 2006/09/16 (Sat) @ 05:23

I did something similar to what I understood CDM to have done:
1) started with assumption of true talent of mean of .282 and .019 SD
2) for each season, for each of 77 players, randomly picked a true average with a probability function based on a normal distribution with those parameters. Not sure if CDM did just this, or something more sophisticated.
3) used a ‘spinner’ model to simulate 550 AB for each player for that season, as well as the 20 model players.
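
A rough sketch of those three steps, for readers who want to try it themselves. This is not Joe's or CDM's actual code: the function names are invented, ties with the field leader are counted as losses (an assumption), and every model player is compared against the same simulated field each season.

```python
import random


def spin(true_ba, ab, rng):
    """'Spinner' model: each AB is an independent hit with probability true_ba."""
    hits = sum(rng.random() < true_ba for _ in range(ab))
    return hits / ab


def count_titles(abilities, seasons, field_size=77, ab=550,
                 mean=0.282, sd=0.019, seed=0):
    """Count how often each model player out-hits the whole field."""
    rng = random.Random(seed)
    wins = {a: 0 for a in abilities}
    for _ in range(seasons):
        # Steps 1-2: draw a true talent for every field player from a normal
        # distribution, then step 3: spin a season for each and keep the best BA.
        field_best = max(
            spin(rng.gauss(mean, sd), ab, rng) for _ in range(field_size)
        )
        # Each model player gets his own spun season against that same field.
        for a in abilities:
            if spin(a, ab, rng) > field_best:
                wins[a] += 1
    return wins
```

Calling `count_titles([0.290, 0.310, 0.350], seasons=10_000)` would mirror the run described above, though spinning every AB in pure Python at that scale takes a while; a few hundred seasons is enough to see the shape of the results.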

My distribution of talent in the field would not capture the mild right-skew of the actual distribution in 2000-2005; if it had, the field would have won somewhat more batting titles. CDM may have done a better job modelling that…


#37    dq      (see all posts) 2006/09/16 (Sat) @ 20:03

Your sim doesn’t work either - if I understand it a .290 player wins only 46 times out of 10,000 - that is well below what has actually happened.

If you run a simulation, you need to look at the results and see if they make sense. If they are way out of whack, then there might be a problem with the simulation.


#38    Joe Arthur      (see all posts) 2006/09/17 (Sun) @ 07:54

Hi DQ,

I don’t think your interpretation is correct. I’m giving the chance of a single .290 hitter winning. Your expectation seems to be set by the possibility of an “average” hitter winning, for which you previously suggested there were 9 in 118 seasons since 1946.

In my distributions of 77+1 player samples, there would typically be 4-5 players close to .290 and another 40 below that, each with a finite possibility of winning. The probability of a “hitter in the .290 or below” group winning is TOO HIGH in my simulation, because I did not try to replicate the real-life right skew of the actual distribution.

The model was intended to be generic in certain senses (no randomizing for park effects, no randomizing number of AB so that some players qualify with minimal opportunities). It was also modelled on the conditions of the 2000s, rather than attempting to model baseball history since 1946.

There are some deflations to the evidence you cited earlier which you have not noted.

It was easier to win a batting title in 1953 in an 8 team no-DH league with only 42 qualifiers (AL; 41 NL). Rerunning the simulation against 41 qualifiers (making the .290 hitter the 42nd) and reducing the AB qualification slightly to 543 instead of 550 (matching the average AB of 1953 qualifiers), I got the .290 hitter winning 119 times [and a .300 hitter 420 times]. Of course there would be fewer .290 hitters in the pool, but the collective chance of some .290 hitter or below winning should rise a little. And 1953 was at the high end of the period in terms of the number of qualifiers. 1958 only had 71 qualifiers in both leagues combined.

At least the 1950s were comparable to the 2000s in the average and spread of batting average. You wrote in detail about Norm Cash without ever noting that his averages mostly were compiled in the 1960s. Starting in 1960, vs the league average, he was
+31
+105
-12
+23
+10
+24
+39
+6
+33
+34
+9
+36

In that context of league averages of .230 to .255, Cash’s .271 career average is about as relatively superior as a .300 hitter would be in today’s environment with .270 league averages. In 1961 he competed with 48 other qualifiers, and he only had to beat .324 to win. I don’t think it’s germane in this discussion that he hit so well that he won by a lot.

To look at your examples again:
Cash: .271 career in a low average environment
Furillo: .299 career in an environment comparable to 2000s
Vernon: .286 in a career spanning 22 years
Mueller: .292 career through 2005 (split into 7 years at .286 mostly in BA-suppressing SF, and 3 years in BA-enhancing Boston at .303, which is where he won his batting title)
Groat: .286 career hitter (the bulk of his career was 1955-1966, so at least 1/3 in low average environments.)
Ashburn: .308 career hitter
Lansford: .290 career hitter, mostly compiled in the BA-suppressing environment of Oakland; his title won in Boston in 1981 with only 399 AB in the strike-shortened season.
McGee: .295 career hitter
Pendleton: .270 career hitter

Considering that these players’ peak talents should be somewhat superior to their career average, and adjusting (or just throwing out) the players with a lot of their career average from the ‘60s, I think Pendleton is the only one of your examples who wouldn’t have been a ways better than .290 talent in his park context in the year he won.

I think my model gives a .290 hitter [that is .290 with park effect built in] somewhat too much chance to win, not too little.


#39          (see all posts) 2006/09/17 (Sun) @ 12:00

Joe,
My model was no more sophisticated. You did almost precisely what I did. The sample you took has a STD of .0275.  A “true” ability STD of .019 results in a BA STD of .0269.  Nice work.
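
The .019 to roughly .027 step follows from adding the talent variance to the binomial luck variance. A minimal sketch, assuming 550 AB and evaluating the luck term at the league mean of .282 (a small simplification); `observed_ba_sd` is a made-up name:

```python
import math


def observed_ba_sd(mean_ba, talent_sd, ab):
    """SD of observed BA = sqrt(talent variance + binomial luck variance)."""
    # Luck variance of a batting average over `ab` at-bats, evaluated at the mean.
    luck_var = mean_ba * (1 - mean_ba) / ab
    return math.sqrt(talent_sd ** 2 + luck_var)


if __name__ == "__main__":
    print(round(observed_ba_sd(0.282, 0.019, 550), 4))
```

With these inputs the luck SD alone is a bit over .019, about the same size as the talent SD, which is why the observed spread is so much wider than the spread in ability.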

DQ,
Your point is well taken. The results seem a little odd. So I reran the sim, looking at a couple extra distributions:

1) Player’s “true” > 2nd-Best’s (the field) “true”
2) Player’s “true” > Field’s BA
3) Player’s BA > Field’s “true”
4) Player’s BA > Field’s BA

Only #4 gives us the likelihood of winning a batting title, but the others tell us something interesting, too…

On average, the best “true” ability in the league, measured by ability, is about .327 (a .327 hitter has a 50/50 chance of being the best “true” hitter).  But that doesn’t mean he’ll actually have a 50/50 chance of getting the best BA.

Over my 10000 trials, a .290 “true” hitter NEVER had the best ability in the league. There were always better “true” hitters, and frequently many other very similar hitters. (Our sample, after all, has a mean of .282).

Interestingly, a .345 hitter has about a 43% chance of having more “true” skill than the best BA in the league. He also has a 43% chance of having a higher BA than the best BA in the league. Thus, a .345 hitter doesn’t need to significantly out-hit his “true” ability in order to have a decent chance of winning a batting title. 

But the shapes of these distributions are very different: A .320 hitter has about 0 chance of having more skill than the BA of the next best hitter.  But a .320 hitter has a 10% chance of winning a batting title (having a greater BA than anyone else).

The moral of the story: Joe’s analysis is correct. Good hitters (.290-.300) have nearly a microscopic chance to win the title.
Great hitters (.300-.330 range) must out-hit their “true” ability considerably in order to have a decent shot at the batting title.
Exceptional hitters (.345-.355) are so good that they don’t really need to get lucky to win a title. Both their “true” ability and their BA will likely be greater than the entire field.

I have some cool graphs that show this; email me if you’d like one.


#40    dq      (see all posts) 2006/09/17 (Sun) @ 14:28

After I emailed, I realized I did not think about it being cumulative, rather the chance for each hitter.

I think our conclusions are about the same. The only thing I’m not sure I understand is “a .320 hitter has a 0 chance of having more skill than the BA of the next best hitter.”

Also, part of the reason that Cash is above the league average compared to today is the DH. You should compare him against the NL .262/.263 and call him a .290 hitter today.

Cash only finished in the top 10 in batting one other time, 7th.


#41    tangotiger      (see all posts) 2006/09/18 (Mon) @ 08:39

dq reported:

So instead of 10 times in 10,000, I get 9 out of 118.

as the number of times an average hitter has won the batting title.  It looks like he used anyone +.010 above league average, or worse, which corresponds to about +0.5 SD or worse.

The number of players at +0.5 SD and better, in a normal distribution, is 30%.  So, we have this group of players winning 109 of 118 batting titles, or 92%.

I’d have to overlay a normal distribution over Joe’s data to see if this corresponds.  So, maybe Joe can tell us instead.

The key point of course is that the fact that 8% of batting titles were won by an average hitter doesn’t mean that any average hitter has an 8% chance of winning the title.  One in the population of 70% of the hitters has an 8% chance of winning the batting title.  Since 40% of average hitters are in that 70% group (with the other 30% being below average, i.e., -0.5 SD and worse, and therefore unable to win a title), this means that each average hitter has a 0.5% chance of winning the batting title.  In 10,000 tries, that means 50 batting titles.

Joe’s sim looks like the answer would be around 10, so it does look low.

However, that original figure reported by dq is not necessarily a list of average hitters.  Willie McGee and Pendleton both won their batting title preceded by a horrible year.  McGee’s career average of .295 has likely less uncertainty around what McGee’s true talent level was the year he won the batting title, than his .236/.312 did of the year before/after.

Similar for Lansford.

Ashburn a league average hitter?  He won two batting titles, one of which he wouldn’t have qualified as a league average hitter.  The other one, at the age of 31, he had his highest BA ever, but surrounded by two ordinary years.  But his career BA is .308!

Groat finished in the top 5 in BA 3 times.  I could go on.  In short, I think the method of looking at just year before/after is flawed.  At the least, I would look at a hitter’s age 23-32 years as being representative of a hitter’s talent level.

From that respect, and with only 118 titles to begin with (i.e., if you end up with 4 or 7, both carry high uncertainty level), I’d say Joe’s model fits.  The important point is that once you model something fairly reasonably, the results are the results.

The only sticking point is if there’s something the model is not considering, which would yield incorrect results.


#42    tangotiger      (see all posts) 2006/09/18 (Mon) @ 08:48

Yikes.  A bit of a goof.  If we assume 100 qualifying hitters a year, then say 40 of them are above average, and 60 are average or worse.  (In a normal distribution, that 40 would be 30, but we are only looking at guys who qualified for the title, so a lot of the bad guys aren’t in here.... right-skewed distribution).

Anyway, 8% are won by 60 hitters, and 92% are won by 40 hitters.  A random qualifying hitter will win 1.0% of the time.  If he’s in the gang of 60, that means 0.1% of the time, and in the gang of 40, it’s 2.3% of the time.

0.1% of the time, times 10,000 times, is 10 times.  And, that EXACTLY matches Joe’s results.
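
The per-hitter arithmetic can be spelled out as a quick sketch. It assumes the 100-qualifier, 60/40 split from the comment and that every hitter within a group is equally likely to win; the names are illustrative only.

```python
def per_hitter_chance(share_of_titles, group_size):
    """Chance that one particular hitter in a group wins, if the group's
    share of titles is split evenly among its members."""
    return share_of_titles / group_size


average_or_worse = per_hitter_chance(0.08, 60)   # the "gang of 60"
above_average = per_hitter_chance(0.92, 40)      # the "gang of 40"
any_qualifier = per_hitter_chance(1.00, 100)     # a random qualifying hitter

if __name__ == "__main__":
    print(f"gang of 60: {average_or_worse:.2%}")
    print(f"gang of 40: {above_average:.2%}")
    print(f"titles per 10,000 seasons for one of the 60: {average_or_worse * 10_000:.1f}")
```

Unrounded, the gang-of-60 figure is about 0.13%, or roughly 13 titles per 10,000 seasons; rounding it to 0.1% is what gives 10. Either way it is the same order of magnitude as the simulation's counts for hitters near league average.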

