THE BOOK--Playing The Percentages In Baseball

Sunday, March 20, 2011

Why MLE’s are a mess…

By , 02:38 AM

I’ll warn you in advance (what other kind of warning is there?) - this is a long post and one that is hard to follow…

If I look at park and league adjusted AAA stats and compare them to MLB stats for the same players, weighted by the lesser of the two PA (e.g., if a player had 300 PA in AAA in a certain year and 100 PA in MLB in a certain year, I use the 100 PA to weight each of those stats, AAA and majors), I get this:


For players who had AAA and MLB league service in the same year, min of 50 PA in each year, for all players from 2006-2010 (I call these “same year” players):

AAA OPS: .846
AAA OPS normalized to all AAA players (nOPS): 1.11
MLB OPS (of those same players): .691
MLB OPS normalized to all MLB players (nOPS): .92
Ratio of MLB nOPS to AAA nOPS: .83

There were 560 of these “same year” players.  There were a total of 62,604 “PA” where “PA” is the combined minimum of the two PA (AAA and MLB) for each player.

The average number of PA in AAA per player was 228 and in MLB it was 168.

The average age of these players is 23.6.

There is nothing remarkable in these numbers.  We are basically saying that when a player goes from AAA to MLB (it could be the other way around – I did not distinguish time frames) in the same year, he loses 17% of his OPS.
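To make the weighting concrete, here is a minimal Python sketch of the calculation described above. The two player records and the league-average OPS values are made up for illustration (they are not the 2006-2010 data), and `matched_nops_ratio` is a hypothetical helper name, not something from the original study.

```python
def matched_nops_ratio(players, aaa_league_ops, mlb_league_ops):
    """Ratio of MLB nOPS to AAA nOPS, weighting each player by the lesser of his two PA."""
    weights = [min(p["aaa_pa"], p["mlb_pa"]) for p in players]
    total = sum(weights)
    # The same weight is applied to a player's AAA line and his MLB line.
    aaa_ops = sum(w * p["aaa_ops"] for w, p in zip(weights, players)) / total
    mlb_ops = sum(w * p["mlb_ops"] for w, p in zip(weights, players)) / total
    # Normalize each level to its own league average before taking the ratio.
    aaa_nops = aaa_ops / aaa_league_ops
    mlb_nops = mlb_ops / mlb_league_ops
    return mlb_nops / aaa_nops

# Hypothetical players, not real data.
players = [
    {"aaa_pa": 300, "aaa_ops": 0.850, "mlb_pa": 100, "mlb_ops": 0.700},  # weight 100
    {"aaa_pa": 150, "aaa_ops": 0.800, "mlb_pa": 200, "mlb_ops": 0.720},  # weight 150
]
ratio = matched_nops_ratio(players, aaa_league_ops=0.760, mlb_league_ops=0.750)
print(f"MLB/AAA nOPS ratio: {ratio:.2f}, discount: {1 - ratio:.0%}")
```

One minus the ratio is the "discount" used throughout the rest of the post.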

Now, what if we do the same thing but for players that have AAA service time in year 1 and MLB service time in year 2 and compare their stats from AAA to MLB?  These players could also have had some MLB time in year 1 and/or AAA time in year 2.  The only requirement was some AAA time in year 1 (at least 50 PA) and some MLB time in year 2 (also, at least 50 PA).

For players who had AAA and MLB league service in consecutive years (first AAA, then MLB), min of 50 PA in each, for all players from 2006-2010 (We call these “next year” players):

AAA OPS: .824
AAA OPS normalized to all AAA players (nOPS): 1.08
MLB OPS (of those same players): .715
MLB OPS normalized to all MLB players (nOPS): .96
Ratio of MLB nOPS to AAA nOPS: .88

There were 600 of these “next year” players.  There were a total of 99,174 “PA”.

The average number of PA in AAA per player was 284 (in year 1) and in MLB it was 242 (in year 2).

The average age of these players is 25.8 in year 1.  So these players are 2 years older than the previous group of players above.

These players only lose 12% of the OPS value, whereas the ones who played at both levels in the same year lost 17%.  Why is that?

You might think, “That’s easy.  The second group is one year older in MLB than in AAA and thus they not only lose value by virtue of going from one level to the next, but they gain value because they are older.”

But…

For one thing, the second group is 25.8 in AAA. You would not expect them to gain much, if anything, from age 25.8 to 26.8 (we think that players peak from around age 26 to 28).

Anyway, what if we restrict both groups of players, the ones who played at both levels in the same year and the ones who played at AAA in one year and MLB in the next, to ages 26 or older?  We should not expect to see any increase in OPS in that age range due to aging.  If anything, the second group should lose a little value from year 1 to year 2 from aging.

For players who had AAA and MLB league service in the same year, min of 50 PA in each, for all players from 2006-2010, age 26 and older:

AAA OPS: .845
AAA OPS normalized to all AAA players (nOPS): 1.11
MLB OPS (of those same players): .684
MLB OPS normalized to all MLB players (nOPS): .91
Ratio of MLB nOPS to AAA nOPS: .82

So, not much difference for these players.  They were about 60% of the total pool of players (and 60% of the PA) that includes all ages, by the way.  Their average age is 28.3.

What about the “next year” players?

Only about half of these players were age 26 or older (average age 27.9 in AAA).  Here are their translations:

For players who had AAA and MLB league service in consecutive years (first AAA, then MLB), min of 50 PA in each, for all players from 2006-2010, age 26 or older:

AAA OPS: .827
AAA OPS normalized to all AAA players (nOPS): 1.08
MLB OPS (of those same players): .718
MLB OPS normalized to all MLB players (nOPS): .96
Ratio of MLB nOPS to AAA nOPS: .89

So, older players (28.3) who play in AAA and MLB in the same year lose 18% of their OPS, while older players (27.9 in AAA) who play in MLB in the next year only lose 11% of their value.  (The younger ones actually lose 13% of their value - remember all players combined lose 12%.) That seems backwards.  Younger ones should lose less value than older ones because of aging (younger ones should get better, older ones should not).

So what the heck is going on?  And what discount (12% - “next year”, or 17% - “same year”) should we use for AAA to MLB translations?

What if for the “next year” players, we only look at players who did not play in MLB in year 1.  The previous numbers for the “next year” translations represent players who played in AAA in year 1 and in MLB in year 2.  As I said two paragraphs ago, these players could have also played in MLB in year 1, although I am not sure what difference it would make in terms of the translations. In any case, we’ll only look at players who did not play in MLB in year 1 – only in year 2 (they still could have played in MLB in any year prior to year 1).

For players who had AAA and MLB league service in consecutive years (first AAA, then MLB), min of 50 PA in each, for all players from 2006-2010, no MLB performance in first year:

There are only 233 players (all ages), average age 25.6.  Originally, there were 600 players, so over 60% of them had MLB service time in year 1 also.

AAA OPS: .802
AAA OPS normalized to all AAA players (nOPS): 1.05
MLB OPS (of those same players): .725
MLB OPS normalized to all MLB players (nOPS): .97
Ratio of MLB nOPS to AAA nOPS: .92

Now these players are only losing 8% when going from AAA in one year to MLB the next year, when they didn’t play in MLB in the first year.  For those that did play in MLB in year 1, they lost 14% (from AAA to MLB) in year 2 (.837/1.10 in AAA and .709/.95 in MLB).  Wow, what a difference!

What these numbers are saying is that:

• If you play in AAA and MLB in the same year, you lose around 17% of your OPS value (remember everything is normalized to league totals – the raw numbers mean nothing).
• If you play in AAA one year and also MLB in that same year and then MLB again the next year, you lose 14% from AAA in year 1 to MLB in year 2.
• If you play AAA in year 1, do not play in MLB in that same year, and then play in MLB in year 2, you lose only 8% from AAA to MLB (one year to the next).

What if we break these last two groups – the ones who play in AAA and MLB in year 1 and then in MLB in year 2 and the ones who play in AAA in year 1 and MLB in year 2 but not year 1 – into young and old players?

The results are almost exactly the same (older players who didn’t play in MLB in year 1 lost 7% rather than 8% for all ages).  Age doesn’t matter.  If you played in MLB in year 1, you lost 14% (AAA to MLB) the next year, and if you did not play in MLB in year 1, you lost 8%, regardless of age.

What is happening?  I think there are two things going on.  As I showed, it has nothing to do with age.  It has to do with selective sampling and regression toward the mean.  The AAA players are not randomly selected and thus their stats do not reflect their true talent.  When we look at those players’ MLB stats, they are relatively unbiased and thus reflect their true talent.

So basically an MLE, at least the way it is computed herein (simply the ratio of MLB stats – e.g. OPS - to AAA stats), is not really a “translation” per se, but a translation AND a regression toward the mean. This is a very important point, and has some important implications.

The reason there is a regression toward the mean when we go from AAA to MLB stats is because the players who get to play in MLB got lucky on the average in AAA.  That is one of the reasons why they are promoted.  You have to do/have two things in order to get promoted from the minors to the majors – one, good talent or pedigree (such as being a high draft pick), independent of performance, as determined by scouts and other team personnel, and two, good stats in the minors.  As most of you know, any player or group of players with good (above the average of the population you come from, where that population is independent of your stats) stats got lucky, on the average.

With that in mind, let’s look at some of the numbers above.

Let’s start by accepting the proposition that no matter when you play AAA and MLB, your discount is always the same, and see if we can make sense of these numbers even though the discounts we found are all over the board.
Remember that “same year” players, young and old, lose around 17% of their OPS from AAA to MLB, while the “next year” group only loses 12% or so, again, regardless of age.  Maybe they are both actually losing the same amount in true talent, but the first group was luckier than the second such that it appeared that they lost more going from AAA to MLB.  In other words, more of the 17% was regression than the 12%.  Maybe they both lost 6% in true talent, but the first group lost 11% in regression and the second group only lost 6% in regression (or some other combination of numbers).

Let’s see.  The “same year” players had an OPS of .846 (nOPS of 1.11) in AAA and the “next year” players were only .824 in AAA (nOPS of 1.08).  So yes, we would expect the first group to regress more.  If we really want to separate the regression from the loss in OPS due to the change in level, we would need to establish the true talent OPS in AAA for both groups. I don’t think it would be correct to simply regress their observed OPS (.846 and .824) toward AAA average, since these are players who are deemed to be MLB ready by the powers that be for reasons other than their stats (presumably and partially at least), so the mean OPS of these kinds of players is probably higher than that of the average AAA player.

What if we look at their out of sample AAA stats – ones that are not used to determine whether they get promoted or not? That should give us a good idea as to their true talent.  It is not a perfect method, but it is pretty good.

First, we’ll look at the “same year” players’ AAA stats in year 2, if they have any.

For the “same year” players who had at least 50 PA in AAA in year 2, their OPS was .791 (in AAA), an nOPS of 1.05.  In year 1, it was .836/1.10.  (Remember that all “same year” players had a AAA OPS of .846, but some of them never played in AAA in year 2.)

In MLB in year 1, they batted .668/.88.  This implies a discount of 16%, a little less than the original 17%. 

Unfortunately, here we don’t have an unbiased estimate of their MLB talent, because we have a sample of players who played MLB in year 1 and then were sent back down to AAA in year 2, such that they likely had unlucky numbers in MLB. So that .88 nOPS in the majors is probably more like .91, which would give us a true discount of 13%.  Granted that .91 is a guess, but it should be clear that the “true” discount for these “same year” players is less than 16%.

For the “next year” players who had a discount of only 12%, not only did they only bat .824 in year 1 (as opposed to .846 for the “same year” players) which means that the regression toward the mean will be less than that of the “same year” players, but as it turns out, they also had some AAA time in year 2 (likely before they were called up), in which they batted .845!

In fact, these players actually had a combined AAA OPS in year 1 and 2 of .835. Now the discount is 13% rather than 12%, once we include their year 2 AAA stats.  But again, we likely had players who were lucky in AAA in both year 1 (.824) and year 2 (.845) such that .835 is not their true talent OPS in AAA.

Let’s look only at the stats of players who had AAA time in year 1 and 2 and MLB time in year 2. Basically this will eliminate those players who played in AAA in year 1, were called up to the majors at the beginning of year 2 and stayed in the majors for the whole year (or were injured at some point in year 2).  Those might be better players.

As it turns out, the players who had AAA time in year 1 and year 2 (and MLB time in year 2) only batted .811 in year 1.  In year 2, they batted .845, as I mentioned above.  Combined (year 1 and year 2), they batted .831/1.10 in AAA.  In MLB, they batted .703 with a nOPS of .94 (before it was .715/.96).  So these players had a discount of 13%.

Let’s see if we can get some idea as to the true talent level of players who get called up to MLB from AAA.  We need to find an unbiased sample of performance in AAA for players who were called up at some time.  This sample must be after they were called up and not before.

We’ll use the year 2 AAA stats of players who were called up in year 1.  That number was an OPS of .791.  In year 1 these same players had an OPS of .836.  What must the population mean be for .836 players to have a true talent of .791?  Let’s see.  The formula for regressing AAA OPS is % regression=589 / (PA+589).  (BTW, that is almost exactly the same as the regression formula for MLB OPS!) These players had an average PA in year 1 in AAA of 243.  Plugging that 243 into the above regression formula, we get a regression of 71%.  So .836 - (.836-mean) * .71 = .791.  Solving for the mean, we get .773.
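The arithmetic in that paragraph can be sketched in a few lines of Python. The 589 constant and the .836, .791 and 243 inputs come from the text; the function names are hypothetical.

```python
def regression_fraction(pa, k=589):
    """Share of the gap to the population mean that gets removed (k=589 for AAA OPS, per the text)."""
    return k / (pa + k)

def implied_population_mean(observed, true_talent, pa, k=589):
    """Solve observed - (observed - mean) * r = true_talent for the mean."""
    r = regression_fraction(pa, k)
    return observed - (observed - true_talent) / r

r = regression_fraction(243)
mean = implied_population_mean(0.836, 0.791, 243)
print(f"regression: {r:.0%}, implied mean: {mean:.3f}")
```

Carrying the unrounded regression fraction through lands within a point of the .773 above; the small gap comes from rounding the regression to 71% before solving.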

Now let’s go back and regress all the above AAA stats toward this mean and see what the “real” discount is when going to MLB:

For the “same year” players, we had a year 1 OPS in AAA of .846 in an average of 228 PA.  So we regress 589 / (589+228), or 72% toward the mean of .773.  That is .793, which is a nOPS of 1.04.  The nOPS of these players in MLB was .92.  .92 / 1.04 is .88, so our “real” discount is 12% rather than original 17%.

For the “next year” players, those that had AAA stats in year 1 and year 2, batted a combined .831 in 530 PA in AAA.  We regress that .831 53% toward .773 for an estimated true talent of .800 or 1.06 normalized.  They hit .94 nOPS in the majors.  That is a ratio of .89 or an 11% discount.  We would expect there to be a slightly lesser discount for these guys (as compared to the “same year” guys) due to them being older in year 2 in MLB.

Finally, what about the “next year” players who had no AAA playing time in year 2?  They batted .841 in year 1 in AAA in 243 PA.  That requires a regression of 71%.  That is an estimated true talent of .793, or 1.04.  They batted .98 in MLB, so the ratio is .94 for a discount of only 6%.  My guess is that these guys are a lot better than .793 players.  They appear to have been promoted to MLB at the beginning of year 2 and never demoted in that year. 

Let’s look at how they did in the year before year 1.  In both years combined, year 0 and year 1, they batted .832 in 598 PA.  Regressing .832 50% toward .773 gives us an estimated true talent of .803.  That is a nOPS of 1.06.  However, with different weights, these guys batted 1.00 in MLB, for a ratio of .94 again, or a 6% discount.
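Here is a hedged sketch of the regress-then-compare step for the first two groups above. The OPS, nOPS and PA inputs are the ones quoted in the text. The one assumption I am adding is that nOPS scales in proportion to OPS when the AAA line is regressed (the post only reports the rounded results), and the function names are hypothetical.

```python
K_AAA = 589        # PA at which AAA OPS is regressed 50%, per the text
POP_MEAN = 0.773   # mean OPS of called-up players, as solved for above

def regress_ops(observed_ops, pa, mean=POP_MEAN, k=K_AAA):
    """Pull observed OPS toward the population mean by k / (pa + k)."""
    r = k / (pa + k)
    return observed_ops - (observed_ops - mean) * r

def regressed_discount(aaa_ops, aaa_nops, pa, mlb_nops):
    """Return (estimated true AAA OPS, discount from regressed AAA nOPS to MLB nOPS)."""
    true_ops = regress_ops(aaa_ops, pa)
    # Assumption: normalized OPS shrinks by the same proportion as raw OPS.
    true_nops = aaa_nops * true_ops / aaa_ops
    return true_ops, 1 - mlb_nops / true_nops

same_year = regressed_discount(0.846, 1.11, 228, 0.92)
next_year = regressed_discount(0.831, 1.10, 530, 0.94)
for label, (ops, disc) in [("same year", same_year), ("next year", next_year)]:
    print(f"{label}: true-talent AAA OPS {ops:.3f}, discount {disc:.0%}")
```

Under that assumption the sketch reproduces the .793 and .800 true-talent estimates and the 12% and 11% discounts.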

As you can see, coming up with a pure MLE translation is not an easy exercise.  In some cases, we get a discount of 12% or 11%, and in others we get 6%, even when regressing the AAA stats.

One of the questions we need to ask when presenting MLE’s is, “What does the MLE mean and what are we going to do with it?”

If we truly want it to represent what that player would have done had he and everyone else (the defense for example) played exactly the same, and we want to include the good or bad luck, but “translated” to a different environment (MLB), then you need to use some pure coefficients, like the 12% or 6% discount that I estimated above.  Unfortunately just looking at minor and major stats, either from year 1 to year 2 or both in the same year, or some combination (and perhaps other years) won’t get you that number because of the regression and selective sampling issues, as you saw in all the gymnastics above.

If you want to know what a player would do in MLB given a certain stat or set of stats in another league, and you are going to include regression toward the mean – IOW, you want to know his MLB true talent, given a certain set of stats in another league – then you can simply use the actual minor and major stats. 

The problem with that – and it is a large problem – is that translation plus regression is not a linear function and thus you cannot simply use one coefficient for all players.  For example, let’s say that we use the “same year” translation of 18% that we get when comparing same year AAA (normalized) OPS to MLB OPS.  Well, that 18% includes a translation plus a regression because the pool of players was well-above average in AAA and thus their performance was better than their true talent (they were lucky as a group).

If you try and apply that 18% discount to a player who batted the same as the whole group in AAA, you are fine.  The result is exactly what you would expect if he gets called up to MLB (he will do worse because of regression AND because of the better league).

But if you had a player who was a league-average hitting player in AAA (maybe he is a SS or C with a great glove), he is not going to hit 18% worse in MLB.  His AAA OPS will not regress or it might even regress upwards.  It will also shrink of course, due to the better quality of the league.  Similarly, if you had a super star in AAA, say with a nOPS of 1.30, he is going to regress a lot toward some mean, perhaps the .773, and then lose some more due to the better quality of the league, for a total discount of more than 17% or 18%.

So if you want your translations to include regression (IOW, what WILL a player who hits X in AAA hit in MLB, not what would that X have been in MLB, including the luck), you had better have a non-linear equation for your translations!
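A rough sketch of why a single coefficient cannot work once the translation also includes regression. Everything numeric here except the 589 constant is an assumption chosen for illustration: a population mean of 1.01 nOPS, a pure level discount of 12%, and 250 AAA PA for every player. The function name is hypothetical.

```python
def effective_discount(observed_nops, pa, mean_nops=1.01, level_factor=0.88, k=589):
    """Total AAA-to-MLB discount when regression and level change are applied together."""
    r = k / (pa + k)
    true_nops = observed_nops - (observed_nops - mean_nops) * r  # regression toward the mean
    mlb_nops = true_nops * level_factor                          # pure level translation
    return 1 - mlb_nops / observed_nops

# League-average hitter, a typical call-up, and a AAA superstar.
for nops in (1.00, 1.11, 1.30):
    print(f"observed nOPS {nops:.2f}: total discount {effective_discount(nops, 250):.1%}")
```

With these assumed inputs the total discount climbs as the observed nOPS climbs, even though the level factor is identical for all three players, which is the non-linearity described above.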

At this point in time, I have no solution to these problems.  Typically we use MLE’s to help us with major league projections for those players who don’t have much or any MLB service time.  We also use them to wonder about who should or might be promoted in the near future.  As you can see, it is not real clear, though, how to come up with these MLE’s, which ones to use, and what they mean.

If we just had minor league numbers and we wanted to project major league ones, then we would want to use that non-linear equation, which we could come up with by running a multiple regression of minor stats on major stats including minor league playing time.  But if we also have some major league numbers and we want to combine the minor league ones with the major league ones and then plug those combined numbers into our projection algorithm which does regression and aging, we probably want some kind of linear translation for AAA to MLB stats that includes the luck in the minor leagues.  As you saw, that is not an easy number (the coefficient) or linear formula to come up with, and it might depend on when the player was called up or might be called up as compared to the bulk of his minor league playing time.

So when you see an MLE, don’t take it nearly at face value and always inquire what it is trying to represent.  A “major league translation” is not enough of a description for you to know what exactly it is and what it can be used for.  I believe we have a long way to go before we come close to mastering the MLE.

#1    KJOK      2011/03/20 (Sun) @ 03:33

Tom:

1. I think to find the ‘answer’ you really need to also look at PITCHERS moving from minors to MLB in the same way as you do for hitters.  There are park effects and defensive changes between levels to consider, but I don’t think you can determine the true differences between levels without also including pitchers.  When I did this for Japanese players the hitters looked like they were moving to a much tougher league (something like an 18% hit) but then pitchers performed almost as well in MLB as you would have predicted if they had stayed in Japan, which led me to some very interesting MLE conclusions.

2.  It’s also helpful if you look not only at guys called up but at guys sent down - guys that played in MLB only in one year, then played in the minors the next year.  I think the last time I looked at this group, looking at only unadjusted performance as you’re doing, they had a HUGE spike in performance the second (minor league) year vs. their first year.

3.  Not sure I would recommend combining MLE and regressions together.  They are two separate animals, and combining them only makes things more difficult and complex to figure out.  You first have to figure out the correct league difficulty adjustment, and that should give you a ‘true’ MLE (VALUE of the minors performance translated to the majors environment), THEN I think you apply the same types of regressions that you would normally apply to any MLB player from one year to the next to get your ‘projected’ performance.


#2    KJOK      2011/03/20 (Sun) @ 03:36

Ugh, and out of habit I addressed “Tom” instead of “MGL”, even though I could have told just by the writing style who I was replying to...sorry ;>)


#3    flex0us      2011/03/20 (Sun) @ 10:14

Pardon the interruption KJOK, but I wonder to what extent applying league difficulty would completely address the translation of stats of minor leaguers to MLB. What I mean is that by applying league difficulty you may get a good idea of a hitter’s performance in other leagues when “level of talent” stays the same, but if that level of talent changes (radically, as it does in MLB), you won’t be getting a complete picture of a hitter’s true projection; you will just be getting a hitter’s performance in a certain environment.


#4    Harry Pavlidis      2011/03/20 (Sun) @ 10:37

Great stuff, MGL. This is something I’ve grappled with, mostly to my own aggravation: MLEs for batted ball types. Teasing out stringers and parks and regression and leagues and age and experience and same season, prior season, next season effects and selection bias and lions and tigers and I think I gave up.


#5    Tim Armstrong      2011/03/20 (Sun) @ 10:55

Very enlightening work. Thanks, mgl. From now on, I will view individual MLE’s as a point within a player’s range of possible actual MLE’s, wherein the player’s TTL may also lie.


#6    MGL      2011/03/20 (Sun) @ 14:32

KJOK, looking at batting performance normalized to the average batter in each league tells us the relative quality of batters in each league.  Same for pitchers.  When doing each one, we don’t care about the quality of the other, and it won’t change the numbers at all.  So I am not sure why you bring up the pitching.  Also, it may be true that the difference in the quality of the batting in, say AAA, is different from the difference in quality of the pitching, but I would be surprised if that were the case and in fact my numbers don’t bear that out, I don’t think.

Looking at batters who were in MLB in one year and then in AAA the next is a good idea. Of course, the MLB numbers will be biased toward bad luck and would have to be regressed before the ratios can be looked at.  But it is probably much easier to regress the MLB stats than the AAA ones, because the mean to regress toward is probably clearer (the mean OPS of all rookie players).  I will do that and report back.  Thanks for the suggestion.


#7    KJOK      2011/03/20 (Sun) @ 14:43

"KJOK, looking at batting performance normalized to the average batter in each league tells us the relative quality of batters in each league.  Same for pitchers.  When doing each one, we don’t care about the quality of the other, and it won’t change the numbers at all.”

What I didn’t say very well was, if MLB is 15% ‘harder’ than AAA or Japan, then ‘in theory’ both batters and pitchers should decline 15% in performance when switching levels.  If batters decline 15% and pitchers only decline 5%, then it at least suggests that the league difficulty is not 15% but something less, and that there are other factors in play that may not impact batters and pitchers switching levels equally (defense and parks being two of those factors.)


#8    David Gassko      2011/03/20 (Sun) @ 16:37

I thought KJOK was suggesting something different the first time I read his post, and I think it would be an interesting experiment: What happens if you build MLEs based on pitcher hitting? The sample sizes would be much smaller, but they might say something interesting, and you don’t have to worry about regression to the mean since pitchers aren’t selected based on their hitting.


#9    MGL      2011/03/20 (Sun) @ 21:29

Using pitcher batting is a great idea. I always forget about that.  It would definitely have some sample size issues though, I think.

KJOK, hitting and pitching quality are two completely separate things. There is no such thing as a league being better or worse without looking at pitchers and hitters separately.  How would you tell otherwise?  The raw stats are meaningless.  An average hitter can hit .750 in AAA and then hit .760 in MLB.  That doesn’t tell us anything about the relative pitching or hitting between the two leagues.  MLB could have smaller parks, larger strike zones, colder weather, different baseballs, etc.

As I explained in my post on league changes from year to year and the difference between NL and AL, the only way to estimate differences between leagues is to compare normalized batting stats, for batting quality, and normalized pitching stats for pitching quality.  There is no cross-over.  If a league average batter in one league is 10% less than a league average batter in another league, then the other league is 10% better in batting.  How they compare pitching-wise will not change those numbers.  If the first league has Little League pitchers, that league average batter will still be a league average batter. If the second league has Little League pitchers or they are all Stephen Strasburg, that same batter will still be 10% less than league average. Same thing for pitching.  It seems to be a common misconception that we somehow can tell the relative quality of leagues by looking at non-normalized stats.

So there is no such thing as a league being 15% better and the pitchers are 15% better but the batters are 5% or 10% better.  As I said, the league quality, relative to another league, is the sum of the difference in batting and pitching quality.  Totally separate…


#10    joe arthur      2011/03/21 (Mon) @ 00:33

"So there is no such thing as a league being 15% better and the pitchers are 15% better but the batters are 5% or 10% better.”

KJOK might agree with you. The original post discussed MLEs solely in terms of hitters. Pitching talent appears to fall in a more compressed range than hitting talent (at least performance is in a more compressed range), but on the other hand there is more relative demand for pitcher talent [pitchers are 11% of the team on the field but 44% of the roster]. So it would not be surprising if AAA was 15% easier for hitters and some different rate for pitchers. If pitchers have a similar translation rate at AAA it is by coincidence, and you won’t see that at other levels of play.  You have a translation rate for the league for hitters, and a translation rate for the league for pitchers, and it would be deceptive to think about having a translation rate for the league, period.


#11    KJOK      2011/03/21 (Mon) @ 01:56

"KJOK, hitting and pitching quality are two completely separate things. There is no such thing as a league being better or worse without looking at pitchers and hitters separately.”

Correct of course, but I don’t think you can look at the two in a vacuum either.  It would be very surprising if AAA hitters were less talented than MLB hitters by a substantial amount, but AAA pitchers were equal to MLB pitchers, right?

Now certainly, the Pacific Coast League could be loaded with hitters, and the International League could have a talent glut of pitchers, but if we found that an average AAA hitter APPEARED to be 15% worse overall than an average MLB hitter, I think we would expect as a default hypothesis AAA pitchers would be worse than MLB somewhere in the 10-20% range (after adjusting for context)?

But as you and Joe both point out, in reality that’s apparently not the case.  Joe noted that pitching talent falls into a more compressed range than hitting talent (due to position players having position and defense considerations which makes their hitting talent dispersal wider.)

What I remember finding interesting is that whatever league I looked at, just looking at the “RAW” numbers, hitters ALWAYS appeared to lose more value in raw numbers than pitchers moving up a level, whether it was AAA, or Japanese Leagues. 

And this led me to two other considerations:

1.  Parks/context.  Moving up to the MLB level always results in a new set of parks that are larger/more pitcher friendly than the ‘old’ league.  There may also be differences in league scoring, in the baseballs themselves, etc. that I lump in here as part of context.

2.  Defense.  Ball in play rates always go down as the level of play increases.

Both of these factors favor pitchers who move up vs. hitters that move up levels.  At least for Japanese players, by adjusting for ball in play rates then making some assumptions about parks/offense I could almost reconcile the raw batting and pitching data to have close to the same ‘level of Japan to MLB difficulty’ conversion rating.

As you said, MLE’s are a mess, and maybe I’m overrating the value of looking at the pitching side, or even coming up with erroneous conclusions by doing so.


#12    MGL      2011/03/21 (Mon) @ 03:16

"So it would not be surprising if AAA was 15% easier for hitters and some different rate for pitchers. If pitchers have a similar translation rate at AAA it is by coincidence, and you won’t see that at other levels of play.  You have a translation rate for the league for hitters, and a translation rate for the league for pitchers, and it would be deceptive to think about having a translation rate for the league, period.”

Yes, that is pretty much what I said.  They are 2 totally separate things. There is no such thing as “league rate” which includes pitchers and hitters unless you want to simply combine or average hitters and pitchers. It might be true that the difference in quality for pitchers is less than that for hitters.  As you said, if they were both exactly the same it would be somewhat of a coincidence, although I would expect they would be similar.  But you are right that the spread of pitching talent is less than that of hitting, so we would probably expect not as much of a drop-off going from MLB to AAA.

Let’s see:

For “next year” pitchers, there is a 10% discount for OPS against.  Remember for batters it was 12%.  Pitchers typically need to be regressed more (for AAA batters, it is 589 PA for 50% regression, for AAA pitchers, it is 2400 TBF, and in MLB, it is 1200 TBF) so the actual drop-off is probably a lot less than that.

The normalized OPS against for AAA pitchers who have MLB service in year 2 is .93 (7% better than league average) in 246 TBF.  That would get regressed to maybe .96, which would put the true discount at 7% (in MLB they had a normalized OPS against of 1.03, so .96/1.03 = .93).

If we include AAA stats in year 2, we get 406 TBF and .92 in AAA and 1.05 in MLB for a discount of 12% this time. Regressed that is a discount of around 9%.

For “same year” pitchers, it was a discount of 15%.  .91 in AAA and 1.08 in MLB (.91/1.08=.85).  Regressing the .91 to .96, we get .96/1.08 or an 11% discount.

If we go backwards (MLB in year 1 and AAA in year 2, as in a demotion), like KJOK suggested with the batters, we get .97 in AAA and 1.08 in MLB, 217 TBF in AAA, for a discount of 10%.  If we regress the MLB 1.08 to 1.04, we get a 7% discount.

BTW, if we go backwards for batters, MLB in year 1 and AAA in year 2, we get a 15% discount, 1.02 in AAA and .87 in MLB.  The MLB stats are only 111 PA per player though, so they would be heavily regressed to maybe .93 or .94 (the mean I am regressing toward is .95), which would give us an effective discount of 8%.

So it looks to me like the true translation for batters is around 7 or 6% and for pitchers, 6 or 7%, but again, it is hard to tell.
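The discount arithmetic in this comment can be re-run with a short sketch. The inputs are the two-decimal normalized OPS figures quoted above, so a result can land a point away from the quoted percentage. The function names are hypothetical. The assumed convention is that for pitchers (OPS against, lower is better) the discount is 1 - AAA/MLB, and for batters it is 1 - MLB/AAA.

```python
def pitcher_discount(aaa_nops_against, mlb_nops_against):
    """Discount for pitchers: OPS against rises on moving up to MLB."""
    return 1 - aaa_nops_against / mlb_nops_against


def batter_discount(aaa_nops, mlb_nops):
    """Discount for batters: OPS falls on moving up to MLB."""
    return 1 - mlb_nops / aaa_nops


# (label, AAA nOPS, MLB nOPS) as quoted in the comment, rounded inputs.
pitcher_cases = [
    ("next year, AAA regressed to .96", 0.96, 1.03),
    ("next year incl. year-2 AAA", 0.92, 1.05),
    ("same year, raw", 0.91, 1.08),
    ("same year, AAA regressed to .96", 0.96, 1.08),
    ("demotion, raw", 0.97, 1.08),
    ("demotion, MLB regressed to 1.04", 0.97, 1.04),
]
batter_cases = [
    ("demotion, raw", 1.02, 0.87),
    ("demotion, MLB regressed to .94", 1.02, 0.94),
]

for label, aaa, mlb in pitcher_cases:
    print(f"pitchers, {label}: {pitcher_discount(aaa, mlb):.1%}")
for label, aaa, mlb in batter_cases:
    print(f"batters, {label}: {batter_discount(aaa, mlb):.1%}")
```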


#13    MGL      (see all posts) 2011/03/21 (Mon) @ 03:38

Correct of course, but I don’t think you can look at the two in a vacuum either.

I have no idea what “you cannot look at the two in a vacuum” means.  You can say that about anything in the world and I still wouldn’t know what it means without further explanation.

It would be very surprising if AAA hitters were less talented than MLB hitters by a substantial amount, but AAA pitchers were equal to MLB pitchers, right?

It is not possible.  If that constitutes surprising, then yes.

Now certainly, the Pacific Coast League could be loaded with hitters, and the International League could have a talent glut of pitchers, but if we found that an average AAA hitter APPEARED to be 15% worse overall than an average MLB hitter, I think we would expect as a default hypothesis that AAA pitchers would be worse than MLB somewhere in the 10-20% range (after adjusting for context)?

I suppose.  It should be about the same, depending, as Joe pointed out, on the density of pitching and hitting talent.

But as you and Joe both point out, in reality that’s apparently not the case.

He said it probably isn’t the case.  He has no evidence that I am aware of.  I ran the numbers above and it looks like it is close.  Maybe pitchers are a little closer.

Joe noted that pitching talent falls into a more compressed range than hitting talent (due to position players having position and defense considerations which makes their hitting talent dispersal wider.)

I am not sure that is the only reason, but fair enough.

What I remember finding interesting is that whatever league I looked at, just looking at the “RAW” numbers, hitters ALWAYS appeared to lose more value in raw numbers than pitchers moving up a level, whether it was AAA, or Japanese Leagues.

If pitchers lose less value when moving up, as we are hypothesizing (maybe), then they should lose more in raw numbers.  If the batters lose more in raw numbers, then pitchers lose more in value. 

Imagine that the pitching talent is the same in AAA and MLB but batting talent is much better.  And let’s say that the OPS in AAA is .700 and it is .750 in MLB.

When a batter comes over, he will face the same quality pitcher and thus still hit .700 (assuming that parks, etc. are the same).

Pitchers will come over and face much better batters and have an OPS against of something greater than .700.  So pitchers do worse and batters do the same.  Whoever does worse when coming over has the least difference.  So again, if you find that batters lose more value than pitchers when moving up in quality then they would have less difference than pitchers and not more, as we suspect should be the case.
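The thought experiment above can be worked through numerically. This is only a sketch under its stated premises: identical pitching talent in both leagues, identical parks, league OPS of .700 in AAA and .750 in MLB, and an exactly average AAA player moving up. The variable names are hypothetical.

```python
AAA_LEAGUE_OPS = 0.700
MLB_LEAGUE_OPS = 0.750

# Average AAA batter promoted: he faces pitching of the same quality,
# so his raw OPS stays at the AAA league level.
batter_raw_aaa = AAA_LEAGUE_OPS
batter_raw_mlb = AAA_LEAGUE_OPS

# Average AAA pitcher promoted: by premise he is as good as the average
# MLB pitcher, so against MLB batters he allows the MLB league OPS.
pitcher_raw_aaa = AAA_LEAGUE_OPS
pitcher_raw_mlb = MLB_LEAGUE_OPS

# Normalize each line to the league it was compiled in.
batter_norm_aaa = batter_raw_aaa / AAA_LEAGUE_OPS
batter_norm_mlb = batter_raw_mlb / MLB_LEAGUE_OPS
pitcher_norm_aaa = pitcher_raw_aaa / AAA_LEAGUE_OPS
pitcher_norm_mlb = pitcher_raw_mlb / MLB_LEAGUE_OPS

print(f"batter  raw: {batter_raw_aaa:.3f} -> {batter_raw_mlb:.3f}, "
      f"normalized: {batter_norm_aaa:.3f} -> {batter_norm_mlb:.3f}")
print(f"pitcher raw: {pitcher_raw_aaa:.3f} -> {pitcher_raw_mlb:.3f}, "
      f"normalized: {pitcher_norm_aaa:.3f} -> {pitcher_norm_mlb:.3f}")
```

In this toy case the pitcher's raw line gets worse while his normalized line does not move, and the batter's raw line does not move while his normalized line drops. That is the sense in which raw changes and changes in value point in opposite directions.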

And this led me to two other considerations:

1.  Parks/context.  Moving up to the MLB level always results in a new set of parks that are larger/more pitcher friendly than the ‘old’ league.  There may also be differences in league scoring, in the baseballs themselves, etc. that I lump in here as part of context.

Yes, of course.  But, there is no need to worry about these when doing translations if you simply use normalized stats.  Using non-normalized stats and trying to figure out parks, etc. does not help you in any way, shape, or form.

2.  Defense.  Ball in play rates always go down as the level of play increases.

Both of these factors favor pitchers who move up vs. hitters that move up levels.  At least for Japanese players, by adjusting for ball in play rates then making some assumptions about parks/offense I could almost reconcile the raw batting and pitching data to have close to the same ‘level of Japan to MLB difficulty’ conversion rating.

Yes, I suppose that could explain the contradiction in the way the raw stats go.

As you said, MLE’s are a mess, and maybe I’m overrating the value of looking at the pitching side, or even coming up with erroneous conclusions by doing so.

As I keep saying, no need to look at raw stats and context (parks, weather, baseballs, umpires, etc.).  They only make things worse.  Using normalized stats and doing pitching and hitting separately tells you everything you need to know.  It is the regression and selective sampling that makes everything a mess.

I hope that if nothing else, people start to realize that:

1) It is not at all easy to figure out MLE coefficients.

2) When you give an MLE, you better explain what it means and how you derived it.

3) When you use an MLE, you have to be careful what you are using it for and how you are going to use it.

4) There is no such thing as an “MLE” without further explanation.


#14          (see all posts) 2011/03/21 (Mon) @ 10:09

MGL - Besides looking at pitcher hitting from AAA to MLB, you may get a better set of data looking at the hitters that are being kept down in the minors to avoid super 2 status and called up around June 1 (Stanton, Posey, Santana).


#15    Tangotiger      (see all posts) 2011/03/21 (Mon) @ 10:32

I wrote this a long time ago.  Probably like 2002 or 2003:

http://www.tangotiger.net/hateMLEs.html


#16    joe arthur      (see all posts) 2011/03/21 (Mon) @ 11:13

Not much resolved since 2002-3, it seems.

One other comment. I worked on MLEs in the mid-90s, for 1993-95 seasons, for AA and AAA, hitters and pitchers. It was a naive attempt in several ways, but one possible selection effect I noticed, which I’ve never seen mentioned by anyone else, is “organizational.” In those days, Atlanta had a good major league team and little “traffic” between the minors and the majors, while Kansas City had a lot. In general, it looked like bad major league teams had more players getting some tryout in the majors, so the samples weren’t balanced - unequal representation of parks, so the sample might drift from league-neutral for park translation, and potentially unequal quality of players getting a chance on different teams. IF (big if) park and league normalizations can be done reliably, the problem is avoided. Mickey mentions doing these adjustments as part of the recipe for trying to compute MLEs; these are potentially important but certainly problematic steps in the MLE process as well.


#17    Ben F      (see all posts) 2011/03/21 (Mon) @ 13:10

You noticed/mentioned this, but then it seems like it fell away: In trying to discern how much of the AAA performance was skill and how much was luck, wouldn’t the amount of time (PA’s) spent in AAA (and MLB) play a factor?  Maybe a higher threshold of PA’s would make the AAA OPS and the MLB OPS more indicative of actual skill and less about potential luck?


#18    MGL      (see all posts) 2011/03/21 (Mon) @ 17:39

"MGL - Besides looking at pitcher hitting from AAA to MLB, you may get a better set of data looking at the hitters that are being kept down in the minors to avoid super 2 status and called up around June 1 (Stanton, Posey, Santana).”

Sure, but you still have selection issues.  Players who play poorly in the minors are less likely to be called up.  And you have severe sample size issues when you look at only a relatively small subset of players.

“Maybe a higher threshold of PA’s would make the AAA OPS and the MLB OPS more indicative of actual skill and less about potential luck?”

Sure.  It will just affect the regression though (less regression).  One of the issues is what mean to regress toward in AAA. That is not so easy to determine.  Plus, again, the more you limit your data, the more unreliable it becomes (sample size).

IOW, still a mess…


#19          (see all posts) 2011/03/21 (Mon) @ 23:35

This definitely wins the “making my head hurt” award for the day, maybe even the week.  Which given what I’ve been doing at my non-mom’s basement job, kinda says a lot right now.

Despite my mental anguish though, it was a great write-up and following discussion, and I enjoyed it.

Sometimes I need reminding that working on things that are hard can actually be fun.

