THE BOOK--Playing The Percentages In Baseball

Tuesday, August 05, 2008

Observed Performance Inferring True Talent (OPITT)

By Tangotiger, 11:50 AM

I talked about this at length in the Edgar thread, so let me reserve this thread for more generic and technical arguments and presentation.

Let’s say you have someone who has a .380(*) career wOBA in 10,400 PA (16 seasons of 650 PA).  How many standard deviations (SD) is he from the league mean of .340?  Answer: 8.0

(*) For those new around here, a .380 wOBA is the same thing as a .380 OBP, with a corresponding profile of SLG, something like .475 or so.
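As a rough sketch of the arithmetic behind that 8.0: it assumes one SD of sample wOBA is SQRT(wOBA*(1.1-wOBA)/PA), the form given later in the comments, evaluated at the .340 league mean. The function name and defaults are mine, for illustration only.

```python
import math

LEAGUE_WOBA = 0.340  # league mean used in the post


def sd_from_baseline(woba, pa, baseline=LEAGUE_WOBA, var_at=LEAGUE_WOBA):
    """Number of SDs an observed wOBA sits above a baseline over `pa` PA.

    Assumption: one SD of sample wOBA is sqrt(w * (1.1 - w) / PA),
    with w taken at the league mean (`var_at`).
    """
    sd_one = math.sqrt(var_at * (1.1 - var_at) / pa)
    return (woba - baseline) / sd_one


print(round(sd_from_baseline(0.380, 10400), 1))  # 8.0
```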

A guy with a .380 wOBA in 10,400 PA is roughly +36 wins above average (WAA) and 69 wins above replacement (WAR).  This is around the level at which someone enters the Hall of Fame discussion.

Now, suppose someone has a .420 wOBA.  How many seasons does he have to play in order for us to say that he is 8 standard deviations from the league mean of .340?  Answer: 4 seasons.  That gives him a WAR of 26 wins and a WAA of 18 wins.

And a wOBA of .460?  A little under 2 seasons.  And a wOBA of .500?  Just one season, with an 11 WAR and +9 WAA.  That is a Bonds-like or Pujols-like season at their best.
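The season counts above can be sketched by inverting the SD formula. This is a minimal version, assuming 650 PA per season and the per-PA SD taken at the .340 league mean; the helper name is hypothetical.

```python
import math

SD_UNIT = math.sqrt(0.340 * (1.1 - 0.340))  # per-PA SD, assumed at the .340 mean
PA_PER_SEASON = 650


def seasons_needed(woba, target_sd, baseline=0.340):
    """Seasons of 650 PA needed for `woba` to sit `target_sd` SDs above baseline."""
    pa = (target_sd * SD_UNIT / (woba - baseline)) ** 2
    return pa / PA_PER_SEASON


for w in (0.380, 0.420, 0.460, 0.500):
    print(f"{w:.3f} -> {seasons_needed(w, 8.0):.1f} seasons")
```

Because SD grows with SQRT(PA), doubling the gap over the baseline cuts the required playing time to a quarter.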

So, is that enough?  Is it enough to say that your performance is 8 standard deviations from the league mean, in order for your Observed performance to infer great talent?

I don’t know.

Now, let’s try asking: how far away are you from a .300 wOBA level, which is right around replacement level?  Here’s how that looks:


wOBA    seasons   SD.300   WAR      WAA
0.500   2.5       16.0     28.3     23.0
0.460   4.0       16.0     35.3     27.0
0.420   7.1       16.0     46.7     32.0
0.380   15.9      16.0     69.2     35.9
0.340   63.6      16.0     132.9    0.0

So, when comparing to the .300 wOBA level, here is how many seasons each Observed performance level requires to be 16 standard deviations from the .300 level.
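A sketch that approximately reproduces the table follows. Several constants are my assumptions rather than anything stated in the post: the per-PA SD is taken at the .340 league mean (this is what matches the seasons column), a win is taken as 10 runs, wOBA is converted to runs by dividing by 1.15 (the factor that appears later in the comments), and replacement level is credited at about 2.09 wins per 650 PA, a figure back-solved from the WAR column.

```python
import math

LEAGUE = 0.340
SD_UNIT = math.sqrt(LEAGUE * (1.1 - LEAGUE))  # assumed: variance at league mean
PA_PER_SEASON = 650
WOBA_SCALE = 1.15            # wOBA points to runs per PA
RUNS_PER_WIN = 10.0          # assumption
REPL_WINS_PER_SEASON = 2.09  # assumption, back-solved from the table


def row(woba, baseline, target_sd):
    """Return (seasons, WAR, WAA) for a hitter `target_sd` SDs above `baseline`."""
    pa = (target_sd * SD_UNIT / (woba - baseline)) ** 2
    seasons = pa / PA_PER_SEASON
    waa = (woba - LEAGUE) / WOBA_SCALE * pa / RUNS_PER_WIN
    war = waa + REPL_WINS_PER_SEASON * seasons
    return seasons, war, waa


for w in (0.500, 0.460, 0.420, 0.380, 0.340):
    seasons, war, waa = row(w, baseline=0.300, target_sd=16.0)
    print(f"{w:.3f} {seasons:6.1f} {war:6.1f} {waa:6.1f}")
```

Changing `baseline` and `target_sd` together (for example .260 and 24.0) gives the same kind of table for a different comparison level.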

Is this all we need?  Do we need 4 high-end Pujols seasons for us to put him at the same level as the typical borderline HOF candidate?

I don’t know.

We also see that if someone could compile a league average career for 64 seasons, he’s also in the discussion.  His WAR would be outstanding, but his WAA would be zero of course.

Let’s make the comparison level even lower, that of a .260 wOBA.

wOBA    seasons   SD.260   WAR      WAA
0.500   4.0       24.0     44.3     36.0
0.460   5.7       24.0     50.8     38.8
0.420   8.9       24.0     59.1     40.4
0.380   15.9      24.0     69.2     35.9
0.340   35.8      24.0     74.8     0.0
0.300   143.1     24.0     (24.6)   (323.6)

Now, we see we need 4 Bonds-like seasons, 6 Pujols-like seasons, 9 typical star-like seasons, 16 typical HOF borderline candidate seasons (and 36 league average player seasons) to all be 24 standard deviations from the .260 level.

And now, look at their WAA.  All so very close, all around +36 to +40 wins.  The WAR are still leaning heavily toward guys with longer careers.  But the league average and below players provide the wrinkle.

Is this what we mean about guys having an Observed Performance Inferring the same Talent?

I don’t know.  Maybe?  Probably?

If we try one last comparison, and that is how far away is the performance from a .220 wOBA hitter (basically, someone who is a decent hitting pitcher), this is what you get:

wOBA    seasons   SD.220   WAR       WAA
0.500   5.2       32.0     57.8      47.0
0.460   7.1       32.0     62.7      47.9
0.420   10.2      32.0     67.3      46.0
0.380   15.9      32.0     69.2      35.9
0.340   28.3      32.0     59.1      0.0
0.300   63.6      32.0     (10.9)    (143.8)
0.260   254.4     32.0     (618.9)   (1,150.4)

Now, we need 5 Bonds seasons, 7 Pujols seasons, 10 star seasons, 16 borderline-like seasons, and 28 league-average seasons.  All of these are 32 standard deviations from a true .220 wOBA player.  In this case, we see the WAR are much closer, all around 59 to 69, while the WAA of the above-average players range from 36 to 47 wins.  Here the league-average player's WAR (over 28 seasons) makes its way into the discussion, but his WAA of zero precludes such a player.

So, which model are we talking about, when we talk about guys that are equivalent in their Observed Performance Inferring True Talent?

#1    Rally      (see all posts) 2008/08/05 (Tue) @ 12:51

What is the minimum number of outstanding PA’s to get to 8 SD’s?  How close would John Paciorek’s one game be?  Or would he have to go 50 for 50 before disappearing?


#2    Rally      (see all posts) 2008/08/05 (Tue) @ 13:28

If I did the formula right I get a guy with a .950 wOBA in 45 PA at the same SD as the .380 guy for 20 seasons.


#3    Tangotiger      (see all posts) 2008/08/05 (Tue) @ 13:54

A guy with 13,000 PA with a .380 wOBA is 9.0 SD from the .340 point. 

A guy would need 56 PA at .950 wOBA to match that.

(Note: for wOBA, you have to do wOBA*(1.1-wOBA), as noted in the Appendix.)

On the other hand, if the comparison point is a .260 wOBA (which I think probably makes the most sense if you look at the blog entry above), you need 395 PA at .950 wOBA.

So, a guy who plays for 4 months and is basically Barry Bonds on a hot streak to end all hot streaks… he’s demonstrated the talent required more than say Fred McGriff in his career.
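A minimal sketch of the hot-streak comparison: if the same per-PA SD is used for both players, SD is proportional to (wOBA - baseline) * SQRT(PA), so the SD term cancels and only the ratio of gaps matters. The function name is hypothetical.

```python
def equivalent_pa(ref_woba, ref_pa, hot_woba, baseline):
    """PA at `hot_woba` giving the same SD above `baseline` as ref_woba over ref_pa.

    Assumes a common per-PA SD for both players, so it cancels out.
    """
    return ref_pa * ((ref_woba - baseline) / (hot_woba - baseline)) ** 2


print(round(equivalent_pa(0.380, 13000, 0.950, baseline=0.340)))  # 56
print(round(equivalent_pa(0.380, 13000, 0.950, baseline=0.260)))  # 393
```

The second figure comes out a little under the 395 quoted above; the gap is presumably rounding in the intermediate SD.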


#4    dcj      (see all posts) 2008/08/05 (Tue) @ 17:54

I think that a Bayesian style analysis is the way to go here. Start with the distribution of true talent wOBA over all players, as a prior distribution. Then add 10,000 PA at .380 wOBA, or whatever, and get a posterior distribution.

This avoids the problem of having to choose a baseline.


#5    tangotiger      (see all posts) 2008/08/05 (Tue) @ 18:04

That sounds the same as SD from the league mean, implying a .340 baseline.

Can you work out an example of say someone who has a career 100 safe per 300 PA and 1000 safe on 3000 PA, where the true league mean is .3333, with 1 SD = .033?
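One way to sketch that example is with a Beta prior matched to the stated league mean and SD, updated by the binomial sample. The choice of a Beta prior and the helper names are my assumptions, not something proposed in the thread.

```python
import math


def beta_prior(mean, sd):
    """Beta(a, b) parameters matching a given prior mean and SD."""
    total = mean * (1 - mean) / sd**2 - 1
    return mean * total, (1 - mean) * total


def posterior(successes, trials, a, b):
    """Posterior mean and SD after a binomial sample, via the conjugate update."""
    a_post, b_post = a + successes, b + trials - successes
    n = a_post + b_post
    mean = a_post / n
    sd = math.sqrt(mean * (1 - mean) / (n + 1))
    return mean, sd


a, b = beta_prior(1 / 3, 0.033)
for safe, pa in ((100, 300), (1000, 3000)):
    mean, sd = posterior(safe, pa, a, b)
    print(f"{safe}/{pa}: posterior mean {mean:.4f}, SD {sd:.4f}")
```

Since both samples sit exactly at the league mean, the posterior mean does not move in either case; only the spread narrows, from .033 to roughly .021 after 300 PA and roughly .008 after 3000 PA.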


#6    Rally      (see all posts) 2008/08/05 (Tue) @ 22:18

I’ve put this formula into the Lahman database to see how it works for seasons post WWII.  I’m not sure if I buy it as a measure of greatness yet.

Here’s how some players stack up:

Rudy Pemberton’s 1996 comes in at 3 SD, for 41 AB of just over .500 hitting.  This is comparable to the best seasons a guy like Brian Downing put up.

Kevin Maas’s 1990 is a 2.4.  Those are the big fluke partial seasons that come to mind.

The top seasons are by Bonds, Williams, and Mantle.  McGwire 1998 is up there (8.6).  Norm Cash 1961 also passes the 8 SD mark (8.5), the best year by a guy not thought of as a HOF type hitter.

Back to Edgar, his 1995 season is +7.0


#7          (see all posts) 2008/08/06 (Wed) @ 02:38

Over his first 64 plate appearances of 2004, Barry Bonds had a wOBA of .740, which translates to +6.74 SD above a .340 wOBA.  This is about equal to the SD of a .380 hitter over 6400 PA, or 10 years.  That is much higher than, for instance, Jim Rice’s entire career.


#8    tangotiger      (see all posts) 2008/08/06 (Wed) @ 07:05

But what if you only looked at Rice’s peak, since he obviously had a lot of filler seasons in there that simply drag him down?

Also, as noted, I think I prefer the .260 baseline (or thereabouts), to handle the Kevin Maas issue.  In fact, ALL hitters will have a Kevin Maas issue, since everyone will have had say a 100 or 200 PA string in which their performance was 5 or 6 SD above the .340 mean.  That’s why lowering it to .260 might make more sense, in that it rewards longevity.

As you can see by the manipulation of the baseline, you kind of are fitting the model to the data.  But, that’s really what we are trying to do.  We know greatness (Koufax, not Shane Spencer) when we see it, and so you want a model to reflect that.

If you are happy with the model, then you must accept its conclusions.


#9    Tangotiger      (see all posts) 2008/08/06 (Wed) @ 08:00

I just wanted to highlight this fantastic post by Blackadder at BTF:

=====================================

I was thinking about this last night, and I believe when you actually examine the math there is much less difference between Dan’s and Tango’s positions than it first appears.

What, precisely, is Tango’s method? He starts with a hypothetical player, say, Joe Smith, of a certain level, say average, or replacement, or AA. Then he “rates” the actual baseball player, Tom Awesome, by asking how likely it is that Joe Smith could have put up the performance that Tom Awesome did. Since the probabilities are obviously going to be incredibly small if Tom Awesome is even a marginal hall of fame player, it is easier to express them in terms of standard deviations. In other words, Tango rates players by asking how many standard deviations their performance is from the mean performance of some baseline player.

The stat Tango uses is wOBA, which is basically just a version of OBA that gets the relative weights of the offensive events right. It has the nice property that it is linearly related to the number of runs a player adds, hence the number of (offensive) wins. If a player has a fixed wOBA and has PA plate appearances, the standard deviation of his sample wOBA is

SQRT(wOBA*(1.1-wOBA)/PA)

(I am not sure why it is 1.1 instead of 1, but I think it is because wOBA is rescaled.) To make things readable, let’s assume that we use the baseline of .300 wOBA, which is a replacement level hitter. Then if Tom Awesome has a given wOBA in a given number of PA, his Standard Deviations above what the .300 wOBA guy does is

(wOBA-.300)/(SQRT(.300*(1.1-.300)/PA))

Now, since we are only interested in comparing players, we can ignore multiplicative constants in this formula (they won’t change any ordinal rankings). In particular, we can ignore the 1/SQRT(.300*(1.1-.300)), since we are comparing everyone to a .300 hitter. The formula becomes

(wOBA-.300)*SQRT(PA)

wOBA-.300 is proportional to batting runs above replacement per plate appearance. This, in turn, is proportional to batting wins above replacement per full season, assuming seasons of fixed length. By taking any player’s fielding, positional, and baserunning value and “reassigning” it to his hitting, wOBA-.300 can be taken to be proportional to WARr, the player’s WAR per season. PA is proportional, by definition, to the number of seasons. So the formula becomes

WARr*SQRT(N)

Where N is the number of seasons played. Squaring this, which won’t change any relative rankings, we get

N*(WARr)^2

as Tango’s definition of a player’s value: the number of seasons played, times the square of his rate of accumulating WAR. But this is REMARKABLY similar to Dan R’s old salary estimator! Indeed, if you apply Tango’s value formula to each individual season (so that N<=1) and add up the results, you precisely get Dan R’s formula. Applying Tango’s formula to a player’s entire career, or prime, is not exactly the same as applying it season by season and adding; the latter method will, for example, favor players who were more inconsistent in their value. Still, the errors are not that great.
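The difference between the career-level and season-by-season versions can be seen in a small sketch. The two season lines below are made-up numbers with the same career WAR, and the function names are hypothetical.

```python
def career_value(season_wars):
    """N * (WAR per season)^2 applied once to the whole career."""
    n = len(season_wars)
    rate = sum(season_wars) / n
    return n * rate**2


def seasonal_value(season_wars):
    """The same formula applied to each season (N = 1) and summed."""
    return sum(w**2 for w in season_wars)


steady = [5, 5, 5, 5]
streaky = [8, 2, 8, 2]  # same 20 career WAR, made-up numbers
print(career_value(steady), seasonal_value(steady))    # 100.0 100
print(career_value(streaky), seasonal_value(streaky))  # 100.0 136
```

The career version cannot tell the two players apart, while the season-by-season sum rewards the uneven one.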

So what Tango’s method with the .300 baseline does is essentially rate each season by the square of its WAR. Weighting seasons non-linearly in this manner has been fairly common in HOM discussions as a way to seamlessly blend peak and career considerations. Dan R, as I mentioned, used an exponent of 2, and now uses 1.5. Joe Dimino, in his Pennants Added method, uses something more like 1.25. Although no matter the baseline Tango’s methods are always at root quadratic, they are well approximated by other exponents.

For instance, consider Tango’s method with a .260 wOBA baseline. .040 points of wOBA is about two wins over a whole season, so running through the same calculation as before with the new baseline we get the formula

(WARr+2)^2*N

for a player’s value. In the range [4,11], which is the relevant range of WARr values for HOF contenders (anyone less is probably not a reasonable candidate, and there aren’t many people higher) the function

(x+2)^2

is very well approximated by

4.5*x^(1.5)

Thus, the .260 baseline is very close to using an exponent of 1.5 to weight seasons, which Dan R currently does (there are other differences here; Dan currently considers in-season durability, which this does not). The .220 baseline is about an exponent of 1.2, which is not that far from Pennants Added.
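A quick numeric check of these approximations, as a sketch: it measures the worst relative gap between (x+2)^2 and 4.5*x^(1.5) on [4,11], and fits a log-log slope to see what exponent each baseline behaves like. The offset of 4 for the .220 baseline is my inference from ".040 points of wOBA is about two wins"; the grid size and function names are arbitrary.

```python
import math


def max_rel_error(offset, coeff, exponent, lo=4.0, hi=11.0, steps=71):
    """Largest relative gap between (x+offset)^2 and coeff*x^exponent on [lo, hi]."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        exact = (x + offset) ** 2
        approx = coeff * x**exponent
        worst = max(worst, abs(approx - exact) / exact)
    return worst


def fitted_exponent(offset, lo=4.0, hi=11.0, steps=71):
    """Least-squares slope of log((x+offset)^2) against log(x) on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    xs = [math.log(x) for x in grid]
    ys = [2 * math.log(x + offset) for x in grid]
    mx, my = sum(xs) / steps, sum(ys) / steps
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


print(f"max relative error, .260 baseline: {max_rel_error(2, 4.5, 1.5):.3f}")
print(f"fitted exponent, .260 baseline: {fitted_exponent(2):.2f}")
print(f"fitted exponent, .220 baseline: {fitted_exponent(4):.2f}")
```

The fitted exponent depends on the range and on how the grid is weighted, so it should be read as a ballpark figure rather than an exact match to 1.5 or 1.2.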

Thus, instead of thinking “what is my baseline”, you can think “with what exponent do I weight each season”? Personally, I like the 1.5 exponent, or the .260 baseline. But for people who want to rank players solely by career WAR, there are baselines (in fact, infinitely many) that, in the [4,11] WAR range, are well approximated by a linear function.

So Tango is not really that much of a heretic after all!


#10    Tangotiger      (see all posts) 2008/08/06 (Wed) @ 08:08

"(wOBA-.300)*SQRT(PA) “

Would then become:

PA/1.15*(wOBA-.300) * PA^(-1/2)*1.15

That is, I am removing PA from SQRT(PA) or from PA^(1/2), and splitting it into:
PA * PA^(-1/2) = SQRT(PA)

This term:
PA/1.15*(wOBA-.300)
is WAR

So we get:
WAR * PA^(-1/2)*1.15

We can drop the 1.15 constant and we have:
WAR * PA^(-1/2)

So, we need to square it to clear the fractional power, so:

WAR^2 / PA

And then you continue with your process.

I think that’s right.  Haven’t had breakfast yet…


#11    Tangotiger      (see all posts) 2008/08/19 (Tue) @ 16:53

Someone at BTF said that my process implied that Kerry Wood’s one-game performance meant he was one of the best pitchers ever.  And, I’m sure some similar batting performance (I dunno… Mark Whiten?  anybody you can come up with) would lead to a similar conclusion.

If we compare against the .260 wOBA level for a hitter, 4 Barry Bonds seasons would have the same significance as someone with a 5.715 wOBA in 5 PA.  The maximum wOBA possible (all HR, no outs) is 1.950.  How many PA would you need to get that? 52.  If you can hit 52 HR in 52 PA, that would give you the significance you need to proclaim that player as just being above the threshold for greatness (HOF).

The equivalent for a pitcher is being perfect all season.

Clearly, this process will not allow anyone, no matter how great a performance, to be considered great after one fantastic unbelievable season.
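The two figures in this comment can be sketched as follows, assuming the 24 SD that 4 Bonds-like seasons earn against the .260 level and a per-PA SD taken at the .340 league mean. The helper names are mine.

```python
import math

SD_UNIT = math.sqrt(0.340 * (1.1 - 0.340))  # assumed per-PA SD at the .340 mean
MAX_WOBA = 1.950  # all home runs, as stated in the comment


def required_woba(pa, target_sd, baseline):
    """wOBA needed over `pa` PA to sit `target_sd` SDs above `baseline`."""
    return baseline + target_sd * SD_UNIT / math.sqrt(pa)


def required_pa(woba, target_sd, baseline):
    """PA needed at `woba` to sit `target_sd` SDs above `baseline`."""
    return (target_sd * SD_UNIT / (woba - baseline)) ** 2


print(round(required_woba(5, 24.0, 0.260), 2))    # 5.72
print(round(required_pa(MAX_WOBA, 24.0, 0.260)))  # 52
```

A required wOBA above 1.950 simply means no 5 PA stretch can reach that significance.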

