THE BOOK--Playing The Percentages In Baseball

Thursday, September 14, 2006

More Leverage Index

By Tangotiger, 07:22 AM

Studes always gives me at least one thing I didn’t know in the ten things he didn’t know.  When it comes to Leverage Index, it’s an easy sell to me, even if I’m the salesman.  It’s clear what it’s doing, and it comes up with easily understandable measures.  But the problem with these “single numbers” is that the reader has to work to understand what you did.  That’s where Studes comes in.  He does the work for you.  For example:


He runs through all the highest-leveraged PA of Pujols, Ortiz, and Jeter, three of the leading MVP contenders, and at the top of the list in clutchiness. Rather than giving you the single number, we actually see what these guys did in the clutch, PA by PA.  Let’s start with Pujols.  In his 15 highest-leveraged PA:

Pujols has gone 8 for 12 with three home runs and three intentional walks, for a total WPA of 2.0, almost exactly his clutchiness score

Two things to note here.  One, we see he’s been sensational with the game on the line.  That’s what’s good about LI: we can numerically express each of his hundreds of PA, so that we can pick and choose those PA that have the highest level of fire.  The next thing to note is that his total WPA in these 15 PA is +2.00.  His average LI in these 15 PA was 4.28, meaning that each PA, on average, was worth 4.3 PA.  If we take the shortcut, we can do 2.00/4.28 to get 0.47 as his WPA had the situations been of average LI (1.0) and not super-high.  To go the long route, divide each PA’s WPA by that PA’s LI, and add up the total.  This gives us +0.48.  So, he adds +1.52 wins (2.00 minus 0.48), because he timed his performance (by design or luck, your choice), in just these 15 PA.  This is his clutchiness score for these 15 PA (sort of).  His actual overall clutchiness score is even higher, but he gets the biggest boost just on these 15 PA.
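The arithmetic above can be sketched in a few lines of Python.  This is only an illustration using the totals quoted in the post (WPA of 2.00, average LI of 4.28, long-route result of 0.48); the function names are my own, and the per-PA (WPA, LI) pairs needed for the long route are not listed here, so you would have to supply them.

```python
def neutral_wpa_shortcut(total_wpa, avg_li):
    """Shortcut: divide the total WPA by the average LI of those PA."""
    return total_wpa / avg_li


def neutral_wpa_long(pa_list):
    """Long route: divide each PA's WPA by its own LI, then add them up.

    pa_list is a list of (wpa, li) pairs, one per plate appearance.
    """
    return sum(wpa / li for wpa, li in pa_list)


def clutchiness(total_wpa, neutral_wpa):
    """Wins added purely through the timing of the performance."""
    return total_wpa - neutral_wpa


# Totals quoted in the post for Pujols' 15 highest-leveraged PA.
print(round(neutral_wpa_shortcut(2.00, 4.28), 2))  # 0.47
print(round(clutchiness(2.00, 0.48), 2))           # 1.52
```

The shortcut and the long route differ slightly (0.47 vs. 0.48) because dividing a sum by an average is not the same as summing the individual ratios.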

Repeating for Ortiz, his WPA in his super-high-clutch PA is +3.40, but if we remove the leverage from the mix, he’d have been +0.77 wins, making his clutchiness in these 14 PA an astounding +2.63 wins, way above his overall clutchiness score.  In short, Ortiz’s clutchiness is completely captured in these 14 PA.

On the other side is Derek Jeter, who gets only +1.58 WPA in his 13 high-clutch PA, with a clutchiness score of +1.22 wins.  Jeter has simply been able to spread his clutchiness around more evenly, unlike Ortiz, who has concentrated his the most.

***

Later on, Studes asks for people to buy the THT book, and, if possible, directly from the publisher.  Speaking as one who wrote/published a book, I can’t tell you enough how important it is to buy directly from the publisher.  The profit margins are virtually nonexistent on Amazon books, and very high directly from the publisher.  I know the reader saves a good deal of money on Amazon, 5 or more bucks, but, as studes also said, and I believe, consider the extra money as a donation for more R&D.

The same applies for any author who you truly want to support.  Baseball Prospectus, Bill James, or whoever.  If they hold a special place for you, make the extra effort.

#1    David Smyth      (see all posts) 2006/09/14 (Thu) @ 12:16

As luck would have it, I pre-ordered the HBT (along with the 2007 B James HB) yesterday, not from the HBT or Amazon, but from Acta. They sent me an email ad, and I clicked and ordered. I’m not sure how much hassle it will be to cancel that order, but I’ll take a look.

I’m wondering, though, if it makes that big a difference, why does the HBT sell thru those other outlets? AFAIK, The Book was only available thru the publisher. I’m sure all people who would buy the HBT annual visit the site regularly, and would respond to the ad there.


#2    J.P. McIntyre      (see all posts) 2006/09/14 (Thu) @ 12:34

The other outlets offer much, much greater exposure, especially to those who are unfamiliar with the author’s work. Think of it as an investment in the future. If you write a good book, more people will see it and be inclined to buy your next book. Once you have strong sales, you can negotiate a better deal with outlets like Amazon, B&N, etc.


#3    tangotiger      (see all posts) 2006/09/14 (Thu) @ 12:37

Not speaking for THT, but let me tell you from my experience.  These are profit levels if you order from these sources:
- from author, $7-8
- from publisher, $2-3
- from Amazon, $0-1
- from second-hand source, $0

If you click-through the author’s site to get to the publisher’s site, the author also gets an extra commission, say $1 or $2.

As for the reason for selling through different sources, it’s simply volume x profit.  If you sell 1000 units as an author, you need to sell 8000 units via Amazon to make the same profit.  The absolute worst you can do is buy a used book, if you intend to support the author.

The most important part is of course to buy it from somewhere. 

Regardless though, being an author is terrible in terms of profits.  On a per-hour basis, my wife made more sending out the books than we made writing it.

The other thing to do to support these sites is to click on the banner ads.  You lose 10 seconds of your life, but the hit counts work.


#4    studes      (see all posts) 2006/09/14 (Thu) @ 12:38

Thanks for looking into that, David.  Every little bit helps.

The distribution question is actually something I asked ACTA, and they feel that selling through Amazon is still worth it.  It’s an area in which our economics clash with theirs a bit.  Still, we’re much better off with them than without them.  If you see another ACTA book you’d like to buy, click the link from our site instead of going direct to them; that helps us too.


#5    Guy      (see all posts) 2006/09/14 (Thu) @ 14:11

“Studes always gives me at least one thing I didn’t know in the ten things he didn’t know.”

Does this mean that of all the things Studes doesn’t yet know, Tango already knows 90% of them?
:>)


#6    studes      (see all posts) 2006/09/14 (Thu) @ 14:52

Wouldn’t surprise me one bit!


#7    Jim P      (see all posts) 2006/09/15 (Fri) @ 07:15

When I ordered the THT book last year, I didn’t mind paying the premium to the authors, but I did mind paying an extra $5 to UPS for shipping, which I could get free at Amazon (if I order another book at the same time).  Would setting up a PayPal donation link be against your principles?


#8    studes      (see all posts) 2006/09/15 (Fri) @ 10:47

That’s a good point about the shipping.

Yes, paypal donations are accepted (we have a paypal account) but I’ve never put a button on our site.  Maybe I should go ahead and do that, unobtrusively.


#9    Tom      (see all posts) 2006/09/15 (Fri) @ 11:14

I see the shipping has gone to $5.75 this year (for a total of $25.70).

What happens with Amazon ($13.50) is the $20 retail is sold to Amazon for $10 or so, who then sell it to the consumer for $13.50.  THT gets a cut of the $10, say $1.  Amazon gets $3.50, and ACTA gets $4 or so.

If you buy from the publisher, maybe THT’s cut is $6, leaving the publisher with $8?  I dunno.

Anyway, an option is to buy from Amazon, and PayPal THT $5 - $10.  Essentially, the consumer is choosing to pay the authors over the publisher.


#10    studes      (see all posts) 2006/09/15 (Fri) @ 13:45

For those who don’t want to buy from ACTA, I’ve put a paypal donation button in the lower right column on the home page.  Not a hard sell at all, but maybe someone will click on it once in a while.


#11    tangotiger      (see all posts) 2007/05/25 (Fri) @ 16:20

DSG has an article:
http://www.hardballtimes.com/main/printarticle/pondering-pythagoras/

Where he has this formula to estimate LI:
1.11 - .137*GS/G - .86*(GF - SV)/G + 2.165*SV/G

Pete Palmer came up with an equation as well.  I don’t have it handy, but I’ll try to dig it up.

When I run David’s version, I come up with Trevor Hoffman, 1998, as the highest LI (2.74).  Hoffman actually comes out with 5 seasons of an LI exceeding 2.3.  The LI at the bottom, bottoms out at 0.60 or so.

I think the spread is too wide.  It’s a tough job, because you are trying to use both starters and relievers.  I would separate GS and GR a bit better.  For example, the last two terms should divide by GR not G.  In short, if you fix the LI for a GS as 0.98, and you presume 6IP per GS, you can come up with an LI for GS and for GR, and then weight them for an LI.
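For anyone who wants to run David’s formula themselves, here is a minimal sketch.  The coefficients are the ones quoted above; the function name and the sample stat lines are mine and purely hypothetical, not real pitcher seasons.

```python
def estimated_li(g, gs, gf, sv):
    """Estimated LI: 1.11 - .137*GS/G - .86*(GF - SV)/G + 2.165*SV/G.

    g = games, gs = games started, gf = games finished, sv = saves.
    """
    if g <= 0:
        raise ValueError("need at least one game")
    return 1.11 - 0.137 * gs / g - 0.86 * (gf - sv) / g + 2.165 * sv / g


# Hypothetical stat lines, for illustration only.
print(round(estimated_li(g=30, gs=30, gf=0, sv=0), 3))   # pure starter: 0.973
print(round(estimated_li(g=60, gs=0, gf=0, sv=0), 3))    # never finishes a game: 1.11
print(round(estimated_li(g=60, gs=0, gf=55, sv=40), 2))  # closer: 2.34
```

Note that a reliever with no starts, no games finished, and no saves gets exactly the intercept, 1.11, and that a pure starter comes out at 0.973, close to the 0.98 suggested above for a GS.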


#12          (see all posts) 2007/05/25 (Fri) @ 16:32

Using saves in the regression equation means that these events are overweighted.

I looked at Mariano in 2006 and his actual LI was 1.8x (or some such) vs 2.0y as predicted by the formula. That Hoffman is top of the pile suggests that the equation overeggs elite relievers’ LI.


#13          (see all posts) 2007/05/25 (Fri) @ 16:35

And of course, all middle relievers or set-up men have an LI of 1.11 ...!


#14    David Gassko      (see all posts) 2007/05/25 (Fri) @ 19:05

I’ve tested the equation historically, and it is actually impressively accurate. For example, here are some selected career LIs:

Trevor Hoffman - 2.10
Mariano Rivera - 2.02
Troy Percival - 2.02

(No one else over 2.00)

Bruce Sutter - 1.83
Lee Smith - 1.80
Goose Gossage - 1.43

The actual LIs for the latter three guys are 1.90, 1.73, and 1.62. Not perfect, but not too bad either.

Tango has long written that Hoffman and Percival have had off-the-charts LIs in their career, and Rivera has too.

I think the biggest problem with the method is actually that it might underrate relievers from previous eras who might have been leveraged without necessarily saving a lot of games. Given that it was developed using data from 2006, it shouldn’t have too many problems with modern pitchers, though I do grant it may overrate seasons like Rivera’s recent years, where Mariano is brought out for a lot of gimme saves (predicted LI is 2.28 from 2002-05 versus an actual of around 1.80).


#15    tangotiger      (see all posts) 2007/05/25 (Fri) @ 20:18

I’ve got Hoffman with an estimated LI (your equation) of at least 1.99 in virtually every season.  I don’t see how it averages out to 2.10.


#16    David Gassko      (see all posts) 2007/05/25 (Fri) @ 22:35

Hoffman’s predicted LIs:

1993 - 0.93
1993 - 1.06
1994 - 1.65
1995 - 2.02
1996 - 2.16
1997 - 1.98
1998 - 2.74
1999 - 2.28
2000 - 2.24
2001 - 2.45
2002 - 2.26
2003 - 0.44
2004 - 2.57
2005 - 2.50

If you multiply the leverage by his innings pitched in each year, that gives Hoffman 1728 leveraged innings versus 822 actual innings. 1728/822 = 2.10.
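The career figure is just an innings-weighted average of the seasonal LIs.  A rough sketch follows; the function name and the two-season example are hypothetical (the yearly innings are not listed in this thread), and only the 1728 and 822 totals come from the comment above.

```python
def career_li(seasons):
    """Innings-weighted career LI.

    seasons is a list of (li, ip) pairs, one per season.
    """
    leveraged_innings = sum(li * ip for li, ip in seasons)
    actual_innings = sum(ip for _, ip in seasons)
    return leveraged_innings / actual_innings


# Hypothetical two-season example: 70 IP at LI 2.0 and 30 IP at LI 1.0.
print(round(career_li([(2.0, 70.0), (1.0, 30.0)]), 2))  # 1.7

# Totals quoted above: 1728 leveraged innings over 822 actual innings.
print(round(1728 / 822, 2))  # 2.1
```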


#17    Guy      (see all posts) 2007/05/26 (Sat) @ 00:05

DSG, a couple of ideas for possibly improving the model (if you haven’t already tried them):

1) Create separate models for starters and relievers, based on some GS/G cutoff.  This may give you a stronger model for relievers, which is what you mostly care about.

2) See if wins and/or total decisions adds anything to the model.  It won’t identify closers, but might help separate the 1.5 from the .9 relievers (getting a decision usually means a fairly close game). 

3) See if IP/G tells you anything.  Guys with shorter appearances may tend to have somewhat higher leverage.


#18    tangotiger      (see all posts) 2007/05/26 (Sat) @ 08:26

If we take out those seasons where he was not a closer (1993, 2003), that leaves us with around 1633 leveraged innings on 723 innings or so, for an LI of 2.26. A bit high, though pretty good, all things considered.

I agree with Guy/17, part 1, which is what I was trying to say in my very last line in Tango/11.  Guy’s other ideas are also excellent.


#19    Guy      (see all posts) 2007/05/26 (Sat) @ 10:16

As we’ve discussed over at Ballhype (at excessive length), using saves to estimate leverage introduces a bias:  est. LI is too highly correlated with wins.  How big a problem this is probably depends on what you’re going to use it for.  But I would guess it will tend to overestimate leverage for pitchers on winning teams, at least for recent decades in which a high proportion of wins result in an awarded save.  And a guy who pitches a lot when his team is down by 1 or 2—maybe the #3 reliever in today’s game—might have too low an est. LI.


#20    Tangotiger      (see all posts) 2007/05/26 (Sat) @ 12:37

Pete Palmer’s estimate had something like SV/(TeamWins - TeamCG) or some such. And there was even a league factor in there, since a CG would often be a loss when both pitchers would go long, etc.

I couldn’t find his formula.  I know Rob Wood posted it once.  I’ll email Pete, and see if he can help us out.


#21    David Gassko      (see all posts) 2007/05/26 (Sat) @ 16:05

Guy,

I’ve actually experimented with adding wins and losses before, but didn’t like it because it screwed around with the starting pitchers. IP/G would probably do the same.

I could try to build a model just for relievers, but then what do you do about, say, swingmen? I’d like to be able to apply this model to all of baseball history, which means you have to be able to apply it to every pitcher equally.


#22    Guy      (see all posts) 2007/05/26 (Sat) @ 22:15

I was thinking you’d have one model for GS/G>=.40, and another for GS/G<.4 (or whatever cutoff you want).  But I can see the advantage of a single model.

So, maybe you try (W+L)*(G-GS)/G to pick up decisions for relievers.  And G/IP might work even for all pitchers (or just G alone).


#23    Guy      (see all posts) 2007/05/26 (Sat) @ 22:20

Or, if G/IP messes up your starter estimates, try weighting it by % of relief games:  (G/IP)*(G-GS)/G, which simplifies to (G-GS)/IP.  Then it won’t have much impact on starters.


#24    David Gassko      (see all posts) 2007/05/27 (Sun) @ 01:25

Guy,

Great suggestions! Hoffman’s predicted LI (2.08 actual) in 2006 drops from 2.59 to 2.24, though Rivera’s (1.83) actually goes from 1.94 to 2.04. Overall, the predicted LI of the top-ten leaders in saves drops from 2.15 to 2.04 versus an actual LI of 1.93. Overall, the correlation jumps from .77 to .85, though that might just be due to the fact that adding more variables always improves the correlation (I forgot to check the adjusted “r”).

The new formula is:

.498 + .455*GS/G - .609*(GF-SV)/G + 1.924*SV/G + .049*(W+L)*(G-GS)/G + .309*(G-GS)/IP

In terms of career numbers, I don’t know that it’s necessarily better than the simpler formula, though. Here’s how the previously mentioned guys look now:

Hoffman - 2.11
Rivera - 2.02
Sutter - 2.01
Percival - 1.97
Smith - 1.90
Gossage - 1.58

The model now overshoots Sutter’s LI instead of under-predicting (though the absolute difference goes from .07 to .11), it overrates Smith even more than before, but it’s much closer with Gossage. Hoffman, Rivera, and Percival basically don’t see any change (though I’m now going through 2006 instead of 2005, which should increase Hoffman and Rivera’s numbers a little bit).

There’s also a problem with pitchers from previous eras: Lefty Grove, for example, goes from 1.11 to 1.30, which I think is pretty clearly wrong. I don’t know that I really prefer this formula to the simpler incarnation.

Maybe I’ll try removing the IP component…
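Here is the expanded formula as a small function, in case anyone wants to experiment with dropping a term.  The coefficients are the ones quoted in this comment; the function name and the sample stat lines are hypothetical and only show how the terms behave.

```python
def estimated_li_v2(g, gs, gf, sv, w, l, ip):
    """Expanded estimate:

    .498 + .455*GS/G - .609*(GF-SV)/G + 1.924*SV/G
         + .049*(W+L)*(G-GS)/G + .309*(G-GS)/IP
    """
    if g <= 0 or ip <= 0:
        raise ValueError("need positive games and innings")
    relief_share = (g - gs) / g  # fraction of appearances made in relief
    return (0.498
            + 0.455 * gs / g
            - 0.609 * (gf - sv) / g
            + 1.924 * sv / g
            + 0.049 * (w + l) * relief_share
            + 0.309 * (g - gs) / ip)


# Hypothetical stat lines, for illustration only.
# Pure starter: both relief-weighted terms vanish.
print(round(estimated_li_v2(g=30, gs=30, gf=0, sv=0, w=15, l=10, ip=200), 3))  # 0.953
# Middle reliever: 60 G, 60 IP, 5 decisions, no games finished.
print(round(estimated_li_v2(g=60, gs=0, gf=0, sv=0, w=3, l=2, ip=60), 3))      # 1.052
```

Because the last two terms are weighted by (G-GS), a pitcher who never relieves is unaffected by them, while a starter with many relief appearances picks up extra estimated leverage from both.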


#25    Guy      (see all posts) 2007/05/27 (Sun) @ 08:11

The IP component must be virtually zero for Grove, so that’s probably not your problem.  I’d guess it’s the W-L factor.  The problem is that good starters used to have non-trivial # of relief appearances:  Grove had 159, vs. 1 for Clemens.  I know you’re looking for a single formula, but having pre- and post-WWII models would likely improve accuracy (since save frequency is also so different). 

Of course, Grove’s relief work—along with 298 CGs and .680 W%—probably means his LI really was high for a starter.  But I agree that 1.30 is too high.  (Come to think of it, CG/GS might be a good variable to test.)


#26    tangotiger      (see all posts) 2007/05/27 (Sun) @ 09:17

This is Palmer’s formula:

I developed a formula from wins, losses, saves and innings to give top relievers around double credit for their innings.  It is in the glossary of our encyclopedia under pitcher wins.

The formula is XMUL = 9 x (W+L+SV/XSV)/IP

XSV I had to add because of the proliferation of saves in recent years.

XSV = 10 x SV/W for the league, but cannot be less than 4, so if half the wins end up with saves, XSV is 5.

I also added a restriction that XMUL must be between 1/2 and 2.  Otherwise it could be zero with no decisions or saves.  It could be very high for a pitcher with 1 inning and 1 win.  Starters are typically around 0.9, while relievers can get up to 2. In the old days, top relievers used to get a lot of decisions, but not so much today.  Mop up guys can hit 1/2.

I wouldn’t set the upper bounds to 2.0.  Probably 2.5, if you have to force it.

DSG, maybe the SV/XSV will give you an idea as to how to improve your system.
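A minimal sketch of Palmer’s formula as described above.  The function and parameter names are mine, the sample stat lines are hypothetical, and the `upper` argument is only there so the 2.0 cap can be swapped for 2.5.

```python
def xsv(league_sv, league_w):
    """League save factor: 10 x SV/W for the league, but never less than 4."""
    return max(4.0, 10.0 * league_sv / league_w)


def xmul(w, l, sv, ip, xsv_value, lower=0.5, upper=2.0):
    """XMUL = 9 x (W + L + SV/XSV) / IP, restricted to [lower, upper].

    ip must be positive.
    """
    raw = 9.0 * (w + l + sv / xsv_value) / ip
    return min(upper, max(lower, raw))


# If half the league's wins end with a save, XSV is 5.
print(xsv(league_sv=1200, league_w=2400))  # 5.0

# Hypothetical pitchers, for illustration only.
print(xmul(w=0, l=0, sv=0, ip=50, xsv_value=5.0))   # mop-up man hits the floor: 0.5
print(xmul(w=4, l=4, sv=40, ip=70, xsv_value=5.0))  # closer hits the cap: 2.0
print(round(xmul(w=4, l=4, sv=40, ip=70, xsv_value=5.0, upper=2.5), 2))  # 2.06
```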


#27    Guy      (see all posts) 2007/05/27 (Sun) @ 21:15

David:  it occurs to me that % of games in relief—(G-GS)/G—isn’t the same as % of innings in relief.  You might want to see if using (G-GS)/(2*GS+G) works better.  That gives you % of relief innings, assuming 6 IP per start and 2 IP per relief game.  This should result in the two new variables having less influence on starters’ LI.  It may even be an improvement over your first variable (GS/G).

* *

BTW, Lefty Grove at 1.3 may not be as far-fetched as it sounds.  He had 298 CG and 123 GF.  If we take (CG+GF)/G as a rough estimate of the percentage of innings thrown in the 9th, Grove comes in at 10.7%.  In comparison, Clemens and Maddux are both 2.4%, Pedro 2.6%.  Grove pitched in the 8th and 9th innings in a LOT of games his team won, both as a starter and reliever.


#28    tangotiger      (see all posts) 2007/05/27 (Sun) @ 21:48

In 2006:
http://www.retrosheet.org/boxesetc/2006/YS_2006.htm

1.08 IP/GR (4.7 BFP)
5.82 IP/GS (25.3 BFP)

65% of BFP by SP

In 1957:
http://www.retrosheet.org/boxesetc/1957/YS_1957.htm

1.83 IP/GR (7.9 BFP)
6.42 IP/GS (27.3 BFP)

71% of BFP by SP

When I try to estimate SP and RP innings, I try to use the breakdown of the year in question.  If I want to do something quick, I use a 4:1 ratio.  For example, say I know a guy has 40 G, 15 GS, 150 IP.  How many IP as starter and relief?  I solve for this:
150 = 15*4x+(40-15)x
x=1.8

So, I give him 7.2IP as a SP, and 1.8IP as a reliever. 

I also try as a 4.6IP differential.  So, solve for:
150=15*(4.6+x)+(40-15)x
x=2

So, 2.0IP as a reliever, and 6.6 as a starter.

You can play around with something along those lines.
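Both quick methods above reduce to one line of algebra each.  A sketch follows, with function names of my own choosing; the 40 G, 15 GS, 150 IP line is the example from this comment.

```python
def split_by_ratio(g, gs, ip, ratio=4.0):
    """Assume a start lasts `ratio` times as long as a relief outing.

    Solves ip = gs*ratio*x + (g - gs)*x for x, the IP per relief game.
    Returns (ip_per_start, ip_per_relief_game).
    """
    x = ip / (ratio * gs + (g - gs))
    return ratio * x, x


def split_by_difference(g, gs, ip, diff=4.6):
    """Assume a start lasts `diff` innings longer than a relief outing.

    Solves ip = gs*(diff + x) + (g - gs)*x for x, the IP per relief game.
    Returns (ip_per_start, ip_per_relief_game).
    """
    x = (ip - diff * gs) / g
    return x + diff, x


start_ip, relief_ip = split_by_ratio(g=40, gs=15, ip=150)
print(round(start_ip, 1), round(relief_ip, 1))  # 7.1 1.8

start_ip, relief_ip = split_by_difference(g=40, gs=15, ip=150)
print(round(start_ip, 1), round(relief_ip, 1))  # 6.6 2.0
```

The unrounded ratio solution is x = 150/85, about 1.76, which gives 7.06 IP per start; the 7.2 in the comment comes from rounding x to 1.8 before multiplying by 4.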

***

Aside note: notice that today’s starter only faces two fewer batters than from 50 years ago?  And, today’s pitcher goes deeper in the count, meaning more pitches per batter.  Overall, they have virtually the exact same pitch count.

Pitchers from the 1970s are the exception.


#29    Guy      (see all posts) 2007/05/27 (Sun) @ 23:19

So, based on Tango’s data, maybe we should assume the start:relief ratio for IP is about 4:1.  Then the variables you want to try are:

% of IP in relief:  (G-GS)/(3*GS+G)

Relief decisions proxy: (W+L)*(G-GS)/(3*GS+G)

Length of relief appearances: G*(G-GS)/(IP*(3*GS+G))
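To see where the (3*GS+G) denominator comes from: if a start lasts r times as long as a relief outing, the relief share of innings is (G-GS)/((r-1)*GS+G).  The sketch below, with names of my own choosing and a hypothetical 40 G, 15 GS line, checks that against a direct innings count and builds the three variables.

```python
def relief_ip_share(g, gs, ratio=4.0):
    """Estimated fraction of innings thrown in relief.

    Assumes a start lasts `ratio` times as long as a relief outing, so
    share = (G-GS) / ((ratio-1)*GS + G).  ratio=4 gives (G-GS)/(3*GS+G).
    """
    return (g - gs) / ((ratio - 1.0) * gs + g)


def relief_decisions_proxy(w, l, g, gs, ratio=4.0):
    """(W+L) weighted by the estimated relief share of innings."""
    return (w + l) * relief_ip_share(g, gs, ratio)


def relief_length_proxy(g, gs, ip, ratio=4.0):
    """G/IP weighted by the estimated relief share of innings."""
    return (g / ip) * relief_ip_share(g, gs, ratio)


# Hypothetical swingman: 40 G, 15 GS.
g, gs = 40, 15
# Direct count with 6 IP per start and 2 IP per relief game (a 3:1 ratio).
direct = 2 * (g - gs) / (6 * gs + 2 * (g - gs))
print(round(direct, 3), round(relief_ip_share(g, gs, ratio=3.0), 3))  # 0.357 0.357
print(round(relief_ip_share(g, gs, ratio=4.0), 3))                    # 0.294
```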

* *

In 27, I meant to say “if we take (CG+GF)/IP as a rough estimate of the percentage of innings thrown in the 9th....”


#30    dcj      (see all posts) 2007/05/31 (Thu) @ 02:14

In 2006:
5.82 IP/GS (25.3 BFP)

In 1957:
6.42 IP/GS (27.3 BFP)

...

Aside note: notice that today’s starter only faces two fewer batters than from 50 years ago?  And, today’s pitcher goes deeper in the count, meaning more pitches per batter.  Overall, they have virtually the exact same pitch count.

1957 CG/GS: 710/2470 = 29%
2006 CG/GS: 144/4858 = 3%

If a CG is 8.5 IP, the non-CG starts in 1957 averaged 5.6 IP. The non-CG starts in 2006 averaged 5.7 IP. Not what I would have expected!
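The calculation behind those two numbers, as a short sketch.  The function name is mine; the inputs are the IP/GS and CG/GS figures quoted in this thread, and 8.5 IP per CG is the assumption stated above.

```python
def non_cg_ip_per_start(ip_per_gs, cg, gs, ip_per_cg=8.5):
    """Average IP in starts that were not complete games.

    Removes the complete-game innings (assumed ip_per_cg each) from the
    total starter innings and divides by the remaining starts.
    """
    total_ip = ip_per_gs * gs
    return (total_ip - ip_per_cg * cg) / (gs - cg)


print(round(non_cg_ip_per_start(6.42, cg=710, gs=2470), 1))  # 1957: 5.6
print(round(non_cg_ip_per_start(5.82, cg=144, gs=4858), 1))  # 2006: 5.7
```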


#31    tangotiger      (see all posts) 2007/05/31 (Thu) @ 08:07

If you check out the Sandy Koufax pitch count log on my site, you will find the standard deviation of pitches per game to be quite high. 

You can check to see how many pitches Koufax pitched by CG and per not-CG.

