THE BOOK--Playing The Percentages In Baseball


Wednesday, July 26, 2006

Mark Teahen, and GB/FB ratios

By Tangotiger, 07:32 AM

According to FanGraphs, Mark Teahen had a GB/FB ratio of 2.22 in 2005, and 1.37 in 2006.  Quite a shift, wouldn’t you say?  In 2005, 53% of his contacted balls were groundballs, while in 2006, it’s 50%.  Quite similar, wouldn’t you say?  What’s the difference?


The difference is that Line Drives are not counted as either GB or FB. His entire shift is in his LD to FB ratio, from 1.00 to 0.37 (24 LD against 65 FB). Now, that’s quite a shift.

Personally, if you are going to go with just one ratio, I prefer the GB to Airball ratio. However, I hate ratios, because of their non-symmetry. If the league average GB to AIR ratio is 1, then a 1:2 ratio is symmetrical to a 2:1 ratio. However, 1:2 is 0.50, and 2:1 is 2.00. So, why not do the simplest thing, and quote the guy’s GB percentage? In this case, Teahen went from 53% to 50%. His Airball percentage is obviously 47% last year and 50% this year.
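The asymmetry is easy to see with a few lines of Python. This is only an illustrative sketch; the helper name `ratio_to_pct` is made up for the example and is not something FanGraphs or anyone else publishes.

```python
def ratio_to_pct(gb: float, air: float) -> float:
    """Groundball share of contacted balls, given GB and airball counts (or ratio parts)."""
    return gb / (gb + air)

# Mirror-image hitters (1:2 and 2:1) sit at very different distances from a
# ratio of 1.00, but at the same distance from a GB% of 50%.
for gb, air in [(1, 2), (1, 1), (2, 1)]:
    ratio = gb / air
    pct = ratio_to_pct(gb, air)
    print(f"{gb}:{air}  ratio={ratio:.2f}  GB%={pct:.1%}  distance from 50%={abs(pct - 0.5):.1%}")
```

On the ratio scale the two hitters are 0.50 and 1.00 away from average; on the percentage scale both are the same distance from 50%.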

Just looking at his chart, I’m guessing that the shift in LD to FB means that he’s hitting the ball with more loft. That’s further evidenced by his HR/FB rate doubling from 2005 to 2006. Is this statistically significant?

Teahen has 24 LD and 65 FB, whereas last year he was 50/50. It’s highly unlikely that the 50/50 was anything close to his true split. I doubt any player has that kind of split. Either it was a blip, or the Royals scorekeeping was off with him. More interesting is his HR/FB rate doubling. One standard deviation of HR, given 65 FB, is .04 HR/FB. Teahen’s HR/FB changed by .08 HR/FB (and that’s from one small sample to another small sample). In and of itself, it’s not very significant.
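For anyone who wants to check the .04 figure, here is a rough sketch using the usual binomial standard deviation of a rate. The 10% true HR/FB rate is an assumption (a league-average hitter); `rate_sd` is just a name chosen for this example.

```python
import math

def rate_sd(p: float, n: int) -> float:
    """Binomial standard deviation of an observed rate with true rate p over n trials."""
    return math.sqrt(p * (1 - p) / n)

LEAGUE_HR_PER_FB = 0.10  # assumption: true rate equal to a league mean of about 10%
FLYBALLS = 65            # Teahen's 2006 flyball count from the post

sd = rate_sd(LEAGUE_HR_PER_FB, FLYBALLS)
print(f"1 SD of HR/FB over {FLYBALLS} FB: {sd:.3f}")
print(f"A change of .08 is about {0.08 / sd:.1f} SD of a single sample")
```

This only covers the noise in one sample. The 2005 rate is noisy as well, so the spread of the year-to-year difference is wider than this.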

However, if the LD to FB ratio change is something real, if he is in fact hitting the ball with more loft and distance, then the HR/FB change is more likely to be real as well. That is, while we expect our player’s sample HR/FB to regress to the league mean of around 10%, if we know something more about this hitter, for example, the % of FB that are long flyballs, then we would have a higher mean to regress to.

It’s in cases like these that a good scout would be invaluable.

#1    studes      (see all posts) 2006/07/26 (Wed) @ 13:44

During the offseason, I noticed that BPro moved from G/F ratio to GB%, which I think is a much better way to approach the issue.  Percentages are MUCH easier to interpret than ratios, and the whole flyball vs. line drive issue makes things much worse.

So, we switched to GB% and LD% at THT this year, and I think those two numbers tell you a lot more. However, I’ve still had people ask for the old G/F ratios; I think they’re just used to them. Also, I had one person ask me to add F% as well, even though F% would just be 100% minus the other two percentages.

To conserve space and keep the site readable, I plan to just stick with GB% and LD% (as well as IF/F, another of my favorites).


#2    tangotiger      (see all posts) 2006/07/26 (Wed) @ 14:20

Well, the FB% is nice as well, even though it’s 1 minus the other two, because if you cut/paste your data into Excel, it saves the person from doing the calculation.

As for the IF/F, I would only do that if it’s significant.  HR/F makes sense, but does IF/F?  Does LD/Airball?  If they have no relationship, it doesn’t make sense to do the rate stat.  So my question: do they?

Looking at it, I’d say you want:
GB%, IF%, LD%, OF%, HR/OF


#3    tangotiger      (see all posts) 2006/07/26 (Wed) @ 15:02

I’m just running some regressions against infield fly (IF), and I get the following correlation coefficients (r):

GB: -.38
LD: -.35
OF: +.26

So, it looks like the IF should be compared to the OF.

How about the Line Drive (LD)?

GB: -.08
IF: -.35
OF: -.29

And the outfield fly (OF):

GB: -.90 (duhh*)
IF: +.26
LD: -.29

(duhh*): I’m computing GB rates against the sum of GB+ OF + LD + IF rates.  Since most of the events are GB or OF, GB is pretty much 1 - OF, so that’s why you see the r as close to -1.00 as possible.  So, it’s best to look at the comparisons within each group.

Other interesting tidbits:

The various rates reach r=.50 (that is, 50% regression toward the mean) at these numbers of contacted balls:

GB, FB: 70
IF: 150
LD: 500

What does this mean? If you have just 70 contacted balls, the observed GB rate needs to be regressed 50% toward the league mean. But you need 500 contacted balls before you can regress the line drive rate by just 50%.

In other words, if you have 500 contacted balls, this is how much you have to regress their rates toward the league mean:

GB, FB: 12%
IF: 23%
LD: 50%

(Of course, since the sum of the means must add up to 1.00, I’m not sure how to technically make this correct, without adding a correction factor.)
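The percentages above can be reproduced with a short sketch, assuming the regression amount takes the form k / (k + n), where k is the number of contacted balls at which r = .50 and n is the sample size. That form is an assumption on my part (it is not spelled out in the comment), but it does return the 12% / 23% / 50% figures. The function names and the .25 and .20 line drive rates in the last line are hypothetical.

```python
# Contacted balls at which r = .50, taken from the comment above.
HALF_POINT = {"GB, FB": 70, "IF": 150, "LD": 500}

def regression_amount(n: int, k: int) -> float:
    """Fraction to regress toward the league mean, assuming the k / (k + n) form."""
    return k / (k + n)

def regressed_rate(observed: float, league_mean: float, n: int, k: int) -> float:
    """Blend the observed rate with the league mean using the weight above."""
    w = regression_amount(n, k)
    return w * league_mean + (1 - w) * observed

for name, k in HALF_POINT.items():
    print(f"{name}: regress {regression_amount(500, k):.0%} at 500 contacted balls")

# Hypothetical hitter: .25 observed LD rate over 500 balls, .20 league mean.
print(f"Regressed LD rate: {regressed_rate(0.25, 0.20, 500, HALF_POINT['LD']):.3f}")
```

Regressing each rate separately like this does not force the regressed rates to sum to 1.00, which is the open issue noted in the parenthetical.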

So, if you see someone with a big LD/BIP rate, that doesn’t mean he’ll keep it up. Chances are that a lot of those LD will turn into OF or IF (see 2nd chart above).


#4    studes      (see all posts) 2006/07/26 (Wed) @ 17:30

“Well, the FB% is nice as well, even though it’s 1 minus the other two, because if you cut/paste your data into Excel, it saves the person from doing the calculation.”

As you know, there is a finite amount of space on a web page.  We don’t create THT’s pages to include every conceivable stat.  We create them to concisely tell a ballplayer’s story.  Plus, if someone has to create a simple formula to calculate a stat that is essentially the inverse of another, I don’t believe you should include it in a web page like ours.  That would be like presenting both DER and BABIP for a pitcher.

My question when people ask for another stat is: okay, what stat would you give up in its place?

To me, IF/F makes a lot of sense. There is a strong correlation between flies and infield flies, so infield flies will generally follow the FB rate. But pitchers can vary in their infield fly rate per total flies. MGL analyzed this and presented IF/F as the most meaningful cut of the data a couple of years ago, and I still believe it’s the best way to look at the dynamic.


#5    tangotiger      (see all posts) 2006/07/26 (Wed) @ 19:22

Seeing that both IF and LD have the same correlation (but reversed) to the OF, I would argue that how you treat the IF is how you treat the LD.

So,
- GB/Contacted Balls
- LD/Air Balls
- IF/Air Balls
would give you the minimum number of columns, and most meaningful.


#6    studes      (see all posts) 2006/07/26 (Wed) @ 20:10

But now you’re backing off your original point and making FB% less intuitive and more complex: (1-GB%)*(1-LD/Air).

Plus, LD/Contacted balls is closely correlated with BABIP, an important link in how we present our stats.
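To make the FB% formula in this comment concrete, here is a small sketch using Teahen’s 2006 numbers from the post (24 LD, 65 FB, 50% GB). It assumes that the 65 FB include infield flies and that air balls are simply LD + FB; the function name `fb_pct` is invented for the example.

```python
def fb_pct(gb_pct: float, ld_per_air: float) -> float:
    """Recover FB per contacted ball from GB% and LD/Air: (1 - GB%) * (1 - LD/Air)."""
    return (1 - gb_pct) * (1 - ld_per_air)

ld, fb = 24, 65              # Teahen's 2006 counts from the post
gb_pct = 0.50                # his 2006 GB% from the post
ld_per_air = ld / (ld + fb)  # assumption: air balls = LD + FB

print(f"LD/Air = {ld_per_air:.2f}")
print(f"FB%    = {fb_pct(gb_pct, ld_per_air):.1%}")
```

So the reader has to do a two-step multiplication to get back to a number that a plain FB% column would show directly.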


#7    tangotiger      (see all posts) 2006/07/26 (Wed) @ 21:23

My point is still what I originally said. I was trying to say that if you want to present the minimum number of columns, and make them the most meaningful, then present the columns I just listed.

As for LD/Contacted Balls being more closely related to BABIP than LD/Airball, I’ll have to check it out.


#8    tangotiger      (see all posts) 2006/07/27 (Thu) @ 08:27

A best-fit of the four batted ball types against BABIP gives me:

BABIP
= 3.07
- 2.79 * GB
- 3.29 * IF
- 2.34 * LD
- 2.89 * OF

The p-value is .04 for IF, and .15 for LD.

The r of the above equation is .642


#9    studes      (see all posts) 2006/07/27 (Thu) @ 12:40

Wow.  Crazy equation, Tango.  By “GB” etc., do you mean “GB%”?  I assume so.  Interpreting then…

100% GB = .280 BABIP
100% IF = -.220 BABIP
100% LD = .730 BABIP
100% OF = .180 BABIP

Of course, the equations fall apart at these extremes, but this makes sense to me, and reinforces the previous research I’ve seen and done.
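Those extremes come from plugging a 100% share of one batted ball type into the posted equation. A minimal sketch, using only the coefficients as posted (the names `COEF`, `INTERCEPT` and `babip_fit` are made up here):

```python
# Coefficients exactly as posted in the best-fit equation above.
INTERCEPT = 3.07
COEF = {"GB": -2.79, "IF": -3.29, "LD": -2.34, "OF": -2.89}

def babip_fit(shares: dict) -> float:
    """Evaluate the posted best-fit equation for a dict of batted ball shares."""
    return INTERCEPT + sum(COEF[kind] * shares.get(kind, 0.0) for kind in COEF)

# A hitter made up entirely of one batted ball type.
for kind in COEF:
    print(f"100% {kind}: {babip_fit({kind: 1.0}):+.2f}")
```

Since the four shares sum to 1, each result is just the intercept plus that type’s coefficient.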


#10    tangotiger      (see all posts) 2006/07/27 (Thu) @ 13:23

Forcing the constant at zero, the coefficients are:
GB .28
IF -.23
LD .71
OF .18

Since we know that the IF should have a coefficient close to zero, we can force that in as well, to give us:
GB .27
IF 0
LD .72
OF .16

The OF is probably as low as it is, since the HR is counted in the OF, but it is not part of the BABIP plays.

r of above is .62

In the end, you should just use whatever the league mean is for each component.


#11    tangotiger      (see all posts) 2006/07/27 (Thu) @ 13:31

hmmm… the opposite happened.  I removed HR from the OF, and now the coefficients are:
GB .28
IF 0 (forced)
LD .76
FB .08

Anyway…


#12    David Smyth      (see all posts) 2006/07/27 (Thu) @ 16:24

“My question when people ask for another stat is, what stat would you give up in its place?”

Well, lots of the stats given are basic ones, easily available on general sports sites such as ESPN. So, get rid of those (as necessary) and focus on the stats more unique to THT, such as batted ball stats and Win Shares.

For me, I don’t want someone else, even someone like Tango or Studes, deciding which stats are “better”. So, just give me the outcome types per PA, and let me decide how I want to combine/group them for some purpose.


#13    tangotiger      (see all posts) 2006/07/28 (Fri) @ 07:49

From MGL’s DIPS Revisited article: the first table shows the % of hits on each type of ball in play. Those numbers are:

GB: .23
IF: .04
LD: .75
OF: .14

The overall average is .301 (BABIP).
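Using league hit rates like these, an expected BABIP is just a weighted average over a hitter’s batted ball mix. The sketch below uses the four rates listed above; the batted ball mix is entirely hypothetical (it is not Teahen’s or anyone else’s), and the code does not settle how home runs should be treated within OF.

```python
# Hit rates per batted ball type, as listed above from MGL's article.
HIT_RATE = {"GB": 0.23, "IF": 0.04, "LD": 0.75, "OF": 0.14}

def expected_babip(mix: dict) -> float:
    """Weighted average of the league hit rates, weighted by batted ball shares."""
    total = sum(mix.values())
    return sum(HIT_RATE[kind] * share for kind, share in mix.items()) / total

# Hypothetical batted ball mix, for illustration only.
example_mix = {"GB": 0.45, "IF": 0.04, "LD": 0.20, "OF": 0.31}
print(f"Expected BABIP: {expected_babip(example_mix):.3f}")
```

Because LD carries by far the largest rate, small changes in the LD share move the result more than equal changes in any other type.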


#14    Mike      (see all posts) 2006/07/28 (Fri) @ 08:35

Does anyone have the link to MGL’s article about IF/F? I wanted to read over it again if it’s still available online or at archive.org.


#15    Marc Normandin      (see all posts) 2006/07/28 (Fri) @ 13:17

“For me, I don’t want someone else, even someone like Tango or Studes, deciding which stats are “better”. So, just give me the outcome types per PA, and let me decide how I want to combine/group them for some purpose.”

I think I’m inclined to agree with David on that point. I’d prefer access to more data, rather than less, and the data THT has on the player cards is part of the reason I go there so often.

