
THE BOOK--Playing The Percentages In Baseball


Friday, December 21, 2007

Dan Fox’s Manny

By Tangotiger, 03:52 PM

No way:

I don’t believe there’s an issue with the sample being biased though (for example Manny is a large part of the data set above) since we’re comparing all performances at home with those on the road and if he stinks at Fenway he should probably stink on the road which would cancel each other out.

Maybe Dan hasn’t read my With or Without You (WOWY) in THT08 Annual.  But, there’s simply no reason to create a park factor for Manny that actually uses his own performance.  You’re not going to create a HR park factor for LHH at Yankee Stadium by using Babe Ruth, and then applying it to Babe Ruth.  You need to look at how Manny does compared to others in the same contexts.  You can’t compare Manny to half of Manny.

Normally, for park factors, a guy might be 5% of the sample, which is why we don’t care too much about removing the offending player.  But, in this case, he makes up 50% of the sample.  The shortcut we allowed in the 5% case can no longer apply here.  You may still get the same conclusion in this particular case, but that’s beside the point.


#1    Rally      2007/12/21 (Fri) @ 19:33

How does MGL calculate his UZR park factors?

I see it as a necessary evil.  If you don’t include Manny in the calculation, then you’ve got to use different park factors for every player the Red Sox put out there, or for any other team/outfield spot.

You could use visiting players only, but some teams, like the Angels, hit extremely well at home for no obvious reason.  I think you need to balance it out by using both home and road data.

It’s a lot easier, and I checked my Totalzone numbers:  Manny comes out, for 2007, at -4.2 runs.  If I take him out of the park factor calculation, which uses 5 year data (keeping other Red Sox LF in) it changes to ... -4.1 runs.

All of that difference for an hour of work.  From a cost/benefit standpoint, doesn’t seem worth it to me.


#2    tangotiger      2007/12/21 (Fri) @ 21:39

Very interesting, for him.  I’d like to see this done for other extreme players, too.  I’d be surprised if we don’t find a few guys who are off by at least a few runs.

The case for Manny, as Dan Fox is implying here, is that he sucks equally at home and on the road, so that his performance moves lock-step with the rest of the LF in the league who play at Fenway.  Certainly after a few years, I can buy it.  For one year, no.  But, in this case, it did.

I have to believe that there are several players every year where this process simply won’t work.

***

Also look at say HR factor for LHH at (whatever Giants stadium is now called).  If someone were to do one with Barry, and one without, I’d expect to see a noticeable difference.


#3    Rally      2007/12/21 (Fri) @ 22:30

One thing that helps is that Manny misses a decent number of games; he’s not 50% of the sample, more like 40%.

The way I calculate, and I think Dan does as well, is take the league average for each of the 4 cells (line, fly, right, left) and apply the park adjustment.  League average is year specific and park adjustment is based on 5 years.  Then compare the players.  I get Ramirez at -25 total over the 5 years. That’s -17 on the road, -8 at home.
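A minimal sketch of the calculation described above, not Rally's or Dan's actual code. It assumes the four cells are batted-ball type (line drive, fly ball) crossed with batter handedness, and every number in it is made up for illustration. The names `expected_outs` and `outs_above_average` are hypothetical.

```python
# Hypothetical sketch: cell names and all numbers are illustrative assumptions.
CELLS = ("line_left", "line_right", "fly_left", "fly_right")


def expected_outs(chances, league_out_rate, park_adj):
    """Outs an average fielder would be expected to make on these chances.

    chances:         balls in play per cell for the fielder
    league_out_rate: year-specific league out rate per cell
    park_adj:        multi-year park adjustment per cell (1.0 = neutral)
    """
    return sum(chances[c] * league_out_rate[c] * park_adj[c] for c in CELLS)


def outs_above_average(chances, outs_made, league_out_rate, park_adj):
    """Actual outs minus park-adjusted expected outs (negative = below average)."""
    actual = sum(outs_made[c] for c in CELLS)
    return actual - expected_outs(chances, league_out_rate, park_adj)


# Made-up inputs for one fielder in one season at an extreme park.
chances = {"line_left": 40, "line_right": 60, "fly_left": 100, "fly_right": 150}
outs_made = {"line_left": 8, "line_right": 14, "fly_left": 38, "fly_right": 55}
league_out_rate = {"line_left": 0.25, "line_right": 0.25,
                   "fly_left": 0.80, "fly_right": 0.80}
park_adj = {"line_left": 1.0, "line_right": 1.0, "fly_left": 0.5, "fly_right": 0.5}

print(round(expected_outs(chances, league_out_rate, park_adj), 1))
print(round(outs_above_average(chances, outs_made, league_out_rate, park_adj), 1))
```

Converting outs to runs needs a run value per play, which the thread does not give, so the sketch stops at outs.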

Another way is to ignore the league and look at how other left fielders, Red Sox and opponents, do in their games and compare Manny to that.  Of course the problems are that Red Sox hitters are pretty good, and other LF on the team may be above or below average, but it’s worth a look anyway.

Manny comes out at -35 that way, not much difference, 2 runs per year.  That breaks down as -16 on the road, -19 at home.


#4    DanAgonistes      2007/12/22 (Sat) @ 03:23

Rally, yes that’s the way I’m doing it and I did consider the other way you mention - simply compare Manny to all non-Red Sox left fielders at Fenway for as many years as you’re going to compute park factors. I don’t think I ran the latter but I’ll do it and see what happens.

Using the first method for 2005-2007 and using only visiting players at Fenway and home players in Red Sox road games, the fly ball park factors come out to 1.93 for lefties and 2.08 for righties. Close but just a little higher than what I ended up with. But I certainly understand Tango’s point and sympathize with it.

Whatever else might be true, I certainly need to use 5-year instead of 3-year factors.


#5    DanAgonistes      2007/12/22 (Sat) @ 03:24

And Tom, I did read your essay tonight :)

Excellent as usual. Love the Jeter piece as well.


#6    tangotiger      2007/12/22 (Sat) @ 13:15

Cool, glad you liked it!

***

From 2000-2006, Barry Bonds (LHH obviously) has hit 144 HR at home and 145 on the road.  All other LHH at his home park (mates and opponents) have hit 231 HR, and his mates and opponents have hit 403 on the road.

As far as I am concerned, the landscape for Barry is that it depresses HR totals severely at his home park.  I’m not going to also include his HR and say that SF doesn’t depress HR totals that much. 

Barry makes up an enormous population of LHH who hit HR.
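To put numbers on that, here is a rough sketch using only the HR counts quoted above. It treats the raw home/road HR ratio as a stand-in for a park factor, which assumes roughly equal opportunities at home and on the road; a real park factor would work from rates, not counts. The variable and function names are hypothetical.

```python
# LHH home run counts, 2000-2006, as quoted in the comment above.
BONDS_HOME, BONDS_ROAD = 144, 145
OTHERS_HOME, OTHERS_ROAD = 231, 403


def hr_ratio(home_hr, road_hr):
    """Raw home/road HR ratio. Assumption: equal opportunities home and road."""
    return home_hr / road_hr


without_bonds = hr_ratio(OTHERS_HOME, OTHERS_ROAD)
with_bonds = hr_ratio(OTHERS_HOME + BONDS_HOME, OTHERS_ROAD + BONDS_ROAD)
bonds_only = hr_ratio(BONDS_HOME, BONDS_ROAD)

# Bonds' share of all LHH home runs hit in his team's games.
bonds_total = BONDS_HOME + BONDS_ROAD
bonds_share = bonds_total / (bonds_total + OTHERS_HOME + OTHERS_ROAD)

print(round(without_bonds, 3))  # 0.573
print(round(with_bonds, 3))     # 0.684
print(round(bonds_only, 3))     # 0.993
print(round(bonds_share, 3))    # 0.313
```

Including Bonds moves the ratio from about 0.57 to about 0.68, because one hitter with a nearly even split supplies close to a third of the home runs.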

This situation is even more exaggerated if you look at Babe Ruth, at a time when only he hit HR. 

The right thing to do is to make sure that one player does not have a disproportionate share of the weighting when it comes time for various adjustment factors.

While the case of Manny may be proven to work out the same (i.e., he sucks everywhere), a case like LHH HR in SF won’t.  And, it’s incumbent on the guy who does the adjustment factors to show that there is, or is not, bias.

I’m glad Dan said that looking at the data for Manny showed no bias.  But, why not sidestep the whole issue and show the adjustment factors based on what everyone else sees?


#7    MGL      2007/12/22 (Sat) @ 17:47

Tango, the problem with NOT including Bonds in your SF park factor is (simply) that you are reducing your sample size by a lot, and thus the reliability of your park factor.

This is a complicated issue that has several components.  Getting back to your Bonds example:

Let’s say that you want to determine what kind of a park factor to use for Bonds (in order to estimate his park neutral HR rate, I suppose).  If you use everyone else’s data BUT Bonds to determine the park factor in SF, then you have a pretty good indicator of a “true” PF for SF (especially if you have lots of different hitters in your sample and your sample is pretty unbiased and not dominated by any one or more hitters), but you have two problems.  One, as I said, by not using Bonds’s data, you have reduced your sample size and thus the reliability of your PF (you have to regress it more).  Two, if Bonds’ true PF is different from the “generic” one at SF, which you believe is the case with PFs in general (that hitters probably have unique true PFs), then you are in trouble (trying to park adjust Bonds’ HRs using a PF that does not particularly apply to him).

Now, if you again only want to park adjust Bonds’ HRs, so you decide to use only Bonds’ data to come up with a PF, then of course you are essentially just using his road HR rate (and maybe 1/16 of his home data) to determine his overall HR rate, thus reducing your sample of his HRs (cutting it in half).  That may be fine, as I have said that sometimes maybe using a player’s road numbers (and 1/4 or 1/16 of his home numbers of course) for anything is a good way around park factors, especially if you have a large sample.  Of course you have to adjust for “home cooking.”

So the question is, if you want to park adjust Bonds’ HR rate in SF, should you use a PF that only includes other players’ data (and run the risk that SF does not affect Bonds the way it affects other players as evidenced by HIS HR splits), use his own park factor (which means that you are just using his road rate), or some combination?  I think the last (some combination), but I am not sure.

Now what about for all other players?  Which HR park factor should you use?  The one for “all other players” would make sense as it is (hopefully) not dominated by any other players besides Bonds.  However, as I said, by eliminating Bonds from the data, you severely reduce your sample size.  Again, I am not sure which to use. If you include Bonds, you run the risk of “polluting” the PF with Bonds’ unique PF.  I think I might still go with the one that includes Bonds’ data (again, if I am computing a PF to use for other players besides Bonds).

So I am not sure that this issue, Tango, is as clear cut as I think you are making it out to be.

To me, the bottom line with Manny is that I have his UZR on the road as quite bad, although not nearly as bad as at home, using my Fenway park factors, so I am quite sure that Manny is a very bad fielder, especially when we regress ANY sample data towards that of a slow, lazy, overweight outfielder.  I think that Dewan has him as quite poor on the road too.

I don’t know how/where Dan gets his road results from.  If they are just based on OF flies and outs (IOW, not using a rigorous PBP methodology), then they are most likely “wrong.”

If we have a unique park, like LF at Fenway, at the very least, we can take a player’s road stats (which are not really subject to park factors) and then combine them with a player’s “home stats compared to all other players at that park,” but heavily regressed for want of a large sample size and perhaps adjusted for those “other players.”

I have not done this, but given that we have good PBP location data, it is not that hard to figure a player’s defensive performance as compared to an average player at ANY park.  IOW, rather than using an overall “park factor” per se, we look at how average fielders handle the various types of balls hit to LF at Fenway and simply compare Manny (or any other fielder in Fenway) to that, on a “bucket by bucket” basis. One nice thing is that we have lots of historical data for Fenway park.  I have 10 years of PBP data.  For example, a 310 foot fly ball is never caught in certain slices in LF at Fenway, so for Manny or any other player, we ignore those fly balls.  Because everyone plays shallower at Fenway than a typical park, a pop fly or line drive to a shallow part of the OF is caught more often in Fenway on the average, so we use that as a baseline for all of Manny’s and other Fenwayers’ balls.  Etc.  It is not that hard.
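A rough sketch of that bucket-by-bucket comparison, under stated assumptions: each play is reduced to a `(bucket, caught)` pair, the bucket labels and all plays below are made up, and the function names are hypothetical. This is not MGL's UZR code. Whether the baseline plays should include the fielder being rated is exactly what the thread is arguing about, so the sketch leaves that choice to the caller.

```python
from collections import defaultdict


def bucket_baseline(park_plays):
    """Catch rate per bucket over all plays supplied for this park.

    park_plays: iterable of (bucket, caught) pairs, caught being True/False.
    """
    made = defaultdict(int)
    total = defaultdict(int)
    for bucket, caught in park_plays:
        total[bucket] += 1
        made[bucket] += 1 if caught else 0
    return {bucket: made[bucket] / total[bucket] for bucket in total}


def plays_above_baseline(fielder_plays, baseline):
    """Sum of (actual - baseline catch rate) over the fielder's plays."""
    diff = 0.0
    for bucket, caught in fielder_plays:
        rate = baseline.get(bucket, 0.0)
        if rate == 0.0:
            # Never caught by anyone at this park (or no data): ignore the ball.
            continue
        diff += (1.0 if caught else 0.0) - rate
    return diff


# Made-up plays for all fielders at one park.
park_plays = (
    [("shallow", True)] * 3 + [("shallow", False)]
    + [("medium", True)] * 2 + [("medium", False)] * 2
    + [("off_wall", False)] * 2
)
baseline = bucket_baseline(park_plays)

# Made-up plays for one fielder at that park.
fielder_plays = [("shallow", False), ("shallow", False),
                 ("medium", True), ("off_wall", False)]

print(baseline)
print(plays_above_baseline(fielder_plays, baseline))  # -1.0
```

Because the baseline comes from the park itself, no separate park factor is applied; the uncatchable bucket drops out and the shallow bucket carries the park's higher catch rate.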


#8    tangotiger      2007/12/22 (Sat) @ 18:22

Ideally, you would weight each player the same, so that your factors are based on a representative sample.  Once you include Bonds in your LH SF park sample, you are biasing the results severely.  Clearly, Bonds is not affected by the SF park like the typical MLB hitter.

So, we have two issues:
1. including Bonds to come up with an overall LHH SF park factor, which you then apply to Bonds (which is definitely wrong)

2. including Bonds for the other hitters, which obviously means that the sample is not representative of the typical hitter

At the very least, you need to make your sample representative.  If it’s not, then it’s biased.  You could create groups of players.  If the percentage of HR hit by “power hitters” in MLB is 10% (as an illustration), then you need to stick to that.  The key is to make your sample representative, and not assume that it is.
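One way to sketch the grouping idea: compute a home/road ratio per group, then combine the groups using league-wide shares instead of whatever mix happens to show up in the park's sample. The weighted average of ratios is an assumed estimator, and the group names, counts, and shares below are all illustrative, not real data.

```python
def pooled_ratio(group_counts):
    """Home/road HR ratio with every group lumped together (sample-weighted)."""
    home = sum(h for h, _ in group_counts.values())
    road = sum(r for _, r in group_counts.values())
    return home / road


def representative_ratio(group_counts, league_share):
    """Per-group home/road ratios combined with league-wide weights.

    group_counts: {group: (home_hr, road_hr)}, road_hr assumed > 0
    league_share: {group: share of league HR}, normalized here to sum to 1
    """
    total_share = sum(league_share[g] for g in group_counts)
    return sum(
        (league_share[g] / total_share) * (home / road)
        for g, (home, road) in group_counts.items()
    )


# Made-up park sample in which power hitters are heavily over-represented.
group_counts = {"power": (100, 100), "other": (50, 100)}
league_share = {"power": 0.25, "other": 0.75}

print(pooled_ratio(group_counts))                        # 0.75
print(representative_ratio(group_counts, league_share))  # 0.625
```

In this toy case the pooled figure understates the park's effect on the typical hitter because the over-represented group is the one the park barely touches.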

As it turns out, one-third of all LHH HR in SF games are hit by Barry Bonds.  That can’t be good for any park adjustment.

Getting back to Manny, if he’s involved in 40% of all plays in LF in Boston games, that can’t be good for any park adjustment, especially for a park that is so specific like Fenway.

If you have a situation that can be leveraged by a specific trait, then your adjustments will be very questionable.  Perhaps at Fenway, you don’t need much speed (or you need a lot of speed, who knows).  The result is that if you have a player tailor-made (or opposite), then the adjustment will be biased.


#9    Anthony      2007/12/22 (Sat) @ 20:40

I know sample size will be an issue here, but can we figure HR park factors by splitting players into high/low outfield fly % and pulled outfield fly %?  If Bonds’s park affects him differently than the league at large, it presumably would affect other lefthanded pull hitters the same way.


#10    Guy      2007/12/22 (Sat) @ 21:54

I think the answer here may depend on what you want to use a park factor for.  Let’s assume, based on Tango’s data, that SF suppresses HRs for a typical LHH by 43% (in SF), but doesn’t suppress HRs at all for Bonds (ignoring the homefield edge, just to keep it simple).  So why do we need this park factor?
1) To project how many HR Bonds will hit if he moves to a new park;
2) To project how many HR another LHH will hit when moving to SF;
3) To measure the value of the HRs Bonds did hit.

For #1, we certainly don’t want to use a no-Bonds PF-- that would tell us his HRs would increase 37%, when his own H/R #s suggest he won’t increase at all.
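One reading that reproduces both the 43% and the 37% figures from the counts in comment #6: take the no-Bonds home/road ratio as the park effect, scale only Bonds' home HR up to the road rate, and compare the adjusted total to his actual total. This is a guess at the arithmetic, not something spelled out in the thread, and the variable names are hypothetical.

```python
# LHH home run counts, 2000-2006, from comment #6.
bonds_home, bonds_road = 144, 145
others_home, others_road = 231, 403

# Park effect for the typical LHH, using everyone except Bonds.
no_bonds_ratio = others_home / others_road
suppression = 1 - no_bonds_ratio

# Assumption: only the home HR are adjusted; road HR are taken as neutral.
adjusted_home = bonds_home / no_bonds_ratio
adjusted_total = adjusted_home + bonds_road
increase = adjusted_total / (bonds_home + bonds_road) - 1

print(round(suppression, 2))     # 0.43
print(round(adjusted_total, 1))  # 396.2
print(round(increase, 2))        # 0.37
```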

For #2, I think we want to use a sample in which Bonds counts only a small amount or none at all (the Tango position), and we’d project a big HR decline for this non-Bonds LHH. 

For #3, I don’t think we want the Tango (no-Bonds) PF.  That would tell us Bonds’ park-adjusted HR total was 37% higher than his actual total.  But it’s not true that Bonds would hit that many in a neutral park.  It IS true that other LHHs, including SF’s opponents, will hit fewer HRs, making Bonds’ HRs more valuable.  But it seems to me that the value of HRs in SF isn’t dependent on handedness, it should depend on the overall offensive environment created by that park.  So, if the HR PF for RHH were zero, then Bonds would only get a PR bonus of say 12-13%.  In other words, we wouldn’t want to use a LHH-only park factor in the context of determining offensive value.

