Tuesday, December 19, 2006
Ratios or Rates?
I am trying to convince JC over at Sabernomics that there is a huge difference between using GB/FB ratio, FB/GB ratio, and GB/(GB+FB) or GB rates. Head on over there. Below is a summary of my posts.
We talked about this at the time. Are you still doing GB to FB *ratios*, or GB rates? Ratios are not symmetrical. Whether you use GB per GB+FB or FB per GB+FB, you should end up with the same result, and rates give you that. Ratios don’t.
***
As for using the ratio, how do you justify using GB/FB instead of the reverse? What you are saying, by using GB/FB, is that a high GB count matters more than a high FB count. That is, let’s say the GB/FB has a mean of 1.00. A GB/FB of 2.0 is the same as a FB/GB of 0.50. But the GB/FB version sits 1.00 away from the mean while the FB/GB version sits only 0.50 away, so using GB/FB as the ratio has double the impact of FB/GB, even though they are describing the exact same thing.
Just because something fits the sample better doesn’t mean that it’s the right thing. A best-fit analysis would put the run value of a double at .66 and a single at .52 (instead of the truer .77 and .47).
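Here is a quick sketch of that arithmetic, for anyone who wants to see it. The 1.00 mean and the 2.0 ratio are the illustrative numbers from the paragraph above, not real data, and the variable names are my own.

```python
# One ground-ball hitter, described two ways.
# Assumption: the mean ratio is 1.00 in both directions (illustrative only).
mean_ratio = 1.00

gb_per_fb = 2.0              # the hitter as a GB/FB ratio
fb_per_gb = 1 / gb_per_fb    # the same hitter as a FB/GB ratio (0.5)

# Distance from the mean depends on which way the ratio is written.
dev_gb_fb = gb_per_fb - mean_ratio   # +1.0
dev_fb_gb = fb_per_gb - mean_ratio   # -0.5

# Rates do not have this problem: GB/(GB+FB) and FB/(GB+FB) sum to 1,
# so they sit the same distance from .500 on opposite sides.
gb_rate = gb_per_fb / (gb_per_fb + 1)   # 2/3
fb_rate = 1 - gb_rate                   # 1/3
dev_gb_rate = gb_rate - 0.5
dev_fb_rate = fb_rate - 0.5

print(dev_gb_fb, dev_fb_gb)       # ratio deviations: unequal in size
print(dev_gb_rate, dev_fb_rate)   # rate deviations: equal and opposite
```

The ratio deviations come out twice as large in one direction as the other, while the rate deviations mirror each other.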
***
Ah, but the coefficient will not change accordingly. What will happen is this: now the guys with the highest FB/GB ratio will move *more* than the high GB/FB ratio players.
Think of it in an extreme situation: you have a guy with 100 GB and 1 FB. In your current PrOps, this guy has a 100.00 value, which you multiply by some coefficient, say “.002”. So, he moves up .20 points. If on the other hand you used FB/GB ratio, your coefficient may be “-.002”, which, multiplied by 1/100 (or .01), will be essentially zero.
From where I sit, using GB/FB taints your process, because a high GB count ends up with more impact than an equally high FB count.
If you create a FB/GB version of PrOps, show your results both ways (old PrOps, new PrOps) for Frank Thomas and Derek Jeter, and you will see the impact of this bias.
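The extreme case above works out like this. The .002 coefficient is the made-up number from the example, not a fitted PrOps value, and the "mirror" hitter (1 GB, 100 FB) is my own addition to show the other side.

```python
# Extreme hitter from the example: 100 GB, 1 FB.
gb, fb = 100, 1
coef = 0.002   # illustrative coefficient, not taken from any real regression

# Adjustment if the model uses GB/FB, versus FB/GB with the sign flipped.
gb_fb_term = coef * (gb / fb)     # 0.002 * 100  -> about 0.2
fb_gb_term = -coef * (fb / gb)    # -0.002 * 0.01 -> about -0.00002

# Hypothetical mirror-image hitter: 1 GB, 100 FB, same GB/FB model.
mirror_term = coef * (1 / 100)    # about 0.00002, nowhere near -0.2

# With rates, the two hitters are equally far from .500.
gb_rate = gb / (gb + fb)          # 100/101
mirror_gb_rate = 1 / (1 + 100)    # 1/101

print(gb_fb_term, fb_gb_term, mirror_term)
print(gb_rate - 0.5, 0.5 - mirror_gb_rate)
```

Under the GB/FB model the extreme ground-ball hitter moves by about .20, while his fly-ball mirror image barely moves at all. Under rates they are treated symmetrically.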
***
I just ran three different regressions, using GB/FB ratio, FB/GB ratio, and GB/(GB+FB) or GB rate. These were run against GPA from the THT site. (The use of GPA, or OPS, etc, doesn’t really matter.) I used 2004-2006 data of all players with at least 502 PA.
The 2006 Frank Thomas is the most extreme, with a FB/GB of 2.44. The three regressions gave him estimated GPAs of: .287, .313, .298.
At the other end is the 2004 Ichiro, with a GB/FB of 3.55. His estimates are: .247, .261, .255.
The sample standard deviations are: .0057, .0072, .0071
In all cases, the mean was .276.
GPA is analogous to batting average. Those are some HUGE differences, don’t you think?
***
The correlation coefficients (r) were .21, .26, .26. And, it should go without saying, using FB/(GB+FB) produced the exact same estimated GPA for each player as the GB rate, as well as the exact same r (apart from the sign).
Well, JC, rather than refuting my arguments, chose to simply refute me! This is my last post there:
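If you want to check the symmetry claim yourself, here is a minimal sketch. The six hitters are invented numbers (GB, FB, GPA), not the 2004-2006 sample, and `fit_line` and `estimates` are hypothetical helper names doing a plain one-variable least-squares fit.

```python
def fit_line(x, y):
    """One-variable least squares: returns slope, intercept, and r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5

def estimates(x, y):
    """Fitted y value for every player, plus r."""
    slope, intercept, r = fit_line(x, y)
    return [intercept + slope * a for a in x], r

# Invented data for illustration only: (GB, FB, GPA).
hitters = [(250, 100, .250), (200, 150, .265), (180, 180, .275),
           (150, 210, .290), (120, 260, .300), (230, 120, .262)]
gpa = [h[2] for h in hitters]

est_gb_fb, r_gb_fb = estimates([g / f for g, f, _ in hitters], gpa)
est_fb_gb, r_fb_gb = estimates([f / g for g, f, _ in hitters], gpa)
est_gb_rate, r_gb_rate = estimates([g / (g + f) for g, f, _ in hitters], gpa)
est_fb_rate, r_fb_rate = estimates([f / (g + f) for g, f, _ in hitters], gpa)

# The two rates agree player by player; the two ratios do not.
rate_gap = max(abs(a - b) for a, b in zip(est_gb_rate, est_fb_rate))
ratio_gap = max(abs(a - b) for a, b in zip(est_gb_fb, est_fb_gb))
print(rate_gap, ratio_gap)
print(r_gb_rate, r_fb_rate)   # same size, opposite sign
```

The rate versions give the same estimate for every player because FB rate is just 1 minus GB rate, so the fitted line is the same line read from the other end. GB/FB and FB/GB are reciprocals, not mirror images, so their fitted lines disagree, most of all at the extremes.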
=======================================
My post #9 clearly shows that it makes a huge difference for the extreme GB and FB hitters, if you use GB/FB or FB/GB. However, there is no change whatsoever if you use GB/(GB+FB) or FB/(GB+FB).
There is no justification for using one ratio (GB/FB) over the other (FB/GB), even though they absolutely give you different results. In fact, you are not even justifying it. Just deciding to use it.
There is zero opportunity cost to changing from ratios to rates, since I was able to generate 3 different regression equations in 5 minutes.
Your thread started with a comment about fans being skeptical, and here I am, giving you a thoughtful and legitimate beef, and you are dismissing it with “if you don’t like it, don’t use it”.
======================================
Hey, I know I can be tough, but when I’m right, I’m right. JC is wrong, Clay is wrong (about EqA), Woolner is wrong (about Leverage Index), Forman is wrong about using all those James/Palmer metrics. Heck, I was wrong about BaseRuns, until David finally convinced me to open my eyes.
Either you open your eyes, put your ego to the side, and learn, or remain wrong. My attitude is completely irrelevant.