Tuesday, December 19, 2006
Ratios or Rates?
I am trying to convince JC over at Sabernomics that there is a huge difference between using GB/FB ratio, FB/GB ratio, and GB/(GB+FB) or GB rates. Head on over there. Below is a summary of my posts.
We talked about this at the time. Are you still using GB-to-FB *ratios*, or GB rates? Ratios are not symmetrical. Whether you use GB per GB+FB or FB per GB+FB, you should end up with the same result, and ratios don’t do that.
***
As for using the ratio, how do you justify using GB/FB instead of the reverse? What you are saying, by using GB/FB, is that a high GB tendency carries more weight than a high FB tendency. That is, let’s say GB/FB has a mean of 1.00. A GB/FB of 2.0 is the same as a FB/GB of 0.50. But the GB/FB of 2.0 sits 1.0 above the mean, while the FB/GB of 0.50 sits only 0.50 below it. So using GB/FB as the ratio has double the impact of FB/GB, even though they are describing the exact same thing.
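To make the asymmetry concrete, here is a minimal sketch in Python. The batted-ball counts (200 GB, 100 FB) are made up for illustration, and the means of 1.00 for the ratios and .500 for the rates are assumptions carried over from the example above, not measured league values.

```python
# Hypothetical hitter: 200 ground balls, 100 fly balls (illustrative only).
gb, fb = 200, 100

# Assumed means: 1.00 for either ratio, 0.50 for either rate.
MEAN_RATIO = 1.0
MEAN_RATE = 0.5

# Deviation from the mean under each of the four measures.
dev_gb_fb = gb / fb - MEAN_RATIO            # ratio, GB on top
dev_fb_gb = fb / gb - MEAN_RATIO            # ratio, FB on top
dev_gb_rate = gb / (gb + fb) - MEAN_RATE    # rate, GB share
dev_fb_rate = fb / (gb + fb) - MEAN_RATE    # rate, FB share

print(f"GB/FB deviation:   {dev_gb_fb:+.3f}")
print(f"FB/GB deviation:   {dev_fb_gb:+.3f}")
print(f"GB rate deviation: {dev_gb_rate:+.3f}")
print(f"FB rate deviation: {dev_fb_rate:+.3f}")
```

The two ratio deviations differ in size (1.0 versus 0.5) for the same hitter, while the two rate deviations are equal and opposite. That is the symmetry the rates have and the ratios lack.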
Just because something fits the sample better doesn’t mean that it’s the right thing. A best-fit analysis would give the run value of a double as .66 and a single as .52 (instead of the truer .77 and .47).
***
Ah, but the coefficient will not change accordingly. What will happen is this: now the guys with the highest FB/GB ratio will move *more* than the high GB/FB ratio players.
Think of it in an extreme situation: you have a guy with 100 GB and 1 FB. In your current PrOps, this guy has a 100.00 value, which you multiply by some coefficient, say “.002”. So, he moves up .20 points. If on the other hand you used FB/GB ratio, your coefficient may be “-.002”, which multiplied by 1/100 (or .01) is essentially zero.
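The arithmetic of that extreme case can be sketched in a few lines of Python. The coefficients of .002 and -.002 are the illustrative values from the paragraph above, not fitted PrOps coefficients, and the helper `ratio_terms` is a hypothetical name. It computes only the raw coefficient-times-ratio term and ignores any intercept.

```python
# Illustrative coefficients from the example above (not fitted values).
COEF_GB_FB = 0.002
COEF_FB_GB = -0.002

def ratio_terms(gb, fb):
    """Return the (GB/FB term, FB/GB term) for one hitter."""
    return COEF_GB_FB * (gb / fb), COEF_FB_GB * (fb / gb)

# Extreme ground-ball hitter and his mirror image, the extreme fly-ball hitter.
gb_guy = ratio_terms(gb=100, fb=1)
fb_guy = ratio_terms(gb=1, fb=100)

print(f"100 GB, 1 FB: GB/FB term {gb_guy[0]:+.5f}, FB/GB term {gb_guy[1]:+.5f}")
print(f"1 GB, 100 FB: GB/FB term {fb_guy[0]:+.5f}, FB/GB term {fb_guy[1]:+.5f}")
```

Under GB/FB, the extreme ground-ball hitter moves by .20 and the extreme fly-ball hitter barely moves. Under FB/GB it is the other way around. Which of the two mirror-image hitters gets moved depends entirely on which number you put on top.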
From where I sit, using GB/FB taints your process: a high GB count has more impact than an equally high FB count.
If you create a FB/GB version of PrOps, show your results both ways (old PrOps, new PrOps) for Frank Thomas and Derek Jeter, and you will see the impact of this bias.
***
I just ran three different regressions, using GB/FB ratio, FB/GB ratio, and GB/(GB+FB) or GB rate. These were run against GPA on the THT site. (The use of GPA, or OPS, etc., doesn’t really matter.) I used 2004-2006 data of all players with at least 502 PA.
The 2006 Frank Thomas is the most extreme, with a FB/GB of 2.44. The three regressions estimate his GPA at, in that order: .287, .313, .298.
At the other end is the 2004 Ichiro, with a GB/FB of 3.55. His results are: .247, .261, .255.
The sample standard deviations are: .0057, .0072, .0071.
In all cases, the mean was .276.
GPA is analogous to batting average. Those are some HUGE differences, don’t you think?
***
The correlation coefficients (r) were .21, .26, .26. And, it should go without saying, using FB/(GB+FB) produced the exact same estimated GPA for each player as the GB rate, as well as the exact same r.
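If you want to check that last point yourself, here is a rough sketch in Python using a hand-rolled one-variable least-squares fit. The players are synthetic: random GB and FB counts with a made-up relationship to GPA, so the numbers it prints say nothing about real hitters. The helper `ols_fit`, the seed, and every constant in the data generator are assumptions for illustration only.

```python
import random

def ols_fit(x, y):
    """Simple linear regression of y on x; returns (fitted values, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    return fitted, sxy / (sxx * syy) ** 0.5

# Synthetic players: the constants below are arbitrary, not real data.
random.seed(0)
gbs = [random.randint(100, 300) for _ in range(200)]
fbs = [random.randint(100, 300) for _ in range(200)]
gpa = [0.25 + 0.05 * (f / (g + f) - 0.5) + random.gauss(0, 0.02)
       for g, f in zip(gbs, fbs)]

fit_gb_rate, r_gb_rate = ols_fit([g / (g + f) for g, f in zip(gbs, fbs)], gpa)
fit_fb_rate, r_fb_rate = ols_fit([f / (g + f) for g, f in zip(gbs, fbs)], gpa)
fit_gb_fb, r_gb_fb = ols_fit([g / f for g, f in zip(gbs, fbs)], gpa)
fit_fb_gb, r_fb_gb = ols_fit([f / g for g, f in zip(gbs, fbs)], gpa)

# Largest per-player disagreement between each pair of regressions.
max_rate_gap = max(abs(a - b) for a, b in zip(fit_gb_rate, fit_fb_rate))
max_ratio_gap = max(abs(a - b) for a, b in zip(fit_gb_fb, fit_fb_gb))

print(f"GB rate vs FB rate: max gap {max_rate_gap:.2e}, r = {r_gb_rate:+.3f} / {r_fb_rate:+.3f}")
print(f"GB/FB vs FB/GB:     max gap {max_ratio_gap:.2e}, r = {r_gb_fb:+.3f} / {r_fb_gb:+.3f}")
```

Because the FB rate is just one minus the GB rate, the two rate regressions give every player the same estimate up to rounding error, with r equal in size and opposite in sign. The two ratio regressions do not agree with each other, since FB/GB is not a straight-line function of GB/FB.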