Monday, April 20, 2009
Another article by Eric Walker with which I have some disagreements (imagine that!)
Here is the URL:
http://baseballanalysts.com/archives/2009/04/precisely_inacc.php
While he makes some good points, in his usual pedantic and cocksure (I guess it takes one to know one) fashion he draws some incorrect conclusions. Here is the comment I posted on the site:
While the “primer” by Eric is instructive and caution should always be taken when adjusting team or player stats using PF’s…
There are several problems with his thesis:
First of all, all of the issues he speaks of can be handled, statistically, with no problem whatsoever, if one has the know-how and takes the time to do so. So let us not throw the baby out with the bath water by declaring that “All PF’s are next to worthless, don’t work, cannot and should not be used to adjust player and team stats, etc.”
More importantly, even a “bad” park factor can be useful and can be used appropriately.
In fact, consider this statement by Eric:
“The end results are not totally meaningless: we can say with fair credibility that San Diego’s is a considerably more pitcher-friendly park than Colorado’s, and that the Mets and the Marlins were playing in parks without gross distorting effects. But to try to numerically correct any team’s results--much less any particular player’s results--by means of “park factors” is very, very wrong.”
He is 100% correct in the first part. Obviously there is SOMETHING even in “bad” park factors that gives us useful information. Even a “bad” PF generally tells us that COL is more of a hitter’s park than SD or WAS and that TEX is more of a hitter’s park than OAK. So there must be SOME good information in PF’s, which there is of course. It is good information combined with noise (and perhaps some biases).
The last part of his statement is flat out wrong, considering the first part. If the first part is correct, which it is - that even a sloppy PF gives us some useful information, then it HAS to be correct that we can use those PF’s (in some way, shape or form) to adjust player and team stats so that our adjusted stats more accurately reflect a player’s or team’s performance in a park-neutral environment. HAS TO!
That does not mean, however, that we have to use a “bad” park factor the way the purveyor of that PF may or may not want us to use it. For example, let’s say that our “bad” PF tells us that Coors inflates runs (as compared to the whole league) by 20% and that Petco deflates them by 10%. Even Eric would agree that, while he does not trust those numbers exactly, they probably are on the right track and are somewhat in the ballpark, no pun intended.
But he, and other “PF naysayers” don’t want us to use those PF’s at all, at least not quantitatively, to adjust player and team stats, at least according to the last part of the statement from him I quoted above.
Poppycock!
Would it be better to, say, take a run scored in Coors (by some team, player, or whatever) and do nothing, or would it be better to, say, adjust it by 1% (even though our “bad” PF says to adjust it by 20%)? If you said, “Better to adjust (by 1%) than not adjust at all,” you would be correct!
What about Petco and adjusting by 1% versus not adjusting at all? Same answer. I hope everyone gets my point so far.
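To make the 1% argument concrete, here is a minimal sketch in Python. The true park effect is never known exactly in practice; the 1.15 used below is purely an assumed value for illustration, and the function name and the 115-run total are hypothetical as well.

```python
def neutralize(runs, factor):
    """Convert runs scored in a park to a park-neutral figure.

    factor is a run multiplier relative to league average
    (1.20 means the park inflates runs by 20%).
    """
    return runs / factor

TRUE_FACTOR = 1.15     # assumed "real" Coors effect, for illustration only
runs_in_park = 115.0   # hypothetical run total scored at Coors

truth = neutralize(runs_in_park, TRUE_FACTOR)   # what a perfect adjustment gives
no_adjust = neutralize(runs_in_park, 1.00)      # doing nothing
tiny_adjust = neutralize(runs_in_park, 1.01)    # adjusting by only 1%

error_none = abs(no_adjust - truth)
error_tiny = abs(tiny_adjust - truth)
print(f"error with no adjustment: {error_none:.2f} runs")
print(f"error with 1% adjustment: {error_tiny:.2f} runs")
```

As long as the park really does inflate runs by more than a trivial amount, the 1% adjustment lands closer to the neutral figure than no adjustment does.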
So, it is not that these “bad” PF’s are not useful, it is that it is not necessarily correct (although it could be, at least as opposed to not using them at all, as you will see in a minute) to use the exact numerical adjustments that the purveyors of these PF’s might want you to use - for example 20% for Coors and 10% for Petco.
So the answer is not to “not use them at all,” which is (incorrectly) throwing the baby out with the bathwater. The answer is for YOU to figure out how much of the actual PF that some system comes up with to use. You do that by evaluating the rigor of the system and how much data it uses. There is absolutely NO reason not to use, for example, at least half of a run factor derived from a system that takes into consideration each team’s schedule and year-to-year changes in parks and uses 10 years’ worth of data. (And by the way, using 10 years of data in just about any PF “system” is going to yield a number that is more accurate than the same system using 1 year of data, almost no matter what, and it is not going to even be close, as long as that system is halfway decent.)
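Using "at least half" of a run factor can be sketched as shrinking the published number toward 100. This is only an illustration: the function name and the `weight` parameter are hypothetical, and choosing the weight is exactly the judgment call described above (how rigorous the system is and how much data it uses).

```python
def regress_factor(published_pf, weight):
    """Shrink a published park factor (100 = neutral) toward 100.

    weight is the fraction of the published deviation we choose to keep:
    0.0 means ignore the factor entirely, 1.0 means use it as published.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return 100.0 + weight * (published_pf - 100.0)

# Keep half of the published deviation for the two example parks.
coors_half = regress_factor(120, 0.5)
petco_half = regress_factor(90, 0.5)
print(coors_half, petco_half)
```

With a weight of 0.5, a published 120 becomes 110 and a published 90 becomes 95.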
Getting back to the Coors and Petco example above and the “bad” (but still reasonable, like the ESPN one) PF system that gives Coors a 120 rating and Petco a 90 rating: if most of us are in agreement that it would still be correct to adjust Coors and Petco numbers by 1% (in opposite directions of course), then what about 2%? How about 5%?
Where do we cross the line from improving our numbers (by doing SOME adjustments) to doing worse than nothing at all? I don’t know the answer to that as it depends on, as I said, how the particular system is constructed (is it a good one or not), but I can tell you that for almost any halfway decent system you can always use a little bit of a park factor to neutralize performance and do better than nothing at all. And even if 120 and 90 are not correct, it STILL might be better to use the entire numbers than do nothing at all!
Let’s say that the real numbers are 115 and 95, which would not be unreasonable for a system that gets 120 and 90 (there is always going to be regression towards 100). Do you think using 120 and 90 to adjust player and team numbers would be better or worse than using nothing at all? Again, if you said, “Better,” you would be correct!
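The 115/95 versus 120/90 comparison can be checked with a few lines of Python. This is a rough sketch under stated assumptions: the "true" values are the hypothetical 115 and 95 from the paragraph above, the error measure (total absolute miss in neutralized runs across the two parks) is my own choice, and all names are made up for the example.

```python
PUBLISHED = {"Coors": 120.0, "Petco": 90.0}      # the "bad" system's ratings
ASSUMED_TRUE = {"Coors": 115.0, "Petco": 95.0}   # hypothetical real values

def applied_factor(published_pf, weight):
    """Keep only `weight` of the published deviation from 100."""
    return 100.0 + weight * (published_pf - 100.0)

def total_error(weight, runs_neutral=100.0):
    """Sum over both parks of |adjusted runs - neutral runs|.

    Assumes a team that would score runs_neutral in a neutral park.
    """
    total = 0.0
    for park, true_pf in ASSUMED_TRUE.items():
        observed = runs_neutral * true_pf / 100.0
        adjusted = observed / (applied_factor(PUBLISHED[park], weight) / 100.0)
        total += abs(adjusted - runs_neutral)
    return total

# weight 0.0 is "do nothing"; weight 1.0 is the published factor in full.
errors = {w: total_error(w) for w in (0.0, 0.05, 0.25, 0.5, 0.75, 1.0)}
for w, err in errors.items():
    print(f"keep {w:.0%} of the published factor: total error {err:.2f} runs")
```

Under these assumed numbers, every nonzero weight tried here beats doing nothing in total, including the full 120 and 90, and a partial weight does better still. Most of the gain comes from Coors; at Petco alone, the full 90 misses by about as much as no adjustment does.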


Eric’s response to MGL’s critique is disappointing. It consists of a lot of yelling and a declaration that “this is not a religion”.
No Eric, it isn’t. Which is why maybe you should take another look at your own arguments.
He doesn’t even address MGL’s point about regressing the factors. It’s similar to the hand-waving dismissal of multi-year factors that he gives in the original post.