Friday, October 12, 2007
Is MLB’s pitchf/x system accurate?
First of all, another great article and analysis by Dan Fox at BP.
As an aside, he had a great quote at the end of the article (which has nothing to do with the pitch data). People ask me all the time, “Who do you think is going to win the (insert award/series/etc.)?” For example, who do I think is going to win the Indians/BoSox ALCS? Even after painstakingly analyzing the series for several hours, my pat (and factually correct) answer is, “I have no (bleeping) idea!” To borrow a phrase (again) from Bill James, “I am an analyst, not an oracle.” I can tell them the percentage chance of winning I think each team has (based on my model and my analysis), but obviously I cannot tell them who is going to win.
Now let’s say that I had Boston at 65% (which I don’t). If they win, was I “right”? If they lose, was I “wrong”? What about if I had Boston at 51% (which I do) and they win? Right or wrong? Heck if I know the answers to those questions. I DO know, for example, that if I were a perfect modeler and knew the exact percentages in every baseball series, or even in every one of the 2,430 games during the regular season, and all my “rights” were when my favored team won and all my “wrongs” were when my favored team lost, I would be “wrong” a heck of a lot!
As I also like to say, “If a good - no a great - analyst isn’t wrong a heck of a lot, he is probably cheating.”
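To put a rough number on "a heck of a lot," here is a minimal sketch. It assumes, purely for illustration, that the perfect modeler's favorite has a true 55% chance in every one of the 2,430 games; that 55% figure and the function name `expected_wrong` are my own placeholders, not anything from Dan's article or from my model.

```python
def expected_wrong(favorite_probs):
    """Expected number of games in which the favored team loses.

    favorite_probs holds the TRUE win probability of the favorite in each
    game, so each game contributes (1 - p) expected "wrongs".
    """
    return sum(1.0 - p for p in favorite_probs)


# Hypothetical season: every game has a 55% favorite (an assumption).
games = 2430
probs = [0.55] * games

wrong = expected_wrong(probs)
print(f"Expected 'wrong' calls: {wrong:.1f} of {games}")
```

Under that assumption, even a modeler who knows every probability exactly would expect the favorite to lose roughly 1,094 times, or 45% of the time.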
Anyway, here is what Dan wrote about the difference between probabilities (which are all an analyst can offer) and predictions (which are silly and meaningless for an analyst to make):
A subtler but related point in this vein is that some seem to think the models used to discuss events are necessarily predictions and therefore take a “told you so” approach when the end result seems improbable according to the model. But probabilities are not predictions, and so in addition to the fact that the models used to generate the probabilities are incomplete, even events that are unlikely do in fact happen. Only if you could replay the event hundreds or thousands of times could you say with confidence that the model is not useful.
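The "replay it hundreds or thousands of times" point can be sketched with the standard error of an observed win fraction. This is only an illustration under the usual assumption of independent replays with a fixed true probability; the function name `replay_standard_error` is hypothetical.

```python
import math


def replay_standard_error(p, n):
    """Standard error of the observed win fraction after n independent
    replays of an event whose true win probability is p."""
    return math.sqrt(p * (1.0 - p) / n)


# How noisy is the observed win rate for a 51% favorite?
for n in (1, 100, 1000, 10000):
    se = replay_standard_error(0.51, n)
    print(f"{n:>6} replays: standard error = {se:.3f}")
```

With a single series the noise swamps the gap between a 51% and a 65% estimate, which is why one outcome says almost nothing about whether the model was any good.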
Back to the pitch f/x data…