THE BOOK--Playing The Percentages In Baseball

Friday, October 12, 2007

Is MLB’s pitchf/x system accurate?

By , 11:54 PM

First of all, another great article and analysis by Dan Fox at BP.

As an aside, he had a great quote at the end of the article (which has nothing to do with the pitch data).  People ask me all the time, “Who do you think is going to win the (insert award/series/etc.)?” Like who do you think is going to win the Indians/BoSox ALCS?  Even after painfully analyzing the series for several hours, my pat (and factually correct) answer is, “I have no (bleeping) idea!” To borrow a phrase (again) from Bill James, “I am an analyst, not an oracle.” I can tell them the percentage chance of winning I think each team has (based on my model and my analysis), but I cannot tell them “who is going to win (obviously).”

Now let’s say that I had Boston at 65% (which I don’t).  If they win, was I “right?” If they lose, was I “wrong?”  What about if I had Boston at 51% (which I do) and they win?  Right or wrong?  Heck if I know the answers to those questions.  I DO know, for example, that if I were a perfect modeler and knew the exact percentages in all baseball series or even in every one of the 2,430 games during the regular season, and all my “rights” were when my favored team wins and all my “wrongs” were when my favored team loses, I would be “wrong” a heck of a lot!
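To put a rough number on that, here is a minimal Python sketch. The function name `expected_hit_rate` and the flat 55% figure are made up for illustration and are not anyone's actual model. It assumes every stated probability is exactly correct and counts a game as “right” when the favorite wins.

```python
def expected_hit_rate(win_probs):
    """Expected fraction of games in which the favored team wins,
    assuming each probability is the true one (a perfect modeler)."""
    # The favorite is whichever side has the larger win probability.
    favorite_probs = [max(p, 1.0 - p) for p in win_probs]
    return sum(favorite_probs) / len(favorite_probs)


# Hypothetical season: 2,430 games, every favorite a true 55% to win.
season = [0.55] * 2430
rate = expected_hit_rate(season)
expected_wrong = (1.0 - rate) * len(season)

print(f"expected share of 'right' picks: {rate:.3f}")
print(f"expected number of 'wrong' picks: {expected_wrong:.1f}")
```

Under those assumptions, even a perfect modeler picks the loser in about 45% of games, which is more than a thousand “wrong” calls over a full season.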

As I also like to say, “If a good - no a great - analyst isn’t wrong a heck of a lot, he is probably cheating.”

Anyway, here is what Dan wrote about the difference between probabilities (which are all an analyst can offer) and predictions (which are silly and meaningless for an analyst to make):

A subtler but related point in this vein is that some seem to think the models used to discuss events are necessarily predictions and therefore take a “told you so” approach when the end result seems improbable according to the model. But probabilities are not predictions, and so in addition to the fact that the models used to generate the probabilities are incomplete, even events that are unlikely do in fact happen. Only if you could replay the event hundreds or thousands of times could you say with confidence that the model is not useful.

Back to the pitch f/x data…


With respect to a pitch’s deceleration from start to finish, there is only ONE thing that determines that rate (given the same initial velocity, the same wind speed, and the same types of pitches) - the density of the air.  No problem there.

Now, there appear to me to be only three determinants of air density:  one, temperature; two, humidity; and three, altitude.  I THINK that altitude is by far the most important, then temperature, and then humidity.
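As a rough check on that ordering, here is a minimal Python sketch of moist-air density. It is built on assumptions that are not in Dan's article: pressure is taken from the standard-atmosphere formula for the given altitude (real barometric pressure also moves with the weather), and saturation vapor pressure uses the Tetens approximation. The function names are invented for this example.

```python
R_DRY = 287.058    # J/(kg*K), specific gas constant of dry air
R_VAPOR = 461.495  # J/(kg*K), specific gas constant of water vapor


def standard_pressure_pa(altitude_m):
    """Standard-atmosphere pressure at altitude (assumption: ignores weather)."""
    return 101325.0 * (1.0 - 0.0065 * altitude_m / 288.15) ** 5.25588


def saturation_vapor_pressure_pa(temp_c):
    """Tetens approximation for saturation vapor pressure over water."""
    return 610.78 * 10.0 ** (7.5 * temp_c / (temp_c + 237.3))


def air_density(altitude_m, temp_c, rel_humidity):
    """Density of moist air in kg/m^3; rel_humidity is a fraction from 0 to 1."""
    temp_k = temp_c + 273.15
    p_total = standard_pressure_pa(altitude_m)
    p_vapor = rel_humidity * saturation_vapor_pressure_pa(temp_c)
    p_dry = p_total - p_vapor
    # Sum the partial densities of dry air and water vapor.
    return p_dry / (R_DRY * temp_k) + p_vapor / (R_VAPOR * temp_k)


def pct_drop(before, after):
    """Percentage decrease in density going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Illustrative ranges only: sea level vs. 1,600 m, 10 C vs. 30 C, dry vs. saturated.
altitude_drop = pct_drop(air_density(0, 20, 0.5), air_density(1600, 20, 0.5))
temperature_drop = pct_drop(air_density(0, 10, 0.5), air_density(0, 30, 0.5))
humidity_drop = pct_drop(air_density(0, 20, 0.0), air_density(0, 20, 1.0))

print(f"altitude: {altitude_drop:.1f}%  temperature: {temperature_drop:.1f}%  "
      f"humidity: {humidity_drop:.1f}%")
```

With these illustrative ranges, the density drops come out on the order of 18% for altitude, 7% for temperature, and 1% for humidity, which agrees with the ordering guessed at above. Note that more humid air is slightly less dense, not more.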

Given that, it should be easy enough to predict the approximate order of the parks with respect to pitch deceleration.  Or at least which parks will be at the top and which ones at the bottom.

If you look at Dan’s list, however, the parks do NOT seem to fall in the order you would expect, as Dan points out (e.g., Comerica Park is both cold and low in altitude).  If you look at the pitch break lists you see some equally funky rankings of parks.

Either the pitchf/x data, at least with respect to pitch deceleration and break, is VERY inaccurate (and/or there are biases among parks), or there are a lot of unaccounted-for and differing wind patterns in these parks.

#1    tangotiger      2007/10/13 (Sat) @ 07:35

Smooth Jimmy Apollo: Well, folks, when you’re right 52% of the time, you’re wrong 48% of the time.

Homer: Why didn’t you say that before!

***

I would hope that it’s a given that all predictions are made with the understanding that the predictor is expecting to be right 51-60% of the time, unless otherwise noted. There can’t be that many real-life Homer Simpsons, can there be?

Then again, if that’s true, if predictions are made with the expectation of being only a bit over 50% correct, why in the world do we care about individual predictions of individual people?

I also give a similar answer to MGL: I don’t know who will win, and no one else does either.  All I want is a good game.


#2          2007/10/13 (Sat) @ 14:17

Actually, there are only 7 parks with a higher altitude than Comerica Park - Coors Field, Chase Field, Turner Field, Kauffman Stadium, Metrodome, PNC Park and Jacobs Field.  Several other parks (Wrigley Field, U.S. Cellular Field and Miller Park) are essentially at the same altitude, around 600 feet.

Here are some data that I have gathered relating to weather in the various parks:

Wind and Temperature:

http://www.hittrackeronline.com/Average_Weather_2002-06.xls

Humidity:

http://www.hittrackeronline.com/humidity.xls

Turns out only two parks are significantly drier than the rest during afternoon and evening: Coors Field and Chase Field.  One now has a humidor, and the other probably needs one…

