Thursday, April 05, 2007
Pitcher release points and target areas
Is this cool or what? It’s by Joe Sheehan, but not THAT Joe Sheehan. If he keeps doing work like this, the BP Sheehan will be introduced as “not THAT Joe Sheehan”.
Do we think or know that this data is close to 100% accurate? I know that it is captured using software technology, which, in theory, means the data should be almost pinpoint.
However, I do worry that there are some data fudges. For instance, on Enhanced Gameday I have never seen a called strike outside the strike zone (as signified by the box). We know from Questec et al that this isn’t uncommon. Can these rogue strikes be detected with the data?
More interesting work:
http://www.hardballtimes.com/main/printarticle/another-look-at-enhanced-gameday/
And, Joe provides a link to the XML directory, which is here:
http://gd2.mlb.com/components/game/mlb/
Maybe someone proficient in XML can provide some XML parsers or XML databases. For example, is the Berkeley DB XML or Oracle XDB a good solution here?
I think/hope that overall the data is pretty good. I have noticed problems, though: some velocities don’t make any sense, and occasionally the camera seems to capture the pitcher’s head instead of the release point. However, I have seen called strikes that are outside the given box. Check out Julio Lugo’s 1st AB vs. Ohka tonight for an example. I think the problem with called strikes only being in the zone would be with non-enhanced data, where it is entered by hand with a strike zone to guide the stringer.
I too would be very interested in a good XML parser.
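For anyone who wants a starting point, here is a minimal sketch using Python's standard library. The element and attribute names (`atbat`, `pitch`, `des`, `start_speed`, `px`, `pz`) and the sample values are assumptions made up for illustration, not a confirmed description of the files in the directory above, and the strike-zone bounds are placeholder numbers. It flattens the pitches into rows, then picks out called strikes that land outside the box, which is the "rogue strike" check asked about earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names, attribute names and values are invented
# for illustration and may not match the real Gameday files.
SAMPLE = """
<inning num="1">
  <atbat batter="Batter A" pitcher="Pitcher B">
    <pitch des="Called Strike" start_speed="88.1" px="-1.10" pz="2.40"/>
    <pitch des="Ball" start_speed="79.5" px="0.35" pz="4.10"/>
  </atbat>
</inning>
"""


def parse_pitches(xml_text):
    """Flatten every <pitch> element into a dict tagged with its at-bat."""
    root = ET.fromstring(xml_text)
    rows = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            rows.append({
                "batter": atbat.get("batter"),
                "pitcher": atbat.get("pitcher"),
                "des": pitch.get("des"),
                "speed": float(pitch.get("start_speed")),
                "px": float(pitch.get("px")),  # assumed: horizontal location, feet
                "pz": float(pitch.get("pz")),  # assumed: height, feet
            })
    return rows


def called_strikes_outside(rows, half_width=0.83, bottom=1.5, top=3.5):
    """Return called strikes located outside an assumed rectangular zone."""
    return [
        r for r in rows
        if r["des"] == "Called Strike"
        and (abs(r["px"]) > half_width or not bottom <= r["pz"] <= top)
    ]


if __name__ == "__main__":
    for row in called_strikes_outside(parse_pitches(SAMPLE)):
        print(row)
```

Once the pitches are plain rows like this, loading them into a regular database or spreadsheet is straightforward, so a dedicated XML database may not be needed.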
What always amazes me is that these data recording outfits (BIS, STATS, MLB, etc.) that provide data have no oversight. We are supposed to take everything on faith. (I once offered to do a data quality check in return for use of the data, and was turned down flat.)
Yet, when someone like MGL actually provides the blueprint for UZR, he gets assailed and sometimes dismissed!
I know that the MLB.com exec I talk to says that the data quality is not the strongest, so caveat emptor. At least, the data is free. And, from an analysis standpoint, we can always accept 80% or 90% data recording accuracy, rather than having no data at all. At the same time, he says that he won’t implement the stopwatch idea because of the data accuracy limitation!
I’m drooling at the thought of this being parsable. Imagine a merge with Retrosheet’s files. *Sigh*
How many Joe Sheehans are there? I’m counting at least four at the moment:
http://baseballanalysts.com/archives/2007/04/more_fun_with_e_1.php
That’s the same Joe that Tango linked to at the start of this thread. Are there really 2 more?
My ruling:
BP Joe gets to keep his name because he was around first. All other Joe Sheehans must come up with a distinctive nickname, the way Alex Gonzalez became Sea Bass, so we can tell them apart.
It’s the same guy ... trying to be amusing, but spectacularly failed
I should have seen it coming John, my bad.
But sometimes I really am confused. I saw an article recently by a Joe Sheehan, I can’t remember where but not on their usual websites, and I honestly didn’t know if it was BP Joe, this Joe, or another Joe.
I have firsthand experience with this confusion: there is a retired Angels blogger, also named Sean Smith, who used to run Purgatory Online.
joe p sheehan works for now...like chris b young i guess. i feel like michael bolton in office space though.
tango, beyond comparing the data to a video of the game as a quick check, how would you go about checking the accuracy of the data?
I was talking about the regular pbp data, not the video data. Even so, there’s always some cross-checking, like some data points being completely out of norm that would be flagged for further review etc. In short, each data point needs to have a certainty level built for it. If there’s a fastball pitch down the middle by Zumaya at 82 MPH, then you’d flag it as a “0”. If it was 95, you might flag it as a “5”. Then, you go to the tape for each one, starting with the 0s, and turn them into “10”, by changing the 82 to 102, or even keeping it at 82 if that’s what really happened.
I would guess that none of the outfits does anything close to what I’m discussing. That is, if you were to ask “how certain are you of the results”, you’d get a vague answer like “pretty certain”. And, if you were to ask, “how certain are you that the 83rd pitch of the Red Sox game of Fri, Apr 13 was a fastball at 92 MPH in location -1,+4”, they’d give you a blank look.
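The certainty-level idea above can be sketched roughly as follows. This is only one possible scoring rule, not anything the data providers are known to use: each unverified reading starts at 9 and loses a point for every mph it sits away from the pitcher's median, bottoming out at 0, while a 10 is reserved for readings confirmed on tape. The speeds are made-up numbers, and the function names are hypothetical.

```python
from statistics import median


def certainty_flags(speeds, max_unverified=9):
    """Score each reading 0-9 by its distance (in mph) from the median.

    A 10 is reserved for readings that have been checked against the tape.
    """
    typical = median(speeds)
    return [max(0, max_unverified - round(abs(s - typical))) for s in speeds]


def review_order(scores):
    """Indices of the readings to check on tape, least certain first."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Made-up fastball speeds for one hard thrower, in mph.
    speeds = [99, 100, 98, 82, 95, 99, 101]
    scores = certainty_flags(speeds)
    for i in review_order(scores):
        print(f"pitch {i}: {speeds[i]} mph, certainty {scores[i]}")
```

With these numbers the 82 comes out as a 0 and the 95 as a 5, so the tape review would start with the 82. A real version would presumably score by pitch type and location as well, not speed alone.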
XML files will load directly into Excel correctly formatted with headers. All that is necessary is to write a program in VBA to batch load the files from the mlb.com site. There is some additional complexity to be able to coordinate with Retrosheet.
And the always-resourceful Dan Fox takes a crack at it:
http://www.baseballprospectus.com/article.php?articleid=6210&mode=print&nocache=1178904242
I particularly like the velocity chart at the end.
To accompany that fastball chart, you also want movement, as Dan is noting. Pedro, in that Game 7, likely had his fastball at the same speed, but with less movement. So, the tiring effect can be seen when the two are combined.
***
(Note to Dan: I think it’s confusing to show a chart with both axes in the same unit, feet, and then stretch out one of those axes. Mentally, I’m stretching them back in. If you start with the fixed units, no one will try to stretch them out.)
And here’s another one:
http://www.detroittigersweblog.com/2007/04/a-different-look-at-zumayas-outing/