

THE BOOK--Playing The Percentages In Baseball


Thursday, April 05, 2007

Pitcher release points and target areas

By Tangotiger, 01:56 PM

Is this cool or what?  It’s by Joe Sheehan, but not THAT Joe Sheehan.  If he keeps doing work like this, the BP Sheehan will be introduced as “not THAT Joe Sheehan”.


#1    Tangotiger    2007/04/17 (Tue) @ 10:37

And here’s another one:
http://www.detroittigersweblog.com/2007/04/a-different-look-at-zumayas-outing/


#2    2007/04/18 (Wed) @ 13:49

Do we think, or know, that this data is close to 100% accurate? I know that it is captured using software, which in theory means the data should be almost pinpoint accurate.

However, I do worry that there are some data fudges. For instance, on Enhanced Gameday I have never seen a called strike outside the strike zone (as signified by the box). We know from QuesTec et al. that this isn’t uncommon. Can these rogue strikes be detected with the data?
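In principle, yes, provided each pitch record carries a location and the batter’s zone limits. A minimal sketch, assuming each pitch has `px`/`pz` (plate-crossing location in feet, with `px` = 0 at the center of the plate) and `sz_top`/`sz_bot` (zone limits in feet). These field names are modeled on the Gameday attributes and are assumptions, as is the approximate ball radius used so that a pitch that merely clips the zone still counts as a strike:

```python
# Sketch: flag called strikes whose recorded location falls outside the rulebook zone.
# ASSUMPTIONS: px/pz are plate-crossing coordinates in feet (px = 0 is the middle of
# the plate, pz = height above the ground); sz_top/sz_bot are per-batter zone limits
# in feet. Field names are modeled on Gameday XML attributes, not confirmed here.

PLATE_HALF_WIDTH_FT = 17.0 / 2 / 12   # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12            # roughly 1.45 inches (approximation)


def outside_zone(px: float, pz: float, sz_top: float, sz_bot: float) -> bool:
    """True if no part of the ball passed through the zone."""
    # Any part of the ball touching the zone counts, so widen each edge by one radius.
    horizontal_miss = abs(px) > PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
    vertical_miss = pz > sz_top + BALL_RADIUS_FT or pz < sz_bot - BALL_RADIUS_FT
    return horizontal_miss or vertical_miss


def rogue_called_strikes(pitches):
    """Return the called strikes whose recorded location is outside the zone."""
    return [
        p for p in pitches
        if p["des"] == "Called Strike"
        and outside_zone(p["px"], p["pz"], p["sz_top"], p["sz_bot"])
    ]
```

Any pitch this returns is either an umpire’s call outside the zone or a location-recording error; the data alone can’t say which, so each one would still need a look at the video.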


#3    tangotiger    2007/04/18 (Wed) @ 14:19

More interesting work:
http://www.hardballtimes.com/main/printarticle/another-look-at-enhanced-gameday/

And, Joe provides a link to the XML directory, which is here:
http://gd2.mlb.com/components/game/mlb/
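Batch downloading starts with the per-day directory URLs. A minimal sketch, assuming the tree is organized as `year_YYYY/month_MM/day_DD/` beneath the root above (an assumption to verify against the live listing); `day_url` and `day_urls` are hypothetical helper names:

```python
# Sketch: list the per-day Gameday directory URLs for a date range.
# ASSUMPTION: directories are laid out as year_YYYY/month_MM/day_DD/ under ROOT.
from datetime import date, timedelta

ROOT = "http://gd2.mlb.com/components/game/mlb/"


def day_url(d: date) -> str:
    """Directory URL for a single day (zero-padded month and day)."""
    return f"{ROOT}year_{d.year}/month_{d.month:02d}/day_{d.day:02d}/"


def day_urls(start: date, end: date):
    """Yield one directory URL per day from start to end, inclusive."""
    d = start
    while d <= end:
        yield day_url(d)
        d += timedelta(days=1)
```

Each day’s directory would then be scanned for its individual game folders before any pitch files are fetched.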

Maybe someone proficient in XML can provide some XML parsers or XML databases. For example, is Berkeley DB XML or Oracle XDB a good solution here?
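For a first pass, Python’s standard-library `xml.etree.ElementTree` may be enough, with no XML database involved. A sketch, assuming pitches appear as `<pitch>` elements nested in `<atbat>` elements, with attributes such as `des`, `start_speed`, `px` and `pz`. The sample document is invented for illustration and is not real Gameday output:

```python
# Sketch: pull pitch attributes out of a Gameday-style XML file with the standard library.
# ASSUMPTION: <pitch/> elements sit inside <atbat> elements; SAMPLE is made up.
import xml.etree.ElementTree as ET

SAMPLE = """<inning num="1">
  <top>
    <atbat num="1" batter="111111" pitcher="222222">
      <pitch des="Ball" start_speed="94.1" px="-1.20" pz="2.71"/>
      <pitch des="Called Strike" start_speed="95.3" px="0.35" pz="2.10"/>
    </atbat>
  </top>
</inning>"""

NUMERIC_FIELDS = ("start_speed", "px", "pz")


def parse_pitches(xml_text: str):
    """Return one dict per <pitch>, tagged with its at-bat's batter and pitcher IDs."""
    root = ET.fromstring(xml_text)
    rows = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            row = {"batter": atbat.get("batter"),
                   "pitcher": atbat.get("pitcher"),
                   "des": pitch.get("des")}
            for field in NUMERIC_FIELDS:
                value = pitch.get(field)
                # Missing or empty attributes become None instead of raising.
                row[field] = float(value) if value not in (None, "") else None
            rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_pitches(SAMPLE):
        print(row)
```

The resulting list of flat dicts can be written to CSV or loaded into an ordinary relational table, which also makes a later merge with other data straightforward.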


#4    2007/04/18 (Wed) @ 21:12

I think (or hope) that overall the data is pretty good. I have noticed problems, though, such as some velocities that don’t make any sense, and occasionally the camera seems to capture the pitcher’s head instead of the release point. However, I have seen called strikes outside the given box. Check out Julio Lugo’s 1st AB vs. Ohka tonight for an example. I suspect that called strikes appear only inside the zone in the non-enhanced data, which is entered by hand by a stringer using a strike zone as a guide.
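Gross errors such as impossible velocities can be caught with fixed range checks. A minimal sketch, assuming `start_speed` in mph and `z0` as release height in feet; the bounds are illustrative guesses, not established thresholds. A head captured in place of the release point may still fall inside such bounds, so this catches only the obvious cases:

```python
# Sketch: flag pitches whose speed or release height looks physically implausible.
# ASSUMPTIONS: start_speed is in mph and z0 is release height in feet; the bounds
# below are illustrative guesses that would need tuning on real data.

BOUNDS = {
    "start_speed": (50.0, 105.0),  # mph
    "z0": (3.0, 7.5),              # feet above the ground at release
}


def implausible_fields(pitch: dict) -> list:
    """Return the names of fields that are missing or outside their bounds."""
    bad = []
    for field, (low, high) in BOUNDS.items():
        value = pitch.get(field)
        if value is None or not (low <= value <= high):
            bad.append(field)
    return bad
```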

I too would be very interested in a good XML parser.


#5    tangotiger    2007/04/19 (Thu) @ 10:32

What always amazes me is that the outfits that record and provide this data (BIS, STATS, MLB, etc.) have no oversight. We are supposed to take everything on faith. (I once offered to do data quality checks in return for use of the data, and was turned down flat.)

Yet, when someone like MGL actually provides the blueprint for UZR, he gets assailed and sometimes dismissed!

I know that the MLB.com exec I talk to says that the data quality is not the strongest, so caveat emptor. At least the data is free. And from an analysis standpoint, we can always accept 80% or 90% data recording accuracy rather than having no data at all. At the same time, he says that he won’t implement the stopwatch idea because of the data accuracy limitation!


#6    Pizza Cutter    2007/04/19 (Thu) @ 11:41

I’m drooling at the thought of this being parsable.  Imagine a merge with Retrosheet’s files.  *Sigh*


#7    2007/04/19 (Thu) @ 16:18

How many Joe Sheehans are there? I’m counting at least four at the moment:

http://baseballanalysts.com/archives/2007/04/more_fun_with_e_1.php


#8    Rally    2007/04/19 (Thu) @ 16:58

That’s the same Joe that Tango linked to at the start of this thread.  Are there really 2 more?

My ruling:

BP Joe gets to keep his name because he was around first.  All other Joe Sheehans must come up with a distinctive nickname, the way Alex Gonzalez became Sea Bass, so we can tell them apart.


#9    2007/04/19 (Thu) @ 17:26

It’s the same guy... I was trying to be amusing, but failed spectacularly.


#10    Rally    2007/04/19 (Thu) @ 18:56

I should have seen that coming, John. My bad.

But sometimes I really am confused. I recently saw an article by a Joe Sheehan (I can’t remember where, but not on either of their usual websites), and I honestly didn’t know whether it was BP Joe, this Joe, or another Joe.

I have firsthand experience with this confusion: a retired Angels blogger who used to run Purgatory Online is also named Sean Smith.


#11    2007/04/19 (Thu) @ 20:54

Joe P. Sheehan works for now, like Chris B. Young, I guess. I feel like Michael Bolton in Office Space, though.

Tango, beyond comparing the data to a video of the game as a quick check, how would you go about checking the accuracy of the data?


#12    tangotiger    2007/04/19 (Thu) @ 22:27

I was talking about the regular PBP data, not the video data. Even so, there’s always some cross-checking you can do, such as flagging data points that are far out of the norm for further review. In short, each data point needs a certainty level assigned to it. If Zumaya throws a fastball down the middle at 82 MPH, you’d flag it as a “0”. If it was 95, you might flag it as a “5”. Then you go to the tape for each one, starting with the 0s, and turn them into 10s, either by changing the 82 to 102 or by keeping it at 82 if that’s what really happened.
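The certainty-level idea can be sketched by comparing each pitch with the same pitcher’s history for that pitch type. The scoring rule here (9 minus twice the absolute z-score, floored at 0, with 10 reserved for pitches confirmed on tape) is a hypothetical choice, not a standard; `certainty` and `review_queue` are illustrative names:

```python
# Sketch of a per-pitch certainty score on a 0-10 scale.
# ASSUMPTIONS: `history` holds prior speeds for the same pitcher and pitch type;
# the formula is a hypothetical choice; 10 is reserved for video-confirmed pitches.
from statistics import mean, stdev

VERIFIED = 10  # assigned by a person after checking the tape, never automatically


def certainty(speed: float, history: list) -> int:
    """Score 0-9 for how consistent a speed is with the pitcher's history."""
    if len(history) < 2:
        return 0  # too little history to judge; send straight to review
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 9 if speed == mu else 0
    z = abs(speed - mu) / sigma
    return max(0, 9 - round(2 * z))


def review_queue(pitches, history):
    """Pair each pitch with its score, least certain first."""
    scored = [(certainty(p["speed"], history), p) for p in pitches]
    return sorted(scored, key=lambda pair: pair[0])
```

With a history of 96, 98 and 100 MPH (mean 98, standard deviation 2), a 95 MPH pitch scores 6 and an 82 MPH pitch scores 0, so the 82 goes to the front of the review queue.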

I would guess that none of the outfits does anything close to what I’m discussing. That is, if you were to ask “how certain are you of the results?”, you’d get a vague answer like “pretty certain”. And if you were to ask “how certain are you that the 83rd pitch of the Red Sox game of Fri, Apr 13 was a fastball at 92 MPH in location -1,+4?”, they’d give you a blank look.


#13    Peter Jensen    2007/04/20 (Fri) @ 16:35

XML files will load directly into Excel, correctly formatted with headers. All that is necessary is to write a VBA program to batch load the files from the mlb.com site. Coordinating with Retrosheet adds some complexity.
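Part of that coordination is a common game key. A sketch in Python rather than VBA, assuming Gameday directory names shaped like the illustrative `gid_2007_04_10_balmlb_bosmlb_1` (date, away team, home team, game number) and Retrosheet IDs built from the home team code, the date and a game number (0 for a single game, 1 or 2 in a doubleheader). The team map is a hypothetical placeholder, since several Gameday and Retrosheet codes differ:

```python
# Sketch: turn a Gameday game directory name into a Retrosheet-style game ID.
# ASSUMPTIONS (check before use): the directory name format below, the Retrosheet
# ID layout HOME + YYYYMMDD + game number, and the placeholder TEAM_MAP.
import re

GID_PATTERN = re.compile(r"gid_(\d{4})_(\d{2})_(\d{2})_(\w{3})mlb_(\w{3})mlb_(\d)")

# Hypothetical partial mapping from Gameday team codes to Retrosheet codes.
TEAM_MAP = {"bos": "BOS", "bal": "BAL", "nya": "NYA"}


def retrosheet_id(gid: str, doubleheader: bool = False) -> str:
    m = GID_PATTERN.fullmatch(gid)
    if m is None:
        raise ValueError(f"unrecognized game directory: {gid}")
    year, month, day, _away, home, game_no = m.groups()
    if home not in TEAM_MAP:
        raise KeyError(f"no Retrosheet code mapped for Gameday team '{home}'")
    # Assumed: Gameday numbers a lone game 1, while Retrosheet numbers it 0.
    retro_no = game_no if doubleheader else "0"
    return f"{TEAM_MAP[home]}{year}{month}{day}{retro_no}"
```

The `doubleheader` flag is there because a single directory name doesn’t say whether another game was played between the same teams that day.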


#14    Tangotiger    2007/05/11 (Fri) @ 13:32

And the always-resourceful Dan Fox takes a crack at it:
http://www.baseballprospectus.com/article.php?articleid=6210&mode=print&nocache=1178904242

I particularly like the velocity chart at the end.

To accompany that fastball chart, you also want movement, as Dan is noting.  Pedro, in that Game 7, likely had his fastball at the same speed, but with less movement.  So, the tiring effect can be seen when the two are combined.
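Reading speed and movement together can be sketched as a per-inning summary. This assumes each pitch record carries `inning`, `start_speed` (mph) and `pfx_x`/`pfx_z` (movement in inches); the field names follow the Gameday XML but are assumptions here:

```python
# Sketch: average speed and total movement by inning, to look for a tiring pattern.
# ASSUMPTIONS: records carry inning, start_speed (mph) and pfx_x/pfx_z (inches).
from collections import defaultdict
from math import hypot
from statistics import mean


def by_inning(pitches):
    """Return {inning: (avg speed, avg total movement)}, rounded to 0.1."""
    groups = defaultdict(list)
    for p in pitches:
        groups[p["inning"]].append(p)
    summary = {}
    for inning in sorted(groups):
        rows = groups[inning]
        avg_speed = mean(r["start_speed"] for r in rows)
        # Total movement: length of the (horizontal, vertical) movement vector.
        avg_move = mean(hypot(r["pfx_x"], r["pfx_z"]) for r in rows)
        summary[inning] = (round(avg_speed, 1), round(avg_move, 1))
    return summary
```

A pitcher whose speed holds steady from the 1st inning to the 8th while the movement column shrinks would show exactly the pattern described above, one that a velocity chart alone would miss.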

***

(Note to Dan: I think it’s confusing to show a chart where both axes use the same unit, feet, and then stretch out one of those axes. Mentally, I’m stretching it back in. If you start with fixed units, no one will try to stretch them out.)
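One way to keep both axes on the same footing is to give them the same span in feet. A minimal sketch; `equal_span_limits` is a hypothetical helper whose output can be passed to any plotting tool:

```python
# Sketch: choose axis limits so both axes cover the same number of feet.
# The helper is hypothetical and independent of any plotting library.

def equal_span_limits(xs, ys, pad: float = 0.0):
    """Return ((xmin, xmax), (ymin, ymax)) with identical widths, centered on the data."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    # Use the wider of the two data ranges for both axes.
    half = max(x_hi - x_lo, y_hi - y_lo) / 2 + pad
    x_mid, y_mid = (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
    return (x_mid - half, x_mid + half), (y_mid - half, y_mid + half)
```

Equal spans keep the scale honest only if the plot panel itself is square; in matplotlib, `Axes.set_aspect('equal')` enforces equal scaling directly.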

