THE BOOK--Playing The Percentages In Baseball


Thursday, March 13, 2008

Fangraphs keeps on rolling out the new stats

By Tangotiger, 10:32 AM

Want to know what Jake Peavy throws?  Go to the bottom of that page: 59% fastballs, 18% sliders, 11% cutters (CU), 2% curveballs (CB), 11% changeups (CH).  In the last 3 years, he’s thrown 10,000 pitches.  (Minor note: I’d call the cutter CT, as you can easily confuse CU for curve or changeUp.)


#1          (see all posts) 2008/03/13 (Thu) @ 13:38

He also said that he’ll be adding it to the leaderboards


#2    MGL      (see all posts) 2008/03/13 (Thu) @ 17:45

One thing I love about the pitch type frequencies is that they enable us to find the appropriate platoon regression mean for each pitcher, if you use the data in THT (in an article by John Walsh, which was fascinating by the way) that tells you the average platoon ratio for each type of pitch for RH and LH pitchers.
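
One plausible reading of this idea is a weighted average: take the average platoon ratio for each pitch type and weight it by how often the pitcher throws that pitch. The sketch below assumes that reading. The function name and the ratio values are hypothetical placeholders, not numbers from Walsh's article; only the pitch mix (Peavy's, as quoted in the post) comes from this page.

```python
def platoon_regression_mean(pitch_mix, platoon_ratio_by_pitch):
    """Weight each pitch type's average platoon ratio by its share of the mix.

    pitch_mix: pitch type -> share (any scale; normalized here, since
               rounded percentages need not sum to exactly 100).
    platoon_ratio_by_pitch: pitch type -> league-average platoon ratio
               for that pitch type and pitcher handedness.
    """
    total = sum(pitch_mix.values())
    return sum(
        (share / total) * platoon_ratio_by_pitch[pitch]
        for pitch, share in pitch_mix.items()
    )


# Jake Peavy's mix as quoted in the post (sums to 101 because of rounding).
peavy_mix = {"FB": 59, "SL": 18, "CT": 11, "CB": 2, "CH": 11}

# PLACEHOLDER ratios for illustration only; substitute the real per-pitch
# values for right-handed pitchers before drawing any conclusion.
placeholder_ratios = {"FB": 1.10, "SL": 1.30, "CT": 1.15, "CB": 1.05, "CH": 0.95}

print(round(platoon_regression_mean(peavy_mix, placeholder_ratios), 3))
```

The result is only as good as the per-pitch ratios fed in, and it inherits any pitch classification errors in the mix itself.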


#3    John Walsh      (see all posts) 2008/03/13 (Thu) @ 17:52

Thanks, mgl.

BTW, does anybody know how BIS determines pitch type?  My understanding is that BIS video scouts (curious name!) are watching games on TV.  It doesn’t seem trivial to identify pitches by watching on TV.


#4    Mike Fast      (see all posts) 2008/03/14 (Fri) @ 00:56

John, I’m very curious about that same thing.  My understanding was the same as yours.  I don’t know if I read that somewhere or made the assumption based on what was written in the Fielding Bible about how they have a person watch the video of each fielding play.

However, now that I’ve seen this data, I strongly suspect they are using some sort of automated classification algorithm.  Some of the errors in their data don’t make sense otherwise. 

Take the case of Erik Bedard.  The BIS data says he threw 58% fastballs and 3% cutters in 2007.  They say his average fastball speed was 91.6 mph and his average cutter speed was 89.6.

Because Baltimore didn’t have a PITCHf/x system, we only have PITCHf/x data from some road games for Bedard, covering only 701 of 2942 pitches on the season.  However, in that sample, Bedard threw 32% fastballs and 28% cutters, and his fastball averaged 92.6 and his cutter averaged 91.6.  He threw very few fastballs or cutters slower than 90 mph all season until his final start on August 26 when he was injured and throwing in the 86-91 mph range. 

My suspicion was that BIS was classifying his cutter automatically based on speed and movement, without looking at anything on video or having someone put their eyes on the data to see if it made sense.  If they were classifying cutters as slower versions of the fastball, maybe they got most of the Aug. 26 fastballs listed as cutters rather than realizing that Bedard was throwing hurt and both his fastball and his cutter were slower that day. 

However, David Appelman indicated to me that only 12 of the 90 cutters that BIS recorded for Bedard were from Aug. 26.  That’s still a disproportionate amount.  I have Bedard throwing 38 fastballs and 24 cutters on Aug. 26, and 184 fastballs and 172 cutters in other starts, so he actually used the cutter a little less in that start.  BIS data implies a 50 fastball/12 cutter mix on Aug. 26 and a 1656 fastball/78 cutter mix in other starts.
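
The comparison above comes down to the cutter's share of fastballs plus cutters in each split. A quick sketch of that arithmetic, using only the counts quoted in this comment (the function name and dictionary layout are just for illustration):

```python
def cutter_share(fastballs, cutters):
    """Cutters as a fraction of fastballs plus cutters."""
    return cutters / (fastballs + cutters)


# (fastballs, cutters) for each source and split, as quoted in the comment.
counts = {
    "PITCHf/x (my classification)": {"Aug. 26": (38, 24), "other starts": (184, 172)},
    "BIS (implied)": {"Aug. 26": (50, 12), "other starts": (1656, 78)},
}

for source, splits in counts.items():
    for label, (fb, ct) in splits.items():
        print(f"{source}, {label}: {cutter_share(fb, ct):.1%} cutters")
```

By these counts the PITCHf/x split goes from about 48% cutters in other starts to about 39% on Aug. 26, while the BIS split goes from about 4.5% to about 19%. Note that the PITCHf/x counts cover only the games where that system was installed, so the two sources are not counting the same set of pitches.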

Their identification of the cutter for a few other pitchers is suspect.  They have Bannister throwing 8% sliders and 10% cutters, when I have him throwing 18% sliders and no cutters, and Bannister himself said he did not throw a cutter in 2007.  They have Kelvim Escobar throwing 4% cutters, when I don’t see that pitch in his PITCHf/x data, and when Escobar and Molina have listed his repertoire they have listed all the pitches I identified but no cutter.

As I mentioned to David elsewhere, I don’t intend my comments as a criticism of him adding the BIS pitch data to his site.  I am very glad for that and think it’s a great service.  As someone who is fascinated by pitch classification methods, I’m interested in the methods other people use, and when I see a mistake in their data, I want to know why their algorithm failed in that case before I trust their data as a whole.  Unfortunately with BIS, their methods are private, and up until this point, their data has mostly been private, too.

I would say the BIS data looks marginally more accurate than Josh Kalk’s, but I must also admit to being a little disappointed because I guess I was hoping it would be ~99% accurate rather than 80-90% accurate data given the resources that BIS has to throw at things.


#5    MGL      (see all posts) 2008/03/14 (Fri) @ 05:12

STATS has been recording pitch data for years, and I have lots of it, although I have never used it.  I think they just make an educated guess from watching the games on TV/video.  And they probably just use the TV guns, although they might have their own guns live at the games.  I don’t really know.  For the most part, I think it is pretty easy to identify a pitch on TV, if you are good at it and are used to watching many games and you are familiar with most of the pitchers, AND you have a scouting report on the pitcher (what he typically throws).  Some things are hard to distinguish on TV/video (like cutters), but for the most part, I think it IS trivial (for the right person).  What is nearly impossible, unless you are well-trained and experienced (and you are sitting directly behind HP), is to distinguish pitches live AT a game.  For example, TV commentators are notoriously poor at reporting the pitches, although some are better than others.  Then again, they have lots of things to do and sometimes are just not paying enough attention.


#6    ultxmxpx      (see all posts) 2008/03/14 (Fri) @ 14:54

I think it’s great that the BIS data is finally available to the public for free (even if it is limited). The pitch f/x data might soon make their work obsolete. But in the meantime they have data from years past, while pitch f/x only has a good chunk from 2007 and the 2006 postseason (the latter of which I’ve never really looked at).

I think they really do just judge by video. I think I would marginally trust Kalk’s data over BIS (yes we are dealing on margins), but I’d trust my own analysis the most because I’m self-absorbed. Mostly as an aside, I find it pretty hard via video to discern a pitch with any confidence without the pitch speed being displayed.


#7    Tangotiger      (see all posts) 2008/03/14 (Fri) @ 16:01

James confirmed that the BIS data is done via video, and they do not look at PITCHf/x.  I alerted him to the Bannister issue, and he said he would alert BIS.


#8    MGL      (see all posts) 2008/03/14 (Fri) @ 17:50

What is Kalk’s data?  Where does he get it from and where is it presented?  What is the Bannister issue?


#9    Mike Fast      (see all posts) 2008/03/14 (Fri) @ 18:08

Josh Kalk’s data is from PITCHf/x.  He did some corrections to the data (adjusting for park inconsistencies in release points and velocities, standardizing break for differences in air density) and then he made player cards for each pitcher with 100+ pitches recorded by PITCHf/x last year.  These cards show speed and movement and have some breakdowns by count and handedness.  They have become the de facto PITCHf/x data standard because of their easy public availability for almost every pitcher.

You can find them here:
http://baseball.bornbybits.com/plots/players.html


#10    MGL      (see all posts) 2008/03/15 (Sat) @ 00:56

Thanks Mike.
Anywhere these “cards” are available in one spreadsheet?


#11    Renè      (see all posts) 2008/03/15 (Sat) @ 02:18

MGL: http://baseball.bornbybits.com/php/combined_tool.php

You can spider the data from there and get it into a spreadsheet, or else just use it as a tool to get your own data for splits and everything “by hand” if you’re interested in single pitchers/hitters. I’d been using that data for my first Pitch F/X analyses since MLB.com didn’t love me, but now I finally have complete data (and I’m losing sleep).


#12    SirKodiak      (see all posts) 2008/03/15 (Sat) @ 09:54

I love fangraphs, and promote it when I can.  But one thing bothers me, and that is the incomplete ‘glossary’.  BRAA is a great example.  I don’t see it in the ‘glossary’, and it could mean so many things.  I assume it is Base Runs Above Average… but how is it calculated?  By applying lwts to the player’s numbers? Or by looking at the actual events and base/out state? Or another way?  How is Average calculated?  By league?  Are pitcher ABs excluded?

Oops, I just checked his blog and he is updating them there as of 2 days ago.  Good start!  I hope they make it to the glossary soon.

I wonder if comparing BRAA by base/out and BRAA by lwts would tell us anything…


#13    Mike Fast      (see all posts) 2008/03/15 (Sat) @ 10:50

MGL/#10,
No, MLB frowns on making the data available in that sort of freely available format.  I don’t feel like getting into a fight with them over that because as much as I’d like to have everyone be able to easily and quickly work with the data, I also want the data to continue to be available in future seasons. 

I have instructions on my website for how to download the data and parse into a database yourself, but that requires a bit more work, I know.


#14    MGL      (see all posts) 2008/03/15 (Sat) @ 16:33

Thanks, Mike.



#16    Tangotiger      (see all posts) 2008/03/17 (Mon) @ 09:45

This should make it supereasy to get aging curves for fastball speeds, shouldn’t it?  As soon as David has the leaderboards anyway.

Finally, no need to try to infer this by strikeout rates.


#17    Rally      (see all posts) 2008/03/17 (Mon) @ 11:52

Colin, That is one amazing link.  Thank you Dave Appelman.

Do a sort on fastball speed, all pitchers, and you get Aaron Miles down at the bottom with his 72 MPH fastball.

Just one MPH faster than mine…


#18    Anthony      (see all posts) 2008/03/30 (Sun) @ 22:25

Looking at Gameday for the Nationals-Braves game right now, they’re including pitch type. Not sure how they’re identifying pitches, but they don’t get any more specific than ‘fastball.’

Love the in-game highlights though.

