THE BOOK--Playing The Percentages In Baseball

Wednesday, March 10, 2010

Open Letter from Cory Schwartz

This letter is in response to comments made at this thread:
http://www.insidethebook.com/ee/index.php/site/comments/pitchf_x_tools/

***


Mike, I’d like to address some of your comments regarding the MLBAM pitch classification engine. “Crappy” is a stronger critique than I think appropriate, but we do recognize that it’s not where it should be. However, to suggest that we’ve sat on our hands with what we’ve built is misinformed and incorrect.

We’ve treated this, and always presented it, as a work in progress. Along the way we have made several changes to improve our classifications since first rolling out with a simple, two-pronged neural net, one for lefties and one for righties:

1. Added pitcher-specific scaling for velocity to better differentiate fastballs from changeups, etc.;

2. Added biasing into the classifications to better reflect pitcher-specific repertoires;

3. Implemented an entirely new and much larger set of training data, which we used to add a second hidden layer to the NN;

4. Tweaked (and continue to tweak) the input parameters of the NN to improve our differentiation of 2-seamers vs. 4-seamers, cutters vs. sliders, and other similar pitches.
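
As a rough illustration of items 1 and 2 above, the sketch below shows one way pitcher-specific velocity scaling and repertoire biasing could work. It is not MLBAM's actual code: the function names, the min/max scaling, the pitch-type codes, and the multiply-and-renormalize biasing are all assumptions made for the example.

```python
def scale_velocity(speed_mph, pitcher_speeds):
    """Rescale a pitch speed to 0..1 within this pitcher's own observed range.

    Assumption: simple min/max scaling; the real system may scale differently.
    """
    lo, hi = min(pitcher_speeds), max(pitcher_speeds)
    if hi == lo:
        return 0.5  # no spread to scale against, so return the midpoint
    return (speed_mph - lo) / (hi - lo)


def apply_repertoire_bias(class_scores, repertoire_prior):
    """Weight raw classifier scores by how often the pitcher throws each pitch.

    class_scores: hypothetical NN outputs, e.g. {"FF": 0.40, "CH": 0.35}
    repertoire_prior: hypothetical usage rates for this pitcher; pitch types
    missing from the prior are treated as not in his repertoire.
    """
    weighted = {p: s * repertoire_prior.get(p, 0.0) for p, s in class_scores.items()}
    total = sum(weighted.values())
    if total == 0:
        return dict(class_scores)  # prior rules out everything; keep raw scores
    return {p: w / total for p, w in weighted.items()}


# An 88 mph pitch is slow for a hard thrower but fast for a soft tosser.
hard_thrower = scale_velocity(88, [84, 92, 97])
soft_tosser = scale_velocity(88, [72, 80, 89])

# A pitcher with no slider in his repertoire never gets a slider label.
raw = {"FF": 0.40, "CH": 0.35, "SL": 0.25}
biased = apply_repertoire_bias(raw, {"FF": 0.6, "CH": 0.4})
```

Scaling within the pitcher's own range means the same radar reading can look like a changeup for one pitcher and a fastball for another, which is the point of item 1.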

At each step of the way when we’ve made changes, we’ve taken time to evaluate the results, determine next steps, then build and implement further changes. The pace of change may not be rapid, and is admittedly slower than we would prefer, but we have never stopped working on this in the background even if the results have not always been publicly visible.

For this season, we are currently testing fully customized neural nets for each pitcher, as well as new tools to more easily correct pitcher-specific repertoires and individual pitch-by-pitch classifications on a postgame basis. Both of these should be implemented soon after Opening Day, if not sooner. Once we implement these changes we will re-classify every pitch in our database based on the new custom NNs, then evaluate the results and move forward as mentioned above.
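
The re-classification pass described above might look something like the following sketch. Everything here is hypothetical: the record fields (`pitcher_id`, `throws`, `pitch_type`), the fallback to a handedness-level model when a pitcher has no custom net, and the use of plain callables as stand-ins for trained networks.

```python
def pick_model(pitch, custom_models, handed_models):
    """Prefer the pitcher's own model; otherwise use the one for his throwing hand."""
    return custom_models.get(pitch["pitcher_id"], handed_models[pitch["throws"]])


def reclassify_all(pitches, custom_models, handed_models):
    """Return new pitch records with 'pitch_type' recomputed.

    The input records are left unmodified so old and new labels can be compared.
    """
    out = []
    for pitch in pitches:
        model = pick_model(pitch, custom_models, handed_models)
        out.append({**pitch, "pitch_type": model(pitch)})
    return out


# Stand-in "models": callables that take a pitch record and return a label.
custom = {101: lambda p: "CU"}
handed = {"L": lambda p: "FF", "R": lambda p: "SL"}

old = [
    {"pitcher_id": 101, "throws": "R", "pitch_type": "SL"},
    {"pitcher_id": 202, "throws": "L", "pitch_type": "CH"},
]
new = reclassify_all(old, custom, handed)
changed = [(o["pitch_type"], n["pitch_type"]) for o, n in zip(old, new)]
```

Keeping the old labels alongside the new ones makes the "evaluate the results" step a straightforward comparison.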

Remember also that classifying pitches in real-time - for every pitch thrown, every game - is not the only challenge we face (and one you recognized in post #31); we are also limited by our ability to correctly define each pitcher’s unique repertoire, and to get accurate classifications to use as training data for the neural nets. We’ve enlisted the help of all 30 clubs, as well as you and perhaps others on this thread, in collecting classification and training data, but we’re limited by the accuracy of the source data. This has been a major effort on our part and continues to this day, and we’d be eager to see the results of any community-based efforts on this front.

In addition, our responsibilities to the Pitch-f/x system go far beyond pitch classification. As you can probably imagine, this is an expensive and resource-intensive system to operate and maintain in 30 MLB ballparks, so our attention can’t always be focused on pitch classifications or any other specific issue, as much as we might like it to be. That the research community has been exploiting this data and has generated some amazing research from it is an unexpected benefit, but not one that we can allow to influence our overall objectives or priorities. As for Bloomberg Sports, they can defend their own products, but the suggestion that we’re not trying to improve our data because they are licensing from us is not only incorrect but completely counterintuitive. On the contrary, we have every incentive to improve our classifications, and ALL of our data, to make sure we are providing the best possible product to a major partner, which will in turn enable them to find better acceptance in the marketplace for their products.

I’ll leave it to Ross to address the relative strengths and weaknesses of the neural network approach, but I did want to address some of the critiques of how we’ve managed this from a business standpoint.

Thanks,

Cory Schwartz
Director, Stats
MLB.com

(30) Comments • 2010/04/17 • SabermetricsBall_Tracking