THE BOOK--Playing The Percentages In Baseball


Monday, May 12, 2008

PITCHf/x Summit Recap and Presentations

By Tangotiger, 08:16 PM

All your links courtesy of the fantastically accessible Sportvision, and a recap from PITCHfx-er Ike Hall.


#1    Tangotiger      (see all posts) 2008/05/12 (Mon) @ 20:38

Can someone who was there finally put to rest the nonsense that MLBAM will turn off the data for analysts because a couple of teams are going to strong-arm them?  Those teams are puny compared to MLBAM.

I’d be shocked and disappointed if the question was not asked, or if you came away with any impression other than full steam ahead.


#2    Ike H      (see all posts) 2008/05/12 (Mon) @ 21:01

I don’t remember if the question was specifically asked or not.  However, at least twice the MLBAM reps assured us that they were committed to leaving the data available for academic purposes.  However, the assurances weren’t 100% strong...or at least it didn’t seem so to me.  But I also thought that they were strong enough to guarantee or nearly guarantee that the data will be around for the next couple of years at least. But nobody thought their assurances were weak enough not to take them at their word.

It certainly did not seem that the teams would be the ones to object to the data being available.  The subtle hint that I took away from all of it was that the people most likely to get offended and raise a substantial stink were the umpires, if people started using the data to bash certain blues.  (Or maybe the groundskeepers, if somehow someone was using the data to argue that mound heights are different at different parks...but I think at the moment, there are enough data quality issues that if such a claim were made, it should be taken with a large grain of salt).

But really, I didn’t feel from any of that that there was really any danger at all that the data would stop being available.


#3    Peter Jensen      (see all posts) 2008/05/12 (Mon) @ 23:02

The biggest danger to the data being pulled is if the collection of the data cannot be made into a profitable concern for Sportvision and MLBAM.  They are exploring many avenues of how the information can be developed and marketed, and that was part of the reason for having the Pitch f/x Summit.  They realize that this process takes time, so no changes should be expected soon, but at some point they have to look at the bottom line and see some hope of recouping the rather large development costs they have already incurred and generating an ongoing profit.  They acknowledge and are appreciative of the work that amateur independent analysts are doing, realizing that it improves Pitch f/x and helps to popularize it by demonstrating its vast potential.

Ike didn’t mention the presence of the representatives from the major league teams.  All were invited and 9 teams had committed to send a total of 13 representatives.  I didn’t meet all 13, but I did get a chance to talk to 10 people from 7 teams.  None admitted to being heavily involved in Pitch f/x research, but they all seemed very interested in the presentations.  There was also a reporter from the Wall Street Journal, and his article on the Summit should appear this Friday.  I was surprised at the amount of background knowledge that he already had of the field, and it should be interesting to see what his take on the event was.

There were also many employees from Sportvision attending.  I didn’t get the feeling that they were there because their bosses had asked them to attend.  They all seemed to be enthusiastic about Pitch f/x and were interested in anything that would add to it.  Everyone from Sportvision seemed excited about working there and also seemed to be very good at his or her job.  It was incredible how open and helpful everyone was.

I talked to all of the other presenters and each of us was coming away from the summit with new ideas of research to pursue, so expect a lot of new work in the next few months.  There was a lot of expertise in a lot of diverse fields in that room, and I know I learned a lot from the others that will help me in what I am doing, and I am sure that others felt the same way.  And we all had a great time!


#4    Matt Lentzner      (see all posts) 2008/05/12 (Mon) @ 23:35

One thing that Sportvision and MLBAM are very interested in is Pitch Classification (Fastball, Curve, etc). They want to show the chess match between the batter and pitcher. I think they see (rightly) that this is something that will draw in the casual fan and thus add value/make money.

Doing that is not straightforward at all. I think they’re really hoping that us guys who do this for fun will come up with something clever.


#5    tangotiger      (see all posts) 2008/05/12 (Mon) @ 23:47

In the PDF file, the MLB guy said that his challenge is doing something in real-time.  But, I didn’t really understand that.  I can follow him if it was the very first game, and so, he didn’t have anything to work with.  But, is he suggesting that he starts every game from scratch, and doesn’t take advantage of all the collected data for that pitcher?  Is that why Mike Fast gets better results than MLBAM?


#6    Matt Lentzner      (see all posts) 2008/05/12 (Mon) @ 23:48

Here’s another write up of the summit by Harry Pavlidis of “Another Cubs Blog”.

http://www.anothercubsblog.net/2008/05/12/state-of-f-x


#7    Matt Lentzner      (see all posts) 2008/05/13 (Tue) @ 00:08

Tango,

The holy grail of pitch identification would be a system that can correctly identify a pitch in under a second and know nothing about who threw the pitch other than what could be measured at that time. That’s a tall order obviously, but I think it is possible if you can apply some physical modeling to simplify the problem.

Once you start adding in a database/history (as you suggest) then it does make the job of identifying easier, but it makes the job of getting quick results and managing the data harder.

Ross Paul has developed a neural net for pitch ID and he uses one version for righties and one for lefties. We discussed making a NN for each pitcher, but then you have 300+ NNs to keep track of as well as filleting the training data into dangerously small samples. Furthermore, a person (currently an MLB scout) has to generate that training data, which is expensive and inconsistent.

The job that Mike Fast does is much easier since he doesn’t have to get an answer in under a second and he can apply the best pattern matching machine in existence (the human brain) to identify the pitches - although it’s a slow machine. Doing what Mike does, the way he does it, is completely impractical for a “real-time” system.


#8    Ike H      (see all posts) 2008/05/13 (Tue) @ 00:22

Tango, there are a couple of reasons he mentioned to essentially start every game from scratch...all of them in my opinion completely valid reasons.

First and foremost...although he hasn’t ruled out eventually using one, it would be rather cumbersome to maintain and have his program access a pitch database for every single pitch.  That’s a lot of overhead to carry when you might not need to.  It may be that he uses one eventually, but for the time being he’d prefer not to have to.

But the database solution has other issues.  Say a guy adds a pitch that wasn’t previously in the database...either slowly throughout the season or over the off-season.  A classifier based on a database might try to force that new pitch into the cluster of one of his old ones.  Or say a guy starts developing a nagging injury which causes his velocity to decline.  It may then turn out that they start classifying all his fastballs as changeups. 

There really is no easy solution, because although you might have a lot of previously collected data, it’s hard to make the algorithm handle unforeseen changes if you have certain expectations of the data.  Mike Fast gets better results because he has the benefit of hindsight.  If there’s one pitch that seems to be an outlier from every cluster, he can just call it a flaw in the data, or otherwise ignore it, while a real-time algorithm has to make some decision about that pitch, which you want to be at least close to being right.
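A minimal sketch of the failure mode described above, assuming a classifier that only remembers each pitch type's average speed from earlier starts. The stored means, the speeds, and the `max_gap` rejection radius are all illustrative assumptions, not anything MLBAM is known to use.

```python
# Hypothetical stored cluster means (mph) from earlier starts; illustrative values only.
STORED_MEANS = {"fastball": 94.0, "changeup": 85.0}


def classify_by_speed(speed_mph, means=STORED_MEANS, max_gap=None):
    """Return the label whose stored mean speed is closest to speed_mph.

    If max_gap is given and even the closest mean is farther away than
    max_gap, return "unknown" instead of forcing the pitch into a cluster.
    """
    label = min(means, key=lambda name: abs(means[name] - speed_mph))
    if max_gap is not None and abs(means[label] - speed_mph) > max_gap:
        return "unknown"
    return label


healthy_fastball = 93.5
injured_fastball = 89.0  # same pitch, but velocity is down

print(classify_by_speed(healthy_fastball))               # fastball
print(classify_by_speed(injured_fastball))               # changeup (4 mph from 85, 5 mph from 94)
print(classify_by_speed(injured_fastball, max_gap=2.0))  # unknown
```

The rejection radius is one possible way to avoid the forced misclassification, but a real-time system that must label every pitch would still need some fallback for the "unknown" case.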

Really though, the real-time classification has gotten a lot better since the first couple of weeks of the season.


#9    tangotiger      (see all posts) 2008/05/13 (Tue) @ 00:24

I am obviously missing something.  Let’s say that Mike has 6 starts from Beckett, and he’s figured out the clustering for all his pitches.

Now, a 7th game comes in, in real-time.  It’s a pitch thrown at 91 mph, with a certain rpm, and deflection angle.  Wouldn’t you simply compare that pitch to the 5 clusters of pitch types that Mike identified, and make that pitch one of those pitch types?
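The comparison asked about here can be sketched as a nearest-centroid lookup. This is only an illustration: the centroid values, the per-feature scales, and the pitch names are made-up assumptions, not Beckett's actual clusters or Mike's method.

```python
import math

# Hypothetical centroids for one pitcher: (speed mph, spin rpm, spin angle degrees).
# All numbers are invented for illustration.
CENTROIDS = {
    "four-seam": (95.0, 2300.0, 200.0),
    "two-seam": (94.0, 2100.0, 230.0),
    "cutter": (90.0, 1400.0, 180.0),
    "curve": (77.0, 1600.0, 40.0),
    "changeup": (88.0, 1700.0, 240.0),
}

# Assumed typical spread of each feature, so mph, rpm and degrees are comparable.
SCALES = (2.0, 300.0, 20.0)


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def distance(pitch, centroid):
    """Scaled Euclidean distance between a pitch and a cluster centroid."""
    ds = (pitch[0] - centroid[0]) / SCALES[0]
    dr = (pitch[1] - centroid[1]) / SCALES[1]
    da = angle_diff(pitch[2], centroid[2]) / SCALES[2]
    return math.sqrt(ds * ds + dr * dr + da * da)


def nearest_pitch_type(pitch, centroids=CENTROIDS):
    """Assign the pitch to whichever stored cluster is closest."""
    return min(centroids, key=lambda name: distance(pitch, centroids[name]))


print(nearest_pitch_type((91.0, 1500.0, 185.0)))  # cutter
```

Every pitch gets one of the five stored labels, which is exactly why a pitch type that is not in the stored set cannot be recognized by this approach.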


#10    tangotiger      (see all posts) 2008/05/13 (Tue) @ 00:27

I was responding to Matt/7, so I see Ike/8’s point now.

However, the DB is no real overhead.  After all, you know who the pitcher is, so you would only download that data.  Furthermore, it would be a data warehouse of all preprocessed data.  So, it’s not going to recalculate the data points every time.

Your other point about “new” pitches stands.


#11    Ike H      (see all posts) 2008/05/13 (Tue) @ 00:27

Sure...but say Beckett has been spending too much time hanging around with Wakefield, and he decides to toss a knuckleball during a relatively meaningless at-bat.


#12    ike H      (see all posts) 2008/05/13 (Tue) @ 00:29

I guess things are updating too fast...my response #11 was in response to #9.  I see that point no longer needs to be made.


#13    Harry Pavlidis      (see all posts) 2008/05/13 (Tue) @ 00:38

T, re #1, there are certainly some best practices in regards to usage of data, and a sense of “we’re all in this together”.  The openness, interest in our work and perspectives makes me optimistic.  I suspect formalization of terms to come by 2009, but I suspect they’ll be reasonable, as long as the analyst community respects the interests of MLBAM and Sportvision.


#14    Peter Jensen      (see all posts) 2008/05/13 (Tue) @ 02:32

Tango post #10 - Ross wants his neural net to be pitcher neutral for several reasons.  One is the training problem that has been mentioned in earlier posts.  The other is minimizing operator input during the game.  That doesn’t mean that he is resistant to utilizing the knowledge gained from having observed the pitcher previously.  Ross mentioned that part of his problem was that the information about pitch types from the scouts that was used for his initial training data was not consistent for all pitches.  While everyone agreed on the curveballs, some would call a split finger a cutter, others a change, etc. Some scouts didn’t differentiate between types of fastballs.  Ross used 3000 pitches classified by scouts during the training phase of the neural net.  2000 were used for the actual training and 1000 for testing the accuracy of the output from that initial training.  One of the possibilities suggested was that the neural net could be retrained by adding some of Mike’s information on pitch types from hard to classify pitchers to the training data.
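The 2000/1000 division described above is a standard hold-out split. A minimal sketch follows; whether Ross shuffled the pitches first is not stated, so the seeded shuffle and the function names `split_labeled_pitches` and `accuracy` are assumptions for illustration.

```python
import random


def split_labeled_pitches(pitches, n_train=2000, seed=0):
    """Shuffle scout-labeled pitches and split them into training and test sets."""
    shuffled = list(pitches)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predict, test_set):
    """Fraction of held-out pitches whose predicted label matches the scout's label."""
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)


# Placeholder data: 3000 (features, label) pairs standing in for scout-classified pitches.
labeled = [((90.0 + i % 5,), "FB") for i in range(3000)]
train, test = split_labeled_pitches(labeled)
print(len(train), len(test))  # 2000 1000
```

Note that measured accuracy can only be as good as the labels: if scouts disagree on what to call a splitter, the test set carries the same inconsistency as the training set.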

Another modification that Ross is thinking of trying is adding a second hidden layer to his neural net.  I guess what I was pleased to hear from Ross is that he knows that the pitch classification system he has now is not good enough.  He also knows that getting it right is a very high priority.  And he is open to any and all possibilities for improving his system, but he has the constraints that the system must make some decision on pitch classification for every pitch practically instantaneously without operator input.  Given that attitude I have confidence that we will have a much improved system by mid season at the latest.

And we should remember that all the basic data collected by Sportvision remains available to us.  We can always develop our own automated pitch classification system to produce analysis grade pitch classifications that doesn’t have to operate under the time constraints of Ross’s on air system.


#15    tom      (see all posts) 2008/05/13 (Tue) @ 05:44

This pitcher-neutral requirement is fine if we can accept that a Felix changeup is a Moyer fastball.  I don’t mind, since speed, spin rate, and spin deflection are all we really care about.  A rose by any other name…

But, humans (outside of me anyway) start with the baseline that a pitcher’s fastest pitch is a fastball, and everything else is relative to that.  So, I don’t see how you can even start with something as pitcher-neutral.  For Gameday purposes, doesn’t MLBAM require the knowledge of what the typical fastball is for each pitcher, at the very least?

I imagine however that if people adopted the speed/spin nomenclature that Ross would be far happier: “Pitch was 87mph, spin rate of 2100rpm, angle of 160 degrees”.

And if that means that a Felix changeup and a Moyer fastball get called the same name, then that’d be just dandy.  For me anyway, and I suspect for Ross’s algorithm.


#16          (see all posts) 2008/05/13 (Tue) @ 05:48

I’ve been spending hours a day on pitch classification, which should be an FDA controlled substance.  Some thoughts:

The holy grail of real-time pitcher-neutral classification is impossible, mostly because splitters and changeups overlap no matter how you slice the data.  But you could develop an algorithm that would do a damn good job on an unknown pitcher, provided you told the system which he was throwing.

For known pitchers, you would want to do your best analysis by human brain (which can take 10 minutes to a few hours—the latter for guys with multiple distinct variations on the 2-seamer and / or breaking ball) and then turn that into a personalized version of the general algorithm.  That’s because certain pitchers throw certain pitches anomalously enough to confuse a general system.

Nomenclature?  It’s often arbitrary.  Jonathan Papelbon’s “slider” is a 100% unambiguous cut fastball.  We call a low-effective-rotation curve a curve if it’s thrown slowly but a slider if it’s thrown hard, even though it resembles a slider only in velocity.  The territory covered by “curve” and “slider” probably needs about six terms (including “slurve”) to cover accurately.  The low-rotation curve is probably as basic a pitch as the curve or slider (it’s thrown with the hand coming over the top to get topspin, like a curve, but the palm is left facing the plate like a slider rather than being turned so that the fingers are more perpendicular to the path of the ball), and we don’t have any name for it at all!

My big Q is: does anyone have a cache of 2008 data in the form of Excel spreadsheets (for each pitcher, the .xml file from each appearance, imported and concatenated with a date or game_id field added)?  Having that publicly available would be a great time-saver.  If I had that, I could write up what I’ve learned without using Red Sox pitchers as examples.  (Well, I plan to do that, but with the data cache I’d get to eat and sleep, too.)
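A rough sketch of how such a cache could be assembled, writing one CSV (which Excel opens directly) per pitcher. It assumes each per-appearance file stores pitches as `pitch` elements whose attributes hold the measurements, and it uses the file name as the game_id; both are assumptions about the file layout, and the helper names are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_pitches(xml_paths):
    """Read every <pitch> element from each file and tag it with a game_id.

    Assumption: pitch measurements are stored as XML attributes, and the
    file name (without extension) is a usable game identifier.
    """
    rows = []
    for path in xml_paths:
        path = Path(path)
        root = ET.parse(path).getroot()
        for pitch in root.iter("pitch"):
            row = dict(pitch.attrib)
            row["game_id"] = path.stem
            rows.append(row)
    return rows


def write_csv(rows, out_path):
    """Write the concatenated rows to one CSV; missing attributes are left blank."""
    fieldnames = sorted({key for row in rows for key in row})
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Usage (paths are placeholders):
# rows = collect_pitches(sorted(Path("some_pitcher").glob("*.xml")))
# write_csv(rows, "some_pitcher_2008.csv")
```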


#17    Ike H      (see all posts) 2008/05/13 (Tue) @ 10:07

Actually, one of the things I suggested to Ross that might go a long way to making a completely “dumb” (or pitcher-neutral) algorithm more effective is to utilize the warmup pitches a pitcher takes on the mound before the inning starts, or after being called in from the bullpen, to get a baseline for at least some of his pitches.  Call the fastest one he throws a fastball, and if you can see the signals he’s giving to the catcher, you can know what some (maybe all?) of his pitches look like.

This though requires some operator input if you want to get more than just the fastball.  But just getting the fastball is probably enough.


#18    Ross Paul      (see all posts) 2008/05/13 (Tue) @ 10:16

We are definitely going to be adding improvements to our classification system as the season progresses.  In early April we started using running averages of a pitcher’s minimum and maximum speeds as post processing filters on the neural net.  Fortunately this fixed most of what I call “The Moyer Problem”.  We also started leveraging a pitcher’s known repertoire to encourage (but not force) the network to choose a pitch that the pitcher is known to throw.
In addition to the second hidden layer, in the coming weeks we’ll be adding the % of pitcher’s maximum velocity (max_speed - pitch_speed ) / (max_speed - min_speed) as an input to the network.  Also, after hearing many suggestions at the summit, we’ll be looking into adding atmospheric data as yet another input.  I greatly appreciate all the helpful suggestions, hopefully we can get this system as accurate as possible!
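The velocity input can be written out directly from the formula above. This is a sketch only: the function name and the handling of a zero-width speed range are assumptions, and the speed ranges in the example are invented rather than measured.

```python
def relative_speed(pitch_speed, min_speed, max_speed):
    """Compute (max_speed - pitch_speed) / (max_speed - min_speed).

    As the formula is written, a pitcher's fastest pitch maps to 0.0 and his
    slowest to 1.0. Returning 0.0 when the range is empty is an assumption
    made here to avoid dividing by zero.
    """
    span = max_speed - min_speed
    if span <= 0:
        return 0.0
    return (max_speed - pitch_speed) / span


# The same 82 mph reading for two hypothetical pitchers (illustrative ranges only):
print(relative_speed(82.0, min_speed=68.0, max_speed=82.0))  # 0.0, his hardest pitch
print(relative_speed(82.0, min_speed=82.0, max_speed=98.0))  # 1.0, his softest pitch
```

The raw speed is identical in both calls, but the normalized input separates a soft-tosser's fastball from a hard thrower's off-speed pitch, which is the distinction a pitcher-neutral network cannot make from mph alone.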


#19    Peter Jensen      (see all posts) 2008/05/13 (Tue) @ 10:18

Ike - Isn’t Ross’s proposed addition of an input node that incorporates the percent of the pitcher’s top speed of each pitch type supposed to help with that as well?


#20    ike H      (see all posts) 2008/05/13 (Tue) @ 10:20

Well, as Ross just chimed in, yes…

I had forgotten about that....Being in meetings all day does that kind of thing to me.


#21    Tangotiger      (see all posts) 2008/05/13 (Tue) @ 12:17

We also started leveraging a pitcher’s known repertoire to encourage (but not force) the network to choose a pitch that the pitcher is known to throw.

Right, that means you are going away from the “pitcher-neutral”, specifically for the Moyer/Felix issue.  You need a pitcher-specific baseline to start with, at the very least, his top-end fastball.

I don’t know that you necessarily need to know the rest of the repertoire, but at the least it mitigates having someone throw 8 different pitch types if he only has 3.  But, again, it looks like you are aiming for pitcher-specific.

(I would prefer pitcher-neutral as an analyst, but the typical fan would argue against it.)

I can also see the issue MLB is having with real-time, if the calibration issue is not stable enough.  A pitcher could have his top-end fastball at 90 and his changeup at 82, and then you get a pitch that is at 86, which by spin rotation and deflection points to one of these two.


#22    Peter Jensen      (see all posts) 2008/05/13 (Tue) @ 12:52

Tango - You are confusing two issues.  Perhaps because Ike and I didn’t explain things well enough.  Ross hasn’t had the whole process pitcher neutral since April 8th.  He started the season that way but quickly abandoned the idea.  It is only the neural net part of the process that he is trying to keep pitcher neutral.  The pitcher specific information is applied pre and post neural net.  I hope that I have understood Ross’s explanation correctly and am explaining it accurately, but I guess he can check back in if I am not.
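One way to picture the post-net step is as a reweighting of the network's pitcher-neutral scores by the pitcher's known repertoire. The sketch below is a guess at the general idea, not Ross's implementation: the `penalty` factor, the function names, and the scores are all hypothetical.

```python
def apply_repertoire_prior(scores, repertoire, penalty=0.25):
    """Down-weight pitch types the pitcher is not known to throw, then renormalize.

    scores: pitch type -> score from a pitcher-neutral classifier.
    repertoire: set of pitch types the pitcher is known to throw.
    penalty: hypothetical factor below 1; it discourages but does not forbid
    an out-of-repertoire label.
    """
    adjusted = {
        name: (score if name in repertoire else score * penalty)
        for name, score in scores.items()
    }
    total = sum(adjusted.values())
    if total == 0:
        return dict(scores)
    return {name: score / total for name, score in adjusted.items()}


def pick(scores):
    """Return the highest-scoring pitch type."""
    return max(scores, key=scores.get)


repertoire = {"fastball", "slider"}

close_call = {"fastball": 0.40, "cutter": 0.45, "slider": 0.15}
print(pick(apply_repertoire_prior(close_call, repertoire)))  # fastball

strong_signal = {"fastball": 0.05, "cutter": 0.90, "slider": 0.05}
print(pick(apply_repertoire_prior(strong_signal, repertoire)))  # cutter
```

A close call is nudged toward a pitch the pitcher is known to throw, while a strong signal for a new pitch still gets through, which matches the "encourage but not force" description.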


#23          (see all posts) 2008/05/13 (Tue) @ 14:33

One of the most interesting unresolved big questions is: what exactly do we want to report, pitcher’s intent or actual results?  I actually have three columns for pitch coding in my spreadsheets:

1) Two letters describing the actual pitch thrown plus extra symbols indicating anomalous spin axis or RPM (I’ll use both eyeballs and two standard deviations to make that call).  This allows you to exclude anomalous pitches (which are either bad data or misthrown) when calculating average pitch parameters.  The extra symbols can also be used for other handy ad hoc purposes, e.g., as markers of arm slot when that slot doesn’t affect the movement of that particular pitch.

2) The same, trimmed of the extra symbols.  Assuming the anomalies are misthrown, you do want to include them in counting pitches by count, etc.  And you want to keep this level of analysis when examining what hitters do.

3) A translation of the actual pitch into intent.  Usually this means a systematic combination of what are actually distinct pitch varieties (e.g., low rotation curves plus sliders as either the pitcher’s “sliders” or “curves”).  Once in a while a guy will simply throw a reasonably unambiguous slider given a curve sign or vice versa, and you can hard-code those if you’re lucky enough to see the catcher’s signals.
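The two-standard-deviation check in column 1 can be sketched as below. It covers only the numeric half of the call (the eyeball half is not automatable here), and the spin values are invented for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(values, n_sd=2.0):
    """Return one boolean per value: True if it lies more than n_sd sample
    standard deviations from the mean of all the values."""
    center = mean(values)
    spread = stdev(values)
    return [abs(v - center) > n_sd * spread for v in values]


# Hypothetical spin rates (rpm) for one pitch type; the last one is misthrown or bad data.
spin_rpm = [2000, 2020, 1980, 2010, 1990, 2005, 1995, 2015, 1985, 1400]
flags = flag_anomalies(spin_rpm)

# Column 1 use: exclude flagged pitches when averaging pitch parameters.
clean = [v for v, bad in zip(spin_rpm, flags) if not bad]
print(mean(spin_rpm))  # 1940
print(mean(clean))     # 2000

# Column 2 use: keep every pitch when counting pitches thrown.
print(len(spin_rpm))   # 10
```

Because a large outlier inflates the standard deviation it is measured against, this check is conservative, which is one reason to pair it with a look at the data.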

I can see a day, maybe just a few years off, when the real-time reporting of pitch types appears next to the speed on the game status bar (right next to the current Win Expectancy!).  So the big question needs to be answered: do we want to report actual pitch types, in as much detail as we can break them down, or simply what the catcher was calling?  There are certainly guys who throw their 2-seamers two or three distinct ways (getting extra sink or run) and probably do so with intent—so do you report them all as “2-Seamer” or do you add “Sinker,” “Runner,” and “Shuuto”?  Would that confuse the average viewer, or would they quickly learn how the subtypes group?


#24    Peter Jensen      (see all posts) 2008/05/13 (Tue) @ 14:56

Eric - Ross asked your question of the audience at the Summit.  I answered that in another recent thread MGL and others had seemed to come to a consensus that what mattered was matching a pitch type to the result of what was thrown and not worrying about what was intended.  Essentially answering the question with what the batter would have reported seeing.  Others in the room said that they thought that pitchers might not want to use the data with this batter centric view of pitch type because they would view such a pitch classification as a “mistake” since it didn’t match their idea of what they were throwing.  There was no real resolution to the question.


#25    Ike H      (see all posts) 2008/05/13 (Tue) @ 15:08

There can be no single resolution to the question either.  The spectrum of pitches is a continuum, and where one person/pitcher/batter draws the line between slider and cutter is not necessarily where I would draw the line, or where anyone else would draw it.

Pitchers certainly draw their own lines...but it’s not uncommon for two guys to throw 2 very similar pitches and call them something completely different.


#26    Tangotiger      (see all posts) 2008/05/13 (Tue) @ 15:17

I have no problem getting it all.  The pitcher intends to throw a pitch at 90mph, with 20rps, at 160 degrees, and ends up throwing it 88mph, with 22rps at 175 degrees.

The problem is we really don’t know what his intent was.  We try to infer his intent based on where the catcher was, what the pitch actually was, and what his repertoire is.  To that end, we need to be aware that there are two functions: (1) data recording, (2) data analysis.

So, if you need to know where the catcher was positioned, record that.  Get whatever data you can, and then make whatever reasonable inferences you can.


#27    Tangotiger      (see all posts) 2008/05/13 (Tue) @ 15:26

And by the way, where the pitcher intends to throw a pitch is also very variable.  This is easily seen at 3-0 counts with no men on base and not a power hitter at bat.  A pitcher would be crazy to try to throw at the edges here.  And even so, I’d bet you’ll find that 25% of the time, he’ll miss the strike zone.  So, whatever his intent, I think, is really useless.  As far as we are concerned, we can only deal with the data.

The pitcher himself could make use of the data and compare it to his own intentions as he remembers it, and make his modifications.  Otherwise, there’s not much for us to do about trying to infer his intent, that is not much more than guesswork anyway, beyond what the data shows us.


#28    Mike Fast      (see all posts) 2008/05/13 (Tue) @ 16:30

The discussion about including pitcher’s intent in the pitch classification process is not helped by the fact that people are using the same terms to mean different things.

If by “pitcher’s intent” you mean data like the catcher’s signal and the location of the catcher’s glove, that’s one thing.  That’s how I understood Eric Van’s comments.  That’s data that comes from watching the video and isn’t something we have for the vast majority of the data set.

Assuming that we’re only talking about the PITCHf/x data and not transcribed data from the video, we seem to have two different philosophies. 

The first camp takes the view that understanding the pitcher’s strategy/intent is important in classifying pitches.  If the pitcher has two pitches that he uses differently, we ought to see if we can do something to classify them separately based on the data available to us.

The second camp takes the view that it doesn’t matter what the pitcher is trying to do; it only matters what the batter perceives about the pitch.  If, hypothetically speaking, all breaking pitches look the same to the hitter, we should classify them together even if a pitcher uses two distinguishable types of breaking pitches differently depending on the situation.

I fall with the first camp for a few reasons.  One, publicly available information on how batters perceive pitches is a great deal scarcer and a great deal less precise than the information on how pitchers intend to throw pitches. 

Two, differences that are perceptible to the pitcher but imperceptible to the batter may have a consistent and measurable effect on the outcome of the pitch.  Because the pitcher holds most of the initiative in the confrontation, the reverse situation is rarely in play. 

Three, in my experience, those in the second camp tend to discard or overlook more useful information than those in the first camp.  The second camp tend to be those who want to aggregate the data and the first camp tend to be those who want to look for nuances.  Both approaches obviously have a place, but I contend that we generally don’t understand the data well enough yet to aggregate it well and that the nuances we are currently discovering are still very influential in our overall understanding of the data.

There’s a fourth reason I fall in with the first camp, but it’s part of a larger point.  Both the first camp and the second camp deal with a continuum of data and have to divide the data into defined bins somehow in order to use the data for almost all purposes. 

The first camp tends to use the pitch names as already defined by the baseball community for their bins.  This obviously has some negatives since the bins can be a bit fuzzy.  Its positives are in being able to draw from and test the existing baseball conventional wisdom and scouting reports. 

The second camp needs to define their pitch classification bins in some fashion and has the difficulty of determining the best way to do this. 

There are some applications in which it makes more sense to take the approach of the second camp, but I think it’s a poor choice for many other applications and a poor approach to cracking the general pitch classification problem.


#29    Harry Pavlidis      (see all posts) 2008/05/13 (Tue) @ 19:04

I don’t think of the camps in terms of philosophies, but, rather in terms of sets of questions.  I have tents in both camps.  Now that I have a clearer understanding, and greater confidence in PFX (seen it with my own eyes), I’ll personally be asking (and already am) more camp 1 questions.


#30    John Walsh      (see all posts) 2008/05/14 (Wed) @ 14:57

#23/Eric,

I can see a day, maybe just a few years off, when the real-time reporting of pitch types appears next to the speed on the game status bar (right next to the current Win Expectancy!).

When I read this this morning, I was thinking “a few years” might be a bit prudent, and lo and behold, I’m here watching the Orioles-Red Sox game today and pitch types are appearing on the game status bar. No Win Expectancy yet, though (grin)


#31    Ross Paul      (see all posts) 2008/05/14 (Wed) @ 15:10

Well… I’m calculating Win Expectancy, but I keep getting 100% for Philadelphia… Philadelphia also keeps producing a 100% disappointment coefficient so I think my math must be off…


#32    Tangotiger      (see all posts) 2008/05/22 (Thu) @ 08:39

Mike Fast presents his recap:

http://www.hardballtimes.com/main/article/drinking-from-a-fire-hose/


#33    Peter Jensen      (see all posts) 2008/05/22 (Thu) @ 23:32

The article in the Wall Street Journal about the Pitch f/x Summit should be out on Friday.


#34    Mike Fast      (see all posts) 2008/05/23 (Fri) @ 00:28

Darren Everson’s article in the WSJ is available here:
http://online.wsj.com/article/SB121149546083915647.html?mod=sports


#35          (see all posts) 2008/05/23 (Fri) @ 15:43

Thanks to Mike Fast for posting the link to the WSJ article.  If we can continue this particular thread, I would be curious what people think of the article.  I am a bit disappointed at the brevity of the article.  Moreover, I thought the tone was one of a bunch of stats geeks finding a new toy to play with, a toy that real baseball folks don’t take all that seriously (with a few exceptions).  Or am I misreading the tone?  Certainly the summit summaries by Mike, Harry, and Ike have a very different emphasis.


#36    Tangotiger      (see all posts) 2008/05/23 (Fri) @ 15:58

I took it as nothing but very positive.

I’ve come to believe that when reading articles, it is sometimes useful to skip over some of the adjectives, and get to the content.  Let me highlight his important statements:

The baseball box score, bless its little numerical heart, is dead. It lived a nice, long life—about 140 years—but it has outlived its usefulness. Its archaic statistics, pilloried for years by serious statisticians, tell you only what players have done, not what they’re capable of doing. It’s the past.

The future doesn’t lie in newer, better statistics, however. It isn’t really grounded in numbers at all. The future, like it or not, is in pictures.

So, he sets the tone as there’s a major shift going on.

Pitch f/x starts baseball down the path of learning how players do things—which batter hits the ball the hardest, which shortstop has the quickest reflexes, what pitcher has the nastiest slider. It showed, for instance, that St. Louis Cardinals pitcher Adam Wainwright had one of baseball’s most violent curve balls in 2007, with up to nine inches of vertical drop and more than seven inches of horizontal movement. Perhaps consequently, Mr. Wainwright had an unusually high rate of swings-and-misses against his curve (38% versus a league average of a little more than 25%), according to a sampling of data by Harry Pavlidis, a baseball analyst who writes a prominent blog about the Chicago Cubs.

Nearly everyone at the conference believed such advancements in measuring fundamentals could finally bring a “why” to the “what” of box scores and stat sheets. The same technology will spread to hitting and fielding, they say, and could be applied to other sports.

“Instead of saying, ‘There’s a hard smash to third base’ we could say, ‘That ball was hit 106 mph and the third baseman had a third of a second to react.’ “ says Peter Jensen, a statistician and summit attendee who has written for the Hardball Times, a baseball analysis site. “That adds some context that’s been lacking so far.”

He lays it out, that we’re really doing scouting at this point.

“I think this stuff is helpful in a way,” says retired manager Jack McKeon, who won the 2003 World Series title with the Florida Marlins, “but I still think you need to use your eyes.”

The ironic thing about McKeon’s quote is that PITCHf/x is actually representing, numerically, what our eyes are seeing (if they were Lee Majors’ eyes).

So, McKeon really wants this, but doesn’t realize it yet.

***

There’s no question at all that we are on the cusp of the pinnacle of sabermetrics, that point where performance results and scouting observations converge.

This article, if you can skip past some of the humor directed at researchers (which seems to be a requirement, so that the skeptical reader knows he has a friend in the writer), does a great job in leading the reader in that direction.

Like I said, nothing but very positive.


#37    Mike Fast      (see all posts) 2008/05/23 (Fri) @ 17:00

Objectively, Tango’s probably right.  I had much the same reaction as Alan, but my wife agreed with Tango’s viewpoint, and she’s usually right about these kinds of things.

I think what is missing from the WSJ article for those of us who attended the conference is the amazing nature of the discussion that went on.  It was probably not Darren’s objective to communicate that throughout the article, other than to note our enthusiasm:

By the end, the stats wonks, engineers and nine team representatives in attendance could barely contain themselves. “It’s tremendously exciting for people like me,” said Mat Olkin, a Kansas City Royals consultant.

Darren appears to have been more interested in communicating how PITCHf/x may change the game than documenting the amazing science and engineering that has gone into PITCHf/x and that occurred at the summit.  That probably makes a lot of sense for his audience.

But for anyone who was there, a sentence like “Sportvision says the system was measuring the ball’s speed at the release point rather than in mid-flight, as radar guns do.” does not begin to do justice to the impromptu lesson we got from Ken Milnes on how baseball radar guns process radar returns and how that affects their reliability as a measuring tool.  It seemed like there were a hundred such moments over the course of the weekend because of the caliber of the people that were there.


#38    SirKodiak      (see all posts) 2008/05/24 (Sat) @ 02:21

The use of nerd in the title, wonk* in the body, and the placement of the quote at the end (that made sure to point out McKeon was not just an ex-manager, but one that recently won the WS) that was in direct opposition to:

By the end, the stats wonks, engineers and nine team representatives in attendance could barely contain themselves. “It’s tremendously exciting for people like me,” said Mat Olkin, a Kansas City Royals consultant.

points me towards, at minimum, a condescending tone.

*An expert who studies a subject or issue thoroughly and excessively


#39          (see all posts) 2008/05/24 (Sat) @ 08:34

I also thought it was a little condescending.

You have to admit that those of us who play with MySQL and Perl scripts are nerds, but what’s important to the public and the teams is the ability to get new insights on players’ skills, and he did talk about those also.


#40    Peter Jensen      (see all posts) 2008/05/24 (Sat) @ 08:36

To be fair to Darren, we are nerds and we are wonks.  I even remember many of us at the Summit describing ourselves that way.  I talked with Darren for quite a while as we walked to the ballpark on Saturday.  I was very impressed with the preparation that he had done before the conference to familiarize himself with the subject and the participants.  We talked about the day’s presentations, and it was clear that he had listened carefully and understood the important points made by each of the speakers.  He asked me intelligent questions about the implications of my talk and those of the other presenters.  He also asked if any research had been done on which pitchers had the best curve balls, fast balls, etc. and I later sent him a link to John Walsh’s Hardball Times article which he quotes in the article.  As Mike alluded to above, he took the point of view of how Pitch f/x was going to change the game of baseball for the general viewer and perhaps for the teams and players, and he communicated that point of view effectively.

I don’t know what more you could ask from a reporter.  The point of any professional conference is for it to be interesting and informative for the professionals that attend, and the Summit succeeded at that better than any other conference I have ever been to.  But it is a little too much to expect for any outsider to be able to understand the particular passions that drive the participants.  To outsiders of the subject of ANY specialist conference the participants are going to seem a little weird.  I don’t think we were being picked on unfairly.


#41          (see all posts) 2008/05/24 (Sat) @ 08:46

Obviously no one who thought the article was a bit condescending has ever read a report on a science fiction convention that they’d attended!  Peter nails it with his final comments.  And note that the article says that representatives from the Red Sox and Yankees were as excited as the stats wonks.

