Thursday, May 15, 2008
Place Hitting
I haven’t even read Walsh’s piece, but I’m excited to. BRB.
Voros got the ball rolling with DIPS. You can see what he’s doing, step-by-step: essentially recreating a pitching line, presuming all batted balls in the park resulted in the same percentage of hits. And you end up with a dERA. (This has a high correlation with FIP, which is why I prefer FIP.) Then Hardball Times, over the past few years, has been breaking down the batted balls by FB, GB, and LD, so that each one has its own hit and out rate, or basically run value, culminating in a fantastic resource, and indeed, a staple in THT Annuals.
Essentially, a LD has the same run value as a walk, an infield fly has the same run value as a K. You could throw those into the FIP equation, if you like. Now, completing the circle, Graham basically follows the Voros method of recasting a pitching line but looking at the batted ball type. I’d like to call it DIPS 3.0, but Gassko used the name three years ago, and it looks like he did something very similar to Graham. I am glad that someone as seemingly resourceful as Graham (I don’t think I know him) is in our midst.
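To make the "throw those into the FIP equation" idea concrete, here is a minimal sketch in Python. The first function is the usual FIP formula. The second is a hypothetical variant that simply counts each line drive like a walk and each infield fly like a strikeout. The function names and the 3.10 constant are assumptions for illustration, and this is not Graham's or Gassko's actual method.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Usual FIP: (13*HR + 3*(BB+HBP) - 2*K) / IP + constant.

    The constant is a placeholder; in practice it is set so that
    league FIP matches league ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def fip_with_batted_balls(hr, bb, hbp, k, ld, iffb, ip, constant=3.10):
    """Hypothetical variant: each LD gets the walk weight, each infield fly the K weight."""
    return (13 * hr + 3 * (bb + hbp + ld) - 2 * (k + iffb)) / ip + constant


if __name__ == "__main__":
    # Made-up season line, for illustration only.
    print(round(fip(hr=20, bb=50, hbp=5, k=150, ip=200), 3))
    print(round(fip_with_batted_balls(hr=20, bb=50, hbp=5, k=150,
                                      ld=120, iffb=25, ip=200), 3))
```

Since line drives far outnumber infield flies, the variant comes out much higher with the same constant. You would have to re-center the constant against league totals before comparing the two.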
I’m having a hard time keeping all these different flavors of Batted Ball Run equations straight. I suspect you guys are now in the same boat as I am. Maybe Graham can write something up for Hardball Times (without those Greek letters).
This is my annual complaint. Don’t want to miss it.
STATS, BIS, MLB.com, please, record where the fielders are playing. If the NHL, with half the revenues of MLB, can employ at least a half-dozen scorers, MLB can afford to hire one extra guy to see where everyone is playing. And while you’re at it, give him a stopwatch.
Fabulous article by Peter Jensen:
Let’s take the two observers in closest agreement, BIS and Greg, split the difference between them and call that the best guess of the actual hit location. What is the minimum distance and degrees that will have 95 percent of both Greg’s and BIS’ observations included? The answer is +-18 feet and +-4 degrees. That’s a pretty big area. It is two whole zones in width.
...
It doesn’t matter if you have three observers or 3,000, the composite data will never have any less error than that of the two closest. Having many observers is only useful for finding those two best observers.
Fantastic stuff. And great point. Peter is right that, in throwing in as many observers as I can, I wouldn’t want to weight each one equally. The better the estimator (relative to the other 2,999), the more I would weight that observer. Ideally, you’d be down to just one observer, the perfect guy. Realistically, you might have one observer carry 10% of the weight, another 9%, another 8%, and on and on, such that you only need about 20 observers out of the 3,000.
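One way to picture unequal weighting is the sketch below, which weights each observer by the inverse of his squared typical error. That inverse-variance scheme is my stand-in for "weight the better observer more"; it is not something Peter computed. The function name and the error figures are hypothetical.

```python
def weighted_location(estimates, errors):
    """Combine (x, y) hit-location estimates from several observers.

    estimates: list of (x, y) tuples, one per observer (feet).
    errors:    each observer's typical error (feet); smaller means better.
    Returns the combined x, the combined y, and the normalized weights.
    """
    raw = [1.0 / (e * e) for e in errors]   # inverse-variance weights
    total = sum(raw)
    weights = [w / total for w in raw]
    x = sum(w * ex for w, (ex, _) in zip(weights, estimates))
    y = sum(w * ey for w, (_, ey) in zip(weights, estimates))
    return x, y, weights


if __name__ == "__main__":
    # Two made-up observers; the first is assumed to be twice as precise.
    x, y, weights = weighted_location([(100.0, 200.0), (110.0, 210.0)],
                                      [10.0, 20.0])
    print(round(x, 1), round(y, 1), [round(w, 2) for w in weights])
```

With these made-up numbers the more precise observer carries 80% of the weight, so the combined location sits much closer to his estimate.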
However, his conclusion that the error is now 22 feet doesn’t necessarily mean that’s bad. If the two closest observers were within 18 feet of each other, but the third observer was in fact the best for a particular data point, I’m not sure that we’d want the 18 feet. For example, MGL and Marcel have a similar forecasting engine as their basis, while Chone does not. Selecting the two closest in agreement (MGL, Marcel) doesn’t mean that it’s necessarily bad if we also include Chone. Perhaps Greg and BIS are biased in the same manner (relying more on video than in-park).
Question to Peter: what is the correlation of STATS, BIS to Greg? And what is the weight for each of those two? Repeat for the other combinations. Couldn’t we come up with a better estimate of where a ball landed based on different weightings?
Thank you Brian Bannister!
The data that Greg used in the THT08 Annual:
http://sonsofsamhorn.net/index.php?showtopic=26701&st=20
Additional commentary in the preceding page of that thread.
Josh Kalk provides the necessary primer so that we have a baseline to compare against.
The confusion continues with the use of the word “break”. I sincerely hope that the heavyweights (Josh, John, Joe, Mike, Dan, et al) get their terms consistent. A “break” is what you see as a human being. What Josh is showing is spin-induced, gravity/time-less movement. A break is made up of those three things: the spin imparted by the pitcher, the speed of the ball, and gravity.
OPS? The correct currency is either runs or wins. John Walsh does it right in the THT Annual. You can get what you need from Craig Burley or me. The short rule is: never ever ever use OPS, unless you are trying to do something quick.
Josh’s work is simply too fascinating to have such imperfections. I’m being picky to all you guys, but you guys can set the standards here. Don’t confuse readers with “breaks” that exist in an equation, and OPS, which has no units.
David’s fine article will serve as the impetus for my diatribe:
Want to see how and where balls have been put in play against Paul Byrd when he has two strikes on a left-handed batter? Look no further.
Now downloadable:
http://www.hittrackeronline.com/forum/viewtopic.php?t=10
The always resourceful Dan Fox gives us a breakdown of batted balls. Here’s the summary, with a little change on my part:
HITS   SLG    LWTS   w/GIDP  Type
0.728  0.940  +.324  +.324   Line
0.177  0.302  -.126  -.126   Fly
0.241  0.262  -.108  -.126   Ground
0.020  0.024  -.284  -.284   Pop
Those numbers EXCLUDE homeruns in the numerator and denominator. So, a flyball is a hit only 17.7% of the time, but the slugging average on a flyball is .302. Groundballs go for a hit 24.1% of the time, but with a slugging average of only .262. As it turns out, the extra oomph from extra bases on flyballs can’t overcome the extra outs for a hitter. The Linear Weights run value of a flyball (excluding HR) is about .02 runs lower than your standard groundball. However, don’t forget the GIDP. Throw the GIDP in, and guess what? The run values of a FB and a GB are virtually equal!
But remember, the reason that we prefer GB pitchers, generally speaking, is that they give up fewer HR.
The run value of a line drive is similar to that of a walk, and the run value of a popup is similar to that of a strikeout.
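As a quick illustration of how those run values get used, here is a small Python sketch that applies the w/GIDP column to a batted ball profile. The values are the ones summarized above (HR excluded). The function name and the profile counts are made up.

```python
# Run value per batted ball, from the w/GIDP column above (HR excluded).
RUN_VALUE = {"line": 0.324, "fly": -0.126, "ground": -0.126, "pop": -0.284}


def batted_ball_runs(counts):
    """Total run value of a batted ball profile given counts per type."""
    return sum(RUN_VALUE[kind] * n for kind, n in counts.items())


if __name__ == "__main__":
    # Hypothetical pitcher: 500 non-HR balls in play.
    profile = {"line": 100, "fly": 150, "ground": 220, "pop": 30}
    print(round(batted_ball_runs(profile), 2))

    # Trading 10 flyballs for 10 groundballs changes nothing once GIDP is in.
    shifted = {"line": 100, "fly": 140, "ground": 230, "pop": 30}
    print(round(batted_ball_runs(shifted), 2))
```

Both totals come out the same, which is the point: once GIDP is included and HR are set aside, swapping a fly for a grounder is a wash.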
Protrade checks in with the UZR version for hitters:
http://www.protrade.com/content/DisplayArticle.html?sp=Sfd06ae48-d89d-11db-8683-5577a9d16e8f
http://mlb.mlb.com/news/article.jsp?ymd=20070322&content_id=1854282&vkey=news_mlb&fext=.jsp&c_id=mlb
The main issue is getting enough parameters to distinguish a Beltran hit to location x,y that was “hard hit” from a Neifi Perez hit to location x,y that was “hard hit”. On top of which, you really need to know the fielding alignment. In short, it’s a good first try, but the set of parameters needs to be extended.
Studes follows the Voros approach in describing some players. It is in fact Voros’ approach that allowed me to create aging charts. (See Legend at the bottom)
As you can see, each rate describes something specific.
Now, there’s no reason that you must look at things this way. It assumes a certain independence that perhaps is not warranted. You could for example, look at things in other ways. Rather than removing HBP from the denominator first, then the BB, then the K, then the HR, you can remove all four right away.
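Here is a minimal sketch of the two ways of slicing things, assuming the removal order described above (HBP, then BB, then K, then HR). The function names and the sample line are hypothetical. Reading "remove all four right away" as "put each of the four over total PA" is my interpretation.

```python
def sequential_rates(pa, hbp, bb, k, hr):
    """Each event rate uses what is left after removing the earlier events."""
    rates = {}
    remaining = pa
    for name, count in (("HBP", hbp), ("BB", bb), ("K", k), ("HR", hr)):
        rates[name] = count / remaining
        remaining -= count
    rates["in_play_share"] = remaining / pa
    return rates


def flat_rates(pa, hbp, bb, k, hr):
    """All four events measured against PA; what is left over is in play."""
    rates = {name: count / pa
             for name, count in (("HBP", hbp), ("BB", bb), ("K", k), ("HR", hr))}
    rates["in_play_share"] = (pa - hbp - bb - k - hr) / pa
    return rates


if __name__ == "__main__":
    # Made-up batting line.
    line = dict(pa=600, hbp=5, bb=60, k=100, hr=20)
    print({key: round(v, 3) for key, v in sequential_rates(**line).items()})
    print({key: round(v, 3) for key, v in flat_rates(**line).items()})
```

Both versions leave the same number of balls in play. They differ only in the denominators of the individual rates, so the sequential HR rate reads as "HR per PA that did not end in a HBP, BB, or K".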
So…
We should all be indebted to people who parse through the play-by-play files and present us with compiled data. When it comes to batted ball data, we start with Studes at Hardball Times. Dan Fox provided a tool on his site to parse through the PBP data (maybe someone can find me the link; memo to Dan: you have so much great stuff at your site that it’s in desperate need of some organization). And now we can add David at Fangraphs. Fangraphs even adds the fantastic news of:
Furthermore, at some point this season, we’re hoping to have batted ball splits available for all players for 2002 onward.
Can Baseball-Reference be far behind?
When it comes to batted ball data, we need to get away from the 100-year-old practice of excluding SF from At Bats. It is one of the silliest distinctions made. And when it comes to OBP, we include it! As well, reaching on an error is not pure luck on the batter’s part. Batters definitely have some influence on it (though not as much as on a hit). Again, we need to record that. Tom Ruane of Retrosheet goes further and also includes reaching base on a fielder’s choice when no putout is made (i.e., a force play was attempted, and failed, at 2B, 3B, or Home Plate, leaving the runner and batter-runner safe). That too is an excellent category to track.
Greg from Hittracker Online will be stopping by, explaining his project, and how you can help.
A researcher can be quite content to spend half his time analyzing the effects of Fenway Park. It has all the parameters that make it interesting to study. Plus, it’s a fantastic place to see a game. A great marriage of analyzing cold hard numbers, and then appreciating baseball in all its glory.
But, we need help:
Take one hard-working and generous Dan Fox, combine him with crazy Red Sox fans who look at the spread of balls hit by JD Drew, fans who also overlay Dodger Stadium on Fenway Park, to see what happens. Or, take another crazy Red Sox fan, who analyzes all of JD’s long balls and says:
The overall net effect was -2 HR’s, -3 Triples, +8 Doubles and -3 Flyouts. So, you lose 17 bases and pick up 16 - practically a wash.
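The bases arithmetic in that quote is easy to check. A short sketch, using the standard total-base values and the net changes quoted above (the variable names are mine):

```python
# Total bases per event; a flyout is worth zero bases.
BASES = {"HR": 4, "3B": 3, "2B": 2, "flyout": 0}

# Net changes quoted above for moving JD's long balls into Fenway.
NET_CHANGE = {"HR": -2, "3B": -3, "2B": 8, "flyout": -3}

bases_lost = -sum(BASES[e] * n for e, n in NET_CHANGE.items() if n < 0)
bases_gained = sum(BASES[e] * n for e, n in NET_CHANGE.items() if n > 0)

if __name__ == "__main__":
    print(bases_lost, bases_gained, bases_gained - bases_lost)
```

That gives 17 bases lost (8 from the homers, 9 from the triples) against 16 gained from the doubles, for a net of one base. It counts bases only and says nothing about the three fewer flyouts, which are outs turned into something else.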
According to FanGraphs, Mark Teahen had a GB/FB ratio of 2.22 in 2005, and 1.37 in 2006. Quite a shift, wouldn’t you say? In 2005, 53% of his contacted balls were groundballs, while in 2006, it’s 50%. Quite similar, wouldn’t you say? What’s the difference?
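A hint comes from backing out the flyball share implied by those two numbers. This is a sketch under the assumption that the GB/FB ratio and the GB percentage use the same groundball and flyball counts. The function name is hypothetical.

```python
def implied_fb_share(gb_share, gb_fb_ratio):
    """Flyball share of contacted balls implied by GB share and GB/FB ratio."""
    return gb_share / gb_fb_ratio


if __name__ == "__main__":
    for year, gb_share, ratio in ((2005, 0.53, 2.22), (2006, 0.50, 1.37)):
        fb = implied_fb_share(gb_share, ratio)
        other = 1.0 - gb_share - fb   # whatever is neither GB nor FB
        print(year, f"GB {gb_share:.1%}", f"FB {fb:.1%}", f"other {other:.1%}")
```

The implied flyball share goes from roughly 24% to roughly 36% while the groundball share barely moves, so the ratio is being driven by the denominator. The shift comes out of the remaining category, which is presumably mostly line drives.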
More great Edison-work being done here:
http://www.firstinning.com/articles/charts/
(Hat tip: studes)
One thing I would do is…