
THE BOOK--Playing The Percentages In Baseball


Tuesday, April 13, 2010

Batted Balls: BIS v MLBAM

By Tangotiger, 06:46 AM

Colin:

Failing that, a simple stopwatch could provide more accurate, quantifiable data than what we’re getting right now. And it is possible, to some extent, to review video of past games and get those measurements for players and seasons already passed.

In the meantime, consider this my sabermetric crisis of faith. It’s not that I don’t believe in the objective study of baseball. I’m just not convinced at this point that something dealing with batted ball data is, at least wholly, an objective study. And where does this leave us with existing metrics that utilize batted ball data? Again, I’m not sure. I can tell you I’m a lot less comfortable accepting their conclusions—even over a large number of seasons—than I was in the past. 

This is exactly where I’m at.  This is why in my WOWY, I rely much more on the identity of the pitchers and batters, than the classification of whatever someone thought was their FB, GB, LD.  And, like Colin noted: stopwatch.  I’m not sure how many years I’ve been banging that drum, but there’s so much truth in the easiest of technologies.  Heck, “one mississippi” is about the earliest technology there is. 


#1    2010/04/13 (Tue) @ 09:57

Just read the article. Very interesting stuff.

I think I tend to believe the BIS data over MLBAM and STATS. I trust the video evidence over the eyes of someone at a disadvantage in a press box.

How many times have we heard the announcer get excited for a home run when it was a pop fly to the outfield?


#2    Tangotiger      2010/04/13 (Tue) @ 10:16

The advantage that BIS has is they can stop, rewind, etc.

However, there’s no one I trust more than Greg Rybarczyk, and as Peter Jensen showed in his groundbreaking article at THT, Greg disagrees with BIS as much as he disagrees with STATS and MLBAM.

Basically, our trust in every source involved is on shaky ground.


#3    Tangotiger      2010/04/13 (Tue) @ 10:25

This is Peter’s article:

http://www.hardballtimes.com/main/article/is-seeing-believing/


#4    2010/04/13 (Tue) @ 10:26

I don’t see any reason to trust BIS over MLBAM, or vice versa.  None that I find persuasive yet, anyway.

The discussion that Colin and I have been having (mostly over Twitter) about comparing UZR to Plus-Minus has served to further undermine my confidence in the data.  Or to reinforce Colin’s point about the data being in sad shape.

Colin is correct that if we can precisely identify and isolate the biases in the data, we can make progress again, but I’m at a loss as to how to do that at the moment.


#5    David Cameron      2010/04/13 (Tue) @ 10:41

I think Colin’s making a little bit too big of a deal out of this.  The perfect is the enemy of the good. 

I agree that we want to try to isolate biases where we can and be aware that not every person is going to treat a line drive the same, but there aren’t a lot of people doing analysis based on line drive rates, so I can’t agree that “batted ball data” should be seen with a skeptical eye.  LD%, sure.  But the community has acknowledged the lack of predictive value of LD% and generally avoided it.


#6    2010/04/13 (Tue) @ 10:47

David/5, I think you’re glossing over the issue.  If your point is that what we have now is better than going back to the days when all we had were counts of singles, doubles, triples, home runs, walks, and outs--then, sure, you’re right.

But much of the current batted-ball-based analysis is built on data that is very shaky, or has a large error, if you want to put it that way.  Even this conclusion: “the lack of predictive value of LD%” may be false in reality and just an artifact of the data collection process.


#7    Colin Wyers      2010/04/13 (Tue) @ 11:04

Wow, David, what an amazing way to ignore the issue.

What we do know is that GB rates tend to be the most stable between data providers. So for the sake of expediency we can say that every “error” in LD% means a corresponding “error” in FB% as well.

So if you have an LD bias you have a FB bias as well. So if you want to evaluate pitchers using HR/FB (as is done in xFIP) then you have a bias in how you are evaluating a pitcher’s ability to prevent/allow home runs. This is (again) a persistent bias, over a number of seasons, based upon what park a pitcher (or hitter) plays in.
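A minimal sketch of the HR/FB mechanism described above, using made-up counts (the pitcher, the counts, and the number of mislabeled balls are all hypothetical): if a park's scorer persistently files some true line drives as fly balls, the pitcher's HR/FB drops even though nothing about his pitching changed.

```python
def hr_per_fb(home_runs: int, fly_balls: int) -> float:
    """Home runs allowed divided by fly balls allowed."""
    return home_runs / fly_balls


# Hypothetical pitcher-season; every number here is illustrative only.
home_runs = 20
true_fly_balls = 200
# Hypothetical park bias: 20 true line drives get recorded as fly balls.
mislabeled_line_drives = 20

accurate = hr_per_fb(home_runs, true_fly_balls)
biased = hr_per_fb(home_runs, true_fly_balls + mislabeled_line_drives)

print(f"HR/FB with accurate labels: {accurate:.3f}")  # 0.100
print(f"HR/FB with biased labels:   {biased:.3f}")    # 0.091
```

Because the mislabeling in this sketch is tied to the park rather than to chance, the same downward nudge would recur every season instead of washing out over several years.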

So when you have a pitcher whose HR/FB rate is low/high for a period of years - do you say that pitcher has been lucky/unlucky? That the pitcher has some skill in preventing HR that xFIP isn’t capturing? Or do you acknowledge that there are potential problems in the data?

Other Fangraphs stats that implicitly include batted ball data (yes, including line drives) are tRA and UZR. I in fact provided examples of serious discrepancies in tRA that could be predicted from the biases found in the underlying rates. That doesn’t bother you? At all?


#8    dkappelman      2010/04/13 (Tue) @ 11:06

Well, BIS is recording hang time these days, but it’s just as available as HITf/x, as in it’s not.

Colin, this is not meant as a stab at you, and I realize you were going off what I said in the initial tRA post, but the difference in tRA between FanGraphs and StatCorner is a poor way to illustrate this point, because the two calculations differ in other ways too. Take Felix Hernandez, for whom BIS and Gameday have very similar batted ball profiles for 2010.

         GB%    LD%    FB%
BIS     67.6   13.5   16.6
GD      65.8   13.2   15.8

Now, here’s the difference in FanGraphs tRA vs StatCorner tRA

        tRA
FG     4.62
SC     5.05

Almost half a run difference. Why are they so different? It’s probably the component park factors, mainly on LD% and HR% (I think).

Actually, I’ll plug both of those stat lines into the FanGraphs tRA calculator and see what I get:

4.62 and 4.70. So about .08 of the difference is due to the GB/FB/LD differences, and the other .35 is park factors (or slightly different weights).
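The split above is just subtraction. A short sketch, assuming (as the comment implies but does not state outright) that the 4.70 figure is the FanGraphs calculator run on the Gameday batted ball line:

```python
# tRA values quoted above for Felix Hernandez, 2010.
fg_tra = 4.62           # FanGraphs tRA (BIS batted ball data)
sc_tra = 5.05           # StatCorner tRA (Gameday batted ball data)
fg_calc_with_gd = 4.70  # Assumed: FanGraphs calculator fed the Gameday line

total_gap = sc_tra - fg_tra                  # whole FanGraphs-vs-StatCorner gap
batted_ball_part = fg_calc_with_gd - fg_tra  # same calculator, different data
other_part = sc_tra - fg_calc_with_gd        # same data, different park factors/weights

print(f"total gap:           {total_gap:.2f}")
print(f"batted ball labels:  {batted_ball_part:.2f}")
print(f"everything else:     {other_part:.2f}")
```

Holding the calculator fixed and swapping only the input line isolates the part of the gap that comes from classification.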

If you look at the correlation of individual player GB% from 2003 to 2008 between BIS and Retrosheet data, you get .94. That’s among all players, whether they pitched 1 inning or 200 innings. Here are the others:

GB% - .94
FB% - .85
LD% - .72

It’s not like the data from BIS and Retrosheet are telling us completely different things. Is the data perfect? Probably not, but I don’t think I can really agree with your conclusion.


#9    Rally      2010/04/13 (Tue) @ 11:38

So to get the BIS popups, it’s the infield fly%?

So a team showing 40% flyballs and 10% infield flies actually has 30% flyballs that should match up to retrosheet?  For some reason I thought that the 10% was the percent of flyballs that stay in the infield, so comparing to retrosheet should give you 4% and 36%.
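The two readings of "10% infield flies" in the question above lead to different outfield fly rates. A small sketch with a hypothetical helper (`outfield_fly_pct` is not from any data provider) that computes both:

```python
def outfield_fly_pct(fb_pct: float, iffb_pct: float, iffb_is_share_of_fb: bool) -> float:
    """Percent of all batted balls that are fly balls NOT caught in the infield.

    fb_pct: fly balls as a percent of all batted balls (infield flies included).
    iffb_pct: infield flies, as a percent of fly balls (iffb_is_share_of_fb=True)
              or as a percent of all batted balls (iffb_is_share_of_fb=False).
    """
    if iffb_is_share_of_fb:
        iffb_of_batted_balls = fb_pct * iffb_pct / 100.0
    else:
        iffb_of_batted_balls = iffb_pct
    return fb_pct - iffb_of_batted_balls


# The example team: 40% fly balls, "10%" infield flies.
print(outfield_fly_pct(40, 10, iffb_is_share_of_fb=False))  # 10% of batted balls -> 30
print(outfield_fly_pct(40, 10, iffb_is_share_of_fb=True))   # 10% of fly balls -> 36.0
```

A 6-point swing in the outfield fly rate from a presentation choice alone is large enough to matter when comparing providers.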

In any case I’ve always adjusted any formula I used to the league average for that dataset.  And using a formula designed for one dataset with other data is not going to make any sense.  To me it’s old news that despite similar names, we may not be measuring the same things.


#10    Colin Wyers      2010/04/13 (Tue) @ 11:38

I guess it’s two ways of looking at it. With a correlation of .85, what that’s saying is that 72.25% of the variance in FB% is “shared” or “explained” by a common factor (most likely the actual distribution of batted balls observed, plus the potential for any park bias that’s shared between both data sets).

But that leaves 27.75% that is explained by things NOT in common. That seems pretty large to me, given that they’re supposed to be measuring the same thing.
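The 72.25% figure is the square of the correlation. Applying the same reading (r squared as "shared" variance, as the comment does) to all three correlations from #8:

```python
# Player-level correlations between BIS and Retrosheet, 2003-2008 (from #8).
correlations = {"GB%": 0.94, "FB%": 0.85, "LD%": 0.72}

for stat, r in correlations.items():
    shared = r ** 2          # variance "explained" by a common factor
    unshared = 1.0 - shared  # variance the two sources do not share
    print(f"{stat}: shared {shared:.2%}, not shared {unshared:.2%}")
```

For FB% this gives 72.25% shared and 27.75% not shared; for LD% the unshared part is close to half.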

And if that 28% were distributed randomly, it would be less of a concern. But the whole point of this exercise is that it isn’t - no matter how many years of data you have for a guy, the differences in the data seem to persist.

And again, I don’t know if this is a problem of bias in the BIS data, in the MLBAM data, or in both. But I think it’s worth trying to figure it out.


#11    eitheror      2010/04/13 (Tue) @ 11:46

Are the weights of the fangraphs tRA and statcorner tRA different?


#12    rluzinski      2010/04/13 (Tue) @ 11:53

What is the “fallacy of the wisdom of the crowd’s theory”, according to Peter?  The average of the two observers in most agreement is not our best guess for actual landing location.  As Peter points out, we don’t know what source has the most accurate data, so the average of all the data would indeed be our best guess.  He’s just defining the smallest possible error, no?
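The averaging argument above can be checked directly: the coordinate-wise mean of several reported landing spots is the single point with the smallest total squared distance to all of them. A sketch with made-up coordinates (the three sources and their numbers are hypothetical); note that averaging does nothing about a bias that every source shares.

```python
from statistics import mean

# Hypothetical (x, y) landing spots, in feet, reported by three sources
# for the same batted ball. The numbers are made up for illustration.
reports = [(210.0, 95.0), (225.0, 88.0), (218.0, 101.0)]


def consensus(points):
    """Coordinate-wise mean of the reported landing spots."""
    xs, ys = zip(*points)
    return (mean(xs), mean(ys))


def total_sq_error(guess, points):
    """Sum of squared distances from a guessed spot to every report."""
    gx, gy = guess
    return sum((gx - x) ** 2 + (gy - y) ** 2 for x, y in points)


best = consensus(reports)
print("consensus spot:", best)
# Picking any one source's report instead gives a larger total squared error.
for other in reports:
    print(other, total_sq_error(other, reports) > total_sq_error(best, reports))
```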


#13    David Cameron      2010/04/13 (Tue) @ 12:26

Colin,

I completely agree that we should try and figure out where the disagreements between the scorers are.  I just don’t think I’m burying my head in the sand if I disagree with your conclusion that all batted ball data is suspect. 

I think you have a legitimate point about potential scorer bias, and it’s something to be aware of.  I’m not trying to discredit anything you wrote.  I just think your conclusion was a little over the top.  If our options are imperfect data or no data, I’ll go with imperfect data, and do the best I can with that.


#14    Tangotiger      2010/04/13 (Tue) @ 12:29

David: suppose UZR ignores all balls labelled as line drives, but keeps ACTUAL line drives that were labelled as flyballs.

So, the out rate of those line drives should be .25, but they are counted as being .90 outs.  So, an OF, making an out on a true line drive is only going to get +.10 outs in the deal, when he should get +.75.
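The credit arithmetic above, as a simplified sketch (this is the general "1 minus expected out rate" idea the comment uses, not UZR's actual code; the .25 and .90 rates are the ones quoted):

```python
def out_credit(expected_out_rate: float) -> float:
    """Credit for converting a ball into an out: one out minus the out rate
    the system expects for that kind of ball."""
    return 1.0 - expected_out_rate


credited = out_credit(0.90)  # true line drive mislabeled as a fly ball
deserved = out_credit(0.25)  # what the catch was actually worth
print(f"credited {credited:+.2f} outs, deserved {deserved:+.2f} outs")
```

Every mislabeled catch shortchanges the fielder by about .65 outs, so a scorer who consistently mislabels at one park can move a fielder's rating substantially.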

Indeed, bUZR and sUZR have over a 100-run difference on Andruw Jones, presumably because of the mislabelling of line drives. Colin made a decent choice in selecting the Ichiro example from my blog post, but the Andruw Jones one is what shook the foundation for me.

Using the same system (UZR), based on one data source, he was the best fielder of the seven year time period, and based on another data source he was league average.  If that’s the margin of error in the data, then that’s a tough sell.


#15    Tangotiger      2010/04/13 (Tue) @ 12:31

“If our options are imperfect data or no data, I’ll go with imperfect data, and do the best I can with that.”

Right. And you challenge the conclusions of that data based on its level of uncertainty. If someone says Teixeira is the best fielder at 1B in baseball, you can’t say “nope, you are wrong”. He has a decent chance of being right.

He has no chance of being right if he says one of the Molina brothers is the best hitter in baseball.


#16    KJOK      2010/04/13 (Tue) @ 13:00

I’m guessing a lot of this bias can be explained by the two sources using different definitions for what is a ground ball, fly ball, etc.?

BIS actually has 7 different Ball in Play types:

Grounder
Liner
Fliner (Line Drive)
Fliner (Fly)
Fly
Bunt Grounder
Bunt Fly

Definition of Grounder - a grounder is a ball that hits the ground before reaching an infielder playing at normal depth. If a ball is hit that first hits the ground further than a normal infielder’s depth, it is considered a liner or a fliner.

I’d guess that MLBAM may be considering anything that hits the infield as a grounder, which would cause the ground ball rates to be different.


#17    Colin Wyers      2010/04/13 (Tue) @ 13:13

KJOK, unless I’m way off base with my methodology, very little of the bias should be attributable to that. I didn’t compare the rates directly, but first “normalized” them by subtracting the league average rate for that data provider.
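A sketch of the normalization step described above, with hypothetical LD% numbers for three parks and two providers. Assumption: a plain unweighted mean across parks stands in for the league average; a real analysis would probably weight by batted balls. The point is that a provider-wide definitional difference (such as a stricter grounder rule) shifts every park equally and cancels out, so whatever disagreement remains is park-specific.

```python
from statistics import mean


def normalize(rates_by_park: dict[str, float]) -> dict[str, float]:
    """Express each park's rate relative to the provider's own league average."""
    league_avg = mean(rates_by_park.values())
    return {park: rate - league_avg for park, rate in rates_by_park.items()}


# Hypothetical LD% by park for two providers (illustrative numbers only).
bis = {"park_1": 20.0, "park_2": 18.0, "park_3": 22.0}
mlbam = {"park_1": 17.0, "park_2": 18.0, "park_3": 19.0}

bis_n, mlbam_n = normalize(bis), normalize(mlbam)
park_disagreement = {p: bis_n[p] - mlbam_n[p] for p in bis}
print(park_disagreement)
```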

So is there a reason we should think that there are relatively more borderline grounder/liners at some parks than others, over a period of years?


#18    Jamie      2010/04/13 (Tue) @ 13:17

#17/Colin

Some fields are faster, some are slower. They use different grass seeds and are cut to different lengths. Players will play up or back depending on how the field plays. Maybe different infield “shapes” can confuse stringers at some parks, so that an infielder playing back appears to be playing at normal depth.


#19    joe arthur      2010/04/13 (Tue) @ 13:21

Colin’s presentation of the data, at the aggregate level, shows widest discrepancy in the identification of infield flies, not of line drives. That discrepancy must “explain” the largest share of the discrepancy with outfield flies.

And there is a system issue with retrosheet/mlbam hit-typing which Colin doesn’t mention in this article, which is that sacrifice flies are not actually hit-typed in those data sets. Sacrifice flies invariably appear in retrosheet as flies. BIS however will label a sacrifice fly as a liner or a fliner, which (from memory) probably accounts for roughly 1/8 of the “overage” in line drives between BIS and retro/mlbam.

As always, interesting work from Colin.


#20    Colin Wyers      2010/04/13 (Tue) @ 13:22

“Some fields are faster, some are slower. They use different grass seeds and are cut at different lengths. Players will be up/back depending on how the field plays.”

The point where you differentiate a GB from a LD is exactly the point where the ball first touches the ground. The cut of the grass has no impact on this - by the time the physical characteristics of the ground impact the travel of the batted ball, the LD/GB distinction should have already been determined.

“Maybe different infield ‘shapes’ can confuse stringers at parks, where the appearance of an infielder playing back will look like he’s playing at normal depth.”

And that would be an instance where we’d want to try and identify what those parks are and control for that bias.


#21    Colin Wyers      2010/04/13 (Tue) @ 13:23

“And there is a system issue with retrosheet/mlbam hit-typing which Colin doesn’t mention in this article, which is that sacrifice flies are not actually hit-typed in those data sets. Sacrifice flies invariably appear in retrosheet as flies.”

I don’t believe this is the case - looking at Retrosheet I was able to find sac flies coded as line drives and popups.


#22    KJOK      2010/04/13 (Tue) @ 13:34

Colin - I’m probably misunderstanding, but it looks to me like it’s the provider bias that we’re actually trying to explain? Differing definitions of ground ball vs. fly ball vs. line drive would certainly explain some of the bias?

BIS doesn’t have an infield fly ball type; they have flies or fliners that are caught or retrieved by infielders.

The whole purpose of fliners was to better separate fly balls from line drives. I think BIS has only had fliners since 2008.


#23    JBrew      2010/04/13 (Tue) @ 13:39

Some possible classification differences:

Extreme shift (Ortiz for instance)- liner to short right fielded on the bounce by 2B and thrown out at first.  GB or LD?

Soft line drive (squibber?) off the end of the bat that bounces in front of the infielder but looked more like a LD off the bat.

Anything caught by the pitcher below his waist.  Decent chance it would bounce within the infield.

These are extreme (and small in number) examples, but there are probably others as well. 

--------

From the data Colin showed, the greatest discrepancies are in what percentage of FB are IFFB. I think his third explanation is the best: that they are both biased, in different ways. Isn’t the problem with UZR more one of location than of classification?


#24    Peter Jensen      2010/04/13 (Tue) @ 14:08

Joe - The MLBAM text file describes all sacrifice flies as flies, but the INNING_HIT files differentiate between sacrifice flies that are line outs and those that are fly outs.


#25    Peter Jensen      2010/04/13 (Tue) @ 14:34

I would think that most of the ground ball/line drive differences would be on interpretations of hits between the 3B and SS and between the 2B and 1B where the ball is in the air as it passes the 3B or 1B but hits the ground before it passes the SS or 2B.  Also outs that are deflected by a fielder while in the air but then hit the ground before another fielder makes the play.

Colin is certainly correct that having biases that persist for different home fields is disturbing, and that it would give much more confidence if the source of bias could be found and corrected. But David K.’s post #8 points out that the impact of the bias on any one particular player’s stats is likely to be minimal, which I think is the main point that David C. is correctly making in his post #5.


#26    2010/04/13 (Tue) @ 14:41

“But David K.’s post #8 points out that the impact of the bias on any one particular player’s stats is likely to be minimal, which I think is the main point that David C. is correctly making in his post #5.”

If that’s the assertion, then I don’t agree with it.  At least I don’t see any evidence offered that leads me to believe it, and I do see evidence to the contrary.  See Tango’s reference to the Andruw Jones problem, Colin’s reference to Ichiro Suzuki and Wandy Rodriguez, etc.

If the assertion is that the bias averaged across all players is zero, that’s true by definition, but doesn’t get us anywhere.

If the assertion is that the bias is minimal for MOST players, that MAY be true, but (1) it’s still problematic for those players for which it is not minimal, and (2) that’s not a helpful assertion until we can determine which players are affected to what degree.


#27    Peter Jensen      2010/04/13 (Tue) @ 15:25

Mike - I am not sure what you are talking about. David’s Post #8 showed that most of the tRA differences shown in the two examples that Colin gave were due to factors other than differences in hit ball type classification. Likewise, the Andruw Jones and Ichiro Suzuki analyses by Tango show differences between UZR calculated from BIS and STATS data. Those differences could be due to differences in UZR methodology, which MGL has acknowledged may not be exactly the same. Or, more likely, due to differences in hit ball locations between the two sources. But no one has shown that the differences are the result of bias in hit ball type classifications.


#28    J. Cross      2010/04/13 (Tue) @ 15:45

David Gasko looked at batted ball park factors (I think using BIS data):

http://www.hardballtimes.com/main/article/batted-balls-and-park-effects/

and his park factors for line drives are all between .97 and 1.03.

Brian Cartwright looked at park factors using retrosheet data:

http://www.fangraphs.com/blogs/index.php/what-i-hate-about-line-drives/

and found line drive park factors ranging from 0.80 to 1.23.

These are wildly different ranges, and I’ve puzzled before over how they came up with such different results. Part of it is that Gasko regressed his results to get “true” park factors, whereas Cartwright’s numbers are sample park factors, but I wonder whether that’s the only thing going on here.
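One common way to regress a sample park factor is to shrink it toward neutral (1.0) with a weight of n/(n+k). This is a generic sketch, not necessarily the method Gasko used, and the n and k values below are hypothetical:

```python
def regress_park_factor(sample_pf: float, n: int, k: int) -> float:
    """Shrink a sample park factor toward 1.0 (neutral).

    n: batted balls observed at the park.
    k: regression constant, the amount of "neutral" evidence mixed in.
       The value used below is hypothetical, not taken from either study.
    """
    weight = n / (n + k)
    return 1.0 + weight * (sample_pf - 1.0)


# The two ends of the sample range quoted above, with hypothetical n and k.
for sample_pf in (0.80, 1.23):
    regressed = regress_park_factor(sample_pf, n=4000, k=8000)
    print(f"sample {sample_pf:.2f} -> regressed {regressed:.3f}")
```

How far the range shrinks depends entirely on k, so a sketch like this cannot by itself settle whether regression explains the whole gap between the two studies.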

If retrosheet yields big park factors for LD% but BIS does not, doesn’t that imply that the BIS data is better?


#29    JBrew      2010/04/13 (Tue) @ 15:52

Please correct me if I am wrong, but wouldn’t it be better to split the discussion into pitching and fielding? The former focuses on the classification of the batted ball, while the latter focuses on a combination of classification and location.

From the pitching side, it primarily deals with the percentage of outcome as discussed.  I won’t claim to know the background calculations, but I believe that a difference of a percent wouldn’t change the result much (Dave A. now shows a RMSE of 0.5 over on FG in the comments).  It’s a long way from being as accurate as hitting data, but it is also a long way from where stats were.

From the fielding side, doesn’t UZR also use locations in zones? This is where I could see significant differences between in-person and video scoring. The main difference is that if the same stringer(s) is always at the same park, a home park bias could develop that rotating video stringers would not develop. The other difference is where each stringer locates the point of fielding the ball, where it lands/is picked up/etc. Video doesn’t always provide a good starting position, or it could be difficult to determine the location in the outfield. I could see these two basic differences causing an outfielder’s UZR to vary between BIS and STATS.

It seems like much of this discussion started from the introduction of +/- over on FG and that some of the differences between it and UZR were due to source data.


#30    2010/04/13 (Tue) @ 16:04

“Mike - I am not sure what you are talking about. David’s Post #8 showed that most of the tRA differences shown in the two examples that Colin gave were due to factors other than differences in hit ball type classification.”

He has hardly shown anything yet.  I expanded my thoughts on that in the comments at Fangraphs, but basically 1-2 starts is a meaningless sample size to defend the batted ball classifications.

“Or, more likely, due to differences in hit ball locations between the two sources. But no one has shown that the differences are the result of bias in hit ball type classifications.”

I’m not saying it’s one or the other.  We don’t know, do we?

Colin and Harry have produced some pretty damning indictments of the line drive classifications by park.  Harry found that the stringers in Huntsville had completely stopped recording line drives, for instance.  Granted, that’s the minor leagues, but still, it raises questions.

http://www.hardballtimes.com/main/article/when-is-a-fly-ball-a-line-drive/

http://www.hardballtimes.com/main/article/minor-issues-in-the-air/


#31    2010/04/13 (Tue) @ 16:08

“It seems like much of this discussion started from the introduction of +/- over on FG and that some of the differences between it and UZR were due to source data.”

I thought that Colin’s article followed from the one he did at THT about press box heights and line drive classifications (linked in #30).

The introduction of +/- at Fangraphs kicked off a big discussion, mainly on Twitter, comparing UZR and +/-.  I’ve posted a transcript of that over at THT Live for those who are interested.  We mainly raised issues rather than resolved them (can you resolve anything over Twitter?), but it was a useful intro look at the issue, I think.


#32    David Cameron      2010/04/13 (Tue) @ 16:08

No one uses minor league line drive rate for anything.  It doesn’t raise any questions, because it’s an entirely different product than what we’re discussing.


#33    2010/04/13 (Tue) @ 16:13

“From the pitching side, it primarily deals with the percentage of outcome as discussed. I won’t claim to know the background calculations, but I believe that a difference of a percent wouldn’t change the result much (Dave A. now shows a RMSE of 0.5 over on FG in the comments).”

The RMSE over the whole population is not the issue.  The bigger concern is that some specific parks/pitchers see much larger (and persistent) bias than the overall population.  Colin picked a prime example in Wandy Rodriguez, particularly 2008-2009.


#34    dkappelman      2010/04/13 (Tue) @ 16:14

JBrew, the RMSE of .5 is because retrosheet classifies about double the number of infield fly balls.

Comparing BIS data to Retrosheet data is not an apples to apples comparison.  There is clearly different criteria mainly for infield fly balls and perhaps line drives as well.  Should this be an issue if each data set is consistent?  I don’t know.  I feel like we’ve had this discussion before though.


#35    Colin Wyers      2010/04/13 (Tue) @ 16:14

The article was written and turned in to the editors at BP before Fangraphs announced the introduction of the Plus/Minus stats (which use the same data as UZR anyway). So unless I’ve turned clairvoyant, that wasn’t the intent at all.

Although it does bring up some interesting questions about how different treatment of the same data can lead to such divergent results.


#36    Rally      2010/04/13 (Tue) @ 16:43

David,

See my question in #9.  When BIS presents infield fly % of 10%, is that A) 10% of batted balls? Or B) 10% of flyballs?

Looking at Colin’s chart it seems A is the answer.  But your quote “because retrosheet classifies about double the number of infield fly balls” suggests the answer is B.

I thought it was B, but now I have no idea.


#37    dkappelman      2010/04/13 (Tue) @ 16:46

It’s definitely B.  FanGraphs presents infield fly balls as a percentage of fly balls, not as a percentage of total batted balls. 

BIS will do whatever you want with the data, but that’s just how I’m presenting it and it’s how THT presented it too.


#38    JD Sussman      2010/04/13 (Tue) @ 17:30

Dave/32

“No one uses minor league line drive rate for anything. It doesn’t raise any questions, because it’s an entirely different product than what we’re discussing.”

I agree that it’s a very different animal. But people do use LD% for minor league evaluations: http://projectprospect.com/article/2009/03/16/top-200-prospect-list.

I’m not a Project Prospect supporter (that is a rant for another time), but when I called Adam Foster out for citing minor league LD% all the time, he said something like, “MLBAM says the numbers are reliable,” over Twitter.

It is important that those citing minor league data understand that it is not only unreliable but also unrelated to the major league BIS and Gameday data.

Also, doesn’t statcorner use the minor league LD% in their minor league tRA?


#39    Colin Wyers      2010/04/13 (Tue) @ 17:38

Okay. I’ll change how I’m removing the IFFB from the FB and rerun the analysis later tonight.

Just for reference, BTW, the guidance to MLBAM scorers on popups:

“· A ball hit on a high, short arc that is caught on the fly by an infielder only
· Used only for balls fielded by infielders, NOT by outfielders”


#40    MGL      2010/04/13 (Tue) @ 18:01

I gotta agree with both Daves on #5 and #8 above. Of course I am talking mostly from the perspective of UZR. I’ve worked enough with UZR for the last 15 years or so that I can tell you that when the smoke clears, no matter how good or bad the data is, including park and other biases, as long as it is reasonable, the results will be good and the differences among the results will be relatively small. There will always be a small percentage of players (e.g., Ichiro and A Jones) for whom that is not true, of course, but that goes without saying with any error that is normally distributed (as most errors are). If I introduce a small and relatively insignificant (in the aggregate) error to any system or metric, there will necessarily be a small number of players or samples for whom or which the error makes a large and significant difference, and cherry-picking those samples or players adds little to the discussion…


#41    2010/04/13 (Tue) @ 18:23

MGL/40, did you read Colin’s article?  He’s not talking about randomly distributed error.

You can ignore the comments here or elsewhere about specific players.  Maybe in most or even all cases we don’t know one way or the other.  They’re illustrations but they don’t prove anything.

But Colin demonstrated very persuasively that he has identified systematic error that appears to be park-based.

I’m not convinced by those who want to hand-wave it away. None of those who disagree have been nearly as convincing as Colin’s evidence.


#42    2010/04/13 (Tue) @ 19:43

Basically you have 2 working models for evaluating defense: UZR and plus minus. The uncertainty is unknown (systematic error, random error, etc). They correlate fairly well in the aggregate, but for some players there are large differences. This comes as a surprise to most folks who did not have access to plus minus, since it was not available for free like UZR.

In this case, both models should be averaged for WAR purposes. That’s what everyone else does with more than one model. It is done in hurricane forecasting, where half a dozen or more models are averaged (with more than 2 models, forecasters can throw out outliers for a given forecast, but you cannot do that with only 2 models).
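A sketch of the combining rule described above. Assumption: "throwing out outliers" is modeled here as dropping the single highest and lowest estimate, which is only one simple way to do it and not necessarily what forecasters use. The runs-saved numbers are hypothetical.

```python
from statistics import mean


def ensemble(estimates: list[float]) -> float:
    """Combine model estimates: with three or more models, drop the single
    highest and lowest before averaging; with fewer, take a plain average."""
    if len(estimates) < 3:
        return mean(estimates)
    trimmed = sorted(estimates)[1:-1]
    return mean(trimmed)


# Hypothetical runs-saved estimates for one fielder.
print(ensemble([12.0, -3.0]))                   # two models: plain average, 4.5
print(ensemble([12.0, -3.0, 5.0, 6.0, 30.0]))   # five models: -3.0 and 30.0 dropped
```

With only two models there is nothing to trim, which is the limitation the comment points out.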

Now there is a danger that an attempt will be made to get the 2 models to converge, now that it is easy for the public to see how they differ. The problem with this is that, unlike hurricane forecasts, you cannot tell which one was actually closest, or most accurate. Defensive metrics cannot be validated as well. Convergence in this case may actually make things worse, unless the changes are transparent and based on improving the stat, and not just on getting it closer to the other stat to achieve a consensus.

The new generation of defensive metrics will include actual (not estimated) player positioning, and speed of the batted ball (hang time), etc. Then a great leap forward may be expected. Until then, averaging the 2 models is the best that can be done.


#43    KJOK      2010/04/13 (Tue) @ 20:25

IIRC, MLBAM minor league stringers were hired by the teams, not by MLBAM.  I think most were probably interns.  Perhaps this is no longer the case, but I hadn’t heard it had changed.

BIS minor league data is probably not all that great either, considering the stringers are just sitting somewhere in the stands, but as a BIS minor league stringer I do check against what MILB.COM has, and it’s sometimes not even close to reality, with balls hit to Left Field marked as Right Field, fly ball singles listed as ground ball singles, etc. at least for the team I’m looking at.


#44    2010/04/13 (Tue) @ 20:46

"MGL/40, did you read Colin’s article?  He’s not talking about randomly distributed error.”

Yes, which is one reason why I said this:

“...including park and other biases.”

My comments still stand.  There is enough uncertainty in the results of any of these defensive metrics that they can easily withstand a little park (and other) biases.

I mean, it is almost impossible to do any more than rough, basic park adjustments, and I am not even sure that Dewan does any park adjustments at all; at least he didn’t use to. That is a major park bias in and of itself, at least for some parks and some positions, and somehow we were able to live with those!

I love the article, but I don’t think it is ANYTHING to get all bent out of shape with.  There are half a dozen or more ways to make defensive metrics better. One of them is with better and more objective data.  Bottom line is that with all of those improvements, we’ll take UZR and other defensive metrics from being 80% of the way there to 90%.

It is no surprise that plus/minus and UZR numbers can in some cases differ by a lot. My last statement about errors is important. The magnitude of errors, biased or otherwise, tends to be like a normal curve. So if there are small differences, overall, between Dewan’s output and mine, which there are, that means, by definition, that we will find a decent number of players who differ by a little, a smaller percentage of players who differ by a lot, a much smaller percentage of players who differ by a whole lot, etc, etc.
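The normal-curve argument above can be simulated. For a normal curve, about 32% of values fall more than 1 standard deviation from the mean, about 5% more than 2, and about 0.3% more than 3. The sketch below assumes a hypothetical 500 players whose system-to-system differences are normal with mean 0 and a standard deviation of 4 runs; both numbers are made up. Note that it only models the random-error case MGL describes, not the persistent park-based error Colin reports.

```python
import random
from statistics import pstdev

random.seed(2010)  # fixed seed so the run is repeatable

# Hypothetical: 500 players, each with a difference between two systems
# drawn from a normal curve with mean 0 and standard deviation 4 runs.
sd = 4.0
diffs = [random.gauss(0.0, sd) for _ in range(500)]

print(f"typical (sd) difference: {pstdev(diffs):.1f} runs")
for multiple in (1, 2, 3):
    share = sum(abs(d) > multiple * sd for d in diffs) / len(diffs)
    print(f"share differing by more than {multiple * sd:.0f} runs: {share:.1%}")
```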

By the way, there are literally an infinite number of ways to process the BIS data to come up with defensive ratings, and most of them are not very clean or precise (e.g., adjustments for baserunners, outs, game situations, parks, weather, batters, pitchers, etc.). It should NOT be surprising that ANY two systems using the same data will have different results. I can (and do) sit here for 3 hours and make umpteen changes to the UZR methodology. I’m sure John can and does too. Any one of those changes will change things overall very little, but will change things a lot for a player or two, for the reasons I explained above (twice).


#45    Colin Wyers      (see all posts) 2010/04/13 (Tue) @ 20:50

Fixing the treatment of IFFB in the BIS data increases the year-to-year correlation of the error rates:

FB - 0.499
IFFB - 0.323
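For readers unfamiliar with the measure: a year-to-year correlation pairs each unit’s error rate in one season with its rate in the next and computes Pearson’s r. The sketch below assumes the unit is something like a park (the comment does not say), and the park codes and rates are hypothetical, not Colin’s data.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


def year_to_year_r(rates: Dict[int, Dict[str, float]], year1: int, year2: int) -> float:
    """Correlate error rates for the units (e.g. parks) present in both seasons."""
    common = sorted(set(rates[year1]) & set(rates[year2]))
    return pearson_r([rates[year1][u] for u in common], [rates[year2][u] for u in common])


if __name__ == "__main__":
    # Hypothetical FB error rates by made-up park code.
    rates = {
        2008: {"AAA": 0.10, "BBB": 0.14, "CCC": 0.08, "DDD": 0.12},
        2009: {"AAA": 0.11, "BBB": 0.13, "CCC": 0.09, "DDD": 0.10},
    }
    print(f"r = {year_to_year_r(rates, 2008, 2009):.3f}")
```

A higher r means a unit’s error rate is more persistent from season to season, i.e. more of it looks like a stable bias rather than noise.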


#46    joe arthur      (see all posts) 2010/04/13 (Tue) @ 21:46

Peter/24 - thanks for pointing that out, I had not noticed that.

My (partial) apologies to Colin/21; I overstated the case in #19. Nonetheless, the spirit of my comment is about half right: Retrosheet data from 2003-2006 rarely identifies sacrifice flies as anything other than fly balls.

In 2007-2009, Retrosheet identifies sacrifice flies as line drives at frequencies comparable to BIS:

Retrosheet   LD    Pop   GB   Fly    Total
2003-2006    24    19    3    5364   5410
2007-2009    401   23    0    3748   4172

BIS          LD    IFF   Fly    Total
2003-2006    593   33    4784   5410
2007-2009    435   23    3714   4172
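The counts above are easier to compare as shares. This short script uses only the numbers posted in the comment and computes the fraction of sacrifice flies each source labels a line drive.

```python
# Sacrifice-fly classifications, copied from the counts posted in the comment.
SAC_FLY_COUNTS = {
    ("Retrosheet", "2003-2006"): {"LD": 24, "Pop": 19, "GB": 3, "Fly": 5364},
    ("Retrosheet", "2007-2009"): {"LD": 401, "Pop": 23, "GB": 0, "Fly": 3748},
    ("BIS", "2003-2006"): {"LD": 593, "IFF": 33, "Fly": 4784},
    ("BIS", "2007-2009"): {"LD": 435, "IFF": 23, "Fly": 3714},
}


def ld_share(counts: dict) -> float:
    """Fraction of all sacrifice flies in this row that were called line drives."""
    return counts["LD"] / sum(counts.values())


if __name__ == "__main__":
    for (source, years), counts in SAC_FLY_COUNTS.items():
        total = sum(counts.values())
        print(f"{source:<10} {years}: {ld_share(counts):5.1%} of {total} sac flies called LD")
```

This gives about 0.4% and 9.6% for Retrosheet and about 11.0% and 10.4% for BIS, which is the jump in Retrosheet’s line-drive tagging described above.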


#47    fyi      (see all posts) 2010/04/14 (Wed) @ 02:26

I believe STATS is timing all batted balls now.


#48    Clemente      (see all posts) 2010/04/14 (Wed) @ 19:55

Re: MGL/40

Wait, wait, wait---

I have noticed over the years tentative complaints about UZR on Ichiro, and Tango and MGL have quickly shot them down with ‘you don’t understand the process, it is right’.

Now, it’s ‘well, maybe in some cases it isn’t very right (Ichiro) or right at all (A. Jones), but overall it’s somewhat maybe partly OK, at least in aggregate, and oh, yeah, the data might be pretty bad, but it’s bad for everyone.’ What? This is all becoming more voodoo than science.


#49    Tangotiger      (see all posts) 2010/04/14 (Wed) @ 20:05

Clemente: don’t put words in my mouth.


#50    Steven      (see all posts) 2010/04/14 (Wed) @ 21:56

This is just a tad more meaningful than discussions about why certain run estimators are better than others, or the battle of the projection systems, etc.

In other words, there is no fatal flaw here and no really big reason to alter our interpretation of either UZR or +/-.

Call it hand-waving if you will, but it’s not a particularly groundbreaking insight, nor is it one that will dramatically change our view of how reliable defensive metrics are or the conclusions about player value that are derived from such metrics.

But those who hate defensive metrics because they shine light on why three-outcome players who can’t play defense aren’t as valuable as the back of their baseball card suggests will take this and run with it, I’m sure…


#51    MGL      (see all posts) 2010/04/15 (Thu) @ 01:28

“But those who hate defensive metrics because they shine light on why three-outcome players who can’t play defense aren’t as valuable as the back of their baseball card suggests will take this and run with it, I’m sure…”

Exactly. And somehow it causes trolls to crawl out from under the rocks (as opposed to basements) where they live…


#52    eno      (see all posts) 2010/04/15 (Thu) @ 11:21

Won’t HITf/x ‘solve’ this problem? Once we can define an LD/GB/FB by x and y angle and velocity off the bat, won’t that eliminate much of this error?

Seems to me that we have a ‘pretty good’ framework in place, and we are on the cusp of adding technology that will greatly improve the reliability of said framework. Not all doom and gloom to me.
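A minimal sketch of the kind of objective rule eno describes, assuming a hypothetical record with a vertical launch angle, a horizontal spray angle, and an exit speed. The angle cutoffs (10°, 25°, 50°) and the ±15° field boundaries are illustrative assumptions, not an official HITf/x or BIS definition; and, as the next comment points out, angle and speed alone say nothing about spin.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    # Hypothetical fields; not the actual HITf/x data layout.
    vertical_angle_deg: float    # launch angle above horizontal
    horizontal_angle_deg: float  # spray angle: 0 = straight to center, negative = toward left field
    exit_speed_mph: float        # speed off the bat (carried along; the rules below ignore it)


# Assumed cutoffs, chosen for illustration only.
GB_MAX_DEG, LD_MAX_DEG, FB_MAX_DEG = 10.0, 25.0, 50.0
SIDE_FIELD_BOUNDARY_DEG = 15.0


def batted_ball_type(ball: BattedBall) -> str:
    """Classify by vertical launch angle alone: GB, LD, FB, or PU (pop-up)."""
    a = ball.vertical_angle_deg
    if a < GB_MAX_DEG:
        return "GB"
    if a < LD_MAX_DEG:
        return "LD"
    if a < FB_MAX_DEG:
        return "FB"
    return "PU"


def field(ball: BattedBall) -> str:
    """Assign LF / CF / RF from the spray angle."""
    h = ball.horizontal_angle_deg
    if h < -SIDE_FIELD_BOUNDARY_DEG:
        return "LF"
    if h > SIDE_FIELD_BOUNDARY_DEG:
        return "RF"
    return "CF"


if __name__ == "__main__":
    ball = BattedBall(vertical_angle_deg=18.0, horizontal_angle_deg=-30.0, exit_speed_mph=95.0)
    print(batted_ball_type(ball), field(ball))
```

The point of a rule like this is not that the cutoffs are right, but that the same ball gets the same label in every park, whoever is watching.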


#53    Dan Turkenkopf      (see all posts) 2010/04/15 (Thu) @ 11:44

HITf/x is missing spin off the bat, which drastically affects the flight of the ball.

Colin’s right in that the simple answer to all of this is a stopwatch. Report the time in the air and let the community figure out what that means.

I’m pretty sure I’ve seen Tango and MGL support that too.


#54    Peter Jensen      (see all posts) 2010/04/15 (Thu) @ 12:26

Dan - Having the hang time doesn’t help much if you don’t have accurate batted-ball locations. Lack of spin information will keep HITf/x from giving us accurate hit locations for outfield air balls, but the speed-off-the-bat info will help improve fielding metrics for outfielders. And for infield ground balls, spin is not a large factor, so fielding metrics for infielders can be markedly improved with the information from HITf/x, if we ever get that information.


#55    Rally      (see all posts) 2010/04/15 (Thu) @ 13:19

I thought that Robert Dudek showed that just having hang time and using nothing else gives you a better idea of a fly ball’s difficulty than knowing the hit location.

Seems to me like hang time + imperfect hit location would give you a pretty good basis for season defensive ratings.
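A sketch of the hang-time-only approach Rally attributes to Dudek, assuming each outfield air ball is recorded as (hang time in seconds, caught or not). It estimates a league catch rate per hang-time bucket, then credits a fielder with catches above that expectation. The 0.5-second bucket width and the sample data are assumptions, and this is not Dudek’s actual method.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Play = Tuple[float, bool]  # (hang time in seconds, caught?)


def bucket(hang_time: float, width: float = 0.5) -> float:
    """Lower edge of the hang-time bucket containing this play."""
    return math.floor(hang_time / width) * width


def catch_rate_by_bucket(plays: Iterable[Play], width: float = 0.5) -> Dict[float, float]:
    """League-wide share of balls caught in each hang-time bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [caught, chances]
    for hang, caught in plays:
        c = counts[bucket(hang, width)]
        c[0] += int(caught)
        c[1] += 1
    return {b: caught / chances for b, (caught, chances) in sorted(counts.items())}


def catches_above_expected(
    fielder_plays: Iterable[Play], rates: Dict[float, float], width: float = 0.5
) -> float:
    """Sum of (actual - expected) catches; plays in buckets with no league baseline are skipped."""
    total = 0.0
    for hang, caught in fielder_plays:
        b = bucket(hang, width)
        if b in rates:
            total += int(caught) - rates[b]
    return total


if __name__ == "__main__":
    league = [(2.0, False), (2.2, False), (4.1, True), (4.3, True), (4.4, False)]
    rates = catch_rate_by_bucket(league)
    print(rates)
    print(catches_above_expected([(4.2, True), (2.1, False)], rates))
```

Adding an imperfect hit location, as suggested above, would amount to bucketing on (hang time, rough zone) instead of hang time alone.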

Is HITf/x something that is currently being done but not available to the public, or is it still a work in progress? I thought I saw that BIS was recording hang time. To have the good data, you just have to be willing and able to pay the price or work for a team.


#56    Peter Jensen      (see all posts) 2010/04/15 (Thu) @ 13:58

Rally - I didn’t see the Robert Dudek study to which you referred.  Having hang time certainly will help some with our defensive metrics.  I think we do pretty well with the imperfect hit locations we have now.

My understanding is that HITf/x was done on all plays last year and is currently being done on this year’s plays. When I last checked in February, the backlog of plays from 2007 and 2008 had not been done. Sportvision claimed that it was still planning to release previous (but not current) data to the public at some point. But the departure of Marv White as CTO and the ongoing negotiations with the teams made their plans uncertain. They now have a new CTO, so perhaps progress will be made soon.



#58    Greg Rybarczyk      (see all posts) 2010/04/15 (Thu) @ 14:13

Right now, defensive analysis is like watching a nice, big widescreen HDTV through three pairs of blurry glasses.

The glasses are:

1.  Lack of hang time data.
2.  Lack of accurate landing point data.
3.  Lack of accurate fielder start positions.

As we (hopefully) strip each of these blurry pairs of glasses away over the next few years, our ability to analyze defense, particularly outfield range, will become progressively better. But taking off just one, or even two, pairs will leave us short of our goal and will only sharpen our desire to get to the final, clear picture.
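One way to see why all three pairs of glasses matter: with hang time, landing point, and fielder start position together, the minimum average speed a fielder needed to reach the ball falls out directly, and with any one of them missing it cannot be computed. The coordinate convention (feet, home plate at the origin), the `AirBall` record, and the sample play are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x, y) in feet, home plate at (0, 0); assumed convention


@dataclass
class AirBall:
    hang_time_s: Optional[float]    # glasses #1
    landing_point: Optional[Point]  # glasses #2
    fielder_start: Optional[Point]  # glasses #3


def required_speed_fps(ball: AirBall) -> Optional[float]:
    """Straight-line distance from the fielder's start to the landing point,
    divided by hang time. Returns None if any input is missing. Ignores reaction
    time and the route actually run, so it is a lower bound on the speed needed."""
    if ball.hang_time_s is None or ball.landing_point is None or ball.fielder_start is None:
        return None
    if ball.hang_time_s <= 0:
        raise ValueError("hang time must be positive")
    distance_ft = math.dist(ball.fielder_start, ball.landing_point)
    return distance_ft / ball.hang_time_s


if __name__ == "__main__":
    play = AirBall(hang_time_s=5.0, landing_point=(30.0, 340.0), fielder_start=(0.0, 300.0))
    print(required_speed_fps(play))  # 50 ft in 5 s
    print(required_speed_fps(AirBall(None, (30.0, 340.0), (0.0, 300.0))))
```

Removing only one pair of glasses still leaves the function returning None, which is the point being made above.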


#59    Terry      (see all posts) 2010/04/15 (Thu) @ 14:44

Maybe a better analogy is that current defensive analysis is like a big widescreen HDTV compared to previous analysis, but the cable signal isn’t in high def yet (or it’s a 1080 screen being fed 720)…

In other words, we’ve come a long way in design and see the big picture in a useful, improved way that was impossible before. Adding an HD signal isn’t likely to change the big picture significantly but it will likely help clarify some granular issues.

