THE BOOK--Playing The Percentages In Baseball


Friday, March 13, 2009

Sample v True Talent

By Tangotiger, 05:40 PM

Boy, we’ve been giving Dewan a big platform these last two weeks!  Here’s another. The writer, Geoff Baker (who used to write for the Montreal Gazette), asked Dewan: since Dewan believes the Mariners upgraded their fielding by +2 wins, then…

Q: Could you qualify that? Does that mean that if they won 62 games last year, they’re going to win 64 this year? Dewan: “Yes. Yes.”

No.  No. 

No!

If you “show” you win 62 games in one season, this does not mean that you are a 62-win team.  That is not the baseline you are working from.  The actual baseline you are working from is whatever your talent base happens to be.  And what that talent base did in their previous 162 games, as captured by the win-loss record, is not the true baseline.  What that talent base did in their previous 162 games, as captured by OBP, SLG, PA, UZR, ERA, FIP, and a lot of other component metrics, is a lot closer to the true baseline.  And even then, it’s still not the true baseline.
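To put rough numbers on this, here is a standard regression-to-the-mean sketch. The true-talent spread (an SD of about 0.06 in win percentage) is an assumed figure for illustration, not something from this post:

```python
# How much of a 162-game record is signal? Shrink the observed
# win% toward .500 by the ratio of talent variance to total variance.
var_talent = 0.06 ** 2             # assumed spread of true team win%
var_luck = 0.25 / 162              # binomial noise over a 162-game sample
weight = var_talent / (var_talent + var_luck)

observed = 62 / 162                # a 62-win season
true_estimate = 0.500 + weight * (observed - 0.500)
print(round(true_estimate * 162, 1))  # about 67.7 wins, not 62
```

Under these assumed numbers, roughly 30% of the distance between a team’s record and .500 is luck, which is exactly why a 62-win record is not a 62-win baseline.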

Performance metrics are nothing more than samples.  Just as when you run an experiment, what you record is only a sample of what the thing actually is.  162 games may sound like a lot of trials, but it is not.  And the reason that it is not is because all the teams are so close in talent anyway.  If you have 30 coins, and each of them is weighted so that it will land heads a specific percentage of the time, from a low of 45% to a high of 55%, do you really think that if you pick out one coin, and it lands on heads 40% of the time after 160 trials, that this means the coin actually lands heads 40% of the time?  Given that I’ve established that the minimum is 45% (that’s your prior), it becomes a simple Bayes problem to figure out the best estimate as to which of those 30 coins you’ve been flipping.  There’s even a chance that you’ve been flipping the 55% coin!
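You can work the coin example out directly. Evenly spaced coin weights and a uniform prior over the 30 coins are assumptions for illustration (all the example specifies is the 45%-55% range):

```python
import math

# 30 coins whose true heads rates run from 45% to 55%
# (even spacing is an assumption for illustration).
rates = [0.45 + 0.10 * i / 29 for i in range(30)]
heads, flips = 64, 160              # observed: 40% heads in 160 flips

# Uniform prior over the coins; binomial likelihood for each.
def log_like(p):
    return heads * math.log(p) + (flips - heads) * math.log(1 - p)

m = max(log_like(p) for p in rates)
weights = [math.exp(log_like(p) - m) for p in rates]
total = sum(weights)
posterior = [w / total for w in weights]

# Best estimate of the coin's true rate: nowhere near the observed
# 40% -- it gets pulled back inside the 45-55% band.
estimate = sum(p * w for p, w in zip(rates, posterior))
print(round(estimate, 3))           # roughly 0.465
```

The posterior mean lands near the low end of the band, but well above the 40% the sample "showed", which is the whole point.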

So, no, no way, does it mean that if the Mariners stood pat that they’d win the same number of games, and no way does it mean that if the Mariners add 2 wins in talent that they’d win 2 more games.  That’s not how it works.

I don’t know if that quote captures the discussion between Geoff and John, and it’s not important for my purposes.  What is important is that the answer quoted to the question quoted is 100% wrong.

(Hat tip: USSM)

 


#1    (email hidden)      (see all posts) 2009/03/14 (Sat) @ 02:12

Tom,

A little off topic, but Baker in his blog suggests that one gets diminishing returns as one improves fielding defense. As he puts it “logic would seem to dictate that there’s probably a lot less room for tangible impact on the Mariners as a whole by upgrading this particular area [fielding defense].”

Is there any evidence of this?


#2    Tangotiger      (see all posts) 2009/03/14 (Sat) @ 02:28

If the team’s fielding was +250 runs, then he’s right.  If they replaced Beltre with Rolen, or Ichiro with Victorino, then he’s right.

But, that’s not what the M’s did.  In practical purposes, every team has at least one spot where they could easily upgrade on their fielding.


#3    Mark R      (see all posts) 2009/03/14 (Sat) @ 02:37

It makes sense that good defenders playing next to (and making up for the range deficiencies of) bad defenders would have more impact than those playing with other good defenders. Dave Cameron used the Howard/Utley example earlier today to point out some flaws in Dewan’s assumptions, and I can’t think of a better one. Howard can’t go to the right, so Utley shifts drastically to his left with lefty batters and picks off plays from Howard’s zone.

In contrast, Endy Chavez and Ichiro are going to cover a hell of a lot of ground in Safeco’s outfield next year. And some of that ground would be covered by Franklin Gutierrez alongside lesser corner players. I don’t know what kind of overlap there is in practice with three rangy outfielders, but I’d have to think it’s substantial.


#4    (email hidden)      (see all posts) 2009/03/14 (Sat) @ 02:41

I’ll cut him some slack on that quote.  Regardless of Dewan’s understanding of your post, I think the question posed to him was basically, “You can say two wins, but do you mean it?  Are we really going to see them win two more games, or is this just hypothetical statistics mumbo-jumbo?”

And from that perspective… answering to the average baseball fan newspaper reader… I think his answer is fine.  Incomplete as far as we’re concerned, but it gets his point across and it answers the *spirit* of the question the reporter was asking.


#5    Mark R      (see all posts) 2009/03/14 (Sat) @ 02:41

Tom,

You beat me to it and made the point more clearly than I did.

Replacing Ryan Howard with Albert Pujols would probably make Utley look worse but would still be a very nice upgrade to the Phillies’ defense.


#6    BenJ      (see all posts) 2009/03/14 (Sat) @ 03:37

Agreed with Mike/#4. 

Consider the context.  His specialty is defense.  He’s talking to a writer.  He’s not there to explain the difference between Seattle’s 2008 record and their “true baseline”.

If Seattle wins 79 games in 2009, as USSM suggests is their baseline, it’s NOT a 17-win improvement due solely to the changes made this offseason.  Dewan is giving two of the 17 to the defensive improvements.

Anything else (like the writer saying the 09 M’s are a 64-win team) is the writer’s implication.  Dewan’s not there to qualify every statement- he knows it will just confuse the writer and his readers.  Take it one step at a time.

Also, it looks like USSM also overlooked the Enhanced Plus/Minus part that goes into Defensive Runs Saved.  Two sentences after the Runs Saved factor chart in the book:  “For the corner infielders and the outfielders, we use enhanced plus/minus.”  Maybe the explanation was lacking, but the numbers aren’t crazy.


#7    philosofool      (see all posts) 2009/03/14 (Sat) @ 03:48

“I don’t know what kind of overlap there is in practice with three rangy outfielders, but I’d have to think it’s substantial.”

It’s obviously an empirical question, but I would speculate that three rangy outfielders can avoid stepping on each other’s toes by positioning themselves well. The problem with three rangy outfielders (“problem”) is that they might start to overlap, and you don’t prevent more runs by having two guys who can catch a ball. It seems to me like you solve this by having your LF and RF play farther away from center, thereby minimizing the amount of overlap. But whether this is possible in practice, I don’t know.

Also, it’s not clear yet what makes a good OF. (I should take that back: I don’t know what makes a good CF.) That THT article on defensive shifts the other day was really cool. It could be that a lot more of OF defense has to do with positioning than I thought. I wonder if Victorino looks awesome because he played next to Burrell. (Not that anything will change with that next year, except the name on the jersey next to him.) Anyway, I love the new analysis of defense. It just keeps getting more interesting. I should probably order the fielding bible, huh?


#8    Tangotiger      (see all posts) 2009/03/14 (Sat) @ 03:52

I did say this:

I don’t know if that quote captures the discussion between Geoff and John, and it’s not important for my purposes.  What is important is that the answer quoted to the question quoted is 100% wrong.

So, I am not disagreeing with Mike/4.  I’m basically not even scolding either one, but simply taking the statement at its face, to make my point, without ever trying to diminish or implicate Geoff or John.  As I said, I don’t care what Geoff or John meant, because my point is not to parse their words.  My point is simply to make my point about the statement as written.


#9    MGL      (see all posts) 2009/03/14 (Sat) @ 07:06

I have not read the article and quotes in question (when I do, I’ll stick my 2 cents in), but as far as this:

It makes sense that good defenders playing next to (and making up for the range deficiencies…

First of all, it might make sense to you (it does not necessarily to me, as an analyst who has looked at these kinds of things), but I don’t know of any evidence that suggests that this is true, and I doubt that it makes very much difference who plays next to whom as far as the value of a fielder’s defense to his team.  I don’t doubt that one can position himself in such a way as to optimize his own and his teammates’ defensive value, but I think that the overall value of that is minimal.  I think that a +10 or -10 defensive player (assuming those are their true talent values, which we can never know of course) is going to add +10 or -10 (theoretical) runs, regardless of who they are playing next to, give or take a run or two at the most.

Reminds me of the “it makes perfect sense that a good outfielder has more value in a large outfield and that a poor fielder’s value is mitigated in a small outfield” argument.  Unfortunately the evidence, for whatever reasons, suggests the opposite is true.

So let’s be careful about the “at face value such-and-such argument seems to make perfect sense” argument, especially when things are not completely self-evident.  Although I suppose there are many of these things that people do in fact think ARE self-evident that I am trying to say are not…


#10    MGL      (see all posts) 2009/03/14 (Sat) @ 10:34

I am reading the article in question and I will critique it and whatever Dewan said. Keep in mind, as someone who has done a dozen or more interviews with journalists and bloggers, what they said you said is NOT always accurate and often taken out of context.  When I do a long interview with a journalist as I recently did with someone from ESPN and someone else from SI, I like to say this, half-jokingly (only HALF):

They will do a 30 minute or 1 hour interview with me, we’ll have a great discussion, the journalist will thank me for all the wonderful information and insight, and then…

I’ll read his piece and there will be one sentence from me, and even that, they’ll get wrong!

Anyway, I’ll pick on the writer, Baker, first:

It should come as no surprise that the World Series champion Philadelphia Phillies led the pack in 2008, finishing with a +78 score in defensive runs. In other words, the Phils “saved’’ themselves 78 runs through their defense. The AL champion Tampa Bay Rays finished 9th out of the 30 teams with +26—a vast improvement by that club over previous defensively-atrocious years.

I don’t know why, “It should come as no surprise…”  While a WS winner (or any team with an above-average record) would tend to be above average in all facets of the game, the correlation between defense and winning the WS has got to be pretty small.

Plus, if it comes as no surprise that the WS winner led the pack in defense (which is not necessarily the case, BTW, if you look at other metrics - and I would probably just use team DE (outs per BIP) if I am looking at whole teams), then it surely should come as a pretty big surprise that the team that lost the WS (presumably the second best team in baseball) was only 9th in defense.

I am being picky, but as usual, I am a stickler for words being true and accurate, making sense, and standing behind one’s words, no matter who the audience is.  Writing for the mainstream should never be an excuse for making inaccurate or misleading statements. I mean that in general - I am not saying that the above quote is inaccurate - just a silly choice of words, IMO.

Dewan says this in the article:

“Beltre and Suzuki’s limited time in right field counted for four extra wins.”

That is not true!  Just because someone was measured by some less-than-perfect metric, as “saving 40 runs,” does not make it so!  Measurement error, measurement error, measurement error!

Again, I understand Dewan saying that - I sometimes say the same thing.  But I do want to point out how and why that is not so.

He (Baker) has a nice description of Dewan’s metrics.  Not perfect, but for a non-sabermetrician (and I am not sure I could do any better) on a mainstream site, even though it is a blog, it is pretty darn good.

From the writer (Baker):

If defense wasn’t the team’s biggest liability in 2008, then logic would seem to dictate that there’s probably a lot less room for tangible impact on the Mariners as a whole by upgrading this particular area.

As Tango said, no, no and no.  Logic would seem to dictate?  What logic is that?  You hear and read that all the time, and it makes no sense from any perspective.  Of course, it is probably easier to upgrade at a position or on offense/fielding/pitching if you are weak in that area, but other than that, there really is not any “point of diminishing returns”, within reason.  If you are +15 on defense and you add 10 runs, it is the same as if you are -15 on defense and add 10 runs (again, obviously, it is usually easier to add 10 runs to a bad defense than a good one).

So, stats like Dewan’s, if accurate, do bring up some questions about the current team’s priorities. I think it’s more than fair to discuss whether the best way to attack and “fix’’ the 101-loss squad from last year was to make the biggest perceived improvements in the one area of the game the Mariners were actually pretty good at.

A big mistake by teams (and fans, writers, etc.) is to pigeonhole a “need” based on a real or a perceived weakness.  If you can upgrade in any area, regardless of where you are at in that area, you should.  Period.  If you focus on one area (one you are weak in) and neglect another (one you are strong in), you might miss out on opportunities to improve your team, if that is what you are trying to do.

There’s nothing wrong with trying to make a team better off defensively. I’m just wondering whether it’s really enough to avoid another last-place finish.

That is probably true. There is usually only so much you can do to improve in any one area.  If you want to be a contending team, you probably have to make sure that you don’t have any glaring weakness, but…

The bottom line is that the only thing that counts is your overall combined numbers - offense, fielding, baserunning, and pitching - and the combination does not matter (unless you are talking about the post-season where it matters who your number 1 and 2 starters are, etc.).

One other thing to note.  The concept of “needing more power” or something like that is even more silly.  Again, if a team is lacking power and it wants to upgrade its offense, it is likely that it will automatically have to upgrade its power, but by no means is that a given.  Again, regardless of whether you are weak or strong in power, or on-base percentage, or what have you, your goal in improving your offense is simply to improve your offense - and it does not matter how you do that.  And yes, I realize that on a team level there are interactive effects that tend to favor a balanced offense…

And so, I asked Dewan just how much of an improvement the M’s could expect, from a won-lost standpoint, based exclusively on those outfield moves—adding Gutierrez and Chavez and moving Ichiro to right—that he’s so high on.

“About two or three wins,’’ he told me.

So far that is a reasonable answer.  I don’t think Dewan means that, “They will win exactly 2 more games than they won last year.”  He means that their true talent level, by virtue of the upgraded defense, is around 2 wins better than last year.  How you explain that to a lay person, I don’t know.

Q: Could you qualify that? Does that mean that if they won 62 games last year, they’re going to win 64 this year?

Dewan: “Yes. Yes.”

OK, now that is a really bad answer, as Tango says, and is the point of this thread. 

But did Dewan really say that?  Did he qualify that statement and the qualification was omitted by Baker?  Did Baker paraphrase Dewan?  Etc.

Dewan: “Right. All things being equal, by improving their defense in the outfield, it’s going to mean about 20 runs or more when the whole season plays out.”

Again, that is a reasonable thing to say, although you would like Dewan to qualify that a little.  Again, maybe he did and it was not printed.  Who knows?


#11    Dan Brooks      (see all posts) 2009/03/14 (Sat) @ 14:01

Since we’re all talking about measurement error and defensive statistics, I thought I’d bring up my favorite example, since this is MGL’s blog and Dewan is the subject of this post.

-

If you’ve got a Hardball Times Annual, open your 2008 edition. MGL has an article called “Signals and Noise”, in which he estimates, using his defensive system, the value of the Toronto Defense to be +12 runs. I admit that I didn’t pay any attention to this number the first time I read the article.

No problem, right, because who cares about Toronto’s defense?

Well, it turns out that at least one guy does: John Dewan, who has the very next article in this book. In this article, using his defensive system, he estimates the value of the Toronto defense to be +92 (best defense that year, iirc).

No, really. If you turn three pages, you get two different estimates that differ by eighty runs.

—-

I mean, I understand that there is inherently measurement error in any system. But this just strikes me as an awful lot of error, and if we believe John about his discovery, that means that someone’s got a measurement error that’s half of the largest possible disparity in the observed effect.

I understand that all systems will necessarily have some amount of error and that both systems have their value. But this one always struck me as a great example of how difficult this variable really is to measure.


#12    MGL      (see all posts) 2009/03/14 (Sat) @ 14:50

Dan, that is a wonderful example.  If you compare UZR on fangraphs and Dewan, for individual players, you will see lots of large differences.  And both systems are using the exact same data and very similar methodologies!

What is the discrepancy?  Measurement error!  It can’t be error between a player’s true talent and how he performed.  If that were the case, both systems would show around the same numbers.
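A toy simulation makes this concrete: two metrics grading the SAME season share the on-field (sampling) luck, so any disagreement between them must be measurement error. All the run values and noise SDs below are invented for illustration:

```python
import random
random.seed(0)

TALENT = 5.0      # a fielder's true runs saved per season (assumed)
LUCK_SD = 6.0     # one season's sampling luck in runs (assumed)
METRIC_SD = 5.0   # each system's own measurement error in runs (assumed)

diffs = []
for _ in range(20000):
    performance = TALENT + random.gauss(0, LUCK_SD)  # what actually happened
    uzr = performance + random.gauss(0, METRIC_SD)   # system A's reading
    pm = performance + random.gauss(0, METRIC_SD)    # system B's reading
    diffs.append(uzr - pm)

# Spread of the disagreement: sqrt(5^2 + 5^2), about 7 runs, even
# though both systems scored the exact same plays.
sd = (sum(d * d for d in diffs) / len(diffs)) ** 0.5
print(round(sd, 1))
```

The shared luck term cancels out of the difference entirely; what is left is pure measurement error from the two systems.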

So yes, to quote a number, be it Dewan’s, UZR, or any other one from any one of a number of good defensive metrics, and claim that “this is what the player or team actually did in terms of runs saved or cost” is a complete lie!

I’m glad we are finally talking about this, albeit in several threads at the same time.  I must admit that I am as guilty as Dewan or anyone else of implying or even explicitly stating that a number we come up with is what a player or team actually did, even if we admit that that number is not necessarily, and in fact not likely to be, that player or team’s true talent.  We must also admit that that number is subject to measurement error as well, and not just sample error (unlike, say, timing a player in a 100-meter dash with a perfect stopwatch system, where there is very little if any measurement error), and may not in fact reflect what that player actually did by any stretch of the imagination.

Which is one reason, by the way, and I want to shout this to Rob Neyer (and lots of other good folks who may be listening), that we have to be very careful about criticizing things like Gold Glove awards with the implied or explicit argument that the advanced metrics say otherwise and the advanced metrics must be right and the GG voters wrong.  That is not necessarily true.  I have actually said this before, in my own defense, that just because a defensive (or any other) metric says something does not mean that that is how the player or team performed. And that is because of measurement error!  This is a difficult concept to understand but a very important one nonetheless.


#13    Brian Cartwright      (see all posts) 2009/03/14 (Sat) @ 15:04

I’m working out some future articles on how projections work and how accurate they are. Again, these are measurements and estimates of performance. So, I don’t get too worried about how many decimal points of a run are saved. I’ve been thinking along the lines of using t-scores to derive seven groups: three above average, average, and three below. I’ll be happy to get the guy in the right group when analyzing the data. That said, I do think there is enough evidence that Nate McLouth is not one of the elite CF’ers in the NL.


#14    Dan Brooks      (see all posts) 2009/03/15 (Sun) @ 03:30

@MGL/12

Maybe I’m wrong, but I’ve always felt like statistics are good for detecting the differences that are hard to detect with our eyes. I mean, we really don’t need statistics to tell us that Mariano Rivera is a great closer or that Josh Beckett has a great fastball. Or, for example, I hardly watch any basketball, but I know that the current Celtics, when they are all healthy, have a great defense.

We shouldn’t need advanced statistics to separate the best from the pack or the worst from the pack. I feel like those kinds of things should be intuitive and obvious. It helps to have these advanced metrics for separating the “very good” from the “good”, or “the little below average” from the “little above average”.

When I do statistics for my experiments, I almost always intuitively know the answer of the ANOVA or the NLME or the T-Test beforehand. When you start doing these things enough, you have a pretty good intuition for what is a real difference between groups vs. what is fake. Granted, you guys are doing primarily descriptive statistics and not inferential, but still (assuming you’ve designed your experiment well), inferential statistics are really the most useful at the borders of significance, where it’s difficult to tell by the naked eye when something is really having an effect or not.

So, it was shocking to me that one of you could call Toronto only average while the other was calling them the best. As in, you’d think that your systems would agree on the best and the worst and that most of the discrepancy would lie in the middle.


So here is the question: What defensive metric do you really trust? I mean, of course it would be easy for you to say “UZR”, but honestly, would you look at the average of a number of metrics? Would you look at the UZR number combined with scouting information? Would you look at just scouting? Would you look at Tango’s “Fan report”? I mean, which has the best correlation, in your mind, to true talent?


#15    MGL      (see all posts) 2009/03/15 (Sun) @ 11:30

As Tango likes to say, and he is right, the less objective data and the corresponding metrics you have, the more you weight scouting-type data, of which the Fan Scouting Report is an excellent example.  For example, one part UZR (for one year of data) mixed with one part scouting seems about right.  For 2 years of UZR, maybe one part scouting to 2 parts UZR, etc.
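That weighting scheme can be sketched in a few lines; the player numbers below are hypothetical runs above average per season, not real data:

```python
# Each season of UZR counts as one part; the scouting report counts
# as one additional part (the blend MGL describes above).
def blend_defense(uzr_by_year, scouting):
    """Equal-weight average of n seasons of UZR plus one scouting read."""
    parts = list(uzr_by_year) + [scouting]
    return sum(parts) / len(parts)

print(blend_defense([12], 4))       # one year of UZR: (12 + 4) / 2 = 8.0
print(blend_defense([12, 6], 4))    # two years: (12 + 6 + 4) / 3 = 7.33...
```

With more seasons of objective data, the scouting read's share of the estimate shrinks automatically, which is the point.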

As far as which objective metric, I have no problem combining them.  For example, personally, I combine UZR from BIS data with UZR from STATS data (some of the results are quite a bit different even though they use the exact same methodology).  I would have no problem combining UZR with, say, Dewan’s stuff.  There is a lot in Dewan’s stuff that is not included in UZR or in his plus/minus (which is itself a subset of his total defensive ratings), like the DMP; some of those components are already included in the plus/minus and some are not, which makes it REALLY confusing, as I discuss in the Fielding Bible thread.

Don’t, however, get fooled into thinking that defensive ratings and evaluations are much worse than offensive ones.  They aren’t.  That is partially an illusion created by the fact that when a ball is hit, from the offensive side, we call it something, like a hit or an out, even though there is a lot of measurement error in those classifications.  What I mean by that is that when a player gets a bloop hit over the second baseman’s head that is scored a single, that is measurement error from the perspective of measuring true “hit making” talent.

For defense, when a ball is hit, we have no defensive counterpart to the single, double, etc., other than simply one fielder makes an out or an error, and therefore there is the illusion that everything we measure in these advanced fielding metrics has large (larger than on offense) measurement error.  If the official scorer were forced to write down a “result” for every fielder every time a ball was put into play, say “digime” (a made-up word) when the batted ball could not have been caught by that fielder, “solise” when it could have been caught, and “jigore” when it should have been caught, then all of a sudden the perceived large measurement error in recording defense would disappear or be greatly reduced.  If that makes any sense.
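That hypothetical fielding scorer would amount to something like the following sketch. The catch-probability cutoffs are invented for illustration; the made-up words are kept as labels:

```python
# A hypothetical "fielding scorer": assign every fielder a verdict on
# every ball in play. Cutoffs are arbitrary illustrative choices.
def fielding_call(catch_prob):
    """Score one fielder on one ball in play, given an estimated
    probability that a fielder in his position makes the catch."""
    if catch_prob < 0.05:
        return "digime"   # could not have been caught by that fielder
    if catch_prob < 0.90:
        return "solise"   # could have been caught
    return "jigore"       # should have been caught

print(fielding_call(0.01), fielding_call(0.50), fielding_call(0.98))
# -> digime solise jigore
```

With a record like this for every fielder on every play, defensive box scores would carry the same kind of per-event information that hits and outs carry on offense.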


#16    Colin Wyers      (see all posts) 2009/03/15 (Sun) @ 13:25

My question about defensive evaluations is not how much they are wrong, but whether or not there are particular biases in the ways they are wrong.

In other words, is the measurement error in UZR/FB/etc. randomly or systematically distributed? If we want to simply stratify fielders into groups of good/bad, we would even prefer a measurement that had more measurement error if it were less biased.

I know how to measure those sorts of biases on offense; I don’t know how to measure defensive metrics in those ways. But I think it’s the biggest question worth asking.


#17    Brian Cartwright      (see all posts) 2009/03/15 (Sun) @ 13:28

Makes sense to me.

When I was a summer league statistician and started reading abstracts more than 25 years ago, I devised a scoring system where every batted ball on the field of play (excluding over-the-wall homers and wall balls) was assigned to a fielder - the defensive stats had outs, hits (si, do, tr, hr), roe, other errors, etc. The idea was to extend team DER down to an individual level. The individual defenders’ stats would add up to the team’s DER. You can assign zones with varying degrees of difficulty (my system had 9 zones for each outfielder), but every batted ball was accounted for on defense. Even today, I am frustrated that we still have to guess on whether a ground hit to lf should be credited to ss or 3b. Maybe mgl can tell us if BIS or Stats bothers to record this simple fact, but I know it’s not in the free data (yet). Another good use for video. Sigh, I need to go to bed!


#18    Peter Jensen      (see all posts) 2009/03/15 (Sun) @ 14:29

MGL - I am glad that you are bringing the problem of measurement error into the discussion about evaluating both defensive metrics and offensive metrics.  There will always be measurement error in both and we should always be aware of that.

However, I do think that there is a fundamental difference between the amount of measurement error in offensive metrics and defensive metrics.  Even though there are bloop singles and robbed diving catches of smoking liners, most hits are earned by the batter hitting the ball harder than usual.  The batter may even earn some hits by being able to control where he hits the ball on the field, but I think that will have to be confirmed by additional study when Hit f/x becomes available.  But a batter’s earned hits will far outweigh his unearned hits.

The situation is just the opposite for fielding.  The vast majority of fielding plays are routine.  Perhaps 95% or so could either be successfully fielded by the worst major league fielders or wouldn’t be successfully fielded by the best fielders who ever lived.  The plays that distinguish a good fielder from a bad fielder are probably fewer than 5% of all plays that we record in our metrics.  The problem is we don’t know which 5% they are and our observational skills just aren’t acute enough to help us that much.  So we have to record everything and hope that with large numbers the lucky plays will average out.  And mostly they do.  But it also means that every manipulation of the data that we do in the attempt to identify meaningful factors (like adjusting for park differences or pitching differences or dividing into smaller zones) is being done to 95% data that is really meaningless in terms of true talent.  That has the potential for creating large distortions.  This is a problem that is inherent in the fielding data being made up of such a large proportion of meaningless data.


#19    Guy      (see all posts) 2009/03/15 (Sun) @ 19:31

Peter makes a very good point.  The other way of saying it is that our true sample—of balls that may or may not be caught—is very small. 

I would argue, though, that there is a more fundamental difference between offensive and fielding stats.  Both capture a lot of luck, and so are imperfect measures of true talent.  But offensive stats are nearly perfect measures of PERFORMANCE—hits and outs actually determine runs and wins; they are not arbitrary categories.  So if you don’t care about luck vs. skill, the metrics tell you exactly what a hitter did.  (The one exception I can think of is ROE, where we debit the hitter in metrics like OBP and SLG, even though he actually reached base.)

Defensive stats are different.  We know if a BIP became an out.  But we often can’t say for sure how to apportion credit/blame for that outcome.  Defenses are interdependent in a way that hitters are not.


#20    Brian Cartwright      (see all posts) 2009/03/15 (Sun) @ 21:03

I think it’s closer to 8-10% of BIP that make the difference between best and worst, varying somewhat by position. For each position, find the very best pct of outs made, with a large sample (2 to 3 season’s worth), then find the worst. That tells you how many plays everyone makes, and how many no one makes, with maybe 10% left over.
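That estimate can be written out as arithmetic. The out-conversion rates below are hypothetical figures chosen to land in the 8-10% range described, not measured values:

```python
# Compare the best and worst out-conversion rates at a position to
# bracket the plays everyone makes and the plays no one makes.
best, worst = 0.84, 0.75     # assumed out rate on BIP: best / worst fielder
everyone = worst             # plays even the worst fielder converts
no_one = 1.0 - best          # plays even the best fielder misses
in_between = best - worst    # the slice that actually separates fielders
print(f"{in_between:.0%} of balls in play distinguish best from worst")
```

Everything outside that slice is, for purposes of separating fielders, noise that the metrics still have to wade through.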

I agree with everything else Peter says. We can say that a ball was hit so hard, or landed at a certain location, but we don’t yet have the resolution to see that precisely. Any measurement is only as good as its least accurate component.

I don’t think defenses are so interdependent that we can’t use our eyes to know which fielder had the best (even if slight) chance at a ball.


#21    Tangotiger      (see all posts) 2009/03/16 (Mon) @ 00:46

I agree with Peter’s basic sentiment, which is why I’m a champion of classifying balls as “routine outs”, “routine hits”, and everything in between.

The number is at least 50% higher than 5% though.  (I’ll presume Peter picked the number out of thin air to make his point.)

If you put the worst fielders on the field, and then the best fielders (even if you restrict it to players at their own positions, and don’t put out 8 Frank Thomases or 8 Ozzie Smiths), the range will be close to 400 plays.  With 4300 balls in play or so, that makes it 9%.  I think 7-10% is a reasonable estimate.

But yes, remove the other 90% as basically noise that clouds our evaluation, then we’d have a far lower uncertainty in our metrics.

It would be much better to know that of the 700 balls that went into Adam Everett’s basic area, only 70 distinguish the star from the scrub, and that Adam made a play on 60 of those.


#22    Peter Jensen      (see all posts) 2009/03/16 (Mon) @ 01:28

Yes, the 5% was a thin-air guess for illustration purposes only.  I can certainly live with the more accurate 7 to 10% that both you and Brian have estimated.

I don’t think defenses are so interdependent that we can’t use our eyes to know which fielder had the best (even if slight) chance at a ball.

If it were just a matter of deciding which fielder had the best chance at the ball, I would agree with you that a human observer could probably make that decision almost all the time.  But one would also have to decide who SHOULD have been closest to the ball, and that’s not so easy.  Plus, if an infielder fields a ground ball but the throw is just a little too late to get the runner, how does one decide whether another fielder would have made the play if he had charged the ball harder, or had a slightly stronger arm, or a slightly quicker release?  Or was the runner just too fast?  Or did the first baseman not stretch enough?  Or should the pitcher have fielded the hit ball in the first place?  I don’t have much hope for human observers being able to make objective determinations of all these variables and come up with a clear decision of whether a play should have been made or not.


#23    MGL      (see all posts) 2009/03/16 (Mon) @ 03:09

But offensive stats are nearly perfect measures of PERFORMANCE—hits and outs actually determine runs and wins, they are not arbitrary categories.

That is true, but it has no relevance in terms of predicting the future and in terms of estimating true talent.  And I have said repeatedly, I have zero interest in making judgments (like MVP type of “awards”) about past value.

Peter, I agree that there are other fundamental differences between offensive and defensive evaluations that make the offensive ones inherently more accurate and less prone to measurement error.

I will contend, however, that if we had a “fielding scorer” whose job would be to do what we are talking about above, on every batted ball, catch, throw, etc., which is to record something akin to a single, double, out, etc., for each fielder on every play, then we would probably have a better record on defense than we have on offense, and our defensive evaluations would be better than our offensive ones.


#24    Brian Cartwright      (see all posts) 2009/03/16 (Mon) @ 03:40

At the most basic level, what I want to see is
1. a description of the batted ball, for classifying and calculating expected values
2. who the ball was hit to (any ball on the playing field gets assigned)
3. what was the result of each play

