Tuesday, May 15, 2012
Mets fielding storylines
For those who can’t get enough, Mark has you covered.
This blogger didn’t read the fine print!
As MGL pointed out yesterday, the shift plays that Lawrie (or anyone) makes are excluded from the calculations. I’d respond directly to the blogger, but AN is blocked at the office.
Other than the over-shifting, then, yes, UZR is a measure of range and positioning. Indeed, EVERY fielding metric is a measure of range and positioning. Why is that? Because other than the over-shifting, we aren’t being told where the fielders are being positioned. So, while you as the observer can tell where the fielders are roughly positioned, no one is actually recording this information for us to use! You can’t then fault a metric for not using data that it’s not being given. Indeed, if we had the data, WE WOULD USE THE DATA.
Min 3000 IP, runs shown per 150 games:
SS: +15 Everett, rest: Br Ryan, Hardy, Vizquel, McDonald, Counsell, Izturis
CF: +21 Guti, rest: Andruw J, Ca Gomez, Patterson, Taveras, Bourn, Rowand, Chavez
Does this pass the 80/20 rule of Bill James, that 80% of the time it gives no surprises, and 20% of the time, you get a surprise?
If you save 14 runs over 150 games, you’ve had a very good to great fielding season. Dewan’s DRS has Lawrie as having saved 14 runs so far this year.
The way fielding systems work is conceptually simple and obvious: how many plays did the player actually make, and how many plays would a league average player for that position PROBABLY make. That’s it, in a nutshell.
With batting stats, the league average player faces the same kind of conditions in virtually every park (with some slight exceptions like Coors and Petco). He faces the same kind of pitchers, all hittable pitches are thrown in a 2x2 foot box (or equivalent ellipse), and so on. Not identical, but similar enough that there’s not that much difference in the hitting conditions for hitters.
Fielders are not like that at all. A fielder is completely at the mercy of whatever batted ball distribution there happens to be. Indeed, he can see the exact same hitter-pitcher combination for 300 innings (imagine a never-ending batting practice), and he STILL might face a different distribution of batted balls, compared to another player at the same position facing the same hitter-pitcher combination.
So, a fielder can end up looking really good if he: (a) actually gets to more balls than the average fielder did, and/or (b) is made to look like the average fielder PROBABLY would have gotten to fewer balls, which hinges on how well WE KNOW the batted ball distribution he faced.
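To make the "actual plays minus probable plays" idea concrete, here is a minimal sketch. The out probabilities are made-up numbers for illustration, not anything taken from UZR or DRS, and the function name is hypothetical.

```python
def plays_above_average(opportunities):
    """Actual plays made minus the plays an average fielder would PROBABLY make.

    `opportunities` is a list of (made, p_avg) pairs, where `made` is True if
    the fielder turned the ball into an out and `p_avg` is the assumed chance
    that a league-average fielder at that position makes the same play.
    """
    actual = sum(1 for made, _ in opportunities if made)
    expected = sum(p_avg for _, p_avg in opportunities)
    return actual - expected


# Hypothetical batted balls: (did he make the play?, league-average out probability)
sample = [(True, 0.9), (True, 0.4), (False, 0.2), (True, 0.1)]
result = plays_above_average(sample)  # 3 plays made vs 1.6 expected
```

Notice that the result moves just as much when the `p_avg` values are wrong as when the fielder is actually better, which is the whole issue with part (b).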
Now, let’s go back to DRS. It not only shows Lawrie as having saved 14 runs already after only 309 innings this year, but he ALSO saved 14 runs last year playing only 380 innings. That’s 28 runs saved on 689 innings, or a rate of 59 runs saved per 162 games. (UZR is at 21 runs saved per 162 games for his career.)
Now the Blue Jays 3B are #2 in the league in assists. So, we know that Lawrie is making more plays than just about anyone else. The question is whether he’s making those plays because he’s getting harder opportunities (lower baseline makes him look really good), and/or because he’s actually making plays that no one else would make. That is, is the way DRS describes those opportunities somehow biased? Is Lawrie actually getting really easy opportunities, but DRS is making it look like they are much harder than they are?
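For anyone who wants to check the rate, a quick sketch of the arithmetic, assuming 9 innings per game (the function name is just for illustration):

```python
INNINGS_PER_GAME = 9  # assumption used to turn innings into games


def runs_per_162(runs_saved, innings):
    """Scale runs saved over a number of innings to a 162-game rate."""
    games = innings / INNINGS_PER_GAME
    return runs_saved / games * 162


# Lawrie per DRS: 14 runs in 309 innings this year, 14 runs in 380 innings last year
rate = runs_per_162(14 + 14, 309 + 380)  # about 59 runs per 162 games
```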
As I said, UZR doesn’t have this extreme viewpoint, even though they are both looking at the same dataset. Is it something about how plays are tracked in Toronto or with Toronto TV feeds?
This is where I think it would be lovely if MGL/Dewan could give us a description of Lawrie’s career so far, and why MGL thinks that Lawrie’s performance is great, Gold Glove caliber, but Dewan thinks it’s otherworldly. Unless of course they are precluded from discussing this for whatever reason.
Not sure what to call these plays, but I like the idea behind them.
1. John Buck and Chipper Jones lead with the most outs made, without using their gloves.
2. Alexi Casilla / Gonzale lead with the most outs made, without using their hands.
3. Freddy Galvis leads with the most outs made, without using his feet.
A while ago, someone pointed out to me that Baseball Pro has Maddux’ FairRA+ at an obscenely low 105. For comparison’s sake, Pedro’s career is 127, RJ is 122, Clemens is 121, and Schilling is 120. (Higher is better.) At the same time, while Maddux has a “pitching” WARP of +58, his “total” WARP is +84. (If I understand it properly, total WARP includes fielding and hitting.) In comparison, Clemens, who has pitched about as often, has a total WARP of +103, or 19 wins more than Maddux. Fangraphs’ fWAR has Clemens (146) 25 wins ahead of Maddux (121).
So, Clemens v Maddux, in the end, have roughly the same gap in total WARP as they do in fWAR. It’s just that Maddux derives an astounding 26 wins from his fielding (and hitting I guess), since his pitching WARP is only +58 while his total WARP is +84.
If we look at their FIP- at Fangraphs, we see that Clemens is 70 while Maddux is 77 (lower is better). That also happens to mimic their ERA- (Clemens, 70, Maddux 76). So, whether you look at FIP, which ignores all aspects of fielding (and sequencing), or you look at ERA (which includes fielding and sequencing), Clemens and Maddux maintain the same gap. That is, Maddux is Maddux because of his FIP.
So, does Maddux actually have a 26 win gain with his fielding, or, is his fielding really not that valuable (because if he didn’t make the play, maybe his infielders would have made the play anyway)?
Or, am I missing something?
In this corner: Dewan’s metric uses Hang Time data. And Dewan also uses subjective classifications of how good or bad the fielder made the play.
And in that corner: MGL’s metric has a lot of the little adjustments, like pitcher’s GB/FB tendencies, that try to isolate as many biases as possible. UZR only has access to “soft, medium, hard”, and uses that as a proxy for hang time.
Both use BIS data. And notwithstanding the above, both otherwise follow a similar methodology, relying at their core on the batted ball locations as marked by the stringers. The differences are all on the periphery, which may make a big difference for a few of the players.
Ok, I finished reading the book two days ago. There are plenty of good things in the book, and plenty of so-so things in the book. The feeling I got was that the book felt disjointed. It’s a worthy read, because you’ll get a couple of doubles, maybe a triple. Chris Dial’s article is a good read especially for those new to fielding analysis. The methodology section and other articles are a good attempt at trying to bring everything together, but in a couple of places it falls flat. Bill noted his issues with the shift data. I think the replacement-level section left a lot to be desired.
The data portion is a double or triple, even a home run for some.
For the hard core guys, you’re left disappointed in some places, but still enough to keep you reasonably happy. For the newbies and those who have dipped their toes already in the fielding water, you’ll be inspired, or at least be quite satisfied with the book.
I think Chris Dial hit the nail on the head when he described the target audience, and for someone who has an obvious conflict of interest, he was quite impartial:
I think the book works for everyone. My article was read by people I work with who didn’t have much of the base knowledge that FanGraphs has and they got it and understood value in the analysis. Yes, they asked some very basic questions, so I think it works on the beginner level. I think it works on the intermediate sabermetric level the best, and I think at the highest levels, it will present a different way of looking at information, and will allow the open-minded experts, like yourself Tom, to say “I hadn’t considered that before”, even if you find some other flaw in methodology.
This thread is to discuss the nuts and bolts in the back of the book. I only read a couple so far. The first is a preamble, and the rest are going to seem like I’m being nit-picky. If that’s how it looks to you, then you are not able to read my mind. If on the other hand, it looks to you that what I’m saying is relevant, then congratulations, and you are on the path to sabermetric insanity. How you choose to see what I’m about to say will lead you to two different paths.
1. Why, why, why call something a “vector” and then compound that by using seemingly meaningless numbers? A ball is hit at the “150 vector”. Wouldn’t it be more helpful to say that it was hit 30 degrees to the right of the 2B bag? And that the 210 vector was hit 30 degrees to the left of the 2B bag? If you subtract 180 from every vector number presented, then you get 0 degrees over the 2B bag, +45 degrees down the 3B line and -45 degrees down the 1B line. Those vector numbers as presented in the book don’t jump out as spray angles.
I understand that BIS stores the numbers as they are, but in terms of presentation details, and making something meaningful to the reader, spray angles are what you want. Setting the angle to 0 degrees up the middle, so that +30 and -30 degrees are mirrors of each other, is reason enough to make the change. Indeed, had FB3 started with this idea first, there’s no way in the world they would then convert to the “180 vector = 2B bag” as being better.
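The conversion is a one-liner. Here is a small sketch, with function names of my own choosing, that follows the sign convention described above (positive toward the 3B line, negative toward the 1B line):

```python
def spray_angle(vector):
    """Convert a vector number as printed in the book to a spray angle in degrees.

    0 is straight over the 2B bag, positive is toward the 3B line,
    negative is toward the 1B line.
    """
    return vector - 180


def describe(vector):
    """Describe a vector number relative to the 2B bag in plain words."""
    angle = spray_angle(vector)
    if angle == 0:
        return "over the 2B bag"
    side = "left" if angle > 0 else "right"
    return f"{abs(angle)} degrees to the {side} of the 2B bag"


# The 1B line, the two examples from the text, the 2B bag, and the 3B line
angles = {v: spray_angle(v) for v in (135, 150, 180, 210, 225)}
```

With this, `describe(150)` and `describe(210)` read as mirror images of each other, which is the whole point.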
2. The Run Expectancy Matrix: never ever ever use one year of data. Sometimes you will get lucky, but 2011 is NOT that year. How so? Question: would you score more runs from 3B or 2B? While that’s an easy answer, it turns out that in 2011, with two outs, more runs scored from 2B than 3B. Historically speaking, you get .04 to .05 more runs scoring if you are on 3B than 2B with 2 outs. And you can guess that if you have a small sample size of runner on 2B-only and 3B-only situations with 2 outs, then random variation will rear its big disgusting head to make it seem like what we saw in 2011 is actually the truth.
3. After going out of their way to talk about run expectancy, they then look at “enhanced plus/minus” in terms of bases saved on the hitter only, and then credit a triple with three times the value of a single, and a double twice the value of the single. So, they completely ignore the run saving of runners on base (not to mention the fact that even at the batter level only, a double is not twice the single in terms of runs, and a triple is not three times, though it’s not terribly off).
That’s all the sections I got through.
For those who missed the two-part article that MGL wrote that set the standard for fielding metrics a decade ago, here they are:
It should give new readers an understanding of how the data is parsed to determine the chance of an out for a given play.
Dewan gives an illustration, though there’s nothing really new here for most people.
Bill James:
Their data shows that David Ortiz’ batting average on ground balls and short liners over the two years has been .245 when the shift was used, and .232 when it was not used. Their explanation for this is “Well, teams shouldn’t be shifting against David Ortiz anyway.” We only recommend using a shift against a hitter who pulls his ground balls and short liners 80% of the time. David’s not at 80%, so we wouldn’t recommend using the shift against him.
This is exactly the kind of information we need, on all hitters. When Tampa employs the shift, how do those hitters do with the shift and not the shift? Are there certain types of hitters where WOWY says that you should employ the shift? Is that “80% threshold” that type of hitter?
Based on Bill’s review of the book, the answer is “we don’t know”. Which is of course the very question that needs to be asked and answered. Anyway, Bill pulls no punches on his thoughts on the matter, which makes him exactly the kind of blogger that I enjoy reading. Any punch above the belt is a fair punch.
Bill does say early on:
This is not merely “fielding” that is being studied here; this is one of the best books ever written about sabermetrics.
Tough, but fair.
Great stuff from Doug, and you can see he really keeps up by his comments on Jason Bay. It’s been fantastic to see the evolution of fielding metrics, as you go from simple things like looking at
- plays made per game,
- to plays made per inning,
- to plays made per ball in play,
- to plays made per ball in zone
- to the adjustments based on
: spray angle (crudely into discrete angles, to continuous angles),
: launch angle (crudely into discrete ground/air, to now continuous),
: hang time,
: pitcher/batter tendencies,
: park,
: game situation,
: and eventually the starting spot of every fielder
Huge strides have been made, and we’ve got huge strides to go.
I’m a big champion of the scorer marking whatever he sees, however objective or subjective it is. The more information, then the more I can do with it, figuring the bias or whatnot.
Which is why it’s great to see this from Dewan. It’s not clear how they handle any scorer bias, but, it’s a step in the right direction.
Youk:
“He was limited,’’ said manager Bobby Valentine, who spent part of the winter watching tape of his new players. “His ability to turn and throw and turn and catch were not what he would want it to be, I’m sure.’’
According to Valentine, the Red Sox defensive metrics showed Youkilis having poor range to his right and left but still able to make plays coming in. As measured by the width of a baseball, Youkilis was minus-3 to the right, minus-5 to the left and plus-2 coming in.
MGL and Dewan seem to concur, overall.
Max performs his WOWY study.
He’s reporting a split-half correlation of close to r=.50. He doesn’t show how many PA on average in each group. I’m just going to assume it’s about 3000 PA in each group. If that’s the case, then you’d regress his observed numbers by 3000 / (3000 + PA), which is around the kind of impact a pitcher has on batted balls. The difference of course is that a catcher is on the field far more than a pitcher is, and so, we don’t need to wait 5 or 6 years to tease out his talent from his observations. We can do it in one year.
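As a sketch of what that regression looks like in practice: the 3000 PA ballast is my assumption from above (the point where r=.50), and the function names are hypothetical.

```python
def regression_fraction(pa, ballast=3000):
    """Share of the observed number to regress toward the mean.

    `ballast` is the PA count at which the split-half correlation is
    about .50; 3000 is the assumed value from the text, not a measured one.
    """
    return ballast / (ballast + pa)


def estimated_talent(observed, pa, ballast=3000, league_mean=0.0):
    """Shrink an observed rate toward the league mean by the regression fraction."""
    keep = 1 - regression_fraction(pa, ballast)
    return league_mean + keep * (observed - league_mean)


half = regression_fraction(3000)      # regress 50% at 3000 PA
third = regression_fraction(6000)     # regress one third at 6000 PA
talent = estimated_talent(10, 3000)   # an observed +10 becomes +5
```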
If all that is accurate, that makes Max’s process equivalent to something like UZR.
Great work Max!
Bill last week made a comment about 25 runs being too wide a spread between a good fielding RF and a bad one. He then went about to show if he could have been right, and he says he was dead wrong.
I knew he was dead wrong, and he proved it exactly as he should have done. He actually went into a sim-model. I’ll show you the math version, which anyone who read the end of SolvingDIPS will recognize.
First, we start with the spread in runs allowed, which Bill notes is about one standard deviation being 80 runs (it’s closer to 70 in the 1980s, and closer to 90 in the current era, so I’ll use 81, because it’s a nice number, 9-squared, and being 0.5 runs per game).
Anyway, defense is pitching + fielding. I’ll just throw some numbers out there, and say one SD in pitching is 67.3 runs and one SD in fielding is 45 runs. We just have to make sure that we get something so that 67.3^2 + 45^2 = 81^2. You want to make fielding as one SD = 40 runs or 50 runs, then fine, whatever. Just take something reasonable.
Now, we have 9 fielding positions (though obviously the spread in each position won’t be identical). For now, let’s say that it is. So, we have 9 times x^2 = 45^2. Which, you will notice, is simply 3 times x = 45. X is therefore 15. One standard deviation in fielding at a single position is 15 runs.
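The arithmetic in the last two paragraphs can be checked in a few lines. This is only a sketch: it takes the 81 and 45 from the text as given and keeps the simplifying assumption that the nine positions are independent with identical spreads.

```python
import math

TOTAL_SD = 81      # one SD in team runs allowed, per the text
FIELDING_SD = 45   # assumed one SD in team fielding, per the text
POSITIONS = 9

# Independent components add in variance: pitching^2 + fielding^2 = total^2
pitching_sd = math.sqrt(TOTAL_SD**2 - FIELDING_SD**2)

# Nine equal, independent positions: 9 * x^2 = fielding^2, so x = fielding / 3
per_position_sd = FIELDING_SD / math.sqrt(POSITIONS)
```

Swap in 40 or 50 for `FIELDING_SD` and the per-position number moves to about 13 or 17, so the conclusion does not hang on the exact choice.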
That’s if you have all the players equally represented at each position. Of course, we know that’s not true. Within each position, things are more clustered. You won’t find terrible fielders at SS or great fielders at 1B. So, within one position, it’s more like one standard deviation is 10 runs.
Anyway, that basically means that 95% of players at each position will be +/- 20 runs of the positional average, and pretty much the limit will be +/- 30 runs of the positional average.
UPDATE: If you neutralize the position like I do (adding the positional adjustment to UZR or Total Zone or what have you), then you use the one SD = 15 runs. So, almost all players will be +/- 45 runs of the neutral-position. We know that we add 12.5 runs for catchers, and we subtract 12.5 for 1B. So, a great fielding catcher who is say +30 runs would be +42.5 as position-neutral, and a crappy -30 1B would be -42.5 runs as position-neutral, for a range, in this case of +/-42.5.
Something like that…
In this great article by Mike Fast in BP a few months ago, he described a method by which he estimated catcher framing performance using Pitch f/x data. He was generous enough to provide a complete database for all catchers in 07-11.
From those numbers I computed an estimate of each catcher’s framing true talent by simply taking his total observed numbers and regressing toward the mean (zero) by adding 4500 called pitches (about 75 called pitches per game, BTW) of league average framing (zero of course), as he suggests in the article. I did not do any weighting by year, age adjustments or anything like that. I just used the 5-year combined numbers that Mike provided. (BTW, I later learned that there was an error in Mike’s computations, so I multiplied his run values by .65, as per Mike).
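Roughly, the calculation looks like the sketch below. The 4500-pitch ballast, the .65 correction, and the 75 called pitches per game come from the text; the function names and the example catcher are made up for illustration.

```python
def framing_talent_per_pitch(observed_runs, called_pitches,
                             ballast=4500, correction=0.65):
    """Estimated framing talent in runs per called pitch.

    Adding `ballast` called pitches of league-average (zero) framing leaves
    the run total unchanged and only grows the denominator.
    """
    corrected_runs = observed_runs * correction
    return corrected_runs / (called_pitches + ballast)


def per_150_games(rate_per_pitch, pitches_per_game=75):
    """Scale a per-pitch rate to 150 games."""
    return rate_per_pitch * pitches_per_game * 150


# Hypothetical catcher: +20 observed runs over 13,500 called pitches
rate = framing_talent_per_pitch(20, 13500)
talent_150 = per_150_games(rate)  # 13 corrected runs over 18,000 pitches
```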
To test his numbers, I first broke the list of catchers and their true talent framing skill into two groups of around 25 players each (an arbitrary number of players in each group) - the best and the worst. The average framing skill in the best group, weighted by the number of PA they caught in 07-11, was +7.5 runs per 150 games, and for the worst group, it was -7.7. That is around a .05 runs per game influence, which would show up in their pitcher’s ERA, RA9, or ERC (component ERA). Only a part of that would show up in DIPS or FIP, since framing also influences BABIP.
Anyway, to test his number, I did a WOWY on those catchers. I looked at the results of all pitchers they caught when they were in the game and when they were not. I did not control for anything else, like park, batters, H/A, etc. A pretty standard WOWY analysis. We can thank Tango for that, BTW. I then looked at the WOWY differences in wOBA, SO, and BB rates.
I looked at 05-11 for some reason rather than just 07-11. So I used some in-sample data (07-11) and some out-of-sample data (05-06). The average catcher in the “good framing group” this time pro-rated to the number of PA they caught in 05-11 (rather than just 07-11) was +7.3 and for the “bad framing group”, -7.6, around the same as for 07-11. IOW, also around .05 runs per game.
Here are the results:
The good framing group had a wOBA difference of .008 points. IOW, looking at the same pitchers, when the good framing catchers caught them they allowed a wOBA of 8 points less than when some other catcher (a slightly bad framing catcher, on the average) caught them. That translates to around .24 runs per game - a lot more than we expected. The BB per PA had a .004 difference (around .15 fewer BB per game) and the K was .003/PA (.11 per game) more.
For the bad framing catchers, they had a .003 higher wOBA, or .09 runs per game, .11/game more BB, and .23/game fewer K. The runs per game number is also more than we expected.
However, we expect to find much more of a WOWY effect in the in-sample data than is expected using the regressed in-sample framing data, because the actual framing performance of these good and bad framing catchers was much more spread out than the estimated true talent numbers (the regressed performance).
The total number of “min” PA was 302,434 for the bad framers and 88,738 for the good framers. So the standard error in wOBA is around 1.7 points for the good framers and .9 points for the bad framers. (That is not exactly how you do a standard error for a WOWY; in fact, the real SE’s might be almost double since a WOWY is a difference between two numbers.)
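Those standard errors can be reproduced with a simple sketch, assuming a per-PA standard deviation of wOBA of about .5. That value is an assumption chosen here because it matches the figures quoted, not something stated above.

```python
import math


def woba_standard_error(pa, sd_per_pa=0.5):
    """Standard error of an average wOBA over `pa` plate appearances.

    `sd_per_pa` is an assumed per-PA standard deviation of wOBA.
    """
    return sd_per_pa / math.sqrt(pa)


# Expressed in "points" of wOBA (thousandths)
good_se_points = woba_standard_error(88738) * 1000
bad_se_points = woba_standard_error(302434) * 1000
```

As noted, this treats each group as a single average; the difference between two such averages would carry a larger error.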
Now, this is not such a great test because most of the data is in-sample (07-11). IOW, in the WOWY test, I used the same data that Mike used to come up with his catcher framing numbers. While he did not use the same method at all (WOWY), it is possible that there are some dependency issues.
The best way to test his numbers is to use out of sample data (and hope that the catchers had around the same skill that they had with the in-sample data).
So first I only used Mike’s data from 07-09 (and did the appropriate regression of course) and then I did a WOWY from 05-06, and 10-11 (4 years).
The average catcher in the bad framing group (based on only 07-09 framing numbers), prorated by the number of PA they caught in 05-06 and 10-11, was -8.9 per 150, and in the good group, +8.1. That is around .057 runs per game.
Here are the results of the out-of-sample WOWY. These numbers should be close to (rather than larger than) the true talent estimates, unlike the in-sample numbers.
Bad framers
wOBA diff: .09 runs/game
BB diff: .114 BB/game
K diff: .19/game
Good framers
wOBA diff: .03 runs/game
BB diff: .076 BB/game
K diff: .114/game
These numbers combined, (.03 + .09)/2, or .06 runs per game, are exactly in line with what we would expect from Mike’s numbers, which is very comforting. In fact, I love it!
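For the record, the comparison is just this, using the numbers quoted above (the variable names are mine):

```python
# Expected effect from the regressed 07-09 framing numbers, in runs per 150 games
bad_group_per_150 = -8.9
good_group_per_150 = 8.1
expected_per_game = (abs(bad_group_per_150) + good_group_per_150) / 2 / 150

# Observed out-of-sample WOWY effect, in runs per game
bad_woba_runs = 0.09
good_woba_runs = 0.03
observed_per_game = (bad_woba_runs + good_woba_runs) / 2
```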
Later today, I will do the same test on Max Marchi’s numbers, which were also derived from the pitch f/x data, but use a different method I think…