THE BOOK--Playing The Percentages In Baseball


Tuesday, October 28, 2008

Scouting Play-by-play logs

By Tangotiger, 02:35 PM

We tracked a good portion of the balls in play yesterday, using nothing more than our eyes and baseball sense.  Below you will find the complete log of our observations.  Note that this was our first try, and obviously, we’d need some guidelines.  For example, if you have a play that will be made 40% of the time by 20% of the fielders, and 20% of the time by 30% of the fielders, 10% of the time by 40% of the fielders, and 0% of the time by 10% of the fielders, it’s not really apparent what the average is (18%). 
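For the record, the 18% comes from weighting each group's success rate by its share of fielders. A quick Python check, using only the numbers from the example above (the variable names are just for illustration):

```python
# (share of fielders, chance that group makes the play)
groups = [
    (0.20, 0.40),
    (0.30, 0.20),
    (0.40, 0.10),
    (0.10, 0.00),
]

# Sanity check: the shares should cover all fielders.
assert abs(sum(share for share, _ in groups) - 1.0) < 1e-9

# Weighted average = sum of (share * success rate).
league_average = sum(share * rate for share, rate in groups)
print(f"{league_average:.0%}")  # 18%
```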

Anyway, for posterity’s sake, and future research purposes:


112 TangoTiger FANS 95% Utley on Baldelli
115 Phil D FANS 98% Utley on Baldelli.
127 Zach FANS 99% Utley on Baldelli

143 TangoTiger FANS 100% Rollins in RF
149 Phil D FANS 100% - Baldelli on Rollins

167 TangoTiger FANS 10% Longoria on Werth
169 Phil D FANS 5% - Longoria on Werth

181 TangoTiger FANS 100% Longoria on Utley

207 TangoTiger FANS 85% Rollins
211 Zach FANS 95% Rollins
213 MGL FANS 95% Rollins
216 Phil D FANS 90% - Rollins on Bartlett

229 Zach FANS 0% on Iwamura grounder
233 TangoTiger FANS 5% Hamels on Iwamura, 1% SS/3B on Iwamura

235 Zach FANS 70% Crawford to Utley
237 TangoTiger FANS 85% Utley
239 Phil D FANS 10% Hamels
240 Phil D FANS 90% Utley

250 TangoTiger FANS 100% Baldelli on Werth
252 Phil D FANS 100% Baldelli
255 Zach FANS 100% Baldelli

263 TangoTiger FANS 10% Feliz, 60% Rollins
265 Phil D FANS 80% Rollins

271 TangoTiger FANS 10% Werth on Pena
273 Zach FANS 2.5% Werth on Pena
277 Phil D FANS 10% Werth

285 TangoTiger FANS 10% Rollins on Longoria
289 Zach FANS 25% Rollins
295 Phil D FANS 20% Rollins

307 Zach FANS 85% Rollins, 75% Utley finishing the DP
309 TangoTiger FANS 98% Rollins, 97% Utley
315 Phil D FANS 100% Rollins - just the out at second, right?
316 Phil D FANS 95% Utley

330 TangoTiger FANS 10% on Bartlett
332 Phil D FANS 5% Bartlett
336 Zach FANS 10% Bartlett

338 TangoTiger FANS 75% Kazmir
344 Phil D FANS 80% Kazmir (to 2B)

362 Zach FANS 95% Iwamura
367 TangoTiger FANS 98% Iwamura on Utley .... I really liked Utley in that PA
369 Phil D FANS 95% Iwamura

386 TangoTiger FANS 100% Rollins .... now, here, are we going to have to call it 70% because of the rain?
394 Zach FANS 100% Rollins; 85% considering the conditions
396 Phil D FANS 95% Rollins.

409 Zach FANS 85% getting the tag, 95% to first
427 Phil D FANS 80% Utley the tag.
428 Phil D FANS 95% Utley the throw.

546 TangoTiger FANS 100% Crawford on Victorino
552 Zach FANS 100% Crawford
559 Phil D FANS 98% Crawford

576 TangoTiger FANS 100% Pena.... now about those field conditions, and the point of all this...I would say that under these conditions, we shouldn’t even bother tracking it, if we stick to field conditions plays.... I think this game would have been a rain delay if it was not the World Series
582 Zach FANS 95% Pena
586 TangoTiger FANS 100% Pena

664 TangoTiger FANS 100% Howard
666 Zach FANS 100% Howard
670 Phil D FANS 95% Howard

680 Zach FANS 30% Rollins
693 TangoTiger FANS 40% Rollins (though I’d like to see it from the higher angle with Upton… might be 10%)
701 Phil D FANS 40% Rollins

740 Phil D FANS Ruiz 0%
750 Rally FANS 0% for left fielder
754 TangoTiger FANS 0% Pena
760 Zach FANS 0% left fielder

764 TangoTiger FANS 10% Burrell on throw

775 Zach FANS 90% Victorino on Longoria
781 Phil D FANS 90% Victorino

The full log can be found on the liveblog.

#1    Tangotiger    2008/10/28 (Tue) @ 14:55

There were really two tough calls, both involving Rollins as the fielder.  One where the ball went by Feliz, and Rollins got it, and made the throw from the hole. The other is with Upton running.

But, I think all we’d need is an hour of experience and guidelines to feel comfortable in making our calls.

I think this methodology would blow UZR, Dewan, and PMR out of the water, no offense to the three gentlemen noted.  Those guys have to figure out, based on whatever parameters they have, the chance of Rollins getting Upton out on that particular GB, and depending on the parameters they use, they can come up with anything from 10% to 90%.  It could really be that wide.  Now, obviously, a lot of these things come out in the wash.

Nonetheless, with the NHL employing about twice as many scorers as MLB (something like 12 to 6), even though MLB has twice the revenue, it should be no sweat for MLB to hire three balls-in-play stringers per ballpark.  That’s 90 stringers, plus a few people to manage all this… $1 million.  Seems like it should pay for itself in no time.


#2    MGL    2008/10/28 (Tue) @ 22:25

What’s the payoff to MLB?  I agree that this kind of system would blow away UZR, Dewan, etc.

You would want some kind of feedback loop as I mentioned in the live blog.  For example, someone would go back and take all the made plays that we classified as 90% and all the missed plays we classified as 10% and see if they were about the same.  Or look at video and try to match made plays with missed plays and see if they add to 100%.  Or something like that.  At the very least, we would want to see if there were any systematic biases in the ratings of plays.  That is one thing that UZR-type systems have going for them, I think.  We know exactly how often a certain play would have been made (instead of the observer guessing), where “certain play” means only (approximate) speed and direction.
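One way to picture that feedback loop is a simple calibration check: bucket the plays by the percentage the observer assigned, then compare each bucket's label with how often those plays were actually made. The sketch below is only an illustration; the `calibration_table` function, the input format, and the sample ratings are made up, not taken from the log.

```python
from collections import defaultdict

def calibration_table(ratings):
    """ratings: iterable of (estimated_pct, was_made) pairs (hypothetical format).

    Returns {estimated_pct: (number_of_plays, observed_pct_made)}.
    """
    buckets = defaultdict(list)
    for estimated_pct, was_made in ratings:
        buckets[estimated_pct].append(1 if was_made else 0)
    return {
        pct: (len(outcomes), 100.0 * sum(outcomes) / len(outcomes))
        for pct, outcomes in sorted(buckets.items())
    }

# Made-up sample: ten plays rated 90%, ten plays rated 10%.
sample = (
    [(90, True)] * 8 + [(90, False)] * 2
    + [(10, True)] * 1 + [(10, False)] * 9
)

for pct, (n, observed) in calibration_table(sample).items():
    print(f"rated {pct}%: {n} plays, made {observed:.0f}% of the time")
```

If a bucket's observed rate sits well away from its label over a large sample, that would point to a systematic bias in how those plays are being rated.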

Maybe some combination of the two systems would be much better than one alone.

And maybe UZR/Dewan et al. actually is better for large samples of data.  It should be.  The only thing it lacks that human observation has is the fielder’s initial positioning, which is no small thing.  Of course, and we have talked about this before, to some extent we want to include fielder positioning in our evaluations of a fielder.

When doing the type of evaluations we did last night, I forget whether our instructions were, “given where the fielder WAS playing,” or, “given where a normal fielder usually plays.” I think it is the former.  If it was, then if a fielder happens to play in better position for whatever reason (e.g., maybe he is good at adjusting for pitch thrown or he knows the batters better), he is going to get penalized by our new system, but rewarded by UZR/Dewan.

So there definitely would have to be some kinks worked out before we get all excited, and even then (with kinks worked out), I am not more than 80% convinced that it would be a better system for one or three years’ worth of data.

Certainly, it would augment the STATS and BIS data considerably.  Given that, I guess, it is hard to believe that it wouldn’t be better by itself, but again, the (roughly) objective data that goes into UZR/Dewan adds something to the mix as well.  As I said, I would be a little concerned about us butchering certain types of plays or just being biased in general with respect to certain types of plays.  It probably would not be a bad idea to include “certainty” for all plays.  For example, 80% play, with 60% certainty!


#3    David Gassko    2008/10/29 (Wed) @ 00:30

Mickey,

This already exists in the STATS play-by-play data. They label each play 1-5 in terms of difficulty. You could easily compute how often a play is made based on its difficulty rating, compute a +/- kind of metric from that and see how well it predicts a player’s UZR in the next year. My guess is that it would not do very well, but I’m willing to be proven wrong.
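A rough sketch of that calculation, assuming each record is just (fielder, difficulty rating, whether the play was made). The record layout, function names, and sample data are hypothetical; the actual STATS format is not described here, and the sketch assumes ratings exist for missed plays as well as made ones.

```python
from collections import defaultdict

def league_rates(plays):
    """Fraction of plays made at each difficulty rating, over all fielders."""
    made = defaultdict(int)
    total = defaultdict(int)
    for _fielder, difficulty, was_made in plays:
        total[difficulty] += 1
        made[difficulty] += int(was_made)
    return {d: made[d] / total[d] for d in total}

def plus_minus(plays):
    """Plays made above (+) or below (-) the league rate, per fielder."""
    rates = league_rates(plays)
    scores = defaultdict(float)
    for fielder, difficulty, was_made in plays:
        scores[fielder] += int(was_made) - rates[difficulty]
    return dict(scores)

# Made-up data: (fielder, difficulty 1-5, play made?)
plays = [
    ("A", 1, True), ("B", 1, True),
    ("A", 4, True), ("B", 4, False),
    ("A", 5, False), ("B", 5, False),
]
print(plus_minus(plays))  # {'A': 0.5, 'B': -0.5}
```

The per-fielder totals could then be compared against next year's UZR to see how well they predict it.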


#4    salb918    2008/10/29 (Wed) @ 10:42

I have a question.

Regression to the mean depends on two things, right?  The first is sample size and the second is the spread in skill among the overall population.

A player will see, in general, fewer chances in the field than plate appearances.

The difference between the best and worst players by offense is ~90 runs while the difference between the best and worst players by defense is ~40 runs.  Something like that.

Anyway, the regression for estimating fielding skill will be much heavier than the regression for hitting skill.  Let’s now say we had a perfect defensive metric (call it superZR) that took into account velocity of ball off the bat, number of bounces, time in the air, etc.

Would the marginal difference between the theoretical superZR and UZR/PMR/RZR/SFR/whatever be completely swallowed by the strength of the regression (whether to the mean or to the scouts)?

I guess a simple way to test this would be to convert a “bad” fielding metric, like FPct or ZR, into runs and estimate true skill by regressing to the mean (or scouts) and then compare that to Chone’s projections.  Anybody interested in tackling this?


#5    MGL    2008/10/29 (Wed) @ 13:22

Sal, the “spread in skill” variable with respect to the amount of regression, is “the spread in skill that metric is actually measuring.”

The better the metric, the more spread in skill there will be, so that is why the perfect metric gets regressed less than the less-than-perfect one.

David, yes I know that STATS does that.  It is 1-4 and not 1-5 (at least the data that I use) and only on balls that are actually fielded (it would be nice to have the converse on balls that were not fielded).


#6    salb918    2008/10/29 (Wed) @ 13:32

The better the metric, the more spread in skill there will be,

Why is this necessarily true?  I’m not trying to be difficult, I just don’t understand.


#7    MGL    2008/10/29 (Wed) @ 19:34

That is tautological.  If the metric is not measuring any skill at all, how can there be a spread?  It is not the spread of skill that we are really interested in.  It is the spread of the skill that we are presumably measuring.  That is by definition.

What if there is a large (or whatever) spread of defensive skill in the population?  And let’s say that we have a defensive metric that used the color of a player’s uniform to measure his defensive skill.  What is the regression?  100%, right?  What do we care what the spread of defensive skill actually is if that is not what we are measuring?

If we use fielding average as our metric, we only care about the spread of skill with respect to fielders’ error rates, in order to determine the regression amount (well, that and the sample size of course) right?  What do we care about the spread of skill in fielders’ range?

When we do, for example, a year to year regression, that tells us the spread of skill given the underlying sample sizes in our regression of course (the number of opps for each player). But what people often don’t realize is that it only suggests the spread of skill which we are actually measuring (based on how good our metric is), not the spread of skill which we think we are measuring.

Again, if we have a really crappy defensive metric (not the uniform one) - something which measures defense to SOME small degree, but not much - the y-t-y correlations will be small and thus the regressions will be high, but that certainly is not because the spread of skill for defense is small.  It is because the spread of skill for what we are measuring is small.
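To make the link between year-to-year correlation and regression concrete, here is a small sketch. It assumes the common rule of thumb that, with similar numbers of opportunities in both years, you regress a player's observed rating toward the mean by (1 - r), where r is the year-to-year correlation. The function names and the sample numbers are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def regress_to_mean(observed, mean, r):
    """Keep a fraction r of the gap from the mean; regress the other (1 - r)."""
    return mean + r * (observed - mean)

# Made-up runs above/below average for the same five fielders in two seasons.
year1 = [10, 5, 0, -5, -10]
year2 = [4, 6, -2, 0, -8]

r = pearson_r(year1, year2)
print(f"r = {r:.2f}, regression = {1 - r:.0%}")
print(f"+10 observed -> {regress_to_mean(10, 0, r):+.1f} estimated")
```

With r = 0, as in the uniform-color example, the estimate collapses to the mean, which is the 100% regression case. Note that a high r only says the metric is repeatable, not that it measures what we want it to measure.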

There is one catch to that.  It is possible that our metric is bad but that it correlates well from one year to another, and thus we don’t regress it a lot. But that does NOT mean that it is a good metric for what we are measuring!  It just means that something is correlating well from year to year, but not what we think we are measuring.

The reason we like UZR is not only that it correlates well (with itself) from year to year, but that we like and trust the methodology.  If I presented it as a great defensive metric without talking at all about the methodology, but I proved that it had a great correlation from year to year, it would be a meaningless metric for us to accept and trust.  That is the basic difference between accuracy and reliability.  For a good metric, you want both if possible. Sometimes, you can’t really measure the accuracy, because you have nothing to measure it against (unlike lwts, which can be checked against actual run scoring).  In that case, all you can do is evaluate the methodology and hope that that implies good accuracy.


#8    salb918    2008/10/29 (Wed) @ 20:25

ok, fair enough--but do we think that, even by superZR, the spread in fielding skill will be revealed to be >40 runs from best to worst?  That would be quite amazing if we found that the spread was on the order of 80 runs as it is in offense!

