THE BOOK--Playing The Percentages In Baseball

Friday, February 29, 2008

Observational Analysis

By Tangotiger, 10:30 AM

Greg is right on.  Good article, and an excellent intro to his fantastic piece in the THT08 Annual.  Along with Walsh’s article, these two articles are worth the price of the book by themselves.

I’ve long said that the pinnacle of sabermetrics will be the convergence of Performance Analysis and Scouting Observations.  PITCHf/x, HITf/x, and FIELDf/x will be front and center in this revolution.  Thank you Sportvision: you will enter the Sabermetrics Hall of Fame.

Question to MGL: can you tell us how STATS codes these two plays that Greg references?  And, if Appleman is around, how did BIS code those two plays?

I will disagree with Greg about the speed off bat (SOB) alone not being good enough.  If STATS and BIS correctly labelled the location of these plays, and if you have the SOB and the angle off the bat, that would go a long way toward telling you how fast the ball got there.  It would be more helpful if you knew how many hops as well.


#1    Peter Jensen      (see all posts) 2008/02/29 (Fri) @ 12:31

The best system will be a Sportvision system enhanced with extra cameras to cover the entire field (let’s already give it a name, Total f/x, maybe that will make it happen more quickly).  I know that this is under serious consideration as a future enhancement, but when it might occur depends on having a paying customer for the additional information and on the creation of complex software that automatically tracks and enters the information desired.  When that might actually happen is anybody’s guess.

Hit f/x, using the existing hardware and the existing software modified to track the ball as it leaves the bat, would be an interim alternative while the ideal system is being developed.  As Greg notes it would give an exact measure of the speed off the bat, but only a calculated estimate of the ball’s landing spot.  And it would give no information about fielder positioning.  It too will require paying customers, but since the software modifications are relatively minor the development costs would be much less than a Total f/x system.  This means that it could be producing data much sooner, hopefully later this year or next.

And the statement that it would add nothing to defensive analysis is much too strong.  There is no reason that the environmental factors that Greg mentions couldn’t easily be incorporated into a Hit f/x system.  Altitude is a constant and can be hard coded into the software for each stadium.  Temperature is readily available and could be entered by the operator at the beginning of each inning.  Wind speed and direction are the most difficult and, unfortunately, the largest factors.  Greg goes to great pains to get the best estimate for them for his Hittracker analysis, and it is doubtful that another observer will be as accurate, but estimates by the Sportvision operator might be close enough.  The last factor is the Magnus force on the ball.  Both Hit f/x and Hittracker would have to estimate this, but the detailed information captured by the Hit f/x cameras might make their estimate more accurate.

The bottom line will be whether an estimated landing location from Hit f/x has a smaller margin of error than the estimated landing location from trained and motivated observers.  Or, more to the point, whether the estimated landing location from Hit f/x will be accurate enough for aggregating data for defensive analysis.  This can be tested, and until it is, it is too soon to write off Hit f/x as a potential source for defensive data.


#2    Tangotiger      (see all posts) 2008/02/29 (Fri) @ 13:03

I believe that a Wisdom of Crowds approach would beat any software for landing spot(s).

You have an average of 30,000 people at a ballpark every game.  They have the singular focus of tracking a baseball.  Imagine equipping 1% of them (300 fans) with a handheld device, where their job would be to:
a. tell us whether the ball hit the ground, fielder, or fence first
b. tell us at what point on the field the ball hit the ground, fielder, fence
c. tell us how many hops it took before it was picked up by a fielder, or hit the fence
d. which fielder touched the ball first

You can then simply take the median of all the 300 observational points.

Furthermore, you can ensure quality control by seeing how well each person conforms to the group.

Say that for question a), 280 people said the ball hit the ground first, and 20 said it hit the fielder first.  That gives you a very high reliability on the 280 and low on the 20.  (It’s possible on the real shorthops that the 20 were right, especially if they are all positioned in the front row infield.  You’d use that information as well.)
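The aggregation and quality-control steps described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function names are hypothetical, landing spots are assumed to be (x, y) field coordinates in feet, and "the median of all the points" is read as a component-wise median, which is just one reasonable interpretation.

```python
from collections import Counter
from statistics import median


def aggregate_landing_spot(points):
    """Component-wise median of observers' (x, y) landing spots, in feet.

    Assumption: the 'median of all observational points' is taken
    separately on each coordinate.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (median(xs), median(ys))


def consensus(answers):
    """Most common categorical answer and the share of observers who gave it."""
    counts = Counter(answers)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(answers)


def conformity(observer_answers, consensus_answers):
    """Fraction of plays on which one observer matched the group answer."""
    matches = sum(a == c for a, c in zip(observer_answers, consensus_answers))
    return matches / len(consensus_answers)


# The 280-versus-20 split from question a)
answer, share = consensus(["ground"] * 280 + ["fielder"] * 20)
```

With the 280/20 split, `consensus` returns "ground" with a share of about 93%.  An observer with a low `conformity` score over many plays could be down-weighted, although, as noted above, a dissenting minority seated close to the play may occasionally be the group that got it right.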

I would say that the consistency of results would be far higher than any software.  There’s simply no way you’ll get “loss of data” or gaps that software and hardware would almost surely give you.

How much could this even cost?  And I don’t even think 300 fans are needed.  Even just 30 would be fine by me, if you spread them out over the entire park: human triangulation.  30 fans x $30 x 81 games x 30 teams = roughly $2MM recurring costs

Hardware: $100 per unit x 300 fans (presuming you have a base of 300 fans from which 30 will be chosen every game) x 30 teams = roughly $1MM hardware costs

Software: $1MM costs

So, you need $2MM up front, plus $2MM every year.  What would Sportvision propose?
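For anyone who wants to vary the assumptions, the back-of-envelope estimate above can be written as a small Python function.  The function and parameter names are hypothetical; the default values are the figures quoted in this comment, and the $1MM software cost is the flat guess given above.

```python
def fan_tracking_costs(fans_per_game=30, pay_per_game=30, home_games=81,
                       teams=30, unit_cost=100, units_per_team=300,
                       software_cost=1_000_000):
    """Rough cost estimate, in dollars, for the fan-observer proposal."""
    # Paid observers at every home game, across the league
    recurring = fans_per_game * pay_per_game * home_games * teams
    # One handheld device per fan in each team's pool of observers
    hardware = unit_cost * units_per_team * teams
    return {
        "recurring": recurring,
        "hardware": hardware,
        "upfront": hardware + software_cost,
    }


costs = fan_tracking_costs()
```

With the defaults this gives $2,187,000 per year in recurring costs and $900,000 in hardware, which round to the $2MM and $1MM figures quoted above.  Cutting the pool to 20 units per team, or the observers to 5 or 8 per game, is just a change of arguments.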


#3    Peter Jensen      (see all posts) 2008/02/29 (Fri) @ 13:31

You just don’t need super accurate hit locations to do excellent fielding analysis.  Since the data has to be aggregated together to avoid the problems of small sample sizes, you don’t gain much, if anything, by having the accuracy increase from +/- 10 feet to +/- 2 feet.  Hit f/x should be within +/- 10 feet, plus it gives you the speed off the bat, which is necessary for fielding analysis as well as the single most important piece of information that we don’t have for pitching and offensive analysis.  And Hit f/x is the only method that can consistently give you an accurate SOB.
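One way to see why aggregation blunts the value of extra precision is the standard error of an average.  The sketch below is a simplification, not a description of any actual fielding system: it assumes the per-play location errors are independent and unbiased, treats the +/- figure as one standard deviation, and uses a hypothetical sample of 400 plays.

```python
import math


def aggregate_error(per_play_error_ft, n_plays):
    """Standard error, in feet, of the mean location over n_plays.

    Assumes independent, unbiased per-play errors with the given
    standard deviation.
    """
    return per_play_error_ft / math.sqrt(n_plays)


coarse = aggregate_error(10, 400)  # per-play error of +/- 10 feet
fine = aggregate_error(2, 400)     # per-play error of +/- 2 feet
```

Under these assumptions the averaged error over 400 plays is about half a foot at +/- 10 feet per play and about a tenth of a foot at +/- 2 feet, both small next to the size of a typical fielding zone.  A systematic bias in the location estimate would not average out this way.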


#4    Tangotiger      (see all posts) 2008/02/29 (Fri) @ 13:45

One thing I would add is the position of the fielders when the pitcher is on the mound.  In that case, I’d probably split up the Fans’ recording to ensure that half start with the OF and half start with the IF, and if they have time, record the position of the remaining group.


#5    Peter Jensen      (see all posts) 2008/02/29 (Fri) @ 14:05

I would estimate that it would be cheaper to implement Total f/x in upfront costs and have no recurring costs (beyond what they already have for Pitch f/x), plus no costs for dropped, lost, or stolen data recorders.  Total f/x (and Hit f/x) would operate totally automatically (no extra employee necessary for downloading the data to the database from your data loggers).  Total f/x would give you exact values for the data that your system only estimates, and you get speed off the bat, which your system could never give.  You do the cost-benefit analysis.  Your idea is a non-starter.


#6    Tangotiger      (see all posts) 2008/02/29 (Fri) @ 14:28

I was only focusing on the FIELDf/x portion of the TOTALf/x. 

Focusing only on what FIELDf/x would provide (location of ball, fielders), there will be gaps.  After all, it’s trying to measure objects.  It’ll pick up anything in motion (3B coach, umpires, birds), and the wind and fans stomping their feet can set the hardware off by a bit.  So, a big portion of FIELDf/x is an estimate based on however it is designed and calibrated.  Gaps in data will be guaranteed, just as they are in PITCHf/x.  And quality of data will always be an issue (at least for the foreseeable future).  Putting GPS on players would help immensely.  Players may object, but I’m sure their wives can force them to agree, especially if it’s embedded within their wedding bands.  “Honey, I’m telling you, I was on the bench!”.... “Is the bench out in the bullpen, behind camera view?”.

The huge upfront cost of FIELDf/x is in the design and testing phase.

I’m not so sure that my proposal would be a non-starter, especially if the number of Fans to make it successful isn’t 30 but just 5 or 8.  And instead of having 300 units allocated, you only need say 20 that must be returned after every game.


#7          (see all posts) 2008/02/29 (Fri) @ 15:42

Tango - I’d suggest a point-and-click computer program, and have the triangulators be at-home viewers.  They’d have the advantage of any telecast replays, good camera views, and potentially DVR service to rewind anything they missed.  And they can watch at their own pace.  Only issue is, everyone gets the same point of view from the broadcast, so I guess it’s not true triangulation.


#8    MGL      (see all posts) 2008/02/29 (Fri) @ 16:47

Fielding may be complicated enough that “wisdom of the crowds” for just recording how tough a play was, whether made or not, would be better than any objective methodology involving even perfect data, including fielder positioning (which is critical of course).

I have always thought that the best metric might be having a few trained and experienced observers recording whether a play was made or could have been made by an average fielder 10%, 20%, 30%, etc. of the time.

OK, maybe exact fielder positioning and perfect data on the ball might be better.  Even then, I think you still want some additional “subjective” data, like “bad hops,” a bad sun field, swirling wind, etc.  The pure subjective methodology would take care of that of course.  For example, even using perfect ball data, let’s say you have a pop fly on the infield with 40 mph gusting, swirling winds.  If the 1B drops the ball, that will go as a terrible error using perfect, objective data.  If we have the “fan toughness rating,” they might say that only 50% of all 1B would make that particular play.  Same thing with a sun field in the OF, or wet grass, etc.


#9    MGL      (see all posts) 2008/02/29 (Fri) @ 16:50

Question to MGL: can you tell us how STATS codes these two plays that Greg references?

Which two plays?


#10    Tangotiger      (see all posts) 2008/02/29 (Fri) @ 17:31

Agreed on 8.

Two plays:
“On Aug. 25, 2007, in the 4th inning of a game against the St. Louis Cardinals, Atlanta’s Andruw Jones hit a leadoff single. On Sept. 20, 2007, in the 6th inning of a game against the Milwaukee Brewers, Andruw Jones grounded out, 4-3, to end the inning. At a glance, these two events don’t seem to have much in common besides the batter, but beneath the surface there is an interesting story.”


#11    Colin Wyers      (see all posts) 2008/02/29 (Fri) @ 18:29

Mike/7: At-home viewing wouldn’t give you the data you need unless you had a special feed just for that purpose. When the ball is put into play, normal telecasts are focused on the pitcher/batter matchup; you don’t get any idea of where a fielder was positioned prior to the ball being hit watching a game on TV.


#12    MGL      (see all posts) 2008/02/29 (Fri) @ 18:56

I wouldn’t think that there would be any differences in the way that STATS scored those plays, other than they include a subjective “difficulty factor.” I’ll check though.


#13    dave smyth      (see all posts) 2008/02/29 (Fri) @ 19:04

I agree completely with MGL #8.


#14    Greg Rybarczyk      (see all posts) 2008/03/02 (Sun) @ 01:08

Late to the discussion, just back from vacation :(

#0 SOB is good enough only if the marked locations from STATS and BIS are landing points, not fielding points.  Otherwise, I agree with it all.

#1 If a model were grafted to HitFx to incorporate weather, then it wouldn’t be just HitFx, so I think we agree, Peter.  HitFx plus a good aerodynamic model would be very powerful…

#2 Tango, did we ever swap messages about this?  I can’t recall if we did, but I have a bunch of notes of how a wisdom of the crowds app would work, most of which match your description pretty closely.  I think I had envisioned it being from TV instead of live, but otherwise it is a super idea that I think may well happen…

Incidentally, this method would also produce a Retrosheet-style boxscore of the game if you had those observers also write down the basics of the action - it would be nice to have that created and validated by the crowd in real time…

#8 MGL - nice idea, but the subjectivity worries me.  If someone has to decide how tough a play *was* by how tough it *looked*, then we are once again exposed to silky-smooth shortstops who make jump-throws from the hole (well, except for the part where Tango counts up the plays and we wonder where all the outs went).  Not to mention that if we compile a large number of votes, we open up to sampling bias, as there are more viewers in certain TV markets than in others (wait, am I describing the same problem here?)

Overall, this would capture some things we would want (the sun, wind part), but we’ve got to try to avoid scoring plays subjectively whenever possible, IMO.

#11 Totally agree, the current camera angles are a problem, but they don’t have to be forever.  Widescreen TV’s changed the way we see football, as you can now see a lot more of what’s going on in the secondary than ever before.  We could wind up seeing a wide angle shot of the field a lot more frequently.  I’m not sure how to make that happen, but the predominant camera angles are not set in stone, anyway…


#15          (see all posts) 2008/03/04 (Tue) @ 00:03

#2 I’m not a software engineer, but it seems to me that you could eliminate some of those hardware costs if the proper program could be written for web enabled PDA’s and cell phones where the data could be texted in. If you posted the program on your site, you may get 5-8 observers in most ballparks to at least attempt it.


#16    MGL      (see all posts) 2008/03/04 (Tue) @ 00:43

Tango, STATS coded those two plays exactly the same.  Ground ball, in location “M” (just to the left of the second base bag), medium speed.  STATS also codes how easy or hard a play it was (fielder skill level) for each fielder.  The ground out was coded “22”, which means it was a 2 for the fielder and a 2 for the first baseman.  The scale is 1-4, with 1 being the easiest, so a “2” is easy but not the easiest, I guess.  I always like an odd number for “scales” so that it is easy to choose a medium or average designation, but maybe people who are experts at coding things know it is better NOT to have an odd number of choices because that way it is too easy to just call most things “medium” or “average.”  Maybe they want to force the observer to take a stand.  I don’t know.  Not my area.

Kind of makes me realize that I really should be using the “fielder skill” designation for UZR.  It would have helped with this play and I am sure, as is pointed out in this article, that there are plenty of other situations where it would help.  Basically a “fielder skill” can give us some idea of the positioning of the fielder.  OTOH, it depends if we want to include the fielder positioning in his skill set.  If a fielder makes a spectacular play because he is out of position, and hence STATS would record it as a “4,” do we want to give him extra (UZR) credit?  Probably not.  OTOH, if a fielder is “out of position” because of a shift, then his position should be accounted for and most of the PBP metrics will screw that up.  Which reminds me of something else.  On my list of things to do is, when compiling the UZR data, to assume a shift when certain batters are at the plate, like Ortiz, Delgado, and Bonds, although I am not exactly sure how to get that information other than from “common knowledge” or from inferring it from the data.  Maybe Jeter gets screwed in UZR (and other metrics) because of all the times he plays on the second base side of the field with Papi at the plate! ;)


#17    MGL      (see all posts) 2008/03/04 (Tue) @ 00:45

Of course your WWOY takes care of shifts, positioning, and everything else!

