THE BOOK--Playing The Percentages In Baseball

Tuesday, July 06, 2010

TotalZone updates

By Tangotiger, 12:07 AM

I just woke up from putting my kid to sleep.  I am definitely going to put off reading this to tomorrow.


#1    Rally      (see all posts) 2010/07/06 (Tue) @ 11:07

Peter posted this in another thread:

“Rally #84 - Congratulations on your update!  It’s a lot of work, but I think you will find it is worth it.  Did you update your past files as well as 2010?  Also, the convention for designating hit ball angles that was adopted here last year by consensus has 3rd base at -45 and 1st base at +45.  I think you will find that most people have been using that system since then, reluctantly even me.  Please email me as we have many things to compare between your new system and BZM.”

I’ll respond here since that thread is a bit crowded.

I guess I need to multiply my degrees by -1 to match up with the consensus.  I do have 2B as zero.  No big deal though, since I don’t intend to publish the degrees in any way, just code inside my database to get the run output.

I did do the back years, 2005 and on.  The basic idea remains the same: chances for a player = plays fielded + plays not fielded.  The big difference is being more precise on plays not fielded based on where the ball is hit.  It makes a big difference in the infield, and some in the outfield, though I haven’t compared the details yet.

The situation in the OF is this:  If a ball is hit to left center for a hit, Shane Victorino gets to it before Raul Ibanez.  Old TZ always charged this as a hit against Victorino; new TZ will charge it to Ibanez if it was hit closer to his zone.  Plays like this are a relatively minor problem; my guess is that 90%+ of the outfield hits are picked up by the guy whose zone they were hit in.  It is not as big a deal as the partial credit for everything hit between 3B and SS, but it helps a bit.

I found a pure bug in the TZ ratings of 2009 rightfielders that is fixed along with the update, so we can compare old TZ and new for LF and CF, but old RF data should be thrown out.

My guess is that David Appelman and Sean Forman will get the new stuff up on their sites before I do mine.  I’ll update once I finish the 1950-1953 ancient TZ ratings after retrosheet’s update.


#2    Guy      (see all posts) 2010/07/06 (Tue) @ 12:53

Great stuff, Rally.  Would it be easy for you to tell us how this changes the ratings, if at all, of the best/worst fielders 2005-2009 (combined) at a few positions?


#3    Tangotiger      (see all posts) 2010/07/06 (Tue) @ 17:11

For righthanded batters: The third baseman starts at 45 (actually, there are a few hits in foul territory, and the third baseman gets those too. A ball that goes past the bag fair and is picked up by the left fielder in foul ground will show an angle greater than 45). The third baseman’s responsibility ends at 25. 24 to -1 belong to the shortstop, and -2 to -27 belong to the second baseman. The first baseman gets -28 to -45, plus the ones in foul territory. For lefty batters, the 3B gets 45 to 23, shortstop 22 to -4, 2B -5 to -28, and 1B -29 to the foul line and beyond.

Ok, looking at my (unpublished) system, and I pretty much agree on the infielders, and we’d have to, since I applied similar reasoning.
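
The infield boundaries quoted above can be written as a small lookup. This is only a sketch under the quoted convention (3B line at +45, second base at 0, 1B line at -45, which is the opposite sign from the consensus convention Peter mentions); the function name and the "R"/"L" handedness codes are hypothetical, not anything from the TZ database.

```python
def responsible_infielder(angle, bats):
    """Return the infielder whose zone contains a ground ball at `angle` degrees.

    Convention from the quoted text: 3B line = +45, 2B bag = 0, 1B line = -45.
    Angles beyond +/-45 (foul side) fall to the corner infielder.
    `bats` is "R" or "L" for the batter's handedness.
    """
    # Inclusive lower bound of each zone, walking from the 3B line toward 1B.
    lower_bounds = {
        "R": [("3B", 25), ("SS", -1), ("2B", -27)],
        "L": [("3B", 23), ("SS", -4), ("2B", -28)],
    }
    for fielder, lower in lower_bounds[bats]:
        if angle >= lower:
            return fielder
    return "1B"  # everything below the 2B zone, including foul ground
```

For example, `responsible_infielder(24, "R")` falls in the shortstop's zone, while the same angle against a lefty batter falls in the third baseman's.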

And I love the hard work put in here:

Yunel Escobar, the top rated shortstop last season, made 48 plays out of his zone (only Miguel Tejada had more). Through MLB.com’s game archives, I watched almost all of these plays. On 13 plays, he was on the second base side because the defense used the shift against a lefty pull hitter. These were ordinary plays, not evidence of great range preventing a hit that the 2B should have had. He probably had more chances on the shift than most shortstops, playing in the same division as Ryan Howard and Adam Dunn. These two players hit 11 of the 13 shift balls. There are a few cases where the hit location code is clearly wrong, such as when the coding indicates the ball was in the 3B zone, but the shortstop actually moved slightly to his left to field it, or when the coding says it’s on the 2B side (having 2B as a marker makes it much easier to judge where the zone boundaries are), but the shortstop clearly fields it on his side. There were 6 miscoded plays, 21 more where it appears the ball was in the shortstop’s zone (though not certain), including some that were routine grounders. There were 2 plays where I couldn’t load the game or find the inning in question. Only 6 plays, in my judgement, were outstanding plays where Yunel ranged into another fielder’s zone.


#4    Tangotiger      (see all posts) 2010/07/06 (Tue) @ 17:27

My conclusion is that this process is best: Count all plays made, regardless of where the ball is hit. Count hits against a fielder when they pass through his zone. There are some problems: on shift plays that are not made, maybe the shortstop should be charged instead of the 2B, depending on where they set up. And if there are some outs that are coded clearly in the wrong zone, then some hits must be miscoded as well. Those are limitations I’ll have to live with, as fixing the data errors would basically entail watching every game for every team.

This really comes down to more of a ZR-type system, which is easy enough to explain.
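
The counting rule quoted above (credit every play made, charge every hit to the zone it passed through) can be sketched as a simple tally. The input layout and function name here are assumptions for illustration; the real system works from database tables, not Python tuples.

```python
from collections import Counter

def tally_chances(balls_in_play):
    """Tally (plays made, total chances) per position.

    balls_in_play: iterable of (fielded_by, zone_owner) pairs, where
    fielded_by is the position that made the out, or None if the ball
    went for a hit, and zone_owner is the position whose zone the ball
    passed through.
    """
    made = Counter()
    missed = Counter()
    for fielded_by, zone_owner in balls_in_play:
        if fielded_by is not None:
            made[fielded_by] += 1    # credit the play wherever it was hit
        else:
            missed[zone_owner] += 1  # charge the hit to the zone it went through
    positions = set(made) | set(missed)
    return {pos: (made[pos], made[pos] + missed[pos]) for pos in positions}

# A shortstop who makes one play in his zone and one in the 2B zone,
# and lets one hit through his own zone, gets 2 plays in 3 chances.
example = [("SS", "SS"), ("SS", "2B"), (None, "SS"), (None, "3B")]
totals = tally_chances(example)
```

Note that the out made in the 2B zone adds nothing to the second baseman's ledger in this scheme; only hits through a zone count against its owner.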


#5    Brian Cartwright      (see all posts) 2010/07/06 (Tue) @ 18:28

I think Rally presented a very clear and well reasoned explanation.

Peter #104 in the VORP thread
“I wish we had more people using the MLB hit locations.  At one point both Brian and Colin had fielding metrics in the works that were going to use the hit locations, but I don’t think that Brian has incorporated them yet”

I am getting there. I have now done the necessary table joins to create a hits table where I have gamename, event_id, park_id, resp_bat_id, resp_pit_id, x, y, and result (along with a few other columns). I just haven’t had the time yet to go on to the next steps, which would be to aggregate by ballpark in order to determine park adjustments (need to reread Peter’s articles on this) and then calculate the angle and distance for each batted ball. Then of course it will take some time to find the best way to incorporate the new data.

Where I expect to get the most benefit is in assigning ground ball hits to the outfield to an infielder. This is currently my most time-consuming query, and I am hoping that the vectors will not only produce a more accurate estimate but also require less processing time.
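
The angle and distance step mentioned above can be sketched as follows. This assumes the x, y columns are image-style pixel coordinates with y growing downward, and it uses a placeholder home plate position; both are assumptions, since the actual home plate location has to be estimated (per park, in Brian's case). The sign follows the consensus convention from comment #1: 3B line at -45, second base at 0, 1B line at +45.

```python
import math

# Placeholder home plate position in pixel coordinates (an assumption,
# not a measured value; replace with a per-park estimate).
HOME_X, HOME_Y = 125.0, 200.0

def angle_and_distance(x, y, home_x=HOME_X, home_y=HOME_Y):
    """Return (angle in degrees, distance in pixels) of a batted ball."""
    dx = x - home_x   # positive toward the first base side
    dy = home_y - y   # flip the axis so positive points toward second base
    angle = math.degrees(math.atan2(dx, dy))  # 0 = over 2B, +45 = 1B line
    distance = math.hypot(dx, dy)             # still in pixels, not feet
    return angle, distance
```

Negating the returned angle gives the 3B = +45 orientation Rally describes using internally.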


#6    Rally      (see all posts) 2010/07/06 (Tue) @ 20:48

Brian, are you linking the hit location to retrosheet?  Or getting the whole thing from Gameday?

I’ve come a long way in the last year or so, but getting the fielder identities set up to make my own retrosheet DB is beyond my skills right now.  As for getting the hit location angles, email me if you have any questions.

Here are the leaders and trailers for Old TZ from 2005-2009.  I was surprised to see the runs leader for all of MLB is a first baseman.  My stats sure do like Pujols.

1B
Pujols +75
Overbay +51
Teixeira +42

Giambi -25
Fielder -29
Sexson -34

2B
A Hill +48
Ellis +44
Carroll +42
Utley +41

Uggla -31
Cantu -31
Weeks -50

3B
Inge +65
Feliz +59
Rolen +56
Lowell +45
...
Beltre +29

Braun -34
Wright -34
Teahen -51
EE-5 -56

SS
Everett +63
Vizquel +42
Tulo +35
Escobar +35

Jeter -20
...
Betancourt -25
B Harris -28
F Lopez -30
Tejada -34

Now I’ll look at the new metric


#7    Matthew Cornwell      (see all posts) 2010/07/06 (Tue) @ 20:52

Rally,

Will converting to new TZ affect pitchers?  If so, how?

Since we are talking fielding and TZ, here is another question I asked on another post but have yet to receive a comment on.  Hopefully it isn’t too far off topic.

Say a pitcher like Jimenez has an incredible GIDP rate due to high GB rates, but the infield defense behind him is also above average.  How does Total Zone treat the extra double plays that he induces?  Does his team defense get all of the credit for the extra double plays turned?  Does Jimenez’s WAR see the benefits, or is it somewhere in between?


#8    Rally      (see all posts) 2010/07/06 (Tue) @ 21:04

Now with revised in the 2nd column:

1B
Pujols +75, +80
Overbay +51, +68
Teixeira +42, +32
(Helton +56 revised)

Giambi -25, -21
Fielder -29, -50
Sexson -34, -23
(LaRoche -55 r)

2B
A Hill +48, +35
Ellis +44, +40
Carroll +42, +43
Utley +41, +48
(Polanco +44 r)

Uggla -31, -9
Cantu -31, -32
Weeks -50, -48

(B Phillips -54)

I have no idea why my numbers don’t like Brandon Phillips.  He seems to rate better by everyone else’s stats, and doesn’t look like a bad fielder.

3B
Inge +65, +64
Feliz +59, +52
Rolen +56, +67
Lowell +45, +31
...
Beltre +29, +49

Braun -34, -36
Wright -34, -26
Teahen -51, -56
EE-5 -56, -29

SS
Everett +63, +80
Vizquel +42, +51
Tulo +35, +28
Escobar +35, +44
(Rollins +48 r)

Jeter -20, -30
...
Betancourt -25, -68
B Harris -28, -33
F Lopez -30, -47
Tejada -34, -25

One observation is that playing next to Betancourt for a few years really did a number on Adrian Beltre’s TZ.


#9    Rally      (see all posts) 2010/07/06 (Tue) @ 21:09

This should have a minimal effect on pitchers.  It really should not change the team defense numbers much, if at all.  There are still the same number of hits allowed, the new system is just doing a more precise job of assigning them.

On double plays, they are based on opportunities.  If a defense turns more DPs than average, then they get the WAR credit; if there are more DPs turned because the pitchers induce more GB with runners on base, the credit is theirs, in the form of fewer runs allowed.
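
The opportunity-based idea above can be illustrated with a small sketch. The formula, the function name, and every number below are assumptions for illustration only; the thread does not give the actual TZ double play calculation or its run value.

```python
def dp_runs_above_average(dp_turned, dp_opportunities, league_dp_rate, runs_per_dp):
    """Runs credited to the defense for turning more DPs than expected.

    Expected DPs scale with opportunities, so a ground ball pitcher who
    creates more opportunities raises the baseline as well as the total.
    """
    expected = dp_opportunities * league_dp_rate
    return (dp_turned - expected) * runs_per_dp

# Hypothetical numbers: two defenses convert at exactly the league rate,
# but the second plays behind pitchers who create more opportunities.
average_staff = dp_runs_above_average(100, 1000, 0.10, 0.4)
groundball_staff = dp_runs_above_average(130, 1300, 0.10, 0.4)
```

Both defenses come out at zero here even though the second turned 30 more double plays; under this scheme the extra outs show up only in the pitchers' runs allowed.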


#10    Brian Cartwright      (see all posts) 2010/07/06 (Tue) @ 21:21

Rally #6 - because I am integrating minors with MLB I am getting everything from Gameday, but I have a Retrosheet db as well and probably should calculate MLB fielding from Gameday pre 2005.

Colin, Peter and I analyzed the Gameday hit locations a year or two ago and I worked out a method for estimating home plate location that seems to match well what Peter did. I need to code the SQL in order to do each minor league park each year.


#11    Colin Wyers      (see all posts) 2010/07/06 (Tue) @ 21:31

To get fielder identities from Gameday data:

http://basql.wikidot.com/gameday-fielder-positions


#12    Rally      (see all posts) 2010/07/06 (Tue) @ 21:40

I was lazy on the home plate location, just assumed an average spot.  The further you are from home plate, the less the exact location affects the degree.  I’m interested in the angle of hits that make it to the outfield, and if the exact location is anywhere in the ranges that Peter listed in his THT article, you’ll only be off by a degree, maybe two.  Infield hits would be a different story.
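
The point about distance can be checked with a little geometry. If the assumed home plate is off by some offset in an unknown direction, the worst-case error in the computed angle for a ball at a given distance is asin(offset / distance). The sketch below uses that bound; the function name and the sample figures are illustrative assumptions, not values from Peter's article.

```python
import math

def max_angle_error_deg(offset, distance):
    """Worst-case angular error (degrees) from misplacing home plate.

    offset:   how far the assumed home plate is from the true one
    distance: how far the ball is from the true home plate
    Both must be in the same units (feet, pixels, ...).
    """
    ratio = min(1.0, offset / distance)  # clamp: error cannot exceed 90 degrees
    return math.degrees(math.asin(ratio))

# Same hypothetical 5-unit misplacement, infield depth versus outfield depth.
infield_error = max_angle_error_deg(5.0, 60.0)
outfield_error = max_angle_error_deg(5.0, 250.0)
```

The bound shrinks roughly in proportion to distance, which is why an averaged home plate location is more defensible for balls that reach the outfield than for infield hits.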

On Brandon Phillips, TZ has him really bad (-20 runs) in 2006-2007, just a bit worse than average last 2 years.  In 2007, UZR and Dewan both have him as an outstanding fielder.  In 2007, I have 100 hits to the outfield going through his zone, the highest season total for any 2B.  I wondered if it was the raw stats (plays made and not made) that rate Phillips so low, or the adjustments.

The adjustments/buckets include batter handedness, runner on first or not, ball/strike count, and ballpark.  I created a table that gives a run value just on the totals, no adjustments at all.  That is a lot easier to look at for auditing than all the adjustment buckets.

Phillips was -19.9 before the adjustments that year and -20.4 after for 2007.
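
To make the audit idea concrete, here is a sketch of comparing a bucketed baseline with an unadjusted one. It measures plays above average rather than runs, and the bucket keys, rates, and function name are all hypothetical; the thread does not spell out how TZ combines its buckets.

```python
def plays_above_average(chances, league_rate):
    """Sum of (play made - league out rate) over a fielder's chances.

    chances:     list of (bucket, made) pairs; `made` is True for an out
    league_rate: dict mapping each bucket to the league out rate in it
    """
    return sum((1.0 if made else 0.0) - league_rate[bucket]
               for bucket, made in chances)

# Hypothetical buckets keyed by (batter hand, runner on first).
chances = [(("R", False), True), (("R", False), False), (("L", True), True)]
bucket_rates = {("R", False): 0.80, ("L", True): 0.70}
adjusted = plays_above_average(chances, bucket_rates)

# Unadjusted version: collapse everything into one bucket with one rate.
flat = [("all", made) for _, made in chances]
unadjusted = plays_above_average(flat, {"all": 0.75})
```

A small gap between the two figures, as with Phillips in 2007, suggests the rating is driven by the raw plays made and not made rather than by the adjustments.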


#13    Brian Cartwright      (see all posts) 2010/07/06 (Tue) @ 22:14

and a thank you to Colin, as last year he provided me with his code for deriving the defensive starters and subs from the Baseball on a Stick (BBOS) downloaded db of Gameday.

My current defensive subs code is nearly the same, but I join with my games table to add team_id.

My starters query is now much different, primarily because I use a cleaned up version of gameday.players as input. Again, I am able to include team_id. Colin’s version has gamename and homeaway as keys for joining, but having team_id made referencing these tables easier for me.


#14    Brian Cartwright      (see all posts) 2010/07/06 (Tue) @ 22:19

Oliver shows Phillips as pretty average:
+1, +12, -6, +1, +2


#15    Guy      (see all posts) 2010/07/06 (Tue) @ 23:10

Very interesting results.  Uggla’s change is dramatic—do Ramirez and Jacobs look worse under new system?

I was expecting to see more big bumps to the best fielders, like Everett and Rolen got.  But I realize now that one reason a player can have a high rating in the old system is playing next to a good teammate.  THOSE players will decline (appropriately) in the new system.  It’s the subset of players whose high ratings were genuinely a function of their actual talent that should see their TZ improve.

Looking forward to some OF #s.....


#16    Rally      (see all posts) 2010/07/07 (Wed) @ 21:59

Jacobs looks a little worse.  Hanley actually looks better.  Not sure why that is without spending some time digging into it.

Did some correlations for players with at least 1000 innings in a season between old TZ and new.

1B .84
2B .78
3B .94
SS .89
LF .91
CF .90
RF .89
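
For anyone wanting to run the same kind of comparison, a plain Pearson correlation is enough. The sketch below uses the nine third basemen listed in comment #8 as sample input. That is a different sample from the one above (2005-2009 totals for the leaders and trailers, not single seasons of 1000+ innings), so its result should not be expected to match the .94 figure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Old and new TZ for the 3B listed in comment #8 (Inge through EE-5).
old_tz = [65, 59, 56, 45, 29, -34, -34, -51, -56]
new_tz = [64, 52, 67, 31, 49, -36, -26, -56, -29]
r_3b = pearson_r(old_tz, new_tz)
```

Because this sample contains only the extremes at the position, it will tend to overstate the agreement between the two versions.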


#17    Rally      (see all posts) 2010/07/07 (Wed) @ 22:01

Standard deviation for all positions, 1000 inning minimum:

9.9 old TZ
10.9 new

