THE BOOK--Playing The Percentages In Baseball


Tuesday, April 24, 2007

TotalZone, a new fielding measure

By Tangotiger, 10:30 AM

Chone rolls up his sleeves:
http://mvn.com/mlb-stats/2007/04/23/totalzone-a-new-defensive-measure/


#1    uyyu      (see all posts) 2007/04/25 (Wed) @ 00:28

It would be great to do this type of research...but not having Microsoft Access makes it impossible.  “Baseball Hacks” keeps collecting dust by the minute...:/


#2    Rally      (see all posts) 2007/04/25 (Wed) @ 09:04

You don’t need Access; Baseball Hacks actually has more examples for MySQL than for Access.

I have Access 2003 on my desktop (allowed to have one copy through my work) and Access 97 on my laptop.  Once you convert to 2003, you can’t share with 97, but I have yet to find a difference in the types of queries I can run.

You should be able to get a legit copy of 97 pretty cheap, and even cheaper if you go the pirate route, which I would never do or condone since it is a mortal sin to withhold potential profits from Bill Gates.


#3    Chris Miller      (see all posts) 2007/04/25 (Wed) @ 12:53

MySQL works well, and OpenOffice has a database program that’s similar to Access.  I use Access as a front end to SQL Server for Retrosheet.  I can run SQL queries, and it does all the formatting auto-magically; the results can be easily copied or exported into Excel.


#4    uyyu      (see all posts) 2007/04/25 (Wed) @ 20:51

MySQL isn’t as easy as Access.  I can’t even get the database loaded into it.  If any of you can help, I’d appreciate it.


#5    Chris Miller      (see all posts) 2007/04/26 (Thu) @ 13:15

Check this out:
http://www.modwest.com/help/kb6-253.html


#6    tangotiger      (see all posts) 2007/12/14 (Fri) @ 11:26

Rally offers more data:
http://lanaheimangelfan.blogspot.com/2007/12/2007-total-zone-shortstops.html


#7    Tangotiger      (see all posts) 2008/01/10 (Thu) @ 11:40

Rally explains the system:
http://www.hardballtimes.com/main/article/measuring-defense-for-players-back-to-1956/

I occasionally don’t know where a ball was hit, or whether it was on the ground or in the air

This is exactly why I didn’t bother looking at whether the ball was a GB or FB, since the data was incomplete.  Now, Rally offers a way to try to estimate those numbers (I was also trying to stay away from estimating anything, but I would definitely consider including that in mine):

For example, Rod Carew in his career made 4,288 outs that meet my definition of a play made by one of the seven fielders. Of these outs, 28 percent were made by the second baseman, 18 percent by the shortstop, and 3.5 percent by the right fielder. Thus, whenever Carew gets a hit but I don’t know where, I charge 0.28 hits to the second baseman, 0.18 hits to the shortstop, and 0.035 hits to the right fielder. I repeat this routine for each such case.
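
The allocation rule in the quote above can be sketched in a few lines. This is only an illustration, not Rally's actual code: the function name `charge_unknown_hit` is made up here, and only the three out shares quoted for Carew are included (the other four fielders' shares are not given in the text).

```python
# Out shares quoted for Rod Carew (fraction of his fielded outs by position).
# Only the three positions named in the quote are listed.
CAREW_OUT_SHARES = {"2B": 0.28, "SS": 0.18, "RF": 0.035}


def charge_unknown_hit(out_shares, hits=1.0):
    """Split hits of unknown location across fielders in proportion
    to the batter's career out share at each position (hypothetical helper)."""
    return {pos: hits * share for pos, share in out_shares.items()}


charges = charge_unknown_hit(CAREW_OUT_SHARES)
for pos, value in charges.items():
    print(f"{pos}: {value:.3f} hits charged")
```

Each unlocated hit is spread fractionally, so over many such hits the totals per fielder accumulate in proportion to where the batter usually makes his outs.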

However, there’s no reason that the hit and out distributions will be the same.  In fact, they can’t be.  After all, the out rate on FB is higher than the out rate on GB.

So, all we need to do is to make a mapping of hit rate “ownership” based on the out rate.  And, since we have both pieces of data in the 2003-2007 data, we can try to figure it out.

If, for example, the average hitter gets 12.5% of his outs recorded by the SS and 10.5% of his hits go through the SS zone of responsibility, then we would use these adjustment numbers.

In the Carew case above, 18% of his outs were recorded by the SS.  I would therefore presume that 16% of his hits would be the responsibility of the SS.

I doubt it makes much difference, but I think it makes it a bit better.
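
As a rough sketch of that out-share to hit-share mapping: the simplest assumption is a straight ratio using the league SS figures from the example (12.5% of outs, 10.5% of hits). The function name and the linear form are assumptions made here for illustration; a mapping actually fitted to the 2003-2007 data could look different.

```python
# League-average SS shares taken from the example in the text.
LEAGUE_SS_OUT_SHARE = 0.125
LEAGUE_SS_HIT_SHARE = 0.105


def estimate_hit_share(batter_out_share, league_out_share, league_hit_share):
    """Scale a batter's out share by the league hit/out ratio.
    ASSUMPTION: a simple proportional mapping, not a fitted one."""
    return batter_out_share * league_hit_share / league_out_share


# Carew: 18% of his outs were recorded by the SS.
carew_ss_hit_share = estimate_hit_share(
    0.18, LEAGUE_SS_OUT_SHARE, LEAGUE_SS_HIT_SHARE
)
print(f"Estimated SS hit share: {carew_ss_hit_share:.3f}")
```

The straight ratio lands at about 15%, in the same neighborhood as the 16% used above, which suggests the exact form of the mapping matters less than making some adjustment at all.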


#8    Tangotiger      (see all posts) 2008/01/10 (Thu) @ 15:59

Rally, I saw your comment at BTF: the untangling of Jim Palmer and his fielders is easily solved with the WOWY approach.  I was going to take a look at this with Cal Ripken, at some point soon.

For example, in the annual, I looked at each SS and compared how their pitchers did with and without that SS.  But, in the case of Ripken, a lot of his pitchers also had Mark Belanger.  So, Ripken may be unfairly compared.  I have to keep going iteration by iteration, until I get some stability.

This is exactly like the SoS (strength of schedule) issue in college.


#9    Rally      (see all posts) 2008/01/11 (Fri) @ 10:32

That would be difficult to put into a database that’s trying to calculate the values for all players.  At least I think it would be, and wouldn’t know how to do it.

What I was thinking to put in as an adjustment is to take each pitcher’s hits in play above/below team level, on a career basis, and regress the values.  Then I adjust the hit values based on how difficult the pitcher has been to hit over his career.


#10    Tangotiger      (see all posts) 2008/01/11 (Fri) @ 10:56

Not difficult, just very involved.  I’ve got it partially set up, but man, it takes a long while to run.  One day, I don’t know when, I’ll eventually release all my code.  Maybe a bit at a time.

***

I definitely prefer the WOWY approach, since I don’t want the player compared to himself.  But, I can see a day where I’ll apply adjustments as you are doing to finally get the big-bang overall results.  The problem is that once you start applying adjustments, I lose 90% of the readers.  “Oh, you did some adjustment where Vizquel gets compared to a higher baseline because you said this and that and… whatever.” I don’t blame readers for this.

As much as possible, we should strive for factual data.  Results with adjustments should be presented in concert with, not in lieu of.  It’s for this reason UZR gets as much resistance as it does, and why Dewan’s plus/minus, an inferior system in that it considers less data, is more palatable.

Telling people how Boggs hit at Fenway and elsewhere is one thing.  Presenting an “adjusted batting average” is quite another… unless you ALSO present his Fenway and elsewhere averages.


#11    studes      (see all posts) 2008/01/11 (Fri) @ 11:07

As much as possible, we should strive for factual data.  Results with adjustments should be presented in concert with, not in lieu of.  It’s for this reason UZR gets as much resistance as it does, and why Dewan’s plus/minus, an inferior system in that it considers less data, is more palatable.

Very well put, Tango.  I think of two camps of people: those who are exploring “truth” and don’t care about convincing a lot of people (like MGL) and those who are trying to get the truth out to more people (like John Dewan).  The latter requires a simpler approach, like what I tried to do with my hitting evaluation article yesterday.


#12    Rally      (see all posts) 2008/01/11 (Fri) @ 14:20

Good point - and I enjoyed that article, Studes; it’s the kind of article I’d like to point beginners towards.

Spending as much time here and BTF as I do makes it hard to appreciate the level of baseball understanding out there, even among otherwise intelligent people who really like baseball, but have a very limited grasp of how to properly value players.  That point was driven home last night listening to people moan about Jim Rice not getting into the HOF.  I didn’t even bring up Tim Raines.  I saw the mountain I’d need to climb, the time I had to do it in, and just had a beer instead. 

I’m kind of torn between the two camps.  The main reason I did this was to satisfy my own curiosity, but once I had a system I wanted to share it.


#13    Tangotiger      (see all posts) 2008/01/11 (Fri) @ 14:33

I’m in the same boat as you Rally.

I really enjoy looking at the Fielding Bible and the BJ Handbook, because the counting data is logically organized, and allows me to manipulate it to meet my needs.

Presenting final numbers, which itself also has value, is in a sense, the final step.  There’s not much more you can do with it, other than nod or shake your head.

So, it’s not really a negative criticism.  I do it a lot, in terms of just presenting the final numbers.  Just something to be aware of.



#15    Tangotiger      (see all posts) 2008/03/26 (Wed) @ 16:52

MGL gets around 1 SD = 10 runs per 162 G, more or less.

Taking guys with 41 to 80 games, 81 to 120 games, and 121+ games, here’s the average number of games, and the standard deviations of runs per 162 G:

Avg G    SD (runs per 162 G)
  57     17.1
 101     14.3
 144     13.0

Doing some fancy math, the “True talent” spread is roughly 11 runs per 162 G.

***

Another way to get here is to look at the spread of runs allowed per game.  That’s roughly 1 SD = 0.59 runs allowed per game, or roughly 1 SD = .015 runs per PA, or roughly 1 SD = .013 wOBA.

With 6300 PA per team, that makes the random spread as 1 SD = .0064 random wOBA, implying a 1 SD = .011 true wOBA.
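
The step from observed spread to true spread is a subtraction in quadrature. A minimal sketch, using only the figures quoted above and assuming the random and true components are independent so that their variances add (the helper name `true_sd` is made up for this example):

```python
import math

OBSERVED_WOBA_SD = 0.013   # observed team-level spread, from the text
RANDOM_WOBA_SD = 0.0064    # random spread over 6300 PA, from the text
TEAM_PA = 6300


def true_sd(observed_sd, random_sd):
    """true^2 = observed^2 - random^2, assuming independent components."""
    return math.sqrt(observed_sd ** 2 - random_sd ** 2)


true_woba_sd = true_sd(OBSERVED_WOBA_SD, RANDOM_WOBA_SD)
print(f"True wOBA spread: {true_woba_sd:.4f}")

# Per-PA SD implied by the quoted random spread, if PAs are independent:
# random_sd = per_pa_sd / sqrt(PA)
implied_per_pa_sd = RANDOM_WOBA_SD * math.sqrt(TEAM_PA)
print(f"Implied per-PA wOBA SD: {implied_per_pa_sd:.3f}")
```

The result rounds to the .011 quoted above, and the implied per-PA spread of roughly 0.5 shows where a figure like .0064 would come from over a team's season.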

That true wOBA spread would be between the pitchers, fielders, and parks.  Let’s say that makes it .009 for pitchers, .005 for fielders, and .004 for parks.  That is, the square of each of those, added, gives you the square of .011.

(I have the spread for pitchers higher than for the fielding team, because 30% of the time, the fielders aren’t involved in a play.)

All just reasonable guesses at this point.
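
A quick check that the guessed split is consistent with the .011 total, assuming the three sources are independent so their variances add:

```python
import math

# Guessed true wOBA spreads from the text.
components = {"pitchers": 0.009, "fielders": 0.005, "parks": 0.004}

# Independent sources combine as the square root of the sum of squares.
combined_sd = math.sqrt(sum(sd ** 2 for sd in components.values()))
print(f"Combined true wOBA spread: {combined_sd:.4f}")
```

Other splits would satisfy the same constraint; this only confirms that the one proposed adds up.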

So, .005 per PA times 39 PA times 162 G divided by 1.15 gives you 1 SD = 27 true fielding runs per 162 G.

And if each of the 8 fielding positions has a spread of 1 SD = 9.5 runs, then a team of such fielders will be 9.5 squared times 8, then square root = 27 runs.
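
The last two steps can be reproduced as below. This sketch assumes the eight positions are independent and share the same spread, which is the same simplification the text makes; the constant names are just labels chosen here.

```python
import math

FIELDING_WOBA_SD = 0.005      # true fielding spread per PA, from the text
PA_PER_GAME = 39
GAMES = 162
WOBA_TO_RUNS_DIVISOR = 1.15   # divisor used in the text to convert wOBA to runs
POSITIONS = 8

# Team-level true fielding spread in runs per 162 G.
team_runs_sd = FIELDING_WOBA_SD * PA_PER_GAME * GAMES / WOBA_TO_RUNS_DIVISOR
print(f"Team fielding spread: {team_runs_sd:.1f} runs")

# Per-position spread implied by the team figure (equal, independent positions).
per_position_sd = team_runs_sd / math.sqrt(POSITIONS)
print(f"Implied per-position spread: {per_position_sd:.1f} runs")

# Going the other way: eight positions at 9.5 runs each.
team_from_positions = math.sqrt(POSITIONS * 9.5 ** 2)
print(f"Team spread from 9.5-run positions: {team_from_positions:.1f} runs")
```

Both directions land at about 27 runs for the team, with the per-position figure falling between 9.5 and 10 runs.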

***

As you can see, MGL’s UZR of 1 SD = 11 runs conforms very closely to my off-the-cuff expectation of 1 SD = 9.5 runs.

So, as long as your spread is 1 SD = 10 runs per 162 G of true talent or so, I’m happy.

That means that your spread needs to be *higher* than that, for a sample.


#16    Rally      (see all posts) 2008/03/26 (Wed) @ 19:40

If I take the top seasons by total chances (I don’t have a games measure):

Using average chances ~= 500 the SD is 8.5 runs (about 3500 player-seasons)

With average chances ~=600 (1200 players) it’s 9.1

Zone rating is 9.3 at average chances = 450 (800 players).  I would call them very similar, as players will have more TZ chances in a season than ZR chances (since the former counts all balls in play).

I’m happy with the spread, as it errs on the conservative side.  If I got spreads that look like Palmer’s fielding runs (Nap Lajoie 45-50 runs some years) I probably would have given up.  It’s not an accurate enough system to put extreme numbers out there and expect people to take them seriously.


#17    tangotiger      (see all posts) 2008/03/26 (Wed) @ 19:51

Definitely on the low side, agreed.  If the true talent should be 10 runs, then the observed should be around 12 or so, as UZR shows.

Pretty good though, and you are right, if you are going to err, err on the low side, not the high side.

