Wednesday, October 11, 2006
Zone Rating
When I was looking for fielding data to generate my Scouting Report ballots, it took me a while to find a good source. Some places merged guys like Abreu into one line, or ignored one of his records, or lumped multi-position records into one, etc. Everything you wouldn’t want in a database, these sites had. Except SI. They had the data split the right way, and in an easy-to-use format too. If we look at their SS data:
And select all SS with “Ch” greater than 81, we have 44 records. It looks like “Ch” is chances, not in the PO+A+E sense, but actual chances in their zone of responsibility. Or, more likely, balls in their zone plus outs made outside their zone, to be true to the classic definition from STATS, the data supplier to SI.
Anyway, let’s get back to our 44 SS. Ideally I’d split them up by league, but, whatever. I’m only trying to make a general point. Someone else can do the heavy lifting. We have the ZR for each SS, the Chances, and we can calculate the population average ZR easily enough (.831 for our group of 44). All we want to do is figure out each SS’s z-score. The random standard deviation is simply sqrt(.831*(1-.831)/Ch) for each SS. Do that. Then figure out each SS’s ZR relative to the .831. The respective figures for Adam Everett are .017 and .074. Divide the second by the first to get the z-score. Everett is 4.4. To get a feel for z-scores, just remember that in a normal distribution, 68% are within a z-score of -1 to +1, and 95% are within -2 to +2. So, Everett is really out there. Do this for all the SS. Then, take the standard deviation of those z-scores. If ZR were perfectly random, the answer would be something close to 1. We get 1.37, which is significant. Of course, with such an outlier as Everett, it’s not much of a surprise. It would be more beneficial to do this with multiple years of data, or to also include 2B and 3B.
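For anyone who would rather script this than do it by hand, here is a minimal Python sketch of the z-score step. The player rows are made up for illustration, except that Everett's line is built to match the figures above: a ZR of .905 is just .831 + .074, and the 486 chances is an assumption inferred from his .017 random standard deviation, not a number taken from the SI page. The function name z_scores is also just a placeholder.

```python
import math
from statistics import pstdev

def z_scores(players, league_zr):
    """Map each player name to (ZR - league ZR) / binomial SD at his chances."""
    out = {}
    for name, zr, ch in players:
        # Random (binomial) standard deviation of ZR over `ch` chances
        random_sd = math.sqrt(league_zr * (1 - league_zr) / ch)
        out[name] = (zr - league_zr) / random_sd
    return out

# Hypothetical rows: (name, ZR, Ch). Only Everett's line is tied to the post,
# and his 486 chances is inferred from the .017 figure, not read from the data.
players = [
    ("Everett", 0.905, 486),
    ("Player B", 0.840, 300),
    ("Player C", 0.800, 150),
]

z = z_scores(players, league_zr=0.831)

# Spread of the z-scores; close to 1 would mean nothing but random variation.
# (Population SD is used here; the sample SD is an equally reasonable choice.)
spread = pstdev(list(z.values()))

print(round(z["Everett"], 2), round(spread, 2))
```

With the real 44 rows in place of the toy list, `spread` is the number to compare against 1.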
Nevertheless, what can we do with this 1.37? Lots!
Regression toward the mean is 1/1.37^2, or .53. Pretty nifty, eh?
Next, figure out the average number of Ch. A simple average is 334, but the way Andy does it in The Book, it’s 239 (*). So, at 239 chances, the regression amount is .53. Setting x/(x+Ch) equal to .53, with Ch=239, gives x=270. So, with 270 chances, you regress 50% toward the mean. 270 chances is 723 innings, or 80 games, or half a season. This is a great result!
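The arithmetic above can be sketched in a few lines of Python. The function name regression_constant is just a placeholder, and the guard for a spread of 1 or less is my own addition. Note that carrying full precision gives an x of about 273; the 270 in the text comes from rounding the regression amount to .53 first.

```python
def regression_constant(z_sd, avg_chances):
    """Return (r, x): the regression amount at avg_chances, and the x that
    satisfies x / (x + avg_chances) = r."""
    if z_sd <= 1:
        # A spread of 1 or less leaves no room for skill in this framework
        raise ValueError("z-score SD must exceed 1")
    r = 1 / z_sd ** 2                  # regression toward the mean
    x = r * avg_chances / (1 - r)      # solve x/(x+Ch) = r for x
    return r, x

r, x = regression_constant(z_sd=1.37, avg_chances=239)
print(round(r, 3), round(x, 1))
```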
Now, the heavy lifting. Repeat this for all positions, and split it up by league. Group results for IF, OF, 1B, C, P. Tell me what you get.
***
The cool part is that you can now figure out how much to regress each player’s ZR! Continuing with Everett, we regress his ZR by 35.7% toward the mean (that’s x/(x+Ch), with x=270 and his own number of chances). Since his ZR was +.074 relative to the average, this becomes +.048. That’s his number. That’s our best guess, using this data, as to the true talent of Adam Everett. He makes +.048 more outs per play than the average of these 44 SS. With about 540 plays in a full season, that’s +26 plays, or about +20 runs.
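Here is the Everett calculation as a small Python sketch. The 486 chances is an assumption: it is the figure implied by the 35.7% regression with x=270, not a number quoted from the SI data. The function name regress_to_mean is a placeholder.

```python
def regress_to_mean(diff, chances, x=270):
    """Shrink an observed ZR difference toward zero (the group mean)."""
    weight = x / (x + chances)     # share of the difference given back to the mean
    return diff * (1 - weight)

# Everett: +.074 observed, 486 chances (inferred, see the note above)
everett_true = regress_to_mean(0.074, 486)
plays_per_season = everett_true * 540   # about 540 plays in a full season

print(round(everett_true, 3), round(plays_per_season))
```

The same function works for any player: the fewer the chances, the bigger the weight, and the more of his observed difference gets pulled back to the average.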
***
(*) To figure out the average number of chances, you do 1/average(1/ch1, 1/ch2, ..., 1/chN), which is the harmonic mean. If you do this in Excel, create a column called 1/Ch, and fill it with 1 divided by each player’s Ch. Then, just do
=1/average(myCol:myCol)
with myCol simply being the Column in Excel with the formula, like Q or something.
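Outside of Excel, the same average takes a couple of lines of Python. The chance counts below are invented for illustration; the point is only that the 1/average(1/Ch) recipe matches the standard library's harmonic_mean, and that it lands below the simple average.

```python
from statistics import harmonic_mean

# Hypothetical chance counts, not the real SI data
chances = [600, 400, 200, 100]

# The footnote's recipe: 1 / average of the 1/Ch column
manual = 1 / (sum(1 / ch for ch in chances) / len(chances))

# Same thing via the standard library
builtin = harmonic_mean(chances)

simple = sum(chances) / len(chances)
print(round(manual, 1), round(builtin, 1), simple)
```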