THE BOOK--Playing The Percentages In Baseball

Monday, April 30, 2007

Adjusting College Stats

By Tangotiger, 10:20 AM

Jeff and Kent look at college hitters, and do something exciting:

“In short, a competition adjustment needs to happen on a plate-appearance-by-plate-appearance basis. So that’s what we did.”

I think it’s fairly clear that if Preston Wilson and Derek Jeter (both drafted as 18-year-olds in the same draft year and round) face 16-year-old punks, we’re not going to learn much about which of these two hitters is better. Whatever holes these hitters may have can only be exploited by guys with the stuff to take advantage of them. As you go up the chain, this chasm in talent level is reduced.

Some people believe that there is still a chasm between AA/AAA and MLB, which supports their belief that guys who can hit minor league pitching can’t hit major league pitching. But the reliability of MLEs (major league equivalencies) is strong enough to refute some (though not all) of that belief.

For people who really want to see this chasm, looking at PA-by-PA high school and college stats would be the way to go. If you can show that such a chasm does exist at those levels, and it doesn’t exist at the MLB level, then you have to ask to what extent it exists in between. How big is the chasm in A, AA, AAA and the other post-college leagues? I think it’s fairly easy to say “not much”, but that’s not a real answer.


#1    MGL    2007/04/30 (Mon) @ 16:04

It is very important to use regressed strength-of-schedule (SOS) adjustments and not the actual ones, especially for small samples of data. I am assuming that Jeff does not use the regressed stats of opponents to do the adjustments, but I could be wrong. Using unregressed stats can really screw up the data. For example:

Let’s say that in MLB you are 3 games into the season and you want to adjust each team’s w/l record by its strength of schedule. Well, after only 3 games, some teams have faced teams that are 3-0, 0-3, etc. What do you think would happen if you used these 3-0 and 0-3 records to adjust each team’s w/l record? Complete havoc. Same thing to a lesser degree after 10 games, etc. Not to mention the fact that there might not even be much spread of talent in the competition in the first place (which will be “factored into” the regression). Let’s say that you knew that every team was almost exactly the same strength. IOW, if you did regress the competition’s stats, the regression would be close to 100%. If you used actual stats to do the SOS adjustments, even after a lot of games, you would FALSELY be adjusting everyone’s stats, since the differences in the stats of each player’s competition would by definition be random fluctuations.
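
To put rough numbers on the 3-game example, here is a minimal Python sketch. The function names, the "add league-average ballast" form of regression, and the ballast of 70 games are all illustrative assumptions; none of this is Jeff's method, and the ballast is not a measured constant.

```python
def regress_win_pct(wins, games, league_mean=0.5, ballast_games=70.0):
    """Shrink an observed win percentage toward the league mean.

    ballast_games is a hypothetical constant: the number of league-average
    games added to the observed record. Larger values mean more regression.
    """
    return (wins + ballast_games * league_mean) / (games + ballast_games)


def strength_of_schedule(opponent_records, ballast_games=70.0):
    """Average regressed win percentage of the opponents faced.

    opponent_records is a list of (wins, games) tuples, one per game played.
    """
    regressed = [regress_win_pct(w, g, ballast_games=ballast_games)
                 for w, g in opponent_records]
    return sum(regressed) / len(regressed)


# Three games into the season: one team has faced only a 3-0 club,
# another has faced only an 0-3 club.
raw_gap = 3 / 3 - 0 / 3
regressed_gap = (strength_of_schedule([(3, 3)] * 3)
                 - strength_of_schedule([(0, 3)] * 3))
print(raw_gap, round(regressed_gap, 3))
```

With these assumed numbers, the raw schedule gap between the two teams is a full 1.000 in opponent win percentage, while the regressed gap is about .041. If all teams were nearly equal in talent, the appropriate ballast would be very large and the regressed gap would shrink toward zero, which is the "close to 100%" regression case.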


#2    tangotiger    2007/04/30 (Mon) @ 17:23

It’s also unclear how the RUN park factors were used. A run park factor is not the same as a component park factor. Run park factors are runs per out, while component park factors are events per opportunity (like HR per PA). Typically, the component park factors are the square root of the run park factors.
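
The square-root rule of thumb can be sketched in a couple of lines of Python. The function name is hypothetical, and the rule is only the typical relationship described above, not an exact identity for any particular park or event.

```python
import math


def component_park_factor(run_park_factor):
    """Approximate a per-event park factor from a run park factor.

    Uses the rule of thumb that component factors are typically the
    square root of run factors. This is an approximation, not a law.
    """
    return math.sqrt(run_park_factor)


# A park that inflates runs by 21% would, under this rule of thumb,
# inflate a typical per-PA event rate by roughly 10%.
print(round(component_park_factor(1.21), 3))
```

So applying a run park factor of 1.21 directly to a rate stat like HR per PA would overcorrect; under the rule of thumb the per-event factor is closer to 1.10.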

And of course, just as I’ve shown that Larry Walker, Dante Bichette, and Juan Pierre did not get (nor should we have expected them to get) anything close to the same benefit from playing at Coors, you’ll have the same situation in college.


#3    MGL    2007/04/30 (Mon) @ 19:01

Plus I’m not sure if those park factors are regressed either. They really need a lot of regression, as the sample is small and the data is not very good quality either. Of course it would help if he (Boyd) knew some of the characteristics of the parks, in which case you would not want to regress everything towards 1.0.
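
As a rough illustration of regressing a park factor, including the case where known park characteristics suggest a target other than 1.0, here is a minimal Python sketch. The function name, the weighting scheme, and the ballast of 200 games are assumptions chosen for illustration only.

```python
def regress_park_factor(observed_pf, games, prior_pf=1.0, ballast_games=200.0):
    """Shrink an observed park factor toward a prior.

    prior_pf is what you would expect before seeing any data: 1.0 if you
    know nothing about the park, or something else if you know its
    characteristics (altitude, dimensions, and so on).
    ballast_games is a hypothetical constant controlling how much
    regression is applied; it is not an empirically derived value.
    """
    weight = games / (games + ballast_games)
    return prior_pf + weight * (observed_pf - prior_pf)


# A college park with an observed factor of 1.30 over only 50 games.
print(round(regress_park_factor(1.30, 50), 3))                 # prior of 1.0
print(round(regress_park_factor(1.30, 50, prior_pf=1.10), 3))  # known hitter's park
```

With these assumed inputs, the observed 1.30 regresses to 1.06 against a neutral prior and to 1.14 when the park is already known to favor hitters.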

And as I always say, even if parks don’t affect everyone equally, which they don’t, using SOME degree of a park factor for everyone is better than not using anything, especially when parks are quite different, as they can be in college.

