Monday, April 30, 2007
Adjusting College Stats
Jeff and Kent look at college hitters, and do something exciting:
In short, a competition adjustment needs to happen on a plate-appearance-by-plate-appearance basis. So that’s what we did.
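To make the idea concrete, a plate-appearance-by-plate-appearance adjustment might look roughly like this. It is a minimal sketch, not necessarily the method Jeff and Kent used. It assumes OBP is the only stat, a simple additive credit per PA, and a hypothetical `pitcher_oba` input giving each opposing pitcher's OBP allowed. The names `PlateAppearance` and `competition_adjusted_obp` are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    on_base: bool        # did the batter reach base in this PA?
    pitcher_oba: float   # hypothetical input: the opposing pitcher's OBP allowed


def competition_adjusted_obp(pas, league_oba):
    """Credit or dock each PA by how much tougher or easier its pitcher was than average.

    A PA against a pitcher allowing .300 in a .350 league earns +.050;
    a PA against a pitcher allowing .400 is docked .050.
    """
    if not pas:
        raise ValueError("need at least one plate appearance")
    total = 0.0
    for pa in pas:
        outcome = 1.0 if pa.on_base else 0.0
        total += outcome + (league_oba - pa.pitcher_oba)
    return total / len(pas)


# A hitter who went 1-for-2 on reaching base, both times against weak (.450) pitchers.
weak_schedule = [PlateAppearance(True, 0.450), PlateAppearance(False, 0.450)]
print(round(competition_adjusted_obp(weak_schedule, league_oba=0.350), 3))  # 0.4
```

The additive credit keeps the idea visible: each PA gets adjusted for who was on the mound, rather than applying one blanket adjustment to a season line. An odds-ratio style adjustment would behave better near the extremes, since additive credits can push a single PA's value outside 0 to 1.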
I think it’s fairly clear that if Preston Wilson and Derek Jeter (both drafted as 18-year-olds, in the same draft year and the same round) face 16-year-old punks, we’re not going to learn much about which of these two hitters is better. Whatever holes these hitters may have can only be exploited by pitchers with the stuff to take advantage of them. As you go up the chain, the chasm in talent between the hitters and the pitchers they face shrinks.
Some people believe that there is still a chasm between AA/AAA and MLB, and they use it to support their belief that guys who can hit minor league pitching can’t hit major league pitching. But the reliability of MLEs (major league equivalencies) is strong enough to refute some (though not all) of that belief.
For people who really want to see this chasm, looking at PA-by-PA high school and college stats would be the way to go. If you can show that such a chasm does exist at those levels but doesn’t exist at the MLB level, then you have to ask to what extent it exists in between. How big is the chasm in A, AA, AAA, and the other post-college leagues? I think it’s fairly easy to say “not much”, but that’s not a real answer.
It is very important to use regressed SOS (strength of schedule) adjustments and not the actual ones, especially for small samples of data. I am assuming that Jeff does not regress the opponents’ stats before doing the adjustments, but I could be wrong. If he doesn’t, that can really screw up the data. For example:
Let’s say that you are 3 games into the MLB season and you want to adjust each team’s W-L record by its strength of schedule. After only 3 games, some teams have faced opponents that are 3-0, others have faced opponents that are 0-3, and so on. What do you think would happen if you used those 3-0 and 0-3 records to adjust each team’s W-L record? Complete havoc. The same thing happens, to a lesser degree, after 10 games, and so on.
Not to mention the fact that there might not be much spread of talent among the competition in the first place (which the regression will factor in). Let’s say that you knew that every team was almost exactly the same strength. IOW, if you did regress the competition’s stats, the regression would be close to 100%. If you used actual stats to do the SOS adjustments, even after a lot of games, you would FALSELY be adjusting everyone’s stats, since the differences in the stats of each team’s (or player’s) competition would, by definition, be random fluctuations.
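Here is a rough simulation of the equal-strength case. Everything in it is an assumption for illustration: a hypothetical 30-team league where every team is a true .500 club, 3 games played, and regression done by adding `K` games of .500 ball to each opponent’s record (a common shrinkage approach, not necessarily what Jeff does). `K = 70` is purely illustrative; for a truly equal league the right `K` would be effectively infinite, i.e. 100% regression. For simplicity, each opponent’s record includes its game(s) against the team being rated.

```python
import random


def regressed_wpct(wins, losses, k):
    """Shrink a W-L record toward .500 by adding k phantom games of .500 ball.

    The smaller the true spread of talent, the larger k should be; as k grows
    without bound the estimate goes to .500 (100% regression).
    """
    return (wins + 0.5 * k) / (wins + losses + k)


def strength_of_schedule(opponents, records, k=None):
    """Mean opponent winning percentage: raw if k is None, else regressed."""
    total = 0.0
    for opp in opponents:
        w, l = records[opp]
        total += w / (w + l) if k is None else regressed_wpct(w, l, k)
    return total / len(opponents)


def simulate_equal_league(n_teams=30, n_days=3, seed=1):
    """Every team is exactly .500 in true talent, so every game is a coin flip."""
    rng = random.Random(seed)
    records = {t: [0, 0] for t in range(n_teams)}   # team -> [wins, losses]
    opponents = {t: [] for t in range(n_teams)}     # team -> list of opponents faced
    teams = list(range(n_teams))
    for _ in range(n_days):
        rng.shuffle(teams)
        for a, b in zip(teams[::2], teams[1::2]):   # pair every team off for the day
            winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
            records[winner][0] += 1
            records[loser][1] += 1
            opponents[a].append(b)
            opponents[b].append(a)
    return records, opponents


records, opponents = simulate_equal_league()
K = 70  # hypothetical regression constant, chosen only for illustration
raw = [strength_of_schedule(opponents[t], records) for t in records]
reg = [strength_of_schedule(opponents[t], records, k=K) for t in records]

# Every team's true SOS is exactly .500, so any spread below is pure noise.
print("raw SOS range:       %.3f to %.3f" % (min(raw), max(raw)))
print("regressed SOS range: %.3f to %.3f" % (min(reg), max(reg)))
```

Because every team has played 3 games, each regressed SOS is pulled toward .500 by the same factor, 3/(3+K). Adjusting W-L records with the raw numbers would move teams around on nothing but coin flips.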