THE BOOK--Playing The Percentages In Baseball


Sunday, May 11, 2008

What does it mean when a batter has had a bad or good April?

By , 08:14 PM

I did basically the same thing I did with pitchers, but this time for batters. I used only BA because I used box scores to gather the April and non-April data. Later I’ll use OPS or something like that. Again, the data are from 2004-07.

I also only looked at batters who were on the same team the year before the year with the good or bad April.

For the projections, I used a basic Marcel, weighting each season 25% more than the prior season. I regressed based on 50% regression per 600 AB. And I age adjusted by adding 2 points of BA for all players younger than 28, subtracting 4 points for players older than 30 but younger than 36, and subtracting 8 points for players older than 35.
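Here is a minimal sketch of what a projection like the one just described could look like. It is not the actual code used for this post. It assumes that "50% regression per 600 AB" means adding 600 AB of league-average performance to the weighted sample, that the league mean is .260, and that players aged 28 to 30 get no age adjustment. The function names and the `(hits, at_bats)` input format are made up for illustration.

```python
def age_adjustment(age):
    """BA adjustment by age, per the rules described in the post."""
    if age < 28:
        return 0.002
    if age > 35:
        return -0.008
    if age > 30:
        return -0.004
    return 0.0  # assumption: ages 28-30 get no adjustment


def marcel_ba(seasons, age, league_ba=0.260, step=1.25, regress_ab=600):
    """Project BA from a list of (hits, at_bats), most recent season first.

    Each season counts 25% more than the one before it (step=1.25).
    league_ba and the reading of regress_ab are assumptions, not from the post.
    """
    w_hits = 0.0
    w_ab = 0.0
    for years_back, (hits, at_bats) in enumerate(seasons):
        weight = step ** -years_back  # 1, 0.8, 0.64, ...
        w_hits += weight * hits
        w_ab += weight * at_bats
    # Adding regress_ab of league-average AB gives 50% regression
    # when the weighted sample is exactly regress_ab.
    regressed = (w_hits + regress_ab * league_ba) / (w_ab + regress_ab)
    return regressed + age_adjustment(age)
```

For example, a 29-year-old with a single .300 season in 600 AB, `marcel_ba([(180, 600)], 29)`, gets regressed halfway to .260 and projects at about .280.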

As with the pitchers, I broke all batters with at least 50 AB in April into two groups: those who hit less than .200 and those who hit more than .350.
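The grouping rule above can be sketched as a small function. The name `april_group` and its parameters are just for illustration. The cutoffs and the 50 AB minimum are the ones stated above.

```python
def april_group(hits, at_bats, min_ab=50, low=0.200, high=0.350):
    """Return "good", "bad", or None for a batter's April line."""
    if at_bats < min_ab:
        return None  # not enough April AB to qualify
    ba = hits / at_bats
    if ba < low:
        return "bad"
    if ba > high:
        return "good"
    return None  # between the two cutoffs: in neither group
```

So a batter who goes 30 for 80 (.375) lands in the good group, one who goes 12 for 70 (.171) lands in the bad group, and everyone in between is left out.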


The “good” group hit .375 in April in 4404 AB, and the “bad” group hit .182 in 3478 AB.

The good group had 22,019 more AB for the rest of the season, and the bad group, 15,309.  There were 52 players in the good group and 48 in the bad group.  There could be duplicate players within each group or even across groups, as I am looking at each of 4 years separately.

The good April group had a projected .282 average going into the season. After they hit .375 in April, their projection went up to .286, after weighting the current year (the .375) 25% greater than all prior years. How did they actually hit over the rest of the season? .295. So they exceeded their updated projection by 9 points.

The bad April batters had a .272 projection going into the season, and a .268 updated projection after including their .182 in April. They hit .269 for the remainder of the season, right around what was expected.
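To make the comparison explicit, here is the arithmetic for both groups, in points of BA. The numbers are the ones quoted above. The dictionary layout and the function name are just for illustration.

```python
GROUPS = {
    "good April": {"preseason": 0.282, "updated": 0.286, "actual": 0.295},
    "bad April": {"preseason": 0.272, "updated": 0.268, "actual": 0.269},
}


def residuals_in_points(groups):
    """Rest-of-season BA minus each projection, in points (.001 = 1 point)."""
    out = {}
    for name, g in groups.items():
        out[name] = {
            "vs_updated": round((g["actual"] - g["updated"]) * 1000),
            "vs_preseason": round((g["actual"] - g["preseason"]) * 1000),
        }
    return out


for name, res in residuals_in_points(GROUPS).items():
    print(f"{name}: {res['vs_updated']:+d} vs updated, "
          f"{res['vs_preseason']:+d} vs preseason")
# good April: +9 vs updated, +13 vs preseason
# bad April: +1 vs updated, -3 vs preseason
```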

So, it looks like the batters are like the pitchers, in that better performance seems to indicate a change in true talent much more than poor performance.  In fact, poor performance does not seem to indicate a change in true talent at all, since giving the current season 25% more weight seems to be too much for this group.

It is looking more and more like, for both batters and pitchers, using a “symmetrical” weighting system, or even a symmetrical algorithm overall, may not be correct, and that we have to do different things for players who show recent upticks in performance than we do for players who show recent downticks. Which makes some sense, as I explained in the pitcher thread. It is probably more likely (much more, I think) that a player, especially a younger one, improves his true talent than that he gets worse in true talent. In fact, if we restrict the above sample to players 33 and over, the good players do NOT outperform their projections anymore. A simple Marcel works for both the good and bad ones. For players younger than 33, the good ones outperform their Marcel by 13 points! (The bad ones actually outperform by 3 points.)

Keep in mind that when splitting the players up by age, we really start getting into small samples, so sample size caveats apply.

I want to take this opportunity to remind everyone that while we are used to using “symmetrical” projection algorithms, the correct algorithm is a discrete Bayesian problem, and the symmetrical ones we use are an approximation that assumes that a player, for example, has the same chance of improving his true talent as he does of getting worse in true talent (aging aside). Not to mention that these algorithms also assume that the chance that a player is a “true” .220 hitter is the same as the chance that he is a “true” .300 hitter (assuming a mean of .260), which is also not true of course. One of these days, we will use a “discrete” non-symmetrical Bayesian algorithm for projections. For example, the general Bayesian problem that should be asked is “Given what we know of the distribution of true talent in baseball for any given age, and given what we know about the distribution of how much and how often players change their true talent, both for the good and bad, what do we estimate this player’s true talent as, given a series of sample performances over X amount of time, and given his age (at least)?”
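To give a feel for the discrete version of the problem, here is a toy sketch. It is only an illustration, not a real projection system. The five true-talent levels and the lopsided prior are made-up numbers, each AB is treated as an independent trial, and the two hard parts (how true talent changes over time, and age) are left out entirely.

```python
import math


def posterior_true_talent(hits, at_bats, talents, prior):
    """Posterior probability of each discrete true-talent BA level.

    talents: candidate true BAs; prior: their prior probabilities.
    Uses a binomial likelihood; works in logs to avoid underflow.
    """
    log_post = [
        math.log(pr) + hits * math.log(p) + (at_bats - hits) * math.log(1 - p)
        for p, pr in zip(talents, prior)
    ]
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mean_talent(talents, probs):
    """Expected true BA under a discrete distribution."""
    return sum(p * w for p, w in zip(talents, probs))


# Hypothetical, deliberately non-symmetrical prior around .260.
TALENTS = [0.220, 0.240, 0.260, 0.280, 0.300]
PRIOR = [0.05, 0.25, 0.40, 0.22, 0.08]

# A hot April: 32 hits in 85 AB (about .376).
post = posterior_true_talent(32, 85, TALENTS, PRIOR)
print(round(mean_talent(TALENTS, PRIOR), 3), round(mean_talent(TALENTS, post), 3))
```

With a prior like this, the posterior mean moves up after the hot April but stays far below .376, and how far it moves depends on the shape of the prior, not just its mean. That is the piece a symmetrical approximation cannot capture.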

As I said, using a “symmetrical” and very generic approximation, as Marcel and any Marcel-like projection system does, may not be as good as we thought.

Later I will redo the whole thing using OPS.

(9) Comments • 2008/05/13 • Sabermetrics • Forecasting • Statistical_Theory