Sunday, May 11, 2008
What does it mean when a batter has had a bad or good April?
I did basically the same thing I did with pitchers, but this time for batters. I used only BA because I used box scores to gather the April and non-April data. Later I’ll use OPS or something like that. Again, the data are from 04-07.
I also only looked at batters who were on the same team the year before the year with the good or bad April.
For the projections, I used a basic Marcel, weighting each season 25% more than the prior season. I regressed toward the mean, using 600 AB as the 50% regression point. And I age adjusted by adding 2 points of BA for all players younger than 28, subtracting 4 points for players older than 30 but younger than 36, and subtracting 8 points for players older than 35.
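Here is a minimal sketch of that kind of projection in Python. It is not my actual code, and several details are assumptions made for illustration: the function names, the .260 league mean, and the choice to drive the regression with raw (unweighted) total AB.

```python
def age_adjustment(age):
    """BA adjustment from the age rules described in the text."""
    if age < 28:
        return 0.002      # younger than 28: add 2 points
    if 30 < age < 36:
        return -0.004     # older than 30 but younger than 36: subtract 4 points
    if age > 35:
        return -0.008     # older than 35: subtract 8 points
    return 0.0            # ages 28 to 30: no adjustment


def marcel_ba(seasons, age, league_ba=0.260, step=1.25, regress_ab=600):
    """Project BA from a list of (hits, at_bats) seasons, oldest first.

    league_ba is an assumed league mean; regress_ab is the AB total at
    which the projection is regressed 50% toward that mean.
    """
    weighted_hits = 0.0
    weighted_ab = 0.0
    total_ab = 0
    for i, (hits, at_bats) in enumerate(seasons):
        weight = step ** i            # each season counts 25% more than the one before
        weighted_hits += weight * hits
        weighted_ab += weight * at_bats
        total_ab += at_bats
    weighted_ba = weighted_hits / weighted_ab
    reliability = total_ab / (total_ab + regress_ab)   # 0.5 at 600 AB
    regressed = reliability * weighted_ba + (1.0 - reliability) * league_ba
    return regressed + age_adjustment(age)
```

For example, a 29-year-old with a single 600 AB season at .300 comes out at .280: halfway between .300 and the assumed .260 mean, with no age adjustment.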
As with the pitchers, I broke all batters with at least 50 AB in April into two groups: those who hit less than .200 and those who hit more than .350.
The “good” group hit .375 in April in 4,404 AB, and the bad group hit .182 in 3,478 AB.
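The grouping rule is simple enough to write down directly. This is only a sketch, and the function name and return labels are made up for illustration:

```python
def classify_april(hits, at_bats, min_ab=50, low=0.200, high=0.350):
    """Return 'good', 'bad', or None for one batter's April line."""
    if at_bats < min_ab:
        return None               # not enough April AB to qualify
    ba = hits / at_bats
    if ba > high:
        return "good"             # hit more than .350
    if ba < low:
        return "bad"              # hit less than .200
    return None                   # everyone in between is left out
```

So a batter who goes 30 for 80 (.375) lands in the good group, one who goes 14 for 80 (.175) lands in the bad group, and a .250 hitter is in neither.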
The good group had 22,019 more AB for the rest of the season, and the bad group, 15,309. There were 52 players in the good group and 48 in the bad group. There could be duplicate players within each group or even across groups, as I am looking at each of 4 years separately.
The good April group had a projected .282 average going into the season. After they hit .375 in April, their rest-of-season projection went up to .286, after weighting the current year (the .375) 25% greater than all prior years. How did they actually hit over the rest of the season? .295. So they exceeded their updated projection by 9 points.
The bad April batters had a .272 projection going into the season, and .268 updated including their .182 in April. They hit .269 for the remainder of the season, right around expected.
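To make the comparison explicit, here is the arithmetic for both groups, using only the numbers quoted above. The helper name is just for illustration.

```python
def points_vs_projection(actual, projected):
    """Difference between actual and projected BA, in points (thousandths)."""
    return round((actual - projected) * 1000)


# (pre-season projection, updated projection, actual rest of season)
groups = {
    "good April": (0.282, 0.286, 0.295),
    "bad April": (0.272, 0.268, 0.269),
}

for name, (preseason, updated, actual) in groups.items():
    print(
        name,
        "vs updated:", points_vs_projection(actual, updated),
        "vs pre-season:", points_vs_projection(actual, preseason),
    )
```

The good group beats its updated projection by 9 points and its pre-season projection by 13. The bad group beats its updated projection by 1 point and falls 3 points short of its pre-season projection.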
So, it looks like the batters are like the pitchers, in that better performance seems to indicate a change in true talent much more than poor performance. In fact, poor performance does not seem to indicate a change in true talent at all, since giving the current season 25% more weight seems to be too much for this group.
It is looking more and more that for both batters and pitchers, using a “symmetrical” weighting system, or even an entirely symmetrical algorithm, may not be correct, and that we have to do different things for players who show recent upticks in performance than we do for players who show recent downticks. Which makes some sense, as I explained in the pitcher thread. It is probably more likely (much more, I think) that a player, especially a younger one, improves his true talent than that he gets worse in true talent. In fact, if we restrict the above sample to players 33 and over, the good players do NOT outperform their projections anymore. A simple Marcel works for both the good and bad ones. For players younger than 33, the good ones outperform their Marcel by 13 points! (The bad ones actually outperform by 3 points.)
Keep in mind that when splitting the players up by age, we really start getting into small samples, so sample size caveats apply.
I want to take this opportunity to remind everyone that while we are used to using “symmetrical” projection algorithms, the correct algorithm is a discrete Bayesian problem, and the symmetrical ones we use are an approximation that assumes that a player, for example, has the same chance of improving his true talent as he does of getting worse in true talent (aging aside). Not to mention that these algorithms also assume that the chance that a player is a “true” .220 hitter is the same as the chance that he is a “true” .300 hitter (assuming a mean of .260), which is also not true of course. One of these days, we will use a “discrete” non-symmetrical Bayesian algorithm for projections. For example, the general Bayesian question that should be asked is, “Given what we know of the distribution of true talent in baseball for any given age, and given what we know about the distribution of how much and how often players change their true talent, both for the good and bad, what do we estimate this player’s true talent as, given a series of sample performances over X amount of time, and given his age (at least)?”
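To give a flavor of what a discrete Bayesian estimate looks like, here is a toy sketch in Python. It covers only the simplest piece of the question: a fixed true talent, a prior over a grid of possible talent levels, and one sample of hits and AB. The grid, the bell-shaped prior centered on .260, and the 30-for-80 April line are all hypothetical, and nothing here models changes in true talent or age, which is where the non-symmetrical part would come in.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def posterior(hits, at_bats, talents, prior):
    """Posterior over a grid of true BAs after observing hits in at_bats.

    Uses a binomial likelihood. The binomial coefficient is the same for
    every talent level, so it cancels when we normalize.
    """
    log_post = [
        math.log(p) + hits * math.log(t) + (at_bats - hits) * math.log(1.0 - t)
        for t, p in zip(talents, prior)
    ]
    peak = max(log_post)              # subtract the max for numerical stability
    return normalize([math.exp(lp - peak) for lp in log_post])


def mean(talents, probs):
    return sum(t * p for t, p in zip(talents, probs))


# Hypothetical grid of true talents from .200 to .360 in 5-point steps.
talents = [0.200 + 0.005 * i for i in range(33)]
# Hypothetical prior: bell-shaped around .260. Any shape could be used here,
# including a skewed one; that is the point of doing it discretely.
prior = normalize([math.exp(-0.5 * ((t - 0.260) / 0.025) ** 2) for t in talents])

post = posterior(30, 80, talents, prior)   # a made-up .375 April
prior_mean = mean(talents, prior)
post_mean = mean(talents, post)
print(round(prior_mean, 3), round(post_mean, 3))
```

The posterior mean moves up from the prior mean toward .375 but stays well short of it, which is the discrete version of regression toward the mean. Swapping in a lopsided prior, or adding a step where talent can change between samples, is what would make the estimate non-symmetrical.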
As I said, using a “symmetrical” and very generic approximation, as Marcel and any Marcel-like projection system does, may not be as good as we thought.
Later I will redo the whole thing using OPS.
I redid the above study with OPS, and for 8 years, 00-07.
The bad April players, 95 of them, had a collective .490 OPS in April and posted a .744 OPS for the rest of the year. Their projected OPS before the season was .764 and after April, it was updated to .743. So again, having a dismal April seems to have no predictive value other than as it changes the Marcel.
I used 2X as the weighting for the current year as compared to the pre-season weighted OPS (that is less than weighting the current year 2X the previous year). I also did a fairly crude age adjustment, as with the BA study.
I used 500 PA as the 50% regression point.
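One way to read that in-season update is sketched below. This is an interpretation, not my exact procedure: the function name, the .750 league OPS, the idea that the pre-season weighted OPS carries its own PA total, and the use of unweighted PA for the regression are all assumptions.

```python
def updated_ops(pre_ops, pre_pa, april_ops, april_pa,
                league_ops=0.750, current_weight=2.0, regress_pa=500):
    """Blend a pre-season weighted OPS with April, then regress to the mean.

    pre_pa is the PA behind the pre-season weighted OPS (an assumed input).
    April PA count current_weight times as much as pre-season PA.
    """
    april_weighted_pa = current_weight * april_pa
    blended = (pre_ops * pre_pa + april_ops * april_weighted_pa) / (
        pre_pa + april_weighted_pa
    )
    sample_pa = pre_pa + april_pa
    reliability = sample_pa / (sample_pa + regress_pa)   # 0.5 at 500 PA
    return reliability * blended + (1.0 - reliability) * league_ops
```

With no April PA at all, a player with an .800 pre-season OPS over 500 PA comes out at .775, halfway to the assumed league mean. Adding a hot April pulls the number up and a cold one pulls it down, with the April PA counting double.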
The good April players once again showed a completely different pattern. They were a collective 1.159 in April, and then performed at a .928 for the rest of the season.
Their pre-season projection was .877 and then .900 after the torrid April. So they outperformed their projections by 28 points in OPS!
Who were these dastardly over-achievers, and in what year did this breakout occur?
For some reason, the hot April guys averaged 30.2 years, while the cold April guys were 30.3, both a little on the old side. (BTW, I screwed the ages up in the last, BA, study, and the pitcher study. I had each group too old, as the ages I was using were everyone’s ages in 2007, not the year they were hot or cold in April.) We might expect the cold April guys to be a little old, and losing their skills, but the hot ones? I REALLY hope there is no suggestion that these guys started juicing in the hot April year. Let’s look at the players (I have not looked at the names yet) to see if there is any indication that they are/were.
Wow, it is an ugly list! It looks like a portion of the Mitchell report! I am going to “star” each player either identified as a PED user or suspected of being one, even if it is just through rumor or innuendo (with my apologies to their mothers and children). The number in parentheses is how many times they show up on the list (number of years, as you can only be on the list once per year).
Of course once you are on the list once, you tend to be on it the next year, as Marcel still does not “know” that your true talent level has changed so radically (28 points or so), even though you have one full year of high level performance.
E. Alfonzo, 2000, 27
Alou (2), 2004, 2006, 38, 40
Bagwell *, 2000, 32
Barmes, 2005, 26
Berkman, 2006, 30
Bonds * (6), 2000-2004, 2007, 36-40, 43
Branyan, 2005, 30
Broussard, 2006, 30
Buck, 2007, 27
Cabrera, 2007, 24
Caminiti *, 2000, 37
Casey, 2004, 30
Catalanotto, 2006, 32
Chavez, 2006, 29
Daubach, 2000, 28
Delgado *, 2000, 28
Drew, 2000, 25
Dunn (2), 2004, 2005, 25, 26
Dye, 2000, 26
Edmonds * (3), 2001-2003, 31-33
Ensberg, 2006, 31
Erstad, 2000, 26
Fullmer, 2003, 28
Giambi * (2), 2001, 2006, 30, 35
Glaus *, 2000, 24
Gomes, 2006, 26
Luis Gonzalez *, 2001, 34
V Guerrero (3), 2000, 2002, 2007, 24, 26, 31
Hafner, 2006, 29
Hall, 2006, 27
Hawpe, 2006, 27
Helton (4), 2000, 2003-2004, 2006, 27, 30, 31, 33
Hunter, 2002, 27
Jenkins *, 2001, 27
Jeter, 2006, 32
Charles Johnson, 2004, 33
Chipper Jones (2), 2001, 2005, 29, 33
D Lee, 2005, 30
Lowell, 2002, 28
V Martinez, 2006, 28
McGwire *, 2000, 37
Mientkiewicz, 2001, 27
Piazza (3), 2000, 2001, 32, 33, 34
Posada, 2004, 33
Pujols (2), 2003, 2006, 23, 26
H Ramirez, 2007, 24
Manny (2), 2002, 2004, 30, 32
Rios *, 2006, 25
Brian Roberts *, 2005, 28
A-Rod * (2), 2003, 2007, 28, 32
Rowand, 2007, 30
Sheff *, 2000, 32
Shelton, 2006, 26
Sosa *, 2002, 34
Swisher, 2006, 26
Tatis *, 2000, 25
Tejada * (2), 2005, 2006, 29, 30
Thome (2), 2004, 2007, 34, 37
Upton, 2007, 23
Vander Wal, 2001, 35
Walker, 2001, 35
V Wells, 2006, 28
C Wilson, 2004, 28
That is 17 of 52 players I have starred, or 33%, although some may disagree with the inclusion of A-Rod, Edmonds, Delgado, or L-Gon. I am only including A-Rod because Canseco has somewhat implicated him and so far Canseco has not been proven wrong about anyone.
Of the “bad April guys,” Beltre, Cameron, Delgado, and Tejada are known or suspected users, or 5 out of 89, or 6%.
Just for the heck of it, let’s remove the 17 suspected steroid users from the database and see what happens…
Wow! With the same weighting, the projected OPS of the now 60 player seasons, after the hot April, is .895 and the actual is .889!
Before the difference was 40 points between projected and actual, and now, it is only 4 points! Wow! Pretty scary.
(For the bad April guys, with the PED players removed, we have projected .738 and actual .736.)
If I only remove those names who either admitted or tested positive (Bonds, Caminiti, Giambi, Sheff, Sosa, Tatis, Tejada, Roberts, Jenkins, McGwire, Glaus, and Rios), we get .891 projected and .906 actual, or a 14 point difference, down from 28 points.
BTW, thinking about guys like Thomas and Edmonds and other older players who have been released after bad starts, let’s see if a terrible start by an older player has any more significance than one by a younger player. Of course we are just looking at stats. It is not inconceivable that a team and its scouts, coaches, and manager can “see” if an older player has lost his skills or is just having a fluke bad month.
Anyway, the players older than 33 (we have 24 of them), after posting a .497 OPS in April, had a projected rest-of-season OPS of .740 and an actual one of .737. A little less than expected, but not really much difference.
Of course, these are the guys that WERE allowed to play, so who knows about the guys who were NOT?
For the younger (less than 30) players (N=44) with a bad start, they were projected at .737 and hit .725.
That is a little weird. One reason for that is probably that I am regressing towards a league average player, regardless of age. I should be regressing towards the mean for that age, which for these players age 26.4 is lower than an average player.
If we split the “good April” guys into under 30 and over 29, while removing all the steroid guys, we have projected .872 and actual .873 for the younger ones (N=33), and .901 and .910 for the older ones (N=27), which is fairly interesting.
Anyway, all told, if we remove known and suspected PED players from our sample, I don’t see much evidence that a hot or cold April has much if any predictive value for any class of players that I have looked at.