
THE BOOK--Playing The Percentages In Baseball


Thursday, September 30, 2010

Using injuries for forecasting

By Tangotiger, 09:17 AM

Now here’s some new ground being broken:

As an example, let’s consider hitters who went to the disabled list with an injury to the lower arm (hand, wrist, or forearm). It’s widely accepted in baseball that wrist injuries have a lingering impact on a hitter’s ability to hit for power. This gives us 77 hitters to study, with 32,763 total plate appearances the following season.

Using the same method we used to look at Ichiro Suzuki yesterday, we can come up with an expected batting line for these hitters. As a group, weighted by playing time, they were expected to hit .266/.333/.427 the following year. Instead, they hit .270/.344/.439.  So we can see that these hitters as a group tended to exceed their baseline forecasts.
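As a rough sketch of the comparison quoted above: the group line is a playing-time-weighted average of each hitter's line, and the gap is just actual minus expected. The `weighted_line` helper and the per-player dictionary layout are hypothetical, and weighting every component by plate appearances is a simplification (AVG and SLG are really per at-bat). Only the two group lines come from the article.

```python
def weighted_line(players):
    """Playing-time-weighted (AVG, OBP, SLG) for a group of hitters.

    `players` is a list of dicts with hypothetical keys:
    "pa" (plate appearances) and "line" (an AVG/OBP/SLG tuple).
    Weighting all three rates by PA is an approximation.
    """
    total_pa = sum(p["pa"] for p in players)
    return tuple(
        sum(p["line"][i] * p["pa"] for p in players) / total_pa
        for i in range(3)
    )


# Group-level lines quoted in the article (77 hitters, 32,763 PA).
expected = (0.266, 0.333, 0.427)
actual = (0.270, 0.344, 0.439)

# Actual minus expected, component by component.
diff = tuple(round(a - e, 3) for a, e in zip(actual, expected))
print(diff)  # (0.004, 0.011, 0.012)
```

So the group beat its forecast by about 4 points of AVG, 11 of OBP, and 12 of SLG.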

It will be interesting to see how much is real (causation) and how much is best-fitting (correlation).  But, definitely, this is one of those untapped areas.  I have to believe that this will impact pitchers far more than non-pitchers.  This will be pretty fun to watch develop.

And, this warms my open source heart:

So what we’ve done is taken a publicly accessible injury database, created by Josh Hermsmeyer of RotoBase, and worked on proofing it and improving it for incorporation into PECOTA. (Once we’ve finished updating the database, we will be releasing it at some point during the offseason, for other researchers to use.)


#1    JEH      (see all posts) 2010/09/30 (Thu) @ 09:53

The database will be fun to look at.

I have made adjustments manually in the past (upwards for past performance for players who played through an injury and downwards for future performance for players coming off certain injuries), but it has always been a messy process because I did not have comparable information for all players.


#2    MGL      (see all posts) 2010/09/30 (Thu) @ 14:25

I have not RTFA yet, but this looks like great stuff, and yes, an area that has been untapped so far - how do past injuries affect projections?  There would be three avenues, I would think.  One, the player has been injured for a while and thus his historical stats are depressed, such that when he is healthy his performance will be better than his Marcel.  Two, he has an injury, usually an acute one, such that when he comes back he is not 100% and his performance will be worse than his Marcel, and the trick will also be to figure out the trajectory of that performance in terms of getting better and better as he recovers over time. Three, does a certain kind of injury, or an injury to a certain kind of player, cause a permanent (or very long term) decline in performance, such that his performance will be worse than his Marcel, at least for a while (until his pre-injury stats are no longer relevant)?

And I agree that for the pitcher this type of thing might be more significant, but probably more tricky to analyze as well.


#3    Tangotiger      (see all posts) 2010/09/30 (Thu) @ 14:42

Right, you need some sort of “recovery rate”, and “level of recovery” parameter.

So, say you have a pulled hamstring.  You might recover back to 100%, say, 80% of the time, and the other 20% of the time, when you don’t recover fully, you recover only to the 60% level.

And maybe you distinguish between a speedster and not, the past frequencies, etc.  You really have to set up some sort of actuarial table.
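A minimal sketch of what one row of such an actuarial table might look like, using the pulled-hamstring numbers above. The table layout, the `RECOVERY_TABLE` name, and the `expected_recovery` helper are all made up for illustration. A real table would also need the splits mentioned above (speedster or not, past frequencies, and so on).

```python
# Hypothetical table: injury -> list of (probability, recovery level) outcomes.
# The hamstring row uses the illustrative figures from the comment above.
RECOVERY_TABLE = {
    "pulled hamstring": [(0.80, 1.00), (0.20, 0.60)],
}


def expected_recovery(injury):
    """Probability-weighted recovery level for one injury type."""
    outcomes = RECOVERY_TABLE[injury]
    # The outcome probabilities for an injury should sum to 1.
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * level for p, level in outcomes)


print(round(expected_recovery("pulled hamstring"), 2))  # 0.92
```

With those inputs the average comes out to 0.80 x 1.00 + 0.20 x 0.60 = 0.92, i.e. the typical player returns at about 92% of his pre-injury level. A forecast would probably want the whole distribution and not just the mean.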


#4    MGL      (see all posts) 2010/09/30 (Thu) @ 16:18

A little off-topic, but this quote from Colin in the comments section of the article is important.  Those of you who are not that well-versed in statistical concepts should make sure you understand this.  We see questions all the time regarding, “How large of a sample is large enough to make reliable statistical inferences?”

Here is the answer from Colin:

I don’t like making a binary distinction between significant and not significant when it comes to sample size. Larger sample sizes are obviously more useful than smaller sample sizes, of course.

So you have to look at two things - the size of the sample and the magnitude of the effect. A large magnitude in a small sample can tell you something important, you just can’t treat it as being as important as an effect of that magnitude over a larger sample.

I’ll add one thing:  A small effect over a large sample can be quite significant as well.

I have a friend who knows just a little about statistical concepts. He often falls into the trap of ignoring any empirical result from a very small sample size.  For example, let’s say that in 10 observations, we get a sample mean of .8.  And let’s say that the null hypothesis expects a .5 mean.  My friend would say that the .8 is meaningless, because, “Anything can happen in 10 observations.” I have to remind him that .8 is actually 2 SD away from .5 in 10 observations, which means that it might be quite significant.  I also have to remind him that it is exactly (well, not quite) equivalent to 55% in 500 observations.
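To make the arithmetic concrete, here is a small sketch that assumes each observation is a binary (0/1) outcome, so the standard deviation of the sample mean under the null is sqrt(p(1-p)/n). The binomial assumption and the `z_score` helper are mine, not something stated in the comment.

```python
import math


def z_score(observed, null_mean, n):
    """How many standard deviations the sample mean is from the null.

    Assumes n independent 0/1 observations, so the standard deviation
    of the sample mean under the null is sqrt(p * (1 - p) / n).
    """
    sd = math.sqrt(null_mean * (1 - null_mean) / n)
    return (observed - null_mean) / sd


small = z_score(0.8, 0.5, 10)     # .8 in 10 observations
large = z_score(0.55, 0.5, 500)   # 55% in 500 observations
print(round(small, 1), round(large, 1))  # 1.9 2.2
```

Under that assumption, .8 in 10 observations is about 1.9 SD from .5, and 55% in 500 observations is about 2.2 SD, so the two are in the same neighborhood but, as noted, not quite equivalent.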

