THE BOOK--Playing The Percentages In Baseball


Monday, October 27, 2008

Do first_half/second_half splits mean anything?

By , 08:11 PM

I thought the results were important enough to warrant their own thread, rather than continuing the last one on first half, second half splits.  If you didn’t follow the discussion in that thread, here is the link:

http://www.insidethebook.com/ee/index.php/site/article/first_half_second_half_splits/

To summarize the discussion, a few people pointed out some unusually large first half, second half splits in performance for some players.  The question is whether, for players in general, those splits “mean” anything, which is the same thing as asking whether they have any predictive value, which in turn is the same thing as asking whether they correlate to any degree from one year to another.  For example, we find that platoon splits for RHB have very little predictive value.  No matter what a RHB’s platoon splits are in any given time period, they will tend to revert to near the league average for all RHB in any other time period.  For LHB, there is some predictive value - the larger the sample of data we have, the more predictive those sample results are.

For RHB (since there is still some predictive value, just very little), we might need 10 years of split data before it “tells us anything” about that player’s true talent platoon ratio or difference.  For LHB, it might be 2 or 3 years of data.

Anyway, I was skeptical that any sample of first half and second half splits means anything, i.e., has any predictive value.  Of course, even if there is only a tiny amount of predictive value in a small sample, a large enough sample eventually gives us tremendous predictive value.  But in baseball, for the data to have any practical significance, we really only get to use one year’s worth at the least and maybe 5 or 10 years’ worth at the most.  If we have to wait until we get 15 or 20 years of data for it to have much predictive value, that is not particularly interesting, to me at least.
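To put rough numbers on the “large enough sample” point, here is a minimal sketch using the Spearman-Brown formula. That formula is brought in purely for illustration and is not something the study itself uses. It assumes each season is an equally noisy, independent measurement of the same underlying split talent, and the one-season correlation of 0.05 is a made-up figure, not a result.

```python
def reliability_after_n_seasons(r_one: float, n: float) -> float:
    """Spearman-Brown: expected correlation between two n-season samples,
    given the correlation r_one between two single-season samples.
    Assumes every season measures the same talent with the same noise."""
    return n * r_one / (1 + (n - 1) * r_one)


def seasons_needed(r_one: float, target: float) -> float:
    """Invert the formula: seasons required to reach a target correlation."""
    return target * (1 - r_one) / (r_one * (1 - target))


if __name__ == "__main__":
    r_one = 0.05  # hypothetical one-season correlation, for illustration only
    for n in (1, 5, 10, 20):
        print(n, round(reliability_after_n_seasons(r_one, n), 3))
    print("seasons to reach r = 0.5:", seasons_needed(r_one, 0.5))
```

Under those assumptions, a stat with a very low one-season correlation does eventually become predictive, but only after more seasons than most players ever give us.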

Anyway, one way to see how much predictive value there is in a certain amount of data is to run a regression from one time period to another.  If the correlation is really low, then there is little or no predictive value to that particular stat for that amount of data (the number of opportunities underlying each element in the regression).  Hopefully we have enough data (both data points in the regression and a decent sample size for the underlying number of opportunities for each data point), such that the uncertainty in the resultant “r” is fairly low (a small standard error).
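The idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not the actual methodology: the function names are hypothetical, the six player splits are invented numbers, and the interval for “r” uses the standard Fisher z approximation, which assumes roughly bivariate normal data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired samples, e.g. each player's
    second-half-minus-first-half split in year 1 (xs) and year 2 (ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for r via the Fisher z transform.
    The standard error of z is 1 / sqrt(n - 3), so n must exceed 3,
    and r must be strictly between -1 and 1."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Invented splits (in points of OPS) for six hypothetical players.
    year1 = [35, -20, 10, -5, 50, -40]
    year2 = [-10, 5, 15, -25, 20, 0]
    r = pearson_r(year1, year2)
    low, high = r_interval(r, len(year1))
    print(f"r = {r:.3f}, approx 95% interval = ({low:.3f}, {high:.3f})")
```

With only a handful of players the interval is very wide, which is the reason for wanting many data points in the regression as well as many opportunities behind each one.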

I did such a regression on first half, second half splits. Here is the methodology and the results: 

Read More

(15) Comments • 2008/11/04 • Sabermetrics, Streaks