THE BOOK--Playing The Percentages In Baseball


Wednesday, February 08, 2012

New PECOTA

By Tangotiger, 11:22 AM

This was quite a surprising claim from Colin:

These values are on a very different scale, since due to the lack of an intercept the values have to sum to one for the first regression and to three for the second regression, but they’re also very different in a more meaningful sense; recasting the first year to 1 (which is practically already done for us), we get weights of 1/.92/.90.

As you know, Marcel uses 5/4/3 for hitters (meaning 1, .8, .6) and 3/2/1 for pitchers (1, .67, .33).  (I think it was 3/2/1… can’t confirm right now.)

I personally use .9994^daysAgo for hitters and .9990^daysAgo for pitchers, which has the effect of being 1, .8, .64, .512 (and so on, each 80% of the previous) for hitters, and 1, .7, .49 (and so on, each 70% of the previous) for pitchers.

Tests from other research make me think that it should be even more aggressive, so maybe 1, .7, .5 for hitters and 1, .5, .25 for pitchers.  But I haven’t researched that, so I’ll just leave it there for now.

Colin has gone way to the other side, essentially going with a .9998 or .9999^daysAgo kind of model.
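To see how a daily decay factor maps onto yearly weights, here is a minimal sketch. It assumes a 365-day year and that each season sits exactly one year behind the next; the function name yearly_weights is made up for illustration.

```python
# Convert a per-day decay factor into season-by-season weights.
# Assumption: seasons are spaced exactly 365 days apart.
def yearly_weights(daily_decay, years=3, days_per_year=365):
    return [daily_decay ** (days_per_year * k) for k in range(years)]

hitters = yearly_weights(0.9994)   # roughly 1, .80, .65
pitchers = yearly_weights(0.9990)  # roughly 1, .69, .48
flat_a = yearly_weights(0.9998)    # roughly 1, .93, .86
flat_b = yearly_weights(0.9999)    # roughly 1, .96, .93

for label, w in [(".9994", hitters), (".9990", pitchers),
                 (".9998", flat_a), (".9999", flat_b)]:
    print(label, [round(x, 2) for x in w])
```

A weight set like 1/.92/.90 lands close to the .9998 row, which is why it reads as so much flatter than the 80 percent or 70 percent per-year decay.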

Now, I agree with the framework for his testing, that you should and must include the PA component when establishing the weights.  Frankly, this is an important step.  When I did it for Marcel, I basically forced everyone in the system to have at least 300 PA, so that I didn’t have to worry about this portion too much (I should have worried a little about this, at least).  Indeed, if you give everyone at least 500 PA for each of the three years, this step becomes basically unimportant (no worries at all).  That’s because the weighting of each year (the PA of year 1 divided by the PA of years 1 + 2 + 3) will be the same for each wOBA of year 1, 2, and 3.
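As a rough illustration of why equal playing time makes the PA step drop out, the sketch below computes each season's share of the weighted total. It uses the 5/4/3 recency weights as an assumption, and year_shares is a hypothetical helper, not anything from Marcel or PECOTA.

```python
# Share of the forecast's weighted PA contributed by each season.
# pa and recency are ordered most recent season first.
def year_shares(pa, recency):
    weighted = [p * w for p, w in zip(pa, recency)]
    total = sum(weighted)
    return [x / total for x in weighted]

# Equal PA: the shares collapse to the recency weights alone (5/12, 4/12, 3/12).
equal = year_shares([500, 500, 500], [5, 4, 3])

# Uneven PA: playing time now shifts the shares away from 5/4/3.
uneven = year_shares([700, 400, 500], [5, 4, 3])

print([round(s, 3) for s in equal])
print([round(s, 3) for s in uneven])
```

With uneven playing time the shares depend on both the recency weight and the PA, which is the case the regression-based test has to handle.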

So, getting back to Colin’s important point: he’s saying that if you introduce the PA weighting component, we see that every year is important.  I find this very hard to believe.  I mean, it’s an exciting finding if true, and I’d like to see more research on this for sure.  My guess at the moment is that there’s a selection bias issue, involving players with only a limited number of years in the sample, or young players.

Basically, does Colin’s finding apply across-the-board, or is it really limited to a subset of the population?  I’d bet on the latter, and I’d bet that the Marcel 5/4/3 would still hold for players who are regulars.  In any case, it’s an exciting prospect to consider.

***

A correction to Colin’s note here:

The third, and perhaps most important, takeaway has to do with regression to the mean. We can add a simplistic version of regression to the mean to our forecasting model by adding a TAv_REG of .260 (the league average) with a PA_REG of 1200. (The PA_REG comes from the Marcels; it’s included here mostly for the purposes of illustration. The regression component in PECOTA is a more rigorous model based on random binomial variance—again, the purpose here is only to illustrate the concepts.

Consider a player with 650 PAs in three straight seasons, or 1950 total PA. Using the Marcel weighting of 1/.8/.6, that comes out to 1560 effective PA— in other words, throwing out 20 percent of a player’s PAs during that time period. That means 56 percent of a player’s forecast comes from his own performance, and 44 percent comes from the regression to the mean component. Using weights of 1/.92/.90 yields 1833 effective PA, throwing out only about six percent. Using the same regression component, that’s 60 percent of a player’s forecast coming from his own production and only 40 percent coming from regression to the mean. (And if you follow from the conclusions above and start using more years to forecast a player as well, even less regression to the mean is necessary.)

There’s a calculation error in there.  Marcel uses a 5/4/3/2 model, with the 5/4/3 being the weights for years T, T-1, T-2, and the 2 being the weight for regression toward the mean (using 600 PA as the seasonal number).  So, if you had, say, 700 PA in year T, 400 in year T-1, and 500 in year T-2, you get these effective weights:
year T: 700 x 5
year T-1: 400 x 4
year T-2: 500 x 3
regression: 600 x 2

That 600 x 2 is the same for everyone.  Colin’s calculation error is that rather than using 5/4/3, he used 1/.8/.6 against the same 1200 PA of regression.  The net effect is that he is showing a far bigger regression amount than Marcel is actually doing.
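The size of the gap can be worked out directly. This is a minimal sketch using only the numbers quoted above; own_share is a hypothetical helper name, and the calculation is the simplified illustration from the post, not the actual PECOTA regression model.

```python
# Fraction of the forecast that comes from the player's own performance,
# given per-season PA (most recent first), season weights, and regression PA.
def own_share(pa, weights, reg_pa):
    effective = sum(p * w for p, w in zip(pa, weights))
    return effective / (effective + reg_pa)

pa = [650, 650, 650]

# Colin's version: 1/.8/.6 against 1200 PA of regression.
quoted = own_share(pa, [1, 0.8, 0.6], 1200)    # 1560 / 2760

# Marcel as described here: 5/4/3 against 600 x 2 PA of regression.
marcel = own_share(pa, [5, 4, 3], 600 * 2)     # 7800 / 9000

# The 700 / 400 / 500 example from the list above.
example = own_share([700, 400, 500], [5, 4, 3], 600 * 2)  # 6600 / 7800

print(round(quoted, 3), round(marcel, 3), round(example, 3))
```

Under these assumptions the three-year regular gets about 87 percent of his forecast from his own performance in Marcel, not 56 percent. Put another way, 1200 PA of regression against 5/4/3 is equivalent to 240 PA of regression against 1/.8/.6.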

(77) Comments • 2012/02/12 • SabermetricsForecasting