Wednesday, February 08, 2012
New PECOTA
This was quite a surprising claim from Colin:
These values are on a very different scale, since due to the lack of an intercept the values have to sum to one for the first regression and to three for the second regression, but they’re also very different in a more meaningful sense; recasting the first year to 1 (which is practically already done for us), we get weights of 1/.92/.90.
As you know, Marcel uses 5/4/3 for hitters (meaning 1, .8, .6) and 3/2/1 for pitchers (1, .67, .33). (I think it was 3/2/1… can’t confirm right now.)
I personally use .9994^daysAgo for hitters and .9990^daysAgo for pitchers, which has the effect of being 1, .8, .64, .512 (and so on, each 80% of the previous) for hitters, and 1, .7, .49 (and so on, each 70% of the previous) for pitchers.
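To see how a daily decay turns into those yearly weights, here is a minimal sketch. The function name and the 365-day spacing between seasons are assumptions made for illustration; the actual system weights each game by its own daysAgo rather than by whole seasons.

```python
def yearly_weights(daily_decay, years=3, days_per_year=365):
    """Weight of a season k years back, assuming exactly 365 days per year."""
    return [daily_decay ** (days_per_year * k) for k in range(years)]

# Daily decay rates quoted above for hitters and pitchers.
hitters = yearly_weights(0.9994)
pitchers = yearly_weights(0.9990)

print([round(w, 2) for w in hitters])
print([round(w, 2) for w in pitchers])
```

Each successive value is the previous one multiplied by the same yearly factor, which is where the "each 80% of the previous" pattern comes from.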
Tests from other research make me think that it should be even more aggressive, so maybe 1, .7, .5 for hitters and 1, .5, .25 for pitchers. But I haven’t researched that, so I’ll just leave it there for now.
Colin has gone way to the other side, essentially going with a .9998 or .9999^daysAgo kind of model.
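The conversion from Colin’s yearly weights back to a daily decay can be sketched like this. It assumes seasons sit exactly 365 days apart, and the helper name is hypothetical.

```python
def implied_daily_decay(weight, years_ago, days_per_year=365):
    """Daily decay d such that d ** (days_per_year * years_ago) == weight."""
    return weight ** (1.0 / (days_per_year * years_ago))

# Colin's weights of 1/.92/.90: .92 is one year back, .90 is two years back.
decay_from_year_2 = implied_daily_decay(0.92, 1)
decay_from_year_3 = implied_daily_decay(0.90, 2)

print(round(decay_from_year_2, 5), round(decay_from_year_3, 5))
```

Both implied rates land between roughly .9997 and .9999, which is the sense in which his weights amount to a .9998 or .9999^daysAgo kind of model.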
Now, I agree with the framework for his testing, that you should and must include the PA component when establishing the weights. Frankly, this is an important step. When I did it for Marcel, I basically forced everyone in the system to have at least 300 PA, so that I didn’t have to worry about this portion too much (I should have worried a little about this, at least). Indeed, if you give everyone at least 500 PA for each of the three years, this step becomes basically unimportant (no worries at all). That’s because the PA share of each year (the PA of that year divided by the PA of years 1 + 2 + 3) will be roughly the same for the wOBA of year 1, 2, and 3.
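One way to read that last point, as a rough sketch: when a player’s PA are equal across the three years, each year’s share of the weighted total depends only on the year weights, so the PA component drops out. The function name and the sample PA lines below are made up for illustration.

```python
def year_shares(pa_by_year, year_weights=(5, 4, 3)):
    """Share of the weighted PA total contributed by each year."""
    weighted = [w * pa for w, pa in zip(year_weights, pa_by_year)]
    total = sum(weighted)
    return [x / total for x in weighted]

# Equal PA in every year: shares are 5/12, 4/12, 3/12 whatever the PA level.
regular_500 = year_shares((500, 500, 500))
regular_650 = year_shares((650, 650, 650))

# Uneven PA (hypothetical player): the PA component now shifts the shares.
uneven = year_shares((700, 400, 500))

print(regular_500, regular_650, uneven)
```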
So, getting back to Colin’s important point: he’s saying that if you introduce the PA weighting component, we see that every year is important. I find this very hard to believe. I mean, it’s an exciting finding if true, and I’d like to see more research on this for sure. My guess at the moment is that there’s a selection bias issue, involving guys with a limited number of years, or young guys.
Basically, does Colin’s finding apply across-the-board, or is it really limited to a subset of the population? I’d bet on the latter, and I’d bet that the Marcel 5/4/3 would still hold for players who are regulars. In any case, it’s an exciting prospect to consider.
***
A correction to Colin’s note here:
The third, and perhaps most important, takeaway has to do with regression to the mean. We can add a simplistic version of regression to the mean to our forecasting model by adding a TAv_REG of .260 (the league average) with a PA_REG of 1200. (The PA_REG comes from the Marcels; it’s included here mostly for the purposes of illustration. The regression component in PECOTA is a more rigorous model based on random binomial variance—again, the purpose here is only to illustrate the concepts.
Consider a player with 650 PAs in three straight seasons, or 1950 total PA. Using the Marcel weighting of 1/.8/.6, that comes out to 1560 effective PA— in other words, throwing out 20 percent of a player’s PAs during that time period. That means 56 percent of a player’s forecast comes from his own performance, and 44 percent comes from the regression to the mean component. Using weights of 1/.92/.90 yields 1833 effective PA, throwing out only about six percent. Using the same regression component, that’s 60 percent of a player’s forecast coming from his own production and only 40 percent coming from regression to the mean. (And if you follow from the conclusions above and start using more years to forecast a player as well, even less regression to the mean is necessary.)
There’s a calculation error in there. Marcel uses a 5/4/3/2 model, with the 5/4/3 being the weights for years T, T-1, T-2, and the 2 being the weight for regression toward the mean (using 600 PA as the seasonal number). So, if you had, say, 700 PA in year T, 400 in year T-1, and 500 in year T-2, you get these effective weights:
year T: 700 x 5
year T-1: 400 x 4
year T-2: 500 x 3
regression: 600 x 2
That 600 x 2 is the same for everyone. Colin’s calculation error is that rather than using 5/4/3, he used 1/.8/.6. The net effect is that he is showing a far bigger regression amount than Marcel is actually doing.
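The size of the gap can be checked with a short sketch. The function and variable names are hypothetical; the weights, the 600 x 2 regression term, and the PA lines come from the text above.

```python
def marcel_shares(pa_by_year, year_weights=(5, 4, 3), reg_weight=2, reg_pa=600):
    """Return (player share, regression share) of the total effective weight."""
    player = sum(w * pa for w, pa in zip(year_weights, pa_by_year))
    regression = reg_weight * reg_pa
    total = player + regression
    return player / total, regression / total

# The 700/400/500 PA example: 3500 + 1600 + 1500 against 1200 of regression.
example_player, example_reg = marcel_shares((700, 400, 500))

# Colin's 650 PA x 3 player under the 5/4/3/2 scheme described here.
regular_player, regular_reg = marcel_shares((650, 650, 650))

# The same player with 1/.8/.6 against 1200 PA, as in Colin's calculation.
colin_player, colin_reg = marcel_shares(
    (650, 650, 650), year_weights=(1, 0.8, 0.6), reg_weight=1, reg_pa=1200
)

print(round(example_reg, 3), round(regular_reg, 3), round(colin_reg, 3))
```

Under 5/4/3/2, the 650 PA regular gets 7800 of 9000 effective PA from his own performance, so regression is about 13 percent of the forecast rather than the 44 percent in the quoted passage.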

