Thursday, February 24, 2011
Marcello’s playing time
Jeremy gives us an updated model of Marcel’s playing time forecast (dubbed Marcello, which I love, especially considering my Italian heritage). By the way, one reason to give forecasting systems silly names like this is that it shows we can’t be all that accurate to begin with.
Anyway, when I created Marcel, I used a regression based on, I dunno, probably data of players born since 1895 (when Ruth was born)... maybe I used 1931 (Mays/Mantle). Jeremy instead uses much more recent data. While my best-fit equation was 50% of YearT plus 10% of YearT-1 plus 200 PA, his is 75%, 10%, and... unknown... probably plus 50 or plus 75 PA. But he goes beyond that: he looks at the number of PA in YearT and YearT-1 and reasons that if the PA in YearT is greater than in YearT-1, then he can discard YearT-1 altogether (think of September callups or players recently promoted to full-time status). So, in that case, it’s 80% of YearT plus, I dunno, 75 or 100 PA or something.
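To make the two equations concrete, here is a rough Python sketch. The function names are mine, and the Marcello constants (`base` and `base_rising`) are only my guesses from above, not Jeremy's published numbers, so treat them as placeholders.

```python
def marcel_pa(pa_t, pa_t1):
    """Marcel fit as described above: 50% of YearT + 10% of YearT-1 + 200 PA."""
    return 0.50 * pa_t + 0.10 * pa_t1 + 200


def marcello_pa(pa_t, pa_t1, base=75, base_rising=100):
    """Marcello-style fit, under assumed constants.

    base        -- guessed constant for the two-year form ("plus 50 or plus 75")
    base_rising -- guessed constant when YearT-1 is discarded ("75 or 100")
    """
    if pa_t > pa_t1:
        # Playing time went up (callup, newly full-time): ignore YearT-1.
        return 0.80 * pa_t + base_rising
    return 0.75 * pa_t + 0.10 * pa_t1 + base


if __name__ == "__main__":
    # A steady full-timer, and a player whose PA jumped from 100 to 400.
    for pa_t, pa_t1 in [(600, 600), (400, 100)]:
        print(pa_t, pa_t1, round(marcel_pa(pa_t, pa_t1)), round(marcello_pa(pa_t, pa_t1)))
```

Note that a player with identical PA in both years falls into the two-year form, since the rule only kicks in when YearT is strictly greater.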
That’s great stuff. I should have thought to at least verify the model against more recent data, and maybe I’ll do that soon. But, thank god we have the up-and-comers like Jeremy (and really, we may as well call these young guys the here-and-now because they’re well-established and don’t need to prove anything more to us) willing to do all that work.
Anyway, another point that Jeremy brought up is whether we are trying to forecast the mean, median, or mode. We had this discussion a few years ago. I said:
From 2001 to 2005, this was Albert Pujols AB: 590, 590, 591, 592, 591
Consistent! That’s the word, right? Wow, I will forecast Pujols for 591 AB in 2006, and I’m definitely going to be right! Yes, consistent, healthy. That’s Pujols. We all know what happened to Pujols this year. His final AB total was 535.
What did Marcel say before the season started?
531.
dq (where are you anyway?) brought up a great point, and it relates to the issue of mean v median. First he confirmed my Pujols example:
There were 130 players, whose 3 year average was 614.6. 180 + .6 * ab = an average of 548.8; their actual ab’s averaged 553.3 -
But, then he talked about the absolute error:
But, the 3 year average (3YA) was closer to the right answer 60% of the time versus Marcels. The average of the absolute difference for 3YA was 72.2 versus the Marcels 82.0.
The median of the 3YA was 28.7; the median of Marcels was 65.2.
...
So, I would say that the chances of Pujols getting 592 abs were greater than the chances of him getting 535. On the other hand, over the course of x seasons, he will average 535, because in one of the years he will fall short.
Therefore, it’s an interesting point to discuss as to exactly what it is that we are forecasting. Is it the mean, or the 50th percentile point? Rate stats follow a bell curve, so we don’t have to worry about that, as you’ll generally get the same answer either way. For playing time, however, it makes a huge difference, because you are bounded at the high end by 162 games and at the low end by 0, and the guys we care about are all talented enough to play 140-160 games per season. There is very little room above that and a whole lot of room below it, so the mean gets dragged under the median.
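Here is a small Python sketch of why the choice matters, using the six Pujols AB totals quoted above (2001-2005 plus the 535 in 2006). Treating six seasons of one player as "the distribution" is purely an illustrative assumption on my part, and the helper names are made up. The point is only that the median is the better forecast if you score by absolute error, and the mean is the better forecast if you score by squared error.

```python
from statistics import mean, median

# Pujols AB, 2001-2006, as quoted in this post.
ab = [590, 590, 591, 592, 591, 535]


def mean_abs_error(forecast, actuals):
    """Average absolute miss of a single forecast against each season."""
    return mean(abs(a - forecast) for a in actuals)


def mean_sq_error(forecast, actuals):
    """Average squared miss of a single forecast against each season."""
    return mean((a - forecast) ** 2 for a in actuals)


m = mean(ab)      # pulled down by the one short season
med = median(ab)  # barely notices it

if __name__ == "__main__":
    print("mean forecast:  ", m, mean_abs_error(m, ab), mean_sq_error(m, ab))
    print("median forecast:", med, mean_abs_error(med, ab), mean_sq_error(med, ab))
```

That is the same pattern dq found: the three-year average wins more often and on absolute error, while a regressed forecast like Marcel is the one that gets the long-run average right.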
So, what is it that we really are after?

