THE BOOK--Playing The Percentages In Baseball

Tuesday, October 27, 2009

How well did the Community forecast playing time?

By Tangotiger, 04:42 PM

Here’s a quick study:


For those guys expected to get more than 580 PA:
n = 60 players
604 = expected PA
582 = actual PA

Between 481 and 580 PA:
n = 123 players
540 = expected PA
510 = actual PA

Between 381 and 480 PA:
n = 74 players
436 = expected PA
406 = actual PA

Between 281 and 380 PA:
n = 52 players
327 = expected PA
293 = actual PA

Between 181 and 280 PA:
n = 71 players
235 = expected PA
202 = actual PA

Between 81 and 180 PA:
n = 66 players
128 = expected PA
117 = actual PA

Between 1 and 80 PA:
n = 134 players
39 = expected PA
51 = actual PA

0 PA:
n = 104 players
0 = expected PA
90 = actual PA

As you can see, there is a consistent bias for any player expected to get at least 180 PA: the forecast ran too high.  For any player expected to get fewer than 80 PA, the bias goes the other way (but is not as bad).

If we make three groups of players:
at least 180 PA expected: expected an average of 444, actually got 414… the forecast was 30 PA too high

between 1 and 179 PA expected: average expected of 68, actually 72… a match pretty much

0 PA expected: actually got 90 PA
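
For anyone who wants to check the three-group numbers, here is a rough sketch that pools the bucket averages from the table above, weighting each bucket by its number of players. The bucket labels and the `pooled` helper are my own names for illustration. Because the bucket averages are already rounded, the pooled figures can be off by a PA or so from the ones quoted (the 1-to-179 group comes out closer to 73 actual than 72).

```python
# (label, n players, mean expected PA, mean actual PA), copied from the table above.
# The labels are just for display; only the three numbers are used.
BUCKETS = [
    ("581+",    60, 604, 582),
    ("481-580", 123, 540, 510),
    ("381-480", 74, 436, 406),
    ("281-380", 52, 327, 293),
    ("181-280", 71, 235, 202),
    ("81-180",  66, 128, 117),
    ("1-80",    134, 39, 51),
    ("0",       104, 0, 90),
]

def pooled(buckets):
    """Player-weighted mean expected and actual PA across the given buckets."""
    n = sum(b[1] for b in buckets)
    exp = sum(b[1] * b[2] for b in buckets) / n
    act = sum(b[1] * b[3] for b in buckets) / n
    return n, exp, act

high = pooled(BUCKETS[:5])   # at least 180 PA expected
low = pooled(BUCKETS[5:7])   # between 1 and 179 PA expected
zero = pooled(BUCKETS[7:])   # 0 PA expected

for name, (n, exp, act) in [("180+", high), ("1-179", low), ("0", zero)]:
    print(f"{name:>6}: n={n:3d} expected={exp:6.1f} actual={act:6.1f} bias={exp - act:+6.1f}")
```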

***

Guys who actually got zero PA, but were expected to play in 2009 (by the Community):

player_id ACT_PA EXP_PA name_tx
458628 0 324 Mather, Joe
459943 0 319 Clement, Jeff
425774 0 169 McPherson, Dallas
132961 0 168 Jenkins, Geoff
430958 0 158 Rabelo, Mike
445615 0 109 Ramirez, Max
457850 0 107 Casto, Kory
440361 0 106 Hopper, Norris
453327 0 103 Antonelli, Matt
460131 0 101 Bogusevic, Brian
446382 0 101 Barton, Brian
(bunch under 100)

As you can see, Joe Mather and Jeff Clement are the two biggest surprises as to who did not play in MLB.

Guys who were not given any playing time forecast, but actually played in MLB:
player_id ACT_PA EXP_PA name_tx
325392 587 0 Podsednik, Scott
150456 587 0 Kennedy, Adam
458085 565 0 Coghlan, Chris
467827 491 0 Parra, Gerardo
434540 358 0 Jones, Garrett
440251 351 0 Roberts, Ryan
425556 337 0 Nix, Laynce
116662 331 0 Jones, Andruw
435408 306 0 Santos, Omir
132788 283 0 Millar, Kevin
425543 211 0 Wilson, Josh
(bunch under 200)

Almost none of these guys were listed on any team’s 40-man roster by the time the survey ran.  This just tells me that I need to run it a bit later in Spring Training (I ran it two weeks before the season started).

The biggest surprises.  These 23 guys got at least 200 PA more than expected:
player_id ACT_PA EXP_PA diff name_tx
448242 451 61 390 Gwynn, Tony
493596 430 42 388 Beckham, Gordon
472528 398 44 354 Valbuena, Luis
460099 411 62 349 Reimold, Nolan
346874 432 101 331 Uribe, Juan
430574 397 94 303 Maier, Mitch
450314 599 299 300 Zobrist, Ben
434636 376 78 298 Pagan, Angel
460579 533 261 272 Morgan, Nyjer
466320 540 268 272 Cabrera, Melky
451594 518 264 254 Fowler, Dexter
430948 634 386 248 Callaspo, Alberto
465784 438 195 243 Cabrera, Everth
457705 493 254 239 McCutchen, Andrew
407781 599 362 237 Byrd, Marlon
445988 503 270 233 Prado, Martin
425661 266 35 231 Paulino, Ronny
434704 388 159 229 Young, Delwyn
456422 678 451 227 Bourn, Michael
407797 309 88 221 Green, Nick
461416 324 108 216 Venable, Will
466988 509 296 213 Bonifacio, Emilio
452655 676 474 202 Span, Denard

And these 27 guys got at least 250 PA fewer than expected:
player_id ACT_PA EXP_PA diff name_tx
433582 110 571 -461 Jackson, Conor
449107 127 588 -461 Aviles, Mike
408314 166 625 -459 Reyes, Jose
476704 76 526 -450 Lowrie, Jed
113232 112 551 -439 Delgado, Carlos
294558 29 457 -428 Nady, Xavier
460086 189 606 -417 Gordon, Alex
425867 193 585 -392 Greene, Khalil
430001 162 546 -384 Weeks, Rickie
435623 54 417 -363 Frandsen, Kevin
424825 215 577 -362 Crisp, Coco
457727 199 557 -358 Maybin, Cameron
136267 32 389 -357 Glaus, Troy
435520 106 436 -330 Flores, Jesus
493127 260 588 -328 Iwamura, Akinori
458628 0 324 -324 Mather, Joe
459943 0 319 -319 Clement, Jeff
136767 31 344 -313 Chavez, Eric
459941 115 408 -293 Buck, Travis
451186 265 550 -285 Milledge, Lastings
459991 23 303 -280 Sanchez, Gaby
458210 256 531 -275 Casilla, Alexi
114789 254 528 -274 Giles, Brian
430965 202 473 -271 Snyder, Chris
435222 268 536 -268 Fields, Josh
114260 17 273 -256 Floyd, Cliff
346795 182 438 -256 Chavez, Endy

#1    Xeifrank      (see all posts) 2009/10/27 (Tue) @ 17:56

Would you consider it normal that the expected plate appearances were higher than the actual plate appearances until you got lower down on the chart?  What would be some of the likely causes of this? Catastrophic injuries? Bad estimates? Players getting playing time that weren’t listed?

What is the standard deviation on plate appearances (actual vs expected) for the complete sample?  Or the standard deviation for each of your subgroups, if that’s more telling.

vr, Xei


#2          (see all posts) 2009/10/27 (Tue) @ 18:14

Regression to the mean, is my guess.  If you project a higher-than-average amount of PAs, you’re more likely to have erred on the side of over-projecting.  If you project a lower-than-average amount of PAs, you’re more likely to have erred on the side of under-projecting.

Related to Xei’s question: why did it flip around the 80 PA mark?  I’d have guessed it would have flipped at about the average projected PA for the entire population we’re looking at, which I would guess is more like 300 PA.


#3    mulkowsky      (see all posts) 2009/10/27 (Tue) @ 18:19

Thanks again, Tango.  This is amazing stuff!

Let me put a pitch in for the fantasy players for you to still do it two weeks out from season start.  Any closer to the season and many of us who would love to use this information in our drafts will have already drafted, and the increase in accuracy isn’t worth the decrease in timeliness/usefulness, IMO.

Thanks again!


#4          (see all posts) 2009/10/27 (Tue) @ 18:58

The “0” group suggests that people are projecting their estimate of the median or mode, not the mean.  If you think a guy has a 10% chance of being full time (600PA) and 90% chance of not playing, you should project him to 60PA if you’re using the mean.

Nobody who is still in professional ball should be projected at zero.  If you think the guy is only a 10% shot to be a bench player who gets 100 PA, you should project him at 10 PA.

Perhaps the community gets a bit of extra utility in getting a projection exactly right, which is why they say zero.  smile
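
As a rough illustration of the mean-versus-mode point above, here is a small sketch. The function names and the two-outcome distributions are assumptions made up for the example; only the 10%/600 PA and 10%/100 PA figures come from the comment.

```python
def mean_pa(outcomes):
    """Probability-weighted mean PA from (probability, PA) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * pa for p, pa in outcomes)

def most_likely_pa(outcomes):
    """The mode: the PA value attached to the single most probable outcome."""
    return max(outcomes, key=lambda o: o[0])[1]

# 10% chance of a full-time job (600 PA), 90% chance of not playing at all.
full_time_longshot = [(0.10, 600), (0.90, 0)]
# 10% chance of a bench role (100 PA), 90% chance of not playing at all.
bench_longshot = [(0.10, 100), (0.90, 0)]

for label, dist in [("full-time longshot", full_time_longshot),
                    ("bench longshot", bench_longshot)]:
    print(f"{label}: mean={mean_pa(dist):.0f} PA, mode={most_likely_pa(dist)} PA")
```

A forecaster answering with the mode says zero for both players; one answering with the mean says roughly 60 and 10.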


#5    J. Cross      (see all posts) 2009/10/27 (Tue) @ 21:16

It seems like everyone is projecting too many PAs.

Just a quick look at the 483 hitters I have in the spreadsheet (which unfortunately at the moment is only guys with Lahman IDs, so it does not include any players who debuted in 2009).  The list also doesn’t include players who missed the entire season.

These players averaged 345 PA

and the systems projected an average of:

Marcel/community: 403
Zips: 457
Pecota: 414
Chone: 483

Correlations with actual PA for this pool of players:

Marcel .686
Zips .531
Pecota .525
Chone .545

Based on this quick and dirty analysis, the fans win.


#6    J. Cross      (see all posts) 2009/10/27 (Tue) @ 21:26

There is a strong bias here, however.  Some of the above systems only projected guys that they expected to get significant playing time (for instance, Chone didn’t project anyone below 100 AB), and this list of players is the list of guys who had projections from all of the above systems: in other words, the guys who were expected to get significant playing time by these systems.

So, the guys who were expected to get significant playing time… got less than expected.


#7          (see all posts) 2009/10/27 (Tue) @ 22:32

Was the total number of predicted PAs close to the total number of actual PAs?


#8    J. Cross      (see all posts) 2009/10/27 (Tue) @ 23:04

good question.

I don’t have all the players so I can’t make that comparison.

one more thing, the community’s success in projecting playing time might not translate to fantasy success.

For the 146 players who were expected to get 500 PA or more (average > 500 PA), the kinds of guys who get drafted in most fantasy leagues, Marcel didn’t do as well.

R (with actual PA)

Marcel .205
Pecota .174
ZiPS .277
Chone .287

for the guys that the stearmollers projected (projected starters less guys who have yet to appear in the majors)

Marcel .372
Pecota .393
Zips .424
Chone .425
Fantistics .506*

*I added in Fantistics on the advice of an emailer who said that they were good for playing time projections and this seems to support that.


#9          (see all posts) 2009/10/27 (Tue) @ 23:31

Another thing that occurs to me is injuries.  If some forecasts aren’t discounting the PAs for the possibility of injuries, and others are, that might skew the results.

Nothing wrong with not considering injuries, if that’s what you prefer ... the user can discount them to his preference.  For instance, if you told me Jeter was projected to 650 PA barring injuries, and I thought there was a 10% chance of a DL stint that would last 15 days, I’d discount him by maybe 7 PA.


#10    Brian Cartwright      (see all posts) 2009/10/28 (Wed) @ 00:24

Good point by Phil.

At the moment, I am projecting playing time by simply Marceling the last three seasons. I will amend that to count how many days out of the season the player was on an active roster, at any level. Will do something like the mean number of PAs for games he was active and if there was a disproportionate number of inactive games. So maybe project games first, then multiply by PAs per game (thinking as I type)


#11    Tangotiger      (see all posts) 2009/10/28 (Wed) @ 09:26

I should point out how I generated the Community playing time forecast.  This is what I asked the Fans:

Number of Games
150+ (Iron Man)
130-149 (Regular)
90-129 (Platoon)
30- 89 (Bench)
1- 29 (Callup)
0

From that, I created the number of games as such:
155
140
115
70
20
0

So, if you forecast 90-129 games, I count that as 115 games, etc.

From that, I create PA per G as follows:
PA per G = (number of games)/100 + 2.5

So, if you forecast someone with 115 games, I give him 3.65 PA per G.

And from that, PA follows:  115 x 3.65 = 420 PA.

So, anyone who selected the “ 90-129 (Platoon)” selection counted that as 420 PA.
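
The conversion described above can be sketched as follows. This is just my reading of the steps in this comment written out as code; the dictionary and function names are made up, and the ballot labels are lightly normalized.

```python
# Ballot choice -> number of games assigned to it, as listed above.
GAMES_BY_CHOICE = {
    "150+ (Iron Man)": 155,
    "130-149 (Regular)": 140,
    "90-129 (Platoon)": 115,
    "30-89 (Bench)": 70,
    "1-29 (Callup)": 20,
    "0": 0,
}

def pa_per_game(games):
    """PA per game as a function of forecast games: games/100 + 2.5."""
    return games / 100 + 2.5

def forecast_pa(choice):
    """Forecast PA for a ballot choice, rounded to the nearest whole PA."""
    games = GAMES_BY_CHOICE[choice]
    return round(games * pa_per_game(games))

for choice in GAMES_BY_CHOICE:
    print(f"{choice:>18}: {forecast_pa(choice)} PA")
```

Note that the more games a player is forecast to play, the more PA per game he is given, so the PA totals spread out faster than the game totals do.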

I did all this with a bit of correlation here, and intuition there.  And, most importantly, I made sure that it all added up to around 182,000 PA (actual PA in 2009 was 181,051; I “forced in” an estimate of 182,319).

That was my first problem.  There were 9,340 actual PA for which I had a total estimate of zero (basically, guys that I did not put on the ballot).  My bad, because of all the late free agent signings, but also guys not on the 40-man roster that made the team.  The players who actually were on the ballot had 171,711 actual PA.  And THAT is the total I should have forced the estimate to (well, I didn’t know it was going to be that, but I should have taken 95% of the actual of 2008, and presumed that would have been the target for 2009).

Basically, I should have knocked down all estimates by 5%.  The 420 PA from the example above should have been 399.  An estimate of 600 PA should be an actual of 570, and so on.
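
A quick check of the arithmetic behind the 5% figure, as a sketch only. The totals are the ones quoted in this comment; the variable and function names are assumptions for the example.

```python
ACTUAL_PA_ALL = 181_051        # actual 2009 PA, all players
ACTUAL_PA_ON_BALLOT = 171_711  # actual 2009 PA by players who were on the ballot

# PA that went to players with no forecast at all.
missing_pa = ACTUAL_PA_ALL - ACTUAL_PA_ON_BALLOT
# Share of league PA that went to players on the ballot.
share_on_ballot = ACTUAL_PA_ON_BALLOT / ACTUAL_PA_ALL

def knock_down(pa, factor=0.95):
    """Scale a PA forecast by a flat factor and round to a whole PA."""
    return round(pa * factor)

print(f"PA with no forecast: {missing_pa}")
print(f"share of PA on the ballot: {share_on_ballot:.3f}")
print(f"420 PA becomes {knock_down(420)}; 600 PA becomes {knock_down(600)}")
```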

Had I done that, the Community would have nailed practically every class of playing time estimate.  So, the only reason they don’t look as good as they should is because of me.

Regardless though: because the bias is systematic and not random, insofar as Fantasy Baseball is concerned, it doesn’t matter, since it all cancels out.  That is, if I knock out 5% PA from everyone, this won’t affect the rankings of the nonpitchers.


#12    Peter Kreutzer      (see all posts) 2009/10/28 (Wed) @ 15:36

One of the big issues when making projections is pertinent here.

One knows that the Top 10 AB leaders at the end of the season will have about 6500 AB.

But if you project your Top 10 AB leaders to have 6500 AB, chances are that at the end of the season they will have only about 5900. About 10 percent of the expected at bats will go missing, due to random injuries.

Since the injuries are random, to get the right PT total you have to wind everyone down about 10 percent. But when you do that you end up with no players projected to have 600 AB, which looks wrong (and, of course, will be wrong).

The only other logical choice is to guess which players will be hitting the DL, and make them take the hit. This leads to bold and usually wrong projections.

The best choice, it seems to me, is to ignore the overall totals and project as if players are going to play as much as they usually do when they’re healthy, unless they have an injury history that forces an adjustment.

But again, this leads to wrong overall totals.


#13    Tangotiger      (see all posts) 2009/10/28 (Wed) @ 15:45

Does it really matter though (for our purposes)?

If you give Ichiro 600 PA, and Gutierrez 500, and Chavez 400, or you give them 660, 550, 440, respectively, it all cancels out anyway, right?

Really, you might as well give the top guy “1.00 PA”, and everyone is a fraction of that.  It becomes irrelevant for the overall valuation.  The ordinal rankings remain the same.
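
To see the "it all cancels out" point concretely, here is a small sketch using the three players and PA figures from the example above. The helper names are made up for illustration.

```python
def ranking(pa_by_player):
    """Player names ordered from most to least forecast PA."""
    return sorted(pa_by_player, key=pa_by_player.get, reverse=True)

def shares(pa_by_player):
    """Each player's PA as a fraction of the top player's PA."""
    top = max(pa_by_player.values())
    return {name: pa / top for name, pa in pa_by_player.items()}

base = {"Ichiro": 600, "Gutierrez": 500, "Chavez": 400}
inflated = {"Ichiro": 660, "Gutierrez": 550, "Chavez": 440}  # everyone up 10%

print(ranking(base) == ranking(inflated))
for name in base:
    print(f"{name}: {shares(base)[name]:.3f} vs {shares(inflated)[name]:.3f}")
```

Both the ordering and the fraction-of-the-top-guy values come out the same under either set of forecasts, which is all that matters for relative valuation.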


#14    Peter Kreutzer      (see all posts) 2009/10/28 (Wed) @ 15:56

For our purposes, it doesn’t matter. The proportions are what matter.

I was pointing out why those of us who publish projections might overproject at the top end. It looks wrong to do it right.


#15    Tangotiger      (see all posts) 2009/10/28 (Wed) @ 16:11

Right, I agree with you otherwise.

This is the same idea with the rate stats.  If you presume the league OBP will be .340, and it turns out to be .325, it’s irrelevant.  Your players are forecasted around the league mean anyway.

