THE BOOK--Playing The Percentages In Baseball


Thursday, February 19, 2009

PECOTA cards

By Tangotiger, 01:45 PM

This is Ryan Howard.  You’ll notice some inconsistencies, which I can only presume are because of the handoff between Nate and Clay.  For example, Howard’s 50th percentile PA forecast is 570, with his 10th to 90th percentile range as 548-594.  Yet the weighted mean forecast is 631 PA, above even his 90th percentile.  His Projected Playing Time is 90%, or 640 PA.
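One rough way to flag this kind of inconsistency is to check whether the weighted mean falls inside the 10th-90th percentile band. The helper below is a hypothetical sketch, not anything from PECOTA itself; the inputs are Howard's PA figures from the card.

```python
def forecast_problems(p10: float, p50: float, p90: float, mean: float) -> list[str]:
    """Return consistency problems in a percentile forecast (hypothetical helper)."""
    problems = []
    # Percentiles must be in non-decreasing order.
    if not (p10 <= p50 <= p90):
        problems.append("percentiles are out of order")
    # A mean outside the 10th-90th band is a red flag for a playing-time forecast.
    if not (p10 <= mean <= p90):
        problems.append(f"mean {mean} is outside the 10th-90th band [{p10}, {p90}]")
    return problems

# Howard's PA forecast as listed on the card
print(forecast_problems(p10=548, p50=570, p90=594, mean=631))
```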

His 7-year forecast is interesting.  He is expected to go from a forecast of 631 PA at age 29 to 530 PA at age 35, for a total of 4059 PA.  I don’t know how he gets a bump at age 32.  I have a pretty simple model, and my total for those 7 years is 4065.  So, weirdly enough, if I start him off at 631 PA, we both end up with nearly identical totals, except mine is a smooth drop, while PECOTA has him bouncing up and down a bit before settling down.  Interesting.
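Tango doesn't spell out his simple model, so as a stand-in, here is the most basic smooth drop imaginable: a straight line from 631 PA at age 29 to 530 PA at age 35. This is a hypothetical sketch, not Tango's model. It lands at about 4064 PA, close to both totals above, with no bump at 32.

```python
def linear_pa_path(start_pa: float, end_pa: float, years: int) -> list[float]:
    """Straight-line decline from start_pa to end_pa over `years` seasons (hypothetical model)."""
    step = (end_pa - start_pa) / (years - 1)
    return [start_pa + step * i for i in range(years)]

path = linear_pa_path(631, 530, 7)  # ages 29 through 35
for age, pa in zip(range(29, 36), path):
    print(age, round(pa))
print("total:", round(sum(path), 1))
```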

The MORP looks wrong, as it seems to apply the old MORP equation to the new WARP data.  I think.


#1    2009/02/19 (Thu) @ 14:53

I assume you got your BP2009?  What did you think of Clay’s essay in the back?  He seems a little mixed up still.


#2    Tangotiger    2009/02/19 (Thu) @ 15:24

I actually didn’t get it yet.  As soon as Amazon delivers, I’ll let you know…


#3    2009/02/19 (Thu) @ 15:28

Bill James’ Brock2 method also jumps up and down a bit ... it was based on an explicit Excel spreadsheet, and I never did figure out where the jumps came from.  But I didn’t try.


#4    Tangotiger    2009/02/19 (Thu) @ 15:42

Yeah, James put it in explicitly at age 37 or so.

I guess PECOTA’s is a result of the small number of comparable players, and so is subject to sample-size issues.


#5    2009/02/19 (Thu) @ 15:44

Don’t quote me on this; it’s an educated guess based on Between the Numbers and what I remember reading from their work, but the career path is based on a weighted composite of the future stats of their comps.

Instead of using regressions or whatnot to make an age adjustment, their career path would be derived from the weighted sum of how their comps did looking forward. And, since you’re summing samples, there is going to be noise in the year-to-year numbers, as the samples’ good years and bad years don’t always cancel out.

It also explains how they derive their drop percentages, by seeing what (weighted) percentage of their comps drop out of the league.
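Pat's description (an educated guess, by his own account) can be sketched as follows. The comps, weights, and PA figures are all made up; the point is only that a weighted sum of a handful of real careers produces a path that can rise again (here at age 32), and that the drop-out rate comes from the same comps.

```python
from dataclasses import dataclass, field

@dataclass
class Comp:
    weight: float  # similarity weight (hypothetical)
    pa_by_age: dict[int, int] = field(default_factory=dict)  # a missing age means out of the league

def comp_career_path(comps: list[Comp], ages: range) -> dict[int, tuple[float, float]]:
    """Weighted-composite career path: (mean PA, drop-out rate) for each age."""
    total_weight = sum(c.weight for c in comps)
    path = {}
    for age in ages:
        # Weighted mean PA, counting comps who have left the league as 0 PA.
        mean_pa = sum(c.weight * c.pa_by_age.get(age, 0) for c in comps) / total_weight
        # Weighted share of comps who are no longer in the league at this age.
        drop_rate = sum(c.weight for c in comps if age not in c.pa_by_age) / total_weight
        path[age] = (mean_pa, drop_rate)
    return path

comps = [
    Comp(0.5, {30: 600, 31: 480, 32: 620, 33: 560}),
    Comp(0.3, {30: 500, 31: 560, 32: 590}),
    Comp(0.2, {30: 650, 31: 400, 32: 450}),
]
for age, (pa, drop) in comp_career_path(comps, range(30, 34)).items():
    print(age, round(pa), f"{drop:.0%}")
```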


#6    azruavatar    2009/02/19 (Thu) @ 15:47

"which I can only presume is because of the handoff between Nate and Clay”

Has Nate explicitly walked away from BP or just tacitly?


#7    Tangotiger    2009/02/19 (Thu) @ 15:52

No idea.

Pat/5: I agree that this is what’s happening.  A smoothing function would be better, but that’s being too picky, really.

The comp approach does have a lot of advantages over a regression model.
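A smoothing function could take many forms. One simple option, assumed here rather than taken from PECOTA, is to replace the comp-based path with its least-squares straight line. Because the fit includes an intercept, the residuals sum to zero, so the smoothed path keeps the same multi-year total. The yearly values below are invented, chosen only so they sum to the 4059 PA quoted in the post.

```python
def linear_smooth(values: list[float]) -> list[float]:
    """Replace a noisy path with its least-squares straight line (same total, up to rounding)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), values))
    slope = sxy / sxx
    return [y_mean + slope * (x - x_mean) for x in range(n)]

# Made-up yearly PA for ages 29-35 that bounce around but sum to 4059
bumpy = [631, 600, 570, 590, 560, 578, 530]
smooth = linear_smooth(bumpy)
print([round(pa) for pa in smooth], round(sum(smooth)))
```

A straight line is the bluntest possible smoother; a local or polynomial fit would keep more of the genuine aging shape while still removing the year-to-year noise.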


#8    2009/02/19 (Thu) @ 17:37

Well, anything that far in the future is wild speculation, anyway. How accurately do you think you could project 2009 if you ignored the data of the last 3-6 years?

It’s mostly useful to see what players are more likely to maintain a high level of play and who’s destined for a sharp downfall.

Is he more likely to become Mo Vaughn or Cecil Fielder? PECOTA seems to be implying the latter.


#9    Rally    2009/02/19 (Thu) @ 18:12

What’s the difference?  Vaughn got fatter and injured.  Cecil just got fatter.  Both lost their value relatively quickly after age 30.


#10    Pat Senechal    2009/02/19 (Thu) @ 18:30

(In retrospect, Cecil’s a bad example. He declined faster than I thought.)

But if he could follow a Frank Thomas career path, he would have value into his mid-thirties, and that’s the difference between getting a Teixeira-style contract and getting a Dunn-style contract when his current deal expires.

But a player’s longevity is a skill he can control, so being able to project it is a valuable exercise.


#11    Dan Rosenheck    2009/02/20 (Fri) @ 10:53

The Wieters, Gerut, and Werth forecasts seem like computational errors to me (especially since Wieters’ .625 SLG in AA is being translated to a .627 major league equivalent).  Does anyone else agree these cases mean these numbers are simply Wrong and there will be a new PECOTA released before the start of the season?  I remember PECOTA had Andrew Miller at a 4.09 projected ERA last year or something until they suddenly bumped him up to a much more reasonable 4.60 or so like two days before Opening Day.


#12    Patriot    2009/02/20 (Fri) @ 12:23

My comments on BP from a quick look-over:

1. He talks about setting up his system so that it assumes a replacement player is an average fielder, except that he doesn’t believe that, so he “nudges” the position averages based on defensive factors. I have no problem with this, but he never mentions the alternative of doing it the Tango/Fangraphs way.

2. WARP is figured in the idealized context of 9 RPG.  His example is Jose Reyes.  He calculates a W% for Reyes’ new team by adding Reyes’ runs above an average SS to the team runs scored and subtracting his runs saved in the field from the team runs allowed.  But then, when he figures the W% for the team with the replacement-level SS, he subtracts 22.11 runs per 486 outs.

Unless I’m missing something in his explanation, it appears that he is using the same replacement-level hitter at each position (-22 runs).

Then he does an example with Johan.  Johan was -1.7 runs versus an average pitcher, but the comparison replacement-level team scores fewer runs than the team with Johan does.  I really don’t understand the logic there at all.

I don’t want to dismiss the possibility that I’m misreading.

3. They got rid of the player index this year, which makes the book harder to use as a reference.  The first thing I wanted to do was read the comments on OSU alums, but without an index I had to go searching through each team section to see which ones they wrote comments for.

4. They still say “Pythagenport” record in the team boxes.  If they are using Pythagenport, that’s fine, but the website glossary makes this very confusing.
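Points 2 and 4 both lean on a Pythagorean W% formula with a run-environment-dependent exponent. The sketch below reproduces the shape of the with-player versus with-replacement comparison described in point 2; it is not BP's code. It assumes the player fills a full 486-out lineup slot (3 outs per game over 162 games), uses made-up batting and fielding runs, and shows the Pythagenport and Pythagenpat exponents side by side since point 4 concerns which name is used. Which exponent BP actually applies is not stated here, so the default below is an arbitrary choice.

```python
import math

def pythagenport_exponent(rpg: float) -> float:
    # Davenport's Pythagenport: x = 1.5 * log10(RPG) + 0.45
    return 1.5 * math.log10(rpg) + 0.45

def pythagenpat_exponent(rpg: float) -> float:
    # Pythagenpat: x = RPG ** 0.287 (exponents around .28-.29 are in common use)
    return rpg ** 0.287

def win_pct(rs: float, ra: float, games: int, exponent=pythagenpat_exponent) -> float:
    """Pythagorean W% with an exponent that depends on runs per game."""
    x = exponent((rs + ra) / games)
    return rs ** x / (rs ** x + ra ** x)

G = 162
AVG_RUNS = 4.5 * G     # 9 RPG translated context: 729 runs scored and 729 allowed
REPL_PENALTY = 22.11   # runs per 486 outs, as quoted in point 2; assumes a full 486-out season

# Hypothetical player: +20 runs at the plate vs. an average SS, 5 runs saved in the field
bat, field = 20.0, 5.0
with_player = win_pct(AVG_RUNS + bat, AVG_RUNS - field, G)
with_replacement = win_pct(AVG_RUNS - REPL_PENALTY, AVG_RUNS, G)
print(f"wins above replacement: {(with_player - with_replacement) * G:.1f}")
print(f"exponents at 9 RPG: port {pythagenport_exponent(9):.3f}, pat {pythagenpat_exponent(9):.3f}")
```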


#13    Tangotiger    2009/02/20 (Fri) @ 12:37

I will make my comments on the BPro book when I read it.  For now, I’ll comment on Patriot’s comments.  Clay is right to suspect that the replacement-level player is not an average fielder.  He is indeed a slightly below-average fielder.  Whichever way Clay arrives at -22.1 runs per 162 games, I’m happy, seeing that I use -22.5.  Anything in the -20 to -25 range is fine with me.

***

I’ll reserve comments on the positional adjustments (which seem to be based on the offense at that position) until I read it, especially as it pertains to the DH.

Are there any league adjustments?

***

I told Clay last year about the name of Pythag, he agreed with us, and he cc:ed the editor to that effect.  If it is as Patriot says, then someone dropped the ball.  Though it was certainly not Clay.

I will note that while the name here is the old one:
http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=136

There are two instances where the term was changed, here:
http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=492
and here:
http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=490

Looks like some effort has been made to make the corrections.


#14    2009/02/20 (Fri) @ 13:01

If you look at the translations on this page, http://www.baseballprospectus.com/statistics/pageEasreg.php, Wieters’ translation is slightly lower.  Wieters’ PECOTA projection in general is pretty insane.  They are basically saying that Wieters is one of the top 5 hitters in baseball right now.  If you look at his similar players, he has a similarity index of 0.  I’m not sure a player has ever gotten that low before.  Some of the players on his list, while pretty much meaningless for his projection given the low similarity score, are pretty interesting: Evan Longoria, Darryl Strawberry, Ken Griffey, Prince Fielder, and Albert Pujols, for example.

Rally did have a more reasonable sounding .354 wOBA translation for him.


#15    Dan Rosenheck    2009/02/20 (Fri) @ 13:49

I stand by my assertion that the Wieters forecast is a mistake and will be fixed.


#16    Patriot    2009/02/20 (Fri) @ 14:15

Scrap my comments on my point #2.  His position adjustments are fine within the construct of his system, and it was me who was missing something (the -22 runs is versus an average hitter at the position, not an average overall hitter.  Just a misinterpretation on my part). 

Except for the pitchers.  He’s assuming that a replacement level pitcher is an average fielder but also a bad hitter.  It would make more sense to assume they are an average fielder and hitter.


#17    Tangotiger    2009/02/20 (Fri) @ 14:21

Patriot, what is his replacement level pitcher, and does he distinguish between starter/relief roles?


#18    David Cameron    2009/02/20 (Fri) @ 14:28

In the Johan example, he claimed a 5.75 ERA as replacement level.  If he’s using a different mark for AL/NL or starter/reliever, he didn’t say it.  I think we have to assume that he probably is; it was one of those things that caused me to shake my head when I finished the article and say “really, that’s your explanation?”


#19    philly    2009/02/20 (Fri) @ 14:33

re: #11 and changing PECOTAs

Last year BP released a second set of PECOTAs right before the start of the season, and the change incorporated was to factor defensive support into the pitcher projections.

The example of Miller’s projected ERA going up by ~0.5 runs in front of the bad Marlins defense is an example of that change, not an error.  Presumably his translated stats were all the same.

That won’t change the Wieters forecast at all, but a lot of pitchers will have their projected ERAs move quite a bit.


#20    Patriot    2009/02/20 (Fri) @ 14:34

The replacement-level pitcher allows 1.25 runs more than average, which is 128% of the league average ((4.5 + 1.25)/4.5 = 1.28), which IIRC is right around where you set it for starters.

His example is for Santana, but he doesn’t mention anything about a separate level for relievers.  He does say that relievers’ runs saved are multiplied by their Leverage (I assume the Woolner version since it’s BP and it doesn’t state otherwise).  So it seems as if closers will be overvalued by getting both the leverage adjustment and the same replacement level.
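To see why closers could come out overvalued, here is a hedged sketch of the calculation as described above: one replacement level (5.75 runs per nine) for every pitcher, with a reliever's runs saved multiplied by leverage. The pitcher lines and the leverage index of 1.8 are hypothetical, and the alternative reliever baseline of 4.75 is purely illustrative, not a figure from BP or Tango.

```python
LEAGUE_RA9 = 4.50                    # runs per 9 innings in the translated 9 RPG context
REPLACEMENT_RA9 = LEAGUE_RA9 + 1.25  # 5.75, i.e. (4.5 + 1.25) / 4.5 = about 128% of average

def runs_above_replacement(ip: float, ra9: float, leverage: float = 1.0,
                           repl_ra9: float = REPLACEMENT_RA9) -> float:
    """Runs saved vs. a replacement pitcher over `ip` innings, multiplied by leverage."""
    return (repl_ra9 - ra9) * ip / 9 * leverage

# Hypothetical pitchers, all measured against the single 5.75 baseline
starter = runs_above_replacement(ip=200, ra9=3.50)
closer = runs_above_replacement(ip=70, ra9=2.50, leverage=1.8)
# Same closer against a separate, lower (harder to beat) reliever baseline; 4.75 is illustrative only
closer_own_level = runs_above_replacement(ip=70, ra9=2.50, leverage=1.8, repl_ra9=4.75)
print(f"starter {starter:.1f}, closer {closer:.1f}, closer vs. reliever level {closer_own_level:.1f}")
```

With one shared baseline, the 70-inning closer lands almost level with the 200-inning starter; a separate reliever baseline removes much of that.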

If pitcher WARP was listed in the book, I would look up Rivera, Papelbon, etc. and see how many WAR he figured them at, but WARP is only listed for position players.  For pitchers they have the Support-Neutral and/or Win Expectancy based values.

As another note, they list both VORP and WARP for position players; they still can’t pick one or the other, apparently, although now the only real differences are WARP including defense and using EqR instead of MLV.  Prior to Clay’s changes, the baselines were very different of course.


#21    Patriot    2009/02/20 (Fri) @ 14:36

David C./18: He specifies at the beginning that he is using translated stats, so everything has been converted to a 9 RPG environment.


#22    Tangotiger    2009/02/20 (Fri) @ 14:53

I can only think that “VORP” is used because it’s a popular term.  If they take it out, it would be “hey, what happened to VORP?”.  Is it possible that VORP is actually RARP?  That is, VORP+fielding+replacement = WARP?

Yes, I set replacement-level for starters at 128% or so of overall league average.  Just doing it now… if the average is 4.50 total runs per 27 outs, then my replacement-level is 5.80.  I’d be fine with anything between 5.50 and 6.00.


#23    Dan Rosenheck    2009/02/20 (Fri) @ 17:14

That seems rather low for a SP replacement level, Tango.  If I recall, you have average non-closer relievers as .410 starters, and replacement relievers as .470 relievers.  But won’t those replacement relievers by definition pitch the lowest-leverage innings, at about 0.6 or so, with everyone else moving up a notch?  That would make the SP replacement level .410-((.5-.47)*.6), or a .392 winning percentage, instead of .380...although I suppose there’s a limited supply of low-leverage innings out there, isn’t there?


#24    Tangotiger    2009/02/20 (Fri) @ 17:55

The overall average reliever is a .520 when pitching as a reliever.  That probably breaks down as .620 for the ace, .540 for the setup guy, .470-.510 for the rest of the bullpen.  Take off .090 for everyone, and you get the rest of the bullpen at .380-.420 as a starter.
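The arithmetic in #23 and #24 can be written out in a few lines. The .090 relief-to-starter penalty and the role breakdown are Tango's figures from #24; the leverage-adjusted alternative is Dan's formula from #23. The function names are hypothetical.

```python
RELIEF_TO_START = 0.090  # Tango's estimate: a reliever's W% drops about .090 as a starter

def as_starter(relief_wpct: float) -> float:
    """Convert a W% earned in relief to its equivalent as a starter."""
    return relief_wpct - RELIEF_TO_START

# Tango's rough breakdown of bullpen roles (W% as a reliever)
bullpen = {"ace": 0.620, "setup": 0.540, "rest (low)": 0.470, "rest (high)": 0.510}
for role, wpct in bullpen.items():
    print(f"{role:12s} relief {wpct:.3f} -> starter {as_starter(wpct):.3f}")

def leverage_adjusted_sp_replacement(base: float = 0.410, repl_reliever: float = 0.470,
                                     low_leverage: float = 0.6) -> float:
    """Dan's alternative from #23: the replacement reliever's shortfall from .500,
    scaled by the low leverage (about 0.6) of the innings he would pitch."""
    return base - (0.5 - repl_reliever) * low_leverage

print(f"leverage-adjusted SP replacement level: {leverage_adjusted_sp_replacement():.3f}")
```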


#25    2009/02/20 (Fri) @ 18:29

Since Wieters has been mentioned several times here, I thought I’d compare Clay’s numbers to mine.

The link Victor gives in #14 is to a page where Clay has a straight MLE for one league at a time, so it only has Wieters’ stats for Bowie in AA. When you add that to Clay’s translation for Wieters at Frederick, the combined line is 306/398/528.

Checking a few other players, Clay translates below AAA more pessimistically than I do.

But when you look up his PECOTA card, they list 301/396/513 at Frederick and 349/436/627 at Bowie, both well above Clay’s page, but very close to what I have for the single-season MLE. They must be using different engines, or some correction.

My equivalent single-season unregressed MLEs (AVG/OBP/SLG, wOBA)
Year  Level  AVG  OBP  SLG  wOBA
2005  NCAA   285  362  451  356
2006  NCAA   273  362  463  361
2007  NCAA   276  366  452  359
2008  A/AA   331  411  561  416

My projections (AVG/OBP/SLG, wOBA)
Year  Level  AVG  OBP  SLG  wOBA
2007  NCAA   265  345  429  340
2008  BAL    298  376  495  377

That falls between PECOTA’s 25th and 40th percentiles, so now they are adding in even more optimism. In research for an upcoming article on the accuracy of projections, I found that about 30% of players sustained a one-year jump into a second year, while 70% went back to where they were before.

If I take .30*.416+.70*.360 it comes to .377 - wow, never tried that before, but it came out the same as my projected wOBA, so I guess that’s Oliver’s weighted mean.
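The back-of-the-envelope check above is a two-outcome weighted mean. A minimal sketch, using the 30/70 split from the research mentioned above, the 2008 jump (.416 wOBA), and .360 as the pre-jump level, as in the comment; the function name is hypothetical.

```python
def blended_woba(jump_woba: float, baseline_woba: float, p_sustain: float = 0.30) -> float:
    """Weighted mean of 'the jump holds' and 'back to the old level' outcomes."""
    return p_sustain * jump_woba + (1 - p_sustain) * baseline_woba

print(round(blended_woba(0.416, 0.360), 3))  # 0.377
```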


#26    Rally    2009/02/21 (Sat) @ 20:57

Wieters looks completely realistic if you compare it to their Cristian Guzman projection - a .323 batting average, 3rd best in the game after Pujols and Jones.


#27    Dan Rosenheck    2009/02/21 (Sat) @ 21:55

I am convinced we are going to see a whole new set of PECOTAs sometime soon.  These are just obviously buggy.  I hope they come out before my fantasy draft!!


#28    Colin Wyers    2009/02/21 (Sat) @ 23:49

Looking at the PECOTA card for Wieters and the DTs, about the only thing I’m left to conclude is that they’re listing the PEAK translations, not the regular translations. Which leads me to believe that’s what was inputted into PECOTA, as well.

Looking at Tyler Flowers’ card - yep, same thing. His PECOTA card has an EqA at A+ of .291, which matches the peak translation:

http://www.baseballprospectus.com/statistics/pageCarpeak.php

About 20 points above his regular translation.

I think you have to take any PECOTA forecast that includes minor-league translations and just throw them out. They’re damaged goods.


#29    Colin Wyers    2009/02/22 (Sun) @ 00:13

This doesn’t seem consistent - Teagarden’s stats seem to match up with the regular translations. Towles’ stats don’t seem to match anything he did in 2008.


#30    Rally    2009/02/22 (Sun) @ 13:59

Dan, I think I’ve taken care of any bugs in my projections, and if Pecota is any more accurate than mine I’ve yet to see the evidence of it.

Percentiles, multiyear forecasts, player valuation, it’s all here. Click on the website link.


#31    Dan Rosenheck    2009/02/22 (Sun) @ 18:10

Rally, I take a weighted average of projections, so I need “fixed” PECOTAs as well as your CHONE, ZiPS, and Marcel.  I’ve sent you a few emails about your projections; have they not reached you?


#32    Rally    2009/02/22 (Sun) @ 22:24

Dan, the last email I got from you was on 2/10 regarding the position adjustments.  I don’t have a spam filter on my end.


#33    Zach    2009/02/23 (Mon) @ 22:01

Nate Silver talked to ESPN’s Gene Wojciechowski about PECOTA and more (click my name). Woj. noted how PECOTA offered a very generous projection for Matt Wieters near the end, and Nate agreed and didn’t say anything about a data error.


#34    Tangotiger    2009/02/24 (Tue) @ 10:10

Post 33 was marked for moderation and is now open.


#35    2009/02/24 (Tue) @ 13:08

If they are using the “Peak Projections” Clay has a very steep aging curve. Someone who is 21 or 22 projects quite a bit above the regular translation, while a 25 or 26 will have crappy projections.


#36    Tangotiger    2009/02/27 (Fri) @ 10:58

Patriot/12: “4. They still say ‘Pythagenport’ record in the team boxes.  If they are using Pythagenport, that’s fine, but the website glossary makes this very confusing.”

Ok, I read one of the intro chapters, and they talk about Pythagenport, highlighting that it’s Clay’s and making no mention of Patriot.  I was particularly annoyed at reading that paragraph.  Since Clay told me last year that he agreed with me completely, and that he cc:ed the editors to that effect, it’s even more annoying to see this disregard (blatant, lazy, or whatever) on the part of the editors (and writer) on this issue.

It’s not like there’s a point of disagreement here.  We all agree on the facts, and Clay thought it important enough to alert the editors and cc: me last year.

Anyway, I brought it to Clay’s attention once more.

***

And being editor is a thankless task.  I know.  The three of us spent the better part of a month simply correcting every little typo.  It was unreal, actually.  We had already done a proofread on a chapter-by-chapter basis.  We were happy with that.  Then, when I merged all the chapters together, we saw over 1000 typos that we had simply missed in the first several read-throughs.  Correcting typos after the thing had been typeset already was the most depressing part of the book, enough that I don’t want to write another book.  We corrected those 1000 typos, and then we found yet another 300 more typos.  And then another 100.  At some point, we just had to stop looking for errors.  And guess what.  As soon as it was published, we found a typo on the very first page (than/then).

So, I feel for the editors here.  At the same time, this has been documented, and has been brought to their attention.  It simply should not have slipped through the cracks once more.


#37    Tangotiger    2009/02/27 (Fri) @ 14:27

Patriot: I thought I understood the new baseline for Clay, but maybe I don’t.

Let’s look at SS and CF, which will be my biggest source of disagreement.

He gives them both a baseline comparison level of .251 EqA.  And, if you presume that in 2008 SS hit a (translated) .245 and CF were .260, then isn’t Clay saying that the overall average SS (off+def) will be a bit worse than the average player and the CF will be a bit better than the average player?  (In 2008 anyway.) That gap looks like 10 runs.

In my case, I do have the CF being better than SS in 2008, but by 5 runs.


#38    Patriot    2009/02/27 (Fri) @ 16:08

Good catch.  I have to admit I didn’t look too closely at the specific positional adjustments.  For reference, here are all of the baselines and their equivalents in terms of relative runs/out (a .260 EqA is average and converts to .1724 runs/out, with R/O = 5*EqA^2.5):

C--.234, 76.8%
1B--.274, 115
2B--.255, 95.2
3B--.261, 100.9
SS--.251, 91.5
LF/RF--.267, 106.8
CF--.251, 91.5
DH--.285, 125.8
P--.125, 16

Compared to the long-term positional averages, the ones that stick out are CF (much lower) and corner outfield (a little lower), SS (a little higher) as I’m sure everyone can see for themselves.  Of course Clay is attempting to incorporate the position’s defensive value, so we wouldn’t expect them to match perfectly, nor should they. 

In ’08, the CFers actually had a .268 EqA and the SSs a .255. So per 450 outs, the average ’08 CF is +6 and the average SS is -4, a 10-run difference, just as you said.

So unless we’re both missing something, he is in fact saying the average CF is ten runs better than the average SS, at least in ’08.  And the difference would only increase over a longer timeframe, because according to EqA the center fielders’ R/O was 13% higher than the shortstops’.  For 1992-2001, the CFs were at 102% of the league average (in terms of ERP, which of course should be close to what EqA would say) and the SSs were at 86%, a 19% edge for CF.
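The conversions in #38 follow from the stated formula R/O = 5*EqA^2.5. A minimal sketch that rebuilds the relative runs/out list and the 2008 CF-SS gap per 450 outs; the gap does not depend on the shared .251 baseline, since it cancels. Computing the league rate exactly rather than rounding it to .1724 shifts a few percentages by 0.1. For 1B, .274 comes out near 114% rather than 115%, so one of those two figures is probably rounded or mistyped.

```python
LEAGUE_EQA = 0.260

def runs_per_out(eqa: float) -> float:
    """EqA to runs per out, as given in #38: R/O = 5 * EqA^2.5."""
    return 5 * eqa ** 2.5

def relative_ro(eqa: float) -> float:
    """Runs per out as a percentage of the league-average (.260 EqA) rate."""
    return 100 * runs_per_out(eqa) / runs_per_out(LEAGUE_EQA)

baselines = {"C": .234, "1B": .274, "2B": .255, "3B": .261, "SS": .251,
             "LF/RF": .267, "CF": .251, "DH": .285, "P": .125}
for pos, eqa in baselines.items():
    print(f"{pos:6s} {eqa:.3f} {relative_ro(eqa):6.1f}%")

# 2008 actuals: CF .268 EqA, SS .255 EqA
OUTS = 450
gap = (runs_per_out(.268) - runs_per_out(.255)) * OUTS
ratio = runs_per_out(.268) / runs_per_out(.255) - 1
print(f"CF minus SS per {OUTS} outs: {gap:.1f} runs; CF R/O is {ratio:.0%} higher")
```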


#39    Tangotiger    2009/02/27 (Fri) @ 17:18

WARP specific commentary can be placed here:
http://www.insidethebook.com/ee/index.php/site/comments/the_new_warp/

