THE BOOK--Playing The Percentages In Baseball


Tuesday, March 16, 2010

The Tom Seaver Rule

By Tangotiger, 02:04 PM

THT:

Note that even after fixing the pitcher MLEs, Stephen Strasburg still projects for a 2.86 ERA this season. Wow.

Strasburg is going to be 22 years old this year.  And his forecast is to be at about 66% of the league average.  In the 2010 Hardball Times Annual, I said:

Let’s focus on all those pitchers with at least 3000 batters faced between the ages of 25-28 [born between 1922 and 1971].  There’s 201 of these pitchers.  At that age, Pedro Martinez allowed a total of 3.01 runs per 9 innings pitched (both earned and unearned).  The league average during his time was 4.92.  Pedro allowed runs at 61% of the league average.  We’ll call this metric the Runs Allowed (RA) Index.  Pedro’s figure is the lowest for the time period we’re discussing.  Tom Seaver is next at 65%, followed by Greg Maddux also at 65%.  Fourth in line is Kevin Appier (!) at 67%.  Rounding out the top ten: Whitey Ford, Robin Roberts, Jose Rijo, Roger Clemens, Jim Palmer, and Billy Pierce.
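As a quick check on the arithmetic in that excerpt, here is a minimal sketch of the RA Index as it is defined there: runs allowed per 9 innings divided by the league rate. The function name `ra_index` is made up for illustration.

```python
def ra_index(pitcher_ra9, league_ra9):
    """Runs allowed per 9 IP relative to the league rate (lower is better)."""
    return pitcher_ra9 / league_ra9

# Pedro Martinez at ages 25-28, using the figures quoted in the excerpt.
pedro = ra_index(3.01, 4.92)
print(f"Pedro RA Index: {pedro:.0%}")
```

This reproduces the 61% figure quoted for Pedro.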

Four points:
1. If Strasburg has a MEAN forecast of 67% of the league average in runs allowed, and we have A LOT of uncertainty about this, then his actual true talent assessment is somewhere between 50% and 85% of the league average.

2. To the extent that Strasburg is actually a 67% pitcher, that puts him in the running for 2nd best pitcher over the last 70 years for pitchers aged 25-28, a list that includes Greg Maddux, Roger Clemens, Tom Seaver, and the underappreciated Kevin Appier.  Except those guys did that at the age of 25-28, while Strasburg is going to be 22, and presumably will get better by the time he hits his 25-28 stride.

3. You can’t possibly make that kind of bet can you?  Isn’t it better to say that the maximum potential upside for ANY non-MLB pitcher ever, past, present, or future, is Tom Seaver?  Isn’t it reasonable to say that?  Isn’t it better to say that Strasburg’s runs allowed talent is a 65% - 100% pitcher of league average, with a mean forecast of close to 80%?  Basically, you give me the best college or Japanese performance ever, and I say that the UPSIDE forecast (two standard deviations from his mean forecast) for that pitching line cannot be better than Tom Seaver.

4. Regression, regression, regression.


#1    David Gassko      (see all posts) 2010/03/16 (Tue) @ 14:47

Hey Tom,

I’m going to have to disagree here. We forecast Strasburg to have one of the best ERAs in the NL, sandwiched in between Chris Carpenter and Tim Lincecum. This is the real comparison point, then. Is that projection realistic? I think so.

First of all, recall that pitchers don’t really improve much if at all as they age. What that implies is that a 22 year-old pitcher is likely at or near the peak of his abilities. Carpenter is almost certainly past his peak; Lincecum, too. So is it unrealistic to say that Strasburg at his peak will be a little worse than Carpenter and Lincecum at theirs?

I don’t see why not. After all, he has posted absolutely INSANE college numbers, better than anyone we’ve ever seen as far as I know. So statistically, he is likely the greatest college pitcher of the modern era, if not all-time. On top of that, scouts RAVE about Strasburg: every scout who has seen him places him near the top of pitchers they’ve observed. So the stats and scouts agree: Strasburg is one of the best pitchers to come along in many years.

And again, remember that unlike with a hitter, this is NOT a question of projectibility. Hitters coming out of college are prospects—they have to learn and improve before they can jump to the major leagues. But pitchers are not necessarily so. As far as I’m concerned, Strasburg is already a top major league pitcher—not in terms of value already accumulated obviously, but in terms of what I think he is capable of doing in 2010 and beyond. The only question that remains with him is durability (and obviously that’s a big question). But if Strasburg can prove durable, there’s no reason to believe that he will be anything other than one of the best pitchers in baseball.


#2    Brian Cartwright      (see all posts) 2010/03/16 (Tue) @ 14:50

and quite possibly more regression is necessary, and I will be researching that further asap.

In another thread I made a comparison between Strasburg and Prior. They had very similar college stats, and similar projections. Prior did meet his projection before his arm fell off.

 ERA   BH   HR   BB   SO
2.82 .309 .028 .057 .289


#3    David Gassko      (see all posts) 2010/03/16 (Tue) @ 14:51

Oliver says that Carpenter, Strasburg, and Lincecum all project for nearly equivalent ERAs. In other words, the odds that Strasburg will have the best ERA in that group are 3-to-1. If you think that’s ridiculous, Tom, why don’t you give me 4-to-1 odds on that bet? I’ll take it.


#4    berselius      (see all posts) 2010/03/16 (Tue) @ 14:52

To reinforce David’s point, don’t forget about Mark Prior’s debut too. I know - just one data point - but Prior and Strasburg seem pretty similar in hype coming out of college. It’s not *unreasonable* that Strasburg could post those kinds of numbers.


#5    Rally      (see all posts) 2010/03/16 (Tue) @ 14:52

I looked at pitchers from age 19 to 24.  That’s Strasburg’s group, great young pitchers.  We have no idea if he’ll even be able to lift his arm when he’s 25-28.

Some notables (ERA+):
Oswalt 153
Lincecum 141
Clemens 141
Seaver 140
Prior 133
Gooden 132

67% of league would mean Strasburg comes in at 150.  Yeah, that is hard to believe, but in Brian’s defense, Strasburg has better college numbers and better scouting reports than Lincecum or Prior.  The pre-professional data says this is a once in a generation pitcher.  Depends on how far you think you can trust that data.


#6    Mike Fast      (see all posts) 2010/03/16 (Tue) @ 14:57

It’s also worth noting that PITCHf/x data indicates he’s pretty special, too.  His average fastball is something like 98 mph, and that’s unheard of for a starting pitcher.  We know that 98-mph fastballs have a significantly lower BABIP than the typical fastball.

One of these days, I’m going to get my equivalent to Jeremy’s fxRV system going, and then I can put a number on what I think his stuff says about his run prevention talent.


#7    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 15:03

If Prior is the comp, then he gave up runs at 75% of the league average, according to Rally’s chart.

Also, and I can’t say this enough: we are cherry-picking AFTER THE FACT.

When you come up with these lists, as I did with mine, and we come up with a range that says that the best OBSERVED rate was 70% of the league average, this INCLUDES a lot of good luck and a little bit of bad luck (i.e., more good luck than bad luck).

So, I see that list that Rally put up, and you have to come up with something worse than 70%.

Do you know how much worse?  That’s so very easy to do.  You come up with the 10 best ERA+ performances of pitchers at age 20-22, and then tell me what his ERA+ was at age 23.  And THAT becomes your absolute best-guess point for a pitcher’s talent.  And no better.  None.

This is regression, regression, regression.


#8    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 15:13

Ok, I ran this:

Spanning Multiple Seasons or entire Careers, From 1901 to 2009, From Age 20 to 22, (requiring birth_year>=1922, birth_year<=1986 and BFP>=1500), sorted by greatest Adjusted ERA+

The top 10, and how they did at age 23:
ERA+ (age 20-22)  Player          ERA+ (age 23)
151               Vida Blue       108
149               Dwight Gooden   102
139               Britt Burns     100
134               Bert Blyleven   142
132               Sam McDowell    120
130               Dave Rozema     105
128               Ralph Branca    94
125               Frank Tanana    154
123               Gary Nolan      103
121               Don Drysdale    139
                  AVERAGE         114

This is just like my Pujols/HR article.  The true talent range should be 70% to 100%, and we’ll get an observed range of 60% to 110%. 

Or, in ERA+ terms, we’ll observe ERA+ of 90 to 167.  And what did we observe?  ERA+ of 94 to 154.

If you want me to limit it to age 21-22, I’ll do that next.
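The back-and-forth between "percent of league average" and ERA+ in this thread can be sketched as a simple reciprocal. This is a simplification and an assumption on my part: ERA+ as published is park-adjusted and uses earned runs only, so the conversion is approximate. The function names are made up for illustration.

```python
def index_to_era_plus(index):
    """Convert a runs-allowed index (0.67 = 67% of league) to an ERA+-style number."""
    return 100.0 / index

def era_plus_to_index(era_plus):
    """Convert an ERA+-style number back to a fraction of league average."""
    return 100.0 / era_plus

# Observed range of 60% to 110% of league average, in ERA+ terms.
for idx in (0.60, 1.10):
    print(f"{idx:.0%} of league -> ERA+ of about {index_to_era_plus(idx):.0f}")

# Prior's 133 ERA+ from the earlier list, as a fraction of league average.
print(f"ERA+ 133 -> {era_plus_to_index(133):.0%} of league")
```

The 60% end works out to about 167. The 110% end is 90.9, which the comment writes as 90.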


#9    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 15:34

Prior’s coming out party was his 2nd season, not his first.  Same for Dwight Gooden.  Vida Blue’s coming out party was his first full-season, but that’s after 80 innings in two prior seasons.  Seaver in his 2nd season.  Clemens was in his 3rd season.

***

David,

I’ve got Lincecum and Carpenter at between 67% and 70% of the league average.  So, yeah, I don’t think that a pure-rookie pitcher can match that.

I think it’s completely unreasonable to forecast a rookie pitcher to be in the running for the Cy Young.

Let’s say that Lincecum’s true forecast is 62-72%, and Carpenter’s true forecast is 65-75%, and Strasburg is 65%-95%.  We should observe Lincecum at, what 57-77%, Carpenter at 60-80%, and Strasburg at 60-100%.

Now if all three were equals, the odds of Strasburg showing the better ERA is 2:1 (one in three).  If you want to do the bet at 2:1, I’ll take that.

What are the fair odds, given the distribution I have shown?  Can someone work that out?  I’m going to guess 4:1 or 5:1?  Whatever that is, that would be the fair odds.
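One way to take a crack at the fair-odds question is a sketch like the one below. It makes a big assumption that is not in the comment: each pitcher's observed result is treated as uniformly distributed over the quoted range (Lincecum 57-77%, Carpenter 60-80%, Strasburg 60-100%) and the three are independent. A bell-shaped distribution would give a different answer. The function names are made up.

```python
def survival_uniform(x, lo, hi):
    """P(X > x) for X uniform on [lo, hi]."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def prob_best(target, rivals, steps=100_000):
    """P(target posts the lowest index), by midpoint-rule integration.

    target and each rival are (lo, hi) ranges, assumed uniform and independent.
    """
    lo, hi = target
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        p = 1.0
        for r_lo, r_hi in rivals:
            p *= survival_uniform(x, r_lo, r_hi)
        total += p
    # Uniform density 1/(hi - lo) times step width (hi - lo)/steps is 1/steps.
    return total / steps

p_strasburg = prob_best((60, 100), [(57, 77), (60, 80)])
odds_against = (1 - p_strasburg) / p_strasburg
print(f"P(Strasburg best) = {p_strasburg:.3f}, fair odds about {odds_against:.1f}:1")
```

Under those assumptions the chance comes out near 13%, or a bit under 7:1 against, so somewhat longer than the 4:1 or 5:1 guess. If all three ranges are identical, the same function returns one in three, matching the 2:1 figure.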


#10    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 15:44

Brian: how about this.  Can you tell me what MLB performance would be needed at ages 19 and 20 and 21 in order to get a forecast you are giving Strasburg at age 22?

That is, let’s say we have this pitcher in a 4.30 league ERA:
Age IP ERA
19 140 3.20
20 170 2.90
21 200 2.70

Let’s say you have that.  What’s your forecast for him at age 22?  That is, what kind of pitching lines at age 19-21 do we need to see in order to get the forecast we are seeing for Strasburg at age 22?

Now, chop all the innings in half:
Age IP ERA
19 70 3.20?
20 85 2.90?
21 100 2.70?

How much lower do you have to make the ERA in order to get it to match to Strasburg’s forecast at age 22?


#11    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 15:57

Also, Brian, I’d like to see your 10th and 90th percentile forecasts for Strasburg, Lincecum, and Carpenter. 

(The range better be much wider for Strasburg than for Lincecum.)


#12    Guy      (see all posts) 2010/03/16 (Tue) @ 16:05

What I think you have to account for here is the rather large uncertainty we have about the quality of opposition Strasburg faced in his 28 starts in the Mountain West Conference.  I have to think that there is quite a bit of variance within the conference, in terms of teams he faced, and a fair amount of variance over time in terms of the quality of the league—much more so than, say, a AAA league.  Historical data on college performance may not give you a good read on these teams in this league in particular seasons.  And this variance doesn’t just increase the error bar on his forecast, it means we have to lower his mean estimate, because his great performance likely means he faced weaker opponents than is typical.

So I come down with Tango:  I can’t imagine how you could ever know enough about his opposing hitters—virtually none of whom have played professionally—to project such an elite performance in MLB.


#13          (see all posts) 2010/03/16 (Tue) @ 16:14

I cry Wieters.


#14    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 16:28

Interesting tidbit looking at his stats.  He has 109 IP with 195 SO and 61 non-HR hits.  109 IP means 327 outs.  Take away the 195 and that gives us 132 outs.  Let’s say 125 are batting outs.  With 61 non-HR hits, that makes his BABIP of .328.  He also walked 5% of the batters he faced.

I don’t know how that compares to the average college pitcher.
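The BABIP estimate above can be laid out step by step. The 125 batted-ball outs is the comment's own guess (132 non-strikeout outs, less a few outs made on the bases or otherwise off the bat); the variable names are made up.

```python
innings = 109
strikeouts = 195
non_hr_hits = 61

total_outs = innings * 3               # 327 outs recorded
non_k_outs = total_outs - strikeouts   # 132 outs that were not strikeouts
batted_ball_outs = 125                 # guess from the comment, not a measured number

# BABIP = non-HR hits / balls in play (non-HR hits plus batted-ball outs).
babip = non_hr_hits / (non_hr_hits + batted_ball_outs)
print(f"BABIP = {babip:.3f}")
```

This gives .328, matching the figure in the comment.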


#15    J. Cross      (see all posts) 2010/03/16 (Tue) @ 16:28

I think Guy makes a good point about the strength of competition.  A Strasburg or Colby Lewis actually influences how we see the strength of their leagues based on how good they were.

What are we regressing Strasburg to?

The average pitcher might have 6.8 K/9, but a right-handed starting pitcher with an average fbv of 96 mph is more like 8 K/9.  I think I’d reject any projection that has him close to 8 K/9 as regressing too much or to the wrong mean.

We need to take into account that he’s the #1 pick.  And not a Luke Hochevar kind of #1 pick, more like a Justin Upton automatic #1 pick.


#16    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 16:43

By the way, who did the scouts love more:

- Strasburg facing 400 batters in his last year of college at age 20/21

- Dwight Gooden facing 800 batters at age 18/19 in his last year in A ball (when he struck out 300)

I don’t think the “best college evah” means that you get to forecast him as something better than Mark Prior.  And, like I said, Mark Prior was at 75% the league average his first year anyway.

***

You have to think in terms of the uncertainty.  You have to.  If you give Strasburg the same mean as Lincecum+Carpenter, then that means the uncertainty level is wider around Strasburg.  Which means it’s plausible that he’s much better than they are.

Even Wayne Gretzky, the greatest hockey player ever, who won the MVP in each of his first 8 NHL seasons, even The Great One would not have been forecasted to be the best player in the league in his rookie year.  And the NHL is a league where a rookie is often enough the best player on his team.

It becomes the absolute limit of any forecasting system to forecast anyone to be the best in his league in his rookie year.  And that means the upside, two SD from the mean, becomes the limit.

And that’s why you should stick with The Tom Seaver Rule.


#17    Mike Fast      (see all posts) 2010/03/16 (Tue) @ 16:54

Strasburg’s average fastball speed (in the AFL) was 98 mph, which is substantially better than the 96 mph group that Jared quotes as striking out 8/9ip.

Fastest fastballs by starting pitchers over the last 8 years:
2002 A.Burnett 94.9 (Mark Prior threw 93.8 mph)
2003 K.Wood 94.9
2004 A.Burnett 95.4
2005 D.Cabrera 96.2
2006 F.Hernandez 95.2
2007 U.Jimenez 95.8
2008 U.Jimenez 94.9
2009 U.Jimenez 96.1

Basically, no starting pitcher throws remotely as hard as Stephen Strasburg does.  How do you handle that fact when regressing to the mean?  What mean?


#18    David Cameron      (see all posts) 2010/03/16 (Tue) @ 17:09

To be fair, we haven’t seen Strasburg pitching every five days yet, so we don’t really know his “major league” velocity.  In college, he pitched every Friday.  In the AFL, his work was limited.  This spring, his work has been limited. 

Put him on an every five days schedule, I’d bet he’ll settle in closer to 96-97 as an average.  Which is still amazing, but a little less amazing. 

I’m probably more towards agreeing with Tom on this, but I do think there’s something to be said for regressing Strasburg to a non-normal mean.  His stuff comps are guys like Prior, Wood, Felix, and Gooden, who were all tremendous very early in their careers.

Best in the league? Too optimistic.  It seems Oliver is just not regressing non-major league performances enough.  But Strasburg is going to be very good, very quickly.


#19    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 17:21

"His stuff comps are guys like Prior, Wood, Felix, and Gooden, who were all tremendous very early in their careers."

I have no problem accepting that Strasburg’s true talent level, as a rookie, is that of what Prior, Wood, Felix, and Gooden’s were as rookies.

But like I said, NO ROOKIE can be forecasted to be the #1 pitcher in his league.  The reason is because you have such a huge uncertainty around his true talent, unlike the much smaller uncertainty you have of Lincecum, that if you forecast them with the same mean, then you naturally give Strasburg upside, in his rookie year, much higher than Lincecum’s 2010.

This is the point: forecast me their 10th and 90th percentiles.  And once you do that, then try to explain that.

***

Mike: what Dave said.  Why did you not list relief pitcher velocities?

Also, how much movement is there on Strasburg’s fastballs, compared to those starting pitchers, and whatever relief pitchers will make your list?


#20    Mike Fast      (see all posts) 2010/03/16 (Tue) @ 17:36

I pretty much agree with everything in David/18, including his guess that he’d probably settle in around 96-97 mph on average in a major league schedule.  Nobody has done that in the last eight years.

However, I don’t think we should denigrate his AFL velocity readings too much.  He was throwing every fifth game there, although not very many innings altogether.  Here was his workload and when we got PITCHf/x velocity readings:

10/16 - 3.1 inn, 11 bf
10/22 - 2.2 inn, 16 bf (avg FB 97 mph)
10/27 - 4.1 inn, 16 bf
11/2 - 5.0 inn, 19 bf (avg FB 99 mph)

So he wasn’t slowing down throwing more innings, as far as we can tell.  I honestly don’t know how to translate velocity from a couple AFL starts well after the end of his college season to how he would do through a full season of 30+ starts against major-league hitters.


#21    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 17:40

Even if he can touch 100 mph as a starter, why would he bother?  He’s going to sacrifice speed for movement, just like every other starter does.

To me, speed is irrelevant without seeing the movement numbers.


#22    Mike Fast      (see all posts) 2010/03/16 (Tue) @ 17:42

"Why did you not list relief pitcher velocities?"

Because most of them are throwing one-inning stints and are preparing as such.  That’s much different than what Strasburg was doing, even in our two-start sample.

"Also, how much movement is there on Strasburg’s fastballs, compared to those starting pitchers, and whatever relief pitchers will make your list?"

I’m skeptical we know how to measure much of anything about a pitcher’s talent from his pitch movement.  We can probably tell you with some reliability whether he will be a groundballer or a flyballer, but in terms of talent, I’m not convinced we know that much.


#23    Mike Fast      (see all posts) 2010/03/16 (Tue) @ 18:42

Btw, my rule of thumb is that 4 mph of fastball velocity = one run allowed per 9ip for a starting pitcher and 2 mph of fastball velocity = one run allowed per 9ip for a relief pitcher.  It’s the best I’ve been able to determine using the crude measuring sticks I have available.  I’m willing to see that refined, and I should probably post a detailed description somewhere about how I arrived at that number.

However, if you accept my rule of thumb as accurate, and frankly I don’t know how well it applies in the rarefied air of 95+ mph starters, you’d give Strasburg roughly half a run per 9 advantage over the Prior, Wood, Felix group.

I know that’s a lot of ifs, so I don’t necessarily think I’m throwing a perfectly accurate estimate out there, but I think you need to do some sort of adjustment in the ballpark of what I’m talking about.
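The rule of thumb above is linear, so it can be sketched in a few lines. The 2 mph gap used in the example is my assumption: it is the gap implied by "roughly half a run" for a starter, not a number stated in the comment. The names are made up.

```python
# Rule of thumb from the comment: mph of fastball velocity per one run allowed per 9 IP.
MPH_PER_RUN = {"starter": 4.0, "reliever": 2.0}

def run_advantage(velocity_gap_mph, role="starter"):
    """Estimated runs per 9 IP saved for a given fastball velocity edge."""
    return velocity_gap_mph / MPH_PER_RUN[role]

# Assumed 2 mph edge over the Prior/Wood/Felix group (hypothetical input).
edge = run_advantage(2.0, "starter")
print(f"Starter with a 2 mph edge: about {edge:.2f} runs per 9 IP")
```

As the comment says, nobody knows how well a linear rule holds up for starters above 95 mph, so treat the output as a ballpark only.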


#24    J. Cross      (see all posts) 2010/03/16 (Tue) @ 20:12

I’ve got each mph adding ~0.3 K/9 and reducing HR/FB by .0035, both a bit less for righties and a bit more for lefties.  That said, I didn’t control for anything else and, since these are all MLB pitchers, the guys who don’t throw hard must have something else going for them.

Did you look at how changes in velocity from one year to the next affected individual pitchers?


#25    rwperu34@hotmail.com      (see all posts) 2010/03/16 (Tue) @ 20:19

"The Next Mark Prior"

There have been five in my memory.

David Price (how quickly we forget)
Mark Prior
Kris Benson
Ben McDonald
Andy Benes

Tim Lincecum was the sixth college pitcher taken in the 2006 draft (Hochevar, Reynolds, Lincoln, Morrow, Miller).  Max Scherzer certainly would have gone ahead as well if not for bonus demands (he got twice as much as Lincecum taken one pick later).

All that is my way of saying, there is no freaking way Strasburg has an ERA expectation of 2.86 for 2010! 4.00 would be more like it.


#26    J. Cross      (see all posts) 2010/03/16 (Tue) @ 20:34

What if we dump the college stats altogether.  Marcel projects a 4.50 but since he throws (at least) 4 mph harder than the average RH starter, we use Mike’s rule of thumb and knock off a run.  That gives us a 3.50 ERA.  Ridiculous?


#27    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 20:43

Actually, Marcel starts with the league average which is 4.3.  If you want to make that 3.3 ERA, that’s 77% of league average.  And that’s close to what I was saying.  And that’s close to Mark Prior’s rookie year.


#28    Tangotiger      (see all posts) 2010/03/16 (Tue) @ 20:52

Here, let me give you a tough one.  Ready?  You have Dwight Gooden’s stellar year in A ball.  You have his great rookie year.  And you have his coming out party in 1985.

Now, what is your forecast for him in 1986?  I want everyone to try and do it.

I’m going to guess that he’d have a forecast of about 60% of league average.

That’s about as good a forecast as you are going to give any young pitcher.

Now, how in the world are you going to give Strasburg a forecast of 66% of league average?  It’s impossible.

What you are telling me is that what he did in college facing 400 batters last year and, I dunno 1000 batters in 4 years is equivalent to Gooden’s 3 magical years, 2 of which came against MLB players, and having faced close to 3000 batters.

No.  No way.


#29    Guy      (see all posts) 2010/03/16 (Tue) @ 21:01

#25:  Actually I think it was Alan Benes, Andy’s younger brother, who was considered the greater prospect.

And going back a bit further, remember how Wilson, Pulsipher, and Isringhausen were going to be the Mets’ big 3 for years to come.

Interesting piece in today’s WaPo on Kerry Wood, the Strasburg of his time:  http://www.washingtonpost.com/wp-dyn/content/article/2010/03/15/AR2010031501566.html


#30    MGL      (see all posts) 2010/03/16 (Tue) @ 21:18

Lotta stuff here.  As Tango said, the best you can forecast a rookie player for is the combined second year of players who are selected for their outstanding first year (with no regard to how they did after that).  That is your upper bar.

As far as velocity goes, we should be comparing him to the top velocity pitchers in college and/or in minor league or AFL ball.  For example, do we know whether Felix, Jimenez, and all those other guys might have thrown harder before they got to the majors?  That is entirely possible - that he cannot sustain his current velocity in the majors for whatever reasons - heavier workload, trying to get more movement or better location against better hitters, etc.

Also, as always, it depends on what the forecast is for?  By that, I mean is it the expected numbers assuming that the player actually pitches or is it his true talent, even if he should never pitch or make only one start?  Those are 2 very different things.

Or is the projection only if “he pitches a certain number of innings?” If that is the case, you want to make your projection much higher than his actual true talent.

In general though, I agree 100% that these low projected ERA’s are WAY too low for a rookie.  Too many things can happen to make the actual performance worse and nothing can really happen to make it better.  Sure, if he pitches > 120 innings, the projection might not be far off, but for him to pitch that many IP means he got lucky at first and for a while.  If he comes up and stinks up the place, even by luck alone, he is going to get sent down or into the pen.  Any rookie who plays a lot got lucky.  Probably any rookie who even plays a little got a little lucky.


#31    rwperu34      (see all posts) 2010/03/16 (Tue) @ 21:18

#29-It was definitely Andy. #1 overall pick in 1988. I remember the hype that year started early. I was so excited when Evansville came to the Tempe regional and was completely awed when he threw a one hit shutout against an ASU team that averaged about 9 runs a game and hadn’t been held under four in any single game all season. I think Alan got more hype because he came a few years later as media was starting to explode, but Andy was definitely the better prospect coming out of college.

Another thing that comes to mind is, while it’s likely that Strasburg is as good as he’s ever going to be, that doesn’t mean he is as good as he’s ever going to be. If you put his chances at 25/50/25 of improving/steady/declining, that still means he’s got a 25% shot at getting better. The guys who are the best in the league are typically going to come from the group of pitchers who got better. Lincecum, Halladay, Sabathia, Greinke, King Felix, Haren, all of these guys have improved over time. With Sabathia and Haren the improvement has been ridiculous.


#32    tangotiger      (see all posts) 2010/03/16 (Tue) @ 22:47

I’m looking at the 2004 Marcels on my site.  Prior is at 3.15 and Pedro at 2.84.  The league average then was like today.  That puts Pedro at 66% of league average.

I’m looking at 2001 Marcels (Pedro was younger and better).  I gotta figure this is as good as it gets.  League average was 4.7, and Pedro was 2.62, or 56% of league average.  This makes his true talent at somewhere between say 50% and 62%.

So, the absolute best pitcher at his best you can safely say allows runs at 50% of the league average.  I know we OBSERVED him one year at 35%.

Cherry-picking his 1997-2003 years, and his index is 47% of league average.  Let’s just say 50% is the true talent limit.

If you want to give Strasburg a 50% index as his upside, you probably give him 100% as his downside.  The uncertainty range has to be great.  That puts him at 75% as his mean.  That’s roughly 1 run better than league average.

That’s as far as you can go.
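The chain of numbers above can be sketched as follows. The league ERA of 4.3 for 2004 comes from the "like today" remark plus the 4.3 figure given earlier in the thread. Taking the mean as the midpoint of the upside and downside assumes a symmetric range, which is my reading of the comment. The names are made up.

```python
def league_index(era, league_era):
    """Forecast ERA as a fraction of league average."""
    return era / league_era

pedro_2004 = league_index(2.84, 4.3)   # 2004 Marcel
pedro_2001 = league_index(2.62, 4.7)   # 2001 Marcel

# Strasburg range from the comment: 50% upside, 100% downside.
upside, downside = 0.50, 1.00
mean_index = (upside + downside) / 2   # midpoint, assuming a symmetric range

league_era = 4.3
runs_better = league_era * (1 - mean_index)

print(f"Pedro 2004: {pedro_2004:.0%}, Pedro 2001: {pedro_2001:.0%}")
print(f"Strasburg mean: {mean_index:.0%}, about {runs_better:.1f} runs better than league")
```

That lands on 66% and 56% for Pedro, and on a 75% mean for Strasburg, which is a little over 1 run better than a 4.3 league.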


#33    Mike Fast      (see all posts) 2010/03/16 (Tue) @ 23:01

Tom, I think your criticism is valid up to a point.  Yes, if you want to project a player, it’s not probable that any player is going to put up a season that is the best ever or among the very best ever.

But what are you suggesting that Brian do with his projection system?  Put in an artificial cap above which no player is allowed to be rated?

I think this line of criticism has more applicability for the Wieters/Montero crowd, great players/prospects both, but not of the once-in-a-lifetime talent like Strasburg.

I’ll push back the same way against rwperu/25.  None of those players had a 98-mph average fastball (with good control and good secondary pitches, too, for that matter).

I agree that if I were guessing what ERA Strasburg would put up were he to pitch in the majors in 2010, I’d probably guess something in the 3.50-4.50 range, simply because I’m not the kind of guy to go out on a limb with projections.  But I also don’t think a good projection system is going to tell you that.  Strasburg’s talent is up there with the hype.


#34    Mike Fast      (see all posts) 2010/03/16 (Tue) @ 23:07

"As far as velocity goes, we should be comparing him to the top velocity pitchers in college and/or in minor league or AFL ball.  For example, do we know whether Felix, Jimenez, and all those other guys might have thrown harder before they got to the majors?  That is entirely possible - that he cannot sustain his current velocity in the majors for whatever reasons - heavier workload, trying to get more movement or better location against better hitters, etc."

I struggled earlier today to write a reply to Tom’s assertion that pitchers will lose velocity as they play against advanced competition and sacrifice speed for movement.  I couldn’t get a good response written, so I gave up.

However, I think I can write a better response to your comments in the same vein, MGL.

I don’t know of any evidence that pitchers tend to lose velocity as they advance up the chain.  We have scant systematic evidence of the velocity of pitchers in the minors and none from college that I know of.  Maybe somebody wants to gather up the scant evidence we do have from the WBC, Futures games, spring training, etc. 

Absent that, my anecdotally gathered guess is that pitchers don’t particularly lose velocity as they advance.  Obviously some pitchers do lose velocity, and some pitchers gain velocity, and some pitchers get hurt, but I would guess that absent injury velocity doesn’t particularly suffer from advancing to a higher level.


#35    Nick Steiner      (see all posts) 2010/03/16 (Tue) @ 23:34

I think I agree with Tango here. 

While Mike raises valid points about Strasburg’s stuff and the scouts’ opinions of him that could possibly justify a sub 3 ERA projection, Oliver is blind to that.  All it knows is Strasburg’s age and his college stats.  And given the uncertainty of college stats and the uncertainty of pitchers in general, Strasburg simply has too much of a range of expected performance to be projected at a mean of 2.86.  Because projection systems don’t use scouting data, they have to be tempered with their projections of outlier players.


#36    Jared      (see all posts) 2010/03/16 (Tue) @ 23:40

Since I can’t add much to the sabermetric side of the discussion, shouldn’t Strasburg’s age in the 2010 season be regarded as 21? He won’t turn 22 until mid July, meaning he’ll be 21 for most of the season. From a scouting perspective, a player’s “seasonal age” is determined by how old he is on June 30th of the given season.


#37    MGL      (see all posts) 2010/03/16 (Tue) @ 23:59

Mike, I don’t know at all whether pitchers lose velocity when they enter the majors, but it is reasonable to think that pitchers who throw that hard in high school and college don’t need to have much more of anything (movement, location, deception, mixing up pitches, etc.) to succeed.  Not so in the majors, obviously.

I realize that Strasburg does have good command, other pitches, etc. and he is not just a flamethrower.  But, as I said, it is not unreasonable to think that while a pitcher who throws 88 or 92 in college will not lose velocity in the majors, a pitcher who throws 98 might, because he has to use more guile and command to get MLB hitters out.  I’m just speculating though.

Tango, I have been doing pitcher projections for 21 years. The best I have ever projected a starter - Pedro, Maddux, and Clemens in their heydays, Haren, Halladay, Lincecum, and Webb now - is around 66% of a league average starter.  No one was ever projected better than that.  No one.  I don’t think that 50% is a reasonable limit.  I think 66% or 67% is.  I am very certain of that.


#38    Jared      (see all posts) 2010/03/17 (Wed) @ 00:11

The only reason a guy who throws 98 would intentionally decrease velocity is to gain command. For Strasburg, command is not a problem. So if he does lose velocity, I think it would be because of a change in mechanics or injury.


#39    rwperu34      (see all posts) 2010/03/17 (Wed) @ 00:16

#33,

Maybe none of those guys had a 98 MPH fastball. We really don’t know, especially with the guys from the 80s.  All were known as flamethrowers coming out of college though. They were all considered generational prospects. All were expected to dominate right from the start.

Strasburg’s ceiling is best pitcher in baseball. That’s if everything breaks right. There is no way to expect him to be the best pitcher in baseball right out of the gate. There is just too much that can go wrong.


#40    Kincaid      (see all posts) 2010/03/17 (Wed) @ 00:19

Nick/35’s point struck me as problematic as well; Oliver was said to basically do something similar to Marcel with the only added complication coming from the MLEs.  So if you are getting that kind of projection without regressing to a very low mean ERA by picking out comps based on velocity or scouting reports or draft status or whatever and just regressing Strasburg to the overall mean, then that means that if you were to take those things into account, then he would be even more unworldly (less worldly?).  Unless I misunderstood something about how regression is done in Oliver, anyway.  I wouldn’t think you could even hit whatever you decide the reasonable limit for a prospect is if you are still regressing them to a generic mean.

Also, Brian said something about regressing players to the mean for their level.  What does that mean for Strasburg?  I assume it can’t be an average D-I pitcher; is it average for a D-I pitcher who was drafted (mixed with however many innings of average AFL ball), or average for a certain level of the minors, or what kind of mean is Oliver regressing draftees to when they are coming from a primarily amateur season?


#41    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 00:36

After I calculate each league’s MLE factors, I run through the average batting and pitching line to get a ‘typical’ MLE from that league, which is what the players are then regressed to. If the player is in more than one league, it’s done as a weighted mean based on PAs in each.

For college, the regression is based on players in each conference who went on to play pro.

Pitching Regression Means

League   BH   HR   BB   SO
ACC    .337 .051 .129 .116
B10    .326 .049 .117 .120
P10    .330 .051 .125 .121
SEC    .330 .052 .119 .122
MTW    .346 .061 .121 .103
AFL    .318 .051 .129 .116
NL     .303 .039 .082 .175


#42    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 00:44

I will be retesting the amount of regression. The amounts used are the same as before I added age normalization into the MLE process two weeks ago. I will look to get the mean error as close as possible to zero, while minimizing the total (rmse) error, and bringing in the outliers as much as possible, but Strasburg may still be out there.

This was something I had planned to do over the summer after the initial programming settled down. I still have fixing the catcher’s fielding, defensive aging curves and efforts to individualize all aging curves (trends, comps, etc) on the to-do list. And baserunning... My wife is hoping I get off the computer one of these months.


#43    Tangotiger      (see all posts) 2010/03/17 (Wed) @ 00:46

Mike: I’m suggesting that either Brian did something wrong, or that the algorithm he has, which was tested on his sample data, has no data point that extreme that he can extrapolate on.

***

Jared: age is not an integer!  Strasburg will be 21.80 to 22.30 years old (or whatever), for those 6 months.

To the extent that you want an integer, I don’t subscribe to the notion that we round every number on the face of the earth to the closest integer… except for age.

***

MGL: while you will forecast someone at 66%, this means that you have an uncertainty around that 66% of .... something.  So, the true talent of a pitcher forecasted at 66% is something like 56% to 76%.

So, two points:
1. Pedro faced 5500 batters from 1997-2003.  After that fact, we can tell what his true talent level was at that time.  His ERA/lgERA is going to have very little luck in there.  And, he was at 47% (!!) of league average.  At 5500 PA, you regress 5% toward the mean.  That puts his true talent level in those SEVEN years at an AVERAGE of 50%. 

2. Marcel forecasted Pedro for the 2001 season at 56% of league average.  Run your forecasting system following the 2000 season, and I’m quite sure you’ll get something pretty darn close to that.
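
The regression arithmetic in point 1 can be checked in a few lines. This is only a sketch: it assumes the league-average index is 1.00 and that "regress 5% toward the mean" means moving 5% of the way from the observed index toward that mean. The function name is illustrative and not taken from any projection system.

```python
def regress_to_mean(observed, mean, fraction):
    """Move `observed` the given fraction of the way toward `mean`."""
    return observed + fraction * (mean - observed)

# Pedro 1997-2003: observed ERA/lgERA of 0.47 over about 5500 batters faced,
# regressed 5% toward an assumed league-average index of 1.00.
pedro_true_talent = regress_to_mean(0.47, 1.00, 0.05)
print(round(pedro_true_talent, 3))
```

The result lands just under 0.50, in line with the "AVERAGE of 50%" quoted above.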


#44          (see all posts) 2010/03/17 (Wed) @ 01:15

"The only reason a guy who throws 98 would intentionally decrease velocity is to gain command. For Strasburg, command is not a problem. So if he does lose velocity, I think it would be because of a change in mechanics or injury.”

I’m sorry, but this is just an absolutely false statement.  Not only do you not know this, but I don’t even know why you’d think it. 

Joe Nathan “only” threw 95, but he dropped velocity in order to get more movement and bust right handed batters inside.  He had no problems with control, he just wanted more movement.

I can’t think of another specific example, but I’m sure there are plenty.


#45    Jared      (see all posts) 2010/03/17 (Wed) @ 01:21

Tom, I was simply mentioning his DoB since you were referring to his seasons by age.


#46    Mike Fast      (see all posts) 2010/03/17 (Wed) @ 01:29

Mike: I’m suggesting that either Brian did something wrong, or that the algorithm he has, which was tested on his sample data, has no data point that extreme that he can extrapolate on.

Tom, yes I’ll buy those as legitimate questions/concerns. 

I guess the point I was trying to make was that Strasburg is an extreme outlier.  He’s more extreme than Mark Prior and Kerry Wood.  That’s true based on his college performance, and it’s true based on his pitches and scouting profile.  I know that current projection systems don’t consider the latter, but I included them as a point of independent evidence that shows that he is exceptional. 

I honestly don’t know how/if a projection system should be able to project Strasburg correctly.  I don’t know what a “correct” projection for him should look like.  I agree you are in the right ballpark, but would an algorithm that’s “working” arrive in that ballpark with Strasburg?


#47    Mike Fast      (see all posts) 2010/03/17 (Wed) @ 01:38

Steven/44, I don’t see any evidence for your Nathan example.  He’s been throwing 93-95 since he came into the majors.  His velocity peaked at around 95 mph in 2006-2007, and has since dropped to just below 94 mph, but he didn’t gain any movement with that.  I’d like to see some evidence before I accept your claim that plenty of pitchers sacrifice velocity for movement.

While it’s true that many pitchers throw a cutter that’s slower than their four-seamer, and some fraction of pitchers’ two-seamers are slower than their four-seamers, I’m not aware of very many examples of players choosing to drop the velocity of their four-seamer in order to get more movement on it or to radically reduce the frequency with which they throw it in favor of either the cutter or the two-seamer. 

There are probably a handful of such pitchers, but I’d be surprised if anyone could list more than a dozen who did that in the majors in the last three years where we have the data to check one way or the other.


#48    Jared      (see all posts) 2010/03/17 (Wed) @ 01:42

Steve/#44, whoa there. Didn’t mean to ruffle any feathers.

According to Fangraphs.com, Nathan’s average fastball velocity dropped from 94.8 in 2007 to 93.5 in 2008 and 93.6 in 2009. Not a huge difference.

Now, if Strasburg can throw 97 instead of 98 and got more movement, I think he’d be a fool not to take the movement and slightly decreased mph. But when you throw 98 with good command, movement isn’t all that important.


#49    MGL      (see all posts) 2010/03/17 (Wed) @ 04:11

Just based on history, I don’t think that 98 is sustainable as an MLB starter. I would almost guarantee that.  More stress and anxiety, heavier workload, more and more chance of slight injury, desire for greater command and movement (at the MLB level), and the simple fact that we think that young pitchers lose velocity every time they take the mound.  It is likely that even if everything falls right for Strasburg that by the end of the season, he will be throwing 97.5 (if he starts out at 98).  Probably much less from fatigue, depending on how much he is protected.

Again, I think this boils down to what is his projection if he stays healthy, nothing else goes wrong, and he accumulates 120 innings at the MLB level versus what is his true talent coming out of the box, including the chance that he bombs or gets hurt.  And I think those numbers could be .5 runs apart.

As far as what number do you regress a player towards?  Simple.  Whatever population you think contains players of his attributes, as long as those attributes are not related to his statistics, at least the ones you are trying to regress.  Draft order, age, level of experience (college, minors, etc.) height, stuff, etc.

Also, it depends on whether you are trying to project his performance IF SOMETHING, like if he pitches in the majors, or if he pitches A LOT in the majors, or you are trying to project his true talent, such as to help a team decide whether to promote a player or not.  It makes a big difference, as I state above.

For example, if you are simply trying to project Stras’ true talent, you do NOT regress him to college pitchers who have played in MLB, Brian!  You regress him to ALL top college prospects and even then, that will be too optimistic because one reason for his top prospect status is his gaudy college stats.  Technically you want to regress his MLE’s toward that of ALL college pitchers of similar physical and mental attributes. Remember what regression is. It is a shortcut for a Bayesian probability.  What are the chances that he is a true talent X pitcher who has X stats in N opportunities, given the distribution of true talent in whatever population he came from.  So, for example, you want to include the possibility that he is merely an average tall, college pitcher who throws hard, but is not really true talent great as his stats suggest.  That likelihood is always significantly more than zero because there are so many of those players (although in Stras’ case, because of the 98 thing, there probably isn’t).

Now, if you just want to be “right” with your projection IF he pitches more than a couple of innings in the majors, then you can regress him to college prospects who actually pitched in the majors.  But, this is NOT a true talent projection and would not be helpful to a team trying to decide how good a prospect is.

So while choosing what mean to regress to should be simple and straightforward in principle, one has to be careful and it usually requires some messy assumptions, estimations, and fudging, but nothing fatal…


#50    dcj      (see all posts) 2010/03/17 (Wed) @ 05:10

Nice discussion. Overall I have to side with Tango and MGL. A couple random thoughts:

Pedro faced 5500 batters from 1997-2003.  After that fact, we can tell what his true talent level was at that time.  His ERA/lgERA is going to have very little luck in there.  And, he was at 47% (!!) of league average.  At 5500 PA, you regress 5% toward the mean.  That puts his true talent level in those SEVEN years at an AVERAGE of 50%.

Yes. Maybe by 2003 he was a true talent 3 ERA pitcher playing a little over his head, but in 99-00 he was just as dominant as his numbers show.

Just based on history, I don’t think that 98 is sustainable as an MLB starter. I would almost guarantee that.

That seems reasonable. Out of curiosity, does anyone know how hard Nolan Ryan threw?


#51    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 07:44

MGL, not college pitchers who went to mlb, those who played pro - over 2000 of them in the minors.


#52    Tangotiger      (see all posts) 2010/03/17 (Wed) @ 08:09

Jared, perhaps you are unaware:

http://www.tangotiger.net/wiki/index.php?title=Seasonal_Age


#53    David Gassko      (see all posts) 2010/03/17 (Wed) @ 08:20

Well, this thread really blew up while I was gone. I’m glad Brian is going to look into this some more, but I still believe that it is not unreasonable to project Strasburg to be near the top of the ERA leader board next year, presuming he can pitch enough innings to qualify. I don’t think people are properly understanding the upside here. Strasburg’s upside is not “best pitcher in baseball,” it is “greatest pitcher of all-time,” on a per inning basis.

Tango/27 is willing to project Strasburg for a 3.30 ERA solely based on his fastball velocity. Are you really going to say that when you add in his fantastic secondary stuff and his insane college statistics, that isn’t enough to knock .4 points off his projection? I think it should be.

And note, by the way, the numbers from Brian/41. Strasburg is being regressed to a mean that is 14% worse than the average NL pitcher in terms of BABIP, 56% worse in terms of HR rate, 48% worse in terms of BB rate, and 41% worse in terms of SO rate. Oliver is definitely not giving Strasburg the benefit of the doubt. It just so happens that Strasburg is that good.
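
The percentages above can be reproduced from the regression means in Brian/41. A small sketch, assuming the BH column is the BABIP-like rate the comment refers to and that Strasburg's mean is the MTW row. The dictionary and function names are made up for illustration.

```python
# Regression means quoted in comment #41.
MTW = {"BH": 0.346, "HR": 0.061, "BB": 0.121, "SO": 0.103}
NL = {"BH": 0.303, "HR": 0.039, "BB": 0.082, "SO": 0.175}

def pct_worse(component):
    """Percent by which the MTW mean is worse than the NL mean.

    For BH, HR and BB a higher rate is worse for the pitcher;
    for SO a lower rate is worse.
    """
    ratio = MTW[component] / NL[component]
    if component == "SO":
        return round((1 - ratio) * 100)
    return round((ratio - 1) * 100)

gaps = {component: pct_worse(component) for component in MTW}
print(gaps)
```

The four gaps come out at 14, 56, 48 and 41 percent, matching the figures in the comment.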


#54    J. Cross      (see all posts) 2010/03/17 (Wed) @ 09:03

I’m pretty sure I’m allowed to reproduce one:

Strasburg - PECOTA:

103 IP, 4.26 ERA, 125 K, 44 BB, 14 HR

Is there a Wieters Tidal Effect?  Get burned on your Wieters one year and pull back too far the next?


#55    Tangotiger      (see all posts) 2010/03/17 (Wed) @ 09:19

Jared, can you post the 10th and 90th PECOTA percentiles of:

Lincecum
Carpenter
Bronson Arroyo
Strasburg


#56    David Gassko      (see all posts) 2010/03/17 (Wed) @ 09:23

J. Cross/54,

PECOTA’s projections for college players are simply based on their signing bonus—it doesn’t actually use any sort of college translations (unless they’ve recently changed something).


#57    Rally      (see all posts) 2010/03/17 (Wed) @ 09:34

They have to be looking at college stats to project 125 K in 103 innings.  No way you get a line like that out of a signing bonus alone.


#58    Rally      (see all posts) 2010/03/17 (Wed) @ 10:00

I wanted to compare Strasburg to some of the other college pitchers mentioned, but the stats can be hard to find.  Did find this:

K/9 IP
Strasburg 16.1
Prior 13.2
McDonald 12.0

His strikeout to walk rate is similar to Prior’s (in fewer innings).  In comparing to a Kerry Wood or Justin Verlander, Strasburg has far better control.  Couldn’t find any college numbers for Benes.

I think if he’s healthy and maintains a 98 MPH velocity he’ll meet Brian’s projection.  But his durability is a big unknown, and injuries happen all the time.  So all things considered, a more conservative projection is appropriate.  I’ll take a stab in the dark and say 3.50.


#59    Rally      (see all posts) 2010/03/17 (Wed) @ 10:03

Oh, and David Price, 13.1 K/9.

Jered Weaver came in at 13.3, a college pitcher who was very polished but not quite the stuff of the others mentioned.


#60    Snapper      (see all posts) 2010/03/17 (Wed) @ 10:09

"Strasburg’s upside is not “best pitcher in baseball,” it is “greatest pitcher of all-time,” on a per inning basis. “

That seems like a little much.

I agree with Tango and MGL.  To project his expected performance as top-3 in the league is insane.


#61    Guy      (see all posts) 2010/03/17 (Wed) @ 10:19

One consideration with someone like Strasburg is how much growth potential he has.  I know Brian fixed the “double-counting” problem with his MLEs (adding an age adjustment when the translation already captured it).  But I assume his translations still incorporate some amount of growth between a player’s college performance and his later MLB performance.  However, a super-elite performer like Strasburg is likely to be someone with much less potential for further growth than the average college pitcher.  His performance means he’s probably a guy who reached his peak very early (or close to it).  A lot of the guys we’re talking about here—Gooden, Benes, McDonald, Wood—were basically as good in their first season or two as they would ever be.  Injuries obviously complicate this analysis, but most of these guys never improve in the majors.  And when they do, it’s often because they fix control problems (e.g. Ryan, Randy)—but Strasburg has no upside there.  So I think you have to approximate a straight translation for this kind of player, as in:  what would he have done in MLB if he had pitched there last year? 

*

I also wonder if Strasburg’s velocity in some sense makes it harder to interpret his extreme college K rate.  He was facing a lot of guys who would have struck out even if Strasburg’s catcher had promised they would see only fastballs.  These guys were just overmatched.  There may be a “tipping point” where we can’t just use a multiplier to translate the college K rates.  (If Strasburg were a 19-yr-old HS senior with a 22 K/9 rate, would we be confident we could translate that based on past HS pitchers?)


#62    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 10:23

Regarding regression, the effect will of course be smaller with a larger sample size for the player. Strasburg had a weighted sample of 926 PA, Lincecum 1465.

A problem I have right now is that I haven’t yet had time to reprogram my diagnostic queries to match the age-based MLEs.

However, last night I was running some numbers, although they might not be optimally set up. I looked at projections using only college stats in the year drafted, compared to the first three years in MLB.

138 college pitchers met MLB criterion for matching in period 1998-2009.

time to reach MLB

1 14
2 42
3 33
4 23 
5 19
6  7
7  4
8  0
9  1

I got a FIP rmse weighted by IP of 1.43. IIRC, in the SIERA testing I got FIP rmse’s of about 1 for y1, 1.1 for y2, 1.3 for y3.

College only, I have Jered Weaver at 3.75 ERA, David Price 4.27, Lincecum 4.43 - his problem was he walked a ton of guys in college that he doesn’t now.


#63    Tangotiger      (see all posts) 2010/03/17 (Wed) @ 10:38

There may be a “tipping point” where we can’t just use a multiplier to translate the college K rates.

Right, exactly.  This is my point with say Barry Bonds and 3com park factors for LHH.  Barry Bonds hit as many HR at home as he did on the road.  I forget how many AB that was.  I think it was 150 HR in both.  Anyway, that’s a lot of AB.

But, ALL OTHER LHH hit two-thirds as many HR at 3com as they did away from 3com.  Now, isn’t it likely that what happens is that 3com turns 380 ft flyballs into 370 ft flyballs.  But for Bonds, well, it turns 430 ft flyballs into 420 ft flyballs.

And the same thing with Dante Bichette and Juan Pierre at Coors.  Bichette may simply be the ideal warning-track power guy who got his sweetspot distance at Coors, and nothing was going to help Juan Pierre.

Wade Boggs was tailor-made for Fenway.

Etc, etc, etc

And the same principle applies with Strasburg: at some point, you have a segment of punk-a$$ college kids who simply can’t hit anything above 96-98mph, and Strasburg was feasting on them.

***

Jeff Sackmann did a fantastic look recently in terms of doing minor league translations by ONLY looking at head-to-head matchups of minor league hitters against minor league pitchers who were eventually called up.  THAT was so very lovely to see.  Was that in the Annual?  It was great great stuff.

If you split up Strasburg stats between guys drafted in the first 5 rounds, and guys drafted in the 20th or later rounds (or not drafted at all), and compare him to the next best college pitchers, it would seem to me that Strasburg will get to be closer to the pack.

***

There was a similar issue here:
http://www.sports-reference.com/olympics/blog/?p=178

Neil,

It doesn’t seem right. What if you only considered the games between the big 6 (or 7 if you want to add Slovakia)? Basically, I’m wondering if you are making too big a deal if Russia beats Japan 12-0 and Canada beats them 9-0.

While it adds to our knowledge that Russia likes to beat up on bad teams, does it REALLY add knowledge that it might have an advantage over Canada?

Tom

Which is why I so much loved Gabriel’s chart that showed how the Big 7 did against each other.

***

Sweet spot, tipping point.  Whatever you want to call it. 

***

Brian: I hope you don’t mean to tell me that you treat 1000 PA against college hitters with the same level of reliability as you treat 1000 PA against MLB hitters.


#64    Rally      (see all posts) 2010/03/17 (Wed) @ 11:06

If Lincecum has 1465 PA vs MLB hitters, and Strasburg 926 vs college hitters, I’d probably treat Strasburg the same as a MLB pitcher with around 460 PA for regression.  I weight lower levels less just like I weight previous years less. It might not make that much difference though, for strikeout rate, since that component is regressed so little anyway.

He’s forecast for above average but not elite control, babip higher than league average, and a close to normal HR rate.  It’s the 11.3 K/9 that is driving the forecast here.


#65    Mike Fast      (see all posts) 2010/03/17 (Wed) @ 12:03

Strasburg - PECOTA:

103 IP, 4.26 ERA, 125 K, 44 BB, 14 HR

Those peripherals produce a FIP of around 3.82 (assuming a constant of 3.20).  Is PECOTA projecting his BABIP or strand rate to be that poor?

By comparison, here’s the same line from Oliver:

120 IP, 2.86 ERA, 150 K, 34 BB, 10 HR

That’s a FIP of around 2.92 and a BABIP of around .300, give or take a few points.
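
For anyone who wants to check the FIP figure for the PECOTA line, here is a minimal sketch. It assumes the basic form of FIP, (13*HR + 3*BB - 2*K)/IP plus a constant, with hit batters left out because neither projection line lists them. The 3.20 constant is the one stated in the comment.

```python
def fip(ip, k, bb, hr, constant=3.20):
    """Basic FIP: (13*HR + 3*BB - 2*K) / IP + constant (HBP omitted)."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant

# PECOTA line quoted above: 103 IP, 125 K, 44 BB, 14 HR.
pecota_fip = fip(ip=103, k=125, bb=44, hr=14)
print(round(pecota_fip, 2))
```

This reproduces the roughly 3.82 quoted for PECOTA. Run on the Oliver line exactly as quoted (120 IP, 150 K, 34 BB, 10 HR), the same simple formula gives a figure below 2.92, so that number presumably rests on inputs or a variant not shown in the comment.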


#66    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 12:04

Rally, that was not Lincecum vs MLB and Strasburg vs college, the sample sizes I quoted were all for college.

Tango, I regress the same amounts but to different means, after having applied the park/age/league factors.

Is there something else I could be doing to have the projections reflect the uncertainties?


#67    Mike Green      (see all posts) 2010/03/17 (Wed) @ 12:05

I have some skepticism that Strasburg’s projected K rate should be significantly higher than Nolan Ryan’s in Shea.  On the other hand, I would be surprised if his BABIP is above league average (the Mountain West environment tends to lead to high BABIP, and with Strasburg’s K rate and stuff, you’d guess that he would be better than league average in the NL).

The big issue is durability, which impacts not only on expected innings pitched but also on performance.  Hence, I side with Tango. As if that is a surprise.


#68    J. Cross      (see all posts) 2010/03/17 (Wed) @ 12:11

Tango,

It looks like they don’t have Beta pecota cards for pitchers yet.  At least not that I can find.  They have projections in their depth charts but those just link to their 2009 cards.


#69    Rally      (see all posts) 2010/03/17 (Wed) @ 12:13

Ryan didn’t strike out more than a batter per inning until he went to the Angels.  In his record season, he struck out 10.6 per 9.  This was more than twice the league average (5.1).  Strikeouts are much more common today (7.1 for NL 2009), so Strasburg at 11.3 is not as impressive as Ryan’s 1973 season.
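
The comparison is easier to see once each strikeout rate is expressed relative to its league. A quick sketch using only the numbers in this comment; the function name is illustrative.

```python
def k_index(k_per_9, league_k_per_9):
    """Strikeout rate as a multiple of the league rate."""
    return k_per_9 / league_k_per_9

ryan_1973 = k_index(10.6, 5.1)        # Ryan's record season vs. his league
strasburg_proj = k_index(11.3, 7.1)   # projected K/9 vs. NL 2009
print(round(ryan_1973, 2), round(strasburg_proj, 2))
```

Ryan comes out a bit above 2 times league, the Strasburg projection at roughly 1.6 times.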


#70    Mike Fast      (see all posts) 2010/03/17 (Wed) @ 12:15

Jared/68, they have said they hope to release the PECOTA cards for pitchers some time this week.

Do they list a projection for hits allowed for Strasburg?  Oliver projects 97 hits.  If you could share that, we could compare approximate BABIP projections (assuming league average rates for GDP and the like).


#71    David Gassko      (see all posts) 2010/03/17 (Wed) @ 12:17

Hey Brian,

I think Rally’s suggestion is a good one. That is, if a pitcher has 1,000 PA against college hitters, and the MLE multiplier for college is 0.5, you count that as 500 PA when doing your projection. If the MLE multiplier for AAA is 0.9, 1,000 AAA PA would count as 900 PA. 1,000 MLB PA would obviously count as 1,000.

And each component will then have a different sample size, too. So if the BABIP MLE multiplier is .8, you regress BABIP the same as you would for a pitcher with 800 major league PA, but if the BB MLE multiplier is .5, you regress BB the same as you would for a pitcher with 500 major league PA.


#72    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 12:34

OK, I was confused at first...are you saying a new variable by league/level that reduces the sample size prior to regressing? It would be easy to code.


#73    David Gassko      (see all posts) 2010/03/17 (Wed) @ 12:45

Yep. So if a player has 200 PA in MLB, 150 in AAA and 100 in AA, and the MLE multipliers are 1, .9, and .8 respectively, his adjusted sample size is 1*200 + .9*150 + .8*100 = 415, rather than 450 as we would have right now.
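
As a sketch, the adjusted sample size is just a weighted sum of PA by level. The multipliers below are the example values from this comment, not measured reliabilities, and the function name is hypothetical.

```python
def adjusted_pa(pa_by_level, multipliers):
    """Sum of PA at each level, discounted by that level's multiplier."""
    return sum(multipliers[level] * pa for level, pa in pa_by_level.items())

pa = {"MLB": 200, "AAA": 150, "AA": 100}
mult = {"MLB": 1.0, "AAA": 0.9, "AA": 0.8}  # example multipliers from the comment
total = adjusted_pa(pa, mult)
print(total)
```

The adjusted total (415, against 450 raw PA) would then replace raw PA wherever the regression amount is computed.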


#74    Tangotiger      (see all posts) 2010/03/17 (Wed) @ 12:47

Brian, I’ve talked about it in the past.

Suppose you have Justin Upton with 600 PA in 2007, and you have no idea what his stats are in 2008.  And you need to forecast him in 2009.

Now, let’s say you have Elvis Andrus with 600 PA in minor leagues in 2008.  And you need to forecast him in 2009.

I am suggesting that the reliability of Upton’s stats from two years ago (in MLB) is the same as Andrus stats from one year ago (in minors).

If you test that, then you know how much uncertainty there is in the stats.

And you go on down the line.  I would say a player’s MLB stats in 2006 will tell you more about his performance in MLB in 2009, than a player’s A-ball performance in 2008 will tell you about 2009.

And so on. 

This is what you are missing: the uncertainty of what the OBSERVED performance stats are telling you.

College stats is at least 50% noise (with respect to what it tells you about MLB).  It’s just useless data that tells you nothing.  So when you look at the data, it’s got so much uncertainty in it.

It’s like looking at Canada play Norway and Italy and Japan in hockey and looking at Russia play the same teams, and thinking you’ve got a certain level of uncertainty.  No, it’s much wider, because that competition doesn’t inform you very much.


#75    Guy      (see all posts) 2010/03/17 (Wed) @ 13:05

David/Brian:  Why would the sample size adjustment be proportional to the change in talent level?  Wouldn’t it be more on point to look at the size of the errors in the projections you make from each level?  That is, how much larger are the errors for a projection based on college data, as compared to prior MLB experience?
It may end up the two track each other pretty well, but it’s not obvious to me they will.


#76    David Gassko      (see all posts) 2010/03/17 (Wed) @ 13:14

You’re right, Guy. My suggestion was simply a “hack.” The correct way to do it is to try to minimize the errors at each level.


#77    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 14:37

I agree with Guy. The goal is to minimize the errors, but we are working on the methods.

A preliminary part of the projections is to calculate the MLE by league each year. From that, I can do a parallel set of projections only using data from a given level, MLB, AAA, AA...college.

MLB projections compared to MLB future performance would be the baseline rmse. Then see how much larger the rmse, in each component, is in each level over the same gap in years.
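
A rough sketch of that comparison, assuming an IP-weighted rmse like the one mentioned in #62. The numbers below are made up purely to exercise the function and are not real projections or results.

```python
import math

def weighted_rmse(projected, actual, weights):
    """Root mean squared error, with each squared error weighted (e.g. by IP)."""
    total = sum(w * (p - a) ** 2 for p, a, w in zip(projected, actual, weights))
    return math.sqrt(total / sum(weights))

# Hypothetical inputs, for illustration only.
projected = [3.5, 4.0, 4.5]
innings = [180, 150, 120]
mlb_rmse = weighted_rmse(projected, [3.8, 4.4, 4.0], innings)      # baseline level
college_rmse = weighted_rmse(projected, [4.5, 5.2, 3.6], innings)  # lower level

# How much larger the error is at the lower level than at the MLB baseline.
inflation = college_rmse / mlb_rmse
print(round(inflation, 2))
```

Computed per component and per level, a ratio like this is one possible input for deciding how much to discount each level's sample.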


#78    Rally      (see all posts) 2010/03/17 (Wed) @ 15:09

I was looking at the projections for the #2 pick after Strasburg, Dustin Ackley, who also has played only college and fall league.

It’s showing a .320+ BA and .400+ OBP for this year.  I find this more bothersome than Strasburg.  He’s a good prospect, but let’s wait a bit on the HOF plaque. At least Strasburg’s supporters can point to a skill that exceeds anyone in recent memory (better fastball than any SP).  Are scouts telling us that Ackley has a Boggs/Gwynn level swing?  Are his college stats unprecedented? Another case where much more regression is needed.


#79    Rally      (see all posts) 2010/03/17 (Wed) @ 15:23

I looked for college hitters in the same conference with similar numbers and found one a year earlier: Buster Posey.

I thought Posey had a great season in 2009, rocketing up through the Giants system and getting his first taste of MLB ball.  His MLE for the year was 255/332/382.  I know as well as anyone that MLE’s can be harsh, and that is very respectable for a 22 year old catcher.

But looking at his college MLE’s it appears he had a massive off year.  His MLE was 334/425/513 for his last year of college.  Did he really get that much worse last year as he established himself as a top prospect?  Or is there something off with the college MLE?  I’d stake any amount of $ on the latter.


#80    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 18:34

I don’t see the problem with Posey

Posey’s stats at Florida

        raw      unregressed mle  
      BA  OB  SA   BA  OB  SA
2007 382/446/520  301/362/396 
2008 463/566/879  334/425/513

plus .342 in 111 AB in the minors in 2008, which went into the MLE

In 2009 he had a raw .420 wOBA at San Jose in the Cal league, then .389 at Fresno in the PCL, along with .316 in 84 PA in the AFL and .106 in 17 PA in SF. I’m assuming the park & league factors led to an overall poor 255/332/382 MLE for 2009.

Doing the three year weighted mean, it all comes out as 288/370/434, a .356 wOBA. Good BA, good OB, average power, one of the better hitting catchers.

To give a better perspective, I’m preparing a list of college projections 1998-2009 of batters who reached MLB.


#81    philly      (see all posts) 2010/03/17 (Wed) @ 19:45

Are scouts telling us that Ackley has a Boggs/Gwynn level swing?  Are his college stats unprecedented?

Not quite HoF level, but BA did name him the collegiate hitter of the decade.  There seems to be some questions about his ultimate power projections, but the hit tool and expected BA/OBP are very much elite.  He’s not a “once a generation talent”, but he’s considered not to be too far below that.


#82    J. Cross      (see all posts) 2010/03/17 (Wed) @ 19:56

And you go on down the line.  I would say a player’s MLB stats in 2006 will tell you more about his performance in MLB in 2009, than a player’s A-ball performance in 2008 will tell you about 2009.

And so on.

This certainly makes sense (that lower level stats are less predictive in the same way more time distant stats are) but do you have reason to think that the added uncertainty of extra year removed is roughly equal to the uncertainty of one extra level of competition removed?

This is something I really should have found the answer to if I want to call myself a spoon bender but I don’t have time to figure this one out before the season.


#83    rwperu34      (see all posts) 2010/03/17 (Wed) @ 20:00

Brian,

Is that a peak translation or his 2008 MLE? If it’s taking his age 21 numbers and extrapolating them out to an age 27-29 peak, I can almost believe it. Almost. If that MLE is saying what Buster Posey did his final year in college was better than what Matt Holliday did in the majors, then something needs to be recalibrated.


#84    J. Cross      (see all posts) 2010/03/17 (Wed) @ 20:03

David, that’s interesting that they just use signing bonus.  I kind of like that in an odd way.

Mike, PECOTA has him with 96 H in those 103 innings, along with 14 HR and 125 K…

so that BABIP is (96 - 14) = 82 hits / (103*2.83 - 125 + 82)

.330 BABIP or thereabouts.
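
The same back-of-envelope estimate as a short sketch. The 2.83 factor is taken straight from the comment. Reading it as outs per inning after setting aside outs not made by strikeout or on a ball in play is my assumption, and the function name is illustrative.

```python
def babip_estimate(ip, hits, hr, k, outs_per_inning=2.83):
    """Approximate BABIP from a projection line.

    outs_per_inning=2.83 is the factor used in the comment (assumed to
    account for outs such as double plays and caught stealing).
    """
    hits_in_play = hits - hr
    outs_in_play = ip * outs_per_inning - k
    return hits_in_play / (outs_in_play + hits_in_play)

# PECOTA line: 103 IP, 96 H, 14 HR, 125 K.
pecota_babip = babip_estimate(ip=103, hits=96, hr=14, k=125)
print(round(pecota_babip, 3))
```

This gives about .330, the figure quoted above.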


#85    Tangotiger      (see all posts) 2010/03/17 (Wed) @ 20:24

Jared: just speculative instincts, FWIW.  I prefer a model that passes the sniff test first, and then surprises me with a Chanel.

Otherwise, you might get the fake stuff from Canal St.


#86    rwperu34      (see all posts) 2010/03/17 (Wed) @ 20:24

College position players taken in the top five of the draft;

2008-Alvarez, Posey (Alonso, Smoak, and Wallace were all considered better hitters than Ackley coming out as well).

2007-Matt Wieters

2006-Evan Longoria

2005-Alex Gordon, Jeff Clement, Ryan Braun

2004-None, although Stephen Drew might have been if not for bonus demands.

2003-Rickie Weeks

2002-none

2001-Mark Teixeira

2000-none

1999-Eric Munson

1998-Pat Burrell, JD Drew

1997-Troy Glaus

1996-Travis Lee

1995-Darren Erstad, Jose Cruz

Let’s have the average MLE for these guys in the year draft+1.


#87    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 20:29

Here’s the projections for the college hitters 1998-2009, who have since lost their MLB rookie status, for the year after they were drafted. It will include minor league stats if they played pro in the same year they were drafted (too much work to separate them right now).

http://spreadsheets.google.com/pub?key=tsMWjdHT0u17kOdubTFrOpA&output=html


#88    J. Cross      (see all posts) 2010/03/17 (Wed) @ 20:43

Jared: just speculative instincts, FWIW.  I prefer a model that passes the sniff test first, and then surprises me with a Chanel.

Well, for our hitter projections we weighted minor league stats by .75 (relative to MLB stats) and we just made up that number based on nothing.  We should break it up by level for pitcher projections but I suppose I’ll still just be making up the numbers.  0.9, 0.8, 0.7, 0.5 for AAA, AA, A and college might have to do.  Maybe 0.7 for Cuba too.


#89    Brian Cartwright      (see all posts) 2010/03/17 (Wed) @ 20:49

one correction - my draft list starts in 2002 (which is the year most of my college stats start), so projections are from 2003-2009

2002 266/345/448 347 Teixeira


#90    Rally      (see all posts) 2010/03/18 (Thu) @ 10:00

Of the top 20 college hitter projections in Brian’s list, only one played at that projected level his full year as a pro.  That would be Ryan Zimmerman, who did it in the majors.  Yes, this is cherry picking, but if regression is working correctly, I can choose a selective sample of 20 best hitters in year X and be right on target for the group in year X+1.
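
A quick simulation sketch of that last claim, under made-up assumptions: talent and luck are both normal with equal spread, so the right amount of regression is 50% toward the mean. None of these numbers come from Brian’s list. I use a big pool and a top-1000 group rather than a top 20 just to keep the sampling noise down.

```python
import random
import statistics

random.seed(2010)

N_PLAYERS = 20000   # hypothetical pool size
TOP_N = 1000        # size of the cherry-picked group
TALENT_SD = 1.0     # assumed spread of true talent
NOISE_SD = 1.0      # assumed spread of one-season luck
# Share of observed variance that is real talent; population mean is 0.
RELIABILITY = TALENT_SD**2 / (TALENT_SD**2 + NOISE_SD**2)

players = []
for _ in range(N_PLAYERS):
    talent = random.gauss(0.0, TALENT_SD)
    year_x = talent + random.gauss(0.0, NOISE_SD)    # observed in year X
    year_x1 = talent + random.gauss(0.0, NOISE_SD)   # observed in year X+1
    projection = RELIABILITY * year_x                # regressed toward the mean
    players.append((projection, year_x, year_x1))

# Cherry-pick the best projections, then compare group averages.
top = sorted(players, reverse=True)[:TOP_N]
mean_projection = statistics.mean(p[0] for p in top)
mean_year_x = statistics.mean(p[1] for p in top)
mean_year_x1 = statistics.mean(p[2] for p in top)

print(f"raw year X:   {mean_year_x:.2f}")
print(f"projection:   {mean_projection:.2f}")
print(f"year X+1:     {mean_year_x1:.2f}")
```

If the regression amount is right, the group’s year X+1 average should land near the average projection and well short of the raw year X average. A group that lands well short of its projections points to too little regression.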

With the growth expected as a player goes from age 21/22 to his prime, a few of these players have done better than Brian’s first year projection: Longoria, Ethier, Zimmerman, Braun.

Another 5, once they developed, have essentially equalled those projections: Pedroia, Quentin, Garko, Hill, Drew.

The others have failed to meet those projections, even as they matured as hitters.


#91    Rally      (see all posts) 2010/03/18 (Thu) @ 10:01

Oh, and my source for their first year MLE’s was Dan Szymborski’s huge historical file.


#92    David Gassko      (see all posts) 2010/03/18 (Thu) @ 11:45

I updated the to-do list:

http://www.hardballtimes.com/main/forecasts/todo/


#93    David      (see all posts) 2010/03/22 (Mon) @ 23:03

Unfortunately I don’t have the time to read every comment at this point in time but I find Tango’s following comment puzzling: “You have to think in terms of the uncertainty.  You have to.  If you give Strasburg the same mean as Lincecum+Carpenter, then that means the uncertainty level is wider around Strasburg.  Which means it’s plausible that he’s much better than they are.”

While I certainly agree that these Cy Young level predictions for Strasburg are optimistic to the most ridiculous degree (who thought Wieters could be out-done?), I’m not following the logic that requires Strasburg to have a much higher upside than Lincecum and Carpenter. It would make sense to me if we knew that Strasburg’s performance projection is normally distributed, but I am not seeing how that is a given (now that I think about it, it doesn’t really make sense to me that Carpenter’s would be either, due to the higher probability of injury). Frankly it seems to me that Strasburg’s projection should be two tailed and/or heavily skewed, given that the whole statistical projection relies on non-standardized data and a small sample size of less than perfect examples.


#94          (see all posts) 2010/03/22 (Mon) @ 23:33

It’s ridiculous to post a mean. At least throw in an uncertainty term. And the better thing would be to run Monte Carlos. Sample and re-sample his stats - see how robust they are. The best thing to do is test your model’s performance in a naive fashion. But realize that since everything is always changing (especially in pitching mechanics and the attention that is paid to them today) your predictions based on things that happened in the past will never be able to fully capture the progression of the game. You are either limited by time scale or sample size - in an exponentially changing world this is going to be more and more the case.

If you think differently go try and get lucky in the stock market. Come back when you can’t beat the S&P.


#95    Tangotiger      (see all posts) 2010/03/23 (Tue) @ 09:19

David/93: if you have two distributions with the same mean, but one has more uncertainty around that mean, this means one distribution will be wider than the other.

The only way for the two to have the same upside and the same mean, while one distribution is wider than the other, is for Strasburg to have a worse downside AND a better mode.  That is, his distribution would have to be left-skewed enough to balance out the worse downside.

In either case, my way, or this way, Strasburg will have a better chance at a Cy Young season.  And that makes no sense.
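
To put rough numbers on the symmetric case, here is a small Python sketch. It assumes both forecasts are normal with the same mean, which is exactly the assumption David questions, so treat it as an illustration of the first scenario only. The 3.20 mean, the 2.50 "Cy Young" cutoff and both standard deviations are invented for the example.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


MEAN_ERA = 3.20       # hypothetical shared mean forecast
CY_YOUNG_ERA = 2.50   # hypothetical cutoff for a Cy Young type season

# Same mean, different uncertainty (both standard deviations are assumptions).
p_established = normal_cdf(CY_YOUNG_ERA, MEAN_ERA, sd=0.40)
p_uncertain = normal_cdf(CY_YOUNG_ERA, MEAN_ERA, sd=0.80)

print(f"established star: {p_established:.1%} chance of beating {CY_YOUNG_ERA}")
print(f"wider forecast:   {p_uncertain:.1%} chance of beating {CY_YOUNG_ERA}")
```

With the same mean, the wider forecast always puts more probability below any cutoff that sits under the mean, which is the point about upside.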


#96    Mike Green      (see all posts) 2010/03/23 (Tue) @ 09:54

Tango/74,

Yes.  There is a significant error bar projecting from double A to major league performance.  The error bar projecting from college to major league must necessarily be much, much larger. 

If you look at the development pattern after college of all players (i.e. any particular player’s competition), it is easy to see why.  The best players from college are drafted; most are not. The cream of the crop almost always spends a year in A ball after being drafted; most other drafted college players spend longer than that or never make it to double A.


#97    J. Cross      (see all posts) 2010/03/23 (Tue) @ 15:07

At long last, the PECOTA pitcher cards:

Player/90%/50%/10%

Lincecum 2.43/3.16/3.42
Carpenter 2.89/3.19/3.55
Bronson Arroyo 3.87/4.50/4.79
Strasburg 3.94/4.66/5.26

Carpenter and Lincecum look to have very differently shaped outcome distributions.


#98    Tangotiger      (see all posts) 2010/03/23 (Tue) @ 15:23

Lincecum’s 90% forecast divided by his 10% forecast: 71%

Carpenter: 81%

Arroyo: 81%

Strasburg: 75%
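
Those ratios can be reproduced from the PECOTA lines in #97 with a few lines of Python. The dictionary layout and variable names are just for illustration; the ERA figures are the ones posted above.

```python
# PECOTA ERA forecasts from comment #97: (90th, 50th, 10th percentile).
pecota = {
    "Lincecum": (2.43, 3.16, 3.42),
    "Carpenter": (2.89, 3.19, 3.55),
    "Arroyo": (3.87, 4.50, 4.79),
    "Strasburg": (3.94, 4.66, 5.26),
}

# 90% forecast divided by 10% forecast, as a whole-number percentage.
# A smaller number means a wider gap between the good and bad outcomes.
spread = {
    name: round(100 * p90 / p10)
    for name, (p90, _p50, p10) in pecota.items()
}

for name, pct in spread.items():
    print(f"{name}: {pct}%")
```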

And there, ladies and gentlemen, is why I say the PECOTA percentile forecasts are irrelevant.

1. How in the world is the range of Strasburg’s forecast the same as established stars?  It’s impossible.  We know less about him, and so the uncertainty of his forecast must be wider.

2. He has only a 10% chance to post a better than 3.94 ERA?

This all goes back to the way Nate does his comparables.  What Nate does is look ONLY at rate stats.  And so, he would take Strasburg’s MLE rate stats, look at comparable pitchers based on rate stats, then look at those pitchers’ observed ERA, and use that as the range.

This is 100% wrong.


#99    Brian Cartwright      (see all posts) 2010/03/23 (Tue) @ 19:23

I’ve offered Strasburg $100 if he can beat a 2.86 ERA this year (in MLB)


#100    J. Cross      (see all posts) 2010/03/31 (Wed) @ 02:15

I started out weighting college stats by 0.4 and Strasburg still projects to be ridiculously good.

If I completely ignore his college stats and regress 100% to what’s expected based on his fastball velocity (and to a lesser extent age, league and role) he projects to have a 3.52 ERA.  If I weight his college stats by 1 (still adjusted to get MLE’s but given equal weight to MLB stats) he projects to an absurdly good 2.73 ERA (not that different from the THT projection).

more fully:

weight (relative to MLB stats), proj. ERA
0.0, 3.52
0.2, 3.10
0.4, 2.93
0.6, 2.83
0.8, 2.78
1.0, 2.73
1000, 2.51 (essentially no regression at all)
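
For what it’s worth, a plain weighted-average regression comes close to reproducing the table above. The sketch below is not the actual projection system, just a guess at its general shape. It assumes the weight 0 row (3.52) is the regression target, treats the weight 1000 row (2.51) as roughly the unregressed MLE, and backs a single sample-size ratio out of the weight 1.0 row. The names `ratio` and `shrunk_era` are made up.

```python
PRIOR_ERA = 3.52        # weight 0 row: velocity/age based expectation
UNREGRESSED_ERA = 2.51  # weight 1000 row, assumed close to the raw MLE

# Weight -> projected ERA, as posted in the table above.
reported = {0.0: 3.52, 0.2: 3.10, 0.4: 2.93, 0.6: 2.83, 0.8: 2.78, 1.0: 2.73}

# Effective amount of college data relative to the regression "ballast",
# solved from the weight 1.0 row (an assumption, not a published figure).
ratio = (PRIOR_ERA - reported[1.0]) / (reported[1.0] - UNREGRESSED_ERA)


def shrunk_era(weight):
    """Weighted average of the raw MLE and the prior; more weight, less regression."""
    sample = weight * ratio
    return (sample * UNREGRESSED_ERA + PRIOR_ERA) / (sample + 1.0)


for weight, era in reported.items():
    print(f"weight {weight:.1f}: table {era:.2f}, sketch {shrunk_era(weight):.2f}")
```

Under those assumptions the sketch stays within about 0.01 of every row, so the table looks consistent with ordinary regression toward the velocity-based expectation.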

