THE BOOK--Playing The Percentages In Baseball

Friday, May 02, 2008

What do good and bad starts by pitchers tell us?

By , 03:53 AM

I looked at the first month of the season (April only) ERA for all pitchers who were either exclusive relievers or exclusive starters.  I did this for 04-07.

From these, I put them into two groups:  Those that started out great in ERA and those that started out terribly. 

For starters, I used an ERA of over 6 or under 3 for a poor or great start.  For relievers, I used 5.2 and 1.8.  This created 82 and 89 pitchers in the two starter groups, bad and good starts, and 68 and 91 pitchers in the two reliever groups.  Again, that is for 4 years combined.  There might be duplicate pitchers (probably are) in some groups from year to year.  Not in the same year of course.

For each pitcher in each group, I computed his projected ERA before the season started, using a crude Marcel with no age adjustments (I did do a regression).

I also computed their projected ERA after the one month of great or poor performance, updating the pre-season projection with the new information by weighting the new information twice the amount of the prior (pre-season) information.  Finally, I looked at their ERA for the remainder of the season.

Here are the results and some commentary:


IP1 is the number of IP in April
ERA1 is the ERA in April
prERA1 is the projected ERA before the season started
prERA2 is the projection updated after April was over
ERA2 is the ERA for the rest of the season (AFTER April)

Starters

Bad start

N age IP1 ERA1 prERA1 prERA2 ERA2
82 31.2 27 7.12 4.43 4.56 109 4.70

Starters

Good start

N age IP1 ERA1 prERA1 prERA2 ERA2
89 31.7 33 2.25 4.15 4.05 152 4.09

I’ll make a couple of comments here.  I am guessing that the ones who started out real bad were more injured as a group, and that is why they were .14 runs worse than their updated projection, although .14 runs is not a whole lot to get excited about.  Also, I did not do any age adjustments in my Marcel projections, so since these pitchers averaged in their early 30’s (31.7) a little worse than their projection is probably expected for all groups.

Anyway, the starters who started out great essentially pitched at their projected level - in fact, a little worse (not statistically significantly worse, I am assuming).  So, at least with regard to these guys AS A WHOLE, PLEASE don’t try to tell me who is “real” and who is not.  Everyone pitches to around their expected level.

Now the relievers.  Obviously, they don’t average many innings in one month, but we still see a lot of talk about them, like Turnbow is no good anymore, etc.

Relievers

Bad start

N age IP1 ERA1 prERA1 prERA2 ERA2
68 32.1 12 6.78 4.12 4.22 47 4.12

These guys actually pitched better than their updated projection, although, again, I doubt that the difference is statistically significant.  But it looks like there is nothing going on when relievers pitch badly in the first month of the season.  And we are only talking about 12 IP each.

Relievers

Good start

N age IP1 ERA1 prERA1 prERA2 ERA2
91 32.5 12 .92 3.95 3.84 54 3.68

These guys who pitched great for 12 IP in April exceeded their updated projection by .16 runs, which is good, but again, nothing to write home about, and also probably not statistically significant.
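To put the four main groups side by side, here is a small sketch that computes how far each group's rest-of-season ERA landed from its updated projection. The numbers are copied from the tables above; the dictionary layout and the `miss` helper are only illustrative names, not anything from the original study.

```python
# (prERA2, ERA2) for each group, copied from the tables above.
groups = {
    "starters, bad start": (4.56, 4.70),
    "starters, good start": (4.05, 4.09),
    "relievers, bad start": (4.22, 4.12),
    "relievers, good start": (3.84, 3.68),
}


def miss(updated_projection, rest_of_season_era):
    """Actual minus projected ERA; positive means worse than projected."""
    return round(rest_of_season_era - updated_projection, 2)


misses = {name: miss(*values) for name, values in groups.items()}

for name, value in misses.items():
    print(f"{name:22s} {value:+.2f}")
```

No group misses its updated projection by more than .16 runs in either direction.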

Let’s look at some other subsets of players just for fun:

Here are young pitchers only - less than 27 years old.  There are not a whole lot as you could guess from the average age of the entire group above.  Beware of sample size issues here.

Starters

Bad start

N age IP1 ERA1 prERA1 prERA2 ERA2
8 25.0 26 6.63 4.72 4.83 94 4.74

Nothing unusual here.  If you are young and you start out real badly, it doesn’t seem to mean much.  Your projection is your projection.  They actually pitch a little better than expected, but that might be an aging thing again.

Let’s look at the young guns.  The young starters who started out lights out.

Starters

Good start

N age IP1 ERA1 prERA1 prERA2 ERA2
12 25.3 32 1.95 4.31 4.11 149 3.60

Wow, they exceeded their projection by .51 runs.  We are only talking about a total of 1783 IP, but that is still quite impressive.  Even if we assumed a .1 or .2 decrease due to aging, they still exceeded that.

The young relievers show nothing remarkable, so I won’t even show their stats, plus we really have small sample sizes for them.

How about the old guys (33 and over)?

Starters

Bad start

N age IP1 ERA1 prERA1 prERA2 ERA2
26 35.9 25 7.25 4.38 4.47 108 4.82

Maybe when an older pitcher pitches badly, it is a suggestion that he is near the end.  Or again, maybe it is just because I am not doing any aging adjustments plus the injury thing, and that these guys really need to be age adjusted aggressively.

Starters

Good start

N age IP1 ERA1 prERA1 prERA2 ERA2
29 38.2 33 2.34 4.05 3.99 144 4.31

Hmmm.  They also pitch a lot worse than expected.  It is probably the lack of aging adjustments again.  If we tack on another .3 in ERA or so for both the good and bad April pitchers, for these old guys, for aging, they pitch about what they are expected to pitch.  So really nothing remarkable there.  For example, unless there is something specific about Morris’ pitches, independent of his performance thus far (which is why you CANNOT let your scouts see numbers, although you can’t avoid it with MLB pitchers of course), I just don’t see how you can write him off after one month.  What gives you the right to do that?  According to this model, I had Morris with a projected normalized (to 4.50) ERA of 4.65 going into the season.  Even if you add .5 runs for aging and his bad start, he is still better than replacement level.  Teams and fans will always, always, always (did I say always?) overvalue recent performance, good or bad.

Finally, and again, just for fun, let’s look at pitchers, like Cliff Lee, who are not too old or young, and look like they have finally turned the corner.  These are the guys who did not have a great projection going into the season, but have pitched lights out for the entire month of April. These are the guys that hundreds of articles are written about, right?  Are they for real?

I am restricting to ages 27-31 and those starters who had a pre-season projection of greater than 4.25:

Starters

Good start

N age IP1 ERA1 prERA1 prERA2 ERA2
20 28.9 32 2.22 4.55 4.38 147 4.19

Well, they definitely did better than expected, but not quite the studs they looked like in April.  I will grant them a (very) partial “for real” on the order of .25 runs or so (the .19 above plus a little for aging), which is not wood.  That was not the case for relievers, BTW.

Finally, how about the not so young or old starters who were supposed to be good, but really stunk in April?  Was their projection just a mirage?

Starters

Bad start

N age IP1 ERA1 prERA1 prERA2 ERA2
5 29.4 28 6.92 3.80 3.99 123 3.86

There were only 5 pitchers, so sample size beware, but they appeared to do just fine if left alone.  The 16 relievers to make this list did the same.  They pitched at a 7.18 ERA clip in April, but 3.77 the rest of the season (with a projection of 3.97).

#1    David Cameron      (see all posts) 2008/05/02 (Fri) @ 12:12

Why use ERA as the determinant instead of NERC or even FIP? I don’t think Rob’s argument about Lee was that he’s going to be above average because his ERA is low, but rather, because his peripherals are ridiculous.


#2    fifth of      (see all posts) 2008/05/02 (Fri) @ 12:12

Good to see I’m not alone in thinking people are way off on Matt Morris. (Not that he’s good, but I think people like piling on pitchers who got good contracts after bad seasons.)

In the 27-31 starters with a good start, I would think we are seeing that these pitchers have healed from injuries, right? I would think this is the most likely age group to have pitched through an injury recently but also be able to ~fully recover. Younger pitchers would be more likely to have not been pitching in the majors when injured or to have been coddled when injured (idea for a study, if the data could be corralled: how often do lg min, arb, and FA pitchers make starts when they are injured?), and older pitchers are less likely to make full recoveries. (I am asserting these claims, which could well be false.)


#3    Tangotiger      (see all posts) 2008/05/02 (Fri) @ 12:26

Using FIP, OPS, or Peripheral ERA would be better, but when looking at groups of pitchers, I don’t think the conclusions would change much.

Regardless, I find the research illuminating.


#4          (see all posts) 2008/05/02 (Fri) @ 12:51

What’s that number between prERA2 and ERA2? The line looks like this:

N age IP1 ERA1 prERA1 prERA2 ERA2
82 31.2 27 7.12 4.43 4.56 109 4.70

What is the 109 showing? There are more numbers than categories, unless I’m missing something (I probably am)


#5    Mike Green      (see all posts) 2008/05/02 (Fri) @ 13:00

I wonder if a hot start by a young hitter is as suggestive of improvement as it is for a young starter.  My theory would be that it is not.

My theory is that confidence-building is more important to a pitcher’s early development curve than to a hitter’s.  The hot start for a young pitcher either is indicative that the confidence is growing or in fact helps to produce that confidence.

My theory would be that you would see a similar effect for a young hitter who has a hot start (confidence is of some importance for hitters too), but at a lower level.


#6    fifth of      (see all posts) 2008/05/02 (Fri) @ 13:08

Dan/4: I don’t think MGL labeled it anywhere, but it appears to be IP2 (innings pitched post-April).


#7    MGL      (see all posts) 2008/05/02 (Fri) @ 16:05

Yes, sorry, that number is the number of IP pitched after April.

ERC or something like that is always better, but when dealing with large groups, ERA is just fine. Plus, I wanted to use the same thing that “people” use in real life for pitchers which is mainly ERA.  IOW, I deliberately wanted to choose pitchers that had high or low ERA’s at the start of the season.  If I used ERC or NERC, the number of pitchers with very high and low numbers would be much smaller, given the same criteria (>6 and <3 for starters, etc.).

It is kind of like if you were researching hot and cold streaks and batter/pitcher matchups. Why not use BA, since that is what all the broadcasters (and probably managers) use to “determine” whether a matchup is good or bad or a player is hot?

I am not sure I get the injury thing, “fifth of?”

Tango, do you do an aging adjustment for your pitcher Marcel’s?  How?  I just did a weighted average of all prior year ERA’s, with each subsequent year being weighted 50% more than prior year, and I regressed based on IP/(IP+300).  I regressed all starters toward the league average ERA for all exclusive starters for that year (cheating a little), and same thing for relievers. 
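For readers who want to see the crude projection in concrete terms, here is a minimal sketch of it. It assumes that each season is weighted by innings pitched times a recency factor, and that the IP in IP/(IP+300) is the recency-weighted total; the thread does not spell out either detail. The function name, argument names, and the sample seasons are hypothetical.

```python
def crude_marcel_era(seasons, league_era, recency=1.5, regress_ip=300.0):
    """Project ERA from prior seasons, with no age adjustment.

    seasons: list of (ip, era) tuples, oldest first.
    recency: each season counts this many times the season before it.
    regress_ip: constant in the reliability term IP / (IP + regress_ip).
    """
    n = len(seasons)
    # Most recent season gets weight 1; each earlier one is divided by `recency`.
    weights = [recency ** (i - (n - 1)) for i in range(n)]
    eff_ip = sum(w * ip for w, (ip, _) in zip(weights, seasons))
    if eff_ip == 0:
        return league_era  # no data at all: fall back to the league mean
    weighted_era = sum(w * ip * era for w, (ip, era) in zip(weights, seasons)) / eff_ip
    reliability = eff_ip / (eff_ip + regress_ip)
    return reliability * weighted_era + (1 - reliability) * league_era


# Hypothetical starter: 200 IP at 4.00 two years ago, 200 IP at 5.00 last year.
projection = crude_marcel_era([(200, 4.00), (200, 5.00)], league_era=4.50)
print(round(projection, 2))
```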

If I do the same study for hitters, my guess is that we would see pretty much the same thing, but I think that ALL subsets of hitters, young, old, etc., will simply match their Marcel projections.


#8    Tangotiger      (see all posts) 2008/05/08 (Thu) @ 15:20

I forget what my aging is for pitchers, but you can go to the main Marcel page:
http://www.tangotiger.net/marcel/

then follow one of the links, and one of the posts talks about how I did the pitchers:
http://www.tangotiger.net/archives/stud0346.shtml#1028

The age adjustment is the same as hitters, and it is:
If over 29, AgeAdj = (age - 29) * .003
If under 29, AgeAdj = (age - 29) * .006

Add 1 to those numbers, and then divide or multiply as appropriate.

So, a 39-yr old pitcher would have an AgeAdj = 1.03, so you bump up all his bad numbers by 3% (runs allowed, HR allowed, walks allowed, etc) and reduce by 3% all his good numbers (IP, K, W, etc).
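As a sketch, the age rule above could be written like this. Reading "divide or multiply as appropriate" as multiplying the bad numbers and dividing the good ones by the same factor is an assumption, and the function names and sample stat line are hypothetical.

```python
def age_adj(age):
    """Multiplier from the rule above: .003 per year over 29, .006 per year under."""
    rate = 0.003 if age > 29 else 0.006
    return 1 + (age - 29) * rate


def adjust_line(age, bad_stats, good_stats):
    """Scale bad numbers up and good numbers down by the age multiplier.

    For pitchers under 29 the multiplier is below 1, so the directions reverse.
    """
    adj = age_adj(age)
    worse = {name: value * adj for name, value in bad_stats.items()}
    better = {name: value / adj for name, value in good_stats.items()}
    return worse, better


# Hypothetical 39-year-old: runs allowed go up, strikeouts go down.
bad, good = adjust_line(39, {"R": 100, "BB": 60}, {"K": 150})
print(round(age_adj(39), 3), bad, good)
```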

***

http://www.bb-ref.com/pi/shareit/jz2T

That’s the list of all pitchers with at least 7K and at most 1 walk per start.  Cliff Lee had a 3K start that broke his string, so he doesn’t appear anywhere near the top.  He has 39K, 2BB in 37IP, so that’s damn impressive. 

Anyway, the best in the lot is Curt Schilling in 2002, where he had a 9 game streak, resulting in 93K and 4BB in 67IP.  That’s fairly jaw-dropping.

He followed that up, 2 months later, with a 6 game streak of 54/2 in 46IP.  Yowza.

Lots of good to great names in there.  There’s also Shane Reynolds (36/3 in 26IP).

As Rally pointed out elsewhere, since K rates have the least amount of variance associated to them (correlation year-to-year is very high) and since BB rates are almost as reliable, it would be hard to believe that an average pitcher could go 39/2 in K/BB while facing 156 batters.

For pitchers, I would bet that you could regress the K/BB ratio about 50% toward the league mean after 156 batters.  (I’ll check into it.)

If that is the case, that would mean regressing the 39/2 halfway toward a league average of roughly 26/13, making Lee’s true talent 32.5/7.5.

A quick way to turn that into ERA is:
5.40 - 12 * (K-BB)/PA

So, that 39/2 would be 2.55 ERA
The league average 26/13 would be 4.40 ERA

Regressing 50% makes it 3.48 ERA as his true talent based only on his K/BB.
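The arithmetic above can be checked with a few lines. This is only a sketch of the quick formula as stated; the function name `kw_era` is a label chosen here.

```python
def kw_era(k, bb, pa):
    """Quick ERA estimate from strikeouts and walks: 5.40 - 12 * (K - BB) / PA."""
    return 5.40 - 12 * (k - bb) / pa


PA = 156  # batters faced by Lee so far

lee_observed = kw_era(39, 2, PA)
league_average = kw_era(26, 13, PA)
# Regress K and BB halfway toward the league line: 32.5 K and 7.5 BB.
lee_regressed = kw_era((39 + 26) / 2, (2 + 13) / 2, PA)

print(round(lee_observed, 2), round(league_average, 2), round(lee_regressed, 2))
```

Because the formula is linear in K and BB, regressing the counts halfway gives the same answer as averaging the two ERA figures.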

But at 50% regression, the uncertainty level is fairly wide (though I don’t think wide enough as to put him as below average).

I’ve also found with Marcel that while the 3/2/1 is fine for the “average” components, when it comes to K rates, something like 6/2/1 makes more sense, and the BABIP rates would be 1/1/1. 

We also need to look at the quality of opposing hitters.

***

MGL: could you redo your study, this time selecting based on (K-BB)/PA ?


#9    fifth of      (see all posts) 2008/05/08 (Thu) @ 20:06

Re: the injury point I made earlier

If you have three age groups for pitchers - 20-26, 27-31, and 32+ - then the group that is least likely to have been pitching through injury at the major league level recently is the 20-26 because these players can be optioned to the minors and their clubs are more likely to have a long-term investment in their future.

The group that is least likely to have improved relative to their recent performance because of recovery from injuries, I am speculating, is the 32+. At this age, you are less likely to get another chance at the major league level, and you are less likely to be able to recover from an injury.

So, for the type of study above, I am suggesting that 27-31 might be the sweet spot for players whose projections are heavily influenced by previous ineffectiveness caused by injuries that do not have substantial lingering effects moving forward. If Cliff Lee were 24, he may not have pitched as much in the majors with injury problems. If he were 34, he would be less likely to be starting in the major leagues at all, and he would also be less likely to have healed (my guess, not a truth). At 29, his projection is heavily influenced by a performance we have reason to discount for injury reasons. Since MGL’s projections don’t know the specifics of the injuries, if we were to have the illusory God’s Eye View projection that can do the proper actuarial work to account for the impact of injuries, my speculation is that the 27-31 group above would have a GEV projection more in line with their results. This is my suggested rationale for MGL’s statement that this group has a “(very) partial “for real” on the order of .25 runs or so” effect.


#10    MGL      (see all posts) 2008/05/08 (Thu) @ 22:14

Tango, I am using box scores for pitcher performance, so if I know ip, hits, bb, and so (the usual box score info for pitchers), what is the best formula for estimating PA?


#11    MGL      (see all posts) 2008/05/09 (Fri) @ 00:27

I looked up the numbers on B-R, and it looks like around 9.6% of all baserunners become outs (OOB), so I guess I can just use ip*3 + bb + hits - (hits+bb)*.096 + roe, where roe has to be estimated, since it is not included in the pitching line in the box score, as .04*ip.

Does that sound about right?

If I wanted to get a little more precise, which I don’t think is necessary, I can narrow the outs on base down by figuring a percentage of the non-SO outs as a function of the number of baserunners, as GDP, and then the rest of the OOB are possible CS, PO, and other outs on base, as a function of baserunners.


#12    MGL      (see all posts) 2008/05/09 (Fri) @ 00:33

the runners on base should be hits+bb+ip*.04, where the .04*ip is the roe.  And of course, I can just combine the 3*ip (total outs) and the .04*ip (roe) to make one term, 3.04*ip.

So the entire thing is:

3.04*ip + hits + bb - (hits + bb + .04*ip)*.096
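Written out as a function, the estimate looks like the sketch below. It assumes `ip` is given in true decimal innings (6 1/3 innings as 6.333, not the box-score 6.1), and the sample pitching line is hypothetical.

```python
def estimate_pa(ip, hits, bb, oob_rate=0.096, roe_per_ip=0.04):
    """Estimate batters faced from a box-score pitching line.

    oob_rate: share of baserunners who later make an out on the bases.
    roe_per_ip: estimated reached-on-error per inning, since the box
    score pitching line does not list it.
    """
    roe = roe_per_ip * ip
    baserunners = hits + bb + roe
    # Outs recorded, plus everyone who reached, minus runners who became outs.
    return 3 * ip + baserunners - oob_rate * baserunners


# Hypothetical April line: 33 IP, 30 hits, 10 walks.
pa = estimate_pa(33, 30, 10)
print(round(pa, 1))
```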


#13    MGL      (see all posts) 2008/05/09 (Fri) @ 02:56

Ok, I repeated the study for (k-BB)/PA, where BB does not include IBB or HP.

The y-t-y correlation for pitchers with at least 300 PA in back to back seasons was .685 with an average PA in each year of 682.  So, for the regression equation in the Marcel projection, I am going to use:

300/(PA+300)

I am using 650 PA as the average number of PA, since we have pitchers with all different numbers of PA (min of 300) in the regression sample, which tends to drive the “r” down a little (I think), as opposed to if they all had 682 PA each.

I don’t know if the K/BB ratio would be around the same, but for (K-BB)/PA, you regress 50% at 300 batters (approx.).

OK, the y-t-y corr. for K/BB for same group of pitchers was .626, less than (K-BB)/PA. So for K/BB, you need close to 400 TBF for a 50% regression.
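Here is a short sketch of how the regression constants follow from the year-to-year correlations. It assumes the usual relationship r = PA/(PA + k), so that k = PA*(1 - r)/r, and it uses the 650 PA figure chosen above; the function name is hypothetical.

```python
def regression_constant(r, avg_pa):
    """PA at which a stat is regressed 50%, assuming r = PA / (PA + k)."""
    return avg_pa * (1 - r) / r


AVG_PA = 650  # average PA per season assumed above

k_diff = regression_constant(0.685, AVG_PA)   # (K-BB)/PA
k_ratio = regression_constant(0.626, AVG_PA)  # K/BB

print(round(k_diff), round(k_ratio))
```

Under that assumption the two constants come out to about 300 and a little under 400, in line with the figures quoted above.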

Anyway, for the Marcel’s, in addition to the 300/(300 + PA) regression equation, I used a weighting system that weights each year twice the previous year, which is pretty aggressive, although as Tango suggests, maybe the weighting for (K-BB)/PA should be even more aggressive than that.

I don’t really know how to handle the aging adjustment.  There really is no consensus on how pitchers age in general or for each of the components.  If you do “delta” aging curves for pitchers, because of selective sampling, you find that pitchers get worse at all ages, as they age.  You also find that they decrease their K and BB rates (I think) at all ages.

If you try and regress the sample stats in your aging algorithm (the “delta” method), the results really depend on the exact amount of regression.

So for now, I am doing no age adjustment.  Maybe Tango can help me out with that and I can redo everything with some kind of age adjustment.

In the 4 year period (04-07), the average starter in April had a K-BB per 500 TBF of 41.  For relievers, it was 50.

I will go on record as saying that I am not crazy about (k-BB)/PA, only because the result is not intuitive or familiar to most people.

This time I compared April to May rather than April to the rest of the season.

Starters (good ones in April)

N=74
April IP/player=33
April (K-BB)*500/PA=90

Pre-season projected (K-BB)*500/PA=59
Updated (after April) proj.=63
May IP=34
May (K-BB)*500/PA=71

Starters (bad ones in April)

N=73
April IP/player=27
April (K-BB)*500/PA= -4

Pre-season projected (K-BB)*500/PA=33
Updated (after April) proj.=28
May IP=29
May (K-BB)*500/PA=21

Relievers (good ones in April)

N=87
April IP/player=12
April (K-BB)*500/PA=110

Pre-season projected (K-BB)*500/PA=57
Updated (after April) proj.=62
May IP=13
May (K-BB)*500/PA=72

Relievers (bad ones in April)

N=90
April IP/player=12
April (K-BB)*500/PA= -9

Pre-season projected (K-BB)*500/PA=42
Updated (after April) proj.=37
May IP=11
May (K-BB)*500/PA=38

As you can see, for almost all of the 4 buckets, our projections fall “short” of the actual May performance, in terms of it mimicking the April performance.

Let’s see what happens if we don’t regress the projections as much. Maybe we are just overregressing the projections.  Maybe the 50% regression point is closer to 150 PA as Tango suggested rather than the 300 PA I found when I did the y-t-y correlation.  I’ll use 150 in the regression equation rather than 300.

GS (good April starters)

PM (Projected May) = 65
AM (Actual May) = 71

BS (bad April starters)

PM = 26
AM = 21

GR

PM = 66
AM = 72

BR

PM = 34
AM = 38

That isn’t much better. In fact, for the bad relievers, the projection is even worse than before.

We’ll scrap that idea, and leave the regression alone, using 300 PA as the 50% point.

This time I will redo the projections using a more aggressive weighting.  I will weight each season 3 times the prior one, rather than 2 times, and then April 2 times the entire weighted pre-season data.

GS (good April starters)

PM (Projected May) = 64
AM (Actual May) = 71

BS

PM = 26
AM = 21

GR

PM = 65
AM = 72

BR

PM = 35
AM = 38

Nope, that does not help either.  One thing is that because of the more aggressive weighting in prior seasons, I am forced to regress more, as my “effective” total PA is less (e.g., if I have 100 PA this year and 100 PA last year, and I weight them equally, I have 200 PA for my regression - if I weight this year’s PA 10 times more than last year, I only have 110 or so “effective” PA for purposes of regression).
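The "effective" PA idea can be sketched as follows. It assumes the most recent season counts at full weight and each earlier one is divided by the recency factor, which matches the 100 PA example in the paragraph above; the function name is hypothetical.

```python
def effective_pa(pa_by_season, recency):
    """Recency-weighted PA total; pa_by_season is listed oldest first."""
    n = len(pa_by_season)
    return sum(pa / recency ** (n - 1 - i) for i, pa in enumerate(pa_by_season))


equal_weight = effective_pa([100, 100], recency=1)   # 200
heavy_recent = effective_pa([100, 100], recency=10)  # 110

# Less effective PA means the 300 / (PA + 300) term regresses more.
for pa in (equal_weight, heavy_recent):
    print(pa, round(300 / (pa + 300), 2))
```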

What if I weight the April performance 6 times that of all prior years’ performance? Keep in mind that it sounds like a lot of weighting, but when we consider that each pitcher only has between 10 and 35 IP or so in April, even an aggressive weighting is not going to change the pre-season projection all that much.  Remember that we weight by a “recency factor” AND by the number of PA in each year (of course).

Anyway, here are the projected and actual May performances when we weight April 6 times more than the pre-season data:

GS (good April starters)

PM (Projected May) = 67
AM (Actual May) = 71

BS

PM = 22
AM = 21

GR

PM = 68
AM = 72

BR

PM = 31
AM = 38

We get closer other than that last group which seems to be an anomaly, in that they regress toward their pre-season projections a lot more than the first three groups for some reason.

Maybe it has to do with there being no adjustments for batters faced. I don’t know.  Or just a sample fluke.

BTW, the average age for all 4 groups is high.  31.3, 32.2, 31.9, 33.2.  Maybe the reason that group 4 regresses more, or they do better than the other 3 groups is because they are older, and I did not do any age adjustments.  That assumes that older pitchers have better ratios ((K-BB)/PA) or that their ratios get better with age. I don’t know if that is true or not.

Clearly there is something going on beyond a normal Marcel, but:

One, (K-BB)*500/PA is real sensitive such that our regular old Marcel projections (using a 2 times weighting system) are really not that far off.

Two, look at what these guys are regressing in April FROM: A 90 to a 71 and a 110 to a 72, where Marcel predicts a 63 and a 62, respectively.  Clearly, a regular Marcel is better than their actual April performance (April overshoots by around 29 points on average, and Marcel undershoots by around 10 points).

For the bad pitchers, their actual April ratio undershoots by around 25 to 47 points (starters and relievers), and the regular Marcel overshoots by 7 and -1 (starters and relievers).
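A quick way to see the comparison is to compute both errors against May for all four buckets. The numbers are taken from the tables earlier in this post, using the regular updated Marcel; the tuple layout and variable names are just for illustration.

```python
# (April, updated Marcel projection, actual May), all in (K-BB)*500/PA.
buckets = {
    "good starters": (90, 63, 71),
    "bad starters": (-4, 28, 21),
    "good relievers": (110, 62, 72),
    "bad relievers": (-9, 37, 38),
}

april_err = {name: april - may for name, (april, _, may) in buckets.items()}
marcel_err = {name: proj - may for name, (_, proj, may) in buckets.items()}

for name in buckets:
    print(f"{name:15s} April {april_err[name]:+4d}   Marcel {marcel_err[name]:+4d}")

mean_abs_april = sum(abs(e) for e in april_err.values()) / len(buckets)
mean_abs_marcel = sum(abs(e) for e in marcel_err.values()) / len(buckets)
print(mean_abs_april, mean_abs_marcel)
```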

In the case of (K-BB)/PA, it appears that bad April performance is more flukey than good April performance, for some reason.  I would have thought that it is always the opposite - that bad short-term performances suggest injury, and good ones suggest a fluke.

I think, that for pitchers at least, who probably change their true talent much more often and more dramatically than do batters, it is more likely that they have improved their true talent for some reason, than they have gotten worse, including injury.

That sort of makes sense, as other than injuries and age, it is unlikely that a player all of a sudden gets worse (in true talent), but there are many reasons why players, especially pitchers, might get better.

Keep in mind two things:  One, sample size warnings are definitely in effect.  Two, the data is from right smack dab in the middle of the steroid era, or at least the end of it, such that some of these good April performances may be pitchers on PED’s.

I would love to do the same study from the pre-1980’s period.


#14    tangotiger      (see all posts) 2008/05/09 (Fri) @ 06:58

MGL:

ERA = 5.40 - 12 * (K-BB)/PA

It’s sort of DIPS/FIP, without the HR.


#15    Tangotiger      (see all posts) 2008/05/09 (Fri) @ 08:33

Using my translation in post 14, I will repeat MGL’s post, adding in the translated ERA.  (By the way, I’ve been calling it the szERA, for strike zone.  I should call it kwERA, for strikeout / walk.)

In the 4 year period (04-07), the average starter in April had a K-BB per 500 TBF of 41 (kwERA = 4.42).  For relievers, it was 50 (kwERA = 4.20).

I will go on record as saying that I am not crazy about (k-BB)/PA, only because the result is not intuitive or familiar to most people.

This time I compared April to May rather than April to the rest of the season.

Starters (good ones in April)

N=74
April IP/player=33
April (K-BB)*500/PA=90 (kwERA = 3.24)

Pre-season projected (K-BB)*500/PA=59 (kwERA = 3.98)
Updated (after April) proj.=63 (kwERA = 3.89)
May IP=34
May (K-BB)*500/PA=71 (kwERA = 3.70)

The empirical shows 61% regression toward mean after 33 IP.  This implies a 50% regression point at 52IP (or roughly 225 PA).

Starters (bad ones in April)

N=73
April IP/player=27
April (K-BB)*500/PA= -4 (kwERA = 5.50)

Pre-season projected (K-BB)*500/PA=33 (kwERA = 4.61)
Updated (after April) proj.=28 (kwERA = 4.73)
May IP=29
May (K-BB)*500/PA=21 (kwERA = 4.90)

Implies a 50% regression point at 56 IP.
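The two starter figures can be reproduced with the sketch below. It assumes the regression is measured from April toward the pre-season projection, and that the regressed share follows k/(IP + k); that reading matches the 61%, 52 IP, and 56 IP quoted above, but the thread does not state the formula. The function name is hypothetical.

```python
def half_regression_point(april, preseason, may, april_ip):
    """Return (share regressed toward pre-season, implied 50% point in IP).

    Assumes share = k / (IP + k), so k = IP * share / (1 - share).
    """
    share = (april - may) / (april - preseason)
    return share, april_ip * share / (1 - share)


good_share, good_k = half_regression_point(90, 59, 71, april_ip=33)
bad_share, bad_k = half_regression_point(-4, 33, 21, april_ip=27)

print(round(good_share, 2), round(good_k))
print(round(bad_share, 2), round(bad_k))
```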

Relievers (good ones in April)

N=87
April IP/player=12
April (K-BB)*500/PA=110 (kwERA = 2.76)

Pre-season projected (K-BB)*500/PA=57 (kwERA = 4.03)
Updated (after April) proj.=62
May IP=13
May (K-BB)*500/PA=72 (kwERA = 3.67)

Implies a 50% regression point at IP = 30

Relievers (bad ones in April)

N=90
April IP/player=12
April (K-BB)*500/PA= -9 (kwERA = 5.62)

Pre-season projected (K-BB)*500/PA=42 (kwERA = 4.39)
Updated (after April) proj.=37
May IP=11
May (K-BB)*500/PA=38 (kwERA = 4.49)

Implies a 50% regression point at IP=606

***

For starters, it’s fairly clear that the regression point is at roughly 50-60 IP, meaning 200-250 PA.  That is also the same amount to regress a hitter’s hitting stats.

So, we can say that the pitcher’s K and BB numbers are along the same line as a hitter’s overall hitting line, with respect to how “real” it is.

Cliff Lee has 156 PA, meaning we need to regress his 2008 performance roughly 60% toward his forecast entering 2008.  Marcel had him at roughly kwERA = 4.3 or so.  In 2008, he’s at 2.55.  A 60% regression puts him at kwERA = 3.60.
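As a sketch of that calculation: with a 50% point of roughly 225 PA (the figure from the good starters above), 156 PA works out to about 59% regression, which rounds to the 60% used here. The variable names are only illustrative.

```python
HALF_POINT_PA = 225   # approximate 50% regression point from the starters above
lee_pa = 156
lee_2008 = 2.55       # kwERA so far in 2008
lee_forecast = 4.3    # rough pre-season Marcel kwERA

share = HALF_POINT_PA / (lee_pa + HALF_POINT_PA)

# Using the rounded 60% figure from the text.
lee_estimate = lee_2008 + 0.60 * (lee_forecast - lee_2008)
# Using the unrounded share instead.
lee_estimate_exact = lee_2008 + share * (lee_forecast - lee_2008)

print(round(share, 2), round(lee_estimate, 2), round(lee_estimate_exact, 2))
```

Either way the estimate lands near 3.6, about .70 runs better than the pre-season forecast.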

This is highly instructive, since a typical pitcher after 156 PA should regress about 88% or so toward his forecast for regular ERA.

It’s clear that since the component of K and BB are so different from HR and BABIP, that you can’t apply your standard regression in these extreme cases.

Getting a forecast to change by 0.70 in ERA after 156 PA is enormous.  If Lee was a slightly below average starter coming into 2008, he has now put himself as somewhere close to being almost a #2 starter.

We didn’t even consider the fact that he’s only allowed 1 HR, and that his BABIP is lights out.  If we include that (that is, we’ll regress his 2008 HR and BABIP performance 90-95% toward his forecast), that puts him into the #2 starter class.

Basically, Lee has turned himself from a 6MM a year pitcher into a 12MM a year pitcher.  Based on this analysis.


#16    fifth of      (see all posts) 2008/05/09 (Fri) @ 11:16

tango/15: Given the scope of MGL’s study, I’m wondering whether what is actually changing is the projection of the player’s true talent level into the future. There may be a difference between what April tells us about the player and what April tells us about the May the player will have.


#17    Tangotiger      (see all posts) 2008/05/09 (Fri) @ 11:25

The presumption is that a player will have a fairly constant true talent level, and therefore, looking at May only, or looking at May-Oct will give you the same results.

What you are suggesting is that you can have both short-term and long-term true talent changes (like you are on drugs or something).  So, what we see in April gives us a short-term boost for our expectations in May, because something real has happened, but it’s something that does not have a long-lasting effect.

And, what we’ll see in Jun-Oct does have some lingering effects from April, but will mostly revert to our expectations of his true talent as of Apr 1, 2008.

I can’t necessarily disagree with you.  But, the “lingering” short-term effects are VERY short term from the studies I’ve seen.  We’re talking about minutes, maybe hours… days if you want to stretch it, but not weeks.


#18    Guy      (see all posts) 2008/05/09 (Fri) @ 12:47

This is very interesting.  We all knew that K and BB rates reflect less luck than other performance measures, but I certainly wouldn’t have guessed that 45 IP worth of performance data could change a talent estimate by .70 ERA.  As Tango indicates, that’s huge:  almost the difference between an average pitcher and a replacement, or an average pitcher and a star.

And kudos to MGL for pursuing this wherever the data took him, even though it somewhat contradicts his original point about Lee (that we shouldn’t get too excited about his April performance).  I await the thread at BTF where posters applaud his intellectual integrity.....

On May vs. rest-of-season, it does seem possible to me that part of what we’re seeing here is that a great April K-BB performance means the pitcher is completely healthy.  And that in turn makes it likely he’ll be healthy in May, since many pitching injuries develop and worsen over time.  So I think it would be worthwhile to see if the May-Sept. numbers are just as far from the pre-April projection. 

Finally, are the projections all adjusted for current season’s park?  I assume so, but if not then pitchers moving from CO to LA (or vice-versa) could exaggerate the predictive power of April performance.


#19    Tangotiger      (see all posts) 2008/05/09 (Fri) @ 14:23

I await the thread at BTF where posters applaud his intellectual integrity.....

Last week I asked in that infamous thread how much they would pay for Cliff Lee if he were a free agent today.  Not a single person replied.  The thread died shortly thereafter.

That’s the difference: one person gives an opinion, however brash, and gives his supporting argument (that’s mgl).  Another person, rather than give his opinion instead bashes the original opinion (that’s a portion of the btf posters on that thread).

This is similar to when they linked to the Changing Rules article.  Rather than provide their opinion on the rules and why they could work or not, and what they could offer as a change, a significant majority simply decried the changing rules as being an affront.

So, when you put someone on the spot, and ask them a very specific question, the snarkiness simply stops dead.  Truth and evidence-supported opinion usually drives away snarkiness.

When did snarkiness = funny?


#20    MGL      (see all posts) 2008/05/09 (Fri) @ 16:31

I never read the BTF thread, but, as I said in a previous post, whoever started it, either intentionally or unintentionally was “baiting”.  Think of all the great threads we have had here that never ended up in BTF.  But heck, I guess all they want is “action”, like a tabloid.  So anything that gets attention is fair game.  People on BTF generally do not like real technical stuff.

The initial post on that thread was not my best work.  And if I was wrong about Lee, then I was wrong.  If anyone doing cutting edge research is not wrong about things a fair number of times, they are not doing cutting edge research.  You can’t always test something, retest it, wait for all the dust to settle, etc., before you state an opinion, even if you state it as if it were fact.  Well, I guess you can, but that would be no fun.

Anyway, these change in talent and predictability things are tricky, and I admit that I and all of us are just now learning about them and we are a long way from solving them.  There is no need to blast anyone’s thoughts on them since nothing is really settled.  There are a lot of things to criticize me for, but a lack of “intellectual honesty” is not one of them, I don’t think.  Anyone that does that is just a blowhard, and does NOT read my stuff, or simply has a chip on their shoulder or likes to level undeserved criticism on people.

As far as Lee’s true talent improving .70 runs, making him a #2 starter now, we CANNOT take the change in K/BB, come up with a new true talent level for that, and then turn that into an “ERA” and assume that that means his true talent overall stats will change by that much.

IOW, if I redo the study and look at the ERA of those same pitchers, I don’t think that their ERA will change as much as their K/BB would suggest, but I will check.  So let’s not jump the gun on Lee’s overall talent, as measured by ERA, or ERC, or whatever.

Also, I used May, but I agree that the results will probably be the same for May-October.  The problem with using May-October is that then you have to worry about ALL pitchers’ numbers changing as the season goes on and correct or adjust for that.  I’ll run the same numbers though using May-October.

I think I only looked at pitchers who stayed on the same team, but I am not sure.  If not, then yes, there could be a problem with the numbers being biased by pitchers who change teams.  You always have to watch out for that in these kinds of studies.


#21    MGL      (see all posts) 2008/05/09 (Fri) @ 23:30

Ok, I redid the study and I did make sure that all pitchers played on the same team, at least in their last two years (the year of the good or bad April and the prior year).

I also looked at May to the end of the season and not just May.

I also added ERA to the (K-BB)/PA to see if it parallels the expected ERA based on Tango’s formula:

ERA = 5.40 - 12 * (K-BB)/PA

Remember, that I did not think it would.  Let’s see if I am right or wrong:

GS (good April starters) N=58

PM KBB (Projected May-October) = 57
AM KBB (Actual May-October) = 73

PM ERA (Projected May-October) = 4.10
AM ERA (Actual May-October) = 3.68

BS

PM KBB = 32
AM KBB = 26

PM ERA = 4.57
AM ERA = 4.79

GR

PM KBB = 57
AM KBB = 77

PM ERA = 3.92
AM ERA = 3.65

BR

PM KBB = 43
AM KBB = 46

PM ERA = 4.13
AM ERA = 3.79

So, I was wrong.  The difference in (K-BB)/PA does in fact reasonably help us to predict how much we will be “off” in ERA.

For example, for the starters who had the good April start, our (K-BB)*500/PA projection undershoots their actual performance by 16 points, 57 projected and 73 actual.  16 points per 500 PA is .032 per PA, which, if we plug into Tango’s formula, is .384 in ERA runs.  We do, in fact, overshoot the actual ERA by .42 runs (projected 4.10 and actual 3.68).
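The same conversion can be run for all four groups.  This is only a sketch of the arithmetic, using the projected and actual figures quoted in this comment and Tango’s formula as given above; the variable and function names are made up for illustration.

```python
# Tango's rule of thumb from this thread: ERA = 5.40 - 12 * (K-BB)/PA.
ERA_PER_KBB_RATE = 12.0

# (projected, actual) May-October values quoted in the comment.
# KBB is (K-BB) per 500 PA; ERA is runs per 9 innings.
groups = {
    "GS": {"kbb": (57, 73), "era": (4.10, 3.68)},
    "BS": {"kbb": (32, 26), "era": (4.57, 4.79)},
    "GR": {"kbb": (57, 77), "era": (3.92, 3.65)},
    "BR": {"kbb": (43, 46), "era": (4.13, 3.79)},
}


def implied_era_miss(kbb_projected, kbb_actual, pa=500):
    """ERA miss (actual minus projected) implied by the K-BB miss."""
    rate_miss = (kbb_actual - kbb_projected) / pa
    # A higher K-BB rate means a lower ERA, hence the minus sign.
    return -ERA_PER_KBB_RATE * rate_miss


results = {}
for name, g in groups.items():
    implied = implied_era_miss(*g["kbb"])
    observed = g["era"][1] - g["era"][0]
    results[name] = (round(implied, 3), round(observed, 2))
    print(f"{name}: implied {implied:+.3f}, observed {observed:+.2f}")
```

For the good-start starters the implied miss is about -.38 against an observed -.42.  The bad-start relievers are the odd group out: the K-BB miss implies only about -.07, while the observed ERA miss is -.34.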

Amazingly, even if we weight the current performance 10 times that of the prior projection, we still don’t correctly project the actual rest-of-season performance.  For the 4 categories (GS, BS, GR, BR), for proj/actual, we get 71/73, 20/26, 72/77, and 29/46.  That is at least close for the good-start guys; the last category, the bad-start relievers, is still screwed up for some reason.

The problem may not be in the weighting, though; it may be in the regression.  Without looking at different categories of pitchers (some with a lot of prior PA or IP and some with only a few), it is hard to determine from this kind of analysis if we are over-regressing or under-weighting the recent performance.  Impossible in fact.

I am using 300 PA for the 50% regression point for (K-BB)/PA and I am using 300 IP for the ERA.
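To make the two knobs concrete, here is a rough sketch of how a weighting and a 50% regression point might interact.  It is NOT the code behind the numbers in this thread: the pooling scheme, the choice to regress on the real (unweighted) sample size, and every input in the example are assumptions for illustration only.

```python
def regress(rate, n, league_rate, half_point):
    """Shrink an observed rate toward the league average.

    half_point is the sample size (PA or IP) at which the
    regression toward the mean is exactly 50%.
    """
    return (rate * n + league_rate * half_point) / (n + half_point)


def updated_projection(prior_rate, prior_n, april_rate, april_n,
                       league_rate, half_point=300, april_weight=10):
    """Blend prior seasons with April, then regress toward the league.

    ASSUMPTION: April's sample is simply multiplied by april_weight
    before pooling; the thread does not spell out the exact scheme.
    """
    weighted_april_n = april_n * april_weight
    pooled_n = prior_n + weighted_april_n
    pooled_rate = (prior_rate * prior_n
                   + april_rate * weighted_april_n) / pooled_n
    # ASSUMPTION: regress on the real sample size, not the weighted one.
    return regress(pooled_rate, prior_n + april_n, league_rate, half_point)


# Hypothetical pitcher: (K-BB)/PA of .100 over 1200 prior PA,
# .180 over 120 April PA, league average .080.
p300 = updated_projection(0.100, 1200, 0.180, 120, 0.080, half_point=300)
p200 = updated_projection(0.100, 1200, 0.180, 120, 0.080, half_point=200)
print(f"300 PA half point: {p300:.4f}")
print(f"200 PA half point: {p200:.4f}")
```

Under these assumptions, lowering the half point from 300 to 200 regresses less and leaves the projection further from league average, which is the direction of the change described in the comment.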

If I change to 200 IP and PA for the 50% regression points, and still weight the current April performance by 10 times the prior performance, I get projected/actual for the “good-start” pitchers of 73/73 and 76/77, but the bad pitchers are way off.  We project them to be much worse than they actually are in the rest of the season (18/26 and 26/46).

For one thing, it looks like we need to do separate weightings for good and bad starts, probably for the reasons I discussed before (that it is more likely that a pitcher improves his true talent than that he gets worse in true talent).

If we go back to a 4 times weighting for current year and the less aggressive regression (using 200 IP and 200 PA), we end up falling short in our projections again.

Plus even with the 10 times weighting and the less aggressive regression, we still fall short on our ERA projection, even though we nail the K-BB projection for the good-start pitchers at least.

If we do an even less aggressive regression for the ERA (100 IP only) and we use a 10 times weighting, we get the best projections for K-BB as well as ERA, again, at least for the good-start pitchers.

So, basically, I don’t know what the heck is going on.  And I have no idea how to project these guys for the rest of the season.

Basically, I have to admit that my previous notions of updating projections as the season goes on, at least for pitchers, and at least for extreme performances, have been turned completely on their ear.

I have to do more research.


#22    tangotiger      (see all posts) 2008/05/10 (Sat) @ 07:53

Yes, you would definitely have to know how much the forecast coming in was already regressed.

I would suggest only looking at pitchers with at least 300 IP over the last 2 years.  Or, if you are feeling adventurous, look at the “reliability” column in Marcel, and only select those pitchers with a reliability above some threshold (like .75 or something).


#23    Tangotiger      (see all posts) 2008/05/12 (Mon) @ 09:07

Pizza said:
http://mvn.com/mlb-stats/2008/05/12/statspeak-world-famous-roundtable-may-12/

BB/K ratio actually takes 500 BF until it stablizes enough to be considered a reliable indicator…

This is the frustrating thing about how Pizza does his intraclass correlation (which to me looks exactly the way I do it).

When it hits a *minimum* threshold of 500 batters faced, he gets r=.70.  However, if you require a minimum of 500, that means the *average* of the sample is going to be around 650-700 or so.

His statement looks fairly ambiguous until you read how he actually does his study.

I implore Pizza: it’s fine to show the minimum, but in describing things, please, use the average.  Your statement in the context it was written (reliability of a stat for particular pitchers) would lead us to believe that at between 400 and 600 PA, the r=.70.
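To see how much the minimum-versus-average distinction could matter, here is a rough sketch.  It ASSUMES reliability follows the common form r = n / (n + k), which may not match how Pizza’s intraclass correlation actually behaves, and it takes 675 BF as the midpoint of the 650-700 average guessed at above.

```python
def ballast(r, n):
    """Solve r = n / (n + k) for k, the 50% regression point."""
    return n * (1 - r) / r


def reliability(n, k):
    """Reliability at sample size n under r = n / (n + k) (assumed)."""
    return n / (n + k)


# If r = .70 really describes pitchers averaging about 675 BF...
k = ballast(0.70, 675)
# ...then the reliability at exactly 500 BF would be lower.
r_at_500 = reliability(500, k)

print(f"implied 50% regression point: {k:.0f} BF")
print(f"implied reliability at 500 BF: {r_at_500:.2f}")
```

Under that assumed form, the r at exactly 500 BF comes out in the low .60s rather than .70.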

***

That said, instead of K/BB, try K minus BB per PA.  You will likely get something much more reliable.


#24    MGL      (see all posts) 2008/05/12 (Mon) @ 14:17

More than the ambiguity of the number of PA, I HATE the wording, “becomes a reliable indicator.”

What does that mean?!  I have NO idea what a “reliable indicator” is!  To me, that would mean like a 10 or 20% regression at the most.  To some people that means > 50%.  I assume he means that that is the point at which the regression is 50%.  If that is what he means, he needs to say that!

If it is the 50% regression point, I think that saying that that is the point at which it is considered a “reliable indicator” is a VERY misleading statement and a dangerous one.

Say a player reaches that point and has an OPS of .950, where league average is .750.  Well, people will be going around proclaiming that Pizza said that this guy is likely a true .950 hitter, since he has reached the point in PA where it is “a reliable indicator of true talent.” That would be a horrible assumption, to think that he is a true .950 hitter, when he is most likely a true .850 hitter.  And of course, at the 50% regression point, the further from the mean his sample performance is, the worse that sample performance reflects his true talent.  So to make a blanket statement about “reliability” at any regression point can be misleading.  Say we are at the 25% regression point and a player has a BA of .410 and league average is .250.  That still puts his true BA at .370, which is a far cry from .410, yet some people would say that at the 25% regression point, “Surely sample performance is a great indicator of true talent!”
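The two examples above are just the usual regression-toward-the-mean arithmetic, sketched here with a made-up function name:

```python
def true_talent_estimate(observed, league_mean, regression_fraction):
    """Pull an observed rate toward the league mean.

    regression_fraction is the share of the gap between the observed
    rate and the league mean that gets regressed away (0.50 at the
    50% regression point, 0.25 at the 25% point, and so on).
    """
    return observed - regression_fraction * (observed - league_mean)


# The .950 OPS hitter at the 50% regression point, league .750.
ops_estimate = true_talent_estimate(0.950, 0.750, 0.50)
# The .410 BA hitter at the 25% regression point, league .250.
ba_estimate = true_talent_estimate(0.410, 0.250, 0.25)

print(f"OPS estimate: {ops_estimate:.3f}")
print(f"BA estimate:  {ba_estimate:.3f}")

# The gap between sample and estimate grows with distance from the mean.
for observed in (0.800, 0.950, 1.100):
    gap = observed - true_talent_estimate(observed, 0.750, 0.50)
    print(f"observed {observed:.3f}: sample overstates by {gap:.3f}")
```

The loop at the end shows the point about extreme performances: at a fixed regression fraction, the further the sample is from the mean, the more it overstates the estimate.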

I don’t have a problem proclaiming at very small samples, that a measurement means little or nothing, and that at very large samples, that a measurement indicates a lot (assuming a decent spread of skill in the population in the first place).

I DO have a problem proclaiming ANYTHING else in between other than telling us the approximate regression amount and I especially DO have a problem giving us some “magic point” at which a player’s sample performance goes from NOT meaning something to meaning something.  There just isn’t one.


#25    MGL      (see all posts) 2008/05/14 (Wed) @ 06:00

I redid the pitcher hot and cold April study.  This time I added 4 more years for a total of 8 (00-07).

I also looked at OPS against.  For the pre-season projections, since I was using a “basic” pitcher database that only gives hits and HR (and BB, HP, etc.), I took the non-HR hits and just prorated the singles, doubles, and triples, using league averages.

For April and May, I used real OPS against (I used a PBP database).

Anyway, for the cold April pitchers, using a 4 times weighting for the April OPS, it overvalues the pitchers by around 13 OPS points.  Pre-season, they were projected at .760.  After a 1.092 April, their updated projection was .812.  They pitched to a .825 for the rest of the season.  Maybe the bad April suggested injuries.

For the hot April pitchers, they started the season with a .727 projection (obviously very good pitchers).  After a torrid April, in which they pitched to a .490 OPS against, their updated projection was .691, again weighting April 4 times the prior years.  They pitched to a .690 for the remainder of the season.  So the projection was right on the money.
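As a quick check on the cold and hot groups, the quoted numbers can be turned into the share of the updated projection that April effectively contributed, plus the rest-of-season miss.  This is only arithmetic on the figures above; it says nothing about how the 4 times weighting and regression were actually implemented, and the names are made up for illustration.

```python
def implied_april_share(pre, april, updated):
    """Fraction of the way the projection moved from pre-season to April."""
    return (updated - pre) / (april - pre)


# OPS against, as quoted in the comment.
cold = {"pre": 0.760, "april": 1.092, "updated": 0.812, "rest": 0.825}
hot = {"pre": 0.727, "april": 0.490, "updated": 0.691, "rest": 0.690}

summary = {}
for name, g in (("cold", cold), ("hot", hot)):
    share = implied_april_share(g["pre"], g["april"], g["updated"])
    miss = g["rest"] - g["updated"]  # positive = pitched worse than projected
    summary[name] = (share, miss)
    print(f"{name}: April share {share:.1%}, rest-of-season miss {miss:+.3f}")
```

Both groups come out at an effective April share of roughly 15 to 16 percent, with the cold group missing by about 13 OPS points and the hot group by about 1.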

The conclusion might be that a hot April tells us nothing more than a regular projection with an aggressive current season weighting would, but a bad April tells us something, perhaps that the pitcher is injured.

One of the weaknesses with using OPS against is that it does not include a pitcher’s K rate.  A change in K and BB rate, as my last study looked at, (K-BB)/PA, appeared to suggest more of a change in true talent than this study suggests, using OPS against as a measure of talent.

IOW, if a pitcher is having a great month in terms of OPS against, it might not mean anything particular, but if his K and BB rates (or some combination) have changed, it might mean that he has changed his true talent more than a regular Marcel projection would indicate.

Again, I think the jury is still out with pitchers and hot and cold starts to a season.  For one thing, I think a much larger database is needed to increase the sample sizes.  Of course, the problem with that, which is often the problem when we need many years to get a large enough sample size, is that if we go back too many years, what was true back then might not be true today, and vice versa, for various reasons.

