THE BOOK--Playing The Percentages In Baseball


Friday, August 19, 2011

Pitcher X is 13-4, 2.00 ERA, 153 IP.  For CY YOUNG voting, you place him right BELOW which pitcher?

By Tangotiger, 03:36 PM

(Note: presume the league average starting pitcher is a .500 pitcher, with a 4.00 ERA.)


SabermetricsPoll
#1          (see all posts) 2011/08/19 (Fri) @ 16:16

Are you purposefully asking about the Cy Young Award here, or would really prefer that we rank them according to value?


#2    Tangotiger      (see all posts) 2011/08/19 (Fri) @ 16:32

I’m not sure I see a distinction.  But if there is, then however you would vote for the Cy Young.


#3    Don      (see all posts) 2011/08/19 (Fri) @ 16:42

Interesting twist. Unlike with your MVP poll, I think the playing time aspect is critically important for pitchers. It’s a difference of opinion that I didn’t even know I had. Great thought experiment!


#4    grady      (see all posts) 2011/08/19 (Fri) @ 16:49

Any reason for the low innings total?  There’s obviously a wide variety of reasons this could be, and I think it makes a difference (e.g. high K/9, high BB/9, injuries).


#5    Tangotiger      (see all posts) 2011/08/19 (Fri) @ 16:49

I’d like to hear from the 3 people who chose to put the 13-4 pitcher with the 2.00 ERA (153 IP) ahead of the 19-5 pitcher with the 1.80 ERA (216 IP).

The 19-5 pitcher has a lower ERA, higher win%, more wins, and more innings.  Why put the 13-4 ahead of that guy?


#6    Tangotiger      (see all posts) 2011/08/19 (Fri) @ 17:00

Shocked at the high votes for putting this guy below the last pitcher.  That pitcher allowed 54 more runs in 63 more innings (with 0 more wins and 7 more losses).

It would seem the argument would be that those 63 innings still have SOME value, even if you have to have 54 more runs allowed in them.

Enjoying the insight your answers imply…


#7          (see all posts) 2011/08/19 (Fri) @ 17:04

It would seem the argument would be that those 63 innings still have SOME value, even if you have to have 54 more runs allowed in them.

But you didn’t ask about VALUE, you asked about the Cy Young Award, which is given to the outstanding pitcher in the league.

I can’t see voting for a pitcher who pitched 70% of a season as the season’s most outstanding pitcher ahead of an above-average starter who pitched the whole season.

If you ask me about value, I’ll give you a different answer.


#8    Tangotiger      (see all posts) 2011/08/19 (Fri) @ 18:03

Mike, not to say you are wrong or anything.  What did you think of:

Rick Sutcliffe
CC Sabathia

Where would you have them, in their respective seasons?


#9    Tangotiger      (see all posts) 2011/08/19 (Fri) @ 18:21

Fascinating results!

After 99 votes:

31% put the player between Pitcher D (+10 W/L differential) and Pitcher E (+8).  That puts our Pitcher X (+9) right in the middle of that.  That means that these people prefer AVERAGE as the baseline. 

34% put the player between Pitcher E (16-8) and Pitcher F (15-7).  The difference between that and Pitcher X (13-4) is 3-4 or 2-5.  That’s a replacement level of around .300-.400.

Then you have 26% of the votes that are even below THAT level, essentially setting the replacement level in the .100-.200 level.

And a small minority (8%) that basically relied on ERA, reasoning that the IP level was enough.

Anyway, the overall average replacement level comes in at somewhere between .350-.400, which is, again, right where I’ve got it.

Again, great job!

I’m just surprised at the wider guesses, really.


#10          (see all posts) 2011/08/19 (Fri) @ 18:30

I would have voted for Gooden in 1984 and probably Lincecum or Santana in 2008.

I don’t think Sutcliffe or Sabathia would have made my ballot in either case on the basis of their NL performance, though with them switching leagues, there’s a pull to give them some sympathy credit for their performance in the other league.


#11    DavidS      (see all posts) 2011/08/19 (Fri) @ 19:02

I voted for just behind the 3.00 ERA pitcher.  I reasoned that replacement level was roughly a 5.00 ERA and that would equate to 216 innings at around 2.90.  However, I was also influenced by the Cy Young part of the question and I had trouble picking a guy who didn’t even qualify for the ERA title.  I penalized him a little for that and bumped him behind the 3.00 ERA pitcher.

Echoing Comment 3, I don’t think I would have made a similar adjustment for the MVP although I can’t explain why.
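
The arithmetic behind the "216 innings at around 2.90" figure in #11 can be sketched as follows. This is only an illustration: it assumes the missing 63 innings are filled at a 5.00 ERA (the replacement level DavidS proposes), and the function name `era_with_fill` is made up for this example.

```python
def era_with_fill(ip, era, target_ip, fill_era):
    """ERA over target_ip if the innings beyond ip are pitched at fill_era."""
    earned_runs = era * ip / 9                    # runs actually allowed
    fill_runs = fill_era * (target_ip - ip) / 9   # runs from the filler innings
    return 9 * (earned_runs + fill_runs) / target_ip


# Pitcher X: 153 IP at 2.00, topped up to 216 IP with 5.00 ERA innings.
blended = era_with_fill(153, 2.00, 216, 5.00)
print(f"{blended:.3f}")
```

Changing `fill_era` shows how sensitive the ranking is to the chosen baseline: a lower replacement ERA moves Pitcher X up, a higher one moves him down.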


#12    Tangotiger      (see all posts) 2011/08/19 (Fri) @ 20:07

Mike: though not necessarily #1 for Sutcliffe, CC, or RJ, and setting aside the AL sympathy vote, how high would you have them? 

Would you still have “an above-average starter who pitched the whole season” above them?

Basically, I’m challenging you to establish some baseline point that you would setup as the equivalency.

If after that you would still have them like #150 or #20 or something, then fine, that’s your call.  I’m just trying to make sure you’ve worked it out to get it to that point.


#13          (see all posts) 2011/08/19 (Fri) @ 20:31

Tango/12, don’t hold me too close to the exact order here.  I’m giving it off the top to give an idea of where I’d put Sabathia NL in 2008.  (I’m not spending much time figuring out park effects, for example, other than giving the Rockies a little extra credit.)

Lincecum
Santana
Webb
Haren
Hamels
Dempster
Lowe
Nolasco
Oswalt
Jimenez
Billingsley
Sheets
Volquez
Peavy
Jurrjens
Cain
Cook
Then Sabathia would be somewhere around here with guys like Pelfrey, Kuroda, Lohse, Lilly, Maholm, etc.


#14    Tangotiger      (see all posts) 2011/08/19 (Fri) @ 20:50

Mike: excellent.  Just looking for a general idea.

So, the exercise Mike went through, and what this poll intends to put you through, is to make you, the reader, figure out your various “baseline” levels in whatever it is you are doing (Cy Young, MVP, replacement level, etc).


#15    Tangotiger      (see all posts) 2011/08/19 (Fri) @ 21:48

After 149 votes, the equivalency level for 13-4 is:

15.7 wins, 8.3 losses

That means adding 2.7 wins and 4.3 losses (7 games).  And 2.7/7 = 0.386

And what replacement level do I use for starting pitchers?  0.380

Again, just brilliant on the consensus to get to this point.
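
The step from the poll's equivalency record to a replacement level in #15 can be written out as a short sketch. The function name is hypothetical; the inputs are the 13-4 record and the 15.7-8.3 consensus equivalent quoted above.

```python
def implied_replacement_level(actual_w, actual_l, equiv_w, equiv_l):
    """Win% of the extra decisions needed to turn the actual record
    into the record voters treated as equivalent."""
    added_w = equiv_w - actual_w
    added_l = equiv_l - actual_l
    return added_w / (added_w + added_l)


level = implied_replacement_level(13, 4, 15.7, 8.3)
print(f"{level:.3f}")
```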


#16          (see all posts) 2011/08/19 (Fri) @ 22:09

I also seem to have an inherent want to penalize a pitcher more for lack of IP than a hitter for missed ABs. I think it comes down to feeling that there is more of a risk/reward relationship with pitchers and IP, where many might be able to perform better by risking a corresponding increase in injury chance. I feel this relationship is less clear with hitters. Not sure there is any truth to this, but I think that’s my motivation for feelings similar to those expressed in #3.


#17          (see all posts) 2011/08/20 (Sat) @ 02:20

Considering that closers with less than 100 IP get high praise and consideration for Cy, I don’t think there is any reason to punish a 153 IP pitcher with a 2.00 ERA.  Heck, he could even be a closer.  Maybe he pitched 2 innings per game to close out 75 games and picked up 13 wins and 4 losses along the way.  Would that completely change the way you look at him? 

But anyway, even in the much more likely case that he is a starter, 153 IP with a 2.00 ERA is very impressive and seems pretty close in value to the guy who pitched to a 3.00 ERA in 216 IP.


#18          (see all posts) 2011/08/20 (Sat) @ 04:08

I’m figuring the 153-inning pitcher at ERA somewhere between 2.6 (assume league average is 20 runs/200 innings above replacement) and 2.9 (assume replacement is 5.0 ERA).


#19    rempart      (see all posts) 2011/08/20 (Sat) @ 10:04

I like to leave them as a rate by regressing by 300 BFP, or around 75 innings, of league-average performance. That means a regressed ERA of 2.66 for the 153-inning pitcher. So… 1.5 becomes 2.14, etc. He therefore goes after the 2.10 guy and before the 2.40 guy. They would be 2.59 and 2.81 respectively when regressed.
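
A minimal sketch of the regression described in #19, assuming the 75 innings of ballast and the 4.00 league ERA from the poll's premise. The function name `regress_era` is invented for illustration.

```python
def regress_era(ip, era, ballast_ip=75.0, league_era=4.00):
    """Blend the observed ERA with ballast_ip innings of league-average pitching."""
    return (ip * era + ballast_ip * league_era) / (ip + ballast_ip)


pitcher_x = regress_era(153, 2.00)   # the 13-4 pitcher
ace = regress_era(216, 1.50)         # a 1.50 ERA over 216 IP
above = regress_era(216, 2.10)
below = regress_era(216, 2.40)
print(f"{pitcher_x:.2f} {ace:.2f} {above:.2f} {below:.2f}")
```

Because the ballast is fixed, the pitcher with fewer innings is pulled further toward 4.00, which is what places him between the 2.10 and 2.40 pitchers.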


#20    Tangotiger      (see all posts) 2011/08/20 (Sat) @ 10:18

rempart: in your case, you are interested in his estimated true talent level, if all you know is his performance of that year.  Nothing really wrong with that.

I would question whether the population that each of these pitchers belongs to is the same, and therefore whether they should all be regressed to the same mean.  I mean, you could do that.

But, we also know something more, that if you throw 216 IP, you are probably better to begin with than if you throw 153 IP.

So, you could also have a different regression point for each pitcher’s IP level.

Then again, the reason a pitcher is allowed to get 216 IP is based on him not allowing many runs to begin with.  Hence, the tough part of regression (or Bayes): estimating the prior distribution.


#21    Tangotiger      (see all posts) 2011/08/20 (Sat) @ 10:20

rempart’s method, by the way, is pretty good, if you do NOT want to credit a pitcher with luck.

Basically, rather than treating a 1.50 ERA as “belonging” or “earned” by the pitcher, we recognize that part of that attribution has nothing at all to do with the pitcher, and so, is just luck.

Reasonable arguments can be made from both sides.


#22          (see all posts) 2011/08/20 (Sat) @ 11:20

It strikes me as though that’s not really what CYA (or MVP) is trying to do though - if we just gave it to the most talented player, shouldn’t we expect to give it to the same guy a lot more often than we do?
I’ve always seen it as giving it to the guy who had the best year that year.


#23    Tangotiger      (see all posts) 2011/08/20 (Sat) @ 11:51

Yes, but just because you have an ERA of 1.50 doesn’t mean you actually performed to an ERA of 1.50.

You are presuming that an ERA is a reflection of your performance, when in fact your ERA is a reflection of your performance, plus other things that had nothing at all to do with your performance.  But those things are still being attributed to you.


#24    Tangotiger      (see all posts) 2011/08/20 (Sat) @ 11:53

As an example, many saberists simply completely discard the won-loss record of a pitcher.  Why?  Because a large portion of that has nothing to do with the pitcher.  SOME of it does have to do with the pitcher, but at a seasonal level, most of it does not.

So, a saberist will regress 100% toward the mean… i.e., IGNORE it.

What to do with ERA?  Well, regress to some extent.

It’s the same principle. 

The problem is most people treat it as an either/or thing: either the thing does reflect the performance of the pitcher, or it does not.

The reality is that it’s in-between.  And that’s why regression is proper to use.


#25    Peter Jensen      (see all posts) 2011/08/20 (Sat) @ 13:16

Re: Tango’s question in post #12 and Mike’s Answer in Post #13 concerning how to rank Sabathia’s 2008 partial season with Milwaukee.

My DIRVA_Plus pitcher ranking metric, which was specifically designed as a counting stat to rank pitchers for the Cy Young awards based on the runs saved for their team, adjusting for the team defense, has Sabathia ranked number 5 in the NL in 2008.  The top ten in the NL are:

Pitcher------Inn-----Runs_saved

Lincecum-----223--------55.5
Santana------234--------42.0
Webb---------226--------33.7
Peavy--------173--------33.1
Sabathia-----130--------33.0
Haren--------216--------31.3
Volquez------194--------28.0
Lowe---------211--------25.8
Billingsley--196--------25.6
Kuo-----------69--------25.3

Sabathia had an additional 11.2 runs saved while playing for Cleveland.  Some of the other pitchers had some additional small amounts saved as relievers.  Since this is a counting stat and not a rate stat, Sabathia should get full credit for the runs he was able to save in his 130 innings.


#26    Tangotiger      (see all posts) 2011/08/20 (Sat) @ 13:24

Peter, your stat is “runs saved… relative to average”, isn’t it?

So, in essence, it’s:
IP * (rate - baselineRate)

Therefore, you aren’t making a fair comparison here, with respect to what Mike is saying.
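
To illustrate why the baseline matters in #26, here is a hedged sketch that shifts "runs saved relative to average" to a replacement baseline. The 1.00 run per 9 innings gap between average and replacement is purely an assumption for this example (it mirrors the 4.00 versus 5.00 figures floated earlier in the thread), and `rebase_runs_saved` is a made-up name.

```python
def rebase_runs_saved(runs_vs_avg, ip, gap_per_9=1.00):
    """Convert runs saved vs. average into runs saved vs. a baseline that is
    gap_per_9 runs per nine innings worse than average (an assumed value)."""
    return runs_vs_avg + ip * gap_per_9 / 9


# Runs-saved and innings figures taken from the table in #25.
sabathia = rebase_runs_saved(33.0, 130)
haren = rebase_runs_saved(31.3, 216)
print(f"Sabathia {sabathia:.1f}, Haren {haren:.1f}")
```

Against average, Sabathia's 130 innings edge out Haren's 216. Against the lower assumed baseline, the extra 86 innings are worth more than the gap between them, so the order flips.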


#27    rempart      (see all posts) 2011/08/20 (Sat) @ 14:43

I used the neutralized stats at B-R, and I put the pitchers for 2010 into 2 buckets one each for SP and RP (Tango had a method for this awhile back that I used). Using the formula .94*(IP*3-K)+H+BB+K I calculated BFPs for all starters. I then sorted by BFPs and established 10 percentile groups. These are the results.

BFP RA9
887 4.30
818 4.60
762 4.66
680 5.42
556 5.39
426 5.46
324 5.74
218 6.55
116 6.40
42 6.23
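
The BFP estimate quoted in #27 can be written as a small function. This sketch assumes IP is given as true decimal innings (so one third of an inning is 0.333, not the ".1" box-score notation); the stat line in the usage example is invented, not a real pitcher's.

```python
def estimate_bfp(ip, k, h, bb):
    """Estimate batters faced from innings, strikeouts, hits and walks,
    using the formula quoted in the comment: .94*(IP*3-K)+H+BB+K."""
    return 0.94 * (ip * 3 - k) + h + bb + k


# Hypothetical season: 200 IP, 180 K, 170 H, 50 BB.
bfp = estimate_bfp(200, 180, 170, 50)
print(round(bfp))
```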

Suppose you wanted a sliding scale to establish an average to regress against. This would work out to about 100 BFPs equal .25 change in RA9.
You would end up with something like this.

900 4.25
800 4.50
700 4.75
600 5.00
500 5.25
400 5.50
300 5.75
200 6.00
100 6.25
0 6.50

Or as a formula, 6.50-((pitcher BFP/100)*.25)

A real world example from last year:
Oswalt 808 and 3.37
THudson 880 and 3.49

If we use the first chart, Oswalt comes from the pool of 4.60s and Hudson from the 4.30s, and they each end up at 3.70 (assuming 300 BFP regression).

However, using the formula to predict the average of the group he came from, I get that Oswalt would be regressed to a 4.48 RA, not 4.60 as chart 1 says. Oswalt would then be 3.67 and Hudson remains 3.70.
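
The sliding-scale version of the calculation in #27 can be sketched like this, assuming the 300 BFP of ballast mentioned in the comment. Both function names are illustrative.

```python
def group_mean_ra9(bfp):
    """Sliding-scale mean from the comment: 6.50 - (BFP / 100) * 0.25."""
    return 6.50 - (bfp / 100) * 0.25


def regress_ra9(bfp, ra9, mean_ra9, ballast_bfp=300):
    """Blend observed RA9 with ballast_bfp batters of mean_ra9 pitching."""
    return (bfp * ra9 + ballast_bfp * mean_ra9) / (bfp + ballast_bfp)


oswalt_mean = group_mean_ra9(808)
oswalt = regress_ra9(808, 3.37, oswalt_mean)
hudson = regress_ra9(880, 3.49, group_mean_ra9(880))
print(f"{oswalt_mean:.2f} {oswalt:.2f} {hudson:.2f}")
```

The point of the sliding scale is that each pitcher is regressed toward the mean of pitchers with a similar workload rather than toward one league-wide number.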

I might research this backwards when I have time and see what it looks like.


#28    Tangotiger      (see all posts) 2011/08/20 (Sat) @ 15:07

Great job!

(Note: B-R.com has PA… it’s noted as BF I think.)

What I really like about this is that the guys with almost no IP get regressed to the replacement-level, while those with tons of IP get regressed to a bit better than the league average.

Note that now that you created small populations, the amount of PA you add is GREATER than 300.  Think about that for a second…


#29    Geoff Buchan      (see all posts) 2011/08/20 (Sat) @ 15:51

Tartar Sauce/17

I don’t know if you intended it, but you almost perfectly described the first AL reliever to win the Cy Young, Sparky Lyle, in 1977:
http://www.baseball-reference.com/players/l/lylesp01.shtml

Lyle went 13-5 with a 2.17 ERA in 137 innings. Oh, and he had 26 saves, too.

That winter, the Yankees signed free agent Goose Gossage to close, and Lyle went from (in his own words) Cy Young to Sayonara, being traded after the 1978 season to Texas for a group of prospects including Dave Righetti.


#30          (see all posts) 2011/08/20 (Sat) @ 17:25

@Tango23: Yes, but you only gave us ERA and W-L. Well, harder to measure other things without other measures. So if there were an option that was “I need more information,” I would have picked that. I actually had that thought go through my head.
And it doesn’t seem to me that regressing toward the mean really accounts for this problem - “That doesn’t tell us how he did for sure, so let’s assume he did closer to averag.” That doesn’t make sense to me, while it does make sense regressing toward the mean for talent or projections is the way to go.


#31    Tangotiger      (see all posts) 2011/08/20 (Sat) @ 19:47

Again, you have to be careful in interpreting what an ERA is, and how much it really has to do with the pitcher, and how much it has to do with the park, the fielders, or just pure luck.  And how much of that pure luck you want to attribute to the pitcher or fielders or no one at all.


#32          (see all posts) 2011/08/21 (Sun) @ 01:35

Something interesting to note.  The distribution of runs allowed for a pitcher is asymmetric about the mean.  Pitchers are more likely to be a little lucky than a little unlucky. 

(In large samples, they are more likely to be very unlucky than very lucky, however.  In small samples, the “slightly lucky” region goes down to zero runs.)


#33    Tangotiger      (see all posts) 2011/08/21 (Sun) @ 08:27

Runs follows the Tango Distribution.  wOBA however follows a more normal distribution.


#34    rempart      (see all posts) 2011/08/21 (Sun) @ 10:45

"Note that now that you created small populations, the amount of PA you add is GREATER than 300.  Think about that for a second”

You are referring to the average number of BFPs per group created-right?

Thanks for all the feedback!


#35    Tangotiger      (see all posts) 2011/08/21 (Sun) @ 11:58

I mean that the smaller and more specific you make each sub population, then the larger the regression toward the mean component becomes.

Follow it to its logical conclusion, and make the sub population so specific that it only includes that one player; then you regress his performance 100% to the implied true rate of that sub population.  That is, once you know the prior distribution, and once you know that he’s the only one in that population, then your prior is your prior, and it’s irrelevant what he actually does in his samples.


#36    rempart      (see all posts) 2011/08/21 (Sun) @ 15:51

I created a DB that has 1998-2010 pitcher data.

In my top group, the average BFP is 933, min of about 900 BF, and the RA9 is 3.99. I found 57 matched groups of players, who pitched back to back seasons in the top threshold (above 900 BFPs). In year one they averaged 3.72 RA9, followed by 3.82 in year two. Therefore, if they regress back .10 of the .27 distance to the average that implies regression of about 1600 BFPs of avg performance.

Is that right?


#37    Tangotiger      (see all posts) 2011/08/21 (Sun) @ 16:22

I’m not so sure.  First off, we have to accept that a pitcher is getting lots of PA simply because the manager is going to observe him not giving up lots of runs.  So, we have a bias.

I think the best way to figure out the true talent level by PA is to look at the RA9 for the NEXT season.

So, gather up all the pitchers in year T with PA at least 900.  Then, find out the average (or median) RA9 in year T+1.  That I think will give you the population mean of those pitchers.
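
The procedure in #37 can be sketched as below. The data structure and the sample seasons are entirely made up for illustration; only the 900 PA cutoff and the "look at year T+1" idea come from the comment.

```python
from statistics import mean, median


def next_season_ra9(seasons, min_pa=900):
    """seasons maps (pitcher, year) -> (pa, ra9). Returns the year T+1 RA9
    of every pitcher whose year T workload was at least min_pa."""
    following = []
    for (pitcher, year), (pa, _ra9) in seasons.items():
        nxt = seasons.get((pitcher, year + 1))
        if pa >= min_pa and nxt is not None:
            following.append(nxt[1])
    return following


# Toy data, not real pitchers.
seasons = {
    ("A", 2001): (950, 3.50), ("A", 2002): (880, 3.90),
    ("B", 2001): (910, 3.80), ("B", 2002): (700, 4.40),
    ("C", 2001): (600, 5.00), ("C", 2002): (920, 4.10),
    ("C", 2003): (800, 4.60),
}
following = next_season_ra9(seasons)
print(f"mean {mean(following):.2f}, median {median(following):.2f}")
```

Selecting on year T workload but measuring in year T+1 avoids crediting the group with the good luck that helped earn the workload in the first place.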


#38          (see all posts) 2011/08/21 (Sun) @ 19:59

@Tango31
But again, I only care about the results here. So the pitcher gets credit for luck. That seems a little ridiculous at first, because it’s NOT how you want to generally evaluate a player, because normally when you’re evaluating, you’ve got an eye to the future. But here I don’t think that’s what I want to do. I just want to say “Who had the best results this year?” And since luck is part of that, I count luck.
Now of course, ERA, W-L, and IP aren’t going to give you the total picture there. I do need to know something about their defenses, their parks, etc. Basically I want to know how many wins that player added to their team, which mainly comes down to how many runs were scored by the other team because of their faults, or rather how many runs were saved because of their successes. Mostly. There is somewhat of a clutch skill they’d get credit for if we could measure it, but this is not at all stable for short periods and somewhat hard to measure. Of course it is THE positive thing we get out of W-L that’s not in something like ERA (or, obviously more preferably, RA9). So mostly I’d want to look at RA9, their defenses, their opponents, their parks, and their IP, and maybe some little way to measure their clutch.
But you only gave 3 things. So I went off those 3 things. If you’d only given W-L, I’d have just gone based on that. I wouldn’t have felt I had enough information there, nor do I here. But I assume the point of your post was more of an ‘all-else-equal’ kind of thing, because that’s the only way it makes sense to me; it’s like if I told you I’m going to draft a basketball player, and I have to decide between a 6’ 1” guy and a 6’ 6” guy. Well that’s sort of a silly question, because I don’t know anything else about them. Now if I told you their scoring, rebounding, and assist numbers, I’d have a much better idea, but that’s still not nearly enough - I have no information, for instance, about how well they shoot or how they play defense. And no amount of regressing these scoring, rebounding, and assist numbers to any kind of mean is going to tell me anything about their defense (probably not much if anything about their shooting either). But that’s analogous to how I’m feeling with this information, so I’m going with the best I have.


#39    Tangotiger      (see all posts) 2011/08/21 (Sun) @ 20:08

I think we’re going off on a tangent here.

I was only responding to rempart’s method of distilling talent from what is represented in the results.

The point is that the results are not necessarily indicative of the pitcher.

Imagine that I ONLY gave you a pitcher’s W/L record.  Would you then automatically put the 13-4, 153 IP pitcher ahead of the 13-10, 216 IP pitcher?

It’s not clear to me that we necessarily would.

Would we automatically put the 13-4 153 IP behind the 16-8 216 IP pitcher?  It’s not clear to me that we would either.

Why?  Because 13-4 does not REPRESENT the pitcher. 

Now, you can fairly reasonably accept, in the poll, “all other things equal”, and, on that basis, we proceed as we normally would.

But, I’m offering rempart support for his point of view that you would want to regress.  And his perspective is supportable.


#40          (see all posts) 2011/08/21 (Sun) @ 22:03

@Tango/33: If the distribution of runs allowed in a single inning is Tango, then the distribution in a large number of innings is not-quite-Gaussian.  However, the distribution is broader than would be expected, and there is an asymmetry about the mean.  (The median and mode are both smaller than the mean, but approach the mean in the limit of infinite IP.)

I haven’t gotten the bugs out of my writeup yet; but per your suggestion, I will be submitting to Hardball Times.


#41          (see all posts) 2011/08/22 (Mon) @ 09:47

@Tango 39: Sure. Absolutely. Here you’ve given me two stats, W-L and IP, and I’m not 100% sure off the top of my head how to weight them. Neither is a great measure of effectiveness, both have their pluses and minuses, so I’d have to consider it a while. But we always have more than those two things, so it’s sort of a moot point.
And my point on regression is this: I don’t think you regress to find out anything about (past) performance. I think you regress to estimate talent. And since I in particular want to measure performance, I wouldn’t regress. If rempart wants to measure the talent levels the pitchers exhibited in that season, then he’ll regress. It’s not entirely evident which thing we want to measure for CYA, so both stances are reasonable, though I have my opinions and I’m sure other people have other opinions. It’s part of the ambiguity of the award.


#42    Tangotiger      (see all posts) 2011/08/22 (Mon) @ 10:21

The only criterion for the Cy Young is in its name: “Most Outstanding Pitcher”.  It says nothing else about what to consider or not consider.

It’s not the Most Valuable Pitcher, though it used to be called that.

It’s not the Pitcher On The Mound When Great Things Happened.

It’s Most Outstanding Pitcher.

If you have two millionaires, and I asked for Most Outstanding Millionaire, and it was a choice between a lottery winner and the guy who was self-made, who do you vote on?

So, rempart makes a good point that while we can presume that “all other things equal” would apply to all their other supporting stats (his defense, his offense, his bullpen support, his bases empty / men on base split), that still is not enough.

And so, he needs to regress the 153 IP and the 216 IP so that he can figure out the accomplishments that we can link to the pitcher.

It’s perfectly sensible.

On the other hand, you can simply give the award to the lottery winner, and if he’s on the mound when great things happen to occur, then so be it.  That makes him an “Outstanding” pitcher.


#43    Tangotiger      (see all posts) 2011/08/22 (Mon) @ 16:56

After 321 votes on Fangraphs (compared to the 320 here so far), we have these totals, with you guys listed first, and Fangraphs listed 2nd:

A 9 (13)
B 8 (8)
C 21 (19)
D 105 (89)
E 106 (111)
F 47 (53)
G 15 (15)
H 10 (13)

http://www.fangraphs.com/blogs/index.php/poll-where-does-pitcher-x-place-in-cy-young-voting/

It seems to me that we’ve got a pretty strong overlap in terms of “mindset”.


#44    rempart      (see all posts) 2011/08/22 (Mon) @ 18:03

“I think the best way to figure out the true talent level by PA is to look at the RA9 for the NEXT season.” {SEE BELOW}

“So, gather up all the pitchers in year T with PA at least 900.  Then, find out the average (or median) RA9 in year T+1.  That I think will give you the population mean of those pitchers.”
{ALSO DID OTHER BFP GROUPINGS}

1998-2010 SPs (does not include RPs)
USING B-R NUMBERS NEUTRALIZED TO 4.42/RG

               YEAR T                          YEAR T+1

  #   BFP      IP       R   RA9       BFP      IP       R   RA9
 36   984    8674    3577  3.71       855    7508    3320  3.98
299   890   63848   29781  4.20       760   54080   27104  4.51
434   807   83128   42191  4.57       690   70529   37523  4.79
259   705   42850   23138  4.86       615   37375   20485  4.93
182   600   25307   14430  5.13       563   23778   13107  4.96
149   502   17380    9970  5.16       570   19851   10832  4.91
137   399   12668    7356  5.23       488   15626    8643  4.98
134   303    9284    5710  5.54       450   13905    8137  5.27
117   196    5218    3334  5.75       425   11340    6769  5.37
161   100    3663    2347  5.77       358   13280    7867  5.33
 41    33     289     243  7.57       250    2338    1428  5.50

I’m not so sure I wouldn’t combine them into 3 groups (see below) because the cutoffs seem to be there.

           YEAR T                    YEAR T+1

BFP      IP       R   RA9         IP       R   RA9
900   72522   33358  4.14      61588   30424  4.45
665  181333   97085  4.82     167159   90590  4.88
179   18454   11634  5.67      40863   24201  5.33

It is interesting to note that the upper group of starters is at the league avg, the next lower group is +.5rg, and the lowest group is about +1.0rg.

Still analyzing the data. I would like to be able to apply a regression constant and the correct average. Then I want to check the CY Races back to 1998. Any thoughts?


#45    Tangotiger      (see all posts) 2011/08/22 (Mon) @ 18:18

Fantastic stuff!

In the 2nd chart, is the BFP the minimum, or average?


#46    rempart      (see all posts) 2011/08/22 (Mon) @ 18:24

It is the avg. The group cutoffs are as follows:

BFPs
850 and over
350-849
under 350


#47    Tangotiger      (see all posts) 2011/08/22 (Mon) @ 18:53

Fantastic!

Ok, this is how you have to work it, with pitchers with at least 850 PA.

1. Realize that there is a bias.  How much?  0.31 runs.  We can tell because they were observed at 4.14, when they should have been at 4.45.

(I just realized… I should have told you to limit it to pitchers in their 20s.  Otherwise, we have to contend with aging as a bias.)

So, you increase everyone’s ERA in Year T by 7.5%.

(Of course, if you go back to multi-years, and look for pitchers with Year T and Year T-1, and Year T-2 with at least 2500 PA, the bias is going to be reduced drastically.)

2. Regression toward the mean is to 4.45.  How much?  I don’t know, but if you give us a year-to-year correlation, we can figure that out pretty quickly.
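
The 7.5% adjustment in step 1 of #47 follows directly from the two group averages. A minimal sketch, with a hypothetical function name:

```python
def selection_bias_pct(observed_ra9, next_year_ra9):
    """Percent by which year T RA9 understates the group's year T+1 level."""
    return (next_year_ra9 / observed_ra9 - 1) * 100


# Top group: observed 4.14 in year T, 4.45 in year T+1.
bias = selection_bias_pct(4.14, 4.45)
print(f"{bias:.1f}%")
```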


#48    rempart      (see all posts) 2011/08/22 (Mon) @ 19:36

I am going to go back and eliminate the pitchers over 30 and see what happens.

I get a correlation coeff of .33 for runs allowed between year t and t+1 for the top group.

From the old mailbag, .67/.33*900 = 1800. So… 1800 BFP to the 4.45 mean?
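
The rule of thumb rempart applies in #48 turns a year-to-year correlation into an amount of ballast. A sketch of that conversion, assuming the relationship r = PA / (PA + ballast); the function name is made up here.

```python
def regression_ballast(avg_pa, r):
    """PA of population-mean performance to add, given the year-to-year
    correlation r observed at an average workload of avg_pa."""
    return avg_pa * (1 - r) / r


ballast = regression_ballast(900, 0.33)
print(round(ballast))
```

The exact result is a little above the rounded 1800 quoted in the comment. Lower correlations give much larger ballast, which is why small-sample groups get regressed almost all the way to their mean.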


#49    rempart      (see all posts) 2011/08/22 (Mon) @ 19:59

Results without the old guys. Not much change.

        YEAR T                           YEAR T+1

BFP      IP       R   RA9      BFP      IP       R   RA9      r
895   39048   18117  4.18      789   34201   16922  4.45   0.46
646  106199   56918  4.82      615  101034   54313  4.84   0.27
175   14681    9264  5.68      382   32547   19339  5.35   0.03

Is that right?


#50    Tangotiger      (see all posts) 2011/08/22 (Mon) @ 21:07

Great job!

Ok, so for the stud pitchers, add 6% to their RA9, then add 1000 PA of 4.45 RA9 performance.

For the middle group pitchers, you add 1700 PA of 4.84 RA9 performance.

For the bottom group, you reduce their RA9 by 6% and then add 5000 PA of 5.35 RA9 performance.

Now, I know you did RA9, but you really want to do BaseRuns (i.e., component runs), or FIP, or even just (K-BB)/PA.

Again, really good stuff.  Keep it up!
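
Tango's three-group recipe in #50 can be sketched as follows. The adjustment factors, ballast sizes and group means come from that comment, and the 850 and 350 cutoffs from #46. The names, the dictionary layout and the two sample stat lines are assumptions made for this illustration.

```python
GROUPS = {
    # group: bias adjustment, PA of ballast, RA9 of the ballast
    "top":    {"adjust": 1.06, "ballast_pa": 1000, "mean_ra9": 4.45},
    "middle": {"adjust": 1.00, "ballast_pa": 1700, "mean_ra9": 4.84},
    "bottom": {"adjust": 0.94, "ballast_pa": 5000, "mean_ra9": 5.35},
}


def group_for(pa):
    """Workload group using the 850 and 350 PA cutoffs."""
    if pa >= 850:
        return "top"
    if pa >= 350:
        return "middle"
    return "bottom"


def regressed_ra9(pa, ra9):
    """Apply the group's bias adjustment, then add its ballast."""
    g = GROUPS[group_for(pa)]
    adjusted = ra9 * g["adjust"]
    total = pa * adjusted + g["ballast_pa"] * g["mean_ra9"]
    return total / (pa + g["ballast_pa"])


heavy = regressed_ra9(885, 2.47)    # hypothetical top-group season
lighter = regressed_ra9(783, 1.65)  # hypothetical middle-group season
print(f"{heavy:.2f} {lighter:.2f}")
```

Note how the 1.65 season ends up with a worse regressed figure than the 2.47 one: falling below the 850 PA cutoff means a larger ballast and a higher group mean.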


#51    rempart      (see all posts) 2011/08/23 (Tue) @ 09:30

Tango, I appreciate the encouragement and support. You are without a doubt in my mind the tops in the field of sabermetrics and baseball stats analysis. Bill James has long been passed.

I’m playing catchup a bit after being away for a few years. Happy to help out in any small way.

Just to put a wrap on this. Below are some results. I plan to go back and take a look at Base Runs or FIP.

TOP YEARS Year Age BFP RA9 regRA9
Randy Johnson* 2002 38 998 2.36 3.47
Randy Johnson* 1999 35 1042 2.41 3.48
Randy Johnson* 2001 37 954 2.49 3.57
Greg Maddux 1998 32 948 2.50 3.57
Zack Greinke 2009 25 885 2.47 3.59
Randy Johnson* 2000 36 970 2.57 3.60
Randy Johnson* 2004 40 928 2.54 3.60
Kevin Millwood 1999 24 873 2.68 3.70
Kevin Brown 2000 35 883 2.71 3.71
Roger Clemens 1998 35 928 2.77 3.72
Tim Lincecum 2009 25 874 2.77 3.74
Johan Santana* 2008 29 931 2.82 3.74
Kevin Brown 1998 33 993 2.87 3.75
Tim Lincecum 2008 24 899 2.80 3.75
Tom Glavine* 1998 32 901 2.82 3.76
Felix Hernandez 2009 23 941 2.87 3.77
Ben Sheets 2004 25 906 2.85 3.77
Curt Schilling 2001 34 992 2.92 3.77
Felix Hernandez 2010 24 963 2.91 3.78
Roy Halladay 2010 33 955 2.93 3.79
Ubaldo Jimenez 2010 26 859 2.88 3.80
Greg Maddux 2000 34 968 2.96 3.81
Brandon Webb 2007 28 938 2.99 3.83
Pedro Martinez 2000 28 783 1.65 3.83
Pedro Martinez 1998 26 917 2.99 3.84
Cliff Lee* 2008 29 856 2.96 3.84


BOTTOM YEARS Year Age BFP RA9 regRA9
Charlie Morton 2010 26 365 8.88 5.56
Kyle Davies 2006 22 300 9.00 5.56
Scott Elarton 2007 31 176 11.50 5.56
Hayden Penn 2006 21 108 15.30 5.56
Dewon Brazelton 2005 25 341 8.74 5.57
Al Leiter* 2005 39 640 7.52 5.57
Aaron Myette 2002 24 237 10.34 5.58
Horacio Ramirez* 2007 27 442 8.41 5.58
Eric Milton* 2005 29 822 7.10 5.58
Ryan Drese 2002 26 612 7.63 5.58
Sean Bergman 2000 30 326 9.13 5.58
Jose Lima 2002 29 291 9.56 5.58
Hideo Nomo 2005 36 456 8.35 5.58
Micah Bowie* 1999 24 255 10.24 5.59
Tim Redding 2005 27 148 13.66 5.59
Paul Abbott 2002 34 133 14.76 5.59
Brian Bannister 2010 29 562 7.88 5.60
Dennis Tankersley 2002 23 234 10.96 5.60
Jaret Wright 2002 26 110 18.00 5.62
Carlos Silva 2008 29 663 7.64 5.63
Roy Halladay 2000 23 338 10.04 5.65
Scott Kazmir* 2010 26 649 7.90 5.69
Hideo Nomo 2004 35 379 9.79 5.74
Jose Lima 2005 32 747 7.79 5.74
Ryan Rowland-Smith* 2010 27 488 8.91 5.75
James Shields 2010 28 869 6.93 5.79
Andy Larkin 1998 24 358 11.57 6.01


#52          (see all posts) 2011/08/23 (Tue) @ 09:54

I admit, I’m surprised how little of a penalty there is from TBB in regards to pitching 63 LESS innings. That’s 9 or 10 (likely) starts.

So, [1] either this pitcher missed a lot of starts (1/3 of the season), or [2] He doesn’t pitch deep into games. (Or switched leagues during the season).

The latter throws a wrench in the machine.

It’s the kind of crap that leads to Rick Sutcliffe (16-1 2.69 ERA 150 IP) winning the CYA over Dwight Gooden (17-9 2.60 ERA 218 IP, 276 K, 11.4 K/9, league leader in WHIP, H/9). Well that and they could have just given the award to Sutcliffe b/c the Cubs won the East.

I don’t understand why a player’s statistics don’t travel with them when they switch leagues during the season. It doesn’t make the award situations any clearer.


#53    Tangotiger      (see all posts) 2011/08/23 (Tue) @ 10:07

MLB is made up of two conferences that we pretend is two leagues.  (Apparently, 10% interconference games is enough to count the two conferences as leagues, but 20% is enough to count them as one league.)

The writers give out two MVP awards, two Cy awards for MLB, etc, but give out one in the other sports.

It’s inertia, plain and simple.  If George Orwell had invented a sport for his book, it would have been MLB.


#54    rempart      (see all posts) 2011/08/24 (Wed) @ 11:36

Not to belabor this. But, FIP RA came out better. I like the results much better.

Top Group 845PA FIP=4.49 r=.51 constant=800 +5%
Middle 563PA FIP=4.73 r=.30 constant=1300
Low 174 PA FIP=5.12 r=.13 constant=1000 -8%

Results below for 1998-2010:

I wonder if some combination of this # and the RA9 might be a good predictor of who the CY winner is in a given year. Or at very least gets the guy we think should have won it.

BEST25
PITCHER Year Age Rfip
Pedro Martinez 1999 27 3.03
Randy Johnson* 2001 37 3.34
Pedro Martinez 2000 28 3.42
Kevin Brown 1998 33 3.45
Randy Johnson* 2004 40 3.49
Randy Johnson* 2000 36 3.56
Tim Lincecum 2009 25 3.61
Pedro Martinez 2002 30 3.61
Zack Greinke 2009 25 3.62
Curt Schilling 2002 35 3.64
Randy Johnson* 1999 35 3.66
Mark Prior 2003 22 3.69
Roger Clemens 1998 35 3.70
Tim Lincecum 2008 24 3.71
Curt Schilling 1998 31 3.72
Randy Johnson* 2002 38 3.73
Ben Sheets 2004 25 3.80
Randy Johnson* 1998 34 3.81
Greg Maddux 1998 32 3.81
Cliff Lee* 2010 31 3.82
Francisco Liriano* 2010 26 3.83
Jake Peavy 2007 26 3.83
Jason Schmidt 2003 30 3.84
Justin Verlander 2009 26 3.84
CC Sabathia* 2008 27 3.88

Worst 25
Wade LeBlanc* 2008 23 5.43
Nate Cornejo 2001 21 5.43
Chris Fussell 2000 24 5.44
Jose Lima 2000 27 5.44
Cesar Carrillo 2009 25 5.45
Joel Bennett 1999 29 5.45
Jamie Moyer* 2004 41 5.45
Ryan Rowland-Smith* 2010 27 5.46
Chuck James* 2008 26 5.46
Steve Trachsel 2008 37 5.46
R.A. Dickey 2006 31 5.47
Hayden Penn 2006 21 5.47
Brett Jodie 2001 24 5.47
Andy Benes 2001 33 5.47
Brett Hinchliffe 1999 24 5.47
Ruben Quevedo 2003 24 5.47
Dewon Brazelton 2005 25 5.48
Travis Blackley* 2004 21 5.51
Miguel Batista 2000 29 5.53
Jim Parque* 2002 27 5.53
Scott Elarton 2007 31 5.54
Doug Waechter 2004 23 5.54
Wayne Franklin* 2003 29 5.54
Lance Cormier 2007 26 5.54
Braden Looper 2009 34 5.55
Jaime Navarro 1998 31 5.57


#55    Tangotiger      (see all posts) 2011/08/24 (Wed) @ 11:51

Well, if you are looking to improve, then I’d go beyond the 1yr of data, and look at 2yr and 3yr.

