THE BOOK--Playing The Percentages In Baseball


Wednesday, April 02, 2008

Community Forecast, 2007 - Pitcher Results

By Tangotiger, 02:56 PM

The previous thread on this topic focused mostly on forecast results of hitters.

I’m starting this thread to deal exclusively with pitchers.  I haven’t done anything, so I’m looking forward to seeing the results as much as you are.


#1    Tangotiger      (see all posts) 2008/04/02 (Wed) @ 15:27

MARCEL:

There were 659 pitchers who pitched in 2007.  Marcel has a forecast for 531 of them.  Those forecasted comprised 91% of the innings pitched.

Of those 531 forecasted, the average ERA difference between actual and forecasted was 0.92 runs. 

The 128 pitchers not forecasted had an ERA of 5.00.  If I give each of them a presumed forecast of 5.50, then the total error for all 659 pitchers was 0.99 ERA.

Marcel: 0.99 ERA, n=659
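
As a rough illustration of the calculation above, here is a minimal sketch of an average ERA error with a flat 5.50 fallback for unforecasted pitchers. It assumes the error is an IP-weighted mean absolute difference (a later post in this thread says everything is weighted by IP). The function name, the tuple layout, and the sample numbers are hypothetical and are not the 2007 data.

```python
def mean_abs_era_error(pitchers, fallback_era=5.50):
    """IP-weighted mean absolute difference between forecast and actual ERA.

    Each pitcher is a tuple (innings_pitched, actual_era, forecast_era).
    A forecast of None means "not forecasted"; it is replaced by fallback_era.
    """
    total_ip = 0.0
    total_error = 0.0
    for ip, actual, forecast in pitchers:
        if forecast is None:
            forecast = fallback_era  # presumed forecast for missing pitchers
        total_ip += ip
        total_error += ip * abs(actual - forecast)
    return total_error / total_ip


# Made-up sample, for illustration only.
sample = [
    (200.0, 3.50, 4.00),  # forecasted starter, off by 0.50
    (60.0, 4.80, 4.20),   # forecasted reliever, off by 0.60
    (40.0, 5.00, None),   # no forecast: scored against the flat 5.50
]
print(round(mean_abs_era_error(sample), 2))
```

Scoring the missing pitchers against a constant keeps n the same for every forecaster, so no one gains by skipping the hard cases.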

***

I broke the pitchers by “reliability”, with classes of greater than 0.76, less than 0.54 and in-between.  Each reliability class had a total of around 13,000 IP.

So, about 30% of the innings came from pitchers with a high-reliability forecast, 30% from medium reliability, 30% from low reliability, and 10% from pitchers with no forecast provided.

Of the 85 pitchers in the high reliability (basically veteran starters), the average difference in ERA between forecast and actual was 0.61.  If I had assumed a league average ERA for every single one of these pitchers, the average difference would have been 0.75.

As you can see, not much gain.

For the guys with medium reliability, the average difference between forecast and actual was 1.01 ERA.  If we had forecasted league average, it would have been 1.15.

And for the low reliability pitchers, the average difference was 1.15.  And an all-average forecast would have been 1.14.  In essence, the forecast for these pitchers was basically useless.
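
The comparison in these paragraphs can be sketched as follows: bucket each pitcher by reliability, then compute the forecast error next to the error of an all-league-average forecast. This is a minimal sketch, assuming IP-weighted absolute errors and the 0.54 / 0.76 cutoffs quoted above; the function names and the sample data are hypothetical.

```python
from collections import defaultdict


def reliability_class(reliability):
    """Bucket a forecast reliability using the cutoffs quoted in the post."""
    if reliability > 0.76:
        return "high"
    if reliability < 0.54:
        return "low"
    return "medium"


def errors_by_class(pitchers, league_era):
    """Return {class: (forecast_error, league_average_error)}, IP-weighted.

    Each pitcher is (innings_pitched, actual_era, forecast_era, reliability).
    """
    # Per class: [total IP, weighted forecast error, weighted baseline error]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for ip, actual, forecast, reliability in pitchers:
        bucket = sums[reliability_class(reliability)]
        bucket[0] += ip
        bucket[1] += ip * abs(actual - forecast)
        bucket[2] += ip * abs(actual - league_era)
    return {name: (f / ip, b / ip) for name, (ip, f, b) in sums.items()}


# Made-up sample, for illustration only.
sample = [
    (200.0, 3.50, 3.75, 0.85),
    (100.0, 5.00, 4.50, 0.85),
    (50.0, 6.00, 4.50, 0.30),
    (50.0, 3.00, 4.50, 0.30),
]
result = errors_by_class(sample, league_era=4.50)
for name, (forecast_err, baseline_err) in sorted(result.items()):
    print(name, round(forecast_err, 2), round(baseline_err, 2))
```

In the made-up sample the high-reliability forecasts beat the league-average baseline, while the low-reliability forecasts tie it, which is the pattern described above.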

***

Alright, so that’s the landscape, the baseline.  Let’s see how everyone does.


#2    Tangotiger      (see all posts) 2008/04/02 (Wed) @ 15:44

COMMUNITY:

They forecasted 494 of the pitchers, comprising 87% of the innings.  Of those forecasted, the ERA difference was 0.90 runs.

Putting in a flat 5.50 ERA for the unforecasted pitchers, we get an overall error of 0.99 runs.  Identical to Marcel.

Community: 0.99 ERA n=659

***

By reliability:

High group: 0.62 difference, compared to 0.74 if random (that is, forecasting league average for everyone).  Close to Marcel.

Medium group: 0.94 difference, compared to 1.13 random.  Better than Marcel.

Low group: 1.10 difference, compared to 1.09 random.  Same as Marcel.

Among the debut2007 pitchers, the Fans provided forecasts for 46 of the 128 debut pitchers.  They were off by 1.66 ERA, compared to a random of 1.51.  Basically, they would have been better off guessing league average.

***

The overall league average forecasted was 4.13 compared to an actual of 4.40 for those pitchers.  Clearly, fans are very optimistic with their pitchers.

***

Overall, we see that the Community knows generally what they are doing, about as much as Marcel.  I’ll look into the “depth chart” next.


#3    Tangotiger      (see all posts) 2008/04/02 (Wed) @ 15:55

If I take the 30 aces that the Community expected, I get a total of 729 actual saves in 2007.

If I take the 30 aces that Marcel expected, I get a total of 730 actual saves in 2007.

That’s a big surprise.  I mean, the Community should know who is going to get the saves, right?  Certainly more than Marcel should.


#4    MGL      (see all posts) 2008/04/02 (Wed) @ 19:15

I really don’t like the idea of just looking at ‘average error’.

For example, with low reliability pitchers, you conclude that both Marcel and the community would be just as well or better off assigning some constant ERA to each pitcher.  I don’t buy that AND if it is true, I would want to see more proof than just comparing average error.

For example, with “unknown” entities, we want to see if the forecaster can distinguish between good and bad pitchers.  That is the most important thing a forecaster can do with unknown entities.  Obviously for reliable pitchers, that is easy.

So, I would want to see what the actual ERA’s are of the worst projected half and the best projected half, or of all intervals, from worst to best.

For example, if the forecaster’s average ERA forecast for their worst half was 5.90 and those pitchers’ actual ERA was 5.73 and for their best half, the forecast was 4.82 and the actual was 5.02, then I would know that the forecasts are doing something (important), regardless of what the average error is.  In fact, for these low reliability pitchers, where Marcel and the fans (and probably everyone else) supposedly add nothing, I would want to see how they did for their best quartile (or whatever interval), the next best, etc.  If it looks something like this:

expected  actual

3.84      4.1
4.22      4.4
4.93      4.6
5.86      4.8

Then they are doing a great job, being able to distinguish a poor from an average from a good from a very good pitcher, again, regardless of what the average error is.

I suspect that the fans and everyone else will be able to do that (distinguish good from bad pitchers, on the average) even though they are not beating a random or constant projection in average error, for low reliability pitchers that is.
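
The check described above could be sketched like this: sort pitchers by forecast ERA, split them into equal-sized groups, and compare each group's mean forecast with its actual ERA. This is only a sketch under assumptions: both means are IP-weighted, the groups are equal in pitcher count rather than innings, and the function name and sample data are hypothetical.

```python
def calibration_by_bin(pitchers, n_bins=4):
    """Compare forecast and actual ERA within forecast-ranked groups.

    Each pitcher is (innings_pitched, actual_era, forecast_era).
    Returns a list of (mean_forecast, actual_era) tuples, best forecast
    group first. Both values are IP-weighted within the group.
    """
    ranked = sorted(pitchers, key=lambda p: p[2])  # lowest forecast ERA first
    rows = []
    for i in range(n_bins):
        start = i * len(ranked) // n_bins
        end = (i + 1) * len(ranked) // n_bins
        group = ranked[start:end]
        if not group:
            continue  # fewer pitchers than bins
        ip = sum(p[0] for p in group)
        forecast = sum(p[0] * p[2] for p in group) / ip
        actual = sum(p[0] * p[1] for p in group) / ip
        rows.append((forecast, actual))
    return rows


# Made-up sample, for illustration only.
sample = [
    (100.0, 4.0, 3.8),
    (100.0, 4.4, 4.2),
    (100.0, 4.6, 5.0),
    (100.0, 5.0, 5.8),
]
for forecast, actual in calibration_by_bin(sample, n_bins=2):
    print(round(forecast, 2), round(actual, 2))
```

If the actual ERA rises from the best group to the worst, the forecasts are separating good pitchers from bad ones, whatever the average error says.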


#5    tangotiger      (see all posts) 2008/04/02 (Wed) @ 19:41

What you said is exactly what I did with the hitters at the end of that thread (see link at the top of this thread).  I split up the players based on the forecasted OPS (great, good, average, fair, poor), and see how each forecaster did. In the hitter’s case, Chone did the best, just ahead of MGL.  I’m going to do the same for pitchers as well. 

Basically, I’m doing it every which way someone would want, to satisfy each reader.


#6    Tom Meagher      (see all posts) 2008/04/02 (Wed) @ 22:16

Tango,

How do you feel about comparing projected ERA to a component ERA for the actual 2007 performance? I couldn’t say if it’s worth the effort, but it’s a tough impulse to deny. Obviously this would supplement rather than replace the straight ERA comparison.


#7    MGL      (see all posts) 2008/04/02 (Wed) @ 22:42

I would think that using component ERA would take away some of the noise associated with regular ERA.  I can’t think of any reason why it wouldn’t be better, unless someone wants to argue that pitchers have significant control over their ERA over and above their component ERA.

It is curious that we use components for hitters but not for pitchers. Why is that?  If we are going to use ERA for pitchers, why not use something similar for hitters, which might include runs and RBI.  Can’t we say the same thing about hitters that people would say about pitchers - that they have control over how many runs they produce over and above their components?  Whenever I actually think about it, I realize that using ERA for pitchers for ANYTHING (in terms of evaluating them) is as silly (or not silly) as using runs, RBI, and clutch performance, to evaluate hitters.  What if the ERA stat were never invented and all we had to go by was what a pitcher allowed?  Would anyone even think of saying, “Wait, I have a better idea.  Let’s only take the runs that score against a pitcher, and only those that would have occurred had no errors been made, and we’ll use that to evaluate pitchers.  Yeah, let’s do that rather than what a pitcher actually allows.” As sabermetricians, we would think that was ridiculous and a giant step backward.


#8    tangotiger      (see all posts) 2008/04/02 (Wed) @ 23:07

MGL, you missed a great post by Joe Posnanski a few weeks ago along those lines.  Lemme see if I can find it…

http://joeposnanski.com/JoeBlog/2008/03/09/statheads-and-true-wins/

Blogger: I have come up with a new statistic. It involves balls put in play. I call it batting average.
Establishment: Great! How’s it work?
B: See, what we’ll do is, we’ll take the number of hits that the batter has and divide it by the number of at-bats that he has in order to determine how often he gets a hit.
E: That sounds like on-base percentage. What’s the difference?
B: Well, it’s all in what you call “at-bats.” For one thing, we don’t count walks.
E: What do you mean you don’t count walks?
B: They don’t count. We take plate appearances and subtract walks. They never happened.
E: How can a walk never happen?
B: It just doesn’t.
E: Aren’t walks good things? Like in Little League, we always say “Walk’s as good as a hit.”
B: I hate walks. They’re gone. So let’s say a guy comes to the plate 12 times, and he gets four hits and walks twice …
E: Right … that’s a .500 on-base percentage.
B: Exactly, but if you just subtract the walks, you will see that he has a .400 batting average.
E: Um, OK.
B: But there are other things. If you hit a fly ball, and someone tags up and scores a run, that does not count as an at-bat.
E: Why not?
B: Because you are sacrificing yourself for the betterment of the team? I call it a sacrifice fly. Get it?
E: Well, what are you sacrificing if it doesn’t even count against your stats?
B: You just are, OK?
E: What if you hit a ground ball and the runner scores.
B: How’s that?
E: Let’s say the infield’s back and a guy hits a ground ball to get the run in. How do you score that?
B: No, that’s not a sacrifice fly.
E: Why not? Doesn’t that accomplish the same thing?
B: It just isn’t. Come on, pay attention. What’s it called. Sacrifice FLY? Hello! He didn’t hit a fly ball.
E: It just seems to me …
B: Sacrifice bunts also do not count as at-bats. And when you get hit by a pitch … doesn’t count.
E: You don’t get any statistical notice for getting hit by a pitch?
B: Like it never happened.
E: I’m afraid to ask this: What happens if you reach on an error.
B: That’s the beauty of this system. According to my new batting average, you’re out.
E: But you’re not really out.
B: I know. Isn’t it great?
E: Why does this have to be so complicated?
B: It’s batting average! It will take over the world!

You can do this with pretty much every core baseball stat. ERA? Have you ever considered how convoluted and absurd ERA really is? First of all, there’s the whole inane concept of what constitutes an “earned run” vs. an “unearned run,” which I cannot go into now or this projected 3,500 word blog might be closer to 40,000. Let’s just say this: The unearned run? A ridiculous part of baseball. Maybe it had a purpose at one time. Not now. I mean, what, you’re not going to count a run against a pitcher because someone (maybe even the pitcher himself) made an error. What? I mean, there are so many things wrong with this … well, just take one second and look at the OTHER side of the unearned run scenario: Nobody keeps track of “saved runs” when a centerfielder makes a ridiculous diving catch or a third baseman dives, takes away a down-the-line double and throws out a runner from foul ground. That play might save the pitcher three runs. Maybe we should start charging the pitcher those three runs on his “IRA” (Imagined Run Average).


#9    MGL      (see all posts) 2008/04/03 (Thu) @ 06:35

I had not been following the hitter forecasting thread.  In it, I wrote:

BTW, this is the type of analysis that I think is much better than just giving us the average forecasting error per player (weighted by the number of PA I think).  These results also are an example of how misleading and/or deceptive the “average error” can be, especially when you can forecast just about anything for a player and do pretty well with “average error.”

I would like to see the above breakdowns for low, medium and high reliability players, and debut players especially.  Before, based on average error, there was the inference that randomly forecasting debut players, or just giving them league average or rookie average forecasts, would have done just as well or better than most if not all of the forecasters.  I don’t buy that and I doubt that that is the case.

Also, I don’t think Tango, you should use Marcel to determine the classes.  That makes little sense as Marcel could be awful at projections within classes.  You should use classes based on each projection system or based on actual performance.

For example, you want to look at all of Chone’s projections between .650 and .700 and see how they did.  Then all of his .700 to .750 and see how they did.  etc.

Or, equally good is to use all actual .650 to .700 players, and see how Chone did for that group.  There is absolutely no reason to use Marcel to determine the classes.  Of course, it won’t make much difference what you use to determine the classes, but my way I think makes more sense and is fairer to all forecasters.

#8, I really have no problem with unearned versus earned runs.  None at all. Although I think that ERA is an arbitrary and terrible stat (as opposed to simply what a pitcher gives up - again, unless there is a strong skill component in preventing runs over and above the components, which I don’t think there is), if I were going to use ERA, I would want to try and factor out defense.  Certainly one way to at least start doing that is to assume that all balls that should have been caught are caught.  That is the essence of unearned runs.  In fact, that is exactly how you determine what an unearned run is.  You backtrack and see what “would have happened (yes there is sometimes no clear answer)” had no error occurred.

Clearly that is better than doing no adjusting at all.  You don’t want to penalize a pitcher if in the short run his team happened to make a lot of errors behind him or not.  Clearly using earned run average more reflects pitching skill than all runs allowed.  Clearly.  That was/is the point and it works.  Sure, you can try and adjust for defense even more than that, by, for example, assuming that all great plays were NOT made (a ‘reverse’ error). But that would be TOO difficult and subjective (yes, I know that errors are subjective too). You don’t throw something GOOD out because something else that would be good is not included.

And the idea that a pitcher’s error should be counted against him as an earned run is silly also, unless you want to call a pitcher’s ERA a reflection of his pitching AND his fielding.  If you do that, you might as well incorporate a batter’s baserunning and fielding in his batting average.  ERA is, at least as we know it, supposed to be a reflection of a pitcher’s pitching skill. His defense can be looked at separately (and his batting), if you want to know his whole package.

So yes, while ERA is not that great in the short run, it is good in the long run.  And in the short run, ERA is better than just RA (or even in the long run), is it not?  So why criticize the use of unearned runs?  It was actually a good idea and still is, regardless of whether you like ERA or RA or not, since it makes a little adjustment for defense and does it properly.  In fact, that is one of the ways that I normalize my pitching stats (for pitcher projections).  I don’t use UZR for errors.  I first assume that all errors are outs and then I add in league average errors (based on ground balls and fly balls allowed), and then I adjust for error-less UZR for all the fielders behind a pitcher.


#10    Colin Wyers      (see all posts) 2008/04/03 (Thu) @ 15:37

There are better ways to simply measure pitcher skill. I won’t presume what people are trying to measure with ERA versus RA because I suspect it varies.

But you’re presuming that pitchers have no control over the distribution of errors behind them, which simply isn’t true. Ground ball pitchers have more errors behind them than fly ball pitchers. If you have two pitchers of equal value otherwise, one that gets mostly ground balls and one that gets mostly fly balls, ERA will overrate the ground ball pitcher and underrate the fly ball pitcher.

ERA doesn’t go far enough toward removing defense from the equation to justify its existence. If I want a fully defense-neutral pitching stat, there are better ones. If I want to see a pitcher’s runs allowed according to his environment (which includes his park, league and defense), then RA is better.


#11    Tangotiger      (see all posts) 2008/04/03 (Thu) @ 15:59

Right, I was going to say about GB and FB pitchers.

Of the 149 pitchers with at least 1000 IP since 1994, it’s no coincidence that Brandon Webb (86%) and Derek Lowe (87%) have the lowest % of earned runs per total runs allowed.  Curt Schilling is the leader at 96%.

So, if Webb has a 3.00 ERA, that means he allows 3.5 runs per game.

And if Schilling has an ERA of 3.36, he also allows 3.5 runs per game.

This is a huge difference.  I definitely am not going to assume that Webb has received exceptionally poor fielding and Schilling exceptionally good fielding.

A 0.36 run difference is fairly substantial here.  Of course, this only affects a few players, but, as is typical, they are the few players that we are really interested in.
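
The arithmetic behind the Webb and Schilling comparison is just ERA divided by the share of runs that are earned. A minimal sketch, with a hypothetical function name:

```python
def runs_allowed_per_nine(era, earned_share):
    """Total runs allowed per nine innings, given ERA and the fraction
    of all runs allowed that were scored as earned."""
    return era / earned_share


# Figures quoted in the post: ERA and earned-run share of total runs.
webb = runs_allowed_per_nine(3.00, 0.86)
schilling = runs_allowed_per_nine(3.36, 0.96)
print(round(webb, 2), round(schilling, 2))
```

Both come out at roughly 3.5 runs per nine innings, so the 0.36 gap in ERA disappears once unearned runs are counted.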


#12    Tangotiger      (see all posts) 2008/04/03 (Thu) @ 16:03

Or, equally good is to use all actual .650 to .700 players, and see how Chone did for that group.  There is absolutely no reason to use Marcel to determine the classes.  Of course, it won’t make much difference what you use to determine the classes, but my way I think makes more sense and is fairer to all forecasters.

Actually, it was more work, which is why I went the Marcel route.  And, as you said, it won’t make much difference anyway, seeing how well Marcel correlates to everyone else.

However, I *cannot* use the actual performance to determine the classes.  All forecasters will undershoot, since, by definition, the actual performance of the high-production hitters will have good luck linked to it.  All forecasters would lose there.

In any case, the question really is: “I really think these hitters are good… does Chone agree?” Rather than using Chone itself to determine if those hitters are good, I use Marcel.  And I use Marcel for all forecasters.

I think it works out fair.  Certainly, unbiased, for all the other forecasters.

Regardless, all data for hitters was published, so anyone can do anything they want with it.


#13    MGL      (see all posts) 2008/04/03 (Thu) @ 18:52

However, I *cannot* use the actual performance to determine the classes.  All forecasters will undershoot, since, by definition, the actual performance of the high-production hitters will have good luck linked to it.  All forecasters would lose there.

True, I screwed that up.  I still think it makes more sense, and is a lot simpler and easier to understand, if you use the forecasts themselves to determine the classes.  Using Marcel introduces something which has no business being there and just muddies up the waters.

It should be:

Here are the players that Chone predicted to be poor.  Let’s see how they did.  Here are the players that Chone predicted to be around average.  Here’s how they did. Etc. Period.

It should not be, “Here are the players that Marcel predicted to be poor.  Here is what Chone said they would do and here is how they actually did.” If I saw that, I would say/think, “Who the hell cares what Marcel thought they would do?  Why should I care about Marcel?  Why introduce an extra step/variable? Just tell me what Chone thought and what they actually did, in whatever classes or intervals you want!”


#14    Tom Meagher      (see all posts) 2008/04/04 (Fri) @ 00:09

If it’s being broken down by reliability, then MGL’s points are I think of more theoretical than practical concern. But the group that Marcel has near league average will probably have a large chunk that’s got a very different forecast from the projections using minor league data. So practically, if you are using Marcels, I think you need to break the groups down by both Marcel projection and Marcel reliability simultaneously instead of showing those two divisions independently.


#15    Tangotiger      (see all posts) 2008/04/04 (Fri) @ 10:44

Tom: ok, I can do the double-breakdown.

***

Marcel, by QUALITY:

Anyone forecasted with an ERA under 4 is in the “good” group, over 5 in the “bad” group, and then the rest.

82 good pitchers forecasted.  Marcel forecasted an ERA of 0.73 runs below league average.  Actually performed at 0.81 runs below average.  Verdict: a bit too much regression by Marcel.

68 bad pitchers forecasted.  Forecast of 0.80 runs worse than average.  Actually, 0.49 runs worse than average.  Verdict: not enough regression by Marcel.

This may be a case of selective sampling, and we’ll see if all the forecasters are in this boat.

381 average pitchers forecasted.  Forecast and actual both at +.07 worse than league average.

***

Doing both, by reliability and quality:

forecast  actual  qual   relClass     n1
-0.65     -0.61   3      1_Low        26
-0.77     -0.80   3      2_Medium     36
-0.73     -0.91   3      3_High       20

0.09      0.14    4      1_Low        203
0.07      0.09    4      2_Medium     121
0.04      -0.01   4      3_High       57

0.79      0.06    5      1_Low        36
0.82      1.05    5      2_Medium     24
0.79      0.46    5      3_High       8

The first group is all the good pitchers (qual of 3 means ERA forecasted as 3.00 to 3.99).

Pretty much spot on, except for the bad pitchers (ERA in the 5s), with low reliability.  Marcel forecasted an ERA that was +0.79 runs worse than league average, and in fact, those 36 pitchers were right around league average.

For the bad pitchers with high reliability (only 8 of them), and the bad pitchers with medium reliability, if we were to combine them, we’d be ok.

Basically, from this look, it seems that anyone with a bad forecast and low reliability, should have a league average forecast (likely for selective sampling reasons).

I’m to the point where I think we should have a minimum quality standard of players to allow in the forecast, since bad players will simply not be allowed to get the playing time, unless they “show” good performance to begin with.

***

Chone was nice enough to give me his mappings, so I will do his stuff this afternoon.


#16    Tangotiger      (see all posts) 2008/04/04 (Fri) @ 12:12

The Community, by Quality:

Fantastic job!  For the 79 pitchers that Marcel deemed good and that the Fans forecasted, the Fans gave a mean forecast of 0.80 runs better than league, and those pitchers were in fact 0.81 runs better than league.  Just great stuff from the fans right there.

Of the 46 bad pitchers they forecasted, the mean forecast was 0.72 runs worse than average, and they were actually 0.47 runs worse than average.  (This is the selective sampling issue we are postulating.  In any case, the results are similar, if not a bit better, than Marcel.)

The average pitchers were forecast at 0.10 runs worse than average, and were in fact 0.04 runs worse.

And, BEST OF ALL, for the 46 pitchers who made their debut in 2007 that the fans forecasted, the Fans forecasted an ERA of 0.40 runs worse than average, and those pitchers were in fact 0.39 runs worse than average.

Great job to the fans on the pitchers.


#17    Tangotiger      (see all posts) 2008/04/04 (Fri) @ 14:50

CHONE.

Part 1, Overall.

531 of the 659 pitchers who pitched in 2007 were forecasted (7% of innings not forecasted).  Those pitchers were off by 0.88 runs.  Including the missing pitchers the usual way (giving them each a flat 5.50 ERA forecast), Chone’s overall error was 0.95 runs.  Out of the 3 forecasting systems, Chone takes the lead.

Chone: 0.95 ERA diff, n=659

***

Part 2, Reliability Classes.

Chone forecasted all 85 pitchers that Marcel deemed “easy to forecast”.  The average error was 0.63 runs, compared to an error of 0.75 runs if Chone had forecasted league average for each pitcher.  In this respect he is the same as Marcel and the Community.

Chone forecasted 180 of the 181 pitchers that Marcel deemed “a bit hard to forecast”.  The average error was 0.94 runs, compared to 1.15 if random.  This is a match to the Community and better than Marcel.

Chone forecasted 225 of the 265 pitchers that Marcel deemed “pretty darn hard to forecast”.  The average error was 1.07, compared to the random of 1.09.  This is a slight improvement to the Community and Marcel.

Chone forecasted 41 of the 128 pitchers to make their 2007 Debut.  The average error was 1.09 compared to the random of 1.04.  That’s a bit better than the Community.  Marcel didn’t forecast any of these pitchers.

We can see here that Chone does a decent job, compared to the Community.

***

Part 3, Quality Classes.

Chone forecasted 80 of the 82 pitchers Marcel thought were “good pitchers”.  Chone thought that these pitchers would post a mean ERA of 0.67 runs better than average.  They in fact posted an ERA of 0.82 better than average.  Chone was behind Marcel, and significantly behind the Community.  Way too much regression.

Chone forecasted 62 of the 68 pitchers Marcel thought were “bad pitchers”.  Chone thought they were 0.53 runs worse than the league average.  In fact, they were 0.49 runs worse.  A fantastic performance, and easily beating the other two. 

Chone forecasted 348 of the 381 middling pitchers.  Chone forecasted them at 0.08 runs worse than average, and they were 0.04 runs worse.  That’s worse than Marcel, but better than the Community.

Of the 41 pitchers forecasted of the 128 who made their debut in 2007, Chone thought they were 0.35 runs worse than average, but they posted an ERA 0.56 runs worse than average.  Chone takes a big back seat to the Community.  Marcel abstained.

***

Part 4, Reliability and Quality Classes.

forecast  actual  qual   relClass     n1
-0.57     -0.63   3      1_Low        24
-0.79     -0.80   3      2_Medium     36
-0.63     -0.91   3      3_High       20

0.16      0.05    4      1_Low        171
0.05      0.08    4      2_Medium     120
0.04      -0.01   4      3_High       57

0.52      0.00    5      1_Low        30
0.49      1.05    5      2_Medium     24
0.60      0.46    5      3_High       8

0.35      0.56    Debut  0_Debut2007  41

First group is the good pitchers (ERAs in the 3s).  The high reliability good pitchers seem to have been over-regressed by Chone.  Is it possible that his aging routine is too aggressive?  As a comparison, Marcel had those exact same pitchers as being 0.73 runs better than average.

The third group (bad pitchers) mimicked Marcel.

***

It seems that some combination of Chone and Community would be good.  But, so far, Chone has a slight overall lead.


#18    MGL      (see all posts) 2008/04/04 (Fri) @ 15:39

I am not sure there is a selective sampling issue at all with any of the classes of pitchers, even the low reliability or debut pitchers.  (BTW, are the debut pitchers included in the “low reliability” class?)

Please give me an example of the selective sampling issue?

Here is the way I look at it:

Let’s say we have 10 pitchers who are forecast as bad and let’s assume that they are rookies or with very little MLB experience.  Let’s say that we forecast them to have a 5.50 ERA collectively.  Let’s also say that the 5.50 is in fact, each one’s true ERA.

Let’s say that 5 pitch well and 5 pitch badly in the first few outings (say, 20 IP) and that the bad ones get benched and the good ones accumulate another 80 IP.  What do we get?

5 bad ones: pitch 20 IP of, say 7.00 ERA and don’t pitch again.

5 good ones:  they MUST pitch 20 IP of 4.00 (the bad and good combined have to average 5.50). They go on to pitch another 80 innings and those 80 innings have to be at 5.50.

So we have 20 IP of 7.00 (5 pitchers).  20 IP at 4.00 (5 pitchers).  80 IP of 5.50 (5 pitchers).

The total average ERA for all these pitchers combined is 5.50.  Of course.  No matter how you shake it, no matter who you let pitch or who you bench, the average ERA of all these pitchers combined is always going to be 5.50.

Where is the selective sampling?  You never get any selective sampling unless you set a minimum IP to include in your sample.  Once you do that, you run into selective sampling.  In this case, if you set a minimum of more than 20 IP, you eliminate the 5 pitchers who pitched 20 IP each at a 7.00 ERA, and you are left with a group who got a little lucky in their first 20 IP.

Again, if you include all pitchers in your samples and always weight their ERA’s by their IP (or simply compute a collective ERA, which is the same thing), you will NEVER have a selective sampling problem/issue.
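
The example above can be checked with a few lines of arithmetic. This sketch uses the numbers from the example (five pitchers benched after 20 IP at 7.00, five kept after 20 IP at 4.00 who then throw 80 IP at 5.50); the function name is hypothetical, and the collective ERA is taken as the IP-weighted average of the stints.

```python
def collective_era(stints):
    """ERA of a group from (innings_pitched, era) stints, weighted by IP."""
    total_ip = sum(ip for ip, _ in stints)
    return sum(ip * era for ip, era in stints) / total_ip


# Five pitchers benched after a bad 20 IP.
benched = [(20, 7.00)] * 5
# Five pitchers with a lucky 20 IP, followed by 80 IP at their true level.
kept = [(20, 4.00), (80, 5.50)] * 5

everyone = collective_era(benched + kept)
with_minimum_ip = collective_era(kept)  # a cutoff above 20 IP drops the benched five

print(everyone)         # 5.5
print(with_minimum_ip)  # 5.2
```

With every pitcher included the group lands on its true 5.50. Once a minimum-IP cutoff drops the benched pitchers, the survivors show 5.20, which is the selective sampling effect.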

You are not using a minimum IP criterion for any of the pitchers, Tango, are you?  If not, you should not have a selective sampling issue.

68 bad pitchers forecasted.  Forecast of 0.80 runs worse than average.  Actually, 0.49 runs worse than average.  Verdict: not enough regression by Marcel.

This may be a case of selective sampling, and we’ll see if all the forecasters are in this boat.

Basically, from this look, it seems that anyone with a bad forecast and low reliability, should have a league average forecast (likely for selective sampling reasons).

Again, I don’t see a selective sampling issue here unless your “bad pitchers forecasted and low reliability group” had a min IP to be included at all in the sample.  If not, they should pitch at exactly their true level, as should all other classes/groups of pitchers.


#19    Tangotiger      (see all posts) 2008/04/04 (Fri) @ 15:50

Hmmmm… makes sense.  I’ll have to think about it.  I’m also trying to explain why there’s such a gap.  It could be purely luck, as maybe a couple of guys had career years.  I’ll look at that.

***

No exclusion of any pitcher.  All weighted by IP.

***

Reliability classes are: high, medium, low, based on how Marcel perceives those pitchers (which is really number of weighted innings pitched in the last 3 years, using a 3/2/1 weight). 

Any pitcher who debuted in 2007 is put in a 4th reliability class.

So, 85 in the high, 181 in the medium, 265 in the low, and 128 debut gives us 659 pitchers who pitched in 2007.  These comprise around 30%, 30%, 30%, and 10% of all innings pitched in 2007.

***

Technical note 1: there were 660, but one pitcher had zero IP.  So, I just removed him.

Technical note 2: “debut 2007” really means “didn’t pitch between 2004 and 2006”.  A player COULD have pitched in 2003, and not in 2004, 05, 06, and then pitched in 2007.  To Marcel (only knowledge of 3 years), that counts as a “debut 2007”.  Needless to say, these are extremely rare, if they exist at all.


#20    MGL      (see all posts) 2008/04/06 (Sun) @ 18:44

Any particular reason why you don’t use 4 years, or even longer, in the Marcel’s?  At least for hitters.  I use 5 years for hitters and 4 for pitchers.  I think that for the veteran hitters that are either really bad or really good for 4 years or more, if you only use 3 years, you will end up over-regressing, which I think was the case when you looked at everyone’s regressions for hitters, no?  For pitchers, it probably does not matter much, since the weights are more aggressive, and you have more TBF (than PA for batters), at least for full-time, healthy, starters.


#21    tangotiger      (see all posts) 2008/04/06 (Sun) @ 20:28

Technically, the weights should be:

x^y

where x = .8 for hitters and .7 for pitchers (more or less)

y = number of years ago

So, you could go back as many years as possible.

As for why I didn’t do that, I don’t know.  Probably for ease of explanation, I settled on 5/4/3 for hitters and 3/2/1 for pitchers.
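
To see how the x^y weights line up with the simple integer weights, here is a small sketch. It assumes y = 1 for the most recent season, and it rescales both sets of weights so that the most recent season equals 1; the function names are hypothetical.

```python
def recency_weights(decay, n_years):
    """Weight decay**y for a season y years ago (y = 1 is last season)."""
    return [decay ** y for y in range(1, n_years + 1)]


def scaled_to_most_recent(weights):
    """Rescale weights so the most recent season has weight 1."""
    return [w / weights[0] for w in weights]


hitters_geometric = scaled_to_most_recent(recency_weights(0.8, 3))
hitters_simple = scaled_to_most_recent([5, 4, 3])
pitchers_geometric = scaled_to_most_recent(recency_weights(0.7, 3))
pitchers_simple = scaled_to_most_recent([3, 2, 1])

for label, weights in [
    ("hitters  x=.8 ", hitters_geometric),
    ("hitters  5/4/3", hitters_simple),
    ("pitchers x=.7 ", pitchers_geometric),
    ("pitchers 3/2/1", pitchers_simple),
]:
    print(label, [round(w, 2) for w in weights])
```

Under these assumptions the hitter weights track 5/4/3 closely over three years, while 3/2/1 discounts older pitcher seasons faster than x = .7 would. The geometric form also extends to as many seasons as are available, since each extra year just adds another power of x.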


#22    Tangotiger      (see all posts) 2008/04/09 (Wed) @ 12:53

MGL.

Part 1, Overall.

604 of the 659 pitchers who pitched in 2007 were forecasted (4% of innings not forecasted).  Those pitchers were off by 0.93 runs.  Including the missing pitchers the usual way (giving them each a flat 5.50 ERA forecast), MGL’s overall error was 0.97 runs.  MGL is a shade behind Chone, and a sliver ahead of Marcel and the Fans.

MGL: 0.97 ERA diff, n=659
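The "usual way" of handling missing pitchers can be sketched as below.  This is only an illustration: the pitcher names and ERAs are hypothetical, and the sketch assumes each pitcher counts equally in the average, since the thread does not say whether the average is weighted by innings.

```python
FLAT_FORECAST = 5.50  # flat ERA given to every pitcher without a forecast


def mean_abs_era_error(actual, forecast):
    """Unweighted mean absolute ERA error over every pitcher in `actual`.

    actual:   dict of pitcher -> actual ERA
    forecast: dict of pitcher -> forecast ERA (some pitchers may be missing)
    Assumption: every pitcher counts equally, regardless of innings.
    """
    errors = [
        abs(era - forecast.get(name, FLAT_FORECAST))
        for name, era in actual.items()
    ]
    return sum(errors) / len(errors)


# Hypothetical pitchers, for illustration only
actual = {"A": 3.50, "B": 4.75, "C": 6.10}
forecast = {"A": 3.90, "B": 4.25}  # "C" has no forecast, so 5.50 is used

overall = mean_abs_era_error(actual, forecast)
print(round(overall, 2))
```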

***

Part 2, Reliability Classes.

MGL forecasted all 85 pitchers that Marcel deemed “easy to forecast”.  The average error was 0.65 runs, compared to an error of 0.75 runs if MGL had forecasted league average for each pitcher.  In this respect he is similar to the rest.

MGL forecasted all 181 pitchers that Marcel deemed “a bit hard to forecast”.  The average error was 0.97 runs, compared to 1.15 if random.  This puts MGL in the middle of the pack.

MGL forecasted all 265 pitchers that Marcel deemed “pretty darn hard to forecast”.  The average error was 1.11, compared to the random of 1.14.  This is a slight improvement over the Community and Marcel, and a match to Chone.

MGL forecasted 73 of the 128 pitchers to make their 2007 Debut.  The average error was 1.38 compared to the random of 1.35.  That’s like the rest.

We can see here that MGL does a decent job.
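One way to put the four classes on the same scale is to express each error as a fraction of the "random" (league average) error.  The sketch below uses the numbers quoted in Part 2; the improvement measure is an assumption chosen for illustration, not something computed in the thread.

```python
# (class, MGL average error, "random" league-average error), as quoted in Part 2
classes = [
    ("High reliability", 0.65, 0.75),
    ("Medium reliability", 0.97, 1.15),
    ("Low reliability", 1.11, 1.14),
    ("Debut 2007", 1.38, 1.35),
]


def improvement(error, baseline):
    """Fraction of the baseline error removed by the forecast (negative = worse)."""
    return 1 - error / baseline


for name, err, base in classes:
    print(f"{name:20s} {improvement(err, base):+.1%}")
```

Read this way, the forecast removes a noticeable share of the baseline error only in the high and medium reliability classes, and is slightly worse than the baseline for the debut pitchers.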

***

Part 3, Quality Classes.

MGL forecasted all 82 pitchers Marcel thought were “good pitchers”.  MGL thought that these pitchers would post a mean ERA of 0.81 runs better than average.  They in fact posted an ERA of 0.81 better than average.  Fantastic job!

MGL forecasted all 68 pitchers Marcel thought were “bad pitchers”.  MGL thought they were 0.73 runs worse than the league average.  In fact, they were 0.49 runs worse.  Similar to Marcel and the fans.

MGL forecasted all 381 middling pitchers.  MGL forecasted them at 0.06 runs worse than average, and they were 0.07 runs worse.  Another good job.

Of the 73 pitchers forecasted of the 128 who made their debut in 2007, MGL thought they were 0.56 runs worse than average, but they posted an ERA 0.61 runs worse than average.  Fantastic again.

***

Part 4, Reliability and Quality Classes.

forecast  actual  qual   relClass      n1
   -0.50   -0.61  3      1_Low         26
   -0.93   -0.80  3      2_Medium      36
   -0.86   -0.91  3      3_High        20

    0.24    0.14  4      1_Low        203
    0.03    0.09  4      2_Medium     121
   -0.09   -0.01  4      3_High        57

    0.76    0.06  5      1_Low         36
    0.69    1.05  5      2_Medium      24
    0.72    0.46  5      3_High         8

    0.56    0.61  Debut  0_Debut2007   73

First group is the good pitchers (ERAs in the 3s).  MGL basically nailed them. 

The third group (bad pitchers) mimicked Marcel.
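To see which cells drive these comments, the gap between forecast and actual can be computed for each row of the Part 4 table.  This is a minimal sketch: the rows are copied from the table, and reading the values as ERA relative to league average (negative = better) is an inference from Part 3.

```python
# Rows copied from the Part 4 table: (forecast, actual, qual, relClass, n1)
rows = [
    (-0.50, -0.61, "3", "1_Low", 26),
    (-0.93, -0.80, "3", "2_Medium", 36),
    (-0.86, -0.91, "3", "3_High", 20),
    (0.24, 0.14, "4", "1_Low", 203),
    (0.03, 0.09, "4", "2_Medium", 121),
    (-0.09, -0.01, "4", "3_High", 57),
    (0.76, 0.06, "5", "1_Low", 36),
    (0.69, 1.05, "5", "2_Medium", 24),
    (0.72, 0.46, "5", "3_High", 8),
    (0.56, 0.61, "Debut", "0_Debut2007", 73),
]


def gap(forecast, actual):
    """Forecast minus actual; positive means the forecast was too pessimistic."""
    return round(forecast - actual, 2)


for forecast, actual, qual, rel_class, n in rows:
    print(f"{qual:5s} {rel_class:12s} n={n:3d} gap={gap(forecast, actual):+.2f}")

# The cell with the largest miss in either direction
biggest = max(rows, key=lambda r: abs(r[0] - r[1]))
print("largest miss:", biggest[2], biggest[3])
```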

***

MGL does a great job overall, and still, he ends up in the middle of the pack.


#23    Tangotiger      (see all posts) 2008/04/09 (Wed) @ 13:57

PECOTA.

Part 1, Overall.

549 of the 659 pitchers who pitched in 2007 were mapped and forecasted (6% of innings not forecasted or mapped… I’m sure PECOTA forecasted a few more, but I can live with missing out on a few).  Those pitchers were off by 0.93 runs.  Including the missing pitchers the usual way (giving them each a flat 5.50 ERA forecast), PECOTA’s overall error was 0.97 runs.  Same as MGL.

PECOTA: 0.97 ERA diff, n=659

***

Part 2, Reliability Classes.

PECOTA forecasted all 85 pitchers that Marcel deemed “easy to forecast”.  The average error was 0.64 runs, compared to an error of 0.75 runs if PECOTA had forecasted league average for each pitcher.  In this respect it is similar to the rest.

PECOTA forecasted 180 of the 181 pitchers that Marcel deemed “a bit hard to forecast”.  The average error was 0.97 runs, compared to 1.15 if random.  This puts PECOTA in the middle of the pack.

PECOTA forecasted 228 of the 265 pitchers that Marcel deemed “pretty darn hard to forecast”.  The average error was 1.12, compared to the random of 1.12.  Middle of the pack.

PECOTA forecasted 56 of the 128 pitchers to make their 2007 Debut.  The average error was 1.32 compared to the random of 1.25.  That’s like the rest.

We can see here that PECOTA does an average job.

***

Part 3, Quality Classes.

PECOTA forecasted all 82 pitchers Marcel thought were “good pitchers”.  PECOTA thought that these pitchers would post a mean ERA of 0.76 runs better than average.  They in fact posted an ERA of 0.81 better than average.  Very good.

PECOTA forecasted 60 of the 68 pitchers Marcel thought were “bad pitchers”.  PECOTA thought they were 0.67 runs worse than the league average.  In fact, they were 0.46 runs worse.  Similar to Marcel and the fans.

PECOTA forecasted 351 of the 381 middling pitchers.  PECOTA forecasted them at 0.08 runs worse than average, and they were 0.07 runs worse.  Good job.

Of the 56 pitchers forecasted of the 128 who made their debut in 2007, PECOTA thought they were 0.33 runs worse than average, but they posted an ERA 0.39 runs worse than average.  Good job again.

***

Part 4, Reliability and Quality Classes.

forecast  actual  qual   relClass      n1
   -0.67   -0.61  3      1_Low         26
   -0.89   -0.80  3      2_Medium      36
   -0.72   -0.91  3      3_High        20

    0.12    0.10  4      1_Low        174
    0.03    0.09  4      2_Medium     120
    0.08   -0.01  4      3_High        57

    0.72   -0.06  5      1_Low         28
    0.60    1.05  5      2_Medium      24
    0.72    0.46  5      3_High         8

    0.33    0.39  Debut  0_Debut2007   56

First group is the good pitchers (ERAs in the 3s).  PECOTA did a very good job on them. 

The third group (bad pitchers) mimicked Marcel.

***

PECOTA does a good job overall, and still, ends up in the middle of the pack.

***

Chone seems to be running away with this.  If PECOTA’s are going to be marketed as “deadly accurate”, Chone should be “super-deadly accurate”.

Seriously, it’s basically a joke that any forecaster will claim any superiority over anyone else.  The best you can do is to be at the top, with several others.  You will not stand alone, or anything close to that.

Worse are the followers of forecasting systems who proclaim “generally regarded as the best”, or some b.s. like that.  Next time you see someone say that, send them over here.

Next year, I’ll put out the challenge to all the forecasting systems, and end this foolishness, once and for all.  Forecasting is the sabermetric equivalent of bending spoons.

Maybe I should offer a prize too:
http://skepdic.com/randi.html


#24          (see all posts) 2008/04/10 (Thu) @ 00:53

Chone, what’s your secret?  You are the man, or should I say the monkey!

I used to think that Pecota was the best, but that was probably because whenever they test various systems, they always seem to do the best.  I wonder why that is? ;)


#25    Rally      (see all posts) 2008/04/10 (Thu) @ 10:11

Good question.  Wonder what’s so different between their tests and an independent one?

Last year I think I became the first projector to use BBtype in the formula - I use it to help predict BABIP and HR rates.  THT might do this too, but I don’t think PECOTA did, at least not last year.

It involves several levels of regression because a pitcher’s line drive rate is pretty fluky.

For pitchers where I don’t have BBtype, minor leaguers and Japanese imports, I just assume an average mix.  I think this year I used the mix for Nick Adenhart after looking up his page on minorleaguesplits.com, but didn’t do it for other minor leaguers in the interest of time.  If I had more time I’d figure out how to spider the minor league data from gameday.


#26    Tangotiger      (see all posts) 2008/04/10 (Thu) @ 10:29

BBtype being “batted ball type”, or the spread of GB, FB, LD a pitcher allows?

I’m looking at the extreme pitchers and comparing Marcel to Chone, limited to pitchers that we both forecasted and for whom I had a reliability of at least 0.60 (137 pitchers in all):
http://www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=60&type=2&season=2007&grid=All

First up: Derek Lowe, GB pitcher.  Marcel had him 0.31 worse than Chone (4.15, 3.84… actual 3.88… good for Chone!)

Brandon Webb.  A match (3.64, 3.60… 3.01, tie)

Hudson, 0.21 worse with Marcel (4.40, 4.19.... 3.33, good for Chone!).

King Felix, Marcel 0.61 worse (3.95, 3.34… 3.92, Marcel takes it!)

Wang, Marcel 0.14 better (3.95, 4.09… 3.70, Marcel again).

Those are basically your 5 GB pitchers that people talk about.

I’m sure Chone is right, but it’s hard to tell really the magnitude.  There’s just so much noise in any kind of performance data you look at.
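The five comparisons above can be tallied with a short sketch.  The forecasts and actual ERAs are the ones listed; the 0.05-run tolerance for calling a tie is an assumption picked so that Webb comes out as a tie, as he does above.

```python
TIE_TOLERANCE = 0.05  # assumption: errors within 0.05 runs count as a tie

# (pitcher, Marcel forecast, Chone forecast, actual ERA), as listed above
pitchers = [
    ("Lowe", 4.15, 3.84, 3.88),
    ("Webb", 3.64, 3.60, 3.01),
    ("Hudson", 4.40, 4.19, 3.33),
    ("Felix", 3.95, 3.34, 3.92),
    ("Wang", 3.95, 4.09, 3.70),
]


def closer(marcel, chone, actual, tol=TIE_TOLERANCE):
    """Return which forecast had the smaller absolute ERA error."""
    m_err = abs(marcel - actual)
    c_err = abs(chone - actual)
    if abs(m_err - c_err) < tol:
        return "tie"
    return "Marcel" if m_err < c_err else "Chone"


results = {name: closer(m, c, a) for name, m, c, a in pitchers}
for name, winner in results.items():
    print(f"{name:8s} {winner}")
```

With five pitchers and a 2-2-1 split, the tally says little about which system handles groundballers better, which is the point about noise.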


#27    Rally      (see all posts) 2008/04/10 (Thu) @ 13:15

It’s hard to tell with ERA because of the noise.  Maybe focusing on the HR rate will show a difference.

What I do is use sample data and regression to predict what % of BIP are GB, FB, LD, and Pops.  Then from that result I get a number to regress sample BABIP and HR to.  Then adjust for the defense and ballpark.
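A loose sketch of that two-stage idea follows.  It is not Chone's actual method: "regression" is read here as regression toward the mean, and the league mix, the BABIP by batted-ball type, and the ballast constants are all hypothetical placeholders rather than measured values.  HR rates and the defense and ballpark adjustments are left out.

```python
def shrink(observed, target, n, k):
    """Regress an observed rate toward a target; k acts as ballast against n."""
    return (observed * n + target * k) / (n + k)


# Hypothetical placeholders, not measured values
LEAGUE_MIX = {"GB": 0.44, "FB": 0.30, "LD": 0.19, "POP": 0.07}
BABIP_BY_TYPE = {"GB": 0.24, "FB": 0.14, "LD": 0.72, "POP": 0.02}
K_MIX = {"GB": 100, "FB": 100, "LD": 600, "POP": 300}  # LD regressed hardest
K_BABIP = 1500


def regressed_mix(sample_mix, bip):
    """Stage 1: regress each batted-ball share toward the league mix."""
    mix = {t: shrink(sample_mix[t], LEAGUE_MIX[t], bip, K_MIX[t]) for t in LEAGUE_MIX}
    total = sum(mix.values())
    return {t: share / total for t, share in mix.items()}  # renormalize to 1


def babip_target(sample_mix, bip):
    """BABIP implied by the regressed mix; the number the sample is regressed to."""
    mix = regressed_mix(sample_mix, bip)
    return sum(mix[t] * BABIP_BY_TYPE[t] for t in mix)


def projected_babip(sample_babip, sample_mix, bip):
    """Stage 2: regress the sample BABIP toward the pitcher-specific target."""
    return shrink(sample_babip, babip_target(sample_mix, bip), bip, K_BABIP)


# Hypothetical groundball pitcher with 600 balls in play
groundballer = {"GB": 0.60, "FB": 0.18, "LD": 0.17, "POP": 0.05}
print(round(projected_babip(0.310, groundballer, 600), 3))
```

The design point is that each pitcher is regressed toward his own target, set by his batted-ball mix, instead of toward a single league BABIP.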


#28    Tangotiger      (see all posts) 2008/04/10 (Thu) @ 13:55

However, do you use a different 2b+3b per h-hr rate for GB, FB, and LD?

Basically, the run value of a GB and FB (excluding HR, including DP) is virtually identical.  If the purpose is to estimate BABIP, that’s one thing.  But, if it’s to estimate ERA, then there’s really no advantage to splitting out the GB and FB for BIP.  All you’d really care about is % of BIP that are LD.  And, if that’s constant for all pitchers, then you don’t need that either.

What you are left with is just using FB to estimate HR.


#29    Rally      (see all posts) 2008/04/10 (Thu) @ 14:56

I don’t predict the number of doubles and triples, but I do use the GB/FB rate to get a stat called hit value, the number of total bases per hit in play.  We’re just evaluating ERA here, but my goal is to predict all the common pitcher stats (IP, H, R, BB, HBP, SO) to the best of my computer’s ability.  Hit value ranges from about 1.2 for an extreme groundballer to 1.3 for a guy like Chris Young.


#30    Rally      (see all posts) 2008/04/10 (Thu) @ 14:58

Take two pitchers with similar numbers and the same ERA - the groundballer will give up more hits, but they will be for fewer bases.  Consequently, the flyballer will post a better WHIP.  Just as important for fantasy baseball.
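A small worked example of the WHIP point, using the hit values of 1.2 and 1.3 mentioned in the previous comment.  The season totals are hypothetical, and "similar numbers and the same ERA" is approximated here by giving both pitchers the same total bases on hits in play, the same walks, and the same innings.

```python
def hits_from_total_bases(total_bases, hit_value):
    """Hits in play implied by total bases and hit value (total bases per hit)."""
    return total_bases / hit_value


def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings


# Hypothetical season totals shared by both pitchers
TOTAL_BASES, WALKS, INNINGS = 260, 60, 200

gb_hits = hits_from_total_bases(TOTAL_BASES, 1.2)  # extreme groundballer
fb_hits = hits_from_total_bases(TOTAL_BASES, 1.3)  # flyballer like Chris Young

gb_whip = whip(WALKS, gb_hits, INNINGS)
fb_whip = whip(WALKS, fb_hits, INNINGS)
print(round(gb_whip, 2), round(fb_whip, 2))
```

Under these assumptions the groundballer allows about 17 more hits for the same total bases, so his WHIP comes out higher even though the damage per ball in play is the same.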

