THE BOOK--Playing The Percentages In Baseball

Monday, March 01, 2010

Chone v2.0 - adding objective scouting data

By Tangotiger, 12:22 PM

Niiiiiiice.  This is exactly the kind of place where you need to look to improve forecasts.  What John Mayne has done here is to show that the observed past performance means more if you are a hard-tosser than a soft-tosser. 

This can be an issue of selection bias: if you are a soft-tosser with a bad performance, you don’t get many chances.  If you are a hard-tosser with a bad performance, you keep getting chances.  So, when you look at the results, the bad performances left above a minimum IP threshold will likely be mostly hard-tossers.

So, what I would prefer is to look only at pitchers with an ERA under 4.50 in year X.  This at least controls for quality in some respect.

Otherwise, fantastic work.


#1    Peter Jensen      (see all posts) 2010/03/01 (Mon) @ 13:43

Tango - If I understand your criticism correctly, I don’t think selection bias is a problem in Mayne’s study.  First, he doesn’t seem to be using an IP threshold, including examples with 0 innings and 12 innings in his tables.  Second, the hard tossers tend to overperform their projections, not underperform, so if anything this methodology is understating the trend.  Third, Mayne seems to be using the charts just to illustrate a possible tendency, not as hard-core analysis showing the methodology used to actually make an improved projection.  From his comments at the end, it seems he has used analysis to create better projections, but prefers to keep the actual methodology proprietary.  Fourth, it appears from looking at his charts carefully and from his response to MGL’s comment at THT that he has error bars that vary by IP to set the boundaries of whether a performance is actually even with, or better or worse than, a projection.  Fifth, Sean’s follow-up analysis in the comments at THT confirms that analysis along more conventional lines supports Mayne’s conclusions.

Overall a very nice and important study.


#2          (see all posts) 2010/03/01 (Mon) @ 14:40

There’s an IP threshold of 100 IP *for the prior year*. There’s no IP threshold for the actual year. (I’m John R. Mayne. The avalanche of kind words and smart criticism is quite welcome.)

--JRM


#3    Guy      (see all posts) 2010/03/01 (Mon) @ 15:50

I posted this over at THT as well:

I think it would be useful to distinguish between soft throwers in general vs. pitchers who have lost velocity in the previous year.  Some of what we’re seeing with the soft throwers is a decline in velocity in the prior year (due to injury or aging), which could mean we should weight earlier years less than CHONE does.  Zito in 2006/2007, Chris Young in 2008, Pedro in 2006 all fit this profile.  It would be interesting to look at this kind of player separately from those who are ALWAYS slow (e.g. Webb, Maddux, Wolf).  My guess is the “decliners” contribute most or all of the underperformance by soft tossers. 

[Data presented by Sean in the THT comments thread suggests the hard-throwers may not really overperform.]


#4    Tangotiger      (see all posts) 2010/03/01 (Mon) @ 15:54

Guy, good point.  The interesting part is that Chone doesn’t need to have a counterbalance for the hard-throwers.  If the soft-throwers are injured or are on a drop from year x-1, then that could explain the issue.


#5    J. Cross      (see all posts) 2010/03/01 (Mon) @ 16:03

Good stuff, JRM! 

Not to give it away but Steamer is going to try to use velocity in its 2010 pitcher projections.

For the pool of 873 pitchers we looked at, the correlation between Marcel’s K/AB and actual K/AB was 0.716.  But the correlation between Marcel “adjusted for fastball velocity” K/AB and actual K/AB was 0.741.

In other words, we go from explaining 51% of the variance to 55% of the variance. 
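
The step from correlation to variance explained is just squaring r. A minimal check of the two figures quoted above (the variable and function names are mine, not Steamer's):

```python
# Correlations quoted above: projected vs. actual K/AB for the 873 pitchers.
r_marcel = 0.716           # plain Marcel
r_marcel_velocity = 0.741  # Marcel adjusted for fastball velocity

def variance_explained(r):
    """Share of variance explained by a linear fit with correlation r."""
    return r ** 2

r2_marcel = variance_explained(r_marcel)
r2_velocity = variance_explained(r_marcel_velocity)

print(f"Marcel:            r^2 = {r2_marcel:.3f}")
print(f"Marcel + velocity: r^2 = {r2_velocity:.3f}")
print(f"Gain:              {r2_velocity - r2_marcel:.3f}")
```

Rounded to two decimals, these are the 51% and 55% quoted above.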

Disclaimer: This was done by adjusting a Marcel projection for year Y with velocity from year Y, which is obviously unfair since we would only know velocity from year Y-1.  Because velocity is so consistent, the major problem here might be that it’s picking up on the fact that guys who went from starting to relief outperformed their projected K/AB and had above-average fastball velocity.


#6    greenback      (see all posts) 2010/03/01 (Mon) @ 19:52

It would be interesting to see the relationship between year (X) velocity and year (X+1) time on the disabled list. For that matter, Will Carroll’s traffic lights might be useful.


#7    Nick Steiner      (see all posts) 2010/03/02 (Tue) @ 04:28

Using John’s data, the fast pitchers overperformed their projected ERA by .06 points, using an average weighted by actual innings pitched.  The slow pitchers underperformed by an average of .47 points.  I didn’t include guys who were injured or in the minors.  Based on that, I would say that John might be on to something (at least regarding over-regressing bad pitchers or guys with slow velocity); however, I think his article is great in its concept more than in what he actually found.
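
For anyone who wants to repeat this kind of check, here is a minimal sketch of an innings-weighted average of ERA misses. The rows are made up for illustration and are not John's data; the function name is hypothetical.

```python
def ip_weighted_mean_diff(rows):
    """Average (actual ERA - projected ERA), weighted by actual innings pitched.

    Positive means the group underperformed its projections (ERA came in higher).
    """
    total_ip = sum(ip for _, _, ip in rows)
    if total_ip == 0:
        raise ValueError("no innings pitched in this group")
    weighted = sum((actual - projected) * ip for projected, actual, ip in rows)
    return weighted / total_ip

# Hypothetical (projected ERA, actual ERA, actual IP) rows, NOT John's data.
slow_group = [
    (4.20, 4.80, 150.0),
    (4.50, 4.40, 200.0),
    (4.00, 5.00, 50.0),
]

slow_diff = ip_weighted_mean_diff(slow_group)
print(f"slow group: {slow_diff:+.2f} runs of ERA vs. projection")
```

Weighting by actual innings keeps a 12-inning disaster from counting as much as a 200-inning season.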


#8    Nick Steiner      (see all posts) 2010/03/02 (Tue) @ 05:33

I should clarify that the “test” I did above was in response to MGL’s first comment in the THT piece, in which he said that the “score” method was a poor way of analyzing the data.


#9    MGL      (see all posts) 2010/03/02 (Tue) @ 09:09

In the THT comments, I said that the differences could simply be Chone not using fastball velocity to establish different means to regress toward.  Let’s say that all fast pitchers have an average ERA of 3.90 and all slow ones, 4.50.  If you regress everyone towards 4.20 or whatever the overall mean is, of course the fast ones will outperform the slow ones relative to their projections.  I don’t know if Chone uses velocity to establish the means.  Do you, Sean?  Very few forecasters do, I think.  I don’t.  We should.
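
A small sketch of that argument, using the 3.90 / 4.50 / 4.20 figures from the example above. The reliability weight of 0.6 is an assumption chosen only for illustration, and the function names are hypothetical; this is not how Chone or any other system actually computes its projections.

```python
def regress(observed, reliability, mean):
    """Shrink an observed rate toward a mean; reliability 1.0 means no shrinkage."""
    return reliability * observed + (1.0 - reliability) * mean

OVERALL_MEAN = 4.20                         # illustrative overall ERA from the comment
GROUP_MEAN = {"fast": 3.90, "slow": 4.50}   # illustrative group ERAs from the comment
RELIABILITY = 0.6                           # assumed weight, for illustration only

def projection_bias(group, use_group_mean):
    """Projected ERA minus true ERA for an average pitcher in the group."""
    true_mean = GROUP_MEAN[group]
    target = true_mean if use_group_mean else OVERALL_MEAN
    # An average pitcher in the group is observed, on average, at the group mean.
    return regress(true_mean, RELIABILITY, target) - true_mean

for group in GROUP_MEAN:
    print(group,
          f"overall-mean bias {projection_bias(group, False):+.2f},",
          f"group-mean bias {projection_bias(group, True):+.2f}")
```

A positive bias means the projected ERA is too high, so the pitcher beats his projection. Regressing to the overall mean leaves the fast group projected too high and the slow group too low by the same amount; regressing each group to its own mean removes both errors.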


#10    Rally      (see all posts) 2010/03/02 (Tue) @ 10:24

No, MGL, I don’t.  And at the very least I will investigate a method for doing so next year.  I think I’d have to assume average velocity for minor leaguers who we don’t have velocity data for.

The timing is impeccable.  My gameday/pitch f/x database, which I’ve wanted to do since I got Joe Adler’s book a few years ago, became operational exactly one day before John published this.  Up until now I couldn’t have done this even if I wanted to, but next year will be a different story.

It looks like the same thing as regressing a batter’s power to his size, his BABIP on groundballs to his speed score, or defense to speed scores.

I don’t regress anyone to an ERA; component stats are calculated and regressed and then combined to get an ERA. The next step is to see what components are affected.  Probably strikeouts, but we’ll see.


#11    Sky      (see all posts) 2010/03/02 (Tue) @ 10:40

I’d like to see WHY velocity might cause a pitcher to beat CHONE (or vice versa).  Is it a BABIP thing?  HR prevention thing?  ERA beating FIP thing?


#12    J. Cross      (see all posts) 2010/03/02 (Tue) @ 11:19

I think it’s mostly a K% thing for the reasons MGL describes.


#13    Sky      (see all posts) 2010/03/02 (Tue) @ 11:37

Ok, so then what’s the mechanism that explains faster fastballs = more Ks?  Is it more swing and misses?  Is it that faster fastballs get more takes because they’re harder to make decisions on?  Do faster fastballs set up other pitches better, and those other pitches actually get the third strike?  Do faster fastballs indicate more break on other pitches, when arm speed is converted into spin instead of velocity?


#14    Rally      (see all posts) 2010/03/02 (Tue) @ 11:55

Probably a bit of all of that.  Seems way too intuitive to me to spend much time analyzing.  I mean, just go in a batting cage and see how often you make contact against 80 MPH pitches.  Then go in the slow cage at 55 MPH.

Or look at some quotes from baseball history from batters facing the great hard throwers.  You can’t hit what you can’t see.


#15    Guy      (see all posts) 2010/03/02 (Tue) @ 12:16

I don’t quite understand the discussion in comments 9 to 14.  As Nick reports above, it doesn’t appear that the hi-velocity pitchers really do overperform.  And since Rally is already projecting K rate separately, which doesn’t need much regression after 3 years of data, I wouldn’t expect them to overperform.

What may be the case is that low-velocity pitchers underperform.  If so, then the question is whether CHONE is overregressing weak pitchers in general or just the low-velocity pitchers?  And if it really is low-velocity pitchers, is it all of them or just those whose velocity has recently declined?


#16    J. Cross      (see all posts) 2010/03/02 (Tue) @ 15:07

“Ok, so then what’s the mechanism that explains faster fastballs = more Ks?  Is it more swing and misses?”

Hard throwers do have a lower contact% but that’s not the whole story.  Even when you control for contact%, fastball velocity helps predict K/9.  Once you control for contact%, though, the magnitude of the “velocity effect” is only about a third of what it was before.

I think it’ll be interesting to find out whether there’s still a velocity effect even once you regress to a player’s velocity peers in the projection.


#17    J. Cross      (see all posts) 2010/03/02 (Tue) @ 15:39

“Do faster fastballs set up other pitches better, and those other pitches actually get the third strike?”

It looks like hard throwers have more effective sliders but actually have less effective curveballs and changeups.  This probably doesn’t mean that throwing hard hurts your curveball but more likely, I think, that if you don’t throw hard you’d better have a good curveball.

Some slopes using FanGraphs data:

wFB/C v. FBV: 0.059
wSL/C v. FBV: 0.22
wCB/C v. FBV: -0.14
wCH/C v. FBV: -0.15


#18    MGL      (see all posts) 2010/03/02 (Tue) @ 16:27

If it were mainly a regression effect, you should see little difference for pitchers with lots of history and a larger difference for pitchers with a little history.


#19    J. Cross      (see all posts) 2010/03/02 (Tue) @ 22:53

“If it were mainly a regression effect, you should see little difference for pitchers with lots of history and a larger difference for pitchers with a little history.”

This looks to be true.

Marcel’s reliability represents the amount of history that Marcel is basing its projection on.  A reliability of 1 is no regression and 0 is all regression.

I split all pitchers into three equal-sized groups (each group was 206 pitchers) by Marcel reliability score.  I only looked at starting pitchers.  I also used an “adjusted fastball velocity” by adding 2.5mph to the fastballs of LHPs (b/c LHPs at any FBV get as many K’s as RHPs that throw about 2.5mph harder).
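
A rough sketch of that setup: put left-handers on a right-handed-equivalent velocity scale, then cut the sample into thirds by reliability. The six pitchers and both function names are hypothetical; only the 2.5 mph adjustment comes from the description above.

```python
LHP_ADJUSTMENT_MPH = 2.5  # LHPs strike out like RHPs throwing about 2.5 mph harder

def adjusted_fbv(fbv_mph, throws):
    """Fastball velocity on a right-handed-equivalent scale."""
    return fbv_mph + LHP_ADJUSTMENT_MPH if throws == "L" else fbv_mph

def split_into_thirds(pitchers, key):
    """Sort descending by key and cut into three groups of (near-)equal size."""
    ranked = sorted(pitchers, key=key, reverse=True)
    n = len(ranked)
    cut1, cut2 = n // 3, 2 * n // 3
    return ranked[:cut1], ranked[cut1:cut2], ranked[cut2:]

# Hypothetical pitchers: (name, throws, raw FBV in mph, Marcel reliability).
pitchers = [
    ("A", "R", 92.0, 0.85), ("B", "L", 88.5, 0.80), ("C", "R", 90.0, 0.72),
    ("D", "L", 89.5, 0.68), ("E", "R", 94.0, 0.40), ("F", "R", 89.0, 0.30),
]

most, some, least = split_into_thirds(pitchers, key=lambda p: p[3])
for label, group in (("most history", most), ("some history", some),
                     ("least history", least)):
    mean_adj = sum(adjusted_fbv(p[2], p[1]) for p in group) / len(group)
    print(label, [p[0] for p in group], f"mean adjFBV {mean_adj:.2f}")
```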

group 1 (most history) info
Reliability: 0.812
mK/IP: 0.717
real K/IP: 0.705
adjFBV: 90.1

group 2 (some history) info
Reliability: 0.701
mK/IP: 0.732
real K/IP: 0.704
adjFBV: 90.9

group 3 (least history) info
Reliability: 0.353
mK/IP: 0.738
real K/IP: 0.706
adjFBV: 91.1

The fastball velocity effect in each group:

group 1:
real K/IP = 0.95*mK/IP + 0.0108*adjFBV - 0.95

group 2:
real K/IP = 1.026*mK/IP + 0.0124*adjFBV - 1.17

group 3:
real K/IP = 1.00*mK/IP + 0.0242*adjFBV - 2.24

So, in group 3, with roughly twice as much regression as group 2, the fastball velocity effect was roughly twice as large.  I think this definitely suggests that this is a regression effect.
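
The numbers above can be checked by plugging each group's averages into its own equation. In this sketch the adjFBV slope is entered as positive in every group, which is the only sign under which the fitted values land near the observed group means; "regression share" is taken as 1 minus reliability, following the definition given earlier in this comment. The names are mine.

```python
# (reliability, mean Marcel K/IP, mean real K/IP, mean adjFBV,
#  slope on mK/IP, slope on adjFBV, intercept), as quoted in the comment.
# Assumption: the adjFBV slope is positive in every group.
GROUPS = {
    "group 1 (most history)":  (0.812, 0.717, 0.705, 90.1, 0.950, 0.0108, -0.95),
    "group 2 (some history)":  (0.701, 0.732, 0.704, 90.9, 1.026, 0.0124, -1.17),
    "group 3 (least history)": (0.353, 0.738, 0.706, 91.1, 1.000, 0.0242, -2.24),
}

def fitted_k_per_ip(m_k, adj_fbv, b_mk, b_fbv, intercept):
    """Evaluate real K/IP = b_mk*mK/IP + b_fbv*adjFBV + intercept."""
    return b_mk * m_k + b_fbv * adj_fbv + intercept

for name, (rel, m_k, real_k, fbv, b_mk, b_fbv, icpt) in GROUPS.items():
    fit = fitted_k_per_ip(m_k, fbv, b_mk, b_fbv, icpt)
    print(f"{name}: regression share {1 - rel:.3f}, "
          f"K/IP per mph {b_fbv:.4f}, fitted {fit:.3f} vs. real {real_k:.3f}")

# Group 3 relative to group 2: amount of regression and size of the velocity slope.
g2 = GROUPS["group 2 (some history)"]
g3 = GROUPS["group 3 (least history)"]
regression_ratio = (1 - g3[0]) / (1 - g2[0])
slope_ratio = g3[5] / g2[5]
print(f"regression ratio {regression_ratio:.2f}, slope ratio {slope_ratio:.2f}")
```

Both ratios come out close to 2, which is the "twice as much regression, twice the velocity effect" pattern described above. The fitted values only match the group means approximately because the published coefficients are rounded.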


#20    MGL      (see all posts) 2010/03/03 (Wed) @ 05:09

J Cross, good work, eh, as the Canadians might say..


#21    Guy      (see all posts) 2010/03/03 (Wed) @ 07:56

I still think this is mainly a function of aging and/or injury in the case of the low-velocity pitchers.  They pitch far fewer innings than the high-velocity pitchers, which is partially just a talent difference but not just that. 

Maybe Sean uses JC Bradbury’s baseball-players-never-get-old aging curve in his projections? :>)


#22    Rally      (see all posts) 2010/03/03 (Wed) @ 10:09

I beg your pardon?  That is way below the belt.


#23    Rally      (see all posts) 2010/03/03 (Wed) @ 10:43

Just kidding.  I definitely have players getting worse as they get older.  Don’t see any practical use of Bradbury’s work there, since I don’t know beforehand which players will have long careers.

Looked at the FanGraphs list of 77 ERA qualifiers and grouped by fastball velocity.  The slowest 15 (89.4 mph and below) struck out 14.8% of batters faced.  The middle group struck out 18.5%, and the hardest throwers (93.1 mph & up) struck out 22.5%.

I think it is that simple: players should be regressed to the mean of the population they come from, and if I do this then the group projections will be about the same for each category.
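
A minimal sketch of what regressing to a velocity-specific population mean could look like, using the three strikeout rates and the two cutoffs quoted above. Treating everything between the cutoffs as the middle tier, the reliability value, and the function names are all assumptions for illustration; this is not CHONE's actual method, and rates taken from ERA qualifiers would not carry over directly to relievers or part-time starters.

```python
# Strikeout rates by fastball-velocity tier, from the 77 ERA qualifiers above.
TIER_K_RATE = {"slow": 0.148, "middle": 0.185, "fast": 0.225}

def velocity_tier(fbv_mph):
    """Tier cutoffs from the comment: 89.4 and below is slow, 93.1 and up is fast."""
    if fbv_mph <= 89.4:
        return "slow"
    if fbv_mph >= 93.1:
        return "fast"
    return "middle"

def project_k_rate(observed_k_rate, reliability, fbv_mph):
    """Regress toward the velocity-tier mean (reliability 1.0 = no regression)."""
    target = TIER_K_RATE[velocity_tier(fbv_mph)]
    return reliability * observed_k_rate + (1.0 - reliability) * target

# Hypothetical pitchers: same observed K% and reliability, different velocity.
for fbv in (88.0, 91.0, 95.0):
    print(fbv, velocity_tier(fbv), f"{project_k_rate(0.200, 0.7, fbv):.3f}")
```

Three pitchers with identical records get different projections purely because they are regressed toward different peer groups.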

