THE BOOK--Playing The Percentages In Baseball

Thursday, February 25, 2010

Matt on DIPS

By Tangotiger, 01:10 PM

Everything he says in the first half, I’m on board with, especially this:

“The reality is that everybody who can get hitters to whiff enough to hold a roster spot on a major-league team has similar skills at preventing hits on balls in play.”

DIPS is so powerful because the 75% of the time you put the ball in play, you better be darned good at it overall.  If you keep getting hammered, you won’t even have a chance at making MLB.

This however:

“This is the reason that tRA did so poorly at predicting ERA the following year compared to FIP, despite having all of the same information and batted ball rates mixed in. Since tRA asked the question, ‘What would the average pitcher’s ERA be, given his strikeout, walk, home run, pop-up, ground-ball, non-HR outfield fly ball, and line-drive rate?’ it was given an answer that highly correlated with line drives. There is a -0.23 correlation between line-drive rate in a given season and ERA for pitchers who pitched at least 40 innings, but line-drive rate does not carry over to the following season. Thus, any DIPS statistic that relies on line-drive rate will unravel the following season if it tries to predict ERA. That is why when tRA was compared to FIP in predicting the following year’s ERA, it did worse. It uses all the same information, and a bunch of extra information to confuse itself. Basically, tRA is FIP having a nightmare.”

I introduced Batted Ball FIP (bbFIP) last week.  The equation is pretty simple:
ERA = 11*bigs + 3*smalls + constant
where
bigs = [(BB+LD) - (SO+iFB)] / PA
smalls = (oFB - GB) / PA
constant = whatever you need to align to the league
BB = BB-IBB+HBP
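
As a rough illustration, the equation above can be written out in Python. This is only a sketch: the function name, the argument names, and the stat line are made up for the example, and the constant is simply passed in, since it is whatever aligns the result to the league.

```python
def bbfip(bb, ibb, hbp, so, ld, ifb, ofb, gb, pa, constant):
    """Batted Ball FIP as defined above: 11*bigs + 3*smalls + constant."""
    walks = bb - ibb + hbp                    # BB = BB - IBB + HBP
    bigs = ((walks + ld) - (so + ifb)) / pa   # walk-sized events, good minus bad
    smalls = (ofb - gb) / pa                  # outfield flies versus ground balls
    return 11 * bigs + 3 * smalls + constant


# Hypothetical pitcher line and constant, not real data.
example = bbfip(bb=60, ibb=5, hbp=5, so=180, ld=110, ifb=25,
                ofb=170, gb=250, pa=800, constant=4.50)
print(round(example, 2))
```

The 11 and the 3 carry the whole story: a line drive counts as much as a walk, an infield fly as much as a strikeout, and the fly ball/ground ball gap counts 3/11 as much as either.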

Line Drive is in there.  And according to my testing of bbFIP and FIP and SIERA:

1.05 bbFIP
1.05 SIERA
1.11 FIP

And Brian’s results:

Year  FIP    bbFIP  qERA   bsrERA  SIERA  Tango  wERA
0     0.743  0.840  0.898  0.924   0.883  0.908  0.694
1     1.076  1.026  1.040  1.037   1.010  1.003  1.180

tRA, as best I understand it, is similar to bbFIP in terms of parameters used.  So, I don’t think Matt is correct in what he said.  It simply doesn’t follow my testing and Brian’s testing.  My guess is that there may have been a coding issue, perhaps with the way they calibrated tRA?  I dunno.  But, it just doesn’t sound right.

I also want to point out that the “Tango” that Brian mentioned, which is simply ERA = 11*(BB-SO)/PA + constant did the best!  That is, ignore all aspects of batted ball (INCLUDING HOMERUNS!), and focus only on the 25% of the PA that result in a walk, strikeout or hit batter, and it does as well or better than anything out there.
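
A minimal sketch of that walk-and-strikeout estimator, with two assumptions flagged: BB here is taken to be BB-IBB+HBP as in bbFIP, and "aligning to the league" is taken to mean picking the constant so that the league totals come out at the league ERA. The function names and league totals are hypothetical.

```python
def kbb_era(walks, so, pa, constant):
    """ERA estimate from walks and strikeouts only: 11*(BB-SO)/PA + constant."""
    return 11 * (walks - so) / pa + constant


def align_constant(league_era, league_walks, league_so, league_pa):
    """Assumed calibration: the league's own totals should return the league ERA."""
    return league_era - 11 * (league_walks - league_so) / league_pa


# Hypothetical league totals and pitcher line, not real data.
constant = align_constant(league_era=4.30, league_walks=16000,
                          league_so=33000, league_pa=185000)
print(round(kbb_era(walks=60, so=180, pa=800, constant=constant), 2))
```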


#1    Sky      (see all posts) 2010/02/25 (Thu) @ 13:36

tRA*, please.

I know it’s tough to recreate, but it was created to address exactly the next-season issues of tRA.  With all the regression that SIERA uses, tRA* seems like a very fair yardstick.  Graham, any chance you can help these guys out and provide the equation for tRA* to someone who promises to keep it secret?


#2    Nick Steiner      (see all posts) 2010/02/25 (Thu) @ 13:37

Well the best DIPS estimator by far I imagine would be something like tRA*, which regresses each component as a function of the estimated amount of control that players have over them.  All the other ones apply 100% regression to some things and 0% regression to others, and we know that is incorrect.  So if you really wanted the best ERA estimator, you would do something like plain Linear Weights, regress each component individually, and apply baseruns, I think.
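
The "regress each component individually" step can be sketched as below. This covers only the regression step, not linear weights or BaseRuns, and the stabilizer values are hypothetical placeholders rather than anything tRA* is known to use: the idea is just that a component the pitcher controls gets a small stabilizer, and one he barely controls gets a large one.

```python
def regress_rate(events, trials, league_rate, stabilizer):
    """Shrink an observed rate toward the league rate.

    `stabilizer` is the number of league-average trials blended in;
    the larger it is, the more the result is pulled to the league rate.
    """
    return (events + stabilizer * league_rate) / (trials + stabilizer)


# Hypothetical numbers: strikeouts get little regression, line drives a lot.
k_rate = regress_rate(events=180, trials=800, league_rate=0.18, stabilizer=100)
ld_rate = regress_rate(events=110, trials=555, league_rate=0.19, stabilizer=5000)
print(round(k_rate, 3), round(ld_rate, 3))
```

With a stabilizer of 0 the observed rate is kept untouched (0% regression), and as the stabilizer grows the result approaches the league rate (100% regression), which are the two extremes the comment above objects to.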


#3    Tangotiger      (see all posts) 2010/02/25 (Thu) @ 15:39

I don’t think it’s fair to compare a stat that uses different parameters, not unless that stat purposefully set the coefficient to those parameters as 0.

FIP for example doesn’t care about batted balls because that’s not the purpose of FIP.  bbFIP has a different purpose.  SIERA has a different purpose.

It’s no mistake that FIP does better at same-year, SIERA does better at year+1, and bbFIP is in-between.

The point is not trying to declare one better, but to say WHERE one is better than the other, and WHY.

They all have their place, and our job is not to say which is better (unless one actually encompasses everything something else has), but to tell the interested reader when to use something and not use something.


#4    NLBB15      (see all posts) 2010/02/25 (Thu) @ 15:52

As a reader, one of the stats I’m looking for is a quasi true talent level. My interest in SIERA vs. FIP (and others) is which statistic best demonstrates a player’s talent level given one season of data? I imagine year+1 is a good test of this “true talent level.” (Of course things are missing like age and a multi-year sample)

Following along thus far, it seems that SIERA and xFIP are almost tied in yr+1 testing. It seems that there are questions about where tRA, tRA* and bbFIP fit in. Is this single-year true talent level a legitimate question to ask of a statistic or should I just be looking at the CHONE and Marcel systems? Was there a year+1 consensus before SIERA was born?


#5    Sky      (see all posts) 2010/02/25 (Thu) @ 16:21

Yes.

“The point is not trying to declare one better, but to say WHERE one is better than the other, and WHY.”

We all fail to make this distinction way too often.  Decide on the question you’re trying to answer, and then pick your stat/process.  There are an awful lot of similar questions being asked that don’t necessarily have similar answers.


#6    Rally      (see all posts) 2010/02/25 (Thu) @ 16:36

“So if you really wanted the best ERA estimator, you would do something like plain Linear Weights, regress each component individually, and apply baseruns, I think.”

It should also look at more than one year of data, and weight those years appropriately.
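
A small sketch of what weighting several seasons could look like. The 3/2/1 weights and the stat lines are purely illustrative, not a recommendation from this thread.

```python
def weighted_rate(seasons, weights):
    """Combine per-season (events, pa) pairs into one weighted rate.

    `seasons` is listed most recent first; `weights` has the same length.
    """
    num = sum(w * events for (events, _), w in zip(seasons, weights))
    den = sum(w * pa for (_, pa), w in zip(seasons, weights))
    return num / den


# Hypothetical strikeout totals for three seasons, most recent first.
k_seasons = [(180, 800), (150, 750), (120, 700)]
k_rate_3yr = weighted_rate(k_seasons, weights=(3, 2, 1))
print(round(k_rate_3yr, 3))
```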


#7    Matt Swartz      (see all posts) 2010/02/25 (Thu) @ 17:21

Line Drives per Batted Ball has an intra-class correlation of 0.007.  That’s not a typo.

The selected excerpts of the article make it very easy to argue with me because they ignore the point I was making, which is: Stop thinking of preventing line drives as a pitcher skill, because it isn’t.  Stop thinking of BABIP as being uncontrollable because of the defense effect and focus on the luck effect largely related to line drives.

That’s the problem with tRA, and that’s what I was saying.  It tests worse than FIP at next-year ERA (not by much after the park adjustment actually, IIRC, but the clear result was that it didn’t add anything special at all), and I doubt it was a coding issue because it was Colin that coded tRA for us, and I suspect he did it as well as anyone could have constructed it because he actually tested it and optimized it I think.  It came out very similar to Colin’s tests at THT last year as well, which also showed tRA is just FIP with noise that helps same-year ERA.  Why is a community that is based on empirical estimation willing to accept only one statistic that was not tested (tRA) and treats another one tested repeatedly (SIERA) with skepticism?  Linear weights are great at estimating hitters, but it’s fairly obvious that something is added when you incorporate the pitcher pitching to every batter in the inning instead of the hitter batting once.  There is situational pitching, there are correlations between BABIP and K%.  It’s not that FIP is useless (it measures something different than people use it for) or xFIP is useless (it measures ERA similarly from a totally different angle, validating both methods and giving people an opportunity to use both to do better; xFIP and SIERA as a simple average did a tiny bit better than either individually, as I showed in the comments section of the recent Unfiltered).

If tRA* regresses various components (we weren’t able to get the formula, I heard, so it’s still a black box to me), then it just makes it an untested projection system instead of what tRA-regular was until recently, which was a mostly untested ERA estimator that people liked because it felt nice and misinterpreted the issue of BABIP as being defense-dependent rather than equally line-drive-and-hence-luck-dependent.  Rally’s exactly right-- use more than one year of data if you’re trying to regress back to a mean.  The point is to isolate skill and looking at next-year ERA is a very good way to isolate skill.  tRA*, in fact, I think only regresses line drive per ball in air, which is only looking like it has a positive correlation because FB% has persistence, not because LD% is telling.  You could show correlations for (sock height)/(sock height + flyball rate) and declare sock height should be included and regressed because flyball rate in the denominator is generating that correlation.
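
The sock-height argument can be checked with a toy simulation. Everything here is an assumption made for illustration: the distributions, the sample size, and the seed are invented, and "noise" stands in for sock height or any quantity with no season-to-season persistence. Only the fly-ball rate carries over between seasons, yet the ratio noise/(noise + FB rate) still correlates with itself year to year.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_pitchers=2000, seed=0):
    """Return year-to-year r for pure noise and for noise/(noise + FB rate)."""
    rng = random.Random(seed)
    noise1, noise2, ratio1, ratio2 = [], [], [], []
    for _ in range(n_pitchers):
        fb_skill = rng.uniform(0.20, 0.50)        # persistent fly-ball tendency
        for noise, ratio in ((noise1, ratio1), (noise2, ratio2)):
            fb = fb_skill + rng.gauss(0, 0.02)    # same skill, small season noise
            x = rng.uniform(0.15, 0.25)           # pure noise, redrawn every season
            noise.append(x)
            ratio.append(x / (x + fb))
    return pearson(noise1, noise2), pearson(ratio1, ratio2)


r_noise, r_ratio = simulate()
print(f"noise alone, year to year:        r = {r_noise:.3f}")
print(f"noise / (noise + FB rate):        r = {r_ratio:.3f}")
```

In this setup the noise term alone shows essentially no year-to-year correlation, while the ratio shows a clearly positive one, all of it inherited from the denominator.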

I think bbFIP is interesting, but I think including line drive rate makes it not quite a skill-estimator.  That’s fine, it just answers a different question.  I haven’t played with it yet, either, so it may teach me something later on I’m not seeing yet.

I even like the “Tango” method of just looking at (K-BB), but it obviously ignores one skill-- groundball/flyball skill.  It’s interesting that ignoring a clear skill does better in testing, but I have to assume there’s enough room for improvement in general in these metrics that a well-done estimator of a subset of skills but without a skill included can catch up with a well-done estimator that tries to get at more skills with 7 years of data like we did with SIERA.  It’s going to be biased in that a groundball pitcher is going to be underrated by the (K-BB) method, and I don’t know who is underrated really badly by SIERA-- hopefully not many pitchers if we did it right-- but there’s room to improve in this whole area.

I’m going to do my best to just hit on my point very hard because I’m not sure I’ll have a chance to reply later on, but I really want to emphatically say that if you read the article, I make a case for why line drives per batted ball aren’t a skill and how we showed that, so it’s really important to keep that in mind when you consider the biases of something like tRA.

The other point that Tango agreed with before that I should really re-highlight is this.  Using xFIP and other such things developed by Markov chains obviously ignores some information (situational pitching, effects of correlated skills and combined skills, etc), while SIERA is not attuned to linear weights and can miss things because the coefficients have to be approximated with less information.  If I were running a team, you’d better believe I would use both, because they are missing different things.  Something that gets very similar results as xFIP for the majority of pitchers from a different angle is a useful tool even for the most fundamentalist linear weight believers.


#8    David Cameron      (see all posts) 2010/02/25 (Thu) @ 17:24

The entire backlash you’ve faced on this, Matt, is because of your (and BP’s history of) propaganda. 

Stop calling it better.  Stop saying it’s revolutionary.  Stop selling it, and let its value sell itself, and you’ll find the reaction much, much different.


#9    Matt Swartz      (see all posts) 2010/02/25 (Thu) @ 17:47

Dave, if it’s “propaganda” instead of logic and mathematics that are the issues for people, that means the criticism is based on something other than what it should be based on, which makes it just emotional fingerpointing.  I love that when my home-field advantage stuff came out on BP, some snarky commenter here said something to the effect of “I would have expected more out of BP” or something like that.  It highlights the issue.  It was like my third article at BP.  I’d been a regular commenter here for a long time.  Things from BP get treated differently, and that’s why I got treated with hostility.

Despite apparently being full of propaganda, we admitted we miscoded xFIP.  It’s not as straightforward as we thought, which is good, and it means there are two very good estimators out there measuring similar things from a different angle.  That is how we have treated it since confessing that mistake from our supposed propaganda machine.  It also has room for improvement too.

I think it’s revolutionary, so I’m going to call it that.  Anybody who doesn’t sell their ideas when they think they are good ought to think about the point of publishing them in the first place.  I’m going to sell the hell out of SIERA, because I think it’s a damn good idea.  You unfolded WAR at Fangraphs with just as much certainty because it was full of good ideas.  You argued like crazy in the comments for it, because people were making statements that were inaccurate.  Good for you.  There were people making baseless claims in there and you fought.  Good.  Let’s stop pretending that we should undersell good ideas.

That said, “BP’s history of propaganda” is just picking a fight that I’m not going to get into, largely because of how much of it took place before I knew what BP was.  It’s irrelevant, based on emotion, and pointless in the discussion here.  This should have been about line drive rate and pitcher’s ability to miss bats.  Leave the credit-and-discredit-by-association stuff at the door.  We made a mistake in coding xFIP, and that was bad, and we should be called on that part.  But given what we knew at the time, calling it the best at what it was trying to do seems like an assertion of what looked like fact, not a propaganda part of some elaborate plan somebody wrote back in 1998 in an article I’ve never seen.

Dave, I didn’t think either of us ever had a problem with the other, and I don’t see why you’d be mad at me.  I respect what you do, and usually agree.  I like it when you argue your case.  You’re good at it.  I don’t think that’s a flaw or propaganda.


#10    David Cameron      (see all posts) 2010/02/25 (Thu) @ 18:03

Good stuff at BP does not get treated differently.  Nate’s stuff was universally accepted, because it was awesome.  Almost everything Dan Fox published, from his EqBRR to SFR, was immediately accepted, because it was laid out well and was presented honestly, without overzealous hype. 

I don’t have a problem with you, Matt, nor am I mad at you.  I think you’re a good analyst, and you were a good hire for BP.  But when you say stuff like “we screwed up the coding because xFIP is hard to calculate”, we’re going to roll our eyes. 

If you want to come off as a salesman, you won’t get the respect of an analyst.  There’s a big difference between presenting and supporting factual cases for things and selling.  From PECOTA to TAvg or whatever it’s called, BP is in the business of selling products that are not any better than what is freely available elsewhere.

You don’t have to pull that crap.  Be a good analyst, and we will happily support everything you do.  Use your platform to try and promote your work based on marginal (or not real) differences, and you’re going to get called on it.


#11    Mike Fast      (see all posts) 2010/02/25 (Thu) @ 18:13

I liked the article, and the first comment has me thinking about something I’ve researched before from a different perspective.

“Line Drives per Batted Ball has an intra-class correlation of 0.007.  That’s not a typo.”

“The selected excerpts of the article make it very easy to argue with me because they ignore the point I was making, which is: Stop thinking of preventing line drives as a pitcher skill, because it isn’t.  Stop thinking of BABIP as being uncontrollable because of the defense effect and focus on the luck effect largely related to line drives.”

Matt, you do realize you’re not the first person to discover this, right?  David Gassko published that finding four years ago.

http://www.hardballtimes.com/main/article/another-look-at-batted-balls-and-dips/


#12    Mike Fast      (see all posts) 2010/02/25 (Thu) @ 18:15

I just realized something I wrote in #11 isn’t very clear.  My first paragraph has nothing to do with the rest of what I wrote.  Those are two separate thoughts.

That comment that has me thinking is the one by Matt W about a way to visualize the ball-bat offset.


#13    Matt Swartz      (see all posts) 2010/02/25 (Thu) @ 18:25

Dave/10:
NBC is making money off the Olympics and HBO is making money off Entourage.  I get more value out of HBO than the monthly charge, so I buy it, and I don’t consider HBO to be any more or less of a salesman than NBC.  I buy THT Annuals and DVDs of The Office from NBC.  I subscribed to BP and I subscribe to HBO.  I don’t want to buy All State Insurance, but your banner ad is telling me to.  Doing something well and making money off of it aren’t mutually exclusive.  They are heavily correlated, as they should be.  I assume you aren’t just doing All State a solid, and you shouldn’t either.

The false contrast of Fangraphs’ quality for zero cents an article and BP’s quality for three cents an article is a pointless one.  If I didn’t read every article I could find on a topic I was writing about, I wouldn’t be careful.  That’s how I think research should be done, with a proper understanding of the literature.  If I simply want to understand baseball, I get value out of both sets of metrics by knowing their strengths and weaknesses.  If I were a GM, I wouldn’t be like “well Fangraphs is free”; I’d want to read both.

If you don’t think people pay attention to the url address of the analysis they’re reading, I can’t argue with you.  I assume people are inherently biased and knowing your bias is how to fight it.

Nate’s stuff was not universally accepted at all.  I don’t know where you’re getting that.  You didn’t even accept it when you mentioned PECOTA in that very post.

Mike/11:
I hadn’t seen that article.  It should be sold and sold repeatedly, because it looks great.  Regardless, I think it’s good to highlight these things over and over again because I didn’t know what THT was in January 2006.  My point is exactly the same: this is a fact that has been published and buried and should be highlighted and re-highlighted because most analysts don’t know it.  I gave a spin today at re-highlighting this fact and giving my best guess as to why it was true; I had mentioned this fact in BP Idol last April, and mentioned it again at one point in the SIERA series.  David Gassko mentioned it in 2006, which only proves it’s true and it’s a shame it doesn’t get discussed more.


#14    Mike Fast      (see all posts) 2010/02/25 (Thu) @ 18:42

Matt, let me say that I appreciate you coming here to discuss stuff.  And that I am glad that you and Eric brought your SIERA series out from behind the paywall.  And that Eric was open about the issue with xFIP calculations once it was discovered.

I see you personally doing a lot of good stuff.  And other people at BP, too.  But that does not mean that I like Baseball Prospectus or the direction it has taken.  It’s not primarily (for me at least) about whether they charge a subscription fee or not.  I was a subscriber from 2003-April 2009.  So I have no problem paying for good content at BP.  I decided last year that it wasn’t worth paying for any more, and the ten months since then have only confirmed my decision.

I can go more into why I terminated my subscription, but I don’t get the sense that the people at BP care.  They (collectively) seem more interested in telling me why I’m wrong to have let my subscription lapse.

“Nate’s stuff was not universally accepted at all.  I don’t know where you’re getting that.  You didn’t even accept it when you mentioned PECOTA in that very post.”

PECOTA of today and PECOTA of 2003-2007 are very different beasts.  I liked PECOTA of 2003-2007.  I have no use for PECOTA of 2010.

Nobody’s work is universally accepted, but Nate’s work on PECOTA and baseball market sizes are two things that come immediately to mind as things he did very well.  Any time he wrote a Lies, Damned Lies column I know I looked forward to it.  His work on MORP has gotten criticism.  His work on the percentile forecasts for PECOTA is something that I don’t think he ever adequately justified/tested.  So it’s a mixed bag for anyone.  But if you are basing your assessment of people’s opinions of BP on what’s been said in the last couple years, it hasn’t always been that way.  BP used to be respected.

“I hadn’t seen that article.  It should be sold and sold repeatedly, because it looks great.  Regardless, I think it’s good to highlight these things over and over again because I didn’t know what THT was in January 2006.  My point is exactly the same: this is a fact that has been published and buried and should be highlighted and re-highlighted because most analysts don’t know it.”

Sure.  I agree that it bears repeating.  I liked what you wrote about it.

I think before tRA came around that David Gassko’s LIPS was considered the standard.  Now, FIP and xFIP got more press, but I believe LIPS was fairly well known.


#15    Mike Fast      (see all posts) 2010/02/25 (Thu) @ 18:54

I thought I’d find out how much stuff had been written about LIPS since David introduced it in 2006 by doing a quick Google search on “David Gassko LIPS”.  It turns out there’s quite a lot, more than I even would have guessed.

But in the process, I came across this interesting follow-up article from David that I missed when I published my survey of significant DIPS literature at THT.

http://www.beyondtheboxscore.com/2006/5/22/01353/5180


#16    tangotiger      (see all posts) 2010/02/25 (Thu) @ 19:07

“Line Drives per Batted Ball has an intra-class correlation of 0.007.  That’s not a typo.”

Agreed.  Lichtman was the first one I saw who showed that:
http://www.baseballthinkfactory.org/files/primate_studies/discussion/lichtman_2004-02-29_0/

Just about exactly 6 years ago.  He has the year-to-year r on OF line drives of .009 (and .049 for IF line drives).

***

“Why is a community that is based on empirical estimation willing to accept only one statistic that was not tested (tRA) and treats another one tested repeatedly (SIERA) with skepticism?”

I don’t know that that’s true.  tRA is not that universally used.  I know I never use it (no offense to Graham, but I’ve never taken the time to test it).

Speaking only for myself, I’ve got skepticism of all these stats, including FIP.

“I think bbFIP is interesting, but I think including line drive rate makes it not quite a skill-estimator.  That’s fine, it just answers a different question.  I haven’t played with it yet, either, so it may teach me something later on I’m not seeing yet.”

Right, all bbFIP is is a same-year estimator that only includes non-fielder-resulted outcomes.  We know the run impact of a line drive is like a walk, we know the run impact of an infield fly is like a K, and we know the gap between the ground ball and outfield fly is around a third or a quarter of the gap of the walk and K.  So, it basically is what it is. 

And we know that there is far more persistence in a walk than a line drive, but bbFIP is unconcerned with that fact, just as surely as runs created is unconcerned with the persistence of any stat.

So, agreed, it’s set up to answer a particular set of questions.

“It’s going to be biased in that a groundball pitcher is going to be underrated by the (K-BB) method”

I don’t know that that’s true.  It should be true of course, since the run value of a FB is worse than a GB.  And to the extent that it is true, how bad is it?  And, since it tests overall as well as the other metrics, then what it loses in the obvious GB v FB, it must make up somewhere else.

I don’t think that we really disagree much overall.


#17    tangotiger      (see all posts) 2010/02/25 (Thu) @ 19:14

“Things from BP get treated differently, and that’s why I got treated with hostility.”

I think it’s a fair enough point that there is BPro bias that gets carried over.  I would say that this affects maybe 5% or 10% of the posters/readers, so you have to be careful to not extrapolate that too far.  But, yeah, I’m sure it’s not r=0.

And yes, the best thing to do is argue on the merits and ignore the adjectives.


#18    David Gassko      (see all posts) 2010/02/25 (Thu) @ 19:26

Matt, though I thoroughly enjoy your work and have no bone to pick with either you or BP (in fact, I have had nothing but good experiences talking to both current and former BP staffers), I do think it rubs people the wrong way when you describe SIERA as “revolutionary.” It’s a nice stat, and I know you’re proud of the work you did, but it is not going to revolutionize baseball analysis. DIPS was revolutionary. So was linear weights or range factor or UZR.

Even if you believe that SIERA is a bit better than some alternative stats such as xFIP or LIPS (personally, I don’t, but I think it’s fine if you do—it certainly isn’t worse), to call it “revolutionary” sounds pretty self-aggrandizing. That’s not the impression I get of you from any of your other writing so I’m assuming you’re willfully exaggerating as we all do sometimes, and as someone who has done the same from time to time, I see no reason to hold that against you. However, if you want to know where the hostility is coming from—whether or not it’s fair—I think that’s where you need to look.

Just my two cents.


#19    Matt Swartz      (see all posts) 2010/02/25 (Thu) @ 19:43

David, thanks.  I’m glad you like the stat and think it stands with xFIP and LIPS.  Perhaps revolutionary isn’t the right word.  In all fairness, we didn’t use it in the series, and I think my first time using it was when Dave Cameron just told me to stop calling it revolutionary.  That was probably a bit of (unnecessary) counterpoint thrown in on my part to demonstrate that people should sell what they’re publishing, so I said that it is when he said it wasn’t.  I do think it’s really important to have a DIPS stat that works backwards without quite assuming independence of events.  I think interactions, situational pitching, etc., and all these things are very important.  I don’t expect them to be revolutionary in a literal sense, but none of these things started wars for independence so it’s obviously exaggerating.  Fair enough.  I won’t call it revolutionary, but I still think it’s cool and useful, and should be debated on merits.


#20    David Gassko      (see all posts) 2010/02/25 (Thu) @ 20:04

By the way, I agree that by far the most interesting thing about SIERA is the significance of the interaction terms. What I would love to see you and Eric do is dig into the Retrosheet data and see why they are significant—now that would be great research.

And by the way, I hate to link to my own articles, but I did write a piece on DIPS a few years ago that went into a lot of detail about the relationship between BABIP and fielding independent skills that a lot of the regulars here probably haven’t seen but would be interested in:

http://www.hardballtimes.com/main/article/uncovering-dips/


#21    NLBB15      (see all posts) 2010/02/25 (Thu) @ 20:12

I’m glad to see LIPS enter the debate. Brian, any way to add this to your test? I appreciate and have confidence in all your extra testing. Maybe there is an existing LIPS yr+1 test online?

And where can we go to look up a pitcher’s LIPS? I love the introductory article but can’t seem to find 2009 LIPS. I think I once tried to find it by paying for THT’s battedball report but even that didn’t work.

I know LIPS has been around for a few years but I don’t see it mentioned often. Perhaps it will perform poorly in the yr+1 test which would explain part of it. What do the big boys think about LIPS?


#22    Colin Wyers      (see all posts) 2010/02/25 (Thu) @ 20:43

“tRA*, please.”

“I know it’s tough to recreate, but it was created to address exactly the next-season issues of tRA.  With all the regression that SIERA uses, tRA* seems like a very fair yardstick.  Graham, any chance you can help these guys out and provide the equation for tRA* to someone who promises to keep it secret?”

You’re conflating two different uses of the word “regression” - while SIERA uses an OLS regression to figure its terms, it does not regress the components.

And again - I simply don’t know how to compute tRA*. It’s entirely proprietary to StatCorner. Nobody has any idea how well it predicts next year’s ERA/RA. (Nobody had any idea how “reliable” tRA was until I did my testing at THT, either.) It really is on Graham to either do his own testing or open up his methodology so that others can test it.

And again, my implementation of tRA is publicly available for anyone with a Retrosheet database on MySQL to use:

http://basql.wikidot.com/tra

So if anyone has a way to improve that implementation and better tRA’s performance in these tests, please let me know.


#23    Tangotiger      (see all posts) 2010/02/25 (Thu) @ 21:04

“By the way, I agree that by far the most interesting thing about SIERA is the significance of the interaction terms.”

Right, it’s the little things like that that are the fun things, much like Matthew’s recent work at Fangraphs in looking for specific relationships.

When you put a lot of stuff in the equation, it’s hard to see what it’s doing.

For example, look at batted ball FIP: it sets out clearly how much run impact each event has, relative to the walk and K.  That is its #1 selling point.  #2 is that it’s linear.  So, I think that becomes the starting point, much like The Marcels.  If you want to uncover some additional truth, bbFIP and regular FIP become sort of the basic monkey-level metrics.

How do tRA and SIERA and LIPS and the others compare to it, where do they get the extra advantage, and at what cost?

Even in things like BaseRuns v LWTS: it’s clear BaseRuns is what we want for pitchers, but I’ll still often use LWTS for pitchers because it’s so gosh-darn easier to calculate.  And I’m the biggest champion of both.

So, that’s what I’d like to see: where are the little gains being made.


#24    First Time Poster      (see all posts) 2010/02/25 (Thu) @ 21:32

re: Matt #7

“tRA*, in fact, I think only regresses line drive per ball in air, which is only looking like it has a positive correlation because FB% has persistence, not because LD% is telling.  You could show correlations for (sock height)/(sock height + flyball rate) and declare sock height should be included and regressed because flyball rate in the denominator is generating that correlation.”

I’m not very good at math, so I’m not going to attack your oh-so-cheeky claims about socks, denominators, and regression, but is it really that hard to know what tRA* regresses? Sure, it’s a black box to you in how it actually calculates, and I understand that. But seriously, you get PAID to write about sabermetrics. The tRA primer ON STATCORNER says exactly which values are regressed in tRA*:

“K%, BB%, HBP%, GB per ball in play%, IFF per ball in air%, LD per ball in air%, and HR per FB%”

While I know you’re bursting to defend SIERA at any opportunity possible, maybe next time hold back on openly criticizing and dismissing a statistic’s validity when you don’t even know what it incorporates.


#25    Nick Steiner      (see all posts) 2010/02/25 (Thu) @ 22:16

Matt/7

This is untrue:

“If tRA* regresses various components (we weren’t able to get the formula, I heard, so it’s still a black box to me), then it just makes it an untested projection system”

In, say, FIP or SIERA, why do we give pitchers zero credit for what happens after the ball is put in play?  Because it’s been accepted and shown that pitchers have very little control over the outcomes of batted balls.  In fact, as you and others have shown, they even have little control over some batted ball types.  So in essence, in DIPS formulas, we are regressing value on BIP 100% to the league mean.  And we are also giving pitchers full credit (0% regression) for their strikeouts, walks, groundballs, etc.

However, pitchers clearly don’t have 100% control over their strikeouts and walks, nor 0% control over their Linear Weights on balls in play.

In my opinion, the most logical DIPS estimator (which would really be more of a true LIPS actually), would be one that gives pitchers the proper amount of credit for each outcome, based on how much estimated control they have over them.

That’s, in effect, what tRA* does.  And that doesn’t make it a projection system, it makes it a better run estimator, and I have no doubt that if you tested it against Year N + 1 ERA, it would be the best out there. 

Now it wouldn’t be that hard to replicate.  Graham has told me that he uses regression values similar to the ones Pizza Cutter had, and so I’m sure that someone could whip up a bastardized tRA* without too much trouble.


#26    Matt Swartz      (see all posts) 2010/02/25 (Thu) @ 22:40

Nick/25:
No, SIERA does NOT assume pitchers don’t have control over outcomes on balls in play. It’s a regression.  Look it up or read the articles.  SIERA is agnostic about this, allowing them to be correlated with K, BB, and GB rates and allowing those coefficients to pick up those skills’ effects in a regression, which they should do and do do.  I think I’ve hammered that point pretty hard.  That is a huge feature of SIERA.  It sounds like a feature of LIPS too.  Good.

I have no interest in what you have “no doubt is the best N+1 ERA estimator” if you didn’t test it.  I’m waiting for Tango to drop his standard “state of opinion as fact without evidence” comment and call you out on that, but I’ll do it first.  I don’t think regressing LD/BIA is useful, and I think it’s clear why.

First Time Poster/24:
What’s your criticism exactly?  That I guessed it regressed LD/BIA and was right?  Should I memorize every sabermetric article ever before having a discussion about any of them?

________

I’m getting pretty damn sick of this hater shit.  I try to participate here, because I like talking about baseball with smart people.  I’m not sure it’s possible to have real conversations about it here, without being treated differently.  And if you don’t think I’m being treated differently by at least some people here, you have your head up your ass-- read the comments on my old StatSpeak articles and read the comments on my BP articles.  Then tell me how I got so stupid about sabermetrics so fast in between last spring and summer?


#27    Tangotiger      (see all posts) 2010/02/25 (Thu) @ 22:48

And if you don’t think I’m being treated differently by at least some people here, you have your head up your ass

I think my post 17 says as much, though I obviously don’t see it to the extent that Matt does.  Regardless, Matt does see it, which is the perspective that matters.

Like I said, somewhere (I didn’t realize how many BPro articles I linked to this week), it’d be a shame if we lose any of the regular posters because of what may be construed as personal animosity.

The more the focus is on the merits, the better.


#28    First Time Poster      (see all posts) 2010/02/25 (Thu) @ 22:56

My criticism is that between your responses on BP and here, you’ve on *numerous* occasions dismissed tRA* out of hand, on each occasion mentioning that the only difference between tRA* and tRA is that it ONLY regresses LD rate. In reality, tRA* regresses SEVEN different things.

Memorize every sabermetric article? Seriously? Don’t be hyperbolic. I *do* think that you should memorize the components that go into sabermetric statistics before talking about them (and ESPECIALLY before DISMISSING them as valid competition to SIERA). Saying tRA* is just tRA with normalized LD rates is akin to discussing xFIP as “something that measures walk and strikeout ratios.” (Although maybe that’s too sore a subject at the present...)

And if it’s “hater shit” to question a glaring gap in your comprehension of a statistic, when you author for a site that I *pay* for the privilege of reading, well, then, I guess I’m just a hater.


#29    Mike      (see all posts) 2010/02/25 (Thu) @ 23:08

Agreed.  Matt, for what it’s worth, I think you have a point about the feelings here towards BP, but it’d be a shame to lose you here.  For me, I’d just like to get to the bottom of all this, and compare SIERA to tRA* and the others that have yet to make it into the RMSE comparisons.

One thing to keep in mind when you say that “any DIPS statistic that relies on line-drive rate will unravel the following season if it tries to predict ERA.” A regression cannot have GB%, LD%, and FB% in the same equation due to issues of extreme multicollinearity.  As GB% increases, one of the other two can only decrease, and if all three are included in the regression, this will render the coefficients completely meaningless.

According to First Time Poster/24, tRA includes GB%, as well as IFF per ball in air and LD per ball in air.  Therefore, it is essentially giving us GB% and LD%, leaving out FB%.  For illustration, let’s simplify and say:

100% = GB% + LD% + FB%
100% - GB% = LD% + FB%
Now let’s just call “x” = LD% + FB%.
This means LD% just equals “x - FB%”.

So in reality, any reference to LD% in an equation is just “x - FB%”.  The end result is that the inclusion of LD% in an equation does not in fact doom it as a DIPS estimator.  In fact, leaving out LD% and including all other possible balls in play will actually just produce the same basic equation.  It’s all the same.
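The identity above can be illustrated with a small sketch. The coefficients and batted-ball shares below are made up for illustration and are not fitted to any real data. The only point is that when GB%, LD%, and FB% all sit in one equation alongside a constant, very different sets of weights give identical predictions, so the individual weights carry no meaning.

```python
# Illustrative sketch: when GB% + LD% + FB% = 100%, a linear equation that
# includes all three shares plus a constant has no unique coefficients.
# All numbers below are hypothetical, not fitted to any real data.

def predict(intercept, b_gb, b_ld, b_fb, gb, ld, fb):
    """Linear prediction from batted-ball shares (shares given as fractions)."""
    return intercept + b_gb * gb + b_ld * ld + b_fb * fb

# Hypothetical pitchers: (GB share, LD share, FB share), each row sums to 1.
pitchers = [(0.45, 0.19, 0.36), (0.52, 0.18, 0.30), (0.38, 0.21, 0.41)]

# Coefficient set A (made up): intercept, GB weight, LD weight, FB weight.
coef_a = (4.0, -2.0, 5.0, 1.0)

# Coefficient set B: add c to every slope and subtract c from the intercept.
c = 10.0
coef_b = (coef_a[0] - c, coef_a[1] + c, coef_a[2] + c, coef_a[3] + c)

preds_a = [predict(*coef_a, *p) for p in pitchers]
preds_b = [predict(*coef_b, *p) for p in pitchers]

for a, b in zip(preds_a, preds_b):
    print(f"{a:.3f}  {b:.3f}")  # the two columns agree for every pitcher
```

Dropping one of the three shares (or combining them into a single term) removes the ambiguity, which is why it does not matter much which one is left out.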


#30    Nick Steiner      (see all posts) 2010/02/25 (Thu) @ 23:09

The SIERA formula is this:

SIERA = 6.262 – 18.055*(SO/PA) + 11.292*(BB/PA) – 1.721*((GB-FB-PU)/PA) +10.169*((SO/PA)^2) – 7.069*(((GB-FB-PU)/PA)^2) + 9.561*(SO/PA)*((GB-FB-PU)/PA) – 4.027*(BB/PA)*((GB-FB-PU)/PA)
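For anyone who wants to plug numbers in, here is a minimal sketch of that formula as quoted, written as a Python function. The function name and the sample season line are hypothetical; the coefficients are copied from the line above and nothing else is assumed about how SIERA is officially computed.

```python
def siera_as_quoted(so, bb, gb, fb, pu, pa):
    """Evaluate the SIERA formula exactly as quoted in this comment.

    so, bb, gb, fb, pu are season counts (strikeouts, walks, ground balls,
    fly balls, pop-ups); pa is plate appearances (batters faced).
    """
    k = so / pa              # SO/PA
    w = bb / pa              # BB/PA
    g = (gb - fb - pu) / pa  # (GB-FB-PU)/PA
    return (6.262
            - 18.055 * k
            + 11.292 * w
            - 1.721 * g
            + 10.169 * k ** 2
            - 7.069 * g ** 2
            + 9.561 * k * g
            - 4.027 * w * g)

# Hypothetical season line, invented for illustration only.
example = siera_as_quoted(so=160, bb=64, gb=240, fb=170, pu=30, pa=800)
print(round(example, 2))
```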

When a pitcher gets a strikeout, you are giving him full credit for that strikeout in the formula.  When a pitcher gets a GB or FB or PU, you are giving him full credit for that in the formula.  Ditto for walks.

I am saying that pitchers DON’T have 100% control over how many strikeouts, walks, GB, FB and PU they allow.  And they DON’T have 0% control over the outcomes of their batted balls or their LD%.  In reality, pitchers have more control over some stats than others, but it’s not the 100% - 0% ratio that pretty much all DIPS estimators, including SIERA, assume.  Do you not agree with that?

And because we KNOW that pitchers indeed have gradient degrees of control over their outcomes, a metric that satisfies that will be better (and more predictive) than one that doesn’t.

I don’t have proof for this statement, but I will take MGL’s stance and bet whatever the standard internet fare is that I am correct.  (psst. Graham, this would be a nice time to back me up!)

I hope you weren’t referring to me when you mentioned “hater shit”.  My post was purely about DIPS and the topic of Tango’s post and had nothing to do with you being at BPro.  Neither is this one.


#31    Mike      (see all posts) 2010/02/25 (Thu) @ 23:10

"Agreed” meant with Tango/27, btw.  The best part of this site is the respectful dialogue, and I would hate to see that break down.


#32    Matt Swartz      (see all posts) 2010/02/25 (Thu) @ 23:18

Mike/29:

The issue you’re discussing is why Tango suggested (GB-FB-PU)/PA as our groundball number months ago, and it’s why we used it.  Otherwise, the regression would incorporate line drive rate indirectly as you say.

Nick/30:

No, this is why regression is a useful tool.  If you look at the average effect of strikeouts in that formula (which is the one before we fixed our park factors btw-- check SIERA article #5 for the admission and correction there), it’s higher than linear weights would estimate.  That’s because BABIP is negatively correlated with K%, so the least squares estimation method saw pitchers with 1% higher K% also achieve even lower ERAs than those outs would indicate because of the correlated fractional BABIP drop, and gave the pitchers some credit for both strikeouts and BIP outs.  Read the articles that David Gassko’s linking, which I’ve been enjoying here.  Most of pitcher BABIP skill is correlated with these things, so they pick it up really well even with noisy coefficients due to limited data (2003-09).  I talked about this feature of SIERA in the article this thread is about.
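The mechanism described here (a coefficient on K% picking up a correlated BABIP effect) can be shown with a toy example. Every number below is invented: the assumed link between K% and BABIP, the assumed "direct" run values, and the sample of strikeout rates. It is a sketch of the general least-squares behavior, not of how SIERA itself was fitted.

```python
# Toy illustration with invented numbers: a regression of ERA on K% alone
# absorbs whatever part of BABIP moves together with K%.

def ols_slope(xs, ys):
    """Slope of the simple least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical strikeout rates (SO/PA) for six pitchers.
k_rates = [0.12, 0.15, 0.18, 0.21, 0.24, 0.27]

# Assumption: BABIP drops slightly as K% rises.
babip = [0.320 - 0.10 * k for k in k_rates]

# Assumption: "direct" effect of K% on ERA is -15, effect of BABIP is +10.
era = [3.0 - 15.0 * k + 10.0 * b for k, b in zip(k_rates, babip)]

slope = ols_slope(k_rates, era)
# Close to -16: the direct -15 plus an extra -1 that arrives via BABIP.
print(round(slope, 3))
```

The fitted slope is steeper than the assumed direct effect, which is the sense in which the strikeout coefficient gives the pitcher credit for both the strikeouts and the correlated outs on balls in play.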


#33    Nick Steiner      (see all posts) 2010/02/25 (Thu) @ 23:26

Matt thank you for the reply.  I am about to go have dinner, but I’ll read through your response in more detail when I get back.


#34    Jeremy      (see all posts) 2010/02/26 (Fri) @ 01:17

I accept tRA*, even though it has yet to be tested, because I feel the concept behind it is sound. Out of any publicly available metric that tries to measure a pitcher’s skill, tRA* makes the most intuitive sense to me. Someone can fine-tune a run estimator to beat out BaseRuns, but no one’s beaten the theory behind it. In my opinion, the framework for tRA* is a step above whatever else is out there.

And my opinion of Matt hasn’t changed since he moved to BP. I just guess I got a different impression than David Gassko from reading Matt’s other writing.


#35    Graham      (see all posts) 2010/02/26 (Fri) @ 01:26

Matt:
Your opinion is that of a statistician, mine is that of an engineer. So be it - I really don’t care what you think of my work, and I’m sure you don’t care that I hold you and your entire field of study in something very much approaching utter contempt.

To each their own.


#36    Graham      (see all posts) 2010/02/26 (Fri) @ 01:34

As for testing: I have done my own testing. tRA* beat xFIP by about the same margin (a little more, actually) that tRA normally beats FIP by. However, I don’t publicise tests that I conduct because

a) there is a very good chance that I screwed something up, and
b) nobody in their right mind should ever trust non-independent test results.

If someone wants to re-derive tRA*, find the correlation on the required stats year to year and build it. Just because I don’t explicitly state the formula doesn’t mean it’s a black box, and I’m confident that someone can beat tRA* just by doing a more detailed study on the correlation values.


#37    Sky      (see all posts) 2010/02/26 (Fri) @ 12:21

So I don’t have the chops to actually test tRA*, but it doesn’t seem too hard to come up with a decent approximation of what it’s doing in order to serve as a proxy.  (Would that be tRA**?)

Take the 8 outcomes that tRA uses and regress them according to results that I think others (Pizza?) have already figured out. (SO%, BB%, HBP%, GB/BIP, IFFB/BIA, LD/BIA, FB/BIA, HR/FB).  Heck, change LD/BIA to simply LD/BIP if it makes it easier.  It’s not Graham’s tRA* at this point anyway.
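A rough sketch of that component-by-component regression might look like the following. To be clear, the league rates and the "half-point" sample sizes below are placeholders invented for illustration; they are not Pizza Cutter's results, not Graham's tRA* values, and only four of the eight components are shown. Each rate is pulled toward the league rate by a weight that depends on how many opportunities the pitcher had.

```python
# Hypothetical sketch of a "tRA**"-style step: regress each component rate
# toward the league rate. ALL constants below are placeholders, not
# published values.

LEAGUE_RATE = {"SO/PA": 0.18, "BB/PA": 0.09, "GB/BIP": 0.44, "HR/FB": 0.11}

# Opportunities at which a rate is regressed halfway (placeholder values).
HALF_POINT = {"SO/PA": 150, "BB/PA": 500, "GB/BIP": 200, "HR/FB": 1500}

def regress_components(observed, opportunities,
                       league=LEAGUE_RATE, half_point=HALF_POINT):
    """Return regressed rates: league + n / (n + half_point) * (obs - league)."""
    regressed = {}
    for stat, rate in observed.items():
        n = opportunities[stat]
        weight = n / (n + half_point[stat])  # share of credit kept
        regressed[stat] = league[stat] + weight * (rate - league[stat])
    return regressed

# Invented pitcher season: observed rates and the denominators behind them.
observed = {"SO/PA": 0.24, "BB/PA": 0.07, "GB/BIP": 0.50, "HR/FB": 0.07}
opportunities = {"SO/PA": 800, "BB/PA": 800, "GB/BIP": 560, "HR/FB": 190}

for stat, value in regress_components(observed, opportunities).items():
    print(f"{stat}: {value:.3f}")
```

The regressed rates would then be turned into event frequencies and fed to a run estimator; that step is left out here.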

Not saying anyone is required to do this, but I think it would be interesting.  And I think we’re all into “interesting”.

Also interesting would be seeing how Tango’s bbFIP-style metrics stack up and some form of xFIP/FIP where regression of HR/FB% is optimized between 0% and 100%.

Those are articles I’d be interested in reading anywhere.

So, starting point, can someone post Pizza’s famous regression/intraclass correlation/whatever results from statspeak?


#38    FreeZo      (see all posts) 2010/02/26 (Fri) @ 13:07

As a neutral non-BP subscriber, I see a clear difference between how BP Matt is treated by some and how he was treated previously. Even if most of it is aimed at his employer, it still makes for a caustic environment. 80% of the discourse is healthy, but the other 20% can create the feeling of a me-against-the-world burden. The reality is that Matt’s continued participation is a community blessing.

There seems to be little professional incentive for him to continue, when hiding behind smoke and mirrors would possibly better protect the BP mystique. I doubt his Book blog participation will raise BP subscriber growth but it does speak to his personal dedication to the growth of the field.

This isn’t commentary about SIERA’s place among the spectrum of stats. That has been and should be challenged, but on the merits of the work alone. Don’t let some silly business marketing gimmickry create a toxic enough environment to chase Matt back to his safe ground. We all lose if that happens.

Now for a simple request for a simple man:

Could someone point me to a demonstration of estimating the platoon skill of a switch-hitter?


#39    Nick Steiner      (see all posts) 2010/02/26 (Fri) @ 13:20

Matt - you are still missing the point of my concerns.  While the SIERA formula will account for BABIP expectation as a function of strikeouts, it is still giving the pitcher full credit for each of the terms in the formula.

I am proposing that the pitcher should not get full credit for each of the terms in the formula, but rather a range of credit depending on the volatility of the stat as a function of sample size.

Also, if I plug in two otherwise identical pitchers, but one guy has a .400 BABIP (or LW on batted balls) and the other guy has a .200, they will both be valued equally by the SIERA formula (as well as FIP, tRA, etc.).  However, the guy who allowed the .400 BABIP likely pitched worse than the guy who allowed the .200 BABIP.  Even if the difference is due to LD rate, there is still *some* skill in a pitcher’s line drive rate, just like there is some luck in a pitcher’s K-Rate.

Hence, regression (to the mean).


#40    berselius      (see all posts) 2010/02/26 (Fri) @ 13:22

FreeZo,

They wrote about this in The Book. I wrote a spreadsheet that will calculate them for you based on that work while looking into this for the 2010 Cubs. You can find it at http://www.anothercubsblog.net/projections/nl-central/2010-cubs-split-projections.html


#41    jinaz      (see all posts) 2010/02/26 (Fri) @ 13:23

Sky,

Here’s Pizza’s hitter article:
http://web.archive.org/web/20080102094412/http://mvn.com/mlb-stats/2007/11/14/525600-minutes-how-do-you-measure-a-player-in-a-year/
Assigned it for my class last week. 

Don’t have the pitcher one, though, and that’s probably what we need.  Easiest way is to find someone who linked to it so we can get the old URL--that makes it easy to find on archive.org.  But I haven’t had any luck with that.
-j


#42    FreeZo      (see all posts) 2010/02/26 (Fri) @ 13:40

Berselius
Thanks for posting. I read The Book and have no issues with RHB or LHB but am still a bit confused with how to do it for a SH. The calculator gets me the final answer, but without the understanding of the work. Obviously I am shamefully off topic, would you be willing to e-mail me at ? Thanks.


#43    jinaz      (see all posts) 2010/02/26 (Fri) @ 13:54

Found it!
http://web.archive.org/web/20080112135748/mvn.com/mlb-stats/2008/01/06/on-the-reliability-of-pitching-stats/

Now someone go make tRA* for me, k? :)
-j


#44    Sky      (see all posts) 2010/02/26 (Fri) @ 14:19

Here’s some relevant methodology and results from Tango that takes Pizza’s numbers and tells how many PAs are needed for 50% regression: 

http://www.insidethebook.com/ee/index.php/site/comments/reliability_of_statistics/#30


#45    Mike Fast      (see all posts) 2010/02/26 (Fri) @ 14:32

I love this place.  Sabermetric utopia, indeed!  I’m almost tempted to go gin up tRA* myself, even though I don’t begin to have the time. :)


#46    Tangotiger      (see all posts) 2010/02/26 (Fri) @ 14:33

Switch hitters, p.167:

Based on our estimate of how much to regress, a player with around 600 appearances against left-handed pitchers should regress about halfway toward the league-average.

So, if you have a switch hitter with 1200 PA against LHP and 1800 against RHP and the league split is 0, and this player is showing +30, then you regress 600/(600+1200), or 33%, meaning he’s a true +20.
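The arithmetic in that example can be written as a short function. This is only a sketch of the calculation as described in the comment: the 600 PA half-point is the figure quoted from p.167, and the function and parameter names are made up here.

```python
def regress_to_mean(observed, league_mean, pa, pa_for_half):
    """Regress an observed value toward the league mean.

    pa_for_half is the number of PA at which we regress halfway
    (600 vs. LHP for switch hitters, per the passage quoted above).
    """
    regression = pa_for_half / (pa_for_half + pa)  # share pulled to the mean
    return league_mean + (1.0 - regression) * (observed - league_mean)

# The example from the comment: 1200 PA vs. LHP, observed split +30,
# league split 0, so we regress 600 / (600 + 1200) = 33%.
estimate = regress_to_mean(observed=30.0, league_mean=0.0,
                           pa=1200, pa_for_half=600)
print(round(estimate, 1))  # 20.0
```

With exactly 600 PA against lefties the same function regresses halfway, giving +15 for an observed +30.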


#47    tangotiger      (see all posts) 2010/02/26 (Fri) @ 14:55

Part of this reminds me of the reaching base on error.  The run value of that event is slightly more than the run value of a single, something like .49 to .47.

BUUUUUUT, if you were to create a LWTS equation that best-fits to year+1, guess what?  The coefficient for the single might be .41 and for the RBOE might be .08 or something like that.

This line drive discussion is very similar.

Indeed, to go back to the other discussion, imagine that we want to create a LWTS equation that best-fits to next year.  You are going to get much different weights than we are used to, AND you would also include strikeouts somehow.

So, be very careful what you are trying to do, because you might get something that is not very apparent to be correct.


#48    Sky      (see all posts) 2010/02/26 (Fri) @ 15:05

Also, I assume the idea is to calculate event frequencies based on regressed rates and then plug into BaseRuns?  Am I missing another way to do this?

Also, one should be using league-average rates for outs for each event, right?  Like shown here in the first table?

http://statcorner.com/tRAabout.html


#49    Nick Steiner      (see all posts) 2010/02/26 (Fri) @ 15:10

Ooh, I think it would also be cool to, instead of regressing a pitcher’s components to the league mean, regress them to a subset mean selected by Pitch f/x similarity scores.

Josh Kalk did some work on those a while back. Steve Sommer as well.

http://stlsportsscene.wordpress.com/2009/10/25/similarity-scores/

Although this might be a bit out of the scope of a basic tRA** metric.


#50    Guy      (see all posts) 2010/02/26 (Fri) @ 18:08

Two general observations:

1) I’m not convinced that separately regressing 7 or 8 components necessarily has to be the best approach to predicting next-year ERA.  The assumption being made there is that all the components are independent.  But I’m not sure we know that to be true.  For one thing, we have concerns that the coding of BIP may sometimes be influenced by the outcome of the play, and to the extent that happens a high LD% might be somewhat associated with a lower BA on OFs and/or GBs.  A few extra HRs allowed may tend to mean 1 or 2 less 2Bs or 3Bs.  Perhaps pitchers who get a few extra Ks in a given season are “converting” PAs by weak hitters that would otherwise often be BIP outs.  These could all be fairly weak associations, but in aggregate still mean there is almost no extra predictive power gained from regressing all these components because there is some tendency for strengths in one dimension to be offset by weakness elsewhere.  And the power of Tango’s simple K-BB metric provides some evidence this is true.

2) I’m not entirely sure what these metrics are supposed to be measuring.  I understand what projections are and why we use them, but it makes no sense to limit a projection to one year of data, so these must not be projections.  And I see some value in trying to assess how well a pitcher actually performed in year zero, separate from the contribution of his fielders.  But if we’re going to strip out things like LD% because they contain a lot of luck (which is true), then we aren’t measuring the pitcher’s full performance.  So I guess the idea is to assess what portion of this year’s performance can be attributed to the pitcher’s own, repeatable skill set (while pretending we don’t know a lot of things about this pitcher that we in fact do know).  Is that right?  And if so, why is that an interesting question?  This seems like a lot of effort in pursuit of something of little value. But maybe I’m missing something....


#51    tangotiger      (see all posts) 2010/02/26 (Fri) @ 18:35

Guy said:

And the power of Tango’s simple K-BB metric provides some evidence this is true.

If I remember right, and I think I’m 99% sure I’m right, it was you, as GuyM, at Fanhome, that brought up that K-BB per PA was better than K/BB.  All I did was run a regression to confirm it.

I’ve been calling it kwERA (for K, walk) for lack of a better term.


#52          (see all posts) 2010/02/26 (Fri) @ 22:15

I must have missed it along the way, but what is Tango’s K-BB Metric (or kwERA)?


#53    Tangotiger      (see all posts) 2010/02/26 (Fri) @ 22:20

Bobby, the last paragraph in the intro to this thread:
ERA = 11*(BB-SO)/PA + constant
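As a quick sketch of how that equation is applied: the constant is not stated in this comment, so the function below takes it as an argument, and the value used in the example (5.0) is purely a placeholder, as is the sample stat line.

```python
def kw_era(bb, so, pa, constant):
    """kwERA as given above: ERA = 11*(BB-SO)/PA + constant.

    The constant is supplied by the caller; the comment does not state it.
    """
    return 11.0 * (bb - so) / pa + constant

# Invented stat line and a placeholder constant, for illustration only.
example_kwera = kw_era(bb=60, so=180, pa=800, constant=5.0)
print(round(example_kwera, 2))  # 3.35
```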


#54          (see all posts) 2010/02/27 (Sat) @ 00:08

Okay, thanks.  I’ve been caught up so much in reading the comments that I didn’t recall that from the intro.


#55    Tangotiger      (see all posts) 2010/02/27 (Sat) @ 00:11

And that little equation is either a bit better, just as good, or a bit worse, than SIERA, xFIP or anything else out there.

All to say that everything else that gets added to it has the slightest of all effects.  That little equation goes further than even DIPS ever did.


#56    Nick Steiner      (see all posts) 2010/02/27 (Sat) @ 00:59

What I want in my DIPS is a measure of the underlying skill in a pitcher’s performance.  Since each at bat outcome that a pitcher induces has some luck and some skill involved in it, they should all be regressed to some mean accordingly.  Either that or Pitch f/x ERA.  The latter is probably impossible at this point, and the former is very easy to do while satisfying my DIPS needs.  Like I said, a hybrid between the two would be best, regressing each component to an individual mean based off of Pitch f/x similarity scores.


#57    Guy      (see all posts) 2010/02/27 (Sat) @ 10:57

"Since each at bat outcome that a pitcher induces has some luck and some skill involved in it, they should all be regressed to some mean accordingly.”

This is true as a generalization, but it doesn’t follow that regressing will tell us the “underlying skill” for an individual pitcher’s performance.  Josh Beckett had an 8.4 K/9 rate.  Should we regress that 25% (or whatever) toward the league mean and say his true skill in 2010 was 7.8 K/9?  Not in this case, because his career rate is 8.5.

Of what value is this 7.8 estimate?  It doesn’t tell us how good Josh Beckett really is (or was).  It doesn’t tell us how well he performed or contributed to his team in 2010.  It’s just the best guess we could make of Beckett’s talent IF he had never pitched in professional baseball prior to 2010.  And why do we want that number?  What would we ever use it for?


#58    David Gassko      (see all posts) 2010/02/27 (Sat) @ 12:53

I agree with Guy that it’s very difficult to define exactly where the value lies in a DIPS or LIPS type statistic, though I do believe that they in fact have value. One way to define that value is to say that we want to know how a pitcher performed in a given time period (i.e. a season), and that the best way to answer that question is to remove all the obvious sources of luck. In other words, if we agree that pitchers should have no (or little enough to ignore) responsibility over categories such as LD% or HR/OF, we might do a better job assessing the quality of a pitcher’s performance by ignoring those categories. That is, we only want to focus on the things we believe a pitcher does have the primary responsibility for in evaluating the quality of his performance.


#59    NLBB15      (see all posts) 2010/02/27 (Sat) @ 13:34

Is LIPS located anywhere online? Can someone link me to a site with a pitcher’s 2009 LIPS? I’ve seen the initial articles with the methodology but is the statistic even publicly available?


#60    Guy      (see all posts) 2010/02/27 (Sat) @ 16:03

David:
OK, so we’re talking performance/value—what did these pitchers really DO?  In that case, I can see some argument (but only some) for regressing BABIP, a la DIPS.  Because a good or bad number there is arguably not the pitcher’s luck alone, but belongs just as much to the fielders.  Why should Chris Carpenter get 100% of the credit for his .274 BABIP, as opposed to sharing some of it with StL fielders?  But your LD% and HR/FB examples seem like poor choices for regression, because no one on the team but the pitcher was involved with those.

The argument, I guess, is that we know these results included a lot of luck. But why do we want to deny only pitchers the credit/blame for their luck?  Surely Hanley Ramirez was also lucky to hit .342, but no one suggests he “really” only hit .322 last year.  I guess that’s because there’s no possible way his teammates deserve any of the credit (unlike a pitcher’s BABIP).  Nonetheless, he got lucky.  This idea is one of the unfortunate byproducts of DIPS, I think, in which concepts that are (mostly) correct with regard to true talent morph into performance metrics.

And in any case, you can’t answer the question you pose with just one year of data. Or rather, you can’t answer it well enough to make this worth doing.  To know how much a pitcher was “responsible” for the offensive outcomes on his watch, we have to know how talented he really is.  Which means you’re really back to doing a projection, but with the odd self-imposed handicap of using just one year of data.  To return to my Beckett example, we need to know his true strikeout ability but regressing him to the league mean certainly doesn’t tell us what Beckett ‘really’ delivered.  And even BABIP will be regressed far too much (Carpenter really is above average on BABIP).


#61    David Gassko      (see all posts) 2010/02/27 (Sat) @ 16:36

Guy,

I’m not arguing for estimating true talent (and by the way, I’m not a fan of tRA* for that exact reason). If you want to estimate true talent, do a projection—I agree. I’m arguing for estimating “true performance” or something along those lines. You say that a pitcher should share credit with his fielders for his BABIP, but I would argue that in the same sense, he should “share credit” with (and in fact, almost all credit is due to) the batters he faced for his LD% and his HR/OF. If we can agree that a pitcher has no control over those categories (or that the degree of control is small enough to ignore for our purposes), then the “credit” for those categories should go to someone other than the pitcher. It’s nice to have a statistic that does just that.


#62    Nick Steiner      (see all posts) 2010/02/28 (Sun) @ 04:31

This is key, from David/61:

You say that a pitcher should share credit with his fielders for his BABIP, but I would argue that in the same sense, he should “share credit” with (and in fact, almost all credit is due to) the batters he faced for his LD% and his HR/OF.

A pitcher’s stats aren’t compiled in isolation and depend heavily on the batters, umpires, fielders and ballpark.  DIPS tries to neutralize the fielders’ impact, and park adjustments try to neutralize the park’s impact, and I believe that the best DIPS would try to neutralize all 4.

I agree with Guy that regressing everything to the mean might not be a great idea.  But on the other hand, that’s exactly what traditional DIPS does, except that it regresses certain things 100% to the mean and other things 0%.  And we all know that is not the way it is in reality.

DIPS, to me, should measure how many runs the pitcher would have given up per 9 innings if we only looked at his performance and not that of the batters, umpires, fielders and ballpark.  Of course, that is very hard to do, as his stats are all going to be dependent on those things.  That’s why I proposed a Pitch f/x ERA a while back, but since we know that is pretty much impossible to do at the moment, we have to find a proxy for that.

My suggestion would be to create similarity scores between pitchers using their pitch attributes, location, selection and some measure of sequencing context.  Then regress each component toward the mean of each pitcher’s group of similar pitchers.  So say that Josh Beckett’s most similar pitchers via Pitch f/x are Burnett, Carpenter and Josh Johnson.  They would have a collective K rate of somewhere around 8 I think, so Beckett’s K rate would regress to somewhere around 8.2 from 8.7 or something.

You do that for all components, plug them into Base Runs, or some type of equation that attempts at modeling dynamic and interrelating events, and get your ERA model.  I think that would be by far the most logical and best DIPS estimator.  It would also not be that hard to do once you established the similarity scores.


#63    Guy      (see all posts) 2010/02/28 (Sun) @ 09:31

"If we can agree that a pitcher has no control over those categories (or that the degree of control is small enough to ignore for our purposes), then the “credit” for those categories should go to someone other than the pitcher.”

A lot of the “credit” doesn’t belong to any player on the field, but to the gods of chance. There appears to be a determination to remove this luck when evaluating pitchers’ performance that is never applied to hitters.  But why?  Chipper Jones hit .364 in 2008, and about 50 points of that was luck—i.e. he couldn’t do it again.  Should we say he “really” hit .310 that year?  Should we reduce his WAR accordingly?  And if not, why is that any different? (And don’t say “because he really got those hits,” because a pitcher with a low HR/FB rate also really didn’t give up those HRs.)

If a lot of people think these are fun metrics to play with, more power to you.  But if the idea is to use a luck-free metric to measure actual pitcher performance (and only pitchers), then I think that’s a mistake.  I can see the case for excluding BABIP in WAR, as I believe Fangraphs does.  But even that is questionable:  if you did that for Tom Seaver, pretending his BABIP was league-average, you would eliminate a significant portion of his actual value.  And if we start regressing Ks, BBs, and HRs on an annual basis, then you really have serious problems: you can’t possibly do that and call it “true performance.”

BTW, David, despite my reservations I do want credit for the name “LIPS,” which I first suggested to you. :>)


#64    Tangotiger      (see all posts) 2010/02/28 (Sun) @ 10:37

For whatever it’s worth, I remember MGL using LIPS way back at BaseballBoards.com when we were debunking DIPS.  I seem to remember him being the first, but who knows…


#65    David Gassko      (see all posts) 2010/02/28 (Sun) @ 10:38

Hey Guy,

I need to see some evidence before I give you credit for LIPS. Just kidding, like most sabermetric acronyms it’s a terrible name so you can take all the credit you want. :D

Anyway, I actually would not use a LIPS or FIP type metric in WAR—I prefer the way Rally does it to the Fangraphs method. LIPS is not a value metric. It is a metric that tells us how a pitcher performed in the categories he has significant control over.

The reason you can’t do LIPS for hitters is that hitters tend to have significant control over pretty much all their numbers, i.e., Chipper Jones deserves “credit” for much of his batting average, if not all of it. With pitchers, discarding their HR/OF or LD% might give you a better picture of the actual quality of their performance in a given season, but for hitters that is not the case.


#66    Tangotiger      (see all posts) 2010/02/28 (Sun) @ 11:47

Mar 1, 2002, I said:

And MGL also makes a great point that the variability in $H might simply be luck. The $H is dependent on
- the hitters
- the fielders
- luck
- the park
- the pitchers

From the pitcher viewpoint, the hitters should all cancel out, more or less. The park might cancel out, over a large enough sample.

So, maybe the determining factor might be luck, and perhaps LIPS is the better term.

And I only said it in reference to something MGL previously said.  But, yeah, whoever wants the name can have it.


#67    Mike Fast      (see all posts) 2010/02/28 (Sun) @ 12:21

To market it to the average fan, perhaps we want a flowery name like True Unadjusted Luck Independent Pitching Statistics?

But we really need to adjust for the player’s home ballpark.  Thus, it would probably be better to have Home Outcome Translated Luck Independent Pitching Statistics.

Probably, our results would be the most impressive if we applied one of the new statistical techniques: Luck Independent Pitching Statistics Trendlines Implementing Convolution Kernels.

I keep going back and forth, though, on Guy’s question of whether we may as well either be doing full projections or looking at true performance, but not some mixture of both.  Future Luck Independent Pitching Statistics or Full Luck Observed Pitching Statistics?

The ultimate goal, of course, is Completely Luck Independent Pitching Statistics, but the only way to get those right now is to watch every play on video.


#68    Sky      (see all posts) 2010/02/28 (Sun) @ 12:33

This is getting off topic a bit, but I really don’t think you get to new fans with any new metric, let alone a well-named metric.  Like Sciambi wrote in his BPro article, it’s the concepts that deserve attention.  So talk about how fielders can help/hurt pitchers’ runs scored, how ballparks help/hurt, how line drives aren’t constant year to year, etc.  Once people are actually convinced about those ideas, they’ll start to doubt the numbers they see and start wondering how you can account for all these new concepts.  At that point, calling it “DIPS”, “LIPS”, or “LEPROSY” doesn’t matter.  In other words, the metric is the destination, not the journey.


#69    Guy      (see all posts) 2010/02/28 (Sun) @ 12:53

David:
So I think we agree that these metrics are neither a good value metric nor a very good projection.  So we agree on the most important things. 

You say they tell us “how a pitcher performed in the categories he has significant control over.” I guess that’s true, but I don’t personally find that a very interesting question.  More importantly, I think a lot of fans will confuse that with a measure of value or “real performance”, so to that extent they may do more to reduce than enhance people’s understanding of the game (though that isn’t your fault, of course). 

As for Chipper, hitters in general do have more control over their BA than pitchers do over their BABIP.  But that doesn’t tell us how much credit a particular player deserves for one year’s performance.  Chipper’s .362 had a LOT more luck in it than Carpenter’s .274 BABIP.  To credit Chipper with 100% of his outcome, and Carpenter with little or none of his, seems arbitrary to me.

On the name, I have a distinct memory of suggesting to David back around 2005/2006 (via email or a comment thread) that “LIPS” would be a better name than DIPS X.0 for the work he was doing.  Then LIPS appeared.  Since I’ve never read baseball boards, I’m pretty sure I came up with it independently.  However, it’s clear MGL came up with it earlier.  And, of course, it hardly matters!


#70          (see all posts) 2010/02/28 (Sun) @ 13:05

Mike Fast/67

TULIPS, HOTLIPS, LIPSTICK, etc.--genius!  I appreciate creative writing.  Well done.


#71    dave smyth      (see all posts) 2010/02/28 (Sun) @ 13:23

Tango, do you have a link to the initial discussion of kwERA?


#72    tangotiger      (see all posts) 2010/02/28 (Sun) @ 14:59

Dave, do you mean at Fanhome?  Because that’s where it was.  I don’t know if I ever downloaded those threads.  See if you can find it by looking for szERA maybe.  Definitely look for GuyM too.


#73    Jeremy      (see all posts) 2010/02/28 (Sun) @ 15:07

Guy or David, can one of you express your arguments against using tRA* as a value metric?


#74    Guy      (see all posts) 2010/02/28 (Sun) @ 17:12

I have a vague recollection of someone (maybe DSG?) once posting a link to an archive of the old fanhome/scout sabermetrics threads.  Anyone know how to find that stuff now? 

*

I’m shocked by the high correlation Pizza reports for LD%. I’m sure this has been discussed elsewhere.  Is it really true that pitchers’ LD% is highly consistent? 

*

Jeremy:  sorry, I don’t know enough about tRA* to critique it specifically.  But I gather it’s a metric that regresses both the DIPS variables and various types of BIP—is that right?  If so, my major objection would be that by regressing those variables you will deny a lot of credit to great pitchers (and understate the “badness” of weak pitchers).  The metric has to assume that Josh Beckett was quite lucky to post an 8.4 K/9 rate this year, when we know that isn’t true.  Regressing BABIP is of course much more justified than regressing K or BB, but even there you will miss some real skill.  And then there’s the problem of situational pitching.  Glavine was about 20 wins better than his FIP.  Some of that may have been Andruw, but a lot of it was Glavine.

And not to beat a dead horse, but if we’re going to regress pitchers, then why not do the same thing for hitters?  Pujols will lose 30 or 40 pts of OBP and SLG every season.  (I am not actually suggesting this, of course, just showing that this does not provide an accurate measure of performance.)


#75    Jeremy      (see all posts) 2010/02/28 (Sun) @ 18:29

Thanks for the explanation Guy. I wouldn’t mind regressing for hitters. We kind of tried this with xBABIP.


#76    studes      (see all posts) 2010/02/28 (Sun) @ 18:40

I thought I suggested LIPS to David in one of many email exchanges.  I remember arguing with him that his metric wasn’t really defense-independent.


#77    David Gassko      (see all posts) 2010/02/28 (Sun) @ 19:47

The first mention I can find of LIPS in my inbox is in an e-mail from me to Jay Jaffe. So everyone and their mom can stop trying to take credit for it. grin


#78    Guy      (see all posts) 2010/02/28 (Sun) @ 20:27

I found my suggestion of it, in a thread on DIPS 3.0 here:  http://www.baseballthinkfactory.org/files/newsstand/discussion/the_hardball_times_gassko1/.  I’m relieved to see I didn’t imagine this.

However, if 5 of us independently came up with the same idea, how clever can it be?


#79    Nick Steiner      (see all posts) 2010/03/08 (Mon) @ 22:21

What do you guys think about this?

http://www.lookoutlanding.com/2010/3/8/1362878/trar-and-wobar

Guy - I believe this would quell your fears about regressing K/9 and BB/9 too much for pitchers, as Matthew regresses them to a three-year average instead.

I would LOVE to see how these test out, as they appear to do nearly everything that I want in a DIPS metric (including component park adjustments).


#80    Colin Wyers      (see all posts) 2010/03/09 (Tue) @ 01:55

"tRAr is basically what tRA* was intended to be. It takes each of the inputs to tRA (GB, FB, LD, IF, Bt, K, BB, HBP, HR) and regresses them first toward the pitcher’s recent (max three year) historical rates and then factors in the league average if that 3-year sample is too small.”

What does that mean, regresses them to the player’s historical rates? That’s not how regression works.

And what is “too small” such as to require the league average? All players should have the league average included - if you’re handling the regression right, then for players with a large enough sample the regression simply won’t make an impact when you round to a sane number of significant digits. Why draw a line in the sand yourself arbitrarily?

At this point you’re building a projection system in what seems to me like a very arbitrary fashion, and simply not calling it a projection system. Why should I use this instead of, oh, an actual projection?
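To make the regression point concrete, here is a minimal Python sketch of regressing every pitcher toward the league average with no sample-size cutoff. The league rate and the `ballast` value (a fixed number of league-average batters faced added to every line) are illustrative assumptions, not estimates from this thread.

```python
def regress_to_mean(observed_rate, sample_size, league_rate, ballast):
    """Shrink an observed rate toward the league rate.

    `ballast` is a hypothetical number of league-average opportunities
    added to every pitcher's line; it is an illustrative value only.
    """
    weight = sample_size / (sample_size + ballast)
    return weight * observed_rate + (1 - weight) * league_rate


LEAGUE_K_RATE = 0.18  # assumed league strikeout rate per batter faced
BALLAST = 150         # assumed ballast, in batters faced

# Same observed strikeout rate, very different sample sizes.
small = regress_to_mean(0.25, 100, LEAGUE_K_RATE, BALLAST)
large = regress_to_mean(0.25, 3000, LEAGUE_K_RATE, BALLAST)
print(round(small, 3), round(large, 3))
```

With these made-up inputs the 100-batter sample is pulled most of the way to the league rate, while the 3000-batter sample barely moves, so the same rule can be applied to everyone without drawing a line.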


#81    Nick steiner      (see all posts) 2010/03/09 (Tue) @ 02:05

Thinking about it Colin, I agree with you fully. 
And I now understand Guy’s question about what exactly we are trying to do with DIPS stats.


#82    Colin Wyers      (see all posts) 2010/03/09 (Tue) @ 02:32

Right. If all you want to do is predict future performance, then what you need is a projection system, not a “metric” or “stat.” Once you’re including three years of stats and regression, you’re projecting performance.

The role for DIPS (as I see it) outside of projection is in separating the contributions of a player from that of his fielders in the past tense.


#83    Nick Steiner      (see all posts) 2010/03/09 (Tue) @ 02:40

The role for DIPS (as I see it) outside of projection is in separating the contributions of a player from that of his fielders in the past tense.

Okay, so that makes sense - DIPS is a value stat, not a skill stat.  If you want a skill stat, doing a proper regression is the best way to go. 

So that makes me curious, once again, about the point of stats like SIERA, xFIP, LIPS, etc. which regress stuff like HR/FB and, in the case of SIERA, LD%.  Those things are only being regressed (100% to the mean) because there isn’t a lot of skill in them.  But that shouldn’t be relevant because they did actually happen, and if you want to measure skill, you would do a full scale projection anyway.


#84    Peter Jensen      (see all posts) 2010/03/09 (Tue) @ 04:36

I find it odd that you are having this huge discussion, both here and at Lookout Landing, on the same day as I introduced my pitching stat, which seems to do everything you want in separating pitching from fielding, and discusses many of the same issues, and yet no one here even bothered to comment on it.

http://www.hardballtimes.com/main/article/yet-another-pitching-metric/


#85    Mike Fast      (see all posts) 2010/03/09 (Tue) @ 05:02

Speaking only for myself, while I’m still following this thread, my eyes glazed over somewhere shortly after post #50.


#86    Peter Jensen      (see all posts) 2010/03/09 (Tue) @ 05:04

The role for DIPS (as I see it) outside of projection is in separating the contributions of a player from that of his fielders in the past tense.

Colin - That may be your interpretation of DIPS, but it is historically incorrect.  The whole impetus behind DIPS was to better predict ERA in year plus 1. It is NOT a good description of what a pitcher has done in the past.

Once you’re including three years of stats and regression, you’re projecting performance.

This statement is wrong as well.  The progression should go like this.  Take a pitcher’s performance and pare out any outside factors (like fielding, park, quality of opposition, influence of umpires) that you know affect that performance and that you know are outside of the pitcher’s control.  What you are left with is a closer measure of the pitcher’s true talent, but is still subject to small sample size error.  If you want to get closer to the pitcher’s true talent you have to increase the sample size, which means including more years in the sample.  You can put as many years in as you want as long as you think the pitcher’s true talent is staying relatively stable and not increasing or decreasing because of age, injury, or change in pitching technique.  There is nothing magical about a single year’s data.  You can then take this to the next step of creating a projection for the future, if you wish.  But establishing a player’s true talent is a completely separate step from using his true talent to create a projection.
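A minimal Python sketch of the sample-size step described above, assuming a simple binomial model and a talent level that stays stable across the pooled seasons. The season lines are hypothetical numbers chosen for illustration.

```python
import math


def binomial_se(rate, opportunities):
    """Standard error of an observed rate under a simple binomial model."""
    return math.sqrt(rate * (1 - rate) / opportunities)


def pooled_rate(seasons):
    """Pool (successes, opportunities) pairs across seasons into one rate."""
    successes = sum(s for s, _ in seasons)
    opportunities = sum(n for _, n in seasons)
    return successes / opportunities, opportunities


# Hypothetical (hits on balls in play, balls in play) for three seasons.
seasons = [(170, 600), (180, 600), (175, 600)]

one_year_rate = seasons[-1][0] / seasons[-1][1]
multi_rate, multi_n = pooled_rate(seasons)

se_one = binomial_se(one_year_rate, seasons[-1][1])
se_multi = binomial_se(multi_rate, multi_n)
print(round(se_one, 4), round(se_multi, 4))
```

Pooling only helps to the extent that the stable-talent assumption holds; if the pitcher has changed through age, injury, or technique, the older seasons describe a different pitcher.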


#87    Nick Steiner      (see all posts) 2010/03/09 (Tue) @ 05:35

I agree with Peter/86 actually (god I keep flip flopping on these).  What Matthew is doing is taking the pitcher’s actual outcomes and regressing them towards his estimated mean - that doesn’t make it a projection, it makes it an estimate of the skill involved in his stat line.  The fact that he uses previous years’ data doesn’t make it a projection. 

Peter, BTW, the bulk of this discussion took place well before you unveiled your metric.  My post in 79 was the first one in this thread since February.


#88    Guy      (see all posts) 2010/03/09 (Tue) @ 10:04

Nick:  What is the difference between a projection and an estimate of the skill component within a given year’s performance?  It seems to me they are the same thing.  What does an estimate of a player’s skill in year X take into account that his projection does not (or vice versa)?

Peter/86:
“That may be your interpretation of DIPS, but it is historically incorrect.”
“This statement is wrong as well.”
Peter, you bring a lot to these discussions and do some fine analytic work on your own.  But it seems to me you could sometimes make your points in a more collegial way.  As Tango says, no one has to pull punches as long as they’re above the belt.  But a little softening around the edges—“I see it differently” rather than “you are incorrect,” or “I think you’ve omitted an important element” rather than “that’s wrong”—would make it a lot easier for people to hear your contributions.  This is a community of people trying to learn from each other, not a battlefield.

Mike/85:  I hope it wasn’t post 50 that induced the glazed eyes!


#89          (see all posts) 2010/03/09 (Tue) @ 10:51

From Voros’ original DIPS article:

So what have we done? We’ve taken individual pitcher stats and we’ve used only the ones that are not affected by defense and have a definite relationship to pitching ability. Hits allowed is not one of these statistics and so we don’t use it. ERA is another and so we don’t use it either (as our new method with a few minor adjustments will correlate with ERA the following year much better than ERA itself as you’ll see in a future article). Instead we use stats like BB, HR and SO (the most important of the pitching stats) and league averages for the others. The method can affect our evaluations of pitchers by a LARGE MARGIN (as you saw above). The method adjusts for park and league and most importantly, THE QUALITY OF DEFENSE THAT WAS PLAYED BEHIND HIM IS COMPLETELY REMOVED FROM THE EQUATION.

(emphasis in the original)

It certainly reads to me as if he is touting both the objective that Colin refers to (separating pitcher from his fielders) and the one that Peter refers to (projecting future ERA). 

Whether he’s right about DIPS ability to describe the past isn’t germane to the question of whether that was part of what it was designed for.  I agree with Guy--it’s too insignificant of a semantic point to get hung up on.


#90    Nick Steiner      (see all posts) 2010/03/09 (Tue) @ 11:27

What is the difference between a projection and an estimate of the skill component within a given year’s performance?  It seems to me they are the same thing.  What does an estimate of a player’s skill in year X take into account that his projection does not (or vice versa)?

Well, it’s a matter of semantics really, but it makes sense to me. 

1) A “True Skill” estimator like Matthew did has a different order than a projection.  A projection takes a player’s recent stats, applies regression to the mean and weights them.  Matthew’s order is to establish the mean first, then regress a player’s stats to that mean.  This better fits the idea of LIPS (that there is luck involved in everything a pitcher does), and while it may end up getting you to the same place as a regression, the order is an important distinction. 

2) Matthew’s stat doesn’t include aging curves or project the player into his home ballpark.  Those are the steps that turn a true skill estimator into a projection system. 

The thing about SIERA and xFIP and LIPS is that they *are* projection systems under the definition that Colin gave above.  They are intentionally ignoring things that actually happened and were not influenced by fielders (LD%, HR/FB) because it’s more predictive to keep them out.  In other words, xFIP regresses a pitcher’s HR/FB rate 100% to the mean - effectively assuming there is no skill involved in the pitcher’s actual HR/FB.  The fact that Matthew’s stat uses 3 years of data to find the proper mean, then regresses each stat to it, doesn’t make it different than xFIP or SIERA - it makes it less arbitrary in the regression it applies. 
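A minimal Python sketch of the HR/FB point, using the commonly published FIP form and an xFIP that swaps actual home runs for fly balls times a league HR/FB rate. The additive constant and the league HR/FB rate are assumed round numbers (the real values are set per league and season), and the pitcher line is hypothetical.

```python
FIP_CONSTANT = 3.10       # assumed; the real constant is set per league-season
LEAGUE_HR_PER_FB = 0.105  # assumed league home runs per fly ball


def fip(hr, bb, hbp, k, ip, constant=FIP_CONSTANT):
    """FIP from home runs, walks, hit batters, strikeouts, and innings."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip,
         league_hr_per_fb=LEAGUE_HR_PER_FB, constant=FIP_CONSTANT):
    """FIP with actual HR replaced by fly balls times the league HR/FB rate,
    i.e. the pitcher's own HR/FB is regressed 100% to the league mean."""
    expected_hr = fb * league_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher who allowed unusually few home runs per fly ball.
line = dict(bb=50, hbp=5, k=180, ip=200.0)
actual = fip(hr=12, **line)
expected = xfip(fb=200, **line)
print(round(actual, 2), round(expected, 2))
```

The gap between the two numbers is entirely the home runs that "should" have happened under the league rate, which is exactly the part of the record xFIP chooses to ignore.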

Once you start ignoring or regressing things that actually happened and weren’t influenced by defense, that turns it from a DIPS stat to a LIPS stat.  And the best LIPS stat will be one that takes into account previous years of data to find the proper mean to regress to. 

Whether he’s right about DIPS ability to describe the past isn’t germane to the question of whether that was part of what it was designed for.  I agree with Guy--it’s too insignificant of a semantic point to get hung up on.

I don’t really agree with that, Patriot.  I think that Voros assumed that defensive (and ballpark) luck was really the only thing out there.  Of course, we now know that there are many stats that have a lot of luck involved in them (HR/FB, LD%) and even K’s and BB’s have some luck in them. 

So one could easily “beat” DIPS by just regressing those things a pitcher has little control over.  But then it isn’t DIPS, it’s LIPS.  And once you change it to LIPS, I don’t see why it’s a problem to use multiple years of data and actually have non-arbitrary regression values.

And that’s why it’s unfair to test SIERA and xFIP against tRA and FIP.  tRA and FIP don’t care whether or not what happened in the past is a skill or is predictable, they care about removing defensive performance.  There is no real “test” for a metric that describes the past - which is a problem as there is no way to know if Peter’s recent metric is better than FIP and tRA at doing so.


#91    Colin Wyers      (see all posts) 2010/03/09 (Tue) @ 12:04

I don’t really agree with that, Patriot.  I think that Voros assumed that defensive (and ballpark) luck was really the only thing out there.  Of course, we now know that there are many stats that have a lot of luck involved in them (HR/FB, LD%) and even K’s and BB’s have some luck in them.

The thing is that BABIP isn’t necessarily “luck” in this sense, any more than offensive support for a pitcher is “luck.”

Both of them are skills - they simply aren’t the PITCHER’S skills. They are a combination of the skills of the eight other players on the defense (and, to the extent that the pitcher fields or hits, his skills at fielding or hitting - not his skill at pitching).


#92    Nick Steiner      (see all posts) 2010/03/09 (Tue) @ 12:09

Yes, I understand that Colin.  By “luck” I mean stuff that is out of the pitcher’s control.  Were you refuting my overall point or just correcting the statement?


#93    Guy      (see all posts) 2010/03/09 (Tue) @ 12:18

Colin:
I don’t think you’re right about this.  I think most of the variation in a pitcher’s BABIP around his true talent results from random variation in the difficulty of the BIP he allows, not the quality of fielding behind him.  This should be measurable.  One could look at the variation in PZR (range of projected BABIP given a pitcher’s BIP allowed), and compare that to the variation in UZR behind a pitcher.  I think you’d find the former varies more than the latter. (And keep in mind that the UZR data will actually overstate the impact of fielding, because pitchers whose fielders posted a very good UZR will tend to be those whose BIP were actually easier to field than UZR “thinks” they were, and vice versa.)

Nick:
I can see the value in a metric that tries to remove the impact of fielding.  Which of these various metrics does that best we can argue about.  And I can see the value of estimating a player’s true talent, which is exactly the same thing as doing a projection (except for estimating the impact of aging).  But I still don’t see the value in removing luck and calling the result “performance.” Nor do I understand how that would be, or could be, any different than a true skill estimate.  And again, if there is some value in the LIPS approach that I’m missing, why do it only for (to?) pitchers?


#94    Nick Steiner      (see all posts) 2010/03/09 (Tue) @ 12:36

Guy if you follow the Lookout Landing link I posted above, they introduced a stat called wOBAr, which does something very similar for hitters.


#95    Colin Wyers      (see all posts) 2010/03/09 (Tue) @ 12:36

Guy - I actually have tried to measure this before, with Project Scoresheet data. I don’t have the results in front of me. (And honestly I don’t remember if the variation of BABIP due to BIP distribution was larger than the observed variance minus the variance due to BIP distribution.)


#96    Peter Jensen      (see all posts) 2010/03/09 (Tue) @ 12:50

One could look at the variation in PZR (range of projected BABIP given a pitcher’s BIP allowed), and compare that to the variation in UZR behind a pitcher.

Guy - Since MGL adjusts UZR according to how hard the ball is hit, which is part of the possible pitcher skill you are trying to measure, I am not sure how successful such a methodology would be.


#97    Guy      (see all posts) 2010/03/09 (Tue) @ 13:25

Peter:  Agreed, but I’m only interested in trying to figure out how much of the variation can be attributed to fielders.  Since I want to credit/blame the pitcher both for any BABIP skill he has AND any random variation, I don’t need to separate those two (for this purpose). 

Colin:
I’d be interested in seeing your data.  I took a quick look at Pinto’s PMR data for 2008 here:  http://baseballmusings.com/?p=29182.  If you credit fielders with the difference between a pitcher’s actual and projected DER, the variance in projected DER and fielding is virtually identical (pitcher .018, fielders .017).  However, I think this overstates the fielder contribution.  PMR (or UZR) estimates the likelihood a ball in a given bucket will become an out.  However, there’s a variance within that bucket.  In any bucket, the balls that are converted into outs will, on average, actually be easier to field than those that become hits/errors.  So pitchers who appear to benefit from great defense will actually have surrendered easier BIP than PMR/UZR believes (and the reverse for pitchers who appear to have been hurt by terrible defense).  So I’d guess that the pitcher (skill and luck combined) accounts for at least 60% of the variance, maybe quite a bit more.  In any case, you certainly can’t say this is entirely, or even mostly, about the fielders.
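A minimal Python sketch of the share-of-variance arithmetic. It assumes the .018 and .017 figures are standard deviations and that the pitcher and fielder components are independent, so their variances add; neither assumption is confirmed in the comment. The helper names are hypothetical.

```python
import math


def variance_share(sd_a, sd_b):
    """Share of total variance due to source A, assuming A and B are
    independent so that their variances simply add."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return var_a / (var_a + var_b)


def other_sd_for_share(sd_a, share_a):
    """Spread source B would need for A to account for `share_a` of variance."""
    return sd_a * math.sqrt((1 - share_a) / share_a)


# Measured spreads quoted in the comment (treated here as standard deviations).
pitcher_share = variance_share(0.018, 0.017)

# How small the true fielder spread would have to be for a 60% pitcher share.
fielder_sd_needed = other_sd_for_share(0.018, 0.60)
print(round(pitcher_share, 3), round(fielder_sd_needed, 4))
```

Under these assumptions the measured figures put the pitcher just over half, and the 60% guess amounts to saying the measured fielder spread is inflated by a few thousandths.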

Come to think of it, this bias within fielding metrics means that in a sense we should be regressing these metrics.  Not in terms of assessing true talent, but just to get an accurate measurement of what happened in a given season.  How much to regress would depend on how much variance we think exists in each bucket.  Does a .60 out BIP really encompass some .35 and .85 balls, or just a range from .55 to .65?  If it’s the former, then you need to regress more, if the latter not much at all.  Maybe MGL already accounts for this?
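A minimal Python simulation of the within-bucket bias, assuming every ball in a nominal .60 bucket has its own true out probability drawn uniformly from a range around .60. The uniform distribution, the ranges, and the function name are illustrative assumptions.

```python
import random


def simulate_bucket(n_balls, mean_p, spread, seed=0):
    """Simulate one bucket whose balls have true out probabilities drawn
    uniformly from [mean_p - spread, mean_p + spread].

    Returns the average true out probability of the balls that became outs
    and of the balls that became hits.
    """
    rng = random.Random(seed)
    out_probs, hit_probs = [], []
    for _ in range(n_balls):
        p = rng.uniform(mean_p - spread, mean_p + spread)
        if rng.random() < p:
            out_probs.append(p)
        else:
            hit_probs.append(p)
    return sum(out_probs) / len(out_probs), sum(hit_probs) / len(hit_probs)


wide = simulate_bucket(100_000, 0.60, 0.25)    # balls range from .35 to .85
narrow = simulate_bucket(100_000, 0.60, 0.05)  # balls range from .55 to .65

wide_gap = wide[0] - wide[1]
narrow_gap = narrow[0] - narrow[1]
print(round(wide_gap, 3), round(narrow_gap, 3))
```

The balls that became outs were easier on average than the ones that fell in, and the gap grows with the spread inside the bucket, which is why the amount of regression would depend on how wide the buckets really are.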


#98    Guy      (see all posts) 2010/03/09 (Tue) @ 13:31

Nick:
What do these hybrid metrics do for you?  It seems like you’re trying to remove some of the luck in a performance, but not all of it.  What does that leave you with? 

And as I argued in the other thread: 
If we follow LIPS to its logical conclusion, there is in fact no difference between “performance” and “skill.” We should just do the best projection we can and say that was a player’s “actual performance” this year, since any departure from that must be either luck or the influence of teammates.  Does that really make any sense? 

One advantage:  our projections will all be perfect!


#99    Nick Steiner      (see all posts) 2010/03/09 (Tue) @ 13:35

Guy, the only differences are what I posted in #90.  The main thing is that it is not trying to predict future performance, but show the amount of skill in past performance, because it doesn’t account for aging and regresses the stats in a different order. 

So basically, it’s a projection system that isn’t concerned with projecting future performance.  So I can see why you might find it stupid.


#100    Colin Wyers      (see all posts) 2010/03/09 (Tue) @ 14:08

Well you only need to account for aging in a projection if you expect a player to, well, age any in the sample that you’re projecting. For, say, xFIP, that’s not necessarily the case.

Now with tRAr, you SHOULD be applying aging to the past three years of performance, since you’re essentially predicting skill in y based upon years y-1, y-2 and y-3, and the player HAS aged in year y relative to those three seasons.
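A minimal Python sketch of applying aging to the prior seasons before averaging them into an estimate for year y. The 3/2/1 weights, the additive per-year aging change, and the strikeout rates are hypothetical values for illustration, and regression toward the league mean is left out for brevity.

```python
def age_adjusted_estimate(prior_rates, weights, aging_per_year):
    """Weighted average of prior seasons after moving each one to year-y age.

    prior_rates[0] is year y-1, prior_rates[1] is y-2, prior_rates[2] is y-3.
    `aging_per_year` is an assumed additive change in the rate per year of age.
    """
    adjusted = [
        rate + aging_per_year * (years_back + 1)
        for years_back, rate in enumerate(prior_rates)
    ]
    total_weight = sum(weights)
    return sum(w * r for w, r in zip(weights, adjusted)) / total_weight


k_rates = [0.20, 0.21, 0.22]  # hypothetical strikeout rates for y-1, y-2, y-3
weights = [3, 2, 1]           # assumed recency weights

no_aging = age_adjusted_estimate(k_rates, weights, 0.0)
with_aging = age_adjusted_estimate(k_rates, weights, -0.005)
print(round(no_aging, 4), round(with_aging, 4))
```

Older seasons get a larger adjustment because the pitcher has aged more since then, so skipping the step biases the estimate toward the younger version of the player.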


#101    Nick Steiner      (see all posts) 2010/03/09 (Tue) @ 14:49

Good point Colin.

