And I’m supposed to have confidence in an evaluation metric with undisclosed methodology with undisclosed changes. Silver says that WARP is now calculated with VORP? What now?
Silver also mentions BP developing a PBP metric to “compliment” FRAA. Which is interesting, but… compliment? “Supplant” or “replace” would be better words to use, I’d think. BP’s own Dan Fox has SFR, which (for the purposes of PECOTA, at least) should cover the vast majority of their dataset. Why use FRAA at all if you have SFR?
I always thought WARP-1 was some combination of their VORP and FRAR numbers, is that not the case?
Anyway, nice to hear they are finally coming up with a PBP defensive metric.
Sam, BP has two offensive Runs Above Replacement metrics - Keith Woolner’s VORP, which is based around Marginal Lineup Value (which in turn is based on Runs Created); and Clay Davenport’s Batting Runs Above Replacement, which is Equivalent Runs (essentially a gussied-up version of Linear Weights) expressed as runs above replacement. BRAR has been the basis of WARP, along with FRAR, at least until now.
I agree that this is all very confusing - personally at this point I think they’re spending a lot of time figuring out which pig to put lipstick on.
Nate is definitely too politically correct, and doesn’t seem to ruffle feathers. He’s VP of BP, so I’m not sure we should expect more. Personally, I prefer the way Mark Cuban and George Steinbrenner and Howard Stern speak.
I’m sure his statement here is a typo:
We’ve actually re-done the WARP formula this year so that it’s based on a combination of WARP and VORP.
He probably meant to say:
We’ve actually re-done the MORP formula this year so that it’s based on a combination of WARP and VORP.
Regardless, when the questioner said:
You feel value is non-linear, as you represent it in MORP, which results in significant bending at the margins (ARod’s “value” at $38 million or whatever). Tango argues that value is actually pretty linear. It seems a fundamental issue. Why are you right and he’s wrong? (Clash of the analysts!)
I have no doubt that whatever non-linearity there is is very small, and therefore Nate is wrong, and I’m right. (I’m not politically correct.) I find that Nate and JC at Sabernomics do a great disservice by creating an exponential equation to map runs or wins to dollars. It’s not apparent at all what the relationship is, without putting the numbers in a spreadsheet and charting them. And, like I said, the linearity is most of the game, so why confuse things? At the very least, present a linear model so that you can let your readers see what you are doing, and then sell the non-linear version as a minor enhancement.
Let me go check out the other responses…
I will ditto Dave’s comment in post 1. It’s one thing to be p.c. about responses, and it’s another thing to b.s. it. There’s simply no analyst around that can justify a team replacement level of .150, which is what Clay does. Nate himself introduced SuperVORP as a necessary counter to WARP, as he obviously doesn’t believe in the WARP baseline.
I wish that there’d be one brave soul at BP that would denounce the untenable position that WARP clings on to. That no one does tells me that there’s a culture at BP that simply doesn’t make a BP v BP acceptable.
Fighting where you leave your opponent with a bloody nose is perfectly acceptable, as long as it’s done above the belt. Brutal honesty is what is needed.
What’s funny to me about the whole WARP issue is how dug in they seem to be on it. I understand the impulse for people to cling to their own methodology, but Clay will always be better known for EQA and his translations than he will for WARP (they were developed much earlier and appear in print in BP, whereas WARP is rarely used in the books). It’s not a case where he would be torching his most notable achievement by going back and revising it.
Of course, that’s what mystifies me about Bill James hanging on to RC, since James was so influential in so many things outside of RC. The fact that he has embraced Pythagenpat makes it even more confusing. But in fairness to BJ, he has clung on to RC for all these years, but he has been willing to make fundamental changes to it. The switch to a theoretical team framework was a major change.
So if Bill can switch from a straight multiplicative estimator to a theoretical team estimator, but hold on to RC, surely Clay can switch to a .300 replacement team, but hold on to WARP.
Your goal in disseminating information is vastly different than BP’s. You are interested in finding the truest answer we can know today to relevant and interesting questions, at least as far as I’ve been able to tell. BP is interested in finding the truest answer that will still generate a profit for their corporation.
You’re Linux; they’re Windows. Microsoft doesn’t go open source because it would cost them money. BP doesn’t abandon their proprietary metrics for better, non-BP methods because they can’t make money on them.
It’s not about being PC. Nate’s just looking out for BP’s bottom line. They’re capitalists, not scientists.
Personally, I have nothing against being mercantile about things - if BP wants to make some money and people are willing to pay them, I couldn’t care less.
What bothers me is how many little fiefdoms they seem to have over there - every man a castle unto themselves, it seems.
Just to answer a simple question - how many runs above replacement is, say, Alex Rodriguez worth - entails having that debate yourself. VORP says 96.6. BRAR says 86 or 83, depending if you adjust for season or all time. And both of them are supposed to be measuring the exact same thing! Is VORP right? Is BRAR right? If it’s BRAR, do I want to adjust for season or all time?
Or let’s say I want to figure out a team’s chance of making the playoffs. Do I use the standard Playoff Odds Report? The PECOTA one? The ELO one?
I don’t even want to get started on the vast number of pitching metrics they use to all measure pretty much the same damn thing.
I think the thing is that they just have a bunch of different analysts under one roof. They all are obviously extremely intelligent saber guys and they all have slightly (or perhaps more than slightly) different approaches to certain things.
It’s kind of like here where you have Tango, MGL, and a bunch of readers. I have certainly seen you guys all debate many things in the past. I guess the difference (I think) is that there’s obviously more of an open dialogue here. I think it would help BP if they had some internal debates about different issues, but brought those debates out for everyone to see. i.e., I’d love to see Davenport and Silver have a little back-and-forth about which replacement level should be used in an article or something along those lines. Something like a “back to the basics” series would be neat.
I understand that they may think that this would not look good for the BP “brand” or whatever, but I actually think it would only engage their readers more and make things all that more interesting.
I guess I’m saying that they should be more upfront about all of the different stats and ideas that each of them have (and I know much of this has been said before right here, so it is not exactly anything new).
Woolner’s with the Indians now, so this isn’t possible anymore, but back in the day they really should have had Woolner and Davenport go mano-a-mano and have Thunderdome: two stats enter, one stat leaves. Obviously VORP and BRAR can’t both be right, so just figure out which the best one is and use that.
This has nothing to do with their commercialism - they just seem to have a very gloves-on policy when it comes to critiquing each other’s work.
The blog software they use is Wordpress. This means they intentionally turned off comments. Instead, they force their readers to only be able to communicate with the authors, and only if the authors wish to communicate back.
Even the NY Times and ESPN allow comments, and so do plenty of academics, like Freakonomics and Wages of Wins. So it’s not like it’s the snooty elitists that don’t dare talk to us out in the open.
They totally miss the boat as to what a blog means. Yes, yes, I’m sure there’s some excuse that they have for not allowing comments from their users that sounds perfectly reasonable if you don’t think about it for more than 2 seconds.
I’m annoyed.
I think they’re worried about what happened to firejoemorgan.com the one time they turned on comments. But you’d figure opening up comment posting to subscribers only (while still allowing public viewing, of course) would be a huge step forward without requiring much moderation.
That’s what turned me off from their site years ago. I found Baseball Primer and was able to instantly respond to anything written. In fact the discussions there are the primary reason to visit the site, though there are some very good authors providing original content there.
I used to email Prospectus now and then. The last one I sent was when one of their authors stated that Jim Leyland was to blame for ruining the career of Alex Fernandez. I pointed out that it was more likely years of abuse that did him in, and how his workload was greater with the 1996 White Sox. I got a real short response telling me I was wrong and they were right, and pretty much didn’t see any reason to keep reading after that.
There isn’t any excuse in online media for not allowing comments. I can understand in some cases the need for moderation or registration, if one is concerned about keeping the quality to noise ratio high, but BP is stuck in an old world concept where you get to pick and choose your responses, and ignore any criticism that shows you to be less than the super genius baseball expert you think you are.
I shouldn’t be too harsh, this is after all about one of their authors responding to questions on another forum. But that should be available on every article on their site.
Is there any other sabermetric site that does not allow comments on articles? None that I know of.
At the end Nate indicates that PECOTA does not have pitch f/x yet. He thinks it will be part of projection systems in the next 3-5 years. The way I would use that data, if I were a pitch f/x guru, would be in the regression portion. Get your rates for BB, SO, HR, BABIP, and instead of regressing x% to league average, you regress to your pitch f/x comparables. Age and park adjust, and you’ve got a great pitcher projection.
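A minimal sketch of the regression step described above, assuming the usual "add phantom trials at the prior rate" form of regression toward the mean. The function name, the strikeout rates, and the weight of 200 are all hypothetical, chosen only to show how swapping the league average for a comparables average changes the regressed rate.

```python
def regress_rate(observed_rate, n_obs, prior_rate, prior_weight):
    """Shrink an observed rate toward a prior rate.

    prior_weight is the number of 'phantom' trials added at the prior rate;
    a larger weight means heavier regression. All inputs are illustrative.
    """
    return (observed_rate * n_obs + prior_rate * prior_weight) / (n_obs + prior_weight)


# Hypothetical pitcher: 25% strikeout rate over 400 batters faced.
observed_k_rate = 0.25
batters_faced = 400
prior_weight = 200      # assumed regression weight, not an estimated one

league_k_rate = 0.17    # hypothetical league average
comps_k_rate = 0.22     # hypothetical average of pitch f/x comparables

to_league = regress_rate(observed_k_rate, batters_faced, league_k_rate, prior_weight)
to_comps = regress_rate(observed_k_rate, batters_faced, comps_k_rate, prior_weight)
print(f"regressed to league: {to_league:.3f}, regressed to comps: {to_comps:.3f}")
```

The same function would be applied separately to BB, SO, HR, and BABIP, each with its own weight, before any age and park adjustment.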
Rally, gotta be careful about regressing data to other data that is NOT independent.
Lots of criticism of BP and PECOTA. I feel compelled to point out that the PECOTA projections are probably one of the best, if not the best, overall (who knows about subsets of players) projections around. You can’t criticize someone too much for that! It certainly isn’t fair to say, “Putting lipstick on a pig” if the “pig” he was referring to is the player projections. If he was referring to their WARP, the only thing that makes it a “pig” is the baseline (the replacement level). The rest of it is just fine.
People in general don’t like to admit they made a mistake. In business, it is almost anathema (to admit a mistake, unless it is a small one - in this case, it is a LARGE one).
That being said, BP really should admit their mistake in a sort of “non-admitting a mistake” way, such as, “We’ve been thinking a lot about our replacement level baseline, and after much consideration, and input from our knowledgeable readers, we have decided that...”
MGL - Yeah, the pig I was referring to was WARP, not PECOTA. PECOTA is “best of breed,” and has earned all the accolades that come with that.
And I think there’s really two things wrong with WARP - the replacement level, and the reliance on FRAR for defense.
I agree with you on both counts. Interesting that Pecota is “best of breed” yet gets hammered with criticisms. Seems like a lot of the criticisms are valid, but others are red herrings, ad hominem, and other similar-type attacks and arguments.
Also, pitchers tend to get smarter about situational pitching as they age, which means that older pitchers tend to have ERAs that are better than their PERAs (component ERAs). This effect is very minor, however.
Interesting! I have never heard that before.
Anyone know of any published research to support that rather bold claim?
But the real problem is simply that baseball teams don’t have to pay market rates because there are lots of smart young people who are willing to work for substantially less than that, with baseball being a high-status job. We sabermetricians should unionize, damnit!
Is it simply (or mostly) supply and demand or is the “I’ll work for an MLB team for almost nothing because it is so glamorous (ostensibly)” force so strong?
It almost can’t be the latter, otherwise athletes and other entertainers would make almost nothing, as you can’t get much more glamorous than that.
So it has to be either that there are just too many adequate sabermetricians or that the teams don’t understand the value of a good sabermetrician. Probably the latter I would think. Market price for anything is a combination of its (perceived) inherent value AND supply and demand, no? Even if there were only 10 good janitors available in the whole world to clean the locker room after a game, no one is going to pay them a million bucks to do so, right?
In all fairness to Nate, he did qualify his non-linear stance to the effect of something like, “there is still some non-linearity...” Or something like that. So I don’t think it is a matter of one or the other. If Tango says that the relationship is mostly linear, which he did/is, I don’t know that Nate would disagree with that. And I would almost tend to think that there has to be SOME non-linearity, no matter how small, just because of supply and demand and the “bidding” process that essentially goes on for FA. If I even get what they are talking about (whether player $ value as a function of their WAR is linear?). My guess also is that because we don’t have slews of data to work with, we can’t really tell whether it is perfectly linear or not, although a nice graph SHOULD indicate whether there are “bends” or not, shouldn’t it (although bends or no bends could be noise, no?)?
There are probably more capable sabermetricians in the wild than there are teams willing to pay money for them; this is doubly true because they’re able to get so much of the return of hiring a sabermetrician for free (or nearly free) from places like the Hardball Times and Baseball Prospectus. I’m pretty sure that simply having the run/win expectancy tables available on this site is more advanced baseball research than most teams can conceivably implement now.
Or put another way. Look at how many of us are willing to sit around and do this for free. I’m not suggesting that all of us are little Gary Huckabees, looking at everything from how it helps teams and owners. But how many of us would turn down a chance to do this as our day jobs? Meanwhile, how many people are willing to do surgery for free? There is a “prestige” premium.
But baseball has a much greater need to compete with, say, football and basketball for the limited talent pool of gifted natural athletes than it does to compete with brokerage firms and IT departments for those resources. Especially since they’re so far gone I don’t think they can really tell the top sabermetricians from the bottom ones; witness the link from Tango earlier about the house stats guy of the Brewers.
18: I’m pretty sure perceived inherent value is factored into demand, not a separate third concept…
I posted this at SOSH:
===============================
Here are two relevant threads that I wrote that remain unchallenged:
Posts 9 through 11:
http://www.insidethebook.com/ee/index.php/site/comments/supervorp/#9
3 (my son is forcing me to write the number 3 here… I think he wants his breakfast)
http://www.insidethebook.com/ee/index.php/site/comments/do_teams_pay_more_for_top_end_talent/
===============================
So, two things. One, I don’t even think you can create a model of non-linearity that has any real value. And two, the use of WARP’s superlow replacement level forces a MORP equation that is exponential to counteract that! In this case two wrongs (low replacement level plus exponent) DO make a right!
***
I don’t think anyone is criticizing PECOTA, other than the unproven percentile levels, which IMO should be scrapped.
Tango re/#12
I had never thought about the lack of a comments section. FWIW, I just submitted that question to Christina Kahrl’s chat today. I think she is the manager of BP.com so she might respond to it. You might want to skim thru the transcript to see if she does.
Hope people liked the chat overall.
Philly, I look forward to the response. My bet is a p.c. response:
“Yes, we’ve considered adding comments, as that’s a good idea. At the moment, we’re still not convinced that we can handle it. But, we’ll continue to have dialogues over it.”
What I really want to hear, the unabashed truth:
“Hard to believe, but there’s one or two guys at BP that are dead set against it. They don’t think it does our image any good, is ripe for problems, and is just going to give us headaches. (Look what Keith Law has to put up with with his paying ESPN readers.) These BP guys are the ones that have most of the say at BP. The rest of BP, they love the idea of a discussion forum with the readers out in the open.”
Colin/9: right, I agree that if they want to make money, more power to them. I would prefer them to present their work in a more upfront fashion.
I agree about the fiefdom analogy. Basically, it’s like the USSR, or the various European empires over the years.
Kahrl responds to Philly’s question:
Mike (PA): Do you ever see a time when BP would consider opening up a comments section for subscriber responses to individual articles on BP.com?
Christina Kahrl: I do, but that’s because it’ll be this year, and we’re probably talking several weeks and not several months in terms of implementation
Well, that’s good to hear! I wonder what made them change their minds?
I would have bet my next paycheck that the answer would have been Tango’s pc response.
And that’s why I don’t gamble.
And also why it’s good to ask questions even when you think you know the answer.
IIRC Nate Silver announced plans to introduce a comments feature last year, or maybe even two years ago, but they never implemented it. Given what’s been mentioned here (risk of spam, cost of monitoring), I bet this feature will be for subscribers only.
I dunno, I remember B James ranting in the ‘88 Abstract that, “If you want to read what I write, then read it. If you don’t, then don’t. I’m not a goddam public utility.” (my paraphrase).
Why doesn’t that sentiment apply to BPro? They publish their material, they charge a certain fee, and they let the market forces work. I may be mistaken, but I don’t recall BPro ‘guaranteeing’ that all of their stats are the absolute best alternative. They seem to be quite open that their non-PBP fielding stat is less than the state of the art, and are apparently working hard on a PBP alternative. They embraced PythagenPat when they realized it was better than PythagenPort. And so on. They’re not perfect, of course.
And to someone like Tango, maybe you do well enough in your ‘real job’ that you don’t have to consider how to maximize your sabermetric income, but maybe that’s not the case for most other analysts. This applies more to MGL ("I’ll wager any amount of money”, and all of his bluster. I’m happy for you that you have lots of disposable income, MGL, but maybe some others have to structure their efforts so that they can do baseball analysis for a living and still support their families).
So, I think it’s better to tread lightly in this area with BPro. Sending them emails on why X is better than their own in-house stat is fine, but slamming them as sell-outs on threads like this is simply not going to be fruitful, in the larger perspective. IMO.
I don’t think anyone branded them as sellouts.
I think they are just married to certain things in the face of something better.
Whatever anyone’s opinion of the subscription issue, it has nothing to do with FRAA or the replacement level. Those stats are, and have always been on the free side.
I’ve never been a subscriber because I only have so much time in a day. If I paid I’d have to start reading their stuff, and while I’m sure I’m missing some good things, I’d have to cut back on reading THT, this site, BTF, or somewhere else if I did. Too bad we can’t add 10 hours onto the day (and keep the regular job at 8 hours).
And it’s not just BP. Do you remember the furor of UZR when MGL unveiled it? And he laid it all out for people. Same with DIPS. I lay into Forman at b-r.com. Lord knows I was brutal on Win Shares (and deservedly so… and James is making or has made corrections as a result.) Everyone takes a hit. And the useless stats you see at Fever and other boards also get hit.
What disappoints us with BP is that they are smart, they see the problems, and STILL don’t do anything about it. And worse, they continue to use it as if they have no issues.
Having said that, what WOULD be a fruitful way to approach the issue? You tell me how to make them accept that their replacement level is unrealistic, LEV makes no sense, and VORP should not be based on an inferior stat like basic RC. I’m willing to do what it takes David. Tell me what that is.
Rally/31: good point. Everything that is being criticized is part of the “Free BP”. The “Pay-for BP” (basically, their analyses and opinions) is not even at issue here.
Here’s some cool charts from Nate:
http://baseballprospectus.com/article.php?articleid=7189
And Redsox players:
http://baseballprospectus.com/fantasy/dc/index.php?tm=BOS
Since we’re talking about BPro here… I’m presently reading the 2008 annual book. Other than the team sections, there’re only 3 research articles, I think. One on baserunning numbers, one on OF arm numbers, and an interesting article by C Davenport on pitchers working the strike zone.
Sure, feel free to comment on their book in this thread. I didn’t get mine yet.
***
David, at the end of my post 32, I put in a request. Let me know what you suggest.
I read the article and looked at the charts (#34, above), and I am afraid that I just don’t buy it.
Nate keeps talking about these data points on the chart as if they are data points in a player’s career (e.g. he says that Jeter is a “wide variance player at this point in his career") or as if they are bona fide distributions that we expect this player to have.
What these points are, of course, are simply rate stats and playing time for all of a player’s historical comparables. And given that there are relatively few players in each player’s set of comparables, the differences among player charts look like a lot of noise to me.
I mean is there ANY evidence that these differences are not a lot of noise? Is there any evidence that Jeter himself (and players like him) actually are likely to have such a wide range of rate performance in the future (I don’t know if these charts represent 2008 only or what)? And that Miguel Cabrera is expected to have a small range of performance?
And do we need a chart to tell us that a player like Pierre is likely to possibly have not that much playing time given his age and limited offensive skills?
Honestly, until I see evidence that a player’s unique comparables gives us a good idea as to the distribution of his likely future performance, I just don’t buy it. Even if I did buy it and Nate is right, certainly he must admit that there is TONS of noise in those data points!
Now here is a statement that is just flat out wrong!
Players like Pierre are fairly easy to predict in terms of their rate performance: over the last three years, Pierre’s batting averages have varied between .276 and .293, his on-base percentages between .326 and .331, and his slugging percentages between .353 and .388; those are not very wide spreads.
Does Nate honestly think because a player’s performance has been consistent in the past, that his future performance is “easy” (significantly easier) to predict? Come on!
Here is a challenge for him or anyone else: Do a study and break down your players into two groups: One group has very consistent stats the last 3 years, and the other group has inconsistent stats, but otherwise both groups are the same age, overall numbers, etc.
Now look at the next year for both groups. I say that one, both groups have the same overall numbers in year 4 AND I say that both groups have the same spread or standard deviation in performance in year 4! IOW, I think that the idea that if a player has been consistent the last few years that he is “easier” to project (by “easier” I assume that means a smaller spread in expected results) than one who has not been consistent, is hooey. That is especially true, IMO, if we make sure that the “inconsistent” group has not been unusually unhealthy and therefore we don’t know about their health status in year 4 (thus causing a wider spread in performance in that year).
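Here is a rough sketch of how that study could be set up. `compare_groups` is the part that would run on real data: it splits players at the median of their three-year spread and reports the year-4 mean and SD for each half. Since no real data is given here, `simulated_players` generates a hypothetical population in which every player has the same fixed true rate, so the claim above is built into the input by assumption. All names, the .330 rate, and the 500 PA are made up, and the output shows only what the "it is all noise" world would look like, not evidence either way.

```python
import random
import statistics


def simulate_rate(true_rate, pa, rng):
    """Observed rate over `pa` independent trials at a fixed true rate."""
    successes = sum(1 for _ in range(pa) if rng.random() < true_rate)
    return successes / pa


def compare_groups(players):
    """players: list of (three_past_rates, year4_rate) tuples.

    Splits players at the median of their past-season spread and returns
    the year-4 (mean, SD) for the consistent and inconsistent halves.
    """
    ranked = sorted(players, key=lambda p: statistics.pstdev(p[0]))
    half = len(ranked) // 2
    result = {}
    for label, group in (("consistent", ranked[:half]),
                         ("inconsistent", ranked[half:])):
        year4 = [rate for _, rate in group]
        result[label] = (statistics.mean(year4), statistics.stdev(year4))
    return result


def simulated_players(n_players=1000, pa=500, true_rate=0.330, seed=1):
    """Hypothetical population: identical true talent, chance variation only."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        past = [simulate_rate(true_rate, pa, rng) for _ in range(3)]
        players.append((past, simulate_rate(true_rate, pa, rng)))
    return players


summary = compare_groups(simulated_players())
for label, (mean, sd) in summary.items():
    print(f"{label:>12}: year-4 mean {mean:.3f}, SD {sd:.4f}")
```

With real players, the two groups would first have to be matched on age, overall numbers, and health, as described above. A clearly larger year-4 SD for the inconsistent group would count against the claim.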
I think it is a gigantic myth that consistent players are “easier” to predict and I think (and am surprised) that Nate is (ostensibly) buying into it.
Tango and I have been railing against this (the distribution of future performance, not the Pierre “easy to project” thing) for years (not so much that they are aware, I don’t think) and as of yet, I have NEVER seen any studies or evidence suggesting that they can “project” the distribution of a player’s likely playing time beyond any player of a similar age, amount of historical playing time, and amount of projected playing time.
IOW, if we project a player with no playing time already, and we know nothing about him, and we project him for 300 PA, our projection will be about league average for whatever population we think he comes from, and the likely variance of that projected performance time will be exactly equal to the variance of true talent in that population plus the variance by chance in 300 PA. I don’t care what his comparables have done (of course, he’ll have no comparables).
Likewise, if we have a player who has thousands of historical PA, our projection for him will be some version of a Marcel, or whatever, and our likely variance of that projected performance will be close to zero for true talent (a little even for a player with thousands of PA) PLUS the random binomial variance associated with however many PA we project for him, with an adjustment for age. Period. I don’t care if it is Jeter, Cabrera, or Babe Ruth himself. I don’t think that the variance will be different for different players as Nate and BP strongly contend (with no evidence). I claim that the widely different variances they see, even for players of a similar age, and a similar historical number of PA, when looking at players’ comparables, are simply noise! And I think the burden is on THEM to prove otherwise before we believe all of these “performance distribution numbers and charts.”
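The decomposition in the last two paragraphs can be written as: variance of projected performance = variance of true talent + binomial chance variance over the projected PA. A small sketch, where the function name and every number (the .330 rate and the two talent SDs) are hypothetical placeholders rather than estimates, and the age adjustment is left out:

```python
import math


def projection_sd(rate, projected_pa, talent_sd):
    """SD of an observed rate over projected_pa, combining uncertainty about
    true talent (talent_sd) with binomial chance variance.

    Assumes the two sources of variance are independent.
    """
    chance_var = rate * (1 - rate) / projected_pa
    return math.sqrt(talent_sd ** 2 + chance_var)


# Unknown player, no history: talent spread is that of the whole population.
unknown = projection_sd(rate=0.330, projected_pa=300, talent_sd=0.030)

# Veteran with thousands of PA: little remaining uncertainty about talent.
veteran = projection_sd(rate=0.330, projected_pa=600, talent_sd=0.008)

print(f"unknown player SD: {unknown:.4f}, veteran SD: {veteran:.4f}")
```

Under this model, two players with the same rate, the same projected PA, and the same amount of history get the same spread, whoever their comparables are.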
I do concede Nate’s principal point in his article (although it does not interest me very much), that in reality, a player’s likely performance distribution will have a “funny” shape and characteristics because of this intrinsic relationship between performance rate and playing time. IOW, if we project a rookie to be league average in performance (good for a rookie) and have 500 PA, while we would expect his actual performance rate distribution to be normal around league average with a small spread in playing time as well (due to injury), in reality what will happen is that his performance rate will indeed be smooth and normal around league average, but his playing time will be funky and correlated with his performance rate, as Nate says. Play bad for the first month and he is done. Play good and he continues. Etc.
Ditto 38.
I’ll add that since K and BB component rates have more year-to-year correlation (i.e., more indicative of something real), then a guy like Thome would have less variability than a guy who always puts the ball in play.
I’m not so sure about that. Juan Pierre should have less variability than Thome re: batting average because he puts the ball in play 600+ times a year, compared to 300-350 for Thome. While HR rate may in general be less variable than BA, Thome might hit anywhere from 20-40 in a full season. Pierre might hit 1, 2, or 0, and is very unlikely to hit more than 5 - there just isn’t much room for variability here.
Good point as well.
Sounds like an easy enough thing to study…
The less variability in Pierre’s HR’s comes simply from the lower standard deviation because of the lower rate (and in his case, more BIP).
For example, if Pierre’s true rate is 2 HR per 500 BIP, the binomial standard deviation is 1.4, so he is expected to hit between zero and 5, 95% of the time, which makes sense.
For Thome, if his true HR rate is 30 per 300 PA, his SD is 5.2 HR for a range of 20 to 40.
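The two figures above come straight from the binomial standard deviation, sqrt(N * p * (1 - p)). A quick check using the rates given in the posts (the function name is just for illustration):

```python
import math


def binomial_sd(p, n):
    """Standard deviation of the number of successes in n trials at rate p."""
    return math.sqrt(n * p * (1 - p))


pierre_sd = binomial_sd(2 / 500, 500)    # about 1.41 HR
thome_sd = binomial_sd(30 / 300, 300)    # about 5.20 HR

# Rough 95% ranges as mean +/- 2 SD, floored at zero.
pierre_range = (max(0.0, 2 - 2 * pierre_sd), 2 + 2 * pierre_sd)
thome_range = (max(0.0, 30 - 2 * thome_sd), 30 + 2 * thome_sd)

print(f"Pierre: SD {pierre_sd:.2f}, range {pierre_range[0]:.1f} to {pierre_range[1]:.1f}")
print(f"Thome:  SD {thome_sd:.2f}, range {thome_range[0]:.1f} to {thome_range[1]:.1f}")
```

The plus-or-minus 2 SD range leans on the normal approximation, which is rough for a rate as low as Pierre's, so his range is only a ballpark.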
So, if you are going to give the expected range of performance for any player, sure you not only have to consider his overall number of historical PA, but you have to break it up by components as well.
That is one reason why Pecota, using comparables can get different profiles for players who have around the same number of historical PA and around the same overall projection, but it seems they carry it a lot further than that by virtue of taking a player’s comparables and assuming that his future performance will look like that of his comparables.
Even if that were true to any extent, at the very least, you would have to determine how much to regress those profiles, because clearly there is going to be a lot of noise in the total of 50 or 100 comparable players, at least in terms of their collective distribution of future performance.
And in order to do that (figure out how much to regress), you have to first figure out how much of these different future performance distributions (breakout, collapse, etc.) are “real” and how much are noise. As far as I can tell and as far as I know, BP is assuming that it is ALL “real” and is not doing any regression at all.
Here is an example of what I am talking about in case anyone does not know what I am driving at. It is difficult to explain.
Let’s say that we have two similar players. Both are around the same age, have around the same number of historical PA, have around the same component profile, and around the same number of projected PA.
Now of course they probably have very similar comps, which would mean that BP would assign them similar future profiles (breakout, collapse) I would think.
But let’s say that they played different positions, and were different height and weight and had some other differences that made BP give them some fairly different comps (obviously a lot of the comps would still probably be the same, but that is OK).
Now what if by virtue of each player’s 50 or so comps (again, I don’t know how many comps a typical player has), player A has a 10% chance of underperforming by 10 runs and a 20% chance of overperforming by 10 runs, whereas player B is just the opposite - BP says he has a 20% chance of underperforming by 10 runs and a 10% chance of overperforming by the same? IOW, player A has a higher breakout and lower collapse rate than player B?
And this is because that is exactly what happened to each player’s comps. Overall, for all 50 players combined for player A, 10% of the time, they underperformed by 10 runs. Etc.
Now, I say that the differences between the two players’ comps (in terms of the collective distribution of their future performance) are just noise! I say that each of those players, given that they are around the same age, same number of historical PA, etc., would have the same future performance distribution.
Now, even if we can tell something about a player’s likely future performance distribution from the collective performance of his comps (which makes sense) over and above what we can figure out given just what we know about that player (historical PA, etc.), you still have to determine how much of that “extra” information is real and how much is noise. This is critical and I don’t think BP has done that.
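To put a rough size on the noise in the two-player example, here is a back-of-the-envelope check. It assumes both players share a single true breakout rate of 15% (a hypothetical midpoint of the 10% and 20% in the example), that each has 50 comps, and that the two comp sets are independent. The last assumption is not quite right, since the example says many comps would overlap.

```python
import math


def rate_sd(true_rate, n_comps):
    """Chance SD of an observed rate among n_comps comparables."""
    return math.sqrt(true_rate * (1 - true_rate) / n_comps)


n_comps = 50          # "50 or so" comps, as in the example
shared_rate = 0.15    # hypothetical common true breakout rate

sd_one = rate_sd(shared_rate, n_comps)   # about 0.05 for one comp set
sd_diff = math.sqrt(2) * sd_one          # SD of the gap between two independent sets
z = (0.20 - 0.10) / sd_diff              # about 1.4 SDs

print(f"SD of one comp set's rate: {sd_one:.3f}")
print(f"A 10-point gap between players is {z:.1f} SDs of pure chance")
```

Under these assumptions a 10% versus 20% split sits about 1.4 chance SDs apart, which is not unusual for noise alone. Heavy overlap between the comp sets would shrink the chance SD of the gap, so this is only a first look and does not replace actually estimating how much to regress.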
Is the distribution of player performance normally distributed? Does anyone have any proof or know of any research addressing this issue? And if it’s not, is it still okay to use standard deviations when discussing a player’s spread in performance?
Victor, sure you can use “standard deviations” whether you are talking about normal distributions or not. SD is just a measure of spread in a bunch of data. One really has nothing to do with the other except that if some distribution is reasonably normal, we can make some inferences about SD and the area under the curve.
Binomial distributions, which are important in baseball statistics because virtually all performance is composed of binomial probabilities, are well approximated by normal distributions, I think, and we can easily figure out the exact SD of a binomial distribution using the “binomial formula” for SD (as long as we know p and N).
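For reference, the “binomial formula” mentioned above is SD = sqrt(N * p * (1 - p)) for a count and sqrt(p * (1 - p) / N) for a rate. A small sketch, where the .300 hitter and the 500 AB are made-up inputs chosen only for illustration:

```python
import math

def binomial_sd_count(p: float, n: int) -> float:
    """SD of the number of successes in n independent trials with success probability p."""
    return math.sqrt(n * p * (1 - p))

def binomial_sd_rate(p: float, n: int) -> float:
    """SD of the observed success rate (successes / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example: a true .300 hitter over 500 AB.
p, n = 0.300, 500
print(f"SD of hits: {binomial_sd_count(p, n):.2f}")
print(f"SD of BA:   {binomial_sd_rate(p, n):.4f}")
```

This only covers chance variation around a fixed true rate; it says nothing about uncertainty in the true rate itself.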
There are two things which affect the distribution of a player’s likely performance. One is random, and that is, given a certain static true talent value for all of his components, and given a certain sample size (PA, AB, IP, or whatever), what are the likely chances of any particular result, by chance alone. That is easy for any one binomial (BA, OBA, HR rate, K rate, etc.). Given that most metrics involve more than one component and given that the components are not independent nor are the rates independent, I am not sure how the SD for these metrics gets handled. It can’t be that hard. In any case, every player given the same true rates for each component should have exactly the same expected distribution of those rates, if we are talking about chance variance only.
The second element of that performance variance/distribution, or whatever you want to call it, is the true talent. That is actually two-fold, in and of itself. One is the chance that we made a mistake in our estimation of his true talent, and two, how that true talent might change in a few months, one year, two years, etc. This is (or should be) the one that BP claims is quite different from player to player (or at least for some players). They can’t claim the first one (the variance due to chance) is different from player to player other than the things that do change chance variance, such as p and N.
Anyway, I am a little out of my league when it comes to the statistical stuff, so I’ll just leave it at that.
Victor,
Somewhat related (true talent, not sample) to your post:
44/45, thanks for the help.
MGL, so in the example you use in #42 about the hypothetical spread of HRs by Pierre and Thome, are you assuming then that Pierre and Thome’s spread in HRs will be normally distributed? IOW, does Thome have the same chance of hitting 20 HRs as he does of hitting 40 HRs?
If Thome’s TRUE HR rate is 30 HR per whatever, then, yes, he should have an equal chance of being at 20 and 40. (Though, I don’t think that’s technically true, since it’s not a NORMAL distribution, but close enough.)
But, if Thome’s SAMPLE HR rate is 30 HR, then, no, he won’t have the same chance at 20 and 40.
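The "close enough" caveat above can be checked directly. This sketch assumes a true rate of 30 HR per 600 PA and treats each PA as an independent trial (both are simplifying assumptions), then compares the exact binomial probabilities of finishing with 20 versus 40 HR.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed for illustration: 600 PA, true talent of 30 HR per 600 PA.
N_PA = 600
TRUE_HR_RATE = 30 / N_PA

p20 = binom_pmf(20, N_PA, TRUE_HR_RATE)
p40 = binom_pmf(40, N_PA, TRUE_HR_RATE)

print(f"P(exactly 20 HR) = {p20:.5f}")
print(f"P(exactly 40 HR) = {p40:.5f}")
print(f"ratio 40/20      = {p40 / p20:.3f}")
```

The two probabilities come out close but not identical, because a binomial with a small p is slightly skewed to the right.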
Ditto what Tango said. If a player’s true rate is X, then his chance of hitting Y more or Y less than X is always exactly the same, since the random variation around a binomial is symmetrical.
If his sample (actual in any time period) rate is X (say, 30 per 600 PA, or whatever), then in trying to estimate his true rate, there would be a MUCH better chance that he is a true 20 HR hitter than a true 40 because there are many more true 20’s than true 30’s (which is why we regress the 30 DOWN), although given his size and other skills, it might be that there are more true 30’s or the same, or whatever. It is whatever the true talent in the population he comes from looks like. In regressing sample stats to compute projections, we assume that the talent distribution in the population a player comes from is normal, but that is not true of course. Even if the mean of a baseball population is 20 HR (per whatever), the chance that a player is worse than that is MUCH greater than the chance that he is better than that. In our calculations (when we regress), we normally assume that the chances are equal - that the distribution of talent is normal (symmetrical around the mean, etc.). Tango can correct me if I got anything wrong.
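A toy version of that reasoning, as a sketch only: the three talent levels and their population shares below are invented for illustration and are not real MLB figures. Given an observed 30 HR in 600 PA, it weights each possible true talent by how common it is in the population and by how likely it is to produce exactly 30.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N_PA = 600
OBSERVED_HR = 30

# Hypothetical population: true HR per 600 PA -> share of players at that talent.
population = {20: 0.60, 30: 0.30, 40: 0.10}

# Unnormalized posterior weight = population share * chance of observing 30 HR.
weights = {
    talent: share * binom_pmf(OBSERVED_HR, N_PA, talent / N_PA)
    for talent, share in population.items()
}
total = sum(weights.values())
posterior = {talent: w / total for talent, w in weights.items()}
posterior_mean = sum(talent * prob for talent, prob in posterior.items())

for talent, prob in posterior.items():
    print(f"P(true {talent} HR | observed 30) = {prob:.3f}")
print(f"Estimated true talent: {posterior_mean:.1f} HR per 600 PA")
```

With this made-up population, a true 20 is more probable than a true 40 and the estimate lands below 30, which is the regression "DOWN" described above. Change the population shares and the answer changes with them, which is the point about it depending on the population the player comes from.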
Marcels projects a player’s true talent, correct?
#50, sure. A projection is essentially the same thing as an estimate of true talent. The only difference is that a projection may include context if that is what the forecaster wants, but a context-neutral projection is pretty much the same thing as an estimate of true talent. Of course, even a context-neutral projection has some context - that is average (maybe league average, maybe both leagues combined) opponents, average parks, etc. When we say true talent, we mean, “as manifested in whatever context we are assuming,” usually some kind of average context (or we wouldn’t call it “true talent”).
The other issue when we talk about “true talent” is that of course we recognize that a player’s true talent changes all the time, from PA to PA or day to day, depending on how he is feeling, what he ate for breakfast, etc., as well as from longer time periods to longer time periods, because of aging, learning, health, etc.
So when we say that a Marcel or a projection is an estimate of “true talent”, we mean “average true talent” over some time period, like one year hence (or whatever).
As is often the case, one or two words alone (sometimes one or even a few sentences) are not nearly enough to explain something, and even then, some things cannot be explained or defined adequately without context. I know you are not arguing anything here, but too often arguments consist of too few words and not enough details about context, such as “so-and-so is ‘good’ - no, so-and-so is ‘not good.’”
Ditto 51. That’s exactly what “true talent” means.
It’s the underlying average “true rate” of something that you expect to manifest within a typical distribution of contexts (opponent, park, hangover), over a set period of time.
A projection is that manifestation within a PARTICULAR distribution of context.
Unless you play at Petco, or have an unusual year of hangovers distinct from past years, the typical context matches the particular context for our purposes.
That’s another question for Clay Davenport, but I know generally speaking he’s very aggressive about making tweaks of all kinds to the WARP formula. Both Clay and I tend to be perfectionists about our algorithms and programs, which leads quite literally to a lot of sleepless nights.
That made me chuckle. We’re supposed to believe that Clay is aggressively tweaking WARP because he’s a perfectionist while leaving his disastrously incorrect replacement level as the baseline for the whole calculation?
In other news, you should check out the new speakers in my ‘74 Pinto. I’m getting it washed on Saturday. That thing is beautiful.