THE BOOK--Playing The Percentages In Baseball


Sunday, January 31, 2010

Why do we need personal opinions about factual matters?

By Tangotiger, 11:19 PM

BPro reader:

Chris Perry
(46873)

Personally, I’ve never found the Chone projections at all accurate. Perhaps somebody can explain to me why they’re cited so often?
Jan 27, 2010 12:28 PM
link
rating: 1

judyblum
(5990)

I think it’s mostly because they’re available earlier than some of the others. That’s not meant to be a jab, anyone else who might still be working on their own projections, feel free to take as much time as you need to do a good job.
Jan 28, 2010 08:05 AM
link
rating: 1

See, that’s one thing that doesn’t go down in this blog: you talk bullsh!t, and I’ll slap you down for it.  I’m disappointed that all 20 of the BPro staff let stuff like that sit there.

As a point of fact, Chone has done as well as, or better than, PECOTA.


#1    Nick Steiner      (see all posts) 2010/01/31 (Sun) @ 23:29

Tango, you don’t have a BPro account?  Why not just comment there and link to the Vegas Watch and THT articles showing the various accuracy of projections?


#2    Wells      (see all posts) 2010/02/01 (Mon) @ 00:00

I don’t think BP should delete that message- if someone wants to defend Chone and call BS on the poster, they are free to do that and the conversation will stand as an exchange. Seems legit to me. The guy’s not being inflammatory or anything.


#3    J. Cross      (see all posts) 2010/02/01 (Mon) @ 00:01

Chone definitely did better than pecota for the 438 2009 players we looked at (data forthcoming).  Of course, so did everyone.


#4    J. Cross      (see all posts) 2010/02/01 (Mon) @ 00:02

That should say “hitters” instead of “players.” I’ll get back to you on pitchers later.


#5    Tangotiger      (see all posts) 2010/02/01 (Mon) @ 02:00

Wells: I didn’t mean to remove it.  I meant for the BPro staff to respond to it.

Nick: no account.


#6    MGL      (see all posts) 2010/02/01 (Mon) @ 02:23

I agree that we don’t need opinions on factual matters, but one of the problems with this issue and many others, is the wording.  There are many ways to evaluate projection systems.  So to say that how “good” a system is is a matter of record (fact), is not a very accurate statement. The answer to that question depends on how you define “good” and I submit that there is no one definition as it pertains to the issue of evaluating projection systems.

In general, since the word “good” is a qualitative word, people are entitled to their opinions as to what is “good” or not, even when some people think that “good” or “bad” is a matter of record. 

Let’s say that someone has the perfect projection system in that God came down (or up, whichever the case may be, and whichever God you happen to worship) and told them everyone’s true talent rate numbers.  Well, every player will fluctuate in performance randomly around that number of course.  Someone might look at the projections and see that 1/3 are off by one SD, 10% are off by 1.5 SD, etc., and conclude that “in their opinion” these projections are not “good,” and they would be entitled to that opinion, not knowing, of course, much about statistics, projections, and the like.
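
As a quick check on those figures, here is a minimal sketch that assumes the random fluctuation around true talent is roughly normal (an assumption; the comment does not name a distribution). Under that assumption, even a perfect projection leaves about 32% of players more than one SD off and about 13% more than 1.5 SD off.

```python
import math

def share_beyond(k_sd: float) -> float:
    """Two-sided share of a normal distribution lying more than k_sd SDs from its mean."""
    # P(|Z| > k) = erfc(k / sqrt(2)) for a standard normal Z.
    return math.erfc(k_sd / math.sqrt(2))

for k in (1.0, 1.5, 2.0):
    print(f"more than {k} SD off: {share_beyond(k):.1%}")
```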

So, basically I am saying that let’s be cognizant about how the questions are framed or how we interpret someone’s answers before we declare that the “true” answer is a matter of fact or record.

A favorite of mine in this regard is when Bill O’Reilly used to bait liberals on his show by asking them either, “Do you think we are winning the war in Iraq?” or “Do you want us to win the war in Iraq?” Of course, those are not really answerable questions because one person’s definition of “win” may be completely different from another’s, and both questions are stupid in general.  When the person being interviewed would duck or refuse to answer the question, and rightfully so, O’Reilly would yell at them, “Just answer the question, ‘yes or no’!” Kind of like, “When did you stop beating your wife - just give me a date and nothing more!”


#7    Nick Steiner      (see all posts) 2010/02/01 (Mon) @ 02:36

Ditto MGL/6

I’ve had this argument many times.  Projection systems can’t be “accurate”, it isn’t what they are trying to do.  They are trying to nail down true talent level essentially (some projections, like Oliver IIRC, are park neutral).  Since players ALWAYS perform differently than their true talent level, it’s impossible for a projection system to be “accurate” if the thing you are measuring it against is the projected season’s stats.


#8    MGL      (see all posts) 2010/02/01 (Mon) @ 05:44

We can certainly use some statistical techniques to evaluate the various projection systems for “accuracy,” but the results will still depend on the technique used, the data used, the filters applied, and of course plain old luck, as we are usually working with one or a few seasons at the most.  And when all is said and done, one person’s accuracy (say, lowest RMS error for all players weighted by actual playing time) is not necessarily going to be equivalent to another person’s accuracy (say, number of players “nailed” versus number of players “punted”).

So, again, while I agree with and think about the premise all the time (the folly of having an “opinion” on a factual matter), I think that the “accuracy” of a projection system is a poor example of something that is “factual.”
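
To make that concrete, here is a small sketch in which two made-up systems swap places depending on the yardstick. The data, the function names, and the `nail` / `punt` thresholds are all hypothetical, chosen only for illustration.

```python
import math

def weighted_rmse(proj, actual, pa):
    """Root-mean-square error, each player weighted by actual playing time."""
    total = sum(pa)
    return math.sqrt(sum(w * (p - a) ** 2 for p, a, w in zip(proj, actual, pa)) / total)

def nailed_minus_punted(proj, actual, nail=0.010, punt=0.040):
    """Players within `nail` of actual, minus players missed by more than `punt`."""
    errs = [abs(p - a) for p, a in zip(proj, actual)]
    return sum(e <= nail for e in errs) - sum(e > punt for e in errs)

# Made-up rate stats for four hitters; not real projections.
actual = [0.350, 0.320, 0.300, 0.280]
pa = [600, 550, 500, 300]
sys_a = [0.330, 0.340, 0.320, 0.300]  # every player missed by .020
sys_b = [0.350, 0.320, 0.300, 0.340]  # three dead on, one missed by .060

print("RMSE          A:", weighted_rmse(sys_a, actual, pa), "B:", weighted_rmse(sys_b, actual, pa))
print("nailed-punted A:", nailed_minus_punted(sys_a, actual), "B:", nailed_minus_punted(sys_b, actual))
```

System A comes out ahead on weighted RMSE while system B comes out ahead on nailed-minus-punted, so "which is more accurate" depends on the definition picked.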


#9    Vic Ferrari      (see all posts) 2010/02/01 (Mon) @ 08:58

MGL

Regarding your last post, I think that this is a terrific point.

And specifically regarding forecasting systems, all of them are regressing far too much. It seems to be a means of correction for applying ability distributions that are incorrect in every form, mostly the thinness of the tails, in some cases the skew.

Marcel is the only one that is naked as far as I know, and if I’ve read it correctly, it implicitly assumes a Gaussian ability distribution for batters that is near enough a beta form P = p^(Kx-1)*(p-1)^(K(x-1)-1) with x being the league average and the measure of ability: K=600 always and PA as the denominator.  That’s attributing an awfully wide range of abilities equally to every aspect of hitting.  Probably not far off on BABIP with IP as the denominator ... even then.

If you took Marcel’s implied math for regression, and built a likelihood distribution for each player in one season for any stat (say strikeouts per PA, or anything so long as PA was the base measure for the sake of consistency).  Then compared their results the next year ... their position in the cumulative density distribution of the forecasted is what you would want to grab.

So for example you would look at Bobby Abreu, see he had 120 strikeouts in 580 PA (all numbers made up) ... and compare that to the likelihood distribution that Marcel made using one year of previous results and no age adjustment (Marcel is doing precisely that, he just doesn’t say it aloud) and you see there was only a 2% chance of ending up so badly for him if marcel’s fundamental reasoning was correct.  The same for every other player of course.

Plot that out as a histogram (how many were in the 0-5% range of expected by the individual player ability, how many in the 5-10% range, etc.): a beta order test.  Simple as beans and naive as hell.  IMO that’s a decent first-step measure of the veracity of the presumed ability distribution.  The concavity gives an indication of the correction needed; the extent to which it is skewed left is largely a measure of the ‘survival of the luckiest’ phenomenon you explained in a THT post in December.  Run that for players with relatively few PA and the effect is profound I’m sure.  For older players it should skew the other way a touch.  It does in hockey, and managers be managers, and human nature be human nature.

The ability distribution should be generally unaffected by underlying sample size, so using data from even numbered games to odd numbered games in the same season should completely eliminate the skew (which seems to have been balanced somewhat in Marcel by understating the prime age of hitters, though that’s just my sense of it). As well as park effects, aging, injury and the like, all that gets essentially neutralized.

If someone has the data for odd and even numbered games handy you could write a script that uses Jim Albert’s estimates of the beta constants for each batter ability (from a ‘By The Numbers’ article).  That will just be a few lines of code I think. They will have much the same problem as marcel (and presumably the other forecasters, which surely make precisely the same assumption), but it will be closer.  And if I’m right, the skew should disappear, but we will still have far too many overachievers and underachievers, which will suggest a serious problem with the ability model.  And give us some ideas for when we go back to the drawing board.  Rinse and repeat.

Makes sense, no?


#10    Vic Ferrari      (see all posts) 2010/02/01 (Mon) @ 09:04

The third paragraph should read:  “That’s attributing an awfully NARROW range of abilities ...”


#11    jm      (see all posts) 2010/02/01 (Mon) @ 09:50

I’m not sure that the BPro staff have the responsibility to be arbiters of sound critical reasoning skills in comment sections.  Perhaps if they usually did, but chose not to when a competing projection system were called into question, then this would be morally questionable.  Given the rating system installed in BPro’s comments, clearly the design is for the readers to regulate their own content (which also matches BPro’s hesitancy to add these kinds of discussion areas).


#12    Tangotiger      (see all posts) 2010/02/01 (Mon) @ 10:12

BPro said they will “lead the discussion”. 

***

And who said anything about sound critical reasoning?  And to the earlier comments about the “accuracy” of the forecasting systems.

Look at exactly what the reader said: “Personally, I’ve never found the Chone projections at all accurate.”

There’s no ambiguity there, nor can you even test it for it to be close to accurate.  “never found”… I mean, not even ONCE did you find it accurate?  Look at the Vegas Watch rankings where Chone and MGL are neck-and-neck.

What if I said: “I’ve never found MGL to be at all accurate”.  I mean, if you are going to be fair, you have to concede that sometimes he knows his sh!t.

The reader couldn’t be more dismissive.

My point isn’t that the reader should have done any kind of critical comparison and said something about how Chone is somewhere between the 40th and 100th percentile in forecasting systems, or whatnot.  His statement can only be interpreted as saying that it’s definitely in the bottom half.

And that’s bullsh!t.


#13    Wells      (see all posts) 2010/02/01 (Mon) @ 10:15

You could lodge some complaint here that BP has some nefarious plot afoot to promote PECOTA and to make Chone (and others) look inferior. So maybe BP had some ringer make the comment.


#14    jm      (see all posts) 2010/02/01 (Mon) @ 11:31

“Leading the discussion” is far too vague to support any interesting generalizations here.  The issue remains whether or not there is an independent normative argument that BPro has a responsibility to respond to these kinds of statements.  That phrase could mean a hands-on interaction with all comments, it could mean pushing topics by writing articles, etc.

The question here is whether BPro should respond to these kinds of comments each time they come up.  Well, one issue would be the sort of nefarious plot Wells mentioned - such as if BPro did go after other unsupported comments, but not when the comments made BPro look good.  I’m not seeing the evidence for that.  So instead, it would seem to be the more general claim I began this paragraph with.  Alright, well there are two strategies.  One is to argue that they have taken that on themselves (“leading the discussion”) and are failing in it.  The second is that they have the responsibility regardless.  I’m not sure why the second would hold, and would need an argument for it.  As to the first, that phrase is far too vague, and we’d need independent reason to limit its interpretation in a way that does not beg the question.

Regarding the point about “sound critical reasoning,” I’m not sure why that’s not a fair interpretation of your point.  Now, I think your point about “never” is at least instructive on why we may differ here.  I did not read “never” as a negated existential quantifier.  Rather, I read it as loose speak, a fairly common turn of phrase where the intended meaning is something more like “very rarely.” Even still, let’s follow both.

If we take the claim to be there is no case, such that in that case I found CHONE to be accurate, then the problem is likely that the poster is simply speaking falsely, or that he is using an extremely limited set of evidence (one could imagine a fantasy player looking at a few key projections and those projections being more inaccurate than others).  I interpreted him in a different way in order to avoid this first possibility, by the principle of charity.  If we take the claim to be that he rarely, if ever, has found it to be helpful, then again, one must question the evidence he brought to bear.  Indeed, I took “personally” to suggest exactly that - he has not looked at a wide array of evidence to ground his opinion.  That’s a fairly obvious case of “sound critical thinking.”

So, perhaps the claim is simply false.  Alright, well then the issue is whether BPro has a responsibility to regulate false claims.  Suppose instead it is true - that this poster has never (on either interpretation) actually found CHONE to be very accurate.  In that case, the problem is that he presumably wants to generalize to the power of CHONE (and not just keep it limited to a statement about his own experiences).  In that case, he is displaying poor critical reasoning skills, as he is reasoning on insufficient evidence.

So you can reject my “sound critical reasoning” description all you want - but doing so limits you to a rather narrow interpretation of what the poster actually said.  My phrasing covers both a narrow and broader interpretation.


#15    Vic Ferrari      (see all posts) 2010/02/01 (Mon) @ 11:45

Hrmm.  That would be more difficult than I thought.  It seemed simpler in my head, turns out it leads to big math.  Either that or iterative methods or Bernoulli trials.

To test a forecasting system against the universe, as opposed to testing it against other forecasting systems ... you could simply take the forecasted rate (p1) for a player and calculate the simple binomial likelihood that a guy got next season’s results (k successes on n trials) with an underlying ability of p1 or less. 

Do the same for everyone and churn out a stem plot or histogram. 

That doesn’t seem quite like cricket to me, though.  Surely someone knows better than me, but I would think that using the full forecasted rate likelihood distribution, instead of just its average or mode, would yield a different result, especially if the forecasted rate distribution was irregular.  I dunno.
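
One reading of the simple version above, sketched under assumptions: treat the forecasted rate p1 as the player's true rate, compute the binomial probability of seeing k or fewer successes in n trials, and bin those probabilities across players. The function names and the player tuples are hypothetical, and the numbers are made up.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed directly from the pmf."""
    # Fine for season-sized n (a few hundred PA); very large n would overflow the float conversion.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def percentile_histogram(players, bins: int = 10):
    """Count how many players land in each equal-width slice of [0, 1]."""
    counts = [0] * bins
    for forecast_rate, successes, trials in players:
        u = binom_cdf(successes, trials, forecast_rate)
        counts[min(int(u * bins), bins - 1)] += 1
    return counts

# (forecasted rate p1, next-season successes k, trials n); all made up.
players = [(0.20, 120, 580), (0.18, 95, 600), (0.25, 130, 450), (0.15, 60, 400)]
print(percentile_histogram(players))
```

If the forecasts and the binomial assumption both held, the histogram should come out roughly flat over many players (only roughly, since the binomial is discrete). Piling up at both ends would point to forecasts that are too tightly bunched; piling up in the middle would point the other way.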


#16    Rally      (see all posts) 2010/02/01 (Mon) @ 12:03

Seems like an offhand comment by one person, who almost certainly has no facts to back it up.  Maybe he looked at a few projections I did and didn’t like them.  I certainly doubt he’s looked at the whole set and done anything systematically here.

As for BPro, somebody with an account feel like dropping in “Personally, I’ve never found PECOTA to be all that accurate” and see what happens?  Seems like a good test case.

If the guy wants to think he’s getting something special and better for the money he spends on PECOTA, who are we to stop him?  It’s morally irresponsible to allow fools to keep their money.


#17    Rally      (see all posts) 2010/02/01 (Mon) @ 12:14

I could probably make a lot more money off my site if I switched my product pricing.  The website is free to all, but I charge for the WAR data downloads, yet give away spreadsheet projections for free.

The market of people who want to pay for large data files and have enough interest in what happened more than a season ago isn’t that big (though much appreciated), especially compared to the fantasy baseball market.

I charge the way I do because I think the historical database is a unique offering and worth it to those interested.  I don’t charge for projections because, frankly, I don’t think they are worth it and would never pay for them myself.

My guess is most people would just use ZIPS or CAIRO if I charged (as far as I know people look at several sources of data to inform them anyway), but enough people might still buy CHONE to give me some extra cash.  But I would not have any respect for my customers.  I would look down on them as fools.  So better not to charge at all.

For some of the things I don’t do beyond projections, like customizable league values, there’s always Last Player Picked.


#18    Rally      (see all posts) 2010/02/01 (Mon) @ 12:21

Vic,

Your post may be a little over my head, but what do you think of this:

Say I take all players projected to hit above average, say .300 to .310, and found that their actual averages next year averaged about .305.

Take all the guys expected to homer in 7-8% of their at bats.  Then we should expect next year for that group to average 7.5%.

If that test works out, would you still think all projections are over-regressing?
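
The bucket test described above might be sketched like this. The function name, the bucket width, and the five data points are hypothetical; a fuller version would probably weight each player by AB rather than take simple averages.

```python
import math
from collections import defaultdict

def calibration_by_bucket(projected, actual, width=0.010):
    """Group players by projected value; compare mean projection to mean outcome per group."""
    buckets = defaultdict(list)
    for p, a in zip(projected, actual):
        # round() guards against float noise right at a bucket edge.
        buckets[math.floor(round(p / width, 6))].append((p, a))
    rows = []
    for key in sorted(buckets):
        pairs = buckets[key]
        mean_proj = sum(p for p, _ in pairs) / len(pairs)
        mean_act = sum(a for _, a in pairs) / len(pairs)
        rows.append((round(key * width, 3), len(pairs), mean_proj, mean_act))
    return rows

# Made-up batting averages: projection, then next-season result.
projected = [0.302, 0.305, 0.308, 0.281, 0.288]
actual = [0.290, 0.315, 0.310, 0.270, 0.295]
for low, n, mp, ma in calibration_by_bucket(projected, actual):
    print(f"bucket {low:.3f}+  n={n}  projected {mp:.4f}  actual {ma:.4f}")
```

If the high buckets consistently beat their projections and the low buckets consistently fall short, that would be a sign of over-regression; if they match, it would not.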


#19    Tangotiger      (see all posts) 2010/02/01 (Mon) @ 12:29

As for BPro, somebody with an account feel like dropping in “Personally, I’ve never found PECOTA to be all that accurate” and see what happens?  Seems like a good test case.

Good point.  I would guess that the poster would get a “rating -5”, if he posts early enough.


#20    Rally      (see all posts) 2010/02/01 (Mon) @ 15:19

Probably worse than that.  The subject is homegrown right sides of infields.  Some guy mentions Howie Kendry Morales and gets blasted down.  Seems like everything about the Angels’ existence is an inconvenient truth to the people who write and comment over there.


#21          (see all posts) 2010/02/01 (Mon) @ 15:24

Vic,
It’s confusing where you mean rates and where you mean counts. Is K(x-1) a mistake for K(1-x) where x is a rate?

“K=600 always and PA as the denominator.” Does the latter mean that x is a rate in [0,1] calculated from a count per PA?
Is there a subclass of projections that model only the rates and give everyone 600 PA? Do you propose to assess all projections on that common ground?

At one point it seems you may have in mind a set of one-side null hypothesis tests for all of the multinomial categories {strikeout, walk, BIPout, single, double, triple, homerun [stop here for convenience]}. That is, test every multinomial rate on one side? We know that every projection will overshoot some rates and undershoot some. When a walk rate is 20% rather than projected 16%, the strikeout rate (among others) is more likely to undershoot.

By the way did Albert suggest this test in some particular context such as fitting/predicting everyone’s batting average?


#22          (see all posts) 2010/02/01 (Mon) @ 15:37

Vic’s hope to assess distributions along with or instead of point estimates is sensible.

This reminds me that Tom Tango somewhere describes the reliability component of Marcel’s estimate and recommends that other projectors should quantify the uncertainty too. That is a great point. It may go back to the 2004 explanation of Marcel which I read last fortnight, but I feel sure that it remains relevant.

Vic hopes to assess distributions without using any distributional component of the projections except the point estimates. A binomial and beta model answers the call (maybe; see my preceding note). Where the projection includes a distributional component, that should be part of the assessment, and to assess it well may encourage others to provide it. That applies to the tiny sabermetric market, not the big fantasy market.


#23    Vic Ferrari      (see all posts) 2010/02/01 (Mon) @ 15:44

rally

I’d say that you would be pretty close if that was the case, just my sense of it.  How well the pattern of forecast results shifted from year to year would be a good tell as well.

And sorry for being spectacularly unclear in my post above.  I managed to express a very simple idea in the most muddled way imaginable.

What I was trying to say:

If we had a player who had 150 hits in 500 AB in a league we knew absolutely nothing about, some foreign high school level league.  Then if the question was “what are the odds of him being a .325 hitter by ability, but just unlucky this year?” ... well we could figure that out pretty easily, just a binomial probability.

And we could plot that out for every other eventuality and get a likelihood distribution for his true ability.  Just based on what we know.

Then if God came down and told us that for players of this type, in this far off league ... talent is distributed in the form of a beta distribution with K=240, and the league average for this type of hitter in this league is .270.

Well then we could just multiply our naive binomial likelihood distribution by the population ability distribution and voila ... that’s our new ability likelihood.

It would have been a kind God as well, because multiplying a beta by a binomial likelihood distribution gives you a simple distribution.  And the mean of that distribution distills down to (H + 240*.270)/(AB + 240).

This is precisely what Marcel is doing if he has just one season of prior batting results.  He’s using PA in the denominator in all cases as I understand him.  Of course marcel implements simple yet effective adjustments for data from 2 and 3 years previous, and for age as well.  And doubtlessly they are catching a mixed bag of other meaningful information, and in a simple way.

I’m dead impressed with marcel.  My mental arithmetic let me down when I calced marcel’s K at 600 above.  1200 / 5 = 240.  D’Oh!
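
The arithmetic in the example above can be checked two ways: with the closed-form mean (H + K*mu)/(AB + K), and by brute-force multiplying a beta prior by the binomial likelihood on a grid. This is a sketch; the function names are made up, and the beta prior is taken to have parameters K*mu and K*(1-mu), as described elsewhere in the thread.

```python
import math

def regressed_rate(successes, trials, league_rate, k):
    """Closed-form posterior mean: (H + K*mu) / (AB + K)."""
    return (successes + k * league_rate) / (trials + k)

def grid_posterior_mean(successes, trials, league_rate, k, steps=20000):
    """Same quantity numerically: prior density times binomial likelihood on a grid."""
    a, b = k * league_rate, k * (1 - league_rate)
    grid = [(i + 0.5) / steps for i in range(steps)]  # midpoints avoid log(0)
    log_w = [
        (a - 1 + successes) * math.log(p) + (b - 1 + trials - successes) * math.log(1 - p)
        for p in grid
    ]
    top = max(log_w)  # shift before exponentiating to avoid underflow
    w = [math.exp(v - top) for v in log_w]
    return sum(p * wi for p, wi in zip(grid, w)) / sum(w)

# 150 hits in 500 AB, league average .270, K = 240.
closed = regressed_rate(150, 500, 0.270, 240)
numeric = grid_posterior_mean(150, 500, 0.270, 240)
print(round(closed, 4), round(numeric, 4))
```

Both routes give about .290 for the .300 hitter, i.e. the observed average pulled roughly a third of the way back toward .270.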

The assumption of ability being distributed in beta form is bloody unlikely.  The reason that this is assumed by people doing bayesian math is because it makes everything distill down to very simple math.  And the chance of all elements of hitting having the same population ability distribution is exactly zero.  Yet marcel does well.

I have no quibbles with any of the other forecast metrics either.  I think it’s terrific that so many people are putting them out.  I’m struggling to find the underlying math for anything other than marcel though, google hasn’t been my friend.  Is CHONE your metric, rally?

I’m not interested in creating a forecast, evaluating one, or even using one.  I would like to see how they are built, though.  I am interested in refined ability distributions, and there isn’t a chance in hell that God made any of them beta.  I can’t afford to be off by much on that or there is an ugly ripple effect.  There are some practical considerations re the ability distributions that are pointed out pretty well by Brad Null in a paper discussed here earlier this month.  People who know the game better than me, and have been kicking at the numbers for a while ... if they’re switched on they’re going to be right on a lot of that stuff, whether it ends up captured by their math or not.

I suspect that CHONE marginally but consistently outperforms marcel at SO, BB, BA, 2B/PA ... hell, at all of the component results, but that it is about the same at slugging%, OBP and wOBA.  Is that right, rally?

It would take nothing for Tangotiger to correct for that, somewhere Albert, Null or someone of the sort has a list of beta K’s for each element of hitting ... just multiply by 5 and use instead of 1200.  I think it would be a shame to change anything about the marcel forecast, though.  The simplicity is its best quality IMO.  Doesn’t give me what I’m after, but it’s impressive.


#24    Vic Ferrari      (see all posts) 2010/02/01 (Mon) @ 16:11

Paul said:
“It’s confusing where you mean rates and where you mean counts. Is K(x-1) a mistake for K(1-x) where x is a rate?”

Yes.

And sorry for the generally confusing nature of my first post.

And I don’t think that Albert did propose anything like that on this subject.  And like Null or McCracken, he’s working with a chain of events IIRC, so a lot of his math won’t be directly applicable.  Though, like marcel, he is using a beta distribution for ability in everything I’ve read.

He has used order tests of various sorts I’m sure.  All the Bayesian guys seem to dig those for some reason.  There is a terrific Albert article on streakiness in a stats journal somewhere, I don’t know how old it is, I read it last year I think.  In any case, there are a couple different flavours of order tests in there, all are wholly sensible imo.  I gravitate towards simple tests, though, the test gets too big and I quickly start losing track of the nuts and bolts it’s built from.

Having said all that, if none of the other forecast systems are open source, it’s a moot point.


#25    Tangotiger      (see all posts) 2010/02/01 (Mon) @ 16:33

Vic, right, I have done that.  For K per PA for example, I weight the most recent seasons far more than past seasons.  Instead of 5/4/3, it’s more like 6/4/2 or 7/3/1 or something.  Basically, the K per PA changes we see by age actually are more indicative of change by age than the changes we see with BABIP.  I think Brian Cartwright published his component numbers somewhere.

But, I DON’T want to do this for Marcel.  That’s because the point of Marcel is:
1. to be the minimum level of quality that one should accept
2. the minimum point that all forecasting systems would agree to, and therefore build from

Marcel says three things:
1. three years of data; if a better system were to come along, it better use at least three years, and definitely not to try to swing it with only 2; and more recent seasons weighted more, so again, the better forecasting system will try to figure out how much

2. the aging curve: marcel says that the growth up is twice as steep as the decline down; again, there’s no point for Marcel to try to figure out exactly what it should be… just a general sense; let everyone else figure it out

3. regression: treat the data as sample, not true; probably the one thing Marcel understands better than most; again, as noted in #1, some data needs more regression than others, so let others try to figure it out

If on top of the three basics, you want to add more, like comparable players, minor leagues, parks, etc, go for it.
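
For readers who want to see the three pieces side by side, here is a rough sketch, not Marcel's actual code. The 5/4/3 weights and the 1200 PA of regression come from this thread; the function names, the peak age of 29, and the .006 / .003 per-year steps are assumptions used only to illustrate "growth up is twice as steep as the decline down", and whether the age adjustment is additive or multiplicative is also assumed here.

```python
def marcel_like_rate(seasons, league_rate, weights=(5, 4, 3), regress_pa=1200):
    """Weighted, regressed rate. `seasons` is [(events, pa), ...], most recent first."""
    num = sum(w * ev for w, (ev, _) in zip(weights, seasons))
    den = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    # Adding regress_pa of league-average performance treats the data as a sample.
    return (num + regress_pa * league_rate) / (den + regress_pa)

def age_adjust(rate, age, peak_age=29, up=0.006, down=0.003):
    """Hypothetical multiplicative age nudge: rise before peak_age is twice the decline after."""
    step = up if age < peak_age else down
    return rate * (1 + (peak_age - age) * step)

# Made-up hitter: hits and AB for the last three seasons, most recent first.
seasons = [(150, 500), (140, 520), (120, 480)]
base = marcel_like_rate(seasons, league_rate=0.270)
print(round(base, 4), round(age_adjust(base, age=26), 4))
```

With a single season of data, the first function reduces to (H + 240*mu)/(PA + 240), which matches Vic's description in #23.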

Marcel is there to act at least as a sanity check.  And in actuality, an ego check.

It’s like when Houdini exposed “magicians”.


#26          (see all posts) 2010/02/02 (Tue) @ 01:59

Tango, a quick question re: weighting for projections.

How do you determine when to change the weights for the past n number of years and how much to change them? (e.g. why 6/4/2 or 7/3/1 and why?)


#27    Shhh      (see all posts) 2010/02/02 (Tue) @ 16:11

http://baseballprospectus.com/unfiltered/index.php?type=1&p=1498#47523

“CHONE is down on everybody. CHONE seems to predict going back to the dead-ball era every year.”

Same commenter “Chris Perry” knocks CHONE again.


#28    Tangotiger      (see all posts) 2010/02/02 (Tue) @ 17:22

Can’t someone educate him?  Please post this link

http://www.hardballtimes.com/main/article/forecasting-2006/

And this snippet:

The highest forecasted RBIs were 112 (Tejada), 110 (Pujols), and 108 (Ortiz). What is this, the 1980s? If you had wanted me to only forecast RBIs, and not tell you who would do it, I would have said 150. Why would I give a number like that? Because from 2001 to 2004, the four highest RBI totals were 160, 150, 146, 145. It would therefore be reasonable to think that the league leader will be around 150. The league leader in 2005 had 148 RBI. So, I would have been pretty close, as an over/under.

But, how sure could I have been that it would be Ortiz? You could come up with a reasonable list of 15 or 20 players that would lead the league in RBI. But, that’s not what we are trying to figure out. We are trying to come up with reasonable over/unders, numbers that you could find equal reasons where the player will over-perform and under-perform. Injuries, as we know with Bonds, can devastate any forecast.


#29    Rally      (see all posts) 2010/02/02 (Tue) @ 22:06

When I put together my hitter projections at the team level, assigning playing time, and then did the same for pitchers, I was off by 6 runs per team.  That is, my team hitters were projected to score 6 more runs than the team pitchers.  A simple adjustment gets all the teams to add up to .500.

If my system is lower on hitters than others, and higher on ERA for pitchers (I certainly notice this comparing to Bill James on Fangraphs), then other systems will be projecting something like 775 runs for hitters and 725 for pitchers.  So you need a much bigger adjustment to get your league to .500.

I had hitters at 745 runs scored, pitchers 739 allowed.  So I just added 6 to each pitching staff to get to .500.
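
The adjustment amounts to forcing league runs scored to equal league runs allowed. A tiny sketch with three made-up teams that average 745 scored and 739 allowed (the function name and team totals are hypothetical):

```python
def balance_runs(scored, allowed):
    """Shift every team's runs allowed by the same amount so league RS equals league RA."""
    gap = (sum(scored) - sum(allowed)) / len(scored)
    return gap, [ra + gap for ra in allowed]

scored = [800, 745, 690]   # averages 745 per team
allowed = [700, 739, 778]  # averages 739 per team
gap, adjusted = balance_runs(scored, allowed)
print(gap, adjusted)
```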

Last season MLB teams averaged 747 runs.  If that’s back to the deadball era, guilty as charged.

Could somebody point ‘Chris Perry’ to this thread?  I’m not going to pay money to respond to his hit and runs.  He can come here, or to my site if he wants a discussion.


#30    SG      (see all posts) 2010/02/02 (Tue) @ 22:30

Rally, my advice to you is just ignore him.  I just wasted far too much time yesterday responding to a troll at Fangraphs who accused me of developing CAIRO strictly to give the Yankees optimistic projections.  I’m not quite sure what purpose that would serve, and when I asked him to tell me what the point of that would be he just ignored it.

Once these types of people get something in their head, you’re not going to change it.  They’re not interested in a reasonable discussion.


#31    Wells      (see all posts) 2010/02/02 (Tue) @ 22:34

So, uh, Rally = Sean Smith, right? Just to be on the up and up here.


#32    MGL      (see all posts) 2010/02/02 (Tue) @ 23:25

SG, I read that thread on FG and you REALLY wasted your time responding to an obvious troll.  There are 2 things you can do with a troll:  One, ignore him, and two, make some sarcastic or snarky response to him.  Trying to engage a troll in a worthwhile and rational discussion is like trying to discuss politics with a hamster.


#33    Colin Wyers      (see all posts) 2010/02/02 (Tue) @ 23:40

As Lincoln probably never said, but should have: “I learned long ago never to wrestle with a pig. You get dirty, and besides the pig likes it.”


#34    Vic Ferrari      (see all posts) 2010/02/03 (Wed) @ 06:24

Tom

I understand the idea and the history, and I’m sure you could get some sharp baseball fan to create a forecast without using consistent math at all, just applying his knowledge of baseball and appreciating randomness in a general sense.  It might take the guy a year or two to stop trying to overpredict breakout years and collapses ... but he’d get there soon enough.

I guess the thing I can’t get past is that Marcel always assumes an ability distribution of the beta form with K=240.  Here μ=the league average for the stat, α=Kμ and β=K(1-μ), and this is for the most recent season.  The bigger the K, the narrower the range in abilities of the players in the population.

And this for EVERY element of hitting. 

I think we can agree that’s not cricket.  And that is exactly what Marcel is doing.

And for two seasons prior K=300, for three seasons prior K=400.  So marcel says the inherent abilities of the population are shifting in a way that we surely all agree isn’t possible.

Marcel builds an ability likelihood distribution for each season and combines them.  Take the mean of that and you have the marcel forecast.

The player’s ability likelihood distribution for one season’s past data would be of the beta form with (using BB per PA as an example):
α=Kμ+BB
β=K(1-μ)+(PA-BB)
with K=240.
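
A minimal sketch of that one-season update, assuming the usual beta-binomial bookkeeping in which walks are added to α and non-walks (PA minus BB) are added to β. The player line (600 PA, 90 BB) and the league rate of 0.085 are made up for illustration; only K=240 comes from the comment.

```python
def beta_posterior(mu, k, pa, bb):
    """Return (alpha, beta) after updating the league prior with one season.

    Prior: alpha0 = k * mu, beta0 = k * (1 - mu).
    Data:  bb successes, (pa - bb) failures.
    """
    alpha = k * mu + bb
    beta = k * (1 - mu) + (pa - bb)
    return alpha, beta


def beta_mean(alpha, beta):
    """Mean of a beta distribution, used here as the forecast rate."""
    return alpha / (alpha + beta)


# Hypothetical inputs: league BB/PA of 0.085, a hitter with 90 BB in 600 PA.
alpha, beta = beta_posterior(mu=0.085, k=240, pa=600, bb=90)
forecast = beta_mean(alpha, beta)
print(f"alpha={alpha:.1f} beta={beta:.1f} forecast BB/PA={forecast:.4f}")
```

The forecast lands between the league rate and the observed 0.150; a bigger K pulls it closer to the league rate.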

And for the season prior the same, except K=300 and μ may have changed a smidge as well.  You may have changed PA and BB by weights, but marcel has changed K and used the actual PA and BB.

Since marcel believes that talent was distributed more tightly two years ago, that the players, on the whole, had a more similar ability to draw walks ... he reacts accordingly.

And for the season before that K=400.

Combine the three to get the marcel ability distribution for the player.  The mean of that is the marcel forecast, less the extra adjustment for age at the end (the principal adjustment is contained, ethereally, in marcel’s assertion that talent was distributed differently in past seasons).  Which, if I’ve read it correctly, is a sum, no?  So the ability distribution shifts right or left by .003 or .001, depending on the age.  If the final age adjustment is a multiplier ... then multiply.
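
For anyone wondering where 240, 300 and 400 come from: a rough sketch, assuming the commonly described Marcel recipe of weighting the last three seasons 5/4/3 and adding 1200 PA of league average. Under that assumption the three K values are just 1200 divided by each weight. The season lines and the single league rate below are hypothetical, and the age adjustment is left out.

```python
WEIGHTS = (5, 4, 3)     # assumed season weights, most recent season first
REGRESSION_PA = 1200    # assumed PA of league-average performance added


def marcel_rate(seasons, mu):
    """Weighted, regressed rate. seasons is [(pa, bb), ...], most recent first."""
    numerator = REGRESSION_PA * mu
    denominator = REGRESSION_PA
    for weight, (pa, bb) in zip(WEIGHTS, seasons):
        numerator += weight * bb
        denominator += weight * pa
    return numerator / denominator


# The K implied for each season when it is looked at on its own.
implied_k = [REGRESSION_PA / w for w in WEIGHTS]

# Hypothetical three-year walk record and league BB/PA.
seasons = [(600, 90), (550, 70), (500, 60)]
rate = marcel_rate(seasons, mu=0.085)
print(implied_k, round(rate, 4))
```

With only the most recent season supplied, this reduces to (BB + 240μ)/(PA + 240), which is the K=240 posterior mean described above.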

If marcel is right in his thinking, then he should be able to use his individual ability likelihood estimations to predict the number of guys who will have a huge number of BB’s after 200 PA.  Even though he wouldn’t have predicted any players at all to have that many, on an individual prediction basis.

Do that with marcel and it will become apparent that he has predicted a spread of results that is sometimes far wider than was observed in the actual MLB season, and other times far narrower.  Depending on what stat he was predicting.

As I say, my interest in forecast systems is tangential.  But the socialist in me would like to see open-source marcel, who is already impressive, do better.

I certainly regret dragging on this discussion, it doesn’t seem I’ve been understood, though.  But as I’ve started, on the off chance anyone is still reading this ramble ...

My point: trying to improve marcel by running a series of regressions to determine how much more we should be affronting god by forcing false ability distributions on past seasons ... wow, I can’t go there.

Using decent estimates of K for each type of stat (somewhere around 80 for BB/PA as used here, considerably less for SO/PA, probably a bit north of 500 for H/PA, and on and on).  That’s just easy and, I’m sure you’d agree, plain reasonable.  Makes the marcel math for forecast rates (ability likelihood mean) much easier as well.

Then don’t weight the seasons for PA, instead account for age effects by age adjusting each ability likelihood distribution before combining them.  You could do that by adjusting the BB’s from each past season, relative to the player’s current age, just using the chart from Null’s paper would surely be near as dammit.  No regression analysis need be applied, just simple logic.

That will make marcel a bit simpler, and he’ll forecast a wee bit better.  Still a bunch of naive assumptions in there, and lots of real world factors unaccounted for.  Most importantly though, it solidifies the foundations under marcel’s house, just in case somebody decides to add a second floor at some point.

And if you run the order test as I suggested above, marcel will do miles better than before.

Straightforward, no?  Again, I don’t mean this as a slam of marcel, not at all.


#35    tangotiger      (see all posts) 2010/02/03 (Wed) @ 08:31

Yes, when I am serious, I do my aging by components, and I weight each one appropriately.  It’s no longer Marcel however.  It’s “Chone” or “ZiPS” or “Tango’s forecasting system”.


#36          (see all posts) 2010/02/03 (Wed) @ 11:24

Ahh. My bad.


#37    Tangotiger      (see all posts) 2010/02/03 (Wed) @ 11:38

Vic, no not at all.  You have valid points, and I am by no means dismissing them.  It’s just that the points can’t apply to Marcel specifically, just to forecasting systems that aspire to be better than Marcel.


#38    Rally      (see all posts) 2010/02/03 (Wed) @ 14:16

Vic,

CHONE does use a different amount of regression for each of the abilities forecasted, less for strikeout rate, more for babip, the highest is for doubles.  And also different year weights for the stats.

A lot of the stuff you mention is a bit over my head.  It’s been a long time since I’ve had a statistics or econometrics course, and while that is my field, I find that in practice we use a few tools extensively, and the rest of statistical knowledge hardly at all.

I’ll probably not even use the right terms, but the Marcel (or CHONE with different coefficients) is an approximation of a binomial distribution? It seems this might work well for something like batting average, but not so well for HR rate.

If average HR would be 15 per season, and SD = 10, then we’ll have some players 2 SD from the mean hitting 35 per year, but you obviously can’t have anyone 2 SD below the mean.  But I don’t know if I should be doing something different to model that.


#39          (see all posts) 2010/02/03 (Wed) @ 17:39

Tango/28:

Thanks for posting that link. 

On another BPro comment thread this week, I was on one side of a protracted argument with another poster, whose main criticism of PECOTA was that its implied leaderboards were unrealistically narrow. 

I and others (I think Colin among them) tried arguing that the purpose of forecasts was to predict the true talent level; that it was silly to try and predict which individuals would specifically be outliers; and therefore that one would often expect to see less dispersion in the “forecast leaderboards” than in actual leaderboards.

Others, however, persisted in the view that if a forecast system doesn’t produce “reasonable looking” leaderboards, then it isn’t.....I dunno, isn’t interesting? 

It was a queer conversation.  I got bored with it.


#40    Tangotiger      (see all posts) 2010/02/03 (Wed) @ 17:53

I’d like to see that thread.  If you remember where it was, feel free to link to it.


#41    Colin Wyers      (see all posts) 2010/02/03 (Wed) @ 18:06

Here it is:

http://baseballprospectus.com/unfiltered/index.php?type=1&p=1495#47163

To put it succinctly: Projection systems do not give you the most likely outcome, but the outcome with the least expected error. (I posted that in a comment on Fangraphs and it actually got two thumbs down from readers, who knows why. It has been somewhat better received as a Tweet. I’m working on an article to explain it in a broader sense.)
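
A toy illustration of that distinction, assuming "error" means squared error (the comment does not say which error measure is meant). The outcome distribution is invented purely to be skewed; it is not taken from any projection system.

```python
# Hypothetical, made-up distribution of some count outcome for one player.
dist = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}


def expected_sq_error(guess, dist):
    """Expected squared error of a single point forecast."""
    return sum(p * (x - guess) ** 2 for x, p in dist.items())


mode = max(dist, key=dist.get)               # the single most likely outcome
mean = sum(x * p for x, p in dist.items())   # minimizes expected squared error

print("mode:", mode, "expected sq. error:", round(expected_sq_error(mode, dist), 3))
print("mean:", round(mean, 3), "expected sq. error:", round(expected_sq_error(mean, dist), 3))
```

The most likely outcome here is 0, but forecasting the mean gives the smaller expected squared error.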


#42          (see all posts) 2010/02/04 (Thu) @ 23:09

[afterthought: The linked BINOMIAL CALCULATOR does not permit adjustment of the scale of the display. Maybe there is a better implementation of the graph.]

38/Rally(Sean),

You asked, “the Marcel (or CHONE with different coefficients) is an approximation of a binomial distribution?”

Where ^x means “forecast x”,
the binomial discussion concerns forecast rates such as (^HR)/(^PA) and it concerns forecast counts such as ^HR given ^PA.

The beta part of the discussion concerns inferred true rates (parameters), not the forecasts per se, although it must be common to forecast ^HR by multiplying a forecast ^PA and an inferred supposedly true rate.

“It seems this might work well for something like batting average, but not so well for HR rate.

If average HR would be 15 per season, and SD = 10, then we’ll have some players 2 SD from the mean hitting 35 per year, but you obviously can’t have anyone 2 SD below the mean. But I don’t know if I should be doing something different to model that.”

Yes you should. This is not a valid criticism of binomial and beta modeling. First, “2 SD below the mean” is an idea from statistical modeling with normal distributions (call it normal statistics). It is derived from a version of the 95% convention that directs attention to the bottom 2.5% and top 2.5% of any distribution (maybe people, maybe errors in your forecasts). For the normal distribution “2 SD below the mean” is the bottom 2.5%.

Second, and more fundamentally, the separability of mean and SD (two parameters) is a feature of normal distributions --and others, but not binomial, whose SD is a function of mean and n. Here the crucial question is, why do you think you have mean 15 and SD 10?

Back to the first point:
The bottom 2.5% of a binomial count is some interval from 0 to h, which may be 0 to 0.

For example, suppose you know a player’s true triples rate 3B/PA=0.01 and you forecast ^PA=500, ^3B=5. Visit the binomial calculator at Texas A&M: http://www.stat.tamu.edu/~west/applets/binomialdemo.html
and select n=500; p=0.01; “at most” 1; calculate
... 0.0398
So “at most 1”, which is the interval [0,1], represents the bottom 4%. If you seek a lower tail smaller than 2.5% it’s zero alone or [0,0].

Try a few other values and see that the interval [1,10] for the binomial forecast 5 triples covers more than 95% with tails [0,0] and [11,500] both smaller than 2.5%. Indeed the intervals [1,9] and [2,10] both cover about 95% of the binomial outcomes with lower and upper tails about 1% and 4% or about 4% and 1% respectively.
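
If the applet is unavailable, the same tail probabilities can be checked with a few lines of code. This is only a sketch using the exact binomial formula with n=500 and p=0.01 from the example; the function names are my own.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k), i.e. the calculator's "at most k"."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n, p = 500, 0.01
at_most_1 = binom_cdf(1, n, p)                        # interval [0,1]
zero_only = binom_pmf(0, n, p)                        # interval [0,0]
cover_1_10 = binom_cdf(10, n, p) - zero_only          # interval [1,10]
upper_tail = 1.0 - binom_cdf(10, n, p)                # interval [11,500]

print(f"P(at most 1) = {at_most_1:.4f}")
print(f"P(0)         = {zero_only:.4f}")
print(f"P(1..10)     = {cover_1_10:.4f}")
print(f"P(11..500)   = {upper_tail:.4f}")
```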


#43          (see all posts) 2010/02/04 (Thu) @ 23:13

38/Rally, see the preceding note

34/Vic,
How do you compose the greek letters here?

“Using decent estimates of K for each type of stat (somewhere around 80 for BB/PA as used here, considerably less for SO/PA, probably a bit north of 500 for H/PA, and on and on).”

How are you estimating K decently, off the top of your head, I suppose?


#44    Fargo      (see all posts) 2010/02/05 (Fri) @ 01:57

Just a comment on the title of this thread.  It was allegedly Senator Daniel Patrick Moynihan who coined the phrase, “Everyone is entitled to their own opinion, but not to their own facts.”


#45          (see all posts) 2010/02/06 (Sat) @ 16:15

Vic #23
“The assumption of ability being distributed in beta form is bloody unlikely. The reason that this is assumed by people doing bayesian math is because it makes everything distill down to very simple math. And the chance of all elements of hitting having the same population ability distribution is exactly zero. Yet marcel does well.”

Of course the probability of precisely that is zero; in a continuous world the probability of anything precise is zero.

Why do you believe the beta family provides a poor model of the distribution of abilities (true rates) in the population of players?

Do you believe it models the empirical rates adequately, for example the actual BB/PA rates for every batter with 200 PA during the 1990 season?

#34
“the thing I can’t get past is that Marcel always assumes an ability distribution of the beta form”

Is this a general sermon about parametric models (such as the beta family of distributions)? Or do you mean that beta distributions of binomial rates among baseball batters work poorly compared with other distributions people use elsewhere? --eg, some people use some normal distributions in sabrmetrics


#46          (see all posts) 2010/02/07 (Sun) @ 09:25

Paul

To use Greek letters with windows go ...

Start -> All Programs -> Accessories -> System Tools -> Character Map

Someone on a hockey blog showed me how to do that the other day.  It turns out that Greek letters make all arguments more compelling, Paul.

I’m actually thinking of changing my internet pseudonym to ϴ³, just to make life more difficult for people who are arguing with me.  :D


#47          (see all posts) 2010/02/07 (Sun) @ 10:10

Rally

You’ve hit the nail squarely on the head with your last post, I think.

The argument of beta vs normal is a bit of a moot point, it’s near enough the same.  My ramble was just meant to deconstruct marcel to expose the underlying thinking.  This for discussion.  There is always a model, after all.  It just isn’t always obvious.

As you say, HR/PA ability (or thinking sequentially, HR/BABIP) is not distributed in anything close to a Gaussian/Beta form.  And that’s really inconvenient, the math becomes a bear.

Someone like Albert or Null is surely capable of that kind of math, basically mixing distribution types (perhaps gamma and beta forms, I dunno), and it’s going to get big and ugly in a hurry. 

The rest of us are left to use brute force methods.  And they are just now becoming practical as personal computer speeds improve.  Even now, imo, if you decide to go that route it’s better to get server time and upload scripts to run overnight on faster machines.


#48          (see all posts) 2010/02/07 (Sun) @ 10:24

Paul Wendt said:

“Why do you believe the beta family provides a poor model of the distribution of abilities (true rates) in the population of players?”

I think it is probably reasonably close in a lot of areas.  Batting average being a good example.

Just looking at the spread of results for HR, SO and BB though ... just no way.  Wildly skewed results.  It works out better for baseball than for hockey.  In baseball the low event items tend to be skewed right.  i.e. nature obliges the naive assumption to a certain extent.

In hockey it is often the polar opposite (e.g. goalie EV save%) so the problem becomes obvious much more quickly.


#49    Vic Ferrari      (see all posts) 2010/02/07 (Sun) @ 13:32

Paul

Hopefully you’re still reading this thread.  In any case my ballpark values for “K” come from the same thinking as ‘SOLVING DIPS’.

Basically take the variance of the observed and subtract the binomial variance (at a quick hash, I used the harmonic mean of the PA for everyone with over 200 PA) ... voila! ability variance estimate: K = μ(1-μ)/VAR(ability) - 1
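
A quick sketch of that back-of-envelope estimate. The five player lines are hypothetical, and using the unweighted mean of the player rates as μ is my assumption; the comment only specifies the harmonic mean of PA and the formula for K.

```python
from statistics import harmonic_mean, mean, pvariance


def estimate_k(rates, pas):
    """Estimate K from observed rates and plate appearances.

    observed variance - binomial variance = ability variance,
    then K = mu * (1 - mu) / ability variance - 1.
    """
    mu = mean(rates)                      # assumed stand-in for the league rate
    observed_var = pvariance(rates)
    typical_pa = harmonic_mean(pas)       # "at a quick hash", as in the comment
    binomial_var = mu * (1 - mu) / typical_pa
    ability_var = observed_var - binomial_var
    if ability_var <= 0:
        raise ValueError("observed spread is no wider than binomial noise")
    return mu * (1 - mu) / ability_var - 1


# Hypothetical BB/PA rates and PA totals for five hitters with 200+ PA.
rates = [0.05, 0.08, 0.10, 0.12, 0.15]
pas = [400, 500, 600, 550, 450]
k = estimate_k(rates, pas)
print(round(k, 1))
```

A real estimate would use every qualifying hitter in the league, not five invented lines; the point is only the order of the subtraction and the final formula.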

That’s not the point, though.  As ‘rally’ implies, nature will do exactly what nature wishes to do.

And by the by, someone with your math chops should really be deconstructing some of this wildly ethereal and nonsensical econometrics “math”, quotations well deserved.  You don’t have to dig too deep with that stuff before being overcome with toluene.  But that’s your obligation to the community, no?  Seriously.

There is no shame in having expertise in a subject matter.  And the fact that the econometrics math would have been executed the same way by fellows in Sri Lanka, Madagascar or Brisbane ... folks who had never, ever seen a baseball game, this isn’t necessarily a credit to the methodology.  The philosophers can argue with themselves, the gamblers expect a return on investment.  So in the spirit of this thread, predictive value should be the measure (with player level metrics ... all smart money on CHONE from here on out btw, not that it matters).


#50          (see all posts) 2010/02/07 (Sun) @ 13:53

Vic #49,
I am reading but the forum software is unfriendly to me. It disapproves many of my habits, such as delimiting quotations and k-tuples with angle brackets. It rejected a couple of my replies this morning.

Let me try a subset again.

Vic,
Thanks for the replies. I’ll try Greek letters someday soon.

#47 in reference to #38
“The argument of beta vs normal is a bit of a moot point, it’s near enough the same.”

Either way there is a problem with the description that mean HR rate is 15/500 and its SD is 10/500.


#51          (see all posts) 2010/02/07 (Sun) @ 14:23

That worked :)
I don’t live at this website but I did read #48 before 10:24 EST (local). Software rejected my substantial reply.

Regarding the broad theme, I am working on it a little, beginning last week.
Can we post images here? Or is that for Tango, Lichtman, Dolphin only?
Can someone point me to explanation what kinds of images the sabrmetric wiki accepts?

By the way, here are five of the friendlier wikipedia entries that pertain to estimation of a binomial probability including regression to its population mean or its stipulated prior mean. (for philosophers more than phorecasters)

WIKIPEDIA
Sunrise problem
Rule of succession
Pseudocount
Additive smoothing
Bayesian average

Beta distribution
- not so friendly but the grey curve in figure one depicts one beta distribution relevant to the discussion here. It represents the distribution of some binomial rate (centered at 20%) in some population. No distribution in this family adequately fits the homerun rates or triples rates among regular major league batters.


#52          (see all posts) 2010/02/07 (Sun) @ 14:35

In regards to home run rates:

Deconstructed as above, we can agree that Rally was inferring home run hitting ability onto the population, yes?

So he was saying, with the math, that there is a possibility that Pujols was just a lucky version of a light hitting 2nd baseman, and vice versa.  Now it becomes a vanishingly small possibility because of the difference in results, but as more light hitting middle infielders get thrown onto the pile ... Pujols’ expectation for ability gets unfairly damaged. 

The same general thinking applies to everyone else in the population, for better or worse.  We’re in agreement on that I think.

The sensible thing to do is to separate the populations.  We know that a lot of middle infielders esp, and catchers and CFers too ... they’re getting at bats because of the defensive work they do on the field.

I’ve never done it, but if you divorce the populations by fielding position and try again using the same CHONE math, what happens?

I suspect that you’ll find that the power position players become more accurate, and the defensive position players forecasts fall to crap.

My point, and I’m not expecting anyone to agree with me:  Pure mathematics is all well and good, but nature is leading us around by the nose.


#53          (see all posts) 2010/02/07 (Sun) @ 17:05

“we can agree that Rally was inferring home run hitting ability onto the population, yes?”

?? I think you mean to emphasize that the method stipulates ONE population and regresses every player-season batting record (every observation in a sense) to ONE population mean.

One might ask rhetorically, why exclude the NL pitchers from the population? (It’s easy to lose sight of that move because it may be a byproduct of imposing some threshold number of plate appearances.) Why not instead model the whole population of major league batters at once?

“The same general thinking applies to everyone else in the population, for better or worse.  We’re in agreement on that I think.

The sensible thing to do is to separate the populations.”

I suppose almost everyone would agree with the point of that rhetorical question. Subdividing the data may be a good move. It would be “insane” ;) to insist on a single-population model.

Of course subdivision improves the fit.

Regarding your bottom line, Vic, let me conjecture the opposite. The problem that mlb has too many never-homerun and many-homerun hitters will be remarkably replicated within the two divisions “power positions” and “other fielding positions”.

I don’t have a market to meet. (No 2010 fantasy draft market. I do need to market myself for a full-time job.) So I expect to work on a fairly poor model of the pass (BB+HB) and strikeout rates.

fairly *pure* model ;)


#54          (see all posts) 2010/02/09 (Tue) @ 12:48

rally:

Just to add.  If I’ve understood you properly, you will be able to improve your accuracy simply by adding 2*c to your denominator and 1*c to your numerator, this for every stat.  That will give you the beta mode instead of the mean.  (Again, I didn’t choose the beta form, just deconstructed marcel to reveal his reasoning).  And c is a constant that is probably equal to 1 for you, though I’m not dead clear on your model.

That will also give you a wider spread of results.  It seems almost all of these simple models are undershooting the Bayesian posterior variance, and just generally missing the form.  PECOTA maybe not quite so much.  I suppose that is largely because of the means chosen to test their veracity.

Of course ALL models will have narrower variance than actual, because they are projecting means or modes.  That is immediately and intuitively obvious to everyone, I’m sure.  But that’s not the point.

With these forecasting systems, I’m appreciating the power of the associated narratives.  There is some terrific writing on the subject.

Who would you consider to be the best sabermetric forecasters of MLB game line results, rally? Or anyone else still reading this thread, for that matter.  I have no intention of ever criticizing anyone who puts that information out there, directly or indirectly.  I’m just intrigued.  I’ll just lurk.

