Thursday, August 14, 2008
Bayes’ Theorem
Victor looks at Bayes’ Theorem for prospect valuation.
I talked about this at length in the Edgar thread, so let me reserve this thread for more generic and technical arguments and presentation.
Let’s say you have someone who has a .380(*) career wOBA in 10,400 PA (16 seasons of 650 PA). How many standard deviations (SD) is he from the league mean of .340? Answer: 8.0
(*)For those new around here, a .380 wOBA is the same thing as a .380 OBP, with a corresponding profile of SLG, something like .475 or so.
A guy with a .380 wOBA in 10,400 PA is roughly +36 wins above average (WAA) and 69 wins above replacement (WAR). This is around the discussion level of someone being a hall of famer.
Now, suppose someone has a .420 wOBA. How many seasons does he have to play in order for us to say that he is 8 standard deviations from the league mean of .340? Answer: 4 seasons. That gives him a WAR of 26 wins and a WAA of 18 wins.
And a wOBA of .460? A little under 2 seasons. And a wOBA of .500? Just one season, with an 11 WAR and +9 WAA. That is a Bonds-like or Pujols-like season at their best.
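The arithmetic behind those answers can be sketched in a few lines. The post never states the per-PA standard deviation of wOBA, so this sketch backs it out from the first example (.380 in 10,400 PA being 8.0 SD from .340), which works out to roughly 0.51. That value and the function names are assumptions for illustration, not figures given in the post.

```python
import math

LEAGUE_WOBA = 0.340
PA_PER_SEASON = 650

# Assumption: per-PA SD implied by ".380 in 10,400 PA is 8.0 SD from .340".
SD_PER_PA = (0.380 - LEAGUE_WOBA) * math.sqrt(10_400) / 8.0  # about 0.51


def sd_from_mean(woba, pa):
    """Number of SDs an observed wOBA sits from the league mean."""
    return (woba - LEAGUE_WOBA) / (SD_PER_PA / math.sqrt(pa))


def seasons_needed(woba, target_sd=8.0):
    """Seasons of 650 PA needed for a wOBA to sit target_sd SDs from the mean."""
    pa = (target_sd * SD_PER_PA / (woba - LEAGUE_WOBA)) ** 2
    return pa / PA_PER_SEASON


for woba in (0.420, 0.460, 0.500):
    print(f"{woba:.3f}: {seasons_needed(woba):.2f} seasons")
```

Under that assumption the required playing time falls with the square of the gap from .340: doubling the gap from .040 to .080 cuts 16 seasons to 4, and quadrupling it to .160 cuts it to 1.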
So, is that enough? Is being 8 standard deviations from the league mean enough for us to infer great talent from your observed performance?
I don’t know.
Now, let’s try asking: how far away are you from a .300 wOBA level, which is right close to replacement level. Here’s how that looks:
It’s the old college v high school draft, low ceiling v high ceiling. Phil’s post leads to this discussion.
I’ve got a two-year WPA list for batters involved in pennant races, broken out by age and time period (before and during the pressure-filled months).
As you can guess, nothing there. I have no doubt that there will be something there. (For example, in The Book, I noted that there is an age effect with a runner on 1B. Young hitters aren’t as smart in taking advantage of the hole.) However, whatever we find will be some isolated skillset, something that will be real, but in the grand scheme of things, it’s like finding a 50-foot tree in a forest of 30-foot trees. So, yes, it’s something real, it’s something noticeable, but when you’ve got a forest full of 30-foot trees, if you happen to find a 50-foot tree, it’s not like you’ve found a forest of 50-foot trees.
I’m good at data entry with the numeric keypad. Really really good. Or was anyway at one point. My fingers would fly over those numbers. But, when it came to typing words, and using the letters on the keyboard, I’d be average. If you gave me 20 papers to type, and 19 were for a lawyer and 1 was for an accountant, I’d fly on one of them. But, if all I get to do is expose my real skill 5% of the time, then won’t it be really hard to find that skill if you have 100 people’s results to look at, and you didn’t realize, or think to realize, that one paper might be filled with numbers? And even if you did think to find it, you realize, “eh… it’s real, but it comes into play so little… how the heck am I supposed to find it?”
Pizza talks about it all the time, and has a blog post on it. But, darned if I know what it’s actually doing. When I do my thing, I figure the z-score for the stat for each player (number of standard deviations, SD, from the mean), and then calculate the SD of the z-scores. The correlation is r = 1 - 1/SDzScore^2. So, if the SD of all the z-scores is 1.41, then r = .50. If it’s 2.0, then r = .75. I think this is what Pizza also does, and so, I guess I’m doing an intraclass correlation without even knowing it. Regardless, what I do seems sound. I like what Guy said in the comments in response to my comment:
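A minimal sketch of that calculation, assuming the stat is a binomial rate (so the expected noise for each player is sqrt(p(1-p)/n)) and using the population SD of the z-scores. Both choices, and all function names, are assumptions made for illustration; the post does not say which SD formula or noise model is used.

```python
import math
import statistics


def z_scores(players, league_rate):
    """z-score of each player's observed rate, assuming binomial noise.

    players: list of (successes, opportunities) pairs.
    """
    scores = []
    for successes, n in players:
        se = math.sqrt(league_rate * (1 - league_rate) / n)
        scores.append((successes / n - league_rate) / se)
    return scores


def r_from_z_sd(sd_z):
    """Correlation implied by the spread of z-scores: r = 1 - 1/SD^2."""
    return 1 - 1 / sd_z ** 2


def reliability(players, league_rate):
    """Chain the two steps: z-scores, then their SD, then r."""
    return r_from_z_sd(statistics.pstdev(z_scores(players, league_rate)))
```

If every player had the same true talent, the z-scores would have an SD near 1 and r would be near 0. An SD of 1.41 gives 1 - 1/2 = .50, and an SD of 2.0 gives 1 - 1/4 = .75.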
Here is a snippet from a BP article by Geoff Young about Adrian Gonzalez, the Padres slugging first sacker (I sound like a real baseball writer!):
So I decided to check out his age 25 stats (from 5/8/07 to 5/7/08) and see just how much he’d built on his success from the previous year. Using the same format from my earlier article, and with the help of David Pinto’s Day-by-Day Database, here’s what I found:
Adrian Gonzalez, Age 24-25

Age   AB    BA     OBP    SLG    ISO    XB/H   AB/HR
24    598   .316   .376   .543   .227   .376   18.69
25    650   .282   .344   .498   .216   .432   22.41

Uh-oh. That wasn’t supposed to happen. I had it all figured out: Gonzalez was going to exhibit a slow but steady increase in skills, and the numbers would support what my eyes had led me to believe.
Unfortunately, reality had other ideas.
So, Young thinks that Gonzalez did not progress as a 24 year old should, given that his numbers (say, OPS) went down from .919 to .842, a significant decline. But wait…
I did not realize it was until I read this article on ESPN.com. In it, Buster Olney takes on the question as to why the home teams are winning at a .577 clip so far this year (as of May 29, according to Olney).
In the article, Olney says that, according to several GMs, players, managers and scouts, it might be because of the “influx of young players” who are more familiar with their home environments, or who perhaps party on the road more than the average player, I guess.
As I said, I hadn’t even noticed that home teams are winning at such a high rate so far this year, and normally I wouldn’t think anything of it anyway. But given the sample size, the difference between this year and what is typical (around 53-54%) is greater than 2 SD, enough to raise an eyebrow or two. Plus, there are a lot of weird things going on in baseball so far this year (well, at least one weird thing, which is the low run scoring and especially HR rate in the AL).
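One way to check the "greater than 2 SD" claim without knowing the exact number of games played is to ask how many games it would take for a .577 home record to sit 2 SD from the typical rate. The sketch below treats each game as an independent coin flip and uses .535 as a stand-in for "around 53-54%"; both are assumptions, as are the function names.

```python
import math


def z_score(observed, typical, games):
    """SDs between an observed home win rate and the typical rate."""
    se = math.sqrt(typical * (1 - typical) / games)
    return (observed - typical) / se


def min_games_for_z(observed, typical, z=2.0):
    """Smallest number of games at which the gap reaches z SDs."""
    gap = abs(observed - typical)
    return math.ceil(z ** 2 * typical * (1 - typical) / gap ** 2)


# Assumption: .535 stands in for the "around 53-54%" typical rate.
needed = min_games_for_z(0.577, 0.535)
print(needed, round(z_score(0.577, 0.535, needed), 2))
```

Under these assumptions the threshold is a bit under 600 games, so any league-wide sample larger than that would put a .577 home record beyond 2 SD.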
Anyway, the “young players” explanation seems a little silly to me on its face. I mean how many extra young players would it take to make such a difference?
Not one to accept anything at face value, especially that which “scouts, managers, GM’s, and players” posit, I looked at the average age and distribution of ages of pitchers and batters so far this year as compared to last year at the same time (thru May 29). Each age is prorated by the number of PA or TBF. The distribution of ages is percentage of total PA or TBF. Here is what I found:
Eli gives you a boatload.
Ron Shandler chimes in about the “reliability” of statistics at any point in the season. While he makes some good points about how the different measurements require different sample sizes (e.g., PA) to be equally reliable, the piece is littered with phrases, and I am paraphrasing, despite the quotations, like “become meaningful,” “reliable,” “taken seriously,” etc.
As I mentioned in another thread, I don’t like those terms being thrown around, with respect to this issue. They are misleading. And dangerous. One man’s reliable is another man’s unreliable. More importantly, the more a short-term statistic strays from a career one and/or from a population mean, the less “reliable” it is, given identical sample sizes (of the current performance). Not to mention the fact that you cannot make any mention of a statistic’s reliability based on the sample size of that statistic without knowing the prior history of the player or the mean of the population. A player with no history hitting .260 on May 15 is probably around a .260 hitter (at least that is our best estimate, albeit without a great deal of certainty). A player hitting .260 on May 15 who has been a .300 hitter his whole career is probably a .295 hitter.
So how in the world can we say that May 15, or any other date, is the date at which a statistic becomes reliable, without knowing the prior history of the player and the mean of the population?
Similarly, if a player is hitting .300 on May 15 with no history, his true BA is probably around .270. So one player with no history who hits .260 on May 15 has a true BA of .260. Another player with no history who is hitting .300 on May 15 has a true BA of .270. In one case, his short-term BA is likely his true BA. In the other case, his short-term BA is likely nowhere near his true BA. Again, how can we talk about a date or a current sample size, in isolation, that makes a player’s statistic reliable or not? We can’t!
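The three estimates above all follow the same pattern: the best guess sits somewhere between the short-term number and a prior, and how far it moves depends on how much weight the short-term sample deserves. A rough sketch, assuming a league mean of .260 (implied by the first example, not stated outright) and treating the veteran's career .300 as his prior; the function names are hypothetical.

```python
def shrink(observed, prior, weight):
    """Blend an observed rate with a prior; weight is the share given to observed."""
    return prior + weight * (observed - prior)


def implied_weight(observed, prior, estimate):
    """Share of the estimate that comes from the observed rate."""
    return (estimate - prior) / (observed - prior)


LEAGUE_BA = 0.260  # assumption: implied by the ".260 with no history" example

# No history, hitting .300, estimated at .270
print(round(implied_weight(0.300, LEAGUE_BA, 0.270), 3))
# Career .300 hitter, hitting .260, estimated at .295
print(round(implied_weight(0.260, 0.300, 0.295), 3))
```

The same six weeks of at-bats carry about a quarter of the weight against the league mean but only about an eighth against a long career, which is why no single date can make a statistic "reliable" on its own.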
I did basically the same thing I did with pitchers, but this time for batters. I used only BA because I used box scores to gather the April and non-April data. Later I’ll use OPS or something like that. Again, the data are from 04-07.
I also only looked at batters who were on the same team the year before the year with the good or bad April.
For the projections, I used a basic Marcel, weighting each season 25% more than the prior season. I regressed based on 50% regression per 600 AB. And I age adjusted by adding 2 points of BA for all players younger than 28, subtracting 4 points for older than 30 but less than 36, and subtracting 8 points for older than 35.
As with the pitchers, I broke all batters with at least 50 AB in April into two groups: those who hit less than .200 and those who hit more than .350.
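The projection recipe described above (season weights, regression, age adjustment) might look something like the sketch below. Several details are guesses: the league BA value, applying the 600 AB of regression to the weighted AB total rather than the raw total, and making no adjustment for ages 28 to 30. The function name is hypothetical.

```python
def marcel_ba(seasons, age, league_ba=0.265, step=1.25, regress_ab=600):
    """Rough BA projection.

    seasons: list of (hits, at_bats), most recent season first.
    league_ba: assumed league mean; the post does not give one.
    """
    weighted_hits = 0.0
    weighted_ab = 0.0
    weight = 1.0
    for hits, at_bats in seasons:
        weighted_hits += weight * hits
        weighted_ab += weight * at_bats
        weight /= step  # each season counts 25% more than the one before it

    # Adding regress_ab at-bats of league average means a player with exactly
    # that many weighted AB is regressed 50% toward the mean.
    regressed = (weighted_hits + league_ba * regress_ab) / (weighted_ab + regress_ab)

    if age < 28:
        adjustment = 0.002
    elif age > 35:
        adjustment = -0.008
    elif age > 30:
        adjustment = -0.004
    else:
        adjustment = 0.0  # assumption: ages 28-30 get no adjustment
    return regressed + adjustment
```

For example, a 29-year-old with a single season of 180 hits in 600 AB and a league mean of .260 comes out at .280, halfway between his .300 and the league.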
Maybe Tango and some of the other stat guys can take a stab at this.
Phil looks at what happens if you remove Angel Hernandez from a study:
If you take one of Laz Diaz and Angel Hernandez out of the sample of umpires, and replace him with an average ump, every statistically significant effect in the original Hamermesh study becomes statistically insignificant. And if you replace *both* of those two umpires, the effect not only becomes insignificant, but almost completely disappears.
But he also asks why single out Angel? After all, somebody has to be extreme. Let me be clear: I did not single him out because I looked at the data, and saw how far away he was. That’d be cherry picking. What I did do is point out that there are only 2 or 3 Hispanic umps, and Angel is notorious for his poor performance as an umpire. This is identical to saying that having a fat belly gives you great control, since I’ve only got two pitchers who weigh over 260 lbs (CC and David Wells), and they are both lights out with the control. Might as well ascribe the bias to their bellies!
As one of Phil’s commenters said:
The problem, it seems to me, is that the unit of observation really isn’t the pitch, it’s the umpire. And with only two Hispanic and four black umpires in the study, it’s all but impossible to find out anything of significance--the standard errors at the level of the umpire are just going to be too large, and the required t-statistic for (conventional) statistical significance just too low. Until there’s a large enough sample of Hispanic and black umpires, what Hamermesh et al. have done cannot provide a test with enough power to allow us to reach a conclusion.
And that is exactly my point. How can you reach any kind of conclusion about racial bias, if you’ve got just two people that are part of that race?
There are usually one or two good articles in BE Press, but this issue has a lot of interesting topics, from a lot of familiar names. I’m starting with the Turocy paper, but I encourage everyone to pick a different one, and report back with a mini-review.
I posted this on a hockey list, so I’ll repost here. It applies to any sport.
===============================
I agree that someone using solely numbers is only seeing half the picture. Why can I be so firm on this? Because performance numbers are nothing more than a sample of a player’s talent (based on specific conditions, which we may or may not be able to identify).
More specifically, every sample has a margin of error. Just look at goalies. A goalie can face 30 shots a game for 50 games, and that’s 1500 shots. Sounds like a lot, right? Well, 2 standard deviations is .015 goals per shot. This means that a true .900 save percentage goalie will perform, 95% of the time, between .885 and .915. At .885, you are on your way out of the league. At .915, you are the #1 goalie on the team.
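The goalie numbers follow from the binomial standard deviation, treating every shot as an independent trial with the same save probability (a simplifying assumption; the function name is hypothetical):

```python
import math


def save_pct_range(true_rate, shots, n_sd=2.0):
    """Range of observed save percentage within n_sd SDs of the true rate."""
    sd = math.sqrt(true_rate * (1 - true_rate) / shots)
    return true_rate - n_sd * sd, true_rate + n_sd * sd


low, high = save_pct_range(0.900, 30 * 50)
print(f"{low:.3f} to {high:.3f}")
```

One SD over 1500 shots is a little under .008, so 2 SD is about .015, which reproduces the .885 to .915 band.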
You need to know the “tools” of the players, so that you can better identify the noise that accompanies all performance of every player in every sport in the world.
Unless you have a large enough sample, in a group of players that have a wide spread of talent.
I love the way Phil deconstructs a study. Some academics cling to regression as if it’s the holy grail. Not Phil. Reading between the lines, you can tell that Phil is always asking “What the f-ck could possibly be happening here?”
Clay gives us a very simple method to figure a player’s peak age. Look at the age of his peak! Instead of all the rigmarole that most of us do, by trying to figure out the trajectory of a player’s performance, and then inferring the peak point from that, Clay simply counts the age of the peak performance for each player (with the result that he won’t know the trajectory). Good job.
There is a sampling bias of course. A good player will get a chance to peak in his 30s, simply by being given the opportunity to do so. A bad player won’t: he might “peak” at age 25, then be out of baseball by age 27, when in fact, if he was given the opportunity to continue to play until 43, he might have had a better season at some point over those next 16 years, just by luck.
Solution? Here’s one. Use the same age period for each player. Look at players’ career only from age 23 to 30 (and if they played in each of those seasons). Then, look from age 25 to 32. And from 27 to 34. If they TRULY peak at age 26 or 27, then we should see it in all these groupings.
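A sketch of that fixed-window check: for each window, keep only players who played every season in it, find each one's best season inside the window, and tally the ages. The data layout, the function name, and the two sample careers are hypothetical, made up purely to show the mechanics.

```python
from collections import Counter


def peak_age_counts(careers, start_age, end_age):
    """Tally peak ages using only seasons from start_age to end_age.

    careers: dict of player -> {age: performance}. Players missing any
    season in the window are skipped.
    """
    ages = range(start_age, end_age + 1)
    counts = Counter()
    for seasons in careers.values():
        if all(age in seasons for age in ages):
            peak = max(ages, key=lambda age: seasons[age])
            counts[peak] += 1
    return counts


# Hypothetical data, for illustration only.
careers = {
    "Player A": {age: 0.340 - 0.005 * abs(age - 27) for age in range(22, 36)},
    "Player B": {age: 0.330 - 0.004 * abs(age - 26) for age in range(23, 29)},
}
for window in ((23, 30), (25, 32), (27, 34)):
    print(window, dict(peak_age_counts(careers, *window)))
```

Player B is dropped from every window because his career ends at 28, which is the point of the design: only players with the same opportunity are compared, so a short career cannot drag the tally toward younger ages.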
I came across an old article I wrote about how to regress sample stats for pitchers and batters (and how that relates to DIPS). The “method” I used to come up with the regression coefficients (for various levels of PA or TBF) was crude (trial and error - we know of better methods now), but the concepts in the article are valuable. I thought it was worth re-reading or reading for the first time.
An article I did over at THT.
Article by MGL over at Hardball Times, including a very lengthy treatment of regression toward the mean.