Statistical_Theory
Wednesday, April 13, 2011
If your team is a true .550 team, expect to observe that team play _x_ games under .500 (i.e., Losses minus Wins) after _y_ number of games purely by bad luck (no extra injuries, no extra change in talent, etc… just pure bad luck) 3% of the time:
Games under .500 (after this many games)
6 (10)
7 (17)
8 (30)
9 (51)
10 (162)
Take for example the Red Sox, who are 7 games under .500 after 11 games (2-9). Based on the above, we are 97% sure they are not a .550 team. It’s pretty close though. One win, they are 3-8, and now above this arbitrary line I set.
So, what you should be looking for: look for your team to go worse than 10 games under .500. Because once that happens, then you know you’re going to need a lot of good luck to go your way, because chances are, you don’t have enough talent.
97% of the time anyway.
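The table above can be checked with a plain binomial tail sum. This is a minimal sketch, assuming every game is an independent coin flip at the team’s true win rate; the function name `prob_games_under` is made up for the illustration.

```python
from math import comb


def prob_games_under(true_pct, games, games_under):
    """Chance that a team with the given true win rate is at least
    `games_under` games below .500 (losses minus wins) after `games`."""
    # losses - wins >= games_under  <=>  wins <= (games - games_under) / 2
    max_wins = (games - games_under) // 2
    return sum(
        comb(games, w) * true_pct**w * (1 - true_pct) ** (games - w)
        for w in range(max_wins + 1)
    )


# The five (games under .500, games played) pairs from the table.
for under, played in [(6, 10), (7, 17), (8, 30), (9, 51), (10, 162)]:
    print(under, played, round(prob_games_under(0.550, played, under), 3))

# The 2-9 Red Sox: 7 games under .500 after 11 games.
print(round(prob_games_under(0.550, 11, 7), 3))
```

Each table entry should land in the neighborhood of 3%, and the 2-9 start should come in below that.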
Tuesday, April 12, 2011
This Twins blogger:
But the best defensive player so far: Young. He’s +4 in plus-minus with three runs saved. Those are incredible figures considering how incompetent these metrics have held him to be in the past. He’s -57 over the past three years in plus-minus, -27 runs saved in the same period. And now he’s saving the Twins a run every three games? Really? We’ll see how long this lasts.
His last two sentences are kind of strange. He’s not “saving” but “has saved”. Not to mention, it’s a best estimate. But his last sentence is correct that “we’ll see”.
***
Related story: Willie Bloomquist, after 40 plate appearances, is .368/.400/.500 (career .266/.318/.340). He’s currently at +3 runs batting, even though for his career, he averages close to -20 runs per season.
We may as well indict all batting stats, because it doesn’t make sense that such a terrible hitter can be performing so well.
By , 12:28 AM
http://texas.rangers.mlb.com/news/article.jsp?ymd=20110411&content_id=17643966&notebook_id=17649334&vkey=notebook_tex&c_id=tex
Texas was up 2-0 with 2 outs and a runner on 2nd with Cabrera, a RHB up at the plate and Feliz, Texas’ RH closer on the mound. Washington, the Texas manager, did what any old-school manager worth his weight in practice balls would do - he ordered the IBB.
Some nifty quotes from the article:
First frick:
“I had to pick my poison, and I didn’t want Cabrera taking us to extra innings,” Washington said. “Martinez is a good hitter and I have respect for him, but he’s not swinging the bat well. He could have caught one and won the ballgame, but I decided to take my chances. I didn’t want Cabrera tying that game.”
And then frack:
“I think the answer is very simple: They did what they felt gave them the best chance to win the game,” Tigers manager Jim Leyland said. “And that’s what you do as a manager. If they felt that was their best chance to win the game, then that’s what they should do. I give them a lot of credit.”
In related news, a manager at Japan’s Fukushima Daiichi power plant tried to contain the massive radioactive leaks with duct tape. A senior Japanese government official was quoted as saying this:
“He did what he thought he had to do to give us the best chance of containing the leak. And that’s all you can ask of your nuclear plant manager. If he felt that was their best chance of rectifying the problem, then that’s what they should do. I give him lots of credit. In fact, I gave him a raise!”
Saturday, April 09, 2011
Blasphemy from Phil!
No, really, why don’t more academics do this? Just because it’s subjective doesn’t mean that it’s useless. Sure, it could be biased, but cold hard numbers are also biased. Imagine you look at cold hard numbers, but don’t adjust for Coors or the Astrodome? You look at the numbers, but don’t adjust for the batter handedness? If you knew exactly how to parse the data and account for all the variables, that’s one thing. But there’s plenty the numbers don’t capture, things on which a person could provide some insight.
As Phil noted, in this case, it’s only 29 data points. Why not talk to experts and see what they say?
Monday, April 04, 2011
I have the perfect solution for someone who says “He’s on pace to...”.
Say that someone hits 3 HR in his first three games. Someone will say “he’s on pace to hit 162 HR”. But his actual pace is to do whatever it is other people who started with 3 HR in 3 games actually did.
That is, his observed pace of 3 HR in 3 games will lead to an EXPECTED pace of whatever has historically happened. Who knows, maybe people will even learn about regression toward the mean.
People who do the “on pace to...” the traditional (stupid) way impugn numbers. Leave them alone.
Thursday, March 31, 2011
Lucas shows that the count affects the speed.... maybe. It’s unclear if he guaranteed that each of the 12 pools was made up of identical pitchers in identical quantities.
This is a little lesson for you aspiring quants out there. You have three choices:
1. Take the simple average (each pitcher counts the same within each pool; same pitchers used in all pools)
2. Take the weighted average (each pitcher counts differently within a pool; however, that proportion is kept constant across pools)
3. Lump (just add up everything, paying no nevermind to how often a pitcher occurs in any pool)
Never, ever, ever do #3.
An example of #2 is that you would count Roy Halladay a lot more than Stephen Strasburg in each pool, and that proportion is kept constant in each pool.
In #1, Doc and Strasburg count the same, and they both appear in all the pools.
Usually, #2 is preferred. But sometimes #1 will work, if you set the opportunities threshold high enough.
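To see why #3 is dangerous, here is a small sketch with made-up numbers. The two pitchers, the two pools, and every count and speed below are hypothetical. Each pitcher throws exactly as hard in both pools, so any difference between pools is an artifact of the method.

```python
# Hypothetical data: (pitcher, pool) -> (pitches thrown, mean speed in mph).
# Each pitcher throws exactly as hard in both pools, so a sound method
# should show no difference between the pools.
data = {
    ("A", "pool1"): (100, 95.0),
    ("A", "pool2"): (10, 95.0),
    ("B", "pool1"): (20, 88.0),
    ("B", "pool2"): (50, 88.0),
}
pitchers = ["A", "B"]
pools = ["pool1", "pool2"]


def simple_average(pool):
    """Method 1: every pitcher counts the same within the pool."""
    return sum(data[(p, pool)][1] for p in pitchers) / len(pitchers)


def weighted_average(pool):
    """Method 2: weight each pitcher by his total pitches across all
    pools, so the proportions are identical in every pool."""
    weights = {p: sum(data[(p, q)][0] for q in pools) for p in pitchers}
    total = sum(weights.values())
    return sum(weights[p] * data[(p, pool)][1] for p in pitchers) / total


def lumped_average(pool):
    """Method 3: add up every pitch in the pool, whoever threw it."""
    pitches = sum(data[(p, pool)][0] for p in pitchers)
    speed_sum = sum(data[(p, pool)][0] * data[(p, pool)][1] for p in pitchers)
    return speed_sum / pitches


for pool in pools:
    print(pool, simple_average(pool), weighted_average(pool), lumped_average(pool))
```

Methods 1 and 2 report the same speed for both pools. Lumping reports a gap of more than 4 mph, purely because the hard thrower shows up more often in one pool than in the other.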
By , 04:01 AM
In the book Short Hops, the authors criticize (actually dismiss) UZR because, among other things, the parameters are too imprecise. For example, a batted ball is classified as hard, medium, or soft, but of course batted balls, in actuality, have some discrete speed, not to mention the fact that the 3-tiered classification is a judgment call by imperfect scorers. As well, a ball that is recorded as landing 300 feet from home plate at angle 32 degrees may in fact have landed 310 feet at an angle of 35 degrees. Similarly, in order to estimate the position of the outfielders, batters are classified into long, medium, and short fly ball hitters. You get the point.
Now, while the authors have a point - that all of these classifications are approximate - does that really make them imprecise?
Let’s put aside the fact that in science, advances are often made with inputs that are approximations. In fact, that may be the rule rather than the exception.
Rather than try and answer the above question, I want to pose a few more questions to illustrate my point.
Think about lwts, wOBA, OPS, or some similar all-encompassing offensive stat. What are the inputs? Singles, doubles, triples, etc. Are these inputs precise? Well, there is no judgment or approximation involved (other than errors). They are what they are.
Now compare that to the UZR inputs for batted balls. The offensive events seem to be a lot more precise and no one would think of assailing wOBA or lwts on that front (that the inputs are imprecise).
Now, what if instead of using singles, doubles, etc., for the offensive metric inputs, we decided that we wanted to do better. We correctly realize that for purposes of estimating true talent or formulating a projection, not all hits and outs are created equal (not even all walks or K’s are created equal) - there are bloop and bleeder hits and line drive outs, etc.
So we decide to try and figure out which batted balls are “true” hits, outs, etc., by using the speed of the batted ball, the location it is hit to, its trajectory, etc. We are now doing exactly the same thing as UZR does, in terms of batted ball classification.
Suppose we do a pretty good job with this and we are confident that our new inputs are much better than simply using singles, doubles, triples, etc. I think this is a reasonable scenario, and there are in fact some advanced offensive metrics that do something similar - and do not take the offensive events at face value.
Now let’s forget that anyone ever invented wOBA, lwts, OPS, etc., in their old format, using the traditional offensive inputs - singles, doubles, etc. Instead we’ll pretend that these metrics were originally invented using the better offensive inputs, the ones that imply singles, doubles, outs, etc., from the recorded parameters of the batted balls (and perhaps the estimated positions of the fielders, depending on the base/outs/score, and the power and speed of the batter).
These should be much better metrics than the original ones.
How would people like the authors of “Short Hops” react to these metrics? In the same way they react to UZR! OMG! How can you possibly use or trust a metric that uses all of these approximations, and inputs that are subject to the whims and biases of the scorers, parks, etc? The inputs are way too imprecise for these metrics to be useful!
Precision is in the eye of the beholder!
Friday, March 11, 2011
This is the same process we would use for players: compare actual performance (regardless of the luck involved) to the estimated true context. So, we don’t care what the margin of victory was in the left portion of the equation (the actual performance). We DO care about margin of victory of OTHER games for the right part of the equation (the context you were in).
Like I said, that’s how we do it for players.
Wednesday, March 02, 2011
The answer is 25 free throw attempts.
***
All data courtesy of Justin at http://www.basketball-reference.com
***
I selected all player seasons with at least 100 free throw attempts since 1987. There were 5067 such seasons. The average number of free throws per player season was 249.
I calculated the success rate (free throws made divided by attempted) for each player and found the standard deviation. The observed standard deviation of success rate was .090 (simple average) or .087 (weighted by attempts).
What we also want to determine is what is the expected spread of the random variation. Simply put, it would be sqrt(.25*.75/249) = .027, if each and every player took 249 attempts. If I calculate it on a player-by-player basis and: (a) do a simple average, I get .030, (b) do a weighted average based on attempts, I get .027.
That is, if every single player had equal talent, we’d expect to observe a distribution of success rates centered at 75%, with one standard deviation of 3%. What we instead observe is one standard deviation of 9%. This is a huge indication that there is talent at throwing free throws. A LOT of talent.
Now, what can we do with this? Plenty! Remember this equation:
standard deviation of observed ^ 2 = standard deviation of true talent ^ 2 + standard deviation of random variation ^ 2
Or:
sd(obs)^2 = sd(true)^2 + sd(random)^2
We have this information:
.090^2 = sd(true)^2 + .030^2
or
.087^2 = sd(true)^2 + .027^2
That gives us an sd(true) of .085 or .083.
r^2 = sd(true)^2 / sd(obs)^2
That gives us an r-squared of .90 in either case.(*) That is, given 249 attempts, the observed success rate is 90% explained by the true talent of the player, and 10% by random variation.
(*) Alternatively, you can do this using z-scores. .090/.030 is 3.18 (after rounding) and .087/.027 is also 3.18. r-squared = 1 - 1/zScore^2 = 1 - 1/3.18^2 = .901. If you are in school doing z-scores, just remember that “z-score is my friend”.
In order to have 50% of the variation explained, you’d have to have 25 attempts (.10/.90*249).
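The arithmetic up to this point can be replayed in a few lines. This is only a sketch using the rounded figures quoted above (.75 success rate, 249 attempts, and the observed and random standard deviations); the function names are mine. With these rounded inputs, the 50% point comes out nearer 28 attempts, which the post rounds to 25.

```python
from math import sqrt


def binomial_sd(p, n):
    """Spread of an observed success rate from random variation alone."""
    return sqrt(p * (1 - p) / n)


def true_talent_sd(sd_obs, sd_random):
    """Solve sd(obs)^2 = sd(true)^2 + sd(random)^2 for sd(true)."""
    return sqrt(sd_obs**2 - sd_random**2)


def r_squared(sd_obs, sd_random):
    """Share of observed variance explained by true talent."""
    return 1 - (sd_random / sd_obs) ** 2


def attempts_for_half(r2, attempts):
    """Attempts at which talent and luck each explain 50% of the variance."""
    return (1 - r2) / r2 * attempts


print(round(binomial_sd(0.75, 249), 3))      # random spread at 249 attempts
print(round(true_talent_sd(0.090, 0.030), 3))  # simple-average inputs
print(round(true_talent_sd(0.087, 0.027), 3))  # attempt-weighted inputs
print(round(r_squared(0.087, 0.027), 2))
print(round(attempts_for_half(0.90, 249), 1))
```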
Our regression equation is therefore:
regression amount = 25 / (25 + Attempts)
If you have someone with 250 attempts, you regress his performance 10%. If you have someone with 25 attempts, you regress 50%.
In practicality, you rarely need to regress. For example, if we look at all players with at least 2500 free throw attempts, the regression rate would be at most 1%.
The #1 observed is Steve Nash at 90.4%. With 3063 attempts, we’d regress him to 90.3%.
The two worst free throw shooters by far are Ben Wallace, observed at 41% and Shaq, observed at 53%. Wallace goes up to 42% (mostly because of rounding). Shaq stays at 53%.
The key point is that once someone takes 25 shots, what you’ve observed is already half real.
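As a quick check of the regression equation, here is a minimal sketch that applies it to the Steve Nash figures above. It assumes the league mean is the 75% used earlier in the post; the function names are illustrative.

```python
def regression_amount(attempts, k=25):
    """Fraction of the way to regress toward the mean: k / (k + attempts)."""
    return k / (k + attempts)


def regressed_rate(observed, attempts, league_mean=0.75, k=25):
    """Blend the observed rate with the league mean by the regression amount."""
    r = regression_amount(attempts, k)
    return r * league_mean + (1 - r) * observed


print(round(regression_amount(25), 3))    # half regression at 25 attempts
print(round(regression_amount(250), 3))   # roughly 10% at 250 attempts
print(round(regression_amount(2500), 4))  # about 1% at 2500 attempts

# Steve Nash: observed 90.4% on 3063 attempts.
print(round(regressed_rate(0.904, 3063), 3))
```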
Great stuff from Mike.
I also use a very similar technique in other things that I’m doing (public and private). It’s really the whole basis of WOWY that you treat the players as some sort of constants that all other players are compared to. Historically, it’s been used to do park adjustments and league adjustments. But really, you can use it for anything (with the understanding that you need to be aware of the uncertainty level when you do so).
Tuesday, March 01, 2011
Great chart by Patriot that shows the difference between K per IP and K per PA using Greg Maddux.
Per IP may be more convenient to look at, but analytically, you should use per PA. The denominator has to be opportunities, trials, attempts. It should not be successes. Doing that, using IP or outs in the denominator, means you are generally doing a ratio, not a rate (H per 9IP is a ratio of hits to outs). Or in the case of K, it’s a rate of outs made by K, which is not generally something you want to know.
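A small made-up example shows how the two denominators can disagree. The two pitchers and their totals below are hypothetical: same strikeouts, same innings, but one faces more batters because he allows more baserunners.

```python
def k_per_9(strikeouts, innings):
    """Strikeouts per nine innings: the denominator is outs recorded."""
    return 9 * strikeouts / innings


def k_per_pa(strikeouts, batters_faced):
    """Strikeouts per plate appearance: the denominator is opportunities."""
    return strikeouts / batters_faced


# Hypothetical pitchers: (strikeouts, innings pitched, batters faced).
pitcher_x = (200, 200, 800)
pitcher_y = (200, 200, 900)

for name, (k, ip, bf) in [("X", pitcher_x), ("Y", pitcher_y)]:
    print(name, k_per_9(k, ip), round(k_per_pa(k, bf), 3))
```

Per IP, the two look identical. Per PA, pitcher X strikes out a larger share of the batters he actually faces.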
Friday, February 25, 2011
By , 05:47 AM
I wrote this to the authors on their web site:
Dear Sirs:
I am a professional (having worked for several MLB teams, notably the Cardinals in 2004 and 2005) sabermetrician and I have been working extensively in this field for over 20 years. I am the “inventor” of one of the most widely used advanced defensive metrics, UZR, I am one of the co-authors of the sabermetric book, “The Book,” and I host, with one of my colleagues (Tom Tango), a popular and highly respected sabermetric blog, http://www.insidethebook.com (click on the “blog” link).
I recently read your new book, Scorecasting, and I liked it very much. It was well-written, clearly presented, and well-researched, as far as I can tell. I believe it has broken some new and important ground as well. I have recommended it to many of my colleagues and to our (blog) readers.
I was particularly interested in your baseball research of course, especially that pertaining to home field advantage (HFA), a topic that I am not unfamiliar with. While it has been researched some over the years, it is admittedly one area that we (sabermetricians) know comparatively little about.
While you provided some well-researched, eye-opening insight into the role (both in quality and quantity) that umpires may play in the HFA in baseball (and other sports), I must say that some of your research in that area conflicts with similar research I conducted recently, after reading your book.
I would appreciate it if you could take the time to read my comments below (they are edited and reprinted from my blog) and address each issue as you see fit to do.
Kindest regards,
Mitchel G. Lichtman
Tuesday, February 22, 2011
By , 10:50 PM
There have been several reviews of the new book Scorecasting by an economist, Tobias Moskowitz, and an SI writer, L. Jon Wertheim. Among them are those by Chris Jaffe and Phil Birnbaum, both very smart sabermetricians and critical analysts. I encourage everyone to read both the book and the reviews. Here are the links to the reviews:
http://www.hardballtimes.com/main/article/book-review-scorecasting/
http://www.baseballprospectus.com/article.php?articleid=13003
Phil also talks about the book on his (excellent) blog:
http://sabermetricresearch.blogspot.com/
Anyway, I agree with the general comments in both reviews. The book is well worth reading although they make some claims which appear to be dubious, as is often the case with mainstream books or works by authors who are not experts in the subject matter about which they are writing.
I decided to duplicate some of their research - at least that which seemed dubious to me - to see if it will hold up to scrutiny by a subject matter expert (me).
Here is what I came up with on the first pass:
Friday, February 18, 2011
Carson has an equation to convert out ratios into contact rates:
rate = .18ln(ratio) + .38
I’ll repeat my comments over there:
==================================
Carson, a groundout to airout ratio means:
g/a
A groundball percentage means:
g/(g+a)
So, in order to convert a ratio into a percentage, you do:
ratio/(ratio+1)
A g/a of .5 means a gb% of 0.33. A g/a of 2 means a gb% of 0.67, and so on.
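In code, that conversion (and its inverse) is a one-liner each way. A minimal sketch, with function names of my own choosing:

```python
def ratio_to_pct(ratio):
    """Convert g/a into g/(g+a)."""
    return ratio / (ratio + 1)


def pct_to_ratio(pct):
    """Convert g/(g+a) back into g/a."""
    return pct / (1 - pct)


for ratio in (0.5, 1, 2):
    print(ratio, round(ratio_to_pct(ratio), 2))
```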
However, in MLB, they exclude lineouts from the numerator and denominator in the g/a ratio. But, they are included in the gb%. So, a gb% is actually:
g / (g + a + l)
Furthermore, g/a refers only to outs, while gb% refers to all contacted balls. So, you’d have to convert the go to a gb by doing go/.75 = gb. And so on.
***
All to say: I don’t doubt the best-fit of the equation you found.
I do think that we can come up with a different equation that is grounded (no pun intended) in logic. And you can then do a best-fit against that equation.
***
Right.
If you have a g/a ratio of .500, 1, 2 the ln of that is going to give you: -.69, 0, +.69. So, perfectly symmetrical. Which matches what the g/(g+a) would give you of .333, .500, .667, respectively.
But, the actual equation for gb% is g/(g+a+l). Would the ln(g/a) still necessarily hold as a core part of the conversion?
I don’t know, I’m asking.
***
Following up:
To convert the ratio to a rate, if we had the exact same parameters in both, we’d do:
g% = g/(g+a) = x*ln(g/a) + .5
That x would approach 0.25 as g/a approaches 1. And in MLB, x would range from .24 to .25.
So, if we used all contacted balls, then a best-fit equation would come in at something like .25*ln(g/a) + .5.
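The claim about the coefficient can be checked numerically. The sketch below backs out the x that makes x*ln(g/a) + .5 match g/(g+a) exactly at a given ratio; the function names are mine, and the ratios tried are just illustrative.

```python
from math import log


def exact_pct(ratio):
    """g/(g+a) computed directly from the g/a ratio."""
    return ratio / (ratio + 1)


def log_approx(ratio, x=0.25):
    """The log form: x*ln(g/a) + .5."""
    return x * log(ratio) + 0.5


def implied_x(ratio):
    """Coefficient that makes the log form exact at this ratio.
    Undefined at ratio == 1, where ln(ratio) is zero."""
    return (exact_pct(ratio) - 0.5) / log(ratio)


for ratio in (0.5, 0.8, 1.1, 1.5, 2.0):
    print(ratio, round(implied_x(ratio), 4), round(log_approx(ratio), 3))
```

The implied coefficient is symmetric around a ratio of 1 (the same at .5 as at 2) and climbs toward .25 as the ratio gets close to 1.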
But, as noted, the ratio actually uses only outs, and excludes lineouts. The rate uses all contacted balls.
Carson’s best-fit, using observed data, changes that .25 coefficient to .18. It changes the intercept from .5 to .38.
My question is if someone here would like to try to come up with an equation without relying on individual data, and simply use some logic to the process. To presume that 20% of batted balls are line drives, that 25% of those are lineouts, and so on.
Saturday, February 12, 2011
tbwhite sent me a file of batting average (why batting average? I don’t know.... it bothered me to no end to see those numbers… if you want to make me happy, send me a file of OBP next time please). A very well thought out file containing 4434 records with, among others, these fields:
A. batting average in year N
B. career BA through year N (would have preferred N-1)
C. league mean in year N
D. a field if BA in year N was above career mean
E. a field if BA in year N was below career mean
I then decided to run various correlations to see which pieces of data help us the most.
Unsurprisingly, the one that did the best was the one that used the first three, at r=.55. This is that equation:
0.22*A + 0.58*B + 0.21*C
This means 21% regression toward the league mean. The t-value for each coefficient is above 10, making each one highly statistically significant.
Now, what if we ignore the league mean? Well, you get a very strong r=.53 using just his current year and past career:
0.25*A + 0.74*B
Using past career is better than using the league mean. This is the equation using the single year and the league mean, at r=.48:
0.56*A + 0.46*C
That’s a very strong regression toward the mean.
Now, here’s an interesting one at r=.50
0.99*A - .017*D + .016*E
So, if his BA in year N was above his career mean, then we drop his batting average by 17 points. If his BA in year N was below his career mean, then we increase his batting average by 16 points. That’s just regression toward his past career.
Anyway, you need it all. You need the player’s most recent performance. You need his career performance. And you also need the league mean. They are all required.
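Here is how the three fitted equations play out for one made-up hitter. The weights are the ones quoted above; the player line (.300 in year N, .280 career, .265 league mean) is hypothetical, and I am assuming the quantity being estimated is his batting average in the following season, which the post does not spell out.

```python
def estimate_all_three(year_n, career, league):
    """Weights from the r=.55 fit: year N, career, league mean."""
    return 0.22 * year_n + 0.58 * career + 0.21 * league


def estimate_no_league(year_n, career):
    """Weights from the r=.53 fit: year N and career only."""
    return 0.25 * year_n + 0.74 * career


def estimate_no_career(year_n, league):
    """Weights from the r=.48 fit: year N and league mean only."""
    return 0.56 * year_n + 0.46 * league


# Hypothetical hitter: .300 in year N, .280 career, .265 league mean.
print(round(estimate_all_three(0.300, 0.280, 0.265), 3))
print(round(estimate_no_league(0.300, 0.280), 3))
print(round(estimate_no_career(0.300, 0.265), 3))
```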
Thursday, February 10, 2011
By , 10:09 PM
No, Thomas Bayes did not have syphilis (as far as I know), but…
The title of this article is:
“Test Gets Almost 1 in 5 Syphilis Cases Wrong”
Specifically, there is an 18% false positive rate for this common test.
All pregnant women are urged to get this test. Let’s assume that you are a pregnant woman and you have no reason to think that you have this disease.
The actual rate of infection among pregnant women is small - something like 1 or 2 per 100,000. We’ll call it .002%.
Let’s say that you take this test that has an 18% false positive rate for any pregnant woman who takes it, and it comes back positive.
What are the chances that you have syphilis?
What if you take it again, and it still comes back positive?
The first person to answer these two questions correctly will win a free copy of the THT 2011 Annual! I would love it if some of the regulars for whom these questions are easy would recuse themselves. If you already have a copy you might want to recuse yourself as well, or at least wait until you think that someone else has already won (and then you can simply agree with them).
Good luck!
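For anyone who wants to check their work afterward, here is a sketch of the Bayes calculation under two assumptions that are not in the article: that the test catches every true case (100% sensitivity), and that a second test is independent of the first. The prior (.002%) and the 18% false positive rate are the figures given above.

```python
def posterior(prior, false_positive_rate, sensitivity=1.0):
    """P(infected | positive test) by Bayes' rule.
    sensitivity=1.0 is an assumption; the article does not give it."""
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)


prior = 0.00002  # .002%, i.e. 2 per 100,000
after_one = posterior(prior, 0.18)
# Assumes the second test's errors are independent of the first's.
after_two = posterior(after_one, 0.18)

print(f"{after_one:.4%}")
print(f"{after_two:.4%}")
```

If false positives tend to repeat for the same person, the second result adds less information than this sketch gives it credit for.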
Carson does something that I’ve been hoping to see someone do for the longest time. What he does is figure out the surplus value of a player drafted, and link that surplus to the scout who signed him. So, if you signed Tulowitzki, you look like a genius. But this is a sample. It still is not something true, something real.
After all, I presume all thirty teams had Tulo ranked somewhere between #1 and #10. How much does it help us to give the 50MM$ surplus (or whatever it was) to the Rockies’ scout who drafted him #7, and 0$ surplus to the other 29 scouts who also ranked him quite high? In that same draft, Alex Gordon was drafted #2. I don’t follow the college scene, but presumably all the teams had him ranked pretty high as well. Do we slap down only the Royals for drafting him, because they are the ones that bought the Alex Gordon lottery ticket, even though the other 29 teams ALSO wanted to buy that ticket?
If you look at it as a sample, then you can say, yeah, the scout won the lottery ticket, that’s the money he made. But, you have to look at it from a true talent perspective. Perhaps you need to regress 99% of what you see from Carson’s process. Scout Bill Buck is credited with 74MM$ in surplus. Perhaps his true value is 740,000$, and the rest was his good fortune for having exclusive dibs on players.
The problem with the process is the exclusivity of it, the binary outcome: did he, or did he not draft that player. In many respects, it’s like looking at a player’s single game, where he puts the ball in play 4 times, and we’re trying to figure out what the true talent of the player is based on the observations of these binary outcomes: whether the batter was safe or out. Because, in this case, you would also regress what the batter did in 4 contacted PA 99% toward the league mean. On the other hand, if you had say his launch parameters (how hard he hit the ball, the spray and vertical angles, the spin imparted), then those 4 PA might regress 95% toward the league mean.
Today is sample-is-not-true day!
One of my favorite articles I researched was in-game momentum. For you newbies, it’s worth a look. Pre-discussions here.
Anyway, Gabe looks at a bit longer-term momentum in hockey. If you are going to find momentum, it’s in a true team sport like hockey. Basically, what happens in the last 30 games doesn’t really add much. The most obvious reason is: sample size. Just about ANYTHING can happen in the last 10 games, so to use that as some sort of gauge of momentum is crazy.
This is the same as everything else: everything we observe is a SAMPLE of something real. I talked about this in the other thread yesterday: even if you have all the launch parameters of a batted ball, that is still a sample. You need to infer what that means as to the talent of the player to create those launch characteristics. The same is true when someone throws a baseball at 95mph: that is still a sample.
The question on the table is always: how much luck is there in that sample? Or, how quickly does that observation stabilize? For something like fastball speed, it stabilizes pretty darn fast. For things like team win%, it does not stabilize quickly. (The hockey chart above says about half-a-season.)
Everything you witness has a certain amount of illusion. Remember that when you deal with data.
Tuesday, February 08, 2011
tbwhite asks:
If Votto has a 3 year mean TAv of .325, but posted a .350 TAv in 2010, why should he revert towards the MLB mean TAv for 1B of .280(I’m just making up a number) in 2011 instead of his own mean TAv of .325 ? Regressing back towards the MLB mean implies that even his own .325 number over the past 3 seasons doesn’t reflect his true level of ability, that the .325 is nothing more than an outlier and isn’t repeatable. That is non-sense.
It sounds like nonsense, I agree.
When Votto hits .325 year after year in a league where 1B hit .280, and, this is important, in the absence of any independent estimate of his underlying true talent level, we have no choice but to believe that Votto was the recipient of more good luck than bad luck. That his observed .325 is more likely the effect of him being a true .315-.320, and got lucky to hit .325.
We know this is true (or at least, we can estimate this to be most likely as being true), because if you look at other players like Votto, you will find that their out-of-sample performance will be around .315-.320. The out-of-sample performance, for a large enough group of players, represents that group’s true talent level (with a certain level of uncertainty).
If Votto, let’s say, goes .325, .325, .300, .350 instead of .325, .325, .325, .325, almost nothing changes about what we know of Votto! Well, a bit changes, because recent performance is more indicative of his current talent. But generally speaking, he’s got a career average of .325 in either case.
It’s called regression TOWARD the mean. And in the case of veterans, we’re only regressing 10-15% toward the mean. We’re basically saying “Yeah, we observed he hit great, but we also know that historically, a group of players that have been observed to hit great happens to hit less great the year after.”
So, going back to his original point, that is correct, his .325 does not represent his true ability. It does indicate his true ability to a certain (great) extent, but not 100%. And, it’s much more likely that he was disproportionately lucky than he was disproportionately unlucky. In absence of any independent evaluation of his underlying talent, we have to remove the luck from the observation. And we do that by regressing his performance toward the population of players that he was drawn from.
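The Votto numbers can be run through the same idea in a couple of lines. A minimal sketch: the .325 and the 10-15% regression come from the discussion above, and the .280 positional mean is tbwhite’s admittedly made-up figure.

```python
def regress_toward_mean(observed, population_mean, regression):
    """Move the observed figure `regression` of the way toward the mean."""
    return observed + regression * (population_mean - observed)


# Observed .325 over several seasons, population mean .280 (made up).
for regression in (0.10, 0.15):
    print(regression, round(regress_toward_mean(0.325, 0.280, regression), 3))
```

Both results land close to the .315-.320 range mentioned above, well short of the .280 mean itself.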