Statistical_Theory
Sunday, July 24, 2011
Patriot captures it right here:
Building your metric around a run estimator does not necessarily restrict you to simply plugging in the numbers in the appropriate place. Suppose you wanted to construct a metric based on batted ball types, strikeouts, and walks. One way to go about it would be to simply go through and estimate singles, doubles, triples, homers, and outs in play based on the percentage of each batted ball type that wind up as each. So, you would end up with equations that might look something like this:
Singles = .057FB + .217GB + .516LD + .017PU
However, if you believe that you have gleaned some other insights into the relationship between events that could improve your metric (such as strikeout pitchers having lower HR/FB rates), you could still build that into your formula for estimated home runs, and plug those into the run estimator. It’s more difficult than running a regression, and a more delicate balancing act (at least in terms of developing the formula), but it allows you to stay grounded in a model that estimates runs by taking a first step of, well, estimating runs.
He’s saying this (or if he’s not saying it, then that’s how I am reading it, and, in any case, it’s how I think it):
1. You start with a working model of how runs are created. This is the beauty of something like BaseRuns, because it works so darn well… GIVEN its inputs. If you know the number of hits, HR, walks, outs, then we have a fantastically great estimate as to how many runs are expected to be scored.
2. If you don’t know the inputs, estimate the inputs… but don’t change the actual run scoring model. So, again, if you happen to not have the number of doubles, but can estimate the number of doubles that this pitcher either gave up, deserved to give up, or was expected to give up, and it’s based on his batted ball distribution profile, and/or the number of HR he gave up, and/or his SO/BB ratio, then estimate the doubles in that manner.... but do NOT touch the run scoring model.
Once you have the estimates of all your inputs, then you can plug them into an established working model.
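The two steps above can be sketched in a few lines of Python. This is only an illustration, not Patriot’s actual method: the singles coefficients are the ones quoted above, but the batted-ball counts and the doubles, triples, HR, walk and out totals are made-up inputs, and `base_runs` uses one commonly cited simple form of BaseRuns (other variants with different weights exist).

```python
def estimate_singles(fb, gb, ld, pu):
    """Estimated singles from batted-ball counts, using the coefficients quoted above."""
    return 0.057 * fb + 0.217 * gb + 0.516 * ld + 0.017 * pu


def base_runs(singles, doubles, triples, hr, bb, outs):
    """One simple, commonly cited BaseRuns form (an assumption; variants exist)."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    a = hits + bb - hr  # baserunners
    b = (1.4 * total_bases - 0.6 * hits - 3 * hr + 0.1 * bb) * 1.02  # advancement
    c = outs            # batting outs
    d = hr              # runs that score automatically
    return a * b / (b + c) + d


# Estimate the missing input (hypothetical batted-ball counts).
est_1b = estimate_singles(fb=150, gb=220, ld=90, pu=30)

# The run scoring model itself is left untouched: just plug the estimate in.
# All other totals here are made-up numbers for illustration.
est_runs = base_runs(singles=est_1b, doubles=30, triples=3, hr=15, bb=50, outs=540)
print(round(est_1b, 1), round(est_runs, 1))
```

Any insight about how events interact (say, a different HR/FB rate for strikeout pitchers) would go into functions like `estimate_singles`, never into `base_runs`.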
Even something like FIP is basically a regression equation, because it doesn’t adhere to an actual run scoring model. Of course, there is a tradeoff when it comes to complexity. A linear equation is used at the expense of a real baseball run scoring model because it’s easier to compute or understand. But, if you’ve got a complex linear equation, or even a complex multiplicative equation, or some other form of equation, then you’ve got the worst of both worlds.
This is why I like FIP or wOBA: they are such simple metrics that their strengths and limitations are readily apparent.
So, ANY pitcher metric that is not grounded in BaseRuns is immediately set up for a limitation. The bigger your limitation, the simpler your metric must be.
SIERA is a good example of a metric that is too complex for its own good. The insights and benefits of SIERA are hidden inside its complexity. But, if Matt were to follow Patriot’s lead here, and compute estimates for events (1B, 2B, 3B, HR, BB, SO) based on his findings about how things interact, then we would have a very helpful metric.
So, that’s my recommendation as to how you can really advance the cause: keep the logic of baseball intact if you insist on complexity.
Wednesday, July 20, 2011
Good job by Kincaid.
Also note the Tango Distribution (last two links on home page).
Monday, July 18, 2011
By , 10:22 AM
Here is how it works:
Let’s say it is the 9th inning and your team is winning by a run. Your pitcher walks the lead-off batter. The announcer on TV says something like, “Wow, you can’t walk the lead-off batter with a one-run lead. You have to challenge him.” Or, “There is nothing more frustrating for a manager than walking the first batter with a one-run lead in the 9th.”
Now, obviously a walk is not a good thing in that situation, as opposed to an out or even a generic PA. But, the question is whether a walk in that situation is particularly bad. The answer to that question is not necessarily obvious, especially if you are not sabermetrically inclined (like the announcer). But there is an easy way to answer it using the “balance theory.”
Let’s say that you had more than a 1-run lead. What about the walk then? It is now obvious that the lead-off walk is horrendous, since it is nearly equivalent to a home run (other than the double play possibility). Since the 1-run lead and the “more than 1-run” lead are the only two possibilities, if the walk is particularly bad with a “more than 1-run” lead, it HAS to be not so bad (again, comparatively speaking) with a 1-run lead.
That is the “balance theory,” and it can be used to answer many questions like that…
Tuesday, July 12, 2011
Exactly.
Fangraphs has its WAR and Baseball-Reference has one as well. But in truth, everyone has their own WAR.
My dad and I were talking about this the other day. He was talking about why he thinks no one in baseball is better now, and what he was doing was processing all the factors he values…he puts a higher value on speed (and triples) than you or I might…and he thinks there is a “fan popularity” impact for every player.
In his mind, he’s smushing all those factors together, just as the Fangraphs version and BB-Ref versions do. His version is personal. He and I don’t have to agree. But it makes for the most fun kind of baseball discussion.
We all come up with our “single number”, even though we kick and scream that we shouldn’t come up with a single number. If one guy argues that Felix is better than Lincecum, and the other argues the opposite, then guess what: they’ve each “smushed” a bunch of parameters, considerations and gut feelings to get to their final opinion.
I remember an old boss of mine deriding the idea of a spreadsheet that would take a bunch of factors into consideration to come up with everyone’s rating at the office, and, in turn, everyone’s salary. He said that he has to do everything on a case-by-case basis.
But, lost on him is that, in the end, everyone DOES get a final number: a salary. So, you can have a consistent process that considers everything objective and subjective. Or, you can consider those same objective and subjective things, and smush them together in your mind on a case-by-case basis. You are STILL considering the exact same things.
The difference is that by going case-by-case you may be applying different weights to different parameters for different people as the mood strikes you. If you have a process, that doesn’t happen.
No one is telling you not to overweight or underweight strikeouts or HR. But a system requires you to spell out the rules for weighting, and apply that consistently to everyone.
The one good thing about the case-by-case basis is that it forces you to think about parameters. You’d like to ding Manny Ramirez a little, you’d like to up Jeter a little. So, you have to create a “heart” parameter. And that’s perfectly fine! Just spell it out that that’s what you are doing. And tell us how much you are giving to each player for heart. I have no problem with giving out wins for heart, over-and-above whatever his actual performance tells us. Just spell it out and be consistent.
Wednesday, June 29, 2011
Patriot.
Tuesday, June 28, 2011
Jimmy:
Let’s say you want to identify clusters in two-dimensional data. You can do this using a clustering algorithm such as k-means or soft k-means. In a nutshell, it takes an initial set of means (chosen however), evaluates the distance from each data point to each of the means using some distance metric, and then assigns each data point to the closest mean. Then it re-evaluates the means given the current assignment and steps through the process again, until it converges and you have the data grouped into “k” clusters.
So this helps with grouping the data points, but let’s say you wanted to go a little bit further. What you can do is run the initial algorithm to find the means and cluster assignments, and then impose the assumption that each cluster is distributed around its mean (which you just found) according to a bivariate normal distribution. Then you use maximum likelihood (ML) to find the variance parameters of the bivariate normal for each cluster, which may vary for each cluster. You can assume different variance in each direction to account for clusters that aren’t spherical. Then once you have those parameters, you have the variance of each cluster.
To relate this to baseball, assume the two-dimensional data we have is horizontal and vertical pitch movement, and assume that the pitcher in question has three pitches: 4-seam FB, slider, and a curve. Presumably these three pitches will form three distinct clusters when graphed. We run the k-means algorithm to identify which pitch is which (i.e. assign clusters), and then we fit each cluster to a bivariate normal distribution by ML. Then we have the variance of each cluster. Then we can compare the variance (i.e. the consistency) of each pitch’s movement relative to the other pitches, or compare it amongst pitchers with the same type of pitch. And we can track it from game to game, season to season, etcetera, so that we can say that “oh, Erik Bedard’s control of his CB has really improved this season relative to last” with some quantitative oomph rather than with simple visual evidence.
And there are a lot of other advantages to this too besides just getting the point estimate of the variance. We can also get the variance of the point estimate itself to quantify how accurate we think our estimate of that variance is. We can use the bivariate fit in real time, with Bayesian updating to improve the accuracy of the pitch/fx system itself (in identifying pitch type). There are a lot of places to go from here.
I also hear you on the problem with noisy data. That is a universal issue, but there exist a lot of ways to deal with it. I’ve heard of people transforming the data with principal components analysis first (which is a sort of clustering algorithm in itself… kinda) and then running the k-means on the transformed data to get better clustering fits. And lots of other improvements upon the plain vanilla k-means algorithm to deal with tough data. I’m sure there is literature on this stuff somewhere… but I should really shut up because I don’t understand the pitch/fx system too well.
If you’re feeling adventurous, I recommend chapters 20 and 22 of this book as an intro to the stuff I’m talking about: http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/
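The cluster-then-fit procedure described in this comment can be sketched roughly as follows. This is a minimal pure-Python illustration, not anyone’s production code: the movement data is simulated, the pitch centers and spreads are invented, each simulated cluster is spherical for simplicity, and the initial means are just eyeballed guesses.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical (horizontal, vertical) movement centers in inches, with a
# made-up spread for each pitch type. None of these numbers are real data.
true_pitches = {"FB": ((-6.0, 9.0), 0.8), "SL": ((3.0, 1.0), 1.2), "CU": ((6.0, -8.0), 1.6)}
points = [(rng.gauss(cx, sd), rng.gauss(cy, sd))
          for (cx, cy), sd in true_pitches.values() for _ in range(200)]


def closest(point, means):
    """Index of the mean nearest to `point` (squared Euclidean distance)."""
    return min(range(len(means)),
               key=lambda j: (point[0] - means[j][0]) ** 2 + (point[1] - means[j][1]) ** 2)


def kmeans(points, means, max_iter=100):
    """Plain k-means: alternate assignment and mean updates until nothing moves."""
    for _ in range(max_iter):
        labels = [closest(p, means) for p in points]
        new_means = []
        for j, old in enumerate(means):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_means.append((sum(x for x, _ in members) / len(members),
                                  sum(y for _, y in members) / len(members)))
            else:
                new_means.append(old)  # keep an empty cluster's mean where it was
        if new_means == means:
            break
        means = new_means
    labels = [closest(p, means) for p in points]
    return means, labels


def fit_bivariate_normal(members):
    """ML estimates (dividing by n) of one cluster's mean and covariance terms."""
    n = len(members)
    mx = sum(x for x, _ in members) / n
    my = sum(y for _, y in members) / n
    var_x = sum((x - mx) ** 2 for x, _ in members) / n
    var_y = sum((y - my) ** 2 for _, y in members) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in members) / n
    return {"mean": (mx, my), "var_x": var_x, "var_y": var_y, "cov_xy": cov_xy}


# Initial means "chosen however": here, rough eyeballed guesses.
means, labels = kmeans(points, [(-4.0, 7.0), (2.0, 0.0), (5.0, -6.0)])
fits = [fit_bivariate_normal([p for p, lab in zip(points, labels) if lab == j])
        for j in range(len(means))]
for fit in fits:
    print(fit)
```

Comparing `var_x` and `var_y` across the fitted clusters (or across seasons for the same pitch) is the kind of consistency comparison the comment has in mind.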
I’m not sure that my title description is clear enough. And if someone wants to propose a better title to be clearer, please do so.
Someone sent me something like what I’m about to post, and he called it a “paradox”, but it is not at all a paradox. It’s a question of whether you average out binary numbers or average out the rates.
Suppose that Roy Halladay’s true talent level is such that the Phillies win .601 of their games with him on the mound against an average team at a neutral site. At home, the odds go up by +.050 (and on the road, it goes down .050). Against good teams, the odds go down by .050 (and up by .050 against bad teams). Against great teams, the odds go down by .100 (and up by .100 against terrible teams). So, Phillies with Halladay starting at home against a terrible team gives us odds of .751 that the Phillies will win. And on the road against a great team gives us .451 that the Phillies will win.
Count as “1” any time the Phillies have a greater than 50% chance of winning with Roy Halladay on the mound.
What percentage of the games are the Phillies favored to win? Is it exactly 60%? Or more than 60%? It’s not a trick question.
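To make the distinction between averaging binary outcomes and averaging rates concrete, here is a small sketch that enumerates the ten site-by-opponent combinations described above. Weighting the ten combinations equally is purely an assumption for illustration (a real schedule would not be uniform), and the tier names are just labels for the adjustments given in the post.

```python
base = 0.601
site_adj = {"home": +0.050, "road": -0.050}
opp_adj = {"great": -0.100, "good": -0.050, "average": 0.0, "bad": +0.050, "terrible": +0.100}

# Win probability for every (site, opponent) combination.
win_prob = {(site, opp): base + s + o
            for site, s in site_adj.items() for opp, o in opp_adj.items()}

# Averaging the rates vs. averaging the binary "favored" flag
# (assumes every combination is equally common).
avg_rate = sum(win_prob.values()) / len(win_prob)
favored_share = sum(1 for p in win_prob.values() if p > 0.5) / len(win_prob)
print(round(avg_rate, 3), favored_share)
```

Under that assumed uniform mix, only the road game against a great team falls below .500, so nine of the ten combinations count as a “1” even though the average win probability is still .601.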
Friday, June 24, 2011
Interesting…
Results indicate that players who were “morning types” had a higher batting average (.267) than players who were “evening types” (.259) in early games that started before 2 p.m. However, evening types had a higher batting average (.261) than morning types (.252) in mid-day games that started between 2 p.m. and 7:59 p.m. This advantage for evening types persisted and was strongest in late games that began at 8 p.m. or later, when evening types had a .306 batting average and morning types maintained a .252 average.
“Our data, though not statistically significant due to low subject numbers, clearly shows a trend toward morning-type batters hitting progressively worse as the day becomes later, and the evening-types showing the opposite trend,” said principal investigator and lead author Dr. W. Christopher Winter, medical director of the Martha Jefferson Hospital Sleep Medicine Center in Charlottesville, Va.
...but obviously the sample size is so tiny, that the “though not statistically significant” can’t just be walked by.
This is their sample size:
Nine participants were found to be evening types, and seven were morning types. Both groups had a mean age of 29 years. The study used the players’ statistics from the 2009 and 2010 seasons, which allowed for the analysis of 2,149 innings from early games, 4,550 innings from mid-day games and 750 innings from late games.
Reporting the innings played inflates the apparent size of the sample, given that they are reporting batting averages (which means the opportunities are at bats, not innings). So, 2149 innings is like 1000 at bats, and 750 innings is like 350 at bats. Laughably small numbers, of course. And next time, please, don’t use batting average. Linear Weights or wOBA would have been the far better choice.
However, I very much like the idea, and the effort. So, as a starting point, it’s great.
Glove-slap: Sky.
Monday, June 13, 2011
Derek does exactly what I do (or at least, one of the ways I do it). I don’t know that I actually get the same results, but the process is bang-on.
Stabilizes     Years  Stat                 Denominator
100            0.2    K                    PA-IBB-HBP
168            0.3    UIBB                 PA-IBB-HBP
253            0.4    IBB                  PA
501            0.8    HBP                  PA-IBB
959            2.1    1B                   PA-HBP-K-BB-HR-ROE
833            1.8    2B+3B                PA-HBP-K-BB-HR-ROE
48             1.5    2B                   2B+3B
48             1.5    3B                   2B+3B
1126           2.4    1B+2B+3B (BABIP)     PA-HBP-K-BB-HR-ROE
143            0.3    HR                   PA-K-BB-HBP
62             0.5    HR (HR/FB)           OF FB [MLBAM]
65             0.5    HR (HR/FB)           OF FB [RS]
109            0.2    GB [MLBAM]           GB+OF+IF+LD
116            0.2    GB [RS]              GB+OF+IF+LD
182            0.4    OF FB [MLBAM]        GB+OF+IF+LD
189            0.4    OF FB [RS]           GB+OF+IF+LD
194            0.4    IF FB [MLBAM]        GB+OF+IF+LD
233            0.5    IF FB [RS]           GB+OF+IF+LD
795            1.7    LD [MLBAM]           GB+OF+IF+LD
979            2.1    LD [RS]              GB+OF+IF+LD
Inconclusive*         SB%                  SB+CS
39             0.3    SBA%                 1B+UIBB+HBP+ROE+FC
UPDATE: For pitchers:
Stabilizes     Years  Stat                 Denominator
126            0.2    K                    PA-IBB-HBP
303            0.5    UIBB                 PA-IBB-HBP
943            1.5    IBB                  PA
1346           2.1    HBP                  PA-IBB
3893           8.4    1B                   PA-HBP-K-BB-HR-ROE
2305           5      2B                   PA-HBP-K-BB-HR-ROE
4977           10.7   3B                   PA-HBP-K-BB-HR-ROE
1882           4      2B+3B                PA-HBP-K-BB-HR-ROE
351            11     2B                   2B+3B
351            11     3B                   2B+3B
3729           8      1B+2B+3B (BABIP)     PA-HBP-K-BB-HR-ROE
1271           2.7    HR                   PA-K-BB-HBP
1239           9.4    HR (HR/FB)           OF FB [MLBAM]
105            0.2    GB [MLBAM]           GB+OF+IF+LD
205            0.4    OF FB [MLBAM]        GB+OF+IF+LD
288            0.6    IF FB [MLBAM]        GB+OF+IF+LD
2026           4.3    LD [MLBAM]           GB+OF+IF+LD
36             2.3    SB                   SB+CS
161            1.2    SBA                  1B+UIBB+HBP+ROE+FC
Friday, June 10, 2011
Every matchup has a specific and true mean. God herself would establish that specific and true mean at that specific point in time-space with zero level of uncertainty. Say it’s Pujols at Busch on July 3, 2011 against Doc, God knows that Pujols can’t handle an outside cutter well, and the next pitch is going to be telegraphed by Doc as an outside cutter. God says that Pujols will make contact with that pitch 23% of the time (if allowed to replay in that time-space an infinite number of times) with zero level of uncertainty.
But what about humans? If Pujols v Doc has an expected contact rate of 70% any time Pujols swings (with a certain level of uncertainty, say 10%), then how much better a mean estimate can we get in more specific situations (as we find more data about Pujols and/or Doc and/or Busch and/or the weather), and how much more can we reduce the uncertainty level?
Wednesday, June 08, 2011
Ichiro has had 802 games where he came to bat exactly 5 times. His OBP was .413.
The expectation of him getting on base 0 or once, using the binomial distribution, is 252 times. In reality, it was 262 times.
Ichiro had 671 games where he came to bat exactly 4 times. His OBP was .326.
The expectation of him getting on base 0 or once, using the binomial distribution, is 406 times. In reality, it was 399 times.
If you add the two above:
- the expected number of times he would get on base 0 or once, based on the binomial, is 659 games
- the actual number of games in which he did get on base 0 or once is 661
Ichiro was the first guy I looked at. That it ended up this close was fantastically fortunate for me. But, it’s not a surprise.
So, there’s my challenge to anyone else: select 10 hitters. I dunno… Rickey, Boggs, Gwynn, Raines… whoever. Whoever you are interested in (though preferably not guys with lots of IBB).
Report the results. You’ll find something close to what I found.
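For anyone taking up the challenge, the expected counts can be computed with a few lines of Python. This sketch assumes every PA in a game is an independent trial with the same on-base probability (the plain binomial model), and it uses the three-decimal OBP figures quoted above, so its totals can differ from the quoted ones by a game or so because of rounding.

```python
from math import comb


def prob_at_most_one(n_pa, obp):
    """Binomial probability of reaching base 0 or 1 times in n_pa plate appearances."""
    return sum(comb(n_pa, k) * obp ** k * (1 - obp) ** (n_pa - k) for k in (0, 1))


# Game counts and OBP figures are the Ichiro numbers quoted above.
expected_5 = 802 * prob_at_most_one(5, 0.413)
expected_4 = 671 * prob_at_most_one(4, 0.326)
print(round(expected_5, 1), round(expected_4, 1), round(expected_5 + expected_4, 1))
```

Swap in another hitter’s game counts and OBP for each PA bucket, then compare against the actual number of 0-or-1 games.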
***
For those wondering why the OBP are so different for 4 and 5 PA: the PA was selected after the fact. If he came to bat 5 times, chances are, his team (and him) were hitting pretty well. In order to not have this issue, I would instead only look for the FIRST FOUR PA of each game. Then you wouldn’t have this problem.
Friday, June 03, 2011
My answer:
Michael,
I agree, 100%, that luck is the random occurrence centered around a true mean. In effect, EVERYTHING in the world is luck, since something either did (1) or did not (0) happen. There’s no such thing as something “partially” happening. If someone has a .420 OBP, he won’t get on base 42% of the time in his next PA. He either is, or is not, on base. So, whether he got on, or not, is luck. The FREQUENCY over a long period of time is not luck. But any single event is luck.
That’s the tough part to get through: any single occurrence is random, but it’s random based on the true mean. A goalie saves 90% of the shots he faces, so he’ll get a save (1) far more often than he allows a goal (0). But any single event (save or goal) is luck.
To make it worse, the mean is not even constant! If there’s a breakaway, his chance at a save is 65%, but if he’s got all 5 of his teammates, and only 1 shooter, his chance at a save would be 95%.
Tough concept.
Tom
***
To further add: the “true mean” is based on everything we know, and don’t know, about the environment. That is, god herself told you the odds of something happening based on the properties of each entity. When something happens, or not, that’s luck. It’s the random occurrence (1 or 0), but predicated on the true mean (whatever it is, but it has to be greater than 0 and less than 1).
If one thing has a 100% causal effect to another thing, that has nothing to do with luck, and is instead, fate. We’re not talking about fate.
And, I’m not talking about things “outside my control”. That’s not luck. That’s simply a gap in knowledge.
I’m talking about you know exactly the true odds of something happening.
Wednesday, June 01, 2011
Friend of The Book’s Blog Millsy will be in Canada, as well as fellow debater Rodney Fort.
Wednesday, May 18, 2011
There are several articles of interest this month, including one from Andrew Thomas. I haven’t read any of these yet, but will do so momentarily. Please feel free to highlight any of these you find interesting or want to discuss.
Monday, May 09, 2011
Excellent:
By its nature, punditry craves attention, which is easier to attract with certainties than with equivocation. But that certitude reflects bravado more often than true knowledge.
Maybe Kobe Bryant should take heed, predicting a series comeback after being down 3-0 (i.e., needing FOUR consecutive wins), only to lose the next game by 36 points.
Friday, May 06, 2011
I haven’t read the paper, but apparently it’s true!
Friday, April 22, 2011
Great post from Kincaid.
Note: a better estimate than .050 as the spread in talent is .060. That’s why I use 69 games as the regression amount equal to 50%. (Kincaid’s example, using .050 as the spread, implies 100 games as the regression amount.)
One easy way to test that 69 is a better number than 100 is to simply take games #1, 3, 5… 137 for each team in pool1, and games 2, 4, 6… 138 in pool2, and run a correlation. You should get r=.50.
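The arithmetic behind those numbers can be sketched as follows. The assumptions here are mine, not spelled out in the post: each game is treated as a coin flip with win probability near .500 (so the random variance per game is p(1-p)), the “spread” is read as the standard deviation of true-talent win percentage, and the expected correlation between two halves of n games each is taken as n / (n + regression amount).

```python
def regression_games(talent_sd, p=0.5):
    """Games at which random binomial variance equals true-talent variance."""
    return p * (1 - p) / talent_sd ** 2


def split_half_r(games_per_half, talent_sd):
    """Expected correlation between two samples of `games_per_half` games each."""
    k = regression_games(talent_sd)
    return games_per_half / (games_per_half + k)


print(round(regression_games(0.050)), round(regression_games(0.060)))
print(round(split_half_r(69, 0.060), 2), round(split_half_r(69, 0.050), 2))
```

Under these assumptions, two 69-game pools should correlate at roughly .50 if the spread is .060, and at roughly .41 if it is .050, which is what makes the odd/even split a usable test.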
I am not an economist. I don’t even play one on the web. So, be kind to me if I get something wrong here.
Friday, April 15, 2011
Good job by Phil in laying it all out.
As for the reason for that 3.7, a large portion of that is almost certainly the uncertainty of the true talent for each player. There’s only so much we can know about a player, given such a small sample as 3000 plate appearances, combined with such a narrow talent base that is MLB.