Friday, May 16, 2008
When to walk ‘em…
MGL takes it on, in part one.
Non-sports post. Enter at your peril, avoid at your pleasure.
Knowing that a guy’s fastball speed has dropped 4 mph is important. What you’d also like to know is whether the movement on the fastball has changed to compensate. This is where PITCHf/x comes in. Similarly, a guy’s fastball speed could have stayed the same, but if it came with much less movement than usual, that’s also important to know. Anyway, great job, and I’m looking forward to seeing more scouting aging patterns among pitchers.
I’ll point you to Phil, who does a great dissection and who links to the academic paper, along with Brian Burke’s great 3-part blog post (read only the third one if you are pressed for time).
Ron Shandler chimes in about the “reliability” of statistics at any point in the season. While he makes some good points about how the different measurements require different sample sizes (e.g., PA) to be equally reliable, the piece is littered with phrases (and I am paraphrasing, despite the quotation marks) like “become meaningful,” “reliable,” “taken seriously,” etc.
As I mentioned in another thread, I don’t like those terms being thrown around with respect to this issue. They are misleading. And dangerous. One man’s reliable is another man’s unreliable. More importantly, the more a short-term statistic strays from a career one and/or from a population mean, the less “reliable” it is, given identical sample sizes (of the current performance). Not to mention the fact that you cannot make any claim about a statistic’s reliability based on the sample size of that statistic without knowing the prior history of the player or the mean of the population. A player with no history hitting .260 on May 15 is probably around a .260 hitter (at least that is our best estimate, albeit without a great deal of certainty). A player hitting .260 on May 15 who has been a .300 hitter his whole career is probably a .295 hitter.
So how in the world can we say that May 15, or any other date, is the date at which a statistic becomes reliable, without knowing the prior history of the player and the mean of the population?
Similarly, if a player is hitting .300 on May 15 with no history, his true BA is probably around .270. So one player with no history who hits .260 on May 15 has a true BA of around .260. Another player with no history who is hitting .300 on May 15 has a true BA of around .270. In one case, his short-term BA is likely his true BA. In the other case, his short-term BA is likely nowhere near his true BA. Again, how can we talk about a date or a current sample size, in isolation, that makes a player’s statistic reliable or not? We can’t!
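To make the point concrete, here is a minimal sketch of regression toward a prior. It is not anyone's actual projection system. The .260 population mean is implied by the examples above, but the 130 AB sample and the two prior weights (390 AB and 910 AB) are assumptions chosen only so that the output lines up with the figures quoted in this post.

```python
def estimate_true_ba(observed_ba, at_bats, prior_ba, prior_weight_ab):
    """Shrink an observed batting average toward a prior estimate.

    prior_weight_ab is the number of at-bats of evidence the prior is
    treated as being worth (an assumed, illustrative constant).
    """
    weight = at_bats / (at_bats + prior_weight_ab)
    return weight * observed_ba + (1 - weight) * prior_ba


LEAGUE_BA = 0.260   # population mean implied by the examples in the post
AB_BY_MAY_15 = 130  # assumed sample size, not from the post

# Two players with no history: the prior is the population mean.
rookie_at_mean = estimate_true_ba(0.260, AB_BY_MAY_15, LEAGUE_BA, 390)
rookie_hot = estimate_true_ba(0.300, AB_BY_MAY_15, LEAGUE_BA, 390)

# A career .300 hitter: the prior is his own history, weighted more heavily.
veteran_cold = estimate_true_ba(0.260, AB_BY_MAY_15, 0.300, 910)

print(round(rookie_at_mean, 3), round(rookie_hot, 3), round(veteran_cold, 3))
```

Same date and same sample size in all three cases, yet the gap between the observed and the estimated BA ranges from nothing to 35 points, depending entirely on the prior.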
Rally asks:
What I want to do is see if, given a pitcher’s projection from his MLE, whether fastball velocity tells us any more useful information for his projection. In other words, do fireballers beat their projections? Do soft-tossers fail to live up to theirs?
If you don’t want to read his study, the answer is:
Knowing a pitcher’s velocity doesn’t tell you anything about his chances of success that you didn’t already know by looking at his minor league numbers.
Josh Kalk walks us through it. A link at the bottom (references) showing the results would be a welcome addition. Otherwise, excellent article.
Here is a partial list of what I would do:
There are some (actually, many) decisions a manager makes that sabermetricians consider wrong. Or I should say, “the models that sabermetricians construct to model the relevant situation” say that they are wrong (sabermetricians have no “opinions”).
Anyway, some of these “wrong” decisions are “justified” by conventional wisdom, some are so close that it doesn’t much matter, and for some, perhaps, the manager is right, because he knows things that the model doesn’t.
And then there are things that a manager does that are just plain dumb. Things that almost everyone, other than, seemingly, the manager, knows are dumb.
Today in the Braves game, they are losing 4-3 in the top of the 6th, with runners on 1st and 2nd and 1 out and the pitcher due up. Even the announcers said, “Reyes (the Braves starter) is on deck, but he won’t hit, especially if Prado (the batter) gets on (recognizing that the leverage goes up if he gets on).”
I thought to myself, “Don’t count out the ‘managers can be exceedingly stupid’ factor.”
Sure enough, Cox, the Hall of Fame manager, let Reyes bat, and the rest of the game is history.
I doubt I have to explain to the readership here how bad that decision is in terms of costing the Braves WE. I’m sure Tango can give us the numbers if he has the time. We went through a similar situation with the Padres a couple of weeks ago. In that game, at least it was Peavy pitching (not that it makes that much difference). But here, we have a back-of-the-rotation guy in Reyes who is probably only going to pitch for another inning at the most.
Pathetic. I feel sorry for Braves fans, but heck, almost all managers make really stupid decisions like that all the time (or at least from time to time).
You guys know I love the Wisdom of the Crowds approach. So, let’s use it for something that has an outcome, rather than just the academic exercise we’ve been doing. I offered for Leverage Index to be part of the Bill James Handbook.
For a non-commercial product (specifically, if one of my readers can get the product or service for free), I have a free licence for it. I figure that as long as you guys don’t pay for it, I’m happy to donate it to these guys so that you in turn can see it in action. That’s why I’m happy that Fangraphs and B-r.com have taken it.
Now, other commercial ventures have approached me for it (video games mostly). I’ve turned them all down, because now that’s a consumer-paid product (as opposed to the ad-driven service that Fangraphs and B-r.com offer), and I honestly don’t know what it’s worth. Plus, I never play video games, so I don’t really know how it’s going to be used, etc.
So, this is where you guys come in. What is a licence to Leverage Index worth? You can state it in terms of dollars, in terms of % of revenue, in terms of value per book sold. You can state it in terms of in-kind, like data(*). Or in other terms that you can think of. Maybe make Bill James agree to drop Runs Created in favor of BaseRuns! Whatever.
(*) For data, I already have an understanding that any work I do with BIS-related data at Fangraphs is published at Fangraphs. So, I’m thinking there is a limit to the value over and above this. But, maybe you guys have other ideas here.
Make me a deal guys. You are the wise ones.
Some great data by Pizza, on the relationship between pitch counts and performance. If the numbers look low, he notes: “Again, these numbers are lower than might be expected due to some of the methodological problems I ran into. If I have a moment I might try to correct for it.”
Regardless, the pattern is fairly plain to see. Roughly speaking, it looks to be almost 2 wOBA points per 10 pitches thrown. There are roughly 33 pitches thrown per time through the order, so that gives us an average change of roughly 6.5 wOBA points, each time through the order. In The Book, table 82, I show that each time through the order shows a difference of 8 wOBA points. So, fairly close.
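The back-of-the-envelope conversion above can be written out as a quick sketch. The function name is made up for illustration, and the 2.0 input rounds up the "almost 2" read off Pizza's data, so the result lands slightly above the roughly 6.5 quoted in the text.

```python
def woba_change_per_time_through(points_per_10_pitches, pitches_per_time_through):
    """Convert a per-10-pitches wOBA slope into a per-time-through-the-order change."""
    return points_per_10_pitches * pitches_per_time_through / 10


# Inputs from the paragraph above: ~2 wOBA points per 10 pitches,
# ~33 pitches per time through the order.
estimate = woba_change_per_time_through(2.0, 33)

# Figure cited from The Book for comparison: 8 wOBA points per time through.
BOOK_FIGURE = 8
print(estimate, BOOK_FIGURE - estimate)
```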
Pizza: can you add a parameter for “time through the order”? Table 80 makes it seem like there is a definite jump each time. Perhaps your results have smoothed out what may be a staggered effect.
All your links courtesy of the fantastically accessible Sportvision, and a recap from PITCHfx-er Ike Hall.
UPDATE: Ok, we are now live. You can click the links at the top right corner of any page on this blog, right where it says: “Mail”. I’ve started answering a few. When we have updates, I will make a note of it in the comments of this thread. Thanks.
We have thousands of readers at this blog, but just a small portion of those readers actually post on our blog. Presuming that there are some of you that prefer a different avenue to make your comments, we have started a Mailbag, whereby you can provide feedback, ask questions, or whatnot. We guarantee to not only read them all, but reply to each one, either by selecting it for our blog for public viewing, or replying privately. Your name and email address will be kept private.
This idea stemmed from the original Historical Abstract. Here’s what I wrote to David at Fangraphs, and I’ll provide further commentary:
I did basically the same thing I did with pitchers, but this time for batters. I used only BA because I used box scores to gather the April and non-April data. Later I’ll use OPS or something like that. Again, the data are from 04-07.
I also only looked at batters who were on the same team the year before the year with the good or bad April.
For the projections, I used a basic Marcel, weighting each season 25% more than the prior season. I regressed based on 50% regression per 600 AB. And I age adjusted by adding 2 points of BA for all players younger than 28, subtracting 4 points for older than 30 but less than 36, and subtracting 8 points for older than 35.
As with the pitchers, I broke all batters with at least 50 AB in April into two groups: those who hit less than .200 and those who hit more than .350.
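For readers who want to see the projection recipe described above in concrete form, here is a rough sketch. It is one reading of the description, not the actual code used in the study. The league BA of .265, the choice to regress using weighted at-bats, and the formula 600 / (600 + AB) for "50% regression per 600 AB" are all assumptions.

```python
def marcel_ba(seasons, age, league_ba=0.265):
    """Project batting average with a basic Marcel-style method.

    seasons: list of (hits, at_bats) tuples, most recent season first.
    league_ba: assumed population mean to regress toward.
    """
    weight = 1.0
    weighted_hits = 0.0
    weighted_ab = 0.0
    for hits, at_bats in seasons:
        weighted_hits += weight * hits
        weighted_ab += weight * at_bats
        weight /= 1.25  # each season counts 25% more than the one before it

    if weighted_ab > 0:
        observed = weighted_hits / weighted_ab
        # Assumed reading of "50% regression per 600 AB":
        # regress halfway to the mean at 600 weighted AB.
        regression = 600 / (600 + weighted_ab)
        projection = (1 - regression) * observed + regression * league_ba
    else:
        projection = league_ba  # no history: fall back to the population mean

    # Age adjustments as described, in points of BA (1 point = .001).
    if age < 28:
        projection += 0.002
    elif 30 < age < 36:
        projection -= 0.004
    elif age > 35:
        projection -= 0.008
    return projection


# Hypothetical 29-year-old with three prior seasons, most recent first.
print(round(marcel_ba([(160, 550), (150, 520), (140, 500)], age=29), 3))
```

Players aged 28 to 30 get no adjustment, since the description leaves that band untouched.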
This is the headline and a snippet from an MLB article on the Tigers page from yesterday:
DETROIT—The Tigers’ new $153.3 million man has not lived up to expectations. But that’s not to say he won’t eventually.
Leyland apparently agrees with that sentiment.
I just looked at my current player database as of a few days ago, and Cabrera had a +33 per 150 park-neutral lwts. What the hell are they talking about?
This is a question solely for those readers who are subscribers to Bill James Online:
Two excellent issues from editor Phil Birnbaum (Nov 07, Feb 08). Here are my thoughts:
This is where game theory and PITCHf/x will collide.
Suppose: if you know the batter knows the data, then you make a change to your approach. But, since the batter knows you know he knows, and he knows you’ll change your approach, the batter’ll change his approach. But, what if you don’t know that the batter knows the data? Do you presume the batter doesn’t know and keep pitching the same way? But, if the batter actually does know the data, but the batter knows that you don’t know that he knows, then he’ll cream you.
Question: are you better off if everyone knows, or are you better off taking the chance that he might not really know? That is, might it be to your benefit to know 100% that everyone has the data and compiled in the same way you have it, or is it to your benefit to have that data well-compiled, while the other guy may or may not have it, and you have no way to know whether he has it well compiled?
Someone can insert the Princess Bride youtube clip right about.... now!