Friday, October 03, 2008
Complete Linear Weights, 2008
Colin provides his data for easy access, along with his intro article.
Courtesy of Patriot.
I don’t really have much to add. Patriot noted that he uses a 73% offensive replacement level, likening it to a .350 OW%. Using PythagenPat and 4.5 RPG per team, I get .364. No biggie. Just wanted to point out that he should probably be saying .360, not .350. However, what if you look at it as one replacement guy with 8 average guys? In this case, the team will have a .486 win%, making our replacement level -.014 wins per game (or, more accurately, per one-ninth-of-a-game slice). Adjusted to a per-game basis, that’s -.014 times 9 equals -.126, or a .374 win%.
That is, rather than presuming 9 replacement-level hitters with a team of average defense, we presume 1 replacement-level hitter, 8 average hitters, and average defense. That gives you a .486 win%. The marginal impact is .014, which you “annualize” by multiplying by 9. Kinda like ERA for relievers. Anyway, to get it to my replacement level, I’d use 74% or 75%. We’re pretty much in agreement here.
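Here is a rough sketch of that arithmetic, for anyone who wants to play with the percentages. The PythagenPat exponent power of 0.28 is my assumption (the post doesn't state one, and values up to about 0.29 are in common use), and the function names are made up for illustration.

```python
LEAGUE_RPG = 4.5  # runs per game per team, as in the text


def pythagenpat_win_pct(runs_scored, runs_allowed, exponent_power=0.28):
    """Win% with a PythagenPat exponent x = (RS + RA) ** exponent_power.

    exponent_power=0.28 is an assumed value, not one given in the post.
    """
    x = (runs_scored + runs_allowed) ** exponent_power
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


def hitter_replacement_win_pct(pct_of_average, lineup_share):
    """Replacement-level win% for hitters producing pct_of_average of league runs.

    lineup_share is the fraction of the lineup filled by replacement hitters:
    1.0 for nine replacement hitters, 1/9 for one replacement hitter with
    eight average ones. The marginal effect is scaled back up to a full game.
    """
    runs_scored = LEAGUE_RPG * (1 - lineup_share + lineup_share * pct_of_average)
    team_win_pct = pythagenpat_win_pct(runs_scored, LEAGUE_RPG)
    return 0.5 + (team_win_pct - 0.5) / lineup_share


if __name__ == "__main__":
    print(round(hitter_replacement_win_pct(0.73, 1.0), 3))    # nine replacement hitters
    print(round(hitter_replacement_win_pct(0.73, 1 / 9), 3))  # one replacement hitter
```

With these assumptions the two calls should land near the .364 and .374 figures quoted above; a different exponent power nudges the third decimal.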
With starters, if we repeat this process, but presume 5.4 IP per replacement start, and the bullpen gives him average support, then Patriot’s 125% gives you a starter win% of .390. To make it .380, you’d want 127% or 128%. So, 125% is perfectly fine.
For relievers, it’s the same process as hitters, if you presume 1 IP per replacement relief. You’d want 106% or 107% of league average.
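The pitcher version of the same slice-of-a-game calculation can be sketched as follows. Again, the 0.28 PythagenPat exponent power is my assumption, the function names are hypothetical, and the offense is taken to be league average at 4.5 RPG.

```python
LEAGUE_RPG = 4.5  # runs per game per team, as in the text


def pythagenpat_win_pct(runs_scored, runs_allowed, exponent_power=0.28):
    """Win% with exponent x = (RS + RA) ** exponent_power (0.28 is assumed)."""
    x = (runs_scored + runs_allowed) ** exponent_power
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


def pitcher_replacement_win_pct(pct_of_average, innings_per_game):
    """Replacement-level win% for a pitcher allowing pct_of_average of league runs.

    The pitcher covers innings_per_game out of 9 innings; the rest of the
    staff and the offense are league average. The marginal effect is scaled
    up from the pitcher's share of the game to a full game.
    """
    share = innings_per_game / 9
    runs_allowed = LEAGUE_RPG * (1 - share + share * pct_of_average)
    team_win_pct = pythagenpat_win_pct(LEAGUE_RPG, runs_allowed)
    return 0.5 + (team_win_pct - 0.5) / share


if __name__ == "__main__":
    for pct in (1.25, 1.27, 1.28):
        print("starter", pct, round(pitcher_replacement_win_pct(pct, 5.4), 3))
    for pct in (1.06, 1.07):
        print("reliever", pct, round(pitcher_replacement_win_pct(pct, 1.0), 3))
```

Under these assumptions the 125% starter should come out close to .390, with 127% to 128% bracketing .380.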
Anyway, basic core agreement, with just a smidge of disagreement on the peripherals.
Presuming someone out there is set up for this request better than I am, I’d like to get:
pitcherName, pitcherID, GB, PO, A, DP, E
That is, how many GB does a pitcher allow, along with his personal putouts, assists, DPs, errors?
If no one is set up for this, I’ll spider it this weekend.
Stat Corner is a new site by friends of The Book Matthew Carruth and Graham MacAree. It looks like they are focusing on presenting stats you can’t find elsewhere, so that’s good.
Since they are just starting out, and seem to be willing to try new things, I think they’ll be open to suggestions. Here are mine after first use:
1. The team page should list the players’ special stats for 2008 next to their hyperlinked names. Right now, only the names are shown. No reason for me to go one player at a time on the same team.
2. The team info page should also be present under a league page, so we can see those numbers in context. Things should flow hierarchically (league, team, player), with information on each page.
3. Make the headings clickable or hoverable, so we can see what they mean.
4. On the Pitch data page, each year should be clickable for further drill-down. In fact, Matthew at THT had presented such stats in an article. It’s a great presentation, and I’d hope to see it here some day.
That’s all for now. Good luck guys!
I thought today, after completing a fourth team for retrosheet.org in its quest to convert the PDF versions of the daily summary pages provided by the HOF into a digital database, that I should speak up here and make my own call for volunteers.
David W. Smith is a generous and brilliant man and his life’s work (in the baseball sense)...to bring high quality data to the masses...has produced tremendous results thus far. The daily summary project is gaining momentum, but there’s really no limit on the number of volunteers they can use over there to get this done in a timely manner.
I should be preaching to the choir when I talk about how critical it is to the future of sabermetric research and to the integrity of our data that we get access to this treasure trove of game-by-game data that goes back into the 19th century.
I’m asking anyone here who would like to use this data when it becomes available to e-mail David (dwsmith~retrosheet~org, replacing the ~ with the appropriate character) and volunteer your time to enter it for him. You don’t have to spend many hours every day doing this work...you can take more time to do each team...even an hour a day would really help move the project forward (each team takes about 7-8 hours to enter if you’re at all adept with Excel and a keyboard).
Leaderboards as of this morning on Fangraphs:
WPA: Batters
Lance Berkman 5.29
Pat Burrell 4.98
Albert Pujols 4.47
Jason Bay 4.42
Manny Ramirez 4.18
Ramirez is also #7 in WPA/LI and Bay is #9.
That is all.
I have been fairly harsh toward BP these past several months, but deservedly so in my opinion. I am nothing if not fair, and so, here are BP’s major league and minor league stats, including MLE and “peak” MLE (what you can expect the player to do if aged toward his peak). It’s real sweet, so huge kudos to Clay for presenting the work, apparently updated daily.
I know, sabermetric orthodoxy insists that lineup order doesn’t matter; I guess I keep forgetting to drink all of my Kool-Aid, especially when lineup-related research depends on so many lazy assumptions and/or involves redoing some of the same Markov Chain analysis that’s been done for decades, all of which ends up suggesting that… well, that Joe McCarthy or Earl Weaver or Casey Stengel or Bobby Cox are smarter than the models (or the modelers). Consider me a firm believer in the proposition that much of sabermetrics is about the documentation of already-observed phenomenon, and that the best-placed observers did not and do not need sabermetric re-educations, they need to be learned from to create historically-informed sabermetrics.
If Christina has read The Book, I am annoyed. If she hasn’t, then she is as lazy a reader as she claims the researchers are. And since most of The Book in fact documents empirical (i.e., real-life) data, satisfying her vision of sabermetrics, then Christina should be one of the biggest vocal supporters of The Book.
Hat tip: FifthOF
I don’t remember if I ever posted this, so here goes (maybe again):
If you are looking to make a contribution to the world of sabermetrics, here is the perfect little project for you: Create a mapping table of all player IDs out there.
So, this is what I would like:
1. Post all your mappings somewhere
That’s it. Someone, maybe me, will then merge all of them to come up with the (current) definitive list.
Ideally, other sites will be as bothered as the rest of us in terms of mapping everything, that they will contribute their mappings of the new players in the future to keep this up-to-date. All those minor league IDs, college websites, Japanese websites, etc, can finally have everything linked up. Basically the “universal ID”. Is it possible? Let’s see…
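For whoever ends up doing the merge, here is a minimal sketch of one way to do it, not a finished tool. It assumes each contributed mapping is a list of rows, where a row maps site names to that site's ID for one player; rows that share any (site, ID) pair get linked into the same player. The site names and IDs in the usage example are placeholders, not real IDs.

```python
from collections import defaultdict


def merge_mappings(*mapping_tables):
    """Merge contributed ID mapping tables into one row per player.

    Each table is a list of dicts like {"site_a": "a1", "site_b": "b1"}.
    Rows sharing any (site, id) pair are treated as the same player
    (a simple union-find over the pairs).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for table in mapping_tables:
        for row in table:
            keys = [(site, pid) for site, pid in row.items() if pid]
            for key in keys:
                find(key)  # register IDs that appear alone in a row
            for other in keys[1:]:
                union(keys[0], other)

    groups = defaultdict(dict)
    for node in list(parent):
        site, pid = node
        # If two IDs from the same site end up linked, the later one wins;
        # a real merge would want to flag those conflicts for review.
        groups[find(node)][site] = pid

    return [dict(ids, universal_id=n)
            for n, ids in enumerate(groups.values(), start=1)]


if __name__ == "__main__":
    table_one = [{"site_a": "a1", "site_b": "b1"}, {"site_a": "a2", "site_b": "b2"}]
    table_two = [{"site_b": "b1", "site_c": "c1"}]
    for player in merge_mappings(table_one, table_two):
        print(player)
```

The "universal ID" here is just a running counter. Keeping it stable as new players are added would take a persistent table, which this sketch does not attempt.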
You can now filter based on months, or recent days
The main page of Tangotiger.net doesn’t necessarily have all the file listings on my site. It has most, but every now and then, I forget to update it. Here then is the complete listing:
With taxes finally filed, and my (now) lack of interest in forecasting systems analysis behind me, I’ll be focusing on building a Retrosheet database. I was envisioning releasing everything I do all at once. But, I had second thoughts about that. I might as well release things as I do them.
So, follow along with me, and you can build your database with me, and we can finally share our SQL code once we finish building it, since we’ll all be using the exact same names for everything. This is what you have to do to start off:
Sean has asked for, and received, my Win Expectancy (WE) and Leverage Index (LI) data for run environments of 3.0 to 6.5. Fangraphs has the exact same data on the exact same terms: free, as long as their users don’t have to pay to see the data (via PI). This offer is open to any information provider under the same terms.
You can see the results in the above link (still in Beta, so glossary not updated). One interesting presentation he did was to show the win expectancy play-by-play from the perspective of the eventual winning team. That’s what that “wWE” means (win expectancy of winning team).
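The wWE idea is easy to reproduce if you have a play-by-play win expectancy series. A small sketch, assuming the series is stored from the home team's perspective (my assumption, not something the site documents) and using a hypothetical function name:

```python
def to_winner_perspective(home_win_expectancy, home_team_won):
    """Re-express a home-team WE series as the eventual winner's WE (wWE).

    home_win_expectancy: WE after each play, from the home team's side.
    home_team_won: True if the home team won the game.
    """
    if home_team_won:
        return list(home_win_expectancy)
    # Visitors won, so their WE is the complement of the home team's.
    return [1.0 - we for we in home_win_expectancy]


if __name__ == "__main__":
    # Made-up WE values for illustration only, not from a real game.
    print(to_winner_perspective([0.5, 0.25, 0.0], home_team_won=False))
```

Whichever team won, the resulting series ends at 1.0, which is what makes the winner's-eye view easy to compare across games.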
Just as cool is the pitching summary, where he shows the result of strikes: contacted, swing&miss, called no swing. In addition, you get to see the results of the contacted PA by GB, FB, LD.
The payoff will be when we see this on the players’ split pages, so we can see seasonal and career totals.
Want to know what Jake Peavy throws? Go to the bottom of that page: 59% fastballs, 18% sliders, 11% cutters (CU), 2% curveballs (CB), 11% changeups (CH). In the last 3 years, he’s thrown 10,000 pitches. (Minor note: I’d call the cutter CT, as you can easily confuse CU for curve or changeUp.)
I’m in transition in my sports work. You get to set my schedule for the next several weeks. Let me tell you what I’m in the middle of, and you can decide if you want me to work on something else. In no particular order:
The pinnacle of Sabermetrics is the convergence of performance analysis and scouting observations. To that end, the future of sabermetrics will be the processing of the pitch-by-pitch data. So, a very micro-analysis. Bill James looks at the answer to the question from a very macro-perspective:
League-perspective decision making. Looking at decisions based from the standpoint of the league. Simple example: the wild card....
I know why what I’m saying is a candidate for the future of sabermetrics. I don’t know why what he says is. That’s not to say that he’s wrong, but I just don’t see what he’s seeing.
I was bothered by this statement, especially in conjunction with a later statement where he says he doesn’t keep up with what’s around, other than Retrosheet:
Then we created “profiles”… which contain all kinds of information about the teams and the players that you don’t have any other way of knowing, at least now; of course other people will rip us off, and the same information will be appearing on other sites in a matter of months.
I really wish he wasn’t so forceful about his statements here. Especially when he’s wrong.
Come one, come all. I just installed it last night, so the site is very bare. And I expect to make limited contributions, so this is a call for you guys to do the heavy lifting. I put up a couple of pages, as has Patriot.
What can you do? Click the above link, and then click on the main link of the page, and navigate the site. Once you’ve dipped your toe, put a search term on the left, like FIELDING or PARKS or EQA or whatnot. Click the SEARCH button. If there are no hits, then click on the red search word that came back, and you can start creating the page.
Registration is recommended (will be easier for you to track your edits) but not required. Unless some yahoos derail the project, at which point it will be required.
If someone wants to be in charge of the web design, let me know.
Improvement on Fangraphs especially for you cut and pasters. Finally, we can grab ALL the players for a given season in one shot, not 50 at a time.
David has also offered to partner on my Clutch project. We’re working on the details as we speak.