Tuesday, July 17, 2007
Internal ESPN memo
"We’re also looking to take statistical development in new directions—from pitch-type and pitch speed breakdowns to statistical win probabilities for various sports.”
(Hat tip: D.G.)
Here are some updated UZR numbers as of just before the ASB. Again, they are based on STATS data using the latest version of the UZR methodology (park adjusted, etc.). They do not include “arms” for the OF, turning the DP for the IF, or receiving throws for the 1B (or any other IF).
The full file can be found as the last file here:
http://www.tangotiger.net/mgl/
Top players in total runs saved:
(ID = Retrosheet ID; Pos = position; Chances = outs an average fielder would make; Range, Error, Total = range runs, error runs, total runs; Games = defensive games; Per150 = runs per 150)
Name               ID        Pos  Team  Chances  Range  Error  Total  Games  Per150
Feliz, Pedro       felip001  5    sfn       128     16      4     19     68      43
Ellis, Mark        ellim001  4    oak       181     13      2     16     73      32
Sizemore, Grady    sizeg001  8    cle       201     16      0     16     78      31
Rolen, Scott       roles001  5    sln       116     10      4     14     61      34
Tulowitzki, Troy   tulot001  6    col       227      9      5     14     87      25
Cano, Robinson     canor001  4    nya       190     10      2     12     77      24
Holliday, Matt     hollm001  7    col       146     11      0     12     77      22
Everett, Adam      evera001  6    hou       128     11      0     11     49      35
Soriano, Alfonso   soria001  7    chn       131      9      2     11     69      24
Helton, Todd       heltt001  3    col        96      7      3     10     76      20
Matsui, Kazuo      matsk001  4    col       117      8      2     10     47      33
Reyes, Jose        reyej001  6    nyn       182      7      3     10     70      21
Vizquel, Omar      vizqo001  6    sfn       188      6      4     10     72      21
Wilson, Jack       wilsj002  6    pit       213      5      5     10     82      18
Bottom players:
Name               ID        Pos  Team  Chances  Range  Error  Total  Games  Per150
Lee, Carlos        lee-c001  7    hou       129    -10      0    -10     69     -21
Willingham, Josh   willj004  7    flo       115    -10      0    -10     61     -24
Young, Michael     younm003  6    tex       214    -11      1    -10     82     -17
Burrell, Pat       burrp001  7    phi        87     -9     -3    -11     46     -37
Drew, J.D.         drewj001  9    bos       130    -10     -1    -11     64     -25
Guillen, Carlos    guilc001  6    det       175     -4     -7    -11     67     -24
Guillen, Jose      guilj001  9    sea       143    -10     -1    -11     72     -22
Harris, Brendan    harrb001  6    tba       171    -10      0    -11     66     -24
Kinsler, Ian       kinsi001  4    tex       191     -6     -5    -11     77     -21
Lowell, Mike       lowem001  5    bos       141     -9     -3    -11     75     -22
Young, Dmitri      yound001  3    was        61     -6     -4    -11     48     -33
Griffey, Ken       grifk002  9    cin       139    -11     -2    -13     69     -29
Jeter, Derek       jeted001  6    nya       214    -14      1    -13     82     -24
Weeks, Rickie      weekr001  4    mil       124    -11     -2    -13     50     -38
Ibanez, Raul       ibanr001  7    sea       134    -12     -2    -14     71     -29
Ramirez, Hanley    ramih003  6    flo       179     -9     -5    -14     69     -31
Ramirez, Manny     ramim002  7    bos       138    -15      1    -14     73     -28
Kent, Jeff         kentj001  4    lan       156    -11     -4    -15     63     -35
Peralta, Jhonny    peraj001  6    cle       214    -15     -1    -16     82     -29
Interesting article on the changing landscape of suites at sports stadiums. My favorite suite experience was at the Olympic Stadium, where the Expos had the ground-level enclosed area behind home plate (you can sit for a dinner while watching the game, or sit in the seats). Nicest guy was Vlad, who while on one knee in the on-deck circle would turn his head around to let the kids (on camera day) take pictures of him with his big smile. Dude’s about to come to bat in 3 seconds, and he actually thinks about the kids 20 feet behind him before he goes to home plate.
Dan Fox checks in with all things bunts. Let’s focus on this:
Success Rate By Outs
0 1 2
Empty .441 .450 .488
First .307 .298 .492
Second .481 .518 .516
Third .359 .467 .498
First/Second .337 .259 .424
First/Third .333 .439 .502
Second/Third .444 .429 .495
Loaded .412 .339 .348
Let’s focus only on bases empty for now. The linear weight run value by base/out states is in Table 50 of The Book, but here’s another version of that:
http://www.tangotiger.net/RE9902event.html
So, bases empty 0 outs, the run value of a hit is +.39 runs, and an out is -.26 runs. The breakeven point is 26/(26+39)=.400. That means, when you think you have AT LEAST a 40% chance of making it to 1B with 0 outs and bases empty, you should bunt. The group average is .441, and so, it looks like players are bunting when they should in this situation. The average run value per PA is .39*.441 - .26*(1-.441) = +.027 runs per PA, or +16 runs per 600 PA. Since the guys bunting are likely below average hitters, they are really adding to their non-bunting numbers. If this were Barry Bonds however, his breakeven point would be far far higher than .400 or .441. So, you’d really have to look at it on a player-by-player basis.
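The break-even arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and the run values (+.39 for reaching base, -.26 for an out) are the ones quoted above for bases empty, 0 outs.

```python
def breakeven_rate(run_value_success, run_value_out):
    """Success rate at which a bunt attempt nets zero runs.

    run_value_out is passed as a negative number (the cost of the out).
    """
    cost = -run_value_out
    return cost / (cost + run_value_success)


def runs_per_pa(success_rate, run_value_success, run_value_out):
    """Average run value of one bunt attempt at the given success rate."""
    return success_rate * run_value_success + (1 - success_rate) * run_value_out


# Bases empty, 0 outs, using the run values and group success rate quoted above.
be = breakeven_rate(0.39, -0.26)
per_pa = runs_per_pa(0.441, 0.39, -0.26)
print(round(be, 3), round(per_pa, 3), round(per_pa * 600))  # 0.4 0.027 16
```

Swapping in the run values for another base/out state from the chart gives that state's break-even point the same way.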
(It should be noted that in Game Theory, you should do some seemingly suboptimal bunting every now and then, so that the defense thinks you may be bunting more than you should, allowing you to hit away and punch hits through. Look up Game Theory in the Search box of this blog.)
The break-even point for 1 and 2 outs is 40% and 48%. As you can see in the above chart, the players do have a higher bunt success rate with 2 outs and bases empty. And since the breakeven point is much higher, we expect the frequency to be much lower. And it is:
Frequency By Outs
0 1 2
Empty .274 .153 .070
First .172 .070 .025
Second .064 .011 .003
Third .001 .007 .010
First/Second .084 .023 .003
First/Third .004 .010 .006
Second/Third .001 .002 .002
Loaded .000 .003 .002
In short, it’s likely that only the very best bunters try to bunt for a hit with 2 outs and bases empty.
The other major bunting situation is man on 1B and 0 outs. In this case, the run value of a hit is actually the run value of a walk in that chart (since a bunt hit and a walk have the same impact here). The break-even point is 44%, and yet the actual success rate is only 31%. It is an almost foregone conclusion that these “bunt for a hit” numbers include an enormous number of sacrifice bunts that are not recorded as such.
Given that the frequency of bases empty 1 out (15%) is similar to man on 1b 0 outs (17%), it’s implausible that the success rates could be that different (45%, 31%). I’d also bet that the top 30 bunters in one situation will be markedly different from the second situation, further showing that we’re not really looking at a similar sample of bunters.
His count data is also fascinating:
Count Success Frequency
0-0 .422 .694
0-1 .369 .099
0-2 .090 .018
1-0 .438 .057
2-0 .506 .007
3-0 .500 .000
1-1 .409 .069
1-2 .116 .017
2-1 .440 .026
2-2 .136 .007
3-1 .526 .005
3-2 .125 .002
You can also figure out the run values by count. Just taking a quick glance at Dan’s numbers, I’d bet only the 0-0, 0-1, and 1-1 counts are appropriate for bunting.
There was a discussion on Baseball Fever about using median ERA, because one bad game can really kill you in ERA. I wrote the following:
The ERA “problem” is that it doesn’t (can’t) follow a normal distribution. It’s bound by 0 on one side and infinity on the other.
OBP however doesn’t have that problem, and neither does winning %. I think converting ERA up to winning percentage (runs to wins) or down to OBP (runs to base/outs) is a sensible approach.
***
I just did a little test, where I took games with an OBP ranging from .135 to .535 (uniform distribution, player mean of .335, league mean also .335).
While the league mean of runs allowed per 9IP is 5.00, this pitcher had an RA of 5.66. However, when I converted it to win% instead, I got a win % of .498.
Essentially, what I did was construct an average pitcher; and his OBP and win% both gave me an average pitcher, but his runs allowed was 13% higher than the league average.
Therefore, you *cannot* look at the runs allowed figure, precisely because of the skew issue. Of course, when you look at it over a period of years, all the pitchers should have the same skew, so it balances out and is not an issue. On a seasonal basis? Definitely a problem.
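One way to see where the skew comes from, as a rough sketch rather than a reconstruction of the test above: scoring is a convex function of OBP, so averaging games of varying OBP gives more than you get at the average OBP. The snippet uses baserunners per 27 outs, 27 * OBP / (1 - OBP), as a stand-in for runs. That stand-in and the function names are my assumptions, not the model used in the test.

```python
def baserunners_per_27_outs(obp):
    """Expected baserunners before 27 outs are recorded, given a fixed OBP."""
    return 27 * obp / (1 - obp)


def mean_over_uniform_obp(low, high, steps=10_000):
    """Average baserunners per 27 outs when game OBP is uniform on [low, high].

    Uses the midpoint rule over equally wide slices of the OBP range.
    """
    width = (high - low) / steps
    total = 0.0
    for i in range(steps):
        obp = low + (i + 0.5) * width
        total += baserunners_per_27_outs(obp)
    return total / steps


at_mean_obp = baserunners_per_27_outs(0.335)        # every game at .335
over_games = mean_over_uniform_obp(0.135, 0.535)    # games spread from .135 to .535
print(round(at_mean_obp, 2), round(over_games, 2), round(over_games / at_mean_obp, 3))
```

Under this stand-in, the pitcher whose games vary allows roughly 9 to 10% more baserunners than one who sits at .335 every game, which points in the same direction as the 13% gap in runs allowed reported above.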
Thanks to Studes, I see that Jeff Sagarin has applied the Mills Brothers’ Player Win Average to the Retrosheet years, like here:
http://www.kiva.net/~jsagarin/mills/nl1987.htm
You can actually try to calculate Leverage Index too. For example, Steve Bedrosian had about 50,000 advancement points in 385 situations, or 130 points per situation. If you take say Mike Scott, he had 62 points per situation. Presuming that 60-65 is the standard number, then we can see that Bedrosian had an LI of around 2.0.
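As a quick sketch of that Leverage Index estimate (the function names are mine, and the 60 to 65 baseline is the presumption stated above, not an established constant):

```python
def points_per_situation(advancement_points, situations):
    """Average advancement points at stake per situation faced."""
    return advancement_points / situations


def leverage_index(pitcher_pps, baseline_pps):
    """Ratio of a pitcher's points per situation to the presumed standard."""
    return pitcher_pps / baseline_pps


bedrosian = points_per_situation(50_000, 385)  # about 130
for baseline in (60, 62, 65):
    print(baseline, round(leverage_index(bedrosian, baseline), 2))
```

Anywhere in the 60 to 65 range, the ratio stays close to 2.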
Yet another in a long series of great research pieces by John Walsh.
He points out some obvious data quality issues like:
”...all I can say is that one pitch whose recorded location was right in the heart of the strike zone, was actually an intentional ball that was thrown two feet off the plate!”
Generally speaking, we see that the textbook strike zone should be the 17 inches of the plate, plus the inside/outside pitches where the ball nicks the plate. Since the datapoint is measured at the centre of the ball, that means that the center-of-ball to center-of-ball left/right strike zone would be a total of almost 20 inches. Umpires however are calling the left/right center-of-ball strike zone as 24.1 inches for RH and 24.5 inches for LH. Dan Fox once pointed out that the home plate also has the outside rubber, which could technically be considered part of the home plate. That would bridge some of that gap, but not all. As well, since John is considering all umpires, it would be interesting to see the left/right range of each umpire. After all, the good ones would call the 20- or 22-inch left/right strike zone, while the bad ones would call a 25-27 inch strike zone. You won’t have, I don’t think, umpires who call a 17-18 inch strike zone to balance it out.
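The "almost 20 inches" figure can be checked with a short sketch. The ball size is my assumption (a rulebook circumference of 9 to 9.25 inches), and the names are only for illustration.

```python
import math

PLATE_WIDTH_IN = 17.0
# Assumption: rulebook ball circumference of 9 to 9.25 inches.
BALL_CIRCUMFERENCE_IN = (9.0, 9.25)


def center_to_center_zone_width(circumference_in):
    """Left/right strike zone width measured at the center of the ball."""
    diameter = circumference_in / math.pi
    # A ball nicking either edge has its center one radius outside the plate,
    # so the zone for ball centers is the plate plus two radii (one diameter).
    return PLATE_WIDTH_IN + diameter


widths = [center_to_center_zone_width(c) for c in BALL_CIRCUMFERENCE_IN]
print([round(w, 2) for w in widths])

# Gap between the called zones quoted above (24.1 RH, 24.5 LH) and the textbook zone.
for called in (24.1, 24.5):
    print(called, [round(called - w, 1) for w in widths])
```

Under that assumption the called zone comes out a little over 4 inches wider than the textbook one, before any allowance for the rubber around the plate.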
John also shows a skew depending on the handedness of the batter. I have to confess I don’t know exactly where the umpire positions himself. In yesterday’s All-Star game, the old man got knocked heavily twice on his left-side, with righties at bat. I didn’t pay attention as to where he was positioned when lefties were at bat. I’m guessing the ump positions himself away from the catcher’s throwing hand, so, he would be farther away from a LHB than a RHB.
Fun stuff…
Two out of three ain’t bad. I mostly love Michael Moore. I mostly respect Larry King’s professionalism. CNN.... well, I used to like them. Other than Anderson Cooper and Larry King, I tune them out. But, last night....
Ah, I was in heaven. Larry King has Michael Moore and Dr. Sanjay Gupta, CNN medical correspondent. In a report a few days earlier, Gupta accused Michael Moore of cherry-picking facts by .... cherry-picking his own facts! He ended off saying “But no matter how much Moore fudged the facts, and he did fudge some facts...” To fudge some facts is to accuse someone of purposeful deception.
When Michael Moore presents data from the U.S. Department of Health and Human Services showing projected numbers per capita for 2006 of 7092$ per person (actual in 2005 was 6697, actual in 2004 was 6322, actual in 2003 was 5952), and Gupta counters that by saying “not true… 6096$”, but without specifying HIS source… well, who’s doing the fudging here? Gupta did a terrible job with this report, not so much because he presented his own data without attribution, but because he didn’t investigate where Moore got his numbers and tell us why they were wrong. Gupta even said something to the effect that France is swimming in debt… which, as Moore pointed out, is exactly what Moore says in his own movie! And Moore was further right that instead of disputing the 6000$ or 7000$ claims (like, really, who cares), why not discuss the larger issue at hand? In short, CNN decided to make a story by looking at every single data point presented by Moore, and then CNN found other sources that contradicted Moore’s data, and therefore claimed that Moore fudged his data.
Gupta handled himself incredibly well as a politician, which means he stinks as a reporter.
Michael Moore responded to Gupta on King’s show, and furthermore on his website:
http://www.michaelmoore.com/sicko/news/article_10017.php
This happens in the NHL all the time. It’s really quite shocking. The latest recipient of this largesse is the NHL’s MVP, Sidney Crosby. Crosby is this generation’s Wayne or Mario. In his second season, at the age of 19, he was the league’s MVP.
The NHL has a rookie cap, meaning that for his first three years, a player caps out at 850,000$ per year. (And there are no signing bonuses either.) In the NHL, you become a free agent at the age of 27, or if you have 7 years under your belt. In Crosby’s case, that means playing just 5 more years. In short, something like A-Rod went through in Seattle.
The NHL individual player cap is 20% of the team payroll cap, which is currently 50MM. Between the rookie cap and free agency, you have restricted free agency (meaning arbitration, or if some other team signs your player, you give up draft picks… topping out at four 1st rounders in the case of Sid). Crosby signed an extension of 5/45 (meaning 1 year of free agency has been bought out). This is a free-agent deal, for a guy who is still under the rookie cap. How does this make any sense? Crosby has no leverage.
Also note that the salary cap has been jumping like crazy each year since the lockout. It started at 39MM, then jumped to 44MM, and is now at 50MM. As you can see, the NHL is swimming in cash. If the team cap keeps increasing at say 5MM per year, the individual player cap would go to 11MM next, 12MM after, 13MM, then 14MM. So Crosby, who as a free agent would have been able to sign five single-year deals totalling 65MM, or a 5yr deal next year at 60MM, signs an extension for 5/45. So, he’s not getting the full free agent deal, but that’s awfully close for a guy with two years under his belt.
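Here is a small sketch of that cap arithmetic. The 5MM yearly increase is the "say" figure above, the function names are mine, and reading the 60MM figure as five years at the second projected maximum is my interpretation.

```python
def player_max(team_cap_mm, share_pct=20):
    """Individual player maximum as a percentage of the team payroll cap (in MM)."""
    return team_cap_mm * share_pct / 100


def projected_team_caps(current_cap_mm, yearly_increase_mm, years):
    """Team caps for the next `years` seasons under a constant yearly increase."""
    return [current_cap_mm + yearly_increase_mm * (i + 1) for i in range(years)]


caps = projected_team_caps(50, 5, 5)        # 55, 60, 65, 70, 75
maxes = [player_max(c) for c in caps]       # 11, 12, 13, 14, 15
five_single_year_deals = sum(maxes)         # 65
# Assumed reading: a 5-year deal signed at the second projected maximum (12MM).
five_year_deal_later = 5 * maxes[1]         # 60
print(maxes, five_single_year_deals, five_year_deal_later)
print(round(45 / five_single_year_deals, 2), round(45 / five_year_deal_later, 2))
```

The last line shows the 5/45 extension as a share of each hypothetical free-agent total.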
This causes problems for the rest of the league, since now Crosby can be used in arbitration for the elite players. And even if not, it sets the trend for other players to get near-free agent deals, and those deals will be used in arbitration. (Unless the arbitration process isn’t what I think it is.)
What happens if you take the best pitcher in baseball and give him only one pitch to throw (and that pitch is not Mo’s cutter)? That is, how much of pitching is the quality of the pitches and how much is the selection (mixing up) of the pitches? Unfortunately, no one is crazy enough to do that, to actually tell the batter that he’s going to throw nothing but fastballs. According to a sample at the ever-impressive USSM, we have something close. Felix Hernandez, a veteran pitcher already with unlimited potential (he’s 21 years, 3 months old!), throws 59% of his pitches for fastballs. But, if you look at his first 10 pitches in each game ("establishing his fastball"), he throws 84% fastballs. That is 3.6 SD from his mean, and is therefore highly significant. He’s not telling the batter what he’s doing, but he’s coming awfully close.
His season totals are: .277/.326/.421
His 1st inning totals are: .358/.404/.528
His pitch 1-25 totals are: .345/.383/.517
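The 3.6 SD figure above depends on how many pitches are in the sample, which isn't stated here. As a hedged sketch, assuming a simple binomial model with independent pitches (my assumption, as are the function names), you can compute the z-score for a given sample size, or back out the sample size that a z of 3.6 would imply:

```python
import math


def binomial_z(observed_rate, baseline_rate, n_pitches):
    """Z-score of an observed fastball rate against a baseline rate."""
    standard_error = math.sqrt(baseline_rate * (1 - baseline_rate) / n_pitches)
    return (observed_rate - baseline_rate) / standard_error


def pitches_implied_by_z(observed_rate, baseline_rate, z):
    """Sample size at which the observed rate sits z standard errors from baseline."""
    sd_one_pitch = math.sqrt(baseline_rate * (1 - baseline_rate))
    return (z * sd_one_pitch / (observed_rate - baseline_rate)) ** 2


print(round(pitches_implied_by_z(0.84, 0.59, 3.6)))  # 50
print(round(binomial_z(0.84, 0.59, 50), 2))          # 3.59
```

So under this model, 84% against a 59% baseline is about 3.6 SD on a sample of roughly 50 pitches. A larger sample at the same rates would be more significant still.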
We don’t normally debate trades and signings on this blog, but…
This was written by a poster on the BTF thread about the signing:
He is an absolute steal at 4 yrs./56 million. He would have received at least 5/80 on the open market and that’s being conservative.
This is close to the prevailing sentiment on that thread. I doubt the 5/80, and I especially doubt the “and that’s being conservative.”
We have discussed Buehrle and the fact that he is probably worth around 2.5 WAR, maybe less in the AL. That is almost 6 mil per win. Even for a starting pitcher, I would have to think that is high. Given that the WS have a below average offense and can easily upgrade in that regard, this looks like a pretty bad deal to me. If you can’t find offensive wins for less than 6 mil per, you aren’t looking in the right places.
Not to mention the fact that long-term deals for pitchers, especially ones who are not spring chickens, are not particularly prudent for obvious reasons. Plus the quasi-no-trade clause, etc.
I would say that this deal was at best a tad poor and at worst, terrible. I would probably give him no more than 3/36.
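The per-win figures above can be sketched as follows. This assumes a flat 2.5 WAR in every year of the deal, which is a simplification of mine (no aging or injury discount), and the function name is just for illustration.

```python
def cost_per_win(total_mm, years, war_per_year):
    """Average dollars (in MM) paid per win above replacement over the deal."""
    return total_mm / years / war_per_year


deals = (
    ("signed 4/56", 56, 4),
    ("poster's 5/80", 80, 5),
    ("suggested 3/36", 36, 3),
)
for label, total_mm, years in deals:
    print(label, round(cost_per_win(total_mm, years, 2.5), 1))
```

At 2.5 WAR the signed deal works out to 5.6 mil per win, the "almost 6" above, while 3/36 would be 4.8.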
A few years ago, I noted that Gordie Howe had 1071 North American Major League Professional Hockey goals. That’s NHL, WHA, regular season and post-season.
I was always irked that the WHA was treated as some sort of minor league, and even more, that the post-season is always discarded when it comes to career totals. When you have a superstar like Scott Niedermayer, contemplating retirement at age 33, who has already played 183 post-season games (17% of his regular season totals), those games count. And, if you are going to change their weight, change them UP to 2x or 3x or 5x and not DOWN to 0x!
Anyway, shortly after that, it was brought to the attention of Wayne Gretzky, and an article was written in The Hockey News. Wayne, who had the official regular-season NHL record from a few years earlier, ended his career with 1072 goals. Talk about a squeaker!
Babe Ruth has 729 HR, including his 15 post-season HR. They count, right? I mean, would you rather count them as ZERO?
Hank Aaron goes from 755 to the real 761.
Bonds? Goodbye to 751. He has 9 post-season HR. Say hello to Bonds at 760 MLB homeruns. Bonds needs ONE HR to tie the MLB record, and two to break it.
Note: I really don’t care what the official position of MLB, Elias or anyone else is.
(If you want to take this opportunity to discuss your take on PED, take it outside. This isn’t the place.)