Tuesday, March 02, 2010
By The Numbers, new issue
Just posted today. Haven’t opened it yet…
I like that the BTN has quality production standards. It makes for better reading and referencing in the future.
I think Phil would be better off taking all those pieces nominated for the Sabermetric Awards and packaging them into one volume. That would kick a$$. So, you’d have a “Best in 2009 Sabermetrics”, all nicely packaged. That’d be nice.
***
I liked Tom’s article in this issue. I’ve thought about this issue a lot. I’m not sold that this is the right way to do it, but I like the idea behind it.
I do the same thing for K. I think I worked it out that 1 HR = 6K or something, so I compare how many K’s he has relative to his HR. So a 40HR, 200K guy is a plus.
Something like that…
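The strikeout comparison above can be sketched in a few lines. This is only an illustration: the 6-strikeouts-per-home-run rate is the commenter's rough recollection ("or something"), and the function name `k_surplus` is made up here.

```python
# Assumption: "1 HR = 6 K" is the commenter's approximate exchange rate,
# not a fitted value. k_surplus is a hypothetical helper name.
K_PER_HR = 6


def k_surplus(hr, k, k_per_hr=K_PER_HR):
    """Strikeouts 'paid for' by home runs, minus actual strikeouts.

    A positive result means the hitter strikes out less than his
    home run total would allow under the assumed exchange rate.
    """
    return hr * k_per_hr - k


# The 40 HR, 200 K hitter from the comment: 40 * 6 = 240 allowed, 200 actual.
print(k_surplus(40, 200))  # 40
```

A positive number is the "plus" in the comment; a 20 HR, 150 K hitter would come out negative under the same assumed rate.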
Patriot, no offense taken. I’m thinking the same thing. The thing is, we’re willing to take stuff that’s been published elsewhere, as Tango suggests in #2. So maybe I’ll start more aggressively asking for reprint permission.
I agree Tom has an interesting idea here. But I wish people would stop using single-season data to measure relationships between stats, as he does here for BB rate. If he looked at career stats, I feel pretty confident he would have found a relationship with BA or SLG, and not just ISO. Especially in this case, when he was going to work with career data in his next step anyway! I’m just so tired of seeing reports of “weak” relationships based on single-season data—wins/payroll, BABIP, whatever—without any recognition that these correlations are telling us more about the chosen sample size than the actual relationship.
I’m starting to think Tango/2 is a great idea. I’ll get on it over the next few weeks.
“without any recognition that these correlations are telling us more about the chosen sample size than the actual relationship.”
Well-said.
With that many player-seasons—3,700 of them—wouldn’t you expect that the regression equation would be at least as accurate as if Tom had used careers? The r-squared would be lower, of course, but who cares about that? It’s the equation he uses.
You’d be more accurate with the seasonal breakdown, because power varies from year to year, and the career number would obscure some of that relationship. So you might get more randomness, not less.
Of course, there are problems: walks vary with age independently of power, and Tom didn’t control for that. And there are other things you can think of. But, as far as the equation goes, are you sure it’s less accurate than you’d get by using careers? I don’t think it is.
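The seasons-versus-careers argument can be checked with a toy simulation. What follows is a minimal sketch, not Tom's method: every number in it (300 players, 8 seasons, 400 PA, a true slope of 0.25) is invented, and it assumes each player's ISO is fixed and measured without error, so only walk rate carries sampling noise. If observed ISO were noisy too, the seasonal slope would shrink toward zero, which this sketch does not model.

```python
import random


def fit(xs, ys):
    """Least-squares fit of y = a + b*x; returns (slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def simulate(n_players=300, n_seasons=8, pa=400, true_slope=0.25, seed=1):
    """Fit BB rate on ISO twice: once per season, once per career."""
    rng = random.Random(seed)
    season_iso, season_bb, career_iso, career_bb = [], [], [], []
    for _ in range(n_players):
        iso = rng.uniform(0.05, 0.30)          # assumed fixed true power
        p_walk = 0.05 + true_slope * iso       # assumed true walk rate
        rates = []
        for _ in range(n_seasons):
            # Binomial sampling noise: each PA is a walk with prob p_walk.
            walks = sum(rng.random() < p_walk for _ in range(pa))
            rates.append(walks / pa)
        season_iso.extend([iso] * n_seasons)
        season_bb.extend(rates)
        career_iso.append(iso)
        career_bb.append(sum(rates) / n_seasons)
    return fit(season_iso, season_bb), fit(career_iso, career_bb)


(season_slope, season_r2), (career_slope, career_r2) = simulate()
print(f"seasons: slope={season_slope:.3f}, r^2={season_r2:.2f}")
print(f"careers: slope={career_slope:.3f}, r^2={career_r2:.2f}")
```

Under these assumptions the two slopes agree (with equal season counts per player they are algebraically identical), while the r-squared is higher for careers because averaging removes sampling noise. That matches the point that r-squared reflects sample size per observation, while the equation itself need not change.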
Phil: I think you’re right that the regression coefficient for ISO would remain the same. But with career data he might have found that other factors also had a significant relationship with BB rate. I’d be surprised if using BA or SLG didn’t improve predictive power (and might then also change the ISO coefficient). In any case, it would be worth looking at.
True: he might have found that whatever relationship he found was statistically significant for careers, but not for seasons.
In his defense, he did say he didn’t find a positive correlation (which means he DID find a negative one). And he might have found that the equation he discovered showed such a small effect that it wasn’t worth correcting for. I bet that’s what happened—if there was a huge negative relationship between BA and BB/PA, I bet he would have mentioned it.
That is, if you find that every .001 of BA increases your walks by .00001, who cares? The relationship would be the same for careers, even if the r-squared came out higher.
In any case, I’ll tell Tom we’re here and let him argue for himself.
I agree that a single season does cause ‘fog’ problems; but as Phil stated, with this many seasons, the correlation should show through if it existed. Using career numbers would make me wonder if the increase/decrease in power with age made us “lose” data that otherwise would be there; after all, didn’t Bonds walk more when his power went up?
Maybe using sets of 2 to 4 years of data would work.
If you’re trying to distill walking ability apart from the pitchers’ fear of throwing strikes to a power hitter, what about walks per ball thrown? Using the fangraphs plate discipline data, you can figure balls thrown from (1-Z%) times pitches. After removing the IBB, HBP, SH stuff as desired, you can figure a good estimate of walks per ball. Of course, you can’t do that for Ted Williams or Max Bishop, since the data only goes back to 2002.
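The walks-per-ball estimate described above might look like the sketch below. The function and parameter names are hypothetical, the sample numbers are invented, and only IBB is removed; HBP and SH adjustments are left out. Note that (1 - Zone%) times pitches counts every pitch outside the zone, including ones the hitter swung at, so it is an estimate of balls thrown rather than balls called.

```python
def walks_per_ball(bb, ibb, pitches, zone_pct):
    """Unintentional walks per pitch thrown outside the strike zone.

    zone_pct is the share of pitches in the zone as a fraction (0.0 to 1.0).
    Assumption: intentional walks are removed from the walk total only;
    the four pitches of each IBB are not removed from the pitch count.
    """
    balls = (1.0 - zone_pct) * pitches
    if balls <= 0:
        raise ValueError("no pitches outside the zone")
    return (bb - ibb) / balls


# Invented example: 100 BB, 10 IBB, 2,500 pitches seen, 50% in the zone.
rate = walks_per_ball(bb=100, ibb=10, pitches=2500, zone_pct=0.5)
print(round(rate, 3))  # 0.072
```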
And maybe I missed it, but it doesn’t look like the author took out the IBBs.
Tom/Phil: you’re right that if BA was a strong predictor, it should show up in the seasonal data. But there’s another problem, which is selection bias: a low-BA player is more likely to stay in the majors if he can draw walks, while a high-BA player can survive with or without this ability. So even if pitchers do tend to miss the strike zone more when throwing to high-BA hitters, it might not show up in walk totals.
One possible solution is to use PITCHf/x data, with the percentage of pitches in the strike zone as the dependent variable. It would be interesting to see if anything other than ISO is predictive.
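The selection-bias argument can also be illustrated with a toy simulation. This is a sketch under invented assumptions: a small positive true effect of BA on walk rate, and a crude survival rule (BA plus walk rate must clear 0.330) standing in for "low-BA players only stick if they walk". None of these numbers come from the article.

```python
import random


def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)
players = []
for _ in range(5000):
    ba = rng.uniform(0.22, 0.32)
    # Assumed true effect: higher BA raises walk rate a little, plus noise.
    bb_rate = max(0.0, 0.06 + 0.2 * (ba - 0.27) + rng.gauss(0, 0.02))
    players.append((ba, bb_rate))

# Assumed survival rule: only players with BA + BB rate >= 0.330 stay up.
survivors = [(ba, bb) for ba, bb in players if ba + bb >= 0.33]

r_all = corr(*zip(*players))
r_kept = corr(*zip(*survivors))
print(f"all players: r={r_all:.2f}")
print(f"survivors:   r={r_kept:.2f}")
```

The BA-to-walks correlation among survivors comes out weaker than in the full pool, even though the true effect is positive for everyone. Real roster decisions are far messier than a single cutoff, so this only shows the direction of the bias, not its size.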
You might find this blog entry of mine interesting: “Which Players Had The Most Surprising Walk Rates? (Part 2)”
http://cybermetric.blogspot.com/2009/06/which-players-had-most-surprising-walk.html
I took ISO and era into account, and also stealing and height. I think the guys I found are similar to the guys Tom found to have the best eye.
I think blogs have finally killed BTN. No offense, Phil.