
THE BOOK--Playing The Percentages In Baseball


Thursday, December 22, 2011

“Acquiring Power Isn’t More Valuable When a Team Has None”

By Tangotiger, 02:28 AM

Good stuff.


One little technical note: The correlation should have wOBA*PA.  But, won’t really matter much at this level.

#1    Perceptron      (see all posts) 2011/12/22 (Thu) @ 11:34

I don’t understand how this is useful. How do you increase wOBA? Well, you could hit lots of singles and get lots of walks, but hitting for more power is more efficient. This is evident just from the weights of wOBA.

The question they are asking is does ISO account for any of the variance between wOBA and runs scored? And the answer is pretty clearly no. But this is completely expected. ISO will be very correlated with wOBA, so when you take away wOBA (expected runs), it is pretty unlikely that ISO will explain any of the remaining variance.

How do we account for the remaining variance (which is obviously very small from the first plot)? Find things that are *not* accounted for by wOBA. Repeat this exact same analysis with OBP, BB%, BA, # of triples, etc. and you’re going to get the same thing. If wOBA already accounts for it, clearly then that stat will not explain any of the remaining variance.
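
The point about taking wOBA out first can be illustrated on synthetic data. The sketch below is only an illustration under stated assumptions: the team counts, means, spreads and slopes are all invented, and runs are generated from wOBA alone, so there is no power effect to find. It regresses runs on wOBA and then correlates the residual with an ISO-like stat that tracks wOBA closely.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ols_residuals(xs, ys):
    """Residuals of a one-variable least-squares fit of ys on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(2011)
n_teams = 300  # hypothetical number of team-seasons

# All numbers below are assumptions chosen for illustration only.
woba = [rng.gauss(0.320, 0.012) for _ in range(n_teams)]
# ISO tracks wOBA closely, plus a little independent variation.
iso = [0.150 + 1.5 * (w - 0.320) + rng.gauss(0, 0.012) for w in woba]
# Runs depend on wOBA only: no extra power effect in this synthetic world.
runs = [720 + 5000 * (w - 0.320) + rng.gauss(0, 20) for w in woba]

resid = ols_residuals(woba, runs)
print("corr(wOBA, ISO)      =", round(corr(woba, iso), 3))
print("corr(wOBA, residual) =", round(corr(woba, resid), 3))
print("corr(ISO, residual)  =", round(corr(iso, resid), 3))
```

The residual is uncorrelated with wOBA by construction of least squares, so only the part of ISO that is independent of wOBA can correlate with it. Here that part is small and, by design, carries no signal.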


#2    Tangotiger      (see all posts) 2011/12/22 (Thu) @ 11:50

What the blogger was trying to figure out is if there was a bias in the formula.

Specifically, can we simply treat the relationship of team wOBA to team runs scored in a more-or-less linear fashion?  Or, do we need to know how much of the wOBA was HR-centric?

As an example, do you give a weight of, say, 2.2 or 2.5 in the wOBA equation for HR, if you have a team that has no power?  That is, can a team devoid of power get more benefit from a power hitter?  And if you have a team of power hitters, should the HR value maybe be, say, 1.7 or 1.8 in the wOBA equation?

That’s what he was after in his article…


#3    Perceptron      (see all posts) 2011/12/22 (Thu) @ 12:32

That clears it up, thanks.

I’m not sure that I have convinced myself his method is doing that, but I certainly understand the process better now.


#4    Tangotiger      (see all posts) 2011/12/22 (Thu) @ 12:39

Right.  I don’t know that he’s actually succeeding.  The reality is that team ISO, or team anything, is really pretty tight in whatever you look at.

We show things to 3-decimal places, like a team OBP of .320 and .340, but the reality is that it means OBP rate of 32% and 34%.  We can’t see any bias if you restrict yourself to such tight ranges.  Pitchers have an OBP of 27% to 36%.

He gave us an initial first step, an initial idea.  Further research is warranted if you follow his model.

That said, we don’t need to use his model.  A simulator will give us what we really want.  You construct two teams, one laden with power and another not, while making sure that one of the hitters is identical in both teams.  And you can construct them so they both score the same number of runs.  Then you swap out that one guy from both teams for a power hitter, and see the results.
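
A minimal Monte Carlo sketch of the swap experiment described above, not anyone's actual simulator. Everything in it is an assumption made for illustration: the four hitter profiles, the crude advancement rules (walks move forced runners only, runners advance as many bases as the batter on a hit), and the absence of steals, double plays and errors. The two lineups are also not tuned to score the same number of runs, which the experiment as described would require.

```python
import random

EVENTS = ("BB", "1B", "2B", "HR", "OUT")

def hitter(bb, single, double, hr):
    """Per-PA probabilities in EVENTS order; whatever is left over is an out."""
    return (bb, single, double, hr, 1.0 - bb - single - double - hr)

def advance(bases, event):
    """Apply a non-out event to bases=(first, second, third); return (bases, runs).

    Assumed advancement: walks move forced runners only; on a hit every
    runner advances as many bases as the batter does.
    """
    first, second, third = bases
    if event == "BB":
        if first and second and third:
            return (1, 1, 1), 1
        if first and second:
            return (1, 1, 1), 0
        if first:
            return (1, 1, third), 0
        return (1, second, third), 0
    if event == "1B":
        return (1, first, second), third
    if event == "2B":
        return (0, 1, first), second + third
    return (0, 0, 0), 1 + first + second + third  # HR

def runs_per_game(lineup, games=4000, seed=0):
    """Average runs over `games` nine-inning games for a 9-man lineup."""
    rng = random.Random(seed)
    total = 0
    for _ in range(games):
        slot = 0  # batting-order position carries over between innings
        for _ in range(9):
            outs, bases = 0, (0, 0, 0)
            while outs < 3:
                event = rng.choices(EVENTS, weights=lineup[slot])[0]
                slot = (slot + 1) % 9
                if event == "OUT":
                    outs += 1
                else:
                    bases, runs = advance(bases, event)
                    total += runs
    return total / games

# Hypothetical hitter types (BB, 1B, 2B, HR rates per PA).
on_base_guy = hitter(0.110, 0.200, 0.040, 0.005)
power_guy = hitter(0.070, 0.130, 0.050, 0.050)
common_guy = hitter(0.080, 0.160, 0.045, 0.020)  # identical on both teams
slugger = hitter(0.100, 0.140, 0.060, 0.080)     # the acquisition

def lineup(core, cleanup):
    """Eight copies of `core` with `cleanup` batting fourth."""
    return [core] * 3 + [cleanup] + [core] * 5

results = {}
for team, core in (("on-base team", on_base_guy), ("power team", power_guy)):
    before = runs_per_game(lineup(core, common_guy), seed=1)
    after = runs_per_game(lineup(core, slugger), seed=1)
    results[team] = (before, after, after - before)
    print(f"{team}: {before:.2f} -> {after:.2f} runs/game (gain {after - before:+.2f})")
```

Comparing the two gains is the quantity of interest. With only 4000 games per run the difference between them is noisy, so a real study would use far more games and matched run totals.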

Bill James did something like that for another study, with Willie Mays, Rickey Henderson, and Steve Sax placed in the leadoff and cleanup slots, to show the impact that these hitters had.  It was quite the interesting study.


#5    Perceptron      (see all posts) 2011/12/22 (Thu) @ 13:39

If you have X men on base, a home run will be worth X+1 runs for either team. So in that sense, a home run is worth the same for both teams. Now the question is does the random variable X differ between the two teams? Or a simpler question, does its expectation differ?

However, now the question is something more akin to does adding a slugger to a high OBP, low slugging team have a greater impact than adding a slugger to a low OBP, high slugging team?

I agree, this is a great time to use a simulation. However, I’m not sure I follow your argument about tightness, do you mind elaborating?
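
The X + 1 argument in this comment can be written out explicitly. The superscripts A and B are labels introduced here for the two teams, and the quantity is the number of runs that score on the home run itself, not a linear-weights run value (which also counts the change in run expectancy).

```latex
\[
R_{\mathrm{HR}} = X + 1, \qquad
\mathrm{E}[R_{\mathrm{HR}}] = 1 + \mathrm{E}[X] = 1 + \sum_{k=0}^{3} k \, P(X = k)
\]
\[
\mathrm{E}[R_{\mathrm{HR}}^{A}] - \mathrm{E}[R_{\mathrm{HR}}^{B}]
  = \mathrm{E}[X^{A}] - \mathrm{E}[X^{B}]
\]
```

So the immediate payoff of a home run differs between two teams only through the expected number of runners on base when it is hit.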


#6    Tangotiger      (see all posts) 2011/12/22 (Thu) @ 13:46

The tightness of the data (at the team-level) doesn’t offer enough signal to overcome the noise.


#7    Perceptron      (see all posts) 2011/12/22 (Thu) @ 14:20

Alright, I think I understand your point. Basically, individual player contribution is tough to discern from team totals, correct?

Team statistics are so tight due to the sample size. There’s less variation and so it’s closer to the ‘true’ value for that team. In that sense, it’s mostly signal with limited noise, hence my confusion. But we are not really worried about team level contributions.


#8    Tangotiger      (see all posts) 2011/12/22 (Thu) @ 14:25

I think we’re on the same page.


#9    mettle      (see all posts) 2011/12/22 (Thu) @ 14:32

I appreciate the back and forth here.

I think I’m with Perceptron’s early point (that I think Tango agreed with): I’m not sure this is the right method for evaluating the question. That is, because of the massive collinearity of wOBA and ISO, you’re never going to find significance using this method, regardless of the truth of the matter.

There are a few tricks that might work.
1) Do a factor analysis on wOBA and ISO to isolate two components, i.e., “wOBA minus power” and “power”. I think that’s what was attempted via xRuns, but I don’t think that is the right way to do it mathematically. Once you have the two factors, you can try the same original analysis.

2) Barring that, try a (step-wise) multiple regression with wOBA and ISO. You may find that ISO has the highest weighting and wOBA might even be kicked out.

2a) Compare the wOBA and ISO correlations.

***

Really I think it’s a major no-no to use wOBA here if you’re interested in this question because wOBA is a composite measure. That is, if you’re interested in some health risk as related to body characteristics, it’s very wrong to put in weight, height *and* BMI.

I don’t recall the details of how the weights in wOBA are calculated but if they were calc’ed via a regression to runs scored, then that’s your answer - the weights are the weights and nothing is under- or over- weighted to the best the analysis can determine.


#10    Tangotiger      (see all posts) 2011/12/22 (Thu) @ 14:58

The wOBA weights are valid in terms of the average environment.  They are not valid in extreme environment.  Indeed, I published the wOBA values year-by-year, so we ALREADY kinda know the answer to the question.

http://www.tangotiger.net/bdb/lwts_woba_for_bdb.txt

You can compare 1998 to 1968 if you want.


#11    mettle      (see all posts) 2011/12/22 (Thu) @ 15:17

Thanks for that.

I guess this gets into some of the debates regarding linear weight calculation, which I’m quickly brushing up on here:
http://www.tangotiger.net/wiki/index.php?title=Linear_Weights
and here:
http://sabermetricresearch.blogspot.com/2009/10/dont-use-regression-to-calculate-linear.html

If linear weights are calculated using RE, then I think the question the OP is asking ultimately comes down to what you write about multiple linear regression in the first link.

For example, Sac Flies seem to be (one of) the biggest discrepancies between RE LWs and Multiple Regression LWs. Therefore, using the method in this original article with SacFlies instead of ISO would be the most likely way to find a *significant* result - that SF *does* account for some of the variance between R.E. wOBA and xRuns (which I think is a backdoor way of doing a multiple regression and is therefore multiple regression wOBA). But this is (obviously?) wrong: adding someone that hits lots of SF will not improve your team.

Does that sound right?

On a side note, I’m now finally appreciating some of Tango’s (and Phil’s) distaste for regression analyses.


#12    Tangotiger      (see all posts) 2011/12/22 (Thu) @ 16:01

SF is an outcome, not an event.  It’d be like separating RBI-singles from nonRBI-singles.  Which one would you rather have?  You continue on this path and you get runs = runs.  SF is cheating.

***

http://www.tangotiger.net/markov.html

If you have a sim or Markov process that has 9 individual hitters, then the answer will be there.


#13    MGL      (see all posts) 2011/12/22 (Thu) @ 23:30

I think that it is obvious that there is NOT a more or less linear relationship and if you look at certain examples or the extremes it becomes clear.  For example, if I have a team with a high OBP, adding a player with power is better than adding another high OBP player.  Another example is if I have a team of players with very high HR totals such that no one is ever on base, adding another player with power is not very useful or efficient.

I have found with my sim that even for real teams, that there can be diminishing returns when adding players to good teams.

So I think there is something to the idea of adding what you don’t have (IOW contradicting his study), but I doubt that it has much practical value - you probably want to use it as a tie breaker. You certainly don’t want to pigeon-hole yourself into only or even primarily pursuing one type of player…


#14    Brian Cartwright      (see all posts) 2011/12/22 (Thu) @ 23:46

I’m with mgl. The value of a HR is dependent on the expected number of base runners when the HR occurs. If a team can get runners, the most efficient way of driving them in is with a HR. But each additional HR takes runners off the bases for the next guy.

I would think that between two teams with the same OBP, the one with a single 30 HR batter would have a higher number of runners on base for that batter than the average of four 30 HR batters on a single team.

Group teams with the same on base percentage, then vary the number of HRs for the season, finding the number of runs scored on each HR.


#15    Tangotiger      (see all posts) 2011/12/23 (Fri) @ 09:00

There’s no question that the run value of each event changes based on the makeup of the team.

The run value of the HR stays pretty constant at 1.4 regardless of team makeup.  But, it’s all the other events that change. 

This is most clear with Pedro, where the run value of the walk is something close to .20 runs IIRC (but the HR stays close to 1.4).

Realistically, there’s nothing much a team can do.
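
One way to look at how event run values move with the run environment is a small Markov chain over the 24 base-out states. The sketch below is not the Markov model linked earlier in the thread: it assumes nine identical hitters, a single inning, crude advancement rules (walks move forced runners only, runners advance as many bases as the batter on a hit), and two sets of event rates that are invented for illustration. Run values are measured relative to the average plate appearance, so they sum to zero when weighted by event frequency.

```python
EVENTS = ("BB", "1B", "2B", "HR", "OUT")
STATES = [(o, (a, b, c)) for o in range(3)
          for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def advance(bases, event):
    """Apply a non-out event to bases=(first, second, third); return (bases, runs)."""
    first, second, third = bases
    if event == "BB":  # forced runners only (assumed)
        if first and second and third:
            return (1, 1, 1), 1
        if first and second:
            return (1, 1, 1), 0
        if first:
            return (1, 1, third), 0
        return (1, second, third), 0
    if event == "1B":
        return (1, first, second), third
    if event == "2B":
        return (0, 1, first), second + third
    return (0, 0, 0), 1 + first + second + third  # HR

def step(state, event):
    """Return (next_state, runs); next_state is None once the third out is made."""
    outs, bases = state
    if event == "OUT":
        return ((outs + 1, bases) if outs < 2 else None), 0
    new_bases, runs = advance(bases, event)
    return (outs, new_bases), runs

def run_expectancy(probs, sweeps=500):
    """Expected runs to the end of the inning from each state (fixed-point iteration)."""
    re = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            total = 0.0
            for e in EVENTS:
                nxt, runs = step(s, e)
                total += probs[e] * (runs + (re[nxt] if nxt else 0.0))
            re[s] = total
    return re

def state_visits(probs, steps=300):
    """Expected number of plate appearances that begin in each state, per inning."""
    visits = {s: 0.0 for s in STATES}
    current = {(0, (0, 0, 0)): 1.0}
    for _ in range(steps):
        following = {}
        for s, mass in current.items():
            visits[s] += mass
            for e in EVENTS:
                nxt, _runs = step(s, e)
                if nxt is not None:
                    following[nxt] = following.get(nxt, 0.0) + mass * probs[e]
        current = following
    return visits

def run_values(probs):
    """Average change in run expectancy (plus runs scored) caused by each event."""
    re = run_expectancy(probs)
    visits = state_visits(probs)
    total_pa = sum(visits.values())
    values = {}
    for e in EVENTS:
        acc = 0.0
        for s in STATES:
            nxt, runs = step(s, e)
            acc += visits[s] * (runs + (re[nxt] if nxt else 0.0) - re[s])
        values[e] = acc / total_pa
    return values

def environment(bb, single, double, hr):
    """Event probabilities per PA; the remainder is an out."""
    return {"BB": bb, "1B": single, "2B": double, "HR": hr,
            "OUT": 1.0 - bb - single - double - hr}

# Hypothetical rates: a roughly average offense and a much lower-scoring one.
average = environment(0.085, 0.155, 0.050, 0.028)
low_scoring = environment(0.060, 0.130, 0.035, 0.015)

for name, probs in (("average", average), ("low-scoring", low_scoring)):
    print(name, {e: round(v, 3) for e, v in run_values(probs).items()})
```

Comparing the two printed rows shows which events' values move most between environments in this simplified model. The absolute numbers depend heavily on the advancement assumptions, so they should not be read as a check on the 1.4 figure.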


#16          (see all posts) 2011/12/26 (Mon) @ 21:08

Tango, how are you getting those weights? I see you quote that 1.4 number all the time, but my calculations consistently show it closer to 1.5 (typically like 1.48). And it does move around a bit, even within realistic conditions, though it is the most stable event.
I’m also not sure what set of rates you’re using when you’re talking Pedro. Probably a projection. Looking at his career, I get this:
HR 1.47
2B 0.81
1B 0.43
BB 0.30
Out -0.21
And if we take the 1999 vintage Pedro, I’m getting:
HR 1.44
2B 0.77
1B 0.39
BB 0.26
Out -0.16


#17    Tangotiger      (see all posts) 2011/12/27 (Tue) @ 12:13

Your calculations are wrong if you are getting something closer to 1.5.

If you are using my basic Markov on my site, note its assumptions.

Otherwise, I’ve explained the 1.4 several times on my blog (search the archives), and it’s in The Book (which if you don’t have it, you can read the relevant chapter, Chapter 1, for free from Amazon’s Look Inside).


#18          (see all posts) 2011/12/27 (Tue) @ 22:10

1) Will re-check my copy of The Book when I get home.
2) Spent a good part of the afternoon reading things from blog archive. Vast majority takes 1.4 as a given rather than explaining, but I found 3-4 articles explaining. Not entirely convinced, but that’s fine.
3) Ran the numbers on the Markov. Can’t believe I hadn’t thought of that. Most enlightening.
4) Re-ran my numbers. I’m 1-10% too extreme, usually in the 2-5% range, depending on the event and environment. Not too bad or concerning, especially because making linear weights wasn’t my goal, just something it was really easy to do on the way. Also, my singles and especially doubles are more off because of a known issue where I’m treating advancement a bit weird. Which can probably also explain the 2% I’m seeing.

