THE BOOK--Playing The Percentages In Baseball

Tuesday, September 02, 2008

Fixing BaseRuns and RARP

By Tangotiger, 03:35 PM

I’ve had a very interesting email discussion with Clay, which I will reproduce here.  The quoted parts are him.  I also have to say it was very pleasant, as are all my discussions with him.  He certainly seems willing to make corrections where needed, and offer alternate solutions.


On the discussion here, noted in post #10: http://www.insidethebook.com/ee/index.php/site/comments/rarp_v_vorp/

Basically, I think you have an issue with RARP with pitchers, and I think it relates to the issue of EqR at the low-end. I talked about this a year or two ago: EqR breaks down at the low-end, and what you see with your pitchers-as-hitters RARP may be the result.

No, it relates to a problem in calculating replacement level at the low end.

I have defined replacement level as a .230 eqa for the league, with .260 the average value of the league. For a 100% full-time player, that’s a 22-run difference.

Other positions were defined as (230/260)*(position average). In other words, replacement eqa was always a fixed .885 ratio of the average, which equates to a .736 run ratio and a .350 offensive winning percentage. For positions with eqas from .245 to .285, the difference between average and replacement varies between 18.4 and 27.8 runs.
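The ratios in that paragraph can be checked with a few lines of arithmetic. This is only a sketch: it assumes (the email does not say so) that runs scale with EqA raised to the 2.5 power and that offensive winning percentage uses a Pythagorean exponent of 2. The constant and function names are illustrative.

```python
# Values quoted in the email.
LEAGUE_EQA = 0.260
REPLACEMENT_EQA = 0.230

# Assumptions, not stated in the email: runs ~ EqA ** 2.5, Pythagorean exponent 2.
RUNS_EXPONENT = 2.5
PYTHAG_EXPONENT = 2.0

eqa_ratio = REPLACEMENT_EQA / LEAGUE_EQA
run_ratio = eqa_ratio ** RUNS_EXPONENT
owp = run_ratio ** PYTHAG_EXPONENT / (1.0 + run_ratio ** PYTHAG_EXPONENT)


def replacement_eqa(position_average_eqa):
    """Replacement EqA as a fixed fraction of the position's average EqA."""
    return position_average_eqa * eqa_ratio


print(round(eqa_ratio, 3), round(run_ratio, 3), round(owp, 3))
print(round(replacement_eqa(0.100), 4))  # the pitcher case discussed next
```

Under those assumptions the three printed ratios should land close to the .885, .736 and .350 quoted above.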

For a pitcher, with an eqa of .100, the replacement eqa would be .0885. The difference between average and replacement is only 2 runs, so a replacement pitcher is essentially the same as an average pitcher, which is why they clustered around 0. I’ll probably re-write it to force all positions into a 22-run difference, which will tend to improve rarps of catchers and shortstops, and hurt ratings of first basemen and corner outfielders. I’ll have to think about that. That would make Zambrano a 14.5 if he were a pure pitcher, and Lohse a -1.1, for a 15.6 run spread between them. Zambrano’s RARP is being held down by appearing as a pinch-hitter in six games, so his position value is higher than a pure pitcher. The program is giving too much weight to those games.

I disagree about the breakdown in EQR at low values. Keep in mind that eqa/eqr are designed to reflect the change in runs from adding this player to an average team. Properly speaking, it should not be used as a team/game estimator, although as long as you’re close to the league average it works well. If you were to calculate BaseR for the entire league, subtract Lohse’s stats from the league, and recalculate based on the league minus Lohse totals, you should get a close match for the EqR, and that will be a negative run value.

Speaking of baseR, is there a reason why CS and DP are in the “B” component, but not in the A or C components where it seems they belong?

Also, it appears to me that it could be formulated as

HR + (average number of baserunners when a HR is hit) + (remaining baserunners)*(score fraction)

The average number of baserunners is - at least, should be - calculable from the obp, and should be presumed to be equal to just the average number of baserunners at any given time, with a correction for the chance that the batter is leading off an inning. The last bit should depend, essentially, on the ratio of remaining bases to outs. Of course the “obp” would need to be adjusted for HR, CS, and other non-batter outs.

Ooooh… I like that HR idea. You are right that we should be able to come up with a function, based on OBP and HR rates, actually. A team that never hits a HR will have more runners on base than a team that often hits a HR, even if they both have a .333 OBP. I like your insight, and will get my readers to offer their ideas on it.

We have the discussion about whether to put DP and CS in the “A” or “B” term all the time. Either way is “right”. It depends on the perspective. I gravitate toward “initial baserunners”, others toward “known baserunners”. You can extend that further by even removing SF from the A and adding it in the D term (with HR). I’m not a fan of treating SF as anything other than an out, since we don’t call it a “run-driving-in single”.

It’s possible that the “rate” way of doing replacement level is better, but I prefer the fixed set. You use 22 runs, I use 19.8 runs for hitting and 2.7 runs for fielding (total of 22.5), so essentially we agree. (The replacement PLAYER is a very below average hitter and a slightly below average fielder.)

I’ll toss in another piece - given an OBP which accounts for outs on base and reached on errors, the probability of N batters coming to the plate should be

P(N) = [ (N-1)! / ( 2! * (N-3)! ) ] * ( 1 - OBP)^2 * OBP^(N-3)

Calculate over all N for a given OBP, and you get a vaguely logarithmic plot. This relates directly to the likelihood that a given HR will occur from a leadoff hitter and to the average number of baserunners.

Agreed on SF; I tend to force it to ride with SH, otherwise you get unreasonably large values associated with it.

The trouble with not putting CS in the A/C components is that it is possible to create a line where BaseR produces more runs than (baserunners - caught stealing) would permit, at admittedly very extreme conditions.

I’ll probably make the change on the RARP on Thursday.

#1    Tangotiger      2008/09/02 (Tue) @ 16:06

One later thought I had regarding Clay’s point on the HR:

As we know the run value of the HR will approach 1.00 as OBP approaches 1.000.

However, the total number of runs scored directly off the HR will be say around 3.0 runs per HR.  So, it is possible to create a run estimator model that does not properly value each component.  That is, consider:

Runs = HR+runnersDrivenInByHR + (initial runners minus runnersDrivenInByHR)*scoreRate

This is true.  However, by removing the runnersDrivenInByHR from the A term and moving it to the D term, we may increase the certainty of the overall run estimate while decreasing the explanation we can assign to each component.


#2    Tangotiger      2008/09/02 (Tue) @ 16:13

The difference by the way between treating the HR differently from the other two is that the HR clears the bases and scores himself.  Therefore, we know exactly how many runs are scored, if we know exactly how many runners are on base.

You could also add runnersdriveninbytriple in the D component, but not the triple itself.

You can continue on this path of course, and instead of having “known runners driven in” by component, you have “estimated runners driven in”.  The problem is that when you do that for singles and doubles, it starts to get pretty complex.  Furthermore, like I said in post 1, it won’t work at the component level, because we are too focused on the direct run contribution, and not the indirect.


#3    Patriot      2008/09/02 (Tue) @ 16:45

On the RARP/pitcher issue, I’m not sure I follow what making the gap between average and replacement the same as for the other positions solves.  Replacement pitcher hitting is a bit like replacement player fielding--it tends to be pretty close to average.  In the case of pitcher hitting, it makes intuitive sense IMO that this would be the case.  Pitchers are not chosen for their hitting ability; even if they are to some extent, they’re not going to lose their jobs for failing to hit.  It would take a few seasons of being a full-time starter to even accumulate a number of PA comparable to those of position players to make an informed judgment on how much the bat is costing you.

A model that assumes replacement players are average fielders is workable but has some distortion.  I have to believe that a model that assumes replacement pitchers are average hitters is workable with even less distortion.

On the BsR thing, I do not expect anyone to find a simple improvement to B/(B+C).  I think that anything better that comes along will be a lot more complex, and I do wonder if there will be a point when the added complexity is not worth it.  If you start getting into factorials, you might as well just go straight for a Markov model, as they will both be equally unworkable without a big spreadsheet/program.

BTW, thanks for being my unsolicited defense attorney.


#4    david smyth      2008/09/02 (Tue) @ 19:31

I’m not sure that Clay’s HR idea will amount to any improvement at all.

As far as where to put CS in BsR, most versions I’ve seen put it in the B Factor (and also possibly C). The problem with this is that negative runs become a possibility. Putting it in A solves that, but introduces another problem--that of creating a “known runners who don’t score” bin. If you put CS into the A, then why not also GDP? And if so, why not all of the other “Outs on Base”? And then, why not all of the “Left on Base”? At this point, though, all you have left in the A component are the runners who score, and the score rate has to be 100%.

So, including CS in the A factor is sort of cheating, looked at in the larger sense. But putting it in the B factor makes negative runs possible. This doesn’t matter with team/season data, but at the inning level, as Brian is trying to research, it will rear its ugly head often.
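To make the inning-level problem concrete, here is a toy sketch of A*B/(B+C) + D for a single inning. The two B weights are hypothetical, picked only to show the sign flip; they are not taken from any published BsR version.

```python
def base_runs(a, b, c, d=0.0):
    """Generic BaseRuns shape: baserunners * score rate + home runs."""
    return a * b / (b + c) + d


# Hypothetical B weights, for illustration only.
SINGLE_B = 0.8
CS_B = -1.0

# One inning: a single, the runner is caught stealing, then two batting outs.
singles, caught_stealing, batting_outs = 1, 1, 2

# Variant 1: the CS penalty lives in B.
cs_in_b = base_runs(
    a=singles,
    b=SINGLE_B * singles + CS_B * caught_stealing,
    c=batting_outs,
)

# Variant 2: the CS removes a runner from A and adds an out to C.
cs_in_a_and_c = base_runs(
    a=singles - caught_stealing,
    b=SINGLE_B * singles,
    c=batting_outs + caught_stealing,
)

print(cs_in_b, cs_in_a_and_c)
```

With these made-up weights the first variant goes below zero while the second bottoms out at zero, which is the trade-off described above.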


#5    tangotiger      2008/09/02 (Tue) @ 21:14

Right, so why not include it in the C?


#6    david smyth      2008/09/02 (Tue) @ 21:29

Do you mean the C only? If you do that, you are tying CS to batting outs. If you weigh a CS at 3 in C, you are saying that a CS is worth 3 batting outs, regardless of context.

If you include CS in the C to some degree, you still have to include the lion’s share in the B (or A).


#7    Colin Wyers      2008/09/02 (Tue) @ 21:36

The negative effects of the CS are that it makes an out and removes a baserunner. If A is baserunners, B is advancement, and C is outs, it sounds like CS belongs in A and C.

Now if A, B and C aren’t those things, and are just abstract concepts, then you can put the CS anywhere so long as you get the right answer, but in my opinion you just end up with a dynamic version of XR that way - accurate so long as you’re only using it for the strict purpose it was intended for.

The reason we don’t include all baserunning outs in the A term is because we don’t know all of them. That’s a limitation of our data. (Some of them get counted as singles, like when a runner singles and is thrown out taking second. I think there were 122 singles in the inning dataset that resulted in a double play.) But if we have the CS data, or the DP data, I don’t see why we shouldn’t use it. It just doesn’t belong in the B factor - again, in my opinion - if B represents advancement, because baserunners can’t advance negatively. Once a baserunner has reached second, he can’t go back to first - he either is out on the bases, scores or is stranded. That’s the underlying reality we want to model.


#8    terpsfan101      2008/09/03 (Wed) @ 16:49

Don’t all of the events that you are including in the BsR equation have to be included in B? I remember Patriot saying that all events of your equation need to be included in B if you want to force the correct LW values. So I would lean towards including the CS in both B and C.


#9    Colin Wyers      2008/09/03 (Wed) @ 17:58

Most linear weights values are constructed the way Jurassic Park constructs their dinosaurs - they use frog DNA to cover for the gaps in their information. Okay, okay - except for the frog DNA part.

We’ll use the Ruane linear weights as an example:

http://www.retrosheet.org/Research/RuaneT/valueadd_art.htm

They are great, fantastic, wonderful, probably the best linear weights values ever published. They’re also… wrong, technically speaking.

Here’s a look at my (similarly derived) weights:

http://www.editgrid.com/user/cwyers/linear_weights_version_0.1

Ruane put a lot more work into his than I did mine, and I’m working on improving them. And I didn’t include a lot of events that you’d really like to have, like double plays, or events that some people like to have, like sacrifices.

But look at the weight of a single - .467 in the 1960 AL according to Ruane, .457 by my reckoning. Why the difference? We used the same Retrosheet PBP data.

The difference is in events that aren’t traditionally recorded at the team seasonal level, like reaching on an error or balks or wild pitches.

In order to publish linear weights that reconcile with team scoring totals, you have to alter your values from the “true” values in order to compensate for your missing data - a single has to be worth more runs than it really is to capture the run scoring value of balks and passed balls.

So reconciling the intrinsic linear weights of BsR to X set of linear weights means that you force BsR to conform to the assumptions made in constructing those linear weights, some of which were made based upon necessity to make the numbers add up rather than any real baseball logic.

That doesn’t mean you can’t create a BsR equation that models a specific set of linear weights. You can create a lot of different BsR equations for a lot of different purposes. But it’s not necessarily the correct approach, either.


#10    Patriot      2008/09/03 (Wed) @ 18:05

Terps/8: The comment you attribute to me (not incorrectly so) assumes that you want to match the LW values *precisely* and simply.

It is possible that you could cook up a version without CS in B that would have a coefficient of say -.333 when the empirical value is -.328.  It is even theoretically possible that you could devise one that would give -.328 precisely, but you’d have to do some trial and error to get there.  If you throw everything into B, you can calculate the needed coefficients precisely given A, C, and D.

My fixation on matching B values stems from the fact that usually A and C are well-defined, and there is no question about what goes in there (to the extent that there is a question, you answer it by using “initial baserunners” or “all outs”, etc.).  However, with Colin’s recent work raising the negative B coefficient issue to the forefront, and Tango’s suggestion on moving some of those penalties to C, this may no longer be the case.

David has made the point that he doesn’t think the empirical values need to be matched.  I would agree to a point; I think there is definitely room for a BsR version that may be off by say .005 runs on the value of a single but also does not allow negative runs under any circumstance.
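The "calculate the needed coefficients precisely" step can be sketched as follows. Differentiating A*B/(B+C) + D with respect to one more event gives that event's intrinsic linear weight, and the result can be rearranged for its B coefficient. The function names and the team totals below are hypothetical; only the .464 target for a single is taken from figures quoted elsewhere in this thread.

```python
def intrinsic_lw(a, b, c, d, A, B, C):
    """Derivative of A*B/(B+C) + D for an event with coefficients a, b, c, d."""
    return ((B + C) * (a * B + b * A) - A * B * (b + c)) / (B + C) ** 2 + d


def solve_b(target_lw, a, c, d, A, B, C):
    """B coefficient that makes the intrinsic weight equal target_lw."""
    return ((target_lw - d) * (B + C) ** 2 - a * B * (B + C) + A * B * c) / (A * C)


# Hypothetical totals for the A, B and C factors (not from the thread).
A, B, C = 1860.0, 1887.0, 4131.0

# A single: one baserunner (a=1), no outs (c=0), no automatic run (d=0).
b_single = solve_b(0.464, a=1.0, c=0.0, d=0.0, A=A, B=B, C=C)
print(round(b_single, 3))
print(round(intrinsic_lw(1.0, b_single, 0.0, 0.0, A, B, C), 3))
```

Note that this treats the B total as already known. Once penalties move from B to C there are two unknowns per event, and a closed form like this no longer pins both down.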


#11    terpsfan101      2008/09/03 (Wed) @ 19:24

Colin,

I can think of 2 reasons why your LW differ from Tom Ruane’s. First, Tom calculated Run Expectancy charts from all innings (home half of ninth and later + partial innings). Secondly, you can’t figure out how Tom Ruane treats Reached On Errors. He implies that he included them somewhere in his calculations in the beginning of the article:

“It will produce results different from a good linear weight formula for two basic reasons: 1) it will include some stats (like pickoffs and reached on error) that aren’t typically available and so aren’t part of most linear weight formulas, and 2) it includes the context (men on and number of outs) associated with each event, factors which help determine their importance.”


#12    terpsfan101      2008/09/03 (Wed) @ 19:32

Patriot,

When you move a negative B value to C, how does this affect the rest of the equation?


#13    Colin Wyers      2008/09/03 (Wed) @ 19:42

That quote doesn’t refer to his linear weight values; it refers to the value added approach. They’re both based upon a RE framework.

I did go ahead and check my RE values against Ruane’s and they are different, for reasons I don’t know.


#14    Peter Jensen      2008/09/03 (Wed) @ 19:45

terpsfan101 - Ruane’s quote which you cited in post #11 was referring to why he likes the value added approach to valuing a hitter’s contributions, not to his methodology for calculating linear weights. I don’t know where Colin got a value for a single of .467 for Ruane.  In looking at Ruane’s chart for 1960 AL it seems to be .464.  Why that number differs from Colin’s .457 is unclear because Colin doesn’t fully state his methodology.

“The difference is in events that aren’t traditionally recorded at the team seasonal level, like reaching on an error or balks or wild pitches.”

This sentence in Colin’s post #9 suggests that Colin may be doing something inappropriate to linear weights.


#15    terpsfan101      2008/09/03 (Wed) @ 19:59

Regarding Tom Ruane’s LW,

Did he ignore plays that were ROE? Or is the value of these plays implied in the categories he calculated Linear Weights for?


#16    Patriot      2008/09/03 (Wed) @ 20:04

Terps, no one has actually implemented that yet (Tango has mentioned it), so it’s hard to say.  What it definitely does do though is make it impossible to use an equation to find the B values to bring everything into balance.

This is not necessarily a bad thing--maybe the outs and stuff really have no business in B.  It just makes it a lot harder to match target linear weight values, and will force some trial and error (since now you have to solve for 2 unknowns--the B and C coefficients).

Peter/14: If I am reading Colin correctly, what he is saying is that if you have missing data, and you want your linear weight equations to equal the actual runs scored, you are going to have to jimmy the values of the events that you do have.  I agree with this, and I agree with his point that you may not necessarily want to reflect these values in your BsR equation.


#17    terpsfan101      2008/09/03 (Wed) @ 20:05

Peter,

Tom and Colin probably got different results because they used different Run Expectancy Charts and they split plays up differently. Colin split plays up by “Retrosheet Event Code” and Tom chose to split his up by the “Official Statistics.”


#18    Peter Jensen      2008/09/03 (Wed) @ 20:12

Ruane doesn’t explicitly say in his article, but the standard methodology is that all plays including ROE are included in the RE chart (or all minus home 9th or greater), but the run values of each event represent only that type of event.  This, of course, means that the actual league total runs scored will not reconcile with the event run values times the event frequency for the events listed.


#19    Peter Jensen      2008/09/03 (Wed) @ 20:23

Patriot - I read Colin’s post #9 as referring to traditional linear weights such as Ruane was using in the article Colin cited.  What you fellows do with BaseRuns is entirely up to you.  I have stated before that I don’t see any value to BaseRuns except for estimating linear weights when PBP data isn’t available.

terpsfan - Possibly, and your explanation about whether Ruane was using all plays in his RE chart might be the entire difference, although it usually isn’t that large. Since Tom used Retrosheet as his source data I have to believe that he used retrosheet event codes as well.  That is the easiest way to make the query.


#20    Colin Wyers      2008/09/03 (Wed) @ 20:32

That’s one way of figuring linear weights, Peter, but I’d hardly call it “standard.” (It’s certainly the best way, I’ll add.)

But it’s not the method used to derive, in no particular order:

* Estimated Runs Produced
* Equivalent Average
* Extrapolated Runs

And even Batting Runs used a simulator instead of real PBP data. Empiric linear weights may be the least common sort.

And that assumes that we want to study Major League Baseball during the Retrosheet era. If we want to study minor league baseball, Japanese baseball, Negro League ball or MLB prior to 1954 we have to use other methods.


#21    Patriot      2008/09/03 (Wed) @ 20:37

Here is a first pass at shoving negative stuff in C rather than B.  It should be taken in that light.  I did not bother with IW this time.

A = H + W + HB - HR
B = .731S + 2.132D + 3.463T + 1.918HR + .026W + .111HB + .909SB + .336SH + .729SF
C = .749(AB - H - K - DP) + 1.179K + 2.865CS + 5.058DP

The LW for 1960-2004:
0.464 S
0.767 D
1.055 T
1.415 HR
0.311 W
0.330 HB
0.197 SB
-0.273 CS
-0.071 AB-H-K-DP
-0.112 K
-0.482 DP
0.073 SH
0.158 SF

These are pretty close to the target weights I was using, which are based on Ruane’s.  The RMSE when applied to 1990-2004 teams is 23.75, which is worse than the “F1-W” version (23.55).  Obviously the real test would be to see how it would do on the inning level.  And again I’m not holding this up as anything but a first try.

What I did was start with the B values as generated by the formula, then wipe out those for the events I didn’t want and reconciled them so that the sum was the same as before.  I held C constant as the sum of AB-H+DP+CS, and used this formula to find initial C coefficients:

(C-LW*(B+C)^2)/(A*B)

All that is a rearrangement of the formula for calculating LW to solve for the C coefficient, assuming that the A and B coefficients for the events are zero.  Then I reconciled those to equal the previous C sum.
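For anyone who wants to poke at this first-pass version, here is a minimal sketch that evaluates the A, B and C factors as written above and estimates linear weights with the "plus one" method (add one event, see how the estimate moves). Two things are assumptions, not part of the post: D is taken to be HR, and the team line is invented, so the numbers it prints will not reproduce the 1960-2004 list exactly.

```python
def base_runs(s):
    """First-pass BsR with the penalties in C; S is singles, D is assumed to be HR."""
    singles = s["H"] - s["D"] - s["T"] - s["HR"]
    a = s["H"] + s["W"] + s["HB"] - s["HR"]
    b = (0.731 * singles + 2.132 * s["D"] + 3.463 * s["T"] + 1.918 * s["HR"]
         + 0.026 * s["W"] + 0.111 * s["HB"] + 0.909 * s["SB"]
         + 0.336 * s["SH"] + 0.729 * s["SF"])
    c = (0.749 * (s["AB"] - s["H"] - s["K"] - s["DP"]) + 1.179 * s["K"]
         + 2.865 * s["CS"] + 5.058 * s["DP"])
    d = s["HR"]  # assumption: the post does not spell out the D factor
    return a * b / (b + c) + d


def plus_one(stats, **changes):
    """Change in estimated runs after adding the given events to the stat line."""
    bumped = dict(stats)
    for key, amount in changes.items():
        bumped[key] += amount
    return base_runs(bumped) - base_runs(stats)


# Invented team-season line, for illustration only.
team = {"AB": 5500, "H": 1450, "D": 280, "T": 30, "HR": 160, "W": 520, "HB": 50,
        "SB": 100, "CS": 45, "K": 1000, "DP": 125, "SH": 60, "SF": 45}

lw_hr = plus_one(team, AB=1, H=1, HR=1)  # a home run is also a hit and an at bat
lw_cs = plus_one(team, CS=1)
print(round(base_runs(team), 1), round(lw_hr, 3), round(lw_cs, 3))
```

Because the CS now sits only in C, its marginal value stays negative while B stays positive for any stat line with nonnegative counts, which is the point of the exercise.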


#22    Peter Jensen      2008/09/03 (Wed) @ 20:54

Colin - Then what we have is only a terminology problem.  When you use the term linear weights and cite an article like Ruane I assumed you used the term like Ruane was using it, as an extension of Palmer’s Batting Runs.  All the other linear run estimators that you mentioned, even if linear in nature, are usually referred to by their own specific names.

For studying baseball when PBP data is not available I completely agree with you that to answer some questions it is desirable to have some kind of run estimator, and I applaud your efforts to try and establish a methodology for getting the best one possible for the task at hand.


#23    terpsfan101      2008/09/03 (Wed) @ 21:21

Patriot,

I tried this the other night with Tango’s Full BsR version, and got similar results. If I knew what I was doing at the time was similar to your line of reasoning, I would have posted it then. Instead of removing the intentional walk, I combined it with the walk. Also I include .66 in C for the pickoff, because that is the percentage of runners picked off who were out from 1974-1990:

A = 1B + 2B + 3B + BB +HBP +ROE +INT + OtherSafe + .08*SH

B = .763*1B + 2.047*2B + 3.294*3B + 1.781*HR + .0001*BB + .172*HBP + .84*ROE + .290*INT + 1.506*OtherSafe + .744*SH + .855*SB + 1.107*BK + 1.224*PB + 1.234*WP + .584*Indifference + .001*Foul Outs

C = .92*SH + 1.06*SO + 1.01*(AB-H-K-RBOE-OthSafe) + 1.8*CS + .9*Pickoff + 1.67*OtherAdvance + 2.52*ImpliedOuts

D = HR


#24    terpsfan101      2008/09/03 (Wed) @ 21:38

I made typing my equation for estimating “contact outs” in C: I forgot to include the Sac Fly. So it is 1.01*(AB + SF - H - K - RBOE - OtherSafe). This is just an estimate of what Tangotiger called Outs in his equation. In the equation above, I used his number for outs 1367321. The estimate is 1367318. Only 3 outs off of his.


#25    terpsfan101      2008/09/03 (Wed) @ 21:44

The beginning of my post above should read: “I made a mistake typing”

I keep forgetting to proofread. If you use any of Tangotiger’s Linear Weights, remember to include Sac Flys in “Contact Outs.”


#26    terpsfan101      2008/09/03 (Wed) @ 22:09

Patriot,

I’m pretty sure we did this the same way. I zeroed out the B coefficients I didn’t want to reconcile.


#27    terpsfan101      2008/09/09 (Tue) @ 14:13

The negative B value of the IBB can be solved by using .6 as a coefficient in the A term. This makes sense because the score rate of the IBB is approximately 60% of the other on-base events in the A term. Would treating the IBB this way cause any problems?


#28    Tangotiger      2008/09/09 (Tue) @ 14:19

Your suggestion is similar in spirit to how I handle the triples.  We recognize there are limitations within the construction and so we try to come up with something a bit more logical.

I’d be in favor of something like that.  I think Robert Dudek proposed something similar at Batter’s Box a few years ago.  I’ll see if I can find it.  Where is that guy anyway?


#29    Tangotiger      2008/09/09 (Tue) @ 14:23

My archived blog has a discussion on it:
http://www.tangotiger.net/archives/stud0112.shtml

Which points to this article:
http://www.battersbox.ca/article.php?story=20030828021716999

Where he says, in part:
A factor: Hits - Homeruns + .9*(HBP+W-IW) + .5*IW - CS - GIDP


#30    terpsfan101      2008/09/09 (Tue) @ 16:25

I probably got the idea of the .60 IBB coefficient from Dudek, as I recall reading this article before. The problem I am having is that I don’t know the best way to test the new coefficient. I tested it on Barry Bonds’ 2004 season, when he had 120 IBB. Using 1 as the coefficient for the IBB, Bonds gets credit for 180 BsR. Using .6 as the coefficient he gets credit for 184 BsR.


#31    Tangotiger      2008/09/10 (Wed) @ 09:59

{I wrote this to Clay}

In this equation:
P(N) = [ (N-1)! / ( 2! * (N-3)! ) ] * ( 1 - OBP)^2 * OBP^(N-3)

I think you meant to make this term:
( 1 - OBP)^2

as
( 1 - OBP)^3

If you work it out with sample data, you will see that my bug-fix will result in the sum of P(N) being 1.  In your case, the sum will equal 1/(1-OBP).
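The claim about the two sums is easy to check numerically. A minimal sketch, with illustrative function names and an arbitrary .333 OBP:

```python
from math import comb


def p_batters(n, obp, out_exponent):
    """P(N) as written above: C(N-1, 2) * (1 - OBP)^k * OBP^(N-3), for N >= 3."""
    return comb(n - 1, 2) * (1.0 - obp) ** out_exponent * obp ** (n - 3)


def total_probability(obp, out_exponent, max_n=2000):
    """Sum P(N) over N = 3..max_n; the tail beyond that is negligible here."""
    return sum(p_batters(n, obp, out_exponent) for n in range(3, max_n + 1))


obp = 0.333
with_cube = total_probability(obp, out_exponent=3)    # the bug-fixed version
with_square = total_probability(obp, out_exponent=2)  # the version as first sent
print(with_cube, with_square, 1.0 / (1.0 - obp))
```

The factorial term (N-1)! / (2! * (N-3)!) is the binomial coefficient C(N-1, 2): the number of ways to place the first two outs among the N-1 batters who precede the one making the third out.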

Now, the idea you had about figuring the number of runners on base per HR is great at the team level; it won’t quite work at the player level.  If we consider an extreme case of OBP=.999, with 1 HR and 26972 walks per 27 outs, we can see how each inning will end with very close to 3 runners on base.  So, we’ll be left with almost 27 runners on base and the other batters all scoring.  In effect, a walk is as good as a run.  While the number of runs that we can presume scored with the HR is very close to 4.0, the run impact of the HR is very close to 1.0.

A Markov process will confirm that as well:
http://tangotiger.net/markov.html
(Set values to 28, 1, 0, 0, 1, 26972, 0 and click CALCULATE)
You get 26946.007 runs

And if you remove 1 HR, 1 H, and 1 AB, you get:
26945.005 runs

That difference is 1.002 runs, which is the effect of removing the HR from the game.

You will note that the results at the top have BaseRuns awfully close, while Runs Created is awful, period.  More importantly, at the bottom of the page, we see the marginal run values of each event.  Markov correctly ends up with close to 1.00 runs for each positive event.  BaseRuns has the right idea (with the incorrect blip on the triples, of which I have a BaseRuns version that fixes the triples issue).  Runs Created is wrong.

So, I really like your idea at the team-level (or pitcher-level), but it cannot work at the individual hitter level.
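The extreme case can also be approximated with a stripped-down Markov chain that knows only three events: walk, home run, and out. This is a sketch under that assumption, not the calculator linked above, so its totals need not match the quoted figures to the third decimal. The point is only that the marginal value of the HR lands near 1.0 even though nearly every HR comes with the bases loaded.

```python
def runs_per_game(outs, walks, homers, innings=9):
    """Expected runs per game when every plate appearance is a walk, HR, or out."""
    pa = outs + walks + homers
    p_out, p_bb, p_hr = outs / pa, walks / pa, homers / pa

    # nxt[r]: expected remaining runs after one more out, with r runners on (0..3).
    nxt = [0.0, 0.0, 0.0, 0.0]  # after the third out nothing more scores
    for _ in range(3):  # work backward: 2 outs, then 1 out, then 0 outs
        # Write E[r] = alpha[r] + beta[r] * E[0], where E[0] is bases empty.
        alpha = [0.0] * 4
        beta = [0.0] * 4
        # Bases loaded: a walk forces in a run and the bases stay loaded.
        alpha[3] = (p_out * nxt[3] + p_bb * 1.0 + p_hr * 4.0) / (1.0 - p_bb)
        beta[3] = p_hr / (1.0 - p_bb)
        for r in (2, 1):
            # A walk adds a runner; a HR scores r + 1 and empties the bases.
            alpha[r] = p_out * nxt[r] + p_bb * alpha[r + 1] + p_hr * (r + 1)
            beta[r] = p_bb * beta[r + 1] + p_hr
        # Bases empty: E0 = p_out * nxt[0] + p_bb * E1 + p_hr * (1 + E0).
        e0 = (p_out * nxt[0] + p_bb * alpha[1] + p_hr) / (1.0 - p_bb * beta[1] - p_hr)
        nxt = [e0] + [alpha[r] + beta[r] * e0 for r in (1, 2, 3)]
    return innings * nxt[0]


with_hr = runs_per_game(outs=27, walks=26972, homers=1)
without_hr = runs_per_game(outs=27, walks=26972, homers=0)
print(with_hr, without_hr, with_hr - without_hr)
```

Tracking only a runner count from 0 to 3 is enough here because walks always fill the bases in order and a HR empties them.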


#32    terpsfan101      2008/09/10 (Wed) @ 16:58

I don’t know why I gave credit to Dudek for the .5*IBB in the A factor. This was David Smyth’s idea. Using trial and error, I have settled on .58*IBB in the A factor.


#33    Colin Wyers      2008/09/11 (Thu) @ 14:52

All events have different score rates - we know that a runner who reaches on a single is less likely to score than a runner who reaches on a double. (I actually have that data in a table at home - part of the article on LWTS I’m writing for tomorrow.) How far do we want to go in modeling that in the A factor for BsR?


#34    Tangotiger      2008/09/11 (Thu) @ 15:04

Right, that’s a good point.  You could have a Bs/(Bs+C) for singles and Bd/(Bd+C) for doubles, etc.

It could get ugly fast.  Incredibly, when you compare the Markov results to BsR, they are “close enough” that it seems almost pointless to overcomplicate BsR, other than as an academic exercise.


#35    terpsfan101      2008/09/11 (Thu) @ 22:21

I agree that BsR is already complicated enough. I tested the .58 coefficient for IBB on Tango’s Full BsR equation. I finally decided on using .59, which is the largest value you could give the IBB before its “b coefficient” became negative. Here is what the “b coefficients” for the BB look like using 1 as the coefficient for the IBB:

bWalk = .052
bIBB = -.483

Here is what they look like using a .59 coefficient for the IBB:

bWalk = .039
bIBB = .012

Depending on the BsR equation, I would set the “a coefficient” of the IBB to whatever gets you closest to the smallest positive “b coefficient”.

Sorry for the not so simple explanation.

