THE BOOK--Playing The Percentages In Baseball


Monday, June 16, 2008

Deprecate Runs Created in favor of BaseRuns

By Tangotiger, 03:12 PM

Bill James had said that he has a to-do list, and that he will bump up an item if there is enough clamor for it.  I added my two cents and he replied:

To-do list? Deprecate Runs Created in favor of Base Runs.
Asked by: Tangotiger
Answered: May 31, 2008

If you can’t convince anybody, how can I?

Funny guy!  I believe that Bill James can move mountains, and the only thing stopping BaseRuns from supplanting Runs Created (the technical versions anyway) is that Bill James hasn’t embraced it.  And maybe the reason he hasn’t embraced it is that people still buy into Runs Created (the simple version).  That is, the staying power of Runs Created (technical version) is the simplicity of Runs Created (simple version).

This is why I had suggested a while ago, to you guys, that Bill adopt BaseRuns as a technical version of Runs Created.  Clearly, the version he has now bears little resemblance to the simple version.  The technical version probably is more akin to Linear Weights (Estimated Runs Produced) than to the basic version of Runs Created.

Anyway, Bill can move mountains.  And BaseRuns provides a more solid foundation than the current incarnations of Runs Created.

If you are a BillJames subscriber, then tell him that he should finally adopt BaseRuns to supplant Runs Created.


#1    Tangotiger      (see all posts) 2008/06/17 (Tue) @ 16:24

It looks like Patriot will try to independently create his own BaseRuns version:
http://walksaber.blogspot.com/2008/06/run-estimation-stuff-pt-1.html

Hopefully, it’ll come close to what I got.

At the least, his LWTS numbers are pretty close to the recommended ones.


#2    Tangotiger      (see all posts) 2008/06/17 (Tue) @ 16:36

As a matter of principle, I am opposed to counting the SF separately.  A SF is “a flyball out, with a runner on 3B and less than 2 outs that scored that runner”. 

Why not have the “RBI-single” as a “single with a runner on base that scores the runner”, and make the RBI-single worth .80 runs and the non-RBI-single worth .40 runs?

You have empirical runs (output) on both sides of the equation.

Even the DP is a dubious category, as it’s similar to the SF.  The opportunity scale for the SF and DP is not the same as all the other events.


#3    Patriot      (see all posts) 2008/06/17 (Tue) @ 17:07

I don’t disagree with anything in Tango’s #2.

However, as part of a “path to acceptance”, I do think that having a version that deals with all of the official categories as Tech-RC does is a good thing, and it in no way needs to supplant Tango’s version.  I have had a few people email and ask about a version incorporating all of those categories, aware of Tango’s version but not wanting to estimate the missing data (whether this is a good attitude or not, I am not passing judgment on).

Using the same definitions for factors (initial baserunners, batting outs) that you did, this is what I got for B values (I did not include a fractional term for SH in the A and C factors, although that was a keep-it-simple move rather than a philosophical one):

B = .72S + 2.11D + 3.43T + 1.90HR - .01(W-IW) - .62IW + .11HB + .9SB - 1.22CS + .12 (AB-H-K-DP) - .06K - 1.71DP + .77SH + 1.16SF

For some reason it needed higher weights on extra base hits (relative to the Tango version).  The positive value for the out is also annoying, as is the negative value for walks.  All of that reduces the theoretical usefulness if nothing else.  Obviously we would expect to make a little more sense when we use more complete data (as Tango’s does).


#4    Rally      (see all posts) 2008/06/17 (Tue) @ 19:13

How did you get a negative value for walks?  Was that from a regression?

The positive value for contact outs doesn’t bother me; assuming they aren’t DPs, they should have advancement value.  It doesn’t break down theoretically because if there are no baserunners your BsR will still be zero.  The walks though...a team of Eddie Gaedels (like when the much favored Gondor played the Shire, but ace Aragorn never made it out of the first inning) will advance quite a few runners.


#5    Patriot      (see all posts) 2008/06/17 (Tue) @ 19:30

No, it’s not a regression...it’s the B values needed to force the linear weights to hit the target linear weights for the sample in question. 

It would not be too much trouble to change the walk rate to something more reasonable, like .05, and then redo the other weights, but then it would no longer be a *perfect* match for the target linear weights.  Which is not to say that you should not do it, but the formula I posted was intended to match those weights.


#6    tangotiger      (see all posts) 2008/06/17 (Tue) @ 21:06

I didn’t realize that the SF was in the tech version.

From the standpoint of supplanting the tech version of RC, then I agree: use the exact same terms that James uses.  You can even do it based on his own construction (A*B/C) first, and then show the BaseRuns version, using the same terms, but with the HR removed, and the denominator as we are used to with B+C.

This way, we can see the effect.  Just a thought…


#7    Patriot      (see all posts) 2008/06/18 (Wed) @ 00:02

That is an excellent idea.  I will work that in somewhere.  A lot of stuff I’ve written lately is based on Tango ideas.  Good thing for the rest of us he doesn’t have time to do it all himself.


#8    David Smyth      (see all posts) 2008/06/18 (Wed) @ 18:09

The SF thing raises an interesting issue. You can do as James does and include SF as outs, plus some advancement. You can do as Tango prefers and include them as just outs.  Or you can give SF the HR treatment. HR are included as known runs, a lost runner, plus some advancement. A SF can be viewed as a known run, a lost runner, some negative advancement, and an out.

The objection to this is that it gets the formula closer to runs = runs. But if that applies to the SF, why doesn’t it apply to the HR? Is the treatment of HR in BsR just a form of cheating?

I’m starting to think that it is.


#9    Patriot      (see all posts) 2008/06/18 (Wed) @ 18:27

Wow, this is like Einstein deciding that relativity was bunk and that weightless elephants on frictional surfaces was the way to go in physics. grin

As I see it, the difference between the SF and the HR is that a home run is a fundamentally unique event under the rules of baseball.  There is no other event that entitles one to score a run without the danger of being put out that can occur independently of any previous series of events (this clause eliminates bases loaded walks and their cousins). 

A SF is just an accounting category.  For some reason TPTB decided to single it out for special treatment.  They could have done the same for the RBI single, the bases loaded walk, the RBI groundout, etc. but they chose not to. 

If one’s sole goal is to generate the best possible estimate of how many runs a team has scored based solely on their official statistics, then treating a SF in the manner that David espouses is the right thing to do.  But in attempting to model run scoring (at a level deeper than runs = runs), and especially in evaluating a player’s contribution, it is a big mistake.


#10    David Smyth      (see all posts) 2008/06/18 (Wed) @ 20:36

Patriot, I’m not ‘espousing’ that SF get the HR treatment, I’m espousing that HR don’t get the HR treatment.

ALL of these outcomes can be considered as ‘accounting categories’.

The task is to be consistent with the logic.


#11    Patriot      (see all posts) 2008/06/18 (Wed) @ 20:54

Espouse was a bad word choice on my part and misrepresented your position.  Sorry.  I should have said “explained” or something.

But I disagree that treating the HR differently is applying different logic.  A home run is a unique event because it is the only event that allows a run to score, guaranteed (accepting Robin Ventura and friends), independently of anything else that happens in the inning.

Any accounting that breaks things down into outcomes like hit type should account for the uniqueness of the home run.  Of course, you could eschew outcomes altogether and look at grounders, flyballs, line drives, etc. and in that case the home run could just be a FB/LB.

But if you’re going to get into outcomes on the level of hit types, I don’t see why you wouldn’t want to treat the HR specially.  It is special.


#12    tangotiger      (see all posts) 2008/06/18 (Wed) @ 20:58

The difference is that the HR is an event by the batter, just as the walk and single is, independent of the runners on base.  A SF is an event that is mate-dependent, just as the GIDP is.

The presumption among the run formulas is that the player will be presented with an average frequency of base/out states.  To that end, you can’t include the SF as its own category.

You can include reaching on error, but you cannot include fielder’s choice.


#13    David Smyth      (see all posts) 2008/06/19 (Thu) @ 05:52

I actually thought the word ‘espouse’ was cool.

I was just having a moment of doubt in the above post. I do agree that the fact that the known run on a HR is intrinsic instead of situational makes the HR adjustment legit. But I don’t see why it has to be a part of every version. In fact, I think I prefer the ‘basic’ version without it. Here is such a version:

A = H + BB
B = (2*TB - H + HR) *.75
C = AB - H

This still has the advantages of BsR, such as proper capping of scoring at both ends.  And then, when you want something more advanced, bring in the HBP, SB, IBB, etc. And bring BB into the B factor. And bring in the HR adjustment…
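To make the basic version concrete, here is a minimal Python sketch of the A, B, C factors from this comment, combined in the usual BsR form A*B/(B+C). The function name and the sample team line are made up for illustration; only the three factor definitions come from the comment.

```python
def basic_base_runs(ab, h, tb, hr, bb):
    """Basic BaseRuns as posted in #13 (no separate home run term)."""
    a = h + bb                      # A: baserunners
    b = (2 * tb - h + hr) * 0.75    # B: advancement
    c = ab - h                      # C: batting outs
    if b + c == 0:
        return 0.0                  # nothing happened, so no runs
    return a * b / (b + c)


# Hypothetical team-season line, invented for illustration only.
team = dict(ab=5500, h=1450, tb=2300, hr=170, bb=550)
print(round(basic_base_runs(**team), 1))

# Capping at the low end: no baserunners means no runs.
print(basic_base_runs(ab=27, h=0, tb=0, hr=0, bb=0))     # 0.0
# Capping at the high end: with no outs, B/(B+C) = 1 and every runner scores.
print(basic_base_runs(ab=10, h=10, tb=10, hr=0, bb=5))   # 15.0
```

The two extreme cases are the "capping at both ends" the comment refers to: the score rate B/(B+C) can never leave the range 0 to 1.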


#14    tangotiger      (see all posts) 2008/06/19 (Thu) @ 07:09

It’s an interesting thought… what happens to the run value of the HR as OBP approaches .400, .500… 1.000 in this form (and what is the best coefficient to use).  How bad does the run value of the HR get? Interesting…


#15    studes      (see all posts) 2008/06/19 (Thu) @ 09:20

FYI, I announced this morning that we’ll soon start tracking Base Runs at THT instead of Runs Created, in support of the “movement.”

http://www.hardballtimes.com/main/article/base-runs-and-carl-hubbell/


#16    Peter Jensen      (see all posts) 2008/06/19 (Thu) @ 10:04

I like the idea of the Base Runs formula as a teaching tool to describe the theory of how runs are actually produced, and I find it a useful formula when PBP information isn’t available or when I need a quick, rough estimator for specific small-sample situations.  But I don’t see why it is even being discussed as a player run estimator for offensive players in the PBP era.  Base Runs is an estimate of Linear Weights, and if you have PBP information available you can calculate Linear Weights directly from the Run Estimation Tables for whatever time period you are using for your calculations.  What is the value of estimating Linear Weights (which necessarily loses accuracy) when you can just as easily calculate the exact numbers?


#17    Patriot      (see all posts) 2008/06/19 (Thu) @ 10:54

I don’t have anything on David’s basic versions at the extremes.  However, it gives these LW for 1960-2004: .51, .81, 1.10, 1.55, .36, -.11

That’s not bad at all for a simple formula that doesn’t take the HR out; I’m impressed.  To get the homer down to the 1.45 range, I used B = (2.3TB - H + .5HR)*.65.  That gives .52, .82, 1.11, 1.46, .37, -.11

I’m sure someone could play around with it and get “better” coefficients, but probably not too much better if you want to keep the nice numbers--that’s already been harmed by the fractional TB and HR coefficients in my version.
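For readers who want to reproduce this kind of comparison, the linear weights of a BsR formula can be found with the "plus-one" method: add one event to a set of totals and see how much the estimate changes. The sketch below assumes that method and uses a made-up team line, so its output will not match the 1960-2004 figures quoted above; the function and parameter names are hypothetical.

```python
def bsr(s, d, t, hr, bb, out, tb_coef=2.0, hr_coef=1.0, mult=0.75):
    """A*B/(B+C) with B = (tb_coef*TB - H + hr_coef*HR) * mult."""
    h = s + d + t + hr
    tb = s + 2 * d + 3 * t + 4 * hr
    a = h + bb
    b = (tb_coef * tb - h + hr_coef * hr) * mult
    return a * b / (b + out)      # C is batting outs (AB - H)


def plus_one_weights(totals, **coefs):
    """Runs added by one more of each event, holding everything else fixed."""
    base = bsr(**totals, **coefs)
    weights = {}
    for event in totals:
        bumped = dict(totals, **{event: totals[event] + 1})
        weights[event] = bsr(**bumped, **coefs) - base
    return weights


# Hypothetical team-season totals (NOT the 1960-2004 sample from the comment).
team = dict(s=1000, d=270, t=30, hr=150, bb=520, out=4050)

for label, coefs in [
    ("#13 basic", dict()),
    ("#17 variant", dict(tb_coef=2.3, hr_coef=0.5, mult=0.65)),
]:
    w = plus_one_weights(team, **coefs)
    print(label, {k: round(v, 2) for k, v in w.items()})
```

The weights print in the same order as in the comment: single, double, triple, home run, walk, out.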


#18    Tangotiger      (see all posts) 2008/06/19 (Thu) @ 11:43

Studes: Good job.  And you “get it” in that you did the BaseRuns with and without the player, with the difference being the player.

***

This part will be a bit technical, but you’d also want to add in an outs adjustment.  The best way to represent something is as runs per PA.  Now, in LWTS, you would do runs above average per PA, and simply add in the league average runs per PA, and you have your player’s LWTS-based runs created.

For a straight RC or BsR however, that relationship is RC per out.  Now, per out is not really what we want.  We’d really like to have per PA.  So, what you have to do is figure out the league average OBP (say .340) and the OBP of your player (say .400), and take that difference.  That’s .060 fewer outs per PA for your player, which has indirect run value.  How much?  Well, if you score 4.59 runs per game, that would be .17 runs per out.  So, you get a bonus of .17*.06=.01 runs per PA for not causing outs, that your teammates will end up scoring with.

Multiply that by the number of PA you have, add it to your runs created, and that’s your “final” runs-created number (as a function of PA).

Unless you do that, your runs-created number is a function of outs, meaning that you’d really want to divide RC by outs in order to scale him correctly.
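The arithmetic above can be checked in a few lines of Python. This is only a sketch of the numbers in this comment; the function name and the 600 PA figure are invented for illustration, and 27 outs per game is assumed.

```python
def outs_adjustment_per_pa(player_obp, league_obp, league_runs_per_game,
                           outs_per_game=27):
    """Indirect run value, per PA, of making fewer outs than league average."""
    runs_per_out = league_runs_per_game / outs_per_game   # 4.59 / 27 = .17
    return (player_obp - league_obp) * runs_per_out


bonus = outs_adjustment_per_pa(0.400, 0.340, 4.59)
print(round(bonus, 4))         # 0.0102, the ".01 runs per PA" in the text
print(round(bonus * 600, 1))   # 6.1 runs over a hypothetical 600 PA
```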

If that didn’t make sense, maybe Patriot can lend us his explanatorial expertise (if he agrees with me, that is).

Going back to your Chipper number, that may explain why you have such a big gap, likely a gap that he doesn’t deserve in this case.


#19    studes      (see all posts) 2008/06/19 (Thu) @ 11:55

Thanks, Tango.  Believe it or not, I’m actually able to track that.  I’ve been hanging around here too long.

For the regular THT stats, however, I don’t want to go through the trouble of subtracting each player from an average team, etc.  That would take way too much CPU and programming code.  So Bryan and I are working on a simple version, based on the player’s freely available stats and calculating a league adjustment to the “B” factor.  I hope that’s “okay.”

I assume I should still make the out adjustment in that case?


#20    Patriot      (see all posts) 2008/06/19 (Thu) @ 12:50

Studes/19: if you want to approximate the effect of subtracting the batter’s stats from a team, you can use the same kind of theoretical team approach that BJames uses in RC now.  The BsR one will be a little more intensive, but not too bad.  Just so that it’s clear, David came up with the BsR adaptation of the concept years ago.  Here’s an example from my site:

TTBsR = (A+ 2.41PA)*(B + 2.44PA)/(B + C + 7.86PA) + D - .75PA

The actual coefficients you want to use will depend on the league average, but you can use default values as Bill does without causing too much distortion.  But they should be based on the specific BsR formula that you use.  If you post it, I’d be happy to give you a customized formula to use.

Tango/18:  Agreed, I’m not sure I can add anything.  I prefer to explain it as giving credit for the extra PAs created rather than the outs avoided, but that’s six of one and half a dozen of the other. 

For those who weren’t there, that’s what we used to call R+/PA back in the FanHome days.  There was a poster named Sibelius who introduced that approach on the board.

If you do it right, it will produce the same Runs Above Average figure as the per out approach, but it allows you to use PA as the denominator as Tango said.

Personally, I always leave the runs figure alone in its stand-alone form, but it is more telling if you figure R+.  Of course, if you do that, then you will need to base your marginal runs for Win Shares on PAs rather than outs.


#21    Patriot      (see all posts) 2008/06/19 (Thu) @ 12:59

Sorry for the double post.  Somehow I missed that Studes wants to use a B multiplier to bring everything in line to the total runs scored (This is a bad thread for me.  I just noticed that I said “accepting” instead of “excepting” above.  Oh well).  In that case you don’t want to use default values, since that would undo some of your precision.

What you can do is this.  Figure the league total of the A factor per PA--multiply that by 8 and call it a.  Figure the same for B, C, and D (ex. B/PA for the league, times 8) and call them b, c, and d.

Then you can use:

TT BsR = (A + a*PA)(B + b*PA)/(B + C + (b+c)*PA) + D - (a*b/(b+c))*PA

Whether that’s more than you want to deal with is up to you of course.  And you still have Tango’s suggested PA adjustment on top of it, so it is kind of cumbersome.
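As a rough sketch of the recipe above (function and argument names are hypothetical), the theoretical-team version with league-derived coefficients might look like this in Python. The league d term is not needed in the code because it cancels between the team with the player and the team without him, which is why it does not appear in the formula.

```python
def theoretical_team_bsr(A, B, C, D, PA, lg_A, lg_B, lg_C, lg_PA, mates=8):
    """Player's BsR when placed on a team with `mates` league-average hitters."""
    # League factor per PA, times 8: the teammates' contribution per player PA.
    a = mates * lg_A / lg_PA
    b = mates * lg_B / lg_PA
    c = mates * lg_C / lg_PA
    team_with = (A + a * PA) * (B + b * PA) / (B + C + (b + c) * PA)
    team_without = (a * b / (b + c)) * PA
    return team_with + D - team_without


# Sanity check on the default coefficients quoted in #20:
# the subtracted term there, .75*PA, should equal a*b/(b+c).
print(round(2.41 * 2.44 / 7.86, 2))   # 0.75
```

One property worth testing in any implementation: a player whose factors are exactly league average per PA should come out with the league BsR rate per PA times his PA.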


#22    studes      (see all posts) 2008/06/19 (Thu) @ 14:28

Heh.  This is always what has stopped me from posting Base Runs in the past.  All the theory and adjustments get intimidating and make my head hurt.

Here’s the exact formula we are working on:

For each player, calculate “A” as
hits plus walks and HBP
minus home runs
minus .5 times intentional walks (assuming intentional walks are
included in walks total)

- For each player, calculate “B” as:
1.4*Total Bases
minus .6 (point six) times hits
minus 3 times home runs
plus .1 (point one) times (Walks plus HBP minus IBB)
plus .9 (point nine) times (SB minus CS minus GIDP)

...and then multiply that player’s “B” by the league’s “B adjustment” (see below).

- For each player, calculate “C” as:
At bats
minus hits
plus GIDP
plus CS

- D equals home runs

- E adjusts for outs, and is calculated as:
Player OBP minus league average OBP,
times league runs/game divided by 27,
times number of player’s PA’s

Then, each player’s Base Runs calculation is A times B/(B+C) plus D plus E.

To calculate each league’s “B adjustment,” do the following:

- Calculate “B” for the league, using the formula above.

- Calculate the league’s actual scoring rate, which is:
(Runs minus home runs)
divided by…
(Hits plus BB plus HBP minus HR minus .5 times IBB)

- Calculate C for each league, using the formula above

- Backtrack from league C to a new B* by…
Multiplying C times…
League scoring rate divided by (1 minus the league scoring rate)

- Finally, calculate the league’s “B adjustment” by dividing the new league B* by the original league B.

Does this seem appropriate to people?
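The steps in this comment can be sketched in Python as follows. The `Line` class, the function names, and the sample league line are all made up for illustration; the coefficients are the ones listed above. The E term is passed in separately, since it needs OBP and PA rather than the counting stats.

```python
from dataclasses import dataclass


@dataclass
class Line:
    ab: int
    h: int
    tb: int
    hr: int
    bb: int      # includes intentional walks
    ibb: int
    hbp: int
    sb: int
    cs: int
    gidp: int


def factor_a(x):
    return x.h + x.bb + x.hbp - x.hr - 0.5 * x.ibb


def factor_b(x):
    return (1.4 * x.tb - 0.6 * x.h - 3 * x.hr
            + 0.1 * (x.bb + x.hbp - x.ibb)
            + 0.9 * (x.sb - x.cs - x.gidp))


def factor_c(x):
    return x.ab - x.h + x.gidp + x.cs


def b_adjustment(league, league_runs):
    """League multiplier on B that forces league BsR to equal actual runs."""
    rate = (league_runs - league.hr) / factor_a(league)   # actual scoring rate
    b_star = factor_c(league) * rate / (1 - rate)
    return b_star / factor_b(league)


def base_runs(x, b_adj, e=0.0):
    """A*B/(B+C) + D + E, with B scaled by the league adjustment."""
    b = factor_b(x) * b_adj
    return factor_a(x) * b / (b + factor_c(x)) + x.hr + e


# Hypothetical league-average team line, invented for illustration only.
league = Line(ab=5500, h=1450, tb=2300, hr=170, bb=550, ibb=40,
              hbp=50, sb=100, cs=40, gidp=130)
adj = b_adjustment(league, league_runs=750)
print(round(adj, 3), round(base_runs(league, adj), 1))
```

By construction, applying the adjusted formula to the league line itself returns the league's actual runs, which is a handy check that the B adjustment has been coded correctly.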


#23    Tangotiger      (see all posts) 2008/06/19 (Thu) @ 14:53

“I prefer to explain it as giving credit for the extra PAs created rather than the outs avoided, but that’s six of one and half a dozen of the other.”

While true, I prefer your way.  It is clearer, and the direct result of outs not made are the extra PAs.  And, since we prefer to talk about Runs per PA to begin with, and since we will be adding the league average of runs per PA, I think it makes more sense to speak it the way you are describing it.

So, .06 fewer outs will lead to an extra .091 PA, and if each PA generates .12 runs, then that’s .011 runs per playerPA that you want to add to his Runs Created, because of the extra PAs he generates for his team.
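The same example, worked in Python. The comment does not spell out how .06 outs become .091 PA; the sketch below assumes it is the outs saved divided by the league out rate (1 minus the .340 league OBP from #18), which reproduces the quoted figures. The function name is hypothetical.

```python
def extra_pa_bonus(player_obp, league_obp, runs_per_pa):
    """Extra team PA (and their run value) generated per player PA."""
    outs_saved = player_obp - league_obp          # .06 fewer outs per PA
    extra_pa = outs_saved / (1 - league_obp)      # assumed: each out costs 1/.66 PA
    return extra_pa, extra_pa * runs_per_pa


extra_pa, bonus = extra_pa_bonus(0.400, 0.340, 0.12)
print(round(extra_pa, 3), round(bonus, 3))   # 0.091 0.011
```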


#24    studes      (see all posts) 2008/06/19 (Thu) @ 15:41

Follow up question: Tango, are you saying that RC and the way I approached Base Runs in that article essentially have two different denominators?

I ask because I know BJ divided RC by outs in Win Shares.


#25    Tangotiger      (see all posts) 2008/06/19 (Thu) @ 16:38

Since Patriot has studied this in-depth, I’d like to hear (see) him speak (write) first.  I’ll either defer to his analysis, or make some minor point to piggyback on him.


#26    Patriot      (see all posts) 2008/06/19 (Thu) @ 16:51

Studes/22: You can save yourself a couple steps in figuring your B*:

B* = C*(Actual Runs - HR)/(A - Actual Runs + HR)
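For anyone checking the shortcut, it follows directly from the two steps in #22. Writing R for actual runs and s for the league scoring rate (both symbols are just shorthand introduced here):

```latex
s = \frac{R - HR}{A},
\qquad
B^{*} = C \cdot \frac{s}{1 - s}
      = C \cdot \frac{(R - HR)/A}{(A - R + HR)/A}
      = \frac{C\,(R - HR)}{A - R + HR}
```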

As for the rest of it, your approach is applying BsR directly to the player, which is not as bad as applying RC directly to the player, but it’s not as good as LW or the differential BsR approach.  If you’re going to apply BsR directly to individuals, you really need to use the differential/TT approach.


#27    Patriot      (see all posts) 2008/06/19 (Thu) @ 17:13

Studes/24 & Tango/25: Let’s break down run estimators into a few classes (apologies in advance for the redundancy of what follows):

A: Multiplicative (RC, BsR, Gimbel, Van, Cramer, maybe some others I’ve forgotten)

B: Linear

C: Class A methods applied to individuals through some sort of team-modeling approach (TT RC, Differential BsR, MLV, others)

The class A models shouldn’t be applied directly to individuals.  If they are, they inherently presume that the player is his own team, and thus should have outs as the denominator.

The class B models are great to work with, and they have PA as the denominator, particularly if they start by measuring runs above average (Palmer’s Batting Runs).  If you start with the absolute (-.1 out value) form, then you can use the R+ approach.

The class C models are intended to be applied to the player, so they inherently need to be contextualized on the team level.  The team level is outs, but team outs, not player outs.  Team outs are directly related to team PA, of course, through the rate of getting on base/avoiding outs.  So by applying the R+ adjustment to the numerator (which is the initial runs estimate), we can then use PA as the denominator, since the R+ adjustment has accounted for the player’s impact on the conversion between team outs and team PA.

In your (Studes) article, you looked at James’ RC and a differential BsR--both class C estimators.  I don’t think they should have different denominators, so I don’t think that was the cause of the Chipper discrepancy (neither James’ RC nor your differential BsR accounted for the PA->O adjustment; they were both just the raw figures).  The Chipper discrepancy is probably just RC overvaluing singles as it is wont to do (when you’re hitting .400, RC is going to overvalue you, even with the TT adjustment).  Perhaps Tango was getting at something else in bringing that up, and we are not talking about the same thing.

I do think that Bill dividing RC by outs is really not the right denominator--it will give the same result as the R+ approach if you compare to average, but not for other baselines.  And in WS, he’s comparing to a 52% baseline to get marginal runs.  He should do the R+/PA that Tango explained--but this makes it harder to determine what the replacement performance is, since the Win Shares definition of replacement level is 52% of average team R/G, which is equivalent to 52% of average runs/out.

I don’t know if that made any sense.


#28    studes      (see all posts) 2008/06/19 (Thu) @ 17:15

“If you’re going to apply BsR directly to individuals, you really need to use the differential/TT approach.”

Thanks, Patriot.  David G. also convinced me to implement the “differential” approach, using the league totals as the baseline.  That’s what we’ll do at THT, using the formulas listed above for the “Base Runs w/o the player.”

I hope that makes sense.


#29    Tangotiger      (see all posts) 2008/06/23 (Mon) @ 14:17

Here’s Patriot part 2:
http://walksaber.blogspot.com/2008/06/run-estimation-stuff-pt-2.html

The thing with the walk.... the way the basic BsR construction works is that all baserunners start off with the same scoreRate (say 30%), whether they got on base via walk or triple.  It’s in the accounting for it in the B term that everything gets realigned properly.

The walk’s run value is very close to the chance that the random baserunner will score.  If the random baserunner is going to score say 30% of the time, then you know the guy getting on base with the walk and single will score less than 30% (while the double and triple will score more).  So, if the walk will score say 25% of the time, that makes the “getting on” value of the walk as .25 runs.  But, BaseRuns assigns .30 runs for that.  It overcompensates by +.05 runs in this illustration.  Walks can also move runners over.  But, since BsR overcompensated the getting on, it’s gotta nick the “moving over” that much.  As it turns out, it comes out to being very close to a wash, to the point that in certain run environments, the extra penalty in the B term will become negative.

That’s the issue we’ve come up against: while we’d like to create something independent of the environment, so that it can in fact be correct in any environment, it doesn’t work that cleanly.  I mean, it’s incredibly clean, as far as these systems go.  BaseRuns takes a back seat to no one.  But, you always have a couple of wrinkles to contend with.

Wrinkles are better than holes.


#30    Dave Smyth      (see all posts) 2008/06/23 (Mon) @ 17:15

We now know that there is a theoretically ideal scoring context for baseball--it’s the context where the scoring rate of the BB is the same as the score rate for the avg runner. smile

Has Patriot (or anyone) done an accuracy test for real MLB teams of the HR adjusted version vs the non-HR adj version?


#31    Patriot      (see all posts) 2008/06/23 (Mon) @ 18:16

David, the formula you gave in #13 has a RMSE of 25.41 for teams 1990-2005 (except 1994).  This version

A = H + W - HR
B = (1.4*TB - .6*H - 3*HR + .1*W)*1.02
C = AB - H
D = HR

has a RMSE of 24.80.  The version of the no special treatment approach that I posted in #17 has a RMSE of 24.84 (if the multiplier is set at .633).

For a reference point, basic ERP has a RMSE of 25.34 and basic RC has a RMSE of 26.32. 

BsR has two powerful things going for it; the special treatment of the homer is one, but as this exercise hints, the use of a proportion for baserunners scored (B/(B+C)) rather than a ratio (B/C in RC) is also a boon.
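For anyone wanting to run this kind of test on their own data, here is a minimal Python sketch of the HR-adjusted version from this comment together with an RMSE function. The function names and the three team lines are made up for illustration, so the printed RMSE means nothing; the figures quoted above come from real 1990-2005 team data.

```python
import math


def bsr_hr_adjusted(ab, h, tb, hr, bb):
    """The version posted in this comment: A*B/(B+C) + D, with D = HR."""
    a = h + bb - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02
    c = ab - h
    return a * b / (b + c) + hr


def rmse(estimates, actuals):
    """Root mean squared error between estimated and actual runs."""
    n = len(actuals)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, actuals)) / n)


# Hypothetical team seasons (ab, h, tb, hr, bb, actual runs); not real data.
teams = [
    (5500, 1450, 2300, 170, 550, 760),
    (5450, 1380, 2150, 140, 480, 670),
    (5600, 1520, 2500, 210, 600, 860),
]
estimates = [bsr_hr_adjusted(*t[:5]) for t in teams]
actuals = [t[5] for t in teams]
print(round(rmse(estimates, actuals), 2))
```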


#32    Dave smyth      (see all posts) 2008/06/23 (Mon) @ 18:25

If you’re so inclined, how bout testing a non-HR basic version which underweights the HR a bit (which helps compensate for the overweighting of BB, due to the cross-correlation between hr and bb):

B = (2*TB - H) * .79


#33    Patriot      (see all posts) 2008/06/23 (Mon) @ 19:41

24.95 for that one.  Linear weights for 1960-2004 are .52, .83, 1.14, 1.45, .36, -.11. 

I actually prefer that one (at least with the actual teams), since the linear weight of the homer is 1.45 v. the 1.55 for the other version.


#34          (see all posts) 2008/06/23 (Mon) @ 23:07

Has anyone looked at or have any thoughts on Eric Walker’s run scoring formula (http://highboskage.com/formula-proof.shtml)?  It seems rather complicated but when I quickly checked the RMSE from 1954-2001 (the data he posts) the RMSE was 22.32 runs.  This also was better than any of the run estimators in Patriot’s study (http://gosu02.tripod.com/id80.html).


#35    Patriot      (see all posts) 2008/06/24 (Tue) @ 00:11

Thanks for pointing that out Victor; I have visited the HBH site several times over the last decade (ugh, literally) and never seen that page where they actually give the formula before.

Taking a cursory look at it, it is a RC-type model: A*B/C.  The reason it looks confusing is that the A and B components each have a regression equation attached, but they seem to be:

A = .917(H + W + HB + E - CS) + 4.703
B = .922(TB + .4(W + HB) + .7SH + .8SB) + .007
C = PA

Regardless of the RMSE, it does not appear to be unique conceptually, although I assume he developed it independently (and perhaps prior to) James publishing RC, which is impressive (although not relevant to its usefulness today).


#36    Tangotiger      (see all posts) 2008/07/08 (Tue) @ 10:41

Patriot’s part 3:
http://walksaber.blogspot.com/2008/07/run-estimation-stuff-pt-3.html

Great stuff.  I didn’t realize that the singles value was that poor in the most current version of RC.


#37    Tangotiger      (see all posts) 2008/07/08 (Tue) @ 12:38

I forwarded Patriot’s article to Bill, and included this note:

I suggest reading the entire article, which gets into technicals.  His conclusion here reads:

“When you look at the measures to which Bill James has resorted to prop up the accuracy of RC, you have to ask why not just take one more step and take HR out of A? That is the only difference between the two formulas at this point, once James abandoned using TB in the B factor (instead considering S, D, T, and HR separately) and went to two decimal place weights.”

That’s really where it comes down to.  Why not separate HR from the rest, since we obviously have a floor to the HR (run value = 1), and you once agreed with me that the HR cannot have an ever-rising value as the OBP goes up?

I’ll let you know his reply, if it’s forthcoming…


#38    David Smyth      (see all posts) 2008/07/09 (Wed) @ 10:49

Who really cares what B James thinks? He probably thinks that there are several/quite a few run estimators that all do a good job, and can be used to answer real/practical questions. So, why not stick with runs created, he figures. I guess I shouldn’t speak for him.

Patriot keeps writing about how important it is to have the plus-1 values equal to the lwts values. I think that method is a helpful guideline, but not the end-all be-all. The time unit for run production is not the team-season, the game, or the out. It’s the inning, because that’s when the clock is reset. So, I think the best way to find the B values is to find which are most accurate on the inning level, given the events you are including/not including. I would think that the only time such values should exactly match up to the lwts values is when all outcomes are included. The only place I’ve seen that is Tango’s BsR addendum formula.

And Patriot, I wish you would stop using W in place of BB. W means wins in baseball. smile


#39    Tangotiger      (see all posts) 2008/07/09 (Wed) @ 11:41

If we look at it at the inning level, we’ll get the LWTS values.  So, I’m not sure I follow the objection.

***

I only “care” what Bill James thinks because he has drones that follow him, and I’m tired of having to undo the damage.  Baseball Prospectus and Baseball-Reference are also frustrating for me. 

Do you know how complicated it is to figure out OPS+?  The only person who would *ever* figure it out is Sean Forman.  He gives you the bits and pieces to do it yourself, but since he does it for you, you won’t do it.  And, what if he didn’t do it for you?  Well, you would *not* calculate OPS+, because there are better ways to do it.  You would do RC+ or BsR+ or anything else really.  But, we’ve now got a legion of drones that use OPS+ (which undervalues BB and overvalues HR) simply because they choose to be drones.

Do you know how many people average ERA+?  Ugh.

Do you know how complicated EqA is, completely unnecessarily?

And how LEV is just plain wrong?

And how WARP uses a baseline no one uses, forcing MORP to use an exponential multiplier to untangle its mess.

Bill James, Sean Forman, Clay Davenport, Keith Woolner, and Nate Silver.  You’ll be hard-pressed to find 5 guys with more intelligence than these guys, and yet we have to be subjected to their drones who don’t question enough what their champions do, to force them to make changes.

Bill James can undo 20 years of damage with a stroke of a pen (or a tap of the keys).  I care what these people say because they will save me years of aggravation from their drones polluting cyberspace.

Every week, we are subjected to someone who invents bases per out or bases per PA.  And every week we are subjected to someone who uses some stats that the Big 5 are propagating, even though they need to be corrected.

***

Just to be clear: you are a drone if you follow things blindly.  If you like OPS+ after understanding its limitations, and only use it within that constraint, you are not a drone.

***

Btw, Bill wrote back, and he said next time he’s in a “RC” phase, he’ll try to think about what we’ve been saying.


#40    fifth of      (see all posts) 2008/07/09 (Wed) @ 15:41

Tango/39: I strongly agree with this post.


#41    Patriot      (see all posts) 2008/07/09 (Wed) @ 18:06

Matching the linear weights values rather than attempting to maximize accuracy with team seasonal data is the lazy man’s way of getting at the inning-level values.

I don’t think it’s paramount to precisely match them, but if I am trying to develop a formula for a given dataset, it’s a great place to start from.  My problem with RC of course is not that it gives .49 instead of .46 (or whatever) for a single, but that it gives .6. 

Also, I hope I’ve made clear over there that I’m not trying to supersede Tango’s formula, only offering a version that uses all (and nothing more than) the categories that Tech-RC uses. 

I don’t have any retort on the issue of W for walk other than that I am stubborn and set in my own misguided ways.  I also use HB, which is actually the abbreviation for the pitcher’s version, and DP, which is not a batting stat at all.  SO and K are the same thing as well.

Apologies for taking this off-topic, and not to demand your time (IOW, feel free to tell me to go jump in a lake), but David, I’d really like to see you write a little article on Base Wins and the theory behind it sometime.


#42    David Smyth      (see all posts) 2008/07/09 (Wed) @ 19:42

----“If we look at the inning level, we get the lwts values.”

That’s only if you are including all outcomes, I think. If you look at all innings since 1955 and look at only the basic outcomes, you’ll get something like .52 for a 1b for BsR, I expect, as the most accurate weight.

And that .52 is fine with me, given the need to compensate for non-included info. I think that is better than forcing the correct lwts values, at the expense of disrupting the formula into allowing negative runs, etc., and a less accurate overall RMSE.

The .6 runs for a 1b from RC is telling us that this is the value that best compensates for missing information, at the team-season level. So, that’s what he should use, IMO. But it also suggests that the RC model is somewhat inferior. That’s what the plus-1 event test is good for, IMO--for evaluating the structure of a formula AFTER the most accurate version is determined. NOT for determining that version in the first place.
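
A minimal sketch of the plus-1 event test, for anyone who has not run one. It uses the basic Runs Created formula, (H + BB) * TB / (AB + BB), rather than the technical version discussed in this thread, and the team line is hypothetical. The helper names are mine.

```python
def basic_rc(ab, h, bb, tb):
    """Basic Runs Created (not the technical version)."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical team-season totals, for illustration only.
TEAM = {"ab": 5500, "h": 1450, "bb": 550, "tb": 2300}

# What one extra event adds to each counting category.
PLUS_ONE = {
    "single": {"ab": 1, "h": 1, "tb": 1},
    "home_run": {"ab": 1, "h": 1, "tb": 4},
    "walk": {"bb": 1},
    "out": {"ab": 1},
}


def plus_one_value(estimator, team, event):
    """Change in estimated runs when one event is added to the team."""
    bumped = dict(team)
    for category, amount in event.items():
        bumped[category] += amount
    return estimator(**bumped) - estimator(**team)


values = {
    name: plus_one_value(basic_rc, TEAM, event)
    for name, event in PLUS_ONE.items()
}

for name, value in values.items():
    print(f"{name:>8}: {value:+.3f}")
```

The same `plus_one_value` helper works with any estimator that takes these categories, so it can be pointed at a fitted formula after the fitting is done, which is the order of operations argued for above.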

As far as an article on Base Wins, can you explain why that would be worthwhile, Patriot? I mean, it’s an essentially cruder version of WAA in that it uses a cruder R/W converter.


#43    Patriot      (see all posts) 2008/07/09 (Wed) @ 23:00

“The .6 runs for a 1b from RC is telling us that this is the value that best compensates for missing information, at the team-season level.”

Assuming that the formula has been optimized for accuracy on that level.  But I’m not sure that it has; James probably had some tradeoff between lowest RMSE and giving reasonable results.

There are a number of ways you could go about trying to find the most accurate on the team-season level.  One would be to run a linear regression, and then try to match those values.  Another would be to calculate the B’ for each team, and then run a regression to get the best-fit B coefficients.
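
A rough sketch of the B’ step, assuming the basic BaseRuns structure R = A*B/(B+C) + D with A = H + BB - HR, C = AB - H and D = HR. Those component definitions are the simplest common choice and are an assumption here, as is the hypothetical team line. Solving the formula for B gives B’ = (R - D) * C / (A - (R - D)).

```python
def base_runs(a, b, c, d):
    """BaseRuns: baserunners times score rate, plus automatic runs."""
    return a * b / (b + c) + d


def implied_b(runs, a, c, d):
    """B' = the B value that makes BaseRuns match actual runs exactly."""
    scored_from_base = runs - d
    if not 0 < scored_from_base < a:
        raise ValueError("runs are not reachable with these A and D values")
    return scored_from_base * c / (a - scored_from_base)


# Hypothetical team-season line, for illustration only.
team = {"ab": 5500, "h": 1450, "bb": 550, "hr": 170, "runs": 780}

a = team["h"] + team["bb"] - team["hr"]  # baserunners (assumed definition)
c = team["ab"] - team["h"]               # outs (assumed definition)
d = team["hr"]                           # automatic runs

b_prime = implied_b(team["runs"], a, c, d)
print(f"B' = {b_prime:.1f}")

# Round trip: plugging B' back in reproduces the actual run total.
print(f"check = {base_runs(a, b_prime, c, d):.1f}")
```

The regression itself is not shown: it would take the B’ values for all teams as the dependent variable and the event counts (singles, doubles, walks and so on) as the predictors.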

Even then, there’s still a tradeoff between accuracy and reasonableness (see the SF).  James has not gone for broke on lowering the RMSE as can be seen by the fact that his SF is only worth .03 runs for 1961-2004; lower than its empirical LW value, and well below its “cheating” value.

The BsW comment was based on some interesting theories you used to espouse (my new favorite word) about R/W converters and the theoretical value of runs and outs being equal to their reciprocals on a per game basis.  I always found it an interesting perspective...but if you no longer hold it, then I guess it wouldn’t be a very interesting article :)


#44    David Smyth      (see all posts) 2008/07/11 (Fri) @ 18:43

This has nothing to do with Base Runs, but I have nowhere else to post it.

Say we have 2 teams which both score/create 5 R/G. One team is hi OBA/lo SLG, and the other is lo OBA/hi SLG.

I’m speculating that the hi power team will, everything else being equal, have a more ‘consistent’ offense. If so, they will have fewer ‘superfluous’ runs, and therefore tend to have more wins.

Any thoughts?


#45    Colin Wyers      (see all posts) 2008/07/11 (Fri) @ 19:29

Sal did some work on the value of consistency:

http://www.hardballtimes.com/main/article/consistency-is-key/

As far as whether a high OBP team is more consistent, I have no idea off the top of my head; it sounds true, but you can get into a lot of trouble that way.


#46    David Smyth      (see all posts) 2008/07/11 (Fri) @ 20:41

No, Colin, I’m suggesting that a high POWER team should be more consistent, not a high OBA team.

Here’s an extreme example. 2 teams, one is all HR and SO, the other is all BB and SO. Both score 5 R/G. According to Tango’s Markov program, that means 5 HR per game, and 21 BB per game, respectively. If we assume that in any individual game, the ratio of each outcome varies between half and double the avg amount, then the HR team will vary between 2.5 and 10 runs, while the BB team will vary between .8 and 20 runs in their games.

Lots more superfluous runs for the BB team. Of course, the assumption that the half/double boundaries (I am simplifying of course) apply to both the HR and BB teams is the question…
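
One way to look at this without the half/double assumption is to compute the spread directly. The sketch below is not Tango's Markov program. It is a simplified model that assumes every PA is independent, every game is nine innings of three strikeouts each, a HR always scores exactly one run on the HR team, and on the BB team the bases fill on the first three walks of an inning and every walk after that forces in a run. Each team's per-PA rate is solved so that it averages 5 R/G.

```python
from math import comb

INNINGS = 9
TARGET_PER_INNING = 5.0 / INNINGS


def inning_pmf(p, max_events=400):
    """P(n events before the third strikeout), with per-PA event rate p."""
    q = 1.0 - p
    return [comb(n + 2, 2) * p**n * q**3 for n in range(max_events)]


def inning_mean_var(p, runs_from_events):
    """Mean and variance of runs in one inning."""
    pmf = inning_pmf(p)
    mean = sum(prob * runs_from_events(n) for n, prob in enumerate(pmf))
    second = sum(prob * runs_from_events(n) ** 2 for n, prob in enumerate(pmf))
    return mean, second - mean**2


def solve_rate(runs_from_events, target):
    """Bisect for the per-PA rate that gives the target runs per inning."""
    lo, hi = 0.0, 0.9
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if inning_mean_var(mid, runs_from_events)[0] < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def hr_runs(n):
    return n  # every HR is one run


def bb_runs(n):
    return max(0, n - 3)  # walks score only once the bases are full


results = {}
for name, rule in (("HR", hr_runs), ("BB", bb_runs)):
    p = solve_rate(rule, TARGET_PER_INNING)
    mean, var = inning_mean_var(p, rule)
    results[name] = {
        "events_per_game": INNINGS * 3 * p / (1 - p),
        "runs_per_game": INNINGS * mean,
        "var_per_game": INNINGS * var,  # innings assumed independent
    }
    print(name, {k: round(v, 2) for k, v in results[name].items()})
```

The solved rates come out at 5 HR per game and roughly 21 BB per game, in line with the Markov figures quoted above, and in this model the walk team has the larger game-to-game variance in runs. Whether that carries over to realistic teams is a separate question.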


#47    Tangotiger      (see all posts) 2008/07/14 (Mon) @ 14:37

Joe Posnanski adopts BaseRuns before Bill James:
http://joeposnanski.com/JoeBlog/2008/07/11/late-to-the-base-runs-party/

Bill James to a sabermetrics party is like Jack Nicholson to a Hollywood party: it’s not a party until Jack shows up.  Come on Bill.  What the heck are you waiting for?  Joe Posnanski shows up fashionably late and becomes a believer. It’s time to make an entrance Bill…


#48    Tangotiger      (see all posts) 2008/07/28 (Mon) @ 10:36

http://walksaber.blogspot.com/2008/07/run-estimation-stuff-pt-5.html

Patriot, part 5, continues his lamplighting

This is not to say that you should apply BsR directly to individuals--I would never endorse that. My intent is to suggest that the distortions caused by applying RC to individuals are due in larger part to that model’s flaws than to the mistake of conflating an individual with a team.


#49    Tangotiger      (see all posts) 2008/07/28 (Mon) @ 10:43

Patriot linked to an article I had completely forgotten I wrote:
http://www.tangotiger.net/reconcile.html


#50    Tangotiger      (see all posts) 2008/08/04 (Mon) @ 14:12

Part 6:
http://walksaber.blogspot.com/2008/08/run-estimation-stuff-pt-6.html

...I think that it is probably better to stick with the linear weights on a league level for most purposes. In stating that preference, I am not claiming that such an approach is “better” or any such thing. It is just my opinion that the extra effort put into the theoretical team calculation is not justified by the differences in the final estimates.

