THE BOOK--Playing The Percentages In Baseball


Friday, September 29, 2006

Why Does Runs Created Make Sense?

By Tangotiger, 10:08 AM

Boring and with math gyrations that I typically don’t like, here’s how I see it:


Here’s the RC equation that I like to use:
Runs = .87 * PA * OBP * SLG

The .87 * PA term is better than AB, because this addresses the weakness in the RC with respect to walks. 

And, we all know that true run scoring is:
Runs = Baserunners * ScoreRate + HR

Presuming that 92% of the OBP rate is baserunners, we can rewrite this equation as:
Runs = (.92 * OBP * PA) * ScoreRate + HR

The scoreRate is about 40% of (SLG+OBP), more or less.  Not an exact equation of course, but will do the job for our purposes.  So:

Runs = (.92 * OBP * PA) * .40 * (SLG + OBP) + HR

Which gives us:

Runs = (.368* PA * OBP * (SLG + OBP)) + HR

Compare this to Runs Created:
Runs = .87 * PA * OBP * SLG

So, we see the same main portion, PA * OBP * something, on both sides.  Runs Created says the something should be SLG, while in reality the HR portion should be stripped somewhat from that term.  The basic essence of RC makes sense: you want OBP multiplied by something that looks a lot like SLG.

My illustration here makes most sense if the balance of OBP and SLG is what historically happens.
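
As a rough sketch of the comparison above, the two forms can be put side by side in a few lines of Python. The team line used here (6200 PA, .330 OBP, .430 SLG, 160 HR) is a made-up, roughly league-average example and not data from this post. The .92 and .40 constants are the approximations stated above.

```python
def runs_created(pa, obp, slg, k=0.87):
    """RC form used in the post: k * PA * OBP * SLG."""
    return k * pa * obp * slg


def runs_decomposed(pa, obp, slg, hr, runner_share=0.92, score_coef=0.40):
    """Baserunners * ScoreRate + HR, with the post's rough constants."""
    baserunners = runner_share * obp * pa    # 92% of times on base are baserunners
    score_rate = score_coef * (slg + obp)    # about 40% of (SLG + OBP)
    return baserunners * score_rate + hr


# Hypothetical, roughly average team line (an assumption for illustration only).
team = dict(pa=6200, obp=0.330, slg=0.430, hr=160)

rc = runs_created(team["pa"], team["obp"], team["slg"])
dec = runs_decomposed(**team)
print(f"RC form: {rc:.0f} runs, decomposed form: {dec:.0f} runs")
```

Both land in the same neighborhood for a normal mix of OBP and SLG, which is all the illustration claims.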

#1    dq      (see all posts) 2006/09/29 (Fri) @ 15:11

When Bill James was asked about OBP + Slug, he stated you should probably multiply them instead. I recently reread that, and looked to see how accurate it was, and whether anyone was using it.
I found it to be about as accurate as the other methods out there. I also looked at game logs from Retrosheet, and saw that on a game by game basis it did as well as the other estimators.

I was surprised to see very very little use of this formula.

OTS, or PA * OBP * SlugPct * .87 (this constant has declined over the years) is a wonderful stat.

It is (1) just about as accurate as most any other run estimating formula,

(2) converts to a rate stat (OBP * Spct * .87) very easily, making it easy to do analysis, and is better to use than OPS, which so many people use,

and (3) can be shown to an average fan and he can understand it.

To me, it seems like people have really tried to fine tune things, to get maybe .5% better on estimating runs - that is like 4-5 runs a year per team - remember this is an estimation, not an equation.

I’ve been thinking about writing about the accuracy and simplicity of this.


#2    Rally Monkey      (see all posts) 2006/09/29 (Fri) @ 18:53

I called that OXS instead of OTS.

Used it so long ago that I had forgotten about it.  It’s been over a decade since I’ve had web access and spreadsheets, but in the old days when I just had my trusty calculator and the USA Today Tuesday sports section, the easiest way to get an approximation of RC was to multiply OBP by total bases.


#3    tangotiger      (see all posts) 2006/09/29 (Fri) @ 19:26

OBP x TB is of course OBP x SLG x AB.  This undervalues the walk.  OBP x SLG x PA x constant puts the walk pretty much right where it should be.  If you run the “plus 1” method on this equation, or better yet, use Patriot’s Calculus/limit method, you will see this clearly.
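
A minimal sketch of the “plus 1” check, assuming a hypothetical team line (5500 AB, 1450 H, 550 BB, 2300 TB) and a simplified OBP that ignores HBP and sacrifices. It adds one walk to the line and reports how many runs each formula credits for it.

```python
def rates(ab, h, bb, tb):
    """Simplified rates: PA = AB + BB, OBP = (H + BB) / PA, SLG = TB / AB."""
    pa = ab + bb
    return pa, (h + bb) / pa, tb / ab


def rc_ab_based(ab, h, bb, tb):
    """OBP x TB, i.e. OBP x SLG x AB."""
    _, obp, slg = rates(ab, h, bb, tb)
    return obp * slg * ab


def rc_pa_based(ab, h, bb, tb, k=0.87):
    """OBP x SLG x PA x constant."""
    pa, obp, slg = rates(ab, h, bb, tb)
    return k * pa * obp * slg


def plus_one_walk(estimator, ab, h, bb, tb):
    """Marginal runs credited for one extra walk (the "plus 1" method)."""
    return estimator(ab, h, bb + 1, tb) - estimator(ab, h, bb, tb)


# Hypothetical team line, for illustration only.
TEAM = dict(ab=5500, h=1450, bb=550, tb=2300)

walk_ab = plus_one_walk(rc_ab_based, **TEAM)
walk_pa = plus_one_walk(rc_pa_based, **TEAM)
print(f"walk value, OBP x TB form: {walk_ab:.3f}")
print(f"walk value, PA-based form: {walk_pa:.3f}")
```

With this line the AB-based form credits a walk with noticeably less than the PA-based form does, which is the undervaluing described above.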


#4    studes      (see all posts) 2006/09/30 (Sat) @ 07:06

Rally Monkey, that’s great.  I used to do the same thing.  In fact, I used to make sure my business trips were on Tuesdays and Wednesdays (those were the days, right?), so if I had to be on the road I could at least get the USA Today sports section.  I’d put my RC calculations in the right margin.


#5    Guy      (see all posts) 2006/09/30 (Sat) @ 08:07

Tuesdays for NL, Wed for AL, I think.  Was the lifeblood for fantasy baseball leagues.....


#6    David Smyth      (see all posts) 2006/10/01 (Sun) @ 04:55

Taking Tango’s RC formula of .87*PA*OBA*SLG…

If you multiply it by 1.5*(1-OBA), you get a factor of 1.00 for a normal OBA of .333.

Applying this to the 2001 season of B Bonds (.515/.863/664PA), I get 257 RC according to Tango’s formula. With my adjustment, I get 187 RC. The “actual” RC (B James new RC) was 191.

It’s just a simple way of getting rid of the self-interaction problem.
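
For anyone who wants to check the arithmetic, here is a small sketch of the adjustment applied to the Bonds line quoted above. Only the function names are invented.

```python
def runs_created(pa, oba, slg, k=0.87):
    """Tango's form: .87 * PA * OBA * SLG."""
    return k * pa * oba * slg


def smyth_adjusted(pa, oba, slg, k=0.87):
    """Same form multiplied by 1.5 * (1 - OBA) to damp the self-interaction."""
    return runs_created(pa, oba, slg, k) * 1.5 * (1.0 - oba)


# Bonds 2001: .515 OBA, .863 SLG, 664 PA (figures from the comment above).
bonds_rc = runs_created(664, 0.515, 0.863)
bonds_adj = smyth_adjusted(664, 0.515, 0.863)
print(round(bonds_rc), round(bonds_adj))  # 257 187

# The factor is essentially 1.00 at a normal OBA of .333.
print(round(1.5 * (1 - 0.333), 2))  # 1.0
```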


#7    tangotiger      (see all posts) 2006/10/01 (Sun) @ 16:42

So, one of the components becomes OBA * (1-OBA) * SLG * PA * 1.3.  The OBA * (1-OBA) is .222 when the OBA is .333.  When OBA is .5, it’s .250.  When OBA is .25, it’s .188.  When OBA is .75, it’s also .188.

It’s a fascinating addition to the equation, as a way to limit the self-interaction.  I have no idea if what David did makes sense, but, I think I’ll run some tests and see.


#8    David Smyth      (see all posts) 2006/10/01 (Sun) @ 17:25

Well, as to the question if it “makes sense”, remember that we are dealing with a simple shortcut method. IOW, I think my addition makes as much sense as the formula that it is being added to, but not as much sense as the better RC methods.

But, interaction is some mix of OBA and SLG. So, ideally, the adjustment would reflect that. But, I think such a construction would be more complex than this shortcut method warrants. Basing it only on OBA works well for most players, and turns out to be a convenient calculation.

So, the adjustment I suggested works well when the batter has both a high OBA and a high SLG (as is usually the case). In the odd case of a Boggs type hitter (avg SLG, hi OBA), he will tend to be underrated by using this adjustment.

But, I think it’s certainly better than crediting Bonds with 257 RC in 2001.


#9    tangotiger      (see all posts) 2006/10/02 (Mon) @ 04:43

David,

No question that at some point all these estimators don’t make sense.  I guess I’m asking not if what you did makes sense, but at what point doesn’t it make sense.  It’s such a nice and simple little adjustment, that I’m hoping it makes sense in the normal range, and has “extended sense” in the wider range.  And I just want to see how far it does make sense.

Tom


#10    Chris Miller      (see all posts) 2006/10/02 (Mon) @ 12:59

The value of SLG goes down as OBP goes up, so David’s adjustment makes sense, but if you had two players w/ identical SLG and PA, and one had a .450 OBP and the other a .550 OBP, they would have created the same number of runs based on David’s adjustment, and if the second player’s OBP was .600, that player would have created fewer runs.  It definitely doesn’t seem like that should be the case (at least to me).
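
The symmetry is easy to see in a quick sketch. The 650 PA and .600 SLG below are arbitrary placeholders. The point is only that the OBA * (1 - OBA) term peaks at .500 and falls off equally on either side.

```python
def adjusted_rc(pa, oba, slg):
    """.87 * PA * OBA * SLG * 1.5 * (1 - OBA), i.e. David's adjusted form."""
    return 0.87 * pa * oba * slg * 1.5 * (1.0 - oba)


PA, SLG = 650, 0.600  # placeholder values, identical for every player below

for oba in (0.450, 0.500, 0.550, 0.600):
    print(f"OBA {oba:.3f}: {adjusted_rc(PA, oba, SLG):.1f} adjusted RC")
```

The .450 and .550 players come out identical, and the .600 player comes out below both.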

That may actually be correct though, since a very high walk player (Barry Bonds) would not have many opportunities to advance runners compared to a low walk player and, as David pointed out, it works about perfectly for Barry Bonds 2001 season, but looking at it, it understates his 2002-2004. 

One way I can see to work around this (while using OBP and SLG as the base) is to create two components from OBP and SLG as R and RBI-HR, and calculate R based on OBP, PA, and HR, and base the RBI-HR calculation on SLG and AB to correct the problems w/ Walks in RC.  At that point you would have to add HR and AB to get the calculation, and it’s no longer “simple” so I believe you would want to use one of the more advanced RC formulas at that point.


#11    dq      (see all posts) 2006/10/02 (Mon) @ 15:19

The basic formula of OBP * SLG * PA * constant (.9044) comes out with a variance of about 3% from 1957-2005 (Retrosheet era). The best of the methods I’ve tested is about 2.6%. That gap is about .4% of runs, or 3 runs out of 800.

I’m assuming people have spent a lot of time, and I don’t know if they are getting any better than about a 2.5% variance or so.

I don’t know why you want to go to the “advanced” formulas - You gain 3 runs, and lose most everyone in the formulas. You’re doing a heck of a lot of work for those 3 runs.

We are “approximating runs”.

I also looked at game by game for those years, and found that the OTS method gets the score of an individual game right better than base runs does, although Extrapolated Runs did do better.


#12    tangotiger      (see all posts) 2006/10/02 (Mon) @ 15:45

If you are going to use a LWTS-type equation, why in the world would you use Extrapolated Runs?  This has been explained here:
http://www.insidethebook.com/ee/index.php/site/comments/linear_weights_by_run_environment/

dq, rather than just posting those statements, can you post the results, as well as what versions of everything you used?


#13    Chris Miller      (see all posts) 2006/10/02 (Mon) @ 15:49

dq, were you referring to using basic RC at the team level?  That’s what I thought Tango was talking about in the entry, but (thought, at least) that David had made a simple adjustment to make it more useful for individual batters seasons, which is what I was responding to.  I should have quoted him.


#14    Chris Miller      (see all posts) 2006/10/02 (Mon) @ 15:52

And I agree, basic RC does work to approximate team run scoring nearly as well as any of the advanced formulas.


#15    Guy      (see all posts) 2006/10/03 (Tue) @ 02:23

Could you remove the interaction effect for individual players by working off the league averages?  Maybe use something like .87 * PA *(OBP*LgSLG + (SLG-LgSLG)*LgOBP).  Not as elegant as David’s formula, of course.


#16    Chris Miller      (see all posts) 2006/10/03 (Tue) @ 11:09

Guy, I was thinking something similar, basically you’re estimating R+RBI relative to the league averages.  I think that’s why OPS or X*OBP+SLG type stats work so well for batters.


#17    Chris Miller      (see all posts) 2006/10/03 (Tue) @ 11:30

Guy, what is the logic of (SLG-lgSLG)*lgOBP?  Why would you remove lgSLG from SLG?  It definitely scales to RC, I’m just not understanding why. Thanks.


#18    Guy      (see all posts) 2006/10/03 (Tue) @ 11:34

The idea is to let a hitter’s OBP interact with LgSLG (not his own), and his SLG interact with LgOBP.  Actually, for this kind of down-n-dirty metric, you could just use constants, at least for recent decades:  .87 * PA *(OBP*.43 + (SLG-.43)*.33).  Not sure if it works though.


#19    Guy      (see all posts) 2006/10/03 (Tue) @ 11:40

Alternatively, take Tango’s formula and subtract the interaction:  (SLG-LgSLG)*(OBP-LgOBP)*PA. Should be the same thing if I’ve done this right.
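
A quick numeric check of that equivalence, as a sketch. It assumes the .87 constant is applied to the interaction term as well (the formula as typed above leaves it off), and uses the .33/.43 league constants from the earlier comment. The player lines are arbitrary.

```python
def tango_minus_interaction(pa, obp, slg, lg_obp=0.33, lg_slg=0.43, k=0.87):
    """k * PA * OBP * SLG minus k * PA * (OBP - LgOBP) * (SLG - LgSLG)."""
    return k * pa * obp * slg - k * pa * (obp - lg_obp) * (slg - lg_slg)


def league_interaction_form(pa, obp, slg, lg_obp=0.33, lg_slg=0.43, k=0.87):
    """k * PA * (OBP * LgSLG + (SLG - LgSLG) * LgOBP)."""
    return k * pa * (obp * lg_slg + (slg - lg_slg) * lg_obp)


# Arbitrary (OBP, SLG) lines at 600 PA.
for obp, slg in [(0.300, 0.380), (0.350, 0.450), (0.400, 0.600), (0.515, 0.863)]:
    a = tango_minus_interaction(600, obp, slg)
    b = league_interaction_form(600, obp, slg)
    print(f"{obp:.3f}/{slg:.3f}: {a:.2f} vs {b:.2f}")
```

The two columns match for every line, so the two descriptions are the same formula.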


#20    Tangotiger      (see all posts) 2006/10/03 (Tue) @ 11:52

Guy, your second to last post is essentially:
PA * modifiedOPS * constant

Your modifiedOPS is 1.27 OBP + SLG, more or less.  At the very least, it should be 1.8 OBP + SLG.  And, in any case, goes completely against the RC principle of multiplying, not adding.


#21    tangotiger      (see all posts) 2006/10/03 (Tue) @ 12:10

The central part of the model of run scoring is that you need the number of baserunners times the % of times they will score.  To that end, you need to start with OBP * PA multiplied by something.

If the “something” that I initially provided, .368 * (OBP+SLG) doesn’t work well-enough, it’s simply a matter of running some regressions.  Maybe it should be .184 * (OBP + SLG + .750), or maybe just solve for:
a * (b OBP + c SLG + d OBP * SLG + f)

Of course, you’ll have one equation for teams, and another for players.  The regression against teams is easy enough, team runs.  For players, it’s not so simple, as you’d need to use a Theoretical Team approach with BaseRuns.


#22    Guy      (see all posts) 2006/10/03 (Tue) @ 12:10

Not at all.  I’m still using OBP * SLG.  But I’m essentially having the hitter’s OBP interact with league average SLG, and vice-versa.  Think of it as a 2x2 grid:  I’m subtracting out the cell where the above-avg part of a hitter’s OBP is multiplied by the above-avg part of his SLG.

But as I say, don’t know that it works.


#23    dq      (see all posts) 2006/10/03 (Tue) @ 13:48

Okay,

Here’s what I did. I took all the team totals from the Jarvis site (1957-2005), and compared various run estimation tools against each team’s totals. I used the following:

Estimated Runs
Runs Created
Base Runs
Ext Runs
LW Bat Runs
OTS

I then took the absolute value of the difference between each method’s estimate and actual runs for every team, summed those differences, and divided by the total number of runs.

I got :

Estimated Runs   2.6%
Runs Created     2.8%
Base Runs        2.6%
Ext Runs         2.7%
LW Bat Runs      2.8%
OTS              3.0%
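
The error measure described above can be sketched as follows. The three team totals in the example are invented placeholders, not values from the Jarvis data.

```python
def error_rate(estimated, actual):
    """Sum of |estimate - actual| over all teams, divided by total actual runs."""
    if len(estimated) != len(actual):
        raise ValueError("estimated and actual must have the same length")
    total_abs_diff = sum(abs(e - a) for e, a in zip(estimated, actual))
    return total_abs_diff / sum(actual)


# Placeholder team-season totals, for illustration only.
actual_runs = [700, 650, 800]
estimated_runs = [720, 640, 790]

print(f"{error_rate(estimated_runs, actual_runs):.1%}")
```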

Now the average team had 693.9 runs, and 0.4% is 2.8 runs a year. I can get OTS to be a little lower than 3.0%, but it makes the simple formula more difficult, and probably improves the overall accuracy by 1-2 runs a year.

Now, I was worried since I am estimating runs, maybe the other methods are more precise. So I decided to look at them game by game and see how good OTS is versus other methods. Retrosheet has all the game logs for 1957-2005, the same data I am looking at above.

So I estimated runs on a game by game basis and found out how many times the method calculated the right number of runs. If one method was a lot better, it would calculate the number of runs scored in a game better than a rough estimator. My results are as follows:

Estimating Runs on a Game Basis

Summary of     Games      OTS   BaseRuns   Extrap Runs
1957-1983     97,578   25,150     24,262        25,637
1984-2005     94,380   24,705     24,327        25,123
Total        191,958   49,855     48,589        50,760
                        26.0%      25.3%         26.4%

So, OTS did about as well as the other 2 more “precise” methods.
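
The game-by-game count can be sketched like this, assuming each estimate is rounded to the nearest whole run before being compared with the actual score. The sample estimates and scores are placeholders, not Retrosheet data.

```python
import math


def round_to_nearest_run(estimate):
    """Round to the nearest whole run (.49 becomes 0, .50 becomes 1)."""
    return math.floor(estimate + 0.5)


def exact_hit_rate(estimates, actual_runs):
    """Share of games where the rounded estimate equals the actual runs scored."""
    hits = sum(
        1
        for est, act in zip(estimates, actual_runs)
        if round_to_nearest_run(est) == act
    )
    return hits / len(actual_runs)


# Placeholder game estimates and scores, for illustration only.
game_estimates = [0.49, 3.6, 5.2, 2.5]
game_scores = [0, 4, 6, 3]
print(exact_hit_rate(game_estimates, game_scores))  # 0.75
```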

One of my concerns was high scoring environments. I was afraid OTS overestimated these, and that BaseRuns would prove superior. So I took all games from 1984-2005 (I didn’t have the other games handy when I did this) where a team scored 15 or more runs. The results surprised me:

Games 15+ Runs 1984 - 2005

          G    Act R    OTS R   Base Runs   Exp Runs
Total   753   12,404   12,564      10,809      9,595
Ave            16.47    16.69       14.35      12.74

So it is doing better in higher scoring games. Obviously, it must be doing worse somewhere else. I know from Tango’s article it does worse when there are a lot of HRs. BaseRuns adds the batter’s HRs separately, giving it an “advantage” in this area.

I also took ShutOuts during that time period

Estimating Runs on a Game Basis

Summary of   Games     OTS   BaseRuns   Extrap Runs
Shut Outs    5,261   1,107      1,056         1,657
                     21.0%      20.1%         31.5%

Extrapolated Runs was much better in predicting shutouts than BaseRuns or OTS.

But since BaseRuns does better in HRs, and no better overall, then it must not do as well in other categories.

Or try this : A*B/(B+C) +D = runs

runs = runners scored + home runs scored

home runs scored = D

subtracting from both sides I get

A*B/(B+C) = runs scored - home runs

If I take this against the 1957-2005 teams, as above, I get an error rate on BaseRuns of 3.3%. So, other than home runs by batters, BaseRuns doesn’t do as well as other methods.

The bottom line is that all of these methods are estimating runs. And they all come pretty close, and they all have their pluses and minuses.

OTS overall is within 3 runs of the best measures.
OTS is much easier to show an interested person who didn’t score a 36 on his ACT in math.
OTS does just as well as other methods on a game by game basis; it’s no rougher than other measures.
OTS actually does better than other methods in high scoring games.


#24    Tangotiger      (see all posts) 2006/10/03 (Tue) @ 13:52

Sorry, Guy, but I guess I’m confused by this:

.87 * PA *(OBP*.43 + (SLG-.43)*.33). 

I read it to be:
.87 * PA * (a OBP + b SLG)

where a = lgSLG and b = lgOBP.

Did you mean to say
.87 * PA * OBP * (OBP*.43 + (SLG-.43)*.33).


#25    Tangotiger      (see all posts) 2006/10/03 (Tue) @ 13:59

dq, thanks for the presentation.

The only problem I have is when you take the known outcome (15+ runs, or shutouts), and then see how each estimator does against it.  You cannot select on the runs scored figure.  You must select on the input variables, not the output variable.  If you want to look at only games where the OBP was under .100 or over .500, that’s ok.  But, you can’t select on the thing you are trying to measure.

Of course BaseRuns will never get an estimate of zero runs, if you don’t have a perfect game.  By definition, as soon as you get one batter on base, he has a chance to score.  (Of course, you are probably rounding that down to zero.) A linear equation will get a lot of zero estimates, because it starts with a negative 2.7 run estimate for the perfect game.

Still, if you use the Ruane numbers, that should do better than XR, on the game-by-game dataset.  In fact, you can report the regression coefficients against your game-by-game dataset.  That should be highly interesting, and I would bet it would match the Ruane numbers.  XR would not have been created, if the PBP data had already existed.


#26    dq      (see all posts) 2006/10/03 (Tue) @ 14:48

I rounded to within .50; if you estimated .49 runs it is a shutout.

I did the experiment with 15 runs to represent an actual high run environment, and determine if OTS was good at the 15 run level. The best way to look at a 15 run environment is to look at actual 15 run games.

I’m a little confused about looking at the results to determine what is accurate. Isn’t one of the best ways to determine if a prediction/estimation is correct to look at the results? If I want to see who predicted the AL races the best this year, don’t I look to see who picked OAK, NYY, and DET?

If I want to see how good the weatherman is at predicting 90 degree weather days, don’t I look at the 90 degree days and see what he predicted that day?

Looking at the 15 run games, I can tell BaseRuns was not as good as OTS in higher run scoring environments. I can’t tell you why; I didn’t test for that. This concerns me for using BaseRuns on high scoring environments.


#27    tangotiger      (see all posts) 2006/10/03 (Tue) @ 14:56

The reasons that teams scored 15 runs was a combination of their performance and the timing of their performance. 

Without question, if you look at all the games with shutouts, you will find that the OBP and SLG was far different with men on base than with bases empty.  And without question, if you look at all 15+ run games (in 9 innings), you will find a very high OBP and SLG with men on base, than bases empty.

What you should do is select those games with an OBP > .500, or SLG > 1.000, or whatever.  What will you find?  Again, without question, the SLG with bases empty and men on base will be ALMOST EQUAL.  And, this is what we want.  The question is: “how many runs are expected, if performance is randomly distributed throughout the game?”.


#28    tangotiger      (see all posts) 2006/10/03 (Tue) @ 14:58

The process I’ve outlined is the process I followed here:
http://www.tangotiger.net/rc3.html

And, it’s pretty clear how well BaseRuns held up in the 1974-1990 data.  I don’t see any reason to get different results using 1957-2005 data.


#29    dq      (see all posts) 2006/10/03 (Tue) @ 15:29

I read your article before I posted mine. BaseRuns holds up fine; it estimates runs just about as well as any other metric.

I didn’t get different results; I simply looked at the data differently than you did.
If a number of things are roughly equal, and one is better in some things (like games with home runs), it must be worse in others. Since BaseRuns splits out HR by batter, it should be the best at splits by HR. And that is how you viewed your data - by HRs.

If I group them by home runs, I’m sure BaseRuns will look similar to your data. It is a better measure than OTS in high slugging % games.

Parts of baseruns are better than any other method. As you explain, it makes a lot of sense, you get on base, you advance the base runners.

But if it measures HRs better than other methods, and is roughly the same, it must be worse in other areas.

If you take away the batter scoring on HRs, it will do worse than linear weight measures.

You are right by saying teams scored the number of runs partly based on the timing of their performance. 

I think most of the evidence shows there is no good way to predict that timing, and that past results in this area do by no means guarantee future success. Because of this, we probably won’t get much closer than the 2.6%.

You did an article stating the limit (.71 IIRC) of how well you can predict a batter’s performance.

I’m thinking the run estimation limit is going to be close to the 2.6%.

Which is what I keep coming back to - I can do a simple OBP * SLG * PA * .9044 and get within 3 runs of the best formulas.


#30    tangotiger      (see all posts) 2006/10/03 (Tue) @ 19:47

Actually, in that article, I also did it by OBP and by OPS, not just HR.

***

I’m not trying to say here that BsR is the best, or whatever. The main point is purely a technical one, and that is, under no circumstances, can you choose your sample by actual runs scored.  That’s the sole issue.


#31    dq      (see all posts) 2006/10/03 (Tue) @ 21:07

I don’t think I understand.

If I want to see how well a metric does in a 15 run environment; don’t I look at the games where 15 runs are scored? How else can I test to see if something works in a 15 run environment?

I see people change the weight of the variables based on the run environment. To me that is worse, changing variable based on the answer.


#32    tangotiger      (see all posts) 2006/10/03 (Tue) @ 21:11

dq, I hope that someone else can explain it better than I, because I don’t know how else to do it.

All I can suggest at this point is select games based on OBP, SLG, OPS, or anything else you want, EXCEPT runs scored.  Otherwise, you are introducing a selection bias.


#33    Patriot      (see all posts) 2006/10/03 (Tue) @ 21:30

dq wrote:
“If I want to see how good the weatherman is at predicting 90 degree weather days, don’t I look at the 90 degree days and see what he predicted that day?”

What if I have one weatherman who predicts that it will be 90 degrees every day, be it on Christmas or on the Fourth of July.  And I have another one with a more realistic model.  And then I check who has the better accuracy on 90 degree days.  I might well find that the guy who picks exactly 90 every day is more accurate.  The guy with the good model might be closer on some days--sometimes he projects 94, and it’s 95, which is a smaller error than if he had picked 90 like the other guy.  But sometimes he picks 85, and that’s way off, and the 90 guy is never off by that much.

But the 90 guy has a biased model.  OTS is a biased model.  When OTS sees a high OBA coupled with a high SLG, it goes bonkers.

15 is a big number of runs.  What percentage of games have 15 runs scored in them?  It has to be pretty small.  In order to score 15, I’m guessing there’s a high OBA and a high SLG.  But also, I’m guessing that the team hit exceptionally well with runners in scoring position, or bunched together a bunch of hits, or other things that no run estimator, including OTS, accounts for.  I’ll lump these together as “luck”.

The 90 degree guy is dead wrong on Christmas.  The 90 degree guy is dead wrong a lot of days even in the summer, because some days in the summer it’s only 75 but he still thinks it’s going to be 90.

Summer = conditions that enable a 90 degree day, or in the OTS example, high OBA and high SLG.

But since OTS always overestimates when there is a high OBA and SLG, it may appear more accurate for games in which you have high OBA + high SLG + “luck”.  But if you looked at games with high OBA + high SLG without regard to luck, you might find a lot of cases where OTS is irrationally exuberant.  By picking your sample based on runs, you are playing into the hands of the irrationally exuberant estimator, since those 15 run games probably don’t happen too often without “luck”.

What can I say, I tried.


#34    dq      (see all posts) 2006/10/03 (Tue) @ 21:43

Okay, but my weatherman doesn’t predict 90 degrees every day. He’s only off by 3 degrees a day, while the other guy is off 2.6 degrees each day.

And I think my weatherman does a bad job at predicting 90 degree days. I don’t know if he is reliable when there are 90 degree days. I chose the 15 runs because I didn’t know how well OTS worked in a 15 run environment.

So if the conditions are really 90 degrees, I want to know if he could predict it.

And I find out he’s not doing that bad of job of it, compared to the other guy.

And I’m not that worried about .4 degrees difference.

And I also know I can explain my weatherman’s method in 2 minutes to someone.

Is there a better way to test something in a 15 run environment than to look at actual results that come from a 15 run environment?


#35    tangotiger      (see all posts) 2006/10/03 (Tue) @ 21:46

You don’t want a 15-run environment, you want a 15-run-TYPE environment.  Just choose a high OBP and high SLG.  That’s it. 

OBSERVED = TRUE + LUCK

In this case, runs is observed, and OBP and SLG is true, and timing (clutch) is luck.  You simply cannot select on observed. 
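
A toy simulation can make OBSERVED = TRUE + LUCK concrete. Everything in the sketch below is invented for illustration: the distributions, the two estimators, and the cutoffs are assumptions, not baseball data. The “fair” estimator returns the true expected runs. The “exuberant” one overshoots when the inputs are high, the way a self-interacting formula might.

```python
import random


def simulate(n_games=100_000, seed=0):
    """Return (true_runs, observed_runs, fair_est, exuberant_est) per game."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        true_runs = max(0.0, rng.gauss(4.5, 2.5))   # what the inputs "deserve"
        luck = rng.gauss(0.0, 3.0)                  # timing of the hits
        observed = max(0, round(true_runs + luck))  # runs actually scored
        fair = true_runs
        excess = max(0.0, true_runs - 4.5)
        exuberant = true_runs + 0.15 * excess ** 2  # overshoots at the high end
        games.append((true_runs, observed, fair, exuberant))
    return games


def mean_abs_error(games, est_index):
    """Average |estimate - observed runs| over the selected games."""
    return sum(abs(g[est_index] - g[1]) for g in games) / len(games)


games = simulate()
by_observed = [g for g in games if g[1] >= 15]   # selected on the outcome
by_input = [g for g in games if g[0] >= 9.0]     # selected on the inputs

for label, subset in (("observed >= 15", by_observed), ("true >= 9", by_input)):
    print(label, len(subset),
          f"fair {mean_abs_error(subset, 2):.2f}",
          f"exuberant {mean_abs_error(subset, 3):.2f}")
```

In this toy setup the overshooting estimator looks better when games are picked by runs scored and worse when they are picked by the inputs, which is the pattern being warned about here.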

Please google selective sampling, selection bias, and selection criteria.


#36    dq      (see all posts) 2006/10/04 (Wed) @ 07:31

I understand the concepts; I didn’t explain well enough what I was testing.

My question was, “How well does OTS work in 15 + run games?” If I look at all 15 + run games, I get no sampling bias, as I am looking at 100% of the population. I did not ask “How well does OTS work in high scoring games?” If that was my question then obviously my sample of actual 15 run games would be incorrect.

Now, I obviously have to be very careful with the conclusions I draw from my testing.


#37    Tangotiger      (see all posts) 2006/10/04 (Wed) @ 07:46

dq: you explained what you were doing perfectly the first time.

If you treat your population as games that ended up scoring at least 15 runs, then you will not be able to draw any conclusions with what you are doing.

Focus more on the shutout games, as maybe that’s easier.  You select all your shutout games, and you make that your population.  Almost all games will have at least 1 runner on base.  Linear equations will give you, from your population of shutout games, an estimate of -3 runs to +3 runs or something.  BaseRuns will give you an estimate of 0 runs to 4 runs, or something.

What does that tell you?  Nothing!

What you care about is your variables, your OBP and SLG.  The question you should be asking is: “given these variables, OBP and SLG, how well can I predict the number of runs that score?”

Instead, your question is: “given that no runs scored, how well did using OBP and SLG of those games predict 0 runs?”.  The question you are asking is not a very interesting question.  Certainly, its use is extremely limited.

It’s like taking the league leader in RBIs this year, and asking: “ok, which forecaster predicted this player will lead with this many RBIs?” Answer: NO ONE!

***

Another thing you can try: select all your shutout games, and report back to us the OPS of those games, and the standard deviation.  Say it’s an OPS of .400, with 1 SD = .100.

Then, what you should do, is select *all* games from 1957-2005 with an OPS of .300 to .500.  Tell us the average runs scored.  It’ll be something like 1.5 runs.

Now, use BaseRuns against your shutout population, and against this new dataset.  What’s going to happen?  BaseRuns will predict 1.5 runs for BOTH datasets.


#38    dq      (see all posts) 2006/10/04 (Wed) @ 08:51

I can’t believe I ever explained anything perfectly.

If my population is 15+ run games, then I can draw conclusions about 15+ run games.

The negative in LWeights tells me that Linear Weights doesn’t work at the end of the scale.

I already know the answer to how well runs can be predicted; we get 2.6 to 3% variance on the different methodologies. I’m pretty sure I won’t be able to do better than that.


#39    Tangotiger      (see all posts) 2006/10/04 (Wed) @ 09:28

Your 15+ run games are littered with teams that performed far better with men on base than bases empty.  Choosing your population based on Runs Scored is almost exactly like saying:
“Give me the 1000 highest OPS games, and exclude any of these high OPS games where the OPS with bases empty was less than with men on base”.

Same with shutouts.  “Give me the 10,000 lowest OPS games, and only include those where the OPS with men on base was less than .100”.

What kind of conclusions do we expect?


#40    Guy      (see all posts) 2006/10/04 (Wed) @ 22:09

Tango:
What I was trying to do above is remove the interaction of a player’s own OBP and SLG.  The idea would be to have his SLG interact with the lg avg OBP, and his OBP interact with lg avg SLG.  It sort of replicates the add-player-to-avg-lineup method in a crude way.  So, I’m replacing OBP*SLG in your equation with:
OBP*LgSLG
+ SLG*LgOBP
- LgOBP*LgSLG (to avoid double-counting)

Using .330 for LgOBP and .430 for LgSLG, that gives you:
.87 * PA * (OBP*.43 + SLG*.33 - .142). 
I think it’s on about the right scale:  for a 400/500 OBP/SLG player, it gives you a RC 16% lower than your formula.  Might be improved by adding a multiplier.....


#41    tangotiger      (see all posts) 2006/10/05 (Thu) @ 05:44

Guy, that’s my point.  It “looks” like you are interacting with the league, but you are not.  That term in the bracket is essentially OPS.


#42    Guy      (see all posts) 2006/10/05 (Thu) @ 07:16

I don’t think so.  Let’s look at players A) .350/.450 and B) .400/.600, with 600 PA. 

OPS says its .800 vs. 1.000, player B is 25% higher.

Tango says it’s 82 RC vs. 125 RC, player B is 52% higher. 

My formula says it’s 82RC vs. 119 RC, and player B is 45% higher. 

I don’t know if my formula tracks “real” RC or BsR well, but it’s definitely not producing an additive relationship the way OPS does. And as OBP or SLG grows, the interaction “penalty” increases, as it should.
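
The three figures for players A and B can be reproduced with a short sketch. The formulas and constants are the ones quoted in this thread. Only the function names are made up.

```python
def tango_rc(pa, obp, slg):
    """.87 * PA * OBP * SLG"""
    return 0.87 * pa * obp * slg


def guy_rc(pa, obp, slg):
    """.87 * PA * (OBP*.43 + SLG*.33 - .142)"""
    return 0.87 * pa * (obp * 0.43 + slg * 0.33 - 0.142)


players = {"A": (0.350, 0.450), "B": (0.400, 0.600)}
PA = 600

for name, (obp, slg) in players.items():
    print(name,
          f"OPS {obp + slg:.3f}",
          f"Tango {tango_rc(PA, obp, slg):.0f}",
          f"Guy {guy_rc(PA, obp, slg):.0f}")
# A: OPS 0.800, Tango 82, Guy 82
# B: OPS 1.000, Tango 125, Guy 119
```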


#43    Tangotiger      (see all posts) 2006/10/05 (Thu) @ 07:22

Your formula is this:
.87 * PA * (OBP*.43 + SLG*.33 - .142). 

Which means:

a * PA * b *(OBP * 1.3 + SLG) + c

OBP*1.3 + SLG is a modified version of OPS.  I prefer 1.8*OBP+SLG.  But, that’s what you are doing.


#44    Guy      (see all posts) 2006/10/05 (Thu) @ 08:21

One last try.  How would you use (1.8*OBP + SLG) in this way?  There’s no constant multiplier that will work.  For a .350/.450 600 PA player, you would need to multiply PA * (1.8*OBP + SLG) by a factor of .1265 to get 82 RC.  But take that formula for a .400/.600 player and you get 100 RC, and for a .500/.700 player just 121 RC.  Clearly wrong, as we’d expect: the OPS-RC relationship isn’t linear.

My formula doesn’t have that problem; it does seem to track RC in a reasonable way, at least in these few examples.  In any case, if you do test David S’s formula and it’s not too much trouble, see what mine gives you with the same data.


#45    Tangotiger      (see all posts) 2006/10/05 (Thu) @ 08:43

Guy, I would simply say that your version is this:

a * PA * (OBP * x + SLG) + d

All you are saying is that “x” should be 1.2 to 1.3 (which is lgSLG/lgOBP, which is of course *very stable*), and I’m saying that “x” is 1.8.  I am NOT saying that “d” should be zero, nor what “a” should be.

But, the heart of this equation is OPS-based.  It has nothing at all to do with “interacting with the league slugging”, or whatnot.


#46    Guy      (see all posts) 2006/10/05 (Thu) @ 09:37

Tango:  either I’m being dense or you’re being stubborn (or both!). Just try it, and see what you get. 

On the theory side, I think you’ve got the translation slightly wrong—it’s A * PA * (OBP * 1.3 + SLG - C).  In any case, I’m basically taking your formula of OBP*SLG and subtracting the product that results from a player’s “extra” performance above lg avg on both sides, or: 
OBP*SLG - ((OBP-LgOBP)*(SLG-LgSLG)). Can we agree this starting point does build on the core OBP*SLG concept?

That is the same as:
OBP*SLG - OBP*SLG + OBP*.43 + .33*SLG - .142 which reduces to my formula:
OBP*.43 + .33*SLG - .142.

OK, I promise to stop.
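As a quick numeric check of the expansion in this post, here is a small sketch. It assumes league averages of .33 OBP and .43 SLG, which is what the .43 and .33 coefficients and the .142 constant imply; the function names are hypothetical.

```python
# League averages implied by the coefficients in the post (an assumption).
LG_OBP, LG_SLG = 0.33, 0.43


def product_minus_excess(obp, slg):
    """OBP*SLG minus the product of the above-average parts."""
    return obp * slg - (obp - LG_OBP) * (slg - LG_SLG)


def expanded(obp, slg):
    """The same quantity multiplied out: .43*OBP + .33*SLG - .33*.43."""
    return LG_SLG * obp + LG_OBP * slg - LG_OBP * LG_SLG


def max_gap(lines):
    """Largest absolute difference between the two forms over some batting lines."""
    return max(abs(product_minus_excess(o, s) - expanded(o, s)) for o, s in lines)


if __name__ == "__main__":
    print(max_gap([(0.350, 0.450), (0.400, 0.600), (0.500, 0.700)]))
```

The two forms agree to floating-point precision, and the SLG term enters with a plus sign. The constant .33*.43 is .1419, which rounds to the .142 used in the thread.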


#47    Tangotiger      (see all posts) 2006/10/05 (Thu) @ 10:10

Guy, no offense, but I’m not being stubborn!  Actually, I am a little stubborn, and see my problem.

It doesn’t matter that you start with OBP*SLG, what matters is what you end up with.

a * PA * (OBP*1.3 + SLG - c) means therefore:

a * PA * (modifiedOPS) - ac * PA

So, that’s what your equation is.  It starts with a modified version of OPS (1.3 * OBP + SLG, as opposed to 1.8 * OBP + SLG), but then has a correction factor based on PA.

At the heart, your equation is OPS not OTS, even if you started off with OTS.  The wrinkle, which I missed until now, is your PA correction factor.

I don’t really want to debate the good or bad of it.  I intended this thread to say that the core part of RC and the core part of actual run scoring has OBP multiplied by something dynamic.


#48    Tangotiger      (see all posts) 2006/10/05 (Thu) @ 10:12

As an aside, the second post here shows various correlations to runs scored:
http://www.baseball-fever.com/showthread.php?t=48531

It further shows how little can be gained and learned from team stats, as OPS and 1.8*OBP+SLG are very similar in correlation to Runs.


#49    tangotiger      (see all posts) 2006/10/17 (Tue) @ 06:57

Just realized something.  Guy said this:
.87 * PA * (OBP*.43 + SLG*.33 - .142). 

If we multiply Guy’s equation by 3/3, we get:
.87 * PA * (1.3*OBP+SLG-.43) / 3

Multiply by 1.15/1.15, and we get:
PA * (1.3*OBP+SLG-.43)/4.45

Divide by 10 to get it into wins form:
PA * (1.3*OBP+SLG-.43)/(4.45*10)

which gives us:
.022 * PA * (1.3*OBP+SLG-.43)

If you remember, I introduced a way to convert a modified version of OPS into Wins, as:
.025 * PA * (1.7*OBP+SLG-1)

So, it’s the same form.


#50    Guy      (see all posts) 2006/10/17 (Tue) @ 08:52

I’m not spotting the math error, but you’re not translating the formula correctly. Let’s compare my formula to your translations of:
A) PA * (1.3*OBP+SLG-.43)/4.45 and
B) .022 * PA * (1.3*OBP+SLG-.43) * 10.

For Lance Berkman 2006, I get:
Guy 137
TangoA 114
TangoB 105

* * *

I actually think David’s original concern, that the multiplicative model overstates production for great hitters, isn’t such a big deal.  RC doesn’t account for the fact that these great hitters make fewer outs, and thus allow their teammates to produce more.  If you account for that, as something like BPro’s MLV does, you get values very close to Tango’s original formula even for someone like Pujols.  A handful of the most extreme performances, like Bonds 2000, will be overstated.  But for a shorthand method like this, I don’t think it’s worth making an adjustment to deal with .01% of player seasons.


#51    tangotiger      (see all posts) 2006/10/17 (Tue) @ 09:23

TangoA is 107.  You have TangoB correctly.  The difference between A and B is “.022*10=.220” and “1/4.45=.225”.  The rest of the equation is the same.

The ouch part is that 3*1.15 = 3.45, not 4.45, so the Guy-version is:
.029 * PA * (1.3*OBP+SLG-.43)

My OPSwins is:
.025 * PA * (1.7*OBP+SLG-1)
Which for Berkman is +5.4 wins above average.  Say about +55 runs above average, or around the 135-140 runs that Guy’s original equation is stating.
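The rescaling in this exchange can be sketched as follows. The batting line in the usage example is hypothetical (Berkman's actual 2006 line is not given in the thread), and the function names are made up. Note that the rescaled Guy form gives total runs divided by 10, while the OPSwins form is wins above average, so the two are not directly comparable without a baseline.

```python
def guy_runs(pa, obp, slg):
    """Guy's original: .87 * PA * (.43*OBP + .33*SLG - .142)."""
    return 0.87 * pa * (0.43 * obp + 0.33 * slg - 0.142)


def guy_wins_form(pa, obp, slg):
    """Rescaled: divide by 3*1.15 = 3.45, then by 10 runs per win, giving about .029."""
    return 0.029 * pa * (1.3 * obp + slg - 0.43)


def ops_wins(pa, obp, slg):
    """Tango's OPSwins (wins above average): .025 * PA * (1.7*OBP + SLG - 1)."""
    return 0.025 * pa * (1.7 * obp + slg - 1)


if __name__ == "__main__":
    pa, obp, slg = 600, 0.350, 0.450  # hypothetical line, not from the thread
    print(guy_runs(pa, obp, slg), 10 * guy_wins_form(pa, obp, slg), ops_wins(pa, obp, slg))
```

For the hypothetical line, the original and rescaled Guy forms land within about a run of each other, which is what the 3.45 divisor (rather than 4.45) is meant to achieve.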


#52    Guy      (see all posts) 2006/10/17 (Tue) @ 09:50

BTW, I recently ran a regression with players’ 2006 MLVr as the dependent variable and OBP and SLG as the indep variables.  It gives me an equation of 2.13*OBP + 1.44*SLG (plus constant), suggesting a modified OPS of 1.5*OBP+SLG.  While I’m generally skeptical about regression, using the “real” run value as the dependent variable seems like it may be a reasonable way to figure out the relative contribution of OBP and SLG in the ranges players actually perform in. (Most regressions I’ve seen have used team data, and seem to provide estimates for OBP closer to 2x SLG).

Breaking out BA, BB/PA, and ISO, I get:  3.75*BA + 1.38 ISO + 1.21 BB.  Not to deny the obvious superiority of OBP, but I think BA gets a bad rap in the sabermetric world—there’s a lot of information there.


#53    dq      (see all posts) 2006/10/17 (Tue) @ 10:19

If I take your model and apply it to a team I am way off - I get
A 583
B 571

For Berkman’s 06 Astros, who scored 735 runs.


#54    tangotiger      (see all posts) 2006/10/17 (Tue) @ 10:23

Is that the MLVr that uses Bill James’ Runs Created as its basis, which undervalues walks, and therefore would require OBP to be undervalued? 

http://www.stathead.com/articles/woolner/mlvdesc.htm

Thanks, not impressed.  You’d be better off figuring out the BaseRuns with/without the player and running the regression.

However, since most of your sample will be close to the league average, your certainty level won’t be so high for the very guys we care about.

That’s why PA*(1.8*OBP+SLG) is the best thing to use, for a linear expression: it actually matches the Linear Weights values.


#55    tangotiger      (see all posts) 2006/10/17 (Tue) @ 10:24

dq: You missed my post two posts ahead of yours.


#56    Guy      (see all posts) 2006/10/17 (Tue) @ 11:17

Tango:
Why do you feel MLV undervalues BBs?  It’s basically using your formula of OBP*SLG, and then calculating runs with and without a player.


#57    tangotiger      (see all posts) 2006/10/17 (Tue) @ 11:57

RC is AB*(OBP*SLG), mine is .87*PA*(OBP*SLG).  Using PA instead of AB compensates for the undervalue of the walk.


#58    Patriot      (see all posts) 2006/10/17 (Tue) @ 12:07

The run value of any event in RC is equal to:
a*(B/C) + b*(A/C) + c*(A/C)*(B/C)
where a, b, and c are the coefficients of a given event in the A, B, and C factors of RC, and A, B, and C are the league or team or whatever entity totals in A, B, and C.

For basic RC, A/C is just (H+W)/(AB+W) and B/C is TB/(AB+W).  For the 2006 AL, where A/C is .336 and B/C is .400, a walk is worth:
1*.400 + 0*.336 - 1*.336*.400 = .266 runs

Hence the claim that it undervalues walks.

MLV confuses the matter a little bit, because they use a player’s OBA to measure the number of extra PA he would generate, and give him credit for the extra team runs that result.  But this doesn’t change the fact that the formula they use to figure out how many runs the team will score is undervaluing walks.


#59    Patriot      (see all posts) 2006/10/17 (Tue) @ 12:10

Sorry, that should be:
a*(B/C) + b*(A/C) - c*(A/C)*(B/C)

The example is correct, the first formula was not.
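A minimal sketch of the corrected formula, checked against a direct "add one walk" difference on basic RC = A*B/C. The league size (C = 6000) is an arbitrary assumption used only for the finite-difference check, and the function names are hypothetical.

```python
def rc_event_value(a, b, c, a_over_c, b_over_c):
    """Run value of an event: a*(B/C) + b*(A/C) - c*(A/C)*(B/C).

    a, b, c are the event's coefficients in the A, B and C factors.
    """
    return a * b_over_c + b * a_over_c - c * a_over_c * b_over_c


def basic_rc(A, B, C):
    """Basic Runs Created: A * B / C."""
    return A * B / C


def walk_value_by_difference(C=6000.0, a_over_c=0.336, b_over_c=0.400):
    """Add one walk (A+1, C+1, B unchanged) to an assumed league and take the difference."""
    A, B = a_over_c * C, b_over_c * C
    return basic_rc(A + 1, B, C + 1) - basic_rc(A, B, C)


if __name__ == "__main__":
    # A walk adds 1 to A (times on base), 0 to B (total bases), 1 to C (AB + W).
    print(rc_event_value(1, 0, 1, 0.336, 0.400))
    print(walk_value_by_difference())
```

Both routes give about .266 runs for the walk in the 2006 AL example.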


#60    Guy      (see all posts) 2006/10/17 (Tue) @ 12:44

Got it.  Thanks.  Still, looks like RC undervalues walks by around 10%.  BBs are only 20-25% of OBP.  So we’re saying MLV reduces the value of OBP by maybe 2%?  Doesn’t seem like that can explain the difference between an OBP weight of 1.5 and 1.8.


#61    tangotiger      (see all posts) 2006/10/17 (Tue) @ 13:26

PA AB Outs H 2B 3B HR BB BA OBP SLG
620 560 410 150 30 3 17 60 0.268 0.339 0.423

Create these three equations:
OPS: .275*(1.7*OBP+SLG-1)
RC2: .9*PA*OBP*SLG
RC1: AB*OBP*SLG

Roughly speaking, the first one will give you an average player with runs = 0, and the two RC equations will give you 80 runs created.  This is your baseline.

Now, let’s add 1 HR to see what we get.  That means, add 1 to your PA, AB, H, and HR values, and then compare to your baseline for that equation.  What happens?

The OPS equation adds +1.40 runs, RC2 gets +1.59 runs and RC1 gets +1.61 runs.

Instead, let’s add 1 walk.  What happens?

The OPS equation adds +.31 runs, the RC2 gets +.38 runs and RC1 gets +.25 runs.

Repeat for all categories.  Here’s what you get:
OPS RC2 RC1
0.48 0.58 0.59 1B
0.79 0.91 0.93 2B
1.09 1.25 1.27 3B
1.40 1.59 1.61 HR
0.31 0.38 0.25 BB

OPS comes closest to reality.  The run value of the walk is .17 below the single.  RC2 has the run value of the walk at .20 below the single, while RC1 is at .34 (!) runs below the single.

The run value of the HR in OPS is .92 above the single, which is great.  RC1 and RC2 are not bad at 1.01-1.02 above.

Compared to the OPS, RC2 is about 15-20% too high in each event.  RC1 is also about 15-20% too high in each event, except the walk, which is 18% BELOW the correct figure.

Now, let’s get away from the league average, and use a great hitter.  To do that, just drop the “outs” to 300, and adjust AB and PA accordingly.  You’ll end up with this kind of hitter:
BA OBP SLG
0.333 0.412 0.527

Repeat the same “plus 1” method.  Here are the run values:

OPS RC2 RC1
0.48 0.67 0.69 1B
0.80 1.09 1.10 2B
1.11 1.52 1.51 3B
1.42 1.94 1.92 HR
0.34 0.47 0.27 BB

The OPS method keeps things fairly static, as it should, especially since you realize that it’s the effect of a player onto a team.  But, look what happens to RC1 and RC2.  They start to really overshoot, especially the nonsensical HR.  This is why when you use “RC”, you should do “with and without” team.  But, you have to at least start with the correct base.

Given the choice between RC2 and RC1, RC2 is much better, because it handles the walk properly, relative to the other events.


#62    tangotiger      (see all posts) 2006/10/17 (Tue) @ 13:27

That should read:
.275*(1.7*OBP+SLG-1)*PA
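The “plus 1” method for the league-average line can be sketched like this, using the OPS form with the PA multiplier. The class and function names are illustrative only; the stat line and the three equations are the ones given in the post.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Line:
    """League-average stat line from the post (defaults), as raw counts."""
    ab: int = 560
    h: int = 150
    d: int = 30   # doubles
    t: int = 3    # triples
    hr: int = 17
    bb: int = 60

    @property
    def pa(self):
        return self.ab + self.bb

    @property
    def tb(self):
        return self.h + self.d + 2 * self.t + 3 * self.hr

    @property
    def obp(self):
        return (self.h + self.bb) / self.pa

    @property
    def slg(self):
        return self.tb / self.ab


def ops_runs(x):
    return 0.275 * x.pa * (1.7 * x.obp + x.slg - 1)


def rc2(x):
    return 0.9 * x.pa * x.obp * x.slg


def rc1(x):
    return x.ab * x.obp * x.slg


def add_event(x, event):
    """Return a new line with one extra 1B, 2B, 3B, HR or BB."""
    if event == "BB":
        return replace(x, bb=x.bb + 1)
    extra = {"1B": {}, "2B": {"d": x.d + 1}, "3B": {"t": x.t + 1}, "HR": {"hr": x.hr + 1}}[event]
    return replace(x, ab=x.ab + 1, h=x.h + 1, **extra)


def plus_one(estimator, event, base=Line()):
    """Marginal run value: estimator with one more event, minus the baseline."""
    return estimator(add_event(base, event)) - estimator(base)


if __name__ == "__main__":
    for ev in ("1B", "2B", "3B", "HR", "BB"):
        print(ev, [round(plus_one(f, ev), 2) for f in (ops_runs, rc2, rc1)])
```

To try the great-hitter case, build the line with fewer outs (for example fewer AB with the same hits) and pass it as `base`.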


#63    tangotiger      (see all posts) 2006/10/17 (Tue) @ 13:46

I also repeated this for BaseRuns.  The “B” equation is:
.73*H + 1.2*2B+2.4*3B+HR+.05*BB
Which I multiplied by 1.15 to calibrate with the other equations, so that the total runs scored is also 80.0

Anyway, here’s the updated chart, using the “plus 1” method.
BsR OPS RC2 RC1
0.51 0.48 0.58 0.59 1B
0.80 0.79 0.91 0.93 2B
1.09 1.09 1.25 1.27 3B
1.42 1.40 1.59 1.61 HR
0.34 0.31 0.38 0.25 BB

Clearly BsR is much better than RC2 and RC1.
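A sketch of the BsR column, for anyone who wants to reproduce it. The post only gives the B factor; the rest assumes the usual BaseRuns shape, Runs = A*B/(B+C) + D with A = H + BB - HR, C = outs and D = HR. That assumption does reproduce the 80.0 baseline. Function names are made up.

```python
def base_runs(h, d, t, hr, bb, outs, b_scale=1.15):
    """BaseRuns with the B factor from the post, scaled by 1.15.

    Assumed form (not spelled out in the post): A*B/(B+C) + D,
    with A = H + BB - HR, C = outs, D = HR.
    """
    a = h + bb - hr
    b = b_scale * (0.73 * h + 1.2 * d + 2.4 * t + hr + 0.05 * bb)
    return a * b / (b + outs) + hr


# League-average line from the earlier post.
BASE = dict(h=150, d=30, t=3, hr=17, bb=60, outs=410)


def plus_one_bsr(event):
    """Marginal BaseRuns value of one extra 1B, 2B, 3B, HR or BB."""
    line = dict(BASE)
    if event == "BB":
        line["bb"] += 1
    else:
        line["h"] += 1
        key = {"2B": "d", "3B": "t", "HR": "hr"}.get(event)
        if key:
            line[key] += 1
    return base_runs(**line) - base_runs(**BASE)


if __name__ == "__main__":
    print(round(base_runs(**BASE), 1))
    for ev in ("1B", "2B", "3B", "HR", "BB"):
        print(ev, round(plus_one_bsr(ev), 2))
```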

And then, repeating for that great hitter, this is what we get:
BsR OPS RC2 RC1
0.59 0.48 0.67 0.69 1B
0.92 0.80 1.09 1.10 2B
1.23 1.11 1.52 1.51 3B
1.46 1.42 1.94 1.92 HR
0.41 0.34 0.47 0.27 BB

Again, at the team level, BsR makes the most sense.  We know that the HR value cannot possibly approach 2.0 runs as RC1 and RC2 purport. 

I then ran it against my “perfect” simple-Markov model to see how much the run values should change between these two run environments.  The run value of the walk should increase by +.09 runs.  BsR and RC2 are close, RC1 and OPS are off.  OPS is off because it makes everything static.  RC1 is off because it doesn’t handle the walk properly.

The run value of the HR should increase by only +.03 runs between the two environments, according to the perfect model.  BsR nails it, and the RCx models blow it completely.


#64    Guy      (see all posts) 2006/10/18 (Wed) @ 08:18

In looking at individual players, I think there is an additional complication to deal with beyond the “interaction” effect.  The composition of OBP and SLG is not the same for all players.  For the best players, both their OBP and SLG are less valuable than average, while (ironically) the reverse is true for weak hitters. 

Within OBP, BA is more important in terms of creating runs than BB rate.  Within SLG, BA is more important than ISO.  Here’s the ratio of BB rate to OBP for the top 30 and bottom 30 hitters this year (and average for all hitters with over 300 PAs):
Top30: .31
Avg:  .25
Bottom 30: .22

And here’s the ratio of ISO to SLG:
Top30:  .45
Avg:  .38
Bottom30: .30

So for great hitters, BA makes up a relatively lower percentage of both OBP and SLG than it does for weaker hitters.  So I would think that any formula that relies just on OBP and SLG will have a tendency to overstate production by good hitters and understate production by weak hitters.


#65    Tangotiger      (see all posts) 2006/10/18 (Wed) @ 08:31

I discuss the BA impact on OBP, SLG here:
http://www.tangotiger.net/ops2.html


#66    dq      (see all posts) 2006/10/18 (Wed) @ 15:31

The problem with OPS at the extremes, is that OBP & SLG interact with each other, and really should be multiplied. The problem with multiplying is that you get too big an answer at the high end.

Isn’t the answer to make the slugging percentage exponential? I ran obp * slg ^ .81 * pa * .758 by the jarvis data (teams 1957 - 2005) and got a 2.7% error rate; about as good as any other device. (I can get it to 2.6 if I add sb - cs, which is not my intent).

I ran it through the incremental (“plus 1”) test on the 620 pa .268/.423/.339 line above and got the following:

1b 0.53
2b 0.81
3b 1.08
hr 1.35
bb 0.38

I thought that was pretty good. I have a runs created formula, that comes pretty close on the incremental basis, and can also be calculated as a single rate stat. And I only need 2 stats (obp & slg) to get it.
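The incremental values above can be reproduced with a short sketch, assuming the same 620 PA line (560 AB, 150 H, 237 TB, 60 BB) from tangotiger's earlier post. The helper names are hypothetical.

```python
def dq_runs(pa, obp, slg):
    """dq's form: .758 * PA * OBP * SLG^.81."""
    return 0.758 * pa * obp * slg ** 0.81


def rates(ab, h, tb, bb):
    """Turn raw counts into (PA, OBP, SLG)."""
    pa = ab + bb
    return pa, (h + bb) / pa, tb / ab


BASE = (560, 150, 237, 60)  # AB, H, TB, BB for the league-average line

# Increments to (AB, H, TB, BB) for one extra event.
EVENTS = {
    "1B": (1, 1, 1, 0),
    "2B": (1, 1, 2, 0),
    "3B": (1, 1, 3, 0),
    "HR": (1, 1, 4, 0),
    "BB": (0, 0, 0, 1),
}


def plus_one_dq(event):
    """Marginal value of one extra event under dq's formula."""
    bumped = tuple(x + dx for x, dx in zip(BASE, EVENTS[event]))
    return dq_runs(*rates(*bumped)) - dq_runs(*rates(*BASE))


if __name__ == "__main__":
    for ev in EVENTS:
        print(ev, round(plus_one_dq(ev), 2))
```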


#67    tangotiger      (see all posts) 2006/10/18 (Wed) @ 17:56

Interesting.  Can you run it against the great team (same positive values, and change the out value to 300).


#68    Patriot      (see all posts) 2006/10/18 (Wed) @ 19:13

I got .61,.94,1.27,1.60,.46,-.173 with the great team


#69    dq      (see all posts) 2006/10/18 (Wed) @ 19:33

I get very similar numbers :

.61,.93,1.25,1.58,.45,-.17

HR - Hit looks good,

Team gets 98.7 runs on 300 outs or 8.05 runs/game - 1,304 runs it’s beyond a great team -

Is this a bad answer, or should a hit & out mean more on a team that scores 8+ runs a game?? - My HR means more if there are more guys on base.


#70    dq      (see all posts) 2006/10/18 (Wed) @ 20:51

I think it does work - at a higher run environment the events are worth more.

I took all single games 1984-2005 where slgpct = .526 +/- .050 and obs = .412 +/- .050 -

The 8509 games had .526/.402 and an average of 7.10 runs

The following values work, which are 3.4% off my values above :

1b .59 (39,760 total)
2b .90 (18,513)
3b 1.21 (3,126)
hr 1.53 (17,107)
bb .43 (15,523)
outs -.164 (33,634)

Actual Runs = 60,417 in 8,509 games

So, I think this is working better than linear weights here.


#71    tangotiger      (see all posts) 2006/10/18 (Wed) @ 21:11

Yes, your values are pretty close to BaseRuns, which was close to the simple-Markov.  So, I think you’ve got something pretty good there.


#72    Patriot      (see all posts) 2006/10/18 (Wed) @ 21:55

Tango’s great team is a great team, but what if you had a real great team, a Babe Ruth team?  Ruth in 1920 hit .376/.530/.849 in 607 PA. The DQ method estimates 217 runs, while James’ RC is at 205, and the Cramer’s RC is at 238.  I got 193 with an old BsR version.

Anyway, the dq method here gives LW of .74,1.20,1.65,2.10,.67,-.39

Obviously the homer value is way too high.  My point is not to put down DQ’s method, which for all I know may turn out to be the best simple way to combine OBA and SLG.  Just to point that it’s hardly a substitute for the Markov or BsR at really extreme cases (which I probably didn’t have to do).


#73    Patriot      (see all posts) 2006/10/18 (Wed) @ 22:06

In that example, I thought it was interesting that DQ gave a higher estimate for Ruth than RC.  What is the cause of this?  Well, RC weights are:
.83,1.36,1.89,2.42,.30,-.34

DQ method gives more reasonable values for all of the hits.  The difference is in the walk value.  OBA*SLG*AB still doesn’t put the walk at more than .3 runs (about where it is in a normal context) at Babe Ruth levels!  But DQ puts it at 2/3 of a run, which is too high.  But with Ruth’s 150 walks, the difference bw/ DQ and BJ’s walk value is 55 runs.  So Bill gets his estimate closer to BsR by being worse in hit types, but getting lucky in that he pegs the walk low and Ruth has a ton of them, allowing his estimate to remain within shouting distance of BsR.  I’m not sure where the Cramer approach ends up blowing it here, and thanks to the economic causes of the development and downfall of the manorial system, I’m not going to find out tonight :)


#74    dq      (see all posts) 2006/10/19 (Thu) @ 04:51

At first glance, I thought 2.1 runs per hr might not be too bad, since Ruth’s obp was > 50%. But breaking it down, the problem with modeling 1920 Ruth is that a team of Ruths get 148 pas either leading off an inning (1/3 of 283 outs) or after a hr. That’s about 24%. In reality, a number 3 hitter would probably get around 8% leadoffs (3/4 * 1/9) plus 1-2% after an hr. Add 1% for hrs, and 1920 Ruth was actually probably around 9%. (Leadoff men are obviously well above this number.) The average player would get 11.1% leadoff, + 2-3% hr, call it 13% for simplicity.

So, I think part of the problem in modeling a team of Ruths is the interaction. Ruth’s situational hitting is much different than what he would have seen; he’s close to being a leadoff man, whose hrs are worth less than the average player, since a large % of his abs are leading off.

Gotta run, thanks to the development (hopefully not the downfall) of 3 boys, one of whom may study some of the manorial system this year.


#75    Guy      (see all posts) 2006/10/19 (Thu) @ 12:03

I think one problem with any OBPxSLG formulation is that as BB rate increases, the true value of the SLG falls because it represents fewer TB (fewer AB for any given # of PA).  That’s why AB*SLG*OBP works pretty well:  it shortchanges walks, but that offsets the tendency to overvalue SLG for hitters with lots of walks.  In fact, if you take Tango’s original formula and replace SLG with TB/.89*PA (.89PA = average ABs), you basically end up with AB*SLG*OBP. 

So in terms of DQ’s formula, I wonder if it might work as well or better to use an exponent on OBP rather than SLG?  It’s the very high OBP players who the OBP*SLG formula will most overestimate.  At least I think that’s right....


#76    dq      (see all posts) 2006/10/19 (Thu) @ 21:36

The best I get so far is to make both exponential, with the factor around .84 for each; probably my best is:

(obs^0.82*slg^0.86* pa *0.65)

which puts the values at
.50,.79,1.07,1.36,.33,-.10

for the 268/423/339 team at 650 pa. I call it the best because it fit the incremental here fairly well.

For Babe Ruth 1920 I get 204 runs, with
.65,1.10,1.55,2.00,.52,-.32 - obviously not perfect, but not horrific for the extreme end of the scale.

If I mess with Ruth too much, I get worse answers for lower levels on the incremental basis.

If you want a name, I guess I’ll call it OTSE, for on-base times slugging, exponential -

It appears a pretty reliable stat, and relatively simple - I think it appears to be a better rate stat than OPS


#77    tangotiger      (see all posts) 2006/10/20 (Fri) @ 07:39

At first glance, I thought 2.1 runs per hr might not be too bad, since Ruth’s obp was > 50%.

dq, I guess you are new around here.  The run value of the HR caps off at around 1.50, 1.60 or so.  If you haven’t, read the first three articles in the “Baseruns” box here:
http://www.tangotiger.net/

***

As for the Babe Ruth type team (in my case, I just set the “outs” to 150, and kept the rest the same), the two RCx equations both give the HR a run value of 2.6 to 2.7.  Their non-walk coefficients are about the same, while the walk is .71 in RC2 and .27 in RC1.  (BsR says .58).

Here’s how they stack up with this kind of team:
BsR. OPS. RC2. RC1. dq..
0.77 0.48 0.84 0.86 0.64 1B
1.10 0.81 1.47 1.44 1.09 2B
1.26 1.14 2.10 2.02 1.54 3B
1.48 1.47 2.73 2.61 1.99 HR
0.58 0.41 0.71 0.27 0.55 BB

The HR value is still too high for dq, but it’s not as egregious as it is for the RC methods.


#78    tangotiger      (see all posts) 2006/10/20 (Fri) @ 07:59

If you want to play with another idea of the dq form: make the exponents dynamic.  For example, set the exponent for OBP to .75, and the exponent to SLG as:
1 - OBP/2.

As the OBP increases, the power of SLG decreases.

Multiply by PA and whatever constant you need to baseline.

So, just play around with the exponent until you get a decent match to the baseline and to the Ruthian team.


#79    dq      (see all posts) 2006/10/20 (Fri) @ 08:45

I think you probably have something, because I found that ots works best where the ratio of ob to slug was about 1.3 - there is a “best” mix of ob & slug (I kind of learned that about 30 years ago in a Strat O Matic League, when I thought obp was important, and got all these guys who would get on base but no one to drive them home.)

I tried playing with the difference from the 1.3 ratio, but it got too messy for my taste.

I was thinking about the softball or little league where the team has 8 .500 singles hitters and 1 hr hitter - I thought that the home run hitter might be worth 2 runs - It’s kind of like Jack Clark on the 87 Cards - one power hitter on a team with a lot of guys who get on base.

He would come up with an average of 1.5 runners and hit a hr for 2.5 runs - I know that’s a little high, but I don’t have the time right now to figure it out.

The team without him generates about .91 runs an inning if I did my math right - the chances of getting 4 successes before 3 failures + chances of 5 successes before 3 failures, etc, that’s about .154 runs per pa - If homerun hitter works out to over 2.1 per hr, that’s 2.0 runs per hr

So the value of the hr might not max out at 1.6; I don’t know the upper limit.

The reason a team of Babe Ruths didn’t do that is because the other Babes are knocking in runs.


#80    tangotiger      (see all posts) 2006/10/20 (Fri) @ 11:27

The run value of the HR is the easiest to figure out:
HR = a * (1 - b) + 1
where a = number of baserunners on base, per HR
b = chance of runner scoring

In MLB, a is around .60, and b is around .30, so
HR = .6 * (1-.3) + 1 = 1.42

That’s the run value of the HR.  It’s pretty much as simple as that.

If say you have a league where the number of baserunners is about 1.00, and b = .50, then the run value of the HR is:
HR = 1.0 * (1-.5) + 1 = 1.50

See?  You’ll be hard-pressed to find a run value for a HR that’ll exceed 1.60, forget about 2.00 runs.

I highly recommend you read my “Runs Created” articles I pointed to earlier, and focus especially on the 2nd article, about “getting on” and “moving runners over”.


#81    tangotiger      (see all posts) 2006/10/20 (Fri) @ 14:17

The run value of the triple is:
T = a * (1 - b) + c
where c=chance of runner scoring from 3B
(a, b defined in earlier post)

***

Here are some useful rule of thumb equations.

c = 3 * OBP / ( 2 * OBP + 1)

So, if OBP is .3333, then c=60%.  When OBP=.500, c=75%.  When OBP=.250, c=50%.

b = .90*OBP

a = 1.8*OBP

***

Under these rules of thumb, here are the run values for the triple and HR, under various OBP levels:

OBP.. HR.. 3B..
0.001 1.00 0.00
0.050 1.09 0.22
0.150 1.23 0.58
0.250 1.35 0.85
0.350 1.43 1.05
0.450 1.48 1.19
0.550 1.50 1.29
0.650 1.49 1.33
0.750 1.44 1.34
0.850 1.36 1.30
0.950 1.25 1.23
0.999 1.18 1.18
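The rule-of-thumb table above can be regenerated with a few lines. Function names are illustrative; the formulas for a, b and c and the HR and triple equations are the ones given in these two posts.

```python
def hr_run_value(a, b):
    """HR = a * (1 - b) + 1, with a = runners on base per HR, b = chance a runner scores anyway."""
    return a * (1 - b) + 1


def runners_on(obp):
    """Rule of thumb: a = 1.8 * OBP."""
    return 1.8 * obp


def score_chance(obp):
    """Rule of thumb: b = .90 * OBP."""
    return 0.90 * obp


def score_from_third(obp):
    """Rule of thumb: c = 3*OBP / (2*OBP + 1)."""
    return 3 * obp / (2 * obp + 1)


def hr_value(obp):
    return hr_run_value(runners_on(obp), score_chance(obp))


def triple_value(obp):
    """T = a * (1 - b) + c."""
    return runners_on(obp) * (1 - score_chance(obp)) + score_from_third(obp)


if __name__ == "__main__":
    for obp in (0.001, 0.050, 0.150, 0.250, 0.350, 0.450,
                0.550, 0.650, 0.750, 0.850, 0.950, 0.999):
        print(f"{obp:.3f} {hr_value(obp):.2f} {triple_value(obp):.2f}")
```

Under these rules of thumb the HR value is a downward-opening parabola in OBP that peaks near 1.50 around OBP = .550, which is why it never approaches 2.00.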


#82    dq      (see all posts) 2006/10/22 (Sun) @ 20:42

The equations will be very helpful to me. Do you have similar ones for doubles, singles and walks?

Thanks


#83    tangotiger      (see all posts) 2006/10/26 (Thu) @ 08:06

THIS POST FROM “dq” AND MOVED TO THIS THREAD.

-------------------------------------------

OTSE

Onbase Times slugging, exponential.

The number that works for teams from 1957-2005 is pa * obp ^ .85 * slg ^ (1-obp)/2 * .652

Offense increases as obp increases, but it is exponential.

Offense also increases as slugging increases, but the impact of slugging is somewhat impacted by the base advancement of the onbase percentage.

I took the games from 1974-1990 from Jarvis’s site (the same dataset Tango used in his runs created; I had to change the constant r to .627) and got this:

Slug Games Act Runs OTSE Runs

All 69,126 4.27 4.26
950 + 19 17.42 17.49
850-949 100 12.89 13.62
750-849 731 10.85 11.11
650-749 1,798 9.43 9.44
550-649 5,732 7.78 7.76
450-549 12,011 6.09 6.01
350-449 18,112 4.47 4.42
250-349 18,204 2.94 2.94
150-249 9,534 1.57 1.65
50-149 2,679 0.63 0.68
49 - 207 0.17 0.12

OBP

All 69,126 4.27 4.26
550-649 106 14.54 14.57
450-549 4,011 9.53 9.55
350-449 22,162 6.02 6.04
250-349 31,774 3.35 3.34
150-249 9,648 1.55 1.46
50-149 1,375 0.57 0.47
49 - 50 0.58 0.66

Works at all levels of slug and obp. Works at 950 + slugging and 600 obp. Works at 50-149 slugging and 50-149 obp. You only need 3 stats - pa,obp,slugpct. It can be expressed as a rate stat, and also used to calculate runs scored.


#84    tangotiger      (see all posts) 2006/10/26 (Thu) @ 08:07

THIS POST FROM “Mike Humphreys” AND MOVED TO THIS THREAD

---------------------------------

Elegant.  I’m gonna keep this link.

Wonder how this ‘maps’ onto the much more complicated conditional probability model of Cover? in 1960, described by Jim Albert in two of his books.


#85    tangotiger      (see all posts) 2006/10/26 (Thu) @ 08:13

dq, when I suggested making it dynamic with (1-OBP)/2, I was just doing some trial and error.  I didn’t really expect that part to be that precise.

When you did your testing, were you trying different forms of (1-OBP)/x ?  Or did you just set the “x=2”, and proceeded with just looking for the exponent for the other term?


#86    dq      (see all posts) 2006/10/26 (Thu) @ 09:48

I tried some other variations, but the 2 factor worked well.

The obs for the Jarvis data is .326 ,
1 - .326/2 = .837. 

If you look at my posts above, the exponents worked well around the .84 figure. I also found I had problems when the ratio of obs vs slug went up or down.

From what little I’ve played with it; I think the 2 factor will have a little trouble at the extreme level of obs.

I wasn’t solving for one answer; I was trying to get an equation that worked for a range of values. I decided to keep it simple at 2; I would imagine someone who is better at higher math these days than me can come up with a better answer.

Part of my goal was to keep it simple (relatively). My real hard math days are long in the past.


#87    tangotiger      (see all posts) 2006/10/26 (Thu) @ 10:14

btw, in your post, you said:

The number that works for teams from 1957-2005 is pa * obp ^ .85 * slg ^ (1-obp)/2 * .652

You obviously meant:

pa * obp ^ .85 * slg ^ (1-obp/2) * .652
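The parenthesization matters a lot here, as a small sketch shows. The function names are hypothetical; the .85 exponent, the .652 constant and the .339/.423 line over 620 PA come from the thread.

```python
def slg_exponent(obp):
    """Dynamic SLG exponent as intended: 1 - OBP/2."""
    return 1 - obp / 2


def otse_runs(pa, obp, slg, obp_exp=0.85, constant=0.652):
    """OTSE: constant * PA * OBP^obp_exp * SLG^(1 - OBP/2)."""
    return constant * pa * obp ** obp_exp * slg ** slg_exponent(obp)


def otse_misread(pa, obp, slg, obp_exp=0.85, constant=0.652):
    """What you get if the exponent is read as (1 - OBP)/2 instead."""
    return constant * pa * obp ** obp_exp * slg ** ((1 - obp) / 2)


if __name__ == "__main__":
    pa, obp, slg = 620, 150 + 60, 237
    obp, slg = obp / 620, slg / 560   # .339 OBP, .423 SLG
    print(round(otse_runs(pa, obp, slg), 1))     # close to the 80-run baseline
    print(round(otse_misread(pa, obp, slg), 1))  # far too high
```

With the intended exponent the league-average line comes out a little under 80 runs; with the misread exponent it comes out well over 100.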


#88    tangotiger      (see all posts) 2006/10/26 (Thu) @ 10:31

Here are the run values based on BsR, my version of the dynamic exponent, and dq’s version.  The difference between mine and dq is that I use .75, and he uses .85.  His .85 is best-fitted against actual team data, while I try to fit mine against BsR.

BsR tango dq
0.50 0.52 0.54 1B
0.80 0.80 0.82 2B
1.09 1.08 1.10 3B
1.43 1.36 1.38 HR
0.33 0.35 0.38 BB
-0.10 -0.11 -0.12 out

Fairly close.

And with a crappy team (.196/.255/.310):
BsR tango dq
0.39 0.44 0.45 1B
0.64 0.67 0.67 2B
0.98 0.90 0.90 3B
1.36 1.14 1.13 HR
0.25 0.28 0.29 BB
-0.06 -0.06 -0.07 out

With a powerhouse (.354/.434/.560), here’s what it looks like:
BsR tango dq
0.62 0.57 0.62 1B
0.95 0.90 0.95 2B
1.18 1.22 1.28 3B
1.48 1.54 1.61 HR
0.43 0.43 0.47 BB
-0.17 -0.16 -0.18 out

At a level of .999/.999/1.580, we get:
BsR tango dq
1.00 0.56 0.62 1B
1.00 0.89 0.99 2B
1.00 1.21 1.35 3B
1.00 1.54 1.72 HR
1.00 0.75 0.83 BB
-0.98 -0.17 -0.27 out

As you can see, the exponential process really pretty much flattens the curve, which is a great benefit over James’ RC.

Either dq’s exponent or mine can be used.


#89    Guy      (see all posts) 2006/10/26 (Thu) @ 11:27

I’m confused by the SLG exponent.  As written, the higher the OBP the greater the run value of any given SLG.  I think we want the reverse:  as OBP rises, the value of SLG is reduced since it actually represents fewer TBs over a given number of PAs.  Or am I thinking about this incorrectly?


#90    tangotiger      (see all posts) 2006/10/26 (Thu) @ 12:07

When OBP = 0, the SLG exponent is 1.0. 
When OBP = 1, the SLG exponent is 0.5.


#91    Guy      (see all posts) 2006/10/26 (Thu) @ 13:35

Yes, but since SLG is <1, smaller exponent = more runs, right?


#92    dq      (see all posts) 2006/10/26 (Thu) @ 14:06

It’s the square root of a number less than 1. The square of a number x less than 1 is smaller than x; the square root of a number x less than 1 is greater than x.

If OBP is 1,

slug pct of .600 is .6^.5 or .774
slug pct of .800 is .8^.5 or .894

So, the higher the slugging pct, the better, but it’s not linear, because the higher obp moves runners over.
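Both points in this exchange can be seen at once in a tiny sketch (the function name is made up): for a fixed SLG below 1, a higher OBP means a smaller exponent and so a larger SLG term, while for a fixed OBP a higher SLG still gives a larger term.

```python
def slg_term(slg, obp):
    """The SLG factor in the dynamic-exponent form: SLG^(1 - OBP/2)."""
    return slg ** (1 - obp / 2)


if __name__ == "__main__":
    for obp in (0.0, 0.5, 1.0):
        print(obp, [round(slg_term(slg, obp), 3) for slg in (0.600, 0.800)])
```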


#93    Guy      (see all posts) 2006/10/26 (Thu) @ 18:34

What I’m suggesting is that a high OBP should reduce the value of SLG, and vice-versa.  The higher the OBP, the fewer AB for any given number of PA, so SLG represents fewer TB.  Perhaps a SLG exponent of something like (.5 + OBP) would marginally increase accuracy.  (or .4 +OBP in Tango’s formula)


#94    tangotiger      (see all posts) 2006/10/26 (Thu) @ 19:47

Trying a form of Guy’s, I set the exponent for OBP to 1, and for SLG at 0.7 + OBP/2. At a league-average, I get near BsR-type weights, so that’s good.  At the great-hitter level, like before, the HR weight comes in too high (1.75, BsR says 1.48, and dq’s is 1.61).  Walk rate is better with Guy.  At the crappy hitter level, both are similar.

At the .999 OBP level, the run value of the HR is 4.95.  The walk value is almost triple the single.


#95    Guy      (see all posts) 2006/10/27 (Fri) @ 06:43

Tango:  Why are you interested in how these metrics perform in extreme environments that won’t ever happen in MLB?  Is it just intellectual curiosity, or do you think performing better at the extremes also indicates more accuracy in the “normal” range?


#96    tangotiger      (see all posts) 2006/10/27 (Fri) @ 11:33

MLB is not everyone’s concern.  I get requests for softball, etc.  In any case, if you have two similar things, and one works in the normal and the extreme, and the other only in the normal, why not go for the first one?

As well, if people want to use a run estimator for a .350/.450/.650 guy, even if they shouldn’t, might as well keep him close to reality.


#97    Tangotiger      (see all posts) 2006/11/02 (Thu) @ 13:41

http://www.baseball-fever.com/showthread.php?t=48531&page=2

dq’s version has the best correlation of all run estimators measured by “ubiquitous”, using any combination of BA, OBP, SLG, AB, PA, for years 1962-2003.


#98    dq      (see all posts) 2006/11/03 (Fri) @ 06:58

Thanks for your help in the process. Hopefully we made a useful tool out of this.


#99    Guy      (see all posts) 2006/11/15 (Wed) @ 17:40

I think we can do a little better if we’re allowed to use BA and AB in the formula. The ability to break down OBP between Hs and BB/HBPs increases accuracy.  Using a random sampling of teams over past 40 years, and measuring metrics against R per 162 games (162*25.35 outs), I find this formula does slightly better than PA*SLG*OBP or DQ’s formula, in terms of both correlation and average error:
R = 103 + .799*PA*BA*SLG + .448*(OBP*PA-BA*AB)*SLG.
[note: (OBP*PA - BA*AB) essentially is BB+HBP]

To get rid of the constant, make it:
0.0167*PA + .799*PA*BA*SLG + .448*(OBP*PA-BA*AB)*SLG.

Of course, PA*OBP*SLG is far more elegant and nearly as good.


#100    Guy      (see all posts) 2006/11/15 (Wed) @ 19:27

Actually, we can make this a little more simple without sacrificing much accuracy by using (OBP-BA) to stand in for the non-hit component of OBP:

R = PA*.0156 + .838*PA*BA*SLG + .535*PA*(OBP-BA)*SLG,

OR

R = PA * (.0156 + .838*BA*SLG + .535*(OBP-BA)*SLG)
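The two ways of writing this formula are the same thing, which a short sketch confirms. The function names are hypothetical, and the .268/.339/.423 line over 620 PA is borrowed from earlier in the thread purely as a test input.

```python
def guy_expanded(pa, ba, obp, slg):
    """R = PA*.0156 + .838*PA*BA*SLG + .535*PA*(OBP-BA)*SLG."""
    return pa * 0.0156 + 0.838 * pa * ba * slg + 0.535 * pa * (obp - ba) * slg


def guy_factored(pa, ba, obp, slg):
    """R = PA * (.0156 + .838*BA*SLG + .535*(OBP-BA)*SLG)."""
    return pa * (0.0156 + 0.838 * ba * slg + 0.535 * (obp - ba) * slg)


def product_rc(pa, obp, slg):
    """The simple shorthand for comparison: .87 * PA * OBP * SLG."""
    return 0.87 * pa * obp * slg


if __name__ == "__main__":
    line = (620, 0.268, 0.339, 0.423)
    print(round(guy_factored(*line), 1), round(product_rc(620, 0.339, 0.423), 1))
```

On that line both approaches land within about a run and a half of each other, a little under 80 runs, consistent with the remark that the simpler product is nearly as good.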


#101    Rally      (see all posts) 2006/11/15 (Wed) @ 21:01

The short, stuck without a calculator formula has still got to be simply total bases X OBP.

You guys have put so much complexity into finding a more accurate formula that you might as well just use base runs.


#102    Guy      (see all posts) 2006/11/16 (Thu) @ 07:27

Obviously, this is just for fun.  But since everyone uses spreadsheets to do these kinds of calculations now, simplicity isn’t as big a deal as it used to be.

For a simple shorthand, .87*PA*SLG*OBP is better.

Base runs requires more information (HRs), and has to be customized to historical era.


#103    dq      (see all posts) 2006/11/16 (Thu) @ 17:23

You use ab * ba, I simply made it hits.

I think the .87 * PA * SLG * OBP is better than OPS, which I see used so much.

I’m also slightly off here - I get a correlation of .9716 and total error of 24,217 from 1957-2005 using your form, and .9730 and 22,304 using what I had.

On a team season basis, different methods do get better correlations than that, and better error numbers than what I did.
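
A sketch of how this kind of comparison can be run in Python: compute each estimator for every team season, then take the correlation with actual runs and the summed error. I am assuming "total error" means the sum of absolute errors, which the thread does not spell out. The three team rows are made-up placeholders, not real seasons.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def total_abs_error(estimates, actuals):
    """Sum of absolute errors (assumed meaning of 'total error')."""
    return sum(abs(e - a) for e, a in zip(estimates, actuals))


def evaluate(estimator, team_seasons):
    """Return (correlation, total error) for an estimator over team seasons."""
    est = [estimator(t) for t in team_seasons]
    act = [t["runs"] for t in team_seasons]
    return pearson_r(est, act), total_abs_error(est, act)


# Made-up placeholder rows; swap in real team-season data to use this.
seasons = [
    {"pa": 6200, "obp": 0.335, "slg": 0.420, "runs": 760},
    {"pa": 6100, "obp": 0.320, "slg": 0.395, "runs": 680},
    {"pa": 6300, "obp": 0.350, "slg": 0.450, "runs": 860},
]
print(evaluate(lambda t: 0.87 * t["pa"] * t["obp"] * t["slg"], seasons))
```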


#104    Tangotiger      (see all posts) 2006/11/16 (Thu) @ 17:39

I agree this is all for fun, and if you want to be OBP/SLG-centric, then the question is how best to combine them.

OPS is horrible as is TB*OBP (i.e., AB*SLG*OBP) because of their severe bias against walkers.  (Horrible from the pure technical sense of undervaluing walks by at least .05 runs.) Even VORP uses TB*OBP as its basis, which is inexcusable.
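
To see the bias against walkers, here is a small Python sketch that adds one walk to a team line and checks how much each formula moves. It assumes PA = AB + BB (HBP folded into BB, sacrifices ignored), and the team line is made up. With this line, an extra walk is worth roughly 0.25 runs under TB*OBP and roughly 0.36 under .87*PA*OBP*SLG.

```python
def tb_times_obp(ab, h, tb, bb):
    """TB * OBP, i.e. AB * SLG * OBP."""
    pa = ab + bb                 # assumption: PA = AB + BB
    obp = (h + bb) / pa
    return tb * obp


def pa_slg_obp(ab, h, tb, bb, k=0.87):
    """k * PA * OBP * SLG, the construction from the top of the post."""
    pa = ab + bb                 # assumption: PA = AB + BB
    obp = (h + bb) / pa
    slg = tb / ab
    return k * pa * obp * slg


def marginal_walk(estimator, ab, h, tb, bb):
    """Change in estimated runs from adding exactly one walk."""
    return estimator(ab, h, tb, bb + 1) - estimator(ab, h, tb, bb)


# Hypothetical team line, for illustration only.
line = dict(ab=5500, h=1450, tb=2300, bb=600)
print(marginal_walk(tb_times_obp, **line))
print(marginal_walk(pa_slg_obp, **line))
```

The second number works out to .87 * SLG exactly, because PA * OBP is just times on base, so each extra walk adds .87 * TB / AB.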

For single players, use anything, and it won’t really matter, because you are giving yourself a 5 or 10 run error buffer.  Heck, even R+RBI-HR-AB/10 will serve your purpose for single players. 

If you are going to go to the trouble of putting stuff in a spreadsheet, and want to calculate something quick, then there’s no reason to use TB*OBP, and better to go with my construction or dq’s.

And, if you are serious, then BaseRuns or LWTS is what you want.


#105    Guy      (see all posts) 2006/11/16 (Thu) @ 17:39

DQ:  I don’t follow.  If you’re not using a team-season basis, what are you looking at? 

Another possible difference: I was looking at runs/out, rather than just runs.


#106    DQ      (see all posts) 2006/11/16 (Thu) @ 19:00

Tango- Why is LWTS better? Isn’t it linear, and have the same problems that most of the other methods have?

Guy - I would like something that works in all situations to predict runs - game level, season level, etc.


#107    Guy      (see all posts) 2006/11/16 (Thu) @ 21:41

DQ:  When you say you get a correlation of .973, what is the unit of analysis, if not team seasons?


#108    dq      (see all posts) 2006/11/17 (Fri) @ 11:19

The .973 is for a season. There are a lot of different formulas that get about a .970 correlation with about a 2.7-3.0% error for a season.

I think for a new formula to be significant, it needs to do at least 1 of 3 things:

1. Model game, player, and incremental events well. Most of the formulas are linear, and work within the context of a normal team and season.
They don’t work well at the extremes, which is where you find Ruth/Bonds/Williams.
One of the advantages of base runs is that it works well at smaller levels.

2. Be more accurate than the .970/2.7% for a season (seems unlikely to me, but what do I know?)

3. Be simple, such as OPS is - OPS isn’t as accurate as other stats, but has gained widespread acceptance because it is simple, and people can understand it without being blown away by the formulas. The problem with the simple ones, like OBP * SLG, is they fail at the extremes in part 1.

I think Tango started this thread with a 3, and during the course of it I learned the importance of 1.


#109    John Beamer      (see all posts) 2006/11/17 (Fri) @ 11:52

Why is LWTS better? Isn’t it linear, and have the same problems that most of the other methods have?

I think Tango is referring to Markov. Since this models run scoring perfectly irrespective of the environment, it is 100% accurate (subject to implementation). If you want to model Bonds within a team environment, then you need a Markov that can handle at least two different batter profiles (one for Bonds and one shared by the other eight average hitters).

