THE BOOK--Playing The Percentages In Baseball

Thursday, April 09, 2009

Testing run estimators

By Tangotiger, 09:28 AM

Colin gives it a go (part 1).

Surprise!  BaseRuns demolishes the field. 

The other linear estimators are all pretty similar.  Not sure how he tuned each equation on an annual basis (team, game, or inning basis?).  wOBA, Reg, and House should all come out pretty much the same.  It also depends on how he handled IBB, since wOBA explicitly ignores it as an event.


#1    Colin Wyers      (see all posts) 2009/04/09 (Thu) @ 11:25

I made a mistake when I was typing in the formulas, and IBB was counted as a BB for wOBA during the trial. I just fixed it and it doesn’t seem significant - MAE drops to .45 from .46, all else stays the same. I’ll try to get that fixed. Reg does consider the IBB as a BB, and that’s fine for Reg. (I don’t like him much anyway.) House does include IBB as a separate weight.

For the tuning - the basic formula goes something like:

(2 * (X/lgX) -1) * R/PA * PA

Where R/PA and lgX are based on the average in that year.
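
As a rough illustration of the tuning formula above, here is a minimal Python sketch. The function name and the sample inputs are assumptions made for illustration; they are not taken from the article or its SQL.

```python
def rate_to_runs(rate, lg_rate, lg_runs_per_pa, pa):
    """Estimate runs from a rate stat X via (2 * (X/lgX) - 1) * R/PA * PA."""
    return (2 * (rate / lg_rate) - 1) * lg_runs_per_pa * pa


# Hypothetical inputs: a rate equal to the league rate gives back
# league-average runs for that many PA (lg_runs_per_pa * pa).
average_case = rate_to_runs(rate=0.750, lg_rate=0.750, lg_runs_per_pa=0.12, pa=40)

# A rate at half the league rate gives zero estimated runs under this form.
half_case = rate_to_runs(rate=0.375, lg_rate=0.750, lg_runs_per_pa=0.12, pa=40)
print(average_case, half_case)
```

Note that the 2 and -1 act as a fixed slope and intercept, so any rate below half the league rate yields a negative run estimate.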


#2    TangoTiger      (see all posts) 2009/04/09 (Thu) @ 12:25

The IBB needs to be treated the same for all the equations: either it is treated as a non-event, it is treated distinct, or it is treated as a walk.

wOBA treats it as a non-event (removed from both the numerator and denominator).  For your tests, that’s clearly wrong.

***

As for the conversion of wOBA to runs, I’m not sure that is correct.  It works for players because of the outs, PA issue.  But, for team-innings, it won’t work like that.

wOBA should yield results that are virtually identical to House.  Compare those two when the inning OBP is over .500.  If you get similar results, then you are probably doing the conversion correctly.  If not, then the conversion as you are doing it is incorrect.

***

It would be good if you can show all your work in terms of how you do the conversions for each one that is not already denominated in runs.


#3    Colin Wyers      (see all posts) 2009/04/09 (Thu) @ 12:51

I should be more explicit - that formula is for the rates, like OPS and GPA. The formula for wOBA is:

((@wOBA-wOBA)/wOBAscale+R_PA)*PA

Where @wOBA is inning wOBA, wOBA is average wOBA for that season, wOBAscale is the scaling factor for that season, and R_PA is average runs per PA for that season.
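
The same formula as a small Python sketch, under the definitions just given. The function name and the sample season values (league wOBA, scale, and R/PA) are placeholders chosen for illustration, not figures from the actual test.

```python
def woba_to_runs(inning_woba, lg_woba, woba_scale, lg_runs_per_pa, pa):
    """((@wOBA - wOBA) / wOBAscale + R_PA) * PA"""
    # Runs above average per PA, then shifted by the league R/PA baseline.
    runs_above_avg_per_pa = (inning_woba - lg_woba) / woba_scale
    return (runs_above_avg_per_pa + lg_runs_per_pa) * pa


# Hypothetical inning: 5 PA, wOBA of .445 against a .330 league average.
estimate = woba_to_runs(
    inning_woba=0.445, lg_woba=0.330, woba_scale=1.15, lg_runs_per_pa=0.12, pa=5
)
print(round(estimate, 2))
```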

(There’s a typo in the formula listed in the article, but not in the actual SQL - I confused R/O and R/PA when I was writing that.)

MAE between wOBA runs and House runs at OBP >= .500 is .14.

I don’t know what this means:

“wOBA treats it as a non-event (removed from both the numerator and denominator).  For your tests, that’s clearly wrong.”

PA includes IBB, does it not? So it should be included in the denominator.


#4    Colin Wyers      (see all posts) 2009/04/09 (Thu) @ 13:19

And that pretty much sums up the difference between wOBA and House, actually - wOBA is R/PA, House is R/O. I don’t have the time between now and work, but tonight I’ll try and redo House as R/PA LWTS and see if that makes it and wOBA line up a bit more.


#5    dan      (see all posts) 2009/04/09 (Thu) @ 14:17

Colin says:

((wOBA-lgwOBA)/1.15+.18)*PA

Was the .18 after 1.15 always there? In a previous thread (click name), you say:

“You are right though, that I should have also said that if you subtract the league average OBP from wOBA, and divide by 1.15, you get the player’s runs above average, per PA.”


No mention of +.18

#6    Colin Wyers      (see all posts) 2009/04/09 (Thu) @ 14:22

.18 is a typo, and is supposed to be .12.

Note “runs above average, per PA.” .12 is the average runs per PA. So we’re all square here.


#7    TangoTiger      (see all posts) 2009/04/09 (Thu) @ 14:45

wOBA makes sense on a player level, but not on an inning level.

You can’t just add +.12 runs per PA and figure that it’s all going to work out fine.  Indeed, one of Bill James’s complaints against Linear Weights 25 years ago was his inability to sync up LWTS against the team data.  So, you need to be very careful here that you don’t make the same error.

I can go from wOBA to LWTS to wOBA, and ensure that everything adds up perfectly (at the player level).  I posted the SQL for that, and Fangraphs uses this.  This is why I am saying that wOBA and LWTS need to report identical (or nearly identical) results.  If not, then there’s a conversion problem.

***

“PA includes IBB, does it not? So it should be included in the denominator. “

In wOBA, IBB and sac bunts are excluded from the numerator and denominator, treating these as non-events.

In the conversion from wOBA to runs, you do include these two events in the “PA” term (but never in the denominator term).


#8    TangoTiger      (see all posts) 2009/04/09 (Thu) @ 14:47

Dan: that’s runs above average.  To get total runs (for a player), you add +.12 runs per PA.

But, for a team (or inning), you add +.28 runs per out.

So, I have some questions on how wOBA is being evaluated here.


#9    TangoTiger      (see all posts) 2009/04/09 (Thu) @ 14:49

Here’s the scenario to understand to make sure that things work:
http://tangotiger.net/reconcile.html


#10    Tangotiger      (see all posts) 2009/04/09 (Thu) @ 14:57

Colin/6 was marked for moderation and is now open.


#11    Patriot      (see all posts) 2009/04/09 (Thu) @ 17:45

What is the point of evaluating wOBA at all in these terms?  Unless I have completely misinterpreted everything that Tango has written on the topic, wOBA is just the linear weight coefficients, with the out value added in, rescaled to equal OBA.

So basically, it’s just a way of expressing the linear weight values on a different scale, one that corresponds to OBA.  Trying to convert wOBA back into runs is unnecessary then.  It would be like taking EqA, after you’ve applied the formula (EqR/O/5)^.4, and trying to see how best to estimate runs based on those results while tying your hands and not reverse-solving the equation to give you EqR.
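
To make the reverse-solving point concrete, here is a minimal Python sketch that assumes only the EqA form quoted above. The function names and the sample inputs are hypothetical.

```python
def eqa_from_eqr(eqr, outs):
    """EqA = (EqR / O / 5) ** 0.4, the form quoted above."""
    return (eqr / outs / 5) ** 0.4


def eqr_from_eqa(eqa, outs):
    """Reverse-solve the same equation for EqR."""
    return 5 * outs * eqa ** (1 / 0.4)


# Hypothetical player line: 90 EqR in 400 outs.
eqa = eqa_from_eqr(90.0, 400)
recovered = eqr_from_eqa(eqa, 400)
print(round(eqa, 3), round(recovered, 1))
```

The round trip recovers the original EqR, so nothing is gained by fitting a fresh runs conversion to the rescaled number.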

Since wOBA is just a restatement of a particular set of linear weights, its “real” accuracy is no more and no less than the accuracy of those weights.  Why bother trying to estimate runs off of the cosmetic adjustment version of a pre-existing runs formula?


#12    Colin Wyers      (see all posts) 2009/04/09 (Thu) @ 18:47

Patriot:

The way I tested it (unless I’m mistaken) should be exactly equal to testing using the weights used to generate the wOBA formula. In short, I AM solving it in reverse.

Testing wOBA in this way is mostly a function of two things:

1) Giving the wOBA concept in general a little more exposure (just as BaseRuns is in the test simply so more people get a look at it).
2) Testing the underlying weights involved.

Tom - I’ll look at doing wOBA as R/O later tonight. For the purposes of these tests, how would you prefer to see IBB handled?


#13    terpsfan101      (see all posts) 2009/04/09 (Thu) @ 18:49

Colin, your “house weights” are based on R/O, so you need to add the average R/O or empirical R/O to the caught stealing to be consistent. Tango, you can include whatever events you want to in wOBA. For instance, I have a version that includes GIDP’s. Your version of wOBA doesn’t count IBB as an event, and I agree.


#14    terpsfan101      (see all posts) 2009/04/09 (Thu) @ 18:56

Colin, the best way to handle the IBB and SH is to treat those plate appearances as an average plate appearance for that batter. For instance, let’s say you have a batter who is +30 LWTS in 600 PA’s, not including his SH and IBB. The +30 LWTS does not include the run-value of his IBB and SH. The batter has 10 IBB and 5 SH. So he has .05 LW/PA and 615 total PA’s. His total LWTS are now 30.75 (.05*615).
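
The arithmetic in that example can be sketched in a few lines of Python. The function name is made up for illustration; the inputs are the ones from the example above.

```python
def lwts_with_average_pa(lwts, pa, ibb, sh):
    """Credit each IBB and SH as an average PA for this batter."""
    lw_per_pa = lwts / pa      # rate over the PA that were actually counted
    total_pa = pa + ibb + sh   # add the excluded PA back in
    return lw_per_pa * total_pa


# +30 LWTS in 600 PA, plus 10 IBB and 5 SH.
total = lwts_with_average_pa(lwts=30, pa=600, ibb=10, sh=5)
print(round(total, 2))
```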


#15    Tangotiger      (see all posts) 2009/04/09 (Thu) @ 19:27

For Colin’s purposes, he’s testing against actual runs scored in that inning.  So, the IBB needs to count for something.

To satisfy Colin’s test, wOBA should be based on Linear Weights values that set the IBB value at about half the run value of a regular BB.

I’d probably say that it should have a weight of around 0.50.


#16    terpsfan101      (see all posts) 2009/04/09 (Thu) @ 19:39

Do I ever say anything relevant? I am always a mile behind you guys. Colin still needs to add the R/O to the caught-stealing in his house weights.


#17    dan      (see all posts) 2009/04/09 (Thu) @ 20:56

Thanks for clearing that up


#18    Peter Jensen      (see all posts) 2009/04/09 (Thu) @ 22:02

What exactly is the point of this exercise?  At the end of it all you may know which run estimator is best at estimating runs per half inning.  But what good is that?  It has absolutely no relevance as to what is the best run estimator for estimating the offensive or defensive contributions of individual players.


#19    Patriot      (see all posts) 2009/04/09 (Thu) @ 22:10

Sorry, Colin--I apologize for missing the actual conversion you used for wOBA--I was distracted by the formula in post #1.

My point, restated, is that wOBA is not a run estimator at all, it is a repackaging of LW (and you did a good job of explaining this in the article).  But including it in this kind of survey obfuscates that, making it appear to the average reader like just another method with specific weights.


#20    MGL      (see all posts) 2009/04/10 (Fri) @ 05:24

I have to agree with Peter here. Lwts, or at least a customized version for the run environment and the rest of the lineup, is perfect for estimating a player’s run contribution, and BaseRuns, by definition, is perfect for team scoring.  Why do we care about anything else?  I suppose we might care how some other metric, especially an easier one to compute, stacks up, but hasn’t that been done a million times before?


#21    terpsfan101      (see all posts) 2009/04/10 (Fri) @ 08:28

I guess Colin is putting each run estimator through the most rigorous test, which is at the inning level. Every other accuracy study uses seasonal totals and is subject to aggregation bias. For instance, if you only use seasonal totals, the accuracy of Runs Created and BaseRuns is comparable. But at the inning level, BaseRuns does much better than Runs Created.


#22    Tom N.      (see all posts) 2009/04/10 (Fri) @ 09:12

Hey guys...sorry if this is a little naive, but I was just wondering why Colin was so negative toward the linear regression approach? Or was he just saying that he didn’t pick a good model, so the coefficients were probably poor? I’m a little confused…


#23    Patriot      (see all posts) 2009/04/10 (Fri) @ 09:50

Tom N., read the article linked under my name by John Beamer at the Hardball Times.  It is a great summary of why regression is of questionable value for this sort of problem.


#24    Tangotiger      (see all posts) 2009/04/10 (Fri) @ 10:10

I applaud the effort that Colin is going through, if for no other reason than that it’ll stop everyone else from doing it.

He also recognizes that we should be using the same inputs for each metric, so that one doesn’t have a leg up on the other.  Looking now though, I also see that the SB, CS issue was not necessarily resolved either, in addition to the IBB.

I object only to his method of converting to runs those stats that are not run-based.  First, those metrics were not designed to capture team run totals, but individual player impact.  Secondly, even if he wanted to, the conversion is not necessarily straightforward, because of the well-known Outs-PA issue.


#25    Tangotiger      (see all posts) 2009/04/10 (Fri) @ 10:21

Here’s another way to show Linear Weights:
http://www.tangotiger.net/lwr.html

It’s always the same equation, but recast to fit whatever scale is needed.


#26    Peter Jensen      (see all posts) 2009/04/10 (Fri) @ 10:49

“I applaud the effort that Colin is going through, if for no other reason than that it’ll stop everyone else from doing it.”

Tango - The danger is that when Colin is all through and anoints one method as having better MAE or RMSE or R than the rest of them, terps and others like him are going to consider it a “rigorous” test that has some meaning, just as many people did 25 years ago when James tested RC against team totals.  But neither Colin’s half innings nor James’s team years are going to show which run estimator most accurately assigns run values to players’ individual performances.  Run Value Added + (.179 * outs) exactly equals runs scored in a half inning, but that doesn’t make RVA the best estimator for players.  For that matter so does runs scored, and RBIs come pretty close, but they are obviously trivial.  I cannot imagine any study that would require me to estimate the number of runs scored in a half inning from individual events.  If the results have no usefulness then why go through the process?


#27    Tangotiger      (see all posts) 2009/04/10 (Fri) @ 10:49

Post/23: marked for moderation and now open.


#28    Tom N.      (see all posts) 2009/04/10 (Fri) @ 11:06

Ah ok, thanks Patriot. That was very helpful


#29    Colin Wyers      (see all posts) 2009/04/10 (Fri) @ 13:19

From the article, Peter:

“Looking only at the linear run estimators, there isn’t a lot to differentiate any one from the other. Once you tune a linear offensive measure to the particular run environment (which is true for all of these measures), there is very little to differentiate them from one another in these tests.”

I think that we’re very much in agreement here, actually - that looking at R, RMSE and MAE at the aggregate level DOESN’T tell us what the better linear model for run estimation is. That was, in fact, my conclusion. (The XR and EqR studies stand out to me in declaring their run estimators to be better than alternatives based largely on the fact that they were tuned to the sample being tested upon, unlike the other estimators involved.)


#30    Tangotiger      (see all posts) 2009/04/10 (Fri) @ 13:33

Peter, it seems apparent that someone will go through these steps, regardless of whether Colin continues or not.

Let’s see what he’s going to do in part 2, to see where he’s going with this.

I agree with you that the extrapolation business is a huge issue, and I’m always on the lookout for that.


#31    Colin Wyers      (see all posts) 2009/04/10 (Fri) @ 13:46

I can’t get too specific about where I’m going with it, because I’m still working on it and therefore don’t know for certain. What I can say is that right now I’m testing based upon matched pairs - I’ve just discovered that games work much better than innings for this purpose.

Basically, I look for all the games with an identical number of singles, doubles, triples, home runs, etc. - but one more of whatever component I’m testing. So it’s a derivative of the plus-1 testing that you like to do, but with the added benefit of having an objective standard to measure against.
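
A minimal Python sketch of that matched-pairs idea follows. It is only one possible reading: the shortened event list, the function name, the data layout, and the choice to weight every base/plus-one pairing equally are all assumptions, not a description of the actual SQL.

```python
from collections import defaultdict

EVENTS = ("1B", "2B", "3B", "HR", "BB")  # shortened event list for illustration


def plus_one_value(games, component):
    """Average run difference between games that are identical except for
    having exactly one more `component` event.

    `games` is a list of (event_counts, runs) tuples, where event_counts is a
    dict keyed by EVENTS. Returns None when no matched pair exists.
    """
    idx = EVENTS.index(component)

    # Group the runs scored by each distinct event line.
    runs_by_line = defaultdict(list)
    for counts, runs in games:
        key = tuple(counts.get(e, 0) for e in EVENTS)
        runs_by_line[key].append(runs)

    # Pair every game with every game that has one more of `component`.
    diffs = []
    for key, base_runs in runs_by_line.items():
        plus_key = key[:idx] + (key[idx] + 1,) + key[idx + 1:]
        if plus_key in runs_by_line:
            for base in base_runs:
                for plus in runs_by_line[plus_key]:
                    diffs.append(plus - base)
    return sum(diffs) / len(diffs) if diffs else None


# Hypothetical games: (event counts, runs scored).
sample_games = [
    ({"1B": 5, "HR": 1}, 3),
    ({"1B": 5, "HR": 2}, 5),
    ({"1B": 5, "HR": 2}, 4),
    ({"1B": 6}, 2),
]
print(plus_one_value(sample_games, "HR"))
```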


#32    Colin Wyers      (see all posts) 2009/04/12 (Sun) @ 15:41

I can’t get the wOBA-to-R/O conversion working properly; I am fully willing to accept that this is my fault. I can skip the weights-to-wOBA-to-runs step and use the base weights and everything works out just fine, so I’m not worried about it.

And I’ve decided to ignore the IBB issue, because the way the matched pairs are set up, it doesn’t become an issue - in all tests there are an identical number of IBB between the two, so the wOBA weight for an IBB could be -30 or 1 million and it wouldn’t matter.

I still have to address some sampling issues with the 2B/3B and SB/CS, but so far wOBA, House and BsR are all practically identical at the top tier of results. GPA and EqR aren’t far behind. OPS, OPS+, RC and Reg are all a tier below that, and TA is in a class of bad all by itself.


#33    Colin Wyers      (see all posts) 2009/04/12 (Sun) @ 22:53

I’m an idiot - matched pairs won’t work for SB/CS, because the run environment governs the frequency of the SB attempt.


#34    weskelton      (see all posts) 2009/04/13 (Mon) @ 13:59

Colin,

What is it about your matched pairs approach that makes it not work as well at the inning level?  Innings seem like they would be much more desirable, as a HR in the 5th inning has zero impact on a single in the first.


#35    Colin Wyers      (see all posts) 2009/04/13 (Mon) @ 15:15

Because in over 95% of the matched pairs, zero runs score in both cases. Essentially, every 1-2-3 inning matches up with every inning where a runner walks and is stranded and it just washes out the entire rest of the sample.


#36    Tangotiger      (see all posts) 2009/04/16 (Thu) @ 10:16

Colin comes in for part 2:
http://www.hardballtimes.com/main/article/the-great-run-estimator-shootout-part-2/

This to me is the sole reason not to use regressions to create a run estimator:

Since I changed the dataset in use, I ran another regression to estimate the regression weights, which are now:

0.53*1B + 0.61*2B + 1.23*3B + 1.46*HR + 0.34*BB + 0.31*HBP - 0.11*IBB + 0.18*SB + 0.05*CS - 0.10*Outs

The run value of a double is .08 more than a single?  The triple is .62 more than a double?  The CS out is less costly than a batting out, even though it wipes a runner from the base?

(I presume the negative for IBB is because it is already included in the regular walk, and so, it’s really +.23. )
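
The gaps called out above can be checked directly from the quoted regression weights. This is a small Python sketch; the dictionary layout and variable names are just for illustration, and the net IBB figure rests on the presumption that each IBB is also counted in BB.

```python
# Regression weights as quoted from part 2 of the article.
weights = {
    "1B": 0.53, "2B": 0.61, "3B": 1.23, "HR": 1.46, "BB": 0.34,
    "HBP": 0.31, "IBB": -0.11, "SB": 0.18, "CS": 0.05, "Outs": -0.10,
}

double_over_single = round(weights["2B"] - weights["1B"], 2)
triple_over_double = round(weights["3B"] - weights["2B"], 2)

# Assumes an IBB is counted once as a BB and once as an IBB.
net_ibb = round(weights["BB"] + weights["IBB"], 2)

print(double_over_single, triple_over_double, net_ibb)
```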


#37    Rally      (see all posts) 2009/04/16 (Thu) @ 11:11

I found it a bit confusing.  Especially at the end, the “similarity” score for each method.  I have no idea what this is measuring or how to calculate it.


#38    Colin Wyers      (see all posts) 2009/04/16 (Thu) @ 11:17

There’s a link in the notes at the end that goes into some better detail. Basically you take:

SQRT((1B-EqR1B)^2 + (2B-EqR2B)^2 + ... + (BB-EqRBB)^2)

Except in this case, I multiplied each individual term by how often those events occurred and then divided everything by the sum of events before taking the square root.
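
One way to read that description is as a frequency-weighted root-mean-square distance between two sets of weights. The Python sketch below follows that reading; it assumes the frequency multiplies each squared difference, and the function name and sample numbers are hypothetical.

```python
import math


def weighted_distance(weights, ref_weights, frequencies):
    """Frequency-weighted distance between two sets of event weights.

    Each squared difference is multiplied by how often the event occurred,
    the sum is divided by the total number of events, and the square root
    of the result is returned.
    """
    total_events = sum(frequencies[e] for e in weights)
    weighted_sq = sum(
        frequencies[e] * (weights[e] - ref_weights[e]) ** 2 for e in weights
    )
    return math.sqrt(weighted_sq / total_events)


# Hypothetical two-event comparison against a reference estimator.
score = weighted_distance(
    weights={"1B": 0.5, "BB": 0.3},
    ref_weights={"1B": 0.4, "BB": 0.3},
    frequencies={"1B": 3, "BB": 1},
)
print(round(score, 4))
```

Under this reading a score of zero means the two sets of weights are identical, and common events count for more than rare ones.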

Anything else you’re confused about? I’ll confess that I like the theory behind this set of articles but I’m underwhelmed by the results; it’s like when a pitcher feels fine but for whatever reason doesn’t have their best stuff.


#39    Peter Jensen      (see all posts) 2009/04/16 (Thu) @ 11:50

Colin - There is still no substance to this analysis.  Your “plus one” method is basically an alternative method for calculating linear weights without the theoretical basis that the value added method or additional runs in an inning method has.  Runs are scored in units of innings, not games.  Each new inning starts the same way and ends the same way.  As weskelton correctly points out in post #34 above, what happens in the fifth inning has zero impact on what happens in the first.  The estimators that do well in this “test” are the ones that are closest in construction to linear weights.  No surprise here, since you have merely calculated linear weights in a slightly less precise way.

I hope you are done with this series.  You are trying to measure something in a relatively simple way that is very complicated to measure.  The only way that I can think of to test run estimators is to construct a simulation that is highly accurate in estimating individual game scores.  You could then test how well the different run estimators applied to the players in a constructed lineup playing against an average team can estimate the results of the simulator.  Even that might have some problems.  I haven’t really thought it through completely.


#40    Patriot      (see all posts) 2009/04/16 (Thu) @ 12:17

A couple of comments:

1. Colin says:

The formula to convert rates to runs is typically something along the lines of:

(2*(Rate/LgRate)-1) * PA * R_PA

I think it would be helpful to give the actual slope and intercept of the regression line for each rate stat, rather than the OPS and EqR case of 2 and -1. 

2. I don’t agree that for the purpose of evaluating a rate, it needs to be regressed in terms of runs per out.  Honestly, I can’t tell from the article if that’s what Colin did or not--maybe he used the R/PA regression to estimate runs and then turned that figure into R/O.  This part isn’t clear to me.

3. I do agree with Peter that the matched game pair method isn’t particularly helpful.  But I don’t agree with his assertion that Colin should stop writing about this topic (unless he’s already decided the series is over on his own, of course).

As an aside, the new issue of By the Numbers at Phil’s site has an article by Richard Schell titled “A Method for Estimating Run Creation”.  He says that he used a Markov model to derive some simplified equations that can be used to make a BsR-esque model of the scoring process. 

It’s not a very clear article at all--he doesn’t explain HOW he used the Markov model to derive the simplified equations, or really even spell out what his final formula is.  So my comments should be taken with a grain of salt, since I may be missing something, or we don’t have the complete picture.

His formula has a constant labeled k2 in the denominator that appears to me to throw off its accuracy for the extreme case in which a team makes no outs or almost no outs relative to a huge quantity of offense (say 100 walks and 3 outs in an inning). 

In his footnotes he cites two as of yet unpublished papers, one called “Using Markov Models as a Tool for Run Estimation”, which hopefully will address the concerns I had above about how the model was derived, and one called “Linear Weights from Non-Linear Run Estimators”, which sounds a lot like the article I published in BTN a couple years ago.  But perhaps he has a different approach than partial derivatives.


#41    dave smyth      (see all posts) 2009/04/16 (Thu) @ 15:12

For me the .86 runs for a 3b is a big problem. If I can’t trust that result, why should I trust the others, such as HR=1.36 or 2b=.72? I don’t understand why Colin kept going at that point, judging the various run estimators against a seemingly flawed standard, and publishing the results of his testing as though they are worth looking at.

Of course, if Colin or someone else shows that the .86 is indeed correct, and all of the previous studies showing 1.05 or so for a 3b are wrong, I’ll be happy to give him his props. I just don’t think it’s gonna happen…


#42    Tangotiger      (see all posts) 2009/04/16 (Thu) @ 15:31

Patriot/40 is referring to this article:
http://www.philbirnbaum.com/btn2009-02.pdf

Good job by Rick using BaseRuns as the basis, and going from there.  I don’t understand most of the math.  I don’t know what kind of shortcuts he took, etc.  He did separate the HR, which is a clear tip-o-cap to BsR.  I’m guessing what he did was good.

I’ll take exception to this footnote:

1 David Smyth asked in an online forum whether sacrifice flies should be given special treatment in run estimation models. In an early version of the run estimation model described here, sacrifice flies were given the same treatment as home runs; in the current version, they are treated as a variable part of V.
Both treatments yield roughly the same “value” for sacrifice flies. Neither treatment is right or wrong.

Well, it’s wrong and wrong, especially if you are trying to compare your metric to others that do not treat SF in such a biased manner.

You know what, for my next run estimator, I’m going to count only the singles, doubles, triples, walks, and outs that drive in a runner.  No reason SF should be the only one to get this bias.

He also says:

Some of the advantage that the model has over RC, XR, and BsR exist because it incorporates reached on error (ROE), while the others do not. It requires a bit of manipulation, but both BsR and XR can be extended to include ROE;

But my version does include it, plus all the other events from Retrosheet:
http://www.tangotiger.net/bsrexpl.html

Rick contacted me a little while ago but I don’t remember what I said.  I think I may have brought up the above, but I’d have to check my sent box at home.


#43    david smyth      (see all posts) 2009/04/19 (Sun) @ 06:09

With regard to using regression to generate a run estimator and getting weird values...has anyone tried doing the regression using just hits and extra bases instead of the 4 hit types? This would seem to avoid the problem with, say, the 3b, which has a small sample size and some strong associations such as speed.

Has anyone ever tried just using H, (TB-H), (BB-IBB), and (AB-H)?

This would seem to be less ‘precise’, but more ‘accurate’ (according to a recent post I saw on that distinction).


#44    Colin Wyers      (see all posts) 2009/04/19 (Sun) @ 14:32

Just did as David suggested, for 1993-2008. The weights come out to:

.56 1B
.83 2B
1.10 3B
1.37 HR
.36 BB
-.12 Out

You get a too-high value for the batting events, and a too-high value for the out to compensate.
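
The listed hit weights step up by the same amount per extra base, which is what a fit on H and (TB-H) forces. A small Python sketch of that expansion follows, assuming a hit value of .56 and an extra-base value of .27 (the gap between adjacent weights above); the function name is made up for illustration.

```python
def implied_weights(hit_value, extra_base_value):
    """Expand an H / (TB-H) fit into one weight per hit type."""
    extra_bases = {"1B": 0, "2B": 1, "3B": 2, "HR": 3}
    return {
        hit: round(hit_value + n * extra_base_value, 2)
        for hit, n in extra_bases.items()
    }


hit_weights = implied_weights(hit_value=0.56, extra_base_value=0.27)
print(hit_weights)
```

Because every extra base is worth the same under this model, the triple can no longer drift away from the double and home run the way it can when each hit type gets its own coefficient.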


#45    dave smyth      (see all posts) 2009/04/19 (Sun) @ 15:59

Isn’t it the case that, using only the major events, we ‘should’ get higher values to compensate for the missing info? Well, that’s what usually happens, anyway.

The events which are high are on-base / out events. The extra bases are low (.27 instead of .31 or so). That is interesting and makes sense.

As long as I understand that the .56 represents not a 1b, but a 1b+, etc., I have no problem with that formula. In fact, I rather like it, for most of the purposes for which I might use a simple run estimator.

