Tuesday, July 18, 2006
Linear Weights by Run Environment
I took the Ruane year-by-year LWTS data on Retrosheet and grouped it by run environment to get this:
RunsMin_RunsMax_n___RperI____1B_____2B_____3B_____HR_____NIBB
0.000___0.439___15___0.419___0.436___0.720___0.986___1.402___0.279
0.440___0.459___21___0.451___0.448___0.740___1.017___1.411___0.292
0.460___0.479____8___0.469___0.454___0.749___1.005___1.404___0.295
0.480___0.499___11___0.489___0.460___0.761___1.052___1.409___0.305
0.500___0.519___17___0.508___0.459___0.764___1.060___1.399___0.303
0.520___0.539____4___0.525___0.471___0.768___1.078___1.396___0.314
0.540___0.559____7___0.550___0.474___0.768___1.043___1.393___0.324
0.560___1.000____7___0.581___0.489___0.791___1.049___1.404___0.337
I took each league-season's run environment and put it in one of these 8 bins. For example, the 1996 AL had .602 runs per inning, and was added to the last bin. The 1968 NL had .377 runs per inning, and was added to the first bin. “n” is the number of league-seasons per bin. RperI is the average runs per inning for the bin. The other columns are simply the average run values, based on the Ruane data:
http://www.retrosheet.org/Research/RuaneT/valueadd_art.htm
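The binning step can be sketched in a few lines of Python. This is only an illustration of the grouping described above, not the code actually used; the names `BIN_EDGES`, `bin_index` and `average_by_bin` are made up for the example, and the input format (a runs-per-inning figure paired with a dict of run values) is an assumption.

```python
from bisect import bisect_right

# Lower edges of bins 2..8, read from the RunsMin column of the table above.
BIN_EDGES = [0.440, 0.460, 0.480, 0.500, 0.520, 0.540, 0.560]

def bin_index(runs_per_inning):
    """Return the 0-based bin for a league-season's runs per inning."""
    return bisect_right(BIN_EDGES, runs_per_inning)

def average_by_bin(league_seasons):
    """Average each event's run value within each bin.

    league_seasons: iterable of (runs_per_inning, {event: run_value}) pairs
    (an assumed input shape, chosen for the example).
    """
    sums, counts = {}, {}
    for rpi, values in league_seasons:
        b = bin_index(rpi)
        counts[b] = counts.get(b, 0) + 1
        bucket = sums.setdefault(b, {})
        for event, value in values.items():
            bucket[event] = bucket.get(event, 0.0) + value
    return {b: {e: s / counts[b] for e, s in sums[b].items()} for b in sums}

# The two league-seasons named in the post:
print(bin_index(0.602))  # 1996 AL
print(bin_index(0.377))  # 1968 NL
```

With 0-based numbering, the 1996 AL lands in bin 7 (the last) and the 1968 NL in bin 0 (the first).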
As you can see, as the run environment increases, the run value of every event but one generally increases as well. The exception, the HR, holds a nearly static value of about 1.40 runs. Fans of BaseRuns will not be surprised by this.
There is a lot of research out there that runs regressions against team-seasonal data to try to get to these numbers. You end up with ERP, XR, Jarvis numbers, and a bevy of run values for each event. All of that can now be discarded. There is no reason whatsoever to rely on such limited data when we have actual play-by-play data. And, as a testament to the Palmer framework in The Hidden Game, the above numbers are much closer to the Palmer run values of 25 years ago than what anyone else has come up with since. The reason is that once you’ve got a reasonable working model, like a Markov or decent simulator, you don’t need regressions.
Another way to look at the data is with respect to the run value of the single:
RperI __2b-1b__3b-1b__1b-bb
0.419 __ 0.28 __ 0.55 __ 0.16
0.451 __ 0.29 __ 0.57 __ 0.16
0.469 __ 0.30 __ 0.55 __ 0.16
0.489 __ 0.30 __ 0.59 __ 0.16
0.508 __ 0.31 __ 0.60 __ 0.16
0.525 __ 0.30 __ 0.61 __ 0.16
0.550 __ 0.29 __ 0.57 __ 0.15
0.581 __ 0.30 __ 0.56 __ 0.15
As you can see, the run values of the single and the double are around .30 runs apart, and the walk and the single are around .15 to .16 runs apart. All this is regardless of the run environment.
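For anyone who wants to reproduce the gaps, here is a small Python sketch that recomputes them from the binned averages in the first table. The variable and function names are just for illustration; the numbers are copied from the table. Because the inputs are already rounded to three decimals, a recomputed gap can differ from the printed one in the last digit.

```python
# (RperI, 1B, 2B, 3B, HR, NIBB) for each bin, copied from the first table.
ROWS = [
    (0.419, 0.436, 0.720, 0.986, 1.402, 0.279),
    (0.451, 0.448, 0.740, 1.017, 1.411, 0.292),
    (0.469, 0.454, 0.749, 1.005, 1.404, 0.295),
    (0.489, 0.460, 0.761, 1.052, 1.409, 0.305),
    (0.508, 0.459, 0.764, 1.060, 1.399, 0.303),
    (0.525, 0.471, 0.768, 1.078, 1.396, 0.314),
    (0.550, 0.474, 0.768, 1.043, 1.393, 0.324),
    (0.581, 0.489, 0.791, 1.049, 1.404, 0.337),
]

def gaps(row):
    """Return (2b-1b, 3b-1b, 1b-bb) for one row of the table."""
    _, single, double, triple, _, walk = row
    return double - single, triple - single, single - walk

for row in ROWS:
    d, t, w = gaps(row)
    print(f"{row[0]:.3f}  {d:.2f}  {t:.2f}  {w:.2f}")

# Spread of each gap across the eight run environments.
for name, col in zip(("2b-1b", "3b-1b", "1b-bb"), zip(*map(gaps, ROWS))):
    print(name, f"min={min(col):.3f} max={max(col):.3f}")
```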
For those who have never seen it, here is custom linear weights data, based on the BaseRuns framework:
I’m just fine-tuning my simple Markov program, which I will release shortly. The major disadvantage of the program is that it doesn’t handle outs on base. This is a problem of course, but for such a simple program, it’s still pretty powerful. The problem is neatly balanced out by treating all reached-on-error events as outs. For example, by doing so for 1974-1990 data, I get 4.27 runs per game, instead of the actual 4.29 runs per game.
Anyway, one of the very cool things is that we’ll be able to generate an RE chart for any run environment you can think of, even for a million runs per game!
The other cool thing, which is an offshoot of the RE, is that we can get Linear Weights. For example, in the 1974-1990 run environment, I get these LWTS values, for walks, singles, doubles, triples, HR:
.37, .49, .77, 1.07, 1.48
The walk value is much higher than I would want, and the single is a little bit higher than I want, as is the HR. The reason for the walk is that walks are distributed evenly in my program, when in reality they are not. This causes a .02 run effect. As for the single and the walk, because there are no outs on base (principally DPs), the run value of being on 1B looks higher than it should be. Not sure yet why the HR is where it is, but chances are it has to do with the outs on base issue.
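To make the RE-to-LWTS step concrete, here is a minimal Python sketch of a Markov calculation of this general kind. It is not the program described above: the event rates in `EVENTS` are illustrative placeholders rather than 1974-1990 data, and the runner-advancement rules in `advance` are simplifying assumptions (for instance, a single always scores a runner from second). Like the program described, it has no outs on base and spreads walks evenly across base-out states. The linear weight of an event is taken as the average change in run expectancy plus runs scored on the play, weighted by how often each base-out state comes up.

```python
# Illustrative per-PA event rates (placeholders, not real league data).
EVENTS = {"BB": 0.08, "1B": 0.165, "2B": 0.04, "3B": 0.005, "HR": 0.02, "OUT": 0.69}
STATES = [(b, o) for b in range(8) for o in range(3)]  # (base mask, outs)

def advance(bases, event):
    """Return (new_bases, runs). Bit 0 = runner on 1B, bit 2 = runner on 3B."""
    on1, on2, on3 = bases & 1, (bases >> 1) & 1, (bases >> 2) & 1
    if event == "OUT":  # no outs on base: runners hold
        return bases, 0
    if event == "BB":   # only forced runners move
        runs = 1 if bases == 7 else 0
        return 1 | ((on1 | on2) << 1) | ((on3 | (on1 & on2)) << 2), runs
    if event == "1B":   # assumed: 2B/3B runners score, 1B runner stops at 2B
        return 1 | (on1 << 1), on2 + on3
    if event == "2B":   # assumed: 2B/3B runners score, 1B runner stops at 3B
        return 2 | (on1 << 2), on2 + on3
    if event == "3B":
        return 4, on1 + on2 + on3
    return 0, on1 + on2 + on3 + 1  # HR

def step(bases, outs, event):
    new_bases, runs = advance(bases, event)
    return new_bases, outs + (1 if event == "OUT" else 0), runs

def run_expectancy(p, sweeps=1000):
    """Expected runs to the end of the inning from each base-out state."""
    re = {s: 0.0 for s in STATES}
    for _ in range(sweeps):  # fixed-point iteration on RE = sum p*(runs + RE')
        for b, o in STATES:
            total = 0.0
            for ev, prob in p.items():
                nb, no, runs = step(b, o, ev)
                total += prob * (runs + (re[(nb, no)] if no < 3 else 0.0))
            re[(b, o)] = total
    return re

def state_visits(p, sweeps=1000):
    """Expected number of plate appearances per inning in each state."""
    visits = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        new = {s: 0.0 for s in STATES}
        new[(0, 0)] = 1.0  # every inning starts bases empty, none out
        for (b, o), v in visits.items():
            for ev, prob in p.items():
                nb, no, _ = step(b, o, ev)
                if no < 3:
                    new[(nb, no)] += v * prob
        visits = new
    return visits

def linear_weights(p):
    """Return (RE table, {event: average run value})."""
    re, visits = run_expectancy(p), state_visits(p)
    total = sum(visits.values())
    lw = {}
    for ev in p:
        value = 0.0
        for (b, o), v in visits.items():
            nb, no, runs = step(b, o, ev)
            after = re[(nb, no)] if no < 3 else 0.0
            value += v / total * (runs + after - re[(b, o)])
        lw[ev] = value
    return re, lw

re, lw = linear_weights(EVENTS)
print("runs per inning:", round(re[(0, 0)], 3))
print({ev: round(v, 2) for ev, v in lw.items()})
```

Raising the on-base rates in `EVENTS` (and lowering `OUT` to match) is the equivalent of bumping up the run environment. The exact values depend heavily on the assumed advancement rules, so they should not be expected to match the numbers quoted in this post.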
Now, let’s bump up our run environment so that our team has an OBP of .500. The runs per game is now at 14.4. What is our new LWTS?
.63, .76, 1.02, 1.24, 1.52
We can see here how the HR run value barely moves up (by .04 runs), while the walk, single, and double all move up by around .25 runs, and the triple by .17.
What if we shoot up the run environment so that the OBP is .750? Now, we have 65 runs per game. The new LWTS values are:
.88, .96, 1.09, 1.18, 1.31
We see here that the run values of most events start to converge towards 1.00, as expected (according to BaseRuns).
Finally, let’s bump up the OBP to .900. That’s 226 runs per game! The run values are:
.96, .99, 1.04, 1.07, 1.12
We can see now that all the events are rapidly converging towards a run value of 1.00.
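One quick way to see why everything heads towards 1.00 is to count baserunners. The sketch below assumes 27 outs per game and that every plate appearance either puts the batter on base or makes an out, with no outs on base (the same simplification as in the program). The function names are invented for the example; the OBP and runs-per-game pairs are the three extreme environments quoted above.

```python
OUTS_PER_GAME = 27  # assumption: nine full innings

def baserunners_per_game(obp):
    """Batters reaching base per game when every out is made at the plate."""
    plate_appearances = OUTS_PER_GAME / (1.0 - obp)
    return plate_appearances * obp

def share_scoring(obp, runs_per_game):
    """Fraction of baserunners (HR included) who come around to score."""
    return runs_per_game / baserunners_per_game(obp)

# (OBP, runs per game) pairs quoted in the post.
for obp, rpg in [(0.500, 14.4), (0.750, 65.0), (0.900, 226.0)]:
    runners = baserunners_per_game(obp)
    print(f"OBP {obp:.3f}: {runners:.0f} baserunners, "
          f"{runners - rpg:.1f} left on base, "
          f"{share_scoring(obp, rpg):.0%} score")
```

As the share of baserunners who score approaches 100%, getting on base by any means is worth close to one run, and the extra bases on a double, triple, or HR add very little, since the runners they drive in would almost all have scored anyway.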
This Markov calculator should become the basis on which to evaluate all run estimators (with some adjustments for the issues discussed).