Tuesday, July 18, 2006
Linear Weights by Run Environment
I took Ruane's year-by-year linear weights (LWTS) data on Retrosheet and grouped it by run environment to get this:
RunsMin_RunsMax_n___RperI____1B_____2B_____3B_____HR_____NIBB
0.000___0.439___15___0.419___0.436___0.720___0.986___1.402___0.279
0.440___0.459___21___0.451___0.448___0.740___1.017___1.411___0.292
0.460___0.479____8___0.469___0.454___0.749___1.005___1.404___0.295
0.480___0.499___11___0.489___0.460___0.761___1.052___1.409___0.305
0.500___0.519___17___0.508___0.459___0.764___1.060___1.399___0.303
0.520___0.539____4___0.525___0.471___0.768___1.078___1.396___0.314
0.540___0.559____7___0.550___0.474___0.768___1.043___1.393___0.324
0.560___1.000____7___0.581___0.489___0.791___1.049___1.404___0.337
What I did was take each league-season's run environment and put it in one of these 8 bins. For example, the 1996 AL had .602 runs per inning and was added to the last bin. The 1968 NL had .377 runs per inning and was added to the first bin. “n” is the number of league-seasons per bin. RperI is the average runs per inning of the league-seasons in that bin. The other columns are simply the average run values of each event (NIBB is the non-intentional walk), based on the Ruane data:
http://www.retrosheet.org/Research/RuaneT/valueadd_art.htm
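For anyone who wants to repeat the grouping, here is a minimal Python sketch of the binning and averaging described above. The bin edges come from the RunsMin column of the table. The function names (bin_index, average_by_bin) and the input layout are my own assumptions for illustration, not anything from the Ruane data files, and the averages are plain unweighted means over league-seasons.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower edge of each of the 8 bins, copied from the RunsMin column.
BIN_LOWER_EDGES = [0.000, 0.440, 0.460, 0.480, 0.500, 0.520, 0.540, 0.560]


def bin_index(runs_per_inning):
    """Return the 0-based bin for a league-season's runs per inning."""
    if runs_per_inning < 0:
        raise ValueError("runs per inning cannot be negative")
    return bisect_right(BIN_LOWER_EDGES, runs_per_inning) - 1


def average_by_bin(league_seasons):
    """Average run values within each bin.

    league_seasons: iterable of (runs_per_inning, {event: run_value}) pairs
    (a hypothetical layout; adapt it to however you load the data).
    Returns {bin_index: {"n": count, "RperI": mean, event: mean, ...}}.
    """
    grouped = defaultdict(list)
    for rpi, values in league_seasons:
        grouped[bin_index(rpi)].append((rpi, values))

    result = {}
    for idx, rows in sorted(grouped.items()):
        n = len(rows)
        summary = {"n": n, "RperI": sum(rpi for rpi, _ in rows) / n}
        for event in rows[0][1]:
            summary[event] = sum(values[event] for _, values in rows) / n
        result[idx] = summary
    return result


print(bin_index(0.602))  # 1996 AL -> 7, the last bin
print(bin_index(0.377))  # 1968 NL -> 0, the first bin
```

Each league-season counts once in its bin, whatever the number of innings played, which matches the "simply the average" description above.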
As you can see, as the run environment increases, the run value of all the events, save for one, pretty much increases. That one event, the HR, has a pretty static value of 1.40 runs. Fans of BaseRuns are not surprised by this.
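One quick way to check this against the table is to compare the lowest and highest run environments. This is only a sketch: the numbers are copied from the first and last rows of the table above, and the variable names are mine.

```python
# Run values from the first (0.419 RperI) and last (0.581 RperI) bins.
LOW_BIN = {"1B": 0.436, "2B": 0.720, "3B": 0.986, "HR": 1.402, "NIBB": 0.279}
HIGH_BIN = {"1B": 0.489, "2B": 0.791, "3B": 1.049, "HR": 1.404, "NIBB": 0.337}

# Relative change in each event's run value from the lowest to the highest bin.
change = {event: HIGH_BIN[event] / LOW_BIN[event] - 1 for event in LOW_BIN}

for event, pct in change.items():
    print(f"{event:>4}: {pct:+.1%}")
```

The home run moves by well under 1%, while the single and the walk move by more than 10%.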
There is a lot of research out there that runs regressions against team-seasonal data to try to get to these numbers. You end up with ERP, XR, Jarvis numbers, and a bevy of run values for each event. All of that can now be discarded. There is no reason whatsoever to rely on such limited data when we have actual play-by-play data. And, as a testament to the Palmer framework in The Hidden Game, the above numbers are much closer to the Palmer run values of 25 years ago than to what anyone else has come up with since. The reason is that once you’ve got a reasonable working model, like a Markov model or a decent simulator, you don’t need regressions.
Another way to look at the data is with respect to the run value of the single:
RperI __2b-1b__3b-1b__1b-bb
0.419 __ 0.28 __ 0.55 __ 0.16
0.451 __ 0.29 __ 0.57 __ 0.16
0.469 __ 0.30 __ 0.55 __ 0.16
0.489 __ 0.30 __ 0.59 __ 0.16
0.508 __ 0.31 __ 0.60 __ 0.16
0.525 __ 0.30 __ 0.61 __ 0.16
0.550 __ 0.29 __ 0.57 __ 0.15
0.581 __ 0.30 __ 0.56 __ 0.15
As you can see, the run values of a single and a double are around .30 runs apart, and the walk and the single are around .15 to .16 runs apart. All this is regardless of the run environment.
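The gaps can be recomputed directly from the first table. Here is a short sketch, with the rows copied from that table and the names ROWS and gaps being my own. Because it starts from the three-decimal averages, a value may differ in the last digit from the table above, which was presumably built from unrounded data.

```python
# (RperI, 1B, 2B, 3B, HR, NIBB) for each bin, copied from the first table.
ROWS = [
    (0.419, 0.436, 0.720, 0.986, 1.402, 0.279),
    (0.451, 0.448, 0.740, 1.017, 1.411, 0.292),
    (0.469, 0.454, 0.749, 1.005, 1.404, 0.295),
    (0.489, 0.460, 0.761, 1.052, 1.409, 0.305),
    (0.508, 0.459, 0.764, 1.060, 1.399, 0.303),
    (0.525, 0.471, 0.768, 1.078, 1.396, 0.314),
    (0.550, 0.474, 0.768, 1.043, 1.393, 0.324),
    (0.581, 0.489, 0.791, 1.049, 1.404, 0.337),
]


def gaps(row):
    """Express the double, triple, and walk relative to the single."""
    rpi, single, double, triple, _hr, walk = row
    return {
        "RperI": rpi,
        "2B-1B": double - single,
        "3B-1B": triple - single,
        "1B-BB": single - walk,
    }


print("RperI  2B-1B  3B-1B  1B-BB")
for row in ROWS:
    g = gaps(row)
    print(f"{g['RperI']:.3f}  {g['2B-1B']:5.2f}  {g['3B-1B']:5.2f}  {g['1B-BB']:5.2f}")
```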
For those who have never seen it, here is custom linear weights data, based on the BaseRuns framework: