THE BOOK--Playing The Percentages In Baseball


Friday, October 13, 2006

Run Estimator for Softball Leagues

By Tangotiger, 02:47 PM

I was given some softball league stats by a reader, which looked like this for BA/OBP/SLG:
.440/.500/.565

Basically, lots of singles and walks, and a few extra base hits sprinkled in.  The data looks like it’s a 7-inning league, with an average of 12.4 runs scored per game.

When I ran it through BaseRuns, I got 9.1 runs per game, which is a hefty difference.


I’m also assuming no reached on error, as the data makes me believe that everything was either a hit, out, or sac.

To reconcile the estimate with the actual runs scored, I had to double the “B” factor, which is a huge fudge.

That is, while a MLB team with these stats would score 44.5% of its runners who reach base, this softball league scores 61.6% of its runners.
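As a rough check on those two percentages, here is a minimal sketch. It assumes the usual BaseRuns structure, runs = A*B/(B+C) + D, where B/(B+C) is the share of baserunners who score. The function name is made up for illustration, and the actual A, B, C, D inputs for this league are not shown in the post.

```python
def rate_after_scaling_b(rate, factor):
    """Share of runners scoring, B/(B+C), after multiplying B by `factor`.

    `rate` is the original B/(B+C). Only the ratio B/C matters,
    so the individual B and C values are not needed.
    """
    ratio = rate / (1 - rate)      # B/C implied by the original rate
    scaled = factor * ratio        # (factor * B) / C
    return scaled / (1 + scaled)   # back to a share of baserunners


# Doubling B turns the MLB-style 44.5% into the softball figure.
print(round(rate_after_scaling_b(0.445, 2), 3))  # 0.616
```

So the 61.6% is exactly what doubling “B” does to a 44.5% score rate; no other input changes.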

With this new fudged version of BsR, here are the Linear Weights for this softball league:
BB: .62
1B: .81
2B: 1.16
3B: 1.59
HR: 1.48
out: -.84

Clearly, we have an issue here with the triple (1.59) coming out higher than the HR (1.48), but we’ve known about that from the beginning.

Now, in this softball league, like I said, the OBP is .500.  How often should the runner from 3B score?  Let’s make it very exaggerated and say that every out is a sac fly.  So, with 0 outs, the runner from 3B will ALWAYS score.  Same with 1 out.  With 2 outs, he’ll score half the time.  So, the best we can hope for is, on a triple, the runner will score 83% of the time.

Seeing that 5% of the outs are K, and let’s assume that 5% of the other outs the runner on 3B stays put, and on the other 90% of the outs, the runner on 3B scores (still exaggerated), a guy hitting a triple will eventually score 82% of the time.  Let’s stick with that figure.
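The 83% and 82% figures can be reproduced with a small sketch. It rests on assumptions that the text only implies: the runner scores whenever the batter reaches base (part of the exaggeration), the three out states are weighted equally, and an inning has three outs. The function name is hypothetical.

```python
def p_score_from_third(p_reach, p_out_scores, outs_per_inning=3):
    """Chance the runner on 3B eventually scores, by number of outs.

    p_reach      -- chance the batter reaches base (runner assumed to score)
    p_out_scores -- chance an out brings the runner home (e.g. a sac fly)
    Returns a list indexed by outs: [0 outs, 1 out, 2 outs].
    """
    p_out = 1 - p_reach
    probs = [0.0] * outs_per_inning
    # Work backwards from the last out state.
    for outs in reversed(range(outs_per_inning)):
        if outs == outs_per_inning - 1:
            after_out = 0.0  # the out ends the inning
        else:
            # Either the out scores him, or he waits for the next batter.
            after_out = p_out_scores + (1 - p_out_scores) * probs[outs + 1]
        probs[outs] = p_reach + p_out * after_out
    return probs


def average(values):
    return sum(values) / len(values)


print(round(average(p_score_from_third(0.5, 1.0)), 2))  # 0.83
print(round(average(p_score_from_third(0.5, 0.9)), 2))  # 0.82
```

With every out a sac fly the by-out chances are 100%, 100%, and 50%. Letting 10% of outs leave the runner on 3B barely moves the 0-out and 1-out numbers, which is why the average only drops by a point.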

Now, if that’s the case, how often will a guy getting on base via a double score?  Let’s just assume 75% of the time.  Since we know that the team scored 668 runs, we simply solve for “x”, the % of time a runner scores from 1B, and that is 59%.

So, the getting on value for a walk, single, or HBP is .59 runs.  BsR estimates the total run value of a BB is .62, meaning the “moving over” value of a walk is only .03 runs.

The moving over value of the single is .22 runs, .41 runs for a double, .77 for a triple, and .48 for a HR.  What is interesting about the run value of “moving runners over” for these (if you average out the triple and HR to .62) is that they are exactly what MLB has!  That is, the “moving runners over” impact is just about independent of the run environment.
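The split into “getting on” and “moving over” is simple subtraction, sketched below with the numbers from this post. The one value not stated above is the getting-on value of a HR, taken as 1.00 because the batter always scores. The variable names are just for illustration.

```python
# Total run values from the fudged BsR, as listed above.
lwts = {"BB": 0.62, "1B": 0.81, "2B": 1.16, "3B": 1.59, "HR": 1.48}

# Chance the batter himself eventually scores ("getting on" value).
# HR = 1.00 is an assumption in the sense that the post does not state it.
getting_on = {"BB": 0.59, "1B": 0.59, "2B": 0.75, "3B": 0.82, "HR": 1.00}

# Whatever is left over is the value of advancing the runners already on.
moving_over = {ev: round(lwts[ev] - getting_on[ev], 2) for ev in lwts}

for ev, value in moving_over.items():
    print(ev, value)

# Average of the triple and HR, about .62 as quoted above.
triple_hr_avg = (moving_over["3B"] + moving_over["HR"]) / 2
```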

The biggest impact we have is outs, and anyone making an out is doing a really bad job.

Anyway, BaseRuns does a terrible job with the triples, and this should be addressed immediately.

Next time, I’ll use my basic-Markov run modeller (which I know I promised to release, and I will… I just haven’t spent the appropriate time to make it useable), to see what LWTS values we *should* be assigning.  And we’ll also see if BaseRuns broke down because it doesn’t work at high run environments, or if it’s because the baserunning movements were based on MLB levels (say, 1B to 3B on a single is 33%).  Ideally, it’s because of the latter, and therefore, the fudge I applied was totally acceptable, and serves to address this issue.  If it’s not because of that, then we need to rethink BaseRuns.

Stay tuned…

#1    Tangotiger    2006/10/13 (Fri) @ 14:53

I just got a note that the team reaches base on error 3 times a game, with about 2 baserunning outs per game, along with 1 WP per game.  I’ll have to rework everything, since this will clearly impact what I’ve just done.


#2    David Smyth    2006/10/13 (Fri) @ 15:28

I certainly am not surprised that a softball league with those odd stats, plus the RBOE activity, does not conform to BsR. Yes, BsR is a “fundamental” type formula, but it certainly carries MLB assumptions about baserunner movement. The weights in the B factor are empirical, not theoretical. I mean, if you applied those B weights to little league, I am sure it would be off by a mile.

The surprise, for me, is that anyone would think that those weights are somehow inherent or universal. It’s certainly possible that the “structure” of the formula is universal, given the same rules of play. But the weights certainly are not. And that applies to the triples problem. The linear B factor is also just a model of a more complex function. If someone wants to solve that problem, that’s great. But then, how will those changes apply to a softball or little league?


#3    2006/10/14 (Sat) @ 20:31

I think the more useful information to take back to the league from this analysis is the value of an out.

At -0.84 runs, it is imperative to get any out possible on defense and avoid outs on offense.  I tried to stress this to an over-30 league I played in.  It had a high run environment (about 7 runs per game per team), and giving up bases and not getting outs were huge.

Not being real conversant in BaseRuns - how important is the 3B factor?  In most softball leagues, even more than MLB, the triple is rare due to short fences.  Is the frequency of an event reflected in the linear weight - or equivalently, will the low frequency not affect comparisons to actual data if the weight is appreciably “off”?


#4    James Holzhauer    2006/10/15 (Sun) @ 16:29

Re: Dufman, it’s impossible to teach non-mathematical types to appreciate the value of outs.  You will still see them running themselves out of inning after inning, or trying hopelessly to get the lead runner out, even in a 25-21 slugfest.

I’m under the impression that many of these statistics break down in extreme environments, especially if they use ratios or don’t adjust for context.


#5    tangotiger    2006/10/16 (Mon) @ 08:53

I re-ran using the new data:
1 - Reached on error was added in (3.4 per game!)
2 - Sac flies were included (I’m assuming the “sac” category supplied was SF)
3 - WP, PB, BK (1 per game)
4 - NO fudge factor to the “B” component

Results?  BaseRuns, using “MLB assumptions” predicted 678 runs in 54 games.  Actual? 668.  Bingo!

The LWTS values are: .73, 1.07, 1.51, 1.49, .54 for the basic hitting categories.  RBOE, WP are .75, .29.  Run value of the out is -.89.

The triples value is of course problematic, but not so egregious this time.

The most important aspect is the out value, which is enormous, as we know.  Batting outs, and even worse baserunning outs, are terrible prices to pay.

I think the players know this, since only 20% of the hits are for extra bases (MLB is 30%), with more triples than HR.

Later this afternoon, I’ll produce the LWTS values using a simple-Markov process.  This won’t have an issue with the triples.


#6    Chris Miller    2006/10/16 (Mon) @ 10:10

This reminds me of something similar I did w/ Wiffleball leagues whose stats I could pull off of the web.  I pulled random leagues off the net, based on which stats were available.  I was mostly looking for traditional wiffleball, a 2 person game, with no base-running, 1 defender, and fixed advancement (1B = 1 base advanced, 2B = 2 bases, etc.).

Several leagues I looked at (sorry, don’t have R/G available right now) fit best to OPS and basic RC.  I didn’t try very hard to fit them with Base-Runs components, but found OPS and RC more accurate (without modifying the B component, so I probably could have improved the accuracy).

I do remember another league (again, don’t know R/G off top of my head), which had baserunning, 3 outs per inning, multiple defenders, and multiple batters, and it was a nearly exact fit plugging in the basic Base-Runs formula with no modifications, more-so than OPS or RC.

I think I saved the info to a spreadsheet; I’ll have to look when I get home.  I wasn’t so interested in modelling a specific league as I was in testing the fixed advancement, which I assumed would make wiffle ball easier to model.  Run creation in traditional wiffle should model easily and perfectly in a simulator because of the fixed advancement.  All you should need is 1B%, 2B%, 3B%, HR%, BB%, and outs per inning.
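A fixed-advancement simulator of that kind might look like the sketch below. This is not the spreadsheet mentioned above, just an illustration with made-up function names. It assumes every runner advances exactly as many bases as the hit, a walk only forces runners along, and the event rates do not depend on the base/out state.

```python
import random

ADVANCE = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def play_inning(events, outs_per_inning=3):
    """Runs scored in one inning for a sequence of events.

    Events are "out", "BB", "1B", "2B", "3B", or "HR".
    """
    runs, outs = 0, 0
    runners = set()  # occupied bases, e.g. {1, 3}
    for ev in events:
        if ev == "out":
            outs += 1
            if outs == outs_per_inning:
                break
        elif ev == "BB":
            # Forced advancement only: fill the first empty base.
            base = 1
            while base in runners:
                base += 1
            if base == 4:
                runs += 1  # bases loaded, runner on 3B is forced home
            else:
                runners.add(base)
        else:
            n = ADVANCE[ev]
            moved = {b + n for b in runners} | {n}  # runners plus batter
            runs += sum(1 for b in moved if b >= 4)
            runners = {b for b in moved if b <= 3}
    return runs


def random_events(rates, rng):
    """Endless stream of events drawn with the given probabilities."""
    names = list(rates)
    weights = [rates[name] for name in names]
    while True:
        yield rng.choices(names, weights)[0]


def average_runs(rates, innings=10000, seed=0):
    """Average runs per inning; `rates` must give "out" a positive rate."""
    rng = random.Random(seed)
    total = sum(play_inning(random_events(rates, rng)) for _ in range(innings))
    return total / innings
```

Calling `average_runs` with a league's event rates (including the out rate) gives runs per inning, which can then be compared against the league's actual scoring.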


#7    tangotiger    2006/10/16 (Mon) @ 11:20

A simple-Markov approach gives us these run values, with the MLB version in parentheses:
1b 0.85 (0.49)
2b 1.12 (0.77)
3b 1.30 (1.07)
hr 1.52 (1.48)
bb 0.73 (0.37)

In both cases, the assumption is that there are no outs on base, nor any basestealing, that the “runner advancement on hits and outs” follows the typical MLB patterns, and that the frequency of each event is independent of the base/out state.  The model is 100% pure math, meaning it’s “perfect” under these assumptions.

As we can see, the gain in singles, doubles, and walks is virtually the same between the two leagues, 0.35-0.36 runs.  The gain in triples is only 0.23 runs, while the gain in HR is .04 runs.  What this means is that we’re still in the upward slope of the marginal run model.
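The gains quoted here are just the softball value minus the MLB value for each event. A quick sketch, using only the two columns listed above:

```python
# Simple-Markov run values from the list above.
softball = {"1B": 0.85, "2B": 1.12, "3B": 1.30, "HR": 1.52, "BB": 0.73}
mlb = {"1B": 0.49, "2B": 0.77, "3B": 1.07, "HR": 1.48, "BB": 0.37}

# Gain in run value when moving from the MLB to the softball environment.
gain = {ev: round(softball[ev] - mlb[ev], 2) for ev in softball}

for ev, value in gain.items():
    print(ev, value)
```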

Remember, at some point, all the run values will converge toward 1.000, meaning that the run values of the extra base hits will be lower, the higher the run environment.  We haven’t reached that point here.

So, if you want to use some LWTS values for your softball league, where the OBP is .500, and 20% of your hits are for extra bases, use these above values.  Set the run value of the out so that your totals come out to zero.
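Setting the out value so the totals come to zero can be sketched like this. The event counts below are invented for illustration (chosen to give a .500 OBP with 20% of hits for extra bases), so swap in your own league totals. The function name is hypothetical.

```python
def out_value(weights, counts, outs):
    """Run value of an out that makes the league's total LWTS equal zero."""
    positive = sum(weights[ev] * counts[ev] for ev in weights)
    return -positive / outs


# Simple-Markov run values from above.
weights = {"1B": 0.85, "2B": 1.12, "3B": 1.30, "HR": 1.52, "BB": 0.73}

# Hypothetical league totals, NOT the reader's actual data.
counts = {"1B": 400, "2B": 60, "3B": 25, "HR": 15, "BB": 100}
outs = 600

value = out_value(weights, counts, outs)
total = sum(weights[ev] * counts[ev] for ev in weights) + value * outs
print(round(value, 2), round(total, 6))
```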

(Note: the simple-Markov approach estimated 13% more runs than actually scored for the softball league.  The reason is that baserunner outs are not part of the model.)

So, BaseRuns does a great job here, just blowing it on the triples.

