THE BOOK--Playing The Percentages In Baseball

Saturday, January 17, 2009

Inning-level Linear Weights

By Tangotiger, 10:49 AM

I’ve wanted to do this for the longest time, and finally got around to doing it.


From 1993 to 2008, there have been nearly 600,000 3-out innings, counting only the first 8 innings of each game.  Here is the average number of runs scored in those innings, broken down by the number of HR hit in the inning:

HR   Runs/Inning   Innings
0    0.353         528611
1    1.983         59475
2    3.439         5319
3    4.961         413
4    6.778         36
5    7             1

So, when no HR are hit, which happened in over half a million innings, an average of 0.353 runs were scored.  When exactly 1 HR is hit in an inning, 1.983 runs were scored.  The difference, 1.63 runs, we can attribute to the HR.

Well, not totally, since we didn’t control for the other events (singles, doubles, walks, etc).  While we’d like to think those would be random for each HR class, the reality is that knowing a HR was hit in an inning lets us infer that it was disproportionately allowed (or hit) by teams or pitchers predisposed to allow (or hit) other events too.  Let’s keep going for the moment.

The number of runs scored with 2 HR hit in an inning is 3.439, which is 1.46 more than when 1 HR is hit.  When 3 HR are hit, there are 4.961 runs scored, which is 1.52 more than when 2 HR are hit.

Generally speaking, we can say that each HR adds something like 1.4 to 1.6 runs.
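Each per-HR increment above is just the successive difference down the runs-per-inning column. A minimal Python sketch using the HR table from this post; the names `HR_TABLE` and `marginal_runs` are hypothetical helpers, not anything from the original analysis:

```python
# (HR in the inning, average runs per inning, number of innings), copied from
# the 1993-2008 table above (first 8 innings of each game).
HR_TABLE = [
    (0, 0.353, 528611),
    (1, 1.983, 59475),
    (2, 3.439, 5319),
    (3, 4.961, 413),
    (4, 6.778, 36),
    (5, 7.000, 1),
]


def marginal_runs(table):
    """Return (k, runs gained going from k-1 to k events) for k = 1, 2, ...

    Assumes the table is sorted by event count with no gaps in the counts.
    """
    return [
        (k, round(runs - prev_runs, 3))
        for (_, prev_runs, _), (k, runs, _) in zip(table, table[1:])
    ]


for k, gain in marginal_runs(HR_TABLE):
    print(f"HR #{k}: {gain:+.3f} runs")
```

The same function works unchanged on any of the other tables in this post, since they all share the (count, runs per inning, innings) layout.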

Let’s repeat with triples:
3B   Runs/Inning   Innings
0    0.518         580901
1    1.866         12711
2    3.441         238
3    4.8           5

When one triple is hit in an inning, there are 1.35 more runs scored than in innings when no triples are hit.  I have to admit that this is much higher than expected, even though we have over twelve thousand such innings.

Doubles:
2B   Runs/Inning   Innings
0    0.352         486603
1    1.231         93687
2    2.676         12095
3    4.134         1343
4    5.426         115
5    7.6           10
6    6             2

When one double is hit in an inning, there are 0.88 more runs scored than when no doubles are hit.  However, when two doubles are hit, there are 1.45 more runs scored than with one double.  The pattern repeats itself with three doubles (1.46 more runs scored than with 2 doubles).  Why so much?  Well, if you know that two doubles were hit, then you are all but guaranteeing one run scored, plus whatever other runners you get to knock in.  You are no longer relying on the other random events you’d be hoping for.

Here it is for singles:
1B   Runs/Inning   Innings
0    0.207         316712
1    0.543         183406
2    1.259         66297
3    2.366         20036
4    3.54          5539
5    4.745         1410
6    5.965         346
7    7.15          80
8    8.72          25
9    10            4

As you can see, that first single in the inning only adds 0.34 runs, but the second single adds .72 runs, and the third adds 1.11, while the fourth adds 1.17.  The fifth adds 1.20 runs, and the sixth adds 1.22 runs.  This shows even more clearly than the doubles that if you can bunch up singles in an inning, you’ll get quite the compounding effect.

The weighted average of all these numbers is +0.505 runs, which is much more in line with what we expected.  The weighted averages for the extra-base hits were:
2B +0.951
3B +1.352
HR +1.615
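The post does not spell out how these weighted averages are computed. One reconstruction that reproduces the weighted averages quoted in this post (to the precision shown) is to weight each step from k-1 to k events by the number of innings with exactly k events. A sketch under that assumption; `weighted_marginal_value` is a hypothetical helper:

```python
def weighted_marginal_value(table):
    """Average the step-by-step run increments, weighting the step from k-1 to k
    events by the number of innings with exactly k events (k >= 1).

    `table` is a list of (event count, runs per inning, innings), sorted by count.
    """
    total_runs = 0.0
    total_innings = 0
    for (_, prev_runs, _), (_, runs, innings) in zip(table, table[1:]):
        total_runs += (runs - prev_runs) * innings
        total_innings += innings
    return total_runs / total_innings


# Tables copied from this post: (count, runs per inning, innings).
SINGLES = [
    (0, 0.207, 316712), (1, 0.543, 183406), (2, 1.259, 66297),
    (3, 2.366, 20036), (4, 3.540, 5539), (5, 4.745, 1410),
    (6, 5.965, 346), (7, 7.150, 80), (8, 8.720, 25), (9, 10.000, 4),
]
HOME_RUNS = [
    (0, 0.353, 528611), (1, 1.983, 59475), (2, 3.439, 5319),
    (3, 4.961, 413), (4, 6.778, 36), (5, 7.000, 1),
]

print(f"1B {weighted_marginal_value(SINGLES):+.3f}")   # +0.505
print(f"HR {weighted_marginal_value(HOME_RUNS):+.3f}")  # +1.615
```

Because the common first step dominates the weights, the result sits close to the first increment, pulled up by the compounding steps that follow.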

Here are the numbers for non-intentional walks:
BB   Runs/Inning   Innings
0    0.369         425646
1    0.818         134835
2    1.529         27682
3    2.475         4784
4    4.013         783
5    5.647         102
6    7.158         19
7    8             3
8    16            1

The first walk adds +0.45 runs, while the second walk adds +0.71 runs.  The third adds +0.95 runs, and the fourth is +1.54 runs.  Similar to the single, if you can bunch up walks, you have a devastating effect on the runs scored.  Interestingly, the weighted average is +0.512 runs, which is MORE than the single!  Almost certainly, if you see a lot of walks, you can infer far more about the talent level of the pitchers or hitting team than if you see a lot of singles.  We’ll get to that in a few minutes.

Here we have hit batters:
HBP  Runs/Inning   Innings
0    0.52          578497
1    1.566         14970
2    3.021         384
3    4             4

The first hit batter adds +1.05 runs!  That is insanely high, and really, impossible.  What can it mean?  Almost certainly, it means that a lot of hit batters are not random and, as baseball fans would suspect, are linked to HR.  So, the hit batter, in and of itself, should be worth about the same as a walk.  But it carries extra hidden information: we can infer that more HR are hit in innings in which a hit batter is allowed than in other innings.  The weighted average is +1.06 runs.

Here are the numbers when a batter reaches base on error:
ROE  Runs/Inning   Innings
0    0.511         569234
1    1.347         23823
2    2.734         772
3    3.808         26

Here, too, the event carries extra information: we can infer that the fielding team is not that good.  The first error adds +0.84 runs, with a weighted average of +0.85 runs.

Just to conclude this section, here are the intentional walks:
IBB  Runs/Inning   Innings
0    0.52          571721
1    1.234         21441
2    2.299         672
3    3.667         21

The first IBB adds +0.71 runs, for a weighted average of +0.73 runs.  As you can guess, our inference is that IBB are issued when other runners are on base.  And so, the +0.7 runs being added are not directly attributable to the IBB itself, but to the KNOWLEDGE that an IBB has been issued; the value goes partly to the IBB and partly to everything else that is not random.

***

Now, let’s take care of all that non-randomness with regression.  Taking the 38,830 3-out innings of 2008 only, I get the following coefficients for the regression (r=.875):
+0.36 BB
+0.51 1B
+0.53 Error
+0.78 2B
+1.02 3B
+1.42 HR

Now, those numbers look VERY NICE.  They are pretty much what we expected, give or take .03 runs.
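For readers who want to try this themselves, here is a minimal sketch of an inning-level regression of runs on event counts, using NumPy's least-squares solver. The column layout, the intercept term, and the synthetic Poisson data are all assumptions for illustration; the post does not describe its exact model or software, and `fit_inning_linear_weights` is a hypothetical name.

```python
import numpy as np

EVENTS = ["BB", "1B", "Error", "2B", "3B", "HR"]


def fit_inning_linear_weights(counts, runs):
    """Ordinary least squares of runs scored per inning on event counts.

    counts: shape (n_innings, n_events), one column per event type.
    runs:   shape (n_innings,), runs scored in each inning.
    Returns (intercept, coefficients, r), where r is the correlation between
    fitted and actual runs.
    """
    counts = np.asarray(counts, dtype=float)
    runs = np.asarray(runs, dtype=float)
    # Prepend a column of ones so the model has an intercept (an assumption;
    # the post does not say whether its regression included one).
    X = np.column_stack([np.ones(len(runs)), counts])
    beta, *_ = np.linalg.lstsq(X, runs, rcond=None)
    r = np.corrcoef(X @ beta, runs)[0, 1]
    return beta[0], beta[1:], r


# Synthetic innings, purely illustrative (NOT real play-by-play data): Poisson
# event counts, runs built from chosen weights plus normal noise.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[0.35, 0.9, 0.05, 0.2, 0.02, 0.1], size=(5000, 6))
true_w = np.array([0.36, 0.51, 0.53, 0.78, 1.02, 1.42])
runs = counts @ true_w + rng.normal(0.0, 0.5, size=5000)

intercept, coefs, r = fit_inning_linear_weights(counts, runs)
for name, w in zip(EVENTS, coefs):
    print(f"{name:>5} {w:+.2f}")
print(f"r = {r:.3f}")
```

On data like this, the rarest columns (here the triple) get the noisiest coefficients, which is worth keeping in mind when reading season-to-season changes.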

However, and this is why we don’t want to be a slave to the regression, the coefficient for the hit batter is +.26 runs, and for the IBB, it’s +.43 runs.  The IBB is especially ridiculous: IBBs are typically issued with 1 or 2 outs, so the runner doesn’t have as many opportunities to score.  The standard error is .013 runs, meaning that we are 95% sure the run value is between .40 and .45 runs.  Like I said: ridiculous.  You have to use the regression as a tool, and not be a slave to it.  Your baseball sense must temper it.
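The 95% range comes from the usual normal approximation (point estimate ± 1.96 standard errors), taking the rounded coefficient .43 as the point estimate:

```latex
0.43 \pm 1.96 \times 0.013 = 0.43 \pm 0.025
\quad\Longrightarrow\quad [\,0.405,\ 0.455\,]
```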

Let’s repeat the regression, but on 2007 data.  We have 38,876 3-out innings, and the results of the regression (r=.877):
+0.37 BB
+0.51 1B
+0.58 Error
+0.78 2B
+1.02 3B
+1.39 HR

Most stayed the same, but notice how the value of reaching base on error jumped?  The standard error was only .013, so the 2007 value is meaningfully different from the 2008 value.  The hit batter again comes in low (+.27 runs) and the IBB comes in high (+.40 runs).  An additional reason for the high run value of the IBB is that the subsequent batters may be good hitters; for the hit batter, you can have the opposite conditions.  Well, that’s the story the regression might be telling you.
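To put "meaningfully different" in numbers: assuming the 2008 error coefficient also has a standard error of about .013 (the post gives that figure for 2007) and treating the two seasons as independent samples, the jump is roughly 2.7 standard errors:

```latex
\Delta = 0.58 - 0.53 = 0.05, \qquad
\mathrm{SE}(\Delta) \approx \sqrt{0.013^2 + 0.013^2} \approx 0.0184, \qquad
z = \frac{0.05}{0.0184} \approx 2.7
```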

On the other hand, if you do it the right way, and look at run expectancy charts, as detailed in The Book, we don’t get these egregious differences.
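The run expectancy approach values each play as the change in run expectancy plus any runs that scored on it, then averages over all plays of that event type. A minimal sketch; the function names and the (bases, outs) state encoding are hypothetical, and the run expectancy numbers below are made-up placeholders, not the tables from The Book:

```python
def event_run_value(re_table, before, after, runs_scored):
    """Run value of one play: RE(after) - RE(before) + runs scored on the play.

    re_table maps a base-out state to the average runs scored from that state
    to the end of the inning; any 3-out state should map to 0.0.
    """
    return re_table[after] - re_table[before] + runs_scored


def average_event_value(re_table, plays):
    """Mean run value over (before, after, runs_scored) plays of one event type."""
    if not plays:
        raise ValueError("need at least one play")
    return sum(event_run_value(re_table, b, a, r) for b, a, r in plays) / len(plays)


# Placeholder run expectancies keyed by (bases occupied, outs); made up for
# illustration only, NOT the values from The Book.
RE = {("---", 0): 0.50, ("1--", 0): 0.90, ("---", 1): 0.27, ("1--", 1): 0.52}

# Two singles: one with bases empty and 0 outs, one with bases empty and 1 out.
plays = [
    (("---", 0), ("1--", 0), 0),
    (("---", 1), ("1--", 1), 0),
]
print(average_event_value(RE, plays))
```

Because each play is credited only with the change in the base-out state it actually caused, the hidden information a hit batter or IBB carries about the rest of the inning does not leak into its value.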

We’ll try one more, for 2006.  With 38,824 innings, and an r=.879, we get:
+0.37 BB
+0.51 1B
+0.60 Error
+0.77 2B
+1.07 3B
+1.38 HR

Check out the jump for the error.  Otherwise, things were pretty stable.  Hit batter again came in low (+.27) and IBB high (+.41).

What we’ve laid out here is a progression of analysis techniques, of sorts.  The first part simply presumes “all other things equal” (which is not true) to show you the impact of each additional event.  The second part uses regression, which (tries to) account for the “other things”.  It succeeds mostly, but not totally.  The third approach, using the run expectancy charts detailed in The Book, is the best way to do it.

#1    Patriot      2009/01/17 (Sat) @ 12:33

It is nice to see that the regression weights are reasonable at the inning level.  And I would think it would be a blow to those who believe that season-level regressions provide any particular special insight into how runs are scored (this by now is a minority viewpoint, but I’m sure there are still a few hardliners out there).


#2    Tangotiger      2009/01/17 (Sat) @ 12:56

Here is the game-level regression, using all games from 1996-2008 and only the first 8 innings, in which 24 outs were recorded:

n = 62,370
r = .885

+0.37 BB
+0.51 1B
+0.58 Error
+0.78 2B
+1.07 3B
+1.40 HR

Love how the HR always comes in at 1.40 when I do stuff like this.  The HBP is +.25 and IBB is +.42, both ridiculous of course.  Clearly, there are biases in when these two events occur.

***

Also note how the BB and 1B come in higher than we expected, by around .04 runs each.  In this particular case, I do NOT have any “batting out” parameter, as each sample has 24 outs.  What this means is that we are not distinguishing between batting and running outs.  This creates a bias in the results, similar to when you run my Markov calculator and get values for the BB and 1B that are .04 higher than you’d expect.

What I should do is count also the number of batting outs, so that I don’t have this bias.  But, at this point, I don’t think we’re going to learn much more…


#3    dcj      2009/01/17 (Sat) @ 13:44

Here are the numbers for non-intentional walks:
bb R_PER_I N
...
8 16 1

That must have been a painful inning. (looks it up...) April 19, 1996, Orioles at Rangers, bottom of the 8th. Texas already led 10-7. Armando Benitez started the inning, allowing a single and two walks to load the bases. Orosco came in and gave up six hits and two walks while recording just one out. The last walk came with the bases loaded, at which point Davey Johnson threw in the towel and brought in Manny Alexander to pitch. Alexander proceeded to walk the next three batters before “recovering” to give up a sac fly, another walk, a HR, and a groundout to end the inning.

Overall the Rangers managed to score 26 runs while leaving just 5 men on base.

Retrosheet link


#4    2009/01/17 (Sat) @ 15:50

Do you have an explanation for why one single adds only 0.34 runs but one unintentional walk adds 0.45?  A single is worth either the same or more in every situation.  So it would seem that the first and only walk is more likely to occur with runners on than is the first and only single.  But I’d be very surprised if that were true with a runner on first.  So that suggests lone walks occur disproportionately with first base open and runner(s) in scoring position.  What does that sound like?  The notorious unintentional intentional walk.  It’s a long-held belief of mine that a high percentage of unintentional walks in these situations are in fact more or less intentional and should be treated as such.  Could you confirm or refute?

Since the raw data clearly indicates that the impact of singles and walks is non-linear, why use a linear regression?  Have you tried logistic regression?


#5    Tangotiger      2009/01/17 (Sat) @ 16:35

That’s why we have BaseRuns or the Markov calculator, for pitchers and teams.
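BaseRuns builds the non-linearity in directly: runs = A × B / (B + C) + D. A minimal sketch using one simple, widely quoted version of the formula; other published versions weight B differently or add HBP, SB, CS and so on, so the weights here are an assumption, and the batting lines are made up purely to show the compounding:

```python
def base_runs(h, bb, hr, tb, ab):
    """Estimate runs with one simple version of BaseRuns.

    A = baserunners other than HR, B = an advancement factor,
    C = outs (approximated as AB - H), D = home runs.
    The B weights are one published variant; treat them as an assumption.
    """
    a = h + bb - hr
    b = (1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb) * 1.02
    c = ab - h
    d = hr
    return a * b / (b + c) + d


def marginal_single(line):
    """Extra runs BaseRuns credits for one more single (+1 H, +1 TB, +1 AB)."""
    more = dict(line, h=line["h"] + 1, tb=line["tb"] + 1, ab=line["ab"] + 1)
    return base_runs(**more) - base_runs(**line)


# Made-up batting lines for a low- and a high-offense context (illustration only).
LOW = dict(h=8, bb=3, hr=1, tb=12, ab=34)
HIGH = dict(h=14, bb=6, hr=1, tb=20, ab=38)

print(f"extra single, low offense:  {marginal_single(LOW):.3f}")
print(f"extra single, high offense: {marginal_single(HIGH):.3f}")
```

The same single is worth more in the high-offense context, which is the bunching effect the inning tables show and a fixed linear weight cannot capture.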

For hitters, you would still need to use linear weights.
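Applying linear weights to a hitter is just a weighted sum of his event counts. A minimal sketch using the 2008 inning-level regression coefficients from this post; the helper name and the batting line are hypothetical, and because outs, HBP, IBB and the regression intercept are left out, the result is only the positive-event contribution, not a full runs estimate:

```python
# 2008 inning-level regression coefficients from the post (runs per event).
WEIGHTS_2008 = {"BB": 0.36, "1B": 0.51, "Error": 0.53, "2B": 0.78, "3B": 1.02, "HR": 1.42}


def linear_weights_runs(counts, weights=WEIGHTS_2008):
    """Sum of weight * count over the listed events.

    Raises KeyError for any event that has no weight, rather than silently
    ignoring it.
    """
    unknown = set(counts) - set(weights)
    if unknown:
        raise KeyError(f"no weight for: {sorted(unknown)}")
    return sum(weights[event] * n for event, n in counts.items())


# Hypothetical hitter line (made-up numbers, for illustration only).
print(round(linear_weights_runs({"1B": 100, "2B": 30, "3B": 3, "HR": 20, "BB": 50}), 2))
# 123.86
```

Each event gets the same value no matter which teammates surround it, which is why this fits an individual hitter better than a team-level, non-linear estimator.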

Yes, walks are issued disproportionately when 1B is open or there are 2 outs.


#6    2009/01/17 (Sat) @ 19:11

Any idea why the value of the error jumps around so much in the regressions?


#7    Pizza Cutter      2009/01/17 (Sat) @ 21:51

My guess would be that an ROE is a (relatively) rare event, combined with the fact that, while on a single the batter gets one base, an ROE can actually look like a single, double, or triple, depending on the circumstances.  Small sample size plus unstable parameter equals jumpy coefficient.
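The small-sample intuition matches the textbook formula for the standard error of an OLS coefficient (a general regression result, not something computed in the post). All else equal, the fewer innings that contain the event, the smaller the sum of squared deviations of its count, and the larger the standard error; R_j^2 is the R² from regressing that event's count on the other events:

```latex
\widehat{\mathrm{SE}}(\hat\beta_j)
  = \sqrt{\frac{\hat\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \,\bigl(1 - R_j^2\bigr)}}
```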


#8    2009/01/17 (Sat) @ 22:22

I checked 2008 data on Baseball Reference.  Over 30% of unintentional walks with 1st base open were semi-intentional: 30.5% in the NL and 32% in the AL.  Surprisingly, 8 or 9% of walks with the bases empty and 4-6% of those with a runner on first also fall into that category.


#9    dave smyth      2009/01/18 (Sun) @ 16:59

Why does the bias of using all outs instead of batting outs result in higher numbers for the 1b and BB?


Page 1 of 1 pages


Name (required)
E-Mail (optional; WILL be published)
Website (optional)

<< Back to main


Latest...

COMMENTS

May 25 08:49
Do pitcher’s reach back for velocity when needed?

May 25 08:11
What sabermetrics is NOT

May 25 06:43
Largest demonstration in Canadian history?

May 25 06:39
Lack of hustle during a game

May 25 02:38
NFLPA lawsuit against collusion

May 25 01:43
Neal Huntington’s best moves

May 24 23:50
Rooting for laundry

May 24 17:04
Firefox, IE, or Chrome?

May 24 12:07
How to beat the shift

May 24 11:11
Incredible story