
THE BOOK--Playing The Percentages In Baseball


Monday, May 03, 2010

Game time regressions

By Tangotiger, 07:30 AM

Russell’s final BPro piece.  He’s one of the best saberists, as we know, so hopefully he’ll still be around somewhere.

***

One of my issues with regression at this granularity level is when I see something like this: “.742 * inning breaks “

Well, we know that inning breaks are 2 to 3 minutes each, depending which TV network is involved.  So, what the regression is saying is that there’s some 1.5 to 2.0 minutes that it’s removing from what we know, and distributing it to other variables, even though, in this particular case, it should be completely independent.  That is, the between inning break has no relationship whatsoever to any other event.  But the regression is finding some relationship.

***

Cutting one minute from the non-action between innings would save some 17 minutes of game time.  The players loaf around too much by their own admission.  But, as one of the players recently admitted on his blog, “we got to gets paid”.

So, this is really the issue: how can you cut down on game time while not touching the non-game time?  That is a very weird thing to try to optimize from a fan-experience standpoint.

Indeed, what’s to stop MLB from increasing the between-inning time, even if we reduce the actual game time, so that we are always going up the same hill? Sisyphus, anyone?


#1          (see all posts) 2010/05/03 (Mon) @ 10:05

Good point about the between-innings variable.  What would cause this to happen?  Too much correlation between innings and the other variables?  Not enough variability in innings?

I suspect it’s the lack of variability.  Almost all games are 8.5 or 9 innings, and the others are so close, that maybe the innings effect gets washed out?

I’d really like to know the answer ... my own regression came out with a *negative* coefficient for innings when everything else was taken into account.  On my blog, someone asked if it’s possible the Yankee players came out so slow (Jeter was second slowest in MLB) because they played more games with long breaks.  I said, and he confirmed, that that would show up in the Yankees’ team variable.  But in that case, why don’t innings come out right in my variable and Russell’s?


#2    Tangotiger      (see all posts) 2010/05/03 (Mon) @ 10:23

I think it’s not enough variability between innings.  Suppose that he had ONLY nine inning games.  Then it would be almost impossible to capture the between-inning variable, since the only discerning parameter would be whether the home team came to bat in the bottom of the 9th.  Now, he does include all games, which means extra inning games too.

So, he’s probably capturing the effect of extra inning games more than anything.  Maybe extra inning games go faster because there is less dilly-dallying.

Indeed, I’d like to see the elapsed game time for extra innings compared to other parts of the game.

Game time: first five innings
Game time: 6th-8th innings
Game time: 9th inning
Game time: 10th+ innings

List the elapsed time and number of batters, and present the data in terms of time per 39 batters.  I’ll bet you we’d get something quite illuminating.
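The breakdown proposed above could be sketched roughly as follows. This is only an illustration: the `Segment` type, the `minutes_per_39_batters` helper, and every number in `segments` are made-up placeholders, not real timing data.

```python
from dataclasses import dataclass

BATTERS_PER_GAME = 39  # normalization suggested in the comment above


@dataclass
class Segment:
    label: str
    elapsed_minutes: float  # clock time spent in this part of the game
    batters: int            # batters who came up in this part of the game


def minutes_per_39_batters(seg: Segment) -> float:
    """Scale a segment's elapsed time to a 39-batter pace."""
    if seg.batters <= 0:
        raise ValueError("segment has no batters")
    return seg.elapsed_minutes / seg.batters * BATTERS_PER_GAME


# Placeholder values only, NOT real data: they just show the table's shape.
segments = [
    Segment("Innings 1-5", 90.0, 45),
    Segment("Innings 6-8", 60.0, 27),
    Segment("9th inning", 20.0, 8),
    Segment("10th+ innings", 30.0, 14),
]

for seg in segments:
    pace = minutes_per_39_batters(seg)
    print(f"{seg.label:<14}{pace:6.1f} min per 39 batters")
```

Putting every segment on the same 39-batter scale is what makes the late innings and extra innings directly comparable to the first five.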


#3          (see all posts) 2010/05/03 (Mon) @ 10:27

My regression had a variable for how many of the last six half-innings were close—within two runs.  It was significant and fairly large—47 extra seconds per close inning.

That captures some of what you’re looking for, I think.

Write up is at the link below: a link to the full results (coefficients) is in the second-last paragraph.

http://sabermetricresearch.blogspot.com/2010/04/why-are-yankeesred-sox-games-so-slow.html


#4          (see all posts) 2010/05/03 (Mon) @ 10:39

You raise a good point there ... maybe games with a one-run lead are slower than tie games, because teams save their slow strategies (lots of reliever changes, etc.) for when they’re protecting a lead, and/or when they think the end of the game is close.  As you say, there’s probably less dilly-dallying in a game that both teams think is going more innings.


#5          (see all posts) 2010/05/03 (Mon) @ 11:55

Wait, Russell’s leaving BPro?


#6          (see all posts) 2010/05/03 (Mon) @ 13:44

I clicked through before reading the comments, and I see that all the questions I raised at BPro (in response to Tango), Phil is asking here.

- What about the closeness of the game?
- Why wouldn’t we use just 9 inning games?

If 2010 Orioles games are any indication, the first half of the games are much faster than the final few innings.  But that’s mostly because the bullpen has been so bad.


#7          (see all posts) 2010/05/03 (Mon) @ 16:37

I preferred Phil’s regression - more informative because he controlled for players and team, although Pizza, I think, was trying to do something a little different to Phil (understand the characteristics of long games, rather than which players or teams caused them).

So if baseball really wants to shorten up games, it could cut 15 minutes out per game if it cut out two pitches of each half inning.

Similar to Tango/1, I’m not sure how accurate this is. Pitches and PA have got to be correlated, right? I’d like to see pitches per PA and PA in there. PA also includes a ‘runs scored’ effect.


#8          (see all posts) 2010/05/03 (Mon) @ 18:07

Regressions are interesting and all, but why is so much fuss being made over regressions when we have the actual time data for the events within the games?

Isn’t it far more accurate to measure something directly than to measure it indirectly through a variable in a regression?


#9          (see all posts) 2010/05/03 (Mon) @ 20:06

Let me add, re #8, that I’m willing to help Phil, Russell, Tango, etc., get the raw pitch-time data from 2008-2009 if that would help them.


#10    brent      (see all posts) 2010/05/04 (Tue) @ 06:47

About Sisyphus, MLB may decide one day that ratings would increase and provide more revenue than merely increasing the amount of commercials (and having a longer game). The fans need to vote by either going to the game or not watching it at home.


#11          (see all posts) 2010/05/04 (Tue) @ 13:40

Maybe I should state it more explicitly.  I believe these regressions are giving results that are incorrect.

To quote Phil

3:30 seems like a lot to me. How much is Jeter really involved in the play? Maybe 4 or 5 plate appearances a game, which is 15 or 20 pitches? That works out to between 10 and 15 seconds a pitch.

Does Derek Jeter really take an extra ten seconds between pitches than the average batter? I haven’t watched him bat that closely, but maybe you guys can let me know if that’s a reasonable estimate. There is some randomness involved in the regression, and, since Jeter was the second highest in the league, you might want to regress his number to the mean a little bit. But still—his game factor of 3:30 was 3.6 standard deviations from zero, so it’s almost certain that he’s pretty slow.

From 2008-2009 here are the numbers comparing Jeter in Bos-NYY games to Jose Lopez in Sea-Tex games.  (Jose Lopez

Jeter played in 33 such games and saw 174 pitches with runners on, averaging 33.0 seconds between pitches.  He saw 258 pitches with the bases empty, averaging 23.5 seconds between pitches. 

Lopez played in 41 such games and saw 220 pitches with runners on, averaging 26.3 seconds between pitches.  He saw 201 pitches with the bases empty, averaging 17.7 seconds between pitches.

Based on these numbers, Jeter uses a little more than one extra minute per game as compared to Jose Lopez, rather than 6 extra minutes that the regression tells us.  Now that’s just comparing between-pitch-time, but I would expect that to be the bulk of the difference between any two players.
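One way to reproduce that per-game figure from the numbers quoted above is sketched below. The method is an assumption on my part: it charges each of Jeter's pitches with the gap between the two batters' averages in the same base state, then divides by Jeter's games. The dictionary layout and the `extra_seconds_per_game` name are just for illustration.

```python
# Between-pitch averages quoted in the comment above (2008-2009).
# Each base state holds (pitches seen, average seconds between pitches).
JETER = {"games": 33, "on": (174, 33.0), "empty": (258, 23.5)}
LOPEZ = {"games": 41, "on": (220, 26.3), "empty": (201, 17.7)}


def extra_seconds_per_game(slow, fast):
    """Extra between-pitch time per game for `slow`, keeping his own pitch mix.

    Assumption: every pitch the slow batter sees costs the difference
    between the two batters' average gaps in the same base state.
    """
    total = 0.0
    for state in ("on", "empty"):
        pitches, slow_gap = slow[state]
        _, fast_gap = fast[state]
        total += pitches * (slow_gap - fast_gap)
    return total / slow["games"]


extra = extra_seconds_per_game(JETER, LOPEZ)
print(f"{extra:.0f} seconds per game, or {extra / 60:.1f} minutes")
```

Under that assumption the difference comes to roughly 80 seconds a game, which matches the "little more than one extra minute" above.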


#12          (see all posts) 2010/05/04 (Tue) @ 13:42

I also realize that my data set for the comparison in #11 is much more limited than Phil’s, but I doubt that makes anywhere near a 5-6x difference.  I believe the burden is on the regression folks to demonstrate that their results have validity.


#13    Guy      (see all posts) 2010/05/04 (Tue) @ 13:54

Mike: 
I think there is one other important element, which is getting on first base.  Over those 2 years, Jeter had 430 singles/BBs compared to 283 for Lopez, which means Jeter gets to 1B about .5 extra times per game.  Jeter is of course also more of a SB threat.  So, how many extra times do pitchers throw to 1B as a result?  How many more times do they step off the rubber?  I think that could account for some real time.

And then consider Jeter’s defense:  he’s allowing more opposing hitters to get to 1B on hits/ROE, and turning fewer GDPS, than a typical SS.  So that likely adds time as well.

I’m not arguing Phil’s coefficients are necessarily right, only that time between pitches is not the only factor to consider.


#14          (see all posts) 2010/05/04 (Tue) @ 14:38

I think Mike is onto something: pitches take 10 seconds longer with men on base than without.  Yankee games perhaps are more likely to have men on base.

Let me rerun the regression with separate pitch count variables for men on and without.

And, of course, I agree with Mike that real life pitch time data trumps regression results, at least as far as estimates of pitch times go.


#15          (see all posts) 2010/05/04 (Tue) @ 15:46

Mike—good work. I agree 100% that the pitch by pitch has to trump the regressions. From a purely ‘academic’ point of view I’m keen to understand where/why the regression falls down.

Guy - originally I thought some of that time would have been captured in the team coefficient. Also, given that Jeter plays pretty much every game for the Yankees, I wondered whether that might have caused confusion between the player and team coefficients, but the simple simulation I did (possibly too simple) suggested not. I bet the reasonable consistency of the Yankees team over time means Jeter’s defense isn’t distinguishable to the regression. I’d have thought it would be bundled with overall defense, which, although it changes year to year (depending on how personnel changes), would more likely be captured by the team coefficient. Although that could be completely wrong.

Phil/14 as per the thread on your blog that is the type of thing I’d expect to get picked up, at least partly, by the team variable.

Interested in your new results.


#16          (see all posts) 2010/05/04 (Tue) @ 16:04

OK, this is WEIRD. 

As expected, pitches with runners on came out with a higher coefficient than pitches with the bases empty: 27 seconds to 19.  The player coefficients didn’t change a whole lot.

But the weird part: now, the innings coefficient jumped from -.15 in the original regression, to 2.0 minutes in this one.  (The one in the middle was 0.5 minutes.)

Again: why would that happen?  Maybe if you don’t get EVERYTHING that matters, what you missed scatters itself around in unpredictable ways?  I have no idea.


#17          (see all posts) 2010/05/04 (Tue) @ 16:44

OK, got it.  You have to think about “keeping all else equal”.

What if a game takes an extra half inning, *keeping all else equal*?  Well, you have an extra two-minutes for the break between innings.  But: you have the same *equal* number of baserunners/PAs/pitches, spread over an extra half-inning.  That means the baserunners will be spread out more.  That means you will have fewer runners on base, on average.  That means the between-pitch times will be a little faster.  So, since the original regression didn’t keep *pitches with runners on* equal, you get:

-- more time between innings
-- less time between pitches

And the net effect could be anything!  It turns out that it was negative .15 for me, and +.7 for Russell, because of what we controlled for.

But, once we also keep *runners on base* equal, then the “less time between pitches” disappears, because it’s part of the “all else being equal”, and we get the proper 2 minute break for innings.

This is a good way, I think, to test for whether your regression is sound: put in things for which you know what the value should be.  If they come out wrong, then your model is missing something.
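A toy simulation can illustrate the "all else equal" argument above. It is a sketch under invented assumptions, not a rerun of either regression. It uses the 2-minute break and the 27-second and 19-second pitch times from this thread. Everything else is made up: the mix of game lengths, about 16 pitches per half-inning, the rule that the runners-on share rises with pitches per half-inning, and the noise levels. The `ols` helper is a small pure-Python least-squares solver written only for this example.

```python
import random

TRUE_BREAK_MIN = 2.0  # known length of one between-inning break, in minutes
ON_SEC, EMPTY_SEC = 27.0, 19.0  # seconds per pitch, runners on / bases empty


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for idx in range(k):
            if idx != col:
                f = a[idx][col] / a[col][col]
                a[idx] = [x - f * p for x, p in zip(a[idx], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate_games(n, seed=0):
    """Invented data: (breaks, pitches_empty, pitches_on, minutes) per game."""
    rng = random.Random(seed)
    games = []
    for _ in range(n):
        half_innings = rng.choices([17, 18, 20, 22, 24],
                                   weights=[45, 45, 5, 3, 2])[0]
        breaks = half_innings - 1
        pitches = round(rng.gauss(16 * half_innings, 20))
        # Assumption: more pitches packed into each half-inning means
        # a larger share of them are thrown with runners on base.
        share_on = 0.025 * pitches / half_innings + rng.gauss(0, 0.03)
        share_on = min(max(share_on, 0.05), 0.95)
        on = round(pitches * share_on)
        empty = pitches - on
        minutes = (TRUE_BREAK_MIN * breaks + EMPTY_SEC / 60 * empty
                   + ON_SEC / 60 * on + rng.gauss(0, 2))
        games.append((breaks, empty, on, minutes))
    return games


games = simulate_games(5000)
y = [g[3] for g in games]
# Regression 1 controls only for total pitches; regression 2 splits them.
short = ols([[1, b, e + o] for b, e, o, _ in games], y)
full = ols([[1, b, e, o] for b, e, o, _ in games], y)
print(f"breaks coefficient, total pitches only:      {short[1]:.2f}")
print(f"breaks coefficient, pitches split by state:  {full[1]:.2f}")
```

In this toy setup the first regression should understate the break length, because an extra half-inning at the same pitch count implies fewer runners-on pitches. The second should land close to the 2 minutes that were built in.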


#18          (see all posts) 2010/05/04 (Tue) @ 17:12

Splitting the data into two, 2000-04 then 2005-08:

Derek Jeter was +4 minutes early, +2 minutes late.  So that’s a bit more reasonable in light of Mike’s findings which imply maybe +0.5 minutes.  Is Jeter getting faster at the plate?

Could part of the issue be mound conferences?  Does Jeter initiate or lengthen a lot of those?  Someone was talking about Posada making lots of trips to the mound during the playoffs ... does Jeter do that a bit too?  Does he talk a lot during mound conferences?


#19          (see all posts) 2010/05/04 (Tue) @ 23:09

To clarify #18: Jeter went from +4 early in the decade to +2 late in the decade.


#20          (see all posts) 2010/05/05 (Wed) @ 03:15

Phil—that’s quite a big difference wouldn’t you say? ‘Wasting’ double the amount of time consistently over 162 games a year implies a significant change in behavior.

I don’t think mound conferences are driving this. That should appear in the team variable. Given the consistency of the Yankees’ team, part of that time would show up in the coefficients of Posada, Giambi, Matsui, etc., not to mention the pitcher.

Are there large differences in team coefficients between the two models? If so, that may affect how reliable the results are.


#21    Guy      (see all posts) 2010/05/05 (Wed) @ 09:39

I still think runners on first, especially runners who are plausible SB threats, is a big missing piece of the puzzle.  We have pitches with runners on as +9 seconds, but that’s a big category.  Presumably, a runner on 2nd or 3rd has little impact, but a runner on first more.  A runner on 1B with 2nd base open still more.  And a fast runner on first (like Jeter) might add 60-90 seconds or even more, depending how many batters/pitches occur with him still on 1B.  The SBA dummy will only capture that very imprecisely, since a SBA takes time itself, but also removes the runner on 1B for subsequent pitches.  In fact, it is probably when a fast runner does NOT attempt a steal that he has the biggest impact on game length. 

I don’t know if Jeter’s poor defense, especially early in the decade, is contributing to this as well.  But it does seem like weak defense at SS or 2B—Jeter is poor at turning the DP as well as turning GBs into outs—would have the greatest impact in terms of specifically increasing the number of opposing runners on 1B.


#22          (see all posts) 2010/05/05 (Wed) @ 10:24

John/20: I think the change from +4 to +2 is not statistically significant ... will verify that.  Also, I checked, and part of the +4 is due to the fact that Jeter’s substitute when he was hurt in 2003, Erick Almonte, came out faster than average but was not included in the original study (which had a minimum of one 250 AB season).  Because Almonte was faster, and played almost exclusively when Jeter was out, that inflates Jeter’s numbers a bit.

Guy/21: The latest regression includes pickoff throws (except for 2000-2002).  Shouldn’t that capture a big part of what you suggest?


#23    Guy      (see all posts) 2010/05/05 (Wed) @ 11:13

Phil:  do you mean every throw to first, or just successful pickoffs?  If the former, then I agree that should capture a lot of the effect (though I suspect time between pitches is longer even when the pitcher doesn’t throw to first).


#24          (see all posts) 2010/05/05 (Wed) @ 11:20

Guy: every throw to any base is included.  Assuming Retrosheet accurately has them all.  Also, separate coefficients for pitches with runners on and pitches with the bases empty should capture most of that.

I can add a third category: pitches with a runner on first and second base open ... my guess is that it won’t make a whole lot of difference, but I’ll try later today.


#25          (see all posts) 2010/05/05 (Wed) @ 17:54

Follow up to 24: little difference when breaking out pitches in SB situations.  A pitch with runners on and 2B open was actually faster than other pitches with runners on, but that might be because pickoff throws and SB attempts were also included in the regression.


#26          (see all posts) 2010/05/06 (Thu) @ 14:57

From today’s New York Times:

http://www.nytimes.com/2010/05/06/sports/baseball/06gametime.html

“Before nearly every pitch of that game in which Jeter was at bat, he stepped out of the box for an average of 13.6 seconds. He blew on his hands, tucked his bat under his left arm and tapped his spikes. His teammate Robinson Cano delayed even more, averaging 18.7 seconds before settling back in. Cano stroked his bat, adjusted his gloves and wristbands, rested his bat between his legs, and took practice strokes.”

Other good stuff too, actual evidence timed with a stopwatch.


#27    Rally      (see all posts) 2010/05/06 (Thu) @ 15:06

When I played my instinct was to stay in the batter’s box in between pitches.  I actually had teammates tell me I should back out and delay a little bit.  I honestly don’t think it makes a damn bit of difference whether you delay 45 seconds (between the pitcher and the hitter) or just get on with it.  Either way, it’s your skill vs. the pitcher’s.  And while true talent changes over time, I doubt it changes in 45 seconds.

It would be cool if MLBAM had timestamps on the pitch by pitch data, we could see what effect players have when they slow down or play at a quick pace.


#28          (see all posts) 2010/05/06 (Thu) @ 17:37

“It would be cool if MLBAM had timestamps on the pitch by pitch data, we could see what effect players have when they slow down or play at a quick pace.”

If only!  :)

I’ve offered a couple times to help people get this data if they are interested in researching it.  That offer still stands.


#29          (see all posts) 2010/05/14 (Fri) @ 18:18

Phil has Part 2 of his analysis up:
http://sabermetricresearch.blogspot.com/2010/05/why-are-yankeesred-sox-games-so-slow.html

There are a lot of good thoughts there.

However, one thing tickled my funny bone a bit:

Maybe when [Jeter]’s on first and a subsequent batter hits a long foul, he runs all the way to third base and takes a long time to get back?

That’s because, being so respectful of baseball’s tradition and unwritten rules, he knows better than to take the shortest path back to first base.  He has to run all the way around the pitcher’s mound!

