THE BOOK--Playing The Percentages In Baseball

Friday, December 29, 2006

Pitch Sequences

By Tangotiger, 01:36 PM

Fantastic article by Sal Baxamusa.  One of the little things I’ve done, which I’ve set aside for… I don’t know why… was to extend my Markov chains to counts.  I could, in effect, tell you how many pitches Christy Mathewson threw, with just a few reasonable assumptions.

Sal, however, is making me think that one of my assumptions may not be reasonable.  He says:

When the first pitch is a ball and the second pitch a strike (called, swinging, or foul), batters have a line of .243/.312/.378. Curiously, batters perform better when the all-important first pitch is a strike and the second pitch is a ball; they hit at a .257/.314/.402. That’s a 25-point difference in OPS; not world-breaking but statistically significant (p<0.001 for you stat wonks) nevertheless.

I’ve always wondered if the path to a particular state (1-1, 3-2, etc.) matters.  That is, is 1-1 itself a state, or do I now have to say the state was “0-1 to 1-1” or “1-0 to 1-1”?  What I would consider one state for Markov chain purposes may actually be better described as two states.  The interesting work is to see how far back the states need to go.  That is, if you have a 3-2 state, how far back in the count do you have to go in order to establish the state you are in?
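
A minimal sketch of that question, assuming pitches are coded as 'B' (ball) and 'S' (any strike) and that no two-strike fouls occur. The names `state_key` and `distinct_states` are hypothetical, chosen for illustration only; they do not come from Sal's article or from any existing Markov code. The `memory` parameter is how many of the most recent pitches are folded into the state.

```python
from itertools import permutations


def state_key(pitches, memory):
    """State after `pitches`: the count plus the last `memory` pitch results."""
    balls = pitches.count("B")
    strikes = pitches.count("S")
    # memory=0 keeps only the count; memory=1 splits 1-1 into
    # "0-1 to 1-1" (last pitch 'B') and "1-0 to 1-1" (last pitch 'S').
    path = pitches[-memory:] if memory > 0 else ""
    return (balls, strikes, path)


def distinct_states(balls, strikes, memory):
    """Number of distinct states a single count expands into."""
    orders = set(permutations("B" * balls + "S" * strikes))
    return len({state_key("".join(order), memory) for order in orders})


print(distinct_states(1, 1, 0))  # 1
print(distinct_states(1, 1, 1))  # 2
print(distinct_states(3, 2, 5))  # 10
```

With full memory, a 3-2 count becomes ten separate states, one per ordering of three balls and two strikes, which is why it matters how far back the path really has to go.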

Most pitch data research should be recognized and applauded, and this article, as well as all of the Appelman articles, fits the bill.


#1    Peter Jensen      (see all posts) 2006/12/29 (Fri) @ 19:25

Having spent the day sequencing the first 8 pitches of every plate appearance from 2003-2005, I find this article comes at an interesting time for me.  Sal should learn to use Access; all 579,000+ plate appearances can be done in a single table.  But responding to his main point, I think that what Sal is seeing is caused at least partially by selection bias.  Hitters confident in their ability to hit a particular pitcher are more likely to take a first pitch strike if it is not in their preferred hitting zone.  A pitcher confident that he can get a batter out is more likely to throw a first pitch on the edges or just out of the strike zone.


#2    Joe Arthur      (see all posts) 2006/12/29 (Fri) @ 21:17

I was thinking much the same thing as Peter, at least about batters (my intuition would be that confident pitchers challenge on strike 1, rather than nibble).

Several years ago, Wade Boggs was extreme in two ways - he took the first pitch about 95% of the time and swung and missed only 5% of his swings. Put the two together, and Boggs, a significantly above average hitter, would be very under-represented in the sample of at bats which got to 1-1 with either a first pitch or second pitch swinging strike and overrepresented among those which got to 1-1 with a first pitch called strike.

One fairly simple way to structure a study that corrects for batter quality would be to find the subtotals for each batter for these categories, and adjust the subtotals to have equal weight, before aggregating to get grand totals for the categories.
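
One possible reading of that adjustment, as a rough sketch rather than a definitive method: give each batter the same weight in every category, so the mix of batters is identical across categories. The function name `equal_weight_rates`, the input layout, and the choice of weight (the batter's smallest at-bat count across categories) are all assumptions made for illustration.

```python
def equal_weight_rates(subtotals):
    """subtotals: {batter: {category: (hits, at_bats)}}.

    Returns {category: batting average} in which each batter carries
    the same weight in every category.
    """
    categories = sorted({c for per_batter in subtotals.values() for c in per_batter})
    numerators = dict.fromkeys(categories, 0.0)
    total_weight = 0.0
    for per_batter in subtotals.values():
        # Skip batters with no at-bats in one of the categories.
        if any(per_batter.get(c, (0, 0))[1] == 0 for c in categories):
            continue
        # Assumption: weight = the batter's smallest at-bat count.
        weight = min(per_batter[c][1] for c in categories)
        for c in categories:
            hits, at_bats = per_batter[c]
            numerators[c] += weight * hits / at_bats
        total_weight += weight
    if total_weight == 0:
        return {}
    return {c: numerators[c] / total_weight for c in categories}


# Made-up numbers: a .300 hitter seen mostly after strike-ball,
# and a .200 hitter seen mostly after ball-strike.
sample = {
    "good_hitter": {"BS": (3, 10), "SB": (30, 100)},
    "weak_hitter": {"BS": (20, 100), "SB": (2, 10)},
}
print(equal_weight_rates(sample))  # both categories come out at 0.25
```

In the made-up sample, the raw pooled averages would be .209 after ball-strike and .291 after strike-ball even though neither hitter changes at all between the two sequences; the equal-weight version removes that gap.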


#3          (see all posts) 2006/12/30 (Sat) @ 00:37

Awesome study.  Wow.  I never would have guessed that this happens.


#4    Guy      (see all posts) 2006/12/30 (Sat) @ 01:43

Another theory:  In the strike/ball scenario, the pitcher has the edge 0-1 but then gives it up.  That would be an indication (on average) that the pitcher fears the hitter, while the hitter’s ability to correctly take the ball despite being in the hole is a sign of his confidence (and eye).  The ball/strike scenario is the opposite:  the hitter has an edge 1-0, yet squanders it either by swinging and missing or letting a strike go by.  In other words, it may not be the outcome of the first pitch that’s important, so much as how the two players perform in the context of a 1-0 vs. 0-1 count.

It would definitely be nice to see the BA/OBP/SLG lines for the two pools of hitters (and pitchers).


#5    tangotiger      (see all posts) 2006/12/30 (Sat) @ 09:33

I think Sal pretty much implies it’s a question of “direction”, or momentum.

I think we can accept that there may be such a thing as momentum on some level, and while it may not exist (or we can’t find it) week-to-week, or day-to-day, or PA-to-PA, perhaps pitch-to-pitch we can see it.  Maybe momentum (the psychological impact of very short-term events) exists for 20-30 seconds.

In any case, getting the seasonal BA/OBP/SLG of the batters and pitchers at each count would be very interesting.


#6    Peter Jensen      (see all posts) 2006/12/30 (Sat) @ 10:38

I don’t keep the batter’s and pitcher’s entire BA/OBP/SLG line for every PA in a way that is easily accessible, but I do keep a modified OPS ((1.8 * OBP) + SLG).  Not much help for our theories in the aggregate data: Pitcher modOPS avg. for strike-ball sequence 1.023, for ball-strike sequence 1.021.  Batter modOPS for strike-ball sequence 1.025, for ball-strike 1.023. Average for all PAs: Pitcher 1.023, Batter 1.022.  I’ll do called strike-ball versus swing-and-miss strike-ball, etc., in a bit.
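
For reference, the modified OPS described above written out as a tiny sketch. The function name `mod_ops` is hypothetical, and the sample line is made up rather than taken from the thread's data.

```python
def mod_ops(obp, slg):
    """Modified OPS as stated in the comment: (1.8 * OBP) + SLG."""
    return 1.8 * obp + slg


# Hypothetical line, not from the thread: .330 OBP, .430 SLG.
print(round(mod_ops(0.330, 0.430), 3))  # 1.024
```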


#7    Guy      (see all posts) 2006/12/30 (Sat) @ 11:10

You’re right, not much evidence of selective sampling.  But can you look at the subsets when the ball is put in play on the 3rd pitch?  That seems to be where most of the difference occurs.  Is there any difference between S-B-BIP vs. B-S-BIP?


#8    Guy      (see all posts) 2006/12/30 (Sat) @ 11:28

I thought the weak outcomes when there had been a swinging strike were very interesting.  Is this typically such a strong indicator of a PA’s outcome?

However, I think there’s an error in these results (avg/slg):
Second pitch swinging strike .261 .367
Second pitch called strike .290 .452
Second pitch foul strike .328 .472

He reports the overall slg on B/S as .472, which can’t be right if these three results are correct.


#9    Phil      (see all posts) 2006/12/30 (Sat) @ 11:56

Here’s a theory:

Suppose that hitters have idiosyncrasies about the first pitch.  Some always take, no matter what.  Some always swing, no matter what.

Then, the first pitch wouldn’t correlate as well with the eventual outcome as the second pitch, since the second pitch is their “real” ability, while the first pitch is something they do out of habit or coaching.

That would explain why SB leads to a better outcome than BS.  In “SB,” the B is real but the S isn’t.

The same might be true for pitchers—they might always do something specific on the first pitch.

You can test the theory by noting that if it’s true, the outcomes should be the same for any sequence with the same first pitch.  SSB should be the same as SBS, for instance.

No idea if this actually happens ... I’m just trying to think of possible explanations, and this is the first one that occurred to me.


#10    Peter Jensen      (see all posts) 2006/12/30 (Sat) @ 12:55

Here are the results broken down by Pitch Type in the sequence for 2003-2005 along with the OBP and SLG that I found.

SEQ----N--- BatmOPS--PitmOPS--OBP---SLG
CBX--17611--- 1.017--- 1.029---.324---.508
SBX---2502--- 1.021--- 1.020---.335---.552
FBX---5476--- 1.030--- 1.028---.338---.582

BCX--12034--- 1.091--- 1.025---.304---.472
BSX---3382--- 1.024--- 1.023---.302---.468
BFX---7691--- 1.024--- 1.027---.334---.544


#11    Peter Jensen      (see all posts) 2006/12/30 (Sat) @ 14:07

Tested your theory, Phil, on some 4-pitch sequences, and both the order and type of strike still seem to make a significant difference in the outcomes.

Sequence----N----BatmOPS--PitmOPS--OBP--SLG

BFBX-------3576----1.030----1.029---.329--.581
BBFX-------3080----1.037----1.028---.350--.609

BCBX-------5436----1.026----1.030---.329--.528
BBCX-------5921----1.020----1.028---.328--.521

SBFX--------510----1.034----1.016---.306--.504
SFBX--------477----1.028----1.015---.323--.479

CBFX-------3493----1.024----1.028---.302--.444
CFBX-------3471----1.012----1.027---.320--.474

Not as consistent as Sal’s earlier findings on 3-pitch sequences, but further argument that Tango’s plan for a Markov estimator that assumes no difference in path to state won’t work.  Makes one wonder about our assumptions in creating run expectation tables and win probability tables, doesn’t it?
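
The comparison in the table above can be sketched as follows: group sequences that share a first pitch and the same set of later pitches, then look at the spread in OBP and SLG inside each group. If order did not matter, the spreads would be near zero. The figures are copied from the table; the function name `spread_by_group` is hypothetical, and the sketch does no significance testing, so the small-sample SBFX/SFBX pair should be read with care.

```python
from collections import defaultdict

# Sequence: (N, OBP, SLG), copied from the 2003-2005 table above.
FOUR_PITCH = {
    "BFBX": (3576, 0.329, 0.581), "BBFX": (3080, 0.350, 0.609),
    "BCBX": (5436, 0.329, 0.528), "BBCX": (5921, 0.328, 0.521),
    "SBFX": (510, 0.306, 0.504), "SFBX": (477, 0.323, 0.479),
    "CBFX": (3493, 0.302, 0.444), "CFBX": (3471, 0.320, 0.474),
}


def spread_by_group(table):
    """Map (first pitch, sorted later pitches) -> (OBP spread, SLG spread)."""
    groups = defaultdict(list)
    for seq, (_n, obp, slg) in table.items():
        pitches = seq.rstrip("X")  # drop the ball-in-play marker
        key = (pitches[0], "".join(sorted(pitches[1:])))
        groups[key].append((obp, slg))
    spreads = {}
    for key, rows in groups.items():
        obps = [obp for obp, _ in rows]
        slgs = [slg for _, slg in rows]
        spreads[key] = (round(max(obps) - min(obps), 3),
                        round(max(slgs) - min(slgs), 3))
    return spreads


for key, (obp_gap, slg_gap) in spread_by_group(FOUR_PITCH).items():
    print(key, obp_gap, slg_gap)
# ('B', 'BF') 0.021 0.028
# ('B', 'BC') 0.001 0.007
# ('S', 'BF') 0.017 0.025
# ('C', 'BF') 0.018 0.03
```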


#12          (see all posts) 2006/12/30 (Sat) @ 14:27

Thanks, Peter.

For CBFX vs. CFBX, the last pitch ball leads to better OBP.  But for BBFX vs BFBX, it’s the opposite—the last pitch ball leads to worse OBP.

So it can’t be momentum, right? 

This is getting more and more interesting ...

Okay, maybe guys given the green light on 2-0 (BBF) have better eyes?  And guys willing to take on 0-2 (SFB) also have better eyes.  That would explain it ... if it’s actually the case.


#13    Guy      (see all posts) 2006/12/30 (Sat) @ 16:16

Great stuff, Peter.

It sure looks like it’s the called strike, rather than a swinging strike, that tells us the hitter is in trouble.  Very interesting.


#14    Sergei      (see all posts) 2007/01/06 (Sat) @ 13:02

From #10

>> SEQ— SLG
>> CBX—.508
>> SBX—.552
>> FBX—.582

Off the top of my head, it’s mostly due to selective sampling, again.

Comparing “CBX” versus “[SF]BX”, the majority of hitters swinging at the first pitch tend to be sluggers, hence the higher SLG.

Same with SBX vs FBX: SBX tends to be a matchup more in the pitcher’s favor by default.

---

Now, let’s take another look at the “strike/ball” vs “ball/strike” problem.

Here’s what Gil Meche threw after 1-1 count in a sample I have from the 2003 season:

After Ball-Strike
Zone 44%
Fastball 48%
Curve 33%

After Strike-Ball
Zone 61%
Fastball 90% (!)

Yes, Meche might be an outlier, but it’s pretty much safe to say that on a 1-1 pitch after “strike-ball” sequence you’re more likely to get a strike and it’s a (moderate) FASTBALL COUNT for a typical MLB pitcher.

The Yankees, for example, threw 56% fastballs after “strike-ball” against just 42% after “ball/strike”.

And this works exactly the same way here in Russia, but not in Japan!

But there’s more to it: pitch selection after a ball for a certain pitcher may also depend on whether the previous pitch (that resulted in a ball) was a fastball or not.

Mostly, missing with a breaking ball calls for a fastball on the next pitch more often.

Whether a hitter benefits comes down to knowing a particular pitcher’s tendencies, which can vary by 180 degrees from one pitcher to another.

The better hurlers try to avoid any patterns, of course.

I’d like to add a couple of diagrams on Meche or someone else, if it’s possible to post images here.


#15    tangotiger      (see all posts) 2007/01/06 (Sat) @ 13:24

Another good thought.  So, we not only want to know the ball/strike sequence that put us to the 1-1 count, but also the pitch type sequence.  I imagine fastballs are more responsible for strikes than balls, which goes with what Sergei/14 is saying.

We really need that philanthropist to get a dozen of us in a room, and hash it all out for a year.


#16    Guy      (see all posts) 2007/01/06 (Sat) @ 23:17

"Off the top of my head, it’s mostly due to selective sampling, again.”

Sergei:  check out post 6.  If Peter’s results are correct—modified OPS of 1.025 vs. 1.023 for the two scenarios—selective sampling can’t be the answer.

* *

More thoughts on the post 10 data:
FBX and BFX are really in a different category.  In both cases the batters do quite well and there’s not much difference between them.  The failure of the pitcher to achieve either a C or S strike on either pitch clearly indicates (or gives) a big edge to the batter.

In the case of called and swinging strikes, I think we may want to think of this in terms of two questions:  1) what does it tell us when a hitter can’t make contact on a strike despite enjoying a 1-0 count, and 2) what does it tell us when a hitter has the confidence/discipline to take a ball despite being in the hole 0-1?  I think the second pitch is giving us more information, in a sense, because it comes in a specific context in which either the pitcher or hitter has a significant advantage (whereas neither has an advantage on the 0-0 pitch).


#17    Guy      (see all posts) 2007/01/06 (Sat) @ 23:22

Also, the hitter and pitcher modified-OPS data for those three K/B scenarios are in post 10.  The CBX hitters are only a tiny bit weaker—not enough to explain these outcomes.


#18    Sergei      (see all posts) 2007/01/07 (Sun) @ 12:09

BTW, speaking of Japanese pitchers and pitching backwards, here’s a couple of graphs showing tendencies of SoftBank Hawks lefty Tsuyoshi Wada (2005 NPB season):

Pitch selection by count, pitches in the strike zone, against non-pitchers

http://www.tangotiger.net/files/t_wada_by_count.png

~~~

Wada vs RH-batters when ahead and behind in count

http://www.tangotiger.net/files/t_wada_vs_rhb.png

BAs here are actually BABIPs (including HRs) and are worse than average, while the .324 isolated power (IPw) figure with batters ahead in count is bad.

But Wada didn’t allow many balls into play (InP) at the same time, hence the percentages of swings that became hits (H%) were OK.

What we see here is another case of predictability, only in a reverse way.

(Thanks to Tangotiger for taking care of the images)

***

Guy (#16),

> Sergei:  check out post 6 [...].

My “Off the top of my head, it’s mostly due to selective sampling, again” comment actually referred to post #10 (about SBX, FBX, CBX), while Peter’s post #6 was about ball-strike and strike-ball combinations.

> If Peters results are correctmodified OPS of 1.025 vs. 1.023
> for the two scenariosselective sampling cant be the answer.

IMHO, hidden selective sampling might be present even if we come to absolutely equal results or any other results, for that matter.

***

Now I have to apologize for the numbers above on Meche and the Yankees.

First off, the figures were actually for the 2-1 count, not the 1-1 count!

Anyway, the samples by themselves were not meant to prove anything in particular - one can always find a sample to fit any theory (btw, 90% fastballs on 2-and-1 tells us a lot about Meche).

And just to get facts straight, on 1-1 the Yanks actually threw fastballs after “strike-ball” only 5% more often than after “ball/strike”, not 14%.

Sorry, again.

~~~

Everything else stands.

Most importantly, knowing and exploiting a particular pitcher’s patterns can help a lot.

On the other hand, since the patterns of different pitchers can be opposites of each other, and since many pitchers do manage to avoid any patterns, aggregate league numbers might not show much, especially when it comes to subtle cases like the 1-1 count as compared to the 2-1 count.

Besides, comparing batting after different pitch sequences by BA and SLG is a crude method all by itself.

~~~

Finally, so what did Meche throw on 1-and-1 after ball-strike and strike-ball?

Ball-Strike: fastball 51%, curve 27%, change 19%
Strike-Ball: fastball 45%, curve 31%, change 17%

Very similar, but look at Strike-Ball subset by the type of the last pitch:

Strike-Ball (Fastball): curve 40%, fastball 36%, change 23%.
Strike-Ball (Other): fastball 57%, curve 21% ...

Overall, the Mariners seemed to adhere to this tendency
Strike-Ball (Fastball): fastball 40% ...
Strike-Ball (Other): fastball 55% ...

While the Yankees were just the opposite
Strike-Ball (Fastball): fastball 70% ...
Strike-Ball (Other): fastball 47% ...

Sergei


#19          (see all posts) 2007/01/07 (Sun) @ 15:45

Sergei -
your work is very interesting. One other possibility which occurs to me is that we should control for men on base, because the pitching patterns may change in basestealing situations - more fastballs, some balls which are quasi-pitchouts and so on.

Peter -
I am unable to replicate part of your results in #10. I am using 2005 only data from mlb.com and agree well on 4 of the sequences, but not on SBX or BSX. (My values are closer to what Sal reported for 2005 NL only from retrosheet).

Here are mine:
SEQ-----NUMB----BAVG----SLG
CBX-----5981----.325----.505
SBX-----0789----.299----.469
FBX-----1804----.343----.580

BCX-----3956----.304----.472
BSX-----1083----.276----.393
BFX-----2635----.325----.533

My number of instances of each sequence is roughly 1/3 of yours for 3 years, which is promising.
Can you report yours for 2005 only, so that we can check whether it’s just a difference between 2003-2004 and 2005? I don’t want to get deeper into this issue until I understand why we don’t match on these results. You can also e-mail me.

Thanks, Joe


#20    Guy      (see all posts) 2007/01/07 (Sun) @ 17:07

"My “Off the top of my head, it’s mostly due to selective sampling, again” comment actually referred to post #10 (about SBX, FBX, CBX), while Peter’s post #6 was about ball-strike and strike-ball combinations.”

I understand, but post #10 also includes pitcher and hitter OPS, and you can see that selective sampling doesn’t explain the outcomes.


#21    Peter Jensen      (see all posts) 2007/01/08 (Mon) @ 01:55

Joe - 2005 was an unusual year.  Or, to be more accurate, there is no typical year.  Here are the BAs broken down by year.

SEQ----YEAR---AB------H----BA
SBX----2003---821----284--.346
SBX----2004---846----308--.364
SBX----2005---781----234--.300

BSX----2003--1101----344--.312
BSX----2004--1130----366--.324
BSX----2005--1050----292--.278
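
As a quick cross-check of the yearly rows above, here is a short sketch that pools at-bats and hits across the three seasons. The rows are copied from the table; the function name `pooled_ba` is hypothetical.

```python
# (sequence, year, AB, H), copied from the table above.
YEARLY = [
    ("SBX", 2003, 821, 284), ("SBX", 2004, 846, 308), ("SBX", 2005, 781, 234),
    ("BSX", 2003, 1101, 344), ("BSX", 2004, 1130, 366), ("BSX", 2005, 1050, 292),
]


def pooled_ba(rows):
    """Sum AB and H per sequence, then compute one batting average."""
    totals = {}
    for seq, _year, at_bats, hits in rows:
        ab_sum, h_sum = totals.get(seq, (0, 0))
        totals[seq] = (ab_sum + at_bats, h_sum + hits)
    return {seq: round(h / ab, 3) for seq, (ab, h) in totals.items()}


print(pooled_ba(YEARLY))  # {'SBX': 0.337, 'BSX': 0.305}
```

The pooled figures sit well above the 2005-only values, which fits the point that a single season can look quite different from the three-year total.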


#22    tangotiger      (see all posts) 2007/01/29 (Mon) @ 12:03

An interesting blog entry on football play-sequencing:
http://www.pro-football-reference.com/blog/wordpress/?p=243


#23    tangotiger      (see all posts) 2007/02/05 (Mon) @ 12:33

Sal continues his very cool look, this time focusing on 3-2 pitches, based on 3-0 going to 3-2, and 0-2 going to 3-2 (all with no swings):
http://www.hardballtimes.com/main/printarticle/more-on-pitch-sequences/

Small sample size, as he only looked at the 2006 season, but that should open the door for the other bright bulbs to expand on his work.

I find the whole thing fascinating.


#24    tangotiger      (see all posts) 2007/07/23 (Mon) @ 10:46

The next in the series by Sal to see how far the memory of the pitcher/batter goes:

http://www.hardballtimes.com/main/printarticle/thanks-for-the-memories/


#25    Tangotiger      (see all posts) 2007/08/15 (Wed) @ 12:20

http://retrosheet.org/Research/SmithD/How%20Valuable%20is%20Strike%20One.pdf

Pages 10-12 deal with pitch sequencing.

***

The rest of the paper focuses on the “at” counts, and ignores the “through” counts.  That is, for the “0-1 count” it treats a PA that went on to 1-1 or 0-2 as a non-event.

It would have been preferable to also show the frequency at which an 0-1 count went to the other two counts, which themselves have their own (recursive) Linear Weights value.
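
A rough sketch of what a recursive "through" value could look like, under stated assumptions: the value of a count is the chance the PA ends there times the "at" value, plus the chance of moving to each later count times that count's own "through" value. The function name `through_value`, the data layout, and every number in the example are made-up placeholders, not figures from the paper. The sketch also assumes no self-loops, so two-strike fouls would need to be folded into the transition rates beforehand.

```python
def through_value(count, transitions, at_value):
    """Expected run value of all PAs that pass through `count`.

    transitions[count]: {"end": p, next_count: p, ...}, summing to 1.
    at_value[count]:    average run value of PAs that end at `count`.
    """
    total = 0.0
    for nxt, prob in transitions[count].items():
        if nxt == "end":
            total += prob * at_value[count]
        else:
            total += prob * through_value(nxt, transitions, at_value)
    return total


# Placeholder numbers only; the tree is cut off after one more pitch.
toy_transitions = {
    "0-1": {"end": 0.2, "1-1": 0.4, "0-2": 0.4},
    "1-1": {"end": 1.0},
    "0-2": {"end": 1.0},
}
toy_at_value = {"0-1": 0.0, "1-1": -0.01, "0-2": -0.10}
print(round(through_value("0-1", toy_transitions, toy_at_value), 3))  # -0.044
```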

***

Also for further research is the quality of pitchers/batters, as that would likely explain the differences with the type of strike called (swing-and-miss, contacted, called).  On the top of page 13, Dave said he found no difference, but it’s not clear how he separated his batters.


#26    Tangotiger      (see all posts) 2008/06/12 (Thu) @ 09:44

And here’s one from Dan Brooks:
http://brooksbaseball.blogspot.com/2008/06/another-strikezone-tidbit-changing.html

As likely noted at some point in our thread here, we need to control for the family and quality of the batter.

