THE BOOK--Playing The Percentages In Baseball

Thursday, October 27, 2011

Batter-pitcher matchups

By Tangotiger, 09:21 AM

Colin says:

Once we have that expected value, we can also look at the TAv from that batter-pitcher matchup from all previous seasons. We can run this data from 1951 through 2011, giving us sixty years of data and over 16,000 data points to look at.

Using a technique known as ordinary least squares regression, we can see how well our expected TAv and our prior batter-pitcher matchup TAv predict future batter-pitcher matchup TAv. After controlling for whether the batter has the platoon advantage, what we find is that our log5 estimate of the outcome of a batter-pitcher matchup is 67 times more predictive than the batter’s past performance against that pitcher. Now, that’s slightly better for the batter-pitcher matchup data than we might have expected; there were on average 78 times as many PA for the log5 expectation as there were for the batter-pitcher matchup. (Since there are both batter PA and pitcher PA against used to generate the log5 expectation, I used what’s known as a harmonic mean to come up with the PA totals for the log5 expectation.)
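As a rough illustration of the two ingredients named above, here is a minimal Python sketch of a log5-style expectation and of the harmonic mean of two PA totals. The odds-ratio form of log5 used here is the common textbook version; whether Colin applied exactly this form to TAv is an assumption, and the player rates and PA totals are hypothetical values chosen only for illustration. The .260 league average is taken from the article's remark that .520 is twice the average TAv.

```python
def log5(batter_rate, pitcher_rate, league_rate):
    """Odds-ratio form of log5 for a rate strictly between 0 and 1.

    Assumption: the rate behaves like a probability; the article does not
    spell out how the log5 expectation was computed for TAv.
    """
    def odds(p):
        return p / (1.0 - p)

    matchup_odds = odds(batter_rate) * odds(pitcher_rate) / odds(league_rate)
    return matchup_odds / (1.0 + matchup_odds)


def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers, e.g. batter PA and pitcher PA."""
    return 2.0 * a * b / (a + b)


LEAGUE_TAV = 0.260  # average TAv implied by the article

# Hypothetical batter (.300) facing a hypothetical pitcher who allows .240.
expected = log5(0.300, 0.240, LEAGUE_TAV)

# Hypothetical sample sizes behind the two overall lines.
effective_pa = harmonic_mean(600, 900)

print(f"log5 expectation: {expected:.3f}")
print(f"effective PA behind the expectation: {effective_pa:.0f}")
```

A useful sanity check on this form: against an exactly league-average pitcher, the expectation collapses to the batter's own rate.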

We can conclude that one plate appearance against a specific pitcher is slightly more predictive than a plate appearance against any pitcher at all. But that effect is dwarfed by the number of plate appearances a batter makes against all pitchers.
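One back-of-the-envelope way to read the 67 and 78 figures, sketched in Python. This assumes that predictive weight scales linearly with the number of PA, which is a simplification for illustration and not something the article states.

```python
# Figures quoted in the article.
pa_ratio = 78          # log5 expectation rests on ~78x as many PA
predictive_ratio = 67  # log5 expectation is ~67x as predictive

# Assumption: weight is proportional to PA times a per-PA value.
# If one matchup PA were worth exactly one ordinary PA, the predictive
# ratio would also be 78. It is only 67, so each matchup PA is worth
# slightly more than an ordinary PA.
relative_value_per_matchup_pa = pa_ratio / predictive_ratio

print(f"one matchup PA ~ {relative_value_per_matchup_pa:.2f} ordinary PA")
```

Under that assumption a matchup PA carries roughly 16% more information than an ordinary PA, which is small next to a 78-fold edge in sample size.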

...

But what about cases where a batter has really owned a pitcher in the past—just utterly demolished him? Let’s restrict ourselves to cases with a prior TAv of .520 against a pitcher, or twice the average TAv. (By happy coincidence, that’s just about two standard deviations above the average, for those of you who care about such things.)

Historically, these have been more predictive of batter success than ordinary batter-pitcher matchups. But they are still dwarfed by the predictive power of our log5 expectation, by a factor of about 24 times. A manager is likely doing himself a favor if he puts a guy with that kind of extreme success in the lineup in place of a batter who’s otherwise reasonably close in ability. However, such cases are extremely rare, and even in these extreme cases, the whole of a batter’s historic performance (combined with knowledge of the platoon advantage) is still a much better gauge of how a batter will perform against a pitcher going forward.

...

The data isn’t telling us that batters can’t pick up certain cues about a pitcher, or that a pitcher’s repertoire is equally suited to all batters. However, 10, 50, or even 100 plate appearances aren’t enough to tell us whether what we’re seeing is one player with a special edge against another, or simply a small-sample-size fluke, and there’s too much at stake for La Russa and Washington to let themselves be overly swayed by such statistics to the detriment of their teams.

Thank you Colin for doing the work!

Moral of the story: take your noses out of your spreadsheets and index cards, and watch the baseball game instead.

Colin: I’d like to know the regression equation, of how much to weight the batter-pitcher matchup and how much to weight the log5 expectation.


#1    MGL      (see all posts) 2011/10/27 (Thu) @ 13:28

I have not read the article yet, but it sounds like truly great work.

So basically what we have already said a million times before: “Use it as a tie-breaker and nothing else.”

Yes, I would like to see how much to regress the batter-pitcher results toward the expected results (for a given number of PA of course) or how much to regress the expected results toward the batter-pitcher sample.

Colin, one thing:  I assume (again, I have not RTFA yet) you did not control for GB/FB platoon.  If you do that, I suspect that the entire predictive value might disappear.  Can you perhaps tell us, in the extreme cases on both ends, what the average GB/FB ratio is for the pitchers and batters?


#2    MGL      (see all posts) 2011/10/27 (Thu) @ 13:36

One more thing (I just RTFA):

Colin, when you computed expected TAv for each matchup, did you use a log5 of the hitter’s and pitcher’s historical platoon ratio or did you put into the regression the actual 3-year platoon ratios of the batter and pitcher?

You say in the article that in the regression, you “controlled for whether the batter or pitcher had the platoon advantage.” Do you mean you just used a dummy variable in the regression?  If so, then of course the expected TAv underestimates those batters who have large expected platoon ratios against certain pitchers, and vice versa, so of course the prior matchup results would have some predictive value over and above the expected TAv (again, if the expected TAv were based only on “yes” or “no” for whether the batter had the platoon advantage or not…).


#3    anon      (see all posts) 2011/10/27 (Thu) @ 14:50

In stats there’s this notion of false discovery rate (FDR).  It gives you a way to do thousands of tests at the same time and then, for a given significance threshold, figure out what proportion of the observed events that exceed that threshold are interesting and what proportion are likely just noise.  For example, you might look at the levels of gene expression in healthy and cancerous tissues, and do hypothesis tests for 50,000 different genes simultaneously.  Then one sets an FDR threshold (maybe 20%), and you can draw the conclusion that of the genes that pass that threshold, about 4 in 5 are expected to be real, interesting effects.

It seems like one could do the same thing with batter-pitcher matchups.  Look at all the batter-pitcher matchup pairs out there with some minimum number of plate appearances.  Then for each of them, do a hypothesis test to see if batter performance was better or worse than expected (estimated by log5, controlling for platoon, etc).  Finish by using FDR to select interesting batter-pitcher pairs.

That would give you a list of potentially interesting batter-pitcher matchups, with some confidence that you were looking at real effects for a possibly smallish number of pairs.
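For readers unfamiliar with FDR control, here is a minimal sketch of one standard way to do it, the Benjamini-Hochberg step-up procedure. The comment does not name a specific method, so this choice is an assumption, and the p-values below are made up for illustration and do not come from any real matchup data.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return the indices of tests kept at the given false discovery rate.

    Sort the p-values, find the largest rank k such that
    p_(k) <= fdr * k / m, and keep every test ranked k or better.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values, one per batter-pitcher pair, each from a test of
# "performance differs from the log5 expectation".
hypothetical_p = [0.001, 0.008, 0.039, 0.041, 0.042,
                  0.060, 0.074, 0.205, 0.212, 0.216]

candidates = benjamini_hochberg(hypothetical_p, fdr=0.20)
print("candidate pairs:", candidates)
```

With a 20% threshold, about one in five of the selected pairs would be expected to be noise, and the procedure cannot say which ones.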


#4    Tangotiger      (see all posts) 2011/10/27 (Thu) @ 16:42

Anon/3: I don’t see how this can apply here.  Players are human beings, so they have a huge amount of commonality.  To think that a Neifi-Pedro confrontation may be unique, but that Neifi-Clemens and Neifi-Smoltz are not, would be quite a stretch, with respect to your analogy.

A better analogy is that you have a million people flip 100 coins, and someone out there is going to get 70 heads.  That by itself proves nothing IF you picked that out after the fact.  So, you are given the results of the million flips, and you take an extreme case, and then say something about it.

The right thing to do, if you have a population of observations, is to FIRST see whether the results of those million flips follow the normal distribution.  If they do, then we can say that there are no weighted coins, and chalk up everything to luck.

Now, Colin is suggesting that it’s not all luck, which is an obvious result.  But he’s also showing that it is “mostly” luck, and I’d love to see the regression equation.

To then pick out the most extreme matchups WILL tell you that there’s SOMETHING there, but, you have no choice but to regress that matchup to the same extent as all other matchups.  You can’t just say that because it’s the most extreme, you are going to regress less than you otherwise would.

Well, you can say that, but then you better prove it.


#5    Bill      (see all posts) 2011/10/27 (Thu) @ 17:33

TT, you keep preaching it, I keep questioning it.

Your prior is that all baseball players are the same (or, rather that they are observations from the same normal distribution with a specific mean and variance based on the observed numbers).

Now that may be a great prior. But that prior, like all Bayesian priors, is subjective. To quote Sunny Mehta on PB’s blog, “Nothing in the universe makes them inherently correct or incorrect”; and, “they might be awesome assumptions, perhaps the most predictive that anyone has yet shown, or they might be shit.”

The point being that if someone else wants to use a different prior—for example, maybe some data support the idea that wOBA is really bi-modal (I just made that up), then it’s an empirical question as to which is a “better” prior to use—and that of course is completely subjective also, you’d probably want to compare based on some out-of-sample prediction criterion.

I just don’t agree with the notion that the default prior ALWAYS has to be that all players are the same.


#6    MGL      (see all posts) 2011/10/27 (Thu) @ 17:41

Bill, Tango is not suggesting using a prior at all as far as I can tell. 

Tango, I think that anon is essentially suggesting looking at the distribution and seeing if it matches that expected by chance and, if not, by how much. I could be reading it wrong though.


#7    anon      (see all posts) 2011/10/27 (Thu) @ 17:46

Tango, we’re talking at cross purposes here.  Take your coin flipping example.  You give 1,000,000 people 100 coins each and tell them to flip the coins, trying to flip heads.

If none of these people has any skill at all, we expect to see 39 flip 70 or more heads.  But, what if almost everybody flipped exactly as we would expect with a fair coin, except we saw 100 people flip 70 or more heads?  This would suggest that of those 100 people about 39 got lucky, but the remaining 61 probably had skill.  We don’t have any way to know which of the 100 the skilled are, but we can be confident that skill exists.
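The figure of 39 can be checked directly from the binomial distribution. The sketch below assumes fair coins and independent flips, and it treats the 100 observed high scorers as the hypothetical count from the comment.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


FLIPPERS = 1_000_000

# Expected number of people who flip 70+ heads out of 100 by luck alone.
expected_lucky = FLIPPERS * prob_at_least(70, 100)

# Hypothetical observation from the comment: 100 people reached 70+ heads.
observed = 100
estimated_skilled = observed - round(expected_lucky)

print(f"expected by luck: {expected_lucky:.1f}")
print(f"estimated skilled: {estimated_skilled}")
```

The excess over the luck-only count estimates how many skilled flippers there are, but not who they are.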

The essence of my suggestion is that it may (or may not depending on sample sizes) be possible to identify “candidate” interesting batter-pitcher matchups.  The assumption being that with most pairs of batter and pitcher, we see little/no ability beyond what is expected by true (average) talent, but there may be a few pairs out there where for whatever reason the batter really does have an exceptionally good read on one particular pitcher.

I am not suggesting that you can avoid regression; I’m just curious about quantifying how many of these exceptional performances are likely due to luck and how many might actually be attributable to some particular situation-specific skill.


#8    Tangotiger      (see all posts) 2011/10/27 (Thu) @ 18:25

The prior is not that all players are the same.

The prior is that all “extra” splits are the same.  That is, after accounting for the base talent of the players involved, there’s nothing else beyond that.

Well, there’s a bit, like GB/FB splits, and handedness splits, and maybe speed/type splits.

But, almost nothing about the identity of the players beyond that.


#9          (see all posts) 2011/10/28 (Fri) @ 01:21

I think that anon is just suggesting another type of statistical test, which would produce something like this:

Say that we look at all matchups of 30 PA, and there are 300 of them.  And say we expect to find 20 of them in a certain interval of distance from the mean (say, between 1.5 and 1.6 SD), the mean being the expected performance using a log5 of their true talent adjusted for platoon, etc. If we find 30, there is a suggestion that some of those 30 matchups, most likely about 10, reflect some kind of batter/pitcher true talent.

Is that right, anon?

