THE BOOK--Playing The Percentages In Baseball

Thursday, May 07, 2009

Usefulness of batter/pitcher matchups?

By Tangotiger, 11:57 AM

In this BTF forum thread, Chris Dial is saying:

I’m no regression expert, but where is that point?  80 PAs?  20 PAs?  Tango and I have had a similar discussion around pitching matchups, and I still disagree that it’s useless info.  Much of it is, but there are real human factors that wash out in non-specific samples, but can be very real, based on the nature of the interaction.

Craig concurs:

I disagree as well.  The problem I have with using matchup data is that I would much rather put the qualitative rather than quantitative data to work here.  A true saberite wouldn’t want to use any of it, but while I disagree completely with that position, I think using matchup data, while it is, as you say, very susceptible to human factors and highly non-random in origin, can often obscure rather than illuminate unless accompanied with a qualitative appreciation of those appearances…

But I do agree with what you say.

Let me quote myself in The Book:

The Book Says:
Knowing a player will face a particular opponent, and given the choice between that player’s 1500 PA over the past three years against the rest of the league, or twenty-five PA against that particular opponent, look at the 1500 PA.
...
The Book Says:
Sixty highly targeted PA are still not enough evidence to overwhelm the knowledge contained in 1500 random PA.
...
You see, we’re not saying that it doesn’t matter which pitcher is facing which hitter. It most certainly matters. Every person is different, and there’s no reason to think that two overall equally talented pitchers, but talented for different reasons, will necessarily have the same success level against the same hitter. However, you can’t tell by looking at the numbers from twenty-five or sixty PA. There is simply too much noise masking the truth under those numbers. You can’t say Edgar owns la famiglia Cormier, or that Mussina owns Varitek because, well, look at the numbers. The numbers don’t support your statement, because of the small sample sizes. For you to say that a certain hitter owns a certain pitcher, you have to go beyond the numbers. You have to look at the very specific traits of these players. We’ll look at a few traits in a second, but as noted earlier, there are many different kinds of traits to consider. When looking at batter/pitcher confrontations, scouting information becomes a critical component to the analysis.
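To put rough numbers on that noise, here is a minimal sketch. It assumes each PA is an independent trial with a fixed on-base probability; the .340 rate and the function name are illustrative assumptions, not figures from The Book.

```python
import math

def binomial_sd(rate: float, pa: int) -> float:
    """Standard deviation of an observed rate over `pa` independent trials."""
    return math.sqrt(rate * (1.0 - rate) / pa)

TRUE_OBP = 0.340  # hypothetical true on-base rate, chosen only for illustration

for pa in (25, 60, 1500):
    sd = binomial_sd(TRUE_OBP, pa)
    print(f"{pa:>5} PA: one SD of observed OBP = {sd:.3f}")
```

Under these assumptions one SD is roughly 95 points of OBP at 25 PA and about 12 points at 1500 PA, which is why the large sample carries so much more information.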

As far as I can see, Chris, Craig and I all agree on the issue.  I disagree with Chris’ characterization of my position, and I totally disagree with Craig’s characterization of a “true saberite” (which he makes out to be someone who ignores all qualitative information).


#1    Paul Scott      (see all posts) 2009/05/07 (Thu) @ 12:29

Reading the thread at BTF it seems to me they are self-contradicting at worst or at best using arguments that don’t support their conclusions.

If the question is: Is professional scouting information useful in determining likely outcomes for batter-pitcher match-ups?  As I read it, all three agree that yes, it is.  I expect all three might even agree that the scouting information might be as valuable or more valuable than 1500 PAs of “against the league” data.

What Craig and Chris seem to me to be saying, however, is that “Because the professional scouting data is very useful in determining likely outcomes of batter-pitcher match-ups, we will also say that the results from the very small number of PAs with this Batter-Pitcher match-up are also useful.” That, to me, is clearly wrong.

The scouting data is independent of the SSS match-up data.  If the scouting data confirms the SSS PA data, then the SSS PA data has added nothing new.  If the scouting data contradicts the SSS PA data, then the SSS PA data should be ignored.  The end result is the same.  You cannot rely on the SSS PA data to tell you anything, because it is absorbed by noise.  So, if you are lucky enough to have access to reliable, professional scouting data, listen to it.  If not, then relying on the large sample size PA data against the league is better than the Small Sample Size PA data for a particular Batter-Pitcher match-up.


#2    Dackle      (see all posts) 2009/05/07 (Thu) @ 12:37

Another way to illustrate the problem with batter/pitcher matchups is to look at the year-by-year logs for a specific long-term matchup, eg Pete Rose vs Phil Niekro:

http://www.baseball-reference.com/play-index/pvb.cgi?n1=niekrph01&n2=rosepe01

The batting averages are all over the place—Rose owned Niekro in 1968 (.385), then Niekro owned Rose in 1971 and 1972 (.071 and .067), then in 1973 Rose owned Niekro (.667), then in 1976 Niekro owned Rose again (.083) etc etc. It looks silly to draw conclusions from those numbers when you look at them over the longer term. And ... the lifetime average for the Rose/Niekro matchup (.283) seems to be generally in line with Rose’s career average and Niekro’s average against.
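A quick simulation shows how much year-to-year swing pure chance can produce. This is only a sketch: it assumes a constant true average of .283 (the lifetime figure quoted above), a made-up 14 at-bats per season, and independent at-bats. The function name and the seed are illustrative.

```python
import random

def simulate_yearly_averages(true_avg, at_bats_per_year, years, seed=0):
    """Simulate season-by-season batting averages for a hitter whose
    true average never changes; every at-bat is an independent trial."""
    rng = random.Random(seed)
    averages = []
    for _ in range(years):
        hits = sum(1 for _ in range(at_bats_per_year) if rng.random() < true_avg)
        averages.append(hits / at_bats_per_year)
    return averages

# 14 at-bats per year is a hypothetical figure, not taken from the actual logs.
yearly = simulate_yearly_averages(true_avg=0.283, at_bats_per_year=14, years=15)
print(" ".join(f"{avg:.3f}" for avg in yearly))
print(f"lowest {min(yearly):.3f}, highest {max(yearly):.3f}")
```

With samples this small, a hitter whose talent never moves can still look like he "owns" the pitcher one year and gets "owned" the next.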


#3    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 13:05

Dackle: great illustration!

If there is any truth to this “owning” business, we’d see more deviations than expected.  While I am sure you’ll find it, I’m also sure that whatever you find will be the usual tininess we find with clutch hitting, streaks, “being on” and the like:

something less than the platoon advantage, and you’d only be able to reach that conclusion once both players are in their mid-30s.

The one thing that is strange is that we are constantly reminded that these are people not machines.  And people make adjustments.  So, isn’t it possible that if Tim Wallach is 2-19 against Dwight Gooden, that he might make some adjustment that improves his chance to hit a HR in his next at bat?  It’s precisely because they are people and not machines that we can say that we don’t want to be married too much to the idea of players-who-own-players.


#4    MGL      (see all posts) 2009/05/07 (Thu) @ 14:10

Tango’s research for The Book strongly suggests that the results of B/P matchups have almost no value in predicting future performance for those same B/P matchups.  That is irrefutable and there should be no argument there.

Now, how much scouting and general “baseball knowledge” can inform these matchups over and above a basic log5 projection based on the batter’s and pitcher’s individual projection, including platoon and G/F projections, is another question altogether.
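For readers who have not seen it, the basic log5 (odds ratio) calculation can be sketched as follows. The three rates in the example are hypothetical, and the platoon and G/F adjustments mentioned above are deliberately left out of this minimal version.

```python
def log5(batter: float, pitcher: float, league: float) -> float:
    """Odds-ratio (log5) estimate of a rate for one batter against one pitcher.

    batter  -- the batter's rate against the league
    pitcher -- the rate the pitcher allows against the league
    league  -- the league-average rate
    """
    favorable = batter * pitcher / league
    unfavorable = (1.0 - batter) * (1.0 - pitcher) / (1.0 - league)
    return favorable / (favorable + unfavorable)

# Hypothetical on-base rates: batter .380, pitcher allows .300, league .330.
print(f"{log5(0.380, 0.300, 0.330):.3f}")
```

A useful sanity check is that a league-average pitcher leaves the batter's own rate unchanged, and vice versa.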

Common sense would suggest that it can.  The question is using “what” and by how much (and by whom)?  I don’t think anyone knows the answer to that question.  It seems to me to be folly to argue it.  Where a “saberist” comes into play, like any scientist, is in testing and analyzing that question in a proper fashion so that someday we might have some kind of answer (how can we project those kinds of matchups and how much better can we do than a standard log5 projection?).

Just saying something like “a manager or a scout can surely figure out which matchups are good or bad (for the pitcher or batter)” means nothing.  We would want to know what information they are using, and how well that information can inform us.

What we would need, for example, would be an experiment like Tango’s Fan Clutch Project, whereby scouts, coaches and managers tell us which matchups are particularly favorable and which ones are not, and then we track the performances over a season or two and see how they do as compared to what a traditional log5 would expect.  As with the Clutch project, I would guess that the coaches, et al. would do a little bit better but not much.  Of course I really have no idea, and again, it is not something that I can argue one way or the other.

One reason, I think, that they would NOT do so well, is that a certain small percentage might be right on, whereas another percentage or maybe even the majority of the coaches, et al., would have little idea what they were talking about, and in fact, would be using prior results to inform their opinions, which we already KNOW does not work…


#5    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 14:34

I concur.


#6    Matt Swartz      (see all posts) 2009/05/07 (Thu) @ 14:44

I will admit to start that I have not yet read The Book or the chapter on Batter/Pitcher matchups, so it’s possible that this is already covered in there, but I think there’s a very important distinction to make.

Using a hitter’s batting average against an individual pitcher is bound to be based on way too small of a sample size for a statistic with little player to player difference in skill level. 

If I tell you a hitter has 10 hits in 25 at-bats against some pitcher, then saying “he hits .400 against him despite being a .280 hitter, and therefore he’s got an advantage against him” is useless.  But the sample size doesn’t have to be at-bats and the variable examined doesn’t need to be hits.  If I were to tell you that that 10 for 25 is actually the following line:  35 PA, 25 AB, 10 H, 10 BB, 2 K, 6 HR, 3 2B, 1 1B, with 2 GB/10 LD/11 FB, then all of a sudden you have a different story.  The reason is that while batting average has little player-to-player variance in skill, HR%, BB% and K% have more.

Basically, you can figure that 35 PA could be made up of 140 pitches thrown and 70 swings.  If the hitter usually makes contact on 50/70 of swings but made contact on 67/70 swings, that’s probably significant.  If he lays off 80% of pitches out of the strike zone normally but 95% against one pitcher, that’s likely statistically significant.
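The contact example can be checked with an exact binomial tail. This sketch assumes each swing is an independent trial at the hitter's usual contact rate and that the matchup was picked out in advance; the function name is illustrative.

```python
from math import comb

def binomial_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

usual_contact = 50 / 70  # the hitter's usual contact rate from the example
p_value = binomial_tail(67, 70, usual_contact)
print(f"P(67+ contacts in 70 swings at the usual rate) = {p_value:.2e}")
```

The probability is tiny for a single pre-chosen matchup. If the matchup was instead selected because it was the most extreme of thousands, the same number has to be read against how many matchups were searched.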

I would venture to guess that looking at statistics like the three true outcomes in addition to statistics of things on Fangraphs like O-Swing%, Contact%, etc., might significantly change the game on something like this.  It’s a matter of looking for statistics with larger variances in skill from player to player and peripheral statistics with larger sample sizes than just the sample size of plate appearances.


#7    Guy      (see all posts) 2009/05/07 (Thu) @ 14:47

Could there be any utility in trying to determine if there are types of pitchers—rather than specific pitchers—that a hitter is particularly good/bad against?  For example, a hitter who can’t handle FBs over 95 mph, or can’t hit a split-finger?  This would give you much larger samples to work with.  Have you guys already explored that?  (I seem to recall some analysis of FB/GB tendencies...)


#8    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 14:51

If the hitter usually makes contact on 50/70 of swings but made contact on 67/70 swings, that’s probably significant.

Sure, but that’s not how the believers frame it. I’m all for pitch-by-pitch and tools-based-level analysis.

***

I think that chapter is available for free from Amazon: read it!


#9    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 14:58

Guy, right, that’s where the focus should be.

In The Book, I looked at GB/FB, contact/noncontact, good/bad, etc pitchers.  Nuthin.

Looking for true comparables based on pitch repertoire is the way to go.

If I had to guess, it’s very specific kinds of hitters and pitchers that could have an impact.  Say a power hitter with a very long swing, who always hits that way.  I imagine those hitters have a bigger hole that can be exploited.  I have to believe that you’d have maybe 10-20% of hitters that you can worry about this stuff, if they faced 10-20% of specific pitchers that can take advantage of this. 

So, you are down to about 2% of PA where the matchups would matter.  But coaches will act like it’s 20%.
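The arithmetic behind that 2% can be sketched like this, assuming the exploitable hitters and the pitchers who can exploit them are spread independently across matchups (an assumption of this sketch, as is the function name).

```python
def share_of_pa(hitter_share: float, pitcher_share: float) -> float:
    """Share of PA in which an exploitable hitter meets a pitcher who can
    exploit him, assuming the two groups are matched up independently."""
    return hitter_share * pitcher_share

low = share_of_pa(0.10, 0.10)   # both guesses at the low end
high = share_of_pa(0.20, 0.20)  # both guesses at the high end
mid = share_of_pa(0.15, 0.15)   # both guesses at the midpoint
print(f"{low:.1%} to {high:.1%}, about {mid:.1%} at the midpoint")
```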


#10          (see all posts) 2009/05/07 (Thu) @ 15:25

for some reason this blog won’t display on my screen properly.  It’s all over to the left and the first word or two of every line is cut off.


#11    Guy      (see all posts) 2009/05/07 (Thu) @ 15:30

The Dial filter is working!
:>)


#12    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 15:56

I’m using IE 7.0 and FF 3.0, and both look fine to me.

I know I have Chris’ issue with other sites…


#13    Gary Geiger Counter      (see all posts) 2009/05/07 (Thu) @ 16:23

Tom, other than the lefty/righty split, didn’t The Book say something about GB/FB splits being meaningful?


#14    Tangotiger      (see all posts) 2009/05/07 (Thu) @ 16:31

Yes, the GB/FB is meaningful for those players who are in the extreme.  That is, I think I classified about one-sixth in the GB category and one-sixth in the FB category (for batters and pitchers).  And there is a platoon-advantage for those categories (similar to the opposite-handedness of lefty/righty for hitters and same-handedness for pitchers).

So, for a pitcher like Zito or Lowe, these kinds of platoon splits matter somewhat.


#15    Zack      (see all posts) 2009/05/07 (Thu) @ 17:25

I wonder if you could use video game data to establish pitcher “molds” easily.  They’re relatively invested in getting at least pitch selection and throwing speed into buckets.


#16    MGL      (see all posts) 2009/05/07 (Thu) @ 18:24

Chris, I had that problem and corrected it with the “zoom” feature of the browser, I think.

Matt, the issue is not whether the things you are looking at have a high player to player spread, it is whether there is in fact a spread batter/pitcher to batter/pitcher, over and above what a conventional matchup (odds ratio with platoon adjustments) formula will tell you.  Even if there is lots of variation in K%, HR%, and BB% among pitchers and batters, which there is, that does NOT mean that there will be ANY variation at all for these matchups.  That is yet to be determined.  And as Tango says, OF COURSE he did not only look at BA in the research for The Book.  He looked at wOBA which basically represents overall offensive production and really the only thing that counts in terms of any usefulness of these kinds of predictions (of course if a certain matchup favored a K or a GDP, or not, or something like that, it might be useful in a certain situation).

As Tango illustrates and suggests above, even if there is SOME prediction you can do (with scouting or whatever), over and above career stats, everyone (commentators, players, coaches, managers, etc.) is going to overstate, misrepresent, and mis-use it by a factor of 5 to 10.  Just like clutch and everything else.


#17    Chris Dial      (see all posts) 2009/05/07 (Thu) @ 21:18

However, you can’t tell by looking at the numbers from twenty-five or sixty PA. There is simply too much noise masking the truth under those numbers.

I said “I still disagree that it’s useless info.” These sentences certainly seem to say you think those numbers are useless. 

You can’t say Edgar owns la famiglia Cormier, or that Mussina owns Varitek because, well, look at the numbers. The numbers don’t support your statement, because of the small sample sizes.

This is unclear.  There *is* a level of abuse where even you would say “that is outside 3 sds of the expected performance”.  No?

For you to say that a certain hitter owns a certain pitcher, you have to go beyond the numbers. You have to look at the very specific traits of these players.

Why?


#18    MGL      (see all posts) 2009/05/07 (Thu) @ 21:56

Dial is misunderstanding statistical principles, as many people seem to do, although I certainly would not expect it from him.  Bill James does this all the time.

Here is the situation:

I have a bunch of player performance splits (clutch/non-clutch, batter/pitcher versus career expected, etc.) and I am testing the hypothesis that these splits have any predictive value.  This type of analysis comes up a lot in baseball of course.

Now, given a large enough sample size, you will see players with 2 SD, 3 SD splits, whatever.  That does not tell you anything of course! Again, given a large enough sample of players, you are guaranteed to see all kinds of large splits by chance alone.
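How many large splits should chance alone produce? A rough sketch, assuming every true split is zero and sample splits are approximately normal. The count of 10,000 matchups is a made-up number, not the size of any actual study.

```python
from statistics import NormalDist

def expected_extremes(n_matchups: int, sd_threshold: float) -> float:
    """Expected number of splits at least `sd_threshold` SD from zero, in
    either direction, when every true split is zero."""
    two_sided_tail = 2.0 * (1.0 - NormalDist().cdf(sd_threshold))
    return n_matchups * two_sided_tail

N_MATCHUPS = 10_000  # hypothetical number of batter/pitcher pairs examined
for threshold in (2.0, 3.0):
    count = expected_extremes(N_MATCHUPS, threshold)
    print(f"{threshold:.0f}+ SD: about {count:.0f} matchups by chance alone")
```

So in a pool that size, hundreds of 2 SD matchups and a couple dozen 3 SD matchups are expected even with no skill at all.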

So, how do we determine whether there is any “skill” involved in those splits we will see no matter what?  Well, there are a number of statistical techniques.  Certainly one of them is to do what Tango did.  Another is to look at correlation coefficients from one time period to another.  I suppose you can do T-tests. Or you can compare the expected distribution assuming the null hypothesis with the actual distribution (James likes to do this).

Now, let’s say that from one or more of these analyses (statistical techniques), it looks like there are not true splits (which is basically what Tango found given the sample sizes of the number of PA between each batter and pitcher), and that the variation in splits we see is likely just noise? (Keep in mind, of course, that we never know for sure - since we are dealing with sample data - but we can make some inferences and draw some conclusions with some degree - never 100% - of confidence that these inferences and conclusions are correct.)

Now, people like Dial (and James sometimes) say, but surely the guys who are like 2 SD and 3 SD in their splits have something going on, right?  I think that is what Dial is saying.  If not, he can correct me.

No! No! No!  Since we already determined that there was no aggregate or wholesale “skill” in the population we looked at, and we have no reason to believe that these 2 and 3 SD players are not part of that population (the fact that they have large splits does NOT mean that they are not part of the population), we HAVE TO assume that those 2 SD and 3 SD splits are noise!

Now, there is one caveat to that.  That is that IF we are wrong (we made a TYPE II error by accepting the null hypothesis when it is not true), the players we are obviously most likely to be wrong about are these players.  But if we accept the fact that there are NO true splits in our population, then we must accept that for ALL the players in the population.

Now, if it turns out that there is some small true split for, say, batter/pitcher matchups, and Tango certainly does not have enough data and has not done the most rigorous of statistical tests on that data, then obviously when we apply the proper regression to everyone’s splits (let’s say that it is 90% for 40 PA), the players with the largest sample splits, the 2 and 3 SD players, will obviously have the largest estimated true splits.  This is EXACTLY the case with clutch. 
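Here is what that regression looks like in practice. It is a sketch only: the 90% at 40 PA is the "let's say" figure above, the two wOBA values are hypothetical, and the ballast formulation (adding a fixed number of PA at the expected rate) is one common way to express the regression, assumed here for illustration.

```python
def regress(observed: float, expected: float, regression: float) -> float:
    """Shrink an observed rate toward the expected rate by the given fraction."""
    return expected + (1.0 - regression) * (observed - expected)

def regression_fraction(pa: int, ballast_pa: float) -> float:
    """Fraction regressed when `ballast_pa` PA of expected performance are added."""
    return ballast_pa / (pa + ballast_pa)

# 90% regression at 40 PA corresponds to a ballast of 360 PA: 360 / (40 + 360) = 0.9
ballast = 360
expected_woba = 0.340  # hypothetical projection from the traditional matchup model
observed_woba = 0.500  # hypothetical result over 40 PA in this matchup
estimate = regress(observed_woba, expected_woba, regression_fraction(40, ballast))
print(f"estimated true wOBA in this matchup: {estimate:.3f}")
```

Setting the regression to 100% returns the expected rate no matter how extreme the observed split is, which is the no-skill case.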

We (Andy) found that clutch is likely a small skill. Therefore the ones with the largest clutch splits, SD-wise, are the ones that we estimate to have the highest true clutch skill, although not nearly as large as their sample splits of course.

But, and I will repeat this because it is really important and apparently people, like Dial and James, who do lots of “statistical” research don’t fully understand it (and if they do, they can correct my assumption), once the research shows that there is NO skill in the population, it does not matter how many SD an individual player or players in that population is!  It is assumed that those large and unlikely splits are 100% due to chance.  Again, the caveat is that IF we made a mistake in assuming no skill in the population, then those are the players that are most likely to have it to a large degree, which should be pretty obvious.

But the fact that there are several players who are 2 or 3 SD does NOT have any bearing on whether we made a mistake or not in our inferences or conclusions since those inferences and conclusions were based on our entire population of players, including those 2 and 3 SD players!  If there were an anomalous number of those players, given our sample size, then the statistical tests we did would likely have come up with the conclusion that there likely IS a skill among these players!

Here is a perfect example of what I am talking about, which we have discussed before.  We have a bunch of coins that we flip a hundred times. One of them came up 70/30.  Surely that must be a biased coin!  After all, that is 4 SD!  No! No! No!  We KNOW with a high degree of certainty, that there is no such thing as a significantly biased coin, therefore we HAVE to assume that the 70/30 occurred by chance, even if we only flipped 10 coins.  Now, as in my caveat, if there is any chance that there ARE some biased coins, then the fact that we had a 4 SD coin in only 10 coins might suggest that we underestimated the chance of biased coins existing, and if there are any biased coins, this one coin is a likely suspect.
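The coin example can be worked through with Bayes' rule. In this sketch the only alternative to a fair coin is one that lands heads 70% of the time, and the prior probabilities are made-up values chosen to show how much the answer depends on them.

```python
from math import comb, sqrt

def z_score(heads: int, flips: int, p: float = 0.5) -> float:
    """How many binomial SDs the observed head count sits from its expectation."""
    return (heads - flips * p) / sqrt(flips * p * (1.0 - p))

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def posterior_biased(heads: int, flips: int, prior_biased: float, biased_p: float) -> float:
    """Probability the coin is the biased kind after seeing the flips."""
    numerator = prior_biased * binomial_pmf(heads, flips, biased_p)
    denominator = numerator + (1.0 - prior_biased) * binomial_pmf(heads, flips, 0.5)
    return numerator / denominator

print(f"z = {z_score(70, 100):.1f}")
for prior in (0.0, 1e-6, 1e-3):  # hypothetical priors for "this coin is biased"
    print(f"prior {prior:g} -> posterior {posterior_biased(70, 100, prior, 0.7):.4f}")
```

With a prior of exactly zero the posterior stays at zero however extreme the result. Once the prior is even slightly above zero, the 70/30 coin becomes the prime suspect, which is the caveat described above.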

But, as I said, if Tango already determines from the data that there does not seem to be any predictive value from batter/pitcher matchups, the fact that one or more of those matchups is “off the charts” means nothing.  We still must assume that those anomalous matchups randomly occurred.  Again, if there were so many anomalous results given our sample sizes of players, Tango would NOT have come up with the conclusion that there is no predictive value to them in the first place!  So don’t bother saying, “Well what if he looked at 100 batter/pitcher matchups, and there were 10 that were 2 SD and 3 that were 3 SD?” If that were the case, it is very likely that the statistical tests he used would have suggested that there WAS predictive value.  The fact that there wasn’t any likely predictive value found means that the number of anomalous results was likely around what you would expect by chance alone.

This is a really important concept which needs to be repeated from time to time - every time someone says, “Yeah, but what about that 4 SD coin (player)?  Surely that could not have occurred by chance!”


#19    Guy      (see all posts) 2009/05/07 (Thu) @ 22:56

I have a question about the “Andy method” of establishing a clutch talent.  He found that the spread of player differences clutch/non-clutch OBP was greater than we’d expect from random chance.  And he controlled for quality of opposing pitchers.  Still, that leaves a lot of things that could vary.  Presumably, some players enjoyed high platoon advantage in their clutch PAs, some had them more at home than on the road, some when they were 27 instead of 31, perhaps even some against pitchers they “own.” All of that introduces additional variance beyond the straight binomial variation, right?  So how do we know we’re seeing a talent rather than variance we couldn’t control for? 

It seems to me one test to run is to do the same exercise, but rather than selecting “clutch” PAs select a similar # of random PAs per hitter.  Then compare the variance in clutch/non-clutch spread to what you see with randomly-generated splits, to see if there’s a difference.  (Maybe Andy did this, and I’ve forgotten.)
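A sketch of that test, under stated assumptions: each hitter is a list of 0/1 on-base outcomes plus a flag per PA, and the random baseline keeps every hitter's outcomes but flags a random set of PA of the same size. The data below is synthetic with no clutch skill built in, and all names are illustrative. This is not a reconstruction of Andy's method.

```python
import random
from statistics import mean, pvariance

def split_diff(outcomes, flags):
    """On-base rate in flagged PA minus on-base rate in the remaining PA."""
    inside = [o for o, f in zip(outcomes, flags) if f]
    outside = [o for o, f in zip(outcomes, flags) if not f]
    return mean(inside) - mean(outside)

def spread_of_splits(hitters):
    """Variance across hitters of their split differences.
    `hitters` is a list of (outcomes, flags) pairs."""
    return pvariance([split_diff(o, f) for o, f in hitters])

def random_relabel(hitters, rng):
    """Keep each hitter's outcomes but flag a random set of PA of the same size."""
    relabeled = []
    for outcomes, flags in hitters:
        shuffled = list(flags)
        rng.shuffle(shuffled)
        relabeled.append((outcomes, shuffled))
    return relabeled

def random_split_spreads(hitters, trials=100, seed=0):
    """Spread of splits under many random relabelings."""
    rng = random.Random(seed)
    return [spread_of_splits(random_relabel(hitters, rng)) for _ in range(trials)]

# Synthetic league: 30 hitters, 300 PA each, the first 30 PA flagged as "clutch".
data_rng = random.Random(1)
hitters = []
for _ in range(30):
    outcomes = [1 if data_rng.random() < 0.34 else 0 for _ in range(300)]
    flags = [i < 30 for i in range(300)]
    hitters.append((outcomes, flags))

actual = spread_of_splits(hitters)
baseline = random_split_spreads(hitters)
share_at_least = mean([1 if b >= actual else 0 for b in baseline])
print(f"actual spread {actual:.5f}; share of random splits at least as wide: {share_at_least:.2f}")
```

On real data the random splits would carry their own mix of platoon, park and age effects, which is the point of using them as the baseline instead of a pure binomial expectation.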


#20    Bjorn      (see all posts) 2009/05/08 (Fri) @ 04:22

I actually have the same problem as Chris Dial. Text moves further and further to the left as comments go on and eventually I start to lose the initial letter(s) of each line.

Until now I’ve just assumed this is because of the ancient browser I use when surfing from work. (Our company still has IE 6.0 as standard.)


#21    MGL      (see all posts) 2009/05/08 (Fri) @ 04:58

Guy, Andy would probably have to answer that or perhaps Tango knows the answer…


#22    Peter Jensen      (see all posts) 2009/05/08 (Fri) @ 06:08

MGL - Re post #18.  I think there is a flaw in your argument.

Now, let’s say that from one or more of these analyses (statistical techniques), it looks like there are not true splits (which is basically what Tango found given the sample sizes of the number of PA between each batter and pitcher), and that the variation in splits we see is likely just noise? (Keep in mind, of course, that we never know for sure - since we are dealing with sample data - but we can make some inferences and draw some conclusions with some degree - never 100% - of confidence that these inferences and conclusions are correct.)

As you correctly mention in the quoted paragraph no statistical test can state with 100% assurance that a population does not have a particular trait present to some degree.  For example, in your coin tossing experiment above, if one coin out of 1000 had a 70% bias toward heads no statistical test on the entire population that I know of would be able to determine that that bias existed.  Every test on that population would conclude that no SIGNIFICANT bias existed within the population and that conclusion would be correct because the bias is present at only a .1% level.  But the one coin IS biased nonetheless and that bias could be discovered with intensive testing of individual coins.

So it is quite possible that even though Tango could not find any evidence of significant bias when testing matchups across the population as a whole, individual examples of matchup bias might still exist.  Having said that, I think that logically they would have to be very rare.  If one pitcher had found a flaw in a batter’s skill set that he could take advantage of (or vice versa) then many pitchers would eventually discover that flaw.  The skill sets of pitchers are just not all that different.  Once many pitchers discovered the batter’s flaw he would soon be out of baseball unless he was able to adjust to correct the flaw.

The only evidence that I find compelling that matchups may be important is that players themselves seem to think they are important.  When you read players discussing their careers there is often mention of a particular pitcher or batter that they either “owned” or that “owned” them.  Whether that was actually due to a particular skill that one player had over another is less important than the player’s belief that that skill was actual, since the belief would be a powerful factor for a self fulfilling prophecy.


#23    Matt Swartz      (see all posts) 2009/05/08 (Fri) @ 22:51

MGL, I definitely see what you’re saying and it’s an important thing to remember when determining whether you can reject a null hypothesis like this.  In other words, if you flip 1000 coins 1000 times each, and use 5% as your cutoff rule, about 50 coins will come up “statistically significant” that aren’t any more likely to come up heads the next time than the other 950 coins.  Clearly, that’s true.  The best method seems to be a comparison of the variances. 

What I’m saying is that probably the least noisy statistic is the percent of time that hitter misses when he swings.  One needs a model in mind when performing analysis like this and calling into question a reasonable hypothesis like Tango found in The Book.  The model I have in mind is that pitchers have different deliveries and release points and hitters thrive on their ability to pick up the ball quickly.  If there’s a skill in hitting the ball that is not based on a certain type of pitcher but a certain individual pitcher, it really seems like it would be ability to see the ball and the least noisy way to pick that up would be swing-and-miss rate.  That would be the test I would do to confirm or reject Tango’s results using wOBA.  If nothing else, I’d bet hitters see different pitches better than others.  This is especially true since LH/RH and GB/FB stuff is coming up significant.  If LH/RH and GB/FB stuff are coming up real, I don’t see why matchups would be limited to that.


#24    MGL      (see all posts) 2009/05/08 (Fri) @ 23:23

Peter, yes of course what you said is correct.  I don’t think that is a flaw in my argument or logic.  As I and others have said many times, if we cannot identify a particular effect, for all practical purposes it does not exist.  That does not mean that it does not exist.  That should be obvious when we are dealing with sample data and when we are dealing with complex interactions especially those involving human beings.  The ONLY question I am addressing is, “Given the research that Tango and others have done, if a certain player is 2 or 3 SD’s against a certain opponent from what we would expect given their and their opponent’s traditional projection, including any platoon effects of course, what is our best guess as to the future performance of that matchup?” And the answer is that we expect a traditional outcome.  That, of course, does NOT mean that 1 in a 1000 (or 10,000 or 500 or whatever) in reality does not produce a non-traditional outcome.  It just means that when we find no true splits in a population, based on an analysis of a large amount of sample data (and of course we are not 100% certain that there are no true splits) it matters not whether we encounter a matchup that is .5 SD or 2.5 SD from what we expect from a traditional matchup model.  The answer is the same.  In both instances, we expect no splits in the future.  The best way to wrap one’s arms around this concept is to think of it in terms of regression towards the mean.  When we find NO true splits, we regress everyone 100% toward the mean.  So whether a player or a matchup is .5 SD or 5 SD, 100% regression yields exactly the same result!

Matt, yes of course there are better ways to detect a matchup phenomenon than just looking at wOBA or OBP or BA.  We know that.  We fully admit the possibility that there are matchup effects that are not captured in Tango’s kind of analysis.  He is the first one to not only admit that, but articulate it in The Book.  In fact, he says that there likely ARE matchup effects, but we can’t tell from looking at traditional stats in 30 or 40 PA.

And, BTW, the fact that players (and managers, et al.) think that something exists should mean almost nothing to us.  They think all kinds of silly things exist that don’t.  I am not saying that matchup effects do not exist, but why should this be any different from all the other goofy things that players think exist that don’t?


#25    Matt Swartz      (see all posts) 2009/05/08 (Fri) @ 23:36

Yeah, managers and players say crazy things exist that don’t.  I know that-- I do watch Sunday Night Baseball sometimes after all!

But I think that the fact that LH/RH splits exist and vary from player to player and GB/FB splits exist and vary from player to player indicates that certain players may very well do better or worse against certain pitches and speeds.  If you’re a LHB who can’t hit an 88 MPH slider from a same-handed pitcher but can hit an 80 MPH slider from a same-handed pitcher, you might have more trouble with the guys that throw it than other guys who may have trouble picking up fastballs from same-handed pitchers and have similar LHP/RHP splits.  Given that, I would suspect matchup data might be more specific within that and cater to seeing deliveries, at least for very unconventional deliveries.  It might be that it takes 200 PA against for this to become visible, but it might not be quite that bad.  I have to imagine some people struggle with Tim Wakefield and some don’t.  I can’t imagine everyone has the same ability to hit side-armers.


#26    Chris Dial      (see all posts) 2009/05/10 (Sun) @ 22:03

The Book Says:
Knowing a player will face a particular opponent, and given the choice between that player’s 1500 PA over the past three years against the rest of the league, or twenty-five PA against that particular opponent, look at the 1500 PA.

I disagree with this.  I manage the Minnesota Twins.  Coming up, the Yankees are coming to face us and Sabathia is going to open the series.  Do I start Mike Redmond or Joe Mauer at catcher?


#27    MGL      (see all posts) 2009/05/11 (Mon) @ 02:47

Are you seriously implying that you think The Book does not want you to take into consideration a platoon adjustment?

The answer to your question is you would play whoever hits better against a left-handed pitcher with the same expected platoon ratio as Sabathia. If it is a tie or thereabouts, then you can choose whomever between Redmond and Mauer has done the best (however you want to measure that) versus Sabathia.  And then of course there is defense, including game calling…


#28    Tangotiger      (see all posts) 2009/05/11 (Mon) @ 10:20

In reply to Chris: You would make the decision based on the player’s true talent level, the platoon split (Redmond is a RHH, Mauer LHH, CC LHP), and if it’s close, use the matchups as the tie-breaker. Catchers, however, have an additional need for rest.

Anyway, Chris is referring to the following: in 31 PA (excludes IBB) that Redmond faced CC, he has gotten on base 17 times (11 singles, 2 doubles, 3 walks, 1 hit batter, with 2 K).  Mauer, in 22 PA, is 4 singles, 1 double, 2 walks, with 8 K.

The question being asked is how much do we consider this information.  I would first say that given that we are resigning ourselves to not starting Mauer one game per week, you rest him against the LHP.  And if you are going to face 2 LHP, then play Redmond against whomever he happens to hit better.

Setting aside the peculiarities of the catcher, Chris’s more general question is how much to weight this information.  Mauer’s Marcel is a wOBA of .373 (career .374), and Redmond is .309 (career .315).  A shorthand for the platoon split is simply to remove 10 points from one guy, and add 10 points to the other guy.  So, Mauer, against the same-handed split is .363, and Redmond against opposite-hand is .319.  That’s still an enormous gulf.  Rest-issues aside, there is no situation in which you’d want Redmond over Mauer.

What if it was closer though? 
- We know that each player has his own talent level, but in order to figure out how good a hitter he is, we need to add 250 PA of league average rates.
- We know that platoon splits by hand are real, but in order to figure out the true skill, you need to add at least 1000 PA of league-average splits just to figure out the player’s true handedness-platoon split. 
- We know that clutch splits are real, but for that true skill, you need to add at least 5000 PA of league-average splits to get to the bottom of the guy’s true clutch split.

In the continuum of “how much noise does sample data have”, the “who owns who” falls somewhere between the first and third in the list above.
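To put the three ballast figures in the list above on a common scale, here is a minimal sketch that converts each one into the share of a small sample that gets regressed away. The function name and the 30 PA sample size are illustrative assumptions; only the 250, 1000 and 5000 PA figures come from the list.

```python
def regression_fraction(sample_pa, ballast_pa):
    """Share of the observed deviation that is regressed toward the mean."""
    return ballast_pa / (sample_pa + ballast_pa)

# Ballast sizes taken from the list above; 30 PA is an assumed sample size.
BALLAST_PA = {
    "overall talent": 250,
    "handedness platoon split": 1000,
    "clutch split": 5000,
}

for skill, ballast in BALLAST_PA.items():
    share = regression_fraction(30, ballast)
    print(f"{skill}: regress {share:.1%} toward the mean at 30 PA")
```

Whatever ballast "who owns who" deserves, a 30 PA sample keeps only a small slice of what was observed.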

Even if you think that the “who owns who” is as real as a player’s overall talent level, you’d still need to add 250 PA of league average stats.  And therefore, you would need to regress 90% toward the mean if you had 25-30 PA of matchup data.  So if Redmond has a .500 wOBA against CC, at the bare minimum, you need to regress that 90% toward his .319 (with the platoon advantage), or .337.  And if Mauer has a .300 against CC, you regress 90% toward his .363 (against platoon advantage), or .357.

So, you’d need to have an extraordinary difference in split to even possibly consider starting Redmond over Mauer (all other things equal).
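The arithmetic above can be sketched as follows. The helper names are hypothetical, and the 28 PA sample is an assumed midpoint of the 25-30 PA range (it gives roughly the 90% regression used in the text); the wOBA figures, the 10-point platoon shorthand and the 250 PA ballast are the ones quoted above.

```python
BALLAST_PA = 250        # league-average PA added, as in the text
PLATOON_SHIFT = 0.010   # the 10-point platoon shorthand from the text

def platoon_adjusted(woba, has_platoon_advantage):
    """Shift overall wOBA up or down by the platoon shorthand."""
    return woba + PLATOON_SHIFT if has_platoon_advantage else woba - PLATOON_SHIFT

def regress(observed, baseline, sample_pa, ballast_pa=BALLAST_PA):
    """Blend the matchup sample with the baseline, weighting by sample size."""
    weight = sample_pa / (sample_pa + ballast_pa)
    return baseline + weight * (observed - baseline)

redmond_base = platoon_adjusted(0.309, has_platoon_advantage=True)   # RHH vs LHP
mauer_base = platoon_adjusted(0.373, has_platoon_advantage=False)    # LHH vs LHP

# Hypothetical matchup lines from the text: .500 for Redmond, .300 for Mauer.
redmond_est = regress(0.500, redmond_base, sample_pa=28)
mauer_est = regress(0.300, mauer_base, sample_pa=28)

print(f"Redmond vs CC: {redmond_est:.3f}")
print(f"Mauer vs CC:   {mauer_est:.3f}")
```

Even when the matchup sample is treated as generously as overall talent, Mauer's estimate stays about 20 points ahead.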

***

Here is also a relevant article from Dan Fox:
http://www.hardballtimes.com/main/article/tony-larussa-and-the-search-for-significance/


#29    Chris Dial      (see all posts) 2009/05/11 (Mon) @ 10:22

Of course I don’t think platoon splits are ignored.  But see your last part?  “If it is a tie or thereabouts..the matchup counts”.  That’s what I am saying - where is that point?  How big is “thereabouts”?

If a hitter is 20-for-20 against a pitcher, are you still attributing that to “luck” or randomness?  Okay, what about 15 for 18?  This is a better view of what can be useful and instructional about the utility of matchup data.

Mike Redmond is 21 for 48 against Tom Glavine.  Redmond would get the start over another RHB catcher even over a catcher with a better platoon split (Redmond has an 808 OPS vs LHP, so pretend catcher 2 has an 850 platoon split, but is a mere 3 for 10 vs Glavine).  Redmond’s got to be the choice here, right?  Even if the other catcher is routinely a better hitter (Redmond overall isn’t good). 

Perhaps the answer is still “That’s not enough to give Redmond the start over Catcher 2”, but *somewhere* the balance tips.  I think it is meaningful - there is something in a personal matchup where a hitter’s focal point, pitcher’s release point, ability to track for the hitter can be slightly better on a given pitcher/pitchers.

Redmond in particular is a curious case http://www.bb-ref.com/play-index/shareit/zNqG .  *Maybe* the segments can be larger (GB/FB, sinker/slider, vs changeup), but they may be more related to arm slot and release point - which is where I’d look first, as early ball-tracking/pitch identification is more important, I think.

Remember, Mike Redmond overall is only a .808 OPS against LHPs.  What’s the difference in the pitchers?  Is there one?  Is it all randomness?


#30    Tangotiger      (see all posts) 2009/05/11 (Mon) @ 11:11

I’m going to re-quote myself:

You see, we’re not saying that it doesn’t matter which pitcher is facing which hitter. It most certainly matters. Every person is different, and there’s no reason to think that two overall equally talented pitchers, but talented for different reasons, will necessarily have the same success level against the same hitter.

However, you can’t tell by looking at the numbers from twenty-five or sixty PA. There is simply too much noise masking the truth under those numbers. You can’t say Edgar owns la famiglia Cormier, or that Mussina owns Varitek because, well, look at the numbers. The numbers don’t support your statement, because of the small sample sizes.

For you to say that a certain hitter owns a certain pitcher, you have to go beyond the numbers. You have to look at the very specific traits of these players. We’ll look at a few traits in a second, but as noted earlier, there are many different kinds of traits to consider. When looking at batter/pitcher confrontations, scouting information becomes a critical component to the analysis.

So, I have to say that Chris and I agree in principle on the issue.  And the only question is the degree that the performance stats matter.

(This is the same issue with DIPS that I have with others, as well as clutch.  Yes, all these things are real.  The question is how much signal is there in all that noise.)

And if we can go beyond the performance stats (how does Redmond hit pitchers with the repertoire, if not quality, of CC?), we’re still in agreement, and again, it’s a matter of degree.

I’ve already shown in The Book that you have to go beyond the numbers, that the guys with the most extreme batter/pitcher splits simply did not keep it up.  So, we need to get beyond that.  If Chris wants to offer up the reason that Redmond is particularly suited to facing CC, without quoting his 31 PA, then I’m ready to listen.  But, I definitely do not need to know how well Redmond has done in his 31 PA as a way to explain why those 31 PA are relevant.  They barely cause a dent.


#31    MGL      (see all posts) 2009/05/11 (Mon) @ 13:33

I apologize for the “platoon” confusion Chris.  My bad.  I think Tango is being too solicitous.  I’ll repeat what I have already said several times.  No predictive value means no predictive value!  That is the same thing as saying 100% regression towards the mean. Which means that 15 for 15 is the same as 0 for 15.

Now, that being said, since we are all conceding that there probably IS some predictive value to these matchups, then we can say that if the career numbers are close, go with the best matchup guy.  Does that mean that Tango’s statement in the book is “wrong” or that one can “disagree” with it? I don’t think so.  He clearly states his position which is that he thinks that there IS likely some predictive value to matchups.

The other thing (actually THE thing) is that if you think that a Redmond versus Sabathia type of performance means that you should go with Redmond (or whomever) over Mauer (or whoever) even though the career numbers say otherwise, there is a simple way to investigate if that may be true or correct.  Simply look at batters who have owned pitchers over some 20 or 30 PA, as Redmond has owned Sabathia, and pitchers who have owned batters over some 20 or 30 PA, as Sabathia has with Mauer, and see how each group does in the future.  Isn’t that the same thing as asking whether you would use Redmond over Mauer?  And if there is going to be some predictive value in 17 for 31, or 4 for 22, there is also going to be predictive value in 15 for 31 or 14 for 31 or 5 for 22 - predictive value doesn’t just show up all of a sudden at some high number.

Oh, wait a minute, Tango already did that!  And he found no predictive value. So why are we even discussing this as if no one has done the research?

Chris, if you “disagree”, then show us some friggin’ research in which matchups have some predictive value!  If it ain’t happenin’ in the past, it ain’t happenin’ in the future, just because you THINK it should happen, at least as far as we can tell…
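The "owners, then see what they do next" check described in this comment can be sketched as a toy simulation. It is not the study from The Book: it assumes every matchup has the same true on-base rate (.330, an arbitrary choice) and no matchup skill at all, and the function name, threshold and seed are likewise hypothetical. A real check would swap in historical batter/pitcher records.

```python
import random

def simulate_owners(true_obp=0.33, n_matchups=20000, pa_each=30,
                    own_threshold=15, seed=0):
    """Pick batters who 'owned' a pitcher in a first sample and
    measure how the same matchups fare in a second, equal-sized sample."""
    rng = random.Random(seed)
    owners = past_on_base = future_on_base = 0
    for _ in range(n_matchups):
        # Both samples are drawn from the same rate: zero matchup skill.
        past = sum(rng.random() < true_obp for _ in range(pa_each))
        future = sum(rng.random() < true_obp for _ in range(pa_each))
        if past >= own_threshold:          # e.g. 15+ times on base in 30 PA
            owners += 1
            past_on_base += past
            future_on_base += future
    total_pa = owners * pa_each
    return owners, past_on_base / total_pa, future_on_base / total_pa

owners, past_rate, future_rate = simulate_owners()
print(f"{owners} 'owners': {past_rate:.3f} before, {future_rate:.3f} after")
```

Under these assumptions, pure chance produces plenty of "owners" and their later performance falls back to the underlying rate. If real matchup skill were large, the historical "after" number would stay well above the baseline.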


#32    MGL      (see all posts) 2009/05/11 (Mon) @ 20:02

Without belaboring the point TOO much, if it is true that Redmond is a better bet (going forward) than Mauer, because of their prior histories, then it would be true that all players in history with similar number of PA and success (or the equivalent in standard deviations) as Redmond and Mauer would also show a similar effect going forward.  Well, Tango found that NOT to be the case, therefore there are absolutely ZERO grounds (evidence) to assert that Redmond is a better bet than Mauer against Sabathia going forward.  It all gets back to my original point.  Dial is once again asserting, “Well what about these guys?  Look at their anomalous records against so-and-so pitchers.  Surely THAT can’t be by chance!”

Yes it can.  If Tango finds zero predictive value, hence 100% regression, then yes, those really, really, really anomalous historical results ARE, by definition, occurring by chance, at least as far as we can tell, and as far as we can predict.  Again, if you want to say that it should be 98% or 95% regression (per X number of PA) rather than 100%, because Tango’s methods were not sensitive enough and therefore to some extent he made a small Type II error, that is fine by me. That is why I don’t have any problem with using history as a tiebreaker.  However, any more than a few points of wOBA, no, no, and NO!  We have no evidence that batter/pitcher history has any predictive value.  If you want to “disagree” with Tango’s statement in The Book, you are free to do that I suppose.  However, with zero evidence to support your disagreement (and lots of evidence to contradict it), it is like me “disagreeing” with the fact that Barry Bonds was a great hitter or Sabathia is a very good pitcher.  I mean where is the argument here? How can I argue with someone who disagrees with a proposition but there is no substance to that disagreement, in this case in the form of empirical evidence?


#33    Chris Dial      (see all posts) 2009/05/11 (Mon) @ 23:20

Perhaps I have some reading comprehension issues:
MGL

I’ll repeat what I have already said several times.  No predictive value means no predictive value!
...
Now, that being said, since we are all conceding that there probably IS some predictive value to these matchups,
...
He clearly states his position which is that he thinks that there IS likely some predictive value to matchups.
...
Oh, wait a minute, Tango already did that!  And he found no predictive value.

Which is it?


#34    MGL      (see all posts) 2009/05/12 (Tue) @ 04:05

Which is it?  You choose…


#35    Chris Dial      (see all posts) 2009/06/03 (Wed) @ 00:19

Walt Davis made this post on another topic at BTF, but it seems to answer the question I was asking (and coming up dry):

You know, there is the whole discipline known as statistics and, you know, it might have occurred to one or two statisticians that this is a question worth answering.

He’s stumbled into “the difference between two proportions.” For quick and dirty, we’ll assume they’re independent—you could certainly argue they’re not since it’s Mauer in both samples but we’re only assuming that the old PAs are independent of the new PAs conditional on it being Mauer. You could also argue neither of these is a sample since we’re looking at the entire population of Mauer PAs but then I’d have to do some hand-waving about super-populations and nobody wants that.

The difference is p1-p2 and its standard error is:

sqrt [ p1(1-p1)/n1 + p2(1-p2)/n2]

and here we get a z over 5 so yes it’s “significant” except ....

The issue here is that this is a post-hoc test. If you’d hypothesized beforehand that Mauer would come back with a higher HR rate and now wanted to test your conclusion, this would be a legit test. But the chances that at least one player in MLB would see a huge spike in HR rate in May is a lot higher.

Still, it’s over a 5-sigma variation which does suggest that this is a “process out of control” but all that tells us is that the “failure” rate (where each HR is a “failure") is higher than it used to be. The question then becomes what’s the new rate. It will be several hundred PA before we’ll have a good estimate of that but I’ll just point out that Adrian Gonzalez had 11 HR in May in 5 fewer PA so there’s no reason to think Mauer’s new level of talent is any higher than Gonzalez’s (who, to this point, has been a 25-35 HR guy though obviously he’s on a torrid pace right now).
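The "difference between two proportions" test Walt describes can be sketched as below, using the standard error formula he gives. The function name is hypothetical and the counts are made-up placeholders, not Mauer's actual home run totals.

```python
import math

def two_proportion_z(successes1, n1, successes2, n2):
    """z-score for p1 - p2 using the unpooled standard error
    sqrt[p1(1-p1)/n1 + p2(1-p2)/n2], treating the samples as independent."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error

# Placeholder inputs: 10 HR in 100 recent PA versus 20 HR in 1000 earlier PA.
z = two_proportion_z(10, 100, 20, 1000)
print(f"z = {z:.2f}")
```

As Walt points out, a large z from a comparison chosen after the fact overstates the evidence, because some player somewhere will always show a spike.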

So in 20 ABs, what is 5-sigma?  18 hits?


#36    MGL      (see all posts) 2009/06/03 (Wed) @ 01:05

5 sigma is around 51.5% for 20 AB.  If the mean BA were 5.5 hits, then 15.8 would be 5 sigma above the mean.  Hopefully I did that correctly.
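For anyone who wants to redo that arithmetic, here is a minimal sketch assuming hits in 20 AB follow a plain binomial distribution with a mean of 5.5 hits (a .275 hitter). The function name is hypothetical. Under that assumption the 5-sigma mark lands near 15.5 hits, close to the 15.8 quoted above; the small gap presumably reflects a slightly different assumed spread.

```python
import math

def sigma_threshold(mean_hits, at_bats, n_sigma=5):
    """Hit total that sits n_sigma binomial standard deviations above the mean."""
    p = mean_hits / at_bats
    sd_hits = math.sqrt(at_bats * p * (1 - p))
    return mean_hits + n_sigma * sd_hits

threshold = sigma_threshold(5.5, 20)
print(f"5 sigma above the mean: {threshold:.1f} hits in 20 AB")
```

With only 20 AB the normal approximation behind "sigma" talk is rough, so the threshold is best read as a ballpark figure.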


#37    Chris Dial      (see all posts) 2009/06/03 (Wed) @ 02:01

5 sigma is around 51.5% for 20 AB.  If the mean BA were 5.5 hits, then 15.8 would be 5 sigma above the mean.  Hopefully I did that correctly.

Is 5-sigma actually “valid” as Walt says?  So the “tipping point” for pitcher ownership is (roughly) .750 over 20+ ABs?

