THE BOOK--Playing The Percentages In Baseball


Friday, June 10, 2011

How specific can we get in determining the true mean of a particular matchup?

By Tangotiger, 09:58 AM

Every matchup has a specific and true mean.  God herself would establish that specific and true mean at that specific point in time-space with zero uncertainty.  Say it’s Pujols at Busch on July 3, 2011, against Doc; God knows that Pujols can’t handle an outside cutter well, and that Doc is going to telegraph the next pitch as an outside cutter.  God says that Pujols will make contact with that pitch 23% of the time (if allowed to replay that point in time-space an infinite number of times), with zero uncertainty.

But what about humans?  If Pujols v Doc has an expected contact rate of 70% any time Pujols swings (with a certain level of uncertainty, say 10%), then how much better a mean estimate can we get in more specific situations (as we find more data about Pujols and/or Doc and/or Busch and/or the weather), and how much more can we reduce the uncertainty level?


#1    MGL      (see all posts) 2011/06/10 (Fri) @ 10:27

Are we talking about before the pitch is thrown and we don’t know what pitch and where?  Or we know the exact parameters of the pitch?  It makes a huge difference.


#2    Lee      (see all posts) 2011/06/10 (Fri) @ 10:47

In the real/human scenario, I assume you don’t know the exact parameters of the pitch ahead of time. But what you might have is a deadly accurate set of possible outcomes for the pitch itself, and how Pujols will perform on each of those possible locations. So, a set of weighted outcomes which you could sum to get a true mean.
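As a rough sketch of the weighted sum described above: if each possible pitch has a probability of being thrown and a contact rate given a swing, the overall mean is the probability-weighted sum of the contact rates. The pitch labels and all numbers below are made up for illustration (they are chosen only to land near the 70% figure in the post), and `expected_contact_rate` is a hypothetical helper name.

```python
# Hypothetical pitch mix for one matchup. Each entry is:
# (pitch description, probability the pitch is thrown, contact rate on a swing).
# All numbers are invented for illustration; none come from real data.
pitch_outcomes = [
    ("fastball in", 0.50, 0.80),
    ("cutter away", 0.30, 0.55),
    ("curve low", 0.20, 0.65),
]


def expected_contact_rate(outcomes):
    """Return the probability-weighted mean contact rate over all pitches."""
    total_weight = sum(weight for _, weight, _ in outcomes)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("pitch probabilities must sum to 1")
    return sum(weight * contact for _, weight, contact in outcomes)


overall = expected_contact_rate(pitch_outcomes)
print(f"expected contact rate: {overall:.3f}")
```

The hard part, of course, is not the sum but knowing the weights and the per-pitch contact rates accurately in the first place.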


#3    Tangotiger      (see all posts) 2011/06/10 (Fri) @ 10:57

I agree it makes an enormous amount of difference.

I’m open to establishing the reference point in time-space however one wishes to discuss it.  Presumably, the latest point we can discuss this is when Pujols’ brain makes the last decision to swing or not, the “point of no return”.

So, at the point of no return, Pujols’ brain is processing the speed, trajectory, spin, and location of the pitch (and possibly location of fielders?), and his body and bat are in motion in response to that stimulus.

If Pujols can humanly figure out the property of the ball at that point of no return, and Pujols himself can learn his body’s expected behavioural response (plane of swing, break of wrist, speed of swing) at that same point of no return, then what is the best mean p we can get at his chance of making contact with that pitch, and how certain can we be?


#4    Guy      (see all posts) 2011/06/10 (Fri) @ 11:19

I’d say the reference point should be a point in time at which some baseball actor could use the information to his advantage.  For a hitter, that might be on a pitch-by-pitch basis, up to the point of no return.  So if you think we are going to learn new things that will allow Pujols to identify a pitch and react to it better than he does now, make that case.  For a manager, it’s basically the start of a PA (except for decisions like sac bunts and SBAs), and mainly when writing up the lineup.  For a GM, earlier still.


#5    Mike      (see all posts) 2011/06/10 (Fri) @ 11:44

I will agree with the basic proposal from Guy/4.

However, there is another viewpoint where this discussion has merit, that is, that of the outside analyst who wants to look backward and assign value to what happened in the past.  For example, if we want to divide credit between pitcher, batter, and fielders, how far we can go toward determining the expected outcome of a past matchup has a lot of bearing on that split.


#6    mettle      (see all posts) 2011/06/10 (Fri) @ 11:45

Point of info: If you’re talking about the point-of-no-return and you have the parameters of the ball, then pitcher doesn’t matter.


#7    DavidS      (see all posts) 2011/06/10 (Fri) @ 11:48

@4/Guy

I agree that the interesting discoveries are those that affect decisions.  I’m willing to believe that Pujols has a pretty good idea of what’s going to happen after he has started his swing (first of all, assuming that he is even swinging substantially reduces the number of potential outcomes).  I don’t really see what would be gained from that.  Pujols could use information up until he has to decide to swing but as you point out, the manager (and especially the GM) need information much earlier.  However, it’s possible that something for which information is only exactly known at the time of the at-bat can be estimated in aggregate some time before.  For example, platoon splits. (And in such cases, the opposing team may also benefit from this knowledge.)


#8    Zack      (see all posts) 2011/06/10 (Fri) @ 11:49

To Tango’s second paragraph, would the minimum uncertainty be the sum of the (co)variances of our measures of his “expected behavioral response”?  But even if you could accurately quantify these factors, I don’t know how you could quantify decision making better than guessing from observed results.

As an absolute minimum for variance (and an argument against any remaining determinists), I was trying to find a quantity for the variability in nerve firing.  I haven’t been able to find any data on variability within a single individual, only across individuals, though.  The closest I could find was this, which was not helpful.


#9    DavidS      (see all posts) 2011/06/10 (Fri) @ 11:56

@5/Mike

I like this idea.  Pitch F/X could be (and maybe already is) used to help quite a bit here.  If the pitcher throws a belt-high fastball at 85 mph, but somehow the batter misses it, we could probably assign all of that negative value to the batter and not credit the pitcher.  The expected result of that pitch was quite favorable (for the offense) and the batter blew it.


#10    Tangotiger      (see all posts) 2011/06/10 (Fri) @ 12:28

...then pitcher doesn’t matter.

Won’t it?  When Pujols makes the decision to swing (when, where, how), he’s doing it not (only) based on the pitch recognition at the point of no return, but an INFERENCE of that pitch based on who the pitcher is, what count he’s in, and what’s been recently thrown to him.

There has to be a difference if he’s looking for fastball in, and he gets fastball in, compared to if he’s looking for curve away, finally interprets the pitch at 0.30 seconds prior to reaction as a fastball in, and he gets fastball in.

What would be lovely is to get batters to tell us what they were looking for on each pitch.


#11    Tangotiger      (see all posts) 2011/06/10 (Fri) @ 12:30

0.30 seconds prior to reaching home plate I meant.


#12          (see all posts) 2011/06/10 (Fri) @ 12:46

@9/DavidS

I like the principle of it, but I don’t think all the value should go to the batter or the pitcher in that scenario.  The only way we could do that is if we knew the batter was looking for an 85 mph belt-high fastball down the middle.  If the batter is looking for a 60 mph curve ball low and away, or is geared up for a 100 mph fastball, then the pitcher should get credit for outgaming the batter.

I like the idea of getting model results based on pitch characteristics, even though it would have shortcomings. Doing so should lead to some sort of order. Better throwers will rise to the top.  Those with better stuff than results, or vice versa, can attribute it to pitch selection, defense, unexpected hitting, or luck.  There is information on what the pitch is, information on what the outcome is, but other pieces are missing.

I don’t think it’s possible, maybe even for a deity, to ever know the true mean for something on the pitch-by-pitch level unless free thought is removed from the equation.  Maybe if there was a way to record the mental process this would be possible.


#13    Guy      (see all posts) 2011/06/10 (Fri) @ 12:47

I agree with Mike #5, that there are potential analytic insights ahead that will help us explain the game retrospectively.  I was just suggesting a way to focus this discussion, at least initially.  (Although personally I don’t think it will ever make sense to measure value in the way DavidS/9 suggests.)

Small point:  I don’t think any knowledge can be applied by a batter after the pitch is thrown. From what I’ve read, batters begin their swing before their conscious mind even realizes they are swinging.  I think hitters “prime” themselves with an approach for each pitch—a strong presumption to swing or not swing, a focus on certain location(s)—which varies based on the count, out/base, identity of the pitcher, etc.  But that all has to happen before the pitcher releases the ball. There is no “deciding” after that point, just reacting.


#14    Tangotiger      (see all posts) 2011/06/10 (Fri) @ 12:55

I disagree with Mike/5 about the retrospective.  You’ll be able to establish what, but not why, and certainly not be able to assign it on a player-PA basis.

The big value (with regards to individual players) will be on inference and nothing else, because that’s what data at the player level is worth.  You are going to see what Pujols’ swing plane is for each pitch, how fast his bat travels, and how his wrists break, and you’ll see this for all kinds of pitches, counts, and pitchers.  Knowing that information will allow us to INFER a profile for Pujols.

The value here is the same value that the scout has: to infer something about Pujols.  Something real, and something predictive.

Retrospective, for the sake of retrospective, is for historians, not analysts.  (There’s nothing wrong with being both mind you.) Historians tell us what, but analysts tell us why.


#15    Peter Jensen      (see all posts) 2011/06/10 (Fri) @ 13:04

Guy - I don’t know how one determines what is “deciding” and what is “reacting”.  I am sure that a batter is still processing sensory information after the pitcher releases the ball.  Whether that processing is going on at a conscious level or a subconscious one is not a particularly important distinction, at least for me.  A batter is not just deciding before it is thrown what type of pitch is coming and where it will likely cross the plate, and then swinging in that location and hoping that it comes true.  How in your scenario do you account for successfully checked swings?


#16    Guy      (see all posts) 2011/06/10 (Fri) @ 13:29

As I’ve said before, I am much, much more pessimistic than Mike F that future research will yield actionable information of broad significance.  For a few players, maybe, but I don’t expect big gains in forecasting accuracy on a large scale.  I’ll just repeat my reasons for skepticism from the other thread, for anyone new to the discussion:

1) A lot of people, both inside and outside the game, have been looking for such advantages for a long time.  Harder-to-discover factors tend strongly to also be small factors (in impact). 

2) Competitive pressures in the game have already weeded out most of the interesting factors.  Good hitters who can’t hit a curve, or can’t hit the outside pitch, don’t survive.  Good pitchers whose slider gets crushed by RHH don’t throw the pitch (or don’t make the majors).  Every interesting difference is also a potential weakness, in a game that is extremely good at finding and exploiting them.

3) Strong relationships between skills would be revealed in surprising predictive power from small samples.  For example, hitter-pitcher matchups would have a lot of predictive power.  Even if we didn’t yet understand exactly why some pitchers “owned” certain hitters, the relationships would be predictive.  And yet, we know that these patterns usually prove to have no predictive power.

4) Many of the discoveries one can imagine will have very limited practical application.  This isn’t football or hockey where you have lots of pieces you can move onto and off the chessboard at will.  You set your lineup from a limited pool of hitters, and beyond making a few substitutions and a few tactical decisions, that’s it.  If some hitters just can’t hit in day games, well, their backup is usually so much weaker at the plate that it won’t matter.  (The best hope for practical applications by far, I think, is helping some players to improve their performance.  I believe there’s potential there.)


#17    DavidS      (see all posts) 2011/06/10 (Fri) @ 13:56

@12

You are correct, I was oversimplifying.  The deception could be something that is a real skill (and perhaps the catcher deserves some credit for calling it) or it could be luck.  Pitch F/X can already tell us the aggregate results from a specific pitch type so this was a suggestion to move the FIP analysis one step further back in the sequence of events.  We know that a pitcher doesn’t have much control on BABIP.  Like Tango suggests in 10, how much does it matter who threw it once the characteristics of the pitch are known?  Deception seems to be the only other variable that can be manipulated there (the count and base-out situation are fixed).


#18    Tangotiger      (see all posts) 2011/06/10 (Fri) @ 14:10

As for the specific question (but I use OBP instead), if all I knew was Pujols and Doc’s seasonal stats, I’d get say .400 +/- .030 or something as my expectation of p.

Or, if we write without decimals:
p(OBP) = 40% +/- 3%

That is ALREADY a pretty tight range.  I’m 95% sure his OBP is going to be 34% to 46%.

Is there a particular variable that I’m not considering that is going to change that range?  Certainly possible.  Maybe I’ll get this:
p(OBP) = 38% +/- 2%

It’s just not going to make much difference.
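To put numbers on the two estimates above, here is a minimal sketch that turns each "mean +/- one SD" into a rough 95% range by going two SDs either side of the mean. That shortcut assumes the uncertainty is roughly normal, and `two_sd_range` is a hypothetical helper name, not anything from the post.

```python
def two_sd_range(mean, sd):
    """Rough 95% range: two standard deviations either side of the mean.

    Assumes an approximately normal distribution of the uncertainty.
    """
    return (mean - 2 * sd, mean + 2 * sd)


# The two estimates quoted in the comment, expressed as proportions.
baseline = two_sd_range(0.40, 0.03)  # using seasonal stats only
refined = two_sd_range(0.38, 0.02)   # after adding some extra variable

for label, (low, high) in (("baseline", baseline), ("refined", refined)):
    print(f"{label}: {low:.0%} to {high:.0%} (width {high - low:.0%})")
```

Under these assumptions the baseline range runs from about 34% to 46% and the refined one from about 34% to 42%, so the extra variable trims only a few points off a range that was already narrow.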

As Guy noted:

3) Strong relationships between skills would be revealed in surprising predictive power from small samples.  For example, hitter-pitcher matchups would have a lot of predictive power.  Even if we didn’t yet understand exactly why some pitchers “owned” certain hitters, the relationships would be predictive.  And yet, we know that these patterns usually prove to have no predictive power.

That is, we already do so well with predicting batter-pitcher matchup using just basic performance data, handedness, and GB/FB tendencies, that anything else will be so very slight.  (A promising one would be whether a pitcher is FB-Curve or FB-Change or FB-Slider, but again, I don’t expect much from that, probably on the order of FB/GB tendency.)

Otherwise, if there was something big missing, we wouldn’t have such strong predictive ability already.

The learning, if any, will be confined to peculiar cases, guys with reverse platoon splits or close to it (like Ichiro, but then Ichiro is a study unto himself anyway), or guys like Wakefield.  Guys with drastic changes in performance levels.  It’s at the extreme cases that anything new will come, with respect to batter-pitcher matchups.


#19    Tangotiger      (see all posts) 2011/06/10 (Fri) @ 14:11

IMO anyway.


#20          (see all posts) 2011/06/10 (Fri) @ 15:35

Or, if we write without decimals:
p(OBP) = 40% +/- 3%

That is ALREADY a pretty tight range.  I’m 95% sure his OBP is going to be 34% to 46%.

Wait, run that by me again? The only way this makes sense (if I’m remembering my stats right, but it’s been some years, so I’m probably not) is if you’re saying that the +/- 3% is the standard deviation, the 40% is a mean, and you have a normal distribution (that’s where the 95% for 2 st dev comes from). But it’s not a normal distribution, is it? Usually I see the number after a +/- as a confidence interval with a certain alpha. Shouldn’t such an interval be slanted more toward the centre? To illustrate my point, let’s say our estimate is 3%. If we’re doing a CI, shouldn’t the +% here at a given certainty level be greater than the -%?
Like I said, I’m rusty on stats and probably totally off-base here.


#21    Tangotiger      (see all posts) 2011/06/10 (Fri) @ 15:39

It’s going to be pretty close as an estimate, and the +/- is one SD.
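One way to check both points (the symmetric range is close at 40%, but would be lopsided at 3%) is to build the interval on the log-odds scale and transform it back. This is a swapped-in illustration, not the method used in the thread: the logit-scale SD comes from the delta method, the 1% SD attached to the 3% estimate is an assumption, and `logit_interval` is a hypothetical helper name.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-x))


def logit_interval(mean, sd, z=2.0):
    """Interval of +/- z SDs built on the log-odds scale.

    The SD on the log-odds scale is approximated by the delta method:
    sd / (mean * (1 - mean)).
    """
    sd_logit = sd / (mean * (1 - mean))
    center = logit(mean)
    return (inv_logit(center - z * sd_logit), inv_logit(center + z * sd_logit))


# Estimate from the thread: 40% with an SD of 3%.
mid_low, mid_high = logit_interval(0.40, 0.03)

# Extreme case from comment #20: 3%, with an ASSUMED SD of 1%.
tail_low, tail_high = logit_interval(0.03, 0.01)

print(f"40% case: {mid_low:.3f} to {mid_high:.3f}")
print(f" 3% case: {tail_low:.3f} to {tail_high:.3f}")
```

Under these assumptions the 40% case comes out within a fraction of a point of the plain 34% to 46% range, while the 3% case stretches noticeably further upward than downward, which matches the intuition in #20 without changing the conclusion for an OBP near 40%.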

