THE BOOK--Playing The Percentages In Baseball


Wednesday, March 31, 2010

A very good article by Sky A. on batter tendencies (ability to hit the fastball, curve, etc.)

By , 10:21 PM

Here is the URL of the article:

http://baseballanalysts.com/archives/behind_the_scor/

And here is the concluding paragraph:

Overall, this has been a somewhat sprawling piece on a tricky topic, so I’ll sum up. Looking at the evidence, it appears that when trying to identify a hitter’s strengths and weaknesses against particular pitches, looking at how he actually did against those pitches is not a particularly useful measure. More indicative is the frequency with which a batter was thrown each pitch. The better a hitter is against a particular pitch, the less often he will see it. This entire issue of selection bias is an important one to consider, especially when doing pitch f/x analysis or other pitch-by-pitch studies.

Basically, he starts out looking at the range (SD) of results for all batters against all different kinds of pitches, using a runs per pitch metric. He wants to see who is a good fastball hitter, who is not, etc., or at least what is the spread of talent among batters against each type of pitch.  If some batters are good fastball hitters, others are good curveball hitters, etc., we should see a spread in talent with respect to each of those pitches, right?  Uh, wrong…

First, here is how he found the spread of talent:  He used the wonderful method which Tango has been touting for years.  Take a sample of results, compute the variance (square of the SD) among players (say, weighted by opportunities per player) and then compare that to the expected variance by chance given the number of elements in your sample (number of batters), and the underlying sample size (number of pitches) for each element. The difference is the spread of talent, more or less (there are sometimes other small sources of variance besides talent and chance).
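
To make the method concrete, here is a minimal Python sketch of the "observed variance minus chance variance" calculation. It is an illustration under stated assumptions, not Sky's actual code: the function name, the `per_pitch_sd` input (an assumed SD for the run value of a single pitch) and the sample numbers are all hypothetical.

```python
def talent_variance(rates, pitches, per_pitch_sd):
    """Estimate the spread of talent as observed variance minus chance variance.

    rates        -- observed runs per pitch for each batter
    pitches      -- number of pitches seen by each batter (used as weights)
    per_pitch_sd -- assumed SD of the run value of one pitch (hypothetical input)
    """
    total = sum(pitches)
    mean = sum(r * n for r, n in zip(rates, pitches)) / total
    # Variance among batters, weighted by opportunities.
    observed = sum(n * (r - mean) ** 2 for r, n in zip(rates, pitches)) / total
    # Variance expected from chance alone: per_pitch_sd**2 / n for each batter,
    # averaged with the same weights.
    chance = sum(n * (per_pitch_sd ** 2 / n) for n in pitches) / total
    return observed - chance


# Hypothetical numbers, for illustration only.
estimate = talent_variance([0.01, -0.01], [100, 100], per_pitch_sd=0.05)
print(estimate)
```

If the result is close to zero (or negative), the observed spread is about what chance alone would produce, which is the situation the post describes for results by pitch type.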

What Sky found, not surprisingly, is that there was little or no spread of talent when looking at runs allowed per pitch or per 100 pitches (or whatever) for each type of pitch.  Of course he explains why, which is the same reason I wrote, “Uh, wrong...” above.  Those of you familiar with game theory and its implications already know why.  If in fact there IS a spread of talent among batters with respect to pitch type, according to the tenets of game theory pitchers will throw fewer and fewer of the pitches that a hitter hits well and more and more of the pitches that a hitter does not hit well (to that hitter of course) until the results of all those pitches are equal!  And that is why Sky finds virtually no spread of talent when he looks at the results of each kind of pitch.  There is a spread in talent but it “disappears” when pitchers adjust their pitch frequencies, as it should (if pitchers and batters are acting reasonably optimally).  The reason that this happens, of course, is that the more a batter expects a certain kind of pitch, by virtue of how often it is thrown to him, the better he will do against that pitch, regardless of his “talent” at hitting that pitch.  If a batter is a great fastball hitter but not such a great curveball hitter, and he hardly ever sees the fastball, he won’t be so good at hitting it anymore.  Similarly, if he sees lots of curveballs, he is going to be better at hitting them.

Keep in mind that this “shifting in the quality of the results” against each type of pitch, according to how often (percent per pitch) the batter sees each type of pitch, is not because a batter gets used to certain pitches (or not) and thus improves his ability against that pitch (or not), although that is certainly true to some small extent.  It is the anticipation of a certain pitch at a certain count that drastically affects a batter’s results on that pitch.

Anyway, I spoiled Sky’s great article to some extent, but I really wanted to emphasize this point about game theory, because it is important and fascinating, in my opinion.  In fact, all batters, pitchers, coaches, and managers should take at least a basic primer on game theory…


#1    2010/03/31 (Wed) @ 23:20

So in short… if a hitter performs equally across a bunch of pitches, you assume he is better at hitting the ones he sees less often.  Makes sense to me, as does the rest of your post.

But… most hitters see fastballs the majority (or, plurality) of the time.  How can this be reconciled?  That would indicate to me that every hitter is worst at hitting fastballs.  Could this possibly be the case?  Maybe if they’re all trying to anticipate a slider, because they know they’re weak against it, and never anticipating a fastball, and so a fastball is always a surprise, even though it’s the most common pitch?

This makes me dizzy.


#2    2010/03/31 (Wed) @ 23:50

Really cool study by Sky and a great post here. Been just fan-boi gushing for a few weeks, but I figure sometimes it’s just nice to see compliments.

@Mike

Is that Sky’s conclusion? I could have missed it somewhere. But if not, I think the explanation is that it is also good game theory to use your best pitch in many situations, regardless of whether the batter is better at hitting a generic fastball or not. The lure of the “challenge” fastball. grin


#3    Colin Wyers    2010/04/01 (Thu) @ 00:23

Mike, you don’t look at the rate of fastballs per se, but the CHANGE in the rate of fastballs from the league average.


#4    MGL    2010/04/01 (Thu) @ 00:57

"But… most hitters see fastballs the majority (or, plurality) of the time.  How can this be reconciled?  That would indicate to me that every hitter is worst at hitting fastballs.  Could this possibly be the case?”

No.  There is no reason why the ratio of fastballs to any other pitch should be the same or any particular number for that matter.  It just depends on how good pitchers’ fastballs are in general compared to their other pitches, including ability to locate of course (the value of a pitch includes how often it is a ball and strike).  If a fastball is a better pitch than a non-fastball even when thrown 100% of the time, then it is correct to throw 100% fastballs.  If not, then there will be some ratio. What those ratios are depends on the difference in quality among the various pitches.

Yes, I was just summarizing what Sky said in his article.  The article was perfect.


#5    Greg Rybarczyk    2010/04/01 (Thu) @ 01:58

I saw a similar result, and described the same game theory equilibrium, in my 2010 Hardball Times Annual article, when looking at average home run distance by pitch type. 

In 2009, the average distances of home runs for 8 of the 9 pitch types ranged from a minimum of 397.5 feet to a maximum of 400.0 feet, a range of only 2.5 feet.  The only pitch type that did not fit this pattern was the knuckleball, which is thrown so infrequently that its number is more a reflection of Tim Wakefield than any MLB-wide trend.

Pretty fascinating stuff.  We ought to all remember that optimal results can sometimes come about naturally, without conscious decisions guided by fancy analysis…


#6    2010/04/01 (Thu) @ 07:13

Got it, thanks Colin.

Greg, were curveballs the 397.5 and fastballs the 400.0?  Was the knuckleball less or more?


#7    2010/04/01 (Thu) @ 09:45

Mike, here’s what I had:

Code  Pitch Name        Avg. Dist.  % of HR’s
SI    Sinker            400.0         1.4%
FA    Fastball          399.6         4.7%
FF    4-seam Fastball   399.4        53.3%
FT    2-seam Fastball   399.2         3.7%
SL    Slider            398.7        15.5%
CU    Curve             398.5         5.7%
FC    Cutter            398.4         2.7%
CH    Changeup          397.5        12.7%
KN    Knuckleball       387.6         0.4%
      Total             398.9       100.0%


#8    Guy    2010/04/01 (Thu) @ 09:49

"according to the tenets of game theory pitchers will throw fewer and fewer of the pitches that a hitter hits well and more and more of the pitches that a hitter does not well (to that hitter of course) until the results of all those pitches are equal!”

That’s not quite right.  The results on each pitch type are not at all equal.  Most hitters perform better on fastballs than breaking balls, for example.  What should be true is that each hitter will perform the same on each pitch type for any given count and base/out situation. But since more FA are thrown on hitters’ counts (indeed, that’s what makes it a hitters’ count), production will still be higher on FAs than other pitches. 

What Sky’s results show is that pitchers are coming reasonably close to this ideal, and that hitters seem to show the same general relative advantage on FAs vs other pitches.  There are some real differences on FA productivity.  It would be interesting to parse out how much of that is because some hitters get more/less FA in hitters’ counts, as opposed to some inefficiency in how pitchers are approaching them.


#9    Tangotiger    2010/04/01 (Thu) @ 10:03

We’ve talked about this “average HR distance”, and I keep saying it makes no sense.

For example, say you have three players, let’s call them Bambino the big slugger, Rock the average power guy, and Little Willie the ballplayer who could.  Here are the distances of their HR in April:

Bambino
440,410,400,390,380

Rock
410,390

Willie
395

The average for the three players:
404 Bambino
400 Rock
395 Willie

And, you can easily make it so that Rock’s HR are a shorter average than Willie’s, by giving Rock one more “Just Enough” Home Run.  Or give Willie one lucky HR at 415 feet to put him ahead of Bambino.

The problem is that since the JustEnough home runs are shorter than the average HR, anyone who hits a lot of JustEnough HR will pull his average down.  It doesn’t make sense that as you add more HR, it makes it look like your HR aren’t going as far, because someone else’s warning track HR don’t count at all, as if they didn’t happen.

This is a selection bias issue, where you are throwing out legitimate samples because they don’t meet a threshold.  And this biases all the data.

So, I am never surprised when I see HR distances reported by Greg or MGL or anyone else by category where we see little to no differences.  In fact, I just ignore that data altogether.

The “right” thing to do is to include the warning track HR so that you get the same quantity of long flyballs for each hitter.

If Bambino and Rock and Willie all came to bat 100 times, and the above were their reported HR, then we need to add 3 warning track HR for Rock and 4 for Willie.  So, we’ll get:

Bambino
440,410,400,390,380

Rock
410,390, 378, 376, 374

Willie
395, 378, 376, 374, 372

Now here are those averages:
404 Bambino
386 Rock
379 Willie

Doesn’t this better reflect what we see in the HR data?

Now, yes, how do we know what to put in the warning track HR.  Fine, we can try to have that discussion and a way to come up with that.  But, my way here better reflects what happened than the original way.
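
The two sets of averages can be checked with a short Python sketch. The distances are the ones listed in this comment; the variable names are only for illustration.

```python
def average(distances):
    return sum(distances) / len(distances)


# Reported home runs for each hitter (distances in feet, from the comment).
home_runs = {
    "Bambino": [440, 410, 400, 390, 380],
    "Rock": [410, 390],
    "Willie": [395],
}

# Warning-track fly balls added so that every hitter has five long flies.
warning_track = {
    "Bambino": [],
    "Rock": [378, 376, 374],
    "Willie": [378, 376, 374, 372],
}

hr_only = {name: average(d) for name, d in home_runs.items()}
padded = {name: average(home_runs[name] + warning_track[name]) for name in home_runs}

for name in home_runs:
    print(f"{name}: HR only {hr_only[name]:.1f}, with warning track {padded[name]:.1f}")
```

With home runs only, the three hitters sit within 9 feet of each other. Once each hitter is measured on the same number of long fly balls, the gap between the slugger and the other two is much wider.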


#10    Greg Rybarczyk    2010/04/01 (Thu) @ 10:58

Tom, that’s not even to mention the effect of the different fence distances for different parks.  Guys who play in small parks will always have their distance average dragged down by the cheap HR’s they get from the close fences (not that they will mind)…

I think this problem is pretty tough to avoid if you’re looking at individual players (although the very top of the list is usually all true sluggers, as they hit enough no doubt homers to separate themselves from the pack), but I think when you look at the league in aggregate (as for the distance by pitch type numbers above), the sample size gets big enough, and the 30 parks all mingle together enough, that the numbers can be illuminating…


#11    Tangotiger    2010/04/01 (Thu) @ 12:19

I agree about the sluggers separating themselves… a little.  Not a lot and not as much as they should.

I disagree about your last half-sentence.  The larger the sample, the more of this noise you have, which makes it less illuminating.  Your average of 400 and 398 feet HR for the different pitch types is exactly my point here.  Basically, it would be impossible to find anything diverging, even if there was something there.  You won’t find it because of the justenough HR.


#12    Greg Rybarczyk    2010/04/01 (Thu) @ 12:48

Well, fortunately I can look at all the homers that comfortably cleared the fence, i.e. the non-JE’s.  Now, this doesn’t get rid of the problem of different fence distances, but it should clear the JE effect out.  Note, I also put back in the unclassified pitches, they were the ones we couldn’t identify (thanks again to Dan Brooks for helping me with the classifications)

Data for PL and ND homers only: (there were 3455 of these in 2009)

Code  Pitch Name        Avg. Dist.  % of HR’s
SI    Sinker            408.2         1.2%
CU    Curve             405.0         5.4%
FF    4-seam Fastball   404.8        49.0%
FA    Fastball          404.4         4.5%
CH    Changeup          404.1        11.1%
SL    Slider            404.0        14.5%
??    Unknown           403.7         8.2%
FT    2-seam Fastball   403.6         3.2%
KN    Knuckleball       401.9         0.3%
FC    Cutter            400.7         2.7%

      Total             404.4       100.0%

There is more variation here, but it still seems pretty well-clustered.  But, I don’t know if it is significantly more variation than we’d expect…

I also looked at just ND homers.  Smaller sample size, data still (at a glance) clustered fairly tightly about the mean, which for ND’s was 420.2 feet.


#13    2010/04/01 (Thu) @ 13:15

"What should be true is that each hitter will perform the same on each pitch type for any given count and base/out situation.”

I’m not sure that’s true, either.  Think about a pitcher’s array of pitches and their effectiveness as a weighted average.  For instance, if you have a guy with 3 pitches (FB, CB, CH), you would take a weighted average of those 3 pitches based on how effective they are and how often they’re thrown.  Let’s say in a given count and base/out situation a pitcher is throwing 50% FB, 25% CB, 25% CH and that makes the run values equal at -.15/100 pitches compared to average.  So you get .5*-.15 + .25*-.15 + .25*-.15 = -.15 runs allowed per 100 total pitches compared to average.  Let’s assume the pitcher wants to throw his FB more often, and he goes to 60% FB, 20% CB, 20% CH.  We’d expect his FB to get worse, and the other two to get better.  Well, it depends on how they change - if the FB becomes average (0/100 pitches), and the other two become lethal (-.5/100 pitches), now all of a sudden you’re seeing .6*0 + .2*-.5 + .2*-.5 = -.2.

So his run expectancy per 100 total pitches has gone from -.15 to -.20 compared to average.  That’s what he should be trying to maximize, and it doesn’t necessarily have to be where all his pitches are equal.  In my example, given the fastball becoming average, the other two would have to average -.375 runs per 100 pitches compared to average to equal -.15 overall, and if they were less effective than that you’d get worse results than -.15 overall.
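
The weighted-average arithmetic in this example can be reproduced with a small Python sketch. The helper names are hypothetical; the pitch mixes and run values are the ones given in the comment.

```python
def overall_run_value(mix):
    """mix: list of (share_of_pitches, run_value_per_100) pairs; shares sum to 1."""
    return sum(share * value for share, value in mix)


def required_offspeed_value(target, fb_share, fb_value):
    """Run value the non-fastballs must average for the whole mix to equal target."""
    return (target - fb_share * fb_value) / (1 - fb_share)


balanced = [(0.50, -0.15), (0.25, -0.15), (0.25, -0.15)]
fastball_heavy = [(0.60, 0.0), (0.20, -0.5), (0.20, -0.5)]

print(overall_run_value(balanced))        # 50/25/25 mix, all pitches at -.15
print(overall_run_value(fastball_heavy))  # 60/20/20 mix, FB at 0, others at -.5
print(required_offspeed_value(-0.15, fb_share=0.60, fb_value=0.0))
```

Note that this only evaluates a mix for fixed run values. It says nothing about how those run values move as the hitter reacts to the new frequencies, which is the part of the question the thread goes on to argue about.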

In real life, I have no idea how the relationship between effectiveness and how often it’s thrown works, so I’m not sure what’s optimal and if that varies across pitchers (I’d assume it would, but have no real reason to think that other than “everyone’s different")....


#14    Guy    2010/04/01 (Thu) @ 13:58

"That’s what he should be trying to maximize, and it doesn’t necessarily have to be where all his pitches are equal.”

That is what he should try to maximize, but it DOES have to be where the pitches are all equal (at least against any given hitter).  Remember, the hitter is also adjusting, improving his performance as a pitch is thrown more frequently.  So, in your example of .6*0 + .2*-.5 + .2*-.5, the pitcher should throw more CB and CH.  As he does, hitters will adjust and hit those pitches better, but the FB will become more effective.  Eventually, you get to an equilibrium where neither the pitcher nor hitter can improve by changing their approach, and at that point all 3 pitches will have the same run value.  (Maybe someone else here can demonstrate the math.)
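
One way to see the math is a toy two-pitch game, sketched below in Python. Everything in it is an assumption made for illustration: the pitcher chooses fastball or curveball, the batter chooses which pitch to look for, and the payoff numbers (run value to the batter per 100 pitches) are invented. It uses the standard closed-form solution for a 2x2 zero-sum game with no pure-strategy solution.

```python
def solve_2x2(a):
    """a[i][j] = run value to the batter when the pitcher throws pitch i and
    the batter looks for pitch j (0 = fastball, 1 = curveball).
    Assumes no pure-strategy solution exists, so both sides mix."""
    d = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    p_fb = (a[1][1] - a[1][0]) / d  # pitcher's equilibrium fastball frequency
    q_fb = (a[1][1] - a[0][1]) / d  # batter's frequency of looking fastball
    return p_fb, q_fb


def pitch_values(a, q_fb):
    """Run value of each pitch type, given how often the batter looks fastball."""
    fb = q_fb * a[0][0] + (1 - q_fb) * a[0][1]
    cb = q_fb * a[1][0] + (1 - q_fb) * a[1][1]
    return fb, cb


def batter_best_response(a, p_fb):
    """Best the batter can do if the pitcher commits to fastball frequency p_fb."""
    look_fb = p_fb * a[0][0] + (1 - p_fb) * a[1][0]
    look_cb = p_fb * a[0][1] + (1 - p_fb) * a[1][1]
    return max(look_fb, look_cb)


# Invented payoffs: this batter crushes a fastball he is looking for.
payoff = [[4.0, -2.0],
          [-3.0, 1.0]]

p_fb, q_fb = solve_2x2(payoff)
fb_value, cb_value = pitch_values(payoff, q_fb)
print(p_fb, q_fb, fb_value, cb_value)
print(batter_best_response(payoff, p_fb), batter_best_response(payoff, 0.6))
```

In this toy game the two pitch values come out equal at the equilibrium frequencies, and if the pitcher moves away from his equilibrium fastball rate in either direction, the batter's best response yields more runs than before. Real pitch selection has more pitches, counts and sequencing effects than this sketch covers.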


#15    Tangotiger    2010/04/01 (Thu) @ 14:04

Greg, this is what I suggest:

1. For each pitch type, figure out the % of HR hit per swing.  Let’s say it’s a range of 1.5% to 3.0%.

2. Figure out the lowest rate from #1 above.  That would be 1.5%.

3. Take the longest HR hit for each pitch type, such that you have 1.5% of the swings.

This way, you are looking at the exact same number of HR per swing, and you are looking at the longest HR hit for each pitch type.
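
The three steps above can be sketched in Python as follows. The function name and the sample data are hypothetical, and the sketch assumes every pitch type has at least one home run.

```python
def longest_at_common_rate(data):
    """data maps pitch type -> (swings, list of HR distances in feet).

    Returns pitch type -> average distance of the longest HR, keeping the
    same number of HR per swing for every pitch type.
    """
    # Steps 1 and 2: HR per swing for each pitch type, then the lowest rate.
    lowest_rate = min(len(hrs) / swings for swings, hrs in data.values())
    result = {}
    for pitch, (swings, hrs) in data.items():
        # Step 3: keep only the longest HR, enough to match the lowest rate.
        keep = max(1, round(lowest_rate * swings))
        longest = sorted(hrs, reverse=True)[:keep]
        result[pitch] = sum(longest) / len(longest)
    return result


# Invented sample: 3.0% HR per swing on FF, 1.5% on CH.
sample = {
    "FF": (200, [430, 410, 400, 385, 380, 375]),
    "CH": (200, [420, 400, 390]),
}
comparison = longest_at_common_rate(sample)
print(comparison)
```

In the invented sample, the plain average of all six fastball home runs is lower than the changeup average, but comparing the top 1.5% of swings for each pitch reverses the order.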

(This is similar to the Yankee issue from last year.)


#16    Mike Fast    2010/04/01 (Thu) @ 15:10

I agree that this was a great article by Sky, but the problem with his (and MGL’s) conclusion is that Sky’s own evidence doesn’t support it.  He does find a significant relationship remaining between RAA for a given pitch and the frequency with which a batter sees it, for all pitch types except the changeup.

His graphs obscure that point, as I attempted to point out in the comments.  That kind of graph is one of my least favorite kinds of graphs as it leads to poor sabermetric conclusions more often than any other.


#17    2010/04/01 (Thu) @ 16:23

"That is what he should try to maximize, but it DOES have to be where the pitches are all equal (at least against any given hitter).  Remember, the hitter is also adjusting, improving his performance as a pitch is thrown more frequently.”

To a certain degree, I was taking that into account - my question is, why do they all have to be equal?  That would mean that whatever adjustments the hitter is making cause the success rates on the new 60%, 20%, 20% pitching approach to be suboptimal, but what if the hitter can’t make adjustments great enough to cause that to happen?  This is what I’m unclear about: is there evidence that the hitter can adjust enough to make all situations where the run values of the pitches aren’t equal suboptimal?  I’m imagining a situation where the hitter starts sitting fastball as the fastball percentage goes up, but what evidence is there that he can increase his production on those fastballs by more than he’s hurt by the increased effectiveness of the offspeed pitches?

Maybe I just haven’t thought this through enough, but I’m just not seeing a reason why all 3 pitch values have to be equal to maximize the overall effectiveness of the pitcher’s arsenal (or in other words, reach the equilibrium).  I see situations where sacrificing some value on one pitch for a greater increase in value in another might be possible, leading to greater overall effectiveness.  If someone could lay out for me why the equilibrium calls for all the pitch values to be equal, that would be great…


#18    David Gassko    2010/04/01 (Thu) @ 16:38

B,

See this: http://en.wikipedia.org/wiki/Minimax


#19    2010/04/01 (Thu) @ 17:28

I mean, I’m no game theory expert, but I do understand the basic concepts.  What I’m looking for is for someone to show me why the equilibrium we’re looking for has to fall where all pitch values are equal as opposed to some combination where they aren’t.


#20    David Gassko    2010/04/01 (Thu) @ 17:47

And the answer to that question is minimax. If everyone is playing optimally, the outcomes must be equalized.


#21    Greg Rybarczyk    2010/04/01 (Thu) @ 17:57

Seems to me if a pitcher threw three pitches, and the individual value of those three pitches were not equalized, then that pitcher could increase the overall value of his pitching portfolio by throwing one less of the least effective pitch, and instead throw one more of the most effective pitch.  You can’t be at maximum value unless you’ve gotten to the point where switching one pitch for another can’t push your overall value any higher.  The only time that can be the case is when the individual pitch values have equalized.

Did I say that right?


#22    Nick Steiner    2010/04/01 (Thu) @ 19:19

MGL, you didn’t link to the actual article, but to Sky’s author page.  It shouldn’t matter for right now, but if someone stumbles on this thread in a few weeks, they might be confused!

http://baseballanalysts.com/archives/2010/03/hitter_scouting.php#comments


#23    2010/04/01 (Thu) @ 20:08

Yes, Greg is 100% right and that is a good explanation and a good way to look at it.


#24    2010/04/01 (Thu) @ 20:12

There might be a catch though. I am not sure. I would have to think about it some more. Maybe someone can address it.

The values of all pitches have to be the same (if everyone is playing optimally) for each situation - player, count, score, inning, etc.  But that might not necessarily mean that the overall value of all pitches across all situations has to be the same. I am not sure of that…


#25    Mike Fast    2010/04/01 (Thu) @ 20:35

So is no one but me bothered by the fact that the evidence doesn’t back up the theory?  The theory sounds so good, evidence be damned?


#26    jinaz    2010/04/01 (Thu) @ 20:44

I’m not a game theory expert by any means.  But I produced a very simple game theory model looking at this question to help guide the discussion in my baseball class when we read MGL’s fangraphs article on game theory & pitch selection. 

The model found that with optimal pitch selection, all pitch run values tend to be very similar, but not necessarily identical.  Sometimes it’s correct to throw in an occasional “bad” pitch--even if it’s of inferior run value to the worst-case scenario for a “good” pitch--because it makes the good pitch that much better the next time around.  It’s a marginal effect, and depends on some assumptions.  ...  but I couldn’t get away from it.

I’ll try to get it written up as soon as I can so you guys can tell me what’s wrong with it.
-j


#27    Nick Steiner    2010/04/01 (Thu) @ 20:44

I agree with you Mike and I commented as such at Baseball Analysts.


#28    MGL    2010/04/01 (Thu) @ 22:52

"So is no one but me bothered by the fact that the evidence doesn’t back up the theory?  The theory sounds so good, evidence be damned?”

I don’t get what you mean?


#29    Guy    2010/04/01 (Thu) @ 23:14

"But that might not necessarily mean that the overall value of all pitches across all situations has to be the same. I am not sure of that”

It definitely doesn’t mean that.  Fastballs are thrown more in hitter’s counts, and thus have a much higher run value.  Sky actually reports the averages:  “Relative to their overall abilities, hitters did best against fastballs (.20 RAA per 100 pitches) and change-ups (.14 RAA per 100 pitches), about average against curveballs (-.05 RAA per 100 pitches), and worse against cutters (-.34 RAA per 100 pitches) and sliders (-.55 RAA per 100 pitches).”

*

Mike:  I completely agree with your general point about abuse of R2.  But in this case I think Sky presents some pretty good evidence that there is not a lot of variance in individual hitters’ performance on different pitch types. Contra Nick, if the total variance equals the random variance then that will mean the y-t-y correlation is near zero.

Yes, there’s a relationship between FA% and FA run value.  But that may just be an artifact of his metrics:  once a hitter sees 70% FA, it follows that the RAA must approach zero.  The problem is that the mean run value is largely determined by the FA.  It might be worth looking at the ratio of FA to non-FA, and then CU:FA, SL:FA, and CH:FA ratios.


#30    2010/04/01 (Thu) @ 23:44

"And the answer to that question is minimax. If everyone is playing optimally, the outcomes must be equalized.”

Maybe I’m missing something, but I’m not seeing where minimax says that the outcomes (if outcomes = run values of each individual pitch) must be equalized.

“Seems to me if a pitcher threw three pitches, and the individual value of those three pitches were not equalized, then that pitcher could increase the overall value of his pitching portfolio by throwing one less of the least effective pitch, and instead throw one more of the most effective pitch.”

What if there were some cross pitch effects where throwing one more of the effective pitch (let’s call it Pitch A) caused Pitch A to be less effective, while throwing one less of the less effective pitch (Pitch B) led to no change in Pitch B, and the loss this causes more than offsets the gain from going from Pitch B to Pitch A?  What if it didn’t offset the pitch initially but you eventually reach equilibrium before pitch values are equal because of the cumulative lost value from cross pitch effects?  I have no idea if any of these things exist, I’m just bringing up possibilities off the top of my head.

“So is no one but me bothered by the fact that the evidence doesn’t back up the theory?  The theory sounds so good, evidence be damned?”

Exactly, this is what I’ve been getting at.  Optimal pitch selection, if you really think about it, is an extremely complicated subject with tons of compounding factors.  Optimal pitch selection is always changing.  The batter is always adjusting to the new information.  You’re constantly facing different hitters with different strengths and weaknesses.  There are so many different counts and base/out situations how are you going to really know what’s going on from such a large group of tiny sample sizes?  My example right above this is getting to this larger point - have we considered everything, and how do we know?  Simply based on what I’ve seen from this conversation, it seems to me this is all based on some simplified theory, but that’s the purpose of my examples - are we considering everything?  Does the model match up with reality?  It seems to me there could be tons of factors that could throw a simple “pitch values must be equal” equilibrium off, especially since we’re just beginning to scratch the surface in terms of things like pitch sequencing, so I think it’s important to use actual data/evidence here.

A man much older than me once told a story about interviewing Robert McNamara sometime after his stint as President of the World Bank.  He asked McNamara what went wrong with his theory based models of human behavior, and McNamara told him the native populations didn’t properly react to the World Bank’s actions.  Supposedly McNamara insisted that his models were correct, long after they quite obviously failed, and he still refused to admit they were broken - truly believing the reality of how people acted was broken.

Now, I’m not sure how to test this, but I do think it’d be interesting to see some data on it.  I’ve seen mgl mention before that often players are probably pretty close to accurate in their actions on these types of things, so it could be interesting to simply look at how close to equalizing pitch values pitchers come, controlling for count and base/out situations....


#31    Mike Fast    2010/04/02 (Fri) @ 00:34

MGL, this part, where Sky says this:

“A last look at this subject is examining the relationship between RAA per 100 pitches and the percentage of each type of pitch seen. If my game theory presumption were true, we would see basically no relationship between the two variables. The graphs below show the relationships.”

Then he presents graphs which look like they support his conclusions--big blobs of dots with low R-squared numbers.  But the R-squared numbers are irrelevant.  And the big blobs of dots actually all show significant trends, except for the changeups.  The way he graphed the data obscures this fact, but nonetheless it’s evidence that’s contrary to his conclusion.

He goes on to say that:

“As you can see, the RAA per 100 pitches and the percentage of pitches seen have basically no relationship for sliders, cutters, change-ups, or curve balls. For fastballs there is a weak relationship,”

But in the comments he acknowledges that that conclusion was wrong.  Doesn’t this bug you?

I agree that the first piece of evidence he presents regarding the variances of RAA is fascinating.  But the second piece of evidence he presents leads toward the opposite conclusion, that batters and pitchers are not optimizing according to minimax.  I’m not ready to assume that just because the first piece of evidence is the one that comports with what game theory tells us is optimal behavior means that it’s the one that is true and the other piece of evidence is prima facie false.


#32    Mike Fast    2010/04/02 (Fri) @ 00:37

Guy/29, I’m not following what you’re saying here:

“Yes, there’s a relationship btwn FA% and FA run value.  But that may just be an artifact of his metrics:  once a hitter sees 70% FA, it follows that the RAA must approach zero.  The problem is that the mean run value is largely determined by the FA.  It might be worth looking at the ratio of FA to non-FA, and then CU:FA, SL:FA, and CH:FA ratios.”


#33    MGL    2010/04/02 (Fri) @ 00:54

My head is spinning right now. Maybe I’ll address these points later.  Right now, I have to go back to my “character” resear…

I mean…

Actually I am trying to make some changes to UZR before the season starts, which I think is in a month or two.. wink


#34    Nick Steiner    2010/04/02 (Fri) @ 01:09

Guy, I don’t understand why this necessarily has to be true:

“Contra Nick, if the total variance equals the random variance then that will mean the y-t-y correlation is near zero.”


#35    Colin Wyers    2010/04/02 (Fri) @ 02:17

Nick, if observed variance is equal to random variance, then by definition correlation has to be close to zero.

What correlation is measuring is the amount of variance shared between two sets of variables (correlation is in fact just the covariance of two sets of variables normalized by the product of the standard deviations of those sets). If the variance in both sets is entirely random, then the correlation will be low.


#36    Colin Wyers    2010/04/02 (Fri) @ 02:24

Or, to put it another way - if 100% of the variance in Set A is caused by randomness, then 0% of the variance is explained by Set B, and therefore correlation (zero divided by… well, anything) should also be zero. Any correlation you observe between the two sets of variables is just noise, and given a large enough sample in both sets should be practically nonexistent.
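
A quick simulation illustrates the point. This is a sketch under simple assumptions (normally distributed talent and noise, invented SDs, a fixed random seed), not a model of the actual pitch data: each player's two "years" are his talent plus independent random noise.

```python
import random


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def year_to_year_r(talent_sd, noise_sd, players=2000, seed=0):
    """Correlation between two seasons of talent plus independent noise."""
    rng = random.Random(seed)
    talent = [rng.gauss(0, talent_sd) for _ in range(players)]
    year1 = [t + rng.gauss(0, noise_sd) for t in talent]
    year2 = [t + rng.gauss(0, noise_sd) for t in talent]
    return pearson(year1, year2)


r_no_talent = year_to_year_r(talent_sd=0.0, noise_sd=1.0)    # all variance is random
r_with_talent = year_to_year_r(talent_sd=1.0, noise_sd=1.0)  # half talent, half noise
print(r_no_talent, r_with_talent)
```

When there is no talent spread, the year-to-year correlation is just sampling noise around zero. When talent accounts for half of the total variance, the correlation lands near one half.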


#37    Guy    2010/04/02 (Fri) @ 05:06

Mike 32:
What I meant was that Sky is looking at RAA for each pitch type, that is runs/pitch relative to his own mean.  But as the FA% grows higher, it becomes mathematically impossible for the two run values to diverge very much.  If Juan Pierre only sees fastballs, that will determine his mean.  This tendency will be much weaker on less-frequently thrown pitch types (Pierre’s performance on non-FAs could be very different from his own mean.), so the R2 on those pitches is much lower.

So, I was suggesting it might be better to look at ratios, as opposed to using the player’s mean as the reference point.  The mean will itself reflect the distribution of pitch types the hitter sees.  That said, I’m not sure the ratios would be an improvement—just a thought. 

The other problem with RAA is that the absolute values will tend to be larger for very good hitters (I assume). This should contribute a small downward slope for all the pitches, I think.  Using percentages instead of a differential (or ratios) would fix that problem.


#38          (see all posts) 2010/04/02 (Fri) @ 09:49

This is a tricky subject, and I don’t presume to know exactly what is going on.  My current thought is that pitchers do adjust and mitigate a lot of the meaning in the batter RAA per pitch.  But, they don’t adjust ALL the way.  Otherwise we would see zero variance in true ability, and no slopes.  However, we do see slopes and the variability doesn’t go all the way down to zero. 

Part of it is that the pitcher is trying to maximize performance over an entire at bat, not each pitch.  So if it’s beneficial to throw a batter a high fastball that will be a certain ball only to set up a batter for a curve ball on the next pitch, that might be optimal behavior, but it won’t show up that way in this analysis.  Additionally, pitchers have to worry about throwing over the course of an entire game and season.  It may be that physically, their arms can’t take throwing as many breaking pitches as would be optimal.  That would explain why the fastball seems to have the least regression and the most correlation between RAA and % thrown.

I still stand by my conclusion that looking at how a batter is pitched is a lot better indicator than his RAA against a particular pitch.  But yeah, it’s really difficult to separate this stuff out.


#39    Guy      (see all posts) 2010/04/02 (Fri) @ 12:41

“So if it’s beneficial to throw a batter a high fastball that will be a certain ball only to set up a batter for a curve ball on the next pitch, that might be optimal behavior, but it won’t show up that way in this analysis.”
It won’t?  I thought the RAA metric incorporates the impact of each pitch on the ultimate outcome of the PA.  If not, isn’t that a big problem for this data?

“Additionally, pitchers have to worry about throwing over the course of an entire game and season.”
That’s a valid point, and could result in non-equal RAA for different pitch types.

“That would explain why the fastball seems to have the least regression and the most correlation between RAA and % thrown.”
What complicates this analysis is that every pitch will have a tendency to have RAA approach zero as the percentage of that pitch increases. Even if every hitter is +.3 runs on FA compared to all non-FAs, the guys who see 70% fastballs will have a smaller fastball RAA (because the fastball is being compared mainly to itself).  That’s a function of comparing each pitch to the mean.  So you have to correct for that somehow to make sense of these correlations.

“I still stand by my conclusion that looking at how a batter is pitched is a lot better indicator than his RAA against a particular pitch.”
I think that’s probably true. But the big question is how much differentiation is there beyond the general pattern that great hitters get fewer fastballs.  Are there some great hitters who face a lot of sliders, while others see cutters and changes (beyond what’s explained by the pitchers they happened to face)?


#40          (see all posts) 2010/04/02 (Fri) @ 14:47

Hmmm...I thought at least that if a pitcher threw a 2-1 strike, that it was assigned the same RAA value regardless of the outcome of the at bat.  Does anyone know the definitive answer on this?

Also, yes, there are great hitters who see a lot of fastballs.  Joe Mauer, Figgins, Johnny Damon, Jeter, Tejada, etc.


#41    Mike Fast      (see all posts) 2010/04/02 (Fri) @ 14:55

Sky/40, that is indeed how all the PITCHf/x analysts do it.  I assume David Appelman or whoever created the formula for Fangraphs did the same.

In other words, the run values are ignorant of pitch sequencing other than what is already included by knowing the ball-strike count.  If you throw a strike on 2-1, they assign that pitch the value of the difference between a typical 2-1 and a typical 2-2 count.

They are also typically assigned without considering the run environment of the particular pitcher, batter, park, temperature, etc.  So they are a decent approximation.


#42    Guy      (see all posts) 2010/04/02 (Fri) @ 15:15

OK, so a 2-1 FA strike has equal value to a 2-1 CU strike.  That seems reasonable, though it does leave room for a small sequence effect.  Has anyone ever looked to see if there are in fact any differences by pitch type (across all pitchers) for any given count?

This is all reminding me:  in our discussion of the Kovash-Levitt paper, Mike F ran numbers that pretty well demonstrated that the run value of FA and non-FA was equal for each count. The only exception was 2-strike counts, where FA had a higher run value, but I suspect that difference too would disappear if you controlled for platoon advantage.  See here:  http://www.insidethebook.com/ee/index.php/site/comments/game_theory_on_pitch_selection/


#43    Mike Fast      (see all posts) 2010/04/02 (Fri) @ 15:19

Sky/38

“I still stand by my conclusion that looking at how a batter is pitched is a lot better indicator than his RAA against a particular pitch.  But yeah, it’s really difficult to separate this stuff out.”

I pretty much agree with that.  If I’ve come across as critical of parts of your article, it’s certainly not that I disagree with your main aim, which I think was brilliantly conceived and well demonstrated.

I am still troubled by why the slopes are non-zero in the RAA vs. pitch percentage graphs.  It seems like Guy has some good ideas, but I’m not sure I quite grasp what he is saying.

I am very pleased with the discussion this has kicked off in the sabermetric world.  Definitely well done on that front, too.


#44    Guy      (see all posts) 2010/04/02 (Fri) @ 15:34

Mike:
I’m obviously doing a bad job of explaining this (and/or am just wrong).  Here’s what I’m thinking:  Let’s say two hitters are both +.2 on FA and -.2 on non-FA, relative to a league average hitter (not relative to their own mean).  So they have the exact same run value on each pitch type, and of course the same ratio between the two.  Player A sees 50% FA, player B sees 70% FA.  So player A has an overall mean of zero, while player B’s mean is +.08.  The result is that player A’s personal RAA on FA is .2-0=.2, while player B’s RAA is .2-.08=.12.  And a 90% FA player would have an RAA of just .04. The higher the FA%, the lower the RAA, even though each of these hitters is equally effective against both pitch types.

Any clearer?  Or have I made a logical error somewhere?
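
The arithmetic in #44 can be reproduced in a few lines. This is a sketch of Guy's two-pitch-type example only: the function name is made up, and the +.2 / -.2 run values are the ones from the comment, not measured data.

```python
def raa_vs_own_mean(fa_share, fa_value=0.2, other_value=-0.2):
    """Fastball run value measured against the hitter's own pitch-weighted mean.

    fa_value and other_value are run values relative to a league-average
    hitter; fa_share is the fraction of pitches seen that are fastballs.
    """
    own_mean = fa_share * fa_value + (1 - fa_share) * other_value
    return own_mean, fa_value - own_mean

for share in (0.5, 0.7, 0.9):
    own_mean, fa_raa = raa_vs_own_mean(share)
    print(f"FA% = {share:.0%}: own mean = {own_mean:+.2f}, FA RAA = {fa_raa:.2f}")
```

This gives fastball RAAs of .20, .12 and .04 for the 50%, 70% and 90% hitters, matching the comment. Algebraically the result is (1 - FA share) x (FA value - non-FA value), so it shrinks toward zero as FA% rises even though the hitter's skill against each pitch type never changes.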


#45    Mike Fast      (see all posts) 2010/04/02 (Fri) @ 15:50

“OK, so a 2-1 FA strike has equal value to a 2-1 CU strike.  That seems reasonable, though it does leave room for a small sequence effect.  Has anyone ever looked to see if there are in fact any differences by pitch type (across all pitchers) for any given count?”

You mean run value differences between pitch types (e.g., fastball vs. curveball) at a given count?

I’m not aware that anyone has done that.  In theory it should be possible to calculate, but it would require a different method than I used to get to the RAA values by count.  I simply found the run values for each ball-strike count by looking at the through-count data on B-Ref.  Then the value of a ball or strike is simply the difference between the run values at the ending count and the starting count.
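
As a rough illustration of the count-based method described above, here is a minimal sketch. The run values in the table are placeholder numbers invented for the example, not the B-Ref through-count figures, and the names are hypothetical.

```python
# Hypothetical run values by (balls, strikes) count, in runs relative to an
# average plate appearance from the batter's side. Placeholder numbers only.
COUNT_RUN_VALUE = {
    (2, 1): 0.03,
    (3, 1): 0.13,
    (2, 2): -0.04,
}

def pitch_run_value(balls, strikes, call):
    """Value of a 'ball' or 'strike' that does not end the plate appearance.

    It is the run value of the ending count minus that of the starting count.
    """
    if call not in ("ball", "strike"):
        raise ValueError("call must be 'ball' or 'strike'")
    start = (balls, strikes)
    end = (balls + 1, strikes) if call == "ball" else (balls, strikes + 1)
    return COUNT_RUN_VALUE[end] - COUNT_RUN_VALUE[start]

strike_value = pitch_run_value(2, 1, "strike")  # 2-1 -> 2-2
ball_value = pitch_run_value(2, 1, "ball")      # 2-1 -> 3-1
print(strike_value, ball_value)
```

Note that pitch type is not an input anywhere: a 2-1 strike gets the same value whether it was a fastball or a curve, which is exactly the assumption being questioned.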

I know that at some point I checked my Gameday data to see if I had the same through-count results as I did from B-Ref, and it was very close, within .01 runs.

What you’re suggesting would involve the same process as that but a little more legwork to divide by pitch types.


#46    Mike Fast      (see all posts) 2010/04/02 (Fri) @ 15:55

Guy/44, I follow everything you said there and agree with it.  Now how are you applying that to explain what Sky was seeing in his graphs? 

He’s using RAA as compared to league average, not personal average, correct?


#47    Guy      (see all posts) 2010/04/02 (Fri) @ 15:59

What I was asking is whether the end result of a PA featuring a 2-1 FA strike is identical to a PA with a 2-1 CU strike?  (for example). Both put the count at 2-2, so your method assumes the eventual outcomes are the same.  But are they?

That said, even if you found differences you’d then have to check to see if it was the specific pitcher and hitter quality in that bucket causing the difference, rather than an inherent advantage for a pitch type.  It might not be worth trying to sort out....
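
One way the check in #47 might be set up, sketched with toy records. Everything here is invented for illustration: the record layout, the function name, and the numbers. Real input would come from pitch-by-pitch data with the eventual plate appearance outcome attached to each pitch.

```python
from collections import defaultdict
from statistics import mean

# Toy records, invented for illustration:
# (count before the pitch, pitch type, call, run value of the PA's final outcome)
PITCHES = [
    ((2, 1), "FA", "strike", -0.25),
    ((2, 1), "FA", "strike", 0.45),
    ((2, 1), "CU", "strike", -0.25),
    ((2, 1), "CU", "strike", 0.05),
    ((2, 1), "FA", "ball", 0.30),
]

def outcome_by_pitch_type(pitches, count, call):
    """Average eventual PA run value after a given call at a given count, per pitch type."""
    buckets = defaultdict(list)
    for pitch_count, pitch_type, pitch_call, pa_value in pitches:
        if pitch_count == count and pitch_call == call:
            buckets[pitch_type].append(pa_value)
    return {pitch_type: mean(values) for pitch_type, values in buckets.items()}

averages = outcome_by_pitch_type(PITCHES, (2, 1), "strike")
print(averages)
```

If the count-only method is right, the per-pitch-type averages should be about equal within each count and call. As noted above, any gap would still need to be checked against the pitcher and hitter quality in each bucket before reading it as a pitch-type effect.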


#48    Guy      (see all posts) 2010/04/02 (Fri) @ 16:03

Sorry, my #47 was replying to Mike/45.

Mike/46:  No, I think Sky’s RAA is calculated relative to the hitter’s own overall mean (across all pitch types).  That’s the point I’m making.

For example, he says early in the article:  “Relative to their overall abilities, hitters did best against fastballs (.20 RAA per 100 pitches) and change-ups (.14 RAA per 100 pitches), about average against curveballs (-.05 RAA per 100 pitches), and worse against cutters (-.34 RAA per 100 pitches) and sliders (-.55 RAA per 100 pitches).”


#49    Guy      (see all posts) 2010/04/02 (Fri) @ 16:11

Actually, this quote from Sky makes the method more clear:  “As a first step I subtracted each hitter’s RAA per 100 pitches for each pitch by their overall average RAA per 100 pitches. Obviously someone like Albert Pujols hits well against pretty much all pitches, but I’m interested in which pitches he hits best. This adjustment takes care of that.”


#50    Mike Fast      (see all posts) 2010/04/02 (Fri) @ 16:25

Okay, Guy, I’m on board now.  Thanks much for the explanation.

That would put a slope on every line, but I guess it’s not clear without the details of the data how much effect that has.


#51    Guy      (see all posts) 2010/04/02 (Fri) @ 16:43

Another complication is that good hitters will presumably have a larger spread than weak hitters, even if the ratio of FA to other pitches is similar.  That’s because their run values are larger, and Sky is using a differential (+/- vs. the player’s mean) rather than a percentage relationship. 

On fastballs, where a high percentage tends to mean weaker hitters, that would tend to reduce the RAA as FA% rises (Pujols might have a +.7 FA RAA, while Pierre is just +.2 RAA).  That would reinforce the slope toward zero from the other effect.  But on offspeed pitches, a lower percentage means a weaker hitter and thus a smaller spread, which in this case usually means a less bad RAA (Pujols might be -.7 on CU, while Pierre is just -.2).  So for the non-FA pitches, this would tend to be an offsetting factor, which may explain the very low R² for those pitches.


#52    MGL      (see all posts) 2010/04/02 (Fri) @ 17:18

BTW, after thinking about it, it is true that the overall value of each pitch will only be the same against each individual batter and for each count. And even then, you would still have to control for the game situation. League-wide, the overall values of each pitch could be anything, even if batters and pitchers were in equilibrium with respect to pitch selection.


#53    James Holzhauer      (see all posts) 2010/04/03 (Sat) @ 11:15

mgl, any chance you’re revealing your projected standings for the season before it begins this time?  (not a flippant remark, just interested to see what they look like)


#54    MGL      (see all posts) 2010/04/03 (Sat) @ 22:46

James, I am trying to get them done tonight. If I do, it will be late (like Sunday morning), and I’ll post then on the blog here or send them to you.

