THE BOOK--Playing The Percentages In Baseball

Thursday, January 20, 2011

Streakiness

By Tangotiger, 04:26 PM

I’m just waiting for part 2.  So far, all we’ve seen is that David Wright’s 2010 season was streaky, but we still don’t know if he just happens to be one of the extreme points (i.e., SOMEONE has to be streaky, just by luck), or if the number of streaky players is larger than expected by chance.

I’ll bet we’ll only see a bit (not much, but some) of streakiness, for the following reasons:
1. you see the same pitcher in 3 or 4 PA
2. same park
3. same weather
4. same health

By re-ordering the PA as the author is doing, he is removing all this.  So, just by virtue of a player being the same guy on the same day, facing the same pitcher 3 or 4 times, we expect SOME persistence, over and above any “streaky” factor innate to him as a player.  Once you factor all that out, I would say the streakiness factor will be close to zero.
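
To illustrate the persistence argument above, here is a minimal sketch, not the author's method. It assumes a toy model in which every PA in a game shares one on-base probability (standing in for pitcher, park, weather, and health), and the two probabilities used are deliberately exaggerated. The function names, game counts, and the lag-1 autocorrelation measure are all assumptions made for illustration.

```python
import random

def lag1_autocorr(xs):
    """Correlation between each outcome and the one that follows it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

def simulate_ordered_pas(n_games, pa_per_game, rng):
    """Binary PA outcomes (1 = reached base) with a shared per-game effect."""
    outcomes = []
    for _ in range(n_games):
        # ASSUMPTION: exaggerated game-level effect, either a tough or an easy day.
        p_on_base = rng.choice([0.20, 0.50])
        for _ in range(pa_per_game):
            outcomes.append(1 if rng.random() < p_on_base else 0)
    return outcomes

rng = random.Random(2011)
ordered = simulate_ordered_pas(5000, 4, rng)

# Re-ordering the PA keeps the totals but destroys the within-game grouping.
shuffled = ordered[:]
rng.shuffle(shuffled)

r_ordered = lag1_autocorr(ordered)
r_shuffled = lag1_autocorr(shuffled)
print(f"lag-1 autocorrelation, original order: {r_ordered:.3f}")
print(f"lag-1 autocorrelation, shuffled order: {r_shuffled:.3f}")
```

In this toy setup the batter has no innate streakiness at all, yet the original order still shows positive persistence that the shuffled order lacks. Real game-level effects would be much smaller than the ones assumed here.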


#1    dutchbrowncoat      (see all posts) 2011/01/20 (Thu) @ 18:32

i would add a number 5 to your list, “pitcher handedness”, especially because this case surrounds david wright. david wright is pretty well known for having a large platoon split, a split that has grown larger as his “streakiness” has apparently increased.  his woba vs L/R last year was 441/344 and over his career it is 435/368.

string together a few games against lefties or bad righties and i can see how he can appear to have hot streaks. don’t have the data on hand to match this up to their streakiness, but i will try to look into it.

i could be way off base, but as a met fan i have heard this mentioned on broadcasts a lot and this was my best guess at an explanation.


#2    MGL      (see all posts) 2011/01/20 (Thu) @ 20:34

There are other sources of variance (streakiness) for sure.  A RH player’s platoon ratio is not one of them, at least to any significant degree since most RHB have around the same true talent platoon ratios.  Sure, there are a few outlier RHB who have a fairly large true platoon ratio and Wright may be one of them.

Notice I didn’t say that platoon ratio itself was not a significant source of streakiness.  It is. But it is primarily the pitcher who is responsible for that (for RHB).  But, that is included in Tango’s “facing the same pitcher 3 or 4 times per game.” The pitcher’s true talent versus any particular batter is a function of his overall true talent plus the platoon ratio of that batter/pitcher matchup, which is mostly the responsibility of the pitcher when facing RHB, for the reason I stated above (Most RHB have around the same true platoon ratio).

(When facing a LHB, maybe 60% of the platoon ratio is the responsibility of the pitcher, and 40% for the batter.  It might be 50/50.)


#3    MGL      (see all posts) 2011/01/21 (Fri) @ 05:25

Good stuff by Seth.  I had not thought of just reordering the actual results of each PA in order to sim a season of streakiness.

To see how close that method comes to actually simming each PA (using a random number to determine the outcome of each PA, based on Wright’s actual 2010 rates), I ran 10,000 such sims.  Here are my results:

The average streakiness score (SC) was .0857.  The SD per season was .0104.  The smallest SC in 10,000 sims was .051 and the largest was .131.  (That seems like an asymmetrical and therefore non-normal curve - is that right?  Seth’s curve looks normal or at least symmetrical.) Wright’s SC of .101 in 2010 was greater than 92.22% of the sims, and his 2007 SC of .072 was greater than 8.56% of the sims.

All of these numbers are very similar to Seth’s so I am satisfied that his methodology is good and his results are accurate.
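
The per-PA simulation described above could be sketched roughly as follows. This is not MGL's code. The event rates are hypothetical (not Wright's actual 2010 rates), the wOBA weights are rough illustrative values, and `streak_score` is a stand-in metric (mean absolute gap between a rolling 25-PA wOBA and the season wOBA), since the thread does not spell out how SC is computed. The "observed" season is just one more simulated season used as a placeholder.

```python
import random
import statistics

# HYPOTHETICAL inputs: event -> (rate per PA, approximate wOBA weight).
EVENTS = {
    "out": (0.650, 0.00),
    "bb": (0.100, 0.70),
    "1b": (0.150, 0.90),
    "2b": (0.055, 1.25),
    "3b": (0.005, 1.60),
    "hr": (0.040, 2.00),
}

def simulate_season(n_pa, rng):
    """Draw one wOBA value per PA from the assumed event rates."""
    rates = [rate for rate, _ in EVENTS.values()]
    values = [value for _, value in EVENTS.values()]
    return rng.choices(values, weights=rates, k=n_pa)

def streak_score(pa_values, window=25):
    """Stand-in metric (an assumption, not the SC used in the article)."""
    season_mean = statistics.fmean(pa_values)
    total = sum(pa_values[:window])
    diffs = [abs(total / window - season_mean)]
    for i in range(window, len(pa_values)):
        # Slide the window forward one PA with a running sum.
        total += pa_values[i] - pa_values[i - window]
        diffs.append(abs(total / window - season_mean))
    return statistics.fmean(diffs)

N_SIMS = 2000  # the thread uses 10,000; fewer here to keep the sketch quick
N_PA = 670     # hypothetical season length
rng = random.Random(42)

scores = [streak_score(simulate_season(N_PA, rng)) for _ in range(N_SIMS)]
observed = streak_score(simulate_season(N_PA, rng))  # placeholder season
pct_below = 100 * sum(s < observed for s in scores) / N_SIMS

print(f"mean score {statistics.fmean(scores):.4f}, SD {statistics.stdev(scores):.4f}")
print(f"min {min(scores):.4f}, max {max(scores):.4f}")
print(f"placeholder season beats {pct_below:.1f}% of sims")
```

Because the metric and inputs are stand-ins, the printed numbers will not match the .0857 and .0104 quoted above. The point is only the shape of the procedure: simulate many seasons, score each, then see where one season falls in that distribution.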

The next step is to simulate the effects of parks and weather, and the starting pitcher, for each game, to see how that affects the SD of the SC.

That shouldn’t be too hard to do to a reasonably acceptable degree.  I’ll try and do that tomorrow (Friday) night, if anyone is interested.

Tango, what is the best way to convert run factors to wOBA factors?  For example, if a park has a run factor of 1.08 (or a high temperature increases run scoring by 8%), what is the wOBA factor? Is it the square root, more or less, like OPS?  A log function?

Also, what is approximately the SD in true talent wOBA platoon differential or ratio for RH and LH pitchers?

One more thing:

In the article, Seth mentions that a player like Luis Castillo, who mostly gets a single or an out in each PA (although he does walk more than Wright), would have a much smaller average SC.  Let’s see how much.  I simmed his 2009 stats.  Here are the results, using the same game, date, and PA numbers as Wright, but with Castillo’s 2009 rates for each offensive event:

Average SC is .0683, which is indeed much lower than Wright’s (80% as much).  The SD per season was .0082.  The low SC in 10,000 runs was .041 and the high was .109, again, asymmetrical.
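
As a quick check on the asymmetry, the extremes can be expressed in SD units using only the figures quoted in this comment. The helper name `z_score` is an assumption for illustration.

```python
def z_score(value, mean, sd):
    """Distance of a value from the mean, in standard deviations."""
    return (value - mean) / sd

# Figures as quoted in the comment above (10,000 sims each).
sims = {
    "Wright 2010": {"mean": 0.0857, "sd": 0.0104, "low": 0.051, "high": 0.131},
    "Castillo 2009": {"mean": 0.0683, "sd": 0.0082, "low": 0.041, "high": 0.109},
}

tails = {}
for name, s in sims.items():
    low_z = z_score(s["low"], s["mean"], s["sd"])
    high_z = z_score(s["high"], s["mean"], s["sd"])
    tails[name] = (low_z, high_z)
    print(f"{name}: low {low_z:.2f} SD, high {high_z:.2f} SD")

# Castillo's average SC relative to Wright's.
ratio = sims["Castillo 2009"]["mean"] / sims["Wright 2010"]["mean"]
print(f"Castillo / Wright average SC: {ratio:.3f}")
```

With these inputs the low extreme sits roughly 3.3 SD below the mean for both players, while the high extreme sits roughly 4.4 SD (Wright) and 5.0 SD (Castillo) above it, which is consistent with a right-skewed distribution. The ratio of the averages comes out at about 0.80, matching the "80% as much" figure.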


#4    Tangotiger      (see all posts) 2011/01/21 (Fri) @ 09:01

Square root is close enough.
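
As a worked example of that rule of thumb, applied to the 1.08 run factor from the question (the function name is an assumption, and the square root is only the approximation suggested here, not an exact conversion):

```python
import math

def woba_factor_from_run_factor(run_factor):
    """Approximate wOBA factor as the square root of the run factor."""
    return math.sqrt(run_factor)

run_factor = 1.08
woba_factor = woba_factor_from_run_factor(run_factor)
print(f"run factor {run_factor:.2f} -> wOBA factor of about {woba_factor:.3f}")
```

So an 8% boost in run scoring corresponds to a wOBA boost of a little under 4% under this approximation.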

Andy has the Left/right platoon splits in The Book.  Table 66 I think.  I think it was 17 points for RHH and 27 for LHH?  Something like that…


#5    Tangotiger      (see all posts) 2011/01/21 (Fri) @ 09:02

One of the most valuable tables in The Book by the way.  Probably available for free from Look Inside at Amazon for those interested…


#6          (see all posts) 2011/01/21 (Fri) @ 12:55

The results are in. And it’s all just noise.


#7    Tangotiger      (see all posts) 2011/01/21 (Fri) @ 13:07

http://www.fangraphs.com/blogs/index.php/were-going-streaking-again/

What if he just looked at the first PA of each game to remove the pitcher bias?  Or the first PA of each “series of games” (i.e., first PA on Friday, and ignore Saturday and Sunday) to remove the park bias?
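
A minimal sketch of that filtering idea, assuming each PA is stored as a record with hypothetical `series_id`, `game_id`, `date`, and `pa_num` fields (the field names and sample rows are made up for illustration):

```python
def first_pa_per(records, key):
    """Keep only the earliest PA for each distinct value of `key`."""
    seen = set()
    kept = []
    ordered = sorted(records, key=lambda r: (r["date"], r["game_id"], r["pa_num"]))
    for rec in ordered:
        if rec[key] not in seen:
            seen.add(rec[key])
            kept.append(rec)
    return kept

# Made-up sample rows: one three-game stretch split across two series.
records = [
    {"series_id": "S1", "game_id": "G1", "date": "2010-04-09", "pa_num": 1, "woba_value": 0.0},
    {"series_id": "S1", "game_id": "G1", "date": "2010-04-09", "pa_num": 2, "woba_value": 0.9},
    {"series_id": "S1", "game_id": "G2", "date": "2010-04-10", "pa_num": 1, "woba_value": 2.0},
    {"series_id": "S2", "game_id": "G3", "date": "2010-04-12", "pa_num": 1, "woba_value": 0.7},
]

per_game = first_pa_per(records, "game_id")      # drops repeat looks at the same pitcher
per_series = first_pa_per(records, "series_id")  # also drops repeat games in the same park

print([r["game_id"] for r in per_game])
print([r["game_id"] for r in per_series])
```

The trade-off is sample size: keeping one PA per game, or one per series, leaves far fewer PA for measuring any streakiness that remains.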


#8    MGL      (see all posts) 2011/01/21 (Fri) @ 13:22

Very nice part 2.  Funny, Seth wrote this:

“I was, to be honest, completely shocked by the utter absence of a relationship between a player’s streakiness in one year and his streakiness in the next.”

I, on the other hand, and I think most sabermetric researchers, would have been shocked, although not completely, if a significant relationship had been found.  In case anyone has not noticed, in 30 some odd years of sabermetric research, to my knowledge, no one has ever come up with any evidence of the existence of significant “intangible” skills whatsoever.  Not to say that they don’t exist of course, or that someday…


#9    dutchbrowncoat      (see all posts) 2011/01/21 (Fri) @ 13:51

hmm.  perhaps i didn’t explain what i meant well enough, or maybe i am just missing what you guys are getting at.  i did not mean that the platoon splits would be predictive of streakiness in any way. rather that anecdotally i would use the large splits to help explain the case for david wright.

it just seems too much of a coincidence to me that the only two years where he was deemed to be streaky are also the two years where his platoon splits had a .144 and .097 difference in woba.

i don’t have the tools or the time (at work) to go too crazy analyzing this, but i was able to quickly throw together a crude plot matching up his appearances vs left handers.  i stretched out his moving woba plot and then added dots for each pa against a lefty. seems to me like this coupled with random variation could be a solution, but i can’t attempt to give you any correlation numbers without having (or replicating) his data.

http://img522.imageshack.us/i/davidwrightstreakiness.png/

i am not saying it is “the answer”, just something else to think about. but as most of us would have guessed, this is likely to be largely just a result of random variation anyway.


#10          (see all posts) 2011/01/21 (Fri) @ 17:56

Gentlemen,

First off, let me just say that I’m honored that you read my post at all, and even more so that you seem to have liked it.

I’ve been trying to respond to a whole lot of comments over on fangraphs, but I’d love to discuss with you guys some of your ideas for teasing out the pitcher effects.

One thing I think you might find interesting is that when I ran it for pitchers, I found that the overall streaky shift is very high for starting pitchers, next highest for batters, and statistically nonexistent for relievers (in fact, the shift is slightly away from streakiness).  This strikes me as evidence that the strength-of-opponent aspect is what’s driving the appearance of overall streakiness--I’ve changed my mind a bit since I wrote the original.  Starters would of course be affected by the opposition, since roughly 1/5 of their performance over a 25-day stretch is against a single lineup.  Relievers, by contrast, might have pretty consistent usage patterns (e.g. LOOGYs who always face the best opposing lefty).

Tango, I must admit, somewhat sheepishly, that I hadn’t thought of park and opposition effects until a commenter brought it up yesterday.  I suspect I just got some tunnel vision.  I will say that I think the health aspect that you raise is something that could actually be different from player to player.  We know that some players get injured more than others, and we know that some players play through more pain than others.  But that seems to me to be an aspect of individual streakiness, even if it’s affected by an external factor.  Basically, even in an ideal world where you can control for everything, I’d be less inclined to factor out health than park/opposition effects (the latter of which are a given).  But that’s mostly a semantic thing.  I’d probably run it both ways, ultimately, since both parts would be worth knowing.

MGL, with respect to your question about the shape of the distribution, the short answer is that it’s not perfectly normal, but it’s very close. When I run a qqplot against a normal distribution, it’s close, but definitely with a bit of bowing.
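
For readers who want to try the same kind of check, here is a rough sketch of the numbers behind a normal Q-Q comparison, using only Python's standard library. It is not the code used for the article; the function name and the plotting-position formula `(i + 0.5) / n` are assumptions, and the sample below is a made-up toy list rather than real streakiness scores.

```python
from statistics import NormalDist, fmean, stdev

def qq_points(sample):
    """Pair each sorted observation with the matching quantile of a fitted normal."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    # ASSUMPTION: plotting positions (i + 0.5) / n, one common convention.
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

# Toy sample for illustration only.
toy_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
points = qq_points(toy_scores)
for theoretical, observed in points:
    print(f"theoretical {theoretical:.3f}  observed {observed:.3f}")
```

If the sample were exactly normal, the pairs would fall on the line where theoretical equals observed. A systematic curve away from that line is the "bowing" described above.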

As for my being “shocked,” I suppose that was a bit of an overstatement, but I was surprised that I couldn’t find any kind of effect at all.  I first read Jim Albert’s work about 6 years ago and had always thought of it as confirmatory evidence that streakiness exists.  Frankly, I wasn’t expecting a strong correlation, but yes, I did expect *something*.  Needless to say, I was mistaken.

Anyway, thanks all for your comments and thoughts.  I appreciate it a lot.  I’m happy to continue the discussion, and you’re welcome to e-mail me if I flake on refreshing this page--grad school kind of eats up my time.

Cheers,
Seth

