THE BOOK--Playing The Percentages In Baseball


Thursday, May 01, 2008

Cliff Lee’s hot start: You wanted crap/yap from a premium writer….

By , 11:13 PM

From Rob Neyer, who is lately (maybe for a long while) just as obsessed (and misguided) as almost everyone else about short-term recent performance:

So is Cliff Lee for real? I think all we can say is that he’s really healthy. He’s going to give up a higher batting average on balls in play, and some reasonable percentage of the fly balls he gives up will fly over the fence. So no, he probably doesn’t wind up winning the Cy Young Award. But I’ll bet he’s better than average. And considering how well C.C. Sabathia’s pitched in his last two starts, suddenly the Indians would seem to have the best rotation in the majors.

So Cliff Lee, 31 years old, is better than average, because he has pitched well to 128 batters after having pitched mediocrely, at best, to 3047 batters over the last 4 years?  I think not, and I will take up Neyer on that bet (he offered this time, although obviously not literally).

Here are Lee’s last 4 years’ NERC, keeping in mind that a league average pitcher, and full-time starter, within his league, is defined as 4.00:

04 4.87
05 3.84
06 4.45
07 4.93

That is a fairly sucky pitcher who, based on his 128 batters faced so far this year, is now an ever-so-slightly less sucky pitcher!  He is NOT better than a league average pitcher, nor is he a league average pitcher.  (Warning: of course, I don’t KNOW what he is for sure, but my estimate, since it is based on science, is a heck of a lot better than Neyer’s, which is based on nothing but a distorted and misinformed view of what 5 outings of good pitching, following 4 years of poor pitching, mean.)

Again, I ask, for any of these “Is he for real?” questions, that someone simply look at all players in history of about the same age and circumstances, who have had X prior stats, followed by Y (presumably really good or really bad) stats for a short period of time (whatever you want), and then see how they all did in ANY future time period you want (the longer the period, the larger the sample, of course).  Oh, you mean researchers have already done that (see Tango’s, my, and probably others’ “banner years” studies)?  And the answer is that they performed at around the usual Marcel projection?  So why are these writers trying to answer the silly “Is he for real?” question and coming up with equally silly answers?  It is a combination of ignorance, the need to write something, and the need for that something to be what their audience likes (otherwise they are out of a job).  However, it doesn’t matter much if what they write is true or not.  They don’t get graded on the truth.

How about we all just say in unison, come on now, everyone together, “They will ALL likely (our best estimate) perform somewhere in between their past weighted performance and the ‘breakout’ (or collapse) period you are citing, MUCH closer to whichever is the largest sample!”

Then we can all get on with our lives.

Anyway, I am not done with Neyer.

So, now Sabathia is part of a great rotation, considering the way he performed in his last TWO starts?

I guess before those two starts, when everyone was calling for Sabathia’s head, and wondering whether he was hiding an injury, the Indians’ staff was NOT great.  But now it is.  Considering Sabathia’s last 2 starts.  Maybe we better wait until his (or Westbrook’s or Carmona’s) next start or two.  Because if they pitch badly, then we are not so sure if the Indians have a great staff, right?  I am just following Neyer’s logic and that of every other sports writer in the world.

News flash:  The Indians staff is roughly the SAME staff it was before the season started, the same staff it is now, and the same staff it will be (assuming no major injuries) in a month from now, no matter how any of their pitchers pitch between now and then!

The sad part is that Neyer knows this stuff (I think), but he still writes the same crap that everyone else does.


#1          (see all posts) 2008/05/02 (Fri) @ 00:17

This is, of course, another case where people care more about aesthetics, flash and shape over actual performance.  Because Cliff Lee’s 5 great starts have started the season, it seems more interesting than if his 5 great starts had come in August. 

If he had a 6.00 ERA entering August and put together five great starts to lower it to 5.30 (or whatever,) nobody would really care, but because he puts them at the start of the season and gets to have an ERA of 0.93, people start overrating him. 

Of course, it makes no difference when you get hot, but it is HUGE in how it affects the way the mainstream population thinks about you.


#2          (see all posts) 2008/05/02 (Fri) @ 00:18

I think you’re being a little too harsh. His career FIP is 4.57 and that includes his injury problems last year (don’t give me a lecture on including injuries in projection). Last year, the AL average ERA was 4.61 (park adjusted according to B-ref), so both his career FIP and ERA are better than that (4.57 and 4.46 respectively). I’ve never seen the formula for NERC, but those numbers are out of line with every other defense-independent pitching statistic I’ve ever seen, so I don’t know how to judge it.

I’d also bet that he’ll be better than average, assuming that he doesn’t have some kind of injury that he attempts to pitch through later in the year (like Schilling last year or Brad Radke a few years ago). And for the record, CHONE projected him to be slightly better than average this season. If you actually want to make a bet with me, I’d be willing to put up a small amount (gas is getting ridiculous).


#3          (see all posts) 2008/05/02 (Fri) @ 01:06

#1, I concur completely.

#2, I am being a little harsh.  But I am in that mood lately about writers for some reason.

If Lee is an average or so pitcher, then he was average or so before the season started. These 5 starts may have changed our (anyone’s) estimate of him by a tenth of a run in ERA or so (I don’t really know off the top of my head), so my basic point still stands.

Arguing about his projection is another issue.  An FIP is NOT a good tool for a projection either in the short term or in the long term since it makes no attempt to be a projection tool (no age adjustment, no proper regression, uses only HR, BB, and K, and then, without any regression on those, etc.). I am not sure if FIP includes a park adjustment.  And ERA is, well, ERA.  But, again, we don’t need to argue about his projection.  That is a separate issue.

If Chone projects him as average or so, I have no problem with that.  His projections are probably at least as good as mine, although I am pretty confident in my Lee projection as it is merely a reflection of his career numbers, adjusted for defense, and park.

FWIW, Pecota has his projected eqERA (which is context neutral, I think) at 4.87, based on an average pitcher of 4.50, so they project him to be solidly worse than average as well.

If you take the average of Chone, Pecota, and me, I think you are solidly in the worse-than-average range.

I will, in fact, take you up on the offer, if you will use FIP and park adjust the HR totals based on LHB and RHB HR park factors for CLE, and the percentage of RHB and LHB that face Lee.  Any amount of money you want.

We can let Tango or someone do the FIP calcs and the park adjustment.

Of course, we are talking about from this point to the end of the season.  We can have a min # of IP if you want.  I don’t care about that.

The reason I don’t want ERA is because I have the CLE team as well-above average defensively, mostly in the OF.  And the CLE home park suppresses HR, especially for RHB.  So his ERA is probably going to be close to league average when you include those, and his park-UNadjusted FIP is going to be deflated because of the home park suppression of HR.

When I talk about a pitcher’s talent (he is average, above average, etc.), of course I am talking about independent of his park, team, etc. (even opponent, but we don’t have to include an adjustment for that, unless you want to), as I assume everyone else is too.

For example, it would make no sense to say that an average pitcher was an above-average pitcher because he had an above-average defense behind him, or played half his games in a pitcher’s park.

So we can just compare his park adjusted FIP to the league average FIP from whenever we want to start to the end of the season.

Fair enough?

BTW, my NERC is simply his component stats, all of them excluding IBB and sac attempts, and including wp, but not SB/CS/PO, park adjusted for every park he pitched in, adjusted for the pool of hitters he faces, as opposed to the pool of hitters that every other pitcher faces, and adjusted for the UZR of the players behind him when he pitches (sort of like using his PZR rather than his actual hits allowed, but not quite).

Then those components are turned into an ERA-like number, scaled to exactly 4.00.  So, by definition, a league average pitcher, starter or reliever, would have an NERC of 4.00.  It just so happens that in most years, full time starters (> 75% of their appearances are as starters) are right around 4.00 also, full time relievers (75% of games are as relievers) are around 3.80 or 3.90, and the rest (swingmen, etc.) are like 4.30.

And I use a regular lwts (like for batters) formula for the NERC rather than a base runs formula.  The Baseruns formula should be the correct one, but for some reason I find that using the regular lwts formula correlates better with ERA or RA.


#4    robneyer      (see all posts) 2008/05/02 (Fri) @ 01:09

Entering this season, Lee’s ERA+ was almost exactly league average. He’s got a 0.98 ERA entering May. Isn’t he likely to finish the season with an ERA+ better than league average? Just a little bit?

As for Sabathia, I believed there were legitimate questions about him after his first few starts. I also acknowledged, at least once, that it might have been nothing at all to worry about. In his last two starts his control’s been great, so now there seems to be less to worry about.

Yeah, I probably get too excited about what happens in April. I do have to write about something, after all. Mickey, you know I love you, but sometimes you seem to see the chaff and ignore whatever wheat might be there. -r


#5          (see all posts) 2008/05/02 (Fri) @ 05:26

Hey, I gotta write something on the blog too!

No, seriously, I admit that I was way too harsh on you.  I probably took out all of my frustration on you, caused by all the other writers, including several at BP and THT, who waste my time and computer screen on articles about who and what is “real” and who and what isn’t, yada, yada, yada.

And yes, my “job” is to ignore the wheat and point out the chaff.  That is what I do.  Just kidding of course, but that is what I tend to do.  Someone has to do it.

Lee, in my book, is a solidly below average pitcher, but if his ERA+ or whatever is around average, then that is a legitimate point of view (that he was average and is now a hair above average).  Someone said that Chone had him projected at better than league average, and he does very nice projections.  Of course, no one knows exactly what his true talent level is, and I sure wouldn’t think that any projection system can be so “sensitive” or “accurate” as to declare a pitcher anything within a couple of points of ERA either way, with any degree of reliability.

I’d still make the bet, but I wouldn’t berate anyone for saying that he is around league-average (which actually in the AL is ABOVE average overall).  He may very well be.

My point about the Indians staff still stands.  What they are now is essentially the same as what they were 2 weeks ago, before Sabathia’s bad outings, and what they were 4 weeks ago, before the season started, and probably what they will be 4 weeks from now.  Pretty much the same for all the other teams, I am afraid, including TBA (were good, as you correctly projected, before the season started and are not a “surprise”), BAL (still suck, despite what Millar thinks and says), the Yankees (although injuries could derail them of course, as they can with any team), OAK, STL (both still mediocre), etc.

Anyway, I am glad I flushed you out, which was actually one of my ulterior motives.  I have written emails to you (not bad ones) a half dozen times and got no response. Same email address?

Good of you to drop by!  I am sure our readers will appreciate it.  Around here, you are considered in the top 2 or 3 mainstream baseball columnists in the world, maybe #1, and rightfully so!

I wonder what Bissinger would think of this blog?


#6    John Walsh      (see all posts) 2008/05/02 (Fri) @ 08:57

#3/ mgl,

Why do you not give any credit for controlling the running game in NERC?  Won’t you be systematically penalizing left-handers by doing that?


#7    Tangotiger      (see all posts) 2008/05/02 (Fri) @ 11:40

What luck.  I made a long post, and the browser crashed on me.  But, I did a CTRL-C first.  So, here is my lost post:
===============================
These are the ERA (and FIP) forecasts for Lee coming into 2008 (remembering that the league average ERA is roughly 4.4):

Marcel: 4.79, 4.68
Chone: 4.45, 4.67
ZiPS: 4.63, 4.38
James: 4.40, 4.62
Sackmann: 4.80, 4.76

Marcel is the least loving, with a win% of roughly .460 or so.  The most loving makes him a .500 pitcher.  I mean, you can reasonably call him a .480 pitcher, and no one is going to argue with you.

An average pitcher will be .470 as a starter and .560 as a reliever.

So, going into 2008, Lee was around an average PITCHER, and a bit below average as a starter.

Given his sample (129 batters faced), I would weight his 2008 performance as roughly 10%, and his career performance (performance through 2007) at 90%.

His 2008 performance has two ridiculously unsustainable numbers: BABIP of .195, when his career is .295 and all the forecasters coming into 2008 had him around .305 give or take.  He also has 32K with 2 walks in 38 innings.  He won’t keep that up.

Anyway, even if you want to take his 2.01 FIP of 2008 and give it 10% weight, and take a 4.65 FIP forecasted going into 2008 and give it 90% weight, you end up with a 4.40 FIP.
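
The arithmetic behind that blend can be checked with a short sketch. The function name `blend` and its parameters are made up for illustration and are not part of any projection system; the 10%/90% weights and the two FIP figures are the ones quoted in the comment.

```python
def blend(recent, prior, recent_weight):
    """Weighted average of a small recent sample and a prior forecast."""
    if not 0.0 <= recent_weight <= 1.0:
        raise ValueError("recent_weight must be between 0 and 1")
    return recent_weight * recent + (1.0 - recent_weight) * prior


# Figures quoted in the comment: 2.01 FIP so far in 2008, 4.65 forecast FIP.
updated_fip = blend(recent=2.01, prior=4.65, recent_weight=0.10)
print(round(updated_fip, 2))  # 4.39, i.e. roughly the 4.40 cited
```

Even with a 2.01 FIP in the recent sample, a 10% weight only moves the estimate about a quarter of a run from the forecast.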

To the extent that one believes that he was slightly below average going into 2008, it is wholly supported to say that he’s slightly above average as of today.

***

I seem to remember that Jason Marquis had a ridiculously good April in 2007 as well.  Something about Marquis anyway, if someone wants to look it up.


#8    baseball bloggerette      (see all posts) 2008/05/02 (Fri) @ 11:46

dear mitchel,

undoubtedly mr. bissinger wouldn’t stoop to read anything written by a person like you who runs around in his mother’s basement in his underwear instead of having a JOB or getting a journalism degree, which would enable you to write plaschke-like, fact-filled prose, the epitome of excellence in baseball writing.

you see, you don’t have a degree in journalism which means that you aren’t capable of writing a cogent or coherent sentence about anything, let alone baseball.

also because if you don’t go into a locker-room filled with dirty smelly nekkid males, or watch a baseball game from a certain place in the stadium, you couldn’t possibly LOVE baseball, let alone understand it.

he would, of course, tell you this being careful to use swear words every other word, just to make sure you understand the depth of his hatred and contempt for anyone who DARES to use any numbers in a baseball game except for those he himself deems appropriate.

yours truly,

unrepentant baseball blogger-ess without ANY degrees who for some reason, thinks she has the right to write about any subject she darn well pleases


#9    Colin Wyers      (see all posts) 2008/05/02 (Fri) @ 11:50

You’re right about Marquis - B-Ref is giving me his hitting splits instead of his pitching splits for some reason, so I can’t find the exact numbers. Rich Hill had an even better April. It worked out better for Hill than it did Marquis.


#10    Tangotiger      (see all posts) 2008/05/02 (Fri) @ 12:22

I made a post at #31, and am linking here mostly for my benefit:
http://www.baseballthinkfactory.org/files/newsstand/discussion/the_book_blog1/


#11    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 12:33

Some questions for general discussion:

1. How unlikely do you consider Cliff Lee’s first five starts in 2008 to be, given his prior results?  Are we talking 1 in 3, 1 in 10, or has the tornado assembled the 747 in the junkyard?

2. Given that one of these two things must be true:

a) Cliff Lee is the same pitcher he was last year, or
b) Cliff Lee is not the same pitcher he was last year

How unlikely would his first five starts’ performance have to be for you to conclude that Cliff Lee’s first five starts in 2008 were pitched by a “different” person than wore his uniform last year?

3.  What degree of injury should prompt us to enforce a discontinuity in a player’s results that feed his projections?  Tommy John surgery?  60 day DL?  Loss of more than 5% of average fastball velocity?

4.  Should Rick Ankiel’s 87 MLB at-bats from 1999-2004 be incorporated into his Marcel projections, since he has only had 272 MLB at-bats in 2007-08?


#12    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 12:50

To amplify:

On #1, I am asking two things: what metric should we consider, and how would you use it to characterize how unlikely his performance has been.

On #2, what I am asking is what is your personal alpha value, to use the terminology of hypothesis testing.  The null hypothesis is that Cliff Lee is the same.  What p-value would you personally need to see to decide to reject the null hypothesis?


#13    fifth of      (see all posts) 2008/05/02 (Fri) @ 12:52

The argument of Lee supporters would be that Marquis, through his great May 9th start last season, had only 24 K against 13 BB in 47.2 IP.

ZiPS is actually a better projection for Lee than Chone, but likes Cleveland’s defense less - .309 BABIP projection vs. .293 for Chone. There really are not, as far as I can see in a side by side comparison, any substantive differences in his projections. ZiPS likes his control better, Chone likes his control less, and both have a slightly lower HR projection. I’m not looking at PECOTA, but the others on FG are basically the same.

Looking at his pitch data on FanGraphs, he’s going to his fastball a little more often and his changeup a little less, and his FB velocity is up a tick. If he really is so much better, how is he doing it without much change in his stuff? At best, he’s improved his sequencing. I feel pretty comfortable in arguing that, while his true talent is up by a very little bit, we’re seeing the effects of luck and opposition.

Much is made of Rany’s pointing out the uniqueness of his stretch, along with further elaborations thereof by others. Fair enough. But how historically unique is opening the season with six starts against five of the league’s six worst offenses?


#14    fifth of      (see all posts) 2008/05/02 (Fri) @ 12:56

That should be five starts against four of the league’s five worst offenses. (Baltimore being the fifth.)


#15          (see all posts) 2008/05/02 (Fri) @ 13:46

MGL-- We don’t actually need to bet, if you’re willing to back off a little bit as you did in the comments section. Of course, we don’t KNOW what he will be this season… I’m really only suggesting that there’s a 51% chance that he’s better than average.


#16    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 14:08

I just reviewed the Gameday hit charts for Lee’s 5 starts.  I don’t think I’ve seen it mentioned anywhere else that 7 of his 19 hits allowed have been infield hits, meaning in 37 2/3 innings, he’s allowed 12 hits to the outfield.  That’s pretty amazing…


#17    Guy      (see all posts) 2008/05/02 (Fri) @ 14:22

While I’m generally sympathetic to MGL’s perspective here, I do think small samples can provide meaningful new information IF the performance is extreme enough. (cross-posted at BTF)

If a pitcher performs at a truly elite level, even for a short period, it may tell us something important about his talent. For example, there have been 43 games thrown since 1994 with a game score of 95 or higher. Nine pitchers account for 60% (26) of these:
Randy Johnson (6)
Roger Clemens (3)
Pedro Martinez (3)
Mike Mussina (3)
Curt Schilling (3)
Kevin Millwood (2)
David Wells (2)
Hideo Nomo (2)
Kerry Wood (2)
Clearly, all enormously talented pitchers at the time they threw these games. The other 17 pitchers are also almost all good-to-great: David Cone, Eric Milton, Erik Bedard, Francisco Cordova, Greg Maddux, Jason Schmidt, Frank Castillo, Johan Santana, John Lackey, Justin Verlander, Kenny Rogers, Pat Hentgen, Andy Benes, Bartolo Colon, Bobby Witt, Chan Ho Park, Chuck Finley. So a game score of 95+ would seem to be a pretty strong indication of pitching talent, even though it’s a sample of just 1 game.

Now, I’m NOT saying that Lee’s 0.96 ERA belongs in this category. It clearly doesn’t. But if he had, for example, K’d 65 batters in his 38 IP, we’d have to seriously consider the possibility that his talent had changed.


#18    Tangotiger      (see all posts) 2008/05/02 (Fri) @ 14:35

“b) Cliff Lee is not the same pitcher he was last year”

Not only is Cliff Lee not the same pitcher he was last year, but this is the case for every single human on the planet.

“...that Cliff Lee’s first five starts in 2008 were pitched by a ‘different’ person than wore his uniform last year?”

And to the extent that he changed A LOT, our certainty level of that is extremely low.  Like I said, Marcel changes him by 0.30 run difference.  And, MGL’s other post made the change 0.25 runs I believe.

So, I don’t know that we can say anything more beyond that, absent other information.

“enforce a discontinuity in a player’s results that feed his projections”

When you are ready to bet on it, meaning never.

“Should Rick Ankiel’s 87 MLB at-bats from 1999-2004”

Yes, absolutely.  But, his weight is different.  The average hitter would have 100% of his 2007, 80% of his 2006, 64% of his 2005, and 51% of his 2004.

However, with Ankiel we DO know something substantial has changed!  It is not only fantastically substantial, but clearly documented: he decided to stop spending his training time pitching and focused it on hitting.

The given in all these forecasts is that Pujols will continue to train and practice 1 (or whatever) hour a day on his hitting, and that he will not show up drunk any more than he ever has.  Since this parameter is virtually the same (or similar) for all hitters, we don’t need to introduce it as a parameter.  It’s implied.  But, not in Ankiel’s case.

Injuries are an issue, no doubt.  That needs to be quantified.
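
The season weights quoted above (100%, 80%, 64%, 51%) follow a simple pattern: each year further back counts 0.8 times as much as the year after it. A minimal sketch of that recency weighting follows. It is not the full Marcel method, which also regresses toward the league mean and adjusts for age; the function names and the sample season lines are hypothetical.

```python
def recency_weights(n_seasons, decay=0.8):
    """Weight 1.0 for the most recent season, times `decay` per year back."""
    return [decay ** k for k in range(n_seasons)]


def weighted_rate(seasons, decay=0.8):
    """seasons: list of (successes, opportunities), most recent season first."""
    weights = recency_weights(len(seasons), decay)
    num = sum(w * s for w, (s, _) in zip(weights, seasons))
    den = sum(w * n for w, (_, n) in zip(weights, seasons))
    return num / den


print([round(w, 2) for w in recency_weights(4)])  # [1.0, 0.8, 0.64, 0.51]

# Hypothetical (hits, at-bats) lines, most recent first; not real player data.
made_up_seasons = [(150, 500), (140, 520), (120, 480), (30, 100)]
print(round(weighted_rate(made_up_seasons), 3))
```

Because the weights multiply opportunities as well as successes, a small old sample (like 87 at-bats) contributes very little on its own, before any judgment about whether it should count at all.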


#19    Tangotiger      (see all posts) 2008/05/02 (Fri) @ 16:04

If you follow the link in post 10, I am making several (at least a dozen) posts, all on mean, randomness, and binomial distribution.

For those who know all about it, you can ignore those posts.  Otherwise, if you are a bit hazy in this area, I hope they are illuminating to you.


#20    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 16:08

Tango #18:

Sounds like your threshold for recent performance leading you to discard older performance and start a new baseline is… never.  Right?

You mentioned betting, and I mentioned Ankiel.  This makes me regret not knowing you back in 2000.  I could have bet you and MGL my life savings that Ankiel’s pitching performance in Game 5 of the 2000 NLCS would have been terrible (after his Nuke LaLoosh imitation earlier in the postseason), and you guys would have been looking at his pretty good regular season and figuring on a quality start or something close.  I could have owned you both (or donated your worldly possessions to a charity of my choice, either way).

The point is, you must have a threshold for recent performance changing your mind.  Maybe it’s just so high that it rarely is ever reached…


#21    Sky      (see all posts) 2008/05/02 (Fri) @ 16:22

I just flipped a coin and KNEW it was going to come up heads.  And it did.  Should have bet some money on that.


#22    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 16:44

Sky #21, if I pulled a coin out of my pocket and flipped it 10 times and got 9 heads, would you bet on heads, or tails on the next flip?  What about 13 out of 14?  19 out of 20?  Point is, at some point the uneven results strain your belief in the fairness of the coin beyond its breaking point.
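
One way to put numbers on those coin examples is the one-sided binomial tail: the chance that a fair coin produces at least that many heads. This is only a sketch of the hypothesis-testing view raised earlier in the thread; the helper name `prob_at_least` is made up, and a small tail probability is not the same thing as the probability that the coin is biased (that would also require a prior on how common biased coins are).

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


for heads, flips in [(9, 10), (13, 14), (19, 20)]:
    print(f"{heads}/{flips} or more heads: {prob_at_least(heads, flips):.6f}")
```

Under a fair coin these come out to about 1.1%, 0.09%, and 0.002% respectively, so where each reader's threshold sits depends on the alpha he or she is willing to accept.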

I’m not sure if you watched the 2000 playoffs, but Ankiel was not walking guys because the HP umpire was squeezing him, he was hitting the backstop regularly.

If Ankiel in 2000 reg. season was a coin, then in the playoffs, he was a coin loaded towards failure.

As I think about it, it is probably much more common to be asking oneself this question regarding a collapse than regarding a breakout.  Player X hit decently the last few years, but this year can’t seem to catch up to the fastball any more, and after a few weeks of terrible hitting is sent to the bench and eventually DFA’d.  Are those front office decision-makers wrong to suspect that this year’s model is not a continuation of the previous year’s, but a totally busted version that hit its negative tipping point?  Well, probably they are right sometimes, and maybe overly hasty some other times.  But, they are acting on their own internal threshold…


#23    Colin Wyers      (see all posts) 2008/05/02 (Fri) @ 16:51

Greg, you’re basically using the textbook definition of the Gambler’s Fallacy there. At SOME point, yes, we can determine that the coin is biased, but it certainly is nowhere near 20 - or even 200 - flips.

Same with players - there is a sample size at which Lee’s new performance would lead us to be confident in a change of underlying talent/performance level. But Lee isn’t at that point.


#24    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 17:03

Colin,

Well, you’re right if the bias is weak.  I train people at work to avoid pass-fail tests using an example based on free-throw shooting: player A makes 91%, player B makes 86%, it takes a huge number of shots to tell them apart.  Bigger the true talent difference, the easier to differentiate.
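
The free-throw example can be roughed out with a normal approximation. The sketch below assumes both players take the same number of shots and asks how many shots each it takes before the expected gap between them reaches 1.96 standard errors; that is a loose yardstick (about even odds of actually detecting the difference), not a full power calculation. The function names are hypothetical.

```python
from math import sqrt


def z_score(p_a, p_b, n):
    """z statistic for the expected gap between two true rates after n attempts each."""
    se = sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
    return (p_a - p_b) / se


def shots_needed(p_a, p_b, z_target=1.96):
    """Smallest n (per player) at which the expected gap reaches z_target."""
    if p_a <= p_b:
        raise ValueError("p_a must be greater than p_b")
    n = 1
    while z_score(p_a, p_b, n) < z_target:
        n += 1
    return n


print(shots_needed(0.91, 0.86))  # a 5-point gap in true talent
print(shots_needed(0.91, 0.60))  # a much larger gap needs far fewer shots
```

For 91% versus 86% the answer is a little over 300 shots apiece, which is the point of the example: small true-talent differences need large samples, large ones do not.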

With Ankiel’s 2000 playoff pitching, he hit my threshold for sure.  And I usually don’t put things this way, but anyone who thinks his three disastrous outings in the playoffs were unrelated rolls of his Marcel dice ought to have his head examined.

As for Lee, I expect him to come back to Earth, like most people in the thread.  I’m very impressed with his performance so far in 2008, but I certainly would predict his rest-of-2008 to be closer to his Marcel than to what he’s done so far…


#25    Tangotiger      (see all posts) 2008/05/02 (Fri) @ 17:07

“I could have bet you and MGL my life savings that Ankiel’s pitching performance in Game 5 of the 2000 NLCS would have been terrible”

If I told you that a team’s best hitter will:
a. only bat exactly one time in the playoffs
b. because he is injured and can barely walk
c. and he’s facing possibly the best reliever ever

You would:
a. bet your life savings that someone that crippled could not possibly produce anything of note
b. presume that The Natural will finally occur in reality

***

What you are saying is what my buddies say: “Man, I went to AC and won a fortune!”.  Conveniently forgetting all those times they went to AC and lost a fortune!

***

I don’t pretend to know the human psyche and how persistent or transient certain things are.  Nor what an “injury” really means.  I certainly am not going to bet for or against Rick Ankiel, Calvin Schiraldi, Donnie Moore, or Kirk Gibson.

But, there are plenty of places that will accept your money!


#26    Guy      (see all posts) 2008/05/02 (Fri) @ 17:13

Greg:
In hindsight, we can clearly see that Ankiel’s true talent abruptly changed.  If we take you at your word, you saw this after 2 starts.  Great.  But how many other mental predictions have you made based on 2-game samples that proved to be wrong?  We all have a tendency to remember our correct predictions far more than the wrong ones.

The real question is, can you devise a “Steve Blass” test that will tell us a pitcher has lost his control after X+ consecutive games with BB/9>Y?  How accurate is this test?  And how often is it useful (since sudden, dramatic changes in talent like Ankiel’s are in fact quite rare)?  Even better, tell us how we know when a young pitcher’s dramatic April performance is “real.” That would be impressive...


#27    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 17:18

Tango #25

You’ve brought up another very dramatic example, but it’s different in an important way: Ankiel pitched in the 2000 NLDS, and in Game 2 of the NLCS, and was terrible, and then took his regular turn in the rotation, with no apparent change taking place between then and taking the mound in Game 5.  Gibson was hurt and hadn’t played recently, and I at least hadn’t seen him hit, and I had no idea what condition he was in.  I would never bet a lot on something I felt I knew nothing about (whereas I felt like I knew a lot more about Ankiel’s outlook), though if given the opportunity, I probably would have bet something small, I will admit.  I would have regarded Gibson’s hitting a homer as very, very unlikely.

To tell you the truth, I would never bet my life savings on anything, Ankiel included, so the comment was a bit unfair, please excuse my hyperbole :)

But as a serious question, if the Ankiel pitching situation were to come up now, would you expect him to pitch to his Marcel after those two fiascos, or like Nuke LaLoosh?


#28    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 17:33

Guy #26:

Actually, I was thinking about this sort of thing recently when Manny Ramirez was hot.  He hit two homers in a game, and on his third AB ripped a single to left where his swing looked particularly locked in on the ball.  Believe it or not, I thought of some of the clutch hitting discussions, and discussions about hot hitters and the like, and I began to think about whether the visual evidence I just saw would be enough to bet on. 

I even hatched an idea in my head about proposing a bet to any takers: as a straight-up bet, I would bet them that a certain player would get a hit in a certain plate appearance.  I know MGL and lots of others don’t believe in clutch hitting, or hot streaks, or hitter vs. pitcher matchups, or anything like that, so I would expect he or someone of like mind might take that bet.  To win that bet, I would need to convince myself that the positive factors I can select for (pitcher matchup, ballpark, weather, perceived “hot-streak” of the hitter, etc.) would be enough to tip a hitter over the .500 mark, at which point it would be a worthwhile bet.

The fact is, I never convinced myself that I could do it.  I really doubt it can be done by anyone to that degree (i.e. enough to make it an even bet), and thus I would never propose such a bet.  And I think a purely statistical formula is unlikely to be able to discern the talent level changes that typically happen over that short of a time frame…

That said, I think Ankiel’s meltdown was unique in my experience, and not simply a tail-case of poor performance.  A one-of-a-kind immolation.  Besides, my judgment was not based only on his stats; if it had been, I would not have felt as strongly about it (nor would most people, I suspect, though it would be hard to test that: you’d need to find someone who didn’t know what Ankiel did and never saw the repeated wild pitches to the backstop).  Lots of people have had back-to-back terrible outings with poor control - what differentiated Ankiel to me was the degree of wildness, and the apparent loss of confidence, etc…

Manny didn’t get any more hits that game, by the way…


#29    MGL      (see all posts) 2008/05/02 (Fri) @ 17:39

Greg, I would not have bet with you about Ankiel.  In fact, when they brought him back up, after he was in the minors for a while, I told the Cardinals that that was a mistake, as pitchers do not often recover from the “disease” that Ankiel had.  Not to mention that he was still walking a million batters in the minors.

All of the “just go with the projection” arguments include the qualification or caveat, unless there is a known issue, such as an injury, mental (as in Ankiel) or physical, change in velocity, for whatever reason, new pitch or pitches, etc.  And then, you are on your own (and I defer to the scouts) in making a projection, or adjusting the Marcel projection.

The whole point, is that absent any of those types of indications we KNOW how to project a player who has had any kind of spate of greatness or stinkiness we can imagine.  All we have to do is look at history and see what happens.

That does not mean that that model applies to all pitchers, but we have to use it unless we know something about a particular pitcher that makes that model not apply.  And once we know something about that pitcher which makes the model not apply, we can probably go back in history again, and come up with another model THAT WORKS for that kind of pitcher.  For example, once we make some more progress with the pitch f/x data, we will be able to figure out how a change in velocity affects a pitcher’s components and his overall effectiveness, and then we can incorporate that into our model!

What you don’t seem to understand, Greg, is that these models (like a simple Marcel) assume that a player’s talent changes every second of every day. Or at least that there is a chance that it changes and a chance that it stays the same.  For example, if a pitcher pitches really lousy for one month, the model that we use to project his performance already includes the chances that he is injured and that is why he changed his true talent, the chances that he is old, and that is why his true talent changed, the chances that his true talent did not really change much, and he just got unlucky, etc.  The model already takes ALL of those things into consideration because the model was built on what happened FOR REAL.  And as long as the model was built on sound principles and was based on a large enough data base, it HAS to work, by definition.  And conversely, using any other model, has to NOT work.

That is why I did that little study about pitchers with hot and cold starts.  To show that the basic Marcel model works even for pitchers with anomalous starts to the season.  The study also suggests that PERHAPS there are a few subsets of pitchers in which more (never THAT much more) weight must be given to recent performance, but again, those models are based on what actually happened, so by definition, they have to be correct.

If anyone believes in hot and cold streaks (based on stats), just go back in time and classify any hitter or pitcher as being in a hot or cold streak and then look at what happens.  You get your answer.  If anyone believes that a pitcher or hitter who starts out the season very, very hot will perform at a level after that higher than a basic Marcel would presume, just go back in history, punch in the right numbers, and voila, you have your answer!  Same with batter/pitcher matchups.  If you are watching a game and think that a certain hitter owns a certain pitcher (or vice versa) because he has KILLED him in the past, just go back in history, etc., and you have your answer.  There is never a need to debate these kinds of issues or get bogged down in semantics.  Getting bogged down in semantics only creates arguments where none should exist.  For example, you (Greg) think that Marcel says that no player reaches a new level of performance.  I think that is what you think. That is 100% incorrect.  Marcel or whatever the best projection model is, just projects a player’s performance based on all his past performance AND knowing the likely spread of talent in the population (that is the Bayesian part of the model, which is super important), and based on the chance that he is at ANY possible level of performance.  It is agnostic as far as whether his true talent has changed or not, because it does not have to worry about that mathematically.  It simply computes a player’s chances of being at any possible level of performance.  We KNOW of course, that some players change their true talent levels for whatever reasons, significantly, some don’t, everyone’s changes with age, injury, and everything in between and around.  But those things are irrelevant to our models, if only because they are implicitly INCLUDED in the models.

As it turns out, when a player has a banner year, we can still use a basic Marcel to project his performance.  How do we know that?  We went back in history and looked at players who had banner years, and found that a basic Marcel worked.

It didn’t HAVE TO work.  It was entirely possible that it wouldn’t have worked.  In fact, it wouldn’t have worked if players changed their true talents A LOT.  If players changed their true talents A LOT, then a banner year would be strongly indicative of a significant change in talent level, and a Marcel would not have worked.

Given that a Marcel works, we can say with confidence that players DO NOT, as a general rule, change their talent levels significantly.  When we say, as a general rule, we mean, of all players who have banner years or banner time periods.  It is possible, and even probable, that there are SOME players who have changed their talent levels significantly (like Ankiel, Ozzie Smith, Big Papi), but we CANNOT tell just from looking at banner time periods.  If we could, then Marcel would not work for players with banner years, but it does.
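
The core idea in this comment, weighting a hot stretch against what is known about the whole population, can be illustrated with a toy regression-toward-the-mean calculation. This is only a sketch: the function name, the .330 population mean, and the `prior_weight` of 200 are hypothetical values chosen for illustration and are not the actual Marcel weights or regression amounts.

```python
def regress_to_mean(observed_rate, sample_size, population_mean, prior_weight):
    """Blend an observed rate with the population mean.

    prior_weight is a hypothetical constant: the number of league-average
    trials mixed into the sample. It is NOT the real Marcel regression amount.
    """
    total = observed_rate * sample_size + population_mean * prior_weight
    return total / (sample_size + prior_weight)


# Hypothetical player: a .400 rate observed over a short and a long sample,
# in a population whose mean is assumed to be .330.
hot_start = regress_to_mean(0.400, 50, 0.330, 200)
full_season = regress_to_mean(0.400, 600, 0.330, 200)

print(f"estimate after 50 PA:  {hot_start:.3f}")
print(f"estimate after 600 PA: {full_season:.3f}")
```

The short sample barely moves the estimate off the population mean, while the long one moves it most of the way, which is the behavior the comment describes without any need to decide whether the player's talent "really" changed.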


#30    MGL      (see all posts) 2008/05/02 (Fri) @ 17:45

Who was it that chose to link this thread to BTF?  While there are some interesting discussions in this thread, the original post was NOT one of my better criticisms.  As I said in my reply to Rob, I was unduly harsh and was mostly taking out my frustrations with the MSM on him.

His positions were really not out of line, especially the assertion that Cliff Lee is now an above-average pitcher.

And of course, Rob is the epitome of the saber-friendly, reasoned, and intelligent MS writer on the web (and in print).

Even his response to my out-of-line post was quite civil and respectful.

I didn’t read the BTF thread, but I imagine that it will evolve into everything that Bissinger hates about the blogosphere.

Whoever linked to this thread on BTF, you probably didn’t mean it, but that was a bait job, if you know what I mean!


#31    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 17:46

By the way, I don’t really believe in all those things I listed.  Here are my short positions:

Clutch Hitting: Doubt it exists.  In fact, my definition is different from most people’s: to me, clutch is performing consistently under all conditions, not upping one’s game under pressure, because that means doing worse in mundane spots, which is bad IMO.

Hot & cold streaks: Obviously they exist, but caused by changes in the players or just from chance?  I think the player can be the cause sometimes, as a result of fluctuations in a player’s health.  I know random chance can create hot and cold streaks, but I am not convinced that just because random chance CAN explain something means that it is the ONLY root cause of something.

Weather: obviously, a tail wind can turn some long flyouts into homers, but also some bloopers into catches.  I studied Wrigley Field a while back and found a difference of a couple runs per game when the wind was blowing out vs. in, so this one’s real, at least in some parks.

Hitter-Pitcher matchups: I believe they exist for some pitchers paired with some hitters, and the number is probably quite small.  Mainly influenced by some personal experience, playing ball growing up, I encountered a couple pitchers whose delivery I simply could not pick up, and I flailed against them while my teammates were pretty much normal as far as I could tell.  I also played against a couple pitchers whose delivery I saw perfectly, and I hammered them, while my teammates didn’t do anything at all similar.  Considering this only happened a short handful of times in almost 10 years of play, I think it’s rare.


#32    Greg Rybarczyk      (see all posts) 2008/05/02 (Fri) @ 18:01

MGL, I think with your help I’ve more or less arrived at the idea that the Marcels (and similar systems) are the best long-term projection methods, and short-term, they of course may differ more - sort of analogous to a long-range weather forecast derived from actual weather data.  You’ve done a good job educating and convincing me…

One question, though, about the hitter-pitcher matchups issue.  In The Book, IIRC, you demonstrated this effect to be absent by looking at groups of players who hit a certain pitcher over a certain period, and then the group’s performance in another adjacent time period.  Hope I’m remembering this right.  The lack of a persistent effect (i.e. the group of guys who owned pitchers didn’t continue to collectively own them) was the evidence, IIRC.

My question is, during your research for this, did you always look at the data in groups?  I understand why you did, and agree with the approach, btw.

If not, did you ever find a player who owned a pitcher long-term? Of course, with lots of players in MLB, you would expect this to happen a few times, so to refine the question, did you ever find that happening at a higher than expected rate?  I’m just sort of wondering if the effect might be rare, but significant in some individuals - essentially a very uncommon interaction between players…


#33    Dackle      (see all posts) 2008/05/05 (Mon) @ 03:52

Greg, they took a list of 30 hitters who “owned” particular pitchers over a three-year period, and then checked what happened after the fact. For example, Miguel Tejada was 8-for-16 vs Jamie Moyer with a wOBA of .664 in 1999 through 2001. Next year Tejada had only three singles and a double in 20 PAs vs Moyer, for a wOBA of .197. This appears to be the rule, not the exception.

The issue I think is that you can always find great batter/pitcher matchups, or hot streaks, or whatever in short-term data, but this is always with the benefit of hindsight. But for the information to be useful, it has to be interpreted in a forward-looking way. You can’t say—here are 100 players with career averages above .500 vs particular pitchers in 40 or more at bats, the difference is significant, therefore some hitters own some pitchers, therefore I am going to pencil this 8-for-16 hitter into the lineup. Or, the As won 20 straight games, therefore hot streaks exist, therefore I am going to bet on the Twins tonight because they have won five straight. The only thing you can do is find all of the hitters who went 8-for-16, or all of the teams that won five straight, and find out what their performance was in the next four at bats or the next game or whatever. Whether hot streaks exist or not is irrelevant. Maybe the Twins are in the midst of a 20-game winning streak right now, and if it turns out they are, then in hindsight we can look back and say “yes, they were on a hot streak”. But we don’t know that at the moment—we only know they’ve won five straight, and the way to use this information is not to argue that hot streaks exist, it is to dig up all of the five-game winning streaks in history and find out what happened in the sixth game. You will find that the answer is: “not much”.  You are much better off considering short-term performance information within the context of longer-term data. If you assume that a team has a true winning percentage of .550 based on a 110-90 record in its last 200 games (and yes, 200 games will give you a better estimate than the last 100 or 50 or 20 games), then a 8-2 hot streak means it’s a 118-92 (.562) team (and even that’s going a bit far. They’re probably .553). That’s all the 8-2 streak really means.
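
The pooling arithmetic in the paragraph above can be sketched as follows. The function name is made up for illustration; the records (110-90 and 8-2) are the ones from the comment. The further shrink to roughly .553 is not reproduced here, because the comment does not say how that figure is derived.

```python
def pooled_win_pct(prior_wins, prior_losses, streak_wins, streak_losses):
    """Winning percentage after folding a recent streak into a longer record."""
    wins = prior_wins + streak_wins
    games = wins + prior_losses + streak_losses
    return wins / games


before = pooled_win_pct(110, 90, 0, 0)   # the 200-game baseline
after = pooled_win_pct(110, 90, 8, 2)    # baseline plus the 8-2 streak

print(f"before the streak: {before:.3f}")
print(f"after the streak:  {after:.3f}")
```

An 8-2 run looks like .800 baseball on its own, but folded into 200 prior games it moves the estimate by only about twelve points.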

Also, on a related topic, when you’re talking about hurdles of statistical significance, I think you have to consider that even a stringent 1% confidence level spread out over 700 or so players in the league is going to mean that on average, 700 x 1%, or seven players, are going to surpass your test for significance strictly by chance. So if you apply that 1% confidence level to batter/pitcher matchups, and you use the binomial theorem to compute that a particular matchup could only occur by chance 0.5% of the time, you have to remember that there are a lot of players in the league and a lot of potential batter/pitcher matchups, the sheer quantity of which is going to result in a few extreme results (eg Moyer vs Tejada).
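
A rough sketch of the multiple-comparisons point above. The function names are hypothetical, the tests are assumed to be independent, and the .285 hitter going 8-for-16 is borrowed from elsewhere in this thread purely as an example input.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Average number of tests that clear the threshold by chance alone."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance that at least one test clears the threshold (assumes independence)."""
    return 1 - (1 - alpha) ** n_tests


def binomial_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


print(expected_false_positives(700, 0.01))   # the "700 x 1%" from the comment
print(prob_at_least_one(700, 0.01))
print(binomial_tail(8, 16, 0.285))           # a .285 hitter getting 8+ hits in 16 AB
```

With hundreds of players and far more batter/pitcher pairs, a result that is individually unlikely is close to certain to appear somewhere in the league.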

To be even more of a wet blanket at the hot streaks party, even a significance level of 0.01% does not necessarily “mean” that something has changed. The odds of a 20-game winning streak for a .500 team are 1 in 1,048,576. For a .600 team they are 1 in 27,351. In a 30-team league, a .600 team should win 20 straight games once every 911 years. Another example from the stock market. A daily return in the S&P 500 that is more than four standard deviations from the mean should occur once every 50 years. One that is five standard deviations away is expected once every 7,000 years. The return for the S&P 500 on October 19, 1987 was 20 standard deviations away from the mean daily return! Clearly that must’ve meant that the markets had changed, and yet ... two days later the S&P was up 15%, and by October 20, 1988 it had made back all of the losses incurred on Black Monday a year earlier. Very odd combinations of events can suddenly flare up—20-game winning streaks, 20% one-day stock market crashes—and the explanation is not that the Oakland As or the S&P have changed. It’s simply due to the randomness/chaos of the world throwing together some unlikely combinations.
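
The streak odds quoted above follow from raising the single-game win probability to the 20th power, assuming each game is independent. A minimal sketch (the function name is illustrative only):

```python
def one_in(win_prob, streak_length):
    """Return N such that a specific run of `streak_length` wins is a 1-in-N event.

    Assumes every game is independent with the same win probability.
    """
    return 1 / win_prob**streak_length


even_team = one_in(0.5, 20)
good_team = one_in(0.6, 20)
per_league = good_team / 30   # spread across 30 teams, as in the comment

print(f".500 team: 1 in {even_team:,.0f}")
print(f".600 team: 1 in {good_team:,.0f}")
print(f"divided by 30 teams: about {per_league:,.0f}")
```

Dividing by 30 teams gives roughly the 911 quoted in the comment; how that quotient maps onto calendar years depends on assumptions the comment does not spell out.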


#34    Greg Rybarczyk      (see all posts) 2008/05/05 (Mon) @ 12:46

Thanks for the thoughts, Dackle, and certainly you’re right in what you’re saying, but I’ll go back to something I said in an earlier post: just because something can be explained by random chance does not mean it happened because of random chance.  Simple example: just because it is *possible* for a poker player to get a full house every time he deals, doesn’t mean that the guy opposite you isn’t cheating.  Or in baseball terms, just because chance predicts that some players will appear to hit a certain pitcher better than expected even if their true talent against every pitcher is identical, doesn’t mean that there is no hitter-pitcher interaction.  It just means you don’t know.

To be clear and perhaps forestall some rebuttals, I agree that if such interactions are rare and comparatively weak, then you are better off ignoring hitter-pitcher stats than you would be to trust them, because if you followed those matchup stats, most of the time you’d be wrong.  I think this is what you’re saying, rather than “hitter-pitcher matchup differences absolutely do not exist”.

However, why I keep coming back to this is I just don’t think this should be the final word: if an effect exists (or we think it might), we ought to keep looking for other ways to find it, and in this case those ways might involve non-statistical methods.  The value of knowing a true matchup advantage would be very high; I think it’s worth looking for.

By the way, a couple things I’ve read by Tony Gwynn:

1.  He claims to have never hit a John Tudor curveball in fair territory (and also claims to be very glad Tudor never seems to have figured that out).

2.  Gwynn claims to have never hit any pitcher’s splitter for a home run.

Things like #1 make me believe in the existence of hitter-pitcher matchup differences.  Things like #2 make me believe that such differences, where they exist, will probably be significant in their effect (even though Gwynn didn’t hit a lot of homers anyway)…


#35    Tangotiger      (see all posts) 2008/05/05 (Mon) @ 13:02

As long as humans are involved, everything can be explained as not random to some extent.  The question is always “to what extent?”.  And, the related question is: “Can we identify the traits of those parameters?”

An enormous amount of the pitcher-batter confrontation can be explained by easily identifiable traits (the quality of the players, the handedness of the players, the GB/FB tendencies, the park, the climate, the game state i.e. inning/score/base/out, the on-deck hitter).

Once you’ve identified those parameters and their behaviours, you can model a pitcher-batter matchup without regard to any other parameters, and the expected outcome of the events will follow a random distribution around that mean.  It will actually be a little wider, because there are some things we still haven’t identified (e.g., pitch types thrown, and effectiveness to mix them, throw them, and the opposing hitter to hit them).  Clearly, these things matter.  Once you identify them, you include them in the model.

The important point, however, is that it is not necessary to do all this.  Even just sticking with the quality of the batter/pitcher and their handedness should capture most of the variance.  Not all, but it’s never “all”.


#36    Dackle      (see all posts) 2008/05/05 (Mon) @ 16:58

Greg, you’re right, but for the information on the Tudor/Gwynn matchup to be useful, it has to be available in real time (when you are filling out tonight’s lineup), not after the careers of both players are over. If you want to say there is an effect between the two players, or if you want to say there is no effect, I’m fine with both, because it doesn’t matter either way. The only thing that matters is the information we have at hand at the moment, making a decision right now, regarding a game being played in the near future. And generally, the only player/pitcher interaction information we do have at hand is the result of the historical matchup between the two players. And so if a .285 hitter is 8-for-16 lifetime against a league average pitcher, and we know that all .285 hitters in history with an 8-for-16 against a particular pitcher have batted .287 in their subsequent at-bats against him (adjusting for context, pitchFX information etc), then we have to live with that and use .287 as our best guess, because it’s the only information we have on hand at the time we are making the decision. There’s no way that an active player would let it be known that he can’t hit a particular pitcher’s curveball, because the information would be rapidly exploited both by the pitcher and by the rest of the league. It probably already is via scouting reports, video review and hitting coaches etc. Then the hitter and pitcher both adjust (at a time the fan wouldn’t be aware of) and the effect has washed out.

That brings up another issue really, that the batter/pitcher matchup isn’t necessarily set in stone from the beginning of their careers. It’s forever evolving, and both players are constantly adjusting to try and “win the battle”. After 10 or 15 PAs, I’m sure it is fairly common that pitchers finally realize why they can’t get particular hitters out, and then for the next 10-15 PAs, the hitter struggles to figure out why he can’t hit that pitcher any longer, and so forth. The data in The Book suggest that this is the case (eg Tejada/Moyer at the top of post 33), although it is attributed to regression to the mean.

Bottom line is—what is the best interpretation of the available information at the moment the decision is being made? And if the only information at hand is “8-for-16” (as I’m sure it was during the Earl Weaver days), then you have to interpret the meaning of that “8-for-16” as outlined above.


#37    Greg Rybarczyk      (see all posts) 2008/05/05 (Mon) @ 18:59

Dackle, no doubt we are in agreement here. 

Let me just throw one more thing out there, though.

Suppose you’ve got two guys to choose from in the bullpen, with similar overall stats (for the sake of argument, let’s say identical stats, and they throw with the same hand).  Who do you bring in to face the opposing team’s star hitter?  Pitcher A, who has “owned” the hitter so far in his career, or Pitcher B, who has been “owned” BY the hitter?

Stats say no difference, right?  But what of the psychological difference?  If you choose the pitcher who has been the owner of the matchup, the pitcher comes in with more confidence, the hitter with less (assuming of course, that there is no real difference, just a statistical aberration, that has led to the difference in matchup stats) (and assuming the players have some awareness of the matchup stats). 

Is this *potential* factor enough to act on?  If so, how much is it worth?  What if Pitcher A isn’t quite as good as Pitcher B, but has better stats against the hitter - how much of a statistical edge would you “forego” in order to get the psychological difference?

Wondering what you guys all think…

This all leaves aside the very real “manager wants to keep his job” imperative that would weigh in favor of Pitcher A - I’m not itching to discuss that, I think we all know that’s a real influence on decision-making…


#38    tangotiger      (see all posts) 2008/05/05 (Mon) @ 19:28

This is the same as “clutch” hitting.  All other things equal, choose whatever you want as the tiebreaker, be it matchups, clutch, or lucky underwear.

The real test is when one is inferior to the other: do the “intangibles” outweigh the gap?

My Clutch project suggests that people will allow around a 20 wOBA point gap to be made up because of Clutch.  That would be my rough allowance for someone to play his hunches, somewhere between 10 and 20 points.

A 12 wOBA difference is roughly .001 wins per PA (when LI = 1.0).  That’s what I use as a “go with guts” and I don’t bother questioning the move.


#39    Dackle      (see all posts) 2008/05/06 (Tue) @ 02:49

I think I’d go with Pitcher A, for two reasons: (1) there’s enough random variance around the expected average for each pitcher (eg in 500 at-bats, a true .260 hitter will bat between .240 and .280 68% of the time) that you can’t really say for sure whether A is better than or equal to B, which opens up a bit of leeway for your gut instinct, and (2) I’m willing to admit that say an 8-for-10 performance vs a particular pitcher might be worth a point or two of batting average, thus tipping the scales a bit in favour of pitcher A.
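
The ".240 to .280, 68% of the time" figure in point (1) is one binomial standard deviation either side of .260. A quick check, treating each at-bat as an independent trial with a fixed hit probability (a simplifying assumption; the function name is illustrative):

```python
from math import sqrt


def batting_avg_sd(true_avg, at_bats):
    """Standard deviation of an observed average under a binomial model."""
    return sqrt(true_avg * (1 - true_avg) / at_bats)


sd = batting_avg_sd(0.260, 500)
low, high = 0.260 - sd, 0.260 + sd

print(f"one SD: {sd:.4f}")
print(f"about 68% of seasons fall between {low:.3f} and {high:.3f}")
```

The spread is about 20 points in each direction, so two relievers whose numbers differ by a few points cannot be separated on that evidence alone.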

But I would never bench a .327/.429/.568 hitter (eg Pujols) in favour of a .269/.354/.386 hitter (Spiezio) just because Albert is 1-for-15 lifetime against the pitcher and Scott is 10-for-22. Scott Spiezio went 10-for-22 last year between June 16 and June 23, and then went 1 for his next 10. Albert Pujols went 1-for-15 last year between May 11 and 14 and then hit .489 over his next 11 games! It would be silly to bench him in favour of Spiezio after the 1-for-15 and yet managers do that kind of thing frequently in the context of batter/pitcher matchups.

Beyond the numbers, I think you’re still bringing information to the table that you don’t have at the time the decision is being made, namely: (1) that you know the psychological makeup of the hitter and pitcher in question based on their prior matchups, and (2) that you can infer how this psychological makeup will translate into performance on the field. I really don’t think it’s possible to know these things, and absent a psychoanalyst sitting in each team’s dugout, we have to fall back on performance and scouting data. And how do we infer what an “8-for-16” means? By looking back through history at all of the prior 8-for-16s, and checking out what happened in at-bat #17. Based on that approach, the authors of The Book concluded that “having 20-30 PA against an opponent is a drop in the bucket, and it tells you almost nothing about what to expect. The player has a long history, say 1,500 PA, against the rest of the league. Any way you slice it, you can’t equate, or even compare, 25 PA against one opponent to 1,500 PA against the rest of the league.”

Greg, I think you’re saying that if one out of 10 matchups where the batter owned the pitcher is genuine, then we owe it to ourselves to look for that diamond in the rough. But there’s a “search cost” to sticking inferior players in the lineup (or on the mound) who have good matchup stats—you’re going to get it wrong nine times to get it right once. Better to get it right nine times and accept the occasional failure.


#40    Greg Rybarczyk      (see all posts) 2008/05/06 (Tue) @ 14:10

Dackle, not that kind of search!  It would be nuts to search for something by running guys out to the batter’s box or mound…

I mean a) to look for evidence that’s already in the record, or b) look for observational type data, as opposed to results type data.  The sample-size phenomenon is against you if you’re going to limit yourself to box score results.  But observational data might tell you, for example, that one guy’s 8 for 10 was hard-hit frozen ropes from line to line, while another’s was a few bloops and a couple questionably scored infield hits… expanding from discrete to continuous data can offer some hope of discerning factors that are otherwise too subtle…


#41    MGL      (see all posts) 2008/05/06 (Tue) @ 14:32

Greg, I think you’re saying that if one out of 10 matchups where the batter owned the pitcher is genuine

Obviously, it depends on what you mean by genuine - no one thinks that owning a pitcher means that you will hit 50 points more against him than expected.

But, once we look at everyone in aggregate and find NO effect, that means, by definition, other than the confidence interval of our findings (which depends on the sample size of the “aggregate” we looked at), that there are NOT 1 in 10 who truly own one or the other, and that there is NOT a significant psychological factor.  If there were, then it would show up in the aggregate. IOW, there is nothing to find!  You cannot say that, “Yeah, you found nothing (batter/pitcher matchups, clutch, protection, etc.) in your (solid) research, but there must be a psychological factor that you overlooked.” If the psychological factor existed (to any significant degree), it WOULD HAVE shown up in the performance; otherwise, of what value is it?

Here is the way it works: If you propose that there is some effect, say, a clutch effect, and you test it on a large group of players, and you find NO evidence of such an effect, what that means, again, other than making a Type I or II error (let’s assume that your sample is so large that the 99% confidence interval around your result is infinitesimal), is that there may be a tiny effect among a few players or a large effect among a microscopic number of players.  There is no other option.  There is NOT going to be a large effect in 1 out of 10 players or even 1 out of 100. If there were, it would have shown up as a small effect in your whole sample.  The idea that you MUST HAVE missed it (because your sample was not large enough) is a silly one.  That is not the way science works.  If we don’t find something in a large sample, there is a SMALL chance (sometimes VERY small), by definition, that we missed it. The idea that we MUST have missed it is just plain wrong.  And as I say at the end of this post, I don’t care about the 5% or 10% chance that I missed it!  I’ve got hundreds or thousands of decisions to make.  If I am wrong about some small percentage, which I will be, that’s just fine!
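
The "it would have shown up as a small effect in your whole sample" step can be made concrete with a back-of-the-envelope calculation. Everything here is hypothetical: the 1-in-10 share, the 50-point effect, the .270 baseline and the 100,000 PA sample are illustrative inputs, and the outcome is modeled as a simple binomial rate rather than the wOBA used in The Book.

```python
from math import sqrt


def aggregate_shift(fraction_affected, effect_size):
    """Average shift in the whole group if only a fraction carries the effect."""
    return fraction_affected * effect_size


def standard_error(rate, total_pa):
    """Binomial standard error of a rate measured over total_pa trials."""
    return sqrt(rate * (1 - rate) / total_pa)


shift = aggregate_shift(0.10, 0.050)   # 1 in 10 matchups carry a 50-point effect
se = standard_error(0.270, 100_000)    # hypothetical aggregate sample size
z = shift / se

print(f"aggregate shift: {shift:.4f}")
print(f"standard error:  {se:.4f}")
print(f"shift is about {z:.1f} standard errors from zero")
```

Under these made-up inputs, a large effect confined to a tenth of the matchups would still move the group average by several standard errors, which is why finding nothing in a big aggregate sample leaves room only for tiny or extremely rare effects.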

Greg, you can’t say that I’ll bring in the pitcher who has owned the batter because there is a psychological effect. What good is the psychological effect if it doesn’t show up in the performance?  We have already shown that it DOESN’T show up in the performance, so therefore, we have already shown that there is no psychological effect, at least as far as it affects performance.  Your argument makes no sense.  Your argument goes like this, “You find no predictive value for batter/pitcher matchups, but I will use them anyway, because there must be psychological advantage.” My retort is, “Well, I just said, there is no predictive value.  Which words don’t you understand?  Predictive INCLUDES psychological, physical, mental, emotional, and spiritual.” (I am not trying to be snarky with you. I am just illustrating a point).

The ONLY reason Tango says, “use these things as tiebreakers” is that one, for some of these things, like clutch, we DO find a small effect, and two, there is some uncertainty in the findings, so you can’t go wrong when using them as tie-breakers.  That’s it.

Here’s the whole deal with ALL of these strategy things that people criticize, as I started to say above:  Yes, we are going to be wrong on some of them, for various reasons.  But there are hundreds, if not thousands, of strategy decisions to make in the course of the season.  If we use “the computer” for all of them, we will be right (no one will know whether we were right or wrong) with 95% of them (or 90% of them, or whatever), and we will be wrong with 5% of them.  We have no idea which ones we were right on and which ones we were wrong on, but with a 95/5 or 90/10 or 80/20 record we are going to be MUCH better off than some manager using “experience”, “conventional wisdom,” or, “intuition” in making these same decisions.


#42    Greg Rybarczyk      (see all posts) 2008/05/06 (Tue) @ 15:45

MGL:

Let me start by saying I appreciate the discussion.

A few points:

1.  I don’t have The Book in front of me: when you say you looked at the aggregate for hitter-pitcher matchups and found “no” effect, what was the p-value or the statistic that you got from the comparison?  Your later point that a small effect would be visible in a large sample, I can agree with, hence the question about how large of a p-value or what value of your test statistic did you get?

2.  I don’t think I ever said there MUST be an effect, what I said (or think I said) is that I suspect there is an effect for a small number of combinations, primarily based on personal experience.  This of course would be hard to detect, and as you said, perhaps not worth the effort considering how few situations it might affect.  No argument there.

3.  Regarding the psychological issue, if I understand your statement correctly, any psychological effect must be already baked into the numbers.  Here I disagree in the specific instance of a hitter owning a pitcher or vice versa.  I am saying that after some period of “ownage”, the winning one of the two might feel more confident, and the losing one might feel less confident.  This obviously would not be the case while this “ownage” was being established (whether by a real or statistical effect, it doesn’t really matter): if the hitter batted safely against the pitcher the first time they faced each other, that psychological effect would not be there, yet, and thus could not be said to be baked into the results, yet.  Only later, after the “ownage” continued, would the potential psychological effect grow large enough to maybe matter to one or both of the people involved.

An analogy would be that no one complains about the rain in Oregon the first day it comes down in October.  It’s only after it’s been happening day after day for quite a while that the psychological effect of missing the sun can be said to exist. 

So, if we are looking at the next at-bat between a hitter and a pitcher he owns, I think the confidence factor could exist, where earlier in their careers it didn’t, and thus didn’t/couldn’t manifest in their matchup stats.

Overall, I will grant that the likelihood is that the effect, if any, is small, and perhaps not worth the fuss, but nevertheless I’m not tossing it into the “Solved” bin yet.  I don’t mind if you do, though, we all have just so much time to devote to these things…

I agree completely with your last two paragraphs. 

Greg


#43    MGL      (see all posts) 2008/05/06 (Tue) @ 20:34

Only later, after the “ownage” continued, would the potential psychological effect grow large enough to maybe matter to one or both of the people involved.

Yes, but we ALREADY looked at what happens after a large number of PA between pitchers and batters and we FOUND no predictive value, so we presumably found NO psychological advantage.  How far out do you want to take it?  If a pitcher and batter faced each other 200 times, and one owned the other then maybe there is some predictive value? 

OK, fine.  Good luck finding that situation to exploit.

So why do you keep insisting that there might be one, other than there MIGHT be ANYTHING that we missed due to a Type II error?

You can say that about anything in science where sample data was tested, including every medical test that has ever been done to determine the effectiveness of a drug.  “Yeah, you found no difference between the placebo and the drug, but there might be one you missed.” No shizit!  (Which is why Bill James’ “Underestimating the Fog” article, or whatever it was called, was so silly.)

We test things using sample data.  We come up with conclusions based on an analysis of that sample data. Some of our conclusions are going to be wrong because it IS sample data, and we will make Type I and II errors, the frequency of which is determined by the size of the samples (and the tests used of course).  So what?

No, I don’t know the confidence interval of the results that were found with the batter/pitcher matchup data.  Tango did the research.  He’ll have to answer that.  And frankly I don’t care (much).  As I said, we (all scientists) are ALWAYS going to make “statistical” mistakes (Type I and II errors).  So what?  It is nice if we can tell people the confidence, certainty, or reliability of the results we get.  Sometimes we do and sometimes we don’t.  We probably always should.  But it doesn’t change the conclusions.  We can never say with 100% certainty ANYTHING when dealing with sample data.  But we can say with 90%, 95%, or 99%, or whatever the case may be.

And in most cases in baseball, it is NOT an either/or thing and it DOES NOT HAVE TO BE.  IOW, when we find evidence of a very small clutch ability, it goes without saying that there is a strong likelihood that a small clutch ability exists, AND there is some likelihood (small) that NO clutch ability exists (and we made a Type I error), AND there is some likelihood that a clutch ability larger than the one we came up with exists (and we made a Type II error), AND there is a very small likelihood that a clutch ability MUCH larger than the one we came up with exists (a very large Type II error), etc, etc, etc.  That goes with or without saying with all of these things, including batter/pitcher matchups.  Just because we did not find any predictive value in ANY batter/pitcher matchup does NOT mean that none exists to a 100% CERTAINTY, so there is NO POINT in stating, “You might be wrong, what about X?” We ALREADY know we might be wrong!  You don’t have to tell us!  But, that DOES mean that that is still the best conclusion we can draw given the information we have.  Heck, even if our results were only significant to the .5 sigma level, it would still be the BEST conclusion to draw, albeit we are not as certain of our conclusion.

The other thing is that if you increase the sample size to infinity, with all of these things, there is a high degree of probability that you will approach 100% predictability.

Remember that even if there is one tiny, slim, iota of “talent” or “effect” inherent in something, and it is so small that for all practical purposes it does not exist, if we have a large enough sample, that effect will have 100% predictability!

That is almost certainly true with batter/pitcher matchups, clutch, and other things which I can’t think of off the top of my head.
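
The point above, that the frequency of Type II errors depends on sample size and that even a tiny effect becomes detectable with enough data, can be illustrated with a rough power calculation. This is a sketch under stated assumptions, not the method used in The Book: it uses a one-sided z-test with a normal approximation, and the baseline rate of .330 and true rate of .335 are hypothetical numbers chosen only to represent a "tiny" effect.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_one_sided(p0, p1, n, z_alpha=1.645):
    """Approximate power of a one-sided z-test of H0: p = p0
    when the true rate is p1 > p0 (normal approximation).

    z_alpha = 1.645 corresponds to a 5% Type I error rate.
    """
    se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    se1 = math.sqrt(p1 * (1 - p1) / n)   # standard error under the true rate
    critical = p0 + z_alpha * se0        # reject H0 above this observed rate
    return 1.0 - normal_cdf((critical - p1) / se1)


# Hypothetical rates: a 5-point effect on a .330 baseline.
P0, P1 = 0.330, 0.335

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    power = power_one_sided(P0, P1, n)
    print(f"n = {n:>9,}: power = {power:.3f}, Type II error rate = {1 - power:.3f}")
```

Under these assumptions the test almost always misses the effect at 100 PA and almost never misses it at a million, which is the sense in which a large enough sample will find an effect too small to matter in practice.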


#44    Buff      (see all posts) 2008/05/13 (Tue) @ 15:56

I missed the part where you addressed the mechanical changes Lee has implemented.  I understand that Lee is ancient and incapable of change, but his pitches really are following different paths than the extreme flyball left-handed Scott Elarton of 2006.  (2007 was largely lost due to injury.)


#45    Tangotiger      (see all posts) 2008/05/13 (Tue) @ 16:22

Buff, I think you’ll appreciate the other recent thread where MGL’s research shows that we can infer something real has changed, based on the K/BB results we are seeing.  Please check those out.

The question on the table is always this: how much of what we are seeing is real.

And, the easiest way to get a gauge as to what we think is to answer this question: If Cliff Lee were a free agent today, how much would you pay him for 3 years?  4 years?


#46    Buff      (see all posts) 2008/05/13 (Tue) @ 16:32

Thanks, Tom, I will look for it.

As a Cleveland fan, I dismiss the possibility of signing a free agent pitcher to a 4-year contract out of hand.


#47    Tangotiger      (see all posts) 2008/05/13 (Tue) @ 16:38

For those who didn’t RTFA at my post 10, I will give you a snippet here:

1. The main issue in the analysis, here or elsewhere, is that if you decide to constrain yourself to only looking at performance data, then [post 7] stands as correct.

2. If you decide to bring in outside data (he was hurt, he had a minor league rehab that went ok, he changed his pitching mechanics or how he mixes up pitches), then this is perfectly legitimate, and really desirable.  How you weight this information is of course the key. 

How much does all this affect the forecast?  Beats me.  I won’t pretend to know.  How does the community process all that information (of that available as of Apr 1, 2008)?  Well, look at community forecasts, fantasy auction bids, and whatnot.  When it came time to actually make a decision where they actually had to put money and thought to it, what did people actually do and say?

The guys who have Cliff Lee on their fantasy teams: what is being offered in trades?  No one is offering Santana. But, what is being offered?  Some #3 starter I suppose?  Were they offered #2 starters (say a .530-.540 pitcher)?

3. Finally, you cannot, simply cannot, use 129 PA to infer that something fundamental has changed just because his performance has been so historic, and then conclude that *something* major must have changed.

We can presume that something has changed, since we have more information (129 more PA).  But, that information has been processed in my point #1.  If someone wants to include even more information (point #2) without, at all, making reference to his ERA, K/BB, or any performance stat already included in point #1, fine.  Please do so.

Does anyone disagree with anything I’ve said here?


#48    Tangotiger      (see all posts) 2008/05/13 (Tue) @ 16:42

Here’s that thread:
http://www.insidethebook.com/ee/index.php/site/comments/what_do_good_and_bad_starts_by_pitchers_tell_us/

Lots of good research in the comments as well.


#49    MGL      (see all posts) 2008/05/13 (Tue) @ 17:44

I was apparently wrong about the weight that must be given to recent performance when a pitcher has an unusually good spate of performance in a current year, at least at the beginning of the year (April).  I have been underestimating the weight that must be given to that performance.

Remember also, that in The Book, I did find some predictive value in hot and cold pitchers (but not batters).

However, I think the jury is still out on these kinds of things with respect to pitchers, at least.

And of course, when we have these types of discussions, we (I, at least) are always referring to the “based on the stats” only. Obviously knowledge of other things, like injuries, change in mechanics, velocity, etc., can significantly change the “equation.”


#50    Tangotiger      (see all posts) 2008/05/14 (Wed) @ 09:52

GB-FB rates may be the most reliable metric we have, all other things equal.  A GB pitcher remains a GB pitcher, and we can establish if someone is a GB pitcher fairly early.  The regression point of 50% is achieved after something like 100 BIP (if not less; don’t quote me).  It tells you a lot.

http://drivelinemechanics.com/2008/05/14/pitcher-analysis-cliff-lee/

Cliff Lee went from being a FB pitcher to a GB pitcher.  This does not happen in a vacuum.  The likely explanation is that something fundamental about him has changed.

So, if a researcher wants to add something else to the mix, look at what happens when someone switches styles (GB to FB or vice versa) and report back the performance results, and how persistent the switch is.
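
The "regression point of 50%" mentioned in #50 can be read as the sample size at which a pitcher's observed rate and the league rate get equal weight. A minimal sketch of that kind of shrinkage follows, assuming the rough figure of 100 BIP from the comment (which was itself hedged) and a hypothetical league ground-ball rate of .440. The function name and the example numbers are illustrative, not taken from any published method.

```python
def regressed_rate(observed_rate, n_bip, league_rate, k=100):
    """Shrink an observed rate toward the league rate.

    k is the number of balls in play at which the observed rate and
    the league rate each get 50% weight (assumed to be about 100 here).
    """
    weight = n_bip / (n_bip + k)
    return weight * observed_rate + (1 - weight) * league_rate


LEAGUE_GB_RATE = 0.440  # hypothetical league average, for illustration

# A pitcher with a .500 observed ground-ball rate at different sample sizes.
for n_bip in (25, 100, 400):
    estimate = regressed_rate(0.500, n_bip, LEAGUE_GB_RATE)
    print(f"{n_bip:>3} BIP: estimated true GB rate = {estimate:.3f}")
```

With k = 100, an observed rate on 100 BIP moves the estimate halfway from the league rate to the observed rate, and the observed rate dominates quickly after that. A stat with a much larger k would need far more data before it told you the same amount.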

