THE BOOK--Playing The Percentages In Baseball


Saturday, April 26, 2008

Small team sample size: Can anyone just write anything they want because they have a “title”?

By , 09:22 PM

ESPN.com’s Jerry Crasnick writes:

Maybe it’s due to maturity. Maybe it’s because they’re being more patient at the plate. But whatever the reason, the D-backs now have one of the most powerful lineups in all of baseball.

The correct quote should be:

Maybe it’s due to maturity. Maybe it’s because they’re being more patient at the plate. But whatever the reason, the D-backs have had, with 13% of the season played, one of the most powerful lineups in all of baseball.  But, as we all know, almost anything can happen in 20 games or so.  If we look at the career stats for all of their players, we can easily see that they have been hitting over their heads, and will likely have one of the least powerful lineups in all of baseball for the remainder of the season, although even that will likely be disguised by the fact that they play half their games in one of the best hitters’ parks in the NL.

I have not read the article itself.  Irresponsible me.


Here are my offensive projections for each NL team’s entire projected roster, prorated by projected playing time, which I steal from BP’s web site (granted, these numbers do not represent the projections for the D-Backs’ “everyday starting lineup”).  This is what we expect each team to do, offense-wise, from now to the end of the season.  The measure is in regular old-fashioned Pete Palmer linear weights.  Notice the D-Backs projection.  Please.  And pass it along to Mr. Crasnick.  The reason the D-Backs have a good team is their pitching (and decent defense and baserunning), which I have as second best in the NL, just behind the Mets.  It is NOT because of their hitting, despite the fact that they have been hitting well (not disgustingly well - 15 runs above average, exactly the same as ATL and ANA), linear weights-wise, so far this season!

ATL 42
MIL 35
PHI 22
FLO 19
CHN 16
NYN 14
WAS 9
LAN 7
SDN 7
SLN 5
ARI -10
CIN -11
COL -11
PIT -20
HOU -25
SFN -102

#1    Vegas Watch      (see all posts) 2008/04/26 (Sat) @ 22:06

The difference between #1 and #15 is 10 runs larger than the difference between #15 and #16.  That is absolutely incredible.


#2    David Gassko      (see all posts) 2008/04/26 (Sat) @ 22:47

You mean 10 runs smaller, which is even more incredible.
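
For anyone who wants to check the two gaps, here is a minimal sketch using the projection table from the post. The name `nl_lwts` is only an illustrative label, not something from the original.

```python
# Projected offensive linear weights (runs vs. average), copied from the post.
nl_lwts = {
    "ATL": 42, "MIL": 35, "PHI": 22, "FLO": 19, "CHN": 16, "NYN": 14,
    "WAS": 9, "LAN": 7, "SDN": 7, "SLN": 5, "ARI": -10, "CIN": -11,
    "COL": -11, "PIT": -20, "HOU": -25, "SFN": -102,
}

# Rank the 16 teams from best to worst projection.
ranked = sorted(nl_lwts.values(), reverse=True)

gap_1_to_15 = ranked[0] - ranked[14]    # ATL down to HOU
gap_15_to_16 = ranked[14] - ranked[15]  # HOU down to SFN

print(gap_1_to_15, gap_15_to_16, gap_15_to_16 - gap_1_to_15)
```

The top-to-15th gap comes out 10 runs smaller than the 15th-to-16th gap, which is the corrected reading.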


#3    MGL      (see all posts) 2008/04/27 (Sun) @ 01:48

SF has to have one of the all-time worst offenses I would think.

Smaller, larger, you both mean the same thing.


#4    tangotiger      (see all posts) 2008/04/27 (Sun) @ 08:16

Are you guys kidding me?  They have clubhouse chemistry now.  Didn’t you read about it in the off-season?  The only other team in the entire country that has more clubhouse chemistry is my kid’s T-ball team.

I will grant the writers one thing: they talked about chemistry BEFORE the season started.  Fantastic.  Exactly what we’ve been asking for, for someone to talk about chemistry before a game is played, and not as a way to explain a 100-win season (or 100-loss season).

Chemistry may exist, but darned if anyone knows what the heck it means.


#5    Vegas Watch      (see all posts) 2008/04/27 (Sun) @ 10:49

Smaller, right.  That’s what I was trying to get at.

At least they got rid of Matt Morris.


#6          (see all posts) 2008/04/27 (Sun) @ 13:19

I believe clubhouse chemistry exists, because I don’t know why baseball players would be exempt from group dynamics. I just don’t see any evidence it translates into wins.

Looking at the Seattle Mariners last season, there were a number of media stories about how the team thought, from spring training on, that they were going to have a special year. Yeah, that didn’t work out so well. I suspect the “team chemistry” in that case just made them that much more resistant to making any changes when problems appeared.

We’ll see what the line is if the fabled chemistry doesn’t lead to the playoffs.


#7    Sky      (see all posts) 2008/04/27 (Sun) @ 14:14

Those projected linear weights seem very tightly bunched to me—70 runs between the best and second-worst offense?  Only 100 runs below average for the Giants?  Maybe my frame of reference is off.

MGL, are your numbers park-adjusted?  That’s another trap mainstream writers fall into.


#8    Eric J. Seidman      (see all posts) 2008/04/27 (Sun) @ 15:12

It’s clearly due to their guitar hero playing.  It’s the only plausible way to explain this start.  Clearly, chemistry. 

Chemistry = 2.5(Guys liking each other)^0.71 - 4(Guys that dislike each other) - BW

BW = Billy Wagner, which debits your team because he’s a “rat.”


#9    Colin Wyers      (see all posts) 2008/04/27 (Sun) @ 15:42

I could be wrong here, Sky, but I think that linear weights might underestimate the difference between the best and worst teams, because linear weights is an estimation of an event’s contribution in a run scoring context - if you’re using the same weights for each team, you’re saying that a single by one of the Giants’ hitters is just as valuable as a single by one of the Braves’ hitters. That’s not true. (This is why James was, in fact, correct that Run Scoring isn’t a linear process.)

I don’t know how much of a difference it makes, but BaseRuns or a Markov chain are probably going to be more accurate here.


#10    MGL      (see all posts) 2008/04/27 (Sun) @ 16:43

Yes, these are park adjusted.  In fact, completely context-neutral.  I take each player’s expected number of PA for the rest of the season, based on BP’s Fantasy playing time projections (it includes minor league players who have not even played in the majors yet), and simply prorate their current lwts projection (updated for current season performance) for those expected PA’s, and add everything up for each team.

If you want to translate my numbers to rpg for each team, simply take those numbers, divide by the number of games left, add that to the league average expected rpg for the remainder of the season (around 4.7 for the NL and 4.9 for the AL), and then park adjust around half of that.

For example, the Giants are -102 with around 142 games left (when I did the numbers). So, that is -.72 runs per game.  Average NL is 4.7.  So 4.7 - .72 is 3.98.  Park run factor for SF is around .96.  So multiply 3.98 by .98 (only half the games at Pac Bell), for 3.90 runs per game rather than 4.7 for the average team.
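
The conversion described above can be sketched in a few lines. This is only an illustration of the stated recipe; the function name `projected_rpg` and its parameter names are made up for the example, and the inputs are the Giants figures quoted in the comment.

```python
def projected_rpg(lwts_runs, games_left, league_rpg, park_run_factor):
    """Turn a context-neutral lwts total into park-adjusted runs per game.

    lwts_runs       -- projected runs above/below average for the remaining games
    games_left      -- number of games remaining
    league_rpg      -- league-average runs per game (about 4.7 NL, 4.9 AL per the comment)
    park_run_factor -- full park run factor for the team's home park
    """
    neutral_rpg = league_rpg + lwts_runs / games_left
    # Only half the games are at home, so apply half of the park effect.
    half_park_factor = (1 + park_run_factor) / 2
    return neutral_rpg * half_park_factor


# Giants example from the comment: -102 runs, ~142 games left, SF factor ~.96.
giants_rpg = projected_rpg(-102, 142, 4.7, 0.96)
print(round(giants_rpg, 2))  # about 3.90
```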

And they are an extreme and rare case.  The fact of the matter is that there is not a large difference in true talent between the best and worst offenses.  What you see during the season, and at the end of the season, of course, in terms of the spread between the best and worst offenses, in runs scored, is talent plus a lot of luck.  In 162 games, I think one SD for rpg is around .2 runs.  That is a lot.  That means that around 1 or 2 teams will score 30 or more runs more or less than they “should” just by chance alone.

And sure, for a team, technically you want context-adjusted or custom lwts, or baseruns, or something like that.  But I think this is close enough.  It wouldn’t make a noticeable difference what you use, I don’t think.

Tango, chemistry is one of those things that even if you said it at the beginning of the season, you’d have a 50/50 chance of being right by chance alone, if by “right” we mean over or under-perform expectations.


#11    Eric J. Seidman      (see all posts) 2008/04/27 (Sun) @ 16:47

MGL, do you have this data for past years too?  If so, I’d be curious to know the last team(s) to fall into the same category as the Giants.


#12    MGL      (see all posts) 2008/04/27 (Sun) @ 16:52

Eric, sorry I don’t.  I suppose you can just look at all teams’ actual rpg in history, as compared to league average of course, do a park adjustment and a regression toward the mean (I don’t know how much for one season), and that should approximate their true talent run scoring.

BTW, in #10 above, of course I meant that 1 or 2 teams will likely score at least 60 runs (2 SD), not 30, more or less than they “should.”
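
A rough check of that correction, as a sketch only. It assumes the chance variation is normally distributed, takes the .2 rpg SD from #10 at face value, and uses 30 MLB teams; the helper name `two_sided_tail` is invented for the example.

```python
import math

TEAMS = 30      # MLB teams
SD_RPG = 0.2    # one SD of chance variation in runs per game over a season (from #10)
GAMES = 162

sd_runs = SD_RPG * GAMES  # one SD expressed in season runs, a little over 32


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z (normality is an assumption here)."""
    return math.erfc(k / math.sqrt(2))


for k in (1, 2):
    expected_teams = TEAMS * two_sided_tail(k)
    print(f"{k} SD = {k * sd_runs:.0f} runs, expected teams beyond it: {expected_teams:.1f}")
```

Under those assumptions, "1 or 2 teams" lines up with the 2 SD (roughly 60 run) threshold rather than the 1 SD (roughly 30 run) one.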


#13    MGL      (see all posts) 2008/04/27 (Sun) @ 16:57

Oh, and BTW, DET comes in at +57 runs, which is the best in baseball (but not even close to ridiculously good, and not nearly as good as recent Yankee teams), but…

57/140 = .41.  4.9+.41 is 5.31.  Park adjust by multiplying by .985 (DET run factor is .97), for 5.23.  5.23 * 162 is 847 runs.

The idea that they could come close to scoring 1000 runs is preposterous.  They would have to score more than 150 runs more than they “should,” which is around 5 SD.
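
The Tigers arithmetic can be reproduced the same way. This is a sketch under assumptions: the .2 rpg SD is the figure from #10, and the tail probability assumes a normal distribution, which the comment does not state. Variable names are illustrative only.

```python
import math

AL_RPG = 4.9          # league-average runs per game used in the comment
lwts_runs = 57        # DET projection, rest of season
games_left = 140
park_run_factor = 0.97

neutral_rpg = AL_RPG + lwts_runs / games_left
adjusted_rpg = neutral_rpg * (1 + park_run_factor) / 2  # half the games at home
season_runs = adjusted_rpg * 162

sd_runs = 0.2 * 162   # one SD of luck in season runs (from #10)
z = (1000 - season_runs) / sd_runs

# One-sided normal tail: chance of finishing z or more SDs above the projection.
p_1000 = 0.5 * math.erfc(z / math.sqrt(2))

print(round(season_runs), round(z, 1), p_1000)
```

The gap to 1000 runs works out to a bit under 5 SD, and under the normality assumption the tail probability is on the order of one in a million.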


#14    tangotiger      (see all posts) 2008/04/27 (Sun) @ 17:25

The team talent is distributed such that 1 true talent SD = .060 wins per game.  Remember that.

That means that offense = defense for 1 SD = .0424 wins per game.

So, that puts the runs per game at around 1 SD = .44.  So, with 140 or so games remaining, 1 SD = 62 runs.

Therefore, MGL’s estimates look tighter than we’d like to see.
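
Here is a sketch of that chain of numbers next to the spread of the NL projections from the post. The runs-per-win figure is not stated in the comment, so the code only reports the value implied by the quoted .0424 and .44. Comparing against the population SD of the 16 NL numbers is my own illustration, not something the comment does.

```python
import math
import statistics

sd_wins_per_game = 0.060                       # 1 true-talent SD, whole team
sd_one_side = sd_wins_per_game / math.sqrt(2)  # offense and defense split equally
sd_rpg = 0.44                                  # quoted runs-per-game SD
implied_runs_per_win = sd_rpg / sd_one_side    # implied by the two quoted figures
sd_runs_remaining = sd_rpg * 140               # over ~140 remaining games

# NL projections from the post (runs vs. average, rest of season).
nl_projections = [42, 35, 22, 19, 16, 14, 9, 7, 7, 5, -10, -11, -11, -20, -25, -102]
projected_spread = statistics.pstdev(nl_projections)

print(round(sd_one_side, 4), round(implied_runs_per_win, 1))
print(round(sd_runs_remaining), round(projected_spread))
```

The spread of the posted projections comes out at roughly half of the 62 runs, which is the sense in which the estimates look tight.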


#15          (see all posts) 2008/04/27 (Sun) @ 20:09

The three top home run hitters for the D-Backs so far are:

Mark Reynolds (age 24, 366 career AB’s coming into this year)

Justin Upton (20/140)

Chris Young (24/639)

An open question for any who care to answer: how certain are you that their BP projections are accurate?

Another question: Is Crasnick demonstrably wrong for perhaps not agreeing with those projections?

It’s one thing to ridicule a writer for expecting a career journeyman to keep up a hot April throughout the year, but there are a lot of folks here exuding way too much certainty that they know how the D-Backs young hitters are going to perform over the 2008 season. 

Say Crasnick is in conflict with BP’s projections, I agree.  But I don’t think you can justify saying he’s wrong, or stupid, considering how few AB’s have led to those BP projections.

Not to mention the fact that he may have been speaking specifically about HR’s when he said “power”, while everyone here is assuming he meant runs scored…


#16    tangotiger      (see all posts) 2008/04/27 (Sun) @ 22:24

Note that for MGL’s forecasts, just to be clear, he is only taking the playing time forecasts of BP, and not the rate stats.


#17    MGL      (see all posts) 2008/04/28 (Mon) @ 00:23

I don’t think Crasnick, or 99% of mainstream sportswriters, have any idea about projections, etc.  He merely sees the team doing well, knows they are young, so he writes an article about how “good” they are, with the clear implication that they will likely continue to do exactly what they have been doing.  Nothing more, nothing less.  It doesn’t make him wrong, or stupid, or fat or ugly.

As far as BP’s projections or mine being “accurate” or not, there is no way of answering that question without being more precise in your question (although BP does advertise their projections as being “deadly accurate” I think).  You can make a credible case for their and our projections being very accurate and you can make a credible case for them being not very accurate at all. And then people can argue for months. That is what happens when you ask vague, non-specific questions (I am not directing that at you Greg).

If you followed the projection threads on this site, you know that any one projection system is just as good as any other, more or less.  Whether someone “agrees” with a projection or not doesn’t really matter. If Crasnick can outperform BP, me, Chone, or Marcel, in the long run of course, with regard to young players, old players, the D-Backs, or any other team or groups of players, I will personally fly to wherever he lives or works, and anoint him a genius and retire from anything to do with baseball myself.  And I don’t mean that in a disparaging way to him at all.

As far as whether he meant power as in HR’s or run scoring, or whatever, I have no idea because I did not RTFA.  It doesn’t matter. I was just making the point that virtually ALL mainstream sports commentators look at short term results and write articles about why those results are what they are, because that is what they get paid to do.  No one would pay someone to write, “The D-Backs are -10 runs in projected offense from this point forward, I have no idea why they have done a little better than that in their first 20 games, random chance is as good of an explanation as any (since it is not that unusual for a true -10 team to be +16 in 20 games), and frankly I don’t care (why they have done a little better thus far).”

End of article.

Well, actually I would pay a lot of money to see that in a mainstream newspaper or web site, but that’s just me.
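
How unusual is +16 in 20 games for a true -10 team? The following is a back-of-envelope sketch, not MGL's method. It assumes the -10 is spread evenly over about 140 games, that the .2 rpg season-long SD from #10 scales with the square root of games played (independent games), and that the luck is roughly normal. All names are illustrative.

```python
import math

true_runs_rest_of_season = -10   # projected lwts over ~140 games
games_in_projection = 140
games_played = 20
observed_runs = 16               # lwts above average so far, per the comment

# Expected lwts over 20 games for a team of that true talent.
expected_runs = true_runs_rest_of_season * games_played / games_in_projection

# A season-long SD of .2 rpg implies this much per-game noise if games are independent.
sd_per_game = 0.2 * math.sqrt(162)
sd_over_sample = sd_per_game * math.sqrt(games_played)

z = (observed_runs - expected_runs) / sd_over_sample
p_at_least_this_good = 0.5 * math.erfc(z / math.sqrt(2))

print(round(z, 2), round(p_at_least_this_good, 3))
```

Under these assumptions the start is around one and a half SDs above expectation, something a single team would manage by luck alone several percent of the time, so with many teams in the league it is not a surprising sight.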

On the ESPN game tonight, Gammons said, and I virtually quote, “A lot of people predicted that the Tigers would score 1000 runs.”

One of two things is true about that statement:  Either a lot of people did NOT predict that the Tigers would score 1000 runs (maybe a lot of those so-called people said that they “wouldn’t be surprised,” which is different from saying, “I predict that they will”), or if they did, they were incredibly misinformed about how to go about projecting a team’s runs scored and how much a team is likely to fluctuate around that projection (and park factors on top of everything else). 

There is/was probably a 1 in a million chance or so that the Tigers would score 1000 runs this year. 

A much more accurate statement would be, “There is virtually no chance that the Tigers will score 1000 runs this year, or any other year, unless perhaps, they change their park, for starters.”

How can someone say, “I would not be surprised if X happened,” or, “I predict that X will happen?” when the TRUTH is that there is virtually NO chance that X will happen?  You have to be an incredible moron, with respect to that statement (not a moron in general).

Now, granted, Gammons did not say that HE was one of those people, but…

Does that somehow educate us, the listener, to hear that “lots of” incredibly ignorant people predicted something that was almost impossible to occur?

Should Dan Rather tell us that, “Lots of people have predicted that global warming would kill everyone on the planet in the next 1-2 years?” even if that were true?  Is that good journalism?

I am obviously being hard on these guys, but as most of you know, I HATE mainstream sports journalists with a passion because of the never-ending stream of nonsense, lies, misinformation, ignorance, and general all-around drivel that comes out of their mouths, in the guise of actual analysis and true and useful information.


#18    David Gassko      (see all posts) 2008/04/28 (Mon) @ 02:08

Now, granted, Gammons did not say that HE was one of those people, but…

Does that somehow educate us, the listener, to hear that “lots of” incredibly ignorant people predicted something that was almost impossible to occur?

Should Dan Rather tell us that, “Lots of people have predicted that global warming would kill everyone on the planet in the next 1-2 years?” even if that were true?  Is that good journalism?

***

There’s one flaw in your analogy, Mickey. When it comes to global warming, the “experts” (scientists) don’t espouse the view that we will all be dead in a year or two. So, the “lots of people” that do aren’t really credible to the vast majority of people.

However, when it comes to the Tigers scoring 1,000 runs, the “experts” to most people are sports writers and commentators, a lot of whom did predict the Tigers would score 1,000 runs. That may be asinine, but that’s neither here nor there. Therefore, in this case, the “experts” were wrong (or it appears they will be). You may not value their opinion very much, but most people do.


#19    MGL      (see all posts) 2008/04/28 (Mon) @ 05:27

David, yes, I agree, which is where much of my ire comes from.  The fact that the guys who said that the Tigers might score 1000 runs are regarded as the “experts” in the field, whereas we are regarded as geeks who live in our mother’s basement.  For the record, my mother never had a basement.  It may sound like I am bitter, but I am not.  I don’t want to be regarded as an expert in the field.  I just don’t like scam artists, be it a late night infomercial guy, a stock broker or market analyst, a sports handicapper, or a TV, radio, or newspaper sports commentator.

I have absolutely no love or respect for someone, be it Gammons, Crasnick, Morgan, or any other person who puts himself out as an expert, and is regarded as an expert, yet spouts almost nothing but nonsense.  Not complete nonsense of course, but enough to render their opinions and statements generally asinine, to borrow your terminology.

They may be fine individuals in their own right, with wonderful pedigrees, but as far as I am concerned, they are nothing but charlatans perpetrating a scam, whether they know it or not (of course they don’t know it), on the public.


#20    tangotiger      (see all posts) 2008/04/28 (Mon) @ 06:04

Bill O’Reilly is a bad journalist, he’s a bad commentator, he’s a bad analyst.  Whatever it is that he is supposedly an expert at, that makes people want to write to him, he’s bad at. 

Most sportswriters and sports commentators I hear are much worse than O’Reilly.  So much so, that I simply refuse to listen to any highlight shows and pregame shows, as much as possible.

At some point, we should make a pledge not to discuss anything these guys say with any seriousness, unless they impact something we really care about (see: HOF, Raines).

***

The list starts here with sports experts.  And, we should have a high standard here.  I’ll nominate one, then each of you can nominate someone else.  This will be our list of guys we can get po-ed about if they spew crap:

Rob Neyer (bonus: yet to be po-ed about him!)


#21    Terry      (see all posts) 2008/04/28 (Mon) @ 07:36

Without chemistry, there wouldn’t be plastics. Plastics are the future......


#22    Tangotiger      (see all posts) 2008/04/28 (Mon) @ 13:49

Four weeks into the season, we have to start making calls. It’s still a little early in most cases, but we have enough information to reach tentative conclusions about some players, teams, and issues. For the next three days, that’s what we’ll do—evaluate what’s real, what’s not, and what we’re on the fence about. Today, five things that are real.

-- Joe Sheehan
http://www.baseballprospectus.com/article.php?articleid=7438


#23    MGL      (see all posts) 2008/04/28 (Mon) @ 16:20

His comments (Sheehan) about the D-Backs are Crasnick-like crap.  What if he uses his own web site’s Pecota projections to figure how good their offense is? (I don’t know what that will yield - I used my own projections for the -10 lwts.)

There is no “magic point” at which current to-date season performance tells us something.  I’ve said that a hundred times already, and I’ll say it a hundred more.  Granted, the more the current performance, the more it tells us, but, why use current season performance only, when we have thousands of PA of performance in past seasons for most of a team’s players, and we KNOW that that past-season performance carries 80% or so of the weight of current season performance?

IOW, and for example, let’s say that we have a -20 runs per season (162 games) projection for all of team A’s non-pitchers and they have played to a +10 per season in 20 games.  Our new projection might be -17 or something like that (I am using the 90/10 formula), hardly any different from the original -20, at least as compared to the +10.

Now, say that team B has played to a +10 in 40 games and we are tempted to say that well, 1/4 of the season means a lot.  Two things:  One, even using 80/20 (I am guessing at that), a -20 pre-season team is still only at -14, a FAR cry from +10. Two, who cares whether they have played 20 games or 100 games, if we have their pre-season projections (and the pre-season PA those projections are based upon)?  We can simply adjust our projections to include the new data!  What if team B was -30 before the season started?  They would now be -22 even after 40 games (using the 80/20).
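
The 90/10 and 80/20 updates above are just weighted averages, which a short sketch makes explicit. The function name `blend` is invented for the example, and the weights are the rough guesses quoted in the comment, not fitted values.

```python
def blend(preseason, observed, prior_weight):
    """Weighted average of a pre-season projection and current-season performance.

    preseason    -- pre-season projection, runs per 162 games vs. average
    observed     -- current-season pace, runs per 162 games vs. average
    prior_weight -- share of the weight given to the pre-season projection
    """
    return prior_weight * preseason + (1 - prior_weight) * observed


# Team A: -20 projection, +10 pace after 20 games, 90/10 weights.
team_a = blend(-20, 10, 0.90)
# Team B: -20 projection, +10 pace after 40 games, 80/20 weights.
team_b = blend(-20, 10, 0.80)
# Team B again, if the pre-season projection had been -30.
team_b_alt = blend(-30, 10, 0.80)

print(round(team_a), round(team_b), round(team_b_alt))
```

These reproduce the -17, -14 and -22 in the comment. In every case the updated number stays much closer to the pre-season projection than to the hot start.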

What I am trying to say is that NO MATTER HOW MANY GAMES HAVE BEEN PLAYED, you still need to use the player projections which include the thousands of PA before the season started!  That is true even after 161 games!

So, let’s stop saying that any individual player or team is FOR REAL after X number of games (whether X is 20 or 120) until and unless we stop ignoring the 5000 or so games that have been played by all of their players prior to the current season!  Sure, if we don’t know any of that pre-season data, then we can give a lot of weight to those 120 or 161 games, or whatever (even then, we still have to regress toward .500 maybe 25% or so), but let’s not intentionally ignore those thousands of games and PA, as if they didn’t exist.

Since Joe Sheehan himself is part of BP, the LEAST he can do is give us what Pecota thinks of the D-Backs team based on their current, updated projections, and then he can tell us why he does not believe in his own (his company’s own) “deadly accurate” projections!

Neyer occasionally says “crap,” but with all due respect to him, if you point it out and explain why it is crap, he will usually acknowledge it.  But, the gap between ALL of the mainstream writers (save a few like Poz and King Kauffman) and Neyer is like the Grand Canyon.  It ain’t even close.


#24    Greg Rybarczyk      (see all posts) 2008/04/28 (Mon) @ 17:43

MGL:

In my day job, I analyze testing data on prototype products, part of a major new product development.  One of the trickiest things in doing this is deciding when to cut off one series of data and begin another: typically, when we release a new major version of the prototype, we do this.  Doing so ensures that we don’t mix one set of data from an old version with another set of data from a new version, which would make no sense if the design had drastically changed, which is absolutely the case when the new version involves fixes for problems found in the old version.  If part A was breaking half the time in version 1, and we replaced it with a new part in version 2 that is made of another material with a higher yield strength, it makes no sense at all to combine data from version 1 and 2.

Of course, the down side of starting new data sets is it resets your sample size, so occasionally if there are subsystems that have not changed from version to version, we will combine the data, but cautiously, with a close eye on looking at the root causes of failures (if any) with our engineering judgment, to decide if the failure modes are new or old.  This use of engineering judgment to decide when to disregard the general principle of re-versioning would be analogous to a scout or writer deciding with his eyes/experience that something has changed in a player such that they are no longer on the continuation of last year’s performance arc…

Long analogy, but here it comes back to the topic: while you are right that much of the time, the previous years’ data represents a continuous arc of performance and thus should be included, there are times when you should break the data out and begin a new set.  For example, injury/recovery: Sept. 25, 1999 is when Nomar Garciaparra was hit on the wrist with a pitch, and his hitting skills were not the same afterwards.  Maybe you could put PED’s into this discussion as well, though I’m not going to throw any names out.  In general, player development fits this model as well, though probably in ways difficult to detect until after the fact.  But that doesn’t make them non-existent, just unpredictable.

So, what I’m saying is, you should allow for the possibility that sometimes, the use of your specific formula for projection may not be the best way to predict performance going forward.  It is quite possible that others may have information that convinces them that this year’s version of Mark Reynolds, or Conor Jackson, or Justin Upton, is fundamentally different from last year’s version, or the previous year’s.

Certainly, if people like Crasnick or Sheehan believe this, they should say what that information is, and we all reserve the right to be underwhelmed by their reasoning, of course.  But my main point is, please acknowledge the imperfection of the projection system you are espousing.  I wouldn’t blame you for considering it better than other systems, but perfect it is not.


#25    tangotiger      (see all posts) 2008/04/28 (Mon) @ 20:05

Greg, as long as their determination is NOT made based on the sample data, then I concur, as I would think mgl would as well.

The problem is with people looking at the sample data, and then deciding that “yup, this is real”.  Make the determination without seeing the data, make it based on actual physical changes, or things that are known to have changed.

You simply cannot establish that someone has a new true talent level based on someone’s interpretation of what 100 PA means.  100 PA is 100 PA, unless someone tells you specifically why those 100 PA are different.

***

Yes, Poz is fantastic, and King is great too.  That’s 3 guys so far that I’d be disappointed with if they started yapping.


#26    Greg Rybarczyk      (see all posts) 2008/04/28 (Mon) @ 21:27

#25 Yes, I agree completely, if they aren’t doing anything more than looking at the recent past (particularly in just part of April), they are a windsock…

In this case, I think we may well be looking at windsockish commentary… hopefully Crasnick or Sheehan can elaborate on why they feel the way they do, we’d all be better for it…


#27    fifth of      (see all posts) 2008/04/28 (Mon) @ 21:50

Uh oh, do not let MGL see any of the blogging at Fangraphs so far! A conflagration would ensue.


#28    MGL      (see all posts) 2008/04/29 (Tue) @ 00:34

Which blogging is that, #27?  I rarely go there. 

Greg, if anyone can outdo basic projection systems (Marcel, Chone, me, BP, Zips, etc.), based on information they have that is not included in these systems, then let them come forward and announce themselves, and like I said, I will personally award them the medal of forecasting honor, assuming that they can show some evidence that they can outdo these systems.

Yes, of course a player’s basic underlying talent can change from any one time period to any other, and yes, the best forecasting systems should try and adjust for things like injuries and the like (I actually do in a very objective and crude way - I use days on DL to adjust the projections).  Who would be so ignorant as not to assume that that is true?

However, I seriously doubt that an entire team like the Diamondbacks can change their true talent level from what we would expect from the basic forecast, from a below average offensive team, to “one of the best offensive teams in baseball,” AND that Crasnick and Joe Sheehan would be blessed with the knowledge and ability that would enable them to know this.

Let Crasnick and Sheehan pick a number of runs above or below average that the D-Backs will score, without cheating of course, I will do the same - in fact, I’ll just use my -10 (after a park adjustment of course), so there can be no chance that I am cheating, and we will split the difference as an over-under.

I will lay them 3-2 odds and wager any amount they want (please, no comments from the guy who was pissed off that I make these “wager offers” - it is none of your business).

Of course, they have to pick a number which is commensurate with their statements that the D-Backs team IS (not was) one of the best offenses in the NL, otherwise they are cheating with the number or were lying in their articles.

So, I assume that it has to be somewhere in the range of 20 or more runs above average.  And that is AFTER park adjusting of course.  If a team plays in an extreme hitters park, which Chase Field (or whatever they call it now) is, and you predict them to score more runs than average, BEFORE park adjusting, you DON’T get to call their offense “above average!”

Or we can simply use road rpg (adjusted for HFA of course) to avoid any arguments about what the park run factor for Chase should be.  Personally, I use 1.08.

I make these “wager offers” (I don’t mean them literally - at least that is my legal disclaimer) for the reason that someone mentioned in the other thread.  It is often the only or the best way to expose someone’s BS.  Anyone can say anything they want, and that apparently goes for highly paid “journalists,” whether that something has any basis in evidence or truth.  Ask them to “put their money where their mouth is,” especially when they know in their heart of hearts that YOU are the true expert and not them (I am not ashamed or perhaps humble enough to admit that), and all of a sudden they start hemming and hawing and making excuses, or they say, “Well, I am not a betting man, but if I were....”


#29    MGL      (see all posts) 2008/04/29 (Tue) @ 03:09

From Sheehan’s article:

This is no fluke—the Diamondbacks are real, on their way back to 90 or more wins and the postseason. Last year’s team arrived early on the strength of a surprising bullpen. This team is the one that will be among the best in the league for years to come.

A couple of things.

Again, where is the evidence that “this is no fluke?”

Because some of the players are young, and they have scored 6 rpg so far, and HE thinks it is no fluke?  That is WHY it is no fluke?

If banner years (I’m sorry, banner 23 games) by young players mean that their projections are wrong, we all need to redo our projection models.

Joe says that the D-backs have a great offense and should win at least 90 games.  90 games?!

I have the D-Backs winning 91 games on the average, with a slightly below average offense.  Either one of our “maths” is screwed up or Joe must not think much of their pitching or defense! 

I have their defense and baserunning at 10 runs above average for the next 140 games, and their pitching at 52 runs above average.

In order for them to win “at least 90 games”, and we’ll call that 91 wins, since saying “they will win at least 91 games” is a kind of squirrely thing to say, they merely have to be a .535 team for the rest of the season.

Shouldn’t Joe be saying that they will win “at least 95 games?” If they have a great offense, and I don’t think anyone would deny that they have great pitching, they would have to be a .580 team or so.  That puts them at 96 wins for the season, since they were 18-7 when Joe wrote the article I think. “At least 90 wins???” Can I say, “The Mets are a really good team.  I predict them to win at least 80 games?” If they win 81 or 101, I am right!
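
The win-pace arithmetic can be sketched as follows. It assumes the 18-7 record MGL recalls ("I think") and a 162-game schedule; the helper names `pct_needed` and `final_wins` are made up for the example.

```python
TOTAL_GAMES = 162
wins_so_far, losses_so_far = 18, 7   # record as recalled in the comment
games_left = TOTAL_GAMES - wins_so_far - losses_so_far


def pct_needed(target_wins):
    """Winning percentage required over the remaining games to reach target_wins."""
    return (target_wins - wins_so_far) / games_left


def final_wins(rest_of_season_pct):
    """Season win total if the team plays at this pace the rest of the way."""
    return wins_so_far + rest_of_season_pct * games_left


print(round(pct_needed(91), 3))    # pace needed for 91 wins
print(round(final_wins(0.580), 1)) # total for a .580 team the rest of the way
```

The first figure is close to the .535 quoted above. The second lands a game or so above the 96 quoted, which may simply reflect rounding or a slightly different record than the remembered 18-7.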

I’m sorry, I like Joe, and a lot of his writings, but what he says about the D-Backs is just yapping.  That being said, what he says just MIGHT be true of course.  I think the chances are quite low, but NOTHING is either 100% or 0% certain when it comes to projecting performance or estimating player or team talent.

As I used to say, ANYTHING (and that means ANYTHING, not ALMOST anything) has a finite chance of occurring in any sample of performance, no matter how large or small that sample, as long as that ANYTHING is “physically” possible.  Endy Chavez hitting 60 HR’s?  Possible.  A-Rod hitting zero HR in 500 AB? Possible.  Zito going 28-0 (not now of course)?  Possible.  Pujols hitting .123 at the end of the season?  Possible.  Ichiro batting .000 at the end of the season?  Possible.


#30    fifth of      (see all posts) 2008/04/29 (Tue) @ 11:30

#28, you are better off not checking it out (for now, anyway), but Fangraphs hired Dave Cameron, Eric Seidman, and Marc Hulet to blog on the main page of Fangraphs.com. While each has shown themselves capable of strong analysis, the format and frequency of the posts have so far dictated that they not produce much compelling analysis. Many of the posts are filled up with graphs that make no reference to playing time or opportunity. A great number of conclusions (cough, Dave Cameron) are being drawn on the basis of tiny, tiny samples.

I like Dave Cameron’s writing and appreciate his presence in the internet baseball writing community. But I think he tends to go overboard pretty quickly on the basis of very small amounts of data. Remember when he argued that FIP should be adjusted for LOB%, and his argument rested on something like two years of Johan Santana and seemingly not realizing at all that the weights in FIP were run values rather than on-base values? He has been repeating that type of analysis all over the blog, and I don’t think it’s worth my time anymore. Maybe once they hit August these problems will not be so pronounced.

Marc Hulet I’m not ready to judge, since he started later and I think posts less. Eric Seidman knows what he is doing with numbers, but most of his posts are very fluffy. I realize that this is his conscious choice as he has stated repeatedly his goal is accessibility and not alienating the average fan. For a magician/screenwriter, this is understandable. But I think it is a waste of the talent David has assembled to not be conducting studies and instead to be just checking out little factoids.

David’s own blog posts are still good but infrequent.

It is disheartening that the posts on the site seem to so strenuously stick to fangraphs stats and tools. How are these analyses improved by showing graphs of RC/27 instead of tables of lwts and PA? At least have a graph that indicates PA. And Fangraphs has velocity and pitch type data, but none of it is used to analyze batters. Why? Plus, every time they show a graph of batted ball types, I have to read the next paragraph to figure out which color is LD, GB, FB. Is the legend being left out on purpose, or what?

I think Fangraphs needs to learn a lesson from BP. When you tell writers, implicitly or explicitly, to lean on your site’s flagship metrics, the writing suffers.


#31    fifth of      (see all posts) 2008/04/29 (Tue) @ 11:56

OK, from the two most recent blog posts at fangraphs:

First, Seidman:
“Great pitching trios are so valuable for the more obvious reason that, over the course of any given three game series, the team is likely assured of having at least one solid starter on the mound.”

I call BS. It is fine to say that having three great SP means you’ll have one in any 3+ game series if they’re all healthy. It is silly to say that this is what makes them “so valuable,” and that silliness is compounded by calling that value obvious. Is having three 3.5 WAR pitchers and two 1.5 WAR pitchers more valuable than two 4.5 WAR pitchers and three 1.5 WAR pitchers? Maybe, but it is not because of how they are distributed across series. They do not keep track of series won in the standings, and if the issue is avoiding having three lesser pitchers face a division rival, *you can shake up your rotation order.* Spacing out your two aces would accomplish the same thing.
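
The two hypothetical rotations in that question can be totaled directly. A quick sketch, using only the WAR figures named above:

```python
# Total WAR for the two hypothetical rotations described above.
rotation_a = [3.5, 3.5, 3.5, 1.5, 1.5]  # three very good starters, two lesser ones
rotation_b = [4.5, 4.5, 1.5, 1.5, 1.5]  # two aces, three lesser ones

total_a = sum(rotation_a)
total_b = sum(rotation_b)
print(total_a, total_b)
```

Both come to 13.5 WAR, which is why the comparison turns on how the value is distributed rather than on the total.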

Next, Dave Cameron:
“Nick Johnson is becoming the new poster child for why looking at results, and not the underlying skills, can lead to problems.

Johnson is posting a .216 batting average, so the easy narrative here is that he’s still getting his legs back under him after missing all of the 2007 season after a violent leg fracture in 2006.”

Cameron then points out that Johnson’s numbers are suffering because he has a low BABIP despite being fifth in the NL in LD%. What the hell kind of poster is Dave making that he wants Johnson’s mug on? Johnson’s career numbers are .270/.395/.457, and this season he’s hitting .216/.392/.432. Who cares that he has hit a few extra line-outs? The sample size for his LD% is not big enough for us to care about it, at least as long as we are talking about 28% and not 40% or 0%.

Johnson’s .216 BA can also be explained by having his highest K rate since 2002. Shouldn’t this factor into the analysis? Actually, no, it shouldn’t, because the sample size, like that for LD% and BABIP, is not such that these differences warrant being pointed out.

Seidman likes abstracting what he knows to be true about baseball into nuggets that sound like common sense but that are not particularly true, and Cameron likes using whatever data he can find to make a point regardless of the data’s significance. At their old blogs, these tendencies were more restrained, but on Fangraphs, they are encouraged.


#32    Terry      (see all posts) 2008/04/29 (Tue) @ 12:21

Dave isn’t predicting Johnson’s performance going forward but rather illustrating sound principles for analyzing past performance using Johnson’s first month.

Sample size really isn’t a flaw in his analysis.  In essence Dave is arguing that Johnson’s truly great performance over the first month was masked by the effects of randomness made possible by small sample size.

He’s not ignoring sample size but rather shining a spotlight upon it....


#33    Greg Rybarczyk      (see all posts) 2008/04/29 (Tue) @ 12:42

#31:  You are confusing explanation with projection.  You can explain why something happened (in this case, low BABIP leads to low BA) without projecting it to continue happening.

Cameron says Johnson’s BA is low because his BABIP is low.  That statement can be made over any time frame; it’s like saying “I was late for work because I woke up late.” It’s not a projection, it’s an explanation.  Or, analysis, you might call it.  Not deep analysis if that’s all you say, of course.

Later, Cameron tries to go a bit deeper and points out a surprising fact: Johnson’s LD% does not match with his BABIP.  Again, nowhere in this is he saying a) Johnson’s BABIP will continue to be this low, b) Johnson’s LD rate will continue to be this high, c) Johnson’s BA will continue to be whatever. He is not projecting, he is explaining, and attempting to analyze.

Hopefully you’re not suggesting that writers shouldn’t discuss any short term results, because they are only short term.  The point that short term results shouldn’t (by themselves) be projected is valid, and has been made over and over here.  But you absolutely can look at a small sampling, or any size sampling of results, and ask “why did these results happen?” Example: why does Casey Kotchman have 6 home runs?  Well, by my reckoning, 3 of them were “lucky” and got help from the wind to leave the park.  That’s analyzing what happened, which is totally separate from using results to project forward.

And by the way, reflexively saying “Player X’s performance will regress to his mean”, or words to that effect, isn’t what I’d call compelling analysis, either.  Usually true, sure, but not adding much.


#34    Eric J. Seidman      (see all posts) 2008/04/29 (Tue) @ 12:50

Guys, I appreciate the critiques.  Seriously.  I’ve always been one to possess the mindset that you can always improve and so I will take these to heart and work on the skills mentioned that you think require work.  If something isn’t working I do the best I can to alleviate the problem.


#35    fifth of      (see all posts) 2008/04/29 (Tue) @ 12:54

#32, you are missing my point. We have sample data, and none of it shows that Johnson has been bad. Dave jumps in by pointing out that Johnson has better “skills” metric numbers than “results” metric numbers. His point that the latter are inferior to the former is not illustrated by Johnson, who he deems the new poster child for this. Who cares how many LD Johnson has hit in such a small period of time? Dave is correcting faulty thinking (Johnson can’t hit! He has a low BA!) with faulty thinking (Johnson can hit! He has a high LD% and PrOPS!). As if someone with a .200+ ISO in a month should have his ability to “drive the ball” put into question.

“Johnson’s truly great performance?” What, hitting a handful of extra line drives that were caught means Johnson has been excellent? Maybe he’s just gotten more favor from the scorers? Maybe he has faced lackluster pitchers? Perhaps Johnson has had a theoretically great performance, but wouldn’t a truly great performance have both the skills metrics and the results metrics to back it up?

Sixteen line drives in April is no more a meaningful sample than a .216 batting average in April. Dave is wrong in arguing that we should correct the latter with the former. We simply should not be using numbers that are this close to the mean in this amount of playing time to draw conclusions.
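
A rough way to see how noisy sixteen line drives are is the binomial standard error of a rate. This sketch assumes each batted ball is an independent trial and backs the number of batted balls out of the 16 LD and 28% figures mentioned earlier in the thread; the helper name is made up.

```python
import math

def rate_standard_error(rate, n):
    """Binomial standard error of an observed rate over n independent trials."""
    return math.sqrt(rate * (1.0 - rate) / n)

line_drives = 16
ld_rate = 0.28
batted_balls = round(line_drives / ld_rate)   # about 57 batted balls

se = rate_standard_error(ld_rate, batted_balls)
low, high = ld_rate - 2 * se, ld_rate + 2 * se  # rough two-standard-error band
print(f"n={batted_balls}, SE={se:.3f}, band={low:.2f} to {high:.2f}")
```

With these inputs the band spans roughly 16% to 40%, which is wide enough to cover both an ordinary and an exceptional line drive rate.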

And Dave does all this without reference to Johnson drawing more walks and more strikeouts. Or hitting fewer ground balls. And so on. This is a junk article.


#36    Tangotiger      (see all posts) 2008/04/29 (Tue) @ 13:01

“Remember when he argued that FIP should be adjusted for LOB%, and his argument rested on something like two years of Johan Santana and seemingly not realizing at all that the weights in FIP were run values rather than on-base values?”

Are you sure about that?  I remember he was arguing about either extreme pitchers, and I looked at the top 9, and found their FIP matched their ERA, or about extreme GB pitchers and how they have more mistake HR.  Whichever it was, we had a happy resolution, IIRC.

I don’t remember the other stuff.

“I think Fangraphs needs to learn a lesson from BP. When you tell writers, implicitly or explicitly, to lean on your site’s flagship metrics, the writing suffers.”

I agree with your basic position.  However, David at Fangraphs is not married to any of the stats he has, I don’t think.  He is very quick to acknowledge the value of new metrics, and apply them almost immediately to his site.  If only BP were to learn this lesson.  When my kid is into sabermetrics, he’ll be tearing into WARP and LEV.  That’s how much BP is married to what they do.

David is very open-minded, in my dealings with him, as well as being generous.

***

As for Nick Johnson, I took it to mean that he was showing how his small sample should have resulted in far better numbers, because he had a lot of LD (and line drives lead to 75% hits), and that it would be fairly shocking to get that many LD and not the corresponding hits.

Nick has 13 non-HR hits, total.  He has 16 line drives.  We’d expect 12 hits, just off those line drives.  His 38 non-HR FB and GB should give us, I dunno, 9 more hits.  So, we’d expect to see say 21 non-HR hits, and he has 13.  That looks to be close to 2.5 SD.
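
The “close to 2.5 SD” estimate can be reproduced with a sketch that treats each batted ball as an independent trial: 16 line drives at a 75% hit rate and 38 other non-HR balls at 9/38. The independence assumption and the variable names belong to this sketch, not to the numbers above.

```python
import math

# Batted-ball groups: (count, assumed hit probability)
groups = [
    (16, 0.75),      # line drives
    (38, 9 / 38),    # non-HR fly balls and ground balls, ~9 expected hits
]
actual_hits = 13

expected = sum(n * p for n, p in groups)
variance = sum(n * p * (1 - p) for n, p in groups)  # sum of binomial variances
sd = math.sqrt(variance)
z = (actual_hits - expected) / sd
print(f"expected={expected:.1f}, sd={sd:.2f}, z={z:.2f}")
```

Under those assumptions the standard deviation is a little over 3 hits, so 13 hits against an expected 21 sits about 2.5 SD below expectation.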

He’s been: unlucky in the success of his line drives, but lucky in the frequency of his line drives!  Overall, his OBP/SLG is a bit below his Marcel forecast (405/484).


#37    fifth of      (see all posts) 2008/04/29 (Tue) @ 13:16

#33, Dave does not attempt to explain Johnson’s average in this post! He says Johnson had a low BABIP in spite of a high LD% and uses that to make the following claims:

“Nick Johnson is becoming the new poster child for why looking at results, and not the underlying skills, can lead to problems.”

“I’d say it’s safe to say that Nick Johnson is just fine.”

The second statement in no way requires looking at Johnson’s LD%. The first statement is laughable in an article that doesn’t touch any of the “underlying skills” numbers except LD%. I guess PrOPS serves as his stand-in here, but who cares what Nick Johnson or anyone else’s PrOPS is over 100 PA?

Greg says:
“You are confusing explanation with projection.  You can explain why something happened (in this case, low BABIP leads to low BA) without projecting it to continue happening.”

You are asserting here that I have confused explanation and projection. BS. Dave looks at three numbers, BA, BABIP, and LD%. The argument that follows is that the second explains the first and that the third explains why the second is invalid, hence the first is invalid. He offers no explanation of why the third should be given weight. LD% is not scouting, it is a result. It is *closer* to “input data” than “output data”, but this does not mean that you can do with it what Dave is doing. The reason for the low BABIP and the low BA is sample size, and we know that regardless of the LD%.

If Dave is so good at *explaining* Nick Johnson’s numbers, where is the explanation for why his line drives have not fallen in for hits? Are 75% of LD in play hits because all LD are alike and a fourth of them go straight to fielders? No, the difference from LD to LD can be pretty substantial. Maybe Johnson has been scorching the ball, but sixteen LD is not meaningful evidence of that.

I am not against pointing out that, while Nick Johnson has a low BA and BABIP, his LD% is high. But that is a very different animal than suggesting that the concerns over Johnson (are there actually any from people who know what they’re doing, or is Dave just inventing them?) are unwarranted because of his high LD%.


#38    fifth of      (see all posts) 2008/04/29 (Tue) @ 14:34

#36, you present a reasonable interpretation of Johnson’s numbers, and I’m showing why DC’s is not.

I agree that David Appelman is good on the flexibility about numbers. That’s why I’m willing to launch into the criticism, because I hope it will cause change, as Eric indicates it can. I don’t want to see another damned RC/27 graph.

http://www.insidethebook.com/ee/index.php/site/comments/quick_eras/

Check for yourself. There was a “happy resolution,” but it had nothing to do with any of Dave’s suggestions. Dave repeatedly argued on the basis of his not understanding that the weights were *run values* as opposed to a stand-in for baserunners. I think this is really obvious in reading his comments. His Poster Child in that case was Johan Santana because of a couple of years of data with a fairly marginal difference between the Quick ERA’s and actual. The next year, the difference was flipped such that over three years his numbers were:

3.05 ERA
2.90 szERA
3.11 FIP
3.30 xFIP
3.04 QERA (Nate)
3.07 siERA (Tango’s linear szERA with GB/FB)

His actual LOB% was 71.9%. The implied LOB% from each metric:

74% szERA
71% FIP
72% xFIP
72% QERA
72% siERA

There is an entire thread emerging from Dave thinking these metrics are screwy because of pitchers like Santana. He says on that thread “Less runs are scoring off Johan Santana than a BB-K-GB/HR model would predict.” BS. I think his whole argument was anecdotal, and he didn’t look at the anecdote in question with remotely enough scrutiny. I think Dave Cameron is not someone I want telling me which players are the Poster Children for statistical phenomena.


#39    Greg Rybarczyk      (see all posts) 2008/04/29 (Tue) @ 15:25

Well, there’s more to discuss here, I think, but IMO life’s too short to waste time with anonymous posters who think everyone’s opinion but their own is BS.  Enjoy the rest of the thread, whoever you are.


#40    Tangotiger      (see all posts) 2008/04/29 (Tue) @ 15:59

fifth/38: Good find.  I re-read the whole thread.  I don’t get the sense of what you are reading.  I read it that he was saying that these metrics imply a constant strand rate.  And, until I ran the numbers, I was thinking that FIP implied a constant BABIP rate.  Both of us were wrong.  And once I generated the data, we were both happy with what it told us.

Regardless, I think our points on this issue have been made.


#41    fifth of      (see all posts) 2008/04/29 (Tue) @ 18:07

39 Greg, I posted under a pen name that I have used before and that many people know me as. For concerns about employment background checks and googleability, sometimes a pen name is appropriate. That is different than anonymity. I have not been consistent on this issue of using my real name or a pen name because the appeals of a pen name are balanced by the lesser accountability. Sometimes I have used my name and sometimes a pen name. The Fifth Outfielder has been linked to on this site before. My writing has been at THT and an MVN-owned site in the past. I don’t think it is fair for people with established careers to dismiss other writers because those other writers have been forced to make choices related to their ability to seek or maintain employment. I think Tangotiger would agree on this point.

I do not think everyone’s opinion but my own is BS. I am frustrated by the content on Fangraphs and trying to offer my critique of it. Why am I frustrated so much? Because as someone who has devoted an enormous amount of time to non-commercial baseball writing, it is extremely difficult to see sites commercialize themselves and have this commercialization impact the content. I think Fangraphs’ current blogging format is encouraging frequent posting and use of Fangraphs numbers, and that an effect of this is to reduce the quality of the work.

I sent an inquiry to David A. about writing for his site, to which he did not reply. Feel free to argue that I am simply bitter. I think a better interpretation is that I am invested. If Fangraphs chose the best baseball writers they could find, then I want to see that in the results. If Fangraphs chose writers for name recognition, then my criticism is that they are falling into the Crasnickisms although on a radically different level.

Not everybody has the luxury of being able to devote a lot of time to baseball writing, and those of us who do it anyway don’t like to see other people making money off of lesser analysis. Deal with it.

My analysis is that the format is encouraging Crasnick/Sheehan-like articles, which is not to say that they are not way, way better. In #31 above, I picked out examples from the two most recent posts on the site, not for their individual flaws but in order to be indiscriminate. Baseball writers produce better writing with time to research, develop, and think through their claims. That is what I think, and I could be wrong.

I am not nitpicking. Or rather, if I am picking nits it is because there is a diagnosis of lice and nits are the only way to demonstrate this. There is a pattern in David Cameron’s posts on Fangraphs, and the Nick Johnson post is just an example. Just because most or all of his posts have some acknowledgment of the limitations of the sample size does not mean that his analyses are appropriate.

I have been attacking BP for quite some time for the same things I am attacking Fangraphs for. I hold professional writers to a different standard. Deal with it. If I am being unfair in calling the Fangraphs bloggers professionals, okay. I don’t see it that way.

I have called three different things BS in this thread: Eric’s statement about the value of 3 good SP being related to series length, your claim that I was not differentiating between explanation and projection, and DC’s claim that DIPS-based ERA’s undervalued Santana. I’m calling them BS for their content, not their authors. Eric is a very good analyst with baseball statistics (not a sabermetrician, though! smile ), you run an excellent site and have offered countless contributions, and DC is an excellent writer. Eric, IMO, goes overboard in trying to make his writing accessible sometimes, you missed the nuance of my argumentation in this instance, and DC gets carried away in his use of statistics.

#40 - This is how David summarizes his argument:

“The culprit here isn’t so much pitching from the stretch or the windup; it’s distributions of hits, I think.  Crappy Pitcher X (we’ll call him Joel Pineiro) is going to put a lot of balls in play and walk a lot of guys, so when he gives up a leadoff single, it’s pretty likely that he’s going to give up a couple more hits before he gets the third out of the inning, and that leadoff baserunner is going to cross the plate.

When Johan Santana gives up a leadoff single, though, due to his inherent already goodness, it’s not that likely that he’s going to give up several more baserunners that same inning.  It’s not because he’s going to pitch better than he normally does, but that because he’s already so good, he’s inherently more likely to get outs with that man standing on base and leave him there without crossing the plate.

I’m not talking about guys like Glavine, who pitch better with runners on base than they do with no one on.  I’m talking about the league as a whole; because the distribution of baserunners isn’t even between good and bad pitchers, neither is the likelihood that a baserunner will score.

By ignoring strand rate in these component ERA formulas, we’re saying (by omission) that a baserunner against Johan Santana is just as likely to score as he is against Joel Pineiro.  And that’s just not true.”

I don’t see how this argument can be made without resting on the notion that FIP et al are essentially making ERA isomorphic to number of baserunners allowed. While this logic is implicit, it is still the functioning logic. DC says that FIP et al treat a baserunner allowed by Santana the same as one allowed by Pineiro. The internal logic of this claim is built on reducing the components in run estimators into baserunner estimators. However, the weights in FIP and all of the others are based on the building blocks of runs, not that particular building block of baserunners allowed. That is, the weights are about the inning-ending factor, the moving runners over factor, and the getting on base factor. If you understand how these weights are derived, I don’t see how you can come to the conclusion that “FIP, xFIP, and QERA all regress LOB% back to 70% for all pitchers since they are assuming a league average strand rate by the omission of that information from the formula” unless you fundamentally misconstrue their respective formulas. All you need to do, as others on the thread point out, is plug in a pitching line and reverse engineer the strand rate and you will see that this is not true.
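
As a rough illustration of “plug in a pitching line and reverse engineer the strand rate”, here is a sketch for FIP. Everything in it is an assumption rather than something taken from the thread: the basic FIP form with a 3.2 constant, earned runs at about 92% of all runs, the strand-rate formula LOB% = (H + BB + HBP - R) / (H + BB + HBP - 1.4*HR), and the two made-up pitching lines.

```python
# Sketch: back out the strand rate (LOB%) implied by FIP for a pitching line.
# Assumptions (not from the thread): FIP = (13*HR + 3*BB - 2*K)/IP + 3.2,
# earned runs are ~92% of all runs, and
# LOB% = (H + BB + HBP - R) / (H + BB + HBP - 1.4*HR).
FIP_CONSTANT = 3.2
EARNED_SHARE = 0.92

def fip(ip, hr, bb, k):
    return (13 * hr + 3 * bb - 2 * k) / ip + FIP_CONSTANT

def implied_lob(ip, h, bb, hbp, hr, k):
    # Convert the FIP estimate (earned runs per 9) into total runs allowed.
    est_runs = fip(ip, hr, bb, k) * ip / 9 / EARNED_SHARE
    baserunners = h + bb + hbp
    return (baserunners - est_runs) / (baserunners - 1.4 * hr)

# Two made-up pitching lines: a strikeout ace and a contact-heavy pitcher.
ace = implied_lob(ip=230, h=180, bb=50, hbp=5, hr=22, k=240)
contact = implied_lob(ip=180, h=210, bb=65, hbp=6, hr=25, k=90)
print(f"ace: {ace:.3f}, contact pitcher: {contact:.3f}")
```

With these invented lines the implied strand rates differ by several points, so nothing in the formula forces every pitcher to the same LOB%.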

This is also especially outstanding given that Nate’s QERA was based on a regression of runs and that Tango’s newer szERA was updated to reflect the run value of GB versus FB, rather than the hit value. I am not saying that DC doesn’t know his behind from a hole in the ground. I am saying that he is sometimes less familiar with the theoretical underpinnings of statistical principles and tools in baseball analysis. It was frustrating in reading that thread and it is frustrating on Fangraphs.

David also said on that thread:
“The whole idea behind FIP is that its fielding independant pitching.  But LOB% isn’t fielding, and it’s at least partly a skill, so if it can be included, I think it should be.

I’m not smart enough to know how to include it.  I’m just hopeful that if I bug one of you guys enough, you’ll be able to.”

I think it is clear that this is an off-base criticism of FIP, and its internal logic is that FIP is only crediting pitchers for keeping runners off the bases.

Dave is a great baseball writer with a lot of insights to offer, and I have been reading his work regularly for four years. It is not bad writing, and it is generally not bad analysis, so I continue to come back. I do think that he has some blind spots in his ability to make use of baseball statistics. In his previous formats, there has generally been more room to discuss that impact. On Fangraphs, though, he is writing two posts on most days and has either been encouraged to make his posts about the data from this season or has chosen to. I’d rather make a general argument about the impact of the format on his writing than go through and explain what is flawed in his use of statistics every time. When he himself has admitted, as in the quote above, that he has some shortcoming in this area, is it unfair for me to point out that his current gig seems to exaggerate it?

Your other point is that you yourself thought FIP was predicting the same BABIP for everybody. So yes, even the best analysts do make mistakes. Perhaps that means I am being too hard on David. I don’t know. Again, there are, for me, different standards for writers who are getting money. If I criticized Crasnick for bad use of stats and pointed out a previous example, I don’t think you would have objected. But because we like David, it is easier to want to defend him. I think DC suggesting that xFIP regresses LOB% to 70% for all pitchers because Santana’s xFIPs were higher than his ERA (wouldn’t they have to be, because his HR/Fly was less than 10%?) stems from a different sort of misunderstanding than you assuming that FIP implied league average BABIP.

I acknowledge that I am probably being overly critical and harsh today. I have many of the same flaws that MGL has admitted to. I hope these are not taken as personal attacks.

-tm


#42    Eric J. Seidman      (see all posts) 2008/04/29 (Tue) @ 18:44

Tom, out of respect for those who don’t wish to speak about this anymore, please feel free to e-mail me if you wish to speak further.


#43    MGL      (see all posts) 2008/04/29 (Tue) @ 18:59

FWIW, I have not thought about the other issues, but the 3 pitcher thing IS complete BS.  Three outstanding starters is of course great for the post-season, but in the reg season, your team benefits from the sum total of all their pitchers prorated for the amount of time they pitch (and the leverage of those innings of course).  It matters not whether you have 3 good ones, 1 great one and 2 average ones, etc.  And of course it matters not how long your series are, etc.

The blog about N Johnson seemed reasonable to me, although I have not thought about it too much.  It was a little odd to say that you have to be careful about results when his OPS was fairly high, even though his BA was very low.  We all know that BA does not tell us much about what a player has done (in terms of run/win value).  If you want to say to be careful about results without digging deeper into more granular data, like LD rate and BABIP, then at least use some kind of non-garbage results stat, like OPS.  Not BA.  But overall, I thought the blog entry and analysis was reasonable, although, as I said, I only read and thought about it briefly.

I don’t know anything about the Santana issue.

I don’t think that “fifth of” comments were out of line.  This is NOT the blog where you have to stroke another blogger’s or saber guy’s ego.  Not at all.  If you have something substantive to say, no matter how direct or harsh, as long as you back it up with cogent arguments, if you can’t say it here, you can’t say it anywhere.


#44    Greg Rybarczyk      (see all posts) 2008/04/29 (Tue) @ 19:03

TM -

I’m going to read through all of this more closely when I get home, but for now, let me just say thanks for the obviously thoughtful post, and I withdraw my criticism about anonymity and the BS statement.  As happens a lot in this sort of medium, the completeness of your thoughts wasn’t getting through to me, but I’ll happily accept 50% or more of the responsibility for that.

As I’ve often said to MGL, Tango & others, I think I predominantly agree with you, but also like to dig at the 5% of disagreement, or perhaps the misunderstood 5%.  I look forward to more discussions, and I’ll try to be a bit more civil as well.


#45    David Cameron      (see all posts) 2008/04/29 (Tue) @ 19:20

Guess I’m late to the party, so I’ll just offer a few quick comments.

1. On the FIP/LOB% issue, I was obviously incorrect.  That was the whole point of my participation in that thread - to figure out if FIP was flawed based on what I was thinking.  Tom walked through the steps, showed me where I went wrong, and that was that.  I haven’t maintained any similar criticism of FIP since, and I doubt there are too many people out there who have championed it as a metric more than I have.  If I had ignored Tom’s comments in that thread and continued on critiquing FIP for this “flaw”, then I think a criticism would be totally valid.  But man, if we’re not allowed to have a conversation and learn about things that we mis-believe, then what is the point of places like this?

2. My interpretation of my Nick Johnson post is a lot more like Greg’s.  I wasn’t trying to do any kind of projection or suggesting that PrOPS held some important truth that we needed to know - I just found the disconnect between his LD% and his BABIP interesting, considering those things generally have a positive relationship.

3. This might just be a personal preference that not everyone shares, but I always respond a lot better to people who critique my writing either through a comment on where I’ve written or through an email (my address is very easy to find).  I’m not trying to critique your motives, but I will say that if you’re truly interested in helping my writing at fangraphs “get better”, communicating that to me will be more effective than communicating that to MGL and Tango.


#46    Tangotiger      (see all posts) 2008/04/29 (Tue) @ 19:21

I think the difference between David and Jerry is that David is an occasional poster here and Jerry is not.  I think criticizing a politician is much fairer than doing so to a student who is in the same classroom.  At the least, the dynamics are different.

I don’t think being paid $10 or $100 or $1000 an article makes much difference, unless you yourself are paying them via subscription fees.

***

I think you do have too high an expectation technically, especially since after David (and I) realized what FIP was really doing, we accepted it.  Being ignorant is ok.  Being ignorant after being educated is not.

***

Anonymity is always fine, as long as we don’t bounce handle to handle.  I protect “tangotiger” as much as I protect the name my mother gave me.  All my online baseball posts have “tango” in it.

***

“Fifth Of” has made some great baseball posts in the past, and I know MGL was also impressed with him at the time.  If David has decided not to go with him, I can only guess that either he didn’t see what we saw, or that he’s got enough writers for now.


#47    fifth of      (see all posts) 2008/04/29 (Tue) @ 19:47

45 - David, I apologize for my choice of formats and the harshness. Things kind of took off from a certain point of departure, and sometimes I don’t do a remotely good enough job of reining things in when I don’t feel like my points are being considered.

In my initial post #30 where I referenced the Santana/QERA thread, I was making the point that you sometimes go overboard too quickly on limited amounts of data, which I think has been an issue at fangraphs so far. Like I said, I expected that to get better by August when there would be some more time to work on the format and more data to evaluate the season in progress.

That thread stuck out in my mind more than it should. The short version starts with the fact that I was unable to keep up with baseball in September/October 2006, and that particular thread happened shortly after I found a dead body at my job and was working 80 hrs/wk. When I later read it, I was so frustrated that it was clear to me why DC was misunderstanding the issue when everybody else in the thread kept saying they didn’t understand DC’s complaint. I referenced it too quickly and with too little precision.

As for your post on Nick Johnson, I want to make it clear that I don’t think there is anything wrong with pointing out the disparity between his BABIP and his LD%. My point was that the way you articulated this position assigned too great of a weight to his LD% given the sample, and that this led to some statements in the post I disagree with.

Let me state once again that I have the utmost respect for everybody in this thread, and my criticisms come from my own flawed perspective. I guess because it is so easy to see the influence of my own personal blinders that I am quick to look at the influence of similar minor flaws in the work of others for whom I have a high level of regard.


#48    David Cameron      (see all posts) 2008/04/29 (Tue) @ 20:54

Let me go in a different direction for a second, since we have a captive audience of readers who obviously want to see the content at fangraphs be as good as possible.

What would you like to see Eric and me write about? There’s nothing inherently interesting about posting twice a day and saying “none of this means anything, so don’t pay attention to it”, so I’m pretty sure you don’t want us just pounding on regression to the mean until the cows come home.  I’ve tried to write posts that highlight potentially interesting things like the transformation of Corey Patterson at the plate or Jonathan Sanchez flying under the radar, but I am well aware of the fact that we don’t have a proper sample to judge any of this on.

So, if you could pick, what would you like to see us tackle?


#49    Greg Rybarczyk      (see all posts) 2008/04/29 (Tue) @ 21:41

I’ve got a quantitative question for Fifth of, and maybe MGL as well: given that small sample sizes can be misleading, how do you (asking your personal outlook) quantitatively make the transition from attributing breakout performance to random chance to deciding that it must be real?

A couple good examples from last year might be Carlos Pena and Chris Shelton.  When, quantitatively, do you decide that the 2007 model of Carlos Pena isn’t a continuation of the 2006 model (or maybe you haven’t concluded that)?  Or another way, how long did you feel that Pena was just lucky, and when, if at all, did you change your mind?


#50    fifth of      (see all posts) 2008/04/29 (Tue) @ 22:14

David, I liked your pieces on Patterson and Sanchez. Those both got straight to the point and, though it felt a little superficial to see the 2008 point on the graphs, you were talking about things that reached back longer and didn’t rely on 2008 data.

As a general suggestion, I guess I just don’t see why the data from 2008 is figuring so prominently in your analysis so far. We don’t have to write about the current season by focusing on its data. I think you have done your best work on FG when you have used what has happened this season as a reason to investigate data from the past that has been overlooked (i.e., Patterson’s steady K improvement, Sanchez never having been that different from Cain/Lincecum).

I think you’ve shown, David, that you are excellent at making arguments about roster construction and promotion of minor leaguers. At USS Mariner, since you are blogging about the team, you can really keep track of it and focus. I think if you applied those skills to how other teams construct their rosters and treat their minor leaguers, you could do some really dynamite analysis. I know that the obvious reason not to is that it’s hard to keep track of everyone out there. But I would love to see your take on the Brewers’ bullpen, on the Angels’ tangled mass of position players, on the Phillies outfield rotation, and so on. More broadly, I think you could do a really excellent job, especially with the WPA and LI data at fangraphs, of bringing up general trends in how efficient teams are being in making use of their roster spots, to check up on teams’ BS rationales for keeping good players in AAA, and so on. I think you are great at finding trends, and so I think the wider you can cast your net, the better, because the smaller the subject, the more prone to randomness the trend is.

Also, I think posts that stay focused on correcting a misconception are the best, short of original research. The Sanchez post did that. The Johnson post I may have overreacted to because a) I had heard no such suspicions raised about him and b) you used his case to attack a much broader question (i.e., making him the Poster Child), when the evidence to do so was not sufficient and you didn’t get into any of the counter-arguments. I think when you clearly state what issue you are addressing, cite evidence of where the issue has come up, and clearly state why the evidence supports a different conclusion or supports the idea of not forming a conclusion yet, your posts are very good. The Johnson piece just felt like you got off track on each of the three. I wasn’t sure what you were trying to correct (who was saying Johnson wasn’t driving the ball, and why? Were they looking at his SLG or just his BA?) and you had an incomplete and, in my mind anyway, somewhat misleading explanation of why the would-be common knowledge was off (since you only looked at the difference between his LD% and BABIP without acknowledging the deficits of LD% or any of the other factors contributing to his lower BA (strikeouts, etc.) and BABIP (fewer GB, etc.)).

Hmmm. This got me thinking about the Bill James piece at Bill James Online this week where he defines sabermetricians as starting with a question rather than their opinion. I think David does great analysis when he starts with the question, but sometimes gets quickly swept into an opinion that finds itself rising above the question. In other words, there is a difference between answering the question “Is Nick Johnson really not recovered from his injuries?” and starting your analysis with a stated purpose of showing that NJ is becoming a Poster Child for skills vs. results in data analysis.

That’s what comes to mind. I really do apologize for the nature of my criticism earlier, and I really think that Fangraphs will end up showing BP how it is done.

And David, I don’t have your email handy, but if you want to contact me off the boards you can get my email from Eric. I see myself as having been pretty mean-spirited in trying to make my points today, and I hope to make it up to you at some point.

Best,
tm


#51    fifth of      (see all posts) 2008/04/29 (Tue) @ 22:42

49 I think it is all about regressing, and the tools for doing so are available in The Book and elsewhere.

Pena/Shelton is an interesting choice. I don’t know if it is still floating around the internet, but I really laid into the Tigers for cutting Pena. I thought that was a terrible move at the time. They were not adjusting their expectations for the ballpark and they seemed ignorant of it being a bad idea for him to not be platooned. Pena should have been a solid major leaguer during that whole period, but the Tigers didn’t support him. In 2005, when they stopped hitting him against southpaws and eventually barely used him, he hit .254/.355/.493 against RHP. His 53 horrible PA against LHP, when he struck out 26 times, torpedoed those numbers. So they made the decision not to give him reps against southpaws and then misjudged him by looking at his overall numbers when they needed more salt. They also neglected his performance in AAA completely, where he was smacking the ball. My analysis might be a little off since I haven’t done the double splits, and maybe it was only after he was recalled that they stopped having him hit LHP.

I think I was willing to accept that he was better than the numbers would indicate by themselves last season because I was willing to discount his lackluster showing in Columbus to an extent, and he did not hit poorly while there.

FWIW, I think Pena may also have been miscoached by Detroit, a la Big Papi’s Minnesota years. I don’t have any evidence of it, though.

Shelton was a really good hitter all the way up the ladder, and I don’t know if he was injured or benched in 2004, but it kept him out of the limelight, so his very good 2005 was overlooked until his terrific April 2006. He didn’t really have a breakout, IMO - it just felt that way, and that feeling was compounded when he just completely lost it (and the it obviously includes some luck) after the first month. He didn’t recover much in 2006 or 2007, and if there is a phenomenon I think may have merit that would be really hard to study, it is that good players sent down to AAA after a perceived slump by an organization that seems to have really soured on them maybe don’t work as hard or are not confident enough to play up to their ability. (Same as with Pena.) Whether Shelton is going to bounce back with the Texas organization I don’t know. Obviously, it is not discouraging that he’s crushing the ball in Oklahoma.


#52    fifth of      (see all posts) 2008/04/29 (Tue) @ 23:12

OK, I checked Pinto’s site for more on Pena. He was sent down in 2005 with 26 PA against LHP (one was an IBB, his only walk against them) with 13 K, and 124 PA against RHP. Maybe my point about reps against LHP has some validity, and maybe not. His BABIP against the RHP was .247, and he was 2 for 10 on BIP against LHP. He’s generally ~league average in BABIP.

After he was recalled, he was .311/.381/.689 against RHP in 118 PA, and in his 27 PA against LHP he had 13 K, no BB or HBP, 3 HR, a double, and a single.

I’ve studied dynamics of LHB relative to how often they face LHP (including how often the LHP is a reliever, quality of LHP, and some other things). I’ve shelved most of it for the time being since there are way too many things I haven’t figured out how to account for.

Pena did have a decent amount of PA against LHP at Toledo that year, and did not hit them well.

And, to be clear, the reason to look at stuff like this is to understand his performance in its context better, not to draw conclusions from the actual small amounts of PA in the discussion. We don’t want to throw out his PA without the platoon advantage, and we don’t want to neglect that pinch-hitting for your platooned LHB has negative externalities and a penalty for the PH.


#53    fifth of      (see all posts) 2008/04/29 (Tue) @ 23:19

And, to tie it all together…

I think Greg’s question gets exactly at my point. When you see something like a breakout, it’s a great chance to examine the past in more depth. That’s the kind of analysis that I like. Pointing out how different the breakout is from the past is old news and doesn’t tell us much. But using the basis of the breakout to investigate whether there were overlooked aspects of performance (like an SP facing the order a third time more or less frequently or getting 22 starts at home and 11 on the road, or like an LHB having more PA with the platoon advantage) in the past that suggest further adjustments to our expectations? Man, I love doing that. I had no idea Corey Patterson was making steady improvements in his K rate until DC pointed it out, and though I didn’t care about his K/PA in 2008, it was the 2008 K/PA that got Dave looking.

I hope this is helpful.


#54    fifth of      (see all posts) 2008/04/29 (Tue) @ 23:29

Key data I left out of #52 above:

PA against RHP, 2003-4: 68.8%
PA against RHP, 2007: 73.4%


#55    MGL      (see all posts) 2008/04/29 (Tue) @ 23:46

“I’ve got a quantitative question for Fifth of, and maybe MGL as well: given that small sample sizes can be misleading, how do you (asking your personal outlook) quantitatively make the transition from attributing breakout performance to random chance to deciding that it must be real?”

Greg, I have discussed this before.  There is no such thing as performance (ANY performance, including so-called “breakout”) being “real or not.” I would repeat that if it would make any difference.  That is a fiction!

There is also no such thing as “short term performance being misleading” any more than Stephen Hawking’s discussion of the origin of the Universe would be “misleading” in front of a bunch of realtors.  It is only “misleading” because they don’t know what the heck he is talking about, and if they so desire to figure it out (as opposed to just taking a nap), they are probably going to screw it up, and therefore they might think that what he said was “misleading”.

We have an estimate of a player’s true talent at a given point in time. That estimate changes (of course it could also stay the same) as we get more sample performance.  Period.  End of story.  A player’s true talent is what the market guys (yeah, the yappers) call a “moving average.”

The idea of performance being real, or not real, etc. is just semantics and rhetoric.  If there is anyone that can do better than a Marcel model simply updating a player’s projection as we get more data, I have yet to meet them or hear of them.

There have been a lot of studies on “breakout performance.” It is easy to study.  Just look at any group of players in history who have had a breakout performance, using any criteria you want.  Then look at how they performed in the future (preferably near future).  That is THE answer you are looking for. What you will find is that future performance is simply the Marcel of all past performance, including breakout periods, exceedingly poor performance (there is no way to KNOW that a player is done), and everything in between.  Make the breakout as large or as long as you want and you still will find that any future performance is more or less a Marcel.

So given that you know that the best estimate of a player’s performance going forward (and hence, his true talent) is simply a weighted average of all past performance, regardless of what kinds of patterns exist in that past performance, how would YOU answer your own question?

One caveat.  Technically, the EXACT weighting of past performance depends a little on the age of the player and other things, but for all practical purposes, a basic Marcel works just fine.

And when I say “Marcel” I mean an age-adjusted one.  A 26-year-old player with past performance of +10, +8, +6 is going to have a substantially different projection than a 36-year-old player with the same recent history.  I am also not opposed to some people being better able to infer unique aging curves for certain players or certain types of players, even though most projection systems just use a generic aging curve.

So all of these articles about what is real and what is not are all BS, yapping, whatever you want to call it.  Give us a player’s projection as of today, and that’s all I need to know.  Period.  End of discussion.  No need to waste all that print or cyber space.
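For readers who want to see the arithmetic, here is a minimal Python sketch of the kind of weighted average described above. The 5/4/3 season weights are the ones Tangotiger mentions further down the thread. The 1200 PA of league-average ballast, the function name `marcel_style_rate`, and the sample numbers are assumptions for illustration, and the age adjustment is left out entirely.

```python
def marcel_style_rate(seasons, league_rate, weights=(5, 4, 3), ballast_pa=1200):
    """Project a rate stat from up to three past seasons.

    seasons: list of (successes, plate_appearances), most recent first.
    ballast_pa: PA of league-average performance added as regression
                (an assumed value, not taken from this thread).
    """
    weighted_successes = 0.0
    weighted_pa = 0.0
    for weight, (successes, pa) in zip(weights, seasons):
        weighted_successes += weight * successes
        weighted_pa += weight * pa
    # Adding league-average ballast pulls small samples toward the mean.
    return (weighted_successes + ballast_pa * league_rate) / (weighted_pa + ballast_pa)


# Hypothetical hitter: times on base and PA for the last three seasons.
history = [(180, 600), (170, 580), (150, 550)]
projection = marcel_style_rate(history, league_rate=0.330)
print(round(projection, 3))
```

The projection lands between the player's weighted past rate and the league rate, and a player with no history at all simply gets the league rate.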


#56    MGL      (see all posts) 2008/04/29 (Tue) @ 23:55

You want to know what happens when a player’s K rate substantially improves (in 25 games) after 5 or 10 years in the majors?  Easy enough to answer.  Just look at all players in history who have done about the same thing and then look at the rest of the season for all of them.  You’ll probably have 5,000 or 10,000 PA to work with.

Guess what you will find?  Their K rate for the rest of the season will be a Marcel of their entire career K-rate, including the 25 games of improvement (although those 25 games won’t make much of a dent, I can guarantee that).
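The study described here needs historical data, but the selection effect behind it can be sketched with a toy simulation. Everything below is an assumption for illustration: the talent distribution, the 100 PA "hot start", the 500 PA "rest of season", and above all the premise that each simulated player's true K rate never changes.

```python
import random


def simulate(n_players=2000, early_pa=100, late_pa=500, seed=1):
    """Return (true_k_rate, early_k_rate, late_k_rate) for each player."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        # Assumed spread of true K rates, clipped to a plausible range.
        talent = min(max(rng.gauss(0.20, 0.04), 0.05), 0.40)
        early = sum(rng.random() < talent for _ in range(early_pa)) / early_pa
        late = sum(rng.random() < talent for _ in range(late_pa)) / late_pa
        players.append((talent, early, late))
    return players


def summarize_breakouts(players, fraction=0.05):
    """Average the players with the lowest early K rates (the 'breakouts')."""
    ranked = sorted(players, key=lambda p: p[1])
    group = ranked[: max(1, int(len(ranked) * fraction))]

    def mean(index):
        return sum(p[index] for p in group) / len(group)

    return {"talent": mean(0), "early": mean(1), "late": mean(2)}


summary = summarize_breakouts(simulate())
print(summary)
```

In this toy setup the "breakout" group's later K rate sits close to its true talent and well above its early rate, even though nobody's talent changed. It does not show that real players never change, only that selecting on a hot stretch guarantees some regression.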

Does that mean that Corey has not changed his true talent level beyond that?  No.  Does that mean that no players change their true talent K rates (or any other rates or skills) beyond their Marcel’s?  No.  But who cares?  All we can ever do in baseball forecasting is give our BEST estimate of anything.  If someone wants to say, “yeah, but it is possible that...” I have no problem with that because it is true.

If I pull a coin out of my pocket and say that there is a 50/50 chance that if I flip it, it comes up heads, does that mean that it will come up heads?  No.  Does that mean that it will come up heads 5 out of 10 times?  No.  Does that mean that my coin isn’t biased, and that it won’t come up heads 52% of the time in a trillion flips?  No, no and no.  All I can do is give you the best and truest estimate I can, which is all we can do for players and teams.
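The coin example can be put in numbers with a short Python sketch. The helper names are hypothetical, and the "two standard errors" threshold is just one common rule of thumb, not something from this thread.

```python
from math import comb, sqrt


def prob_exact_heads(flips, heads, p=0.5):
    """Binomial probability of exactly `heads` heads in `flips` flips."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


def flips_to_separate(p_true, p_assumed=0.5, z=2.0):
    """Flips needed before the gap between the two rates is about z standard errors."""
    gap = abs(p_true - p_assumed)
    sd_per_flip = sqrt(p_assumed * (1 - p_assumed))
    return (z * sd_per_flip / gap) ** 2


print(prob_exact_heads(10, 5))   # chance a fair coin gives exactly 5 of 10
print(flips_to_separate(0.52))   # rough flips needed to notice a 52% coin
```

A fair coin gives exactly 5 heads in 10 flips only about a quarter of the time, and it takes on the order of 2,500 flips before a 52% coin stands out from a fair one at two standard errors.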

Does that mean that someday we won’t have models that give us better estimates of players’ future performance?  No.  Does that mean that we won’t someday figure out, to some extent, which breakout performances indicate a true change in talent and which ones are purely random fluctuations (most likely both of course)?  No.

Etc.


#57    Greg Rybarczyk      (see all posts) 2008/04/30 (Wed) @ 00:33

MGL #55

Let me see if I understand what you’re telling me.

Take any large group of players and project their performance forward using Marcel, which is an age-adjusted model based on past performances (so, an empirical model).  Because it is a large group, and due to random chance, some players will perform better than their Marcel, and some worse, but most near their Marcel projection.  In fact, if a player “breaks out” or “collapses” in the sense the words get used, his true talent is not really changing in any way other than the Marcel curve projects, it is just that he’s the guy on the tail of the distribution.  Basically, all players’ true talent stays “on the tracks”, and any deviation from it is explained by random chance.

Did I get that right?  I want to pause here and see if I’m on the correct scent here, because I am not sure I buy into this completely, and I want to dig deeper…


#58    Greg Rybarczyk      (see all posts) 2008/04/30 (Wed) @ 00:36

Hmmm.. now I think maybe what you’re saying is that whether any particular player’s talent stays “on the tracks” or not is irrelevant, as you won’t be able to know if the true talent is moving or if it’s just noise, and therefore you can’t predict off it.  Is that right?


#59    tangotiger      (see all posts) 2008/04/30 (Wed) @ 05:58

The sample data won’t tell you if he reached a new true talent level.  The only way to know is if you have information BEYOND the sample performance data, like if Mike Scott has learned a new pitch (or is greasing it), if someone is hurt, if someone has decided to stop throwing 15 fastballs in a row, etc, etc.

And EVEN THEN, you really don’t know the impact of what all that is doing to him.


#60    Peter Jensen      (see all posts) 2008/04/30 (Wed) @ 08:01

Tango - Aren’t you assuming that a certain percentage of players ARE reaching new talent levels when you rate the most recent seasons more heavily in the Marcels, but that the monkey’s level of analysis can’t determine which players?


#61    Greg Rybarczyk      (see all posts) 2008/04/30 (Wed) @ 11:14

Tango #59

I think this is where we are right now.  Multitudes of possible factors, with lots of noise obscuring the true talent signal.  Not to mention too many output signals (it’s easier in manufacturing when you can just watch one value for departure from control via SPC)…

Looks like we’ll have to endure a lot more “Prince Fielder lost his power because he became vegetarian” stories before we figure it out (if ever)…


#62    Tangotiger      (see all posts) 2008/04/30 (Wed) @ 12:52

Peter/60: right, there’s no question that players are achieving new talent levels.  In fact, they have a new talent level on a second-by-second basis!  We are human after all.

But, I can’t pick out who is, and so, I am basically giving a blanket probability based on recent performance.  But, that blanket probability is very nuanced, since a 5/4/3 weighting is not that much different from a 4/4/4 weighting. Pitchers get 3/2/1 because they have a better chance to influence a change to their talent levels (either mechanically, or physically by being hurt).
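To see how close those weightings are in practice, here is a small Python sketch applying each to MGL's +10, +8, +6 example from earlier in the thread. Treating +10 as the most recent season is an assumption, and no regression or age adjustment is applied.

```python
def weighted_mean(values, weights):
    """Weighted average of per-season values, most recent season first."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


recent_first = [10, 8, 6]  # assumed order: most recent season first
schemes = {"5/4/3": (5, 4, 3), "4/4/4": (4, 4, 4), "3/2/1": (3, 2, 1)}
results = {label: weighted_mean(recent_first, w) for label, w in schemes.items()}
for label, value in results.items():
    print(label, round(value, 2))
```

For this improving trend the three schemes give roughly 8.3, 8.0, and 8.7, so even the steeper pitcher weighting moves the estimate by well under a run relative to treating all three seasons equally.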


#63    MGL      (see all posts) 2008/04/30 (Wed) @ 16:00

Here is the funny thing about “breakouts” (and whatever the opposite is), to me at least.

What is the “magic” of a “season?” We are now 25 games or so into a season.  All we hear about is how a player has done so far this season and why that is.

What about a breakout “half-season?” What about a breakout “3/4 of a season.” Why would a breakout in the first 25 games of this season have any more or less significance than a breakout in the last 14 games of last season plus the first 11 games of this season?  Or at the end of the season, what about a “breakout” for the last 25 games?  Or last 47 games?  What about the last half of last season plus the first half of this season?

Get my point?

In fact, if a “breakout” indicated that a player has reached some new level of talent, for some reason, wouldn’t it be more likely to occur towards the middle or end of a new season while he was working with the coaches and new techniques and actually training?  Wouldn’t the beginning of the season be one of the LEAST likely times to have a breakout performance?  What, did the player have an epiphany in the off-season while sitting on the couch watching football?  If anything, what we should focus on at the beginning of a season is players who are doing exceptionally poorly.  Wouldn’t that make more sense?  Isn’t it more likely that the off-season affects a player by him being 6 months older than the end of the last season (for players past their peak of course), and possibly being fat and out of shape?  I am exaggerating of course to make a point.

I only bring all of that up to illustrate the artificiality and silliness of using the beginning of a new season and today as the end points to focus on in terms of breakout or collapse.

From now on, I decree that no web site, newspaper, or radio or TV show, is allowed to say or display current, in-season stats.  They must show either one of two things:  The last 346 games running stats for all players, or, in the alternative, they must take the last 162 games and show us the combined stats from a randomly chosen 93 of those games.
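A running stat over the last N games, ignoring where one season ends and the next begins, is easy to sketch in Python. The function name `trailing_rate` and the per-game `(hits, at_bats)` layout are assumptions for illustration.

```python
from collections import deque


def trailing_rate(games, window=346):
    """Rate over the most recent `window` games, ignoring season boundaries.

    games: iterable of (hits, at_bats) per game, in chronological order.
    """
    recent = deque(games, maxlen=window)  # keeps only the last `window` games
    hits = sum(h for h, _ in recent)
    at_bats = sum(ab for _, ab in recent)
    return hits / at_bats if at_bats else 0.0


# Hypothetical log: 100 hitless games followed by a 10-game hot streak.
log = [(0, 4)] * 100 + [(2, 4)] * 10
print(trailing_rate(log, window=10), trailing_rate(log, window=20))
```

The same hot streak looks like a .500 hitter over the last 10 games and a .250 hitter over the last 20, which is the point about arbitrary end points.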

The whole idea of a “breakout” or “collapse” “season” and whether they are “real” or not is so silly that this is the last time I will comment on it, with no offense to the other participants in this discussion, to whom I mean no disrespect whatsoever.


#64    tangotiger      (see all posts) 2008/04/30 (Wed) @ 17:51

"The last 346 games running stats for all players”

This is basically how the golf and tennis rankings work.  Even though they have a definite stop to their “seasons”, they still keep a running total of the rankings that go over multiple seasons.

Starting over is really just a thing to keep things fresh, to fool everyone into giving people hope that some players are having a breakout.


#65    Greg Rybarczyk      (see all posts) 2008/04/30 (Wed) @ 18:43

Sept. 1998: Cubs Announcer:  “And Sosa rips one onto Waveland Avenue!  That is his 100th homer in his last 346 games...”

Apr. 1986: Red Sox announcer #1:  “Strike three looking, and Phil Bradley heads back to the dugout.  And that makes 239 strikeouts for Clemens in his young career.  Say, seems like a lot of Mariners have been striking out tonight, how many is that tonight?”

Red Sox Announcer #2: “Shut up, it’s 239 in his first 264.2 innings.  Stop talking about tonight.”

Present Day: Diamondbacks Announcer #1: “So, what do you think of this guy Scherzer, he’s got some nasty stuff, eh?”

Diamondbacks Announcer #2: “How should I know.  Ask me in 2010.”

:)


#66    tangotiger      (see all posts) 2008/04/30 (Wed) @ 18:56

I’d love it.

You definitely don’t hear about Federer getting 10 points out of his last 14.  And when they talk about having 32 unforced errors, it’s a quick blurb on the screen, and they don’t yap it.

The downfall of the baseball broadcast is precisely because they have numbers in front of them.  You also don’t get this b-s much in NHL either.  They flash stuff, but they don’t yap it.

Stopping the yapping of meaningless numbers would do wonders for a broadcast.


#67    MGL      (see all posts) 2008/04/30 (Wed) @ 19:21

Tennis broadcasters (some of them at least) are pretty bad without yapping about numbers (stats).  I don’t know how many times I have heard after 1 set or so how “so-and-so” looks tired, just doesn’t have it tonight, does not match up well against their opponent, and then goes on to trounce the other player.

It is (very powerful) human nature to want to make order out of disorder, and to be obsessed with, and of course, greatly overvalue, recent performance.  We see that in all facets of life.

Here is a good freakonomics thing:

I just read that there is evidence that lots of the FLDS kids who were confiscated in Texas had (past) broken bones, and that some of the boys were sexually and physically molested, as well as the girls.

Of course that means we should be outraged and glad that the state took these kids away from their parents, right?

First of all, I do NOT necessarily believe the facts that are being reported by the state, and no one should.  (You mean the government lies sometimes?)

More importantly though, if you took a random group of 500 some odd children, especially from a rural area (nothing against rural areas), how many do you think would have evidence of broken bones (treated or not) and how many do you think would have been physically or sexually abused?

Without that number (which I am assuming is pretty high), whatever numbers are even true with the Texas kids mean nothing!

If we used abuse (physical, emotional, sexual, nutritional, educational, etc.) as a justification for taking kids away from their parents, how many parents do you think would have any kids?  I’d say not many.


#68    Terry      (see all posts) 2008/04/30 (Wed) @ 21:30

#67: A vast majority unless those terms are defined as inclusively as unreasonably possible......

