THE BOOK--Playing The Percentages In Baseball


Sampling

Thursday, January 26, 2012

AL v. NL in 2011

By , 04:36 AM

It is generally accepted in the sabermetric community that the AL has been a better league than the NL, at least for the last several years.  This is evidenced by the fact that the AL has a large advantage in IL games.  At least some of that edge could be due to something other than overall “talent”, but that is not likely: several people, including myself, have found little or no inherent advantage to the AL in IL games.  (For example, the NL teams do not have any DH’s, so they have to juggle their lineup in AL parks; on the other hand, in NL parks, AL teams have to sit their DH’s or juggle their lineup, perhaps putting a bad defender - their DH - in the field; the AL pitchers typically are poorer hitters than the NL pitchers; etc.)

Read More

Tuesday, November 22, 2011

Selective end points AND data mining AND publishing bias rearing their ugly heads again…

By , 11:29 PM

Let’s see how many posts it takes for the geniuses on BBTF to figure this one out.  So far 9 and counting…

Anyway, here is the link:

http://www.thegoodphight.com/2011/11/21/2485197/phillies-citizens-bank-park-not-a-hitters-haven

to an article which tells us that CBP has played almost neutral for the last 4 years, therefore it is now a neutral park, as opposed to the first 4 years when it was an extreme hitters’ park (around 1.07).

Let’s forget for a second how a park can all of a sudden change its true PF’s (it can’t other than by changing other PF’s in the league and even then it won’t change much - of course the “effective” PF can change - a little - with weather and with different players).

Instead, let’s do this thought exercise:

You have 30 parks with a true PF of x, y, z, etc.  I am telling you that they never change (which is actually reasonably true, as I indicated above, barring a remodel of course).  We track the observed (sample) PF’s for 8 years.  What are the chances that in the last, say, 3, 4 or 5 years (you get to choose the end points) some park will show an observed PF that is quite different than its true PF AND/OR quite different than the observed PF in its first 3, 4, or 5 years?
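The thought exercise can be run as a quick simulation. The sketch below is only illustrative: the spread of true PF's (0.04), the year-to-year noise in an observed PF (0.04), and the 0.05 gap that counts as "quite different" are all assumed values, not estimates from real park data.

```python
import random
from statistics import mean


def max_gap(observed, last_k_options=(3, 4, 5)):
    """Largest |mean(last k years) - mean(earlier years)| over the allowed end points."""
    n = len(observed)
    return max(
        abs(mean(observed[n - k:]) - mean(observed[:n - k]))
        for k in last_k_options
    )


def simulate(n_parks=30, n_years=8, true_sd=0.04, yearly_sd=0.04,
             threshold=0.05, trials=2000, seed=1):
    """Return (share of trials where SOME park looks 'changed' when the analyst
    may pick the end points, same share with a fixed 4/4 split)."""
    rng = random.Random(seed)
    hits_cherry = hits_fixed = 0
    for _ in range(trials):
        cherry = fixed = False
        for _ in range(n_parks):
            true_pf = rng.gauss(1.0, true_sd)  # assumed: never changes
            obs = [rng.gauss(true_pf, yearly_sd) for _ in range(n_years)]
            if max_gap(obs) >= threshold:
                cherry = True
            if max_gap(obs, last_k_options=(4,)) >= threshold:
                fixed = True
        hits_cherry += cherry
        hits_fixed += fixed
    return hits_cherry / trials, hits_fixed / trials


if __name__ == "__main__":
    cherry, fixed = simulate()
    print(f"some park 'changed' (pick your end points): {cherry:.2f}")
    print(f"some park 'changed' (fixed 4/4 split):      {fixed:.2f}")
```

Because the 4/4 split is one of the allowed end points, letting the analyst choose among 3, 4, or 5 years can only raise the chance of finding a park that appears to have changed.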

IOW, what can we conclude about the true PF of CBP?  Not much other than its true PF is likely the un-weighted average of the observed PF over the last 8 years, regressed toward some mean (of a similar park, dimension, weather, altitude-wise, etc.).  If you want to weight more recent years slightly more than past years, I don’t have much of a problem with that, although I don’t think that any weighting is necessarily appropriate…
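A minimal sketch of that estimate, assuming a simple "add prior years" form of regression toward the mean. The regression weight (`prior_years=4.0`), the prior mean of 1.00, and the yearly PF values are hypothetical and chosen only to illustrate the arithmetic.

```python
def regressed_pf(observed_pfs, prior_mean=1.00, prior_years=4.0):
    """Un-weighted average of the yearly observed PF's, regressed toward prior_mean.

    prior_years is an assumed regression constant: the number of years of
    'similar park' data that the prior is treated as being worth.
    """
    n = len(observed_pfs)
    observed_mean = sum(observed_pfs) / n
    return (n * observed_mean + prior_years * prior_mean) / (n + prior_years)


if __name__ == "__main__":
    # Hypothetical 8 years: four extreme years followed by four neutral years.
    yearly = [1.07, 1.07, 1.07, 1.07, 1.00, 1.00, 1.00, 1.00]
    print(round(regressed_pf(yearly), 3))
```

The point of the sketch is that all 8 years feed the estimate; the last 4 years do not replace the first 4.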

(17) Comments • 2011/11/28 • Sabermetrics, Sampling, Statistical_Theory

Friday, October 21, 2011

A Warning from MGL!!

By , 11:32 PM

Now that I have your attention…

I just read a decent article on FG written by David Cameron about TLR’s bullpen choices in the 9th inning of game 2, which we have discussed on this blog.

In the comments section, I kind of lectured Dave, who has done many excellent analyses and written many fine articles on FG and other sites and publications, about using samples properly. I think it is important enough to warrant a thread on this blog.  Essentially I said (I’ll reprint some things from my comments on his article):

I am very uncomfortable when an analyst gets to choose which sample he wants to present to support his point or his opinion. This year only? Last 2 years? 3 years? Career? Lately, as in last half season? The last 10 games? You should not be allowed to do that, for obvious reasons (cherry picking your evidence makes your arguments intellectually dishonest, or misleading at best).

For example, Dave said this:

“While Hamilton’s strikeout rate against LHPs jumps to 22.1%, Rhodes K% against LHBs this year was just 16.1%. His career numbers are much better, but he’s not the same pitcher he was a few years ago, and Hamilton had hit an outfield fly against him the night before.”

Yes, he is not the same pitcher, but if this year his K% was higher than his career numbers, Dave would probably be quoting us his career numbers (heck, I would too if I had the choice!). The analyst should NOT have the choice. He should always be quoting a projection which is some kind of weighted career average or whatever the accepted standard is!

And for the last part of that last sentence, about Hammy hitting a fly ball the night before, David should get immediately thrown into the MGL jail. I can’t believe he even said that in that context. Shame on you Dave!

I followed up with this:

If you are allowed to split the samples up any way you want, you can probably support just about any thesis from one end of the spectrum to the other. Which is why a standard must be used. As in all scientific fields, in sabermetrics there is a generally accepted standard in the industry – weighted career (or to simplify, last 3 or 4 years).

That is not an arbitrary method, mind you. When we are trying to answer questions such as, “Who should be used in an upcoming situation,” we are essentially asking the question, “How do we expect so and so to perform at some time in the future, in most cases, as in this, the immediate future, such as in the next PA or tomorrow?”

To do that, again, the accepted standard in the industry, after years of very thorough research and analysis, is to use a “Marcel-like” projection for component rates, GB and FB frequency, platoon splits, etc. It is also the accepted standard to ignore things like clutch, home/road splits (other than the normal one of course), day/night, pitcher/batter historical matchups, hot and cold streaks, etc. Not because we KNOW that these don’t exist, but because we find, again, after years of thorough and extensive research, that even if they exist, they have little predictive value.

So I implore all analysts, including David, who is a fine one, to use these standards when presenting a thesis. If time or other constraints exist, which I understand, then some semblance of these standards should be used, or some qualifications issued, rather than disingenuously using one year, half year, or other similarly small and/or misleading samples (such as un-weighted career) in order to support an argument.
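To make “Marcel-like” concrete, here is a rough sketch of a weighted, regressed rate projection. The 5/4/3 season weights and the 1200 PA of league-average regression follow the commonly described Marcel recipe for hitters and are used here as assumptions; the sample season lines and the league rate are hypothetical, not any real player's numbers.

```python
def marcel_like_rate(seasons, league_rate, weights=(5, 4, 3), regress_pa=1200):
    """Project a component rate (e.g. K per PA).

    seasons: list of (events, pa) tuples, most recent season first.
    weights: assumed weights, heaviest on the most recent season.
    regress_pa: assumed amount of league-average PA added as regression.
    """
    weighted_events = sum(w * events for w, (events, _) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    return (weighted_events + regress_pa * league_rate) / (weighted_pa + regress_pa)


if __name__ == "__main__":
    # Hypothetical pitcher vs. LHB: (strikeouts, PA), most recent season first.
    seasons = [(20, 100), (30, 100), (30, 100)]
    projection = marcel_like_rate(seasons, league_rate=0.20)
    print(f"this year only: {20 / 100:.3f}")
    print(f"projection:     {projection:.3f}")
```

The single-season rate and the projection differ, and the analyst does not get to choose which seasons count; the weights are fixed before looking at the player.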

(4) Comments • 2011/10/22 • Sabermetrics, Sampling, Statistical_Theory

Tuesday, October 18, 2011

Times through the order with the 9th inning removed…

By , 08:56 PM

In light of the new research presented on this blog which suggests that when starters pitch the 9th, the score tends to be lopsided in favor of the pitching team and wOBA tends to be lower than expected given the true talent of the pitchers and batters (and other things that affect offense), I have recalculated the “times through the order” wOBA for both day and night games, with indoor games not in the sample, removing all 9th inning data.

In The Book, this is what we presented:

Times through the order   expected   actual

1                         .353       .345
2                         .353       .354
3                         .354       .362
4                         .353       .354

As you can see, the more a starting pitcher faces the lineup the better those batters do, due to familiarity or pitcher tiring, or both (or some other reason or reasons).  However, the 4th time through the order, the trend seems to stop and batters actually perform the same as the second time through the order.  This seems to make no sense.

We have speculated two things that might be causing the 4th time through the order depression:  One, 2/3 of all games are at night and it is colder the 4th time through the order.  Two, and more recently, the 4th time through the order sometimes happens in the 9th inning, and as we have just found, 9th inning wOBA versus starters gets depressed because the score is usually lopsided in favor of the pitching team.

So I reran the numbers separately for day and night games (and ignored indoor games), and I also ignored the 9th innings.  The wOBA is adjusted for the pool of pitchers and batters in each bucket.  The first row is 1st time through the order in the 1st inning only.  The second row is 1st time through the order in all other innings.  We see a real depression in the first inning.  Although the data is for home and road teams combined, it is actually the road team batting that is heavily depressed in the first inning for some reason. Either home team pitchers are already used to the mound, the road batting team starts out “cold” or there is some other reason or reasons.

Night games

1 (1st inn) .339
1 (other inn) .341
2 .352
3 .359
4 .350

So, again, we see a depression the 4th time even though we are not using 9th inning data.

Day games

1 (1st inn) .330
1 (other inn) .341
2 .349
3 .357
4 .367

Here we see a large jump from the 3rd to the 4th time.  It does appear that either temperature or pitcher tiring during the day (but not so much at night), or perhaps shadow issues during the day, greatly affect the “time through the order” penalty…

(12) Comments • 2011/10/21 • Sabermetrics, Pitchers, Sampling, Statistical_Theory

Sunday, March 20, 2011

Why MLE’s are a mess…

By , 02:38 AM

I’ll warn you in advance (what other kind of warning is there?) - this is a long post and one that is hard to follow…

If I look at park and league adjusted AAA stats and compare them to MLB stats for the same players, weighted by the lesser of the two PA (e.g., if a player had 300 PA in AAA in a certain year and 100 PA in MLB in a certain year, I use the 100 PA to weight each of those stats, AAA and majors), I get this:

Read More
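The weighting described above (each player's AAA and MLB lines both weighted by the lesser of his two PA totals) can be sketched as follows. The player lines are hypothetical and the function name is only illustrative; the full post's actual calculation is not shown here.

```python
def min_pa_weighted_averages(players):
    """players: iterable of (aaa_rate, aaa_pa, mlb_rate, mlb_pa).

    Each player's AAA and MLB rates are both weighted by min(aaa_pa, mlb_pa),
    so the two averages are built from the same weights on the same players.
    """
    total_weight = aaa_sum = mlb_sum = 0.0
    for aaa_rate, aaa_pa, mlb_rate, mlb_pa in players:
        weight = min(aaa_pa, mlb_pa)
        total_weight += weight
        aaa_sum += weight * aaa_rate
        mlb_sum += weight * mlb_rate
    return aaa_sum / total_weight, mlb_sum / total_weight


if __name__ == "__main__":
    # Hypothetical players: (AAA wOBA, AAA PA, MLB wOBA, MLB PA)
    players = [(0.350, 300, 0.300, 100), (0.330, 50, 0.320, 400)]
    aaa, mlb = min_pa_weighted_averages(players)
    print(f"AAA {aaa:.3f}  MLB {mlb:.3f}  ratio {mlb / aaa:.3f}")
```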

(19) Comments • 2011/03/21 • Sabermetrics, Minors_College, Sampling

Saturday, October 02, 2010

Two kinds of luck

By Tangotiger, 09:55 AM

There are two kinds of luck: pure luck, and talent-driven luck (or “make your own luck”).  Let me describe the difference.

Each of us, in whatever actions we are performing (throwing a pitch, driving a car, typing) has a talent level.  We’ll even say that, at a point in time t, we have a fixed true talent level TT.  TT(t) if you will.  When you apply that TT(t), you will NOT OBSERVE TT(t).  That’s because we are not automatons.  We are people.  What we WILL observe is some performance that, had we repeated those actions a million times, would have centered around TT(t), with a normal distribution around that centering point.  If you randomly pick out one of these points, this is talent-driven luck, or “make your own luck”.  It counts as something you did because you are the causative agent.  It doesn’t REPRESENT you, but it is an INDICATOR of you.  Given enough of these indicators, it will represent you, with a certain uncertainty level (the more indicators the less uncertainty).
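The "more indicators, less uncertainty" point can be illustrated with a small simulation. This is a sketch under stated assumptions: each observed performance is drawn from a normal distribution centered on TT(t), and the talent level (0.300) and noise level (0.050) are made-up values.

```python
import random
from statistics import mean, pstdev


def spread_of_estimates(true_talent, noise_sd, n_indicators, repeats=2000, seed=7):
    """Standard deviation of the average of n_indicators observations, each
    drawn from a normal distribution centered on true_talent."""
    rng = random.Random(seed)
    estimates = [
        mean(rng.gauss(true_talent, noise_sd) for _ in range(n_indicators))
        for _ in range(repeats)
    ]
    return pstdev(estimates)


if __name__ == "__main__":
    for n in (1, 4, 25, 100):
        spread = spread_of_estimates(0.300, 0.050, n)
        print(f"{n:>3} indicators: spread of the estimate = {spread:.4f}")
```

A single observation is an indicator with a wide spread; the average of many of them narrows toward TT(t), roughly in proportion to the square root of the number of indicators.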

Now, pure luck has nothing at all to do with your talent level.  You are struck by lightning.  You are a pitcher for the league’s worst offense.  You bet on double-zero.  You observe results to these external actions.  You are hospitalized, you have an 8-16 record (with a 2.74 ERA), you made a million dollars.  This has nothing to do inherently with you, even though you are the beneficiary or victim of these actions.

Suppose that in baseball, we only recorded runs scored or allowed for a game to determine the winner, but after that, we discarded the runs numbers, and only kept track of wins, and who was pitching in that game.  And suppose that pitchers always pitched complete games.  So, you can have someone with an 8-16 record, and we have NO IDEA how much of that was due to his true talent level, and how much of that was due to pure luck.  All we know is that he was a participant of those results.  We also know that half of the game is offense and half is defense, and that the pitcher has no influence on the offense.  That 8-16 record is loaded with uncertainty.  We have talent-driven luck and pure luck.

If this pitcher had a career 300-250 record, and he pitched for many teams, we now feel better.  We feel better because the sample size increased, and his talent level, and the luck from his talent level, are driving the record.  The pure luck, the noise, gets overwhelmed the more events you have.  If you get struck by lightning 10 times, maybe you have a lightning rod up your butt.  If you bet double-zero 5 times and win each time, maybe you have the fix in.

Now look at batted balls in play.  Suppose that we KNOW (god told us in her wisdom) that results from batted balls are almost entirely due to the talent level of fielders, and virtually none by the pitcher.  But, the pitcher is the agent that delivers the ball.  He rolled the dice.  The fielders are the ones that determine hits and outs (in this example).

Now take a more realistic example in which the pitcher has some complicity, as do his fielders.  And let’s say we know (god again, smart girl) that pitchers/fielders are equally responsible.  Like, for example, offense/defense being equally responsible for winning and losing.  But, like in the example earlier of us not knowing how many runs are scored or allowed and we just know how many wins and losses that the pitcher participated in, we have a similar situation in that we don’t know why a hit or out was recorded.  All we know is that the pitcher was there when 36% of BIP are hits, and that the pitcher was there when his team won 33% of its games.

What do we do with this pitcher?  What if we know how many runs his team scored?  What if we know how good his fielders actually were?  What if we know what his career BABIP or W/L record was instead, but we don’t know anything about that particular season?

We have talent-driven luck, and we have pure luck.  The first thing you have to decide is how you want to account for the pure luck in terms of apportioning responsibility to players.  A team wins 60 games in 162, and a pitcher has 27 wins in 35 decisions.  What do you do?  Now, you find out that this team happened to score 5 runs per game in his games, while scoring 2 runs per game in the other games.  Now what do you do?

These are not easy questions, and there are no easy answers.  It’s a question of philosophy.  You need to create your own personal framework to handle luck.  And then be consistent in that application.

(33) Comments • 2010/10/04 • Sabermetrics, Sampling, Statistical_Theory

Wednesday, December 02, 2009

Official does not mean correct, when it comes to scorers

By Tangotiger, 10:23 AM

I love the effort put in here.  When you see such wild inconsistency, with no second-level quality check, what does the data really mean?  Increasing sample size does not reduce bias, if you are aggregating on the thing you suspect bias on.

To me, it’s extremely disappointing that the NHL does not care enough about its data recording that they don’t have a more robust system in place.

Glove-slap:Hawerchuk.

(5) Comments • 2009/12/07 • Sabermetrics, Sampling, Other Sports, Hockey

Tuesday, September 15, 2009

Marcel of Joy

By Tangotiger, 03:32 PM

This is an interview that is different from most.  It gives you something to think about.

(0) Comments • Sabermetrics, Sampling

Wednesday, October 29, 2008

Introduction to the landscape of sabermetrics

By Tangotiger, 11:17 AM

Derek Carty gives us his take. 

The quickest explanation of sabermetrics was given by Theo Epstein early in his GM career, when he said that he sees statistics and scouting as two lenses of the glasses.  Unsaid by him is that the glasses are sabermetrics.  So, anyone who thinks of the choice as beer or nuts ignores the reality that the choice is beer and nuts.

Also, the more performance statistics you have, the less value you can place in scouting, because at some point under certain conditions, the reality of what actually transpired is more important in determining what will happen than what the scout thinks that player will be doing.  And vice-versa of course, if the performance numbers were based on small samples.

(0) Comments • Sabermetrics, Sampling

Monday, May 05, 2008

How can the inputs remain the same, but the outputs change?

By Tangotiger, 10:27 AM

Joe Sheehan points out that the “slash” data is the same, but run scoring is down:

AL           AVG    OBP    SLG    ISO    R/G
April 2008   .260   .334   .398   .138   9.04
April 2007   .255   .327   .404   .149   9.36

NL           AVG    OBP    SLG    ISO    R/G
April 2008   .256   .331   .404   .148   9.11
April 2007   .258   .332   .400   .142   9.31

Is that random variation, or is something else going on?  Taking a quick crack at it:

We have in Mar/Apr 2007 in MLB: .256 .330 .402
And this year: .258 .332 .401

That’s remarkably close.  The runs scored per 27 outs in each year: 3785 runs in 7490.1 innings, 4.55 runs per 27 in 2008.  3360 in 6670.1, or 4.53 runs per 27 in 2007. 
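For anyone who wants to check the arithmetic, here is a small sketch of the runs-per-27-outs calculation. It assumes the usual innings notation, in which the digit after the decimal point counts thirds of an inning (so 7490.1 means 7490 innings plus one out); the helper names are illustrative.

```python
def innings_to_outs(innings: str) -> int:
    """Convert innings in baseball notation ('7490.1' = 7490 and 1/3) to outs."""
    whole, _, thirds = innings.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def runs_per_27_outs(runs: int, innings: str) -> float:
    """Runs scored per 27 outs (one regulation game's worth of outs)."""
    return 27 * runs / innings_to_outs(innings)


if __name__ == "__main__":
    print(f"2008: {runs_per_27_outs(3785, '7490.1'):.2f}")
    print(f"2007: {runs_per_27_outs(3360, '6670.1'):.2f}")
```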

Huh?  What’s Joe talking about?

Here’s the data I’m using:
http://www.baseball-reference.com/pi/psplit.cgi?team=TOT&lg=ML&year=2007#dates-month
http://www.baseball-reference.com/pi/psplit.cgi?team=TOT&lg=ML&year=2008#dates-month

Either Joe misstated his facts, or Sean has a bug, or I’m reading something wrong.

I’ll let the Wisdom of the Crowd make the decision.

(45) Comments • 2008/08/19 • Sabermetrics, Sampling

Monday, February 11, 2008

How you can support just about any argument using silly statistics and logic…

By , 09:12 PM

This is from Chris Jaffe, no less, a baseball analyst.  While I have read plenty of his stuff and I recognize the name, I admittedly know little about him (and get him mixed up with the other Jaffe). This is also an example of how when you start writing for a (somewhat) mainstream web site or publication, you invariably develop a case of “I can write crap too, just like the rest of the guys (mainstream writers)...” (See my past comments about Keith Law.)

Read More

(26) Comments • 2008/02/16 • Sabermetrics, Sampling

Sunday, December 02, 2007

Do Baseball Insiders Really Understand Baseball (and Statistics)?

By , 01:39 AM

I did not really know how to title this entry, but…

Here is an article that, in my opinion, is a good example of how baseball “insiders” are woefully inadequate in understanding the confluence of baseball and statistics, such that it can and will lead to bad decision-making.

Read More

(6) Comments • 2007/12/03 • Sabermetrics, Sampling

Thursday, October 11, 2007

Why the Phillies, Cubs, Yanks, and Angels lost the DS

By , 12:44 AM

Actually, I’ll generalize to all teams that have lost any game or series throughout the history of baseball (and most other sports).  Their opponents probably outhit and/or outpitched them, likely outscored them, and definitely won more games than they did.  Oh, and several players on the losing teams had a bad game/series - worse than their regular season stats.  And the winning teams probably played with more heart, guts, guile, and confidence, and some of them were even teams of destiny.  Did I leave anything out?

(6) Comments • 2007/10/13 • Sabermetrics, Sampling

Friday, September 21, 2007

Eric Gagne

By Tangotiger, 04:08 PM

If we look at his 2002-2004 data, we see the following totals: 202 GB, 187 FB, 108 LD.  This year, he’s 52/54/32, which is almost exactly in line with his 2002-2004 performance.  Nate Silver points out the enormous flip in GB to FB ratio of Eric Gagne, between Texas and Boston this year. Excluding bunts, in Boston, he’s at: 14/20/13.  If you divide his 2002-2004 data by 10, you’d get this expectation: 20/18/11, which means he’s given up a couple more FB, a couple more liners, and a few fewer ground balls.  When your sample size is 50, that really means nothing. Of his 14 ground balls, batters are 6 for 14.  But again, that’s 14 PA.  Of the 20 FB, batters are 6 for 20 (all extra base hits).  Of the 13 liners, batters are 11 for 13.  He’s given up 2 more ground ball hits than he should have, a few more extra base hits than he should have and one more line drive hit than he should have.  In high-leverage situations though (LI of 1.8 or higher), opposing batters reached base 13 of 21 times, which is horrible.  But still, it’s only 21 PA.

All this to say that with some 70 PA, Gagne needs to be evaluated on his mechanics and pitch effectiveness, and not on the resulting batter performance.  Rereading Nate’s piece, he says exactly this, and he’s right:

There may be scouting evidence that Eric Gagne is not the same pitcher in September that he was in June. But there is little or no statistical evidence based on an informed reading of his numbers.
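A rough way to put numbers on "that really means nothing" is to scale the 2002-2004 mix to the size of the Boston sample and compare each count against a binomial standard deviation. This is a sketch, not a formal test: it treats the 2002-2004 mix as the known true rate and each batted ball as independent, both of which are simplifying assumptions. Because it scales to exactly 47 batted balls rather than dividing by 10, its expectations differ slightly from the 20/18/11 quoted above.

```python
from math import sqrt

BASELINE = {"GB": 202, "FB": 187, "LD": 108}  # 2002-2004 totals from the post
BOSTON = {"GB": 14, "FB": 20, "LD": 13}       # Boston, excluding bunts


def compare_to_baseline(baseline, sample):
    """For each batted-ball type return (expected count, binomial SD, z-score)."""
    base_total = sum(baseline.values())
    n = sum(sample.values())
    rows = {}
    for kind, count in sample.items():
        p = baseline[kind] / base_total  # assumed true rate
        expected = n * p
        sd = sqrt(n * p * (1 - p))       # binomial SD for n independent trials
        rows[kind] = (expected, sd, (count - expected) / sd)
    return rows


if __name__ == "__main__":
    for kind, (expected, sd, z) in compare_to_baseline(BASELINE, BOSTON).items():
        print(f"{kind}: expected {expected:.1f} +/- {sd:.1f}, "
              f"observed {BOSTON[kind]}, z = {z:+.2f}")
```

Under these assumptions, none of the three counts sits as far as two standard deviations from its expectation.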

(13) Comments • 2007/09/24 • Sabermetrics, Sampling

Tuesday, August 14, 2007

A fascinating study, worthy of some discussion I think…

By , 01:33 AM

Here is the Study

There is a discussion of said article, where I made some comments, on BTF.

(97) Comments • 2011/07/01 • Sabermetrics, Sampling, Statistical_Theory

Monday, July 02, 2007

Another article I have a problem with…

By , 07:12 PM

This time from a sabermetric web site.  Where are their editors?

Read More

(12) Comments • 2007/07/10 • Sabermetrics, Sampling

Wednesday, January 24, 2007

What does 17 at bats mean?

By Tangotiger, 03:31 PM

Abbott Katz, in the November 2006 issue of By The Numbers shows us that players who had exactly 17 at bats hit .171 from 1959-2005. 

Does it mean anything?  Obviously, if you only have 17 seasonal at bats, it means a lot.  It means you are a September callup, it means that you are on your last legs, it means you got hurt, it means that you did so badly that the manager doesn’t want to look at you.  It could mean a whole lot of things.  It might even mean that you suck.

In order to figure out more about what it means, you need to look outside the data from which you selected.  And that means, look at the data in the season before and after that selected season.  Which I will do right now:

Read More

(10) Comments • 2007/01/25 • Sabermetrics, Sampling

Wednesday, August 23, 2006

Selective Sampling - How NOT to Choose Players

By Tangotiger, 08:21 AM

Cy Morong takes a look at establishing the replacement level.  He says:

Read More

(17) Comments • 2006/08/24 • Sabermetrics, Sampling