THE BOOK--Playing The Percentages In Baseball

Data

Tuesday, May 01, 2012

BPro glossary

By Tangotiger, 09:48 AM

BPro overhauled its glossary, so it should be a good reference, especially for newbies.

I picked out a couple of entries just to see what they had. Since FIP is nothing more than a shorthand for DIPS, I think a good addition would be something like “FIP is a popular [ubiquitous?] shorthand for Voros’ DIPS construction”. Voros deserves the lion’s share of the credit for FIP. Not sure what the THT and Fangraphs glossaries say about FIP.

(11) Comments • 2012/05/02 • Sabermetrics, Data

Monday, April 30, 2012

Fangraphs hates nocturnals

By Tangotiger, 04:43 PM

Now, you can see player stats live!

(0) Comments • Sabermetrics, Data

Thursday, April 26, 2012

wOBA pitching splits now on Fangraphs

By Tangotiger, 01:03 PM

This is a great little addition by David, so kudos to him.

(0) Comments • Sabermetrics, Data

Thursday, April 05, 2012

Fangraphs split filter

By Tangotiger, 01:00 AM

David has now added age, which is fantastic.  Here for example are the best players aged 21-27, since 1969.

(1) Comments • 2012/04/05 • Sabermetrics, Data

Thursday, March 22, 2012

Negro Leagues

By Tangotiger, 04:00 PM

Data now housed under B-R.com.

(4) Comments • 2012/03/24 • Sabermetrics, Data

Saturday, March 10, 2012

Visual Marcels

By Tangotiger, 01:14 PM

Great job here!

One important note is that Marcel should be on the x-axis, and the actuals on the y-axis. So, when he lets you set the “minimum threshold”, say “30 HR”, that should be “30 forecasted HR”, not “30 actual HR”. You can’t select all the guys who were observed to hit at least 30 HR and then ask what the forecast was for those guys. By definition, any group that was observed to be high will include more good luck than bad luck (as a group, on average).
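Here is a small simulation of that selection effect. It is only a sketch under made-up assumptions: the talent and luck distributions, the league size, and the 30 HR cutoff are illustrative, and the "forecast" is simply each player's true talent rather than an actual Marcel.

```python
import random

random.seed(0)

# Hypothetical league: observed HR = true talent + random luck.
players = []
for _ in range(20000):
    talent = random.gauss(18, 6)   # assumed spread of true HR talent
    luck = random.gauss(0, 5)      # assumed single-season noise
    players.append((talent, talent + luck))

def group_means(rows):
    """Return (mean talent, mean observed) for a group of players."""
    n = len(rows)
    return (sum(t for t, _ in rows) / n, sum(o for _, o in rows) / n)

# Selecting on the observed result: the group is loaded with good luck.
by_observed = [p for p in players if p[1] >= 30]
# Selecting on the forecast (talent, in this toy setup): luck averages out.
by_forecast = [p for p in players if p[0] >= 30]

obs_talent, obs_actual = group_means(by_observed)
fc_talent, fc_actual = group_means(by_forecast)

print(f"selected on actual:   talent {obs_talent:.1f}, actual {obs_actual:.1f}")
print(f"selected on forecast: talent {fc_talent:.1f}, actual {fc_actual:.1f}")
```

In the group selected on actual HR, the observed average sits well above the talent average. In the group selected on the forecast, the two are close.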

Sounds like this guy and Fangraphs ought to get together on… something.

(0) Comments • Sabermetrics, Data

Monday, March 05, 2012

Marcel 2012

By Tangotiger, 01:15 AM

http://tangotiger.net/marcel/

(11) Comments • 2012/03/07 • Sabermetrics, Data

Friday, February 17, 2012

Testing catcher framing numbers

By , 04:21 AM

In this great article in BP a few months ago, Mike Fast described a method for estimating catcher framing performance using PITCHf/x data. He was generous enough to provide a complete database for all catchers in 07-11.

From those numbers I computed an estimate of each catcher’s framing true talent by simply taking his total observed numbers and regressing toward the mean (zero). As Mike suggests in the article, I did that by adding 4500 called pitches (about 75 called pitches per game, BTW) of league average framing (zero, of course). I did not do any weighting by year, age adjustments or anything like that. I just used the 4 year combined numbers that Mike provided. (BTW, I later learned that there was an error in Mike’s computations, so I multiplied his run values by .65, as per Mike.)
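For anyone who wants to see the regression step spelled out, here is a minimal sketch. The function name and the example catcher are hypothetical, and applying the .65 correction before regressing is my assumption about the order of operations. The 4500 pitches, the .65 multiplier, and the 75 called pitches per game come from the text above.

```python
REGRESSION_PITCHES = 4500   # league-average called pitches added (from the post)
CORRECTION = 0.65           # multiplier for the published run values (from the post)
PITCHES_PER_GAME = 75       # approximate called pitches per game (from the post)

def framing_true_talent(observed_runs, called_pitches):
    """Estimated true-talent framing runs per called pitch, regressed toward zero."""
    corrected = observed_runs * CORRECTION
    # The added league-average pitches contribute zero runs, so only the
    # denominator grows.
    return corrected / (called_pitches + REGRESSION_PITCHES)

# Hypothetical catcher: +20 published runs over 13,500 called pitches.
rate = framing_true_talent(20.0, 13500)
per_150_games = rate * PITCHES_PER_GAME * 150
print(f"{per_150_games:+.1f} runs per 150 games")
```

The fewer called pitches a catcher has, the more the added 4500 pull his estimate toward zero.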

To test his numbers, I first broke the list of catchers and their true talent framing skill into two groups of around 25 players each (an arbitrary number of players in each group) - the best and the worst. The average framing skill in the best group, weighted by the number of PA they caught in 07-11, was +7.5 runs per 150 games, and for the worst group, it was -7.7. That is an influence of around .05 runs per game, which would show up in their pitchers’ ERA, RA9, or ERC (component ERA). Only a part of that would show up in DIPS or FIP, since framing also influences BABIP.

Anyway, to test his numbers, I did a WOWY on those catchers. I looked at the results of all pitchers they caught when they were in the game and when they were not. I did not control for anything else, like park, batters, H/A, etc. A pretty standard WOWY analysis. We can thank Tango for that, BTW. I then looked at the WOWY differences in wOBA, SO, and BB rates.

I looked at 05-11 for some reason rather than just 07-11, so I used some in-sample data (07-11) and some out-of-sample data (05-06). The average catcher in the “good framing group”, this time prorated by the number of PA they caught in 05-11 (rather than just 07-11), was +7.3, and in the “bad framing group”, -7.6, around the same as for 07-11. IOW, also around .05 runs per game.
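A bare-bones WOWY can be sketched like this. The pitcher lines are made up, and weighting each pitcher by the smaller of his "with" and "without" PA is an assumption on my part (it is a common WOWY convention), not a description of the exact code used here.

```python
def wowy_difference(pitchers):
    """Weighted with-minus-without wOBA difference across pitchers.

    Each entry is (pa_with, woba_with, pa_without, woba_without), where
    "with" means the catcher of interest was behind the plate.
    """
    total_weight = 0
    weighted_diff = 0.0
    for pa_with, woba_with, pa_without, woba_without in pitchers:
        weight = min(pa_with, pa_without)   # assumed weighting scheme
        if weight == 0:
            continue                        # pitcher has no comparison sample
        weighted_diff += weight * (woba_with - woba_without)
        total_weight += weight
    if total_weight == 0:
        return 0.0, 0
    return weighted_diff / total_weight, total_weight

# Made-up pitchers, for illustration only.
sample = [
    (400, 0.310, 200, 0.320),
    (150, 0.335, 600, 0.330),
    (300, 0.300, 300, 0.312),
]
diff, min_pa = wowy_difference(sample)
print(f"wOBA diff: {diff:+.4f} over {min_pa} min PA")
```

A negative difference means the pitchers allowed a lower wOBA with this catcher than without him.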

Here are the results:

The good framing group had a wOBA difference of .008 (8 points). IOW, looking at the same pitchers, when the good framing catchers caught them they allowed a wOBA 8 points lower than when some other catcher (a slightly bad framing catcher, on average) caught them. That translates to around .24 runs per game - a lot more than we expected. The BB rate was .004 per PA lower (around .15 fewer BB per game) and the K rate was .003 per PA higher (.11 more per game).

The bad framing catchers had a .003 higher wOBA, or .09 runs per game, with .11/game more BB and .23/game fewer K. The runs per game number is also more than we expected.

However, we expect the WOWY effect in the in-sample data to be much larger than the regressed framing estimates imply, because the actual framing performance of these good and bad framing catchers was much more spread out than the estimated true talent numbers (the regressed performance).

The total number of “min” PA was 302,434 for the bad framers and 88,738 for the good framers. So the standard error in wOBA is around 1.7 points for the good framers and .9 points for the bad framers. (That is not exactly how you compute a standard error for a WOWY; in fact, the real SEs might be almost double, since a WOWY is a difference between two numbers.)

Now, this is not such a great test because most of the data is in-sample (07-11). IOW, in the WOWY test, I used the same data that Mike used to come up with his catcher framing numbers. While he did not use a WOWY method at all, it is possible that there are some dependency issues.
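The quoted standard errors are roughly what you get from the usual SD / sqrt(n) formula. In the sketch below, the per-PA wOBA standard deviation of 0.5 is my assumption (it is not stated in the post), picked because it reproduces the numbers above. The PA totals are the ones given.

```python
import math

ASSUMED_WOBA_SD = 0.5   # assumption: rough per-PA standard deviation of wOBA

def woba_standard_error(pa, sd=ASSUMED_WOBA_SD):
    """Standard error of a wOBA estimate over `pa` plate appearances."""
    return sd / math.sqrt(pa)

se_good = woba_standard_error(88738)    # good framers' "min" PA
se_bad = woba_standard_error(302434)    # bad framers' "min" PA

print(f"good framers: {se_good * 1000:.1f} points")
print(f"bad framers:  {se_bad * 1000:.1f} points")
```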

The best way to test his numbers is to use out of sample data (and hope that the catchers had around the same skill that they had with the in-sample data).

So first I only used Mike’s data from 07-09 (and did the appropriate regression of course) and then I did a WOWY from 05-06, and 10-11 (4 years).

The average catcher in the bad framing group (based on only 07-09 framing numbers), prorated by the number of PA they caught in 05-06 and 10-11, was -8.9 per 150, and in the good group, +8.1. That is around .057 runs per game.

Here are the results of the out-of-sample WOWY. These numbers should be close to (rather than larger than) the true talent estimates, unlike the in-sample numbers.

Bad framers

wOBA diff: .09 runs/game
BB diff: .114 BB/game
K diff: .19/game

Good framers

wOBA diff: .03 runs/game
BB diff: .076 BB/game
K diff: .114/game

These numbers combined, (.03 + .09)/2, or .06 runs per game, are exactly in line with what we would expect from Mike’s numbers, which is very comforting. In fact, I love it!
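The comparison is simple arithmetic, sketched below using only figures quoted in this post. The helper name is mine.

```python
GAMES = 150

def runs_per_game(runs_per_150):
    """Convert a runs-per-150-games figure to an absolute runs-per-game size."""
    return abs(runs_per_150) / GAMES

# Expected effect from the 07-09 true talent estimates (-8.9 and +8.1 per 150).
expected = (runs_per_game(-8.9) + runs_per_game(8.1)) / 2

# Observed effect in the out-of-sample WOWY (.09 and .03 runs per game).
observed = (0.09 + 0.03) / 2

print(f"expected {expected:.3f}, observed {observed:.3f} runs per game")
```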

Later today, I will do the same test on Max Marchi’s numbers, which were also derived from the pitch f/x data, but use a different method I think…

(5) Comments • 2012/02/18 • Sabermetrics, Data, Fielding, Forecasting, Sampling

Saturday, February 04, 2012

Bill James Baseball IQ app

By , 06:48 PM

I downloaded Bill James Baseball IQ onto my iPhone (I don’t think it is available on droid phones, but I’m not sure). Here is the web site for the app on Acta Sports:

http://www.actasports.com/titles/bill_james_baseball_iq_app/

It is pretty cool. You can read a description and see some screen captures on the above site, but basically it allows you to see heat maps and color maps of batters and pitchers (in all combinations, counts, situations, etc.) for K zone, batted balls, pitch type, etc.

Best of all, the app is free! Seems to me that they could have charged for this one, but I know nothing about the best way to make money from apps. It also seems like they could use these graphics more often on TV broadcasts.

Anyway, give it a try and see what you think…

(14) Comments • 2012/04/18 • Sabermetrics, Ball_Tracking, Batter_v_Pitcher, Bill_James, Data, Media

Tuesday, January 31, 2012

Injury database?

By Tangotiger, 10:08 PM

I’ve received a few requests in the last two weeks for an injury database.

I know there were a couple out there floating around.  Josh maybe?  Or Zimm?  Anyway, if someone wants to help out the community, please post a link to your injury DB.  Ideally, you have it cross-referenced to MLBAM or Retro IDs.

(9) Comments • 2012/02/28 • Sabermetrics, Data

Thursday, January 19, 2012

Beisbol Data

By Tangotiger, 09:40 PM

Great stuff!

(6) Comments • 2012/01/21 • Sabermetrics, Data

Thursday, January 12, 2012

MLB ID database

By , 07:07 AM

Does anyone know of an up-to-date (including the 2011 season) web site (or some other place) that has MLB IDs of players?

This one:

https://github.com/geoffharcourt/mlb_rosetta

Does not seem to be updated anymore.

And I don’t think the Lahman or BDB includes 2011. Does anyone know if those will be updated to include 2011?

(15) Comments • 2012/03/13 • Sabermetrics, Data

Wednesday, January 11, 2012

Cuba Stats

By Tangotiger, 09:38 PM

Clay is giving it to us… the raw data, the translated data, and the “forecasted” data.  Great stuff.

(1) Comments • 2012/01/11 • Sabermetrics, Data

Friday, December 23, 2011

Transaction interface

By Tangotiger, 12:34 PM

I don’t know if this is available to the public or only to BPro subscribers. A great tool. (Presumably at least the article is available to the public, because it sells the app.)

Seems to me this is the kind of thing you’d want for an iPhone especially.

(1) Comments • 2011/12/23 • Sabermetrics, Data

Saturday, December 10, 2011

Pujols and Iannetta versus Abreu and Mathis

By , 04:59 PM

What is the addition of Pujols and Iannetta worth to the Angels, assuming they take Mathis’ and Abreu’s places?

I ran a quick sim against an average team.  The gain is 5.5 wins per 150 games versus a RHP, and 10.4 wins versus a LHP.

Interestingly, of that 5.5 win gain versus RHP, 3.6 is from Iannetta over Mathis (assuming roughly equal defensive value), which leaves only 1.9 for Pujols.

Against LHP, only 2.1 of those wins are from Iannetta/Mathis, so 8.3 are from Pujols/Abreu.

Anyway, if we assume 2/3 of their PA are against RHP, we get a total upgrade of 7.1 wins.
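The arithmetic behind those splits, as a quick sketch. The function name is mine, and all inputs are the sim results quoted above.

```python
def blended_gain(vs_rhp, vs_lhp, share_vs_rhp):
    """Weight the per-150-game win gains by the share of PA against each hand."""
    return share_vs_rhp * vs_rhp + (1 - share_vs_rhp) * vs_lhp

total = blended_gain(5.5, 10.4, 2 / 3)   # assumes 2/3 of PA come against RHP

# Pujols-over-Abreu share is what is left after the catcher swap.
pujols_vs_rhp = 5.5 - 3.6
pujols_vs_lhp = 10.4 - 2.1

print(f"total upgrade: {total:.1f} wins per 150 games")
print(f"Pujols/Abreu: {pujols_vs_rhp:.1f} vs RHP, {pujols_vs_lhp:.1f} vs LHP")
```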

This probably overstates the upgrade, though, since Mathis was not nearly the full time catcher last year (he only played in 93 games).

(12) Comments • 2011/12/12 • Sabermetrics, Data

Wednesday, December 07, 2011

Quick WAR calculator

By Tangotiger, 04:48 PM

For those interested:

http://wahoosonfirst.com/war-calculator

(1) Comments • 2011/12/07 • Sabermetrics, Data

Monday, December 05, 2011

PITCHf/x on Fangraphs update

By Tangotiger, 07:58 PM

Some updates on Fangraphs.  Check out the various tabs in there.  Tons of good stuff.

And, if you have suggestions, post them here.  David is pretty much incomparable in terms of turnaround time of taking suggestions and implementing them.

(7) Comments • 2011/12/06 • Sabermetrics, Ball_Tracking, Data

Saturday, October 22, 2011

“Fangraphs is now even awesomer”

By Tangotiger, 11:30 PM

Brian Burke links to it, and gives it the love it deserves.

(0) Comments • Sabermetrics, Data

Monday, October 10, 2011

Fangraphs customizations

By Tangotiger, 10:17 AM

Fangraphs rolls out even more customization options.

The one that I asked David to do was to be able to see a player’s stats split between teams. This is useful in cases like KRod, where you want to see his LI with the Mets and with the Brewers in 2011 in the leaderboards. I asked David about this just last week, so the turnaround is just fantastic.

I might have to push him on other stuff that I use BR.com for, like times through the order for starting pitchers, etc.  Having it on a leaderboard, easily exportable, sure is a timesaver.

(0) Comments • Sabermetrics, Data

Tuesday, September 13, 2011

Minor League data

By Tangotiger, 10:51 AM

BPro rolled it out.  Check back a little later, as I wasn’t able to get any data yet.
