THE BOOK--Playing The Percentages In Baseball

Thursday, February 01, 2007

Lake Wobegon

By Tangotiger, 01:40 PM

http://www.baseballprospectus.com/unfiltered/?p=171

Read that first.  Here’s what Marcel says:


For hitters, total forecasted PA, H, HR, R:
281414, 68117, 8139, 35698

Based on 862 non-pitchers, each forecasted for at least 200 PA.  Obviously, the overall playing-time totals are too high.

Here are the pitchers IP, H, HR, R:
59035, 61588, 7355, 32294

That’s based on 915 pitchers, each forecasted for at least 25 IP.

Let’s baseline them to the 2006 totals of 188,052 PA and 43,258 IP.  That gives us these adjusted lines for hitters of H, HR, R:
45518, 5439, 23854
And for pitchers:
45129, 5389, 23663

The RS and RA would give you a pythag win total of 81.6 wins.

Slightly on the optimistic side.  But just barely.
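
As a rough check on the arithmetic above, the rescaling and Pythagorean steps can be sketched in Python. The function names are made up for illustration, and the exponent of 2 and the 162-game schedule are assumptions (the post does not say which Pythagorean variant was used). With those choices the sketch lands close to the 81.6 wins quoted.

```python
def rescale(totals, playing_time, baseline):
    """Scale counting stats so the playing time matches the baseline."""
    factor = baseline / playing_time
    return [t * factor for t in totals]


def pythag_wins(runs_scored, runs_allowed, exponent=2.0, games=162):
    """Pythagorean win total; exponent and schedule length are assumed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Forecasted H, HR, R from the post, rescaled to the 2006 playing time.
hit_h, hit_hr, hit_r = rescale([68117, 8139, 35698], 281414, 188052)
pit_h, pit_hr, pit_r = rescale([61588, 7355, 32294], 59035, 43258)

wins = pythag_wins(hit_r, pit_r)
print(round(hit_r), round(pit_r), round(wins, 1))
```

Small differences of a run or so against the posted lines come down to rounding versus truncating the rescaled totals.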

***

What if I select only the players with the most forecasted PA, enough to total about 188,052?  All batters forecasted with at least 269 PA (425 of them) put up this line of PA, H, HR, R:
188233, 46138, 5604, 24216
Repeating for pitchers (44 pitchers, at least 54 forecasted IP), IP, H, HR, R:
43259, 44953, 5348, 23381

This time, not as good.  The pythag total is 83.7 wins.

***

My problem is that I always regress toward the “league mean”.  However, the “league mean” is not the average player, since the better players get more PA than the worse players.  We are, in effect, overweighting the good players.  What we should do, in regression, is regress a player’s performance to the average performance for that PA class.  This way, a rookie will get regressed not to the .340 OBP league mean, but to, say, the .320 OBP mean for that PA class.
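
To make the difference concrete, here is a minimal sketch of regressing toward a PA-class mean instead of the league mean. The .340 and .320 targets come from the paragraph above. The rookie's .350 OBP in 150 PA and the 600 PA of regression weight are hypothetical values chosen only for illustration, not Marcel's actual settings.

```python
def regress(observed_rate, pa, prior_mean, prior_pa):
    """Weighted average of the observed rate and the mean being regressed to.

    prior_pa is how many PA of the prior mean get mixed in (hypothetical).
    """
    return (observed_rate * pa + prior_mean * prior_pa) / (pa + prior_pa)


# Hypothetical rookie: .350 OBP over 150 PA.
rookie_obp, rookie_pa = 0.350, 150
regression_pa = 600  # illustrative weight, not a documented Marcel value

to_league_mean = regress(rookie_obp, rookie_pa, 0.340, regression_pa)
to_class_mean = regress(rookie_obp, rookie_pa, 0.320, regression_pa)

print(round(to_league_mean, 3), round(to_class_mean, 3))
```

With little playing time the regression target dominates, so the choice between the league mean and the PA-class mean moves the low-PA players the most.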

Fun stuff.

#1    HarryAbles      2007/02/01 (Thu) @ 14:47

I’ve thought about this, and wondered if projection systems would benefit from the normalization of aggregate W/L records, or total PA/IP based on previous years.  Was planning to track all the major systems’ ‘07 success anyway, and maybe I’ll do their normalized versions too.  Also, even when PAs or IP are baselined, their individual distributions may still not reflect reality well (Marcel has this problem at the lower end at least, but problem isn’t the right word.) Would it be more accurate still to baseline, and then fit the distribution to something more normal?  The “area under the curve” would be the same, of course.


#2    Tangotiger      2007/02/01 (Thu) @ 14:55

I was thinking that my second test, where I end up with 83.7 wins, should not be compared to 81 wins.

What if I select the actual 2006 performances, with batters having at least 269 PA and pitchers having at least 54 IP?

These are their actual rates of H, HR, R, per 39.125 PA for hitters and per 9 IP for pitchers:
hitting: 9.74, 1.22, 5.19
pitching: 9.25, 1.08, 4.72

As you can see, if I set these thresholds, I get a good set of players, certainly above-average.

And the pythag of such a set of players?  88 wins (.545 win%).

Clearly, choosing players, after the fact, based on the quantity of their performance (undoubtedly meaning that their quality of performance was fairly decent) is a selection bias.  I think it’s easy to see how choosing a set of such players, that produced 88 wins, is in line with expecting a true win total of around 83-84 wins.

Therefore, I think Marcel has it mostly right, that it can serve as a fair barometer to other forecasting systems, in terms of figuring out how many wins the league should produce.


#3    tangotiger      2007/02/01 (Thu) @ 15:21

For those wondering why the aggregate PA totals are so much higher than the aggregate IP totals:

The formula for PA is:
50% of PA in 2006 plus
10% of PA in 2005 plus
200

So, if I have 862 nonpitchers, I immediately start off with 172,400 PA, which is 91% of the whole 2006 (which includes pitchers batting).

The formula for IP is:
50% of IP in 2006 plus
10% of IP in 2005 plus
25

With 915 pitchers, the minimum IP total is 22,875, or only 53% of the 2006 total.
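
The two playing-time formulas, and the floors they imply, can be sketched as follows. The weights and constants are the ones quoted in this comment. The function names are made up for illustration.

```python
def forecast_pa(pa_2006, pa_2005):
    """Playing-time forecast for hitters as described in the comment."""
    return 0.5 * pa_2006 + 0.1 * pa_2005 + 200


def forecast_ip(ip_2006, ip_2005):
    """Playing-time forecast for pitchers as described in the comment."""
    return 0.5 * ip_2006 + 0.1 * ip_2005 + 25


# Floor: every player gets the constant term even with no playing time.
min_total_pa = 862 * forecast_pa(0, 0)
min_total_ip = 915 * forecast_ip(0, 0)

# Share of the actual 2006 league totals covered by the floors alone.
pa_share = min_total_pa / 188052
ip_share = min_total_ip / 43258

print(min_total_pa, min_total_ip, round(pa_share, 3), round(ip_share, 3))
```

The hitter floor alone covers roughly nine-tenths of the 2006 PA total, while the pitcher floor covers only about half of the IP total, which is the asymmetry the comment describes.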

The problem really lies with the starter/relief roles.  If I knew a pitcher was a starter, I wouldn’t start his baseline at only 25 IP.

As well, the equivalent of 200 PA is 46 IP.  But, pitchers collapse far more than hitters, in terms of playing time and/or injuries. 

For these reasons, you’ll always estimate more playing time for the hitter.
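
The 200 PA to 46 IP equivalence quoted above follows from the 2006 league totals given in the post. A minimal sketch, with variable names chosen only for illustration:

```python
LEAGUE_PA_2006 = 188052
LEAGUE_IP_2006 = 43258

# Plate appearances per inning, and per 9 innings, at the 2006 league rate.
pa_per_ip = LEAGUE_PA_2006 / LEAGUE_IP_2006
pa_per_9_ip = 9 * pa_per_ip

# Innings that correspond to the 200 PA constant in the hitter formula.
ip_equivalent_of_200_pa = 200 / pa_per_ip

print(round(pa_per_9_ip, 3), round(ip_equivalent_of_200_pa, 1))
```

This is also where the 39.125 PA per 9 IP figure comes from, so the hitter constant of 200 PA is nearly twice as generous as the pitcher constant of 25 IP.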

And, without knowing who is on the 25- or 40-man rosters, Marcel forecasts anybody who played in 2006, which means about 60 players per team.

Just things to remember…


#4    Trader Joe      2007/02/01 (Thu) @ 16:28

It seems to me another way to assess the projections is to SEPARATELY compare the forecasted RA and RS components against the actual RA and RS for the league. This would give you some insight into where the net RA-RS bias might be coming from.

Could it be that most of the bias is due to underestimating the RA because of inadequate discounting for the risk of injury to pitchers in your forecasts?  Silver gives the Derrek Lee example but there are probably more Lirianos (relatively speaking) who get season-ending injuries. Or perhaps the bias isn’t primarily due to injuries, but to something else in your depth charts or your system for projecting hitting vs. pitching (and defensive) performance.

If you look at the separate RA and RS components over time (the last few years) you can also see whether the amount of bias has increased in each component (RA or RS) of your forecast, or perhaps only in the net.  And of course the bias in each of these components (compared with the actual league RA and RS) can, in principle, be compared across forecasting systems.

