THE BOOK--Playing The Percentages In Baseball

Friday, April 22, 2011

Bayes and regression, again

By Tangotiger, 05:13 PM

Great post from Kincaid.

Note: .060 is a better estimate of the spread (SD) in team talent than .050.  That’s why I use 69 games as the regression amount, the number of games at which you regress 50% toward the mean.  (Kincaid’s example, using .050 as the spread, implies 100 games as the regression amount.)
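The link between the talent spread and the regression amount can be sketched in a few lines. This is a minimal illustration, assuming the regression amount is the number of games at which binomial noise variance, .25/n for a .500 team, equals the talent variance. The function name `regression_games` is made up for this example.

```python
def regression_games(talent_sd, mean=0.5):
    """Games n at which random variance mean*(1-mean)/n equals talent variance.

    Assumption: game outcomes are independent coin flips around `mean`.
    """
    return mean * (1 - mean) / talent_sd ** 2


print(round(regression_games(0.060), 1))  # spread of .060
print(round(regression_games(0.050), 1))  # spread of .050
```

Under that assumption, a spread of .060 gives about 69.4 games and a spread of .050 gives 100 games, matching the two figures quoted above.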

One easy way to test that 69 is a better number than 100 is to take, for each team, games #1, 3, 5… 137 as pool 1 and games #2, 4, 6… 138 as pool 2 (69 games in each pool), and run a correlation of win% between the two pools across teams.  You should get r=.50.
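The split-half test can be tried on simulated teams before running it on real game logs. The sketch below is not the Retrosheet test itself: it assumes talent is normally distributed around .500 with an SD of .060 and that every game is an independent coin flip. The names `pearson` and `split_half_r` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_half_r(n_teams=4000, games_per_pool=69, talent_sd=0.060, seed=1):
    """Correlate win% in two 69-game pools for simulated teams.

    Assumption: synthetic teams, not real data. Each team's true win
    probability is drawn from Normal(.500, talent_sd).
    """
    rng = random.Random(seed)
    pool1, pool2 = [], []
    for _ in range(n_teams):
        p = rng.gauss(0.5, talent_sd)
        pool1.append(sum(rng.random() < p for _ in range(games_per_pool)) / games_per_pool)
        pool2.append(sum(rng.random() < p for _ in range(games_per_pool)) / games_per_pool)
    return pearson(pool1, pool2)


print(round(split_half_r(), 2))
```

With these assumptions the correlation should land near .50, give or take sampling noise. Rerunning with `talent_sd=0.050` should give a noticeably lower r at 69 games.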


#1    Kincaid    2011/04/22 (Fri) @ 23:50

Using .060 as the SD for the prior distribution gives an expected true W% of .450 for a team that goes 2-10.

From 2000-2010, there were 1255 stretches where an MLB team went 2-10 over 12 games (per Retrosheet gamelogs).  The average record of those teams in games outside that 12 game span (from the same year) is .451.  So that fits reality much better than using .050.
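For readers who want to reproduce the .450 figure, here is a rough grid approximation of the Bayesian estimate. It assumes a normal prior centered on .500 with SD .060, restricted to the interval (0, 1), and a binomial likelihood. The exact prior shape used in the original calculation is not stated in this thread, so treat this as one plausible reading. The function name `bayes_true_wpct` is invented for the example.

```python
import math


def bayes_true_wpct(wins, losses, prior_mean=0.5, prior_sd=0.060, steps=2000):
    """Posterior mean of true win% on an evenly spaced grid over (0, 1).

    Assumption: normal prior (truncated to the grid) times binomial likelihood.
    """
    num = den = 0.0
    for i in range(1, steps):
        p = i / steps
        log_weight = (
            -0.5 * ((p - prior_mean) / prior_sd) ** 2  # log prior, up to a constant
            + wins * math.log(p)                       # log likelihood of the wins
            + losses * math.log(1 - p)                 # log likelihood of the losses
        )
        weight = math.exp(log_weight)
        num += p * weight
        den += weight
    return num / den


print(round(bayes_true_wpct(2, 10), 3))
```

Under these assumptions a 2-10 record should come out very close to .450.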


#2    Tangotiger    2011/04/23 (Sat) @ 07:42

Great stuff, Kincaid!  I’m not surprised, really.  From 1961 to the present, the observed SD in win% is .071, whereas the SD expected from random variation alone is .039.

sqrt(.071^2-.039^2)=.059

Hence the reason I always use .060 in baseball.  That implies 69 games of regression, meaning 69 games of .500 play that you add to each team’s record.
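The subtraction of variances can be checked directly. A small sketch, assuming a 162-game season for the random SD (the season length is not stated above, and strike-shortened seasons would differ slightly). The helper names are illustrative only.

```python
import math


def random_sd(games, p=0.5):
    """SD of win% from binomial luck alone over `games` games."""
    return math.sqrt(p * (1 - p) / games)


def talent_sd(observed_sd, games, p=0.5):
    """Talent SD implied by: observed variance = talent variance + random variance."""
    return math.sqrt(observed_sd ** 2 - random_sd(games, p) ** 2)


print(round(random_sd(162), 3))         # luck-only SD
print(round(talent_sd(0.071, 162), 3))  # implied talent SD
```

With a 162-game season, the random SD rounds to .039 and the implied talent SD to .059.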


#3    Tangotiger    2011/04/23 (Sat) @ 07:54

Continuing, for the 2-10 team that means 2+69/2 wins in the numerator and 12+69 games in the denominator.  Or 36.5/81 = .451 as the theoretical.  And Kincaid shows the actual as .451.
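The same arithmetic as a reusable function, for anyone who wants to apply it to other records. This is only a sketch of the shortcut described here, with 69 games of .500 play added to the record. The name `regressed_wpct` is made up.

```python
def regressed_wpct(wins, losses, regression_games=69, mean=0.5):
    """Regress a W-L record toward `mean` by adding `regression_games` of average play."""
    return (wins + regression_games * mean) / (wins + losses + regression_games)


print(round(regressed_wpct(2, 10), 3))  # (2 + 34.5) / (12 + 69)
```

For a 2-10 team this gives 36.5/81, which rounds to .451.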


#4    J. Cross    2011/04/23 (Sat) @ 11:07

Good stuff.  I really enjoyed it.


#5    Martin Monkman    2011/05/01 (Sun) @ 15:05

This is a great thread, thanks to both Kincaid and Tango.  I’ve used this as the inspiration to create a Google doc that compares the implied regression of 69 games with the long-form Bayesian approach.

http://bayesball.blogspot.com/2011/05/early-season-standings-and-bayes.html

My only question is to Tango—I got 66 games as the implied regression, and I’m trying to work out if I missed something or if the difference is simply a matter of rounding.  Can I ask you to share the details of your calculation?


#6    Tangotiger    2011/05/01 (Sun) @ 19:43

I don’t know that I’d worry about 66 or 69, as that sounds like a rounding error, or some other tiny difference along those lines.


#7    Kincaid    2011/05/20 (Fri) @ 16:20

I made an Excel macro to make it easier to calculate Bayesian W% and regression-to-the-mean estimates for a group of teams. You’ll need an Excel spreadsheet with each team’s W and L totals; then paste the linked code into Excel’s VBA environment. Read the comments embedded in the code to see what values you might have to alter.

https://docs.google.com/document/d/1xdtx4H_upEzAqkspRwWJl_OAmuqIq9to7sgpHn0Uld0/edit?hl=en_US


#8    Tangotiger    2011/05/20 (Fri) @ 16:23

I can’t access that document from the office, so let me just ask you while you are here.

What kind of setup would you have to have in order for the Bayes estimate and the regression toward the mean estimate to be off by at least 1 win (in 162 games)?


#9    Kincaid    2011/05/20 (Fri) @ 16:41

For an 0-22 team, the regression and Bayes estimates would differ by just over 1 win per 162 games (using 69.4 games as the regression point instead of rounding to 69, it’s 0-21).

For a full season, a 40-122 team would show about the same difference.


#10    Tangotiger    2011/05/20 (Fri) @ 17:06

Excellent.  Regression toward the mean is always a shortcut for Bayes, and it’s good to know at what extreme point it breaks down.

So, if you have a 40-122 team, that would regress to a 52-110 team, which is one win from what Bayes would suggest.
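To see where the shortcut and the full calculation drift apart, the two estimates can be put side by side and the gap expressed in wins per 162 games. This sketch assumes a normal prior (mean .500, SD .060) evaluated on a grid for the Bayes side and 69 added games for the regression side. All names are hypothetical, and the size of the gap depends on the prior shape assumed.

```python
import math


def regressed_wpct(wins, losses, regression_games=69):
    """Shortcut: add `regression_games` of .500 play to the record."""
    return (wins + regression_games / 2) / (wins + losses + regression_games)


def bayes_wpct(wins, losses, prior_sd=0.060, steps=2000):
    """Grid posterior mean. Assumption: normal prior around .500, binomial likelihood."""
    grid = [i / steps for i in range(1, steps)]
    weights = [
        math.exp(-0.5 * ((p - 0.5) / prior_sd) ** 2
                 + wins * math.log(p) + losses * math.log(1 - p))
        for p in grid
    ]
    return sum(p * w for p, w in zip(grid, weights)) / sum(weights)


def gap_in_wins(wins, losses, season=162):
    """Regression estimate minus Bayes estimate, scaled to a full season."""
    return (regressed_wpct(wins, losses) - bayes_wpct(wins, losses)) * season


for record in [(2, 10), (0, 22), (40, 122)]:
    print(record, round(regressed_wpct(*record) * 162, 1), round(gap_in_wins(*record), 2))
```

Under these assumptions the gap for an ordinary slow start like 2-10 should be a small fraction of a win, while extreme records like 0-22 or 40-122 should show a gap of roughly one win, with the regression shortcut on the higher side.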

