THE BOOK cover
The Unwritten Book is Finally Written!
An in-depth analysis of: The sacrifice bunt, batter/pitcher matchups, the intentional base on balls, optimizing a batting lineup, hot and cold streaks, clutch performance, platooning strategies, and much more.
Read Excerpts & Customer Reviews

Buy The Book from Amazon


SABR101 required reading if you enter this site. Check out the Sabermetric Wiki. And interesting baseball books.
MOST RECENT ARTICLES
MAIL : You ask | We say

Advanced


THE BOOK--Playing The Percentages In Baseball

<< Back to main

Thursday, May 20, 2010

How to calculate the Median value in MS Access

By Tangotiger, 10:22 AM

SELECT
    
(t2.medianT b2.medianB)/as medianValue
    
t2.medianT
    
b2.medianB
     
FROM
    
(
        
SELECT MIN(top50) AS medianT
        FROM 
(
            
SELECT TOP 50 PERCENT HR as top50 
            FROM Table1
            WHERE HR IS NOT NULL
            ORDER BY HR DESC
        
t
    
t2
    
,
    (
        
SELECT MAX(top50) AS medianB
        FROM 
(
            
SELECT TOP 50 PERCENT HR as top50 
            FROM Table1 
            WHERE HR IS NOT NULL
            ORDER BY HR ASC
        
b
    
b2
;

Web Admin
#1    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 10:32

I’m not big on MS Access functions.  Can someone tell me how to convert that into a function, whose usage would be akin to AVG, MIX, MAX ?

I want to be able to say

SELECT AVG(HR), MEDIAN(HR) FROM TABLE


#2          (see all posts) 2010/05/20 (Thu) @ 10:45

I don’t know the answer, but you just taught me TOP 50 PERCENT.  Thanks!


#3    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 11:06

I know, it’s pretty cool.  It goes back to the discussion I had of “meandian”.

http://www.insidethebook.com/ee/index.php/site/comments/meandian_part_2/

Now I can finally calculate a useful function.

For example, I can take the top50 percent, and get the median.  I can take the top40 percent (and bottom40 percent), and get the average of these two boundary points.  I can take the top30 percent (and bottom30 percent), and take the average.  And repeat for top/bott20, top/bott10, min/max.

Given these points, I can overweight the 50th point as say “6x”, the 40th at “5x’, the 30th at “4x”, the 20th at “3x”, the 10th at “2x” and the min,max at “1x”.

And, I get my meandian.  If you go to the post 4 of the above thread, and see the image of the “Huber”, I would think this will pretty much simulate that.

Remember, the mean gives each data point equal weight.  The median gives full weight to one data point, and zero weight to all the others.  “Meandian” as I call it, has a continuous function where the middle point gets most of the weight, and the rest of the points get a weight proportionate to how close they are to the middle point.

Seems this is called a “Huber” function.  I call it meandian.


#4    wcw      (see all posts) 2010/05/20 (Thu) @ 12:47

If you kids are doing anything numerical at all, you probably want to dump Access.  Replace with it with Postgres and use PL/R.  If you’re on a Debian-alike, ‘apt-get install postgresql-8.4-plr’ will pull all the packages you need.  All this is on Windows as well, but you’ll have to let Google tell you how to install there.

Yes, configuring your postgres server will take a few minutes, and figuring out how to pass data to and from R will take a half-hour.  Once you put in that time, though, you can use R for everything numeric and never worry about writing missing functions in SQL again.

Not only will you learn that R ships its quantile function with nine algorithms from which to choose, but you’ll also be able to load a library to generate a Huber estimator.  Or write your own functions, in the unlikely event there is something you want to do that nobody has coded yet.

Unless it is not clear, I heart R and hate Access.  I use both at work (installing your own servers is Not Allowed), and the difference between my work box with Access and my home box with Postgres is night and day.

If you absolutely must stay in pure SQL, I’d probably upgrade to something that can at least handle ANSI window functions.


#5    Peter Jensen      (see all posts) 2010/05/20 (Thu) @ 13:49

Access doesn’t have a median function and I don’t think it would be possible to create one easily because it lacks the SELECT TOP feature that SQL has.  But you have done the heavy lifting by writing the function in SQL.  The easiest way to automate finding the median in Access would be to create a query using the SQL code that you already have and then join that query to your table.


#6    Tangotiger      (see all posts) 2010/05/20 (Thu) @ 13:55

wcw: I do a large portion of my sports work not on my personal computer.  So, I don’t have the luxury of installing anything.

At home, I do everything in Oracle, and much prefer to work solely in SQL (or PL/SQL if I have to) and ONLY use something like R at the tail-end of the data processing workflow.


#7    Matt      (see all posts) 2010/05/22 (Sat) @ 12:24

The linked vba function will give the median of a field.  It’s a little different from average, because you have to enter the table name and field name in quotes.  Also it will return the median for every record, but you can get around that by just selecting the max (or min or first or last) since the median will be the same for every record:

SELECT Max(median("table1”,"field1")) AS med, Avg(Table1.Field1) AS ave
FROM Table1;

If you don’t usually use vba, you can get to the visual basic editor by hitting alt-f11, right-clicking the name of your database in the project explorer window in the top left, and inserting a module.  Then just paste the code from the site into the code window, and you can use median just like a normal function.


#8          (see all posts) 2010/05/22 (Sat) @ 12:50

Here’s a non-vba function that will find the median of the field “[HR]” in the table “HRtbl”, but it’s obviously pretty cumbersome:

Median: (Min(IIf(DCount("[HR]”,"HRtbl","[HR] <= "
& IIf(IsNull([HR]),0,[HR]))
/(DCount("[HR]","HRtbl","not(isnull([HR]))")+1)
<0.5,DMax("[HR]","HRtbl"),[HR]))
+Max(IIf(DCount("[HR]","HRtbl","[HR] >= “ & IIf(IsNull([HR]),0,[HR]))
/(DCount("[HR]”,"HRtbl”,"not(isnull([HR]))")+1)<0.5,DMin("[HR]”,"HRtbl"),[HR])))/2

(I wasn’t able to post this without putting in carriage returns, which you’d have to remove.)


Page 1 of 1 pages


Name (required)
E-Mail (optional; WILL be published)
Website (optional)

<< Back to main


Latest...

COMMENTS

May 25 08:11
What sabermetrics is NOT

May 25 06:43
Largest demonstration in Canadian history?

May 25 06:39
Lack of hustle during a game

May 25 02:38
NFLPA lawsuit against collusion

May 25 01:43
Neal Huntington’s best moves

May 24 23:50
Rooting for laundry

May 24 17:04
Firefox, IE, or Chrome?

May 24 12:07
How to beat the shift

May 24 11:11
Incredible story

May 24 09:41
Racial bias in card collecting: not the collectors, but the players on the cards