
How to find the SQL medians for a grouping

I am working with SQL Server 2008.

If I have a table like this:

Code   Value
-----------------------
4      240
4      299
4      210
2      NULL
2      3
6      30
6      80
6      10
4      240
2      30

How can I find the median, grouped by the Code column, to get a result set like this:

Code   Median
-----------------------
4      240
2      16.5
6      30

I really like this solution for median, but unfortunately it doesn't include Group By: https://stackoverflow.com/a/2026609/106227
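
For anyone who wants to reproduce this, the sample data above can be loaded with a script along these lines. This is only a sketch: the table name T is borrowed from the first answer below, and the INT column types are an assumption, since the question does not give the schema.

```sql
-- Assumed schema: the question does not state the column types.
CREATE TABLE T
(
    Code  INT NOT NULL,
    Value INT NULL
);

-- Multi-row VALUES lists are available from SQL Server 2008 onwards.
INSERT INTO T (Code, Value)
VALUES (4, 240),
       (4, 299),
       (4, 210),
       (2, NULL),
       (2, 3),
       (6, 30),
       (6, 80),
       (6, 10),
       (4, 240),
       (2, 30);
```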

Stu Harper asked Dec 13 '13 12:12



2 Answers

The solution using rank works nicely when each group has an odd number of members, i.e. when the median exists within the sample. Where a group has an even number of members, the rank method falls down, e.g.

1
2
3
4

The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.

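WITH CTE AS
(   SELECT  Code,
            Value,
            -- half1 = 1 marks the lower half (ascending order),
            -- half2 = 1 marks the upper half (descending order)
            [half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
            [half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
    FROM    T
    WHERE   Value IS NOT NULL
)
SELECT  Code,
        -- average the top of the lower half and the bottom of the upper half;
        -- with an odd count both are the same middle row
        [Median] = (MAX(CASE WHEN half1 = 1 THEN Value END) +
                    MIN(CASE WHEN half2 = 1 THEN Value END)) / 2.0
FROM    CTE
GROUP BY Code;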
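
As a quick sanity check, the same NTILE idea can be run against the four-value example above without creating a table. The inline VALUES list, the alias S and the column alias Median are illustrative choices for this sketch, not part of the original query.

```sql
-- Inline sample: the values 1..4 from the example above.
-- There is only one group, so no PARTITION BY is needed.
WITH CTE AS
(   SELECT  Value,
            half1 = NTILE(2) OVER(ORDER BY Value),
            half2 = NTILE(2) OVER(ORDER BY Value DESC)
    FROM    (VALUES (1), (2), (3), (4)) AS S(Value)
)
SELECT  Median = (MAX(CASE WHEN half1 = 1 THEN Value END) +   -- top of the lower half: 2
                  MIN(CASE WHEN half2 = 1 THEN Value END))    -- bottom of the upper half: 3
                 / 2.0
FROM    CTE;
-- Returns 2.5 (displayed as 2.500000)
```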



In SQL Server 2012 and later you can use PERCENTILE_CONT:

SELECT  DISTINCT
        Code,
        Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM    T;
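
A self-contained way to try this against the question's sample data is sketched below. The inline VALUES list stands in for the table T and is an assumption made for illustration. PERCENTILE_CONT ignores NULLs, so no extra filter is added.

```sql
-- Requires SQL Server 2012 or later.
SELECT  DISTINCT
        Code,
        Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM    (VALUES (4, 240), (4, 299), (4, 210), (2, NULL), (2, 3),
                (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)) AS T(Code, Value);
-- Expected medians: Code 2 -> 16.5, Code 4 -> 240, Code 6 -> 30
```

DISTINCT is needed because PERCENTILE_CONT is evaluated as a window function here, so the same median is repeated on every row of a group.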


GarethD answered Sep 28 '22 17:09


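SQL Server 2008 does not have a built-in function to calculate medians, but you could use the ROW_NUMBER function like this:

WITH RankedTable AS (
    SELECT Code, Value,
        ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Value) AS Rnk,
        COUNT(*) OVER (PARTITION BY Code) AS Cnt
    FROM MyTable
    WHERE Value IS NOT NULL   -- NULLs must not be ranked or counted
)
SELECT Code, AVG(1.0 * Value) AS Median   -- 1.0 * avoids integer averaging
FROM RankedTable
WHERE Rnk IN ((Cnt + 1) / 2, (Cnt + 2) / 2)   -- the middle row, or both middle rows
GROUP BY Code;

To elaborate a bit on this solution, consider the output of the RankedTable CTE for the sample data in the question, once NULLs are filtered out:

Code   Value   Rnk    Cnt
---------------------------
4      210     1      4
4      240     2      4   -- Middle
4      240     3      4   -- Middle
4      299     4      4
2      3       1      2   -- Middle
2      30      2      2   -- Middle
6      10      1      3
6      30      2      3   -- Middle
6      80      3      3

From this result set, keeping only the rows where Rnk equals (Cnt + 1) / 2 or (Cnt + 2) / 2 (integer division) leaves the single middle row for odd-sized groups and the two middle rows for even-sized groups. Averaging those rows gives the median for each group: 240 for Code 4, 16.5 for Code 2 and 30 for Code 6.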
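
The reliance on integer division deserves a quick illustration, since T-SQL truncates the result when both operands are integers. The literal values below are arbitrary examples chosen for this sketch:

```sql
SELECT  3 / 2     AS IntDivision,      -- 1: int / int truncates the fraction
        3 / 2.0   AS DecimalDivision,  -- 1.5: a decimal operand keeps the fraction
        3 / 2 + 1 AS MiddlePosition;   -- 2: position of the middle row in a group of three
```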

Dan answered Sep 28 '22 15:09