Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL ranking query to compute ranks and median in sub groups

I want to compute the Median of y in sub groups of this simple xy_table:

  x | y --groups--> gid |   x | y --medians-->  gid |   x | y
-------             -------------               -------------
0.1 | 4             0.0 | 0.1 | 4               0.0 | 0.1 | 4
0.2 | 3             0.0 | 0.2 | 3                   |     |
0.7 | 5             1.0 | 0.7 | 5               1.0 | 0.7 | 5
1.5 | 1             2.0 | 1.5 | 1                   |     |
1.9 | 6             2.0 | 1.9 | 6                   |     |
2.1 | 5             2.0 | 2.1 | 5               2.0 | 2.1 | 5
2.7 | 1             3.0 | 2.7 | 1               3.0 | 2.7 | 1

In this example every x is unique and the table is already sorted by x. I now want to GROUP BY round(x) and get the tuple that holds the median of y in each group.

I can already compute the median for the whole table with this ranking query:

SELECT a.x, a.y FROM xy_table a,xy_table b
WHERE a.y >= b.y
GROUP BY a.x, a.y
HAVING count(*) = (SELECT round((count(*)+1)/2) FROM xy_table)

Output: 0.1, 4.0

But I did not yet succeed writing a query to compute the median for sub groups.

Attention: I do not have a median() aggregation function available. Please also do not propose solutions with special PARTITION, RANK, or QUANTILE statements (as found in similar but too vendor specific SO questions). I need plain SQL (i.e., compatible to SQLite without median() function)

Edit: I was actually looking for the Medoid and not the Median.

like image 923
Juve Avatar asked Apr 11 '13 10:04

Juve


People also ask

How do I write a ranked query in SQL?

We use ROW_Number() SQL RANK function to get a unique sequential number for each row in the specified data. It gives the rank one for the first row and then increments the value by one for each row. We get different ranks for the row having similar values as well.

Can we use rank function in where clause?

This order of operations implies that you can only use window functions in SELECT and ORDER BY . That is, window functions are not accessible in WHERE , GROUP BY , or HAVING clauses. For this reason, you cannot use any of these functions in WHERE : ROW_NUMBER() , RANK() , DENSE_RANK() , LEAD() , LAG() , or NTILE() .

How do you find the median in SQL query?

To get the median we have to use PERCENTILE_CONT(0.5). If you want to define a specific set of rows grouped to get the median, then use the OVER (PARTITION BY) clause. Here I've used PARTITION BY on the column OrderID so as to find the median of unit prices for the order ids.


1 Answers

I suggest doing the computing in your programming language:

for each group:
  for each record_in_group:
    append y to array
  median of array

But if you are stuck with SQLite, you can order each group by y and select the records in the middle like this http://sqlfiddle.com/#!5/d4c68/55/0:

UPDATE: only bigger "median" value is importand for even nr. of rows, so no avg() is needed:

select groups.gid,
  ids.y median
from (
  -- get middle row number in each group (bigger number if even nr. of rows)
  -- note the integer divisions and modulo operator
  select round(x) gid,
    count(*) / 2 + 1 mid_row_right
  from xy_table
  group by round(x)
) groups
join (
  -- for each record get equivalent of
  -- row_number() over(partition by gid order by y)
  select round(a.x) gid,
    a.x,
    a.y,
    count(*) rownr_by_y
  from xy_table a
  left join xy_table b
    on round(a.x) = round (b.x)
    and a.y >= b.y
  group by a.x
) ids on ids.gid = groups.gid
where ids.rownr_by_y = groups.mid_row_right
like image 86
Aprillion Avatar answered Sep 30 '22 17:09

Aprillion