Calculating median with Group By in AWS Redshift

Question

I've seen other posts about using the median() window function in Redshift, but how would you use it with a query that has a group by at the end?

For example, assume table course:

Course | Subject | Num_Students
-------------------------------
   1   |  Math   |      4
   2   |  Math   |      6
   3   |  Math   |      10
   4   | Science |      2
   5   | Science |      10
   6   | Science |      12

I want to get the median number of students for each course subject. How would I write a query that gives the following result:

  Subject  | Median
-----------------------
 Math      |     6
 Science   |     10

I've tried:

SELECT
subject, median(num_students) over ()
FROM
course
GROUP BY 1
;

But it lists every occurrence of the subject and the same median number across subjects like (this is fake data so the actual value it returns is not 6, but just showing it's the same across all subjects):

  Subject  | Median
-----------------------
 Math      |     6
 Math      |     6
 Math      |     6
 Science   |     6
 Science   |     6
 Science   |     6

isaac.hazan · Accepted Answer

The following will get you exactly the result you are looking for:

SELECT distinct
subject, median(num_students) over(partition by Subject) 
FROM
course
order by Subject;

PieCharmed · Answer

You simply need to remove the "over()" portion of it.

SELECT subject, median(num_students) FROM course GROUP BY 1;

Jakub Kania · Answer

You haven't defined a partition in the window. Instead of OVER() you need OVER(PARTITION BY subject).

Calculating median with Group By in AWS Redshift

Tags:

group-by

window-functions

amazon-redshift

clavid

3 Answers

isaac.hazan

PieCharmed

Jakub Kania

Recent Activity

Donate For Us

Calculating median with Group By in AWS Redshift

Tags:

group-by

window-functions

amazon-redshift

clavid

3 Answers

isaac.hazan

PieCharmed

Jakub Kania

Related questions

Recent Activity

Donate For Us