
How to calculate median in AWS Redshift?

Most databases have a built-in function for calculating the median, but I don't see anything for median in Amazon Redshift.

You could calculate the median using a combination of the nth_value() and count() analytic functions, but that seems janky. I would be very surprised if an analytics DB didn't have a built-in method for computing the median, so I'm assuming I'm missing something.

http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_NTH_WF.html
http://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html
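A sketch of the position-based approach the question describes, with ROW_NUMBER() swapped in for nth_value() (so this is not the nth_value() query itself): number the rows in sort order, attach COUNT(*) OVER () to every row, and average the one or two middle rows. The SQL runs against an in-memory SQLite database from Python's standard library as a stand-in for Redshift, assuming an SQLite build with window functions (3.25 or later). The table `nums` and the helper `table_median` are hypothetical, and the query has not been checked on Redshift itself.

```python
import sqlite3

# Exact median by position: rn numbers the rows in ascending order and cnt is
# the total row count repeated on every row. Integer division picks the middle
# position(s): for cnt = 5 both expressions give 3; for cnt = 6 they give 3 and 4.
# The CAST keeps engines that return an integer AVG for integer input from
# truncating a .5 result.
MEDIAN_SQL = """
SELECT AVG(CAST(num AS DOUBLE PRECISION)) AS median
FROM (
    SELECT num,
           ROW_NUMBER() OVER (ORDER BY num) AS rn,
           COUNT(*) OVER () AS cnt
    FROM nums
) AS ranked
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
"""


def table_median(conn: sqlite3.Connection) -> float:
    """Run MEDIAN_SQL against the hypothetical `nums` table."""
    return conn.execute(MEDIAN_SQL).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (num SMALLINT)")
conn.executemany("INSERT INTO nums VALUES (?)", [(1,), (5,), (10,), (2,), (4,)])
print(table_median(conn))  # 4.0: the 3rd of 5 sorted values

conn.execute("INSERT INTO nums VALUES (7)")
print(table_median(conn))  # 4.5: the mean of the 3rd and 4th of 6 values
```

Unlike a single nth_value() lookup, averaging over a `WHERE rn IN (...)` filter handles odd and even row counts with the same query.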

asked Jan 07 '14 02:01 by tayl0rs

People also ask

How do you use the median in Redshift?

In Redshift, the return type of MEDIAN depends on the input type: for INT, NUMERIC or DECIMAL input it returns DECIMAL; for FLOAT or DOUBLE input it returns DOUBLE; and for DATE input it returns DATE.

How do I calculate the median?

To find the median: Arrange the data points from smallest to largest. If the number of data points is odd, the median is the middle data point in the list. If the number of data points is even, the median is the average of the two middle data points in the list.
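The steps above translate directly into a few lines of Python. This is an illustrative sketch (`median` is a hypothetical helper); Python's standard library already provides the same behaviour as `statistics.median`.

```python
def median(values):
    """Median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median() of an empty sequence")
    ordered = sorted(values)      # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: the single middle value
        return ordered[mid]
    # even count: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 5, 10, 2, 4]))     # 4
print(median([1, 5, 10, 2, 4, 7]))  # 4.5
```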

Can SQL do median?

Standard SQL has no MEDIAN function, so in databases that don't add one, users often calculate the median manually. One way is to sort the data in descending order, take the top 50 percent of the rows (rounding up when the count is odd), and select the last value of that half. When the number of rows is even, this returns the upper of the two middle values rather than their average.

What is the median function in SQL?

MEDIAN() returns the median value of a range of values; in Redshift it is available both as an aggregate function and as a window function. It is a special case of PERCENTILE_CONT: MEDIAN(expr) is equivalent to PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY expr).
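To see why a percentile of 0.5 is the median, here is a sketch of the interpolation rule that Redshift's PERCENTILE_CONT documentation describes: compute the 1-based row position RN = 1 + P * (N - 1) over the sorted values, and if RN falls between two rows, interpolate linearly between their values. `percentile_cont` is a hypothetical Python helper, not a library API.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile of a non-empty sequence, 0 <= p <= 1."""
    if not values:
        raise ValueError("empty input")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    rn = 1 + p * (len(ordered) - 1)    # 1-based, possibly fractional, position
    lo, hi = math.floor(rn), math.ceil(rn)
    if lo == hi:                       # RN lands exactly on a row
        return float(ordered[lo - 1])
    lower, upper = ordered[lo - 1], ordered[hi - 1]
    # Linear interpolation between the rows at floor(RN) and ceil(RN).
    return lower + (rn - lo) * (upper - lower)


print(percentile_cont([1, 5, 10, 2, 4], 0.5))     # 4.0 (RN = 3)
print(percentile_cont([1, 5, 10, 2, 4, 7], 0.5))  # 4.5 (RN = 3.5)
```

With P = 0.5, RN is the middle row for an odd count and sits halfway between the two middle rows for an even count, which reproduces the usual median definition.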


2 Answers

As of 2014-10-17, Redshift supports the MEDIAN window function. Because the window form repeats the median on every row, the outer min() reduces it to a single row:

# select min(median) from (select median(num) over () from temp);
 min 
-----
 4.0
answered Oct 12 '22 10:10 by Doctor J


Try the NTILE function.

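Divide your data into 2 ranked groups, ordered from largest to smallest, and pick the minimum value from the first group (the smallest of the larger half). This works because, in datasets with an odd number of values, the first ntile gets 1 more value than the second, so its minimum is exactly the middle value. With an even number of values it returns the upper of the two middle values rather than their average, so treat it as an approximation; for large datasets the difference is usually small.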

create table temp (num smallint);
insert into temp values (1),(5),(10),(2),(4);

select num, ntile(2) over (order by num desc) from temp;
 num | ntile 
-----+-------
  10 |     1
   5 |     1
   4 |     1
   2 |     2
   1 |     2

select min(num) as median from (select num, ntile(2) over (order by num desc) as ntile from temp) as halves where ntile = 1;
 median 
--------
      4
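To check how the NTILE query behaves once the row count is even, it can be replayed on an in-memory SQLite database from Python, assumed here as a stand-in for Redshift (it needs SQLite 3.25+ for window functions). The table `nums`, the alias `bucket` and the helper `ntile_median` are hypothetical; the explicit aliases keep the outer WHERE from depending on how an engine names an unaliased window-function column.

```python
import sqlite3

# Same idea as the answer: split the rows into two buckets from largest to
# smallest and take the smallest value of the first (larger) half.
NTILE_MEDIAN_SQL = """
SELECT MIN(num) AS median
FROM (
    SELECT num, NTILE(2) OVER (ORDER BY num DESC) AS bucket
    FROM nums
) AS halves
WHERE bucket = 1
"""


def ntile_median(conn: sqlite3.Connection) -> int:
    """Run the NTILE approximation against the hypothetical `nums` table."""
    return conn.execute(NTILE_MEDIAN_SQL).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (num SMALLINT)")
conn.executemany("INSERT INTO nums VALUES (?)", [(1,), (5,), (10,), (2,), (4,)])
print(ntile_median(conn))  # 4: matches the true median for 5 rows

conn.execute("INSERT INTO nums VALUES (7)")
print(ntile_median(conn))  # 5: upper middle value; the true median is 4.5
```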
answered Oct 12 '22 11:10 by dima