Most databases have a built-in function for calculating the median, but I don't see anything for median in Amazon Redshift.
You could calculate the median using a combination of the nth_value() and count() analytic functions, but that seems janky. I would be very surprised if an analytics db didn't have a built-in method for computing the median, so I'm assuming I'm missing something.
http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_NTH_WF.html
http://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html
The return type of MEDIAN in Redshift depends on the type of the median expression and is DATE, DECIMAL, or DOUBLE. If the input is an integer, NUMERIC, or DECIMAL type, MEDIAN returns DECIMAL. If the input is FLOAT or DOUBLE, MEDIAN returns DOUBLE. A DATE input returns DATE.
To find the median, arrange the data points from smallest to largest. If the number of data points is odd, the median is the middle data point in the list. If the number of data points is even, the median is the average of the two middle data points in the list.
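The steps above translate directly into a short Python sketch (the function name `median` is ours, not part of any Redshift API). With the sample values used later in this thread, 1, 5, 10, 2, 4 sort to 1, 2, 4, 5, 10, so the median is 4.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)          # arrange smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 5, 10, 2, 4]))  # 4
print(median([1, 2, 3, 4]))      # 2.5
```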
Standard SQL has no dedicated median function, so users often calculate the median manually. To do this, sort the data in descending order, take the top 50 percent of the rows, and select the last value in that half. For an odd number of rows this is exactly the middle value; for an even number of rows it is the upper of the two middle values rather than their average.
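A minimal Python sketch of that manual approach, assuming "top 50 percent" means the first ceil(n/2) values after a descending sort (the function name `top_half_last` is hypothetical). The second call shows the even-count caveat.

```python
import math


def top_half_last(values):
    """Sort descending, keep the top 50 percent (rounded up), return its last value."""
    if not values:
        raise ValueError("no data")
    ordered = sorted(values, reverse=True)   # largest first
    half = math.ceil(len(ordered) / 2)       # size of the top 50 percent, rounded up
    return ordered[half - 1]                 # last value in that top half


print(top_half_last([1, 5, 10, 2, 4]))  # 4: the true median for an odd count
print(top_half_last([1, 2, 3, 4]))      # 3: upper middle value, not 2.5
```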
MEDIAN() is a window function that returns the median value of a range of values. It is a special case of PERCENTILE_CONT, with a percentile argument of 0.5 and the column passed to MEDIAN as the ORDER BY column.
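PERCENTILE_CONT interpolates linearly between the two sorted rows that bracket the requested position. The Python sketch below assumes the common definition, a 1-based row position RN = 1 + p * (N - 1) over the sorted values, and shows that p = 0.5 reproduces the median. The function name `percentile_cont` here is ours, not Redshift's implementation.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile with linear interpolation, 0 <= p <= 1."""
    if not values or not 0 <= p <= 1:
        raise ValueError("need non-empty data and 0 <= p <= 1")
    ordered = sorted(values)
    rn = 1 + p * (len(ordered) - 1)      # 1-based fractional row position
    lo, hi = math.floor(rn), math.ceil(rn)
    if lo == hi:                         # position lands exactly on a row
        return ordered[lo - 1]
    # weight the two neighboring rows by their distance from rn
    return (hi - rn) * ordered[lo - 1] + (rn - lo) * ordered[hi - 1]


print(percentile_cont([1, 5, 10, 2, 4], 0.5))  # 4
print(percentile_cont([1, 2, 3, 4], 0.5))      # 2.5
```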
And as of 2014-10-17, Redshift supports the MEDIAN window function:
select min(median) from (select median(num) over () as median from temp) as t;
min
-----
4.0
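Because MEDIAN(num) OVER () uses an empty window, it attaches the same whole-table median to every input row; the outer min() just collapses those identical rows into one. A rough Python emulation of that row-by-row behavior, using the same five sample values and the standard library's `statistics.median` as a stand-in for Redshift's MEDIAN (Redshift itself returns the DECIMAL 4.0 shown above):

```python
import statistics

rows = [1, 5, 10, 2, 4]  # same sample data as the temp table

# median(num) over (): one output row per input row, each holding the whole-set median
whole = statistics.median(rows)
windowed = [whole] * len(rows)

# select min(median) from (...): collapse the identical values into a single row
result = min(windowed)

print(windowed)  # [4, 4, 4, 4, 4]
print(result)    # 4
```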
Try the NTILE function.
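You would divide your data into 2 ranked groups, ordered by num descending, and pick the minimum value from the first group. That works because, when the number of values is odd, the first ntile gets one more value than the second, so its minimum is the middle value. When the number of values is even, the two groups are the same size and this returns the upper of the two middle values rather than their average, which is why it is an approximation. It should work well for large datasets.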
create table temp (num smallint);
insert into temp values (1),(5),(10),(2),(4);
select num, ntile(2) over(order by num desc) from temp ;
num | ntile
-----+-------
10 | 1
5 | 1
4 | 1
2 | 2
1 | 2
select min(num) as median from (select num, ntile(2) over (order by num desc) as ntile from temp) as t where ntile = 1;
median
--------
4
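To see why this works, and where the approximation comes from, here is a Python sketch of NTILE(2) over a descending sort. It assumes the usual NTILE rule that when the rows don't divide evenly, the earlier buckets get one extra row (which matches the 3/2 split in the output above). The helper names `ntile` and `ntile_median` are hypothetical.

```python
def ntile(ordered_rows, buckets):
    """Assign 1-based bucket numbers like SQL NTILE over already-ordered rows."""
    n = len(ordered_rows)
    base, extra = divmod(n, buckets)  # first `extra` buckets get one more row
    labels = []
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        labels.extend([b] * size)
    return list(zip(ordered_rows, labels))


def ntile_median(values):
    """min(num) from the first of two NTILE buckets over num DESC."""
    ranked = ntile(sorted(values, reverse=True), 2)
    return min(num for num, tile in ranked if tile == 1)


print(ntile([10, 5, 4, 2, 1], 2))      # [(10, 1), (5, 1), (4, 1), (2, 2), (1, 2)]
print(ntile_median([1, 5, 10, 2, 4]))  # 4: exact for an odd count
print(ntile_median([1, 2, 3, 4]))      # 3: upper middle value; the true median is 2.5
```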