I have a Hive table:
name  age  sal
A     45   1222
B     50   4555
c     44   8888
D     78   1222
E     12   7888
F     23   4555
I want to calculate the median of the age column.
Below is my approach so far:
select min(age) as HMIN,max(age) as HMAX,count(age) as HCount,
IF(count(age)%2=0,'even','Odd') as PCOUNT
from v_act_subjects_bh;
I would appreciate any query suggestions.
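Create the select queries as usual, then switch to SQL view and add "TOP 50 PERCENT" after the SELECT keyword. Sort ascending to get the bottom 50 percent and descending to get the top 50 percent. Then take the maximum of the bottom-half result set and the minimum of the top-half result set. The average of these two values is the median. Note that TOP ... PERCENT is Access / SQL Server syntax that HiveQL does not support, so in Hive the same idea has to be expressed differently.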
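For reference, the Hive aggregate functions involved: count(*) returns the total number of retrieved rows, including rows containing NULL values, while count(expr) returns the number of rows for which expr is non-NULL. sum(col) and sum(DISTINCT col) return the sum of the elements in the group or the sum of the distinct values of the column in the group. avg(col) and avg(DISTINCT col) return the average of the elements in the group or the average of the distinct values of the column in the group. None of these computes a median directly.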
The exact percentile function only works on integer columns, so use percentile_approx if your input is non-integral. percentile_approx(col, p [, B]) returns an approximate pth percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000.
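As a rough illustration of percentile_approx applied to the question's table, something like the following should work. It is only a sketch: the table name v_act_subjects_bh is taken from the question, the alias approx_median_age is made up here, and the cast to DOUBLE is there just to match the documented argument type. Because the result is an approximation, it may differ from a hand-computed median, especially on small groups or groups with an even number of rows.

```sql
-- Approximate median (50th percentile) of age.
-- The optional third argument B is left at its default of 10,000.
SELECT percentile_approx(cast(age AS DOUBLE), 0.5) AS approx_median_age
FROM v_act_subjects_bh;
```

To trade more memory for accuracy, a larger B can be passed as a third argument, for example percentile_approx(cast(age AS DOUBLE), 0.5, 100000).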
The median is the middle value. The data needs to be sorted so that the middle record can be found.
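The sort-and-pick-the-middle idea can be sketched in HiveQL with window functions. This is only a sketch under a few assumptions: the table name v_act_subjects_bh comes from the question, the Hive version supports windowing (0.11 or later), and the aliases rn, cnt and median_age are made up for illustration.

```sql
-- Number the rows by age, then average the one or two middle rows.
-- For an odd row count both expressions in the IN list pick the same row;
-- for an even row count they pick the two rows around the middle.
SELECT avg(age) AS median_age
FROM (
  SELECT age,
         row_number() OVER (ORDER BY age) AS rn,  -- position in sorted order
         count(*)     OVER ()             AS cnt  -- total number of non-NULL ages
  FROM v_act_subjects_bh
  WHERE age IS NOT NULL
) t
WHERE rn IN (floor((cnt + 1) / 2), floor((cnt + 2) / 2));
```

For the six sample rows, the sorted ages are 12, 23, 44, 45, 50, 78, so the two middle values are 44 and 45 and the median is 44.5.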
You can use the percentile function to compute the median. Try this:
select percentile(cast(age as BIGINT), 0.5) from table_name;