
How to calculate median in Hive

Tags:

hive

hiveql

I have a Hive table:

name    age     sal
A       45      1222
B       50      4555
c       44      8888
D       78      1222
E       12      7888
F       23      4555

I want to calculate the median of the age column.

Below is my approach so far:

SELECT min(age) AS HMIN, max(age) AS HMAX, count(age) AS HCount,
       IF(count(age) % 2 = 0, 'even', 'Odd') AS PCOUNT
FROM v_act_subjects_bh;

I would appreciate any query suggestions.

Amaresh asked Nov 11 '14 10:11




1 Answer

You can use the percentile function to compute the median. Try this:

select percentile(cast(age as BIGINT), 0.5) from table_name
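Applied to the table from the question, this might look like the sketch below. It assumes the table is `v_act_subjects_bh` (the name used in the asker's query) and that `age` can be cast to an integer type. The second query is an addition not in the original answer: it assumes `sal` may hold fractional values, in which case Hive's `percentile_approx` is the usual alternative, since `percentile` accepts only integral input.

```sql
-- Exact median of an integral column.
-- percentile() only accepts integral types, hence the cast to BIGINT.
SELECT percentile(CAST(age AS BIGINT), 0.5) AS median_age
FROM v_act_subjects_bh;

-- Approximate median for a non-integral column.
-- Assumption: sal may contain fractional values; if it is integral,
-- percentile() as above works for it too.
SELECT percentile_approx(CAST(sal AS DOUBLE), 0.5) AS approx_median_sal
FROM v_act_subjects_bh;
```

For the six sample ages (12, 23, 44, 45, 50, 78) the median is the average of the two middle values, 44 and 45, so the first query should return 44.5.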
Amar answered Oct 10 '22 02:10
