Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive (Finding min of n columns in a row)

Tags:

sql

hadoop

hive

I have a table in Hive with with 5 columns i.e. email, a_first_date, b_first_date, c_first_date, d_first_date.

a, b, c, d are 4 different actions which a user can take and 4 columns in the table above indicate the date on which the user did the first respective action. For example, the value in 'a_first_date' has the date on which the user did action a.

Output: What I want is 2 columns having email, overall_first_date i.e. on what date user did his first action?

Example Table: (Assume all values are BIGINT type other than email)

email,a_first_date,b_first_date,c_first_date,d_first_date
abc,20140707,20140702,20140801,20140907
xyz,20140107,20140822,20140201,20141007

Output:

email,overall_first_date
abc,20140702
xyz,20140107

Possible couple of solutions are to write a UDF or compare these values with each other using IF ELSE and then find the minimum but that would involve a lot of comparisons.

Alternatively I can do a:

select email, min(action) as overall_first_date from

(
select email, a_first_date as action from mytable
UNION ALL
select email, b_first_date as action from mytable
UNION ALL
select email, c_first_date as action from mytable
UNION ALL
select email, d_first_date as action from mytable
) q1

GROUP BY email 

but again this is not a good way.

Could anyone please suggest a better way to achieve this?

like image 948
Amar Avatar asked Oct 09 '14 10:10

Amar


1 Answers

Use the function least(). For example; Select *, least(col1,col2,col3) as minofcol from Tablename;

like image 153
Yerandy Avatar answered Sep 24 '22 20:09

Yerandy