I have a table in Hive with with 5 columns i.e. email, a_first_date, b_first_date, c_first_date, d_first_date.
a, b, c, d are 4 different actions which a user can take and 4 columns in the table above indicate the date on which the user did the first respective action. For example, the value in 'a_first_date' has the date on which the user did action a.
Output: What I want is 2 columns having email, overall_first_date i.e. on what date user did his first action?
Example Table: (Assume all values are BIGINT type other than email)
email,a_first_date,b_first_date,c_first_date,d_first_date
abc,20140707,20140702,20140801,20140907
xyz,20140107,20140822,20140201,20141007
Output:
email,overall_first_date
abc,20140702
xyz,20140107
Possible couple of solutions are to write a UDF or compare these values with each other using IF ELSE and then find the minimum but that would involve a lot of comparisons.
Alternatively I can do a:
select email, min(action) as overall_first_date from
(
select email, a_first_date as action from mytable
UNION ALL
select email, b_first_date as action from mytable
UNION ALL
select email, c_first_date as action from mytable
UNION ALL
select email, d_first_date as action from mytable
) q1
GROUP BY email
but again this is not a good way.
Could anyone please suggest a better way to achieve this?
Use the function least(). For example; Select *, least(col1,col2,col3) as minofcol from Tablename;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With