Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select distinct on specific columns but select other columns also in hive

I have multiple columns in a table in hive having around 80 columns. I need to apply the distinct clause on some of the columns and get the first values from the other columns also. Below is the representation of what I am trying to achieve.

select distinct(col1,col2,col3),col5,col6,col7
from abc where col1 = 'something';

All the columns mentioned above are text columns. So I cannot apply group by and aggregate functions.

like image 713
Abhishek Avatar asked Sep 18 '25 18:09

Abhishek


1 Answers

You can use row_number function to solve the problem.

create table temp as
select *, row_number() over (partition by col1,col2,col3) as rn
from abc 
where col1 = 'something';

select *
from temp
where rn=1

You can also sort the table while partitioning.

row_number() over (partition by col1,col2,col3 order by col4 asc) as rn

like image 172
Safwan Avatar answered Sep 21 '25 11:09

Safwan