Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

(Hive, SQL) - How to sort a list of string inside a column?

I have a big data problem in Hive (SQL).

SELECT genre, COUNT(*) AS unique_count
FROM table_name
GROUP BY genre

which gives result like:

genre           |   unique_count
----------------------------------
Romance,Crime,Drama,Law | 1560
Crime,Drama,Law,Romance | 895
Law,Romance,Crime,Drama | 942
Adventure,Action        | 3250
Action,Adventure        | 910

What I want is to sort the elements in genre ASC|DESC and get results like

genre           |   unique_count
----------------------------------
Crime,Drama,Law,Romance | 3397
Action,Adventure        | 4160

I could do this in Python but I have over 200 Million rows of data. I'm not aware of any reasonable way I can move that data. So how can I achieve this?

like image 533
Afloz Avatar asked Dec 23 '22 20:12

Afloz


1 Answers

select      concat_ws(',',sort_array(split(genre,','))) as genre
           ,count(*)                                    as unique_count

from        table_name

group by    concat_ws(',',sort_array(split(genre,',')))
like image 71
David דודו Markovitz Avatar answered Dec 28 '22 07:12

David דודו Markovitz