I have a big data problem in Hive (SQL).
SELECT genre, COUNT(*) AS unique_count
FROM table_name
GROUP BY genre
which gives result like:
genre           |   unique_count
----------------------------------
Romance,Crime,Drama,Law | 1560
Crime,Drama,Law,Romance | 895
Law,Romance,Crime,Drama | 942
Adventure,Action        | 3250
Action,Adventure        | 910
What I want is to sort the elements in genre ASC|DESC and get results like
genre           |   unique_count
----------------------------------
Crime,Drama,Law,Romance | 3397
Action,Adventure        | 4160
I could do this in Python but I have over 200 Million rows of data. I'm not aware of any reasonable way I can move that data.
So how can I achieve this?
select      concat_ws(',',sort_array(split(genre,','))) as genre
           ,count(*)                                    as unique_count
from        table_name
group by    concat_ws(',',sort_array(split(genre,',')))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With