I have a big data problem in Hive (SQL).
SELECT genre, COUNT(*) AS unique_count
FROM table_name
GROUP BY genre
which gives result like:
genre | unique_count
----------------------------------
Romance,Crime,Drama,Law | 1560
Crime,Drama,Law,Romance | 895
Law,Romance,Crime,Drama | 942
Adventure,Action | 3250
Action,Adventure | 910
What I want is to sort the elements in genre ASC|DESC and get results like
genre | unique_count
----------------------------------
Crime,Drama,Law,Romance | 3397
Action,Adventure | 4160
I could do this in Python but I have over 200 Million
rows of data. I'm not aware of any reasonable way I can move that data.
So how can I achieve this?
select concat_ws(',',sort_array(split(genre,','))) as genre
,count(*) as unique_count
from table_name
group by concat_ws(',',sort_array(split(genre,',')))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With