I have two tables with columns as shown below.
Posts:
user STRING,
tag_list ARRAY<STRING>
Tags:
tag STRING
Here is some sample data in those tables.
Posts:
user1 help, pig
user2 bigdata, hadoop, query, hiveql
user1 hive, hiveql, help
Tags:
hadoop
hiveql
If I want to filter out only the rows in Posts table that contain the tags listed in Tags table as shown in the results below, how would I do that in a hive query? I am not sure as to how to join these two tables using array_contains.
Expected Query Result:
user2 bigdata, hadoop, query, hiveql
user1 hive, hiveql, help
Thanks
I was able to get the expected result with this query.
select distinct b.user, b.tag_list from
(select tag from tags) a
join
(select user, tag_list, exp from posts
lateral view explode(tag_list) exploded_table as exp) b
on (a.tag = b.exp);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With