Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive - Checking if an array in each row of a table contains any matching data in a column in another table

I have two tables with columns as shown below.

Posts:
user STRING,
tag_list ARRAY<STRING>

Tags:
tag STRING

Here is some sample data in those tables.

Posts:
user1   help, pig
user2   bigdata, hadoop, query, hiveql
user1   hive, hiveql, help

Tags:
hadoop
hiveql

If I want to filter out only the rows in Posts table that contain the tags listed in Tags table as shown in the results below, how would I do that in a hive query? I am not sure as to how to join these two tables using array_contains.

Expected Query Result:
user2   bigdata, hadoop, query, hiveql
user1   hive, hiveql, help

Thanks

like image 504
rev Avatar asked Jan 27 '14 18:01

rev


1 Answers

I was able to get the expected result with this query.

select distinct b.user, b.tag_list from 
(select tag from tags) a
join
(select user, tag_list, exp from posts
lateral view explode(tag_list) exploded_table as exp) b
on (a.tag = b.exp);
like image 111
rev Avatar answered Oct 13 '22 00:10

rev