Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive check elements in array

Tags:

I have an Hive table made of user_id and item_id (id of items that have been purchased by the user). I want to get a list of all the users who purchased item 1 but not item 2 and 3.

To do this I wrote the simple query:

SELECT user_id, collect_set(item_id) itemslist FROM mytable WHERE item_id in (1, 2) GROUP BY user_id HAVING -- what should I put here??? 

As you can see, I don't know how to check whether the array itemslist contains 1 and not 2.

How do you do this? If there is some more efficient way can you please tell me both (or more) methods?

like image 371
lucacerone Avatar asked Nov 06 '14 07:11

lucacerone


1 Answers

There are some collection functions in Hive `(See collection functions here : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF ) which can use here.

You can use the array_contains(Array<T>, value) function to check if item 1 is present and the size(Array<T>) function to make sure the length is 1. If both conditions are satisfied, you will get the desired output.

like image 89
Amar Avatar answered Oct 20 '22 05:10

Amar