I have an Hive table made of user_id and item_id (id of items that have been purchased by the user). I want to get a list of all the users who purchased item 1 but not item 2 and 3.
To do this I wrote the simple query:
SELECT user_id, collect_set(item_id) itemslist FROM mytable WHERE item_id in (1, 2) GROUP BY user_id HAVING -- what should I put here???
As you can see, I don't know how to check whether the array itemslist contains 1 and not 2.
How do you do this? If there is some more efficient way can you please tell me both (or more) methods?
There are some collection functions in Hive `(See collection functions here : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF ) which can use here.
You can use the array_contains(Array<T>, value)
function to check if item 1 is present and the size(Array<T>)
function to make sure the length is 1. If both conditions are satisfied, you will get the desired output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With