Is there a convenient way to use the ARRAY_CONTAINS function in hive to search for multiple entries in an array column rather than just one? So rather than:
WHERE ARRAY_CONTAINS(array, val1) OR ARRAY_CONTAINS(array, val2)
I would like to write:
WHERE ARRAY_CONTAINS(array, val1, val2)
The full problem is that I need to read val1
and val2
dynamically from the command line arguments when I run the script and I generally don't know how many values will be conditioned on. So you can think of vals
being a comma separated list (or array) containing values val1
, val2
, ...
, and I want to write
WHERE ARRAY_CONTAINS(array, vals)
Thanks in advance!
Hive comes with a set of collection functions to work with Map and Array data types. These functions are used to find the size of the array, map types, get all map keys, values, sort array, and finding if an element exists in an array.
Hive Split Function The Hive split functions split given string into an array of values. This function will split on the given delimiter or a regular expression. Following is the syntax of split array function. where str is a string value to be split and pat is a delimiter or a regular expression.
You can use the array_contains(Array<T>, value) function to check if item 1 is present and the size(Array<T>) function to make sure the length is 1. If both conditions are satisfied, you will get the desired output.
There is a UDF here that will let you take the intersection of two arrays. Assuming your values have the structure
values_array = [val1, val2, ..., valn]
You could then do
where array_intersection(array, values_array)[0] is not null
If they don't have any elements in common, []
will be returned and therefore [][0]
will be null
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With