Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive Aggregate function for merging arrays

I need to merge arrays in a GROUP BY in HiveSQL. The table schema is something like this:

key int,
value ARRAY<int>

Now here is the SQL I would like to run:

SELECT key, array_merge(value)
FROM table_above
GROUP BY key

If this array_merge function only keeps unique values, that will be even better but not must.

Cheers, K

like image 573
kee Avatar asked Jan 09 '18 23:01

kee


People also ask

Does Hive support aggregation?

Hive provides a set of functions to do aggregation on a dataset. These functions operate on a range of data (rows) and provide the cumulative or relative result.

What are aggregate functions in hive?

Hive Aggregate Functions are the most used built-in functions that take a set of values and return a single value, when used with a group, it aggregates all values in each group and returns one value for each group.

How do you find AVG in hive?

count(*), count(expr), count(*) - Returns the total number of retrieved rows. It returns the sum of the elements in the group or the sum of the distinct values of the column in the group. It returns the average of the elements in the group or the average of the distinct values of the column in the group.

How do I find the size of an array in hive?

You can use the array_contains(Array<T>, value) function to check if item 1 is present and the size(Array<T>) function to make sure the length is 1. If both conditions are satisfied, you will get the desired output.


1 Answers

there is no UDAF to perform that kind of operation. The following query should result in the same without much overhead (keep running one map and one reduce operation) removing duplicates

select key, collect_set(explodedvalue) from (
  select key, explodedvalue from table_above lateral view explode(value) e as explodedvalue
) t group by key;
like image 100
hlagos Avatar answered Dec 15 '22 01:12

hlagos