Hive Aggregate function for merging arrays

Tags:

hiveql

hive-udf

I need to merge arrays within a GROUP BY in HiveQL. The table schema looks something like this:

key int,
value ARRAY<int>

Now here is the SQL I would like to run:

SELECT key, array_merge(value)
FROM table_above
GROUP BY key

If this array_merge function kept only unique values, that would be even better, but it is not a must.

Cheers, K

asked Jan 09 '18 23:01

kee

1 Answer

There is no UDAF that performs this kind of operation. The following query should produce the same result without much overhead (it still runs as one map and one reduce operation) and also removes duplicates:

select key, collect_set(explodedvalue) from (
  select key, explodedvalue from table_above lateral view explode(value) e as explodedvalue
) t group by key;
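
If duplicates should be kept, or keys whose array is empty or NULL should still appear in the output, a variant of the same explode-then-aggregate pattern might look like the sketch below. It assumes a Hive version that provides collect_list and LATERAL VIEW OUTER; the alias merged_value is only an illustrative name and is not part of the original query.

```sql
-- Sketch only: same pattern as the query above, with two assumed changes.
-- 1. LATERAL VIEW OUTER emits a NULL row for an empty or NULL array,
--    so the key is not dropped (plain explode would drop it).
-- 2. collect_list keeps duplicates; use collect_set instead to deduplicate.
SELECT key, collect_list(explodedvalue) AS merged_value  -- merged_value: hypothetical alias
FROM (
  SELECT key, explodedvalue
  FROM table_above
  LATERAL VIEW OUTER explode(value) e AS explodedvalue
) t
GROUP BY key;
```

For example, if key 1 has the two rows [1, 2] and [2, 3], collect_set should return the elements 1, 2 and 3, while collect_list should return 1, 2, 2 and 3. In both cases the element order is not guaranteed. As far as I know both functions skip NULL inputs, so a key that only has empty arrays should end up with an empty array.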

answered Dec 15 '22 01:12

hlagos
