Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Column to comma separated value in Hive

Tags:

hadoop

hive

It's been asked and answered for SQL (Convert multiple rows into one with comma as separator), would any of the approaches mentioned work in Hive, e.g. to go from this:

+------+------+
| Col1 | Col2 |
+------+------+
| a    | 1    |
| a    | 5    |
| a    | 6    |
| b    | 2    |
| b    | 6    |
+------+------+

to this:

+------+-------+
| Col1 | Col2  |
+------+-------+
| a    | 1,5,6 |
| b    | 2,6   |
+------+-------+
like image 409
glp Avatar asked Mar 28 '14 06:03

glp


People also ask

What is COLLECT_ SET in Hive?

In Apache Hive the COLLECT_SET is an aggregate function that allows you to collect unique values from multiple rows into array.

What is lateral view explode in hive?

Lateral view creates a virtual table for exploded columns and make join with the base table. We need not to worry about the virtual table as it is done by hive internally.

How do I query an array in hive?

The Hive split functions split given string into an array of values. This function will split on the given delimiter or a regular expression. Following is the syntax of split array function. where str is a string value to be split and pat is a delimiter or a regular expression.


3 Answers

The aggregator function collect_set can achieve what you are trying to get. Here is the documentation. So you can write a query like:

SELECT Col1, collect_set(Col2)
FROM your_table
GROUP BY Col1;

However, there is one striking difference between MySQL's GROUP BY and Hive's collect_set that while GROUP_CONCAT also retains duplicates in the resulting array, collect_set removes the duplicates occuring in the array. In the example shown by you there are no repeating group values for Col2 so you can go ahead and use it.

like image 66
Neels Avatar answered Nov 15 '22 16:11

Neels


And there is collect_list that will take full list (with duplicates).

like image 45
Simon U Avatar answered Nov 15 '22 18:11

Simon U


Try this

SELECT Col1, concat_ws(',', collect_set(Col2)) as col2
FROM your_table
GROUP BY Col1;

apache.org documentation

like image 22
ASD Avatar answered Nov 15 '22 17:11

ASD