Grouping Hive rows into an array of these rows

Tags: hive, hiveql

I have a table like the following:

User:String Alias:String
JohnDoe     John
JohnDoe     JDoe
Roger       Roger

I would like to group all the aliases of a user into an array, in a new table that would look like this:

User:String Alias:array<String>
JohnDoe     [John, JDoe]
Roger       [Roger]

I can't figure out how to do that with HiveQL. Do I have to write a UDF for that?

Thanks!

asked May 30 '13 12:05 by C4stor


1 Answer

Check out the built-in aggregate function collect_set. It returns the distinct values of each group as an array, so no UDF is needed.

select 
    User, 
    collect_set(Alias) as Alias
from table  -- placeholder: substitute your actual table name
group by User;
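
Since the question asks for the result in a new table, the query above could be wrapped in a CREATE TABLE ... AS SELECT statement. The following is only a sketch under stated assumptions: the table names user_aliases and user_alias_arrays and the column names user_name, alias, and aliases are hypothetical stand-ins, not names from the question, so adjust them to the real schema.

```sql
-- Hypothetical names (not from the question):
--   user_aliases       source table with one row per (user_name, alias)
--   user_alias_arrays  target table with one row per user_name
CREATE TABLE user_alias_arrays AS
SELECT
    user_name,
    collect_set(alias) AS aliases  -- array<string>, duplicates removed
FROM user_aliases
GROUP BY user_name;
```

Note that collect_set drops duplicate aliases and does not guarantee the order of the elements. If duplicates should be kept, collect_list (available from Hive 0.13) can be used in its place.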

answered Oct 21 '22 02:10 by Lukas Vermeer