
Is it possible to concat a string field after group by in Hive?

I am evaluating Hive and need to do some string field concatenation after group by. I found a function named "concat_ws" but it looks like I have to explicitly list all the values to be concatenated. I am wondering if I can do something like this with concat_ws in Hive. Here is an example. So I have a table named "my_table" and it has two fields named country and city. I want to have only one record per country and each record will have two fields - country and cities:

select country, concat_ws(city, "|") as cities
from my_table
group by country

Is this possible in Hive? I am using Hive 0.11 from CDH5 right now.

kee asked May 03 '15 05:05




1 Answer

In database management an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list.

Source: Aggregate function - Wikipedia

Hive's out-of-the-box aggregate functions are listed on the following web page:
Built-in Aggregate Functions (UDAF - user defined aggregation function)

So the only built-in option for Hive 0.11 is collect_set (Hive 0.13 and above also provide collect_list):
array collect_set(col)

This meets your requirement as long as no country has duplicate city records that you need to keep, because collect_set returns a set of objects with duplicate elements eliminated. Otherwise, write your own UDAF or do the aggregation outside of Hive.
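To tie this back to the query in the question, here is a minimal sketch of how collect_set might be combined with concat_ws. It assumes city is a string column and relies on the concat_ws overload that takes a separator followed by an array of strings, which the Hive documentation lists as available since 0.9.0. The order of cities inside each joined string is not guaranteed.

```sql
-- One row per country, with its distinct cities joined by '|'.
-- Assumes my_table(country STRING, city STRING) as described in the question.
SELECT country,
       concat_ws('|', collect_set(city)) AS cities
FROM my_table
GROUP BY country;

-- Hive 0.13 and above only: collect_list keeps duplicate cities.
SELECT country,
       concat_ws('|', collect_list(city)) AS cities
FROM my_table
GROUP BY country;
```

Note that the separator comes first in concat_ws, unlike in the query sketched in the question.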

References for writing UDAF:

  • Writing GenericUDAFs: A Tutorial
  • HivePlugins
  • Create/Drop Function
Vyacheslav Shkolyar answered Sep 28 '22 06:09