I am evaluating Hive and need to concatenate a string field after a GROUP BY. I found a function named "concat_ws", but it seems I have to list all the values to be concatenated explicitly. Can concat_ws be used in Hive to concatenate the values from all the rows of a group? Here is an example. I have a table named "my_table" with two fields, country and city. I want only one record per country, and each record should have two fields, country and cities:
select country, concat_ws(city, "|") as cities
from my_table
group by country
Is this possible in Hive? I am using Hive 0.11 from CDH5 right now.
Group by query: The GROUP BY clause groups the rows of a Hive table by the values of the columns listed in the clause. The query then returns one result row per distinct combination of values in those columns.
The GROUP BY clause groups all the records in a result set by one or more columns, so that a query can operate on each group of records as a unit.
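A plain-Python model (not Hive code) of what grouping does: rows are partitioned by the value of the GROUP BY column, and one output record is then produced per group. The sample rows and the helper name `group_by` are hypothetical, used only for illustration.

```python
from collections import defaultdict


def group_by(rows, key):
    """Partition rows (dicts) into groups keyed by row[key].

    Hypothetical helper: models the grouping step of GROUP BY,
    before any aggregate function is applied.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical sample rows for my_table(country, city).
my_table = [
    {"country": "US", "city": "Boston"},
    {"country": "FR", "city": "Paris"},
    {"country": "US", "city": "Austin"},
]

grouped = group_by(my_table, "country")
for country, rows in grouped.items():
    # One output record per group; here the aggregate is a simple count.
    print(country, len(rows))
```

Each key in `grouped` corresponds to one output row of the query; an aggregate function then has to collapse each list of rows into a single value.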
In database management, an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement, such as a set, a bag or a list.
Source: Aggregate function - Wikipedia
Hive's built-in aggregate functions are listed on the following web page:
Built-in Aggregate Functions (UDAF - user defined aggregation function)
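So, the only built-in option for Hive 0.11 (for Hive 0.13 and above there is also collect_list) is:
array collect_set(col)
This answers your request as long as there are no duplicate city records per country, because it returns a set of objects with duplicate elements eliminated. Since concat_ws also accepts an array<string> argument, the collected array can be joined into one delimited string, e.g. concat_ws('|', collect_set(city)). If duplicates must be kept, create your own UDAF or aggregate outside of Hive.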
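To make the difference between the two collectors concrete, here is a plain-Python sketch (not Hive code) that models what a query such as `select country, concat_ws('|', collect_set(city)) as cities from my_table group by country` computes, and what the collect_list variant would compute instead. The function `concat_per_group` and the sample rows are hypothetical. It also doubles as a minimal example of aggregating outside of Hive. One caveat, stated as an assumption: this model keeps first-seen order, whereas in Hive the element order depends on the order rows reach the aggregation, which should not be relied on.

```python
from collections import defaultdict


def concat_per_group(rows, key, value, sep="|", distinct=True):
    """Model of concat_ws(sep, collect_set(value)) per group of `key`.

    With distinct=False it models concat_ws(sep, collect_list(value)),
    which keeps duplicates. Hypothetical helper, plain Python only.
    """
    collected = defaultdict(list)  # group key -> values in first-seen order
    seen = defaultdict(set)        # group key -> values already collected
    for row in rows:
        k, v = row[key], row[value]
        if distinct:
            if v in seen[k]:
                continue  # collect_set drops duplicate elements
            seen[k].add(v)
        collected[k].append(v)
    # Join each group's values into one delimited string, like concat_ws.
    return {k: sep.join(vals) for k, vals in collected.items()}


# Hypothetical sample data with a duplicate city for US.
rows = [
    {"country": "US", "city": "Boston"},
    {"country": "US", "city": "Austin"},
    {"country": "US", "city": "Boston"},
    {"country": "FR", "city": "Paris"},
]

as_set = concat_per_group(rows, "country", "city")
as_list = concat_per_group(rows, "country", "city", distinct=False)
print(as_set)
print(as_list)
```

With duplicates in the data, the set-based result lists Boston once for US while the list-based result lists it twice, which is exactly the case where collect_set alone is not enough on Hive 0.11.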
References for writing UDAF: