Is it possible to concat a string field after group by in Hive

Tags: hive, cloudera-cdh

I am evaluating Hive and need to concatenate a string field after a group by. I found a function named "concat_ws", but it looks like I have to explicitly list all the values to be concatenated. I am wondering whether I can do something like the following with concat_ws in Hive. For example, I have a table named "my_table" with two fields, country and city. I want only one record per country, and each record should have two fields, country and cities:

select country, concat_ws(city, "|") as cities
from my_table
group by country

Is this possible in Hive? I am using Hive 0.11 from CDH5 right now.

asked May 03 '15 05:05

kee

1 Answer

In database management an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list.

Source: Aggregate function - Wikipedia

Hive's out-of-the-box aggregate functions are listed on the following page:
Built-in Aggregate Functions (UDAF - user defined aggregation function)

So, the only built-in option (for Hive 0.11; for Hive 0.13 and above you have collect_list) is:
array collect_set(col)

This will do what you need as long as there are no duplicate city records per country, because collect_set returns a set of objects with duplicate elements eliminated. Otherwise, create your own UDAF or aggregate outside of Hive.
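As a minimal sketch of how collect_set could be combined with concat_ws for the table in the question: this assumes city is a string column and relies on the concat_ws(separator, array<string>) form of the function. The order of the cities within each result is not guaranteed.

```sql
-- collect_set gathers the distinct city values of each country into an array<string>;
-- concat_ws then joins the array elements with the '|' separator.
SELECT country,
       concat_ws('|', collect_set(city)) AS cities
FROM my_table
GROUP BY country;
```

On Hive 0.13 and above, replacing collect_set with collect_list in the same query should keep duplicate cities instead of eliminating them.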

References for writing UDAF:

Writing GenericUDAFs: A Tutorial
HivePlugins
Create/Drop Function

answered Sep 28 '22 06:09

Vyacheslav Shkolyar
