
Combine columns from multiple columns into one in Hive

Tags:

hadoop

hive

Is there a way to do the reverse of the explode() function in Apache Hive? Say I have a table of the form id int, description string, url string, ...

From this table I would like to create a table of the form id int, json string, where the json column stores all the other columns as JSON: "description":"blah blah", "url":"http:", ...

user1831986 asked Apr 15 '13 07:04



2 Answers

Hive has string functions that can be used to combine multiple columns into one column:

SELECT id, CONCAT("(", CONCAT_WS(", ", description, url), ")") AS descriptionAndUrl
FROM originalTable;

This obviously gets complicated fast when combining many columns into valid JSON. If this is a one-off and you know that all of the JSON strings will have the same properties, you might get away with plain CONCAT for your purposes.
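For a one-off with a fixed set of columns, the CONCAT approach might look like the sketch below. It assumes the two columns from the question (description, url), that neither value contains double quotes or backslashes (nothing here escapes them), and that NULLs should come out as empty strings. The alias json_str is only illustrative.

```sql
-- Sketch: build a flat JSON object by hand with CONCAT.
-- Assumption: description and url never contain " or \, because
-- this query does no escaping.
SELECT
  id,
  CONCAT(
    '{',
    '"description":"', COALESCE(description, ''), '",',
    '"url":"',         COALESCE(url, ''),         '"',
    '}'
  ) AS json_str
FROM originalTable;
```

The COALESCE calls are there because CONCAT returns NULL as soon as any argument is NULL, which would otherwise blank out the whole JSON string for that row.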

The "right" way to do it would be to write a User Defined Function which takes a list of columns and spits out a JSON string. This will be much more maintainable if you ever need to add columns or do the same thing to other tables.

It's likely someone has already written one you can use, so you should look around. Unfortunately, the [JSON related UDFs provided by Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object) work from JSON strings; they don't create them.
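Once you have such a UDF packaged as a JAR, wiring it in could look roughly like this. ADD JAR, CREATE TEMPORARY FUNCTION and named_struct are standard Hive, but the JAR path, the class name com.example.hive.ToJsonUDF and the function name to_json are all hypothetical, and whether a given UDF accepts a struct argument depends on how it was written.

```sql
-- Sketch only: the JAR path, class name and function name are placeholders.
ADD JAR /path/to/json-udfs.jar;
CREATE TEMPORARY FUNCTION to_json AS 'com.example.hive.ToJsonUDF';

-- named_struct pairs each key with a column value; the (hypothetical)
-- UDF is assumed to serialize that struct to a JSON string.
SELECT
  id,
  to_json(named_struct('description', description, 'url', url)) AS json_str
FROM originalTable;
```

Adding a column then means adding one more key/value pair to named_struct rather than reworking a chain of CONCAT calls.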

Daniel Koverman answered Oct 31 '22 15:10


You can concatenate strings using CONCAT_WS in Hive:

SELECT CONCAT_WS('-','string1','string2','string3') FROM TABLE
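Applied to the table from the question, that could look like the sketch below. CONCAT_WS only takes string (or array of string) arguments, so the int id is cast first. The '-' separator is carried over from the example above and the alias combined is only illustrative.

```sql
-- Sketch: join real columns with a separator.
-- id is an int in the question's table, so it is cast to STRING first.
SELECT
  CONCAT_WS('-', CAST(id AS STRING), description, url) AS combined
FROM originalTable;
```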

Saranga answered Oct 31 '22 16:10