Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comma delimited string to individual rows - Impala SQL

Let's suppose we have a table:

Owner   | Pets
------------------------------
Jack    | "dog, cat, crocodile" 
Mary    | "bear, pig"

I want to get as a result:

Owner   | Pets
------------------------------
Jack    | "dog"
Jack    | "cat"
Jack    | "crocodile" 
Mary    | "bear"
Mary    | "pig"

I found some solutions to similar problems by googling, but Impala SQL does not offer any of these capabilities to apply the suggested solutions.

Any help would be greatly appreciated!

like image 476
ifotopoulos Avatar asked May 23 '16 19:05

ifotopoulos


2 Answers

The following works in Impala:

split_part(string source, string delimiter, bigint n)

You can find the documentation here:

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_string_functions.html

like image 167
Timothée HENRY Avatar answered Sep 29 '22 08:09

Timothée HENRY


You have to do this with Hive SQL.

So lets create the impala table

-- imapla sql
CREATE TABLE IF NOT EXISTS tmp.my_example (
  `Owner` VARCHAR,
  Pets VARCHAR
);

INSERT INTO tmp.my_example (`Owner`, `Pets`) VALUES
('Jack', 'dog, cat, crocodile'),
('Mary', 'bear, pig');

Then use hive sql to split the column to rows:

-- hive sql
select f.owner as owner, t1.pos as pos, t1.val AS pet
from tmp.my_example f
lateral view posexplode(split(pets,', ')) t1 

If you save your results as a new table don't forget to run refresh new_table in impala so that your new table can be used

P.S. Hive is incredible slow for this small table but performs reasonable for real data

like image 33
fc9.30 Avatar answered Sep 29 '22 08:09

fc9.30