Let's suppose we have a table:
Owner | Pets
------------------------------
Jack | "dog, cat, crocodile"
Mary | "bear, pig"
I want to get as a result:
Owner | Pets
------------------------------
Jack | "dog"
Jack | "cat"
Jack | "crocodile"
Mary | "bear"
Mary | "pig"
I found some solutions to similar problems by googling, but Impala SQL does not offer any of these capabilities to apply the suggested solutions.
Any help would be greatly appreciated!
The following works in Impala:
split_part(string source, string delimiter, bigint n)
You can find the documentation here:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_string_functions.html
You have to do this with Hive SQL.
So lets create the impala table
-- imapla sql
CREATE TABLE IF NOT EXISTS tmp.my_example (
`Owner` VARCHAR,
Pets VARCHAR
);
INSERT INTO tmp.my_example (`Owner`, `Pets`) VALUES
('Jack', 'dog, cat, crocodile'),
('Mary', 'bear, pig');
Then use hive sql to split the column to rows:
-- hive sql
select f.owner as owner, t1.pos as pos, t1.val AS pet
from tmp.my_example f
lateral view posexplode(split(pets,', ')) t1
If you save your results as a new table don't forget to run refresh new_table
in impala so that your new table can be used
P.S. Hive is incredible slow for this small table but performs reasonable for real data
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With