I'm trying to append one variable from several tables together (aka row-bind, concatenate) to make one longer table with a single column in Hive. I think this is possible using UNION ALL, based on this question ( HiveQL UNION ALL ), but I'm not sure of an efficient way to accomplish it.
The pseudocode would look something like this:
CREATE TABLE tmp_combined AS
SELECT b.var1 FROM tmp_table1 b
UNION ALL
SELECT c.var1 FROM tmp_table2 c
UNION ALL
SELECT d.var1 FROM tmp_table3 d
UNION ALL
SELECT e.var1 FROM tmp_table4 e
UNION ALL
SELECT f.var1 FROM tmp_table5 f
UNION ALL
SELECT g.var1 FROM tmp_table6 g
UNION ALL
SELECT h.var1 FROM tmp_table7 h;
Any help is appreciated!
Try the following:
-- Hive does not support SELECT ... INTO; use CREATE TABLE ... AS SELECT (CTAS) instead
CREATE TABLE tmp_combined AS
SELECT * FROM
(
SELECT b.var1 FROM tmp_table1 b
UNION ALL
SELECT c.var1 FROM tmp_table2 c
UNION ALL
SELECT d.var1 FROM tmp_table3 d
UNION ALL
SELECT e.var1 FROM tmp_table4 e
UNION ALL
SELECT f.var1 FROM tmp_table5 f
UNION ALL
SELECT g.var1 FROM tmp_table6 g
UNION ALL
SELECT h.var1 FROM tmp_table7 h
) CombinedTable;
Run it after this statement in the same session: set hive.exec.parallel=true;
This lets Hive execute independent stages of the query, such as the separate SELECTs, simultaneously; otherwise they run one after another.
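A minimal sketch of the session settings, assuming the MapReduce execution engine. hive.exec.parallel.thread.number is an extra setting not mentioned in the answer above; it caps how many stages may run at the same time, and 8 is its default, so you only need to set it if you want a different limit:

```sql
-- Let Hive launch independent stages (e.g. the UNION ALL branches) concurrently
SET hive.exec.parallel=true;
-- Optional: maximum number of stages run at the same time (8 is the default)
SET hive.exec.parallel.thread.number=8;

-- ...then run the CREATE TABLE ... AS SELECT ... UNION ALL query in the same session
```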
I would say that's both a straightforward and an efficient way to do the row-bind; at least, it's what I would use in my own code. By the way, your pseudocode may raise a syntax error if you run it as-is, because older Hive versions only accept UNION ALL inside a subquery. You may try:
create table join_table as
select * from
(select ...
union all
select ...
union all
select ...) tmp;