I have two Hive tables with the same structure (schema). What would be an efficient SQL query to concatenate them into a single table with the same structure?
Update: this works quite fast in my case:
CREATE TABLE xy AS SELECT *
FROM (
SELECT *
FROM x
UNION ALL
SELECT *
FROM y
) tmp;
SQL Merge Statement
Note that, starting from Hive 2.2, the MERGE statement is supported in Hive if you create a transactional table.
MERGE INTO merge_demo1 A
USING merge_demo2 B
ON (A.id = B.id)
WHEN MATCHED THEN UPDATE SET A.lastname = B.lastname
WHEN NOT MATCHED THEN INSERT (id, firstname, lastname) VALUES (B.id, B.
You can conditionally insert, update, or delete existing data in Hive tables using the ACID MERGE statement. The MERGE statement is based on ANSI-standard SQL.
The MERGE statement, available since Hive 2.2, is used to perform UPDATE, DELETE, or INSERT on a target table, based on whether the JOIN condition matches against a source table or query.
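To make the MERGE description above concrete, here is a minimal sketch of folding one table into another. The table names x and y and the columns id, firstname, and lastname are assumptions made for illustration, not taken from any real schema. It also assumes that ACID transactions are enabled on the cluster and that the target is a transactional ORC table; the bucketing clause was required on Hive 2.x and may be unnecessary on later versions.

```sql
-- Hypothetical transactional target table (names and columns are illustrative).
CREATE TABLE x (
  id        INT,
  firstname STRING,
  lastname  STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Fold the rows of y into x: update rows whose id already exists in x,
-- insert the rest. y is assumed to have the same three columns.
MERGE INTO x AS t
USING y AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET lastname = s.lastname
WHEN NOT MATCHED THEN
  INSERT VALUES (s.id, s.firstname, s.lastname);
```

Unlike UNION ALL, this deduplicates on the join key, so it fits the case where the two tables overlap rather than a plain concatenation.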
If you are trying to merge table_A and table_b into a single one, the easiest way is to use the UNION ALL operator. You can find the syntax and use cases here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union
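As a minimal sketch of the UNION ALL approach, assuming hypothetical tables table_a and table_b that both have the columns id, firstname, and lastname (invented here for illustration): UNION ALL matches columns by position rather than by name, so listing the columns explicitly guards against the two tables declaring them in a different order.

```sql
-- Concatenate two same-schema tables into a new one.
-- Column names are illustrative; replace them with the real schema.
CREATE TABLE table_ab AS
SELECT id, firstname, lastname FROM table_a
UNION ALL
SELECT id, firstname, lastname FROM table_b;

-- Alternative: append table_b to table_a in place, without rewriting table_a.
INSERT INTO TABLE table_a
SELECT id, firstname, lastname FROM table_b;
```

The second statement avoids copying the rows of table_a, which may matter when one table is much larger than the other.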
UNION ALL is a valid solution, but it can be expensive in terms of resources and time. I'd recommend creating a table with two partitions, one for table A and another for table B. This way, there is no need to merge (or UNION ALL): the merged table is available as soon as both partitions are populated.
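One way the partition idea could look, as a sketch only: the table name xy_partitioned, the partition column src, and the data columns are all hypothetical, and the source tables are assumed to be named x and y.

```sql
-- Hypothetical combined table with one partition per source table.
CREATE TABLE xy_partitioned (
  id        INT,
  firstname STRING,
  lastname  STRING
)
PARTITIONED BY (src STRING);

-- Populate each partition independently; the two loads can run at different times.
INSERT OVERWRITE TABLE xy_partitioned PARTITION (src = 'x')
SELECT id, firstname, lastname FROM x;

INSERT OVERWRITE TABLE xy_partitioned PARTITION (src = 'y')
SELECT id, firstname, lastname FROM y;

-- Reading the combined data; the partition column src appears as an extra column.
SELECT id, firstname, lastname FROM xy_partitioned;
```

If the source data already sits in separate directories in a compatible file format, an external table with ALTER TABLE ... ADD PARTITION ... LOCATION might avoid copying the rows at all, though that depends on how the original tables are stored.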