
Hive: Fast concatenate two tables into one?

I have two Hive tables with the same structure (schema). What would be an efficient SQL query to concatenate them into a single table with the same structure?

Update: this works quite fast in my case:

CREATE TABLE xy AS
SELECT *
FROM (
  SELECT * FROM x
  UNION ALL
  SELECT * FROM y
) tmp;

DarqMoth asked May 15 '14 11:05



2 Answers

If you want to combine table_A and table_B into a single table, the easiest way is to use the UNION ALL operator. The syntax and use cases are documented here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union
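As a minimal sketch of this approach, assume both tables have the hypothetical columns id and name; neither these column names nor the target table name combined_ab come from the question. The UNION ALL can be wrapped in a subquery and written out with CREATE TABLE ... AS SELECT. Listing the columns explicitly, rather than using SELECT *, keeps both branches producing the same columns in the same order:

```sql
-- Sketch only: id and name are hypothetical columns,
-- combined_ab is a hypothetical target table name.
CREATE TABLE combined_ab AS
SELECT id, name
FROM (
  SELECT id, name FROM table_A
  UNION ALL
  SELECT id, name FROM table_B
) unioned;
```

Note that UNION ALL keeps duplicate rows. If a row that appears in both tables should be stored only once, a deduplicating step such as SELECT DISTINCT over the subquery would be needed, at extra cost.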

visakh answered Nov 16 '22 04:11


UNION ALL is a valid solution, but it might be expensive in both resources and time. I'd recommend creating a table with two partitions, one for table A and another for table B. This way there is no need to merge (or UNION ALL): the merged table is available as soon as both partitions are populated.
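A minimal sketch of the partitioned layout follows. It assumes hypothetical columns id and name, a hypothetical partition column src, a hypothetical target table combined_partitioned, and source tables named table_A and table_B. Each source is loaded into its own partition, and a query without a filter on src returns the rows of both:

```sql
-- Sketch only: column names, the partition column and all table names are assumptions.
CREATE TABLE combined_partitioned (
  id   INT,
  name STRING
)
PARTITIONED BY (src STRING);

-- Load each source table into its own partition.
INSERT OVERWRITE TABLE combined_partitioned PARTITION (src = 'a')
SELECT id, name FROM table_A;

INSERT OVERWRITE TABLE combined_partitioned PARTITION (src = 'b')
SELECT id, name FROM table_B;

-- Reads both partitions, i.e. the concatenation of the two sources.
SELECT id, name FROM combined_partitioned;
```

The benefit is largest when the data is written into these partitions in the first place, so that no copy step is needed. When the data already exists as files in a compatible format, ALTER TABLE ... ADD PARTITION ... LOCATION on an external table may also avoid the copy, though whether that applies depends on how the source tables are stored.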

Tauruzzz answered Nov 16 '22 03:11