How do I join two tables together that are in different databases, in Hive?

Tags:

hive

A problem I've encountered a few times: I have a table, table1, in db1. I have table2 in db2. How do I join between the two?

The obvious thing to do is something like:

SELECT *
FROM db1.table1 INNER JOIN db2.table2
ON db1.table1.field1 = db2.table2.field2;

Hive doesn't like this, however; it starts treating "table1" and "table2" as if they were column names, and "db1" and "db2" as table names, and complaining when they don't exist. How do I join between two tables in different databases?

Oliver Keyes asked Dec 20 '14 23:12


1 Answer

In Hive, a join between tables in different databases requires an alias for each database-qualified table. So instead of the syntax in the question, you have to use:

SELECT *
FROM db1.table1 alias1 INNER JOIN db2.table2 alias2
ON alias1.field1 = alias2.field2;

This works. Remember that if you name particular fields in the SELECT clause, the aliases apply there too. So:

SELECT db1.table1.field1, db2.table2.field2

becomes:

SELECT alias1.field1, alias2.field2
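
Putting the two pieces together, a complete query might look like the sketch below. The database, table, and field names are the ones from the question; the alias names t1 and t2 are arbitrary and chosen only for illustration.

```sql
-- Sketch only: t1 and t2 are illustrative aliases, not required names.
-- Each database-qualified table gets an alias in the FROM clause,
-- and every later reference (SELECT, ON) goes through that alias.
SELECT t1.field1, t2.field2
FROM db1.table1 t1
INNER JOIN db2.table2 t2
  ON t1.field1 = t2.field2;
```

Once the aliases are declared, the db1. and db2. prefixes appear only in the FROM clause, so Hive no longer has a chance to misread them as table names.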

Oliver Keyes answered Oct 06 '22 18:10