Comparing two tables for equality in HIVE

I have two tables, table1 and table2, each with the same columns:

key, c1, c2, c3

I want to check whether these tables are equal to each other (that is, whether they have the same rows). So far I have these two queries (<> means "not equal" in Hive):

select count(*) from table1 t1 
left outer join table2 t2
on t1.key=t2.key
where t2.key is null or t1.c1<>t2.c1 or t1.c2<>t2.c2 or t1.c3<>t2.c3

And

select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t2.key is null

My idea is that if a zero count is returned, the tables are the same. However, I'm getting a zero count for the first query and a non-zero count for the second query. How exactly do they differ? If there is a better way to check this, please let me know.

asked Aug 04 '15 11:08

Danzo

2 Answers

Well, the best way is to calculate a hash sum for each table and compare the two sums. No matter how many columns there are or what their data types are, as long as the two tables have the same schema, you can use the following queries to do the comparison:

select sum(hash(*)) from table1;
select sum(hash(*)) from table2;

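Then you just need to compare the two returned values. If they differ, the tables differ; if they match, the tables are very likely the same, although a hash collision could in principle hide a difference.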
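To illustrate why a sum of per-row hashes does not depend on row order, here is a minimal Python sketch of the idea. It is not Hive's `hash` function: `row_hash` and `table_checksum` are hypothetical helper names, CRC32 over `repr(row)` is only a stand-in hash, and the sample rows are made up.

```python
import zlib

def row_hash(row):
    """Hash one row. CRC32 of repr(row) is a stand-in, not Hive's hash(*)."""
    return zlib.crc32(repr(row).encode("utf-8"))

def table_checksum(rows):
    """Sum the per-row hashes. Addition is commutative, so row order is irrelevant."""
    return sum(row_hash(r) for r in rows)

# Invented sample data with the columns key, c1, c2, c3.
table1 = [(1, "a", "x", 10), (2, "b", "y", 20), (3, "c", None, 30)]
table2 = list(reversed(table1))            # same rows, different order
table3 = [(1, "a", "z", 10)] + table1[1:]  # one value changed in the first row

same = table_checksum(table1) == table_checksum(table2)
differs = table_checksum(table1) != table_checksum(table3)
print(same, differs)
```

Equal checksums are evidence of equality rather than proof, so a row-by-row comparison is still the safer choice when a definitive answer is needed.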

answered Sep 17 '22 17:09

Youjun Yuan

If the tables have exactly the same structure and neither table contains duplicate rows, you can check that every row appears in both tables. The following query returns the rows that appear in only one of them:

select t.key, t.c1, t.c2, t.c3, count(*) as cnt
from ((select t1.*, 1 as which from table1 t1) union all
      (select t2.*, 2 as which from table2 t2)
     ) t
group by t.key, t.c1, t.c2, t.c3
having cnt <> 2;

There are various ways that you can relax the conditions in the first paragraph, if necessary.
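One possible way to relax the no-duplicates condition is to count, for each distinct row, how often it occurs in each table and to report the rows whose counts differ. The sketch below runs that idea in SQLite through Python's `sqlite3` module, purely so that it is self-contained. The sample data and the `n1`/`n2` aliases are invented for illustration, and the same GROUP BY pattern should carry over to Hive, though it has not been tested there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for name in ("table1", "table2"):
    cur.execute(f"create table {name} (key integer, c1 text, c2 text, c3 text)")

# table1 holds the row with key 1 twice; table2 holds it only once.
cur.executemany("insert into table1 values (?, ?, ?, ?)",
                [(1, "a", "b", "c"), (1, "a", "b", "c"), (2, "d", None, "f")])
cur.executemany("insert into table2 values (?, ?, ?, ?)",
                [(1, "a", "b", "c"), (2, "d", None, "f")])

# Count occurrences per source table and keep rows whose counts disagree.
query = """
select t.key, t.c1, t.c2, t.c3,
       sum(case when t.which = 1 then 1 else 0 end) as n1,
       sum(case when t.which = 2 then 1 else 0 end) as n2
from (select t1.*, 1 as which from table1 t1
      union all
      select t2.*, 2 as which from table2 t2) t
group by t.key, t.c1, t.c2, t.c3
having sum(case when t.which = 1 then 1 else 0 end)
    <> sum(case when t.which = 2 then 1 else 0 end)
"""
mismatches = cur.execute(query).fetchall()
print(mismatches)  # [(1, 'a', 'b', 'c', 2, 1)]
```

An empty result means both tables contain the same rows the same number of times. Here the `which` column is what lets the counts be split by source table.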

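Note that the GROUP BY approach also works when the columns have NULL values, because GROUP BY treats NULLs as equal to one another. NULLs might be causing the problem with your data: when either side is NULL, t1.c1<>t2.c1 in your first query and t1.c1=t2.c1 in your second both evaluate to NULL rather than true, so the first query skips such a row while the second counts it as unmatched.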
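The effect of NULLs on the two queries from the question can be reproduced on a tiny data set. This sketch uses SQLite via Python's `sqlite3` module as a stand-in for Hive, an assumption made only to keep the example self-contained; the NULL comparison rules involved are standard SQL. The sample rows are invented. Both tables hold identical rows, one of which has a NULL in c2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
rows = [(1, "a", "b", "c"), (2, "d", None, "f")]  # second row has NULL in c2
for name in ("table1", "table2"):
    cur.execute(f"create table {name} (key integer, c1 text, c2 text, c3 text)")
    cur.executemany(f"insert into {name} values (?, ?, ?, ?)", rows)

# First query: for key 2, t1.c2 <> t2.c2 is NULL (not true), so the row is filtered out.
q1 = """
select count(*) from table1 t1
left outer join table2 t2 on t1.key = t2.key
where t2.key is null or t1.c1 <> t2.c1 or t1.c2 <> t2.c2 or t1.c3 <> t2.c3
"""
# Second query: for key 2, t1.c2 = t2.c2 is NULL, so the join finds no match,
# t2.key comes back NULL, and the row is counted.
q2 = """
select count(*) from table1 t1
left outer join table2 t2
on t1.key = t2.key and t1.c1 = t2.c1 and t1.c2 = t2.c2 and t1.c3 = t2.c3
where t2.key is null
"""
# GROUP BY comparison: NULLs fall into the same group, so nothing is reported.
q3 = """
select t.key, t.c1, t.c2, t.c3, count(*) as cnt
from (select t1.*, 1 as which from table1 t1
      union all
      select t2.*, 2 as which from table2 t2) t
group by t.key, t.c1, t.c2, t.c3
having cnt <> 2
"""
count1 = cur.execute(q1).fetchone()[0]
count2 = cur.execute(q2).fetchone()[0]
diff = cur.execute(q3).fetchall()
print(count1, count2, diff)  # 0 1 []
```

Although the tables are identical, the first query reports 0 and the second reports 1, which matches the discrepancy described in the question.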

answered Sep 19 '22 17:09

Gordon Linoff

Tags: sql, join, left-join, hive, hiveql
