I have two similar tables in Postgres, each with just one 32-byte Latin field (a simple md5 hash). Both tables have ~30,000,000 rows. The tables differ only slightly (10-1000 rows are different). Is it possible with Postgres to find the difference between these tables? The result should be the 10-1000 rows I described above. This is not a real task; I just want to know how PostgreSQL deals with JOIN-like logic.

Find difference between two big tables in PostgreSQL

1 Answer

EXISTS seems like the best option.

tbl1 is the table with surplus rows in this example:

SELECT *
FROM   tbl1
WHERE  NOT EXISTS (SELECT FROM tbl2 WHERE tbl2.col = tbl1.col);
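
To try this on a smaller scale and see how the planner handles it, here is a minimal sketch. The temporary tables, the row counts, and the use of generate_series() to produce md5 values are assumptions for illustration, not part of the original question. Note that temporary tables named tbl1 and tbl2 hide any regular tables of the same name for the rest of the session.

```sql
-- Hypothetical test data: 100,000 md5 hashes in tbl1,
-- and the same set minus the last 10 values in tbl2.
CREATE TEMP TABLE tbl1 AS
SELECT md5(g::text) AS col FROM generate_series(1, 100000) AS g;

CREATE TEMP TABLE tbl2 AS
SELECT md5(g::text) AS col FROM generate_series(1, 99990) AS g;

-- Collect statistics so the planner has realistic row estimates.
ANALYZE tbl1;
ANALYZE tbl2;

-- Run the anti-join and print the execution plan with actual timings.
EXPLAIN (ANALYZE)
SELECT *
FROM   tbl1
WHERE  NOT EXISTS (SELECT FROM tbl2 WHERE tbl2.col = tbl1.col);
```

The plan usually contains an anti-join node (often a hash anti-join for tables of this shape), but the exact choice depends on statistics, available memory, and server settings.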

If you don't know which table has surplus rows, or if both do, you can either repeat the above query with the table names switched, or:

SELECT *
FROM   tbl1
FULL   OUTER JOIN tbl2 USING (col)
WHERE  tbl2.col IS NULL OR
       tbl1.col IS NULL;
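
With USING (col) and SELECT *, the join column appears only once in the result, so the output does not show which table a row came from. A possible variant, sketched here with hypothetical output column names only_in_tbl1 and only_in_tbl2, joins with ON and selects both sides explicitly. It assumes col is defined NOT NULL in both tables; otherwise rows with a NULL col would also show up as unmatched.

```sql
-- Each result row has exactly one non-null column,
-- indicating which table holds the unmatched value.
SELECT tbl1.col AS only_in_tbl1
     , tbl2.col AS only_in_tbl2
FROM   tbl1
FULL   OUTER JOIN tbl2 ON tbl2.col = tbl1.col
WHERE  tbl1.col IS NULL
   OR  tbl2.col IS NULL;
```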

Overview of basic techniques in a later post:

Select rows which are not present in other table

Aside: The data type uuid is efficient for md5 hashes:

Convert hex in text representation to decimal number
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
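
As a rough sketch of that aside: PostgreSQL accepts a 32-digit hex string without hyphens as uuid input, so an md5 hash stored as text can be cast directly. A uuid occupies 16 bytes, compared to 32 characters plus a length header for the text form. The ALTER TABLE statements below are an assumed migration path, not something the answer prescribes; they rewrite the table and hold an exclusive lock while doing so.

```sql
-- An md5 hash in text form casts directly to uuid.
SELECT md5('some text')::uuid;

-- Hypothetical migration of the existing columns to uuid.
ALTER TABLE tbl1 ALTER COLUMN col TYPE uuid USING col::uuid;
ALTER TABLE tbl2 ALTER COLUMN col TYPE uuid USING col::uuid;
```

The queries above work unchanged after the conversion, since they only compare col values for equality.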

176

answered Sep 19 '22 10:09

Erwin Brandstetter

Tags:

sql

postgresql

left-join

full-outer-join

exists

odiszapc
