
Find difference between two big tables in PostgreSQL

I have two similar tables in Postgres, each with just one 32-byte Latin-character field (a simple md5 hash). Both tables have ~30,000,000 rows. The tables differ only slightly (10-1000 rows are different).

Is it possible with Postgres to find the difference between these tables? The result should be the 10-1000 rows I described above.

This is not a real task; I just want to know how PostgreSQL deals with JOIN-like logic.

odiszapc asked Mar 11 '13 02:03



1 Answer

EXISTS seems like the best option.

tbl1 is the table with surplus rows in this example:

SELECT *
FROM   tbl1
WHERE  NOT EXISTS (SELECT FROM tbl2 WHERE tbl2.col = tbl1.col);
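
Since the question is about how PostgreSQL handles JOIN-like logic, the plan for this query can be inspected with EXPLAIN. This is a minimal sketch, assuming the tbl1, tbl2 and col names used above. The planner can typically execute NOT EXISTS as an anti-join, but the actual plan depends on statistics, indexes and settings:

```sql
-- EXPLAIN alone shows the estimated plan without running the query.
EXPLAIN
SELECT *
FROM   tbl1
WHERE  NOT EXISTS (SELECT FROM tbl2 WHERE tbl2.col = tbl1.col);

-- With ANALYZE the query is actually executed, so on ~30,000,000 rows
-- this takes as long as the real query. BUFFERS adds I/O statistics.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   tbl1
WHERE  NOT EXISTS (SELECT FROM tbl2 WHERE tbl2.col = tbl1.col);
```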

If you don't know which table has surplus rows or both have, you can either repeat the above query after switching table names, or:

SELECT *
FROM   tbl1
FULL   OUTER JOIN tbl2 USING (col)
WHERE  tbl2.col IS NULL OR
       tbl1.col IS NULL;
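
The other option mentioned above, repeating the NOT EXISTS query with the table names switched, can also be written as a single statement. This is a sketch under the same assumptions (tbl1, tbl2, col); the source_table label column is added here for illustration only and is not part of the original tables:

```sql
-- Rows that exist only in tbl1
SELECT 'tbl1' AS source_table, t1.col
FROM   tbl1 t1
WHERE  NOT EXISTS (SELECT FROM tbl2 t2 WHERE t2.col = t1.col)

UNION ALL

-- Rows that exist only in tbl2
SELECT 'tbl2' AS source_table, t2.col
FROM   tbl2 t2
WHERE  NOT EXISTS (SELECT FROM tbl1 t1 WHERE t1.col = t2.col);
```

Unlike the FULL OUTER JOIN variant, each result row states which table it came from.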

An overview of basic techniques is in a later post:

  • Select rows which are not present in other table

Aside: The data type uuid is efficient for md5 hashes:

  • Convert hex in text representation to decimal number
  • Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
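
To illustrate the uuid aside: an md5 hash is 32 hex digits, which is exactly the 128 bits a uuid holds, so the text form can be cast directly. This is a minimal sketch; the literal 'some value' is a placeholder, and the ALTER TABLE statement assumes that every existing value in col is a valid 32-digit hex string:

```sql
-- md5() returns 32 hex digits as text; cast to uuid, the value takes 16 bytes.
SELECT md5('some value')::uuid AS hash_as_uuid;

-- Hypothetical conversion of the existing column. This rewrites the whole
-- table and fails if any value is not valid hex of the right length.
ALTER TABLE tbl1
    ALTER COLUMN col TYPE uuid USING col::text::uuid;
```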
Erwin Brandstetter answered Sep 19 '22 10:09