I have two similar tables in Postgres with just one 32-byte latin field (simple md5 hash). Both tables have ~30,000,000 rows. Tables have little difference (10-1000 rows are different)
Is it possible with Postgres to find a difference between these tables, the result should be 10-1000 rows I described above.
This is not a real task, I just want to know about how PostgreSQL deals with JOIN-like logic.
If there are foreign keys between the tables, then you can find the relationship between them. To do this, you can call \d on a table and see the foreign keys associated with its columns.
Discussion: To calculate the difference between the timestamps in PostgreSQL, simply subtract the start timestamp from the end timestamp. Here, it would be arrival - departure . The difference will be of the type interval , which means you'll see it in days, hours, minutes, and seconds.
EXISTS
seems like the best option.
tbl1
is the table with surplus rows in this example:
SELECT *
FROM tbl1
WHERE NOT EXISTS (SELECT FROM tbl2 WHERE tbl2.col = tbl1.col);
If you don't know which table has surplus rows or both have, you can either repeat the above query after switching table names, or:
SELECT *
FROM tbl1
FULL OUTER JOIN tbl2 USING (col)
WHERE tbl2 col IS NULL OR
tbl1.col IS NULL;
Overview over basic techniques in a later post:
Aside: The data type uuid
is efficient for md5 hashes:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With