Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is order preserved after UNION in PostgreSQL?

Here is the code:

CREATE TABLE audit_trail (
      old_email TEXT NOT NULL,
      new_email TEXT NOT NULL
);

INSERT INTO audit_trail(old_email, new_email)
  VALUES ('[email protected]', '[email protected]'),
         ('[email protected]', '[email protected]'),
         ('[email protected]', '[email protected]'),
         ('[email protected]', '[email protected]'),
         ('[email protected]', '[email protected]');


WITH RECURSIVE all_emails AS (
  SELECT  old_email, new_email
    FROM audit_trail
    WHERE old_email = '[email protected]'
  UNION
  SELECT at.old_email, at.new_email
    FROM audit_trail at
    JOIN all_emails a
      ON (at.old_email = a.new_email)
)
SELECT * FROM all_emails;

        old_email         |        new_email
--------------------------+--------------------------
 [email protected]     | [email protected]
 [email protected]     | [email protected]
 [email protected] | [email protected]
(3 rows)

select old_email, new_email into iter1
from audit_trail where old_email = '[email protected]';
select * from iter1;
--       old_email       |      new_email
-- ----------------------+----------------------
--  [email protected] | [email protected]
-- (1 row)

select a.old_email, a.new_email into iter2
from audit_trail a join iter1 b on (a.old_email = b.new_email);
select * from iter2;
--       old_email       |        new_email
-- ----------------------+--------------------------
--  [email protected] | [email protected]
-- (1 row)

select * from iter1 union select * from iter2;
--       old_email       |        new_email
-- ----------------------+--------------------------
--  [email protected] | [email protected]
--  [email protected] | [email protected]
-- (2 rows)

As you can see the recursive code gives the result in right order, but the non-recursive code does not.
They both use union, why the difference?

like image 286
qed Avatar asked Aug 12 '15 21:08

qed


1 Answers

Basically, your query is incorrect to begin with. Use UNION ALL, not UNION or you would incorrectly remove duplicate entries. (There is nothing to say the trail cannot switch back and forth between the same emails.)

The Postgres implementation for UNION ALL returns values in the sequence as appended - as long as you do not add ORDER BY at the end or do anything else with the result.
Be aware though, that each SELECT returns rows in arbitrary order unless ORDER BY is appended. There is no natural order in tables.

The same is not true for UNION, which has to process all rows to remove possible duplicates. There are various ways to determine duplicates, the resulting order of rows depends on the chosen algorithm and is implementation-dependent and completely unreliable - unless, again, ORDER BY is appended.

So use instead:

SELECT * FROM iter1
UNION ALL  -- union all!
SELECT * FROM iter2;

To get a reliable sort order, and "simulate the record of growth", you can track levels like this:

WITH RECURSIVE all_emails AS (
   SELECT  *, 1 AS lvl
   FROM    audit_trail
   WHERE   old_email = '[email protected]'

   UNION ALL  -- union all!
   SELECT t.*, a.lvl + 1
   FROM   all_emails  a
   JOIN   audit_trail t ON t.old_email = a.new_email
)
TABLE  all_emails
ORDER  BY lvl;

db<>fiddle here
Old sqlfiddle

Aside: if old_email is not defined UNIQUE in some way, you can get multiple trails. You would need a unique column (or combination of columns) to keep it unambiguous. If all else fails you can (ab-)use the internal tuple ID ctid for the purpose of telling trails apart. But you should rather use your own columns. (Added example in the fiddle.)

  • In-order sequence generation

Consider:

  • How to return records in correct order in PostgreSQL
like image 83
Erwin Brandstetter Avatar answered Sep 19 '22 05:09

Erwin Brandstetter