Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql UNION takes 10 times as long as running the individual queries

I am trying to get the diff between two nearly identical tables in postgresql. The current query I am running is:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB;

and

SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

Each of the above queries takes about 2 minutes to run (Its a large table)

I wanted to combine the two queries in hopes to save time, so I tried:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

And while it works, it takes 20 minutes to run!!! I would guess that it would at most take 4 minutes, the amount of time to run each query individually.

Is there some extra work UNION is doing that is making it take so long? Or is there any way I can speed this up (with or without the UNION)?

UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running each one on its own, Am I correct in saying that UNION (all) is not going to speed this up at all?

like image 621
lanrat Avatar asked Jun 13 '11 23:06

lanrat


2 Answers

With regards to your "extra work" question. Yes. Union not only combines the two queries but also goes through and removes duplicates. It's the same as using a distinct statement.

For this reason, especially combined with your except statements "union all" would likely be faster.

Read more here: http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html

like image 77
RThomas Avatar answered Oct 25 '22 10:10

RThomas


In addition to combining the results of the first and second query, UNION by default also removes duplicate records. (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work involved in checking for duplicate records between the two queries is probably responsible for the extra time. In this situation there should not be any duplicate records so the extra work looking for duplicates can be avoided by specifying UNION ALL.

SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION ALL
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
like image 20
dave Avatar answered Oct 25 '22 10:10

dave