
Does EXCEPT execute faster than a JOIN when the table columns are the same?

To find all the changes between two databases, I am left joining the tables on the primary key (pk) and using a date_modified field to choose the latest record. Will using EXCEPT improve performance, since the tables have the same schema? I would like to rewrite the query with EXCEPT, but I'm not sure whether the implementation of EXCEPT would outperform a JOIN in every case. Hopefully someone has a more technical explanation of when to use EXCEPT.
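A minimal sketch of the two approaches the question describes, run against Python's built-in `sqlite3` module rather than the asker's (unstated) database engine, so it says nothing about performance, only about results. The table names `source_rows` and `target_rows` and the column `payload` are hypothetical stand-ins for one table as it exists in the two databases. Note that the forms agree here only because every edit bumps `date_modified`: EXCEPT compares every selected column, so it would also report a row whose payload changed without a date change, or a row where the target copy is newer, and the JOIN would not.

```python
import sqlite3

# Hypothetical stand-ins for the same table in the two databases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_rows (id INTEGER PRIMARY KEY, payload TEXT NOT NULL,
                          date_modified TEXT NOT NULL);
CREATE TABLE target_rows (id INTEGER PRIMARY KEY, payload TEXT NOT NULL,
                          date_modified TEXT NOT NULL);
INSERT INTO source_rows VALUES (1, 'unchanged', '2013-01-01'),
                               (2, 'edited',    '2013-02-01'),
                               (3, 'new row',   '2013-02-02');
INSERT INTO target_rows VALUES (1, 'unchanged', '2013-01-01'),
                               (2, 'original',  '2013-01-15');
""")

# JOIN form: match on the primary key and keep rows that are new or newer.
# ISO-8601 date strings compare correctly as text.
JOIN_QUERY = """
SELECT s.id, s.payload, s.date_modified
  FROM source_rows AS s
  LEFT JOIN target_rows AS t ON t.id = s.id
 WHERE t.id IS NULL OR s.date_modified > t.date_modified
"""

# EXCEPT form: compares every selected column, so no key or date logic is needed.
EXCEPT_QUERY = """
SELECT id, payload, date_modified FROM source_rows
EXCEPT
SELECT id, payload, date_modified FROM target_rows
"""

join_rows = sorted(conn.execute(JOIN_QUERY).fetchall())
except_rows = sorted(conn.execute(EXCEPT_QUERY).fetchall())
print(join_rows == except_rows)  # True for this data
```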

Asked Feb 04 '13 18:02 by Joseph Kiskaden


People also ask

Which joins are the fastest?

An OUTER JOIN includes the matching rows as well as some of the non-matching rows between the two tables. When the tables contain a large number of rows and there is an index to use, an INNER JOIN is generally faster than an OUTER JOIN.

How do I speed up a join query?

  1. Always reduce the data as much as possible before any joins.
  2. When joining, make sure the smaller tables are on the left side of the join syntax. In Vertica, this lets that data set be held in memory / broadcast to all the nodes, which makes the join faster.

What is difference between Except and left join in SQL?

EXCEPT is a set operator that eliminates duplicates. LEFT JOIN is a type of join that can actually produce duplicates. It is not unusual in SQL for two different constructs to produce the same result set for a given set of input data.
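A small sketch of both behaviours, using Python's built-in `sqlite3` module with a toy `parts`/`shipments` pair (the data is made up for illustration). The LEFT JOIN repeats part 1 because it matches two shipments, while EXCEPT collapses the two parts that happen to share the name 'nut' into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE shipments (id INTEGER PRIMARY KEY, part_id INTEGER NOT NULL);
INSERT INTO parts VALUES (1, 'bolt'), (2, 'nut'), (3, 'nut');
INSERT INTO shipments VALUES (10, 1), (11, 1);  -- part 1 shipped twice
""")

# LEFT JOIN: part 1 matches two shipments, so 'bolt' appears twice.
left_join = conn.execute("""
SELECT p.name FROM parts AS p
LEFT JOIN shipments AS s ON s.part_id = p.id
ORDER BY p.id
""").fetchall()

# EXCEPT: a set operator, so the two 'nut' rows collapse into one.
except_rows = conn.execute("""
SELECT name FROM parts
EXCEPT
SELECT name FROM parts WHERE id = 1
""").fetchall()

print(left_join)    # [('bolt',), ('bolt',), ('nut',), ('nut',)]
print(except_rows)  # [('nut',)]
```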

Which is faster Left join or not exists?

Generally, go for NOT EXISTS: most of the time it will perform much better, and the intent is clearer when using NOT EXISTS.
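For comparison, here is the same anti-join ("parts with no shipment") written both ways, run through Python's built-in `sqlite3` module on made-up data. It only shows that the two forms return the same rows; which one is faster depends on the engine and its plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE shipments (id INTEGER PRIMARY KEY, part_id INTEGER NOT NULL);
INSERT INTO parts VALUES (1, 'bolt'), (2, 'nut'), (3, 'washer');
INSERT INTO shipments VALUES (10, 1), (11, 1), (12, 3);
""")

# Anti-join as LEFT JOIN ... IS NULL: keep parts whose join found no shipment.
left_join_anti = conn.execute("""
SELECT p.id, p.name FROM parts AS p
LEFT JOIN shipments AS s ON s.part_id = p.id
WHERE s.id IS NULL
ORDER BY p.id
""").fetchall()

# The same anti-join with NOT EXISTS: the intent is stated directly.
not_exists = conn.execute("""
SELECT p.id, p.name FROM parts AS p
WHERE NOT EXISTS (SELECT 1 FROM shipments AS s WHERE s.part_id = p.id)
ORDER BY p.id
""").fetchall()

print(left_join_anti == not_exists)  # True
```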


2 Answers

There is no way anyone can tell you that EXCEPT will always or never outperform an equivalent OUTER JOIN. The optimizer will choose an appropriate execution plan regardless of how you write your intent.

That said, here is my guideline:


Use EXCEPT when at least one of the following is true:

  1. The query is more readable (this will almost always be true).
  2. Performance is improved.

And BOTH of the following are true:

  1. The query produces semantically identical results, and you can demonstrate this through sufficient regression testing, including all edge cases.
  2. Performance is not degraded (again, in all edge cases, as well as environmental changes such as clearing buffer pool, updating statistics, clearing plan cache, and restarting the service).
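One way to do the regression testing mentioned above is to run both queries against the same data and compare the results as multisets. The sketch below uses Python's built-in `sqlite3` module; `same_results` is a hypothetical helper, and tables `a` and `b` are made up. It catches a classic edge case: EXCEPT treats two NULLs as duplicates, but a join condition using `=` never matches NULL, so an apparently equivalent LEFT JOIN reports the NULL row as missing. The fix uses SQLite's null-safe `IS` operator; other engines spell this differently (for example `IS NOT DISTINCT FROM`).

```python
import sqlite3
from collections import Counter


def same_results(conn, query_a, query_b):
    """Hypothetical regression helper: True when both queries return the same
    rows with the same multiplicities, ignoring row order."""
    return (Counter(conn.execute(query_a).fetchall())
            == Counter(conn.execute(query_b).fetchall()))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, note TEXT);
CREATE TABLE b (id INTEGER, note TEXT);
INSERT INTO a VALUES (1, 'x'), (2, NULL);
INSERT INTO b VALUES (1, 'x'), (2, NULL);
""")

# EXCEPT treats the two NULLs as duplicates, so it returns no rows here.
EXCEPT_Q = "SELECT id, note FROM a EXCEPT SELECT id, note FROM b"

# Looks equivalent, but NULL = NULL is not true, so (2, NULL) finds no match
# and is wrongly reported as missing from b.
JOIN_Q = """
SELECT a.id, a.note FROM a
LEFT JOIN b ON b.id = a.id AND b.note = a.note
WHERE b.id IS NULL
"""

# SQLite's IS operator compares NULLs as equal, which restores equivalence.
JOIN_Q_FIXED = JOIN_Q.replace("b.note = a.note", "b.note IS a.note")

print(same_results(conn, EXCEPT_Q, JOIN_Q))        # False
print(same_results(conn, EXCEPT_Q, JOIN_Q_FIXED))  # True
```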

It is important to note that it can be a challenge to write an equivalent EXCEPT query as the JOIN becomes more complex and/or you are relying on duplicates in part of the columns but not others. Writing a NOT EXISTS equivalent, while slightly less readable than EXCEPT, should be far easier to accomplish, and will often lead to a better plan (but note that I would never say ALWAYS or NEVER, except in the way I just did).
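A sketch of the "duplicates in part of the columns" problem, using Python's built-in `sqlite3` module and hypothetical `orders`/`blocked` tables. Orders 1 and 2 differ only in `order_id`, which is not selected. NOT EXISTS filters on `customer` alone and keeps one output row per order, while EXCEPT compares the selected columns and collapses the two orders into a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
CREATE TABLE blocked (customer TEXT);
INSERT INTO orders  VALUES (1, 'ann', 5), (2, 'ann', 5), (3, 'bob', 7);
INSERT INTO blocked VALUES ('bob');
""")

# NOT EXISTS: one output row per surviving order, duplicates preserved.
not_exists = conn.execute("""
SELECT o.customer, o.amount FROM orders AS o
WHERE NOT EXISTS (SELECT 1 FROM blocked AS b WHERE b.customer = o.customer)
""").fetchall()

# EXCEPT: both sides must list the same columns, and the result is
# de-duplicated, so orders 1 and 2 collapse into one row.
except_rows = conn.execute("""
SELECT customer, amount FROM orders
EXCEPT
SELECT o.customer, o.amount FROM orders AS o
JOIN blocked AS b ON b.customer = o.customer
""").fetchall()

print(sorted(not_exists))  # [('ann', 5), ('ann', 5)]
print(except_rows)         # [('ann', 5)]
```

Adding `order_id` to both EXCEPT column lists would keep the two rows apart, but it also changes the output columns, which is exactly why matching a JOIN with EXCEPT gets awkward.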

In this blog post I demonstrate at least one case where EXCEPT is outperformed by both a properly constructed LEFT OUTER JOIN and of course by an equivalent NOT EXISTS variation.

Answered Sep 29 '22 11:09 by Aaron Bertrand


In the following example (PostgreSQL 9.4.3), the LEFT JOIN query takes about a third of the time of the EXCEPT query (1136 ms vs. 3328 ms).

Example:

There are three tables: suppliers, parts, and shipments. We need to get all parts that are not supplied by any supplier in London.

Schema (there are indexes on all the columns involved):

CREATE TABLE suppliers (
  id     bigint    primary key,
  city   character varying NOT NULL
);

CREATE TABLE parts (
  id     bigint    primary key,
  name   character varying NOT NULL
);

CREATE TABLE shipments (
  id          bigint primary key,
  supplier_id bigint NOT NULL,
  part_id     bigint NOT NULL
);

Row counts:

db=# SELECT COUNT(*) FROM suppliers;
  count
---------
 1281280
(1 row)

db=# SELECT COUNT(*) FROM parts;
  count
---------
 1280000
(1 row)

db=# SELECT COUNT(*) FROM shipments;
  count
---------
 1760161
(1 row)

Query using EXCEPT.

SELECT parts.*
  FROM parts

EXCEPT

SELECT parts.*
  FROM parts
  LEFT JOIN shipments
    ON (parts.id = shipments.part_id)
  LEFT JOIN suppliers
    ON (shipments.supplier_id = suppliers.id)
 WHERE suppliers.city = 'London'
;

-- Execution time: 3327.728 ms

Query using a LEFT JOIN against the table returned by a subquery.

SELECT parts.*
  FROM parts
  LEFT JOIN (
    SELECT parts.id
      FROM parts
      LEFT JOIN shipments
        ON (parts.id = shipments.part_id)
      LEFT JOIN suppliers
        ON (shipments.supplier_id = suppliers.id)
     WHERE suppliers.city = 'London'
  ) AS subquery_tbl
  ON (parts.id = subquery_tbl.id)
WHERE subquery_tbl.id IS NULL
;

-- Execution time: 1136.393 ms
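The timings above come from PostgreSQL with about a million rows per table and will not carry over elsewhere. As a quick correctness check, this sketch runs both queries from this answer, plus a NOT EXISTS variation in the spirit of the other answer (added here, not part of the original), against a tiny made-up dataset using Python's built-in `sqlite3` module. Column types are simplified to SQLite's `INTEGER`/`TEXT`. It only confirms that the three forms return the same parts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
CREATE TABLE parts     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE shipments (id INTEGER PRIMARY KEY,
                        supplier_id INTEGER NOT NULL,
                        part_id     INTEGER NOT NULL);
INSERT INTO suppliers VALUES (1, 'London'), (2, 'Paris');
INSERT INTO parts     VALUES (1, 'bolt'), (2, 'nut'), (3, 'screw'), (4, 'cam');
-- bolt: London only; nut: Paris only; screw: both cities; cam: never shipped
INSERT INTO shipments VALUES (1, 1, 1), (2, 2, 2), (3, 1, 3), (4, 2, 3);
""")

# The answer's EXCEPT query.
EXCEPT_Q = """
SELECT parts.* FROM parts
EXCEPT
SELECT parts.* FROM parts
  LEFT JOIN shipments ON parts.id = shipments.part_id
  LEFT JOIN suppliers ON shipments.supplier_id = suppliers.id
 WHERE suppliers.city = 'London'
"""

# The answer's LEFT JOIN against a derived table.
DERIVED_Q = """
SELECT parts.* FROM parts
  LEFT JOIN (
    SELECT parts.id FROM parts
      LEFT JOIN shipments ON parts.id = shipments.part_id
      LEFT JOIN suppliers ON shipments.supplier_id = suppliers.id
     WHERE suppliers.city = 'London'
  ) AS subquery_tbl ON parts.id = subquery_tbl.id
 WHERE subquery_tbl.id IS NULL
"""

# Added NOT EXISTS variation (not part of the original answer).
NOT_EXISTS_Q = """
SELECT parts.* FROM parts
 WHERE NOT EXISTS (
   SELECT 1 FROM shipments
     JOIN suppliers ON suppliers.id = shipments.supplier_id
    WHERE shipments.part_id = parts.id
      AND suppliers.city = 'London'
 )
"""

results = {name: sorted(conn.execute(q).fetchall())
           for name, q in [("except", EXCEPT_Q),
                           ("derived", DERIVED_Q),
                           ("not_exists", NOT_EXISTS_Q)]}
print(results["except"])  # [(2, 'nut'), (4, 'cam')]
```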
Answered Sep 29 '22 11:09 by vrybas