To find all the changes between two databases, I am left joining the tables on the pk and using a date_modified field to choose the latest record. Since the tables have the same schema, would using EXCEPT increase performance? I would like to rewrite the query with EXCEPT, but I'm not sure whether the implementation of EXCEPT would outperform a JOIN in every case. Hopefully someone has a more technical explanation of when to use EXCEPT.
An OUTER JOIN includes the matching rows as well as the non-matching rows from the preserved table(s), padded with NULLs on the other side. When the tables contain a large number of rows and there is an index to use, an INNER JOIN is generally faster than an OUTER JOIN.
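A small illustration of the difference in result sets (not speed), assuming SQLite through Python's sqlite3 module as a stand-in engine; the tables a and b are hypothetical.

```python
import sqlite3

# Hypothetical tiny tables, in memory, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (a_id INTEGER, label TEXT);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO b (a_id, label) VALUES (1, 'x'), (3, 'y');
""")

# INNER JOIN keeps only the matching rows.
inner = conn.execute(
    "SELECT a.id, b.label FROM a JOIN b ON b.a_id = a.id ORDER BY a.id"
).fetchall()

# LEFT OUTER JOIN keeps every row of a; unmatched rows get NULL (None) for b's columns.
left = conn.execute(
    "SELECT a.id, b.label FROM a LEFT JOIN b ON b.a_id = a.id ORDER BY a.id"
).fetchall()

print(inner)  # [(1, 'x'), (3, 'y')]
print(left)   # [(1, 'x'), (2, None), (3, 'y')]
```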
1. Always reduce the data as much as possible before any joins.
2. When joining, make sure the smaller tables are on the left side of the join syntax. In Vertica this makes that data set be held in memory / broadcast to all the nodes, which makes the join faster.
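A minimal sketch of the first point, assuming SQLite via Python's sqlite3 module as a stand-in (no Vertica cluster is involved, so nothing is broadcast here). Many optimizers push such a filter down on their own, so both forms may well get the same plan; the rewrite only makes the reduction explicit. The rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE shipments (id INTEGER PRIMARY KEY,
                            supplier_id INTEGER NOT NULL,
                            part_id INTEGER NOT NULL);
    INSERT INTO suppliers VALUES (1, 'London'), (2, 'Paris'), (3, 'London');
    INSERT INTO shipments VALUES (10, 1, 100), (11, 2, 101), (12, 3, 102), (13, 2, 100);
""")

# Join everything first, filter afterwards.
join_then_filter = conn.execute("""
    SELECT sh.part_id
    FROM shipments AS sh
    JOIN suppliers AS s ON s.id = sh.supplier_id
    WHERE s.city = 'London'
    ORDER BY sh.part_id
""").fetchall()

# Reduce suppliers to the London rows first, then join the (smaller) result.
filter_then_join = conn.execute("""
    WITH london AS (SELECT id FROM suppliers WHERE city = 'London')
    SELECT sh.part_id
    FROM london
    JOIN shipments AS sh ON sh.supplier_id = london.id
    ORDER BY sh.part_id
""").fetchall()

print(join_then_filter)  # [(100,), (102,)]
print(filter_then_join)  # [(100,), (102,)]
```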
EXCEPT is a set operator that eliminates duplicates. LEFT JOIN is a type of join that can actually produce duplicates. It is not unusual in SQL for two different constructs to produce the same result set for a given set of input data.
But always go for NOT EXISTS: most of the time it will perform much better, and the intent is clearer.
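To make the duplicate point concrete, here is a sketch assuming SQLite through Python's sqlite3 module and two hypothetical single-column tables. The three forms agree only while the left table has no duplicate values: EXCEPT collapses them, while the LEFT JOIN anti-join and NOT EXISTS keep them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (v INTEGER);
    CREATE TABLE b (v INTEGER);
    INSERT INTO a VALUES (1), (1), (2), (3);
    INSERT INTO b VALUES (3);
""")

# EXCEPT is a set operation: duplicate rows are removed from the result.
except_rows = conn.execute(
    "SELECT v FROM a EXCEPT SELECT v FROM b ORDER BY v"
).fetchall()

# LEFT JOIN ... IS NULL (anti-join) keeps a's duplicate rows.
left_join_rows = conn.execute("""
    SELECT a.v FROM a LEFT JOIN b ON b.v = a.v
    WHERE b.v IS NULL
    ORDER BY a.v
""").fetchall()

# NOT EXISTS states the intent ("rows of a with no match in b") and also keeps duplicates.
not_exists_rows = conn.execute("""
    SELECT a.v FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.v = a.v)
    ORDER BY a.v
""").fetchall()

print(except_rows)      # [(1,), (2,)]
print(left_join_rows)   # [(1,), (1,), (2,)]
print(not_exists_rows)  # [(1,), (1,), (2,)]
```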
There is no way anyone can tell you that EXCEPT will always or never outperform an equivalent OUTER JOIN. The optimizer will choose an appropriate execution plan regardless of how you write your intent.
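Rather than guessing, you can compare the plans the optimizer actually picks. A sketch assuming SQLite via Python's sqlite3 module and hypothetical tables old_t/new_t; the plan text varies between SQLite versions, so only the mechanism is shown (in PostgreSQL the analogous tool is EXPLAIN, or EXPLAIN ANALYZE to also get timings).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_t (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE new_t (id INTEGER PRIMARY KEY, val TEXT);
""")

queries = {
    "EXCEPT": "SELECT id, val FROM new_t EXCEPT SELECT id, val FROM old_t",
    "NOT EXISTS": """
        SELECT n.id, n.val FROM new_t AS n
        WHERE NOT EXISTS (
            SELECT 1 FROM old_t AS o WHERE o.id = n.id AND o.val IS n.val
        )
    """,
}

plans = {}
for name, sql in queries.items():
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the human-readable detail.
    plans[name] = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name, plans[name])
```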
That said, here is my guideline:
Use EXCEPT when at least one of the following is true:
And BOTH of the following are true:
It is important to note that it can be a challenge to write an equivalent EXCEPT query as the JOIN becomes more complex and/or you are relying on duplicates in some of the columns but not others. Writing a NOT EXISTS equivalent, while slightly less readable than EXCEPT, should be far easier to accomplish, and will often lead to a better plan (but note that I would never say ALWAYS or NEVER, except in the way I just did).
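A sketch of the questioner's scenario (two copies of a table with the same schema), assuming SQLite via Python's sqlite3 module and hypothetical tables old_t/new_t. It shows two things to watch when translating EXCEPT into NOT EXISTS: EXCEPT treats two NULLs as equal, so the correlated subquery needs a null-safe comparison (IS in SQLite, IS NOT DISTINCT FROM in PostgreSQL). Also, NOT EXISTS can correlate on only some columns, which EXCEPT cannot express directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_t (id INTEGER PRIMARY KEY, name TEXT, date_modified TEXT);
    CREATE TABLE new_t (id INTEGER PRIMARY KEY, name TEXT, date_modified TEXT);
    INSERT INTO old_t VALUES (1, 'a', '2015-01-01'), (2, NULL, '2015-01-01'),
                             (3, 'c', '2015-01-01');
    INSERT INTO new_t VALUES (1, 'a', '2015-01-01'), (2, NULL, '2015-01-01'),
                             (3, 'C', '2015-02-01'), (4, 'd', '2015-02-01');
""")

# EXCEPT compares whole rows and treats NULL vs NULL as "not distinct".
except_rows = conn.execute("""
    SELECT id, name, date_modified FROM new_t
    EXCEPT
    SELECT id, name, date_modified FROM old_t
    ORDER BY id
""").fetchall()

# Equivalent NOT EXISTS: every column spelled out, using IS for null-safe equality.
not_exists_rows = conn.execute("""
    SELECT n.id, n.name, n.date_modified FROM new_t AS n
    WHERE NOT EXISTS (
        SELECT 1 FROM old_t AS o
        WHERE o.id = n.id
          AND o.name IS n.name
          AND o.date_modified IS n.date_modified
    )
    ORDER BY n.id
""").fetchall()

# With plain "=", row 2 is wrongly reported as changed, because NULL = NULL is not true.
naive_rows = conn.execute("""
    SELECT n.id FROM new_t AS n
    WHERE NOT EXISTS (
        SELECT 1 FROM old_t AS o
        WHERE o.id = n.id AND o.name = n.name AND o.date_modified = n.date_modified
    )
    ORDER BY n.id
""").fetchall()

# Correlating on the pk only (a subset of the columns) finds newly inserted rows.
inserted_rows = conn.execute("""
    SELECT n.id, n.name FROM new_t AS n
    WHERE NOT EXISTS (SELECT 1 FROM old_t AS o WHERE o.id = n.id)
    ORDER BY n.id
""").fetchall()

print(except_rows)      # [(3, 'C', '2015-02-01'), (4, 'd', '2015-02-01')]
print(not_exists_rows)  # [(3, 'C', '2015-02-01'), (4, 'd', '2015-02-01')]
print(naive_rows)       # [(2,), (3,), (4,)]
print(inserted_rows)    # [(4, 'd')]
```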
In this blog post I demonstrate at least one case where EXCEPT is outperformed both by a properly constructed LEFT OUTER JOIN and, of course, by an equivalent NOT EXISTS variation.
In the following example (PostgreSQL 9.4.3), the LEFT JOIN takes about a third of the time of EXCEPT: 1136 ms versus 3328 ms, roughly 66% less execution time.
Example:
There are three tables: suppliers, parts and shipments.
We need to get all parts not supplied by any supplier in London.
Database (has indexes on all involved columns):
CREATE TABLE suppliers (
id bigint primary key,
city character varying NOT NULL
);
CREATE TABLE parts (
id bigint primary key,
name character varying NOT NULL
);
CREATE TABLE shipments (
id bigint primary key,
supplier_id bigint NOT NULL,
part_id bigint NOT NULL
);
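The answer says every involved column is indexed but does not list the indexes. A sketch of what that could look like, assuming indexes on shipments.part_id, shipments.supplier_id and suppliers.city (index names are hypothetical), run in SQLite via Python's sqlite3 module with INTEGER/TEXT in place of bigint/character varying. The CREATE INDEX statements use the same syntax in PostgreSQL; the primary keys are already indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (
        id   INTEGER PRIMARY KEY,
        city TEXT NOT NULL
    );
    CREATE TABLE parts (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE shipments (
        id          INTEGER PRIMARY KEY,
        supplier_id INTEGER NOT NULL,
        part_id     INTEGER NOT NULL
    );
    -- Hypothetical secondary indexes on the join and filter columns.
    CREATE INDEX shipments_part_id_idx     ON shipments (part_id);
    CREATE INDEX shipments_supplier_id_idx ON shipments (supplier_id);
    CREATE INDEX suppliers_city_idx        ON suppliers (city);
""")

# List the user-created indexes (SQLite's internal autoindexes start with "sqlite_").
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name NOT LIKE 'sqlite_%'"
    )
)
print(index_names)
# ['shipments_part_id_idx', 'shipments_supplier_id_idx', 'suppliers_city_idx']
```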
Records count:
db=# SELECT COUNT(*) FROM suppliers;
count
---------
1281280
(1 row)
db=# SELECT COUNT(*) FROM parts;
count
---------
1280000
(1 row)
db=# SELECT COUNT(*) FROM shipments;
count
---------
1760161
(1 row)
Query using EXCEPT:
SELECT parts.*
FROM parts
EXCEPT
SELECT parts.*
FROM parts
LEFT JOIN shipments
ON (parts.id = shipments.part_id)
LEFT JOIN suppliers
ON (shipments.supplier_id = suppliers.id)
WHERE suppliers.city = 'London'
;
-- Execution time: 3327.728 ms
Query using a LEFT JOIN against the table returned by a subquery:
SELECT parts.*
FROM parts
LEFT JOIN (
SELECT parts.id
FROM parts
LEFT JOIN shipments
ON (parts.id = shipments.part_id)
LEFT JOIN suppliers
ON (shipments.supplier_id = suppliers.id)
WHERE suppliers.city = 'London'
) AS subquery_tbl
ON (parts.id = subquery_tbl.id)
WHERE subquery_tbl.id IS NULL
;
-- Execution time: 1136.393 ms
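Only EXCEPT and LEFT JOIN were timed above. A NOT EXISTS version of the same question, as recommended earlier in the thread, might look like the query below; it was not benchmarked, so no performance claim is made for it. The sketch assumes SQLite via Python's sqlite3 module and a handful of made-up rows, and only checks that all three queries return the same parts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE shipments (id INTEGER PRIMARY KEY,
                            supplier_id INTEGER NOT NULL,
                            part_id INTEGER NOT NULL);
    INSERT INTO suppliers VALUES (1, 'London'), (2, 'Paris');
    INSERT INTO parts VALUES (100, 'bolt'), (101, 'nut'), (102, 'screw');
    -- Made-up data: part 100 ships from London (twice) and Paris,
    -- part 101 only from Paris, part 102 not at all.
    INSERT INTO shipments VALUES (1, 1, 100), (2, 1, 100), (3, 2, 100), (4, 2, 101);
""")

except_sql = """
    SELECT parts.* FROM parts
    EXCEPT
    SELECT parts.* FROM parts
    LEFT JOIN shipments ON (parts.id = shipments.part_id)
    LEFT JOIN suppliers ON (shipments.supplier_id = suppliers.id)
    WHERE suppliers.city = 'London'
"""

left_join_sql = """
    SELECT parts.* FROM parts
    LEFT JOIN (
        SELECT parts.id FROM parts
        LEFT JOIN shipments ON (parts.id = shipments.part_id)
        LEFT JOIN suppliers ON (shipments.supplier_id = suppliers.id)
        WHERE suppliers.city = 'London'
    ) AS subquery_tbl ON (parts.id = subquery_tbl.id)
    WHERE subquery_tbl.id IS NULL
"""

# NOT EXISTS: keep a part if none of its shipments comes from a London supplier.
not_exists_sql = """
    SELECT parts.* FROM parts
    WHERE NOT EXISTS (
        SELECT 1
        FROM shipments
        JOIN suppliers ON (shipments.supplier_id = suppliers.id)
        WHERE shipments.part_id = parts.id
          AND suppliers.city = 'London'
    )
"""

results = {
    name: sorted(conn.execute(sql).fetchall())
    for name, sql in [("EXCEPT", except_sql),
                      ("LEFT JOIN", left_join_sql),
                      ("NOT EXISTS", not_exists_sql)]
}
print(results["NOT EXISTS"])  # [(101, 'nut'), (102, 'screw')]
```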