I have heard a lot of people over the years say that:
"join" operators are preferred over “NOT EXISTS”
Why?
NOT EXISTS is used with a subquery in the WHERE clause to check if the result of the subquery returns TRUE or FALSE. The Boolean value is then used to narrow down the rows from the outer select statement.
Using Joins Instead of IN or EXISTS An alternative for IN and EXISTS is an INNER JOIN, while a LEFT OUTER JOIN with a WHERE clause checking for NULL values can be used as an alternative for NOT IN and NOT EXISTS.
The most important thing to note about NOT EXISTS and NOT IN is that, unlike EXISTS and IN, they are not equivalent in all cases. Specifically, when NULLs are involved they will return different results. To be totally specific, when the subquery returns even one null, NOT IN will not match any rows.
If so, you should consider using a NOT EXISTS operator instead of NOT IN, or recast the statement as a left outer join.
In MySQL
, Oracle
, SQL Server
and PostgreSQL
, NOT EXISTS
is of the same efficiency or even more efficient than LEFT JOIN / IS NULL
.
While it may seem that "the inner query should be executed for each record from the outer query" (which seems to be bad for NOT EXISTS
and even worse for NOT IN
, since the latter query is not even correlated), it may be optimized just as well as all other queries are optimized, using appropriate anti-join
methods.
In SQL Server
, actually, LEFT JOIN / IS NULL
may be less efficient than NOT EXISTS / NOT IN
in case of unindexed or low cardinality column in the inner table.
It is often heard that MySQL
is "especially bad in treating subqueries".
This roots from the fact that MySQL
is not capable of any join methods other than nested loops, which severely limits its optimization abilities.
The only case when a query would benefit from rewriting subquery as a join would be this:
SELECT *
FROM big_table
WHERE big_table_column IN
(
SELECT small_table_column
FROM small_table
)
small_table
will not be queried completely for each record in big_table
: though it does not seem to be correlated, it will be implicitly correlated by the query optimizer and in fact rewritten to an EXISTS
(using index_subquery
to search for the first much if needed if small_table_column
is indexed)
But big_table
would always be leading, which makes the query complete in big * LOG(small)
rather than small * LOG(big)
reads.
This could be rewritten as
SELECT DISTINCT bt.*
FROM small_table st
JOIN big_table bt
ON bt.big_table_column = st.small_table_column
However, this won't improve NOT IN
(as opposed to IN
). In MySQL
, NOT EXISTS
and LEFT JOIN / IS NULL
are almost the same, since with nested loops the left table should always be leading in a LEFT JOIN
.
You may want to read these articles:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With