Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is using “NOT EXISTS” considered to be bad SQL practise?

I have heard a lot of people over the years say that:

"join" operators are preferred over “NOT EXISTS”

Why?

like image 469
Ian Ringrose Avatar asked Jul 21 '11 14:07

Ian Ringrose


People also ask

Can we use not EXISTS in SQL?

NOT EXISTS is used with a subquery in the WHERE clause to check if the result of the subquery returns TRUE or FALSE. The Boolean value is then used to narrow down the rows from the outer select statement.

What to use instead of not EXISTS?

Using Joins Instead of IN or EXISTS An alternative for IN and EXISTS is an INNER JOIN, while a LEFT OUTER JOIN with a WHERE clause checking for NULL values can be used as an alternative for NOT IN and NOT EXISTS.

When to use not in and not EXISTS in SQL?

The most important thing to note about NOT EXISTS and NOT IN is that, unlike EXISTS and IN, they are not equivalent in all cases. Specifically, when NULLs are involved they will return different results. To be totally specific, when the subquery returns even one null, NOT IN will not match any rows.

What can I use instead of not in SQL?

If so, you should consider using a NOT EXISTS operator instead of NOT IN, or recast the statement as a left outer join.


1 Answers

In MySQL, Oracle, SQL Server and PostgreSQL, NOT EXISTS is of the same efficiency or even more efficient than LEFT JOIN / IS NULL.

While it may seem that "the inner query should be executed for each record from the outer query" (which seems to be bad for NOT EXISTS and even worse for NOT IN, since the latter query is not even correlated), it may be optimized just as well as all other queries are optimized, using appropriate anti-join methods.

In SQL Server, actually, LEFT JOIN / IS NULL may be less efficient than NOT EXISTS / NOT IN in case of unindexed or low cardinality column in the inner table.

It is often heard that MySQL is "especially bad in treating subqueries".

This roots from the fact that MySQL is not capable of any join methods other than nested loops, which severely limits its optimization abilities.

The only case when a query would benefit from rewriting subquery as a join would be this:

SELECT  *
FROM    big_table
WHERE   big_table_column IN
        (
        SELECT  small_table_column
        FROM    small_table
        )

small_table will not be queried completely for each record in big_table: though it does not seem to be correlated, it will be implicitly correlated by the query optimizer and in fact rewritten to an EXISTS (using index_subquery to search for the first much if needed if small_table_column is indexed)

But big_table would always be leading, which makes the query complete in big * LOG(small) rather than small * LOG(big) reads.

This could be rewritten as

SELECT  DISTINCT bt.*
FROM    small_table st
JOIN    big_table bt
ON      bt.big_table_column = st.small_table_column

However, this won't improve NOT IN (as opposed to IN). In MySQL, NOT EXISTS and LEFT JOIN / IS NULL are almost the same, since with nested loops the left table should always be leading in a LEFT JOIN.

You may want to read these articles:

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
  • IN vs. JOIN vs. EXISTS: Oracle
  • IN vs. JOIN vs. EXISTS (SQL Server)
like image 144
Quassnoi Avatar answered Sep 28 '22 01:09

Quassnoi