I have heard a lot of people over the years say that:

"join" operators are preferred over "NOT EXISTS"

Why?


Is using “NOT EXISTS” considered to be bad SQL practise?

1 Answer

In MySQL, Oracle, SQL Server and PostgreSQL, NOT EXISTS is as efficient as LEFT JOIN / IS NULL, or even more efficient.

While it may seem that "the inner query should be executed for each record from the outer query" (which seems to be bad for NOT EXISTS and even worse for NOT IN, since the latter query is not even correlated), it can be optimized just as well as any other query, using appropriate anti-join methods.

In SQL Server, in fact, LEFT JOIN / IS NULL may be less efficient than NOT EXISTS / NOT IN when the column in the inner table is unindexed or has low cardinality.
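For readers who have not seen the three anti-join formulations side by side, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in (SQLite is not one of the engines discussed above), and the tables t_outer and t_inner are hypothetical. It only illustrates that the three queries return the same rows when the compared columns are NOT NULL; it says nothing about which plan a given engine will choose.

```python
import sqlite3

# Hypothetical tables; SQLite is used only because it ships with Python.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_outer (id INTEGER PRIMARY KEY, val INTEGER NOT NULL);
    CREATE TABLE t_inner (id INTEGER PRIMARY KEY, val INTEGER NOT NULL);
    INSERT INTO t_outer (val) VALUES (1), (2), (3), (4);
    INSERT INTO t_inner (val) VALUES (2), (4), (4);
""")

# Three ways to ask for outer rows that have no match in the inner table.
queries = {
    "NOT EXISTS": """
        SELECT o.val FROM t_outer o
        WHERE NOT EXISTS (SELECT 1 FROM t_inner i WHERE i.val = o.val)
        ORDER BY o.val
    """,
    "LEFT JOIN / IS NULL": """
        SELECT o.val FROM t_outer o
        LEFT JOIN t_inner i ON i.val = o.val
        WHERE i.val IS NULL
        ORDER BY o.val
    """,
    "NOT IN": """
        SELECT o.val FROM t_outer o
        WHERE o.val NOT IN (SELECT val FROM t_inner)
        ORDER BY o.val
    """,
}

results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}

for name, rows in results.items():
    print(f"{name}: {rows}")
```

The columns are declared NOT NULL on purpose: if the inner column could contain NULL, NOT IN would no longer be interchangeable with the other two forms.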

It is often heard that MySQL is "especially bad in treating subqueries".

This stems from the fact that MySQL, before hash joins were added in version 8.0.18, was not capable of any join method other than nested loops, which severely limited its optimization abilities.

The only case in which a query would benefit from rewriting a subquery as a join is this:

SELECT  *
FROM    big_table
WHERE   big_table_column IN
        (
        SELECT  small_table_column
        FROM    small_table
        )

small_table will not be queried completely for each record in big_table: though the subquery does not seem to be correlated, it will be implicitly correlated by the query optimizer and in fact rewritten to an EXISTS (using index_subquery to search for the first match if small_table_column is indexed).

But big_table would always be leading, which makes the query complete in big * LOG(small) rather than small * LOG(big) reads.
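To get a feel for how far apart those two read counts can be, here is a small worked calculation. The row counts are made-up numbers, and treating LOG as a base-2 logarithm (one B-tree-style lookup per probe) is an assumption, not something the estimate above specifies.

```python
import math

big_rows = 1_000_000   # hypothetical size of big_table
small_rows = 100       # hypothetical size of small_table

# big_table leading: one indexed lookup into small_table per big_table row.
big_leading = big_rows * math.log2(small_rows)

# small_table leading: one indexed lookup into big_table per small_table row.
small_leading = small_rows * math.log2(big_rows)

print(f"big * LOG(small)   ~ {big_leading:,.0f} reads")
print(f"small * LOG(big)   ~ {small_leading:,.0f} reads")
print(f"ratio              ~ {big_leading / small_leading:,.0f}x")
```

With these assumed sizes the first figure is in the millions and the second in the low thousands, which is why the choice of leading table matters so much here.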

This query could be rewritten as:

SELECT  DISTINCT bt.*
FROM    small_table st
JOIN    big_table bt
ON      bt.big_table_column = st.small_table_column

However, this won't improve NOT IN (as opposed to IN). In MySQL, NOT EXISTS and LEFT JOIN / IS NULL are almost the same, since with nested loops the left table should always be leading in a LEFT JOIN.
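The IN-to-JOIN rewrite above relies on DISTINCT to stay equivalent to the original query. The sketch below checks that on toy data, again using Python's sqlite3 module as a stand-in engine. The table and column names follow the example above, while the id column and the sample values are assumptions added for illustration.

```python
import sqlite3

# Toy data; small_table deliberately contains the value 2 twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (
        id INTEGER PRIMARY KEY,
        big_table_column INTEGER NOT NULL
    );
    CREATE TABLE small_table (small_table_column INTEGER NOT NULL);
    INSERT INTO big_table (big_table_column) VALUES (1), (2), (3), (4), (5);
    INSERT INTO small_table (small_table_column) VALUES (2), (2), (5);
""")

# Original form: IN with an uncorrelated subquery.
in_rows = conn.execute("""
    SELECT * FROM big_table
    WHERE big_table_column IN (SELECT small_table_column FROM small_table)
    ORDER BY id
""").fetchall()

# Join without DISTINCT: duplicates in small_table multiply the output rows.
join_rows = conn.execute("""
    SELECT bt.* FROM small_table st
    JOIN big_table bt ON bt.big_table_column = st.small_table_column
    ORDER BY bt.id
""").fetchall()

# Join with DISTINCT: matches the IN query again.
distinct_join_rows = conn.execute("""
    SELECT DISTINCT bt.* FROM small_table st
    JOIN big_table bt ON bt.big_table_column = st.small_table_column
    ORDER BY bt.id
""").fetchall()

print("IN:            ", in_rows)
print("JOIN:          ", join_rows)
print("DISTINCT JOIN: ", distinct_join_rows)
```

Note that DISTINCT bt.* only reproduces the IN result if big_table itself has no fully duplicated rows; the primary key in this sketch guarantees that.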

You may want to read these articles:

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
IN vs. JOIN vs. EXISTS: Oracle
IN vs. JOIN vs. EXISTS (SQL Server)

144

answered Sep 28 '22 01:09

Quassnoi


Tags:

sql

coding-style

Ian Ringrose
