NOT IN vs NOT EXISTS

Which of these queries is faster?

NOT EXISTS:

    SELECT ProductID, ProductName
    FROM Northwind..Products p
    WHERE NOT EXISTS (
        SELECT 1
        FROM Northwind..[Order Details] od
        WHERE p.ProductId = od.ProductId)

Or NOT IN:

    SELECT ProductID, ProductName
    FROM Northwind..Products p
    WHERE p.ProductID NOT IN (
        SELECT ProductID
        FROM Northwind..[Order Details])

The query execution plan says they both do the same thing. If that is the case, which is the recommended form?

This is based on the Northwind database.

[Edit]

Just found this helpful article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

I think I'll stick with NOT EXISTS.

asked Oct 06 '08 02:10

ilitirit

2 Answers

I always default to NOT EXISTS.

The execution plans may be the same at the moment, but if either column is altered in the future to allow NULLs, the NOT IN version will need to do more work (even if no NULLs are actually present in the data). The semantics of NOT IN when NULLs are present are also unlikely to be the ones you want.

When neither Products.ProductID nor [Order Details].ProductID allows NULLs, the NOT IN will be treated identically to the following query.

    SELECT ProductID,
           ProductName
    FROM   Products p
    WHERE  NOT EXISTS (SELECT *
                       FROM   [Order Details] od
                       WHERE  p.ProductId = od.ProductId)

The exact plan may vary but for my example data I get the following.

[Execution plan image: "Neither NULL"]

A reasonably common misconception seems to be that correlated sub queries are always "bad" compared to joins. They certainly can be when they force a nested loops plan (with the sub query evaluated row by row), but this plan includes an anti semi join logical operator. Anti semi joins are not restricted to nested loops; they can use hash or merge (as in this example) joins too.

    /* Not valid syntax but better reflects the plan */
    SELECT p.ProductID,
           p.ProductName
    FROM   Products p
           LEFT ANTI SEMI JOIN [Order Details] od
             ON p.ProductId = od.ProductId
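To make the anti semi join idea concrete, here is a minimal Python sketch of the same logical operation done two ways: row by row (nested loops) and with a hash lookup. It only illustrates the logic and is not how SQL Server implements these operators. The function names and sample rows are made up for this example, and it assumes there are no NULLs in the data (Python treats None == None as true, which SQL does not).

```python
def anti_semi_join_nested_loops(products, order_product_ids):
    """Keep each product with no matching order row, scanning the inner input per outer row."""
    result = []
    for product_id, name in products:
        matched = False
        for od_product_id in order_product_ids:
            if od_product_id == product_id:
                matched = True
                break  # one match is enough to reject the outer row
        if not matched:
            result.append((product_id, name))
    return result


def anti_semi_join_hash(products, order_product_ids):
    """Same result, but the inner input is read once to build a hash set, then probed."""
    seen = set(order_product_ids)  # build phase
    return [(pid, name) for pid, name in products if pid not in seen]  # probe phase


# Hypothetical sample data (no NULLs).
products = [(1, "Chai"), (2, "Chang"), (3, "Tofu")]
order_product_ids = [1, 1, 3]

nested_result = anti_semi_join_nested_loops(products, order_product_ids)
hash_result = anti_semi_join_hash(products, order_product_ids)
print(nested_result, hash_result)
```

Both functions return the same rows; only the amount of work differs, which is why a correlated NOT EXISTS does not by itself imply row-by-row evaluation.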

If [Order Details].ProductID is NULL-able the query then becomes

    SELECT ProductID,
           ProductName
    FROM   Products p
    WHERE  NOT EXISTS (SELECT *
                       FROM   [Order Details] od
                       WHERE  p.ProductId = od.ProductId)
           AND NOT EXISTS (SELECT *
                           FROM   [Order Details]
                           WHERE  ProductId IS NULL)

The reason for this is that the correct semantics, if [Order Details] contains any NULL ProductIds, is to return no results. The plan gains an extra anti semi join and a row count spool to check for this.

[Execution plan image: "One NULL"]
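The semantics described above can be checked with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere; the NULL behaviour of NOT IN shown here is standard SQL). The table names, column names, and sample rows are simplified and hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER NOT NULL, product_name TEXT);
    CREATE TABLE order_details (product_id INTEGER);  -- nullable column
    INSERT INTO products VALUES (1, 'Chai'), (2, 'Chang'), (3, 'Tofu');
    INSERT INTO order_details VALUES (1), (NULL);
""")

NOT_IN = """
    SELECT product_id FROM products p
    WHERE p.product_id NOT IN (SELECT product_id FROM order_details)
    ORDER BY product_id"""

NOT_EXISTS = """
    SELECT product_id FROM products p
    WHERE NOT EXISTS (SELECT * FROM order_details od
                      WHERE p.product_id = od.product_id)
    ORDER BY product_id"""

# NOT IN written out as the two NOT EXISTS branches from the answer.
NOT_IN_REWRITTEN = """
    SELECT product_id FROM products p
    WHERE NOT EXISTS (SELECT * FROM order_details od
                      WHERE p.product_id = od.product_id)
      AND NOT EXISTS (SELECT * FROM order_details
                      WHERE product_id IS NULL)
    ORDER BY product_id"""


def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# With one NULL in order_details, NOT IN returns nothing.
not_in_rows = ids(NOT_IN)
not_exists_rows = ids(NOT_EXISTS)
rewritten_rows = ids(NOT_IN_REWRITTEN)
print(not_in_rows, not_exists_rows, rewritten_rows)

# Once the NULL row is removed, all three forms agree.
conn.execute("DELETE FROM order_details WHERE product_id IS NULL")
after_cleanup = (ids(NOT_IN), ids(NOT_EXISTS), ids(NOT_IN_REWRITTEN))
print(after_cleanup)
```

The plain NOT EXISTS keeps returning the unordered products in both cases, while NOT IN and its two-branch rewrite return nothing as long as the NULL row exists.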

If Products.ProductID is also changed to become NULL-able the query then becomes

    SELECT ProductID,
           ProductName
    FROM   Products p
    WHERE  NOT EXISTS (SELECT *
                       FROM   [Order Details] od
                       WHERE  p.ProductId = od.ProductId)
           AND NOT EXISTS (SELECT *
                           FROM   [Order Details]
                           WHERE  ProductId IS NULL)
           AND NOT EXISTS (SELECT *
                           FROM   (SELECT TOP 1 *
                                   FROM   [Order Details]) S
                           WHERE  p.ProductID IS NULL)

The reason for that one is that a NULL Products.ProductId should not be returned in the results unless the NOT IN sub query returns no rows at all (i.e. the [Order Details] table is empty), in which case it should. In the plan for my sample data, this is implemented by adding another anti semi join, as below.

[Execution plan image: "Both NULL"]
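The three-branch rewrite can be compared against NOT IN in a small runnable sketch. As an assumption, it uses Python's sqlite3 module rather than SQL Server, so TOP 1 is written as LIMIT 1. The table names and sample rows are hypothetical, and both columns are nullable.

```python
import sqlite3

NOT_IN = """
    SELECT product_id FROM products p
    WHERE p.product_id NOT IN (SELECT product_id FROM order_details)
    ORDER BY product_id"""

# The three NOT EXISTS branches from the answer (LIMIT 1 replaces TOP 1 for SQLite).
REWRITTEN = """
    SELECT product_id FROM products p
    WHERE NOT EXISTS (SELECT * FROM order_details od
                      WHERE p.product_id = od.product_id)
      AND NOT EXISTS (SELECT * FROM order_details
                      WHERE product_id IS NULL)
      AND NOT EXISTS (SELECT * FROM (SELECT * FROM order_details LIMIT 1) s
                      WHERE p.product_id IS NULL)
    ORDER BY product_id"""


def compare(order_detail_ids):
    """Load hypothetical data, then return (NOT IN result, rewritten result)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product_id INTEGER)")       # nullable
    conn.execute("CREATE TABLE order_details (product_id INTEGER)")  # nullable
    conn.executemany("INSERT INTO products VALUES (?)", [(1,), (2,), (None,)])
    conn.executemany("INSERT INTO order_details VALUES (?)",
                     [(i,) for i in order_detail_ids])
    not_in = [row[0] for row in conn.execute(NOT_IN)]
    rewritten = [row[0] for row in conn.execute(REWRITTEN)]
    conn.close()
    return not_in, rewritten


empty_case = compare([])            # empty table: the NULL product is returned
one_row_case = compare([1])         # non-empty table: the NULL product disappears
null_row_case = compare([1, None])  # NULL in the sub query: nothing is returned
print(empty_case, one_row_case, null_row_case)
```

SQLite sorts NULL first, so the NULL product shows up as None at the front of the list in the empty-table case.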

The effect of this is shown in the blog post already linked by Buckley. In the example there, the number of logical reads increases from around 400 to 500,000.

Additionally, the fact that a single NULL can reduce the row count to zero makes cardinality estimation very difficult. If SQL Server assumes that this will happen but there are in fact no NULL rows in the data, the rest of the execution plan may be catastrophically worse when this is just part of a larger query, for example with inappropriate nested loops causing repeated execution of an expensive sub tree.

This is not the only possible execution plan for a NOT IN on a NULL-able column however. This article shows another one for a query against the AdventureWorks2008 database.

For the NOT IN on a NOT NULL column or the NOT EXISTS against either a nullable or non nullable column it gives the following plan.

[Execution plan image: "NOT EXISTS"]

When the column changes to NULL-able the NOT IN plan now looks like

[Execution plan image: "NOT IN - NULL"]

It adds an extra inner join operator to the plan. This apparatus is explained here. It is all there to convert the previous single correlated index seek on Sales.SalesOrderDetail.ProductID = <correlated_product_id> to two seeks per outer row. The additional one is on WHERE Sales.SalesOrderDetail.ProductID IS NULL.

As this is under an anti semi join, if that one returns any rows the second seek will not occur. However, if Sales.SalesOrderDetail does not contain any NULL ProductIDs, it will double the number of seek operations required.

answered Oct 12 '22 03:10

Martin Smith

Also be aware that NOT IN is not equivalent to NOT EXISTS when it comes to NULLs.

This post explains it very well

http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

When the subquery returns even one NULL, NOT IN will not match any rows.

The reason for this can be found by looking at the details of what the NOT IN operation actually means.

Let’s say, for illustration purposes, that there are 4 rows in the table called t, and there’s a column called ID with values 1..4.

    WHERE SomeValue NOT IN (SELECT AVal FROM t)

is equivalent to

    WHERE SomeValue != (SELECT AVal FROM t WHERE ID=1)
      AND SomeValue != (SELECT AVal FROM t WHERE ID=2)
      AND SomeValue != (SELECT AVal FROM t WHERE ID=3)
      AND SomeValue != (SELECT AVal FROM t WHERE ID=4)

Let’s further say that AVal is NULL where ID = 4. Hence that != comparison returns UNKNOWN. The logical truth table for AND states that UNKNOWN AND TRUE is UNKNOWN, and UNKNOWN AND FALSE is FALSE. There is no value that can be AND’d with UNKNOWN to produce the result TRUE.

Hence, if any row of that subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or UNKNOWN, and no records will be returned.
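The chain of != comparisons can be modelled in a few lines of Python. This is a sketch of the three-valued logic only, not of any database engine: UNKNOWN is represented as None, the helper names are invented for this example, and the AVal values other than the NULL are hypothetical.

```python
def sql_ne(a, b):
    """SQL `a != b`: UNKNOWN (None) when either side is NULL."""
    if a is None or b is None:
        return None
    return a != b


def sql_and(x, y):
    """SQL three-valued AND: FALSE dominates, then UNKNOWN, otherwise TRUE."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def sql_not_in(value, candidates):
    """Expand NOT IN into `value != c1 AND value != c2 AND ...`."""
    result = True
    for candidate in candidates:
        result = sql_and(result, sql_ne(value, candidate))
    return result


avals = [1, 2, 3, None]  # hypothetical AVal values; the last row is NULL

unmatched_with_null = sql_not_in(5, avals)         # UNKNOWN: row is filtered out
matched_with_null = sql_not_in(1, avals)           # FALSE: row is filtered out
unmatched_without_null = sql_not_in(5, [1, 2, 3])  # TRUE: row is kept
print(unmatched_with_null, matched_with_null, unmatched_without_null)
```

A WHERE clause keeps a row only when the predicate is TRUE, so both the UNKNOWN and the FALSE outcomes drop the row.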

answered Oct 12 '22 03:10

buckley

Tags: sql, sql-server, notin
