Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server IN vs. EXISTS Performance

People also ask

Which is faster in or EXISTS in SQL Server?

The EXISTS clause is much faster than IN when the subquery results is very large. Conversely, the IN clause is faster than EXISTS when the subquery results is very small. Also, the IN clause can't compare anything with NULL values, but the EXISTS clause can compare everything with NULLs.

Which operator is faster in or EXISTS operator?

When the subquery results are large, EXISTS operator provides better performance. In contrast, when the sub-query results are small, the IN operator is faster than EXISTS. IN operator always picks the matching values list, whereas EXISTS returns the Boolean values TRUE or FALSE.

Which is faster inner join or EXISTS?

If you do an inner join on a UNIQUE column, they exhibit same performance. If you do an inner join on a recordset with DISTINCT applied (to get rid of the duplicates), EXISTS is usually faster.

Is SQL EXISTS efficient?

Note: SQL Statements that use the SQL EXISTS Condition are very inefficient since the sub-query is RE-RUN for EVERY row in the outer query's table. This can be true for some database systems, but other database systems might be able to find a more efficient execution plan for such statements.


EXISTS will be faster because once the engine has found a hit, it will quit looking as the condition has proved true.

With IN, it will collect all the results from the sub-query before further processing.


The accepted answer is shortsighted and the question a bit loose in that:

1) Neither explicitly mention whether a covering index is present in the left, right, or both sides.

2) Neither takes into account the size of input left side set and input right side set.
(The question just mentions an overall large result set).

I believe the optimizer is smart enough to convert between "in" vs "exists" when there is a significant cost difference due to (1) and (2), otherwise it may just be used as a hint (e.g. exists to encourage use of an a seekable index on the right side).

Both forms can be converted to join forms internally, have the join order reversed, and run as loop, hash or merge--based on the estimated row counts (left and right) and index existence in left, right, or both sides.


I've done some testing on SQL Server 2005 and 2008, and on both the EXISTS and the IN come back with the exact same actual execution plan, as other have stated. The Optimizer is optimal. :)

Something to be aware of though, EXISTS, IN, and JOIN can sometimes return different results if you don't phrase your query just right: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx


I'd go with EXISTS over IN, see below link:

SQL Server: JOIN vs IN vs EXISTS - the logical difference

There is a common misconception that IN behaves equally to EXISTS or JOIN in terms of returned results. This is simply not true.

IN: Returns true if a specified value matches any value in a subquery or a list.

Exists: Returns true if a subquery contains any rows.

Join: Joins 2 resultsets on the joining column.

Blog credit: https://stackoverflow.com/users/31345/mladen-prajdic


There are many misleading answers answers here, including the highly upvoted one (although I don't believe their ops meant harm). The short answer is: These are the same.

There are many keywords in the (T-)SQL language, but in the end, the only thing that really happens on the hardware is the operations as seen in the execution query plan.

The relational (maths theory) operation we do when we invoke [NOT] IN and [NOT] EXISTS is the semi join (anti-join when using NOT). It is not a coincidence that the corresponding sql-server operations have the same name. There is no operation that mentions IN or EXISTS anywhere - only (anti-)semi joins. Thus, there is no way that a logically-equivalent IN vs EXISTS choice could affect performance because there is one and only way, the (anti)semi join execution operation, to get their results.

An example:

Query 1 ( plan )

select * from dt where dt.customer in (select c.code from customer c where c.active=0)

Query 2 ( plan )

select * from dt where exists (select 1 from customer c where c.code=dt.customer and c.active=0)

The execution plans are typically going to be identical in these cases, but until you see how the optimizer factors in all the other aspects of indexes etc., you really will never know.


So, IN is not the same as EXISTS nor it will produce the same execution plan.

Usually EXISTS is used in a correlated subquery, that means you will JOIN the EXISTS inner query with your outer query. That will add more steps to produce a result as you need to solve the outer query joins and the inner query joins then match their where clauses to join both.

Usually IN is used without correlating the inner query with the outer query, and that can be solved in only one step (in the best case scenario).

Consider this:

  1. If you use IN and the inner query result is millions of rows of distinct values, it will probably perform SLOWER than EXISTS given that the EXISTS query is performant (has the right indexes to join with the outer query).

  2. If you use EXISTS and the join with your outer query is complex (takes more time to perform, no suitable indexes) it will slow the query by the number of rows in the outer table, sometimes the estimated time to complete can be in days. If the number of rows is acceptable for your given hardware, or the cardinality of data is correct (for example fewer DISTINCT values in a large data set) IN can perform faster than EXISTS.

  3. All of the above will be noted when you have a fair amount of rows on each table (by fair I mean something that exceeds your CPU processing and/or ram thresholds for caching).

So the ANSWER is it DEPENDS. You can write a complex query inside IN or EXISTS, but as a rule of thumb, you should try to use IN with a limited set of distinct values and EXISTS when you have a lot of rows with a lot of distinct values.

The trick is to limit the number of rows to be scanned.

Regards,

MarianoC