Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does the IN predicate work in SQL?

After prepairing an answer for this question I found I couldn't verify my answer.

In my first programming job I was told that a query within the IN () predicate gets executed for every row contained in the parent query, and therefore using IN should be avoided.

For example, given the query:

SELECT count(*) FROM Table1 WHERE Table1Id NOT IN (
SELECT Table1Id FROM Table2 WHERE id_user = 1)
Table1 Rows | # of "IN" executions
----------------------------------
      10    |       10
     100    |      100
    1000    |     1000
   10000    |    10000

Is this correct? How does the IN predicate actually work?

like image 859
Gavin Miller Avatar asked Apr 17 '09 16:04

Gavin Miller


2 Answers

The warning you got about subqueries executing for each row is true -- for correlated subqueries.

SELECT COUNT(*) FROM Table1 a 
WHERE a.Table1id NOT IN (
  SELECT b.Table1Id FROM Table2 b WHERE b.id_user = a.id_user
);

Note that the subquery references the id_user column of the outer query. The value of id_user on each row of Table1 may be different. So the subquery's result will likely be different, depending on the current row in the outer query. The RDBMS must execute the subquery many times, once for each row in the outer query.

The example you tested is a non-correlated subquery. Most modern RDBMS optimizers worth their salt should be able to tell when the subquery's result doesn't depend on the values in each row of the outer query. In that case, the RDBMS runs the subquery a single time, caches its result, and uses it repeatedly for the predicate in the outer query.

PS: In SQL, IN() is called a "predicate," not a statement. A predicate is a part of the language that evaluates to either true or false, but cannot necessarily be executed independently as a statement. That is, you can't just run this as an SQL query: "2 IN (1,2,3);" Although this is a valid predicate, it's not a valid statement.

like image 101
Bill Karwin Avatar answered Nov 06 '22 16:11

Bill Karwin


It will entirely depend on the database you're using, and the exact query.

Query optimisers are very smart at times - in your sample query, I'd expect the better databases to be able to use the same sort of techniques that they do with a join. More naive databases may just execute the same query many times.

like image 8
Jon Skeet Avatar answered Nov 06 '22 15:11

Jon Skeet