Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NOT IN (subquery) producing zero rows

I keep finding examples where processes that utilise the formulation

WHERE a NOT IN (SELECT b FROM x)

are just returning zero rows which seems wrong. If I change NOT IN to IN it doesn't return the inverse (all rows), in fact it leaves out all the rows where a is not null. This has started happening in overnight routines that have run fine for years and haven't been changed. It feels like there's a bug in SQL Server.

I can fix the problem by reformulating to

LEFT JOIN x on a = b
WHERE b IS NULL

but it's not something I want to have to live with.

Could it be something to do with statistics? A known bug? In the most recent example the table in the subquery is on a remote linked server though I'm not sure this has been the case every time I've seen it.

like image 492
Adamantish Avatar asked Oct 23 '13 15:10

Adamantish


1 Answers

If b is nullable, this is not a bug. The problem is that SQL Server turns NOT IN into a series of <> 1 AND <> 2 AND <> 3 etc. If you have <> NULL, that returns unknown, which in this case means false. In different scenarios this can qualify or disqualify ALL rows. Rather than the LEFT JOIN approach, you should say:

FROM dbo.OuterTable AS t
WHERE NOT EXISTS (SELECT 1 FROM x WHERE b = t.a);

Here is a quick demonstration:

DECLARE @x TABLE(i INT);
INSERT @x VALUES(1),(2);

DECLARE @y TABLE(j INT);
INSERT @y VALUES(2),(NULL);

SELECT i FROM @x WHERE i NOT IN -- produces zero results
  (SELECT j FROM @y);

SELECT i FROM @x  AS x WHERE NOT EXISTS -- produces one result
  (SELECT 1 FROM @y WHERE j = x.i);

For a lot more details (and metrics to prove why NOT EXISTS is the best alternative):

http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join

Also, please read this blog post by Gail Shaw:

http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

like image 64
Aaron Bertrand Avatar answered Oct 16 '22 08:10

Aaron Bertrand