I keep finding examples where processes that utilise the formulation
WHERE a NOT IN (SELECT b FROM x)
are just returning zero rows which seems wrong. If I change NOT IN to IN it doesn't return the inverse (all rows), in fact it leaves out all the rows where a is not null. This has started happening in overnight routines that have run fine for years and haven't been changed. It feels like there's a bug in SQL Server.
I can fix the problem by reformulating to
LEFT JOIN x on a = b
WHERE b IS NULL
but it's not something I want to have to live with.
Could it be something to do with statistics? A known bug? In the most recent example the table in the subquery is on a remote linked server though I'm not sure this has been the case every time I've seen it.
If b
is nullable, this is not a bug. The problem is that SQL Server turns NOT IN
into a series of <> 1 AND <> 2 AND <> 3
etc. If you have <> NULL
, that returns unknown, which in this case means false. In different scenarios this can qualify or disqualify ALL rows. Rather than the LEFT JOIN
approach, you should say:
FROM dbo.OuterTable AS t
WHERE NOT EXISTS (SELECT 1 FROM x WHERE b = t.a);
Here is a quick demonstration:
DECLARE @x TABLE(i INT);
INSERT @x VALUES(1),(2);
DECLARE @y TABLE(j INT);
INSERT @y VALUES(2),(NULL);
SELECT i FROM @x WHERE i NOT IN -- produces zero results
(SELECT j FROM @y);
SELECT i FROM @x AS x WHERE NOT EXISTS -- produces one result
(SELECT 1 FROM @y WHERE j = x.i);
For a lot more details (and metrics to prove why NOT EXISTS
is the best alternative):
http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join
Also, please read this blog post by Gail Shaw:
http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With