Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server: Why does comparison null=value return true for NOT IN?

Why does the comparison of value to null return false, except when using a NOT IN, where it returns true?


Given a query to find all stackoverflow users who have a post:

SELECT * FROM Users
WHERE UserID IN (SELECT UserID FROM Posts)

This works as expected; i get a list of all users who have a post.

Now query for the inverse; find all stackoverflow users who don't have a post:

SELECT * FROM Users
WHERE UserID NOT IN (SELECT UserID FROM Posts)

This returns no records, which is incorrect.

Given hypothetical data1

Users              Posts
================   ===============================
UserID  Username   PostID   UserID  Subject
------  --------   -------  ------  ----------------
1       atkins     1        1       Welcome to stack ov...
2       joels      2        2       Welcome all!
...     ...        ...      ...
399573  gt6989b    ...      ...
...     ...        ...      ...
                   10592    null    (deleted by nsl&fbi...
                   ...      ... 

And assume the rules of NULLs:

  • NULL = NULL evaluates to unknown
  • NULL <> NULL evaluates to unknown
  • value = NULL evaluates unknown

If we look at the 2nd query, we're interested in finding all rows where the Users.UserID is not found in the Posts.UserID column. i would proceed logically as follows:

Check UserID 1

  • 1 = 1 returns true. So we conclude that this user has some posts, and do not include them in the output list

Now check UserID 2:

  • 2 = 1 returns false, so we keep looking
  • 2 = 2 returns true, so we conclude that this user has some posts, and do not include them in the output list

Now check UserID 399573

  • 399573 = 1 returns false, so we keep looking
  • 399573 = 2 returns false, so we keep looking
  • ...
  • 399573 = null returns unknown, so we keep looking
  • ...

We found no posts by UserID 399573, so we would include him in the output list.

Except SQL Server doesn't do this. If you have a NULL in your in list, then suddenly it finds a match. It suddenly finds a match. Suddenly 399573 = null evaluates to true.

Why does the comparison of value to null return unknown, except when it returns true?

Edit: i know that i can workaround this nonsensical behavior by specifically excluding the nulls:

SELECT * FROM Users
WHERE UserID NOT IN (
   SELECT UserID FROM Posts
   WHERE UserID IS NOT NULL)

But i shouldn't have to, as far as i can tell the boolean logic should be fine without it - hence my question.

Footnotes

  • 1 hypothetical data; if you don't like it: make up your down.
  • celko now has his own tag
like image 957
Ian Boyd Avatar asked Oct 13 '10 14:10

Ian Boyd


People also ask

How NULLs are treated in comparison operators in SQL?

SQL has three logical operators TRUE,FALSE, UNKNOWN. For comparison operators in SQL, NULL can be compared usingIS or IS NOT operator. SQL treats each NULL as a distinct value, so =,<,> can not beused for comparison.In general, NULL values are discarded when aggregate functions are applied to aparticular column.

What does comparing a non value with NULL result into?

1 Answer. For explanation: In MySQL, NULL is not comparable to other known values. It will result in a NULL when compared to any value.

Can we use comparison operators with NULL values?

To handle NULLs correctly, SQL provides two special comparison operators: IS NULL and IS NOT NULL. They return only true or false and are the best practice for incorporating NULL values into your queries.

Can NULL value be compared in SQL?

It is not possible to test for NULL values with comparison operators, such as =, <, or <>. We will have to use the IS NULL and IS NOT NULL operators instead.


2 Answers

Common problem, canned answer:

The behavior of NOT IN clause may be confusing and as such it needs some explanations. Consider the following query:

SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)

Although there are more than a thousand distinct last names in AdventureWorks.Person.Contact, the query returns nothing. This may look counterintuitive to a beginner database programmer, but it actually makes perfect sense. The explanation consist of several simple steps. First of all, consider the following two queries, which are clearly equivalent:

SELECT LastName, FirstName FROM Person.Contact

WHERE LastName IN('Hedlund', 'Holloway', NULL)



SELECT LastName, FirstName FROM Person.Contact

WHERE LastName='Hedlund' OR LastName='Holloway' OR LastName=NULL

Note that both queries return expected results. Now, let us recall DeMorgan's theorem, which states that:

not (P and Q) = (not P) or (not Q)

not (P or Q) = (not P) and (not Q)

I am cutting and pasting from Wikipedia (http://en.wikipedia.org/wiki/De_Morgan_duality). Applying DeMorgan's theorem to this queries, it follows that these two queries are also equivalent:

SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)



SELECT LastName, FirstName FROM Person.Contact

WHERE LastName<>'Hedlund' AND LastName<>'Holloway' AND LastName<>NULL

This last LastName<>NULL can never be true

like image 200
A-K Avatar answered Oct 11 '22 04:10

A-K


The assumption in your first sentence isn't right:

Why does the comparison of value to null return false, except when using a NOT IN, where it returns true?

But comparison of a value to null does not return false; it returns unknown. And unknown has its own logic:

unknown  AND  true   = unknown
unknown  OR   true   = true
unknown  OR   false  = unknown

One example of how this works out:

where 1 not in (2, null)
--> where 1 <> 2 and 1 <> null
--> where true and unknown
--> where unknown

The where clause only matches on true, so this filters out any row.

You can find the full glory of 3 value logic at Wikipedia.

like image 22
Andomar Avatar answered Oct 11 '22 03:10

Andomar