Why does the comparison of value
to null
return false, except when using a NOT IN
, where it returns true?
Given a query to find all stackoverflow users who have a post:
SELECT * FROM Users
WHERE UserID IN (SELECT UserID FROM Posts)
This works as expected; i get a list of all users who have a post.
Now query for the inverse; find all stackoverflow users who don't have a post:
SELECT * FROM Users
WHERE UserID NOT IN (SELECT UserID FROM Posts)
This returns no records, which is incorrect.
Given hypothetical data1
Users Posts
================ ===============================
UserID Username PostID UserID Subject
------ -------- ------- ------ ----------------
1 atkins 1 1 Welcome to stack ov...
2 joels 2 2 Welcome all!
... ... ... ...
399573 gt6989b ... ...
... ... ... ...
10592 null (deleted by nsl&fbi...
... ...
And assume the rules of NULLs:
NULL = NULL
evaluates to unknownNULL <> NULL
evaluates to unknownvalue = NULL
evaluates unknownIf we look at the 2nd query, we're interested in finding all rows where the Users.UserID is not found in the Posts.UserID column. i would proceed logically as follows:
Check UserID 1
1 = 1
returns true. So we conclude that this user has some posts, and do not include them in the output listNow check UserID 2:
2 = 1
returns false, so we keep looking2 = 2
returns true, so we conclude that this user has some posts, and do not include them in the output listNow check UserID 399573
399573 = 1
returns false, so we keep looking399573 = 2
returns false, so we keep looking399573 = null
returns unknown, so we keep lookingWe found no posts by UserID 399573, so we would include him in the output list.
Except SQL Server doesn't do this. If you have a NULL in your in
list, then suddenly it finds a match. It suddenly finds a match. Suddenly 399573 = null
evaluates to true.
Why does the comparison of value
to null
return unknown, except when it returns true?
Edit: i know that i can workaround this nonsensical behavior by specifically excluding the nulls:
SELECT * FROM Users
WHERE UserID NOT IN (
SELECT UserID FROM Posts
WHERE UserID IS NOT NULL)
But i shouldn't have to, as far as i can tell the boolean logic should be fine without it - hence my question.
SQL has three logical operators TRUE,FALSE, UNKNOWN. For comparison operators in SQL, NULL can be compared usingIS or IS NOT operator. SQL treats each NULL as a distinct value, so =,<,> can not beused for comparison.In general, NULL values are discarded when aggregate functions are applied to aparticular column.
1 Answer. For explanation: In MySQL, NULL is not comparable to other known values. It will result in a NULL when compared to any value.
To handle NULLs correctly, SQL provides two special comparison operators: IS NULL and IS NOT NULL. They return only true or false and are the best practice for incorporating NULL values into your queries.
It is not possible to test for NULL values with comparison operators, such as =, <, or <>. We will have to use the IS NULL and IS NOT NULL operators instead.
Common problem, canned answer:
The behavior of NOT IN clause may be confusing and as such it needs some explanations. Consider the following query:
SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)
Although there are more than a thousand distinct last names in AdventureWorks.Person.Contact, the query returns nothing. This may look counterintuitive to a beginner database programmer, but it actually makes perfect sense. The explanation consist of several simple steps. First of all, consider the following two queries, which are clearly equivalent:
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName IN('Hedlund', 'Holloway', NULL)
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName='Hedlund' OR LastName='Holloway' OR LastName=NULL
Note that both queries return expected results. Now, let us recall DeMorgan's theorem, which states that:
not (P and Q) = (not P) or (not Q)
not (P or Q) = (not P) and (not Q)
I am cutting and pasting from Wikipedia (http://en.wikipedia.org/wiki/De_Morgan_duality). Applying DeMorgan's theorem to this queries, it follows that these two queries are also equivalent:
SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName<>'Hedlund' AND LastName<>'Holloway' AND LastName<>NULL
This last LastName<>NULL can never be true
The assumption in your first sentence isn't right:
Why does the comparison of value to null return false, except when using a NOT IN, where it returns true?
But comparison of a value to null does not return false
; it returns unknown
. And unknown
has its own logic:
unknown AND true = unknown
unknown OR true = true
unknown OR false = unknown
One example of how this works out:
where 1 not in (2, null)
--> where 1 <> 2 and 1 <> null
--> where true and unknown
--> where unknown
The where
clause only matches on true
, so this filters out any row.
You can find the full glory of 3 value logic at Wikipedia.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With