This seems so basic, I'm flabbergasted for lack of a better word. I have two tables, let's call them albums
and artists
CREATE TABLE `albums` (
`album_id` bigint(20) NOT NULL AUTO_INCREMENT,
`artist_id` bigint(20) DEFAULT NULL,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`album_id`)
)
CREATE TABLE `artists` (
`artist_id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(250) NOT NULL,
PRIMARY KEY (`artist_id`)
)
There are a few hundred thousand reconds in each table. Some of the album rows have a null artist_id
, this is expected.
However, when I perform the following query to find artists without albums:
SELECT * FROM artists WHERE artist_id NOT IN (SELECT artist_id FROM albums)
... the query returns zero results. I know that this is not true. So I tried this one:
SELECT * FROM artists WHERE artist_id NOT IN (SELECT artist_id FROM albums WHERE artist_id IS NOT NULL)
... and I get back a couple thousand rows. My question is: Why did the first query seem to operate on the idea that any number = NULL? Or is this an odd effect that NULL has on the IN()
statement? I feel like this is something basic that I've missed. I don't usually use NULL in my db tables at all.
This is why NOT EXISTS
is semantically correct
SELECT * FROM artists ar
WHERE NOT EXISTS
(SELECT * FROM albums al WHERE ar.artist_id = al.artist_id)
Logic:
NOT IN (x, y, NULL)
is actually
NOT (x OR y OR NULL)
is actually
(NOT x) AND (NOT y) AND (NOT NULL)
So NULL
invalidates the whole NOT IN
Quick answer - the IN
statement is a shortcut for =a OR =b OR ...
. If you include nulls in this list, then I think that is breaking the statement. Your second option is probably a better option.
Or using a join might also work, and be more efficient.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With