I have two tables. The first is full of books each with a book_id
. The second table is a book_id
to keyword_id
relationship table.
SELECT b.* FROM books_table b
INNER JOIN keywords_table k
ON b.book_id = k.book_id AND k.keyword_id NOT IN(1,2,3)
WHERE b.is_hardcover = 1
GROUP BY b.book_id
No books with the keyword_id 1, 2, or 3 attached to any of the books.
Books can have the keywords 1, 2, or 3 so long as they have additional keyword_ids attached to them that are not in the exclusion list.
The above query is the closest I have come to achieving it, but it fails in this one regard.
How can I achieve the desired outcome and in the most optimized way?
You can do so
SELECT b.*
FROM books_table b
INNER JOIN keywords_table k
ON b.book_id = k.book_id
WHERE b.is_hardcover = 1
GROUP BY b.book_id
HAVING SUM(k.keyword_id = 1) =0
AND SUM(k.keyword_id = 2) =0
AND SUM(k.keyword_id = 3) =0
As you noted, this query will produce any book that has at least one keyword that isn't 1, 2 or 3, which isn't what you want. Instead, you'd want to explicitly exclude books with these keywords. A join
isn't really the right took for the job here. Instead, you could use the exists
operator:
SELECT b.*
FROM books_table b
WHERE b.is_hardcover = 1 AND
NOT EXISTS (SELECT *
FROM keywords_table k
WHERE b.book_id = k.book_id AND
k.keyword_id IN (1,2,3))
What you are asking for is a flavor of "anti join". There are several ways to accomplish it; here's one:
SELECT b.* FROM books_table b
LEFT JOIN keywords_table k
ON b.book_id = k.book_id AND k.keyword_id IN (1,2,3)
WHERE k.book_id IS NULL AND b.is_hardcover = 1
The left join matches up each row from the left table (books_table
) with those rows of the right table that satisfy the condition b.book_id = k.book_id AND k.keyword_id IN (1,2,3)
, and includes a single result row for each row of the left table that doesn't match any row of the right table. The filter condition k.book_id IS NULL
conflicts with the join condition, so it can be satisfied only by those rows arising from a left row not matching any right row.
Note that the assignment of conditions to the join predicate and the filter predicate is critical with an outer join such as this one. Note also that there is no need for a GROUP BY
clause in this case unless books_table
may contain duplicate book_id
s.
This approach is likely to perform better in practice than one based on a correlated subquery in the WHERE
clause. If performance is important, however, then you would be well advised to test the alternatives you are considering.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With