Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Excluding MYSQL query results with an INNER JOIN

I have two tables. The first is full of books each with a book_id. The second table is a book_id to keyword_id relationship table.

SELECT b.* FROM books_table b 
INNER JOIN keywords_table k 
ON b.book_id = k.book_id AND k.keyword_id NOT IN(1,2,3)
WHERE b.is_hardcover = 1 
GROUP BY b.book_id

Desired Outcome

No books with the keyword_id 1, 2, or 3 attached to any of the books.

Actual Outcome

Books can have the keywords 1, 2, or 3 so long as they have additional keyword_ids attached to them that are not in the exclusion list.

What I've tried

The above query is the closest I have come to achieving it, but it fails in this one regard.

How can I achieve the desired outcome and in the most optimized way?

like image 242
true Avatar asked Oct 05 '15 18:10

true


3 Answers

You can do so

SELECT b.* 
FROM books_table b 
INNER JOIN keywords_table k 
ON b.book_id = k.book_id
WHERE b.is_hardcover = 1 
GROUP BY b.book_id
HAVING SUM(k.keyword_id = 1) =0
AND SUM(k.keyword_id = 2) =0
AND SUM(k.keyword_id = 3) =0
like image 108
M Khalid Junaid Avatar answered Sep 23 '22 12:09

M Khalid Junaid


As you noted, this query will produce any book that has at least one keyword that isn't 1, 2 or 3, which isn't what you want. Instead, you'd want to explicitly exclude books with these keywords. A join isn't really the right took for the job here. Instead, you could use the exists operator:

SELECT b.* 
FROM   books_table b 
WHERE  b.is_hardcover = 1 AND
       NOT EXISTS (SELECT * 
                   FROM   keywords_table k 
                   WHERE  b.book_id = k.book_id AND 
                          k.keyword_id IN (1,2,3))
like image 44
Mureinik Avatar answered Sep 23 '22 12:09

Mureinik


What you are asking for is a flavor of "anti join". There are several ways to accomplish it; here's one:

SELECT b.* FROM books_table b 
LEFT JOIN keywords_table k 
  ON b.book_id = k.book_id AND k.keyword_id IN (1,2,3)
WHERE k.book_id IS NULL AND b.is_hardcover = 1 

The left join matches up each row from the left table (books_table) with those rows of the right table that satisfy the condition b.book_id = k.book_id AND k.keyword_id IN (1,2,3), and includes a single result row for each row of the left table that doesn't match any row of the right table. The filter condition k.book_id IS NULL conflicts with the join condition, so it can be satisfied only by those rows arising from a left row not matching any right row.

Note that the assignment of conditions to the join predicate and the filter predicate is critical with an outer join such as this one. Note also that there is no need for a GROUP BY clause in this case unless books_table may contain duplicate book_ids.

This approach is likely to perform better in practice than one based on a correlated subquery in the WHERE clause. If performance is important, however, then you would be well advised to test the alternatives you are considering.

like image 26
John Bollinger Avatar answered Sep 21 '22 12:09

John Bollinger