I am currently working on a query that searches for books in a table based on their attributes. The table contains more than 50 million rows and has the following structure:
-----------------------
| book_id | attr_id |
-----------------------
| 2005207 | 35021 |
-----------------------
| 2005207 | 28106 |
-----------------------
| 2005207 | 27173 |
-----------------------
| 2005207 | 35109 |
-----------------------
| 2005207 | 34999 |
-----------------------
| 2005207 | 35107 |
-----------------------
| 2005207 | 35099 |
-----------------------
| 2005207 | 35105 |
-----------------------
| 2005207 | 28224 |
-----------------------
| ... | ..... |
-----------------------
The attr_id column represents attributes such as binding, publishing year, genre, and many more. The primary key is the compound key (attr_id, book_id).
One example query could be "Find all books, where genre is either comic or science fiction without hardcover".
SELECT sql_no_cache a.book_id
FROM
  (SELECT book_id
   FROM attribute_books ab
   WHERE ab.attr_id IN (38571,
                        38576)) a
LEFT JOIN
  (SELECT book_id
   FROM attribute_books ab
   WHERE ab.attr_id = 35003) b ON b.book_id = a.book_id
WHERE b.book_id IS NULL; -- anti-join: keep only books with no matching row in b
Queries of this kind can involve multiple self-joins and currently perform very poorly. Instead of inner joins for IN conditions and left joins for NOT IN conditions, I could also use the INTERSECT operator, which is available in some SQL dialects.
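A rough sketch of the set-operator variant, for illustration only. For the "without hardcover" part the matching operator is EXCEPT rather than INTERSECT. The sketch uses Python's built-in sqlite3 module with an in-memory database purely so that it runs anywhere; the handful of rows and the meaning assigned to each attr_id are made-up assumptions, and behavior or performance on a 50-million-row MySQL table may differ.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption: the real system is not SQLite).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute_books ("
    " book_id INTEGER NOT NULL,"
    " attr_id INTEGER NOT NULL,"
    " PRIMARY KEY (attr_id, book_id))"
)

# Hypothetical toy rows: 38571 and 38576 are the two genres, 35003 is hardcover.
conn.executemany(
    "INSERT INTO attribute_books (book_id, attr_id) VALUES (?, ?)",
    [
        (1, 38571),              # one genre, no hardcover   -> expected
        (2, 38576), (2, 35003),  # one genre, but hardcover  -> excluded
        (3, 35003),              # hardcover only            -> excluded
        (4, 38571), (4, 38576),  # both genres, no hardcover -> expected once
    ],
)

# EXCEPT removes every book_id that also appears in the hardcover set
# and de-duplicates the result.
SET_QUERY = """
SELECT book_id FROM attribute_books WHERE attr_id IN (38571, 38576)
EXCEPT
SELECT book_id FROM attribute_books WHERE attr_id = 35003
ORDER BY book_id
"""
set_result = [row[0] for row in conn.execute(SET_QUERY)]
print(set_result)
```

Additional "must have" attributes could be chained with further INTERSECT branches in the same way.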
I currently have the following questions:
Possibly the most efficient method is exists and not exists:
select b.*
from books b
where exists (select 1
              from attribute_books ab
              where ab.attr_id in (38571, 38576) and b.book_id = ab.book_id
             ) and
      not exists (select 1
                  from attribute_books ab
                  where ab.attr_id = 35003 and b.book_id = ab.book_id
                 )
For this, you want an index on attribute_books(book_id, attr_id).
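To check the semantics on a small scale, here is a minimal sketch of the question's example ("comic or science fiction, without hardcover") written with EXISTS and NOT EXISTS, together with the suggested index. It runs against an in-memory SQLite database through Python's sqlite3 module only for convenience. The books table layout, the index name idx_attribute_books_book_attr, and the sample rows are assumptions made for this illustration, and it says nothing about how MySQL would plan the query on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INTEGER PRIMARY KEY);
CREATE TABLE attribute_books (
    book_id INTEGER NOT NULL,
    attr_id INTEGER NOT NULL,
    PRIMARY KEY (attr_id, book_id)
);
-- Extra index leading with book_id, for the correlated subqueries.
CREATE INDEX idx_attribute_books_book_attr
    ON attribute_books (book_id, attr_id);
""")

# Hypothetical toy data: 38571 and 38576 are genres, 35003 is hardcover.
conn.executemany(
    "INSERT INTO books (book_id) VALUES (?)",
    [(1,), (2,), (3,), (4,), (5,)],
)
conn.executemany(
    "INSERT INTO attribute_books (book_id, attr_id) VALUES (?, ?)",
    [
        (1, 38571),              # genre match, no hardcover -> expected
        (2, 38576), (2, 35003),  # genre match, hardcover    -> excluded
        (3, 35003),              # hardcover only            -> excluded
        (4, 38571), (4, 38576),  # both genres, no hardcover -> expected
        # book 5 has no attributes at all                    -> excluded
    ],
)

QUERY = """
SELECT b.book_id
FROM books b
WHERE EXISTS (SELECT 1
              FROM attribute_books ab
              WHERE ab.book_id = b.book_id
                AND ab.attr_id IN (38571, 38576))
  AND NOT EXISTS (SELECT 1
                  FROM attribute_books ab
                  WHERE ab.book_id = b.book_id
                    AND ab.attr_id = 35003)
ORDER BY b.book_id
"""
matching = [row[0] for row in conn.execute(QUERY)]
print(matching)
```

Because each subquery only has to find (or fail to find) a single row per book, a book that carries both genres is still returned once, without needing DISTINCT.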