 

Single table SELF JOIN alternatives / except / intersect

I am currently working on a query that searches for books in a table based on their attributes. The table contains more than 50 million rows and has the following structure:

-----------------------
| book_id | attr_id   |
-----------------------
| 2005207 | 35021     |
-----------------------
| 2005207 | 28106     |
-----------------------
| 2005207 | 27173     |
-----------------------
| 2005207 | 35109     |
-----------------------
| 2005207 | 34999     |
-----------------------
| 2005207 | 35107     |
-----------------------
| 2005207 | 35099     |
-----------------------
| 2005207 | 35105     |
-----------------------
| 2005207 | 28224     |
-----------------------
| ...     | .....     |    
-----------------------

The attr_id column represents attributes such as binding, publishing year, genre and many more. The primary key is a compound key on (attr_id, book_id).
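To make the layout concrete, here is a minimal sketch that recreates the table in an in-memory SQLite database. SQLite is an assumption chosen only so the example runs anywhere; the real table lives in MySQL. The sketch loads the nine sample rows for book 2005207 and shows that the compound primary key stores one row per (book, attribute) pair and rejects duplicates.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attribute_books (
        book_id INTEGER NOT NULL,
        attr_id INTEGER NOT NULL,
        PRIMARY KEY (attr_id, book_id)  -- compound key, attr_id first
    )
    """
)

# One row per (book, attribute) pair: the sample rows for book 2005207.
sample_attrs = [35021, 28106, 27173, 35109, 34999, 35107, 35099, 35105, 28224]
conn.executemany(
    "INSERT INTO attribute_books (book_id, attr_id) VALUES (?, ?)",
    [(2005207, a) for a in sample_attrs],
)

# All attributes of one book come back by filtering on book_id.
attrs = [row[0] for row in conn.execute(
    "SELECT attr_id FROM attribute_books WHERE book_id = ? ORDER BY attr_id",
    (2005207,),
)]
print(attrs)

# The compound primary key rejects a second copy of the same pair.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO attribute_books (book_id, attr_id) VALUES (2005207, 35021)")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Because attr_id leads the key, lookups by attribute ("all books with attr X") are range scans on the primary key, while lookups by book_id alone need a separate index.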

One example query could be "Find all books, where genre is either comic or science fiction without hardcover".

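-- Books tagged comic or science fiction (attr 38571 / 38576 in this example)
-- that do NOT carry the hardcover attribute (attr 35003).
SELECT DISTINCT SQL_NO_CACHE a.book_id
FROM
  (SELECT book_id
   FROM attribute_books ab
   WHERE ab.attr_id IN (38571,
                        38576)) a
LEFT JOIN
  (SELECT book_id
   FROM attribute_books ab
   WHERE ab.attr_id = 35003) b ON b.book_id = a.book_id
-- The IS NULL test must be in WHERE, not in ON, to act as an anti-join.
WHERE b.book_id IS NULL;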
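Where the IS NULL test sits changes the result of a LEFT JOIN. The sketch below runs the same query twice on a tiny hypothetical dataset in SQLite (attribute ids reused from the question; the book ids 1 to 6 are invented). With the test inside ON, the join condition can never be satisfied, so no book is filtered out. With the test in WHERE, only books without a hardcover row remain.

```python
import sqlite3

# Hypothetical toy data; attr 38571/38576 = comic/science fiction, 35003 = hardcover.
ROWS = [(1, 38571), (1, 35003), (2, 38571), (3, 38576), (4, 38571),
        (4, 38576), (5, 35003), (6, 38576), (6, 35003)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attribute_books (book_id INTEGER NOT NULL, "
             "attr_id INTEGER NOT NULL, PRIMARY KEY (attr_id, book_id))")
conn.executemany("INSERT INTO attribute_books (book_id, attr_id) VALUES (?, ?)", ROWS)

BASE = """
SELECT DISTINCT a.book_id
FROM (SELECT book_id FROM attribute_books WHERE attr_id IN (38571, 38576)) a
LEFT JOIN (SELECT book_id FROM attribute_books WHERE attr_id = 35003) b
  ON b.book_id = a.book_id
"""

# IS NULL inside ON: the join never matches, so every left row survives.
in_on = [r[0] for r in conn.execute(BASE + " AND b.book_id IS NULL ORDER BY a.book_id")]
# IS NULL in WHERE: only rows with no hardcover match survive (an anti-join).
in_where = [r[0] for r in conn.execute(BASE + " WHERE b.book_id IS NULL ORDER BY a.book_id")]
print(in_on)     # [1, 2, 3, 4, 6]
print(in_where)  # [2, 3, 4]
```

DISTINCT matters too: book 4 has both genres and would otherwise appear twice.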

These kinds of queries can self-join the table many times and currently perform very poorly. Instead of an inner join for IN conditions and a left join for NOT IN conditions, I could also use the INTERSECT (or EXCEPT) operator, which is available in some SQL dialects.
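For comparison, a minimal sketch of the set-operator formulation on the same hypothetical toy data, run in SQLite (which supports both operators; MySQL supports INTERSECT and EXCEPT from 8.0.31). EXCEPT expresses "in either genre but not hardcover", and INTERSECT expresses "must have all of these attributes".

```python
import sqlite3

# Hypothetical toy data; attr 38571/38576 = comic/science fiction, 35003 = hardcover.
ROWS = [(1, 38571), (1, 35003), (2, 38571), (3, 38576), (4, 38571),
        (4, 38576), (5, 35003), (6, 38576), (6, 35003)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attribute_books (book_id INTEGER NOT NULL, "
             "attr_id INTEGER NOT NULL, PRIMARY KEY (attr_id, book_id))")
conn.executemany("INSERT INTO attribute_books (book_id, attr_id) VALUES (?, ?)", ROWS)

# Books in either genre, minus books that have the hardcover attribute.
# EXCEPT returns distinct rows, so no DISTINCT is needed.
EXCEPT_QUERY = """
SELECT book_id FROM attribute_books WHERE attr_id IN (38571, 38576)
EXCEPT
SELECT book_id FROM attribute_books WHERE attr_id = 35003
ORDER BY book_id
"""
result = [r[0] for r in conn.execute(EXCEPT_QUERY)]
print(result)  # [2, 3, 4]

# "Has all of" conditions: books that are both comic and science fiction.
INTERSECT_QUERY = """
SELECT book_id FROM attribute_books WHERE attr_id = 38571
INTERSECT
SELECT book_id FROM attribute_books WHERE attr_id = 38576
ORDER BY book_id
"""
both = [r[0] for r in conn.execute(INTERSECT_QUERY)]
print(both)  # [4]
```

Each branch is a simple range scan on the (attr_id, book_id) key, so the shape of the query stays flat no matter how many conditions are chained.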

I currently have the following questions:

  1. Is this the most efficient way to write such queries? If not, are there any suggestions for speeding them up?
  2. Should I switch to an entirely different type of database or engine to get faster queries?
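One more formulation sometimes used for this kind of attribute filter, not taken from this thread, is conditional aggregation: read only the rows for the attributes involved, group by book, and decide inclusion in HAVING. The sketch below assumes the same hypothetical toy data and relies on comparisons evaluating to 0 or 1, which holds in both SQLite and MySQL.

```python
import sqlite3

# Hypothetical toy data; attr 38571/38576 = comic/science fiction, 35003 = hardcover.
ROWS = [(1, 38571), (1, 35003), (2, 38571), (3, 38576), (4, 38571),
        (4, 38576), (5, 35003), (6, 38576), (6, 35003)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attribute_books (book_id INTEGER NOT NULL, "
             "attr_id INTEGER NOT NULL, PRIMARY KEY (attr_id, book_id))")
conn.executemany("INSERT INTO attribute_books (book_id, attr_id) VALUES (?, ?)", ROWS)

# One pass over the relevant attribute rows, no self-join.
QUERY = """
SELECT book_id
FROM attribute_books
WHERE attr_id IN (38571, 38576, 35003)        -- only rows that matter
GROUP BY book_id
HAVING SUM(attr_id IN (38571, 38576)) > 0      -- at least one wanted genre
   AND SUM(attr_id = 35003) = 0                -- no hardcover row
ORDER BY book_id
"""
result = [r[0] for r in conn.execute(QUERY)]
print(result)  # [2, 3, 4]
```

Adding more conditions only adds HAVING terms rather than extra joins; whether it beats the join or EXISTS versions on 50 million rows would have to be measured.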
asked Apr 18 '26 07:04 by kiessan


1 Answer

Possibly the most efficient method is to use exists and not exists:

select b.*
from books b
-- keep books that have at least one of the wanted genres ...
where exists (select 1
              from attribute_books ab
              where ab.attr_id in (38571, 38576) and b.book_id = ab.book_id
             ) and
      -- ... and that do not have the hardcover attribute
      not exists (select 1
                  from attribute_books ab
                  where ab.attr_id = 35003 and b.book_id = ab.book_id
                 );

For this, you want an index on attribute_books(book_id, attr_id).
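A minimal sketch of the exists / not exists approach on hypothetical toy data, run in SQLite. The books table is assumed to exist, as the answer assumes; its title column and the index name idx_attribute_books_book_attr are invented for illustration. Because the query is driven from books, each book is returned at most once without DISTINCT.

```python
import sqlite3

# Hypothetical toy data; attr 38571/38576 = comic/science fiction, 35003 = hardcover.
ROWS = [(1, 38571), (1, 35003), (2, 38571), (3, 38576), (4, 38571),
        (4, 38576), (5, 35003), (6, 38576), (6, 35003)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE attribute_books (
    book_id INTEGER NOT NULL,
    attr_id INTEGER NOT NULL,
    PRIMARY KEY (attr_id, book_id)
);
-- Secondary index from the answer, led by book_id for the correlated lookups.
CREATE INDEX idx_attribute_books_book_attr ON attribute_books (book_id, attr_id);
""")
conn.executemany("INSERT INTO books (book_id, title) VALUES (?, ?)",
                 [(i, f"Book {i}") for i in range(1, 7)])
conn.executemany("INSERT INTO attribute_books (book_id, attr_id) VALUES (?, ?)", ROWS)

QUERY = """
SELECT b.book_id
FROM books b
WHERE EXISTS (SELECT 1 FROM attribute_books ab
              WHERE ab.attr_id IN (38571, 38576) AND ab.book_id = b.book_id)
  AND NOT EXISTS (SELECT 1 FROM attribute_books ab
                  WHERE ab.attr_id = 35003 AND ab.book_id = b.book_id)
ORDER BY b.book_id
"""
result = [r[0] for r in conn.execute(QUERY)]
print(result)  # [2, 3, 4]
```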

answered Apr 19 '26 19:04 by Gordon Linoff


