Given m2m relation: items-categories I have three tables:
I want to find an item belonging to all given category sets:
Find Item belonging to a category in [1,3,6] and belonging to a category in [7,8,4] and belonging to a category in [12,66,42] and ...
There are two ways I can think of to accomplish this in mySQL.
OPTION A: INNER JOIN:
SELECT id from items INNER JOIN category c1 ON (item.id = c1.item_id) INNER JOIN category c2 ON (item.id = c2.item_id) INNER JOIN category c3 ON (item.id = c3.item_id) ... WHERE c1.category_id IN [1,3,6] AND c2.category_id IN [7,8,4] AND c3.category_id IN [12,66,42] AND ...;
OPTION B: EXISTS:
SELECT id from items WHERE EXISTS(SELECT category_id FROM category WHERE category.item_id = id AND category_id in [1,3,6] AND EXISTS(SELECT category_id FROM category WHERE category.item_id = id AND category_id in [7,8,4] AND EXISTS(SELECT category_id FROM category WHERE category.item_id = id AND category_id in [12,66,42] AND ...;
Both options work. The question is: Which is the fastest / most optimal for large item table? Or is there an OPTION C I am missing?
In cases like above the Exists statement works faster than that of Joins. Exists will give you a single record and will save the time also. In case of joins the number of records will be more and all the records must be used.
If you do an inner join on a UNIQUE column, they exhibit same performance. If you do an inner join on a recordset with DISTINCT applied (to get rid of the duplicates), EXISTS is usually faster.
In case there are a large number of rows in the tables and there is an index to use, INNER JOIN is generally faster than OUTER JOIN. Generally, an OUTER JOIN is slower than an INNER JOIN as it needs to return more number of records when compared to INNER JOIN.
OPTION A
JOIN
has an advantage over EXIST
, because it will more efficiently use the indices, especially in case of large tables
A JOIN
is more efficient, generally speaking.
However, one thing to be aware of is that joins can produce duplicate rows in your output. For example, if item id was in category 1 and 3, the first JOIN
would result in two rows for id 123. If item id 999 was in categories 1,3,7,8,12, and 66, you would get eight rows for 999 in your results (2*2*2).
Duplicate rows are something you need to be aware of and handle. In this case, you could just use select distinct id...
. Eliminating duplicates can get more complicated with a complex query, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With