Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it better to use INNER JOIN or EXISTS to find belonging to several in m2m relation?

Tags:

Given m2m relation: items-categories I have three tables:

  • items,
  • categories and
  • items_categories that hold references to both

I want to find an item belonging to all given category sets:

Find Item  belonging to a category in [1,3,6]  and belonging to a category in [7,8,4]  and belonging to a category in [12,66,42] and ... 

There are two ways I can think of to accomplish this in mySQL.

OPTION A: INNER JOIN:

SELECT id from items  INNER JOIN category c1 ON (item.id = c1.item_id) INNER JOIN category c2 ON (item.id = c2.item_id) INNER JOIN category c3 ON (item.id = c3.item_id) ... WHERE c1.category_id IN [1,3,6] AND c2.category_id IN [7,8,4] AND c3.category_id IN [12,66,42] AND ...; 

OPTION B: EXISTS:

SELECT id from items WHERE EXISTS(SELECT category_id FROM category WHERE category.item_id = id AND category_id in [1,3,6] AND EXISTS(SELECT category_id FROM category WHERE category.item_id = id AND category_id in [7,8,4] AND EXISTS(SELECT category_id FROM category WHERE category.item_id = id AND category_id in [12,66,42] AND ...; 

Both options work. The question is: Which is the fastest / most optimal for large item table? Or is there an OPTION C I am missing?

like image 381
Roman Semko Avatar asked Oct 25 '12 07:10

Roman Semko


People also ask

Which is better exists or join?

In cases like above the Exists statement works faster than that of Joins. Exists will give you a single record and will save the time also. In case of joins the number of records will be more and all the records must be used.

Which is faster inner join or exists?

If you do an inner join on a UNIQUE column, they exhibit same performance. If you do an inner join on a recordset with DISTINCT applied (to get rid of the duplicates), EXISTS is usually faster.

Are inner joins more efficient?

In case there are a large number of rows in the tables and there is an index to use, INNER JOIN is generally faster than OUTER JOIN. Generally, an OUTER JOIN is slower than an INNER JOIN as it needs to return more number of records when compared to INNER JOIN.


2 Answers

OPTION A

JOIN has an advantage over EXIST , because it will more efficiently use the indices, especially in case of large tables

like image 109
Joe G Joseph Avatar answered Nov 02 '22 04:11

Joe G Joseph


A JOIN is more efficient, generally speaking.

However, one thing to be aware of is that joins can produce duplicate rows in your output. For example, if item id was in category 1 and 3, the first JOIN would result in two rows for id 123. If item id 999 was in categories 1,3,7,8,12, and 66, you would get eight rows for 999 in your results (2*2*2).

Duplicate rows are something you need to be aware of and handle. In this case, you could just use select distinct id.... Eliminating duplicates can get more complicated with a complex query, though.

like image 33
dan1111 Avatar answered Nov 02 '22 04:11

dan1111