
MySQL WHERE NOT IN extremely slow

Below is a SQL statement inside a stored procedure (truncated for brevity):

SELECT * 
FROM item a 
WHERE a.orderId NOT IN (SELECT orderId FROM table_excluded_item);

This statement takes about 30 seconds. If I remove the inner SELECT, it drops to about 1 second. table_excluded_item is not huge, but I suspect the inner query is being executed more often than it needs to be.

Is there a more efficient way of doing this?

pixelfreak asked Jan 05 '13 02:01



2 Answers

Use a LEFT JOIN and keep only the rows that have no match in the excluded table:

SELECT  a.* 
FROM    item a 
        LEFT JOIN table_excluded_item b
            ON a.orderId = b.orderId
WHERE   b.orderId IS NULL

Make sure that orderId is indexed in both tables.
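As a rough check that the LEFT JOIN form returns the same rows as the original NOT IN query, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used only because it runs in-process; its optimizer is not MySQL's, so this says nothing about timing. The table layout, sample rows, and index names are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database; a stand-in for MySQL for checking results only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, orderId INTEGER);
    CREATE TABLE table_excluded_item (orderId INTEGER);
    -- Hypothetical index names; the point is that orderId is indexed in both tables.
    CREATE INDEX idx_item_orderId ON item (orderId);
    CREATE INDEX idx_excluded_orderId ON table_excluded_item (orderId);
    INSERT INTO item (id, orderId) VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO table_excluded_item (orderId) VALUES (20), (40);
""")

# Original form from the question.
not_in_rows = conn.execute("""
    SELECT a.id, a.orderId
    FROM item a
    WHERE a.orderId NOT IN (SELECT orderId FROM table_excluded_item)
    ORDER BY a.id
""").fetchall()

# Anti-join form from this answer.
left_join_rows = conn.execute("""
    SELECT a.id, a.orderId
    FROM item a
    LEFT JOIN table_excluded_item b ON a.orderId = b.orderId
    WHERE b.orderId IS NULL
    ORDER BY a.id
""").fetchall()

print(not_in_rows)
print(left_join_rows)
```

On MySQL itself, running EXPLAIN on each form is the way to see whether the indexes are actually used.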

John Woo answered Sep 18 '22 12:09


The problem with the LEFT JOIN approach is that duplicate records might be processed while generating the output. Sometimes this is not the case: according to this article, MySQL does optimize the left outer join correctly when the columns are indexed, even in the presence of duplicates. I remain skeptical, though, that this optimization always happens.
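To make the duplicate concern concrete, the sketch below (Python's sqlite3, with made-up sample data) puts the same orderId into the exclusion table three times. The unfiltered join then carries extra rows, while the IS NULL filter still returns each unmatched item once. This only illustrates the result sets; whether MySQL avoids the extra work internally is exactly the open question above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, orderId INTEGER);
    CREATE TABLE table_excluded_item (orderId INTEGER);
    INSERT INTO item (id, orderId) VALUES (1, 10), (2, 20), (3, 30);
    -- orderId 20 appears three times in the exclusion table (illustrative data).
    INSERT INTO table_excluded_item (orderId) VALUES (20), (20), (20);
""")

# Row count of the join before the IS NULL filter: item 2 matches three times.
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM item a
    LEFT JOIN table_excluded_item b ON a.orderId = b.orderId
""").fetchone()[0]

# After filtering, only unmatched items remain, each exactly once.
anti_join_rows = conn.execute("""
    SELECT a.id, a.orderId
    FROM item a
    LEFT JOIN table_excluded_item b ON a.orderId = b.orderId
    WHERE b.orderId IS NULL
    ORDER BY a.id
""").fetchall()

print(joined_count)
print(anti_join_rows)
```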

MySQL sometimes has problems optimizing IN with a subquery. The best fix is a correlated subquery using NOT EXISTS:

SELECT *
FROM item a
WHERE NOT EXISTS (SELECT 1
                  FROM table_excluded_item tei
                  WHERE tei.orderId = a.orderId
                  LIMIT 1
                 );

If you have an index on table_excluded_item.orderId, then this will scan the index and stop at the first matching value (the LIMIT 1 may not be strictly necessary for this). This is the fastest and safest way to implement what you want in MySQL.
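One way NOT EXISTS can be safer, which the answer does not spell out, is NULL handling: under standard SQL semantics, NOT IN returns no rows at all when the subquery yields a NULL, whereas NOT EXISTS is unaffected. The sketch below demonstrates this with Python's sqlite3 and invented sample data; it assumes table_excluded_item.orderId is nullable, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, orderId INTEGER);
    CREATE TABLE table_excluded_item (orderId INTEGER);
    INSERT INTO item (id, orderId) VALUES (1, 10), (2, 20), (3, 30);
    -- Assumption: the exclusion table contains a NULL orderId.
    INSERT INTO table_excluded_item (orderId) VALUES (20), (NULL);
""")

# NOT IN: comparing against a NULL makes the predicate unknown for every
# non-matching row, so nothing is returned.
not_in_rows = conn.execute("""
    SELECT a.id, a.orderId
    FROM item a
    WHERE a.orderId NOT IN (SELECT orderId FROM table_excluded_item)
    ORDER BY a.id
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so unmatched items survive.
not_exists_rows = conn.execute("""
    SELECT a.id, a.orderId
    FROM item a
    WHERE NOT EXISTS (SELECT 1
                      FROM table_excluded_item tei
                      WHERE tei.orderId = a.orderId)
    ORDER BY a.id
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

If orderId is declared NOT NULL in the exclusion table, the two forms return the same rows and the choice comes down to how the optimizer handles each.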

Gordon Linoff answered Sep 19 '22 12:09