Below is a SQL statement inside a stored procedure (truncated for brevity):
SELECT *
FROM item a
WHERE a.orderId NOT IN (SELECT orderId FROM table_excluded_item);
This statement takes about 30 seconds! But if I remove the inner SELECT query, it drops to 1 second. table_excluded_item is not huge, but I suspect the inner query is being executed more often than it needs to be.
Is there a more efficient way of doing this?
Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.
Use a LEFT JOIN:
SELECT a.*
FROM item a
LEFT JOIN table_excluded_item b
ON a.orderId = b.orderId
WHERE b.orderId IS NULL
Make sure that orderId is indexed in both tables.
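As a rough sketch of that indexing step: the index names below are made up for illustration, and either statement is unnecessary if orderId is already covered by a primary key or an existing index on that table (SHOW INDEX will tell you).

```sql
-- Check which indexes already exist before adding new ones.
SHOW INDEX FROM item;
SHOW INDEX FROM table_excluded_item;

-- Hypothetical index names; adjust to your own naming convention.
CREATE INDEX idx_item_orderId ON item (orderId);
CREATE INDEX idx_excluded_item_orderId ON table_excluded_item (orderId);
```

The index on table_excluded_item.orderId is the one that matters most here, since that is the column being probed for every row of item.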
The problem with the left join approach is that duplicate records might be processed in generating the output. Sometimes this is not the case: according to this article, MySQL does optimize the left outer join correctly when the columns are indexed, even in the presence of duplicates. I admit to remaining skeptical, though, that this optimization always happens.
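One way to settle that doubt on your own server is to look at the execution plan rather than guess. The sketch below just prefixes the LEFT JOIN query with EXPLAIN; what the plan shows depends on your MySQL version and on how the columns are declared, so treat the interpretation as a hint, not a guarantee.

```sql
-- Ask MySQL how it intends to execute the anti-join.
EXPLAIN
SELECT a.*
FROM item a
LEFT JOIN table_excluded_item b
       ON a.orderId = b.orderId
WHERE b.orderId IS NULL;
```

If the Extra column for table_excluded_item mentions "Not exists", MySQL is stopping at the first matching row for each row of item instead of reading every duplicate. Newer versions may report an antijoin instead. If neither appears, the skepticism above is probably justified for your data.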
MySQL sometimes has problems optimizing IN statements with a subquery. The best fix is a correlated subquery:
SELECT *
FROM item a
WHERE NOT EXISTS (SELECT 1
                  FROM table_excluded_item tei
                  WHERE tei.orderId = a.orderId
                  LIMIT 1
                 )
If you have an index on table_excluded_item.orderId, then this will scan the index and stop at the first value (the LIMIT 1 may not be strictly necessary for this). This is the fastest and safest way to implement what you want in MySQL.
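One concrete reason NOT EXISTS can be called safer, beyond speed, is NULL handling: if the subquery of a NOT IN ever returns a NULL, the NOT IN condition can never be true and the query returns no rows at all. The sketch below uses two throwaway tables (demo_item and demo_excluded are hypothetical names, not part of the original schema) to show the difference.

```sql
-- Hypothetical scratch tables, only to illustrate NULL handling.
CREATE TEMPORARY TABLE demo_item (orderId INT);
CREATE TEMPORARY TABLE demo_excluded (orderId INT NULL);

INSERT INTO demo_item VALUES (1), (2), (3);
INSERT INTO demo_excluded VALUES (2), (NULL);

-- NOT IN: comparing against the NULL yields UNKNOWN for 1 and 3,
-- and 2 is matched, so this returns no rows.
SELECT orderId
FROM demo_item
WHERE orderId NOT IN (SELECT orderId FROM demo_excluded);

-- NOT EXISTS: the NULL row simply never matches, so this returns 1 and 3.
SELECT a.orderId
FROM demo_item a
WHERE NOT EXISTS (SELECT 1
                  FROM demo_excluded e
                  WHERE e.orderId = a.orderId);
```

If table_excluded_item.orderId is declared NOT NULL, the two forms return the same rows and the choice comes down to which plan MySQL picks.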