I wanted to run the following query:
-- Main Query
SELECT COUNT(*) FROM table_name WHERE device_id IN
(SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA')
This following query (sub query from Main Query):
SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA'
executes in 7 seconds, giving 2691 rows from a table of 2.1M rows.
I fired the Main Query above and it is still executing after 5 mins+ of waiting.
Finally, I executed the sub query separately, took the 2691 records from the result, executed the following query:
-- Main Query (improvised)
SELECT COUNT(*) FROM table_name WHERE device_id IN
("device_id_1", "device_id_2", ....., "device_id_2691")
Surprisingly, this gave me an answer within 40 seconds.
What gives? Why doesn't MySQL use the same technique that I used and give an answer quickly? Am I doing something wrong?
Unfortunately, MySQL is not very good at optimizing subqueries with IN. This is from MySQL documentation:
Subquery optimization for IN is not as effective as for the = operator or for the IN(value_list) operator.
A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.
An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.
Try using a JOIN instead.
Because MySQL works from the inside out, sometimes you can trick MySQL by wrapping the subquery inside yet another subquery like so:
SELECT COUNT(*) FROM table_name WHERE device_id IN
(SELECT * FROM (SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA') tmp)
Here's the JOIN solution:
SELECT COUNT(DISTINCT t2.id) FROM table_name t1
JOIN table_name t2
ON t2.device_id = t1.device_id
WHERE t1.NAME = 'SOME_PARA'
Notice that I start from the inside and go out also.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With