Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does MySQL optimize subqueries automatically?

Tags:

sql

mysql

I wanted to run the following query:

-- Main Query    
SELECT COUNT(*) FROM table_name WHERE device_id IN 
     (SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA')

This following query (sub query from Main Query):

SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA'

executes in 7 seconds, giving 2691 rows from a table of 2.1M rows.

I fired the Main Query above and it is still executing after 5 mins+ of waiting.

Finally, I executed the sub query separately, took the 2691 records from the result, executed the following query:

-- Main Query (improvised)    
SELECT COUNT(*) FROM table_name WHERE device_id IN 
     ("device_id_1", "device_id_2", ....., "device_id_2691")

Surprisingly, this gave me an answer within 40 seconds.

What gives? Why doesn't MySQL use the same technique that I used and give an answer quickly? Am I doing something wrong?

like image 882
saurabhj Avatar asked Aug 02 '12 17:08

saurabhj


1 Answers

Unfortunately, MySQL is not very good at optimizing subqueries with IN. This is from MySQL documentation:

Subquery optimization for IN is not as effective as for the = operator or for the IN(value_list) operator.

A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.

The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:

SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);

The optimizer rewrites the statement to a correlated subquery:

SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);

If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.

An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.

Try using a JOIN instead.

Because MySQL works from the inside out, sometimes you can trick MySQL by wrapping the subquery inside yet another subquery like so:

SELECT COUNT(*) FROM table_name WHERE device_id IN
     (SELECT * FROM (SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA') tmp)

Here's the JOIN solution:

SELECT COUNT(DISTINCT t2.id) FROM table_name t1
  JOIN table_name t2
    ON t2.device_id = t1.device_id
  WHERE t1.NAME = 'SOME_PARA'

Notice that I start from the inside and go out also.

like image 60
Marcus Adams Avatar answered Sep 28 '22 22:09

Marcus Adams