Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL - SELECT WHERE field IN (subquery) - Extremely slow why?

People also ask

Why is subquery slow?

For multiple-table subqueries, execution of NULL IN (SELECT ...) is particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL .

Which is faster subquery or function?

using function (included that subquery) has better performance, when you define a function, the function will not run while calling the function.

Are subqueries faster than separate queries?

For subqueries and joins, the data needs to be combined. Small amounts can easily be combined in memory, but if the data gets bigger, then it might not fit, causing the need to swap temporary data to disk, degrading performance. So, there is no general rule to say which one is faster.


The subquery is being run for each row because it is a correlated query. One can make a correlated query into a non-correlated query by selecting everything from the subquery, like so:

SELECT * FROM
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
) AS subquery

The final query would look like this:

SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT * FROM
    (
        SELECT relevant_field
        FROM some_table
        GROUP BY relevant_field
        HAVING COUNT(*) > 1
    ) AS subquery
)

Rewrite the query into this

SELECT st1.*, st2.relevant_field FROM sometable st1
INNER JOIN sometable st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id  /* list a unique sometable field here*/
HAVING COUNT(*) > 1

I think st2.relevant_field must be in the select, because otherwise the having clause will give an error, but I'm not 100% sure

Never use IN with a subquery; this is notoriously slow.
Only ever use IN with a fixed list of values.

More tips

  1. If you want to make queries faster, don't do a SELECT * only select the fields that you really need.
  2. Make sure you have an index on relevant_field to speed up the equi-join.
  3. Make sure to group by on the primary key.
  4. If you are on InnoDB and you only select indexed fields (and things are not too complex) than MySQL will resolve your query using only the indexes, speeding things way up.

General solution for 90% of your IN (select queries

Use this code

SELECT * FROM sometable a WHERE EXISTS (
  SELECT 1 FROM sometable b
  WHERE a.relevant_field = b.relevant_field
  GROUP BY b.relevant_field
  HAVING count(*) > 1) 

SELECT st1.*
FROM some_table st1
inner join 
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;

I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.

This worked a lot faster, try it!


I have reformatted your slow sql query with www.prettysql.net

SELECT *
FROM some_table
WHERE
 relevant_field in
 (
  SELECT relevant_field
  FROM some_table
  GROUP BY relevant_field
  HAVING COUNT ( * ) > 1
 );

When using a table in both the query and the subquery, you should always alias both, like this:

SELECT *
FROM some_table as t1
WHERE
 t1.relevant_field in
 (
  SELECT t2.relevant_field
  FROM some_table as t2
  GROUP BY t2.relevant_field
  HAVING COUNT ( t2.relevant_field ) > 1
 );

Does that help?


Subqueries vs joins

http://www.scribd.com/doc/2546837/New-Subquery-Optimizations-In-MySQL-6