Why would an IN condition be slower than "=" in sql?

See the question "This SELECT query takes 180 seconds to finish" (and check the comments on the question itself).
The IN is compared against only one value, yet the time difference is enormous.
Why is that?

843

asked Aug 05 '10 16:08

Itay Moav -Malimovka

1 Answer

Summary: This is a known problem in MySQL and was fixed in MySQL 5.6.x. The problem is due to a missing optimization: a subquery using IN is incorrectly identified as a dependent subquery instead of an independent subquery.

When you run EXPLAIN on the original query it returns this:

 1  'PRIMARY'             'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
 2  'DEPENDENT SUBQUERY'  'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
 3  'DEPENDENT SUBQUERY'  'question_law'          'ALL'  ''  ''  ''  ''  10040  'Using where'

When you change IN to = you get this:

 1  'PRIMARY'   'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
 2  'SUBQUERY'  'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
 3  'SUBQUERY'  'question_law'          'ALL'  ''  ''  ''  ''  10040  'Using where'

Each dependent subquery is run once per row of the query that contains it, whereas a non-dependent subquery is run only once. MySQL can sometimes optimize dependent subqueries when there is a condition that can be converted to a join, but that is not the case here.

Now this of course leaves the question of why MySQL believes that the IN version needs to be a dependent subquery. I have made a simplified version of the query to help investigate this. I created two tables 'foo' and 'bar' where the former contains only an id column, and the latter contains both an id and a foo id (though I didn't create a foreign key constraint). Then I populated both tables with 1000 rows:

    CREATE TABLE foo (id INT PRIMARY KEY NOT NULL);
    CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);

    -- populate tables with 1000 rows in each

    SELECT id
    FROM foo
    WHERE id IN (
        SELECT MAX(foo_id)
        FROM bar
    );

This simplified query has the same problem as before - the inner select is treated as a dependent subquery and no optimization is performed, causing the inner query to be run once per row. The query takes almost one second to run. Changing the IN to = again allows the query to run almost instantly.
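To check how your own server classifies the subquery, you can run EXPLAIN on both forms of the simplified query and compare the select_type column of the second row. This is only a sketch against the foo and bar tables defined above; on the affected MySQL versions the first statement is expected to report DEPENDENT SUBQUERY and the second SUBQUERY, but the exact plan depends on your version and optimizer settings.

```sql
-- IN form: on affected versions the inner SELECT is reported
-- as a DEPENDENT SUBQUERY and is re-run for every row of foo.
EXPLAIN
SELECT id
FROM foo
WHERE id IN (
    SELECT MAX(foo_id)
    FROM bar
);

-- "=" form: the inner SELECT is reported as a plain SUBQUERY
-- and is evaluated only once.
EXPLAIN
SELECT id
FROM foo
WHERE id = (
    SELECT MAX(foo_id)
    FROM bar
);
```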

The code I used to populate the tables is below, in case anyone wishes to reproduce the results.

    CREATE TABLE filler (
            id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
    ) ENGINE=Memory;

    DELIMITER $$

    CREATE PROCEDURE prc_filler(cnt INT)
    BEGIN
            DECLARE _cnt INT;
            SET _cnt = 1;
            WHILE _cnt <= cnt DO
                    INSERT
                    INTO    filler
                    SELECT  _cnt;
                    SET _cnt = _cnt + 1;
            END WHILE;
    END
    $$

    DELIMITER ;

    CALL prc_filler(1000);

    INSERT foo SELECT id FROM filler;
    INSERT bar SELECT id, id FROM filler;
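If you are stuck on a MySQL version older than 5.6 and cannot simply replace IN with = (for example because the subquery may return several rows), one commonly used workaround is to move the subquery into a derived table and join against it. The sketch below is my own rewrite of the simplified query, not something taken from the original question; the alias names m and max_foo_id are made up for illustration. It relies on the assumption that the server materializes the derived table once rather than re-evaluating it per row, so check the result with EXPLAIN on your version.

```sql
-- Hypothetical rewrite: the aggregate is computed in a derived table
-- (aliased m here) and foo is joined against that result.
SELECT foo.id
FROM foo
JOIN (
    SELECT MAX(foo_id) AS max_foo_id
    FROM bar
) AS m
    ON foo.id = m.max_foo_id;
```

For this particular query the result is the same as the IN version, because the derived table contains exactly one row. If the subquery could return duplicate values, the join form might need a DISTINCT to preserve the IN semantics.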

148

answered Oct 05 '22 23:10

Mark Byers


Tags:

performance

sql

comparison

mysql
