Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why would an IN condition be slower than "=" in sql?

Check the question This SELECT query takes 180 seconds to finish (check the comments on the question itself).
The IN get to be compared against only one value, but still the time difference is enormous.
Why is it like that?

like image 843
Itay Moav -Malimovka Avatar asked Aug 05 '10 16:08

Itay Moav -Malimovka


People also ask

Is in faster than in SQL?

The EXISTS clause is much faster than IN when the subquery results is very large. Conversely, the IN clause is faster than EXISTS when the subquery results is very small. Also, the IN clause can't compare anything with NULL values, but the EXISTS clause can compare everything with NULLs.

What makes a SQL query slow?

Slow queries can mean your database does more work than it needs to, which means it's using more resources than it needs to. When limited resources like CPU or I/O run out, everything can start to slow down. Inefficient use of resources is also a problem when you're not using the resources you have.

Why do joins slow down queries?

Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow. There's an example of this in the subqueries lesson. Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows.

Does order of condition matter in SQL?

No, the order of the WHERE clauses does not matter. The optimizer reviews the query & determines the best means of getting the data based on indexes and such.


1 Answers

Summary: This is a known problem in MySQL and was fixed in MySQL 5.6.x. The problem is due to a missing optimization when a subquery using IN is incorrectly indentified as dependent subquery instead of an independent subquery.


When you run EXPLAIN on the original query it returns this:

 1  'PRIMARY'             'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where' 2  'DEPENDENT SUBQUERY'  'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where' 3  'DEPENDENT SUBQUERY'  'question_law'          'ALL'  ''  ''  ''  ''  10040  'Using where' 

When you change IN to = you get this:

 1  'PRIMARY'   'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where' 2  'SUBQUERY'  'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where' 3  'SUBQUERY'  'question_law'          'ALL'  ''  ''  ''  ''  10040  'Using where' 

Each dependent subquery is run once per row in the query it is contained in, whereas the subquery is run only once. MySQL can sometimes optimize dependent subqueries when there is a condition that can be converted to a join but here that is not the case.

Now this of course leaves the question of why MySQL believes that the IN version needs to be a dependent subquery. I have made a simplified version of the query to help investigate this. I created two tables 'foo' and 'bar' where the former contains only an id column, and the latter contains both an id and a foo id (though I didn't create a foreign key constraint). Then I populated both tables with 1000 rows:

CREATE TABLE foo (id INT PRIMARY KEY NOT NULL); CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);  -- populate tables with 1000 rows in each  SELECT id FROM foo WHERE id IN (     SELECT MAX(foo_id)     FROM bar ); 

This simplified query has the same problem as before - the inner select is treated as a dependent subquery and no optimization is performed, causing the inner query to be run once per row. The query takes almost one second to run. Changing the IN to = again allows the query to run almost instantly.

The code I used to populate the tables is below, in case anyone wishes to reproduce the results.

CREATE TABLE filler (         id INT NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=Memory;  DELIMITER $$  CREATE PROCEDURE prc_filler(cnt INT) BEGIN         DECLARE _cnt INT;         SET _cnt = 1;         WHILE _cnt <= cnt DO                 INSERT                 INTO    filler                 SELECT  _cnt;                 SET _cnt = _cnt + 1;         END WHILE; END $$  DELIMITER ;  CALL prc_filler(1000);  INSERT foo SELECT id FROM filler; INSERT bar SELECT id, id FROM filler; 
like image 148
Mark Byers Avatar answered Oct 05 '22 23:10

Mark Byers