
MySQL "IN" operator performance on (large?) number of values

I have been experimenting with Redis and MongoDB lately, and it seems there are often cases where you would store an array of IDs in either one. I'll stick with Redis for this question, since what I am really asking about is the MySQL IN operator.

I was wondering how performant it is to list a large number (300-3000) of IDs inside the IN operator, which would look something like this:

SELECT id, name, price FROM products WHERE id IN (1, 2, 3, 4, ...... 3000) 

Imagine something as simple as a products and categories table which you might normally JOIN together to get the products from a certain category. In the example above you can see that under a given category in Redis ( category:4:product_ids ) I return all the product ids from the category with id 4, and place them in the above SELECT query inside the IN operator.

How performant is this?

Is this an "it depends" situation, or is there a concrete "this is (un)acceptable", "fast", or "slow"? Should I add a LIMIT 25, or doesn't that help?

SELECT id, name, price FROM products WHERE id IN (1, 2, 3, 4, ...... 3000) LIMIT 25 

Or should I trim the array of product IDs returned by Redis down to 25 and put only those 25 IDs in the query, rather than passing all 3000 and LIMIT-ing to 25 inside the query?

SELECT id, name, price FROM products WHERE id IN (1, 2, 3, 4, ...... 25) 
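
For illustration, trimming the list before building the query might look like the following PHP sketch. The helper name buildProductQuery and the $pdo connection mentioned in the comment are assumptions made for the example, not part of my actual setup.

```php
<?php
// Hypothetical helper: build a parameterized IN query for one page of ids.
function buildProductQuery(array $ids, int $pageSize = 25, int $offset = 0): array
{
    // Trim the id list in PHP so MySQL only sees the ids for this page.
    $page = array_slice(array_values($ids), $offset, $pageSize);
    if (count($page) === 0) {
        // Nothing to fetch; an empty IN () list is a syntax error in MySQL.
        return [null, []];
    }
    // One "?" placeholder per id, e.g. "?,?,?".
    $placeholders = implode(',', array_fill(0, count($page), '?'));
    $sql = "SELECT id, name, price FROM products WHERE id IN ($placeholders)";
    return [$sql, $page];
}

// Example: 3000 ids from Redis, only the first 25 reach the query.
[$sql, $params] = buildProductQuery(range(1, 3000), 25);
// With an assumed PDO connection in $pdo:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```

Using placeholders rather than concatenating the ids keeps the query safe even if the list does not come from a trusted source.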

Any suggestions/feedback is much appreciated!

Michael van Rooijen asked Dec 22 '10 23:12



2 Answers

Generally speaking, if the IN list gets too large (for some ill-defined value of 'too large' that is usually in the region of 100 or smaller), it becomes more efficient to use a join, creating a temporary table if need be to hold the numbers.
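
As a rough sketch of the temporary-table approach, the PHP below only generates the statements you would run. The table name wanted_ids, the helper name, and the batch size are assumptions for illustration; the products columns are taken from the question.

```php
<?php
// Hypothetical helper: produce the statements for the temporary-table approach.
function tempTableJoinStatements(array $ids, int $batchSize = 1000): array
{
    $statements = ["CREATE TEMPORARY TABLE wanted_ids (id INT NOT NULL PRIMARY KEY)"];

    // Cast to int so the values are safe to inline, and drop duplicates
    // so the primary key on wanted_ids is not violated.
    $ids = array_values(array_unique(array_map('intval', $ids)));

    // Insert the ids in batches using multi-row INSERT statements.
    foreach (array_chunk($ids, $batchSize) as $chunk) {
        $statements[] = "INSERT INTO wanted_ids (id) VALUES (" . implode("),(", $chunk) . ")";
    }

    // Join against the temporary table instead of using a long IN list.
    $statements[] = "SELECT p.id, p.name, p.price FROM products p JOIN wanted_ids w ON w.id = p.id";
    return $statements;
}
```

Each returned string would then be executed in order on the same connection, since a temporary table is only visible to the session that created it.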

If the numbers are a dense set (no gaps - which the sample data suggests), then you can do even better with WHERE id BETWEEN 300 AND 3000.

However, presumably there are gaps in the set, at which point it may be better to go with the list of valid values after all, unless the gaps are relatively few in number, in which case you could use:

WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836 

Or whatever the gaps are.
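
If the ids arrive as a plain list, one way to find out whether they are dense enough for this is to collapse them into runs first. The sketch below does that in PHP and emits an equivalent OR-of-ranges condition rather than the BETWEEN ... AND NOT BETWEEN form shown above; both helper names are hypothetical.

```php
<?php
// Hypothetical helper: collapse ids into inclusive [start, end] runs.
function idsToRanges(array $ids): array
{
    $ids = array_values(array_unique(array_map('intval', $ids)));
    sort($ids);
    $ranges = [];
    foreach ($ids as $id) {
        $last = count($ranges) - 1;
        if ($last >= 0 && $ranges[$last][1] === $id - 1) {
            $ranges[$last][1] = $id;   // extend the current run
        } else {
            $ranges[] = [$id, $id];    // start a new run
        }
    }
    return $ranges;
}

// Hypothetical helper: turn the runs into a WHERE condition on id.
// Returns an empty string for an empty input, which the caller must handle.
function rangesToWhere(array $ranges): string
{
    $parts = [];
    foreach ($ranges as [$from, $to]) {
        $parts[] = $from === $to ? "id = $from" : "id BETWEEN $from AND $to";
    }
    return implode(' OR ', $parts);
}
```

If the number of runs comes out close to the number of ids, the set is sparse and the plain list (or a join) is probably the better choice.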

Jonathan Leffler answered Oct 15 '22 13:10


I have been doing some tests and, as David Fells says in his answer, IN is quite well optimized. As a reference, I created an InnoDB table with 1,000,000 rows; a SELECT using the IN operator with roughly 500,000 random ids takes only about 2.5 seconds on my Mac, while selecting every other row with a modulo test takes about 0.5 seconds.

The only problem I had is that I had to increase the max_allowed_packet parameter in the my.cnf file. If not, a mysterious "MySQL server has gone away" error is generated.

Here is the PHP code I used for the test:
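
If raising max_allowed_packet is not an option, an alternative is to split the id list so that each statement stays under a byte budget. This is only a sketch: the helper name and the budget are assumptions, and the budget should leave headroom for the rest of the statement. The current server limit can be read with SHOW VARIABLES LIKE 'max_allowed_packet'.

```php
<?php
// Hypothetical helper: split ids into chunks whose comma-separated text
// stays within $maxBytes, so each IN list fits in one packet.
function chunkIdsByBytes(array $ids, int $maxBytes): array
{
    $chunks = [];
    $current = [];
    $size = 0;
    foreach ($ids as $id) {
        $id = (int) $id;
        $len = strlen((string) $id) + 1;   // digits plus one separator
        // Start a new chunk when adding this id would exceed the budget.
        if ($current !== [] && $size + $len > $maxBytes) {
            $chunks[] = $current;
            $current = [];
            $size = 0;
        }
        $current[] = $id;
        $size += $len;
    }
    if ($current !== []) {
        $chunks[] = $current;
    }
    return $chunks;
}
```

Each chunk then gets its own SELECT ... WHERE id IN (...) and the results are combined in PHP, at the cost of several round trips.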

$NROWS = 1000000;      // rows to insert
$SELECTED = 50;        // percentage of rows whose id goes into the IN list
$NROWSINSERT = 15000;  // not used below; inserts are batched 100 rows at a time

$dsn = "mysql:host=localhost;port=8889;dbname=testschema";
$pdo = new PDO($dsn, "root", "root");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Recreate the test table in the schema selected by the DSN
$pdo->exec("drop table if exists `testtable`");
$pdo->exec("CREATE  TABLE `testtable` (
        `id` INT NOT NULL ,
        `text` VARCHAR(45) NULL ,
        PRIMARY KEY (`id`) )");

$before = microtime(true);

// Insert the rows and, in the same pass, build the IN list
$Values = '';
$SelValues = '(';
$c = 0;
for ($i = 0; $i < $NROWS; $i++) {
    $r = rand(0, 99);
    if ($c > 0) $Values .= ",";
    $Values .= "( $i , 'This is value $i and r= $r')";
    if ($r < $SELECTED) {
        if ($SelValues != "(") $SelValues .= ",";
        $SelValues .= $i;
    }
    $c++;

    // Flush every 100 rows, and once more for the final partial batch
    if (($c == 100) || (($i == $NROWS - 1) && ($c > 0))) {
        $pdo->exec("INSERT INTO `testtable` VALUES $Values");
        $Values = "";
        $c = 0;
    }
}
$SelValues .= ')';
echo "<br>";

$after = microtime(true);
echo "Insert execution time =" . ($after - $before) . "s<br>";

// Time the prepare step for the large IN query
$before = microtime(true);
$sql = "SELECT count(*) FROM `testtable` WHERE id IN $SelValues";
$result = $pdo->prepare($sql);
$after = microtime(true);
echo "Prepare execution time =" . ($after - $before) . "s<br>";

// Time the execution of the large IN query
$before = microtime(true);
$result->execute();
$c = $result->fetchColumn();
$after = microtime(true);
echo "Random selection = $c Time execution time =" . ($after - $before) . "s<br>";

// Baseline: a modulo filter that matches every other row
$before = microtime(true);
$sql = "SELECT count(*) FROM `testtable` WHERE id %2 = 1";
$result = $pdo->prepare($sql);
$result->execute();
$c = $result->fetchColumn();
$after = microtime(true);
echo "Pairs = $c Exdcution time=" . ($after - $before) . "s<br>";

And the results:

Insert execution time =35.2927210331s
Prepare execution time =0.0161771774292s
Random selection = 499102 Time execution time =2.40285992622s
Pairs = 500000 Exdcution time=0.465420007706s

jbaylina answered Oct 15 '22 13:10