Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

IN vs OR in the SQL WHERE Clause

Tags:

sql

database

People also ask

Can in operator be used with WHERE clause?

The SQL IN OperatorThe IN operator allows you to specify multiple values in a WHERE clause. The IN operator is a shorthand for multiple OR conditions.

What is the difference between using the WHERE clause and the HAVING clause?

What is the Difference between Where and Having Clause in SQL? If “Where” clause is used to filter the records from a table that is based on a specified condition, then the “Having” clause is used to filter the record from the groups based on the specified condition.

Can I use two and in WHERE clause in SQL?

You can specify multiple conditions in a single WHERE clause to, say, retrieve rows based on the values in multiple columns. You can use the AND and OR operators to combine two or more conditions into a compound condition. AND, OR, and a third operator, NOT, are logical operators.

Is in the same as or SQL?

Yes, that is correct, SQL Server optimizer engine internally automatically maps all the values specified in the IN operator to OR operator. As SQL Server converts IN to OR automatically that is the primary reasons for both them having identical performance.


I assume you want to know the performance difference between the following:

WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'

According to the manual for MySQL if the values are constant IN sorts the list and then uses a binary search. I would imagine that OR evaluates them one by one in no particular order. So IN is faster in some circumstances.

The best way to know is to profile both on your database with your specific data to see which is faster.

I tried both on a MySQL with 1000000 rows. When the column is indexed there is no discernable difference in performance - both are nearly instant. When the column is not indexed I got these results:

SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)

SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)

So in this case the method using OR is about 30% slower. Adding more terms makes the difference larger. Results may vary on other databases and on other data.


The best way to find out is looking at the Execution Plan.


I tried it with Oracle, and it was exactly the same.

CREATE TABLE performance_test AS ( SELECT * FROM dba_objects );

SELECT * FROM performance_test
WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB' );

Even though the query uses IN, the Execution Plan says that it uses OR:

--------------------------------------------------------------------------------------    
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |    
--------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT  |                  |     8 |  1416 |   163   (2)| 00:00:02 |    
|*  1 |  TABLE ACCESS FULL| PERFORMANCE_TEST |     8 |  1416 |   163   (2)| 00:00:02 |    
--------------------------------------------------------------------------------------    

Predicate Information (identified by operation id):                                       
---------------------------------------------------                                       

   1 - filter("OBJECT_NAME"='DBMS_LOB' OR "OBJECT_NAME"='DBMS_REGISTRY' OR                
              "OBJECT_NAME"='DBMS_STANDARD')                                              

The OR operator needs a much more complex evaluation process than the IN construct because it allows many conditions, not only equals like IN.

Here is a like of what you can use with OR but that are not compatible with IN: greater. greater or equal, less, less or equal, LIKE and some more like the oracle REGEXP_LIKE. In addition consider that the conditions may not always compare the same value.

For the query optimizer it's easier to to manage the IN operator because is only a construct that defines the OR operator on multiple conditions with = operator on the same value. If you use the OR operator the optimizer may not consider that you're always using the = operator on the same value and, if it doesn't perform a deeper and very much more complex elaboration, it could probably exclude that there may be only = operators for the same values on all the involved conditions, with a consequent preclusion of optimized search methods like the already mentioned binary search.

[EDIT] Probably an optimizer may not implement optimized IN evaluation process, but this doesn't exclude that one time it could happen(with a database version upgrade). So if you use the OR operator that optimized elaboration will not be used in your case.


I think oracle is smart enough to convert the less efficient one (whichever that is) into the other. So I think the answer should rather depend on the readability of each (where I think that IN clearly wins)


OR makes sense (from readability point of view), when there are less values to be compared. IN is useful esp. when you have a dynamic source, with which you want values to be compared.

Another alternative is to use a JOIN with a temporary table.
I don't think performance should be a problem, provided you have necessary indexes.