Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL IN clause slower than individual queries

I'm using Hibernate's JPA implementation with MySQL 5.0.67. MySQL is configured to use InnoDB.

In performing a JPA query (which is translated to SQL), I've discovered that using the IN clause is slower than performing individual queries. Example:

SELECT p FROM Person p WHERE p.name IN ('Joe', 'Jane', 'Bob', 'Alice')

is slower than four separate queries:

SELECT p FROM Person p WHERE p.name = 'Joe'
SELECT p FROM Person p WHERE p.name = 'Jane'
SELECT p FROM Person p WHERE p.name = 'Bob'
SELECT p FROM Person p WHERE p.name = 'Alice'

Why is this? Is this a MySQL performance limitation?

like image 334
Steve Kuo Avatar asked Dec 14 '22 05:12

Steve Kuo


2 Answers

This is a known deficiency in MySQL.

It is often true that using UNION performs better than a range query like the one you show. MySQL doesn't employ indexes very intelligently for expressions using IN (...). A similar hole exists in the optimizer for boolean expressions with OR.

See http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/ for some explanation and detailed benchmarks.

The optimizer is being improved all the time. A deficiency in one version of MySQL may be improved in a subsequent version. So it's worth testing your queries on different versions.

It is also advantageous to use UNION ALL instead of simply UNION. Both queries use a temporary table to store results, but the difference is that UNION applies DISTINCT to the result set, which incurs an additional un-indexed sort.

like image 85
Bill Karwin Avatar answered Dec 15 '22 19:12

Bill Karwin


If you're using the IN operator, it's not much different than saying:

(p.name = 'Joe' OR p.name = 'Jane' OR p.name = 'Bob' OR p.name = 'Alice')

Those are four conditions which must be checked for every row that the query must consider. Of course, each other query you cite has only one condition. I don't believe in most real-world scenarios doing four such queries would be faster, since you have to consider the time it takes for your client to read the result sets and do something with them. In that case, IN looks pretty nice; even better if it can use an index.

like image 33
Peter Avatar answered Dec 15 '22 17:12

Peter