Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

(Why) Can't MySQL use index in such cases?

1 - PRIMARY used in a secondary index, e.g. secondary index on (PRIMARY,column1)

2 - I'm aware mysql cannot continue using the rest of an index as soon as one part was used for a range scan, however: IN (...,...,...) is not considered a range, is it? Yes, it is a range, but I've read on mysqlperformanceblog.com that IN behaves differently than BETWEEN according to the use of index.

Could anyone confirm those two points? Or tell me why this is not possible? Or how it could be possible?

UPDATE:

Links:
http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/
http://www.mysqlperformanceblog.com/2006/08/14/mysql-followup-on-union-for-query-optimization-query-profiling/comment-page-1/#comment-952521

UPDATE 2: example of nested SELECT:

SELECT * FROM user_d1 uo 
WHERE EXISTS (
    SELECT 1 FROM `user_d1` ui
    WHERE ui.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
    AND ui.id=uo.id
)    
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20

So, the outer SELECT uses timestamp_lastonline for sorting, the inner either PK to connect with the outer or birthdate for filtering.

What other options rather than this query are there if MySQL cannot use index on a range scan and for sorting?

like image 894
Chris Avatar asked Jun 18 '12 09:06

Chris


1 Answers

The column(s) of the primary key can certainly be used in a secondary index, but it's not often worthwhile. The primary key guarantees uniqueness, so any columns listed after it cannot be used for range lookups. The only time it will help is when a query can use the index alone

As for your nested select, the extra complication should not beat the simplest query:

SELECT * FROM user_d1 uo 
WHERE uo.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20

MySQL will choose between a birthdate index or a timestamp_lastonline index based on which it feels will have the best chance of scanning fewer rows. In either case, the column should be the first one in the index. The birthdate index will also carry a sorting penalty, but might be worthwhile if a large number of recent users will have birth dates outside of that range.

If you wish to control the order, or potentially improve performance, a (timestamp_lastonline, birthdate) or (birthdate, timestamp_lastonline) index might help. If it doesn't, and you really need to select based on the birthdate first, then you should select from the inner query instead of filtering on it:

SELECT * FROM (
    SELECT * FROM user_d1 ui
    WHERE ui.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
) as uo
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20

Even then, MySQL's optimizer might choose to rewrite your query if it finds a timestamp_lastonline index but no birthdate index.

And yes, IN (..., ..., ...) behaves differently than BETWEEN. Only the latter can effectively use a range scan over an index; the former would look up each item individually.

like image 181
eswald Avatar answered Oct 12 '22 20:10

eswald