Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ORDER BY id DESC in combination with WHERE id < ... won't use index

Tags:

sql

mysql

I have a very large MySQL table (half a billion rows), something like a "log file", in which I read data as a stream of chunks from "current" into "past". The mechanics is like the following: The first query fetches e.g. the most recent 10 entries, and the user can choose to load "more from past", which means loading the 10 most recent entries which are older than the oldest one already loaded. Instead of time, rows are identified by ids. So the query has a WHERE id < [the id of the oldest entry already loaded] as well as ORDER BY id DESC LIMIT 10 in order to load the "next 10" rows.

The WHERE together with the ORDER BY id DESC LIMIT 10 work fine if I use an index over (id) column only.

However, my scenario is a bit more complicated than that. The huge table is actually a combination of several log files. So the WHERE clause also filters out which kind of log file to display. The log files are organized in a two level hierarchy. Let's call the columns which specify the log file x and y. Both columns are of type varchar(32) collate ascii_general_ci, and I have an index (x, y, id) which uses them both in full text. (I'm thinking about simplifying this by using numerical identificators for the log file (and not using two levels of hierarchy for them), however, the following is very strange.)

So my query contains the following clause:

WHERE x = ... AND y = ... AND id < [the id of the oldest entry already loaded]

together with the beforementioned

ORDER BY id DESC LIMIT 10

The index (x, y, id) is very nicely being used by MySQL if I do NOT order descendingly but ascendingly instead. However, when sorting descendingly, it will not be used.

  1. Original query (slow)

    SELECT * FROM syncLog WHERE x = '' AND y = '' AND id < 438353696 ORDER BY id DESC LIMIT 10

    select_type?  table?   partitions?  type?  possible_keys?  key?    key_len?  ref?         rows?     Extra?
    SIMPLE        syncLog  NULL         ref    PRIMARY,x_y_id  x_y_id  68        const,const  37040991  Using where
    
  2. Sorting ascendingly (fast)

    SELECT * FROM syncLog WHERE x = '' AND y = '' AND id < 438353696 ORDER BY id LIMIT 10

    select_type?  table?   partitions?  type?  possible_keys?  key?    key_len?  ref?         rows?     Extra?
    SIMPLE        syncLog  NULL         ref    PRIMARY,x_y_id  x_y_id  68        const,const  37041163  Using index condition; Using where
    
  3. Without x,y (fast)

    SELECT * FROM syncLog WHERE id < 438353696 ORDER BY id DESC LIMIT 10

    select_type?  table?   partitions?  type?  possible_keys?  key?    key_len?  ref?         rows?     Extra?
    SIMPLE        syncLog  NULL         range  PRIMARY         PRIMARY 8         NULL         37041281  Using where
    

I also tried FORCE INDEX FOR ORDER BY (x_y_id), but it doesn't change anything (compared to 1.).

I also tried to include x and y in ORDER BY. I guessed, for MySQL to make use of the index for that, both x and y need also being included descendingly, although it doesn't make much sense here (since the result only contains same values for x and y anyways). ORDER BY x DESC, y DESC, id DESC But it didn't change anything either.

Why is the index not being used when searching for the rows with the 10 largest ids? How can this be fixed?

Interestingly, when forcing MySQL to use the primary index for this query (as in 1. but with FORCE INDEX (PRIMARY)), the query is fast. However, this can be a problem as soon as the individual log files become very sparse, since they share one id space, resulting in large portions of the table to be scanned for finding 10 rows which have the correct x and y.

This setup uses a pretty outdated MySQL version: 5.6.25

like image 381
leemes Avatar asked Nov 21 '25 04:11

leemes


2 Answers

Here is my best guess as to what is happening. Here is your query again, just for easy reference:

SELECT *
FROM syncLog
WHERE x = '' AND y = '' AND id < 438353696
ORDER BY id
LIMIT 10

In the case of an ascending sort, MySQL would only have to do a single index scan on id, starting from the left (lowest) side. Note that this kills two birds with one stone, because the scan satisfies both the WHERE restriction id < 438353696 and the sorting requirement in the ORDER BY clause.

However, in the descending sort version, it is not the same thing:

SELECT *
FROM syncLog
WHERE x = '' AND y = '' AND id < 438353696
ORDER BY id DESC
LIMIT 10

In this case, MySQL might choose to not scan the index to satisfy the WHERE criterion on id. The reason is that it would not know (or even be able to find) the position in the index to the left of which id < 438353696 would be true. So, it chooses to just scan the table. That being done, it then proceeds to just manually sort the result set, leading to the degrading performance you are seeing.

like image 182
Tim Biegeleisen Avatar answered Nov 23 '25 22:11

Tim Biegeleisen


Thank you all for your help. Your hints made me understand a lot better what is happening behind the scenes.

However, the solution was too simple: I should not have said FORCE INDEX FOR OREDER BY and neither just USE INDEX. The solution is to FORCE INDEX.

In detail:

Before, I tried:

  • Simply using an index hint without forcing it (slow):

    SELECT * FROM syncLog USE INDEX (x_y_id) WHERE x = '' AND y = '' AND id < 438459282 ORDER BY id DESC LIMIT 10

    select_type?  table?   partitions?  type?  possible_keys?  key?    key_len?  ref?         rows?     Extra?
    SIMPLE        syncLog  NULL         ref    x_y_id          x_y_id  68        const,const  37042925  Using where
    
  • Forcing the index, however in a silly rush I told MySQL to force it for ordering! (slow)

    SELECT * FROM syncLog FORCE INDEX FOR ORDER BY (x_y_id) WHERE x = '' AND y = '' AND id < 438459282 ORDER BY id DESC LIMIT 10

    select_type?  table?   partitions?  type?  possible_keys?  key?    key_len?  ref?         rows?     Extra?
    SIMPLE        syncLog  NULL         ref    PRIMARY,x_y_id  x_y_id  68        const,const  37042973  Using where
    

The solution is:

  • Forcing the index, but not telling MySQL for what. (fast)

    SELECT * FROM syncLog FORCE INDEX (x_y_id) WHERE x = '' AND y = '' AND id < 438459282 ORDER BY id DESC LIMIT 10

    select_type?  table?   partitions?  type?  possible_keys?  key?    key_len?  ref?  rows?     Extra?
    SIMPLE        syncLog  NULL         range  x_y_id          x_y_id  76        NULL  37042943  Using index condition
    

Now have a look at key_len. MySQL didn't use the id part of the index before. Now it is using it.

like image 38
leemes Avatar answered Nov 23 '25 21:11

leemes



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!