I have a very large MySQL table (half a billion rows), something like a "log file", in which I read data as a stream of chunks from "current" into "past". The mechanics is like the following: The first query fetches e.g. the most recent 10 entries, and the user can choose to load "more from past", which means loading the 10 most recent entries which are older than the oldest one already loaded. Instead of time, rows are identified by ids. So the query has a WHERE id < [the id of the oldest entry already loaded] as well as ORDER BY id DESC LIMIT 10 in order to load the "next 10" rows.
The WHERE together with the ORDER BY id DESC LIMIT 10 work fine if I use an index over (id) column only.
However, my scenario is a bit more complicated than that. The huge table is actually a combination of several log files. So the WHERE clause also filters out which kind of log file to display. The log files are organized in a two level hierarchy. Let's call the columns which specify the log file x and y. Both columns are of type varchar(32) collate ascii_general_ci, and I have an index (x, y, id) which uses them both in full text. (I'm thinking about simplifying this by using numerical identificators for the log file (and not using two levels of hierarchy for them), however, the following is very strange.)
So my query contains the following clause:
WHERE x = ... AND y = ... AND id < [the id of the oldest entry already loaded]
together with the beforementioned
ORDER BY id DESC LIMIT 10
The index (x, y, id) is very nicely being used by MySQL if I do NOT order descendingly but ascendingly instead. However, when sorting descendingly, it will not be used.
Original query (slow)
SELECT * FROM syncLog WHERE x = '' AND y = '' AND id < 438353696 ORDER BY id DESC LIMIT 10
select_type? table? partitions? type? possible_keys? key? key_len? ref? rows? Extra?
SIMPLE syncLog NULL ref PRIMARY,x_y_id x_y_id 68 const,const 37040991 Using where
Sorting ascendingly (fast)
SELECT * FROM syncLog WHERE x = '' AND y = '' AND id < 438353696 ORDER BY id LIMIT 10
select_type? table? partitions? type? possible_keys? key? key_len? ref? rows? Extra?
SIMPLE syncLog NULL ref PRIMARY,x_y_id x_y_id 68 const,const 37041163 Using index condition; Using where
Without x,y (fast)
SELECT * FROM syncLog WHERE id < 438353696 ORDER BY id DESC LIMIT 10
select_type? table? partitions? type? possible_keys? key? key_len? ref? rows? Extra?
SIMPLE syncLog NULL range PRIMARY PRIMARY 8 NULL 37041281 Using where
I also tried FORCE INDEX FOR ORDER BY (x_y_id), but it doesn't change anything (compared to 1.).
I also tried to include x and y in ORDER BY. I guessed, for MySQL to make use of the index for that, both x and y need also being included descendingly, although it doesn't make much sense here (since the result only contains same values for x and y anyways). ORDER BY x DESC, y DESC, id DESC But it didn't change anything either.
Why is the index not being used when searching for the rows with the 10 largest ids? How can this be fixed?
Interestingly, when forcing MySQL to use the primary index for this query (as in 1. but with FORCE INDEX (PRIMARY)), the query is fast. However, this can be a problem as soon as the individual log files become very sparse, since they share one id space, resulting in large portions of the table to be scanned for finding 10 rows which have the correct x and y.
This setup uses a pretty outdated MySQL version: 5.6.25
Here is my best guess as to what is happening. Here is your query again, just for easy reference:
SELECT *
FROM syncLog
WHERE x = '' AND y = '' AND id < 438353696
ORDER BY id
LIMIT 10
In the case of an ascending sort, MySQL would only have to do a single index scan on id, starting from the left (lowest) side. Note that this kills two birds with one stone, because the scan satisfies both the WHERE restriction id < 438353696 and the sorting requirement in the ORDER BY clause.
However, in the descending sort version, it is not the same thing:
SELECT *
FROM syncLog
WHERE x = '' AND y = '' AND id < 438353696
ORDER BY id DESC
LIMIT 10
In this case, MySQL might choose to not scan the index to satisfy the WHERE criterion on id. The reason is that it would not know (or even be able to find) the position in the index to the left of which id < 438353696 would be true. So, it chooses to just scan the table. That being done, it then proceeds to just manually sort the result set, leading to the degrading performance you are seeing.
Thank you all for your help. Your hints made me understand a lot better what is happening behind the scenes.
However, the solution was too simple: I should not have said FORCE INDEX FOR OREDER BY and neither just USE INDEX. The solution is to FORCE INDEX.
In detail:
Before, I tried:
Simply using an index hint without forcing it (slow):
SELECT * FROM syncLog USE INDEX (x_y_id) WHERE x = '' AND y = '' AND id < 438459282 ORDER BY id DESC LIMIT 10
select_type? table? partitions? type? possible_keys? key? key_len? ref? rows? Extra?
SIMPLE syncLog NULL ref x_y_id x_y_id 68 const,const 37042925 Using where
Forcing the index, however in a silly rush I told MySQL to force it for ordering! (slow)
SELECT * FROM syncLog FORCE INDEX FOR ORDER BY (x_y_id) WHERE x = '' AND y = '' AND id < 438459282 ORDER BY id DESC LIMIT 10
select_type? table? partitions? type? possible_keys? key? key_len? ref? rows? Extra?
SIMPLE syncLog NULL ref PRIMARY,x_y_id x_y_id 68 const,const 37042973 Using where
The solution is:
Forcing the index, but not telling MySQL for what. (fast)
SELECT * FROM syncLog FORCE INDEX (x_y_id) WHERE x = '' AND y = '' AND id < 438459282 ORDER BY id DESC LIMIT 10
select_type? table? partitions? type? possible_keys? key? key_len? ref? rows? Extra?
SIMPLE syncLog NULL range x_y_id x_y_id 76 NULL 37042943 Using index condition
Now have a look at key_len. MySQL didn't use the id part of the index before. Now it is using it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With