I am diagnosing an intermittent slow query, and have found a strange behaviour in MySQL that I cannot explain. It chooses a different, non-optimal key strategy for one specific case, but only when using LIMIT 1.
Table definition (some unreferenced data columns removed for brevity):
CREATE TABLE `ch_log` (
`cl_id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`cl_unit_id` INT(11) NOT NULL DEFAULT '0',
`cl_date` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`cl_type` CHAR(1) NOT NULL DEFAULT '',
`cl_data` TEXT NOT NULL,
`cl_event` VARCHAR(255) NULL DEFAULT NULL,
`cl_timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`cl_record_status` CHAR(1) NOT NULL DEFAULT 'a',
PRIMARY KEY (`cl_id`),
INDEX `cl_type` (`cl_type`),
INDEX `cl_date` (`cl_date`),
INDEX `cl_event` (`cl_event`),
INDEX `cl_unit_id` (`cl_unit_id`),
INDEX `log_type_unit_id` (`cl_unit_id`, `cl_type`),
INDEX `unique_user` (`cl_user_number`, `cl_unit_id`)
)
ENGINE=InnoDB
AUTO_INCREMENT=419582094;
This is the query, which only runs slowly for one specific cl_unit_id:
EXPLAIN
SELECT *
FROM `ch_log`
WHERE `cl_type` = 'I' AND `cl_event` = 'G'
AND cl_unit_id=1234
ORDER BY cl_date DESC
LIMIT 1;
id|select_type|table |type |possible_keys |key |key_len|ref|rows|Extra
1 |SIMPLE |ch_log|index|cl_type,cl_event,cl_unit_id,log_type_unit_id|cl_date|8 |\N |5295|Using where
For all other values of cl_unit_id
it uses the log_type_unit_id
key which is much faster.
id|select_type|table |type|possible_keys |key |key_len|ref |rows|Extra
1 |SIMPLE |ch_log|ref |cl_type,cl_event,cl_unit_id,log_type_unit_id|log_type_unit_id|5 |const,const|3804|Using where; Using filesort
I can't see anything strange about the data for this 'unit':
General info
Things I've tried, and can "solve" the problem with:
Removing the LIMIT 1 - the query runs in milliseconds and returns the data.
Changing to LIMIT 2, or other combinations, e.g. 2,3 - runs in milliseconds.
Adding an index hint solves it:
FROM `ch_log` USE INDEX (log_type_unit_id)
but I don't want to hard-code this into the application.
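For reference, the full hinted statement (the same query from above, with the hint added) looks like this:

```sql
SELECT *
FROM `ch_log` USE INDEX (log_type_unit_id)
WHERE cl_type = 'I' AND cl_event = 'G'
  AND cl_unit_id = 1234
ORDER BY cl_date DESC
LIMIT 1;
```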
Adding a second order by on the primary key also "solves" it:
ORDER BY cl_id, cl_date DESC
giving explain:
id|select_type|table |type|possible_keys |key |key_len|ref |rows|Extra
1 |SIMPLE |ch_log|ref |cl_type,cl_event,cl_unit_id,log_type_unit_id|log_type_unit_id|5 |const,const|6870|Using where
which is slightly different from the index-hinted plan, with more records examined (6,870) but still running in tens of milliseconds.
Again I could do this, but I don't like using side-effects I don't understand.
So my main questions are:
a) Why does it only happen for LIMIT 1?
b) How can the data itself affect the key strategy so much? And what aspect of the data, given that the quantity and spread in the indexes seem typical?
In short, yes: if you limit your result to 1 then, even if you are "expecting" one result, the query will be faster, because your database won't look through all your records. It will simply stop once it finds a record that matches your query.
A LIMIT clause such as LIMIT 1, 3 would return three records in the result set with an offset of 1: the SELECT statement skips the first record that would normally be returned and instead returns the second, third, and fourth records.
You will also notice a performance difference when handling the data: one record takes up less space than multiple records.
You can change the default limit in MySQL Workbench via Edit >> Preferences >> SQL Queries tab, where you will find a Limit Rows option. You can set this to a very high value or uncheck it; when unchecked, queries retrieve all rows (equivalent to no limit).
MySQL picks an execution plan and uses different indexes depending on what it thinks is statistically the best choice. For your first observations, this is the answer:
LIMIT 1 - the query runs in milliseconds and returns the data.
-> Yes; check it, and the explain plan is good.
LIMIT 2 or other combinations, e.g. 2,3 - runs in milliseconds.
-> The same applies. The optimizer chooses a different index because, suddenly, the expected block reads became twice as big as with LIMIT 1 (that's just one possibility).
Now, that only answers half of the questions.
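The flip is easy to see by comparing the plans side by side. This is a sketch using the query from the question; only the LIMIT differs:

```sql
-- Reportedly picks the cl_date index (slow full-index scan) on the problem unit:
EXPLAIN SELECT * FROM ch_log
WHERE cl_type = 'I' AND cl_event = 'G' AND cl_unit_id = 1234
ORDER BY cl_date DESC LIMIT 1;

-- Reportedly switches to log_type_unit_id (fast ref lookup):
EXPLAIN SELECT * FROM ch_log
WHERE cl_type = 'I' AND cl_event = 'G' AND cl_unit_id = 1234
ORDER BY cl_date DESC LIMIT 2;
```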
a) why does it only happen for LIMIT 1?
It actually happens not only because of LIMIT 1, but because of the ORDER BY ... DESC clause. Try ORDER BY ... ASC and you will probably see an improvement too. This phenomenon is well acknowledged; please read on.
One of the accepted solutions (towards the bottom of the article) is to force the index the same way you did. Yes, sometimes it is justified; otherwise, index hints would have been removed long ago. Robots cannot always be perfect :-)
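If you do go down the hint route, note the difference in strength: USE INDEX only suggests the index and still lets the optimizer pick another plan, while FORCE INDEX makes a table scan the only alternative to the named index. A sketch of the stronger form:

```sql
-- FORCE INDEX: the optimizer must use log_type_unit_id
-- unless it literally cannot, in which case it falls back to a table scan.
SELECT *
FROM ch_log FORCE INDEX (log_type_unit_id)
WHERE cl_type = 'I' AND cl_event = 'G' AND cl_unit_id = 1234
ORDER BY cl_date DESC
LIMIT 1;
```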
b) how can the data itself affect the key-strategy so much? And what aspect of the data, seeing as the quantity and spread in the indexes seems typical.
You said it: the spread is usually what trips the optimizer up. Not only can the optimizer make a wrong decision even with accurate statistics, it can also be completely off simply because the number of changed rows on the table is just below 1/16th of the total row count (roughly the threshold at which InnoDB recalculates index statistics)...
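If stale statistics are the culprit, refreshing them is worth trying before resorting to hints. A minimal sketch (note that ANALYZE TABLE is cheap on InnoDB, but it does briefly flush the table):

```sql
-- Recompute the key-distribution statistics the optimizer uses:
ANALYZE TABLE ch_log;

-- Inspect the per-index cardinality estimates it now sees:
SHOW INDEX FROM ch_log;
```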