
LIMIT 1 is very slow, for specific records, using different keys

I am diagnosing an intermittent slow query, and have found a strange behaviour in MySQL I cannot explain. It's choosing a different, non-optimal key strategy for one specific case, only when doing a LIMIT 1.

Table (some unreferenced data columns removed for brevity)

CREATE TABLE `ch_log` (
    `cl_id` BIGINT(20) NOT NULL AUTO_INCREMENT,
    `cl_unit_id` INT(11) NOT NULL DEFAULT '0',
    `cl_date` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
    `cl_type` CHAR(1) NOT NULL DEFAULT '',
    `cl_data` TEXT NOT NULL,
    `cl_event` VARCHAR(255) NULL DEFAULT NULL,
    `cl_timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `cl_record_status` CHAR(1) NOT NULL DEFAULT 'a',
    PRIMARY KEY (`cl_id`),
    INDEX `cl_type` (`cl_type`),
    INDEX `cl_date` (`cl_date`),
    INDEX `cl_event` (`cl_event`),
    INDEX `cl_unit_id` (`cl_unit_id`),
    INDEX `log_type_unit_id` (`cl_unit_id`, `cl_type`),
    INDEX `unique_user` (`cl_user_number`, `cl_unit_id`)  -- cl_user_number is one of the data columns removed above for brevity
)
ENGINE=InnoDB
AUTO_INCREMENT=419582094;

This is the query, which runs slowly only for one specific cl_unit_id:

EXPLAIN
SELECT *
FROM `ch_log`
WHERE cl_type = 'I' AND cl_event = 'G'
AND cl_unit_id=1234
ORDER BY cl_date DESC 
LIMIT 1;
id|select_type|table |type |possible_keys                               |key    |key_len|ref|rows|Extra
1 |SIMPLE     |ch_log|index|cl_type,cl_event,cl_unit_id,log_type_unit_id|cl_date|8      |\N |5295|Using where

For all other values of cl_unit_id it uses the log_type_unit_id key which is much faster.

id|select_type|table |type|possible_keys                                           |key             |key_len|ref        |rows|Extra
1 |SIMPLE     |ch_log|ref |cl_type,cl_event,cl_unit_id,log_type_unit_id|log_type_unit_id|5      |const,const|3804|Using where; Using filesort
  • Queries for all other units take about 0.01 seconds.
  • The query for the "slow unit" takes 10-15 minutes!

I can't see anything strange about the data for this 'unit':

  • Unit 1234 only has 6 records of type I and event G.
  • Other units have many more.
  • Unit 1234 only has 32,000 logs in total which is typical.
  • The data itself is normal: no bigger or older than other units'.
  • There are around 3,000 "units" in the database, representing devices that log data. cl_unit_id is their unique identifier (though no constraint enforces it).

General info

  • There are 30 million records in total, around 12 GB
  • MySQL 5.1.69-log
  • CentOS, 64-bit
  • The data is gradually changing (30 million rows ≈ 3 months of logs), but I don't know if this has happened before

Things I've tried, and can "solve" the problem with:

  1. Removing the LIMIT 1 - the query runs in milliseconds and returns the data.

  2. Changing to LIMIT 2 or other combinations e.g. 2,3 - runs in milliseconds.

  3. Adding an index hint solves it:

    FROM `ch_log` USE INDEX (log_type_unit_id)
    

    but... I don't want to hard-code this into the application.

  4. Adding a second order by on the primary key also "solves" it:

    ORDER BY cl_id, cl_date DESC 
    

    giving explain:

    id|select_type|table |type|possible_keys                                           |key             |key_len|ref        |rows|Extra
    1 |SIMPLE     |ch_log|ref |cl_type,cl_event,cl_unit_id,log_type_unit_id|log_type_unit_id|5      |const,const|6870|Using where
    

    which is slightly different from the index-hinted plan, examining more rows (6,870) but still running in tens of milliseconds.

Again I could do this, but I don't like using side-effects I don't understand.
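For reference, here are workarounds 3 and 4 written out as complete statements. These are reconstructions from the fragments above, using the cl_* column names from the schema (the original post mixes ch_log_* and cl_* prefixes):

```sql
-- Workaround 3: force the fast composite index explicitly
SELECT *
FROM `ch_log` USE INDEX (log_type_unit_id)
WHERE cl_type = 'I' AND cl_event = 'G'
  AND cl_unit_id = 1234
ORDER BY cl_date DESC
LIMIT 1;

-- Workaround 4: add a second ORDER BY column, which steers the
-- optimizer away from walking the cl_date index backwards
SELECT *
FROM `ch_log`
WHERE cl_type = 'I' AND cl_event = 'G'
  AND cl_unit_id = 1234
ORDER BY cl_id, cl_date DESC
LIMIT 1;
```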

So I think my main questions are:

a) why does it only happen for LIMIT 1?

b) how can the data itself affect the key strategy so much? And what aspect of the data, given that the quantity and spread in the indexes seem typical?

Asked by scipilot, Jul 15 '15


1 Answer

MySQL picks an execution plan and uses different indexes depending on what it thinks is statistically the best choice. For all of your first observations, this is the answer:

  1. Removing the LIMIT 1 - the query runs in milliseconds and returns the data. -> Yes: check it, and you'll see the execution plan is good.
  2. Changing to LIMIT 2 or other combinations e.g. 2,3 - runs in milliseconds. -> The same applies. The optimizer chooses a different index because the expected block reads suddenly became twice as large as with LIMIT 1 (that's just one possibility).
  3. Adding an index hint solves it -> Of course: you force a good execution plan.
  4. Adding a second ORDER BY on the primary key also "solves" it -> Yes, because by coincidence the result is a better execution plan.

Now, that only answers half of the questions.

a) why does it only happen for LIMIT 1?

It actually happens not only because of LIMIT 1, but because of:

  • The statistical distribution of your data (which drives the optimizer's decisions)
  • Your ORDER BY ... DESC clause. Try with ORDER BY ... ASC and you will probably see an improvement too.
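A quick sketch of that experiment, reusing the question's query. Note that this returns the oldest matching row rather than the newest, so it is a diagnostic for the plan choice, not a drop-in replacement:

```sql
-- Ascending order lets the optimizer scan the index forwards
SELECT *
FROM `ch_log`
WHERE cl_type = 'I' AND cl_event = 'G'
  AND cl_unit_id = 1234
ORDER BY cl_date ASC
LIMIT 1;
```

Comparing EXPLAIN output for the ASC and DESC variants shows whether the backward scan on cl_date is what triggers the bad plan.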

This phenomenon is well acknowledged. Please read on.

One of the accepted solutions (toward the bottom of the article) is to force the index the same way you did. Yes, sometimes it is justified; otherwise, index hints would have been removed from MySQL long ago. Robots cannot always be perfect :-)

b) how can the data itself affect the key strategy so much? And what aspect of the data, given that the quantity and spread in the indexes seem typical?

You said it: the spread is usually what trips the optimizer up. Not only can the optimizer make a wrong decision even with accurate statistics, it can also be completely off simply because the number of changed rows on the table is just below 1/16th of the total row count, the threshold at which this version of InnoDB refreshes its index statistics...
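If stale index statistics are indeed the culprit, one thing worth trying (an assumption on my part, not something the asker has confirmed) is forcing MySQL to resample them:

```sql
-- Resample the index statistics for the table
ANALYZE TABLE ch_log;

-- Inspect the Cardinality column afterwards to see what the
-- optimizer now believes about each index
SHOW INDEX FROM ch_log;
```

ANALYZE TABLE only samples a handful of index pages on InnoDB, so it is cheap even on a 30-million-row table, though it is no guarantee the optimizer will settle on the good plan.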

Answered by Sebas, Oct 19 '22