MySQL (id >= N AND col2 IS NULL) query unexpectedly slow for large N

Tags: performance, mysql

We are using MySQL 5.5.42.

We have a table publications containing about 150 million rows (about 140 GB on an SSD).

The table has many columns, of which two are of particular interest:

id is the primary key of the table and is of type bigint
cluster_id is a nullable column of type bigint

Both columns have their own (separate) index.

We run queries of the form:

SELECT * FROM publications
WHERE id >= 14032924480302800156 AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;

Here is the problem: The larger the id value (14032924480302800156 in the example above), the slower the request.

In other words, requests with a low id value are fast (< 0.1 s), but the higher the id value, the slower the request (up to minutes).

Everything is fine if we use another (indexed) column in the WHERE clause. For instance:

SELECT * FROM publications
WHERE inserted_at >= '2014-06-20 19:30:25' AND cluster_id IS NULL
ORDER BY inserted_at
LIMIT 0, 200;

where inserted_at is of type timestamp.

Edit:

Output of EXPLAIN when using id >= 14032924480302800156:

id | select_type | table        | type | possible_keys      | key        | key_len | ref   | rows     | Extra
---+-------------+--------------+------+--------------------+------------+---------+-------+----------+------------
1  | SIMPLE      | publications | ref  | PRIMARY,cluster_id | cluster_id | 9       | const | 71647796 | Using where

Output of EXPLAIN when using inserted_at >= '2014-06-20 19:30:25':

id | select_type | table        | type | possible_keys          | key        | key_len | ref   | rows     | Extra
---+-------------+--------------+------+------------------------+------------+---------+-------+----------+------------
1  | SIMPLE      | publications | ref  | inserted_at,cluster_id | cluster_id | 9       | const | 71647796 | Using where

asked Jul 16 '15 09:07

François Beaune

1 Answer

There is some guesswork involved here, but it looks like MySQL is using the indexes in the wrong order. The PRIMARY index seems to be treated completely differently from the others.

In a query with a condition on the primary key, both the PRIMARY index and the index on cluster_id can be used. For some reason, MySQL ignores the PRIMARY index and looks first at the index on cluster_id, for which you have a condition: the value must be NULL. That leaves a huge, potentially unordered (NULLs everywhere!) set of rows that still has to be filtered by id.

With the second query it's different: the PRIMARY index cannot be used at all, so MySQL works out a better plan, apparently using the index on inserted_at first, without any hints.

What it should actually do in the first query is use the PRIMARY index first (so tell it to do so). I am not a MySQL user; all of this guesswork is backed only by my own understanding of the internal data structures. I don't know whether it can then apply the index on cluster_id on top of those results, but creating a composite index and comparing performance with and without it may give clues as to whether it is used.
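
As a rough sketch of the two suggestions above (telling MySQL to use the PRIMARY index, and trying a composite index), something like the following could be tried. The index name idx_cluster_id_id is made up for illustration, and whether either option actually helps on MySQL 5.5 is an assumption to verify with EXPLAIN and timings, not a confirmed fix.

```sql
-- Option 1: index hint asking the optimizer to use the primary key,
-- so rows are read in id order starting at the requested id.
SELECT *
FROM publications FORCE INDEX (PRIMARY)
WHERE id >= 14032924480302800156
  AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;

-- Option 2: composite index covering both conditions.
-- idx_cluster_id_id is a hypothetical name; building an index
-- on a table of this size can take a long time.
ALTER TABLE publications
  ADD INDEX idx_cluster_id_id (cluster_id, id);

-- Check which index the optimizer picks after either change.
EXPLAIN
SELECT *
FROM publications
WHERE id >= 14032924480302800156
  AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;
```

Comparing the key and rows columns of the EXPLAIN output, and the actual run time for a large id value, before and after each change should show whether the plan really changed.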

answered Oct 16 '22 05:10

D-side
