We are using MySQL 5.5.42.
We have a table publications containing about 150 million rows (about 140 GB on an SSD).
The table has many columns, of which two are of particular interest:
id is the primary key of the table and is of type bigint.
cluster_id is a nullable column of type bigint.
Each of the two columns has its own separate index.
We run queries of the following form:
SELECT * FROM publications
WHERE id >= 14032924480302800156 AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;
Here is the problem: the larger the id value (14032924480302800156 in the example above), the slower the request.
Requests with a low id value are fast (< 0.1 s), while requests with a high id value can take up to minutes.
Everything is fine if we use another (indexed) column in the WHERE clause. For instance:
SELECT * FROM publications
WHERE inserted_at >= '2014-06-20 19:30:25' AND cluster_id IS NULL
ORDER BY inserted_at
LIMIT 0, 200;
where inserted_at is of type timestamp.
Edit:
Output of EXPLAIN when using id >= 14032924480302800156:
id | select_type | table        | type | possible_keys      | key        | key_len | ref   | rows     | Extra
---+-------------+--------------+------+--------------------+------------+---------+-------+----------+------------
 1 | SIMPLE      | publications | ref  | PRIMARY,cluster_id | cluster_id | 9       | const | 71647796 | Using where
Output of EXPLAIN when using inserted_at >= '2014-06-20 19:30:25':
id | select_type | table        | type | possible_keys          | key        | key_len | ref   | rows     | Extra
---+-------------+--------------+------+------------------------+------------+---------+-------+----------+------------
 1 | SIMPLE      | publications | ref  | inserted_at,cluster_id | cluster_id | 9       | const | 71647796 | Using where
Some guesswork follows: MySQL appears to be using the indexes in the wrong order. The PRIMARY index seems to be treated quite differently from the others.
For the query with the primary key condition, both the PRIMARY index and the index on cluster_id can be used. For some reason, MySQL ignores the PRIMARY index and goes to the index on cluster_id first, where the condition is that the value is NULL. That leaves a huge, potentially unordered set of rows (NULLs everywhere!), about 71 million according to the EXPLAIN estimate, which then has to be filtered by id.
With the second query it is different: the PRIMARY index cannot be used at all, so MySQL chooses differently, apparently using the index on inserted_at first without any hints.
What it should actually do for the first query is use the PRIMARY index first, so tell it to do so. I am not a MySQL user, and all of this guesswork is backed only by my own understanding of the internal data structures. I don't know whether MySQL can then apply the index on cluster_id on top of those results, but creating a composite index and comparing performance with and without it may give clues as to whether it is used.
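As a rough sketch of the two suggestions above, the statements below use MySQL's FORCE INDEX hint to push the optimizer toward the primary key, and then create a composite index on (cluster_id, id) for comparison. The index name idx_cluster_id_id is made up for illustration, and none of this has been run against the table in question, so treat it as a starting point for experiments rather than a verified fix.

```sql
-- Sketch only: untested against the asker's data.

-- 1. Ask the optimizer to walk the primary key instead of the cluster_id index,
--    and check which key EXPLAIN reports.
EXPLAIN
SELECT * FROM publications FORCE INDEX (PRIMARY)
WHERE id >= 14032924480302800156 AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;

-- 2. Hypothetical composite index (the name idx_cluster_id_id is an assumption).
--    Building it on a table of this size may take a long time.
CREATE INDEX idx_cluster_id_id ON publications (cluster_id, id);

-- 3. Re-run EXPLAIN without the hint and compare the key and rows columns
--    with the earlier output.
EXPLAIN
SELECT * FROM publications
WHERE id >= 14032924480302800156 AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;
```

If the hinted query stays fast for high id values, that suggests the slowdown comes from the choice of index. If the composite index shows up in the key column of the last EXPLAIN, that suggests MySQL can combine both conditions in one index lookup.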