The following query is intended to receive a list of unread messages by user. It involves 3 tables: recipients
contains a relation of users to message IDs, messages
contains the messages themselves, and message_readers
contains a list of which users have read which messages.
The query reliably takes 4.9 seconds - this is seriously hurting our performance, and is especially worrisome since we hope the database will eventually be several orders of magnitude larger. Granted, it's an inherently heavy query, but the data set is tiny, and intuitively it seems that it should be much faster. The server has enough memory (32gb) that the entire database should be loaded in RAM at all times, and there's nothing else running on the box.
The tables are all tiny:
recipients: 23581
messages: 9679
message_readers: 2685
The query itself:
SELECT
m.*
FROM
messages m
INNER JOIN recipients r ON r.message_id = m.id
LEFT JOIN message_readers mr ON mr.message_id = m.id
WHERE
r.id = $user_id
AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)
The explain plan is pretty straightforward:
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| 1 | SIMPLE | r | ref | index_recipients_on_id | index_recipients_on_id | 768 | const | 11908 | Using where |
| 1 | SIMPLE | m | eq_ref | PRIMARY | PRIMARY | 4 | db.r.message_id | 1 | Using index |
| 1 | SIMPLE | mr | ALL | NULL | NULL | NULL | NULL | 2498 | Using where |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
There IS an index on message_readers.read_by_id
, but I guess it can't really use it because of the IS NULL condition.
I'm using all default settings except for the following:
key_buffer=4G
query_cache_limit = 256M
query_cache_size = 1G
innodb_buffer_pool_size=12G
Thanks!
Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.
MySQL full-text search (FTS) is far much faster than queries using wildcard characters.
By default, the slow query log file is located at /var/lib/mysql/hostname-slow. log. To change the log path or filename, type the following command, replacing path with the path to the file, and filename with the name of the log filename: Copy SET GLOBAL slow_query_log_file = '/path/filename';
Assuming that message_readers
is a subset of recipients
, I recommend making the following changes:
Get rid of the message_readers
table and replace it with a flag on the recipients
table. This will eliminiate the null check and remove a join.
It probably already is, but make sure your clustered index for recipients
is id, message_id
rather than message_id, id
, since nearly all searches for messages will be based on the recipients.
Here is the SELECT that results:
SELECT
r.whatever,
m.whatever,
-- ...
FROM
recipients r
INNER JOIN messages m ON m.id = r.message_id
WHERE
r.id = $user_id
AND r.read_flag = 'N'
UPDATE
Here is the correct version of your query using the existing scheme:
SELECT
r.whatever,
m.whatever,
-- ...
FROM
recipients r
INNER JOIN messages m ON r.message_id = m.id
LEFT JOIN message_readers mr ON mr.read_by_id = r.id
AND mr.message_id = m.id
WHERE
r.id = $user_id
AND mr.read_by_id IS NULL
This assumes that your clustered indexes are what would be expected:
recipients: id, message_id
messages: id
message_readers: read_by_id, message_id
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With