I have a MySQL database with a single InnoDB table with about 300 million rows. There are up to 10 connected clients sending 50-60 read queries per second. Everything was smooth for a few months until recently, when MySQL started stalling, while using large amounts of CPU (100%+. uptime
command shows values like 15, 12, 15.). Queries that would take 500ms now take several seconds, from tens to hundreds. Doing a SHOW PROCESSLIST
shows queries hanging at the Sending data
state.
I'm unable to figure out why and any help is appreciated.
Intel(R) Xeon(R) CPU E5 @ 2.40GHz | 12 Cpus | 32 GB RAM
innodb_file_per_table = 1
tmp-table-size = 32M
max-heap-table-size = 32M
innodb-log-files-in-group = 2
innodb-flush-method = O_DIRECT
innodb-log-file-size = 512M
innodb-buffer-pool-size = 26G
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table = 1
innodb_file_format = barracuda
+----------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| identifier | int(11) | YES | MUL | 0 | |
| timestamp | int(11) | YES | MUL | NULL | |
| rtype | int(5) | YES | MUL | NULL | |
| x1 | int(11) | YES | | NULL | |
| x2 | int(11) | YES | | NULL | |
| net | bigint(20) | YES | | NULL | |
| created_at | datetime | NO | | NULL | |
+----------------+------------+------+-----+---------+----------------+
Indexed and used in the WHERE query:
Data_length = ~18 GB
Index_length = ~16 GB
SELECT identifier, timestamp, x1 AS a, x2 AS b, net
FROM records
WHERE
identifier=1010
AND timestamp >=1463111100
AND timestamp <= 1463738400
AND rtype=5
ORDER BY timestamp;
(Returns about 900 rows. Sometimes completes in less than a second, sometimes 10-100s of seconds)
select_type = SIMPLE
type = index_merge
possible_keys = indeXidentifier, indeXtimestamp, indeXrtype
key = indeXidentifier, indeXrtype
key_len = 4,5
rows = 10641
Extra = Using intersect(indeXidentifier,indeXrtype); Using where
Set long_query_time = 0 (in some cases, you may need to rate limit to not flood the log) Enable the slow log and collect for some time (slow_query_log = 1) Stop collection and process the log with pt-query-digest. Begin reviewing the top queries in times of resource usage.
If your database is being used in high volumes, this can slow the database down. When there are too many queries to process at once, the CPU will bottleneck, resulting in a slow database.
I have two recommendation :
1 . Change the column order in your multi-column index. Recommended order is: identifier, rtype, timestamp.
Index unique scan is faster than index range scan then it is better to appear first.
2 . Change your query like this:
select * from(
SELECT identifier, timestamp, x1 AS a, x2 AS b, net
FROM records
WHERE
identifier=1010
AND timestamp >=1463111100
AND timestamp <= 1463738400
AND rtype=5
) t1 ORDER BY timestamp;
To avoid using index for sorting.
Either of these is optimal for the SELECT
in the Question:
INDEX(rtype, identifier, timestamp)
INDEX(identifier, rtype, timestamp)
The principle is to put all the "= constant" parts of the WHERE
first, then add one more thing on (the 'range' over timestamp
). More cookbook tips.
There is no need to put this in a subquery -- that will only slow things down by building an unnecessary tmp table.
Why did it suddenly go slow? The likely reason is "caching". Right after adding the new index, less stuff was cached in RAM, and any SELECTs
were hitting the disk a lot.
Let's double-check the query plan. Please provide EXPLAIN SELECT ...
. It should be one line long, indicate that it is using the 3-column index, and not say "intersect", "temporary", or "filesort".
If anything is still amiss, please provide that explain, plus SHOW CREATE TABLE
(It is more descriptive than DESCRIBE
.)
Another thing to be sure to do: Turn off the Query Cache. Add/change these settings in my.cnf
and restart the server:
query_cache_type = OFF
query_cache_size = 0
How are the INSERTs
occurring? One row at a time? If they can be 'batching', even a few dozen at a time, that will help significantly.
Since you are commenting about CPU, it sounds like you have some query that is CPU-bound, not I/O-bound. Do SHOW FULL PROCESSLIST;
-- do you see some query with a large "Time"? Is it something you have not mentioned yet?
Please run
SHOW VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
(The values may lead to a discussion of "thundering herds".)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With