I needed to increase memory allocated to innodb_buffer_pool_size as the default 8M was far too low. Rick James recommends 70% of RAM for this setting, his has a lot of great information.
Edlerd was correct with both suggestions :-)
I split my data into monthly partitions and then ran a 6,000 row response query which originally took between 6 to 12 seconds. It now completes in less than a second (.984/.031). I ran this using the default innodb buffer size (innodb_buffer_pool_size = 8M) to make sure it wasnt just the memory increase.
I then set innodb_buffer_pool_size = 4G and ran the query with an even better response of .062/.032.
I’d also like to mention that increasing the memory has also improved the overall speed of my web application and service which receives and writes messages to this table, I am astounded at how much of a difference this configuration setting has made. The Time To First Byte (TTFB) from my web server is now almost on par with MySQL Workbench which at times would reach 20 seconds.
I also found that the slow query log file was an excellent tool to identify issues, it was there that I saw it suggesting my innodb_buffer_pool_size was low and higlighted all the poor performing queries. This also identified areas where I needed to index other tables.
I am in the process of refactoring a large table that logs telemetry data, it has been running for about 4-5 months and has generated approx. 54 million records with an average row size approx. 380 bytes.
I have started to see some performance lag on one of my raw data queries that returns all logs for a device over a 24 hour period.
Initially I thought it was indexing, but I think it is the amount of I/O that needs to be processed by MySQL. A typical 24 hour query would contain 2.2k 3k to 9k records and I’d actually like to support an export of about 7 days.
I am not experienced in database performance tuning so still just learning the ropes. I am considering a few strategies.
1,2 (INDEXES): I’ll rework my indexes with my queries, but I think I am good here as Explain is showing 100% hit, unless I am reading this wrong.
I’ll try a covering index when they are rebuilt, but how do I determine the knock on effects of making a bad setting? E.G. insert speeds are compromised.
How would I best monitor the performance of my table in a live environment?
EDIT: I've just started using the slow log file which looks like a good tool for finding issues and I suppose a query on the performance_schema might be another option?
3 (PARTITIONING): I have read a bit about partitions and not sure if the size of my data would make much of a difference.
Rick James suggests >1M records, I’m at 54M and would like to keep around 300M prior to archiving, is my table is complex enough to benefit?
I have to test this out myself as I do not have experience with any of this stuff and it all theoretical to me. I just don’t want to go down this path if it isn’t suitable for my needs.
4 (Vertical partitioning via ‘joined’ detail table): I don’t I think am having table scan issues and I need all rows, so I am not sure this technique would be of benefit.
5 (Use limits and fetch again): Would this free up the server if I used less of its time in a single request? Would I see better I/O throughput at the cost of more commands on the same connection?
6 (Review Config): The other piece would be to review the default non developer configuration that is used when you install MySQL, perhaps there are some settings that can be adjusted? :-)
Thanks for reading, keen to hear any and all suggestions.
The following FYI:
TABLE:
CREATE TABLE `message_log` (
`db_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`db_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`created` datetime DEFAULT NULL,
`device_id` int(10) unsigned NOT NULL,
`display_name` varchar(50) DEFAULT NULL,
`ignition` binary(1) DEFAULT NULL COMMENT 'This is actually IO8 from the falcom device',
`sensor_a` float DEFAULT NULL,
`sensor_b` float DEFAULT NULL,
`lat` double DEFAULT NULL COMMENT 'default GPRMC format ddmm.mmmm \n',
`lon` double DEFAULT NULL COMMENT 'default GPRMC longitude format dddmm.mmmm ',
`heading` float DEFAULT NULL,
`speed` float DEFAULT NULL,
`pos_validity` char(1) DEFAULT NULL,
`device_temp` float DEFAULT NULL,
`device_volts` float DEFAULT NULL,
`satellites` smallint(6) DEFAULT NULL, /* TINYINT will suffice */
`navdist` double DEFAULT NULL,
`navdist2` double DEFAULT NULL,
`IO0` binary(1) DEFAULT NULL COMMENT 'Duress',
`IO1` binary(1) DEFAULT NULL COMMENT 'Fridge On/Off',
`IO2` binary(1) DEFAULT NULL COMMENT 'Not mapped',
`msg_name` varchar(20) DEFAULT NULL, /* Will be removed */
`msg_type` varchar(16) DEFAULT NULL, /* Will be removed */
`msg_id` smallint(6) DEFAULT NULL,
`raw` text, /* Not needed in primary query, considering adding to single table mapped to this ID or a UUID correlation ID to save on @ROWID query */
PRIMARY KEY (`db_id`),
KEY `Name` (`display_name`),
KEY `Created` (`created`),
KEY `DeviceID_AND_Created` (`device_id`,`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DeviceID_AND_Created is the main index. I need the PK clustered index because I am using the record ID in a summary table that keeps track of the last message for a given device. Created would be the partition column, so I guess that would also be added to the PK cluster?
QUERY:
SELECT
ml.db_id, ml.db_created, ml.created, ml.device_id, ml.display_name, bin(ml.ignition) as `ignition`,
bin(ml.IO0) as `duress`, bin(ml.IO1) as `fridge`,ml.sensor_a, ml.sensor_b, ml.lat, ml.lon, ml.heading,
ml.speed,ml.pos_validity, ml.satellites, ml.navdist2, ml.navdist,ml.device_temp, ml.device_volts,ml.msg_id
FROM message_log ml
WHERE ml.device_id = @IMEI
AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE,INTERVAL 24 hour)
ORDER BY ml.db_id;
This returns all logs for a given 24 hour period which at the moment is approx. 3k to 9k rows, average row size 381 bytes and will be reduced once I remove one of the TEXT fields (raw)
Implement ranged partitioning by date: a) Keep monthly partitions. E.g. last 6 months b) Move anything older to archive table.
This is a very good idea. I gues all the writes will be in the newest partition and you will query recent data only. You always want a situation where your data and index fits in memory. So no disk i/o on reads.
Depending on your use case it might even be wise to have one partition per week. Then you only have to keep max two weeks of data in memory for reading the last 7 days.
You might also want to tune your buffer sizes (i.e. innodb_buffer_pool_size) if you are using innodb as a engine or myisam_key_cache when using myisam engine.
Also adding ram to the DB machine usually helps as the os can then have the data files in memory.
If you have heavy writes you can also tune other options (i.e. how often writes are persisted to disk with innodb_log_buffer_size). This is in order to let dirty pages be in memory for longer to avoid writing them back to disk too often.
For those who are curious, the following is what I used to create my partition and configure memory.
Updated PK to include the range column used in partition
ALTER TABLE message_log
CHANGE COLUMN created DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
DROP PRIMARY KEY,
ADD PRIMARY KEY (db_id, created);
Added the partitions using ALTER TABLE.
In hindsight, I should have created each partition as a single ALTER statement and used Reorganize Partition (and here) on subsequent partitions as doing it in one hit consumed a lot of resources and time.
ALTER TABLE message_log
PARTITION BY RANGE(to_days(created)) (
partition invalid VALUES LESS THAN (0),
partition from201607 VALUES LESS THAN (to_days('2016-08-01')),
partition from201608 VALUES LESS THAN (to_days('2016-09-01')),
partition from201609 VALUES LESS THAN (to_days('2016-10-01')),
partition from201610 VALUES LESS THAN (to_days('2016-11-01')),
partition from201611 VALUES LESS THAN (to_days('2016-12-01')),
partition from201612 VALUES LESS THAN (to_days('2017-01-01')),
partition from201701 VALUES LESS THAN (to_days('2017-02-01')),
partition from201702 VALUES LESS THAN (to_days('2017-03-01')),
partition from201703 VALUES LESS THAN (to_days('2017-04-01')),
partition from201704 VALUES LESS THAN (to_days('2017-05-01')),
partition future values less than (MAXVALUE)
);
NOTE: I am not sure if using to_days() or the raw column makes much difference, but I've seen it used in most examples so I've taken it on as an assumed best practice.
To change the value of innodb_db_buffer_pool_size you can find info: MySQL InnoDB Buffer Pool Resize and Rick Jame's page on memory
You can also do it in MySQL Workbench in the options file menu and then the innoDB tab. Any changes you make here will be written in the config file, but you'll need to stop and start MySQL to read out the configuration, otherwise you can also set the global value to do it live.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With