
How to optimise a large table in MySQL, when can I benefit from partitioning?

In summary, date range partitioning and memory configuration achieved my goal.

I needed to increase the memory allocated via innodb_buffer_pool_size, as the default 8M was far too low. Rick James recommends 70% of RAM for this setting, and his site has a lot of great information.

Edlerd was correct with both suggestions :-)

I split my data into monthly partitions and then ran a 6,000 row response query which originally took between 6 to 12 seconds. It now completes in less than a second (.984/.031). I ran this using the default innodb buffer size (innodb_buffer_pool_size = 8M) to make sure it wasn't just the memory increase.

I then set innodb_buffer_pool_size = 4G and ran the query with an even better response of .062/.032.

I’d also like to mention that increasing the memory has also improved the overall speed of my web application and the service which receives and writes messages to this table. I am astounded at how much of a difference this configuration setting has made. The Time To First Byte (TTFB) from my web server is now almost on par with MySQL Workbench, where it would at times reach 20 seconds.

I also found the slow query log file to be an excellent tool for identifying issues. It was there that I saw the suggestion that my innodb_buffer_pool_size was low, and it highlighted all the poor performing queries. This also identified areas where I needed to index other tables.
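For anyone wanting to do the same, this is roughly how the slow query log can be switched on; a sketch only, and the log file path is an assumption for a typical Linux install:

```sql
-- Enable at runtime (or put the equivalent lines in the [mysqld]
-- section of my.cnf so they survive a restart).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';  -- path is an example
SET GLOBAL long_query_time = 1;  -- log anything slower than 1 second
```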

EDIT 2016-11-12 SOLUTION

I am in the process of refactoring a large table that logs telemetry data, it has been running for about 4-5 months and has generated approx. 54 million records with an average row size approx. 380 bytes.

I have started to see some performance lag on one of my raw data queries that returns all logs for a device over a 24 hour period.

Initially I thought it was indexing, but I think it is the amount of I/O that needs to be processed by MySQL. A typical 24 hour query would contain 3k to 9k records, and I’d actually like to support an export of about 7 days.

I am not experienced in database performance tuning so still just learning the ropes. I am considering a few strategies.

  1. Tweak compound indexes according to query for raw data, although I think my indexes are OK as the explain plan is showing 100% hit rate.
  2. Consider creating a covering index to include all columns needed
  3. Implement ranged partitioning by date: a) Keep monthly partitions. E.g. last 6 months b) Move anything older to archive table.
  4. Create a separate table (vertical partitioning) with the raw data and join it with the IDs of the primary query table. Not sure this is my problem as my indexes are working.
  5. Change my queries to pull data in batches: order by created date with LIMIT X and carry on until no more records are returned.
  6. Review server configuration

1,2 (INDEXES): I’ll rework my indexes with my queries, but I think I am good here as Explain is showing 100% hit, unless I am reading this wrong.

I’ll try a covering index when they are rebuilt, but how do I determine the knock-on effects of a bad choice, e.g. compromised insert speeds?
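To illustrate the idea: a covering index only pays off when every column the query references is in the index, so it is really only practical for a narrower query than my full raw-data export. The column choice below is hypothetical, just to show the shape:

```sql
-- Hypothetical covering index for a narrow position-only query.
-- Because device_id, created, lat, lon and speed are all in the index
-- (and InnoDB secondary indexes implicitly carry the PK, db_id),
-- EXPLAIN should show "Using index" and never touch the clustered rows.
ALTER TABLE message_log
    ADD INDEX cover_device_created_pos (device_id, created, lat, lon, speed);

-- Covered by the index above:
-- SELECT lat, lon, speed FROM message_log
-- WHERE device_id = 123 AND created >= '2016-11-01';
```

The trade-off is that every extra indexed column makes inserts and the index itself larger, which is exactly the knock-on effect I was worried about.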

How would I best monitor the performance of my table in a live environment?

EDIT: I've just started using the slow query log file, which looks like a good tool for finding issues, and I suppose a query on the performance_schema might be another option?
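For completeness, something like the following against performance_schema (available since MySQL 5.6) surfaces the worst statements by total time; timer values are in picoseconds, hence the division:

```sql
-- Top 10 statement digests by total latency.
SELECT DIGEST_TEXT,
       COUNT_STAR          AS executions,
       SUM_TIMER_WAIT/1e12 AS total_secs,
       SUM_ROWS_EXAMINED   AS rows_examined
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```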

3 (PARTITIONING): I have read a bit about partitions and am not sure if the size of my data would make much of a difference.

Rick James suggests >1M records; I’m at 54M and would like to keep around 300M prior to archiving. Is my table complex enough to benefit?

I have to test this out myself as I do not have experience with any of this stuff and it is all theoretical to me. I just don’t want to go down this path if it isn’t suitable for my needs.

4 (Vertical partitioning via ‘joined’ detail table): I don’t think I am having table scan issues, and I need all rows, so I am not sure this technique would be of benefit.

5 (Use limits and fetch again): Would this free up the server by using less of its time in a single request? Would I see better I/O throughput at the cost of more commands on the same connection?
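The batching I have in mind is keyset (seek) pagination rather than OFFSET, since OFFSET still scans the skipped rows. A sketch against my table, with @LAST_SEEN_ID starting at 0:

```sql
-- Fetch one batch; repeat with @LAST_SEEN_ID = last db_id returned
-- until fewer than 1000 rows come back.
SELECT ml.db_id, ml.created, ml.device_id, ml.lat, ml.lon, ml.speed
FROM message_log ml
WHERE ml.device_id = @IMEI
  AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE, INTERVAL 24 HOUR)
  AND ml.db_id > @LAST_SEEN_ID
ORDER BY ml.db_id
LIMIT 1000;
```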

6 (Review Config): The other piece would be to review the default non-developer configuration that is used when you install MySQL; perhaps there are some settings that can be adjusted? :-)

Thanks for reading, keen to hear any and all suggestions.

The following is provided FYI:

TABLE:

CREATE TABLE `message_log` (
    `db_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `db_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `created` datetime DEFAULT NULL,
    `device_id` int(10) unsigned NOT NULL,
    `display_name` varchar(50) DEFAULT NULL,
    `ignition` binary(1) DEFAULT NULL COMMENT 'This is actually IO8 from the falcom device',
    `sensor_a` float DEFAULT NULL,
    `sensor_b` float DEFAULT NULL,
    `lat` double DEFAULT NULL COMMENT 'default GPRMC format ddmm.mmmm \n',
    `lon` double DEFAULT NULL COMMENT 'default GPRMC longitude format dddmm.mmmm ',
    `heading` float DEFAULT NULL,
    `speed` float DEFAULT NULL,
    `pos_validity` char(1) DEFAULT NULL,
    `device_temp` float DEFAULT NULL,
    `device_volts` float DEFAULT NULL,
    `satellites` smallint(6) DEFAULT NULL, /* TINYINT will suffice */
    `navdist` double DEFAULT NULL,
    `navdist2` double DEFAULT NULL,
    `IO0` binary(1) DEFAULT NULL COMMENT 'Duress',
    `IO1` binary(1) DEFAULT NULL COMMENT 'Fridge On/Off',
    `IO2` binary(1) DEFAULT NULL COMMENT 'Not mapped',
    `msg_name` varchar(20) DEFAULT NULL, /* Will be removed */
    `msg_type` varchar(16) DEFAULT NULL, /* Will be removed */
    `msg_id` smallint(6) DEFAULT NULL,
    `raw` text, /* Not needed in primary query, considering adding to single table mapped to this ID or a UUID correlation ID to save on @ROWID query */
PRIMARY KEY (`db_id`),
KEY `Name` (`display_name`),
KEY `Created` (`created`),
KEY `DeviceID_AND_Created` (`device_id`,`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DeviceID_AND_Created is the main index. I need the PK clustered index because I am using the record ID in a summary table that keeps track of the last message for a given device. Created would be the partition column, so I guess that would also be added to the PK cluster?

QUERY:

SELECT 
    ml.db_id, ml.db_created, ml.created, ml.device_id, ml.display_name, bin(ml.ignition) as `ignition`, 
    bin(ml.IO0) as `duress`, bin(ml.IO1) as `fridge`,ml.sensor_a, ml.sensor_b, ml.lat, ml.lon, ml.heading, 
    ml.speed,ml.pos_validity, ml.satellites, ml.navdist2, ml.navdist,ml.device_temp, ml.device_volts,ml.msg_id
FROM message_log ml 
WHERE ml.device_id = @IMEI
AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE,INTERVAL 24 hour) 
ORDER BY ml.db_id;

This returns all logs for a given 24 hour period, which at the moment is approx. 3k to 9k rows, average row size 381 bytes, and will be reduced once I remove one of the TEXT fields (raw).
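A quick way to check whether partitioning would actually help this query is to look at partition pruning; the syntax below is for MySQL 5.6 (in 5.7+ plain EXPLAIN includes a partitions column). After monthly partitioning, a 24 hour range should list only one or two partitions, not all of them:

```sql
EXPLAIN PARTITIONS
SELECT ml.db_id
FROM message_log ml
WHERE ml.device_id = @IMEI
  AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE, INTERVAL 24 HOUR);
```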

Asked Nov 10 '16 by Nicholas


2 Answers

Implement ranged partitioning by date: a) Keep monthly partitions. E.g. last 6 months b) Move anything older to archive table.

This is a very good idea. I guess all the writes will be in the newest partition and you will query recent data only. You always want a situation where your data and indexes fit in memory, so there is no disk I/O on reads.

Depending on your use case it might even be wise to have one partition per week. Then you only have to keep max two weeks of data in memory for reading the last 7 days.
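A weekly layout is the same RANGE scheme with tighter boundaries; the names and dates below are illustrative only, and note that the partitioning column (created) must first be part of every unique key, including the primary key:

```sql
-- Sketch: one partition per week instead of per month.
ALTER TABLE message_log
PARTITION BY RANGE (to_days(created)) (
    PARTITION w2016_45 VALUES LESS THAN (to_days('2016-11-14')),
    PARTITION w2016_46 VALUES LESS THAN (to_days('2016-11-21')),
    PARTITION w2016_47 VALUES LESS THAN (to_days('2016-11-28')),
    PARTITION future   VALUES LESS THAN (MAXVALUE)
);
```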

You might also want to tune your buffer sizes (i.e. innodb_buffer_pool_size) if you are using InnoDB as an engine, or myisam_key_cache when using the MyISAM engine.

Also, adding RAM to the DB machine usually helps, as the OS can then keep the data files in memory.

If you have heavy writes you can also tune other options (i.e. how often writes are persisted to disk with innodb_log_buffer_size). This lets dirty pages stay in memory longer, to avoid writing them back to disk too often.
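As a sketch of what that could look like (values are illustrative, not recommendations; note the durability trade-off of relaxing flushing, which can lose up to about a second of transactions on a crash):

```ini
# [mysqld] section of my.cnf -- example write-path settings
innodb_log_buffer_size = 64M          # larger redo log buffer for bursty writes
innodb_flush_log_at_trx_commit = 2    # flush redo log ~once per second, not per commit
```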

Answered Sep 26 '22 by edlerd


For those who are curious, the following is what I used to create my partition and configure memory.

Creating the partitions

  1. Updated PK to include the range column used in partition

    ALTER TABLE message_log 
    MODIFY COLUMN created DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (db_id, created);
    
  2. Added the partitions using ALTER TABLE.

In hindsight, I should have created each partition with a single ALTER statement and used REORGANIZE PARTITION for subsequent partitions, as doing it all in one hit consumed a lot of resources and time.

ALTER TABLE message_log 
PARTITION BY RANGE(to_days(created)) (
    partition invalid VALUES LESS THAN (0),
    partition from201607 VALUES LESS THAN (to_days('2016-08-01')),
    partition from201608 VALUES LESS THAN (to_days('2016-09-01')),
    partition from201609 VALUES LESS THAN (to_days('2016-10-01')),
    partition from201610 VALUES LESS THAN (to_days('2016-11-01')),
    partition from201611 VALUES LESS THAN (to_days('2016-12-01')),
    partition from201612 VALUES LESS THAN (to_days('2017-01-01')),
    partition from201701 VALUES LESS THAN (to_days('2017-02-01')),
    partition from201702 VALUES LESS THAN (to_days('2017-03-01')),
    partition from201703 VALUES LESS THAN (to_days('2017-04-01')),
    partition from201704 VALUES LESS THAN (to_days('2017-05-01')),
    partition future VALUES LESS THAN (MAXVALUE) 
);

NOTE: I am not sure whether using to_days() or the raw column makes much difference, but I've seen it used in most examples, so I've taken it on as an assumed best practice.
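Going forward, new months can be split out of the catch-all `future` partition one at a time, which touches far fewer rows than rebuilding the whole table (the month shown is just the next one in my sequence):

```sql
ALTER TABLE message_log REORGANIZE PARTITION future INTO (
    PARTITION from201705 VALUES LESS THAN (to_days('2017-06-01')),
    PARTITION future     VALUES LESS THAN (MAXVALUE)
);
```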

Setting the buffer pool size

To change the value of innodb_buffer_pool_size you can find info here: MySQL InnoDB Buffer Pool Resize and Rick James's page on memory.

You can also do it in MySQL Workbench via the options file menu and then the InnoDB tab. Any changes you make here will be written to the config file, but you'll need to stop and start MySQL for the configuration to be read; otherwise you can set the global value to apply it live.
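Setting it live looks like this (the buffer pool is only dynamically resizable from MySQL 5.7.5 on; earlier versions need the config file change plus a restart, and the 4G value is just my chosen size):

```sql
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;  -- 4G
SELECT @@innodb_buffer_pool_size;  -- confirm the new value
```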

Answered Sep 22 '22 by Nicholas