 

DELETE performance hit on a large MySQL table with index

Let's say we have a web forum application with a MySQL 5.6 database that is accessed 24/7 by many users. It has a table like this holding metadata for notifications sent to users:

CREATE TABLE `notifications` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned NOT NULL,
  `message_store_id` bigint(20) unsigned NOT NULL,
  `status` varchar(10) COLLATE ascii_bin NOT NULL,
  `sent_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`,`sent_date`)
) ENGINE=InnoDB AUTO_INCREMENT=736601 DEFAULT CHARSET=ascii COLLATE=ascii_bin;

This table has 1 million rows. Now a certain message_store_id has suddenly become ineffective for some reason, and I'm planning to remove all records with that message_store_id in a single delete statement, like:

DELETE FROM notifications WHERE message_store_id = 12345;

This single statement would affect 10% of the table, since this message was sent to so many users. Meanwhile, the notifications table is accessed all the time by thousands of users, so the index must stay in place. Apparently maintaining the index while deleting records is very costly, so I'm afraid this delete will max out the server's resources and cause downtime. On the other hand, if I drop the index, delete the records, and then add the index again, I'd have to shut down the database for some time, which is unfortunately not possible for our service.

I wish MySQL 5.6 were not so stupid that this single statement could kill the database, but I guess it's very likely. My question is: is the index maintenance really fatal in a case like this? If so, is there a good strategy for this operation that doesn't require me to halt the database for maintenance?

asked Oct 01 '22 by RyuK

1 Answer

There are a number of tricks/strategies you could employ, depending on the details of your application.

  1. If you plan to do these operations on a regular basis (i.e. it's not a one-time thing), AND you have few distinct values in message_store_id, you can use partitions. Partition by the value of message_store_id and create X partitions beforehand (where X is some reasonable cap on the number of distinct values of the id); you can then delete all the records for a given id in an instant by truncating its partition. A matter of milliseconds. See the SQL sketch after this list. Downside: message_store_id will have to be part of the primary key. Note: you have to create the partitions beforehand because, the last time I worked with them, ALTER TABLE ... ADD PARTITION re-created the entire table, which is a disaster on large tables.
  2. Even if ALTER TABLE ... TRUNCATE PARTITION does not work for you, you can still benefit from partitioning. If you issue a DELETE against one partition, by supplying the corresponding WHERE condition, the rest of the table will not be affected/locked by that DELETE (also shown in the sketch below).
  3. An alternative way of deleting records without locking the DB for too long is to delete in chunks; a sketch as a MySQL stored procedure (assuming autocommit mode, so each chunk commits and releases its locks on its own):

    DELIMITER //
    CREATE PROCEDURE purge_message_store(IN p_msg_id BIGINT UNSIGNED)
    BEGIN
      DECLARE rows_deleted INT DEFAULT 1;
      WHILE rows_deleted > 0 DO
        -- Assuming autocommit mode: each chunk commits on its own, so at
        -- this moment locks are released and other transactions have a
        -- chance to do some stuff.
        DELETE FROM notifications WHERE message_store_id = p_msg_id LIMIT 10000;
        SET rows_deleted = ROW_COUNT();
        -- This is a good place for DO SLEEP(5); to give other transactions
        -- more time to do their stuff before the next chunk gets deleted.
      END WHILE;
    END //
    DELIMITER ;

    CALL purge_message_store(12345);
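
For points 1 and 2, here is a minimal sketch of what the partitioned table could look like, assuming LIST partitioning with one partition per message_store_id value (partition names and id values here are made up for illustration):

    CREATE TABLE notifications (
      id               BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      user_id          BIGINT UNSIGNED NOT NULL,
      message_store_id BIGINT UNSIGNED NOT NULL,
      status           VARCHAR(10) COLLATE ascii_bin NOT NULL,
      sent_date        TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                         ON UPDATE CURRENT_TIMESTAMP,
      -- MySQL requires the partitioning column in every unique key,
      -- hence the composite primary key (the downside from point 1).
      PRIMARY KEY (id, message_store_id),
      KEY user_id (user_id, sent_date)
    ) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin
    PARTITION BY LIST (message_store_id) (
      PARTITION p12344 VALUES IN (12344),
      PARTITION p12345 VALUES IN (12345),
      PARTITION p12346 VALUES IN (12346)
    );

    -- Point 1: drop all records for one message_store_id almost instantly.
    ALTER TABLE notifications TRUNCATE PARTITION p12345;

    -- Point 2: a plain DELETE is pruned to, and only locks, one partition.
    DELETE FROM notifications WHERE message_store_id = 12345;

With HASH partitioning you would not have to enumerate the values, but a truncate would then also drop rows of other ids that happen to hash into the same partition, so LIST is the safer fit for point 1.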
    
answered Oct 12 '22 by t7ko