Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Throttle ALTER TABLE disk utilization

I'll start off with something from the MySQL Online DDL Limitations page:

There is no mechanism to pause an online DDL operation or to throttle I/O or CPU usage for an online DDL operation.

However, I'm still interested in solutions that I might have missed.

The situation: Indexes are getting larger and larger, and they're getting so large that there won't be enough memory for the queries that are used, causing disk I/O to skyrocket, and everything to descend into utter chaos. New composite indexes have been created that are smaller, but the problem is running the ALTER TABLE without breaking anything.

The facts are as follows:

  1. It's an InnoDB table.
  2. The table has no primary key or unique index.
  3. No combination of columns is suited as a primary key or unique index.
  4. The table has no foreign keys.
  5. The table is partitioned per month (currently 50).
  6. The table must accept writes at all times.
  7. The newest 3-6 partitions must accept reads.
  8. There is an id column, but this is not unique.
  9. The table consists of approximately 2 billion rows.
  10. The partition of the current month is the only one that receives writes.
  11. Partitions are made 1 month in advance; there's always one empty partition.

The SHOW CREATE TABLE (I didn't include all partitions):

CREATE TABLE `my_wonky_table` (
  `id` bigint(20) unsigned NOT NULL,
  `login` varchar(127) DEFAULT NULL,
  `timestamp` int(10) unsigned NOT NULL,
  `ip` varchar(32) CHARACTER SET ascii DEFAULT NULL,
  `val_1` int(10) unsigned DEFAULT NULL,
  `val_2` varchar(127) DEFAULT NULL,
  `val_3` varchar(255) DEFAULT NULL,
  `val_4` varchar(127) DEFAULT NULL,
  `val_5` int(10) unsigned DEFAULT NULL,
  KEY `my_wonky_table_id_idx` (`id`),
  KEY `my_wonky_table_timestamp_idx` (`timestamp`),
  KEY `my_wonky_table_val_1_idx` (`val_1`,`id`),
  KEY `my_wonky_table_val_2_idx` (`val_2`,`id`),
  KEY `my_wonky_table_val_4_idx` (`val_4`,`id`),
  KEY `my_wonky_table_val_5_idx` (`val_5`,`id`),
  KEY `my_wonky_table_ip_idx` (`ip`,`id`),
  KEY `my_wonky_table_login_idx` (`login`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`id`)
(PARTITION pdefault VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

Regarding queries: It's always a SELECT on id, with everything else being used to filter.

What I would like to avoid:

  • Turning off the database instance.
  • Disk I/O of 100%

I've thought of using the pt-online-schema-change tool to throttle, but ran into the no primary key wall. A different solution would be to do this in code, effectively moving the triggers to the code base, and slowly copying over data using somewhat weird chunks (e.g. chunks of an hour's worth of data using a timestamp column) because there's no unique index.

Are there other solutions and/or tools available?

like image 828
Aeveus Avatar asked Apr 06 '17 13:04

Aeveus


2 Answers

I present this as a separate Answer, since the innermost part is totally different.

As with my other answer, you need the new table with new indexes, plus a script to copy all the data over. However, the meat is to simulate the Trigger in your application.

Fortunately, you have id, even though it is not the PRIMARY KEY. And, even if it is not UNIQUE, it can be used (assuming you don't have thousands of rows with the same id -- if you do, we can talk further).

The "copy script" and the application talk to each other.

The copy script is in a long loop:

  • SELECT GET_LOCK('copy', 5), high_water_mark FROM tbl; -- (or some other timeout)
  • Copy rows with id BETWEEN high_water_mark AND high_water_mark + 999.
  • UPDATE tbl SET high_water_mark = high_water_mark + 1000;
  • Pause briefly (1 second?)
  • Loop until no more ids

The application, when reading, continues to read from the old table. But when writing, it does:

  • SELECT GET_LOCK('copy', 5), high_water_mark FROM tbl; -- (or some other timeout)
  • If timed out, something needs fixing.
  • Write to old table -- (hence, reads continue to work)
  • If id <= high_water_mark, write to new table also.
  • SELECT RELEASE_LOCK('copy');

Monitor the progress. At some point, you will need to stop everything, copy the last few rows and do the RENAME TABLE.

I don't know your optimal values for timeouts, sleep, or chunk size. But I don't think it is wise for chunk size to be bigger than 1K.

This technique has advantages for a variety of changes you might need to do in the future, so keep the guts in place.

like image 141
Rick James Avatar answered Oct 24 '22 07:10

Rick James


This is going to come down to what MySQL variant & version you are using, but if it's one thread per connection (my.cnf thread_handling=one-thread-per-connection, which might be default on your build), and you can put your ALTER TABLE workload in a new connection, then workload is a unique PID, and you can use ionice/renice on it.

I have somewhat of a crappy answer, but it's less invasive than the other options.

If you look at ps -eLf |grep mysql you can see the threads/lightweight-processes, and just need to figure out what PID belongs to your specific connection. If you connect via TCP, you could match your local connection port and map that against lsof to find the specific thread. Other ways are possible w/ strace, systemtap and more, or running an initial query you can watch for.

After that, you can use ionice/renice to affect the PID on the system. You will really want to make sure you capture what PID it was, and reset the nice & priority level afterwards, to not affect anything else.

As with the others, you really need to reshape this table in the long term. Partitions are useful, but not the endgame, since you're running 1.3TiB of online data, and you state that you only need to read from the most recent 3-6 partitions. Coming from MySQL before native partitions were added, I think this would be a good case for a VIEW and separate tables (atomically update the VIEW when you need to rollover). It would also let you trivially move some older tables to offline storage.

like image 44
robbat2 Avatar answered Oct 24 '22 09:10

robbat2