 

Changing Large MySQL InnoDB Tables

Adding a new column or a new index can take hours or even days for large InnoDB tables in MySQL with more than 10 million rows. What is the best way to improve performance on large InnoDB tables in these two cases? More memory, tweaking the configuration (for example, increasing sort_buffer_size or innodb_buffer_pool_size), or some kind of trick? Instead of altering a table directly, one could create a new one, change it, and copy the old data to the new, like this (an approach that is useful for MyISAM tables and for applying multiple changes at once):

    CREATE TABLE tablename_tmp LIKE tablename;
    ALTER TABLE tablename_tmp ADD fieldname fieldtype;
    INSERT INTO tablename_tmp SELECT * FROM tablename;
    ALTER TABLE tablename RENAME tablename_old;
    ALTER TABLE tablename_tmp RENAME tablename;

Is this advisable for InnoDB tables too, or is it just what the ALTER TABLE command does anyway?

asked Jul 12 '12 by 0x4a6f4672

1 Answer

Edit 2016: we recently (August 2016) released gh-ost, and I have modified my answer to reflect it.

Today there are several tools which allow you to do an online ALTER TABLE for MySQL. These are:

  • Edit 2016: gh-ost: GitHub's triggerless schema migration tool (disclaimer: I am the author of this tool)
  • oak-online-alter-table, as part of the openark-kit (disclaimer: I am the author of this tool)
  • pt-online-schema-change, as part of the Percona Toolkit
  • Facebook's online schema change for MySQL

Let's consider the "normal" `ALTER TABLE`:

A large table will take a long time to ALTER. innodb_buffer_pool_size is important, and so are other variables, but on a very large table their effect is negligible: the operation simply takes time.
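For completeness, here is a sketch of what adjusting the variables the question mentions could look like; the values are purely illustrative assumptions, not recommendations, and as noted above the gain on a very large table is marginal:

    -- sort_buffer_size is a per-connection buffer and can be changed at runtime:
    SET GLOBAL sort_buffer_size = 4 * 1024 * 1024;

    -- innodb_buffer_pool_size cannot be changed at runtime on MySQL of this era;
    -- it has to be raised in my.cnf (often to a large share of available RAM on
    -- a dedicated database server) followed by a server restart.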

What MySQL does to ALTER a table is create a new table with the new format, copy all rows into it, and then switch over. During this time the table is completely locked.

Consider your own suggestion:

It will most probably perform worst of all the options. Why? Because you're using an InnoDB table, the INSERT INTO tablename_tmp SELECT * FROM tablename runs as a single transaction: a huge one. It will create even more load than a normal ALTER TABLE.

Moreover, you will have to shut down your application for the duration so that it does not write (INSERT, DELETE, UPDATE) to your table. If it does, the copied data will be stale and the whole exercise is pointless.
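If one were to do a manual copy anyway, breaking the single giant INSERT ... SELECT into many small transactions would at least bound the transaction size. A minimal sketch, assuming an AUTO_INCREMENT primary key named id; note this still does nothing about concurrent writes, which the tools described next handle with triggers:

    INSERT INTO tablename_tmp SELECT * FROM tablename WHERE id BETWEEN 1    AND 1000;
    INSERT INTO tablename_tmp SELECT * FROM tablename WHERE id BETWEEN 1001 AND 2000;
    -- ... and so on, typically driven by a script that walks the key range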

What the online tools provide

The tools do not all work alike. However, the basics are shared:

  • They create a "shadow" table with the altered schema
  • They create and use triggers to propagate ongoing changes from the original table to the shadow table (see the sketch after this list)
  • They slowly copy all the rows from your table to the shadow table, in chunks: say, 1,000 rows at a time
  • They do all of the above while you are still able to access and manipulate the original table
  • When satisfied, they swap the two using a RENAME
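A minimal sketch of the trigger-and-swap mechanics, assuming a hypothetical table tablename(id INT PRIMARY KEY, data VARCHAR(255)); the real tools also handle UPDATE and DELETE triggers, locking, and throttling:

    CREATE TABLE tablename_shadow LIKE tablename;
    ALTER TABLE tablename_shadow ADD fieldname INT;

    -- propagate ongoing writes (INSERT shown; UPDATE/DELETE are analogous)
    CREATE TRIGGER tablename_ai AFTER INSERT ON tablename
    FOR EACH ROW
      REPLACE INTO tablename_shadow (id, data) VALUES (NEW.id, NEW.data);

    -- ... chunked row copy as described above, then one atomic swap:
    RENAME TABLE tablename TO tablename_old, tablename_shadow TO tablename;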

The openark-kit tool has been in use for 3.5 years now. The Percona tool is only a few months old, but possibly better tested than the former. Facebook's tool is said to work well for Facebook, but does not provide a general solution for the average user; I haven't used it myself.
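To give a flavor, a hypothetical pt-online-schema-change invocation might look like the following; mydb and tablename are placeholders, and the Percona Toolkit documentation covers the connection and safety options omitted here:

    pt-online-schema-change \
      --alter "ADD COLUMN fieldname INT" \
      D=mydb,t=tablename \
      --execute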

Edit 2016: gh-ost is a triggerless solution, which significantly reduces the write load on the master, decoupling the migration's write load from the normal load. It is auditable, controllable, and testable. We developed it internally at GitHub and released it as open source; we do all our production migrations via gh-ost today. See more here.
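A hypothetical gh-ost invocation, with host and schema names as placeholders (by default gh-ost connects to a replica, and it has many more flags; see its documentation):

    gh-ost \
      --host=replica.example.com \
      --database=mydb \
      --table=tablename \
      --alter="ADD COLUMN fieldname INT" \
      --execute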

Each tool has its own limitations; look closely at its documentation.

The conservative way

The conservative way is to use active-passive master-master replication: do the ALTER on the standby (passive) server, then switch roles and do the ALTER again on what used to be the active server, now turned passive. This is also a good option, but it requires an additional server and deeper knowledge of replication.
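A rough sketch of that sequence, with the failover mechanics themselves (VIP move, proxy switch, etc.) left out as they depend on your setup:

    -- 1. On the passive master: apply the change without writing it to the
    --    binlog, so it does not replicate back to the active master:
    SET SESSION sql_log_bin = 0;
    ALTER TABLE tablename ADD fieldname INT;
    SET SESSION sql_log_bin = 1;

    -- 2. Wait for the passive master to catch up on replication:
    SHOW SLAVE STATUS\G            -- Seconds_Behind_Master should reach 0

    -- 3. Swap roles, e.g. by toggling read_only on both sides:
    SET GLOBAL read_only = ON;     -- on the server being demoted
    SET GLOBAL read_only = OFF;    -- on the server being promoted

    -- 4. Repeat step 1 on the former active, now passive, server.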

answered Oct 06 '22 by Shlomi Noach