InnoDB indexes before and after importing

Tags: sql, mysql, innodb

I'm trying to import a large SQL file generated by mysqldump for an InnoDB table, but it is taking a very long time, even after adjusting some parameters in my.cnf and disabling AUTOCOMMIT, FOREIGN_KEY_CHECKS, and UNIQUE_CHECKS (though the table has no foreign or unique keys). I'm wondering if the import is so slow because of the several indexes on the table.
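For reference, this is roughly how I'm wrapping the import (dump.sql stands in for my actual file):

```sql
-- Session settings toggled around the import, restored afterward
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
SOURCE dump.sql;
COMMIT;
SET unique_checks=1;
SET foreign_key_checks=1;
```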

Looking at the SQL file, it appears that the indexes are created in the CREATE TABLE statement, prior to inserting all the data. Based on my (limited) research and personal experience, I've found that it's faster to add indexes after inserting all the data. Doesn't it have to check the indexes for every INSERT? I know that mysqldump has a --disable-keys option that does exactly that: disables the keys prior to inserting. But apparently this only works with MyISAM tables, not InnoDB.
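For what it's worth, this is the shape of what --disable-keys emits around the inserts in a MyISAM dump (mytable is a placeholder name):

```sql
/*!40000 ALTER TABLE `mytable` DISABLE KEYS */;
-- ... all the INSERT statements ...
/*!40000 ALTER TABLE `mytable` ENABLE KEYS */;
```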

But why couldn't mysqldump omit the keys from the CREATE TABLE statement for InnoDB tables, and then run an ALTER TABLE to add them after all the data is inserted? Or does InnoDB work differently, so that there is no speed difference?
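In other words, I was imagining dump output shaped like this (hypothetical table and column names):

```sql
CREATE TABLE t (
  id INT NOT NULL,
  col1 VARCHAR(50),
  PRIMARY KEY (id)                          -- clustered index must stay
) ENGINE=InnoDB;

INSERT INTO t VALUES (1, 'a'), (2, 'b');    -- ... all the data ...

ALTER TABLE t ADD INDEX idx_col1 (col1);    -- secondary index added last
```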

Thanks!

Altherat asked Jun 21 '18


1 Answer

I experimented with this concept a bit at a past job, where we needed a fast method of copying schemas between MySQL servers.

There is indeed a performance overhead when you insert into tables that have secondary indexes. Inserts need to update the clustered index (aka the table), and also update the secondary indexes. The more indexes a table has, the more overhead each insert incurs.

InnoDB has a feature called the change buffer which helps a bit by postponing secondary index updates, but the buffered changes have to be merged eventually.
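If you want to check or tune this on your own server, the relevant server variable is innodb_change_buffering (this is stock MySQL, nothing exotic):

```sql
-- See whether change buffering is enabled, and for which operations
SHOW VARIABLES LIKE 'innodb_change_buffering';

-- Buffering can be restricted or disabled; 'all' is the default
SET GLOBAL innodb_change_buffering = 'all';
```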

Inserts to a table with no secondary indexes are faster, so it's tempting to try to defer index creation until after your data is loaded, as you describe.

Percona Server, a branch of MySQL, experimented with a mysqldump --optimize-keys option. With this option, mysqldump outputs CREATE TABLE with no secondary indexes, then all the INSERTs, then an ALTER TABLE to add the indexes after the data is loaded. See https://www.percona.com/doc/percona-server/LATEST/management/innodb_expanded_fast_index_creation.html
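Invocation is just the extra flag on Percona Server's mysqldump (database and table names below are placeholders):

```sh
# Requires Percona Server's mysqldump, not the stock MySQL one
mysqldump --optimize-keys mydb mytable > dump.sql
```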

But in my experience, the net improvement in performance was small. It still takes a while to insert a lot of rows, even into a table with no indexes, and then the restore has to run an ALTER TABLE to build the indexes, which takes a while for a large table. Counting the time for the INSERTs plus the extra time to build indexes, it's only a few (low single-digit) percent faster than inserting the traditional way, into a table with its indexes in place.

Another benefit of this post-load index creation is that the indexes are stored more compactly; if you need to save disk space, that's a better reason to use this technique.

I found it much more beneficial to performance to restore by loading several tables in parallel.

  • The newer MySQL tool mysqlpump (introduced in MySQL 5.7) supports multi-threaded dump.
  • The open-source tool mydumper supports multi-threaded dump, and also has a multi-threaded restore tool called myloader. The worst downside of mydumper/myloader is that the documentation is virtually non-existent, so you need to be an intrepid power user to figure out how to run them (see the sketch after this list).
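A minimal sketch of both (database name, thread count, and paths are placeholders; check each tool's --help for your version):

```sh
# mysqlpump: parallel dump (the restore of its SQL output is still single-threaded)
mysqlpump --default-parallelism=4 mydb > dump.sql

# mydumper/myloader: parallel dump and parallel restore
mydumper --database=mydb --threads=4 --outputdir=/backups/mydb
myloader --directory=/backups/mydb --threads=4
```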

Another strategy is to use mysqldump --tab to dump delimited data files instead of SQL scripts. Bulk-loading data files is much faster than executing SQL scripts to restore the same data. To be precise, --tab writes, for each table, a .sql file with the table definition and a tab-delimited text file with the data. You recreate the tables by running all the .sql files (this is quick), then use mysqlimport to load the data files. The mysqlimport tool even has a --use-threads option for parallel execution.
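A sketch of that workflow (the dump directory is a placeholder, and it must be writable by the mysqld process, since the server itself writes the data files):

```sh
# Dump: one .sql (definition) and one .txt (tab-delimited data) per table
mysqldump --tab=/tmp/dump mydb

# Restore: recreate the tables, then bulk-load the data files in parallel
cat /tmp/dump/*.sql | mysql mydb
mysqlimport --use-threads=4 mydb /tmp/dump/*.txt
```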

Test carefully with different numbers of parallel threads. My experience is that 4 threads is the best. With greater parallelism, InnoDB becomes a bottleneck. But your experience may be different, depending on the version of MySQL and your server hardware's performance capacity.

The fastest restore method of all is to use a physical backup tool; the most popular is Percona XtraBackup. This allows fast backups and even faster restores: the backed-up files are literally ready to be copied into place and used as live tablespace files. The downside is that you must shut down your MySQL Server to perform the restore.
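A rough outline of a backup/restore cycle with XtraBackup (paths and service names are placeholders; exact flags vary by version, so check the Percona docs):

```sh
# Take and prepare a full backup while the server is running
xtrabackup --backup --target-dir=/backups/full
xtrabackup --prepare --target-dir=/backups/full

# Restore: the server must be stopped and the data directory empty
systemctl stop mysql
xtrabackup --copy-back --target-dir=/backups/full
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql
```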

Bill Karwin answered Oct 19 '22