
Improving performance of MySQL LOAD DATA INFILE

I'm trying to bulk load around 12m records into an InnoDB table in a (local) MySQL instance using LOAD DATA INFILE (from CSV), and it's taking a very long time to complete.

The primary key type is UUID and the keys are unsorted in the data files.

I've split the data file into files containing 100000 records each and import them as:

mysql -e 'ALTER TABLE customer DISABLE KEYS;'
for file in *.csv; do
    # One session per file: relax checks, load the chunk, commit once at the end
    mysql -e "SET sql_log_bin=0; SET FOREIGN_KEY_CHECKS=0; SET UNIQUE_CHECKS=0;
    SET AUTOCOMMIT=0; LOAD DATA INFILE '${file}' INTO TABLE customer
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'; COMMIT;"
done

This works fine for the first few hundred thousand records, but then the insert time for each subsequent load keeps growing (from around 7 seconds to around 2 minutes per load before I killed it).

I'm running on a machine with 8GB RAM and have set the InnoDB parameters to:

innodb_buffer_pool_size = 1024M
innodb_additional_mem_pool_size = 512M
innodb_log_file_size = 256M
innodb_log_buffer_size = 256M

I've also tried loading a single CSV containing all rows with no luck - this ran in excess of 2 hours before I killed it.

Is there anything else that could speed this up as this seems like an excessive time to only load 12m records?

Michael asked Jan 09 '12 15:01



2 Answers

If you know the data is "clean", then you can drop the indexes on the affected tables prior to the import and re-add them after it is complete.

Otherwise, every inserted record forces each index to be updated, and if you have a bunch of indexes, this can REALLY slow things down.
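As a rough sketch of the drop-and-rebuild idea, the helpers below only print the SQL for each phase so it can be reviewed before being piped to the client. The table name customer comes from the question; the index names idx_email and idx_last_name and their columns are hypothetical placeholders. On InnoDB this applies to secondary indexes only, since the primary key is the clustered index the rows are stored in.

```shell
#!/bin/sh
# Sketch only: emit SQL to drop secondary indexes before a bulk load
# and to rebuild them afterwards.
# ASSUMPTION: idx_email / idx_last_name and the columns email / last_name
# are hypothetical; substitute the real secondary indexes of your table.

pre_load_sql() {
    # Run before the load: drop the secondary indexes in one ALTER
    cat <<'SQL'
ALTER TABLE customer DROP INDEX idx_email, DROP INDEX idx_last_name;
SQL
}

post_load_sql() {
    # Run after the load: rebuild the same indexes in one pass
    cat <<'SQL'
ALTER TABLE customer ADD INDEX idx_email (email), ADD INDEX idx_last_name (last_name);
SQL
}
```

Typical use would be something like `pre_load_sql | mysql yourdb`, then the LOAD DATA loop, then `post_load_sql | mysql yourdb`, where yourdb stands in for the actual schema name.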

cdeszaq answered Oct 08 '22 08:10



It's always hard to tell what causes a performance issue, but here are my 2 cents.

Your key is a UUID, which is randomly distributed, and that makes the index expensive to maintain. InnoDB stores rows in primary key order (the clustered index), with neighbouring keys sharing a page on disk. When consecutive rows carry random UUIDs, each insert lands on a different page, so pages are constantly read from and written back to disk and the cache does little good once the table outgrows it. That would also explain why each load gets slower as the table grows.

I don't know if you can change the key, but you could try sorting the rows by UUID in the input file and see if that helps.

FYI, to understand this issue better I would take a look at this blog post and maybe read the book High Performance MySQL; it has a nice chapter on the InnoDB clustered index. Good luck!
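A minimal sketch of the sorting step, assuming the UUID is the first comma-separated column, the files have no header row, and no field contains an embedded newline. The function name sort_by_key is made up for illustration. It sorts in plain byte order, which should match the index order when the UUIDs are stored as consistently cased text; with a different storage format or collation the order may differ.

```shell
#!/bin/sh
# Sketch only: sort a CSV by its first column so rows arrive in key order.
# ASSUMPTIONS: UUID is column 1, no header line, no embedded newlines.

sort_by_key() {
    # $1 = input CSV, $2 = output CSV
    # LC_ALL=C forces byte-wise comparison, independent of the locale
    LC_ALL=C sort -t, -k1,1 "$1" > "$2"
}
```

For example, `sort_by_key all.csv sorted.csv` followed by `split -l 100000 sorted.csv chunk_` would recreate 100000-row chunks that are each in key order; all.csv, sorted.csv and the chunk_ prefix are placeholder names.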

Assaf Karmon answered Oct 08 '22 07:10
