 

Optimising MySQL for parallel Import of massive data files. 1 Connection Per Table

I'm doing some preparatory work for a large website migration.

The database is around 10GB in size and several tables contain more than 15 million records. Unfortunately, the data only comes as a single large mysqldump file in SQL format, due to client relations outside my remit, but you know how that goes. My goal is to minimize downtime and hence import the data as fast as possible.

I have attempted to use the standard MySQL CLI interface like so:

$ mysql database_name < superhuge_sql_file -u username -p

This is, however, super slow.
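One thing I've seen suggested for the single-connection case is to relax session checks around the dump, along the lines of the sketch below (this assumes the tables are InnoDB and reuses the file and account names from above; I haven't benchmarked it on this data set):

# Load the dump in one transaction with FK/unique checks off,
# then commit everything at the end.
{
    echo "SET autocommit=0; SET unique_checks=0; SET foreign_key_checks=0;"
    cat superhuge_sql_file
    echo "COMMIT;"
} | mysql -u username -p database_name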

To try and speed things up I've used awk to split the file into per-table chunks (each table definition plus its data), and built a little shell script to import the tables in parallel, like so:

#!/bin/bash

# Split the dump into one file per table: start a new out_N file at each
# DROP TABLE statement and close the previous one.
awk '/DROP TABLE/{f=0; n++; print >(file="out_" n); close("out_" n-1)} f{print > file}; /DROP TABLE/{f=1}' superhuge.sql

# Kick off one import per chunk, all in the background.
for (( i = 1; i <= 95; i++ ))
do
    mysql -u admin --password=thepassword database_name < /path/to/out_$i &
done

# Wait for all background imports to finish.
wait

It's worth mentioning that this is a "use once and destroy" script (passwords in scripts, etc.).

Now, this works, but it still takes over three hours to complete on a quad-core server that is doing nothing else at present. The tables do import in parallel, but not all of them at once, and getting MySQL server information through the CLI is very slow while the import runs. I'm not sure why, but trying to access tables using the same mysql user account hangs while this is in progress, even though max_user_connections is unlimited.
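(By "server information" I mean things along these lines, which also take a long time to come back while the imports are running:)

# Example status checks run from a second shell during the import
mysql -u admin --password=thepassword -e "SHOW FULL PROCESSLIST"
mysql -u admin --password=thepassword -e "SHOW GLOBAL STATUS LIKE 'Threads_%'"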

I have set max_connections to 500 in my.cnf but have otherwise not configured MySQL on this server.

I've had a good hunt around, but I'm wondering if there are any MySQL config options that will help speed this process up, or any other methods I've missed that would be quicker.
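For example, would tuning along these lines make sense for an import-heavy window? The values below are only illustrative guesses for a dedicated quad-core box, not settings I have applied:

[mysqld]
innodb_buffer_pool_size        = 4G    # as much spare RAM as the box allows
innodb_log_file_size           = 512M  # larger redo log means fewer checkpoints during the load
innodb_flush_log_at_trx_commit = 2     # relax per-commit flushing for the duration of the import
max_allowed_packet             = 256M  # mysqldump emits very large multi-row INSERTs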

asked Mar 15 '11 by MentalAgeOf2


1 Answer

If you can use GNU parallel, check this example from the wardbekker gist:

# Split MYSQL dump file
zcat dump.sql.gz | awk '/DROP TABLE IF EXISTS/{n++}{print >"out" n ".sql" }'
# Parallel import using GNU Parallel http://www.gnu.org/software/parallel/
ls -rS *.sql | parallel --joblog joblog.txt mysql -uXXX -pYYY db_name "<"

This splits the big file into separate SQL files and then imports them in parallel.

So, to run 10 jobs at a time with GNU parallel, you can run:

ls -rS data.*.sql | parallel -j10 --joblog joblog.txt mysql -uuser -ppass dbname "<"
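If some chunks fail partway through, the same job log can be used to re-run only the failed imports; this is an extra step on top of the gist, relying on GNU parallel's --retry-failed option:

# Re-run just the jobs recorded as failed in joblog.txt
parallel --retry-failed --joblog joblog.txt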

On OS X, the split step can be:

gunzip -c wiebetaaltwat_stable.sql.gz | awk '/DROP TABLE IF EXISTS/{n++}{filename = "out" n ".sql"; print > filename}'

Source: wardbekker/gist:964146


Related: Import sql files using xargs at Unix.SE

answered Oct 08 '22 by kenorb