We need to transfer 15TB of data from one server to another as fast as we can. We're currently using rsync but we're only getting speeds of around 150Mb/s, when our network is capable of 900+Mb/s (tested with iperf). I've done tests of the disks, network, etc. and figured it's just that rsync is only transferring one file at a time which is causing the slowdown.
I found a script that runs a separate rsync for each folder in a directory tree (and lets you limit how many run at once), but I can't get it working: it still just runs one rsync at a time.
I found the script here (copied below).
Our directory tree is like this:
/main
   - /files
      - /1
         - 343
            - 123.wav
            - 76.wav
         - 772
            - 122.wav
         - 55
            - 555.wav
            - 324.wav
            - 1209.wav
         - 43
            - 999.wav
            - 111.wav
            - 222.wav
      - /2
         - 346
            - 9993.wav
         - 4242
            - 827.wav
      - /3
         - 2545
            - 76.wav
            - 199.wav
            - 183.wav
         - 23
            - 33.wav
            - 876.wav
         - 4256
            - 998.wav
            - 1665.wav
            - 332.wav
            - 112.wav
            - 5584.wav
So what I'd like to happen is to create an rsync for each of the directories in /main/files, up to a maximum of, say, 5 at a time. So in this case, 3 rsyncs would run, for /main/files/1, /main/files/2 and /main/files/3.
I tried it like this, but it just runs 1 rsync at a time for the /main/files/2 folder:
#!/bin/bash

# Define source, target, maxdepth and cd to source
source="/main/files"
target="/main/filesTest"
depth=1
cd "${source}"

# Set the maximum number of concurrent rsync threads
maxthreads=5
# How long to wait before checking the number of rsync threads again
sleeptime=5

# Find all folders in the source directory within the maxdepth level
find . -maxdepth ${depth} -type d | while read dir
do
    # Make sure to ignore the parent folder
    if [ `echo "${dir}" | awk -F'/' '{print NF}'` -gt ${depth} ]
    then
        # Strip leading dot slash
        subfolder=$(echo "${dir}" | sed 's@^\./@@g')
        if [ ! -d "${target}/${subfolder}" ]
        then
            # Create destination folder and set ownership and permissions to match source
            mkdir -p "${target}/${subfolder}"
            chown --reference="${source}/${subfolder}" "${target}/${subfolder}"
            chmod --reference="${source}/${subfolder}" "${target}/${subfolder}"
        fi
        # Make sure the number of rsync threads running is below the threshold
        while [ `ps -ef | grep -c [r]sync` -gt ${maxthreads} ]
        do
            echo "Sleeping ${sleeptime} seconds"
            sleep ${sleeptime}
        done
        # Run rsync in background for the current subfolder and move on to the next one
        nohup rsync -a "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
    fi
done

# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -a --files-from=- --from0 ./ "${target}/"
Use rsync Archive Mode and Compression to Speed Up Transfers
Another way to save network bandwidth and speed up transfers is to use compression, by adding -z as a command line option.
Parallel rsync can be set up using a wrapper like this one: "[Multi-Stream-rsync] will split the transfer in multiple buckets while the source is scanned… The main limitation is it does not handle remote source or target directory, they must be locally accessible (local disk, nfs/cifs/other mountpoint)."
rsync is much faster than cp for this, because it will check file sizes and timestamps to see which ones need to be updated, and you can add more refinements. You can even make it do a checksum instead of the default 'quick check', although this will take longer.
If you are like me, you will have found through trial and error that multiple rsync sessions, each taking a specific range of files, will complete much faster. Rsync is not multithreaded, but for the longest time I sure wished it was.
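One way to give each rsync session its own range of files is to split the file list into buckets and hand each bucket to rsync with --files-from. The following is only a rough sketch of that idea, not a tested production script. The function names split_file_list and sync_buckets, the bucket.N file naming and the RSYNC override variable are all made up for illustration. It assumes file names contain no newlines and that the list directory lives outside the source tree.

```shell
#!/bin/sh
# split_file_list SRC LIST_DIR [BUCKETS]
# Writes LIST_DIR/bucket.0 ... bucket.(BUCKETS-1), each holding a share of the
# file paths under SRC (relative to SRC), dealt out round-robin.
split_file_list() (
    src=$1
    list_dir=$2
    buckets=${3:-4}
    mkdir -p "$list_dir" || exit 1
    # Resolve to an absolute path before changing directory.
    list_dir=$(cd "$list_dir" && pwd) || exit 1
    # Drop lists left over from an earlier run.
    rm -f "$list_dir"/bucket.*
    cd "$src" || exit 1
    find . -type f |
        awk -v n="$buckets" -v dir="$list_dir" \
            '{ print > (dir "/bucket." (NR % n)) }'
)

# sync_buckets SRC DEST LIST_DIR
# Starts one rsync per bucket file in the background and waits for all of them.
# RSYNC is a hypothetical override for dry runs, e.g. RSYNC="echo rsync".
sync_buckets() (
    src=$1
    dest=$2
    list_dir=$3
    for list in "$list_dir"/bucket.*; do
        # Paths in the list are read relative to SRC.
        ${RSYNC:-rsync} -a --files-from="$list" "$src"/ "$dest"/ &
    done
    # A bare wait does not report failures of individual jobs.
    wait
)
```

With these definitions, something like split_file_list /main/files /tmp/lists 5 followed by sync_buckets /main/files /main/filesTest /tmp/lists would start five rsync processes, each with roughly a fifth of the files. Splitting by file count does not balance by size, so buckets can still finish at different times.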
Updated answer (Jan 2020)
xargs is now the recommended tool to achieve parallel execution. It's pre-installed almost everywhere. For running multiple rsync tasks the command would be:
ls /srv/mail | xargs -n1 -P4 -I% rsync -Pa % myserver.com:/srv/mail/
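This will list all folders in /srv/mail and pipe them to xargs, which will read them one by one and run 4 rsync processes at a time. The % char is replaced by the input argument in each command call. Because ls prints bare names without the /srv/mail prefix, the command has to be run from inside /srv/mail.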
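For the layout in the question, a variant along the following lines might be more robust, since parsing ls output can trip over names containing quotes, backslashes or newlines. This is a sketch under assumptions rather than a drop-in replacement: parallel_rsync and the RSYNC override variable are hypothetical names, the destination directory is assumed to exist already, and -print0, -0 and -P are extensions that GNU and BSD find/xargs provide but that not every system has.

```shell
#!/bin/sh
# parallel_rsync SRC DEST [MAX_JOBS]
# Runs one rsync per top-level directory of SRC, at most MAX_JOBS at a time.
# DEST may be a local path or a remote target such as host:/path.
# RSYNC is a hypothetical override for dry runs, e.g. RSYNC="echo rsync".
parallel_rsync() (
    src=$1
    dest=$2
    max_jobs=${3:-4}
    # Work from inside SRC so every name find prints is relative to it.
    # The function body is a subshell, so the caller's directory is unchanged.
    cd "$src" || exit 1
    # -mindepth 1 skips "." itself; -print0 and -0 keep unusual names intact.
    # Files sitting directly in SRC are not covered by this pipeline.
    find . -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$max_jobs" -I{} ${RSYNC:-rsync} -a {}/ "$dest"/{}/
)
```

For the tree above this would be called as parallel_rsync /main/files /main/filesTest 5, which starts one rsync each for 1, 2 and 3. Setting RSYNC="echo rsync" first prints the commands instead of running them, which is a cheap way to check what would be launched.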
Original answer using parallel:
ls /srv/mail | parallel -v -j8 rsync -raz --progress {} myserver.com:/srv/mail/{}