 

Speed up rsync with Simultaneous/Concurrent File Transfers?

We need to transfer 15TB of data from one server to another as fast as we can. We're currently using rsync but we're only getting speeds of around 150Mb/s, when our network is capable of 900+Mb/s (tested with iperf). I've done tests of the disks, network, etc. and figured it's just that rsync only transfers one file at a time, and that's what's causing the slowdown.

I found a script to run a different rsync for each folder in a directory tree (allowing you to limit it to x number at once), but I can't get it working; it still just runs one rsync at a time.

I found the script here (copied below).

Our directory tree is like this:

/main
   - /files
      - /1
         - 343
            - 123.wav
            - 76.wav
         - 772
            - 122.wav
         - 55
            - 555.wav
            - 324.wav
            - 1209.wav
         - 43
            - 999.wav
            - 111.wav
            - 222.wav
      - /2
         - 346
            - 9993.wav
         - 4242
            - 827.wav
      - /3
         - 2545
            - 76.wav
            - 199.wav
            - 183.wav
         - 23
            - 33.wav
            - 876.wav
         - 4256
            - 998.wav
            - 1665.wav
            - 332.wav
            - 112.wav
            - 5584.wav

So what I'd like to happen is to create an rsync for each of the directories in /main/files, up to a maximum of, say, 5 at a time. So in this case, 3 rsyncs would run, for /main/files/1, /main/files/2 and /main/files/3.

I tried it like this, but it still just runs 1 rsync at a time, for the /main/files/2 folder:

#!/bin/bash

# Define source, target, maxdepth and cd to source
source="/main/files"
target="/main/filesTest"
depth=1
cd "${source}"

# Set the maximum number of concurrent rsync threads
maxthreads=5
# How long to wait before checking the number of rsync threads again
sleeptime=5

# Find all folders in the source directory within the maxdepth level
find . -maxdepth ${depth} -type d | while read dir
do
    # Make sure to ignore the parent folder
    if [ `echo "${dir}" | awk -F'/' '{print NF}'` -gt ${depth} ]
    then
        # Strip leading dot slash
        subfolder=$(echo "${dir}" | sed 's@^\./@@g')
        if [ ! -d "${target}/${subfolder}" ]
        then
            # Create destination folder and set ownership and permissions to match source
            mkdir -p "${target}/${subfolder}"
            chown --reference="${source}/${subfolder}" "${target}/${subfolder}"
            chmod --reference="${source}/${subfolder}" "${target}/${subfolder}"
        fi
        # Make sure the number of rsync threads running is below the threshold
        while [ `ps -ef | grep -c [r]sync` -gt ${maxthreads} ]
        do
            echo "Sleeping ${sleeptime} seconds"
            sleep ${sleeptime}
        done
        # Run rsync in background for the current subfolder and move on to the next one
        nohup rsync -a "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
    fi
done

# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -a --files-from=- --from0 ./ "${target}/"
BT643 asked Jun 05 '14


People also ask

How do I speed up rsync transfer?

Use rsync archive mode and compression. One way to save network bandwidth and speed up transfers is to use compression, by adding -z as a command-line option.
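As a minimal sketch of what that looks like (using the question's source path; the destination host and user are placeholders):

# -a (archive mode) preserves permissions, times and symlinks; -z compresses data in transit.
# Destination host is hypothetical.
rsync -az /main/files/ user@destserver:/main/files/

Compression mostly pays off for compressible data over slower links; on a fast LAN the extra CPU work can even slow the transfer down.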

Can you run rsync in parallel?

Parallel rsync can be set up using a wrapper like this one: "[Multi-Stream-rsync] will split the transfer in multiple buckets while the source is scanned… The main limitation is it does not handle remote source or target directory, they must be locally accessible (local disk, nfs/cifs/other mountpoint)."
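For example, an invocation of the msrsync wrapper mentioned above would look roughly like this. This is only a sketch: the -p (worker process count) flag and the paths are assumptions to double-check against msrsync --help.

# Hypothetical local-to-local copy using 4 parallel rsync workers.
# Per the limitation quoted above, both source and target must be local paths
# (local disk or an NFS/CIFS mountpoint), not a remote rsync/ssh target.
msrsync -p 4 /main/files/ /mnt/destination/files/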

Which is faster rsync or cp?

rsync is much faster than cp for this, because it will check file sizes and timestamps to see which ones need to be updated, and you can add more refinements. You can even make it do a checksum instead of the default 'quick check', although this will take longer.
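For illustration (hypothetical local paths), the default quick check versus a full checksum comparison look like this:

# Quick check (default): skip files whose size and modification time already match.
rsync -a /main/files/ /backup/files/
# Checksum mode (-c): hash every file on both sides; slower, but catches changes
# that left size and mtime untouched.
rsync -ac /main/files/ /backup/files/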

Is rsync single threaded?

If you are like me, you will have found through trial and error that multiple rsync sessions, each taking a specific range of files, will complete much faster. Rsync is not multithreaded, but for the longest time I sure wished it was.
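Applied to the directory layout from the question, that trial-and-error approach is simply a few backgrounded rsync sessions, one per top-level folder (the destination host is a placeholder):

# One rsync per top-level directory, all running at once; wait for all to finish.
rsync -a /main/files/1/ user@destserver:/main/files/1/ &
rsync -a /main/files/2/ user@destserver:/main/files/2/ &
rsync -a /main/files/3/ user@destserver:/main/files/3/ &
wait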


1 Answer

Updated answer (Jan 2020)

xargs is now the recommended tool to achieve parallel execution. It's pre-installed almost everywhere. For running multiple rsync tasks, the command would be:

ls /srv/mail | xargs -n1 -P4 -I% rsync -Pa % myserver.com:/srv/mail/ 

This will list all folders in /srv/mail and pipe them to xargs, which will read them one by one and run 4 rsync processes at a time. The % character is replaced by the input argument in each command call.
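If the folder names may contain spaces or other special characters, a slightly more robust variant of the same idea (a sketch, reusing the answer's paths) is to feed xargs NUL-delimited paths from find instead of parsing ls:

# -print0 / -0 keep names with spaces intact; -P4 still runs 4 rsyncs at a time.
find /srv/mail -mindepth 1 -maxdepth 1 -type d -print0 \
    | xargs -0 -P4 -I% rsync -Pa % myserver.com:/srv/mail/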

Original answer using parallel:

ls /srv/mail | parallel -v -j8 rsync -raz --progress {} myserver.com:/srv/mail/{} 
Manuel Riel answered Sep 24 '22