I am trying to transfer files over a high latency, high bandwidth link. Unfortunately, when I use rsync, my transfer speed uses only a fraction of the available bandwidth, and the total transfer takes much longer than I expected (i.e. longer than transfer time = total bytes / available bandwidth in bytes per second)!
What is the fastest way[1] to transfer files over a high latency and high bandwidth link?
So for example:
[1] i.e. utilize most of the available bandwidth
When using rsync in a high latency and high bandwidth situation, your per-connection transfer speed will be well below[1] your available bandwidth. For the example given, the expected transfer speed is about 56.25 KiB/s, or less than 10% of the available bandwidth.
One solution is to run N rsync processes in parallel:
#!/bin/bash
# tar up the files
tar -cvzf x.tar ${list_of_files}
# [optional]
# compute the md5sum
md5sum x.tar > x.tar.md5sum
# break the large tar file into N pieces (i.e. x.tar becomes x.tar.1 ... x.tar.N)
# (assumes GNU split and N <= 9 so the single-digit suffixes stay in order)
split --numeric-suffixes=1 --suffix-length=1 --number=${N} x.tar x.tar.
# start N `rsync` processes in parallel, remembering each PID
pids=()
for ((i=1;i<=N;i++)); do rsync -avzh x.tar.${i} ${destination} & pids+=($!); done

# wait for the transfers to finish
# (plain `wait` always returns 0, so check each rsync's exit status individually)
for pid in "${pids[@]}"; do wait ${pid} || { echo "fail"; exit 1; }; done
echo "success"
# stitch the N pieces back together into x.tar on the destination
# (x.tar.? matches the single-digit suffixes produced by `split` above)
ssh ${destination_machine} "cd ${path} && cat x.tar.? > x.tar"
# [optional... but gives everyone a nice warm and fuzzy]
# copy the md5sum and verify your files (even though `rsync` already did so)
scp x.tar.md5sum ${destination}
ssh ${destination_machine} "cd ${path} && md5sum -c x.tar.md5sum && echo 'PASS (files verified with md5sum)' || { echo 'FAIL (file verification failed md5sum)'; exit 1; }"
# done!
[1] Why is your transfer speed slow in this example?
In a word: bandwidth-delay product (three words actually)
This is an example of a high latency and high bandwidth link. Some might use a tool like rsync to transfer their data. If you run one instance of rsync (or anything similar that uses TCP or a TCP-like protocol), you won't utilize the available bandwidth.
The reason for the slowdown is the round-trip nature of TCP (and TCP-like protocols), which requires ACKs before more data can be sent. The problem is formally referred to as the bandwidth-delay product: the speed of each connection is limited more by the latency than by the bandwidth.
Specifically, for the example given, the theoretical speed will be about 56.25 KiB/s, or less than 10% of your available bandwidth.
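As a rough sanity check, a single TCP connection's throughput tops out at about the TCP window size divided by the round-trip time. The snippet below just evaluates that formula; the 64 KiB window and 500 ms RTT are illustrative assumptions, not the figures from the example above:
# per-connection TCP throughput ceiling ~ window_size / round_trip_time
# (illustrative values only -- substitute your own window size and RTT)
window_bytes=65536   # e.g. a 64 KiB TCP window
rtt_seconds=0.5      # e.g. 500 ms round trip time
awk -v win=${window_bytes} -v rtt=${rtt_seconds} \
    'BEGIN { printf "max per-connection throughput ~ %.1f KiB/s\n", win / rtt / 1024 }'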
The limitation is per connection, so using just one rsync for your file transfer will not fully utilize your bandwidth.
Solution 1:
Use a different program that doesn't rely on a TCP-like protocol but still guarantees delivery through other means (a quick Google search turns up something like uftp, which transfers the data over UDP instead of TCP). Unfortunately, uftp is still not in many distro repos as of this writing.
Solution 2:
Keep using one rsync and tune the TCP networking parameters (chiefly the TCP window/buffer sizes) on both sides, but this requires expert knowledge that I don't readily have available at the moment.
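For reference, the kind of tuning meant here is raising the Linux TCP buffer limits so the window can grow to roughly bandwidth * RTT. The values below are assumptions for illustration (a ~16 MiB ceiling), not a recommendation for your specific link; measure and adjust before making them permanent in /etc/sysctl.conf:
# illustrative only: allow TCP buffers to grow up to ~16 MiB (min, default, max in bytes)
# choose the max to be at least bandwidth * RTT for your link
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"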
Solution 3:
Run multiple rsync processes in parallel, as described at the beginning of this answer.