
What is the fastest way to transfer files over high latency and high bandwidth link?

I am trying to transfer files over a high latency and high bandwidth link. Unfortunately, when I use rsync, the transfer uses only a fraction of my available bandwidth, so the total transfer time is much longer than I expected (i.e. transfer time = total bytes / available bandwidth in bytes per second)!

What is the fastest way[1] to transfer files over a high latency and high bandwidth link?

So for example:

  • latency (round trip time) is greater than 900 ms
  • bandwidth is 512 kbit/s

[1] i.e. utilize most of the available bandwidth

asked Jun 24 '16 13:06 by Trevor Boyd Smith


1 Answer

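When using rsync in a high latency and high bandwidth situation, your per-connection transfer speed will be slower[1] than your available bandwidth. For the example given, the bandwidth-delay product is 56.25 KiB: that much data has to be in flight, unacknowledged, to fill the link. A connection whose window is smaller than that is capped at window / round-trip time, which can be well under 10% of the available bandwidth.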

One solution is to run N rsync processes in parallel:

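#!/bin/bash
# Assumes N, list_of_files, destination (machine:path), destination_machine
# and path are already set, and that `split` is GNU split (for -n).

# tar up the files
tar -cvzf x.tar ${list_of_files}

# [optional]
# compute the md5sum
md5sum x.tar > x.tar.md5sum

# break the large tar file into N files (i.e. x.tar would become x.tar.part.00, x.tar.part.01, ...)
split -n "${N}" -d x.tar x.tar.part.

# start one `rsync` process per part, in parallel, and remember each PID
pids=()
for part in x.tar.part.*; do
  rsync -avzh "${part}" "${destination}" &
  pids+=($!)
done

# wait for the transfers to finish
# (a bare `wait` returns 0 no matter how the jobs exited, so check each PID)
status=0
for pid in "${pids[@]}"; do
  wait "${pid}" || status=1
done
if ((status == 0)); then echo "success"; else echo "fail"; exit 1; fi

# stitch the N files back together into x.tar on the destination
# (the fixed-width numeric suffixes make the glob expand in the right order)
ssh "${destination_machine}" "cd ${path} && cat x.tar.part.* > x.tar"

# [optional... but gives everyone a nice warm and fuzzy]
# copy the md5sum and verify your files (even though `rsync` already did so)
scp x.tar.md5sum "${destination}"
if ssh "${destination_machine}" "cd ${path} && md5sum -c x.tar.md5sum"; then
  echo 'PASS (files verified with md5sum)'
else
  echo 'FAIL (file verification failed md5sum)'
  exit 1
fi
# done!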

[1] Why is your transfer speed slow in this example?

In a word: bandwidth-delay product (three words actually)

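This is an example of a high latency and high bandwidth link. Some might use a tool like rsync to transfer their data. If you run one instance of rsync (or something similar that also uses TCP or a TCP-like protocol), you won't utilize the available bandwidth.

The reason for the slowdown has to do with the round-trip nature of TCP (or TCP-like protocols): the sender can only have a limited window of unacknowledged data in flight and must wait for ACKs before sending more. The amount of data needed in flight to keep the link full is formally referred to as the bandwidth-delay product (bandwidth × round-trip time). When the window is smaller than that, each connection's speed is limited by the latency more than by the bandwidth.

Specifically, for the example given, the bandwidth-delay product is 512 kbit/s × 0.9 s = 460.8 kbit, or 56.25 KiB. A connection whose window is smaller than that is capped at window / round-trip time, which can be well under 10% of your available bandwidth.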
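The arithmetic behind the 56.25 KiB figure can be sketched in bash. This is only an illustration: the function names are made up here, "kbit" is taken as 1000 bits, and the 16 KiB window in the second call is an assumed value, not something measured on the link in the question.

```shell
#!/bin/bash

# bdp_bytes BITS_PER_SECOND RTT_MS
# Bytes that must be in flight to keep the link full (bandwidth x round trip time).
bdp_bytes() {
  local bits_per_second=$1 rtt_ms=$2
  echo $(( bits_per_second * rtt_ms / 1000 / 8 ))
}

# window_limited_bps WINDOW_BYTES RTT_MS
# Throughput ceiling, in bits per second, for one connection with the given window.
window_limited_bps() {
  local window_bytes=$1 rtt_ms=$2
  echo $(( window_bytes * 8 * 1000 / rtt_ms ))
}

bdp_bytes 512000 900          # 57600 bytes, i.e. 56.25 KiB
window_limited_bps 16384 900  # 145635 bit/s with an assumed 16 KiB window
```

With that assumed window, one connection would top out at roughly 146 kbit/s of the 512 kbit/s available, whatever the link itself can carry.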

The limitation is per connection. So using just one rsync for your file transfer will not fully utilize your bandwidth.

Solution 1:

Use a different program that doesn't use a TCP-like protocol but still guarantees delivery of your data through other means. A quick Google search turns up something like uftp, which transfers the data via UDP instead of TCP. Unfortunately, uftp is still not in many distro repos as of this writing.

Solution 2:

Continue using one rsync and tune the TCP networking parameters on both sides (for example, the window and buffer sizes), but this requires expert knowledge that I don't readily have available at the moment.
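As a starting point for that kind of tuning, a rough check is whether the largest buffer TCP is allowed to use covers the bandwidth-delay product at all. The sketch below only does the comparison; the function name and the 32768-byte buffer are hypothetical, and the sysctl mentioned in the comment is the Linux one, so other systems will differ.

```shell
#!/bin/bash

# buffer_covers_bdp MAX_BUFFER_BYTES BDP_BYTES
# Reports whether a given buffer ceiling is large enough to hold one BDP of data.
buffer_covers_bdp() {
  local max_buffer_bytes=$1 bdp_bytes=$2
  if (( max_buffer_bytes >= bdp_bytes )); then
    echo "ok: buffer ${max_buffer_bytes} >= BDP ${bdp_bytes}"
  else
    echo "too small: buffer ${max_buffer_bytes} < BDP ${bdp_bytes}"
  fi
}

# On Linux, the receive-buffer ceiling could be read with something like:
#   sysctl -n net.ipv4.tcp_rmem | awk '{print $3}'
# Here a hypothetical 32768-byte ceiling is compared with the example's 57600-byte BDP.
buffer_covers_bdp 32768 57600
```

If the check reports "too small", raising the limits on both the sender and the receiver is the usual direction, but the right values depend on the systems involved.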

Solution 3:

Run multiple rsync processes in parallel, as described at the beginning of this answer.
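A rough way to pick N for the parallel approach is to divide the bandwidth-delay product by the window a single connection actually gets, rounding up. This is a back-of-the-envelope sketch: the function name is made up, and the 16 KiB per-connection window is an assumption for illustration only.

```shell
#!/bin/bash

# streams_needed BDP_BYTES WINDOW_BYTES
# Smallest number of connections whose combined windows cover the BDP (ceiling division).
streams_needed() {
  local bdp_bytes=$1 window_bytes=$2
  echo $(( (bdp_bytes + window_bytes - 1) / window_bytes ))
}

streams_needed 57600 16384   # 4 connections for the example link, given a 16 KiB window
```

In practice it is probably easier to measure: increase N until the total throughput stops improving.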

answered Sep 28 '22 19:09 by Trevor Boyd Smith