This is a basic question but I'd like to have someone with some more networking experience provide a more comprehensive answer.
Let's say I have 3 files on an external server that are 1GB each. And to download them I would do:
$ wget https://server.com/file1.mov
In terms of doing the three downloads in parallel (in three separate tabs/shells/threads, for example), or doing them in series, such as:
$ wget https://server.com/file1.mov \
&& wget https://server.com/file2.mov \
&& wget https://server.com/file3.mov
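For completeness, the parallel variant can also be launched from a single shell by backgrounding each job. A minimal sketch, with the actual wget call stubbed out as a hypothetical fetch function so it runs without a network (server.com is the question's placeholder host):

```shell
# Run the three transfers as background jobs and wait for all of them.
# fetch() is a stand-in for: wget "https://server.com/$1"
outdir=$(mktemp -d)

fetch() {
    printf 'would fetch https://server.com/%s\n' "$1" > "$outdir/$1.log"
}

for f in file1.mov file2.mov file3.mov; do
    fetch "$f" &   # "&" starts each transfer in its own background job
done
wait               # blocks until all three jobs have exited
```

With real wget calls the pattern is identical: wait returns once every background download has finished, which mirrors the three-tabs setup described above.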
Under the following circumstances:
For the first case, it seems obvious that we'd want to use parallel downloads if there is what amounts to a fixed per-connection overhead on the external server, but for the other two scenarios, why would one approach be better than the other?
In all three cases the actual results depend on a number of limiting factors. You’ve listed download and upload bandwidth, but the bottleneck can also be disk I/O, CPU, or RAM, as well as the data transfer protocol and a number of other factors.
In general, segmented file transfer is the preferable way to obtain data, as it reduces the impact of TCP congestion control (assuming we use TCP) and of a heterogeneous environment. As described in the paper “Applied Techniques for High Bandwidth Data Transfers across Wide Area Networks” (Jason Lee, Dan Gunter, Brian Tierney):
“TCP probes the available bandwidth of the connection by continuously increasing the window size until a packet is lost, at which point it cuts the window in half and starts “ramping up” the connection again. The higher the bandwidth-delay product, the longer this ramp up will take, and less of the available bandwidth will be used during its duration.”
And
“In order to improve this situation where the network becomes the bottleneck, parallel streams can be used. This technique is implemented by dividing the data to be transferred into N portions and transferring each portion with a separate TCP connection. The effect of N parallel streams is to reduce the bandwidth-delay product experienced by a single stream by a factor of N because they all share the single-stream bandwidth (u). Random packet losses for reasonable values of q (<0.001) will usually occur in one stream at a time, therefore their effect on the aggregate throughput will be reduced by a factor of N. When competing with connections over a congested link, each of the parallel streams will be less likely to be selected for having their packets dropped, and therefore the aggregate amount of potential bandwidth which must go through premature congestion avoidance or slow start is reduced.”
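The “dividing the data to be transferred into N portions” step quoted above can be sketched with plain shell arithmetic. The file size and stream count below are made-up example values, and each range would be fetched with a tool that supports HTTP Range requests (e.g. curl -r), assuming the server honors them:

```shell
# Split a transfer of `size` bytes into `n` contiguous byte ranges,
# one per parallel stream. Example numbers, not measurements.
size=1000000   # assumed total file size in bytes
n=4            # number of parallel streams
chunk=$(( (size + n - 1) / n ))   # ceiling division so the ranges cover the whole file
ranges=""
i=0
while [ "$i" -lt "$n" ]; do
    start=$(( i * chunk ))
    end=$(( start + chunk - 1 ))
    [ "$end" -ge "$size" ] && end=$(( size - 1 ))   # clamp the last range
    ranges="${ranges}${ranges:+ }$start-$end"
    # in real use, each range gets its own connection, e.g.:
    #   curl -s -r "$start-$end" -o "part$i" "$url" &
    i=$(( i + 1 ))
done
echo "$ranges"   # 0-249999 250000-499999 500000-749999 750000-999999
```

After all parts arrive, they are concatenated back in order (cat part0 part1 part2 part3 > file), which is essentially what segmented download managers do for you.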
It’s worth mentioning that congestion control algorithms play a significant role in bandwidth allocation, but they still need to be chosen according to the network “class”. Broadband, satellite, 3G, Wi-Fi: they all have characteristics dictated by the physical environment, and congestion window (CWND) implementations perform differently on each.
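On Linux you can see which congestion control algorithm the kernel is using through its /proc interface; a quick check, assuming a Linux host where this path exists:

```shell
# Report the TCP congestion control algorithm the kernel is using
# (Linux exposes it under /proc; typical values: cubic, bbr, reno).
cc=/proc/sys/net/ipv4/tcp_congestion_control
if [ -r "$cc" ]; then
    algo=$(cat "$cc")
else
    algo="unknown (not a Linux host, or /proc is not mounted)"
fi
echo "TCP congestion control: $algo"
```

Switching the algorithm (e.g. `sysctl -w net.ipv4.tcp_congestion_control=bbr` as root) changes how the CWND grows, which is exactly where the per-network-class differences mentioned above show up.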
Another thing to consider is the behavior of parallel versus serial data transfer on congested networks. Theoretically, the more data you transfer, the higher the chance of clogging the network and triggering policies on the ISP side that start dropping or shaping connections. However, even in this case the probability of getting data through is higher with several small parallel links than with one big connection.
Again, this is a very generic explanation, and connection throughput may vary significantly depending on a variety of factors. It’s also possible to find yourself in a situation where a single connection performs on par with a multi-threaded transfer while requiring no extra effort or code to implement.