
Why does multithreaded file transfer improve performance?

RichCopy, a better-than-robocopy-with-GUI tool from Microsoft, seems to be the current tool of choice for copying files. One of its main features, highlighted in the TechNet article presenting the tool, is that it copies multiple files in parallel. In its default setting, three files are copied simultaneously, which you can see nicely in the GUI: [Progress: xx% of file A, yy% of file B, ...]. There are a lot of blog entries around praising this tool and claiming that it speeds up the copying process.

My question is: Why does this technique improve performance? As far as I know, when copying files on modern computer systems, the HDD is the bottleneck, not the CPU or the network. My assumption would be that copying multiple files at once makes the whole process slower, since the HDD needs to jump back and forth between different files rather than just sequentially streaming one file. Since RichCopy is faster, there must be some mistake in my assumptions...

Heinzi asked Nov 25 '09

People also ask

How does multithreading improve performance?

The ultimate goal of multithreading is to increase the computing speed of a computer and thus its performance. To this end, it tries to optimize CPU usage: rather than sticking with one task for a long time, even while that task is waiting on data, the system quickly switches to the next task.

Why is multithreading faster?

In many cases, multithreading gives excellent results for I/O-bound applications, because you can do multiple things in parallel rather than blocking your entire app while waiting for a single I/O operation (a short sketch of this follows below).

Does multithreading improve CPU performance?

On a single-core CPU, a single process (with no separate threads) is usually faster than any threaded version of it. Threads do not magically make your CPU go any faster; they just add extra work.
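To make the I/O-bound point above concrete, here is a minimal Python sketch; the file names and the simulated 0.2 s wait are made up for illustration. While one transfer blocks waiting on I/O, the other threads make progress, so the waits overlap instead of stacking up:

```python
# Minimal sketch of why threads help I/O-bound work: while one "transfer"
# blocks waiting on I/O, another can make progress. The file names and the
# simulated 0.2 s latency are assumptions for illustration only.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_transfer(name: str) -> str:
    time.sleep(0.2)          # stands in for a blocking read/write or network wait
    return name

files = [f"file_{i}" for i in range(10)]

start = time.perf_counter()
for f in files:              # sequential: the waits pile up one after another
    fake_transfer(f)
print(f"sequential: {time.perf_counter() - start:.2f}s")   # ~2.0 s

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:            # overlapped waits
    list(pool.map(fake_transfer, files))
print(f"threaded:   {time.perf_counter() - start:.2f}s")   # ~0.4 s
```

On a CPU-bound workload the threaded version would show no such win, which is exactly the point made in the last answer above.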


2 Answers

The tool is making use of improvements in hardware which can optimise multiple read and write requests much better.

When copying one file at a time, the hardware isn't going to know that the block of data currently passing under the read head (or nearby) will be needed for a subsequent read, since the software hasn't queued that request yet.

A single file copy these days is not a very taxing task for modern disk sub-systems. By giving these hardware systems more work to do at once, the tool is leveraging their improved optimising features.
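As a rough illustration of that idea (this is not RichCopy's actual implementation; the directory paths and the default of three workers are assumptions), a small thread pool keeps several copies in flight so the OS and drive always have a queue of requests to optimise:

```python
# Hedged sketch of RichCopy-style parallel copying: keep several copy
# operations in flight so the OS and disk can queue and reorder the
# underlying reads/writes. Paths and the worker count are assumptions.
import shutil
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor

def copy_tree_parallel(src: Path, dst: Path, workers: int = 3) -> None:
    files = [p for p in src.rglob("*") if p.is_file()]
    dst.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for f in files:
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            futures.append(pool.submit(shutil.copyfile, f, target))
        for fut in futures:
            fut.result()     # surface any copy errors; copies overlap in time

copy_tree_parallel(Path("source_dir"), Path("dest_dir"))
```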

AnthonyWJones answered Sep 19 '22

A naive "copy multiple files" application will copy one file, then wait for that to complete before copying the next one.

This will mean that an individual file CANNOT be copied in less than one network round trip, even if it is empty (0 bytes). Because the copy probably involves several file-server calls (open, write, close), this may be several times the latency.
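A back-of-the-envelope illustration of that arithmetic, with assumed numbers (a 10 ms round trip and three server calls per file):

```python
# Assumed numbers for illustration: a 10 ms round-trip link and three
# file-server calls (open, write, close) per file.
round_trip = 0.010      # seconds of network latency per server call
calls_per_file = 3      # open, write, close
n_files = 1000

sequential = n_files * calls_per_file * round_trip
print(f"sequential floor: {sequential:.0f} s")   # 30 s even for empty files

in_flight = 8           # files kept "on the wire" at once
print(f"~{sequential / in_flight:.1f} s with {in_flight} in flight (idealised)")
```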

To efficiently copy files, you want a server and client which use a sane protocol with pipelining; that is to say, the client does NOT wait for the first file to be saved before sending the next, and indeed several or many files may be "on the wire" at once.

Of course, doing that requires a custom server rather than an SMB (or similar) file server. For example, rsync does this and is very good at copying large numbers of files despite being single-threaded.
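Note that pipelining needs no threads at all. Here is a hedged sketch of the idea using a Python event loop; send_file is a hypothetical stand-in for one file-server exchange, and the 10 ms sleep fakes the round trip. A window of requests stays outstanding on a single thread, which mirrors how a single-threaded program like rsync can still keep the link busy:

```python
# Sketch of pipelining: the client keeps a window of requests outstanding
# instead of waiting for each file to be acknowledged. send_file() is a
# hypothetical stand-in for one file-server exchange.
import asyncio

async def send_file(name: str) -> None:
    await asyncio.sleep(0.01)       # stands in for one network round trip

async def copy_pipelined(files: list[str], window: int = 16) -> None:
    sem = asyncio.Semaphore(window)         # at most `window` files in flight

    async def one(name: str) -> None:
        async with sem:
            await send_file(name)

    await asyncio.gather(*(one(f) for f in files))

asyncio.run(copy_pipelined([f"file_{i}" for i in range(100)]))
```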

So my guess is that the multithreading helps because it is a work-around for the fact that the server doesn't support pipelining on a single session.

A single-threaded implementation which used a sensible protocol would be best in my opinion.

MarkR answered Sep 18 '22