
Does multithreading make sense for IO-bound operations?

When performing many disk operations, does multithreading help, hinder, or make no difference?

For example, when copying many files from one folder to another.

Clarification: I understand that when other operations are performed, concurrency will obviously make a difference. If the task were to open an image file, convert it to another format, and then save it, the disk operations could be performed concurrently with the image manipulation. My question is whether, when the only operations performed are disk operations, concurrently queuing and responding to disk operations is better.
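For concreteness, here is a minimal Python sketch of the scenario in the question: copying every file in one folder to another, once sequentially and once with a thread pool. The paths and worker count are placeholders, and whether the threaded version wins depends entirely on the disk(s) involved, as the answers below discuss.

```python
# Hypothetical sketch: sequential vs. thread-pool file copying.
# SRC, DST and the worker count are placeholders; measure on your own
# hardware, since the outcome depends on the physical disk(s) involved.
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SRC = Path("/tmp/source")       # assumed source folder
DST = Path("/tmp/destination")  # assumed destination folder

def copy_one(src_file: Path) -> None:
    """Copy a single file into DST, preserving metadata."""
    shutil.copy2(src_file, DST / src_file.name)

def copy_sequential(files):
    for f in files:
        copy_one(f)

def copy_threaded(files, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns lazily; list() forces it to wait for every copy
        list(pool.map(copy_one, files))

if __name__ == "__main__":
    files = [p for p in SRC.iterdir() if p.is_file()]

    start = time.perf_counter()
    copy_sequential(files)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    copy_threaded(files)
    print(f"threaded:   {time.perf_counter() - start:.2f}s")
```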

Aidan Ryan asked May 23 '09


2 Answers

Most of the answers so far have had to do with the OS scheduler. However, there is a more important factor that I think would lead to your answer. Are you writing to a single physical disk, or multiple physical disks?

Even if you parallelize with multiple threads... I/O to a single physical disk is intrinsically a serialized operation. Each thread would have to block, waiting for its chance to get access to the disk. In this case, multiple threads are probably useless... and may even lead to contention problems.

However, if you are writing multiple streams to multiple physical disks, processing them concurrently should give you a boost in performance. This is particularly true with managed disks, like RAID arrays, SAN devices, etc.

I don't think the issue has as much to do with the OS scheduler as it does with the physical characteristics of the disk(s) you're writing to.
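To illustrate this point (not code from the answer), here is a hypothetical Python sketch that gives each physical disk its own worker, so concurrency exists across disks while writes to any single disk stay serialized. The mount points and file pairs are placeholders.

```python
# Hypothetical sketch: one worker per physical disk, so parallelism
# happens across disks while each disk only ever sees one writer.
# The mount points and (source, destination) pairs are placeholders.
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# copy jobs grouped by the disk the destination lives on -- assumed layout
jobs_by_disk = {
    "/mnt/disk1": [(Path("/data/a.bin"), Path("/mnt/disk1/a.bin"))],
    "/mnt/disk2": [(Path("/data/b.bin"), Path("/mnt/disk2/b.bin"))],
}

def copy_serially(jobs):
    # within one disk, copy files one after another to avoid contention
    for src, dst in jobs:
        shutil.copy2(src, dst)

with ThreadPoolExecutor(max_workers=len(jobs_by_disk)) as pool:
    futures = [pool.submit(copy_serially, jobs) for jobs in jobs_by_disk.values()]
    for f in futures:
        f.result()  # re-raise any copy error
```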

jrista answered Nov 01 '22


That depends on your definition of "I/O bound", but generally multithreading has two effects:

  • Use multiple CPUs concurrently (which won't necessarily help if the bottleneck is the disk rather than the CPU[s])

  • Use a CPU (with another thread) even while one thread is blocked (e.g. waiting for I/O completion)

I'm not sure that Konrad's answer is always right, however: as a counter-example, if "I/O bound" just means "one thread spends most of its time waiting for I/O completion instead of using the CPU", but does not mean that "we've hit the system I/O bandwidth limit", then IMO having multiple threads (or asynchronous I/O) might improve performance (by enabling more than one concurrent I/O operation).
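To illustrate (not code from the answer): a hypothetical Python sketch that reads the same set of files with an increasing number of threads. If each read spends most of its time blocked, the threaded runs can overlap that wait time and raise aggregate throughput, until the device's bandwidth limit becomes the real constraint. File paths and worker counts are placeholders.

```python
# Hypothetical sketch: overlapping I/O wait with more in-flight reads.
# Note: the OS page cache will skew repeated runs over the same files;
# use fresh files or drop caches for a fair comparison.
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

FILES = list(Path("/tmp/source").glob("*.bin"))  # assumed input files

def read_file(path: Path) -> int:
    return len(path.read_bytes())

def total_bytes(workers: int) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_file, FILES))

for workers in (1, 2, 4, 8):
    start = time.perf_counter()
    nbytes = total_bytes(workers)
    print(f"{workers} worker(s): {nbytes} bytes in {time.perf_counter() - start:.2f}s")
```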

ChrisW answered Nov 01 '22