 

Is it useful to use multithreading to handle files on a hard drive?

In terms of performance and speed of execution, is it useful to use multithreading to handle files on a hard drive (for example, to move files from one disk to another, or to check the integrity of files)?

I think it is mainly the speed of my HDD that will determine the speed of my processing.

Bastien Vandamme asked May 02 '11 14:05



1 Answer

Multithreading can help, at least sometimes. The reason is that if you are writing to a "normal" hard drive (i.e. not a solid-state drive), then the thing that is going to slow you down the most is the hard drive's seek time (that is, the time it takes for the hard drive to reposition its read/write head from one distance along the disk's radius to another). That movement is glacially slow compared to the rest of the system, and the time it takes for the head to seek grows with the distance it must travel. So, for example, the worst-case scenario would be if the head had to move from the edge of the disk to the center of the disk after each operation.

Of course the ideal solution is to have the disk head never seek, or seek only very rarely, and if you can arrange it so that your program only needs to read/write a single file sequentially, that will be fastest. Or better yet, switch to an SSD, where there is no disk head, and the seek time is effectively zero. :)

But sometimes you need your drive to be able to read/write multiple files in parallel, in which case the drive head will (of necessity) be seeking back and forth a lot. So how can multithreading help in this scenario? The answer is this: with a sufficiently smart disk I/O subsystem (e.g. SCSI, I'm not sure if IDE can do this), the I/O logic will maintain a queue of all currently outstanding read/write requests, and it will dynamically re-order that queue so that the requests are fulfilled in the order that minimizes the amount of travel by the read/write head. This is known as the Elevator Algorithm, because it is similar to the logic used by an elevator to maximize the number of people it can transport in a given period of time.
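To make the reordering idea concrete, here is a minimal sketch of an elevator-style sweep. It is only an illustration: the track numbers and the starting head position are made-up values, and real drives and OS schedulers work on block addresses with more elaborate policies than this.

```python
def head_travel(start, order):
    """Total distance the head moves when serving tracks in the given order."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel


def elevator_order(start, requests):
    """Simple elevator sweep: serve every request at or above the current
    position in ascending order, then reverse and serve the rest on the way down."""
    upward = sorted(t for t in requests if t >= start)
    downward = sorted((t for t in requests if t < start), reverse=True)
    return upward + downward


# Hypothetical pending requests (track numbers) and head position.
pending = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

print(head_travel(head, pending))                        # arrival order: 640
print(head_travel(head, elevator_order(head, pending)))  # elevator order: 299
```

With these example numbers, serving the queue in one sweep up and one sweep down cuts the head travel by more than half compared to arrival order.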

Of course, the OS's I/O subsystem can only implement this optimization if it knows in advance what I/O requests are pending... and if you have only one thread initiating I/O requests, then the I/O subsystem will only know about the current request (i.e. it can't "peek" into your thread's userland request queue to see what your thread will want next). And of course your userland thread doesn't know the details of the disk layout, so it's difficult (impossible?) to implement the Elevator Algorithm in user space.

But if your program has N threads reading/writing the disk at once, then the OS's I/O subsystem will be aware of up to N I/O requests at once, and can re-order those requests as it sees fit to maximize disk performance.
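As a rough sketch of what "N threads reading at once" could look like for the integrity-check case from the question, the following uses a thread pool to hash several files concurrently. The function names, the worker count of 4, and the 1 MiB chunk size are all assumptions chosen for illustration; whether this actually beats a single-threaded loop depends on the drive and the OS, so it is worth measuring on your own hardware.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash one file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_files(paths, workers=4):
    """Hash several files concurrently; returns {path: hex digest}.

    Each worker thread has its own read outstanding, so the OS can see
    up to `workers` pending requests at a time.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(sha256_of, paths)))
```

You would call it with any iterable of file paths, e.g. `checksum_files(["a.bin", "b.bin"])` (hypothetical file names), and compare the returned digests against known-good values.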

Jeremy Friesner answered Oct 13 '22 14:10
