 

Is it possible to write to different parts of the same file from multiple threads?

Tags:

c

io

disk-io

Can I write to different parts of the same file concurrently from multiple threads (on a typical PC)? I mean, there's only one disk head, so the writes can only be performed in some order anyway, i.e. not in parallel, right?

Edit:

I'm writing a program that sorts a large binary file, but most of the time is still spent on disk I/O, so I'm wondering whether I would gain any extra speed by doing the I/O in parallel.

asked Mar 18 '13 13:03 by szx


3 Answers

There's nothing to stop you from having multiple threads writing to different parts of the same file.

"I have a program that sorts a large binary file but the majority of time is still spent on disk I/O, so I'm just wondering will I gain any extra speed by doing I/O in parallel."

If the program is disk-bound, making it multithreaded (and still writing the same amount of data to the same disk) will not speed it up.

If we are talking about a traditional hard drive, sequential I/O is generally faster than I/O that involves moving the disk head back and forth. Since threads writing to different regions of the file make the head jump between those regions, splitting the I/O across threads might even be counterproductive.

There are several avenues to explore for speeding things up:

  1. Reducing the amount of I/O (e.g. by employing a sorting algorithm that requires less I/O, or by doing more work in-memory);
  2. Improving I/O throughput, for example by using a faster drive.
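As a minimal sketch of the "more work in-memory" idea, assuming the file is a flat array of native-endian uint32_t values small enough to fit in RAM (an assumption about the data format, not something stated in the question), the whole file can be read once sequentially, sorted with qsort(), and written back once sequentially. The names sort_u32_file and cmp_u32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Three-way compare for qsort(); avoids the overflow of returning x - y. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sort a file of native-endian uint32_t values in
   place, assuming it fits in RAM. One sequential read, an in-memory sort,
   one sequential write. Trailing bytes that do not form a whole value are
   left untouched. Returns 0 on success, -1 on error. */
int sort_u32_file(const char *path)
{
    FILE *f;
    uint32_t *vals = NULL;
    long size;
    size_t n;
    int rc = -1;

    assert(path != NULL);
    f = fopen(path, "r+b");
    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0)
        goto out;
    n = (size_t)size / sizeof *vals;
    if (n == 0) {
        rc = 0;
        goto out;
    }
    vals = malloc(n * sizeof *vals);
    if (vals == NULL)
        goto out;
    rewind(f);
    if (fread(vals, sizeof *vals, n, f) != n)
        goto out;
    qsort(vals, n, sizeof *vals, cmp_u32);
    rewind(f); /* a positioning call is required between input and output */
    if (fwrite(vals, sizeof *vals, n, f) != n)
        goto out;
    rc = 0;
out:
    free(vals);
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

For files larger than memory, the same pattern applied to memory-sized chunks produces sorted runs that are then merged, which keeps every pass over the disk sequential.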
answered Oct 05 '22 11:10 by NPE


It is possible, at least on Unix(-like) operating systems, and presumably also on Windows, though file handling there is somewhat different and may need a specific file mode to allow this (edit: see bizzehdee's answer for details).

On a running operating system, a "file" is really a logical entity: some state of it is stored on disk at any given time, but some changes may still exist only in kernel buffers. So, in a way, writing to a file is no different from writing to a block of shared memory; only the API is different (and not even that if you use mmap).
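As a rough illustration of the mmap route, the sketch below maps a file with MAP_SHARED and lets each thread write its own disjoint region with plain memory stores. It assumes a POSIX system with pthreads; mmap_fill, fill_mapped and struct map_job are hypothetical names, and filling each region with a single letter stands in for real output. The file is sized with ftruncate() first because a mapping cannot extend a file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_THREADS 16

/* Hypothetical job description: one disjoint region of the mapping. */
struct map_job {
    unsigned char *base; /* start of the shared mapping */
    size_t offset;       /* start of this thread's region */
    size_t len;          /* length of the region in bytes */
    unsigned char byte;  /* fill value, a stand-in for real data */
};

static void *fill_mapped(void *arg)
{
    struct map_job *job = arg;
    /* Ordinary memory writes; the kernel writes the dirty pages back. */
    memset(job->base + job->offset, job->byte, job->len);
    return NULL;
}

/* Fill `nthreads` equal regions of `path` through a shared mapping.
   Returns 0 on success, -1 on any error. */
int mmap_fill(const char *path, int nthreads, size_t region_len)
{
    pthread_t tids[MAX_THREADS];
    struct map_job jobs[MAX_THREADS];
    unsigned char *base;
    size_t total;
    int fd, started = 0, rc = 0;

    assert(path != NULL);
    if (nthreads < 1 || nthreads > MAX_THREADS || region_len == 0)
        return -1;
    total = (size_t)nthreads * region_len;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)total) != 0) { /* mmap cannot grow the file */
        close(fd);
        return -1;
    }
    base = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        jobs[i] = (struct map_job){ base, (size_t)i * region_len, region_len,
                                    (unsigned char)('a' + i) };
        if (pthread_create(&tids[i], NULL, fill_mapped, &jobs[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    /* Flush the dirty pages to the file before unmapping. */
    if (msync(base, total, MS_SYNC) != 0)
        rc = -1;
    if (munmap(base, total) != 0)
        rc = -1;
    return rc;
}
```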

In short, just seek and write: the old bytes in the file get overwritten. If two processes write to the same, overlapping bytes, I think the end result is undefined. In any case this should never happen in a correctly functioning system, and programs that write concurrently should have some mechanism to prevent overlapping writes.
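One caveat applies to threads specifically: threads that share one file descriptor also share its file offset, so an lseek() followed by a write() in each thread can race with the other threads. POSIX pwrite() takes the offset as an argument and leaves the shared offset alone. The sketch below assumes a POSIX system with pthreads; parallel_fill, write_region and struct write_job are hypothetical names, and filling each region with a single letter stands in for real output data.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_THREADS 16

/* Hypothetical job description: each thread owns one disjoint byte range. */
struct write_job {
    int fd;             /* descriptor shared by all threads */
    off_t offset;       /* start of this thread's region */
    size_t len;         /* length of the region in bytes */
    unsigned char byte; /* fill value, a stand-in for real output data */
};

static char write_failed; /* its address is returned by a thread on error */

static void *write_region(void *arg)
{
    struct write_job *job = arg;
    unsigned char buf[4096];
    size_t done = 0;

    memset(buf, job->byte, sizeof buf);
    while (done < job->len) {
        size_t chunk = job->len - done;
        if (chunk > sizeof buf)
            chunk = sizeof buf;
        /* pwrite() uses an explicit offset and does not move the shared
           file offset, so the threads do not race on lseek()+write(). */
        ssize_t n = pwrite(job->fd, buf, chunk, job->offset + (off_t)done);
        if (n <= 0)
            return &write_failed;
        done += (size_t)n;
    }
    return NULL;
}

/* Write `nthreads` equal, non-overlapping regions of `path` in parallel.
   Returns 0 on success, -1 on any error. */
int parallel_fill(const char *path, int nthreads, size_t region_len)
{
    pthread_t tids[MAX_THREADS];
    struct write_job jobs[MAX_THREADS];
    int fd, rc = 0;

    assert(path != NULL);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        jobs[i] = (struct write_job){ fd, (off_t)i * (off_t)region_len,
                                      region_len, (unsigned char)('A' + i) };
        if (pthread_create(&tids[i], NULL, write_region, &jobs[i]) != 0) {
            nthreads = i; /* join only the threads actually started */
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < nthreads; i++) {
        void *res = NULL;
        pthread_join(tids[i], &res);
        if (res != NULL)
            rc = -1;
    }
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Because the regions never overlap, no locking is needed; this addresses correctness only, and says nothing about whether the parallel version is faster on a given disk.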


About speed-up: it really depends on what you do. If you just perform raw writes, things will probably slow down on a traditional spinning hard disk, or the file may become fragmented more easily. On an SSD there is probably no slow-down, but no speed-up either.

On the other hand, if your operation is CPU-bound, you have multiple cores, and doing things in parallel lets you reach higher total CPU usage, then processing different parts of the same output file in parallel can speed things up, even considerably if there is a lot of processing relative to the bytes written to the file.

answered Oct 05 '22 12:10 by hyde


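You need to look at CreateFile (opening the file with FILE_FLAG_OVERLAPPED) and WriteFileEx, and make use of the lpOverlapped parameter, whose OVERLAPPED structure carries the file offset (Offset/OffsetHigh) for each request. This allows asynchronous reading and/or writing from/to the same file at the same time from multiple threads.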

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748(v=vs.85).aspx

answered Oct 05 '22 13:10 by bizzehdee