That's pretty much the question. I mean, I know mpi_file_write_all is the "collective" version, but I figure mpi_file_write is going to be called by several processes all at once anyway, so what is the actual difference in their operation? Thanks.
Functionally, there is little difference in most practical situations. If your IO works correctly with mpi_file_write_all(), it should also work correctly with mpi_file_write(), unless you're doing something very complicated. The converse isn't strictly true, but in most real situations I've seen, where all processes are doing simple, regular IO patterns at the same time, mpi_file_write_all() works whenever mpi_file_write() does.
Anyway, the point is that if you call mpi_file_write(), the IO library has to process that request there and then, because it cannot assume that other processes are also performing IO. In anything but the simplest parallel decompositions, the data from a single process will not form a single contiguous chunk of the file. As a result, each process will do a large number of small IO transactions (write, seek, write, seek, ...), which is very inefficient on a parallel file system. Worse than that, the library probably locks the file while each process does its IO to stop other processes interfering, so IO can become effectively serialised across processes.
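To make that concrete, here is a hedged sketch (in C, with made-up sizes and filenames) of the independent pattern: a 2D array of doubles distributed by columns, so in the row-major shared file each rank's data lies in NX separate strided chunks, and the non-collective version ends up issuing one small write per chunk. This needs mpicc/mpirun to build and run.

```c
/* Sketch only: NX x NY doubles, distributed by columns across ranks,
 * written independently to a shared row-major file. Sizes and the
 * filename "out.dat" are illustrative assumptions. */
#include <mpi.h>

#define NX 256   /* global rows (assumed) */
#define NY 256   /* global columns (assumed; divisible by nproc) */

int main(int argc, char **argv)
{
    int rank, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    int nycols = NY / nproc;        /* columns owned by this rank */
    double local[NX][nycols];       /* local slab (left uninitialised here) */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Independent IO: one small write per row, i.e. NX tiny
     * transactions per process, each at a different strided offset. */
    for (int i = 0; i < NX; i++) {
        MPI_Offset off = ((MPI_Offset)i * NY + rank * nycols)
                         * sizeof(double);
        MPI_File_write_at(fh, off, local[i], nycols, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

Each of those NX writes is only nycols doubles long, which is exactly the write-seek-write pattern that parallel file systems handle badly.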
With write_all(), the IO library has a global view and knows what every process is doing. First, this enables it to reorganise the data so each process has a single large chunk of data to write to the file. Second, as it is in control of all the processes, it can avoid the need to lock the file as it can ensure that writes don't conflict.
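For comparison, here is a hedged sketch of the collective version under the same kind of assumed column decomposition: each rank describes where its slab sits in the global array with MPI_Type_create_subarray, installs that as its file view, and then a single mpi_file_write_all() call gives the library the global picture it needs to aggregate the data into a few large writes.

```c
/* Sketch only: NX x NY doubles distributed by columns, written
 * collectively to one shared row-major file. Sizes and the filename
 * "out.dat" are illustrative assumptions. */
#include <mpi.h>

#define NX 256   /* global rows (assumed) */
#define NY 256   /* global columns (assumed; divisible by nproc) */

int main(int argc, char **argv)
{
    int rank, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    int nycols = NY / nproc;        /* columns owned by this rank */
    double local[NX][nycols];       /* local slab (left uninitialised here) */

    /* Describe where this rank's slab sits in the global array. */
    int gsizes[2] = {NX, NY};
    int lsizes[2] = {NX, nycols};
    int starts[2] = {0, rank * nycols};
    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native",
                      MPI_INFO_NULL);

    /* One collective call: the library sees every process's request
     * and can reorganise the data into large, conflict-free writes. */
    MPI_File_write_all(fh, &local[0][0], NX * nycols, MPI_DOUBLE,
                       MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}
```

Note the user code is barely more complicated than the independent version, yet the library now has everything it needs to do the reorganisation described above.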
For simple regular patterns, e.g. a large 3D array distributed across a 3D grid of processes, I've seen massive differences between the collective and non-collective approaches on a Cray with a Lustre filesystem. The difference can be gigabytes/second vs tens of megabytes/second.
PS I'm assuming here that the pattern is many processes writing data to a single shared file. For reading there should also be an improvement (a small number of large contiguous reads), but perhaps not as dramatic, since file locking isn't needed for reads.