Is a write operation in unix atomic? [duplicate]

Tags: c, unix, posix, atomic

I was reading APUE (Advanced Programming in the UNIX Environment) and came across this question in §3.11:

if (lseek(fd, 0L, 2) < 0)          /* position to EOF */
    err_sys("lseek error");
if (write(fd, buf, 100) != 100)    /* and write */
    err_sys("write error");

APUE says:

This works fine for a single process, but problems arise if multiple processes use this technique to append to the same file. ... The problem here is that our logical operation of "position to the end of file and write" requires two separate function calls (as we’ve shown it). Any operation that requires more than one function call cannot be atomic, as there is always the possibility that the kernel might temporarily suspend the process between the two function calls.

This only says that the kernel may suspend the process between the lseek and write calls. I want to know whether it can also switch partway through a single write. In other words, is write atomic? If thread A writes "aaaaa" and thread B writes "bbbbb", can the result be "aabbbbbaaa"?

What's more, APUE then says that pread and pwrite are atomic operations. Does that mean these functions use a mutex or lock internally to be atomic?

scottxiao asked May 31 '18 12:05




1 Answer

To call the Posix semantics "atomic" is perhaps an oversimplification. Posix requires that reads and writes occur in some order:

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. (from the Rationale section of the Posix specification for pwrite and write)

The atomicity guarantee mentioned in APUE refers to the use of the O_APPEND flag, which forces writes to be performed at the end of the file:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
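
A minimal sketch of the O_APPEND behaviour described in that quote, assuming a local filesystem that honours the Posix wording. The names append_demo, append_records, record_ok and REC_LEN are hypothetical, not taken from APUE or Posix. Several child processes append fixed-size records, each filled with a single letter, and the parent then counts the records that contain mixed letters or are misaligned.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REC_LEN 16 /* hypothetical fixed record size for this demo */

/* Child process: append nrec records filled with one letter, then exit. */
static void append_records(const char *path, char letter, int nrec)
{
    char rec[REC_LEN];
    memset(rec, letter, REC_LEN - 1);
    rec[REC_LEN - 1] = '\n';

    /* O_APPEND: the kernel moves to end-of-file as part of every write(). */
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        _exit(1);
    for (int i = 0; i < nrec; i++)
        if (write(fd, rec, REC_LEN) != REC_LEN)
            _exit(1);
    close(fd);
    _exit(0);
}

/* A record is intact if it is one repeated letter followed by '\n'. */
static int record_ok(const char *rec)
{
    for (int i = 1; i < REC_LEN - 1; i++)
        if (rec[i] != rec[0])
            return 0;
    return rec[REC_LEN - 1] == '\n';
}

/* Returns the number of damaged records, or -1 on error (nproc <= 26). */
int append_demo(const char *path, int nproc, int nrec)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    int started = 0, failed = 0;
    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid == 0)
            append_records(path, (char)('a' + p), nrec); /* never returns */
        if (pid < 0) {
            failed = 1;
            break;
        }
        started++;
    }
    /* Reap every child that was started, even if a later fork() failed. */
    for (int p = 0; p < started; p++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    if (failed)
        return -1;

    /* Re-read the file one record at a time and count damaged records. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char rec[REC_LEN];
    int total = 0, bad = 0;
    ssize_t n;
    while ((n = read(fd, rec, REC_LEN)) == REC_LEN) {
        total++;
        if (!record_ok(rec))
            bad++;
    }
    close(fd);
    if (n != 0 || total != nproc * nrec)
        return -1;
    return bad;
}
```

On a conforming local filesystem, a call such as append_demo("/tmp/append_demo.dat", 4, 500) would be expected to return 0. On a network filesystem the count may well differ.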

With respect to pread and pwrite, APUE says (correctly, of course) that these interfaces allow the application to seek and perform I/O atomically; in other words, that the I/O operation will occur at the specified file position regardless of what any other process does. (Because the position is specified in the call itself, and does not affect the persistent file position.)
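
A small sketch of that point, using a hypothetical helper named positional_io_demo. It shows that pwrite and pread operate at the position passed in the call and leave the descriptor's own file offset where the earlier write left it.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes "hello" with write(), patches two bytes with pwrite(), and reads
 * the result back with pread(). Stores the file contents in out (6 bytes,
 * NUL-terminated) and the file offset seen afterwards in *offset_after.
 * Returns 0 on success, -1 on error.
 */
int positional_io_demo(const char *path, char out[6], off_t *offset_after)
{
    const char *msg = "hello";
    ssize_t len = (ssize_t)strlen(msg);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int rc = -1;
    /* write() advances the file offset of fd to 5. */
    if (write(fd, msg, (size_t)len) != len)
        goto done;
    /* pwrite() takes its position as an argument and leaves the offset alone. */
    if (pwrite(fd, "XY", 2, 1) != 2)
        goto done;
    /* pread() likewise reads from offset 0 without seeking. */
    if (pread(fd, out, (size_t)len, 0) != len)
        goto done;
    out[len] = '\0';

    /* Query the current offset without moving it. */
    *offset_after = lseek(fd, 0, SEEK_CUR);
    if (*offset_after != (off_t)-1)
        rc = 0;
done:
    close(fd);
    return rc;
}
```

After a successful call, out holds "hXYlo" and *offset_after is 5, the position left by the initial write. There is no separate seek step for another process to interfere with.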

The Posix sequencing guarantee is as follows (from the Description of the write() and pwrite() functions):

After a write() to a regular file has successfully returned:

  • Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

  • Any subsequent successful write() to the same byte position in the file shall overwrite that file data.

As mentioned in the Rationale, this wording does guarantee that two simultaneous write calls (even in different unrelated processes) will not interleave data: if data were interleaved during a write which eventually succeeds, the second guarantee would be impossible to provide. How this is accomplished is up to the implementation.

It must be noted that not all filesystems conform to Posix, and modular OS design, which allows multiple filesystems to coexist in a single installation, makes it impossible for the kernel itself to provide guarantees about write which apply to all available filesystems. Network filesystems are particularly prone to data races (and local mutexes won't help much either), as Posix also notes at the end of the paragraph quoted from the Rationale:

This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.

The first guarantee (about subsequent reads) requires some bookkeeping in the filesystem, because data which has been successfully "written" to a kernel buffer but not yet synched to disk must be made transparently available to processes reading from that file. This also requires some internal locking of kernel metadata.
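
As a rough illustration of the first guarantee, the sketch below writes through one descriptor and immediately reads through a second one, with no fsync() in between. The function name read_sees_write is hypothetical, and the expected result assumes a Posix-conforming local filesystem.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 1 if data written through one descriptor is immediately readable
 * through another, 0 if not, and -1 on error. No fsync() is called, so the
 * data may still live only in kernel buffers when it is read back.
 */
int read_sees_write(const char *path)
{
    const char msg[] = "buffered";
    const size_t len = sizeof msg - 1;
    char back[sizeof msg];

    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (wfd < 0)
        return -1;
    int rfd = open(path, O_RDONLY); /* independent open file description */
    if (rfd < 0) {
        close(wfd);
        return -1;
    }

    int result = -1;
    if (write(wfd, msg, len) == (ssize_t)len) {
        /* The write has returned, so this read must reflect it. */
        ssize_t n = pread(rfd, back, len, 0);
        if (n >= 0)
            result = (n == (ssize_t)len && memcmp(msg, back, len) == 0);
    }
    close(rfd);
    close(wfd);
    return result;
}
```

The same check could be made from a second process. Two descriptors in one process merely keep the example short.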

Since writing to regular files is typically accomplished via kernel buffers, and actually synching the data to the physical storage device is definitely not atomic, the locks necessary to provide these guarantees don't have to be very long-lasting. But the locking must be done inside the filesystem, because nothing in the Posix wording limits the guarantees to simultaneous writes within a single threaded process.

Within a multithreaded process, Posix does require read(), write(), pread() and pwrite() to be atomic when they operate on regular files (or symbolic links). See Thread Interactions with Regular File Operations for a complete list of interfaces which must obey this requirement.

rici answered Oct 18 '22 18:10
