Is write() safe to be called from multiple threads simultaneously?

Assuming I have opened /dev/poll as mDevPoll, is it safe for me to call code like this

struct pollfd tmp_pfd;
tmp_pfd.fd = fd;
tmp_pfd.events = POLLIN;

// Write pollfd to /dev/poll
write(mDevPoll, &tmp_pfd, sizeof(struct pollfd));

...simultaneously from multiple threads, or do I need to add my own synchronisation primitive around mDevPoll?

Wad asked Feb 24 '17 15:02


2 Answers

Solaris 10 claims to be POSIX compliant. The write() function is not among the handful of system interfaces that POSIX permits to be non-thread-safe, so we can conclude that on Solaris 10, it is safe in a general sense to call write() simultaneously from two or more threads.

POSIX also designates write() among those functions whose effects are atomic relative to each other when they operate on regular files or symbolic links. Specifically, it says that

If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them.

If your writes were directed to a regular file then that would be sufficient to conclude that your proposed multi-thread actions are safe, in the sense that they would not interfere with one another, and the data written in one call would not be commingled with that written by a different call in any thread. Unfortunately, /dev/poll is not a regular file, so that does not apply directly to you.

You should also be aware that write() is not in general required to transfer the full number of bytes specified in a single call. For general purposes, one must therefore be prepared to transfer the desired bytes over multiple calls, by using a loop. Solaris may provide applicable guarantees beyond those expressed by POSIX, perhaps specific to the destination device, but absent such guarantees it is conceivable that one of your threads performs a partial write, and the next write is performed by a different thread. That very likely would not produce the results you want or expect.
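
The loop mentioned above could look something like the following. This is only a minimal sketch: write_all is a hypothetical helper name, not something from the question or from Solaris, and it assumes a blocking descriptor. Note that the loop as a whole is not atomic, so if several threads share the descriptor, another thread's write() can land between two iterations.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes have been
 * transferred. Returns 0 on success, -1 on error (errno is set by write()).
 * Assumes fd is a blocking descriptor. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;
        }
        p += n;                 /* skip past the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```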

John Bollinger answered Sep 25 '22 08:09


It's not safe in theory, even though write() is completely thread-safe (barring implementation bugs...). Per the POSIX write() standard (emphasis mine):

The write() function shall attempt to write nbyte bytes from the buffer pointed to by buf to the file associated with the open file descriptor, fildes.

...

RETURN VALUE

Upon successful completion, these functions shall return the number of bytes actually written ...

There is no guarantee that you won't get a partial write(), so even if each individual write() call is atomic, it's not necessarily complete, so you could still get interleaved data because it may take more than one call to write() to completely write all data.
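
If you do need to rule out interleaving when a write might take more than one call, one option is to hold a lock across the whole retry loop. A rough sketch, assuming every writer to the descriptor goes through the same function; locked_write_all and write_lock are names made up for this example:

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical lock shared by every thread that writes to the descriptor. */
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write all len bytes while holding write_lock, so that the pieces of a
 * partial write cannot be interleaved with another thread's data.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int locked_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;

    pthread_mutex_lock(&write_lock);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* retry the interrupted call */
            rc = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&write_lock);
    return rc;
}
```

This only protects against threads in the same process that use the same lock. It does nothing for other processes writing to the same file.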

In practice, if you're only doing relatively small write() calls, you will likely never see a partial write(), with "small" and "likely" being indeterminate values dependent on your implementation.

I've routinely delivered code that uses unlocked single write() calls on regular files opened with O_APPEND in order to improve the performance of logging - build a log entry then write() the entire entry with one call. I've never seen a partial or interleaved write() result over almost a couple of decades of doing that on Linux and Solaris systems, even when many processes write to the same log file. But then again, it's a text log file and if a partial or interleaved write() does happen there would be no real damage done or even data lost.
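
For illustration, the logging pattern is roughly the following. This is a sketch rather than the actual code I shipped: open_log and log_line are hypothetical names, and the 512-byte entry limit is an arbitrary assumption.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) a log file in append mode. */
int open_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Hypothetical helper: build the whole entry in a local buffer, then hand
 * it to the kernel with one unlocked write(). Returns 0 on success, -1 if
 * the entry does not fit or the write() was short or failed. */
int log_line(int log_fd, const char *msg)
{
    char entry[512];    /* arbitrary size limit, chosen for this sketch */
    int len = snprintf(entry, sizeof entry, "%s\n", msg);

    if (len < 0 || (size_t)len >= sizeof entry)
        return -1;      /* formatting failed or entry would be truncated */

    return write(log_fd, entry, (size_t)len) == (ssize_t)len ? 0 : -1;
}
```

The point is that each entry is complete before write() is called, so there is exactly one call per entry and O_APPEND takes care of positioning it at the end of the file.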

In this case, though, you're "writing" a handful of bytes to a kernel structure. You can dig through the Solaris /dev/poll kernel driver source code at Illumos.org and see how likely a partial write() is. I'd suspect it's practically impossible - because I just went back and looked at the multiplatform poll class that I wrote for my company's software library a decade ago. On Solaris it uses /dev/poll and unlocked write() calls from multiple threads. And it's been working fine for a decade...

Solaris /dev/poll Device Driver Source Code Analysis

The (Open)Solaris source code can be found here: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/devpoll.c#628

The dpwrite() function is the code in the /dev/poll driver that actually performs the "write" operation. I use quotes because it's not really a write operation at all - data isn't transferred as much as the data in the kernel that represents the set of file descriptors being polled is updated.

Data is copied from user space into kernel space - to a memory buffer obtained with kmem_alloc(). I don't see any possible way that can be a partial copy. Either the allocation succeeds or it doesn't. The code can get interrupted before doing anything, as it waits for exclusive write() access to the kernel structures.

After that, the last return call is at the end - and if there's no error, the entire call is marked successful, or the entire call fails on any error:

    if (error == 0) {
        /*
         * The state of uio_resid is updated only after the pollcache
         * is successfully modified.
         */
        uioskip(uiop, copysize);
    }
    return (error);
}

If you dig through Solaris kernel code, you'll see that uio_resid is what determines the value returned by write() after a successful call: the kernel returns the requested byte count minus whatever uio_resid says is still left to transfer.

So the call certainly appears to be all-or-nothing. While there appear to be ways for the code to return an error on a file descriptor after successfully processing an earlier descriptor when multiple descriptors are passed in, the code doesn't appear to return any partial success indications.

If you're only processing one file descriptor at a time, I'd say the /dev/poll write() operation is completely thread-safe, and it's almost certainly thread-safe for "writing" updates to multiple file descriptors as there's no apparent way for the driver to return a partial write() result.
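
If you'd rather not rely purely on my reading of the driver, you can keep the single unlocked write() and simply check its return value. A minimal sketch, where devpoll_add is a hypothetical helper name and devpoll_fd is assumed to be a descriptor obtained by opening /dev/poll; mapping a short write to EIO is my own choice for the example, not something the driver does:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: register fd for POLLIN with one write() call.
 * devpoll_fd is assumed to be an open /dev/poll descriptor.
 * Returns 0 on success, -1 on error or on an unexpected short write. */
int devpoll_add(int devpoll_fd, int fd)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ssize_t n = write(devpoll_fd, &pfd, sizeof pfd);
    if (n < 0)
        return -1;              /* errno is set by write() */
    if ((size_t)n != sizeof pfd) {
        errno = EIO;            /* short write: report it instead of retrying */
        return -1;
    }
    return 0;
}
```

If the short-write branch never fires in testing or production, that is consistent with the all-or-nothing behaviour described above.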

Andrew Henle answered Sep 22 '22 08:09