my question: in Linux (and in FreeBsd, and generally in UNIX) is it possible/legal to read single file descriptor simultaneously from two threads?
I did some search but found nothing, although a lot of people ask like question about reading/writing from/to socket fd at the same time (meaning reading when other thread is writing, not reading when other is reading). I also have read some man pages and got no clear answer on my question.
Why I ask it. I tried to implement simple program that counts lines in stdin, like wc -l. I actually was testing my home-made C++ io engine for overhead, and discovered that wc is 1.7 times faster. I trimmed down some C++ and came closer to wc speed but didn't reach it. Then I experimented with input buffer size, optimized it, but still wc is clearly a bit faster. Finally I created 2 threads which read same STDIN_FILENO in parallel, and this at last was faster than wc! But lines count became incorrect... so I suppose some junk comes from reads which is unexpected. Doesn't kernel care what process read?
Edit: I did some research and discovered just that calling read directly via syscall does not change anything. Kernel code seem to do some sync handling, but i didnt understand much (read_write.c)
When used with a descriptor (fd), read() and write() rely on the internal state of the fd to know the "current offset" at which the read and write will occur. As a result, they aren't thread-safe.
To allow a single descriptor to be used by multiple threads simultaneously, pread() and pwrite() are provided. With those interfaces, the descriptor and the desired offset are specified, so the "current offset" in the descriptor isn't used.
That's undefined behavior, POSIX says:
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
About accessing a single file descriptor concurrently (i.e. from multiple threads or even processes), I'm going to cite POSIX.1-2008 (IEEE Std 1003.1-2008), Subsection 2.9.7 Thread Interactions with Regular File Operations:
2.9.7 Thread Interactions with Regular File Operations
All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links:
[…] read() […]
If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. […]
At first glance, this looks quite good. However, I hope you did not miss the restriction when they operate on regular files or symbolic links.
@jarero cites:
The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
So, implicitly, we're agreeing, I assume: It depends on the type of the file you are reading. You said, you read from STDIN. Well, if your STDIN is a plain file, you can use concurrent access. Otherwise you shouldn't.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With