 

Multiple readers from FIFO

Tags: bash, stdin, fifo

Is it possible to split STDIN between multiple readers, effectively turning it into a job queue? I would like each line to go to a single reader. Named pipes almost work, but simultaneous reads interfere:

reader.sh

#!/usr/bin/env bash
# Echo back each line read from the named pipe
while read -r line
do
  echo "$line"
done < fifo

writer.sh

#!/usr/bin/env bash
# Emit one line per second forever
while true
do
  echo "This is a test sentence"
  sleep 1
done

execution:

mkfifo fifo
./reader.sh &
./reader.sh &
./writer.sh > fifo

Occasional garbled output (particularly when the readers and the writer run in separate windows):

This is atetsnac
Ti sats etnesats etne etsnac
isats etnes etsnac
Tisi etsnac
hi etsnac
Ti sats etn
hsi etsnac

Notes:

  • I know there are better approaches; I'm just curious whether this can be made to work
  • I assume this isn't a bug, since I've seen it on both Linux and OS X boxes
  • I'd like one consumer per line, which rules out tee
  • I'd like to consume STDIN, which rules out xargs
  • GNU coreutils split can allocate round robin, but not first available (see the sketch after this list)
  • GNU parallel --pipe waits until STDIN closes; I'd like to allocate ASAP
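
For comparison, here is a rough sketch of the round-robin alternative mentioned above (consumer.sh is a hypothetical stand-in for whatever each reader would run); every line is bound to a fixed slot up front rather than going to the first available reader:

# Round-robin, not first-available: GNU split -n r/2 alternates lines
# between two instances of the hypothetical consumer.sh
./writer.sh | split -n r/2 --filter='./consumer.sh'
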
asked Sep 27 '22 by user3769065


1 Answer

No, in general it is not possible to do this robustly. Writes to a named pipe of less than PIPE_BUF bytes (PIPE_BUF is at least 512 bytes on all POSIX systems) are atomic. The problem is that reads are not atomic, and there is no standard (or, AFAIK, non-standard) way to make them atomic. On a blocking read of the pipe, if one or more bytes are available they are read immediately, and the number of bytes actually read is returned as the return value.
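
If you want to check the actual limit on a particular system, getconf can report PIPE_BUF for the filesystem holding the FIFO (the path below is only an example):

# Query PIPE_BUF for the filesystem containing the current directory
getconf PIPE_BUF .
# On Linux this typically prints 4096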

Rochkind, Advanced UNIX Programming, states:

Because there is no guarantee of atomicity you must never allow multiple readers unless you have another concurrency control mechanism in place .... use something like a message queue instead.
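
Purely as an illustration of what such a concurrency control mechanism might look like (a sketch only, not something the book or this answer prescribes; fifo and fifo.lock are assumed file names), the readers could serialize their reads with flock(1) so that only one of them touches the pipe at a time:

#!/usr/bin/env bash
# Sketch only: serialize readers with flock(1); fifo and fifo.lock are assumed names
exec 8< fifo        # keep the FIFO open for the lifetime of this reader
exec 9> fifo.lock   # lock file shared by all readers
while true
do
  flock 9                  # only one reader may read at any moment
  IFS= read -r line <&8
  status=$?
  flock -u 9
  (( status != 0 )) && break
  echo "[$$] $line"
done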

Having said all that, for fun, it is possible to achieve surprisingly robust behavior. The reason the line-based read loop (while read line; do ...; done < fifo) seems to work is that each reader snatches lines from the pipe as soon as they arrive, and the readers are ready to read as soon as writing begins, as you mentioned. Because the data is read straight away, the reads happen to fall on the line boundaries at which the data is being written. In general, though, a line-based approach is not going to be very robust, because the message boundary is not predictable.

If you wrote and read in constant-sized chunks of at most PIPE_BUF bytes you would do better. You are guaranteed never to read more than you ask for, and as long as every write is one constant-sized chunk smaller than PIPE_BUF, the number of bytes available for reading should always be a multiple of the chunk size. However, it is not guaranteed that all available bytes will actually be read; it is not an error for the underlying read system call to return fewer bytes than you request, regardless of how many bytes are actually available to be read:

On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal.

And there may be other peculiar reasons: if the standards don't explicitly say that something is guaranteed, and under what conditions, don't assume it is.

--

reader.sh:

#!/bin/bash
# Each record is exactly 21 bytes: a 20-digit number plus a newline.
# read -N 21 collects exactly one whole record per iteration.
while read -N 21 packet
do
  # $packet is deliberately left unquoted so word splitting drops the
  # trailing newline that read -N includes in the variable.
  echo [$$] $packet
done < fifo

writer.sh

#!/bin/bash
# Write 100 fixed-size records: 20 zero-padded digits plus a newline,
# i.e. 21 bytes per write, well below PIPE_BUF, so each write is atomic.
for ((i = 0; i < 100; i++))
do
  s=$(printf "%020d" "$i")
  echo "$s"
  echo "wrote $s" >&2
done

execution:

mkfifo fifo
./reader.sh &
./reader.sh &
./writer.sh > fifo
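
If this behaves as intended, every line of output should be one complete, unmangled 20-digit number prefixed with the PID of whichever reader happened to pick it up; which reader gets which number will vary from run to run.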
answered Oct 03 '22 by spinkus