I'm trying to parallelize the processing of a set of files using bash. I'm using named pipes to keep the number of processes fixed and to gather the output from those processes.
I'm assuming that writes to a named pipe are atomic, i.e. the output of different processes is not interleaved. Is that a safe assumption?
Any advice is greatly appreciated. I'm limited to using bash.
Here's the code:
mytask()
{
    wItem=$1
    # Dummy function; stands in for processing of the work item.
    rt=$RANDOM
    st=$rt
    let "rt %= 2"     # exit status: 0 or 1
    let "st %= 10"    # sleep time: 0 to 9 seconds
    sleep $st
    return $rt
}
parallelizeTask()
{
    workList=$1
    threadCnt=$2
    task=$3
    threadSyncPipeD=$4
    outputSyncPipeD=$5
    ti=0
    for workItem in $workList; do
        if [ $ti -ge $threadCnt ]; then
            # All slots are busy: block until a running task reports completion.
            read -r n <&$threadSyncPipeD
            ((ti--))
        fi
        {
            if $task $workItem; then result="success"; else result="failure"; fi
            echo "$result:$workItem" >&$outputSyncPipeD
            echo "$result:$workItem" >&$threadSyncPipeD
        } &
        # The original incremented an unrelated variable (i) here, so the
        # limit on concurrent tasks was not enforced.
        ((ti++))
    done
    wait
    # Sentinel that marks the end of the collected results.
    echo "quit" >&$outputSyncPipeD
    while read -r n; do
        if [[ $n == "quit" ]]; then
            break
        else
            # Append the result to the variable whose name was passed as $6.
            eval $6="\${$6}\ \$n"
        fi
    done <&$outputSyncPipeD
}
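main()
{
    if [ ! -p threadSyncPipe ]; then
        mkfifo threadSyncPipe
    fi
    if [ ! -p outputSyncPipe ]; then
        mkfifo outputSyncPipe
    fi
    # Open both FIFOs read-write so that opening them does not block.
    # fd 3 is the thread-sync pipe and fd 4 is the output pipe, matching the
    # order of the descriptor arguments passed to parallelizeTask below.
    exec 3<>threadSyncPipe
    exec 4<>outputSyncPipe
    gout=
    parallelizeTask "f1 f2 f3 f4 f5 f6" 2 mytask 3 4 gout
    echo "finalOutput: $gout"
    for f in $gout; do
        echo $f
    done
    rm outputSyncPipe
    rm threadSyncPipe
}
main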
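For comparison, here is a minimal sketch of a variant of the same throttling idea, not the code from the question: the FIFO is pre-filled with one token per allowed job, and each job takes a token before starting and returns it when done. The names work_dir, demo_task, run_limited and fd 8 are made up for illustration, and opening a FIFO read-write with <> is assumed to behave as it does on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and token FIFO for this sketch.
work_dir=$(mktemp -d)
mkfifo "$work_dir/tokens"
exec 8<>"$work_dir/tokens"

max_jobs=2
# Pre-fill the FIFO with one token (an empty line) per allowed job.
for ((k = 0; k < max_jobs; k++)); do
    echo >&8
done

# Hypothetical stand-in for real work: records the item it was given.
demo_task() {
    sleep 0.1
    echo "done:$1" >> "$work_dir/results"
}

# Run a command in the background, but only after a token is available.
run_limited() {
    local token
    read -r -u 8 token      # blocks while max_jobs jobs are running
    {
        "$@"
        echo >&8            # hand the token back
    } &
}

for item in f1 f2 f3 f4 f5 f6; do
    run_limited demo_task "$item"
done
wait

finished=$(sort "$work_dir/results" | tr '\n' ' ')
echo "finished: $finished"

exec 8>&-
rm -r "$work_dir"
```

With this layout the parent does not need to keep its own counter of running jobs, because the number of tokens in the FIFO plays that role.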
I found the related post below, which answers my question. I have also revised the title to make it more appropriate.
Are there repercussions to having many processes write to a single reader on a named pipe in posix?
Reads from a pipe are not atomic. The standard developers considered adding atomicity requirements to a pipe or FIFO, but recognized that due to the nature of pipes and FIFOs there could be no guarantee of atomicity of reads of {PIPE_BUF} or any other size that would be an aid to applications portability.
What is a named pipe? On a Unix-like operating system such as Linux, a named pipe, or FIFO (first-in, first-out), is a “special” kind of file used to establish a connection between processes. Unlike a “standard” pipe, a named pipe is accessed as part of the filesystem, just like any other type of file.
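A small sketch of what that means in practice: the FIFO is created with mkfifo, can be tested for with -p like any other file type, and carries a line from a background writer to a reader. The directory and file names here are placeholders chosen for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and FIFO name, used only for this demo.
demo_dir=$(mktemp -d)
fifo="$demo_dir/demo_fifo"

mkfifo "$fifo"

# The FIFO is visible in the filesystem; -p tests for that file type.
if [ -p "$fifo" ]; then
    fifo_is_pipe=yes
else
    fifo_is_pipe=no
fi
echo "is a named pipe: $fifo_is_pipe"

# Writer in the background: opening the FIFO for writing blocks
# until some process opens it for reading.
echo "hello through the fifo" > "$fifo" &

# Reader: takes one line from the FIFO.
read -r received < "$fifo"
wait
echo "received: $received"

rm -r "$demo_dir"
```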
Yes, multiple processes can read from (or write to) a pipe. But data isn't duplicated for the processes.
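As a rough illustration, the sketch below starts several background writers on one FIFO and has a single reader collect their lines; each line arrives exactly once. The names and fd 9 are arbitrary, and opening the FIFO read-write with <> (as the question's code does) is assumed to work as it does on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and FIFO name for this demo.
demo_dir=$(mktemp -d)
fifo="$demo_dir/multi_writer_fifo"
mkfifo "$fifo"

# Open read-write on fd 9 so the open itself never blocks.
exec 9<>"$fifo"

writer_count=5
for i in $(seq 1 "$writer_count"); do
    # Each writer is a separate background process sending one short line.
    echo "writer-$i done" >&9 &
done
wait

# Single reader: collect exactly one line per writer.
lines=()
for _ in $(seq 1 "$writer_count"); do
    read -r line <&9
    lines+=("$line")
done

exec 9>&-
rm -r "$demo_dir"

# Every line should be distinct: nothing is delivered twice.
unique_count=$(printf '%s\n' "${lines[@]}" | sort -u | wc -l | tr -d ' ')
echo "received ${#lines[@]} lines, $unique_count unique"
```

The order in which the lines arrive depends on scheduling, so the sketch only checks the count and uniqueness, not the order.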
In Linux, there are two types of pipes: pipes (also known as anonymous or unnamed pipes) and FIFOs (also known as named pipes).
According to the related post, writes to a FIFO are atomic as long as each write is no larger than PIPE_BUF. On Linux that limit is 4096 bytes (the usual page size); POSIX only requires it to be at least 512 bytes, so the exact value depends on the system.
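The atomic-write limit for pipes and FIFOs is exposed as PIPE_BUF, and it can be queried from bash with getconf. The sketch below reads it and checks whether a message would fit; fits_atomic_write is a hypothetical helper written for this example, and it assumes the shell sends each short line with a single write() call, which the check itself cannot verify.

```shell
#!/usr/bin/env bash
# Ask the system for the atomic-write limit on the filesystem holding "/".
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF: $pipe_buf bytes"

# Hypothetical helper: succeeds if the message, plus the newline that
# echo appends, is no larger than PIPE_BUF.
fits_atomic_write() {
    local message=$1
    local byte_count
    # wc -c counts bytes, so multibyte characters are measured correctly.
    byte_count=$(printf '%s\n' "$message" | wc -c)
    (( byte_count <= pipe_buf ))
}

if fits_atomic_write "success:f1"; then
    echo "short result lines fit in a single atomic write"
fi
```

Result lines such as "success:f1" are only a few bytes long, so they fall well inside the limit on any conforming system.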
Thank you all for the replies and suggestions.