When the output of a single command is consumed by exactly one other command, is it better to use | (pipelines) or <() (process substitution)?
Better is, of course, subjective. For my specific use case I am after performance as the primary driver, but also interested in robustness.
I already know about the benefits of while read do done < <(cmd) and have switched over to it.
I have several var=$(cmd1|cmd2) instances that I suspect might be better replaced with var=$(cmd2 < <(cmd1)).
I would like to know what specific benefits the latter case brings over the former.
Process substitution allows a process's input or output to be referred to using a filename. It takes the form <(list) or >(list).
A pipe in Bash takes the standard output of one process and passes it as standard input into another process. Bash scripts support positional arguments that can be passed in at the command line. Guiding principle #1: Commands executed in Bash receive their standard input from the process that starts them.
A pipeline is a sequence of one or more commands separated by one of the control operators ' | ' or ' |& '. The output of each command in the pipeline is connected via a pipe to the input of the next command. That is, each command reads the previous command's output.
A pipe combines two or more commands: the output of one command acts as input to the next, whose output may in turn act as input to the one after it, and so on. It can be visualized as a temporary connection between two or more commands, programs, or processes.
tl;dr: Use pipes, unless you have a convincing reason not to.
Piping and redirecting stdin from a process substitution is essentially the same thing: both will result in two processes connected by an anonymous pipe.
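As a rough illustration of that equivalence, the sketch below runs the two forms from the question side by side. Here printf and sort -r are only hypothetical stand-ins for cmd1 and cmd2; any producer and consumer pair should behave the same way.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins: printf plays cmd1 (producer), sort -r plays cmd2 (consumer).

# Form 1: pipeline inside a command substitution.
via_pipe=$(printf '%s\n' b a c | sort -r)

# Form 2: stdin redirected from a process substitution.
via_procsub=$(sort -r < <(printf '%s\n' b a c))

# Both forms connect the two processes with an anonymous pipe,
# so the captured output should be the same.
if [[ $via_pipe == "$via_procsub" ]]; then
    echo "identical output"
else
    echo "outputs differ"
fi
```

The data that reaches cmd2 is the same either way; the differences are in how bash manages the processes around it.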
There are three practical differences:
First, bash runs every stage of a pipeline in a subshell by default, which is why you started looking into this in the first place:
#!/bin/bash
# Copy each line into "last"; reading straight into "last" would blank it at EOF.
cat "$1" | while IFS= read -r line; do last=$line; done
echo "Last line of $1 is $last"
This script won't work by default with a pipeline because, unlike ksh and zsh, bash will fork a subshell for each stage, so variables set inside the loop are lost when it ends.
If you set shopt -s lastpipe in bash 4.2+, bash mimics the ksh and zsh behavior and the script works just fine.
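The following is a minimal sketch of the three variants, assuming a non-interactive script (lastpipe only takes effect while job control is inactive). The three input lines and the variable names are made up for illustration.

```bash
#!/usr/bin/env bash
# Made-up sample input for illustration.
input=$'first\nsecond\nthird'

# 1) Plain pipeline: the while loop runs in a subshell,
#    so the assignment does not survive the loop.
last_pipe=unset
printf '%s\n' "$input" | while IFS= read -r line; do last_pipe=$line; done

# 2) Process substitution: the loop runs in the current shell.
last_procsub=unset
while IFS= read -r line; do last_procsub=$line; done < <(printf '%s\n' "$input")

# 3) lastpipe: the last pipeline stage runs in the current shell
#    (requires bash 4.2+ and inactive job control, as in a script).
shopt -s lastpipe
last_lastpipe=unset
printf '%s\n' "$input" | while IFS= read -r line; do last_lastpipe=$line; done

printf 'pipeline: %s\nprocsub:  %s\nlastpipe: %s\n' \
    "$last_pipe" "$last_procsub" "$last_lastpipe"
```

Each loop copies the line into a separate variable rather than reading straight into it, because the final failing read at end of input would otherwise leave the variable empty.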
Second, bash does not wait for a process substitution to finish. POSIX only requires a shell to wait for the last process in a pipeline, but most shells, including bash, will wait for all of them.
This makes a notable difference when you have a slow producer, like in a /dev/random password generator:
tr -cd 'a-zA-Z0-9' < /dev/random | head -c 10 # Slow?
head -c 10 < <(tr -cd 'a-zA-Z0-9' < /dev/random) # Fast?
The first example will not benchmark favorably. Once head is satisfied and exits, tr will wait around for its next write() call to discover that the pipe is broken.
Since bash waits for both head and tr to finish, it will appear slower.
In the procsub version, bash only waits for head, and lets tr finish in the background.
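A small timing sketch can make the waiting behavior visible without depending on /dev/random. It assumes sleep 2 as a stand-in for a slow producer and uses bash's SECONDS counter, which only has whole-second resolution, so the numbers are coarse.

```bash
#!/usr/bin/env bash
# sleep 2 is a hypothetical slow producer; true is a consumer that exits at once.

# Pipeline: bash waits for every stage, including the sleeping producer.
SECONDS=0
sleep 2 | true
pipe_elapsed=$SECONDS

# Process substitution: bash waits only for true and leaves
# the producer to finish on its own in the background.
SECONDS=0
true < <(sleep 2)
procsub_elapsed=$SECONDS

echo "pipeline took about ${pipe_elapsed}s"
echo "procsub  took about ${procsub_elapsed}s"
```

The producer does the same work in both cases; the only thing that changes is whether the shell waits for it before moving on.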
Third, bash does not optimize away the fork for a process substitution. If you invoke an external command like sleep 1, then the Unix process model requires that bash fork and execute the command.
Since forks are expensive, bash optimizes the cases that it can. For example, the command:
bash -c 'sleep 1'
would naively incur two forks: one to run bash, and one to run sleep. However, bash can optimize it because there's no need for bash to stay around after sleep finishes, so it can instead just replace itself with sleep (execve with no fork). This is very similar to tail call optimization.
( sleep 1 ) is similarly optimized, but <( sleep 1 ) is not. The source code does not offer a particular reason why, so it may just not have come up.
$ strace -f bash -c '/bin/true | /bin/true' 2>&1 | grep -c clone
2
$ strace -f bash -c '/bin/true < <(/bin/true)' 2>&1 | grep -c clone
3
Given the above you can create a benchmark favoring whichever position you want, but since the number of forks is generally much more relevant, pipes would be the best default.
And obviously, it doesn't hurt that pipes are the POSIX-standard, canonical way of connecting the stdin and stdout of two processes, and work equally well on all platforms.