I know about
program1 | program2
and
program1 | tee outputfile | program2
but is there a way to feed program1's output into both program2 and program3?
A pipe ('|') combines two or more commands: the output of one command becomes the input of the next, whose output may in turn feed the following command, and so on. Each byte written to a pipe is read only once, though, so a plain pipe cannot by itself deliver the same output to two programs.
Portability notes: On some systems (but not Linux), pipes are bidirectional: data can be transmitted in both directions between the pipe ends. POSIX.1 requires only unidirectional pipes. Portable applications should avoid reliance on bidirectional pipe semantics.
A pipe usually connects only two processes, although any number of child processes can be connected to each other and their related parent by a single pipe. A pipe is created in the process that becomes the parent by a call to pipe(2). The call returns two file descriptors in the array passed to it.
In Windows terminology, a pipe is a section of shared memory that processes use for communication. The process that creates a pipe is the pipe server. A process that connects to a pipe is a pipe client. One process writes information to the pipe, then the other process reads the information from the pipe.
You can do this with tee and process substitution:
program1 | tee >(program2) >(program3)
The output of program1 will be piped to whatever is inside ( ), in this case program2 and program3.
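As a concrete sketch of the tee plus process substitution form, here is a version you can paste into bash. The stand-in commands are my own choice for illustration: printf plays program1, wc -l plays program2 and tr plays program3. tee's own stdout is sent to /dev/null so that only the two consumers print.

```shell
#!/usr/bin/env bash
# Requires bash (or another shell with process substitution).
# printf, wc and tr are placeholders for program1, program2 and program3.
fan_out() {
    # Each >(...) expands to a path such as /dev/fd/63 that tee writes to
    # like a file. tee's own stdout is discarded.
    printf 'alpha\nbeta\n' |
        tee >(wc -l | tr -d ' ') >(tr 'a-z' 'A-Z') >/dev/null
}

# Piping into sort waits until both consumers have closed their output.
fan_out | LC_ALL=C sort
```

The two consumers run concurrently, so the order of their output is not guaranteed; the sort is only there to make the result reproducible.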
This may seem trivial, but it is not only possible: doing so also creates concurrent (simultaneous) processes.
You may have to take care of some side effects, such as order of execution and execution time.
There are some samples at the end of this post.
As this question is tagged shell and unix, I will first give a POSIX-compatible answer. (For bashisms, read further.)
Yes, there is a way to use unnamed pipes.
In this sample, I will generate a range of 100'000 numbers, randomize them and compress the result using 4 different compression tools to compare the compression ratio...
To do this, I will first run some preparation:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
SED_CMD=`which sed`
Note: specifying the full path to each command prevents some shell interpreters (like busybox) from running a built-in compressor instead. Doing it this way also ensures that the same syntax runs regardless of the OS installation (paths can differ between macOS, Ubuntu, Red Hat, HP-UX and so on).
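If which is not available, or behaves differently on your system, command -v is a POSIX-specified alternative. The resolve_tool helper below is a hypothetical name of my own, not part of the recipe above; it only accepts results that are absolute paths.

```shell
# Hypothetical helper: print the full path of a tool, or fail if it is
# missing. command -v is specified by POSIX, whereas which is not.
resolve_tool() {
    tool_path=$(command -v "$1") || return 1
    # Builtins, functions and aliases are not reported as absolute paths;
    # only accept real paths here.
    case $tool_path in
        /*) printf '%s\n' "$tool_path" ;;
        *)  return 1 ;;
    esac
}

GZIP_CMD=$(resolve_tool gzip) || echo "gzip not found" >&2
```

The same call can be repeated for the other tools (bzip2, lzma, xz, md5sum, sed).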
The syntax NN>&1 (where NN is a number between 3 and 63) duplicates the current standard output, here the unnamed pipe to the next command, onto file descriptor NN, which can then be reached as /dev/fd/NN. (File descriptors 0 to 2 are already open: 0 is STDIN, 1 is STDOUT and 2 is STDERR.)
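A smaller version of the same idea, with just two consumers, may make the descriptor plumbing easier to follow. This is only a sketch: it assumes /dev/fd exists on your system, and printf, wc and tr are placeholders for real programs.

```shell
# POSIX sh sketch. Assumes the system provides /dev/fd.
fan_out_fd() {
    (
        (
            # Inside this subshell, fd 4 is a copy of its stdout,
            # that is, the pipe leading to the second consumer.
            printf 'alpha\nbeta\n' |
                tee /dev/fd/4 |
                wc -l | tr -d ' ' >&3      # first consumer
        ) 4>&1 |
            tr 'a-z' 'A-Z' >&3             # second consumer
    ) 3>&1   # fd 3 keeps the original stdout for both consumers
}

# Both consumers run concurrently; sort makes the order reproducible.
fan_out_fd | LC_ALL=C sort
```

Each additional consumer needs one more level of parentheses and one more descriptor number, which is the pattern the larger example follows.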
Try this (tested under dash, busybox and bash):
(((( seq 1 100000 | shuf | tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 | $GZIP_CMD >/tmp/tst.gz ) 4>&1 | $BZIP2_CMD >/tmp/tst.bz2 ) 5>&1 | $LZMA_CMD >/tmp/tst.lzma ) 6>&1 | $XZ_CMD >/tmp/tst.xz ) 7>&1 | $MD5SUM_CMD
or more readable:
GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`

(
  (
    (
      (
        seq 1 100000 |
          shuf |
          tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 |
          $GZIP_CMD >/tmp/tst.gz
      ) 4>&1 |
        $BZIP2_CMD >/tmp/tst.bz2
    ) 5>&1 |
      $LZMA_CMD >/tmp/tst.lzma
  ) 6>&1 |
    $XZ_CMD >/tmp/tst.xz
) 7>&1 |
  $MD5SUM_CMD
2e67f6ad33745dc5134767f0954cbdd6  -
As shuf shuffles its input randomly, you should get different results if you try this,
ls -ltrS /tmp/tst.*
-rw-r--r-- 1 user user 230516 oct 1 22:14 /tmp/tst.bz2
-rw-r--r-- 1 user user 254811 oct 1 22:14 /tmp/tst.lzma
-rw-r--r-- 1 user user 254892 oct 1 22:14 /tmp/tst.xz
-rw-r--r-- 1 user user 275003 oct 1 22:14 /tmp/tst.gz
but you should still be able to compare the md5 checksums:
SED_CMD=`which sed`

for chk in gz:$GZIP_CMD bz2:$BZIP2_CMD lzma:$LZMA_CMD xz:$XZ_CMD;do
    ${chk#*:} -d < /tmp/tst.${chk%:*} |
        $MD5SUM_CMD |
        $SED_CMD s/-$/tst.${chk%:*}/
done
2e67f6ad33745dc5134767f0954cbdd6  tst.gz
2e67f6ad33745dc5134767f0954cbdd6  tst.bz2
2e67f6ad33745dc5134767f0954cbdd6  tst.lzma
2e67f6ad33745dc5134767f0954cbdd6  tst.xz
Using some bashisms, this could look nicer; for example, use tee /dev/fd/{4,5,6,7} instead of tee /dev/fd/4 /dev/fd/5 /...
(((( seq 1 100000 | shuf | tee /dev/fd/{4,5,6,7} | gzip >/tmp/tst.gz ) 4>&1 | bzip2 >/tmp/tst.bz2 ) 5>&1 | lzma >/tmp/tst.lzma ) 6>&1 | xz >/tmp/tst.xz ) 7>&1 | md5sum
29078875555e113b31bd1ae876937d4b  -
will work the same way.
This won't create any files, but lets you compare the compressed size of a range of sorted integers across 4 different compression tools (for fun, I used 4 different ways of formatting the output):
(
  (
    (
      (
        (
          seq 1 100000 |
            tee /dev/fd/{4,5,6,7} |
            gzip | wc -c | sed s/^/gzip:\ \ / >&3
        ) 4>&1 |
          bzip2 | wc -c | xargs printf "bzip2: %s\n" >&3
      ) 5>&1 |
        lzma | wc -c | perl -pe 's/^/lzma: /' >&3
    ) 6>&1 |
      xz | wc -c | awk '{printf "xz: %9s\n",$1}' >&3
  ) 7>&1 |
    wc -c
) 3>&1
gzip:  215157
bzip2: 124009
lzma: 17948
xz:     17992
588895
This demonstrates how to use stdin and stdout redirected in subshells and merged on the console for the final output.
>(...) and <(...)
Recent bash versions permit a new syntax feature.
seq 1 100000 | wc -l
100000

seq 1 100000 > >( wc -l )
100000

wc -l < <( seq 1 100000 )
100000
Just as | is an unnamed pipe connected to /dev/fd/0 of the next command, the syntax <() creates a temporary unnamed pipe that is exposed under another file descriptor, /dev/fd/XX.
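To see what <(...) hands to a command, here is a minimal sketch (bash required). paste receives two paths and reads them like ordinary files; side_by_side is just an illustrative name of mine.

```shell
#!/usr/bin/env bash
# Each <(...) expands to a path such as /dev/fd/63 that the command can
# open and read like a regular file.
side_by_side() {
    paste -d, <(seq 1 3) <(seq 4 6)
}

side_by_side
```

This prints the lines 1,4 then 2,5 then 3,6: both seq commands run as separate processes, and paste reads from one pipe per argument.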
md5sum <(zcat /tmp/tst.gz) <(bzcat /tmp/tst.bz2) <( lzcat /tmp/tst.lzma) <(xzcat /tmp/tst.xz)
29078875555e113b31bd1ae876937d4b  /dev/fd/63
29078875555e113b31bd1ae876937d4b  /dev/fd/62
29078875555e113b31bd1ae876937d4b  /dev/fd/61
29078875555e113b31bd1ae876937d4b  /dev/fd/60
This requires the file utility to be installed. The loop determines the command to run from the file's extension or, failing that, from its detected file type.
for file in /tmp/tst.*;do
    cmd=$(which ${file##*.}) || {
        cmd=$(file -b --mime-type $file)
        cmd=$(which ${cmd#*-})
    }
    read -a md5 < <($cmd -d <$file|md5sum)
    echo $md5 \ $file
done
29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma
29078875555e113b31bd1ae876937d4b  /tmp/tst.xz
This lets you do the same thing as before with the following syntax:
seq 1 100000 | shuf | tee >(
    echo gzip. $( gzip | wc -c )
  ) >(
    echo gzip, $( wc -c < <(gzip))
  ) >(
    gzip | wc -c | sed s/^/gzip:\ \ /
  ) >(
    bzip2 | wc -c | xargs printf "bzip2: %s\n"
  ) >(
    lzma | wc -c | perl -pe 's/^/lzma: /'
  ) >(
    xz | wc -c | awk '{printf "xz: %9s\n",$1}'
  ) > >(
    echo raw: $(wc -c)
  ) | xargs printf "%-8s %9d\n"
raw:        588895
xz:         254556
lzma:       254472
bzip2:      231111
gzip:       274867
gzip,       274867
gzip.       274867
Note: I used three different ways to compute the gzip compressed size.
Note: Because these operations run simultaneously, the output order depends on the time required by each command.
If you run a multi-core or multi-processor computer, try comparing this:
i=1
time for file in /tmp/tst.*;do
    cmd=$(which ${file##*.}) || {
        cmd=$(file -b --mime-type $file)
        cmd=$(which ${cmd#*-})
    }
    read -a md5 < <($cmd -d <$file|md5sum)
    echo $((i++)) $md5 \ $file
done | cat -n
which may render:
     1  1 29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
     2  2 29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
     3  3 29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma
     4  4 29078875555e113b31bd1ae876937d4b  /tmp/tst.xz
real    0m0.101s
with this:
time (
    i=1
    pids=()
    for file in /tmp/tst.*;do
        cmd=$(which ${file##*.}) || {
            cmd=$(file -b --mime-type $file)
            cmd=$(which ${cmd#*-})
        }
        (
            read -a md5 < <($cmd -d <$file|md5sum)
            echo $i $md5 \ $file
        ) & pids+=($!)
        ((i++))
    done
    wait ${pids[@]}
) | cat -n
could give:
     1  2 29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
     2  1 29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
     3  4 29078875555e113b31bd1ae876937d4b  /tmp/tst.xz
     4  3 29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma
real    0m0.070s
where the ordering depends on the time used by each fork.
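When the output order matters, one option (not used in the runs above) is to let each background job write to its own temporary file and print the files in submission order after wait. The sketch below assumes mktemp -d is available; slow_job and run_parallel_ordered are hypothetical names.

```shell
# POSIX sh sketch. slow_job is a stand-in for real work such as
# decompressing and checksumming one file.
slow_job() {
    sleep "$1"
    printf 'job slept %s\n' "$1"
}

run_parallel_ordered() {
    tmpdir=$(mktemp -d) || return 1
    n=0
    for delay in "$@"; do
        n=$((n + 1))
        slow_job "$delay" >"$tmpdir/$n" &   # each job writes to its own file
    done
    wait                                    # block until every job has exited
    i=1
    while [ "$i" -le "$n" ]; do             # print in submission order
        cat "$tmpdir/$i"
        i=$((i + 1))
    done
    rm -rf "$tmpdir"
}

run_parallel_ordered 1 0
```

Here the second job finishes first, yet "job slept 1" is still printed before "job slept 0". The price is that nothing is shown until all jobs are done.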