 

OS X / Linux: pipe into two processes?


I know about

program1 | program2 

and

program1 | tee outputfile | program2 

but is there a way to feed program1's output into both program2 and program3?

asked Apr 18 '12 21:04 by Jason S


People also ask

How can you use pipe in multiple commands?

You can chain commands with the pipe character '|'. A pipe combines two or more commands: the output of one command becomes the input of the next, whose output may in turn become the input of the command after it, and so on.
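As a small illustration (the word list is arbitrary sample data), here are three commands chained so that each one reads the previous one's output:

```sh
#!/bin/sh
# Three commands joined by '|': printf emits one word per line, sort -u sorts
# the lines and drops duplicates, and head keeps only the first two.
first_two_unique() {
  printf '%s\n' cherry apple banana apple | sort -u | head -n 2
}

first_two_unique
```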

Are Linux pipes bidirectional?

Portability note: on some systems (but not Linux), pipes are bidirectional: data can be transmitted in both directions between the pipe ends. POSIX.1 requires only unidirectional pipes. Portable applications should avoid relying on bidirectional pipe semantics.

How is process connected with pipes in Linux?

A pipe usually connects only two processes, although any number of child processes can be connected to each other and to their parent by a single pipe. The pipe is created by a call to pipe(2) in the process that becomes the parent. The call stores two file descriptors in the array passed to it: one for the read end and one for the write end.

Is pipe a shared memory?

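On Windows, a pipe is described as a section of shared memory that processes use for communication: the process that creates a pipe is the pipe server, and a process that connects to it is a pipe client. One process writes information to the pipe, then the other process reads it. On Linux, a pipe is instead a buffer managed by the kernel and accessed through file descriptors; it is not memory mapped into the address spaces of the communicating processes.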


2 Answers

You can do this with tee and process substitution.

program1 | tee >(program2) >(program3) 

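The output of program1 will be piped to whatever is inside each >( ), in this case program2 and program3. Note that tee also copies its input to its own standard output, so program1's output will also appear on the terminal unless you redirect it (for example with > /dev/null) or pipe it into one of the programs: program1 | tee >(program2) | program3. Process substitution is available in bash, ksh and zsh, but not in POSIX sh.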
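Where only a POSIX sh is available, a rough equivalent can be built from named pipes. This is a sketch under stated assumptions, not the answer's method: seq stands in for program1, wc -l for program2 and an awk sum for program3, and the helper name fan_out is hypothetical. Unlike >( ), the consumers here are ordinary background jobs, so an explicit wait guarantees both have finished before their results are read.

```sh
#!/bin/sh
# Hypothetical POSIX-sh stand-in for: program1 | tee >(program2) >(program3)
# seq = program1, wc -l = program2, awk sum = program3 (all assumptions).
fan_out() {
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/p2" "$dir/p3" || return 1

  # Start the consumers first; opening a FIFO for reading blocks until a writer opens it.
  wc -l < "$dir/p2" > "$dir/out2" &
  awk '{ s += $1 } END { print s }' < "$dir/p3" > "$dir/out3" &

  # tee copies the producer's output into both FIFOs; its own stdout is discarded.
  seq 1 5 | tee "$dir/p2" "$dir/p3" > /dev/null

  wait   # both background consumers have exited after this line
  echo "program2 saw $(tr -d ' ' < "$dir/out2") lines"
  echo "program3 computed sum $(cat "$dir/out3")"
  rm -rf "$dir"
}

fan_out
```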

answered Oct 27 '22 20:10 by inspector-g


Intro about parallelisation

This may seem trivial, but it is not only possible: doing so will also generate concurrent (simultaneous) processes.

You may have to take care of some side effects, such as order of execution, execution time, etc.

There are some samples at the end of this post.

Compatible answer first

As this question is tagged shell and unix, I will first give a POSIX-compatible answer (for bashisms, read further).

Yes, there is a way, using unnamed pipes.

In this sample, I will generate a range of 100,000 numbers, shuffle them and compress the result with 4 different compression tools to compare their compression ratios.

For this, I will first run the following preparation:

GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`
SED_CMD=`which sed`

Note: specifying the full path to commands prevents some shell interpreters (like busybox) from running their built-in compressor. Doing it this way also ensures the same syntax will run independently of the OS installation (paths may differ between macOS, Ubuntu, Red Hat, HP-UX and so on).
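As a hedged alternative to which (an external utility that POSIX does not specify), the command -v builtin can resolve the path. The hypothetical helper resolve_cmd below also rejects anything that is not an absolute path, such as a shell builtin, function or alias, which matches the intent of the note above:

```sh
#!/bin/sh
# resolve_cmd (hypothetical helper): print the absolute path of an external
# command, or fail if the name is missing or resolves to a builtin, function or alias.
resolve_cmd() {
  p=$(command -v "$1") || return 1
  case $p in
    /*) printf '%s\n' "$p" ;;
    *)  return 1 ;;   # not a file path, so there is nothing to pin
  esac
}

GZIP_CMD=$(resolve_cmd gzip) || echo "gzip: no external binary found" >&2
echo "GZIP_CMD=$GZIP_CMD"
```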

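The syntax NN>&1 (where NN is an unused file descriptor number; POSIX only guarantees 0 to 9, so 3 to 9 is the portable range) duplicates the current standard output onto file descriptor NN. Because the standard output of the enclosing subshell is an unnamed pipe, that pipe is then also reachable as /dev/fd/NN. (File descriptors 0 to 2 are already open: 0 is STDIN, 1 is STDOUT and 2 is STDERR.)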
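To see the mechanism in isolation, here is a small sketch (fd_demo is a hypothetical name): both lines end up in tr, one written to fd 4 directly and one through the /dev/fd/4 path that tee opens. It assumes a system that provides /dev/fd, such as Linux or macOS.

```sh
#!/bin/sh
# The subshell's stdout is the pipe into tr; 4>&1 duplicates it onto fd 4,
# so anything written to fd 4 (or to the path /dev/fd/4) also reaches tr.
fd_demo() {
  (
    echo direct >&4                          # write through fd 4
    echo via-tee | tee /dev/fd/4 >/dev/null  # write through the /dev/fd/4 path
  ) 4>&1 | tr 'a-z' 'A-Z'
}

fd_demo
```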

Try this (tested under dash, busybox and bash):

(((( seq 1 100000 | shuf | tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 | $GZIP_CMD >/tmp/tst.gz ) 4>&1 | $BZIP2_CMD >/tmp/tst.bz2 ) 5>&1 | $LZMA_CMD >/tmp/tst.lzma ) 6>&1 | $XZ_CMD >/tmp/tst.xz ) 7>&1 | $MD5SUM_CMD 

or more readable:

GZIP_CMD=`which gzip`
BZIP2_CMD=`which bzip2`
LZMA_CMD=`which lzma`
XZ_CMD=`which xz`
MD5SUM_CMD=`which md5sum`

(
  (
    (
      (
        seq 1 100000 |
          shuf |
          tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 |
          $GZIP_CMD >/tmp/tst.gz
      ) 4>&1 |
        $BZIP2_CMD >/tmp/tst.bz2
    ) 5>&1 |
      $LZMA_CMD >/tmp/tst.lzma
  ) 6>&1 |
    $XZ_CMD >/tmp/tst.xz
) 7>&1 |
  $MD5SUM_CMD
2e67f6ad33745dc5134767f0954cbdd6  -

As shuf does random placement, you will obtain a different result if you try this:

ls -ltrS /tmp/tst.*
-rw-r--r-- 1 user user 230516 oct  1 22:14 /tmp/tst.bz2
-rw-r--r-- 1 user user 254811 oct  1 22:14 /tmp/tst.lzma
-rw-r--r-- 1 user user 254892 oct  1 22:14 /tmp/tst.xz
-rw-r--r-- 1 user user 275003 oct  1 22:14 /tmp/tst.gz

but you should still be able to compare the MD5 checksums:

SED_CMD=`which sed`

for chk in gz:$GZIP_CMD bz2:$BZIP2_CMD lzma:$LZMA_CMD xz:$XZ_CMD;do
    ${chk#*:} -d < /tmp/tst.${chk%:*} |
        $MD5SUM_CMD |
        $SED_CMD s/-$/tst.${chk%:*}/
  done
2e67f6ad33745dc5134767f0954cbdd6  tst.gz
2e67f6ad33745dc5134767f0954cbdd6  tst.bz2
2e67f6ad33745dc5134767f0954cbdd6  tst.lzma
2e67f6ad33745dc5134767f0954cbdd6  tst.xz
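The same nesting works with just two consumers. This minimal sketch assumes only gzip and the POSIX cksum utility are available, uses seq as the producer, and roundtrip_check is a hypothetical name. It compresses the stream while checksumming the raw copy through fd 4, then checks that decompressing the file gives the same checksum:

```sh
#!/bin/sh
# Two-consumer version of the nested-subshell trick (assumes gzip and cksum).
# tee's stdout feeds gzip; the copy written to /dev/fd/4 feeds cksum.
roundtrip_check() {
  out=$(mktemp) || return 1
  raw_sum=$(
    ( seq 1 1000 | tee /dev/fd/4 | gzip > "$out" ) 4>&1 | cksum
  )
  unz_sum=$(gzip -d < "$out" | cksum)   # checksum of the decompressed data
  rm -f "$out"
  if [ "$raw_sum" = "$unz_sum" ]; then echo match; else echo mismatch; fi
}

roundtrip_check
```

The command substitution around the outer pipeline only returns once cksum and the whole subshell (including gzip) have exited, so the compressed file is complete before it is read back.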

Using bash features

Using some bashisms, this can look nicer; for example, use /dev/fd/{4,5,6,7} instead of tee /dev/fd/4 /dev/fd/5 /...

(((( seq 1 100000 | shuf | tee /dev/fd/{4,5,6,7} | gzip >/tmp/tst.gz ) 4>&1 |
   bzip2 >/tmp/tst.bz2 ) 5>&1 | lzma >/tmp/tst.lzma ) 6>&1 |
   xz >/tmp/tst.xz ) 7>&1 | md5sum
29078875555e113b31bd1ae876937d4b  -

will work the same way.
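Brace expansion is performed by bash before tee even starts, so tee still receives four separate path arguments. A quick way to see the expanded words is to put echo in front (show_fd_args is a hypothetical name; {4..7} would expand the same way):

```sh
#!/usr/bin/env bash
# Requires bash: brace expansion is not part of POSIX sh.
# echo only displays the words that tee would receive.
show_fd_args() {
  echo tee /dev/fd/{4,5,6,7}
}

show_fd_args   # tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7
```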

Final check

This won't create any files, but will let you compare the size of a compressed range of sorted integers between 4 different compression tools (for fun, I used 4 different ways of formatting the output):

(
  (
    (
      (
        (
          seq 1 100000 |
            tee /dev/fd/{4,5,6,7} |
              gzip |
              wc -c |
              sed s/^/gzip:\ \ / >&3
        ) 4>&1 |
          bzip2 |
          wc -c |
          xargs printf "bzip2: %s\n" >&3
      ) 5>&1 |
        lzma |
        wc -c |
        perl -pe 's/^/lzma:   /' >&3
    ) 6>&1 |
      xz |
      wc -c |
      awk '{printf "xz: %9s\n",$1}' >&3
  ) 7>&1 |
    wc -c
) 3>&1
gzip:  215157
bzip2: 124009
lzma:   17948
xz:     17992
588895

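This demonstrates how to redirect stdin and stdout in subshells and merge the results on the console for the final output: fd 3 is a copy of the outer standard output, so each >&3 sends one result straight to the terminal while the plain standard output keeps feeding the next pipe.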
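The fd 3 trick can be reduced to a small sketch (merge_demo is a hypothetical name): a line written to >&3 skips the pipe, while a line written to plain standard output goes through it.

```sh
#!/bin/sh
# fd 3 is a copy of the group's stdout (3>&1). Inside, "a" bypasses the pipe
# by going to fd 3, while "b" travels through the pipe and is uppercased by tr.
merge_demo() {
  {
    ( echo a >&3; echo b ) | tr 'a-z' 'A-Z'
  } 3>&1
}

merge_demo
```

This prints a and then B: the first line is written before b even enters the pipe, so it always comes out first.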

Syntax >(...) and <(...)

Bash (like ksh and zsh) supports another syntax feature for this: process substitution.

seq 1 100000 | wc -l
100000

seq 1 100000 > >( wc -l )
100000

wc -l < <( seq 1 100000 )
100000

Just as | connects an unnamed pipe to /dev/fd/0 (standard input), the <() syntax creates a temporary unnamed pipe on another file descriptor, /dev/fd/XX, and substitutes that path on the command line.

md5sum <(zcat /tmp/tst.gz) <(bzcat /tmp/tst.bz2) <(
         lzcat /tmp/tst.lzma) <(xzcat /tmp/tst.xz)
29078875555e113b31bd1ae876937d4b  /dev/fd/63
29078875555e113b31bd1ae876937d4b  /dev/fd/62
29078875555e113b31bd1ae876937d4b  /dev/fd/61
29078875555e113b31bd1ae876937d4b  /dev/fd/60
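As a smaller illustration that each <( ) is replaced by a file-like path, the sketch below (side_by_side and show_path are hypothetical names) feeds two command outputs to paste, which expects file operands:

```sh
#!/usr/bin/env bash
# Requires bash, ksh or zsh. Each <(...) runs its command in the background
# and is replaced on the command line by a path such as /dev/fd/63.
side_by_side() {
  paste -d ' ' <(seq 1 3) <(seq 4 6)
}

show_path() {
  echo <(true)   # prints the substituted path; the exact name varies by system
}

side_by_side
show_path
```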

More sophisticated demo

This requires the file utility to be installed. It determines the command to run from the file's extension or, failing that, from its MIME type.

for file in /tmp/tst.*;do
    cmd=$(which ${file##*.}) || {
        cmd=$(file -b --mime-type $file)
        cmd=$(which ${cmd#*-})
    }
    read -a md5 < <($cmd -d <$file|md5sum)
    echo $md5 \ $file
  done
29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma
29078875555e113b31bd1ae876937d4b  /tmp/tst.xz

Process substitution also lets you redo the earlier size comparison with the following syntax:

seq 1 100000 |
    shuf |
        tee >(
            echo gzip. $( gzip | wc -c )
          )  >(
            echo gzip, $( wc -c < <(gzip))
          ) >(
            gzip  | wc -c | sed s/^/gzip:\ \ /
          ) >(
            bzip2 | wc -c | xargs printf "bzip2: %s\n"
          ) >(
            lzma  | wc -c | perl -pe 's/^/lzma:  /'
          ) >(
            xz    | wc -c | awk '{printf "xz: %9s\n",$1}'
          ) > >(
            echo raw: $(wc -c)
          ) |
        xargs printf "%-8s %9d\n"
raw:        588895
xz:         254556
lzma:       254472
bzip2:      231111
gzip:       274867
gzip,       274867
gzip.       274867

Note: I used different ways to compute the gzip compressed size.

Note: because these operations run simultaneously, the output order will depend on the time required by each command.
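The extension and MIME-type lookups in the loop that picks the decompressor rely on POSIX parameter expansion. A minimal sketch with hypothetical helper names:

```sh
#!/bin/sh
# ${var##*.} removes the longest prefix ending in '.', leaving the extension;
# ${var#*-} removes the shortest prefix ending in '-', e.g. in a MIME type.
ext_of()     { printf '%s\n' "${1##*.}"; }
after_dash() { printf '%s\n' "${1#*-}"; }

ext_of /tmp/tst.bz2              # bz2
after_dash application/x-bzip2   # bzip2
```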

Going further about parallelisation

If you are running a multi-core or multi-processor computer, try comparing this:

i=1
time for file in /tmp/tst.*;do
    cmd=$(which ${file##*.}) || {
        cmd=$(file -b --mime-type $file)
        cmd=$(which ${cmd#*-})
    }
    read -a md5 < <($cmd -d <$file|md5sum)
    echo $((i++)) $md5 \ $file
  done |
cat -n

which may render:

     1      1 29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
     2      2 29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
     3      3 29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma
     4      4 29078875555e113b31bd1ae876937d4b  /tmp/tst.xz

real    0m0.101s

with this:

time  (
    i=1 pids=()
    for file in /tmp/tst.*;do
        cmd=$(which ${file##*.}) || {
            cmd=$(file -b --mime-type $file)
            cmd=$(which ${cmd#*-})
        }
        (
             read -a md5 < <($cmd -d <$file|md5sum)
             echo $i $md5 \ $file
        ) & pids+=($!)
      ((i++))
      done
    wait ${pids[@]}
) | cat -n

could give:

     1      2 29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
     2      1 29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
     3      4 29078875555e113b31bd1ae876937d4b  /tmp/tst.xz
     4      3 29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma

real    0m0.070s

where the ordering depends on the time used by each fork.
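If you want parallel execution but a stable output order, one common variation (not the answer's code) is to have each background job write to its own numbered file and read the files back in order after wait. The helper name ordered_parallel is hypothetical, the sleep delays are arbitrary stand-ins for work of different lengths, and fractional delays assume a sleep that accepts them, as GNU and macOS sleep do.

```sh
#!/bin/sh
# Jobs run concurrently, but results are printed in submission order.
ordered_parallel() {
  dir=$(mktemp -d) || return 1
  i=1
  for delay in 0.3 0.1 0.2; do
    # $i and $delay are copied into the background subshell when it is forked.
    ( sleep "$delay"; echo "job $i slept $delay" > "$dir/$i" ) &
    i=$((i + 1))
  done
  wait                        # all background jobs have finished here
  j=1
  while [ "$j" -lt "$i" ]; do
    cat "$dir/$j"
    j=$((j + 1))
  done
  rm -rf "$dir"
}

ordered_parallel
```

Job 2 finishes first, yet the output still lists job 1, job 2, job 3, because ordering is decided when the files are read, not when the jobs complete.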

answered Oct 27 '22 20:10 by F. Hauri