I am working on a pipeline that has a few branch points that subsequently merge-- they look something like this:
           command2
          /        \
command1            command4
          \        /
           command3
Each command writes to STDOUT and accepts input via STDIN. STDOUT from command1 needs to be passed to both command2 and command3, which are run sequentially, and their output needs to be effectively concatenated and passed to command4. I initially thought that something like this would work:
$ command1 | (command2; command3) | command4
That doesn't work though, as only STDOUT from command2 is passed to command4, and when I remove command4 it's apparent that command3 isn't being passed the appropriate stream from command1 -- in other words, it's as if command2 is exhausting or consuming the stream. I get the same result with { command2 ; command3 ; } in the middle as well. So I figured I should be using 'tee' with process substitution, and tried this:
$ command1 | tee >(command2) | command3 | command4
But surprisingly that didn't work either -- it appears that the output of command1 and the output of command2 are piped into command3, which results in errors and only the output of command3 being piped into command4. I did find that the following gets the appropriate input and output to and from command2 and command3:
$ command1 | tee >(command2) >(command3) | command4
However, this streams the output of command1 to command4 as well, which leads to issues as command2 and command3 produce a different specification than command1. The solution I've arrived at seems hacky, but it does work:
$ command1 | tee >(command2) >(command3) > /dev/null | command4
That suppresses command1 passing its output to command4, while collecting STDOUT from command2 and command3. It works, but I feel like I'm missing a more obvious solution. Am I? I've read dozens of threads and haven't found a solution to this problem that works in my use case, nor have I seen an elaboration of the exact problem of splitting and re-joining streams (though I can't be the first one to deal with this). Should I just be using named pipes? I tried but had difficulty getting that working as well, so maybe that's another story for another thread. I'm using bash in RHEL5.8.
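In case it helps anyone reproduce this, here is a minimal stand-in for the working pipeline, with wc and cat substituted for my actual commands (those substitutions are mine, not the real programs):

```shell
# Stand-ins: wc -l plays command2, wc -c plays command3, cat plays command4.
# tee's own copy of the stream is thrown away via > /dev/null, while each
# >(...) process substitution inherits the pipe to the final stage, so only
# the branch outputs reach cat.
printf 'x\ny\n' | tee >(wc -l) >(wc -c) > /dev/null | cat
```

The two counts arrive in whichever order the branches finish, which already hints at the interleaving caveat below.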
You can play around with file descriptors like this:
((date | tee >( wc >&3) | wc) 3>&1) | wc
or
((command1 | tee >( command2 >&3) | command3) 3>&1) | command4
To explain: in tee >( wc >&3), tee will output the original data on stdout, and the inner wc will write its result to FD 3. The outer 3>&1 then merges FD 3 back into stdout, so the output from both wc invocations is sent to the trailing command.
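The descriptor shuffle can be stripped down to two echoes (stand-ins of mine, not the original commands):

```shell
# 'inner' is written to FD 3; the outer 3>&1 duplicates FD 3 onto stdout,
# so both lines travel down the same pipe to cat.
( ( echo inner >&3; echo outer ) 3>&1 ) | cat
# prints: inner, then outer
```

Because the two echoes run sequentially in one subshell, the order here is deterministic; with real concurrent commands it is not.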
HOWEVER, there is nothing in this pipeline (or the one in your own solution) which will guarantee that the output will not be mangled -- that is, incomplete lines from command2 may be mixed up with lines from command3. If that is a concern, you will need to do one of two things:

1. write your own tee-like program which internally uses popen and reads each line back, sending only complete lines to stdout for command4 to read
2. have each command write to a temporary file, and use cat to merge the data as input to command4
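The second option can be sketched like this, again with wc/cat stand-ins for the real commands (the file names are illustrative):

```shell
# Run each branch to completion into its own file, then concatenate.
# Whole files can't interleave mid-line, unlike concurrent pipe writers.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/in"          # stand-in for command1 output
wc -l < "$tmpdir/in" > "$tmpdir/out2"   # command2 branch
wc -c < "$tmpdir/in" > "$tmpdir/out3"   # command3 branch
cat "$tmpdir/out2" "$tmpdir/out3"       # merged stream for command4 to read
rm -rf "$tmpdir"
```

The trade-off is losing streaming: command4 cannot start until both branches have finished.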