Given the following example input on STDIN:
foo
bar bar
baz
===
qux
bla
===
def
zzz yyy
Is it possible to split it on the delimiter (in this case '===') and feed each part over STDIN to a separate instance of a command running in parallel?
So the example input above would result in 3 parallel processes (running, for example, a command called do.sh), where each instance receives one part of the data on STDIN, like this:
do.sh (instance 1) receives this over STDIN:
foo
bar bar
baz
do.sh (instance 2) receives this over STDIN:
qux
bla
do.sh (instance 3) receives this over STDIN:
def
zzz yyy
I suppose something like this is possible using xargs or GNU parallel, but I do not know how.
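For testing, a minimal hypothetical do.sh might just tag each input line with its process ID, so the parallel instances are distinguishable:

    #!/bin/sh
    # do.sh (hypothetical test stand-in): prefix each stdin line with our PID
    while IFS= read -r line; do
      printf '%s: %s\n' "$$" "$line"
    done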
In general, no. One reason is that the standard I/O library, when reading from a file rather than a terminal, reads blocks of data - BUFSIZ bytes at a time, where BUFSIZ is usually a power of 2 such as 512 or larger. If the data is in a file, one process would read the whole file shown. The others would either see nothing, if they shared the same open file description (similar to a file descriptor, except that several file descriptors - possibly in different processes - can share one open file description), or would each read the whole file again if they did not share it.
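You can see those block reads in action on a pipe, where the first reader cannot give back what it has buffered (the exact point where cat resumes depends on the head implementation's buffer size):

    seq 1 100000 | { head -n 1; cat; } | head -n 2
    # prints 1, then resumes thousands of bytes later (e.g. mid-number):
    # head consumed a whole buffer from the pipe, not just one line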
So, you need a process to read the file that knows it needs to parcel the information out to the three processes - and it needs to know how it is to connect to the three processes. It might be that your distributor program runs the three processes and writes to their separate pipe inputs. Or it could be that the distributor connects to three sockets and writes to the different sockets.
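A minimal sketch of such a distributor in shell, assuming GNU csplit (for its --suppress-matched option) and using temporary files rather than pipes or sockets:

    # split stdin into one file per section, dropping the '===' lines
    tmpdir=$(mktemp -d)
    csplit -z -s --suppress-matched -f "$tmpdir/part" - '/^===$/' '{*}'
    # run one do.sh per section in parallel, each reading its part on stdin
    for f in "$tmpdir"/part*; do
      ./do.sh < "$f" &
    done
    wait
    rm -rf "$tmpdir"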
Your example doesn't show/describe what would happen if there were 37 sections separated by the marker.
I have a home-brew program called tpipe that is like the Unix tee command, but it writes a copy of (all of) its standard input to each of the processes, and to standard output too by default. This might be a suitable basis for what you need (it at least covers the process-management part of it). Contact me if you want a copy - see my profile.
If you are using Bash, you can use regular tee with process substitution to simulate tpipe. See this article for an illustration of how.
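For example (a sketch only; note that this copies all of stdin to every instance rather than splitting it, and out1/out2/out3 are just illustrative file names):

    # Bash: each process substitution receives a full copy of stdin
    tee >(./do.sh > out1) >(./do.sh > out2) | ./do.sh > out3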
See also SF 96245 for another version of the same information - plus a link to a program called pee that is quite similar to tpipe (same basic idea, slightly different implementation in various respects).
GNU Parallel can do that from version 20110205.
    cat | parallel --pipe --recend '===\n' --rrs do_stuff
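With the question's input and do.sh, that might look like the following. Note that --pipe by default packs as many records as fit into one block per job, so -N1 is needed here to guarantee exactly one section per do.sh instance (input.txt stands in for the data arriving on STDIN):

    # one do.sh per '==='-delimited section; --rrs strips the delimiter itself
    parallel --pipe --recend '===\n' --rrs -N1 ./do.sh < input.txt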