
Is it possible to distribute STDIN over parallel processes?

Given the following example input on STDIN:

foo
bar bar
baz
===
qux
bla
===
def
zzz yyy

Is it possible to split it on the delimiter (in this case '===') and feed each part over stdin to a separate instance of a command, with the instances running in parallel?

So the example input above would result in 3 parallel processes (for example, of a command called do.sh), where each instance receives one part of the data on STDIN, like this:

do.sh (instance 1) receives this over STDIN:

foo
bar bar
baz

do.sh (instance 2) receives this over STDIN:

qux
bla

do.sh (instance 3) receives this over STDIN:

def
zzz yyy

I suppose something like this is possible using xargs or GNU parallel, but I do not know how.

asked Jan 11 '11 at 11:01 by Erik



2 Answers

In general, no. One reason is that standard I/O, when reading from a file rather than a terminal, reads blocks of data: BUFSIZ bytes at a time, where BUFSIZ is usually a power of 2 such as 512 or larger. If the data is in a file, one process would read the whole file shown, and the others would see nothing if they shared the same open file description (similar to a file descriptor, but several file descriptors can share the same open file description, and those descriptors can be in different processes). If they did not share the same open file description, each of them would read the whole file.
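A quick way to see this from the shell, assuming bash and a head that, like GNU coreutils head, reads its input in blocks: both head commands below share one pipe (the same open file description), the first one's read swallows all three lines, and the second finds nothing left. The function name is just for illustration.

```shell
#!/usr/bin/env bash
# Two readers share one pipe, i.e. the same open file description.
# The sleep lets all the data reach the pipe before the first reader
# starts, so the first head's block read grabs everything at once.
two_readers_demo() {
  printf 'a\nb\nc\n' | { sleep 1; head -n 1; head -n 1; }
}

two_readers_demo   # prints only: a
```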

So, you need a process that reads the file and knows it must parcel the information out to the three processes, and it needs to know how to connect to them. Your distributor program might run the three processes itself and write to their separate pipe inputs, or it might connect to three sockets and write to the different sockets.
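A minimal sketch of the first kind of distributor (it starts the processes and writes to their pipes), assuming line-oriented text and a delimiter line that is exactly ===. The helper name distribute is hypothetical, not a standard tool. It starts one background process per section, however many sections there are, with no limit on concurrency.

```shell
#!/usr/bin/env bash
# distribute CMD [ARG...]  (hypothetical helper)
# Reads stdin, splits it on lines that are exactly "===", and starts one
# background instance of CMD per section with that section on its stdin.
# Returns after every instance has exited.
distribute() {
  _nl='
'
  _section=''
  while IFS= read -r _line || [ -n "$_line" ]; do
    if [ "$_line" = '===' ]; then
      # End of a section: pipe it into a fresh background process.
      if [ -n "$_section" ]; then
        printf '%s' "$_section" | "$@" &
      fi
      _section=''
    else
      _section=$_section$_line$_nl
    fi
  done
  # The last section has no trailing delimiter line.
  if [ -n "$_section" ]; then
    printf '%s' "$_section" | "$@" &
  fi
  wait
}
```

Usage would be something like: distribute ./do.sh < input.txt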

Your example doesn't show/describe what would happen if there were 37 sections separated by the marker.

I have a home-brew program called tpipe that is like the Unix tee command, but it writes a copy of (all of) its standard input to each of the processes, and to standard output too by default. This might be a suitable basis for what you need (it at least covers the process management part of it). Contact me if you want a copy - see my profile.


If you are using Bash, you can use regular tee with process substitution to simulate tpipe. See this article for an illustration of how.
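A rough illustration of the tee approach, assuming bash: each >(...) command receives a complete copy of stdin, which is what tpipe does (copying, not splitting). The two sed commands and the function name are placeholders.

```shell
#!/usr/bin/env bash
# tpipe-style fan-out with plain tee and bash process substitution:
# each >(...) command gets a full copy of stdin, and tee's own copy
# still goes to stdout (add > /dev/null after the >(...) words to drop it).
fanout_demo() {
  tee >(sed 's/^/A: /') >(sed 's/^/B: /')
}

# The outputs interleave unpredictably; sorting makes them easy to compare.
printf 'foo\nbar\n' | fanout_demo | LC_ALL=C sort
```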

See also Server Fault question 96245 for another version of the same information, plus a link to a program called pee that is quite similar to tpipe (same basic idea, slightly different implementation in various respects).

answered Oct 31 '22 at 18:10 by Jonathan Leffler


GNU Parallel can do that from version 20110205.

parallel --pipe -N1 --recend '===\n' --rrs do_stuff
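A self-contained sketch of the same idea, assuming GNU parallel (not the unrelated parallel from moreutils). -N1 asks for one record per job; without it, --pipe hands each job a block (1 MB by default) that can hold many records, so small input like the example would go to a single process. The wrapper name run_sections is hypothetical.

```shell
#!/usr/bin/env bash
# run_sections CMD: run CMD once per "==="-separated section of stdin.
#   --pipe            feed data to the jobs on stdin rather than as arguments
#   -N1               one record per job, so each section gets its own process
#   --recend '===\n'  a record ends with "===" followed by a newline
#   --rrs             strip that separator before passing the record on
#   -k                print the jobs' output in input order
run_sections() {
  parallel -k --pipe -N1 --recend '===\n' --rrs "$1"
}

# Usage, with paste standing in for do.sh (it joins its stdin onto one line):
#   printf 'foo\nbar bar\nbaz\n===\nqux\nbla\n===\ndef\nzzz yyy\n' |
#     run_sections 'paste -s -d " " -'
```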
answered Oct 31 '22 at 19:10 by Ole Tange