
Parallelize Bash script with maximum number of processes

Tags:

bash

Let's say I have a loop in Bash:

for foo in `some-command`
do
    do-something $foo
done

do-something is CPU-bound and I have a nice shiny 4-core processor. I'd like to be able to run up to 4 do-somethings at once.

The naive approach seems to be:

for foo in `some-command`
do
    do-something $foo &
done

This runs all the do-somethings at once, but it has a couple of downsides. First, do-something may also do significant I/O, and performing all of it at once might slow things down a bit. Second, this code block returns immediately, so there is no way to do other work once all the do-somethings have finished.

How would you write this loop so there are always X do-somethings running at once?

thelsdj asked Sep 01 '08 16:09





1 Answer

Depending on what you want to do, xargs can also help (here: converting documents with pdf2ps):

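# Count the CPU entries exposed under /sys (Linux only)
cpus=$( ls -d /sys/devices/system/cpu/cpu[[:digit:]]* | wc -w )
# Convert one PDF per pdf2ps invocation, running up to $cpus invocations at once
find . -name \*.pdf | xargs --max-args=1 --max-procs=$cpus pdf2ps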

From the docs:

--max-procs=max-procs
-P max-procs
       Run up to max-procs processes at a time; the default is 1.
       If max-procs is 0, xargs will run as many processes as possible at a
       time. Use the -n option with -P; otherwise chances are that only one
       exec will be done.
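
Applied to the loop from the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: `seq 1 8` stands in for some-command, the small `sh -c` script stands in for do-something, and 4 is the core count mentioned in the question. It uses the short options -n and -P, which are the short forms of --max-args and --max-procs.

```shell
# Upper bound on simultaneously running jobs (assumed: the 4 cores from the question)
max_procs=4

# seq 1 8 is a placeholder for some-command; the sh -c script is a
# placeholder for do-something and receives each item as $1.
# -n 1 passes one item per invocation, -P caps the number of parallel invocations.
results=$(seq 1 8 | xargs -n 1 -P "$max_procs" sh -c 'sleep 0.1; echo "done $1"' _)

# xargs returns only after every job has exited, so follow-up work can go here.
echo "all jobs finished"

# Jobs complete in no particular order, so sort before displaying.
printf '%s\n' "$results" | sort
```

Because xargs starts a new job as soon as one exits, this keeps up to max_procs jobs running until the input is exhausted, and the line after the pipeline runs only once everything is done.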
Fritz G. Mehner answered Sep 18 '22 09:09
