Use GNU parallel to parallelise a bash for loop

I have a for loop which runs a Python script ~100 times on 100 different input folders. The Python script is most efficient on 2 cores, and I have 50 cores available, so I'd like to use GNU parallel to run the script on 25 folders at a time.

Here's my for loop (works fine, but is sequential of course). The Python script takes a bunch of input variables, including -p 2, which runs it on two cores:

for folder in $(find /home/rob/PartitionFinder/ -maxdepth 2 -type d); do
        python script.py --raxml --quick --no-ml-tree $folder --force -p 2
done

and here's my attempt to parallelise it, which does not work:

folders=$(find /home/rob/PartitionFinder/ -maxdepth 2 -type d)

echo $folders | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2

The issue I'm hitting (perhaps it's just the first of many though) is that my folders variable is not a list, so it's really just passing a long string of 100 folders as the {} to the script.

All hints gratefully received.

roblanf asked Feb 05 '26 19:02


2 Answers

Replace echo $folders | parallel ... with echo "$folders" | parallel ....

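Without the double quotes, the shell splits the value of $folders on whitespace (newlines included) and passes each path as a separate argument to echo, which prints them all on one line separated by spaces. parallel treats each input line as the argument for one job, so it receives a single line containing all 100 folders.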
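The difference can be checked without parallel at all. The following is a minimal sketch, assuming three made-up paths under /tmp as a stand-in for the output of find; it only counts how many lines each form of echo produces.

```shell
# Hypothetical stand-in for the output of find: three paths, one per line.
folders=$(printf '%s\n' /tmp/a /tmp/b /tmp/c)

# Unquoted: word splitting turns the newlines into argument breaks,
# and echo joins its arguments with single spaces on one line.
unquoted_lines=$(echo $folders | wc -l)

# Quoted: the value is passed as one argument with its newlines intact.
quoted_lines=$(echo "$folders" | wc -l)

echo "unquoted: $unquoted_lines line(s), quoted: $quoted_lines line(s)"
```

The unquoted form yields one line and the quoted form yields three, which is why parallel starts one job in the first case and three in the second.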

To avoid such quoting issues altogether, it is always a good idea to pipe find to parallel directly, and use the null character as the delimiter:

find ... -print0 | parallel -0 ...

This will work even when encountering file names that contain multiple spaces or a newline character.
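To see why the null delimiter matters, here is a small sketch that does not need parallel installed. It assumes a scratch directory with two deliberately awkward folder names (both invented for illustration) and counts how many records each delimiter would hand to the next command in the pipe.

```shell
# Scratch directory with two awkward folder names (hypothetical examples).
tmp=$(mktemp -d)
mkdir "$tmp/two  spaces" "$tmp/line
break"

# Newline-delimited: the embedded newline makes one folder look like two records.
newline_records=$(find "$tmp" -mindepth 1 -type d | wc -l)

# Null-delimited: count the NUL terminators, one per folder.
null_records=$(find "$tmp" -mindepth 1 -type d -print0 | tr -cd '\0' | wc -c)

echo "newline-delimited: $newline_records, null-delimited: $null_records"
rm -rf "$tmp"
```

With newlines as the separator the two folders are reported as three records; with -print0 the count matches the real number of folders, which is what parallel -0 relies on.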

user4815162342 answered Feb 09 '26 08:02


You can pipe find directly to parallel:

 find /home/rob/PartitionFinder/ -maxdepth 2 -type d | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2

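If you want to keep the list in $folders, you can pipe the echo through xargs -n 1, which prints each whitespace-separated word on its own line. Note that this still breaks on folder names that contain spaces.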

echo $folders | xargs -n 1 | parallel -P 25 python script.py --raxml --quick --no-ml-tree {} --force -p 2
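As a rough check of what the xargs -n 1 step does, the sketch below uses made-up paths (not the real PartitionFinder folders) and counts the lines that would reach parallel. The second case shows the limitation: a name with a space is split into two words.

```shell
# Hypothetical stand-in for the folder list: three paths, one per line.
folders=$(printf '%s\n' /tmp/a /tmp/b /tmp/c)

# xargs -n 1 with no command runs echo once per word, so each path
# ends up on its own line again even though echo flattened them.
split_lines=$(echo $folders | xargs -n 1 | wc -l)

# A single path containing a space is treated as two words by xargs.
spaced_lines=$(echo "/tmp/my dir" | xargs -n 1 | wc -l)

echo "three paths: $split_lines line(s); one path with a space: $spaced_lines line(s)"
```

So this approach is fine for simple folder names, but piping find directly with a null delimiter is the safer choice when names may contain whitespace.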
olisch answered Feb 09 '26 09:02