
Doing parallel processing in bash?


I have thousands of PNG files that I'd like to shrink with pngcrush. I have a simple find .. -exec job, but it runs sequentially. My machine has plenty of resources, so I'd like to run this in parallel.

The operation to be performed on every png is:

pngcrush input output && mv output input 

Ideally I can specify the maximum number of parallel operations.

Is there a way to do this with bash and/or other shell helpers? I'm on Ubuntu or Debian.

mark asked Sep 28 '10 10:09


People also ask

Can you multithread in bash?

If your machine has at least two CPU threads, you can max out CPU resources with parallel scripting in Bash. Bash has no real threads: each 'thread' is a separate process (typically a background subshell), and the kernel can, and often will, schedule it on a different CPU thread.

Do bash scripts run in parallel?

To run scripts in parallel in bash, send each one to the background with &. The loop then does not wait for each process to exit before starting the next one, so all the scripts start immediately.
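The backgrounding approach can be sketched as follows. This is a minimal illustration, not a tuned job runner: task and run_all are hypothetical names, and task merely stands in for real work. Note that nothing here caps the number of simultaneous jobs, so every item starts at once.

```shell
#!/usr/bin/env bash
# task is a hypothetical stand-in for the real per-item work.
task() {
    echo "done $1"
}

# run_all starts one background job per argument, then waits for all of them.
run_all() {
    local item
    for item in "$@"; do
        task "$item" &   # & sends the job to the background; the loop continues
    done
    wait                 # block until every background job has exited
}

run_all a b c            # the three lines may appear in any order
```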

How do I run a parallel job in Linux?

Another way to run processes in parallel is the regular xargs command. xargs supports an option (-P) that sets the number of processes to run simultaneously. For example, seq 1 3 simply prints 1, 2, and 3 on three separate lines, and xargs can hand each line to its own process.
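A small sketch of that idea, assuming an xargs that supports -P (GNU and BSD xargs both do): each number from seq becomes the argument of its own sh process, with up to three running at a time. The "job" label is only illustrative.

```shell
# Feed 1, 2, 3 to xargs; -n 1 passes one argument per command,
# -P 3 allows up to three commands to run at the same time.
# The trailing "sh" fills $0, so each number arrives as $1.
seq 1 3 | xargs -n 1 -P 3 sh -c 'echo "job $1"' sh
```

Because the jobs run concurrently, the output lines can appear in any order.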


1 Answer

You can use xargs to run multiple processes in parallel:

find /path -type f -name '*.png' -print0 | xargs -0 -n 1 -P <nr_procs> sh -c 'pngcrush "$1" "$1.tmp" && mv "$1.tmp" "$1"' sh

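xargs reads the list of files produced by find, separated by NUL characters (-print0 on the find side, -0 on the xargs side), and runs the provided command (sh -c '...' sh) with one file at a time (-n 1). The trailing sh fills $0, so the file name arrives as $1. xargs keeps up to <nr_procs> of these commands running in parallel (-P <nr_procs>).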
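For repeated use, the same pipeline can be wrapped in a function. The sketch below is one possible arrangement rather than the answer's own method: crush_tree and the CRUSH environment variable are hypothetical names introduced here (CRUSH is not a pngcrush feature, it only lets you try the pipeline with a stand-in command), and the default job count assumes nproc from GNU coreutils is available, as it is on Ubuntu and Debian.

```shell
#!/usr/bin/env bash
# crush_tree DIR [JOBS]: run pngcrush on every *.png under DIR, JOBS at a time.
# CRUSH is a hypothetical override for the command to run; default is pngcrush.
crush_tree() {
    local dir="$1"
    local jobs="${2:-$(nproc)}"   # default: one job per available CPU

    find "$dir" -type f -name '*.png' -print0 |
        CRUSH="${CRUSH:-pngcrush}" xargs -0 -n 1 -P "$jobs" sh -c '
            tmp="$1.tmp.$$"              # per-process temp file next to the input
            if "$CRUSH" "$1" "$tmp"; then
                mv "$tmp" "$1"           # replace the original only on success
            else
                rm -f "$tmp"             # leave the original untouched on failure
                exit 1
            fi
        ' sh
}
```

Calling crush_tree /path 4 would process the tree with at most four jobs at once. Writing the temporary file next to the input keeps the final mv on the same filesystem, and a failed run leaves the original file in place.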

Bart Sas answered Nov 12 '22 00:11