 

Easy parallelisation

I often find myself writing simple for loops to perform an operation to many files, for example:

for i in $(find . | grep '\.xml$'); do bzip2 "$i"; done

It seems a bit depressing that on my 4-core machine only one core is getting used. Is there an easy way I can add parallelism to my shell scripting?

EDIT: To add a bit more context to my problem; sorry I was not clearer to start with!

I often want to run simple(ish) scripts, such as plotting a graph, compressing or uncompressing, or running some program, on reasonably sized datasets (usually between 100 and 10,000 files). The scripts I use to solve such problems look like the one above, but might use a different command, or even a sequence of commands.

For example, just now I am running:

for i in $(find . | grep '\.xml\.bz2$'); do find_graph -build_graph "$i.graph" "$i"; done

So my problem is in no way bzip2-specific! (Although parallel bzip2 does look cool, I intend to use it in the future.)

asked Nov 11 '08 by Chris Jefferson


1 Answer

Solution: Use xargs to run the jobs in parallel (don't forget the -n option!). Here -P 3 runs up to three bzip2 processes at once, and -n 1 passes one filename to each invocation:

find -name \*.xml -print0 | xargs -0 -n 1 -P 3 bzip2
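
The same pattern extends to commands that need the filename more than once, such as the find_graph loop from the question. One way, sketched here assuming four parallel jobs (tune -P to your core count), is to wrap the command in bash -c so each file arrives as $1:

# Build a graph from each .xml.bz2 file, up to four jobs at a time.
# The trailing _ fills $0 of the inner shell, so the filename lands in $1.
find . -name '*.xml.bz2' -print0 \
  | xargs -0 -n 1 -P 4 bash -c 'find_graph -build_graph "$1.graph" "$1"' _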
answered Sep 21 '22 by Johannes Schaub - litb