I often find myself writing simple for loops to perform an operation to many files, for example:
for i in `find . | grep ".xml$"`; do bzip2 $i; done
It seems a bit depressing that on my 4-core machine only one core is getting used.. is there an easy way I can add parallelism to my shell scripting?
EDIT: To introduce a bit more context to my problems, sorry I was not more clear to start with!
I often want to run simple(ish) scripts, such as plot a graph, compress or uncompress, or run some program, on reasonable sized datasets (usually between 100 and 10,000). The scripts I use to solve such problems look like the one above, but might have a different command, or even a sequence of commands to execute.
For example, just now I am running:
for i in `find . | grep ".xml.bz2$"`; do find_graph -build_graph $i.graph $i; done
So my problems are in no way bzip specific! (Although parallel bzip does look cool, I intend to use it in future).
Solution: Use xargs
to run in parallel (don't forget the -n
option!)
find -name \*.xml -print0 | xargs -0 -n 1 -P 3 bzip2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With