I'm writing a tiny script that calls the "PNGOUT" util on a few hundred PNG files. I simply did this:
find $BASEDIR -iname "*png" -exec pngout {} \;
And then I looked at my CPU monitor and noticed only one of the cores was used, which is quite sad.
In this day and age of dual-, quad-, hexa- and octo-core desktops, how do I simply parallelize this task with Bash? (It's not the first time I've had such a need: quite a lot of these utilities are single-threaded... I already ran into it with MP3 encoders.)
Would simply running all the pngout invocations in the background do? What would my find command look like then? (I'm not too sure how to mix find and the '&' character; below is an untested sketch of what I imagine.)
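Something like this, I suppose (completely untested; the process substitution is only there because, as far as I understand, a plain pipe into while would hide the background jobs from the final wait):
while IFS= read -r -d '' f; do
    pngout "$f" &                               # one background job per file, all started at once
done < <(find "$BASEDIR" -iname "*png" -print0)
wait                                            # block until every pngout has finished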
If I have three hundred pictures, this would mean juggling three hundred processes at once, which doesn't seem great anyway!?
Or should I copy my three hundred or so files into "nb dirs" directories, where "nb dirs" is the number of cores, and then run "nb dirs" finds concurrently? (Which would be close enough.)
But how would I do this?
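This is the kind of thing I picture (again untested; NBDIRS, the dir0..dir3 names and the assumption of 4 cores are all just placeholders):
NBDIRS=4                                        # assumed number of cores
i=0
while IFS= read -r -d '' f; do
    d="dir$(( i % NBDIRS ))"                    # round-robin target directory
    mkdir -p "$d" && cp "$f" "$d"/
    i=$(( i + 1 ))
done < <(find "$BASEDIR" -iname "*png" -print0)
for (( c = 0; c < NBDIRS; c++ )); do
    find "dir$c" -iname "*png" -exec pngout {} \; &   # one find per core
done
wait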
The general way to parallelize any operation is to take the particular function that has to run multiple times and spread those runs across different processors. To do this, you initialize a Pool with n worker processes and pass the function you want to parallelize to one of the Pool's parallelization methods (this is how Python's multiprocessing.Pool works, for instance).
Using wait: we can launch jobs in the background and then, before moving on to the next step, block until they finish with the wait command. wait returns even if a background process exits with a non-zero failure code.
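A minimal, untested sketch of that idea for this case, reusing $BASEDIR from the question, launching the pngout jobs in core-sized batches and using wait as a barrier between batches (nproc from GNU coreutils is assumed to be available):
cores=$(nproc)                                  # assumed: GNU coreutils' nproc reports the core count
i=0
while IFS= read -r -d '' f; do
    pngout "$f" &
    i=$(( i + 1 ))
    if (( i % cores == 0 )); then
        wait                                    # barrier: let the current batch finish first
    fi
done < <(find "$BASEDIR" -iname "*png" -print0)
wait                                            # catch the last, possibly partial batch
The drawback is that every batch waits for its slowest file before the next one starts, so some cores sit idle towards the end of each batch.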
Answering my own question... It turns out there's a relatively unknown feature of the xargs command that can be used to accomplish that:
find . -iname "*png" -print0 | xargs -0 --max-procs=4 -n 1 pngout
Bingo, instant 4x speedup on a quad-core machine :)
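For the record: -print0 and -0 pass the file names NUL-separated so paths with spaces survive the pipe, -n 1 hands each pngout invocation exactly one file, and --max-procs=4 (short form -P 4) keeps up to four of them running at once.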