I have access to 10 of the cores on a machine, and I would like to actually use them. On my own machine I would normally do something like this:
for f in *.fa; do
    myProgram (options) "./$f" "./$f.tmp"
done
I have 10 files I'd like to do this on -- let's call them blah00.fa, blah01.fa, ... blah09.fa.
The problem with this approach is that myProgram only uses 1 core at a time, so on the multi-core machine I'd be using 1 core at a time, 10 times over, and the machine would be nowhere near its full capacity.
How could I change my script so that it processes all 10 of my .fa files at the same time? I looked at "Run a looped process in bash across multiple cores", but I couldn't get the command from that question to do exactly what I wanted.
You could use
for f in *.fa; do
    myProgram (options) "./$f" "./$f.tmp" &
done
wait
which would start all of your jobs in parallel and then wait until they have all completed before moving on. If you have more jobs than cores, all of them still start at once, and the OS scheduler takes care of swapping processes in and out.
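One caveat: wait with no arguments returns a status of zero even if some of the jobs failed. If you need to know about failures, one option is to record each job's PID from $! and wait on each PID individually, since wait PID returns that job's exit status. This is a minimal sketch assuming bash; myProgram is replaced by a hypothetical stand-in function that just copies the file, and run_all is a name made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for myProgram: copies its input to its output,
# and fails (as cp does) if the input file is missing.
myProgram() {
    cp -- "$1" "$2"
}

# run_all FILE...: start one background myProgram per FILE, then wait on
# each PID individually so that a failing job is reported, not ignored.
run_all() {
    local f pid status=0
    local -a pids=()
    for f in "$@"; do
        myProgram "$f" "$f.tmp" &
        pids+=("$!")        # $! is the PID of the job just started
    done
    for pid in "${pids[@]}"; do
        # wait PID returns the exit status of that particular job
        if ! wait "$pid"; then
            echo "job $pid failed" >&2
            status=1
        fi
    done
    return "$status"
}

shopt -s nullglob           # an unmatched *.fa expands to nothing
run_all ./*.fa
```

The overall behaviour matches the loop above (every job starts at once); the only difference is that run_all's return status tells you whether any job failed.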
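One modification is to start the jobs 10 at a time:
count=0
for f in *.fa; do
    myProgram (options) "./$f" "./$f.tmp" &
    (( count++ ))
    # after every 10th job, block until the whole batch has finished
    if (( count == 10 )); then
        wait
        count=0
    fi
done
wait  # also wait for the last, possibly partial, batch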
but this is inferior to using GNU parallel, because you can't start a new job as soon as an old one finishes, and you can't detect whether an earlier job finished before you managed to start all 10. The wait builtin lets you wait on a single particular process or on all background processes, but (before bash 4.3 added wait -n) it doesn't tell you when any one of an arbitrary set of background processes completes.
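Bash 4.3 and later provide wait -n, which returns as soon as any one background job exits, so a simple job pool is possible without extra tools. The following is a minimal sketch assuming bash 4.3 or later; myProgram is again replaced by a hypothetical stand-in that copies the file, and run_limited is a name made up for this example.

```shell
#!/usr/bin/env bash
# Requires bash 4.3 or later, which added `wait -n`.

# Hypothetical stand-in for myProgram: copies its input to its output.
myProgram() {
    cp -- "$1" "$2"
}

# run_limited MAX FILE...: run myProgram on every FILE with at most MAX
# jobs running at once; a new job starts as soon as any running one exits.
run_limited() {
    local max_jobs=$1 running=0 f
    shift
    for f in "$@"; do
        if (( running >= max_jobs )); then
            wait -n                     # block until any one background job exits
            running=$(( running - 1 ))
        fi
        myProgram "$f" "$f.tmp" &
        running=$(( running + 1 ))
    done
    wait                                # wait for the jobs still running at the end
}

shopt -s nullglob                       # an unmatched *.fa expands to nothing
run_limited 10 ./*.fa
```

Unlike the batch version, a slow file only occupies its own slot: the other slots keep picking up new files instead of idling until the whole batch of 10 is done.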