Suppose I have 10K files and a bash script which processes a single file. I would like to process all of these files concurrently, with only K instances of the script running in parallel. Obviously, I do not want any file to be processed more than once. How would you suggest implementing this in bash?
One way to execute a limited number of parallel jobs is with GNU parallel. For example, with this command:
find . -type f -print0 | parallel -0 -P 3 ./myscript {1}
all files in the current directory (and its subdirectories) are passed as parameters to myscript, one at a time. The -0
option sets the input delimiter to the null character (matching find's -print0), and the -P
option sets the number of jobs that run in parallel (3 here; substitute your K). When -P is omitted, the default number of parallel processes equals the number of CPU cores on the system. There are other options, e.g. for parallel processing across clusters, documented in GNU parallel's man page.