I usually use grep -rIn pattern_str big_source_code_dir to find things, but grep does not run in parallel. How do I make it parallel? My system has 4 cores, so the search should be faster if grep could use all of them.
The GNU parallel command is really useful for this.
sudo apt-get install parallel # on Debian-based systems, if not already installed
The parallel man page then provides an example:
EXAMPLE: Parallel grep
grep -r greps recursively through directories.
On multicore CPUs GNU parallel can often speed this up.
find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
This will run 1.5 job per core, and give 1000 arguments to grep.
In your case it could be:
find big_source_code_dir -type f | parallel -k -j150% -n 1000 -m grep -H -n pattern_str {}
Finally, the GNU parallel man page also has a section describing the differences between xargs and parallel, which should help explain why parallel seems the better fit in your case:
DIFFERENCES BETWEEN xargs AND GNU Parallel
xargs offers some of the same possibilities as GNU parallel.
xargs deals badly with special characters (such as space, ' and "). To see the problem try this:
touch important_file
touch 'not important_file'
ls not* | xargs rm
mkdir -p "My brother's 12\" records"
ls | xargs rmdir
You can specify -0 or -d "\n", but many input generators are not optimized for using NUL as separator but are optimized for newline as separator. E.g. head, tail, awk, ls, echo, sed, tar -v, perl (-0 and \0 instead of \n), locate (requires using -0), find (requires using -print0), grep (requires user to use -z or -Z), sort (requires using -z).
So GNU parallel's newline separation can be emulated with:
cat | xargs -d "\n" -n1 command
xargs can run a given number of jobs in parallel, but has no support for running number-of-cpu-cores jobs in parallel.
xargs has no support for grouping the output, therefore output may run together, e.g. the first half of a line is from one process and the last half of the line is from another process. The example Parallel grep cannot be done reliably with xargs because of this.
...
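To make the special-character problem from the man page excerpt concrete without deleting anything, here is a small sketch that only counts how many arguments xargs hands over. count_args is a hypothetical helper name, and the -d option is assumed to be the GNU xargs one (it is not in POSIX xargs).

```shell
# Hypothetical helper: count how many separate arguments xargs produces
# from its standard input. Extra xargs options (e.g. -d '\n') go in "$@".
count_args() {
    # -n 1 runs echo once per argument, so each argument becomes one line
    xargs "$@" -n 1 echo | wc -l
}

# Default splitting: the space in the file name yields two arguments
printf 'not important_file\n' | count_args

# Newline as the only delimiter (GNU xargs): the name stays one argument
printf 'not important_file\n' | count_args -d '\n'
```

With default splitting the single file name is passed as two words, which is why ls not* | xargs rm in the excerpt would also try to remove important_file.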
There is unlikely to be any speed improvement if the directory you are searching is stored on an HDD. Hard drives are pretty much single-threaded access units, so the search is limited by disk reads rather than by CPU cores.
But if you really want to do a parallel grep, this website gives two hints on how to do it with find and xargs, e.g.:
find . -type f -print0 | xargs -0 -P 4 -n 40 grep -i foobar
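The one-liner above could be wrapped so the job count follows the machine instead of being hard-coded. This is only a sketch under some assumptions: parallel_grep is a made-up function name, nproc comes from GNU coreutils, -P is an xargs extension (GNU and BSD) rather than POSIX, and the grep flags are the ones from the question (-I, -n) plus -H.

```shell
# Hypothetical wrapper: parallel_grep PATTERN DIR
# Runs one grep process per core over batches of 40 files.
parallel_grep() {
    pattern=$1
    dir=$2
    # nproc prints the number of available cores; fall back to 4 if missing
    jobs=$(nproc 2>/dev/null || echo 4)
    # -print0 / -0 keep file names with spaces or quotes intact
    find "$dir" -type f -print0 |
        xargs -0 -P "$jobs" -n 40 grep -I -H -n -e "$pattern"
}
```

It would be called as parallel_grep pattern_str big_source_code_dir. Results arrive in whatever order the grep processes finish, and xargs does not group output, so lines from different processes can still run together as the man page excerpt warns.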