I have a large number of small files to be searched. I have been looking for a good de-facto multi-threaded version of grep but could not find anything. How can I improve my usage of grep? As of now I am doing this:
grep -R "string" >> Strings 
                The grep command searches through the file, looking for matches to the pattern specified. To use it type grep , then the pattern we're searching for and finally the name of the file (or files) we're searching in. The output is the three lines in the file that contain the letters 'not'.
Example: Note: The egrep command used mainly due to the fact that it is faster than the grep command. The egrep command treats the meta-characters as they are and do not require to be escaped as is the case with grep.
If you have xargs installed on a multi-core processor, you can benefit from the following just in case someone is interested.
Environment:
Processor: Dual Quad-core 2.4GHz Memory: 32 GB Number of files: 584450 Total Size: ~ 35 GB   Tests:
1. Find the necessary files, pipe them to xargs and tell it to execute 8 instances.
time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -H "string" >> Strings_find8  real    3m24.358s user    1m27.654s sys     9m40.316s   2. Find the necessary files, pipe them to xargs and tell it to execute 4 instances.
time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P4 grep -H "string" >> Strings  real    16m3.051s user    0m56.012s sys     8m42.540s   3. Suggested by @Stephen: Find the necessary files and use + instead of xargs
time find ./ -name "*.ext" -exec grep -H "string" {} \+ >> Strings  real    53m45.438s user    0m5.829s sys     0m40.778s   4. Regular recursive grep.
grep -R "string" >> Strings  real    235m12.823s user    38m57.763s sys     38m8.301s   For my purposes, the first command worked just fine.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With