I have a large number of small files to be searched. I have been looking for a good de-facto multi-threaded version of grep
but could not find anything. How can I improve my usage of grep? As of now I am doing this:
grep -R "string" >> Strings
The grep command searches the named files for lines that match the specified pattern. To use it, type grep, then the pattern you are searching for, and finally the name of the file (or files) to search in. In the example, the output is the three lines in the file that contain the letters 'not'.
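As a minimal sketch of that usage: the file name sample.txt and its contents are made up for illustration and are not from the original example.

```shell
# Hypothetical sample file created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'I do not know' 'nothing here' 'yes' 'a knot' > "$tmpdir/sample.txt"

# grep PATTERN FILE prints every line of FILE that contains PATTERN.
grep 'not' "$tmpdir/sample.txt"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c 'not' "$tmpdir/sample.txt")
echo "matching lines: $count"

rm -rf "$tmpdir"
```

Only the line 'yes' is skipped, because it is the only one without the letters 'not'.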
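Note: The egrep command is equivalent to grep -E. It is not inherently faster than grep; the difference is in the pattern syntax. egrep uses extended regular expressions, so meta-characters such as +, ?, |, parentheses and braces act as operators without a backslash, whereas plain grep requires them to be escaped to get the same meaning.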
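The escaping difference can be seen with a repetition count. This is a small sketch on a made-up three-line file; it uses grep -E, which is the same as egrep.

```shell
# Hypothetical input: one line with two a's, one with a single a,
# and one containing the literal text a{2}.
tmpdir=$(mktemp -d)
printf '%s\n' 'aa' 'a' 'a{2}' > "$tmpdir/lines.txt"

# Basic syntax: the braces must be escaped to mean "exactly two".
basic_escaped=$(grep 'a\{2\}' "$tmpdir/lines.txt")

# Extended syntax: the same operator is written without backslashes.
extended=$(grep -E 'a{2}' "$tmpdir/lines.txt")

# Basic syntax without escapes: the braces are ordinary characters,
# so only the line containing the literal text a{2} matches.
basic_literal=$(grep 'a{2}' "$tmpdir/lines.txt")

printf 'basic escaped: %s\nextended: %s\nbasic literal: %s\n' \
    "$basic_escaped" "$extended" "$basic_literal"

rm -rf "$tmpdir"
```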
In case anyone is interested: if you have xargs installed and a multi-core processor, you can benefit from the following.
Environment:
Processor: Dual Quad-core 2.4GHz
Memory: 32 GB
Number of files: 584450
Total Size: ~ 35 GB
Tests:
1. Find the necessary files, pipe them to xargs and tell it to execute 8 instances.
time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -H "string" >> Strings_find8
real 3m24.358s
user 1m27.654s
sys 9m40.316s
2. Find the necessary files, pipe them to xargs and tell it to execute 4 instances.
time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P4 grep -H "string" >> Strings
real 16m3.051s
user 0m56.012s
sys 8m42.540s
3. Suggested by @Stephen: Find the necessary files and use find's -exec ... + instead of xargs.
time find ./ -name "*.ext" -exec grep -H "string" {} \+ >> Strings
real 53m45.438s
user 0m5.829s
sys 0m40.778s
4. Regular recursive grep.
grep -R "string" >> Strings
real 235m12.823s
user 38m57.763s
sys 38m8.301s
For my purposes, the first command worked just fine.
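For anyone who wants to try the first command on something small before pointing it at a real tree, here is a self-contained sketch. The directory layout and file contents are invented for the demo, and it assumes an xargs that supports -P (GNU and BSD xargs do; it is not part of POSIX).

```shell
# Build a tiny hypothetical tree: two .ext files that contain "string",
# one .ext file that does not, and one .txt file that must be ignored.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
echo 'a string here'      > "$tmpdir/a.ext"
echo 'another string'     > "$tmpdir/sub/b.ext"
echo 'nothing to see'     > "$tmpdir/d.ext"
echo 'string, wrong type' > "$tmpdir/c.txt"

# -print0 / -0 keep file names with spaces or newlines intact.
# -n1 hands one file to each grep; -P8 runs up to 8 greps at a time.
# xargs exits non-zero when any grep finds no match, hence "|| true".
out=$(find "$tmpdir" -name '*.ext' -print0 \
        | xargs -0 -n1 -P8 grep -H 'string' || true)

# The order of results is not deterministic with -P, so sort for display.
printf '%s\n' "$out" | sort

rm -rf "$tmpdir"
```

Because the greps run in parallel, matches arrive in whatever order the processes finish, so sort the result file afterwards if order matters. Raising -n above 1 would hand several files to each grep and start fewer processes; whether that helps on the data set above was not measured here.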