I'd like to know if there are any tips for making grep as fast as possible. I have a rather large base of text files to search through as quickly as possible. I've already made them all lowercase so I could drop the -i option, which makes the search much faster.
Also, I've found out that the -F and -P modes are quicker than the default one. I use the former when the search string is not a regular expression (just plain text), the latter when a regex is involved.
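For what it's worth, a quick way to compare these modes yourself is to time each one on the same data (a rough sketch; the file name and pattern here are just placeholders, and relative speeds depend heavily on the pattern and the data):

time grep 'needle' corpus.txt      # default basic-regex search
time grep -F 'needle' corpus.txt   # fixed-string search, skips the regex engine
time grep -P 'needle' corpus.txt   # PCRE engine (GNU grep only)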
Does anyone have experience speeding up grep? Maybe compile it from scratch with some particular flags (I'm on Linux CentOS), organize the files in a certain fashion, or maybe make the search parallel in some way?
Is fgrep ("fast grep") faster? The grep utility searches text files for regular expressions, but it can also search for ordinary strings, since these strings are a special case of regular expressions. However, if your regular expressions are in fact simply text strings, fgrep may be much faster than grep.
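A small illustrative example of the difference (the file name is made up): fgrep, equivalent to grep -F, treats the pattern as a literal string, so regex metacharacters lose their meaning:

grep '1.5' versions.txt     # '.' matches any character, so this also matches '125', '1x5', ...
grep -F '1.5' versions.txt  # matches only the literal three characters '1.5'

Besides the potential speedup, this also spares you from escaping special characters.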
@Gilles Looks good. Repeating each test here 100 times (timing the entire thing), egrep is faster than grep until I set LANG=C, and then they're both roughly the same.

@EightBitTony Look at the user time (which does not include time spent waiting for disk). There is an order of magnitude of difference.
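The LANG=C effect mentioned above is worth spelling out: in a UTF-8 locale, grep has to do multibyte character handling, which can be expensive. Forcing the C locale often speeds things up considerably on ASCII-only data (illustrative command; the file name is a placeholder):

LC_ALL=C grep -F 'needle' corpus.txt

LC_ALL overrides LANG and all other locale variables, so it is the safest one to set for this purpose.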
grep should be slightly faster, because awk does more with each input line than just search it for a regexp. For example, if a field is referenced in the script (which it is not in this case), awk splits each input line into fields based on the field-separator value and populates its builtin variables.
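For comparison, the two searches look like this (pattern and file name are placeholders; note that awk patterns are EREs, so grep needs -E to accept the same syntax):

grep -E 'warn|error' app.log   # just prints lines matching the ERE
awk '/warn|error/' app.log     # same output, but awk still runs its per-line machinery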
Typically, grep is an efficient way to search text. However, it can be quite slow in some cases, and when searching very large files even minor performance tweaks can help significantly.
Try GNU parallel, which includes an example of how to use it with grep:
grep -r greps recursively through directories. On multicore CPUs, GNU parallel can often speed this up:

find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}

This will run 1.5 jobs per core and give 1000 arguments to grep.
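If GNU parallel is not available, a rough equivalent can be sketched with xargs -P from GNU findutils (the job count of 4 is arbitrary here; -print0/-0 keeps unusual file names safe):

find . -type f -print0 | xargs -0 -P 4 -n 1000 grep -H -n STRING

Unlike parallel -k, this does not keep the output in input order, and lines from concurrent greps can in principle interleave.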
For big files, it can split the input into several chunks with the --pipe and --block arguments:
parallel --pipe --block 2M grep foo < bigfile
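If the big file is a regular (seekable) file, newer versions of GNU parallel also offer --pipepart, which is typically much faster than --pipe because each worker reads its block directly from the file instead of going through a single dispatching process:

parallel --pipepart --block 2M -a bigfile grep foo

(--pipepart requires the input to be given with -a rather than on stdin.)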
You could also run it on several different machines over SSH (an ssh-agent is needed to avoid password prompts):
parallel --pipe --sshlogin server.example.com,server2.example.net grep foo < bigfile
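As a side note, the special sshlogin ':' means the local machine, so you can combine local and remote workers (the hostname is just an example):

parallel --pipe --sshlogin :,server.example.com grep foo < bigfile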