 

Fastest possible grep

Tags:

grep

bash

unix

I'd like to know if there is any tip to make grep as fast as possible. I have a rather large base of text files to search in the quickest possible way. I've made them all lowercase, so that I could get rid of -i option. This makes the search much faster.

Also, I've found out that -F and -P modes are quicker than the default one. I use the former when the search string is not a regular expression (just plain text), the latter if regex is involved.

Does anyone have any experience in speeding up grep? Maybe compile it from scratch with some particular flag (I'm on Linux CentOS), organize the files in a certain fashion or maybe make the search parallel in some way?
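For reference, the tweaks described so far look like this in practice (the corpus file is a hypothetical stand-in for the real text base). Setting LC_ALL=C is an additional trick not mentioned above: it disables locale-aware matching, which GNU grep typically handles much faster:

```shell
# Stand-in for the real (lowercased) corpus.
printf 'alpha\nneedle here\nomega\n' > /tmp/corpus.txt

# Plain-text pattern: -F (fixed strings) skips regex compilation entirely.
# LC_ALL=C forces byte-wise matching instead of locale-aware matching.
LC_ALL=C grep -F 'needle' /tmp/corpus.txt

# Regex pattern: -P (PCRE) is often faster than the default engine.
LC_ALL=C grep -P 'need.e' /tmp/corpus.txt
```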

asked Jan 30 '12 by pistacchio


1 Answer

Try GNU parallel, which includes an example of how to use it with grep:

grep -r greps recursively through directories. On multicore CPUs GNU parallel can often speed this up.

find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {} 

This will run 1.5 jobs per core and pass up to 1000 arguments to each grep.
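If GNU parallel is not installed, a rough equivalent is xargs with -P, which also fans grep out across cores (the demo directory below is hypothetical; note that, unlike parallel -k, xargs does not keep the output in input order):

```shell
# Build a tiny hypothetical corpus to search.
mkdir -p /tmp/grepdemo
printf 'needle\n' > /tmp/grepdemo/a.txt
printf 'hay\n' > /tmp/grepdemo/b.txt

# -P 4 runs up to 4 greps at once; -n 100 passes up to 100 files per grep.
# -print0 / -0 keep filenames with spaces intact.
find /tmp/grepdemo -type f -print0 | xargs -0 -P 4 -n 100 grep -H -n 'needle'
```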

For big files, GNU parallel can split the input into several chunks with the --pipe and --block options:

 parallel --pipe --block 2M grep foo < bigfile 

You could also run it on several different machines through SSH (ssh-agent needed to avoid passwords):

parallel --pipe --sshlogin server.example.com,server2.example.net grep foo < bigfile 
answered Oct 13 '22 by Chewie