grep is typically an efficient way to search text. However, it can be quite slow in some cases, and when searching large amounts of data even minor performance tweaks can help significantly. For example, excluding Windows 10 partitions holding 147 GiB of programs and data saved 21.5 minutes on one search: grep then took only 2 minutes and 8 seconds.
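If the slowdown comes from recursively searching directory trees you don't need, GNU grep's --exclude-dir option can prune them. A minimal sketch (the mount point and directory names here are hypothetical):

# Skip the Windows system and program directories on a mounted partition;
# --exclude-dir stops grep from descending into directories matching the glob.
$ time grep -r --exclude-dir='Windows' --exclude-dir='Program Files' 'needle' /mnt/c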
When you are only searching for strings and speed matters, you should almost always use grep. It's orders of magnitude faster than awk when it comes to plain, raw searching.
grep's memory usage is constant; it doesn't scale with file size. It doesn't need to keep the whole file in memory, only the region it's searching through.
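As a rough illustration (the file name is hypothetical; run each command twice so the file is in the buffer cache before you compare timings):

# plain string search: grep vs. awk doing the equivalent work on the same file
$ time grep -c 'needle' big.log
$ time awk '/needle/ {n++} END {print n+0}' big.log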
Assuming your question regards GNU grep specifically, here's a note from the author, Mike Haertel:
GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.
GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH BYTE that it does look at.
GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.
GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the Boyer-Moore delta table entries in such a way that it doesn't need to do the loop exit test at every unrolled step. The result of this is that, in the limit, GNU grep averages fewer than 3 x86 instructions executed for each input byte it actually looks at (and it skips many bytes entirely).
GNU grep uses raw Unix input system calls and avoids copying data after reading it. Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!
So instead of using line-oriented input, GNU grep reads raw data into a large buffer, searches the buffer using Boyer-Moore, and only when it finds a match does it go and look for the bounding newlines. (Certain command line options like -n disable this optimization.)
This answer is a subset of the information taken from here.
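You can observe that last point yourself: an option such as -n forces grep to track line numbers, while a plain match count lets it keep skipping ahead. A rough sketch (the file name is hypothetical):

# -c only needs the newlines around actual matches
$ time grep -c 'needle' big.log
# -n has to find newlines to number the lines, which disables the skipping described above
$ time grep -n 'needle' big.log > /dev/null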
To add to Steve's excellent answer.
It may not be widely known, but grep is almost always faster when grepping for a longer pattern-string than for a short one, because with a longer pattern Boyer-Moore can skip forward in longer strides and achieve even better sublinear speeds:
Example:
# after running these twice to ensure apples-to-apples comparison
# (everything is in the buffer cache)
$ time grep -c 'tg=f_c' 20140910.log
28
0.168u 0.068s 0:00.26
$ time grep -c ' /cc/merchant.json tg=f_c' 20140910.log
28
0.100u 0.056s 0:00.17
The longer form is 35% faster!
How come? Boyer-Moore constructs a skip-forward table from the pattern-string, and whenever there's a mismatch, it picks the longest skip possible (comparing from the last char to the first) before comparing a single char in the input against the skip table.
Here's a video explaining Boyer-Moore (credit to kommradHomer).
Another common misconception (for GNU grep) is that fgrep is faster than grep. The f in fgrep doesn't stand for 'fast'; it stands for 'fixed' (see the man page). Since both are the same program and both use Boyer-Moore, there's no difference in speed between them when searching for fixed strings without regexp special chars. The only reason I use fgrep is when there's a regexp special char (like ., [], or *) that I don't want interpreted as such. And even then, the more portable/standard form grep -F is preferred over fgrep.
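A quick sketch of the difference in interpretation (not speed):

# '.' is a regex metacharacter to grep, but a literal character to grep -F
$ printf 'a.c\nabc\n' | grep 'a.c'
a.c
abc
$ printf 'a.c\nabc\n' | grep -F 'a.c'
a.c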