Grepping a huge file (80GB) any way to speed it up?


Here are a few options:

1) Prefix your grep command with LC_ALL=C to use the C locale instead of UTF-8.

2) Use grep -F (the current spelling of fgrep, which recent GNU grep releases flag as obsolescent) because you're searching for a fixed string, not a regular expression.

3) Remove the -i option, if you don't need it.

So your command becomes:

LC_ALL=C grep -F -A 5 -B 5 'db_pd.Clients' eightygigsfile.sql

It will also be faster if you copy the file to a RAM disk, provided you have enough memory to hold it.
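As a rough sanity check before switching commands on the real dump, the sketch below compares a case-insensitive regex search with the LC_ALL=C fixed-string form on a small generated file. The sample file and its contents are made up for illustration, and the "before" command is assumed to be a plain grep -i. The two forms only agree here because the sample has no case variants and no lines where the dot would match another character.

```shell
# Hypothetical stand-in for the real dump: 200000 lines, one match per 1000.
sample=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 200000; i++) {
        if (i % 1000 == 0)
            print "INSERT INTO db_pd.Clients VALUES (" i ");"
        else
            print "filler line " i
    }
}' > "$sample"

# Assumed "before" form: case-insensitive regex in the current locale.
slow_count=$(grep -i -c 'db_pd.Clients' "$sample")

# Suggested form: C locale, fixed string, case-sensitive.
fast_count=$(LC_ALL=C grep -F -c 'db_pd.Clients' "$sample")

echo "regex -i: $slow_count matching lines, LC_ALL=C -F: $fast_count matching lines"
rm -f "$sample"
```

Running each grep under your shell's time facility on a larger sample should show whether the difference matters on your system.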

If you have a multicore CPU, I would really recommend GNU parallel. To grep a big file in parallel use:

< eightygigsfile.sql parallel --pipe grep -i -C 5 'db_pd.Clients'

Depending on your disks and CPUs it may be faster to read larger blocks:

< eightygigsfile.sql parallel --pipe --block 10M grep -i -C 5 'db_pd.Clients'

It's not entirely clear from your question, but other options for grep include:

Dropping the -i flag.
Using the -F flag for a fixed string.
Disabling NLS with LANG=C.
Setting a max number of matches with the -m flag.
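A minimal sketch of the -m option from the list above, using a small made-up sample file: grep stops reading as soon as it has found the requested number of matching lines, which helps when only the first few hits are needed.

```shell
# Hypothetical sample: five lines that all contain the search string.
sample=$(mktemp)
printf 'db_pd.Clients row %s\n' 1 2 3 4 5 > "$sample"

# -m 2 makes grep stop after the second matching line.
limited=$(LC_ALL=C grep -F -m 2 'db_pd.Clients' "$sample")
limited_count=$(printf '%s\n' "$limited" | wc -l)

echo "$limited"
rm -f "$sample"
```

On an 80GB file the saving depends on where the matches sit: if they are near the start, grep can exit long before reaching the end of the file.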

Some trivial improvements:

Remove the -i option if you can; case-insensitive matching is quite slow.
Replace the . with \.

An unescaped dot is the regex metacharacter that matches any character, which is also slow.
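To illustrate the point about the dot, here is a small sketch on a made-up three-line file. The unescaped pattern also matches a line where some other character sits in the dot's position, while the escaped pattern and the -F form match only the literal text.

```shell
# Hypothetical sample: one literal match, one near miss, one unrelated line.
sample=$(mktemp)
printf '%s\n' \
    'INSERT INTO db_pd.Clients VALUES (1);' \
    'INSERT INTO db_pdXClients VALUES (2);' \
    'SELECT 1;' > "$sample"

# Unescaped dot: matches any character, so the near miss counts too.
regex_hits=$(LC_ALL=C grep -c 'db_pd.Clients' "$sample")

# Escaped dot: only a literal "." matches.
escaped_hits=$(LC_ALL=C grep -c 'db_pd\.Clients' "$sample")

# Fixed string: no regex interpretation at all.
fixed_hits=$(LC_ALL=C grep -F -c 'db_pd.Clients' "$sample")

echo "regex: $regex_hits, escaped: $escaped_hits, fixed: $fixed_hits"
# prints "regex: 2, escaped: 1, fixed: 1"
rm -f "$sample"
```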

Two lines of attack:

Are you sure you need the -i, or is there any way to get rid of it?
Do you have more cores to play with? grep is single-threaded, so you might want to start more of them at different offsets.

< eightygigsfile.sql parallel -k -j120% -n10 -m grep -F -i -C 5 'db_pd.Clients'

If you need to search for multiple strings, grep -f strings.txt saves a ton of time. The command above is a translation of something that I am currently testing; the -j and -n values seemed to work best for my use case. Using grep -F also made a big difference.
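The idea of starting several greps on different parts of the file can also be sketched without GNU parallel. The example below is only an illustration under some assumptions: it uses GNU split's -n l/N option to cut a made-up sample file into line-aligned chunks, runs one grep per chunk in the background, and then checks that the combined result matches a single serial grep. The chunk count of 4 and all file names are hypothetical. Writing chunks to disk costs extra I/O on a file of this size, and context options such as -C 5 can lose lines at chunk boundaries.

```shell
# Hypothetical stand-in for the dump: 40000 lines, one match per 500.
sample=$(mktemp)
chunk_dir=$(mktemp -d)
awk 'BEGIN {
    for (i = 1; i <= 40000; i++) {
        if (i % 500 == 0)
            print "INSERT INTO db_pd.Clients VALUES (" i ");"
        else
            print "filler line " i
    }
}' > "$sample"

# Split into 4 chunks without breaking lines (GNU coreutils split).
split -n l/4 "$sample" "$chunk_dir/chunk."

# Start one grep per chunk in the background, then wait for all of them.
for chunk in "$chunk_dir"/chunk.*; do
    LC_ALL=C grep -F 'db_pd.Clients' "$chunk" > "$chunk.out" &
done
wait

# Concatenating the outputs in chunk order preserves the original order.
parallel_hits=$(cat "$chunk_dir"/chunk.*.out | wc -l)
serial_hits=$(LC_ALL=C grep -F -c 'db_pd.Clients' "$sample")

echo "parallel: $parallel_hits, serial: $serial_hits"
rm -rf "$sample" "$chunk_dir"
```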


Tags: grep, bash
