If you only need the names of the files that contain a match, not the matching lines themselves, run grep with the -l flag. grep then prints each matching filename once and omits the matching lines.
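As a quick illustration of -l, here is a minimal sketch. The directory, file names, and contents are made up for the demo and are not from the original question:

```shell
# Throwaway sample files (hypothetical names and contents).
tmpdir=$(mktemp -d)
printf 'alpha\nneedle here\n' > "$tmpdir/a.txt"
printf 'beta\ngamma\n' > "$tmpdir/b.txt"
printf 'needle again\nneedle twice\n' > "$tmpdir/c.txt"

# -l prints the name of each file that contains at least one match,
# once per file, and does not print the matching lines.
matches=$(cd "$tmpdir" && grep -l 'needle' a.txt b.txt c.txt)
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

c.txt is listed only once even though it contains two matching lines, and b.txt is not listed at all.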
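Lack of disk space or exceeding enabled quotas will also cause the output file to be truncated. Some grep implementations limit line length to 2048 characters, the minimum that POSIX allows for LINE_MAX; GNU grep has no such limit beyond available memory. There also is a concept of largefiles, files which are so …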
The egrep command allows the use of extended regular expressions. The fgrep command, on the other hand, works on a fixed string instead of a regex: it takes the search pattern literally, which usually makes it faster than grep. In current GNU grep both names are deprecated in favor of grep -E and grep -F.
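To make the difference concrete, here is a small sketch using grep -E and grep -F, the standard equivalents of egrep and fgrep. The sample file and its contents are invented for the demo:

```shell
# Throwaway sample file (hypothetical contents).
tmpfile=$(mktemp)
printf 'foo\nbar\nfoo|bar\n' > "$tmpfile"

# Extended regex: | means alternation, so all three lines match.
ere_count=$(grep -E -c 'foo|bar' "$tmpfile")

# Fixed string: | is an ordinary character, so only the literal
# line "foo|bar" matches.
fixed_count=$(grep -F -c 'foo|bar' "$tmpfile")

echo "grep -E: $ere_count, grep -F: $fixed_count"

rm -f "$tmpfile"
```

The same pattern text selects three lines as an extended regex and one line as a fixed string.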
Here are a few options:
1) Prefix your grep command with LC_ALL=C to use the C locale instead of UTF-8.
2) Use fgrep because you're searching for a fixed string, not a regular expression.
3) Remove the -i option, if you don't need it.
So your command becomes:
LC_ALL=C fgrep -A 5 -B 5 'db_pd.Clients' eightygigsfile.sql
It will also be faster if you copy your file to a RAM disk, provided you have enough memory to hold it.
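If you want to try the combination on something small first, here is a sketch on a five-line sample. The file contents are invented, and -A 1 -B 1 is used instead of 5 only to keep the output short:

```shell
# Throwaway sample standing in for the real dump (hypothetical contents).
sample=$(mktemp)
printf '%s\n' \
  'line one' \
  'line two' \
  'INSERT INTO db_pd.Clients VALUES (1);' \
  'line four' \
  'line five' > "$sample"

# LC_ALL=C written as a prefix applies to this one grep invocation only.
# -F treats the pattern as a literal string.
# -B 1 and -A 1 print one line of context before and after each match.
context=$(LC_ALL=C grep -F -A 1 -B 1 'db_pd.Clients' "$sample")
printf '%s\n' "$context"

rm -f "$sample"
```

This should print the matching INSERT line together with the one line before and the one line after it.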
If you have a multicore CPU, I would really recommend GNU parallel. To grep a big file in parallel use:
< eightygigsfile.sql parallel --pipe grep -i -C 5 'db_pd.Clients'
Depending on your disks and CPUs it may be faster to read larger blocks:
< eightygigsfile.sql parallel --pipe --block 10M grep -i -C 5 'db_pd.Clients'
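A way to sanity-check the --pipe approach on a small generated file is sketched below. The generated data and the 1M block size are assumptions chosen for the demo, and the script skips the parallel run if GNU parallel is not installed. It counts matching lines only, without -C, because each grep sees just its own block, so context lines near a block boundary may differ from a single sequential run:

```shell
# Generate a sample file (hypothetical data): 200000 numbered lines,
# one in ten of which mentions the table name.
big=$(mktemp)
awk 'BEGIN {
  for (i = 1; i <= 200000; i++)
    print (i % 10 == 7 ? "row " i " db_pd.Clients" : "row " i)
}' > "$big"

# Baseline: a single grep over the whole file.
seq_count=$(grep -F -c 'db_pd.Clients' "$big")

# Only run the parallel version if GNU parallel itself is available
# (other tools are also installed under the name "parallel").
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # --pipe splits stdin into blocks at line boundaries and feeds each
  # block to its own grep; -k keeps the output in input order.
  par_count=$(parallel --pipe --block 1M -k grep -F 'db_pd.Clients' \
    < "$big" 2>/dev/null | wc -l | tr -d ' ')
  echo "sequential: $seq_count, parallel: $par_count"
else
  par_count=skipped
  echo "GNU parallel not found; sequential count: $seq_count"
fi

rm -f "$big"
```

Both counts should agree, since splitting at line boundaries does not change which lines match.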
It's not entirely clear from your question, but other options for grep include:
1) Dropping the -i flag.
2) Using the -F flag for a fixed string.
3) Setting LANG=C.
4) Limiting the number of matches with the -m flag.
Some trivial improvements:
Remove the -i option if you can; case-insensitive matching is quite slow.
Replace the . with \. because a single dot is the regex symbol that matches any character, which is also slow.
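The effect of the unescaped dot is easy to see on a two-line sample. This is a sketch with invented contents:

```shell
# Throwaway sample (hypothetical contents): one real table name and one
# near miss that differs only in the character where the dot is.
tmpfile=$(mktemp)
printf 'db_pd.Clients\ndb_pdXClients\n' > "$tmpfile"

# Unescaped dot matches any character, so both lines match.
any_char=$(grep -c 'db_pd.Clients' "$tmpfile")

# Escaped dot matches only a literal dot.
escaped=$(grep -c 'db_pd\.Clients' "$tmpfile")

# -F treats the whole pattern literally, so no escaping is needed.
fixed=$(grep -F -c 'db_pd.Clients' "$tmpfile")

echo "unescaped: $any_char, escaped: $escaped, fixed: $fixed"

rm -f "$tmpfile"
```

So besides any speed difference, escaping the dot (or using -F) also avoids false positives.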
Two lines of attack:
1) Are you sure you need -i, or do you have a possibility to get rid of it?
2) grep is single-threaded, so you might want to start more of them at different offsets.
< eightygigsfile.sql parallel -k -j120% -n10 -m grep -F -i -C 5 'db_pd.Clients'
If you need to search for multiple strings, grep -f strings.txt saves a ton of time. The above is a translation of something that I am currently testing. The -j and -n option values seemed to work best for my use case. The -F grep also made a big difference.
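For the multiple-strings case, a minimal sketch of grep -f combined with -F could look like this. The contents of strings.txt and the tiny dump file are made up for the demo:

```shell
# Throwaway files (hypothetical contents).
tmpdir=$(mktemp -d)

# One search string per line.
printf 'db_pd.Clients\ndb_pd.Orders\n' > "$tmpdir/strings.txt"

printf '%s\n' \
  'INSERT INTO db_pd.Clients VALUES (1);' \
  'INSERT INTO db_pd.Users VALUES (2);' \
  'INSERT INTO db_pd.Orders VALUES (3);' > "$tmpdir/dump.sql"

# -f reads the patterns from a file; -F treats each one as a fixed string.
# A line is printed if it contains any of the strings, so the file is
# read once instead of once per string.
hits=$(grep -F -f "$tmpdir/strings.txt" "$tmpdir/dump.sql")
printf '%s\n' "$hits"

rm -rf "$tmpdir"
```

Only the Clients and Orders lines should be printed; the Users line matches neither string.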