I am trying to filter data out of data.txt using patterns stored in a file, filter.txt, like below:
grep -v -f filter.txt data.txt > op.txt
This grep takes more than 10-15 minutes for 30-40K lines in filter.txt and ~300K lines in data.txt.
Is there any way to speed this up?
data.txt
data1
data2
data3
filter.txt
data1
op.txt
data2
data3
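For anyone who wants to reproduce the example, the sample files above can be recreated and filtered in a few lines (a minimal sketch in plain bash):

# Recreate the sample input files from the question.
printf 'data1\ndata2\ndata3\n' > data.txt
printf 'data1\n' > filter.txt
# Run the filter; op.txt should end up containing data2 and data3.
grep -v -f filter.txt data.txt > op.txt
cat op.txt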
The approach above works with the solution provided by codeforester, but it fails when filter.txt is empty.
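If an empty filter.txt should simply mean "filter nothing", one workaround is to guard the filtering step in the shell. This is a minimal sketch that assumes op.txt should just be a copy of data.txt in that case:

# Only filter when filter.txt is non-empty; otherwise pass data.txt through unchanged.
if [ -s filter.txt ]; then
    grep -v -f filter.txt data.txt > op.txt
else
    cp data.txt op.txt
fi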
If you're running grep over a very large number of files, it will be slow because it has to open and read through every one of them. If you have some idea of where the files you're looking for might be, limit the search to those files.
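For example, pointing grep at a specific glob instead of an entire tree keeps it from scanning unrelated files (the directory and pattern below are only placeholders):

# Hypothetical example: search only the January logs rather than the whole logs/ directory.
grep -F 'data1' logs/2024-01-*.log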
Though grep expects to do the matching on text, it has no limits on input line length other than available memory, and it can match arbitrary characters within a line.
Is fgrep ("fast grep") faster? The grep utility searches text files for regular expressions, but it can also search for ordinary strings, since strings are a special case of regular expressions. If your patterns are in fact plain text strings, fgrep (or grep -F) may be much faster than grep.
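Applied to this question, if every line in filter.txt is a literal line to drop rather than a regular expression, fixed-string whole-line matching is usually a large speedup:

# -F: fixed strings, -x: match whole lines only, -v: invert match, -f: read patterns from a file
grep -vFxf filter.txt data.txt > op.txt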
When only searching for strings, and speed matters, you should almost always use grep; it is orders of magnitude faster than awk when it comes to plain searching.
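Whether that holds for this particular job is easy to check directly; a rough comparison with time on the real files settles it, and the same pattern applies to the awk command in the answer below (the output file names here are only for the comparison):

# Rough benchmark: run each variant once and compare the reported wall-clock times.
time grep -vf filter.txt data.txt > op_regex.txt
time grep -vFxf filter.txt data.txt > op_fixed.txt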
Based on Inian's solution in the related post, this awk command should solve your issue:
awk 'FNR==NR {hash[$0]; next} !($0 in hash)' filter.txt data.txt > op.txt
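It stores every filter.txt line in a hash while reading the first file (FNR==NR) and then prints only the data.txt lines that are not in the hash. Note that if filter.txt is empty, FNR==NR also holds while reading data.txt, which is exactly the empty-filter failure mentioned above; a variant keyed on the file name avoids that (a sketch, not part of the original answer):

# Build the hash only while reading the first argument (filter.txt), even when that file is empty.
awk 'FILENAME==ARGV[1] {hash[$0]; next} !($0 in hash)' filter.txt data.txt > op.txt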