 

grep -vf too slow with large files

I am trying to filter data out of data.txt using patterns stored in a file filter.txt, like below:

grep -v -f filter.txt data.txt > op.txt

This grep takes 10-15 minutes for 30-40K lines in filter.txt and ~300K lines in data.txt.

Is there any way to speed this up?

data.txt

data1
data2
data3

filter.txt

data1

op.txt

data2
data3

This works with the solution provided by codeforester, but fails when filter.txt is empty.
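
For reference, one common workaround for the empty-filter case (a sketch, not the only option) is to test the file name instead of relying on the FNR==NR line-count trick, which misfires when awk reads zero lines from the first file:

awk 'FILENAME == ARGV[1] {hash[$0]; next} !($0 in hash)' filter.txt data.txt > op.txt

With an empty filter.txt, no line ever satisfies the first block, so every line of data.txt is printed, which is the expected output.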

asked Mar 09 '17 by user3150037


People also ask

Why is grep taking so long?

If you're running grep over a very large number of files, it will be slow because it has to open and read through every one of them. If you have some idea of where the match might be, limit the number of files grep has to search.
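
For example, with GNU grep you can restrict a recursive search to matching file names (the path, glob, and pattern here are illustrative):

grep -r --include='*.log' 'ERROR' /var/log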

Does grep have a file size limit?

Though grep expects to do the matching on text, it has no limits on input line length other than available memory, and it can match arbitrary characters within a line.

Is there anything faster than grep?

The grep utility searches text files for regular expressions, but it can also search for ordinary strings, since a string is a special case of a regular expression. However, if your patterns really are plain text strings, fgrep (equivalent to grep -F) may be much faster than grep.
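
Applied to the question above, that could look like the sketch below, assuming the lines in filter.txt are meant to match whole lines literally (no regex metacharacters):

grep -vxFf filter.txt data.txt > op.txt

Here -F treats each pattern as a fixed string instead of a regex, and -x matches whole lines only, which avoids compiling and running tens of thousands of regular expressions.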

Is awk faster than grep?

When you are only searching for strings and speed matters, you should almost always use grep; it is orders of magnitude faster than awk for plain searching.
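
A quick, unscientific way to compare the two on your own data (file name and pattern are illustrative):

time grep -c 'data1' data.txt
time awk '/data1/ {n++} END {print n}' data.txt

Both commands count matching lines, so the timings are directly comparable.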


1 Answer

Based on Inian's solution in the related post, this awk command should solve your issue:

awk 'FNR==NR {hash[$0]; next} !($0 in hash)' filter.txt data.txt > op.txt
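
For readers new to the idiom, here is the same logic spread out with comments (behavior is identical to the one-liner):

awk '
    FNR == NR {       # FNR == NR only while reading the first file, filter.txt
        hash[$0]      # store each filter line as an array key
        next          # do not fall through to the second block
    }
    !($0 in hash)     # now reading data.txt: print lines absent from filter.txt
' filter.txt data.txt > op.txt

Note the caveat raised in the question: when filter.txt is empty, awk reads zero lines from it, so FNR == NR also holds while reading data.txt and nothing is printed; see the FILENAME-based variant sketched above.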
answered Sep 28 '22 by codeforester