I am currently trying to grep
a large list of IDs (~5000) against an even larger CSV file (3,000,000 lines).
I want all the CSV lines that contain an ID from the ID file.
My naive approach was:
cat the_ids.txt | while read line; do cat huge.csv | grep "$line" >> output_file; done
But this takes forever!
Are there more efficient approaches to this problem?
To search a file for several patterns at once, run grep with a single quoted pattern followed by the file name or path. Enclose the pattern in single quotes and separate the alternatives with the pipe symbol. In basic regular expressions (grep's default), the pipe must be preceded by a backslash (\|) to act as alternation; GNU grep supports this as an extension.
Egrep command
egrep (equivalent to grep -E) uses extended regular expressions, in which metacharacters such as |, +, ? and () keep their special meaning without a preceding backslash, so you are freed from the burden of escaping them as in plain grep. The difference is one of syntax; it does not by itself make the search faster.
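A minimal sketch of the multi-pattern forms described above, assuming GNU grep. The file sample.csv and its contents are hypothetical and exist only for this demonstration.

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' '101,alice' '202,bob' '303,carol' > "$tmpdir/sample.csv"

# Basic regular expression: alternation is written \| (a GNU extension).
bre_result=$(grep '101\|303' "$tmpdir/sample.csv")

# Extended regular expression: | is alternation without a backslash.
ere_result=$(grep -E '101|303' "$tmpdir/sample.csv")

# Several -e options: one pattern each, no alternation syntax needed.
e_result=$(grep -e '101' -e '303' "$tmpdir/sample.csv")

# All three commands select the same lines; print one of the results.
printf '%s\n' "$ere_result"

rm -rf "$tmpdir"
```

All three forms become unwieldy with thousands of patterns, which is why reading the patterns from a file is the better fit for the question.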
Try
grep -f the_ids.txt huge.csv
Additionally, since your patterns seem to be fixed strings, supplying the -F
option might speed up grep.
-F, --fixed-strings Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
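Here is a small sketch of how -f and -F combine, run on hypothetical sample data (the IDs and CSV rows below are made up). Note that a fixed-string match is still a substring match anywhere on the line, so the sketch also shows -w (whole-word matching) and, as a different technique not taken from the answer, an awk lookup that assumes the ID is the first comma-separated field.

```shell
# Hypothetical sample data standing in for the real files.
tmpdir=$(mktemp -d)
printf '%s\n' '17' '42' > "$tmpdir/the_ids.txt"
printf '%s\n' '17,alice' '23,bob' '42,carol' '99,dave,note 42' '170,erin' \
    > "$tmpdir/huge.csv"

# -f reads one pattern per line; -F treats each pattern as a fixed string.
# This matches an ID anywhere in a line, including inside "170".
matches=$(grep -F -f "$tmpdir/the_ids.txt" "$tmpdir/huge.csv")

# -w additionally requires the match to be a whole word, which excludes "170".
word_matches=$(grep -F -w -f "$tmpdir/the_ids.txt" "$tmpdir/huge.csv")

# Alternative (assumption: the ID is the first CSV field): load the IDs
# into an awk array, then keep rows whose first field is one of them.
first_field_matches=$(awk -F, 'NR == FNR { ids[$0]; next } $1 in ids' \
    "$tmpdir/the_ids.txt" "$tmpdir/huge.csv")

printf '%s\n' "$first_field_matches"

rm -rf "$tmpdir"
```

Each variant reads huge.csv once, instead of once per ID as in the loop from the question. Which variant is appropriate depends on where the IDs can appear in a line.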