grep a large list against a large file

I am currently trying to grep a large list of IDs (~5,000) against an even larger CSV file (3,000,000 lines).

I want all the CSV lines that contain an ID from the ID file.

My naive approach was:

cat the_ids.txt | while read line
do
  cat huge.csv | grep $line >> output_file
done

But this takes forever!

Are there more efficient approaches to this problem?

leifg asked Oct 15 '13 12:10



1 Answer

Try

grep -f the_ids.txt huge.csv 

Additionally, since your patterns seem to be fixed strings, supplying the -F option might speed up grep.

   -F, --fixed-strings
          Interpret PATTERN as a list of fixed strings, separated by
          newlines, any of which is to be matched. (-F is specified by
          POSIX.)
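
As a rough illustration of the two options combined, here is a minimal sketch on made-up sample data (the IDs, names and temporary directory are hypothetical stand-ins for the_ids.txt and huge.csv). It also adds -w, on the assumption that an ID should only match as a whole word; without it, the fixed string 1001 would also match a line containing 11001. -w is available in GNU and BSD grep but is not part of POSIX.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 1001 1003 > the_ids.txt
printf '%s\n' \
    '1001,alice' \
    '1002,bob' \
    '1003,carol' \
    '11001,dave' > huge.csv

# A single pass over the CSV instead of one grep run per ID:
#   -f the_ids.txt  read one pattern per line from the ID file
#   -F              treat each pattern as a fixed string, not a regex
#   -w              assumption: only accept whole-word matches,
#                   so 1001 does not match inside 11001
grep -w -F -f the_ids.txt huge.csv > output_file

cat output_file
```

On this sample the output file should contain only the 1001 and 1003 lines. If the IDs live in one known column, an exact comparison on that column (for example with awk) would be stricter than a whole-line search, but that depends on the CSV layout, which the question does not show.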
devnull answered Sep 22 '22 11:09