grep based on blacklist -- without procedural code?

It's a well-known task, simple to describe:

Given a text file foo.txt, and a blacklist file of exclusion strings, one per line, produce foo_filtered.txt that has only the lines of foo.txt that do not contain any exclusion string.

A common application is filtering compiler warnings from a build log while ignoring warnings in files that are not yours. Here foo.txt is the warnings file (itself filtered from the build log), and the blacklist file excluded_filenames.txt lists file names, one per line.

I know how it's done in procedural languages like Perl or AWK, and I've even done it with combinations of Linux commands such as cut, comm, and sort.

But I feel that I should be really close with xargs, and just can't see the last step.

I know that if excluded_filenames.txt contains only one file name, then

grep -v "`cat excluded_filenames.txt`" foo.txt

will do it.

And I know that I can get the filenames one per line with

xargs -L1 -a excluded_filenames.txt

So how do I combine those two into a single solution, without explicit loops in a procedural language?

I'm looking for a simple and elegant solution.

asked Oct 10 '11 at 14:10 by talkaboutquality



1 Answer

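You should use the -f option, which reads the patterns from a file, one per line (or fgrep -f, which is the same as grep -F -f and treats each pattern as a fixed string rather than a regular expression):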

grep -vf excluded_filenames.txt foo.txt
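A quick way to try this is a throwaway script. This is only a sketch: the warning lines and the names bar.c and baz.c are made-up sample data, and only the grep command comes from this answer. It works in a fresh temporary directory so it does not touch any real files.

```shell
#!/bin/sh
# Sketch with hypothetical sample data: a warnings file and a blacklist of file names.
set -e
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' \
  'src/foo.c:10: warning: unused variable x' \
  'lib/bar.c:22: warning: implicit declaration' \
  'lib/baz.c:5: warning: comparison is always true' > foo.txt

printf '%s\n' 'bar.c' 'baz.c' > excluded_filenames.txt

# -f reads one pattern per line from the blacklist; -v keeps only lines matching none of them.
grep -vf excluded_filenames.txt foo.txt > foo_filtered.txt
cat foo_filtered.txt
# prints: src/foo.c:10: warning: unused variable x
```

Without -F the names are regular expressions, so the . in bar.c matches any character; that happens to be harmless here, but grep -vFf matches them literally.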

You could also use -F, which answers more directly what you asked:

grep -vF "`cat excluded_filenames.txt`" foo.txt

From man grep:

-f FILE, --file=FILE
          Obtain patterns from FILE, one per line.  The empty file contains zero patterns, and therefore matches nothing.

-F, --fixed-strings
          Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
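Note the contrast with the -f description: an empty blacklist file matches nothing, but an empty line inside the blacklist is an empty pattern, which matches every line, so grep -v then prints nothing at all. The sketch below (again with made-up file contents) shows the effect and one way around it: drop blank lines from the blacklist before handing it to grep. The name clean_excluded.txt is hypothetical.

```shell
#!/bin/sh
# Sketch: a stray blank line in the blacklist empties the output of grep -v.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' 'src/foo.c: warning A' 'lib/bar.c: warning B' > foo.txt
# Hypothetical blacklist with a blank line between the two names.
printf '%s\n' 'bar.c' '' 'baz.c' > excluded_filenames.txt

# The empty pattern matches every line, so -v selects none (grep exits with status 1).
grep -vF "`cat excluded_filenames.txt`" foo.txt > broken.txt
echo "broken: $(wc -l < broken.txt) line(s)"

# Remove blank lines from the blacklist first (clean_excluded.txt is a hypothetical name).
grep -v '^$' excluded_filenames.txt > clean_excluded.txt
grep -vFf clean_excluded.txt foo.txt > foo_filtered.txt
cat foo_filtered.txt
```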
answered Sep 21 '22 at 19:09 by Paul Creasey
