
Remove files not containing a specific string

Tags: linux, grep, bash, sed

I want to find the files that do not contain a specific string (in a directory and its subdirectories) and remove them. How can I do this?

Hakim asked Jul 01 '12 07:07


2 Answers

The following will work:

find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm

This first uses find to print the names of all the files in the current directory and any subdirectories. The names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument).
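To see the null terminators for yourself, here is a small sketch run in a scratch directory. The directory and file names are made up for illustration.

```shell
# Create a throwaway directory with two files, one with a space in its name.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt"

# Count the null bytes in find's output: one per file name, spaces or not.
nul_count=$(find "$demo" -type f -print0 | tr -cd '\0' | wc -c)
echo "null terminators: $nul_count"

# Dump the raw bytes; each name ends in \0 instead of \n.
find "$demo" -type f -print0 | od -c

# Clean up the scratch directory.
rm -r "$demo"
```

Because a null byte cannot appear in a file name, it is the one separator that is safe for every possible name.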

Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.

The -Z argument to grep works like the -print0 argument to find, so grep prints its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument makes grep print the names of those files on its command line (the ones xargs supplied) that do not match the regular expression:

my string

If you want simple matching without regular expression magic then add the -F option. If you want more powerful regular expressions then add the -E option. It's a good habit to use single quotes rather than double quotes, as this protects you against any shell magic being applied to the string (such as variable substitution).
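The difference -F makes can be sketched with two throwaway files (the names and contents are invented for this example). As a regular expression the dot in 'a.b' matches any character, while with -F it only matches a literal dot.

```shell
demo=$(mktemp -d)
printf 'price: a.b\n' > "$demo/literal.txt"
printf 'price: axb\n' > "$demo/other.txt"

# grep's exit status with -L differs between versions, so it is ignored here.

# Regular expression: the dot matches any character, so both files match
# and -L lists nothing.
regex_misses=$(cd "$demo" && grep -L 'a.b' literal.txt other.txt || true)

# Fixed string: the dot is literal, so other.txt does not match and is listed.
fixed_misses=$(cd "$demo" && grep -F -L 'a.b' literal.txt other.txt || true)

echo "regex: [$regex_misses]  fixed: [$fixed_misses]"
rm -r "$demo"
```

When the output of -L feeds rm, this matters: with -F, other.txt would be deleted; without it, nothing would.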

Finally you call xargs again to get rid of all the files that you've found with the previous calls.
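Before deleting anything for real, it may be worth previewing the list. The sketch below runs the same pipeline in a scratch directory with invented file names. It uses -0, the short spelling of --null, and adds -r, which GNU xargs accepts to skip running rm when the list is empty.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
printf 'contains my string here\n' > "$demo/keep.txt"
printf 'nothing relevant\n'        > "$demo/drop.txt"
printf 'nothing relevant\n'        > "$demo/sub dir/drop too.txt"

# Dry run: show what would be removed, one name per line, deleting nothing.
find "$demo" -type f -print0 | xargs -0 grep -Z -L 'my string' | tr '\0' '\n'

# Real run: same pipeline, with the final stage handing the names to rm.
find "$demo" -type f -print0 | xargs -0 grep -Z -L 'my string' | xargs -0 -r rm

# Only the file that contains the string should be left.
remaining=$(find "$demo" -type f)
echo "$remaining"
rm -r "$demo"
```

The tr stage is only for human eyes; never feed its newline-separated output back into rm.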

The problem with calling grep directly from find with -exec ... \; is that grep is then invoked once per file, whereas xargs invokes it once for a whole batch of files. Batching is much faster if you have lots of files. (The -exec ... {} + form of find also batches its arguments.) Also, don't be tempted to do something like:

rm $(some command that produces lots of filenames)

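It's better to pipe the names to xargs, which knows the maximum command-line length and will call rm as many times as needed, each time with as many arguments as will fit. The command-substitution form can fail with an "Argument list too long" error, and because the shell splits its unquoted output on whitespace, file names containing spaces get broken apart.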
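The batching can be made visible by forcing a tiny batch size. In this sketch, -n 2 caps each invocation at two arguments and echo stands in for rm, so nothing is deleted. The five names are placeholders.

```shell
# Five null-terminated names, at most two per invocation: xargs runs the
# command three times, and each run prints one line.
batches=$(printf '%s\0' one two three four five | xargs -0 -n 2 echo rm | wc -l)
echo "invocations: $batches"
```

Without -n, xargs picks the batch size itself from the system's command-line limit, which is why it does not hit the "Argument list too long" failure.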

Note that this solution would have been simpler if it did not need to cope with file names containing whitespace and newlines.

Alternatively

grep -r -L -Z 'my string' . | xargs --null rm

will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories. Use the find ... approach if you want to do some other tests on the files as well (such as age or permissions).
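As a sketch of what such extra tests might look like, the following only considers *.log files last modified more than 30 days ago. The file names, the pattern, and the age threshold are all assumptions chosen for illustration.

```shell
demo=$(mktemp -d)
printf 'nothing relevant\n' > "$demo/old.log"
printf 'nothing relevant\n' > "$demo/new.log"
printf 'nothing relevant\n' > "$demo/old.txt"

# Backdate two files to 1 Jan 2000 so that -mtime +30 selects them.
touch -t 200001010000 "$demo/old.log" "$demo/old.txt"

# Only old *.log files reach grep; of those, the ones lacking the string go.
find "$demo" -type f -name '*.log' -mtime +30 -print0 |
  xargs -0 -r grep -Z -L 'my string' |
  xargs -0 -r rm

# new.log is too recent and old.txt has the wrong extension, so both survive.
remaining=$(cd "$demo" && find . -type f | sort)
echo "$remaining"
rm -r "$demo"
```

The -r on both xargs calls (a GNU option) keeps grep and rm from running when find selects nothing.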

Note that any of the single letter arguments, with a single dash introducer, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions and has multi-letter arguments introduced with a single dash. This is for historical reasons and hasn't ever been fixed because it would have broken too many scripts.

Nick answered Sep 22 '22 08:09


With GNU grep and bash, where $str holds the string to search for:

grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done

Use a find-based solution if portability is needed. This version is slightly faster.
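One possible portable variant, offered as a sketch rather than as what the answer had in mind, uses only options that POSIX specifies for find and grep. It runs grep once per file, so it is slower. It also treats a grep error (an unreadable file, for instance) the same as "no match", so use it with care. The scratch files and the value of str are made up.

```shell
demo=$(mktemp -d)
printf 'contains my string\n' > "$demo/keep.txt"
printf 'nothing relevant\n'   > "$demo/drop me.txt"
str='my string'   # hypothetical search string

# "! -exec grep -q ... \;" is true when grep does not find the string;
# the files that pass that test are handed to rm in batches by "{} +".
find "$demo" -type f ! -exec grep -q -e "$str" {} \; -exec rm {} +

remaining=$(cd "$demo" && find . -type f)
echo "$remaining"
rm -r "$demo"
```

No null-terminated output is needed here because find passes each name to grep and rm directly as an argument.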

ormaaj answered Sep 24 '22 08:09