
How to Recursively Remove Files of a Certain Type

Tags: regex, find, sed

I misread the gzip documentation, and now I have to remove a ton of ".gz" files from many nested directories. I tried using 'find' to locate all the .gz files. However, whenever a file name contains a space, rm treats each piece as a separate file. And whenever there's a dash, rm interprets it as a new flag. I decided to use 'sed' to replace the spaces with "\ " and the space-dashes with " \-", and here's what I came up with.

find . -type f -name '*.gz' | sed -r 's/\ /\\ /g' | sed -r 's/\ -/ \\-/g'

When I run the find/sed query on a file that, for example, has a name of "Test - File - for - show.gz", I get the output

./Test\ \-\ File\ \-\ for\ \-\ show.gz

Which appears to be acceptable for rm, but when I run

rm $(find . -type f -name '*.gz'...)

I get

rm: cannot remove './Test\\': No such file or directory
rm: cannot remove '\\-\\': No such file or directory
rm: cannot remove 'File\\': No such file or directory
rm: cannot remove '\\-\\': No such file or directory
...

I haven't made extensive use of sed, so I have to assume I'm doing something wrong with the regular expressions. If you know what I'm doing wrong, or if you have a better solution, please tell me.

fpf3 asked Aug 25 '14 07:08


2 Answers

Adding backslashes before spaces protects the spaces against expansion in shell source code. But the output of a command substitution is not parsed as shell source; it only undergoes field splitting and wildcard expansion. Backslashes in that output are ordinary characters, so adding them before spaces doesn't protect the spaces against field splitting.
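A minimal sketch of that behaviour, run in a scratch directory. The file name and the variable names are made up for illustration, and mktemp -d is assumed to be available (it is on Linux, macOS and the BSDs, though it is not POSIX).

```shell
# Scratch directory so that nothing real is touched.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch 'Test - File.gz'

# Same idea as the pipeline in the question: put a backslash before each space.
escaped=$(find . -type f -name '*.gz' | sed 's/ /\\ /g')
printf 'escaped text: %s\n' "$escaped"

# Unquoted expansion: the text is split on whitespace, and the backslashes
# stay in the words as literal characters instead of acting as escapes.
set -- $escaped
word_count=$#
printf 'words passed to the command: %s\n' "$word_count"
for word in "$@"; do
    printf '  <%s>\n' "$word"
done

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

The single file name arrives as three separate words, two of them ending in a backslash, which matches the shape of the error messages from rm in the question.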

Adding backslashes before dashes is completely useless since it's rm that interprets dashes as special, and it doesn't interpret backslashes as special.

The output of find is ambiguous in general: file names can contain newlines, so you can't use a newline as a file name separator. Parsing the output of find is usually broken unless you're dealing with file names in a known, restricted character set, and it's often not the simplest method anyway.
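To make the ambiguity concrete, here is a small sketch that creates one file with a newline in its name and compares line-based counting with what -exec passes along. The file name is hypothetical, and the sketch assumes a file system that allows newlines in names (the usual Linux and BSD ones do).

```shell
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# One single file whose name contains a newline character.
touch 'part one
part two.gz'

# Line-oriented parsing of find's output counts two "names" for it.
line_count=$(find . -type f -name '*.gz' | wc -l | tr -d ' ')

# -exec hands the name over as one intact argument; the inline sh
# script only reports how many arguments it received.
arg_count=$(find . -type f -name '*.gz' -exec sh -c 'echo "$#"' sh {} +)

printf 'lines: %s, arguments: %s\n' "$line_count" "$arg_count"

cd / && rm -rf "$demo_dir"
```

This prints "lines: 2, arguments: 1": anything that reads find's output line by line would look for two files that don't exist.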

find has a built-in way to execute external programs: the -exec action. There's no parsing going on, so this isn't subject to any problem with special characters in file names. (A path beginning with - could still be interpreted as an option, but all paths begin with . since that's the directory being traversed.)

find . -type f -name '*.gz' -exec rm {} +
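As a quick check of that command against awkward names, the sketch below runs it in a scratch directory. The file names are invented for the test (one of them would look like an option to rm if it were passed without the leading ./).

```shell
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
mkdir -p 'sub dir'
touch ./-rf.gz 'Test - File - for - show.gz' 'sub dir/-v - x.gz'

# find prefixes every path with the starting point ".", so rm receives
# "./-rf.gz" rather than "-rf.gz" and does not treat it as an option.
find . -type f -name '*.gz' -exec rm {} +

# Count what is left; tr strips the padding that some wc versions add.
remaining=$(find . -type f -name '*.gz' | wc -l | tr -d ' ')
printf 'remaining .gz files: %s\n' "$remaining"

cd / && rm -rf "$demo_dir"
```

If the starting point were given as something other than a path beginning with . or /, the same protection would not necessarily apply, so this sketch only covers the form used in the answer.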

Many find implementations (Linux, Cygwin, BSD) can delete files without invoking an external utility:

find . -type f -name '*.gz' -delete
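Since -delete is irreversible, one cautious pattern is to run the same tests with -print first and only then swap in -delete. The sketch below does that in a scratch directory; the directory layout is made up, and -delete is assumed to be supported by the installed find (it is an extension in GNU and BSD find).

```shell
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
mkdir -p 'a/b c'
touch 'a/keep.txt' 'a/b c/one.gz' 'a/b c/two - three.gz'

# Dry run: list exactly what the tests match before deleting anything.
find . -type f -name '*.gz' -print

# Same tests, with -delete as the last item so it only applies to
# files that already passed -type and -name.
find . -type f -name '*.gz' -delete

# Only the non-.gz file should be left.
left=$(find . -type f | wc -l | tr -d ' ')
printf 'files left: %s\n' "$left"

cd / && rm -rf "$demo_dir"
```

Keeping -delete at the end matters: find evaluates its expression left to right, so an action placed before the tests would apply to every file it visits.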

See "Why does my shell script choke on whitespace or other special characters?" for more information on writing robust shell scripts.

Gilles 'SO- stop being evil' answered Sep 24 '22 00:09


There is no need to pipe to sed, etc. Instead, you can use find's -exec action, which lets you run a command on each of the files that find matches.

For example, for your case this would work:

find . -type f -name '*.gz' -exec rm {} \;

which is approximately the same as:

find . -type f -name '*.gz' -exec rm {} +

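Neither form goes through a shell. The first one starts a separate rm process for each matched file, while the last one passes as many file names as fit to each rm invocation, which makes it faster.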
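The difference between the two terminators can be observed by counting how often find starts the command. This is only a sketch: the inline sh script stands in for rm so that each run leaves a visible trace, and the file names are made up.

```shell
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch one.gz 'two words.gz' three.gz

# Each run of the inline sh script prints one line, so counting lines
# counts how many times find started the command.
runs_semicolon=$(find . -type f -name '*.gz' -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
runs_plus=$(find . -type f -name '*.gz' -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

printf 'with \\; : %s runs\n' "$runs_semicolon"
printf 'with +  : %s runs\n' "$runs_plus"

cd / && rm -rf "$demo_dir"
```

With three matching files this reports 3 runs for \; and 1 run for +. With very many files the + form may still need more than one run, because each command line has a maximum length.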


From man find:

-exec command ;

Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of `;' is encountered. The string `{}' is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find. Both of these constructions might need to be escaped (with a `\') or quoted to protect them from expansion by the shell. See the EXAMPLES section for examples of the use of the -exec option. The specified command is run once for each matched file. The command is executed in the starting directory. There are unavoidable security problems surrounding use of the -exec action; you should use the -execdir option instead.

fedorqui 'SO stop harming' answered Sep 24 '22 00:09