I have a list of about 1000 file names to search for under a directory and its subdirectories. There are hundreds of subdirectories holding more than 1,000,000 files. The following command runs find 1000 times:
cat filelist.txt | while read -r f; do find /dir -name "$f"; done
Is there a much faster way to do it?
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option means that grep searches for all the patterns in the given file.)
Explanation of <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called a process substitution, and is a little similar to $( ... ). The command is equivalent to the following two steps (though using the process substitution is neater and possibly a little faster):
sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
The call to sed runs the commands s@^@/@, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints the results. These commands convert the filenames into a format that will work better with grep.
s@^@/@ means put a / before each filename. (The ^ means "start of line" in a regex.)
s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line"; the second is just a literal $, which grep then interprets as "end of line".) The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of ., [, ] or *. Grep uses regexes and those characters are considered special, but we want them to be plain, so we need to escape them (if we didn't escape them, then a file name like a.txt would match files like abtxt).
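To see the effect of the anchoring and escaping in isolation, here is a small sketch that feeds grep a few made-up paths (standing in for the output of find) and compares a raw pattern with the processed one. The path names and the variable names loose and strict are only for illustration.

```shell
# Sample find-style output (made-up paths for illustration).
paths='./a.txt
./a.txt.backup
./abba.txt
./abtxt'

# Raw pattern: "." matches any character and the match may sit anywhere in the path.
loose=$(printf '%s\n' "$paths" | grep 'a.txt')

# Processed pattern, in the form the sed commands produce: leading /, escaped dot, trailing $.
strict=$(printf '%s\n' "$paths" | grep '/a\.txt$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The raw pattern matches all four paths, while the processed pattern matches only ./a.txt.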
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
Grep then uses each line of that output as a pattern when it is searching the output of find.
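The whole pipeline can be tried on a throwaway tree before pointing it at the real directory. The following is a minimal sketch, assuming GNU sed and grep; the directory layout, the file names and patterns.txt are invented for the demo, and it uses the temporary-file form rather than process substitution so that it also runs in a plain POSIX shell.

```shell
# Build a small scratch tree (all names here are made up for the demo).
tmp=$(mktemp -d)
mkdir -p "$tmp/dir/sub1" "$tmp/dir/sub2"
touch "$tmp/dir/sub1/file1.txt" "$tmp/dir/sub1/file1.txt.backup" \
      "$tmp/dir/sub2/blah[2012].txt" "$tmp/dir/sub2/xfile1.txt"
printf '%s\n' 'file1.txt' 'blah[2012].txt' > "$tmp/filelist.txt"

# Turn the names into anchored, escaped patterns.
sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' "$tmp/filelist.txt" > "$tmp/patterns.txt"

# Walk the tree once and filter the paths against all patterns at the same time.
matches=$(find "$tmp/dir" | grep -f "$tmp/patterns.txt" | sort)
printf '%s\n' "$matches"

rm -rf "$tmp"
```

Only sub1/file1.txt and sub2/blah[2012].txt should be listed; file1.txt.backup and xfile1.txt are rejected by the trailing $ and the leading / respectively.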
If filelist.txt is a plain list:
$ find /dir | grep -F -f filelist.txt
If filelist.txt is a pattern list:
$ find /dir | grep -f filelist.txt
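One thing worth checking with the plain-list variant is that grep -F matches each name as a fixed substring anywhere in the path, not only as a complete file name. A small sketch with made-up paths (the variable names list, paths and fixed are only for illustration):

```shell
# A one-entry name list in a temporary file.
list=$(mktemp)
printf 'a.txt\n' > "$list"

# Sample find-style output (made-up paths).
paths='./a.txt
./a.txt.backup
./sub/abba.txt
./sub/b.txt'

# -F treats every line of the list as a literal string, matched anywhere in the path.
fixed=$(printf '%s\n' "$paths" | grep -F -f "$list")
printf '%s\n' "$fixed"

rm -f "$list"
```

Here the first three paths are printed, because each of them contains a.txt somewhere. If only exact file names should match, the anchored patterns built with sed are the safer choice.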