I am successfully using find to create a list of all files in the current subdirectory, excluding those in the subdirectory "cache." Here's my first bit of code:
find . -wholename './cach*' -prune -o -print
I now wish to pipe this into a grep command. It seems like that should be simple:
find . -wholename './cach*' -prune -o -print | xargs grep -r -R -i "samson"
... but this is returning results that are mostly from the cache directory. I've tried removing the xargs reference, but that does what you'd expect, running the grep on text of the file names, rather than on the files themselves. My goal is to find "samson" in any files that aren't cached content.
I'll probably get around this issue by just using doubled greps in this instance, but I'm very curious about why this one-liner behaves this way. I'd love to hear thoughts on a way to modify it while still using these two commands (as there are speed advantages to doing it this way).
(This is in CentOS 5, btw.)
You can exclude directories from a find search by combining the -path, -prune, -o and -print options: -path matches the directory, -prune stops find from descending into it, and -o -print prints everything else.
The grep command searches through a file, looking for lines that match the specified pattern. To use it, type grep, then the pattern you are searching for, and finally the name of the file (or files) to search in. The output is every line in the file that contains the pattern.
You can make grep search in all the files and all the subdirectories of the current directory using the -r recursive search option: grep -r search_term .
To search all files in the current directory, use an asterisk instead of a filename at the end of a grep command. The output shows the name of each file that contains a match, followed by the entire matching line.
The -wholename match may be the reason why it's still including "cache" files. If you're executing the find command in the directory that contains the "cache" folder, it should work. If not, try changing it to -name '*cache*' instead.
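Also, you do not need the -r or -R for your grep. Those tell it to recurse through directories, but here you want it to test only the individual files that find hands it. Because find's output also includes directory names (starting with . itself), the recursion sends grep straight back into the cache.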
You can update your command using the piped version, or a single-command:
find . -name '*cache*' -prune -o -type f -print0 | xargs -0 grep -il "samson"
or
find . -name '*cache*' -prune -o -type f -exec grep -iq "samson" {} \; -print
Note, the -l in the first command tells grep to list the names of matching files rather than the line(s) that match. The -q in the second has a similar effect: it tells grep to stay quiet and report a match only through its exit status, so find will then just print the filename.
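To see that the two forms end up reporting the same file, here is a minimal sketch. The scratch directory and the file names match.txt and other.txt are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical scratch files, created only for this demonstration.
demo=$(mktemp -d)
printf 'Samson was here\nsecond line\n' > "$demo/match.txt"
printf 'nothing to see\n' > "$demo/other.txt"

# -l: grep itself prints the name of each file that contains a match.
with_l=$(grep -il "samson" "$demo/match.txt" "$demo/other.txt")

# -q: grep prints nothing; find evaluates -print only when grep exits with status 0.
with_q=$(find "$demo" -type f -exec grep -iq "samson" {} \; -print)

echo "grep -l reported:        $with_l"
echo "find ... -print reported: $with_q"

rm -rf "$demo"
```

The -exec ... \; form starts one grep process per file, so on a large tree the xargs version will usually be faster.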
You've told grep itself to recurse (twice! -r and -R both mean "recursive"; in newer GNU grep they differ only in whether symbolic links are followed). Since one of the arguments you're passing is . (the top directory), grep is searching in every file (some of them twice, or even more if they're in subdirectories).
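A small reproduction makes this visible. The sketch below builds a throwaway tree (the names cache/hit.txt and src/main.txt are invented) and compares the original pipeline with one that hands grep regular files only.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # predictable sort order

# Hypothetical scratch tree, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/cache" "$demo/src"
echo "samson cached" > "$demo/cache/hit.txt"
echo "samson source" > "$demo/src/main.txt"
cd "$demo" || exit 1

# find prunes ./cache, but it still prints "." and "./src", which are directories.
listing=$(find . -path './cach*' -prune -o -print | sort)
printf 'find printed:\n%s\n' "$listing"

# With -r, grep starts again from "." and walks into ./cache on its own.
recursive_hits=$(find . -path './cach*' -prune -o -print | xargs grep -r -i -l "samson" | sort -u)

# Passing only regular files, without -r, keeps the cache out of the results.
plain_hits=$(find . -path './cach*' -prune -o -type f -print | xargs grep -i -l "samson" | sort -u)

printf 'with -r:\n%s\n' "$recursive_hits"
printf 'without -r:\n%s\n' "$plain_hits"

cd / && rm -rf "$demo"
```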
If you're going to use find and grep, do this:
find . -path './cach*' -prune -o -type f -print0 | xargs -0 grep -i "samson"
Using -print0 and -0 makes your script work even with file names that contain spaces or punctuation characters.
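As an illustration of what goes wrong without them, here is a sketch using a made-up file name that contains a space.

```shell
#!/bin/sh
# Hypothetical scratch directory with one awkwardly named file.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "samson" > "my notes.txt"

# Newline-delimited: xargs splits "./my notes.txt" into "./my" and "notes.txt",
# neither of which exists, so grep finds nothing (its errors are discarded here).
broken=$(find . -type f -print | xargs grep -il "samson" 2>/dev/null)

# NUL-delimited: the file name reaches grep as a single argument.
safe=$(find . -type f -print0 | xargs -0 grep -il "samson")

echo "without -print0/-0: '$broken'"
echo "with -print0/-0:    '$safe'"

cd / && rm -rf "$demo"
```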
However, you probably don't need to bother with find here, since reasonably recent versions of GNU grep are capable of excluding directories:
grep -R --exclude-dir='cach*' -i "samson" .
(This also excludes ./deeply/nested/directory/cache. If you only want to exclude cache directories at the toplevel, use find as you did.)
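The difference between the two approaches can be checked on a throwaway tree. This is a sketch that assumes a grep with --exclude-dir support; the directory and file names are invented.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # predictable sort order

# Hypothetical scratch tree with a top-level and a nested cache directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p cache deeply/nested/cache src
echo "samson" > cache/a.txt
echo "samson" > deeply/nested/cache/b.txt
echo "samson" > src/c.txt

# --exclude-dir matches directory names at any depth, so both caches are skipped.
grep_hits=$(grep -R --exclude-dir='cach*' -il "samson" . | sort)

# -path './cach*' only matches paths that start at the top level, so the nested cache is searched.
find_hits=$(find . -path './cach*' -prune -o -type f -print0 | xargs -0 grep -il "samson" | sort)

printf 'grep --exclude-dir:\n%s\n' "$grep_hits"
printf 'find -path -prune:\n%s\n' "$find_hits"

cd / && rm -rf "$demo"
```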
Use the -exec option on find instead of piping the results to another command. From there you can use grep "samson" {} \; to look for samson in each file listed.
For example:
find . -wholename './cach*' -prune -o -exec grep "samson" "{}" +
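The \; and + terminators behave differently, which shows up in grep's output. A minimal sketch with two made-up files, restricted to regular files with -type f so grep is never handed a directory:

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # predictable sort order

# Hypothetical scratch files, created only for this demonstration.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "samson one" > a.txt
echo "samson two" > b.txt

# \; runs grep once per file; with a single file argument grep omits the file name.
per_file=$(find . -type f -exec grep "samson" {} \; | sort)

# + passes many files to one grep invocation, so each match is prefixed with its file name.
batched=$(find . -type f -exec grep "samson" {} + | sort)

printf 'with \\; :\n%s\n' "$per_file"
printf 'with + :\n%s\n' "$batched"

cd / && rm -rf "$demo"
```

The + form is also the faster of the two on large trees, since it starts far fewer grep processes.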