I have been trying to get a list of files matching a glob pattern in a command line argument (sys.argv[1]) recursively using glob.glob and os.walk. The problem is, bash (and many other shells it seems) auto-expand glob patterns into filenames.
How do standard Unix programs (e.g. grep -R) do this then? I realize they're not in Python, but if this is happening at the shell level, that shouldn't matter, right? Is there a way for a script to tell the shell to not auto-expand glob patterns? It looks like set -f will disable globbing, but I'm not sure how to run this early enough, so to speak.
I've seen "Use a Glob() to find files recursively in Python?", but that doesn't cover actually getting the glob patterns from command line arguments.
Thanks!
Edit:
The grep-like Perl script ack accepts a Perl regex as one of its arguments. Thus, ack .* prints out every line of every file. But .* should expand to all hidden files in a directory. I tried reading the script but I don't know Perl; how can it do this?
The shell performs glob expansion before it even thinks of invoking the command. Programs such as grep don't do anything to prevent globbing: they can't. You, as the caller of these programs, must tell the shell that you want to pass the special characters such as * and ? to the program, and not let the shell interpret them. You do that by putting them inside quotes:
grep -E 'ba(na)* split' *.txt
(look for ba split, bana split, etc., in all files called <something>.txt)
In this case, either single quotes or double quotes will do the trick. Between single quotes, the shell expands nothing. Between double quotes, $, ` and \ are still interpreted. You can also protect a single character from shell expansion by preceding it with a backslash. It's not only wildcard characters that need to be protected; for example, above, the space in the pattern is in quotes so it's part of the argument to grep and not an argument separator. Alternative ways to write the snippet above include:
grep -E "ba(na)* split" *.txt
grep -E ba\(na\)\*\ split *.txt
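To check that the three spellings really hand grep the same arguments, here is a minimal Python sketch. It assumes that shlex.split is a good enough stand-in for the shell's quote handling and word splitting; it performs no glob expansion, so *.txt stays as typed, whereas a real shell would go on to expand it. The helper name split_like_shell is made up for this illustration.

```python
import shlex

# The three spellings of the command line shown above.
commands = [
    r"grep -E 'ba(na)* split' *.txt",
    r'grep -E "ba(na)* split" *.txt',
    r"grep -E ba\(na\)\*\ split *.txt",
]

def split_like_shell(command_line):
    """Approximate POSIX quote removal and word splitting (no globbing)."""
    return shlex.split(command_line)

for command in commands:
    # Each line prints ['grep', '-E', 'ba(na)* split', '*.txt']
    print(split_like_shell(command))
```

In all three cases the pattern arrives as one argument with the space, parentheses and star intact, which is what the quoting is for.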
With most shells, if an argument contains wildcards but the pattern doesn't match any file, the pattern is left unchanged and passed to the underlying command. So a command like
grep b[an]*a *.txt
has a different effect depending on what files are present on the system. If the current directory doesn't contain any file whose name matches b[an]*a, the command searches the pattern b[an]*a in the files whose name matches *.txt. If the current directory contains files named baclava, banana and hello.txt, the command expands to grep baclava banana hello.txt, so it searches the pattern baclava in the two files banana and hello.txt. Needless to say, it's a bad idea to rely on this in scripts; on the command line it can occasionally save typing, but it's risky.
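The "leave the pattern alone if nothing matches" rule can be imitated in a few lines of Python. This is only a rough sketch of the default behavior of Bourne-style shells, not how any shell is actually implemented; expand_word and expand_command are hypothetical helper names, and fnmatch is assumed to be close enough to shell pattern matching for this simple pattern.

```python
import fnmatch
import os
import tempfile

def expand_word(word, directory):
    """Return the sorted matches for word, or the word itself if nothing matches."""
    matches = sorted(
        name for name in os.listdir(directory)
        if fnmatch.fnmatchcase(name, word)
        # Like the shell, don't let a wildcard match a leading dot implicitly.
        and (word.startswith(".") or not name.startswith("."))
    )
    return matches or [word]

def expand_command(words, directory):
    """Expand every word of a command line the way the shell would before exec."""
    expanded = []
    for word in words:
        expanded.extend(expand_word(word, directory))
    return expanded

with tempfile.TemporaryDirectory() as directory:
    open(os.path.join(directory, "hello.txt"), "w").close()
    # Nothing matches b[an]*a, so the pattern reaches grep unchanged.
    print(expand_command(["grep", "b[an]*a", "*.txt"], directory))
    # ['grep', 'b[an]*a', 'hello.txt']

    open(os.path.join(directory, "baclava"), "w").close()
    # Now the pattern matches a file, and grep receives the file name instead.
    print(expand_command(["grep", "b[an]*a", "*.txt"], directory))
    # ['grep', 'baclava', 'hello.txt']
```

The same command line produces two different argument lists depending only on the directory contents, which is why the pattern should be quoted.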
When you run ack .* in a directory containing no dot file, the shell runs ack . .. (in many shells, .* also matches the special entries . and ..). The behavior of the ack command is then to print out all non-empty lines (the pattern . matches any one character) in all files under .. (the parent of the current directory) recursively. Contrast with ack '.*', which searches the pattern .* (which matches anything) in the current directory and its subdirectories (due to the behavior of ack when you don't pass any filename argument).
When it comes to grep, it simply accepts a list of filenames and doesn't do the glob expansion itself. If you really need to pass a pattern as an argument, it has to be quoted on the command line, for instance with single quotes. But before you do that, consider letting the shell do the job it was designed for.
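For the Python case in the question, once the caller quotes the pattern, the script receives it untouched in sys.argv[1] and can do the recursive matching itself. Here is a minimal sketch, assuming the pattern should be matched against base names only; the script name findglob.py and the function find_matching are invented for the example.

```python
import fnmatch
import os
import sys

def find_matching(pattern, root="."):
    """Yield paths under root whose base name matches the glob pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Unlike the shell, fnmatch lets * match names with a leading dot.
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        for path in find_matching(sys.argv[1]):
            print(path)
    else:
        print("usage: findglob.py 'PATTERN'", file=sys.stderr)
```

Called as python findglob.py '*.txt', the script sees the literal string *.txt. Without the quotes, the shell would replace the pattern with the matching names in the current directory (if there are any) before Python even starts, and the script has no way to recover the original pattern.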