
Recursively matching filenames with glob argument

I have been trying to get a list of files matching a glob pattern in a command line argument (sys.argv[1]) recursively using glob.glob and os.walk. The problem is, bash (and many other shells it seems) auto-expand glob patterns into filenames.

How do standard unix programs (e.g. grep -R) do this then? I realize they're not in python, but if this is happening at the shell level, that shouldn't matter, right? Is there a way for a script to tell the shell to not auto-expand glob patterns? It looks like set -f will disable globbing, but I'm not sure how to run this early enough, so to speak.

I've seen "Use a Glob() to find files recursively in Python?", but that doesn't cover actually getting the glob patterns from command line arguments.

Thanks!

Edit:

The grep-like perl script ack accepts a perl regex as one of its arguments. Thus, ack .* prints out every line of every file. But .* should expand to all hidden files in a directory. I tried reading the script but I don't know perl; how can it do this?

Bryan Head, asked Feb 24 '23 02:02


2 Answers

The shell performs glob expansion before it even thinks of invoking the command. Programs such as grep don't do anything to prevent globbing: they can't. You, as the caller of these programs, must tell the shell that you want to pass the special characters such as * and ? to the program, and not let the shell interpret them. You do that by putting them inside quotes:

grep -E 'ba(na)* split' *.txt

(This looks for ba split, bana split, etc., in all files called <something>.txt.) In this case, either single quotes or double quotes will do the trick. Between single quotes, the shell expands nothing. Between double quotes, $, ` and \ are still interpreted. You can also protect a single character from shell expansion by preceding it with a backslash. It's not only wildcard characters that need to be protected; for example, above, the space in the pattern is in quotes, so it's part of the argument to grep and not an argument separator. Alternative ways to write the snippet above include

grep -E "ba(na)* split" *.txt
grep -E ba\(na\)\*\ split *.txt
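To see what the program receives in each of these spellings, here is a minimal Python sketch. It relies on the standard library's shlex.split, which applies POSIX-style quoting rules but does not expand wildcards, so it only approximates the quoting step of a real shell. The helper name split_like_shell is made up for illustration.

```python
import shlex

# The three spellings of the same command line discussed above.
commands = [
    "grep -E 'ba(na)* split' *.txt",
    'grep -E "ba(na)* split" *.txt',
    r"grep -E ba\(na\)\*\ split *.txt",
]


def split_like_shell(command):
    """Split a command line into words using POSIX quoting rules (no globbing)."""
    return shlex.split(command)


for command in commands:
    # Each spelling yields the same words; the pattern arrives as one argument.
    print(split_like_shell(command))
```

All three spellings come out as the same four words, with ba(na)* split as a single argument. The *.txt stays literal here only because shlex does not glob; a real shell would go on to expand it.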

With most shells, if an argument contains wildcards but the pattern doesn't match any file, the pattern is left unchanged and passed to the underlying command. So a command like

grep b[an]*a *.txt

has a different effect depending on what files are present on the system. If the current directory doesn't contain any file whose name matches b[an]*a (that is, begins with b followed by a or n, and ends with a), the command searches for the pattern b[an]*a in the files whose name matches *.txt. If the current directory contains files named baclava, bnma and hello.txt, the command expands to grep baclava bnma hello.txt, so it searches for the pattern baclava in the two files bnma and hello.txt. Needless to say, it's a bad idea to rely on this in scripts; on the command line it can occasionally save typing, but it's risky.
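The "expand if something matches, otherwise pass through unchanged" rule can be imitated in a few lines of Python. This is only a sketch under assumptions: it uses fnmatch.fnmatchcase as a stand-in for the shell's matcher (which differs in details such as the handling of leading dots), and the function name expand_like_shell and the file names are invented for the example.

```python
import fnmatch


def expand_like_shell(args, directory_listing):
    """Replace each argument by the sorted names it matches, or keep it if none match."""
    expanded = []
    for arg in args:
        matches = sorted(
            name for name in directory_listing if fnmatch.fnmatchcase(name, arg)
        )
        if matches:
            expanded.extend(matches)
        else:
            # No file matches: the argument is passed through unchanged.
            expanded.append(arg)
    return expanded


command = ["grep", "b[an]*a", "*.txt"]

# Two file names happen to match the unquoted pattern, so it disappears.
print(expand_like_shell(command, ["banana", "bandana", "hello.txt"]))

# Nothing matches the pattern, so it reaches grep as written.
print(expand_like_shell(command, ["hello.txt"]))
```

With the first listing, grep would end up searching for banana in bandana and hello.txt; with the second, it searches for b[an]*a in hello.txt.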

When you run ack .* in a directory containing no dot file, the shell runs ack . .. (in shells where .* also matches the entries . and ..). The behavior of the ack command is then to print out all non-empty lines (the pattern . matches any one character) in all files under .. (the parent of the current directory), recursively. Contrast with ack '.*', which searches for the pattern .* (which matches anything) in the current directory and its subdirectories (because ack searches the current directory when you don't pass any filename argument).
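The difference between the two regexes can be checked quickly in Python, whose re module treats . and .* the same way Perl does for this simple case. This is just an illustration on made-up sample lines, not a model of how ack itself is implemented.

```python
import re

# Hypothetical file content: two non-empty lines around an empty one.
lines = ["first line", "", "third line"]


def matching_lines(pattern, lines):
    """Return the lines in which the regex matches somewhere."""
    return [line for line in lines if re.search(pattern, line)]


# "." needs at least one character, so the empty line is skipped.
print(matching_lines(".", lines))

# ".*" also matches the empty string, so every line is kept.
print(matching_lines(".*", lines))
```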

Gilles 'SO- stop being evil', answered Feb 26 '23 22:02


When it comes to grep, it simply accepts a list of filenames and doesn't do the glob expansion itself. If you really need to pass a pattern as an argument, it has to be quoted on the command line with single quotes. But before you do that, consider letting the shell do the job it was designed for.
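If you do go the quoted-pattern route, a script along these lines could take the pattern from sys.argv[1] and match it recursively. It is a minimal sketch, assuming the pattern is meant to be matched against file names only (not whole paths); the function name find_matching is made up, and os.walk plus fnmatch is one of several ways to do this.

```python
import fnmatch
import os
import sys


def find_matching(root, pattern):
    """Return sorted paths under root whose file name matches the glob pattern."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatchcase(name, pattern):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The pattern must arrive unexpanded, so the caller has to quote it.
    for path in find_matching(".", sys.argv[1]):
        print(path)
```

Saved under a hypothetical name such as find_files.py, it would be called as python find_files.py '*.txt'. Without the quotes, the shell may replace the pattern with matching names from the current directory before Python ever sees it.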

Adam Byrtek, answered Feb 26 '23 21:02