
When is ** understood by file glob()?

Tags: shell, glob

In Java 7, sun.nio.fs.Globs and getPathMatcher() seem to understand the idiom ** as a way to match zero or more characters across directory boundaries (see the getPathMatcher javadoc).

I could swear some flavor of shell (zsh, bash, tcsh) with some appropriate option settings was giving me the same behavior at some point. But for the life of me I cannot remember how to enable this, and I'm even starting to doubt my memory that I had it working at some point... (Edit: zsh provides that behavior, but only for directories, i.e. "**.gz" doesn't match foo/bar/fubar.gz, but "**/*.gz" does).

In fact, looking at the documentation for various implementations of glob (e.g. POSIX glob(3), glob(7), and Perl's File::Glob), this behavior doesn't seem to be mentioned anywhere. One exception is Ruby's Dir.glob(), which explicitly handles **.

(The original question was: "Does anyone know how to enable this behavior in a unix shell (e.g. zsh)?", but now see the edited question below.)

As a bonus question: does anyone know how to search for '**' in Google?...


Edited question

In fact, it looks like that behavior is indeed accepted by my zsh shell (thanks for the responses asserting that fact and prompting me to look further). The reason I thought it wasn't comes from the following subtlety: "**.gz" won't match a <path>/<prefix>.gz, but "**/*.gz" will. Here is an example. Let's start with the following tree:

$ find . -type f | sort
./foo/a.gz
./foo/bar/fubar/abc.gz
./foo/bar/x.gz
./foo/bar/y.gz
./xyz.gz

"**.gz" doesn't match inside subdirectories; it matches only what "*.gz" would:

$ ls -1 **.gz
xyz.gz

whereas "**/*.gz" does:

$ ls -1 **/*.gz
foo/a.gz
foo/bar/fubar/abc.gz
foo/bar/x.gz
foo/bar/y.gz
xyz.gz

Now, compare this to the Java behavior:

@Test
public void testStar() {
    String pat = Globs.toUnixRegexPattern("*.gz");
    assertEquals("^[^/]*\\.gz$", pat);
}

@Test
public void testStarStar() {
    // '**' allows any number of directories on the path
    // this apparently is not POSIX, although darn useful
    String pat = Globs.toUnixRegexPattern("**.gz");
    assertEquals("^.*\\.gz$", pat);
}

Clearly (from the regexps), the "**" here matches any sequence of characters on the path (i.e. it becomes ".*" in the regexp), whether in a subdirectory or not, and whether as part of a filename or not.

(Disclaimer: Globs is a copy of sun.nio.fs.Globs.toUnixRegexPattern(String glob) because I needed something that works cross-platform).
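The same comparison can be made with the public API rather than a copy of the internal class. The following is a minimal sketch, assuming a Unix-like default file system; the class name GlobDemo and the helper matches() are made up for illustration, while FileSystems.getDefault().getPathMatcher() and the "glob:" syntax prefix are the standard java.nio.file API.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    // Hypothetical helper: true if the glob matches the whole (relative) path.
    static boolean matches(String glob, String path) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String[] globs = {"*.gz", "**.gz", "**/*.gz"};
        String[] paths = {"xyz.gz", "foo/a.gz", "foo/bar/fubar/abc.gz"};
        // Print one line per (glob, path) pair with the match result.
        for (String g : globs) {
            for (String p : paths) {
                System.out.printf("%-8s %-22s %b%n", g, p, matches(g, p));
            }
        }
    }
}
```

With these inputs, "*.gz" should match only xyz.gz, and "**.gz" should match all three paths. Note one difference from the zsh listing above: in Java, "**/*.gz" requires at least one literal slash, so it matches the two nested paths but not xyz.gz.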

Pierre D asked Jan 15 '23 19:01



1 Answer

In POSIX shell:

The slash character in a pathname shall be explicitly matched by using one or more slashes in the pattern; it shall neither be matched by the asterisk or question-mark special characters nor by a bracket expression

You could google: "filename expansion pattern".

In bash (4.0 and later) you could set the globstar option:

[An asterisk] Matches any string, including the null string. When the globstar shell option is enabled, and ‘*’ is used in a filename expansion context, two adjacent ‘*’s used as a single pattern will match all files and zero or more directories and subdirectories. If followed by a ‘/’, two adjacent ‘*’s will match only directories and subdirectories.

$ shopt -s globstar   # enable: ** also matches across subdirectories
$ ls **/
$ shopt -u globstar   # disable: ** behaves like a single *
$ ls **/

Note: the trailing '/' is used here so that only directories are matched.
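For comparison outside the shell, the recursive listing that globstar gives can be approximated in Java by walking a directory tree and filtering with a PathMatcher. This is only a sketch under stated assumptions: the class RecursiveGlob and the helpers sampleTree() and list() are hypothetical names, the tree is the one from the question recreated in a temporary directory, and the glob rules applied are Java's, not bash's.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveGlob {
    // Hypothetical helper: recreates the question's sample tree in a temp directory.
    static Path sampleTree() {
        try {
            Path root = Files.createTempDirectory("globdemo");
            for (String f : new String[] {"foo/a.gz", "foo/bar/fubar/abc.gz",
                                          "foo/bar/x.gz", "foo/bar/y.gz", "xyz.gz"}) {
                Path p = root.resolve(f);
                Files.createDirectories(p.getParent());
                Files.createFile(p);
            }
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: regular files under root whose path relative to
    // root matches the Java glob, returned as sorted '/'-separated strings.
    static List<String> list(Path root, String glob) {
        PathMatcher m = root.getFileSystem().getPathMatcher("glob:" + glob);
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(Files::isRegularFile)
                    .map(root::relativize)
                    .filter(m::matches)
                    .map(p -> p.toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = sampleTree();
        System.out.println(list(root, "*.gz"));   // top-level files only
        System.out.println(list(root, "**.gz"));  // files at any depth
    }
}
```

Here list(root, "**.gz") plays the role of "ls **/*.gz" with globstar enabled: it should return all five files, whereas list(root, "*.gz") should return only xyz.gz.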

jfs answered Feb 23 '23 16:02
