
glob pattern matching in .NET

Tags:

c#

.net

glob

People also ask

What is glob style matching?

Globs, also known as glob patterns, are wildcard patterns that expand into the list of pathnames matching them.

What is a glob regex?

Regex and glob patterns are both ways of matching patterns in strings. The main difference is scope and syntax: regular expressions are a general-purpose pattern language applied to arbitrary strings in code, while globs are a much simpler wildcard syntax that shells and file APIs use to match file names (not file contents). A glob is not a regular expression: in a glob, * means any sequence of characters and ? means any single character.

What is glob used for?

glob (short for global) is used to return all file paths that match a specific pattern, typically a file name pattern containing wildcard characters such as * and ?.

How do you implement globbing?

The obvious implementation of glob pattern matching against a single path element is to walk the pattern and the name together, matching letters or wildcards in the pattern to letters in the name. If the walk reaches the end of the pattern at the same time as the end of the name, they match.
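
The walk described above can be sketched in C# roughly as follows. This is a minimal illustration, not code from any of the answers on this page: the class name GlobSketch and the method name Matches are made up for the example, it only understands * and ?, and unlike the regex-based approach further down it is case-sensitive.

```csharp
using System;

// Hypothetical helper, written only to illustrate the "walk both strings" idea.
public static class GlobSketch
{
    // Returns true if name matches pattern, where '*' matches any run of
    // characters (including none) and '?' matches exactly one character.
    public static bool Matches(string pattern, string name)
    {
        int p = 0, n = 0;           // current positions in pattern and name
        int starP = -1, starN = 0;  // position of the last '*' and the name index it currently covers up to

        while (n < name.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Remember the star and first try letting it match nothing.
                starP = p++;
                starN = n;
            }
            else if (p < pattern.Length && (pattern[p] == '?' || pattern[p] == name[n]))
            {
                // Literal or single-character wildcard: advance both.
                p++;
                n++;
            }
            else if (starP >= 0)
            {
                // Mismatch: back up and let the last '*' swallow one more character.
                p = starP + 1;
                n = ++starN;
            }
            else
            {
                return false;
            }
        }

        // The name is exhausted; only trailing '*' may remain in the pattern.
        while (p < pattern.Length && pattern[p] == '*') p++;
        return p == pattern.Length;
    }
}
```

For example, GlobSketch.Matches("*.jpg", "photo.jpg") returns true, while GlobSketch.Matches("*.jpg", "photo.jpeg") returns false. Remembering only the most recent * keeps the walk simple and avoids recursion.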


I like my code a little more semantic, so I wrote this extension method:

using System.Text.RegularExpressions;

namespace Whatever
{
    public static class StringExtensions
    {
        /// <summary>
        /// Compares the string against a given pattern.
        /// </summary>
        /// <param name="str">The string.</param>
        /// <param name="pattern">The pattern to match, where "*" means any sequence of characters, and "?" means any single character.</param>
        /// <returns><c>true</c> if the string matches the given pattern; otherwise <c>false</c>.</returns>
        public static bool Like(this string str, string pattern)
        {
            return new Regex(
                "^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
                RegexOptions.IgnoreCase | RegexOptions.Singleline
            ).IsMatch(str);
        }
    }
}

(change the namespace and/or copy the extension method to your own string extensions class)

Using this extension, you can write statements like this:

if (file.Name.Like("*.jpg")) // file is assumed to be a FileInfo
{
   ....
}

Just sugar to make your code a little more legible :-)


Just for the sake of completeness: since 2016, .NET Core has had a NuGet package called Microsoft.Extensions.FileSystemGlobbing that supports advanced glob paths.

Typical examples are wildcard searches over nested folder structures and files, which are very common in web development scenarios:

  • wwwroot/app/**/*.module.js
  • wwwroot/app/**/*.js

This works somewhat like the patterns that .gitignore files use to determine which files to exclude from source control.
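
A rough sketch of how the package might be used with patterns like the ones above. It assumes the Microsoft.Extensions.FileSystemGlobbing NuGet package is referenced; the class name ScriptFinder, the method name FindAppScripts, and the choice to exclude the *.module.js files are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.FileSystemGlobbing;

// Hypothetical helper; only Matcher and its methods come from the package.
public static class ScriptFinder
{
    // Returns the full paths of all .js files under wwwroot/app (at any depth),
    // leaving out the *.module.js files.
    public static List<string> FindAppScripts(string projectRoot)
    {
        var matcher = new Matcher();

        // Patterns are relative to the directory the matcher is run against.
        matcher.AddInclude("wwwroot/app/**/*.js");
        matcher.AddExclude("wwwroot/app/**/*.module.js"); // illustrative exclusion

        // Walks projectRoot on disk and returns absolute paths of the matches.
        return matcher.GetResultsInFullPath(projectRoot).ToList();
    }
}
```

Includes and excludes are combined on one matcher, which is what makes it feel similar to a .gitignore file: the pattern set as a whole decides which files are selected.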


I found the actual code for you:

Regex.Escape( wildcardExpression ).Replace( @"\*", ".*" ).Replace( @"\?", "." );

The two- and three-argument overloads of the directory listing methods, such as Directory.GetFiles() and Directory.EnumerateDirectories(), take a search pattern as their second argument. That pattern supports filename globbing with both * and ?.

using System;
using System.IO;

class GlobTestMain
{
    static void Main(string[] args)
    {
        string[] exes = Directory.GetFiles(Environment.CurrentDirectory, "*.exe");
        foreach (string file in exes)
        {
            Console.WriteLine(Path.GetFileName(file));
        }
    }
}

would yield

GlobTest.exe
GlobTest.vshost.exe

The docs state that there are some caveats with matching extensions. They also state that 8.3 file names are matched (these may be generated automatically behind the scenes), which can result in "duplicate" matches for some patterns.

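The methods that support this are Directory.GetFiles(), Directory.GetDirectories(), and Directory.GetFileSystemEntries(). The corresponding Enumerate variants (EnumerateFiles(), EnumerateDirectories(), and EnumerateFileSystemEntries()) support it too.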
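
If the extension caveats matter (for example, the documentation describes a three-character pattern such as *.htm also picking up longer extensions such as .html on some runtimes), one option is to let the search pattern do the coarse filtering and then compare the extension exactly. A minimal sketch, with the class name StrictExtensionSearch and the method name FindByExtension invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper built on the framework's Directory.EnumerateFiles.
public static class StrictExtensionSearch
{
    // extension is expected to include the leading dot, e.g. ".exe".
    public static IEnumerable<string> FindByExtension(string root, string extension)
    {
        return Directory
            // Lazy, recursive enumeration using the built-in glob support.
            .EnumerateFiles(root, "*" + extension, SearchOption.AllDirectories)
            // Second pass: keep only files whose extension is exactly the one asked for.
            .Where(path => string.Equals(
                Path.GetExtension(path), extension, StringComparison.OrdinalIgnoreCase));
    }
}
```

Calling StrictExtensionSearch.FindByExtension(Environment.CurrentDirectory, ".exe") would then list the same kind of files as the GetFiles() example above, but including subdirectories and without returning any file whose real extension is longer than .exe.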