Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

file name matching with wildcard

Tags:

I need to implement something like my own file system. One operation would be the FindFirstFile. I need to check, if the caller passed something like ., sample*.cpp or so. My "file system" implementation provides the list of "files names" as a array of char*.

Is there any Windows function or any source code that implements this file name matching?

like image 292
harper Avatar asked Jul 21 '10 14:07

harper


People also ask

How do you use wildcards in file names?

An asterisk is replaced by any number of characters in a filename. For example, ae* would match aegis, aerie, aeon, etc. if those files were in the same directory. You can use this to save typing for a single filename (for example, al* for alphabet.

What is a wildcard filename?

Wildcards (also referred to as meta characters) are symbols or special characters that represent other characters. You can use them with any command such as ls command or rm command to list or remove files matching a given criteria, receptively.

Which wildcard character matches any symbol in a filename?

The asterisk represents any number of unknown characters. Use it when searching for documents or files for which you have only partial names. For most web search engines, wildcards increase the number of your search results.

What is * wildcard in Linux with examples?

The wildcard '*' means it will match any number of characters or a set of characters. For example, S**n will match anything between S and n. The number of characters between them do not count. Example: Here, we can see in the result that files starting with 'A' and ending with 'f' are displayed.


3 Answers

For wildcard name matching using '*' and '?' try this (if you want to avoid boost, use std::tr1::regex):

#include <boost/regex.hpp>
#include <boost/algorithm/string/replace.hpp>

using std::string;

bool MatchTextWithWildcards(const string &text, string wildcardPattern, bool caseSensitive /*= true*/)
{
    // Escape all regex special chars
    EscapeRegex(wildcardPattern);

    // Convert chars '*?' back to their regex equivalents
    boost::replace_all(wildcardPattern, "\\?", ".");
    boost::replace_all(wildcardPattern, "\\*", ".*");

    boost::regex pattern(wildcardPattern, caseSensitive ? regex::normal : regex::icase);

    return regex_match(text, pattern);
}

void EscapeRegex(string &regex)
{
    boost::replace_all(regex, "\\", "\\\\");
    boost::replace_all(regex, "^", "\\^");
    boost::replace_all(regex, ".", "\\.");
    boost::replace_all(regex, "$", "\\$");
    boost::replace_all(regex, "|", "\\|");
    boost::replace_all(regex, "(", "\\(");
    boost::replace_all(regex, ")", "\\)");
    boost::replace_all(regex, "{", "\\{");
    boost::replace_all(regex, "{", "\\}");
    boost::replace_all(regex, "[", "\\[");
    boost::replace_all(regex, "]", "\\]");
    boost::replace_all(regex, "*", "\\*");
    boost::replace_all(regex, "+", "\\+");
    boost::replace_all(regex, "?", "\\?");
    boost::replace_all(regex, "/", "\\/");
}
like image 196
nabulke Avatar answered Nov 07 '22 00:11

nabulke


There are quite a few such functions around. Here's a directory of various implementations, sorted into recursive and non-recursive, etc.

In case you don't like the licensing there (or have trouble with the link, etc.) here's one possible implementation of a matching algorithm that at least closely approximates what Windows uses:

#include <string.h>
#include <iostream>

bool match(char const *needle, char const *haystack) {
    for (; *needle != '\0'; ++needle) {
        switch (*needle) {
        case '?': 
            if (*haystack == '\0')
                return false;
            ++haystack;
            break;
        case '*': {
            if (needle[1] == '\0')
                return true;
            size_t max = strlen(haystack);
            for (size_t i = 0; i < max; i++)
                if (match(needle + 1, haystack + i))
                    return true;
            return false;
        }
        default:
            if (*haystack != *needle)
                return false;
            ++haystack;
        }
    }
    return *haystack == '\0';
}

#ifdef TEST
#define CATCH_CONFIG_MAIN

#include "catch.hpp"

TEST_CASE("Matching", "[match]") {
    REQUIRE(match("a", "a") == true);
    REQUIRE(match("a", "b") == false);
    REQUIRE(match("a*", "a") == true);
    REQUIRE(match("a?", "a") == false);
    REQUIRE(match("a?", "ab") == true);
    REQUIRE(match("a*b", "ab") == true);
    REQUIRE(match("a*b", "acb") == true);
    REQUIRE(match("a*b", "abc") == false);
    REQUIRE(match("*a*??????a?????????a???????????????", 
        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa") == true);
}

#endif

Since there was a discussion of complexity of some of the other answers, I'll note that I believe this has O(NM) complexity and O(M) storage use (where N is the size of the target string, and M is the size of the pattern).

With @masterxilo's test pair:

"*a*??????*a*?????????a???????????????", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

...this finds a match in approximately 3 microseconds on my machine. That is a lot slower than a typical pattern--most of my other tests run in about 300 nanoseconds or so on this particular machine.

At the same time, @masterxilo's code takes approximately 11 microseconds to run on the same machine, so this is still around 3 to 4 times faster (not to mention being somewhat smaller and simpler).

like image 23
Jerry Coffin Avatar answered Nov 07 '22 01:11

Jerry Coffin


Have a look at the POSIX functions fnmatch, glob, and wordexp.

like image 22
Phil Miller Avatar answered Nov 07 '22 00:11

Phil Miller