Given two strings with * wildcards, I would like to know if a string could be created that would match both.
For example, these two are a simple case of overlap:
But so are all of these:
Is there an algorithm published for doing this? Or perhaps a utility function in Windows or a library I might be able to call or copy?
You can solve this in time linear in the sum of the pattern lengths:
If both strings start or end with non-wildcards, check that they match until one pattern hits a wildcard (otherwise they don't match). This reduces the problem to the case where at least one pattern starts with a wildcard and at least one pattern ends with a wildcard. If both patterns have wildcards (somewhere), then they have to match:
Otherwise, one string (p1) has no wildcards, and the other string (p2) has strings s1,s2,...punctuated with wildcards. So just search for the first occurrence of s1 in p1, then for the first subsequent occurrence of s2 (starting from the end of the match in p1), etc. If you find all of the strings, then the patterns match, otherwise they don't.
For what it's worth, here's one implementation of the algorithm from sepp2k's answer in C# (I used explicit return true;
and return false;
calls, along with comments, for algorithm readability):
public static bool WildcardIntersect(string w1, string w2)
{
// if both are empty or contain wildcards
if ((string.IsNullOrEmpty(w1) || w1 == "*")
&& (string.IsNullOrEmpty(w2) || w2 == "*"))
return true;
// if either string is empty, return false
// we can do this because we know the other string MUST be non-empty and non-wildcard
if (string.IsNullOrEmpty(w1) || string.IsNullOrEmpty(w2))
return false;
char c1 = w1[0], // first character of wildcard string 1
c2 = w2[0]; // first character of wildcard string 2
string remain1 = w1.Substring(1), // remaining of wildcard string 1
remain2 = w2.Substring(1); // remaining of wildcard string 2
// if first letters match and remaining intersect
if ((c1 == c2 && WildcardIntersect(remain1, remain2))
// if either is a wildcard and either remaining intersects with the other whole
|| ((c1 == '*' || c2 == '*') && (WildcardIntersect(w1, remain2) || WildcardIntersect(remain1, w2))))
return true;
// else, no match, return false
return false;
}
Since every glob can be written as a regular expression and the intersection of two regular expressions can be found (unless they aren't really regular, but they would be in this case), you can find the intersection of two globs by transforming them into regular expressions and then finding the intersection of those. So you can find out whether two globs intersect by finding the intersection of the regular expressions and checking whether it's empty.
However since globs are more limited than regular expression, there is a much easier way:
Let's call the two globs g1 and g2. They intersect iff
An example implementation in haskell:
intersect g1 [] = all (== '*') g1
intersect [] g2 = all (== '*') g2
intersect g1@('*':t1) g2@(c2:t2) = intersect g1 t2 || intersect t1 g2
intersect g1@(c1:t1) g2@('*':t2) = intersect t1 g2 || intersect g1 t2
intersect (c1:t1) (c2:t2) = c1 == c2 && intersect t1 t2
This algorithm isn't particular efficient if the globs contain a lot of wildcards, but it's very easy to implement and since you're likely planning to use this with filenames, I doubt you'll have globs longer than 1000 chars.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With