Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pattern matching on a string that already has wildcards in it

Tags:

c#

regex

wildcard

I have a scenario where I would like to search using a wildcarded pattern ontop of a string that already has wildcards in it. In my words, i would say it is a 2 way pattern matching requirement.

The input and pattern string can have either/both of the following wildcards - ? representing a single character and % representing zero or more characters. Assume these are the only 2 wildcards allowed in input and the pattern strings.

For ex:

bool IsMatch(string input, string pattern) //Should return True if the input string matches the pattern, should return False otherwise.

IsMatch("XYZ%", "?Y%") // Should return True

IsMatch("YY?", "?Y%") // Should return True - Last character in input string expects a single character where as the pattern matches to zero or more characters after Y (which means it includes a single character match as well)

IsMatch("X123", "?Y%") // Should return False - Missing Y in the input string which the pattern expects

IsMatch("?Y%", "?Y%")// Should return True

IsMatch("%", "?Y%")// Should return True - The input string has a wildcard % representing zero or more characters and can also have any character(s). In a way, it acts as a pattern in itself representing anything of any size.

I'm able to find articles (ex: Regex) that only talk about performing a wildcarded pattern match on a non-wildcarded string. I'm looking for pointers/ideas on the algorithm as it is getting difficult for me to come up with an algorithm that can do this kind of match, as I start putting it down. Appreciate your inputs.

like image 605
PKumar Avatar asked Nov 07 '13 18:11

PKumar


1 Answers

As I wrote in my comment for the most general cases you'd have to create the minimal deterministic finite automaton of the two expressions and compare the two automatons. Having said that there may be a bruteforce/poorman's solution to your question.

Based on your examples it sounds like you're interested in seeing if one of input/pattern matches all the strings generated by the other.

IsMatch("XYZ%", "?Y%") // returns true because ?Y% matches a superset of strings matched by "XYZ%"
IsMatch("%", "?Y%") // returns true because "%" matches a superset of "?Y%"

You can check if input indeed matches a subset of strings generated by pattern as long as

  • you really can limit yourselves to % and ? operators as specified
  • your input/pattern strings are reasonably short - more specifically the occurences of % in either input or pattern are less than about 20 or so.

The basic idea is you generate a list of representative strings for input and match each one with pattern using your favorite regex engine. If all the representatives match - input matches a subset of pattern. This algorithm for IsSubset can be described as follows

let c = some character not in `pattern` (lexically speaking)
let searchString = replace all occurences of '?' in input with c
add searchString to setOfSearchStrings
foreach occurence of '%' in input
    foreach str in setOfSearchStrings
        replace str with two strings - {str with c in place of '%', str without the '%'}

foreach str in setOfSearchStrings
   if str doesn't "regex" match with pattern 
       return false

return true

for example if input is ?X%YZ% and the pattern doesn't contain the character A the list generated would be
AXYZ
AXYZA
AXAYZ
AXAYZA

It's easy to see that the number of strings in this list is 2^n where n is the number of '%' in input.

Also it's easy to swap the order of arguments and figure out the relationship the other way round. So in effect your

IsMatch(input,pattern) = IsSubset(input,pattern) || IsSubset(pattern,input)
like image 114
hawk Avatar answered Nov 14 '22 22:11

hawk