I have a scenario where I would like to search using a wildcarded pattern ontop of a string that already has wildcards in it. In my words, i would say it is a 2 way pattern matching requirement.
The input and pattern string can have either/both of the following wildcards - ? representing a single character and % representing zero or more characters. Assume these are the only 2 wildcards allowed in input and the pattern strings.
For ex:
bool IsMatch(string input, string pattern) //Should return True if the input string matches the pattern, should return False otherwise.
IsMatch("XYZ%", "?Y%") // Should return True
IsMatch("YY?", "?Y%") // Should return True - Last character in input string expects a single character where as the pattern matches to zero or more characters after Y (which means it includes a single character match as well)
IsMatch("X123", "?Y%") // Should return False - Missing Y in the input string which the pattern expects
IsMatch("?Y%", "?Y%")// Should return True
IsMatch("%", "?Y%")// Should return True - The input string has a wildcard % representing zero or more characters and can also have any character(s). In a way, it acts as a pattern in itself representing anything of any size.
I'm able to find articles (ex: Regex) that only talk about performing a wildcarded pattern match on a non-wildcarded string. I'm looking for pointers/ideas on the algorithm as it is getting difficult for me to come up with an algorithm that can do this kind of match, as I start putting it down. Appreciate your inputs.
As I wrote in my comment for the most general cases you'd have to create the minimal deterministic finite automaton of the two expressions and compare the two automatons. Having said that there may be a bruteforce/poorman's solution to your question.
Based on your examples it sounds like you're interested in seeing if one of input/pattern matches all the strings generated by the other.
IsMatch("XYZ%", "?Y%") // returns true because ?Y% matches a superset of strings matched by "XYZ%"
IsMatch("%", "?Y%") // returns true because "%" matches a superset of "?Y%"
You can check if input
indeed matches a subset of strings generated by pattern
as long as
The basic idea is you generate a list of representative strings for input
and match each one with pattern using your favorite regex engine. If all the representatives match - input
matches a subset of pattern
. This algorithm for IsSubset
can be described as follows
let c = some character not in `pattern` (lexically speaking)
let searchString = replace all occurences of '?' in input with c
add searchString to setOfSearchStrings
foreach occurence of '%' in input
foreach str in setOfSearchStrings
replace str with two strings - {str with c in place of '%', str without the '%'}
foreach str in setOfSearchStrings
if str doesn't "regex" match with pattern
return false
return true
for example if input is ?X%YZ% and the pattern
doesn't contain the character A the list generated would be
AXYZ
AXYZA
AXAYZ
AXAYZA
It's easy to see that the number of strings in this list is 2^n where n is the number of '%' in input.
Also it's easy to swap the order of arguments and figure out the relationship the other way round. So in effect your
IsMatch(input,pattern) = IsSubset(input,pattern) || IsSubset(pattern,input)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With