Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Checking if two patterns match one another?

This Leetcode problem is about how to match a pattern string against a text string as efficiently as possible. The pattern string can consists of letters, dots, and stars, where a letter only matches itself, a dot matches any individual character, and a star matches any number of copies of the preceding character. For example, the pattern

ab*c.

would match ace and abbbbcc. I know that it's possible to solve this original problem using dynamic programming.

My question is whether it's possible to see whether two patterns match one another. For example, the pattern

bdbaa.*

can match

bdb.*daa

Is there a nice algorithm for solving this pattern-on-pattern matching problem?

like image 657
codecrazer Avatar asked Jun 27 '17 08:06

codecrazer


1 Answers

Here's one approach that works in polynomial time. It's slightly heavyweight and there may be a more efficient solution, though.

The first observation that I think helps here is to reframe the problem. Rather than asking whether these patterns match each other, let's ask this equivalent question:

Given patterns P1 and P2, is there a string w where P1 and P2 each match w?

In other words, rather than trying to get the two patterns to match one another, we'll search for a string that each pattern matches.

You may have noticed that the sorts of patterns you're allowed to work with are a subset of the regular expressions. This is helpful, since there's a pretty elaborate theory of what you can do with regular expressions and their properties. So rather than taking aim at your original problem, let's solve this even more general one:

Given two regular expressions R1 and R2, is there a string w that both R1 and R2 match?

The reason for solving this more general problem is that it enables us to use the theory that's been developed around regular expressions. For example, in formal language theory we can talk about the language of a regular expression, which is the set of all strings that the regex matches. We can denote this L(R). If there's a string that's matched by two regexes R1 and R2, then that string belongs to both L(R1) and L(R2), so our question is equivalent to

Given two regexes R1 and R2, is there a string w in L(R1) ∩ L(R2)?

So far all we've done is reframe the problem we want to solve. Now let's go solve it.

The key step here is that it's possible to convert any regular expression into an NFA (a nondeterministic finite automaton) so that every string matched by the regex is accepted by the NFA and vice-versa. Even better, the resulting NFA can be constructed in polynomial time. So let's begin by constructing NFAs for each input regex.

Now that we have those NFAs, we want to answer this question: is there a string that both NFAs accept? And fortunately, there's a quick way to answer this. There's a common construction on NFAs called the product construction that, given two NFAs N1 and N2, constructs a new NFA N' that accepts all the strings accepted by both N1 and N2 and no other strings. Again, this construction runs in polynomial time.

Once we have N', we're basically done! All we have to do is run a breadth-first or depth-first search through the states of N' to see if we find an accepting state. If so, great! That means there's a string accepted by N', which means that there's a string accepted by both N1 and N2, which means that there's a string matched by both R1 and R2, so the answer to the original question is "yes!" And conversely, if we can't reach an accepting state, then the answer is "no, it's not possible."

I'm certain that there's a way to do all of this implicitly by doing some sort of implicit BFS over the automaton N' without actually constructing it, and it should be possible to do this in something like time O(n2). If I have some more time, I'll revisit this answer and expand on how to do that.

like image 128
templatetypedef Avatar answered Sep 27 '22 21:09

templatetypedef