I came across this problem that asks you to implement a regular expression matcher with support for '.' and '*', where
'.' Matches any single character.
'*' Matches zero or more of the preceding element.
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true
While I was able to solve this in a linear fashion, I came across lots of solutions that used DP, like the one below,
class Solution {
public boolean isMatch(String text, String pattern) {
boolean[][] dp = new boolean[text.length() + 1][pattern.length() + 1];
dp[text.length()][pattern.length()] = true;
for (int i = text.length(); i >= 0; i--){
for (int j = pattern.length() - 1; j >= 0; j--){
boolean first_match = (i < text.length() &&
(pattern.charAt(j) == text.charAt(i) ||
pattern.charAt(j) == '.'));
if (j + 1 < pattern.length() && pattern.charAt(j+1) == '*'){
dp[i][j] = dp[i][j+2] || first_match && dp[i+1][j];
} else {
dp[i][j] = first_match && dp[i+1][j+1];
}
}
}
return dp[0][0];
}
}
I'm having a hard time understanding this. I've solved a few DP problems that involved grids (shortest path in 2d grid, largest square in binary 2d grid), using a DP table there made perfect sense to me. However here I'm completely lost, I'm unable to understand how traversing a 2d table helps in solving this problem. Further more it appears we know when the characters don't match in the loop, so I don't understand why we don't terminate the search there (this is also probably due to my lack of understanding of how a table traversal leads to a solution). Is there a clear intuitive explanation for problems like these?
A regex pattern matches a target string. The pattern is composed of a sequence of atoms. An atom is a single point within the regex pattern which it tries to match to the target string. The simplest atom is a literal, but grouping parts of the pattern to match an atom will require using ( ) as metacharacters.
To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).
$ means "Match the end of the string" (the position after the last character in the string).
A regular expression (RE) is a language for specifying text search strings. RE helps us to match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are used to search texts in UNIX as well as in MS WORD in identical way.
The intuition to solve problem like this using DP is to figure out answer to following questions
Lets first understand the problem's solution which you would have figured out while solving in linear fashion.
While matching the text with pattern either first character will match or it will not match.
Case 1: First character matches or first character of pattern is '.'
Case 1.1 Next character is '*'
Case 1.2 Next character is not '*'
Case 2: First character does not match
case 2.1 Next character is '*'
case 2.2 Next character is not '*'
Lets now figure out two steps discussed earlier for above problem.
For case 1.1 examples are
isMatch("a", "a*a")
and isMatch("aab", "a*b")
which is equivalent to solving
isMatch("a", "a") || isMatch("", "a*a")
and
isMatch("aab", "b") || isMatch("ab", "a*b")
respectively. First part of ||
condition covers the scenario of optional character in pattern as it is followed by '*' and second part covers the scenario for repetition of matched character.
Similarly sub-problems can be formed for all the cases. case 2.2 should straightaway return false.
Below is the java code with recursive approach
public boolean isMatch(String text, String pattern) {
dp = new Boolean[text.length()][pattern.length()];
return isMatch(text, pattern, 0, 0);
}
private boolean isMatch(String text, String pattern, int ti, int pi) {
if (pi == pattern.length() && ti < text.length()) return false;
if (ti == text.length() && pi == pattern.length()) return true;
if (ti == text.length()) {
return isNextCharStar(pattern, pi) && isMatch(text, pattern, ti, pi + 2);
}
boolean result;
final boolean hasFirstMatched = text.charAt(ti) == pattern.charAt(pi) || pattern.charAt(pi) == '.';
if (isNextCharStar(pattern, pi)) {
result = isMatch(text, pattern, ti, pi + 2);
if (hasFirstMatched) {
result = result || isMatch(text, pattern, ti + 1, pi);
}
return result;
}
return hasFirstMatched && isMatch(text, pattern, ti + 1, pi + 1);
}
private boolean isNextCharStar(String pattern, int pi) {
return pi < pattern.length() - 1 && pattern.charAt(pi + 1) == '*';
}
Now lets apply the memoization step. If we start saving the result before returning we can reuse it next time. How and where should we save it?
For all possible combinations of ti
and pi
we have to store the result. Where ti
is text Index and pi
is pattern index. So a 2d array of size text.length * pattern.length
should be sufficient. Following is the code after above changes
Boolean [][] dp;
public boolean isMatch(String text, String pattern) {
dp = new Boolean[text.length()][pattern.length()];
return isMatch(text, pattern, 0, 0);
}
private boolean isMatch(String text, String pattern, int ti, int pi) {
if (pi == pattern.length() ) return ti == text.length();
if (ti == text.length()) {
return isNextCharStar(pattern, pi) && isMatch(text, pattern, ti, pi + 2);
}
if(dp[ti][pi] != null) return dp[ti][pi];
boolean result;
final boolean hasFirstMatched = text.charAt(ti) == pattern.charAt(pi) || pattern.charAt(pi) == '.';
if (isNextCharStar(pattern, pi)) {
result = isMatch(text, pattern, ti, pi + 2);
if (hasFirstMatched) {
result = result || isMatch(text, pattern, ti + 1, pi);
}
dp[ti][pi] = result;
return result;
}
dp[ti][pi] = hasFirstMatched && isMatch(text, pattern, ti + 1, pi + 1);
return dp[ti][pi];
}
private boolean isNextCharStar(String pattern, int pi) {
return pi < pattern.length() - 1 && pattern.charAt(pi + 1) == '*';
}
If you will look closely only 3 lines have been changed to make it a DP solution from a recursive solution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With