I was recently in an interview and they asked me the following question:
Write a function to return true if a string matches a pattern, false otherwise
Pattern: 1 character per item, (a-z), input: space delimited string
This was my solution for the first problem:
static boolean isMatch(String pattern, String input) {
char[] letters = pattern.toCharArray();
String[] split = input.split("\\s+");
if (letters.length != split.length) {
// early return - not possible to match if lengths aren't equal
return false;
}
Map<String, Character> map = new HashMap<>();
// aaaa test test test1 test1
boolean used[] = new boolean[26];
for (int i = 0; i < letters.length; i++) {
Character existing = map.get(split[i]);
if (existing == null) {
// put into map if not found yet
if (used[(int)(letters[i] - 'a')]) {
return false;
}
used[(int)(letters[i] - 'a')] = true;
map.put(split[i], letters[i]);
} else {
// doesn't match - return false
if (existing != letters[i]) {
return false;
}
}
}
return true;
}
public static void main(String[] argv) {
System.out.println(isMatch("aba", "blue green blue"));
System.out.println(isMatch("aba", "blue green green"));
}
The next part of the problem stumped me:
With no delimiters in the input, write the same function.
eg:
isMatch("aba", "bluegreenblue") -> true
isMatch("abc","bluegreenyellow") -> true
isMatch("aba", "t1t2t1") -> true
isMatch("aba", "t1t1t1") -> false
isMatch("aba", "t1t11t1") -> true
isMatch("abab", "t1t2t1t2") -> true
isMatch("abcdefg", "ieqfkvu") -> true
isMatch("abcdefg", "bluegreenredyellowpurplesilvergold") -> true
isMatch("ababac", "bluegreenbluegreenbluewhite") -> true
isMatch("abdefghijklmnopqrstuvwxyz", "zyxwvutsrqponmlkjihgfedcba") -> true
I wrote a bruteforce solution (generating all possible splits of the input string of size letters.length
and checking in turn against isMatch
) but the interviewer said it wasn't optimal.
I have no idea how to solve this part of the problem, is this even possible or am I missing something?
They were looking for something with a time complexity of O(M x N ^ C)
, where M is the length of the pattern and N is the length of the input, C is some constant.
Clarifications
Coding interviews often tend to ask a pattern program to test the candidates. Usually, in an interview process lasting four rounds, the first round tends to be one programming round, and there is a probability that one of the questions asked could be a pattern program.
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern.
For example, x* matches any number of x characters, [0-9]* matches any number of digits, and . * matches any number of anything. A regular expression pattern match succeeds if the pattern matches anywhere in the value being tested.
Pattern matching is used to determine whether source files of high-level languages are syntactically correct. It is also used to find and replace a matching pattern in a text or code with another text/code. Any application that supports search functionality uses pattern matching in one way or another.
It is possible to optimize a backtracking solution. Instead of generating all splits first and then checking that it is a valid one, we can check it "on fly". Let's assume that we have already split a prefix(with length p
) of the initial string and have matched i
characters from the pattern. Let's take look at the i + 1
character.
If there is a string in the prefix that corresponds to the i + 1
letter, we should just check that a substring that starts at the position p + 1
is equal to it. If it is, we just proceed to i + 1
and p + the length of this string
. Otherwise, we can kill this branch.
If there is no such string, we should try all substrings that start in the position p + 1
and end somewhere after it.
We can also use the following idea to reduce the number of branches in your solution: we can estimate the length of the suffix of the pattern which has not been processed yet(we know the length for the letters that already stand for some strings, and we know a trivial lower bound of the length of a string for any letter in the pattern(it is 1)). It allows us to kill a branch if the remaining part of the initial string is too short to match a the rest of the pattern.
This solution still has an exponential time complexity, but it can work much faster than generating all splits because invalid solutions can be thrown away much earlier, so the number of reachable states can reduce significantly.
I feel like this is cheating, and I'm not convinced the capture group and reluctant quantifier will do the right thing. Or maybe they're looking to see if you can recognize that, because of how quantifiers work, matching is ambiguous.
boolean matches(String s, String pattern) {
StringBuilder patternBuilder = new StringBuilder();
Map<Character, Integer> backreferences = new HashMap<>();
int nextBackreference = 1;
for (int i = 0; i < pattern.length(); i++) {
char c = pattern.charAt(i);
if (!backreferences.containsKey(c)) {
backreferences.put(c, nextBackreference++);
patternBuilder.append("(.*?)");
} else {
patternBuilder.append('\\').append(backreferences.get(c));
}
}
return s.matches(patternBuilder.toString());
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With