Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pattern matching interview Q

Tags:

I was recently in an interview and they asked me the following question:

Write a function to return true if a string matches a pattern, false otherwise

Pattern: 1 character per item, (a-z), input: space delimited string

This was my solution for the first problem:

static boolean isMatch(String pattern, String input) {
    char[] letters = pattern.toCharArray();
    String[] split = input.split("\\s+");

    if (letters.length != split.length) {
        // early return - not possible to match if lengths aren't equal
        return false;
    }

    Map<String, Character> map = new HashMap<>();
    // aaaa test test test1 test1
    boolean used[] = new boolean[26];
    for (int i = 0; i < letters.length; i++) {
        Character existing = map.get(split[i]);
        if (existing == null) {
            // put into map if not found yet
            if (used[(int)(letters[i] - 'a')]) {
                return false;
            }

            used[(int)(letters[i] - 'a')] = true;
            map.put(split[i], letters[i]);
        } else {
            // doesn't match - return false
            if (existing != letters[i]) {
                return false;
            }
        }
    }

    return true;
}

public static void main(String[] argv) {
    System.out.println(isMatch("aba", "blue green blue"));
    System.out.println(isMatch("aba", "blue green green"));
}

The next part of the problem stumped me:

With no delimiters in the input, write the same function.

eg:

isMatch("aba", "bluegreenblue") -> true
isMatch("abc","bluegreenyellow") -> true
isMatch("aba", "t1t2t1") -> true
isMatch("aba", "t1t1t1") -> false
isMatch("aba", "t1t11t1") -> true
isMatch("abab", "t1t2t1t2") -> true
isMatch("abcdefg", "ieqfkvu") -> true
isMatch("abcdefg", "bluegreenredyellowpurplesilvergold") -> true
isMatch("ababac", "bluegreenbluegreenbluewhite") -> true
isMatch("abdefghijklmnopqrstuvwxyz", "zyxwvutsrqponmlkjihgfedcba") -> true

I wrote a bruteforce solution (generating all possible splits of the input string of size letters.length and checking in turn against isMatch) but the interviewer said it wasn't optimal.

I have no idea how to solve this part of the problem, is this even possible or am I missing something?

They were looking for something with a time complexity of O(M x N ^ C), where M is the length of the pattern and N is the length of the input, C is some constant.

Clarifications

  • I'm not looking for a regex solution, even if it works.
  • I'm not looking for the naive solution that generates all possible splits and checks them, even with optimization since that'll always be exponential time.
like image 697
David Xu Avatar asked Feb 13 '15 20:02

David Xu


People also ask

Are pattern question asked in interview?

Coding interviews often tend to ask a pattern program to test the candidates. Usually, in an interview process lasting four rounds, the first round tends to be one programming round, and there is a probability that one of the questions asked could be a pattern program.

What is pattern matching explain?

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern.

What is pattern matching give an example?

For example, x* matches any number of x characters, [0-9]* matches any number of digits, and . * matches any number of anything. A regular expression pattern match succeeds if the pattern matches anywhere in the value being tested.

Why is pattern matching important?

Pattern matching is used to determine whether source files of high-level languages are syntactically correct. It is also used to find and replace a matching pattern in a text or code with another text/code. Any application that supports search functionality uses pattern matching in one way or another.


2 Answers

It is possible to optimize a backtracking solution. Instead of generating all splits first and then checking that it is a valid one, we can check it "on fly". Let's assume that we have already split a prefix(with length p) of the initial string and have matched i characters from the pattern. Let's take look at the i + 1 character.

  1. If there is a string in the prefix that corresponds to the i + 1 letter, we should just check that a substring that starts at the position p + 1 is equal to it. If it is, we just proceed to i + 1 and p + the length of this string. Otherwise, we can kill this branch.

  2. If there is no such string, we should try all substrings that start in the position p + 1 and end somewhere after it.

We can also use the following idea to reduce the number of branches in your solution: we can estimate the length of the suffix of the pattern which has not been processed yet(we know the length for the letters that already stand for some strings, and we know a trivial lower bound of the length of a string for any letter in the pattern(it is 1)). It allows us to kill a branch if the remaining part of the initial string is too short to match a the rest of the pattern.

This solution still has an exponential time complexity, but it can work much faster than generating all splits because invalid solutions can be thrown away much earlier, so the number of reachable states can reduce significantly.

like image 101
kraskevich Avatar answered Sep 28 '22 11:09

kraskevich


I feel like this is cheating, and I'm not convinced the capture group and reluctant quantifier will do the right thing. Or maybe they're looking to see if you can recognize that, because of how quantifiers work, matching is ambiguous.

boolean matches(String s, String pattern) {
    StringBuilder patternBuilder = new StringBuilder();
    Map<Character, Integer> backreferences = new HashMap<>();
    int nextBackreference = 1;

    for (int i = 0; i < pattern.length(); i++) {
        char c = pattern.charAt(i);

        if (!backreferences.containsKey(c)) {
            backreferences.put(c, nextBackreference++);
            patternBuilder.append("(.*?)");
        } else {
            patternBuilder.append('\\').append(backreferences.get(c));
        }
    }

    return s.matches(patternBuilder.toString());
}
like image 21
David Ehrmann Avatar answered Sep 28 '22 10:09

David Ehrmann