Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's time complexity of this algorithm for Wildcard Matching?

Wildcard Matching
Implement wildcard pattern matching with support for '?' and '*'.

  • '?' Matches any single character.
  • '*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:

  • isMatch("aa","a") → false
  • isMatch("aa","aa") → true
  • isMatch("aaa","aa") → false
  • isMatch("aa", "*") → true
  • isMatch("aa", "a*") → true
  • isMatch("ab", "?*") → true
  • isMatch("aab", "c*a*b") → false

Question:

  • What's time complexity?
  • What's space complexity?

Personally, I think

  • Time complexity highly dependents on the "input", can not write it out like T = O(?).
  • Space complexity = O(min(sLen, pLen)), because the max recursion depth = O(min(sLen, pLen)).

Have tried:
Write out Time complexity Expression, then draw recursion tree:

TC Expression => T(n) = T(n - 1) + O(1),            when pChar == '?' or pChar == sChar,
                      = T(n - 1) + T(n - 1) + O(1), when pChar == '*'.

I tried to draw recursion tree, but can not figure out how to draw it based on this kind of Time Complexity Expression.

Additional question:
Accurately, I hope to know how to calculate the time complexity for this kind of recursion, which has multi-unforeseen-branch based on input.

Note:

  • I know both iterative-solution and recursive-solution, but can not figure out how to calculate time complexity for the recursive-solution.
  • And this is not homework, this question is from "leetcode.com", I just hope to know the method how to calculate time complexity for this special kind of recursion.


Code: Java, Solution: Recursion.
public class Solution {
    public boolean isMatch(String s, String p) {
        // Input checking.
        if (s == null || p == null) return false;

        int sLen = s.length();
        int pLen = p.length();

        return helper(s, 0, sLen, p, 0, pLen);
    }

    private boolean helper(String s, int sIndex, int sLen,
                           String p, int pIndex, int pLen) {
        // Base case.
        if (sIndex >= sLen && pIndex >= pLen) return true;
        else if (sIndex >= sLen) {
            // Check whether the remaining part of p all "*".
            while (pIndex < pLen) {
                if (p.charAt(pIndex) != '*') return false;
                pIndex ++;
            }
            return true;

        } else if (pIndex >= pLen) {
            return false;
        }

        char sc = s.charAt(sIndex);
        char pc = p.charAt(pIndex);

        if (pc == '?' || pc == sc) {
            return helper(s, sIndex + 1, sLen, p, pIndex + 1, pLen);

        } else if (pc == '*') {
            return helper(s, sIndex, sLen, p, pIndex + 1, pLen) ||
                   helper(s, sIndex + 1, sLen, p, pIndex, pLen);

        } else return false;
    }
}
like image 400
Zhaonan Avatar asked Sep 04 '14 17:09

Zhaonan


1 Answers

In order to get an upper bound (i.e., big-O) on the worst case running time, you need to assume the very worst. The correct recurrence for an upper bound on the asymptotic running time of matching a string of length s with a pattern of length p is as follows.

T(s, p) | s == 0 || p == 0 = 1
        | s >  0 && p >  0 = 1 + max(T(s, p - 1) + T(s - 1, p),  // *
                                     T(s - 1, p - 1))            // ? or literal

Solving two-variable recurrences like this can be tricky. In this particular case, one can show fairly easily by induction that T is non-decreasing in both arguments, and so we can simplify the max.

T(s, p) | s == 0 || p == 0 = 1
        | s >  0 && p >  0 = 1 + T(s, p - 1) + T(s - 1, p)

Now one, with experience, can recognize the strong resemblance to a recurrence for binomial coefficients and make the (admittedly slightly magical) substitutions s = n - k and p = k and T(s, p) = 2 U(n, k) - 1.

2 U(n, k) - 1 | n == k || k == 0 = 1
              | n >  k && k >  0 = 1 + 2 U(n - 1, k - 1) - 1 + 2 U(n - 1, k) - 1

U(n, k) | n == k || k == 0 = 1
        | n >  k && k >  0 = U(n - 1, k - 1) + U(n - 1, k)

We conclude that T(s, p) = 2 U(s + p, p) - 1 = 2 ((s + p) choose p) - 1 = O(2^(s + p)/sqrt(s + p)) by Stirling's approximation (that's the best big-O bound possible in the single quantity s + p, but it's confusing if I write big-Theta).

So far we have proved only that T(s, p) is an upper bound. Since * was the more troublesome case, an idea for the worst case presents itself: make the pattern all *s. We have to be a little bit careful, because if the match succeeds, then there's some short-circuiting possible. However, it takes very little to prevent a match: consider the string 0000000000 and the pattern **********1 (adjust the number of 0s and * as desired). This example shows that the quoted bound is tight to within a polynomial factor (negligible, since the running time already is exponential).


For the purpose of getting just an upper bound, it's not necessary to work out these recurrences nearly so precisely. For example, I might guess that T(s, p) <= 3^(s + p) and proceed to verify that claim by induction.

T(s, p) | s = 0 || p = 0  = 1 <= 3^(s + p)                 // base case
        | s > 0 || p > 0  = 1 + T(s, p - 1) + T(s - 1, p)  // induction
                         <= 3^(s + p - 1) + 3^(s + p - 1) + 3^(s + p - 1)
                          = 3^(s + p)

Now, 3^(s + p) is a valid upper bound, though in light of the rest of this answer it's not tight. One now can look for waste in the bounds; 1 <= 3^(s + p - 1), for example, is a gross overestimate, and with some tricks, we can get the exponential base 2.

The more important order of business, however, is to get an exponential lower bound. From drawing the recursion tree for the bad example above, I might conjecture that T(s, p) >= 2^min(s, p). This can be verified by induction.

T(s, p) | s = 0 || p = 0  = 1 >= 2^min(s, p) = 2^0 = 1             // base case
        | s > 0 && p > 0  = 1 +     T(s, p - 1) +     T(s - 1, p)  // induction
                         >=     2^min(s, p - 1) + 2^min(s - 1, p)
                         >= 2^(min(s, p) - 1) + 2^(min(s, p) - 1)
                          = 2^min(s, p)
like image 92
David Eisenstat Avatar answered Oct 07 '22 00:10

David Eisenstat