I read this article Retiring a Great Interview Problem, the author came up with a work break
problem and gave three solutions. The efficient one uses memoization
algorithm and the author said its worst case time complexity is O(n^2)
since the key insight is that SegmentString is only called on suffixes of the original input string, and that there are only O(n) suffixes
.
However, I find it difficult for me to understand why it is O(n^2)
. Could someone please give me a hint or proof?
Work Break Problem:
Given an input string and a dictionary of words,
segment the input string into a space-separated
sequence of dictionary words if possible. For
example, if the input string is "applepie" and
dictionary contains a standard set of English words,
then we would return the string "apple pie" as output.
The memoization algorithm from Retiring a Great Interview Problem
Map<String, String> memoized;
String SegmentString(String input, Set<String> dict) {
if (dict.contains(input))
return input;
if (memoized.containsKey(input) {
return memoized.get(input);
}
int len = input.length();
for (int i = 1; i < len; i++) {
String prefix = input.substring(0, i);
if (dict.contains(prefix)) {
String suffix = input.substring(i, len);
String segSuffix = SegmentString(suffix, dict);
if (segSuffix != null) {
return prefix + " " + segSuffix;
}
}
}
memoized.put(input, null);
return null;
}
Because no node is called more than once, this dynamic programming strategy known as memoization has a time complexity of O(N), not O(2^N).
With recursion, the trick of using Memoization the cache results will often dramatically improve the time complexity of the problem.
Explanation: As the mentioned approach uses the memoization technique it always stores the previously calculated values. Due to this, the time complexity is decreased but the space complexity is increased.
Memoized time complexity calculation Let's take time complexity of n to be T(n) , hence T(n) = T(n-1) + T(n-2) . This is because F(n-2) is in the cache when we calculate F(n - 1) . Therefore, the function of F(n-2) is 1 (reading from cached equation), hence T(n) = T(n-1) + 1 = T(n-2) + 2 = ... = T(n-n) + n.
Intuition
(it is clear that the worst case is the one where there are no solutions, so I focus on that)
Since the recursive call is made before putting values in the memoization-cache, the last (shortest) suffixes will get cached first. This is because the function is first called with a string of length N, which then calls itself with a string of length N-1, which then .... , with a string of len 0, which is cached and returns, then the string of length 1 is cached and returns, ..., length N is cached and returns.
As the hint suggests, only suffixes get cached, and there are only N of them. This means that by the time the top-level function gets the result of its first recursive call (i.e. on a suffix of length N-1), the cache is already populated with the N-1 suffixes.
Now, assuming the N-1 last suffixes are already cached, the for-loop needs to make N-1 recursive calls, each taking O(1) (since the answer is already cached), totalling O(N). However, (pre-) building the cache of the last N-1 takes O(N^2) (explained below), totalling with O(N)+O(N^2) = O(N^2).
Proof by mathematical induction
This explanation can easily be translated to a formal proof using induction. Here is the gist of it:
(f(N)
being the number of operations required for the function to complete on an input of length N)
Induction hypothesis -- there exists a constant c
s.t. f(N) < c*N^2
.
The base case is trivial -- for a string of length 1, you can find a constant c
such that f(1) < c*N^2 = c
Induction step
Recap of the order things happen:
Step 1: the function first calls itself on the suffix of length N-1, building a cache containing the answer for the last N-1 suffixes
Step 2: the function then calls itself O(N) more times, each taking O(1) (thanks to this test: if (memoized.containsKey(input))
, and the fact that the cache has already been populated in Step 1).
So we get:
f(N+1) = f(N) + a*N < (by the hypothesis)
< c*N^2 + a*N < (for c large enough)
< c*N^2 + 2*c*N + c =
= c*(N+1)^2
Thus we got f(N+1) < c*(N+1)^2
, which completes the proof.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With