Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the time complexity of this example from "Cracking the Coding Interview" O(k c^k)?

This question is from Cracking the Coding Interview 6th Edition, Question V1.11.

The following code prints all strings of length k where the characters are in sorted order. It does this by generating all strings of length k and then checking if each is sorted. What is the runtime?

package QVI_11_Print_Sorted_Strings;


public class Question {

    public static int numChars = 26;

    public static void printSortedStrings(int remaining) {
        printSortedStrings(remaining, "");
    }

    public static void printSortedStrings(int remaining, String prefix) {
        if (remaining == 0) {
            if (isInOrder(prefix)) {
                System.out.println(prefix);
            }
        } else {
            for (int i = 0; i < numChars; i++) {
                char c = ithLetter(i);
                printSortedStrings(remaining - 1, prefix + c);
            }
        }
    }

    public static boolean isInOrder(String s) {
        for (int i = 1; i < s.length(); i++) {
            int prev = ithLetter(s.charAt(i - 1));
            int curr = ithLetter(s.charAt(i));
            if (prev > curr) {
                return false;
            }
        }
        return true;
    }

    public static char ithLetter(int i) {
        return (char) (((int) 'a') + i);
    }

    public static void main(String[] args) {
        printSortedStrings(5);
    }

}

The answer is O(k c^k) where k is the length of the string and c is the number of characters in the alphabet. It takes O(c^k) time to generate each string. Then, we need to check that each of these is sorted, which takes O(k) time.

Now, I understand where O(k) comes from but I don't see how O(c^k) came about.

like image 842
MrBuggy Avatar asked May 23 '17 22:05

MrBuggy


People also ask

What is Cracking the Coding Interview written in?

What programming language is used in Cracking the Coding Interview? The vast majority of problems in Cracking the Coding Interview are written in Java.

How many problems are there in cracking the coding interview?

WHAT'S INSIDE? 189 programming interview questions, ranging from the basics to the trickiest algorithm problems. A walk-through of how to derive each solution, so that you can learn how to get there yourself. Hints on how to solve each of the 189 questions, just like what you would get in a real interview.

How long did it take you to read Cracking the Coding Interview?

Cracking the Coding Interview: 189 Programming Questions and Solutions. The average reader will spend 11 hours and 27 minutes reading this book at 250 WPM (words per minute).


2 Answers

The above algorithm works by recursively generating all possible strings of length k using a set of c choices of characters. The number of possible strings of length k you can make from c choices of letters is equal to ck. For example, if I have two letters to pick from (a and b) and I have strings of length three, there are 23 = 8 possible strings I can make:

  • aaa
  • aab
  • aba
  • abb
  • baa
  • bab
  • bba
  • bbb

To better see where this comes from, notice that every time you add a new letter to the end of the string, you have c choices for what that letter can be, so the number of possible strings is

c · c · ... · c (k times) =

ck

This means that the above code, which works by generating each of these strings, must do at least Ω(ck) work, since that's the minimum number of strings to check.

So how much work does it do for each of these strings? Here's where things get tricky. These strings are built up one character at a time by constantly appending a new character from the list of possible characters. In Java, appending to a string makes a full copy of the string, so the cost of appending the first character is (roughly) 1, the second is (roughly) 2, then 3, then 4, etc. This means that the cost of building up a full string of length k will be

1 + 2 + 3 + ... + k

= Θ(k2)

So it actually seems like the runtime here would be O(ck k2) rather than O(k ck), since the cost of building all those strings adds up rather quickly.

However, that's not a tight bound. For example, some of the work done to form the string aaa is also used to form the string aab, since both strings are formed by starting with aa and concatenating another character in.

To get a better analysis, we can sum up the total work done performing concatenations at each level in the tree. The zeroth level of the tree has a single string of size 0, so no concatenations are done. The first level of the tree has c strings of size 1, requiring c work to do to the concatenations. The second level of the tree has c2 strings of size 2, requiring 2c2 work to form. The third level of the three has c3 strings of size 3, requiring 3c3 work to form. More generally, level i requires ici work to form. This means we want to determine

0c0 + 1c1 + 2c2 + ... + kck.

This summation works out to Θ(kck), with a lower exponent of the k term.

To summarize:

  • The ck term comes from the number of strings that needed to be checked.
  • The k term comes from from the work done per string.
  • A careful analysis shows that the time to generate all these strings does not affect the overall runtime of O(k ck).
like image 147
templatetypedef Avatar answered Nov 04 '22 22:11

templatetypedef


The recursive calls to printSortedStrings form a recursion tree. Because the total number of nodes is about a constant factor of the number of nodes at the lowest level, and the upper levels don't do more work than the lowest level, then only the cost of the lowest level is significant.

Take for example, c being 2, and k being 3:

Looking at the first level of the tree this produces:

2 (or 2^1) strings, "a", "b".

The second level produces:

4 (or 2^2) strings, "aa", "ab", "ba", "bb".

The third level produces:

8 (or 2^3) strings, "aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb".

The cost to produce the next string in sequence is linear. The characters from the old string plus the new character are copied into the new string.

This depends on the length of the string, so for the first level the cost is 1, the second level cost is 2, and the third level cost is 3. Multiply by the number of items at each level:

(2^1)*1 + (2^2)*2 + (2^3)*3 = 34

This pattern would continue if k was 4, then it would be:

(2^1)*1 + (2^2)*2 + (2^3)*3 + (2^4)*4 = 98

This thing about sums like this is that the last term is greater than all the previous ones combined. So only the last term is significant:

(2^1)*1 + (2^2)*2 + (2^3)*3 < (2^4)*4

Because:

(2^1)*1 + (2^2)*2 + (2^3)*3
< (2^1)*3 + (2^2)*3 + (2^3)*3
= (2^1 + 2^2 + 2^3) * 3
= (2^4 - 2) * 3
< (2^4 - 2) * 4
< (2^4) * 4

So the whole sum is less than 2*(2^4)*4, or 2*(c^k)*k, or O(k c^k).

At the end of the recursion, there is more linear time work. There are c^k nodes with k work giving another O(k c^k) cost. So the total cost is still O(k c^k).

One other thing is that the for-loop takes linear time per string as well, but this works out to a total of O(k c^k), for similar reasons as above.

like image 23
fgb Avatar answered Nov 04 '22 21:11

fgb