
Review an answer - Decode Ways

Tags:

java

I'm trying to solve a problem, and my question is: why doesn't my solution work? The problem statement comes first, followed by my solution.

Question taken from leetcode: http://oj.leetcode.com/problems/decode-ways/

A message containing letters from A-Z is being encoded to numbers using the following mapping:

'A' -> 1
'B' -> 2
...
'Z' -> 26

Given an encoded message containing digits, determine the total number of ways to decode it.

For example, given the encoded message "12", it could be decoded as "AB" (1 2) or "L" (12). The number of ways of decoding "12" is 2.

My solution:

The idea behind my solution is to go backwards and multiply the number of options whenever a split is found. By split I mean a pair of digits that can be interpreted in two ways. For example, 11 can be interpreted in two ways: 'aa' or 'k'.

public class Solution {
    public int numDecodings(String s) {
        if (s.isEmpty() || s.charAt(0) == '0') return 0;
        int decodings = 1;
        boolean used = false; // Signifies that the prev was already use as a decimal
        for (int index = s.length()-1 ; index > 0 ; index--) {
            char curr = s.charAt(index);
            char prev = s.charAt(index-1);
            if (curr == '0') {
                if (prev != '1' && prev != '2') {
                    return 0;
                }
                index--; // Skip prev because it is part of curr
                used = false;
            } else {
                if (prev == '1' || (prev == '2' && curr <= '6')) {
                    decodings = decodings * 2;
                    if (used) {
                        decodings = decodings - 1;
                    }
                    used = true;
                } else {
                    used = false;
                }
            }
        }
        return decodings;
    }
}

The failure is on the following input:

Input:"4757562545844617494555774581341211511296816786586787755257741178599337186486723247528324612117156948"
Output: 3274568
Expected: 589824
AlikElzin-kilaka asked Dec 02 '13 23:12


4 Answers

This is a really interesting problem. First, I will show how I would solve this problem. We will see that it is not that complicated when using recursion, and that the problem can be solved using dynamic programming. We will produce a general solution that does not hardcode an upper limit of 26 for each code point.

A note on terminology: I will use the term code point (CP) not in the Unicode sense, but to refer to one of the code numbers 1 through 26. Each code point is represented as a variable number of characters. I will also use the terms encoded text (ET) and clear text (CT) in their obvious meanings. When talking about a sequence or array, the first element is called the head. The remaining elements are the tail.

Theoretical Prelude

  • The ET "" has one decoding: the CT "".
  • The ET "3" can be destructured into '3' + "", and has one decoding.
  • The ET "23" can be destructured as '2' + "3" or '23' + "". Each of the tails has one decoding, so the whole ET has two decodings.
  • The ET "123" can be destructured as '1' + "23" or '12' + "3". The tails have two and one decodings respectively. The whole ET has three decodings. The destructuring '123' + "" is not valid, because 123 > 26, our upper limit.
  • … and so on for ETs of any length.

So given a string like "123", we can obtain the number of decodings by finding all valid CPs at the beginning, and summing up the number of decodings of each tail.

The most difficult part of this is to find valid heads. We can get the maximal length of the head by looking at a string representation of the upper limit. In our case, the head can be up to two characters long. But not all heads of appropriate lengths are valid, because they have to be ≤ 26 as well.

Naive Recursive Implementation

Now we have done all the necessary work for a simple (but working) recursive implementation:

static final int upperLimit  = 26;
static final int maxHeadSize = ("" + upperLimit).length();

static int numDecodings(String encodedText) {
    // check base case for the recursion
    if (encodedText.length() == 0) {
        return 1;
    }

    // sum all tails
    int sum = 0;
    for (int headSize = 1; headSize <= maxHeadSize && headSize <= encodedText.length(); headSize++) {
        String head = encodedText.substring(0, headSize);
        String tail = encodedText.substring(headSize);
        // a head with a leading '0' ("0", "06") is not a valid code point either
        if (head.charAt(0) == '0' || Integer.parseInt(head) > upperLimit) {
            break;
        }
        sum += numDecodings(tail);
    }

    return sum;
}

Cached Recursive Implementation

Obviously this isn't very efficient, because (for longer ETs), the same tail will be analyzed multiple times. Also, we create a lot of temporary strings, but we'll let that be for now. One thing we can easily do is to memoize the number of decodings of a specific tail. For that, we use an array of the same length as the input string:

static final int upperLimit  = 26;
static final int maxHeadSize = ("" + upperLimit).length();

static int numDecodings(String encodedText) {
    return numDecodings(encodedText, new Integer[1 + encodedText.length()]);
}

static int numDecodings(String encodedText, Integer[] cache) {
    // check base case for the recursion
    if (encodedText.length() == 0) {
        return 1;
    }

    // check if this tail is already known in the cache
    if (cache[encodedText.length()] != null) {
        return cache[encodedText.length()];
    }

    // cache miss -- sum all tails
    int sum = 0;
    for (int headSize = 1; headSize <= maxHeadSize && headSize <= encodedText.length(); headSize++) {
        String head = encodedText.substring(0, headSize);
        String tail = encodedText.substring(headSize);
        // a head with a leading '0' ("0", "06") is not a valid code point either
        if (head.charAt(0) == '0' || Integer.parseInt(head) > upperLimit) {
            break;
        }
        sum += numDecodings(tail, cache);  // pass the cache through
    }

    // update the cache
    cache[encodedText.length()] = sum;
    return sum;
}

Note that we use an Integer[], not an int[]. This way, we can check for non-existent entries using a test for null. This solution is not only correct, it is also comfortably fast – naive recursion runs in O(number of decodings) time, while the memoized version runs in O(string length) time.
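As a side note on the Integer[] choice: the same memoization can be sketched with a plain int[] if "unknown" is marked with a sentinel. The sentinel has to be -1 rather than 0, because 0 is a legitimate count. The sketch below also keys the cache by start index instead of tail length, which avoids the temporary strings mentioned earlier. The class name IndexMemoDecode and the helper count are made up for this illustration, and treating a leading '0' as invalid is an assumption taken from the problem statement.

```java
import java.util.Arrays;

final class IndexMemoDecode {
    static final int UPPER_LIMIT = 26;

    static int numDecodings(String encodedText) {
        int[] cache = new int[encodedText.length() + 1];
        Arrays.fill(cache, -1); // -1 means "not computed yet"; 0 is a real result
        return count(encodedText, 0, cache);
    }

    // Number of decodings of the tail that starts at index start.
    private static int count(String text, int start, int[] cache) {
        if (start == text.length()) {
            return 1; // the empty tail has one decoding
        }
        if (cache[start] != -1) {
            return cache[start];
        }
        int sum = 0;
        int value = 0; // numeric value of the head text[start..end]
        for (int end = start; end < text.length(); end++) {
            value = value * 10 + (text.charAt(end) - '0');
            // value == 0 can only happen for a leading '0' (assumed invalid)
            if (value == 0 || value > UPPER_LIMIT) {
                break;
            }
            sum += count(text, end + 1, cache);
        }
        cache[start] = sum;
        return sum;
    }
}
```

For the failing input quoted in the question, this sketch returns the expected 589824.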

Towards a DP Solution

When you run the above code in your head, you will notice that the first invocation, with the whole string, has a cache miss. It then calculates the number of decodings for the first tail, which misses the cache as well, and so on for every shorter tail. We can avoid this by evaluating the tails first, starting from the end of the input. Because all tails will have been evaluated before the whole string is, we can remove the checks for cache misses. Now we also don't have any reason for recursion, because all previous results are already in the cache.

static final int upperLimit  = 26;
static final int maxHeadSize = ("" + upperLimit).length();

static int numDecodings(String encodedText) {
    int[] cache = new int[encodedText.length() + 1];

    // base case: the empty string at encodedText.length() is 1:
    cache[encodedText.length()] = 1;

    for (int position = encodedText.length() - 1; position >= 0; position--) {
        // sum directly into the cache
        for (int headSize = 1; headSize <= maxHeadSize && headSize + position <= encodedText.length(); headSize++) {
            String head = encodedText.substring(position, position + headSize);
            // a head with a leading '0' ("0", "06") is not a valid code point either
            if (head.charAt(0) == '0' || Integer.parseInt(head) > upperLimit) {
                break;
            }
            cache[position] += cache[position + headSize];
        }
    }

    return cache[0];
}

This algorithm could be optimized further by noticing that we only ever query the last maxHeadSize elements in the cache. So instead of an array, we could use a fixed-size queue. At that point, we would have a dynamic programming solution that runs in O(input length) time and O(maxHeadSize) space.
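A minimal sketch of that idea, keeping the general upperLimit: the fixed-size queue is modelled here as a ring buffer with maxHeadSize + 1 slots, indexed by position modulo the buffer size. The class name RollingDecodeCounter and the variable names are made up for this illustration, and rejecting heads with a leading '0' is an assumption based on the problem statement.

```java
final class RollingDecodeCounter {
    static final int UPPER_LIMIT = 26;
    static final int MAX_HEAD_SIZE = Integer.toString(UPPER_LIMIT).length();

    static int numDecodings(String encodedText) {
        int n = encodedText.length();
        int size = MAX_HEAD_SIZE + 1;
        // ring[p % size] holds the number of decodings of the tail starting at p
        int[] ring = new int[size];
        ring[n % size] = 1; // base case: the empty tail has one decoding

        for (int position = n - 1; position >= 0; position--) {
            int sum = 0;
            for (int headSize = 1;
                 headSize <= MAX_HEAD_SIZE && position + headSize <= n;
                 headSize++) {
                String head = encodedText.substring(position, position + headSize);
                // assumption: a leading '0' never forms a valid code point
                if (head.charAt(0) == '0' || Integer.parseInt(head) > UPPER_LIMIT) {
                    break;
                }
                sum += ring[(position + headSize) % size];
            }
            // this slot last held the result for position + size,
            // which no remaining iteration can reach
            ring[position % size] = sum;
        }
        return ring[0];
    }
}
```

The buffer needs one slot more than maxHeadSize because the result for the current position is written while the maxHeadSize results after it are still being read.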

Specialization for upperLimit = 26

The above algorithms were kept as general as possible, but we can manually specialize them for a specific upperLimit. This can be useful because it allows us to do various optimizations. However, it introduces “magic numbers” that make the code harder to maintain. Such manual specializations should therefore be avoided in non-critical software (and the above algorithm is already as fast as it gets).

static int numDecodings(String encodedText) {
    // initialize the cache
    int[] cache = {1, 0, 0};

    for (int position = encodedText.length() - 1; position >= 0; position--) {
        // rotate the cache
        cache[2] = cache[1];
        cache[1] = cache[0];
        cache[0] = 0;

        // headSize == 1
        if (position + 0 < encodedText.length()) {
            char c = encodedText.charAt(position + 0);

            // 1 .. 9
            if ('1' <= c && c <= '9') {
                cache[0] += cache[1];
            }
        }

        // headSize == 2
        if (position + 1 < encodedText.length()) {
            char c1 = encodedText.charAt(position + 0);
            char c2 = encodedText.charAt(position + 1);

            // 10 .. 19
            if ('1' == c1) {
                cache[0] += cache[2];
            }
            // 20 .. 26
            else if ('2' == c1 && '0' <= c2 && c2 <= '6') {
                cache[0] += cache[2];
            }
        }
    }

    return cache[0];
}

Comparison with your code

The code is superficially similar. However, your parsing around characters is more convoluted. You have introduced a used variable that, if set, will decrement the decode count in order to account for double-character CPs. This is wrong, but I am not sure why. The main problem is that you are doubling the count at almost every step. As we have seen, the previous counts are added, and may very well be different.
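One way to see the problem concretely is a small counterexample. The sketch below copies the method from the question unchanged (as questionVersion) and compares it with a plain "add the two tails" count; the class name DoublingCounterexample and the helper reference are made up for this illustration.

```java
final class DoublingCounterexample {
    // The method from the question, copied unchanged.
    static int questionVersion(String s) {
        if (s.isEmpty() || s.charAt(0) == '0') return 0;
        int decodings = 1;
        boolean used = false;
        for (int index = s.length() - 1; index > 0; index--) {
            char curr = s.charAt(index);
            char prev = s.charAt(index - 1);
            if (curr == '0') {
                if (prev != '1' && prev != '2') {
                    return 0;
                }
                index--;
                used = false;
            } else {
                if (prev == '1' || (prev == '2' && curr <= '6')) {
                    decodings = decodings * 2;
                    if (used) {
                        decodings = decodings - 1;
                    }
                    used = true;
                } else {
                    used = false;
                }
            }
        }
        return decodings;
    }

    // Reference count: decodings(i) = decodings(i + 1) + decodings(i + 2),
    // where each term is only added if the corresponding head is valid.
    static int reference(String s) {
        int next = 1;      // decodings of the tail starting at i + 1
        int afterNext = 0; // decodings of the tail starting at i + 2
        for (int i = s.length() - 1; i >= 0; i--) {
            int current = 0;
            char c = s.charAt(i);
            if (c != '0') {
                current += next;
                if (i + 1 < s.length()
                        && (c == '1' || (c == '2' && s.charAt(i + 1) <= '6'))) {
                    current += afterNext;
                }
            }
            afterNext = next;
            next = current;
        }
        return next;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"111", "1111", "11111"}) {
            System.out.println(input + ": question=" + questionVersion(input)
                    + " reference=" + reference(input));
        }
        // 111: 3 vs 3, 1111: 5 vs 5, 11111: 9 vs 8
    }
}
```

For a run of k ambiguous digits the correct count c(k) satisfies c(k) = c(k-1) + c(k-2), which can be rewritten as c(k) = 2·c(k-1) - c(k-3). Doubling and subtracting 1 therefore only agrees with the correct count while c(k-3) = 1, that is, for runs of up to four digits. With "11111" the correction would have to be 2, and the results diverge.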

This indicates that you wrote the code without proper preparation. You can write many kinds of software without having to think too much, but you can't do without careful analysis when designing an algorithm. For me, it is often helpful to design an algorithm on paper, and draw diagrams of each step (along the lines of the “Theoretical Prelude” of this answer). This is especially useful when you are thinking too much about the language you are going to implement in, and too little about possibly wrong assumptions.

I suggest that you read up on “proofs by induction” to understand how to write a correct recursive algorithm. Once you have a recursive solution, you can always translate it into an iterative version.

amon answered Nov 18 '22 20:11


Here is a somewhat simpler way to solve your problem. It is pretty close to calculating Fibonacci numbers, with the difference that there are condition checks on each smaller subproblem. The space complexity is O(1) and the time complexity is O(n).

The code is in C++.

#include <string>
using std::string;

// True if the digits s[i] and s[j] form a two-digit code in 10..26.
// Defined before numDecodings so that it is declared at the point of use.
bool isValidTwo(const string &s, int i, int j)
{
    int val = 10 * (s[i] - '0') + (s[j] - '0');
    return val >= 10 && val <= 26;
}

int numDecodings(const string &s)
{
    if (s.empty()) return 0;

    int n  = static_cast<int>(s.length());
    int p1 = (s[0] != '0' ? 1 : 0);     // one step prev from j=1
    int p2 = 1;                         // two steps prev from j=1: the empty prefix
    int p  = p1;

    for (int j = 1; j < n; j++)
    {
        p = 0;

        if (s[j] != '0')
            p += p1;

        if (isValidTwo(s, j - 1, j))
            p += p2;

        if (p == 0)                     // no further decoding necessary,
            break;                      // as the prefix 0..j has no possible decoding

        p2 = p1;                        // update prev for the next j
        p1 = p;
    }

    return p;
}
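To make the Fibonacci comparison concrete, here is a Java sketch of the same two-variable loop, run on strings consisting only of '1'. In such a string every digit and every adjacent pair is valid, so both condition checks always pass and the recurrence collapses to p = p1 + p2. The class name FibonacciLikeDecode is made up for this illustration.

```java
final class FibonacciLikeDecode {
    // True if the digits at i and j form a two-digit code in 10..26.
    static boolean isValidTwo(String s, int i, int j) {
        int val = 10 * (s.charAt(i) - '0') + (s.charAt(j) - '0');
        return val >= 10 && val <= 26;
    }

    static int numDecodings(String s) {
        if (s.isEmpty()) return 0;
        int p1 = s.charAt(0) != '0' ? 1 : 0; // count for the prefix ending one step back
        int p2 = 1;                          // count for the prefix ending two steps back
        int p = p1;
        for (int j = 1; j < s.length(); j++) {
            p = 0;
            if (s.charAt(j) != '0') p += p1;
            if (isValidTwo(s, j - 1, j)) p += p2;
            if (p == 0) break; // the prefix 0..j cannot be decoded at all
            p2 = p1;
            p1 = p;
        }
        return p;
    }

    public static void main(String[] args) {
        StringBuilder ones = new StringBuilder();
        for (int n = 1; n <= 8; n++) {
            ones.append('1');
            System.out.println(n + " ones -> " + numDecodings(ones.toString()));
        }
        // prints the counts 1, 2, 3, 5, 8, 13, 21, 34
    }
}
```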
U J answered Nov 18 '22 18:11


Here is my code to solve the problem. I use DP, and I think it is easy to understand.

Written in Java

public class Solution {
    public int numDecodings(String s) {
        if (s == null || s.length() == 0) {
            return 0;
        }
        int n = s.length();
        // dp[i] = number of ways to decode the first i characters
        int[] dp = new int[n + 1];
        dp[0] = 1;
        dp[1] = s.charAt(0) != '0' ? 1 : 0;

        for (int i = 2; i <= n; i++) {
            int first = Integer.parseInt(s.substring(i - 1, i));   // last digit
            int second = Integer.parseInt(s.substring(i - 2, i));  // last two digits
            if (first >= 1 && first <= 9) {
                dp[i] += dp[i - 1];
            }
            if (second >= 10 && second <= 26) {
                dp[i] += dp[i - 2];
            }
        }
        return dp[n];
    }
}

FrancisGeek answered Nov 18 '22 20:11


Since I struggled with this problem myself, here is my solution and reasoning. I will probably mostly repeat what amon wrote, but maybe someone will find it helpful. Also, it's C#, not Java.

Let's say that we have the input "12131" and want to obtain all possible decoded strings. A straightforward recursive solution would iterate from left to right, obtain the valid 1- and 2-digit heads, and invoke the function recursively for each tail.

We can visualize it using a tree:

[Image: recursion tree for the input "12131"]

There are 5 leaves, and this is the number of all possible decoded strings. There are also 3 empty leaves: the number 31 cannot be decoded into a letter, so these leaves are invalid.

Algorithm might look like this:

public IList<string> Decode(string s)
{
    var result = new List<string>();

    if (s.Length <= 2)
    {
        if (s.Length == 1)
        {
            if (s[0] != '0')
                result.Add(this.ToASCII(s));
        }
        else if (s.Length == 2)
        {
            if (s[0] != '0' && s[1] != '0')
                result.Add(this.ToASCII(s.Substring(0, 1)) + this.ToASCII(s.Substring(1, 1)));
            if (s[0] != '0' && int.Parse(s) > 0 && int.Parse(s) <= 26)
                result.Add(this.ToASCII(s));
        }
    }
    else
    {
        for (int i = 1; i <= 2; ++i)
        {
            string head = s.Substring(0, i);
            if (head[0] != '0' && int.Parse(head) > 0 && int.Parse(head) <= 26)
            {
                var tails = this.Decode(s.Substring(i));
                foreach (var tail in tails)
                    result.Add(this.ToASCII(head) + tail);
            }
        }
    }

    return result;
}

public string ToASCII(string str)
{
    int number = int.Parse(str);
    int asciiChar = number + 65 - 1; // A in ASCII = 65
    return ((char)asciiChar).ToString();
}

We have to take care of numbers starting with 0 ("0", "03", etc.), and greater than 26.
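For readers following along in Java, the head check and the enumeration can be sketched as below. The names DecodeEnumerator, isValidHead and decode are made up for this illustration. Unlike the C# code above, this sketch lets the recursion bottom out at the empty string (returning a list with one empty decoding), which is an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

final class DecodeEnumerator {
    // A head is valid if it has no leading zero and its value lies in 1..26.
    static boolean isValidHead(String head) {
        if (head.isEmpty() || head.charAt(0) == '0') {
            return false;
        }
        int value = Integer.parseInt(head);
        return value >= 1 && value <= 26;
    }

    // Returns every clear text the digit string s can be decoded into.
    static List<String> decode(String s) {
        List<String> result = new ArrayList<>();
        if (s.isEmpty()) {
            result.add(""); // assumption: nothing left to decode counts as one way
            return result;
        }
        for (int size = 1; size <= 2 && size <= s.length(); size++) {
            String head = s.substring(0, size);
            if (!isValidHead(head)) {
                continue;
            }
            char letter = (char) ('A' + Integer.parseInt(head) - 1);
            for (String tail : decode(s.substring(size))) {
                result.add(letter + tail);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode("12131")); // [ABACA, ABMA, AUCA, LACA, LMA]
        System.out.println(decode("03"));    // []
    }
}
```

The five strings printed for "12131" correspond to the five valid leaves of the tree.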

Because in this problem we need only count decoding ways, and not actual strings, we can simplify this code:

public int DecodeCount(string s)
{
    int count = 0;

    if (s.Length <= 2)
    {
        if (s.Length == 1)
        {
            if (s[0] != '0')
                count++;
        }
        else if (s.Length == 2)
        {
            if (s[0] != '0' && s[1] != '0')
                count++;
            if (s[0] != '0' && int.Parse(s) > 0 && int.Parse(s) <= 26)
                count++;
        }
    }
    else
    {
        for (int i = 1; i <= 2; ++i)
        {
            string head = s.Substring(0, i);
            if (head[0] != '0' && int.Parse(head) > 0 && int.Parse(head) <= 26)
                count += this.DecodeCount(s.Substring(i));
        }
    }

    return count;
}

The problem with this algorithm is that we compute results for the same input string multiple times. For example, there are 3 nodes ending with 31: ABA31, AU31, LA31. There are also 2 nodes ending with 131: AB131, L131. We know that if a node ends with 31 it has only one child, since 31 can be decoded in only one way, to CA. Likewise, we know that if a string ends with 131 it has 2 children, because 131 can be decoded into ACA or MA. Thus, instead of computing it all over again, we can cache it in a map, where the key is the string (e.g. "131") and the value is the number of ways to decode it:

public int DecodeCountCached(string s, Dictionary<string, int> cache)
{
    if (cache.ContainsKey(s))
        return cache[s];

    int count = 0;

    if (s.Length <= 2)
    {
        if (s.Length == 1)
        {
            if (s[0] != '0')
                count++;
        }
        else if (s.Length == 2)
        {
            if (s[0] != '0' && s[1] != '0')
                count++;
            if (s[0] != '0' && int.Parse(s) > 0 && int.Parse(s) <= 26)
                count++;
        }
    }
    else
    {
        for (int i = 1; i <= 2; ++i)
        {
            string head = s.Substring(0, i);
            if (head[0] != '0' && int.Parse(head) > 0 && int.Parse(head) <= 26)
                count += this.DecodeCountCached(s.Substring(i), cache);
        }
    }

    cache[s] = count;
    return count;
}

We can refine this even further. Instead of using strings as keys, we can use their lengths, because what is cached is always a tail of the input string. So instead of caching the strings "1", "31", "131", "2131", "12131", we can cache the lengths of the tails: 1, 2, 3, 4, 5:

public int DecodeCountDPTopDown(string s, Dictionary<int, int> cache)
{
    if (cache.ContainsKey(s.Length))
        return cache[s.Length];

    int count = 0;

    if (s.Length <= 2)
    {
        if (s.Length == 1)
        {
            if (s[0] != '0')
                count++;
        }
        else if (s.Length == 2)
        {
            if (s[0] != '0' && s[1] != '0')
                count++;
            if (s[0] != '0' && int.Parse(s) > 0 && int.Parse(s) <= 26)
                count++;
        }
    }
    else
    {
        for (int i = 1; i <= 2; ++i)
        {
            string head = s.Substring(0, i);
            if (s[0] != '0' && int.Parse(head) > 0 && int.Parse(head) <= 26)
                count += this.DecodeCountDPTopDown(s.Substring(i), cache);
        }
    }

    cache[s.Length] = count;
    return count;
}

This is the recursive top-down dynamic programming approach. We start from the beginning, recursively compute solutions for the tails, and memoize those results for further use.

We can translate it to a bottom-up iterative DP solution. We start from the end and cache the results for tails, as in the previous solution. Instead of a map we can use an array, because the keys are integers:

public int DecodeCountBottomUp(string s)
{
    // cache[i] = number of ways to decode the tail of length i
    int[] cache = new int[s.Length + 1];
    cache[0] = 0; // for empty string;

    for (int i = 1; i <= s.Length; ++i)
    {
        string tail = s.Substring(s.Length - i, i);

        if (tail.Length == 1)
        {
            if (tail[0] != '0')
                cache[i]++;
        }
        else if (tail.Length == 2)
        {
            if (tail[0] != '0' && tail[1] != '0')
                cache[i]++;
            if (tail[0] != '0' && int.Parse(tail) > 0 && int.Parse(tail) <= 26)
                cache[i]++;
        }
        else
        {
            if (tail[0] != '0')
                cache[i] += cache[i - 1];

            if (tail[0] != '0' && int.Parse(tail.Substring(0, 2)) > 0 && int.Parse(tail.Substring(0, 2)) <= 26)
                cache[i] += cache[i - 2];
        }
    }

    // last element: the result for the whole string
    return cache[s.Length];
}

Some people simplify it even further by initializing cache[0] with the value 1, so they can get rid of the conditions for tail.Length==1 and tail.Length==2. To me this is an unintuitive trick, though, since an empty string clearly has 0 ways to decode, not 1. In that case an additional condition must be added to handle empty input:

public int DecodeCountBottomUp2(string s)
{
    if (s.Length == 0)
        return 0;

    // cache[i] = number of ways to decode the tail of length i
    int[] cache = new int[s.Length + 1];
    cache[0] = 1;
    cache[1] = s[s.Length - 1] != '0' ? 1 : 0;

    for (int i = 2; i <= s.Length; ++i)
    {
        string tail = s.Substring(s.Length - i, i);

        if (tail[0] != '0')
            cache[i] += cache[i - 1];

        if (tail[0] != '0' && int.Parse(tail.Substring(0, 2)) > 0 && int.Parse(tail.Substring(0, 2)) <= 26)
            cache[i] += cache[i - 2];
    }

    // last element: the result for the whole string
    return cache[s.Length];
}
anth answered Nov 18 '22 20:11