Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given a dictionary, find all possible letter orderings

I was recently asked the following interview question:

You have a dictionary page written in an alien language. Assume that the language is similar to English and is read/written from left to right. Also, the words are arranged in lexicographic order. For example the page could be: ADG, ADH, BCD, BCF, FM, FN
You have to give all lexicographic orderings possible of the character set present in the page.

My approach is as follows: A has higher precedence than B and G has higher precedence than H. Therefore we have the information about ordering for some characters:

A->B, B->F, G->H, D->F, M->N

The possible orderings can be ABDFGNHMC, ACBDFGNHMC, ... My approach was to use an array as position holder and generate all permutations to identify all valid orderings. The worst case time complexity for this is N! where N is the size of character set. Can we do better than the brute force approach.

Thanks in advance.

like image 382
user434345 Avatar asked Sep 02 '11 10:09

user434345


Video Answer


2 Answers

Donald Knuth has written the paper A Structured Program to Generate all Topological Sorting Arrangements. This paper was originally pupblished in 1974. The following quote from the paper brought me to a better understanding of the problem (in the text the relation i < j stands for "i precedes j"):

A natural way to solve this problem is to let x1 be an element having no predecessors, then to erase all relations of the from x1 < j and to let x2 be an element ≠ x1 with no predecessors in the system as it now exists, then to erase all relations of the from x2 < j , etc. It is not difficult to verify that this method will always succeed unless there is an oriented cycle in the input. Moreover, in a sense it is the only way to proceed, since x1 must be an element without predecessors, and x2 must be without predecessors when all relations x1 < j are deleted, etc. This observation leads naturally to an algorithm that finds all solutions to the topological sorting problem; it is a typical example of a "backtrack" procedure, where at every stage we consider a subproblem of the from "Find all ways to complete a given partial permutation x1x2...xk to a topological sort x1x2...xn ." The general method is to branch on all possible choices of xk+1.
A central problem in backtrack applications is to find a suitable way to arrange the data so that it is easy to sequence through the possible choices of xk+1 ; in this case we need an efficient way to discover the set of all elements ≠ {x1,...,xk} which have no predecessors other than x1,...,xk, and to maintain this knowledge efficiently as we move from one subproblem to another.

The paper includes a pseudocode for a efficient algorithm. The time complexity for each output is O(m+n), where m ist the number of input relations and n is the number of letters. I have written a C++ program, that implements the algorithm described in the paper – maintaining variable and function names –, which takes the letters and relations from your question as input. I hope that nobody complains about giving the program to this answer – because of the language-agnostic tag.

#include <iostream>
#include <deque>
#include <vector>
#include <iterator>
#include <map>

// Define Input
static const char input[] =
    { 'A', 'D', 'G', 'H', 'B', 'C', 'F', 'M', 'N' };
static const char crel[][2] =
    {{'A', 'B'}, {'B', 'F'}, {'G', 'H'}, {'D', 'F'}, {'M', 'N'}};

static const int n = sizeof(input) / sizeof(char);
static const int m = sizeof(crel) / sizeof(*crel);

std::map<char, int> count;
std::map<char, int> top;
std::map<int, char> suc;
std::map<int, int> next;
std::deque<char> D;
std::vector<char> buffer;

void alltopsorts(int k)
{
    if (D.empty())
        return;
    char base = D.back();

    do
    {
        char q = D.back();
        D.pop_back();

        buffer[k] = q;
        if (k == (n - 1))
        {
            for (std::vector<char>::const_iterator cit = buffer.begin();
                 cit != buffer.end(); ++cit)
                 std::cout << (*cit);
            std::cout << std::endl;
        }

        // erase relations beginning with q:
        int p = top[q];
        while (p >= 0)
        {
            char j = suc[p];
            count[j]--;
            if (!count[j])
                D.push_back(j);
            p = next[p];
        }

        alltopsorts(k + 1);

        // retrieve relations beginning with q:
        p = top[q];
        while (p >= 0)
        {
            char j = suc[p];
            if (!count[j])
                D.pop_back();
            count[j]++;
            p = next[p];
        }

        D.push_front(q);
    }
    while (D.back() != base);
}

int main()
{
    // Prepare
    std::fill_n(std::back_inserter(buffer), n, 0);
    for (int i = 0; i < n; i++) {
        count[input[i]] = 0;
        top[input[i]] = -1;
    }

    for (int i = 0; i < m; i++) {
        suc[i] = crel[i][1]; next[i] = top[crel[i][0]];
        top[crel[i][0]] = i; count[crel[i][1]]++;
    }

    for (std::map<char, int>::const_iterator cit = count.begin();
         cit != count.end(); ++cit)
        if (!(*cit).second)
            D.push_back((*cit).first);

    alltopsorts(0);
}
like image 171
Christian Ammer Avatar answered Sep 27 '22 23:09

Christian Ammer


There is no algorithm that can do better than O(N!) if there are N! answers. But I think there is a better way to understand the problem:

You can build a directed graph in this way: if A appears before B, then there is an edge from A to B. After building the graph, you just need to find all possible topological sort results. Still O(N!), but easier to code and better than your approach (don't have to generate invalid ordering).

like image 34
Mu Qiao Avatar answered Sep 28 '22 01:09

Mu Qiao