Given a dictionary, find all possible letter orderings

Question

I was recently asked the following interview question:

You have a dictionary page written in an alien language. Assume that the language is similar to English and is read/written from left to right. Also, the words are arranged in lexicographic order. For example the page could be: ADG, ADH, BCD, BCF, FM, FN
You have to give all lexicographic orderings possible of the character set present in the page.

My approach is as follows: A has higher precedence than B and G has higher precedence than H. Therefore we have the information about ordering for some characters:

A->B, B->F, G->H, D->F, M->N

The possible orderings can be ABDFGNHMC, ACBDFGNHMC, ... My approach was to use an array as position holder and generate all permutations to identify all valid orderings. The worst case time complexity for this is N! where N is the size of character set. Can we do better than the brute force approach.

Thanks in advance.

Christian Ammer · Accepted Answer

Donald Knuth has written the paper A Structured Program to Generate all Topological Sorting Arrangements. This paper was originally pupblished in 1974. The following quote from the paper brought me to a better understanding of the problem (in the text the relation i < j stands for "i precedes j"):

A natural way to solve this problem is to let x₁ be an element having no predecessors, then to erase all relations of the from x₁ < j and to let x₂ be an element ≠ x₁ with no predecessors in the system as it now exists, then to erase all relations of the from x₂ < j , etc. It is not difficult to verify that this method will always succeed unless there is an oriented cycle in the input. Moreover, in a sense it is the only way to proceed, since x₁ must be an element without predecessors, and x₂ must be without predecessors when all relations x₁ < j are deleted, etc. This observation leads naturally to an algorithm that finds all solutions to the topological sorting problem; it is a typical example of a "backtrack" procedure, where at every stage we consider a subproblem of the from "Find all ways to complete a given partial permutation x₁x₂...x_k to a topological sort x₁x₂...x_n ." The general method is to branch on all possible choices of x_k+1.
A central problem in backtrack applications is to find a suitable way to arrange the data so that it is easy to sequence through the possible choices of x_k+1 ; in this case we need an efficient way to discover the set of all elements ≠ {x₁,...,x_k} which have no predecessors other than x₁,...,x_k, and to maintain this knowledge efficiently as we move from one subproblem to another.

The paper includes a pseudocode for a efficient algorithm. The time complexity for each output is O(m+n), where m ist the number of input relations and n is the number of letters. I have written a C++ program, that implements the algorithm described in the paper – maintaining variable and function names –, which takes the letters and relations from your question as input. I hope that nobody complains about giving the program to this answer – because of the language-agnostic tag.

#include <iostream>
#include <deque>
#include <vector>
#include <iterator>
#include <map>

// Define Input
static const char input[] =
    { 'A', 'D', 'G', 'H', 'B', 'C', 'F', 'M', 'N' };
static const char crel[][2] =
    {{'A', 'B'}, {'B', 'F'}, {'G', 'H'}, {'D', 'F'}, {'M', 'N'}};

static const int n = sizeof(input) / sizeof(char);
static const int m = sizeof(crel) / sizeof(*crel);

std::map<char, int> count;
std::map<char, int> top;
std::map<int, char> suc;
std::map<int, int> next;
std::deque<char> D;
std::vector<char> buffer;

void alltopsorts(int k)
{
    if (D.empty())
        return;
    char base = D.back();

    do
    {
        char q = D.back();
        D.pop_back();

        buffer[k] = q;
        if (k == (n - 1))
        {
            for (std::vector<char>::const_iterator cit = buffer.begin();
                 cit != buffer.end(); ++cit)
                 std::cout << (*cit);
            std::cout << std::endl;
        }

        // erase relations beginning with q:
        int p = top[q];
        while (p >= 0)
        {
            char j = suc[p];
            count[j]--;
            if (!count[j])
                D.push_back(j);
            p = next[p];
        }

        alltopsorts(k + 1);

        // retrieve relations beginning with q:
        p = top[q];
        while (p >= 0)
        {
            char j = suc[p];
            if (!count[j])
                D.pop_back();
            count[j]++;
            p = next[p];
        }

        D.push_front(q);
    }
    while (D.back() != base);
}

int main()
{
    // Prepare
    std::fill_n(std::back_inserter(buffer), n, 0);
    for (int i = 0; i < n; i++) {
        count[input[i]] = 0;
        top[input[i]] = -1;
    }

    for (int i = 0; i < m; i++) {
        suc[i] = crel[i][1]; next[i] = top[crel[i][0]];
        top[crel[i][0]] = i; count[crel[i][1]]++;
    }

    for (std::map<char, int>::const_iterator cit = count.begin();
         cit != count.end(); ++cit)
        if (!(*cit).second)
            D.push_back((*cit).first);

    alltopsorts(0);
}

Mu Qiao · Answer

There is no algorithm that can do better than O(N!) if there are N! answers. But I think there is a better way to understand the problem:

You can build a directed graph in this way: if A appears before B, then there is an edge from A to B. After building the graph, you just need to find all possible topological sort results. Still O(N!), but easier to code and better than your approach (don't have to generate invalid ordering).

Given a dictionary, find all possible letter orderings

Tags:

language-agnostic

algorithm

user434345

Video Answer

2 Answers

Christian Ammer

Mu Qiao

Recent Activity

Donate For Us

Given a dictionary, find all possible letter orderings

Tags:

language-agnostic

algorithm

user434345

Video Answer

2 Answers

Christian Ammer

Mu Qiao

Related questions

Recent Activity

Donate For Us