
Order a list of characters, given a dictionary

Tags: java, algorithm

I was asked this question in an interview. Suppose you have an ordered dictionary and are given a list of unordered characters. How would you order these characters by precedence? All 26 characters are guaranteed to appear somewhere in the dictionary's words. However, note that the dictionary can be any size. It could be as small as a few words and may not have a separate section for each character; e.g., there might be no section of words beginning with a, although a will appear as part of another word such as "bat".

The dictionary might be "ordered" (/sarcasm) as such: "zebra", "apple", "cat", "crass". If you're given the list {a, z, r}, the correct order would be {z, a, r}. Since "zebra" is before "apple" in the dictionary, we know z comes before a in the precedence. Since "apple" comes before "cat", we know a comes before c. Since "cat" comes before "crass", we know that a comes before r. This ordering leaves c and r with ambiguous precedence, but since the list of letters was {a, z, r}, we know the solution to be {z, a, r}.

OckhamsRazor asked Jan 18 '26 21:01


2 Answers

Use a directed graph with 26 vertices, where each vertex represents a character. An edge from vertex A to vertex B means that B comes before A in the alphabet.

The first step is to establish such a graph with only vertices but NO edges.

Second, scan the input dictionary word by word, comparing each word with the previous one. The first position at which the two words differ gives you exactly one relationship, so you add one edge to the graph. (If one word is a prefix of the other, the pair gives no relationship.) Assuming the dictionary is correct, there should be no conflicts.
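The scanning step could be sketched in Java roughly like this. The class name `DictionaryEdges`, the method `build`, and the 26x26 boolean matrix are my own illustrative choices rather than anything from the answer, and the sketch assumes every word uses only lowercase 'a'..'z'.

```java
import java.util.List;

class DictionaryEdges {
    /**
     * Builds a 26x26 matrix where pointsTo[a][b] == true is the edge a -> b,
     * meaning letter b comes before letter a (the edge direction used above).
     * Assumption: words contain only lowercase 'a'..'z'.
     */
    static boolean[][] build(List<String> words) {
        boolean[][] pointsTo = new boolean[26][26];
        for (int w = 1; w < words.size(); w++) {
            String prev = words.get(w - 1);
            String cur = words.get(w);
            int limit = Math.min(prev.length(), cur.length());
            for (int i = 0; i < limit; i++) {
                if (prev.charAt(i) != cur.charAt(i)) {
                    // prev's letter precedes cur's letter: edge cur -> prev
                    pointsTo[cur.charAt(i) - 'a'][prev.charAt(i) - 'a'] = true;
                    break; // only the first difference carries information
                }
            }
            // If one word is a prefix of the other, no edge is added.
        }
        return pointsTo;
    }
}
```

For the sample dictionary "zebra", "apple", "cat", "crass" this sets exactly three edges: a->z, c->a and r->a.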

After you have finished the dictionary, output the alphabet as follows:

  1. Pick any vertex and follow its edges until you reach a character that points to nothing. This is the first character in the alphabet. Output it and delete it from the graph.
  2. Repeat step 1 until all vertices are deleted.
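Here is a minimal Java sketch of that output loop, under a few assumptions of mine: the graph is a 26x26 boolean matrix in which `pointsTo[a][b]` is the edge a -> b (b comes before a), the "random" pick is replaced by the lowest remaining letter so the result is repeatable, and a step counter guards against an inconsistent dictionary that contains a cycle. The names `PathOrder` and `order` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PathOrder {
    /** Returns the requested lowercase letters in precedence order. */
    static List<Character> order(boolean[][] pointsTo, List<Character> letters) {
        boolean[] deleted = new boolean[26];
        List<Character> alphabet = new ArrayList<>();
        for (int remaining = 26; remaining > 0; remaining--) {
            int v = 0;
            while (deleted[v]) v++; // deterministic stand-in for "pick a random vertex"
            int steps = 0;
            int next;
            // Follow edges until v points to no undeleted vertex.
            while ((next = undeletedTarget(pointsTo, v, deleted)) != -1) {
                if (++steps > 26) {
                    throw new IllegalStateException("cycle: dictionary is inconsistent");
                }
                v = next;
            }
            deleted[v] = true;
            alphabet.add((char) ('a' + v));
        }
        alphabet.retainAll(letters); // keep only the requested letters, in alphabet order
        return alphabet;
    }

    private static int undeletedTarget(boolean[][] pointsTo, int v, boolean[] deleted) {
        for (int t = 0; t < 26; t++) {
            if (pointsTo[v][t] && !deleted[t]) return t;
        }
        return -1;
    }
}
```

With only the edges a->z, c->a and r->a set, `order(pointsTo, List.of('a', 'z', 'r'))` returns z, a, r. Letters with no edges at all end up wherever the pick order happens to put them.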

EDIT: To better explain this algorithm, let's run it on your sample input.

Input: {"zebra", "apple", "cat", "crass"}

Comparing word 0 and word 1, we immediately know that z comes before a, so we add the edge a->z.

Comparing word 1 and word 2, we immediately know that a comes before c, so we add another edge, c->a.

Comparing word 2 and word 3, the first letters are the same ("c") but the second ones differ, so we learn that a comes before r and add another edge, r->a.

Now all the words have been read. To output the order, pick a vertex at random (say c); the graph gives the path c->a->z. Output z and delete it from the graph (mark it as NULL). Pick another vertex (say r); the graph gives r->a. Output a and delete it from the graph. Pick another one (say c again); it no longer points to anything, so output c and delete it. The last vertex, r, also points to nothing, so output r and delete it. All vertices are now deleted, so the algorithm is done.

The output is z, a, c, r. The relative order of "c" and "r" is arbitrary, since the input tells us nothing about their relationship.

HelloWorld answered Jan 21 '26 12:01


From the fact that "zebra" < "apple" < "cat" < "crass", an efficient way to derive the per-character relationships is to have a loop consider the Nth character of all words, where N is initially 0, yielding the relationships "z" < "a" < "c". That loop can recursively extract relationships for the (N + 1)th character from groups of words with the same prefix (i.e. the same text in positions <= N). Doing that for N == 1 with the same-prefixed "cat" and "crass" yields the relationship "a" < "r".
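A rough Java sketch of that column-by-column recursion follows. The names `ColumnRelations`, `relations` and `collect` are made up for illustration, each relationship is returned as a two-letter string "xy" meaning x < y, and the sketch assumes that a word which is a prefix of its neighbours is listed before them.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnRelations {
    /** Collects two-letter strings "xy", each meaning x < y (x precedes y). */
    static List<String> relations(List<String> words) {
        List<String> out = new ArrayList<>();
        collect(words, 0, words.size(), 0, out);
        return out;
    }

    // Words in [lo, hi) share the same first n characters.
    private static void collect(List<String> words, int lo, int hi, int n, List<String> out) {
        // Words that end at position n have no character to compare; skip them.
        // Assumption: such prefix words come first within their group.
        while (lo < hi && words.get(lo).length() <= n) lo++;
        int groupStart = lo;
        for (int i = lo + 1; i <= hi; i++) {
            if (i < hi && words.get(i).charAt(n) == words.get(groupStart).charAt(n)) {
                continue; // still inside a run of words sharing character n
            }
            // [groupStart, i) is a maximal run sharing character n: recurse on column n + 1.
            if (i - groupStart > 1) collect(words, groupStart, i, n + 1, out);
            // The run's character precedes the character that starts the next run.
            if (i < hi) out.add("" + words.get(groupStart).charAt(n) + words.get(i).charAt(n));
            groupStart = i;
        }
    }
}
```

For the sample dictionary this yields "za", "ac" and "ar", i.e. z < a, a < c and a < r.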

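We can represent known relationships in a 2-dimensional array of x < y truth values: Y means x < y is known to be true, N means it is known to be false, and a blank means the relationship is unknown.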

y\x  a  b  c  ...  r  ...  z
a    -     N       N       Y
b       -
c    Y     -               Y
r    Y             -
z    N     N               -

The brute-force approach is to iterate over all pairs of characters in the input list (i.e. {a, z, r} -> az, ar, zr), looking up the table for a<z, a<r, z<r: if any of these is false, swap the two characters and restart the whole shebang. When you make it through a full pass without having to swap any characters, the output is sorted consistently with the rules. This is a bit like doing a bubble sort.
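As a sketch only, the swap-and-restart loop might look like this in Java. I'm assuming the table is a 26x26 boolean array where `less[x][y]` means letter x is known to precede letter y, and that a pair is swapped only when the table records the reverse order as known, so blank (unknown) cells never trigger a swap. The names `TableSort` and `sort` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TableSort {
    /** less[x][y] == true means letter x is known to precede letter y (0..25 for a..z). */
    static List<Character> sort(List<Character> input, boolean[][] less) {
        List<Character> chars = new ArrayList<>(input);
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            // Check every pair; after any swap, restart from the beginning.
            for (int i = 0; i < chars.size() && !swapped; i++) {
                for (int j = i + 1; j < chars.size() && !swapped; j++) {
                    char x = chars.get(i);
                    char y = chars.get(j);
                    if (less[y - 'a'][x - 'a']) { // y is known to precede x: wrong order
                        chars.set(i, y);
                        chars.set(j, x);
                        swapped = true;
                    }
                }
            }
        }
        return chars;
    }
}
```

Each swap fixes a pair that is out of order with respect to any alphabet consistent with the table, so the loop terminates as long as the table has no contradictions. Note that it only sees relationships actually recorded in the table: if the only link between two listed characters runs through a character that is not in the list, the table needs that implied relationship filled in first.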

To make this faster, we can be more proactive about populating cells in our table for implied relationships: for example, we know "z" < "a" < "c" and "a" < "r", so we deduce that "z" < "r". We could do this by running through the "naive" table above to find everything we know about each character (e.g. that z<a and z<c), then running through what we know about a and c. To avoid excessively deep trees, you could just follow one level of indirection like this, then repeat until the table is stable.
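One way to sketch the "one level of indirection, repeat until stable" idea in Java is below. Again, `less[x][y]` meaning "x is known to precede y" is my assumed table layout, and `ImpliedRelations.close` is an illustrative name.

```java
class ImpliedRelations {
    /** Sets less[x][z] wherever less[x][y] and less[y][z] already hold, repeating until stable. */
    static void close(boolean[][] less) {
        int n = less.length;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int x = 0; x < n; x++) {
                for (int y = 0; y < n; y++) {
                    if (!less[x][y]) continue;
                    for (int z = 0; z < n; z++) {
                        // x < y and y < z imply x < z
                        if (less[y][z] && !less[x][z]) {
                            less[x][z] = true;
                            changed = true;
                        }
                    }
                }
            }
        }
    }
}
```

Starting from z < a, a < c and a < r, this fills in z < c and z < r, while c and r stay unrelated.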

Tony Delroy answered Jan 21 '26 12:01


