Find strings which are prefixes of other strings

Question

This is an interview question. Given a number of strings find such strings, which are prefixes of others. For example, given strings = {"a", "aa", "ab", abb"} the result is {"a", "ab"}.

The simplest solution is just to sort the strings and check each pair of two subsequent strings if the 1st one is a prefix of the 2nd one. The running time of the algorithm is the running time of the sorting.

I guess there is another solution, which uses a trie, and has complexity O(N), where N is the number of strings. Could you suggest such an algorithm?

Martinsos · Accepted Answer

I have a following idea regarding Trie, complexity O(N): You start with empty Trie. You take words one by one, and add word to Trie. After you add a word (let's call it word Wi) to Trie, there are two cases to consider:

Wi is prefix of some of the words you added before. That statement is true if you didn't add any nodes to Trie while adding word Wi. In that case, Wi is prefix and part of our solution.
Some of the words added before are prefix of Wi. That statement is true if you passed through node that represents end of some word added before (let's cal that word Wj). In that case, Wj is prefix of Wi and part of our solution.

In more details (pseudocode):

for word in words
    add word to trie
    if size of trie did not change then   // first case
        add word to result
    if ending nodes found while adding word   // second case
        add words defined by those nodes to result
return result

Adding new word to Trie:

node = trie.root();
for letter in word
    if node.hasChild(letter) == false then   // if letter doesnt exist, add it
        node.addChild(letter)
    if letter is last_letter_of_word then   // if last letter of word, store that info
        node.setIsLastLetterOf(word)
    node = node.getChild(letter)    // move

While you are adding new word, you can also check if you passed through any nodes that represent last letters of other words. Complexity of algorithm that I described is O(N). Another important thing is that this way you can know how many times word Wi prefixes other words, which may be useful.

Example for {aab, aaba, aa}: Green nodes are nodes detected as case 1. Red nodes are nodes detected as case 2. Each column(trie) is one step. At the beginning trie is empty. Black arrows show which nodes we visited(added) in that step. Nodes that represent last letter of some word have that word written in parenthesess.

enter image description here

In step 1 we add word aab.
In step 2 we add word aaba, recognize one case 2 (word aab) and add word aab to result.
In step 3 we add word aa, recognize case 1 and add word aa to result.

At the end we have result = {aab, aa} which is correct.

amit · Answer

The original answer is correct for: is a string a a substring of b (misread).

Using a trie, you can simply add all strings to it in a first iteration, and in the 2nd iteration, start reading each word, let it be w. If you find a word that you finished your read, but did not reach the string terminator ($ usually), you reach some node v in the trie.
By doing a DFS from v, you can get all strings which w is prefix of them.

high level pseudo code:

t <- new trie
for each word w:
   t.add(w)
for each word w:
  node <- t.getLastNode(w)
  if node.val != $
     collection<- DFS(node) (excluding w itself)
     w is a prefix of each word in collection

Note: in order to optimize it, you might need to do some extra work: if a is prefix of b, and b is prefix of c, then a is prefix of c, so - when you do the DFS, if you reach some node that was already searched - just append its strings to the current prefix.
Still, since there could be quadric number of possibilities ("a", "aa", "aaa", .... ), getting all of them requires quadric time.

Original answer: finding if a is a substring of b:

The suggested solution runs in a quadric complexity, you will need to check each two pairs, giving you O(n* (n-1) * |S|).

You can build a suffix tree from the strings in the first iteration, and in the 2nd iteration check if each string is a non trivial entry (not itself) of another string.
This solution is O(n*|S|)

Find strings which are prefixes of other strings

Tags:

language-agnostic

algorithm

data-structures

trie

Michael

2 Answers

Martinsos

amit

Recent Activity

Donate For Us

Find strings which are prefixes of other strings

Tags:

language-agnostic

algorithm

data-structures

trie

Michael

2 Answers

Martinsos

amit

Related questions

Recent Activity

Donate For Us