Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the longest word in a trie?

I'm having trouble understanding the concept of a trie. From the "trie" wikipedia entry I have this picture: enter image description here

If I see this correctly, all leaf nodes in a trie will have the entire word spelled out and all parent nodes hold the characters leading up the the final leaf node. So, if I have a class called DigitalTreeNode defined by

public class DigitalTreeNode {
       public boolean isAWord;
       public String wordToHere; (compiles all the characters in a word together)
       public Map<String, DTN> children;
}

If I wanted to implement a method that returns the longest word in the trie would it simply involve finding the longest word at each leaf node? How would I implement a method such as:

public static String longestWord (DigitalTreeNode d);

I'm guessing it involves setting up a longest String variable, recursively going through each node and checking if it is a word, if it is a word and it's length is greater than the longest variable then longest = newWordLength . But, I'm not sure how the Map children fits in. How would I find the longest word in any trie using the method above?

like image 241
user1766888 Avatar asked Nov 02 '12 08:11

user1766888


People also ask

How do you find the longest string in trie?

The easiest way is to store somewhere a max_string , and each time you add a string to the trie, just compare the length of your new_string and the length of the max_string , and, if the new length is larger, then assign max_string = new_string .

What is the time complexity of trie?

The complexity of creating a trie is O(W*L) , where W is the number of words, and L is an average length of the word: you need to perform L lookups on the average for each of the W words in the set.

What is the time complexity to search if a given word is in a trie or not?

A trie is a data structure used for efficient retrieval of data associated with keys. If key is of length n, then using trie worst case time complexity for searching the record associated with this key is O(n).


1 Answers

The leaf nodes do not contain the entire string usually (though they can), many time in a trie, a leaf node just contains a '$' sign to indicate this is the end of the string.

To find the longest word in a trie you can use a BFS on the tree, to first find the "last" leaf. The "last leaf" is the last element that was popped out of the BFS queue (after it was popped the algorithm of BFS stopped with an empty queue).
To get the actual word from this leaf you will need to go up from the leaf back to the root. This thread discussed how it can be done.

This solution is O(|S| * n), where |S| is the average length of a string, and n is the number of string in the DS.

If you can manipulate the trie DS, I assume it can be done better (but it seems not to be the issue in this question)

Pseudo code:

findLongest(trie):
  //first do a BFS and find the "last node"
  queue <- []
  queue.add(trie.root)
  last <- nil
  map <- empty map
  while (not queue.empty()):
     curr <- queue.pop()
     for each son of curr:
        queue.add(son)
        map.put(son,curr) //marking curr as the parent of son
     last <- curr
  //in here, last indicate the leaf of the longest word
  //Now, go up the trie and find the actual path/string
  curr <- last
  str = ""
  while (curr != nil):
      str = curr + str //we go from end to start   
      curr = map.get(curr)
  return str
like image 176
amit Avatar answered Oct 15 '22 17:10

amit