
Recommended data structure for designing something like a dictionary?

Is a trie the recommended data structure when designing something like a dictionary for storing words? Are there alternatives that improve either the time or the memory performance?

I believe a hash table may be good if there are no collisions, but then its memory requirements get bad for overlapping words: over, overlap, overlaps, overlapped and overlapping each occupy their own storage, whereas a trie could share the space for the common prefix.
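To put a rough number on the prefix sharing described above, here is a minimal sketch that counts the character nodes a trie needs for those five words and compares that with the total number of characters stored separately. The function name `count_trie_nodes` and the `"$"` end-of-word marker are illustrative assumptions, and a node count is not the same as bytes: each node also carries per-node overhead such as child pointers.

```python
def count_trie_nodes(words):
    """Build a dict-of-dicts trie and return how many character nodes it holds."""
    root = {}
    nodes = 0
    for word in words:
        node = root
        for ch in word:
            if ch not in node:
                node[ch] = {}
                nodes += 1
            node = node[ch]
        # End-of-word marker; assumes "$" never appears inside a word.
        node["$"] = True
    return nodes


words = ["over", "overlap", "overlaps", "overlapped", "overlapping"]
total_chars = sum(len(w) for w in words)   # characters if every word is stored separately
trie_nodes = count_trie_nodes(words)       # character nodes when prefixes are shared

print(total_chars, trie_nodes)  # prints "40 14"
```

Whether 14 nodes actually take less memory than 40 characters depends on how large a node is in the chosen representation.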

EDIT: Thanks @Moron and all of you for the very useful answers. I agree: generating the hash key is O(n) in the length of the word, and so is a trie search. However, for a hash table things can get worse, because chaining adds to the lookup time, while this cannot happen in a trie. My remaining concern is that every node in a trie has to keep pointers to its children, which may inflate memory usage when the dictionary is small.

Fanatic23 asked May 03 '26 01:05



2 Answers

A trie has the following advantages over a hash table:

  1. Looking up data in a trie takes O(m) time in the worst case, where m is the length of the key, which is faster than the worst case of an imperfect hash table. An imperfect hash table can have key collisions, that is, the hash function can map different keys to the same position in the table. Its worst-case lookup time is O(N), where N is the number of stored keys, although O(1) is far more typical, with O(m) time spent evaluating the hash.
  2. There are no collisions of different keys in a trie.
  3. Buckets in a trie, which are analogous to the hash table buckets that store colliding keys, are necessary only if a single key is associated with more than one value.
  4. There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
  5. A trie can provide an alphabetical ordering of the entries by key.
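
As an illustration of points 1, 2 and 5, here is a minimal trie sketch. The class and method names (`TrieNode`, `Trie`, `insert`, `contains`, `words`) are my own choices for the example and do not come from any particular library. Lookup walks at most one node per character of the key, and iterating over the children in sorted order yields the stored words alphabetically.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # maps a character to the child TrieNode
        self.is_word = False   # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_word = True

    def contains(self, word):
        """O(m) lookup, where m = len(word); no hash function, no collisions."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def words(self):
        """Yield the stored words in alphabetical (code point) order."""
        def walk(node, prefix):
            if node.is_word:
                yield prefix
            for ch in sorted(node.children):
                yield from walk(node.children[ch], prefix + ch)
        yield from walk(self.root, "")


trie = Trie()
for w in ["overlaps", "over", "overlapping", "overlap", "overlapped"]:
    trie.insert(w)

print(trie.contains("overlap"))   # True
print(trie.contains("overl"))     # False: a prefix, but not a stored word
print(list(trie.words()))
```

Using a `dict` for the children keeps the sketch short; a fixed-size array per node would trade memory for faster child access.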

Tries have the following drawbacks:

  1. Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random access time is high compared to main memory.
  2. Not all keys are easy to represent as strings. Floating point numbers are one example: a straightforward encoding that uses their bit representation leads to long chains and prefixes that are not particularly meaningful.

If the drawbacks are something that you can live with, I'd suggest going with the trie.

Source: Wikipedia: Trie#As a replacement of other data structures

Vivin Paliath answered May 05 '26 19:05



You could consider a Directed Acyclic Word Graph (DAWG), which is basically a trie with better memory usage. According to Wikipedia, for English its memory consumption is much lower than that of a trie.
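
To show where the saving comes from, here is a rough sketch that builds an ordinary trie and then merges identical subtrees, so that shared suffixes are stored once as well as shared prefixes. This is only one simple way to get a DAWG-like structure (real implementations usually build it incrementally), and the names `Node`, `build_trie`, `minimize`, `count_nodes` and `contains` are hypothetical.

```python
class Node:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # maps a character to the child Node
        self.is_word = False  # True if a stored word ends here


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
        node.is_word = True
    return root


def minimize(node, registry):
    """Bottom-up: replace every subtree by one canonical, shared copy."""
    for ch, child in list(node.children.items()):
        node.children[ch] = minimize(child, registry)
    # Two nodes are equivalent if they agree on is_word and point to the
    # same canonical children under the same characters.
    signature = (node.is_word,
                 tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
    return registry.setdefault(signature, node)


def count_nodes(root):
    """Count distinct nodes reachable from root (shared nodes count once)."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.children.values())
    return len(seen)


def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


words = ["tap", "taps", "top", "tops"]
trie_root = build_trie(words)
nodes_before = count_nodes(trie_root)      # 8 nodes, including the root
dawg_root = minimize(trie_root, {})
nodes_after = count_nodes(dawg_root)       # 5 nodes: "ap(s)" and "op(s)" share a tail
print(nodes_before, nodes_after)
```

Lookup is unchanged after merging: it still follows one edge per character of the word.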

Time-wise, it behaves like a trie and is likely better than a hash table. I am not sure where you got the O(log n) time for a hash table. For reasonable hash functions it should be O(n), where n is the length of the word being searched.
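
To make the O(n) hashing cost concrete, here is a small sketch of a polynomial string hash together with a toy chained hash set. The names `poly_hash` and `ChainedSet`, and the constants `base` and `mod`, are arbitrary choices for illustration, not what any particular library does. The hash loop reads every character of the word once, and when keys collide the lookup additionally has to compare against the entries in the chain.

```python
def poly_hash(word, base=31, mod=2**61 - 1):
    """Polynomial string hash; the loop visits each character exactly once."""
    h = 0
    for ch in word:
        h = (h * base + ord(ch)) % mod
    return h


class ChainedSet:
    """Toy hash set with separate chaining (illustrative, not tuned)."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _chain(self, word):
        return self.buckets[poly_hash(word) % len(self.buckets)]

    def add(self, word):
        chain = self._chain(word)
        if word not in chain:
            chain.append(word)

    def __contains__(self, word):
        # Cost: O(n) to hash the word, plus comparisons along the chain.
        return word in self._chain(word)

    def longest_chain(self):
        return max(len(chain) for chain in self.buckets)


# With a single bucket every key collides, which is the worst case for chaining.
worst = ChainedSet(buckets=1)
for w in ["over", "overlap", "overlaps", "overlapped", "overlapping"]:
    worst.add(w)

print("overlap" in worst, worst.longest_chain())
```

With a reasonable number of buckets and a decent hash the chains stay short, which is why the typical lookup cost is dominated by computing the hash.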