Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trie vs red-black tree: which is better in space and time?

Tries and red-black trees are very efficient for storing strings.

Which has better time complexity? How about space complexity?

like image 537
xuh5156 Avatar asked Jan 18 '13 10:01

xuh5156


1 Answers

This depends on many factors, such as which strings you are storing and how the trie is represented.

In a red-black tree, you will need to do O(log n) string comparisons on each operation (insertion, deletion, lookup, etc.). The cost of a comparison is small if you compare two strings that don't have a common prefix, or if you compare two strings where on string is small. The worst-case for a comparison is when one string is a prefix of another, in which case all the characters have to be read. Consequently, if you wanted to look up a string of length L in a red-black tree of strings, in the worst case you would do O(L log n) work by making O(log n) comparisons, each of which look at all the characters from the input string. However, in the best case this would require only O(log n) time, which happens if the comparisons always immediately fail.

In terms of space usage, the red/black tree would need two pointers per node and one node per string. (The red/black bit can usually be packed into the low bits of a pointer). Thus the total space is 2n + (total length of all strings).

In a trie, insertions, deletions, and lookups take O(L) in the worst-case (if all characters of the input must be considered) and O(1) in the best case (if you fall off the trie early). This is faster than the red-black tree by a factor of O(log n), which could potentially be significant for large collections. However, the trie has worse locality, since there is a lot more pointer-chasing involved and no contiguous arrays of characters to scan.

In terms of memory usage, a trie with an alphabet of size k typically needs a total of kn pointers, where n is the number of nodes. This can be dramatically worse than the red/black tree if the alphabet size is large. However, that space overhead can be reduced by compressing the trie using the Patricia tree representation, using a more efficient data structure to store child pointers, or building a DAWG from the trie.

Hope this helps!

like image 81
templatetypedef Avatar answered Sep 27 '22 23:09

templatetypedef