One of my favourite data structures in college was the Trie. It's a great data structure for holding a large set of strings if the prefixes are shared. Lookups are also nice, since they are done at O(|length|) of the string regardless of how many strings are in the set.
By comparison, a balanced tree would be O(log N) in the number of set items, plus whatever you pay for comparisons. A hash table would involve the hash calculation, comparison, etc.
It is therefore surprising to me that there is no Trie implementation in the standard library of most languages.
The only reason I could come up with was the possibility that memory access costs make it too expensive. Rather than investigating O(log N) locations if doing a tree lookup, you are not doing O(|length|) different locations, with all the consequences. If the strings are long, this could end up being way too much.
So I'm wondering: how much is what I just described a concern? What do you do when you need to store a large set or map of strings?
I hadn't thought of this as an area of concern before, but now that you mention it, there are times when a standard Trie implementation might be handy. On the other hand, as far as I know, Tries are used by Python and Perl and other string-savvy languages that I use now.
Last I checked, which was ages ago, the BSD kernel code used Tries (Patricia Tries) in the code to select the best interface for sending packets. Looks like Wikipedia has some info.
You could just build two sample apps and see which one performs better. Memory access is cheap assuming you don't page fault. Then it's very expensive. For client application development, its almost always better to process than to access memory for this very reason. Modern processors are ridiculously fast, but cache misses still hurt.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With