Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

STLish lower_bound function for Radix/Patricia Trie

Lately I've been studying Patricia tries, and working with a really good C++ implementation which can be used as an STL Sorted Associative Container. Patricia tries differ from normal binary trees because leaf nodes have back pointers which point back to internal nodes. Nonetheless, it's possible to traverse a Patricia trie in alphabetical order by doing an in-order traversal, if you only visit internal nodes through leaf-node back pointers.

Which brings me to the question: is it possible to implement the STL lower_bound and upper_bound functions with a Patricia trie? The implementation I'm using does in fact, implement these functions, but they don't work as expected.

For example:

typedef uxn::patl::trie_set<std::string> trie;
trie ts;
ts.insert("LR");
ts.insert("BLQ");
ts.insert("HCDA");

trie::iterator it = ts.lower_bound("GG");
std::cout << *it << std::endl;

This outputs BLQ, when I would expect it to output HCDA. (An std::set, for example, would certainly output HCDA here.)

I emailed the developer who made this library, but never got a response. Regardless, I feel I have a pretty good understanding of how Patricia tries work, and I can't figure out how something like lower_bound would even be possible. The problem is that lower_bound seems to rely on the ability to lexicographically compare the two strings. Since "GG" doesn't exist in the tree, we'd need to find out which element is >= to GG. But Radix/Patricia tries don't use lexicographical comparison to move from node to node; rather each node stores a bit index which is used to perform a bit comparison on the search key. The result of the bit comparison tells you whether to move left or right. This makes it easy to find a particular prefix in the tree. But if the prefix doesn't exist in the tree, (as in the case of my search for "GG"), there doesn't seem to be any way, short of a lexicographical comparison, to get the lower_bound.

The fact that the C++ implementation I'm using doesn't seem to implement lower_bound properly confirms my suspicion that it may not be possible. Still, the fact that you can iterate over the tree in alphabetical order makes me think there might be a way to do it.

Does anyone have experience with this, or know if it is possible to implement a lower_bound functionality with a Patricia Trie?

like image 936
Channel72 Avatar asked Sep 20 '10 14:09

Channel72


1 Answers

Yes, it is possible. I have implemented a variant which does this, and D. J. Bernstein's page describes that as one of the fast operations.

http://cr.yp.to/critbit.html

In principle, you keep matching the prefix until you can't match any more, and then you go to the next value, and there's the node you're after.

like image 72
janm Avatar answered Oct 05 '22 01:10

janm