Prefix search in a radix tree/patricia trie

Tags: c++, algorithm, prefix, patricia-trie

I'm currently implementing a radix tree/patricia trie (whatever you want to call it). I want to use it for prefix searches in a dictionary on a severely underpowered piece of hardware. It's supposed to work more or less like auto-completion, i.e., showing a list of all words that start with the typed prefix.

My implementation is based on this article, but the code therein doesn't include prefix searches, though the author says:

[...] Say you want to enumerate all the nodes that have keys with a common prefix "AB". You can perform a depth first search starting at that root, stopping whenever you encounter back edges.

But I don't see how that is supposed to work. For example, if I build a radix tree from these words:

illness
imaginary
imagination
imagine
imitation
immediate
immediately
immense
in

I will get the exact same "best match" for the prefixes "i" and "in" so that it seems difficult to me to gather all matching words just by traversing the tree from that best match.

Additionally, there is a radix tree implementation in Java that has an implemented prefix search in RadixTreeImpl.java. That code explicitly checks all nodes (starting from a certain node) for a prefix match - it actually compares bytes.

Can anyone point me to a detailed description of how to implement a prefix search on radix trees? Is the algorithm used in the Java implementation the only way to do it?

asked Apr 27 '09 18:04

j66k

2 Answers

Think about what your trie encodes. Each node stands for the path that led you to it, so in your example you start at Λ (that's a capital Lambda, this Greek font kind of sucks), the root node, which corresponds to the empty string. Λ has a child for each first letter used, so in your data set there is one branch, for "i".

Λ
Λ→"i"

At the "i" node, there are three children, one each for "l", "m" and "n". The next letter of the prefix is "n", so you take that branch:

Λ→"i"→"n"

and since the only word that starts "i","n" in your data set is "in", there are no children from "n". That's a match.

Now, let's say the data set, instead of having "in", had "infindibulum". (What SF I'm referencing is left as an exercise.) You'd still get to the "n" node the same way, but then if the next letter you get is "q", you know the word doesn't appear in your data set at all, because there's no "q" branch. At that point, you say "okay, no match." (Maybe you then start adding the word, maybe not, depending on the application.)
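
The walk described so far can be sketched in C++ roughly like this. It is a minimal sketch of a plain one-letter-per-edge trie, not the code from the article or the Java implementation mentioned in the question; `TrieNode`, `insert` and `walk` are names made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical node type: one character per edge, as in the walk above.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool is_word = false;  // true if the path from the root spells a stored word
};

// Insert a word by creating any missing branches along its path.
void insert(TrieNode& root, const std::string& word) {
    TrieNode* node = &root;
    for (char c : word) {
        std::unique_ptr<TrieNode>& child = node->children[c];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->is_word = true;
}

// Follow the prefix one letter at a time. Returns the node that was reached,
// or nullptr as soon as a letter has no branch ("okay, no match").
const TrieNode* walk(const TrieNode& root, const std::string& prefix) {
    const TrieNode* node = &root;
    for (char c : prefix) {
        auto it = node->children.find(c);
        if (it == node->children.end()) return nullptr;
        node = it->second.get();
    }
    return node;
}
```

With "infindibulum" stored, `walk(root, "in")` returns the "n" node, while `walk(root, "inq")` returns `nullptr` because there is no "q" branch.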

But if the next letter is "f", you can keep going. You can short circuit that with a little craft, though: once you reach a node that represents a unique path, you can hang the whole string off that node. When you get to that node, you know that the rest of the string must be "findibulum", so you've used the prefix to match the whole string, and return it.
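
Here is a rough sketch of that shortcut taken all the way: every edge carries a whole substring, so a unique tail such as "findibulum" hangs off a single node. The prefix search then has to cope with a prefix that ends in the middle of an edge, and after that a depth-first walk collects everything below. `RadixNode`, `insert`, `collect_below` and `with_prefix` are hypothetical names, and this is only one possible layout, not the one used by the article or the Java code from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical compressed node: the edge into a node carries a whole substring.
struct RadixNode {
    std::string label;     // text on the edge leading into this node
    bool is_word = false;  // true if the path from the root spells a stored word
    std::map<char, std::unique_ptr<RadixNode>> children;  // keyed by first letter
};

void insert(RadixNode& root, const std::string& word) {
    RadixNode* node = &root;
    std::size_t pos = 0;
    while (pos < word.size()) {
        std::unique_ptr<RadixNode>& slot = node->children[word[pos]];
        if (!slot) {  // unique path from here on: hang the rest of the word here
            slot = std::make_unique<RadixNode>();
            slot->label = word.substr(pos);
            slot->is_word = true;
            return;
        }
        // Count how many letters of the edge label match the word.
        std::size_t k = 0;
        while (k < slot->label.size() && pos + k < word.size() &&
               slot->label[k] == word[pos + k]) ++k;
        assert(k > 0);  // the map key guarantees the first letter matches
        if (k < slot->label.size()) {  // diverges mid-edge: split the edge at k
            auto mid = std::make_unique<RadixNode>();
            mid->label = slot->label.substr(0, k);
            slot->label = slot->label.substr(k);
            char next = slot->label[0];
            mid->children[next] = std::move(slot);
            slot = std::move(mid);
        }
        node = slot.get();
        pos += k;
    }
    node->is_word = true;
}

// Depth-first walk: append every stored word at or below `node`.
void collect_below(const RadixNode& node, std::string& path,
                   std::vector<std::string>& out) {
    if (node.is_word) out.push_back(path);
    for (const auto& entry : node.children) {
        const RadixNode& child = *entry.second;
        path += child.label;
        collect_below(child, path, out);
        path.resize(path.size() - child.label.size());
    }
}

// All stored words that start with `prefix`, in lexicographic order.
std::vector<std::string> with_prefix(const RadixNode& root,
                                     const std::string& prefix) {
    std::vector<std::string> out;
    const RadixNode* node = &root;
    std::string path;
    std::size_t pos = 0;
    while (pos < prefix.size()) {
        auto it = node->children.find(prefix[pos]);
        if (it == node->children.end()) return out;  // no such branch
        const std::string& label = it->second->label;
        std::size_t n = std::min(label.size(), prefix.size() - pos);
        if (label.compare(0, n, prefix, pos, n) != 0) return out;  // mismatch on the edge
        path += label;  // the prefix may end mid-edge; the whole edge still lies below it
        node = it->second.get();
        pos += n;
    }
    collect_below(*node, path, out);
    return out;
}
```

For the word list in the question, `with_prefix(root, "i")` yields all nine words and `with_prefix(root, "in")` yields only "in". Byte comparisons happen only along the path of the prefix itself, never on the nodes collected afterwards.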

How would you use that? In a lot of non-UNIX command interpreters, like the old VAX DCL, you could use any unique prefix of a command. The equivalent of ls(1) was DIRECTORY, but no other command started with DIR, so you could type DIR and that was as good as typing the whole word. If you couldn't remember the correct command, you could type just 'D' and hit (I think) ESC; the DCL CLI would return all the commands that started with D, which it could search extremely fast.
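
A unique-prefix lookup of that kind might look something like the sketch below. It keeps a count of stored commands under every node, so ambiguity can be detected without enumerating anything. This is not how DCL actually did it (I don't know its internals); `CmdNode`, `add_command` and `resolve` are invented names, the command set other than DIRECTORY is just illustrative, and it assumes each command is added only once.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical node: a plain trie plus a count of the commands below each node.
struct CmdNode {
    std::map<char, std::unique_ptr<CmdNode>> children;
    int commands_below = 0;   // commands stored at or below this node
    bool is_command = false;  // true if the path from the root is a full command
};

// Assumes every command is added exactly once, otherwise the counts are off.
void add_command(CmdNode& root, const std::string& name) {
    CmdNode* node = &root;
    ++node->commands_below;
    for (char c : name) {
        std::unique_ptr<CmdNode>& child = node->children[c];
        if (!child) child = std::make_unique<CmdNode>();
        node = child.get();
        ++node->commands_below;
    }
    node->is_command = true;
}

// Returns the full command for an unambiguous abbreviation,
// or an empty string if `typed` is unknown or ambiguous.
std::string resolve(const CmdNode& root, const std::string& typed) {
    const CmdNode* node = &root;
    for (char c : typed) {
        auto it = node->children.find(c);
        if (it == node->children.end()) return "";  // no command starts like this
        node = it->second.get();
    }
    if (node->is_command) return typed;         // choice made here: an exact name wins
    if (node->commands_below != 1) return "";   // several candidates: ambiguous
    std::string full = typed;
    while (!node->is_command) {                 // only one way down from here
        assert(!node->children.empty());
        const auto& only = *node->children.begin();
        full += only.first;
        node = only.second.get();
    }
    return full;
}
```

With DIRECTORY, DELETE, DEFINE and COPY stored, `resolve(root, "DIR")` gives "DIRECTORY", while `resolve(root, "D")` gives an empty string because three commands share that prefix.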

answered Oct 12 '22 11:10

Charlie Martin

It turns out the GNU extensions for the standard C++ library include a Patricia trie implementation. It's found under the policy-based data structures extension. See http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html
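
As a rough idea of what a prefix search with that container can look like, here is a small sketch. It assumes a reasonably recent GCC with libstdc++ (older releases spell `null_type` differently), and `WordTrie` and `words_with_prefix` are names I made up for the example; the `prefix_range` member comes from the `trie_prefix_search_node_update` policy.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/trie_policy.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// A PATRICIA trie of strings used as a set (null_type means "no mapped value").
// The node-update policy is what adds prefix_range() to the container.
typedef __gnu_pbds::trie<
    std::string,
    __gnu_pbds::null_type,
    __gnu_pbds::trie_string_access_traits<>,
    __gnu_pbds::pat_trie_tag,
    __gnu_pbds::trie_prefix_search_node_update> WordTrie;

// Hypothetical helper: copy every key that starts with `prefix` into a vector.
std::vector<std::string> words_with_prefix(const WordTrie& trie,
                                           const std::string& prefix) {
    typedef WordTrie::const_iterator const_iterator;
    std::pair<const_iterator, const_iterator> range = trie.prefix_range(prefix);
    std::vector<std::string> out;
    for (const_iterator it = range.first; it != range.second; ++it)
        out.push_back(*it);
    return out;
}
```

After inserting the question's word list with `trie.insert(...)`, `words_with_prefix(trie, "imag")` should return the three "imag..." words.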

answered Oct 12 '22 13:10

TG.
