Is a Trie a K-ary tree?

Tags:

c++

algorithm

data-structures

tree

trie

If you look at the node definitions for a simple Trie and a simple K-ary tree, they look the same.

(using C++ notation)

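#include <cstddef>  // std::size_t

template <std::size_t K>
struct trieNode
{
    trieNode *children[K];  // one slot per possible child
};

template <std::size_t K>
struct KaryNode
{
    KaryNode *children[K];  // one slot per possible child
};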

At its simplest, a K-ary tree has multiple children per node (2 for a binary tree).

A trie also has "multiple children per node".

It seems that a K-ary tree makes its choice of child based on comparison (< or >) of keys,

while a trie makes its choice of child based on (unary) equality of sub-spans of the key.

Since neither data structure has made it into any standard, what would be the best definition of each, and how would they be differentiated?

asked Jan 10 '14 04:01

Glenn Teitelbaum

1 Answer

From the point of view of the shape of the data structure, a trie is clearly an N-ary tree, in the same way that a balanced binary search tree is a binary tree. The difference lies in how the data structure manages the data.

A binary search tree is a binary tree with the additional constraint that the keys in the nodes are ordered. A balanced binary tree adds on top of that a constraint on the difference between the lengths of different branches.

Similarly, a trie is an N-ary tree with additional constraints that determine how the keys are managed.
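To make the contrast concrete, here is a rough sketch of how each structure might pick a child during a descent. The names `KarySearchNode`, `TrieNode`, `searchTreeChild` and `trieChild` are made up for illustration, and the search-tree node assumes a B-tree-like layout with K-1 sorted separator keys, which is only one of several ways to build a K-ary search tree.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical K-ary search-tree node: up to K-1 sorted keys separate K children.
template <std::size_t K>
struct KarySearchNode {
    static_assert(K >= 2, "a search node needs at least two children");
    std::array<int, K - 1> keys{};               // sorted separator keys
    std::size_t used = 0;                        // number of separators in use
    std::array<KarySearchNode*, K> children{};   // all null initially
};

// Comparison-based choice: skip every separator that is less than the key.
template <std::size_t K>
std::size_t searchTreeChild(const KarySearchNode<K>& node, int key) {
    assert(node.used <= K - 1);
    std::size_t i = 0;
    while (i < node.used && node.keys[i] < key) ++i;
    return i;
}

// Hypothetical trie node over an alphabet of K symbols.
template <std::size_t K>
struct TrieNode {
    std::array<TrieNode*, K> children{};         // all null initially
};

// Index-based choice: the current key element is itself the child index.
template <std::size_t K>
std::size_t trieChild(const TrieNode<K>&, std::size_t symbol) {
    assert(symbol < K);
    return symbol;
}
```

Both nodes have the same shape (an array of K child pointers), but `searchTreeChild` has to look at the keys stored in the node, while `trieChild` only needs the next element of the key being looked up.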

Let's try a definition of what a trie is:

A trie is an efficient data structure used to implement a dictionary whose keys are sequences, ordered lexicographically. The implementation uses an N-ary tree where the branching factor is the number of valid values for each element in the key sequence¹, and each node may or may not hold a value but always holds a subsequence of the key being stored². For each node in the tree, the concatenation of the key subsequences stored in the nodes along the path from the root to that node represents the key for the value stored there, if the node holds a value, and/or a common prefix for all nodes in its subtree.

This layout of data allows lookups in time linear in the length of the key, and sharing prefixes allows for compact representations of many natural languages (like Spanish, where the different forms of each verb differ only in the last few characters).
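As an illustration of that definition, a minimal sketch follows. It is not a reference implementation: it assumes C++17, keys made only of lowercase ASCII letters (so the branching factor is 26), `int` values, and one character per node. The class name `Trie` and its members `insert`, `find` and `nodeCount` are invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string_view>

// Minimal sketch: keys are lowercase ASCII words, one character per step.
class Trie {
    static constexpr std::size_t K = 26;     // branching factor = alphabet size

    struct Node {
        std::optional<int> value;            // set only if a key ends here
        std::array<std::unique_ptr<Node>, K> children{};
    };

    Node root_;
    std::size_t nodes_ = 1;                  // counts the root

    static std::size_t index(char c) {
        assert(c >= 'a' && c <= 'z');        // assumption: lowercase keys only
        return static_cast<std::size_t>(c - 'a');
    }

public:
    void insert(std::string_view key, int value) {
        Node* node = &root_;
        for (char c : key) {
            auto& child = node->children[index(c)];
            if (!child) {                    // extend the path only where needed
                child = std::make_unique<Node>();
                ++nodes_;
            }
            node = child.get();
        }
        node->value = value;                 // overwrites any earlier value
    }

    std::optional<int> find(std::string_view key) const {
        const Node* node = &root_;
        for (char c : key) {                 // one step per key element
            node = node->children[index(c)].get();
            if (!node) return std::nullopt;  // path does not exist
        }
        return node->value;                  // empty if key is only a prefix
    }

    std::size_t nodeCount() const { return nodes_; }
};
```

Inserting the Spanish verb forms "hablo", "hablas", "habla" and "hablamos" into this sketch stores 24 characters of key data in 11 nodes (root included), because the shared prefix "habl" is kept only once. `find` performs one array index per character, which is where the lookup cost linear in the key length comes from.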

¹: That keys are sequences is an important premise, as the main advantage of tries is that they split the key across different nodes along the path.

²: Depending on the implementation, each node might hold a single element (character) of the sequence or a combination of several.

answered Sep 29 '22 00:09

David Rodríguez - dribeas
