I've been always concerned with the space usage of a radix tree, but I didn't find any helpful discussions on this.
Now let's say we have a radix tree implementation same as linux radix-tree.c, which takes a integer and use every 6 bits to index the next position in tree. I can easily think of cases where radix tree's space usage is far more than binary search trees. Please correct me if I'm wrong:
Use cases: (0,1,1,1,1), (1,1,1,1,1), (2,1,1,1,1), ... (63,1,1,1,1).
Here just for convenience, I use (a,b,c,d,e) to represent a 30-bit integer key, with each element standing for a 6-bit value. a is MSB and e is LSB.
Radix Tree:
For this use case, radix tree will have a height of 5, and each key will take 4 separate nodes, because they are on different subtrees of root. So there will be ((5-1) * 64 + 1) = 257 nodes.
Each node contains 2^6 = 64 pointers, so it is going to use 257 * 64 * 4Byte = 65KB
Binary Search Tree
We only care how many keys are there. In this case it has 64 keys.
Assume each BST node uses 3 Pointers per Node, it is going to use 64 * 3 * 4Byte = 768 Bytes.
Comparison
Looks radix tree is very space inefficient. It uses ~100 times space than binary search tree given same number of nodes! I don't understand why it is used even in linux kernel.
Am I missing something? Thanks.
The space complexity of a binary search tree is O ( n ) O(n) O(n) in both the average and the worst cases.
In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent.
The Radix tree is a data structure based on what is called in computer science a binary tree.
A radix tree is a compressed version of a trie. In a trie, on each edge you write a single letter, while in a PATRICIA tree (or radix tree) you store whole words. And you need nine nodes.
You asked for space complexity, so let's work it out.
If we consider a non-null pointer at the leaf to be a value of interest, then it is not hard to prove by contradiction that the worst case is a fully populated tree with one value per leaf node.
If branching is N-way (in your use case 64) and height is H (in your use case 5), there are N^(H-1) leaf nodes in this tree, storing an equal number of values. The total number of nodes is
1 + N + N^2 + ... N^(H-1) = (N^H - 1) / (N-1)
So the storage requirement measured in pointers is N times this amount.
(N^H - 1) [N / (N-1)]
This yields a storage efficiency of
(N^H - 1) [N / (N-1)]
--------------------
N^(H-1)
This is the total number of pointers divided by the count of valid data pointers.
As N gets bigger, this approaches N. In your example use case, it's actually 65.01 (for N=64). So we can say the storage complexity is O(NV) where V is the number of data values to be stored.
Though we got here with first-principles analysis, it makes total sense. Storage for the leaf level of the complete tree dominates the rest by a factor of nearly N. The size of that storage is NV.
Of course the advantage of trees with enormous branching factors like this (and e.g. B-trees in databases) is that fewer node traversals are needed to get to the right leaf.
Moreover, when each traversal is a single array lookup as in the radix tree, you can't get much faster.
In your use case, a perfectly balanced binary search tree would require up to 30 comparisons with attendant branches flushing the pipeline. This compared to 5 array indexing operations could be much slower. Array indexing tends to be faster than comparison because it's non-branching code. But even if they are the same, the binary tree would only need 2^5=32 elements to cause the same amount of indexing work as a radix tree containing 2^30 elements.
To generalize this, a binary tree of 2^H elements will require the same lookup effort as an index tree capable of holding N^(H-1) elements if key comparsions and array index operations have the same cost.
As others have said, if the index bits for the top levels of the tree tend to a few common prefixes (i.e. they are the top bits of addresses of the same VM space), the worst case storage behavior of the radix tree does not occur.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With