I have written a C language library of AVL trees as general purpose sorted containers. For testing purposes, I would like to have a way to fill a tree so that it is maximally unbalanced, i.e., so that it has the maximum height for the number of nodes it contains.
AVL trees have the nice property that if, starting from the empty tree, you insert nodes in ascending (or descending) order, the tree is always exactly balanced (i.e., it has its minimum height for a given number of nodes). One sequence of integer keys that generates an exactly balanced AVL tree Tn for every number of nodes n, starting from the empty tree T0, is simply
I'm looking for a (hopefully simple) sequence of integer keys that, when inserted in the initially empty tree T0, generates AVL trees T0, ..., Tn that are all maximally unbalanced.
I would also be interested in a solution where only the last tree, Tn, is maximally unbalanced (the number of nodes n would be a parameter of the algorithm).
A solution satisfying the constraint
is preferrable, but not strictly required. A key range of 4 n instead of 2 n may be a reasonable target.
I've not been able to find anything on the Internet regarding the generation, by insertion, of AVL trees of maximal height. Of course, the sequence of generated trees I'm looking for will include all so-called Fibonacci-trees, which are the AVL trees of a given depth with the minimal number of nodes. Funnily, the English Wikipedia does not even mention Fibonacci trees (nor Fibonacci numbers!) in the article on AVL trees, while the German Wikipedia has a very good article completely dedicated to them. But I'm still in the dark regarding my question.
C language bit twiddling hacks are welcome.
AVL tree may become unbalanced, if a node is inserted in the left subtree of the left subtree. The tree then needs a right rotation. As depicted, the unbalanced node becomes the right child of its left child by performing a right rotation.
Get the balance factor (left subtree height – right subtree height) of the current node. If the balance factor is greater than 1, then the current node is unbalanced and we are either in the Left Left case or left Right case.
If there are n nodes in AVL tree, maximum height can't exceed 1.44*log2n.
An AVL tree does not create a perfectly balanced binary search trees. Instead it creates a height balanced binary search trees. A height balanced tree is either empty or the height of the left and right subtrees differ by no more than 1.
Basic Solution
Fibonacci trees have several properties that can be used to form a compact Fibonacci tree:
Without loss of generality, we will assume our Fibonacci tree has the following additional property:
Combining these properties, we find that the number of nodes in between a node of height n and its left and right children is equal to Fn-1 - 1, and we can use this fact to generate a compact Fibonacci tree:
static int fibs[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170};
void fibonacci_subtree(int root, int height, int *fib)
{
if (height == 1) {
insert_into_tree(root);
} else if (height == 2) {
insert_into_tree(root + *fib);
} else if (height >= 3) {
fibonacci_subtree(root - *fib, height - 2, fib - 2);
fibonacci_subtree(root + *fib, height - 1, fib - 1);
}
}
...
for (height = 1; height <= max_height; height++) {
fibonacci_subtree(0, height, fibs + max_height - 1);
}
This algorithm generates the minimum number of nodes possible for a given height, and it also produces the minimum possible range. You can shift the range around by making the root node something other than zero.
Compact Fill Algorithm
The basic solution only produces Fibonacci trees, which always have Fn+2 - 1 nodes. What if you want to generate an unbalanced tree with a different number of nodes while still minimizing the range?
In that case, you need to generate the next larger Fibonacci tree with a few modifications:
Here's one approach that still takes advantage of the recursive nature of the solution:
void fibonacci_subtree(int root, int height, int *fib, int num_gaps, bool prune_gaps)
{
if(height < 1)
return;
if(prune_gaps && height <= 2) {
if(!num_gaps) {
if(height == 1) {
insert_into_tree(root);
} else if(height == 2) {
insert_into_tree(root + *fib);
}
}
return;
}
if(height == 1) {
insert_into_tree(root);
} else {
int max_rr_gaps = *(fib - 1);
int rr_gaps = num_gaps > max_rr_gaps ? max_rr_gaps : num_gaps;
num_gaps -= rr_gaps;
int max_rl_gaps = *(fib - 2);
int rl_gaps = num_gaps > max_rl_gaps ? max_rl_gaps : num_gaps;
num_gaps -= rl_gaps;
int lr_gaps = num_gaps > max_rl_gaps ? max_rl_gaps : num_gaps;
num_gaps -= lr_gaps;
int ll_gaps = num_gaps;
fibonacci_subtree(root - *fib + lr_gaps, height - 2, fib - 2, lr_gaps + ll_gaps, prune_gaps);
fibonacci_subtree(root + *fib - rl_gaps, height - 1, fib - 1, rr_gaps + rl_gaps, prune_gaps);
}
}
The main loop is slightly more complicated to accommodate an arbitrary range of keys:
void compact_fill(int min_key, int max_key)
{
int num_nodes = max_key - min_key + 1;
int *fib = fibs;
int max_height = 0;
while(num_nodes > *(fib + 2) - 1) {
max_height++;
fib++;
}
int num_gaps = *(fib + 2) - 1 - num_nodes;
int natural_max = *(fib + 1) - 1;
int max_r_gaps = *(fib - 1);
int r_gaps = num_gaps > max_r_gaps ? max_r_gaps : num_gaps;
natural_max -= r_gaps;
int root_offset = max_key - natural_max;
for (int height = 1; height <= max_height; height++) {
fibonacci_subtree(root_offset, height, fibs + max_height - 1, num_gaps, height == max_height);
}
}
Closed Form Solution
If you look at the differences between every pair of words generated by the basic solution, you find that they alternate between two sequential elements of the Fibonacci sequence. This alternating pattern is defined by the Fibonacci word:
A Fibonacci word is a specific sequence of binary digits (or symbols from any two-letter alphabet). The Fibonacci word is formed by repeated concatenation in the same way that the Fibonacci numbers are formed by repeated addition.
It turns out there's a closed-form solution for the Fibonacci word:
static double phi = (1.0 + sqrt(5.0)) / 2.0;
bool fibWord(int n)
{
return 2 + floor(n * phi) - floor((n + 1) * phi);
}
You can use this closed-form solution to solve the problem using two nested loops:
// Used by the outer loop to calculate the first key of the inner loop
int outerNodeKey = 0;
int *outerFib = fibs + max_height - 1;
for(int height = 1; height <= max_height; height++) {
int innerNodeKey = outerNodeKey;
int *smallFib = fibs + max_height - height + 3; // Hat tip: @WalterTross
for(int n = fibs[height] - 1; n >= 0; n--) {
insert_into_tree(innerNodeKey);
// Use closed-form expression to pick between two elements of the Fibonacci sequence
bool smallSkip = 2 + floor(n * phi) - floor((n + 1) * phi);
innerNodeKey += smallSkip ? *smallFib : *(smallFib + 1);
}
if(height & 0x1) {
// When height is odd, add *outerFib.
outerNodeKey += *outerFib;
} else {
// Otherwise, backtrack and reduce the gap for next time.
outerNodeKey -= (*outerFib) << 1;
outerFib -= 2;
}
}
I have found this answer to my question, but I still hope that a simpler and, especially, more time-efficient and not less space-efficient algorithm can be found, hopefully with much better key range properties too.
The idea is to generate the Fibonacci trees up to a given height (which must be known beforehand), completely avoiding all tree rotations. Intermediate trees are kept AVL-balanced by the choice of the insertion order. Since they have the height of the lower of the two Fibonacci trees they link, they are all maximally unbalanced.
Insertions are done by virtually inserting all nodes in the sequence of Fibonacci trees, but, for each virtual tree, effectively inserting only the nodes which are subtrees of height 1. These are the "incremental" nodes between two consecutive Fibonacci trees.
Here is how it works in the case max_height = 5
:
insert 0
=> Fibonacci tree of height 1 (1 node):
0
insert 8
=> Fibonacci tree of height 2 (2 nodes):
0
8
insert -8
insert 12
=> Fibonacci tree of height 3 (4 nodes):
0
-8 8
12
insert -4
insert 4
insert 14
=> Fibonacci tree of height 4 (7 nodes):
0
-8 8
-4 4 12
14
insert -12
insert -2
insert 6
insert 10
insert 15
=> Fibonacci tree of height 5 (12 nodes):
0
-8 8
-12 -4 4 12
-2 6 10 14
15
And here is the code (simplified):
void fibonacci_subtree(int root, int height, int child_delta)
{
if (height == 1) {
insert_into_tree(root);
} else if (height == 2) {
insert_into_tree(root + child_delta);
} else if (height >= 3) {
fibonacci_subtree(root - child_delta, height - 2, child_delta >> 1);
fibonacci_subtree(root + child_delta, height - 1, child_delta >> 1);
}
}
...
for (height = 1; height <= max_height; height++) {
fibonacci_subtree(0, height, 1 << (max_height - 2));
}
UPDATE
The solution by godel9 solves the problem of the spread of the keys of this solution. Here is the output of godel9's code:
insert 0
=> Fibonacci tree of height 1 (1 node):
0
insert 3
=> Fibonacci tree of height 2 (2 nodes):
0
3
insert -3
insert 5
=> Fibonacci tree of height 3 (4 nodes):
0
-3 3
5
insert -2
insert 1
insert 6
=> Fibonacci tree of height 4 (7 nodes):
0
-3 3
-2 1 5
6
insert -4
insert -1
insert 2
insert 4
insert 7
=> Fibonacci tree of height 5 (12 nodes):
0
-3 3
-4 -2 1 5
-1 2 4 6
7
And here is the code in the version closest to mine (here with a static fibs
array):
static int fibs[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170 };
void fibonacci_subtree(int root, int height, int *fib)
{
if (height == 1) {
insert_into_tree(root);
} else if (height == 2) {
insert_into_tree(root + *fib);
} else if (height >= 3) {
fibonacci_subtree(root - *fib, height - 2, fib - 2);
fibonacci_subtree(root + *fib, height - 1, fib - 1);
}
}
...
for (height = 1; height <= max_height; height++) {
fibonacci_subtree(0, height, fibs + max_height - 1);
}
The final Fibonacci tree of height H has FH+2 - 1 nodes with no "holes" between the key values, and has kmax - kroot = FH+1 - 1. The key range can be positioned conveniently, if necessary, by offsetting the key value of the root, and optionally exchanging left and right in the algorithm.
What remains unsolved is the compact filling of any given key range with integer keys (while it is trivial for exactly balanced trees). With this algorithm, if you want to make a maximally unbalanced tree with n nodes (with integer keys), where n is not a Fibonacci number - 1, and you want the smalles possible range of keys, you can find the first height that can accomodate n nodes, and then run the algorithm for this height, but stopping when n nodes have been inserted. A number of "holes" will remain (in the worst case ca. n/φ ≅ n/1.618).
Contrary to what my intuitive understanding was when I wrote the introduction to this solution, the time-efficieny of this algorithm, with or without godel9's important improvement, is already optimal, since it is O(n) (so that when the insertions are included, it becomes O(n log n)). This is because the number of operations is proportional to the sum of the nodes of all Fibonacci trees from TF1 = T1 to TFH = Tn, i.e., N = Σh=1...H(Fh+2 - 1) = FH+4 - H - 1. But two consecutive Fibonacci numbers have the asymptotic ratio φ ≅ 1.618, the golden ratio, so that N/n ≅ φ2 ≅ 2.618. You can compare this with completely balanced binary trees, where very similar formulas apply, only with a "logarithm" of 2 instead of φ.
Although I doubt that it would be worthwile to get rid of the φ2 factor, considering the simplicity of the current code, it's still interesting to note the following: when you add the "incremental" nodes of any intermediate Fibonacci tree of height h, the difference between two consecutive keys of this "Fibonacci frontier" (my term) is either FH-h+3 or FH-h+4, in a peculiar alternating pattern. If we knew a generating rule for these differences, we could possibly fill the tree simply with two nested loops.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With