How to generate maximally unbalanced AVL trees

Tags:

I have written a C language library of AVL trees as general purpose sorted containers. For testing purposes, I would like to have a way to fill a tree so that it is maximally unbalanced, i.e., so that it has the maximum height for the number of nodes it contains.

AVL trees have the nice property that if, starting from the empty tree, you insert nodes in ascending (or descending) order, the tree is always exactly balanced (i.e., it has its minimum height for a given number of nodes). One sequence of integer keys that generates an exactly balanced AVL tree T_n for every number of nodes n, starting from the empty tree T₀, is simply

k₁ = 0
k_n+1 = k_n+1 , i.e., k_n = n-1

I'm looking for a (hopefully simple) sequence of integer keys that, when inserted in the initially empty tree T₀, generates AVL trees T₀, ..., T_n that are all maximally unbalanced.

I would also be interested in a solution where only the last tree, T_n, is maximally unbalanced (the number of nodes n would be a parameter of the algorithm).

A solution satisfying the constraint

max(k₁, ..., k_n) - min(k₁, ..., k_n) + 1 ≤ 2 n

is preferrable, but not strictly required. A key range of 4 n instead of 2 n may be a reasonable target.

I've not been able to find anything on the Internet regarding the generation, by insertion, of AVL trees of maximal height. Of course, the sequence of generated trees I'm looking for will include all so-called Fibonacci-trees, which are the AVL trees of a given depth with the minimal number of nodes. Funnily, the English Wikipedia does not even mention Fibonacci trees (nor Fibonacci numbers!) in the article on AVL trees, while the German Wikipedia has a very good article completely dedicated to them. But I'm still in the dark regarding my question.

C language bit twiddling hacks are welcome.

861

asked Oct 27 '13 19:10

Walter Tross

2 Answers

Basic Solution

Fibonacci trees have several properties that can be used to form a compact Fibonacci tree:

Every node in a Fibonacci tree is itself a Fibonacci tree.
The number of nodes in a Fibonacci tree of height n is equal to F_n+2 - 1.
The number of nodes in between a node and its left child is equal to the number of nodes in the node's left child's right child.
The number of nodes in between a node and its right child is equal to the number of nodes in the node's right child's left child.

Without loss of generality, we will assume our Fibonacci tree has the following additional property:

If a node has height n, then the left child has height n-2, and the right child has height n-1.

Combining these properties, we find that the number of nodes in between a node of height n and its left and right children is equal to F_n-1 - 1, and we can use this fact to generate a compact Fibonacci tree:

static int fibs[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170};

void fibonacci_subtree(int root, int height, int *fib)
{
    if (height == 1) {
        insert_into_tree(root);
    } else if (height == 2) {
        insert_into_tree(root + *fib);
    } else if (height >= 3) {
        fibonacci_subtree(root - *fib, height - 2, fib - 2);
        fibonacci_subtree(root + *fib, height - 1, fib - 1);
    }
}

...

for (height = 1; height <= max_height; height++) {
    fibonacci_subtree(0, height, fibs + max_height - 1);
}

This algorithm generates the minimum number of nodes possible for a given height, and it also produces the minimum possible range. You can shift the range around by making the root node something other than zero.

Compact Fill Algorithm

The basic solution only produces Fibonacci trees, which always have F_n+2 - 1 nodes. What if you want to generate an unbalanced tree with a different number of nodes while still minimizing the range?

In that case, you need to generate the next larger Fibonacci tree with a few modifications:

Some number of elements at the end of the sequence will not be inserted.
Those elements will create gaps, and the location of those gaps need to be tracked.
The difference between nodes needs to be reduced appropriately.

Here's one approach that still takes advantage of the recursive nature of the solution:

void fibonacci_subtree(int root, int height, int *fib, int num_gaps, bool prune_gaps)
{
    if(height < 1)
        return;
    if(prune_gaps && height <= 2) {
        if(!num_gaps) {
            if(height == 1) {
                insert_into_tree(root);
            } else if(height == 2) {
                insert_into_tree(root + *fib);
            }
        }
        return;
    }
    if(height == 1) {
        insert_into_tree(root);
    } else {
        int max_rr_gaps = *(fib - 1);
        int rr_gaps = num_gaps > max_rr_gaps ? max_rr_gaps : num_gaps;
        num_gaps -= rr_gaps;

        int max_rl_gaps = *(fib - 2);
        int rl_gaps = num_gaps > max_rl_gaps ? max_rl_gaps : num_gaps;
        num_gaps -= rl_gaps;

        int lr_gaps = num_gaps > max_rl_gaps ? max_rl_gaps : num_gaps;
        num_gaps -= lr_gaps;

        int ll_gaps = num_gaps;
        fibonacci_subtree(root - *fib + lr_gaps, height - 2, fib - 2, lr_gaps + ll_gaps, prune_gaps);
        fibonacci_subtree(root + *fib - rl_gaps, height - 1, fib - 1, rr_gaps + rl_gaps, prune_gaps);
    }
}

The main loop is slightly more complicated to accommodate an arbitrary range of keys:

void compact_fill(int min_key, int max_key)
{
    int num_nodes = max_key - min_key + 1;
    int *fib = fibs;
    int max_height = 0;

    while(num_nodes > *(fib + 2) - 1) {
        max_height++;
        fib++;
    }

    int num_gaps = *(fib + 2) - 1 - num_nodes;

    int natural_max = *(fib + 1) - 1;
    int max_r_gaps = *(fib - 1);
    int r_gaps = num_gaps > max_r_gaps ? max_r_gaps : num_gaps;
    natural_max -= r_gaps;

    int root_offset = max_key - natural_max;

    for (int height = 1; height <= max_height; height++) {
        fibonacci_subtree(root_offset, height, fibs + max_height - 1, num_gaps, height == max_height);
    }
}

Closed Form Solution

If you look at the differences between every pair of words generated by the basic solution, you find that they alternate between two sequential elements of the Fibonacci sequence. This alternating pattern is defined by the Fibonacci word:

A Fibonacci word is a specific sequence of binary digits (or symbols from any two-letter alphabet). The Fibonacci word is formed by repeated concatenation in the same way that the Fibonacci numbers are formed by repeated addition.

It turns out there's a closed-form solution for the Fibonacci word:

static double phi = (1.0 + sqrt(5.0)) / 2.0;

bool fibWord(int n)
{
    return 2 + floor(n * phi) - floor((n + 1) * phi);
}

You can use this closed-form solution to solve the problem using two nested loops:

// Used by the outer loop to calculate the first key of the inner loop
int outerNodeKey = 0;
int *outerFib = fibs + max_height - 1;

for(int height = 1; height <= max_height; height++) {

    int innerNodeKey = outerNodeKey;
    int *smallFib = fibs + max_height - height + 3; // Hat tip: @WalterTross

    for(int n = fibs[height] - 1; n >= 0; n--) {
        insert_into_tree(innerNodeKey);

        // Use closed-form expression to pick between two elements of the Fibonacci sequence
        bool smallSkip = 2 + floor(n * phi) - floor((n + 1) * phi);
        innerNodeKey += smallSkip ? *smallFib : *(smallFib + 1);
    }

    if(height & 0x1) {
        // When height is odd, add *outerFib.
        outerNodeKey += *outerFib;
    } else {
        // Otherwise, backtrack and reduce the gap for next time.
        outerNodeKey -= (*outerFib) << 1;
        outerFib -= 2;
    }
}

199

answered Oct 02 '22 17:10

godel9

I have found this answer to my question, but I still hope that a simpler and, especially, more time-efficient and not less space-efficient algorithm can be found, hopefully with much better key range properties too.

The idea is to generate the Fibonacci trees up to a given height (which must be known beforehand), completely avoiding all tree rotations. Intermediate trees are kept AVL-balanced by the choice of the insertion order. Since they have the height of the lower of the two Fibonacci trees they link, they are all maximally unbalanced.

Insertions are done by virtually inserting all nodes in the sequence of Fibonacci trees, but, for each virtual tree, effectively inserting only the nodes which are subtrees of height 1. These are the "incremental" nodes between two consecutive Fibonacci trees.

Here is how it works in the case max_height = 5:

insert 0
=> Fibonacci tree of height 1 (1 node):
                0
insert 8
=> Fibonacci tree of height 2 (2 nodes):
                0
                        8
insert -8
insert 12
=> Fibonacci tree of height 3 (4 nodes):
                0
       -8               8
                           12
insert -4
insert 4
insert 14
=> Fibonacci tree of height 4 (7 nodes):
                0
       -8               8
           -4       4      12
                             14
insert -12
insert -2
insert 6
insert 10
insert 15
=> Fibonacci tree of height 5 (12 nodes):
                0
       -8               8
  -12      -4       4      12
             -2       6  10  14
                              15

And here is the code (simplified):

void fibonacci_subtree(int root, int height, int child_delta)
{
   if (height == 1) {
      insert_into_tree(root);
   } else if (height == 2) {
      insert_into_tree(root + child_delta);
   } else if (height >= 3) {
      fibonacci_subtree(root - child_delta, height - 2, child_delta >> 1);
      fibonacci_subtree(root + child_delta, height - 1, child_delta >> 1);
   }
}

...
   for (height = 1; height <= max_height; height++) {
      fibonacci_subtree(0, height, 1 << (max_height - 2));
   }

UPDATE

The solution by godel9 solves the problem of the spread of the keys of this solution. Here is the output of godel9's code:

insert 0
=> Fibonacci tree of height 1 (1 node):
                0
insert 3
=> Fibonacci tree of height 2 (2 nodes):
                0
                        3
insert -3
insert 5
=> Fibonacci tree of height 3 (4 nodes):
                0
       -3               3
                            5
insert -2
insert 1
insert 6
=> Fibonacci tree of height 4 (7 nodes):
                0
       -3               3
           -2       1       5
                              6
insert -4
insert -1
insert 2
insert 4
insert 7
=> Fibonacci tree of height 5 (12 nodes):
                0
       -3               3
   -4      -2       1       5
             -1       2   4   6
                               7

And here is the code in the version closest to mine (here with a static fibs array):

static int fibs[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170 };

void fibonacci_subtree(int root, int height, int *fib)
{
   if (height == 1) {
      insert_into_tree(root);
   } else if (height == 2) {
      insert_into_tree(root + *fib);
   } else if (height >= 3) {
      fibonacci_subtree(root - *fib, height - 2, fib - 2);
      fibonacci_subtree(root + *fib, height - 1, fib - 1);
   }
}

...
   for (height = 1; height <= max_height; height++) {
      fibonacci_subtree(0, height, fibs + max_height - 1);
   }

The final Fibonacci tree of height H has F_H+2 - 1 nodes with no "holes" between the key values, and has k_max - k_root = F_H+1 - 1. The key range can be positioned conveniently, if necessary, by offsetting the key value of the root, and optionally exchanging left and right in the algorithm.

What remains unsolved is the compact filling of any given key range with integer keys (while it is trivial for exactly balanced trees). With this algorithm, if you want to make a maximally unbalanced tree with n nodes (with integer keys), where n is not a Fibonacci number - 1, and you want the smalles possible range of keys, you can find the first height that can accomodate n nodes, and then run the algorithm for this height, but stopping when n nodes have been inserted. A number of "holes" will remain (in the worst case ca. n/φ ≅ n/1.618).

Contrary to what my intuitive understanding was when I wrote the introduction to this solution, the time-efficieny of this algorithm, with or without godel9's important improvement, is already optimal, since it is O(n) (so that when the insertions are included, it becomes O(n log n)). This is because the number of operations is proportional to the sum of the nodes of all Fibonacci trees from T_F₁ = T₁ to T_{F_H} = T_n, i.e., N = Σ_h=1...H(F_h+2 - 1) = F_H+4 - H - 1. But two consecutive Fibonacci numbers have the asymptotic ratio φ ≅ 1.618, the golden ratio, so that N/n ≅ φ² ≅ 2.618. You can compare this with completely balanced binary trees, where very similar formulas apply, only with a "logarithm" of 2 instead of φ.

Although I doubt that it would be worthwile to get rid of the φ² factor, considering the simplicity of the current code, it's still interesting to note the following: when you add the "incremental" nodes of any intermediate Fibonacci tree of height h, the difference between two consecutive keys of this "Fibonacci frontier" (my term) is either F_H-h+3 or F_H-h+4, in a peculiar alternating pattern. If we knew a generating rule for these differences, we could possibly fill the tree simply with two nested loops.

answered Oct 02 '22 18:10

Walter Tross

Related questions
                            
                                Why do Javascript Libraries have local references to array methods (push, slice, etc..)?
                            
                                Difference between PowerShell Console and PowerShell ISE
                            
                                How do get only numeric recaptcha?
                            
                                Better way to filter a data frame with dplyr using OR?
                            
                                python ValueError: invalid literal for float()
                            
                                Docker container refuses to get killed after run command turns into a zombie
                            
                                Resource.Designer.cs under git
                            
                                Captive portal on android device
                            
                                Is it possible to make git clone recursive by default? [duplicate]
                            
                                How to estimate/determine surface normals and tangent planes at points of a depth image?
                            
                                remove deleted commit that references an issue
                            
                                Can we configure multer to not store files locally and only for using req.files.image.path?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to generate maximally unbalanced AVL trees

Tags:

Walter Tross

People also ask

2 Answers

godel9

Walter Tross

Recent Activity

Donate For Us