Is it possible to make efficient pointer-based binary heap implementations?

Is it even possible to implement a binary heap using pointers rather than an array? I have searched around the internet (including SO) and no answer can be found.

The main problem here is: how do you keep track of the last pointer? When you insert X into the heap, you place X at the last position and then bubble it up. Now, where does the last pointer point to?

And also, what happens when you want to remove the root? You exchange the root with the last element and then bubble the new root down. Now, how do you know which node is the new "last element" that you need when you remove the root again?

Asked Nov 01 '13 by Derek Chiang

1 Answer

Solution 1: Maintain a pointer to the last node

In this approach a pointer to the last node is maintained, and parent pointers are required.

  • When inserting, starting at the last node, navigate to the node below which the new last node will be inserted. Insert the new node and remember it as the last node. Move it up the heap as needed.

  • When removing, starting at the last node, navigate to the second-to-last node. Remove the original last node and remember the new last node just found. Move the original last node into the place of the deleted node and then move it up or down the heap as needed.

It is possible to navigate to the mentioned nodes in O(log(n)) time and O(1) space. Here is a description of the algorithms; the code is available below:

  • For insert: If the last node is a left child, proceed with inserting the new node as the right child of the parent. Otherwise... Start at the last node. Move up as long as the current node is a right child. If the root was not reached, move to the sibling node at the right (which necessarily exists). Then (whether or not the root was reached), move down to the left as long as possible. Proceed by inserting the new node as the left child of the current node.

  • For remove: If the last node is the root, proceed by removing the root. Otherwise... Start at the last node. Move up as long as the current node is a left child. If the root was not reached, move to the left sibling node (which necessarily exists). Then (whether or not the root was reached), move down to the right as long as possible. We have arrived at the second-to-last node.

However, there are some things to be careful about:

  • When removing, there are two special cases: when the last node is being removed (unlink the node and change the last node pointer), and when the second-to-last node is being removed (not really special but the possibility must be considered when replacing the deleted node with the last node).

  • When moving nodes up or down the heap, if the move affects the last node, the last-node pointer must be corrected.

Long ago I made an implementation of this. In case it helps someone, here is the code. Algorithmically it should be correct (it has also been subjected to stress testing with verification), but there is no warranty, of course.

Solution 2: Reach the last node from the root

This solution requires maintaining the node count (but not parent pointers or the last node). The last (or second-to-last) node is found by navigating from the root towards it.

Assume the nodes are numbered starting from 1, as per the typical notation for binary heaps. Pick any valid node number and represent it in binary. Ignore the first (most significant) 1 bit. The remaining bits define the path from the root to that node; zero means left and one means right.

For example, to reach node 11 (=1011b), start at the root then go left (0), right (1), right (1).

This algorithm can be used in insert to find where to place the new node (follow the path for node node_count+1), and in remove to find the second-to-last node (follow the path for node node_count-1).
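
For illustration, here is a minimal sketch of this navigation, assuming the Node type (with link[0]/link[1] child pointers) from the Solution 1 code below; node_by_index is a hypothetical helper, not taken from libuv or the implementation below. For insert one would follow the path for node_count+1 but stop at its parent (all bits except the last); for remove one would follow the full path for node_count-1.

Node *node_by_index (Node *root, unsigned long idx)
{
    // Find the position of the highest set bit of idx (idx >= 1).
    int bit = 0;
    while ((idx >> (bit + 1)) != 0) {
        bit++;
    }

    // Read the remaining bits from most to least significant:
    // 0 = go left, 1 = go right.
    Node *cur = root;
    while (bit-- > 0) {
        cur = cur->link[(idx >> bit) & 1];
    }
    return cur;
}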

This approach is used in libuv for timer management; see their implementation of the binary heap.

Usefulness of Pointer-based Binary Heaps

Many answers here, and even the literature, say that an array-based implementation of a binary heap is strictly superior. However, I contest that, because there are situations where the use of an array is undesirable: typically when the maximum size of the array is not known in advance and on-demand reallocation of the array is not acceptable, for example due to latency or the possibility of allocation failure.

The fact that libuv (a widely used event-loop library) uses a pointer-based binary heap only further speaks for this.

It is worth noting that the Linux kernel uses (pointer-based) red-black trees as a priority queue in a few cases, for example for CPU scheduling and timer management (the same purpose as in libuv). I find it likely that changing these to use a pointer-based binary heap would improve performance.

Hybrid Approach

It is possible to combine Solution 1 and Solution 2 into a hybrid approach which dynamically picks whichever of the two algorithms (for finding the last or second-to-last node) has the lower cost, measured in the number of edges that need to be traversed. Assume we want to navigate to node number N, and let highest_bit(X) denote the 0-based index of the highest-order set bit in X (0 means the LSB).

  • The cost of navigating from the root (Solution 2) is highest_bit(N).

  • The cost of navigating from the previous node which is on the same level (Solution 1) is: 2 * (1 + highest_bit((N-1) xor N)).

Note that in the case of a level change the second equation will yield a wrong (too large) result, but in that case traversal from the root is more efficient anyway (for which the estimate is correct) and will be chosen, so there is no need for special handling.
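
As an illustration, here is a small sketch of these estimates; highest_bit is implemented with a portable loop (a count-leading-zeros builtin or CPU instruction could replace it), and prefer_root_path is a hypothetical helper name, not taken from any of the implementations mentioned here.

// 0-based index of the highest set bit of x (x >= 1).
int highest_bit (unsigned long x)
{
    int i = 0;
    while ((x >>= 1) != 0) {
        i++;
    }
    return i;
}

// Estimate whether navigating from the root (Solution 2) to node number n
// is cheaper than navigating from the previous node on the same level
// (Solution 1), measured in edges traversed.
int prefer_root_path (unsigned long n)
{
    int cost_from_root = highest_bit(n);
    int cost_from_prev = 2 * (1 + highest_bit((n - 1) ^ n));
    return cost_from_root <= cost_from_prev;
}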

Some CPUs have an instruction for highest_bit, allowing a very efficient implementation of these estimates. An alternative approach is to maintain the highest bit as a bit mask and do these calculations with bit masks instead of bit indices (consider, for example, that 1 followed by N zeros, when squared, equals 1 followed by 2N zeros).

In my testing it has turned out that Solution 1 is on average faster than Solution 2, and the hybrid approach appeared to have about the same average performance as Solution 2. Therefore the hybrid approach is only useful if one needs to minimize the worst-case time, which is (twice) better with Solution 2, since Solution 1 will in the worst case traverse the entire height of the tree up and then down again.

Code for Solution 1

Note that the traversal code in insert is slightly different from the algorithm described above but still correct.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;

struct Node {
    Node *parent;
    Node *link[2];
};

struct Heap {
    Node *root;
    Node *last;
};

void init (Heap *h)
{
    h->root = NULL;
    h->last = NULL;
}

void insert (Heap *h, Node *node)
{
    // If the heap is empty, insert root node.
    if (h->root == NULL) {
        h->root = node;
        h->last = node;
        node->parent = NULL;
        node->link[0] = NULL;
        node->link[1] = NULL;
        return;
    }

    // We will be finding the node to insert below.

    // Start with the current last node and move up as long as the
    // parent exists and the current node is its right child.
    Node *cur = h->last;
    while (cur->parent != NULL && cur == cur->parent->link[1]) {
        cur = cur->parent;
    }

    if (cur->parent != NULL) {
        if (cur->parent->link[1] != NULL) {
            // The parent has a right child. Attach the new node to
            // the leftmost node of the parent's right subtree.
            cur = cur->parent->link[1];
            while (cur->link[0] != NULL) {
                cur = cur->link[0];
            }
        } else {
            // The parent has no right child. This can only happen when
            // the last node is a right child. The new node can become
            // the right child.
            cur = cur->parent;
        }
    } else {
        // We have reached the root. The new node will be at a new level,
        // the left child of the current leftmost node.
        while (cur->link[0] != NULL) {
            cur = cur->link[0];
        }
    }

    // This is the node below which we will insert. It has either no
    // children or only a left child.
    assert(cur->link[1] == NULL);

    // Insert the new node, which becomes the new last node.
    h->last = node;
    cur->link[cur->link[0] != NULL] = node;
    node->parent = cur;
    node->link[0] = NULL;
    node->link[1] = NULL;

    // Restore the heap property.
    while (node->parent != NULL && value(node->parent) > value(node)) {
        move_one_up(h, node);
    }
}

void remove (Heap *h, Node *node)
{
    // If this is the only node left, remove it.
    if (node->parent == NULL && node->link[0] == NULL && node->link[1] == NULL) {
        h->root = NULL;
        h->last = NULL;
        return;
    }

    // Locate the node before the last node.
    Node *cur = h->last;
    while (cur->parent != NULL && cur == cur->parent->link[0]) {
        cur = cur->parent;
    }
    if (cur->parent != NULL) {
        assert(cur->parent->link[0] != NULL);
        cur = cur->parent->link[0];
    }
    while (cur->link[1] != NULL) {
        cur = cur->link[1];
    }

    // Disconnect the last node.
    assert(h->last->parent != NULL);
    h->last->parent->link[h->last == h->last->parent->link[1]] = NULL;

    if (node == h->last) {
        // Deleting last, set new last.
        h->last = cur;
    } else {
        // Not deleting last, move last to node's place.
        Node *srcnode = h->last;
        replace_node(h, node, srcnode);
        // Set new last unless node=cur; in this case it stays the same.
        if (node != cur) {
            h->last = cur;
        }

        // Restore the heap property.
        if (srcnode->parent != NULL && value(srcnode) < value(srcnode->parent)) {
            do {
                move_one_up(h, srcnode);
            } while (srcnode->parent != NULL && value(srcnode) < value(srcnode->parent));
        } else {
            while (srcnode->link[0] != NULL || srcnode->link[1] != NULL) {
                bool side = srcnode->link[1] != NULL && value(srcnode->link[0]) >= value(srcnode->link[1]);
                if (value(srcnode) > value(srcnode->link[side])) {
                    move_one_up(h, srcnode->link[side]);
                } else {
                    break;
                }
            }
        }
    }
}

Two other functions are used: move_one_up moves a node one step up in the heap, and replace_node moves an existing node (srcnode) into the place held by the node being deleted. Both work only by adjusting the links to and from the other nodes; there is no actual moving of data involved. These functions should not be hard to implement, and the mentioned link includes my implementations.
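
For completeness, here is one way these two helpers could look, assuming the Node and Heap structs above. This is only an illustrative sketch (not the stress-tested implementation referred to above), but it follows the same rules: only links are adjusted, and the last-node pointer is corrected when a move affects it.

// Move node one step up by swapping it with its parent (links only, no data
// is copied). Assumes node->parent != NULL.
void move_one_up (Heap *h, Node *node)
{
    Node *parent = node->parent;
    Node *gparent = parent->parent;
    int side = (parent->link[1] == node);   // 0 if node is a left child, 1 if right
    Node *sibling = parent->link[!side];
    Node *child0 = node->link[0];
    Node *child1 = node->link[1];

    // Put node where parent was.
    node->parent = gparent;
    if (gparent != NULL) {
        gparent->link[gparent->link[1] == parent] = node;
    } else {
        h->root = node;
    }

    // Parent becomes node's child on the same side; the sibling stays put.
    node->link[side] = parent;
    node->link[!side] = sibling;
    parent->parent = node;
    if (sibling != NULL) {
        sibling->parent = node;
    }

    // Node's former children become parent's children.
    parent->link[0] = child0;
    parent->link[1] = child1;
    if (child0 != NULL) {
        child0->parent = parent;
    }
    if (child1 != NULL) {
        child1->parent = parent;
    }

    // As noted above, fix the last-node pointer if the move affected it.
    if (h->last == node) {
        h->last = parent;
    }
}

// Put srcnode (already unlinked from its old position) into the place held
// by node, which is being deleted. Links only, no data is copied.
void replace_node (Heap *h, Node *node, Node *srcnode)
{
    srcnode->parent = node->parent;
    if (node->parent != NULL) {
        node->parent->link[node->parent->link[1] == node] = srcnode;
    } else {
        h->root = srcnode;
    }

    srcnode->link[0] = node->link[0];
    srcnode->link[1] = node->link[1];
    if (srcnode->link[0] != NULL) {
        srcnode->link[0]->parent = srcnode;
    }
    if (srcnode->link[1] != NULL) {
        srcnode->link[1]->parent = srcnode;
    }
}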

Answered Nov 15 '22 by Ambroz Bizjak