In the .NET Framework in PresentationCore.dll, there is a generic PriorityQueue<T>
class whose code can be found here.
I wrote a short program to test the sorting, and the results weren't great:
using System; using System.Collections.Generic; using System.Diagnostics; using MS.Internal; namespace ConsoleTest { public static class ConsoleTest { public static void Main() { PriorityQueue<int> values = new PriorityQueue<int>(6, Comparer<int>.Default); Random random = new Random(88); for (int i = 0; i < 6; i++) values.Push(random.Next(0, 10000000)); int lastValue = int.MinValue; int temp; while (values.Count != 0) { temp = values.Top; values.Pop(); if (temp >= lastValue) lastValue = temp; else Console.WriteLine("found sorting error"); Console.WriteLine(temp); } Console.ReadLine(); } } }
Results:
2789658 3411390 4618917 6996709 found sorting error 6381637 9367782
There is a sorting error, and if the sample size is increased, the number of sorting errors increases somewhat proportionally.
Have I done something wrong? If not, where is the bug in the code of the PriorityQueue
class located exactly?
A priority queue in Java is a special type of queue wherein all the elements are ordered as per their natural ordering or based on a custom Comparator supplied at the time of creation.
Priority Queue in C# is a very useful data structure that has so many real-world applications.
The behavior can be reproduced using the initialization vector [0, 1, 2, 4, 5, 3]
. The result is:
[0, 1, 2, 4, 3, 5]
(we can see that 3 is incorrectly placed)
The Push
algorithm is correct. It builds a min-heap in a straightforward way:
The resulting tree is:
0 / \ / \ 1 2 / \ / 4 5 3
The issue is with the Pop
method. It starts by considering the top node as a "gap" to fill (since we popped it):
* / \ / \ 1 2 / \ / 4 5 3
To fill it, it searches for the lowest immediate child (in this case: 1). It then moves the value up to fill the gap (and the child is now the new gap):
1 / \ / \ * 2 / \ / 4 5 3
It then does the exact same thing with the new gap, so the gap moves down again:
1 / \ / \ 4 2 / \ / * 5 3
When the gap has reached the bottom, the algorithm... takes the bottom-rightmost value of the tree and uses it to fill the gap:
1 / \ / \ 4 2 / \ / 3 5 *
Now that the gap is at the bottom-rightmost node, it decrements _count
to remove the gap from the tree:
1 / \ / \ 4 2 / \ 3 5
And we end up with... A broken heap.
To be perfectly honest, I don't understand what the author was trying to do, so I can't fix the existing code. At most, I can swap it with a working version (shamelessly copied from Wikipedia):
internal void Pop2() { if (_count > 0) { _count--; _heap[0] = _heap[_count]; Heapify(0); } } internal void Heapify(int i) { int left = (2 * i) + 1; int right = left + 1; int smallest = i; if (left <= _count && _comparer.Compare(_heap[left], _heap[smallest]) < 0) { smallest = left; } if (right <= _count && _comparer.Compare(_heap[right], _heap[smallest]) < 0) { smallest = right; } if (smallest != i) { var pivot = _heap[i]; _heap[i] = _heap[smallest]; _heap[smallest] = pivot; Heapify(smallest); } }
Main issue with that code is the recursive implementation, which will break if the number of elements is too large. I strongly recommend using an optimized thirdparty library instead.
Edit: I think I found out what is missing. After taking the bottom-rightmost node, the author just forgot to rebalance the heap:
internal void Pop() { Debug.Assert(_count != 0); if (_count > 1) { // Loop invariants: // // 1. parent is the index of a gap in the logical tree // 2. leftChild is // (a) the index of parent's left child if it has one, or // (b) a value >= _count if parent is a leaf node // int parent = 0; int leftChild = HeapLeftChild(parent); while (leftChild < _count) { int rightChild = HeapRightFromLeft(leftChild); int bestChild = (rightChild < _count && _comparer.Compare(_heap[rightChild], _heap[leftChild]) < 0) ? rightChild : leftChild; // Promote bestChild to fill the gap left by parent. _heap[parent] = _heap[bestChild]; // Restore invariants, i.e., let parent point to the gap. parent = bestChild; leftChild = HeapLeftChild(parent); } // Fill the last gap by moving the last (i.e., bottom-rightmost) node. _heap[parent] = _heap[_count - 1]; // FIX: Rebalance the heap int index = parent; var value = _heap[parent]; while (index > 0) { int parentIndex = HeapParent(index); if (_comparer.Compare(value, _heap[parentIndex]) < 0) { // value is a better match than the parent node so exchange // places to preserve the "heap" property. var pivot = _heap[index]; _heap[index] = _heap[parentIndex]; _heap[parentIndex] = pivot; index = parentIndex; } else { // Heap is balanced break; } } } _count--; }
Kevin Gosse's answer identifies the problem. Although his re-balancing of the heap will work, it's not necessary if you fix the fundamental problem in the original removal loop.
As he pointed out, the idea is to replace the item at the top of the heap with the lowest, right-most item, and then sift it down to the proper location. It's a simple modification of the original loop:
internal void Pop() { Debug.Assert(_count != 0); if (_count > 0) { --_count; // Logically, we're moving the last item (lowest, right-most) // to the root and then sifting it down. int ix = 0; while (ix < _count/2) { // find the smallest child int smallestChild = HeapLeftChild(ix); int rightChild = HeapRightFromLeft(smallestChild); if (rightChild < _count-1 && _comparer.Compare(_heap[rightChild], _heap[smallestChild]) < 0) { smallestChild = rightChild; } // If the item is less than or equal to the smallest child item, // then we're done. if (_comparer.Compare(_heap[_count], _heap[smallestChild]) <= 0) { break; } // Otherwise, move the child up _heap[ix] = _heap[smallestChild]; // and adjust the index ix = smallestChild; } // Place the item where it belongs _heap[ix] = _heap[_count]; // and clear the position it used to occupy _heap[_count] = default(T); } }
Note also that the code as written has a memory leak. This bit of code:
// Fill the last gap by moving the last (i.e., bottom-rightmost) node. _heap[parent] = _heap[_count - 1];
Does not clear the value from _heap[_count - 1]
. If the heap is storing reference types, then the references remain in the heap and cannot be garbage collected until the memory for the heap is garbage collected. I don't know where this heap is used, but if it's large and lives for any significant amount of time, it could cause excess memory consumption. The answer is to clear the item after it's copied:
_heap[_count - 1] = default(T);
My replacement code incorporates that fix.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With