I've got some code that continuously extracts the max valued object from a heap and processes it. However, during processing of the max, other objects in the heap are affected, and may need to get deleted. Roughly:
vector<HeapEntry*> myHeap = vector<HeapEntry*>();
fillHeap(myHeap, someData);
make_heap(myHeap.begin(), myHeap.end());
while (!myHeap.empty())
{
HeapEntry* hp = myHeap.front();
HeapEntry* neighbor = hp->getNeighbor();
if (someCondition)
{
remove(myHeap, neighbor);
}
//more processing of hp
}
And the remove function:
void remove(vector<HeapEntry*> myHeap, HeapEntry* toRemove)
{
for (it = myHeap.begin(); it != myHeap.end(); it++)
{
if (*it == hp)
{
myHeap.erase(it);
break;
}
}
make_heap(myHeap.begin(), myHeap.end());
}
This runs and gives correct output. But it's slow as all hell: 2 minutes to process a 40kb file (the size of the heap is linear in the size of the file). Anyway it needs to be more efficient.
The remove function ends up getting called roughly n times, where n is the size of the heap. So having that linear search makes the entire algorithm O(n^2). I think that's the problem, and I believe this can run in O(n*log(n)).
My goal is to do the remove function in O(log(n)) time. Something like:
I'm not quite sure how to implement this (I'm hardly familiar with the stl heap). Does anyone know how to do this without doing the linear search?
Thanks for all your replies. I decided to go with an approach that does in fact delete the HeapEntries when they are no longer valid. Actually I tried adding a valid flag to HeapEntry, and I think this would have worked if it hadn't been for some other errors that I've since fixed. Anyhow, here's how I ended up solving it.
To reiterate, I needed the ability to delete an element from a heap given just a pointer to that element. The problem with that was, the pointer didn't tell me anything about the position of the element in the heap. So, I decided to store the position, keep it updated whenever elements moved, and write a function to delete from a heap given a position. Simply put, a heap is stored as an array, and the positions of the elements define the parent/child relationships. An element's parent should be at location floor((myPos - 1) / 2), and its children should be at positions 2*myPos+1 and 2*myPos+2. I realized I could write a remove(position) function, and while swapping elements to maintain the heap property, could swap their stored positions as well. Here's a link to the the result, and it sped up execution by a factor of 5 or 10:
https://github.com/yankrasny/CC-Heap-with-random-delete
The simple approach is not to remove the elements you think you want to remove. Instead, you'd maintain a priority queue to determine the next max element and a std::set<HeapEntry*>
of removed element. When getting the max element you check if it is in the set of removed elements and you just remove it from the heap, trying the next element. Depending on the number of potentially removed elements, you might want to also remove the element from the set of removed elements when you remove it from the heap.
Instead of removing elements from the heap, you just add them to the set of removed elements. This way the heap elements still stay logarithmic and you may have up to O(n log n) operations on the set of elements.
The other alternative would be the use of a node-based priority queue to efficiently find the position of a node in the heap. For example, Boost provides a Fibonacci-heap as part of the Boost Graph Library. You can track the position of an element there. However, node-based heaps tend to perform slower on practical problem sizes due to their overhead when rearranging elements.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With