Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete max in O(1) and insert in O(logn)

Devise a data structure that supports the following operations on an n-element set:

  1. Insert in time O(lg n)
  2. Delete max in O(1) time. Note that delete max takes O(lg n) in a conventional heap.

My approach:

I decided that I will keep a seperate array, which will keep track of potential successors that will overtake the root(which is the largest value, in the max heap); once it is deleted. so if you delete the max, in O(1) time I will look in my array to find the next suitable successor(which I assume I will have intelligently set up).

Anybody have any better approaches? Try to stick to using heaps. This is NOT a Home-Work question, I am preparing for an interview, and its from Skiena's book.

like image 514
NoNameY0 Avatar asked Mar 05 '13 13:03

NoNameY0


1 Answers

Your requirements are very specific -- you're interested in only these two operations: insert O(lg n) and deleteMin O(1).

None of the known heap structures satisfies your specific constraints. Instead, the best structure (albeit theoretically -- Galactic structures as some would call them) is the Brodal Queue, which performs all the heap operations in O(1) worst-case time, except deleteMin which still takes O(lg n) worst-case time. All other known structures are no better.

Since you're only interested in only these two operations, you might be able to get away with a custom structure that only handles these two operations well, since you don't have to worry about decrease-key, merge, etc. that more general heap structures have to worry about.

Maintaining a dual structure, DL, containing:

  1. Sorted dynamic array, D, for from which you can do a find in O(lg n) time by binary search.
  2. Sorted linked list, L, from which you can do a deleteMin in O(1) time and an insert given a successor (or predecessor) reference in O(1) worst-case time.
  3. Sorted list T of elements recently inserted into L, but not yet synced with D.

Further, maintain a link between each entry in L and its corresponding entry in D or T and vice versa. Further, you'll need to maintain a bit for each entry of D indicating whether it's been deleted in L or not. In order to later on do a mass synchronization between D and L, you might also want to keep track of number of deletions, d, from L since the last sync. You might do a sync after the following invariant:

  • Invariant 1: d + |T| < n

is violated. That way sync time remains linear in n and you can always guarantee |T| and d are within manageable limits.

So, to insert a new element e into DL, you first do a find(e) in D for its successor (predecessor) and another search for its successor in T and grab the reference of the bigger successor (smaller predecessor) and use that to do an insert into L and add e to T and maintain references. After inserting e, we check if the Invariant 1 is violated. If so, we trigger a sync.

A sync essentially merges the contents of T and D, while removing elements marked as deleted into a new D structure. This can be done in time linear in |T| + |D| = O(n). In another linear time, references between L and D can be updated. The cost of this mass sync can be distributed (amortized) over the insertions and deletions. For this reason, these costs are only amortized costs.

To do deleteMin, you simply delete the head of L and use its backward link to D to mark its corresponding entry in D as deleted and increment d.

Observation 1: Note that deleteMin will always delete the minimum element since L is always up-to-date.

Observation 2: D is not always in sync with L. It might have some deleted elements (so marked) and some inserted elements to be found only in the T.

By Observation 2, we'll need to schedule a sync of D with L at some point, in order to maintain the O(lg n) find and insert costs. This is done whenever Invariant 1 is violated.

One pesky issue I've sort of glossed over is the following: how do you insert into T in logarithmic time, while still being able to do a linear scan during sync. Only a balanced tree of some sort can achieve this. I'd toyed a bit with the idea of limiting T's size to logarithmic size, but that would increase the amortized cost of the sync when enough number of insertions are done to trigger a sync. It seems that a sort of threaded balanced tree or even a skip list should help out here.

Feel free to critique and suggest improvements as I've not proved or implemented all the assertions here.

Update: As pointed out by @delnan, these costs are amortized. I've updated the description to highlight this fact and revised for clarity. Perhaps, with some trickery the amortization can be removed, but we'll end up with another Galactic structure in that case.

like image 140
rrufai Avatar answered Oct 14 '22 05:10

rrufai