Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging K- Sorted Lists using Priority Queue

I have been asked in my algorithm class to make a K-way merge algorithm which is of O(nlogk) After searching i found it could be done via making a k length priority queue and enqueuing it with the first element of each list. Extract the minimum, append it to result and enqueue from the list whose element has been extracted. I am confused about:

  1. How will it know when a particular list is exhausted, suppose a list has smaller elements than every other element in other lists?

  2. How will it know the element was of which list (if a structue is not used to define)?

  3. How is the time complexity O(nlogk)?

EDIT:

It would be a bit more helpful if someone can write down the algorithm step-wise, because all i have read it is in sentences and its hard to understand the way it is, if someone could write down the algorithm might be helpful to understand.

like image 538
Shaurya Chaudhuri Avatar asked Nov 19 '25 06:11

Shaurya Chaudhuri


2 Answers

Here's some Python 2 code that does the merging.

import heapq

def addtoheap(h, i, it):
    try:
        heapq.heappush(h, (next(it), i))
    except StopIteration:
        pass

def mergek(*lists):
    its = map(iter, lists)
    h = []
    for i, it in enumerate(its):
        addtoheap(h, i, it)
    while h:
        v, i = heapq.heappop(h)
        addtoheap(h, i, its[i])
        yield v

for x in mergek([1, 3, 5], [2, 4, 6], [7, 8, 9], [10]):
    print x

Why is it O(n log k)? Well for each value removed, there's a heap pop and possibly a heap push (both of which are O(log k)). Since we remove n elements, it's O(n log k).

like image 87
Paul Hankin Avatar answered Nov 20 '25 23:11

Paul Hankin


Instead of simply storing the first element of each list in the priority queue, wrap it in a structure like this;

struct wrapper
{
    int list_number;
    int element;
}

Then, when you are pushing an element onto the priority queue, just add the list number form where it came. This way, when the minimum element gets popped, you will know from which list you should push the next element to push on the queue by examining popped_element.list_number.

In order to find if your list is empty you should add a function empty to it that returns true if the list does not have any more elements and false otherwise. The function would be very easy to implement. Just check if the size is zero then the list is empty otherwise it has one or more elements.

From your question I assume that a binary heap is used to implement the priority queue. The insertion operation in a binary heap takes O(lg k) time and the extract-min operation also takes O(lg k) times where k is the the size of the heap (number of lists in your case). Now if the total number of elements you have is n, the total time to process all of them will be O(n lg k).

like image 22
digital_revenant Avatar answered Nov 20 '25 21:11

digital_revenant



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!