A 2-way merge is widely studied as a part of Mergesort algorithm. But I am interested to find out the best way one can perform an N-way merge?
Lets say, I have N
files which have sorted 1 million integers each. I have to merge them into 1 single file which will have those 100 million sorted integers.
Please keep in mind that use case for this problem is actually external sorting which is disk based. Therefore, in real scenarios there would be memory limitation as well. So a naive approach of merging 2 files at a time (99 times) won't work. Lets say we have only a small sliding window of memory available for each array.
I am not sure if there is already a standardized solution to this N-way merge. (Googling didn't tell me much).
But if you know if a good n-way merge algorithm, please post algo/link.
Time complexity: If we greatly increase the number of files (N
) to be merged, how would that affect the time complexity of your algorithm?
Thanks for your answers.
I haven't been asked this anywhere, but I felt this could be an interesting interview question. Therefore tagged.
Algorithm for Merge SortStep 1: Find the middle index of the array. Step 2: Divide the array from the middle. Step 4: Call merge sort for the second half of the array. Step 5: Merge the two sorted halves into a single sorted array.
Another name for an iterative 2-way merge sort is bottom up merge sort, while another name for recursive merge sort is top down merge sort. Generally, an optimized bottom up merge sort is slightly more efficient than an optimized top down merge sort.
Working of Merge sort Algorithm According to the merge sort, first divide the given array into two equal halves. Merge sort keeps dividing the list into equal parts until it cannot be further divided. As there are eight elements in the given array, so it is divided into two arrays of size 4.
How about the following idea:
Since adding elements to a priority queue can be done in logarithmic time, item 2 is O(N × log N). Since (almost all) iterations of the while loop adds an element, the whole while-loop is O(M × log N) where M is the total number of numbers to sort.
Assuming all files have a non-empty sequence of numbers, we have M > N and thus the whole algorithm should be O(M × log N).
Search for "Polyphase merge", check out classics - Donald Knuth & E.H.Friend.
Also, you may want to take a look at the proposed Smart Block Merging by Seyedafsari & Hasanzadeh, that, similarly to earlier suggestions, uses priority queues.
Another interesting reasonsing is In Place Merging Algorithm by Kim & Kutzner.
I also recommend this paper by Vitter: External memory algorithms and data structures: dealing with massive data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With