Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort K sorted arrays, with MERGE SORT

I know that this question has been asked, and there is a very nice elegant solution using a min heap.

MY question is how would one do this using the merge function of merge sort.

You already have an array of sorted arrays. So you should be able to merge all of them into one array in O(nlog K) time, correct?

I just can't figure out how to do this!

Say I have

[ [5,6], [3,4], [1,2], [0] ]

Step 1: [ [3,4,5,6], [0,1,2] ]

Step2: [ [0,1,2,3,4,5,6] ]

Is there a simple way to do this? Is O(nlog K) theoretically achievable with mergesort?

like image 336
ordinary Avatar asked Sep 24 '13 11:09

ordinary


2 Answers

As others have said, using the min heap to hold the next items is the optimal way. It's called an N-way merge. Its complexity is O(n log k).

You can use a 2-way merge algorithm to sort k arrays. Perhaps the easiest way is to modify the standard merge sort so that it uses non-constant partition sizes. For example, imagine that you have 4 arrays with lengths 10, 8, 12, and 33. Each array is sorted. If you concatenated the arrays into one, you would have these partitions (the numbers are indexes into the array, not values):

[0-9][10-17][18-29][30-62]

The first pass of your merge sort would have starting indexes of 0 and 10. You would merge that into a new array, just as you would with the standard merge sort. The next pass would start at positions 18 and 30 in the second array. When you're done with the second pass, your output array contains:

[0-17][18-62]

Now your partitions start at 0 and 18. You merge those two into a single array and you're done.

The only real difference is that rather than starting with a partition size of 2 and doubling, you have non-constant partition sizes. As you make each pass, the new partition size is the sum of the sizes of the two partitions you used in the previous pass. This really is just a slight modification of the standard merge sort.

It will take log(k) passes to do the sort, and at each pass you look at all n items. The algorithm is O(n log k), but with a much higher constant than the N-way merge.

For implementation, build an array of integers that contains the starting indexes of each of your sub arrays. So in the example above you would have:

int[] partitions = [0, 10, 18, 30];
int numPartitions = 4;

Now you do your standard merge sort. But you select your partitions from the partitions array. So your merge would start with:

merge (inputArray, outputArray, part1Index, part2Index, outputStart)
{
    part1Start = partitions[part1Index];
    part2Start = partitions[part2Index];

    part1Length = part2Start - part1Start;
    part2Length = partitions[part2Index-1] - part2Start;

    // now merge part1 and part2 into the output array,
    // starting at outputStart
}

And your main loop would look something like:

while (numPartitions > 1)
{
    for (int p = 0; p < numPartitions; p += 2)
    {
        outputStart = partitions[p];
        merge(inputArray, outputArray, p, p+1, outputStart);
        // update partitions table
        partitions[p/2] = partitions[p] + partitions[p+1];
    }
    numPartitions /= 2;
}

That's the basic idea. You'll have to do some work to handle the dangling partition when the number is odd, but in general that's how it's done.

You can also do it by maintaining an array of arrays, and merging each two arrays into a new array, adding that to an output array of arrays. Lather, rinse, repeat.

like image 198
Jim Mischel Avatar answered Nov 15 '22 10:11

Jim Mischel


You should note that when we say complexity is O(n log k), we assume that n means TOTAL number of elements in ALL of k arrays, i.e. number of elements in a final merged array.

For example, if you want to merge k arrays that contain n elements each, total number of elements in final array will be nk. So complexity will be O(nk log k).

like image 33
Lirrik Avatar answered Nov 15 '22 10:11

Lirrik