Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize merge sort?

I've two files of 1 GB each containing only numbers in sorted order. Now I know how to read the contents of the files and sort them using merge sort algorithm and output it into an another file but what I'm interested is to how to do this only using 100MB buffer size (I do not worry about the scratch space). For example one way is to read 50 MB chunks from both the files and sort it and as it is sorted I could read a new element and continue the process till I reach the end of both files (Can anyone give me any idea how to implement this).

like image 529
0x0 Avatar asked Dec 12 '22 19:12

0x0


1 Answers

Sounds like you only need to merge the numbers in your files, not sort them, since they're already sorted in each file. The merge part of merge sort is this:

function merge(left,right)
    var list result
    while length(left) > 0 or length(right) > 0
        if length(left) > 0 and length(right) > 0
            if first(left) ≤ first(right)
                append first(left) to result
                left = rest(left)
            else
                append first(right) to result
                right = rest(right)
        else if length(left) > 0
            append left to result
            break             
        else if length(right) > 0
            append right to result
            break
    end while
    return result

Now you can just read the first 50 MB of numbers from both files in two buffers, apply the merge algorithm, then when one of the buffers has been exhausted (all its numbers analysed), read another 50 MB from the needed file. There's no need to sort anything.

You just need a condition that checks when one of your buffers is empty. When it is, read more from the file that buffer is associated with.

like image 146
IVlad Avatar answered Dec 28 '22 12:12

IVlad