Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

multi-way merge vs 2-way merge

When we externally merge sort a large file, we split it into small ones, sort those, and then merge them back into a large sorted file.

When merging, we can either do many 2-way merge passes, or one multi-way merge.

I am wondering which approach is better? and why?

like image 729
KFL Avatar asked Aug 04 '12 06:08

KFL


People also ask

What is a multiway merge?

In computer science, k-way merge algorithms or multiway merges are a specific type of sequence merge algorithms that specialize in taking in k sorted lists and merging them into a single sorted list. These merge algorithms generally refer to merge algorithms that take in a number of sorted lists greater than two.

What is 2 way merge?

two-way merge An algorithm that merges two ordered files into one single sorted file. It may be viewed as a generalization of sorting by insertion, and was proposed by John von Neumann in 1945. A Dictionary of Computing. "two-way merge ."

Is 3-way merge sort better than 2 way merge sort?

The 3-way merge sort is similar to the two way merge sort. The only difference is the array here is divided into three parts till we get all one element arrays. The procedure of merging will also be similar to that of the merge sort but at a time three array elements are compared and inserted into the final array.

Is merge sort and 2 way merge sort same?

Another name for an iterative 2-way merge sort is bottom up merge sort, while another name for recursive merge sort is top down merge sort. Generally, an optimized bottom up merge sort is slightly more efficient than an optimized top down merge sort.


1 Answers

One multi-way merge is generally better. Consider three small files:

a1
a2
a3

and

b1
b2
b3

and finally

c1
c2
c3

If you do a merge with a and b, we're left with (say)

a1
b1
a2
b2
b3
a3

and

c1
c2
c3

A final merge would create the sorted list, but notice how in this final merge we have to visit the a and b items again. It's this re-merging that is wasteful in cascading two-way merges.

What you can do instead is a single multi-way merge. However, be careful how you do it. Specifically, avoid the naive double-loop that scans each cursor to see which has the minimum value. Use a min-heap instead. This will bring the complexity back down to O(n log n).

like image 143
phs Avatar answered Oct 05 '22 04:10

phs