
Why is merge sort preferred over quick sort for sorting linked lists?

I read the following in a forum:

Merge sort is very efficient for immutable data structures like linked lists

and

Quick sort is typically faster than merge sort when the data is stored in memory. However, when the data set is huge and is stored on external devices such as a hard drive, merge sort is the clear winner in terms of speed. It minimizes the expensive reads of the external drive

and

when operating on linked lists, merge sort only requires a small constant amount of auxiliary storage

Can someone help me understand the above argument? Why is merge sort preferred for sorting huge linked lists? And how does it minimize expensive reads to an external drive? Basically, I want to understand why one would choose merge sort for sorting a big linked list.

asked Mar 07 '11 by maxpayne


People also ask

Why is merge sort preferred over quicksort for linked lists?

Quicksort relies heavily on random access, and the nodes of a linked list may not sit in nearby memory locations, so those accesses become expensive and the cost of quicksort rises. Merge sort, by contrast, accesses the data sequentially and needs little random access, which is why merge sort is preferred.

Is merge sort or quick sort suitable for sorting a linked list?

Merge sort is often preferred for sorting a linked list. The slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and others (such as heapsort) completely impossible.

Which sorting algorithm is better for a linked list?

Generally speaking, merge sort is best suited to linked lists, because the algorithm requires little random access to memory. Quicksort can be fast, but its performance on linked lists is unreliable; it is a better fit for arrays, whose lookups are much faster than a linked list's.


1 Answer

Quick sort works well for sorting in-place. In particular, most of the operations can be defined in terms of swapping pairs of elements in an array. To do that, however, you normally "walk" through the array with two pointers (or indexes, etc.). One starts at the beginning of the array and the other at the end. Both then work their way toward the middle (and you're done with a particular partition step when they meet). That's expensive with files, because files are oriented primarily toward reading in one direction, from beginning to end. Starting from the end and seeking backwards is usually relatively expensive.
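To make that two-pointer pattern concrete, here is a minimal sketch of a Hoare-style partition on an int array; the function name and the middle-element pivot are illustrative choices, not something from the original answer. The backward scan of j is the part that maps poorly onto sequential file reads.

    #include <utility>  // std::swap

    // Hoare-style partition: two indexes walk toward each other and swap
    // out-of-place elements. The backward scan of `j` is cheap in RAM but
    // awkward when the data lives in a file rather than in memory.
    int partition(int a[], int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1, j = hi + 1;
        while (true) {
            do { ++i; } while (a[i] < pivot);
            do { --j; } while (a[j] > pivot);
            if (i >= j) return j;          // recurse on [lo, j] and [j + 1, hi]
            std::swap(a[i], a[j]);
        }
    }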

At least in its simplest incarnation, merge sort is pretty much the opposite. The easy way to implement it only requires looking through the data in one direction, but involves breaking the data into two separate pieces, sorting the pieces, then merging them back together.

With a linked list, it's easy to take (for example) alternating elements in one linked list, and manipulate the links to create two linked lists from those same elements instead. With an array, rearranging elements so alternating elements go into separate arrays is easy if you're willing to create a copy as big as the original data, but doing it otherwise is considerably more involved.
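For illustration, here is a rough sketch of that alternating split on a singly linked list; the Node struct and function name are hypothetical, not part of the answer. Only the next pointers change; no element is copied.

    struct Node {
        int value;
        Node* next;
    };

    // Split `src` into two lists of alternating elements by relinking nodes.
    void split(Node* src, Node** a, Node** b) {
        *a = *b = nullptr;
        Node** tail[2] = { a, b };     // tail pointer of each output list
        int which = 0;
        while (src != nullptr) {
            Node* rest = src->next;
            *tail[which] = src;        // append the node to the current list
            tail[which] = &src->next;
            src->next = nullptr;
            which ^= 1;                // alternate between the two lists
            src = rest;
        }
    }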

Likewise, merging with arrays is easy if you merge elements from the source arrays into a new array with the data in order -- but to do it in place without creating a whole new copy of the data is a whole different story. With a linked list, merging elements together from two source lists into a single target list is trivial -- again, you just manipulate links, without copying elements.
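And here is a matching sketch of the merge step, reusing the hypothetical Node struct from the split sketch above; again only links are manipulated. A full linked-list merge sort is then just: split, recurse on each half, merge.

    // Merge two already-sorted lists into one by relinking nodes; no copies.
    Node* merge(Node* a, Node* b) {
        Node dummy{0, nullptr};               // dummy head simplifies the loop
        Node* tail = &dummy;
        while (a != nullptr && b != nullptr) {
            Node** smaller = (a->value <= b->value) ? &a : &b;
            tail->next = *smaller;            // link the smaller front node onto the result
            tail = tail->next;
            *smaller = (*smaller)->next;      // advance in the list we took from
        }
        tail->next = (a != nullptr) ? a : b;  // append whatever remains
        return dummy.next;
    }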

As for using Quicksort to produce the sorted runs for an external merge sort, it does work, but it's (decidedly) sub-optimal as a rule. To optimize a merge-sort, you normally want to maximize the lengths of each sorted "run" as you produce it. If you simply read in the data that will fit in memory, Quicksort it and write it out, each run will be restricted to (a little less than) the size of the available memory.

You can do quite a bit better than that as a rule though. You start by reading in a block of data, but instead of using a Quicksort on it, you build a heap. Then, as you write each item out from the heap into the sorted "run" file, you read another item in from your input file. If it's larger than the item you just wrote to disk, you insert it into your existing heap, and repeat.

Items that are smaller (i.e., belong before items that have already been written) you keep separate, and build into a second heap. When (and only when) your first heap is empty, and the second heap has taken over all the memory, you quit writing items to the existing "run" file, and start on a new one.

Exactly how effective this will be depends on the initial order of the data. In the worst case (input sorted in inverse order) it does no good at all. In the best case (input already sorted) it lets you "sort" the data in a single run through the input. In an average case (input in random order) it lets you approximately double the length of each sorted run, which will typically improve speed by around 20-25% (though the percentage varies depending on how much larger your data is than the available memory).
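As a rough illustration of the two-heap scheme described above (replacement selection), here is a compact sketch that treats records as plain ints read from an input stream and writes each run on its own line; the function name, the stream-based I/O, and the newline run separator are all illustrative assumptions, not part of the original answer.

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <vector>

    // Replacement selection: a min-heap holds items that can still go into the
    // current run; items smaller than the last value written wait in a second
    // heap and seed the next run. Memory never holds more than `capacity` items.
    void replacement_selection(std::istream& in, std::ostream& out, std::size_t capacity) {
        using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
        MinHeap current, next;
        int x;

        // Fill memory with the first block of input.
        while (current.size() < capacity && (in >> x)) current.push(x);

        while (!current.empty()) {
            int smallest = current.top();
            current.pop();
            out << smallest << ' ';                   // emit into the current sorted run

            if (in >> x) {
                if (x >= smallest) current.push(x);   // still fits in this run
                else next.push(x);                    // too small; hold it for the next run
            }

            if (current.empty()) {                    // current run is finished
                out << '\n';                          // run separator
                current.swap(next);                   // the waiting items start the next run
            }
        }
    }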

answered Sep 19 '22 by Jerry Coffin