Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do we need external sort?

The main reason for external sort is that the data may be larger than the main memory we have.However,we are using virtual memory now, and the virtual memory will take care of swapping between main memory and disk.Why do we need to have external sort then?

like image 844
silverwen Avatar asked Jan 09 '23 22:01

silverwen


2 Answers

An external sort algorithm makes sorting large amounts of data efficient (even when the data does not fit into physical RAM).

While using an in-memory sorting algorithm and virtual memory satisfies the functional requirements for an external sort (that is, it will sort the data), it fails to achieve the non-functional requirement of being efficient. A good external sort minimises the amount of data read and written to external storage (and historically also seek times), and a general-purpose virtual memory implementation on top of a sort algorithm not designed for this will not be competitive with an algorithm designed to minimise IO.

like image 190
Paul Hankin Avatar answered Jan 18 '23 04:01

Paul Hankin


In addition to @Anonymous's answer that external sort is better optimized for less disk IO, sometimes using in-memory sort and using the virtual memory is infeasible, since the virtual memory space is smaller than the file's size.

For example, if you have a 32 bits system (there are still a lot of these), and you want to sort a 20 GB file, 32bits system allow you to have 2^32 ~= 4GB virtual addresses, but the file you are trying to sort cannot fit in.

This used to be a real issue when 64 bits systems were still not very common, and is still an issue today for old 32 bits systems and some embadded devices.


However, even for 64 bits system, as expained in previous answers, the external sort algorithm is more optimized for the nature of sorting, and will require significantly less disk IO than letting the OS "take care of things".

like image 27
amit Avatar answered Jan 18 '23 04:01

amit