 

external sorting

in this web page:

http://web.eecs.utk.edu/~huangj/CS302S04/notes/external-sorting2.html

Merge the resulting runs together into successively bigger runs, until the file is sorted.

As quoted above, how can we merge the resulting runs together? We don't have that much memory.

asked Feb 24 '11 by Josh Morrison


People also ask

What is external sorting with example?

External merge sort. One example of external sorting is the external merge sort algorithm, which is a K-way merge algorithm. It sorts chunks that each fit in RAM, then merges the sorted chunks together.

What is internal and external sorting?

Internal sorting: If the input data can fit in main memory all at once, the sort is called internal. External sorting: If the input data cannot fit in memory all at once, it must be kept on a hard disk or other external storage device while it is sorted.

What is external sorting in DBMS?

External sorting is a technique in which the data resides on secondary storage; it is loaded into main memory part by part, each part is sorted there, and the sorted parts are written out to intermediate files. Finally, these files are merged to produce the fully sorted data.

Which sorting method is an external sort?

Merge sort is the algorithm most commonly used for external sorting.
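To make the two phases concrete, here is a minimal Python sketch of an external merge sort. It assumes integers, one per line, in temporary run files; `external_merge_sort` and `chunk_size` are illustrative names, not from any library. Phase 1 sorts fixed-size chunks in RAM and spills each to its own file; phase 2 uses `heapq.merge`, which performs a k-way merge while holding only one value per run in memory.

```python
import heapq
import tempfile

def external_merge_sort(values, chunk_size):
    """Sort an iterable of integers using bounded memory.

    Phase 1: read chunk_size items at a time, sort each chunk in RAM,
    and spill it to its own temporary file (one number per line).
    Phase 2: k-way merge the sorted run files with heapq.merge.
    """
    runs = []
    it = iter(values)
    while True:
        chunk = []
        for _ in range(chunk_size):
            try:
                chunk.append(next(it))
            except StopIteration:
                break
        if not chunk:
            break
        chunk.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in chunk)
        f.seek(0)  # rewind so the merge phase reads from the start
        runs.append(f)

    # One lazy stream of ints per run file; heapq.merge interleaves them.
    streams = ((int(line) for line in f) for f in runs)
    return list(heapq.merge(*streams))

print(external_merge_sort([9, 7, 2, 6, 3, 4, 8, 5, 1], 3))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In a real system each run file would live on disk and the merged output would be streamed to a file as well, rather than collected into a list.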


2 Answers

Imagine you have the numbers 1 through 9:

9  7  2  6  3  4  8  5  1

And let's suppose that only 3 fit in memory at a time.

So you'd break them into chunks of 3 and sort each, storing each result in a separate file:

2 7 9
3 4 6
1 5 8

Now you'd open each of the three files as streams and read the first value from each:

2 3 1

Output the lowest value, 1, and read the next value from that stream; now you have:

2 3 5

Output the next lowest value, 2, and continue until you have output the entire sorted list.
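The walkthrough above can be sketched in Python with a min-heap. Plain lists stand in for the three sorted run files; the heap holds exactly one value per stream, tagged with the stream it came from, so memory use stays at k values no matter how long the runs are.

```python
import heapq

# The three sorted run files from the example; lists stand in for files.
runs = [[2, 7, 9], [3, 4, 6], [1, 5, 8]]
streams = [iter(r) for r in runs]

# Seed the heap with the first value from each stream, remembering
# which stream each value came from.
heap = [(next(s), i) for i, s in enumerate(streams)]
heapq.heapify(heap)

result = []
while heap:
    value, i = heapq.heappop(heap)   # output the lowest value...
    result.append(value)
    nxt = next(streams[i], None)     # ...then refill from that stream
    if nxt is not None:
        heapq.heappush(heap, (nxt, i))

print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With only three runs a linear scan for the minimum would do, but the heap keeps each step at O(log k) when the number of runs is large.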

answered Nov 16 '22 by Jesse Cohen


If you merge two runs A and B into a larger run C, you can do this line by line, generating progressively larger runs while reading at most one line from each run at a time. Because the process is iterative and works on streams of data rather than whole data sets, you don't need to worry about memory usage. On the other hand, disk access may make the whole process slow -- but it sure beats not being able to do the work at all.
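A minimal sketch of that two-run merge, assuming each run is a stream of sorted integers (`merge_two_runs` is an illustrative name; `io.StringIO` stands in for the output file):

```python
import io

def merge_two_runs(run_a, run_b, out):
    """Merge two sorted integer streams into `out`,
    holding at most one value from each run in memory."""
    a, b = next(run_a, None), next(run_b, None)
    while a is not None and b is not None:
        if a <= b:
            out.write(f"{a}\n")
            a = next(run_a, None)
        else:
            out.write(f"{b}\n")
            b = next(run_b, None)
    # One run is exhausted; copy the remainder of the other.
    for v in ([a] if a is not None else []) + list(run_a):
        out.write(f"{v}\n")
    for v in ([b] if b is not None else []) + list(run_b):
        out.write(f"{v}\n")

out = io.StringIO()
merge_two_runs(iter([2, 7, 9]), iter([3, 4, 6]), out)
print(out.getvalue().split())  # ['2', '3', '4', '6', '7', '9']
```

Applied repeatedly -- merge pairs of runs, then pairs of the results -- this doubles the run length each pass until one sorted run remains.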

answered Nov 16 '22 by Mark Elliot