Given 100 GB integer Data on Hard Disk with RAM amounting to 2 GB, how to sort the integers with minimal disk operation. Here fetching one number from disk is considered as one disk operation( though in reality a block of data can be fetched).
We can use additional space on the disk for temporary storage and no need to consider the operations of cleaning up temporary spaces used.
You can use counting sort. Counting sort (sometimes referred to as ultra sort or math sort) is a sorting algorithm which (like bucket sort) takes advantage of knowing the range of the numbers in the array to be sorted (array A).
The easiest way to do this is to use external sorting. We divide our source file into temporary files of size equal to the size of the RAM and first sort these files.
As other people have noted, you can use an O(n) counting sort. However there are some additional problems you need to consider. We'll assume you're storing 32-bit integers, so 100GB ~ 27e9 ints.
If all the integers are the same, then it will occur ~27e9 times, which is larger than a 32 bit int. Therefore your counters will have to be 64-bit integers.
With 2GB of RAM, you can only store ~125e6 counters in RAM at any one time. If we can't make any assumptions about the distribution of integers, we would either have to:
I think the latter option is better. Since we need ~4e9 64-bit counters and can only store 2GB, we would need to run through the entire array ~16 times. The first option is clearly no good if we consider encountering a sequence of integers such as 0,1<<31,0. These counters would not be stored in RAM at the same time, and thus at least 2 HD writes are required.
Because of this, I think for the specific size of your problem (100GB), an N-way merge sort would be better, as this would only require reading the entire array log_2(100) ~ 8 times.
However, if the interviewer immediately changed the question to "10TB array, still 2GB of RAM", then the counting sort would easily win.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With