Given a set of 1 Trillion integers on hard disk, find the smallest 1 million of them. You can fit at most 1 million integers in memory at a time.
One approach is, take the first 1 million out of 1 trillion and sort the 1 million integers and store it back in the hard disk. In this way carry on the sorting for every group of 1 million integers and store it in the hard drive. Now groups of 1 million integers are sorted up to 1 trillion. Now compare the first element of all the sorted groups the minimum of them is the minimum of the 1 trillion. Store it as the first element in the memory. Next, take the second element from the group from which the smallest element came and then check it with all other groups first element. In this way repeat the procedure until the first 1 million is sorted and stored in the memory.
Is there a more optimal approach that I'm missing?
You can use counting sort. Counting sort (sometimes referred to as ultra sort or math sort) is a sorting algorithm which (like bucket sort) takes advantage of knowing the range of the numbers in the array to be sorted (array A).
You can do this efficiently in O(n log m) by using a heap. ( n = all the numbers, m = the size of the set of numbers you want to find ).
Go through the trillion numbers one at a time. For each new number do one of the following.
After you go through all the trillion entries then the resulting heap will have the 1 million smallest numbers.
Inserting and deleting from the heap is O(log m). The single pass through the heap is n. So, the algorithm is n*log (m)
How large are the integers? If they're just 32-bit values, I would simply make an array of 4 billion 64-bit counters on disk, and upon encountering x
in the input, increment the counter in position x
. In general this approach is extremely costly in space, but proportionally the cost is low whenever the range of possible element values is much smaller than the number of items to be sorted, and best of all it's O(n)
in time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With