I recently attended an interview where I was asked, "Write a program to find the 100 largest numbers out of an array of 1 billion numbers."
I was only able to give a brute-force solution, which was to sort the array in O(n log n) time and take the last 100 numbers.
Arrays.sort(array);
The interviewer was looking for a better time complexity; I tried a couple of other solutions but failed to satisfy him. Is there a solution with a better time complexity?
Use a heap. Building the max heap takes O(N); after that, the 100 largest numbers are at the top of the heap, and all you need is to extract them, which costs 100 * O(log N).
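A minimal Java sketch of that idea, assuming the input is a long[] (the class and method names here are my own):

import java.util.Arrays;

public class TopKWithMaxHeap {

    // Restore max-heap order for the element at index i within heap[0..size-1].
    static void siftDown(long[] heap, int i, int size) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, largest = i;
            if (left < size && heap[left] > heap[largest]) largest = left;
            if (right < size && heap[right] > heap[largest]) largest = right;
            if (largest == i) return;
            long tmp = heap[i]; heap[i] = heap[largest]; heap[largest] = tmp;
            i = largest;
        }
    }

    // Build a max heap in O(N), then extract the k largest values in O(k log N).
    static long[] topK(long[] a, int k) {
        long[] heap = Arrays.copyOf(a, a.length);   // copy so the caller's array is untouched
        for (int i = heap.length / 2 - 1; i >= 0; i--) {
            siftDown(heap, i, heap.length);         // bottom-up heapify: O(N) overall
        }
        long[] result = new long[k];
        int size = heap.length;
        for (int j = 0; j < k; j++) {
            result[j] = heap[0];                    // the current maximum is at the root
            heap[0] = heap[--size];                 // move the last element to the root
            siftDown(heap, 0, size);                // O(log N) per extraction
        }
        return result;
    }
}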
To find the largest element, the first two elements of the array are compared and the larger of the two is placed in arr[0]; then arr[0] and the third element are compared and the larger is placed in arr[0]. This process continues until arr[0] has been compared against the last element, at which point arr[0] holds the largest element.
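Read literally, that description boils down to the loop below (a sketch only; arr is assumed to be the input array, and note that the original value at arr[0] gets overwritten):

// After the loop, arr[0] holds the largest element of the array.
for (int i = 1; i < arr.length; i++) {
    if (arr[i] > arr[0]) {
        arr[0] = arr[i];   // keep the larger of the two values in arr[0]
    }
}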
Take two variables, maxOne and maxTwo, and initialize them to zero (or, if the input may contain negatives, to the smallest possible value). Iterate through the array and compare each number against these two variables. If the current number is greater than maxOne, set maxTwo = maxOne and then maxOne = number; otherwise, if it is only greater than maxTwo, update maxTwo with the current number.
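A sketch of that loop (arr is assumed to be the input array; starting from Long.MIN_VALUE instead of zero also handles negative inputs):

long maxOne = Long.MIN_VALUE;   // largest value seen so far
long maxTwo = Long.MIN_VALUE;   // second largest value seen so far

for (long number : arr) {
    if (number > maxOne) {
        maxTwo = maxOne;        // the old maximum becomes the runner-up
        maxOne = number;
    } else if (number > maxTwo) {
        maxTwo = number;
    }
}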
Solution approach for the largest three elements: keep three variables, max, max2 and max3, all initialized to the smallest possible value (initializing them to arr[0] would wrongly report arr[0] as the second and third largest whenever no later element exceeds it). Then for every element arr[i]: if (arr[i] > max) -> max3 = max2, max2 = max, max = arr[i]; else if (arr[i] > max2) -> max3 = max2, max2 = arr[i]; else if (arr[i] > max3) -> max3 = arr[i].
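The same pattern extended to three variables (again only a sketch over a long[] named arr):

long max  = Long.MIN_VALUE;
long max2 = Long.MIN_VALUE;
long max3 = Long.MIN_VALUE;

for (long v : arr) {
    if (v > max) {            // new overall maximum: shift the others down
        max3 = max2;
        max2 = max;
        max  = v;
    } else if (v > max2) {    // new second largest
        max3 = max2;
        max2 = v;
    } else if (v > max3) {    // new third largest
        max3 = v;
    }
}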
You can keep a priority queue of the 100 biggest numbers and iterate through the billion numbers; whenever you encounter a number greater than the smallest number in the queue (the head of the queue), remove the head of the queue and add the new number.
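A minimal Java sketch of this approach, using java.util.PriorityQueue (a min-heap) as the bounded queue; the class and method names are my own:

import java.util.PriorityQueue;

public class Top100 {

    // Returns the k largest values of the input, in ascending order.
    static long[] topK(long[] numbers, int k) {
        // Min-heap holding the k largest values seen so far; the smallest of them is at the head.
        PriorityQueue<Long> queue = new PriorityQueue<>(k);
        for (long number : numbers) {
            if (queue.size() < k) {
                queue.offer(number);            // the first k numbers are added unconditionally
            } else if (number > queue.peek()) {
                queue.poll();                   // drop the smallest of the current top k
                queue.offer(number);            // O(log k) insertion
            }
        }
        long[] result = new long[queue.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = queue.poll();           // polling a min-heap yields ascending order
        }
        return result;
    }
}

Calling Top100.topK(array, 100) then returns the 100 largest numbers.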
EDIT: As Dev noted, with a priority queue implemented with a heap, the complexity of inserting into the queue is O(log K), where K is the size of the queue (here 100). In the worst case you get billion * log2(100), which is better than billion * log2(billion).

In general, if you need the largest K numbers from a set of N numbers, the complexity is O(N log K) rather than O(N log N); this can be very significant when K is very small compared to N.
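To make that concrete (my own back-of-the-envelope arithmetic): with N = 10^9 and K = 100, N * log2(K) is roughly 6.6 * 10^9 comparisons, while N * log2(N) is roughly 3 * 10^10, about four to five times as much work, and the gap widens as K shrinks relative to N.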
EDIT2:
The expected time of this algorithm is pretty interesting, since in each iteration an insertion may or may not occur. The probability that the i'th number is inserted into the queue is the probability of a random variable being larger than at least i-k random variables from the same distribution (the first k numbers are automatically added to the queue). We can use order statistics (see link) to calculate this probability. For example, let's assume the numbers were randomly selected uniformly from [0, 1]; then the expected value of the (i-k)'th number (out of i numbers) is approximately (i-k)/i, and the chance of a random variable being larger than this value is 1 - (i-k)/i = k/i.
Thus, the expected number of insertions is:

sum over i = k+1 .. n of k/i ≈ k * ln(n/k)

And the expected running time can be expressed as:

k + (n-k) + k * ln(n/k) * log(k)/2

(k time to generate the queue with the first k elements, then n-k comparisons, and the expected number of insertions as described above, each taking an average of log(k)/2 time)
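For example (my own arithmetic, plugging into the formula above): with n = 10^9 and k = 100, the expected number of insertions is roughly 100 * ln(10^9 / 100) ≈ 1,600, so virtually all of the work is the n-k comparisons.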
Note that when n is very large compared to k, this expression is a lot closer to n than to n log k. This is somewhat intuitive: as in the case of the question, even after 10,000 iterations (which is very small compared to a billion), the chance of a number being inserted into the queue is very small.