One of my friend has been asked with a question
Retrieving the max top 100 numbers from one hundred million of numbers
in a recent job interview. Do you have any idea to come up with an efficient way to solve it?
Use Heap building the MAX heap will take O(N),then the top 100 max numbers will be in the top of the Heap, all you need is to get them out from the heap(100 X O(Log(N)).
100,000,000 (one hundred million) is the natural number following 99,999,999 and preceding 100,000,001.
Run them all through a min-heap of size 100: for each input number k
, replace the current min m
with max(k, m)
. Afterwards the heap holds the 100 largest inputs.
A search engine like Lucene can use this method, with refinements, to choose the most-relevant search answers.
Edit: I fail the interview -- I got the details wrong twice (after having done this before, in production). Here's code to check it; it's almost the same as Python's standard heapq.nlargest()
:
import heapq def funnel(n, numbers): if n == 0: return [] heap = numbers[:n] heapq.heapify(heap) for k in numbers[n:]: if heap[0] < k: heapq.heapreplace(heap, k) return heap >>> funnel(4, [3,1,4,1,5,9,2,6,5,3,5,8]) [5, 8, 6, 9]
Ok, here is a really stupid answer, but it is a valid one:
Reasoning:
It is an ok solution for some sort of one time operation. It would suck running it x times per second or something. But then, we need more context - as mclientk also had with his simple SQL statement - assuming 100 million numbersdo not exist in memory is a feasible question, because... they may come from a database and most of the time will, when talking about business relevant numbers.
As such, the question is really hard to answer - efficiency first has to be defined.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With