This is out of curiosity about the nsmallest and nlargest methods of heapq.py module in python.
I was reading it here in the docs.
The documentation doesn't say how it does so (nsmalles/nlargest) on any iterable.
This might be a stupid question, but can I assume that these methods internally create a heap of the iterable data structure (may be using 'heapify' method) and then return the n smallest/largest elements?
Just want to confirm my conclusion. thanks!
The algorithm for finding the n
smallest or largest items from an iterable with N
items is a bit tricky. You see, you don't create a size-N
min-heap to find the smallest items.
Instead, you make a smaller, size-n
max-heap with the first n
items, then do repeated pushpop
operations on it with the remaining items from the sequence. Once you're done, you pop the items from the heap and return them in reversed order.
This process take O(N log(n))
time (note the small n
) and of course only O(n)
space. If n
is much less than N
, it's much more efficient than sorting and slicing.
The heapq
module contains a pure-Python implementation of this algorithm, though when you import it, you may get a faster version of the code written in C instead (you can read the source for that too, but it's not quite as friendly unless you know the Python C API).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With