nlargest and nsmallest ; heapq python

Question

This is out of curiosity about the nsmallest and nlargest methods of heapq.py module in python.

I was reading it here in the docs.

The documentation doesn't say how it does so (nsmalles/nlargest) on any iterable.

This might be a stupid question, but can I assume that these methods internally create a heap of the iterable data structure (may be using 'heapify' method) and then return the n smallest/largest elements?

Just want to confirm my conclusion. thanks!

Blckknght · Accepted Answer

The algorithm for finding the n smallest or largest items from an iterable with N items is a bit tricky. You see, you don't create a size-N min-heap to find the smallest items.

Instead, you make a smaller, size-n max-heap with the first n items, then do repeated pushpop operations on it with the remaining items from the sequence. Once you're done, you pop the items from the heap and return them in reversed order.

This process take O(N log(n)) time (note the small n) and of course only O(n) space. If n is much less than N, it's much more efficient than sorting and slicing.

The heapq module contains a pure-Python implementation of this algorithm, though when you import it, you may get a faster version of the code written in C instead (you can read the source for that too, but it's not quite as friendly unless you know the Python C API).

nlargest and nsmallest ; heapq python

Tags:

python-2.7

heap

pranav3688

1 Answers

Blckknght

Recent Activity

Donate For Us

nlargest and nsmallest ; heapq python

Tags:

python-2.7

heap

pranav3688

1 Answers

Blckknght

Related questions

Recent Activity

Donate For Us