Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

nlargest and nsmallest ; heapq python

This is out of curiosity about the nsmallest and nlargest methods of heapq.py module in python.

I was reading it here in the docs.

The documentation doesn't say how it does so (nsmalles/nlargest) on any iterable.

This might be a stupid question, but can I assume that these methods internally create a heap of the iterable data structure (may be using 'heapify' method) and then return the n smallest/largest elements?

Just want to confirm my conclusion. thanks!

like image 205
pranav3688 Avatar asked Mar 15 '23 12:03

pranav3688


1 Answers

The algorithm for finding the n smallest or largest items from an iterable with N items is a bit tricky. You see, you don't create a size-N min-heap to find the smallest items.

Instead, you make a smaller, size-n max-heap with the first n items, then do repeated pushpop operations on it with the remaining items from the sequence. Once you're done, you pop the items from the heap and return them in reversed order.

This process take O(N log(n)) time (note the small n) and of course only O(n) space. If n is much less than N, it's much more efficient than sorting and slicing.

The heapq module contains a pure-Python implementation of this algorithm, though when you import it, you may get a faster version of the code written in C instead (you can read the source for that too, but it's not quite as friendly unless you know the Python C API).

like image 198
Blckknght Avatar answered May 10 '23 04:05

Blckknght