I'm relatively new to python (using v3.x syntax) and would appreciate notes regarding complexity and performance of heapq vs. sorted.
I've already implemented a heapq based solution for a greedy 'find the best job schedule' algorithm. But then I've learned about the possibility of using 'sorted' together with operator.itemgetter() and reverse=True.
Sadly, I could not find any explanation on expected complexity and/or performance of 'sorted' vs. heapq.
The first step in heap sort is to build a min or max heap from the array data and then delete the root element recursively and heapify the heap until there is only one node present in the heap. Heapsort is an efficient algorithm and it performs faster than selection sort.
The heapq implements a min-heap sort algorithm suitable for use with Python's lists. A heap is a tree-like data structure where the child nodes have a sort-order relationship with the parents.
heapify only takes O(N) time.
To delete this root, all heap implementations have a O(log(n)) time complexity. For example the python heapq module implements a heap with an array, and all the time the first element of the array is the root of the heap.
If you use binary heap to pop all elements in order, the thing you do is basically heapsort. It is slower than sort algorightm in sorted
function apart from it's implementation is pure python.
The heapq
is faster than sorted
in case if you need to add elements on the fly i.e. additions and insertions could come in unspecified order. Adding new element preserving inner order in any heap is faster than resorting array after each insertion.
The sorted
is faster if you will need to retrieve all elements in order later.
The only problem where they can compete - if you need some portion of smallest (or largest) elements from collection. Although there are special algorigthms for that case, whether heapq
or sorted
will be faster here depends on the size of the initial array and portion you'll need to extract.
The nlargest()
and nsmallest()
functions of heapq
are most appropriate if you are trying to find a relatively small number of items. If you want to find simply single smallest or largest number , min() and max() are most suitable, because it's faster and uses sorted
and then slicing. If you are looking for the N smallest or largest items and N is small compared to the overall size of the collection, these functions provide superior performance. Although it's not necessary to use heapq in your code, it's just an interesting topic and a worthwhile subject of study.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With