I know of two approaches to it. The first one: documentation is here
heapq.nlargest(n, iterable, key=None)
and the second traditional approach of using sorted
sorted(iterable, key=key, reverse=True)[:K]
The documentation mentions that these two are equivalent. However, I just wanted to know if the complexity of both are the same or if the first approach was implemented with lesser time complexity.
I remember from my algorithms course that obtaining top K elements from a list can be done in lesser order of operations compared to sorting the entire list and then going with picking the top K. Correct me if I am wrong
Edit: What standard python libs can perform this task in O(N) operations or what's the best complexity we can get from python?
Use the built-in sorted() function or the sort() method of a list. sorted() returns a new sorted list, and sort() sorts the original list itself. By switching ascending/descending order with the reverse parameter and selecting any number of elements by slicing, you can get the n-largest/smallest elements.
Use Python's min() and max() to find smallest and largest values in your data.
Time Complexity for Append in Python This function has constant time complexity i.e. O(1), because lists are randomly accessed so the last element can be reached in O(1) time that's why the time taken to add the new element at the end of the list is O(1).
Integers are unlimited in size and have no maximum value in Python.
I'm not a great mathematician, but I guess it should depend mostly on two things:
Generally you're right, and quick tests show the difference in numbers:
>>> timeit(stmt='sorted(i)[-100:]', setup='from random import seed,random;seed(666);i=[random() for _ in range(10000)]', number=1000)
2.086820379132405
>>> timeit(stmt='heapq.nlargest(n, i)', setup='from random import seed,random;import heapq;seed(666);n=100;i=[random() for _ in range(10000)]', number=1000)
0.5397011679597199
There is more fast algorithm QuickSelect that does not perform full sorting - just makes partition, and has average complexity about O(N)
.
Thanks to @Violet Red comment: numpy.partition
Complexity of heap approach is O(NlogK)
, of sort approach is O(NlogN)
.
C++ STL contains method partial_sort
that might execute faster that full sorting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With