Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Discuss the complexity of various python methods to obtain N largest elements from a list

I know of two approaches to it. The first one: documentation is here

heapq.nlargest(n, iterable, key=None)

and the second traditional approach of using sorted

sorted(iterable, key=key, reverse=True)[:K]

The documentation mentions that these two are equivalent. However, I just wanted to know if the complexity of both are the same or if the first approach was implemented with lesser time complexity.

I remember from my algorithms course that obtaining top K elements from a list can be done in lesser order of operations compared to sorting the entire list and then going with picking the top K. Correct me if I am wrong

Edit: What standard python libs can perform this task in O(N) operations or what's the best complexity we can get from python?

like image 523
router Avatar asked Aug 27 '17 11:08

router


People also ask

How do I find the largest N in a list in Python?

Use the built-in sorted() function or the sort() method of a list. sorted() returns a new sorted list, and sort() sorts the original list itself. By switching ascending/descending order with the reverse parameter and selecting any number of elements by slicing, you can get the n-largest/smallest elements.

How do I find the largest and smallest element in a list Python?

Use Python's min() and max() to find smallest and largest values in your data.

What is the time complexity of append in Python?

Time Complexity for Append in Python This function has constant time complexity i.e. O(1), because lists are randomly accessed so the last element can be reached in O(1) time that's why the time taken to add the new element at the end of the list is O(1).

What is the largest number in Python?

Integers are unlimited in size and have no maximum value in Python.


2 Answers

I'm not a great mathematician, but I guess it should depend mostly on two things:

  1. relation between K and length of an iterable
  2. relation between amount of python and cpython code executed.

Generally you're right, and quick tests show the difference in numbers:

>>> timeit(stmt='sorted(i)[-100:]', setup='from random import seed,random;seed(666);i=[random() for _ in range(10000)]', number=1000)
2.086820379132405
>>> timeit(stmt='heapq.nlargest(n, i)', setup='from random import seed,random;import heapq;seed(666);n=100;i=[random() for _ in range(10000)]', number=1000)
0.5397011679597199
like image 149
Violet Red Avatar answered Oct 22 '22 19:10

Violet Red


There is more fast algorithm QuickSelect that does not perform full sorting - just makes partition, and has average complexity about O(N).

Thanks to @Violet Red comment: numpy.partition

Complexity of heap approach is O(NlogK), of sort approach is O(NlogN).

C++ STL contains method partial_sort that might execute faster that full sorting.

like image 29
MBo Avatar answered Oct 22 '22 19:10

MBo