
Fastest method of getting the k smallest numbers in an unsorted list of size N in Python?

What is the fastest method to get the k smallest numbers in an unsorted list of size N using Python?
Is it faster to sort the big list of numbers and then take the k smallest numbers,
or to find the minimum in the list k times, removing each found minimum from the list before the next search?

jsky asked Nov 10 '15 04:11


2 Answers

You could use a heap queue; it can give you the K largest or smallest numbers out of a list of size N in O(NlogK) time.

The Python standard library includes the heapq module, which already provides a heapq.nsmallest() function:

import heapq

k_smallest = heapq.nsmallest(k, input_list)

Internally, this creates a heap of size K from the first K elements of the input list, then iterates over the remaining N-K elements, pushing each onto the heap and then popping off the largest one. Each such push and pop takes O(log K) time, making the overall operation O(NlogK).

The function also optimises the following edge cases:

  • If K is 1, the min() function is used instead, giving you an O(N) result.
  • If K >= N, the function uses sorting instead, since O(NlogN) would beat O(NlogK) in that case.
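The bounded-heap idea described above can be sketched as follows. This is an illustrative approximation, not the actual CPython implementation: the function name k_smallest_sketch is hypothetical, and the sketch assumes numeric input so that values can be negated to simulate a max-heap on top of heapq's min-heap.

```python
import heapq


def k_smallest_sketch(k, iterable):
    """Return the k smallest numbers in ascending order using a heap of size k."""
    if k <= 0:
        return []
    it = iter(iterable)
    # heapq is a min-heap, so store negated values: the root is then the
    # negation of the largest of the k smallest values seen so far.
    heap = [-x for _, x in zip(range(k), it)]
    heapq.heapify(heap)
    for x in it:
        # Only a value smaller than the current largest candidate can
        # belong to the result; replacing the root costs O(log k).
        if -x > heap[0]:
            heapq.heapreplace(heap, -x)
    # Undo the negation and sort the (at most k) survivors.
    return sorted(-x for x in heap)
```

Each of the N elements triggers at most one O(log K) heap operation, which is where the O(NlogK) bound comes from. In practice heapq.nsmallest() should be preferred, since it also handles non-numeric items and a key function.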

A better option is to use the introselect algorithm, which runs in O(N) time. The only implementation I am aware of is the numpy.partition() function:

import numpy

# assuming you have a python list, you need to convert to a numpy array first
array = numpy.array(input_list)
# partition (requires k < len(array)), slice back to the k smallest elements (unordered), convert back to a Python list
k_smallest = numpy.partition(array, k)[:k].tolist()

Apart from requiring numpy to be installed, this also takes O(N) additional memory (versus O(K) for heapq), as a copy of the list is created for the partition.
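To illustrate how a selection algorithm avoids a full sort without depending on numpy, here is a minimal pure-Python quickselect sketch. Note that this is plain quickselect, not introselect: it runs in expected O(N) time but can degrade to O(N^2) in the worst case, and it will be much slower in practice than numpy's compiled code. The function name quickselect_k_smallest is hypothetical.

```python
import random


def quickselect_k_smallest(values, k):
    """Return the k smallest values in arbitrary order (expected O(N) time)."""
    items = list(values)  # work on a copy so the caller's list is untouched
    if k <= 0:
        return []
    if k >= len(items):
        return items
    target = k - 1  # index the k-th smallest value must end up at
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style partition of items[lo..hi] around the pivot value.
        while i <= j:
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Keep only the side that still contains the target index.
        if target <= j:
            hi = j
        elif target >= i:
            lo = i
        else:
            break  # items[target] equals the pivot and is in its final place
    return items[:k]
```

Like numpy.partition(), this returns the k smallest elements without sorting them; call sorted() on the result if order matters, which adds only O(K log K).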

If you only want the indices, you can use, for either variant:

heapq.nsmallest(k, range(len(input_list)), key=input_list.__getitem__)  # O(NlogK)
numpy.argpartition(numpy.array(input_list), k)[:k].tolist()  # O(N)
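As a small worked example of the heapq variant, the snippet below ranks the positions 0..N-1 by the value stored at each position. The sample data and the names indices and values are made up for illustration.

```python
import heapq

input_list = [50, 10, 40, 20, 30]
k = 2

# Select among the indices, but compare them by the value they point at.
indices = heapq.nsmallest(k, range(len(input_list)), key=input_list.__getitem__)
print(indices)  # [1, 3]

# The indices can be mapped back to the values when needed.
values = [input_list[i] for i in indices]
print(values)  # [10, 20]
```

The heapq result is ordered by value, whereas numpy.argpartition() makes no promise about the order of the first k indices.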
Martijn Pieters answered Oct 13 '22 12:10


If the list of the k smallest numbers doesn't need to be sorted, this can be done in O(n) time with a selection algorithm like introselect. The standard library doesn't come with one, but NumPy has numpy.partition for the job:

import numpy

partitioned = numpy.partition(l, k)
# The subarray partitioned[:k] now contains the k smallest elements, in no particular order.
user2357112 supports Monica answered Oct 13 '22 12:10