Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting the lesser n elements of a list in Python

I need to get the lesser n numbers of a list in Python. I need this to be really fast because it's in a critical part for performance and it needs to be repeated a lot of times.

n is usually no greater than 10 and the list usually has around 20000 elements. The list is always different each time I call the function. Sorting can't be made in place.

Initially, I have written this function:

def mins(items, n):
    mins = [float('inf')]*n
    for item in items:
        for i, min in enumerate(mins):
            if item < min:
                mins.insert(i, item)
                mins.pop()
                break
    return mins

But this function can't beat a simple sorted(items)[:n] which sort the entire list. Here is my test:

from random import randint, random
import time

test_data = [randint(10, 50) + random() for i in range(20000)]

init = time.time()
mins = mins(test_data, 8)
print 'mins(items, n):', time.time() - init

init = time.time()
mins = sorted(test_data)[:8]
print 'sorted(items)[:n]:', time.time() - init

Results:

mins(items, n): 0.0632939338684
sorted(items)[:n]: 0.0231449604034

sorted()[:n] is three times faster. I believe this is because:

  1. insert() operation is costly because Python lists are not linked lists.
  2. sorted() is an optimized c function and mine is pure python.

Is there any way to beat sorted()[:n] ? Should I use a C extension, or Pyrex or Psyco or something like that?

Thanks in advance for your answers.

like image 922
Manuel Ceron Avatar asked Dec 08 '08 19:12

Manuel Ceron


3 Answers

You actually want a sorted sequence of mins.

mins = items[:n]
mins.sort()
for i in items[n:]:
    if i < mins[-1]: 
        mins.append(i)
        mins.sort()
        mins= mins[:n]

This runs much faster because you aren't even looking at mins unless it's provably got a value larger than the given item. About 1/10th the time of the original algorithm.

This ran in zero time on my Dell. I had to run it 10 times to get a measurable run time.

mins(items, n): 0.297000169754
sorted(items)[:n]: 0.109999895096
mins2(items)[:n]: 0.0309998989105

Using bisect.insort instead of append and sort may speed this up a hair further.

like image 171
S.Lott Avatar answered Oct 14 '22 16:10

S.Lott


import heapq

nlesser_items = heapq.nsmallest(n, items)

Here's a correct version of S.Lott's algorithm:

from bisect    import insort
from itertools import islice

def nsmallest_slott_bisect(n, iterable, insort=insort):
    it   = iter(iterable)
    mins = sorted(islice(it, n))
    for el in it:
        if el <= mins[-1]: #NOTE: equal sign is to preserve duplicates
            insort(mins, el)
            mins.pop()

    return mins

Performance:

$ python -mtimeit -s "import marshal; from nsmallest import nsmallest$label as nsmallest; items = marshal.load(open('items.marshal','rb')); n = 10"\
 "nsmallest(n, items)"
nsmallest_heapq
100 loops, best of 3: 12.9 msec per loop
nsmallest_slott_list
100 loops, best of 3: 4.37 msec per loop
nsmallest_slott_bisect
100 loops, best of 3: 3.95 msec per loop

nsmallest_slott_bisect is 3 times faster than heapq's nsmallest (for n=10, len(items)=20000). nsmallest_slott_list is only marginally slower. It is unclear why heapq's nsmallest is so slow; its algorithm is almost identical to the presented above (for small n).

like image 25
Sriram Avatar answered Oct 14 '22 17:10

Sriram


A possibility is to use the bisect module:

import bisect

def mins(items, n):
    mins = [float('inf')]*n
    for item in items:
        bisect.insort(mins, item)
        mins.pop()
    return mins

However, it's just a bit faster for me:

mins(items, n): 0.0892250537872
sorted(items)[:n]: 0.0990262031555

Using psyco does speed it up a bit more:

import bisect
import psyco
psyco.full()

def mins(items, n):
    mins = [float('inf')]*n
    for item in items:
        bisect.insort(mins, item)
        mins.pop()
    return mins

Result:

mins(items, n): 0.0431621074677
sorted(items)[:n]: 0.0859830379486
like image 2
fredreichbier Avatar answered Oct 14 '22 16:10

fredreichbier