Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Sort On The Fly

I am thinking about a problem I haven't encountered before and I'm trying to determine the most efficient algorithm to use.

I am iterating over two lists, using each pair of elements to calculate a value that I wish to sort on. My end goal is to obtain the top twenty results. I could store the results in a third list, sort that list by absolute value, and simply slice the top twenty, but that is not ideal.

Since these lists have the potential to become extremely large, I'd ideally like to only store the top twenty absolute values, evicting old values as a new top value is calculated.

What would be the most efficient way to implement this in python?

like image 295
donopj2 Avatar asked Jun 25 '13 14:06

donopj2


2 Answers

Take a look at heapq.nlargest:

heapq.nlargest(n, iterable[, key])

Return a list with the n largest elements from the dataset defined by iterable. key, if provided, specifies a function of one argument that is used to extract a comparison key from each element in the iterable: key=str.lower Equivalent to: sorted(iterable, key=key, reverse=True)[:n]

like image 70
arshajii Avatar answered Oct 06 '22 14:10

arshajii


You can use izip to iterate the two lists in parallel, and build a generator to lazily do a calculation over them, then heapq.nlargest to effectively keep the top n:

from itertools import izip
import heapq

list_a = [1, 2, 3]
list_b = [3, 4, 7]

vals = (abs(a - b) for a, b in izip(list_a, list_b))
print heapq.nlargest(2, vals)
like image 40
Jon Clements Avatar answered Oct 06 '22 14:10

Jon Clements