I am thinking about a problem I haven't encountered before and I'm trying to determine the most efficient algorithm to use.
I am iterating over two lists, using each pair of elements to calculate a value that I wish to sort on. My end goal is to obtain the top twenty results. I could store the results in a third list, sort that list by absolute value, and simply slice the top twenty, but that is not ideal.
Since these lists have the potential to become extremely large, I'd ideally like to only store the top twenty absolute values, evicting old values as a new top value is calculated.
What would be the most efficient way to implement this in python?
Take a look at heapq.nlargest
:
heapq.nlargest(n, iterable[, key])
Return a list with the n largest elements from the dataset defined by iterable. key, if provided, specifies a function of one argument that is used to extract a comparison key from each element in the iterable:
key=str.lower
Equivalent to:sorted(iterable, key=key, reverse=True)[:n]
You can use izip
to iterate the two lists in parallel, and build a generator to lazily do a calculation over them, then heapq.nlargest
to effectively keep the top n
:
from itertools import izip
import heapq
list_a = [1, 2, 3]
list_b = [3, 4, 7]
vals = (abs(a - b) for a, b in izip(list_a, list_b))
print heapq.nlargest(2, vals)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With