Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

finding n largest differences between two lists

I have two lists old and new, with the same number of elements.

I'm trying to write an efficient function that takes n as a parameter, compares the elements of two lists at the same locations (by index), finds n largest differences, and returns the indices of those n elements.

I was thinking this would be best solved by a value-sorted dictionary, but one isn't available in Python (and I'm not aware of any libraries that offer it). Perhaps there's a better solution?

like image 314
max Avatar asked Feb 17 '12 05:02

max


2 Answers

Whenever you think "n largest", think heapq.

>>> import heapq
>>> import random
>>> l1 = [random.randrange(100) for _ in range(100)]
>>> l2 = [random.randrange(100) for _ in range(100)]
>>> heapq.nlargest(10, (((a - b), a, b) for a, b in zip(l1, l2)))
[(78, 99, 21), (75, 86, 11), (69, 90, 21), (69, 70, 1), (60, 86, 26), (55, 95, 40), (52, 56, 4), (48, 98, 50), (46, 80, 34), (44, 81, 37)]

This will find the x largest items in O(n log x) time, where n is the total number of items in the list; sorting does it in O(n log n) time.

It just occurred to me that the above doesn't actually do what you asked for. You want an index! Still very easy. I'll also use abs here in case you want the absolute value of the difference:

>>> heapq.nlargest(10, xrange(len(l1)), key=lambda i: abs(l1[i] - l2[i]))
[91, 3, 14, 27, 46, 67, 59, 39, 65, 36]
like image 173
senderle Avatar answered Nov 15 '22 05:11

senderle


Assuming the number of elements in the lists aren't huge, you could just difference all of them, sort, and pick the first n:

print sorted((abs(x-y) for x,y in zip(old, new)), reverse=True)[:n]

This would be O(k log k) where k is the length of your original lists.

If n is significantly smaller than k, the best idea would be to use the nlargest function provided by the heapq module:

import heapq
print heapq.nlargest(n, (abs(x-y) for x,y in zip(old, new))

This will be O(k log n) instead of O(k log k) which can be significant for k >> n. Also, if your lists are really big, you'd probably be better off using itertools.izip instead of the regular zip function.

like image 29
tzaman Avatar answered Nov 15 '22 04:11

tzaman