Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summarizing a dictionary of arrays in Python

I got the following dictionary:

mydict = {
  'foo': [1,19,2,3,24,52,2,6],          # sum: 109
  'bar': [50,5,9,7,66,3,2,44],          # sum: 186
  'another': [1,2,3,4,5,6,7,8],         # sum:  36
  'entry': [0,0,0,2,99,4,33,55],        # sum: 193
  'onemore': [21,22,23,24,25,26,27,28]  # sum: 196
}

I need to efficiently filter out and sort the top x entries by the sum of the array.

For example, the Top 3 sorted and filtered list for the example above would be

sorted_filtered_dict = {
  'onemore': [21,22,23,24,25,26,27,28], # sum: 196
  'entry': [0,0,0,2,99,4,33,55],        # sum: 193
  'bar': [50,5,9,7,66,3,2,44]           # sum: 186
}

I'm fairly new to Python, and tried it myself with chaining a sum and filter function on a lambda function, but struggled with the actual syntax.

like image 738
poezn Avatar asked Jan 22 '23 15:01

poezn


2 Answers

It's easy to do with a sort:

sorted(mydict.iteritems(), key=lambda tup: sum(tup[1]), reverse=True)[:3]

This is reasonable if the ratio is similar to this one (3 / 5). If it's larger, you'll want to avoid the sort (O(n log n)), since top 3 can be done in O(n). For instance, using heapq, the heap module:

heapq.nlargest(3, mydict.iteritems(), key=lambda tup: sum(tup[1]))

This is O(n + 3 log n), since assembly the initial heap is O(n) and re-heapifying is O(log n).

EDIT: If you're using Python 2.7 or later, you can easily convert to a OrderedDict (equivalent version for Python 2.4+):

OrderedDict(heapq.nlargest(3, mydict.iteritems(), key=lambda tup: sum(tup[1])))

OrderedDict has the same API as dict, but remembers insertion order.

like image 125
Matthew Flaschen Avatar answered Jan 29 '23 09:01

Matthew Flaschen


For such a small slice it's not worth using islice

sorted(mydict.iteritems(), key=lambda (k,v): sum(v), reverse=True)[:3]
like image 26
John La Rooy Avatar answered Jan 29 '23 09:01

John La Rooy