I have a collections.defaultdict(int) that I'm building to keep count of how many times a key shows up in a set of data. I later want to be able to sort it (obviously by turning it into a list first) in a descending fashion, ordered with the highest values first. I created my dictionary like the following:
from collections import defaultdict
adict = defaultdict(int)
Later on I do a bunch of:
adict['someval'] += 1
adict['anotherval'] += 1
adict['someval'] += 1
Ideally after that I'd like to get a print out of:
someval => 2
anotherval => 1
Method #1: Using sum() + sorted() + items() + lambda. Here the dictionary is first sorted by key with sorted(); items() supplies the key/value pairs, and a lambda tells sorted() which part of each pair to order by. sum() then concatenates the sorted pairs into a single tuple.
Using keys(): the same task can also be performed by iterating over the dictionary's keys, looking up the value for each key, and combining each key with its value to form a list of lists.
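The two approaches described above might look roughly like the sketch below. The dictionary `counts` and its contents are made-up sample data, and the exact output shape (a flat tuple versus a list of lists) is an assumption based on the descriptions, not something the original text spells out.

```python
# Hypothetical sample data, not taken from the original text.
counts = {"someval": 2, "anotherval": 1}

# sum() approach: sort the (key, value) pairs by key, then concatenate
# the pairs into one flat tuple by adding them onto an empty tuple.
flat = sum(sorted(counts.items(), key=lambda kv: kv[0]), ())

# keys() approach: walk the keys and pair each key with its value,
# producing a list of two-element lists.
pairs = [[k, counts[k]] for k in counts.keys()]

print(flat)   # ('anotherval', 1, 'someval', 2)
print(pairs)  # [['someval', 2], ['anotherval', 1]]
```

Note that both of these order or flatten by key; neither sorts by value, which is what the question asks for.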
To sort a dictionary by value with sorted(), do the following: call items() on the dictionary to retrieve its key/value pairs, pass the result to sorted() as the first argument, and supply a lambda as the key argument that returns the value from each pair. Add reverse=True to put the highest values first.
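As a minimal sketch of the items()-plus-lambda recipe, using the `adict` from the question (the name `by_value` is only illustrative):

```python
from collections import defaultdict

adict = defaultdict(int)
for key in ("someval", "anotherval", "someval"):
    adict[key] += 1

# items() yields (key, value) pairs; the lambda picks the value (index 1)
# as the sort key, and reverse=True puts the highest counts first.
by_value = sorted(adict.items(), key=lambda item: item[1], reverse=True)

print(by_value)  # [('someval', 2), ('anotherval', 1)]
```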
A dict's keys, reverse-sorted by the corresponding values, can best be gotten as
sorted(adict, key=adict.get, reverse=True)
Since you want key/value pairs, you could work on the items as all other answers suggest, or (to use the nifty adict.get bound method instead of itemgetters or weird lambdas ;-),
[(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]
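Put together with the data from the question, the adict.get idiom could be used like this. It is a sketch for Python 3; the variable names `ordered_keys` and `pairs` are illustrative and not part of the original answer.

```python
from collections import defaultdict

adict = defaultdict(int)
adict["someval"] += 1
adict["anotherval"] += 1
adict["someval"] += 1

# Sorting the dict iterates over its keys; adict.get maps each key to
# its count, so the keys come back ordered by count, highest first.
ordered_keys = sorted(adict, key=adict.get, reverse=True)

# Rebuild (key, value) pairs in that order.
pairs = [(k, adict[k]) for k in ordered_keys]

for k, v in pairs:
    print(f"{k} => {v}")
# someval => 2
# anotherval => 1
```

Unlike indexing with adict[k], calling adict.get does not trigger the defaultdict's default factory, so using it as the sort key never adds entries to the dictionary.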
Edit: in terms of performance, there isn't much in it either way:
$ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6))' '[(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]'
100000 loops, best of 3: 10.8 usec per loop
$ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6)); from operator import itemgetter' 'sorted(adict.iteritems(), key=itemgetter(1), reverse=True)'
100000 loops, best of 3: 9.66 usec per loop
$ python -mtimeit -s'adict=dict((x,x**2) for x in range(-5,6))' 'sorted(adict.iteritems(), key=lambda (k,v): v, reverse=True)'
100000 loops, best of 3: 11.5 usec per loop
So, the .get-based solution is smack midway in performance between the two items-based ones: slightly slower than the itemgetter, slightly faster than the lambda. In "bottleneck" cases, where those microsecond fractions are crucial to you, by all means do focus on that. In normal cases, where this operation is only one step within some bigger task and a microsecond more or less matters little, focusing on the simplicity of the get idiom is, however, also a reasonable alternative.
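The commands above are written for Python 2 (iteritems and the tuple-unpacking lambda no longer exist in Python 3). A rough Python 3 equivalent of the same comparison, using the timeit module directly, might look like the sketch below. The `candidates` dictionary and the loop count are assumptions made for illustration, and wrapping each statement in a lambda adds a small call overhead, so the numbers will not be directly comparable to the ones quoted above.

```python
import timeit
from operator import itemgetter

# Same test data as in the original commands.
adict = {x: x ** 2 for x in range(-5, 6)}

# The three approaches being compared, wrapped as zero-argument callables.
candidates = {
    "get": lambda: [(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)],
    "itemgetter": lambda: sorted(adict.items(), key=itemgetter(1), reverse=True),
    "lambda": lambda: sorted(adict.items(), key=lambda kv: kv[1], reverse=True),
}

# Sanity check: run each approach once and keep its result.
results = {name: func() for name, func in candidates.items()}

loops = 10_000  # arbitrary loop count chosen for this sketch
for name, func in candidates.items():
    best = min(timeit.repeat(func, number=loops, repeat=3))
    print(f"{name}: {best / loops * 1e6:.2f} usec per loop")
```

All three produce the same list here, because sorted() is stable and keys with equal values keep their original relative order even with reverse=True.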