I have a very long list of dictionaries with string keys and integer values. Many of the keys are shared across the dictionaries, though not all. I want to generate one dictionary whose keys are the union of the keys in the separate dictionaries and whose values are the sum of all the values for that key across the dictionaries. (For example, the value for the key 'apple' in the combined dictionary will be the value of 'apple' in the first dictionary, plus the value of 'apple' in the second, and so on.)
I have the following, but it's rather cumbersome and takes ages to execute. Is there a simpler way to achieve the same result?
comb_dict = {}
for dictionary in list_dictionaries:
    for key in dictionary:
        comb_dict.setdefault(key, 0)
        comb_dict[key] += dictionary[key]
return comb_dict
Here are some microbenchmarks which suggest that f2 (see below) might be an improvement. f2 uses iteritems, which lets you avoid an extra dict lookup in the inner loop:
# Saved as test.py so the timeit commands can import it.
# Python 2 code: string.letters, xrange and dict.iteritems are Python 2 only.
import collections
import string
import random

def random_dict():
    # Pick 1 to 26 distinct letters as keys and give each a random value
    n = random.randint(1, 26)
    keys = list(string.letters)
    random.shuffle(keys)
    keys = keys[:n]
    values = [random.randint(1, 100) for _ in range(n)]
    return dict(zip(keys, values))

list_dictionaries = [random_dict() for x in xrange(100)]

def f1(list_dictionaries):
    # Original approach: setdefault, then a second lookup to fetch the value
    comb_dict = {}
    for dictionary in list_dictionaries:
        for key in dictionary:
            comb_dict.setdefault(key, 0)
            comb_dict[key] += dictionary[key]
    return comb_dict

def f2(list_dictionaries):
    # defaultdict(int) starts missing keys at 0; iteritems yields key and value together
    comb_dict = collections.defaultdict(int)
    for dictionary in list_dictionaries:
        for key, value in dictionary.iteritems():
            comb_dict[key] += value
    return comb_dict

def union(dict_list):
    # Collect every key first, then scan all dictionaries once per key
    all_keys = set()
    for d in dict_list:
        for k in d:
            all_keys.add(k)
    for key in all_keys:
        yield key, sum(d.get(key, 0) for d in dict_list)

def f3(list_dictionaries):
    return dict(union(list_dictionaries))
Here are the results:
% python -mtimeit -s"import test" "test.f1(test.list_dictionaries)"
1000 loops, best of 3: 776 usec per loop
% python -mtimeit -s"import test" "test.f2(test.list_dictionaries)"
1000 loops, best of 3: 432 usec per loop
% python -mtimeit -s"import test" "test.f3(test.list_dictionaries)"
100 loops, best of 3: 2.19 msec per loop
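The benchmark code above targets Python 2. For readers on Python 3, the f2 approach might look roughly like the sketch below, assuming dict.items() stands in for iteritems. The names combine and example are made up for illustration and are not part of the original answer, and no timings were measured for this version.

```python
from collections import defaultdict

def combine(list_dictionaries):
    """Sum the values for each key across a list of dicts (hypothetical helper)."""
    comb_dict = defaultdict(int)
    for dictionary in list_dictionaries:
        # items() yields key and value together, so no second lookup is needed
        for key, value in dictionary.items():
            comb_dict[key] += value
    # Convert back to a plain dict so later reads of missing keys raise KeyError
    return dict(comb_dict)

# Illustrative input only
example = [{"apple": 1, "pear": 2}, {"apple": 3}, {"kiwi": 5, "pear": 1}]
result = combine(example)
print(result)
```

Returning a plain dict is a design choice of this sketch; f2 as written returns the defaultdict itself.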
Use collections.defaultdict instead; it is slightly simpler.
http://docs.python.org/library/collections.html#defaultdict-objects
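To show why the setdefault call becomes unnecessary, here is a small sketch of how defaultdict(int) behaves on its own. The variable names are illustrative assumptions, written for Python 3.

```python
from collections import defaultdict

totals = defaultdict(int)

# A missing key is created with int(), which is 0, before += is applied
totals["apple"] += 3
totals["apple"] += 4
totals["pear"] += 1

# Merely reading a missing key also inserts it with the default value
missing = totals["kiwi"]

plain = dict(totals)
print(plain)
```

Because a read inserts the key, it can be worth converting to a plain dict once the summing is finished.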