
Add elements in a list of dictionaries

I have a very long list of dictionaries with string keys and integer values. Many of the keys are shared across the dictionaries, though not all. I want to generate one dictionary whose keys are the union of the keys in the separate dictionaries and whose values are the sums of the values for each key across all the dictionaries. (For example, the value for the key 'apple' in the combined dictionary will be the value of 'apple' in the first dictionary, plus the value of 'apple' in the second, and so on.)

I have the following, but it's rather cumbersome and takes ages to execute. Is there a simpler way to achieve the same result?

# Body of a function that receives list_dictionaries
comb_dict = {}
for dictionary in list_dictionaries:
    for key in dictionary:
        comb_dict.setdefault(key, 0)
        comb_dict[key] += dictionary[key]
return comb_dict
chimeracoder asked Jul 29 '10 19:07


2 Answers

Here are some microbenchmarks which suggest that f2 (see below) might be an improvement. f2 uses iteritems (items in Python 3), which lets you avoid an extra dict lookup in the inner loop:

import collections
import string
import random

def random_dict():
    n=random.randint(1,26)
    keys=list(string.letters)
    random.shuffle(keys)
    keys=keys[:n]
    values=[random.randint(1,100) for _ in range(n)]    
    return dict(zip(keys,values))

list_dictionaries=[random_dict() for x in xrange(100)]

def f1(list_dictionaries):
    comb_dict = {}  
    for dictionary in list_dictionaries:  
        for key in dictionary:  
            comb_dict.setdefault(key, 0)  
            comb_dict[key] += dictionary[key]  
    return comb_dict

def f2(list_dictionaries):    
    comb_dict = collections.defaultdict(int)
    for dictionary in list_dictionaries:  
        for key,value in dictionary.iteritems():  
            comb_dict[key] += value
    return comb_dict

def union( dict_list ):
    all_keys = set()
    for d in dict_list:
        for k in d:
            all_keys.add( k )
    for key in all_keys:
        yield key, sum( d.get(key,0) for d in dict_list)

def f3(list_dictionaries):
    return dict(union( list_dictionaries ))

Here are the results:

% python -mtimeit -s"import test" "test.f1(test.list_dictionaries)"
1000 loops, best of 3: 776 usec per loop
% python -mtimeit -s"import test" "test.f2(test.list_dictionaries)"
1000 loops, best of 3: 432 usec per loop    
% python -mtimeit -s"import test" "test.f3(test.list_dictionaries)"
100 loops, best of 3: 2.19 msec per loop
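
The benchmark code above is written for Python 2: string.letters, xrange and iteritems do not exist in Python 3. Below is a rough Python 3 port of f1 and f2, offered as a sketch rather than a reproduction of the measurements above. Driving the timing through the timeit module with 1000 runs is my own assumption, and no timings are quoted because they depend on the machine and interpreter.

```python
import collections
import random
import string
import timeit


def random_dict():
    # string.ascii_letters replaces Python 2's string.letters
    n = random.randint(1, 26)
    keys = list(string.ascii_letters)
    random.shuffle(keys)
    keys = keys[:n]
    values = [random.randint(1, 100) for _ in range(n)]
    return dict(zip(keys, values))


# range replaces Python 2's xrange
list_dictionaries = [random_dict() for _ in range(100)]


def f1(list_dictionaries):
    # Original approach: setdefault followed by a second lookup
    comb_dict = {}
    for dictionary in list_dictionaries:
        for key in dictionary:
            comb_dict.setdefault(key, 0)
            comb_dict[key] += dictionary[key]
    return comb_dict


def f2(list_dictionaries):
    # defaultdict(int) supplies 0 for missing keys
    comb_dict = collections.defaultdict(int)
    for dictionary in list_dictionaries:
        # items() replaces Python 2's iteritems()
        for key, value in dictionary.items():
            comb_dict[key] += value
    return comb_dict


if __name__ == "__main__":
    # Assumed harness: total seconds for 1000 calls of each function
    for func in (f1, f2):
        seconds = timeit.timeit(lambda: func(list_dictionaries), number=1000)
        print(func.__name__, seconds)
```

Both functions should return equal mappings for the same input, so comparing f1(list_dictionaries) == f2(list_dictionaries) is a quick sanity check before trusting any timing.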
unutbu answered Sep 22 '22 06:09


Use collections.defaultdict instead of a plain dict with setdefault.

http://docs.python.org/library/collections.html#defaultdict-objects

It is slightly simpler.
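
To illustrate why this is simpler, here is a minimal sketch on a small, made-up input (the sample dictionaries are hypothetical, not from the question). With defaultdict(int), a missing key starts at int(), which is 0, so the setdefault call is no longer needed.

```python
from collections import defaultdict

# Hypothetical sample input, chosen only for illustration
list_dictionaries = [
    {"apple": 1, "pear": 2},
    {"apple": 3, "plum": 5},
]

comb_dict = defaultdict(int)  # a missing key starts at int() == 0
for dictionary in list_dictionaries:
    for key, value in dictionary.items():  # iteritems() on Python 2
        comb_dict[key] += value

print(dict(comb_dict))  # {'apple': 4, 'pear': 2, 'plum': 5}
```

Converting back with dict(comb_dict) is optional; it only matters if later code should get a KeyError for missing keys instead of silently creating them.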

S.Lott answered Sep 22 '22 06:09