Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group by and aggregate the values of a list of dictionaries in Python

I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys.

Example:

my_dataset = [  
    {
        'date': datetime.date(2013, 1, 1),
        'id': 99,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 1),
        'id': 98,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 2),
        'id' 99,
        'value1': 10,
        'value2': 10
    }
]

group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])

"""
Should return:
[
    {
        'date': datetime.date(2013, 1, 1),
        'value1': 20,
        'value2': 20
    },
    {
        'date': datetime.date(2013, 1, 2),
        'value1': 10,
        'value2': 10
    }
]
"""

I've tried doing this using itertools for the groupby and summing each like-key value pair, but am missing something here. Here's what my function currently looks like:

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    dataset.sort(key=keyfunc)
    new_dataset = []
    for key, index in itertools.groupby(dataset, keyfunc):
        d = {group_by_key: key}
        d.update({k:sum([item[k] for item in index]) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset
like image 999
Kyle Getrost Avatar asked Aug 05 '13 19:08

Kyle Getrost


People also ask

How do I sort a list of dictionaries by a value of the dictionary?

To sort a list of dictionaries according to the value of the specific key, specify the key parameter of the sort() method or the sorted() function. By specifying a function to be applied to each element of the list, it is sorted according to the result of that function.

How do I convert a list of dictionaries in Python?

Since python dictionary is unordered, the output can be in any order. To convert a list to dictionary, we can use list comprehension and make a key:value pair of consecutive elements. Finally, typecase the list to dict type.

How do I concatenate a list of dictionaries in Python?

To merge multiple dictionaries, the most Pythonic way is to use dictionary comprehension {k:v for x in l for k,v in x. items()} to first iterate over all dictionaries in the list l and then iterate over all (key, value) pairs in each dictionary.

Is it possible to declare a list of dictionary and dictionary of list in Python?

In Python, you can have a List of Dictionaries. You already know that elements of the Python List could be objects of any type. In this tutorial, we will learn how to create a list of dictionaries, how to access them, how to append a dictionary to list and how to modify them.


2 Answers

You can use collections.Counter and collections.defaultdict.

Using a dict this can be done in O(N), while sorting requires O(NlogN) time.

from collections import defaultdict, Counter
def solve(dataset, group_by_key, sum_value_keys):
    dic = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        vals = {k:item[k] for k in sum_value_keys}
        dic[key].update(vals)
    return dic
... 
>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
 datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
 datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})

The advantage of Counter is that it'll automatically sum the values of similar keys.:

Example:

>>> c = Counter(**{'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
like image 86
Ashwini Chaudhary Avatar answered Oct 09 '22 07:10

Ashwini Chaudhary


Thanks, I forgot about Counter. I still wanted to maintain the output format and sorting of my returned dataset, so here's what my final function looks like:

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):

    container = defaultdict(Counter)

    for item in dataset:
        key = item[group_by_key]
        values = {k:item[k] for k in sum_value_keys}
        container[key].update(values)

    new_dataset = [
        dict([(group_by_key, item[0])] + item[1].items())
            for item in container.items()
    ]
    new_dataset.sort(key=lambda item: item[group_by_key])

    return new_dataset
like image 5
Kyle Getrost Avatar answered Oct 09 '22 07:10

Kyle Getrost