Group by and aggregate the values of a list of dictionaries in Python

Tags:

python

dictionary

itertools

I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys.

Example:

my_dataset = [  
    {
        'date': datetime.date(2013, 1, 1),
        'id': 99,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 1),
        'id': 98,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 2),
        'id': 99,
        'value1': 10,
        'value2': 10
    }
]

group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])

"""
Should return:
[
    {
        'date': datetime.date(2013, 1, 1),
        'value1': 20,
        'value2': 20
    },
    {
        'date': datetime.date(2013, 1, 2),
        'value1': 10,
        'value2': 10
    }
]
"""

I've tried doing this using itertools for the groupby and summing each like-key value pair, but am missing something here. Here's what my function currently looks like:

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    dataset.sort(key=keyfunc)
    new_dataset = []
    for key, index in itertools.groupby(dataset, keyfunc):
        d = {group_by_key: key}
        d.update({k:sum([item[k] for item in index]) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset


asked Aug 05 '13 19:08

Kyle Getrost

2 Answers

You can use collections.Counter and collections.defaultdict.

Using a dict, the grouping can be done in O(N) time, whereas sorting first requires O(N log N) time.

from collections import defaultdict, Counter

def solve(dataset, group_by_key, sum_value_keys):
    dic = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        vals = {k: item[k] for k in sum_value_keys}
        dic[key].update(vals)
    return dic

>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
 datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
 datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})

The advantage of Counter is that it automatically sums the values of matching keys.

Example:

>>> c = Counter(**{'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
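As for what the groupby attempt in the question is missing: each group that itertools.groupby yields is a one-shot iterator, so the first sum() consumes it and every later key sums an empty sequence, giving 0. Below is a minimal sketch of that approach with the group materialized into a list first. The function name group_and_sum_with_groupby and the sample variable are made up for illustration, and it uses sorted() rather than sorting the caller's list in place.

```python
import datetime
import itertools
import operator


def group_and_sum_with_groupby(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    result = []
    # groupby only merges adjacent items, so the input must be sorted by the key.
    for key, group in itertools.groupby(sorted(dataset, key=keyfunc), keyfunc):
        rows = list(group)  # materialize: the group iterator can be read only once
        summed = {group_by_key: key}
        summed.update({k: sum(row[k] for row in rows) for k in sum_value_keys})
        result.append(summed)
    return result


# Sample data copied from the question.
sample = [
    {'date': datetime.date(2013, 1, 1), 'id': 99, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 1), 'id': 98, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 2), 'id': 99, 'value1': 10, 'value2': 10},
]
grouped = group_and_sum_with_groupby(sample, 'date', ['value1', 'value2'])
print(grouped)
```

This still pays the O(N log N) sorting cost, so the dict-based version above remains the cheaper option for large inputs.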

answered Oct 09 '22 07:10

Ashwini Chaudhary

Thanks, I forgot about Counter. I still wanted to maintain the output format and sorting of my returned dataset, so here's what my final function looks like:

from collections import Counter, defaultdict

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):

    container = defaultdict(Counter)

    for item in dataset:
        key = item[group_by_key]
        values = {k:item[k] for k in sum_value_keys}
        container[key].update(values)

    # list() is needed on Python 3, where items() returns a view rather than a list
    new_dataset = [
        dict([(group_by_key, item[0])] + list(item[1].items()))
            for item in container.items()
    ]
    new_dataset.sort(key=lambda item: item[group_by_key])

    return new_dataset
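As a quick check, the same idea can be run against the sample data from the question. The sketch below is a Python 3 restatement rather than the exact function above: the name group_and_sum_py3 is made up for illustration, and it builds each row with dict unpacking instead of list concatenation.

```python
import datetime
from collections import Counter, defaultdict


def group_and_sum_py3(dataset, group_by_key, sum_value_keys):
    totals = defaultdict(Counter)
    for item in dataset:
        # Counter.update adds the incoming values to any existing totals.
        totals[item[group_by_key]].update({k: item[k] for k in sum_value_keys})
    # Sort by the group key only, then flatten each (key, Counter) pair into one dict.
    return [
        {group_by_key: key, **sums}
        for key, sums in sorted(totals.items(), key=lambda pair: pair[0])
    ]


# Sample data copied from the question.
my_dataset = [
    {'date': datetime.date(2013, 1, 1), 'id': 99, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 1), 'id': 98, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 2), 'id': 99, 'value1': 10, 'value2': 10},
]
result = group_and_sum_py3(my_dataset, 'date', ['value1', 'value2'])
print(result)
```

The result is a list of plain dicts sorted by date, matching the format requested in the question.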

answered Oct 09 '22 07:10

Kyle Getrost
