I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys.
Example:
my_dataset = [
{
'date': datetime.date(2013, 1, 1),
'id': 99,
'value1': 10,
'value2': 10
},
{
'date': datetime.date(2013, 1, 1),
'id': 98,
'value1': 10,
'value2': 10
},
{
'date': datetime.date(2013, 1, 2),
'id' 99,
'value1': 10,
'value2': 10
}
]
group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])
"""
Should return:
[
{
'date': datetime.date(2013, 1, 1),
'value1': 20,
'value2': 20
},
{
'date': datetime.date(2013, 1, 2),
'value1': 10,
'value2': 10
}
]
"""
I've tried doing this using itertools for the groupby and summing each like-key value pair, but am missing something here. Here's what my function currently looks like:
def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
keyfunc = operator.itemgetter(group_by_key)
dataset.sort(key=keyfunc)
new_dataset = []
for key, index in itertools.groupby(dataset, keyfunc):
d = {group_by_key: key}
d.update({k:sum([item[k] for item in index]) for k in sum_value_keys})
new_dataset.append(d)
return new_dataset
To sort a list of dictionaries according to the value of the specific key, specify the key parameter of the sort() method or the sorted() function. By specifying a function to be applied to each element of the list, it is sorted according to the result of that function.
Since python dictionary is unordered, the output can be in any order. To convert a list to dictionary, we can use list comprehension and make a key:value pair of consecutive elements. Finally, typecase the list to dict type.
To merge multiple dictionaries, the most Pythonic way is to use dictionary comprehension {k:v for x in l for k,v in x. items()} to first iterate over all dictionaries in the list l and then iterate over all (key, value) pairs in each dictionary.
In Python, you can have a List of Dictionaries. You already know that elements of the Python List could be objects of any type. In this tutorial, we will learn how to create a list of dictionaries, how to access them, how to append a dictionary to list and how to modify them.
You can use collections.Counter
and collections.defaultdict
.
Using a dict this can be done in O(N)
, while sorting requires O(NlogN)
time.
from collections import defaultdict, Counter
def solve(dataset, group_by_key, sum_value_keys):
dic = defaultdict(Counter)
for item in dataset:
key = item[group_by_key]
vals = {k:item[k] for k in sum_value_keys}
dic[key].update(vals)
return dic
...
>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})
The advantage of Counter
is that it'll automatically sum the values of similar keys.:
Example:
>>> c = Counter(**{'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
Thanks, I forgot about Counter. I still wanted to maintain the output format and sorting of my returned dataset, so here's what my final function looks like:
def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
container = defaultdict(Counter)
for item in dataset:
key = item[group_by_key]
values = {k:item[k] for k in sum_value_keys}
container[key].update(values)
new_dataset = [
dict([(group_by_key, item[0])] + item[1].items())
for item in container.items()
]
new_dataset.sort(key=lambda item: item[group_by_key])
return new_dataset
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With