What is the most Pythonic way to group a list of dictionaries by multiple keys and then sum or average a value within each group? Say I have a list of dictionaries like this:
input_data = [
    {'dept': '001', 'sku': 'foo', 'transId': 'uniqueId1', 'qty': 100},
    {'dept': '001', 'sku': 'bar', 'transId': 'uniqueId2', 'qty': 200},
    {'dept': '001', 'sku': 'foo', 'transId': 'uniqueId3', 'qty': 300},
    {'dept': '002', 'sku': 'baz', 'transId': 'uniqueId4', 'qty': 400},
    {'dept': '002', 'sku': 'baz', 'transId': 'uniqueId5', 'qty': 500},
    {'dept': '002', 'sku': 'qux', 'transId': 'uniqueId6', 'qty': 600},
    {'dept': '003', 'sku': 'foo', 'transId': 'uniqueId7', 'qty': 700},
]
Desired output for aggregation:
output = [
    {'dept': '001', 'sku': 'foo', 'qty': 400},
    {'dept': '001', 'sku': 'bar', 'qty': 200},
    {'dept': '002', 'sku': 'baz', 'qty': 900},
    {'dept': '002', 'sku': 'qux', 'qty': 600},
    {'dept': '003', 'sku': 'foo', 'qty': 700},
]
or average:
output = [
    {'dept': '001', 'sku': 'foo', 'avg': 200},
    {'dept': '001', 'sku': 'bar', 'avg': 200},
    {'dept': '002', 'sku': 'baz', 'avg': 450},
    {'dept': '002', 'sku': 'qux', 'avg': 600},
    {'dept': '003', 'sku': 'foo', 'avg': 700},
]
I have found "Group by and aggregate the values of a list of dictionaries in Python", but it doesn't seem to give me what I want.
In Python, the average of a list can be computed with the built-in sum() and len() functions: divide the sum of the list by its length.
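As a minimal sketch of that idea (the list name quantities and its values are made up for illustration, and the empty-list guard is an added assumption):

```python
# Hypothetical sample data, not taken from the question.
quantities = [100, 300, 200]

# len() of an empty list is 0, so guard against dividing by zero.
average = sum(quantities) / len(quantities) if quantities else None

print(average)  # 200.0 in Python 3, where / is true division
```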
To average the values of a dictionary while ignoring zeros, iterate over the dictionary and filter out the zero values first. Then take the sum of the remaining values and divide it by how many of them there are.
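A short sketch of those three steps, assuming a flat dictionary of numbers (the name stock and its contents are hypothetical):

```python
# Hypothetical data; the zero entries should not count toward the average.
stock = {"foo": 400, "bar": 0, "baz": 200, "qux": 0}

# Step 1: keep only the non-zero values.
nonzero = [value for value in stock.values() if value != 0]

# Steps 2 and 3: sum them and divide by their count (None if nothing is left).
average = sum(nonzero) / len(nonzero) if nonzero else None

print(average)  # 300.0
```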
To sort a list of dictionaries by the value of a specific key, pass a function as the key parameter of the list's sort() method or of the sorted() function. That function is applied to each element, and the list is ordered by the results.
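For illustration, here is a small sketch of both forms. The rows list is invented sample data, and operator.itemgetter is used as one convenient way to build the key function:

```python
from operator import itemgetter

# Hypothetical rows, shaped like the ones in the question.
rows = [
    {"dept": "002", "sku": "qux", "qty": 600},
    {"dept": "001", "sku": "foo", "qty": 100},
    {"dept": "001", "sku": "bar", "qty": 200},
]

# sorted() returns a new list; the key function is called once per dict.
by_qty = sorted(rows, key=lambda row: row["qty"])

# itemgetter with two names returns a tuple, so this sorts by dept, then sku.
by_dept_sku = sorted(rows, key=itemgetter("dept", "sku"))

# list.sort() sorts in place and returns None.
rows.sort(key=itemgetter("qty"), reverse=True)

print([row["sku"] for row in by_dept_sku])  # ['bar', 'foo', 'qux']
```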
To get the aggregated results:
from itertools import groupby
from operator import itemgetter
from pprint import pprint

# Key function that returns the (dept, sku) tuple of a row.
grouper = itemgetter("dept", "sku")
result = []
# groupby only merges adjacent rows, so sort by the same key first.
for key, grp in groupby(sorted(input_data, key=grouper), grouper):
    temp_dict = dict(zip(["dept", "sku"], key))
    temp_dict["qty"] = sum(item["qty"] for item in grp)
    result.append(temp_dict)

pprint(result)
Output
[{'dept': '001', 'qty': 200, 'sku': 'bar'},
 {'dept': '001', 'qty': 400, 'sku': 'foo'},
 {'dept': '002', 'qty': 900, 'sku': 'baz'},
 {'dept': '002', 'qty': 600, 'sku': 'qux'},
 {'dept': '003', 'qty': 700, 'sku': 'foo'}]
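The sorted() call in the code above matters because itertools.groupby starts a new group every time the key changes, so rows with equal keys must be adjacent. A small sketch with invented rows shows the difference:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows in which the two "foo" entries are not adjacent.
rows = [
    {"sku": "foo", "qty": 100},
    {"sku": "bar", "qty": 200},
    {"sku": "foo", "qty": 300},
]
get_sku = itemgetter("sku")

# Without sorting, "foo" shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, get_sku)]

# After sorting by the same key, each sku forms exactly one group.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=get_sku), get_sku)]

print(unsorted_keys)  # ['foo', 'bar', 'foo']
print(sorted_keys)    # ['bar', 'foo']
```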
To get the averages, change the body of the for loop like this:
temp_dict = dict(zip(["dept", "sku"], key))
# Materialize the group, because it is consumed once by sum() and its length is needed too.
temp_list = [item["qty"] for item in grp]
temp_dict["avg"] = sum(temp_list) / len(temp_list)
result.append(temp_dict)
Output (under Python 3, where / is true division, the averages print as floats, for example 200.0 instead of 200)
[{'avg': 200, 'dept': '001', 'sku': 'bar'},
 {'avg': 200, 'dept': '001', 'sku': 'foo'},
 {'avg': 450, 'dept': '002', 'sku': 'baz'},
 {'avg': 600, 'dept': '002', 'sku': 'qux'},
 {'avg': 700, 'dept': '003', 'sku': 'foo'}]
Suggestion: I would put both the qty and the avg in the same dict, like this:
temp_dict = dict(zip(["dept", "sku"], key))
temp_list = [item["qty"] for item in grp]
temp_dict["qty"] = sum(temp_list)
temp_dict["avg"] = temp_dict["qty"] / len(temp_list)
result.append(temp_dict)
Output
[{'avg': 200, 'dept': '001', 'qty': 200, 'sku': 'bar'},
 {'avg': 200, 'dept': '001', 'qty': 400, 'sku': 'foo'},
 {'avg': 450, 'dept': '002', 'qty': 900, 'sku': 'baz'},
 {'avg': 600, 'dept': '002', 'qty': 600, 'sku': 'qux'},
 {'avg': 700, 'dept': '003', 'qty': 700, 'sku': 'foo'}]
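Because the groupby approach sorts first, bar comes before foo within dept 001, whereas the desired output in the question lists groups in first-seen order. If that order matters, one alternative (a sketch, not part of the answer above) is to collect values in a dictionary keyed by (dept, sku). The function name summarize and its parameters are hypothetical, and the ordering relies on dicts preserving insertion order, which Python guarantees from 3.7 on.

```python
from collections import defaultdict


def summarize(rows, keys=("dept", "sku"), value="qty"):
    """Group rows by `keys`; return the total and average of `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        # The tuple of key values identifies the group; no sorting is needed.
        groups[tuple(row[k] for k in keys)].append(row[value])

    result = []
    for key, values in groups.items():  # iterates in first-seen order
        entry = dict(zip(keys, key))
        entry["qty"] = sum(values)
        entry["avg"] = entry["qty"] / len(values)  # a float in Python 3
        result.append(entry)
    return result


# Sample data copied from the question.
input_data = [
    {'dept': '001', 'sku': 'foo', 'transId': 'uniqueId1', 'qty': 100},
    {'dept': '001', 'sku': 'bar', 'transId': 'uniqueId2', 'qty': 200},
    {'dept': '001', 'sku': 'foo', 'transId': 'uniqueId3', 'qty': 300},
    {'dept': '002', 'sku': 'baz', 'transId': 'uniqueId4', 'qty': 400},
    {'dept': '002', 'sku': 'baz', 'transId': 'uniqueId5', 'qty': 500},
    {'dept': '002', 'sku': 'qux', 'transId': 'uniqueId6', 'qty': 600},
    {'dept': '003', 'sku': 'foo', 'transId': 'uniqueId7', 'qty': 700},
]

summary = summarize(input_data)
for entry in summary:
    print(entry)
```

This makes a single pass over the data and avoids the sort, at the cost of holding every group's values in memory until the end.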