Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group by multiple keys and summarize/average values of a list of dictionaries

What is the most pythonic way to group by multiple keys and summarize/average values of a list of dictionaries in Python please? Say I have a list of dictionaries as below:

input = [ {'dept': '001', 'sku': 'foo', 'transId': 'uniqueId1', 'qty': 100}, {'dept': '001', 'sku': 'bar', 'transId': 'uniqueId2', 'qty': 200}, {'dept': '001', 'sku': 'foo', 'transId': 'uniqueId3', 'qty': 300}, {'dept': '002', 'sku': 'baz', 'transId': 'uniqueId4', 'qty': 400}, {'dept': '002', 'sku': 'baz', 'transId': 'uniqueId5', 'qty': 500}, {'dept': '002', 'sku': 'qux', 'transId': 'uniqueId6', 'qty': 600}, {'dept': '003', 'sku': 'foo', 'transId': 'uniqueId7', 'qty': 700} ] 

Desired output for aggregation:

output=[ {'dept': '001', 'sku': 'foo', 'qty': 400}, {'dept': '001', 'sku': 'bar', 'qty': 200}, {'dept': '002', 'sku': 'baz', 'qty': 900}, {'dept': '002', 'sku': 'qux', 'qty': 600}, {'dept': '003', 'sku': 'foo', 'qty': 700} ] 

or average:

output=[ {'dept': '001', 'sku': 'foo', 'avg': 200}, {'dept': '001', 'sku': 'bar', 'avg': 200}, {'dept': '002', 'sku': 'baz', 'avg': 450}, {'dept': '002', 'sku': 'qux', 'avg': 600}, {'dept': '003', 'sku': 'foo', 'avg': 700} ] 

I have found this: Group by and aggregate the values of a list of dictionaries in Python but it doesn't seem to give me what I want.

like image 439
Can Lu Avatar asked Feb 10 '14 10:02

Can Lu


People also ask

How do you get the average of a list in a dictionary Python?

In Python, we can find the average of a list by simply using the sum() and len() function.

How do you find the average value of a dictionary?

You can do this by iterating over the dictionary and filtering out zero values first. Then take the sum of the filtered values. Finally, divide by the number of these filtered values.

How do I sort a list of dictionaries by a value of the dictionary?

To sort a list of dictionaries according to the value of the specific key, specify the key parameter of the sort() method or the sorted() function. By specifying a function to be applied to each element of the list, it is sorted according to the result of that function.


1 Answers

To get the aggregated results

from itertools import groupby from operator import itemgetter  grouper = itemgetter("dept", "sku") result = [] for key, grp in groupby(sorted(input_data, key = grouper), grouper):     temp_dict = dict(zip(["dept", "sku"], key))     temp_dict["qty"] = sum(item["qty"] for item in grp)     result.append(temp_dict)  from pprint import pprint pprint(result) 

Output

[{'dept': '001', 'qty': 200, 'sku': 'bar'},  {'dept': '001', 'qty': 400, 'sku': 'foo'},  {'dept': '002', 'qty': 900, 'sku': 'baz'},  {'dept': '002', 'qty': 600, 'sku': 'qux'},  {'dept': '003', 'qty': 700, 'sku': 'foo'}] 

And to get the averages, you can simply change the contents inside the for loop, like this

temp_dict = dict(zip(["dept", "sku"], key)) temp_list = [item["qty"] for item in grp] temp_dict["avg"] = sum(temp_list) / len(temp_list) result.append(temp_dict) 

Output

[{'avg': 200, 'dept': '001', 'sku': 'bar'},  {'avg': 200, 'dept': '001', 'sku': 'foo'},  {'avg': 450, 'dept': '002', 'sku': 'baz'},  {'avg': 600, 'dept': '002', 'sku': 'qux'},  {'avg': 700, 'dept': '003', 'sku': 'foo'}] 

Suggestion: Anyway, I would have added both the qty and avg in the same dict like this

temp_dict = dict(zip(["dept", "sku"], key)) temp_list = [item["qty"] for item in grp] temp_dict["qty"] = sum(temp_list) temp_dict["avg"] = temp_dict["qty"] / len(temp_list) result.append(temp_dict) 

Output

[{'avg': 200, 'dept': '001', 'qty': 200, 'sku': 'bar'},  {'avg': 200, 'dept': '001', 'qty': 400, 'sku': 'foo'},  {'avg': 450, 'dept': '002', 'qty': 900, 'sku': 'baz'},  {'avg': 600, 'dept': '002', 'qty': 600, 'sku': 'qux'},  {'avg': 700, 'dept': '003', 'qty': 700, 'sku': 'foo'}] 
like image 151
thefourtheye Avatar answered Sep 29 '22 14:09

thefourtheye