I have data in a list-of-dicts format, like below:
data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]
I'm trying to group by 'name' and sum 'tea' and 'coffee' separately.
The final grouped data must be in this format:
grouped_data = [
    {'name': 'A', 'tea':7, 'coffee':9},
    {'name': 'B', 'tea':16, 'coffee':5},
]
I tried some steps:
from collections import Counter
c = Counter()
for v in data:
    c[v['name']] += v['tea']
my_data = [{'name': name, 'tea':tea} for name, tea in c.items()]
for e in my_data:
    print(e)
The above steps returned the following output:
{'name': 'A', 'tea': 7}
{'name': 'B', 'tea': 16}
I can only sum the key 'tea'; I'm not able to get the sum for the key 'coffee'. Can you please help me get the result in the grouped_data format?
Using pandas:
import pandas as pd
df = pd.DataFrame(data)
df
   coffee name  tea
0       6    A    5
1       3    A    2
2       1    B    7
3       4    B    9
g = df.groupby('name', as_index=False).sum()
g
  name  coffee  tea
0    A       9    7
1    B       5   16
And, the final step, DataFrame.to_dict:
d = g.to_dict('records')  # the 'r' shorthand is not accepted by recent pandas versions
d
[{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
You can try this:
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
import itertools
final_data = [(a, list(b)) for a, b in itertools.groupby([i.items() for i in data], key=lambda x:dict(x)["name"])] 
new_final_data = [{i[0][0]:sum(c[-1] for c in i if isinstance(c[-1], int)) if i[0][0] != "name" else i[0][-1] for i in zip(*b)} for a, b in final_data]
Output:
[{'tea': 7, 'coffee': 9, 'name': 'A'}, {'tea': 16, 'coffee': 5, 'name': 'B'}]
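One caveat: itertools.groupby only merges consecutive items that share a key, so the snippet above relies on data already being ordered by name. Here is a more readable sketch of the same idea that sorts first. The names by_name and value_keys are my own, and it assumes every row carries the same numeric keys:

```python
from itertools import groupby
from operator import itemgetter

data = [
    {'name': 'A', 'tea': 5, 'coffee': 6},
    {'name': 'B', 'tea': 7, 'coffee': 1},
    {'name': 'A', 'tea': 2, 'coffee': 3},
    {'name': 'B', 'tea': 9, 'coffee': 4},
]

by_name = itemgetter('name')
value_keys = ['tea', 'coffee']  # assumed: the numeric fields to sum

grouped_data = []
# groupby only merges adjacent rows, so sort by the grouping key first
for name, rows in groupby(sorted(data, key=by_name), key=by_name):
    rows = list(rows)  # the group iterator can only be consumed once
    totals = {key: sum(row[key] for row in rows) for key in value_keys}
    grouped_data.append({'name': name, **totals})

print(grouped_data)
```

The rows are deliberately interleaved here to show that the sort makes the input order irrelevant.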
Using pandas, this is pretty easy to do:
import pandas as pd
data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]
df = pd.DataFrame(data)
gb = df.groupby(['name']).sum()
gb
      coffee  tea
name             
A          9    7
B          5   16
Here's one way to get it into your dict format:
grouped_data = []
for idx in gb.index:
    d = {'name': idx}
    d = {**d, **{col: gb.loc[idx, col] for col in gb}}
    grouped_data.append(d)
grouped_data
[{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
But COLDSPEED's answer has the native pandas solution, which passes as_index=False to groupby.
import pandas as pd
df = pd.DataFrame(data)
# as_index=False keeps 'name' as a column, so it appears in each record
df2 = df.groupby('name', as_index=False).sum()
df2.to_dict('records')
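If pandas is not an option, the Counter attempt from the question can be extended to any number of keys by keeping one Counter per name. This is only a sketch: the variable names totals and sums are my own, and it assumes every field other than 'name' is numeric:

```python
from collections import Counter, defaultdict

data = [
    {'name': 'A', 'tea': 5, 'coffee': 6},
    {'name': 'A', 'tea': 2, 'coffee': 3},
    {'name': 'B', 'tea': 7, 'coffee': 1},
    {'name': 'B', 'tea': 9, 'coffee': 4},
]

totals = defaultdict(Counter)  # one Counter of running sums per name
for row in data:
    # Counter.update adds the values instead of replacing them;
    # the grouping key itself is left out of the sums
    totals[row['name']].update({k: v for k, v in row.items() if k != 'name'})

grouped_data = [{'name': name, **sums} for name, sums in totals.items()]
print(grouped_data)
```

Unlike itertools.groupby, this does not require the input to be sorted by name.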