I have a question, simplified in this example. Consider this Pandas DataFrame, df_a:
df_a=pd.DataFrame([['1001',34.3,'red'],['1001',900.04,'red'],['1001',776,'red'],['1003',18.95,'green'],['1004',321.2,'blue']],columns=['id','amount','name'])
id amount name
0 1001 34.30 red
1 1001 900.04 red
2 1001 776.00 red
3 1003 18.95 green
4 1004 321.20 blue
I would like to groupby this dataframe by summing the amount into a new DataFrame and create a new 'arange'-like index. This should be the result I would like to have:
id amount
0 1001 1710.34
1 1003 18.95
2 1004 321.20
But my efforts create a Series (I would like a DataFrame as result):
df_a.groupby(['id'])['amount'].sum()
id
1001 1710.34
1003 18.95
1004 321.20
Name: amount, dtype: float64
or create a new index based on the id column:
pd.DataFrame(df_a.groupby(['id'])['amount'].sum())
amount
id
1001 1710.34
1003 18.95
1004 321.20
I've also tried to pass the index parameter, but that doesn't work either:
pd.DataFrame(df_a.groupby(['id'])['amount'].sum(),index=df_a.index.values)
amount
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
Does anyone have an elegant solution for this ?
You have a parameter as_index in groupby for that
df_a.groupby('id', as_index = False)['amount'].sum()
You get
id amount
0 1001 1710.34
1 1003 18.95
2 1004 321.20
You can try the following by adding to_frame()
and reset_index()
:
new_df = df_a.groupby(['id'])['amount'].sum().to_frame('amount').reset_index()
print(new_df)
Result:
id amount
0 1001 1710.34
1 1003 18.95
2 1004 321.20
If you only use to_frame()
i.e. using
df_a.groupby(['id'])['amount'].sum().to_frame('amount')
it will keep index on id
as following:
amount
id
1001 1710.34
1003 18.95
1004 321.20
Other way is to reset index on dataframe in your above code:
new_df = pd.DataFrame(df_a.groupby(['id'])['amount'].sum()).reset_index()
Output would be same as above:
id amount
0 1001 1710.34
1 1003 18.95
2 1004 321.20
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With