I would like to group the rows of a dataframe by one column. The result should be a new dataframe for which I can decide, per column, which aggregation function makes sense. The default should be just the value of the first entry in the group.
(It would be nice if the solution also worked for a combination of two columns.)
    #!/usr/bin/env python

    """Test data frame grouping."""

    # 3rd party modules
    import pandas as pd

    df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                       {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
                       {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
                       {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
                       {'id': 3, 'price': 2, 'name': 'david', 'amount': 100}])
    print(df)
gives the dataframe:
       amount  id     name  price
    0       1   1     anna    123
    1       2   1     anna      7
    2      30   2      bob     42
    3      10   3  charlie      1
    4     100   3    david      2
And I would like to get:
    amount  id     name  price
         3   1     anna    130
        30   2      bob     42
       110   3  charlie      3
So:
- Entries which have the same value in the id column belong together. After that operation, there should still be an id column, but it should have only unique values.
- All values in amount and price which have the same id get summed up.
- For name, just the first one (by the current order of the dataframe) is taken.

Is this possible with Pandas?
You are looking for
    aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
    df_new = df.groupby(df['id']).aggregate(aggregation_functions)
which gives
        price     name  amount
    id
    1     130     anna       3
    2      42      bob      30
    3       3  charlie     110
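The question also asks for 'first' as the default aggregation, for id to remain a regular column, and for grouping by two columns. The following is a minimal sketch of one way to cover all three, building on the answer above. The helper name group_with_default_first and its overrides parameter are hypothetical names chosen for illustration, not part of pandas; the only pandas calls used are groupby (with as_index=False) and agg.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
    {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
    {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
    {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
    {'id': 3, 'price': 2, 'name': 'david', 'amount': 100},
])


def group_with_default_first(frame, keys, overrides=None):
    """Hypothetical helper: group by keys, aggregating with 'first' by default."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    # Every non-key column defaults to 'first'; explicit choices replace the default.
    agg = {col: 'first' for col in frame.columns if col not in keys}
    agg.update(overrides or {})
    # as_index=False keeps the grouping keys as ordinary columns, not the index.
    return frame.groupby(keys, as_index=False).agg(agg)


sums = {'price': 'sum', 'amount': 'sum'}

# Group by a single column.
by_id = group_with_default_first(df, 'id', sums)
print(by_id)

# Group by a combination of two columns.
by_id_and_name = group_with_default_first(df, ['id', 'name'], sums)
print(by_id_and_name)
```

Grouping by id alone yields three rows with id as a column. Grouping by id and name yields four rows, because charlie and david share id 3 but have different names.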