 

How can I "merge" rows by same value in a column in Pandas with aggregation functions?

Tags:

python

pandas

I would like to group the rows of a dataframe by the values in one column. The result should be a new dataframe in which I can choose, per column, which aggregation function is applied. The default should simply be the value of the first entry in the group.

(It would be nice if the solution also worked for a combination of two columns.)

Example

    #!/usr/bin/env python

    """Test data frame grouping."""

    # 3rd party modules
    import pandas as pd


    df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                       {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                       {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                       {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                       {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])
    print(df)

gives the dataframe:

       amount  id     name  price
    0       1   1     anna    123
    1       2   1     anna      7
    2      30   2      bob     42
    3      10   3  charlie      1
    4     100   3    david      2

And I would like to get:

    amount  id     name  price
         3   1     anna    130
        30   2      bob     42
       110   3  charlie      3

So:

  • Entries with the same value in the id column belong together. After that operation, there should still be an id column, but it should have only unique values.
  • All values in amount and price that share the same id are summed.
  • For name, just the first one (by the current order of the dataframe) is taken.

Is this possible with Pandas?

asked Oct 19 '17 09:10 by Martin Thoma




1 Answer

You are looking for

    aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
    df_new = df.groupby(df['id']).aggregate(aggregation_functions)

which gives

        price     name  amount
    id
    1     130     anna       3
    2      42      bob      30
    3       3  charlie     110
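A minimal sketch, not part of the original answer, that covers the two extras from the question: every column defaults to `'first'` unless overridden, and `id` stays an ordinary column (via `as_index=False`) instead of becoming the index as in the output above. The helper name `merge_rows` and its `overrides` parameter are hypothetical. Note that pandas' `'first'` returns the first non-null value in each group, which matches "the first entry" only when that entry is not missing.

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])


def merge_rows(frame, by, overrides=None):
    """Hypothetical helper: collapse rows that share the value(s) in `by`.

    Every column not listed in `by` defaults to 'first' (first non-null
    value per group); `overrides` maps column names to other aggregation
    functions such as 'sum'.
    """
    # Accept a single column name or a list of column names.
    keys = [by] if isinstance(by, str) else list(by)
    # Default aggregation for all non-key columns, in their original order.
    agg = {col: 'first' for col in frame.columns if col not in keys}
    agg.update(overrides or {})
    # as_index=False keeps the key column(s) as regular columns;
    # sort=False keeps groups in order of first appearance.
    return frame.groupby(keys, as_index=False, sort=False).agg(agg)


# Group by 'id' only: price and amount are summed, name takes the first value.
merged = merge_rows(df, 'id', {'price': 'sum', 'amount': 'sum'})
print(merged)

# Group by the combination of 'id' and 'name'.
by_pair = merge_rows(df, ['id', 'name'], {'price': 'sum', 'amount': 'sum'})
print(by_pair)
```

With `by='id'`, `merged` has `id` as a column with values 1, 2, 3, `price` 130, 42, 3 and `amount` 3, 30, 110. Passing `['id', 'name']` groups on both columns, so 3/charlie and 3/david stay separate rows. If you keep the answer's original call instead, `df_new.reset_index()` turns the `id` index back into a column.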
answered Oct 17 '22 05:10 by Martin Thoma