How can I "merge" rows by same value in a column in Pandas with aggregation functions?

Tags: python, pandas

I would like to group the rows of a dataframe by one column and get back a new dataframe in which I can choose, per column, which aggregation function makes sense. The default should be just the value of the first entry in the group.

(It would be nice if the solution also worked for a combination of two columns.)

Example

    #!/usr/bin/env python

    """Test data frame grouping."""

    # 3rd party modules
    import pandas as pd


    df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                       {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                       {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                       {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                       {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])
    print(df)

gives the dataframe:

       amount  id     name  price
    0       1   1     anna    123
    1       2   1     anna      7
    2      30   2      bob     42
    3      10   3  charlie      1
    4     100   3    david      2

And I would like to get:

    amount  id     name  price
         3   1     anna    130
        30   2      bob     42
       110   3  charlie      3

So:

Entries with the same value in the id column belong together. After that operation, there should still be an id column, but it should have only unique values.
All values in amount and price that share the same id are summed up.
For name, just the first one (by the current order of the dataframe) is taken.

Is this possible with Pandas?


asked Oct 19 '17 09:10

Martin Thoma

1 Answer

You are looking for

    aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
    df_new = df.groupby(df['id']).aggregate(aggregation_functions)

which gives

        price     name  amount
    id
    1     130     anna       3
    2      42      bob      30
    3       3  charlie     110
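The question also asks for three things the snippet above does not show: keeping id as a regular column, using "first" as the default for every column that is not listed explicitly, and grouping by two columns. A minimal sketch of one way to cover all three is below. The helper name `merge_rows` and its parameters are hypothetical, introduced only for illustration; they are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
    {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
    {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
    {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
    {'id': 3, 'price': 2, 'name': 'david', 'amount': 100},
])


def merge_rows(frame, keys, overrides):
    """Group `frame` by `keys`, using 'first' unless `overrides` says otherwise.

    Hypothetical helper for illustration only, not a pandas function.
    """
    keys = [keys] if isinstance(keys, str) else list(keys)
    # Start every non-key column at 'first', then apply the explicit choices.
    agg = {col: 'first' for col in frame.columns if col not in keys}
    agg.update(overrides)
    # as_index=False keeps the key columns as ordinary columns;
    # sort=False keeps the groups in order of first appearance.
    return frame.groupby(keys, as_index=False, sort=False).agg(agg)


# Group by a single column.
by_id = merge_rows(df, 'id', {'price': 'sum', 'amount': 'sum'})
print(by_id)

# Group by a combination of two columns.
by_id_and_name = merge_rows(df, ['id', 'name'], {'price': 'sum', 'amount': 'sum'})
print(by_id_and_name)
```

Grouping by both id and name keeps charlie and david as separate rows, because they no longer fall into the same group. Note that the 'first' aggregation returns the first non-null value in each group, which can differ from the literal first row when a column contains missing values.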

answered Oct 17 '22 05:10

Martin Thoma
