Pandas Sum of Duplicate Attributes

I'm using pandas to manipulate a CSV file with several rows and columns that looks like the following:

Fullname     Amount   Date        Zip     State  .....
John Joe     1        1/10/1900   55555   Confusion
Betty White  5        .           .       Alaska
Bruce Wayne  10       .           .       Frustration
John Joe     20       .           .       .
Betty White  25       .           .       .

I'd like to create a new column named Total that holds the total Amount for each person (identified by Fullname and Zip). I'm having difficulty finding the correct solution.

Let's just call my CSV file csvfile. Here is what I have so far:

import pandas as pd
df = pd.read_csv('csvfile.csv', header=0)
# DataFrame.sort() no longer exists in pandas; sort_values() returns a sorted copy
df = df.sort_values(['Fullname'])
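Since csvfile.csv is not available here, the sketch below reproduces the loading and sorting step against a small in-memory CSV using io.StringIO. It keeps only the Fullname, Amount and Zip columns, and the Zip values for Betty White and Bruce Wayne are hypothetical placeholders for the "." entries in the table, not real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for csvfile.csv: only the columns needed here,
# with made-up Zip values where the original table shows "."
CSV_TEXT = """Fullname,Amount,Zip
John Joe,1,55555
Betty White,5,99501
Bruce Wayne,10,10001
John Joe,20,55555
Betty White,25,99501
"""

# header=0 uses the first line of the file as the column names
df = pd.read_csv(io.StringIO(CSV_TEXT), header=0)

# sort_values returns a new DataFrame, so the result is assigned back;
# mergesort is a stable sort, so rows for the same person keep their file order
df = df.sort_values(["Fullname"], kind="mergesort")
print(df)
```

Sorting is not required for the grouping itself; it only makes the duplicate rows for each person easier to see.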

I think I have to use iterrows to do this. The problem with simply dropping duplicates is that I would lose the amounts, since they may differ between rows for the same person.

asked Apr 11 '15 21:04

user2723240

1 Answer

I think you want this:

df['Total'] = df.groupby(['Fullname', 'Zip'])['Amount'].transform('sum')

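Here groupby groups the rows by the Fullname and Zip columns, as you stated. We then call transform on the Amount column and pass the string 'sum' to compute each group's total. Unlike a plain .sum(), which returns one row per group, transform returns a Series with the same index as the original df, so every row receives its group's total and the result can be assigned directly as a new column. You can then drop the duplicates afterwards, e.g.: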
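A minimal sketch of the transform call on a small hand-built frame. The rows mirror the table in the question, but the Zip values other than 55555 are hypothetical, and the other columns are omitted.

```python
import pandas as pd

# Hypothetical sample data; Zip values other than 55555 are made up
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": [55555, 99501, 10001, 55555, 99501],
})

# transform("sum") produces one value per original row, indexed like df,
# so it lines up with the existing rows and can be stored as a new column
df["Total"] = df.groupby(["Fullname", "Zip"])["Amount"].transform("sum")

print(df["Total"].tolist())  # [21, 30, 10, 21, 30]
```

Both John Joe rows get 1 + 20 = 21 and both Betty White rows get 5 + 25 = 30, while the frame keeps all five rows.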

new_df = df.drop_duplicates(subset=['Fullname', 'Zip'])
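The sketch below, again on hypothetical sample data with made-up Zip values, shows what drop_duplicates keeps after the Total column has been added, and contrasts it with aggregating directly via groupby(...).sum() when only the totals are needed.

```python
import pandas as pd

# Hypothetical sample data; Zip values other than 55555 are made up
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": [55555, 99501, 10001, 55555, 99501],
})
df["Total"] = df.groupby(["Fullname", "Zip"])["Amount"].transform("sum")

# drop_duplicates keeps the first row of each (Fullname, Zip) pair by default
# (keep="first"), so Amount and any other columns come from that first row
new_df = df.drop_duplicates(subset=["Fullname", "Zip"])

# Alternative when only the totals matter: aggregate to one row per group;
# as_index=False keeps Fullname and Zip as ordinary columns
totals = df.groupby(["Fullname", "Zip"], as_index=False)["Amount"].sum()

print(new_df)
print(totals)
```

This addresses the concern in the question: after deduplication, new_df's Amount column only holds the first row's value, but the Total column still carries the combined amount for each person. The groupby(...).sum() version drops every column that is not a key or Amount, and sorts the result by the group keys.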

answered Sep 18 '22 21:09

EdChum
