Pandas - dataframe groupby - how to get sum of multiple columns

Tags: python, pandas, dataframe, pandas-groupby

This should be an easy one, but somehow I couldn't find a solution that works.

I have a pandas dataframe which looks like this:

index col1   col2   col3   col4   col5
0     a      c      1      2      f 
1     a      c      1      2      f
2     a      d      1      2      f
3     b      d      1      2      g
4     b      e      1      2      g
5     b      e      1      2      g

I want to group by col1 and col2 and get the sum() of col3 and col4. col5 can be dropped, since its data cannot be aggregated.

Here is what the output should look like. I am interested in having both col3 and col4 in the resulting dataframe. It doesn't really matter if col1 and col2 are part of the index or not.

index col1   col2   col3   col4   
0     a      c      2      4          
1     a      d      1      2      
2     b      d      1      2      
3     b      e      2      4

Here is what I tried:

df_new = df.groupby(['col1', 'col2'])['col3', 'col4'].sum()

That, however, only returns the aggregated results of col4.

I am lost here. Every example I found only aggregates one column, where the issue obviously doesn't occur.

599

asked Sep 26 '17 16:09

Axel

4 Answers

Using apply (note the double brackets: a list of columns, not a tuple):

df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())
Out[1257]:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you want to use agg:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
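As a rough, self-contained sketch of the agg variant, the snippet below rebuilds the frame from the question (the construction code is my own assumption about how that table was created) and applies the same dict-based aggregation:

```python
import pandas as pd

# Assumed reconstruction of the example frame shown in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Only the columns named in the dict are aggregated, so col5 is left out.
result = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"})
print(result)
```

Here col1 and col2 end up in the index; the question says either layout is acceptable.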

111

answered Sep 18 '22 00:09

BENY

Another generic solution is

df.groupby(['col1','col2']).agg({'col3':'sum','col4':'sum'}).reset_index()

This will give you the required output.

UPDATED (June 2020): Pandas 0.25.0 added "named aggregation" to groupby. Each keyword argument to agg names an output column and takes a (column, aggregation function) tuple, which is handy when applying multiple aggregation functions to specific columns.

df.groupby(['col1','col2']).agg(
    sum_col3 = ('col3','sum'),
    sum_col4 = ('col4','sum'),
).reset_index()

The output column names are up to you; 'sum_col3' and 'sum_col4' are just the names I chose.

Refer to Link for detailed description.
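A minimal runnable sketch of named aggregation, assuming pandas 0.25.0 or later and a frame rebuilt from the question's table. The extra mean_col4 column is a hypothetical addition of mine, included only to show a second function applied to the same input column:

```python
import pandas as pd

# Assumed reconstruction of the example frame shown in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Each keyword is an output column name; each value is (input column, function).
result = (
    df.groupby(["col1", "col2"])
      .agg(
          sum_col3=("col3", "sum"),
          sum_col4=("col4", "sum"),
          mean_col4=("col4", "mean"),  # illustrative extra, not in the answer
      )
      .reset_index()
)
print(result)
```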

answered Sep 20 '22 00:09

Prateek Sharma

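Selecting groupby columns with a tuple of keys, as in ['col3', 'col4'], triggers the pandas "FutureWarning: Indexing with multiple keys", which has been discussed on GitHub and Stack Overflow. Because of that, I recommend passing a list of columns instead: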

df.groupby(['col1', 'col2'])[['col3', 'col4']].sum().reset_index()

Output: [image: output dataframe]
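For reference, here is a small sketch that runs the list-based selection on a frame rebuilt from the question's table (the construction is an assumption on my part). It also shows as_index=False, which I believe gives the same flat layout without a separate reset_index() call:

```python
import pandas as pd

# Assumed reconstruction of the example frame shown in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Double brackets pass a list of column names, which avoids the warning.
result = df.groupby(["col1", "col2"])[["col3", "col4"]].sum().reset_index()

# Alternative: keep the group keys as ordinary columns from the start.
alt = df.groupby(["col1", "col2"], as_index=False)[["col3", "col4"]].sum()

print(result)
```

The values printed should match the desired output table in the question.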

answered Sep 18 '22 00:09

oil_lamp

The above answer didn't work for me.

df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]

I was grouping by a single column and summing a single column.

Here is the one that worked for me:

D1.groupby(['col1'])['col2'].sum()  # sum() goes at the end, not in the middle
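To make the "sum at the end" point concrete, here is a hedged sketch using the question's frame rather than D1 (which is not shown here), and col3 instead of col2 because col2 holds strings in that frame. Both orderings give the same numbers in this example; selecting first simply avoids aggregating columns you do not need:

```python
import pandas as pd

# Assumed reconstruction of the example frame shown in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Select the column first, then sum: returns a Series indexed by col1.
select_then_sum = df.groupby(["col1"])["col3"].sum()

# Sum every numeric column first, then pick one out of the result.
sum_then_select = df.groupby(["col1"]).sum(numeric_only=True)["col3"]

print(select_then_sum)
```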

answered Sep 18 '22 00:09

Leo James
