Python pandas: mean and sum groupby on different columns at the same time

Tags: python, pandas

I have a pandas dataframe which looks like the following:

    Name    Missed    Credit    Grade
    A       1         3         10
    A       1         1         12
    B       2         3         10
    B       1         2         20

And my desired output is:

    Name    Sum1   Sum2    Average
    A       2      4       11
    B       3      5       15

Basically, I want the sum of the Credit and Missed columns and the average of Grade. Right now I run two groupby operations on Name, one for the sums and one for the average, and then merge the two resulting dataframes, which does not seem like the best way to do this. I also found this on SO, which makes sense if I only want to work on one column:

    df.groupby('Name')['Credit'].agg(['sum','mean'])

But I am not sure how to write a one-liner that handles all of these columns at once.

asked Feb 21 '18 15:02

ahajib

2 Answers

You need agg with a dictionary, followed by rename to change the column names:

    d = {'Missed':'Sum1', 'Credit':'Sum2', 'Grade':'Average'}
    df = df.groupby('Name').agg({'Missed':'sum', 'Credit':'sum', 'Grade':'mean'}).rename(columns=d)
    print(df)
          Sum1  Sum2  Average
    Name
    A        2     4       11
    B        3     5       15

If you also want Name as a regular column rather than the index:

    df = (df.groupby('Name', as_index=False)
            .agg({'Missed':'sum', 'Credit':'sum', 'Grade':'mean'})
            .rename(columns={'Missed':'Sum1', 'Credit':'Sum2', 'Grade':'Average'}))
    print(df)
      Name  Sum1  Sum2  Average
    0    A     2     4       11
    1    B     3     5       15
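As a side note, calling reset_index() after a plain groupby should give the same layout as as_index=False. The sketch below is only an illustration: the sample frame is rebuilt from the table in the question, and the names sample, agg_spec, new_names, via_flag and via_reset are my own, not from the answer.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed, not from the answer).
sample = pd.DataFrame({
    "Name": ["A", "A", "B", "B"],
    "Missed": [1, 1, 2, 1],
    "Credit": [3, 1, 3, 2],
    "Grade": [10, 12, 10, 20],
})

agg_spec = {"Missed": "sum", "Credit": "sum", "Grade": "mean"}
new_names = {"Missed": "Sum1", "Credit": "Sum2", "Grade": "Average"}

# Variant 1: keep Name as a column directly via as_index=False.
via_flag = (sample.groupby("Name", as_index=False)
                  .agg(agg_spec)
                  .rename(columns=new_names))

# Variant 2: group with Name as the index, then move it back to a column.
via_reset = (sample.groupby("Name")
                   .agg(agg_spec)
                   .rename(columns=new_names)
                   .reset_index())

print(via_flag)
print(via_reset)
```

Which one to use is mostly a matter of taste; reset_index() is handy when the grouped frame already exists with Name as its index.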

Solution with named aggregations:

    df = df.groupby('Name', as_index=False).agg(Sum1=('Missed','sum'),
                                                Sum2=('Credit','sum'),
                                                Average=('Grade','mean'))
    print(df)
      Name  Sum1  Sum2  Average
    0    A     2     4       11
    1    B     3     5       15
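To try the named aggregation end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the variable names sample and result are placeholders of mine. Named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed, not from the answer).
sample = pd.DataFrame({
    "Name": ["A", "A", "B", "B"],
    "Missed": [1, 1, 2, 1],
    "Credit": [3, 1, 3, 2],
    "Grade": [10, 12, 10, 20],
})

# Named aggregation: each keyword is new_column=(source_column, function),
# so the renaming and the aggregation happen in a single call.
result = sample.groupby("Name", as_index=False).agg(
    Sum1=("Missed", "sum"),
    Sum2=("Credit", "sum"),
    Average=("Grade", "mean"),
)
print(result)
```

On current pandas versions the Average column comes back as float, because mean returns floating-point values even for integer input.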

answered Oct 02 '22 02:10

jezrael

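    import pandas as pd

    # Note: this answer names the last column 'Grades', not 'Grade' as in the question.
    A = pd.DataFrame.from_dict({'Name':['A','A','B','B'],'Missed':[1,1,2,1],'Credit':[3,1,3,2],'Grades':[10,12,10,20]})
    A.groupby('Name').agg({'Missed':'sum','Credit':'sum','Grades':'mean'})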

answered Oct 02 '22 01:10

ashish trehan
