Pandas groupby mean - into a dataframe?

Tags: python, pandas, dataframe, pandas-groupby

Say my data looks like this:

date,name,id,dept,sale1,sale2,sale3,total_sale
1/1/17,John,50,Sales,50.0,60.0,70.0,180.0
1/1/17,Mike,21,Engg,43.0,55.0,2.0,100.0
1/1/17,Jane,99,Tech,90.0,80.0,70.0,240.0
1/2/17,John,50,Sales,60.0,70.0,80.0,210.0
1/2/17,Mike,21,Engg,53.0,65.0,12.0,130.0
1/2/17,Jane,99,Tech,100.0,90.0,80.0,270.0
1/3/17,John,50,Sales,40.0,50.0,60.0,150.0
1/3/17,Mike,21,Engg,53.0,55.0,12.0,120.0
1/3/17,Jane,99,Tech,80.0,70.0,60.0,210.0

I want a new column, average, which holds the mean of total_sale for each (name, id, dept) tuple.

I tried:

df.groupby(['name', 'id', 'dept'])['total_sale'].mean()

And this does return a series with the mean:

name  id  dept 
Jane  99  Tech     240.000000
John  50  Sales    180.000000
Mike  21  Engg     116.666667
Name: total_sale, dtype: float64

But how would I reference the data? The result is a one-dimensional series of shape (3,). Ideally I would like it put back into a dataframe with proper columns so I can reference it by name/id/dept.

617

asked Oct 25 '17 17:10

Craig

2 Answers

If you call .reset_index() on the series you have, you get the dataframe you want, with each level of the index converted into a column:

df.groupby(['name', 'id', 'dept'])['total_sale'].mean().reset_index()
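To make the result of the call above concrete, here is a minimal sketch. It assumes a trimmed copy of the question's data (only the columns needed for grouping) and uses the `name` argument of `Series.reset_index` to call the value column `average`; the variable names `means` and `jane_mean` are illustrative only.

```python
import pandas as pd

# Trimmed copy of the question's data: only the grouping keys and total_sale.
rows = [
    ("John", 50, "Sales", 180.0), ("Mike", 21, "Engg", 100.0), ("Jane", 99, "Tech", 240.0),
    ("John", 50, "Sales", 210.0), ("Mike", 21, "Engg", 130.0), ("Jane", 99, "Tech", 270.0),
    ("John", 50, "Sales", 150.0), ("Mike", 21, "Engg", 120.0), ("Jane", 99, "Tech", 210.0),
]
df = pd.DataFrame(rows, columns=["name", "id", "dept", "total_sale"])

# One row per group; name/id/dept become ordinary columns again.
means = (
    df.groupby(["name", "id", "dept"])["total_sale"]
    .mean()
    .reset_index(name="average")  # name= labels the value column
)

# Ordinary boolean indexing now works on the key columns.
jane_mean = means.loc[means["name"] == "Jane", "average"].iloc[0]
print(means)
print(jane_mean)
```

Because the keys are plain columns, the frame can also be filtered, sorted or merged like any other dataframe.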

EDIT: to respond to the OP's comment, adding this column back to your original dataframe is a little trickier. The grouped series has one row per group, not one per row of the original dataframe, so you can't assign it as a new column directly. However, if you give the dataframe the same index as the series, pandas aligns on that index and fills in the matching value for every row. Try this:

import pandas as pd

cols = ['date', 'name', 'id', 'dept', 'sale1', 'sale2', 'sale3', 'total_sale']
data = [
    ['1/1/17', 'John', 50, 'Sales', 50.0, 60.0, 70.0, 180.0],
    ['1/1/17', 'Mike', 21, 'Engg', 43.0, 55.0, 2.0, 100.0],
    ['1/1/17', 'Jane', 99, 'Tech', 90.0, 80.0, 70.0, 240.0],
    ['1/2/17', 'John', 50, 'Sales', 60.0, 70.0, 80.0, 210.0],
    ['1/2/17', 'Mike', 21, 'Engg', 53.0, 65.0, 12.0, 130.0],
    ['1/2/17', 'Jane', 99, 'Tech', 100.0, 90.0, 80.0, 270.0],
    ['1/3/17', 'John', 50, 'Sales', 40.0, 50.0, 60.0, 150.0],
    ['1/3/17', 'Mike', 21, 'Engg', 53.0, 55.0, 12.0, 120.0],
    ['1/3/17', 'Jane', 99, 'Tech', 80.0, 70.0, 60.0, 210.0],
]
df = pd.DataFrame(data, columns=cols)

mean_col = df.groupby(['name', 'id', 'dept'])['total_sale'].mean()  # don't reset the index!
df = df.set_index(['name', 'id', 'dept'])  # give df the same index as mean_col
df['mean_col'] = mean_col  # values are aligned on the shared index
df = df.reset_index()  # turn the hierarchical index back into columns
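If the goal is only the per-row average column, a different technique from the set_index approach above may be simpler: `groupby(...).transform('mean')` returns a series with the same length and index as the original dataframe, so it can be assigned directly. The sketch below is an alternative, not part of the original answer; it assumes the column should be called `average` as in the question and omits sale1..sale3 for brevity.

```python
import pandas as pd

# Question's data without the sale1..sale3 columns (omitted for brevity).
rows = [
    ("1/1/17", "John", 50, "Sales", 180.0), ("1/1/17", "Mike", 21, "Engg", 100.0),
    ("1/1/17", "Jane", 99, "Tech", 240.0), ("1/2/17", "John", 50, "Sales", 210.0),
    ("1/2/17", "Mike", 21, "Engg", 130.0), ("1/2/17", "Jane", 99, "Tech", 270.0),
    ("1/3/17", "John", 50, "Sales", 150.0), ("1/3/17", "Mike", 21, "Engg", 120.0),
    ("1/3/17", "Jane", 99, "Tech", 210.0),
]
df = pd.DataFrame(rows, columns=["date", "name", "id", "dept", "total_sale"])

# transform broadcasts each group's mean back onto every row of that group,
# so the result lines up with df row for row.
df["average"] = df.groupby(["name", "id", "dept"])["total_sale"].transform("mean")
print(df)
```

This keeps the original row order and index, so no set_index/reset_index round trip is needed.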

191

answered Sep 29 '22 11:09

Nathan

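Alternatively, use to_frame, which turns the series into a one-column dataframe (name, id and dept stay in the index):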

df.groupby(['name', 'id', 'dept'])['total_sale'].mean().to_frame()
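Since to_frame keeps the grouping keys as a hierarchical index, individual values can be looked up by a (name, id, dept) tuple. A minimal sketch of that lookup, assuming a trimmed copy of the question's data and an illustrative column name `average` passed through `to_frame(name=...)`:

```python
import pandas as pd

# Trimmed copy of the question's data: only the grouping keys and total_sale.
rows = [
    ("John", 50, "Sales", 180.0), ("Mike", 21, "Engg", 100.0), ("Jane", 99, "Tech", 240.0),
    ("John", 50, "Sales", 210.0), ("Mike", 21, "Engg", 130.0), ("Jane", 99, "Tech", 270.0),
    ("John", 50, "Sales", 150.0), ("Mike", 21, "Engg", 120.0), ("Jane", 99, "Tech", 210.0),
]
df = pd.DataFrame(rows, columns=["name", "id", "dept", "total_sale"])

mean_series = df.groupby(["name", "id", "dept"])["total_sale"].mean()

# The series itself can already be referenced by its full index tuple.
jane_mean = mean_series.loc[("Jane", 99, "Tech")]

# to_frame wraps the same values in a one-column dataframe; the index is unchanged.
means = mean_series.to_frame(name="average")
john_mean = means.loc[("John", 50, "Sales"), "average"]

print(jane_mean, john_mean)
```

Calling reset_index() on `means` afterwards would move the three keys into regular columns, as in the other answer.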

answered Sep 29 '22 10:09

BENY
