 

pandas get average of a groupby

I am trying to find the average monthly cost per user_id, but I am only able to get the average cost per user or the monthly cost per user.

Because I group by user and month, there is no way to get the average over the second groupby level (month) unless I transform the groupby output into something else.

This is my df:

     import pandas as pd

     df = pd.DataFrame({'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                        'id': [1, 1, 1, 1, 2, 2, 2, 2],
                        'mth': [3, 3, 4, 5, 3, 4, 4, 5]})

   cost  id  mth
0    10   1    3
1    20   1    3
2    30   1    4
3    40   1    5
4    50   2    3
5    60   2    4
6    70   2    4
7    80   2    5
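For contrast, the "average cost per user" mentioned above is a plain per-row mean: every row counts once, so a month with several rows (month 3 for id 1, month 4 for id 2) weighs more heavily. A minimal sketch on the same data (the variable name per_row_mean is only illustrative):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                   'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'mth': [3, 3, 4, 5, 3, 4, 4, 5]})

# Mean over individual rows, NOT over monthly totals
per_row_mean = df.groupby('id')['cost'].mean()
print(per_row_mean)
```

This gives 25.0 for id 1 and 65.0 for id 2, which is not the (30+30+40)/3 and (50+130+80)/3 being asked for.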

I can get the monthly sum, but I want the average of the monthly sums for each user_id.

df.groupby(['id','mth'])['cost'].sum()

id  mth
1   3       30
    4       30
    5       40
2   3       50
    4      130
    5       80

I want something like this:

id  average_monthly
1   (30+30+40)/3
2   (50+130+80)/3
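The target is a two-step aggregation: first sum cost per (id, month), then average those monthly totals per id. As a sketch only, here is the same arithmetic in plain Python without pandas, just to make the two steps explicit; the rows are copied from the sample data and the variable names are hypothetical.

```python
from collections import defaultdict

# (id, cost, mth) rows copied from the sample data
rows = [(1, 10, 3), (1, 20, 3), (1, 30, 4), (1, 40, 5),
        (2, 50, 3), (2, 60, 4), (2, 70, 4), (2, 80, 5)]

# Step 1: total cost per (id, month)
monthly = defaultdict(int)
for user_id, cost, mth in rows:
    monthly[(user_id, mth)] += cost

# Step 2: collect the monthly totals per id, then average them
totals = defaultdict(list)
for (user_id, _mth), total in monthly.items():
    totals[user_id].append(total)

average_monthly = {uid: sum(v) / len(v) for uid, v in totals.items()}
print(average_monthly)
```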
asked Oct 16 '16 at 04:10 by jxn



1 Answer

Resetting the index should work. Try this:

In [19]: df.groupby(['id', 'mth']).sum().reset_index().groupby('id').mean()  
Out[19]: 
    mth       cost
id                
1   4.0  33.333333
2   4.0  86.666667
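If only the cost is wanted, in the id / average_monthly shape from the question, one way (a sketch, not the only option) is to select the cost column before taking the mean and rename the result. The column name average_monthly is taken from the question's desired output.

```python
import pandas as pd

df = pd.DataFrame({'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                   'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'mth': [3, 3, 4, 5, 3, 4, 4, 5]})

# Sum per (id, month), then average those sums per id.
# Selecting 'cost' before .mean() leaves out the mth average.
result = (df.groupby(['id', 'mth'])['cost'].sum()
            .reset_index()
            .groupby('id')['cost'].mean()
            .rename('average_monthly')
            .reset_index())
print(result)
```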

You can drop the mth column if you don't need it. The logic is that after the sum step, you have this:

In [20]: df.groupby(['id', 'mth']).sum()
Out[20]: 
        cost
id mth      
1  3      30
   4      30
   5      40
2  3      50
   4     130
   5      80

Resetting the index at this point turns id and mth back into ordinary columns, leaving one row per (id, month) pair:

In [21]: df.groupby(['id', 'mth']).sum().reset_index()
Out[21]: 
   id  mth  cost
0   1    3    30
1   1    4    30
2   1    5    40
3   2    3    50
4   2    4   130
5   2    5    80
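The separate reset_index() call can also be folded into the first groupby with as_index=False, which keeps the group keys as ordinary columns. A minimal sketch producing the same table (the name monthly is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                   'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'mth': [3, 3, 4, 5, 3, 4, 4, 5]})

# as_index=False returns id and mth as columns instead of a MultiIndex
monthly = df.groupby(['id', 'mth'], as_index=False)['cost'].sum()
print(monthly)
```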

It's then just a matter of grouping again, this time by id only and using mean instead of sum. That gives you the average monthly cost per user.
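Alternatively, the second grouping can run directly on the MultiIndex Series by its id level, skipping reset_index() entirely. This is a sketch of a swapped-in variant rather than the approach above; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'cost': [10, 20, 30, 40, 50, 60, 70, 80],
                   'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'mth': [3, 3, 4, 5, 3, 4, 4, 5]})

# Series indexed by (id, mth) holding the monthly totals
monthly_totals = df.groupby(['id', 'mth'])['cost'].sum()

# Group that Series by its 'id' index level and average the totals
average_monthly = monthly_totals.groupby(level='id').mean()
print(average_monthly)
```

Grouping by level avoids materializing the intermediate flat table, which can be convenient when the result is only needed as a Series keyed by id.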

Let us know if this helps.

answered Sep 25 '22 at 22:09 by NullDev