
Python Pandas : group by in group by and average?

I have a dataframe like this:

    cluster  org  time
    1        a    8
    1        a    6
    2        h    34
    1        c    23
    2        d    74
    3        w    6

I would like to calculate the average of time per org per cluster.

Expected result:

    cluster  mean(time)
    1        15    ((8+6)/2+23)/2
    2        54    (74+34)/2
    3        6

I do not know how to do this in pandas. Can anybody help?

UserYmY asked May 19 '15 14:05



2 Answers

If you want to first take the mean over each combination of ['cluster', 'org'] and then take the mean of those values within each cluster, you can use:

    In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
                .groupby('cluster')['time'].mean())
    Out[59]:
    cluster
    1    15
    2    54
    3     6
    Name: time, dtype: int64
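For reference, here is a minimal self-contained sketch of the same two-step approach, rebuilt from the sample data in the question. The variable names `per_org` and `per_cluster` are illustrative, and selecting `['time']` explicitly is an assumption made so that the result does not depend on how a given pandas version treats other columns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Step 1: mean time for each (cluster, org) pair.
per_org = df.groupby(["cluster", "org"], as_index=False)["time"].mean()

# Step 2: mean of those per-org means within each cluster.
per_cluster = per_org.groupby("cluster")["time"].mean()
print(per_cluster)
```

Because each org is averaged first, org a contributes (8+6)/2 = 7 to cluster 1 rather than two separate rows, which is what gives 15 instead of 12.33.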

If you want the mean over all rows in each cluster, ignoring org, you can use:

    In [58]: df.groupby(['cluster']).mean()
    Out[58]:
                  time
    cluster
    1        12.333333
    2        54.000000
    3         6.000000
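A note for newer pandas releases: calling `mean()` on a group that still contains the string column `org` may raise an error instead of silently dropping that column (this depends on the installed version, so treat it as an assumption). A sketch that avoids the question by selecting `time` explicitly, with `overall` as an illustrative name:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Mean over all rows of each cluster; org is ignored entirely.
overall = df.groupby("cluster")["time"].mean()
print(overall)
```

Here cluster 1 is (8+6+23)/3, so every row counts equally regardless of which org it belongs to.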

You can also use groupby on ['cluster', 'org'] and then use mean():

    In [57]: df.groupby(['cluster', 'org']).mean()
    Out[57]:
                 time
    cluster org
    1       a       7
            c      23
    2       d      74
            h      34
    3       w       6

Zero answered Sep 22 '22 23:09


I would simply do this, which follows your desired logic literally:

df.groupby(['org']).mean().groupby(['cluster']).mean() 
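One caveat worth checking before relying on this: grouping by org alone also averages the cluster column, so it matches the expected result only when every org belongs to a single cluster, as in the question's sample. The sketch below uses hypothetical data (not from the question) in which org a appears in two clusters, and compares it with grouping on both keys.

```python
import pandas as pd

# Hypothetical data: org "a" appears in both cluster 1 and cluster 2.
df2 = pd.DataFrame({
    "cluster": [1, 1, 2, 2],
    "org": ["a", "b", "a", "c"],
    "time": [10, 20, 30, 40],
})

# Grouping by org alone averages the cluster ids too, so "a" ends up
# with cluster 1.5 and the second groupby sees a cluster that never existed.
by_org_only = df2.groupby(["org"]).mean().groupby(["cluster"]).mean()
print(by_org_only)

# Grouping by both keys keeps each org's rows inside their own cluster.
two_key = (
    df2.groupby(["cluster", "org"], as_index=False)["time"].mean()
       .groupby("cluster")["time"].mean()
)
print(two_key)
```

If an org can never span clusters in your data, both forms should agree.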
Vince Payandeh answered Sep 19 '22 23:09