I have a dataframe like this:
cluster  org  time
1        a    8
1        a    6
2        h    34
1        c    23
2        d    74
3        w    6
I would like to calculate the average of time per org per cluster.
Expected result:
cluster  mean(time)
1        15           ((8+6)/2 + 23)/2
2        54           (74+34)/2
3        6
I do not know how to do it in Pandas, can anybody help?
To get the average (mean) value in each group, apply the pandas mean() function directly to the selected columns of the groupby result.
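A minimal sketch of that idea, assuming the question's table is typed in by hand as `df`. Selecting a single column label gives a Series of group means, while selecting a list of columns gives a DataFrame:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# A single column label -> Series of per-cluster means
time_series = df.groupby("cluster")["time"].mean()

# A list of column labels -> DataFrame of per-cluster means
time_frame = df.groupby("cluster")[["time"]].mean()

print(time_series)
print(time_frame)
```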
Pandas comes with a whole host of SQL-like aggregation functions (mean, sum, min, max, count, and so on) that you can apply when grouping on one or more columns.
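As a rough analogue of `SELECT cluster, AVG(time), MIN(time), MAX(time), COUNT(time) ... GROUP BY cluster`, several aggregations can be requested at once with `agg`. This is a sketch on the question's data, not the only way to write it:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# One column per aggregation function, one row per cluster
summary = df.groupby("cluster")["time"].agg(["mean", "min", "max", "count"])
print(summary)
```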
To calculate the mean of specific DataFrame columns, select them with a list of column names and call mean() on the result. You can also get the mean of all numeric columns with DataFrame.mean(numeric_only=True).
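A short sketch of both forms on the question's data. `numeric_only=True` is used so the string column `org` is skipped rather than causing an error:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Mean of selected columns: select with a list, then call mean()
col_means = df[["cluster", "time"]].mean()

# Mean of every numeric column; the string column 'org' is skipped
numeric_means = df.mean(numeric_only=True)

print(col_means)
print(numeric_means)
```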
DataFrame.groupby() splits the data into groups based on some criterion, such as the values in one or more columns, so that you can apply a function (for example mean()) to each group and combine the results. This makes aggregating data by category efficient.
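To make the split/apply/combine steps visible, here is a sketch that does them by hand and then compares against the built-in aggregation. The manual loop is for illustration only; the one-line `mean()` is the idiomatic form:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Split: iterating a GroupBy yields (key, sub-DataFrame) pairs
manual = {}
for cluster, group in df.groupby("cluster"):
    # Apply: mean time within this one group
    manual[cluster] = group["time"].mean()

# Combine: the built-in aggregation performs all three steps in one call
built_in = df.groupby("cluster")["time"].mean()

print(manual)
print(built_in)
```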
If you want to first take the mean over each ['cluster', 'org'] combination and then take the mean over the cluster groups, you can use:
In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
    ...:    .groupby('cluster')['time'].mean())
Out[59]:
cluster
1    15
2    54
3     6
Name: time, dtype: int64
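The same two steps written out with named intermediate results, which makes it easy to check the per-org means (7 for org a, 23 for org c) before they are averaged again. A sketch on the question's data; selecting `["time"]` before `mean()` keeps only the value column:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Step 1: mean time per (cluster, org) pair, one row per org within a cluster
per_org = df.groupby(["cluster", "org"], as_index=False)["time"].mean()

# Step 2: average the per-org means within each cluster
per_cluster = per_org.groupby("cluster")["time"].mean()

print(per_org)
print(per_cluster)
```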
If you want the mean of the cluster groups only, you can use:
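In [58]: df.groupby(['cluster'])[['time']].mean()
Out[58]:
              time
cluster
1        12.333333
2        54.000000
3         6.000000
Selecting [['time']] keeps the string column org out of the mean; recent pandas versions raise a TypeError if it is included.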
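The difference between 12.333333 here and the expected 15 for cluster 1 comes from weighting. The plain cluster mean counts every row once, so org a (two rows) contributes twice; the expected result first collapses each org to a single value. Worked out for cluster 1:

```latex
\bar t^{\,\text{row}}_1 = \frac{8 + 6 + 23}{3} = \frac{37}{3} \approx 12.33,
\qquad
\bar t^{\,\text{org}}_1 = \frac{1}{2}\left(\frac{8 + 6}{2} + 23\right) = \frac{7 + 23}{2} = 15
```

For clusters 2 and 3 each org has a single row, so both methods give the same result (54 and 6).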
You can also group on ['cluster', 'org'] and then call mean():
In [57]: df.groupby(['cluster', 'org']).mean()
Out[57]:
             time
cluster org
1       a       7
        c      23
2       d      74
        h      34
3       w       6
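That MultiIndex result can be turned into the expected answer by grouping again on its `cluster` index level. A minimal sketch, assuming the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Series indexed by (cluster, org): mean time per org
org_means = df.groupby(["cluster", "org"])["time"].mean()

# Average the org means again, grouping on the 'cluster' index level
cluster_means = org_means.groupby(level="cluster").mean()

print(cluster_means)
```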
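I would simply do this, which literally follows your desired logic (average per org first, then average those org means per cluster):
df.groupby(['org']).mean().groupby(['cluster']).mean()
This works only because each org belongs to a single cluster; if an org appeared in several clusters, grouping by org alone would average its cluster labels as well, so grouping on ['cluster', 'org'] as shown earlier is the safer choice.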
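The org-first approach depends on every org sitting in exactly one cluster. The sketch below uses hypothetical data (not from the question) in which org a appears in clusters 1 and 2, to show how grouping by org alone mixes the clusters while grouping on (cluster, org) does not:

```python
import pandas as pd

# Hypothetical data: org 'a' appears in both cluster 1 and cluster 2
df2 = pd.DataFrame({
    "cluster": [1, 2, 1],
    "org": ["a", "a", "c"],
    "time": [10, 30, 20],
})

# Grouping by org alone also averages the cluster labels: org 'a' gets 1.5
org_only = df2.groupby("org").mean()
naive = org_only.groupby("cluster")["time"].mean()

# Grouping by (cluster, org) keeps each 'a' row in its own cluster
safe = df2.groupby(["cluster", "org"])["time"].mean().groupby(level="cluster").mean()

print(naive)
print(safe)
```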