Python Pandas: group by in group by and average?

Tags: python, pandas, group-by, mean

I have a dataframe like this:

cluster  org  time
1        a    8
1        a    6
2        h    34
1        c    23
2        d    74
3        w    6

I would like to calculate the average of time per org per cluster.

Expected result:

cluster  mean(time)
1        15          ((8+6)/2+23)/2
2        54          (74+34)/2
3        6

I do not know how to do this in pandas. Can anybody help?


asked May 19 '15 14:05

UserYmY

2 Answers

If you want to first take the mean over each ['cluster', 'org'] combination and then take the mean of those values within each cluster, you can use:

In [59]: (df.groupby(['cluster', 'org'], as_index=False).mean()
            .groupby('cluster')['time'].mean())
Out[59]:
cluster
1    15
2    54
3     6
Name: time, dtype: int64
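For anyone who wants to reproduce the two-step "mean of means" above, here is a minimal sketch. It assumes the frame is rebuilt by hand from the question's sample rows; the names `per_org` and `mean_of_means` are only illustrative. Selecting `time` explicitly is my choice, not part of the original answer, and the printed dtype may differ between pandas versions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Step 1: average time within each (cluster, org) pair.
per_org = df.groupby(["cluster", "org"], as_index=False)["time"].mean()

# Step 2: average those per-org means within each cluster.
mean_of_means = per_org.groupby("cluster")["time"].mean()

print(mean_of_means)
```

Because each org contributes a single value in step 2, an org with many rows does not outweigh an org with few rows.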

If you want the mean over all rows in each cluster only, ignoring org, then you can use:

In [58]: df.groupby(['cluster']).mean()
Out[58]:
              time
cluster
1        12.333333
2        54.000000
3         6.000000
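A small sketch of this pooled mean, again assuming the frame is rebuilt from the question's sample rows. One caveat that is my own assumption about your environment: on recent pandas releases (2.0 and later, as far as I know) calling `.mean()` on the whole grouped frame can raise a `TypeError` because of the string column `org`, so this version selects `time` first. The name `pooled` is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Pooled mean: every row in a cluster counts equally, regardless of org.
# Selecting "time" keeps the non-numeric "org" column out of the aggregation.
pooled = df.groupby("cluster")["time"].mean()

print(pooled)
```

For cluster 1 this gives (8 + 6 + 23) / 3, which is why it differs from the 15 in the expected result: org `a` is counted twice here.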

You can also use groupby on ['cluster', 'org'] and then use mean():

In [57]: df.groupby(['cluster', 'org']).mean()
Out[57]:
             time
cluster org
1       a       7
        c      23
2       d      74
        h      34
3       w       6

answered Sep 22 '22 23:09

Zero

I would simply do this, which follows your desired logic literally:

df.groupby(['org']).mean().groupby(['cluster']).mean()
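One thing worth checking before relying on the one-liner above: it groups by `org` alone, so it seems to match the desired result only when every org belongs to exactly one cluster, as in the question's sample. The sketch below uses hypothetical data (not from the question) in which org `a` appears in two clusters, and compares it with grouping on the (cluster, org) pair. The names `by_org_only` and `by_pair` are illustrative.

```python
import pandas as pd

# Hypothetical data, NOT from the question: org "a" appears in clusters 1 and 2.
df = pd.DataFrame({
    "cluster": [1, 2, 1],
    "org": ["a", "a", "c"],
    "time": [8, 34, 23],
})

# Grouping by org alone also averages the cluster column, so org "a"
# ends up with the fractional cluster label 1.5.
by_org_only = df.groupby(["org"]).mean().groupby(["cluster"]).mean()

# Grouping by the (cluster, org) pair keeps the clusters separate.
by_pair = (
    df.groupby(["cluster", "org"])["time"].mean()
      .groupby(level="cluster").mean()
)

print(by_org_only)
print(by_pair)
```

If your orgs never span clusters, both approaches should agree; otherwise grouping on the pair is the safer choice.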

answered Sep 19 '22 23:09

Vince Payandeh
