
How to divide the sum by the size in a pandas groupby

Tags:

python

pandas

I have a dataframe like

  ID_0 ID_1  ID_2
0    a    b     1
1    a    c     1
2    a    b     0
3    d    c     0
4    a    c     0
5    a    c     1
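To make the snippets on this page runnable, the frame above can be rebuilt as shown here. This is a minimal sketch; the name df matches the variable used in the question's code.

```python
import pandas as pd

# Rebuild the example frame from the table above, row by row
df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})
print(df)
```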

I would like to groupby ['ID_0','ID_1'] and produce a new dataframe which has the sum of the ID_2 values for each group divided by the number of rows in each group.

import numpy as np

grouped = df.groupby(['ID_0', 'ID_1'])
print(grouped.agg({'ID_2': np.sum}), "\n", grouped.size())

gives

           ID_2
ID_0 ID_1
a    b        1
     c        2
d    c        0
ID_0  ID_1
a     b       2
      c       3
d     c       1
dtype: int64

How can I get the new dataframe with the np.sum values divided by the size() values?


asked Sep 28 '16 18:09

graffe

1 Answer

Use groupby.apply instead:

df.groupby(['ID_0', 'ID_1']).apply(lambda x: x['ID_2'].sum()/len(x))

ID_0  ID_1
a     b       0.500000
      c       0.666667
d     c       0.000000
dtype: float64
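The two results printed in the question are both indexed by (ID_0, ID_1), so they can also be divided directly; pandas aligns the two Series on that index. And since a sum divided by the number of rows is the arithmetic mean, groupby's mean() yields the same numbers for this data. One caveat: size() counts every row, including rows where ID_2 is NaN, while sum() and mean() skip NaN, so the two approaches diverge if ID_2 has missing values. A minimal sketch, assuming the df from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})
grouped = df.groupby(['ID_0', 'ID_1'])

# Per-group sum divided by per-group row count; both Series share the
# same (ID_0, ID_1) MultiIndex, so the division lines up group by group
ratio = grouped['ID_2'].sum() / grouped.size()

# Equivalent here because ID_2 contains no missing values
mean = grouped['ID_2'].mean()

print(ratio)
print(mean)
```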


answered Sep 19 '22 13:09

Nickil Maveli
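The question asks for a new dataframe, while the apply call above returns a Series. If the intermediate sum and row count should be kept next to the ratio, named aggregation (available in pandas 0.25 and later) can build them in one frame. This is a sketch; the column names total, n_rows and ratio are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

# Named aggregation: each keyword becomes an output column,
# given as (source column, aggregation name)
out = df.groupby(['ID_0', 'ID_1']).agg(
    total=('ID_2', 'sum'),
    n_rows=('ID_2', 'size'),
)
# Divide column-wise on the aggregated frame
out['ratio'] = out['total'] / out['n_rows']

# Turn the (ID_0, ID_1) index levels back into ordinary columns
out = out.reset_index()
print(out)
```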
