 

How to divide the sum with the size in a pandas groupby

Tags:

python

pandas

I have a dataframe like this:

  ID_0 ID_1  ID_2
0    a    b     1
1    a    c     1
2    a    b     0
3    d    c     0
4    a    c     0
5    a    c     1

I would like to groupby ['ID_0','ID_1'] and produce a new dataframe which has the sum of the ID_2 values for each group divided by the number of rows in each group.
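To make the example reproducible, here is a minimal sketch that rebuilds the sample frame. The values are copied from the table above; the variable name `df` matches the one used in the question's code.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the table)
df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})
print(df)
```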

import numpy as np
grouped = df.groupby(['ID_0', 'ID_1'])
print(grouped.agg({'ID_2': np.sum}), "\n", grouped.size())

gives

           ID_2
ID_0 ID_1
a    b        1
     c        2
d    c        0
ID_0  ID_1
a     b       2
      c       3
d     c       1
dtype: int64

How can I get the new dataframe with the np.sum values divided by the size() values?

asked Sep 28 '16 18:09 by graffe


People also ask

How do you get the sum of groupby in pandas?

Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum for each group. groupby() returns a DataFrameGroupBy object whose sum() method returns the sum of each numeric column within each group; selecting a column first, as in df.groupby(keys)['col'].sum(), restricts the result to that column.
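Applied to the data from the question, a per-group sum of `ID_2` might look like the sketch below. It selects the column before summing, so the result is a Series indexed by the two grouping keys.

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

# Sum ID_2 within each (ID_0, ID_1) group; the result is a Series
# indexed by a MultiIndex built from the grouping keys
sums = df.groupby(['ID_0', 'ID_1'])['ID_2'].sum()
print(sums)
```

The sums match the `ID_2` column printed in the question: 1, 2 and 0.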

How do you split a groupby in pandas?

A groupby follows the split-apply-combine pattern:
Step 1: split the data into groups by creating a GroupBy object from the original DataFrame.
Step 2: apply a function to each group; in this case, an aggregation that computes a summary statistic (you can also transform or filter the data in this step).
Step 3: combine the results into a new DataFrame.
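The three steps can be made explicit with a hand-written loop. This is a sketch for illustration only (in practice the built-in aggregations do all three steps at once); it uses the question's data and computes the sum-divided-by-row-count value per group.

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

# Step 1: split; iterating a GroupBy yields (key, sub-DataFrame) pairs
grouped = df.groupby(['ID_0', 'ID_1'])

# Step 2: apply; compute one summary value per group
results = {key: sub['ID_2'].sum() / len(sub) for key, sub in grouped}

# Step 3: combine; tuple keys become a MultiIndex, then name its levels
combined = pd.Series(results, name='ID_2').rename_axis(['ID_0', 'ID_1'])
print(combined)
```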

What does groupby size do in pandas?

DataFrame.groupby() is one of the most useful functions in the library: it splits the data into groups based on columns or conditions, and you can then apply operations to each group, e.g. size(), which counts the number of rows in each group (rows containing missing values are included).
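The difference between size() and count() shows up once a value is missing. In this sketch one `ID_2` value from the question's data is replaced with NaN; that change is hypothetical and only there to illustrate the distinction.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, np.nan],  # hypothetical: last value made missing
})
grouped = df.groupby(['ID_0', 'ID_1'])

# size(): number of rows per group, missing values included
sizes = grouped.size()
# count(): number of non-missing ID_2 values per group
counts = grouped['ID_2'].count()
print(sizes, counts, sep="\n")
```

This matters for the question: mean() divides by the non-missing count, while dividing by len(x) or size() divides by the full row count, so the two only agree when `ID_2` has no missing values.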

How do you divide two values in pandas?

The simplest way to divide two columns is the division operator (/): df['a'] / df['b'] divides the first column by the second element-wise, aligning rows on the index. The methods Series.div() and DataFrame.div() perform the same operation.
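A small sketch of both forms follows; the column names `num` and `den` and their values are hypothetical. Note that dividing by zero in pandas produces `inf` rather than raising an error.

```python
import pandas as pd

# Hypothetical columns, for illustration only
df = pd.DataFrame({'num': [1, 2, 3], 'den': [2, 4, 0]})

# Element-wise division with the / operator; rows align on the index
df['ratio'] = df['num'] / df['den']

# Series.div is the method form of the same operation
df['ratio_div'] = df['num'].div(df['den'])
print(df)
```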


1 Answer

Use groupby.apply instead:

df.groupby(['ID_0', 'ID_1']).apply(lambda x: x['ID_2'].sum()/len(x))

ID_0  ID_1
a     b       0.500000
      c       0.666667
d     c       0.000000
dtype: float64
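Alternatively, the two results from the question can be divided directly, without apply. The sketch below relies on the per-group sums and grouped.size() sharing the same (ID_0, ID_1) MultiIndex, so the division lines up group by group. Since the question asks for a new dataframe, reset_index() turns the result back into one; the output column name `ID_2_ratio` is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})
grouped = df.groupby(['ID_0', 'ID_1'])

# Per-group sum divided by per-group row count; both Series share the
# same MultiIndex, so pandas aligns them before dividing
ratio = grouped['ID_2'].sum() / grouped.size()

# With no missing values in ID_2, this is simply the group mean
mean = grouped['ID_2'].mean()

# reset_index moves the group keys back into columns, giving a DataFrame
result = ratio.reset_index(name='ID_2_ratio')
print(result)
```

On this data `ratio` and `mean` are identical (0.5, 0.666667, 0.0, as in the output above). If `ID_2` could contain NaN, choose deliberately between dividing by size() and using mean(), because they use different denominators.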
answered Sep 19 '22 13:09 by Nickil Maveli