Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting the size of a groupby operation in Pandas

Tags:

python

pandas

I've been performing a groupby operation on a dataframe I have that aggregates columns together based on the column 'Name':

Name | As | Bs | Cs   |  Note
Mark   3     4     7     Good
Luke   2     1     12    Well
Mark   5     6     8     Ok
John   1     18    3     Great

So in this case, the rows with 'Mark' are aggregated together on columns A,B, and C using this code:

temp_df = temp_df.groupby(['Name'], as_index=False).agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})

A thing I need to add in is to do a count on the number of rows that have the same value in 'Name'. This would give me an output like:

Name | As | Bs | Cs   |  Note   | Count
Mark   8     10    15    Good      2
Luke   2     1     12    Well      1
John   1     18    3     Great     1

How do I modify the above line of code to do what I need?

like image 490
GreenGodot Avatar asked Nov 08 '25 21:11

GreenGodot


2 Answers

Create the group and do your aggregations:

the_group = temp_df.groupby(['Name'], as_index=False)
temp_df = the_group.agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})

then compute the size from the_group

temp_df['count'] = the_group.count()['Note']

gives:

   Name  Cs  As  Bs  count
0  John   3   1  18      1
1  Luke  12   2   1      1
2  Mark  15   8  10      2

Edit:

As suggested in the comments, it is safer to use size() in case the data include NaN:

temp_df['count'] = the_group.size().reset_index()[0] 
like image 51
VinceP Avatar answered Nov 10 '25 12:11

VinceP


Use first + size and then is necessary rename columns by dict:

temp_df = temp_df.groupby('Name', sort=False) \
                .agg({'As':np.sum,'Bs':np.sum,'Cs':np.sum,'Note':'first','Name':'size'}) \
                .rename(columns={'Name':'Count'}) \
                .reset_index() \
                .reindex_axis(temp_df.columns.tolist() + ['Count'], axis=1) 
print (temp_df)
   Name  As  Bs  Cs   Note  Count
0  Mark   8  10  15   Good      2
1  Luke   2   1  12   Well      1
2  John   1  18   3  Great      1

Dont use count, only size or len.

What is the difference between size and count in pandas?

like image 42
jezrael Avatar answered Nov 10 '25 11:11

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!