I've been performing a groupby operation on a dataframe I have that aggregates columns together based on the column 'Name':
Name | As | Bs | Cs | Note
Mark 3 4 7 Good
Luke 2 1 12 Well
Mark 5 6 8 Ok
John 1 18 3 Great
So in this case, the rows with 'Mark' are aggregated together on columns A,B, and C using this code:
temp_df = temp_df.groupby(['Name'], as_index=False).agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})
A thing I need to add in is to do a count on the number of rows that have the same value in 'Name'. This would give me an output like:
Name | As | Bs | Cs | Note | Count
Mark 8 10 15 Good 2
Luke 2 1 12 Well 1
John 1 18 3 Great 1
How do I modify the above line of code to do what I need?
Create the group and do your aggregations:
the_group = temp_df.groupby(['Name'], as_index=False)
temp_df = the_group.agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})
then compute the size from the_group
temp_df['count'] = the_group.count()['Note']
gives:
Name Cs As Bs count
0 John 3 1 18 1
1 Luke 12 2 1 1
2 Mark 15 8 10 2
As suggested in the comments, it is safer to use size() in case the data include NaN:
temp_df['count'] = the_group.size().reset_index()[0]
Use first + size and then is necessary rename columns by dict:
temp_df = temp_df.groupby('Name', sort=False) \
.agg({'As':np.sum,'Bs':np.sum,'Cs':np.sum,'Note':'first','Name':'size'}) \
.rename(columns={'Name':'Count'}) \
.reset_index() \
.reindex_axis(temp_df.columns.tolist() + ['Count'], axis=1)
print (temp_df)
Name As Bs Cs Note Count
0 Mark 8 10 15 Good 2
1 Luke 2 1 12 Well 1
2 John 1 18 3 Great 1
Dont use count, only size or len.
What is the difference between size and count in pandas?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With