pandas: aggregate rows by a given column into a list and count them

I have the following data frame my_df:

team      member
--------------------    
 A         Mary
 B         John
 C         Amy
 A         Dan
 B         Dave
 D         Paul
 B         Alex
 A         Mary
 D         Mary

I want to produce a new data frame, new_df, like this:

team      members              number
--------------------------------------
 A       [Mary,Dan]              2
 B       [John,Dave,Alex]        3
 C       [Amy]                   1
 D       [Paul,Mary]             2

Is there an existing pandas function that can perform this task? Thanks!

Edamame asked Jan 11 '17 00:01

2 Answers

Use groupby.

With pd.concat:

g = df.groupby('team').member
pd.concat([g.apply(list), g.count()], axis=1, keys=['members', 'number'])

With agg (named aggregation, pandas >= 0.25; the dict-renaming form was removed in pandas 1.0):

g = df.groupby('team').member
g.agg(members=list, number='count')

                 members  number
team                            
A      [Mary, Dan, Mary]       3
B     [John, Dave, Alex]       3
C                  [Amy]       1
D           [Paul, Mary]       2
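
The sample data lists Mary twice under team A, while the desired output shows her only once, and collecting with a plain list keeps every repeat. Here is a minimal sketch that drops repeated (team, member) pairs before grouping. It assumes duplicates should be ignored and that pandas >= 0.25 is available for keyword-style aggregation; the names my_df and new_df follow the question.

```python
import pandas as pd

# The question's data, including the repeated (A, Mary) row.
my_df = pd.DataFrame({
    "team":   ["A", "B", "C", "A", "B", "D", "B", "A", "D"],
    "member": ["Mary", "John", "Amy", "Dan", "Dave", "Paul", "Alex", "Mary", "Mary"],
})

# Drop repeated (team, member) pairs first, keeping first-seen order,
# then collect each team's members into a list and count them.
new_df = (
    my_df.drop_duplicates(["team", "member"])
         .groupby("team")["member"]
         .agg(members=list, number="count")
         .reset_index()  # turn the 'team' index back into a regular column
)
print(new_df)
```

If the repeats should be kept instead, leaving out the drop_duplicates step gives the raw lists and row counts.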
piRSquared answered Sep 24 '22 09:09

Another option here:

# Named aggregation (pandas >= 0.25); dict renaming in agg was removed in pandas 1.0
(df.groupby("team", as_index=False)
   .agg(member=("member", list), count=("member", "count")))


Psidom answered Sep 24 '22 09:09
