How to concatenate sets when using groupby in a pandas DataFrame?




This is my dataframe:

> df
       a             b
    0  1         set([2, 3])
    1  2         set([2, 3])
    2  3      set([4, 5, 6])
    3  1  set([1, 34, 3, 2])

When I group by column a, I want to merge the sets in column b. If the values were lists, this would not be a problem. But the output of my command is:

> df.groupby('a').sum()

a         b                
1             NaN
2     set([2, 3])
3  set([4, 5, 6])  

What should I do in the groupby to merge the sets? The output I'm looking for is:

a         b                
1     set([2, 3, 1, 34])
2     set([2, 3])
3     set([4, 5, 6])  
Alireza asked Oct 06 '15 10:10


1 Answer

This might be close to what you want:

df.groupby('a').apply(lambda x: set.union(*x.b))

For each group, this takes the union of all the sets in column b.
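
To show what the star-unpacked set.union call does by itself, here is a minimal sketch using plain Python sets, with no pandas involved. The variable names are made up for illustration, and the two sets are the ones from the rows where a is 1.

```python
# The two sets that belong to group a == 1 in the question.
group_sets = [{2, 3}, {1, 34, 3, 2}]

# set.union(*group_sets) is the same as group_sets[0].union(group_sets[1]):
# it returns a new set and leaves the inputs unchanged.
combined = set.union(*group_sets)

print(combined == {1, 2, 3, 34})  # True
```

Inside the groupby, x.b plays the role of group_sets: unpacking a Series with * iterates over its values.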

If you need to keep the column names you could use:

df.groupby('a').agg({'b':lambda x: set.union(*x)}).reset_index('a')


    a   b
0   1   set([1, 2, 3, 34])
1   2   set([2, 3])
2   3   set([4, 5, 6])
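
For anyone who wants to try this end to end, below is a self-contained sketch that rebuilds the DataFrame from the question and merges the sets per group. It assumes Python 3 set literals and a reasonably recent pandas; selecting column b before aggregating and using set().union are choices made for this example, not the only way to write it.

```python
import pandas as pd

# Rebuild the DataFrame from the question (Python 3 set literals).
df = pd.DataFrame({
    "a": [1, 2, 3, 1],
    "b": [{2, 3}, {2, 3}, {4, 5, 6}, {1, 34, 3, 2}],
})

# For each value of 'a', union every set found in column 'b'.
# set().union(*s) starts from an empty set and returns a new set.
merged = df.groupby("a")["b"].agg(lambda s: set().union(*s))

# Turn the group key back into a regular column.
result = merged.reset_index()

print(result)
```

Here merged is a Series indexed by a, and result is a DataFrame with columns a and b, one row per group.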
matt_s answered Nov 08 '22 18:11
