I have this pandas data frame:
from pandas import DataFrame
df = DataFrame({'id':['a','b','b','b','c','c'], 'category':['z','z','x','y','y','y'], 'category2':['1','2','2','2','1','2']})
which looks like:
  category category2 id
0        z         1  a
1        z         2  b
2        x         2  b
3        y         2  b
4        y         1  c
5        y         2  c
What I'd like to do is group by id and return the other two columns as a concatenation of their unique strings.
The outcome would look like:
  category category2 id
0        z         1  a
1      zxy         2  b
2        y        12  c
Grouping by multiple columns: you can do this by passing a list of column names to groupby instead of a single string value.
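As a minimal sketch of passing a list to groupby, the example below reuses the DataFrame from the question and groups on the pair (id, category2). Counting rows with size() is only an illustrative choice of aggregation, not something the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

# A list of column names makes each distinct (id, category2) pair its own group.
counts = df.groupby(['id', 'category2']).size()
print(counts)
```

The result is indexed by both grouping columns, so a single group can be looked up with a tuple such as counts.loc[('b', '2')].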
You can combine two or more text (string) columns of a pandas DataFrame simply by using the + operator. Note that applying + to numeric columns performs addition rather than concatenation.
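The difference between string and numeric columns can be seen in a small sketch. The column n and the variable names here are made up for illustration; converting with astype(str) is one common way to get concatenation from numeric data.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['z', 'x'],
    'category2': ['1', '2'],
    'n': [1, 2],  # hypothetical numeric column, added for contrast
})

# String columns: + concatenates element-wise.
combined = df['category'] + df['category2']

# Numeric columns: + adds element-wise.
added = df['n'] + df['n']

# Cast to str first if concatenation of numbers is what you want.
as_text = df['n'].astype(str) + df['n'].astype(str)

print(combined.tolist(), added.tolist(), as_text.tolist())
```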
To achieve this, we group the DataFrame by the ID column and select the Variety column. We then call transform with a lambda function, which concatenates all the values within each group, separated by a comma and a space, and assigns the result to every row of that group.
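A sketch of that transform approach, applied to the DataFrame from the question: there is no Variety column in this data, so category is used as a stand-in, and the name joined is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

# transform keeps the original row count: the joined string for each
# group is broadcast back to every row that belongs to that group.
joined = df.groupby('id')['category'].transform(lambda s: ', '.join(s))
print(joined.tolist())
```

Unlike agg, this does not collapse the groups to one row each, and it does not remove duplicates, so group c yields 'y, y'.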
Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate the strings:
In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]:
   category category2
id
a         z         1
b       yxz         2
c         y        12
To move id from the index to a column of the resultant DataFrame, call reset_index:
In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]:
  id category category2
0  a        z         1
1  b      yxz         2
2  c        y        12
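Note that a set has no guaranteed iteration order, which is why group b shows yxz here rather than the zxy asked for in the question. If first-appearance order matters, one possible variation (a sketch, not part of the original answer) is to deduplicate with dict.fromkeys, which keeps insertion order:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

# dict.fromkeys drops duplicates while preserving first-seen order,
# so the joined string follows the row order within each group.
result = (
    df.groupby('id')
      .agg(lambda x: ''.join(dict.fromkeys(x)))
      .reset_index()
)
print(result)
```

If alphabetical order is preferred instead, ''.join(sorted(set(x))) is another deterministic option.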