I have a dataframe like this:
text             category
sfsd sgvv        abc,xyz
zydf sefs sdfsd  yyy
dfsd dsrgd dggr  xyz
eter vxg wfe     abc
dfvf ertet       abc,xyz
I want an output like this:
text             category
sfsd sgvv        abc
sfsd sgvv        xyz
zydf sefs sdfsd  yyy
dfsd dsrgd dggr  xyz
eter vxg wfe     abc
dfvf ertet       abc
dfvf ertet       xyz
Basically, I want a new row for each category whenever the category column holds two or more comma-separated values.
Use DataFrame.explode (pandas 0.25+) with Series.str.split:
df1 = (df.assign(category = df['category'].str.split(','))
.explode('category')
.reset_index(drop=True))
For older pandas versions, first call DataFrame.set_index on the column(s) that are not being split, then use Series.str.split with expand=True and reshape with DataFrame.stack. Finally, call DataFrame.reset_index twice: first to remove the second level of the MultiIndex, then to convert the index back to a column:
df1 = (df.set_index('text')['category']
.str.split(',', expand=True)
.stack()
.reset_index(level=1, drop=True)
.reset_index(name='category'))
print(df1)
              text category
0        sfsd sgvv      abc
1        sfsd sgvv      xyz
2  zydf sefs sdfsd      yyy
3  dfsd dsrgd dggr      xyz
4     eter vxg wfe      abc
5       dfvf ertet      abc
6       dfvf ertet      xyz
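To try both approaches end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the extra dropna() after stack() is my own assumption: it changes nothing where stack() already drops missing values, and it guards against pandas versions whose stack() keeps them.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the real frame).
df = pd.DataFrame({
    "text": ["sfsd sgvv", "zydf sefs sdfsd", "dfsd dsrgd dggr",
             "eter vxg wfe", "dfvf ertet"],
    "category": ["abc,xyz", "yyy", "xyz", "abc", "abc,xyz"],
})

# pandas 0.25+: split each string into a list, then emit one row per list element.
exploded = (df.assign(category=df["category"].str.split(","))
              .explode("category")
              .reset_index(drop=True))

# Older pandas: split into separate columns, stack the columns into rows,
# drop the padding left by shorter rows, then restore a flat index.
stacked = (df.set_index("text")["category"]
             .str.split(",", expand=True)
             .stack()
             .dropna()  # assumption: harmless where stack() already drops missing values
             .reset_index(level=1, drop=True)
             .reset_index(name="category"))

# Compare plain values so that dtype differences between versions do not matter.
same = exploded.values.tolist() == stacked.values.tolist()
print(exploded)
print("both approaches agree:", same)
```

Comparing the values as plain lists, rather than with DataFrame.equals, keeps the check independent of how a given pandas version types the string columns.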
For much older pandas versions, try chaining set_index, stack, str.split, unstack and reset_index. Note that the separator must match the data exactly, which here is a comma with no trailing space:
print(df.set_index('text')
        .stack()
        .str.split(',', expand=True)
        .stack()
        .unstack(-2)
        .reset_index(-1, drop=True)
        .reset_index())
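The sample data uses a bare comma, but real data often has a space after the separator. In that case one option is to split on the comma and strip whitespace afterwards. This is only a sketch: the two-row frame below is hypothetical and is not taken from the question.

```python
import pandas as pd

# Hypothetical data with an inconsistent separator (", " instead of ",").
df = pd.DataFrame({
    "text": ["a b", "c d"],
    "category": ["abc, xyz", "yyy"],
})

out = (df.assign(category=df["category"].str.split(","))
         .explode("category")
         .reset_index(drop=True)
         # Strip the whitespace left around each piece after splitting on ",".
         .assign(category=lambda d: d["category"].str.strip()))

print(out)
```

Stripping after the split handles both "abc,xyz" and "abc, xyz" with the same code, so the separator does not need to be known exactly in advance.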