I have a dataframe that looks like the following:
publication_title authors type ...
title 1 ['author1', 'author2', 'author3'] proceedings
title 2 ['author4', 'author5'] collections
title 3 ['author6', 'author7'] books
.
.
.
I want to take the 'authors' column and split each list into one row per author, duplicating all the other columns. The results should go in a new column named 'author', and the original 'authors' column should be kept.
The following depicts exactly what I want to achieve:
publication_title authors author type ...
title 1 ['author1', 'author2', 'author3'] author1 proceedings
title 1 ['author1', 'author2', 'author3'] author2 proceedings
title 1 ['author1', 'author2', 'author3'] author3 proceedings
title 2 ['author4', 'author5'] author4 collections
title 2 ['author4', 'author5'] author5 collections
title 3 ['author6', 'author7'] author6 books
title 3 ['author6', 'author7'] author7 books
.
.
.
I have tried to achieve this using the pandas DataFrame explode method, but I cannot find a way to store the results in a new column.
Thanks for the help.
Since pandas 0.25.0 we have the explode method. First we duplicate the authors column and rename it at the same time using assign, then we explode this column to rows and duplicate the other columns:
df.assign(author=df['authors']).explode('author')
Output
publication_title authors type author
0 title_1 [author1, author2, author3] proceedings author1
0 title_1 [author1, author2, author3] proceedings author2
0 title_1 [author1, author2, author3] proceedings author3
1 title_2 [author4, author5] collections author4
1 title_2 [author4, author5] collections author5
2 title_3 [author6, author7] books author6
2 title_3 [author6, author7] books author7
If you want to remove the duplicated index, use reset_index:
df.assign(author=df['authors']).explode('author').reset_index(drop=True)
Output
publication_title authors type author
0 title_1 [author1, author2, author3] proceedings author1
1 title_1 [author1, author2, author3] proceedings author2
2 title_1 [author1, author2, author3] proceedings author3
3 title_2 [author4, author5] collections author4
4 title_2 [author4, author5] collections author5
5 title_3 [author6, author7] books author6
6 title_3 [author6, author7] books author7
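For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample DataFrame is an assumption reconstructed from the question (the real data presumably has more columns), and the variable name result is only illustrative. It assumes the authors column holds actual Python lists, not their string representation.

```python
import pandas as pd

# Sample data reconstructed from the question (illustrative values only).
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

# Copy 'authors' into a new 'author' column, explode only that copy
# (the other columns are repeated for each element), then renumber the rows.
result = (
    df.assign(author=df["authors"])
      .explode("author")
      .reset_index(drop=True)
)

print(result)
```

Because assign returns a new DataFrame, df itself is left unchanged; assign the result back to df if you want to replace it.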