 

Merge a string column into a list of unique values using Python

I have a pandas DataFrame like this:

id     fruits
01     Apple, Apricot
02     Apple, Banana, Clementine, Pear
03     Orange, Pineapple, Pear

How can I get a list of fruits like this, with duplicates removed?

['Apple','Apricot','Banana','Clementine','Orange','Pear','Pineapple']
ah bon asked Jun 26 '26 22:06


2 Answers

You can flatten the lists created by split, convert the result to a set to keep only unique values, and then convert it back to a list:

a = list(set([item for sublist in df['fruits'].str.split(', ') for item in sublist]))
print(a)
['Pineapple', 'Clementine', 'Apple', 'Banana', 'Apricot', 'Orange', 'Pear']

A set is unordered, so the order of the elements can differ from run to run.

Or, to keep the order of first appearance:

a = df['fruits'].str.split(', ', expand=True).stack().drop_duplicates().tolist()
print (a)
['Apple', 'Apricot', 'Banana', 'Clementine', 'Pear', 'Orange', 'Pineapple']

Thanks to @kabanus for this alternative:

a = list(set(sum(df['fruits'].str.split(', '),[])))
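None of the variants above returns the alphabetical list shown in the question. As a minimal sketch, assuming pandas 0.25 or later (where Series.explode is available), the split lists can be exploded into one row per fruit, deduplicated with unique(), and sorted at the end. The DataFrame is rebuilt from the question's sample data, and the names unique_fruits and sorted_fruits are only illustrative:

```python
import pandas as pd

# Sample frame reconstructed from the question (ids kept as strings).
df = pd.DataFrame({
    "id": ["01", "02", "03"],
    "fruits": [
        "Apple, Apricot",
        "Apple, Banana, Clementine, Pear",
        "Orange, Pineapple, Pear",
    ],
})

# Split each row into a list, give every fruit its own row, then deduplicate.
# unique() keeps the order of first appearance.
unique_fruits = df["fruits"].str.split(", ").explode().unique().tolist()

# Sort only if the alphabetical order from the question is wanted.
sorted_fruits = sorted(unique_fruits)
print(sorted_fruits)
```

Sorting is kept as a separate step so the same code can serve either need: first-appearance order or alphabetical order.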
jezrael answered Jun 28 '26 11:06


Using str.extractall and drop_duplicates:

df.fruits.str.extractall(r'(\w+)').drop_duplicates()[0].tolist()

Output:

['Apple', 'Apricot', 'Banana', 'Clementine', 'Pear', 'Orange', 'Pineapple']
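One caveat with the \w+ pattern: it matches single words, so a name containing a space would be split into two entries. The sketch below uses hypothetical data (the "Passion fruit" rows are not from the question) to show the difference, and swaps in a pattern that captures everything between commas and then strips the surrounding whitespace:

```python
import pandas as pd

# Hypothetical sample with a two-word fruit name (not from the question).
df2 = pd.DataFrame({"fruits": ["Apple, Passion fruit", "Passion fruit, Pear"]})

# \w+ captures one word at a time, so "Passion fruit" becomes two tokens.
word_tokens = (
    df2["fruits"].str.extractall(r"(\w+)")[0].drop_duplicates().tolist()
)

# [^,]+ captures everything between commas; strip() removes the leading
# space left over after each comma before duplicates are dropped.
whole_names = (
    df2["fruits"]
    .str.extractall(r"([^,]+)")[0]
    .str.strip()
    .drop_duplicates()
    .tolist()
)

print(word_tokens)
print(whole_names)
```

For the single-word fruits in the question, both patterns give the same result.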
Haleemur Ali answered Jun 28 '26 12:06


