I have the following list:
search_list = ['STEEL','IRON','GOLD','SILVER']
which I need to search within a dataframe (df):
     a  b
0  123  'Blah Blah Steel'
1  456  'Blah Blah Blah'
2  789  'Blah Blah Gold'
and insert the matching rows into a new dataframe (newdf), adding a new column with the matching word from the list:
     a  b                  c
0  123  'Blah Blah Steel'  'STEEL'
1  789  'Blah Blah Gold'   'GOLD'
I can use the following code to extract the matching rows:
newdf=df[df['b'].str.upper().str.contains('|'.join(search_list),na=False)]
but I can't figure out how to add the matching word from the list into column c.
I'm thinking that the match somehow needs to capture the index of the matching word in the list and then pull the value using that index, but I can't figure out how to do this.
Any help or pointers would be greatly appreciated.
Thanks
You could use extract and filter out the rows that are NaN (i.e. no match):
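import re
import pandas as pd

search_list = ['STEEL','IRON','GOLD','SILVER']
# Capture whichever word matches, ignoring case; rows with no match get NaN.
df['c'] = df.b.str.extract('({0})'.format('|'.join(search_list)), flags=re.IGNORECASE)
result = df[~pd.isna(df.c)]
print(result)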
Output:
  a  b                  c
123  'Blah Blah Steel'  Steel
789  'Blah Blah Gold'   Gold
Note that you have to import the re module in order to use the re.IGNORECASE flag. Alternatively, you could pass 2 directly, which is the value of the re.IGNORECASE flag.
UPDATE
As mentioned by @user3483203, you can avoid the import by putting the inline flag (?i) at the start of the pattern:
df['c'] = df.b.str.extract('(?i)({0})'.format('|'.join(search_list)))
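Note that extract returns the text as it appears in column b (Steel, Gold), whereas the question asks for the entry from the list (STEEL, GOLD). A minimal end-to-end sketch that closes this gap is below. It assumes every entry in search_list is upper case, so upper-casing the captured text yields the list entry. The re.escape call, the expand=False argument and the reset_index step are additions of mine rather than part of the answer above; re.escape only matters if a search word contains regex metacharacters.

```python
import re
import pandas as pd

search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']
df = pd.DataFrame({
    'a': [123, 456, 789],
    'b': ['Blah Blah Steel', 'Blah Blah Blah', 'Blah Blah Gold'],
})

# Escape each word so any regex metacharacters are matched literally,
# and use the inline (?i) flag for case-insensitive matching.
pattern = '(?i)({0})'.format('|'.join(re.escape(w) for w in search_list))

# expand=False returns a Series; rows with no match get NaN.
matched = df['b'].str.extract(pattern, expand=False)

# Upper-case the captured text so it equals the list entry (assumes the
# list is all upper case), then drop the rows that did not match.
newdf = (df.assign(c=matched.str.upper())
           .dropna(subset=['c'])
           .reset_index(drop=True))
print(newdf)
```

Using assign leaves the original df untouched, which may be preferable if df is needed again without the extra column.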