Python - Searching a string within a dataframe from a list

I have the following list:

search_list = ['STEEL','IRON','GOLD','SILVER']

which I need to search within a dataframe (df):

      a    b             
0    123   'Blah Blah Steel'
1    456   'Blah Blah Blah'
2    789   'Blah Blah Gold'

and insert the matching rows into a new dataframe (newdf), adding a new column with the matching word from the list:

      a    b                   c
0    123   'Blah Blah Steel'   'STEEL'
1    789   'Blah Blah Gold'    'GOLD'

I can use the following code to extract the matching row:

newdf=df[df['b'].str.upper().str.contains('|'.join(search_list),na=False)]

but I can't figure out how to add the matching word from the list into column c.

I'm thinking that the match somehow needs to capture the index of the matching word in the list and then pull the value using the index number but I can't figure out how to do this.

Any help or pointers would be greatly appreciated

Thanks

Big_Daz asked Dec 03 '22 10:12



1 Answer

You could use str.extract to capture the matching word, then filter out the rows where the result is NaN (i.e. no match):

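import re
import pandas as pd

search_list = ['STEEL','IRON','GOLD','SILVER']

# capture whichever word from search_list appears in column b, ignoring case
df['c'] = df.b.str.extract('({0})'.format('|'.join(search_list)), flags=re.IGNORECASE)
result = df[~pd.isna(df.c)]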

print(result)

Output

      a    b                   c
0    123   'Blah Blah Steel'   Steel
2    789   'Blah Blah Gold'    Gold

Note that you have to import the re module in order to use the re.IGNORECASE flag. Alternatively, you could pass 2 directly, which is the integer value of re.IGNORECASE.
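A quick way to convince yourself that the literal 2 behaves the same as the named flag is to run both and compare. This is only a sketch: the dataframe below is rebuilt from the sample in the question, and the names with_flag, with_int and same are made up for the illustration.

```python
import re
import pandas as pd

# Sample data rebuilt from the question (assumed, not the asker's real data)
df = pd.DataFrame({
    "a": [123, 456, 789],
    "b": ["Blah Blah Steel", "Blah Blah Blah", "Blah Blah Gold"],
})
pattern = "(STEEL|IRON|GOLD|SILVER)"

# Same extraction twice: once with the named flag, once with its integer value
with_flag = df["b"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
with_int = df["b"].str.extract(pattern, flags=2, expand=False)

# True when both Series hold the same values (NaN in the same places)
same = with_flag.equals(with_int)
print(int(re.IGNORECASE), same)
```

The named flag is usually easier to read, so the literal is mainly worth knowing about when you want to skip the import.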

UPDATE

As mentioned by @user3483203, you can avoid the import by putting the case-insensitive flag inline in the pattern:

df['c'] = df.b.str.extract('(?i)({0})'.format('|'.join(search_list)))
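Note that extract returns the text as it appears in column b ('Steel'), not the spelling from the list ('STEEL') that the question asks for. Below is one possible end-to-end sketch that gets the list spelling back. It assumes every entry in search_list is upper case, so upper-casing the match is enough to recover it. The dataframe is rebuilt from the question's sample, the name matched is made up for the illustration, and re.escape is an extra precaution in case a search word ever contains regex metacharacters.

```python
import re
import pandas as pd

search_list = ["STEEL", "IRON", "GOLD", "SILVER"]

# Sample data rebuilt from the question (assumed, not the asker's real data)
df = pd.DataFrame({
    "a": [123, 456, 789],
    "b": ["Blah Blah Steel", "Blah Blah Blah", "Blah Blah Gold"],
})

# (?i) makes the match case-insensitive; re.escape guards against special characters
pattern = "(?i)({})".format("|".join(re.escape(word) for word in search_list))

# expand=False returns a Series holding the matched text, NaN where nothing matched
matched = df["b"].str.extract(pattern, expand=False)

# Keep only matching rows, store the upper-cased match in c, renumber the index
newdf = df[matched.notna()].assign(c=matched.str.upper()).reset_index(drop=True)
print(newdf)
```

If the list could contain mixed-case entries, upper-casing would not be enough; a dictionary mapping each upper-cased word back to its original spelling would be one way to handle that.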
Dani Mesejo answered Jan 01 '23 10:01
