I know this question has been asked time and again, but I'm not very good with list comprehensions and there is a small twist to my code.
I have a dataframe containing keywords, and I'd like to keep only the rows whose keyword contains one or more substrings from a dedicated list.
Please note that I'm not looking for the exact expression, just the occurrence of a substring in the dataframe.
Basically, I think it should look something like this:
substring_list = ['abc', 'def']
df[df['tag'].str.contains(substring) for substring in substring_list]
I keep getting syntax errors.
Any ideas?
Thanks for the support!
Use:
df['tag'].str.contains('|'.join(substring_list))
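One caveat worth noting: the joined string is treated as a regular expression, so substrings containing characters such as `.` or `+` may not behave as literal text. A minimal sketch, assuming a made-up frame with a `tag` column and made-up substrings, that escapes each substring with `re.escape` and passes `na=False` so missing values count as non-matches:

```python
import re
import pandas as pd

# Hypothetical sample data; only the column name `tag` comes from the question.
df = pd.DataFrame({"tag": ["c++ tips", "a.b", "axb", None, "def"]})
substring_list = ["c++", "a.b"]

# Escape each substring so regex metacharacters are matched literally,
# then join them into a single alternation pattern.
pattern = "|".join(re.escape(s) for s in substring_list)

# na=False makes missing values evaluate to False instead of NaN,
# so the result can be used directly as a boolean mask.
mask = df["tag"].str.contains(pattern, na=False)
filtered = df[mask]
print(filtered)
```

Here "axb" is not kept, because the escaped `.` no longer matches an arbitrary character.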
Simply try this:
Use a pattern-based search: build the regex by joining the substrings with | as follows:
df[df.tag.str.contains('|'.join(substring_list))]
If you only have a few strings to search for, you can simply write the pattern inline:
df[df.tag.str.contains("abc|def")]
Example illustration:
>>> df
   tag
0  abc
1  edf
2  abc
3  def
4  efg
>>> df[df.tag.str.contains("abc|def")]
   tag
0  abc
2  abc
3  def
>>> substring_list = ['abc', 'def']
>>> df[df.tag.str.contains('|'.join(substring_list))]
   tag
0  abc
2  abc
3  def
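For comparison, the list-comprehension idea from the question can also be made to work without any regex. This is only a sketch of an alternative, not the approach used in the answers above; the helper name `contains_any` is made up for illustration, and the data mirrors the example frame:

```python
import pandas as pd

# Same sample data as in the illustration above.
df = pd.DataFrame({"tag": ["abc", "edf", "abc", "def", "efg"]})
substring_list = ["abc", "def"]

def contains_any(text, substrings):
    """Return True if `text` is a string containing at least one substring."""
    # Non-string values (e.g. NaN) never match.
    return isinstance(text, str) and any(s in text for s in substrings)

# Build one boolean per row, aligned with the frame's index.
mask = pd.Series(
    [contains_any(t, substring_list) for t in df["tag"]],
    index=df.index,
)
filtered = df[mask]
print(filtered)
```

This does plain substring checks with Python's `in` operator, so no escaping is needed, though it will likely be slower than `str.contains` on large frames.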