I know this question has been asked time and again, but I'm not very good with list comprehensions and there is a small twist to my code.
I have a dataframe containing keywords, and I'd like to keep only the rows whose keyword contains one or more substrings from a dedicated list.
Please note that I'm not looking for the exact expression, just the occurrence of a substring in the dataframe.
Basically, I think it should look something like this:
substring_list = ['abc', 'def']
df[df['tag'].str.contains(substring) for substring in substring_list]
I keep getting syntax errors.
Any ideas?
Thanks for the support!
Use:
df['tag'].str.contains('|'.join(substring_list))
Try a pattern-based search: build a regex by joining the substrings in the list with |, as follows:
df[df.tag.str.contains('|'.join(substring_list))]
If you only have a few strings to search for, you can simply write the pattern directly:
df[df.tag.str.contains("abc|def")]
Example illustration:
>>> df
tag
0 abc
1 edf
2 abc
3 def
4 efg
>>> df[df.tag.str.contains("abc|def")]
tag
0 abc
2 abc
3 def
>>> substring_list = ['abc', 'def']
>>> df[df.tag.str.contains('|'.join(substring_list))]
tag
0 abc
2 abc
3 def
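One caveat with the join approach: str.contains treats the pattern as a regular expression by default, so a substring containing characters such as . or + would be read as regex syntax rather than literally. Below is a minimal sketch of one way to guard against that, assuming made-up sample data (the tag values and the None entry are hypothetical, not from the question). It escapes each substring with re.escape and passes na=False so that missing values count as non-matches:

```python
import re

import pandas as pd

# Hypothetical sample data; None stands in for a missing tag.
df = pd.DataFrame({"tag": ["abc", "edf", "a.c", "xdefx", None, "efg"]})
substring_list = ["a.c", "def"]

# Escape regex metacharacters so each entry is matched literally.
pattern = "|".join(re.escape(s) for s in substring_list)

# na=False makes rows with missing values evaluate to False instead of NaN,
# so the result can be used directly as a boolean mask.
mask = df["tag"].str.contains(pattern, na=False)
filtered = df[mask]
print(filtered)
```

Without the escaping, the pattern a.c would also match abc, because an unescaped . matches any character.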